RNAInvBench is a Python framework that provides environments, tasks, data, baseline algorithms, evaluation metrics and analysis tools for training and testing models for 2D and 3D RNA inverse design. The benchmark splits the overarching problem of RNA inverse design into three key tasks:
- Secondary Pseudoknot-Free Inverse Design: We aim to go from a 2D RNA structure, represented in dot-bracket notation, to a 1D RNA sequence. Pseudoknots, a complex structural motif, are not included, and most algorithms here consider only canonical base pairs (A-U and G-C).
- Secondary Pseudoknotted Inverse Design: We aim to go from a 2D RNA structure, which may or may not contain pseudoknots, to a 1D RNA sequence. The 2D structures here are also represented in dot-bracket notation, but in an extended version where the bracket type indicates the pseudoknot level. All pseudoknotted inverse design algorithms in this benchmark consider non-canonical base pairs when designing sequences.
- Tertiary Inverse Design: We aim to go from a 3D RNA structure to a 1D RNA sequence. Many 3D RNA structures consist of several RNA chains, so they must be split into single-chain RNAs before they can be fed into the inverse folding algorithms. Each 3D RNA structure is provided as a PDB file, which is then converted into tensors for the algorithms.
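To make the (extended) dot-bracket representation concrete, here is a minimal sketch, not RNAInvBench's own code, that parses a structure string into 0-based base pairs, detects crossing pairs (pseudoknots), checks whether a designed sequence pairs only canonically (A-U, G-C), and computes the base-pair distance between two structures. It assumes pseudoknot levels use the bracket types `()`, `[]`, `{}` and `<>`; some tools use letter pairs such as `Aa` for deeper levels, which this sketch rejects. All function names are hypothetical.

```python
# Bracket types of extended dot-bracket notation; each type marks a
# pseudoknot level ("()" = level 0, "[]" = level 1, ...). Assumed set.
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in BRACKETS.items()}
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def parse_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return the set of 0-based (i, j) base pairs, i < j, in a dot-bracket string."""
    stacks = {open_: [] for open_ in BRACKETS}  # one stack per bracket type
    pairs = set()
    for j, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(j)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return pairs


def has_pseudoknot(structure: str) -> bool:
    """True if any two base pairs cross (i < k < j < l)."""
    pairs = parse_dot_bracket(structure)
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


def is_canonical_design(sequence: str, structure: str) -> bool:
    """True if every target base pair is A-U or G-C in the designed sequence."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    seq = sequence.upper()
    return all((seq[i], seq[j]) in CANONICAL for i, j in parse_dot_bracket(structure))


def bp_distance(s1: str, s2: str) -> int:
    """Base-pair distance: number of pairs present in exactly one structure."""
    return len(parse_dot_bracket(s1) ^ parse_dot_bracket(s2))
```

For example, `has_pseudoknot("((..[[..))..]]")` is `True` because the `[]` pairs cross the `()` pairs. In practice a designed sequence is usually folded by a structure predictor (such as ViennaRNA's RNAfold) and the prediction is compared with the target; `bp_distance` is one simple way to make that comparison, and the benchmark's own metrics may differ.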
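The chain-splitting step for the tertiary task could look roughly like the standard-library sketch below. It is an illustration under stated assumptions rather than the benchmark's pipeline: it reads PDB-format text (not mmCIF), groups `ATOM`/`HETATM` records by the chain identifier in column 22, keeps only the first model, and drops water (`HOH`). The function names and the `<stem>_<chain>.pdb` output naming are hypothetical.

```python
from pathlib import Path


def split_pdb_chains(pdb_text: str) -> dict[str, list[str]]:
    """Group the coordinate records of the first model by chain ID."""
    chains: dict[str, list[str]] = {}
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # multi-model entries (e.g. NMR): keep the first model only
        if record not in ("ATOM", "HETATM") or len(line) < 22:
            continue
        if line[17:20].strip() == "HOH":
            continue  # skip water molecules
        # Column 22 (index 21) holds the chain identifier in the PDB format.
        chains.setdefault(line[21], []).append(line)
    return chains


def write_single_chain_files(pdb_path: str, out_dir: str) -> list[Path]:
    """Write one PDB file per chain, named <input stem>_<chain>.pdb (hypothetical)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(pdb_path).stem
    written = []
    for chain_id, lines in split_pdb_chains(Path(pdb_path).read_text()).items():
        # A blank chain ID is written as "_" so the file name stays valid.
        path = out / f"{stem}_{chain_id.strip() or '_'}.pdb"
        path.write_text("\n".join(lines) + "\nEND\n")
        written.append(path)
    return written
```

Each resulting single-chain file could then be featurized for a tertiary design model; how RNAInvBench converts structures into tensors is not shown here.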
The package contains the following algorithms:
- Secondary Pseudoknot-Free Inverse Design:
  - antaRNA
  - aRNAque
  - GREED-RNA
  - IncaRNAtion
  - LEARNA
  - LibLEARNA
  - MCTS (Monte Carlo Tree Search)
  - Meta-LEARNA
  - Meta-LEARNA-Adapt
  - OmniGenome-GA
  - RNAInverse
  - SAMFEO
  - SentRNA
- Secondary Pseudoknotted Inverse Design:
  - antaRNA
  - aRNAque
  - MCTS (Monte Carlo Tree Search)
- Tertiary Inverse Design:
  - gRNAde
  - RDesign
  - RiboDiffusion

Training datasets:

| Task | Dataset | Seq. & Struct. Count | Min Length (nt) | Max Length (nt) | Source |
|---|---|---|---|---|---|
| PK-free | Rfam-Learn-Train | 65000 | 50 | 450 | Runge2019 |
| PK-free | TR0-PKfree | 9815 | 33 | 498 | Singh2019 |
| PK-inc | TR0-PKinc | 999 | 37 | 496 | Singh2019 |
| PK-inc | Pseudobase++ | 251 | 21 | 137 | Kleinkauf2015 |
| Tertiary | RNAsolo | 4025 | 11 | 4455 | Joshi2023 |

Validation datasets:

| Task | Dataset | Seq. & Struct. Count | Min Length (nt) | Max Length (nt) | Source |
|---|---|---|---|---|---|
| PK-free | VL0-PKfree | 1193 | 33 | 447 | Singh2019 |
| PK-inc | VL0-PKinc | 107 | 53 | 497 | Singh2019 |
| Tertiary | RNAsolo | 100 | 11 | 714 | Joshi2023 |

Test datasets:

| Task | Dataset | Seq. & Struct. Count | Min Length (nt) | Max Length (nt) | Source |
|---|---|---|---|---|---|
| PK-free | Eterna100-v1 | 100 | 11 | 399 | Singh2019 |
| PK-free | Eterna100-v2 | 100 | 11 | 399 | Koodli2021 |
| PK-free | Rfam-Learn-Test | 100 | 50 | 446 | Runge2019 |
| PK-free | Rfam-Taneda | 29 | 54 | 451 | Taneda2012 |
| PK-free | Rfam-Test | 63 | 35 | 273 | Kleinkauf2015 |
| PK-free | RNA-Strand | 50 | 20 | 98 | Kleinkauf2015 |
| PK-free | TS0-PKfree | 1176 | 27 | 499 | Singh2019 |
| PK-inc | TS0-PKinc | 129 | 22 | 481 | Singh2019 |
| PK-inc | Pseudobase++ | 251 | 21 | 137 | Kleinkauf2015 |
| Tertiary | DAS-Split | 98 | 19 | 159 | Joshi2023 |
| Tertiary | structsim-v2-split | 51 | 21 | 373 | Joshi2023 |
| Tertiary | RNA-Puzzles | 94 | 6 | 256 | Magnus2020RNAPuzzles |
| Tertiary | CASP15 | 122 | 43 | 720 | Elofsson2023CASP15 |

The package contains Dockerfiles for each environment required by the algorithms. Please use these when working with the package!
We provide example code to run each algorithm; the example scripts follow the naming convention `[algorithm_name]_run.py`.
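Because every example script follows the `[algorithm_name]_run.py` convention, the available examples can be listed automatically. The sketch below is a hypothetical helper, not part of RNAInvBench; it assumes only that the scripts live somewhere under the project directory.

```python
from pathlib import Path

SUFFIX = "_run.py"  # from the naming convention [algorithm_name]_run.py


def find_example_scripts(root: str = ".") -> dict[str, Path]:
    """Map each algorithm name to its [algorithm_name]_run.py script under root."""
    scripts: dict[str, Path] = {}
    for path in sorted(Path(root).rglob(f"*{SUFFIX}")):
        # Strip the suffix to recover the algorithm name; later duplicates win.
        scripts[path.name[: -len(SUFFIX)]] = path
    return scripts
```

Calling `find_example_scripts("path/to/RNAInvBench")` returns a dictionary you can print or pass to your own launcher. Any script it finds still has to be run inside the matching Docker container or conda environment.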
There are two ways to run RNAInvBench. The recommended way is through Docker; alternatively, you can create your own environment with pyenv or conda.
1. Download the RNAInvBench source files.
2. Install Docker (if you have not done so already).
3. Navigate to the project directory.
4. Run `docker-compose up --build -d` to build the Docker images and start the containers.
Once the build has finished, you should see the new containers in your Docker application. You can then run the example `main.py` scripts provided with either of the following commands:

NOTE: `rl` refers to the Docker container (docker-compose service) to use; choose between `rl`, `optim` and `antarna`, and adjust the script path accordingly.
NOTE2: `rnainvbench-main-rl-1` is the name of the RL Docker container. Your name may differ, so check the Docker application for your own container name, or use the second command, which looks up the container ID from the service name.

```bash
docker exec -it rnainvbench-main-rl-1 python /app/rl_design_algos/main.py
docker exec -it $(docker-compose ps -q rl) python /app/rl_design_algos/main.py
```
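If you switch between the `rl`, `optim` and `antarna` containers often, the `docker exec` command can be wrapped in a small script. The sketch below is hypothetical (not shipped with RNAInvBench): it resolves the container ID with `docker-compose ps -q <service>` and then runs a Python script inside it. Only the `/app/rl_design_algos/main.py` path appears in this README, so the script path for the other containers is left to the caller.

```python
import subprocess

# Service names from the README notes; valid script paths per service are assumed
# to be supplied by the caller.
SERVICES = ("rl", "optim", "antarna")


def exec_command(container: str, script: str) -> list[str]:
    """Build the docker exec argv that runs a Python script inside a container."""
    return ["docker", "exec", "-it", container, "python", script]


def container_id(service: str) -> str:
    """Return the ID of the running container for a docker-compose service."""
    result = subprocess.run(
        ["docker-compose", "ps", "-q", service],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_in_service(service: str, script: str) -> int:
    """Run `script` in the container of `service` and return its exit code."""
    if service not in SERVICES:
        raise ValueError(f"unknown service {service!r}; expected one of {SERVICES}")
    cid = container_id(service)
    if not cid:
        raise RuntimeError(f"no running container for service {service!r}")
    # "-t" allocates a TTY, so call this from an interactive terminal.
    return subprocess.run(exec_command(cid, script)).returncode
```

For example, `run_in_service("rl", "/app/rl_design_algos/main.py")` is equivalent to the second command above when run from the project directory.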
Each of the `rl_design_algos`, `rna_design_algorithms` and `Tertiary_Design` folders contains an `environment.yml` file. To run the algorithms in a folder, you need to create the conda environment defined by that folder's `environment.yml`.
Run `conda env create -f environment.yml` from within the folder, then activate the environment with `conda activate` followed by the environment name given in the file.
Afterwards, run the `main.py` file provided in each folder.