COLA-Laboratory/RNAInvBench

RNAInvBench: Benchmarking RNA Inverse Folding with Secondary and Tertiary Structures

RNAInvBench is a Python framework that provides environments, tasks, data, baseline algorithms, evaluation metrics and analysis tools for training and testing models for 2D and 3D RNA Inverse Design. The benchmark splits the overarching problem of RNA Inverse Design into three key tasks:

  • Secondary Pseudoknot-Free Inverse Design: We aim to go from a 2D RNA structure, represented in dot-bracket notation, to a 1D RNA sequence. Pseudoknots, a more complex motif, are excluded, and most algorithms included here consider only canonical base pairings (A-U and G-C only).
  • Secondary Pseudoknotted Inverse Design: We aim to go from a 2D RNA structure that may or may not contain pseudoknots to a 1D RNA sequence. The 2D RNA structures here are also represented in dot-bracket notation, but in an extended version where the type of bracket corresponds to the level of the pseudoknot. All pseudoknotted inverse design algorithms within this benchmark consider non-canonical pairings in their sequence design.
  • Tertiary Inverse Design: We aim to go from a 3D RNA structure to a 1D RNA sequence. Many 3D RNA structures are made up of several RNA chains, so they must be split into single-chain RNAs before they can be fed into the inverse folding algorithms. Each 3D RNA structure is provided as a PDB file, which is then converted into tensors for the algorithms.
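To make the dot-bracket representation used by the two secondary-structure tasks concrete, here is a minimal Python sketch that turns a structure string into a list of base pairs and checks a candidate sequence against it. It is an illustration only and is not part of the RNAInvBench API: the function names `parse_dot_bracket` and `is_compatible` are hypothetical, the choice of `()`, `[]`, `{}` and `<>` as the pseudoknot levels is an assumption about the extended notation, and the default allowed pairs follow the A-U and G-C convention described above.

```python
# Hypothetical helpers for illustration; not part of RNAInvBench.
# Assumption: each bracket type marks one pseudoknot level.
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in BRACKETS.items()}

# Canonical pairs as described in the task list (A-U and G-C only).
CANONICAL = frozenset({"AU", "UA", "GC", "CG"})


def parse_dot_bracket(structure):
    """Return the sorted list of 0-based (i, j) base pairs in a structure."""
    # One stack per bracket type, so pairs at different levels may cross.
    stacks = {open_: [] for open_ in BRACKETS}
    pairs = []
    for pos, char in enumerate(structure):
        if char == ".":
            continue  # unpaired position
        if char in BRACKETS:
            stacks[char].append(pos)
        elif char in CLOSERS:
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        else:
            raise ValueError(f"unexpected {char!r} at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure, allowed=CANONICAL):
    """Check that every paired position holds an allowed nucleotide pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return all(
        sequence[i] + sequence[j] in allowed
        for i, j in parse_dot_bracket(structure)
    )


# A pseudoknot-free hairpin and a structure with one crossing helix.
print(parse_dot_bracket("((..))"))
print(parse_dot_bracket("((..[[..))..]]"))
print(is_compatible("GCAAGC", "((..))"))
```

Keeping one stack per bracket type is what lets the second example parse: the `[` `]` pairs cross the `(` `)` pairs, which a single stack would reject. Algorithms that allow non-canonical pairings could pass a larger `allowed` set.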

The package contains the following algorithms:

  • Secondary Pseudoknot-Free Inverse Design:
    • antaRNA
    • aRNAque
    • GREED-RNA
    • IncaRNAtion
    • LEARNA
    • LibLEARNA
    • MCTS (Monte Carlo Tree Search)
    • Meta-LEARNA
    • Meta-LEARNA-Adapt
    • OmniGenome-GA
    • RNAInverse
    • SAMFEO
    • SentRNA
  • Secondary Pseudoknotted Inverse Design:
    • antaRNA
    • aRNAque
    • MCTS (Monte Carlo Tree Search)
  • Tertiary Inverse Design:
    • gRNAde
    • RDesign
    • RiboDiffusion

Dataset Overview

Training Data

| Task | Dataset | Seq. & Struct. Count | Min Length | Max Length | Source |
| --- | --- | --- | --- | --- | --- |
| PK-free | Rfam-Learn-Train | 65000 | 50 | 450 | Runge2019 |
| PK-free | TR0-PKfree | 9815 | 33 | 498 | Singh2019 |
| PK-inc | TR0-PKinc | 999 | 37 | 496 | Singh2019 |
| PK-inc | Pseudobase++ | 251 | 21 | 137 | Kleinkauf2015 |
| Tertiary | RNAsolo | 4025 | 11 | 4455 | Joshi2023 |

Validation Data

| Task | Dataset | Seq. & Struct. Count | Min Length | Max Length | Source |
| --- | --- | --- | --- | --- | --- |
| PK-free | VL0-PKfree | 1193 | 33 | 447 | Singh2019 |
| PK-inc | VL0-PKinc | 107 | 53 | 497 | Singh2019 |
| Tertiary | RNAsolo | 100 | 11 | 714 | Joshi2023 |

Testing Data

| Task | Dataset | Seq. & Struct. Count | Min Length | Max Length | Source |
| --- | --- | --- | --- | --- | --- |
| PK-free | Eterna100-v1 | 100 | 11 | 399 | Singh2019 |
| PK-free | Eterna100-v2 | 100 | 11 | 399 | Koodli2021 |
| PK-free | Rfam-Learn-Test | 100 | 50 | 446 | Runge2019 |
| PK-free | Rfam-Taneda | 29 | 54 | 451 | Taneda2012 |
| PK-free | Rfam-Test | 63 | 35 | 273 | Kleinkauf2015 |
| PK-free | RNA-Strand | 50 | 20 | 98 | Kleinkauf2015 |
| PK-free | TS0-PKfree | 1176 | 27 | 499 | Singh2019 |
| PK-inc | TS0-PKinc | 129 | 22 | 481 | Singh2019 |
| PK-inc | Pseudobase++ | 251 | 21 | 137 | Kleinkauf2015 |
| Tertiary | DAS-Split | 98 | 19 | 159 | Joshi2023 |
| Tertiary | structsim-v2-split | 51 | 21 | 373 | Joshi2023 |
| Tertiary | RNA-Puzzles | 94 | 6 | 256 | Magnus2020RNAPuzzles |
| Tertiary | CASP15 | 122 | 43 | 720 | Elofsson2023CASP15 |

The package contains a Dockerfile for each environment required by the algorithms. Please make use of these when using the package!

We provide example code to run the algorithms. Each example script uses the following naming convention: [algorithm_name]_run.py

Installation Instructions

There are two ways to run RNAInvBench. The recommended way is through docker, but you can also create your own environment with pyenv or conda.

Docker Installation Instructions

1. Download the RNAInvBench source files.

2. Install docker (if not done so already).

3. Navigate to the project directory.

4. Run docker-compose up --build -d to build the docker container.

Once the docker container has been fully built, you should see a new container in your docker application. You can then use the following commands to run the example main.py scripts provided:

NOTE: rl is the name of the docker-compose service to use. You will need to choose between rl, optim and antarna.

NOTE2: rnainvbench-main-rl-1 is the name of the RL docker container. Your name may differ, so check the docker application for your own unique name. docker exec takes a single container reference followed by the command to run, so pass either the ID returned by docker-compose ps -q or the container name, not both.

docker exec -it $(docker-compose ps -q rl) python /app/rl_design_algos/main.py

or, using the container name directly:

docker exec -it rnainvbench-main-rl-1 python /app/rl_design_algos/main.py
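If you prefer to launch the example from Python rather than typing the command by hand, the two steps (look up the container ID for a service, then run the script inside it) can be sketched as below. This is a hedged sketch, not something shipped with RNAInvBench: the helper names are hypothetical, only the rl script path comes from this README, and the script paths for the optim and antarna services are not stated here, so you would need to supply them yourself.

```python
import subprocess

# Service names listed in the NOTE above.
SERVICES = ("rl", "optim", "antarna")


def build_ps_command(service):
    """Command that prints the container ID of a docker-compose service."""
    if service not in SERVICES:
        raise ValueError(f"unknown service {service!r}, expected one of {SERVICES}")
    return ["docker-compose", "ps", "-q", service]


def build_exec_command(container, script):
    """Command that runs a Python script inside the given container."""
    # docker exec expects exactly one container reference before the command.
    return ["docker", "exec", "-it", container, "python", script]


def run_example(service="rl", script="/app/rl_design_algos/main.py"):
    """Resolve the container for a service and run the script in it."""
    result = subprocess.run(
        build_ps_command(service), capture_output=True, text=True, check=True
    )
    container = result.stdout.strip()
    if not container:
        raise RuntimeError(f"no running container found for service {service!r}")
    return subprocess.run(build_exec_command(container, script), check=True)


if __name__ == "__main__":
    # Requires docker and a built RNAInvBench container; run from the project directory.
    run_example()
```

Building the argument lists separately from running them keeps the command easy to inspect before anything is executed.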

Conda Installation Instructions

Each of the rl_design_algos, rna_design_algorithms and Tertiary_Design folders provides an environment.yml file. To run the algorithms within a folder, you first need to create a conda environment from that folder's environment.yml file.

Run conda env create -f environment.yml inside the folder to create the environment.

After activating the environment, run the main.py file provided within that folder.
