# RNA Design with OmniGenomeModelForRNADesign


In this tutorial, we walk through how to set up and use the `OmniGenomeModelForRNADesign` class to design RNA sequences that fold into a target secondary structure. We cover the following topics:
1. Setting up the model
2. Running RNA design
3. Saving and loading results
4. Tuning the design parameters
5. Visualizing RNA structures
    

## Tutorial 1: Setting Up the OmniGenome Model for RNA Design

```python
# Install dependencies (run this if needed).
# The leading "!" is Jupyter syntax; drop it when running in a terminal.
!pip install OmniGenome torch transformers autocuda viennaRNA tqdm -U
```


```python
from omnigenome import OmniGenomeModelForRNADesign

# Initialize the RNA design model from a pre-trained checkpoint
model = OmniGenomeModelForRNADesign(model_path="anonymous8/OmniGenome-186M")
```


### Explanation
- **model_path**: Local path or Hugging Face Hub ID of the pre-trained model used for RNA design. This tutorial uses `"anonymous8/OmniGenome-186M"`.
    

## Tutorial 2: Running RNA Sequence Design

```python
# Define the target RNA structure in dot-bracket notation
structure = "(((....)))"  # a simple hairpin: 3 base pairs closing a 4-nt loop

# Run the genetic algorithm to design RNA sequences
best_sequences = model.run_rna_design(
    structure=structure,
    mutation_ratio=0.5,
    num_population=100,
    num_generation=100,
)

# Print the best sequence(s)
print("Best RNA sequences:", best_sequences)
```

```
Best RNA sequences: ['GCTGCTGGGC', 'GCTGTGGGGC', 'GCCAGCTGGC', 'GCTCTGGAGC', 'GCTGATGGGC', 'GGTGGCAGCC', 'GCCAAAGGGC', 'GCTGGAGGGC', 'GCCAAAGGGC', 'CGGATTCCCG', 'GCTCTCAAGC', 'GCTGTGGGGC', 'GGGCTTTCCC', 'GCTCAAGGGC', 'GCGCGCGCGC', 'CGCCTCGGCG', 'GCTGAGAGGC', 'GCTGCAGGGC', 'GCTGAAGGGC', 'GGCGAGGGCC', 'GCTAGGAGGC', 'GGGCTTGCCC', 'GGGATGGCCC', 'GCTGCCAAGC', 'GGCGAGGGCC', 'GCTGGCGGGC', 'GCCTTTTGGC', 'GGTGAAGGCC', 'GGCGGCGGCC', 'GCGGCTGCGC', 'GCTGCATGGC', 'GCTGTGGGGC', 'CGCGCGGGCG', 'GGTGCCCGCC', 'TGGAACCCCA', 'GCCCATGGGC', 'CCGAAGCCGG', 'GGGGGGGCCC', 'GCTGCATAGC', 'GCCCTCTGGC', 'GCCGCGGGGC', 'GCTACATGGC', 'GCGGGAGCGC', 'GGTGGCTGCC', 'GCCGTGGGGC', 'GCGCCCCCGC', 'GGTGTCAGCC', 'GGTGTGGGCC', 'GCTCCCGGGC', 'GCTGAGGAGC', 'GCTGCTGGGC', 'GGCCTTCGCC', 'GCGCCCCCGC', 'GCCCTTGGGC', 'GCCGTGGGGC', 'GGCGGCGGCC', 'CGTGCTGACG', 'CCTGAGGAGG', 'GCTACTTGGC', 'TGCGAGGGCA', 'GGCAAAGGCC', 'GCTGAAGAGC', 'CGGCTTGCCG', 'GGGCTTGCCC', 'GCTGAAGAGC', 'GCTGAAGGGC', 'GCCAGTGGGC', 'GGCGCGGGCC', 'GCGGAGGCGC', 'CCTGAGGGGG', ...
```


In this tutorial, we:
- Defined the RNA structure
- Ran the genetic algorithm for RNA design
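
The returned designs use T rather than U, and the printed list alone does not show whether each one can form the target base pairs. A minimal sketch, not an OmniGenome API, is to parse the dot-bracket target into base pairs and check that every paired position holds a canonical Watson-Crick or G-U wobble pair. `pair_table` and `is_compatible` are hypothetical helper names.

```python
# Minimal sketch: check designed sequences against a dot-bracket target.
# pair_table and is_compatible are illustrative helpers, not OmniGenome functions.

CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner; raise on malformed input."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every base pair in the structure is canonical in the sequence."""
    if len(sequence) != len(structure):
        return False
    rna = sequence.upper().replace("T", "U")  # treat the DNA-style T as U
    pairs = pair_table(structure)
    return all((rna[i], rna[j]) in CANONICAL_PAIRS for i, j in pairs.items() if i < j)


structure = "(((....)))"
# Three designs from the output above, plus a deliberately incompatible sequence.
for seq in ["GCTGCTGGGC", "TGGAACCCCA", "GCGCGCGCGC", "AAAAAAAAAA"]:
    print(seq, is_compatible(seq, structure))
```

Compatibility is only a necessary condition: a sequence can have canonical pairs at every target position and still fold into a different structure, which is what the RNAfold check in Tutorial 5 addresses.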
    

## Tutorial 3: Saving and Loading Designed RNA Sequences

```python
import json

# Save the target structure and the best sequences to a JSON file
output_file = "best_rna_sequences.json"
with open(output_file, "w") as f:
    json.dump({"structure": structure, "best_sequences": best_sequences}, f)

print(f"Best sequences saved to {output_file}")
```

```
Best sequences saved to best_rna_sequences.json
```


```python
# Load the sequences from the saved file
with open(output_file, "r") as f:
    loaded_data = json.load(f)

print("Loaded RNA structure:", loaded_data["structure"])
print("Loaded best sequences:", loaded_data["best_sequences"])
```

```
Loaded RNA structure: (((....)))
Loaded best sequences: ['GCTGCTGGGC', 'GCTGTGGGGC', 'GCCAGCTGGC', 'GCTCTGGAGC', 'GCTGATGGGC', 'GGTGGCAGCC', 'GCCAAAGGGC', 'GCTGGAGGGC', 'GCCAAAGGGC', 'CGGATTCCCG', 'GCTCTCAAGC', 'GCTGTGGGGC', 'GGGCTTTCCC', 'GCTCAAGGGC', 'GCGCGCGCGC', 'CGCCTCGGCG', 'GCTGAGAGGC', 'GCTGCAGGGC', 'GCTGAAGGGC', 'GGCGAGGGCC', 'GCTAGGAGGC', 'GGGCTTGCCC', 'GGGATGGCCC', 'GCTGCCAAGC', 'GGCGAGGGCC', 'GCTGGCGGGC', 'GCCTTTTGGC', 'GGTGAAGGCC', 'GGCGGCGGCC', 'GCGGCTGCGC', 'GCTGCATGGC', 'GCTGTGGGGC', 'CGCGCGGGCG', 'GGTGCCCGCC', 'TGGAACCCCA', 'GCCCATGGGC', 'CCGAAGCCGG', 'GGGGGGGCCC', 'GCTGCATAGC', 'GCCCTCTGGC', 'GCCGCGGGGC', 'GCTACATGGC', 'GCGGGAGCGC', 'GGTGGCTGCC', 'GCCGTGGGGC', 'GCGCCCCCGC', 'GGTGTCAGCC', 'GGTGTGGGCC', 'GCTCCCGGGC', 'GCTGAGGAGC', 'GCTGCTGGGC', 'GGCCTTCGCC', 'GCGCCCCCGC', 'GCCCTTGGGC', 'GCCGTGGGGC', 'GGCGGCGGCC', 'CGTGCTGACG', 'CCTGAGGAGG', 'GCTACTTGGC', 'TGCGAGGGCA', 'GGCAAAGGCC', 'GCTGAAGAGC', 'CGGCTTGCCG', 'GGGCTTGCCC', 'GCTGAAGAGC', 'GCTGAAGGGC', 'GCCAGTGGGC', ...
```
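
JSON preserves the Python list exactly. If you plan to pass the designs to command-line tools such as RNAfold, which read FASTA input, a FASTA file can be more convenient. The sketch below is an assumption-based example, not part of OmniGenome: it drops duplicate designs (the list above repeats entries such as `'GCCAAAGGGC'`), converts T to U, and writes hypothetical headers `design_1`, `design_2`, and so on.

```python
from pathlib import Path


def write_fasta(sequences, path, prefix="design"):
    """Write unique sequences (first occurrence kept, T converted to U) as FASTA.

    Returns the number of records written. The header scheme is hypothetical.
    """
    unique = list(dict.fromkeys(s.upper().replace("T", "U") for s in sequences))
    lines = []
    for k, seq in enumerate(unique, start=1):
        lines.append(f">{prefix}_{k}")
        lines.append(seq)
    Path(path).write_text("\n".join(lines) + "\n")
    return len(unique)


def read_fasta(path):
    """Return {header: sequence} from a simple FASTA file."""
    records, header = {}, None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif line and header is not None:
            records[header] += line
    return records


# A few designs from the run above, with a duplicate included on purpose.
designs = ["GCCAAAGGGC", "GCTGGAGGGC", "GCCAAAGGGC", "CGGATTCCCG"]
n = write_fasta(designs, "best_rna_sequences.fasta")
print(n, read_fasta("best_rna_sequences.fasta"))
```

In the notebook you would call `write_fasta(loaded_data["best_sequences"], "best_rna_sequences.fasta")`. RNAfold can then read the file directly (`RNAfold < best_rna_sequences.fasta`) and uses each header to name its output.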

## Tutorial 4: Tuning Parameters for Better RNA Sequence Design
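
`run_rna_design` exposes three genetic-algorithm settings, but this tutorial does not describe how OmniGenome uses them internally. To build intuition, here is a toy GA written from scratch under explicitly assumed rules: each base is resampled with probability `mutation_ratio`, the population holds `num_population` candidates, selection keeps the better half, and the loop runs `num_generation` times. The fitness (fraction of target base pairs that are canonical) is a deliberately crude stand-in. OmniGenome's real operators and scoring are not shown here, and every name in the block is hypothetical.

```python
import random

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def target_pairs(structure):
    """Return (i, j) index pairs for a balanced dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def fitness(seq, pairs):
    """Toy score: fraction of target pairs that are canonical in seq."""
    if not pairs:
        return 1.0
    return sum((seq[i], seq[j]) in CANONICAL for i, j in pairs) / len(pairs)


def mutate(seq, mutation_ratio, rng):
    """Resample each base independently with probability mutation_ratio."""
    return "".join(rng.choice("ACGU") if rng.random() < mutation_ratio else b for b in seq)


def toy_ga(structure, mutation_ratio=0.5, num_population=100, num_generation=100, seed=0):
    """Hypothetical GA loop; returns the final population sorted best-first."""
    if num_population < 2:
        raise ValueError("num_population must be at least 2")
    rng = random.Random(seed)
    pairs = target_pairs(structure)
    n = len(structure)
    population = ["".join(rng.choice("ACGU") for _ in range(n)) for _ in range(num_population)]
    for _ in range(num_generation):
        # Selection: rank by fitness and keep the better half as parents (elitism).
        population.sort(key=lambda s: fitness(s, pairs), reverse=True)
        parents = population[: max(2, num_population // 2)]
        # Variation: one-point crossover of two random parents, then mutation.
        children = []
        while len(parents) + len(children) < num_population:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(n + 1)
            children.append(mutate(a[:cut] + b[cut:], mutation_ratio, rng))
        population = parents + children
    population.sort(key=lambda s: fitness(s, pairs), reverse=True)
    return population


best = toy_ga("(((....)))", mutation_ratio=0.5, num_population=50, num_generation=30)
print(best[:5], fitness(best[0], target_pairs("(((....)))")))
```

In this toy, a higher `mutation_ratio` injects more randomness into each child (more exploration, but good pairs are broken more often), a larger `num_population` samples more candidates per generation, and more generations give selection more rounds to act. The last two both add work, since each generation scores about `num_population` candidates.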

```python
# Run the design with a higher mutation ratio
best_sequences = model.run_rna_design(structure=structure, mutation_ratio=0.7, num_population=100, num_generation=100)
print("Best RNA sequences with higher mutation:", best_sequences)
```

```python
# Run the design with a larger population size
best_sequences = model.run_rna_design(structure=structure, mutation_ratio=0.5, num_population=200, num_generation=100)
print("Best RNA sequences with larger population:", best_sequences)
```

```python
# Run the design for more generations
best_sequences = model.run_rna_design(structure=structure, mutation_ratio=0.5, num_population=100, num_generation=200)
print("Best RNA sequences with more generations:", best_sequences)
```
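
Rather than editing and rerunning cells by hand, you can loop over a small grid of settings. This is a sketch: `sweep` and `fake_design` are hypothetical names, and `fake_design` is a stand-in that returns fixed sequences so the cell runs without the model. Counting unique sequences is only one possible, rough summary.

```python
from itertools import product


def sweep(design_fn, structure, mutation_ratios, populations, generations):
    """Run design_fn for every parameter combination and summarise each result.

    design_fn must accept the keyword arguments used by model.run_rna_design
    in this tutorial and return a list of sequences.
    """
    results = {}
    for ratio, pop, gens in product(mutation_ratios, populations, generations):
        seqs = design_fn(
            structure=structure,
            mutation_ratio=ratio,
            num_population=pop,
            num_generation=gens,
        )
        results[(ratio, pop, gens)] = {
            "num_returned": len(seqs),
            "num_unique": len(set(seqs)),  # the outputs above contain duplicates
            "sequences": seqs,
        }
    return results


# Hypothetical stand-in so this cell runs without the model;
# in the notebook, pass model.run_rna_design instead.
def fake_design(structure, mutation_ratio, num_population, num_generation):
    return ["GCTGCTGGGC", "GCTGCTGGGC", "GCCAAAGGGC"]


results = sweep(fake_design, "(((....)))", [0.5, 0.7], [100, 200], [100])
for params, summary in results.items():
    print(params, summary["num_returned"], summary["num_unique"])
```

With the real model, `sweep(model.run_rna_design, structure, [0.5, 0.7], [100, 200], [100, 200])` would run eight designs, and the larger settings will likely take noticeably longer. To compare quality rather than diversity, score each design with a folding tool such as RNAfold.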


## Tutorial 5: Visualizing the RNA Structure


You can check and visualize the secondary structure of a designed sequence with external tools such as RNAfold from the ViennaRNA package.

### Step 1: Install RNAfold
To install RNAfold, you can use the following command (if on Ubuntu):

```bash
sudo apt-get install vienna-rna
```
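
Before moving on, you can confirm from Python which ViennaRNA pieces are available. The install cell at the top of this tutorial also installs the `viennaRNA` package from PyPI, which provides Python bindings imported as `RNA`, so you may have those even without the system package. This minimal check uses only the standard library; `viennarna_status` is a hypothetical helper name.

```python
import importlib.util
import shutil


def viennarna_status():
    """Report whether the RNAfold CLI and the ViennaRNA Python bindings are available."""
    return {
        "RNAfold_cli": shutil.which("RNAfold"),  # full path, or None if not on PATH
        "python_bindings": importlib.util.find_spec("RNA") is not None,  # module "RNA"
    }


print(viennarna_status())
```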

### Step 2: Visualizing the Designed RNA
After obtaining your RNA sequence, you can visualize its secondary structure using RNAfold:

```bash
echo "GCGCUACGUCGCGAU" | RNAfold
```

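RNAfold prints the input sequence, then the predicted minimum free energy (MFE) structure in dot-bracket notation followed by its free energy in kcal/mol. By default it also writes a drawing of the structure to `rna.ps` in the current directory; pass `--noPS` to skip the drawing.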
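
Checking designs one `echo` at a time gets tedious. The sketch below calls the RNAfold executable from Python via `subprocess` and compares the predicted MFE structure with the target. It assumes `RNAfold` is on your PATH and that each input is a bare sequence without a FASTA header, so the last output line is the structure and energy. The helper names `parse_rnafold_line`, `rnafold`, and `matches_target` are hypothetical.

```python
import re
import subprocess

# Matches the structure line RNAfold prints after the sequence: "<dot-bracket> (<energy>)".
_MFE_LINE = re.compile(r"^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold_line(line):
    """Split an RNAfold structure line into (dot_bracket, energy_kcal_per_mol)."""
    match = _MFE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised RNAfold line: {line!r}")
    return match.group(1), float(match.group(2))


def rnafold(sequence):
    """Fold one sequence with the RNAfold CLI (T converted to U, no PostScript output)."""
    rna = sequence.upper().replace("T", "U")
    out = subprocess.run(
        ["RNAfold", "--noPS"],
        input=rna + "\n",
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_rnafold_line(out.strip().splitlines()[-1])


def matches_target(sequence, target):
    """True if the predicted MFE structure equals the target dot-bracket string."""
    predicted, _energy = rnafold(sequence)
    return predicted == target
```

For example, `[s for s in best_sequences if matches_target(s, structure)]` keeps only designs whose predicted MFE structure is exactly `(((....)))`. This is stricter than checking that each target pair is canonical, because a sequence can satisfy that and still fold into a different structure.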


## Conclusion


By following these tutorials, you can:
- Set up and initialize `OmniGenomeModelForRNADesign` for RNA sequence design.
- Run RNA sequence design with a genetic algorithm.
- Tune the parameters to optimize the design process.
- Save and load results.
- Visualize the RNA secondary structure using RNAfold.

Explore more advanced configurations and tweak parameters for better results!
