RIdiffusion is a hyperbolic discrete diffusion model designed for 3D RNA inverse folding. Unlike traditional RNA design models that primarily focus on secondary structures, RIdiffusion integrates geometric deep learning and hyperbolic embedding to efficiently generate RNA sequences that fold into complex 3D structures.
- Hyperbolic Space Representation: Captures intricate RNA 3D structures more effectively than Euclidean methods.
- Discrete Diffusion Process: Enhances sequence recovery accuracy for inverse folding.
- Generative Model: Outperforms existing state-of-the-art (SOTA) approaches in low-data scenarios.
- Optimized for Functional RNA Design: Enables novel sequence generation for biotechnological and biomedical applications.
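To give some intuition for the hyperbolic representation mentioned above, here is a minimal sketch of the geodesic distance in the Poincaré ball, one common model of hyperbolic space. It is illustrative only: this README does not say which hyperbolic model or library RIdiffusion uses, and the function names below are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def euclidean_distance(u, v):
    """Ordinary straight-line distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two pairs of points with (almost exactly) the same Euclidean separation.
near_origin = ((0.0, 0.0, 0.0), (0.1, 0.0, 0.0))
near_boundary = ((0.85, 0.0, 0.0), (0.95, 0.0, 0.0))

for label, (u, v) in (("near origin", near_origin), ("near boundary", near_boundary)):
    print(f"{label}: euclidean={euclidean_distance(u, v):.4f} "
          f"hyperbolic={poincare_distance(u, v):.4f}")
```

The same Euclidean gap corresponds to a much larger hyperbolic distance near the boundary of the ball. This expanding volume is the property that makes hyperbolic embeddings attractive for hierarchical or tree-like structure.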
This repository provides a demo workflow for setting up the environment, preprocessing datasets, and running RIdiffusion to generate RNA sequences from RNA 3D backbone structures.
Below is an overview of the RIdiffusion architecture: 
Ensure you have Python 3.9 or later installed. The main dependencies include:
torch==2.5.0
torch-geometric==2.6.1
torch-cluster==1.6.3
torch-scatter==2.1.2
torch-sparse==0.6.18
torch-spline-conv==1.2.2
To set up your environment, use the following commands:
# 1. Create and activate a Conda environment
conda create --name RIdiffusion python=3.10 -y
conda activate RIdiffusion
# 2. Install PyTorch (adjust versions according to your CUDA compatibility)
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu118
# 3. Install PyTorch Geometric dependencies (compatible with PyTorch 2.5.0 & CUDA 11.8)
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://pytorch-geometric.com/whl/torch-2.5.0+cu118.html
# 4. Install PyTorch Geometric
pip install torch-geometric
# 5. Install additional required libraries
pip install ema-pytorch pandas matplotlib einops seaborn biopython rdkit scikit-learn
Note: Modify the dependency versions based on your CUDA and PyTorch setup.
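After installation, it can be useful to confirm that the environment matches the pinned versions listed earlier. The following is a small sketch using only the standard library. The helper name `check_versions` is hypothetical and not part of this repository, and distribution names are assumed to match the pip names used above.

```python
from importlib import metadata

# Pins copied from the dependency list in this README.
EXPECTED = {
    "torch": "2.5.0",
    "torch-geometric": "2.6.1",
    "torch-cluster": "1.6.3",
    "torch-scatter": "2.1.2",
    "torch-sparse": "0.6.18",
    "torch-spline-conv": "1.2.2",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, expected_version, ok)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Wheels may carry a local build tag such as "+cu118";
        # compare only the part before the "+".
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, wanted, ok)
    return report


if __name__ == "__main__":
    for name, (found, wanted, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:20s} installed={found} expected={wanted} [{status}]")
```

A mismatch is not necessarily an error, since the versions may need to change for a different CUDA or PyTorch setup.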
Move to the dataset source folder:
cd ./dataset_src/
Download the dataset from Google Drive and place it in ./dataset_src/.
tar -zxvf preprocessed_dataset.tar.gz
Run the following command to generate graph files, which will be stored in graph_dataset/:
cd ..
python generate_graph_ss.py
To generate RNA sequences based on demo PDB structures in ./input_pdb/, run:
python seq_generator.py --pdb_dir ./input_pdb/demo
Or, to use RNA targets from CASP15, run:
python seq_generator.py --pdb_dir ./input_pdb/CASP15_structure
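To run both input sets in one go, the two commands above can be wrapped in a short driver script. This is a sketch that uses only the `--pdb_dir` flag documented here. The helper names `build_command` and `run_all` are hypothetical, and the script is assumed to be launched from the repository root.

```python
import subprocess
import sys


def build_command(pdb_dir, script="seq_generator.py"):
    """Build the argv list for one run, using only the documented --pdb_dir flag."""
    return [sys.executable, script, "--pdb_dir", str(pdb_dir)]


def run_all(pdb_dirs, dry_run=True):
    """Build one command per directory; execute them only when dry_run is False."""
    commands = [build_command(d) for d in pdb_dirs]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing run instead of continuing silently.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: print the commands without executing them.
    for cmd in run_all(["./input_pdb/demo", "./input_pdb/CASP15_structure"]):
        print(" ".join(cmd))
```

Pass `dry_run=False` to `run_all` to launch the runs.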
This will generate structured sequence outputs as described in the original manuscript.
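Generated sequences are often compared against the native sequence using sequence recovery, commonly defined as the fraction of positions where the two agree. The sketch below assumes both sequences are available as plain strings of equal length. The output format of `seq_generator.py` is not described here, so parsing is left out, and the function name is hypothetical.

```python
def sequence_recovery(predicted, native):
    """Fraction of positions where the predicted nucleotide matches the native one."""
    if len(predicted) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Compare case-insensitively so "acgu" and "ACGU" count as identical.
    matches = sum(p == n for p, n in zip(predicted.upper(), native.upper()))
    return matches / len(native)


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not model output.
    print(sequence_recovery("GGCUAGCC", "GGCAAGCU"))
```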