DasLab/zetafold

ZetaFold

(C) R. Das & Das laboratory, Stanford University 2018-2019

What this is

A Python package for modeling the statistical physics of RNA folding at the secondary structure level.

Goals:

  • Code that is easy to read so that humans can easily extend it to model new RNA physics
  • Code with numerous tests built in so that extensions are correct
  • A package that can learn from the huge data sets our lab is collecting.

A separate C++ package, zetafoldplus, with the same functionality, matching Python bindings, and likely up to 100x the speed, is being developed in a private repository.

Features

This code brings together features pioneered in (but scattered across) prior packages:

  • Multi-strand calculations
  • Circular RNAs
  • Co-axial stacking
  • True partition function calculations in N^3 time
  • Base pair probability estimates
  • Gradients of predicted observables with respect to energy model parameters, to enable learning from data
  • Enumerative backtracking to get all structures and their Boltzmann weights
  • Stochastic backtracking to get Boltzmann-sampled structures
  • Minimum free energy structures
  • Rapid calculation of gradients (mostly N^2) to enable efficient learning from large data sets
  • Modeling of ligand/protein binding to RNA hairpins and internal loops (coming soon)
  • Modeling of protein binding to RNA single-stranded segments (coming soon)
  • Generalized base pairs (e.g., both Watson-Crick and Sugar/Hoogsteen G-A pairs) (coming soon)
  • 'Classic' Turner2004 & ContraFold parameters (coming soon)
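To make a few of the items above concrete (partition function in N^3 time, enumeration of all structures with their Boltzmann weights, base pair probabilities, and a minimum free energy structure), here is a minimal sketch on a toy energy model. It is not ZetaFold's energy model or API: the constants `KT`, `PAIR_ENERGY`, and `MIN_LOOP`, and every function name, are hypothetical and chosen only for illustration. Each base pair simply contributes the same Boltzmann weight, and only nested (pseudoknot-free) structures are counted.

```python
import math

KT = 0.6163          # kcal/mol, roughly RT at 37 C
PAIR_ENERGY = -1.0   # hypothetical free energy per base pair (kcal/mol)
MIN_LOOP = 3         # hypothetical minimum number of unpaired bases in a hairpin
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
PAIR_WEIGHT = math.exp(-PAIR_ENERGY / KT)


def can_pair(seq, i, j):
    """True if positions i < j may pair in this toy model."""
    return j - i > MIN_LOOP and (seq[i], seq[j]) in PAIRS


def partition_function(seq):
    """O(N^3) dynamic programming over half-open intervals [i, j)."""
    n = len(seq)
    # Z[i][j] is the partition function of seq[i:j]; empty intervals have Z = 1.
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            last = i + length - 1
            total = Z[i][last]                  # last base unpaired
            for k in range(i, last):            # last base paired with k
                if can_pair(seq, k, last):
                    total += Z[i][k] * PAIR_WEIGHT * Z[k + 1][last]
            Z[i][i + length] = total
    return Z[0][n]


def enumerate_structures(seq, i=0, j=None):
    """Yield each nested structure on seq[i:j] exactly once, as a tuple of pairs."""
    if j is None:
        j = len(seq)
    if j <= i:
        yield ()
        return
    last = j - 1
    yield from enumerate_structures(seq, i, last)
    for k in range(i, last):
        if can_pair(seq, k, last):
            for left in enumerate_structures(seq, i, k):
                for inner in enumerate_structures(seq, k + 1, last):
                    yield left + inner + ((k, last),)


def enumerate_ensemble(seq):
    """Brute-force Z, base pair probabilities, and a lowest-energy structure."""
    structures = list(enumerate_structures(seq))
    weights = [PAIR_WEIGHT ** len(s) for s in structures]
    Z = sum(weights)
    bpp = {}
    for s, w in zip(structures, weights):
        for pair in s:
            bpp[pair] = bpp.get(pair, 0.0) + w / Z
    mfe = max(structures, key=len)  # most pairs = lowest energy in this toy model
    return Z, bpp, mfe


if __name__ == "__main__":
    seq = "GGGAAAUCCC"
    Z_enum, bpp, mfe = enumerate_ensemble(seq)
    print("Z (dynamic programming):", partition_function(seq))
    print("Z (enumeration):        ", Z_enum)
    print("lowest-energy structure:", sorted(mfe))
```

The enumeration is exponential in sequence length and is only usable on short sequences, but agreement between it and the dynamic programming result is a cheap correctness test for the recursion.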

This code also presents entirely new features, based on recent theoretical insights from R. Das & laboratory:

  • Cross-checks based on computation of the partition function N different ways for each RNA.
  • Linear motifs identified by Rosetta or by crystallography as having favorable energy bonuses (coming soon)
  • Loop penalties that rise like the logarithm of the number of loop nucleotides, still in N^3 time (coming soon)
  • Parameters for chemically modified bases, and some modified backbones, based on Rosetta calculations (coming soon)
  • Modeling of protein binding to RNA, including proper steric exclusion effects. (coming soon)
  • Modeling of RNA tertiary contacts, through a novel iterative sampling method, Rosetta-calculated properties of the contacts, and efficient C_eff calculations. (coming soon)
  • Tracking and propagation of estimated model uncertainties. (coming soon)
  • Easy install through sudo pip (coming soon)
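One way to picture the first item (computing the partition function N different ways) is to start the recursion at each of the N positions of the sequence and require that all N answers agree. The sketch below illustrates that idea on a toy model and is an assumption about the flavor of the cross-check, not a description of ZetaFold's implementation. To make rotation invariance hold exactly, the toy model treats the RNA as circular, gives every base pair the same hypothetical weight, and imposes no minimum loop length. All names and constants are hypothetical.

```python
import math

KT = 0.6163          # kcal/mol, roughly RT at 37 C
PAIR_ENERGY = -1.0   # hypothetical free energy per base pair (kcal/mol)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
PAIR_WEIGHT = math.exp(-PAIR_ENERGY / KT)


def partition_function(seq):
    """O(N^3) partition function over nested structures, no loop-size limit."""
    n = len(seq)
    # Z[i][j] is the partition function of seq[i:j]; empty intervals have Z = 1.
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            last = i + length - 1
            total = Z[i][last]                  # last base unpaired
            for k in range(i, last):            # last base paired with k
                if (seq[k], seq[last]) in PAIRS:
                    total += Z[i][k] * PAIR_WEIGHT * Z[k + 1][last]
            Z[i][i + length] = total
    return Z[0][n]


def rotation_partition_functions(seq):
    """Compute Z once per starting position of the (circular) sequence."""
    return [partition_function(seq[i:] + seq[:i]) for i in range(len(seq))]


def cross_check(seq, rel_tol=1e-9):
    """True if all N rotations give the same partition function."""
    values = rotation_partition_functions(seq)
    return all(math.isclose(v, values[0], rel_tol=rel_tol) for v in values)


if __name__ == "__main__":
    print(cross_check("GCAACGCGAAGC"))
```

The check works here because nested pairings drawn as chords on a circle stay non-crossing under rotation, so each rotation sums over the same set of structures in a different order. A bug in the recursion indices typically breaks this agreement.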

License

This code is released under the MIT license, so you can distribute it with your own code.

Getting started

Clone this repository, and just type:

./zetafold.py

to run tests on a bunch of example sequences.

To run on tRNA(phe) from yeast and get a (pseudo)MFE structure:

./zetafold.py -s GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA --mfe

To get base pair probabilities for tRNA(phe) from yeast (takes about 2x the computation):

./zetafold.py -s GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA --bpp

To circularize:

./zetafold.py -s GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA --circle

To run on a multi-strand system, type:

./zetafold.py -s GCAACG CGAAGC

To re-run tRNA as a totally weird circular permutation:

./zetafold.py -s UGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGAC --circle

This should give the same answer as the linear case above!
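The reason the answers should match is that the two strands in the last command are the linear tRNA sequence cut at one position and listed in swapped order, so closing the circle rejoins the cut and leaves the strand break where the original 5' and 3' ends were. The short sketch below checks that relationship using only string operations. The function names are hypothetical and nothing here calls ZetaFold itself.

```python
TRNA = ("GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCC"
        "ACAGAAUUCGCACCA")
STRANDS = ["UGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA",
           "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGAC"]


def is_circular_permutation(strands, reference):
    """True if the concatenated strands are a rotation of the reference."""
    joined = "".join(strands)
    return len(joined) == len(reference) and joined in reference + reference


def rotation_offset(strands, reference):
    """Index in the reference where the first strand begins, or -1 if none."""
    if not is_circular_permutation(strands, reference):
        return -1
    return (reference + reference).find("".join(strands))


if __name__ == "__main__":
    print(is_circular_permutation(STRANDS, TRNA))
    print(rotation_offset(STRANDS, TRNA))
```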

Contributing

More information on making contributions coming soon.
