# Predicting RNA secondary structure with LinearRNA

LinearRNA is a suite of linear-time algorithms and software for RNA secondary structure analysis. It includes **LinearFold** and **LinearPartition**.

# Install dependency

In [3]:
!pip install paddlehelix
from IPython.display import clear_output
clear_output()
print("Installed successfully")

Installed successfully


# Part I: LinearFold

**LinearFold** is the first linear-time prediction algorithm and software for RNA secondary structures.
The LinearFold paper was accepted at ISMB, a leading conference in computational biology, and published in the journal Bioinformatics: [LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search](https://academic.oup.com/bioinformatics/article/35/14/i295/5529205)

## Machine learning model
```python
linear_fold_c(rna_sequence, beam_size = 100, use_constraints = False, constraint = "", no_sharp_turn = True)
```
This function uses the machine learning model proposed by [CONTRAfold](https://pubmed.ncbi.nlm.nih.gov/16873527/).

### Parameter settings
- rna_sequence: string, the input RNA sequence whose secondary structure is predicted.
- beam_size: int (default 100), the beam size; set it to 0 to turn off beam pruning.
- use_constraints: bool (default False), whether to apply structural constraints during prediction.
- constraint: string (default ""), the constraint sequence. It takes effect only when use_constraints is True. The constraint sequence must have the same length as the RNA sequence. Each character constrains one position: "?" means the pairing is unknown, "." means the position is unpaired, and "(" and ")" mark the left and right ends of a base pair. The parentheses must be well-balanced and non-crossing.
- no_sharp_turn: bool (default True), disable sharp turns in the prediction.
### Return Value
- tuple(string, float): a tuple containing the predicted structure and its folding free energy.
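
Because a malformed constraint string is easy to produce by hand, it can help to check it before calling the folding function. The helper below is a minimal sketch of the rules listed above (same length as the sequence, only `? . ( )` characters, balanced parentheses). The name `validate_constraint` is hypothetical and is not part of `pahelix`; the library may perform its own checks.

```python
def validate_constraint(rna_sequence, constraint):
    """Return a list of problems found in a constraint string (empty if none).

    Hypothetical helper for illustration; not part of pahelix.
    """
    problems = []

    # Rule 1: the constraint must be as long as the sequence.
    if len(constraint) != len(rna_sequence):
        problems.append(
            f"length mismatch: sequence has {len(rna_sequence)} characters, "
            f"constraint has {len(constraint)}"
        )

    # Rule 2: only '?', '.', '(' and ')' are allowed.
    unsupported = sorted(set(constraint) - set("?.()"))
    if unsupported:
        problems.append(f"unsupported characters: {unsupported}")

    # Rule 3: parentheses must be balanced. With a single bracket type,
    # balanced parentheses are automatically non-crossing.
    depth = 0
    for position, symbol in enumerate(constraint):
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at index {position}")
                break
    if depth > 0:
        problems.append(f"{depth} unmatched '('")

    return problems


print(validate_constraint("GGGAAACCC", "(??...??)"))  # prints []
```

An empty list means the string satisfies the three rules; any other result describes what to fix before passing the string as `constraint`.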



In [4]:
import pahelix.toolkit.linear_rna as linear_rna
input_sequence = "AACUCCGCCAGGCCUGGAAGGGAGCAACGGUAGUGACACUCUCUGUGUGCGUAGGUUGCCUAGCUACCAUUU"
linear_rna.linear_fold_c(input_sequence)

('..((((.(((....)))...))))....((((((............................))))))....',
 0.4548597317188978)

In [5]:
# with constraints
constraint = "??(???(??????)?(????????)???(??????(???????)?)???????????)??.???????????"
linear_rna.linear_fold_c(input_sequence, use_constraints = True, constraint = constraint)

('..(.(((......)((........))(((......(.......).))).....))..)..............',
 -27.328358240425587)
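
The first element of the returned tuple is a dot-bracket string, which is compact but awkward to work with programmatically. The sketch below converts such a string into a list of paired positions using a stack. The function name `dot_bracket_to_pairs` is hypothetical (not part of `pahelix`), and the indices it returns are 0-based by choice.

```python
def dot_bracket_to_pairs(structure):
    """Return sorted 0-based (i, j) index pairs for each matched '(' and ')'.

    Hypothetical helper for illustration; not part of pahelix.
    """
    open_positions = []  # stack of indices of unmatched '('
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at index {index}")
            # The most recent unmatched '(' pairs with this ')'.
            pairs.append((open_positions.pop(), index))
    if open_positions:
        raise ValueError(f"unmatched '(' at index {open_positions[-1]}")
    return sorted(pairs)


print(dot_bracket_to_pairs("((..))"))  # prints [(0, 5), (1, 4)]
```

The same function can be applied to the structure string returned by either folding call, for example `dot_bracket_to_pairs(result[0])` where `result` holds the returned tuple.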

## Thermodynamic model
```python
linear_fold_v(rna_sequence, beam_size = 100, use_constraints = False, constraint = "", no_sharp_turn = True)
```
This function uses the thermodynamic model from [Vienna RNAfold](https://almob.biomedcentral.com/articles/10.1186/1748-7188-6-26).
Its parameters are the same as those of the machine learning model.

In [7]:
linear_rna.linear_fold_v(input_sequence)

('..((((.(((....)))...))))....((((((.((((.....))))...((((...))))))))))....',
 -18.4)

In [8]:
# with constraints
linear_rna.linear_fold_v(input_sequence, use_constraints = True, constraint = constraint)

('..(.(((......)((........))(((......(.......).))).....))..)..............',
 13.4)

# Part II: LinearPartition

**LinearPartition** is the first linear-time algorithm and software for calculating the partition function and base pair probabilities of RNA secondary structures. The LinearPartition paper was accepted at ISMB, a leading conference in computational biology, and published in the journal Bioinformatics: [LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities](https://academic.oup.com/bioinformatics/article/36/Supplement_1/i258/5870487)

## Machine learning model
```python
linear_partition_c(rna_sequence, beam_size = 100, bp_cutoff = 0.0, no_sharp_turn = True)
```
This function uses the machine learning model from [CONTRAfold](https://pubmed.ncbi.nlm.nih.gov/16873527/).

### Parameter settings
- rna_sequence: string, the input RNA sequence for which the partition function and base pair probabilities are calculated.
- beam_size: int (default 100), the beam size; set it to 0 to turn off beam pruning.
- bp_cutoff: double (default 0.0), a threshold between 0 and 1; only base pairs whose probability is larger than bp_cutoff are returned.
- no_sharp_turn: bool (default True), disable sharp turns in the prediction.
### Return Value
- tuple(float, list): a tuple containing the partition function value and a list of base pair probabilities, each given as (i, j, probability).

In [9]:
input_sequence = "UGAGUUCUCGAUCUCUAAAAUCG"
linear_rna.linear_partition_c(input_sequence, bp_cutoff = 0.2)

(0.6399469375610352,
 [(4, 13, 0.2007068395614624),
  (10, 22, 0.24661558866500854),
  (11, 21, 0.2457289695739746),
  (12, 20, 0.20926791429519653)])
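
The second element of the result is a list of `(i, j, probability)` triples. A common follow-up is to ask how likely each position is to be paired at all, which is the sum of the probabilities of every pair that involves it. The sketch below does this in plain Python; `pairing_probability_per_position` is a hypothetical name, and positions are kept exactly as they appear in the list, with no assumption about the indexing base. Because `bp_cutoff` removes low-probability pairs, the sums are lower bounds on the true values.

```python
from collections import defaultdict


def pairing_probability_per_position(base_pairs):
    """Sum the pair probabilities that involve each position.

    base_pairs: iterable of (i, j, probability) triples.
    Hypothetical helper for illustration; not part of pahelix.
    """
    totals = defaultdict(float)
    for i, j, probability in base_pairs:
        totals[i] += probability
        totals[j] += probability
    return dict(totals)


# Base pair list copied from the output of the cell above.
pairs_from_cell = [
    (4, 13, 0.2007068395614624),
    (10, 22, 0.24661558866500854),
    (11, 21, 0.2457289695739746),
    (12, 20, 0.20926791429519653),
]

for position, total in sorted(pairing_probability_per_position(pairs_from_cell).items()):
    print(position, round(total, 3))
```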

## Thermodynamic model
```python
linear_partition_v(rna_sequence, beam_size = 100, bp_cutoff = 0.0, no_sharp_turn = True)
```
This function uses the thermodynamic model from [Vienna RNAfold](https://almob.biomedcentral.com/articles/10.1186/1748-7188-6-26).
Its parameters are the same as those of the machine learning model.

In [10]:
linear_rna.linear_partition_v(input_sequence, bp_cutoff = 0.5)

(-1.9573111534118652,
 [(2, 15, 0.833134651184082),
  (3, 14, 0.8365526795387268),
  (4, 13, 0.8355389833450317)])
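
To see where the two models agree, the base pair lists can be reduced to sets of `(i, j)` positions and intersected. The sketch below uses the two outputs shown in this part of the tutorial; `shared_pairs` is a hypothetical helper, not part of `pahelix`. The two calls used different `bp_cutoff` values (0.2 and 0.5), so the comparison is only illustrative.

```python
def shared_pairs(first, second):
    """Return the sorted (i, j) pairs that appear in both base pair lists.

    Each list holds (i, j, probability) triples; probabilities are ignored.
    Hypothetical helper for illustration; not part of pahelix.
    """
    first_positions = {(i, j) for i, j, _ in first}
    second_positions = {(i, j) for i, j, _ in second}
    return sorted(first_positions & second_positions)


# Copied from the linear_partition_c output above (bp_cutoff = 0.2).
contrafold_pairs = [
    (4, 13, 0.2007068395614624),
    (10, 22, 0.24661558866500854),
    (11, 21, 0.2457289695739746),
    (12, 20, 0.20926791429519653),
]

# Copied from the linear_partition_v output above (bp_cutoff = 0.5).
vienna_pairs = [
    (2, 15, 0.833134651184082),
    (3, 14, 0.8365526795387268),
    (4, 13, 0.8355389833450317),
]

print(shared_pairs(contrafold_pairs, vienna_pairs))  # prints [(4, 13)]
```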