# Figure Out Stem-loop Structure

To help determine where in each miRNA the sgRNAs cut, we added the RNAfold secondary structure for each miRNA. RNAfold takes as input a FASTA file of the stem-loop sequences to fold; the code below creates that file.

In [1]:
import data_processing as dp

def output_stem_fasta(db_name, sql_version="MySQL", firewall=False):
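    """
        Creates a fasta file (miRLoopSeqFile.fa) with all of the primary miRNA stem-loop sequences
        
        db_name: name of the database to connect to
        sql_version [optional]: version of SQL of the database (i.e. MSSQL or MySQL); default: MySQL
        firewall [optional]: if behind a firewall, will use ssh tunneling to connect to database; default: False
    """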
    db_con = dp.DatabaseConnection(sql_version, db_name=db_name, firewall=firewall)
    
    rows = db_con.fetch_query("SELECT PriMiRName, StemLoopSeq FROM PrimaryMicroRNA;")
    with open('miRLoopSeqFile.fa', 'w') as fout:
        for row in rows:
            if sql_version == "MSSQL":
                name = row.PriMiRName
                seq = row.StemLoopSeq
            else:
                name, seq = row
            
            output = ">{}\n{}\n".format(name, seq)
            fout.write(output)
    db_con.close_cursor()
    db_con.close_connection()

In [2]:
output_stem_fasta("miR-test", firewall=True)

The RNAfold dot-bracket structure for each miRNA stem-loop was calculated using the ViennaRNA package (installed from <a href="http://www.tbi.univie.ac.at/RNA/">here</a>). The RNAfold program (documentation <a href="https://www.tbi.univie.ac.at/RNA/RNAfold.1.html">here</a>) predicts the secondary structure of each primary miRNA stem-loop sequence. Because the miRBase website does not document the parameters used to determine the secondary structure of its miRNA stem-loops (only that RNAfold was used; see <a href="http://www.mirbase.org/help/search.shtml">Search Q&amp;A</a>), no folding constraints were used.
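For readers unfamiliar with dot-bracket notation: each `.` marks an unpaired base, and each matching `(` and `)` marks two bases that pair with each other. As a minimal sketch of how to read such a string, the paired positions can be recovered with a stack. The function name `dot_bracket_pairs` is illustrative only; it is not part of `data_processing` or ViennaRNA.

```python
def dot_bracket_pairs(structure):
    """Return 0-based (i, j) index pairs for each base pair in a dot-bracket string."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position {}".format(pos))
            # The most recent unmatched '(' pairs with this ')'
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError("unexpected character {!r}".format(char))
    if stack:
        raise ValueError("unbalanced '(' at position {}".format(stack[-1]))
    return sorted(pairs)

# A toy hairpin: a 3-bp stem closing a 4-nt loop
print(dot_bracket_pairs("(((....)))"))  # prints [(0, 9), (1, 8), (2, 7)]
```

Knowing which positions are paired is what makes it possible to say whether a given cut site falls in the stem or in the loop.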

The miRNA stem-loop FASTA file created above was moved into the folder where RNAfold was installed.

In [3]:
%cd "C:/Program Files (x86)/ViennaRNA Package/"
!RNAfold --noPS < miRLoopSeqFile.fa > miRSecStructure.txt

C:\Program Files (x86)\ViennaRNA Package
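If changing into the install folder is inconvenient, the same command could probably be run from plain Python. The following is a sketch only, assuming the `RNAfold` executable is on the `PATH`; the helper name `run_rnafold` and its `executable` parameter are hypothetical and were not part of the original workflow.

```python
import shutil
import subprocess

def run_rnafold(fasta_path, out_path, executable="RNAfold"):
    """Fold every sequence in fasta_path and write RNAfold's text output to out_path."""
    # Assumption: the executable is discoverable on the PATH (or given as a full path)
    resolved = shutil.which(executable)
    if resolved is None:
        raise FileNotFoundError("{} was not found on the PATH".format(executable))
    with open(fasta_path, "r") as fin, open(out_path, "w") as fout:
        # --noPS is the same flag used in the cell above
        subprocess.run([resolved, "--noPS"], stdin=fin, stdout=fout, check=True)

# Example call (requires ViennaRNA to be installed):
# run_rnafold("miRLoopSeqFile.fa", "miRSecStructure.txt")
```

With `check=True`, a non-zero exit status from RNAfold raises `subprocess.CalledProcessError` instead of silently leaving a partial output file.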


The predicted structures were then imported into the database.

In [4]:
import data_processing as dp

def import_RNAfold(fold_file, db_name, sql_version="MySQL", firewall=False):
    """
        Imports the primary miRNA RNA fold data and fills in the PrimaryMicroRNA RNAfold column
        
        fold_file: File with RNAfold secondary structure created by ViennaRNA RNAfold --noPS
        db_name: name of the database to connect to
        sql_version [optional]: version of SQL of the database (i.e. MSSQL or MySQL); default: MySQL
        firewall [optional]: if behind a firewall, will use ssh tunneling to connect to database; default: False
    """
    db_con = dp.DatabaseConnection(sql_version, db_name=db_name, firewall=firewall)
    
    fold_dict = {"RNAfold": []}
    pri_dict = {"PriMiRName": []}
    with open(fold_file, "r") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines
            if not line:
                continue
            # The name line is denoted by '>'
            if line[0] == ">":
                miRName = line[1:]
            # The structure line starts with a dot-bracket character
            elif line[0] in ".()":
                dotFold = line.split(" ")[0] # Removes the minimum free energy, leaving the structure
                fold_dict["RNAfold"].append(dotFold)
                pri_dict["PriMiRName"].append(miRName)
            # Any other line is the RNA sequence, which is skipped
    db_con.update_many_rows(fold_dict, pri_dict, "PrimaryMicroRNA")
    db_con.close_cursor()
    db_con.close_connection()

In [5]:
import_RNAfold("miRSecStructure.txt", "miR-test", firewall=True)
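Before importing, it can be useful to sanity-check the RNAfold output without touching the database. The sketch below assumes the layout that `import_RNAfold` relies on (a `>` header line, a sequence line, then a structure line ending in the minimum free energy in parentheses). The function name `parse_rnafold_output` is hypothetical, and the sample record is hand-written with a made-up energy value, not real RNAfold output.

```python
def parse_rnafold_output(lines):
    """Yield (name, sequence, structure, mfe) tuples from RNAfold --noPS text output."""
    name = None
    seq = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new record
            name, seq = line[1:], None
        elif seq is None:
            # First line after the header is the sequence
            seq = line
        else:
            # Structure line: dot-bracket string, whitespace, then "(energy)"
            structure, energy = line.split(None, 1)
            mfe = float(energy.strip("() "))
            if len(structure) != len(seq):
                raise ValueError("structure/sequence length mismatch for {}".format(name))
            yield name, seq, structure, mfe
            seq = None

# Toy record for illustration only (the energy value is invented)
sample = [">toy-stemloop\n", "GGGAAAACCC\n", "(((....))) ( -1.20)\n"]
for record in parse_rnafold_output(sample):
    print(record)
```

The length check catches records where the structure and sequence lines have drifted out of step, which would otherwise write a structure to the wrong miRNA.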