RNA structure library design software (FOREST.py)

Software that extracts single- and multi-terminal loops from RNA secondary structures and designs an RNA structure library for FOREST (Folded RNA Element Profiling with Structure library).


Installation

git clone https://github.com/KRK13/FOREST2020.git

Contents

  • Demo

    • Data
      • barcode25mer_10000.txt
      • test.fa
      • test_100.fa
    • Result_MotifExtraction
      • result.txt
      • result_100.txt
    • Result_LibraryGeneration
      • RNAstructurelibrary_bn3.txt
      • RNAstructurelibrary_bn3.dnatemp.txt
      • RNAstructurelibrary_bn3.array.txt
  • FOREST.py

  • README.md


Requirements

  • Python 3 (3.7.3)

Usage

Input

Positional arguments:
filename: Input file, a multi-FASTA file with RNA sequences and their secondary structures in dot-bracket format

Optional arguments:
-h, --help: Show the help message
-L: Limit the maximum length of extracted motifs (Default: 136)
-bn: Set the number of barcodes assigned to each RNA structure (Default: 5)

Optional arguments for changing the output:
-lib, --library: Generate a list of RNA probes derived from the extracted RNA motifs (stdout)
-b, --barcodes: Specify a multi-FASTA file that contains DNA barcodes (required with --library)
-t, --templates: Generate a list of DNA templates, each containing the reverse-complement DNA of an RNA probe with a T7 promoter (stdout)
-a, --array: Generate a list of DNA barcodes assigned to RNA probes (stdout)

Output

RNA motifs (Default)

Library design mode (--library)

RNA probes are generated by concatenating the following components:

  • RNA barcodes (the -bn option sets the number of barcodes per motif; the default is 5)
  • Common stem structure forward
  • RNA region
  • Stabilizing stem reverse
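
The concatenation can be sketched as follows. This is an illustration, not the code in FOREST.py: the function name and constants are hypothetical, and the leader and stem sequences are assumptions read off the example probes shown later in this README (every probe there starts with GGG, followed by a 25-nt barcode).

```python
# Assumption: all three constants are inferred from the example output in
# this README, not taken from FOREST.py itself.
LEADER = "GGG"                      # 5' leader seen on every example probe
STEM_FORWARD = "GUGUACGAAGUUUCAGC"  # common stem, forward strand
STEM_REVERSE = "GCUGAAGCUUCGUGCAC"  # stabilizing stem, reverse strand


def assemble_probe(barcode: str, motif: str) -> str:
    """Concatenate leader, barcode, forward stem, RNA motif, and reverse stem."""
    return LEADER + barcode + STEM_FORWARD + motif + STEM_REVERSE


# Barcode and motif copied from the Name-2_Motif_2_Barcode_1 example.
example_probe = assemble_probe("AUUUACAGUCCAUCCCAUGCCGCAG", "CCGAAUGUACUCGG")
print(example_probe)
```

With these inputs the sketch reproduces the Name-2_Motif_2_Barcode_1 probe listed under Examples (2).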

Template design mode (--library --templates)

DNA template of the RNA structure library for ordering an oligo pool.

DNA barcode microarray design mode (--library --array)

DNA barcodes assigned to RNA probes for ordering a DNA barcode microarray.
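
Both derived outputs can be reproduced from an RNA probe by reverse complementation. The sketch below is an illustration under stated assumptions, not the FOREST.py implementation: the function names are hypothetical, the 3' tail is read off the example templates in this README (it matches the reverse complement of a T7 promoter), and the 28-nt barcode region assumes a GGG leader followed by a 25-nt barcode.

```python
# Assumption: 3' tail shared by every example template in this README.
T7_PROMOTER_RC = "TATAGTGAGTCGTATTAGCGC"
# Assumption: GGG leader (3 nt) + 25-nt barcode at the 5' end of each probe.
BARCODE_REGION_LEN = 28

_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def reverse_complement_dna(rna: str) -> str:
    """Return the DNA reverse complement of an RNA sequence."""
    return rna.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


def dna_template(probe: str) -> str:
    """ssDNA template: reverse complement of the probe, then the T7 tail."""
    return reverse_complement_dna(probe) + T7_PROMOTER_RC


def array_barcode(probe: str) -> str:
    """Array strand: reverse complement of the probe's 5' barcode region."""
    return reverse_complement_dna(probe[:BARCODE_REGION_LEN])


# Probe copied from the Name-2_Motif_2_Barcode_1 example.
probe = ("GGGAUUUACAGUCCAUCCCAUGCCGCAGGUGUACGAAGUUUCAGC"
         "CCGAAUGUACUCGGGCUGAAGCUUCGUGCAC")
print(dna_template(probe))
print(array_barcode(probe))
```

For this probe the two functions return the Name-2_Motif_2_Barcode_1_template and Name-2_Motif_2_Barcode_1_array sequences listed under Examples (3) and (4).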

File Instructions

The input file contains names, RNA sequences, and RNA secondary structures represented in the dot-bracket format.

# test.fa.txt
>Name-1
CAGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGUACGAU
.(.(((..(((((((((((((...........)))).)).)))))))..).)).).....
>Name-2
AUCAGAAACUUUAAUUCCGGAGUAGGUACAGAUAUCGCCCACCGGAUAGCUCGCUAGGACUCUCUGCGCAACUUCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGAUAUGCGCACGGACAGGCACUAGUGAGGAGAAAGGGUUGUCGUGCACGACCAUCGGAUUUUAGGGUAAAACCCUCUACGAGCAGAGUUCUAUAGC
...((((.((((...(((((....((((....)))).....)))))...((((((((..((((.((((((...(((((((...((((((((...((((........((((.((((....((((......))))...((.((((.((......)))))))).........))))))))))))....))).))))).....)))))))..)))))).)))...)..)))))))).......((((((.....)))))).(((......(((((...)))))...)))..)))))))).....
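
To check an input file before running FOREST.py, the three-line record layout (header, sequence, structure) can be read with a short script. This is a minimal sketch; the names `Record` and `read_records` are hypothetical and are not part of FOREST.py, and the validation rules (equal lengths, balanced brackets) are assumptions about what a well-formed record looks like.

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    name: str
    sequence: str
    structure: str


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, structure) records from header/sequence/structure triples."""
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected header/sequence/structure triples")
    for i in range(0, len(rows), 3):
        header, sequence, structure = rows[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"missing '>' in header: {header!r}")
        if len(sequence) != len(structure):
            raise ValueError(f"{header}: sequence and structure lengths differ")
        depth = 0
        for ch in structure:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    raise ValueError(f"{header}: unmatched ')'")
            elif ch != ".":
                raise ValueError(f"{header}: unexpected character {ch!r}")
        if depth != 0:
            raise ValueError(f"{header}: unmatched '('")
        yield Record(header[1:], sequence, structure)


records = list(read_records([">hairpin", "GGGAAACCC", "(((...)))"]))
print(records)
```

To validate a real file, pass an open file handle, for example `list(read_records(open("test.fa.txt")))`.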

Demo

Timings were measured on a Mac Pro (Late 2013) with 64 GB RAM.

cd FOREST2020

# Motif extraction
python FOREST.py ./Demo/Data/test.fa.txt -L 134 > result.txt
real    0m0.057s
user    0m0.028s
sys    0m0.012s 

# Motif extraction (test_100.fa.txt)
python FOREST.py ./Demo/Data/test_100.fa.txt -L 134 > result_100.txt
real    0m4.188s
user    0m4.139s
sys    0m0.029s

# RNA probe design - Three different barcodes per RNA structure (Default: 5)
python FOREST.py ./Demo/Data/test_100.fa.txt --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 3 > RNAstructurelibrary_bn3.txt
real    0m4.756s
user    0m4.672s
sys    0m0.058s

# DNA template pool design
python FOREST.py ./Demo/Data/test_100.fa.txt --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 3 --templates > RNAstructurelibrary_bn3.dnatemp.txt
real    0m4.863s
user    0m4.782s
sys    0m0.055s

# Generation of the DNA barcode strand that captures RNA probes
python FOREST.py ./Demo/Data/test_100.fa.txt --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 3 --array > RNAstructurelibrary_bn3.array.txt
real	0m4.790s
user	0m4.709s
sys	0m0.055s

Examples

(1) Terminal motif extraction. The maximum motif length is set to 134 nt.

python FOREST.py test.fa.txt -L 134

>Name-1_Motif_1
AGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGU
(.(((..(((((((((((((...........)))).)).)))))))..).)).)
>Name-2_Motif_1
UCCGGAGUAGGUACAGAUAUCGCCCACCGGA
(((((....((((....)))).....)))))
>Name-2_Motif_2
CCGAAUGUACUCGG
((((......))))
>Name-2_Motif_3
CGAUGGGCAGCUCAGUCUCCCACG
((.((((.((......))))))))
>Name-2_Motif_4
GGUUGUCGUGCACGACC
((((((.....))))))
>Name-2_Motif_5
UCGGAUUUUAGGGUAAAACCCUCUACGA
(((......(((((...)))))...)))
>Name-2_Multi_1_ComplexLevel_1
UCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGA
(((((((...((((((((...((((........((((.((((....((((......))))...((.((((.((......)))))))).........))))))))))))....))).))))).....)))))))
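
The single stem-loop motifs above (the `_Motif_` entries) are consistent with the following idea: start at each hairpin loop and extend outward through bulges and internal loops until a branch point or the end of the helix is reached. The sketch below illustrates that idea only. It is not the algorithm in FOREST.py, the function names are hypothetical, and it ignores the -L length limit and the multi-branched (`_Multi_`) motifs.

```python
from typing import List, Tuple


def pair_table(structure: str) -> List[int]:
    """Map each paired position to its partner index; unpaired positions get -1."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')'")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unmatched '('")
    return partner


def terminal_stem_loops(structure: str) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) spans of unbranched stem-loops."""
    partner = pair_table(structure)
    n = len(structure)
    spans = []
    for i in range(n):
        j = partner[i]
        # A hairpin is closed by a pair whose interior holds only unpaired bases.
        if j > i and set(structure[i + 1:j]) <= {"."}:
            lo, hi = i, j
            while True:
                # Skip unpaired bases on both sides (bulges, internal loops).
                p, q = lo - 1, hi + 1
                while p >= 0 and structure[p] == ".":
                    p -= 1
                while q < n and structure[q] == ".":
                    q += 1
                # Extend only if the flanking bases pair with each other;
                # otherwise a branch point or the helix end has been reached.
                if p >= 0 and q < n and partner[p] == q:
                    lo, hi = p, q
                else:
                    break
            spans.append((lo, hi))
    return spans


name1 = ".(.(((..(((((((((((((...........)))).)).)))))))..).)).)....."
for start, end in terminal_stem_loops(name1):
    print(name1[start:end + 1])
```

For the Name-1 structure this prints the same dot-bracket string as Name-1_Motif_1 above.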

(2) Design the RNA structure library.

python FOREST.py ./Demo/Data/test.fa.txt -L 134 --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 2

>Name-1_Motif_1_Barcode_1
GGGGCAAACUUUAGCCGGUUGUUGGCUAGUGUACGAAGUUUCAGCAGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGUGCUGAAGCUUCGUGCAC
>Name-1_Motif_1_Barcode_2
GGGUGGGUGGUCUUAUACGGUUGACUAGGUGUACGAAGUUUCAGCAGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGUGCUGAAGCUUCGUGCAC
>Name-2_Motif_1_Barcode_1
GGGAAAUAAGACUGGCUCGGGCAUUCUCGUGUACGAAGUUUCAGCUCCGGAGUAGGUACAGAUAUCGCCCACCGGAGCUGAAGCUUCGUGCAC
>Name-2_Motif_1_Barcode_2
GGGUCUGUGUCACCCUAGAGACAGAUGCGUGUACGAAGUUUCAGCUCCGGAGUAGGUACAGAUAUCGCCCACCGGAGCUGAAGCUUCGUGCAC
>Name-2_Motif_2_Barcode_1
GGGAUUUACAGUCCAUCCCAUGCCGCAGGUGUACGAAGUUUCAGCCCGAAUGUACUCGGGCUGAAGCUUCGUGCAC
>Name-2_Motif_2_Barcode_2
GGGUGAUAAGUGUCCGGGUCCGGGUGUAGUGUACGAAGUUUCAGCCCGAAUGUACUCGGGCUGAAGCUUCGUGCAC
>Name-2_Motif_3_Barcode_1
GGGCGUUGACUGCUUAAUGAGAUGUGGCGUGUACGAAGUUUCAGCCGAUGGGCAGCUCAGUCUCCCACGGCUGAAGCUUCGUGCAC
>Name-2_Motif_3_Barcode_2
GGGUAGUCCCUUCAGCGCCGGCAAUUAGGUGUACGAAGUUUCAGCCGAUGGGCAGCUCAGUCUCCCACGGCUGAAGCUUCGUGCAC
>Name-2_Motif_4_Barcode_1
GGGCAGACUGAUAGUAUGCACACGCUUUGUGUACGAAGUUUCAGCGGUUGUCGUGCACGACCGCUGAAGCUUCGUGCAC
>Name-2_Motif_4_Barcode_2
GGGAGCCUUCAGUCCAAGCAUGACUGACGUGUACGAAGUUUCAGCGGUUGUCGUGCACGACCGCUGAAGCUUCGUGCAC
>Name-2_Motif_5_Barcode_1
GGGCUACACUACGAGUAGCGGCAUAUAGGUGUACGAAGUUUCAGCUCGGAUUUUAGGGUAAAACCCUCUACGAGCUGAAGCUUCGUGCAC
>Name-2_Motif_5_Barcode_2
GGGACGACAAAUUCCGUCCGUUGAGUGUGUGUACGAAGUUUCAGCUCGGAUUUUAGGGUAAAACCCUCUACGAGCUGAAGCUUCGUGCAC
>Name-2_Multi_1_ComplexLevel_1_Barcode_1
GGGGUAGAUUACUGGCGGGACUGGUCAAGUGUACGAAGUUUCAGCUCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGAGCUGAAGCUUCGUGCAC
>Name-2_Multi_1_ComplexLevel_1_Barcode_2
GGGGUAUGAAUCUGCUCAUUUAACGCGCGUGUACGAAGUUUCAGCUCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGAGCUGAAGCUUCGUGCAC

(3) Generate ssDNA template sequences for the library in (2).

python FOREST.py ./Demo/Data/test.fa.txt -L 134 --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 2 --templates

>Name-1_Motif_1_Barcode_1_template
GTGCACGAAGCTTCAGCACGATCCGCAATGCCCAGTCAAGCCCCAAATATCCCTGCTAGCATCGATGCCCTGCTGAAACTTCGTACACTAGCCAACAACCGGCTAAAGTTTGCCCCTATAGTGAGTCGTATTAGCGC
>Name-1_Motif_1_Barcode_2_template
GTGCACGAAGCTTCAGCACGATCCGCAATGCCCAGTCAAGCCCCAAATATCCCTGCTAGCATCGATGCCCTGCTGAAACTTCGTACACCTAGTCAACCGTATAAGACCACCCACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_1_Barcode_1_template
GTGCACGAAGCTTCAGCTCCGGTGGGCGATATCTGTACCTACTCCGGAGCTGAAACTTCGTACACGAGAATGCCCGAGCCAGTCTTATTTCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_1_Barcode_2_template
GTGCACGAAGCTTCAGCTCCGGTGGGCGATATCTGTACCTACTCCGGAGCTGAAACTTCGTACACGCATCTGTCTCTAGGGTGACACAGACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_2_Barcode_1_template
GTGCACGAAGCTTCAGCCCGAGTACATTCGGGCTGAAACTTCGTACACCTGCGGCATGGGATGGACTGTAAATCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_2_Barcode_2_template
GTGCACGAAGCTTCAGCCCGAGTACATTCGGGCTGAAACTTCGTACACTACACCCGGACCCGGACACTTATCACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_3_Barcode_1_template
GTGCACGAAGCTTCAGCCGTGGGAGACTGAGCTGCCCATCGGCTGAAACTTCGTACACGCCACATCTCATTAAGCAGTCAACGCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_3_Barcode_2_template
GTGCACGAAGCTTCAGCCGTGGGAGACTGAGCTGCCCATCGGCTGAAACTTCGTACACCTAATTGCCGGCGCTGAAGGGACTACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_4_Barcode_1_template
GTGCACGAAGCTTCAGCGGTCGTGCACGACAACCGCTGAAACTTCGTACACAAAGCGTGTGCATACTATCAGTCTGCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_4_Barcode_2_template
GTGCACGAAGCTTCAGCGGTCGTGCACGACAACCGCTGAAACTTCGTACACGTCAGTCATGCTTGGACTGAAGGCTCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_5_Barcode_1_template
GTGCACGAAGCTTCAGCTCGTAGAGGGTTTTACCCTAAAATCCGAGCTGAAACTTCGTACACCTATATGCCGCTACTCGTAGTGTAGCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_5_Barcode_2_template
GTGCACGAAGCTTCAGCTCGTAGAGGGTTTTACCCTAAAATCCGAGCTGAAACTTCGTACACACACTCAACGGACGGAATTTGTCGTCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Multi_1_ComplexLevel_1_Barcode_1_template
GTGCACGAAGCTTCAGCTCCACCACGTATGATCGTAAATCTACACGCCCGTGTGTCGGTAGGTCGTGGGAGACTGAGCTGCCCATCGCATCCGAGTACATTCGGGTCTCACAGCGGGTGTCAATACGCGGTTTTTCGATCGTTTAGTGGAGCTGAAACTTCGTACACTTGACCAGTCCCGCCAGTAATCTACCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Multi_1_ComplexLevel_1_Barcode_2_template
GTGCACGAAGCTTCAGCTCCACCACGTATGATCGTAAATCTACACGCCCGTGTGTCGGTAGGTCGTGGGAGACTGAGCTGCCCATCGCATCCGAGTACATTCGGGTCTCACAGCGGGTGTCAATACGCGGTTTTTCGATCGTTTAGTGGAGCTGAAACTTCGTACACGCGCGTTAAATGAGCAGATTCATACCCCTATAGTGAGTCGTATTAGCGC

(4) Generate DNA sequences for the DNA barcode microarray compatible with the library in (2).

python FOREST.py ./Demo/Data/test.fa.txt -L 134 --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 2 --array

>Name-1_Motif_1_Barcode_1_array
TAGCCAACAACCGGCTAAAGTTTGCCCC
>Name-1_Motif_1_Barcode_2_array
CTAGTCAACCGTATAAGACCACCCACCC
>Name-2_Motif_1_Barcode_1_array
GAGAATGCCCGAGCCAGTCTTATTTCCC
>Name-2_Motif_1_Barcode_2_array
GCATCTGTCTCTAGGGTGACACAGACCC
>Name-2_Motif_2_Barcode_1_array
CTGCGGCATGGGATGGACTGTAAATCCC
>Name-2_Motif_2_Barcode_2_array
TACACCCGGACCCGGACACTTATCACCC
>Name-2_Motif_3_Barcode_1_array
GCCACATCTCATTAAGCAGTCAACGCCC
>Name-2_Motif_3_Barcode_2_array
CTAATTGCCGGCGCTGAAGGGACTACCC
>Name-2_Motif_4_Barcode_1_array
AAAGCGTGTGCATACTATCAGTCTGCCC
>Name-2_Motif_4_Barcode_2_array
GTCAGTCATGCTTGGACTGAAGGCTCCC
>Name-2_Motif_5_Barcode_1_array
CTATATGCCGCTACTCGTAGTGTAGCCC
>Name-2_Motif_5_Barcode_2_array
ACACTCAACGGACGGAATTTGTCGTCCC
>Name-2_Multi_1_ComplexLevel_1_Barcode_1_array
TTGACCAGTCCCGCCAGTAATCTACCCC
>Name-2_Multi_1_ComplexLevel_1_Barcode_2_array
GCGCGTTAAATGAGCAGATTCATACCCC

Appendix

The standard output of FOREST.py --library --templates is formatted as a multi-FASTA file.
It can be uploaded or submitted directly to the oligo pool suppliers listed below.

* SureDesign (Agilent Technologies)
* Oligo pools (Twist Bioscience)
* oPool oligo pool (IDT)

The DNA barcode microarray can also be ordered from Agilent Technologies (SureDesign).

Reference

RNA structurome-wide discovery of functional interactions with a multiplexed motif library
Kaoru R. Komatsu, Toshiki Taya, Sora Matsumoto, Emi Miyashita, Shunnichi Kashida and Hirohide Saito

WO2018003809A1: Hirohide Saito, Kaoru R. Komatsu
US20200048685A1, US10435738B2, EP3093342B1, JP6594776B2: Hirohide Saito, Toshiki Taya, Shunnichi Kashida