Software for extracting single- and multi-terminal loop motifs and designing an RNA structure library for FOREST (Folded RNA Element Profiling with Structure library).
git clone https://github.com/KRK13/FOREST2020.git
- Demo
  - Data
    - barcode25mer_10000.txt
    - test.fa
    - test_100.fa
  - Result_MotifExtraction
    - result.txt
    - result_100.txt
  - Result_LibraryGeneration
    - RNAstructurelibrary_bn3.txt
    - RNAstructurelibrary_bn3.dnatemp.txt
    - RNAstructurelibrary_bn3.array.txt
- Data
- FOREST.py
- README.md
- Python 3 (3.7.3)
Positional arguments:
filename
: Input file: a multiple FASTA file with RNA secondary structures in dot-bracket format
Optional arguments:
-h, --help
: Show the help message
-L
: Limit the maximum length of extracted motifs (Default: 136)
-bn
: Set the number of barcodes assigned to each RNA structure (Default: 5)
Optional arguments for changing the output.
-lib, --library
: Generate a list of RNA probes derived from the extracted RNA motifs (stdout)
-b, --barcodes
: Specify a multiple FASTA file that contains DNA barcodes (required when --library is used)
-t, --templates
: Generate a list of DNA templates, each containing the reverse-complement DNA of an RNA probe together with the T7 promoter
-a, --array
: Generate a list of DNA barcodes assigned to RNA probes (stdout)
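The option list above can be mirrored with `argparse`. The following is a minimal sketch written from the documented flags only, not taken from FOREST.py; the attribute names (`max_length`, `barcode_number`) and the helper names are illustrative assumptions.

```python
import argparse


def build_parser():
    """Declare the documented options; dest names are hypothetical."""
    parser = argparse.ArgumentParser(prog="FOREST.py")
    parser.add_argument("filename",
                        help="multiple FASTA file with dot-bracket structures")
    parser.add_argument("-L", dest="max_length", type=int, default=136,
                        help="maximum length of extracted motifs")
    parser.add_argument("-bn", dest="barcode_number", type=int, default=5,
                        help="number of barcodes per RNA structure")
    parser.add_argument("-lib", "--library", action="store_true",
                        help="print RNA probes instead of RNA motifs")
    parser.add_argument("-b", "--barcodes", default=None,
                        help="multiple FASTA file with DNA barcodes")
    parser.add_argument("-t", "--templates", action="store_true",
                        help="print DNA templates")
    parser.add_argument("-a", "--array", action="store_true",
                        help="print DNA barcodes for the microarray")
    return parser


def parse_args(argv):
    """Parse argv and enforce that --library comes with --barcodes."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.library and args.barcodes is None:
        parser.error("--library requires --barcodes")
    return args
```

For example, `parse_args(["test.fa", "-L", "134", "--library", "--barcodes", "bc.txt", "-bn", "2"])` yields `max_length == 134` and `barcode_number == 2`.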
RNA motifs (Default)
RNA probes are generated by concatenating the following components:
- RNA barcodes (the -bn option sets the number of barcodes; the default is 5)
- Common stem structure forward
- RNA region
- Stabilizing stem reverse
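The concatenation can be sketched in a few lines. The constant sequences below were read off the demo probes printed in this README (a leading GGG, a 25-nt barcode, and two 17-nt stem strands); they are assumptions about what FOREST.py emits, not values taken from its source, and `assemble_probe` is a hypothetical helper name.

```python
# Constants inferred from the demo output in this README (assumptions).
LEADING_G = "GGG"                   # precedes every barcode in the demo probes
STEM_FORWARD = "GUGUACGAAGUUUCAGC"  # strand placed before the RNA region
STEM_REVERSE = "GCUGAAGCUUCGUGCAC"  # strand placed after the RNA region


def assemble_probe(barcode_rna, motif_rna):
    """Concatenate barcode, forward stem, RNA region and reverse stem."""
    return LEADING_G + barcode_rna + STEM_FORWARD + motif_rna + STEM_REVERSE


# Barcode and motif taken from the Name-2_Motif_2_Barcode_1 demo record.
probe = assemble_probe("AUUUACAGUCCAUCCCAUGCCGCAG", "CCGAAUGUACUCGG")
print(probe)
```

With these inputs the sketch reproduces the Name-2_Motif_2_Barcode_1 probe shown in the example output.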
DNA template of the RNA structure library for ordering an oligo pool.
DNA barcodes assigned to RNA probes for ordering a DNA barcode microarray.
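Both DNA outputs can be derived from an RNA probe by reverse complementation. The sketch below is inferred from the demo records in this README rather than from FOREST.py itself: the 21-nt tail and the 28-nt barcode region (GGG plus a 25-nt barcode) are assumptions read off that output, and the function names are hypothetical.

```python
# Tail shared by every demo template; it is the reverse complement of
# GCGCTAATACGACTCACTATA, i.e. a T7 promoter with a GCGC prefix.
T7_PROMOTER_RC = "TATAGTGAGTCGTATTAGCGC"
RNA_TO_COMPLEMENT_DNA = str.maketrans("ACGU", "TGCA")


def reverse_complement_dna(rna):
    """Return the reverse-complement DNA of an RNA sequence."""
    return rna.translate(RNA_TO_COMPLEMENT_DNA)[::-1]


def dna_template(probe_rna):
    """Reverse-complement DNA of the probe followed by the promoter tail."""
    return reverse_complement_dna(probe_rna) + T7_PROMOTER_RC


def array_barcode(probe_rna, barcode_region=28):
    """DNA strand complementary to the 5' barcode region of the probe."""
    return reverse_complement_dna(probe_rna[:barcode_region])


probe = ("GGGAUUUACAGUCCAUCCCAUGCCGCAGGUGUACGAAGUUUCAGC"
         "CCGAAUGUACUCGGGCUGAAGCUUCGUGCAC")
print(dna_template(probe))
print(array_barcode(probe))
```

The probe used here is the Name-2_Motif_2_Barcode_1 demo record, and the two printed strings match its `_template` and `_array` entries in the example output.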
The input file contains names, RNA sequences, and RNA secondary structures represented in the dot-bracket format.
# test.fa.txt
>Name-1
CAGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGUACGAU
.(.(((..(((((((((((((...........)))).)).)))))))..).)).).....
>Name-2
AUCAGAAACUUUAAUUCCGGAGUAGGUACAGAUAUCGCCCACCGGAUAGCUCGCUAGGACUCUCUGCGCAACUUCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGAUAUGCGCACGGACAGGCACUAGUGAGGAGAAAGGGUUGUCGUGCACGACCAUCGGAUUUUAGGGUAAAACCCUCUACGAGCAGAGUUCUAUAGC
...((((.((((...(((((....((((....)))).....)))))...((((((((..((((.((((((...(((((((...((((((((...((((........((((.((((....((((......))))...((.((((.((......)))))))).........))))))))))))....))).))))).....)))))))..)))))).)))...)..)))))))).......((((((.....)))))).(((......(((((...)))))...)))..)))))))).....
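Before running FOREST.py it can help to check that every record follows this three-line layout. The following is a minimal validation sketch, assuming one header, one sequence line and one dot-bracket line per record with no line wrapping; `read_structure_fasta` and `is_balanced` are hypothetical helpers, not part of FOREST.py.

```python
def is_balanced(structure):
    """True if the string uses only '(', ')' and '.' and every bracket pairs."""
    depth = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                return False
        elif symbol != ".":
            return False
    return depth == 0


def read_structure_fasta(lines):
    """Return a list of (name, sequence, structure) tuples from input lines."""
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected three lines per record")
    records = []
    for index in range(0, len(rows), 3):
        header, sequence, structure = rows[index:index + 3]
        if not header.startswith(">"):
            raise ValueError("missing '>' in header: " + header)
        if len(sequence) != len(structure):
            raise ValueError("length mismatch in " + header)
        if not is_balanced(structure):
            raise ValueError("unbalanced structure in " + header)
        records.append((header[1:], sequence, structure))
    return records


records = read_structure_fasta([">Example", "CCGAAUGUACUCGG", "((((......))))"])
print(records)
```

The function accepts any iterable of lines, so an open file handle can be passed directly.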
Executed on a Mac Pro (Late 2013) with 64 GB RAM
cd FOREST2020
# Motif extraction
python FOREST.py ./Demo/Data/test.fa.txt -L 134 > result.txt
real 0m0.057s
user 0m0.028s
sys 0m0.012s
# Motif extraction
python FOREST.py ./Demo/Data/test_100.fa.txt -L 134 > result_100.txt
real 0m4.188s
user 0m4.139s
sys 0m0.029s
# RNA probe design - Three different barcodes per RNA structure (Default: 5)
python FOREST.py ./Demo/Data/test_100.fa.txt --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 3 > RNAstructurelibrary_bn3.txt
real 0m4.756s
user 0m4.672s
sys 0m0.058s
# DNA template pool design
python FOREST.py ./Demo/Data/test_100.fa.txt --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 3 --templates > RNAstructurelibrary_bn3.dnatemp.txt
real 0m4.863s
user 0m4.782s
sys 0m0.055s
# Generation of the DNA barcode strand that captures RNA probes
python FOREST.py ./Demo/Data/test_100.fa.txt --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 3 --array > RNAstructurelibrary_bn3.array.txt
real 0m4.790s
user 0m4.709s
sys 0m0.055s
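The four invocations above differ only in their flags, so they can be driven from a small script. This is a sketch under the assumption that it is run from the FOREST2020 directory; `build_command` and `run_to_file` are hypothetical helpers that use only the flags documented in this README.

```python
import subprocess
import sys


def build_command(fasta, max_length=None, barcodes=None,
                  barcode_number=None, mode="motifs"):
    """Build the argument list for one FOREST.py run.

    mode is one of 'motifs', 'library', 'templates' or 'array'.
    """
    if mode not in ("motifs", "library", "templates", "array"):
        raise ValueError("unknown mode: " + mode)
    command = [sys.executable, "FOREST.py", fasta]
    if max_length is not None:
        command += ["-L", str(max_length)]
    if mode != "motifs":
        if barcodes is None:
            raise ValueError("--library needs a barcode file")
        command += ["--library", "--barcodes", barcodes]
        if barcode_number is not None:
            command += ["-bn", str(barcode_number)]
    if mode in ("templates", "array"):
        command.append("--" + mode)
    return command


def run_to_file(command, output_path):
    """Run the command and write its standard output to output_path."""
    with open(output_path, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)
```

A call such as `run_to_file(build_command(fasta_path, barcodes=barcode_path, barcode_number=3, mode="array"), "RNAstructurelibrary_bn3.array.txt")` then corresponds to the last command above, with `fasta_path` and `barcode_path` set to the input files.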
python FOREST.py test.fa.txt -L 134
>Name-1_Motif_1
AGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGU
(.(((..(((((((((((((...........)))).)).)))))))..).)).)
>Name-2_Motif_1
UCCGGAGUAGGUACAGAUAUCGCCCACCGGA
(((((....((((....)))).....)))))
>Name-2_Motif_2
CCGAAUGUACUCGG
((((......))))
>Name-2_Motif_3
CGAUGGGCAGCUCAGUCUCCCACG
((.((((.((......))))))))
>Name-2_Motif_4
GGUUGUCGUGCACGACC
((((((.....))))))
>Name-2_Motif_5
UCGGAUUUUAGGGUAAAACCCUCUACGA
(((......(((((...)))))...)))
>Name-2_Multi_1_ComplexLevel_1
UCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGA
(((((((...((((((((...((((........((((.((((....((((......))))...((.((((.((......)))))))).........))))))))))))....))).))))).....)))))))
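One plausible way to obtain the single-terminal (`_Motif_`) records is to keep, for each hairpin loop, the outermost base pair that still encloses exactly one hairpin. The sketch below illustrates that idea only; it is not FOREST.py's implementation, it does not handle the `_Multi_` records, and dropping motifs longer than `max_length` is an assumption about how `-L` behaves.

```python
def single_terminal_motifs(sequence, structure, max_length=136):
    """Return (sequence, structure) pairs of maximal one-hairpin substructures."""
    stack = []     # open positions still waiting for their partner
    parent = {}    # open position -> open position of the enclosing pair
    closing = {}   # open position -> position of the pairing ')'
    hairpins = {}  # open position -> number of hairpin loops enclosed
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            parent[pos] = stack[-1] if stack else None
            hairpins[pos] = 0
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            start = stack.pop()
            closing[start] = pos
            if hairpins[start] == 0:
                hairpins[start] = 1  # no pair inside: this pair closes a hairpin
            if parent[start] is not None:
                hairpins[parent[start]] += hairpins[start]
    if stack:
        raise ValueError("unbalanced structure")

    motifs = []
    for start in sorted(closing):
        outer = parent[start]
        # Maximal: either no enclosing pair, or the enclosing pair branches.
        is_outermost = outer is None or hairpins[outer] > 1
        end = closing[start]
        if hairpins[start] == 1 and is_outermost and end - start + 1 <= max_length:
            motifs.append((sequence[start:end + 1], structure[start:end + 1]))
    return motifs


toy = single_terminal_motifs("GCCAAAGGACCAAAGGC", "(((...)).((...)))")
print(toy)
```

In the toy input the outer pair encloses two hairpins, so each `((...))` arm is reported separately. Applied to the Name-1 record of test.fa.txt with `max_length=134`, the sketch returns the same sequence and structure as Name-1_Motif_1 above.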
python FOREST.py ./Demo/Data/test.fa.txt -L 134 --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 2
>Name-1_Motif_1_Barcode_1
GGGGCAAACUUUAGCCGGUUGUUGGCUAGUGUACGAAGUUUCAGCAGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGUGCUGAAGCUUCGUGCAC
>Name-1_Motif_1_Barcode_2
GGGUGGGUGGUCUUAUACGGUUGACUAGGUGUACGAAGUUUCAGCAGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGUGCUGAAGCUUCGUGCAC
>Name-2_Motif_1_Barcode_1
GGGAAAUAAGACUGGCUCGGGCAUUCUCGUGUACGAAGUUUCAGCUCCGGAGUAGGUACAGAUAUCGCCCACCGGAGCUGAAGCUUCGUGCAC
>Name-2_Motif_1_Barcode_2
GGGUCUGUGUCACCCUAGAGACAGAUGCGUGUACGAAGUUUCAGCUCCGGAGUAGGUACAGAUAUCGCCCACCGGAGCUGAAGCUUCGUGCAC
>Name-2_Motif_2_Barcode_1
GGGAUUUACAGUCCAUCCCAUGCCGCAGGUGUACGAAGUUUCAGCCCGAAUGUACUCGGGCUGAAGCUUCGUGCAC
>Name-2_Motif_2_Barcode_2
GGGUGAUAAGUGUCCGGGUCCGGGUGUAGUGUACGAAGUUUCAGCCCGAAUGUACUCGGGCUGAAGCUUCGUGCAC
>Name-2_Motif_3_Barcode_1
GGGCGUUGACUGCUUAAUGAGAUGUGGCGUGUACGAAGUUUCAGCCGAUGGGCAGCUCAGUCUCCCACGGCUGAAGCUUCGUGCAC
>Name-2_Motif_3_Barcode_2
GGGUAGUCCCUUCAGCGCCGGCAAUUAGGUGUACGAAGUUUCAGCCGAUGGGCAGCUCAGUCUCCCACGGCUGAAGCUUCGUGCAC
>Name-2_Motif_4_Barcode_1
GGGCAGACUGAUAGUAUGCACACGCUUUGUGUACGAAGUUUCAGCGGUUGUCGUGCACGACCGCUGAAGCUUCGUGCAC
>Name-2_Motif_4_Barcode_2
GGGAGCCUUCAGUCCAAGCAUGACUGACGUGUACGAAGUUUCAGCGGUUGUCGUGCACGACCGCUGAAGCUUCGUGCAC
>Name-2_Motif_5_Barcode_1
GGGCUACACUACGAGUAGCGGCAUAUAGGUGUACGAAGUUUCAGCUCGGAUUUUAGGGUAAAACCCUCUACGAGCUGAAGCUUCGUGCAC
>Name-2_Motif_5_Barcode_2
GGGACGACAAAUUCCGUCCGUUGAGUGUGUGUACGAAGUUUCAGCUCGGAUUUUAGGGUAAAACCCUCUACGAGCUGAAGCUUCGUGCAC
>Name-2_Multi_1_ComplexLevel_1_Barcode_1
GGGGUAGAUUACUGGCGGGACUGGUCAAGUGUACGAAGUUUCAGCUCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGAGCUGAAGCUUCGUGCAC
>Name-2_Multi_1_ComplexLevel_1_Barcode_2
GGGGUAUGAAUCUGCUCAUUUAACGCGCGUGUACGAAGUUUCAGCUCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGAGCUGAAGCUUCGUGCAC
python FOREST.py ./Demo/Data/test.fa.txt -L 134 --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 2 --templates
>Name-1_Motif_1_Barcode_1_template
GTGCACGAAGCTTCAGCACGATCCGCAATGCCCAGTCAAGCCCCAAATATCCCTGCTAGCATCGATGCCCTGCTGAAACTTCGTACACTAGCCAACAACCGGCTAAAGTTTGCCCCTATAGTGAGTCGTATTAGCGC
>Name-1_Motif_1_Barcode_2_template
GTGCACGAAGCTTCAGCACGATCCGCAATGCCCAGTCAAGCCCCAAATATCCCTGCTAGCATCGATGCCCTGCTGAAACTTCGTACACCTAGTCAACCGTATAAGACCACCCACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_1_Barcode_1_template
GTGCACGAAGCTTCAGCTCCGGTGGGCGATATCTGTACCTACTCCGGAGCTGAAACTTCGTACACGAGAATGCCCGAGCCAGTCTTATTTCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_1_Barcode_2_template
GTGCACGAAGCTTCAGCTCCGGTGGGCGATATCTGTACCTACTCCGGAGCTGAAACTTCGTACACGCATCTGTCTCTAGGGTGACACAGACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_2_Barcode_1_template
GTGCACGAAGCTTCAGCCCGAGTACATTCGGGCTGAAACTTCGTACACCTGCGGCATGGGATGGACTGTAAATCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_2_Barcode_2_template
GTGCACGAAGCTTCAGCCCGAGTACATTCGGGCTGAAACTTCGTACACTACACCCGGACCCGGACACTTATCACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_3_Barcode_1_template
GTGCACGAAGCTTCAGCCGTGGGAGACTGAGCTGCCCATCGGCTGAAACTTCGTACACGCCACATCTCATTAAGCAGTCAACGCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_3_Barcode_2_template
GTGCACGAAGCTTCAGCCGTGGGAGACTGAGCTGCCCATCGGCTGAAACTTCGTACACCTAATTGCCGGCGCTGAAGGGACTACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_4_Barcode_1_template
GTGCACGAAGCTTCAGCGGTCGTGCACGACAACCGCTGAAACTTCGTACACAAAGCGTGTGCATACTATCAGTCTGCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_4_Barcode_2_template
GTGCACGAAGCTTCAGCGGTCGTGCACGACAACCGCTGAAACTTCGTACACGTCAGTCATGCTTGGACTGAAGGCTCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_5_Barcode_1_template
GTGCACGAAGCTTCAGCTCGTAGAGGGTTTTACCCTAAAATCCGAGCTGAAACTTCGTACACCTATATGCCGCTACTCGTAGTGTAGCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_5_Barcode_2_template
GTGCACGAAGCTTCAGCTCGTAGAGGGTTTTACCCTAAAATCCGAGCTGAAACTTCGTACACACACTCAACGGACGGAATTTGTCGTCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Multi_1_ComplexLevel_1_Barcode_1_template
GTGCACGAAGCTTCAGCTCCACCACGTATGATCGTAAATCTACACGCCCGTGTGTCGGTAGGTCGTGGGAGACTGAGCTGCCCATCGCATCCGAGTACATTCGGGTCTCACAGCGGGTGTCAATACGCGGTTTTTCGATCGTTTAGTGGAGCTGAAACTTCGTACACTTGACCAGTCCCGCCAGTAATCTACCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Multi_1_ComplexLevel_1_Barcode_2_template
GTGCACGAAGCTTCAGCTCCACCACGTATGATCGTAAATCTACACGCCCGTGTGTCGGTAGGTCGTGGGAGACTGAGCTGCCCATCGCATCCGAGTACATTCGGGTCTCACAGCGGGTGTCAATACGCGGTTTTTCGATCGTTTAGTGGAGCTGAAACTTCGTACACGCGCGTTAAATGAGCAGATTCATACCCCTATAGTGAGTCGTATTAGCGC
python FOREST.py ./Demo/Data/test.fa.txt -L 134 --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 2 --array
>Name-1_Motif_1_Barcode_1_array
TAGCCAACAACCGGCTAAAGTTTGCCCC
>Name-1_Motif_1_Barcode_2_array
CTAGTCAACCGTATAAGACCACCCACCC
>Name-2_Motif_1_Barcode_1_array
GAGAATGCCCGAGCCAGTCTTATTTCCC
>Name-2_Motif_1_Barcode_2_array
GCATCTGTCTCTAGGGTGACACAGACCC
>Name-2_Motif_2_Barcode_1_array
CTGCGGCATGGGATGGACTGTAAATCCC
>Name-2_Motif_2_Barcode_2_array
TACACCCGGACCCGGACACTTATCACCC
>Name-2_Motif_3_Barcode_1_array
GCCACATCTCATTAAGCAGTCAACGCCC
>Name-2_Motif_3_Barcode_2_array
CTAATTGCCGGCGCTGAAGGGACTACCC
>Name-2_Motif_4_Barcode_1_array
AAAGCGTGTGCATACTATCAGTCTGCCC
>Name-2_Motif_4_Barcode_2_array
GTCAGTCATGCTTGGACTGAAGGCTCCC
>Name-2_Motif_5_Barcode_1_array
CTATATGCCGCTACTCGTAGTGTAGCCC
>Name-2_Motif_5_Barcode_2_array
ACACTCAACGGACGGAATTTGTCGTCCC
>Name-2_Multi_1_ComplexLevel_1_Barcode_1_array
TTGACCAGTCCCGCCAGTAATCTACCCC
>Name-2_Multi_1_ComplexLevel_1_Barcode_2_array
GCGCGTTAAATGAGCAGATTCATACCCC
The standard output of FOREST.py --library --templates is formatted as a multiple FASTA file.
It can be uploaded or submitted directly to the oligo pool suppliers listed below.
* SureDesign (Agilent Technologies)
* Oligo pools (Twist Bioscience)
* oPool oligo pool (IDT)
The DNA barcode microarray can also be ordered from Agilent Technologies (SureDesign).
RNA structurome-wide discovery of functional interactions with a multiplexed motif library
Kaoru R. Komatsu, Toshiki Taya, Sora Matsumoto, Emi Miyashita, Shunnichi Kashida and Hirohide Saito
WO2018003809A1: Hirohide Saito, Kaoru R. Komatsu
US20200048685A1, US10435738B2, EP3093342B1, JP6594776B2: Hirohide Saito, Toshiki Taya, Shunnichi Kashida