KRK13/FOREST2020

RNA structure library design software (FOREST.py)

Software for extracting single- and multi-terminal loop motifs and designing an RNA structure library for FOREST (Folded RNA Element Profiling with Structure library).

KomatsuKR-2020-FOREST-Github

Installation

git clone https://github.com/KRK13/FOREST2020.git

Contents

  • Demo

    • Data
      • barcode25mer_100000.txt
      • test.fa
      • test_100.fa
    • Result_MotifExtraction
      • result.txt
      • result_100.txt
    • Result_LibraryGeneration
      • RNAstructurelibrary_bn3.txt
      • RNAstructurelibrary_bn3.dnatemp.txt
      • RNAstructurelibrary_bn3.array.txt
  • FOREST.py

  • README.md


Requirement

  • Python 3 (3.7.3)

Usage

Input

Positional arguments:
filename: Input file, a multiple FASTA file with RNA sequences and their secondary structures in dot-bracket format

Optional arguments:
-h, --help: Show the help message
-L: Limit the maximum length of extracted motifs (Default: 136)
-bn: Determine the number of barcodes (Default: 5)

Optional arguments for changing the output:
-lib, --library: Generate a list of RNA probes derived from the extracted RNA motifs (stdout)
-b, --barcodes: Specify a multiple FASTA file that contains DNA barcodes (required when --library is used)
-t, --templates: Generate a list of DNA templates that contain the reverse-complement DNA of each RNA probe and the T7 promoter (stdout)
-a, --array: Generate a list of DNA barcodes assigned to RNA probes (stdout)
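
The option list above can be mirrored with Python's argparse. The following is only a minimal sketch of such a parser, not the code used in FOREST.py: the function names and the `dest` names (`max_length`, `barcode_number`) are hypothetical, and the only constraint enforced is the documented one that --library needs --barcodes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="FOREST.py")
    parser.add_argument("filename",
                        help="multiple FASTA file with dot-bracket structures")
    parser.add_argument("-L", dest="max_length", type=int, default=136,
                        help="maximum length of extracted motifs")
    parser.add_argument("-bn", dest="barcode_number", type=int, default=5,
                        help="number of barcodes")
    parser.add_argument("-lib", "--library", action="store_true",
                        help="print RNA probes instead of RNA motifs")
    parser.add_argument("-b", "--barcodes", default=None,
                        help="multiple FASTA file with DNA barcodes")
    parser.add_argument("-t", "--templates", action="store_true",
                        help="print DNA templates")
    parser.add_argument("-a", "--array", action="store_true",
                        help="print DNA barcodes for the microarray")
    return parser


def parse_args(argv):
    """Parse argv and enforce the documented rule: --library needs --barcodes."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.library and args.barcodes is None:
        parser.error("--library requires --barcodes")
    return args
```

For example, `parse_args(["test.fa.txt", "-L", "134", "--library", "--barcodes", "barcodes.txt", "-bn", "2"])` yields `max_length` 134 and `barcode_number` 2, while omitting `-L` and `-bn` falls back to the documented defaults of 136 and 5.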

Output

RNA motifs (Default)

Library design mode (--library)

RNA probes generated by concatenating the following components.

  • RNA barcodes (the number of barcodes per motif is set with -bn; default: 5)
  • Common stem structure (forward)
  • RNA region (the extracted motif)
  • Stabilizing stem (reverse)

Template design mode (--library --templates)

DNA template of the RNA structure library for ordering an oligo pool.

DNA barcode microarray design mode (--library --array)

DNA barcodes assigned to RNA probes for ordering a DNA barcode microarray.

File Instructions

The input file contains names, RNA sequences, and RNA secondary structures represented in the dot-bracket format.

# test.fa.txt
>Name-1
CAGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGUACGAU
.(.(((..(((((((((((((...........)))).)).)))))))..).)).).....
>Name-2
AUCAGAAACUUUAAUUCCGGAGUAGGUACAGAUAUCGCCCACCGGAUAGCUCGCUAGGACUCUCUGCGCAACUUCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGAUAUGCGCACGGACAGGCACUAGUGAGGAGAAAGGGUUGUCGUGCACGACCAUCGGAUUUUAGGGUAAAACCCUCUACGAGCAGAGUUCUAUAGC
...((((.((((...(((((....((((....)))).....)))))...((((((((..((((.((((((...(((((((...((((((((...((((........((((.((((....((((......))))...((.((((.((......)))))))).........))))))))))))....))).))))).....)))))))..)))))).)))...)..)))))))).......((((((.....)))))).(((......(((((...)))))...)))..)))))))).....
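
Before running FOREST.py on a new file, it can help to check that every record follows the three-line layout shown above. The sketch below is a hypothetical pre-check, not part of FOREST.py; it assumes each record is exactly one header line, one sequence line, and one structure line, and that only `(`, `)` and `.` appear in the structure.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    name: str
    sequence: str
    structure: str


def is_balanced(structure: str) -> bool:
    """Return True if the dot-bracket string uses only '(', ')', '.' and is balanced."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch != ".":
            return False
    return depth == 0


def parse_records(text: str) -> List[Record]:
    """Parse header/sequence/structure triples and validate each record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("each record needs a header, a sequence and a structure line")
    records = []
    for i in range(0, len(lines), 3):
        header, sequence, structure = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"expected a '>' header, got: {header!r}")
        if len(sequence) != len(structure):
            raise ValueError(f"{header}: sequence and structure lengths differ")
        if not is_balanced(structure):
            raise ValueError(f"{header}: unbalanced dot-bracket structure")
        records.append(Record(header[1:], sequence, structure))
    return records
```

Calling `parse_records(open("test.fa.txt").read())` returns one `Record` per entry or raises a `ValueError` that names the offending header.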

Demo

Executed on a Mac Pro (Late 2013) with 64 GB of RAM.

cd FOREST2020

# Motif extraction
python FOREST.py ./Demo/Data/test.fa.txt -L 134 > result.txt
real    0m0.057s
user    0m0.028s
sys    0m0.012s 

# Motif extraction
python FOREST.py ./Demo/Data/test_100.fa.txt -L 134 > result_100.txt
real    0m4.188s
user    0m4.139s
sys    0m0.029s

# RNA probe design - Three different barcodes per RNA structure (Default: 5)
python FOREST.py ./Demo/Data/test_100.fa.txt --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 3 > RNAstructurelibrary_bn3.txt
real    0m4.756s
user    0m4.672s
sys    0m0.058s

# DNA template pool design
python FOREST.py ./Demo/Data/test_100.fa.txt --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 3 --templates > RNAstructurelibrary_bn3.dnatemp.txt
real    0m4.863s
user    0m4.782s
sys    0m0.055s

# Generation of the DNA barcode strand that captures RNA probes
python FOREST.py ./Demo/Data/test_100.fa.txt --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 3 --array > RNAstructurelibrary_bn3.array.txt
real	0m4.790s
user	0m4.709s
sys	0m0.055s
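
The demo runs can also be scripted. The sketch below is a hypothetical Python wrapper, assuming it is started from the FOREST2020 directory; `build_command` and `run_to_file` are illustrative helper names, and only the flags documented in the Usage section are passed through. Each run writes standard output to a single file and reports the elapsed wall-clock time.

```python
import subprocess
import sys
import time
from typing import List, Optional


def build_command(fasta: str,
                  max_length: Optional[int] = None,
                  barcodes: Optional[str] = None,
                  barcode_number: Optional[int] = None,
                  mode: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one FOREST.py run."""
    cmd = [sys.executable, "FOREST.py", fasta]
    if max_length is not None:
        cmd += ["-L", str(max_length)]
    if barcodes is not None:
        # Library design mode needs the barcode file.
        cmd += ["--library", "--barcodes", barcodes]
    if barcode_number is not None:
        cmd += ["-bn", str(barcode_number)]
    if mode in ("templates", "array"):
        cmd.append("--" + mode)
    return cmd


def run_to_file(cmd: List[str], out_path: str) -> float:
    """Run cmd, write its stdout to out_path, and return elapsed seconds."""
    start = time.perf_counter()
    with open(out_path, "w") as handle:
        subprocess.run(cmd, stdout=handle, check=True)
    return time.perf_counter() - start
```

For instance, `run_to_file(build_command("./Demo/Data/test_100.fa.txt", barcodes="./Demo/Data/barcode25mer_100000.txt", barcode_number=3, mode="array"), "RNAstructurelibrary_bn3.array.txt")` corresponds to the last demo command.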

Examples

(1) Terminal motif extraction. The length limit is set to 134 nt.

python FOREST.py test.fa.txt -L 134

>Name-1_Motif_1
AGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGU
(.(((..(((((((((((((...........)))).)).)))))))..).)).)
>Name-2_Motif_1
UCCGGAGUAGGUACAGAUAUCGCCCACCGGA
(((((....((((....)))).....)))))
>Name-2_Motif_2
CCGAAUGUACUCGG
((((......))))
>Name-2_Motif_3
CGAUGGGCAGCUCAGUCUCCCACG
((.((((.((......))))))))
>Name-2_Motif_4
GGUUGUCGUGCACGACC
((((((.....))))))
>Name-2_Motif_5
UCGGAUUUUAGGGUAAAACCCUCUACGA
(((......(((((...)))))...)))
>Name-2_Multi_1_ComplexLevel_1
UCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGA
(((((((...((((((((...((((........((((.((((....((((......))))...((.((((.((......)))))))).........))))))))))))....))).))))).....)))))))
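
One way to read the terminal motifs in this output: each hairpin is extended outward through bulges and internal loops until the enclosing base pair would close a multiloop or the motif would exceed the length limit. The sketch below implements that reading only; it is an assumption drawn from the example, not the algorithm in FOREST.py, and it does not attempt the multi-terminal (`Multi_..._ComplexLevel_...`) motifs. The function name and the stop-at-limit rule are hypothetical, and a balanced structure is assumed.

```python
from typing import List, Tuple


def terminal_motifs(sequence: str, structure: str,
                    max_len: int = 136) -> List[Tuple[str, str]]:
    """Return (sequence, structure) for each single-branch stem-loop."""
    parent = {}    # opening index -> opening index of the enclosing pair, or None
    partner = {}   # opening index -> matching closing index
    children = {}  # opening index -> number of pairs directly enclosed
    stack = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            parent[pos] = stack[-1] if stack else None
            children[pos] = 0
            if stack:
                children[stack[-1]] += 1
            stack.append(pos)
        elif ch == ")":
            partner[stack.pop()] = pos

    motifs = []
    for start in sorted(children):
        if children[start] != 0:
            continue  # only pairs that close a hairpin loop start a motif
        while True:
            outer = parent[start]
            # Stop at the exterior loop or at a pair that closes a multiloop.
            if outer is None or children[outer] != 1:
                break
            # Assumed rule: do not extend past the length limit.
            if partner[outer] - outer + 1 > max_len:
                break
            start = outer
        end = partner[start]
        if end - start + 1 <= max_len:
            motifs.append((sequence[start:end + 1], structure[start:end + 1]))
    return motifs
```

Applied to the Name-1 record with `max_len=134`, this returns a single motif identical to Name-1_Motif_1 above, because every base pair in that structure encloses exactly one branch.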

(2) Design the RNA structure library.

python FOREST.py ./Demo/Data/test.fa.txt -L 134 --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 2

>Name-1_Motif_1_Barcode_1
GGGGCAAACUUUAGCCGGUUGUUGGCUAGUGUACGAAGUUUCAGCAGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGUGCUGAAGCUUCGUGCAC
>Name-1_Motif_1_Barcode_2
GGGUGGGUGGUCUUAUACGGUUGACUAGGUGUACGAAGUUUCAGCAGGGCAUCGAUGCUAGCAGGGAUAUUUGGGGCUUGACUGGGCAUUGCGGAUCGUGCUGAAGCUUCGUGCAC
>Name-2_Motif_1_Barcode_1
GGGAAAUAAGACUGGCUCGGGCAUUCUCGUGUACGAAGUUUCAGCUCCGGAGUAGGUACAGAUAUCGCCCACCGGAGCUGAAGCUUCGUGCAC
>Name-2_Motif_1_Barcode_2
GGGUCUGUGUCACCCUAGAGACAGAUGCGUGUACGAAGUUUCAGCUCCGGAGUAGGUACAGAUAUCGCCCACCGGAGCUGAAGCUUCGUGCAC
>Name-2_Motif_2_Barcode_1
GGGAUUUACAGUCCAUCCCAUGCCGCAGGUGUACGAAGUUUCAGCCCGAAUGUACUCGGGCUGAAGCUUCGUGCAC
>Name-2_Motif_2_Barcode_2
GGGUGAUAAGUGUCCGGGUCCGGGUGUAGUGUACGAAGUUUCAGCCCGAAUGUACUCGGGCUGAAGCUUCGUGCAC
>Name-2_Motif_3_Barcode_1
GGGCGUUGACUGCUUAAUGAGAUGUGGCGUGUACGAAGUUUCAGCCGAUGGGCAGCUCAGUCUCCCACGGCUGAAGCUUCGUGCAC
>Name-2_Motif_3_Barcode_2
GGGUAGUCCCUUCAGCGCCGGCAAUUAGGUGUACGAAGUUUCAGCCGAUGGGCAGCUCAGUCUCCCACGGCUGAAGCUUCGUGCAC
>Name-2_Motif_4_Barcode_1
GGGCAGACUGAUAGUAUGCACACGCUUUGUGUACGAAGUUUCAGCGGUUGUCGUGCACGACCGCUGAAGCUUCGUGCAC
>Name-2_Motif_4_Barcode_2
GGGAGCCUUCAGUCCAAGCAUGACUGACGUGUACGAAGUUUCAGCGGUUGUCGUGCACGACCGCUGAAGCUUCGUGCAC
>Name-2_Motif_5_Barcode_1
GGGCUACACUACGAGUAGCGGCAUAUAGGUGUACGAAGUUUCAGCUCGGAUUUUAGGGUAAAACCCUCUACGAGCUGAAGCUUCGUGCAC
>Name-2_Motif_5_Barcode_2
GGGACGACAAAUUCCGUCCGUUGAGUGUGUGUACGAAGUUUCAGCUCGGAUUUUAGGGUAAAACCCUCUACGAGCUGAAGCUUCGUGCAC
>Name-2_Multi_1_ComplexLevel_1_Barcode_1
GGGGUAGAUUACUGGCGGGACUGGUCAAGUGUACGAAGUUUCAGCUCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGAGCUGAAGCUUCGUGCAC
>Name-2_Multi_1_ComplexLevel_1_Barcode_2
GGGGUAUGAAUCUGCUCAUUUAACGCGCGUGUACGAAGUUUCAGCUCCACUAAACGAUCGAAAAACCGCGUAUUGACACCCGCUGUGAGACCCGAAUGUACUCGGAUGCGAUGGGCAGCUCAGUCUCCCACGACCUACCGACACACGGGCGUGUAGAUUUACGAUCAUACGUGGUGGAGCUGAAGCUUCGUGCAC
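
Aligning the probes above with the motifs from example (1) suggests a fixed layout: a leading GGG, a 25-nt RNA barcode, a 17-nt forward stem, the motif, and a 17-nt reverse stem. The sketch below reassembles a probe under that reading. The constants are inferred from the example output, not taken from FOREST.py, and how the RNA barcode is derived from the barcode file is not covered here.

```python
# Constants inferred from the example output (assumptions, not read from FOREST.py).
LEADER = "GGG"
STEM_FORWARD = "GUGUACGAAGUUUCAGC"
STEM_REVERSE = "GCUGAAGCUUCGUGCAC"


def assemble_probe(barcode_rna: str, motif_rna: str) -> str:
    """Concatenate leader, barcode, forward stem, motif and reverse stem."""
    return LEADER + barcode_rna + STEM_FORWARD + motif_rna + STEM_REVERSE


# Rebuild Name-2_Motif_2_Barcode_1 from its barcode and motif.
probe = assemble_probe("AUUUACAGUCCAUCCCAUGCCGCAG", "CCGAAUGUACUCGG")
print(probe)
```

The printed sequence matches the Name-2_Motif_2_Barcode_1 entry listed above.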

(3) Generate ssDNA template sequences for the library in (2)

python FOREST.py ./Demo/Data/test.fa.txt -L 134 --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 2 --templates

>Name-1_Motif_1_Barcode_1_template
GTGCACGAAGCTTCAGCACGATCCGCAATGCCCAGTCAAGCCCCAAATATCCCTGCTAGCATCGATGCCCTGCTGAAACTTCGTACACTAGCCAACAACCGGCTAAAGTTTGCCCCTATAGTGAGTCGTATTAGCGC
>Name-1_Motif_1_Barcode_2_template
GTGCACGAAGCTTCAGCACGATCCGCAATGCCCAGTCAAGCCCCAAATATCCCTGCTAGCATCGATGCCCTGCTGAAACTTCGTACACCTAGTCAACCGTATAAGACCACCCACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_1_Barcode_1_template
GTGCACGAAGCTTCAGCTCCGGTGGGCGATATCTGTACCTACTCCGGAGCTGAAACTTCGTACACGAGAATGCCCGAGCCAGTCTTATTTCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_1_Barcode_2_template
GTGCACGAAGCTTCAGCTCCGGTGGGCGATATCTGTACCTACTCCGGAGCTGAAACTTCGTACACGCATCTGTCTCTAGGGTGACACAGACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_2_Barcode_1_template
GTGCACGAAGCTTCAGCCCGAGTACATTCGGGCTGAAACTTCGTACACCTGCGGCATGGGATGGACTGTAAATCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_2_Barcode_2_template
GTGCACGAAGCTTCAGCCCGAGTACATTCGGGCTGAAACTTCGTACACTACACCCGGACCCGGACACTTATCACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_3_Barcode_1_template
GTGCACGAAGCTTCAGCCGTGGGAGACTGAGCTGCCCATCGGCTGAAACTTCGTACACGCCACATCTCATTAAGCAGTCAACGCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_3_Barcode_2_template
GTGCACGAAGCTTCAGCCGTGGGAGACTGAGCTGCCCATCGGCTGAAACTTCGTACACCTAATTGCCGGCGCTGAAGGGACTACCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_4_Barcode_1_template
GTGCACGAAGCTTCAGCGGTCGTGCACGACAACCGCTGAAACTTCGTACACAAAGCGTGTGCATACTATCAGTCTGCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_4_Barcode_2_template
GTGCACGAAGCTTCAGCGGTCGTGCACGACAACCGCTGAAACTTCGTACACGTCAGTCATGCTTGGACTGAAGGCTCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_5_Barcode_1_template
GTGCACGAAGCTTCAGCTCGTAGAGGGTTTTACCCTAAAATCCGAGCTGAAACTTCGTACACCTATATGCCGCTACTCGTAGTGTAGCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Motif_5_Barcode_2_template
GTGCACGAAGCTTCAGCTCGTAGAGGGTTTTACCCTAAAATCCGAGCTGAAACTTCGTACACACACTCAACGGACGGAATTTGTCGTCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Multi_1_ComplexLevel_1_Barcode_1_template
GTGCACGAAGCTTCAGCTCCACCACGTATGATCGTAAATCTACACGCCCGTGTGTCGGTAGGTCGTGGGAGACTGAGCTGCCCATCGCATCCGAGTACATTCGGGTCTCACAGCGGGTGTCAATACGCGGTTTTTCGATCGTTTAGTGGAGCTGAAACTTCGTACACTTGACCAGTCCCGCCAGTAATCTACCCCTATAGTGAGTCGTATTAGCGC
>Name-2_Multi_1_ComplexLevel_1_Barcode_2_template
GTGCACGAAGCTTCAGCTCCACCACGTATGATCGTAAATCTACACGCCCGTGTGTCGGTAGGTCGTGGGAGACTGAGCTGCCCATCGCATCCGAGTACATTCGGGTCTCACAGCGGGTGTCAATACGCGGTTTTTCGATCGTTTAGTGGAGCTGAAACTTCGTACACGCGCGTTAAATGAGCAGATTCATACCCCTATAGTGAGTCGTATTAGCGC
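
Comparing these templates with the probes from example (2) suggests that each template is the reverse complement, written as DNA, of the probe preceded by a T7 promoter and a short GCGC extension. The sketch below reproduces one template under that reading. The `T7_PROMOTER` and `UPSTREAM` constants are inferred from the example output, and the function names are hypothetical.

```python
# Map RNA or DNA bases to their DNA complement.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

# Inferred from the example output (assumptions, not read from FOREST.py).
T7_PROMOTER = "TAATACGACTCACTATA"
UPSTREAM = "GCGC"


def reverse_complement_dna(seq: str) -> str:
    """Return the reverse complement of an RNA or DNA sequence as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def dna_template(probe_rna: str) -> str:
    """Antisense ssDNA template: reverse complement of GCGC + T7 promoter + probe."""
    return reverse_complement_dna(UPSTREAM + T7_PROMOTER + probe_rna)


probe = ("GGGAUUUACAGUCCAUCCCAUGCCGCAGGUGUACGAAGUUUCAGC"
         "CCGAAUGUACUCGGGCUGAAGCUUCGUGCAC")  # Name-2_Motif_2_Barcode_1
print(dna_template(probe))
```

The printed sequence matches the Name-2_Motif_2_Barcode_1_template entry listed above.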

(4) Generate DNA sequences for the DNA barcode microarray compatible with (2)

python FOREST.py ./Demo/Data/test.fa.txt -L 134 --library --barcodes ./Demo/Data/barcode25mer_100000.txt -bn 2 --array

>Name-1_Motif_1_Barcode_1_array
TAGCCAACAACCGGCTAAAGTTTGCCCC
>Name-1_Motif_1_Barcode_2_array
CTAGTCAACCGTATAAGACCACCCACCC
>Name-2_Motif_1_Barcode_1_array
GAGAATGCCCGAGCCAGTCTTATTTCCC
>Name-2_Motif_1_Barcode_2_array
GCATCTGTCTCTAGGGTGACACAGACCC
>Name-2_Motif_2_Barcode_1_array
CTGCGGCATGGGATGGACTGTAAATCCC
>Name-2_Motif_2_Barcode_2_array
TACACCCGGACCCGGACACTTATCACCC
>Name-2_Motif_3_Barcode_1_array
GCCACATCTCATTAAGCAGTCAACGCCC
>Name-2_Motif_3_Barcode_2_array
CTAATTGCCGGCGCTGAAGGGACTACCC
>Name-2_Motif_4_Barcode_1_array
AAAGCGTGTGCATACTATCAGTCTGCCC
>Name-2_Motif_4_Barcode_2_array
GTCAGTCATGCTTGGACTGAAGGCTCCC
>Name-2_Motif_5_Barcode_1_array
CTATATGCCGCTACTCGTAGTGTAGCCC
>Name-2_Motif_5_Barcode_2_array
ACACTCAACGGACGGAATTTGTCGTCCC
>Name-2_Multi_1_ComplexLevel_1_Barcode_1_array
TTGACCAGTCCCGCCAGTAATCTACCCC
>Name-2_Multi_1_ComplexLevel_1_Barcode_2_array
GCGCGTTAAATGAGCAGATTCATACCCC
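
Each array sequence above is 28 nt long and appears to be the DNA reverse complement of the first 28 nt of the corresponding probe, that is, the leading GGG plus the 25-nt barcode. The sketch below illustrates that relationship; the lengths are inferred from the example output, and the function names are hypothetical.

```python
# Map RNA or DNA bases to their DNA complement.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement_dna(seq: str) -> str:
    """Return the reverse complement of an RNA or DNA sequence as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def array_barcode(probe_rna: str, leader_len: int = 3, barcode_len: int = 25) -> str:
    """DNA strand complementary to the leader and barcode at the probe's 5' end."""
    return reverse_complement_dna(probe_rna[:leader_len + barcode_len])


probe = ("GGGAUUUACAGUCCAUCCCAUGCCGCAGGUGUACGAAGUUUCAGC"
         "CCGAAUGUACUCGGGCUGAAGCUUCGUGCAC")  # Name-2_Motif_2_Barcode_1
print(array_barcode(probe))
```

The printed sequence matches the Name-2_Motif_2_Barcode_1_array entry listed above.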

Appendix

The standard output of FOREST.py --library --templates is formatted as a multiple FASTA file.
It can be directly uploaded or submitted to the suppliers of an oligo pool listed below.

* SureDesign (Agilent Technologies)
* Oligo pools (Twist Bioscience)
* oPools Oligo Pools (IDT)

The DNA barcode microarray can also be ordered from Agilent Technologies (SureDesign).

Reference

RNA structurome-wide discovery of functional interactions with a multiplexed motif library
Kaoru R. Komatsu, Toshiki Taya, Sora Matsumoto, Emi Miyashita, Shunnichi Kashida and Hirohide Saito

WO2018003809A1: Hirohide Saito, Kaoru R. Komatsu
US20200048685A1, US10435738B2, EP3093342B1, JP6594776B2: Hirohide Saito, Toshiki Taya, Shunnichi Kashida
