# Design Barcodes

    by Pu Zheng
    2018.3.12
    
This notebook designs orthogonal 30-mer, 4-digit barcodes by extending candidates from a previously published 25-mer design: [Design of 240,000 orthogonal 25mer DNA barcode probes](http://www.pnas.org/content/106/7/2289).

In [1]:
%run "E:\Users\puzheng\Documents\Startup_py3.py"
sys.path.append(r"E:\Users\puzheng\Documents")

import ImageAnalysis3 as ia
%matplotlib notebook

from ImageAnalysis3 import library_tools
from ImageAnalysis3.library_tools import _readout_folder
from ImageAnalysis3.library_tools import readouts


In [2]:
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC  # needs Biopython < 1.78; Bio.Alphabet was removed in 1.78
from Bio.SeqRecord import SeqRecord

## 0. Summarize existing readouts to file:
[cand_readouts.fasta](W:\Pu\Readouts\cand_readouts.fasta)

In [8]:
#readout_folder = r'W:\Pu\Readouts'
readout_folder = r'E:\Users\puzheng\Documents\Readouts'

In [59]:
existing_readouts = readouts._generate_existing_readouts(['Stvs.fasta','NDBs.fasta'], 
                                                         readout_folder=readout_folder)

- Start merging existing readouts from files: ['E:\\Users\\puzheng\\Documents\\Readouts\\Stvs.fasta', 'E:\\Users\\puzheng\\Documents\\Readouts\\NDBs.fasta']
-- 1326 readouts are loaded.
-- saving to file:E:\Users\puzheng\Documents\Readouts\cand_readouts.fasta
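
The merging step above can be sketched roughly as follows. This is only an illustration that uses the standard library, not the actual body of `readouts._generate_existing_readouts`. The helper names `read_fasta` and `merge_fasta`, the duplicate-skipping rule, and the toy file names are all assumptions.

```python
from pathlib import Path
import tempfile


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def merge_fasta(in_paths, out_path):
    """Concatenate records from several FASTA files, skipping repeated sequences."""
    seen, merged = set(), []
    for path in in_paths:
        for header, seq in read_fasta(path):
            if seq not in seen:  # assumption: identical sequences are kept once
                seen.add(seq)
                merged.append((header, seq))
    with open(out_path, "w") as handle:
        for header, seq in merged:
            handle.write(f">{header}\n{seq}\n")
    return merged


# Toy usage with two temporary files (names and sequences are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "a.fasta")
    a.write_text(">r1\nACGT\n>r2\nTTTT\n")
    b = Path(tmp, "b.fasta")
    b.write_text(">r3\nacgt\n>r4\nGGCC\n")
    merged = merge_fasta([a, b], Path(tmp, "merged.fasta"))
```

In the toy input, `r3` repeats the sequence of `r1` in lower case, so it is dropped after upper-casing.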


## 1. Select from 25-mer candidates
Download [bc25mer.240k.fasta](C:\Users\puzheng\Documents\Barcodes\bc25mer.240k.fasta) from the Elledge lab website:
[link](http://elledgelab.med.harvard.edu/?page_id=638)

In [60]:
from importlib import reload  # reload is not a builtin in Python 3
reload(library_tools)
selected_candidates = readouts.Search_Candidates('bc25mer.240k.fasta', total_cand=1000, 
                                                 readout_folder=readout_folder)

- Start selecting readout candidates from E:\Users\puzheng\Documents\Readouts\bc25mer.240k.fasta,
	filtering with E:\Users\puzheng\Documents\Readouts\cand_readouts.fasta 
--- processing: TATGAGGACGAATCTCCCGCTTATA
--- candidate:0 CCAAATATGAGGACGAATCTCCCGCTTATA saved
--- processing: GGTCTTGACAAACGTGTGCTTGTAC
--- candidate:1 CCGTAGGTCTTGACAAACGTGTGCTTGTAC saved
--- processing: GTTTATCGGGCGTGGTGCTCGCATA
--- candidate:2 CTCGTGTTTATCGGGCGTGGTGCTCGCATA saved
--- processing: CCGATGTTGACGGACTAATCCTGAC
--- processing: TAGTAGTTCAGACGCCGTTAAGCGC
--- processing: CCGTACCTAGATACACTCAATTTGT
--- candidate:3 TCGGTCCGTACCTAGATACACTCAATTTGT saved
--- processing: GGGGTTCCGTTTTACATTCCAGGAA
--- processing: TATCCCGTGAAGCTTGAGTGGAATC
--- candidate:4 CCCTATATCCCGTGAAGCTTGAGTGGAATC saved
--- processing: CTGACGTGTGAGGCGCTAGAGCATA
--- processing: GGTATGGCACGCCTAATCTGGACAC
--- processing: GGATGCATGATCTAGGGCCTCGTCT
--- candidate:5 CTTAAGGATGCATGATCTAGGGCCTCGTCT saved
--- processing: GAGGTCTTTCATGCGTATAGTCACA
--- can

--- candidate:64 TGAGAGCGTTCAGCATTCATTACGTCTCAC saved
--- processing: GCTGAGGAAGCCCAATGTTCAGTAC
--- candidate:65 GCAATGCTGAGGAAGCCCAATGTTCAGTAC saved
--- processing: GTAACCTTAGCACGCCGGAGTGGAG
--- candidate:66 TACTAGTAACCTTAGCACGCCGGAGTGGAG saved
--- processing: GCGATTAGTTCTGTTGCTAAACCAG
--- candidate:67 CCACAGCGATTAGTTCTGTTGCTAAACCAG saved
--- processing: CCTTATGAGAGCTCGTTGTTCGGTG
--- candidate:68 CCGATCCTTATGAGAGCTCGTTGTTCGGTG saved
--- processing: CTTAAAGGTGATTCACACGTGTGCC
--- candidate:69 TGCATCTTAAAGGTGATTCACACGTGTGCC saved
--- processing: TTACTGAGTAACGTTCTACCCCGAA
--- processing: ATACTTATTCCGCTCATTGCACAGG
--- candidate:70 TGCGTATACTTATTCCGCTCATTGCACAGG saved
--- processing: TGAAACCAATTTCACCTCAGCGGCG
--- candidate:71 AAGTTTGAAACCAATTTCACCTCAGCGGCG saved
--- processing: GCGATCGGTAGGTACCTTTTCAGTA
--- processing: CGACCACTCGCCTCCCGTTATGATC
--- processing: GTTAATGCTGTTTAGCGTAACCTCG
--- candidate:72 CTCTTGTTAATGCTGTTTAGCGTAACCTCG saved
--- processing: GTCTAATATGGCGTAGCTCAACGCG
--- proces

--- candidate:121 TCCGAATACCTGTTATCCTGGGATGATTTC saved
--- processing: ATCGTTCAGGAATGCCGTTCTCGAC
--- processing: TCATACGTAATGGAGGCTCGCCTTG
--- processing: ACGTCTAGTGTGGTACGGTATCCAC
--- processing: ATAGAACCCCTCAGGATCTCATGAA
--- processing: CCGTCTGACAGGGTTTGAATTGATC
--- candidate:122 CCTGTCCGTCTGACAGGGTTTGAATTGATC saved
--- processing: CCTCTGGAATAATTGTACGCATAGT
--- processing: GGACTGATATCGAACTAGTCTTCGT
--- processing: TAGCCGTTATCGATTAATCGTCCTG
--- processing: GCCGATTACACCGTTAAATAACCTG
--- candidate:123 TCGTAGCCGATTACACCGTTAAATAACCTG saved
--- processing: CGGGTATGCTTGCCTACTAACTGTT
--- processing: GGACAGGGGAACCTCGGATATTCGC
--- processing: CATCATGTGGGGCTTCGTATTACGG
--- processing: AGGTAGTCTTCAGCATAGGTCGCTG
--- candidate:124 ACCAAAGGTAGTCTTCAGCATAGGTCGCTG saved
--- processing: GTCTGAACTAGGTCTTTCTCGCGGC
--- candidate:125 AGTCAGTCTGAACTAGGTCTTTCTCGCGGC saved
--- processing: CATAAACTGGCGGAGTTGCACTGAT
--- candidate:126 CTTCTCATAAACTGGCGGAGTTGCACTGAT saved
--- processing: GTGGGCCATCTTCCTGCGTATCAA

--- candidate:182 GCTGACAGCGTATGAATGGCAGTGTTCCGC saved
--- processing: TCGAGACATTCCCAGGTTCATTATC
--- candidate:183 TGTGTTCGAGACATTCCCAGGTTCATTATC saved
--- processing: TGATTGGTCATGAGCGCGAACGTTT
--- processing: ACCTCTCGGCGGTGATGATAAATAT
--- candidate:184 ACCGAACCTCTCGGCGGTGATGATAAATAT saved
--- processing: GAAAGTCTGCAACCTCCGATTGAAT
--- candidate:185 TCCAAGAAAGTCTGCAACCTCCGATTGAAT saved
--- processing: GTGTCTTATCGGCAAGCGCTACCCA
--- processing: AGGCCCACGACGCATAAGTAATCTT
--- candidate:186 TTTCTAGGCCCACGACGCATAAGTAATCTT saved
--- processing: GGAGTTCATACTTACGTCTCTGTTG
--- candidate:187 ACCCAGGAGTTCATACTTACGTCTCTGTTG saved
--- processing: TCTCTAGTCGGCGTGATACATAGAG
--- processing: GGTCGGGCAATTAGTTAATAACCCG
--- candidate:188 CCGGTGGTCGGGCAATTAGTTAATAACCCG saved
--- processing: GTTAAGGATCGCCGAGCTAGTACAG
--- processing: TAGAACCGGAGATCCCAGCATACAA
--- candidate:189 GACGTTAGAACCGGAGATCCCAGCATACAA saved
--- processing: TCATTTCTCCACTCCCAACTGGTGG
--- processing: TGTTCCAGGTTTCCGTCCCAATAAG
--- processing

--- candidate:237 TTGCACGTCACGGAACTGGTCAGACCAAAT saved
--- processing: CTCGGGATTAGTAGGACTCCCTGCG
--- processing: GTATCCAGCTCCAGACGCGAACTTC
--- processing: GCATCTGTAATTTTCCTACGCGGCA
--- processing: GCAGAGATCATCGTACCCATGAGGG
--- candidate:238 CGCATGCAGAGATCATCGTACCCATGAGGG saved
--- processing: ACGTGCCGGCTCTCGCTTATTTGAA
--- processing: ATTCGAAGCACTGGGTTGAGTAGCC
--- candidate:239 CCGCAATTCGAAGCACTGGGTTGAGTAGCC saved
--- processing: TTGCGCTGGTCATCGTCCATACTAA
--- processing: AGTCGGTTTGATGGAGGTTTGAGGA
--- processing: GGAGGCATTTAGTAGGACGACCGGC
--- candidate:240 TCACTGGAGGCATTTAGTAGGACGACCGGC saved
--- processing: AATAGTGCTGTACACCTTTTAACCC
--- processing: TACTAAACGGCACAACCTGGAAAGA
--- candidate:241 CTACTTACTAAACGGCACAACCTGGAAAGA saved
--- processing: GAATGGTCCGCTTCTATTGCGCAGG
--- processing: GTTAGTACTCCCCTCAAGCTGGGAG
--- processing: TGAGTGTCTCCTCGGTTCGAGGCTA
--- candidate:242 GCGAATGAGTGTCTCCTCGGTTCGAGGCTA saved
--- processing: AATTTCCCTACGAACCTTGCGTTAC
--- processing: CCAAGATGGTTTTCACCGACCGAC

--- processing: CTTGCTATAGGGCGTGCGTAAGACA
--- candidate:291 CTGCACTTGCTATAGGGCGTGCGTAAGACA saved
--- processing: GACAATACACTACGCGGCAGCACTC
--- processing: AAGGGTTTACATAAGCGATGTCCTG
--- candidate:292 ACCCAAAGGGTTTACATAAGCGATGTCCTG saved
--- processing: CATTGGGTTCTACCATATGCTTAAA
--- processing: GTCGATTCTTATGGAACCTGCGGAC
--- candidate:293 TAGCTGTCGATTCTTATGGAACCTGCGGAC saved
--- processing: AGGTTCTGTTGTAGCGCCTGTCTAG
--- processing: AATTTCGCTGTCTACGTACTGCTAC
--- processing: CACTAGCTGGCAGAGACCTGCATCG
--- candidate:294 TATATCACTAGCTGGCAGAGACCTGCATCG saved
--- processing: ATAGCGTACACTTTGCGATCGGCCG
--- processing: GGTAGATAATGGTCCTACGAGACAT
--- candidate:295 TCCCTGGTAGATAATGGTCCTACGAGACAT saved
--- processing: CGGTTGAGGCATCTGATTCGCTGTC
--- candidate:296 CCTTACGGTTGAGGCATCTGATTCGCTGTC saved
--- processing: TTCGCTATTTGCCGGACTAGGAAAG
--- candidate:297 CATCTTTCGCTATTTGCCGGACTAGGAAAG saved
--- processing: ATGAACTGTGTGCGACATTGCCCGA
--- candidate:298 CGTCTATGAACTGTGTGCGACATTGCCCGA saved
--- processing

--- processing: AAGATGTTCGACCTACGCTCGTAGA
--- processing: TTATTCATGAACCGTGCCTATTTGT
--- candidate:354 CGGCTTTATTCATGAACCGTGCCTATTTGT saved
--- processing: CCGGTGTGAGTTGTTGTGATCCACA
--- candidate:355 CCTAACCGGTGTGAGTTGTTGTGATCCACA saved
--- processing: GTGTACCAACCCACTTGATCGACCC
--- processing: GTAAGGGCCAAAACGCCTTTCGAGT
--- processing: TGTTGCCTCTGGTCAGGGAATCACA
--- candidate:356 GGGCTTGTTGCCTCTGGTCAGGGAATCACA saved
--- processing: AGTCCACTGCCCGATGAACGAATCC
--- processing: CTACAACGCAGATTACAACCTCAGT
--- processing: CTTGTACTTGGGCTTGAGCGGCATG
--- candidate:357 CTCGACTTGTACTTGGGCTTGAGCGGCATG saved
--- processing: TCTGGTTTAGTCTTACCTGGCTATT
--- candidate:358 GCCTATCTGGTTTAGTCTTACCTGGCTATT saved
--- processing: CAAAGGGAGTTATCCTTGTCTCCTG
--- candidate:359 GCCTTCAAAGGGAGTTATCCTTGTCTCCTG saved
--- processing: GCTAGACGTGGGCCAATACATTAGG
--- candidate:360 ACCATGCTAGACGTGGGCCAATACATTAGG saved
--- processing: TACGGGTATAGTCGCGTCGAGTATC
--- processing: GTTGGGGTCCGGATCAACGTATAGC
--- processing: ATCCTATTTAG

--- candidate:410 ACCTAGGAGGTCGAAGCGCGTTACGCTTAG saved
--- processing: GTTCTTTAAGCCCCTCCAGAGTAAT
--- processing: CCCTCAACGTTCACAAACGGACCCT
--- processing: GCACAGCTACTCTCATATAGACCAA
--- candidate:411 GGGTAGCACAGCTACTCTCATATAGACCAA saved
--- processing: CACTACCAGTCATTGCTCGTAATGG
--- candidate:412 GCGTTCACTACCAGTCATTGCTCGTAATGG saved
--- processing: CAATGACTGGGCAACTGTCTACAGG
--- candidate:413 TATCTCAATGACTGGGCAACTGTCTACAGG saved
--- processing: GTCTTGCGTGCGTAACAACTAAATT
--- processing: GGACATGTAGGACTTTCAACCGCCA
--- candidate:414 AGAGAGGACATGTAGGACTTTCAACCGCCA saved
--- processing: GTATACATTCGGCGGGCCACTGTTC
--- processing: AGAGGCTAGAACTGGTGCTGATCCC
--- candidate:415 GTCGTAGAGGCTAGAACTGGTGCTGATCCC saved
--- processing: GGCGTTTACAATATCGACTCTTCAA
--- candidate:416 CAGGTGGCGTTTACAATATCGACTCTTCAA saved
--- processing: GTTGCACTAACGGTGGATTCCTGTA
--- processing: ATGAATGGTCCAAACGGATGTTTCC
--- candidate:417 TCTCAATGAATGGTCCAAACGGATGTTTCC saved
--- processing: CGCAGACCTACGGATCTTAGCGCTC
--- processing

--- processing: GAACTCCGTGGCTAAGAAGTAGAAT
--- processing: TGAGTATCCCAACCATCTGGTTTAG
--- processing: GGCTTGACACGCTTAGTAAGTGCAC
--- processing: GCGATTCTTTCATTCAGTAGGCGCC
--- candidate:475 TTCTAGCGATTCTTTCATTCAGTAGGCGCC saved
--- processing: GTGAGCTAGTTCTGTACTACCCTAC
--- candidate:476 GATTAGTGAGCTAGTTCTGTACTACCCTAC saved
--- processing: CTCAGTATGGCGTCTTGAAGTACGA
--- processing: TTGTCTTCAATAGGTCTAACCTTCC
--- candidate:477 GTGCATTGTCTTCAATAGGTCTAACCTTCC saved
--- processing: AGACGCTTGTAGTCGGTCACACAAT
--- processing: TAAAGATCACTCGTCAGGGTTGCCT
--- candidate:478 CATGATAAAGATCACTCGTCAGGGTTGCCT saved
--- processing: CTGATCGGTATTGACTTTTACGCGA
--- processing: GGTCAGTCTATAACTCCCGATAGGT
--- candidate:479 TCGGTGGTCAGTCTATAACTCCCGATAGGT saved
--- processing: AACTGTTGACTATAGCTCAGCCGTG
--- candidate:480 GGTCAAACTGTTGACTATAGCTCAGCCGTG saved
--- processing: GCAAGGTGTATCGCGCTTTAAAGCG
--- candidate:481 CCGGTGCAAGGTGTATCGCGCTTTAAAGCG saved
--- processing: CTCAACGATATCAGACTCATCTCTC
--- processing: GGCGTCACCAT

--- candidate:529 CGAGTTTGCTTTGAACTCGAATAACCACTG saved
--- processing: TGCCTGATGATATAAGTCTAAGCCC
--- candidate:530 CTGCATGCCTGATGATATAAGTCTAAGCCC saved
--- processing: TACGTAGATGATCTGTTTCGGGTAC
--- candidate:531 CCCGTTACGTAGATGATCTGTTTCGGGTAC saved
--- processing: CTGAGTGGACTTTGGAATTGCTCAC
--- candidate:532 CAGCTCTGAGTGGACTTTGGAATTGCTCAC saved
--- processing: ATCTTCAAGTCCGCAGGCATGGTTT
--- candidate:533 TCTAAATCTTCAAGTCCGCAGGCATGGTTT saved
--- processing: TAATGTCAGTGTTGCCGCACTTACT
--- candidate:534 ATCGATAATGTCAGTGTTGCCGCACTTACT saved
--- processing: AACCTGGACCCTGTGTATCGTTACT
--- candidate:535 GTGTAAACCTGGACCCTGTGTATCGTTACT saved
--- processing: ATCGAGGTCGCTTTTGTATGCATAC
--- processing: TAACGACGGTCAGTGGGTAACCTTA
--- candidate:536 ACCTATAACGACGGTCAGTGGGTAACCTTA saved
--- processing: AGGAAGGTAAGATTTGCAAGGCGCT
--- processing: GAATCCCTAATAAACAACGCCGGAA
--- candidate:537 ATGGTGAATCCCTAATAAACAACGCCGGAA saved
--- processing: GTTTGTTCTCCCGTACCATGGATGC
--- candidate:538 GTATAGTTTGTTCTCCCGTACCATG

--- processing: GTAGGGGTCATGAACTCACCGCCTG
--- processing: TTTTCTCCATCAGTCTCGACCCGAT
--- processing: GTTTCTGTACTCCTAATTACCGTGA
--- candidate:591 TGGCTGTTTCTGTACTCCTAATTACCGTGA saved
--- processing: GGCAGTTTGGCCAGTCTTGAAATCA
--- candidate:592 CAGCAGGCAGTTTGGCCAGTCTTGAAATCA saved
--- processing: GAGTAAGTAGGGCGGGTTTGGGAAT
--- processing: GGCGGTCGTACTAACCATAGGATAG
--- processing: ACAGCAGTGATTCTAACGACATAAC
--- candidate:593 CGGAAACAGCAGTGATTCTAACGACATAAC saved
--- processing: TGGATTGTCCCCGGCGTAGTACTTA
--- processing: AGTCTCGGGATGCAGCATTCAATTA
--- candidate:594 ACACTAGTCTCGGGATGCAGCATTCAATTA saved
--- processing: CCGTACCCTTCCGCTGACAAACTTC
--- processing: CCCGATCTCGTTTATGATGCTTCCC
--- processing: AGTTGTGCGTCAGCCCTATAAGCTC
--- candidate:595 ACTGAAGTTGTGCGTCAGCCCTATAAGCTC saved
--- processing: CATTCTTAAGCCGGTTAAGTTTACC
--- processing: TTTGGCGCATCAAGCTCGTCGAACA
--- processing: AGCTCTTATATATTGTCCAGCCGCA
--- candidate:596 TGCTTAGCTCTTATATATTGTCCAGCCGCA saved
--- processing: TACTCCTCATGAGCTAGAACACCA

--- processing: GGTGCCGGGTTCTTGAAACAAAACC
--- processing: CACGGTCAAATTGCGGGCTATAGGG
--- processing: ATCGTTATGCCCGATCAAGCGCTGT
--- processing: TTTTGCACTACATCAGACGTGTGCA
--- processing: ACGGTACTAAACCCTGACATGTTGG
--- candidate:642 GCATTACGGTACTAAACCCTGACATGTTGG saved
--- processing: TGGATCGGATTGCTTTCTATTACCG
--- candidate:643 GTCCATGGATCGGATTGCTTTCTATTACCG saved
--- processing: GTAGTACGGCTCGCTAGCTATGTTT
--- processing: ATCAGCGTTATCAGGGTTCGGACGA
--- candidate:644 GCCGAATCAGCGTTATCAGGGTTCGGACGA saved
--- processing: ACCCTGAATCGTGTTGTATCGCACT
--- processing: CAAATCTGTACTGACGCCGCTTAGG
--- candidate:645 ATAATCAAATCTGTACTGACGCCGCTTAGG saved
--- processing: GCGTGCTTTGATACATCTCGAAAAC
--- processing: GGATTTAGCGCTTTCCTAGCCGTCA
--- processing: GATACTTTGCGCTAGTCCTTTGACC
--- candidate:646 ATGGAGATACTTTGCGCTAGTCCTTTGACC saved
--- processing: GCAAACTGATTGGCTTATACCGGAG
--- candidate:647 CTCTAGCAAACTGATTGGCTTATACCGGAG saved
--- processing: ACAAACCGGCAGCGGGATTCTATCT
--- processing: AGTGACCATGCTATGAAAGCGATG

--- candidate:699 CAATAGTGTGGATCTGACTACTTTACCGCG saved
--- processing: TTCTCGTTTTCGGGTGTAGCCATGC
--- processing: ATCAGTAGTCCGACATTTCTCGAAG
--- processing: CCGCACCTTTTGATGGTGTTCATCC
--- processing: ATTAACTAAACGCTGGCTGTCCGTC
--- candidate:700 GTTATATTAACTAAACGCTGGCTGTCCGTC saved
--- processing: GATATAGCCCAGGGCACTACAGCCA
--- processing: GAGCGAATTGTAGTGTCGTATCCAC
--- candidate:701 CGCATGAGCGAATTGTAGTGTCGTATCCAC saved
--- processing: TTCAACATGATGCGCGTCGACTTTA
--- candidate:702 GTCCTTTCAACATGATGCGCGTCGACTTTA saved
--- processing: GTATCGTGAACGTTCCGTCATGCAA
--- processing: CGATTAGGCGTTTAGTCATTGCCTC
--- candidate:703 ACGCACGATTAGGCGTTTAGTCATTGCCTC saved
--- processing: AATTGCTTACAGGGTTCCATTCGGA
--- candidate:704 CAACAAATTGCTTACAGGGTTCCATTCGGA saved
--- processing: ACTGCTAGGTGTCTATACTGTTGCT
--- candidate:705 CATCTACTGCTAGGTGTCTATACTGTTGCT saved
--- processing: GGCGGTCGGTACTTAAAGCTAACAC
--- candidate:706 TCGTAGGCGGTCGGTACTTAAAGCTAACAC saved
--- processing: CGGAACTTACTATCCCAACGGCGAG
--- processing

--- candidate:752 ACCCTAAGGCAATGTAACACAAGCGTGGCG saved
--- processing: GCTGATCGAATGCCTCCTGTGAAAG
--- processing: TGCTATCGGAAACATCCGTCACATT
--- processing: AGAATTGAGGGCTTCTCACTTGGGC
--- candidate:753 CCTAAAGAATTGAGGGCTTCTCACTTGGGC saved
--- processing: GTACACAACGCGCCAGATACCATGC
--- processing: GCAATCAAGCCCTCATACTTTAGTT
--- processing: AGACCCATCGATGCGAATGCTTGAC
--- processing: TGTGTTGGCTTTCGCAGTGCCTACT
--- processing: AATGGATATGAGGCTGGCAACGCCT
--- candidate:754 TCCTTAATGGATATGAGGCTGGCAACGCCT saved
--- processing: TCCATCTCCACATCAGCTATTCTTT
--- processing: AGGGACATTGCGCTTATGCAACTGC
--- candidate:755 AGTCAAGGGACATTGCGCTTATGCAACTGC saved
--- processing: AGATAGATGTCGGCCCAGCTTTGCC
--- candidate:756 TATCTAGATAGATGTCGGCCCAGCTTTGCC saved
--- processing: TAACGACATTCGGCTGCTGGTCTCG
--- processing: AATCGCGCTAATATGCCAAGAGACG
--- processing: TCTACGGTGGCTTCATGCGTGATTG
--- candidate:757 ACCTATCTACGGTGGCTTCATGCGTGATTG saved
--- processing: TTCACGTAGACGGGGATGTTGCCTT
--- processing: CTCTTTACTGAACCGTTGGTGTGA

--- candidate:813 CGTCACTGCTGGGTTTGTACTATGTCCATC saved
--- processing: ACTTAGTTATAGGCGGGCCGTCGGT
--- processing: TGGACAATCACGCGACCAGCGTTAT
--- processing: TCAGGCTAGGGCTCATGAACCGTTA
--- processing: GAAAGGCTTAGTTTCCGGCCGTCCT
--- candidate:814 GCAATGAAAGGCTTAGTTTCCGGCCGTCCT saved
--- processing: TGTGGGTCTATTCTGCGCCTACGAA
--- candidate:815 CCTAATGTGGGTCTATTCTGCGCCTACGAA saved
--- processing: TCACTTGGACGGCAACCCGTTCTTA
--- candidate:816 TGATATCACTTGGACGGCAACCCGTTCTTA saved
--- processing: AATTACTCTGCACCGCACTGTCGAT
--- candidate:817 TTGGAAATTACTCTGCACCGCACTGTCGAT saved
--- processing: GGAGATACTAATGGCTTTCCCGCAC
--- candidate:818 GGGCTGGAGATACTAATGGCTTTCCCGCAC saved
--- processing: AAGTGAAGGTGCCGTTCTCTCGGTC
--- candidate:819 CTGTAAAGTGAAGGTGCCGTTCTCTCGGTC saved
--- processing: ATCGGCTTTAGTCTTTCACGCTAAT
--- processing: TACGTGCGATGTCCATAGTGCTTAG
--- processing: TAAACAGGCCGGGGACATCCATTCT
--- processing: GTAGAGTGTTAGTACGTCGAAGTCT
--- processing: GGGTTTACCGTACGATTGGTGCTAT
--- processing: TACTAGAATTG

--- processing: AGTCCCAAGCCACATCAGGTCCTTG
--- processing: AATAAGGCGAGGTGCGTAGCTAAGT
--- processing: ACGTACCGACTTATACCTATTGGTG
--- candidate:863 ACCATACGTACCGACTTATACCTATTGGTG saved
--- processing: CTAATTCTTTTCGCGTTGTGCTGCC
--- processing: TAGGCCAGTAACGTGTTAGACAGCG
--- candidate:864 CGCGATAGGCCAGTAACGTGTTAGACAGCG saved
--- processing: CGTTGCTTTTGCCGGTCGGTAAGAA
--- processing: GTCTCGAATTAAGGTGTACTCGTGC
--- candidate:865 CCATTGTCTCGAATTAAGGTGTACTCGTGC saved
--- processing: TTATGCCATGTCGTCATTACAGCTA
--- candidate:866 CCAGTTTATGCCATGTCGTCATTACAGCTA saved
--- processing: TACCGGTGTCCGAGAAATTTGGCAC
--- candidate:867 GCTGTTACCGGTGTCCGAGAAATTTGGCAC saved
--- processing: GTCCAGGGAGCCGTTCAATAGGTAT
--- processing: CTAATGTTGCCCTAAATCTGGGGAA
--- processing: GAACTCGGAGTACACGTCTTATATA
--- candidate:868 TCGCTGAACTCGGAGTACACGTCTTATATA saved
--- processing: ACTTTGAATGGGTAGCCCCGGATTT
--- processing: GCTAACTATACGCGAGCAGGGACGT
--- candidate:869 AACTTGCTAACTATACGCGAGCAGGGACGT saved
--- processing: GATCATATGAA

--- processing: TGTCTACTTACAAGGCTGTAGGCGA
--- candidate:917 TTCCATGTCTACTTACAAGGCTGTAGGCGA saved
--- processing: ATATCTTCCGTGGTAGGGCGACCGA
--- candidate:918 AACTTATATCTTCCGTGGTAGGGCGACCGA saved
--- processing: TCGTATATTGCACCCTAGGCTTCCT
--- candidate:919 TAGGTTCGTATATTGCACCCTAGGCTTCCT saved
--- processing: TAGCAACCTTCCGTAACAAGGCACC
--- processing: GTAAACGCCGGGGTGTAGTTATTTG
--- processing: CATTGAAACGACTGTAGTCTAGTGC
--- candidate:920 ACTCACATTGAAACGACTGTAGTCTAGTGC saved
--- processing: GCAGGATTATTTTCGGCCTCACAGG
--- processing: ACGGAGTGTCTTTGGAACCGTTTCT
--- processing: ACTTATCGCAGGTAACGCAGCAAGT
--- processing: AATACTACTATCGCACCGGGTACGT
--- candidate:921 AGTTTAATACTACTATCGCACCGGGTACGT saved
--- processing: TGTCTTCCTGAATCCGCACTTTGCT
--- candidate:922 GATGTTGTCTTCCTGAATCCGCACTTTGCT saved
--- processing: TGCACGCTATAACCGGCTTCAATCA
--- processing: TGTCATATAAGCGTCTAACACGGCC
--- candidate:923 GGCGTTGTCATATAAGCGTCTAACACGGCC saved
--- processing: CGTTGCGTCATCGGGAGTGAATGAG
--- processing: GTAATGCTCTG

--- candidate:978 TTCGAGAGGCTCTAGACTACTGCGTATCAT saved
--- processing: AGCGGAGTAAATCGCCATTGACTTC
--- processing: CACGACAAATTCCCTTCGTTGTCTC
--- processing: TCGTACAGAGCCCGCACTTGATTAT
--- candidate:979 TTACATCGTACAGAGCCCGCACTTGATTAT saved
--- processing: GACGTCATACACGGTTGGGCCGTAC
--- processing: ATATCAGGAGGCCTGCTGGCAAACC
--- candidate:980 TCAGAATATCAGGAGGCCTGCTGGCAAACC saved
--- processing: GGGGAGATTTAGAGGCCCAGATACA
--- processing: TTTGACTTCAGCGGACCGGCTTTAG
--- processing: CAGGGATATTCTTTCGTTTGTGCAC
--- candidate:981 CGGCTCAGGGATATTCTTTCGTTTGTGCAC saved
--- processing: AACAGTAAGCGTTTCCGGCCTGATG
--- candidate:982 GCAATAACAGTAAGCGTTTCCGGCCTGATG saved
--- processing: ATTCTTTCGAGGTGACTTCCTCAAT
--- candidate:983 CGAGAATTCTTTCGAGGTGACTTCCTCAAT saved
--- processing: TCGTTAGTGTGGGAGCGCAAGGATG
--- processing: GCCTAATGCACTTTCTTAATGGACC
--- candidate:984 CGGTTGCCTAATGCACTTTCTTAATGGACC saved
--- processing: GTTACAATAATTGAGGCCGCGGTGC
--- candidate:985 CTCTTGTTACAATAATTGAGGCCGCGGTGC saved
--- processing
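
The log above shows each saved candidate as a 25-mer with five extra bases prepended (for example `CCAAA` + `TATGAGGACGAATCTCCCGCTTATA`), and some 25-mers being processed without a candidate being saved. The sketch below illustrates one way such a step could work. It is not the actual logic of `readouts.Search_Candidates`: the random prefix choice, the GC window, the shared k-mer test and all parameter values are assumptions.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def extend_candidate(core25, forbidden, k=12, prefix_len=5,
                     gc_range=(0.4, 0.6), max_tries=200, rng=None):
    """Prepend a random prefix to a 25-mer; return None if no prefix passes.

    Assumed rules: GC fraction within gc_range, and no k-mer shared with
    `forbidden` (k-mers of existing readouts and their reverse complements).
    """
    rng = rng or random.Random(0)
    for _ in range(max_tries):
        prefix = "".join(rng.choice("ACGT") for _ in range(prefix_len))
        cand = prefix + core25
        if not gc_range[0] <= gc_fraction(cand) <= gc_range[1]:
            continue
        if kmers(cand, k) & forbidden:
            continue
        return cand
    return None


# Made-up "existing readout" used only to build the forbidden k-mer set.
existing = ["ATGCATGCATGCATGCATGCATGCATGCAT"]
forbidden = set()
for seq in existing:
    forbidden |= kmers(seq, 12) | kmers(reverse_complement(seq), 12)

core = "TATGAGGACGAATCTCCCGCTTATA"  # first 25-mer in the log above
candidate = extend_candidate(core, forbidden)
rejected = extend_candidate(existing[0][:25], forbidden)  # overlaps an existing readout
```

With these assumptions, a 25-mer that already shares a 12-mer with an existing readout can never be rescued by a prefix, which would match the "processing" lines that are not followed by a "saved" line.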

## 2. Filter barcode candidates by BLAST against the genome

In [63]:
genome_kept_readouts = readouts.Filter_Readouts_by_Genome(readout_folder=readout_folder)

Query_1 27
Query_1 27
hard count: 0
soft count: 81
Query_2 34
Query_2 34
hard count: 1
Filtered out by hard threshold.
Query_3 27
Query_3 27
hard count: 0
soft count: 46
Query_4 28
Query_4 28
hard count: 0
soft count: 111
Filtered out by soft threshold count!
Query_5 32
Query_5 32
hard count: 1
Filtered out by hard threshold.
Query_6 27
Query_6 27
hard count: 0
soft count: 138
Filtered out by soft threshold count!
Query_7 31
Query_7 31
hard count: 0
soft count: 112
Filtered out by soft threshold count!
Query_8 30
Query_8 30
hard count: 1
Filtered out by hard threshold.
Query_9 29
Query_9 29
hard count: 0
soft count: 44
Query_10 28
Query_10 28
hard count: 0
soft count: 94
Query_11 39
Query_11 39
hard count: 1
Filtered out by hard threshold.
Query_12 30
Query_12 30
hard count: 0
soft count: 142
Filtered out by soft threshold count!
Query_13 34
Query_13 34
hard count: 1
Filtered out by hard threshold.
Query_14 31
Query_14 31
hard count: 1
Filtered out by hard threshold.
Query_15 29
Query_

Query_128 40
Query_128 40
hard count: 4
Filtered out by hard threshold.
Query_129 44
Query_129 44
hard count: 3
Filtered out by hard threshold.
Query_130 29
Query_130 29
hard count: 0
soft count: 60
Query_131 39
Query_131 39
hard count: 4
Filtered out by hard threshold.
Query_132 27
Query_132 27
hard count: 0
soft count: 117
Filtered out by soft threshold count!
Query_133 12
Query_133 12
hard count: 0
soft count: 7
Query_134 37
Query_134 37
hard count: 1
Filtered out by hard threshold.
Query_135 19
Query_135 19
hard count: 0
soft count: 17
Query_136 32
Query_136 32
hard count: 1
Filtered out by hard threshold.
Query_137 38
Query_137 38
hard count: 2
Filtered out by hard threshold.
Query_138 47
Query_138 47
hard count: 3
Filtered out by hard threshold.
Query_139 34
Query_139 34
hard count: 2
Filtered out by hard threshold.
Query_140 48
Query_140 48
hard count: 1
Filtered out by hard threshold.
Query_141 34
Query_141 34
hard count: 0
soft count: 133
Filtered out by soft threshold count!


Query_248 22
Query_248 22
hard count: 1
Filtered out by hard threshold.
Query_249 28
Query_249 28
hard count: 0
soft count: 25
Query_250 19
Query_250 19
hard count: 1
Filtered out by hard threshold.
Query_251 23
Query_251 23
hard count: 0
soft count: 67
Query_252 26
Query_252 26
hard count: 0
soft count: 95
Query_253 15
Query_253 15
hard count: 0
soft count: 13
Query_254 28
Query_254 28
hard count: 1
Filtered out by hard threshold.
Query_255 39
Query_255 39
hard count: 2
Filtered out by hard threshold.
Query_256 30
Query_256 30
hard count: 4
Filtered out by hard threshold.
Query_257 45
Query_257 45
hard count: 3
Filtered out by hard threshold.
Query_258 26
Query_258 26
hard count: 1
Filtered out by hard threshold.
Query_259 5
Query_259 5
hard count: 0
soft count: 1
Query_260 34
Query_260 34
hard count: 0
soft count: 87
Query_261 20
Query_261 20
hard count: 0
soft count: 22
Query_262 36
Query_262 36
hard count: 1
Filtered out by hard threshold.
Query_263 37
Query_263 37
hard count: 1
Fi

Query_370 31
Query_370 31
hard count: 1
Filtered out by hard threshold.
Query_371 29
Query_371 29
hard count: 1
Filtered out by hard threshold.
Query_372 31
Query_372 31
hard count: 0
soft count: 98
Query_373 32
Query_373 32
hard count: 1
Filtered out by hard threshold.
Query_374 22
Query_374 22
hard count: 0
soft count: 41
Query_375 21
Query_375 21
hard count: 0
soft count: 21
Query_376 21
Query_376 21
hard count: 0
soft count: 41
Query_377 32
Query_377 32
hard count: 0
soft count: 246
Filtered out by soft threshold count!
Query_378 27
Query_378 27
hard count: 1
Filtered out by hard threshold.
Query_379 27
Query_379 27
hard count: 0
soft count: 168
Filtered out by soft threshold count!
Query_380 25
Query_380 25
hard count: 0
soft count: 59
Query_381 27
Query_381 27
hard count: 0
soft count: 71
Query_382 18
Query_382 18
hard count: 0
soft count: 15
Query_383 60
Query_383 60
hard count: 38
Filtered out by hard threshold.
Query_384 30
Query_384 30
hard count: 1
Filtered out by hard thres

Query_486 28
Query_486 28
hard count: 2
Filtered out by hard threshold.
Query_487 13
Query_487 13
hard count: 0
soft count: 9
Query_488 23
Query_488 23
hard count: 1
Filtered out by hard threshold.
Query_489 27
Query_489 27
hard count: 2
Filtered out by hard threshold.
Query_490 27
Query_490 27
hard count: 1
Filtered out by hard threshold.
Query_491 33
Query_491 33
hard count: 1
Filtered out by hard threshold.
Query_492 37
Query_492 37
hard count: 0
soft count: 131
Filtered out by soft threshold count!
Query_493 83
Query_493 83
hard count: 5
Filtered out by hard threshold.
Query_494 27
Query_494 27
hard count: 0
soft count: 112
Filtered out by soft threshold count!
Query_495 21
Query_495 21
hard count: 0
soft count: 40
Query_496 32
Query_496 32
hard count: 0
soft count: 101
Filtered out by soft threshold count!
Query_497 30
Query_497 30
hard count: 2
Filtered out by hard threshold.
Query_498 38
Query_498 38
hard count: 3
Filtered out by hard threshold.
Query_499 21
Query_499 21
hard co

Query_602 32
Query_602 32
hard count: 1
Filtered out by hard threshold.
Query_603 27
Query_603 27
hard count: 2
Filtered out by hard threshold.
Query_604 23
Query_604 23
hard count: 0
soft count: 39
Query_605 41
Query_605 41
hard count: 0
soft count: 219
Filtered out by soft threshold count!
Query_606 28
Query_606 28
hard count: 0
soft count: 77
Query_607 32
Query_607 32
hard count: 1
Filtered out by hard threshold.
Query_608 25
Query_608 25
hard count: 2
Filtered out by hard threshold.
Query_609 87
Query_609 87
hard count: 1
Filtered out by hard threshold.
Query_610 27
Query_610 27
hard count: 0
soft count: 69
Query_611 42
Query_611 42
hard count: 0
soft count: 222
Filtered out by soft threshold count!
Query_612 17
Query_612 17
hard count: 1
Filtered out by hard threshold.
Query_613 32
Query_613 32
hard count: 0
soft count: 101
Filtered out by soft threshold count!
Query_614 31
Query_614 31
hard count: 2
Filtered out by hard threshold.
Query_615 33
Query_615 33
hard count: 0
soft coun

Query_722 40
Query_722 40
hard count: 3
Filtered out by hard threshold.
Query_723 28
Query_723 28
hard count: 0
soft count: 106
Filtered out by soft threshold count!
Query_724 44
Query_724 44
hard count: 1
Filtered out by hard threshold.
Query_725 43
Query_725 43
hard count: 4
Filtered out by hard threshold.
Query_726 29
Query_726 29
hard count: 1
Filtered out by hard threshold.
Query_727 41
Query_727 41
hard count: 1
Filtered out by hard threshold.
Query_728 23
Query_728 23
hard count: 1
Filtered out by hard threshold.
Query_729 42
Query_729 42
hard count: 3
Filtered out by hard threshold.
Query_730 27
Query_730 27
hard count: 1
Filtered out by hard threshold.
Query_731 29
Query_731 29
hard count: 1
Filtered out by hard threshold.
Query_732 31
Query_732 31
hard count: 0
soft count: 223
Filtered out by soft threshold count!
Query_733 37
Query_733 37
hard count: 1
Filtered out by hard threshold.
Query_734 42
Query_734 42
hard count: 2
Filtered out by hard threshold.
Query_735 24
Query_7

Query_836 29
Query_836 29
hard count: 0
soft count: 104
Filtered out by soft threshold count!
Query_837 22
Query_837 22
hard count: 0
soft count: 22
Query_838 40
Query_838 40
hard count: 1
Filtered out by hard threshold.
Query_839 24
Query_839 24
hard count: 1
Filtered out by hard threshold.
Query_840 27
Query_840 27
hard count: 1
Filtered out by hard threshold.
Query_841 39
Query_841 39
hard count: 2
Filtered out by hard threshold.
Query_842 32
Query_842 32
hard count: 2
Filtered out by hard threshold.
Query_843 33
Query_843 33
hard count: 3
Filtered out by hard threshold.
Query_844 28
Query_844 28
hard count: 1
Filtered out by hard threshold.
Query_845 43
Query_845 43
hard count: 1
Filtered out by hard threshold.
Query_846 23
Query_846 23
hard count: 1
Filtered out by hard threshold.
Query_847 37
Query_847 37
hard count: 1
Filtered out by hard threshold.
Query_848 31
Query_848 31
hard count: 0
soft count: 126
Filtered out by soft threshold count!
Query_849 35
Query_849 35
hard count:

Query_951 155
Query_951 155
hard count: 2
Filtered out by hard threshold.
Query_952 23
Query_952 23
hard count: 1
Filtered out by hard threshold.
Query_953 34
Query_953 34
hard count: 1
Filtered out by hard threshold.
Query_954 23
Query_954 23
hard count: 1
Filtered out by hard threshold.
Query_955 57
Query_955 57
hard count: 6
Filtered out by hard threshold.
Query_956 27
Query_956 27
hard count: 0
soft count: 143
Filtered out by soft threshold count!
Query_957 26
Query_957 26
hard count: 2
Filtered out by hard threshold.
Query_958 39
Query_958 39
hard count: 3
Filtered out by hard threshold.
Query_959 29
Query_959 29
hard count: 2
Filtered out by hard threshold.
Query_960 30
Query_960 30
hard count: 2
Filtered out by hard threshold.
Query_961 3
Query_961 3
hard count: 0
soft count: 0
Query_962 38
Query_962 38
hard count: 0
soft count: 151
Filtered out by soft threshold count!
Query_963 19
Query_963 19
hard count: 0
soft count: 26
Query_964 26
Query_964 26
hard count: 1
Filtered out by
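
In the visible part of the log, every query with a hard count of at least 1 is rejected. Among queries with a hard count of 0, soft counts up to 98 are kept and soft counts of 101 or more are rejected. A decision rule consistent with that pattern is sketched below. The definitions of the two counts (numbers of genome hits at or above a "hard" and a "soft" match length), the length cutoffs, and the exact soft-count limit of 100 are assumptions, not values read from `readouts.Filter_Readouts_by_Genome`.

```python
def passes_genome_filter(match_lengths, hard_len=17, soft_len=11,
                         soft_count_max=100):
    """Decide whether a candidate survives genome filtering.

    match_lengths: matched lengths of the candidate's genome hits.
    hard_len, soft_len, soft_count_max: assumed thresholds for illustration.
    Returns (keep, reason).
    """
    hard_count = sum(1 for n in match_lengths if n >= hard_len)
    if hard_count > 0:
        # Any long match is disqualifying on its own.
        return False, "hard"
    soft_count = sum(1 for n in match_lengths if n >= soft_len)
    if soft_count > soft_count_max:
        # Too many moderate matches also disqualifies the candidate.
        return False, "soft"
    return True, "kept"


kept = passes_genome_filter([12] * 98)
too_many_soft = passes_genome_filter([12] * 101)
one_hard_hit = passes_genome_filter([12] * 5 + [20])
```

Checking the hard threshold first mirrors the log, where no soft count is printed for queries that fail the hard threshold.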

## 4. Run RNA secondary structure testing

In [91]:
reload(library_tools)
structure_kept_readouts = readouts.Filter_Readouts_by_RNAfold(readout_folder=readout_folder)

[SeqRecord(seq=Seq('CCAAATATGAGGACGAATCTCCCGCTTATA', SingleLetterAlphabet()), id='cand_1', name='cand_1', description='cand_1 30mer_candidate', dbxrefs=[]),
 SeqRecord(seq=Seq('CTCGTGTTTATCGGGCGTGGTGCTCGCATA', SingleLetterAlphabet()), id='cand_3', name='cand_3', description='cand_3 30mer_candidate', dbxrefs=[]),
 SeqRecord(seq=Seq('CCTATGGTAACTGCGCATAGTTGGCTCTAT', SingleLetterAlphabet()), id='cand_9', name='cand_9', description='cand_9 30mer_candidate', dbxrefs=[]),
 SeqRecord(seq=Seq('CGCCTGGTTCTAAGTTTAGCGTAGCCGGTT', SingleLetterAlphabet()), id='cand_15', name='cand_15', description='cand_15 30mer_candidate', dbxrefs=[]),
 SeqRecord(seq=Seq('ATGATGGGTACATGCGCCTTACTCCTTGTG', SingleLetterAlphabet()), id='cand_17', name='cand_17', description='cand_17 30mer_candidate', dbxrefs=[]),
 SeqRecord(seq=Seq('ACAATTGCTTAATTTACGACCGATGCTGCG', SingleLetterAlphabet()), id='cand_24', name='cand_24', description='cand_24 30mer_candidate', dbxrefs=[]),
 SeqRecord(seq=Seq('GACGATCCATAGATTTCTCCGTGAGTCTT
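
A structure filter of this kind typically keeps candidates whose predicted minimum free energy (MFE) is not too negative. The sketch below parses RNAfold-style plain-text output (a header line, the sequence, then a dot-bracket structure followed by the energy in parentheses) and applies a cutoff. It does not call RNAfold and is not the body of `readouts.Filter_Readouts_by_RNAfold`. The cutoff `min_mfe=-6.0`, the helper names, and the toy records with their energies are invented for illustration.

```python
import re

# Dot-bracket structure, whitespace, then the energy in parentheses.
STRUCT_LINE = re.compile(r"^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(text):
    """Return {record_id: (structure, mfe)} from RNAfold-style text output."""
    results, current_id = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            continue
        match = STRUCT_LINE.match(line)
        if match and current_id is not None:
            results[current_id] = (match.group(1), float(match.group(2)))
    return results


def keep_unstructured(results, min_mfe=-6.0):
    """Keep record ids whose MFE is at or above the assumed cutoff."""
    return [rid for rid, (_, mfe) in results.items() if mfe >= min_mfe]


# Toy text in RNAfold's layout; sequences and energies are made up.
toy_output = """>toy_1
GGGAAAUCCC
(((....))) ( -1.20)
>toy_2
GGGGGCGCAAAAGCGCCCCC
((((((((....)))))))) (-12.50)
"""
parsed = parse_rnafold(toy_output)
kept_ids = keep_unstructured(parsed)
```

Here `toy_2` forms a long, stable hairpin in the toy data and is dropped, while `toy_1` passes the assumed cutoff.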

## 5. Select adaptors

In [None]:
import os  # needed for os.path.join below
from ImageAnalysis3.library_tools import readouts
readout_cand_file = r'E:\Users\puzheng\Documents\Adaptors\selected_candidates_genome_structure.fasta'
adaptor_site_file = r'E:\Users\puzheng\Documents\Adaptors\Adaptor_sites.fasta'
barcode_dir = r'W:\Pu\Readouts'
existing_readout_files = [os.path.join(barcode_dir, 'Stvs.fasta'),
                          os.path.join(barcode_dir, 'NDBs.fasta')]
saved_readouts = readouts.Check_adaptors_against_fasta(readout_cand_file, adaptor_site_file, existing_readout_files,
                                                       save=True, save_name='final_new_readouts.fasta', save_adaptors=True)

In [None]:
# save if considered necessary
final_readouts = readouts.Save_Readouts(cand_readout_file='final_new_readouts.fasta', existing_readout_file='NDBs.fasta')

In [None]:
reload(library_tools)
splitted_readouts = readouts.Split_readouts_into_channels(final_readouts, num_channels=3, save_name='NDB_new')
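
One simple way to distribute readouts over imaging channels is round-robin assignment, sketched below. Whether `readouts.Split_readouts_into_channels` uses this scheme is an assumption. The function name `split_into_channels` and the toy readout names are hypothetical.

```python
def split_into_channels(readout_list, num_channels=3):
    """Assign readouts to channels in round-robin order (assumed scheme)."""
    channels = [[] for _ in range(num_channels)]
    for index, readout in enumerate(readout_list):
        channels[index % num_channels].append(readout)
    return channels


# Toy usage with placeholder readout names.
toy_readouts = [f"readout_{i}" for i in range(7)]
channels = split_into_channels(toy_readouts, num_channels=3)
```

Round-robin assignment keeps the channel sizes within one readout of each other, which is convenient when each imaging round uses one readout per channel.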