# Twist: ORBIT cloning_scheme

------


In [5]:
import pandas as pd
import Bio.Seq as Seq
import Bio.SeqIO

pd.options.display.max_colwidth = 200

The cloning scheme that we will use to get single stranded oligos with no PCR handle overhangs from the oligo pool comes from [this paper](https://pubs.acs.org/doi/10.1021/sb5001565) on 'MO-MAGE'. Note that there is detailed information in the supplement.

Their strategy was to amplify subpools of oligos as usual, but with a few clever modifications. First, when amplifying, the **reverse primer is modified to include a 5' phosphate group**. The strand carrying this 5' phosphate, i.e. the '-' strand, will be selectively degraded by lambda exonuclease ([neb link](https://www.neb.com/products/m0262-lambda-exonuclease)). Also, during that PCR it seems that the fwd primer has multiple 5' end PO bonds that selectively protect the '+' strand.

To cleave off the PCR handles, they used a uracil in the fwd primer to introduce a site for the USER enzyme, and they included a DpnII site in the reverse primer sequence. By annealing just the reverse primer you can create a double stranded template for DpnII, which will cleanly cleave off the 3' end of the '+' strand. See the diagram below:

<img align="center" src="mo_mage_diagram.png" alt="drawing" width="500"/>

In this way, they got ssDNA oligos with no overhangs. 

I will modify their approach slightly, mainly because the USER enzyme is quite expensive, and I would like to avoid it if possible. For my cloning scheme I will introduce two restriction sites: [BsrI](https://www.neb.com/products/r0527-bsri) and [DpnII](https://www.neb.com/products/r0543-dpnii). DpnII seemed to work well enough in their scheme for cleaving the 3' end. I chose BsrI for the 5' end because its recognition site (ACTGGN) still lets me place a T at the 3' end of the fwd handle, allowing us to fall back on the USER enzyme if this approach fails.

<img src="dpnII.png" alt="drawing" width="500"/><img src="bsrI.png" alt="drawing" width="500"/>

Further, both of these enzymes are quite cheap, and they both work in the NEB 3.1 buffer. They require different temperatures (37 °C for DpnII vs. 65 °C for BsrI), so we will need a two step protocol, but at least we don't need to switch buffers. Note that there is potential star activity for DpnII, but there should only be 1 possible cut site, so I think that's fine. I think that many different restriction enzymes could work for these two sites. For example, [NlaIII](https://www.neb.com/products/r0125-nlaiii) or [BsmI](https://www.neb.com/products/r0134-bsmi) could work for the 5' site as well, but they wouldn't allow us to cleanly include a 'T' as a backup plan. Still, they are good to keep in mind for the future.

For short recognition sequences, we can actually just find orthogonal Kosuri primers that already have the desired restriction site sequences. It would actually benefit this particular order to have longer sequences, to be closer in length to the reg-seq constructs. The only issue is that when we clean up this reaction we will be trying to purify a 128 nt oligo from ~20 nt oligos, which may already be difficult. The longer those flanking oligos are, the harder that step may be. That said, these PCR handles should have no homology to the genome and hopefully wouldn't affect anything even if they are electroporated directly into cells.

So, this notebook will find orthogonal primers that match our restriction sites (and add a T to the BsrI site) to append to the ORBIT sequences of interest.
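
To make the intended digestion concrete, here is a minimal sketch of the trimming on the '+' strand. It assumes that BsrI cuts one nt after its ACTGG site (ACTGGN^), that DpnII cuts just 5' of GATC, that the first ACTGG sits in the fwd handle, and that the last GATC starts the reverse handle. The function name `trim_handles` and the placeholder oligo are hypothetical, and the sketch ignores the second strand and the fact that the enzymes only cut double stranded regions.

```python
BSRI_SITE = "ACTGG"   # BsrI recognizes ACTGGN; top strand is cut after the N
DPNII_SITE = "GATC"   # DpnII cuts immediately 5' of GATC


def trim_handles(construct):
    """Split a '+' strand construct into (fwd handle, oligo, rev handle).

    Assumption: the first BsrI site belongs to the fwd handle and the
    last DpnII site marks the start of the reverse handle.
    """
    seq = construct.upper()
    bsri_start = seq.find(BSRI_SITE)
    dpnii_start = seq.rfind(DPNII_SITE)
    if bsri_start == -1 or dpnii_start == -1:
        raise ValueError("construct is missing a BsrI or DpnII site")
    left_cut = bsri_start + len(BSRI_SITE) + 1  # cut after ACTGG + N
    if left_cut > dpnii_start:
        raise ValueError("restriction sites are not in the expected order")
    return construct[:left_cut], construct[left_cut:dpnii_start], construct[dpnii_start:]


fwd_handle = "TTAATCTTAGGCCCCACTGG" + "T"   # skpp-391-F plus the extra T
rev_handle = "GATCGTTCGATTCTGGTTGG"         # reverse complement of skpp-350-R
oligo = "ACGTTGCA" * 4                      # hypothetical placeholder, not an ORBIT oligo

construct = fwd_handle + oligo + rev_handle
left, oligo_out, right = trim_handles(construct)
print(len(left), len(oligo_out), len(right))
```

With real constructs the same call should return fragments of 21 nt, the oligo length, and 20 nt, which are the sizes that need to be separated during cleanup.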

------

# Generate Orthogonal primers with RE sites

First, let's read in the reverse orthogonal primers. We'll go ahead and reverse complement these sequences, since we will need them in that format to append to the 3' end of the ORBIT sequences.
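
For reference, the reverse complement step can be sketched without Biopython. This is only an illustration of what `record.seq.reverse_complement()` returns for plain A/C/G/T sequences; the helper name `reverse_complement` is hypothetical and ambiguous bases are not handled.

```python
# Complement table for unambiguous DNA bases (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


rev_seq = "AAGTATCTTTCCTGTGCCCA"            # skpp-1-R
rev_seq_comp = reverse_complement(rev_seq)
print(rev_seq_comp)
```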

In [6]:
# Collect one row per FASTA record, then build the dataframe in a single step
rev_records = [
    {
        'rev_seq': str(record.seq),
        'rev_seq_comp': str(record.seq.reverse_complement()),
        'rev_primer_name': record.name,
    }
    for record in Bio.SeqIO.parse("reverse_finalprimers.fasta", "fasta")
]
df_rev = pd.DataFrame(rev_records)

In [7]:
df_rev

Unnamed: 0,rev_seq,rev_seq_comp,rev_primer_name
0,AAGTATCTTTCCTGTGCCCA,TGGGCACAGGAAAGATACTT,skpp-1-R
1,TGGTAGTAATAAGGGCGACC,GGTCGCCCTTATTACTACCA,skpp-2-R
2,AGGGGTATCGGATACTCAGA,TCTGAGTATCCGATACCCCT,skpp-3-R
3,ATCGATTCCCCGGATATAGC,GCTATATCCGGGGAATCGAT,skpp-4-R
4,TACTAACTGCTTCAGGCCAA,TTGGCCTGAAGCAGTTAGTA,skpp-5-R
...,...,...,...
2995,GTCCGTGTAGGATCGCCTTT,AAAGGCGATCCTACACGGAC,skpp-2996-R
2996,GACTCTAGTGCGGGTGGTAC,GTACCACCCGCACTAGAGTC,skpp-2997-R
2997,TTGACCAGGGTAAGCCGATC,GATCGGCTTACCCTGGTCAA,skpp-2998-R
2998,GATTCAAGACGGCACTCGGA,TCCGAGTGCCGTCTTGAATC,skpp-2999-R


Looks good. Now let's look for specific primers that end with the DpnII recognition site *GATC*.

In [8]:
df_rev_DpnII = df_rev.loc[df_rev['rev_seq'].str.endswith('GATC', na = False)]
df_rev_DpnII

Unnamed: 0,rev_seq,rev_seq_comp,rev_primer_name
349,CCAACCAGAATCGAACGATC,GATCGTTCGATTCTGGTTGG,skpp-350-R
468,GTGACATCACACGGTTGATC,GATCAACCGTGTGATGTCAC,skpp-469-R
527,AAGAGGGTCGTATTCCGATC,GATCGGAATACGACCCTCTT,skpp-528-R
861,CAGCTTTTGGACGATGGATC,GATCCATCGTCCAAAAGCTG,skpp-862-R
1584,AAAGCCCCACGGAATTGATC,GATCAATTCCGTGGGGCTTT,skpp-1585-R
1695,TCCGGCTCTCCCTTAAGATC,GATCTTAAGGGAGAGCCGGA,skpp-1696-R
1856,CGGCTAAGTGAAGTCCGATC,GATCGGACTTCACTTAGCCG,skpp-1857-R
1888,AACGGCAGGGATGAAAGATC,GATCTTTCATCCCTGCCGTT,skpp-1889-R
1910,ATCTTCGGAGGGGAGAGATC,GATCTCTCCCCTCCGAAGAT,skpp-1911-R
2389,GGCCGTTTAAGGGATCGATC,GATCGATCCCTTAAACGGCC,skpp-2390-R


Ok, there are 10 reverse primers that end with the restriction site. Because GATC is palindromic, the reverse complement of each primer (`rev_seq_comp`) starts with GATC.
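
Since the scheme relies on a single DpnII cut site per handle, it may be worth counting GATC occurrences in each candidate. The sketch below does that for the ten primers listed above; the helper `count_sites` is hypothetical and only checks the primer sequence itself, not the junction with the ORBIT oligo.

```python
DPNII_SITE = "GATC"


def count_sites(seq, site=DPNII_SITE):
    """Count occurrences of `site` in `seq`, allowing overlaps."""
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


# The ten reverse primers that end with GATC (from the table above)
candidates = {
    "skpp-350-R": "CCAACCAGAATCGAACGATC",
    "skpp-469-R": "GTGACATCACACGGTTGATC",
    "skpp-528-R": "AAGAGGGTCGTATTCCGATC",
    "skpp-862-R": "CAGCTTTTGGACGATGGATC",
    "skpp-1585-R": "AAAGCCCCACGGAATTGATC",
    "skpp-1696-R": "TCCGGCTCTCCCTTAAGATC",
    "skpp-1857-R": "CGGCTAAGTGAAGTCCGATC",
    "skpp-1889-R": "AACGGCAGGGATGAAAGATC",
    "skpp-1911-R": "ATCTTCGGAGGGGAGAGATC",
    "skpp-2390-R": "GGCCGTTTAAGGGATCGATC",
}

site_counts = {name: count_sites(seq) for name, seq in candidates.items()}
multi_site = [name for name, n in site_counts.items() if n > 1]
print(multi_site)
```

Only skpp-2390-R carries a second GATC, and it is not among the first seven reverse primers used later in this notebook.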

Now let's read in the fwd primers.

In [9]:
# Collect one row per FASTA record, then build the dataframe in a single step
fwd_records = [
    {
        'fwd_seq': str(record.seq),
        'fwd_primer_name': record.name,
    }
    for record in Bio.SeqIO.parse("forward_finalprimers.fasta", "fasta")
]
df_fwd = pd.DataFrame(fwd_records)

In [10]:
df_fwd

Unnamed: 0,fwd_seq,fwd_primer_name
0,ATATAGATGCCGTCCTAGCG,skpp-1-F
1,CCCTTTAATCAGATGCGTCG,skpp-2-F
2,TTGGTCATGTGCTTTTCGTT,skpp-3-F
3,GGGTGGGTAAATGGTAATGC,skpp-4-F
4,TCCGACGGGGAGTATATACT,skpp-5-F
...,...,...
2995,GTCGATCACCGCCCCTTTTA,skpp-2996-F
2996,CACGGAGGCAGCAAGACTTA,skpp-2997-F
2997,AGGTCGAAGTGTCGCGTAAA,skpp-2998-F
2998,TGTGCACTATCGATCACGGG,skpp-2999-F


Let's look for primers that end with the BsrI recognition site *ACTGG*.

In [11]:
df_fwd_BsrI = df_fwd.loc[df_fwd['fwd_seq'].str.endswith('ACTGG', na = False)]

df_fwd_BsrI

Unnamed: 0,fwd_seq,fwd_primer_name
390,TTAATCTTAGGCCCCACTGG,skpp-391-F
742,AGATTAGCTGCCGATACTGG,skpp-743-F
790,GATCCTTGACTACCGACTGG,skpp-791-F
1026,ATTTCTCCACTCCCAACTGG,skpp-1027-F
1490,AATGTCTTGCCCCTTACTGG,skpp-1491-F
2090,AAATCTTTGCCCTCCACTGG,skpp-2091-F
2473,TAAGCCCAATCTCCCACTGG,skpp-2474-F


Ok, there are 7 fwd primers that contain this ending sequence. 

Now let's add a 'T' to the 3' end so that a T is in place in case we need to use the USER enzyme. This nt fills the N position of the BsrI recognition site (ACTGGN), and cleavage of the '+' strand occurs directly after this T, hopefully leaving only our ORBIT sequence of interest.

In [12]:
# Take an explicit copy so the new column is not written to a slice of df_fwd
df_fwd_BsrI = df_fwd_BsrI.copy()
df_fwd_BsrI['fwd_seq_t'] = df_fwd_BsrI['fwd_seq'] + 'T'

df_fwd_BsrI


Unnamed: 0,fwd_seq,fwd_primer_name,fwd_seq_t
390,TTAATCTTAGGCCCCACTGG,skpp-391-F,TTAATCTTAGGCCCCACTGGT
742,AGATTAGCTGCCGATACTGG,skpp-743-F,AGATTAGCTGCCGATACTGGT
790,GATCCTTGACTACCGACTGG,skpp-791-F,GATCCTTGACTACCGACTGGT
1026,ATTTCTCCACTCCCAACTGG,skpp-1027-F,ATTTCTCCACTCCCAACTGGT
1490,AATGTCTTGCCCCTTACTGG,skpp-1491-F,AATGTCTTGCCCCTTACTGGT
2090,AAATCTTTGCCCTCCACTGG,skpp-2091-F,AAATCTTTGCCCTCCACTGGT
2473,TAAGCCCAATCTCCCACTGG,skpp-2474-F,TAAGCCCAATCTCCCACTGGT


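With that we can concatenate the fwd and reverse primer dataframes. There are only 7 fwd primers, so we pair them by position with the first 7 of the 10 reverse primers.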
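
The positional pairing can be illustrated with plain lists. This is just a sketch of the idea using the primer names from the tables above; `zip` stops at the shorter list, which mirrors taking only the first 7 reverse primers.

```python
fwd_names = [
    "skpp-391-F", "skpp-743-F", "skpp-791-F", "skpp-1027-F",
    "skpp-1491-F", "skpp-2091-F", "skpp-2474-F",
]
rev_names = [
    "skpp-350-R", "skpp-469-R", "skpp-528-R", "skpp-862-R", "skpp-1585-R",
    "skpp-1696-R", "skpp-1857-R", "skpp-1889-R", "skpp-1911-R", "skpp-2390-R",
]

# Pair the i-th fwd primer with the i-th reverse primer
pairs = list(zip(fwd_names, rev_names))

# Reverse primers left over once every fwd primer has a partner
unused_rev = rev_names[len(pairs):]

for fwd, rev in pairs:
    print(fwd, rev)
```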

In [13]:
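# Pair primers by position; keep only as many reverse primers as there are fwd primers (7)
n_pairs = len(df_fwd_BsrI)
df_fwd_rev = pd.concat([df_fwd_BsrI.reset_index(drop = True), df_rev_DpnII.iloc[:n_pairs].reset_index(drop = True)], axis = 1)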
df_fwd_rev

Unnamed: 0,fwd_seq,fwd_primer_name,fwd_seq_t,rev_seq,rev_seq_comp,rev_primer_name
0,TTAATCTTAGGCCCCACTGG,skpp-391-F,TTAATCTTAGGCCCCACTGGT,CCAACCAGAATCGAACGATC,GATCGTTCGATTCTGGTTGG,skpp-350-R
1,AGATTAGCTGCCGATACTGG,skpp-743-F,AGATTAGCTGCCGATACTGGT,GTGACATCACACGGTTGATC,GATCAACCGTGTGATGTCAC,skpp-469-R
2,GATCCTTGACTACCGACTGG,skpp-791-F,GATCCTTGACTACCGACTGGT,AAGAGGGTCGTATTCCGATC,GATCGGAATACGACCCTCTT,skpp-528-R
3,ATTTCTCCACTCCCAACTGG,skpp-1027-F,ATTTCTCCACTCCCAACTGGT,CAGCTTTTGGACGATGGATC,GATCCATCGTCCAAAAGCTG,skpp-862-R
4,AATGTCTTGCCCCTTACTGG,skpp-1491-F,AATGTCTTGCCCCTTACTGGT,AAAGCCCCACGGAATTGATC,GATCAATTCCGTGGGGCTTT,skpp-1585-R
5,AAATCTTTGCCCTCCACTGG,skpp-2091-F,AAATCTTTGCCCTCCACTGGT,TCCGGCTCTCCCTTAAGATC,GATCTTAAGGGAGAGCCGGA,skpp-1696-R
6,TAAGCCCAATCTCCCACTGG,skpp-2474-F,TAAGCCCAATCTCCCACTGGT,CGGCTAAGTGAAGTCCGATC,GATCGGACTTCACTTAGCCG,skpp-1857-R


And finally let's trim it down to just the sequences we will append to the beginning (`fwd_seq_t`) and end (`rev_seq_comp`) of the ORBIT oligos. In the future, we can return to this notebook to get the actual primer sequences we will use to amplify the ORBIT constructs.

In [14]:
df_fwd_rev = df_fwd_rev[['fwd_seq_t', 'rev_seq_comp', 'fwd_primer_name','rev_primer_name']]

df_fwd_rev

Unnamed: 0,fwd_seq_t,rev_seq_comp,fwd_primer_name,rev_primer_name
0,TTAATCTTAGGCCCCACTGGT,GATCGTTCGATTCTGGTTGG,skpp-391-F,skpp-350-R
1,AGATTAGCTGCCGATACTGGT,GATCAACCGTGTGATGTCAC,skpp-743-F,skpp-469-R
2,GATCCTTGACTACCGACTGGT,GATCGGAATACGACCCTCTT,skpp-791-F,skpp-528-R
3,ATTTCTCCACTCCCAACTGGT,GATCCATCGTCCAAAAGCTG,skpp-1027-F,skpp-862-R
4,AATGTCTTGCCCCTTACTGGT,GATCAATTCCGTGGGGCTTT,skpp-1491-F,skpp-1585-R
5,AAATCTTTGCCCTCCACTGGT,GATCTTAAGGGAGAGCCGGA,skpp-2091-F,skpp-1696-R
6,TAAGCCCAATCTCCCACTGGT,GATCGGACTTCACTTAGCCG,skpp-2474-F,skpp-1857-R


# Add primer sequences to ORBIT oligos

Now let's actually make our final TWIST constructs that contain our PCR handles, RE sites, and ORBIT targeting oligo.
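
As a sanity check on the layout, here is a small standalone sketch of how one construct is put together: lower-cased fwd handle (with the extra T), the ORBIT oligo, and the lower-cased reverse-complemented rev handle. The helper names `primer_number` and `build_construct` are hypothetical; the oligo is the first one from the first/last short set shown in the output below.

```python
def primer_number(primer_name):
    """Extract the integer index from a name like 'skpp-391-F'."""
    return int(primer_name.split("-")[1])


def build_construct(fwd_seq_t, oligo, rev_seq_comp):
    """Join handle + oligo + handle, with handles in lower case."""
    return fwd_seq_t.lower() + oligo + rev_seq_comp.lower()


fwd_seq_t = "TTAATCTTAGGCCCCACTGGT"       # skpp-391-F plus T
rev_seq_comp = "GATCGTTCGATTCTGGTTGG"     # reverse complement of skpp-350-R
oligo = (
    "AATCTCTCTGCAACCAAAGTGAACCAATGAGAGGCAACAAGAATG"
    "ggcttgtcgacgacggcggtctccgtcgtcaggatcat"
    "TGAGGGTGTTACATGAATTCATACTCAATTGCTGTCATCGGAGTG"
)

construct = build_construct(fwd_seq_t, oligo, rev_seq_comp)
forward_primer = (primer_number("skpp-391-F"), 0)
reverse_primer = (primer_number("skpp-350-R"), 0)
print(len(construct), forward_primer, reverse_primer)
```

Keeping the handles in lower case makes it easy to spot where each handle ends when inspecting the final sequences.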

In [15]:
df_1 = pd.read_csv("twist_orbit_tf_del_FL_short.csv")
df_2 = pd.read_csv("twist_orbit_tf_del_FL_long.csv")

df_3 = pd.read_csv("twist_orbit_tf_del_AO_short.csv")
df_4 = pd.read_csv("twist_orbit_tf_del_AO_long.csv")

In [46]:
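# Handles for this subpool: row 0 of the primer table, lower-cased so they stand out from the oligo
ends_1 = df_fwd_rev.iloc[0,:].str.lower()

# Full construct: fwd handle (+T) + ORBIT oligo + reverse-complemented rev handle
df_1['seq'] = ends_1['fwd_seq_t'] + df_1['oligo'] + ends_1['rev_seq_comp']

df_1['construct'] = 'orbit_TF_del_first_last_short'

# Record the primer numbers (e.g. 'skpp-391-f' -> 391) as (number, 0) tuples, one per row
df_1['forward_primers_0'] = [(int(ends_1['fwd_primer_name'].split('-')[1]), 0)] * len(df_1['construct'])
df_1['reverse_primers_0'] = [(int(ends_1['rev_primer_name'].split('-')[1]), 0)] * len(df_1['construct'])

# Keep only the columns needed for the Twist order
df_1_clean = df_1[['seq','construct','forward_primers_0','reverse_primers_0']]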
df_1_clean

Unnamed: 0,seq,construct,forward_primers_0,reverse_primers_0
0,ttaatcttaggccccactggtAATCTCTCTGCAACCAAAGTGAACCAATGAGAGGCAACAAGAATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGAGGGTGTTACATGAATTCATACTCAATTGCTGTCATCGGAGTGgatcgttcgattctggttgg,orbit_TF_del_first_last_short,"(391, 0)","(350, 0)"
1,ttaatcttaggccccactggtTATGCACAATAATGTTGTATCAACCACCATATCGGGTGACTTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAATCTCTGCCCCGTCGTTTCTGACGGCGGGGAAAATGTTGCTTAgatcgttcgattctggttgg,orbit_TF_del_first_last_short,"(391, 0)","(350, 0)"
2,ttaatcttaggccccactggtGATGAATGAGTTTTCTATAAACTTATACTTAATAATTAGAAGTTAatgatcctgacgacggagaccgccgtcgtcgacaagccCATGGTAACCTCTCATCTTACTTATGAAATTTTAATGTATTCTGTgatcgttcgattctggttgg,orbit_TF_del_first_last_short,"(391, 0)","(350, 0)"
3,ttaatcttaggccccactggtGCTTCGAAGAGAGACACTACCTGCAACAATCAGGAGCGCAATATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAAAAATTTAGCTAAACACATATGAATTTTCAGATGTGTTTTATCgatcgttcgattctggttgg,orbit_TF_del_first_last_short,"(391, 0)","(350, 0)"
4,ttaatcttaggccccactggtGGCTAAAATAGAATGAATCATCAATCCGCATAAGAAAATCCTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATCGGCTTTTTTAATCCCATACTTTTCCACAGGTAGATCCCAAgatcgttcgattctggttgg,orbit_TF_del_first_last_short,"(391, 0)","(350, 0)"
...,...,...,...,...
69,ttaatcttaggccccactggtTAAGGGCATCTGTTTTTTATATTCAAGAATGAAAAATTTTTGTCAatgatcctgacgacggagaccgccgtcgtcgacaagccCATTACCAATACCTTACATATATTACTCATTAATGTATGTGCGAAgatcgttcgattctggttgg,orbit_TF_del_first_last_short,"(391, 0)","(350, 0)"
70,ttaatcttaggccccactggtATATGAGTGTCGAATCCTTATCCAAAACAAGAGGTAACTCTCATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGAACAAATTTTATCAGGTGACGTTCCGTAAAAAGTTGTATGGAGgatcgttcgattctggttgg,orbit_TF_del_first_last_short,"(391, 0)","(350, 0)"
71,ttaatcttaggccccactggtAGCCATGCACCGTAGACCAGATAAGCTCAGCGCATCCGGCAGTTAatgatcctgacgacggagaccgccgtcgtcgacaagccCATTTCATACTTACCTTTTTGTACGTACTTACTAAAAGTAAGTTTgatcgttcgattctggttgg,orbit_TF_del_first_last_short,"(391, 0)","(350, 0)"
72,ttaatcttaggccccactggtGGTTATTTAACGGCGCGAGTGTAATCCTGCCAGTGCAAAAAATCAatgatcctgacgacggagaccgccgtcgtcgacaagccCATACATACTCCACTAGTTATCGTTGATTTTGTCCAACAACTTGTgatcgttcgattctggttgg,orbit_TF_del_first_last_short,"(391, 0)","(350, 0)"


In [47]:
ends_2 = df_fwd_rev.iloc[1,:].str.lower()

df_2['seq'] = ends_2['fwd_seq_t'] + df_2['oligo'] + ends_2['rev_seq_comp']

# NOTE: same label as df_1, although df_2 holds the first/last *long* oligos (see the file names)
df_2['construct'] = 'orbit_TF_del_first_last_short'

df_2['forward_primers_0'] = [(int(ends_2['fwd_primer_name'].split('-')[1]), 0)] * len(df_2['construct'])
df_2['reverse_primers_0'] = [(int(ends_2['rev_primer_name'].split('-')[1]), 0)] * len(df_2['construct'])

df_2_clean = df_2[['seq','construct','forward_primers_0','reverse_primers_0']]
df_2_clean

Unnamed: 0,seq,construct,forward_primers_0,reverse_primers_0
0,agattagctgccgatactggtCTATATTATGTGATCTAAATCACTTTTAAGTCAGAGTGAATAATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAATTCATATTGTACTGTTACGTTGTACAAACCTGTGCCAACGGGgatcaaccgtgtgatgtcac,orbit_TF_del_first_last_short,"(743, 0)","(469, 0)"
1,agattagctgccgatactggtGAGTCTGGCGGATGTCGACAGACTCTATTTTTTTATGCAGTTTTAatgatcctgacgacggagaccgccgtcgtcgacaagccCATGACGCCACCGATAACCGTTATTTATCAGACCAAAGAAACTGGgatcaaccgtgtgatgtcac,orbit_TF_del_first_last_short,"(743, 0)","(469, 0)"
2,agattagctgccgatactggtCGACGAAAATGTCCAGGAAAAATCCTGGAGTCAGATTCAGGGTTAatgatcctgacgacggagaccgccgtcgtcgacaagccCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTGTGgatcaaccgtgtgatgtcac,orbit_TF_del_first_last_short,"(743, 0)","(469, 0)"
3,agattagctgccgatactggtGTGGCTCTTGCCACGGTTCAGCATCGGCAAACAGATCCAACATTAatgatcctgacgacggagaccgccgtcgtcgacaagccCATAATCAGCTCCCTGGTTAAGGATAGCCTTTAGGCTGCCCGGTCgatcaaccgtgtgatgtcac,orbit_TF_del_first_last_short,"(743, 0)","(469, 0)"
4,agattagctgccgatactggtTTAGCGAGAACTGGTCTTTTATTCGCACTCAGGAGTACATGTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATTTTTAACCTTAACGAAGAGCTATATTAATAACGGCATCAGCgatcaaccgtgtgatgtcac,orbit_TF_del_first_last_short,"(743, 0)","(469, 0)"
...,...,...,...,...
221,agattagctgccgatactggtAAAGAATTTCGCCAGTTAATGCATCTTTAATCGGGAACTTTCATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAACGTCAGAAGGTTAATTCTGTTTCCAGCAGCGTCAGGATACTTgatcaaccgtgtgatgtcac,orbit_TF_del_first_last_short,"(743, 0)","(469, 0)"
222,agattagctgccgatactggtCGCGGAATAATCACGCAATTAACTAAACAAGGTTTAGTGAAGATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATGGCGCGATAACGTAGAAAGGCTTCCCGAAGGAAGCCTTGATgatcaaccgtgtgatgtcac,orbit_TF_del_first_last_short,"(743, 0)","(469, 0)"
223,agattagctgccgatactggtCTATGTGATCTCCATTTCGATTGATTTAGTGTTTATTGACGTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATTATAAAAAAAACTTATTATTTATTTTAGTTTTTATCAGTGGgatcaaccgtgtgatgtcac,orbit_TF_del_first_last_short,"(743, 0)","(469, 0)"
224,agattagctgccgatactggtTGACGATTTTCCCCGTTCCCGGTTGCTGTACCGGGAACGTATTTAatgatcctgacgacggagaccgccgtcgtcgacaagccCATTTCTCCAGCACTCTGGAGAAATAGGCAAGACATTGGCAGAAAgatcaaccgtgtgatgtcac,orbit_TF_del_first_last_short,"(743, 0)","(469, 0)"


In [48]:
ends_3 = df_fwd_rev.iloc[2,:].str.lower()

df_3['seq'] = ends_3['fwd_seq_t'] + df_3['oligo'] + ends_3['rev_seq_comp']

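# NOTE: same label as df_1, although df_3 holds the AO (avd_ovlp) short oligos (see the file names)
df_3['construct'] = 'orbit_TF_del_first_last_short'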

df_3['forward_primers_0'] = [(int(ends_3['fwd_primer_name'].split('-')[1]), 0)] * len(df_3['construct'])
df_3['reverse_primers_0'] = [(int(ends_3['rev_primer_name'].split('-')[1]), 0)] * len(df_3['construct'])

df_3_clean = df_3[['seq','construct','forward_primers_0','reverse_primers_0']]
df_3_clean

Unnamed: 0,seq,construct,forward_primers_0,reverse_primers_0
0,gatccttgactaccgactggtCTCTCTGCAACCAAAGTGAACCAATGAGAGGCAACAAGAATGAACggcttgtcgacgacggcggtctccgtcgtcaggatcatCAACGCTGTAAACTTATTTGAGGGTGTTACATGAATTCATACTCAgatcggaatacgaccctctt,orbit_TF_del_first_last_short,"(791, 0)","(528, 0)"
1,gatccttgactaccgactggtGCACAATAATGTTGTATCAACCACCATATCGGGTGACTTATGCGAggcttgtcgacgacggcggtctccgtcgtcaggatcatCTGTTCGACCAGGAGCTTTAATCTCTGCCCCGTCGTTTCTGACGGgatcggaatacgaccctctt,orbit_TF_del_first_last_short,"(791, 0)","(528, 0)"
2,gatccttgactaccgactggtAAACTTATACTTAATAATTAGAAGTTACATATCATCAGCTGTGTAatgatcctgacgacggagaccgccgtcgtcgacaagccAAGCATGGTAACCTCTCATCTTACTTATGAAATTTTAATGTATTCgatcggaatacgaccctctt,orbit_TF_del_first_last_short,"(791, 0)","(528, 0)"
3,gatccttgactaccgactggtTCGAAGAGAGACACTACCTGCAACAATCAGGAGCGCAATATGTCAggcttgtcgacgacggcggtctccgtcgtcaggatcatAGTAAGAACATTTGCAGTTAAAAATTTAGCTAAACACATATGAATgatcggaatacgaccctctt,orbit_TF_del_first_last_short,"(791, 0)","(528, 0)"
4,gatccttgactaccgactggtTAAAATAGAATGAATCATCAATCCGCATAAGAAAATCCTATGGAAggcttgtcgacgacggcggtctccgtcgtcaggatcatATGCGTACCATCAAGCCCTGATCGGCTTTTTTAATCCCATACTTTgatcggaatacgaccctctt,orbit_TF_del_first_last_short,"(791, 0)","(528, 0)"
...,...,...,...,...
69,gatccttgactaccgactggtATATTCAAGAATGAAAAATTTTTGTCATTCCTTATGCTCCTTACAatgatcctgacgacggagaccgccgtcgtcgacaagccCGCCATTACCAATACCTTACATATATTACTCATTAATGTATGTGCgatcggaatacgaccctctt,orbit_TF_del_first_last_short,"(791, 0)","(528, 0)"
70,gatccttgactaccgactggtTGAGTGTCGAATCCTTATCCAAAACAAGAGGTAACTCTCATGCTTggcttgtcgacgacggcggtctccgtcgtcaggatcatAATCTCAAAAGACGATACTGAACAAATTTTATCAGGTGACGTTCCgatcggaatacgaccctctt,orbit_TF_del_first_last_short,"(791, 0)","(528, 0)"
71,gatccttgactaccgactggtAGATAAGCTCAGCGCATCCGGCAGTTATGCCGCACGTTCATCCCGatgatcctgacgacggagaccgccgtcgtcgacaagccACTCATTTCATACTTACCTTTTTGTACGTACTTACTAAAAGTAAGgatcggaatacgaccctctt,orbit_TF_del_first_last_short,"(791, 0)","(528, 0)"
72,gatccttgactaccgactggtGTGTAATCCTGCCAGTGCAAAAAATCAACAACCACTCTTAACGCCatgatcctgacgacggagaccgccgtcgtcgacaagccATACATACATACTCCACTAGTTATCGTTGATTTTGTCCAACAACTgatcggaatacgaccctctt,orbit_TF_del_first_last_short,"(791, 0)","(528, 0)"


In [49]:
ends_4 = df_fwd_rev.iloc[3,:].str.lower()

df_4['seq'] = ends_4['fwd_seq_t'] + df_4['oligo'] + ends_4['rev_seq_comp']

# NOTE: same label as df_1, although df_4 holds the AO (avd_ovlp) long oligos (see the file names)
df_4['construct'] = 'orbit_TF_del_first_last_short'

df_4['forward_primers_0'] = [(int(ends_4['fwd_primer_name'].split('-')[1]), 0)] * len(df_4['construct'])
df_4['reverse_primers_0'] = [(int(ends_4['rev_primer_name'].split('-')[1]), 0)] * len(df_4['construct'])

df_4_clean = df_4[['seq','construct','forward_primers_0','reverse_primers_0']]
df_4_clean

Unnamed: 0,seq,construct,forward_primers_0,reverse_primers_0
0,atttctccactcccaactggtTATTATGTGATCTAAATCACTTTTAAGTCAGAGTGAATAATGGAAggcttgtcgacgacggcggtctccgtcgtcaggatcatGGGCGCGGGAAAGAGAAGTAATTCATATTGTACTGTTACGTTGTAgatccatcgtccaaaagctg,orbit_TF_del_first_last_short,"(1027, 0)","(862, 0)"
1,atttctccactcccaactggtCAGACTCTATTTTTTTATGCAGTTTTAACTTTGCAGATAGCCGCAatgatcctgacgacggagaccgccgtcgtcgacaagccAGCCATGACGCCACCGATAACCGTTATTTATCAGACCAAAGAAACgatccatcgtccaaaagctg,orbit_TF_del_first_last_short,"(1027, 0)","(862, 0)"
2,atttctccactcccaactggtAAAATCCTGGAGTCAGATTCAGGGTTATTCGTTAGTGGCAGGATTatgatcctgacgacggagaccgccgtcgtcgacaagccTGCCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTgatccatcgtccaaaagctg,orbit_TF_del_first_last_short,"(1027, 0)","(862, 0)"
3,atttctccactcccaactggtCAGCATCGGCAAACAGATCCAACATTACCTCTCCTCATTTTCAGCatgatcctgacgacggagaccgccgtcgtcgacaagccTTTCATAATCAGCTCCCTGGTTAAGGATAGCCTTTAGGCTGCCCGgatccatcgtccaaaagctg,orbit_TF_del_first_last_short,"(1027, 0)","(862, 0)"
4,atttctccactcccaactggtGCGAGAACTGGTCTTTTATTCGCACTCAGGAGTACATGTATGAGGggcttgtcgacgacggcggtctccgtcgtcaggatcatAGAGAACGCACTGTCGCCTGATTTTTAACCTTAACGAAGAGCTATgatccatcgtccaaaagctg,orbit_TF_del_first_last_short,"(1027, 0)","(862, 0)"
...,...,...,...,...
221,atttctccactcccaactggtGAATTTCGCCAGTTAATGCATCTTTAATCGGGAACTTTCATGAAAggcttgtcgacgacggcggtctccgtcgtcaggatcatAGCGCCCGTTTTCAGGGCTAACGTCAGAAGGTTAATTCTGTTTCCgatccatcgtccaaaagctg,orbit_TF_del_first_last_short,"(1027, 0)","(862, 0)"
222,atttctccactcccaactggtGGAATAATCACGCAATTAACTAAACAAGGTTTAGTGAAGATGAGAggcttgtcgacgacggcggtctccgtcgtcaggatcatGCGCAGTTACGACAGATTTGATGGCGCGATAACGTAGAAAGGCTTgatccatcgtccaaaagctg,orbit_TF_del_first_last_short,"(1027, 0)","(862, 0)"
223,atttctccactcccaactggtTGTGATCTCCATTTCGATTGATTTAGTGTTTATTGACGTATGTACggcttgtcgacgacggcggtctccgtcgtcaggatcatCGTGAGGTTAATCGTGATTGATTATAAAAAAAACTTATTATTTATgatccatcgtccaaaagctg,orbit_TF_del_first_last_short,"(1027, 0)","(862, 0)"
224,atttctccactcccaactggtCCGGTTGCTGTACCGGGAACGTATTTAATTCCCCTGCATCGCCCGatgatcctgacgacggagaccgccgtcgtcgacaagccTAGCATTTCTCCAGCACTCTGGAGAAATAGGCAAGACATTGGCAGgatccatcgtccaaaagctg,orbit_TF_del_first_last_short,"(1027, 0)","(862, 0)"


In [50]:
df_1_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_first_last_short.csv")
df_2_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_first_last_long.csv")
df_3_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_avd_ovlp_short.csv")
df_4_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_avd_ovlp_long.csv")

# Computational Environment

In [69]:
%load_ext watermark
%watermark -v -p wgregseq,numpy,pandas

Python implementation: CPython
Python version       : 3.8.5
IPython version      : 7.10.0

wgregseq: 0.0.1
numpy   : 1.18.1
pandas  : 1.1.5

