# Exploring the sequence of Tn10

(c) 2020 Tom Röschinger. This work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT).

In [1]:
import wgregseq
%load_ext autoreload
%autoreload 2

import pandas as pd

First, we read the FASTA file obtained from GenBank.

In [2]:
# Skip the FASTA header line and join the remaining lines into one string
with open("tn10.fasta", "r") as file:
    data = file.read().split('\n')[1:]
sequence = "".join(data)
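
The cell above assumes a single-record FASTA file whose first line is the only header. A slightly more defensive reader is sketched below. The helper name `parse_single_fasta` is hypothetical (it is not part of `wgregseq`), and the toy record is invented for illustration. It drops the header wherever it occurs, strips whitespace, and upper-cases the sequence.

```python
def parse_single_fasta(text):
    """Return the sequence of a single-record FASTA string.

    Hypothetical helper for illustration; not part of wgregseq.
    """
    lines = [line.strip() for line in text.splitlines()]
    headers = [line for line in lines if line.startswith(">")]
    if len(headers) != 1:
        raise ValueError(f"expected exactly one FASTA record, found {len(headers)}")
    # Join all non-empty, non-header lines into one upper-case string
    return "".join(
        line for line in lines if line and not line.startswith(">")
    ).upper()


# Toy record (invented), including a blank trailing line
example = ">toy record\nACGT\nttga\n\n"
toy_sequence = parse_single_fasta(example)
print(toy_sequence)  # ACGTTTGA
```

For the real file, one would pass the result of `file.read()` to the helper instead of the toy string.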

Organization of tetR/tetA regulation:
- two operators that can be bound independently by TetR
- tetA is repressed by both tetO1 and tetO2
- tetR is repressed only by tetO1
- the affinity of TetR for tetO2 is about twice that for tetO1

![](tn10_tet.png)

From GenBank, we can find the positions of *tetA* and *tetR*. The repressor gene is encoded on the opposite strand, so we need the sequence of the complementary strand if we are interested in the gene itself.

In [3]:
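# Exact positions from GenBank (1-based, inclusive)
tetR_pos = [4702, 5328]
tetA_pos = [5407, 6612]

tetA = sequence[tetA_pos[0]-1:tetA_pos[1]]
# tetR is encoded on the opposite strand, so take the reverse complement of its own region
tetR = wgregseq.complement_seq(sequence[tetR_pos[0]-1:tetR_pos[1]], rev=True)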
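
GenBank feature coordinates are 1-based and inclusive, whereas Python slices are 0-based and end-exclusive, which is why the slices above subtract 1 from the start only. The sketch below spells out that conversion together with a plain-Python reverse complement. Both helper names (`genbank_slice`, `reverse_complement`) are illustrative and not part of `wgregseq`, and the toy sequence is invented.

```python
# Translation table for complementing DNA, preserving case
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def genbank_slice(seq, start, end):
    """Extract a feature given 1-based, inclusive coordinates."""
    return seq[start - 1:end]


def reverse_complement(seq):
    """Complement every base, then reverse, to read the other strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


toy = "AAACCCGGGTTT"
feature = genbank_slice(toy, 4, 6)  # bases 4 through 6
print(feature)                      # CCC
print(reverse_complement("ATGC"))   # GCAT
```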

In [4]:
# Positions taken from Bertram 2008
tetO2_pos = [tetA_pos[0] - 28, tetA_pos[0] - 10]
tetO1_pos = [tetA_pos[0] - 58, tetA_pos[0] - 40]

tetO2 = sequence[tetO2_pos[0]:tetO2_pos[1]+1]
tetO1 = sequence[tetO1_pos[0]:tetO1_pos[1]+1]

In [5]:
tetO2

'TCCCTATCAGTGATAGAGA'

In [6]:
tetO1

'ACTCTATCATTGATAGAGT'

In [7]:
rev_tetO1 = wgregseq.complement_seq(tetO1, rev=True)
rev_tetO1

'ACTCTATCAATGATAGAGT'

In [8]:
rev_tetO2 = wgregseq.complement_seq(tetO2, rev=True)
rev_tetO2

'TCTCTATCACTGATAGGGA'
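
Comparing each operator with its reverse complement shows how close the sites are to perfect palindromes. A minimal sketch, using the four strings printed above as literals and a hypothetical helper `mismatches`:

```python
def mismatches(a, b):
    """Return the 0-based positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Sequences copied from the cell outputs above
tetO1 = "ACTCTATCATTGATAGAGT"
rev_tetO1 = "ACTCTATCAATGATAGAGT"
tetO2 = "TCCCTATCAGTGATAGAGA"
rev_tetO2 = "TCTCTATCACTGATAGGGA"

print(mismatches(tetO1, rev_tetO1))  # [9]
print(mismatches(tetO2, rev_tetO2))  # [2, 9, 16]
```

The central position (index 9 of 19) always differs, because a base cannot be its own complement. Apart from that, tetO1 is a perfect palindrome, while tetO2 deviates at one symmetric pair of positions.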

In [9]:
# TSS estimated from -10 region
tetA_TSS = tetO2_pos[0]+1
tetR_TSS1 = tetO2_pos[0] - 19
tetR_TSS2 = tetO1_pos[0] - 8

In [10]:
P_tetA = sequence[tetA_TSS-36:tetA_TSS+1]
P_tetA

'TTGACACTCTATCATTGATAGAGTTATTTTACCACTC'

In [11]:
P_tetR1 = wgregseq.complement_seq(sequence[tetO1_pos[1]-6:tetA_pos[0] - 8], rev=True)
P_tetR1

'TTCTCTATCACTGATAGGGAGTGGTAAAATAACTCTAT'

In [12]:
P_tetR2 = wgregseq.complement_seq(sequence[tetO1_pos[0]-8:tetO2_pos[0]-1], rev=True)
P_tetR2

'TGGTAAAATAACTCTATCAATGATAGAGTGTCAACAA'

In [13]:
len(P_tetA)

37

In [14]:
lavUV5 = 'TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGG'

## Constructs

All constructs that include a tet operator need to be integrated into a cell that expresses the tet repressor. However, the inserts that contain only the promoter should be observed in the absence of the repressor to identify the binding energy matrix for the -10/-35 regions.

### RegSeq tetA

This construct should be straightforward: one promoter and two operator binding sites.

In [15]:
regseq_tetA = sequence[tetA_TSS-115:tetA_TSS+45]
regseq_tetA

'GACCTCATTAAGCAGCTCTAATGCGCTGTTAATCACTTTACTTTTATCTAATCTAGACATCATTAATTCCTAATTTTTGTTGACACTCTATCATTGATAGAGTTATTTTACCACTCCCTATCAGTGATAGAGAAAAGTGAAATGAATAGTTCGACAAAGA'
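
To check that the 160 bp window contains both operators, one can search for them in the construct. The sketch below uses the sequences printed above as literals. Treating index 115 of the construct as the estimated TSS is an assumption that follows from the slice `tetA_TSS-115:tetA_TSS+45`; the reported offsets are plain index differences (0 is the TSS base), not conventional promoter numbering.

```python
regseq_tetA = "GACCTCATTAAGCAGCTCTAATGCGCTGTTAATCACTTTACTTTTATCTAATCTAGACATCATTAATTCCTAATTTTTGTTGACACTCTATCATTGATAGAGTTATTTTACCACTCCCTATCAGTGATAGAGAAAAGTGAAATGAATAGTTCGACAAAGA"
tetO1 = "ACTCTATCATTGATAGAGT"
tetO2 = "TCCCTATCAGTGATAGAGA"

# Assumption: the construct starts 115 bp upstream of the estimated TSS
tss_index = 115

positions = {}
for name, site in [("tetO1", tetO1), ("tetO2", tetO2)]:
    start = regseq_tetA.find(site)  # -1 would mean the site is absent
    end = start + len(site) - 1
    positions[name] = (start, end)
    print(name, "index", start, "to", end,
          "| offset from TSS", start - tss_index, "to", end - tss_index)
```

The two operator starts are 30 bp apart, matching the difference between `tetO1_pos[0]` and `tetO2_pos[0]`.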

### LacUV5 + individual operators downstream

In [16]:
lacUV5_tetO1 = lavUV5 + tetO1
lacUV5_tetO1

'TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACTCTATCATTGATAGAGT'
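
The mutagenesis calls that follow pass `site_start=-20`. Assuming this argument is interpreted like a Python negative index (an assumption about `wgregseq.mutations_det`, though it is consistent with the lowercase substitutions in the outputs), the mutated window is the 19 bp operator plus the last base of the promoter. A quick check with the sequences as literals:

```python
lacUV5 = "TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGG"
tetO1 = "ACTCTATCATTGATAGAGT"
construct = lacUV5 + tetO1

# Last 20 bases of the construct, as selected by a negative start index
window = construct[-20:]

print(len(lacUV5), len(tetO1), len(construct))  # 41 19 60
print(window)                                   # GACTCTATCATTGATAGAGT
print(window == lacUV5[-1] + tetO1)             # True
```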

In [17]:
mutants_single = wgregseq.mutations_det(lacUV5_tetO1, mut_per_seq=1, site_start=-20)
tetO1_df_single = pd.DataFrame({"seq":mutants_single})
tetO1_df_single["description"] = "lacUV5_tetO1 single mutant"
tetO1_df_single.head()

|  | seq | description |
|---|---|---|
| 0 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACTCT... | lacUV5_tetO1 single mutant |
| 1 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGaACTCT... | lacUV5_tetO1 single mutant |
| 2 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGcCTCT... | lacUV5_tetO1 single mutant |
| 3 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGAaTCT... | lacUV5_tetO1 single mutant |
| 4 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACaCT... | lacUV5_tetO1 single mutant |
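
For reference, an exhaustive single-substitution scan of a window can be written in a few lines. This is a generic sketch, not the implementation of `wgregseq.mutations_det`, whose ordering and number of returned mutants may differ. The helper name `single_substitutions` is hypothetical. Mutated bases are written in lowercase, following the convention visible in the output above.

```python
def single_substitutions(seq, site_start=0, alphabet="ACGT"):
    """List every single-base substitution from site_start to the end of seq.

    site_start follows Python indexing, so -20 selects the last 20 bases.
    Hypothetical helper for illustration; not part of wgregseq.
    """
    seq = seq.upper()
    mutants = []
    for i in range(len(seq))[site_start:]:
        for base in alphabet:
            if base != seq[i]:
                # Lowercase marks the substituted base
                mutants.append(seq[:i] + base.lower() + seq[i + 1:])
    return mutants


toy_mutants = single_substitutions("ACGT", site_start=-2)
print(toy_mutants)
# ['ACaT', 'ACcT', 'ACtT', 'ACGa', 'ACGc', 'ACGg']
```

A window of 20 bases yields 60 single mutants under this scheme (three alternatives per position).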


In [18]:
mutants_double = wgregseq.mutations_det(lacUV5_tetO1, mut_per_seq=2, num_mutants=100, site_start=-20)
tetO1_df_double = pd.DataFrame({"seq":mutants_double})
tetO1_df_double["description"] = "lacUV5_tetO1 double mutant"
tetO1_df_double.head()



|  | seq | description |
|---|---|---|
| 0 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACTCT... | lacUV5_tetO1 double mutant |
| 1 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGAaTgT... | lacUV5_tetO1 double mutant |
| 2 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGtACTCT... | lacUV5_tetO1 double mutant |
| 3 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACTCT... | lacUV5_tetO1 double mutant |
| 4 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGcACTCT... | lacUV5_tetO1 double mutant |


In [19]:
lacUV5_tetO2 = lavUV5 + tetO2
lacUV5_tetO2

'TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGTCCCTATCAGTGATAGAGA'

In [20]:
mutants_single = wgregseq.mutations_det(lacUV5_tetO2, mut_per_seq=1, site_start=-20)
tetO2_df_single = pd.DataFrame({"seq":mutants_single})
tetO2_df_single["description"] = "lacUV5_tetO2 single mutant"

mutants_double = wgregseq.mutations_det(lacUV5_tetO2, mut_per_seq=2, num_mutants=100, site_start=-20)
tetO2_df_double = pd.DataFrame({"seq":mutants_double})
tetO2_df_double["description"] = "lacUV5_tetO2 double mutant"
tetO2_df_double.head()

tet_df = pd.concat([tetO1_df_single, tetO1_df_double, tetO2_df_single, tetO2_df_double], ignore_index=True)
tet_df



|  | seq | description |
|---|---|---|
| 0 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACTCT... | lacUV5_tetO1 single mutant |
| 1 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGaACTCT... | lacUV5_tetO1 single mutant |
| 2 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGcCTCT... | lacUV5_tetO1 single mutant |
| 3 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGAaTCT... | lacUV5_tetO1 single mutant |
| 4 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACaCT... | lacUV5_tetO1 single mutant |
| ... | ... | ... |
| 313 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGTCCCT... | lacUV5_tetO2 double mutant |
| 314 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGTCCCT... | lacUV5_tetO2 double mutant |
| 315 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGTCCaT... | lacUV5_tetO2 double mutant |
| 316 | TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGTCCCT... | lacUV5_tetO2 double mutant |


### -10/-35 regions without operators

We could run this one right away, without first quantifying how many repressors are present.

In [21]:
PR1 = wgregseq.complement_seq(sequence[tetR_TSS1:tetR_TSS1+45], rev=True)
PR1

'TCACTTTTCTCTATCACTGATAGGGAGTGGTAAAATAACTCTATC'

In [22]:
PR2 = wgregseq.complement_seq(sequence[tetR_TSS2:tetR_TSS2+45], rev=True)
PR2

'ATAGGGAGTGGTAAAATAACTCTATCAATGATAGAGTGTCAACAA'

In [23]:
PA = sequence[tetA_TSS-45:tetA_TSS+1]
PA

'TAATTTTTGTTGACACTCTATCATTGATAGAGTTATTTTACCACTC'

In [24]:
wgregseq.mutations_rand(PR2, 1000, 0.1)


array(['ATAGGGAGTGGTAAAATAACTCTATCAATGATAGAGTGTCAACAA',
       'ATAGGGAGTGGTAcAATAACTCTcaCtATtATAGAGTGTCAcCAA',
       'ATAGGGAGTGGTAAAATAACTCTATCAATGATAGAGTGTCAACAA', ...,
       'ATAcGcAGTGGTAAAATAACTCTATCAATGATAaAGTGTCAcCAA',
       'ATAGGGAGTGGTAAAATAACTcTATCAATGATAGAGaGTtAACAA',
       'ATAcGtAGTGGTAAAATAACTCTtTCAATGATAGAGTGTgtACAA'], dtype='<U45')
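
The call above appears to draw 1000 variants of `PR2` in which each base is substituted with probability 0.1. This reading of the arguments is inferred from the output, not taken from the `wgregseq` documentation. A minimal stand-alone sketch of that idea, using only the standard library and a hypothetical helper `random_mutants`:

```python
import random


def random_mutants(seq, num_mutants, rate, seed=None):
    """Generate variants in which each base is substituted with probability `rate`.

    Substituted bases are written in lowercase. Hypothetical helper for
    illustration; not the wgregseq implementation.
    """
    rng = random.Random(seed)
    seq = seq.upper()
    mutants = []
    for _ in range(num_mutants):
        bases = []
        for base in seq:
            if rng.random() < rate:
                # Pick one of the other three bases uniformly at random
                base = rng.choice([b for b in "ACGT" if b != base]).lower()
            bases.append(base)
        mutants.append("".join(bases))
    return mutants


PR2 = "ATAGGGAGTGGTAAAATAACTCTATCAATGATAGAGTGTCAACAA"
library = random_mutants(PR2, 1000, 0.1, seed=0)

# Average number of substitutions per variant; expected value is 0.1 * 45 = 4.5
mean_mutations = sum(sum(c.islower() for c in m) for m in library) / len(library)
print(library[:3])
print(round(mean_mutations, 2))
```

With independent per-base substitution, some variants carry no mutation at all, which would explain why unmutated copies of the wild-type sequence appear in the output above.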

In [25]:
import numpy as np
np.empty(10, dtype=object)

array([None, None, None, None, None, None, None, None, None, None],
      dtype=object)