# Exploring the sequence of Tn10

(c) 2020 Tom Röschinger. This work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT).

***

In this notebook we look at the transposon **Tn10**, which carries a natural system for the expression of *tetA*, regulated by the repressor encoded by *tetR*.

In [42]:
import wgregseq
%load_ext autoreload
%autoreload 2

import pandas as pd

from bokeh.plotting import figure
import bokeh.io

bokeh.io.output_notebook()

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


First we read the FASTA file obtained from GenBank.

In [47]:
with open("tn10.fasta", "r") as file:
    # Drop the header line and join the remaining lines into one string
    data = file.read().split('\n')[1:]
    sequence = "".join(data)
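
The cell above assumes a single-record file whose first line is the header. A slightly more defensive reader is sketched below, assuming a plain-text FASTA file with one or more records. The name `read_fasta` is hypothetical and not part of `wgregseq`; the example input is a toy record, not the Tn10 sequence.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {header: sequence}."""
    records = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(chunks) for name, chunks in records.items()}


# Usage with an in-memory toy example
example = io.StringIO(">toy record\nACGT\nacgt\n\n>second\nTTAA\n")
fasta_records = read_fasta(example)
```

For the file used here, one would pass the open file handle instead of the `StringIO` object and take the single value of the returned dictionary.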

Organization of tetR/tetA regulation:
- two operators that can be bound independently by TetR
- tetA is repressed by both tetO1 and tetO2
- tetR is repressed only by tetO1
- the affinity of TetR for tetO2 is about twice as high as for tetO1

![](tn10_tet.png)

From GenBank, we can find the positions of *tetA* and *tetR*. The repressor gene lies on the reverse strand, so we have to take the reverse complement whenever we are interested in its actual sequence.

In [48]:
# Exact positions from GenBank (1-based, inclusive)
tetR_pos = [4702, 5328]
tetA_pos = [5407, 6612]

For simple access, let's extract the region between the two genes, which contains all regulatory elements.

In [212]:
intergenic_region = sequence[5328:5407-1]
intergenic_region_rev = wgregseq.complement_seq(intergenic_region, rev=True)
intergenic_region

'TAATTCCTAATTTTTGTTGACACTCTATCATTGATAGAGTTATTTTACCACTCCCTATCAGTGATAGAGAAAAGTGAA'

In [144]:
len(intergenic_region)

78
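
GenBank feature coordinates are 1-based and inclusive, while Python slices are 0-based and half-open, which is why the slice above ends at `5407-1`. The conversion can be sketched as follows; `genbank_to_slice` and `gap_between` are hypothetical helpers introduced only for illustration.

```python
def genbank_to_slice(start, end):
    """Convert 1-based inclusive coordinates to a 0-based half-open slice."""
    return slice(start - 1, end)


def gap_between(upstream, downstream):
    """Slice covering the bases strictly between two features.

    Both arguments are (start, end) pairs in 1-based inclusive coordinates,
    and `upstream` must end before `downstream` starts.
    """
    return slice(upstream[1], downstream[0] - 1)


# Coordinates as given in the notebook
tetR_pos = (4702, 5328)
tetA_pos = (5407, 6612)

gap = gap_between(tetR_pos, tetA_pos)
gap_length = gap.stop - gap.start

tetR_slice = genbank_to_slice(*tetR_pos)
tetR_length = tetR_slice.stop - tetR_slice.start
```

With these coordinates the gap corresponds to `sequence[5328:5406]`, the same 78 bases extracted above.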

Let's extract the sequences for both operators.

In [218]:
tetO1 = intergenic_region[21:40]
print("tetO1: ", tetO1)

tetO2 = intergenic_region[51:70]
print("tetO2: ", tetO2)

tetO1:  ACTCTATCATTGATAGAGT
tetO2:  TCCCTATCAGTGATAGAGA


In [219]:
rev_tetO1 = wgregseq.complement_seq(tetO1, rev=True)
rev_tetO1

'ACTCTATCAATGATAGAGT'

In [220]:
rev_tetO2 = wgregseq.complement_seq(tetO2, rev=True)
rev_tetO2

'TCTCTATCACTGATAGGGA'
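
Comparing each operator with its reverse complement shows how close it is to a perfect inverted repeat. The sketch below counts the mismatching positions in plain Python; `reverse_complement` is a stand-in for `wgregseq.complement_seq(..., rev=True)`, and `palindrome_mismatches` is a hypothetical helper.

```python
# Translation table for the four unambiguous bases, preserving case
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def palindrome_mismatches(site):
    """Positions at which a site differs from its own reverse complement."""
    rc = reverse_complement(site)
    return [i for i, (a, b) in enumerate(zip(site, rc)) if a != b]


tetO1 = "ACTCTATCATTGATAGAGT"
tetO2 = "TCCCTATCAGTGATAGAGA"

mismatches_O1 = palindrome_mismatches(tetO1)
mismatches_O2 = palindrome_mismatches(tetO2)
```

Because both operators are 19 bp long, the central position (index 9) can never equal its own complement, so it always counts as a mismatch. By this measure tetO1 deviates only at the center, whereas tetO2 has two additional, symmetric mismatches.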

In [221]:
P_tetA = intergenic_region[16:53]
P_tetA

'TTGACACTCTATCATTGATAGAGTTATTTTACCACTC'

In [222]:
P_tetR1 = intergenic_region_rev[7:45]
P_tetR1

'TTCTCTATCACTGATAGGGAGTGGTAAAATAACTCTAT'

In [223]:
P_tetR2 = intergenic_region_rev[28:65]
P_tetR2

'TGGTAAAATAACTCTATCAATGATAGAGTGTCAACAA'

In [224]:
lacUV5 = 'TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGG'

## Constructs

All constructs that include a tet operator need to be integrated into a cell that expresses the tet repressor. The inserts that contain only the promoter, however, should be observed in the absence of the repressor to identify the binding energy matrix for the -10/-35 regions.

### LacUV5 + individual operators downstream

In [225]:
mutants_single = wgregseq.mutations_det(tetO1, mut_per_seq=1)
tetO1_df_single = pd.DataFrame({"seq":mutants_single})
tetO1_df_single["construct"] = "lacUV5_tetO1 single mutant"
tetO1_df_single.head()

Unnamed: 0,seq,construct
0,cCTCTATCATTGATAGAGT,lacUV5_tetO1 single mutant
1,AaTCTATCATTGATAGAGT,lacUV5_tetO1 single mutant
2,ACaCTATCATTGATAGAGT,lacUV5_tetO1 single mutant
3,ACTaTATCATTGATAGAGT,lacUV5_tetO1 single mutant
4,ACTCaATCATTGATAGAGT,lacUV5_tetO1 single mutant
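
The output above shows one substitution per sequence, with the mutated base written in lower case. To illustrate what exhaustive single-base mutagenesis involves, the sketch below enumerates all three substitutions at every position of a site. `single_mutants` is a hypothetical helper written for this note; it is not the implementation of `wgregseq.mutations_det`, whose exact choice of substitutions is not shown here.

```python
def single_mutants(seq, alphabet="ACGT"):
    """Enumerate every single-base substitution of `seq`.

    The mutated base is written in lower case, mirroring the convention
    visible in the notebook output.
    """
    mutants = []
    for i, wild_type in enumerate(seq):
        for base in alphabet:
            if base != wild_type.upper():
                mutants.append(seq[:i] + base.lower() + seq[i + 1:])
    return mutants


all_single = single_mutants("ACTCTATCATTGATAGAGT")
```

A site of length L yields 3L such variants, so the 19 bp operator gives 57 sequences under this scheme.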


In [226]:
mutants_double_O1 = wgregseq.mutations_det(tetO1, mut_per_seq=2, num_mutants=400, site_start=-20)
tetO1_df_double = pd.DataFrame({"seq":mutants_double_O1})
tetO1_df_double["construct"] = "lacUV5_tetO1 double mutant"
tetO1_df_double.head()

Unnamed: 0,seq,construct
0,ACTCcATCATTGATtGAGT,lacUV5_tetO1 double mutant
1,tCTCTATCgTTGATAGAGT,lacUV5_tetO1 double mutant
2,tCTCTATCtTTGATAGAGT,lacUV5_tetO1 double mutant
3,tCTCTATCATTGATAGtGT,lacUV5_tetO1 double mutant
4,AaTCTATCATTGATAGAGa,lacUV5_tetO1 double mutant


In [227]:
wgregseq.mutation_coverage(tetO1, mutants_double_O1)

array([0.14  , 0.12  , 0.095 , 0.12  , 0.0925, 0.1175, 0.0925, 0.1175,
       0.12  , 0.095 , 0.1075, 0.1075, 0.1025, 0.0925, 0.12  , 0.0725,
       0.075 , 0.105 , 0.1075])
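
The values above sum to 2.0, which is what one expects if each entry is the fraction of the 400 sequences mutated at that position and every sequence carries exactly two mutations (mean 2/19 ≈ 0.105 per position). A minimal sketch of such a coverage calculation is shown below. `position_coverage` is a hypothetical re-implementation for illustration, and it is an assumption that `wgregseq.mutation_coverage` computes the same quantity.

```python
def position_coverage(wild_type, mutants):
    """Fraction of mutants that differ from the wild type at each position."""
    counts = [0] * len(wild_type)
    for mutant in mutants:
        for i, (a, b) in enumerate(zip(wild_type, mutant)):
            # Compare case-insensitively: lower case only marks a mutation
            if a.upper() != b.upper():
                counts[i] += 1
    return [c / len(mutants) for c in counts]


wild_type = "ACTCTATCATTGATAGAGT"
# First four double mutants from the table above
toy_mutants = [
    "ACTCcATCATTGATtGAGT",
    "tCTCTATCgTTGATAGAGT",
    "tCTCTATCtTTGATAGAGT",
    "tCTCTATCATTGATAGtGT",
]
coverage = position_coverage(wild_type, toy_mutants)
```

For a library of double mutants the coverage always sums to 2, so a quick check of the sum catches sequences with too few or too many mutations.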

In [228]:
mutants_single_O2 = wgregseq.mutations_det(tetO2, mut_per_seq=1)
tetO2_df_single = pd.DataFrame({"seq":mutants_single_O2})
tetO2_df_single["construct"] = "lacUV5_tetO2 single mutant"

mutants_double_O2 = wgregseq.mutations_det(tetO2, mut_per_seq=2, num_mutants=400)
tetO2_df_double = pd.DataFrame({"seq":mutants_double_O2})
tetO2_df_double["construct"] = "lacUV5_tetO2 double mutant"
tetO2_df_double.head()


Unnamed: 0,seq,construct
0,TCCaTATCAGTGATAGtGA,lacUV5_tetO2 double mutant
1,TCCCTATCcGTGATAGtGA,lacUV5_tetO2 double mutant
2,TCCCTATggGTGATAGAGA,lacUV5_tetO2 double mutant
3,TCCCTATCAGTaATAGAtA,lacUV5_tetO2 double mutant
4,TCCCTATCAGTacTAGAGA,lacUV5_tetO2 double mutant


In [229]:
wgregseq.mutation_coverage(tetO2, mutants_double_O2)

array([0.1175, 0.12  , 0.0825, 0.0825, 0.12  , 0.11  , 0.1075, 0.12  ,
       0.0925, 0.1225, 0.0675, 0.11  , 0.1175, 0.125 , 0.11  , 0.1   ,
       0.085 , 0.1   , 0.11  ])

In [230]:
tet_df = pd.concat([tetO1_df_single, tetO1_df_double, tetO2_df_single, tetO2_df_double], ignore_index=True)
lacUV5 = 'TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGG'
tet_df.seq = [lacUV5 + seq for seq in tet_df.seq]
tet_df.head()

Unnamed: 0,seq,construct
0,TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGcCTCT...,lacUV5_tetO1 single mutant
1,TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGAaTCT...,lacUV5_tetO1 single mutant
2,TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACaCT...,lacUV5_tetO1 single mutant
3,TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACTaT...,lacUV5_tetO1 single mutant
4,TCGAGTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGACTCa...,lacUV5_tetO1 single mutant


In [231]:
tet_df['primer_added'] = False
tet_df.to_csv("../../../../data/twist_order/lacUV5_tetOx_single_double_mutants.csv")

## Native Promoter sequences

These constructs can be measured right away, without first quantifying how many repressors are present. We mutate the native promoter sequences at a rate of 0.1 per base. In addition, we scramble the -10/-35 region of the other *tetR* promoter to minimize interaction between the promoters.

In [232]:
randomized_R2 = list(intergenic_region_rev)
randomized_R2[50:66] = list(wgregseq.gen_rand_seq(16).lower())
randomized_R2 = "".join(randomized_R2)
randomized_R2

'TTCACTTTTCTCTATCACTGATAGGGAGTGGTAAAATAACTCTATCAATGtttatggacttgtcagAATTAGGAATTA'
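
When a window is overwritten by slice assignment on a list, the slice and the replacement must have the same length, otherwise every downstream position shifts. The sketch below wraps the scrambling step in a helper that derives the number of random bases from the window itself. `randomize_window` is hypothetical, and Python's `random` module stands in for `wgregseq.gen_rand_seq`.

```python
import random


def randomize_window(seq, start, end, rng=None):
    """Replace seq[start:end] with random lower-case bases of equal length."""
    rng = rng or random.Random()
    window = "".join(rng.choice("acgt") for _ in range(end - start))
    result = seq[:start] + window + seq[end:]
    # The replacement must not shift any position outside the window
    assert len(result) == len(seq)
    return result


# Reverse complement of the intergenic region, as used in the notebook
template = (
    "TTCACTTTTCTCTATCACTGATAGGGAGTGGTAAAATAACTCTATCAATG"
    "ATAGAGTGTCAACAAAAATTAGGAATTA"
)
scrambled = randomize_window(template, 50, 66, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the scrambled sequence reproducible, which is useful when the library has to be regenerated for an order.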

In [233]:
PR1_mutants = wgregseq.mutations_rand(randomized_R2, 1000, 0.1, site_start=6, site_end=46)
wgregseq.mutation_coverage(randomized_R2, PR1_mutants, site_start=6, site_end=46)

array([0.102, 0.084, 0.091, 0.091, 0.108, 0.095, 0.096, 0.067, 0.107,
       0.113, 0.094, 0.103, 0.075, 0.092, 0.085, 0.095, 0.1  , 0.11 ,
       0.086, 0.1  , 0.085, 0.104, 0.107, 0.087, 0.111, 0.099, 0.099,
       0.085, 0.089, 0.104, 0.103, 0.108, 0.086, 0.103, 0.092, 0.106,
       0.108, 0.113, 0.106, 0.11 ])
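
The coverage values scatter around the target rate of 0.1. With 1000 sequences, binomial sampling alone gives a standard deviation of about 0.0095 per position. The sketch below shows one way to mutate each base in a window independently at a fixed rate and then measure the per-position frequency. `mutate_window` is a hypothetical helper; how `wgregseq.mutations_rand` draws its mutations internally is not shown in this notebook.

```python
import random


def mutate_window(seq, rate, start, end, rng):
    """Mutate each base in seq[start:end] independently with probability `rate`."""
    out = list(seq)
    for i in range(start, end):
        if rng.random() < rate:
            # Choose one of the three other bases and mark it in lower case
            alternatives = [b for b in "ACGT" if b != seq[i].upper()]
            out[i] = rng.choice(alternatives).lower()
    return "".join(out)


rng = random.Random(1)
template = "TTGACACTCTATCATTGATAGAGTTATTTTACCACTC"
library = [mutate_window(template, 0.1, 5, 30, rng) for _ in range(2000)]

# Per-position mutation frequency inside the window
frequency = [
    sum(m[i] != template[i] for m in library) / len(library)
    for i in range(5, 30)
]
```

Bases outside the window are left untouched, so flanking sequence can be kept constant across the library.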

In [234]:
randomized_R1 = list(intergenic_region_rev)
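# Note: the slice 6:21 spans 15 positions but receives 16 random bases,
# so the result is 79 nt long and downstream positions shift by one
randomized_R1[6:21] = list(wgregseq.gen_rand_seq(16).lower())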
randomized_R1 = "".join(randomized_R1)
randomized_R1

'TTCACTtttgagcaattgatcaTAGGGAGTGGTAAAATAACTCTATCAATGATAGAGTGTCAACAAAAATTAGGAATTA'

In [235]:
PR2_mutants = wgregseq.mutations_rand(randomized_R1, 1000, 0.1, site_start=27, site_end=66)
wgregseq.mutation_coverage(randomized_R1, PR2_mutants, site_start=27, site_end=66)

array([0.097, 0.103, 0.112, 0.102, 0.098, 0.112, 0.105, 0.084, 0.094,
       0.106, 0.102, 0.103, 0.111, 0.092, 0.085, 0.096, 0.115, 0.114,
       0.089, 0.107, 0.109, 0.096, 0.107, 0.096, 0.097, 0.091, 0.103,
       0.106, 0.083, 0.098, 0.103, 0.097, 0.092, 0.094, 0.094, 0.094,
       0.099, 0.099, 0.091])

In [236]:
PA_mutants = wgregseq.mutations_rand(intergenic_region, 1000, 0.1, site_start=16, site_end=53)
wgregseq.mutation_coverage(intergenic_region, PA_mutants, site_start=16, site_end=53)

array([0.116, 0.094, 0.092, 0.107, 0.096, 0.094, 0.107, 0.105, 0.114,
       0.101, 0.11 , 0.114, 0.111, 0.09 , 0.096, 0.1  , 0.096, 0.101,
       0.102, 0.099, 0.087, 0.104, 0.08 , 0.1  , 0.099, 0.078, 0.098,
       0.09 , 0.1  , 0.102, 0.115, 0.091, 0.107, 0.083, 0.111, 0.093,
       0.113])

In [237]:
dfR1 = pd.DataFrame({'seq': PR1_mutants, 'primer_added' : False, 'construct' : "P_tetR1", 'note': "P_tetR2 -10 region randomized"})
dfR2 = pd.DataFrame({'seq': PR2_mutants, 'primer_added' : False, 'construct' : "P_tetR2", 'note': "P_tetR1 -35 region randomized"})
dfA = pd.DataFrame({'seq': PA_mutants, 'primer_added' : False, 'construct' : "P_tetA", 'note': ""})

df_promoters = pd.concat([dfR1, dfR2, dfA], ignore_index=True)
df_promoters

Unnamed: 0,seq,primer_added,construct,note
0,TTCACTgTTCTtTATCACTGATAGaGAtTGGTAAAATAAgTCTATC...,False,P_tetR1,P_tetR2 -10 region randomized
1,TTCACTTTTCTgTATCACTGATAGGaAGTGGTAAAAcAACTCTgTC...,False,P_tetR1,P_tetR2 -10 region randomized
2,TTCACTTTTCTCTcTCAgTGATAaGtgGTGGTAAAATtACTCTATC...,False,P_tetR1,P_tetR2 -10 region randomized
3,TTCACTTcTCTCTATCACTGATAGGGAGTGGTAAAATAACTCTATC...,False,P_tetR1,P_tetR2 -10 region randomized
4,TTCACTTTTCTCTATCACTGATAGGGAGTGaTAAAATAgCTCTATC...,False,P_tetR1,P_tetR2 -10 region randomized
...,...,...,...,...
2995,TAATTCCTAATTTTTGTTGACACTCTATCATTGcTAGAGaTATTTT...,False,P_tetA,
2996,TAATTCCTAATTTTTGgTGACACTCTATCATTGATAGAGgTATaTT...,False,P_tetA,
2997,TAATTCCTAATTTTTGTTGACAaTCTtTCATTGATAGAGTTATTTg...,False,P_tetA,
2998,TAATTCCTAATTTTTGTTGACACTCTATCATTGATAGAaTTATTTT...,False,P_tetA,


In [238]:
df_promoters.to_csv("../../../../data/twist_order/natural_tet_promoters_mutated.csv")

## Computational environment

In [20]:
%load_ext watermark
%watermark -v -p numpy,pandas,wgregseq,bokeh

CPython 3.8.5
IPython 7.10.0

numpy 1.18.1
pandas 1.0.3
wgregseq 0.0.1
bokeh 2.0.2
