# Max Mass Reclamation



The `MinGenome` algorithm contains the following objective function:


$$\max\sum_{k\in K}y_k {d'}_k -\sum_{k\in K} x_k d_k$$

where the variables are

$$x_k = \begin{array}{l}
1, \text{ if gene or promoter $k$ is the first gene or promoter within the deleted segment  }\\
0, \text{ otherwise}
\end{array}
$$
and
$$y_k = \begin{array}{l}
1, \text{ if gene or promoter $k$ is immediately after the end of the deleted segment  }\\
0, \text{ otherwise}
\end{array}
$$
where 

$$\sum_{k\in K}x_k = 1$$
and $$\sum_{k\in K}y_k = 1$$


and the parameters are

* $d_k$: Position of the first nucleotide of the delted sequence starting from the origin of replication when gen eor pmoter $k$ is selected to be deleted in the begining of the stretch.  Note that $d_k$ is not always the start site of a gene or promoter. It is the first nucleotide of the nonoverlapped region between the gene/promoter $k$ and gene/promoter $k-1$
* ${d'}_k$: Position of the first nucleotide of the gene or promoter
k immediately after the deleted sequence

![Definitions of the start site and end site of a deletion in MinGenome algorithm](Figure8.jpg)

`MaxMassReclamation` is the `MinGenome` algorithm with the objective function above replaced with the following objective function:

$$\max\sum_{k\in K}y_k \sum_{i=1}^k{m}_i - \sum_{k\in K} x_k\sum_{i=1}^k{m}_i$$

where $m_i$ is the measured protein mass of gene $i$ and $\sum_{i=1}^k{m}_i$ is the cumulative protein mass of all genes from the origin of replication to the $k$th gene.   By subtracting the cumulative protein mass of the start gene ($x_k$)  from the cumulative protein mass of the end gene ($y_k$), we obtain the cumulative protein mass of the interval for the optimal solution

We maximize the mass that is knocked out based on absolute quantitative proteomics [Schmidt et al 2015](http://www.nature.com/articles/nbt.3418) in $(fg/cell)$


We then run `Max mass reclamation` iteratively without replacement.


Instead of maximizing the length of the knockout, we maximize the amount of protein mass that is reclaimed by a knockout.



In [28]:
import random
import numpy as np
genes = range (1,50)
essen_genes= [1,2,23,48]
mass = np.random.randint(1, 10, 50) 
cum_mass = np.cumsum(mass)
cum_mass[:4]
mass[:4]

array([ 9, 12, 20, 26])

array([9, 3, 8, 6])

In [26]:
x_k =22
y_k =47
cum_mass[y_k-1] -  cum_mass[x_k-1], np.sum(mass[x_k:y_k])

(108, 108)

In [32]:
import random
import numpy as np
genes = range (1,50)
essen_genes= [1,2,23,48]
mass = np.random.randint(1, 10, 50) 
cum_mass = np.insert(np.cumsum(mass), 0, 0)
cum_mass[:5], mass[:4]

(array([ 0,  8, 14, 21, 28]), array([8, 6, 7, 7]))

In [33]:
x_k =22
y_k =47
cum_mass[y_k] -  cum_mass[x_k], np.sum(mass[x_k:y_k])

(130, 130)

In [1]:
class PDF(object):
    def __init__(self, pdf, size=(1200,600)):
        self.pdf = pdf
        self.size = size

    def _repr_html_(self):
        return '<iframe src={0} width={1[0]} height={1[1]}></iframe>'.format(self.pdf, self.size)

    def _repr_latex_(self):
        return r'\includegraphics[width=1.0\textwidth]{{{0}}}'.format(self.pdf)


# Adding violacein pathway to *E. coli*

In [2]:
import cobra, pandas as pd, os, sys
from cobra.core import Reaction, Metabolite
ecolidir = os.path.join('data','Ecoli')
khk = cobra.io.read_sbml_model(os.path.join(ecolidir, 'iJO1366.xml'))
khk_induced = khk.copy()
vioA = cobra.core.Reaction('vioA', 
                           name='L-tryptophan oxidase', 
                           subsystem='Violacein biosynthesis', 
                          lower_bound=0,
                          upper_bound = 100)
vioB = cobra.core.Reaction('vioB', 
                           name="N-[2-(carboxylatoamino)-1,2-bis(1H-indol-3-yl)ethyl]carbamate synthase",
                           subsystem='Violacein biosynthesis', 
                          lower_bound=0,
                          upper_bound = 100)
vioE = cobra.core.Reaction('vioE', 
                           name="protodeoxyviolaceinate synthase",
                           subsystem='Violacein biosynthesis', 
                          lower_bound=0,
                          upper_bound = 100)
EX_violacein_e = cobra.core.Reaction('EX_violacein_e',
                                     name="protodeoxyviolaceinate export",
                           subsystem='Violacein biosynthesis', 
                          lower_bound=0,
                          upper_bound = 100)
CPD_11890_c = cobra.core.Metabolite('CPD_11890_c', 
                                   name="2-imino-3-(indol-3-yl)propanoate",
                                    formula="C11H9N2O2",
                                    charge=-1,
                                    compartment='c'
                           )
CPD_11890_c.annotation = {'biocyc': "META:CPD-11890"}
CPD_19471_c = cobra.core.Metabolite('CPD_19471_c', 
                                   name= "N-[2-(carboxylatoamino)-1,2-bis(1H-indol-3-yl)ethyl]carbamate",
                                    formula="C22H16N4O4",
                                    charge=-2,
                                    compartment='c'
                           )
CPD_19471_c.annotation = {'biocyc': "META:CPD-19471"}
CPD_14320_c = cobra.core.Metabolite('CPD_14320_c', 
                                   name= "protodeoxyviolaceinate",
                                    formula="C21H14N3O2",
                                    charge=-1,
                                    compartment='c'
                           )
CPD_14320_c.annotation = {'biocyc': "META:CPD-14320"}
khk_induced.add_metabolites([CPD_11890_c, CPD_19471_c, CPD_14320_c])
khk_induced.add_reactions([vioA, vioB, vioE, EX_violacein_e])
vioA.build_reaction_from_string('trp__L_c + o2_c --> h2o2_c + CPD_11890_c + h_c')
vioB.build_reaction_from_string('2 CPD_11890_c + h2o2_c --> CPD_19471_c + 2 h2o_c')
vioE.build_reaction_from_string('CPD_19471_c + 2 h_c --> co2_c + nh4_c + CPD_14320_c')
EX_violacein_e.build_reaction_from_string('CPD_14320_c -->')
cobra.io.save_json_model(khk_induced,
                         os.path.join(ecolidir, 
                                      'khk_induced.json'), 
                         pretty=True, 
                         sort=True)

In [3]:
%matplotlib inline
import cameo
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from cameo import phenotypic_phase_plane

p = phenotypic_phase_plane(khk_induced, variables=["BIOMASS_Ec_iJO1366_core_53p95M"], objective='EX_violacein_e')
p.plot()

In [28]:
from IPython.display import HTML
with khk_induced:
    khk_induced.objective = 'vioE'
    violacein_obj = khk_induced.optimize().fluxes
    tol = 1e-3
    display(HTML(violacein_obj[(violacein_obj > tol) | (violacein_obj < -tol)].to_frame('fluxes').sort_values('fluxes', ascending=False).to_html()))
    

Unnamed: 0,fluxes
EX_h2o_e,52.982482
ATPS4rpp,20.646837
CYTBO3_4pp,14.737226
NADH16pp,14.128954
EX_co2_e,10.877372
GLCtex_copy1,10.0
GAPD,9.964964
ENO,9.964964
O2tpp,9.707786
O2tex,9.707786


In [20]:
10*6 - 10.877*1 - 2.33*21

0.19299999999999784

## Genome view of predicted KOs

In [5]:
PDF('Visualizations/ProteinMassReclaimed.pdf')

## Metabolic map of predicted KOs

In [6]:
PDF('Visualizations/reclaimed.png')

# *E. coli* genes and promoters

In [35]:
import pandas as pd
df = pd.read_excel('data/Ecoli/genes_and_promoters.xlsx', sheet_name='all_clear_v2')
df

start, end = 190, 9191
genes_of_interval = df[(df['start'] >= start) & \
                       (df['end'] <= end) & \
                       (df['class'] == 'gene')]
genes_of_interval['gene_or_promoter'].tolist()

Unnamed: 0,gene_or_promoter,start,end,strand,class,genes_in_TU,start_if_select_as_start,cannot_as_start
0,PM00249,148,189,1,promoter,"[b0001, b0002, b0003, b0004]",148,0
1,b0001,190,255,1,gene,,190,0
2,b0002,337,2799,1,gene,,337,0
3,b0003,2801,3733,1,gene,,2801,0
4,b0004,3734,5020,1,gene,,3734,0
5,b0005,5234,5530,1,gene,,5234,0
6,b0006,5683,6459,-1,gene,,5683,0
7,b0007,6529,7959,-1,gene,,6529,0
8,PM0-9956,8191,8237,1,promoter,[b0008],8191,0
9,b0008,8238,9191,1,gene,,8238,0


['b0001', 'b0002', 'b0003', 'b0004', 'b0005', 'b0006', 'b0007', 'b0008']

In [2]:
df

Unnamed: 0,gene_or_promoter,start,end,strand,class,genes_in_TU,start_if_select_as_start,cannot_as_start
0,PM00249,148,189,1,promoter,"[b0001, b0002, b0003, b0004]",148,0
1,b0001,190,255,1,gene,,190,0
2,b0002,337,2799,1,gene,,337,0
3,b0003,2801,3733,1,gene,,2801,0
4,b0004,3734,5020,1,gene,,3734,0
5,b0005,5234,5530,1,gene,,5234,0
6,b0006,5683,6459,-1,gene,,5683,0
7,b0007,6529,7959,-1,gene,,6529,0
8,PM0-9956,8191,8237,1,promoter,[b0008],8191,0
9,b0008,8238,9191,1,gene,,8238,0


# Protein measurements in femtograms from Schmidt et al Table S13 

In [40]:
proteins_fg = pd.read_excel('data/Ecoli/Schmidt/nbt.3418-S2.xlsx', sheet_name='Table S13', skiprows=3)
proteins_fg

Unnamed: 0,Uniprot Accession,Gene,Description,Cellular protein location (according to www.uniprot.org),A14.07032,A14.07033,A14.07034,A14.07036,A14.07037,A14.07038,...,A14.07115,A14.07117,A14.07118,A14.07119,A14.07121,A14.07122,A14.07123,A14.07125,A14.07126,A14.07127
0,P04825,pepN,Aminopeptidase N OS=Escherichia coli (strain K...,Cell inner membrane,4.633803e-01,4.113094e-01,3.915993e-01,2.136611e-01,2.127355e-01,2.102780e-01,...,2.619060e-01,1.605115e-01,1.762474e-01,1.532321e-01,1.972634e-01,2.219234e-01,2.172675e-01,2.251897e-01,2.415721e-01,2.495236e-01
1,P0C0V0,degP,Protease do OS=Escherichia coli (strain K12) G...,Cell inner membrane,2.128826e-01,2.076923e-01,2.117216e-01,2.587472e-01,2.753042e-01,2.732864e-01,...,1.983017e-01,1.340973e-01,1.583725e-01,1.395959e-01,1.359015e-01,1.784184e-01,1.805699e-01,3.353371e-01,3.702367e-01,3.175771e-01
2,P0AAI3,ftsH,ATP-dependent zinc metalloprotease FtsH OS=Esc...,Cell inner membrane,4.640302e-01,4.143873e-01,4.943970e-01,2.440695e-01,2.623177e-01,2.724735e-01,...,2.041780e-01,1.721386e-01,1.648425e-01,1.837950e-01,1.980768e-01,2.045779e-01,2.061072e-01,2.451916e-01,2.672380e-01,2.739067e-01
3,P0ABC7,hflK,Modulator of FtsH protease HflK OS=Escherichia...,Cell inner membrane,1.772662e-01,1.711126e-01,1.828374e-01,9.544338e-02,1.122174e-01,1.019477e-01,...,9.121105e-02,6.832720e-02,5.526516e-02,6.789463e-02,7.846795e-02,8.284558e-02,8.471687e-02,1.138036e-01,1.236418e-01,1.141445e-01
4,P08506,dacC,D-alanyl-D-alanine carboxypeptidase dacC OS=Es...,Cell inner membrane,8.301700e-02,5.860804e-02,6.915716e-02,6.989639e-02,7.144231e-02,7.433892e-02,...,1.000501e-01,9.289993e-02,9.475696e-02,9.951884e-02,6.972466e-02,9.001456e-02,9.822825e-02,1.029425e-01,1.144356e-01,1.114165e-01
5,P0ABC3,hflC,Modulator of FtsH protease HflC OS=Escherichia...,Cell inner membrane,1.300773e-01,1.328812e-01,1.410357e-01,7.555463e-02,8.089087e-02,8.058817e-02,...,6.409134e-02,5.087432e-02,5.364527e-02,5.651761e-02,6.374615e-02,6.534776e-02,7.848423e-02,8.499128e-02,8.734255e-02,9.080262e-02
6,P0AEB2,dacA,D-alanyl-D-alanine carboxypeptidase dacA OS=Es...,Cell inner membrane,1.552737e-01,1.336722e-01,1.445040e-01,1.008689e-01,9.944921e-02,9.930472e-02,...,6.377710e-02,4.638581e-02,5.268501e-02,4.432679e-02,7.098773e-02,6.368619e-02,6.319260e-02,8.758517e-02,9.556234e-02,8.581793e-02
7,P0AG14,sohB,Probable protease sohB OS=Escherichia coli (st...,Cell inner membrane,2.783188e-02,2.469125e-02,3.184970e-02,2.659054e-02,2.699793e-02,2.712781e-02,...,2.533363e-02,1.572704e-02,1.355775e-02,1.584925e-02,1.831426e-02,2.265361e-02,2.310298e-02,3.180126e-02,4.520411e-02,4.042833e-02
8,P23865,prc,Tail-specific protease OS=Escherichia coli (st...,Cell inner membrane,7.892680e-02,7.982559e-02,7.756057e-02,6.429820e-02,6.502070e-02,6.326422e-02,...,5.010172e-02,4.358141e-02,4.393447e-02,4.231139e-02,5.313490e-02,5.614253e-02,5.380110e-02,6.429808e-02,6.086548e-02,7.163990e-02
9,P23894,htpX,Protease HtpX OS=Escherichia coli (strain K12)...,Cell inner membrane,2.854932e-02,3.011286e-02,2.742700e-02,2.133862e-02,2.654830e-02,2.387221e-02,...,1.062292e-02,9.383213e-03,1.036593e-02,1.237462e-02,1.836186e-02,1.611691e-02,1.531539e-02,2.066662e-02,2.192009e-02,3.154345e-02


## Switch to Uniprot

In [41]:
BW25113_fg = proteins_fg[['Uniprot Accession', 
                          'Description', 'Gene', 
                          'Cellular protein location (according to www.uniprot.org)', 
                          'A14.07036', 'A14.07037', 'A14.07038' ]].set_index('Uniprot Accession')
BW25113_fg

Unnamed: 0_level_0,Description,Gene,Cellular protein location (according to www.uniprot.org),A14.07036,A14.07037,A14.07038
Uniprot Accession,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
P04825,Aminopeptidase N OS=Escherichia coli (strain K...,pepN,Cell inner membrane,2.136611e-01,2.127355e-01,2.102780e-01
P0C0V0,Protease do OS=Escherichia coli (strain K12) G...,degP,Cell inner membrane,2.587472e-01,2.753042e-01,2.732864e-01
P0AAI3,ATP-dependent zinc metalloprotease FtsH OS=Esc...,ftsH,Cell inner membrane,2.440695e-01,2.623177e-01,2.724735e-01
P0ABC7,Modulator of FtsH protease HflK OS=Escherichia...,hflK,Cell inner membrane,9.544338e-02,1.122174e-01,1.019477e-01
P08506,D-alanyl-D-alanine carboxypeptidase dacC OS=Es...,dacC,Cell inner membrane,6.989639e-02,7.144231e-02,7.433892e-02
P0ABC3,Modulator of FtsH protease HflC OS=Escherichia...,hflC,Cell inner membrane,7.555463e-02,8.089087e-02,8.058817e-02
P0AEB2,D-alanyl-D-alanine carboxypeptidase dacA OS=Es...,dacA,Cell inner membrane,1.008689e-01,9.944921e-02,9.930472e-02
P0AG14,Probable protease sohB OS=Escherichia coli (st...,sohB,Cell inner membrane,2.659054e-02,2.699793e-02,2.712781e-02
P23865,Tail-specific protease OS=Escherichia coli (st...,prc,Cell inner membrane,6.429820e-02,6.502070e-02,6.326422e-02
P23894,Protease HtpX OS=Escherichia coli (strain K12)...,htpX,Cell inner membrane,2.133862e-02,2.654830e-02,2.387221e-02


## Compute cumulative distribution of mass across the genome

In [54]:
import numpy as np
bw_glucose = ['A14.07036', 'A14.07037', 'A14.07038']
b2u = pd.read_table('data/Ecoli/blatter-to-uniprot.tab', index_col = 'Blattner')
ecoli = df.join(b2u, on='gene_or_promoter').join(BW25113_fg, on='Uniprot')


minAbundance = ecoli[bw_glucose[0]].dropna().min()
display(minAbundance)
ecoli.loc[ecoli['class']== 'promoter', bw_glucose[0]] = minAbundance
ecoli[bw_glucose[0]].fillna(minAbundance,inplace=True)
ecoli['cumulativeMass'] = pd.concat([pd.Series([0]),
                                    np.cumsum(ecoli[bw_glucose[0]])], 
                                    ignore_index=True).iloc[:-1]
                          
ecoli.to_csv('data/Ecoli/{}_cumulative_mass.tab'.format(bw_glucose[0]), sep='\t', index=False)
ecoli

4.591655779049255e-07

Unnamed: 0,gene_or_promoter,start,end,strand,class,genes_in_TU,start_if_select_as_start,cannot_as_start,Uniprot,Description,Gene,Cellular protein location (according to www.uniprot.org),A14.07036,A14.07037,A14.07038,cumulativeMass
0,PM00249,148,189,1,promoter,"[b0001, b0002, b0003, b0004]",148,0,,,,,4.591656e-07,,,0.000000e+00
1,b0001,190,255,1,gene,,190,0,P0AD86,,,,4.591656e-07,,,4.591656e-07
2,b0002,337,2799,1,gene,,337,0,P00561,,,,4.591656e-07,,,9.183312e-07
3,b0003,2801,3733,1,gene,,2801,0,P00547,Homoserine kinase OS=Escherichia coli (strain ...,thrB,Cytoplasm,3.313051e-02,0.032556,0.033302,1.377497e-06
4,b0004,3734,5020,1,gene,,3734,0,P00934,,,,4.591656e-07,,,3.313188e-02
5,b0005,5234,5530,1,gene,,5234,0,P75616,,,,4.591656e-07,,,3.313234e-02
6,b0006,5683,6459,-1,gene,,5683,0,P0A8I3,,,,4.591656e-07,,,3.313280e-02
7,b0007,6529,7959,-1,gene,,6529,0,P30143,,,,4.591656e-07,,,3.313326e-02
8,PM0-9956,8191,8237,1,promoter,[b0008],8191,0,,,,,4.591656e-07,,,3.313372e-02
9,b0008,8238,9191,1,gene,,8238,0,P0A870,Transaldolase B OS=Escherichia coli (strain K1...,talB,Cytoplasm,6.209092e-01,0.622423,0.610794,3.313418e-02


In [55]:
ecoli['cumulativeMass'] = np.insert(np.cumsum(ecoli[bw_glucose[0]]), 0, 0)

ValueError: Length of passed values is 6083, index implies 6082

# Mass per protein per cell calculated from Schmidt et al Table S9



$$[\text{Mass of protein per cell}][\text{protein Molecular weight (kDa)}][\text{Copies/Cell}]$$


In [56]:
proteins = pd.read_excel('data/Ecoli/Schmidt/nbt.3418-S2.xlsx', sheet_name='Table S9', skiprows=2)
mg1655 = proteins[['Uniprot Accession', 'Description', 'Gene', 'proteinMW', 'Copies/Cell_MG1655.Glucose' ]]


In [57]:
mg1655 = proteins[['Uniprot Accession', 
                   'Description', 
                   'Gene', 
                   'proteinMW', 
                   'Copies/Cell_MG1655.Glucose' ]].\
                iloc[:2019].\
                set_index('Uniprot Accession')
mg1655['kDa/Protein'] = mg1655['proteinMW'] * mg1655['Copies/Cell_MG1655.Glucose']
mg1655.sort_values('kDa/Protein', ascending=False)

Unnamed: 0_level_0,Description,Gene,proteinMW,Copies/Cell_MG1655.Glucose,kDa/Protein
Uniprot Accession,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
P0CE47,Elongation factor Tu 1 OS=Escherichia coli (st...,tufA,43238.30279,279772.591884,1.209689e+10
P0A910,Outer membrane protein A OS=Escherichia coli (...,ompA,37159.64679,108747.026756,4.041001e+09
P25665,5-methyltetrahydropteroyltriglutamate--homocys...,metE,84602.80905,37939.368305,3.209777e+09
P0A6M8,Elongation factor G OS=Escherichia coli (strai...,fusA,77514.47523,40158.649597,3.112877e+09
P0ABK5,Cysteine synthase A OS=Escherichia coli (strai...,cysK,34450.28634,65474.668058,2.255621e+09
P05793,Ketol-acid reductoisomerase OS=Escherichia col...,ilvC,54016.26705,34730.063498,1.875988e+09
P04949,Flagellin OS=Escherichia coli (strain K12) GN=...,fliC,51246.56095,34899.631529,1.788486e+09
P08200,Isocitrate dehydrogenase [NADP] OS=Escherichia...,icd,45709.46787,37682.155126,1.722431e+09
P0A6P9,Enolase OS=Escherichia coli (strain K12) GN=en...,eno,45608.41069,35656.217111,1.626223e+09
P0ACF0,DNA-binding protein HU-alpha OS=Escherichia co...,hupA,9511.17791,167069.666730,1.589029e+09


## Calculate Cumulative abundance



     
Note that we do not calculate min copy number and multiply by molecular weight, we just fill genes by the min g/Protein.
The next version will include molecular weights for all proteins so we can do it by min copy number instead.


In [58]:
df

Unnamed: 0,gene_or_promoter,start,end,strand,class,genes_in_TU,start_if_select_as_start,cannot_as_start
0,PM00249,148,189,1,promoter,"[b0001, b0002, b0003, b0004]",148,0
1,b0001,190,255,1,gene,,190,0
2,b0002,337,2799,1,gene,,337,0
3,b0003,2801,3733,1,gene,,2801,0
4,b0004,3734,5020,1,gene,,3734,0
5,b0005,5234,5530,1,gene,,5234,0
6,b0006,5683,6459,-1,gene,,5683,0
7,b0007,6529,7959,-1,gene,,6529,0
8,PM0-9956,8191,8237,1,promoter,[b0008],8191,0
9,b0008,8238,9191,1,gene,,8238,0


In [59]:
import numpy as np
b2u = pd.read_table('data/Ecoli/blatter-to-uniprot.tab', index_col = 'Blattner')
ecoli = df.join(b2u, on='gene_or_promoter').join(BW25113_fg, on='Uniprot')
ecoli

Unnamed: 0,gene_or_promoter,start,end,strand,class,genes_in_TU,start_if_select_as_start,cannot_as_start,Uniprot,Description,Gene,Cellular protein location (according to www.uniprot.org),A14.07036,A14.07037,A14.07038
0,PM00249,148,189,1,promoter,"[b0001, b0002, b0003, b0004]",148,0,,,,,,,
1,b0001,190,255,1,gene,,190,0,P0AD86,,,,,,
2,b0002,337,2799,1,gene,,337,0,P00561,,,,,,
3,b0003,2801,3733,1,gene,,2801,0,P00547,Homoserine kinase OS=Escherichia coli (strain ...,thrB,Cytoplasm,0.033131,0.032556,0.033302
4,b0004,3734,5020,1,gene,,3734,0,P00934,,,,,,
5,b0005,5234,5530,1,gene,,5234,0,P75616,,,,,,
6,b0006,5683,6459,-1,gene,,5683,0,P0A8I3,,,,,,
7,b0007,6529,7959,-1,gene,,6529,0,P30143,,,,,,
8,PM0-9956,8191,8237,1,promoter,[b0008],8191,0,,,,,,,
9,b0008,8238,9191,1,gene,,8238,0,P0A870,Transaldolase B OS=Escherichia coli (strain K1...,talB,Cytoplasm,0.620909,0.622423,0.610794


In [62]:
minAbundance = mg1655['kDa/Protein'].dropna().min()
minAbundance


110.73090803705458

In [63]:
ecoli.loc[ecoli['class']== 'promoter','kDa/Protein'] = 10
ecoli['kDa/Protein'].fillna(minAbundance,inplace=True)
ecoli['cumulativeAbundance'] = np.cumsum(ecoli['kDa/Protein'])
                          
ecoli.to_csv('data/Ecoli/{}_cumulative_abundance.tab'.format('kDa_per_protein_per_cell'), sep='\t', index=False)
ecoli

Unnamed: 0,gene_or_promoter,start,end,strand,class,genes_in_TU,start_if_select_as_start,cannot_as_start,Uniprot,Description,Gene,Cellular protein location (according to www.uniprot.org),A14.07036,A14.07037,A14.07038,kDa/Protein,cumulativeAbundance
0,PM00249,148,189,1,promoter,"[b0001, b0002, b0003, b0004]",148,0,,,,,,,,1.000000e+01,10.000000
1,b0001,190,255,1,gene,,190,0,P0AD86,,,,,,,4.591656e-07,10.000000
2,b0002,337,2799,1,gene,,337,0,P00561,,,,,,,4.591656e-07,10.000001
3,b0003,2801,3733,1,gene,,2801,0,P00547,Homoserine kinase OS=Escherichia coli (strain ...,thrB,Cytoplasm,0.033131,0.032556,0.033302,4.591656e-07,10.000001
4,b0004,3734,5020,1,gene,,3734,0,P00934,,,,,,,4.591656e-07,10.000002
5,b0005,5234,5530,1,gene,,5234,0,P75616,,,,,,,4.591656e-07,10.000002
6,b0006,5683,6459,-1,gene,,5683,0,P0A8I3,,,,,,,4.591656e-07,10.000003
7,b0007,6529,7959,-1,gene,,6529,0,P30143,,,,,,,4.591656e-07,10.000003
8,PM0-9956,8191,8237,1,promoter,[b0008],8191,0,,,,,,,,1.000000e+01,20.000003
9,b0008,8238,9191,1,gene,,8238,0,P0A870,Transaldolase B OS=Escherichia coli (strain K1...,talB,Cytoplasm,0.620909,0.622423,0.610794,4.591656e-07,20.000004


In [61]:
import pandas
abundance_f = 'data/Ecoli/{}_cumulative_mass.tab'.format(bw_glucose[0])
cum_abundance = pd.read_table(abundance_f, usecols=['gene_or_promoter', 'cumulativeMass']).set_index('gene_or_promoter')
cum_abundance.loc['b3322':'b3340']

Unnamed: 0_level_0,cumulativeMass
gene_or_promoter,Unnamed: 1_level_1
b3322,121.055606
b3323,121.055606
PM-8819,121.055607
PM0-9514,121.055607
b3324,121.055607
b3325,121.055608
b3326,121.055608
b3327,121.055609
b3328,121.055609
b3329,121.05561


In [115]:
from IPython.display import display, Image, HTML
solution = pd.read_table('out/local_result_essential.tab',index_col=0)
solution['end'] = solution['end'].str.lstrip('u_G_')
solution['start'] = solution['start'].str.lstrip('u_G_')
display( solution)

Unnamed: 0,end,start,status
0,b3340,b3322,Optimal
1,b0971,b0955,Optimal
2,b1636,b4493,Optimal
3,PM00633,PM0-10091,Optimal
4,b3559,b3465,Optimal
5,PM0-5862,PM00001,Optimal
6,b2779,b2765,Optimal
7,b0452,b0433,Optimal
8,PM324,b1289,Optimal
9,b2231,PM00652,Optimal


In [25]:
from IPython.display import display, Image, HTML
solution = pd.read_table('out/local_result_essential.tab')
solution['end'] = solution['end'].str.lstrip('u_G_')
solution['start'] = solution['start'].str.lstrip('u_G_')
display( solution)

Unnamed: 0,end,start,status
0,b3340,b3322,Optimal
1,b0971,b0955,Optimal
2,PM0-7141,b1913,Optimal
3,b3833,PM00100,Optimal
4,PM00633,PM0-10091,Optimal
5,b0452,b0433,Optimal
6,b3559,b3465,Optimal
7,b1636,b4493,Optimal
8,b4005,b3998,Optimal
9,b2779,b2765,Optimal


In [26]:
from IPython.display import Markdown
genome_view = []
mass_reclaimed = []
for i in solution.index:
    start, end = solution.loc[i, 'start'], solution.loc[i,'end']
    gv = ecoli.set_index('gene_or_promoter').loc[start:end].copy()
    mass_reclaimed.append(gv.loc[end,'cumulativeMass'] - gv.loc[start, 'cumulativeMass'])
    gv[r'mass reclaimed $(fg/cell)$'] = mass_reclaimed[i]
    gv['SolutionOrder'] = len(solution.index) - i
    genome_view.append( gv )
genome_view = pd.concat(genome_view)
genome_view.index.name = '$gene_or_promoter'
genome_view.to_csv('out/genome_view.tab',sep='\t',columns=['SolutionOrder', r'mass reclaimed $(fg/cell)$', bw_glucose[0], 'cumulativeMass'])
print('\n'.join([str(m) for m in mass_reclaimed]))
display(Markdown(r'Total mass reclaimed = %0.2g $fg/cell$' % sum(mass_reclaimed)))
HTML(genome_view.to_html())


24.1442705355
6.88266638029
0.910191488255
0.0949279533945
4.19366640413
2.83457780068
2.35459208769
1.42959234843
1.19512854724e-05
2.9846500906
2.63181406031


Total mass reclaimed = 48 $fg/cell$

Unnamed: 0_level_0,start,end,strand,class,genes_in_TU,start_if_select_as_start,cannot_as_start,Uniprot,Description,Gene,Cellular protein location (according to www.uniprot.org),A14.07036,A14.07037,A14.07038,cumulativeMass,mass reclaimed $(fg/cell)$,SolutionOrder
$gene_or_promoter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
b3322,3453508,3453927,-1,gene,,3453508,0,P03825,,,,4.591656e-07,,,121.055607,24.144271,11
b3323,3453929,3455398,-1,gene,,3453929,0,P45756,,,,4.591656e-07,,,121.055607,24.144271,11
PM-8819,3455399,3455634,-1,promoter,"[b3322, b3323]",3455399,0,,,,,4.591656e-07,,,121.055607,24.144271,11
PM0-9514,3455482,3455577,1,promoter,"[b3335, b3334, b3333, b3332, b3331, b3330, b33...",3455634,1,,,,,4.591656e-07,,,121.055608,24.144271,11
b3324,3455578,3456393,1,gene,,3455634,0,P45757,,,,4.591656e-07,,,121.055608,24.144271,11
b3325,3456377,3458329,1,gene,,3456393,0,P45758,,,,4.591656e-07,,,121.055609,24.144271,11
b3326,3458339,3459820,1,gene,,3458339,0,P45759,,,,4.591656e-07,,,121.055609,24.144271,11
b3327,3459817,3461013,1,gene,,3459820,0,P41441,,,,4.591656e-07,,,121.05561,24.144271,11
b3328,3461023,3461460,1,gene,,3461023,0,P41442,,,,4.591656e-07,,,121.05561,24.144271,11
b3329,3461468,3461977,1,gene,,3461468,0,P41443,,,,4.591656e-07,,,121.055611,24.144271,11


In [31]:
W3110toMG1655 = pd.read_table('data/Ecoli/lambdaRed_W3110_MG1655.locus.tab', na_values='NIL',index_col='Blattner')
W3110toMG1655 = W3110toMG1655.loc[sorted(W3110toMG1655.index.dropna())]
W3110toMG1655

Unnamed: 0_level_0,E_coli_K12_W3110_lambdaRed,E_coli_K12_W3110,W3110-Name,W3110-Accession,MG1655-ID,W3110-startbase,W3110-endbase,W3110-strand,MG1655-startbase,MG1655-endbase,MG1655-strand,sort_order
Blattner,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
b0002,W3110_lambdaRed.CDS.1,JW0001,thrA,ECK0002,EG10998,337,2799,+,337.0,2799.0,+,1
b0003,W3110_lambdaRed.CDS.2,JW0002,thrB,ECK0003,EG10999,2801,3733,+,2801.0,3733.0,+,2
b0004,W3110_lambdaRed.CDS.3,JW0003,thrC,ECK0004,EG11000,3734,5020,+,3734.0,5020.0,+,3
b0006,W3110_lambdaRed.CDS.5,JW0005,yaaA,ECK0006,EG10011,5683,6459,-,5683.0,6459.0,-,5
b0007,W3110_lambdaRed.CDS.6,JW0006,yaaJ,ECK0007,EG11555,6529,7959,-,6529.0,7959.0,-,6
b0008,W3110_lambdaRed.CDS.7,JW0007,talB,ECK0008,EG11556,8238,9191,+,8238.0,9191.0,+,7
b0009,W3110_lambdaRed.CDS.8,JW0008,mog,ECK0009,EG11511,9306,9893,+,9306.0,9893.0,+,8
b0010,W3110_lambdaRed.CDS.9,JW0009,satP,ECK0010,EG11512,9928,10494,-,9928.0,10494.0,-,9
b0011,W3110_lambdaRed.CDS.10,JW0010,yaaW,ECK0011,G6082,10643,11356,-,10643.0,11356.0,-,10
b0013,W3110_lambdaRed.CDS.11,JW0012,yaaI,ECK0013,G8202,11382,11786,-,11382.0,11786.0,-,11
