## Replicating the results of Becker et al. 2011

The issue with the model "iCGB21FR.xml" is that its authors named the genes using the format **"lcl_{locus}_prot_{protein_id}"**. An example gene name from the model looks like this: lcl_NC_006958_1_prot_WP_003859586_1_2138

Here **"NC_006958"** is the locus, which represents the complete sequence of Corynebacterium glutamicum ATCC 13032, and **WP_XXXXXXXXX** is the protein_id of the gene.


**This adds an extra step: finding the gene name that corresponds to each protein_id of interest.**
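As a minimal sketch of that extra step, a model gene identifier can be split into its parts with a regular expression. The pattern and the helper name `parse_gene_id` are assumptions made for illustration and are not part of `gene_annotation.ipynb`. The sketch also assumes that the `_1` following the locus and the protein_id is a version suffix and that the trailing number is a running index.

```python
import re

# Assumed layout: lcl_{locus}_{version}_prot_{protein_id}_{version}_{index}
GENE_ID_PATTERN = re.compile(
    r"^lcl_(?P<locus>[A-Z]{2}_\d+)_(?P<locus_version>\d+)"
    r"_prot_(?P<protein_id>[A-Z]{2}_\d+)_(?P<protein_version>\d+)"
    r"_(?P<index>\d+)$"
)


def parse_gene_id(gene_id):
    """Split a model gene identifier into its parts (hypothetical helper)."""
    match = GENE_ID_PATTERN.match(gene_id)
    if match is None:
        raise ValueError(f"Unexpected gene identifier format: {gene_id}")
    parts = match.groupdict()
    parts["index"] = int(parts["index"])
    return parts


parts = parse_gene_id("lcl_NC_006958_1_prot_WP_003859586_1_2138")
print(parts["locus"], parts["protein_id"], parts["index"])
```

The extracted `protein_id` is the key that can then be looked up in an annotation source to recover the gene name.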


In [1]:
%%capture
%run gene_annotation.ipynb

from colorama import init, Fore, Style
from cobra import Model, Reaction
from cobra.io import read_sbml_model
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

model = read_sbml_model('Files/iCGB21FR.xml')

# Add a lysine export reaction (proton antiport): h_e + lys__L_c -> h_c + lys__L_e
lys_exp_rxn = Reaction("LYSt3r")
lys_exp_rxn.name = "L lysine transport out via proton antiport reversible"
lys_exp_rxn.add_metabolites({
    model.metabolites.h_e: -1,
    model.metabolites.lys__L_c: -1,
    model.metabolites.h_c: 1,
    model.metabolites.lys__L_e: 1,
})
lys_exp_rxn.gene_reaction_rule = "( Mbar_A2583 or Mbar_A3244 )"
# Export only: despite the "reversible" name, the lower bound is 0
lys_exp_rxn.lower_bound, lys_exp_rxn.upper_bound = 0, 1000

model.add_reactions([lys_exp_rxn])

### Annotating the genes

In [2]:
%run gene_annotation.ipynb
genes_to_find = ["lysC", "pck", "dapB", "lysA", "hom", "pyc", "icd", "fbp", "tkt"]
found_matches = findGenes(genes_to_find, model)

genes_map = printReport(found_matches)
genes_map

Total number of unique genes in the model is: 786 genes

RefSeq data from  C. glutamicum ATCC 13032 is already stored locally: 786 genes

MATCH: dapB ->

    ● WP_011014794 [4-hydroxy-tetrahydrodipicolinate reductase]

MATCH: lysA ->

    ● WP_011014180 [diaminopimelate decarboxylase]

MATCH: tkt ->

    ● WP_011014456 [transketolase]

KEGG data from other C. glutamicum strains is already stored locally: 10 strains

    Homologous genes for fbp: 12 results
    Homologous genes for icd: 3 results
    Homologous genes for pyc: 2 results
    Homologous genes for hom: 6 results
    Homologous genes for lysC: 4 results
    Homologous genes for pck: 3 results

Fetching AA sequence for the homologous genes: 

    ycgB (pvd:CFBP1590_0653): MTAREQKRQPISTGSEWTFELIQTY...
    Fbp (cgu:WA5_0976): MNLKNPETPDRNLAMELVRVTEAAA...
    fbpA1 (cgf:CGUA_01390): MKLLRKLAAPAAIAIALGAGMSGVA...
    fbpA2 (cgf:CGUA_05650): MNRLPTRAGAVLTAVTVASLGVSGA...
    fbpA3 (cgf:CGUA_05655): MNTRTTVRYAAALVVAAVAAAPFPA...
    f

{'dapB': 'lcl_NC_006958_1_prot_WP_011014794_1_1871',
 'lysA': 'lcl_NC_006958_1_prot_WP_011014180_1_1145',
 'tkt': 'lcl_NC_006958_1_prot_WP_011014456_1_1529',
 'lysC': 'lcl_NC_006958_1_prot_WP_003855724_1_250',
 'pck': 'lcl_NC_006958_1_prot_WP_011015446_1_2735',
 'hom': 'lcl_NC_006958_1_prot_WP_003854900_1_1147',
 'pyc': 'lcl_NC_006958_1_prot_WP_011013816_1_666',
 'icd': 'lcl_NC_006958_1_prot_WP_011013800_1_641',
 'fbp': 'lcl_NC_006958_1_prot_WP_003856830_1_987'}

In [3]:
genes = {
    'dapB': 'lcl_NC_006958_1_prot_WP_011014794_1_1871',
    'lysA': 'lcl_NC_006958_1_prot_WP_011014180_1_1145',
    'tkt': 'lcl_NC_006958_1_prot_WP_011014456_1_1529',
    'lysC': 'lcl_NC_006958_1_prot_WP_003855724_1_250',
    'pck': 'lcl_NC_006958_1_prot_WP_011015446_1_2735',
    'hom': 'lcl_NC_006958_1_prot_WP_003854900_1_1147',
    'pyc': 'lcl_NC_006958_1_prot_WP_011013816_1_666',
    'icd': 'lcl_NC_006958_1_prot_WP_011013800_1_641',
    'fbp': 'lcl_NC_006958_1_prot_WP_003856830_1_987'}
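Because every model gene identifier embeds its protein_id, a mapping like the one above can be rebuilt once the protein_id for each gene symbol is known. The sketch below is one assumed way to do that lookup, not the code in `gene_annotation.ipynb`. `map_symbols_to_model_ids` is a hypothetical helper, and the inputs are copied from the RefSeq matches shown earlier.

```python
def map_symbols_to_model_ids(symbol_to_protein, model_gene_ids):
    """Map gene symbols to model gene ids via the embedded protein_id (hypothetical helper)."""
    mapping = {}
    for symbol, protein_id in symbol_to_protein.items():
        # The protein_id sits between "_prot_" and the following "_"
        needle = f"_prot_{protein_id}_"
        hits = [gene_id for gene_id in model_gene_ids if needle in gene_id]
        # Symbols with no hit or several hits are left out of the result
        if len(hits) == 1:
            mapping[symbol] = hits[0]
    return mapping


# Protein ids reported by the RefSeq matches above
symbol_to_protein = {
    "dapB": "WP_011014794",
    "lysA": "WP_011014180",
    "tkt": "WP_011014456",
}
# With a loaded model this list would be [gene.id for gene in model.genes]
model_gene_ids = [
    "lcl_NC_006958_1_prot_WP_011014794_1_1871",
    "lcl_NC_006958_1_prot_WP_011014180_1_1145",
    "lcl_NC_006958_1_prot_WP_011014456_1_1529",
]
print(map_symbols_to_model_ids(symbol_to_protein, model_gene_ids))
```

Skipping ambiguous or missing matches keeps the mapping conservative; those symbols would need a manual check.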

### Implementation of the genetic modifications

| Strain | Genetic modification | Implementation in the model |
|---|---|---|
| **LYS1** | mutation that released **lysC** from feedback inhibition | **skipped** (gene regulation) |
| **LYS2** | second copy of **ddh** | not implemented: the reaction encoded by **ddh** (meso-diaminopimelate D-dehydrogenase, EC 1.4.1.16) does not exist in the model |
| **LYS3** | knockout of **pck** | **implemented by knocking out pck** |
| **LYS4** | overexpression of **dapB** | **implemented by ↑ flux** |
| **LYS5** | additional copy of **lysA** | **implemented by ↑ flux** |
| **LYS6** | overexpression of **lysC** | **implemented by ↑ flux** |
| **LYS7** | leaky variant of the **hom** gene | **implemented by ↑ flux** |
| **LYS8-9** | improving the activity of **pyc** | **implemented by ↑ flux** |
| **LYS10** | overexpression of FBPase (**fbp**) | **implemented by ↑ flux** |
| **LYS11** | overexpression of **tkt** | **implemented by ↑ flux** |
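One way to keep the modifications listed above next to the code is to encode them as data and translate each entry into keyword arguments for the `optimizeModel` function defined in the next cell. This is a sketch under stated assumptions: the `STRATEGIES` structure, the `build_call` helper and the default `fold_change=2.0` are placeholders chosen for illustration, not values taken from Becker et al. or from this notebook.

```python
# (strain, gene symbol, action); the actions mirror the list above
STRATEGIES = [
    ("LYS1", "lysC", "skip"),      # release from feedback inhibition: skipped
    ("LYS2", "ddh", "skip"),       # reaction does not exist in the model
    ("LYS3", "pck", "knockout"),
    ("LYS4", "dapB", "up"),
    ("LYS5", "lysA", "up"),
    ("LYS6", "lysC", "up"),
    ("LYS7", "hom", "up"),
    ("LYS8-9", "pyc", "up"),
    ("LYS10", "fbp", "up"),
    ("LYS11", "tkt", "up"),
]


def build_call(gene, action, genes, fold_change=2.0):
    """Return keyword arguments for optimizeModel, or None for skipped steps."""
    if action == "skip":
        return None
    kwargs = {"full_gene_name": genes[gene]}
    if action == "knockout":
        kwargs["knock_out_gene"] = True
    elif action == "up":
        kwargs["fold_change"] = fold_change  # placeholder value (assumption)
    else:
        raise ValueError(f"Unknown action: {action}")
    return kwargs


# Two entries of the genes dictionary from the previous cell
genes = {
    "pck": "lcl_NC_006958_1_prot_WP_011015446_1_2735",
    "dapB": "lcl_NC_006958_1_prot_WP_011014794_1_1871",
}
for strain, gene, action in STRATEGIES[2:4]:
    print(strain, build_call(gene, action, genes))
```

Each non-empty result could then be passed on as `optimizeModel(model, **kwargs)`.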



In [6]:
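def optimizeModel(model, full_gene_name, fold_change=None, knock_out_gene=False):
    """Apply one genetic modification to a copy of the model and optimize it.

    fold_change scales the bounds of every reaction associated with the gene;
    knock_out_gene knocks the gene out. The original model is not modified.
    """
    # Work on a copy so that the bounds of the original model stay untouched
    new_model = model.copy()

    if fold_change is not None:
        for reaction in new_model.genes.get_by_id(full_gene_name).reactions:
            lower_bound = reaction.lower_bound
            if lower_bound == 0:
                # Force a minimal flux so that the fold change also acts on the lower bound
                lower_bound = 0.001
            reaction.bounds = (fold_change * lower_bound, fold_change * reaction.upper_bound)

    if knock_out_gene:
        new_model.genes.get_by_id(full_gene_name).knock_out()

    # Constrain growth to a minimum rate and maximize flux through DAPDC
    new_model.reactions.Growth.bounds = (0.259193805, 1000)
    new_model.objective = new_model.reactions.DAPDC
    solution = new_model.optimize()

    return new_model, solution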
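
To make the bound arithmetic concrete, the scaling step can be isolated in a small pure function. `scale_bounds` is a hypothetical helper that mirrors the fold-change logic of `optimizeModel`; the example bounds are illustrative and not taken from the model.

```python
def scale_bounds(lower_bound, upper_bound, fold_change, min_flux=0.001):
    """Scale reaction bounds by fold_change, mirroring the loop in optimizeModel."""
    if lower_bound == 0:
        # A zero lower bound is replaced by a small positive flux before scaling
        lower_bound = min_flux
    return (fold_change * lower_bound, fold_change * upper_bound)


# Irreversible reaction: a minimal flux of fold_change * 0.001 is now enforced
print(scale_bounds(0, 1000, 2))
# Reversible reaction: both directions are widened
print(scale_bounds(-1000, 1000, 2))
```

With a zero lower bound the reaction is forced to carry flux, so a fold change on an irreversible reaction does more than relax the upper bound.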


In [7]:
model_2, solution_2 = optimizeModel(model, genes["icd"], knock_out_gene=True)