# Introduction
Now that the metabolites have been renamed, we should ensure that each one has a chemical formula and, where one is missing, assign the correct formula to the compound. This will make later investigation of mass imbalances easier, because metabolites without weights can then be ruled out as the cause of the problems seen.

This will be covered in this notebook. 

In [1]:
import cobra
import pandas as pd
import cameo

In [52]:
#import model
model = cobra.io.read_sbml_model("../model/Beata_model_orig_g-thermo.xml")

We can identify metabolites without weights in two ways: 1) with memote, or 2) with a simple piece of code here.

Memote says that there are no metabolites without a formula. This doesn't mean they have a weight, as some formulas can contain 'R' (rest) groups that complicate calculating a mass. This doesn't have to be a problem as long as the reactions they are involved in are mass balanced. 

As an alternative, I will check here that there are indeed no metabolites without a chemical formula.

In [53]:
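met_wo_formula = []
for met in model.metabolites:
    try:
        formula = met.formula
        # NOTE: the formula is stored as a string (or None), so comparing it
        # with an empty list does not catch an empty string ''.
        if formula == []:
            met_wo_formula.append(met)
    except KeyError:
        met_wo_formula.append(met)
len(met_wo_formula)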

0

So it appears that there are no metabolites without a chemical formula, which is good. Next I will check how many metabolites have no weight associated with them (because of an R group).
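To illustrate why an R group leaves a metabolite without a weight, here is a minimal sketch of how a weight can be derived from a flat formula. It is not cobra's implementation: `ATOMIC_MASS` and `flat_formula_weight` are hypothetical names, and the mass table is deliberately limited to a few elements.

```python
import re

# Hypothetical, incomplete table of standard atomic masses (g/mol).
# The placeholder 'R' is absent on purpose: it has no atomic mass.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "P": 30.974, "S": 32.06}


def flat_formula_weight(formula):
    """Return the weight of a flat formula such as 'C12H22O11'.

    Returns None when no weight can be computed: empty formula,
    polymer-style parentheses, or an element missing from the table.
    """
    if not formula or "(" in formula or ")" in formula:
        return None
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in ATOMIC_MASS:
            return None  # e.g. the generic rest group 'R'
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


weight_sugar = flat_formula_weight("C12H22O11")     # a fully specified formula
weight_generic = flat_formula_weight("C8H14NOS2R")  # contains an R group
```

With this sketch, a fully specified formula yields a number, while any formula containing `R` or a repeating unit yields `None`.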

In [54]:
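met_wo_weight = []
for met in model.metabolites:
    try:
        weight = met.formula_weight
        # a weight of 0 means that no elements were summed (e.g. an empty formula)
        if weight == 0:
            met_wo_weight.append(met)
    except AttributeError:
        # raised when the formula cannot be parsed (see the warnings below)
        met_wo_weight.append(met)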
        


The element 'R' does not appear in the periodic table


invalid formula (has parenthesis) in 'C20H30N6O11PSeR(C5H8O6PR)n'


invalid formula (has parenthesis) in 'C10H18O13P2R2(C5H8O6PR)n'


invalid formula (has parenthesis) in 'C11H16N5O8P(C5H8O6PR)n(C5H8O6PR)n'


invalid formula (has parenthesis) in 'C16H18O2(C5H8)n'


invalid formula (has parenthesis) in 'C25H30N8O10(C5H7NO3)n'


invalid formula (has parenthesis) in 'C14H20O4(C5H8)n'


invalid formula (has parenthesis) in 'C14H18O4(C5H8)n'


invalid formula (has parenthesis) in '(C12H20O10)n'


invalid formula (has parenthesis) in '(C5H8O4)n'



In [55]:
len(met_wo_weight)

32

In [56]:
met_wo_weight

[<Metabolite selmethtrna_c at 0x1e6df09e848>,
 <Metabolite edhyl_e at 0x1e6df0c4448>,
 <Metabolite ps_cho_c at 0x1e6df0c4ac8>,
 <Metabolite smbdhlp_c at 0x1e6df09ed08>,
 <Metabolite lpro_c at 0x1e6df0c4748>,
 <Metabolite alpro_c at 0x1e6df0c4dc8>,
 <Metabolite f1p_e at 0x1e6df09eb08>,
 <Metabolite glu__L_e at 0x1e6d7fad848>,
 <Metabolite lys__L_e at 0x1e6df0cb408>,
 <Metabolite adhlam_e at 0x1e6df0c9cc8>,
 <Metabolite edhyl_c at 0x1e6df0de7c8>,
 <Metabolite RNA_c at 0x1e6df0deec8>,
 <Metabolite dhmtp_c at 0x1e6e0440248>,
 <Metabolite hkmpp_c at 0x1e6e0440c08>,
 <Metabolite myinp_c at 0x1e6e043bc88>,
 <Metabolite menqui_c at 0x1e6dcfcb708>,
 <Metabolite ftethplg_c at 0x1e6dcffd548>,
 <Metabolite mantri_c at 0x1e6dcfdc6c8>,
 <Metabolite meli_c at 0x1e6dcff4248>,
 <Metabolite isomalt_c at 0x1e6dcfe9d88>,
 <Metabolite qh2_c at 0x1e6dcfc02c8>,
 <Metabolite cellulose_c at 0x1e6dcfd2488>,
 <Metabolite treh6p_c at 0x1e6dcfdfd08>,
 <Metabolite ubiquin_c at 0x1e6dcfd26c8>,
 <Metabolite pento_c a

So there are 32 metabolites without a weight, as they contain some form of an R group. 
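Since the warnings above point to more than one cause, it can help to sort the affected formulas by the reason they have no weight. The following is a small sketch under that assumption; `why_no_weight` and its category labels are hypothetical and not part of cobra or memote.

```python
import re


def why_no_weight(formula):
    """Classify why a formula has no computable weight (hypothetical helper)."""
    if not formula:
        return "empty"      # None or '': no formula was assigned at all
    if "(" in formula or ")" in formula:
        return "polymer"    # repeating unit such as '(C5H8O4)n'
    # Split into element symbols so that e.g. 'Rb' is not mistaken for 'R'.
    if "R" in re.findall(r"[A-Z][a-z]?", formula):
        return "r_group"    # generic rest group
    return "ok"


examples = ["", "(C5H8O4)n", "C8H14NOS2R", "C12H22O11"]
reasons = {formula: why_no_weight(formula) for formula in examples}
```

Applied to `met.formula` for each entry in `met_wo_weight`, this would separate metabolites that merely need a formula from those that are genuinely generic or polymeric.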

Some of these metabolites are quite complex, so it makes sense that they have an R group. For others, however, it is less logical, so these can already be inspected and fixed before continuing. The inspection leads to changing the chemical formula or, where necessary, removing the metabolite altogether.

In [57]:
#lys__L_e is only in an exchange reaction and not connected to the cellular compartment, so can be removed.
remove = ["lys__L_e"]
remove_metabolites = [model.metabolites.get_by_id(mid) for mid in remove]
model.remove_metabolites(remove_metabolites)
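Before removing a metabolite on these grounds, the claim that it only occurs in exchange reactions can be checked programmatically. The sketch below assumes objects that expose a `reactions` collection and a `boundary` flag per reaction, which is how I understand cobra's `Metabolite.reactions` and `Reaction.boundary`; the helper name and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def only_in_boundary_reactions(met):
    """True if the metabolite is in at least one reaction and all are boundary reactions."""
    reactions = list(met.reactions)
    return bool(reactions) and all(rxn.boundary for rxn in reactions)


# Stand-in objects that mimic the two attributes used above (not real cobra objects).
exchange = SimpleNamespace(id="EX_demo_e", boundary=True)
transport = SimpleNamespace(id="DEMO_t", boundary=False)
orphan = SimpleNamespace(id="demo_a_e", reactions=[exchange])
connected = SimpleNamespace(id="demo_b_e", reactions=[exchange, transport])

orphan_is_removable = only_in_boundary_reactions(orphan)
connected_is_removable = only_in_boundary_reactions(connected)
```

In the notebook, the same function could be called on a model metabolite before it is passed to `model.remove_metabolites`.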

In [58]:
model.metabolites.meli_c.formula

''

So it seems that there are some compounds without a chemical formula after all. Something went wrong in the original script, and we should add formulas for these. I will do so by exporting the list of metabolites and their formulas, adding the missing formulas manually in Excel, and then reimporting the dataframe to assign the new chemical formulas.
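The empty string returned for `meli_c` suggests why the earlier check reported 0: a formula stored as `''` or `None` never compares equal to `[]`. A more robust check could look like the sketch below. The helper name is hypothetical, and the stand-in objects only mimic the `id` and `formula` attributes of a metabolite.

```python
from types import SimpleNamespace


def metabolites_without_formula(metabolites):
    """Return the metabolites whose formula is None, empty, or only whitespace."""
    return [met for met in metabolites if not (met.formula or "").strip()]


# Stand-in objects for illustration only (not real cobra metabolites).
demo_metabolites = [
    SimpleNamespace(id="demo_a", formula=""),
    SimpleNamespace(id="demo_b", formula=None),
    SimpleNamespace(id="demo_c", formula="C12H22O11"),
]
missing_ids = [met.id for met in metabolites_without_formula(demo_metabolites)]
```

Calling `metabolites_without_formula(model.metabolites)` should then list `meli_c` and any other metabolite in the same state.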

In [59]:
#collect the IDs of the metabolites without a weight that are still in the model
met_wo_weight_ID = [met.id for met in model.metabolites if met in met_wo_weight]
len(met_wo_weight_ID)

31

In [60]:
#collect the current formulas of the same metabolites, in the same order
met_wo_weight_formula = [met.formula for met in model.metabolites if met in met_wo_weight]
len(met_wo_weight_formula)

31

In [61]:
#collect the names of the same metabolites, in the same order
met_wo_weight_name = [met.name for met in model.metabolites if met in met_wo_weight]
len(met_wo_weight_name)

31

In [62]:
mets_wo_weight_df = pd.DataFrame({'Metabolite' : met_wo_weight_ID, 'name' : met_wo_weight_name, 'formula': met_wo_weight_formula})    
mets_wo_weight_df.to_csv('../../Databases/mets_wo_weight.csv', index=False, encoding='utf-8')

In [100]:
#Fix the formulas in the exported file (saved as mets_wo_weight_new.xlsx) and reimport it to assign better formulas and see if it improves mass balance in memote or not!
mets_w_weight = pd.read_excel("../../Databases/mets_wo_weight_new.xlsx")
mets_w_weight

Unnamed: 0,Metabolite,name,formula,formula_new,Notes
0,selmethtrna_c,Selenomethionyl-tRNA(Met),C20H30N6O11PSeR(C5H8O6PR)n,,
1,edhyl_e,Dihydrolipoamide-E,C10H18O13P2R2(C5H8O6PR)n,,Check in the reaction if this should really be...
2,ps_cho_c,Phosphatidylserine,C10H18O13P2R2(C5H8O6PR)n,C8H12NO10P,Name may be misleading here?
3,smbdhlp_c,S-(2-Methylbutanoyl)-dihydrolipoamide-E,C10H18O13P2R2(C5H8O6PR)n,,Check in the reaction if this should really be...
4,lpro_c,Lipoylprotein,C10H18O13P2R2(C5H8O6PR)n,C8H14NOS2R,
5,alpro_c,Phosphatidylserine,C10H18O13P2R2(C5H8O6PR)n,C9H18NOS2R,
6,f1p_e,Phosphatidylserine,C10H18O13P2R2(C5H8O6PR)n,C6H13O9P,
7,glu__L_e,Phosphatidylserine,C10H18O13P2R2(C5H8O6PR)n,C5H8NO4,
8,adhlam_e,S-Acetyldihydrolipoamide-E,C10H18O13P2R2(C5H8O6PR)n,,
9,edhyl_c,Dihydrolipoamide-E,C10H18O13P2R2(C5H8O6PR)n,,


In [98]:
#now, for metabolites in this list, take the new formula, unless there is none.
for met in model.metabolites:
    if met.id in met_wo_weight_ID:
        found = mets_w_weight[mets_w_weight["Metabolite"] == met.id]
        new_formula = found["formula_new"].values[0]
        #skip rows where no new formula was filled in (NaN after import)
        if pd.notna(new_formula):
            try:
                met.formula = new_formula
            except Exception:
                #print the ID so that failed assignments can be checked by hand
                print(met.id)
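The same update can also be expressed as a lookup table from metabolite ID to new formula, which avoids filtering the dataframe once per metabolite. The sketch below uses only the standard library; the in-memory CSV is a stand-in for the edited spreadsheet with a few rows copied from the table above, and `apply_new_formulas` is a hypothetical helper.

```python
import csv
import io
from types import SimpleNamespace

# Stand-in for the edited spreadsheet; rows copied from the table above.
edited = io.StringIO(
    "Metabolite,formula_new\n"
    "selmethtrna_c,\n"
    "ps_cho_c,C8H12NO10P\n"
    "lpro_c,C8H14NOS2R\n"
)

# Keep only the rows where a new formula was actually filled in.
new_formulas = {
    row["Metabolite"]: row["formula_new"].strip()
    for row in csv.DictReader(edited)
    if row["formula_new"].strip()
}


def apply_new_formulas(metabolites, formulas):
    """Overwrite .formula for each metabolite listed in formulas; return the changed IDs."""
    changed = []
    for met in metabolites:
        if met.id in formulas:
            met.formula = formulas[met.id]
            changed.append(met.id)
    return changed


# Stand-in metabolites for illustration only (not real cobra objects).
demo = [SimpleNamespace(id="selmethtrna_c", formula="old"),
        SimpleNamespace(id="ps_cho_c", formula="old")]
changed_ids = apply_new_formulas(demo, new_formulas)
```

Returning the changed IDs makes it easy to confirm afterwards which metabolites were updated and which kept their original formula.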

In [101]:
model.metabolites.epimelbio_c

0,1
Metabolite identifier,epimelbio_c
Name,epimelibiose
Memory address,0x01e6dcb7fe08
Formula,C12H22O11
Compartment,c
In 1 reaction(s),1354


In doing this, I found some wrongly named metabolites, and have fixed that below. 

In [15]:
model.metabolites.alpro_c.name = 'S8-aminomethyldihydrolipoylprotein'

In [23]:
model.metabolites.f1p_e.name = 'D-Fructose_1-phosphate'

In [26]:
model.metabolites.glu__L_e.name='L-Glutamate'

To compare the formula for RNA, for example, I will load the E. coli iML1515 model and see what the formula is there, so that it can be made the same in our model.

In [30]:
iML1515_model = cameo.load_model("iML1515")

Hmm, I cannot find RNA in the iML1515 model. This is something to ask Denis/Nico/Ben about.

In [102]:
#save the model
cobra.io.write_sbml_model(model,"../model/Beata_model_orig_g-thermo.xml")

# Conclusion
Where possible, I have tried to fix the metabolites that do not have a formula associated with them. The result is saved and can be run through memote to check whether anything has become a lot worse. It seems nothing is broken.