### Intermezzo
I observed that the metabolite trnaglu_c in the model should actually be called atp_c (and likewise trnaglu_e should be atp_e). Here I'll quickly fix that.

In [1]:
import cameo
import pandas as pd
import cobra.io
import escher
from cobra import Model, Reaction, Metabolite


In [2]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [123]:
model.metabolites.trnaglu_c.id = 'atp_c'
model.metabolites.trnaglu_e.id = 'atp_e'

In [124]:
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

# Introduction
Since Beata finished her thesis, several interim people have worked on the model, resulting in a different version, previously referred to as 'Matteo's model'. We have previously investigated the differences between the two models, which led to the decision to work with the version Beata had after her PhD.

We saw that some reactions have been added or removed between the models, and some of these changes may make sense. In this notebook, I will re-add the unique reactions from Matteo's version to Beata's version (one by one and cumulatively) and investigate the difference in biomass prediction to determine whether we should keep them in the model.

N.B.: here we treat both Beata's original model and Matteo's model as databases. Because of the changes in IDs, we will compare the two models using the numerical ID system. After the analysis, we will add the reactions back into the working model with the correct IDs (i.e. '../model/g-thermo.xml').

In [5]:
#import the two database models for comparison
matteo = cobra.io.read_sbml_model('../databases/g-thermo-Matteo.xml')

In [6]:
beata = cobra.io.read_sbml_model('../databases/g-thermo-Beata-2015.xml')

## Metabolites
What metabolites are unique between the two models?

In [7]:
#define intersection function

def intersection(lst1, lst2): 
    lst3 = [value for value in lst1 if value in lst2] 
    return lst3                     

In [8]:
#which metabolites do they have in common
common_met = intersection(matteo.metabolites,beata.metabolites)
len(common_met)

843

In [9]:
#which metabolites are in beatas version that are not in the matteo version
unique_met_beata = []
for met in beata.metabolites:
    if met in matteo.metabolites:
        continue
    else:
        unique_met_beata.append(met)

print (len(unique_met_beata))

1436


In [10]:
#which metabolites are in the matteo version that are not in the original beata version?
unique_met_matteo = []
for met in matteo.metabolites:
    if met in beata.metabolites:
        continue
    else:
        unique_met_matteo.append(met)

print (len(unique_met_matteo))

50
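The three comparisons above (common, unique to Beata, unique to Matteo) can also be sketched with Python sets of ID strings. This is a minimal illustration on made-up IDs, not the notebook's actual data; `compare_ids` is a hypothetical helper name.

```python
def compare_ids(ids_a, ids_b):
    """Return (common, only_in_a, only_in_b) as sorted lists of ID strings."""
    set_a, set_b = set(ids_a), set(ids_b)
    common = sorted(set_a & set_b)
    only_a = sorted(set_a - set_b)
    only_b = sorted(set_b - set_a)
    return common, only_a, only_b


# Toy IDs for illustration only (hypothetical, not taken from either model)
toy_beata = ["1_c", "2_c", "3_c", "4_e"]
toy_matteo = ["2_c", "3_c", "5_c"]

common, only_beata, only_matteo = compare_ids(toy_beata, toy_matteo)
print(len(common), len(only_beata), len(only_matteo))
```

With the real models this could be called as `compare_ids([m.id for m in beata.metabolites], [m.id for m in matteo.metabolites])`. As far as I can tell, the `in` test on `model.metabolites` also matches on the ID, which is why the comparison relies on the shared numerical ID system.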


So in Beata's version there are 1436 metabolites that are not in Matteo's version, and there are 50 unique in Matteo's version. 
When Beata's version is run through memote, it is clear that there are >1400 unconnected metabolites (likely from the automatic annotation process). So I will now check which of the unique metabolites are also the unconnected ones, and so can be ignored.
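The next cell hard-codes the list reported by memote. As a rough cross-check, unconnected metabolites can be approximated as those that take part in no reaction. The sketch below assumes objects exposing `id` and `reactions` attributes, as cobrapy `Metabolite` objects do; the stand-in objects are toy data, `find_unconnected` is a hypothetical helper, and memote's own test may use a different definition.

```python
from types import SimpleNamespace


def find_unconnected(metabolites):
    """Return the IDs of metabolites that participate in no reaction."""
    return [met.id for met in metabolites if len(met.reactions) == 0]


# Toy stand-ins for illustration only (hypothetical IDs)
toy_mets = [
    SimpleNamespace(id="a_c", reactions=frozenset({"R1"})),
    SimpleNamespace(id="b_c", reactions=frozenset()),
]
print(find_unconnected(toy_mets))
```

On the real model this would be `find_unconnected(beata.metabolites)`, which avoids copying a long list by hand.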

In [11]:
#make a list of the metabolites memote says are unconnected in the original version
unconnected_memote = ["10598_c","10556_c","10555_c","10551_c","10550_c","660_c","1414_c","647_c","740_c","595_c","143_c","399_c","1227_c","372_c","26_c","173_c","313_c","358_c","1652_c","1653_c","1654_c","2009_c","905_c","1022_c","1855_c","1857_c","1951_c","1460_c","1686_c","2161_c","1650_c","670_c","2185_c","2187_c","2056_c","2057_c","2058_c","775_c","793_c","874_c","835_c","837_c","1300_c","855_c","1105_c","759_c","877_c","845_c","390_c","558_c","483_c","489_c","1878_c","1879_c","1656_c","1657_c","1658_c","1875_c","1712_c","2167_c","2166_c","2165_c","2164_c","2163_c","2162_c","2160_c","2169_c","2168_c","1122_c","1123_c","1120_c","1121_c","1126_c","1127_c","1124_c","1125_c","1128_c","1129_c","2343_c","2342_c","2341_c","2340_c","1838_c","1839_c","1830_c","1831_c","1832_c","1833_c","1834_c","1835_c","1836_c","1837_c","941_c","943_c","1580_c","1581_c","1582_c","1583_c","1584_c","1585_c","1238_c","1587_c","1236_c","1589_c","1230_c","1904_c","1905_c","1906_c","1907_c","1900_c","1901_c","1902_c","1903_c","970_c","1908_c","1909_c","1444_c","1005_c","1059_c","839_c","803_c","1009_c","1457_c","1106_c","1455_c","164_c","1101_c","1102_c","1103_c","764_c","765_c","760_c","761_c","763_c","768_c","384_c","385_c","382_c","542_c","2112_c","2113_c","2110_c","2111_c","2116_c","2117_c","2114_c","2115_c","2118_c","2119_c","2028_c","2029_c","2026_c","2027_c","2024_c","2025_c","2022_c","2023_c","2020_c","2021_c","1603_c","1350_c","1353_c","1352_c","1355_c","1606_c","1130_c","1133_c","1135_c","1134_c","1137_c","1136_c","1139_c","1138_c","1604_c","2338_c","2339_c","2332_c","2333_c","2330_c","2331_c","2336_c","2337_c","2334_c","2335_c","1218_c","1605_c","2246_c","2247_c","2244_c","2245_c","2242_c","1228_c","2240_c","2241_c","1597_c","1596_c","1226_c","1221_c","1220_c","1223_c","2249_c","1913_c","1912_c","1911_c","1910_c","1917_c","1916_c","1915_c","1914_c","1919_c","1918_c","2259_c","2318_c","1827_c","841_c","840_c","842_c","844_c","1829_c","629_c","628_c","625_c","623_c","622_c","713_c","
714_c","717_c","538_c","444_c","442_c","2101_c","2100_c","2103_c","2102_c","2105_c","2104_c","1850_c","1673_c","1854_c","1993_c","1776_c","1775_c","1990_c","2038_c","2035_c","2034_c","2037_c","2036_c","2031_c","2030_c","2033_c","2032_c","1698_c","1699_c","1694_c","1695_c","1696_c","1697_c","1690_c","1691_c","1693_c","1108_c","1109_c","1458_c","1459_c","1456_c","1454_c","1107_c","1452_c","1453_c","1450_c","1451_c","2328_c","2321_c","2320_c","2323_c","2322_c","2325_c","2327_c","2326_c","2255_c","1219_c","2257_c","2256_c","2250_c","1210_c","1211_c","1216_c","1217_c","1612_c","1070_c","1071_c","1073_c","1078_c","1978_c","1676_c","1677_c","1672_c","1670_c","1671_c","853_c","1858_c","856_c","857_c","1678_c","1679_c","1992_c","639_c","636_c","817_c","703_c","700_c","701_c","707_c","704_c","708_c","528_c","2227_c","522_c","521_c","1537_c","2228_c","2229_c","1994_c","1532_c","632_c","2138_c","2139_c","2134_c","2135_c","2136_c","2137_c","2130_c","2131_c","2132_c","2133_c","2248_c","2008_c","2000_c","2001_c","2002_c","2003_c","2004_c","2005_c","2006_c","2007_c","1689_c","1688_c","1683_c","1682_c","1681_c","1680_c","1687_c","1684_c","2314_c","2315_c","1119_c","2317_c","1449_c","1448_c","2312_c","2313_c","1445_c","1112_c","1447_c","1110_c","1117_c","2319_c","1114_c","1207_c","604_c","1205_c","998_c","1203_c","1202_c","1201_c","1200_c","991_c","1209_c","1208_c","2261_c","2262_c","2263_c","2264_c","2267_c","2268_c","2269_c","865_c","1060_c","1067_c","862_c","861_c","860_c","1069_c","1665_c","1664_c","1667_c","1666_c","1661_c","1660_c","1847_c","1662_c","1849_c","1848_c","1669_c","1668_c","1586_c","1588_c","1237_c","739_c","738_c","733_c","732_c","519_c","518_c","285_c","289_c","463_c","462_c","460_c","468_c","191_c","1983_c","2129_c","2128_c","2122_c","2121_c","2120_c","2127_c","2126_c","2017_c","2016_c","2015_c","2014_c","2013_c","2012_c","2011_c","2019_c","2018_c","2303_c","2302_c","2301_c","2300_c","2307_c","2306_c","2305_c","2304_c","1470_c","1471_c","2309_c","2308_c","1474_c"
,"1475_c","1476_c","1477_c","936_c","1751_c","1716_c","1782_c","985_c","1780_c","980_c","981_c","1784_c","1785_c","1788_c","1789_c","1273_c","1276_c","1277_c","1274_c","1275_c","939_c","1278_c","2279_c","2278_c","2276_c","2275_c","2273_c","2272_c","2271_c","2270_c","871_c","873_c","1096_c","1097_c","1094_c","1095_c","1092_c","1093_c","1090_c","1091_c","1651_c","1655_c","1308_c","1309_c","1874_c","1659_c","1876_c","1877_c","1870_c","1871_c","1872_c","1873_c","1896_c","1897_c","1894_c","1895_c","1892_c","1893_c","1890_c","1891_c","1898_c","1899_c","1865_c","728_c","720_c","724_c","725_c","727_c","500_c","506_c","507_c","509_c","291_c","294_c","478_c","474_c","476_c","186_c","180_c","183_c","2062_c","2063_c","2060_c","2061_c","2069_c","1467_c","1466_c","1465_c","1464_c","1463_c","1462_c","1461_c","1141_c","1490_c","1469_c","1468_c","1143_c","1145_c","1495_c","1791_c","1790_c","1793_c","1792_c","1795_c","1794_c","1796_c","1261_c","1260_c","1263_c","1269_c","795_c","2208_c","2209_c","1247_c","2203_c","2200_c","2201_c","2206_c","2207_c","2204_c","2205_c","1675_c","889_c","1088_c","1085_c","1087_c","1086_c","1083_c","1869_c","1868_c","1319_c","1318_c","1649_c","1648_c","1647_c","1862_c","1861_c","1644_c","1867_c","1310_c","1641_c","1885_c","1884_c","1887_c","1886_c","1881_c","1880_c","1883_c","1882_c","1889_c","1888_c","1089_c","1113_c","884_c","887_c","880_c","663_c","662_c","665_c","664_c","1646_c","1635_c","1860_c","1311_c","1866_c","1313_c","319_c","64_c","2070_c","2073_c","2072_c","2075_c","2074_c","2077_c","2076_c","2079_c","2078_c","1412_c","1413_c","1410_c","1411_c","1416_c","1417_c","1415_c","1418_c","1419_c","1772_c","1258_c","1528_c","1529_c","1525_c","1522_c","1253_c","2219_c","2218_c","1737_c","2211_c","2210_c","2213_c","2212_c","2215_c","2214_c","2217_c","2216_c","816_c","1035_c","897_c","893_c","890_c","1328_c","1329_c","1638_c","1639_c","1632_c","1633_c","1630_c","1324_c","1637_c","1326_c","1327_c","811_c","2316_c","1118_c","2310_c","2311_c","1746_c","1747_
c","1744_c","1745_c","1743_c","1740_c","1741_c","1111_c","1748_c","1749_c","1446_c","1441_c","1440_c","1443_c","1442_c","1645_c","279_c","277_c","2094_c","672_c","677_c","674_c","675_c","679_c","418_c","419_c","416_c","417_c","2048_c","2049_c","2044_c","2045_c","2046_c","2047_c","2040_c","2041_c","2042_c","2043_c","999_c","1401_c","1400_c","1403_c","1402_c","1405_c","1404_c","1407_c","1406_c","1409_c","1408_c","1616_c","2198_c","2199_c","2192_c","2193_c","2190_c","2191_c","2196_c","2197_c","2194_c","2195_c","2224_c","2225_c","2226_c","1248_c","2220_c","2221_c","2222_c","2223_c","1535_c","1534_c","1241_c","1536_c","1531_c","1530_c","1533_c","1244_c","1006_c","1335_c","1629_c","1628_c","1621_c","1620_c","1623_c","1622_c","1625_c","1624_c","1627_c","1626_c","1590_c","934_c","1757_c","1756_c","1750_c","1753_c","932_c","1758_c","247_c","246_c","244_c","1345_c","649_c","648_c","643_c","640_c","426_c","425_c","424_c","423_c","429_c","428_c","863_c","338_c","1841_c","1840_c","1843_c","1842_c","1845_c","1844_c","1663_c","1846_c","1702_c","2059_c","1703_c","2053_c","2052_c","2051_c","2050_c","1700_c","1706_c","1707_c","903_c","1435_c","1436_c","1437_c","1430_c","1431_c","1432_c","2189_c","2188_c","2181_c","2180_c","2183_c","2182_c","2184_c","2186_c","1508_c","1509_c","2231_c","2230_c","2237_c","2236_c","2235_c","2234_c","1500_c","1501_c","2239_c","2238_c","1504_c","1505_c","1506_c","1507_c","1290_c","1291_c","1293_c","1346_c","1611_c","1613_c","1348_c","1349_c","1618_c","1619_c","1761_c","924_c","1989_c","1984_c","1985_c","1986_c","1987_c","1980_c","1981_c","1982_c","929_c","1966_c","1967_c","1964_c","1965_c","1962_c","1963_c","1960_c","1961_c","1968_c","1969_c","255_c","2232_c","782_c","780_c","784_c","1538_c","658_c","659_c","651_c","654_c","655_c","656_c","657_c","431_c","432_c","1503_c","434_c","435_c","324_c","326_c","327_c","321_c","322_c","1429_c","1428_c","1423_c","1422_c","1421_c","1420_c","1427_c","1426_c","1425_c","1424_c","1204_c","1517_c","1516_c","1515_c","1514_
c","1513_c","1511_c","1510_c","1519_c","1518_c","2080_c","2081_c","2082_c","2083_c","2084_c","2085_c","2086_c","2087_c","2088_c","2089_c","1614_c","1289_c","1615_c","1286_c","1283_c","1282_c","1281_c","1617_c","1610_c","1351_c","1602_c","1601_c","1600_c","1607_c","1354_c","1357_c","1356_c","1359_c","1358_c","1609_c","1608_c","1197_c","1199_c","1198_c","1674_c","912_c","1999_c","917_c","1779_c","1778_c","1777_c","1991_c","1996_c","1995_c","2243_c","1599_c","1598_c","1975_c","1974_c","1976_c","1971_c","1970_c","1973_c","1972_c","1979_c","1595_c","1594_c","1593_c","1592_c","1591_c","1222_c","682_c","681_c","686_c","689_c","359_c","352_c","354_c","2280_c","2288_c","1760_c","2289_c","920_c","1562_c","1563_c","1560_c","1561_c","1564_c","1565_c","2099_c","2098_c","2097_c","2096_c","2095_c","1828_c","2093_c","2092_c","2091_c","2090_c","1368_c","1369_c","1364_c","1365_c","1366_c","1367_c","1360_c","1361_c","1362_c","1363_c","1168_c","1164_c","1163_c","1386_c","1387_c","1384_c","1385_c","1382_c","1383_c","1380_c","1388_c","1389_c","1709_c","1701_c","1704_c","1705_c","1940_c","1941_c","1943_c","1944_c","1945_c","1946_c","1947_c","1948_c","1949_c","1010_c","233_c","129_c","694_c","696_c","697_c","690_c","691_c","692_c","693_c","343_c","340_c","341_c","347_c","345_c","1524_c","1257_c","1250_c","1523_c","1520_c","1521_c","1642_c","627_c","2158_c","2159_c","2156_c","2157_c","2154_c","2155_c","2152_c","2153_c","2150_c","2151_c","1570_c","1573_c","1572_c","1575_c","1574_c","1577_c","1576_c","1578_c","1379_c","1378_c","1373_c","1372_c","1371_c","1370_c","1377_c","1376_c","1375_c","1374_c","1395_c","1394_c","1397_c","1391_c","1390_c","1393_c","1399_c","1398_c","1719_c","1718_c","1711_c","1710_c","1713_c","972_c","1715_c","1714_c","1717_c","976_c","1732_c","1730_c","1959_c","1958_c","1957_c","1956_c","1955_c","1954_c","1953_c","1952_c","1736_c","951_c","805_c","807_c","1003_c","809_c","757_c","756_c","751_c","750_c","1320_c","1631_c","1636_c","1325_c","1634_c","594_c","592_c","591_c","
373_c","379_c","575_c","574_c","576_c","570_c","572_c","579_c","1317_c","2149_c","2148_c","2145_c","2144_c","2147_c","2146_c","2141_c","2140_c","2143_c","2142_c","1546_c","1547_c","1149_c","1498_c","1499_c","1140_c","1493_c","1142_c","1491_c","1496_c","1497_c","1494_c","1147_c","1478_c","1029_c","1816_c","1817_c","1814_c","1815_c","1812_c","1813_c","1810_c","1811_c","1724_c","1725_c","1726_c","1727_c","1720_c","1722_c","1723_c","1729_c","967_c","965_c","1473_c","1928_c","1929_c","1922_c","1923_c","1920_c","1921_c","1926_c","1927_c","1924_c","1925_c","1030_c","1032_c","748_c","749_c","746_c","747_c","588_c","582_c","584_c","585_c","586_c","587_c","361_c","362_c","366_c","367_c","369_c","562_c","563_c","569_c","1783_c","1781_c","1786_c","1787_c","982_c","1307_c","983_c","1502_c","1479_c","2178_c","2179_c","2170_c","2171_c","2172_c","2173_c","2174_c","2175_c","2176_c","2177_c","2283_c","1559_c","1558_c","2287_c","1553_c","1557_c","2233_c","1154_c","1153_c","1152_c","1489_c","1488_c","1481_c","1480_c","1483_c","1482_c","1485_c","1487_c","1809_c","1808_c","1805_c","1804_c","1807_c","1806_c","1801_c","1803_c","1802_c","957_c","956_c","1731_c","953_c","952_c","1735_c","1739_c","1738_c","1472_c","849_c","1939_c","1938_c","1931_c","1930_c","1933_c","1932_c","1935_c","1934_c","1937_c","1936_c","1028_c","1026_c","829_c","1023_c","1021_c","779_c","778_c","176_c","1098_c","1099_c","1027_c","1064_c","1082_c","1116_c","113_c","1160_c","117_c","1196_c","1206_c","1233_c","1245_c","1246_c","1249_c","125_c","1262_c","1284_c","1285_c","1292_c","1314_c","1321_c","138_c","1392_c","1396_c","1438_c","1439_c","1484_c","1492_c","1552_c","1555_c","1556_c","156_c","1579_c","158_c","162_c","166_c","167_c","1721_c","1728_c","1733_c","1734_c","174_c","1742_c","879_c","178_c","187_c","1942_c","203_c","205_c","206_c","207_c","208_c","223_c","226_c","239_c","252_c","300_c","303_c","312_c","325_c","328_c","331_c","334_c","335_c","337_c","351_c","355_c","365_c","371_c","387_c","409_c","420_c","433_c",
"436_c","443_c","448_c","450_c","459_c","470_c","48_c","484_c","487_c","488_c","49_c","492_c","498_c","501_c","511_c","512_c","515_c","517_c","523_c","533_c","545_c","565_c","566_c","597_c","617_c","619_c","620_c","621_c","635_c","637_c","667_c","676_c","711_c","712_c","716_c","735_c","752_c","773_c","777_c","78_c","799_c","802_c","818_c","827_c","828_c","838_c","850_c","864_c","885_c","894_c","896_c","927_c","948_c","949_c"] 
unconnected_beata  = []
for met in beata.metabolites:
    if met.id in unconnected_memote:
        unconnected_beata.append(met)
    else:
        continue

len(unconnected_beata)

1418

In [12]:
#find the intersect between the unique in original model vs the unconnected ones
len(intersection(unconnected_beata,unique_met_beata))

1399

So we can observe that the vast majority of the unique metabolites are unconnected: 1399 of the 1436 metabolites unique to Beata's model appear in the unconnected list. That leaves 37 metabolites that are unique to Beata's model and connected.
I will gather these into a list for later too.

In [13]:
connected_unique_beata = []
for met in unique_met_beata:
    if met in unconnected_beata:
        continue
    else: 
        connected_unique_beata.append(met)

len(connected_unique_beata)

37

## Reactions
In the same way as for the metabolites, we will look at the differences in reactions between the two models. With this combined information we can decide which reactions/metabolites we should re-add into Beata's model ('../model/g-thermo.xml').

In [14]:
#reactions in common
common_rct = intersection(matteo.reactions,beata.reactions)
len(common_rct)

1115

In [15]:
#reactions unique in the beata model
unique_rct_beata = []
for rct in beata.reactions:
    if rct in matteo.reactions:
        continue
    else:
        unique_rct_beata.append(rct)

print (len(unique_rct_beata))

87


In [16]:
#reactions unique to the matteo model
unique_rct_matteo = []
for rct in matteo.reactions:
    if rct in beata.reactions:
        continue
    else:
        unique_rct_matteo.append(rct)

print (len(unique_rct_matteo))

133


So the models differ by a total of 220 reactions: 87 unique to Beata's model and 133 unique to Matteo's model. When inspecting these reactions, we see that quite a lot of them are transport or exchange reactions. This is a separate issue that Martyn has looked into before, so it will be fixed at a later stage anyway. We should generate lists that show just the metabolic reactions that are unique to each model.

In [17]:
#Make a list excluding the transport and exchange reactions.
unique_rct_beata_metabolic=[]
for rct in unique_rct_beata:
    if rct.id[0] == 'E':
        continue
    elif rct.id[0] =='M': 
        continue
    else:
        unique_rct_beata_metabolic.append(rct)
len(unique_rct_beata_metabolic)

37

In [18]:
# Make a list excluding the transport and exchange reactions, as this is fixed later.
unique_rct_matteo_metabolic=[]
for rct in unique_rct_matteo:
    if rct.id[0] == 'E':
        continue
    elif rct.id[0] =='M': 
        continue
    else:
        unique_rct_matteo_metabolic.append(rct)
len(unique_rct_matteo_metabolic)

35
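The two filtering cells above repeat the same logic. A small helper could remove the duplication. This is a sketch assuming, as the cells above do, that IDs starting with `E` or `M` denote transport or exchange reactions; the function names and toy IDs are hypothetical.

```python
def is_metabolic(reaction_id, skip_prefixes=("E", "M")):
    """Return True unless the ID starts with one of the skipped prefixes."""
    return not reaction_id.startswith(skip_prefixes)


def metabolic_only(reaction_ids):
    """Keep only the IDs that pass is_metabolic."""
    return [rid for rid in reaction_ids if is_metabolic(rid)]


# Toy IDs for illustration only (hypothetical)
print(metabolic_only(["EX_1_e", "M_12", "421", "1011"]))
```

For the real lists, something like `[rct for rct in unique_rct_beata if is_metabolic(rct.id)]` would give the same result as the loop above.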

So we see that there are significantly fewer unique reactions once transport and exchange reactions are excluded: 37 in Beata's model and 35 in Matteo's. We need to determine whether we should re-introduce the unique reactions from Matteo's model and/or remove the reactions that are unique to Beata's model.

We will do this in two ways:

- 1) Add each unique reaction in Matteo's model to Beata's model, one by one or cumulatively, to determine if they 'break' the current biomass prediction.
- 2) Remove each unique reaction from Beata's model, one by one or cumulatively, to determine if they 'break' the current biomass prediction.

With that information we can reflect on each reaction and whether we should add them into Beata's model (the ../model/g-thermo.xml file)
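The screening procedure described above can be sketched as one function covering both the one-by-one and the cumulative case. It assumes a model that works as a context manager and reverts changes on exit, and that offers `add_reactions` and `optimize().objective_value`, as cobrapy models do. The name `screen_additions` and the default thresholds are illustrative choices, not part of any library.

```python
def screen_additions(model, reactions, low=0.770, high=0.778, cumulative=False):
    """Add reactions and collect objective values that fall outside [low, high].

    With cumulative=False each reaction is added on its own and reverted again;
    with cumulative=True the reactions accumulate until the function returns.
    """
    flagged = {}
    with model:  # outer context: every change is reverted at the end
        for rct in reactions:
            if cumulative:
                model.add_reactions([rct])
                biomass = model.optimize().objective_value
            else:
                with model:  # inner context: revert this single addition
                    model.add_reactions([rct])
                    biomass = model.optimize().objective_value
            if biomass < low or biomass > high:
                flagged[rct.id] = biomass
    return flagged
```

A call such as `screen_additions(beata, unique_rct_matteo_metabolic, cumulative=True)` would return a dictionary of reaction IDs and biomass rates instead of printing them, which makes the results easier to reuse. The removal case would follow the same pattern with `remove_reactions`.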

### 1: adding unique reactions to beata's model

In [19]:
#reference biomass prediction of beata's model
beata.optimize().objective_value

0.7765098381803307

In [20]:
#to add unique reactions one by one, and evaluate the impact on biomass:
for rct in unique_rct_matteo_metabolic:
    with beata: #changes are reverted when the block exits
        beata.add_reactions([rct])
        biomass = beata.optimize().objective_value
        if biomass > 0.778 or biomass < 0.770:
            print("Adding reaction ",rct.id, " causes biomass rate of ", biomass)

Adding reaction  421  causes biomass rate of  0.7796168925142639
Adding reaction  1011  causes biomass rate of  0.7826421216647415
Adding reaction  200  causes biomass rate of  0.7827239468481363
Adding reaction  272  causes biomass rate of  0.7824182777062504
Adding reaction  275  causes biomass rate of  0.7809484872287626


Adding each reaction one by one doesn't cause any large shifts in biomass prediction. Now we test to see if it influences the prediction when they are added cumulatively. 

In [21]:
#this will add reactions consecutively (cumulatively); all changes are reverted when the block exits
with beata:
    for rct in unique_rct_matteo_metabolic:
        beata.add_reactions([rct])
        biomass = beata.optimize().objective_value
        if biomass > 0.778 or biomass < 0.770:
            print("After adding reaction ",rct.id, " the biomass rate becomes ", biomass)

After adding reaction  421  the biomass rate becomes  0.7796168925142332
After adding reaction  475  the biomass rate becomes  0.779616892514233
After adding reaction  489  the biomass rate becomes  0.7796222044523642
After adding reaction  619  the biomass rate becomes  0.7796222044523645
After adding reaction  620  the biomass rate becomes  0.7864946023459768
After adding reaction  777  the biomass rate becomes  0.7864946023459773
After adding reaction  790  the biomass rate becomes  0.7864946023459773
After adding reaction  834  the biomass rate becomes  0.7864946023459773
After adding reaction  1009  the biomass rate becomes  0.7864946023459775
After adding reaction  1011  the biomass rate becomes  0.7941280191386647
After adding reaction  1838  the biomass rate becomes  0.7941280191386645
After adding reaction  1839  the biomass rate becomes  0.7941280191386639
After adding reaction  1840  the biomass rate becomes  0.7941280191386643
After adding reaction  200  the biomass rate be

Overall, adding all of the unique reactions from Matteo's version increases the biomass prediction from 0.7765 to 0.8017. This is a fairly small change, and assuming that Matteo added the reactions for a reason, it would make sense to add them after some more manual inspection.
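To put the size of this change in perspective, the relative increase can be computed from the two rounded values quoted above. This is a simple arithmetic sketch; the variable names are illustrative.

```python
reference = 0.7765       # biomass rate of Beata's model before any change
with_additions = 0.8017  # biomass rate after adding Matteo's unique reactions

# Relative change with respect to the reference prediction
relative_change = (with_additions - reference) / reference
print(f"relative change: {relative_change:.1%}")
```

The result is an increase of roughly 3%, which supports reading this as a minor shift in the prediction.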

### 2. Removing unique reactions from beata's model

In [22]:
#remove each reaction one by one
for rct in unique_rct_beata_metabolic:
    with beata:
        beata.remove_reactions([rct]) #remove_reactions expects a list
        biomass = beata.optimize().objective_value
        if biomass > 0.778 or biomass < 0.770:
            print("Removing reaction ",rct.id, " causes biomass rate of ", biomass)


Removing reaction  839  causes biomass rate of  0.7588328156552289
Removing reaction  1081  causes biomass rate of  0.7654971282632246
Removing reaction  1082  causes biomass rate of  0.7654971282632246
Removing reaction  1153  causes biomass rate of  0.7341727171599524


Again, removing single reactions doesn't change biomass drastically. We will now also check what happens when they are all removed.

In [23]:
#this will remove reactions consecutively
with beata:
    for rct in unique_rct_beata_metabolic:
        beata.remove_reactions([rct]) #remove_reactions expects a list
        biomass = beata.optimize().objective_value
        if biomass > 0.778 or biomass < 0.77:
            print("After removing reaction ",rct.id, " the biomass rate becomes ", biomass)
    print (biomass)

After removing reaction  839  the biomass rate becomes  0.756914363456746
After removing reaction  972  the biomass rate becomes  0.7569143634567466
After removing reaction  1010  the biomass rate becomes  0.7569143634567466
After removing reaction  1081  the biomass rate becomes  0.7565557371616403
After removing reaction  1082  the biomass rate becomes  0.7565557371616407
After removing reaction  1153  the biomass rate becomes  0.7133409077988079
After removing reaction  1242  the biomass rate becomes  0.7133409077988063
After removing reaction  1269  the biomass rate becomes  0.7133409077988087
After removing reaction  1286  the biomass rate becomes  0.7133409077988075
After removing reaction  1326  the biomass rate becomes  0.7133409077988083
After removing reaction  1329  the biomass rate becomes  0.7133409077988071
After removing reaction  1331  the biomass rate becomes  0.7133409077988079
After removing reaction  1332  the biomass rate becomes  0.7133409077988043
After removing 

If we remove all of the reactions unique to Beata's model one after another, it also only causes a small discrepancy in biomass. Therefore it can be worth considering removing them from the model as well.

Finally, it is worth testing what happens if we add all unique reactions from Matteo's model AND remove all the unique ones from Beata's model, to see whether this causes the large discrepancy between the two models' biomass predictions.


In [24]:
with beata:
    #first add all reactions unique to matteo's model
    for rct in unique_rct_matteo_metabolic:
        beata.add_reactions([rct])
        biomass = beata.optimize().objective_value
        if biomass > 0.8 or biomass < 0.7:
            print("After adding reaction ",rct.id, " the biomass rate becomes ", biomass)
    #then remove all reactions unique to beata's model
    for rct in unique_rct_beata_metabolic:
        beata.remove_reactions([rct]) #remove_reactions expects a list
        biomass = beata.optimize().objective_value
        if biomass > 0.8 or biomass < 0.7:
            print("After removing reaction ",rct.id, " the biomass rate becomes ", biomass)
    print ("total reactions = ", len(beata.reactions))
    print ("final biomass rate = ", biomass)

After adding reaction  272  the biomass rate becomes  0.8015000508888145
After adding reaction  275  the biomass rate becomes  0.8017345078652987
After adding reaction  284  the biomass rate becomes  0.8017345078652994
After adding reaction  287  the biomass rate becomes  0.8017345078652993
total reactions =  1200
final biomass rate =  0.7493977327103155


So the combination of all changes that bring the model closer to Matteo's version (final biomass rate of 0.749) doesn't recapitulate the difference in biomass prediction either. It may be worth adding the reactions he found back into the model and removing the unique reactions.

However, to make the final decision on which reactions to add back to the model, I need to inspect the reactions individually to be sure nothing strange is added unintentionally. For that, I will export the unique reactions into two dataframes to look at them by hand.

In [25]:
#make a df of the unique reactions in matteo's model
matteo_rct_id = []
matteo_rct_kegg = []
matteo_rct_definition = []
matteo_rct_eq =[]
for rct in unique_rct_matteo_metabolic:
    matteo_rct_id.append(rct.id)
    matteo_rct_kegg.append(rct.name)
    try: 
        matteo_rct_definition.append(rct.notes['DEFINITION'])
    except KeyError:
        matteo_rct_definition.append('--')
    try:
        matteo_rct_eq.append(rct.notes['EQUATION'])
    except KeyError:
        matteo_rct_eq.append('--')

In [26]:
unique_rct_matteo_df = pd.DataFrame({'ID' : matteo_rct_id, 'KEGG' : matteo_rct_kegg, 'definition':matteo_rct_definition, 'equation':matteo_rct_eq})
unique_rct_matteo_df.to_csv('../databases/unique_rct_matteo.csv')
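The list-building loop above can be written more compactly with `dict.get`, which returns a default when a key is missing. The sketch assumes reaction objects with `id`, `name` and a dict-like `notes`, as used in the cell above; the toy reaction and the helper name `reaction_rows` are hypothetical.

```python
from types import SimpleNamespace


def reaction_rows(reactions, missing="--"):
    """Build one dict per reaction, filling absent notes with a placeholder."""
    rows = []
    for rct in reactions:
        rows.append({
            "ID": rct.id,
            "KEGG": rct.name,
            "definition": rct.notes.get("DEFINITION", missing),
            "equation": rct.notes.get("EQUATION", missing),
        })
    return rows


# Toy reaction for illustration only (hypothetical values)
toy_rct = SimpleNamespace(id="421", name="R_toy", notes={"EQUATION": "A <=> B"})
print(reaction_rows([toy_rct]))
```

The resulting list of dicts can be passed directly to `pd.DataFrame(...)` to obtain the same columns as above.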

To facilitate converting the metabolite IDs from Matteo's version to fit the active model, I will make two dataframes:
1) a list of all metabolite names, IDs and KEGG numbers from the working model
2) a list of all metabolite names, IDs and KEGG numbers from Matteo's model.

Then I can merge the two dataframes according to the KEGG numbers to easily convert the metabolites from the unique reactions to match the working model.
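The idea of matching on KEGG numbers can be illustrated with plain dictionaries instead of a dataframe merge. This is a simplified sketch: `map_ids_by_kegg` is a hypothetical helper, the target IDs and the `9999_c` entry are made up, and only the first target ID per KEGG number is kept.

```python
def map_ids_by_kegg(source_rows, target_rows, missing="--"):
    """Map source IDs to target IDs that share the same KEGG number.

    Both arguments are lists of (metabolite_id, kegg_number) pairs.
    """
    kegg_to_target = {}
    for met_id, kegg in target_rows:
        if kegg != missing:
            kegg_to_target.setdefault(kegg, met_id)  # keep the first match
    return {
        met_id: kegg_to_target[kegg]
        for met_id, kegg in source_rows
        if kegg in kegg_to_target
    }


# Toy rows for illustration only; target IDs are hypothetical
toy_matteo = [("424_c", "C00424"), ("3200_c", "C00120"), ("9999_c", "--")]
toy_model = [("toy_a_c", "C00424"), ("toy_b_c", "--")]
print(map_ids_by_kegg(toy_matteo, toy_model))
```

Entries without a KEGG number are skipped on purpose, so that two metabolites are never matched only because both lack an annotation.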

In [27]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [28]:
#make lists for all metabolite names from the working model
model_met_ID = []
model_met_name = []
model_met_kegg = []
for met in model.metabolites:
    model_met_ID.append(met.id)
    model_met_name.append(met.name)
    try: 
        model_met_kegg.append(met.notes['KEGG'])
    except KeyError:
        model_met_kegg.append('--')   

In [29]:
#make into a dataframe
model_met_df = pd.DataFrame({'Model ID' : model_met_ID, 'Model name' : model_met_name, 'Model Kegg':model_met_kegg})
model_met_df[0:5]

Unnamed: 0,Model ID,Model name,Model Kegg
0,pyridoxal_c,Pyridoxal_C4H5N2O3R2,C00030
1,pydx5p_c,Pyridoxal_phosphate_C4H5N2O3R2,C00018
2,co2dam_c,"Cob(II)yrinate-a,c-diamide",C06504
3,adhlam_c,S-Acetyldihydrolipoamide-E,C16255
4,selmethtrna_c,Selenomethionyl-tRNA(Met),C05336


In [30]:
#make same list for the matteo version
matteo_met_ID = []
matteo_met_name = []
matteo_met_kegg = []
for met in matteo.metabolites:
    matteo_met_ID.append(met.id)
    matteo_met_name.append(met.name)
    try: 
        matteo_met_kegg.append(met.notes['KEGG'])
    except KeyError:
        matteo_met_kegg.append('--')   

In [31]:
matteo_met_df = pd.DataFrame({'Model ID' : matteo_met_ID, 'Model name' : matteo_met_name, 'Model Kegg':matteo_met_kegg})
matteo_met_df[0:5]

Unnamed: 0,Model ID,Model name,Model Kegg
0,232_e,HCO3-,C00288
1,317_e,2’-Deoxyuridine 5’-triphosphate,C00460
2,424_c,(S)-Lactaldehyde,C00424
3,3200_c,Biotin,C00120
4,3201_c,Biotinyl-5’-AMP,C05921


## Adding unique reactions to working model
In the code below I take the unique reactions from Matteo's model, convert the metabolite IDs to what they should be in the working model, and then add each reaction to the model. I also copy over the bounds of each reaction and the notes Matteo added, to provide sufficient information in the newly added reactions. Additionally, for the new metabolites I copy over the notes so we can try to automatically modify their IDs later. The same goes for the reactions; the leftover IDs that do not map to BiGG can then be given manual names. This part is covered further down in the notebook.
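The core of the conversion is renaming the metabolites in each reaction's stoichiometry, with a temporary `N_` prefix for anything that cannot be mapped. A minimal sketch on plain dictionaries, independent of cobrapy; `translate_stoichiometry`, the mapping and the toy IDs are hypothetical.

```python
def translate_stoichiometry(stoich, id_map, prefix="N_"):
    """Rename metabolite IDs in a {metabolite_id: coefficient} dict.

    IDs missing from id_map receive the prefix so they can be fixed later.
    Coefficients are summed if two source IDs map to the same target ID.
    """
    translated = {}
    for met_id, coeff in stoich.items():
        new_id = id_map.get(met_id, f"{prefix}{met_id}")
        translated[new_id] = translated.get(new_id, 0) + coeff
    return translated


# Toy reaction for illustration only (hypothetical IDs and mapping)
toy_stoich = {"424_c": -1.0, "327_c": 1.0}
toy_map = {"424_c": "toy_a_c"}
print(translate_stoichiometry(toy_stoich, toy_map))
```

Summing coefficients is a design choice for the case where a cytosolic and an extracellular metabolite share a KEGG number and end up with the same target ID; whether that is the desired behaviour would need checking per reaction.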

In [125]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [33]:
#check the correct amount of reactions are added
len(model.reactions)

1174

In [34]:
for rct in unique_rct_matteo_metabolic:
    stoich = {}
    new_rct = Reaction(f"N_{rct.id}") #temporary ID for now; these IDs are fixed later
    for key in rct.metabolites.keys():
        #look up the KEGG number of the metabolite in matteo's model
        try:
            metabolite_kegg = matteo.metabolites.get_by_id(key.id).notes['KEGG']
        except KeyError:
            metabolite_kegg = '--' #note: '--' also matches model metabolites without a KEGG number
        #find the matching ID in the working model, or fall back to a temporary 'N_' ID
        try:
            metabolite_model_id = model_met_df.loc[model_met_df['Model Kegg'] == metabolite_kegg,'Model ID'].values[0]
        except IndexError:
            metabolite_model_id = f"N_{key.id}"
        value_matteo = rct.metabolites.get(key)
        #the last character of matteo's ID is the compartment (e.g. 'c' or 'e')
        new_met = Metabolite(id=metabolite_model_id, compartment=key.id[-1])
        if new_met.id.startswith('N_'):
            #new metabolite: copy matteo's notes so the ID can be fixed automatically later
            new_met.notes = matteo.metabolites.get_by_id(key.id).notes
            new_met.compartment = 'c'
        stoich[new_met] = value_matteo
    #turn the collected stoichiometry into a reaction that plugs into the model
    new_rct.add_metabolites(stoich)
    new_rct.bounds = rct.bounds #take bounds from matteo's reactions
    new_rct.notes = rct.notes
    model.add_reactions([new_rct])

In [35]:
#there should be 1209 reactions now.
len(model.reactions)
#and it is, yay!

1209

In [36]:
model.compartments

{'c': 'cytosol', 'e': 'extracellular space'}

In [37]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

## Fixing IDs of new reactions and metabolites
After the above, we've added a bunch of new reactions and metabolites to the working model. These were automatically given numerical IDs, which is not very practical. To maintain model consistency, we want to give them BiGG-compliant IDs as much as possible. This part of the notebook covers that for the reaction IDs and metabolite IDs.

#### Metabolites
Here we use the MetaNetX database again to give automatic IDs for the new metabolites that have been added.
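The lookup goes from a KEGG number to a MetaNetX ID and from there to a BiGG ID. A minimal sketch of that chain on a plain list of cross-reference pairs, assuming the `kegg:` and `bigg:` prefixes used in this notebook's copy of chem_xref.tsv. The helper names and toy rows are hypothetical, and the sketch picks the shortest BiGG ID, which may differ from the choice made in the cell further down.

```python
def kegg_to_bigg(xrefs, kegg_id):
    """Return candidate BiGG IDs for a KEGG compound ID, shortest first.

    xrefs is a list of (xref, mnx_id) pairs, e.g. ("kegg:C_TOY", "MNX_TOY").
    """
    wanted = f"kegg:{kegg_id}"
    mnx_ids = {mnx for xref, mnx in xrefs if xref == wanted}
    candidates = [
        xref[len("bigg:"):]
        for xref, mnx in xrefs
        if mnx in mnx_ids and xref.startswith("bigg:")
    ]
    return sorted(candidates, key=lambda bigg_id: (len(bigg_id), bigg_id))


def to_model_id(bigg_id, compartment):
    """Replace '-' by '__' and append the compartment suffix."""
    return f"{bigg_id.replace('-', '__')}_{compartment}"


# Toy cross-reference rows (hypothetical, not real MetaNetX entries)
toy_xrefs = [
    ("kegg:C_TOY", "MNX_TOY"),
    ("bigg:toy-L", "MNX_TOY"),
    ("bigg:toy-L_long", "MNX_TOY"),
    ("bigg:other", "MNX_OTHER"),
]
best = kegg_to_bigg(toy_xrefs, "C_TOY")[0]
print(to_model_id(best, "c"))
```

Returning an empty list when nothing matches lets the caller decide what to do with unmatched metabolites, for instance collecting them for manual naming.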

In [38]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [39]:
new_mets = []
for met in model.metabolites:
    if met.id.startswith('N_'): #temporary IDs given when adding matteo's reactions
        new_mets.append(met)
len(new_mets)

24

In [40]:
# Load database of chemical IDs, taken from MetaNetX
ch_df = pd.read_csv("../../Databases/chem_xref.tsv", sep="\t", skiprows=385)
ch_df.sample(5)

Unnamed: 0,#XREF,MNX_ID,Evidence,Description
1092136,slm:000288285,MNXM438886,reference,"1-O-triacontyl-2-(13Z,16Z-docosadienoyl)-3-hex..."
1528762,deprecated:MNXM32760,MNXM2454,inferred,
1497436,MNXM227243,MNXM227243,identity,
678186,slm:000075588,MNXM645832,reference,PI(8:0_36:4)|Phosphatidylinositol (8:0_36:4)
315998,hmdb:HMDB54094,MNXM140806,identity,"(2R)-3-[(6Z,9Z,12Z,15Z)-octadeca-6,9,12,15-tet..."
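
The hard-coded `skiprows=385` depends on the exact length of the comment header in this particular download of the file. Below is a rough sketch of deriving the value instead. It assumes (consistent with the `#XREF` column name above, but still an assumption about the file layout) that the file starts with a block of `#`-prefixed lines and that the last of them is the column header. The function name `header_skiprows` and the toy file content are made up for illustration.

```python
import io

def header_skiprows(handle):
    """Return how many leading lines to skip so that the last '#' line is read as the header."""
    n_comment = 0
    for line in handle:
        if line.startswith("#"):
            n_comment += 1
        else:
            break
    # keep the last '#' line, since it holds the column names
    return max(n_comment - 1, 0)

# Toy stand-in for the real file: two comment lines, one header line, one data line
toy_file = io.StringIO(
    "#### toy licence line\n"
    "#### toy description line\n"
    "#XREF\tMNX_ID\tEvidence\tDescription\n"
    "kegg:C00002\tMNXM3\tidentity\tATP\n"
)
skip = header_skiprows(toy_file)
print(skip)
```

The returned number could then be passed as `skiprows=` to `pd.read_csv`, after opening the real file in the same way.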


In [41]:
unmatched_BiGG = []
for met in new_mets:
    # construct the string that matches the KEGG compound ID in the cross-reference table
    try:
        kegg_id = "kegg:" + met.notes["KEGG"]
    except KeyError:
        unmatched_BiGG.append(met)
        continue
    # try to find the MetaNetX ID for this compound
    try:
        meta_net_id = ch_df.loc[ch_df["#XREF"] == kegg_id, "MNX_ID"].values[0]
    except IndexError:
        unmatched_BiGG.append(met)
        continue
    # find all entries that have the same MetaNetX ID
    matched_compounds = ch_df[ch_df['MNX_ID'] == meta_net_id]
    try:
        # take one BiGG ID (the last in alphabetical order); if there is none, mark as unmatched
        new_id = (
            matched_compounds[matched_compounds["#XREF"].str.startswith("bigg:")]["#XREF"]
            .str.replace("bigg:", "").sort_values(ascending=False).values[0]
        )
    except IndexError:
        unmatched_BiGG.append(met)
        continue
    # add compartment information
    if met.compartment == "c":
        new_id = new_id + "_c"
    elif met.compartment == "e":
        new_id = new_id + "_e"
    # BiGG-style IDs use a double underscore instead of a hyphen
    new_id = new_id.replace("-", "__")
    # overwrite the model ID; a ValueError means the ID is already taken in the model
    try:
        met.id = new_id
    except ValueError:
        unmatched_BiGG.append(met)
        continue
# number of new metabolites that received a BiGG ID
BiGGID = len(new_mets) - len(unmatched_BiGG)
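
The lookup in the loop above follows the chain KEGG ID → MetaNetX ID → BiGG ID. Here is a small, self-contained sketch of that chain on a toy cross-reference table. The rows are made up to mimic the `#XREF` / `MNX_ID` columns and should not be read as real MetaNetX entries. The function name `kegg_to_bigg` is hypothetical. Unlike the loop above, which takes the last BiGG ID in alphabetical order, this sketch picks the shortest one; that is a design choice for the example, not what the notebook ran.

```python
import pandas as pd

def kegg_to_bigg(xref_df, kegg_id):
    """Map a KEGG compound ID to a BiGG ID via the shared MetaNetX ID, or return None."""
    hits = xref_df.loc[xref_df["#XREF"] == "kegg:" + kegg_id, "MNX_ID"]
    if hits.empty:
        return None  # KEGG ID not present in the table
    mnx_id = hits.iloc[0]
    # all cross-references that point to the same MetaNetX entry
    same_entry = xref_df.loc[xref_df["MNX_ID"] == mnx_id, "#XREF"]
    bigg_ids = [x[len("bigg:"):] for x in same_entry if x.startswith("bigg:")]
    if not bigg_ids:
        return None  # no BiGG cross-reference for this entry
    # shortest ID first, ties broken alphabetically
    return min(bigg_ids, key=lambda s: (len(s), s))

# Toy table (illustrative rows only)
toy_xref = pd.DataFrame({
    "#XREF": ["kegg:C00002", "bigg:atp", "bigg:M_atp", "MNXM3", "kegg:C99999"],
    "MNX_ID": ["MNXM3", "MNXM3", "MNXM3", "MNXM3", "MNXM_TOY"],
})

print(kegg_to_bigg(toy_xref, "C00002"))
print(kegg_to_bigg(toy_xref, "C99999"))
print(kegg_to_bigg(toy_xref, "C00000"))
```

Returning `None` for both failure cases makes it easy to collect unmatched metabolites in one place instead of catching `IndexError` at each step.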

In [42]:
nonfixed_mets = []
for met in model.metabolites:
    # metabolites that still have a numerical 'N_' ID after the automatic matching
    if met.id.startswith('N_'):
        nonfixed_mets.append(met)
len(nonfixed_mets)

14

The 14 metabolites that were not given automatic IDs will get manual IDs below, so that they no longer have numerical IDs.

In [43]:
for met in nonfixed_mets:
    print (met.id, '     ', met.notes['KEGG'])

N_327_c       C20246
N_647_c       C20247
N_10556_c       C15814
N_10555_c       C15810
N_10598_c       C15811
N_10550_c       C15812
N_10551_c       C15813
N_3201_c       C05921
N_200_c       C03044
N_205_c       C00466
N_206_c       C00028
N_220_c       C05183
N_208_c       C05305
N_224_c       C00002


In [44]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

### Adding formulas to new metabolites
The previous step did not add formulas to the new metabolites, so these should be added now.

I'm not sure why they are not lifted from the annotations/notes. To check: are they picked up after an I/O cycle?

In [69]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [78]:
new_mets_ids = []
for met in new_mets:
    new_mets_ids.append(met.id)

In [82]:
for met in model.metabolites:
    if met.id in new_mets_ids:
        try: 
            form = met.notes['FORMULA']
            met.formula = form
        except KeyError:
            print(met.id)
    else: continue
        

N_10556_c
N_10555_c
N_10551_c
N_206_c
N_220_c
N_208_c


These six still don't have a formula; I can add those manually later.

In [85]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

#### Manually fixing IDs of new metabolites

In [107]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [108]:
model.metabolites.N_327_c.name = '2-[(2R,5Z)-2-Carboxy-4-methylthiazol-5(2H)-ylidene]ethyl phosphate'
model.metabolites.N_327_c.id = 'cmtdepp_c'

In [109]:
model.metabolites.N_647_c.name = '2-(2-Carboxy-4-methylthiazol-5-yl)ethyl phosphate'
model.metabolites.N_647_c.id =  'cthzp_c'

In [110]:
model.metabolites.N_10556_c.formula = 'C4H7N2O2SR'
model.metabolites.N_10556_c.name = 'Thiocarboxy-[sulfur-carrier protein]'
model.metabolites.N_10556_c.id =  'tcscp_c'


In [111]:
model.metabolites.N_10555_c.formula = 'C4H7N2O3R'
model.metabolites.N_10555_c.name = '[Sulfur-carrier protein]-Gly-Gly'
model.metabolites.N_10555_c.id =  'scpgg_c'


In [112]:
model.metabolites.N_10598_c.name = '[Enzyme]-cysteine'
model.metabolites.N_10598_c.id =  'enzcys_c'

In [113]:
model.metabolites.N_10550_c.name = '[Enzyme]-S-sulfanylcysteine'
model.metabolites.N_10550_c.id =  'enzscys_c'

In [114]:
model.metabolites.N_10551_c.formula = 'C14H19N7O9PR'
model.metabolites.N_10551_c.name = 'Adenylyl-[sulfur-carrier protein]'
model.metabolites.N_10551_c.id =  'ascp_c'


In [115]:
model.metabolites.N_3201_c.name = 'Biotinyl-5-AMP'
model.metabolites.N_3201_c.id =  'b5amp_c'

In [116]:
model.metabolites.N_200_c.name = '(R,R)-Butane-2,3-diol; (R,R)-2,3-Butanediol'
model.metabolites.N_200_c.id =  'rr23bdo_c'

In [117]:
model.metabolites.N_205_c.name = 'Acetoin'
model.metabolites.N_205_c.id =  'actn_c'

In [118]:
model.metabolites.N_206_c.formula = 'R'
model.metabolites.N_206_c.name = 'Hydrogen-acceptor'
model.metabolites.N_206_c.id =  'hacc_c'


In [119]:
model.metabolites.N_220_c.formula = 'R'
model.metabolites.N_220_c.name = 'Ferrocytochrome b-561'
model.metabolites.N_220_c.id =  'fcytb561_c'


In [120]:
model.metabolites.N_208_c.formula = 'R'
model.metabolites.N_208_c.name = 'Ferricytochrome b-561'
model.metabolites.N_208_c.id =  'fcytb5612_c'


In [121]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')
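
The cell-by-cell renaming above could also be driven by a single table, which keeps all manual fixes in one place. This is a minimal sketch using plain stand-in objects rather than the model itself. The helper name `apply_fixes` and the stand-in objects are hypothetical, and only two of the entries from above are shown. In the notebook, the lookup would go through `model.metabolites.get_by_id(old_id)` instead of the toy dictionary.

```python
from types import SimpleNamespace

# Two of the manual fixes from the cells above, written as a table
manual_fixes = {
    "N_205_c": {"id": "actn_c", "name": "Acetoin"},
    "N_206_c": {"id": "hacc_c", "name": "Hydrogen-acceptor", "formula": "R"},
}

def apply_fixes(mets_by_id, fixes):
    """Set name/formula where given, then the new ID; return {old_id: new_id}."""
    renamed = {}
    for old_id, attrs in fixes.items():
        met = mets_by_id[old_id]
        for attr in ("name", "formula"):
            if attr in attrs:
                setattr(met, attr, attrs[attr])
        # the ID is changed last, as in the cells above
        met.id = attrs["id"]
        renamed[old_id] = met.id
    return renamed

# Stand-in objects that only mimic the attributes used here
toy_mets = {i: SimpleNamespace(id=i, name="", formula=None) for i in manual_fixes}
renamed = apply_fixes(toy_mets, manual_fixes)
print(renamed)
```

Keeping the table separate from the loop also makes it easy to review which old ID maps to which new ID before anything is written back to the SBML file.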

### Reactions
As with the metabolites, we should modify the reaction IDs so that they comply (where possible) with the BiGG database.


In [148]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [161]:
new_rcts = []
for rct in model.reactions:
    # newly added reactions still carry the numerical 'N_' prefix
    if rct.id.startswith('N_'):
        new_rcts.append(rct)
len(new_rcts)

35

In [150]:
rct_df = pd.read_csv("../databases/reac_xref.tsv", sep="\t", skiprows=385)
#NOTE: an extra column header called 'note' was added to line 386 of the file so that the headings line up with the data columns

In [151]:
rct_df[0:5]

Unnamed: 0,#XREF,MNX_ID,Note
0,MNXR01,MNXR01,Synthetic reaction
1,bigg:10FTHF5GLUtl,MNXR94668,1 10fthf5glu@l = 1 10fthf5glu@c
2,bigg:R_10FTHF5GLUtl,MNXR94668,1 10fthf5glu@l = 1 10fthf5glu@c
3,MNXR94668,MNXR94668,
4,bigg:10FTHF5GLUtm,MNXR94668,1 10fthf5glu@c = 1 10fthf5glu@m


In [162]:
new_rcts_id = []
for rct in new_rcts:
    new_rcts_id.append(rct.id)
len(new_rcts_id)

35

In [166]:
# try to change reaction IDs to match BiGG IDs
unmatched_rct = []
for rct in model.reactions:
    if rct.id not in new_rcts_id:
        continue
    # construct the string that matches the KEGG reaction ID in the cross-reference table
    try:
        rct_id = "kegg:" + rct.notes["ID"]
    except KeyError:
        unmatched_rct.append(rct)
        continue
    # try to find the MetaNetX ID for this reaction
    try:
        rct_new_id = rct_df.loc[rct_df["#XREF"] == rct_id, "MNX_ID"].values[0]
    except IndexError:
        unmatched_rct.append(rct)
        continue
    # find all entries that have the same MetaNetX ID
    matched_compounds = rct_df[rct_df['MNX_ID'] == rct_new_id]
    # take the first BiGG ID in alphabetical order; if there is none, mark the reaction as unmatched
    try:
        new_id = (
            matched_compounds[matched_compounds["#XREF"].str.startswith("bigg:")]["#XREF"]
            .str.replace("bigg:", "").sort_values(ascending=True).values[0])
    except IndexError:
        unmatched_rct.append(rct)
        continue
    # BiGG-style IDs use a double underscore instead of a hyphen; always assign the cleaned ID
    new_id = new_id.replace("-", "__")
    try:
        rct.id = new_id
    except ValueError:
        # the ID is already used by another reaction in the model
        unmatched_rct.append(rct)
        continue

In [167]:
len(unmatched_rct)

26
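
A small helper can keep the hyphen replacement and the ID assignment as two separate steps, so that the cleaned ID is always the one that gets assigned. A minimal sketch; the function name `to_model_id` and the example IDs are hypothetical and are not taken from the MetaNetX table:

```python
def to_model_id(bigg_id):
    """Return a BiGG ID with every hyphen replaced by a double underscore."""
    return bigg_id.replace("-", "__")

# Made-up IDs for illustration
examples = ["ACALD", "ALA-Lt", "GLU-D-syn"]
converted = [to_model_id(x) for x in examples]
print(converted)
```

In the matching loop this would be used as `rct.id = to_model_id(new_id)` inside the `try`/`except ValueError` block.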

In [170]:
unmatched_rct

[<Reaction N_7 at 0x2103b7b76c8>,
 <Reaction N_8 at 0x2103b7b7888>,
 <Reaction N_11 at 0x2103b7b7848>,
 <Reaction N_13 at 0x2103b7bd588>,
 <Reaction N_22 at 0x2103b7bda88>,
 <Reaction N_29 at 0x2103b7bdf88>,
 <Reaction N_30 at 0x2103b7c53c8>,
 <Reaction N_228 at 0x2103b7c5948>,
 <Reaction N_340 at 0x2103b7c5ec8>,
 <Reaction N_364 at 0x2103b7c93c8>,
 <Reaction N_421 at 0x2103b7c9d88>,
 <Reaction N_475 at 0x2103b7ce3c8>,
 <Reaction N_620 at 0x2103b7d3948>,
 <Reaction N_777 at 0x2103b7d3dc8>,
 <Reaction N_790 at 0x2103b7d3e88>,
 <Reaction N_1009 at 0x2103b7d8c88>,
 <Reaction N_1838 at 0x2103b7e0208>,
 <Reaction N_1839 at 0x2103b7e0b08>,
 <Reaction N_1840 at 0x2103b7e5408>,
 <Reaction N_200 at 0x2103b7e5bc8>,
 <Reaction N_245 at 0x2103b7e92c8>,
 <Reaction N_248 at 0x2103b7ec3c8>,
 <Reaction N_251 at 0x2103b7ecb08>,
 <Reaction N_252 at 0x2103b7f2588>,
 <Reaction N_284 at 0x2103b7f9a08>,
 <Reaction N_287 at 0x2103b7f9e88>]

In [171]:
rct = model.reactions.N_1009

In [172]:
rct.id in new_rcts_id

True

In [173]:
rct_id = "kegg:"+ rct.notes["ID"]
rct_id

'kegg:R01197'

In [174]:
rct_new_id = rct_df.loc[rct_df["#XREF"] == rct_id,"MNX_ID"].values[0]
rct_new_id

'MNXR106859'

In [175]:
new_id = (
            matched_compounds[matched_compounds["#XREF"].str.startswith("bigg:")]["#XREF"]
            .str.replace("bigg:", "").sort_values(ascending=True).values[0]
              )
new_id

IndexError: index 0 is out of bounds for axis 0 with size 0

In [144]:
rct.id = new_id
rct.id

'M_acetone'

In [45]:
#check biomass is still fine:
model.optimize()

Unnamed: 0,fluxes,reduced_costs
IDPh,20.000000,4.988287e-03
PYRACTT,8.970582,1.734723e-18
CAT,0.000000,-9.896064e-05
PDHam1hi,0.000000,5.421011e-20
CCP,-20.000000,-1.484410e-04
...,...,...
N_265,0.000000,1.287490e-19
N_272,16.893296,-1.201432e-17
N_275,0.000000,-1.355253e-20
N_284,0.000000,1.355253e-20
