# Introduction
In notebook 0 (Comparison between original and newer model), we made the reactions match between the two model versions. The next step is to make the model more intuitive to work with by fixing the IDs and annotations of the metabolites and reactions, in a similar way as done before.

In Beata's version of the model, there is very little information stored about the metabolites. This is greatly improved in the version of Matteo. 

Luckily, the IDs are kept consistent between the two models. Therefore, one can copy the annotations from Matteo's version to Beata's version and then fix the metabolite IDs for every compound from there.

Also, there are many unconnected metabolites in Beata's version of the model, and these can be removed. 

This notebook covers the steps mentioned above.

In [1]:
import cobra
import cameo

In [2]:
import pandas as pd

In [3]:
#import models
model = cobra.io.read_sbml_model("../model/g-thermo.xml")

In [4]:
matteo_model = cobra.io.read_sbml_model("../databases/g-thermo-Matteo.xml")

In [5]:
model.metabolites

[<Metabolite 28_c at 0x2c72cf5bb08>,
 <Metabolite 12_c at 0x2c72cf5b948>,
 <Metabolite 1337_c at 0x2c72cf5bac8>,
 <Metabolite 1864_c at 0x2c72cf5bf48>,
 <Metabolite 1066_c at 0x2c72cf5bc08>,
 <Metabolite 1819_e at 0x2c72cf5f088>,
 <Metabolite 742_c at 0x2c72cf5f248>,
 <Metabolite 1825_c at 0x2c72cf5be88>,
 <Metabolite 653_c at 0x2c72cf5f688>,
 <Metabolite 549_c at 0x2c72cf5f788>,
 <Metabolite 499_e at 0x2c72cf5bd08>,
 <Metabolite 23_e at 0x2c72cf5f9c8>,
 <Metabolite 43_e at 0x2c72cf5fa88>,
 <Metabolite 1864_e at 0x2c72cf5fb88>,
 <Metabolite 1818_c at 0x2c72cf5fc88>,
 <Metabolite 1863_c at 0x2c72cf5b6c8>,
 <Metabolite 1819_c at 0x2c72cf5fd08>,
 <Metabolite 42_c at 0x2c72cf5ffc8>,
 <Metabolite 35_c at 0x2c72cf5fc08>,
 <Metabolite 257_c at 0x2c72cf5ff48>,
 <Metabolite 258_c at 0x2c72cf6e288>,
 <Metabolite 2466_c at 0x2c72cf6e308>,
 <Metabolite 18544_c at 0x2c72cf6e548>,
 <Metabolite 1853_c at 0x2c72cf6e648>,
 <Metabolite 1852_c at 0x2c72cf6e888>,
 <Metabolite 1851_c at 0x2c72cf6e988>,
 <M

In [6]:
matteo_model.metabolites

[<Metabolite 232_e at 0x2c72f390f08>,
 <Metabolite 317_e at 0x2c72f3902c8>,
 <Metabolite 424_c at 0x2c72f39bb48>,
 <Metabolite 3200_c at 0x2c72f3a53c8>,
 <Metabolite 3201_c at 0x2c72f3abe88>,
 <Metabolite 3202_c at 0x2c72f3b6488>,
 <Metabolite 26_c at 0x2c72f3c1508>,
 <Metabolite 3333_c at 0x2c72f3c7fc8>,
 <Metabolite 10598_c at 0x2c72f3cc508>,
 <Metabolite 10550_c at 0x2c72f3cc1c8>,
 <Metabolite 10551_c at 0x2c72f3c16c8>,
 <Metabolite 10556_c at 0x2c72f3dadc8>,
 <Metabolite 10555_c at 0x2c72f3daf88>,
 <Metabolite 327_c at 0x2c72f3dfc08>,
 <Metabolite 647_c at 0x2c72f3e4888>,
 <Metabolite 2202_c at 0x2c72f3ec3c8>,
 <Metabolite 1157_c at 0x2c72f3f1388>,
 <Metabolite 177_c at 0x2c72f3fd508>,
 <Metabolite 287_c at 0x2c72f3fdf48>,
 <Metabolite 267_c at 0x2c72f408148>,
 <Metabolite 58_e at 0x2c72f40b108>,
 <Metabolite 210_c at 0x2c72f40bd08>,
 <Metabolite 1080_c at 0x2c72f414f08>,
 <Metabolite 1256_c at 0x2c72f414688>,
 <Metabolite 318_c at 0x2c72f41b388>,
 <Metabolite 28_c at 0x2c72f421b08

## Removing unconnected metabolites
Memote reports the unconnected metabolites, i.e. metabolites that take part in no reaction. These can simply be removed from the model.

In [7]:
unconnected_met =                     ["10598_c","10556_c","10555_c","10551_c","10550_c","660_c","1414_c","647_c","740_c","595_c","143_c","399_c","1227_c","372_c","26_c","173_c","313_c","358_c","1652_c","1653_c","1654_c","2009_c","905_c","1022_c","1855_c","1857_c","1951_c","1460_c","1686_c","2161_c","1650_c","670_c","2185_c","2187_c","2056_c","2057_c","2058_c","775_c","793_c","874_c","835_c","837_c","1300_c","855_c","1105_c","759_c","877_c","845_c","390_c","558_c","483_c","489_c","1878_c","1879_c","1656_c","1657_c","1658_c","1875_c","1712_c","2167_c","2166_c","2165_c","2164_c","2163_c","2162_c","2160_c","2169_c","2168_c","1122_c","1123_c","1120_c","1121_c","1126_c","1127_c","1124_c","1125_c","1128_c","1129_c","2343_c","2342_c","2341_c","2340_c","1838_c","1839_c","1830_c","1831_c","1832_c","1833_c","1834_c","1835_c","1836_c","1837_c","941_c","943_c","1580_c","1581_c","1582_c","1583_c","1584_c","1585_c","1238_c","1587_c","1236_c","1589_c","1230_c","1904_c","1905_c","1906_c","1907_c","1900_c","1901_c","1902_c","1903_c","970_c","1908_c","1909_c","1444_c","1005_c","1059_c","839_c","803_c","1009_c","1457_c","1106_c","1455_c","164_c","1101_c","1102_c","1103_c","764_c","765_c","760_c","761_c","763_c","768_c","384_c","385_c","382_c","542_c","2112_c","2113_c","2110_c","2111_c","2116_c","2117_c","2114_c","2115_c","2118_c","2119_c","2028_c","2029_c","2026_c","2027_c","2024_c","2025_c","2022_c","2023_c","2020_c","2021_c","1603_c","1350_c","1353_c","1352_c","1355_c","1606_c","1130_c","1133_c","1135_c","1134_c","1137_c","1136_c","1139_c","1138_c","1604_c","2338_c","2339_c","2332_c","2333_c","2330_c","2331_c","2336_c","2337_c","2334_c","2335_c","1218_c","1605_c","2246_c","2247_c","2244_c","2245_c","2242_c","1228_c","2240_c","2241_c","1597_c","1596_c","1226_c","1221_c","1220_c","1223_c","2249_c","1913_c","1912_c","1911_c","1910_c","1917_c","1916_c","1915_c","1914_c","1919_c","1918_c","2259_c","2318_c","1827_c","841_c","840_c","842_c","844_c","1829_c","629_c","628_c","625_c","623_c",
"622_c","713_c","714_c","717_c","538_c","444_c","442_c","2101_c","2100_c","2103_c","2102_c","2105_c","2104_c","1850_c","1673_c","1854_c","1993_c","1776_c","1775_c","1990_c","2038_c","2035_c","2034_c","2037_c","2036_c","2031_c","2030_c","2033_c","2032_c","1698_c","1699_c","1694_c","1695_c","1696_c","1697_c","1690_c","1691_c","1693_c","1108_c","1109_c","1458_c","1459_c","1456_c","1454_c","1107_c","1452_c","1453_c","1450_c","1451_c","2328_c","2321_c","2320_c","2323_c","2322_c","2325_c","2327_c","2326_c","2255_c","1219_c","2257_c","2256_c","2250_c","1210_c","1211_c","1216_c","1217_c","1612_c","1070_c","1071_c","1073_c","1078_c","1978_c","1676_c","1677_c","1672_c","1670_c","1671_c","853_c","1858_c","856_c","857_c","1678_c","1679_c","1992_c","639_c","636_c","817_c","703_c","700_c","701_c","707_c","704_c","708_c","528_c","2227_c","522_c","521_c","1537_c","2228_c","2229_c","1994_c","1532_c","632_c","2138_c","2139_c","2134_c","2135_c","2136_c","2137_c","2130_c","2131_c","2132_c","2133_c","2248_c","2008_c","2000_c","2001_c","2002_c","2003_c","2004_c","2005_c","2006_c","2007_c","1689_c","1688_c","1683_c","1682_c","1681_c","1680_c","1687_c","1684_c","2314_c","2315_c","1119_c","2317_c","1449_c","1448_c","2312_c","2313_c","1445_c","1112_c","1447_c","1110_c","1117_c","2319_c","1114_c","1207_c","604_c","1205_c","998_c","1203_c","1202_c","1201_c","1200_c","991_c","1209_c","1208_c","2261_c","2262_c","2263_c","2264_c","2267_c","2268_c","2269_c","865_c","1060_c","1067_c","862_c","861_c","860_c","1069_c","1665_c","1664_c","1667_c","1666_c","1661_c","1660_c","1847_c","1662_c","1849_c","1848_c","1669_c","1668_c","1586_c","1588_c","1237_c","739_c","738_c","733_c","732_c","519_c","518_c","285_c","289_c","463_c","462_c","460_c","468_c","191_c","1983_c","2129_c","2128_c","2122_c","2121_c","2120_c","2127_c","2126_c","2017_c","2016_c","2015_c","2014_c","2013_c","2012_c","2011_c","2019_c","2018_c","2303_c","2302_c","2301_c","2300_c","2307_c","2306_c","2305_c","2304_c","1470_c","1471_c","2309_c",
"2308_c","1474_c","1475_c","1476_c","1477_c","936_c","1751_c","1716_c","1782_c","985_c","1780_c","980_c","981_c","1784_c","1785_c","1788_c","1789_c","1273_c","1276_c","1277_c","1274_c","1275_c","939_c","1278_c","2279_c","2278_c","2276_c","2275_c","2273_c","2272_c","2271_c","2270_c","871_c","873_c","1096_c","1097_c","1094_c","1095_c","1092_c","1093_c","1090_c","1091_c","1651_c","1655_c","1308_c","1309_c","1874_c","1659_c","1876_c","1877_c","1870_c","1871_c","1872_c","1873_c","1896_c","1897_c","1894_c","1895_c","1892_c","1893_c","1890_c","1891_c","1898_c","1899_c","1865_c","728_c","720_c","724_c","725_c","727_c","500_c","506_c","507_c","509_c","291_c","294_c","478_c","474_c","476_c","186_c","180_c","183_c","2062_c","2063_c","2060_c","2061_c","2069_c","1467_c","1466_c","1465_c","1464_c","1463_c","1462_c","1461_c","1141_c","1490_c","1469_c","1468_c","1143_c","1145_c","1495_c","1791_c","1790_c","1793_c","1792_c","1795_c","1794_c","1796_c","1261_c","1260_c","1263_c","1269_c","795_c","2208_c","2209_c","1247_c","2203_c","2200_c","2201_c","2206_c","2207_c","2204_c","2205_c","1675_c","889_c","1088_c","1085_c","1087_c","1086_c","1083_c","1869_c","1868_c","1319_c","1318_c","1649_c","1648_c","1647_c","1862_c","1861_c","1644_c","1867_c","1310_c","1641_c","1885_c","1884_c","1887_c","1886_c","1881_c","1880_c","1883_c","1882_c","1889_c","1888_c","1089_c","1113_c","884_c","887_c","880_c","663_c","662_c","665_c","664_c","1646_c","1635_c","1860_c","1311_c","1866_c","1313_c","319_c","64_c","2070_c","2073_c","2072_c","2075_c","2074_c","2077_c","2076_c","2079_c","2078_c","1412_c","1413_c","1410_c","1411_c","1416_c","1417_c","1415_c","1418_c","1419_c","1772_c","1258_c","1528_c","1529_c","1525_c","1522_c","1253_c","2219_c","2218_c","1737_c","2211_c","2210_c","2213_c","2212_c","2215_c","2214_c","2217_c","2216_c","816_c","1035_c","897_c","893_c","890_c","1328_c","1329_c","1638_c","1639_c","1632_c","1633_c","1630_c","1324_c","1637_c","1326_c","1327_c","811_c","2316_c","1118_c","2310_c","2311_c
","1746_c","1747_c","1744_c","1745_c","1743_c","1740_c","1741_c","1111_c","1748_c","1749_c","1446_c","1441_c","1440_c","1443_c","1442_c","1645_c","279_c","277_c","2094_c","672_c","677_c","674_c","675_c","679_c","418_c","419_c","416_c","417_c","2048_c","2049_c","2044_c","2045_c","2046_c","2047_c","2040_c","2041_c","2042_c","2043_c","999_c","1401_c","1400_c","1403_c","1402_c","1405_c","1404_c","1407_c","1406_c","1409_c","1408_c","1616_c","2198_c","2199_c","2192_c","2193_c","2190_c","2191_c","2196_c","2197_c","2194_c","2195_c","2224_c","2225_c","2226_c","1248_c","2220_c","2221_c","2222_c","2223_c","1535_c","1534_c","1241_c","1536_c","1531_c","1530_c","1533_c","1244_c","1006_c","1335_c","1629_c","1628_c","1621_c","1620_c","1623_c","1622_c","1625_c","1624_c","1627_c","1626_c","1590_c","934_c","1757_c","1756_c","1750_c","1753_c","932_c","1758_c","247_c","246_c","244_c","1345_c","649_c","648_c","643_c","640_c","426_c","425_c","424_c","423_c","429_c","428_c","863_c","338_c","1841_c","1840_c","1843_c","1842_c","1845_c","1844_c","1663_c","1846_c","1702_c","2059_c","1703_c","2053_c","2052_c","2051_c","2050_c","1700_c","1706_c","1707_c","903_c","1435_c","1436_c","1437_c","1430_c","1431_c","1432_c","2189_c","2188_c","2181_c","2180_c","2183_c","2182_c","2184_c","2186_c","1508_c","1509_c","2231_c","2230_c","2237_c","2236_c","2235_c","2234_c","1500_c","1501_c","2239_c","2238_c","1504_c","1505_c","1506_c","1507_c","1290_c","1291_c","1293_c","1346_c","1611_c","1613_c","1348_c","1349_c","1618_c","1619_c","1761_c","924_c","1989_c","1984_c","1985_c","1986_c","1987_c","1980_c","1981_c","1982_c","929_c","1966_c","1967_c","1964_c","1965_c","1962_c","1963_c","1960_c","1961_c","1968_c","1969_c","255_c","2232_c","782_c","780_c","784_c","1538_c","658_c","659_c","651_c","654_c","655_c","656_c","657_c","431_c","432_c","1503_c","434_c","435_c","324_c","326_c","327_c","321_c","322_c","1429_c","1428_c","1423_c","1422_c","1421_c","1420_c","1427_c","1426_c","1425_c","1424_c","1204_c","1517_c","1516_c
","1515_c","1514_c","1513_c","1511_c","1510_c","1519_c","1518_c","2080_c","2081_c","2082_c","2083_c","2084_c","2085_c","2086_c","2087_c","2088_c","2089_c","1614_c","1289_c","1615_c","1286_c","1283_c","1282_c","1281_c","1617_c","1610_c","1351_c","1602_c","1601_c","1600_c","1607_c","1354_c","1357_c","1356_c","1359_c","1358_c","1609_c","1608_c","1197_c","1199_c","1198_c","1674_c","912_c","1999_c","917_c","1779_c","1778_c","1777_c","1991_c","1996_c","1995_c","2243_c","1599_c","1598_c","1975_c","1974_c","1976_c","1971_c","1970_c","1973_c","1972_c","1979_c","1595_c","1594_c","1593_c","1592_c","1591_c","1222_c","682_c","681_c","686_c","689_c","359_c","352_c","354_c","2280_c","2288_c","1760_c","2289_c","920_c","1562_c","1563_c","1560_c","1561_c","1564_c","1565_c","2099_c","2098_c","2097_c","2096_c","2095_c","1828_c","2093_c","2092_c","2091_c","2090_c","1368_c","1369_c","1364_c","1365_c","1366_c","1367_c","1360_c","1361_c","1362_c","1363_c","1168_c","1164_c","1163_c","1386_c","1387_c","1384_c","1385_c","1382_c","1383_c","1380_c","1388_c","1389_c","1709_c","1701_c","1704_c","1705_c","1940_c","1941_c","1943_c","1944_c","1945_c","1946_c","1947_c","1948_c","1949_c","1010_c","233_c","129_c","694_c","696_c","697_c","690_c","691_c","692_c","693_c","343_c","340_c","341_c","347_c","345_c","1524_c","1257_c","1250_c","1523_c","1520_c","1521_c","1642_c","627_c","2158_c","2159_c","2156_c","2157_c","2154_c","2155_c","2152_c","2153_c","2150_c","2151_c","1570_c","1573_c","1572_c","1575_c","1574_c","1577_c","1576_c","1578_c","1379_c","1378_c","1373_c","1372_c","1371_c","1370_c","1377_c","1376_c","1375_c","1374_c","1395_c","1394_c","1397_c","1391_c","1390_c","1393_c","1399_c","1398_c","1719_c","1718_c","1711_c","1710_c","1713_c","972_c","1715_c","1714_c","1717_c","976_c","1732_c","1730_c","1959_c","1958_c","1957_c","1956_c","1955_c","1954_c","1953_c","1952_c","1736_c","951_c","805_c","807_c","1003_c","809_c","757_c","756_c","751_c","750_c","1320_c","1631_c","1636_c","1325_c","1634_c","594_c",
"592_c","591_c","373_c","379_c","575_c","574_c","576_c","570_c","572_c","579_c","1317_c","2149_c","2148_c","2145_c","2144_c","2147_c","2146_c","2141_c","2140_c","2143_c","2142_c","1546_c","1547_c","1149_c","1498_c","1499_c","1140_c","1493_c","1142_c","1491_c","1496_c","1497_c","1494_c","1147_c","1478_c","1029_c","1816_c","1817_c","1814_c","1815_c","1812_c","1813_c","1810_c","1811_c","1724_c","1725_c","1726_c","1727_c","1720_c","1722_c","1723_c","1729_c","967_c","965_c","1473_c","1928_c","1929_c","1922_c","1923_c","1920_c","1921_c","1926_c","1927_c","1924_c","1925_c","1030_c","1032_c","748_c","749_c","746_c","747_c","588_c","582_c","584_c","585_c","586_c","587_c","361_c","362_c","366_c","367_c","369_c","562_c","563_c","569_c","1783_c","1781_c","1786_c","1787_c","982_c","1307_c","983_c","1502_c","1479_c","2178_c","2179_c","2170_c","2171_c","2172_c","2173_c","2174_c","2175_c","2176_c","2177_c","2283_c","1559_c","1558_c","2287_c","1553_c","1557_c","2233_c","1154_c","1153_c","1152_c","1489_c","1488_c","1481_c","1480_c","1483_c","1482_c","1485_c","1487_c","1809_c","1808_c","1805_c","1804_c","1807_c","1806_c","1801_c","1803_c","1802_c","957_c","956_c","1731_c","953_c","952_c","1735_c","1739_c","1738_c","1472_c","849_c","1939_c","1938_c","1931_c","1930_c","1933_c","1932_c","1935_c","1934_c","1937_c","1936_c","1028_c","1026_c","829_c","1023_c","1021_c","779_c","778_c","176_c","1098_c","1099_c","1027_c","1064_c","1082_c","1116_c","113_c","1160_c","117_c","1196_c","1206_c","1233_c","1245_c","1246_c","1249_c","125_c","1262_c","1284_c","1285_c","1292_c","1314_c","1321_c","138_c","1392_c","1396_c","1438_c","1439_c","1484_c","1492_c","1552_c","1555_c","1556_c","156_c","1579_c","158_c","162_c","166_c","167_c","1721_c","1728_c","1733_c","1734_c","174_c","1742_c","879_c","178_c","187_c","1942_c","203_c","205_c","206_c","207_c","208_c","223_c","226_c","239_c","252_c","300_c","303_c","312_c","325_c","328_c","331_c","334_c","335_c","337_c","351_c","355_c","365_c","371_c","387_c","409_c"
,"420_c","433_c","436_c","443_c","448_c","450_c","459_c","470_c","48_c","484_c","487_c","488_c","49_c","492_c","498_c","501_c","511_c","512_c","515_c","517_c","523_c","533_c","545_c","565_c","566_c","597_c","617_c","619_c","620_c","621_c","635_c","637_c","667_c","676_c","711_c","712_c","716_c","735_c","752_c","773_c","777_c","78_c","799_c","802_c","818_c","827_c","828_c","838_c","850_c","864_c","885_c","894_c","896_c","927_c","948_c","949_c"]
              
len(unconnected_met)

1418

In [8]:
model.metabolites.get_by_id("10598_c")

0,1
Metabolite identifier,10598_c
Name,G10598
Memory address,0x02c72a23dc48
Formula,C16H28N2O11
Compartment,c
In 0 reaction(s),


In [9]:
#remove metabolites
remove_metabolites = [model.metabolites.get_by_id(mid) for mid in unconnected_met]
model.remove_metabolites(remove_metabolites)

len(model.metabolites)

861
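
The hard-coded list from Memote can be cross-checked against the model itself. The following is a minimal sketch, assuming each metabolite exposes a `reactions` collection the way COBRApy's `Metabolite` objects do. The helper name `find_unconnected` and the toy objects are hypothetical and only stand in for the real model.

```python
from types import SimpleNamespace


def find_unconnected(metabolites):
    """Return the IDs of metabolites that take part in no reaction."""
    return [met.id for met in metabolites if len(met.reactions) == 0]


# Hypothetical stand-ins for cobra Metabolite objects: only .id and .reactions are used.
toy_metabolites = [
    SimpleNamespace(id="28_c", reactions={"R1", "R2"}),
    SimpleNamespace(id="10598_c", reactions=set()),
    SimpleNamespace(id="12_c", reactions={"R1"}),
]

orphan_ids = find_unconnected(toy_metabolites)
print(orphan_ids)
```

On the real model this would be called as `find_unconnected(model.metabolites)` before the removal step, and the result compared with `unconnected_met`.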

In [11]:
#save model
cobra.io.write_sbml_model(model,"../model/g-thermo.xml")

## Find out where the numbering between the models goes wrong
I will make a dataframe with all metabolite info for both models and inspect where the numbering goes wrong.

First I need to prepare the lists of each model to be able to compare them


In [14]:
#list of original metabolite IDs.
orig_met_ID =[]
for met in model.metabolites:
    orig_met_ID.append(met.id)
len(orig_met_ID)

861

In [15]:
#list of metabolite IDs in Matteo's model.
new_met_ID =[]
for met in matteo_model.metabolites:
    new_met_ID.append(met.id)
len(new_met_ID)

893

In [16]:
#list of original names
orig_met_name =[]
for met in model.metabolites:
    orig_met_name.append(met.name)
len(orig_met_name)

861

In [17]:
#list of metabolite names in Matteo's model.
new_met_name =[]
for met in matteo_model.metabolites:
    new_met_name.append(met.name)
len(new_met_name)

893

In [18]:
#create a dataframe of each model
df_orig = pd.DataFrame({'Metabolite' : orig_met_ID, 'name orig' : orig_met_name})  
df_matteo = pd.DataFrame({'Metabolite': new_met_ID, 'name new': new_met_name})

In [19]:
#merge the two dataframes per metabolite ID to inspect where the differences lie
df_met = pd.merge(df_orig, df_matteo, on='Metabolite', how='outer')
len(df_met)

930
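
Instead of inspecting the merged table only by eye, the outer merge can also label where each row comes from. The sketch below uses the `indicator=True` option of `pd.merge` on two toy tables. The IDs and names in them are made up, and only the column layout follows `df_orig` and `df_matteo`.

```python
import pandas as pd

# Toy tables with made-up IDs and names; only the column layout follows the notebook.
toy_orig = pd.DataFrame({"Metabolite": ["A_c", "B_c", "C_c"],
                         "name orig": ["name A", "name B", "name C"]})
toy_new = pd.DataFrame({"Metabolite": ["A_c", "B_c", "D_c"],
                        "name new": ["name A", "name B (fixed)", "name D"]})

# indicator=True adds a "_merge" column: left_only, right_only or both
merged = pd.merge(toy_orig, toy_new, on="Metabolite", how="outer", indicator=True)

only_orig = merged.loc[merged["_merge"] == "left_only", "Metabolite"].tolist()
only_new = merged.loc[merged["_merge"] == "right_only", "Metabolite"].tolist()
both = merged[merged["_merge"] == "both"]
name_differs = both.loc[both["name orig"] != both["name new"], "Metabolite"].tolist()

print(only_orig, only_new, name_differs)
```

Applied to the real dataframes, `name_differs` would list the shared IDs whose names disagree between the two models.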

In [20]:
#export the dataframe to csv so I can inspect it

In [21]:
df_met.to_csv('../Databases/Metabolite comparison.csv', index=False, encoding='utf-8')

The numbering is generally consistent between the two models, as we saw earlier. The only discrepancy is metabolite 1825_c. This seems to be something Matteo has fixed, so we will use his later work and annotation for it.

## Copying notes from the Matteo model to Beata's version
In Beata's model, each metabolite has a formula and a charge assigned to it. However, we would also like to have the MNX_ID, ChEBI, KEGG, module and name where possible. The formula and charge assigned by Beata should be kept.

As Matteo's and Beata's models have the same IDs, we can copy the notes over from one to the other.

In [5]:
#copying the desired notes from the matteo model
no_notes_KEGG = []   # in Matteo's model, but without a KEGG note
no_notes_ChEBI = []  # in Matteo's model, but without a ChEBI note
no_notes_met = []    # not present in Matteo's model at all
for met in model.metabolites:
    if met.id not in matteo_model.metabolites:
        no_notes_met.append(met)
        continue
    met_matteo = matteo_model.metabolites.get_by_id(met.id)
    try:
        met.notes["KEGG"] = met_matteo.notes["KEGG"]
    except KeyError:
        no_notes_KEGG.append(met)
    try:
        met.notes["NAME"] = met_matteo.notes["NAME"]
    except KeyError:
        # note: a missing NAME also skips the ChEBI copy for this metabolite
        continue
    try:
        met.notes["ChEBI"] = met_matteo.notes["ChEBI"]
    except KeyError:
        no_notes_ChEBI.append(met)
print(len(no_notes_KEGG))
print(len(no_notes_ChEBI))
print(len(no_notes_met))
        

9
26
37


In [6]:
#save model
cobra.io.write_sbml_model(model,"../model/g-thermo.xml")

So there are 37 metabolites that are not in Matteo's version of the model. Of the metabolites that are, 9 lack a KEGG note and 26 lack a ChEBI note, but that can be okay.
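
The copying step can also be written so that each note key is handled independently, which means a missing NAME does not prevent the ChEBI note from being copied. This is a minimal sketch on plain dictionaries. The helper `copy_notes` is hypothetical and the note values are placeholders, not real annotations.

```python
def copy_notes(target_notes, source_notes, keys=("KEGG", "NAME", "ChEBI")):
    """Copy the given keys from source_notes into target_notes.

    Returns the keys that were absent from source_notes.
    """
    missing = []
    for key in keys:
        if key in source_notes:
            target_notes[key] = source_notes[key]
        else:
            missing.append(key)
    return missing


# Illustrative note dictionaries; the values are placeholders, not real annotations.
beata_notes = {"EXISTING": "kept as is"}
matteo_notes = {"KEGG": "kegg_placeholder", "ChEBI": "chebi_placeholder"}

missing_keys = copy_notes(beata_notes, matteo_notes)
print(missing_keys)
print(beata_notes)
```

With COBRApy objects the call would look like `copy_notes(met.notes, met_matteo.notes)`, and the returned list could be used to fill the `no_notes_*` lists.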

## Automatic modification of IDs
The metabolites are currently named with numbers. Now that we have copied the information we need from Matteo's model, we can change the metabolite IDs to more intuitive and informative IDs.

To do so, we will use the MetaNetX database to couple the KEGG IDs to the MetaNetX ID, and from there find the BiGG ID that relates to the metabolite. This will also standardize many metabolites against the BiGG database.


In [11]:
# Load database of chemical IDs, taken from MetanetX
ch_df = pd.read_csv("../../Databases/chem_xref.tsv", sep="\t", skiprows=385)
ch_df.sample(5)

Unnamed: 0,#XREF,MNX_ID,Evidence,Description
1352451,slm:000421427,MNXM652776,reference,PIP2(22:1/2:0)|Phosphatidylinositol bisphospha...
108859,chebi:27300,MNXM88966,reference,D vitamins|vitamin D
796588,MNXM309730,MNXM309730,identity,
60979,MNXM583541,MNXM583541,identity,
522267,seed:cpd20206,MNXM87078,identity,Thioquinox


In [12]:
unmatched_BiGG = []
for met in model.metabolites:
    # construct the string that matches the KEGG compound ID in the xref table
    try:
        kegg_id = "kegg:" + met.notes["KEGG"]
    except KeyError:
        unmatched_BiGG.append(met)
        continue
    # try to find the MetaNetX ID for this compound
    mnx_hits = ch_df.loc[ch_df["#XREF"] == kegg_id, "MNX_ID"].values
    if len(mnx_hits) == 0:
        unmatched_BiGG.append(met)
        continue
    meta_net_id = mnx_hits[0]
    # find all entries that have the same MetaNetX ID
    matched_compounds = ch_df[ch_df["MNX_ID"] == meta_net_id]
    try:
        # take the BiGG ID that sorts last alphabetically; IndexError means there is none
        new_id = (
            matched_compounds[matched_compounds["#XREF"].str.startswith("bigg:")]["#XREF"]
            .str.replace("bigg:", "").sort_values(ascending=False).values[0]
        )
    except IndexError:
        unmatched_BiGG.append(met)
        continue
    # add compartment information
    if met.compartment == "c":
        new_id = new_id + "_c"
    elif met.compartment == "e":
        new_id = new_id + "_e"
    # replace "-" with "__" in the new ID
    new_id = new_id.replace("-", "__")
    # overwrite the model ID; cobra raises ValueError if the ID is already in use
    try:
        met.id = new_id
    except ValueError:
        unmatched_BiGG.append(met)
        continue
BiGGID = len(model.metabolites) - len(unmatched_BiGG)

In [13]:
len(unmatched_BiGG)

164
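
The lookup chain used above (KEGG ID to MetaNetX ID to BiGG ID) can be isolated in a small function, which makes the two "not found" cases explicit. The sketch below runs on a toy cross-reference table. All IDs in it are invented, the helper `kegg_to_bigg` is hypothetical, and only the column names `#XREF` and `MNX_ID` are taken from `chem_xref.tsv` as loaded in this notebook.

```python
import pandas as pd


def kegg_to_bigg(xref, kegg_id):
    """Return a BiGG ID that shares a MetaNetX ID with the KEGG compound, or None."""
    mnx = xref.loc[xref["#XREF"] == "kegg:" + kegg_id, "MNX_ID"]
    if mnx.empty:
        return None  # KEGG ID not present in the table
    same_mnx = xref.loc[xref["MNX_ID"] == mnx.iloc[0], "#XREF"]
    bigg = same_mnx[same_mnx.str.startswith("bigg:")].str.replace("bigg:", "", regex=False)
    if bigg.empty:
        return None  # no BiGG entry for this MetaNetX ID
    # same choice as in the loop above: the ID that sorts last alphabetically
    return bigg.sort_values(ascending=False).iloc[0]


# Toy table with invented IDs; only the column names follow chem_xref.tsv.
toy_xref = pd.DataFrame({
    "#XREF": ["kegg:C_TOY1", "bigg:abc", "bigg:abd", "chebi:0000", "kegg:C_TOY2"],
    "MNX_ID": ["MNXM_TOY1", "MNXM_TOY1", "MNXM_TOY1", "MNXM_TOY1", "MNXM_TOY2"],
})

print(kegg_to_bigg(toy_xref, "C_TOY1"))
print(kegg_to_bigg(toy_xref, "C_TOY2"))
```

Returning `None` for both failure cases keeps the calling loop simple: a metabolite is added to `unmatched_BiGG` whenever no BiGG ID comes back.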

In [22]:
model.metabolites

[<Metabolite 28_c at 0x215689a9ac8>,
 <Metabolite pydx5p_c at 0x215689a9a48>,
 <Metabolite co2dam_c at 0x215689a9c08>,
 <Metabolite 1864_c at 0x215689a9b48>,
 <Metabolite 1066_c at 0x215689ae1c8>,
 <Metabolite 1819_e at 0x215689ae208>,
 <Metabolite ps_cho_c at 0x215689ae3c8>,
 <Metabolite 1825_c at 0x215689ae308>,
 <Metabolite lpro_c at 0x215689ae808>,
 <Metabolite alpro_c at 0x215666997c8>,
 <Metabolite f1p_e at 0x215689a9f88>,
 <Metabolite glu__L_e at 0x215689aeb08>,
 <Metabolite lys__L_e at 0x215689aebc8>,
 <Metabolite 1864_e at 0x215689aecc8>,
 <Metabolite 1818_c at 0x215689aedc8>,
 <Metabolite 1863_c at 0x215689aeec8>,
 <Metabolite 1819_c at 0x215689bf188>,
 <Metabolite 42_c at 0x215689bf308>,
 <Metabolite dna_c at 0x215689aef48>,
 <Metabolite trdrd_c at 0x215689bf408>,
 <Metabolite trdox_c at 0x215689bf448>,
 <Metabolite 2466_c at 0x215689bf4c8>,
 <Metabolite 18544_c at 0x215689bf708>,
 <Metabolite toctd2eACP_c at 0x215689bfa08>,
 <Metabolite M_3hoctaACP_c at 0x215689bfa48>,
 <Me

In [23]:
model.metabolites.get_by_id("102_c")

0,1
Metabolite identifier,102_c
Name,D-Ribose_5-phosphate_C5H10O8P
Memory address,0x02156ab52408
Formula,C5H10O8P
Compartment,c
In 8 reaction(s),"329, 327, 330, 1504, 325, 463, 326, 328"


In [24]:
#save and commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

In [3]:
model = cobra.io.read_sbml_model("../model/g-thermo.xml")

## Manual modification of metabolite IDs
After trying to automatically modify the metabolite IDs, some metabolites do not contain a KEGG ID or cannot be found in the MetaNetX database. These should therefore be given non-numerical IDs manually or be considered for removal. To do so, I will export the list of unmatched metabolites into a dataframe, manually give them an ID, then re-import the table and modify them further.

In [20]:
#remove these as they are only exported, they are not even taken up.
remove_met = ["1207_e","2328_e","523_e","635_e","654_e","657_e","991_e"]
remove_met_2 = [model.metabolites.get_by_id(mid) for mid in remove_met]
model.remove_metabolites(remove_met_2)

In [9]:
numbers = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]

In [15]:
#generate a list of all metabolites still with a numerical ID
unmatched_BiGG_num = []
for met in model.metabolites:
    if met.id[0] in numbers:
        unmatched_BiGG_num.append(met)
        
len(unmatched_BiGG_num)

162

In [16]:
#create a list of the unmatched_Bigg metabolite names.
unmatched_BiGG_name = [met.name for met in unmatched_BiGG_num]
len(unmatched_BiGG_name)

162

In [19]:
#export to a csv file to manipulate in excel
df = pd.DataFrame({'Metabolite' : [met.id for met in unmatched_BiGG_num], 'name' : unmatched_BiGG_name})
df.to_csv('../Databases/Chebi metabolites.csv', index=False, encoding='utf-8')

In [56]:
#Now sort the Excel file and import it into Python, to change the IDs of the metabolites that were renamed by hand.
chebi_prop_id = pd.read_excel("../databases/Chebi metabolites_newID.xlsx")
chebi_prop_id

Unnamed: 0,Metabolite,name,new_ID
0,10000_c,branched fatty acid,bcfa_c
1,10000_e,C00080,bcfa_e
2,1013_c,UDP-N-acetylmuramoyl-L-alanyl-D-gamma-glutamyl...,uaagmd_c
3,1014_c,"N-Acetyl-beta-D-mannosaminyl-1,4-N-acetyl-D-_g...",abmaagapc_c
4,1014_e,C00080,abmaagapc_e
...,...,...,...
157,961_c,alpha-D-Galactosyl-diphosphoundecaprenol,adgppuc_c
158,961_e,C00080,adgppuc_e
159,968_c,4-Amino-2-methyl-5-phosphomethylpyrimidine_C6H...,ampm_c
160,979_c,2-(Formamido)-N1-(5'-phosphoribosyl)acetamidin...,fpram_c


We need to make sure each metabolite gets a unique ID. Where a proposed ID clashes with an existing one, either give the new metabolite a different ID or fix the ID of the other metabolite.

In [39]:
dextrin = model.metabolites.starch_c
dextrin.id = 'Dextrin'

In [40]:
model.metabolites.get_by_id("hpglu_c").id = "thpglu_c"

In [41]:
model.metabolites.glu__L_c.id = 'glu__DL_c'

In [42]:
model.metabolites.cl_c.id = 'choline_c'
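
Clashes like the ones resolved by hand above can be listed before any ID is changed. This is a minimal sketch, assuming the rename table is available as a dictionary from old ID to proposed new ID. The helper `find_id_conflicts` is hypothetical, and the entries for `9998_c` and `9999_c` are invented to force a clash.

```python
from collections import Counter


def find_id_conflicts(proposed, existing_ids):
    """Report proposed new IDs that would not be unique.

    proposed: dict mapping old ID -> proposed new ID
    existing_ids: IDs already used by metabolites that are not being renamed
    """
    counts = Counter(proposed.values())
    duplicated = sorted(new_id for new_id, n in counts.items() if n > 1)
    already_taken = sorted(set(proposed.values()) & set(existing_ids))
    return duplicated, already_taken


# Hypothetical rename table; "9998_c" and "9999_c" are invented to force clashes.
proposed = {
    "10000_c": "bcfa_c",
    "1013_c": "uaagmd_c",
    "9999_c": "bcfa_c",
    "9998_c": "glu__L_c",
}
existing_ids = {"glu__L_c", "pydx5p_c"}

duplicated, already_taken = find_id_conflicts(proposed, existing_ids)
print(duplicated, already_taken)
```

With the imported table, `proposed` could be built from the `Metabolite` and `new_ID` columns of `chebi_prop_id`, after dropping rows without a new ID.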

While looking into the above, I saw that some metabolites have strange names in Beata's version, so these should also be changed manually. This is done below.

Also, some extracellular metabolites are not connected to anything except exchange, and so these are removed as well. 

In [43]:
#change the name of the 1825_c metabolite
model.metabolites.get_by_id("1825_c").name = 'S-(2-Methylbutanoyl)-dihydrolipoamide-E'

In [44]:
#change name of the 1823 metabolite too
model.metabolites.get_by_id("1823_c").name = 'Enzyme N6-(S-[2-methylpropanoyl]dihydrolipoyl)lysine'

In [45]:
#change name of the 1864_e metabolite too
model.metabolites.get_by_id("1864_e").name = 'S-Acetyldihydrolipoamide-E'

In [46]:
#change the name of all the glycan metabolites too
model.metabolites.get_by_id("2260_c").name = 'manninotriose'
model.metabolites.get_by_id("2265_c").name = 'melibiose'
model.metabolites.get_by_id("2277_c").name = 'cellulose'


In [47]:
model.metabolites.get_by_id("394_e").name = 'alpha-D-Glucose 6-phosphate'

In [48]:
model.metabolites.get_by_id("440_c").name ='Menaquinone'

In [49]:
model.metabolites.get_by_id("82_e").name ='Succinyl-CoA'

In [50]:
model.metabolites.get_by_id("Biomass_e").name ='Biomass'

In [57]:
matched_met = []
for met in model.metabolites:
    found = chebi_prop_id[chebi_prop_id["Metabolite"] == met.id]
    # skip metabolites that are not in the table or have no proposed ID (NaN)
    if found["new_ID"].empty or found["new_ID"].isna().values[0]:
        continue
    try:
        met.id = found["new_ID"].values[0]
        matched_met.append(met.id)
    except ValueError:
        # cobra raises ValueError when the new ID is already in use in the model
        print(met.id, "non unique")
len(matched_met)

3

# Conclusion
All metabolites now have IDs that make their content a bit more insightful. With this in place, we can add the correct annotations to each metabolite based on their IDs. This will be done in another notebook, so we save the model under the same name.

In [60]:
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')