# Metabolic network analysis using *volestipy*

## Dependencies

This document is a ```jupyter notebook```.
To run it, first create a **conda environment** with Python 3.6 or later.
Then enter the conda environment you built and open the notebook with the ```jupyter notebook``` command.

For example, considering that the base environment of ```conda``` includes Python 3.6:

```conda activate```

```jupyter notebook```

Before showing how to use the *volestipy* software, we first need to install all the relevant dependencies.

This demo uses [Anaconda](https://www.anaconda.com/products/individual) which you can download following [these](https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart) instructions.

Furthermore, powerful mathematical optimization solvers such as [Gurobi](https://www.gurobi.com/) are also used. You can install Gurobi by following the steps described [here](https://support.gurobi.com/hc/en-us/articles/360044290292-Installing-Gurobi-for-Python). Keep in mind that you will need a Gurobi license: create a Gurobi user account and then follow the licensing instructions you will find there.

Reading a .mat file requires some extra Python libraries, as it is not a trivial task. For more details, see [this article](https://scipy-cookbook.readthedocs.io/items/Reading_mat_files.html).

Some of the libraries below are installed with commands that run as ```sudo```. For those cells to work, create a file containing **only** your password and replace ```/home/haris/Desktop/running/metabolic_network_pipeline_volestipy/my_project_virtual_env/error.txt``` in each cell with the path to that file.
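As an alternative to the password-file approach, the following is a minimal sketch that checks which modules are already importable and builds a ```pip``` command tied to the interpreter running the notebook, so that ```sudo``` may not be needed at all. The module list and the helper names ```missing_modules``` and ```pip_install_command``` are assumptions made for illustration, not part of the original notebook. Import names and package names can differ, so treat the suggested command as a starting point.

```python
import importlib.util
import sys

# Import names used in this notebook (assumed list; adjust to your setup).
REQUIRED_MODULES = ["numpy", "pandas", "plotnine", "scipy", "tables", "h5py", "gurobipy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_install_command(packages):
    """Build (but do not run) a pip command bound to the running interpreter."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", *packages]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing)
if missing:
    # Only printed here; run it yourself once you have checked the package names.
    print("Suggested command:", " ".join(pip_install_command(missing)))
```

Using ```sys.executable``` keeps the installation inside the active conda environment instead of a system-wide Python.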

In [3]:
import getpass
import os
import sys

sys.version
sys.path

['/home/haris/Documents/GitHub/volesti_fork/volestipy',
 '/home/haris/anaconda3/lib/python37.zip',
 '/home/haris/anaconda3/lib/python3.7',
 '/home/haris/anaconda3/lib/python3.7/lib-dynload',
 '',
 '/home/haris/anaconda3/lib/python3.7/site-packages',
 '/home/haris/anaconda3/lib/python3.7/site-packages/IPython/extensions',
 '/home/haris/.ipython']

In [7]:
# Get the ggplot - oriented Python library
!sudo -H -S pip3 install -t "/home/haris/anaconda3/lib/python3.7/site-packages/" --upgrade pandas plotnine < /home/haris/Desktop/running/metabolic_network_pipeline_volestipy/my_project_virtual_env/error.txt
print("*** The ggplot for Python library has now been installed *** \n\n")

[sudo] password for haris: Collecting pandas
  Using cached pandas-1.1.0-cp36-cp36m-manylinux1_x86_64.whl (10.5 MB)
Collecting plotnine
  Using cached plotnine-0.7.0-py3-none-any.whl (4.4 MB)
Collecting pytz>=2017.2
  Using cached pytz-2020.1-py2.py3-none-any.whl (510 kB)
Collecting python-dateutil>=2.7.3
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting numpy>=1.15.4
  Using cached numpy-1.19.1-cp36-cp36m-manylinux2010_x86_64.whl (14.5 MB)
Collecting mizani>=0.7.1
  Using cached mizani-0.7.1-py3-none-any.whl (62 kB)
Collecting scipy>=1.2.0
  Using cached scipy-1.5.2-cp36-cp36m-manylinux1_x86_64.whl (25.9 MB)
Collecting matplotlib>=3.1.1
  Using cached matplotlib-3.3.0-1-cp36-cp36m-manylinux1_x86_64.whl (11.5 MB)
Collecting descartes>=1.1.0
  Using cached descartes-1.1.0-py3-none-any.whl (5.8 kB)
Collecting statsmodels>=0.11.1
  Using cached statsmodels-0.11.1-cp36-cp36m-manylinux1_x86_64.whl (8.7 MB)
Collecting patsy>=0.5.1
  Using cached patsy-0.5.1-py2.py

In [6]:
# Get the tables library to read .mat files
!sudo -H -S pip3 install tables < /home/haris/Desktop/running/metabolic_network_pipeline_volestipy/my_project_virtual_env/error.txt
print("*** The tables library has now been installed *** \n\n")

*** The tables library has now been installed *** 




In [8]:
# Get the h5py library, needed for .mat files saved by Matlab release 7.3 or later
!sudo -H -S pip install h5py < /home/haris/Desktop/running/metabolic_network_pipeline_volestipy/my_project_virtual_env/error.txt
print("*** The h5py library has now been installed *** \n\n")

[sudo] password for haris: Keyring is skipped due to an exception: org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /tmp/dbus-1PvWkBesDl: Connection refused
*** The h5py library has now been installed *** 




In [9]:
# Get GUROBI through anaconda - You can find more about installing Gurobi here: https://support.gurobi.com/hc/en-us/articles/360044290292-Installing-Gurobi-for-Python
!conda install -y -c gurobi gurobi
print("*** The Gurobi solver library has now been installed *** \n\n")

Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

*** The Gurobi solver library has now been installed *** 




In [None]:
import gurobipy as gp
from gurobipy import GRB

In [None]:
# Quick test that the Gurobi solver is installed correctly

# Create a new model
m = gp.Model("mip1")

print("\n*** Gurobi test has been completed successfully. ***\n")

Now we can import all the necessary libraries.

In [5]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
from plotnine import *
%matplotlib inline



In [11]:
# .mat files created with Matlab up to version 7.1 can be read using the mio module of scipy.io.
from scipy.io import loadmat 

# Beginning at release 7.3 of Matlab, mat files are actually saved using the HDF5 format by default (except if you use the -vX flag at save time, see in Matlab). These files can be read in Python using, for instance, the PyTables or h5py package
import tables 
import h5py

## Read your network file

The following command is recommended for ```.mat``` files created with Matlab up to version 7.1.

In [12]:
mat_file_with_loadmat = loadmat('/home/haris/Downloads/e_coli_core.mat')

The next command is meant for ```.mat``` files created by Matlab release 7.3 or later.
It will not run in any other case; the file used here is in an older format, so the command fails with the error shown below.
For this demo we will use the ```loadmat``` option.

In [13]:
# mat_file_with_tables = h5py.File('/home/haris/Downloads/e_coli_core.mat')

OSError: Unable to create file (unable to open file: name = '/home/haris/Downloads/e_coli_core.mat', errno = 17, error message = 'File exists', flags = 15, o_flags = c2)
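To avoid this kind of failure, one option is to look at the text header of the file before choosing a reader. The sketch below is a hedged illustration: the helper name ```mat_file_format``` is hypothetical, and it assumes that files saved in the 7.3 format start with the text ```MATLAB 7.3 MAT-file``` while older version 5 style files start with ```MATLAB 5.0 MAT-file``` (as the ```__header__``` of this demo file does). The demo writes a small stand-in file rather than touching the real network file.

```python
import os
import tempfile


def mat_file_format(path):
    """Guess the MAT-file flavour from the text at the start of the file."""
    with open(path, "rb") as handle:
        header = handle.read(116)
    text = header.decode("ascii", errors="replace")
    if text.startswith("MATLAB 7.3 MAT-file"):
        return "hdf5"     # candidate for h5py or PyTables
    if text.startswith("MATLAB 5.0 MAT-file"):
        return "v5"       # candidate for scipy.io.loadmat
    return "unknown"      # anything else, e.g. files without this text header


# Stand-in file containing only a version 5 style header (not a real model).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.mat")
    with open(demo_path, "wb") as handle:
        handle.write(b"MATLAB 5.0 MAT-file Platform: posix".ljust(128, b" "))
    demo_format = mat_file_format(demo_path)

print(demo_format)
```

With such a check in place, the ```h5py``` branch would only be attempted when the result is ```hdf5```.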

Now you can see your metabolic network. 

In [14]:
print(mat_file_with_loadmat)

{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Fri Nov  1 08:18:51 2019', '__version__': '1.0', '__globals__': [], 'e_coli_core': array([[(array([[array(['glc__D_e'], dtype='<U8')],
       [array(['gln__L_c'], dtype='<U8')],
       [array(['gln__L_e'], dtype='<U8')],
       [array(['glu__L_c'], dtype='<U8')],
       [array(['glu__L_e'], dtype='<U8')],
       [array(['glx_c'], dtype='<U5')],
       [array(['h2o_c'], dtype='<U5')],
       [array(['h2o_e'], dtype='<U5')],
       [array(['h_c'], dtype='<U3')],
       [array(['h_e'], dtype='<U3')],
       [array(['icit_c'], dtype='<U6')],
       [array(['lac__D_c'], dtype='<U8')],
       [array(['lac__D_e'], dtype='<U8')],
       [array(['mal__L_c'], dtype='<U8')],
       [array(['mal__L_e'], dtype='<U8')],
       [array(['nad_c'], dtype='<U5')],
       [array(['nadh_c'], dtype='<U6')],
       [array(['nadp_c'], dtype='<U6')],
       [array(['nadph_c'], dtype='<U7')],
       [array(['nh4_c'], dtype='<U5')],
       [array(

In [18]:
data_from_mat = mat_file_with_loadmat
print("the data type of the variable with the network as it was read is: " + str(type(data_from_mat)))

top_level_keys = data_from_mat.keys()
print("\nthe keys of this dictionaries are: ")
print(top_level_keys)

e_coli_np_void = data_from_mat['e_coli_core'][0][0]
print("\nHowever, if we keep the key:value pair of this dictionary, called 'e_coli_core' where the necessary \
information is located, we can see that its type is: " + str(type(e_coli_np_void)) + "\n\n")

print("number of dimensions of the np.void data type equals to:" + str(e_coli_np_void.ndim) + "\n")

print(type(e_coli_np_void))
print(len(e_coli_np_void))



# metabolites
print(type(e_coli_np_void[0]))
print(e_coli_np_void[0].shape)

print(type(e_coli_np_void[0][0]))
print(e_coli_np_void[0][0].shape)

print(e_coli_np_void[0][:3,])
metabolites = [item[0][0] for item in e_coli_np_void[0]]
print(metabolites)

# genes
print(type(e_coli_np_void[4]))
print(e_coli_np_void[4].shape)
print(e_coli_np_void[4][:3,])

# reactions
print(type(e_coli_np_void[7]))
print(e_coli_np_void[7].shape)
print(e_coli_np_void[7][:3,])



# print(type(e_coli_np_void[5]))
# print(e_coli_np_void[5].ndim)
# print(e_coli_np_void[5][:3,])

# print(type(e_coli_np_void[6]))
# print(e_coli_np_void[6].ndim)
# print(e_coli_np_void[6][:3,])



# print(type(e_coli_np_void[8]))
# print(e_coli_np_void[8].ndim)
# print(e_coli_np_void[8][:3,])

# print(type(e_coli_np_void[9]))
# print(e_coli_np_void[9].ndim)
# print(e_coli_np_void[9][:3,])

# print(type(e_coli_np_void[25]))
# print(e_coli_np_void[25].ndim)
# print(e_coli_np_void[25][:3,])


the data type of the variable with the network as it was read is: <class 'dict'>

the keys of this dictionaries are: 
dict_keys(['__header__', '__version__', '__globals__', 'e_coli_core'])

However, if we keep the key:value pair of this dictionary, called 'e_coli_core' where the necessary information is located, we can see that its type is: <class 'numpy.void'>


number of dimensions of the np.void data type equals to:0

<class 'numpy.void'>
17
<class 'numpy.ndarray'>
(72, 1)
<class 'numpy.ndarray'>
(1,)
[[array(['glc__D_e'], dtype='<U8')]
 [array(['gln__L_c'], dtype='<U8')]
 [array(['gln__L_e'], dtype='<U8')]]
['glc__D_e', 'gln__L_c', 'gln__L_e', 'glu__L_c', 'glu__L_e', 'glx_c', 'h2o_c', 'h2o_e', 'h_c', 'h_e', 'icit_c', 'lac__D_c', 'lac__D_e', 'mal__L_c', 'mal__L_e', 'nad_c', 'nadh_c', 'nadp_c', 'nadph_c', 'nh4_c', '13dpg_c', 'nh4_e', 'o2_c', '2pg_c', 'o2_e', '3pg_c', 'oaa_c', 'pep_c', '6pgc_c', 'pi_c', '6pgl_c', 'pi_e', 'ac_c', 'pyr_c', 'pyr_e', 'q8_c', 'q8h2_c', 'r5p_c', 'ru5p__D_c'
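The cell above reaches metabolites, genes and reactions through the positional indices 0, 4 and 7, which are easy to mix up. Since the record is a NumPy structured value, its fields can also be listed and accessed by name through ```dtype.names```. The sketch below shows the idea on a small hand-built stand-in; the field names ```mets``` and ```rxns``` and the helper names are assumptions for illustration, so check ```e_coli_np_void.dtype.names``` for the names your file actually uses.

```python
import numpy as np


def struct_fields(record):
    """Return {field_name: value} for a NumPy structured record (np.void)."""
    return {name: record[name] for name in record.dtype.names}


def flatten_names(cell_column):
    """Turn an (n, 1) object array of 1-element string arrays into a list of str."""
    return [str(item[0][0]) for item in cell_column]


# Hand-built stand-in mimicking the nesting seen above (hypothetical field names).
mets = np.empty((2, 1), dtype=object)
mets[0, 0] = np.array(["glc__D_e"])
mets[1, 0] = np.array(["gln__L_c"])

toy = np.empty((1, 1), dtype=[("mets", object), ("rxns", object)])
toy["mets"][0, 0] = mets

record = toy[0, 0]                 # same access pattern as [0][0] above
fields = struct_fields(record)
metabolite_ids = flatten_names(fields["mets"])

print(sorted(fields))
print(metabolite_ids)
```

Accessing fields by name keeps the code readable and avoids silent errors if the field order differs between files.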

In [None]:
counter = 0


# for entry in data_from_mat['e_coli_core'][0][0]:
#     print(entry[0])
#     counter += 1
#     print(counter)

In [None]:
for key,value in data_from_mat.items():
    print(str(key) + "\t" + str(value))
    print("\n\n\n\n\n\n")

## Preprocess

## Full dimensional (not always required step)

## Rounding

## Sampling

## Analysis, plots etc.

In [None]:
import sys
sys.version_info