# MFA using INCA in MATLAB - now all in Python!

This is an example notebook that uses functions which write a MATLAB script that runs an MFA analysis with INCA. INCA is a MATLAB-based software package. To make things easier for users, MATLAB can be run from within this notebook using the MATLAB engine for Python. See the instructions below on how to use this notebook.

## Prerequisites

#### MATLAB

Get a free academic licence and install MATLAB from https://www.mathworks.com. Then, install the engine API following the guide provided [under this link](https://www.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html). In short, go to your MATLAB root folder (find your installation and open that folder), change into `/extern/engines/python`, and run `python setup.py install` from the command line.
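
After installing the engine, a quick check from Python can confirm that it is importable before running the rest of the notebook. The following is a minimal sketch using only the standard library; the helper name `matlab_engine_available` is made up for this example and is not part of BFAIR or MATLAB.

```python
import importlib.util


def matlab_engine_available() -> bool:
    """Return True if the MATLAB Engine API for Python can be found."""
    try:
        # find_spec imports the parent package "matlab" to locate "matlab.engine".
        return importlib.util.find_spec("matlab.engine") is not None
    except ImportError:
        # Raised when the parent package "matlab" is not installed at all.
        return False


print("MATLAB engine installed:", matlab_engine_available())
```

If this prints `False`, the engine is not visible to the Python environment the notebook is running in.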

#### INCA

"INCA (Isotopomer Network Compartmental Analysis) is a MATLAB-based software package for isotopomer network modeling and metabolic flux analysis." You can read more about it in [Young, 2014](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998137/pdf/btu015.pdf).

You have to get a free academic licence for INCA from [the Vanderbilt University website](http://mfa.vueinnovations.com/licensing) (the second option is the relevant one) and install it. Note the path to the base directory of your INCA installation; you will need it later.

#### Import customized functions for INCA utilization

In [44]:
import pandas as pd
import pathlib
from BFAIR.mfa.INCA.INCAScript import INCAScript
from BFAIR.mfa.INCA.INCAScript_writing import (
    define_reactions, define_tracers, define_ms_data, define_flux_measurements,
    define_experiment, define_options, define_model, define_runner
)
import ast
from BFAIR.mfa.INCA.run_inca import run_inca
data_folder = pathlib.Path("../../tests/test_data/MFA_modelInputsData/simple_model/")

To illustrate how to use the INCA parser, we will use a small toy model with 5 reactions, originally used as a test model in [1,2]. As described in the Input data notebook, the INCA parser takes its inputs as pandas dataframes, which have to follow specific schemas. Let's have a look at the reactions data of our simple model.

In [45]:
reactions_data = pd.read_csv(data_folder / "reactions.csv")
reactions_data.head()

Unnamed: 0,model,rxn_id,rxn_eqn
0,simple_model,R1,A (abc) -> B (abc)
1,simple_model,R2,B (abc) <-> D (abc)
2,simple_model,R3,B (abc) -> C (bc) + E (a)
3,simple_model,R4,B (abc) + C (de) -> D (bcd) + E (a) + E (e)
4,simple_model,R5,D (abc) -> F (abc)
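
The letters in parentheses are the atom maps: every atom label on the left-hand side of an equation should reappear exactly once on the right-hand side. As an illustration, the sketch below checks this for the five equations shown above. The function `atom_map_is_balanced` is a hypothetical helper written for this example, not a BFAIR or INCA function, and it assumes atom labels are single lowercase letters.

```python
import re
from collections import Counter


def atom_map_is_balanced(equation: str) -> bool:
    """Check that both sides of a reaction carry the same atom labels."""
    # Reversible reactions use '<->', irreversible ones use '->'.
    left, right = re.split(r"\s*<?->\s*", equation, maxsplit=1)

    def atoms(side: str) -> Counter:
        # Collect every letter found inside parentheses, e.g. '(abc)' gives a, b, c.
        return Counter("".join(re.findall(r"\(([a-z]+)\)", side)))

    return atoms(left) == atoms(right)


equations = [
    "A (abc) -> B (abc)",
    "B (abc) <-> D (abc)",
    "B (abc) -> C (bc) + E (a)",
    "B (abc) + C (de) -> D (bcd) + E (a) + E (e)",
    "D (abc) -> F (abc)",
]
for equation in equations:
    print(equation, "balanced:", atom_map_is_balanced(equation))
```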


We see that the simple toy model consists of 5 reactions, each defined with an atom map and a unique identifier. Let's move on to the tracer data.

In [46]:
tracers_data = pd.read_csv(data_folder / "tracers.csv", converters={'atom_ids': ast.literal_eval, 'atom_mdv':ast.literal_eval}) # remove id, add purity
tracers_data.head()

Unnamed: 0,experiment_id,met_id,tracer_id,atom_ids,ratio,atom_mdv,enrichment
0,exp1,A,[1-13C]A,[1],1.0,"[0, 1]",1
1,exp2,A,"[1,2-13C]A","[1, 2]",0.5,"[0.05, 0.95]",1


Our data set contains two experiments, each carried out with a differently labelled substrate of metabolite A. In this data set we have measurements of exchange fluxes and MS data for some metabolites. Notice that we use the `converters` argument to properly read some of the data (see XX for more information).

In [47]:
flux_data = pd.read_csv(data_folder / "measuredFluxes.csv")
ms_data = pd.read_csv(data_folder / "experimentalMS_data_multiple_experiments.csv", 
   converters={'labelled_atom_ids': ast.literal_eval, 'idv': ast.literal_eval, 'idv_std_error': ast.literal_eval}
)
flux_data.head()

Unnamed: 0,experiment_id,rxn_id,flux,flux_std_error
0,exp1,R1,10.0,1e-05


In [48]:
ms_data.head()

Unnamed: 0,experiment_id,time,met_id,ms_id,unlabelled_atoms,labelled_atom_ids,idv,idv_std_error
0,exp1,0,F,F1,C3H5O1,"[1, 2, 3]","[0.01, 0.8, 0.1, 0.0009]","[0.0003, 0.003, 0.0008, 0.001]"
1,exp2,0,F,F1,C3H5O1,"[1, 2, 3]","[0.5, 0.45, 0.05, 0.0]","[3e-05, 0.0027, 0.03, 0.001]"
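
The `converters` argument is needed because list-valued cells such as `atom_ids` or `idv` are stored in the CSV files as text. The sketch below shows the effect on a single row, using only the standard library `csv` reader rather than pandas. The in-memory CSV content is copied from the tracer table above.

```python
import ast
import csv
import io

# One row of the tracer table, stored the way it appears in the CSV file.
raw = io.StringIO(
    "experiment_id,met_id,tracer_id,atom_ids,ratio,atom_mdv,enrichment\n"
    'exp2,A,"[1,2-13C]A","[1, 2]",0.5,"[0.05, 0.95]",1\n'
)
row = next(csv.DictReader(raw))

# Without conversion the cell is plain text, not a list.
print(type(row["atom_ids"]))  # <class 'str'>

# ast.literal_eval safely turns the text into the Python literal it describes.
atom_ids = ast.literal_eval(row["atom_ids"])
atom_mdv = ast.literal_eval(row["atom_mdv"])
print(atom_ids, atom_mdv)  # [1, 2] [0.05, 0.95]
```

Passing `ast.literal_eval` as a converter to `pd.read_csv` applies the same conversion to every cell of the chosen column.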


## Generate the MATLAB script
To prepare a model and data for 13C-MFA in INCA, we need to write a MATLAB script which specifies the model, the tracers and the measured data. The INCA parser contains a workflow which can generate this script from the input files. The script is built piece by piece by adding lines of code to the `INCAScript` object. When the INCA parser runs INCA, it simply executes this script in MATLAB. To ensure that the MATLAB code is executed in the correct order, we have organized the `INCAScript` object into code blocks. During the workflow, the user populates these code blocks one by one until the script is complete. The main procedure uses the `.add_to_block()` method of the `INCAScript` object together with a function which creates a string of MATLAB code.

Let's have a look at how to define the model in the `INCAScript`. For this we will use the script-writer function `define_reactions()`. This function takes a dataframe that describes the reactions and returns a string of MATLAB code that defines the reactions of an INCA model.

In [49]:
from BFAIR.mfa.INCA.INCAScript_writing import define_reactions
print(define_reactions(reactions_data))

% Create reactions
r = [...
reaction('A (abc) -> B (abc)', ['id'], ['R1']),...
reaction('B (abc) <-> D (abc)', ['id'], ['R2']),...
reaction('B (abc) -> C (bc) + E (a)', ['id'], ['R3']),...
reaction('B (abc) + C (de) -> D (bcd) + E (a) + E (e)', ['id'], ['R4']),...
reaction('D (abc) -> F (abc)', ['id'], ['R5']),...
];
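
To make the mapping from dataframe rows to MATLAB code concrete, here is a minimal sketch that reproduces the text printed above from plain Python dictionaries. It is a hypothetical re-implementation for illustration only (`sketch_define_reactions` is not BFAIR's code), and it assumes each row provides `rxn_id` and `rxn_eqn` as in the reactions table.

```python
def sketch_define_reactions(rows):
    """Build a MATLAB reaction array from rows with 'rxn_id' and 'rxn_eqn' keys."""
    lines = ["% Create reactions", "r = [..."]
    for row in rows:
        # One INCA reaction() call per row; '...' continues the MATLAB array.
        lines.append(f"reaction('{row['rxn_eqn']}', ['id'], ['{row['rxn_id']}']),...")
    lines.append("];")
    return "\n".join(lines)


rows = [
    {"rxn_id": "R1", "rxn_eqn": "A (abc) -> B (abc)"},
    {"rxn_id": "R2", "rxn_eqn": "B (abc) <-> D (abc)"},
]
print(sketch_define_reactions(rows))
```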


To add the reaction definition to the `INCAScript` we use the `.add_to_block()` method.

In [50]:
from BFAIR.mfa.INCA.INCAScript import INCAScript
script = INCAScript()
script.add_to_block('reactions', define_reactions(reactions_data))

We can view the reactions block in the `INCAScript`.

In [51]:
print(script.reaction)

% REACTION BLOCK
% Create reactions
r = [...
reaction('A (abc) -> B (abc)', ['id'], ['R1']),...
reaction('B (abc) <-> D (abc)', ['id'], ['R2']),...
reaction('B (abc) -> C (bc) + E (a)', ['id'], ['R3']),...
reaction('B (abc) + C (de) -> D (bcd) + E (a) + E (e)', ['id'], ['R4']),...
reaction('D (abc) -> F (abc)', ['id'], ['R5']),...
];
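
The idea of named code blocks that are always emitted in a fixed order can be pictured with a toy class. This is a hypothetical stand-in written for this notebook, not the actual `INCAScript` implementation; the class name, block names and header text are assumptions made for the example.

```python
class ToyScript:
    """Toy stand-in for a block-based script: named blocks printed in a fixed order."""

    BLOCK_ORDER = ("reactions", "tracers", "fluxes")

    def __init__(self):
        # Every block starts empty; code is appended to it later.
        self.blocks = {name: [] for name in self.BLOCK_ORDER}

    def add_to_block(self, block_name, code):
        if block_name not in self.blocks:
            raise ValueError(f"unknown block: {block_name}")
        self.blocks[block_name].append(code)

    def __str__(self):
        parts = []
        for name in self.BLOCK_ORDER:
            parts.append(f"% {name.upper()} BLOCK")
            parts.extend(self.blocks[name])
        return "\n".join(parts)


toy = ToyScript()
toy.add_to_block("tracers", "t = 1;")    # added first ...
toy.add_to_block("reactions", "r = 2;")  # ... but printed before the tracers
toy.add_to_block("reactions", "r = 2;")  # appending again duplicates the code
print(toy)
```

Two properties are visible here: the printed order depends on the block order rather than on the order of the calls, and calling `add_to_block` twice with the same code adds it twice.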


There are similar script-writing functions to generate the other parts of the MATLAB script. The experimental data is added to the `INCAScript` on a per-experiment basis. Thus, the functions which add experimental data also take an argument for the experiment id. One example is `define_tracers()`:

In [52]:
from BFAIR.mfa.INCA.INCAScript_writing import define_tracers
print(define_tracers(tracers_data, 'exp1'))

% define tracers used in exp1
t_exp1 = tracer({...
'[1-13C]A: A @ 1',...
});
t_exp1.frac = [1.0 ];
t_exp1.atoms.it(:,1) = [0,1];
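
The per-experiment selection can be sketched as follows: keep only the rows that belong to the requested experiment and format each one as a tracer label. The helper `tracer_labels` is hypothetical and only mimics the label format seen in the output for exp1; the label it produces for exp2 is an extrapolation from that format, not output taken from BFAIR.

```python
tracer_rows = [
    {"experiment_id": "exp1", "met_id": "A", "tracer_id": "[1-13C]A", "atom_ids": [1]},
    {"experiment_id": "exp2", "met_id": "A", "tracer_id": "[1,2-13C]A", "atom_ids": [1, 2]},
]


def tracer_labels(rows, experiment_id):
    """Return 'tracer_id: met_id @ atoms' labels for a single experiment."""
    labels = []
    for row in rows:
        if row["experiment_id"] != experiment_id:
            continue  # skip tracers that belong to other experiments
        atoms = " ".join(str(atom) for atom in row["atom_ids"])
        labels.append(f"{row['tracer_id']}: {row['met_id']} @ {atoms}")
    return labels


print(tracer_labels(tracer_rows, "exp1"))  # ['[1-13C]A: A @ 1']
```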



Because `.add_to_block()` appends to the code block, it is best practice to include the `INCAScript` instantiation and the `.add_to_block()` calls within the same Jupyter notebook cell. This avoids adding the same code multiple times when re-running cells in a Jupyter notebook. In the following we add tracers, flux measurements, and MS measurements from the experiment with experiment id exp1.

In [53]:
script = INCAScript()
script.add_to_block('reactions', define_reactions(reactions_data))
script.add_to_block('tracers', define_tracers(tracers_data, 'exp1'))
script.add_to_block('fluxes', define_flux_measurements(flux_data, 'exp1'))
script.add_to_block('ms_fragments', define_ms_data(ms_data, 'exp1'))
script.add_to_block('experiments', define_experiment('exp1', measurement_types=['data_flx', 'data_ms']))

Notice that the addition of data is a two-step procedure. First, the data is added to the script, and second, the data is added to the experiment using the `define_experiment()` function. This function takes the experiment id and a list of the measurement types associated with this experiment.

We can view the script that we have built by simply printing the script object.

In [54]:
print(script)

clear functions

% REACTION BLOCK
% Create reactions
r = [...
reaction('A (abc) -> B (abc)', ['id'], ['R1']),...
reaction('B (abc) <-> D (abc)', ['id'], ['R2']),...
reaction('B (abc) -> C (bc) + E (a)', ['id'], ['R3']),...
reaction('B (abc) + C (de) -> D (bcd) + E (a) + E (e)', ['id'], ['R4']),...
reaction('D (abc) -> F (abc)', ['id'], ['R5']),...
];

% TRACERS BLOCK
% define tracers used in exp1
t_exp1 = tracer({...
'[1-13C]A: A @ 1',...
});
t_exp1.frac = [1.0 ];
t_exp1.atoms.it(:,1) = [0,1];


% FLUXES BLOCK

% define flux measurements for experiment exp1
f_exp1 = [...
data('R1', 'val', 10.0, 'std', 1e-05),...
];


% MS_FRAGMENTS BLOCK

% define mass spectrometry measurements for experiment exp1
ms_exp1 = [...
msdata('F1: F @ 1 2 3', 'more', 'C3H5O1'),...
];

% define mass spectrometry measurements for experiment exp1
ms_exp1{'F1'}.idvs = idv([[0.01;0.8;0.1;0.0009]], 'id', {'exp1_F1_0_0_1'}, 'std', [[0.0003;0.003;0.0008;0.001]], 'time', 0.0)


% EXPERIMENTAL_DATA BLOCK
e_exp1 = exper

We see that the last three blocks (model, options and runner) have not been populated yet. To populate the model block, we need to define which experiments should be included in the model. In the options block, we can change the options/settings which influence how INCA is run; for example, we can increase the number of restarts during the flux estimation procedure and turn off the natural abundance correction. Finally, the runner defines which algorithms INCA should run on the model. In this example we will run the estimation and the simulation algorithms.

In [55]:
# same as previous code
script = INCAScript()
script.add_to_block('reactions', define_reactions(reactions_data))
script.add_to_block('tracers', define_tracers(tracers_data, 'exp1'))
script.add_to_block('fluxes', define_flux_measurements(flux_data, 'exp1'))
script.add_to_block('ms_fragments', define_ms_data(ms_data, 'exp1'))
script.add_to_block('experiments', define_experiment('exp1', ['data_flx', 'data_ms']))

# new code to define model, options and runner
script.add_to_block('model', define_model(['exp1']))
script.add_to_block('options', define_options(fit_starts=20, sim_na=False))
script.add_to_block('runner', define_runner("/path/to/output/file.mat", run_estimate=True, run_simulation=True))

In [56]:
print(script)

clear functions

% REACTION BLOCK
% Create reactions
r = [...
reaction('A (abc) -> B (abc)', ['id'], ['R1']),...
reaction('B (abc) <-> D (abc)', ['id'], ['R2']),...
reaction('B (abc) -> C (bc) + E (a)', ['id'], ['R3']),...
reaction('B (abc) + C (de) -> D (bcd) + E (a) + E (e)', ['id'], ['R4']),...
reaction('D (abc) -> F (abc)', ['id'], ['R5']),...
];

% TRACERS BLOCK
% define tracers used in exp1
t_exp1 = tracer({...
'[1-13C]A: A @ 1',...
});
t_exp1.frac = [1.0 ];
t_exp1.atoms.it(:,1) = [0,1];


% FLUXES BLOCK

% define flux measurements for experiment exp1
f_exp1 = [...
data('R1', 'val', 10.0, 'std', 1e-05),...
];


% MS_FRAGMENTS BLOCK

% define mass spectrometry measurements for experiment exp1
ms_exp1 = [...
msdata('F1: F @ 1 2 3', 'more', 'C3H5O1'),...
];

% define mass spectrometry measurements for experiment exp1
ms_exp1{'F1'}.idvs = idv([[0.01;0.8;0.1;0.0009]], 'id', {'exp1_F1_0_0_1'}, 'std', [[0.0003;0.003;0.0008;0.001]], 'time', 0.0)


% EXPERIMENTAL_DATA BLOCK
e_exp1 = exper

In [35]:
# replace fake path with actual path, this is just to hide my actual path from being displayed in the docs.
# It is not necessary in the actual workflow
script.runner = script.runner.replace("'/path/to/output/file.mat'", "'" + str(pathlib.Path('simple_model_inca.mat').resolve()) + "'")

Now we are ready to run the flux estimation algorithm in INCA.
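
Before starting MATLAB, it can be worth confirming that the INCA base directory actually exists, since a wrong path would otherwise only surface once the engine is running. The sketch below is a generic path check; `require_directory` is a hypothetical helper and not part of BFAIR, and a temporary directory stands in for the INCA installation.

```python
import pathlib
import tempfile


def require_directory(path):
    """Return `path` as an absolute pathlib.Path, or raise if it is not a directory."""
    directory = pathlib.Path(path).expanduser().resolve()
    if not directory.is_dir():
        raise FileNotFoundError(f"INCA base directory not found: {directory}")
    return directory


# Demonstration with a temporary directory standing in for the INCA installation.
with tempfile.TemporaryDirectory() as tmp:
    checked = require_directory(tmp)
    print(checked.is_absolute())  # True
```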

In [36]:
from BFAIR.mfa.INCA.run_inca import run_inca
import dotenv
inca_directory = pathlib.Path(dotenv.get_key(dotenv.find_dotenv(), "INCA_base_directory")) # Simply replace this with the path to your INCA installation
run_inca(script, INCA_base_directory=inca_directory)

INCA script saved to /var/folders/z6/mxpxh4k56tv0h0ff41vmx7gdwtlpvp/T/tmpvymjjsts/inca_script.m.
Starting MATLAB engine...
 
ms_exp1 = 1x1 msdata object
 
fields: atoms  id  [idvs]  more  on  state  
 
F1
 
 
m = 1x1 model object
 
fields: [expts]  [mets]  notes  [options]  [rates]  [states]  
 
	5 reactions (6 fluxes)                                  
	6 states (3 balanced, 1 source, 2 sink and 0 unbalanced)
	6 metabolites                                           
	1 experiments                                           
 

                                         Directional 
 Iteration      Residual     Step-size    derivative        Lambda
     0       9.99461e+11
     1       9.97934e+11      0.000764     -9.99e+11       1.60177
     2       1.42897e+06             1     -3.06e+06       1.60177
     3           80454.8             1     -1.92e+05      0.533923
     4            8606.5             1     -4.85e+03      0.177974
     5           8153.72             1         -3.28  

INCA has now performed flux estimation on our model and saved the result to a .mat file, which was specified in the `INCAScript` (here simple_model_inca.mat). This file can be opened using the `INCAResults` workflow described in a later notebook or directly in the INCA GUI using "Open fluxmap".
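
The iteration table printed during estimation can also be inspected programmatically, for example to confirm that the residual decreases from one iteration to the next. The sketch below parses the complete rows copied from the output above. `parse_residuals` is a hypothetical helper, and it assumes the table layout shown here (iteration counter in the first column, residual in the second).

```python
log = """\
     0       9.99461e+11
     1       9.97934e+11      0.000764     -9.99e+11       1.60177
     2       1.42897e+06             1     -3.06e+06       1.60177
     3           80454.8             1     -1.92e+05      0.533923
     4            8606.5             1     -4.85e+03      0.177974
"""


def parse_residuals(text):
    """Return the residual reported at each iteration of the printed table."""
    residuals = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an integer iteration counter.
        if len(fields) >= 2 and fields[0].isdigit():
            residuals.append(float(fields[1]))
    return residuals


residuals = parse_residuals(log)
decreasing = all(later <= earlier for earlier, later in zip(residuals, residuals[1:]))
print(residuals)
print("residual never increases:", decreasing)
```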

### Note about opening models in INCA GUI
There are multiple ways to open models in the INCA GUI: "Open model" and "Open fluxmap". Both methods open the .mat file which is generated by `run_inca()`. "Open model" will open the model with any associated experiments and data, but it will not include the results of flux estimation. To get the results of the estimate, continuation or Monte Carlo algorithms, use "Open fluxmap" instead. This will load both the model and the results of the algorithms which were applied. Note that "Open fluxmap" in the INCA GUI fails if there is no simulation in the .mat file. Thus, to be able to open the results in the INCA GUI, you must set `run_simulation=True` in `define_runner()`.

## Handling multiple experiments
The INCA parser can create a model with data from multiple experiments. This is relevant when you wish to fit data from a parallel labelling experiment. The toy model example which we have worked with has two experiments. The easiest way to specify multiple experiments is to loop over the experiments and add the data. This is most easily done by defining an experiment configuration dictionary, which specifies the measurement types for each experiment.

In [61]:
experiment_config = {"exp1":["data_flx", "data_ms"], "exp2":["data_ms"]}
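
Before building the script, the configuration can be compared with the data to make sure that every declared measurement type actually has rows for that experiment. The sketch below does this with plain Python sets. `missing_measurements` is a hypothetical helper, and the `available` mapping is filled in by hand from the flux and MS tables shown earlier rather than read from the dataframes.

```python
experiment_config = {"exp1": ["data_flx", "data_ms"], "exp2": ["data_ms"]}

# Experiment ids present in each data table, copied from the outputs shown earlier.
available = {
    "data_flx": {"exp1"},
    "data_ms": {"exp1", "exp2"},
}


def missing_measurements(config, available):
    """Return {experiment_id: [declared measurement types that have no data]}."""
    missing = {}
    for exp_id, measurement_types in config.items():
        absent = [t for t in measurement_types if exp_id not in available.get(t, set())]
        if absent:
            missing[exp_id] = absent
    return missing


print(missing_measurements(experiment_config, available))  # {}
```

An empty result means the configuration only declares measurement types for which data exists.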

Now we can loop over the experiments in the configuration dictionary, add the data, and set up the model.

In [62]:
# hiding warnings to hide paths
import warnings
warnings.filterwarnings('ignore')

script_multiple_exp = INCAScript()
script_multiple_exp.add_to_block('reactions', define_reactions(reactions_data))

for exp_id, measurement_types in experiment_config.items():
    script_multiple_exp.add_to_block('tracers', define_tracers(tracers_data, exp_id))
    script_multiple_exp.add_to_block('fluxes', define_flux_measurements(flux_data, exp_id))
    script_multiple_exp.add_to_block('ms_fragments', define_ms_data(ms_data, exp_id))
    script_multiple_exp.add_to_block('experiments', define_experiment(exp_id, measurement_types))

script_multiple_exp.add_to_block('model', define_model(experiment_config.keys()))
script_multiple_exp.add_to_block('options', define_options(fit_starts=20, sim_na=False))
script_multiple_exp.add_to_block('runner', define_runner("simple_model_inca_multiple_exp", run_estimate=True, run_simulation=True))

The above code will produce a warning telling you that exp2 has no flux measurements. This simply informs the user; in this case we did not supply flux measurements for exp2 (the flux data only covers exp1, while MS data exists for both experiments), so everything is fine.

Now we run the estimation algorithm to fit the data from both experiments at the same time.

In [64]:
run_inca(script_multiple_exp, INCA_base_directory=inca_directory)

INCA script saved to /var/folders/z6/mxpxh4k56tv0h0ff41vmx7gdwtlpvp/T/tmpl8iyzgxf/inca_script.m.
Starting MATLAB engine...
 
ms_exp1 = 1x1 msdata object
 
fields: atoms  id  [idvs]  more  on  state  
 
F1
 
 
ms_exp2 = 1x1 msdata object
 
fields: atoms  id  [idvs]  more  on  state  
 
F1
 
 
m = 1x1 model object
 
fields: [expts]  [mets]  notes  [options]  [rates]  [states]  
 
	5 reactions (6 fluxes)                                  
	6 states (3 balanced, 1 source, 2 sink and 0 unbalanced)
	6 metabolites                                           
	2 experiments                                           
 

                                         Directional 
 Iteration      Residual     Step-size    derivative        Lambda
     0        9.9972e+11
     1       8.90482e+11        0.0562     -9.43e+11       6.01096
     2       6.66179e+11         0.135      -7.7e+11       6.01096
     3        6.5757e+11       0.00648     -6.62e+11       6.01096
     4       2.58872e+11         0.37

## References
[1] M. R. Antoniewicz, J. K. Kelleher, and G. Stephanopoulos, “Determination of confidence intervals of metabolic fluxes estimated from stable isotope measurements,” Metabolic Engineering, vol. 8, no. 4, pp. 324–337, Jul. 2006, doi: 10.1016/j.ymben.2006.01.004.

[2] M. R. Antoniewicz, J. K. Kelleher, and G. Stephanopoulos, “Elementary metabolite units (EMU): A novel framework for modeling isotopic distributions,” Metabolic Engineering, vol. 9, no. 1, pp. 68–86, Jan. 2007, doi: 10.1016/j.ymben.2006.09.001.
