# Size Effect Normalization Tutorial

## 1. General Information

This tutorial explains how to use the ```size_effect_normalization``` package, which accompanies our publication "Probabilistic quotient's work & pharmacokinetics' contribution: countering size effect in metabolic time series measurements". A preprint of the manuscript is available on bioRxiv [![DOI:10.1101/2022.01.17.476591](https://zenodo.org/badge/DOI/10.1007/978-3-319-76207-4_15.svg)](https://doi.org/10.1101/2022.01.17.476591).

The package is divided into three submodules:
1. ```size_effect_normalization.extended_model``` contains the model classes for the PKM and MIX models.
2. ```size_effect_normalization.normalization``` contains wrappers that call the model classes and optimize them.
3. ```size_effect_normalization.synthetic_data_generation``` contains functions for synthetic data generation.

## 2. Installation

To install the package, clone the Git repository and run ```python setup.py install``` in the base folder. We recommend using a virtual environment with Python 3.7 and all packages listed in ```requirements.txt```. Afterwards, all modules can be imported.

In [1]:
from size_effect_normalization import extended_model
from size_effect_normalization import synthetic_data_generation
from size_effect_normalization import normalization

In IPython or Jupyter, the docstring of any function can be accessed with ```?<function>```.
The other imports required for this tutorial are:

In [2]:
import numpy as np
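
The ```?``` syntax only works inside IPython or Jupyter. In a plain Python session the same docstrings can be read with the built-in ```help()``` or with ```inspect.getdoc```. The following minimal sketch uses ```np.linspace``` as a stand-in so that it runs without the package installed; the names ```doc``` and ```first_line``` are chosen for illustration only.

```python
import inspect

import numpy as np

# inspect.getdoc returns the cleaned-up docstring as a string (or None if absent).
doc = inspect.getdoc(np.linspace)

# Print only the summary line of the docstring.
first_line = doc.splitlines()[0]
print(first_line)

# help() prints the full docstring interactively:
# help(np.linspace)
```

The same calls should work on any function of the three submodules once they are imported.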

## 3. Generate Synthetic Data

For this tutorial, we use synthetically generated data instead of real data, as described in the original manuscript.

### 3.1 Definition of Data Parameters

In [10]:
# We assume that the first four metabolites have describable kinetics over time.
# Definition of basic toy model kinetic parameters.
toy_parameters = np.array([[2,.1,1,0,.1],
                            [2,.1,2,0,.1],
                            [2,.1,3,0,.1],
                            [2,.1,.5,0,.1]])
n_known_metabolites = toy_parameters.shape[0]
# Definition of time points of toy model
timepoints = np.linspace(0,15,20)
n_timepoints = len(timepoints)
# Set seed of rng
np.random.seed(13)
# Definition of bounds of pharmacokinetic parameters.
bounds_per_metabolite = [3,3,5,15,3]
# Definition of error size (SD/Mean)
error_sigma = .2
# Definition of the total number of metabolites in the data set.
n_metabolites = 60
# number of replicates
n_replicates = 1
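
To illustrate what an error size of ```SD/Mean = 0.2``` means, the following sketch draws multiplicative errors and recovers the ratio empirically. This is not the package's sampling routine: the normal distribution with mean 1 and the name ```demo_errors``` are assumptions made for this example only.

```python
import numpy as np

error_sigma = 0.2
rng = np.random.default_rng(13)

# Assumption: multiplicative errors drawn from a normal distribution with mean 1.
demo_errors = rng.normal(loc=1.0, scale=error_sigma, size=100_000)

# The coefficient of variation (SD / mean) approximately recovers error_sigma.
cv = demo_errors.std() / demo_errors.mean()
print(round(cv, 2))
```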

### 3.2 Sampling

In [11]:
# Sample volumes (i.e. size effects).
v_tensor, v_list = synthetic_data_generation.generate_sweat_volumes(n_replicates=n_replicates,
                                                                    n_metabolites=n_metabolites,
                                                                    n_timepoints=n_timepoints)
# v_tensor has the shape (n_replicates, n_metabolites, n_timepoints) and is the expanded version of v_list, which has the shape (n_replicates, n_timepoints).
assert (v_tensor[:,0,:] == v_list[:,:]).all()
print(v_tensor.shape)

# Sample experimental errors
e_tensor = synthetic_data_generation.generate_experimental_errors(n_replicates=n_replicates,
                                                                  n_metabolites=n_metabolites,
                                                                  n_timepoints=n_timepoints,
                                                                  error_sigma=error_sigma)
# In contrast to v_tensor, e_tensor does not have repeated elements along the n_metabolites axis.
print(e_tensor.shape)

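# Sample measured data. The three calls below are alternative simulations; each one overwrites c_tensor, so only the result of the last call (v3) is kept.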
# Simulation v1 from the manuscript
c_tensor = synthetic_data_generation.generate_random_kinetic_data(n_known_metabolites,
                                                                  n_metabolites,
                                                                  toy_parameters,
                                                                  timepoints,
                                                                  bounds_per_metabolite)
# Simulation v2 from the manuscript
c_tensor = synthetic_data_generation.generate_completely_random_data(n_known_metabolites,
                                                                     n_metabolites,
                                                                     toy_parameters,
                                                                     timepoints,
                                                                     bounds_per_metabolite)
# Simulation v3 from the manuscript
c_tensor = synthetic_data_generation.generate_random_from_real_data(n_known_metabolites,
                                                                    n_metabolites,
                                                                    toy_parameters,
                                                                    timepoints,
                                                                    bounds_per_metabolite)
print(c_tensor.shape)

(1, 60, 20)
(1, 60, 20)
(60, 20)
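
The relation between ```v_list``` and ```v_tensor``` that the assertion above checks can be reproduced with plain NumPy. The following is a minimal sketch and not the package's implementation: the uniform sampling range and the names ```demo_v_list``` and ```demo_v_tensor``` are assumptions chosen for illustration.

```python
import numpy as np

n_replicates, n_metabolites, n_timepoints = 1, 60, 20
rng = np.random.default_rng(13)

# Hypothetical volumes, one per replicate and time point (the range is an assumption).
demo_v_list = rng.uniform(0.5, 2.0, size=(n_replicates, n_timepoints))

# Insert a metabolite axis and repeat the volumes along it.
demo_v_tensor = np.repeat(demo_v_list[:, np.newaxis, :], n_metabolites, axis=1)

print(demo_v_tensor.shape)  # (1, 60, 20)

# Every metabolite shares the same volume at a given time point.
assert (demo_v_tensor == demo_v_list[:, np.newaxis, :]).all()
```

Because the size effect acts on a whole sample, every metabolite measured at a given time point is scaled by the same volume, which is why the expanded tensor repeats its values along the metabolite axis.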


```v_tensor``` and ```e_tensor``` have the shape ```(n_replicates, n_metabolites, n_timepoints)```. ```v_tensor``` has duplicate elements along the ```n_metabolites``` axis.
```c_tensor``` has the shape ```(n_metabolites, n_timepoints)```.
To calculate the synthetic measured mass table, ```c_tensor``` is multiplied element-wise with the first replicate of ```v_tensor``` and ```e_tensor```.

In [12]:
# Calculate M_tilde, the synthetic measured mass table (first replicate only).
m_tensor = c_tensor * v_tensor[0,:,:] * e_tensor[0,:,:]
print(m_tensor.shape)

(60, 20)
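
The index ```[0,:,:]``` selects only the first replicate. With more than one replicate, NumPy broadcasting can apply the same concentration table to every replicate at once. The sketch below assumes that one concentration table is shared by all replicates; random arrays with the names ```demo_c```, ```demo_v```, ```demo_e``` and ```demo_m``` stand in for the package output, and only their shapes matter.

```python
import numpy as np

n_replicates, n_metabolites, n_timepoints = 3, 60, 20
rng = np.random.default_rng(13)

# Stand-ins for the package output (values are arbitrary, only shapes matter).
demo_c = rng.uniform(1.0, 2.0, size=(n_metabolites, n_timepoints))
demo_v = rng.uniform(0.5, 2.0, size=(n_replicates, 1, n_timepoints))
demo_v = np.broadcast_to(demo_v, (n_replicates, n_metabolites, n_timepoints))
demo_e = rng.normal(1.0, 0.2, size=(n_replicates, n_metabolites, n_timepoints))

# (60, 20) broadcasts against (3, 60, 20): demo_c is reused for each replicate.
demo_m = demo_c * demo_v * demo_e
print(demo_m.shape)  # (3, 60, 20)

# The first replicate matches the single-replicate expression used above.
assert np.allclose(demo_m[0], demo_c * demo_v[0, :, :] * demo_e[0, :, :])
```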


## 4. Size Effect Normalization