# Internal documentation
This document gives a tour of the code in the `MDSINE2` package and how it interacts with the `MDSINE2_Paper` package. The first part is a high-level tour of the `MDSINE2` package. The next part shows how to run selected analyses with the code. The last part shows where commonly used functions are implemented in the code.

### Table of contents
* [High level tour](#highleveltour) 
* [Common functionality](#commonfunctionality)
    * [Reading in the Gibson dataset](#readinginthegibsondataset)
    * [Retrieve trace from disk](#retrievetracefromdisk)
    * [Get statistics of a trace](#getstatisticsofatrace)
    * [Defining the parameters of the model](#definingtheparametersofthemodel)
    * [Bayes factors](#bayesfactors)
    * [Condensing fixed cluster interactions in perturbations into cluster-cluster interactions](#condensingfixedcluster)
    * [Forward simulating](#forwardsimulating)
    * [Automatically generate names for taxa](#generatenamesfortaxa)

# High level tour <a class="anchor" id="highleveltour"></a>

In [None]:
study, mcmc, taxa, perturbations, interactions, clustering

# Running specific analysis

# Common functionality <a class="anchor" id="commonfunctionality"></a>
If you need a command that is not listed here and don't know where it lives in the code, look in the `mdsine2.__init__` file. The module that a function or class is imported from tells you where it is implemented.

The `import` statements in the `__init__` file let the user access functions and classes directly from the package instead of going through all of the submodules. For example, instead of loading a `Study` object as
```python
study = md2.pylab.base.Study(...)
```
we can import it directly from the `mdsine2` package:
```python
study = md2.Study(...)
```
We can do this because it is imported in the `__init__` file. 

We can see the location in the code where the `Study` object is implemented by looking in the `__init__` file:
```python
from .pylab.base import ..., Study, ...
```
Note that there are many objects/functions that are imported on the same line.
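
When the `__init__` file is not at hand, Python's own introspection gives the same answer: every class and function records the module it was defined in as `__module__`. The sketch below demonstrates this on the standard library's `json` package, which re-exports `JSONDecoder` from a submodule in the same way. The helper name `defining_module` is made up for this illustration. Applied to `md2.Study`, it should report the `pylab.base` location shown above, assuming the class is defined in that module.

```python
import json


def defining_module(obj):
    """Return the dotted name of the module in which `obj` was defined."""
    return obj.__module__


# `json/__init__.py` re-exports JSONDecoder from the `json.decoder` submodule,
# much like a package `__init__` file re-exporting a class from a submodule.
print(defining_module(json.JSONDecoder))  # json.decoder
```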

To find the documentation for the classes and functions, look at the HTML files in the `docs` folder, which holds the autogenerated docs. Functions and classes sit at the same location in the docs as in the code, so if you don't know where something lives, check the `__init__` file first.

In [1]:
import mdsine2 as md2
from mdsine2.names import STRNAMES

### Reading in the Gibson dataset <a class="anchor" id="readinginthegibsondataset"></a>
    
This will automatically try to download the dataset from GitHub.

* Location in MDSINE2: `MDSINE2.dataset.load_gibson`

*  Command: `study = md2.dataset.load_gibson(...)`

* Example:

   ```python
   # Load healthy
   healthy = md2.dataset.load_gibson(dset='healthy')
   # Load uc
   uc = md2.dataset.load_gibson(dset='uc')
   ```


### Retrieve trace from disk <a class="anchor" id="retrievetracefromdisk"></a>
    
Get the Gibbs samples of a variable from disk. There are three options:
1. `section='posterior'` : Gets only the Gibbs samples after burn-in
2. `section='burnin'` : Gets only the Gibbs samples used for burn-in
3. `section='entire'` : Gets all of the Gibbs samples (`burnin` + `posterior`)

* Location in MDSINE2: `MDSINE2.pylab.inference.Tracer.get_trace`

* Example:
    Load using `Tracer` object

    ```python
    mcmc = md2.BaseMCMC.load('path/to/mcmc.pkl')

    # Get the growth parameters from disk of the posterior
    trace = mcmc.tracer.get_trace(name=STRNAMES.GROWTH_VALUE, section='posterior')
    ```

    Load directly from the object

    ```python
    mcmc = md2.BaseMCMC.load('path/to/mcmc.pkl')
    growth = mcmc.graph[STRNAMES.GROWTH_VALUE]

    # Get the growth parameters from disk from the posterior
    trace = growth.get_trace_from_disk(section='posterior')
    ```

    Note that these two are exactly equivalent: `get_trace_from_disk` internally calls `tracer.get_trace`
    and passes in its own name. You can call this function for any variable that is traced during inference.
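
The three `section` options amount to slices of the saved chain. A minimal sketch, assuming the samples are stored in sampling order and the first `burnin` of them are the burn-in; `select_section` is a hypothetical helper for illustration, not part of MDSINE2:

```python
def select_section(samples, burnin, section="posterior"):
    """Slice a chain of Gibbs samples (ordered by iteration) by section."""
    if section == "posterior":
        return samples[burnin:]
    if section == "burnin":
        return samples[:burnin]
    if section == "entire":
        return samples[:]
    raise ValueError(f"unknown section: {section!r}")


chain = list(range(10))  # stand-in for 10 Gibbs samples of one variable
posterior = select_section(chain, burnin=4, section="posterior")
print(len(posterior))  # 6
```

The same slicing works on an array whose first axis indexes the Gibbs steps.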
            

### Get statistics of a trace <a class="anchor" id="getstatisticsofatrace"></a>
    
Calculate some statistics of a trace. 

* Automatically calculates and returns a dictionary of:
    * `'mean'`
    * `'median'`
    * `'25th percentile'`
    * `'75th percentile'`

* Location in MDSINE2: `MDSINE2.pylab.variables.summary`

* Command: `md2.summary`

* Example:

    Using objects directly
    ```python
    mcmc = md2.BaseMCMC.load('path/to/mcmc.pkl')
    processvar = mcmc.graph[STRNAMES.PROCESSVAR]
    summ = md2.summary(processvar, section='posterior')
    # Get the mean
    mean = summ['mean']
    ```

    From a raw numpy array
    ```python
    mcmc = md2.BaseMCMC.load('path/to/mcmc.pkl')
    processvar = mcmc.graph[STRNAMES.PROCESSVAR]
    trace = processvar.get_trace_from_disk(section='posterior')
    summ = md2.summary(trace)
    # Get the mean
    mean = summ['mean']
    ```

    Note that these two calls are exactly the same - if you pass in a variable with a trace, it will automatically get the trace from disk by calling the function `get_trace_from_disk`. You can specify the section to retrieve using the `section` parameter.

* Handling NaNs
    By default, `md2.summary` ignores NaNs when calculating the statistics by using the functions `numpy.nanmean`, etc. If you want to set the NaNs to 0 instead, set the flag `set_nan_to_0=True`.

    - Example:
        ```python
        mcmc = md2.BaseMCMC.load('path/to/mcmc.pkl')
        interactions = mcmc.graph[STRNAMES.INTERACTIONS_OBJ]
        ```

        Ignores the NaNs:

        ```python
        trace = md2.summary(interactions)
        ```

        Sets the NaNs to zero:

        ```python
        trace = md2.summary(interactions, set_nan_to_0=True)
        ```
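
The two NaN policies can give quite different numbers. The pure-Python sketch below mimics the four statistics for a one-dimensional trace to show the difference. It is an illustration only, not the library's implementation: `summarize` is a hypothetical name, and reading NaN as "the value was absent in that sample" is an assumption made for the example.

```python
import math
import statistics


def summarize(trace, set_nan_to_0=False):
    """Mean, median and quartiles of a 1-D trace, with two NaN policies."""
    if set_nan_to_0:
        values = [0.0 if math.isnan(v) else v for v in trace]
    else:
        values = [v for v in trace if not math.isnan(v)]
    # method="inclusive" interpolates linearly between order statistics
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "25th percentile": q25,
        "75th percentile": q75,
    }


trace = [2.0, float("nan"), 4.0, float("nan")]
print(summarize(trace)["mean"])                     # 3.0
print(summarize(trace, set_nan_to_0=True)["mean"])  # 1.5
```

Ignoring NaNs averages only over the samples where a value exists, while setting them to 0 averages over every sample.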

### Defining the parameters of the model <a class="anchor" id="definingtheparametersofthemodel"></a>

* Location in MDSINE2: `MDSINE2.config`

* Command:
    - Logging: `logging = md2.LoggingConfig(...)`
        - This is used to define the logging level and the format to log with.
        - This automatically writes all of the logging to a file that you can view later
    - MDSINE2 parameters: `params = md2.MDSINE2ModelConfig(...)`
        - Defines the parameters to run the MDSINE2 model
    - Negative Binomial dispersion parameters: `params = md2.config.NegBinConfig(...)`
        - Defines the parameters to learn the Negative Binomial dispersion parameters
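
As a point of reference for what a logging configuration has to pin down, the same three choices (level, format, output file) look like this with the standard library alone. This is not how `LoggingConfig` is implemented; the logger name, file name, and format string are placeholders chosen for the example.

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "run.log")  # placeholder location

logger = logging.getLogger("mdsine2_tour_example")  # placeholder logger name
logger.setLevel(logging.INFO)                       # the logging level
handler = logging.FileHandler(log_path)             # write records to a file
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))  # the format
logger.addHandler(handler)

logger.info("starting inference")
logger.debug("not written: below the INFO level")
handler.flush()

with open(log_path) as f:
    print(f.read())  # INFO: starting inference
```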

### Bayes factors <a class="anchor" id="bayesfactors"></a>

Generate the Bayes factors after the chains have run.

- Interactions

    * Location in MDSINE2: `MDSINE2.util.generate_interation_bayes_factors_posthoc`

    * Command: `bf = md2.generate_interation_bayes_factors_posthoc(...)`

    * Example:

    ```python
    mcmc = md2.BaseMCMC.load('path/to/mcmc.pkl')
    bf = md2.generate_interation_bayes_factors_posthoc(mcmc=mcmc, section='posterior')
    ```

- Perturbations

    * Location in MDSINE2: `MDSINE2.util.generate_perturbation_bayes_factors_posthoc`

    * Command: `bf = md2.generate_perturbation_bayes_factors_posthoc(...)`

    * Example:

    ```python
    mcmc = md2.BaseMCMC.load('path/to/mcmc.pkl')
    perturbation = mcmc.graph.perturbations[name_of_perturbation]
    bf = md2.generate_perturbation_bayes_factors_posthoc(
        mcmc=mcmc, perturbation=perturbation, section='posterior')
    ```
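
For intuition, a Bayes factor for an on/off indicator compares its posterior odds to its prior odds. The sketch below is the generic textbook calculation for a single indicator from its 0/1 trace. It is not a transcription of the MDSINE2 functions: `indicator_bayes_factor` and `prior_prob` are hypothetical names, and how MDSINE2 derives the prior odds from its priors is not covered here.

```python
def indicator_bayes_factor(indicator_trace, prior_prob):
    """Posterior odds of 'on' divided by prior odds of 'on'."""
    n_on = sum(1 for z in indicator_trace if z)
    n_off = len(indicator_trace) - n_on
    if n_off == 0:
        return float("inf")  # the indicator was never off in the chain
    posterior_odds = n_on / n_off
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


# On in 9 of 10 posterior samples, with a prior 'on' probability of 0.5
print(indicator_bayes_factor([1] * 9 + [0], prior_prob=0.5))  # 9.0
```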

### Condensing fixed cluster interactions in perturbations into cluster-cluster interactions <a class="anchor" id="condensingfixedcluster"></a>

- Interactions

    * Location in MDSINE2: `MDSINE2.util.condense_fixed_clustering_interaction_matrix`

    * Command: `M = md2.condense_fixed_clustering_interaction_matrix(...)`

    * Example:

    Generate cluster-cluster interactions for each Gibbs step

    ```python
    mcmc = md2.BaseMCMC.load('path/to/fixed/clustering/mcmc.pkl')
    clustering = mcmc.graph[STRNAMES.CLUSTERING_OBJ]
    M = mcmc.graph[STRNAMES.INTERACTIONS_OBJ].get_trace_from_disk(
        section='posterior') # (n_gibbs, n_taxa, n_taxa)
    M_condense = md2.condense_fixed_clustering_interaction_matrix(
        M, clustering=clustering) # (n_gibbs, n_clusters, n_clusters)
    ```

    Generate expected cluster-cluster interactions

    ```python
    mcmc = md2.BaseMCMC.load('path/to/fixed/clustering/mcmc.pkl')
    clustering = mcmc.graph[STRNAMES.CLUSTERING_OBJ]
    M = md2.summary(mcmc.graph[STRNAMES.INTERACTIONS_OBJ],
                    set_nan_to_0=True, section='posterior')['mean'] # (n_taxa, n_taxa)
    M_condense = md2.condense_fixed_clustering_interaction_matrix(
        M, clustering=clustering) # (n_clusters, n_clusters)
    ```

    Generate Bayes factors of the cluster-cluster interactions

    ```python
    mcmc = md2.BaseMCMC.load('path/to/fixed/clustering/mcmc.pkl')
    clustering = mcmc.graph[STRNAMES.CLUSTERING_OBJ]
    bf = md2.generate_interation_bayes_factors_posthoc(
        mcmc=mcmc, section='posterior') # (n_taxa, n_taxa)
    bf_condensed = md2.condense_fixed_clustering_interaction_matrix(
        bf, clustering=clustering) # (n_clusters, n_clusters)
    ```

    Note that the function can be fed any `np.ndarray` as long as the last 2 dimensions have the shape `(n_taxa, n_taxa)`.


- Perturbations

    * Location in MDSINE2: `MDSINE2.util.condense_fixed_clustering_perturbation`

    * Command: `M = md2.condense_fixed_clustering_perturbation(...)`
    
    * Example:
    
    Generate cluster perturbations for each Gibbs step

    ```python
    mcmc = md2.BaseMCMC.load('path/to/fixed/clustering/mcmc.pkl')
    clustering = mcmc.graph[STRNAMES.CLUSTERING_OBJ]
    perturbation = mcmc.graph.perturbations[name_of_perturbation]
    M = perturbation.get_trace_from_disk(section='posterior') # (n_gibbs, n_taxa)
    M_condense = md2.condense_fixed_clustering_perturbation(
        M, clustering=clustering) # (n_gibbs, n_clusters)
    ```
    
    Generate expected cluster perturbation values

    ```python
    mcmc = md2.BaseMCMC.load('path/to/fixed/clustering/mcmc.pkl')
    clustering = mcmc.graph[STRNAMES.CLUSTERING_OBJ]
    perturbation = mcmc.graph.perturbations[name_of_perturbation]
    M = md2.summary(perturbation,set_nan_to_0=True, section='posterior')['mean'] # (n_taxa, )
    M_condense = md2.condense_fixed_clustering_perturbation(
        M, clustering=clustering) # (n_clusters, )
    ```

    Generate Bayes factors of the perturbation

    ```python
    mcmc = md2.BaseMCMC.load('path/to/fixed/clustering/mcmc.pkl')
    clustering = mcmc.graph[STRNAMES.CLUSTERING_OBJ]
    perturbation = mcmc.graph.perturbations[name_of_perturbation]
    bf = md2.generate_perturbation_bayes_factors_posthoc(
        mcmc=mcmc, perturbation=perturbation, section='posterior') # (n_taxa, )
    bf_condensed = md2.condense_fixed_clustering_perturbation(
        bf, clustering=clustering) # (n_clusters, )
    ```

    Note that the function can be fed any `np.ndarray` as long as the last dimension has the shape `(n_taxa, )`.
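
The idea behind condensing can be sketched without the library. Assuming that under fixed clustering every taxon in a cluster carries the same value, one representative taxon per cluster is enough to build the cluster-level array. The helpers below are hypothetical and use nested lists in place of `np.ndarray`; `clusters` is a list of taxon-index lists standing in for the clustering object. How the library treats the diagonal (within-cluster entries) is not addressed here.

```python
def condense_matrix(M, clusters):
    """(n_taxa, n_taxa) -> (n_clusters, n_clusters) via one representative per cluster."""
    reps = [members[0] for members in clusters]
    return [[M[i][j] for j in reps] for i in reps]


def condense_vector(v, clusters):
    """(n_taxa,) -> (n_clusters,) via one representative per cluster."""
    return [v[members[0]] for members in clusters]


clusters = [[0, 1], [2]]  # taxa 0 and 1 share a cluster; taxon 2 is alone
# Taxa-level interactions; entries within a cluster are left at 0 here
M = [[0.0, 0.0, -0.3],
     [0.0, 0.0, -0.3],
     [0.7, 0.7, 0.0]]
print(condense_matrix(M, clusters))                 # [[0.0, -0.3], [0.7, 0.0]]
print(condense_vector([1.2, 1.2, -0.5], clusters))  # [1.2, -0.5]
```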

### Forward simulating <a class="anchor" id="forwardsimulating"></a>

Forward simulate a dynamical system

* Location in MDSINE2: 
    - Definition of the model: `MDSINE2.model.gLVDynamicsSingleClustering`
    - Forward simulation: `MDSINE2.pylab.dynamics.integrate`
* Command:
    - Definition of the model: `md2.gLVDynamicsSingleClustering`
    - Forward simulation: `md2.integrate`
    
* Examples:
    
    Forward simulate the dynamical system in each Gibbs step:
    ```python
    mcmc = md2.BaseMCMC.load('path/to/mcmc.pkl')
    
    # Get the initial conditions from a subject
    subj = md2.Subject.load('path/to/subject.pkl')
    initial_conditions = subj.matrix()['abs'][:, 0]
    
    pred_matrix = md2.gLVDynamicsSingleClustering.forward_sim_from_chain(
        mcmc=mcmc, subjname=subj.name, initial_conditions=initial_conditions,
        times=subj.times, simulation_dt=0.01, section='posterior')
    ```
    
    Initial conditions can come from anywhere; they just need to be a numpy array with `n_taxa` elements. We do need to pass in a subject name `subjname` so the chain knows which perturbations to use. This function is a high-level wrapper around `gLVDynamicsSingleClustering` and `md2.integrate`. We can also forward simulate a single Gibbs step with finer control:
    
    Forward simulate for a single Gibbs step:
    ```python
    
    # Initialize and run the dynamics object
    dyn = md2.gLVDynamicsSingleClustering(
        # np.ndarray(n_taxa), growth parameters
        growth=...,
        # np.ndarray(n_taxa, n_taxa), Interaction matrix (with self-interactions on diagonal)
        interactions=..., 
        # iterable(np.ndarray(n_taxa)), perturbation effects for each perturbation
        perturbations=...,
        # iterable(np.ndarray(float)), start time for each perturbation
        perturbation_starts=...,
        # iterable(np.ndarray(float)), end time for each perturbation
        perturbation_ends=...)
    
    ret = md2.integrate(
        dyn, 
        # np.ndarray(n_taxa, 1) initial abundances
        initial_conditions=..., 
        dt=...,
        n_days=...,
        # If you want to return only certain timepoints
        subsample=True, times=...)
    ```
    
    You can find examples of forward simulation in `MDSINE2_Paper/forward_sim.py`
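
To make the role of each argument concrete, here is a bare-bones forward-Euler integrator for generalized Lotka-Volterra dynamics with one perturbation window. It assumes the form dx_i/dt = x_i (r_i (1 + g_i u(t)) + sum_j A_ij x_j), with negative self-interactions on the diagonal of A and u(t) equal to 1 while the perturbation is active. The sign conventions and the integration scheme used in MDSINE2 may differ, and `glv_euler` with all of its parameter names is hypothetical.

```python
def glv_euler(x0, growth, interactions, dt, n_days,
              perturbation=None, pert_start=None, pert_end=None):
    """Integrate gLV dynamics with forward Euler; returns a list of states."""
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    n_steps = int(round(n_days / dt))
    for step in range(n_steps):
        t = step * dt
        active = (perturbation is not None) and (pert_start <= t < pert_end)
        new_x = []
        for i in range(n):
            # Per-capita rate: perturbed growth plus interaction terms
            rate = growth[i] * (1.0 + (perturbation[i] if active else 0.0))
            rate += sum(interactions[i][j] * x[j] for j in range(n))
            new_x.append(x[i] + dt * x[i] * rate)
        x = new_x
        trajectory.append(list(x))
    return trajectory


# One taxon with growth 1 and self-interaction -1: logistic growth toward 1
traj = glv_euler([0.1], growth=[1.0], interactions=[[-1.0]], dt=0.01, n_days=20)
print(round(traj[-1][0], 3))  # 1.0
```

Forward Euler is the simplest possible scheme and needs a small `dt` to stay accurate; it is used here only to keep the sketch short.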


### Automatically generate names for taxa <a class="anchor" id="generatenamesfortaxa"></a>

In [None]:
# Initialize the MDSINE2 graph
# Location in MDSINE2: `MDSINE2.run.initialize_graph`
# Command: `mdsine2.initialize_graph`
chain = md2.initialize_graph(...)