## Parameter Estimation Tutorial

In [1]:
import os, glob
import site
site.addsitedir(r'/home/ncw135/Documents/pycotools3')
from pycotools3 import viz, model, misc, tasks, models
from io import StringIO
import pandas
%matplotlib inline

### Build a Model

In [2]:
working_directory = os.path.abspath('')

copasi_file = os.path.join(working_directory, 'negative_feedback.cps')

with model.BuildAntimony(copasi_file) as loader:
    mod = loader.load(
        """
        model negative_feedback
            compartment cell = 1.0
            var A in cell
            var B in cell

            vAProd = 0.1
            kADeg = 0.2
            kBProd = 0.3
            kBDeg = 0.4
            A = 0
            B = 0

            AProd: => A; cell*vAProd
            ADeg: A =>; cell*kADeg*A*B
            BProd: => B; cell*kBProd*A
            BDeg: B => ; cell*kBDeg*B
        end
        """.strip()
    )

## open model in copasi
#mod.open()
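
For reference, the reactions in the Antimony string above can be read as two rate equations: `dA/dt = vAProd - kADeg*A*B` and `dB/dt = kBProd*A - kBDeg*B` (the compartment volume is 1, so the `cell` factor drops out). The block below is a minimal sketch that integrates them with a hand-written fourth-order Runge-Kutta step in plain Python. It is not how COPASI or `pycotools3` simulate the model, and the helper names `derivatives` and `simulate` are hypothetical, introduced only for illustration.

```python
def derivatives(a, b, v_a_prod=0.1, k_a_deg=0.2, k_b_prod=0.3, k_b_deg=0.4):
    """Right-hand side of the negative feedback model (cell volume = 1)."""
    da = v_a_prod - k_a_deg * a * b   # AProd minus ADeg
    db = k_b_prod * a - k_b_deg * b   # BProd minus BDeg
    return da, db


def simulate(t_end, dt=0.01, a=0.0, b=0.0):
    """Integrate from t=0 to t_end with fixed-step RK4; return (A, B)."""
    steps = int(round(t_end / dt))
    for _ in range(steps):
        k1a, k1b = derivatives(a, b)
        k2a, k2b = derivatives(a + 0.5 * dt * k1a, b + 0.5 * dt * k1b)
        k3a, k3b = derivatives(a + 0.5 * dt * k2a, b + 0.5 * dt * k2b)
        k4a, k4b = derivatives(a + dt * k3a, b + dt * k3b)
        a += dt * (k1a + 2 * k2a + 2 * k3a + k4a) / 6.0
        b += dt * (k1b + 2 * k2b + 2 * k3b + k4b) / 6.0
    return a, b


a1, b1 = simulate(1.0)
print(a1, b1)
```

With the parameter values from the model, the result at `t = 1` should land close to the `t = 1` row of the experimental data used in the next section.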

### Collect some experimental data
Organise your experimental data into delimited text files.

In [3]:
experimental_data = StringIO(
    """
Time,A,B
 0, 0.000000, 0.000000
 1, 0.099932, 0.013181
 2, 0.199023, 0.046643
 3, 0.295526, 0.093275
 4, 0.387233, 0.147810
 5, 0.471935, 0.206160
 6, 0.547789, 0.265083
 7, 0.613554, 0.322023
 8, 0.668702, 0.375056
 9, 0.713393, 0.422852
10, 0.748359, 0.464639
    """.strip()
)

df = pandas.read_csv(experimental_data, index_col=0)

fname = os.path.join(os.path.abspath(''), 'experimental_data.csv')
df.to_csv(fname)

assert os.path.isfile(fname)

### The Config Object
The interface to COPASI's parameter estimation using `pycotools3` revolves around the `ParameterEstimation.Config` object. `ParameterEstimation.Config` is a dictionary-like object which allows the user to define their parameter estimation problem. All features of COPASI's parameter estimation task are supported, including configuration of `validation experiments`, `affected experiments`, `affected validation experiments` and `constraints`, as well as additional features such as the configuration of multiple models simultaneously.

The `ParameterEstimation.Config` object expects at the bare minimum some information about the models being configured, some experimental data, some fit items and a working directory. The remaining options are automatically filled in with defaults. 

In [4]:
config = tasks.ParameterEstimation.Config(
    models=dict(
        model_name=dict(
            copasi_file=copasi_file
        )
    ),
    datasets=dict(
        experiments=dict(
            first_dataset=dict(
                filename=fname,    
                separator=','
            )
        )
    ),
    items=dict(
        fit_items=dict(
            A={},
            B={},
        )
    ),
    settings=dict(
        working_directory=working_directory
    )
)
config

datasets:
    experiments:
        first_dataset:
            affected_models: all
            filename: /home/ncw135/Documents/pycotools3/docs/source/Tutorials/experimental_data.csv
            mappings:
                A:
                    model_object: A
                    object_type: Metabolite
                    role: dependent
                B:
                    model_object: B
                    object_type: Metabolite
                    role: dependent
                Time:
                    model_object: Time
                    role: time
            normalize_weights_per_experiment: true
            separator: ','
    validations: {}
items:
    fit_items:
        A:
            affected_experiments:
            - first_dataset
            affected_models:
            - model_name
            affected_validation_experiments: []
            lower_bound: 1.0e-06
            start_value: model_value
            upper_bound: 1000000.0
        B:
            affected_e

The COPASI user will be familiar with most of these settings, though there are also a few [additional options](link/to/additional/options).
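
The printed configuration shows that empty fit items (`A={}`, `B={}`) come back with bounds and a start value filled in. The block below is a minimal sketch of that idea using plain dictionaries. It is not the `pycotools3` implementation: the helper `fill_fit_item_defaults` is hypothetical, and only the three default values are taken from the output above.

```python
import copy

# Default values as they appear in the printed configuration above.
FIT_ITEM_DEFAULTS = {
    "lower_bound": 1.0e-06,
    "upper_bound": 1000000.0,
    "start_value": "model_value",
}


def fill_fit_item_defaults(fit_items, defaults=FIT_ITEM_DEFAULTS):
    """Return a new dict where every fit item has all default keys set.

    Options supplied by the user take precedence over the defaults.
    """
    filled = {}
    for name, options in fit_items.items():
        merged = copy.deepcopy(defaults)
        merged.update(options)
        filled[name] = merged
    return filled


# 'A' relies entirely on defaults; 'B' overrides one option (illustrative).
fit_items = fill_fit_item_defaults({"A": {}, "B": {"upper_bound": 10.0}})
print(fit_items)
```

The point of the sketch is only that user-supplied keys win and everything else falls back to a default, which is why a minimal `Config` can stay short.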

Once built, a `ParameterEstimation.Config` object can be passed to a `ParameterEstimation` object.

In [5]:
PE = tasks.ParameterEstimation(config)

By default, the `run_mode` setting is set to False. To run the parameter estimation in background processes using CopasiSE, set `run_mode` to `True` or `parallel`. 

In [7]:
config.settings.run_mode = True
PE = tasks.ParameterEstimation(config)
viz.Parse(PE)
# config

TypeError: unsupported operand type(s) for +: 'dict' and 'str'

Things to describe
--------------------------
- copy number and pe number
- multiple models 
- context manager

In [None]:

This interface gives the user complete control over their parameter estimation configuration

### Experimental Data Files
Any experimental data files are passed to COPASI and should comply with [COPASI's format](http://copasi.org/Support/User_Manual/Tasks/Parameter_Estimation/). In addition, follow these rules when using PyCoTools with COPASI:

    - Column headings must match model variables exactly in order to be mapped. 
    - Independent variables can be mapped by appending the suffix `_indep` to the variable.
        - For example, if you vary the concentration of A between 0 and 10ng/mL in an experiment:
        
                Time A_indep  B   C
                t=1   0       -   - 
                t=2   0           -
                t=3  10           -
                t=4  10       -   -
    - Do not use non-ASCII strings (e.g. alpha or beta characters) in parameter definitions. They are minimally supported in PyCoTools but this is unstable.
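
The header rules above can be checked before a file is handed to COPASI. The following is a small sketch using only the standard library. The function `check_headers` is hypothetical (it is not part of PyCoTools), and it assumes the time column is called `Time` and that the set of model variable names is known.

```python
import csv
from io import StringIO


def check_headers(header, model_variables, time_column="Time", indep_suffix="_indep"):
    """Sort column names into (mapped, independent, unmatched) lists.

    A column is 'mapped' if it equals a model variable exactly and
    'independent' if it is a model variable followed by the suffix.
    """
    mapped, independent, unmatched = [], [], []
    for name in header:
        if name == time_column:
            continue
        if name.endswith(indep_suffix) and name[: -len(indep_suffix)] in model_variables:
            independent.append(name)
        elif name in model_variables:
            mapped.append(name)
        else:
            unmatched.append(name)
    return mapped, independent, unmatched


# Illustrative file: 'b' does not match the model variable 'B' exactly.
data = StringIO("Time,A_indep,B,b\n0,10,0.0,0.0\n")
header = next(csv.reader(data))
mapped, independent, unmatched = check_headers(header, {"A", "B"})
print(mapped, independent, unmatched)
```

Any name that ends up in `unmatched` would not be mapped under the exact-match rule, so it is worth fixing in the data file first.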

COPASI time course output contains additional content in its headers (e.g. `Values[species_x]` instead of just `species_x`). For this demonstration to work, the column headers must exactly match the model variables. The `misc.format_timecourse_data` function deals with this issue for the sake of this example. It operates in place (i.e. it changes the content of the file) and returns the data as a `pandas.DataFrame`.
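
To make the header problem concrete, here is a rough sketch of the kind of clean-up involved. It is not the implementation of `misc.format_timecourse_data`. The helper `strip_copasi_header` is hypothetical, and the assumption that headers look like `Values[name]` or `[name]` may not cover everything COPASI writes.

```python
import re

# Assumed header shapes: 'Values[name]' or '[name]'; anything else is kept.
_HEADER_PATTERN = re.compile(r"^(?:Values)?\[(.+)\]$")


def strip_copasi_header(header):
    """Return the bare variable name from a decorated column header."""
    text = header.strip()
    match = _HEADER_PATTERN.match(text)
    return match.group(1) if match else text


raw_headers = ["Time", "[A]", "Values[vAProd]"]
cleaned = [strip_copasi_header(h) for h in raw_headers]
print(cleaned)
```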

### Simulate Experimental Data

In [None]:
## choose report name
report= 'parameter_estimation_synthetic_data.txt'

## simulate some data. Set global variables to empty list to remove them from the data file output
TC=tasks.TimeCourse(mod, start=0, end=10, intervals=10, step_size=1, report_name=report, 
                   global_quantities=[])

## assign simulated data file to variable
exp_data = TC.report_name

print(os.path.isfile(exp_data))

In [None]:
exp_data

### Format the Simulated Data

In [None]:
print(misc.format_timecourse_data(exp_data))


### Setup and run single parameter estimation 

Method outline:

    1) Instantiate a `ParameterEstimation` instance with the model and data filenames
        - Ensure optional arguments are specified to the `ParameterEstimation` class
    2) Use the `write_config_file()` method to write a parameter estimation config file, then configure it.
    3) Use the `setup()` method
    4) Use the `run()` method
    5) Use the `format_results()` method (to give output results meaningful headers)

In [None]:
PE=tasks.ParameterEstimation(mod, exp_data, method='hooke_jeeves', tolerance=1e-6, iteration_limit=1000)

#### Use Particle Swarm

In [None]:
PE=tasks.ParameterEstimation(mod, exp_data, method='particle_swarm', swarm_size=50)

#### Use Genetic Algorithm

In [None]:
## Set genetic algorithm parameters to low values for speed of demonstration
PE=tasks.ParameterEstimation(mod, exp_data, method='genetic_algorithm', 
                             population_size=15, number_of_generations=10)

### Write Parameter Estimation Configuration File

In [None]:
PE.write_config_file()
os.path.isfile(PE.config_filename)

### Configure Parameter Estimation
Parameter Estimations can be configured manually or programmatically.

##### Configure with API
The `metabolites`, `global_parameters` and `local_parameters` attributes of the `tasks.ParameterEstimation` class tell PyCoTools which parameters to include in the config file. By default all model parameters are included, but this can be overridden by passing your own lists. Setting any of them to an empty list excludes that group from the estimation. The `lower_bound` and `upper_bound` arguments can also be used to constrain the estimation problem.


##### Configure manually
Configuration files can also be configured manually. This is useful for when an optimization problem requires a more complex setup - maybe different estimated parameters need different boundaries. To configure manually: 

    - Delete rows with parameters you do not want estimated
    - Change start values, lower bounds and upper bounds
        - To set custom start values, remember to set the "randomize_start_values" to False.
    - If configuring manually, it is a good idea to set the "overwrite_config_file" argument to False.
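
The manual edits listed above amount to ordinary table editing. The sketch below performs them on an in-memory CSV with the standard library. The column names (`name`, `start_value`, `lower_bound`, `upper_bound`) are hypothetical, since the layout of the file written by `write_config_file()` is not shown here, so check the real file before adapting this.

```python
import csv
from io import StringIO

# Hypothetical config layout; the real file may use different columns.
FIELDS = ["name", "start_value", "lower_bound", "upper_bound"]
config_text = """name,start_value,lower_bound,upper_bound
vAProd,0.1,1e-06,1000000.0
kADeg,0.2,1e-06,1000000.0
kBProd,0.3,1e-06,1000000.0
kBDeg,0.4,1e-06,1000000.0
"""

rows = list(csv.DictReader(StringIO(config_text)))

# Delete the rows of parameters that should not be estimated.
rows = [row for row in rows if row["name"] != "kBDeg"]

# Give one parameter its own, tighter boundaries.
for row in rows:
    if row["name"] == "kADeg":
        row["lower_bound"] = "0.01"
        row["upper_bound"] = "10"

# Write the edited table back out as CSV text.
output = StringIO()
writer = csv.DictWriter(output, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
edited_text = output.getvalue()
print(edited_text)
```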

In [None]:
PE = tasks.ParameterEstimation(
    mod, exp_data, 
    method='genetic_algorithm', population_size=30, number_of_generations=100,
    lower_bound = 0.001, upper_bound = 3000)

### Set up the Parameter Estimation

In [None]:
PE.setup()

### Run Parameter Estimation

In [None]:
PE.run()

#### Check Parameter Estimation Report Exists

In [None]:
os.path.isfile(PE.report_name)

### Visualize Parameter Estimation Data

In [None]:
viz.PlotParameterEstimation(PE)

#### Save to File

In [None]:
viz.PlotParameterEstimation(PE, savefig=True)

#### Choose Results Directory

In [None]:
results_dir = os.path.join(working_directory, 'ParameterEstimationDemoResults')
viz.PlotParameterEstimation(PE, savefig=True, results_directory=results_dir)

#### Specify Which Variables to Plot

In [None]:
viz.PlotParameterEstimation(PE, y=['A'])

### Multiple Data Files

PyCoTools handles multiple data files when `ParameterEstimation` is given a list of data file paths as its second argument. For demonstration, we change the value of a parameter and simulate some more data.

Let's first change the model so that the sets of simulated data are not identical:

In [None]:
working_directory = '/home/b3053674/Documents/Models/2018/03_March/ParameterEstimationDemo'

copasi_file2 = os.path.join(working_directory, 'negative_feedback_with_signal.cps')

with model.BuildAntimony(copasi_file2) as loader:
    mod2 = loader.load(
        """
        model negative_feedback
            compartment cell = 1.0
            var A in cell
            var B in cell
            var Signal in cell

            vAProd = 0.1
            kADeg = 0.2
            kBProd = 0.3
            kBDeg = 0.4
            A = 0
            B = 0
            Signal = 10

            AProd: $Signal => A; cell*vAProd*Signal
            ADeg: A =>; cell*kADeg*A*B
            BProd: => B; cell*kBProd*A
            BDeg: B => ; cell*kBDeg*B
        end
        """
    )

## open model in copasi
mod2.open()

#### Generate Multiple Synthetic Data Files

In [None]:
import pandas
## initialize empty dict to store TimeCourse objects
TC_dict = {}
for i in [10, 20, 30, 40]:
    ## modify signal parameter
    model.InsertParameters(mod2, parameter_dict={'Signal': i}, inplace=True)
    
    ## create unique report name
    report_name = os.path.join(working_directory, 'timecourse_signal_{}.txt'.format(i))
    
    ## simulate time course. Turn off recording global quantities
    TC_dict[i] = tasks.TimeCourse(mod2, end=10, intervals=10, step_size=1, report_name=report_name, 
                            global_quantities=[])
    viz.PlotTimeCourse(TC_dict[i], separate=False)
    
    
    ## format the time course data
    df = misc.format_timecourse_data(TC_dict[i].report_name)
    
    ## create independent variable column
    df['Signal_indep'] = [i]*df.shape[0]
    
    ## write data to file
    df.to_csv(TC_dict[i].report_name, sep='\t', index=False)

#### Setup and Run Parameter Estimation

In [None]:
PE2 = tasks.ParameterEstimation(mod2, [i.report_name for i in TC_dict.values()], 
                                method='genetic_algorithm', population_size=30, 
                                number_of_generations=100, run_mode=True
                               )

PE2.write_config_file()
PE2.setup()
# PE2.model.open() ## to check the model
PE2.run()

#### Plot the Results

In [None]:
viz.PlotParameterEstimation(PE2)

### Steady State Experiments
By default, all data files are treated as time course data. To specify steady state data we need to add a few more arguments to the `ParameterEstimation` instantiation, namely the `experiment_type` argument.

#### Create Synthetic Steady State Data

In [None]:
steady_state_data_file = os.path.join(working_directory, 'steady_state_data.txt')
df = pandas.DataFrame([400], columns=['A'])
print(df)
df.to_csv(steady_state_data_file, index=False)

#### Setup and Run Parameter Estimation

In [None]:
config_file = os.path.join(working_directory, 'steadystate_config_file.csv')
PE3 = tasks.ParameterEstimation(
    mod, [TC.report_name, steady_state_data_file],
    experiment_type=['timecourse', 'steadystate'],
    weight_method=['mean_squared', 'mean_squared'],
    method='genetic_algorithm', population_size=15,
    number_of_generations=10, metabolites=[], local_parameters=[],
    lower_bound = 0.001, upper_bound = 3000, config_filename = config_file
)
PE3.write_config_file()
PE3.setup()
PE3.run()