# Parameter Estimation Analysis
While COPASI is excellent for generating parameter estimation data, users are largely left to their own devices when it comes to analysing it. PyCoTools provides the `PEAnalysis` module, which is designed specifically for quickly visualizing parameter estimation data, whether generated by COPASI or elsewhere. This section describes how to use the `PEAnalysis` module. For an example of a possible workflow that COPASI users can follow to find best-fitting parameters, see the [workflow tutorial](https://github.com/CiaranWelsh/PyCoTools/blob/develop/PyCoTools/Examples/KholodenkoExample/modelCalibrationWorkflow.ipynb).

The `PEAnalysis` module includes features to:
* Parse parameter estimation data into a python environment (`pandas.DataFrame`)
* Quickly produce customizable:
    * Boxplots
    * Optimization performance graphs
    * Histograms
    * Scatter graphs
    * Hex plots
* In future releases, `PyCoTools` will enable:
    * Heat maps displaying various statistics about the parameter estimation data
    * Contours on the scatter and hex plots with chi2-based confidence levels
    * Principal component analysis

The `InsertParameters` and `ParameterEstimation` classes are also useful in this context to visualize best fits against experimental data.

## Get the model

In [10]:
%matplotlib inline
import os,glob

for i in glob.glob('*kholodenko.cps'):
    kholodenko_model= os.path.abspath(i)
    
print kholodenko_model
print os.path.isfile(kholodenko_model)

/home/b3053674/Documents/PyCoTools/PyCoTools/PyCoToolsTutorial/Kholodenko.cps
True


## Parsing Data

Most of the time a user does not need the `PEAnalysis.ParsePEData` class directly, since the other classes use it implicitly. However, it is sometimes useful to have access to the raw data, specifically when custom analyses are required. Let's generate some data to read:

In [14]:
import PyCoTools
TC = PyCoTools.pycopi.TimeCourse(kholodenko_model, End=1000, StepSize=1, Intervals=1000)
RMPE = PyCoTools.pycopi.runMultiplePEs(kholodenko_model, TC.kwargs['report_name'],
                                       copy_number=3, pe_number=2,
                                       method='GeneticAlgorithm')
RMPE.write_config_template()
RMPE.set_up()
RMPE.run()

INFO:PyCoTools.pycopi:pycopi:4455:creating a directory for analysis in : 

/home/b3053674/Documents/PyCoTools/PyCoTools/PyCoToolsTutorial/MultipleParameterEsimationAnalysis
INFO:PyCoTools.pycopi:pycopi:4314:writing PE config template for model: /home/b3053674/Documents/PyCoTools/PyCoTools/PyCoToolsTutorial/Kholodenko.cps
INFO:PyCoTools.pycopi:pycopi:4435:Copying copasi file 3 times
INFO:PyCoTools.pycopi:pycopi:4354:setting up scan for model : /home/b3053674/Documents/PyCoTools/PyCoTools/PyCoToolsTutorial/Kholodenko_0.cps
INFO:root:pycopi:3466:defining report
INFO:PyCoTools.pycopi:pycopi:4354:setting up scan for model : /home/b3053674/Documents/PyCoTools/PyCoTools/PyCoToolsTutorial/Kholodenko_1.cps
INFO:root:pycopi:3466:defining report
INFO:PyCoTools.pycopi:pycopi:4354:setting up scan for model : /home/b3053674/Documents/PyCoTools/PyCoTools/PyCoToolsTutorial/Kholodenko_2.cps
INFO:root:pycopi:3466:defining report
INFO:PyCoTools.pycopi:pycopi:4379:Setup Took 0.376939058304 seconds
INFO:Py

In [12]:
import PyCoTools
data=PyCoTools.PEAnalysis.ParsePEData(RMPE.kwargs['output_dir']).data ## Data is held in the data attribute of the ParsePEData class

INFO:PyCoTools.PEAnalysis:PEAnalysis:90:Parsing data from /home/b3053674/Documents/PyCoTools/PyCoTools/PyCoToolsTutorial/MultipleParameterEsimationAnalysis into python


When the `ParsePEData` class is used, it automatically produces a `pickle` file containing a `pandas.DataFrame`. Briefly, pickle files are a way of saving the contents of a variable to file. For example, if I stored the integer 10 in a variable called `x`, I could write it to a pickle file and load it again elsewhere. For more information on pickle files see [here](https://docs.python.org/2/library/pickle.html).
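As a minimal illustration of the idea, the sketch below round-trips the integer 10 through a pickle file using only the standard library. It is not PyCoTools code; the file name `x.pickle` and the temporary directory are arbitrary choices made for this example.

```python
import os
import pickle
import tempfile

x = 10  # the value we want to keep between sessions

# Write the variable to a pickle file in a temporary directory
# (the file name is arbitrary and chosen only for this example).
pickle_path = os.path.join(tempfile.mkdtemp(), 'x.pickle')
with open(pickle_path, 'wb') as f:
    pickle.dump(x, f)

# Later, or in another script, read the value back from disk.
with open(pickle_path, 'rb') as f:
    restored = pickle.load(f)

print(restored)
```

The same mechanism works for larger objects such as a `pandas.DataFrame`, which is why reloading a pickle can be faster than re-parsing many report files.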

In the `ParsePEData` class, and therefore all the plot-generating classes in `PEAnalysis`, it is possible to set `UsePickle='true'` and `overwrite_pickle='false'` to speed up parsing. The `ParsePEData` class accepts `.xls`, `.csv`, `.tsv` or `.pickle` files, or a folder containing many of these files (all from the same problem), as its argument.

In [None]:
data=PyCoTools.PEAnalysis.ParsePEData(K.PEData_file,UsePickle='true',overwrite_pickle='false').data

Since a demonstration of this module works best with a large number of parameter estimation iterations, all the kinetic parameters in the [kholodenko2000 model](http://www.ebi.ac.uk/biomodels-main/BIOMD0000000010) were re-estimated 4000 times using COPASI's genetic algorithm with a population size of 300 and a generation number of 1000. Only the kinetic variables were estimated, using wide boundaries between 1e-6 and 1e6, and all the noisy data simulated above were used as experimental data. Additionally, the parameter estimations were run on a cluster using the scripts under the `Scripts` folder in the PyCoTools distribution.

This data is available as a python pickle file called [1GlobalPEData.pickle](https://github.com/CiaranWelsh/PyCoTools/tree/master/PyCoTools/Examples/KholodenkoExample). After download, put it in your working directory and make sure there is a pointer to it in the `FilePath.KholodenkoExample` class.

In [None]:
data_file = 

## Visualize Optimization Performance

In [None]:
PyCoTools.PEAnalysis.EvaluateOptimizationPerformance(K.PE_data_global1,log10='true')

This is a plot of the ordered likelihood (or RSS value) against iteration. The smooth curve indicates that the parameter estimation settings chosen for this problem are not a good choice: the absence of a monotonically increasing, 'step-like' shape suggests that many optimizations are falling short of the minima they are trying to find (Raue 2013).
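To make the 'step-like' shape concrete, here is a small sketch of the ordering behind such a graph. It is an illustration under stated assumptions, not the `EvaluateOptimizationPerformance` implementation: the helper `ordered_rss` and the RSS values are hypothetical. When runs repeatedly converge to the same minima, the sorted values form plateaus.

```python
import math


def ordered_rss(rss_values, log10=True):
    """Return (rank, value) pairs with the best (lowest) RSS first.

    Hypothetical helper for illustration; not part of PyCoTools.
    """
    ordered = sorted(rss_values)
    if log10:
        ordered = [math.log10(v) for v in ordered]
    return list(enumerate(ordered))


# Made-up RSS values from eight estimation runs. Several runs reach the
# same value, which is what produces plateaus ("steps") when sorted.
rss = [5.1, 2.2, 2.2, 9.7, 2.2, 5.1, 2.2, 5.1]
points = ordered_rss(rss, log10=False)

for rank, value in points:
    print('%d %s' % (rank, value))
```

Plotting `value` against `rank` for these numbers gives three flat levels. A smooth curve with no repeated levels would instead suggest that few runs reached the same minimum.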

## Simulations Versus Experiment Plots

By chaining together the `InsertParameters` class with the `ParameterEstimation` class using the `CurrentSolutionStatistics` method, and setting `plot='true'` and `randomize_start_values='false'`, we can visualize a plot of simulated versus experimental data. Note that we could also set `index=1` to view the second-best parameter set (and so on).

In [None]:
PEData=PyCoTools.PEAnalysis.ParsePEData(K.PE_data_global1)

print 'best estimated parameters:\n',PEData.data.iloc[0].sort_index()
PyCoTools.pycopi.InsertParameters(K.kholodenko_model,parameter_path=K.PE_data_global1,index=0)
PE=PyCoTools.pycopi.ParameterEstimation(K.kholodenko_model,K.noisy_timecourse_report,
                                        method='CurrentSolutionStatistics',
                                        plot='true',
                                        savefig='false',
                                        randomize_start_values='false') #important to turn this off
PE.set_up() ## setup
PE.run()    ## and run the current solution statistics parameter estimation

## Boxplots

In [None]:
PyCoTools.PEAnalysis.plotBoxplot(K.PE_data_global1,savefig='false',NumPerplot=6)

Since a large portion of the parameter estimations are 'bad' runs, it's often useful to truncate the data to below a certain value of RSS. A threshold of `x=2.3` with `truncate_model='below_x'` was chosen based on the `OptimizationPerformance` graph.

In [None]:
PyCoTools.PEAnalysis.plotBoxplot(K.PE_data_global1,savefig='false',NumPerplot=15,truncate_model='below_x',x=2.3)

We can also get the top `x` percent. 

In [None]:
PyCoTools.PEAnalysis.plotBoxplot(K.PE_data_global1,savefig='false',NumPerplot=15,truncate_model='percent',x=10)
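The two truncation modes can be pictured with the sketch below, which works on a plain list of RSS values. This is an assumption about the general idea only: the helper names are hypothetical, and whether PyCoTools uses a strict inequality or this rounding rule for the percentage is not confirmed here.

```python
def truncate_below_x(rss_values, x):
    """Keep runs whose RSS is strictly below the threshold x (assumed rule)."""
    return [v for v in sorted(rss_values) if v < x]


def truncate_percent(rss_values, x):
    """Keep the best x percent of runs, lowest RSS first (assumed rounding: floor)."""
    ordered = sorted(rss_values)
    n_keep = int(len(ordered) * x / 100.0)
    return ordered[:n_keep]


# Made-up RSS values from ten estimation runs
rss = [1.9, 2.0, 2.1, 2.25, 2.4, 3.0, 4.5, 8.0, 15.0, 40.0]

below = truncate_below_x(rss, 2.3)   # runs with RSS under 2.3
top = truncate_percent(rss, 10)      # best 10 percent of runs

print(below)
print(top)
```

Thresholding keeps a data-dependent number of runs, while the percentage mode always keeps a fixed fraction regardless of how good those runs are.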

## Histograms

In [None]:
PyCoTools.PEAnalysis.plotHistogram(K.PE_data_global1,
                                   log10='true', ##plot on log10 scale
                                   savefig='false',bins=200)

Graphs can also be truncated by top `x` percent:

In [None]:
PyCoTools.PEAnalysis.plotHistogram(K.PE_data_global1,
                                   log10='true', 
                                   savefig='false',bins=30,
                                   truncate_model='percent',x=10) ## plot top 10% best runs

## Scatter Graphs

The `plotScatters` class automatically plots all ${{N}\choose{2}}$ pairs of estimated parameters and can therefore take some time with larger models. Since plotting and showing all of these graphs consumes a lot of memory, it's usually preferable to write them to file instead.

In [None]:
PyCoTools.PEAnalysis.plotScatters(K.PE_data_global1,savefig='false',
                                  log10='true') 
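To get a feel for how quickly the number of graphs grows, the following sketch enumerates all unordered pairs with `itertools.combinations`. The parameter names are hypothetical placeholders, not the names used in the Kholodenko model.

```python
from itertools import combinations

# Hypothetical parameter names, for illustration only
parameters = ['k1', 'k2', 'k3', 'k4', 'k5']

# Every unordered pair of parameters corresponds to one scatter graph
pairs = list(combinations(parameters, 2))

n = len(parameters)
print(len(pairs))          # 10 pairs for 5 parameters
print(n * (n - 1) // 2)    # same count from the closed form N(N-1)/2
```

Because the count grows quadratically with the number of parameters, writing the figures to file rather than displaying them becomes more attractive as models get larger.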

## Hex Maps

Hex maps are an alternative to both scatter graphs and histograms, depending on the `mode` argument. When `mode` is `count` (the default), colours represent counts, as in a histogram. Because of the dispersion in the data, `log10='true'` is usually required to get a good look at the data with scatter graphs and hex maps. As with scatter graphs, all ${{N}\choose{2}}$ pairs are plotted automatically, so it's preferable to write them to file instead of viewing them in `ipython`. The `grid_size` and `bins` keywords may need iterative fine-tuning to get decent-looking plots. More information can be found [here](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hexbin).

In [None]:
PyCoTools.PEAnalysis.plotHexMap(K.PE_data_global1,savefig='true',
                                  show='false',log10='true')

When `mode='RSS'`, hex maps are more like scatter graphs coloured by RSS value.

In [None]:
PyCoTools.PEAnalysis.plotHexMap(K.PE_data_global1,savefig='true',
                                  log10='true',show='false',mode='RSS')