# `Pydentify2`: Profile Likelihoods

Profile likelihoods are an extremely useful method of assessing a model's identifiability status (Raue et al., 2009). This method (along with a number of other useful modelling tools) is implemented in a package called Data2Dynamics (Raue et al., 2015) by the group that first used profile likelihoods in systems biology. COPASI users can also calculate profile likelihoods (Schaber, 2012). The `Pydentify2` module automates Schaber's method and extends it to calculate profile likelihoods around an arbitrary number of parameter sets. A `Plot` class is also provided for easy visualization and calculation of confidence levels.

Since it can take some time to run a profile likelihood analysis, examples of a pre-run analysis can be downloaded from the [PyCoTools repository](https://github.com/CiaranWelsh/PyCoTools/tree/master/PyCoTools/Examples/KholodenkoExample/ProfileLikelihood) for the Kholodenko2000 model around the top three parameter sets.

The profile likelihood class can be used in three ways:

## Profile Likelihoods Around Current Point in Parameter Space

Use the `ProfileLikelihood` class with the relevant optional arguments. By default `pydentify2` takes 10 samples in log space, spanning from 1000-fold below to 1000-fold above the estimated value of the parameter of interest. When using `pydentify2` in this way, take note of the `RSS` value for the current parameters and data, because it is needed later to calculate confidence levels.

In [None]:
PyCoTools.pydentify2.ProfileLikelihood(K.kholodenko_model,
                                       LowerBoundMultiplier=1000, ## lower bound: 1000-fold below the estimated parameter value
                                       UpperBoundMultiplier=1000, ## upper bound: 1000-fold above the estimated parameter value
                                       NumberOfSteps=10,
                                       Run='false' ## set up only; use 'slow', 'multiprocess' or 'SGE' to run the analysis
                                      )
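As a rough illustration of what these arguments mean, the sketch below computes log-spaced sample points between `value / LowerBoundMultiplier` and `value * UpperBoundMultiplier`. The helper name `profile_sample_points` is hypothetical, and it is an assumption that both end points are included; none of this is taken from the `pydentify2` source.

```python
import math

def profile_sample_points(value, lower_multiplier=1000, upper_multiplier=1000,
                          number_of_steps=10):
    """Return `number_of_steps` points, evenly spaced in log10, around `value`.

    Hypothetical helper for illustration only; assumes both end points are sampled.
    """
    if value <= 0:
        raise ValueError('log-space sampling needs a strictly positive value')
    if number_of_steps < 2:
        raise ValueError('need at least two steps')
    log_low = math.log10(value / float(lower_multiplier))
    log_high = math.log10(value * float(upper_multiplier))
    step = (log_high - log_low) / (number_of_steps - 1)
    return [10 ** (log_low + i * step) for i in range(number_of_steps)]

# A parameter estimated at 0.5 is scanned from 0.0005 up to 500
points = profile_sample_points(0.5)
print('{} {}'.format(points[0], points[-1]))
```

Sampling in log space puts the same number of points in each decade, which suits kinetic parameters that can plausibly vary over several orders of magnitude.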

When the analysis has finished running, use the `Plot` class to visualize the results. Remember you'll need the RSS. 

In [None]:
RSS_value=300 ## you need to specify this value yourself (300 is made up for illustration)
PyCoTools.pydentify2.Plot(K.kholodenko_model,RSS=RSS_value,SaveFig='true')
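The `RSS` is needed because a confidence level is a threshold on the profile, expressed relative to the best-fit `RSS`. One common likelihood-ratio form, assuming independent Gaussian errors with a common unknown variance, is `RSS * exp(chi2 / n)`, where `n` is the number of data points. The sketch below uses that form. It is an assumption made for illustration, not a statement of what `Plot` computes internally, and the function names and `number_of_data_points` are hypothetical.

```python
import math

# 95% quantile of the chi-squared distribution with 1 degree of freedom
CHI2_95_DOF1 = 3.841458820694124

def confidence_threshold(rss, number_of_data_points, chi2_quantile=CHI2_95_DOF1):
    """RSS value below which a profile point lies inside the confidence region.

    Assumed form: RSS * exp(chi2 / n), from the likelihood ratio test
    n * log(RSS_profile / RSS_best) <= chi2.
    """
    if number_of_data_points <= 0:
        raise ValueError('number_of_data_points must be positive')
    return rss * math.exp(chi2_quantile / float(number_of_data_points))

def within_confidence(profile_rss_values, rss, number_of_data_points):
    """Flag which points of a profile stay below the threshold."""
    threshold = confidence_threshold(rss, number_of_data_points)
    return [value <= threshold for value in profile_rss_values]

# Made-up numbers: best-fit RSS of 300 from 100 data points
threshold = confidence_threshold(300, 100)
flags = within_confidence([305, 320], 300, 100)
```

A parameter whose profile crosses the threshold on both sides of the estimate has a finite confidence interval. A profile that stays below it in one or both directions suggests the parameter is not identifiable from the data.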

## Using the `ParameterPath` argument

One of the more useful features of `pydentify2` is the ability to easily calculate profile likelihoods around parameters from a file or folder of files, such as parameter estimation output from COPASI. To do this, use the `ParameterPath` kwarg. 

### Integer Index

Internally, PyCoTools assigns an `Index=-1` when no `ParameterPath` argument is specified. The analysis is set up in a new directory called `<pathToModel>\ProfileLikelihood\-1`. When a parameter estimation results file is specified, the `Index` keyword dictates which rank of best fit to calculate profile likelihoods around, i.e. 0 is the best, 1 is the second best and so on.

In [None]:
PyCoTools.pydentify2.ProfileLikelihood(K.kholodenko_model,
                                       ParameterPath=K.PE_data_global1,
                                       Index=0, ## 0 is best fitting (lowest RSS) parameter set
                                       LowerBoundMultiplier=1000, ## lower bound: 1000-fold below the estimated parameter value
                                       UpperBoundMultiplier=1000, ## upper bound: 1000-fold above the estimated parameter value
                                       NumberOfSteps=25, 
                                       Run='false' #Just set up the profile likelihood
                                      )

Now the `RSS` is automatically taken from the parameter estimation data and does not need to be specified by the user. The analysis can be found under `<PathToModel>\ProfileLikelihood\0`. To plot:

In [None]:
PyCoTools.pydentify2.Plot(K.kholodenko_model,ParameterPath=K.PE_data_global1,SaveFig='true',Index=0) 

Remember to pass the same `Index` argument that was used in `ProfileLikelihood`.
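To make the meaning of `Index` concrete, the sketch below ranks a few made-up parameter sets by `RSS` and picks one by rank. The data, the parameter names `k1` and `k2`, and the helper `parameter_set_at_rank` are all hypothetical. How PyCoTools parses and sorts the parameter estimation data internally is not shown here.

```python
def parameter_set_at_rank(parameter_sets, index):
    """Return the parameter set with the `index`-th lowest RSS (0 is the best fit)."""
    ranked = sorted(parameter_sets, key=lambda row: row['RSS'])
    return ranked[index]

# Made-up parameter estimation output, one dict per estimation run
estimates = [
    {'k1': 0.80, 'k2': 12.0, 'RSS': 41.2},
    {'k1': 0.75, 'k2': 10.5, 'RSS': 35.9},
    {'k1': 1.10, 'k2': 30.0, 'RSS': 58.4},
]

best = parameter_set_at_rank(estimates, 0)         # lowest RSS
second_best = parameter_set_at_rank(estimates, 1)  # next lowest RSS
```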

### List Index
The `Index` kwarg also takes a list of integers to run profile likelihoods around multiple parameter sets at once. This is useful because the profile likelihood method of identifiability analysis is a local method, and identifiability status may vary depending on the region of parameter space the parameters are in.

In [None]:
range_of_indices=list(range(0,10,2)) ## illustrative list of indices: [0, 2, 4, 6, 8]
print('indices used: {}'.format(range_of_indices))
PyCoTools.pydentify2.ProfileLikelihood(K.kholodenko_model,
                                       ParameterPath=K.PE_data_global1,
                                       Index=range_of_indices,
                                       LowerBoundMultiplier=1000, ## lower bound: 1000-fold below the estimated parameter value
                                       UpperBoundMultiplier=1000, ## upper bound: 1000-fold above the estimated parameter value
                                       NumberOfSteps=25, 
                                       Run='false' #Just set up the profile likelihood
                                      )

To plot:

In [None]:
range_of_indices=list(range(0,10,2)) #Same indices used above
PyCoTools.pydentify2.Plot(K.kholodenko_model,
                          ParameterPath=K.PE_data_global1,Index=range_of_indices,MultiPlot='true',
                          SaveFig='true') 

When `MultiPlot='true'`, the plotter starts at the largest index and works toward the lowest, sequentially adding profiles from each index to a single canvas per parameter. Graphs in this case can be found in the folder of the lowest index. When `MultiPlot='false'`, each parameter is plotted on its own canvas for each `Index`.
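Given the directory layout described earlier, with one sub-folder per `Index` under `ProfileLikelihood`, the following sketch lists where results for a list of indices would be expected and which folder would hold the graphs when `MultiPlot='true'`. The helper names and the model path are hypothetical, and it is an assumption that the `ProfileLikelihood` folder sits next to the model file.

```python
import os

def profile_likelihood_dirs(model_path, indices):
    """Map each index to its expected analysis folder (assumed layout)."""
    model_dir = os.path.dirname(os.path.abspath(model_path))
    root = os.path.join(model_dir, 'ProfileLikelihood')
    return {index: os.path.join(root, str(index)) for index in indices}

def multiplot_dir(model_path, indices):
    """Folder expected to hold the combined graphs: that of the lowest index."""
    return profile_likelihood_dirs(model_path, indices)[min(indices)]

# Made-up model path, same indices as in the example above
indices = list(range(0, 10, 2))
dirs = profile_likelihood_dirs('models/kholodenko.cps', indices)
graphs = multiplot_dir('models/kholodenko.cps', indices)
```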

## Running Profile Likelihood Calculations

Each way of using `ProfileLikelihood` (referred to as `methods 1-3`) can be run by passing an argument to the `Run` keyword. There are 4 options:

1. `Run='false'` - set up but do not run the profile likelihood analysis.
2. `Run='slow'` - set up and run the profile likelihoods in serial, using a single process.
3. `Run='multiprocess'` - set up and run the profile likelihoods in parallel, each on a separate process.
4. `Run='SGE'` - set up and run on a Sun Grid Engine based job scheduler.
    
Using the `multiprocess` mode is not a very sophisticated method of running in parallel. In fact, this isn't true parallel programming since multiple models are simply opened and run by multiple processes. When `Run='multiprocess'`, `ProfileLikelihood` opens a new process for each parameter and attempts to run them all at the same time. This can be computationally very heavy and can make the computer unusable until the analysis is finished.
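One way to keep the machine responsive is to cap the number of models running at once rather than starting one process per parameter. The sketch below is not how `ProfileLikelihood` works internally. It assumes you have a list of COPASI files that were set up with `Run='false'` and that `CopasiSE` is on your `PATH`. The function names and the `runner` argument are hypothetical.

```python
import subprocess
from multiprocessing.pool import ThreadPool

def build_commands(cps_files, executable='CopasiSE'):
    """One command line per COPASI file."""
    return [[executable, cps_file] for cps_file in cps_files]

def run_capped(commands, max_workers=2, runner=subprocess.call):
    """Run commands with at most `max_workers` at a time; return exit codes in order.

    Each worker thread blocks on one external process, so no more than
    `max_workers` models are running at any moment.
    """
    pool = ThreadPool(processes=max_workers)
    try:
        return pool.map(runner, commands)
    finally:
        pool.close()
        pool.join()

# Example (file names are made up):
# exit_codes = run_capped(build_commands(['k1.cps', 'k2.cps', 'k3.cps']), max_workers=2)
```

Threads are enough here because the heavy work happens in the external `CopasiSE` processes, not in Python itself.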

Note that the parameters in this iteration of parameter estimations are not particularly good, so the profiles may look noisy after running this example.

## Running on a Cluster

Sun Grid Engine users may use the `SGE` mode. This writes a `.sh` script containing commands to submit and run the model via CopasiSE on the cluster. Because the script is specific to the Newcastle University cluster, people with an SGE cluster elsewhere will probably have to modify the following snippet in the `PyCoTools.pydentify2.ProfileLikelihood().run_SGE` source code to point to COPASI on their own cluster.

    with open('run_script.sh','w') as f:
        f.write('#!/bin/bash\n#$ -V -cwd\nmodule add apps/COPASI/4.16.104-Linux-64bit\nCopasiSE "{}"'.format(self.cps_dct[i][j]))
                    
People using a job scheduler other than SGE will have to write their own function to submit the analysis. To do this, copy the `run_SGE` method of the `ProfileLikelihood` class and change the contents of the `.sh` file and the `os.system` command to whatever is necessary for your own cluster. The `ProfileLikelihood` class checks its arguments for validity, so you also need to add a new mode by modifying the `__init__` section of the `ProfileLikelihood` class, specifically the part which raises an error if the argument passed to `Run` isn't one of `['false','slow','multiprocess','SGE']`:

i.e. change 

    if self.kwargs.get('Run') not in ['false','slow','multiprocess','SGE']:
        raise Errors.InputError('\'Run\' keyword must be one of \'slow\', \'false\',\'multiprocess\', or \'SGE\'')
        
to

    if self.kwargs.get('Run') not in ['false','slow','multiprocess','SGE','other_job_scheduler']:
        raise Errors.InputError('\'Run\' keyword must be one of \'slow\', \'false\', \'multiprocess\', \'SGE\' or \'other_job_scheduler\'')
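As an illustration of the kind of change involved, here is a sketch for a SLURM cluster. It is not part of PyCoTools: the function name `write_slurm_script`, the `setup_line` placeholder and the file names are assumptions, and the `#SBATCH` options you need depend on your cluster. Only the script writing is shown, with the submission command left as a comment.

```python
import os

def write_slurm_script(cps_file, script_path='run_script.sh', setup_line=''):
    """Write a minimal SLURM batch script that runs one COPASI file with CopasiSE.

    `setup_line` is whatever your cluster needs to make CopasiSE available,
    e.g. a `module load ...` command (cluster specific, so left to the caller).
    Returns the absolute path of the script that was written.
    """
    lines = ['#!/bin/bash',
             '#SBATCH --ntasks=1']
    if setup_line:
        lines.append(setup_line)
    lines.append('CopasiSE "{}"'.format(cps_file))
    with open(script_path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
    return os.path.abspath(script_path)

# Hypothetical usage inside your copy of `run_SGE`:
# script = write_slurm_script('/path/to/model.cps', setup_line='<command that loads COPASI>')
# os.system('sbatch {}'.format(script))  # submit the job to SLURM
```

Keeping the script writing separate from the submission makes it easy to inspect the generated file before anything is sent to the scheduler.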

## References

* Kholodenko, B.N. (2000) 'Negative feedback and ultrasensitivity can bring about oscillations in the mitogen-activated protein kinase cascades', Eur J Biochem, 267.
* Raue, A., Schilling, M., Bachmann, J., Matteson, A., Schelker, M., Kaschek, D., Hug, S., Kreutz, C., Harms, B.D., Theis, F.J., Klingmüller, U. and Timmer, J. (2013) 'Lessons Learned from Quantitative Dynamical Modeling in Systems Biology', PLoS ONE, 8(9), p. e74335.
* Raue, A., Kreutz, C., Maiwald, T., Bachmann, J., Schilling, M., Klingmüller, U. and Timmer, J. (2009) 'Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood', Bioinformatics, 25(15), pp. 1923-1929.
* Schaber, J. (2012) 'Easy parameter identifiability analysis with COPASI', Biosystems, 110(3), pp. 183-185.
* Raue, A., Steiert, B., Schelker, M., Kreutz, C., Maiwald, T., Hass, H., Vanlier, J., Tönsing, C., Adlung, L., Engesser, R., Mader, W., Heinemann, T., Hasenauer, J., Schilling, M., Höfer, T., Klipp, E., Theis, F., Klingmüller, U., Schöberl, B. and Timmer, J. (2015) 'Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems', Bioinformatics.
