# Suggested Workflow

You will build a workflow that starts with the data and ends with a report written in LaTeX. I suggest the following components and sequence in your workflow (you may choose to do it differently):

## Data preparation script 

First, a script that imports the data and prepares it for NLLS fitting. This may be in Python or R, and will typically have the following features:

* Creates unique ids so that you can identify unique datasets (e.g., single thermal responses or functional responses). *This may not always be necessary because your data might already contain a field that delineates single curves (e.g., an `ID` field/column)* 
* Filters out datasets with fewer than $x$ data points, where $x$ is the minimum number of data points needed to fit the models. Note that this step is not strictly necessary: the model fitting (or the estimation of goodness-of-fit statistics) will fail for datasets with too few data points anyway, so you can instead filter these datasets out *after* the NLLS fitting script (see below) has finished running and you are in the analysis phase.
* Deals with missing and other problematic data values.
* Saves the modified data to one or more csv file(s).
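
The features listed above can be sketched as follows. This is a minimal illustration using only the standard library, not a prescribed implementation: the column names `ID`, `Time` and `PopBio`, the `min_points` threshold, and the function names are all assumptions made for the example, and the demo data is made up.

```python
import csv
import io
import math
from collections import defaultdict


def prepare_data(rows, id_field="ID", x_field="Time", y_field="PopBio", min_points=5):
    """Group rows into curves, drop unusable values, and filter out short curves.

    The field names and `min_points` are hypothetical defaults.
    """
    curves = defaultdict(list)
    for row in rows:
        try:
            curve_id = row[id_field]
            x = float(row[x_field])
            y = float(row[y_field])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, empty cell, or non-numeric value such as "NA"
        if not (math.isfinite(x) and math.isfinite(y)):
            continue  # NaN or infinite values
        curves[curve_id].append((x, y))
    # Keep only curves with enough points, sorted by the x variable
    return {cid: sorted(pts) for cid, pts in curves.items() if len(pts) >= min_points}


def write_curves(curves, handle, id_field="ID", x_field="Time", y_field="PopBio"):
    """Write the cleaned curves to an open file handle in csv format."""
    writer = csv.writer(handle)
    writer.writerow([id_field, x_field, y_field])
    for cid, pts in curves.items():
        for x, y in pts:
            writer.writerow([cid, x, y])


# Small in-memory demonstration (made-up data)
raw = "ID,Time,PopBio\nA,0,1.0\nA,1,2.0\nA,2,NA\nA,3,4.0\nB,0,1.0\nB,1,\n"
rows = list(csv.DictReader(io.StringIO(raw)))
curves = prepare_data(rows, min_points=3)
out = io.StringIO()
write_curves(curves, out)
```

In a real script you would read from and write to files with `open(...)` instead of `io.StringIO`.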

## NLLS fitting script

A separate script that does the NLLS fitting. For example, it may have the following features: 

* Opens the (new, modified) dataset from the previous step.

* Calculates starting values (more on this [below](#Obtaining-starting-values)).

* Does the NLLS fitting.
    * If you choose Python for this, use `lmfit` (look up the `minimize` and `report_fit` functions, and the `Parameters` and `Parameter` classes). *Have a look through* <http://lmfit.github.io/lmfit-py>, especially <http://lmfit.github.io/lmfit-py/fitting.html#minimize>. You will have to install `lmfit`, for example with `pip install lmfit`. There are lots of examples of using `lmfit` online.
    * If you choose `R`, examples are [here](Appendix-ModelFitting.ipynb). 
    
* Uses the `try` construct because not all runs will converge: for Python, see [this](https://docs.python.org/3.6/tutorial/errors.html); for R, [recall this](07-R.ipynb#Errors-and-Debugging). *The more data curves you are able to fit, the better; that is part of the challenge.*

* Calculates AIC, BIC, $R^{2}$, and other statistical measures of model fit (you decide what you want to include).

* Exports the results to a csv that the [final plotting script](#Final-plotting-and-analysis-script) can read.
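
As an illustration of the goodness-of-fit step, here is a minimal sketch that computes $R^{2}$, AIC and BIC from observed and fitted values. It assumes the common least-squares forms $\mathrm{AIC} = n \ln(\mathrm{RSS}/n) + 2k$ and $\mathrm{BIC} = n \ln(\mathrm{RSS}/n) + k \ln(n)$, where $n$ is the number of data points and $k$ the number of fitted parameters; the function name `fit_statistics` is hypothetical. If you use `lmfit`, the fit result already carries `aic` and `bic` attributes, so you may not need to compute these yourself.

```python
import math


def fit_statistics(observed, predicted, n_params):
    """Return RSS, R^2, AIC and BIC for a least-squares fit.

    Assumes a non-zero residual sum of squares and non-constant observations;
    otherwise the logarithm or the division below will fail.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / n
    tss = sum((o - mean_obs) ** 2 for o in observed)
    log_term = n * math.log(rss / n)
    return {
        "RSS": rss,
        "R2": 1 - rss / tss,
        "AIC": log_term + 2 * n_params,
        "BIC": log_term + n_params * math.log(n),
    }


# Made-up observed and fitted values for a two-parameter model
stats = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2)
```

Because both criteria share the same $n \ln(\mathrm{RSS}/n)$ term, they differ only in how strongly they penalize extra parameters.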

### Obtaining starting values 

The main challenge for NLLS fitting is finding good starting values. Ideally, you should determine starting values specific to each dataset (e.g., single thermal performance, functional response, or population growth rate curve) that you are trying to fit a model to. To do so, understanding how each parameter in the model corresponds to features of the actual data is key. For example, in the Gompertz population growth rate model (eq. \ref{eq:Gompertz}), your starting values generator would essentially be an algorithm which, for each dataset:
* Calculates a starting value for $r_{max}$ by searching for the steepest slope of the growth curve using the first few data points (fitting a straight line using OLS).
* Calculates a starting value of $t_{lag}$ by intersecting the fitted line with the x (time)-axis.
* Calculates a starting value for the asymptote $A$ as the highest data (abundance) value in the dataset.
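
The three steps above could be sketched as follows. This is one possible interpretation, not a definitive recipe: it assumes the data are sorted by time, it searches for the steepest slope by fitting OLS lines to sliding windows of `window` consecutive points (the window size is an arbitrary choice), and the function names are hypothetical.

```python
def ols_line(xs, ys):
    """Ordinary least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gompertz_start_values(times, abundances, window=4):
    """Heuristic starting values for r_max, t_lag and A from one growth curve."""
    best_slope, best_intercept = ols_line(times[:window], abundances[:window])
    # Slide the window along the curve and keep the steepest fitted line
    for start in range(1, len(times) - window + 1):
        stop = start + window
        slope, intercept = ols_line(times[start:stop], abundances[start:stop])
        if slope > best_slope:
            best_slope, best_intercept = slope, intercept
    if best_slope > 0:
        t_lag = -best_intercept / best_slope  # where the steepest line crosses the time axis
    else:
        t_lag = times[0]  # flat or declining curve: fall back to the first time point
    return {"r_max": best_slope, "t_lag": t_lag, "A": max(abundances)}


# Made-up growth curve: lag, then linear increase, then plateau
times = [0, 1, 2, 3, 4, 5, 6, 7]
abundances = [0, 0, 0, 2, 4, 6, 6, 6]
start = gompertz_start_values(times, abundances, window=3)
```

For the made-up curve above, the steepest window has a slope of 2 and crosses the time axis at $t = 2$, which matches what you would read off a plot of the data.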

In general, a good strategy to optimize fits (and maximize how many datasets are successfully fitted to a non-linear model) is to sample starting values from a distribution, fit the model from each sampled set, and keep the best fit. For example, you can choose a Gaussian distribution (high confidence in the mean of the parameter) or a uniform distribution (low confidence in the mean, high confidence in the range of values that the parameter can take), with the mean being the value you inferred from the data.
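
A minimal sketch of sampling starting values around data-derived estimates is shown below. The function name, the `spread` parameter (the standard deviation, or half-width, expressed as a fraction of each estimate) and the example values are assumptions made for illustration.

```python
import random


def sample_start_values(center, n_samples=10, spread=0.5, distribution="gaussian", seed=None):
    """Draw sets of starting values centred on data-derived estimates.

    `center` maps parameter names to the values inferred from the data.
    """
    if distribution not in ("gaussian", "uniform"):
        raise ValueError(f"unknown distribution: {distribution}")
    rng = random.Random(seed)  # seeded generator so that runs are reproducible
    samples = []
    for _ in range(n_samples):
        draw = {}
        for name, value in center.items():
            scale = spread * abs(value)
            if distribution == "gaussian":
                draw[name] = rng.gauss(value, scale)  # mean = estimate, sd = scale
            else:
                draw[name] = rng.uniform(value - scale, value + scale)
        samples.append(draw)
    return samples


# Hypothetical estimates obtained from one growth curve
center = {"r_max": 2.0, "t_lag": 2.0, "A": 6.0}
samples = sample_start_values(center, n_samples=5, distribution="uniform", seed=1)
```

You would then attempt a fit from each element of `samples` and keep the converged fit with the lowest AIC (or residual sum of squares).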

*We suggest you write a separate script/module/function that calculates starting values for the model parameters.*  

## Final plotting and analysis script  

Next, you can import the results from the previous step and plot every curve with the two (or more) models (or none, if nothing converges) overlaid. Doing this will help you identify poor fits visually and decide whether the previous NLLS fitting script can be further optimized (e.g., by improving the starting values generator). All plots should be saved in a single separate sub-directory. This script will also perform any analyses of the results of the model fitting, for example to summarize which model(s) fit(s) best, and to address any biological questions involving covariates.

## Report compiling script

Then comes the $\LaTeX$ source code and a (typically Bash) script that compiles it.

## A single script to run them all

Finally, write a Python or Bash script, called `run_MiniProject.py` or `run_MiniProject.sh` respectively, which runs the whole project, right down to compilation of the LaTeX document.
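
A Python version of such a script could look roughly like the sketch below. The script names in `STEPS` are placeholders invented for this example (use your own file names), and `run_pipeline` is a hypothetical helper, not a required interface.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each command in order, stopping at the first failure."""
    completed = []
    for command in steps:
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if the command exits with a non-zero status
        subprocess.run(command, check=True)
        completed.append(command)
    return completed


# Placeholder script names; replace them with the files in your own project
STEPS = [
    [sys.executable, "prepare_data.py"],
    [sys.executable, "fit_models.py"],
    [sys.executable, "plot_and_analyse.py"],
    ["bash", "compile_report.sh"],
]
```

Calling `run_pipeline(STEPS)` at the bottom of `run_MiniProject.py` would then run the whole workflow. Stopping at the first failure avoids, for example, compiling a report from stale results.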

## Getting started 

Doing all this may seem a bit scary at the start. However, if you approach the problem systematically and methodically, you will soon be on your way. 

Here are some suggested first steps to get started:

* Explore the data in R or Python (e.g., using Jupyter). 

* Write a preliminary version of the plotting script without the fitted models overlaid. That will also give you a feel for the data and allow you to see (literally) what shapes the curves can take.

* Explore the models you will be fitting. Basically, be able to plot them. Write them as functions in your Python/R script (you can then re-use these functions in your NLLS fitting script as well). Then do some plotting of the functions (you can suppress or sandbox those code lines for exploratory plotting of the functions in the final product).

* Figure out, using a minimal example (say, with one "nice-looking" thermal performance, functional response, or population growth curve/dataset), how the NLLS fitting package and its commands work.

* Next, write a loop over all unique datasets (data curves), using `try` to catch errors in case the fitting doesn't converge.
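
The loop over datasets with `try` could be structured along the lines of the sketch below. Here `fit_one` stands for whatever function wraps your actual NLLS fit, and `toy_fit` is a made-up stand-in used only so that the example runs on its own.

```python
def fit_all(curves, fit_one):
    """Fit every curve, recording failures instead of stopping the loop."""
    results, failures = {}, {}
    for curve_id, points in curves.items():
        try:
            results[curve_id] = fit_one(points)
        except Exception as err:  # broad on purpose: one bad curve should not stop the batch
            failures[curve_id] = f"{type(err).__name__}: {err}"
    return results, failures


def toy_fit(points):
    """Hypothetical stand-in for a real NLLS fit; fails on tiny datasets."""
    if len(points) < 3:
        raise ValueError("too few data points")
    return {"mean_y": sum(y for _, y in points) / len(points)}


# Made-up data: curve "B" is too short and will fail
curves = {"A": [(0, 1.0), (1, 2.0), (2, 3.0)], "B": [(0, 1.0)]}
results, failures = fit_all(curves, toy_fit)
```

Keeping the failure messages lets you see later why particular curves could not be fitted, which is useful when improving the starting values generator.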

*One thing to note is that you may need to do the NLLS fitting on the logarithm of the function (and therefore, the data) to facilitate convergence.*
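
As a rough illustration of fitting on the log scale, the residuals passed to the optimizer can be computed between the logarithm of the model prediction and the logarithm of the data. The exponential growth model and the function names below are assumptions chosen only to keep the example short; the same pattern applies to any model whose predictions and data are strictly positive.

```python
import math


def exponential_growth(t, n0, r):
    """Hypothetical example model: N(t) = n0 * exp(r * t)."""
    return n0 * math.exp(r * t)


def log_residuals(model, params, times, observed):
    """Residuals between log(model) and log(data); both must be strictly positive."""
    return [
        math.log(model(t, **params)) - math.log(obs)
        for t, obs in zip(times, observed)
    ]


# Made-up data generated from the model itself
times = [0, 1, 2, 3]
observed = [exponential_growth(t, n0=10.0, r=0.8) for t in times]
exact = log_residuals(exponential_growth, {"n0": 10.0, "r": 0.8}, times, observed)
doubled = log_residuals(exponential_growth, {"n0": 20.0, "r": 0.8}, times, observed)
```

On the log scale, a model that is wrong by a constant factor gives residuals of equal size at every time point (here $\ln 2$ for `doubled`), whereas on the original scale the largest abundances would dominate the sum of squares.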