# Basics of ObjectiveLearner

## What is ObjectiveLearner?

**ObjectiveLearner** defines the Python module `objlearner`, which provides functionality to run machine learning (linear regression) and sensitivity analysis on the relationship between an objective function and its parameters. It does this using the thousands (to millions) of (sometimes expensive) objective function evaluations performed during model calibration with packages like [PyDREAM](https://github.com/LoLab-VU/PyDREAM), [simplePSO](https://github.com/LoLab-VU/ParticleSwarmOptimization), [Gleipnir](https://github.com/LoLab-VU/Gleipnir), and [GAlibrate](https://github.com/blakeaw/GAlibrate).

ObjectiveLearner provides easy-to-use objective function decorators that let users save data from the objective function evaluations performed during model calibration. This data would typically be lost (i.e., not saved by the calibrator). Keeping it lets users learn more about the objective function and its relationship to model parameters, as well as about the underlying model and the assumptions the objective function represents.

ObjectiveLearner installs as the `objlearner` package which defines the three decorator classes:
   * ObjectiveCounter - counts the number of objective function calls.
   * ObjectiveSaver - saves the input-output pairs of the objective function calls.
   * ObjectiveLearner - provides machine learning (linear regression) and sensitivity analysis.

In the following sections we'll cover each class and its use.

### The Example Model

For the purposes of this overview we will use the [GAlibrate](https://github.com/blakeaw/GAlibrate) package, which defines a genetic algorithm optimizer. It can be installed with pip:
```
pip install galibrate
```
or 
```
conda install -c blakeaw galibrate
```

We will estimate the model parameters for a linear fit to three data points with uncertainty, an example adapted from the Nestle 'Getting started' section at: http://kylebarbary.com/nestle/


Here are the imports we need from NumPy and GAlibrate:

In [1]:
import numpy as np
from galibrate.sampled_parameter import SampledParameter
from galibrate import GAO



And this is the data that we are going to calibrate against:

In [2]:
# Setup the data points that are being fitted.
data_x = np.array([1., 2., 3.])
data_y = np.array([1.4, 1.7, 4.1])
data_yerr = np.array([0.2, 0.15, 0.2])

This is the objective function for this problem, a genetic algorithm fitness function which will be maximized by the GAO:

In [3]:
def fitness(chromosome):
    y = chromosome[1] * data_x + chromosome[0]
    chisq = np.sum(((data_y - y) / data_yerr)**2)
    if np.isnan(chisq):
        return -np.inf
    return -chisq / 2.
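
Because this fitness is just a negative half chi-squared for a straight line, its maximum can also be found in closed form, which gives a useful reference when judging what the optimizer returns. The following is a minimal sketch using NumPy's `np.polyfit` with weights of `1/sigma`. The names `reference` and `reference_fitness` are illustrative and are not part of `objlearner` or GAlibrate.

```python
import numpy as np

# Same data and fitness function as in the example above.
data_x = np.array([1., 2., 3.])
data_y = np.array([1.4, 1.7, 4.1])
data_yerr = np.array([0.2, 0.15, 0.2])

def fitness(chromosome):
    # chromosome[0] is the intercept, chromosome[1] is the slope.
    y = chromosome[1] * data_x + chromosome[0]
    chisq = np.sum(((data_y - y) / data_yerr)**2)
    if np.isnan(chisq):
        return -np.inf
    return -chisq / 2.

# Weighted least squares: np.polyfit takes weights of 1/sigma and returns
# coefficients from the highest degree down, i.e. [slope, intercept].
slope, intercept = np.polyfit(data_x, data_y, deg=1, w=1.0 / data_yerr)

# Reorder to match the chromosome layout used by fitness().
reference = np.array([intercept, slope])
reference_fitness = fitness(reference)

print("Reference parameters [intercept, slope]:", reference)
print("Fitness at the reference parameters:", reference_fitness)
```

Any parameter vector returned by a calibration run should have a fitness no larger than `reference_fitness`, so the gap between the two indicates how close the optimizer got.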

### ObjectiveCounter

The ObjectiveCounter simply keeps count of the number of calls to the objective function. It's imported from `objlearner`:

In [4]:
from objlearner import ObjectiveCounter

Then we use it as a decorator for the objective function:

In [5]:
@ObjectiveCounter
def fitness(chromosome):
    y = chromosome[1] * data_x + chromosome[0]
    chisq = np.sum(((data_y - y) / data_yerr)**2)
    if np.isnan(chisq):
        return -np.inf
    return -chisq / 2.

Then we set up and run the calibration as normal:

In [6]:
# Set up the list of sampled parameters
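# Names follow the order used in fitness(): chromosome[0] is the intercept (b)
# and chromosome[1] is the slope (m).
parm_names = ['b', 'm']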
sampled_parameters = [SampledParameter(name=p, loc=-5.0, width=10.0) for p in parm_names]

# Set the active point population size
population_size = 200
n_params = len(sampled_parameters)
print("Sampling a total of {} parameters".format(n_params))
print("Will use GA population size of {}".format(population_size))
# Construct the GAO
gao = GAO(sampled_parameters,
         fitness,
         population_size,
         generations = 100,
         mutation_rate = 0.1)
# run it
best, best_f = gao.run()
print(best, best_f)

Sampling a total of 2 parameters
Will use GA population size of 200
[-0.33089809  1.24605161] -13.664009334856136


Then we can access the number of objective function evaluations with the `count` member variable:

In [7]:
print("Number of fitness evaluations: ",fitness.count)

Number of fitness evaluations:  10300
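
To make the decorator pattern concrete, here is a minimal sketch of a class-based decorator that counts calls. It is not the `objlearner` implementation. `CallCounter` and `toy_objective` are hypothetical names used only for illustration, and the real `ObjectiveCounter` may differ in its details.

```python
import functools

class CallCounter:
    """Hypothetical stand-in showing how a call-counting decorator can work."""

    def __init__(self, func):
        # Copy the wrapped function's name and docstring onto this object.
        functools.update_wrapper(self, func)
        self.func = func
        self.count = 0

    def __call__(self, *args, **kwargs):
        # Increment the counter, then defer to the wrapped function.
        self.count += 1
        return self.func(*args, **kwargs)

@CallCounter
def toy_objective(theta):
    # A simple objective: negative squared distance from the origin.
    return -sum(t * t for t in theta)

for point in ([0.0, 1.0], [2.0, -1.0], [0.5, 0.5]):
    toy_objective(point)

print("Number of evaluations:", toy_objective.count)
```

Because the decorated object is still callable with the same arguments, an optimizer can use it exactly as it would use the undecorated function.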


### ObjectiveSaver

In addition to keeping count of the number of calls to the objective function, ObjectiveSaver also saves the input parameter vectors and the corresponding objective function evaluations. It's imported from `objlearner`:

In [8]:
from objlearner import ObjectiveSaver

Then we use it as a decorator for the objective function:

In [9]:
@ObjectiveSaver
def fitness(chromosome):
    y = chromosome[1] * data_x + chromosome[0]
    chisq = np.sum(((data_y - y) / data_yerr)**2)
    if np.isnan(chisq):
        return -np.inf
    return -chisq / 2.

Then we set up and run the calibration as normal:

In [10]:
# Set up the list of sampled parameters
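# Names follow the order used in fitness(): chromosome[0] is the intercept (b)
# and chromosome[1] is the slope (m).
parm_names = ['b', 'm']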
sampled_parameters = [SampledParameter(name=p, loc=-5.0, width=10.0) for p in parm_names]

# Set the active point population size
population_size = 200
n_params = len(sampled_parameters)
print("Sampling a total of {} parameters".format(n_params))
print("Will use GA population size of {}".format(population_size))
# Construct the GAO
gao = GAO(sampled_parameters,
         fitness,
         population_size,
         generations = 100,
         mutation_rate = 0.1)
# run it
best, best_f = gao.run()
print(best, best_f)

Sampling a total of 2 parameters
Will use GA population size of 200
[-0.12701344  1.2286512 ] -13.600164887776693


Then the principal information for the objective function evaluations is accessed with the `objective_theta` member variable:

In [11]:
print(fitness.objective_theta)

         objective         0         1
0      -378.589375 -4.755865  4.362777
1     -4586.718622 -1.515087 -2.798161
2      -445.367602  1.198610  2.021765
3      -354.921838 -1.450604  3.048453
4      -567.588219  2.634260  1.523190
...            ...       ...       ...
10295 -2505.106490 -0.351074 -2.104487
10296   -83.400251  1.073834  0.160292
10297  -221.240213 -0.237476  0.272537
10298  -186.489599  1.073834 -0.187327
10299 -2188.824642 -0.674788 -1.737417

[10300 rows x 3 columns]


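`objective_theta` is a pandas DataFrame: the `objective` column holds the objective function value, and the remaining columns (`0` and `1` here) hold the components of the input parameter vector.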
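
Since the saved evaluations are a regular DataFrame, the usual pandas tools apply. The sketch below rebuilds the first five rows of the printed output by hand and picks out the best evaluation. It assumes the parameter columns are labeled with the integers `0` and `1`, as the printout suggests. If they are strings in practice, the column labels would need adjusting. The name `evaluations` is illustrative.

```python
import pandas as pd

# First five rows copied from the printed output above.
evaluations = pd.DataFrame({
    "objective": [-378.589375, -4586.718622, -445.367602, -354.921838, -567.588219],
    0: [-4.755865, -1.515087, 1.198610, -1.450604, 2.634260],
    1: [4.362777, -2.798161, 2.021765, 3.048453, 1.523190],
})

# The fitness is maximized, so the best evaluation has the largest objective.
best_index = evaluations["objective"].idxmax()
best_objective = evaluations.loc[best_index, "objective"]
best_theta = evaluations.loc[best_index, [0, 1]].to_numpy()

print("Best objective value:", best_objective)
print("Parameter vector at the best evaluation:", best_theta)

# The three best evaluations, sorted from best to worst.
top3 = evaluations.sort_values("objective", ascending=False).head(3)
print(top3)
```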

Additionally, the ObjectiveSaver defines the following member variable and functions:
  * count - the number of objective function evaluations
  * write_csv(prefix='filename_prefix') - write out the objective function evaluation data as a csv file
  * write_npy(prefix='filename_prefix') - write out the objective function evaluation data as a NumPy npy file 

### ObjectiveLearner

The ObjectiveLearner is the principal (and namesake) tool of the **ObjectiveLearner** package, and it provides functionality to analyze objective function evaluation data with machine learning (linear regression) and sensitivity analysis. It's imported from `objlearner`:

In [12]:
from objlearner import ObjectiveLearner

Then we use it as a decorator for the objective function:

In [13]:
@ObjectiveLearner
def fitness(chromosome):
    y = chromosome[1] * data_x + chromosome[0]
    chisq = np.sum(((data_y - y) / data_yerr)**2)
    if np.isnan(chisq):
        return -np.inf
    return -chisq / 2.

Then we set up and run the calibration as normal:

In [14]:
# Set up the list of sampled parameters
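# Names follow the order used in fitness(): chromosome[0] is the intercept (b)
# and chromosome[1] is the slope (m).
parm_names = ['b', 'm']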
sampled_parameters = [SampledParameter(name=p, loc=-5.0, width=10.0) for p in parm_names]

# Set the active point population size
population_size = 200
n_params = len(sampled_parameters)
print("Sampling a total of {} parameters".format(n_params))
print("Will use GA population size of {}".format(population_size))
# Construct the GAO
gao = GAO(sampled_parameters,
         fitness,
         population_size,
         generations = 100,
         mutation_rate = 0.1)
# run it
best, best_f = gao.run()
print(best, best_f)

Sampling a total of 2 parameters
Will use GA population size of 200
[-0.94778955  1.61824086] -14.820251155584078


#### Machine Learning (Linear Regression)

The ObjectiveLearner decorator provides several functions to compute the coefficients and explained variance score between the objective function and input parameter vectors using linear regression methods (from [scikit-learn](https://scikit-learn.org/stable/index.html)):

  * least_squares() - Least squares linear regression of the objective function against the parameters.
  * ridge() - Ridge regression of the objective function against the parameters.
  * lasso() - Lasso regression of the objective function against the parameters.
  * linear_svr() - Linear Support Vector Regression (SVR) of the objective function against the parameters.
  
For example:  

In [15]:
coefs, ev_score = fitness.least_squares()
print("Least-squares linear regression coefficients and explained variance score:")
print(coefs, ev_score)

Least-squares linear regression coefficients and explained variance score:
[232.58851622 478.82551034] 0.3588026227732696
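
To give a feel for what these two numbers mean, the sketch below fits an ordinary least-squares model to (parameter vector, objective) pairs using only NumPy and computes the explained variance score by hand. It is an illustration of the idea, not a reproduction of what `least_squares()` does internally. The uniform random draws over [-5, 5] are a hypothetical stand-in for the evaluations saved during a real calibration run, and the names `thetas`, `design`, and `solution` are illustrative.

```python
import numpy as np

# Same data and fitness function as in the example above.
data_x = np.array([1., 2., 3.])
data_y = np.array([1.4, 1.7, 4.1])
data_yerr = np.array([0.2, 0.15, 0.2])

def fitness(chromosome):
    y = chromosome[1] * data_x + chromosome[0]
    chisq = np.sum(((data_y - y) / data_yerr)**2)
    if np.isnan(chisq):
        return -np.inf
    return -chisq / 2.

# Stand-in for saved evaluations: uniform draws over [-5, 5] per parameter.
rng = np.random.default_rng(0)
thetas = rng.uniform(-5.0, 5.0, size=(2000, 2))
objective = np.array([fitness(t) for t in thetas])

# Ordinary least squares with an intercept column appended to the inputs.
design = np.column_stack([thetas, np.ones(len(thetas))])
solution, *_ = np.linalg.lstsq(design, objective, rcond=None)
coefs = solution[:2]
predicted = design @ solution

# Explained variance score: 1 - Var(residuals) / Var(targets).
ev_score = 1.0 - np.var(objective - predicted) / np.var(objective)

print("Coefficients:", coefs)
print("Explained variance score:", ev_score)
```

The objective here is quadratic in the parameters, so a model that is linear in the parameters can only capture part of its variance. That is one plausible reason for an explained variance score well below 1 in this example.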


#### Sensitivity Analysis

The ObjectiveLearner decorator also provides several functions to compute sensitivity metrics between the objective function and input parameter vectors using sensitivity analyses (from [SALib](https://salib.readthedocs.io/en/latest/index.html)):

  * sobol() - Sobol sensitivity of the objective function.
  * morris() - Morris Method sensitivity of the objective function.
  * delta() - Delta Moment-Independent Measure sensitivity of the objective function.
  * fast() - FAST sensitivity analysis of the objective function.
  * rbd_fast() - RBD-FAST sensitivity analysis of the objective function.
  
For example:  

In [16]:
Si = fitness.sobol()
print(Si)

{'S1': array([0.06167879, 0.60798909]), 'S1_conf': array([0.04306523, 0.0665324 ]), 'ST': array([0.39456359, 0.94167418]), 'ST_conf': array([0.03766878, 0.0691636 ]), 'S2': array([[       nan, 0.34303938],
       [       nan,        nan]]), 'S2_conf': array([[       nan, 0.09877772],
       [       nan,        nan]])}
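
The Sobol output is easier to read once it is laid out per parameter. The sketch below copies the `S1`, `ST`, and `S2` values from the printed output above into a plain dictionary (the confidence intervals are omitted) and prints them side by side. The difference `ST - S1` is the share of variance a parameter contributes through interactions. With only two parameters it should roughly match the single pairwise index `S2[0, 1]`. The name `interaction_share` is illustrative.

```python
import numpy as np

# Values copied from the printed output above (confidence intervals omitted).
Si = {
    "S1": np.array([0.06167879, 0.60798909]),
    "ST": np.array([0.39456359, 0.94167418]),
    "S2": np.array([[np.nan, 0.34303938],
                    [np.nan, np.nan]]),
}

# Total-order minus first-order: variance attributable to interactions.
interaction_share = Si["ST"] - Si["S1"]

for index in range(len(Si["S1"])):
    print(
        f"parameter {index}: "
        f"S1={Si['S1'][index]:.3f}  "
        f"ST={Si['ST'][index]:.3f}  "
        f"ST-S1={interaction_share[index]:.3f}"
    )
print(f"pairwise S2[0, 1] = {Si['S2'][0, 1]:.3f}")
```

In this run the second parameter has by far the larger first-order index, while both parameters share a sizeable interaction term.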
