# Model Calibration as Part of Model Evaluation in a Model Evolution System
Model calibration compares a (data) model revision against a benchmark (expert) model. It is a validation process that provides external quality assurance. The key question asked during calibration is: *"Does the proposed new model meet stakeholder needs?"*

The calibration process captures the following three questions, which determine the extent to which a proposed model is fit for purpose:
1. __Data integrity__: How good is my data?
2. __Model integrity__: How well does my new model match my existing model?
3. __Abstraction integrity__: How well does my model represent reality?

## Generate Data Model

```python
import features.ts as ts
import evaluation.evalhelpers as eh
import evaluation.calibration as ec

import plotly.offline as offline
import plotly.graph_objs as go
import plotly as py
offline.init_notebook_mode(connected=True)  # set for plotly offline plotting
```

```python
%%capture
# the cell magic above suppresses the cell's printed output

year = 2011
experiment_dir = 'exp1'  # sub-directory in which inferred customer classes are saved

dm = ec.generateDataModel(year, experiment_dir)

ods = dm[0]    # observed demand summary
ohp = dm[1]    # observed hourly profiles
adtd = dm[2]   # aggregate daytype demand
amd = dm[3]    # annual monthly demand
aggpp = dm[4]  # aggregate monthly power profile
pp = dm[5]     # power profile
```


## Explore Data Integrity
The uncertainty index assesses data integrity. It answers two key questions:
1. *Do I have enough representative data?*
2. *Is the data sufficiently reliable to construct a model with integrity?*

The uncertainty index is calculated by establishing whether the sample size is sufficient to draw conclusions about a given feature of the model. In this system, it is derived by selecting a valid model based on:
* a specified minimum number of profiles observed
* a specified minimum number of valid observations per model variable

The uncertainty index is the ratio of variables (rows) in the valid model to total variables. It is calculated as follows:

```
valid_submodel = submodel_input[where AnswerID_count >= minimum and valid_obs_ratio >= minimum]
uix = rows(valid_submodel) / rows(submodel_input)
```
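The two-step rule in the pseudocode above can be sketched in pandas as follows. This is a minimal illustration, not the code inside `evaluation.calibration`: the function name `uncertainty_index` and the parameters `min_answerid` and `min_obs_ratio` are hypothetical, and the sketch assumes each submodel is a DataFrame with one row per model variable and columns named `AnswerID_count` and `valid_obs_ratio`.

```python
import pandas as pd


def uncertainty_index(submodel_input, min_answerid, min_obs_ratio):
    """Hypothetical helper: return the valid rows of a submodel and the share of rows that are valid.

    A row (model variable) is valid when it was observed for at least
    `min_answerid` profiles and its valid observation ratio is at least
    `min_obs_ratio`.
    """
    if len(submodel_input) == 0:
        raise ValueError("submodel has no rows")
    keep = ((submodel_input['AnswerID_count'] >= min_answerid)
            & (submodel_input['valid_obs_ratio'] >= min_obs_ratio))
    valid_submodel = submodel_input[keep]
    uix = len(valid_submodel) / len(submodel_input)
    return valid_submodel, uix


# made-up submodel with four variables (rows)
toy = pd.DataFrame({
    'AnswerID_count': [78, 15, 1, 5],
    'valid_obs_ratio': [0.97, 0.80, 0.95, 0.90],
})
valid, uix = uncertainty_index(toy, min_answerid=2, min_obs_ratio=0.85)
print(uix)  # 0.5
```

With thresholds of 2 profiles and a 0.85 ratio, the second row fails on its ratio and the third on its profile count, so two of the four rows remain and the index is 0.5.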

Moreover, for a model to be valid, it must share the same baseline as the benchmark model (e.g. same year, same region).
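A baseline check can be expressed as a comparison of model metadata. The sketch below assumes, hypothetically, that each model carries a dict of baseline attributes such as `year` and `region`; `same_baseline` is an illustrative helper and not part of the evaluation package.

```python
def same_baseline(model_meta, benchmark_meta, keys=('year', 'region')):
    """Hypothetical check: True only if both models define every baseline key and agree on it."""
    return all(
        k in model_meta and k in benchmark_meta and model_meta[k] == benchmark_meta[k]
        for k in keys
    )


# made-up metadata for illustration
benchmark = {'year': 2011, 'region': 'region_a'}
print(same_baseline({'year': 2011, 'region': 'region_a'}, benchmark))  # True
print(same_baseline({'year': 2012, 'region': 'region_a'}, benchmark))  # False
```

A missing key counts as a mismatch rather than a match, so a model with incomplete metadata is never treated as sharing the benchmark's baseline.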

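```python
# per-class descriptive statistics of AnswerID_count and valid_obs_ratio; the first 8 rows cover one class
ec.uncertaintyStats(ods)[:8]
```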

| customer_class | statistic | AnswerID_count | valid_obs_ratio |
|---|---|---:|---:|
| informal_settlement | count | 15.0 | 15.0 |
| informal_settlement | mean | 13.266667 | 0.910585 |
| informal_settlement | std | 19.436588 | 0.089651 |
| informal_settlement | min | 2.0 | 0.705574 |
| informal_settlement | 25% | 3.0 | 0.893042 |
| informal_settlement | 50% | 5.0 | 0.949663 |
| informal_settlement | 75% | 15.0 | 0.970046 |
| informal_settlement | max | 78.0 | 0.976135 |
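The table above has the shape of pandas' `describe()` applied per customer class, which is why the first 8 rows hold exactly the statistics for one class. A sketch of how such a table could be produced, assuming a submodel DataFrame with a `class` column as in `ods`; `uncertainty_stats` is a hypothetical stand-in for `ec.uncertaintyStats`, whose actual implementation may differ.

```python
import pandas as pd


def uncertainty_stats(submodel):
    """Hypothetical helper: describe() of AnswerID_count and valid_obs_ratio for each class.

    Rows are indexed by (customer_class, statistic); columns are the two variables.
    """
    cols = ['AnswerID_count', 'valid_obs_ratio']
    stats = submodel.groupby('class')[cols].apply(lambda g: g.describe())
    stats.index.names = ['customer_class', 'index']
    return stats


# made-up submodel with two customer classes
toy = pd.DataFrame({
    'class': ['class_a', 'class_a', 'class_b', 'class_b', 'class_b'],
    'AnswerID_count': [2, 4, 1, 3, 5],
    'valid_obs_ratio': [0.9, 0.7, 0.8, 1.0, 0.9],
})
stats = uncertainty_stats(toy)
print(stats[:8])  # the 8 describe() rows for class_a
```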


```python
import pandas as pd

data = []
xx = list(range(1, 16))  # years electrified shown on the x-axis

for c in ods['class'].unique():
    # valid AnswerID counts for this customer class by years electrified
    t = ods.loc[ods['class'] == c, ['YearsElectrified', 'AnswerID_count']]
    # add a zero count for each year without observations so every class has one bar per year
    observed = set(t['YearsElectrified'])
    missing = [y for y in xx if y not in observed]
    if missing:
        t = pd.concat([t, pd.DataFrame({'YearsElectrified': missing, 'AnswerID_count': 0})],
                      ignore_index=True)
    t = t.sort_values('YearsElectrified').reset_index(drop=True)

    trace = go.Bar(
            x=xx,
            y=t['AnswerID_count'],
            name=c
    )
    data.append(trace)

layout = go.Layout(
            barmode='stack',  # stack the customer classes within each year
            xaxis=dict(title='Years Electrified', tickvals=xx),
            yaxis=dict(title='Valid AnswerID count'))

fig = go.Figure(data=data, layout=layout)
offline.iplot(fig)
```

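```python
# label each submodel; the label appears in the submodel_name column of the output
ods.name = 'demand_summary'
ohp.name = 'hourly_profiles'
# thresholds, presumably in the order of the pseudocode above:
# minimum AnswerID count (2) and minimum valid observation ratio (0.85)
ec.dataUncertainty([ods, ohp], 2, 0.85)
```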

| submodel_name | valid_data | uncertainty_index |
|---|---|---:|
| demand_summary | class YearsElectrified M_k... | 0.631579 |
| hourly_profiles | class YearsElectrified m... | 0.62552 |
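The same filter extends to several submodels at once. The sketch below is a hypothetical stand-in for `ec.dataUncertainty`, not its actual implementation: it takes the submodels as a dict keyed by name instead of reading a `.name` attribute, and it returns the valid rows in a separate dict rather than in a `valid_data` column.

```python
import pandas as pd


def data_uncertainty(submodels, min_answerid, min_obs_ratio):
    """Hypothetical helper: uncertainty index per submodel, plus the valid rows of each.

    `submodels` maps a submodel name to a DataFrame with AnswerID_count and
    valid_obs_ratio columns (one row per model variable).
    """
    valid_data = {}
    uix = {}
    for name, sm in submodels.items():
        keep = (sm['AnswerID_count'] >= min_answerid) & (sm['valid_obs_ratio'] >= min_obs_ratio)
        valid_data[name] = sm[keep]
        uix[name] = len(valid_data[name]) / len(sm)
    index = pd.Series(uix, name='uncertainty_index')
    index.index.name = 'submodel_name'
    return index, valid_data


# made-up submodels for illustration
demand = pd.DataFrame({'AnswerID_count': [10, 1, 4], 'valid_obs_ratio': [0.90, 0.95, 0.60]})
hourly = pd.DataFrame({'AnswerID_count': [3, 8], 'valid_obs_ratio': [0.88, 0.99]})
uix, valid = data_uncertainty({'demand_summary': demand, 'hourly_profiles': hourly}, 2, 0.85)
print(uix)
```

Passing names explicitly avoids relying on `.name`, which is a plain instance attribute on a DataFrame rather than one pandas manages, so it is not carried over to copies or filtered frames.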
