## Splitting Evaluation
The objective of this notebook is to establish a standard way of evaluating models. For each person, the model is expected to predict a probability of referral. That prediction is then evaluated against whether a referral occurs within each of several time windows.




In [12]:
%reload_ext autoreload
%autoreload 2


In [21]:
import sys
import pandas as pd
import datetime
from pathlib import Path
import numpy as np
pd.set_option('display.max_columns', 9999)
from sklearn.metrics import precision_score, accuracy_score, recall_score, balanced_accuracy_score, f1_score
from sklearn.metrics import roc_auc_score, log_loss
from sklearn.metrics import roc_curve

In [22]:
#This could be used for benchmarking data
#100 People for 5.5 years. 
path=Path('../data')
file='test.csv' # has just 10 people for 2 years
#file='long-5-1000-10-5-2-True-10-5-2-True-2-101-5-0-0.csv'
df=pd.read_csv('https://raw.githubusercontent.com/HealthINCITE/patient_panel/master/data/test.csv')
df

Unnamed: 0,id,yrm,cad0,cad1,dv9,target
0,1000,201601,1,1,0,0
1,1000,201602,1,1,0,0
2,1000,201603,1,1,0,0
3,1000,201604,1,1,0,0
4,1000,201605,1,1,0,0
5,1000,201606,1,1,0,0
6,1000,201607,1,1,0,0
7,1000,201608,1,1,0,0
8,1000,201609,1,1,0,0
9,1000,201610,1,1,0,0


## Train test split based on time window.


In [15]:
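#Define the function
def train_test_split(df, date_col, date_format, split_time):
    """Split df into rows dated before split_time (train) and on or after it (test)."""
    split = pd.Timestamp(split_time)
    # Convert the date column to datetime on a copy, so the caller's frame is untouched.
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col], format=date_format)
    train = df.loc[df[date_col] < split]
    # Use >= so that rows dated exactly at the split are not silently dropped.
    test = df.loc[df[date_col] >= split]
    return train, test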

#Run the function
train, test = train_test_split(df, 'yrm', '%Y%m', datetime.date(2016, 12,30)) 
test

Unnamed: 0,id,yrm,cad0,cad1,dv9,target
12,1000,2017-01-01,1,1,0,0
13,1000,2017-02-01,1,1,0,0
14,1000,2017-03-01,1,1,0,0
15,1000,2017-04-01,1,1,0,0
16,1000,2017-05-01,1,1,0,0
17,1000,2017-06-01,1,1,0,0
18,1000,2017-07-01,1,1,0,0
19,1000,2017-08-01,1,1,0,0
20,1000,2017-09-01,1,1,0,0
21,1000,2017-10-01,1,1,0,0
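
A quick way to confirm that a time-based split leaks nothing is to check that every training date precedes every test date. The sketch below is a minimal, self-contained illustration on a made-up frame. The `toy` data and the `split_by_time` helper are hypothetical and only mirror the idea of the split function defined earlier.

```python
import pandas as pd

# Hypothetical toy data: 2 people, monthly rows from 2016-11 to 2017-02.
toy = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "yrm": [201611, 201612, 201701, 201702] * 2,
    "target": [0, 0, 0, 1, 0, 0, 0, 0],
})

def split_by_time(df, date_col, date_format, split_time):
    """Hypothetical helper: rows before split_time go to train, the rest to test."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col].astype(str), format=date_format)
    split = pd.Timestamp(split_time)
    return out[out[date_col] < split], out[out[date_col] >= split]

toy_train, toy_test = split_by_time(toy, "yrm", "%Y%m", "2016-12-30")

# No temporal leakage: the latest training month precedes the earliest test month.
assert toy_train["yrm"].max() < toy_test["yrm"].min()
# No rows lost: the two parts together cover the whole frame.
assert len(toy_train) + len(toy_test) == len(toy)
print(len(toy_train), len(toy_test))
```

Checking both conditions catches the two common mistakes with this kind of split: overlapping dates and rows that fall on the boundary and end up in neither part.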


### Predictions 
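The predictions are easy to assess for the toy model, because the referral pattern is known:
- The first 4 individuals are never referred.
- The next 2 are positive within the first 3 months.
- The next 2 are positive within the first 6 months, but not the first 3.
- The final 2 are positive in the 12th month.

Each window is a `[start, stop]` pair of month offsets into the test period, applied as a Python slice, so `[0,3]` covers the first three months. We set the windows as follows:
`windows= [[0,3], [0,6], [0,12]]`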
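
To make the window convention concrete, the following sketch shows how each `[start, stop]` pair selects months under Python slice semantics, where `stop` is exclusive. The `months` list and the `window_label` helper are hypothetical stand-ins for the month columns of the test period and are not part of the notebook's code.

```python
# Hypothetical month labels for a 12-month test period starting in January 2017.
months = [f"2017{m:02d}" for m in range(1, 13)]

windows = [[0, 3], [0, 6], [0, 12]]

def window_label(months, window):
    """Return 'first-last' for the months selected by a [start, stop] window."""
    start, stop = window
    selected = months[start:stop]  # slice semantics: stop is exclusive
    return f"{selected[0]}-{selected[-1]}"

labels = [window_label(months, w) for w in windows]
print(labels)
```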

In [16]:
probabilities=np.array([0,.05,.1,0,.9,.9,.8,.8,.9,.9])
threshold=0.5
predictions = np.where(probabilities > threshold, 1, 0)
predictions

array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

In [17]:
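def score_windows(test, probabilities, threshold, windows):
    """
    Score predictions against the outcome observed in each time window.

    test = test dataframe with 'id', 'yrm' (datetime) and 'target' columns
    probabilities = one predicted probability per person, ordered by sorted 'id'
    threshold = probabilities above this value are predicted positive
    windows = list of [start, stop] month offsets, applied as a slice
    """
    ir=test.pivot_table(index='id', columns='yrm', values='target', aggfunc='sum')
    c=ir.columns
    df=pd.DataFrame() #final results
    predictions=np.where(probabilities > threshold, 1, 0)
    row=0
    # Loop through the windows
    for w in windows:
        sl=slice(w[0],w[1])
        # A person is positive if any month inside the window is positive.
        y=(ir.iloc[:,sl].sum(axis=1) > 0).astype(int)
        label=c[w[0]].strftime('%Y%m')+'-'+c[w[1]-1].strftime('%Y%m')
        df.loc[row, 'log_loss'] = log_loss(y, probabilities)
        df.loc[row, 'range']=label
        df.loc[row, 'precision']=precision_score(y, predictions)
        df.loc[row, 'recall']=recall_score(y, predictions)
        df.loc[row, 'accuracy']=accuracy_score(y, predictions)
        df.loc[row, 'balanced_accuracy']=balanced_accuracy_score(y, predictions)
        df.loc[row, 'f1']=f1_score(y, predictions)
        row=row+1
    return df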

#Define the windows. Each [start, stop] pair is a slice over the month columns, so [0,3] covers the first 3 months.
windows= [[0,3], [0,6], [0,12]]
#Score windows
score_windows(test,probabilities,threshold,windows)


Unnamed: 0,log_loss,range,precision,recall,accuracy,balanced_accuracy,f1
0,0.819142,201701-201703,0.333333,1.0,0.6,0.75,0.5
1,0.541883,201701-201706,0.666667,1.0,0.8,0.833333,0.8
2,0.102438,201701-201712,1.0,1.0,1.0,1.0,1.0
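
The first row of the table can be reproduced by hand, which is a useful check on how the metrics are defined. The sketch below assumes the labels implied by the toy description for the 3-month window (only the fifth and sixth individuals are positive) and uses plain Python rather than scikit-learn. The variable names are illustrative.

```python
import math

# Labels implied by the toy description for the 3-month window:
# only individuals 5 and 6 are positive within the first three months.
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
probabilities = [0, .05, .1, 0, .9, .9, .8, .8, .9, .9]
y_pred = [1 if p > 0.5 else 0 for p in probabilities]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

precision = tp / (tp + fp)          # 2 of the 6 predicted positives are correct
recall = tp / (tp + fn)             # both true positives are found
accuracy = (tp + tn) / len(y_true)  # 6 of 10 people are classified correctly
specificity = tn / (tn + fp)
balanced_accuracy = (recall + specificity) / 2
f1 = 2 * precision * recall / (precision + recall)

def loss_term(t, p):
    """Negative log of the probability assigned to the true class."""
    return -math.log(p if t == 1 else 1 - p)

log_loss_value = sum(loss_term(t, p) for t, p in zip(y_true, probabilities)) / len(y_true)

print(precision, recall, accuracy, balanced_accuracy, f1, log_loss_value)
```

Most of the log loss comes from the four individuals who are confidently predicted positive but are not referred within three months, which is why the loss shrinks as the window widens.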


## Null Model 
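The null model here is just that there are no referrals: it predicts a probability of 0 for everyone. Because it never predicts a positive, precision and F1 are undefined, and scikit-learn reports them as 0 with a warning.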

In [19]:
null=np.array([0,0,0,0,0,0,0,0,0,0])
score_windows(test,null,threshold,windows)




Unnamed: 0,log_loss,range,precision,recall,accuracy,balanced_accuracy,f1
0,6.907755,201701-201703,0.0,0.0,0.8,0.5,0.0
1,13.815511,201701-201706,0.0,0.0,0.6,0.5,0.0
2,20.723266,201701-201712,0.0,0.0,0.4,0.5,0.0
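
The log loss values in this table grow in equal steps, which can be explained with a short calculation. The sketch below assumes that predicted probabilities are clipped to `1e-15` before the logarithm is taken. That clipping value is inferred from the reported numbers rather than stated in the notebook, and newer scikit-learn versions may clip differently. The positive counts per window (2, 4, 6 of 10) come from the toy description.

```python
import math

eps = 1e-15  # assumed clipping value, inferred from the reported numbers
n_people = 10
# Number of positive individuals in each window, per the toy description.
positives_per_window = {"201701-201703": 2, "201701-201706": 4, "201701-201712": 6}

losses = {}
for label, n_pos in positives_per_window.items():
    # Each positive predicted as 0 contributes -log(eps);
    # each negative contributes -log(1 - eps), which is effectively 0.
    total = n_pos * -math.log(eps) + (n_people - n_pos) * -math.log(1 - eps)
    losses[label] = total / n_people

for label, loss in losses.items():
    print(label, round(loss, 6))
```

Under this assumption each missed positive adds about 34.54 to the summed loss, so the null model's log loss is driven entirely by how many positives fall inside the window.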
