# Model Evaluation

##### *In which we hope all the work we put into our model pays off.*

###### [GitHub Repository](https://github.com/ahester57/ai_workshop/tree/master/notebooks/time_for_crab/3-evaluation)

###### [Notebook Viewer](https://nbviewer.jupyter.org/github/ahester57/ai_workshop/blob/master/notebooks/time_for_crab/3-evaluation/evaluation.ipynb)

###### [Kaggle Dataset](https://www.kaggle.com/sidhus/crab-age-prediction)


### Define Constants

In [2]:
%%time
CACHE_FILE = '../cache/designrcrabs.feather'
NEXT_NOTEBOOK = '../Hester-CS5300-Time-for-Crab.ipynb'

PREDICTION_TARGET = 'Age'    # 'Age' is predicted
DATASET_COLUMNS = ['Sex_F','Sex_M','Sex_I','Length','Diameter','Height','Weight','Shucked Weight','Viscera Weight','Shell Weight',PREDICTION_TARGET]
REQUIRED_COLUMNS = [PREDICTION_TARGET]

NUM_EPOCHS = 100
VALIDATION_SPLIT = 0.2


### Import Libraries

In [3]:
%%time
from notebooks.time_for_crab.mlutils import display_df, generate_neural_network, generate_neural_pyramid
from notebooks.time_for_crab.mlutils import plot_training_loss, plot_training_loss_from_dict, plot_true_vs_pred_from_dict
from notebooks.time_for_crab.mlutils import score_combine, score_comparator, score_model

import keras

keras_backend = keras.backend.backend()
print(f'Keras version: {keras.__version__}')
print(f'Keras backend: {keras_backend}')
if keras_backend == 'tensorflow':
    import tensorflow as tf
    print(f'TensorFlow version: {tf.__version__}')
    print(f'TensorFlow devices: {tf.config.list_physical_devices()}')
elif keras_backend == 'torch':
    import torch
    print(f'Torch version: {torch.__version__}')
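    # get_device_name() raises when no CUDA device is present, so check first
    if torch.cuda.is_available():
        print(f'Torch devices: {torch.cuda.get_device_name(torch.cuda.current_device())}')
    else:
        print('Torch devices: CPU only')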
    # torch supports windows-native cuda, but CPU was faster for this task
elif keras_backend == 'jax':
    import jax
    print(f'JAX version: {jax.__version__}')
    print(f'JAX devices: {jax.devices()}')
else:
    print('Unknown backend; Proceed with caution.')

import numpy as np
import pandas as pd

from typing import Generator

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

pd.set_option('mode.copy_on_write', True)


### Load Data from Cache

In the [feature importance section](../2-features/features.ipynb), we saved the crabs' lives by removing the features that killed them.


In [4]:
%%time
crabs = pd.read_feather(CACHE_FILE)
crabs_test = pd.read_feather(CACHE_FILE.replace('.feather', '_test.feather'))

display_df(crabs, show_distinct=True)

# split features from target
X_train = crabs.drop([PREDICTION_TARGET], axis=1)
y_train = crabs[PREDICTION_TARGET]

X_test = crabs_test.drop([PREDICTION_TARGET], axis=1)
y_test = crabs_test[PREDICTION_TARGET]

print(f'X_train: {X_train.shape}')
print(f'X_test: {X_test.shape}')


DataFrame shape: (3114, 11)
First 5 rows:
        Length  Diameter    Height    Weight  Shucked Weight  Viscera Weight  \
1698  0.500977  0.394531 -0.725586 -0.199707       -0.126953       -0.445801   
1361  0.743164  0.713867 -0.645996  0.489258        0.507812       -0.045898   
1972  0.013672 -0.025391 -0.787598 -0.706543       -0.755859       -0.750977   
960   0.163086  0.126953 -0.813965 -0.537109       -0.616211       -0.527344   
2639  0.716797  0.748047 -0.690430  0.099609       -0.041504       -0.026367   

      Shell Weight  Sex_F  Sex_I  Sex_M  Age  
1698     -0.362305  False  False   True    8  
1361      0.100586  False  False   True   12  
1972     -0.701660   True  False  False    9  
960      -0.579102  False  False   True   11  
2639      0.159180   True  False  False   15  
<class 'pandas.core.frame.DataFrame'>
Index: 3114 entries, 1698 to 645
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 

## Regression Model Evaluation

### Compute the ROC Curve

An ROC curve is defined for binary classifiers, but this is a regression problem, so plotting one takes some extra work.

First, we need to convert our predictions into True/False. Since we are predicting a crab's age, a threshold of 2 years is a reasonable starting point: if a prediction is within 2 years of the actual age, we'll count it as a success.
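Whether 2 years is a sensible cutoff depends on how the errors are spread. One way to check, sketched here under assumptions, is to sweep several tolerances and see what fraction of predictions falls inside each. The helper name `hit_rate_by_tolerance`, the inclusive comparison (`<=`), and the toy ages are all illustrative; none of them come from the notebook or from the crab model's results.

```python
import numpy as np

def hit_rate_by_tolerance(y_true, y_pred, tolerances):
    """Fraction of predictions within each tolerance (in years) of the actual age.

    Hypothetical helper for illustration; "within" is treated as inclusive here.
    """
    abs_err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    return {float(tol): float(np.mean(abs_err <= tol)) for tol in tolerances}

# Made-up ages for illustration only; not output from the crab model.
toy_true = [8, 12, 9, 11, 15]
toy_pred = [9.5, 11.0, 12.5, 10.0, 10.0]

rates = hit_rate_by_tolerance(toy_true, toy_pred, tolerances=[1, 2, 3, 4])
for tol, rate in rates.items():
    print(f'within {tol:.0f} years: {rate:.0%}')
```

If the success rate barely moves between neighboring tolerances, the exact threshold matters little; if it jumps sharply, the choice deserves more thought.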

#### Convert Predictions to True/False

**Threshold:** 2 years
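
As a minimal sketch of the conversion, assuming the actual and predicted ages are available as array-likes, the absolute difference can be compared against the threshold. The function name `within_threshold` and the toy values are hypothetical. Both inputs are flattened first because Keras `predict` typically returns a column of shape `(n, 1)`, which would otherwise broadcast against a 1-D target into an `(n, n)` matrix.

```python
import numpy as np

THRESHOLD_YEARS = 2  # the threshold stated above

def within_threshold(y_true, y_pred, threshold=THRESHOLD_YEARS):
    """Return a boolean array: True where |prediction - actual| <= threshold.

    Hypothetical helper; inputs are flattened so an (n, 1) prediction column
    lines up element-wise with an (n,) target.
    """
    actual = np.asarray(y_true, dtype=float).ravel()
    predicted = np.asarray(y_pred, dtype=float).ravel()
    return np.abs(predicted - actual) <= threshold

# Made-up ages for illustration only; not output from the crab model.
toy_true = np.array([8, 12, 9, 11, 15])
toy_pred = np.array([[9.5], [11.0], [12.5], [10.0], [10.0]])  # shaped like predict() output

success = within_threshold(toy_true, toy_pred)
print(success)
print(f'{success.mean():.0%} of predictions within {THRESHOLD_YEARS} years')
```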


In [None]:
%%time
## Convert Predictions to True/False based on threshold of 2 years
# get absolute difference between prediction and actual
