# Model hyper-parameter tuning & evaluation

## Abstract

This notebook analyses the hyperparameter-tuned model. It runs TensorFlow Model Analysis (TFMA) on the results produced by the Evaluator component of the pipeline.

TFMA needs notebook extensions to render its visualizations.
<br>To enable them on Vertex AI Workbench, switch from the standard JupyterLab interface to the classic Jupyter Notebook.
<br>To do so, go to `Help > Launch Classic Notebook`.

<br>The rendering cell must be re-run every time we want to visualize the metrics. In JupyterLab, you might also need to reinstall the extensions each time.

<br>The TFMA visuals are also saved as HTML in the [reports](../reports/tfma_export.html) directory.

In [None]:
%%bash

# Install extensions in terminal to view outputs
jupyter labextension install tensorflow_model_analysis@0.27.0
jupyter nbextension enable --py widgetsnbextension
jupyter nbextension enable --py tensorflow_model_analysis

In [None]:
mkdir -p eval_data

In [None]:
!gsutil cp -r gs://aliz-ml-spec-2022/demo-1/pipeline_root/taxi-vertex-pipelines/53911330556/taxi-vertex-pipelines-20220608212029/Evaluator_-8726911200832520192/evaluation eval_data/

In [1]:
import tensorflow_model_analysis as tfma

# Load the Evaluator output copied from GCS above
tfma_result = tfma.load_eval_result('eval_data/evaluation')

# Overall performance (no slicing)
overall_result_view = tfma.view.render_slicing_metrics(tfma_result)
# Performance per TripStartHour slice
slicing_metrics_view = tfma.view.render_slicing_metrics(tfma_result, slicing_column='TripStartHour')

Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`


In [3]:
from ipywidgets.embed import embed_minimal_html
from IPython.display import HTML, display

# Export both TFMA views to a standalone HTML report
tfma_output = '../reports/tfma_export.html'
embed_minimal_html(tfma_output, views=[overall_result_view, slicing_metrics_view], title='Taxi Model Performance')

# Inline display breaks the JupyterLab interface; use the Classic view or open the HTML file instead
with open(tfma_output, 'r') as view:
    html = view.read()
# display(HTML(html))

The TFMA results show model performance globally as well as across the defined data slices, in our case `TripStartHour`.

The overall MSE is about 20 (in squared dollars), which corresponds to an RMSE of roughly $4.4 per prediction, or about +/- 33% of the average trip fare. This is quite high and leaves room for further improvement.
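
The step from the MSE to a per-prediction error in dollars is a square root. The sketch below redoes that arithmetic with the rounded figures quoted above. It assumes the 33% figure is the RMSE divided by the mean fare, so the "implied average fare" it prints is backed out from the text, not measured from the data.

```python
import math

# Rounded figures quoted in the text (read off the TFMA view).
mse = 20.0             # overall mean squared error, in squared dollars
relative_error = 0.33  # error as a fraction of the average trip fare

# RMSE brings the error back to dollars.
rmse = math.sqrt(mse)

# Assumption: relative_error = RMSE / mean fare, so the mean fare can be
# backed out. This is a derived value, not a measurement.
implied_mean_fare = rmse / relative_error

print(f"RMSE: {rmse:.2f} USD")
print(f"Implied average fare: {implied_mean_fare:.2f} USD")
```

With an MSE of exactly 20 the RMSE comes out slightly above the quoted 4.4, which is consistent with the true MSE being a little under 20.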

Looking at performance by data slice, model errors are fairly even. The exception is the hour-14 slice (`TripStartHour` = 14), which contains only a few examples and therefore shows lower performance.
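
Sparse slices like this one can also be flagged programmatically instead of read off the widget. The sketch below is a minimal illustration, not part of the pipeline. It assumes the entries follow the layout of `tfma_result.slicing_metrics`, i.e. `(slice_key, {output_name: {sub_key: {metric_name: {'doubleValue': ...}}}})`, and that an `example_count` metric was computed. The helper names, the 25% threshold, and the toy counts are all hypothetical.

```python
import statistics


def example_counts_by_slice(slicing_metrics, column, metric="example_count"):
    """Return {slice_value: count} for single-column slices on `column`."""
    counts = {}
    for slice_key, metrics in slicing_metrics:
        # The overall (unsliced) entry has an empty slice key; skip it.
        if len(slice_key) != 1 or slice_key[0][0] != column:
            continue
        # Assumed nesting: output name -> sub key -> metric name -> value.
        counts[slice_key[0][1]] = metrics[""][""][metric]["doubleValue"]
    return counts


def sparse_slices(counts, min_fraction=0.25):
    """Return slice values whose count is below min_fraction of the median."""
    median = statistics.median(counts.values())
    return sorted(k for k, v in counts.items() if v < min_fraction * median)


def _entry(slice_key, count):
    # Builds one toy entry in the assumed layout (made-up numbers).
    return (slice_key, {"": {"": {"example_count": {"doubleValue": count}}}})


toy_metrics = [
    _entry((), 2010.0),
    _entry((("TripStartHour", 13),), 1000.0),
    _entry((("TripStartHour", 14),), 10.0),
    _entry((("TripStartHour", 15),), 1000.0),
]

counts = example_counts_by_slice(toy_metrics, "TripStartHour")
print(sparse_slices(counts))
```

On the real evaluation, `toy_metrics` would be replaced by `tfma_result.slicing_metrics`, provided the layout assumption holds for the installed TFMA version.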