# Evaluation Metrics

In [1]:
%cd ..

/Users/alejandro.jimenez/Documents/reference_repos/titanic-ml-model


In [None]:
!python -m titanic_ml_model train --train_data ./data/raw/train.csv

In [2]:
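import ast

from IPython.display import display, Markdown

# Read the metrics logged by the most recent training run (last line of the log file).
with open("./logs/logs.txt", "r") as file:
    last_exec = file.readlines()[-1]

# Parse the logged line safely. This assumes it is a plain Python dict literal;
# unlike eval, ast.literal_eval does not execute arbitrary code.
logged_metrics = ast.literal_eval(last_exec.strip())

display(Markdown("# Model Performance Results!"))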
display(Markdown(f"""
<h5>
The model achieved the following on the test data (20%)

 - Accuracy of **{logged_metrics['accuracy']:.2%}**

 - F1 score of **{logged_metrics['f1_score']:.2%}**

 - Precision of **{logged_metrics['precision']:.2%}**

 - Recall of **{logged_metrics['recall']:.2%}**

Based on the Challenge leaderboard as of May 31st, 2023, this would place us within the top 185 of 15860.

The F1 score, the harmonic mean of precision and recall, is fairly high at **{logged_metrics['f1_score']:.2%}**.

Precision and recall are close to each other, which suggests a **well-balanced and robust model**, a result of the cross-validation and fine-tuning steps taken.

</h5>
"""))


# Model Performance Results!


<h5>
The model achieved the following on the test data (20%)

 - Accuracy of **84.92%**

 - F1 score of **81.38%**

 - Precision of **83.10%**

 - Recall of **79.73%**

Based on the Challenge leaderboard as of May 31st, 2023, this would place us within the top 185 of 15860.

The F1 score, the harmonic mean of precision and recall, is fairly high at **81.38%**.

Precision and recall are close to each other, which suggests a **well-balanced and robust model**, a result of the cross-validation and fine-tuning steps taken.

</h5>
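
As a quick sanity check, the reported F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is only illustrative: the helper name `f1_from_precision_recall` is hypothetical and not part of the package, and the inputs are the rounded percentages shown above.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the results printed above.
f1 = f1_from_precision_recall(0.8310, 0.7973)
print(f"{f1:.2%}")  # 81.38%
```

The result matches the logged F1 score, which confirms that the three reported values are mutually consistent.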


# Feature Importance

The SHAP values give us a sense of the importance of each feature in the model.

The larger the **absolute SHAP value**, the more impact that feature has on the model's output; the sign indicates the direction in which the feature pushes the prediction.

From the SHAP values, the most influential features appear to be 'fare', 'pclass', 'name_title', and 'age'.

This suggests that the following are particularly important in predicting survival on the Titanic:
 - **Fare paid by a passenger**
 - **Passenger class**
 - **Title** (Mr, Miss, Master, etc., which can indicate social status and also encodes sex)
 - **Age**

The SHAP summary plot is shown below:

![Shap Summary Plot Feature Importance](../logs/shap_summary_plot.png)
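
A common way to turn per-sample SHAP values into a single importance score per feature is to average their absolute values. The sketch below illustrates that idea in plain Python. The feature names (`feat_a`, `feat_b`, `feat_c`) and all numbers are made-up placeholders, not values from this model, and `mean_abs_importance` is a hypothetical helper rather than part of the `shap` library.

```python
# Toy SHAP values: one row per sample, one column per feature.
# These numbers are invented for illustration only.
feature_names = ["feat_a", "feat_b", "feat_c"]
shap_rows = [
    [0.30, -0.20, 0.45],
    [-0.10, 0.30, -0.50],
    [0.20, -0.25, 0.40],
]


def mean_abs_importance(rows, names):
    """Rank features by mean absolute SHAP value, largest first."""
    n = len(rows)
    scores = {
        name: sum(abs(row[j]) for row in rows) / n
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = mean_abs_importance(shap_rows, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values matters here: a feature whose SHAP values are large but alternate in sign would otherwise average out to something close to zero.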

# Deployment Steps

There are several ways to put the current model into production.

Here we outline the steps for a **microservice-type architecture**, so that the model can be consumed through an API by any client (web page, app, etc.).

We chose this architecture because of the kind of model this package involves: there is no need for batch predictions to be saved weekly, since the sinking is a past event, but people might want to explore their chances of survival by playing with the input data through an app or web page.


1) **Dockerization**: Create a Dockerfile that describes the environment in which the ML model runs. This lets us build versioned Docker images that can later be pushed to a registry such as Amazon Elastic Container Registry (ECR).

2) **Uploading it to a registry**: Push the Docker image to a registry such as Amazon ECR.

3) **Model Deployment (HTTP endpoint)**: This can be achieved by:
    - Modifying this package to work as a Flask RESTful API, and then using Amazon Elastic Container Service (ECS) for orchestration. **(Microservice Architecture)**
    - Using a managed alternative such as Amazon SageMaker, which can serve Docker images stored in ECR (including a serverless inference option) and provides an endpoint for requests.

4) **Scaling**: Amazon API Gateway can be used to create a scalable API in front of a model hosted in Amazon SageMaker. Otherwise, a load balancer is worth considering if a large number of requests is expected.

5) **CI/CD**: For continuous updates and deployments, consider a pipeline built with a tool such as Jenkins. AWS also offers services that help automate the steps above, such as AWS CodePipeline (for continuous delivery).

6) **Monitoring**: It is key to monitor the ML model's performance and usage. AWS provides services such as Amazon CloudWatch and AWS X-Ray (which I have yet to explore). Alternatively, the package itself could save metrics at each step and a BI tool could visualize performance (this is the current plan for us in SP&A: metric monitoring and A/B monitoring for model use cases).

7) **Security**: This may be optional, depending on whether the application or endpoint requires some form of authentication. AWS services may help with this, but I do not have full clarity on them yet.
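
To make the HTTP endpoint from step 3 more concrete, here is a minimal, framework-agnostic sketch of the request handling such an endpoint might perform. Everything in it is an assumption made for illustration: the input fields in `REQUIRED_FIELDS`, the `StubModel` class with its placeholder rule, and the `handle_predict` function are hypothetical and do not reflect the package's actual interface.

```python
import json

REQUIRED_FIELDS = ("pclass", "age", "fare")  # hypothetical input schema


class StubModel:
    """Stand-in for the trained model; the rule below is a placeholder."""

    def predict(self, features):
        return 1 if features["pclass"] == 1 else 0


def handle_predict(body, model):
    """Validate a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        return 400, {"error": f"missing fields: {missing}"}
    features = {f: payload[f] for f in REQUIRED_FIELDS}
    return 200, {"survived": model.predict(features)}


status, response = handle_predict(
    '{"pclass": 1, "age": 29, "fare": 80.0}', StubModel()
)
print(status, response)
```

In a Flask-based service, a route function would pass the raw request body to a handler like this and return the resulting dictionary as JSON. Keeping the validation separate from the web framework makes it easy to unit test without starting a server.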
