# AI for Trading | Module 7 | L17: Model Testing and Evaluation

## 1. Intro
- https://youtu.be/4C4PuJANIdE
- How well is my model doing?
- How do we improve the model based on its metrics?


## 2. Outline
- https://youtu.be/mIgABrjJVBY
- ![image.png](attachment:37f788ca-fa1c-4a43-971d-cbbf8c47d09e.png)
- This lesson covers the measurement tools used to evaluate models


## 3. Testing your models
- https://youtu.be/gmxGRJSKEb0
- Two kinds of models: regression and classification
- Regression: predicts a numeric value
- Classification: predicts a state (a class)
  - For example, a positive (+) or negative (-) label, as shown in the following image
- ![image.png](attachment:b8c926c3-48ca-47a4-9098-c401c82d2bdf.png)
- How do we find a model that generalizes well?
- Training set and Testing set
  - ![image.png](attachment:cf05b232-1b23-47a5-972d-c26635fe6685.png)
- ![image.png](attachment:2fda8128-bc1e-44aa-9ead-a76af4df74bb.png)
  - Thou shalt never use your testing data for training


In [None]:
## 3. Testing your models
# Import statements 
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

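# Import train_test_split.
# It lives in sklearn.model_selection; the old sklearn.cross_validation module
# was removed in scikit-learn 0.20.
# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
from sklearn.model_selection import train_test_split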


# Read in the data.
data = np.asarray(pd.read_csv('data/c2-m7-l17-model-testing-and-evaluation.csv', header=None))
# Assign the features to the variable X, and the labels to the variable y. 
X = data[:,0:2]
y = data[:,2]

# Use train test split to split your data 
# Use a test size of 25% and a random state of 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Instantiate your decision tree model
model = DecisionTreeClassifier()

# TODO: Fit the model to the training data.
model.fit(X_train, y_train)

# TODO: Make predictions on the test data
y_pred = model.predict(X_test)

# TODO: Calculate the accuracy and assign it to the variable acc on the test data.
acc = accuracy_score(y_test, y_pred)

## 4. Confusion Matrix
- https://youtu.be/9GLNjmMUB_4
- ![image.png](attachment:d226f8d3-ab2f-4181-88a3-7b84580d032c.png)
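
As a rough sketch of how the four cells of a confusion matrix are counted, the snippet below compares true labels with predicted labels pair by pair. The labels are made-up toy values (not the quiz data), with 1 meaning positive and 0 meaning negative; the variable names are illustrative only.

```python
# Hypothetical toy labels (not the quiz data): 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

pairs = list(zip(y_true, y_pred))

# Each cell counts one combination of (actual label, predicted label).
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actual positive, predicted positive
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # actual negative, predicted negative
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # actual negative, predicted positive
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actual positive, predicted negative

print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```

The four counts always add up to the number of examples, which is a quick sanity check when filling in a confusion matrix by hand.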

### Quiz: Confusion Matrix
![image.png](attachment:419e8497-7d62-45be-a07d-655fde0f2e0d.png)

![image.png](attachment:284cebdd-2386-4dc0-88cb-daa17e1176cc.png)
- True Positives, True Negatives, False Positives, and False Negatives
- 6, 5, 2, 1


## 5. Confusion Matrix 2
This section explains the solution: 6, 5, 1, 2
- https://youtu.be/ywwSzyU9rYs
- ![image.png](attachment:fd67efee-23e7-411b-a7de-eeabe65fa12d.png)


## 6. Accuracy
- https://youtu.be/s6SfhPTNOHA
- ![image.png](attachment:6c8b439f-fc92-4576-96d8-f91b37143c78.png)
- ![image.png](attachment:3a6b91a7-804c-4f0f-8ddb-268f253dee48.png)

### Quiz
![image.png](attachment:cb557fb5-aced-4ccf-a65b-e97747b6b0be.png)

See next cell for calculations.


## 7. Accuracy 2
- https://youtu.be/ueYCLfd_aNQ

In [5]:
#### 6. Accuracy | Quiz Calculations
true_positives = 6
true_negatives = 5
false_positives = 1
false_negatives = 2
total = true_positives + true_negatives + false_positives + false_negatives

print(((true_positives + true_negatives) / (total)) * 100)


78.57142857142857


## 8. When accuracy won't work
- https://youtu.be/r0-O-gIDXZ0
- The denominator for accuracy in the video should actually be 284,807 instead of 284,887.
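
To illustrate why accuracy can mislead on heavily imbalanced data, here is a small sketch. The total of 284,807 is the corrected denominator noted above; the figure of 472 positive cases and the "always predict negative" model are assumptions made purely for illustration.

```python
# Total number of examples, per the corrected denominator above.
total = 284_807
# Assumed number of positive cases; an illustrative figure only.
positives = 472
negatives = total - positives

# A trivial model that labels every example as negative.
true_positives = 0
false_positives = 0
true_negatives = negatives
false_negatives = positives

accuracy = (true_positives + true_negatives) / total
# Fraction of the actual positive cases that the model catches.
caught_fraction = true_positives / positives

print(f"accuracy        = {accuracy:.2%}")
print(f"positives found = {caught_fraction:.2%}")
```

Under these assumptions the model scores well above 99% accuracy while finding none of the positive cases, which is why accuracy alone is not enough here.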


## 9. False Negatives and Positives
- https://youtu.be/_ytP9zIkziw

### Quiz 1: The Medical Model
- ![image.png](attachment:7f6b0a6c-42ae-4ad0-8e45-1ab9e6b78192.png)

Medical
- ![image.png](attachment:0de2bd59-9ab6-403d-8f0c-5dd28d5ec203.png)
- Correct! A False Positive implies sending a healthy person to get more tests. This is slightly inconvenient, but ok. A False Negative implies sending a sick person home, which can be disastrous!

Spam
- ![image.png](attachment:4c89822c-5edd-4a50-85cd-4b2c9fd2d2a5.png)
- ![image.png](attachment:a48f8de6-2c80-49e6-952a-63f4d8277f3d.png)
- Correct! A False Negative implies a spam message will make its way into your inbox. This is slightly inconvenient, but ok. A False Positive implies missing an e-mail from your dear grandma, which can be disastrous!


## 10. Precision and Recall
- https://youtu.be/KOytJL1lvgg
- ![image.png](attachment:44695d52-e811-4baa-8649-625a151e354d.png)
- Corrections: At 0:11, "false negative" and "false positive" are swapped. In this case, a false negative is much worse than a false positive, since predicting that a sick person is healthy is much more dangerous than predicting that a healthy person is sick.
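
A minimal sketch of the two metrics, using their standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The function names and the counts below are hypothetical, chosen only for illustration, and are not taken from the quizzes.

```python
def precision(tp: int, fp: int) -> float:
    """Of everything predicted positive, the fraction that is truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Of everything truly positive, the fraction that was predicted positive."""
    return tp / (tp + fn)


# Hypothetical counts, for illustration only.
tp, fp, fn = 8, 2, 4

print(f"precision = {precision(tp, fp):.3f}")  # 8 / (8 + 2)
print(f"recall    = {recall(tp, fn):.3f}")     # 8 / (8 + 4)
```

False positives lower precision and false negatives lower recall. That is why the medical model, where false negatives are the costly error, calls for high recall, while the spam model, where false positives are the costly error, calls for high precision.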

## 11. Precision
- https://youtu.be/q2wVorBfefU

### Quiz
![image.png](attachment:d57e1de5-fe1f-4ee2-99b1-7ad893818689.png)

In this image, the blue points are labelled positive, and the red points are labelled negative. The points above the line are predicted to be positive, and the points below the line are predicted to be negative.
- ![image.png](attachment:a08f4278-46ca-4e87-98d2-b6a00deff3ea.png)
- ![image.png](attachment:a42e40b3-5170-4065-9db0-7818336632c6.png)

In [7]:
#### 11. Precision | Quiz
true_positives = 6
true_negatives = 5
false_positives = 1
false_negatives = 2
total = true_positives + true_negatives + false_positives + false_negatives

#print(((true_positives + true_negatives) / (total)) * 100)
print(total)

14


## 12. Recall
- https://youtu.be/0n5wUZiefkQ
- ![image.png](attachment:3c7b9bac-6292-4cef-b43d-6cd0f0a65690.png)
- ![image.png](attachment:7dfd3212-891d-461c-9eaa-b6e078ed8bd2.png)

### Quiz
![image.png](attachment:8dcfb027-b320-42c1-a2aa-9df98e0f6cc7.png)
![image.png](attachment:46d49a4a-a63e-4c76-a66b-7819daa3e416.png)
![image.png](attachment:25b68864-d9b1-4ad6-9aec-8179a22a4d79.png)


## 13. Types of Errors
- https://youtu.be/Twf1qnPZeSY
- 4:01 ![image.png](attachment:fe1f6e16-acf1-4b6f-b340-44b171c7b347.png)


## 14. Model Complexity Graph
