# AI for Trading | Module 7 | L17: Model Testing and Evaluation

## 1. Intro
- https://youtu.be/4C4PuJANIdE
- How well is my model doing?
- How do we improve the model based on its metrics?


## 2. Outline
- https://youtu.be/mIgABrjJVBY
- ![image.png](attachment:37f788ca-fa1c-4a43-971d-cbbf8c47d09e.png)
- We'll be learning the Measurement Tools


## 3. Testing your models
- https://youtu.be/gmxGRJSKEb0
- Regression and Classification
- Regression: Predicts a value
- Classification: Aims to determine a state
  - For example, a positive (+) or negative (-) label, as shown in the figure below
- ![image.png](attachment:b8c926c3-48ca-47a4-9098-c401c82d2bdf.png)
- How do we find a model that generalizes well?
- Training set and Testing set
  - ![image.png](attachment:cf05b232-1b23-47a5-972d-c26635fe6685.png)
- ![image.png](attachment:2fda8128-bc1e-44aa-9ead-a76af4df74bb.png)
  - Thou shalt never use your testing data for training


In [None]:
## 3. Testing your models
# Import statements 
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

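# Import the train test split
# Legacy docs: http://scikit-learn.org/0.16/modules/generated/sklearn.cross_validation.train_test_split.html
# sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection instead.
from sklearn.model_selection import train_test_split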


# Read in the data.
data = np.asarray(pd.read_csv('data/c2-m7-l17-model-testing-and-evaluation.csv', header=None))
# Assign the features to the variable X, and the labels to the variable y. 
X = data[:,0:2]
y = data[:,2]

# Use train test split to split your data 
# Use a test size of 25% and a random state of 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Instantiate your decision tree model
model = DecisionTreeClassifier()

# TODO: Fit the model to the training data.
model.fit(X_train, y_train)

# TODO: Make predictions on the test data
y_pred = model.predict(X_test)

# TODO: Calculate the accuracy and assign it to the variable acc on the test data.
acc = accuracy_score(y_test, y_pred)

## 4. Confusion Matrix
- https://youtu.be/9GLNjmMUB_4
- ![image.png](attachment:d226f8d3-ab2f-4181-88a3-7b84580d032c.png)
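
The four cells of a confusion matrix can also be counted in code. The following is a minimal sketch using scikit-learn's `confusion_matrix`; the labels `y_true` and `y_pred` are made up for illustration and are not the quiz data.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# With labels ordered [0, 1], scikit-learn lays the matrix out as
# [[TN, FP],
#  [FN, TP]], so ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```

Note that scikit-learn puts the negative class in the first row and column, which may differ from the layout drawn in the video.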

### Quiz: Confusion Matrix
![image.png](attachment:419e8497-7d62-45be-a07d-655fde0f2e0d.png)

![image.png](attachment:284cebdd-2386-4dc0-88cb-daa17e1176cc.png)
- True Positives, True Negatives, False Positives, and False Negatives
- 6, 5, 2, 1


## 5. Confusion Matrix 2
This section explains the solution: 6, 5, 1, 2
- https://youtu.be/ywwSzyU9rYs
- ![image.png](attachment:fd67efee-23e7-411b-a7de-eeabe65fa12d.png)


## 6. Accuracy
- https://youtu.be/s6SfhPTNOHA
- ![image.png](attachment:6c8b439f-fc92-4576-96d8-f91b37143c78.png)
- ![image.png](attachment:3a6b91a7-804c-4f0f-8ddb-268f253dee48.png)

### Quiz
![image.png](attachment:cb557fb5-aced-4ccf-a65b-e97747b6b0be.png)

See next cell for calculations.


## 7. Accuracy 2
- https://youtu.be/ueYCLfd_aNQ

In [5]:
#### 6. Accuracy | Quiz Calculations
true_positives = 6
true_negatives = 5
false_positives = 1
false_negatives = 2
total = true_positives + true_negatives + false_positives + false_negatives

print(((true_positives + true_negatives) / (total)) * 100)


78.57142857142857


## 8. When accuracy won't work
- https://youtu.be/r0-O-gIDXZ0
- The denominator for accuracy in the video should actually be 284,807 instead of 284,887.
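
To see why accuracy can mislead on heavily imbalanced data, consider a "model" that labels every transaction as legitimate. The sketch below uses the total of 284,807 from the correction above; the fraud count `n_fraud` is an assumption, since these notes do not state it.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

total = 284_807   # total transactions, from the correction above
n_fraud = 472     # ASSUMPTION: number of fraudulent transactions (not stated in these notes)

y_true = np.zeros(total, dtype=int)
y_true[:n_fraud] = 1   # 1 = fraudulent, 0 = legitimate

# A trivial "model" that predicts "legitimate" for every transaction.
y_pred = np.zeros(total, dtype=int)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
print(f"accuracy = {acc:.4%}, recall = {rec:.0%}")
```

Under this assumption the accuracy is above 99.8% even though the model catches no fraud at all, which is why the following sections introduce precision and recall.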


## 9. False Negatives and Positives
- https://youtu.be/_ytP9zIkziw

### Quiz 1: The Medical Model
- ![image.png](attachment:7f6b0a6c-42ae-4ad0-8e45-1ab9e6b78192.png)

Medical
- ![image.png](attachment:0de2bd59-9ab6-403d-8f0c-5dd28d5ec203.png)
- Correct! A False Positive implies sending a healthy person to get more tests. This is slightly inconvenient, but ok. A False Negative implies sending a sick person home, which can be disastrous!

Spam
- ![image.png](attachment:4c89822c-5edd-4a50-85cd-4b2c9fd2d2a5.png)
- ![image.png](attachment:a48f8de6-2c80-49e6-952a-63f4d8277f3d.png)
- Correct! A False Negative implies a spam message will make its way into your inbox. This is slightly inconvenient, but ok. A False Positive implies missing an e-mail from your dear grandma, which can be disastrous!


## 10. Precision and Recall
- https://youtu.be/KOytJL1lvgg
- ![image.png](attachment:44695d52-e811-4baa-8649-625a151e354d.png)
- Corrections: At 0:11, "false negative" and "false positive" are swapped. In this case, a false negative is much worse than a false positive, since predicting that a sick person is healthy is much more dangerous than predicting that a healthy person is sick.

## 11. Precision
- https://youtu.be/q2wVorBfefU

### Quiz
![image.png](attachment:d57e1de5-fe1f-4ee2-99b1-7ad893818689.png)

In this image, the blue points are labelled positive and the red points are labelled negative. The points above the line are predicted to be positive, and the points below the line are predicted to be negative.
- ![image.png](attachment:a08f4278-46ca-4e87-98d2-b6a00deff3ea.png)
- ![image.png](attachment:a42e40b3-5170-4065-9db0-7818336632c6.png)

In [7]:
#### 11. Precision | Quiz
true_positives = 6
true_negatives = 5
false_positives = 1
false_negatives = 2
total = true_positives + true_negatives + false_positives + false_negatives

#print(((true_positives + true_negatives) / (total)) * 100)
print(total)

14
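
The cell above stops at printing the total count. As a minimal sketch, precision can be computed as TP / (TP + FP). The counts are copied from the cell above; these notes list the false positive and false negative counts in two different orders, so check them against the quiz image before relying on the result.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)

# Counts copied from the cell above; verify them against the quiz image.
true_positives = 6
false_positives = 1

prec = precision(true_positives, false_positives)
print(f"precision = {prec * 100:.2f}%")
```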


## 12. Recall
- https://youtu.be/0n5wUZiefkQ
- ![image.png](attachment:3c7b9bac-6292-4cef-b43d-6cd0f0a65690.png)
- ![image.png](attachment:7dfd3212-891d-461c-9eaa-b6e078ed8bd2.png)

### Quiz
![image.png](attachment:8dcfb027-b320-42c1-a2aa-9df98e0f6cc7.png)
![image.png](attachment:46d49a4a-a63e-4c76-a66b-7819daa3e416.png)
![image.png](attachment:25b68864-d9b1-4ad6-9aec-8179a22a4d79.png)
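
Recall is TP / (TP + FN): of all the points that are truly positive, the fraction the model found. The sketch below reuses the counts from the notebook cell in section 11; treat them as placeholders and check them against the quiz image.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model predicted as positive."""
    return tp / (tp + fn)

# Counts copied from the section 11 cell; verify them against the quiz image.
true_positives = 6
false_negatives = 2

rec = recall(true_positives, false_negatives)
print(f"recall = {rec * 100:.2f}%")
```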


## 13. Types of Errors
- https://youtu.be/Twf1qnPZeSY
- 4:01 ![image.png](attachment:fe1f6e16-acf1-4b6f-b340-44b171c7b347.png)


## 14. Model Complexity Graph
- https://youtu.be/YS5OQCA5cLY
- ![image.png](attachment:b2a0d074-b896-43f8-a71c-57e9a28f0a6d.png)
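
A model complexity graph can be approximated in code by sweeping one complexity parameter and recording training and validation error. This is a rough sketch, assuming synthetic data from `make_classification` and a decision tree whose `max_depth` stands in for "complexity"; neither choice comes from the video.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=400, n_features=10,
                           n_informative=4, random_state=0)

depths = np.arange(1, 11)
train_scores, valid_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths,
    cv=5, scoring="accuracy",
)

# Convert mean accuracy per depth into error, as in the graph.
train_err = 1 - train_scores.mean(axis=1)
valid_err = 1 - valid_scores.mean(axis=1)

for d, tr, va in zip(depths, train_err, valid_err):
    print(f"max_depth={d:2d}  train error={tr:.3f}  validation error={va:.3f}")
```

Training error typically keeps falling as depth grows, while validation error tends to flatten or rise once the model starts to overfit.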

### Quiz
![image.png](attachment:197ff140-df82-42a9-b39c-08c7d6e5fae3.png)
![image.png](attachment:d514e1f6-1716-48f7-af9b-b8a74d28b487.png)


## 15. Cross Validation
- https://youtu.be/5pWHGkNyRhA
- ![image.png](attachment:ec093ff0-92d3-4cf3-b363-a7e3f67c5f5c.png)


## 16. K-Fold Cross Validation
- https://youtu.be/9W6o6eWGi-0
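
As a minimal sketch of K-fold cross validation, scikit-learn's `KFold` splits the data into k folds and uses each fold once as the test set. The 12 samples and the choice of 4 folds are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold

# Twelve hypothetical samples with a single feature.
X = np.arange(12).reshape(12, 1)

kf = KFold(n_splits=4, shuffle=True, random_state=0)

test_seen = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
    test_seen.extend(test_idx.tolist())

# Every sample lands in exactly one test fold.
print(sorted(test_seen) == list(range(12)))
```

Shuffling is reasonable for independent samples, but not for time-ordered data, where the ordering must be preserved.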


## 17. Cross Validation for Time Series
![image.png](attachment:4f0a0460-9a88-4bcc-8ad9-a4487c44e4d2.png)
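
One way to cross validate time-ordered data is scikit-learn's `TimeSeriesSplit`, which always trains on earlier observations and tests on later ones. This is a sketch on eight hypothetical observations; the exact windowing may differ from the scheme in the figure above.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Eight hypothetical observations, already sorted from oldest to newest.
X = np.arange(8).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))

for fold, (train_idx, test_idx) in enumerate(splits):
    # Each training window ends before its test window begins.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```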


## 18. Validation for Financial Data
When working with financial data, we can bring practitioners' knowledge of markets to bear on our validation procedures. Because markets are competitive, factors decay over time: signals that worked well in the past may no longer work well today. For this reason, we should generally test and validate on the most recent data available, since testing on the recent past can be considered the most demanding test.

The design of a model may cause it to perform better or worse in different market regimes, so the most recent period may not be a regime in which the model performs well. Even so, we generally prefer to test on the most recent data, because that period is the most similar to the present. In practice, before investing a lot of money in a strategy, we would let time elapse without changing the model and evaluate it on this truly out-of-sample data, a practice known as "paper trading".

In summary, the most common practice is to **keep a block of data from the most recent time period as your *test* set**.

The data are then split into training, validation, and test sets according to the following schematic:

![train-valid-test-time-2.png](attachment:dc5afa45-9e6c-4d70-8be3-263d064850fb.png)
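
The chronological split can be sketched with pandas as follows. The dates, the `signal` column, and the 60/20/20 proportions are assumptions chosen for illustration; the schematic does not prescribe specific fractions.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data indexed by business day, oldest first.
dates = pd.bdate_range("2020-01-01", periods=100)
df = pd.DataFrame({"signal": np.arange(100)}, index=dates)

# ASSUMPTION: 60% train, 20% validation, 20% test, in time order.
n = len(df)
train_end = int(n * 0.6)
valid_end = int(n * 0.8)

train = df.iloc[:train_end]
valid = df.iloc[train_end:valid_end]
test = df.iloc[valid_end:]   # the most recent block is held out for testing

print(train.index.max(), valid.index.max(), test.index.max())
```

No shuffling is applied, so every training date precedes every validation date, which in turn precedes every test date.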

When working with data that are indexed by asset and day, it's important that all assets' data for a given day end up in the same set. Splitting a single day across sets introduces a subtle form of lookahead bias. For example, say data from Coca-Cola and Pepsi for the same day ended up in different sets. Since they are very similar companies, one might expect their share price trends to be correlated. If the model were trained on data from one company and then validated on data from the other, it might "learn" about a price movement that affects both companies, and therefore show artificially inflated performance on the validation set.
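
A minimal sketch of splitting a (date, asset) panel by date rather than by row is shown below. The tickers, dates, `ret` column, and cutoff position are hypothetical.

```python
import pandas as pd

# Hypothetical panel: six business days for two assets.
dates = pd.bdate_range("2021-01-04", periods=6)
tickers = ["KO", "PEP"]
index = pd.MultiIndex.from_product([dates, tickers], names=["date", "asset"])
panel = pd.DataFrame({"ret": range(len(index))}, index=index)

# Choose the cutoff from the unique dates, never from row positions,
# so that all assets for a given day fall into the same set.
row_dates = panel.index.get_level_values("date")
unique_dates = row_dates.unique()
cutoff = unique_dates[4]   # hypothetical: first four dates go to training

is_train = row_dates < cutoff
train, valid = panel[is_train], panel[~is_train]

# No date should appear in both sets.
shared = set(train.index.get_level_values("date")) & set(valid.index.get_level_values("date"))
print(len(train), len(valid), len(shared))
```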


## 19. Learning Curves
- https://youtu.be/ZNhnNVKl8NM
  - Correction: In the first plots, the horizontal axis is labeled "Degree", and it should be labeled "Number of Training Points". At 4:10: The graph represents High Variance to the left side of the video instead of High Bias.
- ![image.png](attachment:e1ef0ec1-0f1a-4b85-9856-00066c3aee77.png)


## 20. Detecting Overfitting and Underfitting with Learning Curves
![image.png](attachment:cfba8ccd-17b2-453d-96a5-bc1b19894d30.png)
![image.png](attachment:2f301714-9de8-4b43-98d2-8f12d0123ffe.png)

### Part 2: Analyzing the learning curves
For this second part of the quiz, you can look at the curves you've drawn before, to decide which one of the three models underfits, which one overfits, and which one is just right.

![image.png](attachment:2f36b984-1e42-4b48-9efd-5bf7b9097a31.png)


## 21. Solution: Detecting Overfitting and Underfitting
![image.png](attachment:1fc25cd6-09ba-4f70-afbe-ba54e91235be.png)
![image.png](attachment:5ba8d95a-824c-4be6-a90e-b56d17ade3fa.png)

In [None]:
## 20. Detecting Overfitting and Underfitting with Learning Curves

# Import, read, and split data
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import pandas as pd
data = pd.read_csv('data/c2-m7-l17-c20.csv')
import numpy as np
X = np.array(data[['x1', 'x2']])
y = np.array(data['y'])

# Fix random seed
np.random.seed(55)

### Imports
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC

# TODO: Uncomment one of the three classifiers, and hit "Test Run"
# to see the learning curve. Use these to answer the quiz below.

### Logistic Regression
estimator = LogisticRegression()

### Gradient Boosting (an ensemble of decision trees)
#estimator = GradientBoostingClassifier()

### Support Vector Machine
#estimator = SVC(kernel='rbf', gamma=1000)
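
The cell above only selects an estimator; the course's plotting helper is not included in these notes. As a rough stand-in, the sketch below computes learning curve scores with scikit-learn's `learning_curve`. It assumes synthetic data from `make_classification` in place of the CSV, and 5-fold cross validation, neither of which comes from the original exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# ASSUMPTION: synthetic two-feature data standing in for the course CSV.
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=55)

estimator = LogisticRegression()

# Score the estimator on growing fractions of the training data.
train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 10),
    shuffle=True, random_state=55,
)

train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)

for n, tr, te in zip(train_sizes, train_mean, test_mean):
    print(f"n={n:3d}  training score={tr:.3f}  cross-validation score={te:.3f}")
```

Swapping in `GradientBoostingClassifier()` or `SVC(kernel='rbf', gamma=1000)` for `estimator` gives the other two curves to compare. A persistent gap between the two scores suggests overfitting, while two low scores that converge suggest underfitting.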