<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Why-is-it-important-to-do-validation?" data-toc-modified-id="Why-is-it-important-to-do-validation?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Why is it important to do validation?</a></span><ul class="toc-item"><li><span><a href="#Bad-Ideas" data-toc-modified-id="Bad-Ideas-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Bad Ideas</a></span><ul class="toc-item"><li><span><a href="#Bad-Idea-#1" data-toc-modified-id="Bad-Idea-#1-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Bad Idea #1</a></span></li><li><span><a href="#Bad-Idea-#2---Better-but-not-great-(subtle)" data-toc-modified-id="Bad-Idea-#2---Better-but-not-great-(subtle)-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>Bad Idea #2 - Better but not great (subtle)</a></span></li><li><span><a href="#A-Solution" data-toc-modified-id="A-Solution-1.1.3"><span class="toc-item-num">1.1.3&nbsp;&nbsp;</span>A Solution</a></span></li></ul></li></ul></li><li><span><a href="#Holdout-Validation" data-toc-modified-id="Holdout-Validation-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Holdout Validation</a></span></li><li><span><a href="#Cross-Validation" data-toc-modified-id="Cross-Validation-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Cross-Validation</a></span></li></ul></div>

# Why is it important to do validation?

Consider a class of students.

- As the teacher, what would you do to ensure they "learn" the material for the final test?
- How would you test their knowledge?

## Bad Ideas

There are plenty of "bad ideas" that would leave students poorly prepared for the final test. Here are a couple of relatable ways we could help students study that would still cause problems.

### Bad Idea #1

> Literally give the students the final test with answers.

How can this backfire?

### Bad Idea #2 - Better but not great (subtle)

> Give students practice tests (with answers); tell them the test will be like these practice tests.

How can this backfire? What are some potential issues with this?

### A Solution

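Instead, we tell students to study with the practice tests, and then the teacher gives a separate practice test whose answers the students have not seen ahead of time. In machine-learning terms, the practice tests play the role of the training data, and the unseen test plays the role of a validation (holdout) set.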

# Holdout Validation

In [None]:
import sklearn.datasets

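# Load the California housing data into features (X) & targets (y).
# y is the median house value per block group, in units of $100,000.
# Only the first 1000 rows are kept to keep the example small.
my_data = sklearn.datasets.fetch_california_housing()
X = my_data.data[:1000]
y = my_data.target[:1000]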

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Pairwise scatter plots of every feature against every other feature
df = pd.DataFrame(data=X, columns=my_data.feature_names)
sns.pairplot(df)

In [None]:
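print(my_data.feature_names)
label = 2  # column index 2 is 'AveRooms' (average number of rooms per household)
# seaborn >= 0.12 requires x and y to be passed as keyword arguments
ax = sns.scatterplot(x=X[:, label], y=y, alpha=0.3)
ax.set(xlabel=my_data.feature_names[label], ylabel='Median house value ($100,000s)');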

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
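def create_many_holdouts(features, labels, splits=10, random_state=27):
    '''Create `splits` random train/test (holdout) splits of the data.

    Each split holds out 20% of the rows as a test set. Split i is seeded
    with `random_state + i`, so the splits differ from one another while
    the whole collection stays reproducible.

    Returns (tests, trains): two lists of (features, labels) tuples.
    '''
    tests = []
    trains = []
    for i in range(splits):
        X_train, X_test, y_train, y_test = train_test_split(
            features,
            labels,
            test_size=0.2,
            shuffle=True,
            random_state=random_state + i,  # distinct but reproducible seed per split
        )
        trains.append((X_train, y_train))
        tests.append((X_test, y_test))

    return tests, trains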

In [None]:
# Create 10 random train/test splits of the selected feature and the target
tests, trains = create_many_holdouts(X[:, label], y)

In [None]:
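fig = plt.figure(figsize=(10, 10))
fig.subplots_adjust(hspace=0.4, wspace=0.4)

# Plot the test set of the first 8 holdout splits on a 4x2 grid
for i in range(1, 9):
    ax = fig.add_subplot(4, 2, i)
    # ax.scatter(trains[i-1][0], trains[i-1][1], alpha=0.3)  # uncomment to also show the training points
    ax.scatter(tests[i-1][0], tests[i-1][1], alpha=0.3)
    ax.set_title(f'Split {i} (test set)')
    # TODO: Train a linear regression and plot line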
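One possible way to approach the TODO above, sketched under a few assumptions: each split's training data is used to fit scikit-learn's `LinearRegression`, and the fitted line is scored with R² on that split's held-out test data only. Because a single feature column such as `X[:, label]` is 1-D, it is reshaped into one column before fitting. The helper name `fit_holdout_lines` is hypothetical, and the synthetic data only stands in for `X[:, label]` and `y` so the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_holdout_lines(x, y, splits=8, test_size=0.2, random_state=27):
    """Hypothetical helper: fit one regression line per random holdout split.

    Returns a list of (slope, intercept, test_r2) tuples, one per split.
    """
    results = []
    for i in range(splits):
        x_train, x_test, y_train, y_test = train_test_split(
            x, y, test_size=test_size, random_state=random_state + i
        )
        # LinearRegression expects a 2-D feature matrix, so turn the
        # 1-D feature vector into a single column.
        model = LinearRegression().fit(x_train.reshape(-1, 1), y_train)
        # Score only on the held-out rows, i.e. data the model never saw.
        test_r2 = model.score(x_test.reshape(-1, 1), y_test)
        results.append((model.coef_[0], model.intercept_, test_r2))
    return results


# Synthetic stand-in for X[:, label] and y (an assumption, not the housing data).
rng = np.random.default_rng(0)
x_demo = rng.uniform(2, 8, size=1000)
y_demo = 0.5 * x_demo + rng.normal(scale=1.0, size=1000)

for slope, intercept, r2 in fit_holdout_lines(x_demo, y_demo):
    print(f"slope={slope:.3f}  intercept={intercept:.3f}  test R^2={r2:.3f}")
```

Inside the plotting loop above, a model fitted on `trains[i-1]` could draw its line on each subplot with something like `xs = np.linspace(x_test.min(), x_test.max(), 50)` followed by `ax.plot(xs, model.predict(xs.reshape(-1, 1)))`. The test R² usually shifts from split to split, which is one motivation for cross-validation.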

# Cross-Validation

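> Specifically, k-fold cross-validation

The data are split into *k* roughly equal-sized folds. The model is trained *k* times, each time holding out a different fold for validation and training on the other *k* - 1 folds; the *k* validation scores are then averaged.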

![Grid search with k-fold cross-validation](https://scikit-learn.org/stable/_images/grid_search_cross_validation.png)
> From scikit-learn's documentation: https://scikit-learn.org/stable/modules/cross_validation.html
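A minimal k-fold sketch using scikit-learn's `KFold` and `cross_val_score`, assuming the same single-feature linear regression as in the holdout section. The synthetic data only stands in for `X[:, label]` and `y`, and `n_splits=5` is an arbitrary choice. The manual loop shows the mechanism (every row lands in the validation fold exactly once), and `cross_val_score` runs the same procedure in one call.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for X[:, label] and y (an assumption, not the housing data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(2, 8, size=(1000, 1))  # one feature, already 2-D
y_demo = 0.5 * X_demo[:, 0] + rng.normal(scale=1.0, size=1000)

# Shuffle once with a fixed seed so the same folds are produced on every call.
kf = KFold(n_splits=5, shuffle=True, random_state=27)

# Manual k-fold loop: each fold is the validation set exactly once,
# and a fresh model is fit on the other k-1 folds.
manual_scores = []
for train_idx, val_idx in kf.split(X_demo):
    model = LinearRegression().fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(model.score(X_demo[val_idx], y_demo[val_idx]))

# The same procedure in one call; for regressors the default score is R^2.
cv_scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=kf)

print("fold R^2:", np.round(cv_scores, 3))
print(f"mean R^2 = {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Reporting the mean together with the spread of the fold scores gives a sense of how much a single holdout score could have varied.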