# 05.03 - Hyperparameters and Model Validation

As a review of the previous episode, these are the basic steps for applying a supervised machine learning model:

1. Choose a class of model
2. Choose model hyperparameters
3. Fit the model to the training data
4. Use the model to predict labels for new data

Generally speaking, the first two steps are the most important: good results depend on finding an appropriate **model** and tuning it with the right **hyperparameters**. This section covers how to validate both choices.

### Thinking about Model Validation

Model and hyperparameter validation is _deceptively_ simple:

1. Choose a model and hyperparameters
2. Apply the model to the training data
3. Compare the predictions to the known values

However, there are good (and "less good") ways of doing it. Let's have a look at both:

### Model validation the wrong way

In [1]:
# loading the data
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target

Next we choose a model and hyperparameters. Here we'll use a k-neighbors classifier with `n_neighbors=1`.

In [2]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=1)

In [3]:
# train the model, then predict labels for the same data it was trained on
model.fit(X, y)
y_model = model.predict(X)

In [4]:
# accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y, y_model)

1.0

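100%? Wow, we might just have created the perfect model! Or maybe not. In fact, the flaw here is that we trained and evaluated the model on the _same data_.

Since KNN simply stores the training data and predicts labels by comparing new points to those stored points, a model with `n_neighbors=1` finds each training point as its own nearest neighbor. It will therefore score 100% accuracy on its training data almost every time (the exception being identical points that carry different labels), which says nothing about how it performs on unseen data.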
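
To see why the perfect score is uninformative, here is a minimal sketch that asks the fitted classifier which stored point is closest to each training point. It reloads the data so it can run on its own; the variable names `distances`, `indices`, `max_distance`, `nearest_labels`, and `train_accuracy` are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# For every training point, look up its single nearest stored neighbor.
distances, indices = model.kneighbors(X, n_neighbors=1)

# Each training point should sit at (essentially) zero distance from its
# nearest stored point: itself, or an exact duplicate row.
max_distance = distances.max()

# With n_neighbors=1, the prediction is just the label of that nearest point.
nearest_labels = y[indices[:, 0]]
train_accuracy = np.mean(nearest_labels == y)

print(max_distance, train_accuracy)
```

If the largest nearest-neighbor distance is effectively zero, the model is only looking up labels it has already memorized, so the training accuracy measures recall of the stored data rather than predictive skill.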

### Model validation the right way: Holdout sets

