## Evaluating machine-learning models

Evaluating a model always boils down to splitting the available data into three sets: training, validation, and test. You train on the training data and evaluate your model on the validation data. Once your model is ready for prime time, you test it one final time on the test data.

Why not just two sets? Developing a model means tuning its hyperparameters based on how it performs on held-out data, and each tuning decision leaks a little information about that data into the model. If you tuned on the test set, your test score would end up optimistically biased. That is why tuning is done against a separate validation set, and the test set stays untouched until the very end.

Splitting the data may seem straightforward, but there are several ways to do it. Let's review three classic evaluation recipes:

- Simple hold-out validation
- K-fold validation
- Iterated K-fold validation with shuffling
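A minimal sketch of the three-way split, assuming `data` is a NumPy array whose first axis indexes samples. `three_way_split` is a hypothetical helper written for illustration, not a library function. It shuffles once and carves off the test set first, so the test samples are fixed before any training or tuning happens.

```python
import numpy as np

def three_way_split(data, num_validation_samples, num_test_samples, seed=0):
    """Shuffle `data` once, then carve off test, validation, and training sets.

    Hypothetical helper; `data` is any NumPy array whose first axis indexes samples.
    """
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(len(data))]  # shuffled copy; `data` is left untouched
    test_data = shuffled[:num_test_samples]
    validation_data = shuffled[num_test_samples:num_test_samples + num_validation_samples]
    training_data = shuffled[num_test_samples + num_validation_samples:]
    return training_data, validation_data, test_data

data = np.arange(100).reshape(50, 2)  # 50 toy samples with 2 features each
train, val, test = three_way_split(data, num_validation_samples=10, num_test_samples=10)
print(len(train), len(val), len(test))  # 30 10 10
```

Because all three sets are slices of a single permutation, they are disjoint by construction and together cover every sample.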

#### SIMPLE HOLD-OUT VALIDATION

```python
import numpy as np

# `data` (all non-test samples), `test_data`, and `get_model()` are placeholders
# assumed to be defined elsewhere; `train`/`evaluate` stand for whatever
# fitting and scoring calls your model actually exposes.
num_validation_samples = 10000

# Shuffling the data is usually appropriate.
np.random.shuffle(data)

# Defines the validation set
validation_data = data[:num_validation_samples]
data = data[num_validation_samples:]

# Defines the training set
training_data = data[:]

# Trains a model on the training data and evaluates it on
# the validation data
model = get_model()
model.train(training_data)
validation_score = model.evaluate(validation_data)

# At this point you can tune your model,
# retrain it, evaluate it, tune it again...

# Once you've tuned your hyperparameters, it's common to
# train your final model from scratch on all non-test data available.
model = get_model()
model.train(np.concatenate([training_data, validation_data]))
test_score = model.evaluate(test_data)
```

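This approach has one flaw: if little data is available, your validation and test sets may contain too few samples to be statistically representative of the data at hand. You can spot this if different random shuffles of the data before splitting yield very different measures of model performance.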
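The flaw is easy to observe in a small experiment. This sketch rests on several assumptions: the dataset is synthetic (two overlapping Gaussian blobs, 40 samples), and a nearest-centroid classifier stands in for the placeholder `get_model()`/`train()`/`evaluate()` calls. All function names here are hypothetical. It reruns simple hold-out validation with 20 different shuffles and reports how much the validation accuracy moves.

```python
import numpy as np

def make_toy_data(n, seed):
    # Hypothetical two-class dataset: class 1 is shifted by +1 in both features.
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=y[:, None] * 1.0, scale=1.0, size=(n, 2))
    return x, y

def nearest_centroid_accuracy(x_train, y_train, x_val, y_val):
    # Stand-in model: predict the class whose training-set mean is closest.
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = ((x_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    preds = dists.argmin(axis=1)
    return float((preds == y_val).mean())

def holdout_score(x, y, num_validation_samples, seed):
    # One round of simple hold-out validation with its own shuffle.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    val_idx, train_idx = idx[:num_validation_samples], idx[num_validation_samples:]
    return nearest_centroid_accuracy(x[train_idx], y[train_idx], x[val_idx], y[val_idx])

x, y = make_toy_data(n=40, seed=0)
scores = [holdout_score(x, y, num_validation_samples=10, seed=s) for s in range(20)]
print(f"min={min(scores):.2f} max={max(scores):.2f} std={np.std(scores):.3f}")
```

With only 10 validation samples, each score can only move in steps of 0.1, and the gap between the minimum and maximum across shuffles gives a rough sense of how much a single hold-out estimate can be trusted.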

#### K-FOLD VALIDATION

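With K-fold validation, you split your data into K partitions of equal size. For each partition i, you train a model on the remaining K − 1 partitions and evaluate it on partition i. Your final score is then the average of the K scores obtained. This helps when the performance of your model varies significantly depending on the train/validation split.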
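The partitioning step on its own can be sketched as a generator of index pairs. `kfold_indices` is a hypothetical helper, not a library API. It uses `np.array_split`, which lets partitions differ in size by at most one sample when the number of samples is not divisible by K, so every sample lands in exactly one validation fold.

```python
import numpy as np

def kfold_indices(num_samples, k, seed=0):
    """Yield (train_idx, val_idx) for each of the K folds (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(num_samples), k)  # K near-equal partitions
    for i in range(k):
        val_idx = folds[i]  # partition i is held out for validation
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])  # the other K - 1
        yield train_idx, val_idx

for train_idx, val_idx in kfold_indices(num_samples=10, k=4):
    print(len(train_idx), len(val_idx))  # prints 7 3, 7 3, 8 2, 8 2 (one line per fold)
```

This differs slightly from slicing with `len(data) // k`: with integer division, the leftover `len(data) % k` samples always stay in the training portion and are never used for validation.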

```python
import numpy as np

# `data` (all non-test samples), `test_data`, and `get_model()` are
# placeholders assumed to be defined elsewhere.
k = 4
num_validation_samples = len(data) // k
np.random.shuffle(data)
validation_scores = []

for fold in range(k):
    # Selects the validation-data partition
    validation_data = data[num_validation_samples * fold:
                           num_validation_samples * (fold + 1)]
    # Uses the remainder of the data as training data. np.concatenate joins
    # the two slices; `+` would add NumPy arrays element-wise instead.
    training_data = np.concatenate([data[:num_validation_samples * fold],
                                    data[num_validation_samples * (fold + 1):]])
    # Creates a brand-new (untrained) model for each fold
    model = get_model()
    model.train(training_data)
    validation_score = model.evaluate(validation_data)
    validation_scores.append(validation_score)

# Validation score: the average of the K fold scores
validation_score = np.average(validation_scores)

# Trains the final model on all non-test data available
model = get_model()
model.train(data)
test_score = model.evaluate(test_data)
```

#### ITERATED K-FOLD VALIDATION WITH SHUFFLING
This method is useful when you have relatively little data available and need to evaluate your model as precisely as possible. It consists of applying K-fold validation multiple times, shuffling the data every time before splitting it K ways. The final score is the average of the scores obtained at each run of K-fold validation. Note that you end up training and evaluating P × K models (where P is the number of iterations you use), which can be very expensive.
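A sketch of the loop structure under stated assumptions: `train_and_evaluate(x_train, y_train, x_val, y_val)` is a hypothetical callback that builds a fresh model, trains it, and returns a validation score, standing in for `get_model()`/`train()`/`evaluate()`. The outer loop reshuffles before each K-way split, so the function trains P × K models in total.

```python
import numpy as np

def iterated_kfold_score(x, y, k, num_iterations, train_and_evaluate, seed=0):
    """Run K-fold validation `num_iterations` (P) times, reshuffling before each run.

    Returns the mean of all P x K fold scores and the list of individual scores.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(num_iterations):  # P iterations
        folds = np.array_split(rng.permutation(len(x)), k)  # reshuffle, then split K ways
        for i in range(k):  # K folds per iteration
            val_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            scores.append(train_and_evaluate(x[train_idx], y[train_idx],
                                             x[val_idx], y[val_idx]))
    return float(np.mean(scores)), scores

def majority_baseline(x_train, y_train, x_val, y_val):
    # Toy stand-in model: always predicts the most common training label.
    majority = np.bincount(y_train).argmax()
    return float((y_val == majority).mean())

x = np.arange(12, dtype=float).reshape(12, 1)
y = np.array([0] * 9 + [1] * 3)
mean_score, all_scores = iterated_kfold_score(x, y, k=3, num_iterations=2,
                                              train_and_evaluate=majority_baseline)
print(len(all_scores), mean_score)  # 6 0.75
```

The toy baseline always predicts label 0 here, so every iteration averages to 9/12 = 0.75. With a real model, the spread of `all_scores` across iterations shows how much the estimate depends on the particular shuffle.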

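#### Important points
- You want both your training set and test set to be representative of the data at hand. For instance, if you're trying to classify images of digits and the raw data happens to be ordered by class, taking the first 80% as the training set would leave some digits out of training entirely. That's why you usually should randomly shuffle your data before splitting it into training and test sets.
- If you're trying to predict the future given the past (for example, tomorrow's weather, stock movements, and so on), you should not randomly shuffle your data before splitting it, because doing so leaks information about the future into training. In such situations, always make sure all data in your test set is posterior to the data in the training set.
- Make sure your training set and validation set are disjoint. If some data points appear twice in your data, shuffling and splitting can place copies of the same point in both sets, so you would effectively be evaluating on part of your training data.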
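Two of these checks can be sketched in a few lines of NumPy. Both helpers are hypothetical illustrations, not library functions. `temporal_split` sorts by timestamp instead of shuffling, so every test sample is later than every training sample. `overlapping_rows` counts exact duplicate rows shared between two sets. `np.unique(..., axis=0)` is one way to drop duplicates before splitting (note that it also sorts the rows).

```python
import numpy as np

def temporal_split(timestamps, test_fraction):
    """Return (train_idx, test_idx) with every test sample later than every training sample.

    Hypothetical helper; deliberately does no shuffling.
    """
    order = np.argsort(timestamps, kind="stable")  # oldest first
    num_test = int(round(len(order) * test_fraction))
    return order[:len(order) - num_test], order[len(order) - num_test:]

def overlapping_rows(a, b):
    """Count rows of `b` that also appear verbatim in `a` (redundant samples)."""
    seen = set(map(tuple, a.tolist()))
    return sum(tuple(row) in seen for row in b.tolist())

timestamps = np.array([5, 1, 4, 2, 3, 6, 8, 7, 10, 9])  # e.g. day numbers, out of order
train_idx, test_idx = temporal_split(timestamps, test_fraction=0.3)
print(timestamps[train_idx].max() < timestamps[test_idx].min())  # True

data = np.array([[0, 1], [2, 3], [0, 1], [4, 5]])  # row [0, 1] appears twice
train, val = data[:2], data[2:]
print(overlapping_rows(train, val))  # 1: the duplicate ended up in both sets
print(len(np.unique(data, axis=0)))  # 3 rows remain after deduplication
```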
