In [None]:
'''Before, we used the 'train_test_split' function to perform a single experimental run.
However, a single split is often biased (you could ask: what qualifies a data point to be in the training or the test set?),
so the key idea of cross-validation is to train and evaluate the model several times, each time on a different split.
In recent versions of scikit-learn, it is sufficient to call 'cross_val_score' with the desired number of folds.
It returns the accuracy on each held-out fold, obtained after training on the remaining folds.'''

# So just import it instead of 'train_test_split'
from sklearn.model_selection import cross_val_score
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier

# Get some data; use X for the data points and y for the labels (if you rename them, change the code below accordingly)
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
# Define a classifier

mlp = MLPClassifier(hidden_layer_sizes=(100, 100), solver='sgd', learning_rate='constant', learning_rate_init=0.1, max_iter=100)

# Here, the k-fold cross-validation starts with k=10; you no longer pass separate train and test sets
scores = cross_val_score(mlp, X, y, cv=10, scoring='accuracy')
# Print the score of each fold
print(scores)

# Print the average; this is the number you usually report in a paper
print(scores.mean())
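
To make the idea of folds concrete, here is a minimal sketch of how k non-overlapping folds can be built by hand. The helper names `k_fold_indices` and `train_test_splits` are made up for this illustration and are not part of scikit-learn. Note also that this sketch shuffles and does not stratify, whereas `cross_val_score` with an integer `cv` and a classifier uses stratified folds, so the actual splits will differ.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, non-overlapping folds (hypothetical helper)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most one
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test_idx in enumerate(folds):
        # The training set is the union of all other folds
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(train_test_splits(n_samples=23, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Every data point ends up in the test set exactly once and in the training set in all other runs, which addresses the question of what qualifies a point for one set or the other.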



In [None]:
'''Okay, hopefully that refreshed your memory of k-fold cross-validation and how to use it in practice. This technique not
only copes with small datasets but is also widely used to find the best hyperparameter for a model, e.g. the learning
rate, say 'eta', in a neural network. Here is how you can do it, but be aware that you can't run it as it is: '''

# if you use integers
param_range = list(range(?, ?)) # remember the range function from the first tutorial?

# if you use floats, e.g. for the learning rate
#eta = [(i+1)/10.0 for i in range(10)]
#print(eta)

fold_scores = []
for k in param_range:
    mlp_example = MLPClassifier(learning_rate_init=k)
    scores = cross_val_score(mlp_example, X, y, cv=10, scoring='accuracy')
    fold_scores.append(scores.mean())
print(fold_scores)
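
For reference, a filled-in version of the loop above might look like the following sketch. Everything specific in it is an assumption made for illustration: the small `load_digits` dataset stands in for MNIST so that it runs quickly, and the three learning rates, the network size, `cv=3`, and `max_iter=50` are arbitrary choices, not recommendations. The variable names `eta_values` and `mean_scores` are also chosen here so that they do not clash with `param_range` and `fold_scores`.

```python
import warnings

import numpy as np
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Small stand-in dataset (assumption): 1797 digit images of 8x8 pixels
X_small, y_small = load_digits(return_X_y=True)
X_small = X_small / 16.0  # pixel values range from 0 to 16

eta_values = [0.001, 0.01, 0.1]  # illustrative candidates, not recommendations
mean_scores = []
for eta in eta_values:
    clf = MLPClassifier(hidden_layer_sizes=(32,), solver='sgd',
                        learning_rate_init=eta, max_iter=50, random_state=0)
    with warnings.catch_warnings():
        # 50 iterations may not converge; silence the warning for this sketch
        warnings.simplefilter('ignore', category=ConvergenceWarning)
        scores = cross_val_score(clf, X_small, y_small, cv=3, scoring='accuracy')
    mean_scores.append(scores.mean())

# Pick the learning rate with the highest mean accuracy over the folds
best_eta = eta_values[int(np.argmax(mean_scores))]
print(mean_scores)
print('best eta:', best_eta)
```

The design is the same as in the loop above: one full cross-validation per candidate value, with only the mean score kept for the comparison.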

In [None]:
# It is nicer to plot the scores over the parameter range (here, for example, for eta, the learning rate; adapt it to your needs);
# this also refreshes your skills from the 2nd tutorial :)

import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(param_range, fold_scores)
plt.xlabel('parameter value')
plt.ylabel('mean accuracy over folds')