* Problems with splitting into training and testing data


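# K-Fold CV in sklearn
There's a simple way to randomize the events in sklearn k-fold CV: set the shuffle flag to True. By default the folds are cut from the data in its existing order, so if the events are sorted (say, by author), a test fold can end up full of labels that the training fold has barely seen.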

Then you'd go from something like this:

```
cv = KFold( len(authors), 2 )
```

To something like this:

```
cv = KFold( len(authors), 2, shuffle=True )
```
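To see what the flag changes, here is a minimal sketch on made-up data. It assumes a later scikit-learn release, where `KFold` is imported from `sklearn.model_selection` and takes `n_splits` instead of the number of samples; the `labels` and `features` arrays and the `labels_per_test_fold` helper are hypothetical and exist only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 samples whose labels are sorted by class,
# mimicking a dataset where all of one author's events come first.
labels = np.array([0] * 5 + [1] * 5)
features = np.arange(10).reshape(-1, 1)


def labels_per_test_fold(cv):
    """Return the labels that land in each test fold of the splitter."""
    return [labels[test_idx].tolist() for _, test_idx in cv.split(features)]


# Without shuffling, the folds follow the data order,
# so each test fold holds a single class.
plain_folds = labels_per_test_fold(KFold(n_splits=2))

# With shuffling, the samples are permuted before being cut into folds.
# random_state is fixed here only so the run is repeatable.
shuffled_folds = labels_per_test_fold(
    KFold(n_splits=2, shuffle=True, random_state=42)
)

print("no shuffle:", plain_folds)
print("shuffle:   ", shuffled_folds)
```

In the unshuffled case a classifier would be trained on one class and tested on the other, which is exactly the problem the shuffle flag is meant to avoid.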

GridSearchCV is a way of systematically working through multiple combinations of parameter tunes, cross-validating as it goes to determine which tune gives the best performance. The beauty is that it can work through many combinations in only a couple extra lines of code.

Here's an example from the sklearn <a href="http://scikit-learn.org/0.17/modules/generated/sklearn.grid_search.GridSearchCV.html">documentation</a>:

```
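from sklearn import svm, grid_search, datasets

# Load the iris dataset used in the documentation example
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svr = svm.SVC()
clf = grid_search.GridSearchCV(svr, parameters)
clf.fit(iris.data, iris.target)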
```

Let's break this down line by line.
```
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]} 
```

A dictionary of the parameters, and the possible values they may take. In this case, they're playing around with the kernel (possible choices are 'linear' and 'rbf'), and C (possible choices are 1 and 10).

Then a 'grid' of all the following combinations of values for (kernel, C) is automatically generated:

('rbf', 1)	('rbf', 10)
('linear', 1)	('linear', 10)

Each is used to train an SVM, and the performance is then assessed using cross-validation.
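The grid itself is just the Cartesian product of the value lists. A rough sketch with the standard library's `itertools.product` shows the idea; this is an illustration of the concept rather than sklearn's own code, and the `names` and `grid` variables are made up for the example.

```python
from itertools import product

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# Fix an order for the parameter names, then pair every value of one
# parameter with every value of the other.
names = sorted(parameters)
value_lists = [parameters[name] for name in names]
grid = [dict(zip(names, values)) for values in product(*value_lists)]

for combo in grid:
    print(combo)
```

Two kernels times two values of C gives four combinations, and adding a third parameter with three candidate values would grow the grid to twelve.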
```
svr = svm.SVC() 
```
This looks kind of like creating a classifier, just like we've been doing since the first lesson. But note that the "clf" isn't made until the next line--this is just saying what kind of algorithm to use. Another way to think about this is that the "classifier" isn't just the algorithm in this case, it's algorithm plus parameter values. Note that there's no monkeying around with the kernel or C; all that is handled in the next line.
```
clf = grid_search.GridSearchCV(svr, parameters) 
```
This is where the first bit of magic happens; the classifier is being created. We pass the algorithm (svr) and the dictionary of parameters to try (parameters) and it generates a grid of parameter combinations to try.
```
clf.fit(iris.data, iris.target)
```
And the second bit of magic. The fit function now tries every parameter combination, scores each one with cross-validation, and then refits the classifier using the best-scoring combination. You can now access the winning parameter values via
```
clf.best_params_
```
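For readers on a later scikit-learn release, the same walkthrough might look like the sketch below. It assumes `GridSearchCV` is imported from `sklearn.model_selection` (the `grid_search` module in the 0.17 example was later retired), and the choice of `cv=5` is an assumption made for the example, not something the original code specifies.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# Algorithm plus the grid of parameter values to try; cv=5 is an
# illustrative choice for the number of cross-validation folds.
clf = GridSearchCV(svm.SVC(), parameters, cv=5)
clf.fit(iris.data, iris.target)

# The winning (kernel, C) combination and its mean cross-validated score
print(clf.best_params_)
print(clf.best_score_)

# One entry per parameter combination that was evaluated
print(len(clf.cv_results_['params']))

# With the default refit=True, the search object can predict directly,
# using the estimator refit on the best parameters.
predictions = clf.predict(iris.data[:5])
print(predictions)
```

Because the fitted search object behaves like a classifier, it can be dropped into code that previously used a hand-tuned `SVC`.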

# Quiz: GridSearchCV in sklearn
Refer to the eigenfaces code, which you can find <a href="http://scikit-learn.org/0.17/auto_examples/applications/face_recognition.html">here</a>. What parameters of the SVM are being tuned with GridSearchCV?

# Summary
Congratulations on completing this course! In this course you've seen:

* Numerical tools for handling large amounts of data efficiently
* Different types of data, examples of how they arise, and techniques for using them with standard tools
* A variety of metrics for evaluating the performance of different algorithms
* Basic cross validation techniques for ensuring performance generalizes
* Visual representations of learning and complexity, and how to use these to pick effective models

Great job getting this far! We'll now put these together in a practical project in which you'll go from a dataset to predictions, learn to explain the pros and cons of possible models, and decide on an optimal model.