#### Bias-variance trade-off
* Models with a high bias (underfitting) usually have a low variance, e.g. low-degree polynomial models.
* Models with a low bias usually have a high variance (overfitting), e.g. high-degree polynomial models.
* In the example, the degree 3 polynomial struck the best balance between bias and variance.

In practice, we want to find a model that has both a low bias and a low variance. In the next unit, we will see how to manage this bias-variance trade-off.

#### The bias-variance decomposition

* An underfitting model: its predictions are, on average, far from the target value (high bias) but close to each other (low variance).
* An overfitting model: its predictions are centered around the target value (low bias) but far from each other (high variance).
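The two situations can be reproduced with a small simulation. The sketch below is only an illustration under assumed settings: the target function `true_fn`, the noise level and the polynomial degrees are made up and are not the ones used in the course example. It refits each polynomial on many noisy training sets and measures how far the average prediction is from the target (squared bias) and how much the predictions spread (variance).

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def true_fn(x):
    # Hypothetical target function, chosen only for illustration
    return np.sin(2 * np.pi * x)


x_train = np.linspace(0, 1, 20)  # fixed training inputs
x_eval = np.linspace(0, 1, 50)  # points where predictions are compared
noise_std = 0.3  # assumed noise level
n_repeats = 200  # number of simulated training sets

results = {}
for degree in [1, 3, 9]:
    preds = np.empty((n_repeats, x_eval.size))
    for i in range(n_repeats):
        # Draw a new noisy training set and refit the polynomial
        y_train = true_fn(x_train) + rng.normal(0, noise_std, x_train.size)
        model = Polynomial.fit(x_train, y_train, degree)
        preds[i] = model(x_eval)

    # Squared bias: distance between the average prediction and the target
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_eval)) ** 2)
    # Variance: spread of the predictions around their own average
    variance = np.mean(preds.var(axis=0))
    results[degree] = (bias_sq, variance)
    print("degree {}: bias^2 = {:.4f}, variance = {:.4f}".format(degree, bias_sq, variance))
```

With these settings, the degree 1 model should show the largest squared bias and the degree 9 model the largest variance.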

Generalization error = Bias^2 + Variance + Irreducible error

This is called the bias-variance decomposition. The formula says that the error is equal to the square of the bias of the model plus its variance. Note that there is also an irreducible error term which is due to the noise in the observations and the effect of unobserved variables. It’s interesting to note that this term sets a lower bound on the generalization error.
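For reference, the decomposition can be written out for the squared error at a point $x$. The notation is an assumption added here and does not come from the unit itself: observations follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the fitted model, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```

Since the first two terms are non-negative, the error can never drop below $\sigma^2$, which is the lower bound mentioned above.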

Here is a figure that illustrates the equation from above

**Summary**

Minimizing the generalization error is one of the main goals of machine learning, and there are many ways to do this.

* By tuning the complexity of the model using grid search
* By reducing the bias or the variance directly

We will see an example of this second option later in this course with random forests. In short, this model reduces the variance of another model called a decision tree by averaging the output of many different instances of it.
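The effect of averaging can be sketched with plain random numbers. This is not a random forest: the "models" below are hypothetical, independent noisy predictors, an idealized assumption. Trees in a real forest are correlated, so the reduction is smaller in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 10_000  # number of repeated experiments
n_models = 25  # hypothetical number of models being averaged

# Each row holds the errors of n_models independent noisy models
# (mean 0, standard deviation 1)
single_preds = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_models))

var_single = single_preds[:, 0].var()  # variance of one model
var_average = single_preds.mean(axis=1).var()  # variance of the averaged output

print("Variance of one model:   {:.3f}".format(var_single))
print("Variance of the average: {:.3f}".format(var_average))
```

For independent models the variance of the average is roughly the single-model variance divided by `n_models`.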

### Grid search using ParameterGrid


We already saw in the last course how to tune a hyperparameter using grid search. The for loop approach worked well in the cases that we saw, but doesn’t scale well to models with multiple hyperparameters and more complex grids.

In this unit, we will see how to solve this with the ParameterGrid object from Scikit-learn and try to improve our k-NN model on the heart disease dataset.

In [1]:
# Fit a k-NN classifier: Let’s start by loading the data
import pandas as pd

# Load data
data_df = pd.read_csv("c4_heart-numerical.csv")

# Create X/y arrays
X = data_df.drop("disease", axis=1).values
y = data_df.disease.values

data_df.head()

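| | age | trestbps | chol | thalach | oldpeak | ca | disease |
|---|---|---|---|---|---|---|---|
| 0 | 63 | 145 | 233 | 150 | 2.3 | 0 | absence |
| 1 | 67 | 160 | 286 | 108 | 1.5 | 3 | presence |
| 2 | 67 | 120 | 229 | 129 | 2.6 | 2 | presence |
| 3 | 37 | 130 | 250 | 187 | 3.5 | 0 | absence |
| 4 | 41 | 130 | 204 | 172 | 1.4 | 0 | absence |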


We will use the train/test set methodology to evaluate our classifiers. Let’s split the data into a 70% train set and a 30% test set.



In [2]:
from sklearn.model_selection import train_test_split

# Split data into train/test sets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

**BASELINE**

We can now fit a k-NN classifier using the default values, i.e. without setting any parameters. This will be our baseline for this unit.

In [3]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Create a k-NN classifier with default values
pipe = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])

# Fit to train data
pipe.fit(X_tr, y_tr)

# Evaluate on test set
accuracy = pipe.score(X_te, y_te)
print("Accuracy: {:.3f}".format(accuracy))

Accuracy: 0.747


As we can see, its accuracy is around 75%.


**GRID SEARCH IMPROVEMENT**

Let’s try to improve our k-NN classifier by tuning some hyperparameters from the KNeighborsClassifier object.

* n_neighbors - the number of neighbors.
* p - the power parameter of the Minkowski distance: p=1 gives the L1 (Manhattan) distance and p=2 the L2 (Euclidean) distance.
* weights - the weighting function for the majority vote.

When doing the majority vote, the classifier can use a weighting function. By default, all points have the same weight. This corresponds to the **'uniform'** strategy. However, we can also give more weight to closer data points. For instance, the **'distance'** strategy weights each neighbor by the inverse of its distance to the query point.
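A tiny example makes the difference concrete. The three training points and the query point below are made up for illustration. With three neighbors, the two distant "B" points win a uniform vote, while the single nearby "A" point wins once votes are weighted by inverse distance.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D data (made up): one "A" point close to the query,
# two "B" points further away
X_toy = np.array([[0.0], [2.0], [2.1]])
y_toy = np.array(["A", "B", "B"])
x_query = np.array([[0.3]])

predictions = {}
for weights in ["uniform", "distance"]:
    knn = KNeighborsClassifier(n_neighbors=3, weights=weights)
    knn.fit(X_toy, y_toy)
    predictions[weights] = knn.predict(x_query)[0]
    print(weights, "->", predictions[weights])
```

Here the inverse-distance weights are 1/0.3 ≈ 3.3 for "A" against 1/1.7 + 1/1.8 ≈ 1.1 for "B".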

Let’s define a list of values for each parameter.



In [5]:
import numpy as np

# Define a set of reasonable values
k_values = np.arange(1, 21)  # 1, 2, 3, .., 20
weights_functions = ["uniform", "distance"]  # Uniform and distance-based weighting
distance_types = [1, 2]  # L1, L2 distances

Using our for loop strategy, we can test the 20 × 2 × 2 = 80 combinations by nesting three for loops.



In [6]:
# Create a k-NN classifier
pipe = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])

# Save accuracy on test set
test_scores = []

# Grid search
for k in k_values:
    for f in weights_functions:
        for d in distance_types:
            # Set hyperparameters
            pipe.set_params(knn__n_neighbors=k, knn__weights=f, knn__p=d)

            # Fit a k-NN classifier
            pipe.fit(X_tr, y_tr)

            # Evaluate on test set
            accuracy = pipe.score(X_te, y_te)

            # Save accuracy
            test_scores.append(
                {
                    "knn__n_neighbors": k,
                    "knn__weights": f,
                    "knn__p": d,
                    "accuracy": accuracy,
                }
            )
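The three nested loops enumerate a Cartesian product of the value lists. As a minimal sketch using only the standard library, `itertools.product` yields the same combinations in the same order. It is shown here to make the structure explicit and is not part of the original unit.

```python
from itertools import product

import numpy as np

k_values = np.arange(1, 21)
weights_functions = ["uniform", "distance"]
distance_types = [1, 2]

# product() yields every (k, f, d) tuple, with the last list varying fastest,
# i.e. the same order as the nested for loops
combinations = list(product(k_values, weights_functions, distance_types))

print("Number of combinations:", len(combinations))
```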

Once all combinations of the three hyperparameters have been evaluated, we can collect the results in a DataFrame and sort them by decreasing accuracy.

In [7]:
# Create DataFrame with test scores
scores_df = pd.DataFrame(test_scores)

# Top five scores
scores_df.sort_values(by="accuracy", ascending=False).head()


| | knn__n_neighbors | knn__weights | knn__p | accuracy |
|---|---|---|---|---|
| 14 | 4 | distance | 1 | 0.813187 |
| 28 | 8 | uniform | 1 | 0.813187 |
| 12 | 4 | uniform | 1 | 0.802198 |
| 32 | 9 | uniform | 1 | 0.802198 |
| 30 | 8 | distance | 1 | 0.802198 |


As we can see, it’s possible to achieve an accuracy of about 81% using the L1 distance metric: all of the top five combinations use p=1. However, the code from above can quickly become complex if we have more hyperparameters to tune or if there are several different grids to evaluate.

Let’s see how to simplify our code.

### Grid search using ParameterGrid

Scikit-learn implements a ParameterGrid object to define grids of parameters for our grid search. It takes a dictionary of (parameter, values) pairs. Let’s create a grid for our example from above.

In [8]:
from sklearn.model_selection import ParameterGrid

# Define a grid of values
grid = ParameterGrid(
    {
        "knn__n_neighbors": k_values,
        "knn__weights": weights_functions,
        "knn__p": distance_types,
    }
)

# Print the number of combinations
print("Number of combinations:", len(grid))

Number of combinations: 80


This grid variable represents all the combinations of parameters, and we can use it much like a list. For instance, we printed the total number of combinations using the len() function. We can also iterate through each combination of parameters.



In [9]:
# Iterate through the first five combinations of parameters
# (we convert the grid to a list in order to be able to slice it)
for params_dict in list(grid)[:5]:
    print(params_dict)

{'knn__n_neighbors': 1, 'knn__p': 1, 'knn__weights': 'uniform'}
{'knn__n_neighbors': 1, 'knn__p': 1, 'knn__weights': 'distance'}
{'knn__n_neighbors': 1, 'knn__p': 2, 'knn__weights': 'uniform'}
{'knn__n_neighbors': 1, 'knn__p': 2, 'knn__weights': 'distance'}
{'knn__n_neighbors': 2, 'knn__p': 1, 'knn__weights': 'uniform'}
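Besides len() and iteration, a ParameterGrid can also be indexed with an integer. The short sketch below uses a small grid defined only for this illustration to show the three ways of accessing it.

```python
from sklearn.model_selection import ParameterGrid

# Small grid used only for this illustration
small_grid = ParameterGrid({"knn__n_neighbors": [1, 2], "knn__p": [1, 2]})

n_combinations = len(small_grid)  # number of combinations
as_list = list(small_grid)  # materialized list, which can be sliced
second = small_grid[1]  # integer indexing works without list()

print(n_combinations)
print(as_list[:2])
print(second)
```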


At each iteration, the params_dict variable contains a dictionary with the current combination of values. The idea is to configure the pipeline with this dictionary of values using the set_params() method.

Here is the code to apply grid search using our grid object.

In [10]:
# Create k-NN classifier
pipe = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])

# Save accuracy on test set
test_scores = []

for params_dict in grid:
    # Set parameters
    pipe.set_params(**params_dict)

    # Fit a k-NN classifier
    pipe.fit(X_tr, y_tr)

    # Save accuracy on test set
    params_dict["accuracy"] = pipe.score(X_te, y_te)

    # Save result
    test_scores.append(params_dict)

This time, we need a single for loop to iterate through each combination of parameters. At each iteration, we set the parameters of our pipeline using the set_params() function and the **kwargs syntax which is a way to work with keyword arguments. In short, the idea is to set the arguments of a function using a dictionary of (keyword, value) pairs. In our case, this dictionary corresponds to the params_dict variable.

In [11]:
# Value of params_dict for the first combination
params_dict = {"knn__n_neighbors": 1, "knn__p": 1, "knn__weights": "uniform"}

# Setting the parameters using the **kwargs syntax
pipe.set_params(**params_dict)

# .. is equivalent to
pipe.set_params(knn__n_neighbors=1, knn__p=1, knn__weights="uniform")


Pipeline(steps=[('scaler', StandardScaler()),
                ('knn', KNeighborsClassifier(n_neighbors=1, p=1))])
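The same unpacking works for any Python function, not only set_params(). Here is a minimal sketch with a hypothetical helper `describe` that exists only for this illustration.

```python
def describe(n_neighbors=5, p=2, weights="uniform"):
    # Hypothetical helper that only formats its keyword arguments
    return "k={}, p={}, weights={}".format(n_neighbors, p, weights)


params = {"n_neighbors": 1, "p": 1, "weights": "uniform"}

# Unpacking the dictionary is the same as writing the keywords by hand
unpacked = describe(**params)
explicit = describe(n_neighbors=1, p=1, weights="uniform")

print(unpacked)
print(unpacked == explicit)
```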

Finally, we fit the k-NN classifier, save its accuracy in the params_dict dictionary and add it to the test_scores list.

Again, we can use this list to create a DataFrame with the results and print the top five scores.

In [12]:
# Create DataFrame with test scores
scores_df = pd.DataFrame(test_scores)

# Top five scores
scores_df.sort_values(by="accuracy", ascending=False).head()

Unnamed: 0,knn__n_neighbors,knn__p,knn__weights,accuracy
28,8,1,uniform,0.813187
13,4,1,distance,0.813187
32,9,1,uniform,0.802198
12,4,1,uniform,0.802198
37,10,1,distance,0.802198


### Multiple grids
One of the advantages of the ParameterGrid object is that we can define many grids of parameters by passing a list of dictionaries.



In [13]:
# Define two grids (preliminary version, refined below)
grid = ParameterGrid(
    [
        {"knn__n_neighbors": [2, 3], "knn__p": [1, 2]},
        {"knn__weights": ["uniform", "distance"], "knn__p": [1, 2]},
    ]
)

# List combinations
list(grid)

[{'knn__n_neighbors': 2, 'knn__p': 1},
 {'knn__n_neighbors': 2, 'knn__p': 2},
 {'knn__n_neighbors': 3, 'knn__p': 1},
 {'knn__n_neighbors': 3, 'knn__p': 2},
 {'knn__p': 1, 'knn__weights': 'uniform'},
 {'knn__p': 1, 'knn__weights': 'distance'},
 {'knn__p': 2, 'knn__weights': 'uniform'},
 {'knn__p': 2, 'knn__weights': 'distance'}]

The first four elements correspond to the combinations from the first grid and the last four to the combinations from the second grid.

There is a small issue in this case. The first grid doesn’t specify the weights parameter and the second grid doesn’t specify the n_neighbors one. Since set_params() only changes the parameters it receives, the unspecified ones keep whatever value they had before (the default, or a value set in an earlier iteration), which can lead to unexpected results. One solution is to assign a default value to each one.

In [15]:
# Define two grids (final version: every parameter is set explicitly, using defaults where needed)
grid = ParameterGrid(
    [
        {
            "knn__n_neighbors": [2, 3],
            "knn__weights": ["uniform"],  # Default value: uniform
            "knn__p": [1, 2],
        },
        {
            "knn__n_neighbors": [5],  # Default value: 5
            "knn__weights": ["uniform", "distance"],
            "knn__p": [1, 2],
        },
    ]
)

# List combinations
list(grid)

[{'knn__n_neighbors': 2, 'knn__p': 1, 'knn__weights': 'uniform'},
 {'knn__n_neighbors': 2, 'knn__p': 2, 'knn__weights': 'uniform'},
 {'knn__n_neighbors': 3, 'knn__p': 1, 'knn__weights': 'uniform'},
 {'knn__n_neighbors': 3, 'knn__p': 2, 'knn__weights': 'uniform'},
 {'knn__n_neighbors': 5, 'knn__p': 1, 'knn__weights': 'uniform'},
 {'knn__n_neighbors': 5, 'knn__p': 1, 'knn__weights': 'distance'},
 {'knn__n_neighbors': 5, 'knn__p': 2, 'knn__weights': 'uniform'},
 {'knn__n_neighbors': 5, 'knn__p': 2, 'knn__weights': 'distance'}]

The grid still has eight combinations, but this time, they all specify a value for each parameter.
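To see why this matters, the sketch below calls set_params() twice on the same pipeline. The second call does not mention the weights, so the value from the first call is silently kept. The parameter values are chosen only for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])

# A combination that sets the weights ...
pipe.set_params(knn__weights="distance", knn__p=1)

# ... followed by one that only sets n_neighbors and p
pipe.set_params(knn__n_neighbors=3, knn__p=2)

# The weights value from the earlier call is still in place
current = pipe.get_params()
print(current["knn__weights"], current["knn__n_neighbors"], current["knn__p"])
```

Specifying every parameter in every grid, as done above, avoids carrying such leftover values from one iteration to the next.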

### Optional steps
It’s also possible to disable a step by setting it to None. For instance, we can try fitting a k-NN model without standardization.

In [16]:
# Grid with optional steps
grid = ParameterGrid(
    {
        "scaler": [None, StandardScaler()],
        "knn__n_neighbors": [5, 10, 15],
    }
)

# List combinations
list(grid)


[{'knn__n_neighbors': 5, 'scaler': None},
 {'knn__n_neighbors': 5, 'scaler': StandardScaler()},
 {'knn__n_neighbors': 10, 'scaler': None},
 {'knn__n_neighbors': 10, 'scaler': StandardScaler()},
 {'knn__n_neighbors': 15, 'scaler': None},
 {'knn__n_neighbors': 15, 'scaler': StandardScaler()}]

This time, we’re not setting a parameter, but the step itself. Scikit-learn will disable the 'scaler' step when it gets a None value and will use the StandardScaler object in the other cases.
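Such a grid can be evaluated with the same single loop as before. The following sketch is self-contained and therefore uses synthetic data from make_classification as a stand-in for the heart disease dataset. That substitution is an assumption made for this example, so the accuracies it prints say nothing about the real data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart disease data (assumption for this sketch)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])
grid = ParameterGrid(
    {"scaler": [None, StandardScaler()], "knn__n_neighbors": [5, 10, 15]}
)

results = []
for params in grid:
    # Sets both the 'scaler' step itself and the k-NN hyperparameter
    pipe.set_params(**params)
    pipe.fit(X_tr, y_tr)
    results.append(
        {
            "scaled": params["scaler"] is not None,
            "knn__n_neighbors": params["knn__n_neighbors"],
            "accuracy": pipe.score(X_te, y_te),
        }
    )

for row in results:
    print(row)
```

Storing a boolean `scaled` flag instead of the scaler object keeps the results easy to read once they are put in a DataFrame.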