<img src="Images/IMG-Wine-Quality_Banner.png" alt="Title Banner" style="display: block; margin-left: auto; margin-right: auto; width: 100%">

---
Wine is classy, [this is well known](https://tvtropes.org/pmwiki/pmwiki.php/Main/WineIsClassy). However, how can one classify wine quality, which is obviously very subjective, using its chemical properties?

<img src="Images/IMG-sklearn-logo.png" alt="Title Banner" style="float:right; display: block; margin-left: auto; margin-right: auto; width: 30%">

Let us try it! In this notebook, we will use several classifiers and further functions from the Python package [```scikit-learn```](https://scikit-learn.org/stable/user_guide.html) (short: ```sklearn```). ```Sklearn``` is a very powerful framework that offers many machine learning methods with a unified interface, which makes it easy to compare different classifiers. Here, you will get to know several functions and objects for classification. In the notebook _"Prediction of wine alcohol content"_ you will work with ```sklearn``` as well; however, that notebook focuses on methods and objects for regression.

Please keep in mind that, for both the classification and the regression notebook, you should always read the official documentation and the user guide of ```sklearn``` to gain a deeper understanding of the different methods.

## Content
<table style="width:256; border: 1px solid black; display: inline-block">
  <tr>
    <td  style="text-align:right" width=64px><img src="Images/IMG-csv-in.png" style="float:left"></td>
      <td style="text-align:left" width=128px>
          <a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#import_data'>Import data</a>
      </td>
  </tr>
  <tr>
    <td style="text-align:right"><img src="Images/IMG-magnifying-glass.png" style="float:left"></td>
    <td style="text-align:left" width=128px><a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#analyze_data'>Analyze data</a>
      </td>
  </tr>
    <tr>
    <td style="text-align:right"><img src="Images/IMG-broom.png" style="float:left"></td>
    <td style="text-align:left" width=128px><a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#clean_data'>Clean up data</a>
        </td>
    </tr>
    <tr>
    <td style="text-align:right"><img src="Images/IMG-gears.png" style="float:left"></td>
    <td style="text-align:left" width=128px><a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#build_model'>Model selection</a>
        </td>
    </tr>
    <tr>
    <td style="text-align:right"><img src="Images/IMG-new-file-out.png" style="float:left"></td>
    <td style="text-align:left" width=128px><a style="color:black; font-size:14px; font-weight:bold; text-decoration:none" href='#save_model'>Store model</a>
        </td>
  </tr>
</table>

**Notice:** Random numbers are used in several parts of this notebook, e.g. to initialize a neural network or to split the dataset into training and test sets. Thus, you may not be able to reproduce some results exactly, and the relative ranking of the classifiers may change slightly when you repeat some computations. However, this does not affect the general workflow of this notebook.
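If you do want reproducible results, many `sklearn` functions and estimators accept a `random_state` parameter that fixes the random number generator. This notebook does not set it; the following is only a minimal sketch on a small synthetic array (not the wine data) showing that two splits with the same `random_state` are identical:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Small synthetic dataset (assumption: stands in for the wine features and labels)
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Two splits with the same random_state yield identical partitions
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# split_a and split_b are lists: [X_train, X_test, y_train, y_test]
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Identical splits:", same)
```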

---


<a id='import_data'></a><div><img src="Images/IMG-csv-in.png" style="float:left"> <h2 style="position: relative; top: 6px; left:5px">1. Import data</h2>
<p style="position: relative; top: 10px">
The wine data used here were collected during a <a href='#data_source'>scientific study</a>; the meaning of each feature is explained in the cited publication. The features are:


<table style="width:256; border: 1px solid black; display: inline-block">
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Fixed acidity:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px"><a href="https://waterhouse.ucdavis.edu/whats-in-wine/fixed-acidity">https://waterhouse.ucdavis.edu/whats-in-wine/fixed-acidity</a></p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Volatile acidity:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px"><a href="https://waterhouse.ucdavis.edu/whats-in-wine/volatile-acidity">https://waterhouse.ucdavis.edu/whats-in-wine/volatile-acidity</a></p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Citric acid:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px"><a href="https://waterhouse.ucdavis.edu/whats-in-wine/fixed-acidity">https://waterhouse.ucdavis.edu/whats-in-wine/fixed-acidity</a></p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Residual sugar:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px"><a href="https://winefolly.com/deep-dive/what-is-residual-sugar-in-wine/">https://winefolly.com/deep-dive/what-is-residual-sugar-in-wine/</a></p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Chlorides:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Chlorides</p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Free sulfur dioxide:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px"><a href="https://waterhouse.ucdavis.edu/whats-in-wine/sulfites">https://waterhouse.ucdavis.edu/whats-in-wine/sulfites</a></p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Total sulfur dioxide:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px"><a href="https://waterhouse.ucdavis.edu/whats-in-wine">https://waterhouse.ucdavis.edu/whats-in-wine</a></p>
        </td>
    </tr>
    <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Density:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Density</p>
        </td>
    </tr>
        <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">pH:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px"><a href="https://waterhouse.ucdavis.edu/whats-in-wine/volatile-acidity">https://waterhouse.ucdavis.edu/whats-in-wine/volatile-acidity</a></p>
        </td>
    </tr>
        <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Sulphates:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px"><a href="https://waterhouse.ucdavis.edu/whats-in-wine/sulfites">https://waterhouse.ucdavis.edu/whats-in-wine/sulfites</a></p>
        </td>
    </tr>
        <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Alcohol:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Alcohol content</p>
        </td>
    </tr>
        <tr>
        <td style="text-align:left"><p style="color:black; font-size:14px; font-weight:bold">Quality:</p>
        </td>
        <td style="text-align:left"><p style="color:black; font-size:14px">Subjective wine quality - to be classified</p>
        </td>
    </tr>
    </table>   

<a id='data_source'></a><b>Data source:</b> P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. 
  Modeling wine preferences by data mining from physicochemical properties.
  In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

</p>

We have two datasets: the first contains red wines, the second white wines. In this notebook, we assume that the color of the wine does not influence the result.

Thus, the first step is to import the two datasets, which are located in ```Data/winequality-red.csv``` and ```Data/winequality-white.csv```, and to concatenate them.

In [3]:
import pandas as pd  # See Preprocessing/Lego-Sets/Lego Sets Preprocessing.ipynb for an introduction to pandas

# Import the datasets (pandas function: pd.read_csv(); note: the separator here is the semicolon ";"!)
df_red = pd.read_csv('Data/winequality-red.csv', sep=';')
df_white = pd.read_csv('Data/winequality-white.csv', sep=';')
# Combine the two datasets with pd.concat(); ignore_index=True creates a new continuous index
df = pd.concat([df_red, df_white], ignore_index=True)

<a id='analyze_data'></a><div><img src="Images/IMG-magnifying-glass.png" style="float:left"> <h2 style="position: relative; top: 6px; left:5px" >2. Analyze data</h2>
    
<p style="position: relative; top: 10px">

Next, we would like to take a closer look at the data. Do we need to convert any features? Are there any missing values?

In [None]:
# Let us start with some descriptive statistics (pandas function describe()):
df.describe()


We make three important observations:

1. In total, we have data for 6497 wines (__```count```__), and every column contains exactly this number of entries. Thus, there are no missing values to deal with.
2. All features are numeric. Thus, we do not need to convert any data!
3. The features have different scales. As we can see, the means (__```mean```__) and standard deviations (__```std```__) of the features differ by orders of magnitude.

To deal with this issue, each column can be standardized individually to zero mean and a standard deviation of one ($z$-score normalization). After this standardization, all features are on the same scale, while the shape of each feature's distribution is preserved.
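For a feature value $x$ with column mean $\mu$ and standard deviation $\sigma$, the $z$-score is $z = (x - \mu)/\sigma$. The following sketch computes this by hand on a small made-up matrix and compares it with `sklearn`'s `StandardScaler`. Note that `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: two columns on very different scales
X_demo = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 500.0]])

# Manual z-score per column (population standard deviation, ddof=0)
mu = X_demo.mean(axis=0)
sigma = X_demo.std(axis=0)
Z_manual = (X_demo - mu) / sigma

# The same transformation with scikit-learn
Z_sklearn = StandardScaler().fit_transform(X_demo)

print(np.allclose(Z_manual, Z_sklearn))              # both approaches agree
print(Z_manual.mean(axis=0), Z_manual.std(axis=0))   # approx. 0 and 1 per column
```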

Now, we can define our preprocessing pipeline:
- No need to convert anything or deal with missing features
- Standardization of the data recommended

---

<a id='clean_data'></a><div><img src="Images/IMG-broom.png" style="float:left"> <h2 style="position: relative; top: 6px; left:5px">3. Clean up data</h2>
<p style="position: relative; top: 10px">

As we have found, we do not need to convert anything or deal with missing values. It is sufficient to standardize the features.  </p>

Before standardizing the features, we need to split the data into a training set and a test set. Otherwise, the standardization parameters would be computed from test data, which the model must not see in any training phase. First, we therefore divide the dataset into the feature matrix $X$ and the class labels $y$.

In [None]:
## We separate the entire dataset into features and predicted wine quality labels.
X = df.drop(columns=['quality']) # All columns of df but 'quality'
y = df['quality'] # Only the column 'quality'

Now, we perform the actual split into training and test set. For this, we use the function [```train_test_split()```](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) from ```scikit-learn```.

In [None]:
# Import the function train_test_split from the module model_selection from scikit-learn

from sklearn.model_selection import train_test_split

# Use the function train_test_split() to obtain a hold-out test set that contains 20% of the entire dataset

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Now, we can estimate the parameters of the standardization on the training set and apply them to both the training AND the test set. Of course, we could manually subtract the mean and divide by the standard deviation in each column. However, with the object [```StandardScaler```](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) it is much easier, and the code looks much more elegant:

In [None]:
# Import the object StandardScaler from the module preprocessing from sklearn
from sklearn.preprocessing import StandardScaler

# Instantiate a StandardScaler object
stdScaler = StandardScaler()

# Compute the standardization parameters on the training set (!) X_train using the method fit()
stdScaler.fit(X_train)

# Apply **the same** standardization to the training and the test set using the method transform()
X_train = stdScaler.transform(X_train)
X_test = stdScaler.transform(X_test)

Notice that we have not used any information from the test set in our calculations. The ```StandardScaler``` has been parameterized only on the training set!

This completes the preparation of this very well-designed dataset; the next step is model selection.

---

<a id='build_model'></a><div><img src="Images/IMG-gears.png" style="float:left"> <h2 style="position: relative; top: 6px; left:5px">4. Model selection</h2>
<p style="position: relative; top: 10px">
Now we will train various classifiers on the standardized training data and compare them using the held-back test set. Note that we **must not** use the test set for any hyper parameter optimization, because then the test set would effectively become part of the training data. Instead, we use $k$-fold cross validation on the training set to find the best hyper parameters for each classifier.

This raises a first question about a hyper parameter of the cross validation itself: how large should we choose $k$?


We can answer this question by looking at the distribution of the training labels $y_\mathrm{train}$. To do so, we import a package for data visualization and plot a histogram of the training labels.


Python offers many packages for visualization. Here, we use [matplotlib](https://matplotlib.org/).</p>

In [2]:
# Import the plot module from matplotlib
from matplotlib import pyplot as plt
# This is a so-called "magic-command", that allows the visualization of plots directly in the notebook.
# If you forget this, you will not see your plot!
%matplotlib inline

# Histogram visualization of the training labels
plt.hist(y_train)
plt.xlabel("Quality")
plt.ylabel("Frequency")


Obviously, there are only a few wines with the quality labels 3 and 9. This can cause problems if a label has fewer observations than there are folds $k$: some folds of the $k$-fold cross validation will then contain no observation of this label at all. Thus, we should choose $k$ no larger than the lowest frequency of any label.
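For classifiers, `GridSearchCV` (like `cross_val_score`) uses stratified folds (`StratifiedKFold`) when `cv` is an integer, so each label is spread as evenly as possible over the folds. The sketch below uses a made-up label vector in which label 9 occurs exactly 4 times; with 4 folds, every validation fold receives exactly one wine of quality 9. With more folds than that (e.g. 5), at least one fold would have none, and `sklearn` warns about this case.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up labels: 40 wines of quality 6, 20 of quality 5, and only 4 of quality 9
y_demo = np.array([6] * 40 + [5] * 20 + [9] * 4)
X_demo = np.zeros((len(y_demo), 1))  # features are irrelevant for the split itself

skf = StratifiedKFold(n_splits=4)
nines_per_fold = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X_demo, y_demo)):
    # Count how often each label occurs in this validation fold
    labels, counts = np.unique(y_demo[val_idx], return_counts=True)
    print("Fold", fold, dict(zip(labels.tolist(), counts.tolist())))
    nines_per_fold.append(int(np.sum(y_demo[val_idx] == 9)))
```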

We can easily determine this lowest frequency with functions from the package [```numpy```](https://numpy.org/), which is optimized for numerical computations with arrays and matrices.

In [None]:
# Import of numpy
import numpy as np

# Compute the frequency of each value in the array y_train
labels, counts = np.unique(y_train, return_counts=True)
# find the lowest frequency
min_counts = np.min(counts)
# find all labels with the lowest frequency (there may be ties)
rarest_labels = labels[counts == min_counts]
# Print these labels together with their frequency
print("The label(s) {} occur(s) only {} times in the training data!".format(rarest_labels, min_counts))

The maximum number of folds is thus $k=4$ (because the split is random, the exact minimum count may differ slightly in your run).

Using this information, we train several classifiers on the training set.

The simplest classifier from the lecture that is available in ```sklearn``` is the [$k$ Nearest Neighbors Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html).

In [None]:
# Import the object KNeighborsClassifier from the module neighbors in the package sklearn
from sklearn.neighbors import KNeighborsClassifier

# Instantiate an object of type KNeighborsClassifier with default settings for all parameters
knn_model = KNeighborsClassifier()
# Train ("Fit") the model
knn_model.fit(X_train, y_train)
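Conceptually, `fit()` of a $k$ nearest neighbors classifier just stores the training data, and a prediction is a majority vote among the $k$ closest training observations. The following NumPy sketch illustrates this idea with brute-force Euclidean distances and uniform weights; the function `knn_predict` is hypothetical and not part of `sklearn`, whose implementation offers more options (e.g. tree-based neighbor search, distance weighting):

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    """Hypothetical brute-force kNN: majority vote among the k nearest training points."""
    predictions = []
    for x in X_new:
        # Euclidean distance from x to every training observation
        distances = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training observations
        nearest = np.argsort(distances)[:k]
        # Majority vote among their labels (ties: the smallest label wins)
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Tiny made-up example: two clusters with the labels 5 and 7
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
y_toy = np.array([5, 5, 5, 7, 7, 7])

print(knn_predict(X_toy, y_toy, np.array([[0.0, 0.1], [5.0, 5.1]]), k=3))
```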

The performance of this classifier (as well as of any other classifier in ```sklearn```) on any dataset can be computed with the method [```score()```](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier.score). For classifiers, the default metric is the accuracy:

$$\mathrm{Accuracy} = \frac{\text{Number of correctly classified observations}}{\text{Total number of observations}}$$
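For classifiers, `score(X, y)` computes the same value as `sklearn.metrics.accuracy_score(y, model.predict(X))`. A small sketch with made-up true and predicted quality labels shows that the formula is simply the mean of a boolean "correct" array:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true and predicted quality labels
y_true = np.array([5, 6, 6, 7, 5])
y_pred = np.array([5, 6, 5, 7, 6])

# Accuracy = number of correct predictions / total number of predictions
manual = np.mean(y_true == y_pred)
print(manual, accuracy_score(y_true, y_pred))
```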

In [None]:
# Training accuracy (in percent)
train_score_knn = knn_model.score(X_train, y_train)*100
# Test accuracy (in percent)
test_score_knn = knn_model.score(X_test, y_test)*100
# Output of the performance measures
print("Standard KNN - Training: {:.2f} % Test: {:.2f} %".format(train_score_knn, test_score_knn))

Typically, the performance with default settings is not yet very good. Hyper parameter optimization can strongly improve this baseline result. For this, we use $k$-fold cross validation with $k=4$ to make sure that every label is present in every fold at least once.
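As a sketch of what $k$-fold cross validation does (independent of the wine data), `sklearn`'s `cross_val_score` trains a model on $k-1$ folds and evaluates it on the remaining fold, $k$ times in total. Here it is applied to synthetic data from `make_classification`, which is only a stand-in for the standardized training set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the (standardized) training set
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# 4-fold cross validation: train on 3 folds, evaluate on the remaining one, 4 times
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X_demo, y_demo, cv=4)

print("Accuracy per fold:", scores)
print("Mean CV accuracy: {:.3f}".format(scores.mean()))
```

A grid search repeats exactly this procedure for every hyper parameter combination and keeps the one with the best mean validation score; the test set is never involved.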

To optimize the hyper parameters ("tuning the model") based on cross validation, the module ```model_selection``` of ```sklearn``` offers, among other tools, the object [```GridSearchCV```](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html). This object performs a grid search over a predefined parameter grid and evaluates the parameterized model at every point of the grid using cross validation. For this, it requires the classifier to be optimized, the parameter grid, and optional control parameters such as the number of folds for the cross validation.

The values of the hyper parameters to be evaluated are passed as a list of [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries). Each dictionary defines one grid, and every combination of the values listed in it is evaluated.
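`GridSearchCV` expands each dictionary into all combinations (the Cartesian product) of its value lists and evaluates the union over all dictionaries. `sklearn`'s `ParameterGrid` performs this expansion and can be used to preview the candidates. A sketch with a small, hypothetical grid:

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical small grid: two separate dictionaries, each expanded on its own
param_grid_demo = [
    {'n_neighbors': [1, 3, 5], 'weights': ['uniform', 'distance']},  # 3 x 2 = 6 combinations
    {'n_neighbors': [10], 'p': [1, 2]},                              # 1 x 2 = 2 combinations
]

combinations = list(ParameterGrid(param_grid_demo))
print(len(combinations))
for params in combinations[:3]:
    print(params)
```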

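Finally (with the default setting ```refit=True```), the model with the best-performing hyper parameter combination is retrained on the entire training set. The fitted ```GridSearchCV``` object then forwards calls such as ```predict()``` and ```score()``` to this best model, which is also available as the attribute ```best_estimator_```.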

**It is recommended to perform the hyper parameter optimization on a parallel computer!** As all hyper parameter combinations and folds are independent of each other, an almost perfect parallel [speedup](https://en.wikipedia.org/wiki/Speedup) is possible. A simple example:

Suppose you would like to find out which number between 1 and 10 is the best number of neighbors for the $k$ nearest neighbors classifier. To do so, you need to train 10 models with 10 different values for $k$ and compare them. Furthermore, you want to use 4-fold cross validation, so you need to train and evaluate $10 \cdot 4 = 40$ models in total. Assume that this takes around one minute on a single-core CPU. Since all computations are independent of each other, exactly the same training can be run on a parallel computer with, e.g., 20 CPUs in 1/20 of the time, in this example in only 3 seconds. If you want to optimize more hyper parameters or evaluate more values, the number of grid points, and thus the required computing power, grows multiplicatively. Thus, parallel processing is very important here. ***We strongly recommend working on an HPC system if possible!***

In [None]:
# Import the object GridSearchCV from the module model_selection in the package sklearn
from sklearn.model_selection import GridSearchCV

# Choice of the hyper parameters to be evaluated (all unspecified ones keep their default values)
param_grid_knn = [
    {'n_neighbors': np.arange(1, 100, 1),  # Number of neighbors (1 to 99); "arange" is a numpy function to create a range
     'weights': ['uniform', 'distance'], # Weighting of the neighbors' votes (equal, or by inverse distance)
     'p': [1, 2]},                       # Power parameter of the Minkowski distance (1: Manhattan; 2: Euclidean)
]

# Instantiate the object "GridSearchCV"
knn_model = GridSearchCV(KNeighborsClassifier(), # Classifier to be used
                         param_grid_knn,         # Hyper parameter grid
                         cv=4,                   # Number of folds for cross validation
                         verbose=10,             # Amount of printed output during the grid search (larger number -> more information)
                         n_jobs=-1               # Number of CPUs to be used in parallel; -1: use all available CPUs
                        )

# Train the model
knn_model.fit(X_train, y_train)

Note the high number of model fits reported in the first line of the output of ```fit()```, although we have selected only a few hyper parameters.
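The number of fits can be checked by hand: it is the product of the numbers of values per hyper parameter, multiplied by the number of folds. A quick check with `ParameterGrid`, assuming the same KNN grid as above (with `refit=True`, `GridSearchCV` additionally retrains the best candidate once on the whole training set):

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same KNN grid as above
param_grid_knn = [
    {'n_neighbors': np.arange(1, 100, 1),
     'weights': ['uniform', 'distance'],
     'p': [1, 2]},
]
n_folds = 4

n_candidates = len(ParameterGrid(param_grid_knn))  # 99 * 2 * 2 candidates
n_fits = n_candidates * n_folds                     # each candidate is trained once per fold
print(n_candidates, n_fits)
```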

The best hyper parameter combination can be obtained using the property ```best_params_``` of the instance of ```GridSearchCV```:

In [None]:
print(knn_model.best_params_)

To compare the performance with that of the unoptimized $k$ nearest neighbors classifier, we again compute the accuracies on the training and test set:

In [None]:
train_score_knn = knn_model.score(X_train, y_train)*100
test_score_knn = knn_model.score(X_test, y_test)*100
print("Tuned KNN - Training: {:.2f} % Test: {:.2f} %".format(train_score_knn, test_score_knn))

Much better! Can we do even better with another model?

Let us try a support vector machine for classification. For this, we use the object ```SVC``` from ```sklearn```. Note that the code for training and testing without hyper parameter optimization is exactly the same as for the $k$ nearest neighbors classifier:

In [None]:
# Import the object SVC from the module svm in the package sklearn
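from sklearn.svm import SVC

# Instantiate an object
svm_model = SVC()
# Train a model
svm_model.fit(X_train, y_train)

# Compute training and test score in percent
train_score_svm = svm_model.score(X_train, y_train)*100
test_score_svm = svm_model.score(X_test, y_test)*100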

# Print the performance
print("Support Vector Machine - Training: {:.2f} % Test: {:.2f} %".format(train_score_svm, test_score_svm))

The performance of the unoptimized SVM is similar to that of the unoptimized KNN. However, support vector machines also have hyper parameters that you should optimize; especially the kernel and the parameter $C$ (see lecture) are important.

**Attention:** The training time strongly depends on the parameter $C$. For small values, the training takes only seconds; for large values, it can take minutes. Thus, do not worry if the cross validation seems to be "frozen" for a while during the optimization.

In [None]:
# Define a parameter grid
param_grid_svm = [
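    {'C': np.logspace(-5, 2, 8), # The numpy function "logspace" returns logarithmically spaced values (here [1e-05, 1e-04, ..., 1e+02])
     'kernel': ['linear', 'rbf']}
]

# Define a grid search
svm_model = GridSearchCV(SVC(), param_grid_svm, cv=4, verbose=10, n_jobs=-1)
# Perform the grid search
svm_model.fit(X_train, y_train)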

Again, print the best hyper parameters and the performance on the training and test set.

In [None]:
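# Print the best hyper parameters
print(svm_model.best_params_)

# Compute training and test score in percent
train_score_svm = svm_model.score(X_train, y_train)*100
test_score_svm = svm_model.score(X_test, y_test)*100

# Print the performance
print("Tuned SVM - Training: {:.2f} % Test: {:.2f} %".format(train_score_svm, test_score_svm))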

Finally, we want to try out a neural network. Unfortunately, ```sklearn``` does not offer any methods for deep learning. However, it does provide a multilayer perceptron (MLP) as the object ```MLPClassifier```. The hyper parameters of neural networks always need to be optimized: even the number of layers and the number of neurons per layer depend strongly on the task, and there are no generally good default values.

In [None]:
# Import the object MLPClassifier from the module neural_network in the package sklearn
from sklearn.neural_network import MLPClassifier

# Neural networks have many hyper parameters, of which we will investigate only a small subset here
param_grid_mlp = [
    {
        'hidden_layer_sizes': [(100, 50), (100, 100, 50), (100, 100, 100, 50)],
        'alpha': np.logspace(-5, 1, 7),
        'activation': ['tanh', 'relu'],
        'early_stopping': [True, False],
        'learning_rate': ['adaptive'],  # Note: only used with solver='sgd'; no effect with the default solver 'adam'
    }
]

# Define a grid search
mlp_model = GridSearchCV(MLPClassifier(), param_grid_mlp, cv=4, verbose=10, n_jobs=-1)
# Perform a grid search
mlp_model.fit(X_train, y_train)

Note how many models needed to be trained here, and how long the training took. The considered hyper parameters were only a small subset of all tunable hyper parameters, and we have strongly limited the range of values. It is very likely that a much better combination of hyper parameters exists somewhere in the hyper parameter space. Of course, this risk exists for all classifiers, but especially for neural networks due to their large number of hyper parameters.

Again, print the best hyper parameters and the training and test performance.

In [None]:
# Print the best determined hyper parameters
print(mlp_model.best_params_)

# Compute the training and test score in percent
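train_score_mlp = mlp_model.score(X_train, y_train)*100
test_score_mlp = mlp_model.score(X_test, y_test)*100

# Print the performance
print("Tuned MLP - Training: {:.2f} % Test: {:.2f} %".format(train_score_mlp, test_score_mlp))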

To select our final model, we print the performance of all optimized classifiers on the test set together:

In [None]:
print("KNN: {:.2f} %".format(test_score_knn))
print("SVM: {:.2f} %".format(test_score_svm))
print("MLP: {:.2f} %".format(test_score_mlp))

At first glance, the $k$ nearest neighbors classifier appears to be the best, but the overall performance of all classifiers is rather poor. We should reconsider the labels for this specific task:

In general, the labels of a classification task have no relationship to each other. For example, consider the classification of animals: the label "Dog" for an image of a cat is exactly as wrong as the label "Mouse". However, for some tasks, different kinds of errors should be weighted differently. For the animal classification, for example, one could argue that the label "Dog" for a cat is as wrong as the label "Mouse", but still better than the label "Dinosaur", because dinosaurs are extinct and are reptiles rather than mammals.

In our example, there is also a relationship between the labels: classifying a wine as "4" although it was rated as "8" is obviously worse than classifying the same wine as "7". Accuracy as a metric only knows "exactly right" or "wrong", without any steps in between.

We can define our own metric in which a deviation of $\pm 1$ from the true quality still counts as correct. To compute it, we need to implement the following steps for each classifier:

1. Prediction of the quality
2. Computation of the absolute difference to the correct quality
3. Ratio of the number of predictions with a deviation of $\leq 1$ to the total number of predictions
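Written as a formula analogous to the accuracy above, with $n$ test observations, true qualities $y_i$, predicted qualities $\hat{y}_i$, and the indicator function $\mathbb{1}[\cdot]$ (1 if the condition holds, 0 otherwise):

$$\text{One-off accuracy} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\left[\,\left|\hat{y}_i - y_i\right| \leq 1\,\right]$$

For example, for the true qualities $(5, 6, 8)$ and the predictions $(5, 7, 6)$, the absolute deviations are $(0, 1, 2)$, so the one-off accuracy is $2/3 \approx 66.7\,\%$. In the code, this ratio is multiplied by 100 to report it in percent.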

In [None]:
# Predictions of the KNN
y_predicted = knn_model.predict(X_test)
# Computation of the "one-step-away performance"
one_off_knn = np.sum(np.abs(y_predicted-y_test) <= 1 ) / len(y_predicted) * 100
# Print
print("KNN one-off accuracy: Test {:.2f}".format(one_off_knn))

# Predictions of the SVM
y_predicted = svm_model.predict(X_test)
# Computation of the "one-step-away performance"
one_off_svm = np.sum(np.abs(y_predicted-y_test) <= 1 ) / len(y_predicted) * 100
# Print
print("SVM one-off accuracy: Test {:.2f}".format(one_off_svm))

# Predictions of the MLP
y_predicted = mlp_model.predict(X_test)
# Computation of the "one-step-away performance"
one_off_mlp = np.sum(np.abs(y_predicted-y_test) <= 1 ) / len(y_predicted) * 100
# Print
print("MLP one-off accuracy: Test {:.2f}".format(one_off_mlp))

We notice a slightly different ranking of the classifiers: although the KNN classifier is still the best, the MLP is now slightly worse than the SVM. This means that the wrong predictions of the MLP tend to be farther away from the true quality than those of the SVM.

**NOTICE:** With this approach, we have changed our evaluation objective at the very end of our workflow, which is not good practice. Actually, we should have used the "one-off accuracy" already during the hyper parameter optimization. This is straightforward: define a function ```one_off_accuracy(model, X, y)``` and pass it to the ```GridSearchCV``` object via the parameter ```scoring``` when instantiating it. Functions in Python are ordinary objects and can be passed via their name, just like any other object. Try it out!

In [None]:
# Definition of our adapted metric ("scoring function")
def one_off_accuracy(model, X, y):
    # Make predictions using X
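    y_predicted = model.predict(X)
    # Computation of the "one-off accuracy" in percent (a deviation of at most 1 counts as correct)
    one_off = np.mean(np.abs(y_predicted - np.asarray(y)) <= 1) * 100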
    
    # Return the computed value
    return one_off
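Alternatively (this is not the approach used in this notebook), `sklearn`'s `make_scorer` can wrap a plain metric with the signature `(y_true, y_pred)` into a scorer object that can be passed via `scoring`. A sketch; the helper name `one_off_metric` is hypothetical:

```python
import numpy as np
from sklearn.metrics import make_scorer

def one_off_metric(y_true, y_pred):
    """Hypothetical helper: percentage of predictions within +-1 of the true label."""
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)) <= 1) * 100

# Wrap the metric so that it can be passed as scoring=... to GridSearchCV
one_off_scorer = make_scorer(one_off_metric, greater_is_better=True)

# Usage (assumes the objects from this notebook):
# GridSearchCV(KNeighborsClassifier(), param_grid_knn, scoring=one_off_scorer, cv=4)
print(one_off_metric([5, 6, 8], [5, 7, 6]))
```

The advantage of this variant is that the metric itself does not need to know about the model; `make_scorer` takes care of calling `predict()`.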

In [None]:
# Repeat the grid search (exactly the same parameter grid as before, just with the new scoring function)
knn_model = GridSearchCV(KNeighborsClassifier(), param_grid_knn, scoring=one_off_accuracy, cv=4, verbose=10, n_jobs=-1)

# Perform the grid search
knn_model.fit(X_train, y_train)
# Print the performance

print("Tuned KNN: {:.2f} %".format(one_off_accuracy(knn_model, X_test, y_test)))

In [None]:
# Repeat the grid search (exactly the same parameter grid as before, just with the new scoring function)
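svm_model = GridSearchCV(SVC(), param_grid_svm, scoring=one_off_accuracy, cv=4, verbose=10, n_jobs=-1)

# Perform the grid search
svm_model.fit(X_train, y_train)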
# Print the performance

print("Tuned SVM: {:.2f} %".format(one_off_accuracy(svm_model, X_test, y_test)))

In [None]:
# Repeat the grid search (exactly the same parameter grid as before, just with the new scoring function)
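mlp_model = GridSearchCV(MLPClassifier(), param_grid_mlp, scoring=one_off_accuracy, cv=4, verbose=10, n_jobs=-1)

# Perform the grid search
mlp_model.fit(X_train, y_train)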
# Print the performance

print("Tuned MLP: {:.2f} %".format(one_off_accuracy(mlp_model, X_test, y_test)))

As expected, optimizing the hyper parameters with the new metric has again slightly improved the performance on the test set, because we now use the same scoring function for model selection and for evaluation.


---
<a id='save_model'></a><div><img src="Images/IMG-new-file-out.png" style="float:left"> <h2 style="position: relative; top: 6px; left:5px">5. Store model</h2>
<p style="position: relative; top: 10px">
The KNN classifier is still the best, even with the new metric. However, its performance is now so close to that of the SVM that the difference is almost negligible. KNN classifiers are relatively slow at prediction time, because, in the simplest (brute-force) implementation, each new observation has to be compared with every observation of the training set. Thus, the slightly worse performance of the SVM can probably be compensated by its faster predictions.

Thus, we save both models. For this, we import the module ```pickle``` and write the models together with the standardization object to a binary file. Note that without the stored standardizer, a user of our models could not standardize new data in the same way and thus could not make useful predictions!

In [None]:
# Import the module pickle
import pickle

with open('wine_quality_model.pickle', 'wb') as model_file:
    # Write the three objects in the file
    pickle.dump([knn_model, svm_model, stdScaler], model_file)
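An alternative design, not used in this notebook, is to bundle the scaler and the classifier into a single `sklearn` `Pipeline`, so that one pickled object always applies the matching standardization before predicting. A minimal sketch on synthetic data (`make_classification` stands in for the wine data):

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the wine training data
X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=0)

# Scaler and classifier in one object: fit() fits both, predict() scales first
pipeline = make_pipeline(StandardScaler(), SVC())
pipeline.fit(X_demo, y_demo)

# Serialize to bytes and restore (pickle.dump/pickle.load work the same way with files)
restored = pickle.loads(pickle.dumps(pipeline))
print((restored.predict(X_demo) == pipeline.predict(X_demo)).all())
```

A pipeline can also be passed to `GridSearchCV` directly; the hyper parameter names then carry the step prefix (e.g. `'svc__C'`), and the scaler is fitted only on the training folds of each cross validation split.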

If you would like to use the models later, you can directly load them from the binary file:

In [None]:
with open('wine_quality_model.pickle', 'rb') as model_file:
    # Load the objects from the file in three new variables
    knn, svm, scaler = pickle.load(model_file)
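Once loaded, new measurements must be passed through the stored scaler before they reach a classifier. The following self-contained sketch shows this round trip with small made-up data; `knn_demo` and `scaler_demo` are hypothetical stand-ins for the objects stored above, and `pickle.dumps`/`pickle.loads` serialize to bytes instead of a file:

```python
import pickle
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Made-up stand-ins for the stored objects (in the notebook: knn_model, svm_model, stdScaler)
X_raw = np.array([[7.0, 0.3], [7.5, 0.4], [6.0, 0.9], [6.2, 1.0]])
y_raw = np.array([6, 6, 4, 4])
scaler_demo = StandardScaler().fit(X_raw)
knn_demo = KNeighborsClassifier(n_neighbors=1).fit(scaler_demo.transform(X_raw), y_raw)

# Round trip through pickle (bytes instead of a file, to keep the sketch self-contained)
knn, scaler = pickle.loads(pickle.dumps([knn_demo, scaler_demo]))

# New data must be standardized with the *stored* scaler before predicting
X_new = np.array([[7.2, 0.35], [6.1, 0.95]])
print(knn.predict(scaler.transform(X_new)))
```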

That's it!

<img src="Images/IMG-cheers.gif" alt="Footer" style="display: block; margin-left: auto; margin-right: auto; width: 100%">


---
<div>Wine data from <a href="http://archive.ics.uci.edu/ml/datasets/Wine">UCI Machine Learning Repository</a></div>
<div>Footer via <a href="https://media.giphy.com/media/BNjLM5WNcVyM0/giphy.gif">Giphy</a></div>
<div>Icons made by <a href="https://www.flaticon.com/authors/swifticons" title="Swifticons">Swifticons</a> from <a href="https://www.flaticon.com/" title="Flaticon">www.flaticon.com</a></div>
<div>Notebook created by Yifei Li and <a href="mailto:simon.stone@tu-dresden.de?Subject=Question%20about%20Jupyter%20Notebook%20Titanic" target="_top">Simon Stone</a> and <a href="mailto:peter.steiner@tu-dresden.de?Subject=Question%20about%20Jupyter%20Notebook%20Titanic" target="_top">Peter Steiner</a></div>