<div class="alert alert-block alert-info">
<b>Number of points for this notebook:</b> 0.8
<br>
<b>Deadline:</b> March 5, 2021 (Friday) 23:00
</div>

# Test exercise

**IMPORTANT**:
**In order to be admitted to the course, you need to solve this test exercise.**

The goal of this exercise is to make sure that
* you can write Python code
* you can understand the instructions in the notebooks of the course
* you can understand documentation of machine learning libraries.

The task is to train a logistic regression model using the [`sklearn`](https://scikit-learn.org/stable/index.html) library.

In [1]:
skip_training = True  # Set this flag to True before validation and submission

In [2]:
# During evaluation, this cell sets skip_training to True
# skip_training = True

In [3]:
import pickle
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import tools
import data

In [4]:
# When running on your own computer, you can specify the data directory by:
# data_dir = tools.select_data_dir('/your/local/data/directory')
data_dir = tools.select_data_dir()

The data directory is /coursedata


# Data

We will use the *winequality* dataset, which contains red and white vinho verde wine samples rated by experts on a scale from 0 to 10 (obtained from [here](https://archive.ics.uci.edu/ml/datasets/wine+quality)).

In [5]:
trainset = data.WineQuality(data_dir, train=True, normalize=False)
x_train, quality_train = [t.numpy() for t in trainset.tensors]

testset = data.WineQuality(data_dir, train=False, normalize=False)
x_test, quality_test = [t.numpy() for t in testset.tensors]

In [6]:
# We will work with inputs normalized to zero mean and unit variance
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
scaler = lambda x: (x - mean) / std

x_train = scaler(x_train)
x_test = scaler(x_test)
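
As a quick sanity check of the normalization above, the following sketch applies the same formula to a small synthetic array. The array `x_demo`, its shape, and the random seed are made up for illustration and are not part of the course data. After scaling with its own statistics, each column should have a mean close to 0 and a standard deviation close to 1. Note that the test inputs are scaled with the training statistics, so their mean and standard deviation will generally only be approximately 0 and 1.

```python
import numpy as np

# Hypothetical data: 100 samples with 3 features, not centered and not unit variance
rng = np.random.default_rng(0)
x_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

# Per-feature statistics, computed along the sample axis
mean_demo = x_demo.mean(axis=0)
std_demo = x_demo.std(axis=0)

# Same formula as the scaler above
x_scaled = (x_demo - mean_demo) / std_demo

print(x_scaled.mean(axis=0))  # values very close to 0
print(x_scaled.std(axis=0))   # values very close to 1
```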

We transform the learning task into a binary classification problem:
* class 0 (bad wines) corresponds to wines with a quality lower than 7.
* class 1 (good wines) corresponds to the rest of the wines.

Your task is to implement a function that performs such a transformation.
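
To make the thresholding rule concrete, here is a minimal sketch on a hand-made array. The names `quality_demo` and `targets_demo` are illustrative only, and this is just one possible way to express the comparison with NumPy.

```python
import numpy as np

# Hypothetical quality scores: two below the threshold, two at or above it
quality_demo = np.array([3, 6, 7, 9])

# The comparison yields booleans, which are then cast to float32 (0.0 or 1.0)
targets_demo = (quality_demo >= 7).astype(np.float32)

print(targets_demo)  # [0. 0. 1. 1.]
```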

In [7]:
def binarize_targets(quality):
    """
    Convert wine quality values to binary values.

    Args:
      quality of shape (n_samples,): Wine quality (integer values between 3 and 9).
    
    Returns:
      targets of shape (n_samples,): Binary targets for the classification problem:
                                      class 0: quality < 7 (bad wine)
                                      class 1: quality >= 7 (good wine)
    """
    # YOUR CODE HERE
    # raise NotImplementedError()
    return np.float32(quality >= 7)

In [8]:
targets_train = binarize_targets(quality_train)
targets_test = binarize_targets(quality_test)

assert targets_train.dtype == np.float32
assert targets_train.shape == quality_train.shape
print('Success')

Success


In [9]:
# This cell tests function binarize_targets()

# Logistic regression classifier

Your task is to train a logistic regression classifier using class [`LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) from sklearn.

Note:
* **The accuracy of the trained model should be greater than 0.78 on the test set.**
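
One way to check an accuracy requirement like the one above is the `score` method of a fitted `LogisticRegression`, which returns the mean accuracy on the given data. The sketch below uses synthetic data rather than the wine dataset: the arrays, the 150/50 split, and the names `clf`, `x_tr`, `x_te` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, linearly separable toy data (not the wine dataset)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 2))
y_demo = (x_demo[:, 0] + x_demo[:, 1] > 0).astype(np.float32)

# Simple hold-out split: first 150 samples for training, the rest for testing
x_tr, y_tr = x_demo[:150], y_demo[:150]
x_te, y_te = x_demo[150:], y_demo[150:]

clf = LogisticRegression().fit(x_tr, y_tr)

# score() computes the fraction of correctly classified samples
accuracy = clf.score(x_te, y_te)
print(f"Test accuracy: {accuracy:.3f}")
```

The same number can be obtained by hand as `np.mean(clf.predict(x_te) == y_te)`.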

In [10]:
from sklearn.linear_model import LogisticRegression

In the cell below, you need to implement a function that creates the model and trains it using the provided data.

In [11]:
def create_and_train_logistic_regression(inputs, targets):
    """
    Args:
      inputs of shape (n_samples, n_inputs): Inputs in the training set.
      targets of shape (n_samples,): Targets (integer values, either 0 or 1).
    
    Returns:
      model: Trained model which is an instance of class LogisticRegression.
    """
    # YOUR CODE HERE
    # raise NotImplementedError()
    return LogisticRegression().fit(inputs, targets)

In [12]:
if not skip_training:
    model = create_and_train_logistic_regression(x_train, targets_train)

In [13]:
# Save the model to disk (the p-file will be submitted automatically together with your notebook)
if not skip_training:
    assert isinstance(model, LogisticRegression), 'model should be of type LogisticRegression.'
    with open('2_logreg.p', 'wb') as f:
        pickle.dump(model, f)
else:
    with open('2_logreg.p', 'rb') as f:
        model = pickle.load(f)
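
The save-and-load step relies on the fitted model surviving a pickle round trip. The sketch below illustrates this in memory with `pickle.dumps` and `pickle.loads` instead of a file, on a toy model. The data and the names `clf` and `restored` are assumptions for illustration and are unrelated to `2_logreg.p`.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: the label depends only on the sign of the first feature
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(50, 2))
y_demo = (x_demo[:, 0] > 0).astype(np.float32)

clf = LogisticRegression().fit(x_demo, y_demo)

# Serialize to bytes and restore, without touching the file system
restored = pickle.loads(pickle.dumps(clf))

# The restored object has the same type and makes the same predictions
print(type(restored).__name__)
print(np.array_equal(clf.predict(x_demo), restored.predict(x_demo)))
```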

In [14]:
# This cell tests your model