# Machine Learning LAB 1 
Course 2022/23: F. Barbato, M. Mel, P. Zanuttigh

The notebook contains some simple tasks to be performed about **classification and regression**. <br>
Complete all the **required code sections** and **answer to all the questions**. <br>

### IMPORTANT for the evaluation score:
1. **Read carefully all cells** and **follow the instructions**
1. **Rerun all the code from the beginning** to obtain the results for the final version of your notebook, since this is the way we will do it before evaluating your notebooks.
2. Make sure to fill the code in the appropriate places **without modifying the template**, otherwise you risk breaking later cells.
3. Please **submit the jupyter notebook file (.ipynb)**, do not submit python scripts (.py) or plain text files. **Make sure that it runs fine with the restat&run all command** - otherwise points will be deduced.
4. **Answer the questions in the appropriate cells**, not in the ones where the question is presented.

## A) Classification of Day/Night

Place your **name** and **ID number** (matricola) in the cell below. <br>
Also recall to **save the file as Surname_Name_LAB1.ipynb** otherwise your homework could get lost
<br>

**Student name**: Francesco Zane<br>
**ID Number**: 2076717

### Dataset description

The data was recorded using the new **Luxottica I-SEE glasses** in exterior conditions. These devices provide multiple **sensors mounted inside the glasses**, which can be accessed through a bluetooth connection. <br>
For the **classification** part of the notebook we will focus on the **UVA**, **UVB** and **pressure** sensors, with the goal of discriminating between **day and night** time.

![I-SEE Glasses](data/isee.png "I-SEE")

We first **import** all **the packages** that are needed.

In [1]:
import csv
from matplotlib import pyplot as plt
import numpy as np
from sklearn import linear_model, preprocessing

Change some global settings for layout purposes.

In [2]:
# if you are in the jupyter notebook environment you can change the 'inline' option with 'notebook' to get interactive plots
%matplotlib notebook
# change the limit on the line length and crop to 0 very small numbers, for clearer printing
np.set_printoptions(linewidth=500, suppress=True)

## A.1) Perceptron
In the following cells we will **implement** the **perceptron** algorithm and use it to learn a halfspace.

**TO DO (A.1.0):** **Set** the random **seed** using your **ID**. If you need to change it for testing add a constant explicitly, eg.: 1234567 + 1

In [3]:
IDnumber = 2076717 # YOUR_ID
np.random.seed(IDnumber)

Before proceding to the training steps, we **load the dataset and split it** in training and test set (the **training** set is **typically larger**, here we use a 75% training 25% test split).
The **split** is **performed after applying a random permutation** to the dataset, such permutation will **depend on the seed** you set above. Try different seeds to evaluate the impact of randomization.<br><br>
**DO NOT CHANGE THE PRE-WRITTEN CODE UNLESS OTHERWISE SPECIFIED**

In [4]:
def load_dset(filename, features=[2,3,9], label_id=-1, mode='clas'):
    # Load the dataset
    with open(filename, newline='\n') as f:
        raw_data = csv.reader(f, delimiter=',')# "lista" di righe che vengono trasformate in liste i cui elementi
                                               # sono gli elementi della riga divisi da virgole
        header = next(raw_data)                # skip first line
        print(f"Header: {header}\n")

        dataset = np.array(list(raw_data))     # al posto di una lista di liste creo una matrice
        print(f"Data shape: {dataset.shape}\n")# qui mi manda a schermo le dimensioni della matrice appena creata
        print("Dataset Example:")
        #print(dataset[0,...])                  # print the first row of the dataset, same of print(dataset[0,:])

    X = dataset[:,features].astype(float)      # extract, for each line, the selected features, default refers to
                                               # the first part of the lab: [uva, uvb, pressure] and convert them 
                                               # into float
    # in the file the last column is refered to the night(=0) or day(=1) and I want to transform these labes from 0,1
    # to -1,1
    if mode == 'clas':
        Y = dataset[:,label_id].astype(int)    # if we are in classification mode, get the labels from the provided index as indices
        Y = 2*Y-1                              # for the perceptron night --> -1, day --> 1
    else:
        Y = dataset[:,label_id].astype(float)  # otherwise get them as floats

    m = dataset.shape[0]                       # numero di righe
    print("\nNumber of samples loaded:", m)
    permutation = np.random.permutation(m)     # random permutation

    X = X[permutation]
    Y = Y[permutation]
    
    # in this way I have permutated ramdomically the m rows
    
    return X, Y

In [5]:
dataset_mine = np.array([[1,2,3,-1],
              [4,5,6,0],
              [7,8,9,1],
              [10,11,12,2],
              [13,14,15,3]])

print(dataset_mine)

X_mine = dataset_mine[:,:3]
Y_mine = dataset_mine[:,-1]

print()
print(X_mine)
print(Y_mine)


m_mine = dataset_mine.shape[0]                       
print("\nNumber of samples loaded:", m_mine)
permutation_mine = np.random.permutation(m_mine)    # prende il valore m_mine e crea un vettore inserendo in modo 
                                                    # randomico tutti i valori interi x con 0 < x < m_mine-1

print()
print(permutation_mine)
X_mine = X_mine[permutation_mine]  # quello che X_mine = X_mine[permutation_mine] fa è 
                #(lista indici)    # X_mine[i] = X_mine[permutation_mine[i]]     per ogni riga di X
                                   # ovvero modifica ogni riga di X mettendo al suo posto un'altra riga di X
                                   # segundo il vettore permutation_mine   
Y_mine = Y_mine[permutation_mine]  # stessa cosa per il vettore Y

print()
print(X_mine)
print(Y_mine)

print()
tot = np.zeros((X_mine.shape[0],X_mine.shape[1]+1))
for row in range(X_mine.shape[0]):
    riga = list(X_mine[row,:])
    riga.append(Y_mine[row])
    tot[row,:]=riga

print(tot)


[[ 1  2  3 -1]
 [ 4  5  6  0]
 [ 7  8  9  1]
 [10 11 12  2]
 [13 14 15  3]]

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]]
[-1  0  1  2  3]

Number of samples loaded: 5

[0 1 4 2 3]

[[ 1  2  3]
 [ 4  5  6]
 [13 14 15]
 [ 7  8  9]
 [10 11 12]]
[-1  0  3  1  2]

[[ 1.  2.  3. -1.]
 [ 4.  5.  6.  0.]
 [13. 14. 15.  3.]
 [ 7.  8.  9.  1.]
 [10. 11. 12.  2.]]


In [6]:
# Load the dataset
X, Y = load_dset('data/lux.csv')

Header: ['rh', 'temp', 'uva', 'uvb', 'x', 'y', 'blueg', 'blueb', 'worn', 'pressure', 'timestamp', 'hour', 'minute', 'second', 'isnight']

Data shape: (2177, 15)

Dataset Example:

Number of samples loaded: 2177


We are going to differentiate (classify) between **class "1" (day)** and **class "-1" (night)**

# Split data in training and test sets



Given $m$ total data, denote with $m_{t}$ the part used for training. Keep $m_t$ data as training data, and $m_{test}:= m-m_{t}$. <br>
For instance one can take $m_t=0.75m$ of the data as training and $m_{test}=0.25m$ as testing. <br>
Let us define as define

$\bullet$ $S_{t}$ the training data set

$\bullet$ $S_{test}$ the testing data set


The reason for this splitting is as follows:

TRAINING DATA: The training data are used to compute the empirical loss
$$
L_S(h) = \frac{1}{m_t} \sum_{z_i \in S_{t}} \ell(h,z_i)
$$
which is used to estimate $h$ in a given model class ${\cal H}$.
i.e. 
$$
\hat{h} = {\rm arg\; min}_{h \in {\cal H}} \, L_S(h)
$$

TESTING DATA: The test data set can be used to estimate the performance of the final estimated model
$\hat h_{\hat d_j}$ using:
$$
L_{{\cal D}}(\hat h_{\hat d_j}) \simeq \frac{1}{m_{test}} \sum_{ z_i \in S_{test}} \ell(\hat h_{\hat d_j},z_i)
$$

**TO DO (A.1.1):** **Divide** the **data into training and test set** (**75%** of the data in the **first** set, **25%** in the **second** one). <br>
<br>
Notice that as is common practice in Statistics and Machine Learning, **we scale the data** (= each variable) so that it is centered **(zero mean)** and has **standard deviation equal to 1**. <br>
This helps in terms of numerical conditioning of the (inverse) problems of estimating the model (the coefficients of the linear regression in this case), as well as to give the same scale to all the coefficients.

In [7]:
m = np.zeros((X.shape[0], X.shape[1]+1))   # riunisco X e Y in un unica matrice
for row in range(m.shape[0]):
    riga = list(X[row,:])
    riga.append(Y[row])
    m[row,:]=riga
print(m.shape)

dim_training = int(m.shape[0]*0.75//1)
dim_test = m.shape[0]-dim_training

m_training = m[:dim_training, :]

m_test = m[dim_training:, :]

X_training = X[:dim_training, :]

Y_training = Y[:dim_training]

X_test = X[dim_training:, :]

Y_test = Y[dim_training:]


print("Number of samples in the train set:", X_training.shape[0])
print("Number of samples in the test set:", X_test.shape[0])
print("\nNumber of night instances in test:", np.sum(Y_test==-1))
print("Number of day instances in test:", np.sum(Y_test==1))


# standardize the input matrix
# the transformation is computed on training data and then used on all the 3 sets
scaler = preprocessing.StandardScaler().fit(X_training) 

np.set_printoptions(suppress=True) # sets to zero floating point numbers < min_float_eps
X_training =  X[:dim_training, :]
print ("\nMean of the training input data:", X_training.mean(axis=0))
print ("Std of the training input data:", X_training.std(axis=0))

X_test =  X[dim_training:, :]
print ("\nMean of the test input data:", X_test.mean(axis=0))
print ("Std of the test input data:", X_test.std(axis=0))

(2177, 4)
Number of samples in the train set: 1632
Number of samples in the test set: 545

Number of night instances in test: 281
Number of day instances in test: 264

Mean of the training input data: [1.54831112 0.44134667 0.99313922]
Std of the training input data: [3.13159172 0.9759003  0.00391479]

Mean of the test input data: [1.57227593 0.44854552 0.9929836 ]
Std of the test input data: [2.97680353 0.93501953 0.00414219]


We **add a 1 in front of each sample** so that we can use a vector in **homogeneous coordinates** to describe all the coefficients of the model. This can be done with the function $hstack$ in $numpy$.

In [8]:
def to_homogeneous(X_training, X_test):
    # Add a 1 to each sample (homogeneous coordinates)
    X_training = np.hstack( [np.ones( (X_training.shape[0], 1) ), X_training] )
    X_test = np.hstack( [np.ones( (X_test.shape[0], 1) ), X_test] )
    
    return X_training, X_test

In [9]:
# convert to homogeneous coordinates using the function above
X_training, X_test = to_homogeneous(X_training, X_test)
print("Training set in homogeneous coordinates:")
print(X_training[:10])      # it's the same to write    print(X_training[ :10, : ])   first 10 rows

Training set in homogeneous coordinates:
[[1.         0.0967619  0.00689714 0.99923584]
 [1.         2.65994444 0.75228444 0.99691221]
 [1.         2.09261364 0.58387818 0.9918045 ]
 [1.         0.10013462 0.00655385 0.99005637]
 [1.         0.127      0.0071     0.99332346]
 [1.         0.0635     0.         0.99773995]
 [1.         0.0635     0.00426    0.99419196]
 [1.         0.10583333 0.00355    0.99066042]
 [1.         3.67303922 1.05932    0.9912934 ]
 [1.         0.142875   0.008165   0.99317995]]


**TO DO (A.1.2):** Now **complete** the function *perceptron*. <br>
The **perceptron** algorithm **does not terminate** if the **data** is not **linearly separable**, therefore your implementation should **terminate** if it **reached the termination** condition seen in class **or** if a **maximum number of iterations** have already been run, where one **iteration** corresponds to **one update of the perceptron weights**. In case the **termination** is reached **because** the **maximum** number of **iterations** have been completed, the implementation should **return the best model** seen throughout .

The input parameters to pass are:
- $X$: the matrix of input features, one row for each sample
- $Y$: the vector of labels for the input features matrix X
- $max\_num\_iterations$: the maximum number of iterations for running the perceptron

The output values are:
- $best\_w$: the vector with the coefficients of the best model (or the latest, if the termination condition is reached)
- $best\_error$: the *fraction* of misclassified samples for the best model

In [10]:
def count_errors(current_w, X, Y):
    pred_lista = []
    wrong_lista = []
    for sample in range (X.shape[0]):
        result = np.sign(np.inner(X[sample,:], current_w))
        pred_lista.append(result)
        
        if result*Y[sample]>0:
            wrong_lista.append(0)
        else:
            wrong_lista.append(True)                
            
    pred = np.array(pred_lista)
    wrong = np.array(wrong_lista)
    num_misclassified = wrong.sum()     
    
    if num_misclassified > 0:
        index_misclassified = np.where(wrong)[0][0]
    else:
        index_misclassified = -1 
    
    return num_misclassified, index_misclassified   


def perceptron_update(current_w, x, y):
    new_w = current_w + y*x      # x is a 4-vector features associated to the wrong sample and y its label
    return new_w


def perceptron(X, Y, max_num_iterations):
    
    num_samples = X.shape[0]   
    
    best_error = num_samples+1

    curr_w = np.zeros((1,4))
    
    best_w = curr_w.copy()
    
    num_misclassified, index_misclassified = count_errors(curr_w, X, Y)

    if num_misclassified < best_error:
        best_error = num_misclassified
        best_w = curr_w
    
    num_iter = 0

    while index_misclassified != -1 and num_iter < max_num_iterations:        
        
        curr_w = perceptron_update(curr_w, X[index_misclassified,:], Y[index_misclassified])

        per = np.random.permutation(num_samples) 
        X = X[per]
        Y = Y[per]
        
        num_misclassified, index_misclassified = count_errors(curr_w, X, Y)
        
        if num_misclassified < best_error:
            best_error = num_misclassified
            best_w = curr_w
        
        num_iter += 1

    best_error = best_error/num_samples
    
    return best_w, best_error


Now we use the implementation above of the perceptron to learn a model from the training data using 100 iterations and print the error of the best model we have found.

In [11]:
# Now run the perceptron for 100 iterations
w_found, error = perceptron(X_training,Y_training, 100)
print("Training Error of perceptron (100 iterations): " + str(error))
print("Error of " + str(error*100) + "%")


Training Error of perceptron (100 iterations): 0.032475490196078434
Error of 3.2475490196078436%


**TO DO (A.1.3):** use the best model $w\_found$ to **predict the labels for the test dataset** and print the fraction of misclassified samples in the test set (the test error that is an estimate of the true loss).

In [12]:
num_errors, _ = count_errors(w_found, X_test,Y_test)

true_loss_estimate = num_errors/Y_test.shape[0]
print("Test Error of perceptron (100 iterations): " + str(true_loss_estimate))
print("Error of " + str(true_loss_estimate*100) + "%")

Test Error of perceptron (100 iterations): 0.047706422018348627
Error of 4.770642201834862%


**TO DO (A.Q1) [Answer the following]** <br>
What about the difference between the training error and the test error  in terms of fraction of misclassified samples? Explain what you observe. (Notice that with a very small dataset like this one results can change due to randomization, try to run with different random seeds if you get unexpected results).

<div class="alert alert-block alert-info">
**ANSWER A.Q1**:<br>
Here I can notice than, as I aspect, the value of the training error is smaller than the value of the true loss estimate. This appends becuase in the test set there are some input never seen from the algorithm during the 
    
    
    FINISCIIIIIIIIIIIIIIIIII
 </div>

**TO DO (A.1.4):** Copy the code from the last 2 cells above in the cell below and repeat the training with 3000 iterations. Then print the error in the training set and the estimate of the true loss obtained from the test set.

In [13]:
w_found, error = perceptron(X_training,Y_training, 3000) 
print("Training Error of perceptron (3000 iterations): " + str(error))
print("Error of " + str(error*100) + "%")

num_errors, _ =  count_errors(w_found, X_test,Y_test)

true_loss_estimate = num_errors/Y_test.shape[0]
print("Test Error of perceptron (3000 iterations): " + str(true_loss_estimate))
print("Error of " + str(true_loss_estimate*100) + "%")

Training Error of perceptron (3000 iterations): 0.029411764705882353
Error of 2.941176470588235%
Test Error of perceptron (3000 iterations): 0.045871559633027525
Error of 4.587155963302752%


**TO DO (A.Q2) [Answer the following]** <br>
What about the difference between the training error and the test error in terms of the fraction of misclassified samples) when running for a larger number of iterations? Explain what you observe and compare with the previous case.

<div class="alert alert-block alert-info">
**ANSWER A.Q2**:<br>
The answers don't change very much and in both cases the error is bigger than zero. This means that the data are not perfectly linearly divisible. In this situation the algorithm will never fild a perfect solution, the best that it can do is to minimize the number of misclassified samples. The fact that with 100 iterations and with 3000 iterations we obtain similar results means that already 100 iterations are enough for the algorithm to obtain the solution that minimizes the number of misclassified samples. 
    
    
        CONTROLAAAAAAAAAAAAAAAAAAAAAAAA
</div>

# A.2) Logistic Regression
Now we use **logistic regression**, exploiting the implementation in **Scikit-learn**, to predict labels. We will also plot the decision boundaries of logistic regression.

We first load the dataset again.

In [14]:
# Load the dataset
X, Y = load_dset('data/lux.csv')

Header: ['rh', 'temp', 'uva', 'uvb', 'x', 'y', 'blueg', 'blueb', 'worn', 'pressure', 'timestamp', 'hour', 'minute', 'second', 'isnight']

Data shape: (2177, 15)

Dataset Example:

Number of samples loaded: 2177


**TO DO (A.2.1):** As for the previous part, **divide the data** into training and test (75%-25%) and **add a 1 as first component** to each sample.

In [15]:
# compute the splits
m_training = # ADD YOUR CODE HERE

# m_test is the number of samples in the test set (total-training)
m_test =  # ADD YOUR CODE HERE

# X_training = instances for training set
X_training =  # ADD YOUR CODE HERE
# Y_training = labels for the training set
Y_training =  # ADD YOUR CODE HERE

# X_test = instances for test set
X_test =   # ADD YOUR CODE HERE
# Y_test = labels for the test set
Y_test =  # ADD YOUR CODE HERE

print("Number of samples in the train set:", X_training.shape[0])
print("Number of samples in the test set:", X_test.shape[0])
print("\nNumber of night instances in test:", np.sum(Y_test==-1))
print("Number of day instances in test:", np.sum(Y_test==1))

# standardize the input matrix
# the transformation is computed on training data and then used on all the 3 sets
scaler = preprocessing.StandardScaler().fit(X_training) 

np.set_printoptions(suppress=True) # sets to zero floating point numbers < min_float_eps
X_training =  # ADD YOUR CODE HERE
print ("Mean of the training input data:", X_training.mean(axis=0))
print ("Std of the training input data:",X_training.std(axis=0))

X_test =  # ADD YOUR CODE HERE
print ("Mean of the test input data:", X_test.mean(axis=0))
print ("Std of the test input data:", X_test.std(axis=0))

# convert to homogeneous coordinates
X_training, X_test = to_homogeneous(X_training, X_test)
print("Training set in homogeneous coordinates:")
print(X_training[:10])

SyntaxError: invalid syntax (561101725.py, line 2)

To define a logistic regression model in Scikit-learn use the instruction

$linear\_model.LogisticRegression(C=1e5)$

($C$ is a parameter related to *regularization*, a technique that
we will see later in the course. Setting it to a high value is almost
as ignoring regularization, so the instruction above corresponds to the
logistic regression you have seen in class.)

To learn the model you need to use the $fit(...)$ instruction and to predict you need to use the $predict(...)$ function. <br>
See the Scikit-learn documentation for how to use it [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).

**TO DO (A.2.2):** **Define** the **logistic regression** model, then **learn** the model using **the training set** and **predict** on the **test set**. Then **print** the **fraction of samples misclassified** in the training set and in the test set.

In [None]:
# part on logistic regression for 2 classes
logreg = # ADD YOUR CODE HERE  # C should be very large to ignore regularization (see above)

# learn from training set: hint use fit(...)
# ADD YOUR CODE HERE
print("Intercept:" , logreg.intercept_)
print("Coefficients:" , logreg.coef_)

# predict on training set
predicted_training = # ADD YOUR CODE HERE

# print the error rate = fraction of misclassified samples
error_count_training = (predicted_training != Y_training).sum()
error_rate_training = # ADD YOUR CODE HERE
print("Error rate on training set: "+str(error_rate_training))

# predict on test set
predicted_test = # ADD YOUR CODE HERE

#print the error rate = fraction of misclassified samples
error_count_test = (predicted_test != Y_test).sum()
error_rate_test = # ADD YOUR CODE HERE
print("Error rate on test set: " + str(error_rate_test))

**TO DO (A.2.3)** Now **pick two features** and restrict the dataset to include only two features, whose indices are specified in the $idx0$ and $idx1$ variables below. Then split into training and test.

In [None]:
# we define some additional variables:
#    feature_names  == name to display on the plots
#    feature_scales == scale (linear/log) to use for that metric
# this will be referred to in the following cell - to plot the values
feature_names  = ["UVA", "UVB", "Atm. Pressure"]
feature_scales = ["log", "log", "linear"]

# remember that we selected 3 sensors (uva, uvb, pressure), therefore
# to make the plot we need to reduce the data to 2D, so we choose two of them
# to do so change the following indices (between 0 and 2, inclusive)
idx0 = # ADD YOUR CODE HERE
idx1 = # ADD YOUR CODE HERE

X_reduced = X[:,[idx0, idx1]]

# re-initialize the dataset splits, with the reduced sets
X_training = # ADD YOUR CODE HERE
Y_training = # ADD YOUR CODE HERE

X_test = # ADD YOUR CODE HERE
Y_test = # ADD YOUR CODE HERE

Now learn a model using the training data and measure the performances.

In [None]:
# learning from training data
# ADD YOUR CODE HERE

# print the error rate = fraction of misclassified samples
error_rate_test = # ADD YOUR CODE HERE
error_rate_test = # ADD YOUR CODE HERE
print("Error rate on test set: " + str(error_rate_test))

**TO DO (A.Q3) [Answer the following]** <br>
Which features did you select and why? <br>
Compare the perfomance of the classifiers trained with every combination of two features with that of the baseline (which used all 3 features).

<div class="alert alert-block alert-info">
**ANSWER A.Q3**:<br>
answer here
</div>

If everything is ok, the code below uses the model in $logreg$ to plot the decision region for the two features chosen above, with colors denoting the predicted value. It also plots the points (with correct labels) in the training set. It makes a similar plot for the test set.

In [None]:
#firstly we import an additional module to suppress warnings
import warnings
warnings.filterwarnings("ignore")

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X_reduced[:, 0].min() - .5, X_reduced[:, 0].max() + .5
y_min, y_max = X_reduced[:, 1].min() - .5, X_reduced[:, 1].max() + .5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500), np.linspace(y_min, y_max, 500))

Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)

# shift all the values to positive, to avoid problems with log-log plots
shift = .001 - X_reduced.min(axis=0)

fig, axs = plt.subplots(1, 2, figsize=(8,3))
axs[0].pcolormesh(xx + shift[0], yy + shift[1], Z, cmap=plt.cm.Set3)
axs[1].pcolormesh(xx + shift[0], yy + shift[1], Z, cmap=plt.cm.Set3)

# Plot also the training points
axs[0].scatter(X_training[:, 0] + shift[0], X_training[:, 1] + shift[1], c=Y_training, edgecolors='k', cmap=plt.cm.Set3)

axs[0].set_xlabel(feature_names[idx0])
axs[0].set_ylabel(feature_names[idx1])
axs[0].set_xscale(feature_scales[idx0])
axs[0].set_yscale(feature_scales[idx1])

axs[0].set_xlim(xx.min(), xx.max())
axs[0].set_ylim(yy.min(), yy.max())
axs[0].set_xticks(())
axs[0].set_yticks(())
axs[0].set_title('Training set')

# Plot also the test points 
axs[1].scatter(X_test[:, 0] + shift[0], X_test[:, 1] + shift[1], c=Y_test, edgecolors='k', cmap=plt.cm.Set3, marker='s')
axs[1].set_xlabel(feature_names[idx0])
axs[1].set_ylabel(feature_names[idx1])
axs[1].set_xscale(feature_scales[idx0])
axs[1].set_yscale(feature_scales[idx1])

axs[1].set_xlim(xx.min(), xx.max())
axs[1].set_ylim(yy.min(), yy.max())
axs[1].set_xticks(())
axs[1].set_yticks(())
axs[1].set_title('Test set')

plt.show()

# finally turn the warnings back on
warnings.filterwarnings("default")

# B) Linear Regression on the Atmospheric Conditions

As before, these **samples** were **extracted from** a Luxottica **I-SEE** device. <br>
For the **second part** of the laboratory we will **focus on linear regression**. We will try to estimate the **relative humidity** starting from **temperature** and **atmospheric pressure**.

First of all, we load again the dataset. Notice that the indices of the features were changed!

In [None]:
# Load the dataset
X, Y = load_dset('data/lux.csv', [0,1], 9, mode='reg')

**TO DO (B.1)**: split the data in training and test sets (70%-30%)

In [None]:
m = np.zeros((X.shape[0], X.shape[1]+1))   # riunisco X e Y in un unica matrice
for row in range(m.shape[0]):
    riga = list(X[row,:])
    riga.append(Y[row])
    m[row,:]=riga
print(m.shape)

dim_training = int(m.shape[0]*0.70//1)
dim_test = m.shape[0]-dim_training

m_training = m[:dim_training, :]

m_test = m[dim_training:, :]

X_training = X[:dim_training, :]

Y_training = Y[:dim_training]

X_test = X[dim_training:, :]

Y_test = Y[dim_training:]

# standardize the input matrix
# the transformation is computed on training data and then used on all the 3 sets
scaler = preprocessing.StandardScaler().fit(X_training) 

np.set_printoptions(suppress=True) # sets to zero floating point numbers < min_float_eps
X_training =  X[:dim_training, :]
print ("Mean of the training input data:", X_training.mean(axis=0))
print ("Std of the training input data:",X_training.std(axis=0))

X_test = X_test = X[dim_training:, :]
print ("Mean of the test input data:", X_test.mean(axis=0))
print ("Std of the test input data:", X_test.std(axis=0))


# Model Training 

The model is trained (= estimated) minimizing the empirical error
$$
L_S(h) := \frac{1}{m_t} \sum_{z_i \in S_{t}} \ell(h,z_i)
$$
When the loss function is the quadratic loss
$$
\ell(h,z) := (y - h(x))^2
$$
we define  the Residual Sum of Squares (RSS) as
$$
RSS(h):= \sum_{z_i \in S_{t}} \ell(h,z_i) = \sum_{z_i \in S_{t}} (y_i - h(x_i))^2
$$ so that the training error becomes
$$
L_S(h) = \frac{RSS(h)}{m_t}
$$

We recal that, for linear models we have $h(x) = <w,x>$ and the Empirical error $L_S(h)$ can be written
in terms of the vector of parameters $w$ in the form
$$
L_S(w) = \frac{1}{m_t} \|Y - X w\|^2
$$
where $Y$ and $X$ are the matrices whose $i-$th row are, respectively, the output data $y_i$ and the input vectors $x_i^\top$. 

**TO DO (B.2):** compute the linear regression coefficients using np.linalg.lstsq from scikitlearn

In [None]:
#add a 1 at the beginning of each sample for training, and testing (use homogeneous coordinates)
X_trainingH, X_testH =  to_homogeneous(X_training, X_test)
print("Training set in homogeneous coordinates:")
print(X_trainingH[:10])

In [None]:
X_trainingH_T = X_trainingH.transpose()
A = np.dot(X_trainingH_T, X_trainingH)
print(type(A))
print(A.shape)
B = np.dot(X_trainingH_T, Y_training)
print(type(B))
print(B.shape)


w_np, RSStr_np, rank_Xtr, sv_Xtr = np.linalg.lstsq(A,B, rcond=None)

print("LS coefficients with numpy lstsq:", w_np)

Y_trainingR = Y_training
RSStr_train = 0
for sample in range(X_trainingH.shape[0]):
    RSStr_train = RSStr_train + pow(Y_trainingR[sample]-np.dot(X_trainingH_T[:,sample], w_np),2)

print(RSStr_train, "    ", RSStr_np)

print("RSS with numpy lstsq: ", RSStr_np)
print("Empirical risk with numpy lstsq:", RSStr_np/m_training)

In [None]:
X_trainingH = np.array([[1,-1],[1,0],[1,1],[1,2]])
Y_training = np.array([[-1],[0],[1],[2]])

X_trainingH_T = X_trainingH.transpose()
A = np.dot(X_trainingH_T, X_trainingH)
B = np.dot(X_trainingH_T, Y_training)
print("A.shape",A.shape)
print("B.shape",B.shape)
print(X_trainingH_T.shape)

w_np, RSStr_np, rank_Xtr, sv_Xtr = np.linalg.lstsq(A,B, rcond=None)

print("LS coefficients with numpy lstsq:", w_np)
print(w_np.shape)
print(rank_Xtr)
print(RSStr_np)
print(RSStr_np.shape)

RSStr_train = 0
for sample in range(X_trainingH.shape[0]):
    print(pow(Y_trainingR[sample]-np.dot(X_trainingH_T[:,sample], w_np),2))
    RSStr_train = RSStr_train + pow(Y_trainingR[sample]-np.dot(X_trainingH_T[:,sample], w_np),2)
print("My RSS error:", RSStr_train)

## Data prediction 

Compute the output predictions on both training and validation set.and compute the Residual Sum of Sqaures (RSS). 

**TO DO (B.3)**: Compute these quantities on  training, validation and test sets.

In [None]:
#compute predictions on training and validation

#prediction_training 
prediction_training =  # ADD YOUR CODE HERE
prediction_test =  # ADD YOUR CODE HERE

#what about the loss for points in the test data?
RSS_test =  # ADD YOUR CODE HERE

print("RSS on test data:",  RSS_test)
print("Loss estimated from test data:", RSS_test/m_test)

**TO DO (B.Q1) [Answer the following]** <br>
Comment on the results you get and on the difference between the train and test errors.

<div class="alert alert-block alert-info">
**ANSWER B.Q1**:<br>
answer here
</div>

Now let's plot the data and check our estimation result.

In [None]:
fig = plt.figure(figsize=(10,18))

ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.scatter(X_training[:,0], X_training[:,1], Y_training, label="Input Data")
ax1.scatter(X_training[:,0], X_training[:,1], prediction_training, label="Prediction Data")
ax1.set_xlabel("Norm. Temperature")
ax1.set_ylabel("Norm. Relative Humidity")
ax1.set_zlabel("Norm. Atm. Pressure")
ax1.set_title("Training Set")
ax1.legend()

ax2 = fig.add_subplot(1, 2, 2, projection='3d')
ax2.scatter(X_test[:,0], X_test[:,1], Y_test, label="Input Data")
ax2.scatter(X_test[:,0], X_test[:,1], prediction_test, label="Prediction Data")
ax2.set_xlabel("Norm. Temperature")
ax2.set_ylabel("Norm. Relative Humidity")
ax2.set_zlabel("Norm. Atm. Pressure")
ax2.set_title("Test Set")
ax2.legend()
plt.show()

## Ordinary Least-Squares using scikit-learn
Another fast way to compute the LS estimate is through sklearn.linear_model

In [None]:
LinReg = linear_model.LinearRegression()  # build the object LinearRegression
LinReg.fit(X_training, Y_training)  # estimate the LS coefficients
print("Intercept:", LinReg.intercept_)
print("Least-Squares Coefficients:", LinReg.coef_)
prediction_training = LinReg.predict(X_training)  # predict output values on training set
prediction_test = LinReg.predict(X_test)  # predict output values on test set
print("Measure on training data:", 1-LinReg.score(X_training, Y_training))