# Data Loading and Preprocessing

We consider the same dataset used in the labs, containing house sale prices for King County, which includes Seattle. It covers homes sold between May 2014 and May 2015.

https://www.kaggle.com/harlfoxem/housesalesprediction

For each house we know 18 house features (e.g., number of bedrooms, number of bathrooms, etc.) plus its price, which is what we would like to predict.

## TO DO: Insert your ID number ("numero di matricola") below

In [1]:
#put here your student ID number ("numero di matricola")
numero_di_matricola = 2019136

Load the required packages

In [2]:
#import all packages needed

%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Read the data, remove data samples/points with missing values (NaN), and print some statistics.

In [3]:
#load the data
df = pd.read_csv('kc_house_data.csv', sep = ',')

#remove the data samples with missing values (NaN)
df = df.dropna() 

df.describe()

| column | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| id | 3164.0 | 4645240000.0 | 2854203000.0 | 1000102.0 | 2199775000.0 | 4027701000.0 | 7358175000.0 | 9839301000.0 |
| price | 3164.0 | 535435.8 | 380900.4 | 75000.0 | 315000.0 | 445000.0 | 640250.0 | 5350000.0 |
| bedrooms | 3164.0 | 3.381163 | 0.895472 | 0.0 | 3.0 | 3.0 | 4.0 | 8.0 |
| bathrooms | 3164.0 | 2.071903 | 0.768212 | 0.0 | 1.5 | 2.0 | 2.5 | 6.0 |
| sqft_living | 3164.0 | 2070.027813 | 920.251879 | 380.0 | 1430.0 | 1910.0 | 2500.0 | 8010.0 |
| sqft_lot | 3164.0 | 15250.54 | 42544.57 | 649.0 | 5453.75 | 8000.0 | 11222.5 | 1651359.0 |
| floors | 3164.0 | 1.434893 | 0.507792 | 1.0 | 1.0 | 1.0 | 2.0 | 3.5 |
| waterfront | 3164.0 | 0.009798 | 0.098513 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| view | 3164.0 | 0.244311 | 0.776298 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 |
| condition | 3164.0 | 3.459229 | 0.682592 | 1.0 | 3.0 | 3.0 | 4.0 | 5.0 |
| grade | 3164.0 | 7.615676 | 1.166324 | 3.0 | 7.0 | 7.0 | 8.0 | 12.0 |
| sqft_above | 3164.0 | 1761.252212 | 815.934864 | 380.0 | 1190.0 | 1545.0 | 2150.0 | 6720.0 |
| sqft_basement | 3164.0 | 308.775601 | 458.977904 | 0.0 | 0.0 | 0.0 | 600.0 | 2620.0 |
| yr_built | 3164.0 | 1967.489254 | 28.095275 | 1900.0 | 1950.0 | 1969.0 | 1990.0 | 2015.0 |
| yr_renovated | 3164.0 | 94.668774 | 424.439427 | 0.0 | 0.0 | 0.0 | 0.0 | 2015.0 |
| zipcode | 3164.0 | 98077.125158 | 54.172937 | 98001.0 | 98032.0 | 98059.0 | 98117.0 | 98199.0 |
| lat | 3164.0 | 47.557868 | 0.140789 | 47.1775 | 47.459575 | 47.5725 | 47.68025 | 47.7776 |
| long | 3164.0 | -122.212337 | 0.139577 | -122.514 | -122.32425 | -122.226 | -122.124 | -121.315 |
| sqft_living15 | 3164.0 | 1982.544564 | 686.25667 | 620.0 | 1480.0 | 1830.0 | 2360.0 | 5790.0 |
| sqft_lot15 | 3164.0 | 13176.302465 | 25413.180755 | 660.0 | 5429.5 | 7873.0 | 10408.25 | 425581.0 |


Get the feature matrix and the vector of target values. We want to predict the price by using features other than id as input.

In [4]:
Data = df.values
# m = number of input samples
m = Data.shape[0]
print("Amount of data:",m)
Y = Data[:m,2]
X = Data[:m,3:]

feature_names = df.columns[3:]

Amount of data: 3164
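`Data[:m,2]` and `Data[:m,3:]` rely on the column order of the CSV file. Since `df.describe()` lists only numeric columns while `price` is still taken from column 2, I assume the file has a non-numeric `date` column between `id` and `price` (an assumption, not shown above). This is also why `df.values` is an `object` array here. A minimal sketch of selecting the same columns by name, on a toy frame with made-up values (all `_toy` names are hypothetical), might look like this:

```python
import pandas as pd

# Toy frame with made-up values; it only mimics the assumed leading columns
# of kc_house_data.csv ("id", a non-numeric "date", then "price").
df_toy = pd.DataFrame({
    "id": [101, 102, 103],
    "date": ["20141013T000000", "20141209T000000", "20150225T000000"],
    "price": [250000.0, 540000.0, 180000.0],
    "bedrooms": [3, 3, 2],
    "bathrooms": [1.0, 2.25, 1.0],
})

# Select the target and the features by name instead of by position
target_col = "price"
dropped_cols = ["id", "date", target_col]
Y_toy = df_toy[target_col].to_numpy(dtype=float)
feature_names_toy = df_toy.columns.drop(dropped_cols)
X_toy = df_toy[feature_names_toy].to_numpy(dtype=float)

print(list(feature_names_toy))  # ['bedrooms', 'bathrooms']
print(X_toy.shape)              # (3, 2)
```

Selecting by name keeps working if the column order changes, and the resulting arrays are `float` rather than `object`.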


We split the $m$ samples of the data into 3 parts: one used for training and choosing the parameters, one for choosing among different models, and one for testing. The part for training and choosing the parameters consists of $m_{train}=\lfloor 2m/3 \rfloor$ samples, the one for choosing among different models consists of $m_{val}= \lfloor (m - m_{train})/2 \rfloor$ samples, and the remaining part consists of $m_{test}=m - m_{train} - m_{val}$ samples.
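Plugging in $m = 3164$ (the amount of data printed above), and taking the integer part as `int(...)` does in the code, the split sizes work out as:

```latex
\begin{aligned}
m_{train} &= \lfloor \tfrac{2}{3} \cdot 3164 \rfloor = \lfloor 2109.33 \rfloor = 2109 \\
m_{val}   &= \lfloor (3164 - 2109)/2 \rfloor = \lfloor 527.5 \rfloor = 527 \\
m_{test}  &= 3164 - 2109 - 527 = 528
\end{aligned}
```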

In [5]:
# Split data into train (2/3 of samples), validation (1/6 of samples), and test data (the rest)
m_train = int(2./3.*m)
m_val = int((m-m_train)/2.)
m_test = m - m_train - m_val
print("Amount of data for training and deciding parameters:",m_train)
print("Amount of data for validation (choosing among different models):",m_val)
print("Amount of data for test:",m_test)
from sklearn.model_selection import train_test_split

#Xtrain_and_val, Ytrain_and_val is the part of data for training and validation
#Xtest, Ytest is the part of data for testing
Xtrain_and_val, Xtest, Ytrain_and_val, Ytest = train_test_split(X, Y, test_size=m_test/m, random_state=numero_di_matricola)

#if you need to consider a specific training and validation split, use
#Xtrain, Ytrain for training and Xval, Yval for validation
Xtrain, Xval, Ytrain, Yval = train_test_split(Xtrain_and_val, Ytrain_and_val, test_size=m_val/(m_train+m_val), random_state=numero_di_matricola)

Amount of data for training and deciding parameters: 2109
Amount of data for validation (choosing among different models): 527
Amount of data for test: 528


Let's scale the data.

In [6]:
# Data pre-processing
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(Xtrain)
Xtrain_scaled = scaler.transform(Xtrain)
Xtrain_and_val_scaled = scaler.transform(Xtrain_and_val)
Xval_scaled = scaler.transform(Xval)
Xtest_scaled = scaler.transform(Xtest)
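Here the scaler is fitted on `Xtrain` only and then applied to every set. As far as I know, `StandardScaler` subtracts the per-feature mean and divides by the per-feature population standard deviation, leaving zero-variance features unscaled. The following NumPy sketch reproduces that idea on toy data; `fit_standardizer` and `standardize` are hypothetical helpers, not scikit-learn API:

```python
import numpy as np

def fit_standardizer(X_fit):
    """Per-feature mean and standard deviation, estimated on X_fit only."""
    mean = X_fit.mean(axis=0)
    std = X_fit.std(axis=0)      # population std (ddof=0)
    std[std == 0] = 1.0          # leave constant features unscaled
    return mean, std

def standardize(X, mean, std):
    """Apply (x - mean) / std using statistics learned on the fitting set."""
    return (X - mean) / std

# toy example: the "test" rows are scaled with the training statistics
rng = np.random.default_rng(0)
X_train_toy = rng.normal(loc=[2000.0, 3.0], scale=[900.0, 0.9], size=(100, 2))
X_test_toy = rng.normal(loc=[2000.0, 3.0], scale=[900.0, 0.9], size=(20, 2))

mean, std = fit_standardizer(X_train_toy)
X_train_toy_scaled = standardize(X_train_toy, mean, std)
X_test_toy_scaled = standardize(X_test_toy, mean, std)
```

Only the training features end up with exactly zero mean and unit variance; the test features are close to it, but never influence the statistics.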

# Neural Networks

Let's learn the best neural network with 1 hidden layer and between 1 and 9 hidden nodes, choosing the best number of hidden nodes with cross-validation.

In [7]:
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

mlp_cv = MLPRegressor()
param_grid = {'hidden_layer_sizes': [i for i in range(1,10)],
              'activation': ['relu'],
              'solver': ['lbfgs'], 
              'random_state': [numero_di_matricola]
             }
mlp_GS = GridSearchCV(mlp_cv, param_grid=param_grid, 
                   cv=5, verbose=True)
mlp_GS.fit(Xtrain_and_val_scaled, Ytrain_and_val)

Fitting 5 folds for each of 9 candidates, totalling 45 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)

GridSearchCV(cv=5, estimator=MLPRegressor(),
             param_grid={'activation': ['relu'],
                         'hidden_layer_sizes': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                         'random_state': [2019136], 'solver': ['lbfgs']},
             verbose=True)
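To my understanding, for a regressor `cv=5` makes `GridSearchCV` use `KFold(n_splits=5)` without shuffling: the training and validation data are cut into 5 contiguous folds, each candidate is trained on 4 folds and scored (with $R^2$) on the remaining one, and `best_score_` is the mean validation score of the best candidate over the 5 folds. A sketch of that fold construction (`kfold_indices` is a hypothetical helper written to mirror this behaviour):

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5):
    """Yield (train_idx, val_idx) for contiguous, unshuffled folds.

    The first n_samples % n_splits folds get one extra sample.
    """
    fold_sizes = np.full(n_splits, n_samples // n_splits, dtype=int)
    fold_sizes[: n_samples % n_splits] += 1
    indices = np.arange(n_samples)
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = indices[start:stop]
        train_idx = np.concatenate([indices[:start], indices[stop:]])
        yield train_idx, val_idx
        start = stop

# 2636 = training + validation samples used by the grid search above
fold_sizes = [len(val) for _, val in kfold_indices(2636, n_splits=5)]
print(fold_sizes)  # [528, 527, 527, 527, 527]
```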

Now let's check which is the best parameter, and compare the best NN with the linear model (learned on train and validation) on test data.

In [8]:
#let's print the best model according to grid search
print("Best model: ",mlp_GS.best_estimator_)
#let's print the error 1-R^2 for the best model
print("Error (1-R^2) of best model: ",1. - mlp_GS.best_score_)

Best model:  MLPRegressor(hidden_layer_sizes=6, random_state=2019136, solver='lbfgs')
Error (1-R^2) of best model:  0.1762817590717386


Let's learn the best NN using all of the training and validation data, and then compare its error on the training and validation data with its error on the test data.

In [9]:
best_mlp = mlp_GS.best_estimator_.fit(Xtrain_and_val_scaled,Ytrain_and_val)

print("Error best model on train and validation: ",1. - best_mlp.score(Xtrain_and_val_scaled,Ytrain_and_val))
print("Error best model on test data: ",1. - best_mlp.score(Xtest_scaled,Ytest))

Error best model on train and validation:  0.10659657991835736
Error best model on test data:  0.20012710085660879


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


# Linear Regression

Now let's learn the linear model on the training and validation data, and get its error (1-R^2) on the training and validation data and on the test data.

In [10]:
from sklearn import linear_model
#LR the linear regression model
LR = linear_model.LinearRegression()

#fit the model on training and validation data
LR.fit(Xtrain_and_val_scaled, Ytrain_and_val)

print("1 - coefficient of determination on training data: "+str(1 - LR.score(Xtrain_and_val_scaled,Ytrain_and_val)))
print("1 - coefficient of determination on test data: "+str(1 - LR.score(Xtest_scaled,Ytest)))

1 - coefficient of determination on training data: 0.27780071518451566
1 - coefficient of determination on test data: 0.3026710113890685
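The errors printed in this notebook are $1 - R^2$, since `score` of a scikit-learn regressor returns the coefficient of determination $R^2$. Written out, $1 - R^2 = \sum_i (y_i - h(\mathbf{x}_i))^2 / \sum_i (y_i - \bar{y})^2$, so an error of 0 means perfect predictions, 1 means no better than always predicting the mean $\bar{y}$ of the evaluated targets, and values above 1 are possible. A small NumPy sketch (the helper names are hypothetical):

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def error_1_minus_r2(y_true, y_pred):
    """The error used throughout this notebook: 1 - R^2 = SS_res / SS_tot."""
    return 1.0 - r2(y_true, y_pred)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(error_1_minus_r2(y, y))                     # 0.0: perfect predictions
print(error_1_minus_r2(y, np.full(4, y.mean())))  # 1.0: always predicting the mean
print(error_1_minus_r2(y, y[::-1]))               # 4.0: worse than the mean
```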


# k-Nearest Neighbours

You will now explore the k-Nearest Neighbours (kNN) method for regression. In order to do this, you will need to load the scikit-learn class *neighbors.KNeighborsRegressor*.

k-Nearest Neighbours for regression works as follows: the predicted value $h(\textbf{x})$ for an instance $\textbf{x}$ is obtained by first finding the $\ell$ instances *in the training set* that are closest to $\textbf{x}$; the predicted value $h(\textbf{x})$ is then the mean of the targets of these $\ell$ instances. $\ell$ is a parameter of the method. Optionally, the targets of the $\ell$ instances used for prediction can be weighted by the inverse of their distance to $\textbf{x}$.
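The procedure just described can be sketched in NumPy as follows. This is a brute-force illustration, not scikit-learn's implementation; `knn_predict` is a hypothetical helper, and the handling of zero distances (using only the training points that coincide with the query) is my understanding of what `weights='distance'` does in `KNeighborsRegressor`:

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, n_neighbors=5, weights="uniform"):
    """kNN regression: (weighted) mean of the targets of the n_neighbors closest training points."""
    if weights not in ("uniform", "distance"):
        raise ValueError("weights must be 'uniform' or 'distance'")
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    preds = []
    for x in np.asarray(X_query, dtype=float):
        dist = np.linalg.norm(X_train - x, axis=1)             # Euclidean distances
        nn = np.argsort(dist, kind="stable")[:n_neighbors]     # indices of the closest points
        if weights == "uniform":
            preds.append(y_train[nn].mean())
            continue
        d = dist[nn]
        if np.any(d == 0):
            # the query coincides with training point(s): use only their targets
            preds.append(y_train[nn][d == 0].mean())
        else:
            w = 1.0 / d                                        # inverse-distance weights
            preds.append(np.sum(w * y_train[nn]) / np.sum(w))
    return np.array(preds)

X_small = np.array([[0.0], [1.0], [2.0], [10.0]])
y_small = np.array([0.0, 1.0, 2.0, 10.0])
print(knn_predict(X_small, y_small, [[0.25]], n_neighbors=2, weights="distance"))
```

Evaluating the distance-weighted version on its own training points returns the training targets, because each point is its own nearest neighbour at distance 0. This is consistent with the almost-zero training error obtained below with `weights='distance'`; a small nonzero value could come, for instance, from repeated feature vectors with different prices (a guess).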

## TO DO: load the package for kNN regression, learn the model with default parameters using the training and validation scaled data, and print the error (1-R^2) on the data used to train the model and on the test data.

In [11]:
#TO DO: import package
from sklearn.neighbors import KNeighborsRegressor

#TO DO: learn model
neigh = KNeighborsRegressor()
neigh.fit(Xtrain_and_val_scaled, Ytrain_and_val)

print("Error on training and validation data: "+ str(1 - neigh.score(Xtrain_and_val_scaled,Ytrain_and_val)))
print("Error on test data: "+ str(1 - neigh.score(Xtest_scaled,Ytest)))

Error on training and validation data: 0.1628301248860493
Error on test data: 0.2520788915610599


## TO DO: repeat the point (including the printing instructions) above using the kNN version where points are weighted by the inverse of their distance 

In [12]:
neigh_dis = KNeighborsRegressor(weights='distance')
neigh_dis.fit(Xtrain_and_val_scaled,Ytrain_and_val)

print("Error on training and validation data: "+ str(1 - neigh_dis.score(Xtrain_and_val_scaled,Ytrain_and_val)))
print("Error on test data: "+ str(1 - neigh_dis.score(Xtest_scaled,Ytest)))

Error on training and validation data: 0.00048633506531048365
Error on test data: 0.24637334448919102


## TO DO: use cross validation to choose the best number of neighbours between 2 and 20

In [13]:
from sklearn.model_selection import GridSearchCV

parameters = {'n_neighbors': list(range(2, 21))}

neigh_cv = KNeighborsRegressor(weights='distance')
clf = GridSearchCV(neigh_cv, parameters)
clf.fit(Xtrain_and_val_scaled, Ytrain_and_val)

GridSearchCV(estimator=KNeighborsRegressor(weights='distance'),
             param_grid={'n_neighbors': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                                         14, 15, 16, 17, 18, 19, 20]})

## TO DO: print the best model according to cross validation above, and print the score of the best model 

In [14]:
#let's print the best model according to grid search
print("Best model: ", clf.best_estimator_)

#let's print the score for the best model
print("Score of best model: ", clf.best_score_)

Best model:  KNeighborsRegressor(n_neighbors=6, weights='distance')
Score of best model:  0.760421803842255


## TO DO: learn the best model on all of the training and validation scaled data, and print the error on training and validation scaled data, and on test scaled data

In [15]:
#TO DO: learn model
best_neigh_cv = clf.best_estimator_.fit(Xtrain_and_val_scaled, Ytrain_and_val)

print("Error best model on train and validation: ", str(1 - best_neigh_cv.score(Xtrain_and_val_scaled,Ytrain_and_val)))
print("Error best model on test data: ", str(1 - best_neigh_cv.score(Xtest_scaled,Ytest)))

Error best model on train and validation:  0.00048633506531048365
Error best model on test data:  0.23056311993059453


## TO DO: compare the error on test data of the best kNN model with the error on test data of linear regression and of NNs. Describe what you observe and give a potential explanation.
## [USE MAX 10 LINES]

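The best kNN model (which in this case weights points by the inverse of their distance) obtained a test error of 0.2306, while the linear regression model and the best NN obtained 0.3027 and 0.2001, respectively. The NN therefore performed best, followed by kNN, while linear regression performed clearly worst. A possible explanation is that the NN, unlike linear regression, can represent non-linear relations between the features and the price, and it has more trainable parameters than the linear model (121 instead of 19, with 6 hidden nodes and 18 features), which allows it to fit a more precise regression function. kNN is also non-linear, but it has no trained parameters and simply averages the prices of a few neighbours, so its accuracy depends on how well distances between the scaled features reflect similarity in price.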
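To make the parameter-count argument concrete: a network with one hidden layer of $h$ nodes, $d$ inputs and one output has $(d+1)h + (h+1)$ weights and biases, while linear regression has $d+1$ coefficients. With the $d = 18$ features used here and the $h = 6$ hidden nodes selected by cross-validation, a small sketch gives the following (kNN has no trained parameters; it stores the training set instead):

```python
def mlp_n_params(n_inputs, n_hidden, n_outputs=1):
    """Weights + biases of a fully connected network with one hidden layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

def linear_n_params(n_inputs):
    """Coefficients + intercept of a linear regression model."""
    return n_inputs + 1

n_features = 18
print(mlp_n_params(n_features, 6))   # 121: best NN above, 6 hidden nodes
print(linear_n_params(n_features))   # 19
```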


# Clustering and "Local" Linear Models

You are now going to explore the use of clustering to identify groups of *similar* instances, and then learn models that are specific to each group.

Once you have clustered the data, and then learned a model for each cluster, the prediction for a new instance is obtained by using the model of the cluster that is the closest to the instance, where the distance of a cluster to the instance is defined as the distance of the *center* of the cluster to the instance.

**Note**: in this part you are not explicitly told which part of the data to use, deciding which one is the correct one is part of the homework!

## TO DO: use k-means in sklearn to learn a clustering with 5 clusters.

In [16]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5, n_init=10, random_state=numero_di_matricola).fit(Xtrain_and_val_scaled)
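`KMeans` above uses k-means++ initialisation and keeps the best of `n_init=10` runs. Its core iteration, Lloyd's algorithm, alternates between assigning each instance to its closest center and moving each center to the mean of its instances. A simplified single-run sketch with random initialisation (a hypothetical helper, not scikit-learn's implementation) on toy data:

```python
import numpy as np

def kmeans_lloyd(X, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: random initial centers, a single run."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # initial centers: n_clusters distinct data points chosen at random
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]

    def assign(centers):
        # squared Euclidean distance of every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        # update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(centers)

X_blobs = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans_lloyd(X_blobs, n_clusters=2)
print(labels)
```

The cluster indices themselves are arbitrary; only the grouping of the instances matters.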

## TO DO: for each cluster, learn a linear model using the elements of the cluster. For each model, print the error on the data used to learn it.

In [17]:
from sklearn import linear_model

indexes_train_and_val = kmeans.predict(Xtrain_and_val_scaled)
linear_models = np.empty(5, dtype=object)  # np.object was removed in NumPy 1.24

for i in range(0,5):
    # training and validation samples assigned to cluster i
    mask = indexes_train_and_val == i
    linear_models[i] = linear_model.LinearRegression().fit(Xtrain_and_val_scaled[mask], Ytrain_and_val[mask])
    print("Error on train and val data: ", str(1 - linear_models[i].score(Xtrain_and_val_scaled[mask], Ytrain_and_val[mask])))

Error on train and val data:  0.339098433006251
Error on train and val data:  0.3734286272457893
Error on train and val data:  0.34032092533358904
Error on train and val data:  0.17337145672487875
Error on train and val data:  0.049910027499693976


## TO DO: *compute* the error (1 - R^2) on the data not used to learn the models.
For each instance not used to learn the models, the prediction is done by:
- finding the cluster C whose center is closest to the instance
- using the model learned for cluster C to make the prediction
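The two steps above can be sketched in NumPy as follows, assuming each local linear model is stored as a row of a coefficient matrix plus an intercept (for `LinearRegression` these would be its `coef_` and `intercept_` attributes; the helper itself is hypothetical). `kmeans.predict` performs the same closest-center assignment. Unlike the loop in the next cell, this keeps the predictions in the original order of the instances:

```python
import numpy as np

def predict_by_closest_cluster(X, centers, coefs, intercepts):
    """Predict with the linear model of the cluster whose center is closest to each instance.

    centers: (k, d) cluster centers; coefs: (k, d) and intercepts: (k,) hold one
    linear model per cluster.
    """
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    cluster = d2.argmin(axis=1)   # index of the closest center for each instance
    # row-wise dot product with the coefficients of the selected cluster
    y_pred = np.einsum("ij,ij->i", X, coefs[cluster]) + intercepts[cluster]
    return y_pred, cluster

centers = np.array([[0.0, 0.0], [10.0, 10.0]])
coefs = np.array([[1.0, 0.0], [0.0, 1.0]])
intercepts = np.array([0.0, 100.0])
y_pred, cluster = predict_by_closest_cluster([[1.0, 2.0], [9.0, 11.0]], centers, coefs, intercepts)
print(cluster)  # [0 1]
```

Here the first instance uses the model of cluster 0 (prediction 1) and the second the model of cluster 1 (prediction 111).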

In [18]:
from sklearn.metrics import r2_score

indexes_test = kmeans.predict(Xtest_scaled)
predicted_label_cluster_lm = []
Ytest_lm = []

for i in range(0,5):      
    Ytest_cluster_lm = Ytest[np.where(indexes_test==i)]
    predicted_label_cluster_lm = np.append(predicted_label_cluster_lm, linear_models[i].predict(Xtest_scaled[np.where(indexes_test==i)]))
    Ytest_lm = np.append(Ytest_lm ,Ytest_cluster_lm)
    
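# pooled R^2 on the whole test set (despite its name, error_lm stores R^2; the error is 1 - error_lm)
error_lm = r2_score(Ytest_lm, predicted_label_cluster_lm)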
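Note that `r2_score` is applied once to the concatenated test predictions of all clusters, so the resulting error is the error on the whole test set, not the average of the per-cluster errors. The two can differ considerably, because each per-cluster error is normalised by the (smaller) variance of the prices within that cluster. A toy illustration with two hypothetical clusters:

```python
import numpy as np

def error(y_true, y_pred):
    """1 - R^2 = SS_res / SS_tot, computed on the given samples only."""
    return np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# two hypothetical clusters with very different price levels
y_a, p_a = np.array([100.0, 110.0, 120.0]), np.array([105.0, 110.0, 115.0])
y_b, p_b = np.array([500.0, 520.0, 540.0]), np.array([510.0, 520.0, 530.0])

per_cluster = [error(y_a, p_a), error(y_b, p_b)]
pooled = error(np.concatenate([y_a, y_b]), np.concatenate([p_a, p_b]))
print(per_cluster)  # both 0.25
print(pooled)
```

Each cluster has an error of 0.25, while the pooled error is below 0.001, because the pooled denominator also includes the large price difference between the two clusters.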

## TO DO: *print* the error (1-R^2) on the data not used to learn the models

In [19]:
print("Error on test data: ", str(1 - error_lm))

Error on test data:  0.29802837937405346


## TO DO: compare the error of the model "clustering + linear models" and of the linear model (see the beginning of the HW). Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

The "clustering + linear models" approach obtained a test error of 0.2980, while the single linear regression model obtained 0.3027. The local models are therefore slightly better, although the difference is small and may not be significant on a test set of 528 samples. A possible explanation is that k-means groups instances that are similar in the feature space, so each linear model only has to approximate the regression function in a limited region, where a linear approximation is more accurate than a single global one. In other words, the linear models become "specialized", and each is then applied to test instances similar to those it was trained on.

## TO DO: compare the error of the model "clustering + linear models" and of kNN. Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

As previously stated, the "clustering + linear models" approach obtained a test error of 0.2980, while the best kNN model obtained 0.2306. I therefore assume that the samples of our dataset are distributed in the feature space in a way that favours a local method such as kNN: kNN predicts from the 6 closest training instances, a much finer notion of locality than 5 clusters, whereas within each cluster the prediction is still a linear function of the features and cannot follow non-linear variations of the price.

# Clustering and "Local" NNs

Repeat the same as above, but using neural networks instead of linear models.

**Note**: we are not telling you which parameters to use for NNs. You have to decide how to select the parameters.

## TO DO: clearly explain how you decided to set the parameters, motivating the choice of your strategy.

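The parameters were set by 5-fold cross-validation, choosing the size of the hidden layer (between 1 and 9 hidden nodes) separately for each of the five NN models, using only the training and validation instances of the corresponding cluster (the test data are never used for model selection). The other parameters (ReLU activation, lbfgs solver, random state) are the same as for the single NN above. In this way each cluster gets the network complexity best suited to its own data, which matters because the clusters have very different sizes.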

## TO DO: repeat the analysis in part "Clustering and "Local" Linear Models" using NNs instead of linear models.

In [20]:
nn_models = np.empty(5, dtype=object)  # np.object was removed in NumPy 1.24
error_train_and_val = np.empty(5)

for i in range(0,5):
    # training and validation samples assigned to cluster i
    mask = indexes_train_and_val == i
    nn_cv = MLPRegressor()
    # candidate hidden layer sizes: 1 to 9 nodes (h instead of i for readability)
    param_grid = {'hidden_layer_sizes': [h for h in range(1,10)],
                  'activation': ['relu'],
                  'solver': ['lbfgs'], 
                  'random_state': [numero_di_matricola]
                 }
    # 5-fold cross-validation on the samples of cluster i only
    nn_GS = GridSearchCV(nn_cv, param_grid=param_grid, cv=5)
    nn_GS.fit(Xtrain_and_val_scaled[mask], Ytrain_and_val[mask])
    
    nn_models[i] = nn_GS.best_estimator_.fit(Xtrain_and_val_scaled[mask], Ytrain_and_val[mask])
    
    error_train_and_val[i] = 1 - nn_models[i].score(Xtrain_and_val_scaled[mask], Ytrain_and_val[mask])

#################################################

predicted_label_cluster_nn = []
Ytest_nn = []
error_test = np.empty(5)

for i in range(0,5):
    # test samples whose closest cluster center is the one of cluster i
    mask_test = indexes_test == i
    predicted_label_cluster_nn = np.append(predicted_label_cluster_nn, nn_models[i].predict(Xtest_scaled[mask_test]))
    Ytest_nn = np.append(Ytest_nn, Ytest[mask_test])
    error_test[i] = 1 - nn_models[i].score(Xtest_scaled[mask_test], Ytest[mask_test])
    
# pooled R^2 on the whole test set (despite its name, error_nn stores R^2)
error_nn = r2_score(Ytest_nn, predicted_label_cluster_nn)
    
##################################################

for i in range(0,5):
    print("Error on train and val data of NN model trained on cluster (", str(i), "): ", str(error_train_and_val[i]))
    print("Error on test data of NN model trained on cluster (", str(i), "): ", str(error_test[i]))
    
print("\nError on test data: ", str(1 - error_nn))

labels_train_and_val, freqs_train_and_val = np.unique(indexes_train_and_val, return_counts = True)
print("\nLabels in training and validation data: ", labels_train_and_val)
print("Frequencies in training and validation data: ", freqs_train_and_val)

labels_test, freqs_test = np.unique(indexes_test, return_counts = True)
print("\nLabels in test data: ", labels_test)
print("Frequencies in test data: ", freqs_test)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)

Error on train and val data of NN model trained on cluster ( 0 ):  0.2714586046153785
Error on test data of NN model trained on cluster ( 0 ):  0.4320466828647843
Error on train and val data of NN model trained on cluster ( 1 ):  0.3022403090226258
Error on test data of NN model trained on cluster ( 1 ):  0.30156918130398
Error on train and val data of NN model trained on cluster ( 2 ):  0.28970440622922977
Error on test data of NN model trained on cluster ( 2 ):  0.3207454269968282
Error on train and val data of NN model trained on cluster ( 3 ):  0.1406316654897194
Error on test data of NN model trained on cluster ( 3 ):  0.26085235871497736
Error on train and val data of NN model trained on cluster ( 4 ):  0.0015452300257530194
Error on test data of NN model trained on cluster ( 4 ):  1.1950331615933412

Error on test data:  0.4055246065907552

Labels in training and validation data:  [0 1 2 3 4]
Frequencies in training and validation data:  [647 935 908 120  26]

Labels in test dat

## TO DO: compare the error of the model "clustering + NNs" and of NNs (see the beginning of the HW). Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

The "clustering + NNs" model achieved a test error of 0.4055 on the whole test set, while the best single NN obtained 0.2001, so the latter performed much better. The per-cluster errors suggest overfitting: the NN trained on the 5th cluster (label 4) has an almost zero error on its training data (0.0015) but an error of 1.195 on the test data, i.e., worse than predicting the mean price of that cluster's test instances, because only 26 training samples are available (even a network with a single hidden node has 21 parameters). Moreover, the test errors of the other clusters are also quite high (from 0.26 to 0.43), indicating that training a separate NN on each of the five clusters probably requires more data per cluster than our dataset provides.

## TO DO: compare the error of the model "clustering + NNs" and of kNN. Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

As already said, the "clustering + NNs" model achieved a test error of 0.4055 on the whole test set, while the best kNN model obtained 0.2306. As before, the main reason I can provide for these results is the lack of sufficient data to properly train a separate NN for each cluster, whereas kNN has no parameters to learn and searches all the training and validation data for every prediction.

## TO DO: compare the error of the model "clustering + NNs" and of "clustering + Linear Models". Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

As already said, the "clustering + NNs" model achieved a test error of 0.4055 on the whole test set, while the "clustering + linear models" one obtained 0.2980. The former is clearly worse; the latter, although slightly better than the single linear model, is still worse than the single NN and than kNN. This suggests that, in general, these combined approaches, which fit several models on subsets of the data and are therefore more complex than the models we compared them with, need more data than we have available here to be properly trained and to perform well. The gap between the two is consistent with this: a linear model has far fewer parameters than an NN and is therefore less prone to overfitting the small clusters.