# Data Loading and Preprocessing

We consider the same dataset used in the labs, containing house sale prices for King County, which includes Seattle. It covers homes sold between May 2014 and May 2015.

https://www.kaggle.com/harlfoxem/housesalesprediction

For each house we know 18 house features (e.g., number of bedrooms, number of bathrooms, etc.) plus its price, which is what we would like to predict.

## TO DO: Insert your ID number ("numero di matricola") below

In [1]:
#put here your ``numero di matricola''
numero_di_matricola = 2019296

Load the required packages

In [2]:
#import all packages needed
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Read the data, remove data samples/points with missing values (NaN), and print some statistics.

In [3]:
#load the data
df = pd.read_csv('kc_house_data.csv', sep = ',')

#remove the data samples with missing values (NaN)
df = df.dropna() 

df.describe()

statistic,id,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
count,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0,3164.0
mean,4645240000.0,535435.8,3.381163,2.071903,2070.027813,15250.54,1.434893,0.009798,0.244311,3.459229,7.615676,1761.252212,308.775601,1967.489254,94.668774,98077.125158,47.557868,-122.212337,1982.544564,13176.302465
std,2854203000.0,380900.4,0.895472,0.768212,920.251879,42544.57,0.507792,0.098513,0.776298,0.682592,1.166324,815.934864,458.977904,28.095275,424.439427,54.172937,0.140789,0.139577,686.25667,25413.180755
min,1000102.0,75000.0,0.0,0.0,380.0,649.0,1.0,0.0,0.0,1.0,3.0,380.0,0.0,1900.0,0.0,98001.0,47.1775,-122.514,620.0,660.0
25%,2199775000.0,315000.0,3.0,1.5,1430.0,5453.75,1.0,0.0,0.0,3.0,7.0,1190.0,0.0,1950.0,0.0,98032.0,47.459575,-122.32425,1480.0,5429.5
50%,4027701000.0,445000.0,3.0,2.0,1910.0,8000.0,1.0,0.0,0.0,3.0,7.0,1545.0,0.0,1969.0,0.0,98059.0,47.5725,-122.226,1830.0,7873.0
75%,7358175000.0,640250.0,4.0,2.5,2500.0,11222.5,2.0,0.0,0.0,4.0,8.0,2150.0,600.0,1990.0,0.0,98117.0,47.68025,-122.124,2360.0,10408.25
max,9839301000.0,5350000.0,8.0,6.0,8010.0,1651359.0,3.5,1.0,4.0,5.0,12.0,6720.0,2620.0,2015.0,2015.0,98199.0,47.7776,-121.315,5790.0,425581.0


Get the feature matrix and the vector of target values. We want to predict the price by using features other than id as input.

In [4]:
Data = df.values
# m = number of input samples
m = Data.shape[0]
print("Amount of data:",m)
Y = Data[:m,2]
X = Data[:m,3:]

feature_names = df.columns[3:]

Amount of data: 3164


We split the $m$ samples of the data into 3 parts: one will be used for training and choosing the parameters, one for choosing among different models, and one for testing. The part for training and choosing the parameters will consist of $m_{train}=\lfloor 2m/3 \rfloor$ samples, the one for choosing among different models will consist of $m_{val}=\lfloor (m - m_{train})/2 \rfloor$ samples, while the remaining part consists of $m_{test}=m - m_{train} - m_{val}$ samples.

In [5]:
# Split data into train (2/3 of samples), validation (1/6 of samples), and test data (the rest)
m_train = int(2./3.*m)
m_val = int((m-m_train)/2.)
m_test = m - m_train - m_val
print("Amount of data for training and deciding parameters:",m_train)
print("Amount of data for validation (choosing among different models):",m_val)
print("Amount of data for test:",m_test)
from sklearn.model_selection import train_test_split

#Xtrain_and_val, Ytrain_and_val is the part of data for training and validation
#Xtest, Ytest is the part of data for testing
Xtrain_and_val, Xtest, Ytrain_and_val, Ytest = train_test_split(X, Y, test_size=m_test/m, random_state=numero_di_matricola)

#if you need to consider a specific training and validation split, use
#Xtrain, Ytrain for training and Xval, Yval for validation
Xtrain, Xval, Ytrain, Yval = train_test_split(Xtrain_and_val, Ytrain_and_val, test_size=m_val/(m_train+m_val), random_state=numero_di_matricola)

Amount of data for training and deciding parameters: 2109
Amount of data for validation (choosing among different models): 527
Amount of data for test: 528


Let's scale the data.

In [6]:
# Data pre-processing
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(Xtrain)
Xtrain_scaled = scaler.transform(Xtrain)
Xtrain_and_val_scaled = scaler.transform(Xtrain_and_val)
Xval_scaled = scaler.transform(Xval)
Xtest_scaled = scaler.transform(Xtest)
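
As a rough illustration of what the scaling step does, the sketch below checks on synthetic data (not the housing data) that `StandardScaler` subtracts the training mean and divides by the training standard deviation, and that the same training statistics are reused for new data. The names `X_tr`, `X_te`, `mu` and `sigma` are assumptions introduced only for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for a training and a test matrix (not the housing data).
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
X_te = rng.normal(loc=5.0, scale=3.0, size=(20, 4))

scaler = StandardScaler().fit(X_tr)

# Manual standardization with the *training* mean and standard deviation.
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)  # ddof=0, the convention StandardScaler uses
X_te_manual = (X_te - mu) / sigma

same = np.allclose(scaler.transform(X_te), X_te_manual)
print("manual and StandardScaler results agree:", same)
```

Because the statistics come from the data passed to `fit`, the test data never influences the scaling.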

# Neural Networks

Let's learn the best neural network with 1 hidden layer and between 1 and 9 hidden nodes, choosing the best number of hidden nodes with cross-validation.

In [7]:
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

mlp_cv = MLPRegressor()
param_grid = {'hidden_layer_sizes': [i for i in range(1,10)],
              'activation': ['relu'],
              'solver': ['lbfgs'], 
              'random_state': [numero_di_matricola]
             }
mlp_GS = GridSearchCV(mlp_cv, param_grid=param_grid, 
                   cv=5, verbose=True)
mlp_GS.fit(Xtrain_and_val_scaled, Ytrain_and_val)

Fitting 5 folds for each of 9 candidates, totalling 45 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)

GridSearchCV(cv=5, estimator=MLPRegressor(),
             param_grid={'activation': ['relu'],
                         'hidden_layer_sizes': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                         'random_state': [2019296], 'solver': ['lbfgs']},
             verbose=True)

Now let's check which number of hidden nodes is the best according to cross-validation. The best NN will then be compared on test data with the linear model (learned on train and validation).

In [8]:
#let's print the best model according to grid search
print("Best model: ",mlp_GS.best_estimator_)
#let's print the error 1-R^2 for the best model
print("Error (1-R^2) of best model: ",1. - mlp_GS.best_score_)

Best model:  MLPRegressor(hidden_layer_sizes=9, random_state=2019296, solver='lbfgs')
Error (1-R^2) of best model:  0.18669712680190176
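
To make explicit what the printed value measures, here is a minimal sketch on synthetic data: for a regressor with default scoring, `best_score_` is the mean validation R^2 over the folds for the best candidate, so `1 - best_score_` is a cross-validated error rather than a training or test error. The data, the reduced grid and `max_iter=500` are assumptions made only to keep the example small.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic regression problem (not the housing data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Reduced grid, chosen only for speed.
param_grid = {'hidden_layer_sizes': [(1,), (2,), (4,)]}
gs = GridSearchCV(
    MLPRegressor(solver='lbfgs', max_iter=500, random_state=0),
    param_grid=param_grid,
    cv=5,
)
gs.fit(X_demo, y_demo)

# Mean validation R^2 of each candidate, averaged over the 5 folds.
mean_r2 = gs.cv_results_['mean_test_score']
cv_error = 1.0 - gs.best_score_
print(gs.best_params_, cv_error)
```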


Let's learn the best NN using all of training and validation, and then compare the error of the best NN on train and validation and on test data.

In [11]:
best_mlp = MLPRegressor(hidden_layer_sizes=(9,), activation='relu', solver='lbfgs', random_state = numero_di_matricola)
best_mlp.fit(Xtrain_and_val_scaled,Ytrain_and_val)

print("Error best model on train and validation: ",1. - best_mlp.score(Xtrain_and_val_scaled,Ytrain_and_val))
print("Error best model on test data: ",1. - best_mlp.score(Xtest_scaled,Ytest))

Error best model on train and validation:  0.10624269122367258
Error best model on test data:  0.1582682970296665


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


# Linear Regression

Now let's learn the linear model on train and validation, and get error (1-R^2) on train and validation and on test data.

In [12]:
from sklearn import linear_model
#LR the linear regression model
LR = linear_model.LinearRegression()

#fit the model on training data
LR.fit(Xtrain_and_val_scaled, Ytrain_and_val)

print("1 - coefficient of determination on training data:"+str(1 - LR.score(Xtrain_and_val_scaled,Ytrain_and_val)))
print("1 - coefficient of determination on test data:"+str(1 - LR.score(Xtest_scaled,Ytest)))

1 - coefficient of determination on training data:0.27231941599689335
1 - coefficient of determination on test data:0.3207570465807096


# k-Nearest Neighbours

You will now explore the k-Nearest Neighbours (kNN) method for regression. In order to do this, you will need to load the scikit-learn package *neighbors.KNeighborsRegressor*.

k-Nearest Neighbours for regression works as follows: the predicted value $h(\textbf{x})$ for an instance $\textbf{x}$ is obtained by first finding the $\ell$ instances *in the training set* that are closest to $\textbf{x}$; the predicted value $h(\textbf{x})$ is then the mean of the targets of such $\ell$ instances. $\ell$ is a parameter of the method. The targets of the $\ell$ instances used for prediction can be weighted by the (inverse of) their distance to $\textbf{x}$.
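
The procedure described above can be sketched directly in NumPy and compared with scikit-learn. This is a minimal illustration on synthetic data, assuming Euclidean distance and a query point that does not coincide with any training point; `knn_predict` is a hypothetical helper written for this example, not part of scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_predict(X_train, y_train, x, n_neighbors=5, weighted=False):
    """Predict the target of a single instance x with kNN regression."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:n_neighbors]     # indices of the closest instances
    if not weighted:
        return y_train[nearest].mean()            # plain mean of their targets
    w = 1.0 / dists[nearest]                      # assumes no zero distance
    return np.sum(w * y_train[nearest]) / np.sum(w)

# Synthetic training set and query point (not the housing data).
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(100, 2))
y_tr = X_tr[:, 0] ** 2 + X_tr[:, 1]
x_new = np.array([0.3, -0.2])

manual_uniform = knn_predict(X_tr, y_tr, x_new)
manual_weighted = knn_predict(X_tr, y_tr, x_new, weighted=True)

sk_uniform = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr).predict([x_new])[0]
sk_weighted = KNeighborsRegressor(n_neighbors=5, weights='distance').fit(X_tr, y_tr).predict([x_new])[0]
print(manual_uniform, sk_uniform, manual_weighted, sk_weighted)
```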

## TO DO: load the package for kNN regression, learn the model with default parameters using the training and validation scaled data, and print the error (1-R^2) on the data used to train the model and on the test data.

In [13]:
#TO DO: import package
from sklearn.neighbors import KNeighborsRegressor

#TO DO: learn model
neigh = KNeighborsRegressor()
neigh.fit(Xtrain_and_val_scaled, Ytrain_and_val)

print("Error on training and validation data:"+str(1 - neigh.score(Xtrain_and_val_scaled, Ytrain_and_val)))
print("Error on test data:"+str(1 - neigh.score(Xtest_scaled,Ytest)))

Error on training and validation data:0.15155035705730469
Error on test data:0.30508836334396894


## TO DO: repeat the point (including the printing instructions) above using the kNN version where points are weighted by the inverse of their distance 

In [14]:
neigh = KNeighborsRegressor(weights='distance')
neigh.fit(Xtrain_and_val_scaled, Ytrain_and_val)

print("Error on training and validation data:"+str(1 - neigh.score(Xtrain_and_val_scaled, Ytrain_and_val)))
print("Error on test data:"+str(1 - neigh.score(Xtest_scaled,Ytest)))

Error on training and validation data:0.0006819261974133628
Error on test data:0.2985873511140369


## TO DO: use cross validation to choose the best number of neighbours (between 2 and 20)

In [15]:
mlp_cv_kNN = KNeighborsRegressor()
#note: range(2,20) covers 2,...,19; use range(2,21) to include 20 as well
param_grid_kNN = {'n_neighbors': [i for i in range(2,20)],
              'weights': ['uniform','distance'],
             }
mlp_GS_kNN = GridSearchCV(mlp_cv_kNN, param_grid=param_grid_kNN, 
                   cv=5)
mlp_GS_kNN.fit(Xtrain_and_val_scaled, Ytrain_and_val)

GridSearchCV(cv=5, estimator=KNeighborsRegressor(),
             param_grid={'n_neighbors': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                                         14, 15, 16, 17, 18, 19],
                         'weights': ['uniform', 'distance']})

## TO DO: print the best model according to cross validation above, and print the score of the best model 

In [16]:
#let's print the best model according to grid search
print("Best model: ", mlp_GS_kNN.best_estimator_)
#let's print the error 1-R^2 for the best model (the label says "Score", but the value printed is 1 - best_score_)
print("Score of best model: ", 1 - mlp_GS_kNN.best_score_)

Best model:  KNeighborsRegressor(n_neighbors=7, weights='distance')
Score of best model:  0.22498547697709503


## TO DO: learn the best model on all of the training and validation scaled data, and print the error on training and validation scaled data, and on test scaled data

In [17]:
#TO DO: learn model
best_neigh = KNeighborsRegressor(n_neighbors=7, weights= 'distance')
best_neigh.fit(Xtrain_and_val_scaled, Ytrain_and_val)

print("Error best model on train and validation: "+str(1 - best_neigh.score(Xtrain_and_val_scaled, Ytrain_and_val)))
print("Error best model on test data: "+str(1 - best_neigh.score(Xtest_scaled,Ytest)))

Error best model on train and validation: 0.0006819261974133628
Error best model on test data: 0.29458714501200967


## TO DO: compare the error on test data of the best kNN model with the error on test data of linear regression and of NNs. Describe what you observe and give a potential explanation.
## [USE MAX 10 LINES]

The errors on test data show that the best result is obtained by the NN model (about 0.16), which is not surprising since NNs are very expressive models. kNN (about 0.29) and linear regression (about 0.32) have similar test errors, so neither model is very precise. For kNN in particular, the error on train and validation is far lower than the test error, which points to overfitting; note that with distance weighting each training point is its own nearest neighbour at distance zero, so a near-zero training error is expected. Another potential explanation is that kNN is a very "sensitive" model and may require a very "precise" data set, with few outliers and no missing data. Linear regression gives the worst results on both train and validation data and test data. LR is a simple and computationally efficient model, but it can be too simplistic for complex problems and, as we can see here, it gives poor results.
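
The very low training error of distance-weighted kNN can be reproduced on data that contains no signal at all. The sketch below is an illustration under stated assumptions (synthetic pure-noise targets, no duplicated points), not a rerun of the experiment above: the training error is essentially zero because every training point is predicted from itself, while the error on held-out points is much larger.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Features and targets are independent noise: there is nothing to learn.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = rng.normal(size=300)

X_tr, y_tr = X_demo[:200], y_demo[:200]
X_te, y_te = X_demo[200:], y_demo[200:]

knn = KNeighborsRegressor(n_neighbors=7, weights='distance').fit(X_tr, y_tr)

# Each training point is at distance zero from itself, so it dominates its own prediction.
train_error = 1.0 - knn.score(X_tr, y_tr)
test_error = 1.0 - knn.score(X_te, y_te)
print(train_error, test_error)
```

For this reason the training error of this kNN variant says little about its quality; the cross-validated or test error is the meaningful number.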

# Clustering and "Local" Linear Models

You are now going to explore the use of clustering to identify groups of *similar* instances, and then learning models that are specific to each group.

Once you have clustered the data, and then learned a model for each cluster, the prediction for a new instance is obtained by using the model of the cluster that is the closest to the instance, where the distance of a cluster to the instance is defined as the distance of the *center* of the cluster to the instance.
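
A minimal sketch of this scheme on synthetic one-dimensional data, where the relation between input and target changes from one group to the other. The data, the choice of 2 clusters and the helper `predict_local` are assumptions made for illustration and are not part of the homework solution.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Two well-separated groups with different slopes (2 on the left, -3 on the right).
X_a = rng.uniform(-3.0, -1.0, size=(100, 1))
X_b = rng.uniform(1.0, 3.0, size=(100, 1))
X_demo = np.vstack([X_a, X_b])
y_demo = np.concatenate([2.0 * X_a[:, 0], -3.0 * X_b[:, 0]]) + 0.1 * rng.normal(size=200)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_demo)

# One linear model per cluster, learned only on that cluster's points.
local_models = {}
for c in range(km.n_clusters):
    mask = km.labels_ == c
    local_models[c] = LinearRegression().fit(X_demo[mask], y_demo[mask])

def predict_local(X_new):
    """Route each instance to the model of its nearest cluster center."""
    nearest = km.predict(X_new)
    y_pred = np.empty(len(X_new))
    for c, model in local_models.items():
        mask = nearest == c
        if mask.any():
            y_pred[mask] = model.predict(X_new[mask])
    return y_pred

X_new = np.array([[-2.0], [2.0]])
print(predict_local(X_new))
```

A single global linear model cannot represent both slopes at once, which is the situation where local models are expected to help.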

**Note**: in this part you are not explicitly told which part of the data to use; deciding which one is the correct one is part of the homework!

## TO DO: use k-means in sklearn to learn a clustering with 5 clusters.

In [18]:
from sklearn.cluster import KMeans

kmeans =  KMeans(n_clusters=5, random_state = numero_di_matricola)
kmeans.fit(Xtrain_and_val_scaled)

KMeans(n_clusters=5, random_state=2019296)

## TO DO: for each cluster, learn a linear model using the elements of the cluster. For each model, print the error on the data used to learn it.

In [19]:
#To learn the linear models I split the elements of each cluster and the respective targets.
#The cluster labels are predicted once and then used as boolean masks.
train_labels = kmeans.predict(Xtrain_and_val_scaled)

cl1, y_cl1 = Xtrain_and_val_scaled[train_labels==0], Ytrain_and_val[train_labels==0]
cl2, y_cl2 = Xtrain_and_val_scaled[train_labels==1], Ytrain_and_val[train_labels==1]
cl3, y_cl3 = Xtrain_and_val_scaled[train_labels==2], Ytrain_and_val[train_labels==2]
cl4, y_cl4 = Xtrain_and_val_scaled[train_labels==3], Ytrain_and_val[train_labels==3]
cl5, y_cl5 = Xtrain_and_val_scaled[train_labels==4], Ytrain_and_val[train_labels==4]

#I fit the five models
LR_Cl1 = linear_model.LinearRegression().fit(cl1, y_cl1)
LR_Cl2 = linear_model.LinearRegression().fit(cl2, y_cl2)
LR_Cl3 = linear_model.LinearRegression().fit(cl3, y_cl3)
LR_Cl4 = linear_model.LinearRegression().fit(cl4, y_cl4)
LR_Cl5 = linear_model.LinearRegression().fit(cl5, y_cl5)


print("1 - coefficient of determination on training data for cl1:"+str(1 - LR_Cl1.score(cl1, y_cl1)))
print("1 - coefficient of determination on training data for cl2:"+str(1 - LR_Cl2.score(cl2, y_cl2)))
print("1 - coefficient of determination on training data for cl3:"+str(1 - LR_Cl3.score(cl3, y_cl3)))
print("1 - coefficient of determination on training data for cl4:"+str(1 - LR_Cl4.score(cl4, y_cl4)))
print("1 - coefficient of determination on training data for cl5:"+str(1 - LR_Cl5.score(cl5, y_cl5)))

1 - coefficient of determination on training data for cl1:0.34209117027722935
1 - coefficient of determination on training data for cl2:0.3321681868190285
1 - coefficient of determination on training data for cl3:0.22987056395499317
1 - coefficient of determination on training data for cl4:0.09407429231926945
1 - coefficient of determination on training data for cl5:0.35851920310574426


## TO DO: *compute* the error (1 - R^2) on the data not used to learn the models.
For each instance not used to learn the model, the prediction is done by:
- finding the cluster C whose center is the closest to the instance
- use the model learned for cluster C to make the prediction
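
As a small check of the first step, the sketch below shows on synthetic data that taking the `argmin` of the distances returned by `KMeans.transform` selects the same cluster as `KMeans.predict`. The data and variable names are assumptions for this example only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (not the housing data).
rng = np.random.default_rng(0)
X_fit = rng.normal(size=(200, 3))
X_new = rng.normal(size=(50, 3))

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_fit)

# transform() returns one column of distances per cluster center.
dist_to_centers = km.transform(X_new)
closest_by_argmin = np.argmin(dist_to_centers, axis=1)
closest_by_predict = km.predict(X_new)
print(np.array_equal(closest_by_argmin, closest_by_predict))
```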

In [20]:
#kmeans.transform() returns the distance of each sample from all the cluster centers;
#the index of the smallest distance identifies the closest cluster.
lab_idx = np.argmin(kmeans.transform(Xtest_scaled), axis=1)

cl1t, y_cl1t = Xtest_scaled[lab_idx==0], Ytest[lab_idx==0]
cl2t, y_cl2t = Xtest_scaled[lab_idx==1], Ytest[lab_idx==1]
cl3t, y_cl3t = Xtest_scaled[lab_idx==2], Ytest[lab_idx==2]
cl4t, y_cl4t = Xtest_scaled[lab_idx==3], Ytest[lab_idx==3]
cl5t, y_cl5t = Xtest_scaled[lab_idx==4], Ytest[lab_idx==4]
#Now I can make the prediction using the appropriate model:
predC1 = LR_Cl1.predict(cl1t)
predC2 = LR_Cl2.predict(cl2t)
predC3 = LR_Cl3.predict(cl3t)
predC4 = LR_Cl4.predict(cl4t)
predC5 = LR_Cl5.predict(cl5t)

pred_test = np.concatenate((predC1,predC2,predC3,predC4,predC5))
truelab = np.concatenate((y_cl1t,y_cl2t,y_cl3t,y_cl4t,y_cl5t))

#Finally I can compute the R^2 score
from sklearn.metrics import r2_score
r2error = r2_score(truelab,pred_test)

## TO DO: *print* the error (1-R^2) on the data not used to learn the models

In [21]:
print("1 - coefficient of determination on the data not used to learn the model:"+str(1 - r2error))

1 - coefficient of determination on the data not used to learn the model:0.216981135776511


## TO DO: compare the error of the model "clustering + linear models" and of the linear model (see the beginning of the HW). Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

Using clustering + linear models I obtained a lower error than with linear regression alone, especially on the test data (about 0.22 versus 0.32). A possible explanation is that clustering the data, i.e. grouping the samples so that similar ones end up in the same group and dissimilar ones are separated into different groups, simplified the model that each linear regression had to learn. In fact, as I wrote before (in the kNN answer), linear regression is limited in the complexity of the model it can learn, but in this case clustering helped to overcome this limit, producing better results than plain linear regression.

## TO DO: compare the error of the model "clustering + linear models" and of kNN. Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

In this case we have opposite situations: for kNN I obtained a very good result on the training and validation set but a poor one on the test set; for clustering + linear models I obtained the opposite. A possible explanation is that, for both models, there was too much training data. In particular, kNN is possibly overfitting. For clustering + linear models, I probably have too many features, which damage the model instead of helping it. A possible solution would be to use only the features that are most correlated with the price of the houses, in order to simplify the analysis.

# Clustering and "Local" NNs

Repeat the same as above, but using neural networks instead of linear models.

**Note**: we are not telling you which parameters to use for NNs. You have to decide how to select the parameters.

## TO DO: clearly explain how you decided to set the parameters, motivating the choice of your strategy.

I used grid search with cross-validation (GridSearchCV) to find the best parameters for each cluster. I tried hidden_layer_sizes from 1 to 50, with activation 'relu' and solvers 'lbfgs' and 'adam'. The results I obtained are:
- Cluster 1 (label 0): hidden_layer_sizes = 34, activation 'relu', solver 'lbfgs';
- Cluster 2 (label 1): hidden_layer_sizes = 4, activation 'relu', solver 'lbfgs';
- Cluster 3 (label 2): hidden_layer_sizes = 18, activation 'relu', solver 'lbfgs';
- Cluster 4 (label 3): hidden_layer_sizes = 2, activation 'relu', solver 'lbfgs';
- Cluster 5 (label 4): hidden_layer_sizes = 14, activation 'relu', solver 'lbfgs'.
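
The search code itself is not shown in this notebook. A per-cluster grid search of this kind could look roughly like the sketch below, which runs on synthetic data with 2 clusters and a deliberately tiny grid so that it finishes quickly. The data, the grid, `cv=3`, `max_iter=300` and all variable names are assumptions, not the settings actually used.

```python
import warnings
import numpy as np
from sklearn.cluster import KMeans
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic regression problem (not the housing data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(240, 3))
y_demo = np.sin(X_demo[:, 0]) + X_demo[:, 1] * X_demo[:, 2] + 0.1 * rng.normal(size=240)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_demo)

# Tiny grid for speed; the text above reports a search over sizes 1 to 50.
param_grid = {'hidden_layer_sizes': [(2,), (8,)], 'solver': ['lbfgs']}

best_params = {}
with warnings.catch_warnings():
    # lbfgs may stop at max_iter; silence the resulting warnings in this sketch.
    warnings.simplefilter('ignore', category=ConvergenceWarning)
    for c in range(km.n_clusters):
        mask = km.labels_ == c
        gs = GridSearchCV(
            MLPRegressor(activation='relu', max_iter=300, random_state=0),
            param_grid=param_grid,
            cv=3,
        )
        # Cross-validation uses only the points of cluster c.
        gs.fit(X_demo[mask], y_demo[mask])
        best_params[c] = gs.best_params_
print(best_params)
```

Running one search per cluster lets each local model have its own size, at the cost of tuning on fewer samples per model.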

## TO DO: repeat the analysis in part "Clustering and "Local" Linear Models" using NNs instead of linear models.

In [22]:
#I fit the five models using the best parameters found with grid search CV.
best_mlp_cl1 = MLPRegressor(hidden_layer_sizes=(34,), activation='relu', solver='lbfgs', random_state = numero_di_matricola).fit(cl1, y_cl1)
best_mlp_cl2 = MLPRegressor(hidden_layer_sizes=(4,), activation='relu', solver='lbfgs', random_state = numero_di_matricola).fit(cl2, y_cl2)
best_mlp_cl3 = MLPRegressor(hidden_layer_sizes=(18,), activation='relu', solver='lbfgs', random_state = numero_di_matricola).fit(cl3, y_cl3)
best_mlp_cl4 = MLPRegressor(hidden_layer_sizes=(2,), activation='relu', solver='lbfgs', random_state = numero_di_matricola).fit(cl4, y_cl4)
best_mlp_cl5 = MLPRegressor(hidden_layer_sizes=(14,), activation='relu', solver='lbfgs', random_state = numero_di_matricola).fit(cl5, y_cl5)

print("Error of best model for cluster 1:"+str(1 - best_mlp_cl1.score(cl1, y_cl1)))
print("Error of best model for cluster 2:"+str(1 - best_mlp_cl2.score(cl2, y_cl2)))
print("Error of best model for cluster 3:"+str(1 - best_mlp_cl3.score(cl3, y_cl3)))
print("Error of best model for cluster 4:"+str(1 - best_mlp_cl4.score(cl4, y_cl4)))
print("Error of best model for cluster 5:"+str(1 - best_mlp_cl5.score(cl5, y_cl5)))

mlp_predC1 = best_mlp_cl1.predict(cl1t)
mlp_predC2 = best_mlp_cl2.predict(cl2t)
mlp_predC3 = best_mlp_cl3.predict(cl3t)
mlp_predC4 = best_mlp_cl4.predict(cl4t)
mlp_predC5 = best_mlp_cl5.predict(cl5t)

mlp_pred_test = np.concatenate((mlp_predC1,mlp_predC2,mlp_predC3,mlp_predC4,mlp_predC5))
mlp_truelab = np.concatenate((y_cl1t,y_cl2t,y_cl3t,y_cl4t,y_cl5t))

#Finally I can compute the R^2 score
from sklearn.metrics import r2_score
r2error = r2_score(mlp_truelab,mlp_pred_test)
print("Error on the data not used to learn the model:"+str(1 - r2error))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)

Error of best model for cluster 1:0.19731398468779693
Error of best model for cluster 2:0.25304286905539275
Error of best model for cluster 3:0.14236998171344473
Error of best model for cluster 4:0.09407429261210098
Error of best model for cluster 5:0.261371722470109
Error on the data not used to learn the model:0.20533294884865916



## TO DO: compare the error of the model "clustering + NNs" and of NNs (see the beginning of the HW). Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

In this case, using clustering + NNs, I obtained worse results than with the NN alone. Since NNs are very powerful models, a possible explanation is that the clusters are not good enough. Several choices were not explored in my analysis, such as the number of clusters, which was fixed to 5; a deeper analysis might show that the optimal number is different. The number of features and the number of samples available for training and testing may also contribute to these results.

## TO DO: compare the error of the model "clustering + NNs" and of kNN. Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

kNN has a very low error on training and validation, but a much higher error on the test data. For clustering + NNs I did not obtain optimal error values, neither on training nor on test data. If I had to choose one, I would pick clustering + NNs, because the error on the test data is fundamental when choosing a model in a real machine learning analysis.

## TO DO: compare the error of the model "clustering + NNs" and of "clustering + Linear Models". Describe what you observe, and provide a possible explanation.
## [USE MAX 10 LINES]

The test errors of clustering + NNs and clustering + linear models are very close. On the training data NNs perform better, as one would expect. This contrasts with the comparison between NNs and linear regression without clustering, where NNs clearly outperformed the linear regression model. A possible explanation is that the cluster analysis I performed "facilitated" the linear models more than the NNs. Using different techniques (as I proposed in the previous answers) could probably lead to different results, especially for NNs.