In [1]:
# initial setup
%run "../../../common/0_notebooks_base_setup.py"


/Users/csuarezgurruchaga/Desktop/Digital-House/CLASE_23/dsad_2021/common
default checking
Running command `conda list`... ok
jupyterlab=2.2.6 already installed
pandas=1.1.5 already installed
bokeh=2.2.3 already installed
seaborn=0.11.0 already installed
matplotlib=3.3.2 already installed
ipywidgets=7.5.1 already installed
pytest=6.2.1 already installed
chardet=4.0.0 already installed
psutil=5.7.2 already installed
scipy=1.5.2 already installed
statsmodels=0.12.1 already installed
scikit-learn=0.23.2 already installed
xlrd=2.0.1 already installed
Running command `conda install --yes nltk=3.5.0`... ok
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.


unidecode=1.1.1 already installed
pydotplus=2.0.2 already installed
pandas-datareader=0.9.0 already installed
flask=1.1.2 already installed


[<img src="https://www.digitalhouse.com/ar/logo-DH.png" width="400" height="200" align='right'>](http://digitalhouse.com.ar/)

# Regularization

### Note:

In this exercise we scale the dataset's features with `MinMaxScaler` so that you have a worked example that uses an alternative to `StandardScaler`, not because we think `MinMaxScaler` performs better than `StandardScaler` on this problem.

---

Although min-max normalization is a commonly used technique that is useful when we need values in a bounded interval, standardization can be more practical for many machine learning algorithms.

The reason is that many linear models that are trained iteratively (for example, by gradient descent) initialize their weights to 0 or to small random values close to 0.

With standardization we center each feature column at mean 0 with standard deviation 1, so the columns share the location and scale of a standard normal distribution (the shape of each distribution is unchanged), which makes the weights easier to learn.

In addition, standardization preserves useful information about outliers and makes the algorithm less sensitive to them, in contrast to min-max scaling, which squeezes the data into a limited range of values.
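
The contrast between the two scalings can be sketched with plain NumPy. The array below is a hypothetical toy feature with a single extreme value, not data from the diamonds dataset, and the formulas are written out by hand rather than taken from `MinMaxScaler` or `StandardScaler`.

```python
import numpy as np

# Hypothetical toy feature with one extreme value (not taken from the diamonds data).
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max scaling: (x - min) / (max - min), always lands in [0, 1].
min_max = (values - values.min()) / (values.max() - values.min())

# Standardization: (x - mean) / std, gives mean 0 and standard deviation 1.
standardized = (values - values.mean()) / values.std()

print("min-max:     ", np.round(min_max, 3))
print("standardized:", np.round(standardized, 3))
```

With min-max scaling the single outlier takes the value 1 and the other four values are compressed into a narrow band near 0. The same effect can be seen later in this notebook in the scaled `y` column, whose maximum in the raw data is 58.9.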


## Imports

In [2]:
import pandas as pd
import numpy as np

import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn import linear_model
from sklearn import metrics

import statsmodels.api as sm
from statsmodels.tools import eval_measures


## Dataset

This dataset contains the prices and other attributes of almost 54,000 diamonds.

Its features are:

* **price**: price in US dollars (\$326--\$18,823).  **This is the target variable**.

* carat: weight of the diamond (0.2--5.01)

* cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)

* color: diamond colour, from J (worst) to D (best)

* clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))

* x: length in mm (0--10.74)

* y: width in mm (0--58.9)

* z: depth in mm (0--31.8)

* depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)

* table: width of top of diamond relative to widest point (43--95)

Source: https://www.kaggle.com/shivam2503/diamonds

## Reading the data

In [3]:
data = pd.read_csv('../Data/diamonds.csv')
data.head()

Unnamed: 0,carat,cut,color,clarity,depth,table,price,x,y,z
0,0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
1,0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
2,0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31
3,0.29,Premium,I,VS2,62.4,58.0,334,4.2,4.23,2.63
4,0.31,Good,J,SI2,63.3,58.0,335,4.34,4.35,2.75


In [4]:
data.shape

(53940, 10)

## Exercise 1

Let's scale the features and create the dummy variables needed to train a regression model that predicts the value of `price` for each record.

https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features

In [5]:
categoricals = ['cut', 'color', 'clarity']

enc = OneHotEncoder(drop='first')
X = data[categoricals]
enc.fit(X)
enc.categories_

[array(['Fair', 'Good', 'Ideal', 'Premium', 'Very Good'], dtype=object),
 array(['D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype=object),
 array(['I1', 'IF', 'SI1', 'SI2', 'VS1', 'VS2', 'VVS1', 'VVS2'],
       dtype=object)]

In [6]:
dummies = enc.transform(X).toarray()
dummies

array([[0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.]])

In [7]:
dummies.shape

(53940, 17)

In [8]:
dummies_df = pd.DataFrame(dummies)
dummies_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53935,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
53936,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
53937,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
53938,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0


In [9]:
col_names = [categoricals[i] + '_' + enc.categories_[i] for i in range(len(categoricals)) ]

col_names

[array(['cut_Fair', 'cut_Good', 'cut_Ideal', 'cut_Premium',
        'cut_Very Good'], dtype=object),
 array(['color_D', 'color_E', 'color_F', 'color_G', 'color_H', 'color_I',
        'color_J'], dtype=object),
 array(['clarity_I1', 'clarity_IF', 'clarity_SI1', 'clarity_SI2',
        'clarity_VS1', 'clarity_VS2', 'clarity_VVS1', 'clarity_VVS2'],
       dtype=object)]

In [10]:
col_names_drop_first = [sublist[i] for sublist in col_names for i in range(len(sublist)) if i != 0]
col_names_drop_first

['cut_Good',
 'cut_Ideal',
 'cut_Premium',
 'cut_Very Good',
 'color_E',
 'color_F',
 'color_G',
 'color_H',
 'color_I',
 'color_J',
 'clarity_IF',
 'clarity_SI1',
 'clarity_SI2',
 'clarity_VS1',
 'clarity_VS2',
 'clarity_VVS1',
 'clarity_VVS2']

In [11]:
dummies_df.columns = col_names_drop_first
dummies_df

Unnamed: 0,cut_Good,cut_Ideal,cut_Premium,cut_Very Good,color_E,color_F,color_G,color_H,color_I,color_J,clarity_IF,clarity_SI1,clarity_SI2,clarity_VS1,clarity_VS2,clarity_VVS1,clarity_VVS2
0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53935,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
53936,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
53937,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
53938,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0


Another option is to use `get_dummies`

https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html

In [12]:
pd.get_dummies(X, drop_first = True)

Unnamed: 0,cut_Good,cut_Ideal,cut_Premium,cut_Very Good,color_E,color_F,color_G,color_H,color_I,color_J,clarity_IF,clarity_SI1,clarity_SI2,clarity_VS1,clarity_VS2,clarity_VVS1,clarity_VVS2
0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0
1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0
2,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0
3,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0
4,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53935,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
53936,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
53937,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0
53938,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0


Now we scale the numerical features to the [0, 1] range:

`carat` `depth` `table` `x` `y` `z`

using `MinMaxScaler`

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler


In [13]:
numericals = ['carat', 'depth', 'table', 'x', 'y', 'z']

X = data[numericals]

scaler = MinMaxScaler()
scaler.fit(X)

std_numerical_data = scaler.transform(X)
std_df = pd.DataFrame(std_numerical_data)
std_df.columns = [i + '_std' for i in numericals]
std_df

Unnamed: 0,carat_std,depth_std,table_std,x_std,y_std,z_std
0,0.006237,0.513889,0.230769,0.367784,0.067572,0.076415
1,0.002079,0.466667,0.346154,0.362197,0.065195,0.072642
2,0.006237,0.386111,0.423077,0.377095,0.069100,0.072642
3,0.018711,0.538889,0.288462,0.391061,0.071817,0.082704
4,0.022869,0.563889,0.288462,0.404097,0.073854,0.086478
...,...,...,...,...,...,...
53935,0.108108,0.494444,0.269231,0.535382,0.097793,0.110063
53936,0.108108,0.558333,0.230769,0.529795,0.097623,0.113522
53937,0.103950,0.550000,0.326923,0.527002,0.096435,0.111950
53938,0.137214,0.500000,0.288462,0.572626,0.103905,0.117610


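Our feature dataset therefore consists of the dummy variables and the scaled numerical variables. The `_std` suffix comes from the code above, although the values are min-max scaled rather than standardized.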

In [14]:
X = pd.concat([dummies_df, std_df], axis = 1)
X

Unnamed: 0,cut_Good,cut_Ideal,cut_Premium,cut_Very Good,color_E,color_F,color_G,color_H,color_I,color_J,...,clarity_VS1,clarity_VS2,clarity_VVS1,clarity_VVS2,carat_std,depth_std,table_std,x_std,y_std,z_std
0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.006237,0.513889,0.230769,0.367784,0.067572,0.076415
1,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.002079,0.466667,0.346154,0.362197,0.065195,0.072642
2,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.006237,0.386111,0.423077,0.377095,0.069100,0.072642
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,1.0,0.0,0.0,0.018711,0.538889,0.288462,0.391061,0.071817,0.082704
4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.022869,0.563889,0.288462,0.404097,0.073854,0.086478
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53935,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.108108,0.494444,0.269231,0.535382,0.097793,0.110063
53936,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.108108,0.558333,0.230769,0.529795,0.097623,0.113522
53937,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.103950,0.550000,0.326923,0.527002,0.096435,0.111950
53938,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.137214,0.500000,0.288462,0.572626,0.103905,0.117610


And the target variable is `price`

In [15]:
y = data.price

## Exercise 2

Let's split the dataset into train and test sets

In [16]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 117)
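
In this notebook the scaler is fitted on the full dataset before the split. A common alternative is to learn the scaling statistics from the training rows only and reuse them on the test rows. The sketch below illustrates that idea with NumPy on a hypothetical random array; the names `X_toy`, `train_idx` and `test_idx` are illustrative and do not appear elsewhere in the notebook.

```python
import numpy as np

rng = np.random.default_rng(117)
# Hypothetical numeric feature matrix standing in for the numeric columns.
X_toy = rng.uniform(0.0, 10.0, size=(100, 2))

# Simple 70/30 split by shuffled index (a stand-in for train_test_split).
idx = rng.permutation(len(X_toy))
train_idx, test_idx = idx[:70], idx[70:]

# Learn the scaling statistics from the training rows only.
col_min = X_toy[train_idx].min(axis=0)
col_max = X_toy[train_idx].max(axis=0)

X_train_scaled = (X_toy[train_idx] - col_min) / (col_max - col_min)
# Test rows reuse the training statistics, so they may fall slightly outside [0, 1].
X_test_scaled = (X_toy[test_idx] - col_min) / (col_max - col_min)

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```

With scikit-learn the equivalent pattern would be to call `scaler.fit` on the training features and `scaler.transform` on both sets.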

## Exercise 3

Let's fit a multiple linear regression on the training set using statsmodels and evaluate the significance of each coefficient

In [17]:
# We have to add a constant explicitly:
X_train_sm = sm.add_constant(X_train)

model = sm.OLS(y_train, X_train_sm).fit()

model.summary()

0,1,2,3
Dep. Variable:,price,R-squared:,0.92
Model:,OLS,Adj. R-squared:,0.92
Method:,Least Squares,F-statistic:,18960.0
Date:,"Mon, 18 Oct 2021",Prob (F-statistic):,0.0
Time:,19:29:25,Log-Likelihood:,-318900.0
No. Observations:,37758,AIC:,637900.0
Df Residuals:,37734,BIC:,638100.0
Df Model:,23,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,605.6677,197.624,3.065,0.002,218.319,993.017
cut_Good,587.5927,39.948,14.709,0.000,509.294,665.891
cut_Ideal,829.0476,39.667,20.900,0.000,751.299,906.796
cut_Premium,774.2104,38.335,20.196,0.000,699.072,849.349
cut_Very Good,743.0087,38.300,19.400,0.000,667.940,818.078
color_E,-226.0425,21.320,-10.602,0.000,-267.830,-184.255
color_F,-282.8884,21.535,-13.137,0.000,-325.097,-240.680
color_G,-499.0250,21.074,-23.679,0.000,-540.331,-457.719
color_H,-1003.0732,22.435,-44.711,0.000,-1047.046,-959.101

0,1,2,3
Omnibus:,10637.428,Durbin-Watson:,2.0
Prob(Omnibus):,0.0,Jarque-Bera (JB):,310890.318
Skew:,0.747,Prob(JB):,0.0
Kurtosis:,16.978,Cond. No.,314.0


In [41]:
no_reg_model_params = model.params
no_reg_model_params

const              605.667749
cut_Good           587.592690
cut_Ideal          829.047578
cut_Premium        774.210400
cut_Very Good      743.008715
color_E           -226.042453
color_F           -282.888446
color_G           -499.024986
color_H          -1003.073182
color_I          -1490.788960
color_J          -2380.458791
clarity_IF        5413.271592
clarity_SI1       3717.545056
clarity_SI2       2754.524335
clarity_VS1       4637.599019
clarity_VS2       4321.151595
clarity_VVS1      5083.510522
clarity_VVS2      5021.388087
carat_std        54475.791402
depth_std        -2276.199186
table_std        -1462.430826
x_std           -10941.853262
y_std             -420.690376
z_std            -1292.337580
dtype: float64

We see that the p-values of the coefficients for y_std and z_std are high, so we cannot reject the null hypothesis that the coefficients of those two variables are 0.

In [19]:
sm_prediction_train = model.predict(X_train_sm)
print(eval_measures.rmse(y_train, sm_prediction_train))

X_test_sm = sm.add_constant(X_test)
sm_prediction_test = model.predict(X_test_sm)
print(eval_measures.rmse(y_test, sm_prediction_test))


1126.6429803257492
1137.7066896840536


## Exercise 4

We fit the model with Lasso regularization and use cross-validation to estimate the best value of $\alpha$ for this problem

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html

What is the best value of $\alpha$ for this problem?

What score ($R^2$) does this model obtain on the training set?
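
For reference, the objective that scikit-learn documents for its Lasso estimators is shown below, where $n$ is the number of training samples, $X$ the feature matrix, $y$ the target and $w$ the coefficient vector. The intercept is fitted separately and is not penalized.

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \alpha\,\lVert w \rVert_1
$$

`LassoCV` evaluates each candidate $\alpha$ from the grid on the `cv` folds and keeps the one with the lowest average validation error. Larger values of $\alpha$ push more coefficients to exactly 0.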

In [20]:
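# We explicitly define the search grid for the hyperparameter.
# Note: `normalize` is accepted by scikit-learn 0.23.2 (the version installed above);
# it was deprecated and later removed in newer releases.
lm_lasso = linear_model.LassoCV(alphas=[0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01,
                                        0.05, 0.1, 1, 5, 10],
                                normalize = False, cv = 5)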

model_cv = lm_lasso.fit(X_train, y_train)

model_cv.score(X_train, y_train)

0.920344576494572

In [21]:
model_cv.coef_

array([ 5.85509327e+02,  8.31885697e+02,  7.75093667e+02,  7.42889792e+02,
       -2.23597371e+02, -2.80598258e+02, -4.96387467e+02, -9.99781369e+02,
       -1.48644827e+03, -2.37531160e+03,  5.38546744e+03,  3.68890616e+03,
        2.72677009e+03,  4.60898935e+03,  4.29316952e+03,  5.05632690e+03,
        4.99380981e+03,  5.42209208e+04, -2.27810854e+03, -1.39746598e+03,
       -1.10520262e+04, -0.00000000e+00, -1.28848643e+01])
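
The coefficient that is exactly 0 in the output above is the typical effect of the L1 penalty. scikit-learn fits Lasso by coordinate descent; the block below is a minimal sketch of that idea on hypothetical toy data, without an intercept and without convergence checks. It is not scikit-learn's implementation, and the names `soft_threshold`, `lasso_coordinate_descent`, `X_toy` and `y_toy` are illustrative.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values with |rho| <= alpha become exactly 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1 (no intercept) by cyclic updates."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            partial_resid = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_resid / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
# Hypothetical data: the third feature has no effect on the target.
y_toy = 3.0 * X_toy[:, 0] - 2.0 * X_toy[:, 1] + rng.normal(scale=0.1, size=200)

w_lasso = lasso_coordinate_descent(X_toy, y_toy, alpha=0.1)
w_no_penalty = lasso_coordinate_descent(X_toy, y_toy, alpha=0.0)
print("alpha = 0.1:", w_lasso)
print("alpha = 0.0:", w_no_penalty)
```

With `alpha=0.0` the updates reduce to ordinary least squares. With a positive `alpha` the irrelevant feature is set to exactly 0 and the remaining coefficients are shrunk toward 0.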

In [22]:
model_cv.intercept_

519.6726533321871

In [23]:
model_cv.alpha_

0.05

In [24]:
model_cv.score(X_train, y_train)

0.920344576494572

## Exercise 5

Let's fit the training data with a Lasso-regularized regression, using statsmodels and the value of $\alpha$ computed in the previous exercise.

Let's use scatterplots to show

* the coefficients of the multiple linear regression obtained in Exercise 3 against the coefficients of the Lasso-regularized linear regression for the trained model.

* the training residuals from Exercise 3 against the training residuals of the regularized model.

https://www.statsmodels.org/0.6.1/generated/statsmodels.regression.linear_model.OLS.fit_regularized.html
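
As documented for `fit_regularized`, statsmodels minimizes an elastic net objective. Writing $w$ for the `L1_wt` argument, $n$ for the number of observations and $\beta$ for the parameter vector, it reads:

$$
\frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 + \alpha\left(\frac{1 - w}{2}\,\lVert \beta \rVert_2^2 + w\,\lVert \beta \rVert_1\right)
$$

Setting $w = 1$ leaves only the L1 term, which is the Lasso penalty. One difference to keep in mind when comparing with scikit-learn: here $\beta$ includes the constant column added by `add_constant`, so a scalar `alpha` also penalizes the intercept, whereas scikit-learn's `Lasso` does not penalize `intercept_`.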

In [25]:
best_alpha = model_cv.alpha_

#L1_wt : 0, the fit is ridge regression. 1, the fit is the lasso 

no_reg_model = sm.OLS(y_train, X_train_sm)

reg_model = no_reg_model.fit_regularized(alpha = best_alpha, L1_wt = 1)


In [26]:
reg_model.params

const             2895.912284
cut_Good           371.826397
cut_Ideal          264.475793
cut_Premium        288.015942
cut_Very Good      368.561174
color_E           -232.313472
color_F           -312.275408
color_G           -487.556369
color_H           -938.460368
color_I          -1364.929918
color_J          -2210.644786
clarity_IF        3282.680978
clarity_SI1       1558.391870
clarity_SI2        633.211414
clarity_VS1       2459.251861
clarity_VS2       2176.430037
clarity_VVS1      2997.127317
clarity_VVS2      2895.893234
carat_std        39326.796755
depth_std       -10484.048501
table_std        -5473.598927
x_std             2283.459596
y_std                0.000000
z_std             2478.957642
dtype: float64

In [42]:
sns.scatterplot(x=reg_model.params, y=no_reg_model_params);

In [43]:
reg_residuals = y_train - reg_model.fittedvalues

linear_residuals = y_train - model.fittedvalues

sns.scatterplot(x = reg_residuals, y = linear_residuals)

<AxesSubplot:xlabel='price'>

## Exercise 6

Using statsmodels and scikit-learn, let's compute the test performance of the model we built and compare the results from the two libraries, using the mean absolute error (MAE) and the root mean squared error (RMSE) as metrics

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
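
The two metrics can be written out directly with NumPy. The values below are hypothetical prices chosen only to illustrate the formulas; they are not results from this notebook.

```python
import numpy as np

# Hypothetical true and predicted prices, only to illustrate the formulas.
y_true = np.array([300.0, 500.0, 1000.0, 4000.0])
y_pred = np.array([350.0, 450.0, 1200.0, 3600.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

print("MAE: ", mae)
print("RMSE:", rmse)
```

RMSE squares each error before averaging, so large errors weigh more and RMSE is never smaller than MAE on the same predictions.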

In [29]:

sm_prediction = reg_model.predict(X_test_sm)
sm_prediction

42170     1131.494743
5732      4238.179799
3407        39.173665
25415    15905.693317
5615      4672.560426
             ...     
1258      4129.234376
47750     2724.781375
38570      669.042105
31172     -196.972835
12270     6875.815588
Length: 16182, dtype: float64

In [30]:
skl_lasso = linear_model.Lasso(alpha = best_alpha, fit_intercept=True, normalize=False)

skl_lasso = skl_lasso.fit(X= X_train, y = y_train)

skl_prediction = skl_lasso.predict(X_test)


In [31]:
skl_residuals = y_test - skl_prediction

sm_residuals = y_test - sm_prediction

sns.scatterplot(x = skl_residuals, y = sm_residuals)

<AxesSubplot:xlabel='price'>

In [32]:

lasso_coef = np.insert(skl_lasso.coef_, 0, skl_lasso.intercept_)

sns.scatterplot(x = lasso_coef, y = reg_model.params);


In [33]:
lasso_coef

array([ 5.19672653e+02,  5.85509327e+02,  8.31885697e+02,  7.75093667e+02,
        7.42889792e+02, -2.23597371e+02, -2.80598258e+02, -4.96387467e+02,
       -9.99781369e+02, -1.48644827e+03, -2.37531160e+03,  5.38546744e+03,
        3.68890616e+03,  2.72677009e+03,  4.60898935e+03,  4.29316952e+03,
        5.05632690e+03,  4.99380981e+03,  5.42209208e+04, -2.27810854e+03,
       -1.39746598e+03, -1.10520262e+04, -0.00000000e+00, -1.28848643e+01])

In [34]:
reg_model.params

const             2895.912284
cut_Good           371.826397
cut_Ideal          264.475793
cut_Premium        288.015942
cut_Very Good      368.561174
color_E           -232.313472
color_F           -312.275408
color_G           -487.556369
color_H           -938.460368
color_I          -1364.929918
color_J          -2210.644786
clarity_IF        3282.680978
clarity_SI1       1558.391870
clarity_SI2        633.211414
clarity_VS1       2459.251861
clarity_VS2       2176.430037
clarity_VVS1      2997.127317
clarity_VVS2      2895.893234
carat_std        39326.796755
depth_std       -10484.048501
table_std        -5473.598927
x_std             2283.459596
y_std                0.000000
z_std             2478.957642
dtype: float64

Metrics in `statsmodels`

https://www.statsmodels.org/stable/generated/statsmodels.tools.eval_measures.rmse.html

https://www.statsmodels.org/stable/generated/statsmodels.tools.eval_measures.meanabs.html

In [35]:
eval_measures.rmse(y_test, sm_prediction)

1238.342041371642

In [36]:
eval_measures.meanabs(y_test, sm_prediction)

862.4836582045623

In [37]:
# from scikit-learn
metrics.r2_score(y_test, sm_prediction)

0.9033482626243312

Metrics in `scikit-learn`

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html

In [38]:
np.sqrt(metrics.mean_squared_error(y_test, skl_prediction))

1137.5509799751533

In [39]:
metrics.mean_absolute_error(y_test, skl_prediction)

746.3658443842598

In [40]:
# from scikit-learn
metrics.r2_score(y_test, skl_prediction)

0.9184413237524383

## References

https://www.kaggle.com/yogendran/intro-to-linear-ridge-and-lasso-regressions
    
https://towardsdatascience.com/intro-to-regularization-with-ridge-and-lasso-regression-with-sklearn-edcf4c117b7a