## US county-level sociodemographic and health resource data (2018-2019)

Sociodemographic and health-resource data have been collected for each county in the United States, and we want to find out whether there is a relationship between health resources and sociodemographic characteristics.

To do this, we need to choose a health-related target variable for the analysis.

In [2]:
import pandas as pd

data = pd.read_csv("https://raw.githubusercontent.com/4GeeksAcademy/regularized-linear-regression-project-tutorial/main/demographic_health_data.csv")
data.head()

|   | fips | TOT_POP | 0-9 | 0-9 y/o % of total pop | 19-Oct | 10-19 y/o % of total pop | 20-29 | 20-29 y/o % of total pop | 30-39 | 30-39 y/o % of total pop | ... | COPD_number | diabetes_prevalence | diabetes_Lower 95% CI | diabetes_Upper 95% CI | diabetes_number | CKD_prevalence | CKD_Lower 95% CI | CKD_Upper 95% CI | CKD_number | Urban_rural_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 55601 | 6787 | 12.206615 | 7637 | 13.735364 | 6878 | 12.370281 | 7089 | 12.749771 | ... | 3644 | 12.9 | 11.9 | 13.8 | 5462 | 3.1 | 2.9 | 3.3 | 1326 | 3 |
| 1 | 1003 | 218022 | 24757 | 11.355276 | 26913 | 12.344167 | 23579 | 10.814964 | 25213 | 11.564429 | ... | 14692 | 12.0 | 11.0 | 13.1 | 20520 | 3.2 | 3.0 | 3.5 | 5479 | 4 |
| 2 | 1005 | 24881 | 2732 | 10.980266 | 2960 | 11.896628 | 3268 | 13.13452 | 3201 | 12.865239 | ... | 2373 | 19.7 | 18.6 | 20.6 | 3870 | 4.5 | 4.2 | 4.8 | 887 | 6 |
| 3 | 1007 | 22400 | 2456 | 10.964286 | 2596 | 11.589286 | 3029 | 13.522321 | 3113 | 13.897321 | ... | 1789 | 14.1 | 13.2 | 14.9 | 2511 | 3.3 | 3.1 | 3.6 | 595 | 2 |
| 4 | 1009 | 57840 | 7095 | 12.266598 | 7570 | 13.087828 | 6742 | 11.656293 | 6884 | 11.901798 | ... | 4661 | 13.5 | 12.6 | 14.5 | 6017 | 3.4 | 3.2 | 3.7 | 1507 | 2 |


In [3]:
data.info()
data.shape
data.dtypes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3140 entries, 0 to 3139
Columns: 108 entries, fips to Urban_rural_code
dtypes: float64(61), int64(45), object(2)
memory usage: 2.6+ MB


fips                        int64
TOT_POP                     int64
0-9                         int64
0-9 y/o % of total pop    float64
19-Oct                      int64
                           ...   
CKD_prevalence            float64
CKD_Lower 95% CI          float64
CKD_Upper 95% CI          float64
CKD_number                  int64
Urban_rural_code            int64
Length: 108, dtype: object

There are 108 variables in total, and one of them must be chosen as the target variable, y.
With this many variables we don't yet know how many are numeric and how many are categorical, so we identify them first.

In [12]:
for c in data.columns:
    if data[c].dtype == 'object':
        print(c, data[c].dtype)

COUNTY_NAME object
STATE_NAME object
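The same check can be written without an explicit loop. The sketch below uses a small hypothetical DataFrame in place of the county data and asks select_dtypes for every non-numeric column, which also catches text columns stored with pandas' dedicated string dtype rather than object.

```python
import pandas as pd

# Toy frame (hypothetical values) mixing numeric and text columns
df = pd.DataFrame({
    "fips": [1001, 1003],
    "TOT_POP": [55601, 218022],
    "COUNTY_NAME": ["Autauga", "Baldwin"],
    "STATE_NAME": ["Alabama", "Alabama"],
})

# Keep only the columns whose dtype is not numeric
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)  # ['COUNTY_NAME', 'STATE_NAME']
```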


Only two of the 108 columns are of type object (non-numeric), so we can either factorize them or drop them. Here we factorize them.

In [13]:
data['COUNTY_NAME'] = pd.factorize(data['COUNTY_NAME'])[0]
data['STATE_NAME'] = pd.factorize(data['STATE_NAME'])[0]
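pd.factorize replaces each distinct value with an integer code, numbered in order of first appearance, and also returns the distinct values. The toy example below (hypothetical rows, not taken from the dataset) illustrates two caveats that apply to the cell above: the codes are arbitrary labels that a linear model will nonetheless treat as ordered quantities, and a county name shared by counties in different states gets a single code, whereas fips already identifies each county uniquely.

```python
import pandas as pd

# Hypothetical rows: the name "Washington" appears in two different states
df = pd.DataFrame({
    "COUNTY_NAME": ["Autauga", "Washington", "Baldwin", "Washington"],
    "STATE_NAME": ["Alabama", "Alabama", "Alabama", "Arkansas"],
})

# factorize returns (codes, uniques): one integer per row plus the distinct values
codes, uniques = pd.factorize(df["COUNTY_NAME"])
print(codes)          # [0 1 2 1]
print(list(uniques))  # ['Autauga', 'Washington', 'Baldwin']

# Both "Washington" rows share code 1 even though they are different counties
df["COUNTY_NAME"] = codes
df["STATE_NAME"] = pd.factorize(df["STATE_NAME"])[0]
print(df)
```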


After this step all columns are numeric.
The next step is removing duplicate rows.

In [15]:
data = data.drop_duplicates().reset_index(drop = True)
data.describe()

|   | fips | TOT_POP | 0-9 | 0-9 y/o % of total pop | 19-Oct | 10-19 y/o % of total pop | 20-29 | 20-29 y/o % of total pop | 30-39 | 30-39 y/o % of total pop | ... | COPD_number | diabetes_prevalence | diabetes_Lower 95% CI | diabetes_Upper 95% CI | diabetes_number | CKD_prevalence | CKD_Lower 95% CI | CKD_Upper 95% CI | CKD_number | Urban_rural_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | ... | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 | 3140.0 |
| mean | 30401.640764 | 104189.4 | 12740.3 | 11.871051 | 13367.98 | 12.694609 | 14469.33 | 12.283979 | 13916.49 | 11.751535 | ... | 5827.242357 | 13.073503 | 12.088089 | 14.053726 | 9326.577707 | 3.446242 | 3.207516 | 3.710478 | 2466.234076 | 4.63535 |
| std | 15150.559265 | 333583.4 | 41807.3 | 2.124081 | 42284.39 | 1.815044 | 49577.73 | 3.126297 | 48990.95 | 1.696599 | ... | 15720.551934 | 2.724351 | 2.622948 | 2.824828 | 29754.601185 | 0.568059 | 0.52774 | 0.613069 | 7730.422067 | 1.510447 |
| min | 1001.0 | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.0 | 6.092789 | ... | 7.0 | 6.1 | 5.5 | 6.7 | 11.0 | 1.8 | 1.7 | 1.9 | 3.0 | 1.0 |
| 25% | 18180.5 | 10963.25 | 1280.5 | 10.594639 | 1374.5 | 11.674504 | 1263.75 | 10.496774 | 1232.75 | 10.689322 | ... | 815.0 | 11.2 | 10.3 | 12.1 | 1187.75 | 3.1 | 2.9 | 3.3 | 314.75 | 3.0 |
| 50% | 29178.0 | 25800.5 | 3057.0 | 11.802727 | 3274.0 | 12.687422 | 3108.0 | 11.772649 | 3000.5 | 11.580861 | ... | 1963.5 | 12.8 | 11.8 | 13.8 | 2743.0 | 3.4 | 3.2 | 3.7 | 718.0 | 5.0 |
| 75% | 45081.5 | 67913.0 | 8097.0 | 12.95184 | 8822.25 | 13.659282 | 8976.25 | 13.18226 | 8314.25 | 12.639379 | ... | 4727.0 | 14.8 | 13.7 | 15.9 | 6679.25 | 3.8 | 3.5 | 4.1 | 1776.25 | 6.0 |
| max | 56045.0 | 10105520.0 | 1208253.0 | 25.460677 | 1239139.0 | 23.304372 | 1557073.0 | 37.570198 | 1501844.0 | 22.225129 | ... | 434075.0 | 25.6 | 24.2 | 27.0 | 952335.0 | 6.2 | 5.8 | 6.6 | 237766.0 | 6.0 |
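In the describe() output, count stays at 3140 for every column, the same number of rows that info() reported before drop_duplicates, so no exact duplicate rows were removed. A minimal sketch of checking this explicitly, on a hypothetical toy frame:

```python
import pandas as pd

# Toy frame (hypothetical values) with one exact duplicate row
df = pd.DataFrame({"fips": [1001, 1003, 1003, 1005],
                   "TOT_POP": [55601, 218022, 218022, 24881]})

# duplicated() flags every row that repeats an earlier identical row
n_dupes = df.duplicated().sum()
print(n_dupes)  # 1

# Drop the repeats and renumber the index from 0, as the notebook does
deduped = df.drop_duplicates().reset_index(drop=True)
print(len(deduped))  # 3
```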


Our target variable is diabetes_number, the number of people with diabetes in each county.
Next we build the training and test sets by separating y from the features, and then preprocess the features with standardization (StandardScaler).

In [27]:
from sklearn.model_selection import train_test_split

# Feature columns: every column except the target
col = list(data.columns)
col.remove('diabetes_number')

# Separate the features (X) from the target (y)
X = data[col]
y = data['diabetes_number']

# Split into training (80%) and test (20%) samples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

X_train.head()

|   | fips | TOT_POP | 0-9 | 0-9 y/o % of total pop | 19-Oct | 10-19 y/o % of total pop | 20-29 | 20-29 y/o % of total pop | 30-39 | 30-39 y/o % of total pop | ... | COPD_Upper 95% CI | COPD_number | diabetes_prevalence | diabetes_Lower 95% CI | diabetes_Upper 95% CI | CKD_prevalence | CKD_Lower 95% CI | CKD_Upper 95% CI | CKD_number | Urban_rural_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1292 | 26127 | 26625 | 3221 | 12.097653 | 3463 | 13.006573 | 2922 | 10.974648 | 2829 | 10.625352 | ... | 13.0 | 2314 | 13.7 | 12.6 | 14.9 | 3.8 | 3.5 | 4.1 | 771 | 6 |
| 2302 | 42121 | 51266 | 5272 | 10.283619 | 5751 | 11.217961 | 5137 | 10.020286 | 5341 | 10.418211 | ... | 11.5 | 4097 | 13.1 | 11.9 | 14.2 | 3.5 | 3.2 | 3.8 | 1454 | 5 |
| 761 | 18133 | 37779 | 3915 | 10.3629 | 5118 | 13.547209 | 6202 | 16.416528 | 4363 | 11.548744 | ... | 10.4 | 2792 | 12.2 | 11.2 | 13.1 | 2.9 | 2.7 | 3.1 | 871 | 2 |
| 2194 | 40131 | 91984 | 11163 | 12.135806 | 12646 | 13.748043 | 11595 | 12.605453 | 11357 | 12.346712 | ... | 9.3 | 5716 | 11.2 | 10.4 | 12.0 | 3.0 | 2.8 | 3.2 | 2118 | 3 |
| 1241 | 26025 | 134487 | 16698 | 12.41607 | 17666 | 13.135842 | 17281 | 12.849569 | 15993 | 11.891856 | ... | 11.0 | 10002 | 12.5 | 11.7 | 13.4 | 3.4 | 3.2 | 3.6 | 3490 | 4 |


In [29]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train_norm = scaler.transform(X_train)
X_train_norm = pd.DataFrame(X_train_norm, index = X_train.index, columns = col)

X_test_norm = scaler.transform(X_test)
X_test_norm = pd.DataFrame(X_test_norm, index = X_test.index, columns = col)

X_train_norm.head()

|   | fips | TOT_POP | 0-9 | 0-9 y/o % of total pop | 19-Oct | 10-19 y/o % of total pop | 20-29 | 20-29 y/o % of total pop | 30-39 | 30-39 y/o % of total pop | ... | COPD_Upper 95% CI | COPD_number | diabetes_prevalence | diabetes_Lower 95% CI | diabetes_Upper 95% CI | CKD_prevalence | CKD_Lower 95% CI | CKD_Upper 95% CI | CKD_number | Urban_rural_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1292 | -0.301633 | -0.229763 | -0.225393 | 0.102383 | -0.23135 | 0.162374 | -0.229775 | -0.429454 | -0.22378 | -0.665485 | ... | 0.989521 | -0.222477 | 0.244361 | 0.209312 | 0.314479 | 0.644725 | 0.57643 | 0.657452 | -0.21695 | 0.910528 |
| 2302 | 0.761573 | -0.16128 | -0.179851 | -0.754597 | -0.181109 | -0.836073 | -0.188375 | -0.736296 | -0.176225 | -0.785934 | ... | 0.413368 | -0.117073 | 0.021661 | -0.060621 | 0.064137 | 0.109985 | 0.000382 | 0.161947 | -0.135212 | 0.249092 |
| 761 | -0.833037 | -0.198764 | -0.209983 | -0.717144 | -0.195009 | 0.46417 | -0.16847 | 1.320194 | -0.19474 | -0.128551 | ... | -0.009144 | -0.19422 | -0.312388 | -0.330555 | -0.329256 | -0.959495 | -0.959698 | -0.994232 | -0.204982 | -1.735217 |
| 2194 | 0.629287 | -0.048115 | -0.049041 | 0.120407 | -0.029705 | 0.57628 | -0.067671 | 0.094875 | -0.062335 | 0.335452 | ... | -0.431656 | -0.021363 | -0.683554 | -0.63905 | -0.72265 | -0.781249 | -0.767682 | -0.829064 | -0.055748 | -1.07378 |
| 1241 | -0.308413 | 0.070012 | 0.073864 | 0.252809 | 0.080526 | 0.234535 | 0.038603 | 0.173362 | 0.02543 | 0.070962 | ... | 0.221317 | 0.232009 | -0.201038 | -0.137745 | -0.221967 | -0.068262 | 0.000382 | -0.16839 | 0.108446 | -0.412344 |
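StandardScaler performs standardization rather than min-max normalization: it subtracts each column's mean and divides by its standard deviation, both learned from the training split only, and the test split is transformed with those same training statistics. The sketch below, on synthetic data, reproduces the transform by hand with NumPy to make that explicit; note that scikit-learn uses the population standard deviation (ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50, scale=10, size=(100, 3))  # synthetic features
X_test = rng.normal(loc=50, scale=10, size=(20, 3))

scaler = StandardScaler().fit(X_train)  # learns mean_ and scale_ from train only

# Manual version: z = (x - mean) / std, with statistics taken from the training split
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # NumPy's default ddof=0, the same as StandardScaler
X_train_manual = (X_train - mean) / std
X_test_manual = (X_test - mean) / std  # the test split reuses the training statistics

print(np.allclose(scaler.transform(X_train), X_train_manual))  # True
print(np.allclose(scaler.transform(X_test), X_test_manual))    # True
```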


In [30]:
X_train_norm["diabetes_number"] = list(y_train)
X_test_norm["diabetes_number"] = list(y_test)

X_train_norm.to_csv("/workspaces/regularizedLinearReg/data/processed/data_train.csv", index=False)
X_test_norm.to_csv("/workspaces/regularizedLinearReg/data/processed/data_test.csv", index=False)

### Model

In [31]:
train_data = pd.read_csv("/workspaces/regularizedLinearReg/data/processed/data_train.csv")
test_data = pd.read_csv("/workspaces/regularizedLinearReg/data/processed/data_test.csv")
train_data.head()

|   | fips | TOT_POP | 0-9 | 0-9 y/o % of total pop | 19-Oct | 10-19 y/o % of total pop | 20-29 | 20-29 y/o % of total pop | 30-39 | 30-39 y/o % of total pop | ... | COPD_number | diabetes_prevalence | diabetes_Lower 95% CI | diabetes_Upper 95% CI | CKD_prevalence | CKD_Lower 95% CI | CKD_Upper 95% CI | CKD_number | Urban_rural_code | diabetes_number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.301633 | -0.229763 | -0.225393 | 0.102383 | -0.23135 | 0.162374 | -0.229775 | -0.429454 | -0.22378 | -0.665485 | ... | -0.222477 | 0.244361 | 0.209312 | 0.314479 | 0.644725 | 0.57643 | 0.657452 | -0.21695 | 0.910528 | 2823 |
| 1 | 0.761573 | -0.16128 | -0.179851 | -0.754597 | -0.181109 | -0.836073 | -0.188375 | -0.736296 | -0.176225 | -0.785934 | ... | -0.117073 | 0.021661 | -0.060621 | 0.064137 | 0.109985 | 0.000382 | 0.161947 | -0.135212 | 0.249092 | 5416 |
| 2 | -0.833037 | -0.198764 | -0.209983 | -0.717144 | -0.195009 | 0.46417 | -0.16847 | 1.320194 | -0.19474 | -0.128551 | ... | -0.19422 | -0.312388 | -0.330555 | -0.329256 | -0.959495 | -0.959698 | -0.994232 | -0.204982 | -1.735217 | 3698 |
| 3 | 0.629287 | -0.048115 | -0.049041 | 0.120407 | -0.029705 | 0.57628 | -0.067671 | 0.094875 | -0.062335 | 0.335452 | ... | -0.021363 | -0.683554 | -0.63905 | -0.72265 | -0.781249 | -0.767682 | -0.829064 | -0.055748 | -1.07378 | 7913 |
| 4 | -0.308413 | 0.070012 | 0.073864 | 0.252809 | 0.080526 | 0.234535 | 0.038603 | 0.173362 | 0.02543 | 0.070962 | ... | 0.232009 | -0.201038 | -0.137745 | -0.221967 | -0.068262 | 0.000382 | -0.16839 | 0.108446 | -0.412344 | 12987 |


In [32]:
X_train = train_data.drop(["diabetes_number"], axis = 1)
y_train = train_data["diabetes_number"]
X_test = test_data.drop(["diabetes_number"], axis = 1)
y_test = test_data["diabetes_number"]

Now that the sets are defined, we use SelectKBest with the f_regression score to decide which variables to keep. We keep half of the features, as a compromise between carrying too much noise and leaving the model with too few variables.

In [51]:
from sklearn.feature_selection import SelectKBest, f_regression

k = int(len(X_train.columns) * 0.5)
selection_model = SelectKBest(score_func = f_regression, k = k)
selection_model.fit(X_train, y_train)
ix = selection_model.get_support()

X_train_sel = pd.DataFrame(selection_model.transform(X_train), columns = X_train.columns.values[ix])
X_test_sel = pd.DataFrame(selection_model.transform(X_test), columns = X_test.columns.values[ix])

X_train_sel.head()

|   | TOT_POP | 0-9 | 19-Oct | 20-29 | 30-39 | 30-39 y/o % of total pop | 40-49 | 50-59 | 60-69 | 70-79 | ... | anycondition_number | Obesity_Upper 95% CI | Obesity_number | Heart disease_prevalence | Heart disease_Lower 95% CI | Heart disease_Upper 95% CI | Heart disease_number | COPD_number | CKD_number | Urban_rural_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.229763 | -0.225393 | -0.23135 | -0.229775 | -0.22378 | -0.665485 | -0.228216 | -0.22703 | -0.231375 | -0.232151 | ... | -0.231574 | 0.833508 | -0.230636 | 0.866051 | 0.818916 | 0.874465 | -0.224232 | -0.222477 | -0.21695 | 0.910528 |
| 1 | -0.16128 | -0.179851 | -0.181109 | -0.188375 | -0.176225 | -0.785934 | -0.163303 | -0.13912 | -0.112664 | -0.124916 | ... | -0.146243 | -0.324419 | -0.158219 | 0.351189 | 0.255939 | 0.350729 | -0.121664 | -0.117073 | -0.135212 | 0.249092 |
| 2 | -0.198764 | -0.209983 | -0.195009 | -0.16847 | -0.19474 | -0.128551 | -0.193726 | -0.19936 | -0.218215 | -0.220562 | ... | -0.193279 | -0.258876 | -0.201734 | -0.735742 | -0.744908 | -0.696745 | -0.215308 | -0.19422 | -0.204982 | -1.735217 |
| 3 | -0.048115 | -0.049041 | -0.029705 | -0.067671 | -0.062335 | 0.335452 | -0.044847 | -0.033104 | -0.044844 | -0.036279 | ... | -0.016632 | 0.331011 | -0.007673 | -0.449707 | -0.432144 | -0.48725 | -0.021238 | -0.021363 | -0.055748 | -1.07378 |
| 4 | 0.070012 | 0.073864 | 0.080526 | 0.038603 | 0.02543 | 0.070962 | 0.04885 | 0.079829 | 0.119807 | 0.116625 | ... | 0.140471 | 0.52764 | 0.15696 | -0.106466 | -0.056826 | -0.173008 | 0.169262 | 0.232009 | 0.108446 | -0.412344 |


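#### Linear regression

We start with ordinary linear regression as the baseline. Since it has no regularization hyperparameter to tune, the optimization step further down switches to Lasso, a regularized linear regression. Logistic regression would not fit this problem: it is a classification model, while diabetes_number is a count.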

In [61]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
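The model above is fit on the full X_train, so the X_train_sel features chosen with SelectKBest are not actually used. One way to keep scaling, feature selection and the model tied together is a scikit-learn Pipeline. The sketch below uses synthetic data and an arbitrary k; it is not a reproduction of the notebook's results.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))  # synthetic stand-in for the county features
y = 5 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each step is fitted on the training split only; score/predict then apply
# the same scaling and column selection to the test split automatically
pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_regression, k=5),  # keep half of the 10 features
    LinearRegression(),
)
pipe.fit(X_train, y_train)

print(pipe.named_steps["linearregression"].coef_.shape)  # (5,)
print(round(pipe.score(X_test, y_test), 3))
```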

In [62]:
# Inspect the fitted intercept and coefficients
print(f"Intercept (a): {model.intercept_}")
print(f"Coefficients: {model.coef_}")

Intercept (a): 9762.924424760971
Coefficients: [ 9.34786846e+03 -1.00360693e+04  6.52134131e+03  1.34787294e+09
 -9.97990463e+03  1.14068449e+09 -2.21056639e+04  1.98049021e+09
 -6.60268731e+03  1.09506382e+09 -1.74638485e+03  8.52513456e+08
 -6.63928624e+01  9.45595915e+08 -3.65271009e+03  1.58807999e+09
 -4.78592803e+03  1.38400657e+09  2.57174510e+03  9.71474703e+08
 -4.16824320e+03 -1.21848408e+11  8.93197405e+02 -1.07860853e+11
 -2.97890560e+02 -5.74734716e+10  4.57059896e+02 -2.13583871e+10
  1.18938488e+02 -7.90071271e+09 -2.41576188e+02 -1.12478073e+10
 -4.88804494e+03 -1.37291690e+03 -9.26247648e+02  2.14591002e+03
 -2.25270578e+03 -3.35574047e+03 -2.08632458e+02 -1.31996217e+03
  1.36694492e+03 -6.26358946e+03 -1.10860916e+04 -2.18453472e+04
 -2.52046539e+04  8.81468811e+02  1.04761436e+03  9.99159113e+02
  1.51787813e+03  2.71883737e+03  2.87279892e+01  6.56436272e+01
 -9.94806366e+01  9.96931227e+03  3.40688112e+03  1.82190027e+03
 -4.77972195e+03 -5.69383663e+03  5.94750036

In [63]:
y_pred = model.predict(X_test)
y_pred

array([ 1.49951389e+03,  2.14404097e+04,  2.37102871e+03,  9.18133663e+03,
        1.76347338e+04,  1.88405993e+03,  1.13082394e+03,  2.28209871e+03,
       -1.05004464e+02,  6.81836236e+02,  1.83364455e+03,  4.26066922e+02,
        6.84356125e+03,  1.38833333e+03,  1.42445095e+04,  4.02759687e+03,
        9.28148320e+04,  2.09839559e+03,  2.83536723e+03,  9.36227841e+02,
        1.65422733e+03,  4.65061981e+02,  1.02125214e+03,  6.01359563e+02,
        1.89227068e+05,  5.29841253e+03,  9.51226478e+02,  2.33388532e+03,
        1.46664164e+03,  1.44683709e+03,  2.24728384e+03,  3.26591140e+03,
        4.14364357e+03,  4.79785060e+03,  1.27122953e+04,  1.93454016e+03,
        4.75653705e+03,  1.86996027e+03,  7.03137216e+02,  3.06831802e+03,
        2.06977690e+03,  1.59479456e+04,  2.78547560e+02,  8.74486806e+02,
        3.96124819e+03,  6.63083352e+02,  9.44316785e+02,  1.17250601e+04,
        1.82007001e+03,  1.34470105e+03,  2.48996982e+04,  1.76985662e+03,
        2.67876277e+03,  

In [64]:
base_score = model.score(X_test, y_test)
print("Coefficients:", model.coef_)
print("R2 score:", base_score)

Coefficients: [ 9.34786846e+03 -1.00360693e+04  6.52134131e+03  1.34787294e+09
 -9.97990463e+03  1.14068449e+09 -2.21056639e+04  1.98049021e+09
 -6.60268731e+03  1.09506382e+09 -1.74638485e+03  8.52513456e+08
 -6.63928624e+01  9.45595915e+08 -3.65271009e+03  1.58807999e+09
 -4.78592803e+03  1.38400657e+09  2.57174510e+03  9.71474703e+08
 -4.16824320e+03 -1.21848408e+11  8.93197405e+02 -1.07860853e+11
 -2.97890560e+02 -5.74734716e+10  4.57059896e+02 -2.13583871e+10
  1.18938488e+02 -7.90071271e+09 -2.41576188e+02 -1.12478073e+10
 -4.88804494e+03 -1.37291690e+03 -9.26247648e+02  2.14591002e+03
 -2.25270578e+03 -3.35574047e+03 -2.08632458e+02 -1.31996217e+03
  1.36694492e+03 -6.26358946e+03 -1.10860916e+04 -2.18453472e+04
 -2.52046539e+04  8.81468811e+02  1.04761436e+03  9.99159113e+02
  1.51787813e+03  2.71883737e+03  2.87279892e+01  6.56436272e+01
 -9.94806366e+01  9.96931227e+03  3.40688112e+03  1.82190027e+03
 -4.77972195e+03 -5.69383663e+03  5.94750036e+02 -6.88116493e+01
 -1.4987735

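With plain linear regression the model reaches an R² of 0.99 on the test set. A high test score does not by itself indicate overfitting, which would show up as a test R² well below the training R². Two aspects of this result call for caution, though: the coefficients above reach magnitudes of 1e9 to 1e11, a typical symptom of strong multicollinearity among near-redundant columns (for example the population counts by age group and TOT_POP), and some features, such as diabetes_prevalence combined with the population counts, likely encode diabetes_number almost directly, which would inflate R². Next we try regularization to see whether it gives a better result.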
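Overfitting is diagnosed by comparing the score on the training data with the score on held-out data. The sketch below uses synthetic data deliberately set up with almost as many features as training rows, so ordinary least squares fits the training split nearly perfectly but generalizes badly; the same two model.score calls on the notebook's X_train and X_test would show whether such a gap exists there.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic data: 60 rows, 40 features, only the first feature matters.
# 45 training rows against 41 parameters lets OLS almost memorize the training split.
X = rng.normal(size=(60, 40))
y = X[:, 0] + rng.normal(scale=1.0, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# Overfitting shows up as a large gap: high R2 on train, much lower on test
print(f"train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
```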

#### Optimization
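For reference, scikit-learn's Lasso minimizes the squared error plus an L1 penalty on the coefficients, where n is the number of training rows and the intercept b is not penalized:

$$
\min_{w,\,b}\; \frac{1}{2n}\,\lVert y - Xw - b\mathbf{1} \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1
$$

With α = 0 this reduces to ordinary least squares; larger α shrinks the coefficients further and sets some of them exactly to zero. Because the penalty treats every coefficient alike, the features need to be on a common scale first, as they are here after standardization. Lasso() uses α = 1.0 by default.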

In [None]:
from sklearn.linear_model import Lasso

# X_train and X_test are the standardized sets from the preprocessing step;
# the L1 penalty depends on feature scale, so this matters

lasso_model = Lasso()  # default alpha = 1.0
lasso_model.fit(X_train, y_train)



In [67]:
score = lasso_model.score(X_test, y_test)
print("Coefficients:", lasso_model.coef_)
print("R2 score:", score)

Coefficients: [ 6.50584591e+00  2.07797867e+04 -5.56145028e+03  4.44178671e+02
 -1.21382304e+03  3.45408951e+02 -6.51202224e+02  7.87535814e+02
  4.71628749e+02  4.73172085e+02  1.30363137e+04  1.75242231e+02
  5.52603885e+03  2.27494007e+02 -6.22009248e+02  4.98292705e+02
 -4.35204689e+02  4.42112238e+02  2.25807758e+03  4.72545504e+02
 -1.08445802e+04  1.41413153e+02 -1.24474695e+02 -1.57858296e+02
 -5.25869563e+02  1.33020037e+02 -8.62263097e+02 -1.45430961e+02
  1.66344481e+02 -2.01420174e+01 -8.38716791e+02  6.16605183e+01
 -0.00000000e+00 -3.81672446e+01 -9.33218046e+01  1.90961818e+01
 -6.22890254e+01  0.00000000e+00  1.32970501e+02  0.00000000e+00
  3.56216719e+01  5.12713741e+03  3.50559323e+03 -3.58611294e+03
 -5.20349145e+03 -3.03988639e+02 -2.45890052e+02  1.10823217e+02
  4.21007504e+00  5.01140288e+03 -0.00000000e+00 -2.37800020e+01
 -8.71790097e+01  0.00000000e+00  6.85192919e+01 -9.88915555e+02
 -4.69539836e+03 -3.57559571e+00 -4.54232939e+02 -5.18417160e+01
  9.0798421

The Lasso model's R² score is 0.99. R² measures the share of the variance of diabetes_number in the test set that the model explains; it does not mean that 99% of the test predictions were correct. If new data come from the same population of counties, a similarly high R² is a reasonable expectation.
In its current state the model already performs well, but we tune its hyperparameters to try to raise the score further.
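R² is defined as 1 - SS_res / SS_tot: one minus the ratio of the squared prediction errors to the squared deviations of the target from its mean. A hand-worked example with made-up numbers, checked against sklearn.metrics.r2_score: only two of the four predictions are exact, yet R² is 0.975, which is why R² should not be read as a percentage of correct predictions.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])  # two of four predictions are exact

ss_res = np.sum((y_true - y_pred) ** 2)         # 0.25 + 0 + 0.25 + 0 = 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # mean is 6: 9 + 1 + 1 + 9 = 20
r2_manual = 1 - ss_res / ss_tot                 # 1 - 0.5 / 20 = 0.975

print(r2_manual, r2_score(y_true, y_pred))
```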

In [68]:
lasso_model = Lasso(alpha = 0.1, max_iter = 100)
lasso_model.fit(X_train, y_train)



In [69]:
score = lasso_model.score(X_test, y_test)
print("Coefficients:", lasso_model.coef_)
print("R2 score:", score)

Coefficients: [-2.12207073e+03  2.82056323e+04 -2.45943030e+03  7.17880722e+02
  1.88781453e+02  6.33612278e+02 -1.41437245e+03  1.15838687e+03
  1.04124878e+03  7.61820604e+02  3.53151859e+03  4.80695404e+02
  2.01625085e+03  5.74268440e+02 -8.35682474e+01  1.03249522e+03
  1.81809088e+03  8.27569832e+02  1.82123291e+03  8.50480190e+02
 -6.46960482e+03  8.82466740e+01  6.02465525e+02 -1.61851101e+02
 -7.79468792e+02  2.62755183e+02  4.42120035e+02 -2.20805438e+02
  7.63986396e+02 -7.70650338e+01 -1.88341894e+03  9.43424566e+01
  1.26430893e+03 -3.24489433e+02 -1.51527872e+02  6.16614479e+01
 -7.63838925e+01 -5.63235485e+01  2.31840526e+02  8.02645017e+01
  1.24488412e+01  4.37745192e+03  2.50623959e+03 -4.43537244e+03
 -6.48399862e+03 -4.26172341e+02 -3.81510442e+02  9.14755502e+01
 -7.99430670e+01  4.41640432e+03  1.73899536e+02  1.58669654e+01
 -2.37581053e+02 -2.02656344e+02  5.64371058e+02 -1.11301314e+03
 -4.43425637e+02 -1.55569396e+01  3.54946742e+02 -8.00962126e+01
  5.4735373
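To see what alpha does, here is a sketch on synthetic, roughly standardized data (the true coefficients are chosen arbitrarily): a very small alpha leaves the fit close to ordinary least squares, and a large enough alpha shrinks every coefficient exactly to zero. The loop keeps the default max_iter of 1000; lowering it, as in the cells above, gives the coordinate-descent solver fewer passes and may trigger a ConvergenceWarning.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # synthetic features with mean ~0 and std ~1
true_coef = np.array([4.0, -3.0, 0.0, 0.0, 1.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

ols_coef = LinearRegression().fit(X, y).coef_

# Count how many coefficients the L1 penalty sets exactly to zero
for alpha in (0.001, 0.1, 1.0, 100.0):
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    n_zero = int(np.sum(coef == 0))
    print(f"alpha={alpha:<6} zero coefficients={n_zero}  coef={np.round(coef, 2)}")

# Tiny alpha: almost no shrinkage, so the fit is close to ordinary least squares
small = Lasso(alpha=0.001).fit(X, y).coef_
print(np.allclose(small, ols_coef, atol=0.05))  # True

# Large alpha: every coefficient is driven exactly to zero
large = Lasso(alpha=100.0).fit(X, y).coef_
print(np.all(large == 0))  # True
```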

In [72]:
lasso_model = Lasso(alpha = 0.4, max_iter = 150)
lasso_model.fit(X_train, y_train)



In [73]:
score = lasso_model.score(X_test, y_test)
print("Coefficients:", lasso_model.coef_)
print("R2 score:", score)

Coefficients: [-2.02503482e+03  2.75701579e+04 -3.12179174e+03  8.81861361e+02
  1.57982468e+02  7.52748657e+02 -1.51176293e+03  1.40451373e+03
  1.41093385e+03  8.63531789e+02  4.52947696e+03  5.71154876e+02
  2.42915058e+03  6.54756510e+02 -3.61111062e+02  1.16373613e+03
  1.63573717e+03  9.22036421e+02  2.11795245e+03  8.92011757e+02
 -7.21595208e+03  1.54229287e+02  5.28181455e+02 -1.18041021e+02
 -7.68328040e+02  2.68015999e+02  3.00256351e+02 -2.03492317e+02
  5.89154567e+02 -5.08352216e+01 -1.58817311e+03  8.13706247e+01
  1.13656963e+03 -2.68320929e+02 -5.03838028e+01  6.35108693e+00
 -4.90418099e+01 -0.00000000e+00  2.14898675e+02  5.27191474e+01
  2.33415841e+01  4.54552512e+03  2.83215183e+03 -4.50781607e+03
 -6.88757597e+03 -3.84331346e+02 -3.21308829e+02  1.38132766e+02
  1.27700589e-01  4.07662992e+03  1.35502250e+02  1.04051333e+00
 -1.58828918e+02  2.13009702e+01  5.01850264e+02 -1.06355585e+03
 -2.48017365e+02 -1.64205327e+01  2.76649706e+02 -7.88749701e+01
  4.3472427

The R² score rose slightly again, so the best manually tuned parameters are those of the second Lasso model.
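Here alpha was tuned by hand and compared on the test set. A common alternative is to choose alpha by cross-validation on the training split alone, for instance with LassoCV, and use the test set only once at the end. A sketch on synthetic data, with an arbitrary grid of 30 alphas between 0.001 and 10:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))  # synthetic stand-in for the standardized features
y = X[:, :3] @ np.array([5.0, -2.0, 1.0]) + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5-fold cross-validation over the alpha grid, using only the training split
alphas = np.logspace(-3, 1, 30)
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10_000)
lasso_cv.fit(X_train, y_train)

print("best alpha:", lasso_cv.alpha_)
# The test set is used once, after alpha has been chosen
print("test R2:", round(lasso_cv.score(X_test, y_test), 3))
```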

### Conclusions

Tuning the hyperparameters raised R² only marginally; even so, the second model is considered better than the first.
Both models reached an R² of about 0.99 on the test data, meaning they explain roughly 99% of the variance of diabetes_number in counties that were not used for training; R² is not the share of correct predictions. We conclude that the model fits well and expect similar performance on new data, provided they come from the same population of counties.