### First model: All variables with linear regression

The adjusted R^2 of each model is used as the model selection criterion. The goal is a model that explains the price of a property from its statistically significant characteristics.
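As a quick illustration of the selection metric, the sketch below recomputes the adjusted R^2 from figures that the OLS summaries in this section report. The function name `adjusted_r2` and its `k_constant` argument are illustrative and not part of the notebook; the formula is the usual degrees-of-freedom correction, with `k_constant=0` for a model fitted without an intercept.

```python
def adjusted_r2(r2, n_obs, df_resid, k_constant=1):
    """Adjusted R^2 from R^2, sample size and residual degrees of freedom."""
    return 1 - (1 - r2) * (n_obs - k_constant) / df_resid

# Figures from the first summary: R^2 = 0.781, 3543 observations, 2708 residual df.
first_model = adjusted_r2(0.781, 3543, 2708, k_constant=1)

# Figures from the second iteration: uncentered R^2 = 0.909, 3460 residual df.
second_model = adjusted_r2(0.909, 3543, 3460, k_constant=0)

print(round(first_model, 3), round(second_model, 3))  # 0.714 0.907
```

Both results agree with the `Adj. R-squared` values shown in the corresponding summaries, up to the rounding of the reported R^2.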

In [200]:
import pandas as pd

In [201]:
#Read the data prepared for modelling.
df = pd.read_csv('datosPreparados.csv')
df

|  | title | body | bathrooms | bedrooms | currency | fee | price | price_display | square_feet | address | ... | Garbage Disposal | Fireplace | Luxury | Wood Floors | Playground | Storage | Clubhouse | Basketball | View | longitud_descripcion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Studio apartment 333 Hyde St | This unit is located at 333 Hyde St, San Franc... | 1.0 | 0.0 | USD | No | 1495 | $ 1,495 | 138 | 333 Hyde St | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 193 |
| 1 | Studio apartment 57 Taylor Street | This unit is located at 57 Taylor Street, San ... | 1.0 | 0.0 | USD | No | 1695 | $ 1,695 | 190 | 57 Taylor St | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 297 |
| 2 | Studio Cottage 214 | New Bern Studio includes : 1 bedrooms, 1 micro... | 1.0 | 1.0 | USD | No | 1560 | $ 1,560 | 200 | 180 Moonlight Lake Drive | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 761 |
| 3 | One BR 501 Chapel Drive | This unit is located at 501 Chapel Drive, Tall... | 1.0 | 1.0 | USD | No | 544 | $ 544 | 200 | 501 Chapel Dr | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 304 |
| 4 | Studio apartment 420 W. Fullerton Pkwy | This unit is located at 420 W. Fullerton Pkwy,... | 1.0 | 1.0 | USD | No | 942 | $ 942 | 225 | 420 W Fullerton Parkway | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 298 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3538 | Five BR 55 West 10th Avenue | This unit is located at 55 West 10th Avenue, C... | 4.0 | 5.0 | USD | No | 4200 | $ 4,200 | 4736 | 55 West 10th Ave | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 308 |
| 3539 | Five BR 757 N Saint Andrews St | This unit is located at 757 N Saint Andrews St... | 5.0 | 5.0 | USD | No | 4500 | $ 4,500 | 4741 | 757 N Saint Andrews St | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 142 |
| 3540 | Eight BR 46 N Breese Terrace | This unit is located at 46 N Breese Terrace, M... | 4.0 | 8.0 | USD | No | 8345 | $ 8,345 | 4900 | 46 N Breese Terrace | ... | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 271 |
| 3541 | Four BR 864 Teakwood Rd | This unit is located at 864 Teakwood Rd, Los A... | 5.0 | 4.0 | USD | No | 19500 | $ 19,500 | 5000 | 864 Teakwood Road | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 140 |


Iteration 1 for model selection with linear regression

In [202]:
#All variables that are not relevant for estimating the model are removed.
features = df.drop(columns=['price','title','body','currency','fee','price_display','address']).columns

x = df[features]

#The model aims to explain the price of a property from its characteristics.
y = df['price']

x.head()

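|  | bathrooms | bedrooms | square_feet | latitude | longitude | time | category_housing/rent/apartment | pets_allowed_Cats | pets_allowed_Cats,Dogs | pets_allowed_Dogs | ... | Garbage Disposal | Fireplace | Luxury | Wood Floors | Playground | Storage | Clubhouse | Basketball | View | longitud_descripcion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 138 | 377599.0 | -1224379.0 | 1577358313 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 193 |
| 1 | 1.0 | 0.0 | 190 | 377599.0 | -1224379.0 | 1577015121 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 297 |
| 2 | 1.0 | 1.0 | 200 | 35096.0 | -770272.0 | 1576406273 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 761 |
| 3 | 1.0 | 1.0 | 200 | 304601.0 | -842714.0 | 1577359108 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 304 |
| 4 | 1.0 | 1.0 | 225 | 418625.0 | -876825.0 | 1577359301 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 298 |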


Training and test data

In [203]:
import statsmodels.api as sm

# regression using ordinary least squares (OLS)
model = sm.OLS(y, x).fit()

# summary of results
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  price   R-squared:                       0.781
Model:                            OLS   Adj. R-squared:                  0.714
Method:                 Least Squares   F-statistic:                     11.60
Date:                Sun, 23 Feb 2025   Prob (F-statistic):               0.00
Time:                        00:42:20   Log-Likelihood:                -26078.
No. Observations:                3543   AIC:                         5.383e+04
Df Residuals:                    2708   BIC:                         5.898e+04
Df Model:                         834                                         
Covariance Type:            nonrobust                                         
                                      coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------------
bathrooms 

In [204]:
#The non-significant variables of the model are selected.
insignificant_vars = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars)


Non-significant variables: Index(['bedrooms', 'latitude', 'longitude', 'time',
       'category_housing/rent/apartment', 'pets_allowed_Cats',
       'pets_allowed_Cats,Dogs', 'pets_allowed_Dogs',
       'pets_allowed_No permitido', 'has_photo_No',
       ...
       'Tennis', 'AC', 'Patio/Deck', 'Washer Dryer', 'Fireplace', 'Luxury',
       'Wood Floors', 'Storage', 'Basketball', 'View'],
      dtype='object', length=776)
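The iterations that follow repeat one step by hand: refit, collect the variables whose p-value exceeds 0.05, drop them all, and refit. A minimal sketch of that loop is shown below. It is not the notebook's code: it uses plain NumPy on synthetic data, and it approximates the two-sided p-values with the normal distribution instead of the t distribution behind `model.pvalues`, an assumption that is only reasonable for large samples. The names `ols_pvalues` and `backward_eliminate` are hypothetical.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Approximate two-sided p-values of OLS coefficients (normal approximation)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)                   # residual variance
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t_stats = beta / std_err
    return np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

def backward_eliminate(X, y, names, alpha=0.05, max_iter=20):
    """Drop every column whose p-value exceeds alpha, refit, and repeat."""
    keep = list(range(X.shape[1]))
    for _ in range(max_iter):
        pvals = ols_pvalues(X[:, keep], y)
        survivors = [col for col, pv in zip(keep, pvals) if pv <= alpha]
        if len(survivors) == len(keep) or not survivors:
            keep = survivors
            break
        keep = survivors
    return [names[col] for col in keep]

# Synthetic data: only the first column actually drives y (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + rng.normal(size=500)
all_names = ["x1", "x2", "x3", "x4"]
selected = backward_eliminate(X, y, all_names)
print(selected)
```

The loop stops as soon as a refit leaves no variable above the threshold, which is the stopping rule the manual iterations apply.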


Iteration 2 for model selection with linear regression

In [205]:
#To estimate a new model, the non-significant variables of the last estimated model are additionally removed.
features = df.drop(columns=(['price','title','body','currency','fee','price_display','address']+list(insignificant_vars))).columns

x = df[features]

y = df['price']

model = sm.OLS(y, x).fit()

# summary of results
print(model.summary())

insignificant_vars_2 = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars_2)
print(len(insignificant_vars_2))

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.909
Model:                            OLS   Adj. R-squared (uncentered):              0.907
Method:                 Least Squares   F-statistic:                              418.2
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:20   Log-Likelihood:                         -26806.
No. Observations:                3543   AIC:                                  5.378e+04
Df Residuals:                    3460   BIC:                                  5.429e+04
Df Model:                          83                                                  
Covariance Type:            nonrobust                                                  
                              coef    std err          t      P>|t|      [0.025      0.975]
----------------------------
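The summary above labels its fit statistic `R-squared (uncentered)`, whereas the first model reported a plain `R-squared`. statsmodels switches to the uncentered version when it finds no constant in the design matrix, and the two are measured against different baselines (the sum of squares of `y` around zero versus around its mean), so their values are not directly comparable. The sketch below illustrates the gap with NumPy on synthetic data; the variable names and numbers are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(500, 2000, size=200)               # hypothetical square footage
y = 1000 + 0.8 * x + rng.normal(0, 200, size=200)  # hypothetical prices

# Fit y on x WITHOUT a constant column, as sm.OLS(y, x) does.
X = x.reshape(-1, 1)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
ssr = resid @ resid

# Uncentered R^2 compares against predicting zero; centered against the mean.
r2_uncentered = 1 - ssr / (y @ y)
r2_centered = 1 - ssr / ((y - y.mean()) @ (y - y.mean()))

print(r2_uncentered, r2_centered)
```

The uncentered value can never be lower than the centered one for the same residuals, so a jump in R^2 between a centered and an uncentered summary does not by itself show a better fit.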

Iteration 3 for model selection with linear regression

In [206]:
#Since the adjusted R^2 of the last model improved, a new model is now estimated without the non-significant variables of the last model.
features = df.drop(columns=['price','title','body','currency','fee','price_display','address']+list(insignificant_vars)+list(insignificant_vars_2)).columns

x = df[features]

y = df['price']

model = sm.OLS(y, x).fit()

# summary of results
print(model.summary())

insignificant_vars_3 = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars_3)

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.908
Model:                            OLS   Adj. R-squared (uncentered):              0.906
Method:                 Least Squares   F-statistic:                              603.0
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:21   Log-Likelihood:                         -26834.
No. Observations:                3543   AIC:                                  5.378e+04
Df Residuals:                    3486   BIC:                                  5.413e+04
Df Model:                          57                                                  
Covariance Type:            nonrobust                                                  
                              coef    std err          t      P>|t|      [0.025      0.975]
----------------------------

Iteration 4

In [207]:
#Since the adjusted R^2 does not change considerably and there are still non-significant variables, a new model is estimated without the non-significant variables of the last model.
features = df.drop(columns=['price','title','body','currency','fee','price_display','address']+list(insignificant_vars)+list(insignificant_vars_2)+list(insignificant_vars_3)).columns

x = df[features]

y = df['price']

model = sm.OLS(y, x).fit()

# summary of results
print(model.summary())

insignificant_vars_4 = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars_4)

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.908
Model:                            OLS   Adj. R-squared (uncentered):              0.906
Method:                 Least Squares   F-statistic:                              634.9
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:21   Log-Likelihood:                         -26839.
No. Observations:                3543   AIC:                                  5.379e+04
Df Residuals:                    3489   BIC:                                  5.412e+04
Df Model:                          54                                                  
Covariance Type:            nonrobust                                                  
                              coef    std err          t      P>|t|      [0.025      0.975]
----------------------------

Iteration 5

In [208]:
#Since the adjusted R^2 does not change considerably and there are still non-significant variables, a new model is estimated without the non-significant variables of the last model.
features = df.drop(columns=['price','title','body','currency','fee','price_display','address']+list(insignificant_vars)+list(insignificant_vars_2)+list(insignificant_vars_3)+list(insignificant_vars_4)).columns

x = df[features]

y = df['price']

model = sm.OLS(y, x).fit()

# summary of results
print(model.summary())

insignificant_vars_5 = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars_5)

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.907
Model:                            OLS   Adj. R-squared (uncentered):              0.906
Method:                 Least Squares   F-statistic:                              658.2
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:21   Log-Likelihood:                         -26843.
No. Observations:                3543   AIC:                                  5.379e+04
Df Residuals:                    3491   BIC:                                  5.411e+04
Df Model:                          52                                                  
Covariance Type:            nonrobust                                                  
                              coef    std err          t      P>|t|      [0.025      0.975]
----------------------------

Iteration 6

In [209]:
#Since the adjusted R^2 does not change considerably and there are still non-significant variables, a new model is estimated without the non-significant variables of the last model.
features = df.drop(columns=['price','title','body','currency','fee','price_display','address']+list(insignificant_vars)+list(insignificant_vars_2)+list(insignificant_vars_3)+list(insignificant_vars_4)+list(insignificant_vars_5)).columns

x = df[features]

y = df['price']

model = sm.OLS(y, x).fit()

# summary of results
print(model.summary())

insignificant_vars_6 = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars_6)

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.907
Model:                            OLS   Adj. R-squared (uncentered):              0.906
Method:                 Least Squares   F-statistic:                              670.5
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:21   Log-Likelihood:                         -26845.
No. Observations:                3543   AIC:                                  5.379e+04
Df Residuals:                    3492   BIC:                                  5.411e+04
Df Model:                          51                                                  
Covariance Type:            nonrobust                                                  
                              coef    std err          t      P>|t|      [0.025      0.975]
----------------------------

In [210]:
#Since no non-significant variables remain in the model, this last estimated model is taken as the best model.
variables_significativas = model.pvalues[model.pvalues < 0.05].index
print("Significant variables: " + str(len(variables_significativas)))
for i in variables_significativas:
    print(str(i)+": "+str(model.params[i]))

Significant variables: 52
bathrooms: 228.78678608983859
square_feet: 0.7966438958299303
cityname_Austin: 257.3890804246009
cityname_Bellevue: 715.7128293477062
cityname_Berlin: -1107.551798638517
cityname_Beverly Hills: 1932.5354050464512
cityname_Boston: 1073.7875282566374
cityname_Chicago: 634.6666863280345
cityname_Chula Vista: -572.1021203544792
cityname_Corona: -710.8125941039973
cityname_Crowley: -968.989373910445
cityname_Detroit: 1475.462557966469
cityname_Evanston: 607.3559683354026
cityname_Fond Du Lac: -554.9545760030064
cityname_Ingalls: -1183.0963825953263
cityname_Jersey City: 1435.3575586341756
cityname_Los Angeles: 1036.6567658085075
cityname_Maricopa: -1774.4456574560616
cityname_Miami: 1177.4064240643906
cityname_Miami Beach: 3885.7868950071775
cityname_Moreno Valley: -625.7367102251072
cityname_Mountain View: 1265.4588599532296
cityname_New Bern: 1021.5259479904231
cityname_Oakland: 958.2862179196381
cityname_Odessa: 314.42208400901495
cityname_Palo Alto: 2164.534

Accounting for outliers when estimating the model

In [211]:
# Cook's distance
model_cooksd = model.get_influence().cooks_distance[0]

# number of observations n
n = x.shape[0]

# threshold
critical_d = 4/n
print("Cook's distance threshold:", critical_d)

# points that could be highly influential outliers
out_d = model_cooksd > critical_d

print(x.index[out_d], "\n", model_cooksd[out_d])

Cook's distance threshold: 0.0011289867344058708
Index([   0,    1,    2,    5,   10,   47,   63,   85,   92,  106,
       ...
       3527, 3528, 3529, 3530, 3535, 3536, 3537, 3540, 3541, 3542],
      dtype='int64', length=146) 
 [2.78826375e-03 2.22399455e-03 3.05434103e+00 2.31193828e-03
 2.36280392e-03 1.56198617e-03 1.16791108e-03 5.94108930e-02
 3.55799021e-03 7.22187266e-03 5.67388277e-03 1.49069346e-03
 6.56979983e-03 3.11660526e-03 1.43861751e-02 2.11330544e-03
 1.22191574e-03 2.27929175e-03 1.33870724e-03 1.09626711e-02
 5.60178810e-03 1.65159867e-03 6.11368204e-03 1.73118608e-03
 3.07638147e-03 4.04817497e-03 1.18909654e-03 2.08450826e-03
 2.88354416e-03 2.81519289e-03 1.34226504e-03 1.52878928e-03
 1.66763674e-02 2.15892822e-03 2.05209158e-03 4.80542201e-01
 1.32524177e-03 2.55855903e-03 1.41326859e-03 5.17862188e-02
 6.29116144e-03 1.40550515e-03 1.17651374e-03 9.84095847e-03
 1.70331069e-02 1.21864343e-03 1.25152794e-03 6.25152455e-03
 3.02174715e-03 1.26057716e-03 5.21

  return self.resid / sigma / np.sqrt(1 - hii)
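For reference, Cook's distance combines each observation's residual with its leverage (the diagonal of the hat matrix), and the 4/n rule of thumb used above flags observations whose distance exceeds that threshold. The following sketch computes it directly with NumPy on a small synthetic dataset; `cooks_distance` is a hypothetical helper, not the statsmodels implementation. Because the formula divides by one minus the leverage, an observation with leverage equal to 1 causes a division by zero, which may be what the warning fragment in the output above refers to.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each row of an OLS fit of y on X."""
    n, p = X.shape
    hat = X @ np.linalg.pinv(X)             # hat (projection) matrix
    leverage = np.diag(hat)
    resid = y - hat @ y
    s2 = resid @ resid / (n - p)            # residual variance estimate
    return resid**2 / (p * s2) * leverage / (1 - leverage) ** 2

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
X = np.column_stack([np.ones_like(x), x])
y = 3 + 2 * x + rng.normal(size=50)
y[-1] += 30                                  # plant one gross outlier

d = cooks_distance(X, y)
threshold = 4 / len(y)                       # same rule of thumb as above
flagged = np.flatnonzero(d > threshold)
print(flagged)
```

The planted observation (index 49) is among the flagged points; other points may also cross the threshold, since 4/n is a fairly permissive cut-off.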


Removing outliers to re-estimate the model

In [212]:
#The outliers of the last estimated model are removed in order to re-estimate the model.
x_out = x.drop(x.index[out_d])
y_out = y.drop(y.index[out_d])

In [213]:
model = sm.OLS(y_out, x_out).fit()

# summary of results
print(model.summary())

insignificant_vars_7 = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars_7)

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.936
Model:                            OLS   Adj. R-squared (uncentered):              0.935
Method:                 Least Squares   F-statistic:                              1199.
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:21   Log-Likelihood:                         -24705.
No. Observations:                3397   AIC:                                  4.949e+04
Df Residuals:                    3356   BIC:                                  4.974e+04
Df Model:                          41                                                  
Covariance Type:            nonrobust                                                  
                              coef    std err          t      P>|t|      [0.025      0.975]
----------------------------

In [214]:
#Since the adjusted R^2 improved, the model without outliers is taken as the best model.
variables = model.params.index
print("Variables in the model: " + str(len(variables)))
#Iterate over the variables of the current model rather than the list saved from the previous one.
for i in variables:
    print(str(i)+": "+str(model.params[i]))

Variables in the model: 52
bathrooms: 297.1949141986848
square_feet: 0.6366846491494366
cityname_Austin: 259.112906257318
cityname_Bellevue: 766.6010358264723
cityname_Berlin: -1015.5949640223765
cityname_Beverly Hills: -1.1801682435349256e-11
cityname_Boston: 1039.6611989132266
cityname_Chicago: 611.2996674486535
cityname_Chula Vista: -591.2922469935601
cityname_Corona: -567.1981904955488
cityname_Crowley: -7.740035215724419e-12
cityname_Detroit: 1518.218462159002
cityname_Evanston: 582.9036282717268
cityname_Fond Du Lac: -500.5233838097615
cityname_Ingalls: -1032.5714023045
cityname_Jersey City: -6.85229173134684e-12
cityname_Los Angeles: 882.0782011042595
cityname_Maricopa: -1382.7161183949172
cityname_Miami: 1293.3487125535823
cityname_Miami Beach: 3884.95420601059
cityname_Moreno Valley: -610.4410621402038
cityname_Mountain View: -1.386662611080912e-12
cityname_New Bern: 9.64071743265929e-15
cityname_Oakland: 944.2232912134623
cityname_Odessa: 354.60534737941805
cityname_Palo Al
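Several coefficients in the output above are of the order of 1e-11 or smaller (for example `cityname_Beverly Hills`), and the summary reports 41 model degrees of freedom although 52 coefficients are printed. One plausible explanation, stated here as an assumption, is that removing the outliers left some city dummies with no remaining observations, so their columns are all zeros and the least-squares solution assigns them a coefficient that is numerically zero. The sketch below reproduces that effect with NumPy and shows how such columns could be detected; the data and names are illustrative.

```python
import numpy as np

# Tiny design matrix: one numeric column and two city dummies (hypothetical).
X = np.array([
    [1.0, 1.0, 0.0],
    [2.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],   # the only observation of the second city
    [3.0, 1.0, 0.0],
])
y = np.array([1500.0, 1800.0, 9000.0, 2100.0])

# Drop the row treated as an outlier; the second dummy becomes all zeros.
X_out = np.delete(X, 2, axis=0)
y_out = np.delete(y, 2)

coef = np.linalg.pinv(X_out) @ y_out          # minimum-norm least squares
empty_cols = np.flatnonzero(~X_out.any(axis=0))
rank = np.linalg.matrix_rank(X_out)
print(coef, empty_cols, rank)
```

Dropping the columns listed in `empty_cols` before refitting would leave only coefficients that the remaining data can actually estimate.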

### Second model: All variables with polynomial regression

Degree 2

Iteration 1

In [215]:
from sklearn.preprocessing import PolynomialFeatures

#Polynomial regression models are now estimated in search of a better model.
#In this case, a polynomial regression of degree 2 is tried.
poly = PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)
x_poly = poly.fit_transform(x)

feature_names = poly.get_feature_names_out(x.columns)

x_poly = pd.DataFrame(x_poly,columns=feature_names)

#Products of distinct variables are removed. Note that this filter keeps only
#the pure power terms (names of the form "variable^2"), so the original
#first-degree terms are dropped as well.
names = []
for name in feature_names:
    parts = name.split("^")
    if parts[0] in x.columns:
        if len(parts) > 1 and len(parts[1]) == 1:
            names.append(name)

x_poly = x_poly[names]

model = sm.OLS(y, x_poly).fit()

# Model summary
print(model.summary())

insignificant_vars_5 = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars_5)

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.874
Model:                            OLS   Adj. R-squared (uncentered):              0.872
Method:                 Least Squares   F-statistic:                              475.6
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:21   Log-Likelihood:                         -27387.
No. Observations:                3543   AIC:                                  5.488e+04
Df Residuals:                    3492   BIC:                                  5.519e+04
Df Model:                          51                                                  
Covariance Type:            nonrobust                                                  
                                coef    std err          t      P>|t|      [0.025      0.975]
--------------------------
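To make the effect of the name filter in the cell above concrete, the sketch below applies the same test to the names that `PolynomialFeatures(degree=2, include_bias=False)` generates for two columns. The list of names is written out by hand, following scikit-learn's naming convention (`a^2` for powers, `a b` for products), so the example runs without scikit-learn.

```python
columns = ["bathrooms", "square_feet"]

# Names as get_feature_names_out would return them for degree 2 without bias.
feature_names = [
    "bathrooms",
    "square_feet",
    "bathrooms^2",
    "bathrooms square_feet",
    "square_feet^2",
]

kept = []
for name in feature_names:
    parts = name.split("^")
    # Same condition as the notebook: a known column followed by a one-character power.
    if parts[0] in columns and len(parts) > 1 and len(parts[1]) == 1:
        kept.append(name)

print(kept)  # ['bathrooms^2', 'square_feet^2']
```

Only the squared terms survive: the products are removed, but so are the first-degree terms. For 0/1 dummy columns the square equals the original column, so those terms carry the same information as the dummies themselves.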

Iteration 2

In [216]:
#A new model is estimated without the non-significant variables.
x_poly = x_poly.drop(columns = insignificant_vars_5)

model = sm.OLS(y, x_poly).fit()

# Model summary
print(model.summary())

insignificant_vars_6 = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars_6)

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.873
Model:                            OLS   Adj. R-squared (uncentered):              0.872
Method:                 Least Squares   F-statistic:                              589.1
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:21   Log-Likelihood:                         -27398.
No. Observations:                3543   AIC:                                  5.488e+04
Df Residuals:                    3502   BIC:                                  5.513e+04
Df Model:                          41                                                  
Covariance Type:            nonrobust                                                  
                                coef    std err          t      P>|t|      [0.025      0.975]
--------------------------

In [217]:
# Cook's distance
model_cooksd = model.get_influence().cooks_distance[0]

# number of observations n
n = x_poly.shape[0]

# threshold
critical_d = 4/n
print("Cook's distance threshold:", critical_d)

# points that could be highly influential outliers
out_d = model_cooksd > critical_d

print(x_poly.index[out_d], "\n", model_cooksd[out_d])

Cook's distance threshold: 0.0011289867344058708
Index([   0,    1,    5,   10,   29,   47,   63,   66,   85,   92,
       ...
       3532, 3533, 3534, 3535, 3536, 3537, 3538, 3539, 3541, 3542],
      dtype='int64', length=163) 
 [3.06074765e-03 2.41401456e-03 2.42521792e-03 3.62909463e-03
 1.14161322e-03 2.37096908e-03 1.15254540e-03 1.15934921e-03
 1.03004801e-01 8.06158625e-03 1.62921235e-03 1.79966930e-03
 1.23976864e-03 2.39339360e-03 4.54216861e-03 3.92347365e-03
 1.35040140e-02 3.43046485e-03 1.93543078e-03 1.97303036e-03
 1.18756593e-03 5.24583423e-03 1.34144397e-03 3.71263963e-03
 2.32294730e-03 1.57525771e-03 3.99463811e-03 1.41029494e-03
 1.15985969e-03 5.51206100e-03 1.49895826e-03 3.66055725e-03
 5.55392701e-03 2.13273609e-03 1.52150705e-03 1.23357721e-03
 2.57547098e-03 1.81908349e-03 2.34610685e-02 4.47450891e-03
 4.11107461e-03 1.62767210e-03 4.55414841e-01 1.77423931e-03
 6.59507127e-03 1.76159664e-03 4.87685296e-02 1.23762813e-03
 1.23628596e-03 1.67864268e-02 1.21

  return self.resid / sigma / np.sqrt(1 - hii)


In [218]:
x_poly = x_poly.drop(x_poly.index[out_d])
y_poly = y.drop(y.index[out_d])

model = sm.OLS(y_poly, x_poly).fit()

# summary of results
print(model.summary())

insignificant_vars_7 = model.pvalues[model.pvalues > 0.05].index
print("Non-significant variables:", insignificant_vars_7)

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.894
Model:                            OLS   Adj. R-squared (uncentered):              0.893
Method:                 Least Squares   F-statistic:                              880.7
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:21   Log-Likelihood:                         -25426.
No. Observations:                3380   AIC:                                  5.092e+04
Df Residuals:                    3348   BIC:                                  5.111e+04
Df Model:                          32                                                  
Covariance Type:            nonrobust                                                  
                                coef    std err          t      P>|t|      [0.025      0.975]
--------------------------

### Best model using Statsmodels

Because the business questions concern how the characteristics of an apartment explain its price, the adjusted R^2 is used as the selection metric. For this reason, the model chosen with statsmodels is the linear regression model without non-significant variables and without outliers.
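The comparison behind this choice can be summarised with the adjusted R^2 values reported in the summaries of this section. The dictionary below simply copies those figures; the labels are informal descriptions introduced for this example, and the models with outliers removed were fitted on slightly different subsets of the data.

```python
# Adjusted R^2 (uncentered) as reported in the summaries above.
adj_r2 = {
    "linear, all observations": 0.906,
    "linear, outliers removed": 0.935,
    "squared terms, all observations": 0.872,
    "squared terms, outliers removed": 0.893,
}

best = max(adj_r2, key=adj_r2.get)
print(best, adj_r2[best])  # linear, outliers removed 0.935
```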

In [219]:
model = sm.OLS(y_out, x_out).fit()

# summary of results
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.936
Model:                            OLS   Adj. R-squared (uncentered):              0.935
Method:                 Least Squares   F-statistic:                              1199.
Date:                Sun, 23 Feb 2025   Prob (F-statistic):                        0.00
Time:                        00:42:21   Log-Likelihood:                         -24705.
No. Observations:                3397   AIC:                                  4.949e+04
Df Residuals:                    3356   BIC:                                  4.974e+04
Df Model:                          41                                                  
Covariance Type:            nonrobust                                                  
                              coef    std err          t      P>|t|      [0.025      0.975]
----------------------------

In [220]:
#A DataFrame is created with the model's variables and their respective coefficients.
modelo = pd.DataFrame({"Variables":model.params.index,
                       "Coeficientes":model.params.values})

In [221]:
#The model is exported to a CSV file.
modelo.to_csv("modeloFinal.csv",index = False)