<a id = "top"></a>
# Modeling

<a id="home"></a>

#### This Notebook
- [Linear Regression Modeling](#linear)
- [Standardizing the Predictors](#scale)
- [Ridge Regression Modeling](#ridge)
- [Lasso Regression Modeling](#lasso)
- [ElasticNet Regression Modeling](#enet)
- [Models using all numeric columns](#numeric)

#### Kaggle Submissions
- [MLR Model](#sub1)
- [MLR Model with Power Transformation](#sub2)
- [Ridge Model](#sub3)
- [Lasso Model](#sub4)

#### Other Notebooks
- [Cleaning and EDA](cleaning_and_EDA.ipynb)

### Importing

In [205]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.metrics import r2_score
from sklearn.preprocessing import PowerTransformer, StandardScaler

%matplotlib inline

In [308]:
ames = pd.read_csv("../datasets/ames.csv")

In [207]:
ames.head()

Unnamed: 0,id,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,...,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
0,109,533352170,60,RL,68.0,13517,Pave,,IR1,Lvl,...,0,0,,,,0,3,2010,WD,130500
1,544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,...,0,0,,,,0,4,2009,WD,220000
2,153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,...,0,0,,,,0,1,2010,WD,109000
3,318,916386060,60,RL,73.0,9802,Pave,,Reg,Lvl,...,0,0,,,,0,4,2010,WD,174000
4,255,906425045,50,RL,82.0,14235,Pave,,IR1,Lvl,...,0,0,,,,0,3,2010,WD,138500


<a id = "linear"></a>

## Linear Regression Modeling
---
- [Back to top](#top)

### First Model
---
Picking a few strongly correlated features

In [208]:
features = ["gr_liv_area", "overall_qual", "garage_area", "garage_cars"]

In [209]:
# Confirming that our features have no null values

ames[features].isna().sum()

gr_liv_area     0
overall_qual    0
garage_area     0
garage_cars     0
dtype: int64

In [210]:
lr = LinearRegression()

X = ames[features]
y = ames["saleprice"]

mlr_model = lr.fit(X, y)

In [211]:
mlr_model.score(X, y)

0.7578979067748626

In [212]:
mlr_model.coef_

array([   48.12075931, 27780.06743078,    62.80868165,  5558.42375848])

In [213]:
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state = 35, 
                                                    test_size = 0.25)

In [214]:
cross_val_score(mlr_model, X_train, y_train, cv = 5)

array([0.76568822, 0.67259394, 0.73937328, 0.79102134, 0.73358479])
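`cross_val_score` fits a fresh clone of the estimator on each fold, so the earlier fit on the full data does not feed into these scores. The sketch below illustrates this on synthetic data; `X_demo`, `y_demo`, and the coefficient values are made up for the example and are not part of the Ames data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption: three predictors, linear signal)
rng = np.random.default_rng(35)
X_demo = rng.normal(size=(120, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=120)

# Fit on everything, as the notebook does
model = LinearRegression().fit(X_demo, y_demo)
coef_before = model.coef_.copy()

# Score on a subset; each fold is fit on a clone, not on `model` itself
scores = cross_val_score(model, X_demo[:80], y_demo[:80], cv=5)

print(scores.mean())
print(np.array_equal(coef_before, model.coef_))  # the fitted model is untouched
```

Because the folds are refit from scratch, the cross-validation scores reflect only the rows passed to `cross_val_score`.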

### Second Model
---
Picking all strongly correlated features

In [215]:
ames_corr = ames.corr()["saleprice"]

In [216]:
# Isolating only the features that have a strong positive correlation
# with sale price. None of the negatively correlated columns were
# strong enough to justify including in the model.

features = ames_corr[(ames_corr >= 0.5) & (ames_corr != 1.0)].index
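The correlation filter can be wrapped in a small helper. This is only a sketch: `select_correlated` and the toy DataFrame are hypothetical and not part of the notebook. It restricts the frame to numeric columns first, because newer pandas versions may raise an error when `corr()` meets string columns, and it drops the target by name instead of testing for a correlation of exactly 1.0.

```python
import numpy as np
import pandas as pd

def select_correlated(df, target, threshold=0.5):
    """Return numeric columns whose correlation with `target` is >= threshold."""
    corr = df.select_dtypes("number").corr()[target]
    corr = corr.drop(target)  # remove the target's correlation with itself
    return corr[corr >= threshold].index

# Toy data (assumption): one informative column, one noise column, one string column
rng = np.random.default_rng(0)
n = 200
area = rng.normal(1500, 300, n)
toy = pd.DataFrame({
    "area": area,
    "noise": rng.normal(0, 1, n),
    "zoning": ["RL"] * n,
    "price": 100 * area + rng.normal(0, 5000, n),
})

selected = select_correlated(toy, "price")
print(list(selected))
```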

In [217]:
# Confirming that our features have no null values

ames[features].isna().sum()

overall_qual      0
year_built        0
year_remod/add    0
mas_vnr_area      0
total_bsmt_sf     0
1st_flr_sf        0
gr_liv_area       0
full_bath         0
totrms_abvgrd     0
garage_yr_blt     0
garage_cars       0
garage_area       0
dtype: int64

In [218]:
# At the time of the first submission, the garage year built column
# still had some null values, so in the interest of time the column
# was dropped. Later models include it.

features = features.drop("garage_yr_blt")

In [219]:
lr = LinearRegression()

X = ames[features]
y = ames["saleprice"]

mlr_model = lr.fit(X, y)

In [220]:
mlr_model.score(X, y)

0.7976690016050638

In [221]:
mlr_model.coef_

array([ 1.91108864e+04,  2.22091111e+02,  3.77693903e+02,  4.00992090e+01,
        1.57307033e+01,  1.73362680e+01,  4.52223092e+01, -6.09547052e+03,
       -1.34534485e+02,  5.86774920e+03,  2.99241842e+01])

In [222]:
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state = 35, 
                                                    test_size = 0.25)

In [223]:
cross_val_score(mlr_model, X_train, y_train, cv = 5)

array([0.81750876, 0.66703368, 0.79138565, 0.83955468, 0.70419145])

In [224]:
cross_val_score(mlr_model, X_train, y_train, cv = 5).mean()

0.7639348451191761

<a id = "sub1"></a>

### First Submission, Using MLR Model 2

In [225]:
ames_test = pd.read_csv("../datasets/ames_test.csv")

In [226]:
submission = pd.DataFrame()

submission["Id"] = ames_test["id"]

In [227]:
# Confirming we have no null values in the test set

ames_test[features].isna().sum()

overall_qual      0
year_built        0
year_remod/add    0
mas_vnr_area      0
total_bsmt_sf     0
1st_flr_sf        0
gr_liv_area       0
full_bath         0
totrms_abvgrd     0
garage_cars       0
garage_area       0
dtype: int64

In [228]:
test_X = ames_test[features]

In [229]:
submission["SalePrice"] = mlr_model.predict(test_X)

In [230]:
submission.head()

Unnamed: 0,Id,SalePrice
0,2658,154628.719992
1,2718,205538.027838
2,2414,192405.566787
3,1989,130062.964291
4,625,185955.2832


In [231]:
submission.to_csv("../datasets/submission_1.csv", index = False)

> RMSE, as calculated by Kaggle: 39382.82867

### Second Model, using Power Transformations
---
This model uses the same features plus the "garage year built" column, with a power transformation applied to both the predictors and the target. The process is a little lengthy because I was still working out how to apply the power transform; I kept it here so I could better evaluate my workflow. Click <a href = "#ridge">here</a> to skip to ridge regression.
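As a point of comparison, the transform-fit-predict-invert sequence worked through below can also be expressed with scikit-learn's `Pipeline` and `TransformedTargetRegressor`, which refit the transformers inside each cross-validation fold and invert the target transform automatically at prediction time. This is a swapped-in alternative, not the approach the notebook takes, and the synthetic `X_demo` and `y_demo` are assumptions made for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic skewed predictors and target (assumption, not the Ames data)
rng = np.random.default_rng(35)
X_demo = rng.lognormal(mean=0.0, sigma=0.5, size=(300, 3))
y_demo = np.exp(1.0 + 0.5 * np.log(X_demo).sum(axis=1)
                + rng.normal(scale=0.1, size=300))

model = TransformedTargetRegressor(
    # Predictors are power transformed inside the pipeline
    regressor=make_pipeline(PowerTransformer(), LinearRegression()),
    # The target is power transformed for fitting and inverted on predict
    transformer=PowerTransformer(),
)

# R^2 is computed on the original scale of y
scores = cross_val_score(model, X_demo, y_demo, cv=5)

model.fit(X_demo, y_demo)
preds = model.predict(X_demo[:5])  # already back in the original units
print(scores.mean(), preds)
```

With this arrangement there is no separate `inverse_transform` step to remember, and the transformers never see the held-out fold while fitting.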

In [232]:
features = ames_corr[(ames_corr >= 0.5) & (ames_corr != 1.0)].index

In [233]:
ames[features].isna().sum()

overall_qual      0
year_built        0
year_remod/add    0
mas_vnr_area      0
total_bsmt_sf     0
1st_flr_sf        0
gr_liv_area       0
full_bath         0
totrms_abvgrd     0
garage_yr_blt     0
garage_cars       0
garage_area       0
dtype: int64

In [234]:
lr = LinearRegression()

X = ames[features]
y = ames["saleprice"]

mlr_model = lr.fit(X, y)

In [235]:
mlr_model.score(X, y)

0.7976692072790448

In [236]:
mlr_model.coef_

array([ 1.91101090e+04,  2.20870145e+02,  3.77004664e+02,  4.01032795e+01,
        1.57274568e+01,  1.73493678e+01,  4.52296673e+01, -6.10955796e+03,
       -1.36293853e+02,  2.62317370e+00,  5.88261445e+03,  2.98519448e+01])

In [237]:
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state = 35, 
                                                    test_size = 0.25)

In [238]:
cross_val_score(mlr_model, X_train, y_train, cv = 5)

array([0.81740666, 0.66699623, 0.79119805, 0.83870926, 0.70368882])

In [239]:
cross_val_score(mlr_model, X_train, y_train, cv = 5).mean()

0.7635998024537661

In [240]:
pt_X = PowerTransformer()
pt_X.fit(X_train)
X_train_pt = pt_X.transform(X_train)
X_test_pt = pt_X.transform(X_test)

pt_y = PowerTransformer()
pt_y.fit(y_train.to_frame()) 
y_train_pt = pt_y.transform(y_train.to_frame())
y_test_pt = pt_y.transform(y_test.to_frame())

In [241]:
cross_val_score(mlr_model, X_train_pt, y_train_pt, cv = 5)

array([0.86848388, 0.79883225, 0.84492835, 0.83998425, 0.82546548])

In [242]:
cross_val_score(mlr_model, X_train_pt, y_train_pt, cv = 5).mean()

0.835538842812839

In [243]:
X = ames[features]
y = ames["saleprice"]

In [244]:
lr = LinearRegression()

pt_X = PowerTransformer()
pt_X.fit(X)
X = pt_X.transform(X)

pt_y = PowerTransformer()
pt_y.fit(y.to_frame()) 
y = pt_y.transform(y.to_frame())

In [245]:
mlr_model_pt = lr.fit(X, y)

In [246]:
mlr_model_pt.score(X, y)

0.8418379368234711

In [247]:
predictions_pt = mlr_model_pt.predict(X)

In [248]:
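# Note: r2_score expects (y_true, y_pred). The arguments are swapped here,
# so this value differs from mlr_model_pt.score(X, y) above.
r2_score(predictions_pt, y)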

0.8121229083909831
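`r2_score` is not symmetric in its arguments: the first is treated as the true values and the second as the predictions. A minimal example with made-up numbers shows how the order changes the result.

```python
from sklearn.metrics import r2_score

# Made-up values for illustration
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]

r2_correct = r2_score(y_true, y_pred)   # residuals compared to the variance of y_true
r2_swapped = r2_score(y_pred, y_true)   # residuals compared to the variance of y_pred

print(round(r2_correct, 3))  # about 0.9
print(round(r2_swapped, 3))  # about 0.8
```

The residuals are the same in both calls; only the total sum of squares in the denominator changes, which is why the two scores differ.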

In [249]:
pred = mlr_model_pt.predict(X)

In [250]:
# reshape(-1, 1) returns a 2-D array with a single column, the shape inverse_transform expects
pred_reversed = pt_y.inverse_transform(pred.reshape(-1,1))

In [251]:
y = ames["saleprice"]

In [252]:
r2_score(y, pred_reversed)

0.809156719601258

<a id = "sub2"></a>

### Second Submission, Using MLR Model 2 with Power Transformation

In [253]:
submission = pd.DataFrame()

submission["Id"] = ames_test["id"]

In [254]:
# Confirming we have no null values in the test set

ames_test[features].isna().sum()

overall_qual      0
year_built        0
year_remod/add    0
mas_vnr_area      0
total_bsmt_sf     0
1st_flr_sf        0
gr_liv_area       0
full_bath         0
totrms_abvgrd     0
garage_yr_blt     0
garage_cars       0
garage_area       0
dtype: int64

In [255]:
test_X = ames_test[features]

In [256]:
test_X = pt_X.transform(test_X)

In [257]:
predictions_pt = mlr_model_pt.predict(test_X)

In [258]:
submission["SalePrice"] = pt_y.inverse_transform(predictions_pt.reshape(-1,1))

In [259]:
submission.head()

Unnamed: 0,Id,SalePrice
0,2658,148398.977267
1,2718,193606.913157
2,2414,184722.908271
3,1989,130200.840098
4,625,171344.698538


In [260]:
submission.to_csv("../datasets/submission_2.csv", index = False)

> RMSE, as calculated by Kaggle: 38573.25737

<a id = "scale"></a>

## Standardizing the Predictors
---
In this section, the predictors are standardized to z-scores with scikit-learn's StandardScaler. Ridge, Lasso, and ElasticNet penalize coefficient size, so the predictors need to be on a common scale for the penalty to treat them evenly.

- [Back to top](#top)
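For reference, here is a minimal sketch of what the scaler does, using synthetic data (`X_demo` and its two columns are assumptions). The scaler learns the mean and standard deviation from the training rows only and reuses them for the test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Two synthetic predictors on very different scales (assumption)
rng = np.random.default_rng(35)
X_demo = rng.normal(loc=[1500.0, 6.0], scale=[500.0, 1.5], size=(200, 2))

X_train, X_test = train_test_split(X_demo, random_state=35, test_size=0.25)

scaler = StandardScaler().fit(X_train)     # statistics come from training rows only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuses the training mean and std

# Equivalent manual z-score: (x - mean) / std, column by column
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

print(np.allclose(X_train_scaled, manual))
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

Note that `PowerTransformer` also standardizes its output by default (`standardize=True`), so data that has already been power transformed is close to this scale before `StandardScaler` is applied.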

In [261]:
features = ames_corr[(ames_corr >= 0.5) & (ames_corr != 1.0)].index

In [262]:
ames[features].isna().sum()

overall_qual      0
year_built        0
year_remod/add    0
mas_vnr_area      0
total_bsmt_sf     0
1st_flr_sf        0
gr_liv_area       0
full_bath         0
totrms_abvgrd     0
garage_yr_blt     0
garage_cars       0
garage_area       0
dtype: int64

In [263]:
X = ames[features]
y = ames["saleprice"]

In [264]:
pt_X = PowerTransformer()
pt_X.fit(X)
X = pt_X.transform(X)

pt_y = PowerTransformer()
pt_y.fit(y.to_frame()) 
y = pt_y.transform(y.to_frame())

In [265]:
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state = 35, 
                                                    test_size = 0.25)

In [266]:
ss = StandardScaler()

ss.fit(X_train)

X_train_scaled = ss.transform(X_train)
X_test_scaled = ss.transform(X_test)

<a id = "ridge"></a>

## Ridge Regression Modeling
---
First of three regularized linear regression models. General workflow is adapted from Lab 2 Part 2.

- [Back to top](#top)
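The alpha search can be sketched on synthetic data as follows (`X_demo`, `y_demo`, and the true coefficients are assumptions). `RidgeCV` refits on all of the data it was given using the selected alpha, so its coefficients should match a separate `Ridge` fit with that alpha; the example also checks whether the chosen alpha sits on the edge of the grid.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

# Synthetic data (assumption): five predictors, two of them irrelevant
rng = np.random.default_rng(35)
X_demo = rng.normal(size=(150, 5))
y_demo = X_demo @ np.array([1.0, 0.5, 0.0, 0.0, -0.8]) + rng.normal(scale=0.5, size=150)

alphas = np.logspace(0, 5, 100)
ridge_cv = RidgeCV(alphas=alphas, scoring="r2", cv=5).fit(X_demo, y_demo)

# An alpha at either end of the grid suggests the grid should be widened
on_edge = ridge_cv.alpha_ in (alphas[0], alphas[-1])

# Separate Ridge with the selected alpha, as in the notebook
ridge_model = Ridge(alpha=ridge_cv.alpha_).fit(X_demo, y_demo)

print(ridge_cv.alpha_, on_edge)
print(np.allclose(ridge_cv.coef_, ridge_model.coef_))
```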



In [267]:
# Find the optimal alpha value from RidgeCV

ridge = RidgeCV(alphas = np.logspace(0, 5, 100), scoring = "r2", cv = 5)

ridge = ridge.fit(X_train_scaled, y_train)

ridge.alpha_

25.95024211399736

In [268]:
ridge_model = Ridge(alpha = ridge.alpha_)

ridge_model.fit(X_train_scaled, y_train)

Ridge(alpha=25.95024211399736, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [269]:
ridge_model.score(X_train_scaled, y_train)

0.8433760085599237

In [270]:
cross_val_score(ridge_model, X_train_scaled, y_train, cv = 5)

array([0.86815931, 0.80030111, 0.84619808, 0.84072346, 0.82521814])

In [271]:
cross_val_score(ridge_model, X_train_scaled, y_train, cv = 5).mean()

0.836120019765018

In [272]:
ridge_model.coef_

array([[ 0.31900468,  0.16860252,  0.13337041,  0.02435726,  0.07036237,
         0.12284118,  0.31182023, -0.04829658, -0.01765195, -0.05113777,
         0.08338345,  0.05883941]])

<a id = "lasso"></a>

## Lasso Regression Modeling
---
Second regularized linear regression model.

- [Back to top](#top)
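`LassoCV` expects a 1-D target, and scikit-learn may warn when it is given a column vector such as the `(n, 1)` array that `PowerTransformer` returns. The sketch below, on synthetic data (an assumption), flattens the target with `ravel()` before fitting.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data (assumption); the target is shaped (n, 1) like a transformer's output
rng = np.random.default_rng(35)
X_demo = rng.normal(size=(150, 5))
y_2d = (X_demo @ np.array([1.0, 0.5, 0.0, 0.0, -0.8])
        + rng.normal(scale=0.5, size=150)).reshape(-1, 1)

# ravel() flattens (n, 1) to (n,), the shape LassoCV expects for a single target
lasso_cv = LassoCV(cv=5).fit(X_demo, y_2d.ravel())

print(lasso_cv.alpha_)
print(lasso_cv.coef_.shape)
```

Without an explicit `alphas` argument, `LassoCV` builds its own grid of candidate alphas from the data.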

In [273]:
# Find the optimal alpha with LassoCV, letting scikit-learn choose the candidate alphas

lasso = LassoCV(cv = 5)

lasso = lasso.fit(X_train_scaled, y_train)

lasso.alpha_

0.0008249926692768886

In [274]:
lasso_model = Lasso(alpha = lasso.alpha_)

lasso_model.fit(X_train_scaled, y_train)

Lasso(alpha=0.0008249926692768886, copy_X=True, fit_intercept=True,
   max_iter=1000, normalize=False, positive=False, precompute=False,
   random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [275]:
lasso_model.score(X_train_scaled, y_train)

0.8435411471315052

In [276]:
cross_val_score(lasso_model, X_train_scaled, y_train, cv = 5)

array([0.86814423, 0.79857247, 0.8449271 , 0.84217972, 0.82549441])

In [277]:
cross_val_score(lasso_model, X_train_scaled, y_train, cv = 5).mean()

0.8358635864302746

In [278]:
lasso_model.coef_

array([ 0.3249635 ,  0.17608417,  0.13251936,  0.02162295,  0.06679212,
        0.12281473,  0.32694313, -0.05402997, -0.02590032, -0.05718651,
        0.08415574,  0.05463246])

<a id = "enet"></a>

## ElasticNet Regression Modeling
---
Final regularized linear regression model.

- [Back to top](#top)
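In the cells below, the search selects an alpha of 0.1 and an l1_ratio of 0.1, both the smallest values in their grids, which hints that the best values may lie outside the range searched. One option, sketched here on synthetic data, is a wider log-spaced alpha grid and a short list of l1_ratio values. The grid bounds and the data are assumptions chosen for illustration, not tuned for the Ames data.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data (assumption)
rng = np.random.default_rng(35)
X_demo = rng.normal(size=(150, 5))
y_demo = X_demo @ np.array([1.0, 0.5, 0.0, 0.0, -0.8]) + rng.normal(scale=0.5, size=150)

l1_ratios = [0.1, 0.5, 0.9, 1.0]       # l1_ratio = 1.0 is pure Lasso
alphas = np.logspace(-3, 1, 50)        # spans four orders of magnitude

enet_cv = ElasticNetCV(alphas=alphas, l1_ratio=l1_ratios, cv=5).fit(X_demo, y_demo)

# Check whether the selected alpha is still on the boundary of the grid
alpha_on_edge = enet_cv.alpha_ in (alphas.min(), alphas.max())
print(enet_cv.alpha_, enet_cv.l1_ratio_, alpha_on_edge)
```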

In [279]:
enet_alphas = np.linspace(0.1, 1.0, 100)

enet_ratio = np.linspace(0.1, 1.0, 100)

enet = ElasticNetCV(alphas = enet_alphas, l1_ratio = enet_ratio, cv = 5)

enet.fit(X_train_scaled, y_train)

ElasticNetCV(alphas=array([0.1    , 0.10909, ..., 0.99091, 1.     ]),
       copy_X=True, cv=5, eps=0.001, fit_intercept=True,
       l1_ratio=array([0.1    , 0.10909, ..., 0.99091, 1.     ]),
       max_iter=1000, n_alphas=100, n_jobs=None, normalize=False,
       positive=False, precompute='auto', random_state=None,
       selection='cyclic', tol=0.0001, verbose=0)

In [280]:
enet.alpha_

0.1

In [281]:
enet.l1_ratio_

0.1

In [282]:
enet_model = ElasticNet(alpha = enet.alpha_, l1_ratio = enet.l1_ratio_)

enet_model.fit(X_train_scaled, y_train)

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.1,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [283]:
enet_model.score(X_train_scaled, y_train)

0.8385816579614989

In [284]:
enet_model.coef_

array([ 0.29395339,  0.12044156,  0.11719177,  0.02836592,  0.07996031,
        0.12019552,  0.24897144, -0.        ,  0.00431635,  0.        ,
        0.07905026,  0.06825734])

In [285]:
cross_val_score(enet_model, X_train_scaled, y_train, cv = 5)

array([0.86200176, 0.80508197, 0.84634592, 0.83200587, 0.82210137])

In [286]:
cross_val_score(enet_model, X_train_scaled, y_train, cv = 5).mean()

0.8335073779326526

In [287]:
print(f"Linear: {cross_val_score(mlr_model, X_train_scaled, y_train, cv = 5).mean()}")

print(f"Ridge: {cross_val_score(ridge_model, X_train_scaled, y_train, cv = 5).mean()}")

print(f"Lasso: {cross_val_score(lasso_model, X_train_scaled, y_train, cv = 5).mean()}")

print(f"ElasticNet: {cross_val_score(enet_model, X_train_scaled, y_train, cv = 5).mean()}")

Linear: 0.8358392299505617
Ridge: 0.836120019765018
Lasso: 0.8358635864302746
ElasticNet: 0.8335073779326526


In [288]:
# print(f"Linear: {mlr_model.score(X_test_scaled, y_test)}")

print(f"Ridge: {ridge_model.score(X_test_scaled, y_test)}")

print(f"Lasso: {lasso_model.score(X_test_scaled, y_test)}")

print(f"ElasticNet: {enet_model.score(X_test_scaled, y_test)}")

Ridge: 0.8344110811276848
Lasso: 0.8342588967194973
ElasticNet: 0.8303837321910497


<a id = "sub3"></a>

### Third Submission, Using Ridge Regression and Power Transformation

In [289]:
submission = pd.DataFrame()

submission["Id"] = ames_test["id"]

In [290]:
features = ames_corr[(ames_corr >= 0.5) & (ames_corr != 1.0)].index

# Confirming we have no null values in the test set
ames_test[features].isna().sum()

overall_qual      0
year_built        0
year_remod/add    0
mas_vnr_area      0
total_bsmt_sf     0
1st_flr_sf        0
gr_liv_area       0
full_bath         0
totrms_abvgrd     0
garage_yr_blt     0
garage_cars       0
garage_area       0
dtype: int64

In [291]:
test_X = ames_test[features]

In [292]:
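# Note: only the power transform is applied here; the StandardScaler (ss)
# used on the training data for ridge_model is not applied to the test set.
# PowerTransformer standardizes its output by default, so the scales are close.
test_X = pt_X.transform(test_X)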

In [293]:
predictions = ridge_model.predict(test_X)

In [294]:
submission["SalePrice"] = pt_y.inverse_transform(predictions.reshape(-1,1))

In [295]:
submission.head()

Unnamed: 0,Id,SalePrice
0,2658,144714.427811
1,2718,191277.642427
2,2414,185163.818691
3,1989,129592.998
4,625,170803.088547


In [296]:
submission.to_csv("../datasets/submission_3.csv", index = False)

<a id = "numeric"></a>

## Models using all numeric columns
---
Linear regression models using all available numeric columns. This section relies on regularization (Lasso and ElasticNet in particular) to shrink or zero out unnecessary features. The workflow is similar to the three models above but condensed to save space.

- [Back to top](#top)
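To illustrate how regularization handles unnecessary features, the sketch below fits Lasso and Ridge to synthetic data in which only two of ten predictors affect the target. The data, the coefficient values, and the alpha settings are assumptions for the example. Lasso sets the irrelevant coefficients to exactly zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): only the first two columns drive the target
rng = np.random.default_rng(35)
X_demo = rng.normal(size=(500, 10))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=500)

lasso = Lasso(alpha=0.1).fit(X_demo, y_demo)
ridge = Ridge(alpha=10.0).fit(X_demo, y_demo)

kept = np.flatnonzero(lasso.coef_)          # indices of features Lasso kept
ridge_nonzero = np.count_nonzero(ridge.coef_)

print(kept)
print(ridge_nonzero)
```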

In [314]:
features = ames_corr[ames_corr != 1.0].index

In [315]:
# These "features" are not actually descriptors of the houses

features = features.drop(["pid", "id"])

In [337]:
ames[features].isna().sum().sum()

0

In [316]:
X = ames[features]
y = ames["saleprice"]

In [317]:
pt_X = PowerTransformer()
pt_X.fit(X)
X = pt_X.transform(X)

pt_y = PowerTransformer()
pt_y.fit(y.to_frame()) 
y = pt_y.transform(y.to_frame())

In [318]:
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state = 35, 
                                                    test_size = 0.25)

In [319]:
ss = StandardScaler()

# Note: the scaler is fit on all of X here, whereas the earlier section
# fit it on X_train only.
ss.fit(X)

X_train_scaled = ss.transform(X_train)
X_test_scaled = ss.transform(X_test)

In [327]:
# Linear regression

mlr_model = LinearRegression()
mlr_model.fit(X_train_scaled, y_train)

cross_val_score(mlr_model, X_train_scaled, y_train, cv = 5).mean()

0.8785254608253389

In [320]:
# Ridge regression

ridge = RidgeCV(alphas = np.logspace(0, 5, 100), scoring = "r2", cv = 5)
ridge = ridge.fit(X_train_scaled, y_train)

ridge_model = Ridge(alpha = ridge.alpha_)
ridge_model.fit(X_train_scaled, y_train)

cross_val_score(ridge_model, X_train_scaled, y_train, cv = 5).mean()

0.8790255758118699

In [321]:
# Lasso regression

lasso = LassoCV(cv = 5)
lasso = lasso.fit(X_train_scaled, y_train)

lasso_model = Lasso(alpha = lasso.alpha_)
lasso_model.fit(X_train_scaled, y_train)

cross_val_score(lasso_model, X_train_scaled, y_train, cv = 5).mean()

0.8819869885063232

In [322]:
# ElasticNet regression

enet_alphas = np.linspace(0.1, 1.0, 100)
enet_ratio = np.linspace(0.1, 1.0, 100)

enet = ElasticNetCV(alphas = enet_alphas, l1_ratio = enet_ratio, cv = 5)
enet.fit(X_train_scaled, y_train)

print(enet.alpha_)
print(enet.l1_ratio_)

enet_model = ElasticNet(alpha = enet.alpha_, l1_ratio = enet.l1_ratio_)
enet_model.fit(X_train_scaled, y_train)

cross_val_score(enet_model, X_train_scaled, y_train, cv = 5).mean()

0.1
0.1


0.8786538469060865

In [329]:
print(f"Linear: {cross_val_score(mlr_model, X_train_scaled, y_train, cv = 5).mean()}")

print(f"Ridge: {cross_val_score(ridge_model, X_train_scaled, y_train, cv = 5).mean()}")

print(f"Lasso: {cross_val_score(lasso_model, X_train_scaled, y_train, cv = 5).mean()}")

print(f"ElasticNet: {cross_val_score(enet_model, X_train_scaled, y_train, cv = 5).mean()}")

Linear: 0.8785254608253389
Ridge: 0.8790255758118699
Lasso: 0.8819869885063232
ElasticNet: 0.8786538469060865


In [328]:
print(f"Linear: {cross_val_score(mlr_model, X_test_scaled, y_test, cv = 5).mean()}")

print(f"Ridge: {cross_val_score(ridge_model, X_test_scaled, y_test, cv = 5).mean()}")

print(f"Lasso: {cross_val_score(lasso_model, X_test_scaled, y_test, cv = 5).mean()}")

print(f"ElasticNet: {cross_val_score(enet_model, X_test_scaled, y_test, cv = 5).mean()}")

Linear: 0.8773391978469288
Ridge: 0.881239068081738
Lasso: 0.8847701027043723
ElasticNet: 0.8846399868241515


In [331]:
print(f"Linear: {mlr_model.score(X_test_scaled, y_test)}")

print(f"Ridge: {ridge_model.score(X_test_scaled, y_test)}")

print(f"Lasso: {lasso_model.score(X_test_scaled, y_test)}")

print(f"ElasticNet: {enet_model.score(X_test_scaled, y_test)}")

Linear: 0.8766173297852686
Ridge: 0.878192198191553
Lasso: 0.8784704239628583
ElasticNet: 0.8777677040566545


In [344]:
print(lasso_model.coef_)

[-0.01846767  0.          0.10175976  0.28763724  0.13139458  0.23380047
  0.04480441  0.          0.08835849 -0.          0.          0.02789564
  0.07264041 -0.         -0.00301252  0.28917692  0.0337025  -0.
  0.          0.         -0.         -0.02745045  0.          0.04338534
  0.01695893  0.09070914  0.          0.0133897   0.00266904  0.
  0.00814432  0.03109725 -0.02591654 -0.01419385  0.         -0.006331  ]


<a id = "sub4"></a>

### Fourth Submission, Using Many Features, Lasso Regression, and Power Transformation

In [345]:
submission = pd.DataFrame()

submission["Id"] = ames_test["id"]

In [346]:
features = ames_corr[ames_corr != 1.0].index

# These "features" are not actually descriptors of the houses
features = features.drop(["pid", "id"])

# Confirming we have no null values in the test set
ames_test[features].isna().sum().sum()

In [349]:
test_X = ames_test[features]

In [350]:
test_X = pt_X.transform(test_X)

In [351]:
predictions = lasso_model.predict(test_X)

In [352]:
submission["SalePrice"] = pt_y.inverse_transform(predictions.reshape(-1,1))

In [353]:
submission.head()

Unnamed: 0,Id,SalePrice
0,2658,138961.822589
1,2718,152116.186948
2,2414,223090.280697
3,1989,108168.323588
4,625,176598.173128


In [354]:
submission.to_csv("../datasets/submission_4.csv", index = False)

### Lasso Model Coefficients
---

In [358]:
lm_coefs = lasso_model.coef_

In [361]:
coefs = pd.DataFrame()

coefs["coefs"] = lm_coefs

coefs["coefs_abs"] = abs(lm_coefs)

coefs["feature"] = features

In [366]:
coefs.head()

Unnamed: 0,coefs,coefs_abs,feature
0,-0.018468,0.018468,ms_subclass
1,0.0,0.0,lot_frontage
2,0.10176,0.10176,lot_area
3,0.287637,0.287637,overall_qual
4,0.131395,0.131395,overall_cond


---
- [Back to top](#top)