<a id = "top"></a>
# Presentation Prep

<a id="home"></a>

This notebook prepares the models, plots, and supporting research used to present my findings. The focus is on interpretability rather than on chasing the highest R² score or the lowest MSE, as the models submitted to Kaggle did.

#### This Notebook
- [First Model, Linear Regression](#linear)
- [Second Model, Lasso Regression](#lasso)

#### Other Notebooks
- [Cleaning and EDA](cleaning_and_EDA.ipynb)
- [Models](modeling.ipynb)

#### Importing

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.metrics import r2_score
from sklearn.preprocessing import PowerTransformer, StandardScaler

%matplotlib inline

In [3]:
ames = pd.read_csv("../datasets/ames.csv")

coefs = pd.read_csv("../datasets/coefs.csv")

In [4]:
ames.head(3)

Unnamed: 0,id,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,...,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
0,109,533352170,60,RL,68.0,13517,Pave,,IR1,Lvl,...,0,0,,,,0,3,2010,WD,130500
1,544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,...,0,0,,,,0,4,2009,WD,220000
2,153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,...,0,0,,,,0,1,2010,WD,109000


In [5]:
coefs.head(3)

Unnamed: 0,feature,lasso_coefs_pt,lasso_coefs_pt_abs
0,ms_subclass,-0.018468,0.018468
1,lot_frontage,0.0,0.0
2,lot_area,0.10176,0.10176


### Preprocessing
---

In [28]:
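# Restrict to numeric columns; recent pandas versions raise on string columns in corr()
ames_corr = ames.select_dtypes("number").corr()["saleprice"]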

In [29]:
features = ames_corr[ames_corr != 1.0].index

# These "features" are not actually descriptors of the houses
features = features.drop(["pid", "id"])

ames[features].isna().sum().sum()

0

In [30]:
X = ames[features]
y = ames["saleprice"]

In [31]:
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state = 35, 
                                                    test_size = 0.25)

In [32]:
pt_X = PowerTransformer()
pt_X.fit(X_train)
X_train = pt_X.transform(X_train)
X_test = pt_X.transform(X_test)

pt_Y = PowerTransformer()
pt_Y.fit(y_train.to_frame())
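# ravel() flattens the (n, 1) output to 1D, which scikit-learn estimators expect for y
y_train = pt_Y.transform(y_train.to_frame()).ravel()
y_test = pt_Y.transform(y_test.to_frame()).ravel()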


In [33]:
# Scaling the data

ss = StandardScaler()

ss.fit(X_train)

X_train_scaled = ss.transform(X_train)
X_test_scaled = ss.transform(X_test)
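
One thing worth checking at this step: `PowerTransformer` standardizes its output by default (`standardize=True`), so the extra `StandardScaler` pass may be close to a no-op on the training data. The sketch below illustrates this on synthetic, right-skewed data; the array `demo` and the seed are made up for illustration and are not taken from the Ames set.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Synthetic right-skewed data standing in for real features (assumption)
rng = np.random.default_rng(35)
demo = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 3))

# Defaults: method="yeo-johnson", standardize=True
demo_pt = PowerTransformer().fit_transform(demo)

# Scale the already-standardized output a second time
demo_pt_ss = StandardScaler().fit_transform(demo_pt)

# Column means and standard deviations after the power transform alone
print(demo_pt.mean(axis=0).round(6), demo_pt.std(axis=0).round(6))

# Largest element-wise change introduced by the second scaling
print(np.abs(demo_pt - demo_pt_ss).max())
```

If the same holds for the Ames features, `X_train` and `X_train_scaled` would be nearly identical, which matters when comparing the "unscaled" linear model with the scaled ones.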

In [37]:
lasso = LassoCV(cv = 5)
lasso = lasso.fit(X_train_scaled, y_train)

lasso_model = Lasso(alpha = lasso.alpha_)
lasso_model = lasso_model.fit(X_train_scaled, y_train)

lasso_model.score(X_train_scaled, y_train)


0.8920270573785805

In [39]:
lasso_model.score(X_test_scaled, y_test)

0.8772332990546344
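
Because the target was power-transformed, these scores and any predictions are in transformed units rather than dollars. For a presentation, predictions can be mapped back to the original scale with `inverse_transform`. The sketch below shows the round trip on synthetic data; every name and number in it (`price_demo`, `alpha=0.01`, prices in thousands) is an illustrative assumption, not part of this notebook's pipeline.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))

# Made-up positive, right-skewed "prices" (in thousands), driven by the first feature
price_demo = np.exp(5 + 0.3 * X_demo[:, 0] + 0.1 * rng.normal(size=200))

# Transform the target; the transformer expects a 2D array
pt_y_demo = PowerTransformer()
y_demo_t = pt_y_demo.fit_transform(price_demo.reshape(-1, 1)).ravel()

# Fit on the transformed target (alpha chosen arbitrarily for the sketch)
model_demo = Lasso(alpha=0.01).fit(X_demo, y_demo_t)

# Predictions come out in transformed units; map them back to price units
pred_t = model_demo.predict(X_demo)
pred_price = pt_y_demo.inverse_transform(pred_t.reshape(-1, 1)).ravel()

print(pred_price[:3].round(1), price_demo[:3].round(1))
```

In this notebook the equivalent call would go through `pt_Y`, the transformer fit on `y_train`.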

<a id = "linear"></a>
### First Model, Linear Regression
---

This first model is fit on the power-transformed data without the additional `StandardScaler` step, with the focus on interpreting the coefficients.

- [Back to top](#top)
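
Since the point of this model is to read the coefficients, it helps to pair each coefficient with its feature name and sort by magnitude. The sketch below does this on synthetic data; the column names (`feat_a`, `feat_b`, `feat_c`) and the target are placeholders, not Ames columns. When the target is passed as a 2D frame, `LinearRegression` returns `coef_` with shape `(1, n_features)`, hence the `ravel()`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Placeholder features; only feat_a and feat_c influence the target
X_demo = pd.DataFrame(rng.normal(size=(100, 3)),
                      columns=["feat_a", "feat_b", "feat_c"])
y_demo = (2.0 * X_demo["feat_a"]
          - 0.5 * X_demo["feat_c"]
          + 0.1 * rng.normal(size=100)).to_frame("target")

lr_demo = LinearRegression().fit(X_demo, y_demo)

# Flatten coef_ and label each coefficient with its feature name,
# then order by absolute size so the strongest effects come first
coef_table = pd.Series(lr_demo.coef_.ravel(), index=X_demo.columns)
coef_table = coef_table.sort_values(key=np.abs, ascending=False)

print(coef_table)
```

In this notebook the names would come from `features`, since `X_train` is a NumPy array after the power transform.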

In [None]:
lr = LinearRegression()

lr.fit(X_train, y_train)

lr.score(X_train, y_train)

In [None]:
cross_val_score(lr, X_train, y_train, cv = 5).mean()

In [None]:
lr.score(X_test, y_test)

In [None]:
cross_val_score(lr, X_test, y_test, cv = 5).mean()

In [None]:
lr.coef_

<a id = "lasso"></a>
### Second Model, Lasso Regression
---



- [Back to top](#top)
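
Lasso's L1 penalty can shrink some coefficients exactly to zero, which is why it doubles as a feature-selection tool. The sketch below counts how many features survive at the cross-validated alpha. It uses synthetic data in which only the first two of ten features matter; the data and seed are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
X_demo = rng.normal(size=(300, 10))

# Only the first two features carry signal; the other eight are noise
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(size=300)

# LassoCV picks alpha by cross-validation, then refits on all the data
lasso_cv_demo = LassoCV(cv=5).fit(X_demo, y_demo)

# Count the coefficients that were not shrunk to exactly zero
n_kept = int(np.sum(lasso_cv_demo.coef_ != 0))

print(lasso_cv_demo.alpha_, n_kept)
```

Because `LassoCV` refits on the full training data at the chosen alpha, a separate `Lasso(alpha = lasso.alpha_)` fit should give essentially the same coefficients.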

In [None]:
# Lasso regression

lasso = LassoCV(cv = 5)
lasso = lasso.fit(X_train_scaled, y_train)

lasso_model = Lasso(alpha = lasso.alpha_)
lasso_model.fit(X_train_scaled, y_train)

lasso_model.score(X_train_scaled, y_train)

In [None]:
lasso.alpha_

In [None]:
cross_val_score(lasso_model, X_train_scaled, y_train, cv = 5).mean()

In [None]:
lasso_model.score(X_test_scaled, y_test)

In [None]:
cross_val_score(lasso_model, X_test_scaled, y_test, cv = 5).mean()

In [None]:
lasso_model.coef_

In [None]:
ridge = RidgeCV(alphas = np.logspace(0, 5, 100), cv = 5)
ridge = ridge.fit(X_train_scaled, y_train)

ridge.alpha_

In [None]:
ridge_model = Ridge(alpha = ridge.alpha_)

ridge_model.fit(X_train_scaled, y_train)

In [None]:
ridge_model.score(X_train_scaled, y_train)

In [None]:
ridge_model.coef_

In [None]:
# import math 

# # difference of lasso and ridge regression is that some of the coefficients 
# # can be zero i.e. some of the features are completely neglected

# from sklearn.datasets import load_breast_cancer

# cancer = load_breast_cancer()
# #print cancer.keys()
# cancer_df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
# #print cancer_df.head(3)
# X = ames[features]
# Y = ames["saleprice"]




# X_train,X_test,y_train,y_test=train_test_split(X,Y, test_size=0.3, random_state=31)

# lasso = Lasso()
# lasso.fit(X_train,y_train)
# train_score=lasso.score(X_train,y_train)
# test_score=lasso.score(X_test,y_test)
# coeff_used = np.sum(lasso.coef_!=0)

# lasso001 = Lasso(alpha=0.01, max_iter=1_000_000)
# lasso001.fit(X_train,y_train)
# train_score001=lasso001.score(X_train,y_train)
# test_score001=lasso001.score(X_test,y_test)
# coeff_used001 = np.sum(lasso001.coef_!=0)

# lasso00001 = Lasso(alpha=0.0001, max_iter=1_000_000)
# lasso00001.fit(X_train,y_train)
# train_score00001=lasso00001.score(X_train,y_train)
# test_score00001=lasso00001.score(X_test,y_test)
# coeff_used00001 = np.sum(lasso00001.coef_!=0)

# lr = LinearRegression()
# lr.fit(X_train,y_train)
# lr_train_score=lr.score(X_train,y_train)
# lr_test_score=lr.score(X_test,y_test)

# plt.figure(figsize = (12, 6))

# plt.subplot(1,2,1)
# plt.plot(lasso.coef_, 
#          alpha=0.7,
#          linestyle='none',
#          marker='*',
#          markersize=5,
#          color='red',
#          label=r'Lasso; $\alpha = 1$',
#          zorder=7) # alpha here is for transparency

# plt.plot(lasso001.coef_,
#          alpha=0.5,
#          linestyle='none',
#          marker='d',
#          markersize=6,
#          color='blue',
#          label=r'Lasso; $\alpha = 0.01$') # alpha here is for transparency

# plt.xlabel('Coefficient Index',fontsize=16)
# plt.ylabel('Coefficient Magnitude',fontsize=16)
# plt.legend(fontsize=13,loc=1)



# plt.subplot(1,2,2)

# plt.plot(lasso.coef_,
#          alpha=0.7,
#          linestyle='none',
#          marker='*',
#          markersize=5,
#          color='red',
#          label=r'Lasso; $\alpha = 1$',
#          zorder=7) # alpha here is for transparency

# plt.plot(lasso001.coef_,
#          alpha=0.5,
#          linestyle='none',
#          marker='d',
#          markersize=6,
#          color='blue',
#          label=r'Lasso; $\alpha = 0.01$') # alpha here is for transparency

# plt.plot(lasso00001.coef_,
#          alpha=0.8,
#          linestyle='none',
#          marker='v',
#          markersize=6,
#          color='black',
#          label=r'Lasso; $\alpha = 0.0001$') # alpha here is for transparency

# plt.plot(lr.coef_,
#          alpha=0.7,
#          linestyle='none',
#          marker='o',
#          markersize=5,
#          color='green',
#          label='Linear Regression',
#          zorder=2)

# plt.xlabel('Coefficient Index',fontsize=16)
# plt.ylabel('Coefficient Magnitude',fontsize=16)
# plt.legend(fontsize=13,loc=1)
# plt.tight_layout();

In [None]:
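plt.figure(figsize = (8,8))
# A separate name keeps the tuned lasso_model from being overwritten.
# alpha = 0 amounts to ordinary least squares; scikit-learn advises against it for Lasso.
for a in (1000, 0):
    lasso_a = Lasso(alpha = a)
    lasso_a.fit(X_train_scaled, y_train)
    plt.plot(lasso_a.coef_,
         linestyle = "none",
         marker = "*",
         markersize = 10,
         label = f"Lasso, alpha = {a}"
        )
    
# np.ravel flattens coef_, which has shape (1, n_features) when fit on a 2D target
plt.plot(np.ravel(lr.coef_),
         linestyle = "none",
         marker = ".",
         markersize = 10,
         label = "Linear Regression"
        )
plt.legend();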

In [None]:
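plt.figure(figsize = (8,8))
# np.ravel flattens coef_, which has shape (1, n_features) when fit on a 2D target
plt.plot(np.ravel(lr.coef_),
         linestyle = "none",
         marker = ".",
         markersize = 10,
         label = "Linear Regression"
        )

plt.plot(np.ravel(ridge_model.coef_),
         linestyle = "none",
         marker = "*",
         markersize = 10,
         label = "Ridge"
        )
plt.legend();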

In [None]:
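plt.figure(figsize = (8,8))
# np.ravel flattens coef_, which has shape (1, n_features) when fit on a 2D target
plt.plot(np.ravel(ridge_model.coef_),
         linestyle = "none",
         marker = ".",
         markersize = 10,
         label = "Ridge"
        )

plt.plot(np.ravel(lasso_model.coef_),
         linestyle = "none",
         marker = "*",
         markersize = 10,
         label = "Lasso"
        )

plt.plot(np.ravel(lr.coef_),
         linestyle = "none",
         marker = ".",
         markersize = 10,
         label = "Linear Regression"
        )
plt.legend();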

In [None]:
plt.figure(figsize = (8,8))
plt.plot(lasso_model.coef_,
         linestyle = "none",
         marker = ".",
         markersize = 10
        )

In [None]:
plt.figure(figsize = (8,8))
plt.plot(coefs["lasso_coefs_pt"],
         linestyle = "none",
         marker = ".",
         markersize = 10
        )

---
- [Back to top](#top)