### Lasso
The Lasso is a linear model that estimates sparse coefficients. It is useful when only a few of the available features are expected to matter, because it tends to prefer solutions with fewer non-zero coefficients, effectively reducing the number of features on which the solution depends.
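
To make the sparsity claim concrete, the following sketch compares ordinary least squares with the Lasso on synthetic data. The data, the random seed, and `alpha=0.5` are assumptions chosen for illustration; they are not taken from the housing dataset used later.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (illustrative assumption): 10 features, only the first 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.array([4.0, -3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Count the coefficients each model leaves different from zero.
n_nonzero_ols = int(np.sum(ols.coef_ != 0))
n_nonzero_lasso = int(np.sum(lasso.coef_ != 0))
print("non-zero OLS coefficients:  ", n_nonzero_ols)
print("non-zero Lasso coefficients:", n_nonzero_lasso)
```

Least squares generally assigns a small non-zero weight to every feature, while the Lasso is expected to set the weights of the irrelevant features to exactly zero.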

#### Loss Function
Mathematically, it consists of a linear model trained with an added $\ell_1$ regularization term. The objective function to minimize is:
$$\min_{w} { \frac{1}{2n_{\text{samples}}} ||X w - y||_2 ^ 2 + \alpha ||w||_1}$$
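The lasso estimate thus minimizes the least-squares loss plus the penalty $\alpha ||w||_1$, where $\alpha \geq 0$ is a constant that controls the strength of the penalty and $||w||_1$ is the $\ell_1$-norm of the coefficient vector. Larger values of $\alpha$ drive more coefficients to exactly zero.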
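
As a minimal sketch, the objective can be evaluated directly with NumPy. The helper name `lasso_objective` and the toy numbers are hypothetical and only illustrate the formula; the intercept is omitted, as in the formula.

```python
import numpy as np

def lasso_objective(X, y, w, alpha):
    """Return (1 / (2 * n_samples)) * ||Xw - y||_2^2 + alpha * ||w||_1."""
    n_samples = X.shape[0]
    residual = X @ w - y
    squared_error = residual @ residual / (2 * n_samples)
    l1_penalty = alpha * np.sum(np.abs(w))
    return squared_error + l1_penalty

# Tiny hand-checkable example with made-up numbers.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, 1.0])
value = lasso_objective(X, y, w, alpha=0.1)
print(value)
```

Here the residual is $(0, -1)$, so the squared-error term is $1 / (2 \cdot 2) = 0.25$ and the penalty is $0.1 \cdot 2 = 0.2$, giving $0.45$ in total.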

### 1. Data Loading

In [1]:
import pandas as pd

# Feature files are read with their header row as column names.
X_train = pd.read_csv('data/house_prices/X_train.csv')
X_test = pd.read_csv('data/house_prices/X_test.csv')
# Target files are read with header=None, i.e. treated as having no header row.
y_train = pd.read_csv('data/house_prices/y_train.csv', header=None)
y_test = pd.read_csv('data/house_prices/y_test.csv', header=None)
X_train.head(5)

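| Unnamed: 0 | 1stFlrSF | 2ndFlrSF | 3SsnPorch | BedroomAbvGr | BldgType | BsmtCond | BsmtExposure | BsmtFinSF1 | BsmtFinSF2 | BsmtFinType1 | ... | SaleType | ScreenPorch | Street | TotRmsAbvGrd | TotalBsmtSF | Utilities | WoodDeckSF | YearBuilt | YearRemodAdd | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1054 | 0 | 0 | 3 | 0 | 4 | 1 | 763 | 0 | 2 | ... | 8 | 0 | 1 | 6 | 936 | 0 | 120 | 1963 | 1963 | 2010 |
| 1 | 1120 | 0 | 0 | 3 | 0 | 4 | 4 | 206 | 0 | 0 | ... | 6 | 0 | 1 | 6 | 1120 | 0 | 0 | 2007 | 2007 | 2007 |
| 2 | 1616 | 0 | 0 | 3 | 0 | 4 | 0 | 0 | 0 | 6 | ... | 8 | 0 | 1 | 7 | 1616 | 0 | 208 | 2005 | 2005 | 2006 |
| 3 | 1073 | 0 | 0 | 3 | 0 | 4 | 4 | 836 | 0 | 0 | ... | 8 | 0 | 1 | 6 | 1073 | 0 | 0 | 1965 | 1965 | 2007 |
| 4 | 1389 | 0 | 0 | 2 | 0 | 4 | 0 | 1071 | 123 | 0 | ... | 8 | 0 | 1 | 6 | 1389 | 0 | 240 | 1974 | 1975 | 2006 |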


### 2. Lasso Regression

In [3]:
from sklearn import linear_model

regressor = linear_model.Lasso(alpha=0.1)
regressor.fit(X_train, y_train)
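
Because the $\ell_1$ penalty acts on coefficient magnitudes, features on very different scales (years, square feet, and small integer codes in this dataset) are penalized unevenly. One common remedy is to standardize the features before fitting. The sketch below shows that pattern on synthetic stand-in data; the data, the seed, and `max_iter=10000` are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (illustrative assumption): features on very different scales.
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=300)    # e.g. square feet
year = rng.uniform(1900, 2010, size=300)   # e.g. construction year
noise_col = rng.normal(size=300)           # irrelevant feature
X = np.column_stack([area, year, noise_col])
y = 100.0 * area + 500.0 * year + rng.normal(scale=1000.0, size=300)

# Standardize each feature, then fit the Lasso on the scaled values.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1, max_iter=10000))
model.fit(X, y)

# Coefficients are expressed per standard deviation of each feature.
coefs = model.named_steps["lasso"].coef_
print(coefs)
```

With the real data, the same pipeline could be fitted on `X_train` and `y_train` in place of the synthetic arrays. Note that a given `alpha` has a different effect once the features are rescaled, so it would need to be re-tuned.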


In [4]:
import math
from sklearn.metrics import mean_squared_error, r2_score

y_pred = regressor.predict(X_test)
# With a single target, the multioutput averaging options give the same score.
r2 = r2_score(y_test, y_pred)
print('R squared: {:.2f}'.format(r2))
# RMSE is expressed in the same units as the target.
rmse = math.sqrt(mean_squared_error(y_test, y_pred))
print('root mean square error: {:.2f}'.format(rmse))

R squared: 0.81
root mean square error: 37590.28
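
For reference, both metrics can be written out by hand, which makes clear what the two numbers measure. The helper `r2_and_rmse` and the toy values are hypothetical; this is a sketch of the standard definitions, not a replacement for the scikit-learn functions.

```python
import numpy as np

def r2_and_rmse(y_true, y_pred):
    """Compute R squared and root mean squared error for a single target."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(ss_res / y_true.size)
    return r2, rmse

# Toy values chosen so the result can be checked by hand.
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]
r2, rmse = r2_and_rmse(y_true, y_hat)
print('R squared: {:.2f}'.format(r2))
print('root mean square error: {:.2f}'.format(rmse))
```

In the toy example the residual sum of squares is 2 and the total sum of squares is 20, so $R^2 = 1 - 2/20 = 0.9$ and the RMSE is $\sqrt{2/4} \approx 0.71$.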
