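## [Assignment Focus]
Use the Lasso and Ridge models in Sklearn to train on various datasets. Make sure you understand the **data type** of what is fed into the model for training, and the meaning of each model parameter.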

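There are many kinds of machine learning models, but training data mostly follows a fixed format. Make sure you understand that format so that you can get started with training as quickly as possible when you apply a new model!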
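As a minimal sketch of that fixed format: sklearn estimators expect the features as a 2-D array of shape `(n_samples, n_features)` and the target as one value per sample. The data below is synthetic and made up purely for illustration (the names `true_coef`, `single_feature` and `single_model` are not part of this notebook).

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data, made up for illustration: 100 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                         # 2-D: (n_samples, n_features)
true_coef = np.array([1.5, -2.0, 0.0])                # hypothetical ground truth
y = X @ true_coef + rng.normal(scale=0.1, size=100)   # 1-D: (n_samples,)

model = Ridge(alpha=1.0).fit(X, y)
print(X.shape, y.shape, model.coef_.shape)

# A single feature must still be 2-D: reshape (n_samples,) to (n_samples, 1).
single_feature = X[:, 0].reshape(-1, 1)
single_model = Ridge(alpha=1.0).fit(single_feature, y)
print(single_feature.shape, single_model.coef_.shape)
```

This is also why a single pandas column is selected with double brackets, as in `df[['CRIM']]`: the result stays a 2-D DataFrame instead of becoming a 1-D Series.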

## Practice Time
Try other datasets from sklearn datasets (boston, ...) to train your own linear regression model, and add suitable regularization to observe how training behaves.
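One possible way to approach the exercise with a dataset that ships with sklearn is sketched below. It uses `load_diabetes` as a stand-in; the choice of dataset and the `alpha` values are assumptions for illustration, not tuned settings.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Bundled regression dataset: 442 samples, 10 numerical features.
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=6
)

# The alpha values are arbitrary starting points, not tuned.
models = {
    "LinearRegression": LinearRegression(),
    "Lasso(alpha=0.1)": Lasso(alpha=0.1),
    "Ridge(alpha=0.1)": Ridge(alpha=0.1),
}

results = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    results[name] = mean_squared_error(y_test, model.predict(x_test))
    print(f"{name}: test MSE = {results[name]:.2f}")
```

Changing `alpha` and rerunning the loop is a quick way to see how the strength of regularization affects the test error.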

In [10]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

### Import the 'Boston' dataset (house prices in Boston)

#### Split the raw data into features (df) and target

In [2]:
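data_url = "http://lib.stat.cmu.edu/datasets/boston"
# skip the first 22 lines of description; use a raw string for the regex separator
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# features in df: all columns of the even rows plus the first two columns of the odd rows
df = pd.DataFrame(np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]]))
# target: the third column of the odd rows
target = pd.DataFrame(raw_df.values[1::2, 2])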
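The slicing above appears to rely on each record being spread over two consecutive rows of the parsed file. A toy sketch of the same indexing pattern, using a small made-up array (`raw`, `features` and `toy_target` are illustrative names, and the values are not Boston data):

```python
import numpy as np

# Toy stand-in for the parsed file: each record spans two rows.
# Even rows hold the first 4 fields; odd rows hold 2 more features,
# then the target value, then NaN padding.
raw = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 99.0, np.nan],
    [10.0, 20.0, 30.0, 40.0],
    [50.0, 60.0, 999.0, np.nan],
])

# Join each even row with the first two columns of the following odd row.
features = np.hstack([raw[::2, :], raw[1::2, :2]])
# The third column of each odd row is the target.
toy_target = raw[1::2, 2]

print(features)    # one row per record, 6 feature columns
print(toy_target)  # one target value per record
```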

Variables in order:  
CRIM: per capita crime rate by town  
ZN: proportion of residential land zoned for lots over 25,000 sq.ft.  
INDUS: proportion of non-retail business acres per town  
CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)  
NOX: nitric oxides concentration (parts per 10 million)  
RM: average number of rooms per dwelling  
AGE: proportion of owner-occupied units built prior to 1940  
DIS: weighted distances to five Boston employment centres  
RAD: index of accessibility to radial highways  
TAX: full-value property-tax rate per $10,000  
PTRATIO: pupil-teacher ratio by town  
B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town  
LSTAT: % lower status of the population  
MEDV: median value of owner-occupied homes in $1000's  

MEDV is the target.

#### Set the column name

In [3]:
df.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
        'TAX', 'PTRATIO', 'B', 'LSTAT']

target.columns = ['MEDV']

### Build a model to predict the house price (MEDV)

1. For simplicity, I use 'CRIM' as the only feature.
2. Both 'CRIM' and the target 'MEDV' are continuous numerical values, so I am going to build a linear regression model.

#### Regular Linear Regression

In [11]:
crim = df[['CRIM']]

# split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(crim, target, test_size=0.2, random_state=6)

# create a linear regression model
regr = linear_model.LinearRegression()

# train the model with the training data
regr.fit(x_train, y_train)

# predict on the test data
y_pred = regr.predict(x_test)

print('Coefficients: ', regr.coef_)

print('Mean squared error: %.2f' % mean_squared_error(y_test, y_pred))

Coefficients:  [[-0.45237872]]
Mean squared error: 78.87
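For reference, the mean squared error reported above is the average of the squared differences between the true and predicted values. A small sketch with made-up numbers (not values from this notebook) that checks a manual computation against `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Manual MSE: mean of the squared residuals.
manual_mse = np.mean((y_true - y_hat) ** 2)
sklearn_mse = mean_squared_error(y_true, y_hat)

print(manual_mse, sklearn_mse)  # both 0.375
```

Because the errors are squared, the result is in squared target units, so an MSE near 78 on MEDV corresponds to a typical error of roughly 9 (in $1000's).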


#### LASSO

In [21]:
crim = df[['CRIM']]

# split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(crim, target, test_size=0.2, random_state=6)

# create a LASSO (L1-regularized) regression model
regr = Lasso(alpha=1)

# train the model with the training data
regr.fit(x_train, y_train)

# predict on the test data
y_pred = regr.predict(x_test)

print('Coefficients: ', regr.coef_)

print('Mean squared error: %.2f' % mean_squared_error(y_test, y_pred))

Coefficients:  [-0.43745057]
Mean squared error: 78.49


#### Ridge

In [20]:
crim = df[['CRIM']]

# split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(crim, target, test_size=0.2, random_state=6)

# create a Ridge (L2-regularized) regression model
regr = Ridge(alpha=100)

# train the model with the training data
regr.fit(x_train, y_train)

# predict on the test data
y_pred = regr.predict(x_test)

print('Coefficients: ', regr.coef_)

print('Mean squared error: %.2f' % mean_squared_error(y_test, y_pred))

Coefficients:  [[-0.4507133]]
Mean squared error: 78.83


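As we can see, the test MSE of LASSO (78.49) and Ridge (78.83) is slightly lower than that of the regular linear regression model (78.87), and both regularized models have a coefficient of slightly smaller magnitude (-0.437 and -0.451 versus -0.452). The differences are small, which is to be expected with a single feature.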
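To see what the `alpha` parameter does, it can help to sweep it over a few values and watch the coefficient shrink. The sketch below uses synthetic single-feature data (made up for illustration, with a hypothetical true slope of -0.5), not the Boston data, and the alpha grids are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: one feature with unit variance, true slope -0.5.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = -0.5 * X[:, 0] + rng.normal(scale=1.0, size=200)

# L1 penalty: larger alpha pulls the coefficient toward exactly zero.
lasso_coefs = []
for alpha in (0.01, 0.1, 1.0):
    coef = Lasso(alpha=alpha).fit(X, y).coef_[0]
    lasso_coefs.append(coef)
    print(f"Lasso alpha={alpha}: coef={coef:.4f}")

# L2 penalty: larger alpha shrinks the coefficient but never zeroes it.
ridge_coefs = []
for alpha in (1.0, 100.0, 10000.0):
    coef = Ridge(alpha=alpha).fit(X, y).coef_[0]
    ridge_coefs.append(coef)
    print(f"Ridge alpha={alpha}: coef={coef:.4f}")
```

How much a given `alpha` shrinks a coefficient depends on the scale of the feature, so the same value can act very differently on unscaled data such as 'CRIM'.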