## Regularization
* A technique that keeps a model from overfitting the training data by penalizing large coefficients
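* Lasso
    - A linear model with an L1 penalty added
    - Tends to set the coefficients of low-impact features to exactly 0.
        * Regression coefficient : how much the dependent variable changes when an independent variable changes by one unit
    - alpha controls the strength of the penalty. If alpha is too small the model can still overfit; if it is too large the model underfits.
    - Because low-impact coefficients become 0, the features that survive show which ones matter most to the model.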
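* Ridge
    - A linear model with an L2 penalty added
    - Shrinks coefficients toward 0 without making them exactly 0.
    - alpha controls the strength of the penalty. If alpha is too small the model can still overfit; if it is too large the model underfits.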
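* ElasticNet
    - A model that combines the L1 and L2 penalties
    - Suited to datasets with many features
    - The L1 penalty reduces the number of features, and the L2 penalty shrinks the size of the remaining coefficients
    - Parameters
        * alpha : the L1 weight (a) plus the L2 weight (b), i.e. alpha = a + b
        * l1_ratio = 0 : the closer to 0, the closer to a pure L2 (Ridge) penalty
        * l1_ratio = 1 : the closer to 1, the closer to a pure L1 (Lasso) penalty
        * 0 < l1_ratio < 1 : a mix of the L1 and L2 penalties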
* Coefficient : a value obtained by computation; here, the weight the model learns for each feature during fitting
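
For reference, the three penalties can be written out as objective functions. The forms below follow the scikit-learn conventions for `Lasso`, `Ridge`, and `ElasticNet`. The symbols $w$ (coefficient vector), $n$ (number of samples), and $\rho$ (standing for `l1_ratio`) are notation introduced here, not taken from the notes above.

$$
\begin{aligned}
\text{Lasso:} \quad & \min_w \; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \\
\text{Ridge:} \quad & \min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{ElasticNet:} \quad & \min_w \; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha\rho \lVert w \rVert_1 + \frac{\alpha(1-\rho)}{2} \lVert w \rVert_2^2
\end{aligned}
$$

Writing $a = \alpha\rho$ and $b = \alpha(1-\rho)$ gives $\alpha = a + b$ and $\rho = a / (a + b)$, which is the relationship stated in the ElasticNet parameter list.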

In [1]:
import pandas as pd
import numpy as np
import warnings

warnings.filterwarnings("ignore")

In [2]:
df = pd.read_csv("data/boston.csv")
df.head()

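|  | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |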


In [3]:
df.columns

Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT', 'PRICE'],
      dtype='object')

In [4]:
f = [ 'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT' ]
label = "PRICE"

X, y = df[f], df[label]

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3 )

X_train.shape, X_test.shape

((354, 13), (152, 13))
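
The split above does not fix a seed, so each run produces a different train/test partition and slightly different scores in the cells that follow. A minimal sketch of a reproducible split is shown here. The synthetic array only stands in for the Boston features, and `random_state=42` is an arbitrary value chosen for this illustration, not one used in the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the notebook's data (506 rows, 13 features).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(506, 13))
y_demo = rng.normal(size=506)

def split(seed):
    # Fixing random_state makes the shuffle deterministic.
    return train_test_split(X_demo, y_demo, test_size=0.3, random_state=seed)

X_tr_a, X_te_a, y_tr_a, y_te_a = split(42)
X_tr_b, X_te_b, y_tr_b, y_te_b = split(42)

print(X_tr_a.shape, X_te_a.shape)
print(np.array_equal(X_tr_a, X_tr_b))  # same seed, same rows in the same order
```

With a fixed seed, differences in score between runs can be attributed to the model settings rather than to the partition.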

In [6]:
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import Lasso

# alpha : regularization strength
# The larger the value, the stronger the penalty
# Strong penalty : even influential features can be driven to 0
lasso = Lasso( alpha = 0.07 )
lasso.fit( X_train, y_train )

train_pred = lasso.predict( X_train )
test_pred = lasso.predict( X_test )

# score() returns R^2; mean_squared_error expects (y_true, y_pred)
# First line : train set, second line : test set
print("score : ", lasso.score( X_train, y_train ), " / MSE : ", mean_squared_error( y_train, train_pred ))
print("score : ", lasso.score( X_test, y_test ), " / MSE : ", mean_squared_error( y_test, test_pred ))

score :  0.7451288578985521  / MSE :  21.467955344524885
score :  0.6673459737260232  / MSE :  28.05614597014181


In [7]:
alphas = [ 0.07, 0.1, 0.5, 1.3, 2 ]

for a in alphas :
    lasso = Lasso( alpha = a )
    lasso.fit( X_train, y_train )

    train_pred = lasso.predict( X_train )
    test_pred = lasso.predict( X_test )

    print("alpha : ", a)
    # mean_squared_error expects (y_true, y_pred); first line is train, second is test
    print("score : ", lasso.score( X_train, y_train ), " / MSE : ", mean_squared_error( y_train, train_pred ))
    print("score : ", lasso.score( X_test, y_test ), " / MSE : ", mean_squared_error( y_test, test_pred ))
    print("-" * 50)

alpha :  0.07
score :  0.7451288578985521  / MSE :  21.467955344524885
score :  0.6673459737260232  / MSE :  28.05614597014181
--------------------------------------------------
alpha :  0.1
score :  0.7443229203201789  / MSE :  21.53583996967442
score :  0.6658612573054097  / MSE :  28.181367423456464
--------------------------------------------------
alpha :  0.5
score :  0.7327382330637867  / MSE :  22.5116254063855
score :  0.6650598106583867  / MSE :  28.248961687587336
--------------------------------------------------
alpha :  1.3
score :  0.6674553953519273  / MSE :  28.010439564813137
score :  0.6053865870976394  / MSE :  33.281820268864045
--------------------------------------------------
alpha :  2
score :  0.63695992479898  / MSE :  30.579092079346783
score :  0.5640557119965337  / MSE :  36.76767936968033
--------------------------------------------------


In [8]:
from sklearn.model_selection import GridSearchCV

params = { "alpha" : [ 0.07, 0.1, 0.5, 1.3, 2 ] }

lasso = Lasso()

grid_cv = GridSearchCV( lasso, param_grid=params, cv=5 )
grid_cv.fit( X_train, y_train )

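# With the default refit=True, score() uses the best estimator refit on all of X_train
print("best hyperparameters : ", grid_cv.best_params_ )
print("train : ", grid_cv.score( X_train, y_train ))
print("test : ", grid_cv.score( X_test, y_test ))

best hyperparameters :  {'alpha': 0.1}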
train :  0.7443229203201789
test :  0.6658612573054097
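
The grid search reports only the winning alpha, but the per-candidate validation scores are also stored on the fitted object. The sketch below shows one way to inspect them. It runs on synthetic data from `make_regression` (an assumption, since the notebook reads `data/boston.csv`), so the numbers it prints will differ from the results above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the notebook's training set.
X_demo, y_demo = make_regression(n_samples=300, n_features=13, n_informative=5,
                                 noise=15.0, random_state=0)

alpha_grid = [0.07, 0.1, 0.5, 1.3, 2]
search = GridSearchCV(Lasso(), param_grid={"alpha": alpha_grid}, cv=5)
search.fit(X_demo, y_demo)

# One mean validation R^2 per candidate, in the same order as the grid.
for params, mean_r2 in zip(search.cv_results_["params"],
                           search.cv_results_["mean_test_score"]):
    print(params, round(mean_r2, 4))

# With the default refit=True, best_estimator_ is retrained on all the data
# passed to fit(), and search.score() delegates to it.
best = search.best_estimator_
print(search.best_params_,
      np.isclose(search.score(X_demo, y_demo), best.score(X_demo, y_demo)))
```

This also explains why the train and test scores printed by the grid search match the alpha = 0.1 rows of the manual loop: both come from a Lasso with that alpha fitted on the full training set.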


In [9]:
lasso = Lasso( alpha=0.07 )
lasso.fit( X_train, y_train )

print( X_train.columns )
lasso.coef_

Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT'],
      dtype='object')


array([-0.10854616,  0.04443484, -0.04883964,  0.30977395, -0.        ,
        4.30466049, -0.02226266, -1.15775863,  0.19297852, -0.01184985,
       -0.82137671,  0.00909857, -0.46741158])
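
The `-0.` entry for NOX is an exact zero, which is what an L1 penalty tends to produce. One way to see why: in the simplified case of an orthonormal design matrix, the Lasso solution is the least-squares coefficient passed through a soft-thresholding function. The sketch below is illustrative only. `soft_threshold` and the sample coefficients are made up here, and scikit-learn's coordinate-descent solver handles the general, non-orthonormal case.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each value toward 0 by t, and clip to exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Hypothetical least-squares coefficients (not taken from the notebook).
ols_coef = np.array([4.3, -1.2, 0.3, -0.05])

# A larger threshold (stronger penalty) zeroes out more coefficients.
for t in [0.07, 0.5, 2.0]:
    print(t, soft_threshold(ols_coef, t))
```

Small coefficients fall inside the threshold and are clipped to zero, while large ones are only reduced by a constant amount.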

In [10]:
coeff_df = pd.DataFrame( index=X_train.columns )


for idx, alpha in enumerate( alphas ) :
    print( idx, " : ", alpha )
    
    lasso = Lasso( alpha = alpha )
    lasso.fit( X_train, y_train )
    
    col_name = "alpha : " + str(alpha)
    coeff_df[col_name] = lasso.coef_
    
coeff_df

0  :  0.07
1  :  0.1
2  :  0.5
3  :  1.3
4  :  2


|  | alpha : 0.07 | alpha : 0.1 | alpha : 0.5 | alpha : 1.3 | alpha : 2 |
|---|---|---|---|---|---|
| CRIM | -0.108546 | -0.107722 | -0.087927 | -0.049589 | -0.020913 |
| ZN | 0.044435 | 0.044753 | 0.047953 | 0.052099 | 0.043792 |
| INDUS | -0.04884 | -0.044474 | -0.023739 | -0.0 | -0.0 |
| CHAS | 0.309774 | 0.0 | 0.0 | 0.0 | 0.0 |
| NOX | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 |
| RM | 4.30466 | 4.206999 | 2.855468 | 0.21285 | 0.0 |
| AGE | -0.022263 | -0.020515 | -1.8e-05 | 0.031538 | 0.044406 |
| DIS | -1.157759 | -1.138831 | -0.857135 | -0.379369 | -0.0 |
| RAD | 0.192979 | 0.195539 | 0.197633 | 0.192423 | 0.123381 |
| TAX | -0.01185 | -0.01205 | -0.012731 | -0.01322 | -0.010882 |
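
In the rows shown, the number of zeroed coefficients does not decrease as alpha grows. A small sketch of how that count could be tracked follows. It uses synthetic data from `make_regression` rather than the Boston file, and `alpha_max` is a name introduced here for the smallest alpha at which scikit-learn's Lasso objective drives every coefficient to 0.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 13 features, only 4 of which actually drive the target.
X_demo, y_demo = make_regression(n_samples=200, n_features=13, n_informative=4,
                                 noise=10.0, random_state=0)
cols = [f"x{i}" for i in range(X_demo.shape[1])]

# Smallest alpha that zeroes every coefficient.
# The data is centered first because Lasso fits an intercept by default.
Xc = X_demo - X_demo.mean(axis=0)
yc = y_demo - y_demo.mean()
alpha_max = np.abs(Xc.T @ yc).max() / len(y_demo)

coef_table = pd.DataFrame(index=cols)
for a in [0.01, 1.0, 1.1 * alpha_max]:
    coef_table[f"alpha : {a:.2f}"] = Lasso(alpha=a).fit(X_demo, y_demo).coef_

# Number of surviving (non-zero) coefficients per alpha.
nonzero_counts = (coef_table != 0).sum()
print(nonzero_counts)
```

Applied to the notebook's own `coeff_df`, the same `(coeff_df != 0).sum()` expression would give the count of surviving features per alpha column.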


In [11]:
from sklearn.linear_model import Ridge

alphas = [ 0.01, 0.1, 1, 10, 100 ]

In [12]:
for alpha in alphas :
    ridge = Ridge( alpha=alpha )
    ridge.fit( X_train, y_train )
    
    train_pred = ridge.predict( X_train )
    test_pred = ridge.predict( X_test )
    
    train_score = ridge.score( X_train, y_train )
    test_score = ridge.score( X_test, y_test )
    
    # mean_squared_error expects (y_true, y_pred)
    train_mse = mean_squared_error( y_train, train_pred )
    test_mse = mean_squared_error( y_test, test_pred )
    
    print("alpha : ", alpha)
    print("train : ", train_score, "mse : ", train_mse)
    print("test : ", test_score, "mse : ", test_mse)
    print("-" * 50)

alpha :  0.01
train :  0.757708012294651 mse :  20.40840532006646
test :  0.6868304682625404 mse :  26.412817527693562
--------------------------------------------------
alpha :  0.1
train :  0.757633235500631 mse :  20.414703816089165
test :  0.6864432537978736 mse :  26.445475318324032
--------------------------------------------------
alpha :  1
train :  0.7551853456184459 mse :  20.620891108403914
test :  0.6826516150112775 mse :  26.765263334883855
--------------------------------------------------
alpha :  10
train :  0.7477653477723669 mse :  21.24588215722454
test :  0.6764690085520205 mse :  27.286706322482
--------------------------------------------------
alpha :  100
train :  0.72838592918904 mse :  22.878222677690836
test :  0.6703279108024836 mse :  27.804648452351234
--------------------------------------------------


In [13]:
coeff_df = pd.DataFrame( index=X_train.columns )

for alpha in alphas :
    ridge = Ridge( alpha=alpha )
    ridge.fit( X_train, y_train )
    
    col_name = "alpha : " + str(alpha)
    coeff_df[col_name] = ridge.coef_
    
coeff_df

|  | alpha : 0.01 | alpha : 0.1 | alpha : 1 | alpha : 10 | alpha : 100 |
|---|---|---|---|---|---|
| CRIM | -0.119311 | -0.118601 | -0.114818 | -0.110553 | -0.104997 |
| ZN | 0.041863 | 0.042013 | 0.042937 | 0.045973 | 0.055189 |
| INDUS | 0.010525 | 0.005158 | -0.023572 | -0.056876 | -0.073255 |
| CHAS | 1.593127 | 1.576852 | 1.464834 | 1.017594 | 0.293741 |
| NOX | -17.232359 | -15.971296 | -9.23152 | -1.797981 | -0.205986 |
| RM | 4.198071 | 4.217929 | 4.303358 | 4.041714 | 2.162366 |
| AGE | -0.008663 | -0.009931 | -0.016494 | -0.020071 | 0.000127 |
| DIS | -1.453648 | -1.434945 | -1.334936 | -1.220928 | -1.077624 |
| RAD | 0.238473 | 0.234609 | 0.214608 | 0.202853 | 0.242906 |
| TAX | -0.010313 | -0.010382 | -0.010777 | -0.011603 | -0.013349 |
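
Unlike the Lasso results, none of the Ridge coefficients in this table becomes exactly 0; they only shrink (NOX goes from about -17.2 to about -0.2). Ridge has a closed-form solution in which alpha is added to the diagonal of `X^T X`, which damps the coefficients smoothly rather than clipping them. The NumPy-only sketch below illustrates this on random data. `ridge_closed_form` is a helper written for this illustration, and it omits the intercept for brevity, so it is not a drop-in replacement for `sklearn.linear_model.Ridge`.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for w (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Random data with known weights (hypothetical, not the Boston data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y_demo = X_demo @ true_w + rng.normal(scale=0.5, size=100)

norms = []
for alpha in [0.01, 0.1, 1, 10, 100]:
    w = ridge_closed_form(X_demo, y_demo, alpha)
    norms.append(np.linalg.norm(w))  # overall size of the coefficient vector
    print(alpha, np.round(w, 3))
```

The norm of the coefficient vector falls as alpha rises, even though individual coefficients may move in either direction, as RAD and ZN do in the table.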


In [14]:
from sklearn.linear_model import ElasticNet

# l1_ratio : the closer to 0, the closer to the L2 (Ridge) penalty.
# l1_ratio : the closer to 1, the closer to the L1 (Lasso) penalty.

ratios = [ 0.2, 0.5, 0.8 ]
alphas = [ 0.1, 0.7, 1.5 ] # regularization strength

In [15]:
el = ElasticNet( alpha=0.7, l1_ratio=0.2 )
el.fit( X_train, y_train )

print("train : ", el.score( X_train, y_train ))

train :  0.7102179391643195


In [16]:
params = {
    "alpha" : alphas,
    "l1_ratio" : ratios
}

el = ElasticNet()
grid_cv = GridSearchCV(el, param_grid=params, cv=5)
grid_cv.fit( X_train, y_train )

print("best hyperparameters : ", grid_cv.best_params_)
print("train : ", grid_cv.score( X_train, y_train ))
print("test : ", grid_cv.score( X_test, y_test ))

best hyperparameters :  {'alpha': 0.1, 'l1_ratio': 0.8}
train :  0.7439110725995968
test :  0.6688862656948521
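
To connect the selected parameters with the a and b weights from the notes: in scikit-learn's parameterization, a = alpha * l1_ratio and b = alpha * (1 - l1_ratio). The sketch below computes them for the values chosen by the grid search and also checks, on synthetic data, that `ElasticNet` with `l1_ratio=1` reproduces `Lasso`. `penalty_weights` is a hypothetical helper, not a scikit-learn function, and the data comes from `make_regression` rather than the Boston file.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

def penalty_weights(alpha, l1_ratio):
    """Return (a, b): the L1 and L2 weights implied by alpha and l1_ratio."""
    return alpha * l1_ratio, alpha * (1 - l1_ratio)

# Values selected by the grid search above.
a, b = penalty_weights(alpha=0.1, l1_ratio=0.8)
print(a, b)

# Synthetic data to compare the two estimators at the L1 end of the range.
X_demo, y_demo = make_regression(n_samples=200, n_features=13, noise=10.0,
                                 random_state=0)
enet_demo = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.1).fit(X_demo, y_demo)
print(np.allclose(enet_demo.coef_, lasso_demo.coef_))
```

With l1_ratio = 0.8 most of the penalty weight sits on the L1 term, so the selected model behaves more like Lasso than like Ridge.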
