<a href="https://colab.research.google.com/github/JakeOh/202505_BD50/blob/main/lab_da/ml06_regularization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Perch Weight Prediction

*   Predicting weight using all of the perch features
*   Comparing KNN Regressor vs Linear Regression
*   Polynomial regression
*   Regularization

# Imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import r2_score, mean_squared_error

# Data Preparation

In [2]:
file_path = 'https://github.com/JakeOh/202505_BD50/raw/refs/heads/main/datasets/fish.csv'

In [3]:
fish = pd.read_csv(file_path)

In [4]:
fish.head()

Unnamed: 0,Species,Weight,Length,Diagonal,Height,Width
0,Bream,242.0,25.4,30.0,11.52,4.02
1,Bream,290.0,26.3,31.2,12.48,4.3056
2,Bream,340.0,26.5,31.1,12.3778,4.6961
3,Bream,363.0,29.0,33.5,12.73,4.4555
4,Bream,430.0,29.0,34.0,12.444,5.134


In [5]:
perch = fish[fish.Species == 'Perch']  # perch subset of the fish dataset

In [6]:
perch.head()

Unnamed: 0,Species,Weight,Length,Diagonal,Height,Width
72,Perch,5.9,8.4,8.8,2.112,1.408
73,Perch,32.0,13.7,14.7,3.528,1.9992
74,Perch,40.0,15.0,16.0,3.824,2.432
75,Perch,51.5,16.2,17.2,4.5924,2.6316
76,Perch,70.0,17.4,18.5,4.588,2.9415


Weight ~ Length + Diagonal + Height + Width

In [9]:
# perch.columns[2:]
X = perch[perch.columns[2:]].values  # feature array (Length, Diagonal, Height, Width)

In [10]:
X[:5, :]

array([[ 8.4   ,  8.8   ,  2.112 ,  1.408 ],
       [13.7   , 14.7   ,  3.528 ,  1.9992],
       [15.    , 16.    ,  3.824 ,  2.432 ],
       [16.2   , 17.2   ,  4.5924,  2.6316],
       [17.4   , 18.5   ,  4.588 ,  2.9415]])

In [11]:
y = perch['Weight'].values  # target array

In [12]:
y[:5]

array([ 5.9, 32. , 40. , 51.5, 70. ])
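Positional slicing with `perch.columns[2:]` depends on the column order in the CSV. As a hedged alternative, the sketch below selects the same four features by name. It rebuilds only the five perch rows printed above instead of downloading the file; `perch_sample`, `feature_names`, `X_sample`, and `y_sample` are illustrative names that are not part of the original notebook.

```python
import pandas as pd

# The five perch rows shown in the perch.head() output above.
perch_sample = pd.DataFrame(
    {
        'Species': ['Perch'] * 5,
        'Weight': [5.9, 32.0, 40.0, 51.5, 70.0],
        'Length': [8.4, 13.7, 15.0, 16.2, 17.4],
        'Diagonal': [8.8, 14.7, 16.0, 17.2, 18.5],
        'Height': [2.112, 3.528, 3.824, 4.5924, 4.588],
        'Width': [1.408, 1.9992, 2.432, 2.6316, 2.9415],
    },
    index=range(72, 77),
)

# Select features by name so the result does not depend on column order.
feature_names = ['Length', 'Diagonal', 'Height', 'Width']
X_sample = perch_sample[feature_names].to_numpy()  # 2-D feature array
y_sample = perch_sample['Weight'].to_numpy()       # 1-D target array

print(X_sample.shape, y_sample.shape)
```

On this sample, the name-based selection yields the same array as `perch_sample[perch_sample.columns[2:]].values`.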

# Splitting into Training and Test Sets

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

In [14]:
X_train.shape

(42, 4)

In [15]:
X_test.shape

(14, 4)

In [17]:
y_train.shape

(42,)

In [18]:
y_test.shape

(14,)
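The shapes above follow from `test_size=0.25`: the 56 perch rows split into 14 test rows and 42 training rows. A minimal sketch with placeholder arrays of the same shape (`X_demo` and `y_demo` are made up for illustration, not the fish data) reproduces those sizes and checks that a fixed `random_state` gives the same split on repeated calls.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the perch arrays (56 rows, 4 features).
X_demo = np.arange(56 * 4, dtype=float).reshape(56, 4)
y_demo = np.arange(56, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)
print(X_tr.shape, X_te.shape)  # (42, 4) (14, 4)

# Repeating the call with the same random_state reproduces the same split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)
print(np.array_equal(X_te, X_te2))  # True
```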

# Regression with First-Order Terms Only

## KNN

In [19]:
knn = KNeighborsRegressor()  # create the ML model

In [20]:
knn.fit(X_train, y_train)  # train the ML model

In [21]:
train_pred = knn.predict(X_train)  # predictions for the training set

In [22]:
train_pred[:5]

array([ 87.6, 123. ,  79.6,  70.6, 723. ])

In [23]:
y_train[:5]  # actual values (perch weights)

array([ 85., 135.,  78.,  70., 700.])

In [24]:
test_pred = knn.predict(X_test)  # predictions for the test set

In [25]:
test_pred[:5]

array([ 60. ,  79.6, 248. , 122. , 130. ])

In [26]:
y_test[:5]

array([  5.9, 100. , 250. , 130. , 130. ])

In [27]:
print('Training set MSE:', mean_squared_error(y_train, train_pred))
print('Training set R2:', r2_score(y_train, train_pred))
print('Test set MSE:', mean_squared_error(y_test, test_pred))
print('Test set R2:', r2_score(y_test, test_pred))

Training set MSE: 2986.5723809523806
Training set R2: 0.97579760182756
Test set MSE: 837.3100000000001
Test set R2: 0.9916579819676246
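To make the two metrics concrete, the sketch below recomputes them by hand with NumPy on the five training values and predictions printed earlier (`y_train[:5]` and `train_pred[:5]`). MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The numbers cover only these five samples, so they differ from the full-set scores above; `mse_manual` and `r2_manual` are hypothetical helper names.

```python
import numpy as np

def mse_manual(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# First five actual weights and KNN predictions from the cells above.
y_true_5 = [85.0, 135.0, 78.0, 70.0, 700.0]
y_pred_5 = [87.6, 123.0, 79.6, 70.6, 723.0]

print('MSE (5 samples):', mse_manual(y_true_5, y_pred_5))
print('R2 (5 samples):', r2_manual(y_true_5, y_pred_5))
```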


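The KNN model is underfitting: it scores better on the test set (R² ≈ 0.992) than on the training set (R² ≈ 0.976).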
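One common way to probe underfitting in a KNN regressor is to lower `n_neighbors` (the default is 5), which makes the model more flexible. The sketch below is only an illustration on synthetic data, not the perch dataset: the data-generating formula, the candidate values of `k`, and the `scores` dictionary are all assumptions, and whether a smaller `k` actually helps on the real data would need to be checked.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption, NOT the fish dataset):
# one made-up "length" feature and a noisy cubic "weight" target.
rng = np.random.default_rng(42)
X_syn = rng.uniform(8, 45, size=(56, 1))
y_syn = 0.02 * X_syn[:, 0] ** 3 + rng.normal(0, 20, size=56)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=42
)

# Fit one model per k and record (train R2, test R2).
scores = {}
for k in (1, 3, 5, 10):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_tr, y_tr)
    scores[k] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f'k={k}: train R2={scores[k][0]:.4f}, test R2={scores[k][1]:.4f}')
```

A training score well above the test score suggests overfitting, while a test score above the training score, as in the KNN results above, points toward underfitting.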

## Linear Regression