<a href="https://colab.research.google.com/github/clustering-jun/GNU-MachineLearning/blob/main/L9_1_KNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Training a KNN Regression Model**

In [1]:
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

boston = pd.read_csv('boston.csv')

features = boston.drop(columns = 'PRICE').iloc[:, 0:2]
target = boston['PRICE']

model = KNeighborsRegressor(n_neighbors=3, metric = 'euclidean')
model.fit(features, target)

In [2]:
obs = [[0.02, 16]]
print(model.predict(obs))

[26.33333333]




In [3]:
distance, indices = model.kneighbors(obs)

print(distance)
print(indices)

[[1.50000008 2.00004679 3.50020509]]
[[64  0 67]]




In [4]:
# kneighbors returns row positions, so select them positionally with .iloc
print(target.iloc[[64, 0, 67]])
print(target.iloc[[64, 0, 67]].mean())

64    33.0
0     24.0
67    22.0
Name: PRICE, dtype: float64
26.333333333333332
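
The cells above show that the KNN regression prediction is the mean of the targets of the 3 nearest training rows. The same computation can be sketched by hand in NumPy. The helper `knn_regress` and the toy data are hypothetical and only for illustration; they are not the Boston data set or scikit-learn's implementation.

```python
import numpy as np

def knn_regress(X_train, y_train, query, k=3):
    """Average the targets of the k nearest training rows (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training row
    dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # Positions of the k smallest distances, nearest first
    nearest = np.argsort(dists, kind="stable")[:k]
    return y_train[nearest].mean(), dists[nearest], nearest

# Hypothetical toy data (assumption: made up for this sketch)
X_toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
y_toy = [10.0, 20.0, 30.0, 100.0]

prediction, distances, indices = knn_regress(X_toy, y_toy, [0.0, 0.0], k=3)
print(prediction)  # mean of the three nearest targets
print(distances)   # distances to those neighbors, nearest first
print(indices)     # row positions of those neighbors
```

The far-away fourth row does not influence the prediction at all, which is why the choice of `k` and the scale of the features matter so much for KNN.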


### **Training a KNN Classification Model**

In [5]:
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()

features = iris.data
target = iris.target

model = KNeighborsClassifier(n_neighbors=6, metric = 'minkowski', p = 1.5)
model.fit(features, target)

In [8]:
obs = [[5,4,3,2]]
print(model.predict(obs))
print(model.predict_proba(obs))

[1]
[[0.16666667 0.83333333 0.        ]]


In [9]:
# Because n_neighbors=6, kneighbors returns the distances to and the indices of the 6 nearest training points.

distance, indices = model.kneighbors(obs)

print('distance:', distance)
print('index:', indices)
print('class:', target[indices])

distance: [[1.92405745 1.94977477 2.0290355  2.17339296 2.18048122 2.21592082]]
index: [[64 98 59 88 57 44]]
class: [[1 1 1 1 1 0]]
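
With the default uniform weights, the values returned by `predict_proba` are the share of the k neighbors that belong to each class. The sketch below refits the same classifier on iris and recomputes those shares from the neighbor labels. The variable names `votes` and `vote_share` are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

k = 6
clf = KNeighborsClassifier(n_neighbors=k, metric="minkowski", p=1.5)
clf.fit(X, y)

obs = [[5, 4, 3, 2]]
_, idx = clf.kneighbors(obs)

# Count how many of the k neighbors fall in each class, then normalize by k
votes = np.bincount(y[idx[0]], minlength=len(iris.target_names))
vote_share = votes / k

print(votes)
print(vote_share)
# The shares should agree with predict_proba when weights='uniform' (the default)
print(np.allclose(vote_share, clf.predict_proba(obs)[0]))
```

If the model were created with `weights='distance'`, closer neighbors would count more and this simple count would no longer match.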


In [10]:
# GridSearchCV from sklearn can be used to search for the best value of k.

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
features = iris.data
target = iris.target

scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

model = KNeighborsClassifier()
param_grid = {'n_neighbors': list(range(1, 15))}

grid_search = GridSearchCV(model, param_grid, cv = 5)
grid_search.fit(features_standardized, target)

grid_search.best_params_

{'n_neighbors': 6}
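
In the cell above the scaler is fitted on all rows before cross-validation, so each validation fold has already contributed to the mean and standard deviation used for scaling. One common way to avoid this, sketched here as an alternative and not as the notebook's method, is to put the scaler and the classifier in a `Pipeline` so that scaling is refitted inside every fold. The step names `scaler` and `knn` are arbitrary choices for this sketch.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {"knn__n_neighbors": list(range(1, 15))}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(iris.data, iris.target)  # raw features; scaling happens inside each fold

print(search.best_params_)
print(round(search.best_score_, 3))
```

The selected k may differ from the result above, because the folds are now scaled independently. `search.best_estimator_` also accepts unscaled observations directly, since the fitted scaler is part of the pipeline.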

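### **Using the code below as a reference, train a KNN classification model that predicts Survived with the best value of k, then print the predicted class, the predicted probabilities, and the distances and indices of the k nearest neighbors for the following input.**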

In [15]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

titanic = pd.read_csv("titanic.csv")
titanic = titanic.dropna()

# Take an explicit copy so the column assignment below does not raise SettingWithCopyWarning
features = titanic[["Pclass", "Sex", "Age", "Fare"]].copy()
target = titanic["Survived"]

# 1. Encode Sex as a number
sex_mapping = {'male': 0, 'female': 1}
features["Sex"] = features["Sex"].map(sex_mapping)

# 2. Standardize the features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# 3. Find the best k for the KNN model with GridSearchCV
model = KNeighborsClassifier()
param_grid = {'n_neighbors': list(range(1, 21))}
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(features_standardized, target)

# Best k
print('===============\n')
best_k = grid_search.best_params_['n_neighbors']
print("Best k:", best_k)

# 4. Predict with the best model
best_model = grid_search.best_estimator_

# Observation in feature order: Pclass, Sex (0 = male), Age, Fare
observation = [[1, 0, 28, 62]]
observation_std = scaler.transform(observation)

# Prediction results
predicted_class = best_model.predict(observation_std)
predicted_proba = best_model.predict_proba(observation_std)

print("Predicted survival (0 = died, 1 = survived):", predicted_class[0])
print("Predicted probabilities [died, survived]:", predicted_proba[0])

# 5. Print the distances to and the indices of the k nearest neighbors
distances, indices = best_model.kneighbors(observation_std)
print("Distances to the k nearest neighbors:", distances)
print("Indices of the k nearest neighbors:", indices)
print('\n===============')



Best k: 18
Predicted survival (0 = died, 1 = survived): 1
Predicted probabilities [died, survived]: [0.44444444 0.55555556]
Distances to the k nearest neighbors: [[0.08808322 0.13331245 0.2031986  0.20379422 0.21070501 0.2328671
  0.24455437 0.32098708 0.34171523 0.34804785 0.42514396 0.42760051
  0.43940086 0.46559609 0.4677459  0.48672293 0.49156344 0.55850272]]
Indices of the k nearest neighbors: [[ 67 151 140 138  76 136 177  16  24   6 165  95 182  83  89 130  17 154]]
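
The indices returned by `kneighbors` are row positions in the array the model was fitted on. They are not the DataFrame's index labels, and the two differ once `dropna()` has removed rows. A minimal sketch with a hypothetical toy frame (not the Titanic data) shows how `.iloc` maps the positions back to the original rows.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy frame; the row labelled 1 is dropped because of its missing value
df = pd.DataFrame({
    "x": [0.0, None, 1.0, 2.0, 10.0],
    "label": [0, 1, 0, 1, 1],
}).dropna()

clf = KNeighborsClassifier(n_neighbors=2)
clf.fit(df[["x"]].to_numpy(), df["label"])

_, positions = clf.kneighbors([[1.1]])

print(positions[0])                            # positions within the fitted array
print(df.iloc[positions[0]].index.tolist())    # original row labels
print(df.iloc[positions[0]])                   # the neighbor rows themselves
```

Label-based selection with the same numbers, such as `df.loc[positions[0]]`, would pick different rows or fail, so positional selection is the safer choice after filtering.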



