### KNN
<br>
- [scikit-learn official documentation: KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)

> `class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)`

```python
from sklearn.neighbors import KNeighborsClassifier
```
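To make the `weights` parameter in the signature above more concrete, here is a minimal sketch on a made-up three-point data set (the arrays `X_toy`, `y_toy` and the query point are illustrative assumptions, not part of the original notes). It shows a case where uniform voting and distance-weighted voting disagree.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two class-0 points on the left, one class-1 point on the right.
X_toy = np.array([[0.0], [1.0], [3.0]])
y_toy = np.array([0, 0, 1])

# weights='uniform': each of the k neighbors casts one equal vote.
uniform_knn = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_toy, y_toy)
# weights='distance': each vote is weighted by 1 / distance, so close points dominate.
distance_knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_toy, y_toy)

# The query sits right next to the single class-1 point.
query = np.array([[2.9]])
uniform_pred = uniform_knn.predict(query)
distance_pred = distance_knn.predict(query)

print("uniform  :", uniform_pred)   # majority of the 3 neighbors is class 0
print("distance :", distance_pred)  # the very close class-1 point outweighs the others
```

With `k` equal to the number of training points, uniform voting simply returns the majority class, while distance weighting lets the nearest point decide.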

### Example code

#####  Iris dataset

```python
import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
# X: sepal length, sepal width, petal length, petal width
X = iris.data
# y: 0 - Setosa, 1 - Versicolour, 2 - Virginica
y = iris.target

# Build a KNN classifier on the iris data set
# 1) Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=1/3,
                                                    random_state=1)

# 2) Plain KNN, and KNN in a pipeline after PCA (dimensionality reduction)
knn = KNeighborsClassifier(n_neighbors=5)
knn_pca = Pipeline(steps=[('feature_extractor_pca', PCA(n_components=2)),
                          ('neighbor', KNeighborsClassifier(n_neighbors=5))])

# 3) Fit both models
knn.fit(X_train, y_train)
knn_pca.fit(X_train, y_train)

# 4) Compare test accuracy (predict already returns integer class labels)
print("Only kneighbor Score :", accuracy_score(y_test, knn.predict(X_test)))
print("PCA+kneighbor Score :", accuracy_score(y_test, knn_pca.predict(X_test)))
```

```
Only kneighbor Score : 0.98
PCA+kneighbor Score : 1.0
```
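One way to see why reducing iris to two components costs so little is to check how much variance PCA keeps. The following is a minimal sketch, not the original notebook's code: for simplicity it fits PCA on the full data set rather than on the training split only, and the variable names `X_2d` and `retained` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the 4 original features onto the 2 leading principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Fraction of the total variance captured by the 2 kept components.
retained = pca.explained_variance_ratio_.sum()

print(X.shape, "->", X_2d.shape)
print(f"variance retained by 2 components: {retained:.3f}")
```

A high retained fraction suggests that KNN distances in the reduced space stay close to those in the original space, which fits the similar accuracies reported above. A single train/test split is still a noisy comparison.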


### K-fold Cross Validation
<br>

- [scikit-learn official documentation: cross-validation](https://scikit-learn.org/stable/modules/cross_validation.html)

```python
from sklearn.model_selection import cross_val_score
```
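As a quick illustration of how `cross_val_score` is typically called, here is a minimal sketch that reuses the iris data and the two KNN models from the earlier example. The fold count (5), the shuffling seed, and the step names are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Shuffle before splitting: the iris rows are ordered by class.
cv = KFold(n_splits=5, shuffle=True, random_state=1)

knn = KNeighborsClassifier(n_neighbors=5)
knn_pca = Pipeline(steps=[("pca", PCA(n_components=2)),
                          ("knn", KNeighborsClassifier(n_neighbors=5))])

# Each call refits the model on 4 folds and scores it on the held-out fold, 5 times.
knn_scores = cross_val_score(knn, X, y, cv=cv, scoring="accuracy")
knn_pca_scores = cross_val_score(knn_pca, X, y, cv=cv, scoring="accuracy")

print(f"KNN       : mean={knn_scores.mean():.3f}, std={knn_scores.std():.3f}")
print(f"PCA + KNN : mean={knn_pca_scores.mean():.3f}, std={knn_pca_scores.std():.3f}")
```

Because the pipeline is passed as a whole, PCA is refit inside every fold, so the held-out fold never influences the projection.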

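```python
import pandas as pd
import numpy as np

train = pd.read_csv('titanic/train_preprocessing.csv')    # training features
test = pd.read_csv('titanic/test_preprocessing.csv')      # test features
target = pd.read_csv('titanic/target_preprocessing.csv')  # training labels

# Count missing values per column in each frame
print(f"list of missing value \n train : {train.isna().sum().values}\n test : {test.isna().sum().values}\n target : {target.isna().sum().values}")

# Replace the remaining missing values with 0
train.fillna(0, inplace=True)
test.fillna(0, inplace=True)
# target.fillna(0, inplace=True)  # not needed: target has no missing values

# Drop the leading column(s) so train and test both keep 8 feature columns
x = train.to_numpy()[:,1:]
# The labels are taken from the second column of target
y = target.to_numpy()[:,1]
X_test = test.to_numpy()[:,2:]
```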

```
list of missing value 
 train : [0 0 0 0 0 0 2 0 0]
 test : [0 0 0 0 0 0 0 0 1 0]
 target : [0 0]
```
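With the feature and label arrays prepared, a natural next step is to score KNN with K-fold cross-validation for a few values of `n_neighbors`. The sketch below is only an illustration under stated assumptions: the helper `mean_cv_accuracy` is hypothetical, and because the Titanic CSV files are not available here, it runs on stand-in data from `make_classification` with 8 features to mirror the shape of `x`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def mean_cv_accuracy(features, labels, n_neighbors, n_folds=5):
    """Hypothetical helper: mean K-fold accuracy of a KNN classifier."""
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    scores = cross_val_score(model, features, labels, cv=n_folds, scoring="accuracy")
    return scores.mean()


# Stand-in data (assumption): 200 rows, 8 numeric features, binary labels.
demo_x, demo_y = make_classification(n_samples=200, n_features=8,
                                     n_informative=4, random_state=0)

# Compare a few neighborhood sizes and keep the best mean accuracy.
results = {k: mean_cv_accuracy(demo_x, demo_y, k) for k in (3, 5, 7, 9)}
best_k = max(results, key=results.get)

for k, score in results.items():
    print(f"n_neighbors={k}: mean accuracy={score:.3f}")
print("best n_neighbors:", best_k)
```

On the real data, the call would presumably be `mean_cv_accuracy(x, y.astype(int), 5)`, assuming the label column holds integer class values.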