# Python | sklearn.neighbors
Date: 2020-09-28

---

## sklearn.neighbors

In [1]:
import pandas as pd
food = pd.read_csv("c:/data/food.csv")
food

Unnamed: 0,ingredient,sweetness,crunchiness,class
0,apple,10,9,Fruits
1,bacon,1,4,Proteins
2,banana,10,1,Fruits
3,carrot,7,10,Vegetables
4,celery,3,10,Vegetables
5,cheese,1,1,Proteins
6,cucumber,2,8,Vegetables
7,fish,3,1,Proteins
8,grape,8,5,Fruits
9,green bean,3,7,Vegetables


In [3]:
label = food['class']
label

0         Fruits
1       Proteins
2         Fruits
3     Vegetables
4     Vegetables
5       Proteins
6     Vegetables
7       Proteins
8         Fruits
9     Vegetables
10    Vegetables
11      Proteins
12        Fruits
13        Fruits
14      Proteins
Name: class, dtype: object

In [6]:
import numpy as np
x_train = np.array(food.iloc[:,1:3])
x_train

array([[10,  9],
       [ 1,  4],
       [10,  1],
       [ 7, 10],
       [ 3, 10],
       [ 1,  1],
       [ 2,  8],
       [ 3,  1],
       [ 8,  5],
       [ 3,  7],
       [ 1,  9],
       [ 3,  6],
       [ 7,  3],
       [10,  7],
       [ 2,  3]])

In [11]:
y = np.array([[6,4]]) # tomato: sweetness 6, crunchiness 4
y

array([[6, 4]])

In [1]:
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)

In [8]:
clf.fit(x_train, label) # fit (training) method

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')

In [12]:
clf.predict(y)

array(['Fruits'], dtype=object)

In [13]:
clf.predict(y)[0]

'Fruits'
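
To see why the tomato comes out as `Fruits`, it can help to look at the neighbors themselves. The sketch below re-enters the 15 training rows and labels printed above instead of reading `c:/data/food.csv`, then calls `kneighbors` and `predict_proba` on the fitted classifier. The variable names `dist`, `idx`, `neighbor_labels`, and `proba` are only illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training rows (sweetness, crunchiness) and labels copied from the outputs above.
x_train = np.array([[10, 9], [1, 4], [10, 1], [7, 10], [3, 10],
                    [1, 1], [2, 8], [3, 1], [8, 5], [3, 7],
                    [1, 9], [3, 6], [7, 3], [10, 7], [2, 3]])
label = ['Fruits', 'Proteins', 'Fruits', 'Vegetables', 'Vegetables',
         'Proteins', 'Vegetables', 'Proteins', 'Fruits', 'Vegetables',
         'Vegetables', 'Proteins', 'Fruits', 'Fruits', 'Proteins']

y = np.array([[6, 4]])  # tomato

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(x_train, label)

# Distances to, and row positions of, the 3 nearest training points.
dist, idx = clf.kneighbors(y)
neighbor_labels = [label[i] for i in idx[0]]
print(idx[0], neighbor_labels)  # rows 12, 8, 11 -> Fruits, Fruits, Proteins

# Share of the 3 neighbor votes per class, in the order of clf.classes_.
proba = clf.predict_proba(y)[0]
print(dict(zip(clf.classes_, proba)))
```

Two of the three nearest rows are `Fruits`, which is why `predict` returns `'Fruits'`.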

<br>

## Exercises

[Problem 3] Classify gender from height and weight.

```
[[158,64],
 [170,86],
 [183,84],
 [191,80],
 [155,49],
 [163,59],
 [180,67],
 [158,54],
 [170,67]]

 ['male','male','male','male','female','female','female','female','female']
 ```

Classify the gender of `[155,70]`.


In [14]:
# Using sklearn.neighbors

x_train = np.array([[158,64],
 [170,86],
 [183,84],
 [191,80],
 [155,49],
 [163,59],
 [180,67],
 [158,54],
 [170,67]])

label = ['male','male','male','male','female','female','female','female','female']

y = np.array([[155,70]])

In [15]:
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(x_train, label)
clf.predict(y)[0]

'female'

In [16]:
# Applying the algorithm directly

dist = np.sqrt(np.sum(pow(x_train - y, 2), axis = 1))
indices = dist.argsort()[:3]
for i in indices:
  print(label[i])

male
female
female


In [17]:
label[indices[2]]

'female'

In [19]:
from collections import Counter

result = np.take(label, indices)
Counter(result).most_common(1)[0][0]

'female'
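
The manual steps above (compute distances, take the k smallest, count the labels) can be collected into one helper. This is a minimal sketch rather than what scikit-learn does internally; `knn_predict` is a hypothetical name chosen for illustration.

```python
from collections import Counter

import numpy as np


def knn_predict(x_train, labels, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    x_train = np.asarray(x_train, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training row.
    dist = np.sqrt(np.sum((x_train - query) ** 2, axis=1))
    # Positions of the k smallest distances.
    nearest = dist.argsort()[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


x_train = [[158, 64], [170, 86], [183, 84], [191, 80], [155, 49],
           [163, 59], [180, 67], [158, 54], [170, 67]]
label = ['male', 'male', 'male', 'male',
         'female', 'female', 'female', 'female', 'female']

print(knn_predict(x_train, label, [155, 70], k=3))  # female
print(knn_predict(x_train, label, [155, 70], k=1))  # male
```

With `k=1` only the single closest row, `[158,64]`, gets a vote, and it is `male`, so the answer depends on the choice of k.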

<br/>  


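[Problem 4] Use the kNN algorithm to classify whether each of the following customers will buy the product. In the data, `나이` is age, `월수입` is monthly income, and `상품구매여부` is the purchase label (`구매` = purchased, `비구매` = not purchased).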

```
Age: 20, monthly income: 50
Age: 20, monthly income: 150
```

In [20]:
buy = pd.read_csv("c:/data/buy.csv")
buy

Unnamed: 0,나이,월수입,상품구매여부
0,26,160,구매
1,35,210,비구매
2,26,220,비구매
3,29,260,구매
4,22,110,비구매
5,32,210,비구매
6,37,310,구매
7,21,110,비구매
8,28,210,비구매
9,31,260,구매


In [21]:
x_train = np.array(buy.iloc[:,0:2])
x_train

array([[ 26, 160],
       [ 35, 210],
       [ 26, 220],
       [ 29, 260],
       [ 22, 110],
       [ 32, 210],
       [ 37, 310],
       [ 21, 110],
       [ 28, 210],
       [ 31, 260],
       [ 36, 390],
       [ 23, 110],
       [ 32, 340],
       [ 29, 170],
       [ 37, 340],
       [ 31, 240],
       [ 27, 230],
       [ 23, 210],
       [ 40, 440],
       [ 27, 140],
       [ 43, 400]])

In [22]:
label = list(buy.iloc[:,-1])
label

['구매',
 '비구매',
 '비구매',
 '구매',
 '비구매',
 '비구매',
 '구매',
 '비구매',
 '비구매',
 '구매',
 '구매',
 '비구매',
 '비구매',
 '구매',
 '구매',
 '비구매',
 '비구매',
 '비구매',
 '구매',
 '비구매',
 '비구매']

In [23]:
# Age: 20, monthly income: 50
y = np.array([[20,50]])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(x_train, label)
clf.predict(y)[0]

'비구매'

In [24]:
# Age: 20, monthly income: 150

y = np.array([[20,150]])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(x_train, label)
clf.predict(y)[0]

'구매'
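
The income values here are roughly ten times larger than the ages, so income dominates the Euclidean distance. The original post does not scale the features. As an assumption of this sketch only, the example below standardizes both columns with `StandardScaler` in a pipeline and compares the result with the unscaled classifier. The rows and labels are re-entered from the outputs above instead of reading `c:/data/buy.csv`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Rows (나이 = age, 월수입 = monthly income) copied from the output above.
x_train = np.array([[26, 160], [35, 210], [26, 220], [29, 260], [22, 110],
                    [32, 210], [37, 310], [21, 110], [28, 210], [31, 260],
                    [36, 390], [23, 110], [32, 340], [29, 170], [37, 340],
                    [31, 240], [27, 230], [23, 210], [40, 440], [27, 140],
                    [43, 400]])
# Labels copied from the output above (구매 = purchased, 비구매 = not purchased).
label = ['구매', '비구매', '비구매', '구매', '비구매', '비구매', '구매',
         '비구매', '비구매', '구매', '구매', '비구매', '비구매', '구매',
         '구매', '비구매', '비구매', '비구매', '구매', '비구매', '비구매']

y = np.array([[20, 150]])

# Unscaled, as in the cell above.
raw = KNeighborsClassifier(n_neighbors=3).fit(x_train, label)

# Standardize each column to zero mean and unit variance before kNN.
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled.fit(x_train, label)

print(raw.predict(y)[0])     # 구매, same as the cell above
print(scaled.predict(y)[0])  # 비구매 on this data
```

After scaling, the three nearest rows are the customers aged 21 to 23 with income 110, all labeled `비구매`, so the prediction changes. Whether scaling is appropriate depends on how much weight each feature should carry.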

<br>

Date: 2020-09-29

[Problem 5] Using the bmi data, classify a person with height 178 and weight 71.

In [25]:
import pandas as pd
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

In [26]:
bmi = pd.read_csv("c:/data/bmi.csv")
bmi

Unnamed: 0,height,weight,label
0,142,62,fat
1,142,73,fat
2,177,61,normal
3,187,48,thin
4,153,60,fat
...,...,...,...
19995,122,58,fat
19996,193,69,normal
19997,193,37,thin
19998,195,51,thin


In [27]:
# Using sklearn.neighbors

x_train = np.array(bmi.iloc[:,0:2])
x_train

array([[142,  62],
       [142,  73],
       [177,  61],
       ...,
       [193,  37],
       [195,  51],
       [163,  67]])

In [28]:
label = bmi['label']
y = np.array([[178, 71]])

In [29]:
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(x_train, label)
clf.predict(y)[0]

'normal'

In [30]:
# Applying the algorithm directly

dist = np.sqrt(np.sum(pow(x_train - y, 2),axis=1))
indices = dist.argsort()[:3]
label.iloc[indices] # positional lookup; does not rely on the default integer index

10353    normal
14793    normal
6680     normal
Name: label, dtype: object

In [31]:
for i in indices:
  print(label.iloc[i])

normal
normal
normal


In [32]:
from collections import Counter
result = np.take(label, indices)
Counter(result).most_common(1)[0][0]

'normal'
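
`dist.argsort()[:3]` sorts every distance just to keep the three smallest. For a larger table, one possible alternative is `np.argpartition`, which selects the k smallest entries without a full sort. The sketch below uses randomly generated stand-in rows, not the real `bmi.csv`; the row count and value ranges are assumptions made only so the example runs on its own.

```python
import numpy as np

# Synthetic stand-in for the (height, weight) rows; NOT the real bmi.csv data.
rng = np.random.default_rng(0)
x_train = rng.uniform(low=[120, 35], high=[200, 80], size=(20000, 2))
y = np.array([[178, 71]])
k = 3

dist = np.sqrt(np.sum((x_train - y) ** 2, axis=1))

# Full sort, as in the cell above.
idx_sorted = dist.argsort()[:k]

# Partial selection: the first k positions hold the k smallest distances,
# in no particular order, so sort just those k afterwards.
idx_part = np.argpartition(dist, k)[:k]
idx_part = idx_part[dist[idx_part].argsort()]

print(idx_sorted, idx_part)
```

Both approaches pick the same neighbors; the partial selection just avoids ordering the rows that are not among the k nearest.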