# Rep 5. KNN Classification and Logistic Regression
- Binary and multiclass classification practiced on a Kaggle classification dataset, following the textbook's workflow

#### 201845092 이정윤

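##### Dataset: KRmatch.csv
Source: https://www.kaggle.com/datasets/andrewasuter/lol-challenger-soloq-data-jan-krnaeuw?select=KRmatch.csv
- d_spell : summoner spell on the D key, stored as Riot's numeric spell ID (e.g. 4 = Flash, 11 = Smite, 12 = Teleport, 14 = Ignite)
- f_spell : summoner spell on the F key (same numeric IDs)
- role : position
- assists : number of assists
- deaths : number of deaths
- kills : number of kills
- gold_earned : total gold earned
- level : champion level at the end of the game
- damage_total : total damage taken
- total_minions_killed : total number of minions killed
- result : game result (True/False)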
  
### (1) KNN Multiclass Classification
Predict a player's position from various in-game League of Legends statistics.

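```python
import pandas as pd

# Load the Korean-server challenger solo-queue match data
lol = pd.read_csv("data/KRmatch.csv")
# Display the columns used in this report
lol[['d_spell', 'f_spell', 'result', 'assists', 'deaths', 'kills',
     'gold_earned', 'level', 'damage_total', 'total_minions_killed', 'role']]
```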


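| | d_spell | f_spell | result | assists | deaths | kills | gold_earned | level | damage_total | total_minions_killed | role |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14 | 4 | False | 6 | 6 | 2 | 6043 | 9 | 15214 | 36 | Lane.utility |
| 1 | 11 | 4 | True | 8 | 3 | 12 | 12919 | 16 | 194463 | 23 | Lane.jungle |
| 2 | 14 | 4 | False | 2 | 7 | 0 | 5641 | 10 | 34038 | 26 | Lane.utility |
| 3 | 14 | 4 | True | 8 | 3 | 7 | 10688 | 14 | 94369 | 133 | Lane.mid_lane |
| 4 | 4 | 12 | True | 8 | 5 | 13 | 12368 | 15 | 110580 | 141 | Lane.top_lane |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5692 | 12 | 4 | False | 18 | 6 | 7 | 12178 | 16 | 140108 | 174 | Lane.mid_lane |
| 5693 | 4 | 7 | False | 10 | 14 | 8 | 18347 | 18 | 275357 | 253 | Lane.bot_lane |
| 5694 | 12 | 4 | True | 3 | 4 | 3 | 9262 | 13 | 99117 | 169 | Lane.top_lane |
| 5695 | 4 | 11 | True | 6 | 2 | 9 | 8863 | 11 | 100611 | 22 | Lane.jungle |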


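Split the data into training and test sets.  
Encoding `role` with `LabelEncoder` is left commented out, because scikit-learn classifiers accept string labels directly.  
Scale the features with `StandardScaler`.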
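The `LabelEncoder` step is optional because scikit-learn classifiers accept string class labels as they are. The minimal sketch below uses hypothetical toy labels, not data from KRmatch.csv. It compares fitting on the strings with fitting on integer codes and decoding the result with `inverse_transform`, and the two routes give the same predictions:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data (not from KRmatch.csv): one feature, string role labels
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
y = np.array(['Lane.jungle', 'Lane.jungle', 'Lane.mid_lane',
              'Lane.mid_lane', 'Lane.utility', 'Lane.utility'])

# Route 1: fit directly on the string labels
kn_str = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Route 2: encode labels to integers, fit, then decode predictions
encoder = LabelEncoder()
y_int = encoder.fit_transform(y)   # classes sorted alphabetically -> 0, 1, 2
kn_int = KNeighborsClassifier(n_neighbors=1).fit(X, y_int)

X_new = np.array([[0.02], [1.03], [1.9]])
pred_str = kn_str.predict(X_new)
pred_int = encoder.inverse_transform(kn_int.predict(X_new))

print(encoder.classes_)                    # ['Lane.jungle' 'Lane.mid_lane' 'Lane.utility']
print(pred_str)                            # ['Lane.jungle' 'Lane.mid_lane' 'Lane.utility']
print(np.array_equal(pred_str, pred_int))  # True
```

Both routes sort the class names alphabetically, so the integer codes line up with the classifier's `classes_`.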

```python
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import LabelEncoder

# Not needed: scikit-learn classifiers accept string labels directly
# encoder = LabelEncoder()
# lol['role'] = encoder.fit_transform(lol['role'])

# Features: every displayed column except the target
lol_input = lol[['d_spell', 'f_spell', 'result', 'assists', 'deaths', 'kills',
                 'gold_earned', 'level', 'damage_total', 'total_minions_killed']].to_numpy()
# Target: the player's position
lol_target = lol['role'].to_numpy()

# Default split: 75% training, 25% test
train_input, test_input, train_target, test_target = train_test_split(
    lol_input, lol_target, random_state=42)

# Fit the scaler on the training set only, then apply it to both sets
ss = StandardScaler()
ss.fit(train_input)
train_scaled = ss.transform(train_input)
test_scaled = ss.transform(test_input)
```
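`StandardScaler` turns each column into a z-score, `(x - mean) / std`. The mean and standard deviation come from the training set only, and the test set is transformed with those same statistics, as in the cell above. The sketch below checks this on synthetic data. Its two columns are hypothetical stand-ins with very different scales, such as `gold_earned` and `deaths`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix (hypothetical, not KRmatch.csv)
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(10000, 3000, size=200),  # gold-like column, large scale
    rng.normal(5, 2, size=200),         # deaths-like column, small scale
])
X_train, X_test = train_test_split(X, random_state=42)

ss = StandardScaler().fit(X_train)      # statistics from the training set only
X_train_scaled = ss.transform(X_train)
X_test_scaled = ss.transform(X_test)

# Same result by hand: subtract the training mean, divide by the training std
# (population std, ddof=0, matching StandardScaler)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_scaled, manual))  # True

# After scaling, each training column has mean ~0 and std ~1
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Without scaling, distance-based KNN would be dominated by the large-valued columns such as gold or damage.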


```python
from sklearn.neighbors import KNeighborsClassifier

# k-nearest neighbors classifier with k = 3
kn = KNeighborsClassifier(n_neighbors=3)
kn.fit(train_scaled, train_target)
print(kn.score(train_scaled, train_target))  # training accuracy
print(kn.score(test_scaled, test_target))    # test accuracy
```


```
0.8953651685393258
0.8084210526315789
```


```python
import numpy as np

print(kn.classes_)  # class order of the predict_proba columns
# Class probabilities for the first 5 test samples
proba = kn.predict_proba(test_scaled[:5])
print(np.round(proba, decimals=4))
```

```
['Lane.bot_lane' 'Lane.jungle' 'Lane.mid_lane' 'Lane.top_lane'
 'Lane.utility']
[[0.     0.     0.     0.     1.    ]
 [0.3333 0.     0.     0.6667 0.    ]
 [0.     0.     0.3333 0.6667 0.    ]
 [0.     0.     0.     1.     0.    ]
 [0.     0.     0.     0.     1.    ]]
```
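The probabilities above only take the values 0, 0.3333, 0.6667 and 1. With the default uniform weights, `predict_proba` of `KNeighborsClassifier` returns the fraction of the `k = 3` nearest training samples that belong to each class. A sketch on hypothetical random data recomputes those fractions from `kneighbors`:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data (not KRmatch.csv): two features, three classes
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 2))
y_train = np.array(['a', 'b', 'c'] * 20)
X_test = rng.normal(size=(5, 2))

kn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
proba = kn.predict_proba(X_test)

# By hand: find the 3 nearest training samples of each test sample
# and compute the share of each class among them
_, idx = kn.kneighbors(X_test)           # indices, shape (5, 3)
neighbor_labels = y_train[idx]           # their labels, shape (5, 3)
manual = np.array([[np.mean(row == c) for c in kn.classes_]
                   for row in neighbor_labels])

print(np.allclose(proba, manual))        # True
# With k = 3, every probability is a multiple of 1/3
print(np.unique(np.round(proba * 3)))
```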


### (2) Binary Classification with Logistic Regression

Use boolean indexing to keep only the top-lane and mid-lane samples, then classify between these two positions.

```python
# Boolean mask: True where the label is top lane or mid lane
top_mid_indexes = (train_target == 'Lane.top_lane') | (
    train_target == 'Lane.mid_lane')
train_top_mid = train_scaled[top_mid_indexes]
target_top_mid = train_target[top_mid_indexes]
```
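Comparing a NumPy array with a string gives an element-wise boolean array, and `|` combines two such arrays with a logical OR. The parentheses in the cell above are required because `|` binds more tightly than `==` in Python. A small sketch with a hypothetical label array and feature rows; `np.isin` is shown as an equivalent way to build the same mask:

```python
import numpy as np

# Hypothetical stand-ins for train_target and train_scaled
train_target = np.array(['Lane.top_lane', 'Lane.jungle', 'Lane.mid_lane',
                         'Lane.utility', 'Lane.top_lane', 'Lane.bot_lane'])
train_scaled = np.arange(12).reshape(6, 2)

# Element-wise OR of two boolean arrays
mask = (train_target == 'Lane.top_lane') | (train_target == 'Lane.mid_lane')
print(mask)  # True for rows 0, 2 and 4

# np.isin builds the same mask from a list of allowed labels
same = np.isin(train_target, ['Lane.top_lane', 'Lane.mid_lane'])
print(np.array_equal(mask, same))  # True

# The mask keeps the matching rows of both arrays, in their original order
print(train_scaled[mask])
print(train_target[mask])
```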


Fit a logistic regression model on the top/mid subset.

```python
from sklearn.linear_model import LogisticRegression

# Binary logistic regression: top lane vs. mid lane
lr = LogisticRegression()
lr.fit(train_top_mid, target_top_mid)
print('Predicted position:', lr.predict(train_top_mid[:5]))
print('Classes:', lr.classes_)
print('Probabilities:', lr.predict_proba(train_top_mid[:5]))
```

```
Predicted position: ['Lane.mid_lane' 'Lane.top_lane' 'Lane.mid_lane' 'Lane.mid_lane'
 'Lane.top_lane']
Classes: ['Lane.mid_lane' 'Lane.top_lane']
Probabilities: [[0.68868396 0.31131604]
 [0.3294298  0.6705702 ]
 [0.56523694 0.43476306]
 [0.70832692 0.29167308]
 [0.45822728 0.54177272]]
```
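For two classes, logistic regression computes a linear score `z` for each sample with `decision_function`. It then passes `z` through the sigmoid `1 / (1 + e^(-z))` to get the probability of the second class in `classes_`, which is `Lane.top_lane` in the output above. A sketch on hypothetical two-class toy data checks this with `scipy.special.expit`:

```python
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

# Hypothetical two-class toy data standing in for train_top_mid / target_top_mid
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, size=(50, 3)), rng.normal(1, 1, size=(50, 3))])
y = np.array(['Lane.mid_lane'] * 50 + ['Lane.top_lane'] * 50)

lr = LogisticRegression().fit(X, y)

# Linear score z = w . x + b; positive z favors classes_[1]
z = lr.decision_function(X[:5])
# Sigmoid of z = probability of classes_[1]
p_positive = expit(z)

print(lr.classes_)                                              # ['Lane.mid_lane' 'Lane.top_lane']
print(np.allclose(p_positive, lr.predict_proba(X[:5])[:, 1]))   # True
print(np.allclose(z, X[:5] @ lr.coef_[0] + lr.intercept_[0]))   # True
```

The first column of `predict_proba` is `1 - expit(z)`, so each row sums to 1.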


### (3) Multiclass Classification with Logistic Regression

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# C is the inverse of the L2 regularization strength (default 1.0), so C=20
# regularizes more weakly; max_iter raises the iteration limit (default 100)
lr = LogisticRegression(C=20, max_iter=2000)
lr.fit(train_scaled, train_target)

print(lr.score(train_scaled, train_target))            # training accuracy
print(lr.score(test_scaled, test_target), end='\n\n')  # test accuracy

print('Predicted position:')
print(lr.predict(test_scaled[:10]), end='\n\n')
print('Classes:')
print(lr.classes_, end='\n\n')
print('Probabilities:')
print(np.round(lr.predict_proba(test_scaled[:10]), decimals=3))
```

```
0.841058052434457
0.8238596491228071

Predicted position:
['Lane.utility' 'Lane.mid_lane' 'Lane.top_lane' 'Lane.top_lane'
 'Lane.utility' 'Lane.mid_lane' 'Lane.bot_lane' 'Lane.bot_lane'
 'Lane.bot_lane' 'Lane.jungle']

Classes:
['Lane.bot_lane' 'Lane.jungle' 'Lane.mid_lane' 'Lane.top_lane'
 'Lane.utility']

Probabilities:
[[0.    0.054 0.002 0.011 0.933]
 [0.059 0.    0.476 0.464 0.   ]
 [0.006 0.    0.443 0.551 0.   ]
 [0.    0.    0.389 0.611 0.   ]
 [0.    0.117 0.    0.002 0.881]
 [0.052 0.    0.573 0.375 0.   ]
 [0.997 0.    0.003 0.    0.   ]
 [0.623 0.    0.13  0.247 0.   ]
 [0.937 0.    0.043 0.021 0.   ]
 [0.    1.    0.    0.    0.   ]]
```
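With more than two classes, `LogisticRegression` fits a multinomial model when it uses its default `lbfgs` solver. `decision_function` returns one score per class, and `predict_proba` applies the softmax function to each row. The sketch below uses hypothetical five-class data and reuses the `C=20, max_iter=2000` setting from the cell above:

```python
import numpy as np
from scipy.special import softmax
from sklearn.linear_model import LogisticRegression

# Hypothetical five-class toy data (not KRmatch.csv): 4 features, 40 samples per class
rng = np.random.default_rng(0)
centers = rng.normal(scale=3, size=(5, 4))
X = np.vstack([rng.normal(c, 1, size=(40, 4)) for c in centers])
y = np.repeat(['bot', 'jungle', 'mid', 'top', 'utility'], 40)

lr = LogisticRegression(C=20, max_iter=2000).fit(X, y)

z = lr.decision_function(X[:5])   # one score per class, shape (5, 5)
proba = softmax(z, axis=1)        # row-wise e^z_k / sum_j e^z_j

print(z.shape)                                      # (5, 5)
print(np.allclose(proba, lr.predict_proba(X[:5])))  # True
print(np.allclose(proba.sum(axis=1), 1.0))          # True
```

Unlike KNN's vote fractions, these probabilities vary continuously. That is why the output above contains values such as 0.476 and 0.464 instead of only multiples of 1/3.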
