# KNN exercise with NBA player data

## Introduction

- NBA player statistics from 2014-2015 (partial season): [data](https://github.com/justmarkham/DAT4-students/blob/master/kerry/Final/NBA_players_2015.csv), [data dictionary](https://github.com/justmarkham/DAT-project-examples/blob/master/pdf/nba_paper.pdf)
- **Goal:** Predict player position using assists, steals, blocks, turnovers, and personal fouls

## Step 1: Read the data into Pandas

```python
import pandas as pd

url = 'https://raw.githubusercontent.com/justmarkham/DAT4-students/master/kerry/Final/NBA_players_2015.csv'
nba = pd.read_csv(url)
```

```python
nba.head()
```

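| Unnamed: 0.1 | Unnamed: 0 | season_end | player | pos | age | bref_team_id | g | gs | mp | fg | ... | TOV% | USG% | OWS | DWS | WS | WS/48 | OBPM | DBPM | BPM | VORP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2015 | Quincy Acy | F | 24 | NYK | 52 | 21 | 19.2 | 2.2 | ... | 15.1 | 14.7 | 0.6 | 0.5 | 1.0 | 0.05 | -2.6 | -0.7 | -3.4 | -0.3 |
| 1 | 1 | 2015 | Jordan Adams | G | 20 | MEM | 18 | 0 | 7.3 | 1.0 | ... | 15.9 | 17.7 | 0.0 | 0.2 | 0.2 | 0.076 | -2.3 | 1.8 | -0.5 | 0.0 |
| 2 | 2 | 2015 | Steven Adams | C | 21 | OKC | 51 | 50 | 24.2 | 3.0 | ... | 19.2 | 14.8 | 1.0 | 1.8 | 2.8 | 0.109 | -2.0 | 2.0 | -0.1 | 0.6 |
| 3 | 3 | 2015 | Jeff Adrien | F | 28 | MIN | 17 | 0 | 12.6 | 1.1 | ... | 12.9 | 14.1 | 0.2 | 0.2 | 0.4 | 0.093 | -2.6 | 0.8 | -1.8 | 0.0 |
| 4 | 4 | 2015 | Arron Afflalo | G | 29 | TOT | 60 | 54 | 32.5 | 5.0 | ... | 10.9 | 19.6 | 1.4 | 0.7 | 2.1 | 0.051 | -0.2 | -1.4 | -1.6 | 0.2 |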


## Step 2: Create X and y

Use the following features: assists (`ast`), steals (`stl`), blocks (`blk`), turnovers (`tov`), and personal fouls (`pf`).

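```python
# feature matrix: assists, steals, blocks, turnovers, personal fouls
features_cols = ['ast', 'stl', 'blk', 'tov', 'pf']
X = nba[features_cols]
```

```python
# encode the position labels as integers: F = 0, G = 1, C = 2
nba['pos_num'] = nba.pos.map({'F': 0, 'G': 1, 'C': 2})
y = nba.pos_num
```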
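`Series.map` leaves `NaN` wherever a label is missing from the dictionary, which would cause problems at `fit` time, so it is worth confirming that every value of `nba.pos` is one of F, G or C before fitting (`nba.pos.value_counts()` shows the labels at a glance). A minimal sketch of such a guard, using a small hypothetical Series in place of the real column; the inverse dictionary is an illustrative helper for reading numeric predictions back as positions:

```python
import pandas as pd

# hypothetical stand-in for nba['pos']; the real column comes from the CSV
pos = pd.Series(['F', 'G', 'C', 'G', 'F'])
pos_map = {'F': 0, 'G': 1, 'C': 2}

pos_num = pos.map(pos_map)  # labels missing from pos_map become NaN
unmapped = pos[pos_num.isna()].unique()
if len(unmapped) > 0:
    raise ValueError(f"positions without a numeric code: {list(unmapped)}")

print(pos_num.tolist())  # [0, 1, 2, 1, 0]

# inverse mapping, handy for reading a prediction such as array([2]) back as 'C'
code_to_pos = {code: label for label, code in pos_map.items()}
print(code_to_pos[2])  # C
```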

## Step 3: Train a KNN model (K=5)

```python
from sklearn.neighbors import KNeighborsClassifier

# instantiate the model with K=5 neighbors
knn = KNeighborsClassifier(n_neighbors=5)
type(knn)
```

```
sklearn.neighbors.classification.KNeighborsClassifier
```

```python
# fitting a KNN model stores the training data for later neighbor searches
knn.fit(X, y)
```

```
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_neighbors=5, p=2, weights='uniform')
```
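The repr above shows `metric='minkowski', p=2` (ordinary Euclidean distance) and `weights='uniform'`. Under those settings, classifying a new player amounts to finding the K training rows closest to it and taking a majority vote of their positions. The from-scratch sketch below illustrates that idea in plain Python; `knn_predict` and the six training rows are hypothetical and only show the mechanics. It is not scikit-learn's implementation, which may use tree-based neighbor search and can break ties differently.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Majority vote among the k training rows closest to x_new (Euclidean distance)."""
    # math.dist is the Minkowski distance with p=2 (Python 3.8+)
    ranked = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x_new))
    votes = Counter(label for _, label in ranked[:k])
    # most_common keeps first-seen order on ties, so a tie goes to the label
    # whose first neighbor is closest (scikit-learn may break ties differently)
    return votes.most_common(1)[0][0]

# hypothetical rows of [ast, stl, blk, tov, pf]; labels 0 = F, 1 = G, 2 = C
X_train = [[5.0, 1.5, 0.2, 2.5, 2.0], [6.0, 1.2, 0.3, 2.8, 2.2],
           [1.5, 0.8, 0.6, 1.2, 2.5], [2.0, 0.9, 0.5, 1.4, 2.6],
           [1.0, 0.5, 1.8, 1.5, 3.0], [0.8, 0.6, 2.0, 1.3, 3.2]]
y_train = [1, 1, 0, 0, 2, 2]

print(knn_predict(X_train, y_train, [1, 1, 0, 1, 2], k=3))  # 0 (two F neighbors, one C)
```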

## Step 4: Predict player position and calculate predicted probability of each position

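Predict the position of a player with these statistics: 1 assist, 1 steal, 0 blocks, 1 turnover, and 2 personal fouls. The first two cells below try two other stat lines; predicted codes map back to positions as 0 = F, 1 = G, 2 = C.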

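```python
# predict() expects 2D input: one row per player, columns in features_cols order
knn.predict([[0.5, 0.5, 1, 0.8, 2]])
```

```
array([2], dtype=int64)
```

```python
knn.predict([[1.8, 0.5, 0.1, 1.5, 2]])
```

```
array([1], dtype=int64)
```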

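The two outputs below were produced after the K=50 model in Step 5 had been fit, so they reflect K=50, not K=5: with `weights='uniform'`, a K=5 model can only return probabilities that are multiples of 0.2, whereas 0.62, 0.32 and 0.06 are multiples of 1/50. Rerunning these cells directly after Step 3 gives the K=5 values.

```python
# assists, steals, blocks, turnovers, personal fouls
player = [1, 1, 0, 1, 2]
# wrap the single row in a list so the input is 2D (1 sample x 5 features)
knn.predict([player])
```

```
array([0], dtype=int64)
```

```python
# one column per class, in knn.classes_ order: [F, G, C]
knn.predict_proba([player])
```

```
array([[ 0.62,  0.32,  0.06]])
```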
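With `weights='uniform'`, `predict_proba` reports, for each class, the fraction of the K nearest neighbors that belong to it, which is why a K=5 model can only output multiples of 0.2. The sketch below computes those fractions by hand; `knn_predict_proba` and the training rows are hypothetical illustration data, not the NBA dataset.

```python
import math
from collections import Counter

def knn_predict_proba(X_train, y_train, x_new, k, classes):
    """For each label in `classes`, the fraction of x_new's k nearest training
    rows (Euclidean distance) that carry that label, i.e. uniform weights."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x_new))[:k]
    counts = Counter(y_train[i] for i in nearest)
    return [counts[c] / k for c in classes]

# hypothetical rows of [ast, stl, blk, tov, pf]; labels 0 = F, 1 = G, 2 = C
X_train = [[5.0, 1.5, 0.2, 2.5, 2.0], [6.0, 1.2, 0.3, 2.8, 2.2],
           [1.5, 0.8, 0.6, 1.2, 2.5], [2.0, 0.9, 0.5, 1.4, 2.6],
           [1.0, 0.5, 1.8, 1.5, 3.0], [0.8, 0.6, 2.0, 1.3, 3.2]]
y_train = [1, 1, 0, 0, 2, 2]

probs = knn_predict_proba(X_train, y_train, [1, 1, 0, 1, 2], k=5, classes=[0, 1, 2])
print(probs)  # [0.4, 0.2, 0.4]: 2, 1 and 2 of the 5 neighbors
```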

## Step 5: Repeat steps 3 and 4 using K=50

```python
# KNeighborsClassifier was already imported in Step 3; refit with K=50
knn = KNeighborsClassifier(n_neighbors=50)
knn.fit(X, y)
```

```
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_neighbors=50, p=2, weights='uniform')
```

```python
player = [1, 1, 0, 1, 2]
knn.predict([player])
```

```
array([0], dtype=int64)
```
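Step 4 also called for predicted probabilities; for the K=50 model that is `knn.predict_proba([player])`, where each value is a multiple of 1/50. A larger K averages over a wider neighborhood: as K approaches the size of the training set, the probabilities approach the overall class frequencies whatever the input. The hypothetical two-feature data below (blocks and assists only, not the real dataset) shows a point whose single nearest neighbor is a center but whose all-neighbors vote favors the most common class; `neighbor_shares` is an illustrative helper.

```python
import math
from collections import Counter

def neighbor_shares(X_train, y_train, x_new, k):
    """Share of each label among the k training rows nearest to x_new (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x_new))[:k]
    counts = Counter(y_train[i] for i in nearest)
    return {label: counts[label] / k for label in sorted(set(y_train))}

# hypothetical rows of [blocks, assists]; labels 0 = F, 1 = G, 2 = C
X_train = [[0.5, 2.0], [0.6, 1.8], [0.4, 2.2], [0.5, 1.5], [0.7, 2.1], [0.3, 1.9],  # forwards
           [0.2, 5.0], [0.1, 6.0], [0.2, 4.5],                                      # guards
           [2.0, 1.0]]                                                              # center
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
x_new = [1.9, 1.1]

print(neighbor_shares(X_train, y_train, x_new, k=1))             # {0: 0.0, 1: 0.0, 2: 1.0}
print(neighbor_shares(X_train, y_train, x_new, k=len(X_train)))  # {0: 0.6, 1: 0.3, 2: 0.1}
```

With K equal to the whole training set, every input gets the same probabilities (the class frequencies), so very large K trades sensitivity to local structure for stability.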

## Bonus: Explore the features to decide which ones are predictive
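One simple way to start this exploration is to compare the average of each feature by position: features whose averages differ clearly between positions are candidates for being predictive. A sketch using a small hypothetical DataFrame (`nba_sample`, invented numbers) in place of `nba`; with the real data, `nba.groupby('pos')[features_cols].mean()` does the same.

```python
import pandas as pd

features_cols = ['ast', 'stl', 'blk', 'tov', 'pf']

# hypothetical per-game numbers standing in for the real `nba` DataFrame
nba_sample = pd.DataFrame({
    'pos': ['G', 'G', 'F', 'F', 'C', 'C'],
    'ast': [5.0, 6.0, 1.5, 2.0, 1.0, 0.8],
    'stl': [1.5, 1.2, 0.8, 0.9, 0.5, 0.6],
    'blk': [0.2, 0.3, 0.6, 0.5, 1.8, 2.0],
    'tov': [2.5, 2.8, 1.2, 1.4, 1.5, 1.3],
    'pf':  [2.0, 2.2, 2.5, 2.6, 3.0, 3.2],
})

# mean of each feature within each position (rows: C, F, G)
means = nba_sample.groupby('pos')[features_cols].mean()
print(means.round(2))
```

Large gaps between rows (here `ast` for guards and `blk` for centers) suggest a feature separates positions; a feature with nearly equal means across positions is less likely to help the KNN model.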