# KNN exercise with NBA player data

## Introduction

- NBA player statistics from 2014-2015 (partial season): [data](https://github.com/justmarkham/DAT4-students/blob/master/kerry/Final/NBA_players_2015.csv), [data dictionary](https://github.com/justmarkham/DAT-project-examples/blob/master/pdf/nba_paper.pdf)
- **Goal:** Predict player position using assists, steals, blocks, turnovers, and personal fouls

## Step 1: Read the data into Pandas

In [4]:
import pandas as pd
url = 'https://raw.githubusercontent.com/justmarkham/DAT4-students/master/kerry/Final/NBA_players_2015.csv'
nba = pd.read_csv(url, header=0)

In [16]:
nba.columns

Index(['Unnamed: 0', 'season_end', 'player', 'pos', 'age', 'bref_team_id', 'g',
       'gs', 'mp', 'fg', 'fga', 'fg_', 'x3p', 'x3pa', 'x3p_', 'x2p', 'x2pa',
       'x2p_', 'ft', 'fta', 'ft_', 'orb', 'drb', 'trb', 'ast', 'stl', 'blk',
       'tov', 'pf', 'pts', 'G', 'MP', 'PER', 'TS%', '3PAr', 'FTr', 'TRB%',
       'AST%', 'STL%', 'BLK%', 'TOV%', 'USG%', 'OWS', 'DWS', 'WS', 'WS/48',
       'OBPM', 'DBPM', 'BPM', 'VORP', 'pos_num'],
      dtype='object')

In [19]:
# Call the method (note the parentheses) to count the players at each position
nba.pos.value_counts()

## Step 2: Create X and y

Use the following features: assists, steals, blocks, turnovers, personal fouls

In [10]:
# Encode position as a number; this overwrites the pos_num column already in the file
nba['pos_num'] = nba.pos.map({'F': 0, 'G': 1, 'C': 2})

In [11]:
# Features: assists, steals, blocks, turnovers, personal fouls
nba_feature_cols = ['ast', 'stl', 'blk', 'tov', 'pf']
X = nba[nba_feature_cols]
# Response: position encoded as F=0, G=1, C=2
y = nba.pos_num
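`Series.map` with a dictionary returns `NaN` for any label that is not a key, and KNN cannot be fit on a response containing missing values. The sketch below shows one way to check for unmapped labels before building `y`. The toy series and the extra `'PF'` label are hypothetical and only illustrate the check; they are not taken from the NBA file.

```python
import pandas as pd

# Hypothetical toy labels; 'PF' is deliberately missing from the mapping below.
pos = pd.Series(['F', 'G', 'C', 'G', 'PF'])
pos_num = pos.map({'F': 0, 'G': 1, 'C': 2})

# Labels that are not dictionary keys come back as NaN.
n_unmapped = int(pos_num.isna().sum())
unmapped_labels = sorted(pos[pos_num.isna()].unique())
print(n_unmapped, unmapped_labels)
```

If the count is zero on the real data, every position was encoded and `y` is safe to pass to `fit`.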

## Step 3: Train a KNN model (K=5)

In [21]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)

In [22]:
knn.fit(X,y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_neighbors=5, p=2, weights='uniform')

## Step 4: Predict player position and calculate predicted probability of each position

Predict for a player with these statistics: 1 assist, 1 steal, 0 blocks, 1 turnover, 2 personal fouls

In [23]:
# scikit-learn expects a 2D array: one row per player, one column per feature
player = [[1, 1, 0, 1, 2]]
knn.predict(player)

array([1])

In [24]:
knn.predict_proba(player)

array([[ 0.2,  0.8,  0. ]])
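With uniform weights, each predicted probability is the fraction of the K nearest neighbors that belong to that class, so `[0.2, 0.8, 0.]` means 1 of the 5 neighbors is a forward and 4 are guards. The sketch below reproduces that arithmetic in plain Python. The function `knn_vote_fractions` and the toy players are hypothetical, and this is not scikit-learn's implementation (tie handling in particular may differ).

```python
from collections import Counter
from math import dist

def knn_vote_fractions(X_train, y_train, query, k):
    """Return {label: fraction of the k nearest training points with that label}."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(zip(X_train, y_train), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: votes[label] / k for label in sorted(set(y_train))}

# Hypothetical toy players: (ast, stl, blk, tov, pf), labeled F=0, G=1, C=2.
X_toy = [(1, 1, 0, 1, 2), (2, 1, 0, 1, 2), (1, 1, 0, 2, 2),
         (1, 2, 0, 1, 2), (1, 1, 1, 1, 3), (0, 0, 3, 1, 4)]
y_toy = [1, 1, 1, 1, 0, 2]

fractions = knn_vote_fractions(X_toy, y_toy, (1, 1, 0, 1, 2), k=5)
print(fractions)  # {0: 0.2, 1: 0.8, 2: 0.0}
```

The predicted class is simply the label with the largest fraction, which is why `predict` returned 1 (guard) for this player.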

## Step 5: Repeat steps 3 and 4 using K=50

In [26]:
knn = KNeighborsClassifier(n_neighbors=50)
knn.fit(X,y)
print(knn.predict(player))
print(knn.predict_proba(player))

[0]
[[ 0.62  0.32  0.06]]
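The prediction changed from guard (1) at K=5 to forward (0) at K=50. One way to see why: as K grows, more distant players join the vote, and the probabilities drift toward the overall class frequencies in the training data. The sketch below illustrates this on hypothetical one-feature toy data, not the NBA data; when K equals the number of training points, the probabilities are exactly the class proportions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one feature, four class-0 points and two class-1 points.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [10.0], [11.0]])
y_toy = np.array([0, 0, 0, 0, 1, 1])
query = np.array([[10.5]])  # sits right between the two class-1 points

proba_by_k = {}
for k in (1, 3, 6):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_toy, y_toy)
    proba_by_k[k] = model.predict_proba(query)[0]
    print(k, proba_by_k[k])
```

At K=1 and K=3 the query is classified as class 1, but at K=6 every training point votes and the majority class 0 wins, even though the query sits among class-1 points.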


## Bonus: Explore the features to decide which ones are predictive

In [None]:
knn.predict_proba
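One possible starting point for the bonus, offered as a sketch rather than a worked solution: compare the mean of each feature across positions. A feature whose group means differ a lot relative to its overall spread is a candidate predictor. The toy rows, the reduced feature list, and the `separation` score below are all hypothetical; on the real data the same calls would use `nba` and `nba_feature_cols`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real nba DataFrame.
toy = pd.DataFrame({
    'pos': ['G', 'G', 'F', 'F', 'C', 'C'],
    'ast': [6.0, 4.0, 2.0, 2.0, 1.0, 1.0],
    'blk': [0.2, 0.2, 0.5, 0.7, 1.5, 1.9],
    'pf':  [2.0, 2.2, 2.1, 2.3, 2.2, 2.0],
})
feature_cols = ['ast', 'blk', 'pf']

# Mean of each feature within each position.
group_means = toy.groupby('pos')[feature_cols].mean()

# Rough score: range of the group means divided by the overall standard deviation.
separation = (group_means.max() - group_means.min()) / toy[feature_cols].std()

print(group_means)
print(separation.sort_values(ascending=False))
```

In this toy example assists and blocks separate the positions far better than personal fouls do. Whether the same holds for the real data is something to check rather than assume.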