# Player Categorization
This notebook explores the possibility of classifying players into n types instead of the conventional 5 positions.

I downloaded player data for the 2015-16 through 2017-18 seasons (3 seasons) from https://www.basketball-reference.com

In [1]:
# load libraries
import pandas as pd
import numpy as np

In [2]:
# load data
# 2017-18
playerPerGame18 = pd.read_csv('2017_18_PlayerPerGame.csv')
playerAdvanced18 = pd.read_csv('2017_18_PlayerAdvanced.csv')
playerTotal18 = pd.read_csv('2017_18_PlayerTotals.csv')
player36Mins18 = pd.read_csv('2017_18_PlayerPer36Minutes.csv')
player100Poss18 = pd.read_csv('2017_18_PlayerPer100Poss.csv')

# 2016-17
playerPerGame17 = pd.read_csv('2016_17_PlayerPerGame.csv')
playerAdvanced17 = pd.read_csv('2016_17_PlayerAdvanced.csv')
playerTotal17 = pd.read_csv('2016_17_PlayerTotals.csv')
player36Mins17 = pd.read_csv('2016_17_PlayerPer36Minutes.csv')
player100Poss17 = pd.read_csv('2016_17_PlayerPer100Poss.csv')

# 2015-16
playerPerGame16 = pd.read_csv('2015_16_PlayerPerGame.csv')
playerAdvanced16 = pd.read_csv('2015_16_PlayerAdvanced.csv')
playerTotal16 = pd.read_csv('2015_16_PlayerTotals.csv')
player36Mins16 = pd.read_csv('2015_16_PlayerPer36Minutes.csv')
player100Poss16 = pd.read_csv('2015_16_PlayerPer100Poss.csv')
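
The fifteen `read_csv` calls above all follow one naming pattern, so they could also be driven by a loop. The sketch below is only an illustration: `table_filename`, `load_tables` and the `tables` dictionary are hypothetical names that do not appear in the original notebook, and it assumes the CSV files sit in the working directory under the file names used above.

```python
import os
import pandas as pd

SEASONS = ['2015_16', '2016_17', '2017_18']
TABLES = ['PlayerPerGame', 'PlayerAdvanced', 'PlayerTotals',
          'PlayerPer36Minutes', 'PlayerPer100Poss']

def table_filename(season, table):
    """Build a file name such as '2017_18_PlayerPerGame.csv' (hypothetical helper)."""
    return f'{season}_{table}.csv'

def load_tables(seasons=SEASONS, tables=TABLES):
    """Read every season/table CSV that exists into a dict keyed by (season, table)."""
    loaded = {}
    for season in seasons:
        for table in tables:
            path = table_filename(season, table)
            if os.path.exists(path):  # silently skip files that are not present
                loaded[(season, table)] = pd.read_csv(path)
    return loaded

tables = load_tables()
```

With this layout a single table would be looked up as `tables[('2017_18', 'PlayerPerGame')]`, which trades the fifteen separate variable names for one dictionary.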

### Data Preparation
Before the actual analysis, the data needs to be cleaned and prepared.  
There are 2 steps here:  
1. Remove duplicates
2. Remove players who played fewer than 20 games in a season  
  
I am going to use data from the per game and advanced tables.

In [3]:
# a nice thing about the data from basketball-reference.com is that,
# for players who played for more than one team, it already calculates
# each player's overall (aggregated) statistics and places them in that player's first row
playerPerGame16 = playerPerGame16.drop_duplicates(['Player'], keep="first")
playerPerGame17 = playerPerGame17.drop_duplicates(['Player'], keep="first")
playerPerGame18 = playerPerGame18.drop_duplicates(['Player'], keep="first")

playerAdvanced16 = playerAdvanced16.drop_duplicates(['Player'], keep="first")
playerAdvanced17 = playerAdvanced17.drop_duplicates(['Player'], keep="first")
playerAdvanced18 = playerAdvanced18.drop_duplicates(['Player'], keep="first")
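
To see why `keep="first"` retains the aggregated row, here is a small sketch on made-up data. The player names, team codes and game counts are invented for illustration, and the `TOT` label for the combined row is an assumption about how the site marks it; the only thing the logic relies on is that the combined row comes first.

```python
import pandas as pd

# Toy table: 'Player A' changed teams mid-season, so there is one
# combined row (assumed to be labelled 'TOT') followed by one row per team.
toy = pd.DataFrame({
    'Player': ['Player A', 'Player A', 'Player A', 'Player B'],
    'Tm':     ['TOT', 'AAA', 'BBB', 'CCC'],
    'G':      [60, 25, 35, 70],
})

# Keep only the first row seen for each player name.
deduped = toy.drop_duplicates(['Player'], keep='first')
print(deduped)
```

Only the `TOT` row for `Player A` and the single row for `Player B` remain, and the original index labels are preserved, which is why the real tables show gaps in their index after this step.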

In [4]:
playerPerGame16.duplicated()

0      False
1      False
2      False
3      False
4      False
5      False
6      False
7      False
8      False
9      False
10     False
11     False
12     False
15     False
16     False
17     False
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26     False
27     False
28     False
31     False
32     False
33     False
       ...  
544    False
545    False
546    False
549    False
552    False
553    False
554    False
555    False
556    False
557    False
558    False
559    False
560    False
561    False
562    False
563    False
564    False
565    False
566    False
567    False
568    False
569    False
570    False
571    False
572    False
573    False
574    False
575    False
576    False
577    False
Length: 476, dtype: bool
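
Note that `duplicated()` with no arguments compares entire rows, so it would report `False` for a traded player's per-team rows even before de-duplication, because those rows differ in team and statistics. A more direct check, sketched here on made-up data, is to pass the same `['Player']` subset that `drop_duplicates` used and count the hits.

```python
import pandas as pd

# Toy table with invented values: two rows share a player name
# but differ in every other column.
toy = pd.DataFrame({
    'Player': ['Player A', 'Player A', 'Player B'],
    'Tm':     ['TOT', 'AAA', 'CCC'],
})

full_row_dupes = int(toy.duplicated().sum())        # compares entire rows
name_dupes = int(toy.duplicated(['Player']).sum())  # compares player names only
print(full_row_dupes, name_dupes)
```

Applied to a de-duplicated table, `duplicated(['Player']).sum()` should be zero, which is a one-number confirmation instead of a long column of `False` values.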

In [3]:
# keep only players who played at least 20 games in a season
minimum_games = 20

In [4]:
PlayerPerGame = PlayerPerGame[x]
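
The `x` in the cell above is left unspecified. One plausible form is a boolean mask on the games-played column, sketched here on made-up data. It assumes that `G` holds games played, as in the per game table, and that a player with exactly `minimum_games` games is kept; `toy`, `enough_games` and `filtered` are illustrative names only.

```python
import pandas as pd

minimum_games = 20

# Toy table with invented values.
toy = pd.DataFrame({
    'Player': ['Player A', 'Player B', 'Player C'],
    'G':      [80, 15, 20],
})

# Boolean mask: True for players with at least `minimum_games` games.
enough_games = toy['G'] >= minimum_games

# .copy() makes the result an independent frame, so later in-place
# edits do not trigger pandas' chained-assignment warning.
filtered = toy[enough_games].copy()
print(filtered)
```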

In [8]:
PlayerPerGame.fillna(0, inplace = True)
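
A small sketch of what the zero fill does, on made-up data. It assumes the missing values are mostly percentage columns for players with no attempts (for example `3P%` when `3PA` is 0); that is an assumption about the source tables, not something shown above.

```python
import numpy as np
import pandas as pd

# Toy table with invented values: 'Player B' never attempted a three,
# so the percentage is undefined and stored as NaN.
toy = pd.DataFrame({
    'Player': ['Player A', 'Player B'],
    '3PA':    [4.1, 0.0],
    '3P%':    [0.298, np.nan],
})

missing_before = int(toy['3P%'].isna().sum())
filled = toy.fillna(0)  # returns a new frame; `toy` itself is unchanged
missing_after = int(filled['3P%'].isna().sum())
print(missing_before, missing_after)
```

Filling with 0 treats "no attempts" the same as "missed every attempt", which may matter if the percentage columns are later used as features for grouping players.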

In [14]:
PlayerPerGame.loc[625:, "Player":"3P%"]

| | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | 3P | 3PA | 3P% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 625 | Russell Westbrook\westbru01 | PG | 29 | OKC | 80 | 80 | 36.4 | 9.5 | 21.1 | 0.449 | 1.2 | 4.1 | 0.298 |
| 626 | Andrew White\whitean01 | SF | 24 | ATL | 15 | 0 | 13.9 | 1.7 | 4.9 | 0.342 | 1.2 | 3.3 | 0.367 |
| 627 | Derrick White\whitede01 | PG | 23 | SAS | 17 | 0 | 8.2 | 0.9 | 1.9 | 0.485 | 0.5 | 0.8 | 0.615 |
| 628 | Okaro White\whiteok01 | PF | 25 | MIA | 6 | 4 | 13.3 | 1.2 | 2.7 | 0.438 | 0.7 | 1.8 | 0.364 |
| 629 | Isaiah Whitehead\whiteis01 | PG | 22 | BRK | 16 | 0 | 11.3 | 2.5 | 5.4 | 0.465 | 0.4 | 1.1 | 0.389 |
| 630 | Hassan Whiteside\whiteha01 | C | 28 | MIA | 54 | 54 | 25.3 | 5.8 | 10.7 | 0.54 | 0.0 | 0.0 | 1.0 |
| 631 | Andrew Wiggins\wiggian01 | SF | 22 | MIN | 82 | 82 | 36.3 | 6.9 | 15.9 | 0.438 | 1.4 | 4.1 | 0.331 |
| 633 | Damien Wilkins\wilkida02 | SF | 38 | IND | 19 | 1 | 8.0 | 0.7 | 2.1 | 0.333 | 0.2 | 0.9 | 0.222 |
| 635 | C.J. Williams\willicj01 | SG | 27 | LAC | 38 | 17 | 18.6 | 2.3 | 5.2 | 0.442 | 0.6 | 2.1 | 0.282 |
| 637 | Lou Williams\willilo02 | SG | 31 | LAC | 79 | 19 | 32.8 | 7.4 | 16.9 | 0.435 | 2.4 | 6.6 | 0.359 |
