In [2]:
import pandas as pd
import numpy as np

The goal of this notebook is to create the different features in the data for the model. I want to find the (x, y) coordinates of the court itself, then separate it into 13 different shooting locations:

- left corner 3, left wing 3, center 3, right wing 3, right corner 3
- deep left baseline 2, deep left wing 2, deep center 2, deep right wing 2, deep right baseline 2
- short left baseline 2, short left wing 2, short center 2, short right wing 2, short right baseline 2
- floater range, layup and dunk (could make layup and dunk a single position)
- deep 3 (anything beyond 28 ft in theory)

In [3]:
df = pd.read_csv('2000-2020_shot_charts.csv', index_col='GAME_DATE')

In [11]:
df.shape

(640705, 23)

# Messing with the data to try and get an idea of court locations

In [24]:
df[(df.LOC_Y >= 150) & (df.LOC_Y <= 237)].SHOT_ZONE_BASIC.unique()

array(['Above the Break 3', 'Mid-Range'], dtype=object)

In [26]:
df.SHOT_ZONE_BASIC.unique()

array(['Restricted Area', 'Right Corner 3', 'In The Paint (Non-RA)',
       'Left Corner 3', 'Mid-Range', 'Above the Break 3', 'Backcourt'],
      dtype=object)
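
To see how the NBA-provided zones line up with LOC_Y bands all at once, rather than one filter at a time, a cross-tabulation is one option. This is a minimal sketch on a tiny made-up sample: the rows, the bin edges, and the names `sample`, `bands`, and `zone_by_band` are assumptions for illustration, not values from the real file.

```python
import pandas as pd

# Tiny made-up sample standing in for the real shot chart (illustrative values only).
sample = pd.DataFrame({
    "LOC_Y": [5, 40, 120, 160, 200, 236, 260, 400],
    "SHOT_ZONE_BASIC": [
        "Restricted Area", "In The Paint (Non-RA)", "Mid-Range", "Mid-Range",
        "Mid-Range", "Above the Break 3", "Above the Break 3", "Backcourt",
    ],
})

# Bin LOC_Y into bands; these edges are an assumption, pick whatever cut-offs are being tested.
edges = [-60, 90, 150, 237, 280, 900]
bands = pd.cut(sample["LOC_Y"], bins=edges)

# Rows: LOC_Y band, columns: NBA-provided zone, cells: number of shots.
zone_by_band = pd.crosstab(bands, sample["SHOT_ZONE_BASIC"])
print(zone_by_band)
```

On the real data, swapping `sample` for `df` should show which zones share a band, which is the same question the filter above answers for a single band.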

For the x and y locations, 10 units of LOC_X or LOC_Y equal 1 ft on the court.

-250, -50 = left corner

250, -50 = right corner

0, 238 = top of the key (any value for LOC_Y that is 238 or higher would be great)
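
As a quick sanity check on the 10-units-per-foot scale, the reference points above can be converted to feet. This is only a sketch: the column names `x_ft`, `y_ft`, and `dist_ft` are hypothetical, and it assumes the origin (0, 0) is the hoop, which I have not verified against the raw file here.

```python
import numpy as np
import pandas as pd

UNITS_PER_FOOT = 10  # from the note above: 10 LOC units = 1 ft

# Reference points quoted above, as (LOC_X, LOC_Y) pairs.
points = pd.DataFrame(
    {"LOC_X": [-250, 250, 0], "LOC_Y": [-50, -50, 238]},
    index=["left corner", "right corner", "top of the key"],
)

# Convert to feet (x_ft, y_ft, dist_ft are hypothetical column names).
points["x_ft"] = points["LOC_X"] / UNITS_PER_FOOT
points["y_ft"] = points["LOC_Y"] / UNITS_PER_FOOT

# Straight-line distance from the origin, assumed here to be the hoop.
points["dist_ft"] = np.hypot(points["x_ft"], points["y_ft"])
print(points.round(1))
```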

Shot Areas: 

### 3 Pointers

X(-250 to -220) & Y(-50 to 87) = left corner 3

X(220 to 250) & Y(-50 to 87) = right corner 3

X(-250 to -80) & Y(87 to 280) & Shot_zone(above the break) = left wing 3

X(80 to 250) & Y(87 to 280) & Shot_zone(above the break) = right wing 3

X(-80 to 80) & Y(87 to 280) & Shot_zone(above the break) = center 3

X(any) & Y(280 to 350) = deep 3

X(any) & Y(350+) = heave
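
The 3-point rules above can be sketched with `np.select`, which picks the first matching condition per row. The function name `three_point_zone`, the `"other"` fallback label, and the choice of which edge each boundary belongs to (for example, Y = 87 counts as a wing or center shot, not a corner) are my assumptions; the rules themselves do not say. The sample rows are made up.

```python
import numpy as np
import pandas as pd

def three_point_zone(shots: pd.DataFrame) -> pd.Series:
    """Label each shot with a 3-point zone from the rules above, else 'other'."""
    x, y = shots["LOC_X"], shots["LOC_Y"]
    above_break = shots["SHOT_ZONE_BASIC"] == "Above the Break 3"
    corner_y = (y >= -50) & (y < 87)   # assumption: 87 belongs to the upper band
    wing_y = (y >= 87) & (y < 280)

    conditions = [
        (x >= -250) & (x <= -220) & corner_y,
        (x >= 220) & (x <= 250) & corner_y,
        (x >= -250) & (x < -80) & wing_y & above_break,
        (x > 80) & (x <= 250) & wing_y & above_break,
        (x >= -80) & (x <= 80) & wing_y & above_break,
        (y >= 280) & (y < 350),
        y >= 350,
    ]
    labels = [
        "left corner 3", "right corner 3", "left wing 3", "right wing 3",
        "center 3", "deep 3", "heave",
    ]
    # np.select returns the label of the first condition that is True for each row.
    return pd.Series(np.select(conditions, labels, default="other"), index=shots.index)

# Made-up rows, one per expected label.
three_sample = pd.DataFrame({
    "LOC_X": [-235, 240, -150, 0, 100, 0, 0],
    "LOC_Y": [10, 50, 200, 250, 300, 420, 100],
    "SHOT_ZONE_BASIC": [
        "Left Corner 3", "Right Corner 3", "Above the Break 3", "Above the Break 3",
        "Above the Break 3", "Backcourt", "Mid-Range",
    ],
})
three_labels = three_point_zone(three_sample)
print(three_labels.tolist())
```

Because `np.select` stops at the first match, the order of `conditions` decides ties, so the corner boxes win over anything listed after them.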

### Mid Range 

X(-220 to -150) & Y(-50 to 90) = left baseline deep midrange

X(150 to 220) & Y(-50 to 90) = right baseline deep midrange

((X(-220 to -150) & Y(90+)) or (X(-150 to -80) & Y(150+))) & Shotzone(Mid-Range) = left wing deep mid ranger

((X(150 to 220) & Y(90+)) or (X(80 to 150) & Y(150+))) & Shotzone(Mid-Range) = right wing deep mid ranger

X(-150 to -80) & Y(-50 to 90) = short left baseline midranger

X(80 to 150) & Y(-50 to 90) = short right baseline midranger

X(-150 to -80) & Y(90 to 150) = short left wing mid ranger

X(80 to 150) & Y(90 to 150) = short right wing mid ranger

X(-80 to 80) & Y(210+) & Shotzone(Mid-Range) = deep center midranger

X(-80 to 80) & Y(150 to 210) = short center midrange
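
The mid-range boxes can be sketched the same way. Several things here are my reading rather than something the notes state: I treat the left wing deep rule as the mirror image of the right wing one, I join the two boxes inside each wing rule with "or" (a shot cannot sit in both X ranges at once), and I use half-open bands `[lo, hi)` so neighbouring boxes never overlap. The function name `mid_range_zone`, the helper `band`, and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

def mid_range_zone(shots: pd.DataFrame) -> pd.Series:
    """Label each shot with a mid-range zone from the rules above, else 'other'."""
    x, y = shots["LOC_X"], shots["LOC_Y"]
    is_mid = shots["SHOT_ZONE_BASIC"] == "Mid-Range"

    def band(values, lo, hi):
        # Half-open band [lo, hi): an assumption, the notes do not assign the edges.
        return (values >= lo) & (values < hi)

    far_left, near_left = band(x, -220, -150), band(x, -150, -80)
    near_right, far_right = band(x, 80, 150), band(x, 150, 220)
    center = band(x, -80, 80)
    low, mid_y = band(y, -50, 90), band(y, 90, 150)

    conditions = [
        far_left & low,
        far_right & low,
        near_left & low,
        near_right & low,
        near_left & mid_y,
        near_right & mid_y,
        ((far_left & (y >= 90)) | (near_left & (y >= 150))) & is_mid,
        ((far_right & (y >= 90)) | (near_right & (y >= 150))) & is_mid,
        center & (y >= 210) & is_mid,
        center & band(y, 150, 210),
    ]
    labels = [
        "left baseline deep mid-range", "right baseline deep mid-range",
        "short left baseline mid-range", "short right baseline mid-range",
        "short left wing mid-range", "short right wing mid-range",
        "left wing deep mid-range", "right wing deep mid-range",
        "deep center mid-range", "short center mid-range",
    ]
    return pd.Series(np.select(conditions, labels, default="other"), index=shots.index)

# Made-up rows; the last one is a 3-pointer and should fall through to 'other'.
mid_sample = pd.DataFrame({
    "LOC_X": [-200, 180, -100, 100, -120, 0, 0, 0],
    "LOC_Y": [0, 120, 160, 100, 20, 180, 220, 250],
    "SHOT_ZONE_BASIC": ["Mid-Range"] * 7 + ["Above the Break 3"],
})
mid_labels = mid_range_zone(mid_sample)
print(mid_labels.tolist())
```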

### Paint

X(-80 to 80) & Y(90 to 150) = Floater

X(-80 to 80) & Y(-50 to 90) & Shotzone(In The Paint (Non-RA)) = layup/in the paint

Shotzone(restricted area) = restricted area
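
A sketch of the paint rules follows. The layup rule only gives Y(-50) as a bound, so the code assumes the band runs from -50 up to where the floater band starts at 90; that upper bound is my guess, not something stated above. The function name `paint_zone`, the `"other"` fallback, and the sample rows are also hypothetical.

```python
import numpy as np
import pandas as pd

def paint_zone(shots: pd.DataFrame) -> pd.Series:
    """Label each shot with a paint zone from the rules above, else 'other'."""
    x, y = shots["LOC_X"], shots["LOC_Y"]
    zone = shots["SHOT_ZONE_BASIC"]
    in_lane = (x >= -80) & (x <= 80)

    conditions = [
        zone == "Restricted Area",
        in_lane & (y >= 90) & (y < 150),
        # Assumption: the layup band spans Y from -50 up to the floater band at 90.
        in_lane & (y >= -50) & (y < 90) & (zone == "In The Paint (Non-RA)"),
    ]
    labels = ["restricted area", "floater", "layup/in the paint"]
    return pd.Series(np.select(conditions, labels, default="other"), index=shots.index)

# Made-up rows, one per label plus one that matches nothing.
paint_sample = pd.DataFrame({
    "LOC_X": [0, 20, -30, 0],
    "LOC_Y": [10, 120, 60, 200],
    "SHOT_ZONE_BASIC": [
        "Restricted Area", "In The Paint (Non-RA)", "In The Paint (Non-RA)", "Mid-Range",
    ],
})
paint_labels = paint_zone(paint_sample)
print(paint_labels.tolist())
```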

## Trying to organize the data

In [78]:
new_df = df.reset_index()

In [79]:
new_df['GAME_DATE'] = pd.to_datetime(new_df.GAME_DATE, format= '%Y-%m-%d')

In [80]:
grouped_df = new_df.groupby(['PLAYER_NAME', new_df.GAME_DATE.dt.year]).mean(numeric_only=True)

In [86]:
for player, year in grouped_df.index:
    print(year)

1999
2000
1999
2000
2001
2002
...
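
Printing every index entry gets long quickly. A shorter way to inspect the year level is to pull it out of the MultiIndex directly. This is a sketch on made-up rows: the players, dates, and the level name `YEAR` are assumptions. Note that `dt.year` groups by calendar year, not by NBA season, so one season is split across two groups.

```python
import pandas as pd

# Made-up shots for two hypothetical players.
shots = pd.DataFrame({
    "PLAYER_NAME": ["Player A", "Player A", "Player A", "Player B"],
    "GAME_DATE": pd.to_datetime(["2019-11-01", "2019-12-01", "2020-01-15", "2020-02-01"]),
    "LOC_X": [10, -30, 100, 0],
    "LOC_Y": [20, 40, 200, 5],
})

# Name the year level so it can be looked up by name later.
year = shots["GAME_DATE"].dt.year.rename("YEAR")
per_player_year = shots.groupby(["PLAYER_NAME", year]).mean(numeric_only=True)

# Number of (player, year) groups per calendar year.
years = per_player_year.index.get_level_values("YEAR")
print(years.value_counts().sort_index())

# First and last calendar year seen for each player.
span = per_player_year.reset_index().groupby("PLAYER_NAME")["YEAR"].agg(["min", "max"])
print(span)
```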
