# Pandas Presentation 
## Featuring NBA Shot Charts of Lebron James, Steph Curry, and James Harden
### Arupava Saha

First, we import pandas and NumPy, load each CSV file, and preview the result with the `.head()` method, which returns the first five rows by default.
A convenient feature of pandas is that `pandas.read_csv()` turns a CSV file directly into a pandas dataframe.
A pandas dataframe is a tabular data structure made up of rows and columns, where each row is one record (here, one shot attempt). Rows and columns can each have their own labels.
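
To make the row and column labels concrete before loading the real files, here is a minimal sketch that reads a tiny CSV from an in-memory string. The three rows are a hypothetical sample that only mimics a few of the shot chart columns; `csv_text` and `toy` are illustrative names, not part of the presentation's data.

```python
import io
import pandas as pd

# Hypothetical three-row sample mimicking a few columns of the shot chart files.
csv_text = """top,left,result,shot_type
310,203,False,3
213,259,False,2
68,215,True,2
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)          # (number of rows, number of columns)
print(list(toy.columns))  # column labels taken from the CSV header
print(list(toy.index))    # default row labels, counting up from 0
```

Because no index column is specified, pandas assigns the default integer row labels, which is also what the shot chart dataframes below use.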

Let's start by previewing the shot charts of Lebron James, James Harden, and Steph Curry.



In [7]:
import pandas as pd
import numpy as np

lj = pd.read_csv('1_lebron_james_shot_chart_1_2023.csv')
print("Lebron James summary")
print(lj.head())

jh = pd.read_csv('2_james_harden_shot_chart_2023.csv')
print("James Harden summary")
print(jh.head())

sc = pd.read_csv('3_stephen_curry_shot_chart_2023.csv')
print("Stephen Curry summary")
print(sc.head())

Lebron James summary
   top  left          date      qtr time_remaining  result  shot_type  \
0  310   203  Oct 18, 2022  1st Qtr          09:26   False          3   
1  213   259  Oct 18, 2022  1st Qtr          08:38   False          2   
2  143   171  Oct 18, 2022  1st Qtr          08:10   False          2   
3   68   215  Oct 18, 2022  1st Qtr          05:24    True          2   
4   66   470  Oct 18, 2022  1st Qtr          01:02   False          3   

   distance_ft   lead  lebron_team_score  opponent_team_score opponent team  \
0           26  False                  2                    2      GSW  LAL   
1           16  False                  4                    5      GSW  LAL   
2           11  False                  4                    7      GSW  LAL   
3            3  False                 12                   19      GSW  LAL   
4           23  False                 22                   23      GSW  LAL   

   season  color  
0    2023    red  
1    2023    red  
2    202

## Data Frame Structure
As you can see below, the pandas dataframe is an instance of the class `pandas.core.frame.DataFrame`. By default its rows are indexed by position, starting at 0. A useful property of the dataframe is that each column can hold a different datatype. For example, our Lebron James dataframe uses the `int64` datatype for the `top`, `left`, and `shot_type` columns, `bool` for `result`, and `object` for text columns such as `date`.

We can also see how many non-null entries each column has. Luckily, our data does not have any null values, but if it did, we could drop the affected rows with the `.dropna()` method. We can also change the datatype of a column with `.astype()`.

Let's see all of this in action by calling the `.info()` method on our dataframe.

In [8]:
print("\n Data Frame Structure")
print(lj.info())


 Data Frame Structure
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1533 entries, 0 to 1532
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   top                  1533 non-null   int64 
 1   left                 1533 non-null   int64 
 2   date                 1533 non-null   object
 3   qtr                  1533 non-null   object
 4   time_remaining       1533 non-null   object
 5   result               1533 non-null   bool  
 6   shot_type            1533 non-null   int64 
 7   distance_ft          1533 non-null   int64 
 8   lead                 1533 non-null   bool  
 9   lebron_team_score    1533 non-null   int64 
 10  opponent_team_score  1533 non-null   int64 
 11  opponent             1533 non-null   object
 12  team                 1533 non-null   object
 13  season               1533 non-null   int64 
 14  color                1533 non-null   object
dtypes: bool(2), int64(7), object(6)
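
Since the shot chart data has no null values, `.dropna()` and `.astype()` never get exercised on it. The sketch below shows both on a small made-up dataframe; the values and the names `toy` and `cleaned` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing distance value.
toy = pd.DataFrame({
    "shot_type": [2, 3, 2, 3],
    "distance_ft": [3.0, 26.0, np.nan, 23.0],
})

# dropna() returns a new frame without the rows that contain a null value.
cleaned = toy.dropna()

# astype() returns a copy with the column converted to the requested dtype.
cleaned = cleaned.astype({"distance_ft": "int64"})

print(len(toy), len(cleaned))  # row counts before and after dropping
print(cleaned.dtypes)
```

Neither method changes `toy` in place, so the original frame is still available if the cleaning step needs to be revisited.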


## Descriptive Statistics

Another nice feature of pandas is quick access to descriptive statistics. All we have to do is call `.describe()` on our dataframe, and we get the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.

Let's try it out now with all of our dataframes.

In [9]:
print("\n Lebron James descriptive statistics")
print(lj.describe())

print("\n James Harden descriptive statistics")
print(jh.describe())

print("\n Stephen Curry descriptive statistics")
print(sc.describe())


 Lebron James descriptive statistics
               top         left    shot_type  distance_ft  lebron_team_score  \
count  1533.000000  1533.000000  1533.000000  1533.000000        1533.000000   
mean    151.410959   231.982387     2.318982    12.960861          59.813438   
std      98.316684    91.626122     0.466234    10.698687          35.376109   
min      36.000000     2.000000     2.000000     0.000000           0.000000   
25%      67.000000   193.000000     2.000000     3.000000          31.000000   
50%      98.000000   238.000000     2.000000    10.000000          60.000000   
75%     251.000000   261.000000     3.000000    25.000000          88.000000   
max     389.000000   478.000000     3.000000    34.000000         140.000000   

       opponent_team_score  season  
count          1533.000000  1533.0  
mean             59.098500  2023.0  
std              35.456701     0.0  
min               0.000000  2023.0  
25%              31.000000  2023.0  
50%              59

## Data Wrangling 

Let's do some practice with data wrangling. Let's use an advanced basketball statistic, the Effective Field Goal percentage, to practice. 

The formula is as follows:
$EFG = \frac{FG + \frac{3P}{2}}{FGA}$ 
 
where FG stands for number of field goals, 3P stands for number of three pointers made, and FGA stands for field goals attempted.
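
As a quick sanity check of the formula before applying it to real data, here is a minimal sketch with a made-up stat line. The function name `effective_fg` and the numbers are hypothetical.

```python
def effective_fg(fg, three_pt, fga):
    """Effective field goal percentage: (FG + 3P / 2) / FGA."""
    return (fg + three_pt / 2) / fga

# Hypothetical stat line: 10 makes, 4 of them three pointers, on 20 attempts.
toy_efg = effective_fg(fg=10, three_pt=4, fga=20)
print(toy_efg)
```

With these numbers the plain field goal percentage would be 10 / 20 = 0.5, while the effective version is (10 + 2) / 20 = 0.6, because each made three pointer counts as one and a half field goals.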

Let's tackle these variables one at a time. First, for the number of field goals made, we need to count the occurrences of `True` in the `result` column.

In [19]:
value_counts_lj = lj['result'].value_counts()
print("\nLebron James shot result")
print(value_counts_lj)

fg_lj = value_counts_lj.get(True)
print("Lebron James's field goals made: " + str(fg_lj))


Lebron James shot result
result
True     768
False    765
Name: count, dtype: int64
Lebron James's field goals made: 768


Here, we used `.value_counts()` to figure out how many shot attempts went in. We can figure out the three pointers made in a similar fashion. 

In [29]:
made_shots = lj[lj['result'] == True]
print("\nLebron James's made shots")
print(made_shots.head())

made_value_counts_lj = made_shots['shot_type'].value_counts()
three_pt_made_lj = made_value_counts_lj.get(3)
print("\nLebron James's made 3PT shots: " + str(three_pt_made_lj))


Lebron James's made shots
    top  left          date      qtr time_remaining  result  shot_type  \
3    68   215  Oct 18, 2022  1st Qtr          05:24    True          2   
5    63   239  Oct 18, 2022  2nd Qtr          11:30    True          2   
7    53   224  Oct 18, 2022  2nd Qtr          10:05    True          2   
11   63   249  Oct 18, 2022  2nd Qtr          03:53    True          2   
13   54   249  Oct 18, 2022  2nd Qtr          01:48    True          2   

    distance_ft   lead  lebron_team_score  opponent_team_score opponent team  \
3             3  False                 12                   19      GSW  LAL   
5             1  False                 24                   25      GSW  LAL   
7             2  False                 26                   27      GSW  LAL   
11            1  False                 39                   49      GSW  LAL   
13            1  False                 44                   53      GSW  LAL   

    season  color  
3     2023  green  
5     2

Whoa, that code looks quite different. We used filtering to find which shots Lebron actually made, following the pattern `new_df = df[df['column'] == True]`. The same pattern works with other conditions as well.

Finally, we have to find the attempted field goals, which we can calculate by getting the number of rows of the dataframe.
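
To illustrate filtering with other conditions, here is a small sketch on a made-up dataframe. The data and the variable names are hypothetical; only the column names are borrowed from the shot chart files.

```python
import pandas as pd

# Hypothetical five-shot sample using column names from the shot chart data.
toy = pd.DataFrame({
    "result": [True, False, True, False, True],
    "shot_type": [2, 3, 3, 2, 3],
    "distance_ft": [3, 26, 24, 16, 28],
})

# A numeric comparison works the same way as the == True check.
long_shots = toy[toy["distance_ft"] >= 25]

# Combine conditions with & (and) or | (or), wrapping each one in parentheses.
made_threes = toy[(toy["result"] == True) & (toy["shot_type"] == 3)]

# ~ negates a boolean column, which selects the missed shots here.
misses = toy[~toy["result"]]

print(len(long_shots), len(made_threes), len(misses))
```

The parentheses around each condition matter because `&` and `|` bind more tightly than comparison operators in Python.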

In [33]:
field_goals_attempted_lj = len(lj)
print("\nLebron James's field goals attempted: " + str(field_goals_attempted_lj))


Lebron James's field goals attempted: 1533


Now we can finally calculate Lebron's Effective Field Goal Percentage.

In [34]:
efg_lj = (fg_lj + 0.5 * three_pt_made_lj) / field_goals_attempted_lj
print("Lebron James's effective field goal percentage: " + str(efg_lj))

Lebron James's effective field goal percentage: 0.5505544683626875
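
The three wrangling steps can be bundled into one function so that the same calculation could be repeated for the James Harden and Stephen Curry dataframes. This is only a sketch: `efg_from_shots` is a hypothetical helper name, and it assumes the other CSV files also have a boolean `result` column and a `shot_type` column, which this presentation only confirms for the Lebron James file. It is demonstrated on made-up data rather than the real files.

```python
import pandas as pd

def efg_from_shots(shots):
    """Effective field goal percentage from a frame with one row per attempt.

    Assumes a boolean `result` column and a `shot_type` column holding 2 or 3.
    """
    made = shots[shots["result"]]                    # keep only made shots
    fg = len(made)                                   # field goals made
    three_pt = int((made["shot_type"] == 3).sum())   # made three pointers
    fga = len(shots)                                 # field goals attempted
    return (fg + 0.5 * three_pt) / fga

# Hypothetical five-shot sample: 3 makes, 1 of them a three pointer.
toy_shots = pd.DataFrame({
    "result": [True, False, True, True, False],
    "shot_type": [2, 3, 3, 2, 2],
})
toy_result = efg_from_shots(toy_shots)
print(toy_result)
```

Counting with `.sum()` on a boolean comparison gives 0 when a player has no made three pointers, whereas `.value_counts().get(3)` would return `None` in that case.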
