# Additional Topics

## Common Python functions

This section covers pandas functions that come up often in Python, from data manipulation to exploratory data analysis. First up:
- .head(n) - display the first n rows of a dataframe (5 rows by default)
- .tail(n) - display the last n rows of a dataframe (5 rows by default)

In [3]:
# Load pandas package
import pandas as pd

# Define the file path
file_path = '../../2022-fall-python-tutorial/data/2022_boxscores.csv' # '../..' moves up two directories from the current working directory

# Load file
df = pd.read_csv(file_path)

In [7]:
# Display the top 2 rows
df.head(2)

Unnamed: 0,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,50.0,11,15.4,6,98.6,51.3,20,0.421,57,0.386,...,17,0.0,0,"Jon M. Huntsman Center, Salt Lake City, Utah",ABILENE-CHRISTIAN,Abilene Christian,71.0,Home,UTAH,Utah
1,68.8,22,0.0,0,101.3,67.7,21,0.507,73,0.438,...,15,0.0,0,"Reed Arena, College Station, Texas",ABILENE-CHRISTIAN,Abilene Christian,64.3,Home,TEXAS-AM,Texas A&M


In [8]:
# Display the last 3 rows of data
df.tail(3)

Unnamed: 0,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
11120,28.0,7,0.0,0,115.1,70.4,19,0.5,57,0.439,...,30,0.0,0,"Ervin J. Nutter Center, Dayton, Ohio",YOUNGSTOWN-STATE,Youngstown State,72.5,Home,WRIGHT-STATE,Wright State
11121,69.6,16,6.3,2,117.2,59.3,16,0.529,52,0.442,...,17,0.0,0,"Bank of Kentucky Center, Highland Heights, Ken...",YOUNGSTOWN-STATE,Youngstown State,63.5,Home,NORTHERN-KENTUCKY,Northern Kentucky
11122,42.9,12,16.7,7,101.4,65.1,28,0.577,52,0.538,...,18,0.0,0,"Beeghly Center, Youngstown, Ohio",YOUNGSTOWN-STATE,Youngstown State,72.2,Away,ROBERT-MORRIS,Robert Morris
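
For readers without the box score file, here is a minimal sketch of the same two calls on a toy dataframe. The `games` frame and its values are made up for illustration and are not part of the tutorial data.

```python
import pandas as pd

# Toy data standing in for the box score file (values are made up)
games = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E", "F", "G"],
    "points": [70, 65, 80, 72, 68, 90, 77],
})

first_default = games.head()   # no argument: first 5 rows
first_two = games.head(2)      # first 2 rows
last_three = games.tail(3)     # last 3 rows

print(first_two)
print(last_three)
```

Both methods return a new dataframe and leave `games` unchanged, so they are safe to call at any point while exploring.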


To calculate general summary statistics on a dataframe, use **.describe()**. By default it returns the following values for each numeric column:
- count of non-null records
- mean
- standard deviation
- min
- 25th, 50th (median), and 75th percentile values
- max

In [9]:
# Calculate summary statistics on df
df.describe()

Unnamed: 0,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,...,home_total_rebounds,home_true_shooting_percentage,home_turnover_percentage,home_turnovers,home_two_point_field_goal_attempts,home_two_point_field_goal_percentage,home_two_point_field_goals,home_win_percentage,home_wins,pace
count,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0,...,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0
mean,48.633678,11.85004,7.942803,2.848422,104.750274,72.546256,22.712218,0.486718,57.497078,0.424392,...,32.979052,0.548731,15.000917,11.688483,36.039108,0.518032,18.593275,0.0,0.0,69.302167
std,13.574212,4.104921,5.45806,2.004859,15.183268,9.940378,4.83768,0.087463,6.926235,0.073961,...,6.811988,0.078797,4.566044,3.745543,7.253632,0.097084,4.85501,0.0,0.0,5.523341
min,4.5,1.0,0.0,0.0,42.6,21.7,8.0,0.143,29.0,0.143,...,12.0,0.238,1.4,1.0,14.0,0.179,5.0,0.0,0.0,51.5
25%,39.1,9.0,3.6,1.0,94.6,65.8,19.0,0.427,53.0,0.375,...,28.0,0.496,11.7,9.0,31.0,0.452,15.0,0.0,0.0,65.5
50%,48.3,12.0,7.1,3.0,104.5,73.1,23.0,0.484,57.0,0.423,...,33.0,0.548,14.7,11.0,36.0,0.516,18.0,0.0,0.0,69.0
75%,58.3,15.0,11.1,4.0,114.7,79.3,26.0,0.545,62.0,0.475,...,37.0,0.601,18.0,14.0,41.0,0.581,22.0,0.0,0.0,72.8
max,92.3,31.0,37.5,14.0,169.3,100.0,44.0,0.821,96.0,0.692,...,71.0,0.824,34.5,28.0,74.0,0.917,43.0,0.0,0.0,107.8
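
Text columns such as `location` or `winner` do not appear in the output above because `.describe()` summarizes only numeric columns by default. The sketch below shows this on a toy dataframe (made-up values), along with the `include="all"` option that adds non-numeric columns to the summary.

```python
import pandas as pd

# Toy data with one text column and two numeric columns (values are made up)
games = pd.DataFrame({
    "team": ["A", "B", "C", "D"],
    "points": [70, 65, 80, 73],
    "pace": [68.5, 71.0, 66.2, 70.3],
})

# Default: only the numeric columns are summarized
numeric_summary = games.describe()

# include="all" also reports unique/top/freq for text columns
full_summary = games.describe(include="all")

print(numeric_summary)
print(full_summary)
```

In the `include="all"` output, statistics that do not apply to a column (for example the mean of `team`) show up as NaN.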


To see each column's data type and its count of non-null values, use **.info()**.

In [10]:
# Get dataframe data types & non-null values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11123 entries, 0 to 11122
Data columns (total 86 columns):
away_assist_percentage                    11123 non-null float64
away_assists                              11123 non-null int64
away_block_percentage                     11123 non-null float64
away_blocks                               11123 non-null int64
away_defensive_rating                     11123 non-null float64
away_defensive_rebound_percentage         11123 non-null float64
away_defensive_rebounds                   11123 non-null int64
away_effective_field_goal_percentage      11123 non-null float64
away_field_goal_attempts                  11123 non-null int64
away_field_goal_percentage                11123 non-null float64
away_field_goals                          11123 non-null int64
away_free_throw_attempt_rate              11123 non-null float64
away_free_throw_attempts                  11123 non-null int64
away_free_throw_percentage                11114 non-null f
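
`.info()` prints its report and returns nothing, so it cannot be filtered or sorted directly. When the same information is needed as data, one possible approach is to combine `.dtypes` with `.isna().sum()`. The sketch below uses a toy dataframe with made-up values and hypothetical column names.

```python
import pandas as pd

# Toy data with one missing value (values and column names are made up)
games = pd.DataFrame({
    "assists": [11, 22, 7],
    "free_throw_pct": [0.75, None, 0.60],
})

# Prints the report to the screen; the return value is None
games.info()

# The same facts as pandas objects that can be filtered or sorted
column_types = games.dtypes
missing_per_column = games.isna().sum()

print(column_types)
print(missing_per_column)
```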

To count how many times each unique value appears in a given column, use **.value_counts()**. The results are sorted from most to least frequent.

In [12]:
df['away_assists'].value_counts()

12    1119
11    1053
10    1041
9      997
13     949
14     814
8      809
15     772
7      603
16     566
17     452
6      428
18     333
5      274
19     213
20     167
4      149
21      88
3       69
22      69
23      63
24      25
2       21
25      14
26      12
1        8
27       7
28       6
31       2
Name: away_assists, dtype: int64
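
Two common variations are sketched below on a small made-up series: `normalize=True` returns proportions instead of counts, and chaining `.sort_index()` orders the result by value rather than by frequency.

```python
import pandas as pd

# Small made-up series standing in for df['away_assists']
assists = pd.Series([12, 11, 12, 10, 12, 11], name="away_assists")

counts = assists.value_counts()                  # most frequent value first
shares = assists.value_counts(normalize=True)    # proportions that sum to 1
by_value = assists.value_counts().sort_index()   # ordered by the value itself

print(counts)
print(shares)
print(by_value)
```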

To get all column names in a dataframe, use **.columns**. It returns a pandas Index, so wrap it in `list()` to get a plain Python list.

In [16]:
# Return list of column names
list(df.columns)

['away_assist_percentage',
 'away_assists',
 'away_block_percentage',
 'away_blocks',
 'away_defensive_rating',
 'away_defensive_rebound_percentage',
 'away_defensive_rebounds',
 'away_effective_field_goal_percentage',
 'away_field_goal_attempts',
 'away_field_goal_percentage',
 'away_field_goals',
 'away_free_throw_attempt_rate',
 'away_free_throw_attempts',
 'away_free_throw_percentage',
 'away_free_throws',
 'away_losses',
 'away_minutes_played',
 'away_offensive_rating',
 'away_offensive_rebound_percentage',
 'away_offensive_rebounds',
 'away_personal_fouls',
 'away_points',
 'away_ranking',
 'away_steal_percentage',
 'away_steals',
 'away_three_point_attempt_rate',
 'away_three_point_field_goal_attempts',
 'away_three_point_field_goal_percentage',
 'away_three_point_field_goals',
 'away_total_rebound_percentage',
 'away_total_rebounds',
 'away_true_shooting_percentage',
 'away_turnover_percentage',
 'away_turnovers',
 'away_two_point_field_goal_attempts',
 'away_two_point_field_go
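
With this many columns, it often helps to narrow the list by name. The sketch below is one way to do that with a list comprehension. It uses an empty toy dataframe, and `home_assists` is an assumed column name chosen for illustration.

```python
import pandas as pd

# Empty toy dataframe; only the column names matter here
# ('home_assists' is an assumed name for illustration)
games = pd.DataFrame(columns=["away_assists", "away_blocks", "home_assists", "pace"])

# Plain Python list of every column name
all_columns = list(games.columns)

# Keep only the names that start with a given prefix
away_columns = [name for name in games.columns if name.startswith("away_")]

print(all_columns)
print(away_columns)
```

The filtered list can then be used to select a subset of the dataframe, for example `games[away_columns]`.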

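Replace NA (missing) values with **.fillna()**. It returns a new object rather than modifying the column in place, so assign the result back.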

In [17]:
# Replace NA values with 0
df['away_assists'] = df['away_assists'].fillna(0)
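
According to the `.info()` output earlier, `away_assists` has 11123 non-null values out of 11123 rows, so filling it changes nothing, while `away_free_throw_percentage` has only 11114. The sketch below shows the effect of `.fillna()` on a toy column that does contain a missing value (made-up data). Whether 0 is a sensible replacement depends on the analysis; it is used here only to mirror the cell above.

```python
import pandas as pd

# Toy data with one missing free throw percentage (values are made up)
games = pd.DataFrame({
    "away_assists": [11, 22, 7],
    "away_free_throw_percentage": [0.75, None, 0.60],
})

# Count missing values before and after filling
missing_before = games["away_free_throw_percentage"].isna().sum()
games["away_free_throw_percentage"] = games["away_free_throw_percentage"].fillna(0)
missing_after = games["away_free_throw_percentage"].isna().sum()

print(missing_before, missing_after)
```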

## Data Structure Manipulation
- pivot
- unpivot
- other?
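
As a rough sketch of the first two items, pandas offers `.pivot()` to turn long data into wide data and `.melt()` to unpivot wide data back into long form. The dataframe, column names, and values below are made up for illustration and are not taken from the box score file.

```python
import pandas as pd

# Long format: one row per (team, stat) pair (values are made up)
long_df = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "stat": ["assists", "blocks", "assists", "blocks"],
    "value": [11, 6, 22, 0],
})

# Pivot: one row per team, one column per stat
wide_df = long_df.pivot(index="team", columns="stat", values="value")

# Unpivot: back to one row per (team, stat) pair
back_to_long = wide_df.reset_index().melt(
    id_vars="team", var_name="stat", value_name="value"
)

print(wide_df)
print(back_to_long)
```

`.pivot()` raises an error if an index/column pair appears more than once; `.pivot_table()` with an aggregation function is the usual alternative in that case.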