# Making use of Playoff Game Logs <img src="playoffs.jpeg" style="height:100px" align = 'right' />

We're looking at historical NBA playoff series data at the game-by-game level. I will go through how I pull the data using Python's [nba_api](https://github.com/swar/nba_api) package. We will look at data from seasons `1996-97` through `2024-25`.

```python
import nba_api.stats.endpoints as nba
import pandas as pd
import time
from nba_api.stats.static import teams
```

Using `TeamGameLogs`, we can pull historical game logs from previous seasons.

```python
def get_game_log(season, measure_type="Base", stype="Playoffs", location=None):
    """Return one row per team per game for `season` (e.g. '1996-97')."""
    df = nba.TeamGameLogs(
        season_nullable=season,
        location_nullable=location,
        measure_type_player_game_logs_nullable=measure_type,
        season_type_nullable=stype,
    ).get_data_frames()[0]
    df = df.rename(columns={"TEAM_ABBREVIATION": "team", "GAME_DATE": "date"})
    return df
```

```python
seasons = [str(i) + '-' + str(i + 1)[2:] for i in range(1996, 2025)]  # '1996-97' ... '2024-25'
season = seasons[0]
df = get_game_log(season)
for season in seasons[1:]:
    P = get_game_log(season)
    df = pd.concat([df, P])
    time.sleep(0.5)  # pause between requests to go easy on the stats API
```
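
Calling `pd.concat` inside the loop re-copies the growing frame on every pass. A common alternative is to collect each season's frame in a list and concatenate once at the end. The sketch below is not the code used for this post: `fetch_season` is a hypothetical stand-in that returns a one-row fake frame so the example runs offline; in practice you would pass `get_game_log` as `fetch` and use a non-zero `pause`.

```python
import time

import pandas as pd


def build_seasons(start=1996, end=2024):
    """Season strings '1996-97' ... '2024-25'; start/end are the first calendar years."""
    return [f"{y}-{str(y + 1)[2:]}" for y in range(start, end + 1)]


def fetch_season(season):
    # Hypothetical stand-in for get_game_log(season): a one-row fake frame
    return pd.DataFrame({"SEASON_YEAR": [season], "team": ["AAA"]})


def pull_all(seasons, fetch=fetch_season, pause=0.0):
    frames = []
    for season in seasons:
        frames.append(fetch(season))
        time.sleep(pause)  # throttle requests when fetch hits the real API
    # A single concat at the end; ignore_index gives a clean 0..n-1 index
    return pd.concat(frames, ignore_index=True)


seasons = build_seasons()
games = pull_all(seasons)
```

With `ignore_index=True`, the combined frame has unique row labels, so later label-based lookups cannot accidentally match rows from several seasons at once.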

## Wrangling until we have the right setup
This is the "not-so-fun" part, where we create a few useful columns:
- `opp` = opponent team abbreviation
- `is_home` = **True** if the game is at home, **False** if away
- `game_no` = **1** for the first game of the series, **2** for the second, etc.
- `round_no` = **1** (first round), **2** (conference semifinals), **3** (conference finals), and **4** (Finals)
- `series` = series standing going into the game (e.g. **3-1** if the team is up **3-1**)
- `total_games` = total number of games the series went to
- `has_home_court` = **True** if the team has home court advantage in the series, **False** otherwise

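```python
# "BOS vs. ORL" means BOS is at home; "ORL @ BOS" means ORL is on the road
df['is_home'] = df['MATCHUP'].str.contains("vs.", regex=False)
df = df.rename(columns={"SEASON_YEAR": "season", 'GAME_DATE': "date"})
df['opp'] = df['MATCHUP'].apply(lambda x: x.split(' ')[-1])
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values("date")  # later steps rely on chronological order
```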
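
For reference, the `MATCHUP` strings follow the pattern `TEAM vs. OPP` for home games and `TEAM @ OPP` for road games, which is what the `is_home` and `opp` logic relies on. Note that `str.contains` treats its pattern as a regular expression by default, so the `.` in `"vs."` matches any character; passing `regex=False` matches the literal text. A minimal sketch on made-up abbreviations (`AAA`, `BBB`, and so on are placeholders, not real teams):

```python
import pandas as pd

# Made-up MATCHUP strings in the same "TEAM vs. OPP" / "TEAM @ OPP" style
toy = pd.DataFrame({"MATCHUP": ["AAA vs. BBB", "BBB @ AAA", "CCC @ DDD", "DDD vs. CCC"]})

# regex=False: match the literal text "vs." rather than the pattern "vs<any char>"
toy["is_home"] = toy["MATCHUP"].str.contains("vs.", regex=False)
# The team is the first whitespace-separated token, the opponent the last
toy["team"] = toy["MATCHUP"].str.split().str[0]
toy["opp"] = toy["MATCHUP"].str.split().str[-1]
```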

To create the `game_no` column, I need to group by `season`, `team`, and `series_matchup`. First, I create a `series_matchup` column that uniquely identifies each series: the two team abbreviations, sorted alphabetically and joined by a hyphen.


```python
def get_matchup(s):
    # "ORL @ BOS" and "BOS vs. ORL" both map to "BOS-ORL"
    S = s.split(' ')
    S = sorted([S[0], S[-1]])
    return S[0] + '-' + S[1]

df['series_matchup'] = df['MATCHUP'].apply(get_matchup)

# Create the game_no column (running count of games within each series)
df['counter'] = 1
df['game_no'] = df.groupby(['season', 'team', 'series_matchup'])['counter'].cumsum()
df = df.drop(columns=['counter'])
```
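
The `counter` column plus `cumsum()` works, but pandas also offers `GroupBy.cumcount()`, which numbers the rows of each group from 0 in their current order. A small sketch on made-up data, using the same grouping keys; like the approach above, it only yields game numbers if the rows are sorted by date first:

```python
import pandas as pd

# Made-up log for one team across two series
toy = pd.DataFrame({
    "season": ["2000-01"] * 5,
    "team": ["AAA"] * 5,
    "series_matchup": ["AAA-BBB", "AAA-BBB", "AAA-BBB", "AAA-CCC", "AAA-CCC"],
    "date": pd.to_datetime(["2001-04-20", "2001-04-22", "2001-04-25", "2001-05-05", "2001-05-07"]),
})
keys = ["season", "team", "series_matchup"]

toy = toy.sort_values("date")  # cumcount follows the current row order
toy["game_no"] = toy.groupby(keys).cumcount() + 1  # 0-based position -> 1-based game number
```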

Now, to create the `series` column, I first create `series_wins` and `series_losses` columns, which count the team's wins and losses in the series *before* the current game (hence the `.shift()`). Then I join the two with a *-* in between.

```python
# Get the number of series wins going into each game
df['win'] = 0
df.loc[df['WL'] == 'W', 'win'] = 1
df['series_wins'] = df.groupby(['season', 'team', 'series_matchup'])['win'].transform(lambda x: x.cumsum().shift().fillna(0))
df['series_wins'] = df['series_wins'].astype(int)

# Get the number of series losses going into each game
df['loss'] = 0
df.loc[df['WL'] == 'L', 'loss'] = 1
df['series_losses'] = df.groupby(['season', 'team', 'series_matchup'])['loss'].transform(lambda x: x.cumsum().shift().fillna(0))
df['series_losses'] = df['series_losses'].astype(int)

# Create a 'series' column (the series standing going into the game)
def do_series(s):
    w, l = s['series_wins'], s['series_losses']
    return str(w) + '-' + str(l)

df['series'] = df.apply(do_series, axis=1)
```
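
An equivalent way to get the record going into each game, sketched here on made-up results, is to take the running total including the current game and subtract the current game. This avoids the `shift()`/`fillna(0)` step, and string concatenation on whole columns replaces the row-wise `apply`. It is an alternative formulation, not the code used in this post:

```python
import pandas as pd

# Made-up results for one team in one series: W, W, L, W, W
toy = pd.DataFrame({
    "season": ["2000-01"] * 5,
    "team": ["AAA"] * 5,
    "series_matchup": ["AAA-BBB"] * 5,
    "WL": ["W", "W", "L", "W", "W"],
})
keys = ["season", "team", "series_matchup"]

toy["win"] = (toy["WL"] == "W").astype(int)
toy["loss"] = (toy["WL"] == "L").astype(int)

# Running total including this game, minus this game = record going into the game
g = toy.groupby(keys)
toy["series_wins"] = g["win"].cumsum() - toy["win"]
toy["series_losses"] = g["loss"].cumsum() - toy["loss"]

# Column-wise string concatenation instead of a row-wise apply
toy["series"] = toy["series_wins"].astype(str) + "-" + toy["series_losses"].astype(str)
```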

To create the `round_no` column, I first build a Series that holds, for each season and team, the opponents that team faced in order of first appearance. The round number is then the position of the current opponent in that list, plus one. (Note: this only works if the DataFrame is sorted by date.)

```python
# Create a 'round_no' column (1 = first round, 2 = conference semifinals, 3 = conference finals, 4 = Finals)
opps = df.groupby(['season', 'team'])['opp'].unique()  # opponents in order of first appearance

def do(s):
    season, team, opp = s['season'], s['team'], s['opp']
    k = (season, team)
    l = opps.loc[k].tolist()
    return l.index(opp) + 1

df['round_no'] = df.apply(do, axis=1)
```
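
Because the round number is just the order in which a team first meets each opponent, `pd.factorize`, which assigns integer codes in order of first appearance, can compute it inside a grouped `transform`. A sketch on a made-up single-team run; it carries the same assumption as the code above, namely that rows are sorted by date:

```python
import pandas as pd

# Made-up opponent sequence for one team's playoff run, already in date order
toy = pd.DataFrame({
    "season": ["2000-01"] * 6,
    "team": ["AAA"] * 6,
    "opp": ["BBB", "BBB", "CCC", "CCC", "DDD", "DDD"],
})

# factorize codes values 0, 1, 2, ... in order of first appearance within each group
toy["round_no"] = toy.groupby(["season", "team"])["opp"].transform(
    lambda s: pd.Series(pd.factorize(s)[0] + 1, index=s.index)
)
```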

Now, to create the `total_games` column, I group by season, team, and matchup and use the pandas `.size()` method, which returns the number of rows (games) in each group.

```python
# Get the total number of games in each series
totals = df.groupby(['season', 'team', 'series_matchup']).size()

def get_num_games(s):
    season, team, matchup = s['season'], s['team'], s['series_matchup']
    k = (season, team, matchup)
    return totals.loc[k]

df['total_games'] = df.apply(get_num_games, axis=1)
```
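
`GroupBy.transform("size")` broadcasts each group's row count back onto every row of that group, so the lookup table and row-wise `apply` can be skipped. A sketch on made-up data:

```python
import pandas as pd

# Made-up rows: a 4-game series followed by a 2-game stretch of another series
toy = pd.DataFrame({
    "season": ["2000-01"] * 6,
    "team": ["AAA"] * 6,
    "series_matchup": ["AAA-BBB"] * 4 + ["AAA-CCC"] * 2,
})
keys = ["season", "team", "series_matchup"]

# Each row receives the number of rows in its (season, team, series) group
toy["total_games"] = toy.groupby(keys)["team"].transform("size")
```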


To create the `has_home_court` column, I look at the first game of each series: the team that hosted game 1 has home court advantage.

```python
# Create the 'has_home_court' column: True if the team hosted game 1 of the series
X = df.groupby(['season', 'team', 'series_matchup'])['is_home'].first()
X = X.reset_index()
X = X.rename(columns={'is_home': 'has_home_court'})
df = pd.merge(df, X, on=['season', 'team', 'series_matchup'], how='left')
```
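
The same column can be built without a merge: `transform("first")` broadcasts the first value of each group back to its rows, which also keeps the original row order and index. A sketch on a made-up series in which `AAA` hosts games 1, 2 and 5; as with the merge, the rows must be sorted so that game 1 comes first within each group:

```python
import pandas as pd

# Made-up series: AAA hosts games 1, 2, 5; BBB hosts games 3, 4
toy = pd.DataFrame({
    "season": ["2000-01"] * 10,
    "team": ["AAA"] * 5 + ["BBB"] * 5,
    "series_matchup": ["AAA-BBB"] * 10,
    "game_no": [1, 2, 3, 4, 5] * 2,
    "is_home": [True, True, False, False, True, False, False, True, True, False],
})
keys = ["season", "team", "series_matchup"]

# Put game 1 first in each group, then copy its is_home value to the whole series
toy = toy.sort_values(keys + ["game_no"])
toy["has_home_court"] = toy.groupby(keys)["is_home"].transform("first")
```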

# What can we do with our data?

Let's see what our added columns look like:

```python
df.loc[:, ['date', 'team', 'opp', 'is_home', 'has_home_court', 'game_no', 'round_no', 'series', 'WL']].set_index('date').tail()
```

| date | team | opp | is_home | has_home_court | game_no | round_no | series | WL |
|---|---|---|---|---|---|---|---|---|
| 2025-04-29 | BOS | ORL | True | True | 5 | 1 | 3-1 | W |
| 2025-04-29 | NYK | DET | True | True | 5 | 1 | 3-1 | L |
| 2025-04-29 | LAC | DEN | False | False | 5 | 1 | 2-2 | L |
| 2025-04-29 | ORL | BOS | False | False | 5 | 1 | 1-3 | L |
| 2025-04-29 | DEN | LAC | True | True | 5 | 1 | 2-2 | W |


As you can see above, the last rows of our data are up to date: the **Celtics** hosted the **Magic** last night (Apr 29, 2025), they have home court in the series, and the series stood at **3-1** going into the game. Meanwhile, the **Clippers** went into **Denver** tied **2-2** without home court in the series, and they lost.

# How teams fare when leading 3-1

There are two games tonight (Apr 30, 2025) in series that stand at **3-1**, so let's look at how teams up three games to one fare in game 5.
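
Filtering on `series == '3-1'` picks out game 5s only because `series` and `game_no` agree: going into game *n*, exactly *n* - 1 games have been played. A quick sanity check, sketched here on made-up rows, is that `game_no == series_wins + series_losses + 1` holds for every row; on the real data you would run the same two assertions against `df`.

```python
import pandas as pd

# Made-up rows carrying the columns built earlier in this post
toy = pd.DataFrame({
    "game_no":       [1, 2, 5, 5, 7],
    "series_wins":   [0, 1, 3, 1, 3],
    "series_losses": [0, 0, 1, 3, 3],
})
toy["series"] = toy["series_wins"].astype(str) + "-" + toy["series_losses"].astype(str)

# Going into game n, wins + losses must equal n - 1
consistent = toy["game_no"] == toy["series_wins"] + toy["series_losses"] + 1
assert consistent.all(), toy[~consistent]

# Given that invariant, every "3-1" row is a game 5
three_one = toy.loc[toy["series"] == "3-1", "game_no"]
assert (three_one == 5).all()
```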

```python
FILTER = df['series'] == '3-1'
df.loc[FILTER, ['date', 'team', 'opp', 'is_home', 'WL', 'win', 'series']].set_index('date').tail()
```

| date | team | opp | is_home | WL | win | series |
|---|---|---|---|---|---|---|
| 2024-05-30 | DAL | MIN | False | W | 1 | 3-1 |
| 2024-06-17 | BOS | DAL | True | W | 1 | 3-1 |
| 2025-04-29 | IND | MIL | True | W | 1 | 3-1 |
| 2025-04-29 | BOS | ORL | True | W | 1 | 3-1 |
| 2025-04-29 | NYK | DET | True | L | 0 | 3-1 |


In four of the last five cases where a team was up **3-1**, it won game 5 and closed out the series. Let's now calculate *how often* this happens overall.

```python
df.loc[FILTER, 'WL'].value_counts()
```

```
W    106
L     64
Name: WL, dtype: int64
```

```python
df.loc[FILTER, 'win'].mean()
```

```
0.6235294117647059
```

Teams up 3-1 win game 5 about *62*% of the time (106 of 170). However, the two teams playing tonight (**Warriors** and **Timberwolves**) don't have home court in the series.

```python
df.loc[FILTER].groupby(['has_home_court', 'is_home'])['win'].mean()
```

```
has_home_court  is_home
False           False      0.260000
                True       0.666667
True            False      0.600000
                True       0.785714
Name: win, dtype: float64
```

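When you're up **3-1**, home court matters a lot in game 5. A team that has home court and plays game 5 at home wins about *79*% of the time, while a team without home court playing game 5 on the road (the situation the **Warriors** and **Timberwolves** face tonight) wins only *26*% of the time. Compare that to the overall home court advantage in the playoffs: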

```python
df.groupby(['has_home_court', 'is_home'])['win'].mean()
```

```
has_home_court  is_home
False           False      0.267303
                True       0.505714
True            False      0.494286
                True       0.732697
Name: win, dtype: float64
```

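Teams with home court advantage win *73*% of their home games. By comparison, a team up 3-1 without home court, playing game 5 on the road, fares no better (*26*%) than road teams without home court do in general (*27*%).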
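
The 3-1 breakdown splits 170 games into four groups, so some of those percentages may rest on only a handful of games. One way to see how much data sits behind each rate is to aggregate the mean and the group size together. A sketch on made-up outcomes (the numbers it produces are not the real results); on the real data you would apply the same `agg` call to `df.loc[FILTER]`:

```python
import pandas as pd

# Made-up game-5 rows for teams up 3-1
toy = pd.DataFrame({
    "has_home_court": [True, True, True, False, False, False],
    "is_home":        [True, True, False, False, False, True],
    "win":            [1, 1, 0, 0, 1, 1],
})

# "mean" is the win rate; "size" is how many games that rate is based on
summary = toy.groupby(["has_home_court", "is_home"])["win"].agg(["mean", "size"])
print(summary)
```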

There are many ways to use this data; the above mainly serves as a guide to building these columns and processing the data before analysis.