In [1]:
from analytics_functions import *
import pandas as pd
import plotly.graph_objects as go
import numpy as np
from collections import defaultdict

df = pd.read_csv('UM_plays.txt',sep='\t')

# Measuring Team Defensive Efficiency Using Foul Rates

Definition: **Time since last rest** is how long a player has been on court since their last break. Benchings and halftime both count as rests; timeouts do not, because we are looking for extended rests. The measure is not perfect (subbing a player out for a defensive possession to avoid a foul, then subbing them back in on offense, is not much of a break), but such events are rare and short, so they should have little effect on the results.
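
To make the definition concrete, here is a minimal sketch of computing time since last rest for a single player. The event format (chronological `(elapsed_seconds, kind)` tuples) and the function name `time_since_last_rest` are hypothetical, chosen for illustration; they are not the layout of `UM_plays.txt` or the logic inside `analytics_functions`.

```python
def time_since_last_rest(events, query_time):
    """Seconds on court since the last rest at query_time, or None if benched.

    events: chronological list of (elapsed_seconds, kind), where kind is one of
    "sub_in", "sub_out", "halftime", "timeout" (hypothetical format).
    """
    on_court = False
    rest_ended_at = None
    for t, kind in events:
        if t > query_time:
            break
        if kind == "sub_in":
            on_court = True
            rest_ended_at = t
        elif kind == "sub_out":
            on_court = False
            rest_ended_at = None
        elif kind == "halftime" and on_court:
            # Halftime counts as a rest even if the player is never subbed out.
            rest_ended_at = t
        # Timeouts are deliberately ignored: they are not extended rests.
    if not on_court:
        return None
    return query_time - rest_ended_at


# Toy timeline for one player (times are seconds elapsed in the game).
events = [
    (0, "sub_in"),
    (300, "timeout"),
    (480, "sub_out"),
    (600, "sub_in"),
    (1200, "halftime"),
]

print(time_since_last_rest(events, 400))   # timeout at 300 does not reset the clock
print(time_since_last_rest(events, 500))   # on the bench
print(time_since_last_rest(events, 900))   # measured from the sub-in at 600
print(time_since_last_rest(events, 1500))  # measured from halftime at 1200
```

Averaging this value over the five players on court would give the team-level quantity used below.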

Here we measure defensive efficiency using foul rates (fouls per minute) as our team grows more tired. Foul rates should be a good proxy for defensive efficiency, because players who get beat off the dribble, are slow to move their feet, reach, or get in improper rebounding position are more likely to foul. Measuring the foul rate at the team level makes sense, because a tired, lazy defender who gets beat off the dribble might cause a teammate under the basket to pick up a foul contesting the drive. We group time into buckets to avoid small sample sizes. For example, we look at the foul rate when the average time since last rest for the players on court falls in the range (5 minutes - 6 minutes). We use 1-minute windows. Foul rates are a good measure here because they normalize for how much time is spent in a given window. An example is given below.

Suppose we want to compare how often we foul when our average time

The calculation steps are as follows.
1. Calculate the average time since last rest for every play in our play-by-play data.
2. Filter our data to show only sequences where the average time since last rest falls into our window (0 mins - 1 min, 1 min - 2 min, etc.).
3. Calculate the number of fouls and total number of minutes played in that time window.
4. Report fouls per minute as (number of fouls)/(minutes played)


Time windows with fewer than 10 minutes played are dropped. For example, if we have seen only 8 total minutes played where the average time since last rest of the players on court is in the range (13 mins - 14 mins), that window is not included.
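
The steps and the 10-minute cutoff above can be sketched on toy data as follows. This is an illustration only: the `segments` format, the numbers, and the function name `foul_rate_by_window` are made up, and the real calculation lives in `team_foul_rate`, whose internals are not shown here.

```python
from collections import defaultdict


def foul_rate_by_window(segments, window_size=1.0, min_minutes=10.0):
    """Fouls per minute for each time-since-rest window.

    segments: iterable of (avg_rest_minutes, duration_minutes, n_fouls), one
    per stretch of play (hypothetical format). Windows with fewer than
    min_minutes of total play are dropped.
    """
    fouls = defaultdict(int)
    minutes = defaultdict(float)
    for avg_rest, duration, n_fouls in segments:
        # Step 2: assign the stretch to its window, e.g. 1.5 -> (1.0, 2.0).
        idx = int(avg_rest // window_size)
        window = (idx * window_size, (idx + 1) * window_size)
        # Step 3: accumulate fouls and minutes played in that window.
        fouls[window] += n_fouls
        minutes[window] += duration
    # Step 4: fouls per minute, keeping only windows with enough minutes.
    return {
        w: fouls[w] / minutes[w]
        for w in sorted(minutes)
        if minutes[w] >= min_minutes
    }


# Toy data: (average time since last rest, minutes played, fouls committed).
segments = [
    (0.4, 12.0, 3),
    (0.8, 8.0, 2),
    (1.5, 8.0, 5),
    (1.2, 4.0, 1),
    (13.5, 8.0, 6),  # only 8 minutes in (13, 14), so this window is dropped
]

rates = foul_rate_by_window(segments)
print(rates)
```

In this toy data the (0, 1) window has 5 fouls in 20 minutes and the (1, 2) window has 6 fouls in 12 minutes. The raw counts are close, but the rates (0.25 vs. 0.5 fouls per minute) are not, which is the normalization the rate provides.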

In [19]:
foul_list, durations, windows = team_foul_rate(df)
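# Scale durations so they sum to the season total (31 games * 40 mins). Fixes small errors.
total_duration = sum(durations)
durations = [x/total_duration*(31*40) for x in durations]
durations = [np.nan if x==0 else x for x in durations]  # empty windows give NaN rather than a divide-by-zero
foul_rates=[foul/d for foul,d in zip(foul_list,durations)]

season_foul_rate = sum(foul_list)/(31*40)  # team fouls per minute over the whole season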

duration_cutoff=10
rate_df=pd.DataFrame()
rate_df['foul_rates']=foul_rates
rate_df['windows']= windows
rate_df['durations']=durations

rate_df=rate_df[rate_df.durations >=duration_cutoff]
fig = go.Figure(data=go.Scatter(x=[x[0] for x in list(rate_df.windows)], y=rate_df.foul_rates, mode='lines+markers',))

fig.add_shape(type="line",
    x0=-0.5, y0=season_foul_rate, x1=7.5, y1=season_foul_rate,
    line=dict(
        color="red",
        width=4,
        dash="dot",
    ))
              
fig.update_layout(title='Team Foul Rate as a function of Time Since Last Rest',
    xaxis = dict(
        tickmode = 'array',
        tickvals = [x[0] for x in list(rate_df.windows)],
        ticktext = [str(x) for x in list(rate_df.windows)]
    )
)

fig.add_annotation(x=6, y=0.5,
            text="Season Average",
            showarrow=False)

fig.update_xaxes(title='Average Time Since Last Rest (Team) in Minutes 2019-20')
fig.update_yaxes(title='Fouls per Min (Team) ')
fig.show()


# Measuring Player Defensive Efficiency Using Foul Rates

Similar to what we did for the team as a whole, we want to know how a player's foul rate changes as their time since rest increases. We take a similar approach as before and bin the data to avoid small sample sizes and smooth the results. For example, we look at Josh Vazquez's foul rate when he has (0 mins - 3 mins), (1 min - 4 mins), ... time since rest. You may notice these windows are 3 minutes long, compared to 1 minute before. The window size needs to increase as the analysis gets more granular (focusing on one person rather than the whole team), especially for relatively rare events like individual player fouls. You may also notice that these windows overlap. Overlapping windows smooth the data and help reveal underlying patterns.
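
As a rough illustration of how overlapping windows smooth the data, the sketch below pools per-minute fouls and minutes into 3-minute windows that step forward one minute at a time. The function name `windowed_rates` and the toy numbers are assumptions for illustration; `player_foul_rate` may build its windows differently.

```python
def windowed_rates(minute_fouls, minute_played, window_size=3):
    """Foul rates over overlapping windows of window_size minutes.

    minute_fouls[i] and minute_played[i] are the fouls committed and minutes
    played while time since rest was in [i, i + 1) (hypothetical inputs).
    """
    rates = {}
    last_start = len(minute_fouls) - window_size
    for start in range(last_start + 1):
        stop = start + window_size
        fouls = sum(minute_fouls[start:stop])
        played = sum(minute_played[start:stop])
        # Pool fouls and minutes first, then divide, so each window is a
        # proper fouls-per-minute rate rather than an average of rates.
        rates[(start, stop)] = fouls / played if played > 0 else None
    return rates


# Toy per-minute data for one player.
minute_fouls = [1, 0, 4, 1, 2]
minute_played = [10, 10, 20, 10, 10]

raw = [f / m for f, m in zip(minute_fouls, minute_played)]
smoothed = windowed_rates(minute_fouls, minute_played)

print(raw)       # 1-minute rates jump around from minute to minute
print(smoothed)  # 3-minute overlapping windows vary much less
```

Each minute of play contributes to up to three neighboring windows, which is why the windowed curve is smoother than the per-minute one.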

We need to drop windows where we have little information, as limited data can lead to bad conclusions. Here, a player needs to have played at least 30 minutes in a window for that window to be valid (in the code below, the cutoff is 20 minutes for Mack Anderson). To illustrate, suppose we are looking at Josh Vazquez's foul rate when he is (0 mins - 3 mins) since last rest. We need to know how many minutes he played while in this (0 mins - 3 mins) range. Dividing by those minutes is what allows us to compare different windows, even though some have fewer minutes played in them. Foul rate for a window (x minutes - y minutes) = (# of fouls in that window)/(minutes played in that window).


**Some takeaways from the graphics below:** Josh Vazquez's foul rate rises sharply once the time window includes 10+ minutes. It is possible that after ~10 minutes, Josh becomes more fatigued, slower on his feet, and more apt to foul.

Derrick Carter-Hollinger's foul rate seems to increase steadily until he has been on the court for (4-7) straight minutes, then declines after that. To me, this probably indicates that after picking up a foul or two, he is very good at controlling himself and settles in.

Mack Anderson has much less data on extended periods of play. His foul rate shoots up in the first few minutes of play, then starts to die down again. He may also do a good job of controlling himself and settling in after picking up early fouls. However, the dip may also come from games against weaker opponents/matchups, where he gets extended minutes because he did not foul early on.

For the other players, nothing stands out, and there is not much data on extended minutes.

In [35]:
fouls_p,durations_p,windows_p = player_foul_rate(df,'VAZQUEZ,JOSH',window_size=3)

min_cutoff=30
p_df = pd.DataFrame()
p_df['fouls']=fouls_p
p_df['durations']=durations_p
p_df['windows']=windows_p
p_df['foul_rates']=p_df['fouls']/p_df['durations']
p_df=p_df[p_df.durations>=min_cutoff]

fig = go.Figure(data=go.Scatter(x=[x[0] for x in list(p_df.windows)], y=p_df.foul_rates, mode='lines+markers',line=dict(dash='dash')))

fig.update_layout(title='VAZQUEZ,JOSH',
    xaxis = dict(
        tickmode = 'array',
        tickvals = [x[0] for x in list(p_df.windows)],
        ticktext = [str(x) for x in list(p_df.windows)]
    )
)

fig.show()

In [34]:
fouls_p,durations_p,windows_p = player_foul_rate(df,'CARTER-HOLLI,DERRICK',window_size=3)

min_cutoff=30
p_df = pd.DataFrame()
p_df['fouls']=fouls_p
p_df['durations']=durations_p
p_df['windows']=windows_p
p_df['foul_rates']=p_df['fouls']/p_df['durations']
p_df=p_df[p_df.durations>=min_cutoff]

fig = go.Figure(data=go.Scatter(x=[x[0] for x in list(p_df.windows)], y=p_df.foul_rates, mode='lines+markers',line=dict(dash='dash')))

fig.update_layout(title='CARTER-HOLLI,DERRICK',
    xaxis = dict(
        tickmode = 'array',
        tickvals = [x[0] for x in list(p_df.windows)],
        ticktext = [str(x) for x in list(p_df.windows)]
    )
)

fig.show()

In [40]:
fouls_p,durations_p,windows_p = player_foul_rate(df,'ANDERSON,MACK',window_size=3)

min_cutoff=20
p_df = pd.DataFrame()
p_df['fouls']=fouls_p
p_df['durations']=durations_p
p_df['windows']=windows_p
p_df['foul_rates']=p_df['fouls']/p_df['durations']
p_df=p_df[p_df.durations>=min_cutoff]

fig = go.Figure(data=go.Scatter(x=[x[0] for x in list(p_df.windows)], y=p_df.foul_rates, mode='lines+markers',line=dict(dash='dash')))

fig.update_layout(title='ANDERSON,MACK',
    xaxis = dict(
        tickmode = 'array',
        tickvals = [x[0] for x in list(p_df.windows)],
        ticktext = [str(x) for x in list(p_df.windows)]
    )
)

fig.show()

In [44]:
fouls_p,durations_p,windows_p = player_foul_rate(df,'OWENS,KYLE',window_size=3)

min_cutoff=30
p_df = pd.DataFrame()
p_df['fouls']=fouls_p
p_df['durations']=durations_p
p_df['windows']=windows_p
p_df['foul_rates']=p_df['fouls']/p_df['durations']
p_df=p_df[p_df.durations>=min_cutoff]

fig = go.Figure(data=go.Scatter(x=[x[0] for x in list(p_df.windows)], y=p_df.foul_rates, mode='lines+markers',line=dict(dash='dash')))

fig.update_layout(title='OWENS,KYLE',
    xaxis = dict(
        tickmode = 'array',
        tickvals = [x[0] for x in list(p_df.windows)],
        ticktext = [str(x) for x in list(p_df.windows)]
    )
)

fig.show()

In [43]:
fouls_p,durations_p,windows_p = player_foul_rate(df,'EGUN,EDDY',window_size=3)

min_cutoff=30
p_df = pd.DataFrame()
p_df['fouls']=fouls_p
p_df['durations']=durations_p
p_df['windows']=windows_p
p_df['foul_rates']=p_df['fouls']/p_df['durations']
p_df=p_df[p_df.durations>=min_cutoff]

fig = go.Figure(data=go.Scatter(x=[x[0] for x in list(p_df.windows)], y=p_df.foul_rates, mode='lines+markers',line=dict(dash='dash')))

fig.update_layout(title='EGUN,EDDY',
    xaxis = dict(
        tickmode = 'array',
        tickvals = [x[0] for x in list(p_df.windows)],
        ticktext = [str(x) for x in list(p_df.windows)]
    )
)

fig.show()