# Exploring event types

In this notebook we'll use play-by-play data to examine the underlying data structure (e.g., which player ID column identifies the player credited with a steal).

In [1]:
import sys
import logging

import pandas as pd

from nba_survival.data.endpoints.pbp import EventTypes, PlayByPlay

logging.basicConfig(level=logging.INFO, stream=sys.stdout)

## Load the data

### Play-by-play

In [2]:
pbp = PlayByPlay(GameID="0021800359", output_dir="../nba-data/2018-19")
pbp.load()

pbp_df = pbp.get_data()

INFO:nba_survival.data.endpoints.base:Reading existing file ../nba-data/2018-19/playbyplayv2/data_0021800359.json...


In [3]:
pbp_df.head(n=10)

Unnamed: 0,GAME_ID,EVENTNUM,EVENTMSGTYPE,EVENTMSGACTIONTYPE,PERIOD,WCTIMESTRING,PCTIMESTRING,HOMEDESCRIPTION,NEUTRALDESCRIPTION,VISITORDESCRIPTION,...,PLAYER2_TEAM_NICKNAME,PLAYER2_TEAM_ABBREVIATION,PERSON3TYPE,PLAYER3_ID,PLAYER3_NAME,PLAYER3_TEAM_ID,PLAYER3_TEAM_CITY,PLAYER3_TEAM_NICKNAME,PLAYER3_TEAM_ABBREVIATION,VIDEO_AVAILABLE_FLAG
0,21800359,2,12,0,1,8:16 PM,12:00,,,,...,,,0,0,,,,,,0
1,21800359,4,10,0,1,8:16 PM,12:00,Jump Ball Ibaka vs. Embiid: Tip to Simmons,,,...,76ers,PHI,5,1627732,Ben Simmons,1610613000.0,Philadelphia,76ers,PHI,1
2,21800359,8,1,58,1,8:17 PM,11:38,,,Embiid 8' Turnaround Hook Shot (2 PTS),...,,,0,0,,,,,,1
3,21800359,9,2,1,1,8:17 PM,11:24,MISS Lowry 27' 3PT Jump Shot,,,...,,,0,0,,,,,,1
4,21800359,10,4,0,1,8:17 PM,11:22,,,Embiid REBOUND (Off:0 Def:1),...,,,0,0,,,,,,1
5,21800359,11,2,2,1,8:17 PM,11:19,,,MISS Redick 25' 3PT Running Jump Shot,...,,,0,0,,,,,,1
6,21800359,12,4,0,1,8:17 PM,11:16,Leonard REBOUND (Off:0 Def:1),,,...,,,0,0,,,,,,1
7,21800359,13,2,79,1,8:18 PM,10:53,MISS Ibaka 21' Pullup Jump Shot,,,...,,,0,0,,,,,,1
8,21800359,15,4,0,1,8:18 PM,10:49,Siakam REBOUND (Off:1 Def:0),,,...,,,0,0,,,,,,1
9,21800359,16,2,72,1,8:18 PM,10:48,MISS Siakam 2' Putback Layup,,,...,,,0,0,,,,,,1


## Look at event types

### Assisted field goals

In [4]:
for index, row in pbp_df.loc[pbp_df["EVENTMSGTYPE"] == EventTypes().FIELD_GOAL_MADE].head(n=10).iterrows():
    if not pd.isnull(row["HOMEDESCRIPTION"]):
        print(row["HOMEDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")
    else:
        print(row["VISITORDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")

Embiid 8' Turnaround Hook Shot (2 PTS)
PLAYER1_NAME: Joel Embiid
PLAYER2_NAME: None
Leonard 9' Turnaround Fadeaway (2 PTS)
PLAYER1_NAME: Kawhi Leonard
PLAYER2_NAME: None
Simmons  Reverse Layup (2 PTS) (Chandler 1 AST)
PLAYER1_NAME: Ben Simmons
PLAYER2_NAME: Wilson Chandler
Ibaka 1' Driving Dunk (2 PTS)
PLAYER1_NAME: Serge Ibaka
PLAYER2_NAME: None
Chandler 24' 3PT Jump Shot (3 PTS) (Butler 1 AST)
PLAYER1_NAME: Wilson Chandler
PLAYER2_NAME: Jimmy Butler
Leonard 2' Running Dunk (4 PTS) (Siakam 1 AST)
PLAYER1_NAME: Kawhi Leonard
PLAYER2_NAME: Pascal Siakam
Butler 26' 3PT Jump Shot (3 PTS) (Redick 1 AST)
PLAYER1_NAME: Jimmy Butler
PLAYER2_NAME: JJ Redick
Leonard 15' Pullup Jump Shot (6 PTS)
PLAYER1_NAME: Kawhi Leonard
PLAYER2_NAME: None
Redick 19' Step Back Jump Shot (2 PTS) (Simmons 1 AST)
PLAYER1_NAME: JJ Redick
PLAYER2_NAME: Ben Simmons
Ibaka 1' Running Layup (4 PTS) (Lowry 1 AST)
PLAYER1_NAME: Serge Ibaka
PLAYER2_NAME: Kyle Lowry


CONCLUSION: ``PLAYER1_ID`` is the player that makes the field goal, ``PLAYER2_ID`` is the assisting player (if applicable).
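As a rough illustration of how this mapping could be used, the sketch below tallies assists from made field goals. It runs on a small toy frame rather than `pbp_df`; the constant `FIELD_GOAL_MADE = 1` is taken from the `EVENTMSGTYPE` values shown in the `head()` output, and `count_assists` is a hypothetical helper, not part of `nba_survival`.

```python
import pandas as pd

# EVENTMSGTYPE code seen on made shots in the head() output above (assumption).
FIELD_GOAL_MADE = 1

# Toy rows modeled on the printed events; the real frame would be pbp_df.
toy_pbp = pd.DataFrame(
    {
        "EVENTMSGTYPE": [1, 1, 1, 2, 1],
        "PLAYER1_NAME": [
            "Joel Embiid", "Ben Simmons", "Wilson Chandler", "Kyle Lowry", "JJ Redick",
        ],
        "PLAYER2_NAME": [None, "Wilson Chandler", "Jimmy Butler", None, "Ben Simmons"],
    }
)


def count_assists(df: pd.DataFrame) -> pd.Series:
    """Count assists per player from made field goals (hypothetical helper)."""
    made = df.loc[df["EVENTMSGTYPE"] == FIELD_GOAL_MADE]
    # Unassisted makes have no PLAYER2_NAME, so drop them before counting.
    return made["PLAYER2_NAME"].dropna().value_counts()


assists = count_assists(toy_pbp)
print(assists)
```

Counting on `PLAYER2_NAME` keeps the example readable; in practice `PLAYER2_ID` is probably the safer key, since names are not guaranteed to be unique.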

### Turnovers

In [5]:
for index, row in pbp_df.loc[pbp_df["EVENTMSGTYPE"] == EventTypes().TURNOVER].head(n=10).iterrows():
    if not pd.isnull(row["HOMEDESCRIPTION"]):
        print(row["HOMEDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")
    else:
        print(row["VISITORDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")

Simmons Out of Bounds Lost Ball Turnover (P1.T1)
PLAYER1_NAME: Ben Simmons
PLAYER2_NAME: None
Lowry Out of Bounds - Bad Pass Turnover Turnover (P1.T1)
PLAYER1_NAME: Kyle Lowry
PLAYER2_NAME: None
Green Out of Bounds - Bad Pass Turnover Turnover (P1.T2)
PLAYER1_NAME: Danny Green
PLAYER2_NAME: None
Lowry STEAL (1 STL)
PLAYER1_NAME: Joel Embiid
PLAYER2_NAME: Kyle Lowry
Ibaka Traveling Turnover (P1.T3)
PLAYER1_NAME: Serge Ibaka
PLAYER2_NAME: None
McConnell Traveling Turnover (P1.T3)
PLAYER1_NAME: T.J. McConnell
PLAYER2_NAME: None
Anunoby Bad Pass Turnover (P1.T4)
PLAYER1_NAME: OG Anunoby
PLAYER2_NAME: T.J. McConnell
VanVleet Bad Pass Turnover (P1.T5)
PLAYER1_NAME: Fred VanVleet
PLAYER2_NAME: T.J. McConnell
Embiid Offensive Foul Turnover (P2.T4)
PLAYER1_NAME: Joel Embiid
PLAYER2_NAME: None
Lowry Bad Pass Turnover (P2.T6)
PLAYER1_NAME: Kyle Lowry
PLAYER2_NAME: T.J. McConnell


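CONCLUSION: ``PLAYER1_ID`` is the player that committed the turnover, ``PLAYER2_ID`` is the player that stole the ball (if applicable). Note that on a steal both team descriptions are populated, so the loop above prints whichever text the home team has, which can be the steal line (e.g. ``Lowry STEAL (1 STL)``, where ``PLAYER1_NAME`` is Joel Embiid).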
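Because a steal can populate both ``HOMEDESCRIPTION`` and ``VISITORDESCRIPTION`` on the same turnover row, it may help to pick the description that actually names the turnover. The following is a minimal sketch on toy rows copied from the outputs in this notebook; `turnover_text` is a hypothetical helper, and matching on the substring `"Turnover"` is an assumption based on the descriptions seen in this one game.

```python
import pandas as pd

# Two turnover rows modeled on this game: a dead-ball turnover and a steal.
toy_turnovers = pd.DataFrame(
    {
        "HOMEDESCRIPTION": [None, "Monroe STEAL (1 STL)"],
        "VISITORDESCRIPTION": [
            "Simmons Out of Bounds Lost Ball Turnover (P1.T1)",
            "Redick Bad Pass Turnover (P1.T6)",
        ],
        "PLAYER1_NAME": ["Ben Simmons", "JJ Redick"],
        "PLAYER2_NAME": [None, "Greg Monroe"],
    }
)


def turnover_text(row: pd.Series):
    """Return the description that names the turnover (hypothetical helper)."""
    for col in ("HOMEDESCRIPTION", "VISITORDESCRIPTION"):
        text = row[col]
        # Missing descriptions are None/NaN, so only inspect real strings.
        if isinstance(text, str) and "Turnover" in text:
            return text
    return None


descriptions = toy_turnovers.apply(turnover_text, axis=1)
for text, committed_by, stolen_by in zip(
    descriptions, toy_turnovers["PLAYER1_NAME"], toy_turnovers["PLAYER2_NAME"]
):
    print(f"{text} | turnover: {committed_by} | steal: {stolen_by}")
```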

### Fouls

In [6]:
for index, row in pbp_df.loc[pbp_df["EVENTMSGTYPE"] == EventTypes().FOUL].head(n=10).iterrows():
    if not pd.isnull(row["HOMEDESCRIPTION"]):
        print(row["HOMEDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")
    else:
        print(row["VISITORDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")

Siakam P.FOUL (P1.T1) (J.Phillips)
PLAYER1_NAME: Pascal Siakam
PLAYER2_NAME: Ben Simmons
Simmons P.FOUL (P1.T1) (J.Phillips)
PLAYER1_NAME: Ben Simmons
PLAYER2_NAME: Pascal Siakam
Butler S.FOUL (P1.T2) (E.Dalen)
PLAYER1_NAME: Jimmy Butler
PLAYER2_NAME: Kawhi Leonard
Muscala P.FOUL (P1.T3) (B.Barnaky)
PLAYER1_NAME: Mike Muscala
PLAYER2_NAME: Kawhi Leonard
Embiid S.FOUL (P1.T4) (E.Dalen)
PLAYER1_NAME: Joel Embiid
PLAYER2_NAME: Jonas Valanciunas
Valanciunas P.FOUL (P1.T2) (E.Dalen)
PLAYER1_NAME: Jonas Valanciunas
PLAYER2_NAME: Joel Embiid
Valanciunas L.B.FOUL (P2.T3) (J.Phillips)
PLAYER1_NAME: Jonas Valanciunas
PLAYER2_NAME: Furkan Korkmaz
Anunoby P.FOUL (P1.PN) (E.Dalen)
PLAYER1_NAME: OG Anunoby
PLAYER2_NAME: JJ Redick
Valanciunas S.FOUL (P3.PN) (B.Barnaky)
PLAYER1_NAME: Jonas Valanciunas
PLAYER2_NAME: Joel Embiid
Embiid Offensive Charge Foul (P2.T1) (B.Barnaky)
PLAYER1_NAME: Joel Embiid
PLAYER2_NAME: Kyle Lowry


Let's look at one of the shooting fouls, along with the events that follow it, to confirm which player commits the foul.

In [7]:
idx = pbp_df.index[pbp_df["HOMEDESCRIPTION"] == "Valanciunas S.FOUL (P3.PN) (B.Barnaky)"].values[0]
pbp_df.loc[list(range(idx, idx+5)), ["EVENTMSGTYPE", "HOMEDESCRIPTION", "VISITORDESCRIPTION", "PLAYER1_NAME", "PLAYER2_NAME"]]

Unnamed: 0,EVENTMSGTYPE,HOMEDESCRIPTION,VISITORDESCRIPTION,PLAYER1_NAME,PLAYER2_NAME
101,6,Valanciunas S.FOUL (P3.PN) (B.Barnaky),,Jonas Valanciunas,Joel Embiid
102,3,,MISS Embiid Free Throw 1 of 2,Joel Embiid,
103,4,,76ers Rebound,,
104,8,SUB: Monroe FOR Valanciunas,,Jonas Valanciunas,Greg Monroe
105,3,,MISS Embiid Free Throw 2 of 2,Joel Embiid,


CONCLUSION: Embiid shoots the free throws that follow the foul, so ``PLAYER1_ID`` is the player committing the foul and ``PLAYER2_ID`` is the player that drew the foul.
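With that mapping, fouls committed and fouls drawn can be tallied side by side. This is only a sketch on toy rows taken from the printed fouls; `FOUL = 6` is read off the ``EVENTMSGTYPE`` column in the table above, and the `foul_summary` name is made up for illustration.

```python
import pandas as pd

# EVENTMSGTYPE code shown for the foul row in the table above (assumption).
FOUL = 6

# Toy rows modeled on the first five fouls printed earlier.
toy_pbp = pd.DataFrame(
    {
        "EVENTMSGTYPE": [6, 6, 6, 6, 6, 3],
        "PLAYER1_NAME": [
            "Pascal Siakam", "Ben Simmons", "Jimmy Butler",
            "Mike Muscala", "Joel Embiid", "Joel Embiid",
        ],
        "PLAYER2_NAME": [
            "Ben Simmons", "Pascal Siakam", "Kawhi Leonard",
            "Kawhi Leonard", "Jonas Valanciunas", None,
        ],
    }
)

fouls = toy_pbp.loc[toy_pbp["EVENTMSGTYPE"] == FOUL]

# The two Series are aligned on player name; players missing from one side
# get NaN, which is replaced with 0 before casting back to integers.
foul_summary = (
    pd.DataFrame(
        {
            "committed": fouls["PLAYER1_NAME"].value_counts(),
            "drawn": fouls["PLAYER2_NAME"].value_counts(),
        }
    )
    .fillna(0)
    .astype(int)
)
print(foul_summary)
```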

### Violations

In [8]:
for index, row in pbp_df.loc[pbp_df["EVENTMSGTYPE"] == EventTypes().VIOLATION].head(n=10).iterrows():
    if not pd.isnull(row["HOMEDESCRIPTION"]):
        print(row["HOMEDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")
    else:
        print(row["VISITORDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")

Butler Violation:Kicked Ball (B.Barnaky)
PLAYER1_NAME: Jimmy Butler
PLAYER2_NAME: None
Embiid Violation:Defensive Goaltending (J.Phillips)
PLAYER1_NAME: Joel Embiid
PLAYER2_NAME: None


Let's look at the goaltending violation.

In [9]:
idx = pbp_df.index[pbp_df["VISITORDESCRIPTION"] == "Embiid Violation:Defensive Goaltending (J.Phillips)"].values[0]
pbp_df.loc[list(range(idx-3, idx+1)), ["EVENTMSGTYPE", "SCOREMARGIN", "HOMEDESCRIPTION", "VISITORDESCRIPTION", "PLAYER1_NAME", "PLAYER2_NAME"]]

Unnamed: 0,EVENTMSGTYPE,SCOREMARGIN,HOMEDESCRIPTION,VISITORDESCRIPTION,PLAYER1_NAME,PLAYER2_NAME
133,1,-6.0,Anunoby 2' Driving Layup (2 PTS),,OG Anunoby,
134,5,,Monroe STEAL (1 STL),Redick Bad Pass Turnover (P1.T6),JJ Redick,Greg Monroe
135,1,-4.0,Monroe 2' Running Layup (2 PTS),,Greg Monroe,
136,7,,,Embiid Violation:Defensive Goaltending (J.Phil...,Joel Embiid,


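It looks like the goaltended shot is already recorded as a made field goal in the previous event (``Monroe 2' Running Layup (2 PTS)``), so we shouldn't need to adjust the score for the violation row itself. Simpler for us!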
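This is a single example, so it may be worth checking the pattern across more games. A minimal sketch of such a check, assuming the codes seen in this notebook's tables (`1` for a made field goal, `7` for a violation) and run on toy rows copied from the output above:

```python
import pandas as pd

# EVENTMSGTYPE codes read off the tables in this notebook (assumptions).
FIELD_GOAL_MADE = 1
VIOLATION = 7

# Toy rows modeled on events 133-136 shown above.
toy_pbp = pd.DataFrame(
    {
        "EVENTMSGTYPE": [1, 5, 1, 7],
        "HOMEDESCRIPTION": [
            "Anunoby 2' Driving Layup (2 PTS)",
            "Monroe STEAL (1 STL)",
            "Monroe 2' Running Layup (2 PTS)",
            None,
        ],
        "VISITORDESCRIPTION": [
            None,
            "Redick Bad Pass Turnover (P1.T6)",
            None,
            "Embiid Violation:Defensive Goaltending (J.Phillips)",
        ],
    }
)

# Use the home text when present, otherwise fall back to the visitor text.
description = toy_pbp["HOMEDESCRIPTION"].fillna(toy_pbp["VISITORDESCRIPTION"])

is_goaltend = (toy_pbp["EVENTMSGTYPE"] == VIOLATION) & description.str.contains(
    "Goaltending", na=False
)

# shift(1) lines each row up with the event type of the row before it.
previous_type = toy_pbp["EVENTMSGTYPE"].shift(1)
preceded_by_make = previous_type[is_goaltend] == FIELD_GOAL_MADE

print(f"Goaltending violations: {int(is_goaltend.sum())}")
print(f"All preceded by a made field goal: {bool(preceded_by_make.all())}")
```

The check relies on row order, so it assumes the frame is still sorted the way the endpoint returned it.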

### "Unknown" Events

In [10]:
for index, row in pbp_df.loc[pbp_df["EVENTMSGTYPE"] == EventTypes().UNKNOWN].head(n=10).iterrows():
    if not pd.isnull(row["HOMEDESCRIPTION"]):
        print(row["HOMEDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")
    elif not pd.isnull(row["VISITORDESCRIPTION"]):
        print(row["VISITORDESCRIPTION"])
        print(f"PLAYER1_NAME: {row['PLAYER1_NAME']}")
        print(f"PLAYER2_NAME: {row['PLAYER2_NAME']}")
    else:
        print(row["NEUTRALDESCRIPTION"])

None
None
None
None


In [11]:
pbp_df.loc[pbp_df["EVENTMSGTYPE"] == EventTypes().UNKNOWN]

Unnamed: 0,GAME_ID,EVENTNUM,EVENTMSGTYPE,EVENTMSGACTIONTYPE,PERIOD,WCTIMESTRING,PCTIMESTRING,HOMEDESCRIPTION,NEUTRALDESCRIPTION,VISITORDESCRIPTION,...,PLAYER2_TEAM_NICKNAME,PLAYER2_TEAM_ABBREVIATION,PERSON3TYPE,PLAYER3_ID,PLAYER3_NAME,PLAYER3_TEAM_ID,PLAYER3_TEAM_CITY,PLAYER3_TEAM_NICKNAME,PLAYER3_TEAM_ABBREVIATION,VIDEO_AVAILABLE_FLAG
111,21800359,162,13,0,1,8:42 PM,0:00,,,,...,,,0,0,,,,,,1
258,21800359,371,13,0,2,9:18 PM,0:00,,,,...,,,0,0,,,,,,1
371,21800359,529,13,0,3,9:58 PM,0:00,,,,...,,,0,0,,,,,,1
482,21800359,690,13,0,4,10:34 PM,0:00,,,,...,,,0,0,,,,,,1


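Not sure what these events are. They carry no descriptions and appear once per period at ``0:00``, which suggests they mark the end of each period. Either way, we should scrub them out.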
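Scrubbing them out could be as simple as a boolean filter. The sketch below uses a toy frame; `UNKNOWN = 13` is the ``EVENTMSGTYPE`` value shown in the table above, standing in for `EventTypes().UNKNOWN`, and `drop_unknown_events` is a hypothetical helper.

```python
import pandas as pd

# EVENTMSGTYPE value of the "unknown" rows in the table above (assumption).
UNKNOWN = 13

# Toy rows: three ordinary events and two of the description-less events.
toy_pbp = pd.DataFrame(
    {
        "EVENTNUM": [160, 161, 162, 370, 371],
        "EVENTMSGTYPE": [1, 4, 13, 2, 13],
        "PERIOD": [1, 1, 1, 2, 2],
    }
)


def drop_unknown_events(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the frame without the unknown events (hypothetical helper)."""
    kept = df.loc[df["EVENTMSGTYPE"] != UNKNOWN]
    # Renumber the rows so positional lookups stay contiguous after filtering.
    return kept.reset_index(drop=True)


cleaned = drop_unknown_events(toy_pbp)
print(f"Dropped {len(toy_pbp) - len(cleaned)} rows, kept {len(cleaned)}")
```

Resetting the index is a design choice: it keeps lookups such as `range(idx, idx+5)` contiguous, but the new row labels no longer match those of the original frame.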