
In [1]:
from ast import literal_eval
import pandas as pd

In [2]:
def literal_converter(x):
    # Parse stringified Python literals (tuples, lists); empty cells become missing values.
    return pd.NA if x == "" else literal_eval(x)

df = pd.read_csv(
    "https://github.com/VKarpick/powerplay/blob/main/data/powerplay_coords.csv?raw=true",
    converters={
        c: literal_converter
        for c in ("possession_coord", "opponent_coords", "goalie_coord")
    }
)
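
To see what the converter does without downloading the real file, here is a minimal sketch on an in-memory CSV. The two sample rows and their coordinate values are made up for illustration; the actual columns may hold differently shaped literals.

```python
from ast import literal_eval
from io import StringIO

import pandas as pd


def literal_converter(x):
    # Empty cells become missing values; anything else is parsed as a Python literal.
    return pd.NA if x == "" else literal_eval(x)


# Hypothetical sample data, not taken from powerplay_coords.csv.
sample_csv = StringIO(
    "frame_id,possession_coord,opponent_coords\n"
    '1,"(10.0, 20.0)","[(1.0, 2.0), (3.0, 4.0)]"\n'
    '2,,"[]"\n'
)

sample = pd.read_csv(
    sample_csv,
    converters={c: literal_converter for c in ("possession_coord", "opponent_coords")},
)

# Each cell now holds a real tuple or list rather than a string.
print(sample.opponent_coords.apply(len).tolist())
```

Without the converter, `opponent_coords` would be read as plain strings and `len` would count characters instead of players.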

# Fewer Than 4 PKers in Frame

In [3]:
# Number of opposing skaters with coordinates in each frame.
df["n_opponents"] = df.opponent_coords.apply(len)

In [4]:
missing_players = df.loc[df.n_opponents < 4]

# True where a flagged frame directly follows (or precedes) another flagged frame,
# i.e. where it continues a run rather than starting (or ending) one.
follows_previous = missing_players.frame_id - missing_players.frame_id.shift(1) == 1
precedes_next = missing_players.frame_id - missing_players.frame_id.shift(-1) == -1

# One row per run of consecutive flagged frames.
pd.DataFrame({
    "pp_name": missing_players.loc[~follows_previous, "pp_name"].values,
    "first": missing_players.loc[~follows_previous, "frame_id"].values,
    "last": missing_players.loc[~precedes_next, "frame_id"].values,
})

| | pp_name | first | last |
|---|---|---|---|
| 0 | 2022-02-08 Canada at USA P1 PP2 | 165 | 256 |
| 1 | 2022-02-08 Canada at USA P2 PP3 | 1499 | 1526 |
| 2 | 2022-02-08 Canada at USA P2 PP3 | 1541 | 1585 |
| 3 | 2022-02-08 Canada at USA P2 PP3 | 2865 | 2880 |
| 4 | 2022-02-08 Canada at USA P2 PP3 | 2895 | 2928 |
| 5 | 2022-02-08 Canada at USA P2 PP5 | 3172 | 3175 |
| 6 | 2022-02-08 ROC at Finland P1 PP1 | 455 | 570 |
| 7 | 2022-02-08 ROC at Finland P2 PP5 | 2249 | 2278 |
| 8 | 2022-02-08 ROC at Finland P3 PP6 | 872 | 892 |
| 9 | 2022-02-12 Switzerland at ROC P1 PP1 | 477 | 477 |
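
The run detection above compares each flagged frame only with its neighbour in the filtered table, so a run in one power play that happens to be followed by frame `n + 1` of the next power play would be merged with it. A possible alternative, sketched here on made-up data, groups by `pp_name` before looking for gaps. The `flagged` frame, the `run_id` column, and the power play names `A` and `B` are all hypothetical.

```python
import pandas as pd

# Hypothetical flagged frames; "B" starts at 21, right after "A" ends at 20.
flagged = pd.DataFrame({
    "pp_name": ["A", "A", "A", "A", "B", "B"],
    "frame_id": [10, 11, 12, 20, 21, 22],
})

flagged = flagged.sort_values(["pp_name", "frame_id"])

# A new run starts whenever a frame does not directly follow the previous
# flagged frame of the same power play (the first frame of each group is NaN).
starts_run = flagged.groupby("pp_name")["frame_id"].diff() != 1
flagged["run_id"] = starts_run.cumsum()

# Collapse each run to its first and last frame.
runs = (
    flagged.groupby(["pp_name", "run_id"])["frame_id"]
    .agg(first="min", last="max")
    .reset_index()
    .drop(columns="run_id")
)
print(runs)
```

Here frames 20 (in `A`) and 21 (in `B`) stay in separate runs because the difference is computed within each power play.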


Missing backcheckers could affect:
- CAN USA P2 PP3
- SUI ROC P1 PP2
- SUI ROC P3 PP3
- SUI FIN P2 PP3 (maybe)

# Puck Recovery Immediately Following Possession

In [5]:
possession = pd.read_csv("https://github.com/VKarpick/powerplay/blob/main/data/powerplay_possession.csv?raw=true")

In [6]:
pbp = pd.read_csv("https://github.com/VKarpick/powerplay/blob/main/data/powerplay_pbp.csv?raw=true")

In [7]:
# Possession rows flagged as both is_pp and is_pp_ozone.
in_zone_possession = possession.loc[possession.is_pp & possession.is_pp_ozone].copy()

# Key each recovery by (pp_name, frame immediately before the recovery).
recoveries = pbp.loc[pbp.event == "Puck Recovery"].copy()
recoveries["previous_frame_id"] = recoveries.frame_id - 1
recoveries["pp_frame"] = list(zip(recoveries.pp_name, recoveries.previous_frame_id))

# Flag recoveries whose preceding frame is still marked as in-zone possession.
has_possession = recoveries.pp_frame.isin(list(zip(in_zone_possession.pp_name, in_zone_possession.frame_id)))

recoveries.loc[has_possession, ["pp_name", "frame_id", "clock_seconds", "pp_frame"]]

| | pp_name | frame_id | clock_seconds | pp_frame |
|---|---|---|---|---|
| 15 | 2022-02-08 Canada at USA P1 PP1 | 734.0 | 362 | (2022-02-08 Canada at USA P1 PP1, 733.0) |
| 56 | 2022-02-08 Canada at USA P1 PP2 | 1180.0 | 177 | (2022-02-08 Canada at USA P1 PP2, 1179.0) |
| 94 | 2022-02-08 Canada at USA P1 PP2 | 2920.0 | 119 | (2022-02-08 Canada at USA P1 PP2, 2919.0) |
| 122 | 2022-02-08 Canada at USA P2 PP3 | 467.0 | 975 | (2022-02-08 Canada at USA P2 PP3, 466.0) |
| 148 | 2022-02-08 Canada at USA P2 PP3 | 1783.0 | 932 | (2022-02-08 Canada at USA P2 PP3, 1782.0) |
| 212 | 2022-02-08 Canada at USA P2 PP5 | 1734.0 | 240 | (2022-02-08 Canada at USA P2 PP5, 1733.0) |
| 267 | 2022-02-08 Canada at USA P3 PP6 | 246.0 | 536 | (2022-02-08 Canada at USA P3 PP6, 245.0) |
| 424 | 2022-02-08 ROC at Finland P2 PP5 | 2661.0 | 369 | (2022-02-08 ROC at Finland P2 PP5, 2660.0) |
| 425 | 2022-02-08 ROC at Finland P2 PP5 | 2757.0 | 367 | (2022-02-08 ROC at Finland P2 PP5, 2756.0) |
| 458 | 2022-02-08 ROC at Finland P3 PP6 | 405.0 | 867 | (2022-02-08 ROC at Finland P3 PP6, 404.0) |
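
The same check can also be written as a join instead of matching tuples with `isin`. This is only a sketch of that alternative on toy data, not the approach used in this notebook; the two small frames and their values are hypothetical stand-ins for `possession` and `pbp`.

```python
import pandas as pd

# Hypothetical stand-ins for the in-zone possession frames and the recoveries.
in_zone = pd.DataFrame({"pp_name": ["A", "A", "B"], "frame_id": [4, 5, 9]})
recs = pd.DataFrame({"pp_name": ["A", "B", "B"], "frame_id": [6, 3, 10]})

# Frame immediately before each recovery.
recs["previous_frame_id"] = recs["frame_id"] - 1

# An inner join keeps only recoveries whose previous frame appears
# in the possession table for the same power play.
suspect = recs.merge(
    in_zone.rename(columns={"frame_id": "previous_frame_id"}),
    on=["pp_name", "previous_frame_id"],
    how="inner",
)
print(suspect)
```

In this toy case the recoveries at frame 6 of `A` and frame 10 of `B` are kept, because frames 5 and 9 are still marked as possession.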


Remove possession:
- CAN USA P1 PP1
  - 719 to 733
- CAN USA P1 PP2
  - 1114 to 1179
  - 2853 to 2920
- CAN USA P2 PP3
  - 438 to 466
  - 1743 to 1782
- CAN USA P2 PP5
  - 1689 to 1733
- ROC FIN P2 PP5
  - 2600 to 2660
  - 2662 to 2756
- ROC FIN P3 PP6
  - 375 to 403
  - 976 to 1052
- SUI ROC P1 PP1
  - 1106 to 1144
- SUI ROC P1 PP2
  - 547 to 628
- SUI ROC P3 PP3
  - 2432 to 2451
- SUI ROC P3 PP5
  - 463 to 502
- SUI CAN P1 PP1
  - 5576 to 5626
- SUI CAN P3 PP5
  - 211 to 248
- USA FIN P2 PP1
  - 562 to 601
  - 706 to 723
- USA FIN P2 PP3
  - 1042 to 1098
  - 1238 to 1454
- SUI FIN P2 PP3
  - 5125 to 5179
- SUI FIN P2 PP4
  - 195 to 252
- SUI FIN P3 PP7
  - 162 to 201
- SUI FIN P3 PP8
  - 3107 to 3192
  - 3194 to 3265
  - 3853 to 3988
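
One way such a list could be applied is to store the ranges in a mapping and mask the matching possession rows. The sketch below uses toy data and makes two assumptions: the ranges are inclusive at both ends, and the keys are full `pp_name` values rather than the abbreviations used above. The names `ranges_to_remove`, `to_drop`, and `cleaned` are hypothetical.

```python
import pandas as pd

# Hypothetical possession table with two power plays.
possession = pd.DataFrame({
    "pp_name": ["A"] * 6 + ["B"] * 3,
    "frame_id": [1, 2, 3, 4, 5, 6, 1, 2, 3],
})

# Hypothetical mapping of pp_name -> list of (first, last) frame ranges to drop.
ranges_to_remove = {"A": [(2, 3), (5, 5)], "B": [(1, 2)]}

# Build one boolean mask covering every listed range.
to_drop = pd.Series(False, index=possession.index)
for pp_name, ranges in ranges_to_remove.items():
    for first, last in ranges:
        # Series.between is inclusive at both ends by default.
        in_range = possession["frame_id"].between(first, last)
        to_drop |= (possession["pp_name"] == pp_name) & in_range

cleaned = possession.loc[~to_drop].reset_index(drop=True)
print(cleaned)
```

Keeping the ranges in a single mapping makes it easy to review them against the list and to rerun the cleanup if a range changes.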