<a href="https://colab.research.google.com/github/KeoniM/NFL_Data_Cleaning/blob/main/NFL_Plays_Week1_2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**PURPOSE:**
- Accurately clean a week's worth of play data
  - Season 2023 -> Week 1

**THOUGHTS, CONCERNS AND IDEAS FOR LATER:**

*General*

1. Players with the same name
  - I think the raw data has naming conventions that distinguish between two players with the exact same name, but I am not 100% sure.
2. Cleaning check (TESTING)
  - I need some method to help discern whether these plays have been cleaned correctly. Currently I am checking manually, which is neither sustainable nor efficient.
    - **IDEA:** Cross reference recorded NFL stats with stats here and compare likeness. (maybe return a df that highlights differences?)
3. Adjust features (PlayOutcomes/PlayTypes/IsScoringDrive/etc...) for plays that have been split up into multiple rows (Fumble Recoveries, Interceptions, etc...).
  - EXAMPLE: Running back fumbles on a run play but recovers it and rushes for x yards.
    - This would still count towards his rushing yards.
    - 'PlayType' = 'Run'
    - 'PlayOutcome' = 'X Yard Run'
      - 2 rows will be present for this type of play. 1 before fumble and 1 after fumble. Each will have their own separate 'PlayOutcome'..?
  - EXAMPLE: Any fumble recovery by a player other than the running back on an intended running play
    - This would not count as rushing yards for the player who recovered the fumble.
    - 'PlayType' = 'Fumble Return'..?
    - 'PlayOutcome' = 'X Yard Fumble Return'..?
  - EXAMPLE: If a team throws an interception and that interception results in a touchdown for the opposing team, I do not think it should be considered as a 'scoring drive' for the team that threw the interception.
    - IDEA: For the category "isScoringDrive" the categories could be:
      1. 0 - Is not a scoring drive
      2. 1 - Is scoring drive for team on offense
      3. 2 - Is scoring drive for team on defense
  - When a play is split up into multiple rows, should each row have the starting formation of the play or should the initial starting row of the play have the formation?
  - IDEA: Should I broaden 'playtypes' to include:
    1. yardage after fumble (Currently have it as 'Run' playtype)
    2. yardage after interception (Currently have it as 'Interception')
4. Condense features.
  - For plays such as punt or kickoff, maybe I can group together data such as who is the longsnapper, holder and kicker instead of representing them on their own.
5. Condense regular expressions to grab multiple pieces of wanted data instead of individual.
6. Use 'Fuzzywuzzy' to find like play outcomes.
  - This will give me a chance to automate play types instead of eyeballing them and separating them manually.
    - Not sure if I will actually need this?
7. Map team name with their abbreviations ( e.g. "Cowboys" <-> "DAL" )
  - Maybe with larger datasets covering multiple weeks, I can map each team name to the team abbreviation it co-occurs with most often.
8. Shorten cleaning methods by creating a helper method to grab data from the defense on a play
9. Add features to break down penalty plays.
10. Punt and kickoff returns are practically identical. Try to find a regular expression that will capture them both AND catch touchdown plays too.
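
The cross-referencing idea from item 2 could be sketched roughly as follows. This is only a minimal sketch: the function name `stat_differences`, the `{player: {stat: value}}` input shape and the sample numbers are all assumptions made for illustration, not part of the existing notebook or of any official stats feed.

```python
def stat_differences(cleaned_totals, official_totals):
    """Compare two {player: {stat: value}} mappings and list every disagreement.

    Both mappings are hypothetical inputs: one aggregated from the cleaned
    plays, the other taken from an official box score.
    """
    differences = []
    for player in sorted(set(cleaned_totals) | set(official_totals)):
        cleaned = cleaned_totals.get(player, {})
        official = official_totals.get(player, {})
        for stat in sorted(set(cleaned) | set(official)):
            # A stat missing on one side shows up as None and counts as a difference
            if cleaned.get(stat) != official.get(stat):
                differences.append((player, stat, cleaned.get(stat), official.get(stat)))
    return differences


# Invented placeholder numbers, purely to show the shape of the result
cleaned = {"A.Example": {"RushingYards": 50}, "B.Example": {"RushingYards": 12}}
official = {"A.Example": {"RushingYards": 50}, "B.Example": {"RushingYards": 15}}
print(stat_differences(cleaned, official))
```

The returned tuples could be loaded into a DataFrame if a tabular view of the mismatches is preferred.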

*Offense*

1. Trick plays
  - Need a larger sample size that contains more trick plays
2. Laterals
  - Need a larger sample size that contains more laterals
    - (Only one has been found within the dataset "Season 2023 Week 1". It was handled for that specific play type but has not been implemented for all play types.)
      - IDEA: Make a new helper method to handle "Handoff" plays.
        - Should I make a new feature for handoffs? Like a feature that links one action to another? Would that be valuable?

*Defense*

1. Nuance of players recorded for sacks & forced fumbles
  - Look under sack play type cleaning method
    - The formatting of multiple defending players involved in a fumbled play may cause data to be recorded incorrectly (i.e. the player who assisted in the tackle may be credited with the forced fumble)

2. DEFENSIVE STATS ARE CURRENTLY WRONG
  - I will work on 0.5 tackles, solo tackles and assists. I need to adjust the cleaning methods to collect this data better.
    - ';' means solo and assisted tackle
    - ',' means 0.5 tackle
  - Need to figure out when players are noted for good coverage or for assisting in an interception (I think this is what the play descriptions are stating?)
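
One possible way to turn the ';' and ',' conventions above into tackle credit is sketched below. It assumes the reading given in the notes (';' is a solo tackle plus an assist, ',' is a half tackle for each player). The function name `tackle_credit`, the labels it returns and the sample descriptions are invented for illustration.

```python
import re

# Same player-name pattern the notebook uses, written as a raw string
NAME = r"(?:[A-Za-z]+-)*[A-Za-z]+\.[A-Za-z]+(?:-[A-Za-z]+)*"

# One or two defenders inside parentheses, separated by "," or ";"
TACKLE_GROUP = re.compile(rf"\(({NAME})(?:([,;]) ({NAME}))?\)")


def tackle_credit(description):
    """Return {player: credit} for the first tackle group found in a description."""
    match = TACKLE_GROUP.search(description)
    if match is None:
        return {}
    first, separator, second = match.groups()
    if separator is None:
        return {first: "solo"}
    if separator == ";":
        return {first: "solo", second: "assist"}
    # "," is assumed to mean each defender receives half a tackle
    return {first: "shared", second: "shared"}


# Made-up descriptions that only mimic the raw format
print(tackle_credit("R.Runner up the middle for 3 yards (A.Tackler; B.Helper)."))
print(tackle_credit("R.Runner up the middle for 3 yards (A.Tackler, B.Helper)."))
```

Groups such as `(Shotgun)` or `(14:01)` are skipped because they contain no `Initial.Lastname` style name.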

# MOUNTING AND IMPORTS

In [1]:
# Mount your Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Used to access personal google cloud services
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [3]:
# Imports

# Data manipulation
import pandas as pd

# Regular expressions
import re

# Grab data from database
from google.cloud import bigquery

In [4]:
# # debugger (maybe use in the future)
# %pdb on

# LOADING DATA (BigQuery queries)

In [5]:
# Client connect to bigquery project
client = bigquery.Client('nfl-data-430702')

## Season 2023 Week 1

In [6]:
# Grabbing all plays from 2023 Week 1 NFL Season
week1_2023_plays_query = """
                         SELECT *
                         FROM `nfl-data-430702.NFL_Scores.NFL-Plays-Week1_2023`
                         """

# Dry run: validates the query and reports how many bytes it would process, without actually running it
dry_run_config = bigquery.QueryJobConfig(dry_run=True)
dry_run_query = client.query(week1_2023_plays_query, job_config=dry_run_config)
print("This query will process {} bytes.".format(dry_run_query.total_bytes_processed))

# Running query (Being mindful of the amount of data being grabbed)
# maximum_bytes_billed caps the query at 1 GB (10**9 bytes); a query that would bill more fails instead of running
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**9)
safe_config_query = client.query(week1_2023_plays_query, job_config=safe_config)

This query will process 570194 bytes.


In [7]:
# Putting data attained from query into a dataframe
week1_2023_plays = safe_config_query.to_dataframe()

In [8]:
week1_2023_plays.head()

Unnamed: 0,Season,Week,Day,Date,AwayTeam,HomeTeam,Quarter,DriveNumber,TeamWithPossession,IsScoringDrive,PlayNumberInDrive,IsScoringPlay,PlayOutcome,PlayDescription,PlayStart
0,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,1,0,Kickoff,G.Zuerlein kicks 65 yards from NYJ 35 to end z...,Kickoff from NYJ 35
1,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,2,0,7 Yard Pass,(15:00) (Shotgun) J.Allen pass short right to ...,1st & 10 at BUF 25
2,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,3,0,5 Yard Pass,"(14:34) (No Huddle, Shotgun) J.Allen pass shor...",2nd & 3 at BUF 32
3,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,4,0,3 Yard Run,(14:01) J.Cook up the middle to BUF 40 for 3 y...,1st & 10 at BUF 37
4,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,5,0,2 Yard Run,(13:24) (Shotgun) J.Cook up the middle to BUF ...,2nd & 7 at BUF 40


In [9]:
# Noting the original size of the raw uncleaned dataframe of data
# - (rows, columns)
week1_2023_plays.shape

(2600, 15)

# CATEGORIZE PLAYS
- The goal here is to parse out the different values for 'PlayOutcome'
  - This is where I will separate different types of plays
    - ( pass / run / kickoff / etc. )

In [10]:
# Maybe try to fuzzywuzzy this in the future?
# - I need to narrow these down into basic categories.
# - (Take away numbers & "Yard")
# - Find the most common words between all outcomes (hoping to get all categories, e.g. 'Pass', 'Run', 'Touchdown', etc...)

# All play outcomes from the game
# - From here we can categorize and clean plays accordingly
week1_2023_plays['PlayOutcome'].unique()

array(['Kickoff', '7 Yard Pass', '5 Yard Pass', '3 Yard Run',
       '2 Yard Run', 'Pass Incomplete', 'Punt', '-5 Yard Penalty',
       '5 Yard Run', '1 Yard Pass', '14 Yard Run', '3 Yard Pass',
       '8 Yard Run', '6 Yard Pass', '15 Yard Pass', '-9 Yard Sack',
       '4 Yard Pass', '13 Yard Pass', 'Field Goal', '-2 Yard Sack',
       'Interception', '-5 Yard Run', '18 Yard Pass', '8 Yard Pass',
       '6 Yard Run', '12 Yard Run', '-1 Yard Run', '26 Yard Pass',
       'Touchdown Bills', 'Extra Point Good', '13 Yard Run',
       '-3 Yard Sack', '7 Yard Run', '9 Yard Pass', '4 Yard Run',
       'Fumble', '-10 Yard Penalty', '10 Yard Pass', '26 Yard Run',
       '5 Yard Penalty', '-10 Yard Sack', '22 Yard Pass', '-4 Yard Run',
       '-12 Yard Sack', '83 Yard Run', '1 Yard Run', '2 Yard Pass',
       '10 Yard Run', 'Run for No Gain', '12 Yard Pass', '20 Yard Pass',
       '9 Yard Run', '-2 Yard Pass', 'Sack', '24 Yard Pass',
       '14 Yard Pass', 'Touchdown Jets', '-3 Yard Run', '-2 Yar
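
The "take away numbers & 'Yard'" idea from the cell above could look something like the sketch below. The helper name `outcome_category` is an assumption, and the sample list is a small subset of the outcomes printed above. Outcomes such as 'Touchdown Bills' would still need their own handling, since this sketch only strips the yardage prefix.

```python
import re
from collections import Counter


def outcome_category(outcome):
    """Drop a leading signed yardage so '7 Yard Pass' and '-2 Yard Pass' both become 'Pass'."""
    return re.sub(r"^-?\d+ Yard ", "", outcome)


# A few outcomes copied from the array printed above
sample_outcomes = ["Kickoff", "7 Yard Pass", "5 Yard Pass", "3 Yard Run",
                   "Pass Incomplete", "-5 Yard Penalty", "-9 Yard Sack", "Touchdown Bills"]

# Counting the normalized outcomes shows which basic categories exist and how common they are
print(Counter(outcome_category(outcome) for outcome in sample_outcomes))
```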

In [11]:
# NOTES:
# - Currently, I am eyeballing all unique play outcomes to categorize them.
#   - This approach is not flexible because a play outcome can
#     arise that has not been seen yet.
#     - There may be more play outcomes in the future when working on a full season,
#       let alone all seasons and future games

# Play Types with complete cleaning methods (As far as this sample size goes)

# ~ OFFENSE ~
df_2023_pass_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Pass')]
df_2023_run_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Run')]
# ~ DEFENSE ~
df_2023_interception_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Interception')]
df_2023_sack_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Sack')]
# ~ SPECIAL TEAMS ~
df_2023_punt_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Punt')]
df_2023_kickoff_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Kickoff')]
# ~ SCORING ~
df_2023_touchdown_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Touchdown')]
df_2023_extrapoint_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Extra Point')]
df_2023_fieldgoal_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Field Goal')]
df_2023_2pt_conversion_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('2PT Conversion')]
# ~ OTHER ~
df_2023_fumble_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Fumble')]
df_2023_penalty_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Penalty')]
df_2023_turnover_on_downs_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Turnover on Downs')]


## SANITY CHECK (All Plays Accounted for)
  - Once all plays have been categorized, compare the sum of all plays within each category to the size of the original dataframe of plays.
    - Goal is to make sure the number of plays is the same.

In [12]:
# Categorized plays

plays_list = [df_2023_pass_week1,         # Offense
              df_2023_run_week1,
              df_2023_interception_week1, # Defense
              df_2023_sack_week1,
              df_2023_punt_week1,         # Special Teams
              df_2023_kickoff_week1,
              df_2023_touchdown_week1,    # Scoring
              df_2023_extrapoint_week1,
              df_2023_fieldgoal_week1,
              df_2023_2pt_conversion_week1,
              df_2023_fumble_week1,       # Other
              df_2023_penalty_week1,
              df_2023_turnover_on_downs_week1]

# Total number of plays across every category
num_plays_categorized = sum(len(plays) for plays in plays_list)

num_plays_categorized == len(week1_2023_plays)

True
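
Matching totals alone do not guarantee that every play landed in exactly one category, because one outcome matching two keywords and another matching none would cancel out. A stricter check might look like the sketch below. The function name `miscategorized` is an assumption, the keyword list mirrors the `str.contains` filters above, and the two failing outcomes in the usage line are hypothetical.

```python
# Keywords mirroring the str.contains filters used to build the category dataframes
CATEGORIES = ["Pass", "Run", "Interception", "Sack", "Punt", "Kickoff", "Touchdown",
              "Extra Point", "Field Goal", "2PT Conversion", "Fumble", "Penalty",
              "Turnover on Downs"]


def miscategorized(outcomes, categories=CATEGORIES):
    """Map each outcome that matches zero or several categories to the categories it matched."""
    problems = {}
    for outcome in outcomes:
        matches = [category for category in categories if category in outcome]
        if len(matches) != 1:
            problems[outcome] = matches
    return problems


# "Safety" and "Punt Return Touchdown" are hypothetical outcomes used to show both failure modes
print(miscategorized(["7 Yard Pass", "Run for No Gain", "Safety", "Punt Return Touchdown"]))
```

An empty result for `week1_2023_plays['PlayOutcome'].unique()` would indicate the categories are both exhaustive and mutually exclusive for this sample.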

# HELPER METHODS (personal use)
- For personal use only; these methods do not take part in cleaning the dataset.

In [13]:
# PURPOSE:
# - Quick look at a section of plays
#   - Ideally the plays that the user wants to break down and clean.
# INPUT PARAMETERS:
# df_all_plays      - DataFrame - The original dataframe where the desired plays to view came from
# df_section_plays  - DataFrame - A section of the original dataframe the user wants to view
# RETURN:
# - Printing to the console:
#   1. index of play
#   2. 'PlayDescription' feature of play
#   3. 'PlayOutcome' feature of play
def print_plays(df_all_plays, df_section_plays):
  for idx, value in df_section_plays['PlayOutcome'].items():
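    # idx is an index label from the original dataframe, so look it up by label (.loc) rather than by position
    play = df_all_plays['PlayDescription'].loc[idx]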
    print("index:" + str(idx))
    for i in play.split(". "):
      print(i)
    print(value)
    print()

In [14]:
# EXAMPLE: Displaying all touchdown plays within dataset

print_plays(week1_2023_plays, df_2023_touchdown_week1)

index:33
(4:51) (Shotgun) J.Allen pass short right to S.Diggs for 5 yards, TOUCHDOWN.
Touchdown Bills

index:134
(4:58) Z.Wilson pass short left to G.Wilson for 3 yards, TOUCHDOWN.
Touchdown Jets

index:152
(9:21) S.Martin punts 42 yards to NYJ 35, Center-R.Ferguson
X.Gipson for 65 yards, TOUCHDOWN.
Touchdown Jets

index:163
(6:14) (Shotgun) J.Love pass short middle to R.Doubs for 8 yards, TOUCHDOWN.
Touchdown Packers

index:197
(10:23) (Shotgun) A.Jones right guard for 1 yard, TOUCHDOWN.
Touchdown Packers

index:202
(6:34) (Shotgun) J.Love pass short middle to A.Jones for 35 yards, TOUCHDOWN
GB-A.Jones was injured during the play
His return is Questionable.
Touchdown Packers

index:214
(13:34) J.Love pass short left to R.Doubs for 4 yards, TOUCHDOWN.
Touchdown Packers

index:219
(12:53) (Shotgun) J.Fields pass short middle intended for D.Mooney INTERCEPTED by Q.Walker [K.Clark] at CHI 37
Q.Walker for 37 yards, TOUCHDOWN
PENALTY on GB-R.Douglas, Unsportsmanlike Conduct, 15 yards, enfor

# PIPELINE
  - ORDER
    1. Regular expressions
      - Used to find common patterns within raw data
    2. Cleaning methods
      - Unique cleaning methods for each play type
    3. Main pipeline method
      - Control flow of cleaning methods



## 1. REGULAR EXPRESSIONS

In [15]:
####################################################
# REGULAR EXPRESSIONS USED TO LOCATE SPECIFIC DATA #
####################################################

# Will eventually have to combine some regular expressions into one
# - For example, punt returns <-> kick returns <-> interceptions <-> fumble recoveries (?)

###########
# GENERAL #
###########

# Player's name (Grabs every variation encountered so far)
# - Raw string so that "\." is passed to the regex engine without an invalid-escape warning
name_pattern = r"(?:[A-Za-z]+-)*[A-Za-z]+\.[A-Za-z]+(?:-[A-Za-z]+)*"

################
# PLAY DETAILS #
################

# Play start time
time_on_clock_pattern = r'\((\d*:\d+)\)'

# Offense play formation
formation = r'\(([A-Za-z]+ ?[A-Za-z]*,? ?[A-Za-z]*)\)'

# Yards gained on play
yardage_gained = r'for (-?[0-9]+) yards?'

###########
# OFFENSE #
###########

# Passer (Player passing, Player spiking, Player who got sacked)
passer_name_pattern = f"({name_pattern}) (?:pass|spiked|sacked)"

# Rushing play (Player running ball)
rusher_pattern = f"({name_pattern})(?: scrambles)? (?:left|right|up|kneels).?"

# Pass play (Returns intended receiver and the direction of the pass)
receiver_pattern = f"(short|deep) (left|right|middle) (?:to|intended for) ({name_pattern})"

# 2 Point Conversion (Pass attempt)
tp_conversion_pass_pattern = f"({name_pattern}) pass to ({name_pattern})"

# 2 Point Conversion (Rush attempt)
tp_conversion_rush_pattern = f"({name_pattern}) rushes (?:left|right|up)"

# Handoff
handoff_pattern = f"Handoff to ({name_pattern}) to(?: [A-Z]+)? [0-9]+ for -?[0-9]+ yards?"

###########
# DEFENSE #
###########

# Tackles (solo, assist, shared) <-- the goal. Right now all I have is tackle1 and tackle2

# Main defender on play (Used to grab tackler1 and used to grab players that sacked the passer)
defense_tackler_1_name_pattern = rf"\(({name_pattern})"

# Second defender on play (Used to grab tackler2)
defense_tackler_2_name_pattern = rf" ({name_pattern})\)" # Will have a ")" at the end of the name

# Solo tackle (a single defender in parentheses)
solo_tackle_pattern = rf"\(({name_pattern})\)"

# Shared tackle (two defenders separated by ",")
shared_tackle_pattern = rf"\(({name_pattern}), ({name_pattern})\)"

# Assisted tackle (two defenders separated by ";")
assisted_tackle_pattern = rf"\(({name_pattern}); ({name_pattern})\)"

# Pressure (Who applied pressure to passer)
# - I think it might be possible for multiple defenders to apply pressure to the passer.
defense_pressure_name_pattern = rf"\[({name_pattern})\]"

# Interception (Player who intercepted pass)
interception_name_pattern = f"INTERCEPTED by ({name_pattern})"

# Quarterback Fumbles (Quarterback fumble solo, Quarterback fumble solo -> who recovers, Quarterback <-> Center discrepancy)

# How far passer went before fumbling on his own
qb_fumble_pattern = f" ({name_pattern}) to(?: [A-Z]+) [0-9]+ for -?[0-9]+ yards$" # Passer fumbles are always the initial action of the play

# Action directly after a quarterback only fumble
qb_fumble_description_pattern = f"^FUMBLES, "

# Fumbled snap (Will either be the quarterback or center.)
aborted_fumble_pattern = f"({name_pattern}) FUMBLES"

# Forced fumbles (Player who forced the fumble)
forced_fumble_pattern = rf"FUMBLES \(({name_pattern})\)"

# Sack (Who is credited with a sack, who split sack, how many yards was the sack)

# Fumble from sack (Player who forced the fumble on a sack)
sacked_forced_fumble_sentence = rf"FUMBLES \({name_pattern}\) \[({name_pattern})\]"

# Split sack (Players who equally received credit for sack)
split_sack_pattern = f"sack split by ({name_pattern}) and ({name_pattern})"

# Yardage of sack (starting from line of scrimmage)
yardage_from_sack = r'sacked(?: ob)? at(?: [A-Z]+)? [0-9]+ for (-?[0-9]+) yards'

# Defense takeaway (takeaway for yardage)
defensive_takeaway_run_pattern = f"^({name_pattern}) (?:pushed ob at|ran ob at|to)(?: [A-Z]+) -?[0-9]+ for " # yardage after fumble recovery & yardage after interception

# Defense takeaway (takeaway for touchdown)
touchdown_after_takeaway_pattern = f"({name_pattern}) for [0-9]+ yards, TOUCHDOWN" # touchdown after a fumble recovery or interception

#################
# SPECIAL TEAMS #
#################

# Punting play (Who was the punter, How many yards the ball went, Who was the Longsnapper)
punting_pattern = f"({name_pattern}) punts (-?[0-9]+) yards? to(?: [A-Z]+ -?[0-9]+| -?[0-9]+| end zone), Center-({name_pattern})"

# Punt return (Who was returning the punt, How many yards did they go, The player(s) that tackled the returner)
# punt_return_pattern = f"({name_pattern}) (?:pushed ob at|ran ob at|to)(?: [A-Z]+)? [0-9]+ for (-?[0-9]+) yards? \(({name_pattern})(?:(?:,|;) ({name_pattern}))?\)" # yardage after punt
punt_return_pattern = f"({name_pattern}) (?:pushed ob at|ran ob at|to)(?: [A-Z]+)? [0-9]+ for"

# J.Reed (didn't try to advance) to CHI 44 for no gain.
kick_return_pattern = rf"({name_pattern})(?: \(didn't try to advance\))? (?:pushed ob at|ran ob at|to)(?: [A-Z]+)? [0-9]+ for (no gain|(-?[0-9]+) yards? \(({name_pattern})(?:(?:,|;) ({name_pattern}))?\))" # yardage after kickoff

# Punt return resulting in fair catch
punt_fair_catch_pattern = f", fair catch by ({name_pattern})"

# Punt or kickoff downed by
kick_downed_by_pattern = f", downed by ({name_pattern})"

# Kickoff play (Who was the kicker, How many yards the ball was kicked )
kickoff_pattern = f"({name_pattern}) kicks(?: onside)? (-?[0-9]+) yards from"

# Field goal (Good)
field_goal_good_pattern = f"({name_pattern}) (-?[0-9]+) yard field goal is GOOD, Center-({name_pattern}), Holder-({name_pattern})."

# Field goal (no good)
field_goal_no_good_pattern = f"({name_pattern}) (-?[0-9]+) yard field goal is No Good, ([A-Za-z]+(?: [A-Za-z]+)*), Center-({name_pattern}), Holder-({name_pattern})."

# Field goal (blocked)
field_goal_blocked_pattern = rf"({name_pattern}) (-?[0-9]+) yard field goal is BLOCKED \(({name_pattern})\), Center-({name_pattern}), Holder-({name_pattern}), RECOVERED by ({name_pattern})"

# Extra point (good)
extra_point_good_pattern = f"({name_pattern}) extra point is GOOD, Center-({name_pattern}), Holder-({name_pattern})."

# Extra point (no good)
extra_point_no_good_pattern = f"({name_pattern}) extra point is No Good, ([A-Za-z]+(?: [A-Za-z]+)*), Center-({name_pattern}), Holder-({name_pattern})."

##############
#  INJURIES  #
##############

# Injuries (Returns the player(s) who got injured during the play)
# injury = f"[A-Z]+-({name_pattern}) was injured during the play"
injury_pattern = f"[A-Z]+-({name_pattern}) was injured during the play"
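
As a quick illustration of how several of these patterns behave together, the sketch below applies them to the first touchdown description printed earlier in the notebook. The patterns are re-declared (as raw strings) only so the block runs on its own; the dictionary keys other than `TimeOnTheClock` are assumed names, not necessarily the column names used by the cleaning methods.

```python
import re

# Copies of the notebook's patterns so this block is self-contained
name_pattern = r"(?:[A-Za-z]+-)*[A-Za-z]+\.[A-Za-z]+(?:-[A-Za-z]+)*"
time_on_clock_pattern = r"\((\d*:\d+)\)"
formation = r"\(([A-Za-z]+ ?[A-Za-z]*,? ?[A-Za-z]*)\)"
yardage_gained = r"for (-?[0-9]+) yards?"
passer_name_pattern = rf"({name_pattern}) (?:pass|spiked|sacked)"
receiver_pattern = rf"(short|deep) (left|right|middle) (?:to|intended for) ({name_pattern})"

play = "(4:51) (Shotgun) J.Allen pass short right to S.Diggs for 5 yards, TOUCHDOWN."

# re.findall returns a list of captures; with several groups each item is a tuple
parsed = {
    "TimeOnTheClock": re.findall(time_on_clock_pattern, play),
    "Formation": re.findall(formation, play),
    "Passer": re.findall(passer_name_pattern, play),
    "Receiver": re.findall(receiver_pattern, play),
    "Yardage": re.findall(yardage_gained, play),
}

for field, value in parsed.items():
    print(field, value)
```

Running the patterns against a handful of known descriptions like this is one lightweight way to catch a regex regression before it reaches the cleaning methods.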

## 2. CLEANING METHODS

### HELPER CLEANING METHODS

#### helper method for fumbles

In [16]:
# PURPOSE:
# - Universal helper method that extracts fumbled data from every playtype.

# BASIC PLAN:
# 1. Accept a single row of a play that has been fumbled from the main dataframe of plays.
# 2. Replace that single row with a dataframe containing all extracted data.
#    - These replacement dataframes are not limited to a single row but can be many, depending on the play.

# BASIC DESIGN STEP BY STEP:
# 1. Split play description into significant actions and put into a list
#    EXAMPLES:
#    - intended play
#    - fumble recovery for yardage
# 2. Clean significant actions as their own rows
#    EXAMPLE METHODS USED TO CLEAN:
#    - main cleaning method (method used to clean a playtype that is using this helper method)
#    - run playtype cleaning method (Will be used to clean all fumble recoveries for yardage)
# 3. Create and return replacement dataframe containing all cleaned significant actions (or rows)

# INPUT PARAMETERS:
# df_plays                  - dataframe - dataframe of plays
# play                      -  String   - 'PlayDescription' of the current play that is being cleaned
# play_index                -  Integer  - index of play (Almost always from main dataframe of plays)
# main_action_patterns      -    list   - A list of regular expressions that are meant to pinpoint primary
#                                         actions within a play that will be used to extract these actions
#                                         to create a row within the replacement dataframe
# main_cleaning_method      - function  - A callback function (the function using this helper method) which
#                                         is used to clean intended play actions

# RETURN:
# df_multi_row_play - dataframe - dataframe of organized and cleaned actions stemming from a single unclean fumbled play

# NOTE: I need to comment effectively, grabbing all the nuances of what is being grabbed
#       for each playtype. All playtypes are different and need to be described.

# CONCERNS:
# 1. Nuance on sacked plays
#    - Formatting of defender who caused sack is different from a solo and an assisted
# 2. Who is at fault for aborted plays
#    - Formatting on aborted plays is different if the fault lands on the center or passer
# 3. May have to add the parameter "secondary_action_patterns"
#    - I just ran into the issue of a kickoff return fumble.
#      - In this case there is 1. the kickoff 2. the kickoff return 3. the fumble from kickoff return.

def extract_fumble_data(df_plays, play, play_index, main_action_patterns, main_cleaning_method):

  original_play_copy = df_plays.loc[play_index]

  # Breaking play description into a list of sentences
  play_elements = play.split(". ")

  #################
  # KEY VARIABLES #
  #################

  # 'play_split' info:
  # - Designed to be a 2D list (list of lists)
  # - All elements within this list together will represent a single play.
  # - Each element within the list will become a separate row that will replace/add to the original dataframe of plays.
  #   - Each element represents a distinct action within the single play and will have all data required for that new row.
  #   ROW CONTENTS:
  #   1. [ ( The intended play ) + ( Extra data ) , ( Who caused the fumble ) ]   <-  This row will have extra info such as (injuries / penalties / eligibility / etc...)
  #                                                                                   - "The intended play" includes 'Aborted' plays
  # ~ 2. [            ( The fumble recovery )     , ( Who caused the fumble ) ]   <-  This can happen repeatedly or not at all
  # ~ 3. [ (The fumble recovery for a touchdown) ]                                <-  This can only happen once for a single play or not at all
  play_split = []

  # 'extra_data' info:
  # - Will be a single string containing all additional data from the play such as (injuries / penalties / eligibility / etc...)
  # - Will be put into a single row dataframe and cleaned
  #   - Once extra data has been cleaned, the single row (now clean) dataframe will serve as a shell for
  #     the first new row that will replace the old play within the main dataframe.
  #     - This first new row will have the initial action of the play as well as all additional information from the play
  extra_data = ""

  # - Iterate through each element within play_elements
  # - NOTE: We are iterating through actions of the play chronologically
  for string in play_elements:

    ######################################
    # ORGANIZING KEY ACTIONS WITHIN PLAY #
    ######################################

    # ACTIONS WITHIN PLAY THAT DESERVE THEIR OWN ROW:
    # These situations will have their own list element within "play_split" (meaning their own row within the new cleaned replacement dataframe)
    # 1. intended play (initial action might be a better name for plays such as ones that have been aborted)
    #    RUN PLAYS:
    #     - Fumbles after intended run play
    #     - Aborted fumbles
    #     - qb only fumbles
    #    SACKED PLAYS:
    #     - fumbles after sack
    #    PASSING PLAYS:
    #     - Fumbles after intended pass play
    #     - qb only fumbles
    #    KICKOFF PLAYS:
    #     - Fumbles happen during kickoff return
    # 2. runs after fumble recoveries (emphasis on the plural)
    # 3. touchdown after fumble recovery (can only happen once) (looks unique for each playtype) <- this might not be true.
    #    ! ! ! ATTENTION ! ! !
    #    - I have a small sample size for this.
    #    - This is one thing that I need to double check correctness on later in the future when having a larger sample size.
    #    RUNS PLAYS:
    #    - Are fumble recovery touchdown from run plays accounted for?
    #    SACKED PLAYS
    #    - touchdown after a sacked play
    #    PASSING PLAYS:
    #    - Are fumble recovery touchdown from passing plays accounted for?
    #
    #    - Are all fumble recoveries the same? wouldn't they all be rushing playtypes?
    # 4. handoffs
    # A sentence matching any main action pattern becomes its own row
    # - any() avoids relying on the loop variable after the loop has ended
    if any(re.search(play_pattern, string) for play_pattern in main_action_patterns):
      play_split.append([string])
      continue

    # ADD ON SECTION (Actions that will add to elements that will obtain their own row)
    # - Appends data to elements within 'play_split'
    #   - Every element within play_split is a list, this section will add to those individual lists
    #     - Specifically, it will append to the last element within 'play_split'. Because we are
    #       iterating through sentences chronologically, the appended sentence
    #       will always follow directly after the element that needs it
    # These situations will add to the last element within 'play_split' (For all playtypes)
    # 1. forced fumble description (happens after regular plays & sometimes after fumble recoveries)
    # 2. fumble description describing a qb only fumble (happens after a qb only fumble)
    # Attach the fumble description to the most recent action (the last element of 'play_split')
    if any(re.search(play_pattern, string) for play_pattern in [forced_fumble_pattern, qb_fumble_description_pattern]):
      play_split[-1] = [play_split[-1][0], string]
      continue

    # When a sentence does not fit within the top 2 sections ( 1. adding an element to the list || 2. appending to an element in the list )
    # - Glue the sentence into 'extra_data' to be cleaned separately.
    extra_data = extra_data + string + ". "

  ################################
  # CLEANING ACTIONS WITHIN PLAY #
  ################################

  # GRABBING: Initial action of play (e.g. Intended play / aborted fumble / qb only fumble / etc...)
  intended_play_description = play_split.pop(0)

  # Creating a single row dataframe of the original play
  unclean_original_play_copy = pd.DataFrame([original_play_copy.copy()], columns=df_plays.columns)

  # CREATING SHELL FOR: Initial action of play
  # - shell is only necessary with plays that have extra data (injuries / penalties / eligibility / etc...)
  # - extra data will only be available within the first row of the replacement dataframe
  if extra_data:
    unclean_original_play_copy['PlayDescription'] = extra_data
    unclean_original_play_copy = main_cleaning_method(unclean_original_play_copy)

  # CLEANING: Initial action of play
  # No matter what the initial action is, the description will always be the first element of the first element within 'play_split'
  unclean_original_play_copy['PlayDescription'] = intended_play_description[0]

  # May have to adjust in the future.
  # - ON SACKED PLAYS, there is nuance on the formatting of a player who caused a sack and a forced fumble.
  #   - Sometimes it'll look something like this "FUMBLES (B.Burns) [B.Burns]" <- [B.Burns] is credited with the forced fumble
  #   - less often it'll look like "FUMBLES (B.Burns)" <- B.Burns is credited with the forced fumble.
  # - ON ABORTED PLAYS, there is nuance in the formatting of the player who caused the play to be aborted.
  #   - the word "Aborted" will appear either in parentheses or without them, which signals whether the center or the passer was at fault.
  #     - Need to figure out how to record this data.
  # - ON KICKOFF PLAYS
  #   - Because there is the kickoff, then the kickoff return, then the fumble on the kickoff return,
  #     the intended play will not have the fumble detail but still needs to be cleaned.

  # Checking whether the initial action is a kickoff (fumble occurs after kickoff return)
  kickoff = re.findall(kickoff_pattern, intended_play_description[0])

  # The three cases below are mutually exclusive, so they are chained with if / elif / else.
  # intended play / qb only fumble
  if len(intended_play_description) > 1:
    unclean_original_play_copy['FumbleDetails'] = intended_play_description[1]
    forced_fumble = re.findall(forced_fumble_pattern, intended_play_description[1])
    if len(forced_fumble) > 0:
      unclean_original_play_copy['ForcedFumbleBy'] = forced_fumble[0]
    cleaned_original_play_copy = main_cleaning_method(unclean_original_play_copy)
  # kickoff (fumble occurs after kickoff return)
  elif len(kickoff) > 0:
    cleaned_original_play_copy = main_cleaning_method(unclean_original_play_copy)
  # Aborted fumble
  else:
    unclean_original_play_copy['FumbleDetails'] = intended_play_description[0]
    cleaned_original_play_copy = unclean_original_play_copy

  # FUMBLE RECOVERIES FOR YARDAGE & FUMBLE RECOVERIES FOR TOUCHDOWNS

  # Created list for the possibility of having multiple fumbles and recoveries in a single play
  list_recovery_runs = []

  for play in play_split:

    recovery_run_row = pd.DataFrame([original_play_copy.copy()], columns=df_plays.columns)

    # Recovery after fumble was fumbled
    if len(play) > 1:
      recovery_run_row['FumbleDetails'] = play[1]
      forced_fumble = re.findall(forced_fumble_pattern, play[1])
      if len(forced_fumble) > 0:
        recovery_run_row['ForcedFumbleBy'] = forced_fumble[0]

    recovery_run_row['PlayDescription'] = play[0]
    recovery_run_row['PlayOutcome'] = 'Run' # <-- Possibly change this in the future (could be something like 'fumble recovery run?' unless it was the rb that recovered)
    cleaned_recovery_run_row = clean_run_plays(recovery_run_row)
    cleaned_recovery_run_row['PlayOutcome'] = original_play_copy['PlayOutcome'] # <- Maybe this isn't correct? when a play is split by multiple rows, this becomes tricky.
    list_recovery_runs.append(cleaned_recovery_run_row)

  ###################
  # 3.NEW DATAFRAME #
  ###################
  # - Create the cleaned replacement row(s) for the original row.

  if len(list_recovery_runs) > 0:
    # Initial action first, followed by every fumble recovery row in chronological order
    df_multi_row_play = pd.concat([cleaned_original_play_copy, *list_recovery_runs], ignore_index=True)
  else:
    df_multi_row_play = cleaned_original_play_copy

  return df_multi_row_play

### OFFENSE CLEANING METHODS

#### PASS PLAYS

In [17]:
# PURPOSE:
# - Clean all passing type plays within a given dataframe.
# INPUT PARAMETERS:
# df_plays    - dataframe - NFL plays (can include play types other than passing)
# index_start -  integer  - index where within the dataframe the method will start
#                           cleaning in ascending order.
# RETURN:
# df_plays - dataframe - the same input df_plays but with all passing play types cleaned

# NOTE:
# - This should also work with slices of the main dataframe.
#   - Slices should keep the original indexing from the main dataframe so the
#     cleaned rows are easy to put back into it.

def clean_pass_plays(df_plays, index_start = None):

  # Adjusting df_plays to start cleaning at a specified index (index_start)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    # Locating all passing type plays within dataframe
    df_pass_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Pass')]
  else:
    # Locating all passing type plays within dataframe
    df_pass_plays = df_plays[df_plays['PlayOutcome'].str.contains('Pass')]

  for idx, play in df_pass_plays['PlayDescription'].items():

    ################
    # Play details #
    ################

    # Play Type
    df_plays.loc[idx, 'PlayType'] = 'Pass'

    # TimeOnTheClock
    TimeOnTheClock = re.findall(time_on_clock_pattern, play)
    if len(TimeOnTheClock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = TimeOnTheClock[0]

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############################
    # REPORTING IN AS ELIGIBLE #
    ############################

    # I do not think this contains any useful data so I am going to exclude it.
    if play.find('reported in as eligible') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find('reported in as eligible') != -1:
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ###########
    # FUMBLES #
    ###########

    # Additional rows may be added after certain types of fumbled passing plays.
    # - The idea here is that, in those situations, the helping method 'extract_fumble_data'
    #   will return a small dataframe of the rows that the single play split into.
    #   - When this small dataframe is returned, it will replace the original play
    #     within the main dataframe of plays and then continue on cleaning the rest of the passing plays.

    if play.find('FUMBLES') != -1:
      main_action_patterns = [passer_name_pattern, qb_fumble_pattern, defensive_takeaway_run_pattern]
      main_cleaning_method = clean_pass_plays
      df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                                main_action_patterns,
                                                main_cleaning_method)

      # "df_plays.index.tolist().index(idx)" needed for method usage with slices of original dataframe.
      df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
      index_of_last_added_row = idx + len(df_replacement_rows) - 1
      if df_pass_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_pass_plays(df_plays, index_of_last_added_row + 1)

    ###########
    # OFFENSE #
    ###########

    # NOTE:
    # - Incomplete passes will have 'PlayOutcome' as 'Pass Incomplete' as well
    #   as yardage value being 0.0

    # Yardage gained
    yardage = re.findall(yardage_gained, play)
    if len(yardage) > 0:
      df_plays.loc[idx, 'Yardage'] = int(yardage[0])
    else:
      df_plays.loc[idx, 'Yardage'] = 0

    # Formation
    Formation = re.findall(formation, play)
    if len(Formation) > 0:
      if Formation[0] == 'Aborted':
        pass
      else:
        df_plays.loc[idx, 'Formation'] = Formation[0]

    # Passer (What about spikes?)
    passer_name = re.findall(passer_name_pattern, play)
    if len(passer_name) > 0:
      df_plays.loc[idx, 'Passer'] = passer_name[0]

    receiver_name_and_passing_details = re.findall(receiver_pattern, play)
    if len(receiver_name_and_passing_details) > 0:
      df_plays.loc[idx, 'Direction'] = f"{receiver_name_and_passing_details[0][0]} {receiver_name_and_passing_details[0][1]}"
      df_plays.loc[idx, 'Receiver'] = receiver_name_and_passing_details[0][2]

    # Unique situation (offense spikes the ball)
    if play.find('spike') != -1:
      df_plays.loc[idx, 'Direction'] = 'spiked' # Direction?

    #############
    #  DEFENSE  #
    #############

    solo_tackle = re.findall(solo_tackle_pattern, play)
    if len(solo_tackle) > 0:
      df_plays.loc[idx, 'SoloTackle'] = solo_tackle[0]

    shared_tackle = re.findall(shared_tackle_pattern, play)
    if len(shared_tackle) > 0:
      df_plays.at[idx, 'SharedTackle'] = shared_tackle[0]

    assisted_tackle = re.findall(assisted_tackle_pattern, play)
    if len(assisted_tackle) > 0:
      df_plays.loc[idx, 'SoloTackle'] = assisted_tackle[0][0]
      df_plays.loc[idx, 'AssistedTackle'] = assisted_tackle[0][1]

    pressure_by = re.findall(defense_pressure_name_pattern, play)
    if len(pressure_by) > 0:
      df_plays.loc[idx, 'PressureBy'] = pressure_by[0]

    ##############
    #  INJURIES  #
    ##############

    injuries = re.findall(injury_pattern, play)
    if len(injuries) > 0:
      df_plays.at[idx, 'InjuredPlayers'] = injuries

    #############
    #  PENALTY  #
    #############

    # Accepted Penalty
    if play.find('PENALTY') != -1:
      play_elements = play.split(". ")
      penalties = []
      for i in play_elements:
        if i.find('PENALTY') != -1:
          penalties.append(i)
      df_plays.at[idx, 'AcceptedPenalty'] = penalties

    # Declined Penalty
    if play.find('Penalty') != -1:
      play_elements = play.split(". ")
      penalties = []
      for i in play_elements:
        if i.find('Penalty') != -1:
          penalties.append(i)
      df_plays.at[idx, 'DeclinedPenalty'] = penalties

  # All passing plays have been cleaned (also covers the case where none were found)
  return df_plays
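
The before/after splice used in the fumble branch above (one row replaced by several) could be factored into a single helper. This is only a minimal sketch, assuming unique index labels; `replace_row_with_rows` and the toy data are hypothetical names for illustration and are not part of this notebook.

```python
import pandas as pd

def replace_row_with_rows(df, idx, replacement):
    """Replace the row labelled `idx` with the rows of `replacement`.

    Returns the new dataframe (re-indexed 0..n-1) and the position of the
    first row that follows the inserted block.
    """
    pos = df.index.get_loc(idx)  # position of the label; assumes unique labels
    before = df.iloc[:pos]
    after = df.iloc[pos + 1:]
    combined = pd.concat([before, replacement, after], ignore_index=True)
    return combined, pos + len(replacement)

# Toy data: a slice whose labels do not start at 0
plays = pd.DataFrame({'PlayDescription': ['a', 'b', 'c']}, index=[10, 11, 12])
extra = pd.DataFrame({'PlayDescription': ['b1', 'b2']})
new_plays, resume_at = replace_row_with_rows(plays, 11, extra)
```

Returning the resume position together with the dataframe avoids computing `idx + len(...)` from a label, which only equals a position when the index is already 0..n-1.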

#### RUN PLAYS

In [18]:
# PURPOSE:
# - Clean run play types
# INPUT PARAMETERS:
# df_plays    - dataframe - dataframe of plays
# index_start -  integer  - the starting index of the associated input dataframe
#                           to begin cleaning.
# RETURN:
# df_plays - dataframe - dataframe of plays that now has all useful run play
#                        data accessible and clean.

# NOTE:
# - This method is also used to clean
#   1. fumble recoveries for yardage
#   2. fumble recoveries for touchdown
# - I have not yet come across a case where a rushing play was fumbled and someone
#   recovered the ball and scored a touchdown.

def clean_run_plays(df_plays, index_start = None):

  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_run_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Run')]
  else:
    df_run_plays = df_plays[df_plays['PlayOutcome'].str.contains('Run')]

  # Iterating through every run play within 'df_run_plays'
  for idx, play in df_run_plays['PlayDescription'].items():

    ################
    # Play details #
    ################

    # Play Type
    df_plays.loc[idx, 'PlayType'] = 'Run'

    # TimeOnTheClock
    TimeOnTheClock = re.findall(time_on_clock_pattern, play)
    if len(TimeOnTheClock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = TimeOnTheClock[0]

    # Formation
    Formation = re.findall(formation, play)
    if len(Formation) > 0:
      if Formation[0] == 'Aborted':
        pass
      else:
        df_plays.loc[idx, 'Formation'] = Formation[0]

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############################
    # REPORTING IN AS ELIGIBLE #
    ############################

    # I do not think this contains any useful data so I am going to exclude it.
    if play.find('reported in as eligible') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find('reported in as eligible') != -1:
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ###########
    # FUMBLES #
    ###########

    if play.find('FUMBLES') != -1:

      # - I think it would help to comment on each action added
      # - Does this catch fumble recovery touchdowns?
      main_action_patterns = [rusher_pattern, aborted_fumble_pattern, qb_fumble_pattern, defensive_takeaway_run_pattern, handoff_pattern]
      main_cleaning_method = clean_run_plays
      df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                                main_action_patterns,
                                                main_cleaning_method)

      # "df_plays.index.tolist().index(idx)" needed for method usage with slices of original dataframe.
      df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
      index_of_last_added_row = idx + len(df_replacement_rows) - 1

      # returning row after the last index
      if df_run_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_run_plays(df_plays, index_of_last_added_row + 1)

    #############
    #  OFFENSE  #
    #############

    # Rusher
    rusher_patterns = [rusher_pattern, defensive_takeaway_run_pattern, qb_fumble_pattern, touchdown_after_takeaway_pattern, handoff_pattern]
    # Reset for every play so a name from a previous play is never reused
    rusher_name = None
    # Loop through patterns and find the first match
    for pattern in rusher_patterns:
      rusher = re.findall(pattern, play)
      if len(rusher) > 0:
        rusher_name = rusher[0]
        df_plays.loc[idx, 'Rusher'] = rusher_name
        break

    # Direction (located relative to the rusher's name, so it needs a matched rusher)
    rushing_directions = ['guard', 'middle', 'tackle', 'end', 'kneels']
    if rusher_name is not None:
      for i in rushing_directions:
        if play.find(i) != -1:
          start = play.find(rusher_name) + len(rusher_name) + 1
          end = play.find(i) + len(i)
          df_plays.loc[idx, 'Direction'] = play[start:end]
          break

    # Yardage gained
    yardage = re.findall(yardage_gained, play)
    if len(yardage) > 0:
      df_plays.loc[idx, 'Yardage'] = int(yardage[0])
    else:
      df_plays.loc[idx, 'Yardage'] = 0

    #############
    #  DEFENSE  #
    #############

    solo_tackle = re.findall(solo_tackle_pattern, play)
    if len(solo_tackle) > 0:
      df_plays.loc[idx, 'SoloTackle'] = solo_tackle[0]

    shared_tackle = re.findall(shared_tackle_pattern, play)
    if len(shared_tackle) > 0:
      df_plays.at[idx, 'SharedTackle'] = shared_tackle[0]

    assisted_tackle = re.findall(assisted_tackle_pattern, play)
    if len(assisted_tackle) > 0:
      df_plays.loc[idx, 'SoloTackle'] = assisted_tackle[0][0]
      df_plays.loc[idx, 'AssistedTackle'] = assisted_tackle[0][1]

    ##############
    #  INJURIES  #
    ##############

    injuries = re.findall(injury_pattern, play)
    if len(injuries) > 0:
      df_plays.at[idx, 'InjuredPlayers'] = injuries

    #############
    #  PENALTY  #
    #############

    # Accepted Penalty
    if play.find('PENALTY') != -1:
      play_elements = play.split(". ")
      penalties = []
      for i in play_elements:
        if i.find('PENALTY') != -1:
          penalties.append(i)
      df_plays.at[idx, 'AcceptedPenalty'] = penalties

    # Declined Penalty
    if play.find('Penalty') != -1:
      play_elements = play.split(". ")
      penalties = []
      for i in play_elements:
        if i.find('Penalty') != -1:
          penalties.append(i)
      df_plays.at[idx, 'DeclinedPenalty'] = penalties

  # All run plays in 'df_run_plays' have been cleaned (also covers the case where none were found)
  return df_plays
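
The "REVERSED" and "reported in as eligible" blocks repeat the same idea: find the first sentence containing a marker, keep what comes after it, and optionally store what came before. A minimal sketch of that step as one function; `split_at_marker` and the sample description are made up for illustration.

```python
def split_at_marker(play, marker):
    """Split a play description at the first sentence containing `marker`.

    Returns (sentences up to and including the marker sentence, remaining text).
    If the marker is absent, returns ([], play) with the text unchanged.
    """
    sentences = play.split(". ")
    for pos, sentence in enumerate(sentences):
        if marker in sentence:
            return sentences[:pos + 1], ". ".join(sentences[pos + 1:])
    return [], play

# Made-up description, only loosely shaped like the raw data
sample = ("(3:10) A.Passer pass incomplete. "
          "The ruling on the field was REVERSED. "
          "(3:10) A.Passer pass short right to B.Receiver for 7 yards.")
reverse_details, remaining_play = split_at_marker(sample, "REVERSED")
```

Using `enumerate` gives the sentence position directly, whereas `play_elements.index(i)` returns the first equal sentence, which would differ only if the same sentence appeared twice.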

#### 2PT CONVERSIONS

In [19]:
# I NEED A LARGER SAMPLE SIZE FOR MORE PLAYS
# - I need a sample size that has fumbled plays (if that's possible?)
# - I need a sample size that has interception (if that's possible?)
# - I need a sample size with injuries (as dark as that may sound)

def cleaning_2pt_conversion_plays(df_plays, index_start = None):

  # Cut 'df_plays' to begin from 'index_start' to the last 2PT conversion play available in dataframe
  if index_start is not None:
    # Slice with a trailing colon so a dataframe (not a single-row Series) is returned
    df_plays_adjusted = df_plays.loc[index_start:]
    df_2pt_conversion_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('2PT Conversion', case=False)]
  else:
    df_2pt_conversion_plays = df_plays[df_plays['PlayOutcome'].str.contains('2PT Conversion', case=False)]

  # Iterating through every 2PT conversion play within 'df_2pt_conversion_plays'
  for idx, play in df_2pt_conversion_plays['PlayDescription'].items():

    pass_2ptc = re.findall(tp_conversion_pass_pattern, play)
    if len(pass_2ptc) > 0:
      df_plays.loc[idx, 'Passer'] = pass_2ptc[0][0]
      df_plays.loc[idx, 'Receiver'] = pass_2ptc[0][1]
      df_plays.loc[idx, 'PlayType'] = '2PT Conversion Pass'

    rush_2ptc = re.findall(tp_conversion_rush_pattern, play)
    if len(rush_2ptc) > 0:
      df_plays.loc[idx, 'Rusher'] = rush_2ptc[0]
      df_plays.loc[idx, 'PlayType'] = '2PT Conversion Run'
      # Direction
      rushing_directions = ['guard', 'middle', 'tackle', 'end', 'kneels']
      for i in rushing_directions:
        if play.find(i) != -1:
          start = play.find('rushes') + len('rushes') + 1
          end = play.find(i) + len(i)
          df_plays.loc[idx, 'Direction'] = play[start:end]
          break

  return df_plays
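
The direction lookup for rushing 2PT conversions (text between the word 'rushes' and the first direction keyword) can be sketched on its own. This is an illustration under assumptions: `extract_direction` is a hypothetical helper and the sample sentence is made up. Unlike the loop above, it searches for the keyword only after the anchor word.

```python
RUSHING_DIRECTIONS = ['guard', 'middle', 'tackle', 'end', 'kneels']

def extract_direction(play, anchor):
    """Return the text from just after `anchor` through the first matching
    direction keyword, or None if the anchor or every keyword is missing."""
    anchor_pos = play.find(anchor)
    if anchor_pos == -1:
        return None
    start = anchor_pos + len(anchor) + 1  # skip the anchor and the space after it
    for keyword in RUSHING_DIRECTIONS:
        end = play.find(keyword, start)
        if end != -1:
            return play[start:end + len(keyword)]
    return None

# Made-up description for illustration
sample_2pt = "TWO-POINT CONVERSION ATTEMPT. A.Runner rushes up the middle. ATTEMPT SUCCEEDS."
direction = extract_direction(sample_2pt, 'rushes')
```

Returning `None` when nothing is found leaves it to the caller to decide whether 'Direction' should be written at all.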

### DEFENSE CLEANING METHODS

#### INTERCEPTIONS

In [20]:
# PURPOSE:
# - Clean intercepted plays
# INPUT PARAMETERS:
# df_plays    - dataframe - dataframe of plays
# index_start -  integer  - the starting index of the associated input dataframe
#                           to begin cleaning.
# RETURN:
# df_plays - dataframe - dataframe of plays that now has all useful intercepted play
#                        data accessible and clean.

# ROUGH DESIGN
# 1. Narrow dataframe using 'index_start'
#    - This is a recursive method, the narrowing will get smaller and
#      smaller until all 'intercepted' type plays have been cleaned.
# 2. Grab first 'intercepted' play from narrowed dataframe
# 3. Create 2 single row dataframes.
#    a. intended play
#    b. yardage after interception
# 4. Break down play into sentences and clean
#    - Depending on the sentence within the play, will determine which
#      single row dataframe it will go to.
# 5. Combine both dataframes of cleaned data into one dataframe
# 6. Replace old play row with new cleaned multi row
# 7. return clean_intercepted_plays( x , y)
#    - x = updated df_plays
#    - y = index directly after the last clean added row

# Concerns:
# ~ 1 ~
# PLAY SNIP - "(9:53) (Shotgun) D.Watson pass short left intended for E.Moore INTERCEPTED by D.Hill (Z.Carter) at CIN 30."
# - The concern here is (Z.Carter)
#   - I do not know what to categorize this player as? I believe that he had an impact on the play and could possibly be a reason
#     that D.Hill was able to intercept the ball.
#     - Should I create a feature called "ImpactPlayer" or something?
# ~ 2 ~
# PLAY SNIP - "(4:16) (Shotgun) J.Allen pass deep middle intended for S.Diggs INTERCEPTED by J.Whitehead [Q.Williams] at NYJ -1. Touchback."
# - The concern here is 'touchback'
#   - I have no idea what to do with that
# ~ 3 ~
# - I do not have anything in place to handle fumbles. What happens if a QB fumbles, recovers, then throws an interception, and the player who intercepted then fumbles?
# ~ 4 ~
# - There are 2 rows within this single play. (Intended throwing play, yardage after interception)
#   - For both of these rows that represent a single play, they both state that the throwing team has possession
#     - I do not know how this is going to affect future analysis of the data
# - -----> GRAB DATA FOR TOUCHBACKS <-----
# - -----> GRAB DATA FOR PLAYTYPE INTERCEPTION FOR YARDAGE <-----

def clean_intercepted_plays(df_plays, index_start = None):

  # Will cut df_plays starting from index_start (narrowing our search space)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_intercepted_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Interception')]
  else:
    df_intercepted_plays = df_plays[df_plays['PlayOutcome'].str.contains('Interception')]

  # Exit case (If no more 'Interception' type plays are found)
  if df_intercepted_plays.empty:
    return df_plays

  # Retrieve the index and 'PlayDescription' of the first intercepted play in 'df_intercepted_plays'
  # - Process one play per iteration in the recursive method
  idx = df_intercepted_plays.index[0]
  play = df_plays['PlayDescription'].loc[idx]

  ############
  # REVERSES #
  ############

  # In 'PlayDescription' all information before the "reversed" sentence is not needed.
  # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
  if play.find('REVERSED') != -1:
    play_elements = play.split(". ")
    for i in play_elements:
      if i.find("REVERSED") != -1:
        df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
        play = ". ".join(play_elements[play_elements.index(i) + 1:])
        break

  # Create 2 single row dataframes.
  # 1. intended play
  df_intended_play = df_plays.loc[idx].copy()
  df_intended_play = pd.DataFrame([df_intended_play], columns=df_plays.columns)
  df_intended_play.reset_index(drop=True, inplace=True)
  df_intended_play['PlayDescription'] = 'nan'
  # 2. yardage after interception
  df_yardage_after_interception = df_plays.loc[idx].copy()
  df_yardage_after_interception = pd.DataFrame([df_yardage_after_interception], columns=df_plays.columns)
  df_yardage_after_interception.reset_index(drop=True, inplace=True)
  df_yardage_after_interception['PlayDescription'] = 'nan'

  # break down play by sentences.
  play_elements = play.split(". ")

  # Every sentence within 'PlayDescription' except yardage/touchdown after interception
  intended_play_data = []

  # iterate through play_elements
  for i in play_elements:

    ##############################
    # YARDAGE AFTER INTERCEPTION #
    ##############################

    yardage_after_interception = re.findall(defensive_takeaway_run_pattern, i)
    if len(yardage_after_interception) > 0:
      df_yardage_after_interception['PlayDescription'] = i

      # Player running after interception
      df_yardage_after_interception['Rusher'] = yardage_after_interception[0]

      # Playtype?
      # - Should this be a new playtype? Something like "RunAfterInterception"?

      # Yardage gained
      yardage = re.findall(yardage_gained, i)
      if len(yardage) > 0:
        df_yardage_after_interception['Yardage'] = int(yardage[0])
      else:
        df_yardage_after_interception['Yardage'] = 0

      # Who made tackle
      tackler = re.findall(defense_tackler_1_name_pattern, i)
      if len(tackler) > 0:
        df_yardage_after_interception['TackleBy1'] = tackler[0]

      continue

    ################################
    # TOUCHDOWN AFTER INTERCEPTION #
    ################################

    touchdown_after_interception_check = re.findall(touchdown_after_takeaway_pattern, i)
    if len(touchdown_after_interception_check) > 0:
      df_yardage_after_interception['PlayDescription'] = i

      # Player running after interception
      df_yardage_after_interception['Rusher'] = touchdown_after_interception_check[0]

      # Yardage gained
      yardage = re.findall(yardage_gained, i)
      if len(yardage) > 0:
        df_yardage_after_interception['Yardage'] = int(yardage[0])

      # PlayOutcome
      df_yardage_after_interception['PlayOutcome'] = 'Touchdown'

      # IsScoringPlay
      df_yardage_after_interception['IsScoringPlay'] = 1

      continue

    intended_play_data.append(i)

  #################
  # INTENDED PLAY #
  #################

  intended_play_playdescription = ". ".join(intended_play_data)

  df_intended_play['PlayDescription'] = intended_play_playdescription

  df_intended_play['PlayOutcome'] = 'Pass'
  df_intended_play = clean_pass_plays(df_intended_play)
  df_intended_play['PlayOutcome'] =  df_plays['PlayOutcome'].loc[idx]

  # Intercepted by
  intercepted_by = re.findall(interception_name_pattern, intended_play_playdescription)
  if len(intercepted_by) > 0:
    df_intended_play['InterceptedBy'] = intercepted_by[0]

  #############################
  # NEW REPLACEMENT DATAFRAME #
  #############################

  # combine both single row dataframes into one
  if df_yardage_after_interception['PlayDescription'].iloc[0] == 'nan':
    df_cleaned_replacement = df_intended_play
  else:
    df_cleaned_replacement = pd.concat([df_intended_play, df_yardage_after_interception], ignore_index=True)

  # Replace old row with new cleaned dataframe
  df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
  df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
  df_plays = pd.concat([df_before_row, df_cleaned_replacement, df_after_row], ignore_index=True)

  # If this is the last play in the dataset
  if df_intercepted_plays.tail(1).index.tolist()[0] == idx:
    return df_plays
  else:
    return clean_intercepted_plays(df_plays, idx+len(df_cleaned_replacement))
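
Steps 3 to 5 of the rough design (route each sentence either to the intended play or to the return after the interception) can be sketched without pandas. Everything here is an assumption for illustration: `RETURN_PATTERN` is a hypothetical stand-in for `defensive_takeaway_run_pattern`, and the sample description uses made-up names and teams.

```python
import re

# Hypothetical pattern: "<Player> to <TEAM> <yard line> for <n> yards" at the start of a sentence
RETURN_PATTERN = re.compile(r"^([A-Z]\.[A-Za-z'-]+) to [A-Z]{2,3} -?\d+ for (-?\d+) yards?")

def partition_interception(play):
    """Split a play description into the intended-play text and a list of
    return rows (one dict per sentence that looks like a return)."""
    intended, returns = [], []
    for sentence in play.split(". "):
        match = RETURN_PATTERN.search(sentence)
        if match:
            returns.append({
                'PlayDescription': sentence,
                'Rusher': match.group(1),
                'Yardage': int(match.group(2)),
            })
        else:
            intended.append(sentence)
    return ". ".join(intended), returns

# Made-up description for illustration
sample_int = ("(9:53) (Shotgun) A.Passer pass short left intended for B.Receiver "
              "INTERCEPTED by C.Defender at HOM 30. "
              "C.Defender to HOM 45 for 15 yards (B.Receiver).")
intended_text, return_rows = partition_interception(sample_int)
```

An empty `return_rows` list plays the role of the `'nan'` sentinel in 'PlayDescription': it signals that only the intended-play row is needed.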

#### SACKS


In [21]:
def clean_sacked_plays(df_plays, index_start = None):

  # Will cut df_plays starting from index_start (narrowing our search space)
  if index_start is not None:
    # Label-based slice, consistent with the other cleaning methods
    df_plays_adjusted = df_plays.loc[index_start:]
    df_sacked_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Sack')]
  else:
    df_sacked_plays = df_plays[df_plays['PlayOutcome'].str.contains('Sack')]

  for idx, play in df_sacked_plays['PlayDescription'].items():

    ################
    # Play details #
    ################

    # TimeOnTheClock
    TimeOnTheClock = re.findall(time_on_clock_pattern, play)
    if len(TimeOnTheClock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = TimeOnTheClock[0]

    ###########
    # FUMBLES #
    ###########

    if play.find('FUMBLES') != -1:

      main_action_patterns = [passer_name_pattern, defensive_takeaway_run_pattern, touchdown_after_takeaway_pattern]
      main_cleaning_method = clean_sacked_plays
      df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                                main_action_patterns,
                                                main_cleaning_method)

      # "df_plays.index.tolist().index(idx)" needed for method usage with slices of original dataframe.
      df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
      index_of_last_added_row = idx + len(df_replacement_rows) - 1
      # returning row after the last index
      if df_sacked_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_sacked_plays(df_plays, index_of_last_added_row + 1)

    #############
    #  OFFENSE  #
    #############

    # Formation
    Formation = re.findall(formation, play)
    if len(Formation) > 0:
      if Formation[0] == 'Aborted':
        pass
      else:
        df_plays.loc[idx, 'Formation'] = Formation[0]

    # Sacked Passer
    sacked_passer_name = re.findall(passer_name_pattern, play)
    if len(sacked_passer_name) > 0:
      df_plays.loc[idx, 'Passer'] = sacked_passer_name[0]

    # Yardage lost
    yardage = re.findall(yardage_from_sack, play)
    if len(yardage) > 0:
      df_plays.loc[idx, 'Yardage'] = int(yardage[0])

    #############
    #  DEFENSE  #
    #############

    # Solo sack (One person sacked the passer)
    solo_sack = re.findall(defense_tackler_1_name_pattern, play)
    if len(solo_sack) > 0:
      df_plays.loc[idx, 'SackedBy'] = solo_sack[0]

    # Split sack (A sack was given to the passer by multiple defenders)
    split_sack = re.findall(split_sack_pattern, play)
    if len(split_sack) > 0:
      df_plays.at[idx, 'SackedBy'] = split_sack[0]

    ##############
    #  INJURIES  #
    ##############

    injuries = re.findall(injury_pattern, play)
    if len(injuries) > 0:
      df_plays.at[idx, 'InjuredPlayers'] = injuries

    #############
    #  PENALTY  #
    #############

    # Accepted Penalty
    if play.find('PENALTY') != -1:
      play_elements = play.split(". ")
      penalties = []
      for i in play_elements:
        if i.find('PENALTY') != -1:
          penalties.append(i)
      df_plays.at[idx, 'AcceptedPenalty'] = penalties

    # Declined Penalty
    if play.find('Penalty') != -1:
      play_elements = play.split(". ")
      penalties = []
      for i in play_elements:
        if i.find('Penalty') != -1:
          penalties.append(i)
      df_plays.at[idx, 'DeclinedPenalty'] = penalties

  # All sacked plays have been cleaned (also covers the case where none were found)
  return df_plays
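
The accepted/declined penalty blocks are identical in the pass, run and sack methods, so they could live in one place. A minimal sketch that keeps the same case-sensitive convention used above ('PENALTY' for accepted, 'Penalty' for declined); `split_penalties` and the sample description are made up for illustration.

```python
def split_penalties(play):
    """Return (accepted, declined) penalty sentences from a play description.

    Relies on the case-sensitive convention used in this notebook:
    'PENALTY' marks an accepted penalty, 'Penalty' a declined one.
    """
    accepted, declined = [], []
    for sentence in play.split(". "):
        if 'PENALTY' in sentence:
            accepted.append(sentence)
        if 'Penalty' in sentence:
            declined.append(sentence)
    return accepted, declined

# Made-up description for illustration
sample_pen = ("(2:00) A.Runner left end to HOM 25 for 5 yards. "
              "PENALTY on HOM-D.Lineman, Offensive Holding, 10 yards, enforced at HOM 20. "
              "Penalty on AWY-E.Corner, Defensive Offside, declined.")
accepted_penalties, declined_penalties = split_penalties(sample_pen)
```

A caller would then write each list to 'AcceptedPenalty' or 'DeclinedPenalty' only when it is non-empty, as the methods above already do.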

### SPECIAL TEAMS CLEANING METHODS

#### PUNTS

In [22]:
# A punt playtype will be split into 2 or more rows
#   1. The Punt
#      - 'PlayType'
#         - Punt
#      - 'Punter'
#      - 'LongSnapper'
#   2. The Punt Return
#      - 'PlayType'
#         - Punt Return
#      - 'PlayOutcome'
#         - x yard punt return
#         - fair catch
#         - touchback
#         - out of bounds
#         - downed
#      - 'Returner'
#      - 'Receiver'
#      - 'Yardage'
#      - 'TackleBy1'
#      - 'TackleBy2'
#      - 'DownedBy'

# I need to figure out a fake punt
# I need to figure out a punt that has been blocked
# I need to figure out what to do when a fumble happens
# I need to figure out what to do when a touchdown happens
# Maybe in the future, to make this more space friendly, I can combine features
# - Such as 'Punter' & 'LongSnapper' OR 'TackleBy1' & 'DownedBy'
#   OR 'Returner' & 'Receiver'

def clean_punt_plays(df_plays, index_start = None):

  # Will cut df_plays starting from index_start (narrowing our search space)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_punt_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Punt')]
  else:
    df_punt_plays = df_plays[df_plays['PlayOutcome'].str.contains('Punt')]

  if df_punt_plays.empty:
    return df_plays

  # Retrieve the index and 'PlayDescription' of the first punt play in 'df_punt_plays'
  # - Process one play per iteration in the recursive method
  idx = df_punt_plays.index[0]
  play = df_plays['PlayDescription'].loc[idx]
  row_copy = df_plays.loc[idx].copy()

  ############
  # REVERSES #
  ############

  # In 'PlayDescription' all information before the "reversed" sentence is not needed.
  # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
  if play.find('REVERSED') != -1:
    play_elements = play.split(". ")
    for i in play_elements:
      if i.find("REVERSED") != -1:
        df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
        play = ". ".join(play_elements[play_elements.index(i) + 1:])
        break

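  # Create 2 single row dataframes.
  # - Re-copy the row here so that 'ReverseDetails' (set above) carries over to the replacement rows.
  row_copy = df_plays.loc[idx].copy()
  # 1. The Punt
  df_punt = row_copy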
  df_punt = pd.DataFrame([df_punt], columns=df_plays.columns)
  df_punt.reset_index(drop=True, inplace=True)
  df_punt['PlayDescription'] = 'nan'
  # 2. The Punt Return
  df_punt_return = row_copy
  df_punt_return = pd.DataFrame([df_punt_return], columns=df_plays.columns)
  df_punt_return.reset_index(drop=True, inplace=True)
  df_punt_return['PlayDescription'] = 'nan'

  #############
  # PLAY TIME #
  #############

  time = re.findall(time_on_clock_pattern, play)
  if len(time) > 0:
    df_punt.loc[0, 'TimeOnTheClock'] = time[0]

  # break down play by sentences.
  play_elements = play.split(". ")

  accepted_penalties = []
  declined_penalties = []

  for i in play_elements:

    ########
    # PUNT #
    ########

    # All data needed for first row in replacement dataframe
    punt = re.findall(punting_pattern, i)
    if len(punt) > 0:
      df_punt['PlayType'] = 'Punt'
      df_punt['PlayDescription'] = i
      df_punt['Kicker'] = punt[0][0]
      df_punt['Yardage'] = int(punt[0][1])
      df_punt['LongSnapper'] = punt[0][2]
      # Touchback
      if i.find('Touchback') != -1:
        df_punt['PlayOutcome'] = 'Touchback'
        continue
      # Out of bounds
      if i.find('out of bounds') != -1:
        df_punt['PlayOutcome'] = 'out of bounds'
        continue
      # Downed by
      if i.find('downed by') != -1:
        df_punt['PlayOutcome'] = 'downed'
        downed_by = re.findall(kick_downed_by_pattern, i)
        if len(downed_by) > 0:
          # Strip the team abbreviation from the player name (e.g. IND-G.Stuard -> G.Stuard)
          df_punt['DownedBy'] = downed_by[0][downed_by[0].find("-")+1:]
        continue
      # fair catch
      if i.find('fair catch') != -1:
        df_punt['PlayOutcome'] = 'fair catch'
        fair_catch_by = re.findall(punt_fair_catch_pattern, i)
        if len(fair_catch_by) > 0:
          df_punt['Returner'] = fair_catch_by[0]
        continue
      continue

    ######################################
    # PUNT RETURN (Including touchdowns) #
    ######################################

    # All data needed for the second row within replacement dataframe
    # - Second row only needed when there is a punt return for yardage
    # - I think I am going to run into trouble if there is a fumble recovery for yardage
    punt_return_patterns = [punt_return_pattern, touchdown_after_takeaway_pattern]
    for return_pattern in punt_return_patterns:
      punt_return = re.findall(return_pattern, i)
      if len(punt_return) > 0:
        df_punt_return['PlayDescription'] = i
        df_punt_return['PlayOutcome'] = 'Run'
        df_punt_return = clean_run_plays(df_punt_return)
        df_punt_return['PlayOutcome'] = row_copy['PlayOutcome']
        df_punt_return['PlayType'] = 'Punt Return'
        df_punt_return['Rusher'] = 'nan'
        df_punt_return['Returner'] = punt_return[0]
        break

    #############
    #  PENALTY  #
    #############

    # Accepted Penalty
    if i.find('PENALTY') != -1:
      accepted_penalties.append(i)

    # Declined Penalty
    if i.find('Penalty') != -1:
      declined_penalties.append(i)

    # If playoutcome is the same as the original play, then run second sentence through
    # run cleaning method.

  if len(accepted_penalties) > 0:
    df_punt.at[0, 'AcceptedPenalty'] = accepted_penalties
  if len(declined_penalties) > 0:
    df_punt.at[0, 'DeclinedPenalty'] = declined_penalties

  #############################
  # NEW REPLACEMENT DATAFRAME #
  #############################

  if df_punt_return['PlayDescription'].iloc[0] == 'nan':
    df_replacement_rows = df_punt
  else:
    df_replacement_rows = pd.concat([df_punt, df_punt_return], ignore_index=True)

  df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
  df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
  df_plays = pd.concat([df_before_row, df_replacement_rows, df_after_row], ignore_index=True)

  if df_punt_plays.tail(1).index.tolist()[0] == idx:
    return df_plays
  else:
    return clean_punt_plays(df_plays, idx+len(df_replacement_rows))
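
The before/replacement/after concatenation that ends this method recurs in most of the cleaning methods. Below is a minimal sketch of how it could be factored into a single helper. `replace_row` is a hypothetical name, not a function defined in this notebook, and the sketch assumes the index labels of `df_plays` are unique.

```python
import pandas as pd

def replace_row(df_plays, idx, df_replacement_rows):
  """Return df_plays with the row labelled idx swapped for df_replacement_rows."""
  # Positional location of the label (assumes a unique index)
  pos = df_plays.index.get_loc(idx)
  df_before = df_plays.iloc[:pos]
  df_after = df_plays.iloc[pos + 1:]
  return pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)

# Toy data: the single punt row becomes a punt row plus a punt return row
df = pd.DataFrame({'PlayType': ['Run', 'Punt', 'Pass']})
replacement = pd.DataFrame({'PlayType': ['Punt', 'Punt Return']})
result = replace_row(df, 1, replacement)
print(result['PlayType'].tolist())
```

Looking up the position once with `Index.get_loc` stands in for the two `df_plays.index.tolist().index(idx)` calls used in the methods here.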

#### KICKOFFS

In [23]:
# A kickoff playtype will be split into 1 or more rows

# I need to figure out an onside kick (recovered by kicking team)
# I need to figure out fumbled kickoff returns
# I need to figure out returns for a touchdown
# injuries?

# Method can mirror punts method.

def clean_kickoff_plays(df_plays, index_start = None):

  # Will cut df_plays starting from index_start (narrowing our search space)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_kickoff_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('kickoff', case=False)]
  else:
    df_kickoff_plays = df_plays[df_plays['PlayOutcome'].str.contains('kickoff', case=False)]

  # exit case
  if df_kickoff_plays.empty:
    return df_plays

  # Retrieve the index and 'PlayDescription' of the first kickoff play in 'df_kickoff_plays'
  # - Process one play per iteration in the recursive method
  idx = df_kickoff_plays.index[0]
  play = df_plays['PlayDescription'].loc[idx]

  ############
  # REVERSES #
  ############

  # In 'PlayDescription' all information before the "reversed" sentence is not needed.
  # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
  if play.find('REVERSED') != -1:
    play_elements = play.split(". ")
    for i in play_elements:
      if i.find("REVERSED") != -1:
        df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
        play = ". ".join(play_elements[play_elements.index(i) + 1:])
        break

  ###########
  # FUMBLES #
  ###########

  if play.find('FUMBLES') != -1:
    main_action_patterns = [kickoff_pattern, kick_return_pattern, defensive_takeaway_run_pattern, handoff_pattern]
    main_cleaning_method = clean_kickoff_plays
    df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                              main_action_patterns,
                                              main_cleaning_method)

    df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
    df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
    df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
    index_of_last_added_row = idx + len(df_replacement_rows) - 1

    # returning row after the last index
    if df_kickoff_plays.tail(1).index.tolist()[0] == idx:
      return df_plays
    else:
      return clean_kickoff_plays(df_plays, index_of_last_added_row + 1)

  # Create 2 single row dataframes.
  # 1. The Kickoff
  df_kickoff = df_plays.loc[idx].copy()
  df_kickoff = pd.DataFrame([df_kickoff], columns=df_plays.columns)
  df_kickoff.reset_index(drop=True, inplace=True)
  df_kickoff['PlayDescription'] = 'nan'
  # 2. The Kickoff Return
  df_kickoff_return = df_plays.loc[idx].copy()
  df_kickoff_return = pd.DataFrame([df_kickoff_return], columns=df_plays.columns)
  df_kickoff_return.reset_index(drop=True, inplace=True)
  df_kickoff_return['PlayDescription'] = 'nan'

  # break down play by sentences.
  play_elements = play.split(". ")

  accepted_penalties = []
  declined_penalties = []

  for i in play_elements:

    ###########
    # KICKOFF #
    ###########

    kickoff = re.findall(kickoff_pattern, i)
    if len(kickoff) > 0:
      df_kickoff['PlayType'] = 'Kickoff'
      df_kickoff['PlayDescription'] = i
      df_kickoff['Kicker'] = kickoff[0][0]
      df_kickoff['Yardage'] = int(kickoff[0][1])
      if i.find('Touchback') != -1:
        df_kickoff['PlayOutcome'] = 'Touchback'
        continue
      # I need to figure out what the difference will be when the kicking team recovers
      if i.find('onside') != -1:
        df_kickoff['PlayOutcome'] = 'onside'
        downed_by = re.findall(kick_downed_by_pattern, i)
        if len(downed_by) > 0:
          df_kickoff['DownedBy'] = downed_by[0][downed_by[0].find("-")+1:]
        continue
      continue

    #########################################
    # KICKOFF RETURN (Including touchdowns) #
    #########################################

    kick_return_patterns = [kick_return_pattern, touchdown_after_takeaway_pattern]
    for return_pattern in kick_return_patterns:
      kick_return = re.findall(return_pattern, i)
      if len(kick_return) > 0:
        df_kickoff_return['PlayDescription'] = i
        df_kickoff_return['PlayOutcome'] = 'Run'
        df_kickoff_return = clean_run_plays(df_kickoff_return)
        df_kickoff_return['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
        df_kickoff_return['PlayType'] = 'Kickoff Return'
        df_kickoff_return['Rusher'] = 'nan'
        df_kickoff_return['Returner'] = kick_return[0][0] # I think this will be a problem once I get a dataset with kick return touchdowns
        break

    #############
    #  PENALTY  #
    #############

    # Accepted Penalty
    if i.find('PENALTY') != -1:
      accepted_penalties.append(i)

    # Declined Penalty
    if i.find('Penalty') != -1:
      declined_penalties.append(i)

    # If playoutcome is the same as the original play, then run second sentence through
    # run cleaning method.

  if len(accepted_penalties) > 0:
    df_kickoff.at[0, 'AcceptedPenalty'] = accepted_penalties
  if len(declined_penalties) > 0:
    df_kickoff.at[0, 'DeclinedPenalty'] = declined_penalties

  #############################
  # NEW REPLACEMENT DATAFRAME #
  #############################

  if df_kickoff_return['PlayDescription'].iloc[0] == 'nan':
    df_replacement_rows = df_kickoff
  else:
    df_replacement_rows = pd.concat([df_kickoff, df_kickoff_return], ignore_index=True)

  df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
  df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
  df_plays = pd.concat([df_before_row, df_replacement_rows, df_after_row], ignore_index=True)

  if df_kickoff_plays.tail(1).index.tolist()[0] == idx:
    return df_plays
  else:
    return clean_kickoff_plays(df_plays, idx+len(df_replacement_rows))
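
Each recursive call above handles one kickoff and then restarts the search just past the rows it inserted. The same traversal can be sketched as a loop, which would avoid adding one call-stack frame per play. `expand_rows`, `is_target` and `toy_expand` are hypothetical names used only for illustration, and the toy expansion stands in for the real kickoff parsing.

```python
import pandas as pd

def expand_rows(df_plays, is_target, expand):
  """Replace every row matching is_target with the rows returned by expand."""
  pos = 0
  while pos < len(df_plays):
    row = df_plays.iloc[pos]
    if is_target(row):
      replacement = expand(row)
      df_plays = pd.concat([df_plays.iloc[:pos], replacement, df_plays.iloc[pos + 1:]],
                           ignore_index=True)
      pos += len(replacement)  # skip past the rows that were just inserted
    else:
      pos += 1
  return df_plays

def toy_expand(row):
  # Stand-in for the real parsing: one kickoff row and one return row
  kick = row.copy()
  kick['PlayType'] = 'Kickoff'
  ret = row.copy()
  ret['PlayType'] = 'Kickoff Return'
  return pd.DataFrame([kick, ret])

df = pd.DataFrame({'PlayOutcome': ['Kickoff', '5 Yard Run', 'Kickoff'],
                   'PlayType': ['nan', 'nan', 'nan']})
cleaned = expand_rows(df, lambda row: 'kickoff' in row['PlayOutcome'].lower(), toy_expand)
print(cleaned['PlayType'].tolist())
```

Advancing `pos` by `len(replacement)` plays the same role as passing `idx+len(df_replacement_rows)` to the next recursive call.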

### SCORING CLEANING METHODS

#### TOUCHDOWNS

In [157]:
# Still need to decide whether every row of a multi-row play should have
# 'IsScoringDrive' = 1, 'IsScoringPlay' = 1, 'PlayOutcome' = *teamname* Touchdown
# - The argument against: if a QB throws a pick-six, it would not count as a
#   "Scoring Drive" for his team, only for the opposing team.
# - For consistency, for now every row of the play will have
#   'IsScoringDrive' = 1, 'IsScoringPlay' = 1, 'PlayOutcome' = *teamname* Touchdown

def clean_touchdown_plays(df_plays, index_start=None):

  # Cut 'df_plays' to begin from 'index_start' to the last touchdown play available in dataframe
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_touchdown_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Touchdown')]
  else:
    df_touchdown_plays = df_plays[df_plays['PlayOutcome'].str.contains('Touchdown')]

  # Iterating through every touchdown play within 'df_touchdown_plays'
  for idx, play in df_touchdown_plays['PlayDescription'].items():

    # - Once I figure out what kind of touchdown it was, I will be able to
    #   determine the 'PlayType'

    ##########################
    # PUNT RETURN TOUCHDOWNS #
    ##########################

    punt_play = re.findall(punting_pattern, play)
    if len(punt_play) > 0:

      # creating a copy of the punt touchdown play and cleaning the copy
      punt_touchdown_row = df_plays.loc[idx].copy()
      punt_touchdown_row['PlayOutcome'] = 'Punt'
      punt_touchdown_row['IsScoringPlay'] = 1 # This will only be the value for the team that punted the ball
      punt_touchdown_row = pd.DataFrame([punt_touchdown_row], columns=df_plays.columns)
      punt_touchdown_row.reset_index(drop=True, inplace=True)
      punt_touchdown_row = clean_punt_plays(punt_touchdown_row)
      punt_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, punt_touchdown_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_touchdown_plays(df_plays, idx+len(punt_touchdown_row))

    #####################################
    # SACKED FUMBLE RECOVERY TOUCHDOWNS #
    #####################################

    if play.find("sacked") != -1:

      # creating a copy of the sack touchdown play and cleaning the copy
      sacked_touchdown_row = df_plays.loc[idx].copy()
      sacked_touchdown_row['PlayOutcome'] = 'Sack'
      sacked_touchdown_row['IsScoringPlay'] = 1
      sacked_touchdown_row = pd.DataFrame([sacked_touchdown_row], columns=df_plays.columns)
      sacked_touchdown_row.reset_index(drop=True, inplace=True)
      sacked_touchdown_row = clean_sacked_plays(sacked_touchdown_row)
      sacked_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row (Original row can sometimes be replaced with multiple rows)
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, sacked_touchdown_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_touchdown_plays(df_plays, idx+len(sacked_touchdown_row))

    ##########################
    # INTERCEPTED TOUCHDOWNS #
    ##########################

    # Still need to clean intercepted play types
    if play.find("INTERCEPTED") != -1:

      # creating a copy of the intercepted touchdown play and cleaning the copy
      intercepted_touchdown_row = df_plays.loc[idx].copy()
      intercepted_touchdown_row['PlayOutcome'] = 'Interception'
      intercepted_touchdown_row['IsScoringPlay'] = 1 # This will only be the value for the team that threw the interception
      intercepted_touchdown_row = pd.DataFrame([intercepted_touchdown_row], columns=df_plays.columns)
      intercepted_touchdown_row.reset_index(drop=True, inplace=True)
      intercepted_touchdown_row = clean_intercepted_plays(intercepted_touchdown_row)
      intercepted_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, intercepted_touchdown_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_touchdown_plays(df_plays, idx+len(intercepted_touchdown_row))

    ######################
    # PASSING TOUCHDOWNS #
    ######################

    # If a play has a passer throwing the ball, I am assuming it is a passing play
    passing_play = re.findall(passer_name_pattern, play)
    if len(passing_play) > 0 and play.find("sacked") == -1:

      # creating a copy of the passing touchdown play row and cleaning the copy
      passing_touchdown_row = df_plays.loc[idx].copy()
      passing_touchdown_row['PlayType'] = 'Pass'
      passing_touchdown_row['PlayOutcome'] = 'Pass'
      passing_touchdown_row['IsScoringPlay'] = 1
      passing_touchdown_row = pd.DataFrame([passing_touchdown_row], columns=df_plays.columns)
      passing_touchdown_row = clean_pass_plays(passing_touchdown_row)
      passing_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, passing_touchdown_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_touchdown_plays(df_plays, idx+len(passing_touchdown_row))

    ######################
    # RUSHING TOUCHDOWNS #
    ######################

    # Rusher
    rusher_patterns = [rusher_pattern, defensive_takeaway_run_pattern]
    # Loop through patterns and find the first match
    for pattern in rusher_patterns:
      rusher = re.findall(pattern, play)
      if len(rusher) > 0:
        # creating a copy of the rushing touchdown play row and cleaning the copy
        rushing_touchdown_row = df_plays.loc[idx].copy()
        rushing_touchdown_row['PlayType'] = 'Run'
        rushing_touchdown_row['PlayOutcome'] = 'Run'
        rushing_touchdown_row['IsScoringPlay'] = 1
        rushing_touchdown_row = pd.DataFrame([rushing_touchdown_row], columns=df_plays.columns)
        rushing_touchdown_row = clean_run_plays(rushing_touchdown_row)
        rushing_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

        # Replacing old row with cleaned row
        df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
        df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
        df_plays = pd.concat([df_before_row, rushing_touchdown_row, df_after_row], ignore_index=True)

        # Recursion to update 'df_plays'
        if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
          return df_plays
        else:
          return clean_touchdown_plays(df_plays, idx+len(rushing_touchdown_row))

  # No touchdown plays left to clean (or none matched a known pattern)
  return df_plays

#### FIELD GOALS

In [189]:
# I need an example of when a player returns the field goal for yardage
# I need a larger sample size for "Blocked" field goals
# I need to figure out what to do if someone fumbles a recovery
# I need to figure out what to do on a trick play (e.g. holder runs out with the ball)
# - INCOMPLETE. NEED LARGER SAMPLE SIZE

def clean_field_goal_plays(df_plays, index_start = None):

  # Adjusting df_plays to start cleaning at a specified index (index_start)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    # Locating all field goal plays within dataframe
    df_field_goal_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Field Goal')]
  else:
    # Locating all field goal plays within dataframe
    df_field_goal_plays = df_plays[df_plays['PlayOutcome'].str.contains('Field Goal')]

  for idx, play in df_field_goal_plays['PlayDescription'].items():

    play_elements = play.split(". ")

    ###################
    # EXTRA PLAY DATA #
    ###################

    if len(play_elements) > 1:

      accepted_penalties = []
      declined_penalties = []
      injured_players = []

      for i in play_elements:

        # Accepted Penalty
        if i.find('PENALTY') != -1:
          accepted_penalties.append(i)

        # Declined Penalty
        if i.find('Penalty') != -1:
          declined_penalties.append(i)

        # Injuries
        injury_on_play = re.findall(injury_pattern, i)
        if len(injury_on_play) > 0:
          injured_players.append(injury_on_play[0])

      if len(accepted_penalties) > 0:
        df_plays.at[idx, 'AcceptedPenalty'] = accepted_penalties
      if len(declined_penalties) > 0:
        df_plays.at[idx, 'DeclinedPenalty'] = declined_penalties
      if len(injured_players) > 0:
        df_plays.at[idx, 'InjuredPlayers'] = injured_players

    # Time of play
    time_on_clock = re.findall(time_on_clock_pattern, play)
    if len(time_on_clock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = time_on_clock[0]

    #########################
    # FIELD GOAL SITUATIONS #
    #########################

    # Field goal good
    field_goal_good = re.findall(field_goal_good_pattern, play)
    if len(field_goal_good) > 0:
      df_plays.loc[idx, 'PlayOutcome'] = 'Field Goal Good'
      df_plays.loc[idx, 'PlayType'] = 'Field Goal'
      df_plays.loc[idx, 'Kicker'] = field_goal_good[0][0]
      df_plays.loc[idx, 'Yardage'] = int(field_goal_good[0][1])
      df_plays.loc[idx, 'LongSnapper'] = field_goal_good[0][2]
      df_plays.loc[idx, 'Holder'] = field_goal_good[0][3]
      continue

    # Field goal no good
    field_goal_no_good = re.findall(field_goal_no_good_pattern, play)
    if len(field_goal_no_good) > 0:
      df_plays.loc[idx, 'PlayOutcome'] = 'Field Goal No Good'
      df_plays.loc[idx, 'PlayType'] = 'Field Goal'
      df_plays.loc[idx, 'Kicker'] = field_goal_no_good[0][0]
      df_plays.loc[idx, 'Yardage'] = int(field_goal_no_good[0][1])
      df_plays.loc[idx, 'Direction'] = field_goal_no_good[0][2]
      df_plays.loc[idx, 'LongSnapper'] = field_goal_no_good[0][3]
      df_plays.loc[idx, 'Holder'] = field_goal_no_good[0][4]
      continue

    # Field goal blocked
    # I NEED A LARGER SAMPLE SIZE TO CORRECTLY CLEAN THESE
    field_goal_blocked = re.findall(field_goal_blocked_pattern, play)
    if len(field_goal_blocked) > 0:
      df_plays.loc[idx, 'PlayOutcome'] = 'Field Goal Blocked'
      df_plays.loc[idx, 'PlayType'] = 'Field Goal'
      df_plays.loc[idx, 'Kicker'] = field_goal_blocked[0][0]
      df_plays.loc[idx, 'Yardage'] = int(field_goal_blocked[0][1])
      df_plays.loc[idx, 'BlockedBy'] = field_goal_blocked[0][2]
      df_plays.loc[idx, 'LongSnapper'] = field_goal_blocked[0][3]
      df_plays.loc[idx, 'Holder'] = field_goal_blocked[0][4]
      continue

  return df_plays
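
The field goal branches read the kicker, yardage, long snapper and holder out of the regex match by tuple position. As a hedged illustration of what such a pattern might look like, the sketch below uses named groups instead. The pattern and the sample sentence are assumptions written for this example. The notebook's actual `field_goal_good_pattern` is defined elsewhere and may differ.

```python
import re

# Hypothetical pattern, NOT the notebook's field_goal_good_pattern
field_goal_good_sketch = re.compile(
  r"(?P<Kicker>\S+) (?P<Yardage>\d+) yard field goal is GOOD, "
  r"Center-(?P<LongSnapper>[^,]+), Holder-(?P<Holder>.+?)\.?$"
)

# Invented sentence in the general style of a play description
sample_play = "(11:12) K.Kicker 35 yard field goal is GOOD, Center-L.Snapper, Holder-P.Holder."

match = field_goal_good_sketch.search(sample_play)
fields = match.groupdict() if match else {}
if fields:
  fields['Yardage'] = int(fields['Yardage'])  # same int conversion as in the method above
print(fields)
```

With named groups, each value is looked up by its column name, so adding a group to the pattern does not shift the positions of the others.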

#### EXTRA POINT

In [209]:
def clean_extra_point_plays(df_plays, index_start = None):

  # Adjusting df_plays to start cleaning at a specified index (index_start)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    # Locating all extra point plays within dataframe
    df_extra_point_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Extra Point')]
  else:
    # Locating all extra point plays within dataframe
    df_extra_point_plays = df_plays[df_plays['PlayOutcome'].str.contains('Extra Point')]

  for idx, play in df_extra_point_plays['PlayDescription'].items():

    play_elements = play.split(". ")

    ###################
    # EXTRA PLAY DATA #
    ###################

    if len(play_elements) > 1:

      accepted_penalties = []
      declined_penalties = []
      injured_players = []

      for i in play_elements:

        # Accepted Penalty
        if i.find('PENALTY') != -1:
          accepted_penalties.append(i)

        # Declined Penalty
        if i.find('Penalty') != -1:
          declined_penalties.append(i)

        # Injuries
        injury_on_play = re.findall(injury_pattern, i)
        if len(injury_on_play) > 0:
          injured_players.append(injury_on_play[0])

      if len(accepted_penalties) > 0:
        df_plays.at[idx, 'AcceptedPenalty'] = accepted_penalties
      if len(declined_penalties) > 0:
        df_plays.at[idx, 'DeclinedPenalty'] = declined_penalties
      if len(injured_players) > 0:
        df_plays.at[idx, 'InjuredPlayers'] = injured_players

    ##########################
    # EXTRA POINT SITUATIONS #
    ##########################

    # Extra point good
    extra_point_good = re.findall(extra_point_good_pattern, play)
    if len(extra_point_good) > 0:
      df_plays.loc[idx, 'PlayOutcome'] = 'Extra Point Good'
      df_plays.loc[idx, 'PlayType'] = 'Extra Point'
      df_plays.loc[idx, 'Kicker'] = extra_point_good[0][0]
      df_plays.loc[idx, 'LongSnapper'] = extra_point_good[0][1]
      df_plays.loc[idx, 'Holder'] = extra_point_good[0][2]
      continue

    # Extra point no good
    extra_point_no_good = re.findall(extra_point_no_good_pattern, play)
    if len(extra_point_no_good) > 0:
      df_plays.loc[idx, 'PlayOutcome'] = 'Extra Point No Good'
      df_plays.loc[idx, 'PlayType'] = 'Extra Point'
      df_plays.loc[idx, 'Kicker'] = extra_point_no_good[0][0]
      df_plays.loc[idx, 'Direction'] = extra_point_no_good[0][1]
      df_plays.loc[idx, 'LongSnapper'] = extra_point_no_good[0][2]
      df_plays.loc[idx, 'Holder'] = extra_point_no_good[0][3]
      continue

  return df_plays
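
The penalty bookkeeping is repeated almost verbatim in the punt, kickoff, field goal and extra point methods. A minimal sketch of a shared helper follows, using the same convention as the code above: sentences containing upper-case `PENALTY` are treated as accepted and sentences containing title-case `Penalty` as declined. `split_penalties` is a hypothetical name and the sample description is invented.

```python
def split_penalties(play_description):
  """Split a play description into accepted and declined penalty sentences."""
  accepted_penalties = []
  declined_penalties = []
  for sentence in play_description.split(". "):
    # The checks are case-sensitive, so one sentence cannot match both
    if 'PENALTY' in sentence:
      accepted_penalties.append(sentence)
    if 'Penalty' in sentence:
      declined_penalties.append(sentence)
  return accepted_penalties, declined_penalties

# Invented description for illustration only
sample_play = ("K.Example kicks 65 yards from KC 35 to end zone, Touchback. "
               "PENALTY on DET-A.Sample, Offside on Free Kick, 5 yards, enforced at DET 25. "
               "Penalty on DET-B.Sample, Illegal Formation, declined.")
accepted, declined = split_penalties(sample_play)
print(accepted)
print(declined)
```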

### OTHER CLEANING METHODS

#### FUMBLE PLAYS

In [27]:
# What about punt returns?

def clean_fumble_plays(df_plays, index_start=None):

  # Cut 'df_plays' to begin from 'index_start' to the last fumble play available in dataframe
  if index_start is not None:
    # df_plays_adjusted = df_plays.iloc[df_plays.index.tolist().index(index_start):]
    df_plays_adjusted = df_plays.loc[index_start:]
    df_fumble_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('fumble', case=False)]
  else:
    df_fumble_plays = df_plays[df_plays['PlayOutcome'].str.contains('fumble', case=False)]

  for idx, play in df_fumble_plays['PlayDescription'].items():

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############################
    # REPORTING IN AS ELIGIBLE #
    ############################

    # I do not think this contains any useful data so I am going to exclude it.
    if play.find('reported in as eligible') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find('reported in as eligible') != -1:
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    initial_action = play.split(". ")[0]

    ##################
    # PASSING FUMBLE #
    ##################

    fumble_pass = re.findall(receiver_pattern, initial_action)
    if len(fumble_pass) > 0:

      # creating a copy of the passing fumbled play row and cleaning the copy
      # passing_fumble_row = df_plays.iloc[idx].copy()
      passing_fumble_row = df_plays.loc[idx].copy()
      passing_fumble_row['PlayOutcome'] = 'Pass'
      passing_fumble_row = pd.DataFrame([passing_fumble_row], columns=df_plays.columns)
      cleaned_passing_fumble_row = clean_pass_plays(passing_fumble_row)
      cleaned_passing_fumble_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_passing_fumble_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_fumble_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_fumble_plays(df_plays, idx+len(cleaned_passing_fumble_row))

    ##################
    # RUSHING FUMBLE #
    ##################

    fumble_rush = re.findall(rusher_pattern, initial_action)
    qb_fumble = re.findall(qb_fumble_pattern, initial_action)
    fumble_aborted = initial_action.find('Aborted')
    if len(fumble_rush) > 0 or fumble_aborted != -1 or len(qb_fumble) > 0:

      # creating a copy of the rushing fumbled play row and cleaning the copy
      rushing_fumble_row = df_plays.loc[idx].copy()
      rushing_fumble_row['PlayOutcome'] = 'Run'
      rushing_fumble_row = pd.DataFrame([rushing_fumble_row], columns=df_plays.columns)
      cleaned_rushing_fumble_row = clean_run_plays(rushing_fumble_row)
      cleaned_rushing_fumble_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_rushing_fumble_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_fumble_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_fumble_plays(df_plays, idx+len(cleaned_rushing_fumble_row))

    #################
    # SACKED FUMBLE #
    #################

    if initial_action.find('sacked') != -1:

      # creating a copy of the sacked fumble play row and cleaning the copy
      sacked_fumble_row = df_plays.loc[idx].copy()
      sacked_fumble_row['PlayOutcome'] = 'Sack'
      sacked_fumble_row = pd.DataFrame([sacked_fumble_row], columns=df_plays.columns)
      cleaned_sacked_fumble_row = clean_sacked_plays(sacked_fumble_row)
      # cleaned_sacked_fumble_row['PlayOutcome'] = df_plays['PlayOutcome'].iloc[idx]
      cleaned_sacked_fumble_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_sacked_fumble_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_fumble_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_fumble_plays(df_plays, idx+len(cleaned_sacked_fumble_row))

    ##################
    # KICKOFF FUMBLE #
    ##################

    kickoff_fumble = re.findall(kickoff_pattern, initial_action)
    if len(kickoff_fumble) > 0:

      # creating a copy of the kickoff fumbled play row and cleaning the copy
      kickoff_fumble_row = df_plays.loc[idx].copy()
      kickoff_fumble_row['PlayOutcome'] = 'kickoff'
      kickoff_fumble_row = pd.DataFrame([kickoff_fumble_row], columns=df_plays.columns)
      cleaned_kickoff_fumble_row = clean_kickoff_plays(kickoff_fumble_row)
      cleaned_kickoff_fumble_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_kickoff_fumble_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_fumble_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_fumble_plays(df_plays, idx+len(cleaned_kickoff_fumble_row))

  return df_plays
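
The REVERSED handling appears in several methods: everything up to and including the sentence containing "REVERSED" is stored as reverse details, and only the remainder is cleaned. A minimal sketch of that split as a standalone function is shown below. `split_reversed` is a hypothetical name and the sample description is invented.

```python
def split_reversed(play):
  """Return (reverse_details, remaining_play) for a play description."""
  play_elements = play.split(". ")
  for position, sentence in enumerate(play_elements):
    if "REVERSED" in sentence:
      reverse_details = play_elements[:position + 1]
      remaining_play = ". ".join(play_elements[position + 1:])
      return reverse_details, remaining_play
  # No reversal on this play: nothing to store, clean the whole description
  return [], play

# Invented description for illustration only
sample_play = ("(2:05) A.Runner up the middle to KC 20 for 5 yards. "
               "FUMBLES, recovered by DET-B.Defender at KC 20. "
               "The Replay Official reviewed the fumble ruling, and the play was REVERSED. "
               "(2:05) A.Runner up the middle to KC 22 for 3 yards (C.Tackler).")
reverse_details, remaining_play = split_reversed(sample_play)
print(len(reverse_details))
print(remaining_play)
```

Using `enumerate` gives the sentence position directly, whereas `play_elements.index(i)` returns the first equal sentence, which could differ if a description ever repeated a sentence.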

#### PENALTY PLAYS

In [28]:
# This probably does not cover every possible penalty play.
# For example, this sample of plays contains no penalties during kickoffs,
# even though penalties on kickoffs do occur.

def clean_penalty_plays(df_plays, index_start=None):

  # Cut 'df_plays' to begin from 'index_start' to the last penalty play available in dataframe
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_penalty_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('penalty', case=False)]
  else:
    df_penalty_plays = df_plays[df_plays['PlayOutcome'].str.contains('penalty', case=False)]

  # Iterating through every penalty play within 'df_penalty_plays'
  for idx, play in df_penalty_plays['PlayDescription'].items():

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############################
    # REPORTING IN AS ELIGIBLE #
    ############################

    # I do not think this contains any useful data so I am going to exclude it.
    if play.find('reported in as eligible') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find('reported in as eligible') != -1:
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    initial_action = play.split(". ")[0]

    ###############################
    # PENALTY DURING PASSING PLAY #
    ###############################

    penalty_pass = re.findall(receiver_pattern, initial_action)
    if len(penalty_pass) > 0 or play.find('pass incomplete') != -1:

      # creating a copy of the passing penalty play row and cleaning the copy
      passing_penalty_row = df_plays.loc[idx].copy()
      passing_penalty_row['PlayOutcome'] = 'Pass'
      passing_penalty_row = pd.DataFrame([passing_penalty_row], columns=df_plays.columns)
      # TODO: When 'clean_pass_plays' returns multiple rows, verify that 'PlayOutcome' is correct on each one.
      #       I suspect it is sometimes set on the row at the original index rather than on the row that needs it.
      cleaned_passing_penalty_row = clean_pass_plays(passing_penalty_row)
      cleaned_passing_penalty_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
      cleaned_passing_penalty_row['PlayType'] = 'No Play'

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_passing_penalty_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_penalty_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_penalty_plays(df_plays, idx+len(cleaned_passing_penalty_row))

    ###############################
    # PENALTY DURING RUSHING PLAY #
    ###############################

    penalty_rush = re.findall(rusher_pattern, initial_action)
    if len(penalty_rush) > 0 or play.find('Aborted') != -1:

      # creating a copy of the rushing penalty play row and cleaning the copy
      rushing_penalty_row = df_plays.loc[idx].copy()
      rushing_penalty_row['PlayOutcome'] = 'Run'
      rushing_penalty_row = pd.DataFrame([rushing_penalty_row], columns=df_plays.columns)
      cleaned_rushing_penalty_row = clean_run_plays(rushing_penalty_row)
      cleaned_rushing_penalty_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
      cleaned_rushing_penalty_row['PlayType'] = 'No Play'

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_rushing_penalty_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_penalty_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_penalty_plays(df_plays, idx+len(cleaned_rushing_penalty_row))

    ######################################
    # PENALTY DURING 2PT CONVERSION PLAY #
    ######################################

    if play.find('TWO-POINT CONVERSION ATTEMPT') != -1:

      # creating a copy of the 2pt conversion penalty play row and cleaning the copy
      two_pt_conversion_penalty_row = df_plays.loc[idx].copy()
      two_pt_conversion_penalty_row['PlayOutcome'] = '2PT Conversion'
      two_pt_conversion_penalty_row = pd.DataFrame([two_pt_conversion_penalty_row], columns=df_plays.columns)
      cleaned_two_pt_penalty_row = cleaning_2pt_conversion_plays(two_pt_conversion_penalty_row)
      cleaned_two_pt_penalty_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
      cleaned_two_pt_penalty_row['PlayType'] = 'No Play'

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_two_pt_penalty_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_penalty_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_penalty_plays(df_plays, idx+len(cleaned_two_pt_penalty_row))

    #########################
    # PENALTY (False Start) #
    #########################

    # Will use 'clean_run_plays' method to clean these
    # All other penalty plays (e.g. False Start, Delay of Game, Offside, Neutral Zone Infraction, Too Many Men on Field, Encroachment, Taunting)

    # if play.find('False Start') != -1 or play.find('Delay of Game') != -1:

    # TimeOnTheClock
    TimeOnTheClock = re.findall(time_on_clock_pattern, play)
    if len(TimeOnTheClock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = TimeOnTheClock[0]

    # Formation (an 'Aborted' tag is not a formation, so it is skipped)
    Formation = re.findall(formation, play)
    if len(Formation) > 0 and Formation[0] != 'Aborted':
      df_plays.loc[idx, 'Formation'] = Formation[0]

    df_plays.at[idx, 'AcceptedPenalty'] = play
    df_plays.at[idx, 'PlayType'] = 'No Play'

  return df_plays
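
The "Replacing old row with cleaned row" step is repeated in every branch above, so it could live in one small helper. A minimal sketch, assuming the dataframe index is unique; `replace_row_with_rows` is a hypothetical name that does not exist elsewhere in this notebook, and the toy data is made up for illustration.

```python
import pandas as pd

def replace_row_with_rows(df, idx, cleaned_rows):
    """Swap the row labelled `idx` for `cleaned_rows` and renumber the index."""
    # Positional location of the label (an int when the index is unique)
    pos = df.index.get_loc(idx)
    return pd.concat([df.iloc[:pos], cleaned_rows, df.iloc[pos + 1:]], ignore_index=True)

# Toy data, not real plays
plays = pd.DataFrame({"PlayType": ["Run", "Pass", "Punt"]})
cleaned = pd.DataFrame({"PlayType": ["No Play"]})

result = replace_row_with_rows(plays, 1, cleaned)
print(result["PlayType"].tolist())
```

`Index.get_loc` does the same job as `df_plays.index.tolist().index(idx)` without building a Python list on every call.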

#### TURNOVER ON DOWNS

In [29]:
# Looks like either a pass / run / sack play

def clean_turnover_on_downs_plays(df_plays, index_start=None):

  # Cut 'df_plays' to begin from 'index_start' and collect the turnover-on-downs plays from there on
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_turnover_on_downs_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Turnover on Downs', case=False)]
  else:
    df_turnover_on_downs_plays = df_plays[df_plays['PlayOutcome'].str.contains('Turnover on Downs', case=False)]

  # Iterating through every turnover-on-downs play within 'df_turnover_on_downs_plays'
  for idx, play in df_turnover_on_downs_plays['PlayDescription'].items():

    ##############################
    # TURNOVER ON DOWNS (SACKED) #
    ##############################

    if play.find("sacked") != -1:

      sacked_turnover_on_downs = df_plays.loc[idx].copy()
      sacked_turnover_on_downs['PlayOutcome'] = 'Sack'
      sacked_turnover_on_downs = pd.DataFrame([sacked_turnover_on_downs], columns=df_plays.columns)
      sacked_turnover_on_downs.reset_index(drop=True, inplace=True)
      cleaned_sacked_turnover_on_downs = clean_sacked_plays(sacked_turnover_on_downs)
      cleaned_sacked_turnover_on_downs['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_sacked_turnover_on_downs, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_turnover_on_downs_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_turnover_on_downs_plays(df_plays, idx+len(cleaned_sacked_turnover_on_downs))

    ############################
    # TURNOVER ON DOWNS (PASS) #
    ############################

    passing_play = re.findall(passer_name_pattern, play)
    if len(passing_play) > 0:

      passing_turnover_on_downs = df_plays.loc[idx].copy()
      passing_turnover_on_downs['PlayOutcome'] = 'Pass'
      passing_turnover_on_downs = pd.DataFrame([passing_turnover_on_downs], columns=df_plays.columns)
      cleaned_passing_turnover_on_downs = clean_pass_plays(passing_turnover_on_downs)
      cleaned_passing_turnover_on_downs['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_passing_turnover_on_downs, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_turnover_on_downs_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_turnover_on_downs_plays(df_plays, idx+len(cleaned_passing_turnover_on_downs))

    ############################
    # TURNOVER ON DOWNS (RUSH) #
    ############################

    rushing_play = re.findall(rusher_pattern, play)
    if len(rushing_play) > 0:

      rushing_turnover_on_downs = df_plays.loc[idx].copy()
      rushing_turnover_on_downs['PlayOutcome'] = 'Run'
      rushing_turnover_on_downs = pd.DataFrame([rushing_turnover_on_downs], columns=df_plays.columns)
      cleaned_rushing_turnover_on_downs = clean_run_plays(rushing_turnover_on_downs)
      cleaned_rushing_turnover_on_downs['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, cleaned_rushing_turnover_on_downs, df_after_row], ignore_index=True)

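      if df_turnover_on_downs_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_turnover_on_downs_plays(df_plays, idx+len(cleaned_rushing_turnover_on_downs))

  # Nothing left to clean (no turnover-on-downs plays, or no branch matched the last play)
  return df_plays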
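
The recursive pattern above restarts the function after every replaced row, which means one stack frame per cleaned play. A possible alternative, sketched under assumptions: walk the dataframe once, clean the rows that need it, and concatenate a single time at the end. `clean_rows_in_one_pass` and `split_in_two` are hypothetical names for illustration, the index is assumed to be unique, and the cleaner is assumed to return a dataframe of one or more rows.

```python
import pandas as pd

def clean_rows_in_one_pass(df, needs_cleaning, clean_row):
    """Apply `clean_row` to every flagged row, keeping all other rows as-is."""
    pieces = []
    for idx in df.index:
        row = df.loc[[idx]]  # one-row DataFrame
        pieces.append(clean_row(row) if needs_cleaning.loc[idx] else row)
    if not pieces:  # pd.concat raises on an empty list
        return df.copy()
    return pd.concat(pieces, ignore_index=True)

def split_in_two(row):
    """Hypothetical cleaner that turns one raw row into two cleaned rows."""
    first = row.assign(PlayType="Run")
    second = row.assign(PlayType="Fumble Return")
    return pd.concat([first, second], ignore_index=True)

# Toy data, not real plays
plays = pd.DataFrame({
    "PlayOutcome": ["Pass", "Turnover on Downs", "Punt"],
    "PlayType": ["", "", ""],
})
mask = plays["PlayOutcome"].str.contains("Turnover on Downs", case=False)

cleaned = clean_rows_in_one_pass(plays, mask, split_in_two)
print(cleaned)
```

Because the index is only renumbered once, at the end, there is no shifting index to track between rows.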

## 3. PIPELINE MAIN METHOD

In [200]:
# PURPOSE:
# - Accept a dataframe of plays (dataframes formatted by NFL_Scrapers) and
#   return a cleaned dataframe of those plays.
# INPUT PARAMETERS:
# df_all_plays         - dataframe - all plays in raw form from NFL_Scraper that user
#                                    would like to clean.
# OUTPUT:
# df_all_plays_cleaned - dataframe - all plays from 'df_all_plays' cleaned and data
#                                    dispersed into individual new features.

# CURRENT DESIGN PLAN:
# 1. Use uniquely designed methods for each play type to clean within dataframe
#    - (e.g. pass, run, touchdown, punt, sack, ... )
# 2. Repeat until all plays within dataframe have been cleaned.
#   NOTE:
#   - It is important to fully clean a play type before moving to the next
#      because cleaning can add new rows to the dataframe,
#      which resets the dataframe's indexing.
#      - If we were to separate all play types from the beginning, the indexes
#        could shift around causing, for example, an index that might originally
#        point to a run play to now instead point at a pass play.

# NOTES:
# - I think "PlayOutcomes" is what determines the yardage gained on an intended play?
#   - This does not seem right to me.
#   - EXAMPLE:
#     - (9:54) Bre.Hall left end to BUF 22 for -1 yards (G.Rousseau)
#       FUMBLES (G.Rousseau), ball out of bounds at BUF 25.
#       - I would think that Bre.Hall would get docked -1 yards for his run.
#         - But I believe that he is actually docked -4
#           - 'PlayStart' = 2nd & 9 at BUF 21
#           - The play ends at BUF 25
#             - I am going to track yardage based on possession of the ball,
#               so I will record this run as -1 yards, not -4.

def clean_dataframe_of_plays(df_all_plays):

  ###########################
  # NEW COLUMN DESCRIPTIONS #
  ###########################

  # PlayType           - The type of play (e.g. pass/run)
  # TimeOnTheClock     - The time that was on the clock when the play started
  # Formation          - Play formation
  # Passer             - Player that threw the ball (mostly the quarterback)
  # Rusher             - Player that ran the ball (mostly the runningback)
  # Receiver           - Player on the same team as the passer that caught the ball
  # PassType           - Whether the pass was a deep or short pass?
  # Direction          - Where the ball is going during the play
  # Yardage            - Yards gained during the play
  # TackleBy1          - Main tackler on the play (could be solo or could be with someone else)
  # TackleBy2          - Assisted tackler1
  # PressureBy         - Defender that applied pressure to the passer
  # InterceptedBy      - Defender that intercepted the passing play
  # FumbleDetails      - A list that has what happened after the fumble
  #                      - [forced fumble by, recovered by, yards gained, tackled by]
  # ReverseDetails     - A list having plays leading up to play reversal
  # InjuredPlayers     - Players that were injured during the play
  # PenaltyDescription - If there is a penalty, gives a description of it
  #                      - [who caused the penalty, what was the penalty, yards lost if penalty accepted]

  # new_columns = ["PlayType", "TimeOnTheClock", "Formation", "Passer", "Rusher", "Receiver", "Direction", "Yardage",
  #               "TackleBy1", "TackleBy2", "PressureBy", "InterceptedBy", "SackedBy", "ForcedFumbleBy",
  #               "FumbleDetails", "ReverseDetails",
  #               "InjuredPlayers", "AcceptedPenalty", "DeclinedPenalty",
  #               "Kicker", "LongSnapper", "Returner", "DownedBy", "Holder", "BlockedBy"]

  # string_columns = ["PlayType", "TimeOnTheClock", "Formation", "Passer", "Rusher", "Receiver", "Direction",
  #                   "TackleBy1", "TackleBy2", "PressureBy", "InterceptedBy", "SackedBy", "ForcedFumbleBy",
  #                   "FumbleDetails", "ReverseDetails",
  #                   "InjuredPlayers", "AcceptedPenalty", "DeclinedPenalty",
  #                   "Kicker", "LongSnapper", "Returner", "DownedBy", "Holder", "BlockedBy"]

  # int_columns = ["Yardage"]


  new_columns = ["PlayType", "TimeOnTheClock", "Formation", "Passer", "Rusher", "Receiver", "Direction", "Yardage",
                "SoloTackle", "AssistedTackle", "SharedTackle", "PressureBy", "InterceptedBy", "SackedBy", "ForcedFumbleBy",
                "FumbleDetails", "ReverseDetails",
                "InjuredPlayers", "AcceptedPenalty", "DeclinedPenalty",
                "Kicker", "LongSnapper", "Returner", "DownedBy", "Holder", "BlockedBy"]

  string_columns = ["PlayType", "TimeOnTheClock", "Formation", "Passer", "Rusher", "Receiver", "Direction",
                    "SoloTackle", "AssistedTackle", "SharedTackle", "PressureBy", "InterceptedBy", "SackedBy", "ForcedFumbleBy",
                    "FumbleDetails", "ReverseDetails",
                    "InjuredPlayers", "AcceptedPenalty", "DeclinedPenalty",
                    "Kicker", "LongSnapper", "Returner", "DownedBy", "Holder", "BlockedBy"]

  int_columns = ["Yardage"]

  ########################################
  # RETURN DATAFRAME WITH ADDED FEATURES #
  ########################################

  df_all_plays_cleaned = df_all_plays.copy()
  df_all_plays_cleaned = df_all_plays_cleaned.reindex(columns=df_all_plays_cleaned.columns.tolist() + new_columns)
  df_all_plays_cleaned[string_columns] = df_all_plays_cleaned[string_columns].astype(str)
  df_all_plays_cleaned[int_columns] = df_all_plays_cleaned[int_columns].astype(float)

  ########################################
  # GETTING PLAY CATEGORIES AND CLEANING #
  ########################################

  df_all_plays_cleaned = clean_run_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_pass_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = cleaning_2pt_conversion_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_intercepted_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_sacked_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_punt_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_kickoff_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_touchdown_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_field_goal_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_extra_point_plays(df_all_plays_cleaned)
  # df_all_plays_cleaned = clean_penalty_plays(df_all_plays_cleaned)
  # df_all_plays_cleaned = clean_turnover_on_downs_plays(df_all_plays_cleaned)
  # df_all_plays_cleaned = clean_fumble_plays(df_all_plays_cleaned)

  return df_all_plays_cleaned
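
The NOTE in the design plan (indexes shifting once a cleaned play adds a row) can be reproduced on a toy dataframe. This is only an illustration with made-up rows, not data from the scraper.

```python
import pandas as pd

# Toy data, not real plays
plays = pd.DataFrame({"PlayOutcome": ["Pass", "Run", "Pass", "Run"]})

# Indexes of the run plays, collected BEFORE any cleaning
run_indexes_before = plays.index[plays["PlayOutcome"] == "Run"].tolist()

# Pretend cleaning the run at index 1 split it into two rows
split_rows = pd.DataFrame({"PlayOutcome": ["Run", "Fumble Return"]})
plays = pd.concat([plays.iloc[:1], split_rows, plays.iloc[2:]], ignore_index=True)

# The second saved index no longer points at a run play
stale_idx = run_indexes_before[1]
print(stale_idx, plays.loc[stale_idx, "PlayOutcome"])
```

The saved index 3 pointed at a run before the split and points at a pass afterwards, which is why each play type is cleaned in full before the next one is selected.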

# TESTING (Helper Methods)

In [31]:
# PURPOSE:
# - A tool that can be used to compare original plays and their cleaned versions

# I would like to return a map that has:
# KEY: index of original unclean play
# VALUE: index(es) of cleaned play

def unclean_clean_matches(df_unclean_plays, df_clean_plays):

  my_map = {}

  # This group of features is unique to each play
  # - Both the unclean and cleaned versions of the plays have these
  # - These features will be used to find the matching plays between the unclean df and the cleaned df
  matching_features = ['Season', 'Week', 'Date', 'AwayTeam', 'HomeTeam', 'Quarter', 'DriveNumber', 'TeamWithPossession', 'PlayNumberInDrive']

  # Iterate through each row of the dataframe of unclean plays
  for u_row in df_unclean_plays.itertuples(index=True):
    u_features = [getattr(u_row, col) for col in matching_features]

    matching_indexes = []
    matches_found = False

    # Iterate through each row of the dataframe of cleaned plays
    # - The starting index will be the index of the unclean play within the main original dataframe of plays
    #   - The matching cleaned pair will either be at the exact same location or higher
    for c_row in df_clean_plays[u_row.Index::].itertuples(index=True):
      c_features = [getattr(c_row, col) for col in matching_features]

      # If a match is found, keep checking consecutive rows because some uncleaned plays were cleaned into multiple rows
      # - Once a non-matching row follows a matching one, the full match has been found, so stop scanning.
      if u_features == c_features:
        matching_indexes.append(c_row.Index)
        matches_found = True
      elif matches_found:
        break

    # Record the match after the scan so a play whose matches run to the end of the cleaned dataframe is not dropped
    if matches_found:
      my_map[u_row.Index] = matching_indexes

  return my_map
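
The same unclean-to-clean map could also be built with a merge on the matching features instead of nested loops. A minimal sketch, not a drop-in replacement: `match_by_merge` is a hypothetical name, and unlike `unclean_clean_matches` it collects every cleaned row with equal features, not only the consecutive rows at or after the unclean play's position.

```python
import pandas as pd

def match_by_merge(df_unclean, df_clean, keys):
    """Map each unclean index to the list of clean indexes sharing the same key values."""
    left = df_unclean[keys].rename_axis("unclean_idx").reset_index()
    right = df_clean[keys].rename_axis("clean_idx").reset_index()
    merged = left.merge(right, on=keys, how="inner")
    return merged.groupby("unclean_idx")["clean_idx"].apply(list).to_dict()

# Toy data: the first unclean play was cleaned into two rows
keys = ["DriveNumber", "PlayNumberInDrive"]
df_unclean = pd.DataFrame({"DriveNumber": [1, 1], "PlayNumberInDrive": [1, 2]})
df_clean = pd.DataFrame({"DriveNumber": [1, 1, 1], "PlayNumberInDrive": [1, 1, 2]})

toy_map = match_by_merge(df_unclean, df_clean, keys)
print(toy_map)
```

This only gives the same answer as the loop version if the matching features really are unique to each play, as the comment in the helper assumes.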

# TESTING AREA

In [206]:
week1_2023_plays_copy = week1_2023_plays.copy()

df_week1_plays_cleaned = clean_dataframe_of_plays(week1_2023_plays_copy)

In [192]:
df_week1_plays_cleaned.shape

(2728, 42)

## Passing plays

In [34]:
# Number of passing type plays during 2023, Week 1

df_unclean_pass_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('Pass')]

map_passing_plays = unclean_clean_matches(df_unclean_pass_plays, df_week1_plays_cleaned)

len(map_passing_plays.keys())

997
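
The count above is the number of plays that ended up in the map, not the number of unclean plays that went in, so comparing the two is a cheap check. A small sketch with a hypothetical helper name (`unmatched_indexes`) and toy data.

```python
import pandas as pd

def unmatched_indexes(df_unclean_plays, mapping):
    """Return the unclean indexes that have no entry in the unclean-to-clean map."""
    return [idx for idx in df_unclean_plays.index if idx not in mapping]

# Toy data: three unclean plays, but only two of them were matched
toy_unclean = pd.DataFrame({"PlayOutcome": ["Pass", "Pass", "Pass"]}, index=[1, 2, 5])
toy_mapping = {1: [1], 2: [2]}

missing = unmatched_indexes(toy_unclean, toy_mapping)
print(missing)
```

On the real data this would be called as `unmatched_indexes(df_unclean_pass_plays, map_passing_plays)`; an empty list means every unclean passing play found a cleaned counterpart.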

In [35]:
# Every unclean passing play and their associated cleaned play breakdown

for i in map_passing_plays.keys():
  print(f"({i}, {map_passing_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(1, [1])
(15:00) (Shotgun) J.Allen pass short right to S.Diggs to BUF 32 for 7 yards (A.Gardner).

(2, [2])
(14:34) (No Huddle, Shotgun) J.Allen pass short left to D.Harty to BUF 37 for 5 yards (Qu.Williams).

(5, [5])
(12:39) (Shotgun) J.Allen pass incomplete short left to S.Diggs.

(9, [9])
(8:44) (Shotgun) J.Allen pass short right to D.Harty to BUF 24 for 1 yard (C.Mosley).

(11, [11])
(7:25) (Shotgun) J.Allen pass short left to D.Harty pushed ob at BUF 41 for 3 yards (A.Amos).

(13, [13])
(6:18) (No Huddle, Shotgun) J.Allen pass short left to D.Knox to NYJ 45 for 6 yards (D.Reed).

(14, [14])
(5:44) (No Huddle, Shotgun) J.Allen pass short right to S.Diggs to NYJ 30 for 15 yards (A.Amos).

(16, [16])
(4:30) (Shotgun) J.Allen pass short right to D.Knox to NYJ 35 for 4 yards (Q.Williams, M.Carter).

(17, [17])
(3:56) (No Huddle, Shotgun) J.Allen pass short left to D.Harris to NYJ 22 for 13 yards (Qu.Williams).

(20, [20])
(13:54) (Shotgun) J.Allen pass short left to J.Cook to BUF 31 f

In [36]:
# passing type plays during 2023, Week 1 that have been spiked

df_unclean_pass_plays_spiked = week1_2023_plays.loc[(week1_2023_plays['PlayOutcome'].str.contains('Pass')) &
                                                    (week1_2023_plays['PlayDescription'].str.contains('spiked', case=False))]

map_passing_spiked_plays = unclean_clean_matches(df_unclean_pass_plays_spiked, df_week1_plays_cleaned)

for i in map_passing_spiked_plays.keys():
  print(f"({i}, {map_passing_spiked_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(74, [77])
(:17) (No Huddle) J.Allen spiked the ball to stop the clock.

(767, [811])
(:10) (No Huddle) J.Goff spiked the ball to stop the clock.

(1085, [1142])
(:06) (No Huddle) C.Stroud spiked the ball to stop the clock.

(1405, [1477])
(:19) (No Huddle) R.Wilson spiked the ball to stop the clock.

(2395, [2511])
(:24) (No Huddle) K.Pickett spiked the ball to stop the clock.



In [127]:
# passing type plays during 2023, Week 1 that result in touchdown

df_unclean_pass_plays_touchdown = week1_2023_plays.loc[(week1_2023_plays['PlayOutcome'].str.contains('touchdown', case=False)) &
                                                    (week1_2023_plays['PlayDescription'].str.contains('pass', case=False))]

map_passing_touchdown_plays = unclean_clean_matches(df_unclean_pass_plays_touchdown, df_week1_plays_cleaned)

for i in map_passing_touchdown_plays.keys():
  print(f"({i}, {map_passing_touchdown_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(33, [34])
(4:51) (Shotgun) J.Allen pass short right to S.Diggs for 5 yards, TOUCHDOWN.

(134, [141])
(4:58) Z.Wilson pass short left to G.Wilson for 3 yards, TOUCHDOWN.

(163, [171])
(6:14) (Shotgun) J.Love pass short middle to R.Doubs for 8 yards, TOUCHDOWN.

(202, [212])
(6:34) (Shotgun) J.Love pass short middle to A.Jones for 35 yards, TOUCHDOWN
GB-A.Jones was injured during the play
His return is Questionable.

(214, [226])
(13:34) J.Love pass short left to R.Doubs for 4 yards, TOUCHDOWN.

(219, [231, 232])
(12:53) (Shotgun) J.Fields pass short middle intended for D.Mooney INTERCEPTED by Q.Walker [K.Clark] at CHI 37
Q.Walker for 37 yards, TOUCHDOWN
PENALTY on GB-R.Douglas, Unsportsmanlike Conduct, 15 yards, enforced between downs.

(289, [310])
(1:04) (Shotgun) J.Fields pass deep right to D.Mooney for 20 yards, TOUCHDOWN.

(363, [386])
(11:33) (Shotgun) A.Richardson pass short left to M.Pittman for 39 yards, TOUCHDOWN.

(419, [447])
(5:26) (Shotgun) T.Lawrence pass short left to C

In [38]:
# touchdown plays during 2023, Week 1 whose description mentions a penalty

df_unclean_pass_plays_touchdown = week1_2023_plays.loc[(week1_2023_plays['PlayOutcome'].str.contains('touchdown', case=False)) &
                                                    (week1_2023_plays['PlayDescription'].str.contains('PENALTY', case=False))]

map_passing_touchdown_plays = unclean_clean_matches(df_unclean_pass_plays_touchdown, df_week1_plays_cleaned)

for i in map_passing_touchdown_plays.keys():
  print(f"({i}, {map_passing_touchdown_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(219, [231, 232])
(12:53) (Shotgun) J.Fields pass short middle intended for D.Mooney INTERCEPTED by Q.Walker [K.Clark] at CHI 37
Q.Walker for 37 yards, TOUCHDOWN
PENALTY on GB-R.Douglas, Unsportsmanlike Conduct, 15 yards, enforced between downs.

(1307, [1378])
(9:32) (Shotgun) J.Garoppolo pass short left to J.Meyers for 3 yards, TOUCHDOWN
PENALTY on LV-J.Meyers, Taunting, 15 yards, enforced between downs.

(1709, [1798])
(11:21) (No Huddle, Shotgun) K.Cousins pass deep middle to J.Addison for 39 yards, TOUCHDOWN
Penalty on TB-J.Dean, Illegal Contact, declined.

(1787, [1879])
(9:11) J.Herbert pass short middle to D.Parham for 1 yard, TOUCHDOWN
Penalty on MIA-K.Kohou, Defensive Offside, declined.



In [39]:
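# every passing play that resulted in a fumble (including fumble recoveries resulting in a touchdown)
# NOTE: the touchdown clause below also requires 'Pass' in PlayOutcome, so it is a subset of the first clause and adds no rows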

df_unclean_pass_fumble_plays = week1_2023_plays.loc[((week1_2023_plays['PlayOutcome'].str.contains('Pass')) |
                                                   ((week1_2023_plays['PlayDescription'].str.contains('Touchdown', case=False)) &
                                                   (week1_2023_plays['PlayOutcome'].str.contains('Pass')))) &
                                                   (week1_2023_plays['PlayDescription'].str.contains('fumbles', case=False))]

for i in unclean_clean_matches(df_unclean_pass_fumble_plays, df_week1_plays_cleaned).items():
  print(i)

(213, [224, 225])
(423, [451])
(872, [920])
(961, [1013, 1014])
(1605, [1687])
(1931, [2025])
(2295, [2403])


In [40]:
dict_unclean_to_clean_pass_fumble_plays = unclean_clean_matches(df_unclean_pass_fumble_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_pass_fumble_plays.keys():
  # print(i)
  print(f"({i}, {dict_unclean_to_clean_pass_fumble_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(213, [224, 225])
(14:21) J.Love to CHI 44 for -3 yards
FUMBLES, and recovers at CHI 46
J.Love pass deep left to L.Musgrave to CHI 4 for 37 yards (T.Stevenson) [D.Walker].

(423, [451])
(14:15) T.Lawrence pass short right to C.Ridley to JAX 47 for 14 yards (R.Thomas, E.Speed)
FUMBLES (E.Speed), RECOVERED by IND-E.Speed at IND 49
E.Speed ran ob at IND 49 for no gain
The Replay Official reviewed the ball was inbounds ruling, and the play was REVERSED
T.Lawrence pass short right to C.Ridley to JAX 47 for 14 yards (R.Thomas, E.Speed)
FUMBLES (E.Speed), ball out of bounds at IND 49
IND-K.Moore was injured during the play
IND-D.Flowers was injured during the play.

(872, [920])
(11:26) (Shotgun) D.Prescott pass short right to T.Pollard to NYG 12 for 7 yards (B.Okereke)
FUMBLES (B.Okereke), recovered by DAL-T.Biadasz at NYG 4.

(961, [1013, 1014])
(4:45) (Shotgun) D.Jones pass short left to M.Breida to NYG 43 for 5 yards (M.Bell)
FUMBLES (M.Bell), recovered by NYG-P.Campbell at NYG 35
P.Campb

## Rushing plays

In [41]:
# Number of running type plays during 2023, Week 1

df_unclean_run_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('Run')]

map_run_plays = unclean_clean_matches(df_unclean_run_plays, df_week1_plays_cleaned)

len(map_run_plays.keys())

831

In [42]:
# Every unclean rushing play and their associated cleaned play breakdown

for i in map_run_plays.keys():
  print(f"({i}, {map_run_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(3, [3])
(14:01) J.Cook up the middle to BUF 40 for 3 yards (J.Johnson, J.Franklin-Myers).

(4, [4])
(13:24) (Shotgun) J.Cook up the middle to BUF 42 for 2 yards (Q.Williams; J.Franklin-Myers).

(8, [8])
(9:24) (Shotgun) J.Cook right tackle to BUF 23 for 5 yards (Qu.Williams).

(10, [10])
(8:02) (Shotgun) J.Allen scrambles up the middle to BUF 38 for 14 yards (D.Reed).

(12, [12])
(6:52) (No Huddle, Shotgun) J.Cook up the middle to BUF 49 for 8 yards (D.Reed).

(23, [24])
(10:35) (Shotgun) J.Cook left end to BUF 20 for -5 yards (Q.Williams).

(27, [28])
(8:40) (Shotgun) J.Allen scrambles left end pushed ob at NYJ 48 for 6 yards (Q.Jefferson; C.Mosley).

(28, [29])
(7:59) J.Cook right end ran ob at NYJ 36 for 12 yards (J.Sherwood).

(29, [30])
(7:32) (No Huddle, Shotgun) J.Cook up the middle to NYJ 37 for -1 yards (Qu.Williams, A.Gardner).

(32, [33])
(5:34) (Shotgun) D.Harris up the middle to NYJ 5 for 3 yards (Qu.Williams).

(36, [37])
(2:36) (Shotgun) J.Cook left tackle pushed ob at 

In [43]:
# penalty rushing plays

df_unclean_rush_penalty_plays = week1_2023_plays.loc[(week1_2023_plays['PlayOutcome'].str.contains('Run')) &
                                                    (week1_2023_plays['PlayDescription'].str.contains('penalty', case=False))]

dict_unclean_rush_penalty_plays = unclean_clean_matches(df_unclean_rush_penalty_plays, df_week1_plays_cleaned)

for i in dict_unclean_rush_penalty_plays.keys():
  print(f"({i}, {dict_unclean_rush_penalty_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(223, [236])
(7:56) A.Dillon left end to GB 35 for 11 yards (T.Edwards)
CHI-A.Billings was injured during the play
 PENALTY on GB-D.Wicks, Offensive Holding, 10 yards, enforced at GB 26.

(247, [264])
(3:20) (Shotgun) J.Fields scrambles left end pushed ob at GB 33 for 8 yards (J.Alexander)
Penalty on CHI-D.Moore, Unnecessary Roughness, offsetting
Penalty on GB-T.Slaton, Unnecessary Roughness, offsetting.

(252, [269])
(15:00) (Shotgun) J.Fields scrambles left guard to CHI 38 for 8 yards (D.Savage)
PENALTY on GB-J.Alexander, Illegal Contact, 5 yards, enforced at CHI 38.

(283, [304])
(3:22) (Shotgun) R.Johnson left guard to CHI 45 for 10 yards (K.Nixon)
PENALTY on CHI-B.Jones, Offensive Holding, 10 yards, enforced at CHI 39.

(287, [308])
(2:08) (No Huddle, Shotgun) R.Johnson left guard to GB 36 for 11 yards (R.Ford)
PENALTY on GB-R.Ford, Unnecessary Roughness, 15 yards, enforced at GB 36.

(361, [384])
(12:28) (No Huddle, Shotgun) E.Hull right end to JAX 10 for 11 yards (A.Cisco)
IND-E

In [44]:
# fumbled rushing plays (not including touchdowns)

df_unclean_rush_fumble_plays = week1_2023_plays.loc[(week1_2023_plays['PlayOutcome'].str.contains('Run')) &
                                                    (week1_2023_plays['PlayDescription'].str.contains('fumbles', case=False))]

for i in unclean_clean_matches(df_unclean_rush_fumble_plays, df_week1_plays_cleaned).items():
  print(i)

(115, [121])
(230, [245])
(756, [798, 799])
(826, [872])
(933, [983, 984])
(1015, [1072])
(1214, [1280])
(1343, [1414])
(1512, [1588, 1589])
(1921, [2015])


In [45]:
dict_unclean_to_clean_rush_fumble_plays = unclean_clean_matches(df_unclean_rush_fumble_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_rush_fumble_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_rush_fumble_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(115, [121])
(9:54) Bre.Hall left end to BUF 22 for -1 yards (G.Rousseau)
FUMBLES (G.Rousseau), ball out of bounds at BUF 25.

(230, [245])
(2:08) S.Clifford FUMBLES (Aborted) at CHI 35, and recovers at CHI 35.

(756, [798, 799])
(6:44) (Shotgun) J.Goff Aborted
F.Ragnow FUMBLES at KC 24, recovered by DET-J.Goff at KC 27
J.Goff to KC 27 for no gain (G.Karlaftis).

(826, [872])
(8:53) (Shotgun) D.Jones Aborted
J.Schmitz FUMBLES at DAL 18, recovered by NYG-D.Jones at DAL 27.

(933, [983, 984])
(9:27) (Shotgun) D.Jones FUMBLES (Aborted) at NYG 30, and recovers at NYG 30
D.Jones to NYG 32 for 2 yards (M.Smith).

(1015, [1072])
(6:33) (No Huddle, Shotgun) L.Jackson scrambles right end to HOU 20 for 6 yards (T.Thomas)
FUMBLES (T.Thomas), recovered by BAL-K.Zeitler at HOU 23
HOU-H.Ridgeway was injured during the play.

(1214, [1280])
(1:39) J.Williams right tackle to TEN 9 for 11 yards (K.Byard, S.Murphy-Bunting)
FUMBLES (S.Murphy-Bunting), and recovers at TEN 9.

(1343, [1414])
(3:02) T.Munfo
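
The Bre.Hall play at index 115 is the one used in the yardage note of the main method. The arithmetic behind the -1 versus -4 question can be sketched as below, assuming all three spots are on the BUF side of the field so that a larger number means the offense lost ground (the "-1 yards" in the description implies this). `yard_number` is a hypothetical helper.

```python
import re

def yard_number(spot):
    """Return the yard number from a spot such as 'BUF 22'."""
    return int(re.search(r"(\d+)$", spot).group(1))

play_start = yard_number("BUF 21")    # 'PlayStart' = 2nd & 9 at BUF 21
tackled_at = yard_number("BUF 22")    # where Bre.Hall was tackled
ball_dead_at = yard_number("BUF 25")  # where the fumble went out of bounds

# Yardage while the runner had possession vs. net change over the whole play
possession_yardage = play_start - tackled_at
net_yardage = play_start - ball_dead_at
print(possession_yardage, net_yardage)
```

The possession-based number (-1) is the one this notebook tracks for the rusher; the net number (-4) also includes where the fumble ended up.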

In [167]:
# All rushing touchdowns

df_unclean_touchdown_plays = week1_2023_plays.loc[(week1_2023_plays['PlayOutcome'].str.contains('touchdown', case=False))]

list_all_touchdown_rushing_plays = []

for idx, play in df_unclean_touchdown_plays['PlayDescription'].items():
  run_play = re.findall(rusher_pattern, play)
  if len(run_play) > 0:
    list_all_touchdown_rushing_plays.append(idx)

map_rushing_touchdown_plays = unclean_clean_matches(week1_2023_plays.loc[list_all_touchdown_rushing_plays], df_week1_plays_cleaned)

for i in map_rushing_touchdown_plays.keys():
  print(f"({i}, {map_rushing_touchdown_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(197, [207])
(10:23) (Shotgun) A.Jones right guard for 1 yard, TOUCHDOWN.

(311, [333])
(2:58) R.Johnson left guard for 2 yards, TOUCHDOWN.

(339, [361])
(15:00) (Shotgun) A.Richardson up the middle for 2 yards, TOUCHDOWN.

(481, [513])
(5:17) (Shotgun) T.Bigsby left guard for 1 yard, TOUCHDOWN.

(485, [517])
(4:15) T.Etienne left tackle for 26 yards, TOUCHDOWN.

(610, [649])
(:22) (Shotgun) D.Watson left end for 13 yards, TOUCHDOWN.

(801, [847])
(7:11) D.Montgomery up the middle for 8 yards, TOUCHDOWN.

(858, [905])
(8:07) T.Pollard right end for 2 yards, TOUCHDOWN.

(874, [922])
(10:05) T.Pollard right end for 1 yard, TOUCHDOWN.

(890, [938])
(11:37) K.Turpin left end for 7 yards, TOUCHDOWN.

(988, [1044])
(1:05) J.Dobbins right end for 4 yards, TOUCHDOWN.

(1010, [1067])
(9:58) (No Huddle, Shotgun) J.Hill right tackle for 2 yards, TOUCHDOWN.

(1018, [1075])
(5:24) (Shotgun) J.Hill right guard for 2 yards, TOUCHDOWN.

(1489, [1564])
(14:18) T.Allgeier left end for 3 yards, TOUCHDOWN

## 2pt Conversions

In [46]:
# All 2pt conversion plays

dict_unclean_to_clean_2ptc = unclean_clean_matches(df_2023_2pt_conversion_week1, df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_2ptc)} number of 2pt conversion attempts")
print("\n\n")
for i in dict_unclean_to_clean_2ptc.keys():
  print(f"({i}, {dict_unclean_to_clean_2ptc.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

5 number of 2pt conversion attempts



(290, [311])
TWO-POINT CONVERSION ATTEMPT
K.Herbert rushes right guard
ATTEMPT SUCCEEDS.

(313, [335])
TWO-POINT CONVERSION ATTEMPT
J.Fields pass to D.Moore is incomplete
ATTEMPT FAILS.

(640, [679])
TWO-POINT CONVERSION ATTEMPT
D.Watson rushes left tackle
ATTEMPT SUCCEEDS.

(1012, [1069])
TWO-POINT CONVERSION ATTEMPT
G.Edwards rushes up the middle
ATTEMPT SUCCEEDS.

(2014, [2111])
TWO-POINT CONVERSION ATTEMPT
M.Jones pass to M.Gesicki is incomplete
ATTEMPT FAILS.



In [47]:
# All passing 2PT conversion attempts

index_pass_2ptc = []

for i in list(df_2023_2pt_conversion_week1.index):
  pass_2ptc = re.findall(tp_conversion_pass_pattern, week1_2023_plays['PlayDescription'].iloc[i])
  if len(pass_2ptc) > 0:
    index_pass_2ptc.append(i)

dict_unclean_to_clean_pass_2ptc = unclean_clean_matches(week1_2023_plays.iloc[index_pass_2ptc], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_pass_2ptc)} number of 2pt conversion pass attempts")
print("\n\n")
for i in dict_unclean_to_clean_pass_2ptc.keys():
  print(f"({i}, {dict_unclean_to_clean_pass_2ptc.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

2 number of 2pt conversion pass attempts



(313, [335])
TWO-POINT CONVERSION ATTEMPT
J.Fields pass to D.Moore is incomplete
ATTEMPT FAILS.

(2014, [2111])
TWO-POINT CONVERSION ATTEMPT
M.Jones pass to M.Gesicki is incomplete
ATTEMPT FAILS.



In [48]:
# All rushing 2PT conversion attempts

index_rush_2ptc = []

for i in list(df_2023_2pt_conversion_week1.index):
  rush_2ptc = re.findall(tp_conversion_rush_pattern, week1_2023_plays['PlayDescription'].iloc[i])
  if len(rush_2ptc) > 0:
    index_rush_2ptc.append(i)

dict_unclean_to_clean_rush_2ptc = unclean_clean_matches(week1_2023_plays.iloc[index_rush_2ptc], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_rush_2ptc)} number of 2pt conversion rush attempts")
print("\n\n")
for i in dict_unclean_to_clean_rush_2ptc.keys():
  print(f"({i}, {dict_unclean_to_clean_rush_2ptc.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

3 number of 2pt conversion rush attempts



(290, [311])
TWO-POINT CONVERSION ATTEMPT
K.Herbert rushes right guard
ATTEMPT SUCCEEDS.

(640, [679])
TWO-POINT CONVERSION ATTEMPT
D.Watson rushes left tackle
ATTEMPT SUCCEEDS.

(1012, [1069])
TWO-POINT CONVERSION ATTEMPT
G.Edwards rushes up the middle
ATTEMPT SUCCEEDS.
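
The pass/rush split above can be cross-checked by classifying each description directly. This is only a sketch: `PASS_RE` and `RUSH_RE` are hypothetical stand-ins, since the notebook's own `tp_conversion_pass_pattern` and `tp_conversion_rush_pattern` are defined elsewhere, and the raw strings are reconstructed from the printed output.

```python
import re
from collections import Counter

# Hypothetical patterns, NOT the notebook's tp_conversion_* patterns.
PASS_RE = re.compile(r"\bpass\b")
RUSH_RE = re.compile(r"\brushes\b")


def classify_two_point_attempt(description):
    """Return (kind, result) for a 2PT attempt, or None for any other play."""
    if "TWO-POINT CONVERSION ATTEMPT" not in description:
        return None
    if PASS_RE.search(description):
        kind = "pass"
    elif RUSH_RE.search(description):
        kind = "rush"
    else:
        kind = "other"
    if "ATTEMPT SUCCEEDS" in description:
        result = "succeeds"
    elif "ATTEMPT FAILS" in description:
        result = "fails"
    else:
        result = "unknown"
    return kind, result


# The five week 1 attempts, rebuilt from the output above.
attempts = [
    "TWO-POINT CONVERSION ATTEMPT. K.Herbert rushes right guard. ATTEMPT SUCCEEDS.",
    "TWO-POINT CONVERSION ATTEMPT. J.Fields pass to D.Moore is incomplete. ATTEMPT FAILS.",
    "TWO-POINT CONVERSION ATTEMPT. D.Watson rushes left tackle. ATTEMPT SUCCEEDS.",
    "TWO-POINT CONVERSION ATTEMPT. G.Edwards rushes up the middle. ATTEMPT SUCCEEDS.",
    "TWO-POINT CONVERSION ATTEMPT. M.Jones pass to M.Gesicki is incomplete. ATTEMPT FAILS.",
]
kind_counts = Counter(classify_two_point_attempt(a)[0] for a in attempts)
print(kind_counts)
```

The pass and rush counts should add up to the total number of attempts; any play landing in `"other"` would point to a description format the patterns do not cover yet.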



## Intercepted plays

In [49]:
df_unclean_intercepted_plays = week1_2023_plays.loc[(week1_2023_plays['PlayDescription'].str.contains('INTERCEPTED', case=False)) |
                                                    (week1_2023_plays['PlayOutcome'].str.contains('Interception', case=False))]

dict_unclean_to_clean_intercepted_plays = unclean_clean_matches(df_unclean_intercepted_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_intercepted_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_intercepted_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(21, [21, 22])
(13:12) (Shotgun) J.Allen pass deep middle intended for D.Harty INTERCEPTED by J.Whitehead at NYJ 4
J.Whitehead to NYJ 4 for no gain (D.Harty).

(52, [53])
(4:16) (Shotgun) J.Allen pass deep middle intended for S.Diggs INTERCEPTED by J.Whitehead [Q.Williams] at NYJ -1
Touchback.

(64, [66, 67])
(9:49) (Shotgun) J.Allen pass short right intended for G.Davis INTERCEPTED by J.Whitehead at NYJ 43
J.Whitehead ran ob at NYJ 43 for no gain.

(102, [107, 108])
(3:17) Z.Wilson pass short middle intended for R.Cobb INTERCEPTED by M.Milano at NYJ 48
M.Milano to NYJ 35 for 13 yards (Z.Wilson)
PENALTY on BUF-M.Milano, Taunting, 15 yards, enforced at NYJ 35.

(219, [231, 232])
(12:53) (Shotgun) J.Fields pass short middle intended for D.Mooney INTERCEPTED by Q.Walker [K.Clark] at CHI 37
Q.Walker for 37 yards, TOUCHDOWN
PENALTY on GB-R.Douglas, Unsportsmanlike Conduct, 15 yards, enforced between downs.

(388, [415, 416])
(5:09) (Shotgun) A.Richardson pass deep left intended for M.Alie-C

In [120]:
# All interceptions resulting in a touchdown

df_unclean_intercepted_touchdown_plays = week1_2023_plays.loc[(week1_2023_plays['PlayDescription'].str.contains('INTERCEPTED', case=False)) &
                                                              (week1_2023_plays['PlayOutcome'].str.contains('touchdown', case=False))]

dict_unclean_to_clean_intercepted_touchdown_plays = unclean_clean_matches(df_unclean_intercepted_touchdown_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_intercepted_touchdown_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_intercepted_touchdown_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(219, [231, 232])
(12:53) (Shotgun) J.Fields pass short middle intended for D.Mooney INTERCEPTED by Q.Walker [K.Clark] at CHI 37
Q.Walker for 37 yards, TOUCHDOWN
PENALTY on GB-R.Douglas, Unsportsmanlike Conduct, 15 yards, enforced between downs.

(777, [821, 822])
(11:04) (Shotgun) P.Mahomes pass short right intended for K.Toney INTERCEPTED by B.Branch at 50
B.Branch for 50 yards, TOUCHDOWN.

(840, [886, 887])
(2:30) (Shotgun) D.Jones pass short left intended for S.Barkley INTERCEPTED by D.Bland (T.Diggs) at NYG 22
D.Bland for 22 yards, TOUCHDOWN.

(2052, [2150, 2151])
(5:12) (Shotgun) M.Jones pass short right intended for K.Bourne INTERCEPTED by D.Slay at PHI 30
D.Slay for 70 yards, TOUCHDOWN.
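
The interception descriptions above follow a fairly regular shape, so the key fields can be pulled out with one regular expression. A minimal sketch, assuming only the formats visible in the output; `INTERCEPTION_RE` and `parse_interception` are hypothetical names, and `\S+` for player names would not handle a surname containing a space.

```python
import re

# Hypothetical pattern covering only the description shapes seen above.
INTERCEPTION_RE = re.compile(
    r"(?P<passer>\S+) pass .*?intended for (?P<target>\S+) "
    r"INTERCEPTED by (?P<defender>\S+)"
    r"(?: [\[(][^\])]*[\])])?"  # optional "[K.Clark]" or "(T.Diggs)" note
    r" at (?P<spot>(?:[A-Z]{2,3} )?-?\d+)"
)


def parse_interception(description):
    """Return a dict of interception fields, or None if the play does not match."""
    match = INTERCEPTION_RE.search(description)
    if match is None:
        return None
    info = match.groupdict()
    # Whatever follows the catch spot describes the return.
    tail = description[match.end():]
    info["return_touchdown"] = "TOUCHDOWN" in tail
    info["touchback"] = "Touchback" in tail
    return info


pick_six = parse_interception(
    "(12:53) (Shotgun) J.Fields pass short middle intended for D.Mooney "
    "INTERCEPTED by Q.Walker [K.Clark] at CHI 37. Q.Walker for 37 yards, TOUCHDOWN. "
    "PENALTY on GB-R.Douglas, Unsportsmanlike Conduct, 15 yards, enforced between downs."
)
touchback = parse_interception(
    "(4:16) (Shotgun) J.Allen pass deep middle intended for S.Diggs "
    "INTERCEPTED by J.Whitehead [Q.Williams] at NYJ -1. Touchback."
)
midfield = parse_interception(
    "(11:04) (Shotgun) P.Mahomes pass short right intended for K.Toney "
    "INTERCEPTED by B.Branch at 50. B.Branch for 50 yards, TOUCHDOWN."
)
print(pick_six)
print(touchback)
print(midfield)
```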



## Sacked Plays

In [50]:
df_unclean_sacked_plays = week1_2023_plays.loc[(week1_2023_plays['PlayOutcome'].str.contains('Sack', case=False))]

dict_unclean_to_clean_sacked_plays = unclean_clean_matches(df_unclean_sacked_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_sacked_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_sacked_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(15, [15])
(5:10) (No Huddle, Shotgun) J.Allen sacked at NYJ 39 for -9 yards (J.Franklin-Myers).

(19, [19])
(14:54) (Shotgun) J.Allen sacked at BUF 27 for -2 yards (A.Woods).

(40, [41])
(:40) (Shotgun) J.Allen sacked at NYJ 23 for -3 yards (Q.Jefferson).

(51, [52])
(5:00) (Shotgun) J.Allen sacked at NYJ 41 for -3 yards (Q.Jefferson).

(59, [61])
(13:10) (No Huddle, Shotgun) J.Allen sacked at BUF 23 for -2 yards (J.Johnson).

(81, [84])
(11:40) (Shotgun) A.Rodgers sacked at NYJ 33 for -10 yards (L.Floyd)
NYJ-A.Rodgers was injured during the play
He is Out.

(92, [96])
(:16) (Shotgun) Z.Wilson sacked at NYJ 32 for -12 yards (J.Phillips).

(125, [132])
(15:00) (Shotgun) Z.Wilson sacked at NYJ 31 for -1 yards (sack split by L.Floyd and E.Oliver).

(187, [197])
(:37) (Shotgun) J.Love sacked at CHI 34 for -8 yards (Y.Ngakoue).

(260, [277])
(9:45) (Shotgun) J.Fields sacked at GB 11 for -7 yards (L.Van Ness).

(275, [294])
(10:19) J.Fields sacked at CHI 14 for -11 yards (D.Wyatt).

(296, [

In [111]:
# All sacked plays resulting in a touchdown

df_unclean_sacked_touchdown_plays = week1_2023_plays.loc[(week1_2023_plays['PlayOutcome'].str.contains('touchdown', case=False)) &
                                                         (week1_2023_plays['PlayDescription'].str.contains('sack', case=False))]

dict_unclean_to_clean_sacked_touchdown_plays = unclean_clean_matches(df_unclean_sacked_touchdown_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_sacked_touchdown_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_sacked_touchdown_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(375, [398, 399, 400])
(2:41) (Shotgun) T.Lawrence sacked at JAX 28 for -8 yards (D.Buckner)
FUMBLES (D.Buckner) [D.Buckner], recovered by JAX-T.Bigsby at JAX 35
T.Bigsby to JAX 35 for no gain (Z.Franklin)
FUMBLES (Z.Franklin), RECOVERED by IND-D.Buckner at JAX 26
D.Buckner for 26 yards, TOUCHDOWN
The Replay Official reviewed the score ruling, and the play was Upheld
The ruling on the field stands.

(2475, [2595, 2596])
(1:02) (Shotgun) S.Howell sacked at WAS 12 for -14 yards (D.Gardeck)
FUMBLES (D.Gardeck) [D.Gardeck], RECOVERED by ARI-C.Thomas at WAS 2
C.Thomas for 2 yards, TOUCHDOWN.
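
Sack descriptions carry the quarterback, the spot, the yards lost and the credited defender(s), including the "sack split by" form. A rough sketch of extracting them, assuming only the formats shown above; `SACK_RE` and `parse_sack` are hypothetical names.

```python
import re

# Hypothetical pattern for the sack formats visible in the output above.
SACK_RE = re.compile(
    r"(?P<quarterback>\S+) sacked at (?P<spot>(?:[A-Z]{2,3} )?\d+) "
    r"for (?P<yards>-?\d+) yards? \((?P<credit>[^)]*)\)"
)
SPLIT_PREFIX = "sack split by "


def parse_sack(description):
    """Return quarterback, spot, yards and credited defenders, or None."""
    match = SACK_RE.search(description)
    if match is None:
        return None
    credit = match.group("credit")
    if credit.startswith(SPLIT_PREFIX):
        # "sack split by L.Floyd and E.Oliver" -> two defenders
        defenders = credit[len(SPLIT_PREFIX):].split(" and ")
    else:
        defenders = [credit]
    return {
        "quarterback": match.group("quarterback"),
        "spot": match.group("spot"),
        "yards": int(match.group("yards")),
        "defenders": defenders,
    }


split_sack = parse_sack(
    "(15:00) (Shotgun) Z.Wilson sacked at NYJ 31 for -1 yards "
    "(sack split by L.Floyd and E.Oliver)."
)
solo_sack = parse_sack("(9:45) (Shotgun) J.Fields sacked at GB 11 for -7 yards (L.Van Ness).")
print(split_sack)
print(solo_sack)
```

Keeping the credit in parentheses as one string before splitting means a surname with a space, such as `L.Van Ness`, stays intact.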



## Punt Plays

In [51]:
df_unclean_punt_plays = week1_2023_plays.loc[week1_2023_plays['PlayDescription'].str.contains('punts', case=False)]

dict_unclean_to_clean_punt_plays = unclean_clean_matches(df_unclean_punt_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_punt_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_punt_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(6, [6])
(12:37) S.Martin punts 46 yards to NYJ 12, Center-R.Ferguson, fair catch by X.Gipson.

(56, [57, 58])
(:42) S.Martin punts 53 yards to NYJ 24, Center-R.Ferguson
X.Gipson pushed ob at NYJ 25 for 1 yard (S.Neal).

(84, [87])
(9:33) T.Morstead punts 31 yards to BUF 23, Center-T.Hennessy, out of bounds.

(93, [97])
(15:00) T.Morstead punts 39 yards to BUF 29, Center-T.Hennessy, fair catch by D.Harty.

(122, [128, 129])
(2:21) T.Morstead punts 50 yards to BUF 11, Center-T.Hennessy
D.Harty ran ob at BUF 15 for 4 yards (J.Sherwood).

(126, [133])
(14:19) T.Morstead punts 54 yards to BUF 15, Center-T.Hennessy, fair catch by D.Harty.

(152, [159, 160])
(9:21) S.Martin punts 42 yards to NYJ 35, Center-R.Ferguson
X.Gipson for 65 yards, TOUCHDOWN.

(169, [177, 178])
(:42) D.Whelan punts 42 yards to CHI 29, Center-M.Orzech
T.Taylor to CHI 37 for 8 yards (I.Gaines).

(174, [184])
(7:26) D.Whelan punts 68 yards to end zone, Center-M.Orzech, Touchback.

(182, [192])
(2:19) D.Whelan punts 42 y

In [93]:
# All punt return touchdown plays

df_unclean_punt_touchdown_plays = week1_2023_plays.loc[week1_2023_plays['PlayDescription'].str.contains('punts', case=False) &
                                                       week1_2023_plays['PlayDescription'].str.contains('touchdown', case=False)]

dict_unclean_to_clean_punt_touchdown_plays = unclean_clean_matches(df_unclean_punt_touchdown_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_punt_touchdown_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_punt_touchdown_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(152, [159, 160])
(9:21) S.Martin punts 42 yards to NYJ 35, Center-R.Ferguson
X.Gipson for 65 yards, TOUCHDOWN.
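
The punt outputs above show a handful of outcomes: fair catch, touchback, out of bounds, a return, and a return touchdown. A small sketch of labelling them, assuming only those formats; `PUNT_RE` and `parse_punt` are hypothetical names, and anything else (blocked or downed punts, for example) falls into `"other"`.

```python
import re

# Hypothetical pattern for the punt formats visible in the output above.
PUNT_RE = re.compile(
    r"(?P<punter>\S+) punts (?P<yards>\d+) yards? to "
    r"(?P<spot>end zone|(?:[A-Z]{2,3} )?-?\d+)"
)


def parse_punt(description):
    """Return punter, distance, landing spot and a coarse outcome label, or None."""
    match = PUNT_RE.search(description)
    if match is None:
        return None
    tail = description[match.end():]
    if "TOUCHDOWN" in tail:
        outcome = "return touchdown"
    elif "fair catch" in tail:
        outcome = "fair catch"
    elif "Touchback" in tail:
        outcome = "touchback"
    elif "out of bounds" in tail:
        outcome = "out of bounds"
    else:
        outcome = "other"  # includes ordinary returns
    return {
        "punter": match.group("punter"),
        "yards": int(match.group("yards")),
        "spot": match.group("spot"),
        "outcome": outcome,
    }


punt_td = parse_punt(
    "(9:21) S.Martin punts 42 yards to NYJ 35, Center-R.Ferguson. "
    "X.Gipson for 65 yards, TOUCHDOWN."
)
punt_touchback = parse_punt(
    "(7:26) D.Whelan punts 68 yards to end zone, Center-M.Orzech, Touchback."
)
punt_fair_catch = parse_punt(
    "(12:37) S.Martin punts 46 yards to NYJ 12, Center-R.Ferguson, fair catch by X.Gipson."
)
print(punt_td, punt_touchback, punt_fair_catch, sep="\n")
```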



## Kickoffs

In [52]:
# All kickoff plays

df_unclean_kickoff_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('kickoff', case=False)]

dict_unclean_to_clean_kickoff_plays = unclean_clean_matches(df_unclean_kickoff_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_kickoff_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_kickoff_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(0, [0])
G.Zuerlein kicks 65 yards from NYJ 35 to end zone, Touchback.

(22, [23])
G.Zuerlein kicks 65 yards from NYJ 35 to end zone, Touchback.

(44, [45])
G.Zuerlein kicks 65 yards from NYJ 35 to end zone, Touchback.

(65, [68])
G.Zuerlein kicks 65 yards from NYJ 35 to end zone, Touchback.

(67, [70])
G.Zuerlein kicks 65 yards from NYJ 35 to end zone, Touchback.

(85, [88, 89])
T.Bass kicks 61 yards from BUF 35 to NYJ 4
X.Gipson pushed ob at NYJ 22 for 18 yards (T.Rapp).

(99, [103, 104])
T.Bass kicks 67 yards from BUF 35 to NYJ -2
X.Gipson to NYJ 26 for 28 yards (D.Jackson).

(103, [109])
T.Bass kicks 65 yards from BUF 35 to end zone, Touchback.

(105, [111])
T.Bass kicks 65 yards from BUF 35 to end zone, Touchback.

(145, [152])
T.Bass kicks 65 yards from BUF 35 to end zone, Touchback.

(147, [154])
G.Zuerlein kicks 65 yards from NYJ 35 to end zone, Touchback.

(165, [173])
C.Santos kicks 65 yards from CHI 35 to end zone, Touchback.

(170, [179, 180])
C.Santos kicks 69 yards from C

In [53]:
# All onside kicks

df_unclean_kickoff_onside_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('kickoff', case=False) &
                                                       week1_2023_plays['PlayDescription'].str.contains('onside', case=False)]

dict_unclean_to_clean_kickoff_onside_plays = unclean_clean_matches(df_unclean_kickoff_onside_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_kickoff_onside_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_kickoff_onside_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(228, [242, 243])
C.Santos kicks onside 9 yards from CHI 35 to CHI 44
J.Reed (didn't try to advance) to CHI 44 for no gain.

(1297, [1368])
W.Lutz kicks onside 9 yards from DEN 35 to DEN 44, downed by DEN-T.Smith
PENALTY on DEN-T.Smith, Illegal Touch Kick, 0 yards, enforced at DEN 44.
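
Kickoff descriptions give the distance, the kicking spot and the landing spot, so the three can be checked against each other on a 100-yard field. For example, 61 yards from BUF 35 should land at the opponent's 100 - (35 + 61) = 4, which matches "NYJ 4", and an onside kick of 9 yards from CHI 35 stays on the kicking side at 35 + 9 = 44. A sketch of that check, with hypothetical names `KICKOFF_RE` and `parse_kickoff`, covering only the formats shown above:

```python
import re

# Hypothetical pattern for the kickoff formats visible in the output above.
KICKOFF_RE = re.compile(
    r"(?P<kicker>\S+) kicks (?P<onside>onside )?(?P<yards>\d+) yards? "
    r"from (?P<from_team>[A-Z]{2,3}) (?P<from_yard>\d+) "
    r"to (?:(?P<end_zone>end zone)|(?P<to_team>[A-Z]{2,3}) (?P<to_yard>-?\d+))"
)


def parse_kickoff(description):
    """Parse a kickoff and flag whether distance and spots agree."""
    match = KICKOFF_RE.search(description)
    if match is None:
        return None
    yards = int(match.group("yards"))
    # Landing point measured from the kicking team's own goal line.
    landing = int(match.group("from_yard")) + yards
    if match.group("end_zone"):
        consistent = landing >= 100
    elif match.group("to_team") == match.group("from_team"):
        consistent = landing == int(match.group("to_yard"))
    else:
        # Opponent's side: yard lines count down toward their goal line.
        consistent = 100 - landing == int(match.group("to_yard"))
    return {
        "kicker": match.group("kicker"),
        "onside": match.group("onside") is not None,
        "yards": yards,
        "consistent": consistent,
    }


deep_kick = parse_kickoff(
    "T.Bass kicks 67 yards from BUF 35 to NYJ -2. X.Gipson to NYJ 26 for 28 yards (D.Jackson)."
)
onside_kick = parse_kickoff(
    "C.Santos kicks onside 9 yards from CHI 35 to CHI 44. "
    "J.Reed (didn't try to advance) to CHI 44 for no gain."
)
touchback_kick = parse_kickoff("G.Zuerlein kicks 65 yards from NYJ 35 to end zone, Touchback.")
print(deep_kick, onside_kick, touchback_kick, sep="\n")
```

A `consistent` value of `False` on a real play would be worth a manual look, either at the raw data or at the assumptions in this check.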



## Touchdown plays

In [54]:
# All touchdown plays

df_unclean_touchdown_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('touchdown', case=False)]

dict_unclean_to_clean_touchdown_plays = unclean_clean_matches(df_unclean_touchdown_plays, df_week1_plays_cleaned)

for i in dict_unclean_to_clean_touchdown_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_touchdown_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(33, [34])
(4:51) (Shotgun) J.Allen pass short right to S.Diggs for 5 yards, TOUCHDOWN.

(134, [141])
(4:58) Z.Wilson pass short left to G.Wilson for 3 yards, TOUCHDOWN.

(152, [159, 160])
(9:21) S.Martin punts 42 yards to NYJ 35, Center-R.Ferguson
X.Gipson for 65 yards, TOUCHDOWN.

(163, [171])
(6:14) (Shotgun) J.Love pass short middle to R.Doubs for 8 yards, TOUCHDOWN.

(197, [207])
(10:23) (Shotgun) A.Jones right guard for 1 yard, TOUCHDOWN.

(202, [212])
(6:34) (Shotgun) J.Love pass short middle to A.Jones for 35 yards, TOUCHDOWN
GB-A.Jones was injured during the play
His return is Questionable.

(214, [226])
(13:34) J.Love pass short left to R.Doubs for 4 yards, TOUCHDOWN.

(219, [231, 232])
(12:53) (Shotgun) J.Fields pass short middle intended for D.Mooney INTERCEPTED by Q.Walker [K.Clark] at CHI 37
Q.Walker for 37 yards, TOUCHDOWN
PENALTY on GB-R.Douglas, Unsportsmanlike Conduct, 15 yards, enforced between downs.

(289, [310])
(1:04) (Shotgun) J.Fields pass deep right to D.Moone

## Field goals

In [191]:
# All field goal plays

dict_unclean_to_clean_field_goal_plays = unclean_clean_matches(df_2023_fieldgoal_week1, df_week1_plays_cleaned)

# Number of field goal plays
print(f"{len(dict_unclean_to_clean_field_goal_plays)} number of field goal plays")
print("\n\n")
for i in dict_unclean_to_clean_field_goal_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_field_goal_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

65 number of field goal plays



(18, [18])
(3:09) T.Bass 40 yard field goal is GOOD, Center-R.Ferguson, Holder-S.Martin.

(43, [44])
(:20) T.Bass 34 yard field goal is GOOD, Center-R.Ferguson, Holder-S.Martin.

(77, [80])
(:06) T.Bass 50 yard field goal is GOOD, Center-R.Ferguson, Holder-S.Martin.

(98, [102])
(10:39) G.Zuerlein 26 yard field goal is GOOD, Center-T.Hennessy, Holder-T.Morstead.

(117, [123])
(9:03) G.Zuerlein 43 yard field goal is GOOD, Center-T.Hennessy, Holder-T.Morstead.

(144, [151])
(1:51) G.Zuerlein 30 yard field goal is GOOD, Center-T.Hennessy, Holder-T.Morstead
BUF-M.Hyde was injured during the play.

(188, [198])
(:04) A.Carlson 52 yard field goal is GOOD, Center-M.Orzech, Holder-D.Whelan.

(250, [267])
(2:08) C.Santos 47 yard field goal is GOOD, Center-P.Scales, Holder-T.Gill.

(262, [279])
(9:04) C.Santos 29 yard field goal is GOOD, Center-P.Scales, Holder-T.Gill.

(458, [490])
(8:35) B.McManus 45 yard field goal is GOOD, Center-R.Matiscik, Holder-L.Cooke.



In [56]:
# All field goal plays (good)

made_field_goal_play_indexes = []

for i in list(df_2023_fieldgoal_week1.index):
  made_field_goal = re.findall(field_goal_good_pattern, week1_2023_plays['PlayDescription'].iloc[i])
  if len(made_field_goal) > 0:
    made_field_goal_play_indexes.append(i)

dict_unclean_to_clean_good_field_goals = unclean_clean_matches(week1_2023_plays.iloc[made_field_goal_play_indexes], df_week1_plays_cleaned)

# Number of good field goal plays
print(f"{len(dict_unclean_to_clean_good_field_goals)} number of good field goal plays")
print("\n\n")
for i in dict_unclean_to_clean_good_field_goals.keys():
  print(f"({i}, {dict_unclean_to_clean_good_field_goals.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

59 number of good field goal plays



(18, [18])
(3:09) T.Bass 40 yard field goal is GOOD, Center-R.Ferguson, Holder-S.Martin.

(43, [44])
(:20) T.Bass 34 yard field goal is GOOD, Center-R.Ferguson, Holder-S.Martin.

(77, [80])
(:06) T.Bass 50 yard field goal is GOOD, Center-R.Ferguson, Holder-S.Martin.

(98, [102])
(10:39) G.Zuerlein 26 yard field goal is GOOD, Center-T.Hennessy, Holder-T.Morstead.

(117, [123])
(9:03) G.Zuerlein 43 yard field goal is GOOD, Center-T.Hennessy, Holder-T.Morstead.

(144, [151])
(1:51) G.Zuerlein 30 yard field goal is GOOD, Center-T.Hennessy, Holder-T.Morstead
BUF-M.Hyde was injured during the play.

(188, [198])
(:04) A.Carlson 52 yard field goal is GOOD, Center-M.Orzech, Holder-D.Whelan.

(250, [267])
(2:08) C.Santos 47 yard field goal is GOOD, Center-P.Scales, Holder-T.Gill.

(262, [279])
(9:04) C.Santos 29 yard field goal is GOOD, Center-P.Scales, Holder-T.Gill.

(458, [490])
(8:35) B.McManus 45 yard field goal is GOOD, Center-R.Matiscik, Holder-L.Coo

In [57]:
# All field goal plays (no good)

no_good_field_goal_play_indexes = []

for i in list(df_2023_fieldgoal_week1.index):
  no_good_field_goal = re.findall(field_goal_no_good_pattern, week1_2023_plays['PlayDescription'].iloc[i])
  if len(no_good_field_goal) > 0:
    no_good_field_goal_play_indexes.append(i)

dict_unclean_to_clean_no_good_field_goals = unclean_clean_matches(week1_2023_plays.iloc[no_good_field_goal_play_indexes], df_week1_plays_cleaned)

# Number of no good field goal plays
print(f"{len(dict_unclean_to_clean_no_good_field_goals)} number of no good field goal plays")
print("\n\n")
for i in dict_unclean_to_clean_no_good_field_goals.keys():
  print(f"({i}, {dict_unclean_to_clean_no_good_field_goals.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

5 number of no good field goal plays



(541, [577])
(6:57) E.McPherson 51 yard field goal is No Good, Wide Right, Center-C.Adomitis, Holder-B.Robbins.

(928, [978])
(1:10) G.Gano 36 yard field goal is No Good, Wide Left, Center-C.Kreiter, Holder-J.Gillan.

(1419, [1492])
(9:36) W.Lutz 55 yard field goal is No Good, Wide Right, Center-M.Fraboni, Holder-R.Dixon.

(2155, [2259])
(:05) B.Maher 56 yard field goal is No Good, Wide Right, Center-A.Ward, Holder-E.Evans.

(2253, [2358])
(:36) J.Myers 39 yard field goal is No Good, Hit Right Upright, Center-C.Stoll, Holder-M.Dickson.



In [58]:
# All field goal plays (special)

# Start from every field goal play, then drop the ones already classified as good or no good
special_field_goal_play_indexes = list(df_2023_fieldgoal_week1.index)

for i in made_field_goal_play_indexes:
  special_field_goal_play_indexes.remove(i)

for i in no_good_field_goal_play_indexes:
  special_field_goal_play_indexes.remove(i)


dict_unclean_to_clean_special_field_goals = unclean_clean_matches(week1_2023_plays.iloc[special_field_goal_play_indexes], df_week1_plays_cleaned)

# Number of special field goal plays
print(f"{len(dict_unclean_to_clean_special_field_goals)} number of special field goal plays")
print("\n\n")
for i in dict_unclean_to_clean_special_field_goals.keys():
  print(f"({i}, {dict_unclean_to_clean_special_field_goals.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

1 number of special field goal plays



(2149, [2253])
(1:54) B.Maher 57 yard field goal is BLOCKED (Ja.Reed), Center-A.Ward, Holder-E.Evans, RECOVERED by SEA-M.Jackson at SEA 48
M.Jackson to LA 42 for 10 yards (T.Anchrum).
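
The three field goal groups above should account for every field goal play, and the printed counts do: 59 good + 5 no good + 1 special = 65. The same split can be sketched with a single pattern that reads the result word directly. `FIELD_GOAL_RE` and `classify_field_goal` are hypothetical names, not the notebook's `field_goal_good_pattern` or `field_goal_no_good_pattern`.

```python
import re
from collections import Counter

# Hypothetical pattern; the notebook's own field_goal_* patterns are defined elsewhere.
FIELD_GOAL_RE = re.compile(
    r"(?P<kicker>\S+) (?P<distance>\d+) yard field goal is "
    r"(?P<result>GOOD|No Good|BLOCKED)"
)


def classify_field_goal(description):
    """Return (kicker, distance, result), or None if this is not a field goal play."""
    match = FIELD_GOAL_RE.search(description)
    if match is None:
        return None
    return match.group("kicker"), int(match.group("distance")), match.group("result")


# One example of each result, taken from the outputs above.
field_goal_samples = [
    "(3:09) T.Bass 40 yard field goal is GOOD, Center-R.Ferguson, Holder-S.Martin.",
    "(6:57) E.McPherson 51 yard field goal is No Good, Wide Right, Center-C.Adomitis, Holder-B.Robbins.",
    "(1:54) B.Maher 57 yard field goal is BLOCKED (Ja.Reed), Center-A.Ward, Holder-E.Evans, "
    "RECOVERED by SEA-M.Jackson at SEA 48. M.Jackson to LA 42 for 10 yards (T.Anchrum).",
]
field_goal_results = [classify_field_goal(d) for d in field_goal_samples]
result_counts = Counter(result for _, _, result in field_goal_results)
print(field_goal_results)
print(result_counts)
```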



## Extra Points

In [207]:
# All extra point plays

dict_unclean_to_clean_extrapoint = unclean_clean_matches(df_2023_extrapoint_week1, df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_extrapoint)} number of extra point plays")
print("\n\n")
for i in dict_unclean_to_clean_extrapoint.keys():
  print(f"({i}, {dict_unclean_to_clean_extrapoint.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

56 number of extra point plays



(34, [35])
T.Bass extra point is GOOD, Center-R.Ferguson, Holder-S.Martin.

(136, [143])
G.Zuerlein extra point is GOOD, Center-T.Hennessy, Holder-T.Morstead.

(164, [172])
A.Carlson extra point is GOOD, Center-M.Orzech, Holder-D.Whelan.

(198, [208])
A.Carlson extra point is GOOD, Center-M.Orzech, Holder-D.Whelan.

(203, [213])
A.Carlson extra point is GOOD, Center-M.Orzech, Holder-D.Whelan.

(215, [227])
A.Carlson extra point is GOOD, Center-M.Orzech, Holder-D.Whelan.

(340, [362])
M.Gay extra point is GOOD, Center-L.Rhodes, Holder-R.Sanchez.

(364, [387])
M.Gay extra point is GOOD, Center-L.Rhodes, Holder-R.Sanchez.

(420, [448])
B.McManus extra point is GOOD, Center-R.Matiscik, Holder-L.Cooke.

(437, [467])
B.McManus extra point is GOOD, Center-R.Matiscik, Holder-L.Cooke.

(482, [514])
B.McManus extra point is GOOD, Center-R.Matiscik, Holder-L.Cooke.

(486, [518])
B.McManus extra point is GOOD, Center-R.Matiscik, Holder-L.Cooke.

(611, [650])
D.Hopk

In [73]:
# All extra point plays (good)

extra_point_good_index_list = []

for i in list(df_2023_extrapoint_week1.index):
  made_extra_point = re.findall(extra_point_good_pattern, week1_2023_plays['PlayDescription'].iloc[i])
  if len(made_extra_point) > 0:
    extra_point_good_index_list.append(i)

dict_unclean_to_clean_extrapoint_good = unclean_clean_matches(week1_2023_plays.iloc[extra_point_good_index_list], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_extrapoint_good)} number of good extra point plays")
print("\n\n")
for i in dict_unclean_to_clean_extrapoint_good.keys():
  print(f"({i}, {dict_unclean_to_clean_extrapoint_good.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

53 number of good extra point plays



(34, [35])
T.Bass extra point is GOOD, Center-R.Ferguson, Holder-S.Martin.

(136, [143])
G.Zuerlein extra point is GOOD, Center-T.Hennessy, Holder-T.Morstead.

(164, [172])
A.Carlson extra point is GOOD, Center-M.Orzech, Holder-D.Whelan.

(198, [208])
A.Carlson extra point is GOOD, Center-M.Orzech, Holder-D.Whelan.

(203, [213])
A.Carlson extra point is GOOD, Center-M.Orzech, Holder-D.Whelan.

(215, [227])
A.Carlson extra point is GOOD, Center-M.Orzech, Holder-D.Whelan.

(340, [362])
M.Gay extra point is GOOD, Center-L.Rhodes, Holder-R.Sanchez.

(364, [387])
M.Gay extra point is GOOD, Center-L.Rhodes, Holder-R.Sanchez.

(420, [448])
B.McManus extra point is GOOD, Center-R.Matiscik, Holder-L.Cooke.

(437, [467])
B.McManus extra point is GOOD, Center-R.Matiscik, Holder-L.Cooke.

(482, [514])
B.McManus extra point is GOOD, Center-R.Matiscik, Holder-L.Cooke.

(486, [518])
B.McManus extra point is GOOD, Center-R.Matiscik, Holder-L.Cooke.

(611, [650])
D.Hopk

In [74]:
# All extra point plays (no good)

extra_point_no_good_index_list = []

for i in list(df_2023_extrapoint_week1.index):
  no_good_extra_point = re.findall(extra_point_no_good_pattern, week1_2023_plays['PlayDescription'].iloc[i])
  if len(no_good_extra_point) > 0:
    extra_point_no_good_index_list.append(i)

dict_unclean_to_clean_extrapoint_no_good = unclean_clean_matches(week1_2023_plays.iloc[extra_point_no_good_index_list], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_extrapoint_no_good)} number of no good extra point plays")
print("\n\n")
for i in dict_unclean_to_clean_extrapoint_no_good.keys():
  print(f"({i}, {dict_unclean_to_clean_extrapoint_no_good.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

3 number of no good extra point plays



(1383, [1455])
W.Lutz extra point is No Good, Wide Right, Center-M.Fraboni, Holder-R.Dixon.

(1938, [2032])
J.Sanders extra point is No Good, Wide Right, Center-B.Ferguson, Holder-J.Bailey.

(2058, [2157])
J.Elliott extra point is No Good, Wide Right, Center-R.Lovato, Holder-A.Siposs.



In [75]:
# Blocked extra point?
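
One way to approach the blocked extra point question is the same leftover idea used for special field goals: take every extra point index and drop the ones already classified. For week 1 the printed counts suggest nothing is left over, since 53 good + 3 no good = 56. A minimal sketch with a hypothetical helper `leftover_indexes`; the sample indexes come from the outputs above, and `9999` is a made-up index added only to show a non-empty result.

```python
def leftover_indexes(all_indexes, *classified_groups):
    """Return the indexes that appear in none of the classified groups, in order."""
    seen = set().union(*classified_groups)
    return [i for i in all_indexes if i not in seen]


# A few real indexes from the outputs above (not the full week).
good_sample = [34, 136]
no_good_sample = [1383, 1938, 2058]
all_sample = good_sample + no_good_sample

unclassified = leftover_indexes(all_sample, good_sample, no_good_sample)
print(unclassified)

# 9999 is a made-up index standing in for a play matching neither pattern.
with_unknown = leftover_indexes(all_sample + [9999], good_sample, no_good_sample)
print(with_unknown)
```

Any index this returns on the full data would be a candidate blocked or otherwise unusual extra point to inspect by hand.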

## Penalties

In [59]:
# What is the difference between these penalties and penalties in other play outcomes?

# All plays with "penalty" outcomes

df_unclean_penalty_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('penalty', case=False)]

dict_unclean_to_clean_penalty_plays = unclean_clean_matches(df_unclean_penalty_plays, df_week1_plays_cleaned)

# Number of penalty plays
print(f"{len(df_unclean_penalty_plays)} number of penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

141 number of penalty plays



(7, [7])
(9:24) (Shotgun) PENALTY on BUF-D.Knox, False Start, 5 yards, enforced at BUF 23 - No Play.

(71, [74])
(:39) (No Huddle, Shotgun) J.Allen pass short right to S.Diggs ran ob at NYJ 39 for 8 yards
PENALTY on BUF-S.Diggs, Offensive Pass Interference, 10 yards, enforced at NYJ 47 - No Play.

(80, [83])
(11:46) (Shotgun) A.Rodgers pass incomplete short right to T.Conklin [G.Rousseau]
PENALTY on BUF-T.Bernard, Defensive Holding, 5 yards, enforced at NYJ 38 - No Play.

(87, [91])
(2:58) (Shotgun) PENALTY on NYJ-L.Tomlinson, False Start, 5 yards, enforced at NYJ 22 - No Play.

(135, [142])
(Kick formation) PENALTY on NYJ, Delay of Game, 5 yards, enforced at BUF 15 - No Play.

(148, [155])
(10:00) (Shotgun) PENALTY on BUF-S.Brown, False Start, 5 yards, enforced at BUF 25 - No Play.

(181, [191])
(2:28) (Shotgun) PENALTY on GB, Delay of Game, 5 yards, enforced at CHI 43 - No Play.

(211, [222])
(15:00) (Shotgun) PENALTY on GB-D.Wicks, False Start, 5 yards

In [60]:
# All passing plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('penalty', case=False)]

# List for all indexes that are passing play type penalty plays
list_unclean_penalty_passing_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  passing_play = re.findall(receiver_pattern, play)
  if len(passing_play) > 0 or play.find('pass incomplete') != -1:
    list_unclean_penalty_passing_plays.append(idx)

# Dataframe of all passing plays with "penalty" outcomes
df_unclean_penalty_passing_plays = week1_2023_plays.iloc[list_unclean_penalty_passing_plays]

dict_unclean_to_clean_penalty_passing_plays = unclean_clean_matches(df_unclean_penalty_passing_plays, df_week1_plays_cleaned)

# Number of passing penalty plays
print(f"{len(list_unclean_penalty_passing_plays)} number of passing penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_passing_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_passing_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

62 number of passing penalty plays



(71, [74])
(:39) (No Huddle, Shotgun) J.Allen pass short right to S.Diggs ran ob at NYJ 39 for 8 yards
PENALTY on BUF-S.Diggs, Offensive Pass Interference, 10 yards, enforced at NYJ 47 - No Play.

(80, [83])
(11:46) (Shotgun) A.Rodgers pass incomplete short right to T.Conklin [G.Rousseau]
PENALTY on BUF-T.Bernard, Defensive Holding, 5 yards, enforced at NYJ 38 - No Play.

(358, [381])
(13:38) (No Huddle, Shotgun) A.Richardson pass short right to E.Hull to JAX 47 for no gain (D.Lloyd)
PENALTY on JAX-D.Lloyd, Unnecessary Roughness, 15 yards, enforced at JAX 47 - No Play.

(374, [397])
(3:10) T.Lawrence pass deep right to C.Ridley to IND 30 for 24 yards (K.Moore)
PENALTY on JAX-W.Little, Offensive Holding, 10 yards, enforced at JAX 46 - No Play.

(383, [409])
(11:31) (Shotgun) A.Richardson pass short middle to M.Pittman to IND 49 for 4 yards (Dari.Williams)
PENALTY on JAX-T.Walker, Defensive Offside, 5 yards, enforced at IND 45 - No Play.

(522, [558]

In [61]:
# All rushing plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('penalty', case=False)]

# List for all indexes that are rushing play type penalty plays
list_unclean_penalty_rushing_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  rushing_play = re.findall(rusher_pattern, play)
  if len(rushing_play) > 0 or play.find('Aborted') != -1:
    list_unclean_penalty_rushing_plays.append(idx)

# Dataframe of all rushing plays with "penalty" outcomes
df_unclean_penalty_rushing_plays = week1_2023_plays.iloc[list_unclean_penalty_rushing_plays]

dict_unclean_to_clean_penalty_rushing_plays = unclean_clean_matches(df_unclean_penalty_rushing_plays, df_week1_plays_cleaned)

# Number of rushing penalty plays
print(f"{len(list_unclean_penalty_rushing_plays)} number of rushing penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_rushing_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_rushing_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

22 number of rushing penalty plays



(264, [281])
(6:37) (Shotgun) J.Fields scrambles left end pushed ob at CHI 26 for 6 yards (D.Campbell)
PENALTY on CHI-B.Jones, Offensive Holding, 10 yards, enforced at CHI 20 - No Play.

(376, [401])
(1:03) (Shotgun) D.Jackson right end pushed ob at IND 43 for 16 yards (A.Cisco)
PENALTY on IND-M.Pittman, Offensive Holding, 10 yards, enforced at IND 27 - No Play.

(619, [658])
(11:05) N.Harris reported in as eligible
 N.Chubb left guard to CIN 41 for 1 yard (D.Reader)
PENALTY on CLE, Illegal Formation, 5 yards, enforced at CIN 42 - No Play.

(954, [1006])
(7:33) (Shotgun) D.Jones scrambles right end pushed ob at NYG 35 for 4 yards (C.Golston)
PENALTY on NYG-B.Bredeson, Offensive Holding, 10 yards, enforced at NYG 31 - No Play.

(964, [1017])
(1:10) (Shotgun) T.Taylor scrambles right end to NYG 38 for 6 yards (D.Bland)
PENALTY on NYG-J.Ezeudu, Offensive Holding, 10 yards, enforced at NYG 32 - No Play.

(983, [1039])
(3:10) (Shotgun) L.Jackson left ta

In [62]:
# All "False Start" or "Delay of Game" plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('penalty', case=False)]

# List for all indexes that are "False Start" or "Delay of Game" penalty plays
list_unclean_penalty_false_start_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  if play.find('False Start') != -1 or play.find('Delay of Game') != -1:
    list_unclean_penalty_false_start_plays.append(idx)

dict_unclean_to_clean_penalty_false_start_plays = unclean_clean_matches(week1_2023_plays.iloc[list_unclean_penalty_false_start_plays], df_week1_plays_cleaned)

# Number of "False Start" or "Delay of Game" penalty plays
print(f"{len(list_unclean_penalty_false_start_plays)} number of false start or delay of game plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_false_start_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_false_start_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

41 number of false start or delay of game plays



(7, [7])
(9:24) (Shotgun) PENALTY on BUF-D.Knox, False Start, 5 yards, enforced at BUF 23 - No Play.

(87, [91])
(2:58) (Shotgun) PENALTY on NYJ-L.Tomlinson, False Start, 5 yards, enforced at NYJ 22 - No Play.

(135, [142])
(Kick formation) PENALTY on NYJ, Delay of Game, 5 yards, enforced at BUF 15 - No Play.

(148, [155])
(10:00) (Shotgun) PENALTY on BUF-S.Brown, False Start, 5 yards, enforced at BUF 25 - No Play.

(181, [191])
(2:28) (Shotgun) PENALTY on GB, Delay of Game, 5 yards, enforced at CHI 43 - No Play.

(211, [222])
(15:00) (Shotgun) PENALTY on GB-D.Wicks, False Start, 5 yards, enforced at GB 46 - No Play.

(245, [262])
(4:00) (No Huddle, Shotgun) PENALTY on CHI-B.Jones, False Start, 5 yards, enforced at GB 31 - No Play.

(246, [263])
(3:39) (Shotgun) PENALTY on CHI-B.Jones, False Start, 5 yards, enforced at GB 36 - No Play.

(312, [334])
(Pass formation) PENALTY on CHI-D.Wright, False Start, 5 yards, enforced at GB 2 - No Play.

(347, [369]

In [63]:
# All sacked plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('penalty', case=False)]

# List for all indexes of penalty plays whose description mentions a sack
list_unclean_penalty_sacked_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  if play.find('sacked') != -1:
    list_unclean_penalty_sacked_plays.append(idx)

dict_unclean_to_clean_penalty_sacked_plays = unclean_clean_matches(week1_2023_plays.iloc[list_unclean_penalty_sacked_plays], df_week1_plays_cleaned)

# Number of sacked penalty plays
print(f"{len(list_unclean_penalty_sacked_plays)} number of sacked penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_sacked_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_sacked_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

5 number of sacked penalty plays



(285, [306])
(2:36) (No Huddle) J.Fields sacked at CHI 25 for -9 yards (R.Gary)
PENALTY on GB-K.Enagbare, Defensive Offside, 5 yards, enforced at CHI 34 - No Play
Penalty on GB-K.Enagbare, Defensive Too Many Men on Field, declined.

(955, [1007])
(7:11) (Shotgun) D.Jones sacked at NYG 16 for -5 yards (C.Golston)
PENALTY on DAL-M.Bell, Defensive Holding, 5 yards, enforced at NYG 21 - No Play.

(960, [1012])
(4:50) (No Huddle, Shotgun) D.Jones sacked at NYG 27 for -6 yards (S.Williams)
PENALTY on DAL-S.Williams, Defensive Offside, 5 yards, enforced at NYG 33 - No Play.

(2056, [2155])
(3:06) (Shotgun) J.Hurts sacked at NE 14 for -4 yards (D.Wise)
PENALTY on NE-K.Dugger, Defensive Holding, 5 yards, enforced at NE 10 - No Play.

(2528, [2650])
(7:35) (Shotgun) S.Howell sacked at WAS 9 for 0 yards (C.Thomas)
PENALTY on ARI-K.White, Lowering the Head to Make Forcible Contact, 15 yards, enforced at WAS 9 - No Play.



In [64]:
# All TWO-POINT CONVERSION ATTEMPT plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('penalty', case=False)]

# List for all indexes of penalty plays that are two-point conversion attempts
list_unclean_penalty_2pt_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  if play.find('TWO-POINT CONVERSION ATTEMPT') != -1:
    list_unclean_penalty_2pt_plays.append(idx)

dict_unclean_to_clean_penalty_2pt_plays = unclean_clean_matches(week1_2023_plays.iloc[list_unclean_penalty_2pt_plays], df_week1_plays_cleaned)

# Number of two-point conversion penalty plays
print(f"{len(list_unclean_penalty_2pt_plays)} number of two-point conversion penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_2pt_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_2pt_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

1 number of two-point conversion penalty plays



(2013, [2110])
TWO-POINT CONVERSION ATTEMPT
M.Jones rushes left end
ATTEMPT SUCCEEDS
PENALTY on NE-C.Anderson, Offensive Holding, 10 yards, enforced at PHI 2 - No Play.



In [65]:
# All special plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('penalty', case=False)]

# Start from every penalty play index, then remove the ones already categorised above
list_unclean_penalty_special_plays = list(df_unclean_penalty_plays.index)

for i in list_unclean_penalty_passing_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

for i in list_unclean_penalty_rushing_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

for i in list_unclean_penalty_false_start_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

for i in list_unclean_penalty_sacked_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

for i in list_unclean_penalty_2pt_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

dict_unclean_to_clean_penalty_special_plays = unclean_clean_matches(week1_2023_plays.iloc[list_unclean_penalty_special_plays], df_week1_plays_cleaned)

# Number of special penalty plays (penalties not covered by the categories above)
print(f"{len(list_unclean_penalty_special_plays)} number of special penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_special_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_special_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

10 number of special penalty plays



(1011, [1068])
(Kick formation) PENALTY on HOU-M.Collins, Defensive Offside, 1 yard, enforced at HOU 15 - No Play.

(1322, [1393])
(3:23) (Shotgun) PENALTY on DEN-F.Clark, Neutral Zone Infraction, 5 yards, enforced at LV 12 - No Play.

(1389, [1461])
(8:30) (No Huddle) PENALTY on LV-J.Tillery, Neutral Zone Infraction, 5 yards, enforced at DEN 45 - No Play.

(1458, [1531])
(13:33) (No Huddle, Shotgun) PENALTY on CAR, Defensive Too Many Men on Field, 5 yards, enforced at ATL 15 - No Play.

(1628, [1714])
(2:00) (Shotgun) PENALTY on MIN-P.Jones, Neutral Zone Infraction, 5 yards, enforced at TB 42 - No Play.

(1817, [1909])
(9:20) PENALTY on MIA, Defensive Too Many Men on Field, 0 yards, enforced at MIA 1 - No Play.

(1970, [2066])
(1:02) (No Huddle, Shotgun) PENALTY on PHI-J.Sweat, Neutral Zone Infraction, 5 yards, enforced at PHI 39 - No Play.

(2205, [2310])
(2:22) PENALTY on SEA-Ja.Reed, Encroachment, 5 yards, enforced at SEA 43 - No Play.

(2208, 
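
The "special" list above is built by popping each already-categorised index out of the full penalty index. A possible alternative, sketched here with a handful of index values copied from the printed outputs, is `pandas.Index.difference`. The variable names `all_penalty_index` and `categorised` are hypothetical.

```python
import pandas as pd

# A few index values from the outputs above; the real lists are longer.
all_penalty_index = pd.Index([7, 87, 285, 1011, 2013])
categorised = {
    "false_start": [7, 87],
    "sacked": [285],
    "two_point": [2013],
}

# Remove each category in turn; difference() ignores labels that are
# already gone, so overlapping categories do not raise.
remaining = all_penalty_index
for idx_list in categorised.values():
    remaining = remaining.difference(idx_list)

special_index = remaining.tolist()
print(special_index)  # [1011]
```

One reason to consider this: `list.pop(list.index(i))` raises `ValueError` as soon as a play belongs to two categories (for example a description that matches both a pass pattern and `'sacked'`), whereas the set-style difference tolerates the overlap.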

## Fumbles

In [66]:
# All fumbled plays

df_unclean_fumble_plays = week1_2023_plays.loc[week1_2023_plays['PlayOutcome'].str.contains('fumble', case=False)]

dict_unclean_to_clean_fumble_plays = unclean_clean_matches(df_unclean_fumble_plays, df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_fumble_plays)} number of fumbled plays")
print("\n\n")
for i in dict_unclean_to_clean_fumble_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_fumble_plays.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

20 number of fumbled plays



(66, [69])
(4:55) (Shotgun) J.Allen FUMBLES (Aborted) at BUF 21, and recovers at BUF 21
J.Allen to BUF 25 for 4 yards (M.Clemons)
FUMBLES (M.Clemons), RECOVERED by NYJ-Q.Williams at BUF 27.

(282, [303])
(5:08) (Shotgun) J.Fields sacked at CHI 18 for 0 yards (sack split by K.Clark and D.Wyatt)
FUMBLES (K.Clark) [D.Wyatt], RECOVERED by GB-R.Douglas at CHI 28
R.Douglas to CHI 28 for no gain (D.Moore)
PENALTY on GB-D.Campbell, Unnecessary Roughness, 15 yards, enforced at CHI 28.

(353, [376])
(4:19) (Shotgun) A.Richardson pass short middle to D.Jackson to JAX 35 for 6 yards (A.Cisco, F.Oluokun)
FUMBLES (A.Cisco), RECOVERED by JAX-A.Blackson at JAX 32.

(373, [396])
(3:19) D.Jackson right end to JAX 44 for -2 yards (Ty.Campbell)
FUMBLES (Ty.Campbell), touched at JAX 44, RECOVERED by JAX-D.Lloyd at JAX 46.

(577, [615])
(3:44) (Shotgun) D.Watson to CIN 25 for -4 yards
FUMBLES, and recovers at CIN 25
D.Watson to CIN 24 for 1 yard
Handoff to J.Ford to CIN 16 for 

In [67]:
# All passing fumble plays

index_fumble_pass_plays = []

for i in list(df_unclean_fumble_plays.index):
  fumble_pass = re.findall(receiver_pattern, week1_2023_plays['PlayDescription'].iloc[i])
  if len(fumble_pass) > 0:
    index_fumble_pass_plays.append(i)

dict_unclean_to_clean_fumble_pass = unclean_clean_matches(week1_2023_plays.iloc[index_fumble_pass_plays], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_fumble_pass)} number of passing fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_fumble_pass.keys():
  print(f"({i}, {dict_unclean_to_clean_fumble_pass.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

4 number of passing fumble plays



(353, [376])
(4:19) (Shotgun) A.Richardson pass short middle to D.Jackson to JAX 35 for 6 yards (A.Cisco, F.Oluokun)
FUMBLES (A.Cisco), RECOVERED by JAX-A.Blackson at JAX 32.

(758, [801])
(5:21) (Shotgun) J.Goff pass short right to M.Jones to KC 14 for 3 yards (T.McDuffie)
FUMBLES (T.McDuffie), RECOVERED by KC-B.Cook at KC 7.

(950, [1002])
(11:31) (Shotgun) D.Jones pass short left to I.Hodgins to NYG 49 for 24 yards (T.Diggs)
FUMBLES (T.Diggs), touched at DAL 49, RECOVERED by DAL-I.Mukuamu at DAL 40
I.Mukuamu to DAL 37 for -3 yards (S.Shepard).

(1942, [2036])
(4:56) (Shotgun) M.Jones pass short left to E.Elliott to NE 25 for no gain (J.Davis)
FUMBLES (J.Davis), RECOVERED by PHI-Z.Cunningham at NE 26.



In [68]:
# All rushing fumble plays

index_fumble_run_plays = []

for i in list(df_unclean_fumble_plays.index):
  fumble_run = re.findall(rusher_pattern, week1_2023_plays['PlayDescription'].iloc[i])
  if len(fumble_run) > 0:
    index_fumble_run_plays.append(i)

dict_unclean_to_clean_fumble_run = unclean_clean_matches(week1_2023_plays.iloc[index_fumble_run_plays], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_fumble_run)} number of rushing fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_fumble_run.keys():
  print(f"({i}, {dict_unclean_to_clean_fumble_run.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

4 number of rushing fumble plays



(373, [396])
(3:19) D.Jackson right end to JAX 44 for -2 yards (Ty.Campbell)
FUMBLES (Ty.Campbell), touched at JAX 44, RECOVERED by JAX-D.Lloyd at JAX 46.

(1566, [1646])
(2:29) (Shotgun) M.Sanders right guard to ATL 39 for 10 yards (R.Grant; D.Alford)
Atlanta challenged the runner was down by contact ruling, and the play was REVERSED
(Shotgun) M.Sanders right guard to ATL 39 for 10 yards (D.Alford, J.Bates)
FUMBLES (J.Bates), RECOVERED by ATL-L.Carter at ATL 39.

(2110, [2213])
(3:35) (Shotgun) J.Hurts left tackle to PHI 35 for 8 yards (J.Peppers)
FUMBLES (J.Peppers), RECOVERED by NE-Ma.Jones at PHI 41.

(2550, [2673])
(8:55) (Shotgun) A.Gibson up the middle to ARI 16 for 3 yards (V.Dimukeje; K.White)
FUMBLES (V.Dimukeje), RECOVERED by ARI-Z.Collins at ARI 16.



In [69]:
# All sacked fumble plays

index_fumble_sacked_plays = []

for i in list(df_unclean_fumble_plays.index):
  if week1_2023_plays['PlayDescription'].iloc[i].find('sacked') != -1:
    index_fumble_sacked_plays.append(i)

dict_unclean_to_clean_sacked_fumble = unclean_clean_matches(week1_2023_plays.iloc[index_fumble_sacked_plays], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_sacked_fumble)} number of sacked fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_sacked_fumble.keys():
  print(f"({i}, {dict_unclean_to_clean_sacked_fumble.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

6 number of sacked fumble plays



(282, [303])
(5:08) (Shotgun) J.Fields sacked at CHI 18 for 0 yards (sack split by K.Clark and D.Wyatt)
FUMBLES (K.Clark) [D.Wyatt], RECOVERED by GB-R.Douglas at CHI 28
R.Douglas to CHI 28 for no gain (D.Moore)
PENALTY on GB-D.Campbell, Unnecessary Roughness, 15 yards, enforced at CHI 28.

(1023, [1080])
(:31) (No Huddle, Shotgun) L.Jackson sacked at BAL 26 for -5 yards
FUMBLES, touched at BAL 30, RECOVERED by HOU-M.Stewart at BAL 32.

(1119, [1176])
(9:09) (Shotgun) C.Stroud sacked at HOU 38 for -9 yards (D.Ojabo)
FUMBLES (D.Ojabo) [D.Ojabo], RECOVERED by BAL-M.Pierce at HOU 42.

(1698, [1785])
(5:04) (Shotgun) K.Cousins sacked at MIN 21 for -9 yards (A.Winfield)
FUMBLES (A.Winfield) [A.Winfield], touched at MIN 20, RECOVERED by TB-A.Winfield at MIN 18.

(2341, [2451])
(5:42) (Shotgun) B.Purdy sacked at PIT 46 for -8 yards (T.Watt)
FUMBLES (T.Watt) [T.Watt], RECOVERED by PIT-T.Watt at PIT 46
PENALTY on SF-S.Burford, Face Mask, 15 yards, enforced at P

In [70]:
# All Aborted fumbled plays

# week1_2023_plays['PlayOutcome'].str.contains('fumble', case=False)

index_fumble_aborted_plays = []

for i in list(df_unclean_fumble_plays.index):
  if week1_2023_plays['PlayDescription'].iloc[i].find('Aborted') != -1:
    index_fumble_aborted_plays.append(i)

dict_unclean_to_clean_aborted_fumble = unclean_clean_matches(week1_2023_plays.iloc[index_fumble_aborted_plays], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_aborted_fumble)} number of aborted fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_aborted_fumble.keys():
  print(f"({i}, {dict_unclean_to_clean_aborted_fumble.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

4 number of aborted fumble plays



(66, [69])
(4:55) (Shotgun) J.Allen FUMBLES (Aborted) at BUF 21, and recovers at BUF 21
J.Allen to BUF 25 for 4 yards (M.Clemons)
FUMBLES (M.Clemons), RECOVERED by NYJ-Q.Williams at BUF 27.

(1695, [1782])
(8:23) K.Cousins FUMBLES (Aborted) at TB 26, touched at TB 27, RECOVERED by TB-J.Tryon at TB 25.

(1864, [1956])
(12:11) T.Tagovailoa FUMBLES (Aborted) at LAC 3, RECOVERED by LAC-N.Williams at LAC 6.

(2509, [2631])
(4:45) J.Dobbs FUMBLES (Aborted) at ARI 38, and recovers at ARI 37
J.Dobbs to ARI 37 for no gain (M.Sweat)
FUMBLES (M.Sweat), touched at ARI 38, RECOVERED by WAS-A.Anderson at ARI 37
PENALTY on ARI-P.Johnson, Unsportsmanlike Conduct, 15 yards, enforced at ARI 37.



In [71]:
# All special fumbled plays

index_fumble_special_plays = list(df_unclean_fumble_plays.index)

for i in index_fumble_pass_plays:
  index_fumble_special_plays.pop(index_fumble_special_plays.index(i))

for i in index_fumble_run_plays:
  index_fumble_special_plays.pop(index_fumble_special_plays.index(i))

for i in index_fumble_sacked_plays:
  index_fumble_special_plays.pop(index_fumble_special_plays.index(i))

for i in index_fumble_aborted_plays:
  index_fumble_special_plays.pop(index_fumble_special_plays.index(i))

dict_unclean_to_clean_fumble_special = unclean_clean_matches(week1_2023_plays.iloc[index_fumble_special_plays], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_fumble_special)} number of special fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_fumble_special.keys():
  print(f"({i}, {dict_unclean_to_clean_fumble_special.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

2 number of special fumble plays



(577, [615])
(3:44) (Shotgun) D.Watson to CIN 25 for -4 yards
FUMBLES, and recovers at CIN 25
D.Watson to CIN 24 for 1 yard
Handoff to J.Ford to CIN 16 for 8 yards (G.Pratt; N.Scott)
FUMBLES (G.Pratt), RECOVERED by CIN-C.Awuzie at CIN 13
C.Awuzie to CIN 13 for no gain (E.Moore).

(1135, [1192])
N.Folk kicks 63 yards from TEN 35 to NO 2
R.Shaheed to NO 21 for 19 yards (A.Hooker)
FUMBLES (A.Hooker), ball out of bounds at NO 24
PENALTY on NO-A.Prentice, Offensive Holding, 10 yards, enforced at NO 20
Tennessee challenged the ball was out of bounds ruling, and the play was REVERSED
N.Folk kicks 63 yards from TEN 35 to NO 2
R.Shaheed to NO 21 for 19 yards (A.Hooker)
FUMBLES (A.Hooker), RECOVERED by TEN-A.Hooker at NO 24
Penalty on NO-A.Prentice, Offensive Holding, declined.
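
The fumble cells split the 20 fumble plays into pass, run, sacked, aborted and special groups (4 + 4 + 6 + 4 + 2). A small helper can confirm that such groups do not overlap and report what is left over. This is only a sketch: `check_partition` is a hypothetical name, and the index values are a subset taken from the printed outputs above.

```python
from itertools import combinations

def check_partition(all_index, categories):
    """Return (overlaps, uncovered) for named index lists against all_index."""
    overlaps = {}
    for (name_a, idx_a), (name_b, idx_b) in combinations(categories.items(), 2):
        shared = sorted(set(idx_a) & set(idx_b))
        if shared:
            overlaps[(name_a, name_b)] = shared
    covered = set().union(*categories.values()) if categories else set()
    uncovered = sorted(set(all_index) - covered)
    return overlaps, uncovered

# Subset of the fumble indexes shown above; the full lists are longer.
all_fumbles = [66, 282, 353, 373, 577, 1135]
categories = {
    "pass": [353],
    "run": [373],
    "sacked": [282],
    "aborted": [66],
}

overlaps, uncovered = check_partition(all_fumbles, categories)
print(overlaps)   # {}
print(uncovered)  # [577, 1135]
```

Here the uncovered indexes are exactly the two plays the notebook lists as special fumbles, and an empty `overlaps` dict means the repeated `pop` calls would not fail on a duplicate.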



## Turnover On Downs

In [76]:
# All turnover on downs

dict_unclean_to_clean_turnover_on_downs = unclean_clean_matches(df_2023_turnover_on_downs_week1, df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_turnover_on_downs)} number of turnover on downs plays")
print("\n\n")
for i in dict_unclean_to_clean_turnover_on_downs.keys():
  print(f"({i}, {dict_unclean_to_clean_turnover_on_downs.get(i)})")
  play = week1_2023_plays['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

27 number of turnover on downs plays



(233, [248])
(1:10) P.Taylor left end to CHI 25 for 5 yards (J.Sanborn)
Penalty on GB-J.Deguara, Offensive Holding, declined.

(240, [256])
(11:39) J.Fields up the middle to CHI 40 for no gain (P.Smith)
Official measurement

(344, [366])
(11:58) (No Huddle) A.Richardson up the middle to JAX 16 for no gain (R.Robertson-Harris).

(349, [371])
(8:55) (Shotgun) A.Richardson pass incomplete short right to M.Pittman (A.Gotsis).

(406, [434])
(:52) (Shotgun) G.Minshew pass incomplete short right to M.Pittman (F.Oluokun).

(462, [494])
(4:29) (Shotgun) T.Lawrence pass short left to E.Engram to IND 49 for no gain (E.Speed).

(471, [503])
(13:01) B.Hance reported in as eligible
 T.Lawrence up the middle to IND 48 for no gain (Z.Franklin).

(553, [589])
(10:32) (Shotgun) J.Burrow sacked at CIN 18 for -13 yards (M.Garrett).

(727, [768])
(2:09) (Shotgun) P.Mahomes pass incomplete deep right to Ju.Watson [J.Cominsky].

(770, [814])
(:03) (Shotgun) J.Goff pass incomp

# Index searching

In [77]:
# week1_2023_plays['PlayDescription'].iloc[34]
week1_2023_plays.iloc[0]

Unnamed: 0,0
Season,2023
Week,Week 1
Day,MON
Date,09/11
AwayTeam,Bills
HomeTeam,Jets
Quarter,1ST QUARTER
DriveNumber,1
TeamWithPossession,BUF
IsScoringDrive,0


In [208]:
df_week1_plays_cleaned.iloc[2441]

Unnamed: 0,2441
Season,2023
Week,Week 1
Day,SUN
Date,09/10
AwayTeam,49ers
HomeTeam,Steelers
Quarter,3RD QUARTER
DriveNumber,1
TeamWithPossession,SF
IsScoringDrive,1


# Cleaned dataset observations

## Home and Away teams (Week 1, 2023)

In [79]:
# Season 2023 Week 1 schedule

df_2023_week1_schedule = df_week1_plays_cleaned[['HomeTeam', 'AwayTeam', 'Season', 'Date', 'Day']].drop_duplicates().sort_values(by='Date').reset_index(drop=True)

df_2023_week1_schedule

Unnamed: 0,HomeTeam,AwayTeam,Season,Date,Day
0,Chiefs,Lions,2023,09/07,THU
1,Bears,Packers,2023,09/10,SUN
2,Colts,Jaguars,2023,09/10,SUN
3,Browns,Bengals,2023,09/10,SUN
4,Giants,Cowboys,2023,09/10,SUN
5,Ravens,Texans,2023,09/10,SUN
6,Saints,Titans,2023,09/10,SUN
7,Broncos,Raiders,2023,09/10,SUN
8,Falcons,Panthers,2023,09/10,SUN
9,Vikings,Buccaneers,2023,09/10,SUN


## Offense Stats

Passing Example
1. Top 10 players who threw the ball the most
2. All passing plays from a specified player
3. Total passing yards from the specified player
4. All receivers who caught a pass from specified player
5. Top target receiver from specified player
6. Top target receiver catching yards

In [80]:
# 1. Top 10 players who threw the ball the most

passers = df_week1_plays_cleaned['Passer'].loc[(df_week1_plays_cleaned['Season'] == 2023) &
                                                (df_week1_plays_cleaned['Week'] == 'Week 1')].value_counts().head(10)

passers

Unnamed: 0_level_0,count
Passer,Unnamed: 1_level_1
,1592
M.Jones,53
K.Pickett,49
J.Allen,46
C.Stroud,46
T.Tagovailoa,45
K.Cousins,45
J.Fields,41
B.Young,39
A.Richardson,39
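
The first row of the output above has a blank label with 1592 plays, which looks like plays with no passer recorded. How the cleaned data stores a missing passer is not stated here, so the sketch below assumes it could be an empty string, the string `'nan'`, or a real missing value, and drops all three before counting. The frame `toy` is made up.

```python
import pandas as pd

# Hypothetical data covering the three assumed "no passer" encodings.
toy = pd.DataFrame({"Passer": ["C.Stroud", "", "nan", None, "C.Stroud", "J.Allen"]})

passer = toy["Passer"]
has_passer = passer.notna() & ~passer.isin(["", "nan"])

top_passers = passer[has_passer].value_counts().head(10)
print(top_passers.to_dict())  # {'C.Stroud': 2, 'J.Allen': 1}
```

With the placeholder rows removed, `head(10)` returns ten actual passers instead of nine plus the blank bucket.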


In [81]:
# 2. All passing plays from a specified player

passer = 'C.Stroud'

df_passing_plays_by = df_week1_plays_cleaned.loc[(df_week1_plays_cleaned['Passer'] == passer)].sort_index()

df_passing_plays_by

Unnamed: 0,Season,Week,Day,Date,AwayTeam,HomeTeam,Quarter,DriveNumber,TeamWithPossession,IsScoringDrive,...,InjuredPlayers,AcceptedPenalty,DeclinedPenalty,Kicker,LongSnapper,Returner,DownedBy,Holder,BlockedBy,TackleBy1
1103,2023,Week 1,SUN,09/10,Texans,Ravens,1ST QUARTER,2,HOU,0,...,,,,,,,,,,
1104,2023,Week 1,SUN,09/10,Texans,Ravens,1ST QUARTER,2,HOU,0,...,,,,,,,,,,
1106,2023,Week 1,SUN,09/10,Texans,Ravens,1ST QUARTER,4,HOU,0,...,,,,,,,,,,
1109,2023,Week 1,SUN,09/10,Texans,Ravens,1ST QUARTER,4,HOU,0,...,,,,,,,,,,
1113,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,1,HOU,0,...,,,,,,,,,,
1114,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,1,HOU,0,...,,,,,,,,,,
1118,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1120,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1121,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1122,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,


In [82]:
df_positive_passing_yards = df_passing_plays_by.loc[df_passing_plays_by['Yardage'] > 0]

# df_positive_passing_yards['Yardage'].sum()
df_positive_passing_yards

Unnamed: 0,Season,Week,Day,Date,AwayTeam,HomeTeam,Quarter,DriveNumber,TeamWithPossession,IsScoringDrive,...,InjuredPlayers,AcceptedPenalty,DeclinedPenalty,Kicker,LongSnapper,Returner,DownedBy,Holder,BlockedBy,TackleBy1
1109,2023,Week 1,SUN,09/10,Texans,Ravens,1ST QUARTER,4,HOU,0,...,,,,,,,,,,
1118,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1121,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1122,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1124,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1125,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1126,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1134,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,5,HOU,1,...,,,,,,,,,,
1135,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,5,HOU,1,...,,,,,,,,,,
1141,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,5,HOU,1,...,,,,,,,,,,


In [83]:
# 3. Total passing yards from the specified player

total_passing_yards = df_passing_plays_by['Yardage'].sum()

total_passing_yards

212.0

In [84]:
# 4. All receivers who caught a pass from specified player

df_all_passing_targets = df_passing_plays_by['Receiver'].loc[(df_passing_plays_by['Receiver'] != 'nan')].value_counts()

df_all_passing_targets

Unnamed: 0_level_0,count
Receiver,Unnamed: 1_level_1
N.Collins,11
R.Woods,10
D.Schultz,4
N.Dell,4
M.Boone,4
D.Pierce,3
N.Brown,3
C.Stroud,1
T.Quitoriano,1
X.Hutchinson,1


In [85]:
# 5. Top target receiver from specified player

df_passers_top_target_plays = df_passing_plays_by.loc[df_passing_plays_by['Receiver'] == df_all_passing_targets.head(1).index.tolist()[0]]

df_passers_top_target_plays

Unnamed: 0,Season,Week,Day,Date,AwayTeam,HomeTeam,Quarter,DriveNumber,TeamWithPossession,IsScoringDrive,...,InjuredPlayers,AcceptedPenalty,DeclinedPenalty,Kicker,LongSnapper,Returner,DownedBy,Holder,BlockedBy,TackleBy1
1113,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,1,HOU,0,...,,,,,,,,,,
1120,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1125,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1126,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,3,HOU,1,...,,,,,,,,,,
1136,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,5,HOU,1,...,,,,,,,,,,
1140,2023,Week 1,SUN,09/10,Texans,Ravens,2ND QUARTER,5,HOU,1,...,,,,,,,,,,
1154,2023,Week 1,SUN,09/10,Texans,Ravens,3RD QUARTER,3,HOU,0,...,,,,,,,,,,
1159,2023,Week 1,SUN,09/10,Texans,Ravens,3RD QUARTER,5,HOU,0,...,,,,,,,,,,
1174,2023,Week 1,SUN,09/10,Texans,Ravens,4TH QUARTER,3,HOU,0,...,,,,,,,,,,
1175,2023,Week 1,SUN,09/10,Texans,Ravens,4TH QUARTER,3,HOU,0,...,[G.Fant],,,,,,,,,


In [86]:
# 6. Top target receiver catching yards

df_passers_top_target_plays['Yardage'].sum()

80.0
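
Steps 4 to 6 above look at one receiver at a time. As a rough sketch, a single `groupby` can produce the target count and yardage for every receiver of the chosen passer. The rows in `toy` are invented; only the column names `Passer`, `Receiver` and `Yardage` and the `'nan'` placeholder follow the notebook.

```python
import pandas as pd

# Made-up plays; yardage values are illustrative only.
toy = pd.DataFrame({
    "Passer":   ["C.Stroud", "C.Stroud", "C.Stroud", "C.Stroud", "J.Allen"],
    "Receiver": ["N.Collins", "N.Collins", "R.Woods", "nan", "S.Diggs"],
    "Yardage":  [12.0, 8.0, 5.0, 0.0, 20.0],
})

passer = "C.Stroud"
plays = toy[(toy["Passer"] == passer) & (toy["Receiver"] != "nan")]

# One row per receiver: number of plays and summed yardage
per_receiver = (
    plays.groupby("Receiver")["Yardage"]
         .agg(targets="count", yards="sum")
         .sort_values("targets", ascending=False)
)
print(per_receiver)
```

The first row of `per_receiver` then answers both "top target receiver" and "top target receiver catching yards" without a second filter.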

Rushing Example
1. All players who carried the ball from a specified team
2. All rushing plays from top rusher of a specified team
3. Total rushing yards from top rusher of a specified team


In [87]:
# 1. All players who carried the ball from a specified team
# - I need to map team names to their abbreviations in the future
#   - For right now 'Cowboys' == 'DAL'

team_abbreviation = 'DAL'

team_rushers = df_week1_plays_cleaned['Rusher'].loc[(df_week1_plays_cleaned['TeamWithPossession'] == team_abbreviation) &
                                                    (df_week1_plays_cleaned['Rusher'] != 'nan')].value_counts()

team_rushers

Unnamed: 0_level_0,count
Rusher,Unnamed: 1_level_1
T.Pollard,14
R.Dowdle,6
D.Vaughn,6
S.Barkley,4
D.Jones,3
K.Turpin,3
M.Breida,1
D.Bland,1
D.Prescott,1
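
The comment in the cell above notes that team names still need to be mapped to abbreviations. A minimal sketch of such a lookup follows. `TEAM_ABBREVIATIONS` and `team_to_abbreviation` are hypothetical names, the mapping is deliberately partial (standard NFL abbreviations for a few teams that appear in this notebook), and the frame `toy` is made up.

```python
import pandas as pd

# Partial, hand-written mapping; extend it to all 32 teams before real use.
TEAM_ABBREVIATIONS = {
    "Bills": "BUF", "Jets": "NYJ", "Cowboys": "DAL", "Giants": "NYG",
    "Texans": "HOU", "Ravens": "BAL", "49ers": "SF", "Steelers": "PIT",
}

def team_to_abbreviation(team_name):
    """Look up the abbreviation; raise a clear error for unmapped teams."""
    try:
        return TEAM_ABBREVIATIONS[team_name]
    except KeyError:
        raise KeyError(f"No abbreviation mapped for team {team_name!r}") from None

# Made-up rushing plays using the notebook's column names.
toy = pd.DataFrame({
    "TeamWithPossession": ["DAL", "NYG", "DAL"],
    "Rusher": ["T.Pollard", "S.Barkley", "R.Dowdle"],
})

team_abbreviation = team_to_abbreviation("Cowboys")
team_rushers = toy.loc[toy["TeamWithPossession"] == team_abbreviation, "Rusher"].value_counts()
print(team_rushers.to_dict())
```

Failing loudly on an unmapped name seems safer than returning a default, since a silent mismatch would just produce an empty result.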


In [88]:
# 2. All rushing plays from top rusher of a specified team

df_top_rushers_plays = df_week1_plays_cleaned.loc[df_week1_plays_cleaned['Rusher'] == team_rushers.head(1).index.tolist()[0]]

df_top_rushers_plays

Unnamed: 0,Season,Week,Day,Date,AwayTeam,HomeTeam,Quarter,DriveNumber,TeamWithPossession,IsScoringDrive,...,InjuredPlayers,AcceptedPenalty,DeclinedPenalty,Kicker,LongSnapper,Returner,DownedBy,Holder,BlockedBy,TackleBy1
875,2023,Week 1,SUN,09/10,Cowboys,Giants,1ST QUARTER,3,DAL,1,...,,,,,,,,,,
878,2023,Week 1,SUN,09/10,Cowboys,Giants,1ST QUARTER,3,DAL,1,...,,,,,,,,,,
888,2023,Week 1,SUN,09/10,Cowboys,Giants,2ND QUARTER,1,DAL,1,...,,,,,,,,,,
892,2023,Week 1,SUN,09/10,Cowboys,Giants,2ND QUARTER,1,DAL,1,...,,,,,,,,,,
904,2023,Week 1,SUN,09/10,Cowboys,Giants,2ND QUARTER,3,DAL,1,...,,,,,,,,,,
905,2023,Week 1,SUN,09/10,Cowboys,Giants,2ND QUARTER,3,DAL,1,...,,,,,,,,,,
909,2023,Week 1,SUN,09/10,Cowboys,Giants,2ND QUARTER,5,DAL,0,...,,,,,,,,,,
914,2023,Week 1,SUN,09/10,Cowboys,Giants,3RD QUARTER,1,DAL,1,...,,,,,,,,,,
917,2023,Week 1,SUN,09/10,Cowboys,Giants,3RD QUARTER,1,DAL,1,...,,,,,,,,,,
921,2023,Week 1,SUN,09/10,Cowboys,Giants,3RD QUARTER,1,DAL,1,...,,,,,,,,,,


In [89]:
# 3. Total rushing yards from top rusher of a specified team

df_top_rushers_plays['Yardage'].sum()

70.0

## Defense Stats

1. All defensive plays from a specified team
2. All solo tackles made from the specified team
3. All plays of the player with the most solo tackles

In [90]:
# 1. All defensive plays from a specified team

team_name = 'Jets'
team_abbreviation = 'NYJ'

df_all_game_plays = df_week1_plays_cleaned.loc[(df_week1_plays_cleaned['HomeTeam'] == team_name) |
                                                (df_week1_plays_cleaned['AwayTeam'] == team_name)]

df_all_defensive_plays = df_all_game_plays.loc[df_all_game_plays['TeamWithPossession'] != team_abbreviation]

df_all_defensive_plays

Unnamed: 0,Season,Week,Day,Date,AwayTeam,HomeTeam,Quarter,DriveNumber,TeamWithPossession,IsScoringDrive,...,InjuredPlayers,AcceptedPenalty,DeclinedPenalty,Kicker,LongSnapper,Returner,DownedBy,Holder,BlockedBy,TackleBy1
0,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,...,,,,G.Zuerlein,,,,,,
1,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,...,,,,,,,,,,
2,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,...,,,,,,,,,,
3,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,...,,,,,,,,,,
4,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76,2023,Week 1,MON,09/11,Bills,Jets,4TH QUARTER,6,BUF,1,...,,,,,,,,,,
77,2023,Week 1,MON,09/11,Bills,Jets,4TH QUARTER,6,BUF,1,...,,,,,,,,,,
78,2023,Week 1,MON,09/11,Bills,Jets,4TH QUARTER,6,BUF,1,...,,,,,,,,,,
79,2023,Week 1,MON,09/11,Bills,Jets,4TH QUARTER,6,BUF,1,...,,,,,,,,,,


In [91]:
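# 2. All solo tackles made from the specified team
# - The cleaned frame has no 'TackleBy2' column (indexing it raised KeyError: 'TackleBy2'),
#   so a missing column is treated as "no second tackler".

tackle_by_2 = df_all_defensive_plays.get('TackleBy2', 'nan')

df_all_solo_tackles = df_all_defensive_plays['TackleBy1'].loc[(df_all_defensive_plays['TackleBy1'] != 'nan') &
                                                              (tackle_by_2 == 'nan')].value_counts()

df_all_solo_tackles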

In [None]:
# 3. All plays of the player with the most solo tackles
# - .get() falls back to 'nan' when the 'TackleBy2' column is absent from the cleaned frame

df_player_with_most_tackles = df_week1_plays_cleaned.loc[(df_week1_plays_cleaned['TackleBy1'] == df_all_solo_tackles.head(1).index.tolist()[0]) &
                                                         (df_week1_plays_cleaned.get('TackleBy2', 'nan') == 'nan')]

df_player_with_most_tackles