<a href="https://colab.research.google.com/github/KeoniM/NFL_Data_Cleaning/blob/main/NFL_Plays_Week2_2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**PURPOSE:**
- Accurately clean a week's worth of play data
  - Season 2023 -> Week 2

**THOUGHTS, CONCERNS AND IDEAS FOR LATER:**

*General*

1. Players with the same name
  - I think the raw data has naming conventions to distinguish between two players with the exact same name, but I am not 100% sure.

2. Cleaning check (TESTING)
  - I need a method that will help discern whether these plays have been cleaned correctly. Currently I am checking manually, but this is neither sustainable nor efficient.
    - **IDEA:** Cross-reference recorded NFL stats with the stats here and compare them. (Maybe return a df that highlights differences?)
    - **PROBLEM:** Different organizations(?) record player stats differently. For example, NFL.com and espn.com do not agree on what counts as a solo tackle versus an assisted tackle.
    - **PROBLEM 2:** NFL.com does not have every play recorded on their website. So far I have found that it is common for a game to be missing at least 1 play.
      - What do I do about this?

3. Adjust features (PlayOutcomes/PlayTypes/IsScoringDrive/etc...) for plays that have been split up into multiple rows (Fumble Recoveries, Interceptions, etc...).
  - EXAMPLE: A fumble recovery by anyone other than the running back on an intended running play should not count as rushing yards for the player who recovered the fumble.
    - IDEA: Should I broaden 'playtypes' to include:
      1. yardage after fumble (Currently have it as 'Run' playtype)
          - UPDATE: All recoveries after a fumble for yardage are under the umbrella playtype 'Fumble Return'
      2. yardage after interception (Currently have it as 'Interception')
          - UPDATE: All interceptions for yardage are under the umbrella playtype 'Run After Interception'
  - EXAMPLE: If a team throws an interception and that interception results in a touchdown for the opposing team, I do not think it should be considered as a 'scoring drive' for the team that threw the interception.
    - IDEA: For the category "isScoringDrive" the categories could be:
      1. 0 - Is not a scoring drive
      2. 1 - Is scoring drive for team on offense
      3. 2 - Is scoring drive for team on defense
      - ^^^ SHOULD IMPLEMENT ^^^

4. I need to add to 'dict_names'. The Rams have 2 different acronyms: 'LAR' and 'LA'.
  - This impacts a bit of code in here because I will now have to allow dictionaries to hold list-type values.

5. I am 1000000% sure that there are many ways to make this code more efficient, clean and easier to read.
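
The cross-referencing idea in item 2 could be prototyped with an outer merge. This is only a minimal sketch: the helper `diff_player_stats`, the column names (`Player`, `SoloTackles`) and both tiny tables are hypothetical placeholders, not real NFL.com or espn.com figures.

```python
import pandas as pd

def diff_player_stats(df_cleaned, df_reference, key="Player", stat="SoloTackles"):
    """Return only the rows where the two sources disagree on `stat`."""
    merged = df_cleaned.merge(
        df_reference,
        on=key,
        how="outer",
        suffixes=("_cleaned", "_reference"),
        indicator=True,  # adds a '_merge' column: 'both', 'left_only' or 'right_only'
    )
    left, right = f"{stat}_cleaned", f"{stat}_reference"
    # NaN != NaN evaluates to True, so players missing from one source are flagged too
    mismatch = merged[left] != merged[right]
    return merged[mismatch].reset_index(drop=True)

# Hypothetical players and numbers, for illustration only
df_cleaned = pd.DataFrame({"Player": ["A.Alpha", "B.Bravo", "C.Charlie"],
                           "SoloTackles": [3, 1, 2]})
df_reference = pd.DataFrame({"Player": ["A.Alpha", "B.Bravo", "D.Delta"],
                             "SoloTackles": [3, 2, 4]})

diff = diff_player_stats(df_cleaned, df_reference)
print(diff)
```

A diff like this only surfaces disagreements. It does not settle which source is right, so the two PROBLEM notes above still apply.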

*Offense*

1. Trick plays
  - Need a larger sample size that contains more trick plays

2. Handoffs
  - Need a larger sample size that contains more handoffs
    - (Only one has been found within the dataset "Season 2023 Week 1". It was handled for that specific play type, but handling has not been implemented for all play types.)
      - IDEA: Implement 'Handoff' into cleaning method for run plays. I believe the only time that you would be able to handoff the ball to someone is during a run play.

*Defense*

1. Nuance of players recorded for sacks & forced fumbles
  - Look under sack play type cleaning method
    - The formatting of multiple defending players involved in a fumbled play may cause data to be recorded incorrectly (e.g. a player who assisted on the tackle may be credited with the forced fumble)

2. SUBJECTIVE DEFENSIVE STAT RECORDING
  - Depending on where you look for defensive stats, the recordings may differ. For example, a solo tackle for one stat provider may be an assisted tackle for another. (e.g. 'NFL.com' <-> 'espn.com')
    - In my opinion, there are times when both of them have errors in their stat tables.
      - EXAMPLE:
        - "(7:54) (Shotgun) D.Jones sacked at NYG 15 for -10 yards (M.Parsons)"
          - M.Parsons - awarded 1 sack & 1 TFL
        - "(5:27) (No Huddle, Shotgun) D.Jones sacked at NYG 34 for -8 yards (C.Golston)"
          - C.Golston - awarded 1 sack & 0 TFL
        - This is shown in both NFL.com & espn.com
  - As another example, one stat crew may count an assisted tackle for loss as a TFL for that player, while another may record only an assisted tackle.
  - I have made tables to try to mimic both 'NFL.com' and 'espn.com', but the dataset is flexible and can be adjusted to preference.
    - ';' means solo and assisted tackle
    - ',' means 0.5 tackle
      - From what I have seen, the ordering matters when it comes to awarding defensive players a solo tackle vs. an assisted vs. awarding them anything at all.

3. Safety
  - I have not come across safeties yet.
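
The ';' and ',' conventions from item 2 can be written down as a small credit function. This is a sketch under stated assumptions: `credit_tackles` is a hypothetical helper, and the rule that the first name listed with ';' gets the solo tackle is my reading of the notes, not a confirmed scoring rule.

```python
from collections import defaultdict

def credit_tackles(tacklers):
    """Map the text inside a tackle parenthetical to per-player credit.

    Assumed convention (adjustable to preference):
      'A'    -> A gets a solo tackle
      'A; B' -> A gets a solo tackle, B gets an assisted tackle
      'A, B' -> A and B each get 0.5 of a tackle
    """
    credit = defaultdict(lambda: {"solo": 0, "assist": 0, "shared": 0.0})
    if ";" in tacklers:
        first, second = [name.strip() for name in tacklers.split(";", 1)]
        credit[first]["solo"] += 1
        credit[second]["assist"] += 1
    elif "," in tacklers:
        for name in tacklers.split(","):
            credit[name.strip()]["shared"] += 0.5
    else:
        credit[tacklers.strip()]["solo"] += 1
    return dict(credit)

print(credit_tackles("M.Parsons"))         # single tackler, as in the sack example above
print(credit_tackles("A.Alpha; B.Bravo"))  # hypothetical names
print(credit_tackles("A.Alpha, B.Bravo"))  # hypothetical names
```

Keeping the rule in one function means switching between an 'NFL.com' style table and an 'espn.com' style table only requires changing this mapping.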

# MOUNTING AND IMPORTS

In [None]:
# Mount your Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Used to access personal google cloud services
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [None]:
# Imports

# For math (currently only using to check for 'nan' or 'NaN' values)
import math

# Data manipulation
import pandas as pd

# Regular expressions
import re

# Grab data from database
from google.cloud import bigquery

In [None]:
# # debugger (maybe use in the future)
# %pdb on

# LOADING DATA (BigQuery queries)

In [None]:
# Client connect to bigquery project
client = bigquery.Client('nfl-data-430702')

## Season 2023 Week 2

In [None]:
# Grabbing all plays from 2023 Week 2 NFL Season
week2_2023_plays_query = """
                         SELECT *
                         FROM `nfl-data-430702.NFL_Scores.NFL-Plays-Week2_2023`
                         """

# Dry run: validates the query and reports how many bytes it would process, without actually running it
dry_run_config = bigquery.QueryJobConfig(dry_run=True)
dry_run_query = client.query(week2_2023_plays_query, job_config=dry_run_config)
print("This query will process {} bytes.".format(dry_run_query.total_bytes_processed))

# Running query (being mindful of the amount of data being grabbed)
# Caps the bytes billed at 1 GB (10**9 bytes); a query that would exceed the cap fails instead of running
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**9)
safe_config_query = client.query(week2_2023_plays_query, job_config=safe_config)

This query will process 575639 bytes.


In [None]:
# Putting data attained from query into a dataframe
week2_2023_plays = safe_config_query.to_dataframe()

In [None]:
week2_2023_plays.head()

Unnamed: 0,Season,Week,Day,Date,AwayTeam,HomeTeam,Quarter,DriveNumber,TeamWithPossession,IsScoringDrive,PlayNumberInDrive,IsScoringPlay,PlayOutcome,PlayDescription,PlayStart
0,2023,Week 2,SUN,09/17,Giants,Cardinals,1ST QUARTER,1,ARI,0,7,0,Pass Incomplete,(12:21) (Shotgun) J.Dobbs pass incomplete shor...,3rd & 6 at NYG 37
1,2023,Week 2,SUN,09/17,Giants,Cardinals,1ST QUARTER,1,ARI,0,1,0,Kickoff,"G.Gano kicks 65 yards from NYG 35 to end zone,...",Kickoff from NYG 35
2,2023,Week 2,SUN,09/17,Giants,Cardinals,4TH QUARTER,1,ARI,0,2,0,17 Yard Pass,(3:27) (Shotgun) J.Dobbs pass short middle to ...,1st & 10 at ARI 25
3,2023,Week 2,SUN,09/17,Giants,Cardinals,1ST QUARTER,1,ARI,0,6,0,Pass Incomplete,(12:27) J.Dobbs pass incomplete deep left to Z...,2nd & 6 at NYG 37
4,2023,Week 2,SUN,09/17,Giants,Cardinals,4TH QUARTER,1,ARI,0,7,0,-1 Yard Run,(:38) J.Conner right tackle pushed ob at NYG 4...,1st & 10 at NYG 43


In [None]:
# Noting the original size of the raw uncleaned dataframe of data
# - (rows, columns)
week2_2023_plays.shape

(2634, 15)

# CATEGORIZE PLAYS
- The goal here is to parse out the different values for 'PlayOutcome'
  - This is where I will separate different types of plays
    - ( pass / run / kickoff / etc. )

In [None]:
# Maybe try to fuzzywuzzy this in the future?
# - I need to narrow these down into basic categories.
# - (Take away numbers & "Yard")
# - Find the most common words between all outcomes (hoping to get all categories, e.g. 'Pass', 'Run', 'Touchdown', etc...)

# All play outcomes from the game
# - From here we can categorize and clean plays accordingly
week2_2023_plays['PlayOutcome'].unique()

array(['Pass Incomplete', 'Kickoff', '17 Yard Pass', '-1 Yard Run',
       'Touchdown Cardinals', '16 Yard Pass', '17 Yard Run', '1 Yard Run',
       '5 Yard Run', 'Punt', '-1 Yard Pass', '1 Yard Pass', '4 Yard Run',
       'Field Goal No Good', 'Extra Point Good', '5 Yard Pass',
       '14 Yard Pass', '12 Yard Pass', '23 Yard Pass', '7 Yard Run',
       '5 Yard Penalty', 'Touchdown Falcons', '-5 Yard Penalty',
       '4 Yard Pass', '45 Yard Pass', 'Run for No Gain', '20 Yard Pass',
       '3 Yard Run', '8 Yard Run', '-3 Yard Pass', '20 Yard Run',
       '2 Yard Run', 'Touchdown Ravens', '8 Yard Pass', '9 Yard Pass',
       '7 Yard Pass', '-10 Yard Penalty', '6 Yard Run', '25 Yard Penalty',
       '-7 Yard Sack', '3 Yard Pass', '10 Yard Pass', '14 Yard Run',
       'Field Goal', '19 Yard Pass', 'Touchdown Bills', '16 Yard Run',
       '11 Yard Run', '-2 Yard Pass', '11 Yard Pass', '-10 Yard Sack',
       '32 Yard Pass', 'Interception', 'Touchdown Bengals', '2 Yard Pass',
       'Touchd
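
The "take away numbers & 'Yard'" idea from the cell above could look roughly like this. It is a sketch only: `outcome_category` is a hypothetical helper, and the sample list is a hand-picked subset of the outcomes printed above.

```python
import re
from collections import Counter

def outcome_category(play_outcome):
    """Collapse a raw 'PlayOutcome' string into a coarse category label."""
    # '17 Yard Pass' / '-7 Yard Sack' -> 'Pass' / 'Sack'
    label = re.sub(r"^-?\d+ Yard ", "", play_outcome)
    # 'Touchdown Cardinals' -> 'Touchdown' (drop the team name)
    label = re.sub(r"^Touchdown .+$", "Touchdown", label)
    return label

sample = ["Pass Incomplete", "Kickoff", "17 Yard Pass", "-1 Yard Run",
          "Touchdown Cardinals", "-7 Yard Sack", "Run for No Gain",
          "5 Yard Penalty", "Field Goal No Good", "Touchdown Falcons"]

counts = Counter(outcome_category(outcome) for outcome in sample)
print(counts.most_common())
```

Anything the two substitutions do not recognize is returned unchanged, so an outcome that has not been seen before shows up as its own label instead of being dropped silently.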

In [None]:
# NOTES:
# - Currently, I am eyeballing all unique play outcomes to categorize them.
#   - This approach is not flexible, because a play outcome can
#     arise that has not been seen yet.
#     - There may be more play outcomes in the future when working on a full season,
#       let alone all seasons and future games

# Play Types with complete cleaning methods (As far as this sample size goes)

# ~ OFFENSE ~
df_2023_pass_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Pass')]
df_2023_run_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Run')]
# ~ DEFENSE ~
df_2023_interception_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Interception')]
df_2023_sack_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Sack')]
# ~ SPECIAL TEAMS ~
df_2023_punt_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Punt')]
df_2023_kickoff_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Kickoff')]
# ~ SCORING ~
df_2023_touchdown_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Touchdown')]
df_2023_extrapoint_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Extra Point')]
df_2023_fieldgoal_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Field Goal')]
df_2023_2pt_conversion_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('2PT Conversion')]
# ~ OTHER ~
df_2023_fumble_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Fumble')]
df_2023_penalty_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Penalty')]
df_2023_turnover_on_downs_week2 = week2_2023_plays[week2_2023_plays['PlayOutcome'].str.contains('Turnover on Downs')]


## SANITY CHECK (All Plays Accounted for)
  - Once all plays have been categorized, compare the sum of all plays within each category to the size of the original dataframe of plays.
    - Goal is to make sure the number of plays is the same.

In [None]:
# Categorized plays

plays_list = [df_2023_pass_week2,         # Offense
              df_2023_run_week2,
              df_2023_interception_week2, # Defense
              df_2023_sack_week2,
              df_2023_punt_week2,         # Special Teams
              df_2023_kickoff_week2,
              df_2023_touchdown_week2,    # Scoring
              df_2023_extrapoint_week2,
              df_2023_fieldgoal_week2,
              df_2023_2pt_conversion_week2,
              df_2023_fumble_week2,       # Other
              df_2023_penalty_week2,
              df_2023_turnover_on_downs_week2]

num_plays_categorized = 0

for plays in plays_list:
  num_plays_categorized = num_plays_categorized + len(plays)

num_plays_categorized == len(week2_2023_plays)

True
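
Comparing totals can pass by coincidence: if one play matches two filters and another matches none, the sum still equals the size of the original dataframe. A stricter check counts keyword matches per row. The sketch below assumes the same keywords as the `str.contains` filters above; `category_match_counts` is a hypothetical helper, and the last two sample outcomes are invented to trigger each failure mode.

```python
import pandas as pd

# Keywords mirror the str.contains filters used in the categorization cell
CATEGORY_KEYWORDS = ["Pass", "Run", "Interception", "Sack", "Punt", "Kickoff",
                     "Touchdown", "Extra Point", "Field Goal", "2PT Conversion",
                     "Fumble", "Penalty", "Turnover on Downs"]

def category_match_counts(play_outcomes):
    """Count how many category keywords each 'PlayOutcome' value contains."""
    hits = pd.DataFrame({kw: play_outcomes.str.contains(kw, regex=False)
                         for kw in CATEGORY_KEYWORDS})
    return hits.sum(axis=1)

# Hypothetical outcomes for illustration
outcomes = pd.Series(["17 Yard Pass", "Run for No Gain", "Field Goal No Good",
                      "Safety", "Fumble Return Touchdown"])
matches = category_match_counts(outcomes)

uncategorized = outcomes[matches == 0].tolist()
multi_categorized = outcomes[matches > 1].tolist()
print(uncategorized)       # ['Safety']
print(multi_categorized)   # ['Fumble Return Touchdown']
```

On the real data, `category_match_counts(week2_2023_plays['PlayOutcome'])` should be 1 for every row if each play lands in exactly one category.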

# HELPER METHODS (personal use)
- For personal use, does not actually take part in cleaning dataset at all.

In [None]:
# PURPOSE:
# - Quick look at a section of plays
#   - Ideally the plays that the user wants to break down and clean.
# INPUT PARAMETERS:
# df_all_plays      - DataFrame - The original dataframe where the desired plays to view came from
# df_section_plays  - DataFrame - A section of the original dataframe the user wants to view
# RETURN:
# - Printing to the console:
#   1. index of play (Within original dataframe)
#   2. 'PlayDescription' feature of play
#   3. 'PlayOutcome' feature of play
def print_plays(df_all_plays, df_section_plays):
  for idx, value in df_section_plays['PlayOutcome'].items():
    play = df_all_plays['PlayDescription'].loc[idx]  # idx is an index label, so look it up with .loc
    print("index: " + str(idx))
    for i in play.split(". "):
      print(i)
    print(value)
    print()

In [None]:
# EXAMPLE: Displaying all touchdown plays within dataset

print_plays(week2_2023_plays, df_2023_touchdown_week2)

index: 6
(15:00) J.Dobbs scrambles up the middle for 23 yards, TOUCHDOWN.
Touchdown Cardinals

index: 34
(11:54) (Shotgun) D.Ridder left end for 6 yards, TOUCHDOWN.
Touchdown Falcons

index: 55
(7:15) D.Faalele reported in as eligible
 G.Edwards left guard for 1 yard, TOUCHDOWN.
Touchdown Ravens

index: 71
(11:42) (Shotgun) L.Jackson pass deep right to N.Agholor for 17 yards, TOUCHDOWN.
Touchdown Ravens

index: 101
(6:06) (Shotgun) J.Allen pass short right to G.Davis for 2 yards, TOUCHDOWN.
Touchdown Bills

index: 107
(12:16) D.Edwards reported in as eligible
 J.Allen pass short left to D.Knox for 2 yards, TOUCHDOWN.
Touchdown Bills

index: 142
(13:31) J.Stout punts 54 yards to CIN 19, Center-T.Ott
C.Jones for 81 yards, TOUCHDOWN.
Touchdown Bengals

index: 156
(14:09) (Shotgun) D.Watson pass short right to J.Ford for 3 yards, TOUCHDOWN.
Touchdown Browns

index: 188
(9:25) (Shotgun) D.Prescott pass short left to J.Ferguson for 4 yards, TOUCHDOWN.
Touchdown Cowboys

index: 240
(13:40) (S

# PIPELINE
  - ORDER
    1. Team Dictionary
      - Used to map team names with their acronyms
    2. Regular expressions
      - Used to find common patterns within raw data
    3. Transforming Data
      - So far, only label encoding
    4. Cleaning methods
      - Unique cleaning methods for each play type
    5. Main pipeline method
      - Control flow of cleaning methods
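
As a rough picture of step 5, the control flow can be a keyword-to-function dispatch. This sketch is not the notebook's pipeline: `clean_pass`, `clean_run`, `CLEANERS` and `clean_play` are hypothetical stand-ins, and a play is modeled as a plain dict with a 'PlayOutcome' key.

```python
def clean_pass(play):
    # Hypothetical stand-in for a pass cleaning method
    return {"PlayType": "Pass", **play}

def clean_run(play):
    # Hypothetical stand-in for a run cleaning method
    return {"PlayType": "Run", **play}

# Checked in order; the first keyword found in 'PlayOutcome' wins
CLEANERS = [("Pass", clean_pass), ("Run", clean_run)]

def clean_play(play):
    for keyword, cleaner in CLEANERS:
        if keyword in play["PlayOutcome"]:
            return cleaner(play)
    # Surface unknown outcomes instead of silently dropping them
    raise ValueError(f"No cleaning method for outcome: {play['PlayOutcome']!r}")

print(clean_play({"PlayOutcome": "17 Yard Pass"}))
```

Raising on an unknown outcome is a deliberate choice here: it makes a play type that has not been handled yet visible as soon as a new week of data is loaded.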



## 1. TEAM DICTIONARY

In [None]:
# KEY: Team name
# VALUE: Acronym of team

dict_teams = {
    'Cardinals': 'ARI', 'Falcons': 'ATL', 'Ravens': 'BAL', 'Bills': 'BUF', 'Panthers': 'CAR', 'Bears': 'CHI',
    'Bengals': 'CIN', 'Browns': 'CLE', 'Cowboys': 'DAL', 'Broncos': 'DEN', 'Lions': 'DET', 'Packers': 'GB',
    'Texans': 'HOU', 'Colts': 'IND', 'Jaguars': 'JAX', 'Chiefs': 'KC', 'Raiders': 'LV', 'Chargers': 'LAC',
    'Rams': 'LAR', 'Dolphins': 'MIA', 'Vikings': 'MIN', 'Patriots': 'NE', 'Saints': 'NO', 'Giants': 'NYG',
    'Jets': 'NYJ', 'Eagles': 'PHI', 'Steelers': 'PIT', '49ers': 'SF', 'Seahawks': 'SEA', 'Buccaneers': 'TB',
    'Titans': 'TEN', 'Commanders': 'WAS'
}

In [None]:
# KEY: Acronym of team
# VALUE: Team name

dict_teams_2 = {
    'ARI': 'Cardinals', 'ATL': 'Falcons', 'BAL': 'Ravens', 'BUF': 'Bills', 'CAR': 'Panthers', 'CHI': 'Bears',
    'CIN': 'Bengals', 'CLE': 'Browns', 'DAL': 'Cowboys', 'DEN': 'Broncos', 'DET': 'Lions', 'GB': 'Packers',
    'HOU': 'Texans', 'IND': 'Colts', 'JAX': 'Jaguars', 'KC': 'Chiefs', 'LV': 'Raiders', 'LAC': 'Chargers',
    'LAR': 'Rams', 'MIA': 'Dolphins', 'MIN': 'Vikings', 'NE': 'Patriots', 'NO': 'Saints', 'NYG': 'Giants',
    'NYJ': 'Jets', 'PHI': 'Eagles', 'PIT': 'Steelers', 'SF': '49ers', 'SEA': 'Seahawks', 'TB': 'Buccaneers',
    'TEN': 'Titans', 'WAS': 'Commanders'
}
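
One possible way to handle the Rams' second abbreviation ('LA') without turning dictionary values into lists is to keep the aliases in a separate mapping and fold them into the abbreviation-to-name lookup. This is a sketch: `dict_teams_sample`, `extra_abbreviations` and `abbr_to_team` are hypothetical names, and only a three-team subset is shown.

```python
# Subset of dict_teams, for illustration
dict_teams_sample = {"Cardinals": "ARI", "Rams": "LAR", "Giants": "NYG"}

# Extra abbreviations seen in the raw data, keyed by team name
# (only the Rams' 'LA' is mentioned in the notes above)
extra_abbreviations = {"Rams": ["LA"]}

# Build abbreviation -> team name, so every alias resolves to a single team
abbr_to_team = {abbr: team for team, abbr in dict_teams_sample.items()}
for team, aliases in extra_abbreviations.items():
    for alias in aliases:
        abbr_to_team[alias] = team

print(abbr_to_team["LAR"], abbr_to_team["LA"])  # Rams Rams
```

Deriving the reverse lookup from the forward one also keeps the two dictionaries from drifting apart when a team is added or renamed.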

## 2. REGULAR EXPRESSIONS
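
Most patterns in this section are built by dropping a shared player-name fragment into larger f-string patterns. The sketch below copies four of those patterns (written as raw strings) and runs them against the sack description quoted in the notes at the top of the notebook, as a quick way to see what each capture group returns.

```python
import re

# Copied from the regex cell in this section, written as raw strings
name_pattern = r"(?:[A-Za-z]+-)*(?:[A-Za-z]{1,4}\.)+(?:[A-Za-z]+)?(?:[- ](?!to|pushed|INTERCEPTED|scrambles|for|pass|ran|is|at)[A-Za-z]+)*[^\W\d_\.]"
passer_name_pattern = rf"({name_pattern}) (?:pass|spiked|sacked)"
solo_tackle_pattern = rf"\(({name_pattern})\)"
yardage_from_sack = r"sacked(?: ob)? at(?: [A-Z]+)? [0-9]+ for (-?[0-9]+) yards"

# The name fragment should accept each surname style called out in the comments
for name in ["A.St. Brown", "C.Edwards-Helaire", "L.Van Ness"]:
    assert re.fullmatch(name_pattern, name), name

# Sack description quoted earlier in this notebook
play = "(7:54) (Shotgun) D.Jones sacked at NYG 15 for -10 yards (M.Parsons)"
passer = re.search(passer_name_pattern, play).group(1)
tackler = re.search(solo_tackle_pattern, play).group(1)
yards = int(re.search(yardage_from_sack, play).group(1))
print(passer, tackler, yards)  # D.Jones M.Parsons -10
```

Small checks like these, one per pattern, are a cheap way to catch a regression when `name_pattern` has to change for a new surname style.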

In [None]:
####################################################
# REGULAR EXPRESSIONS USED TO LOCATE SPECIFIC DATA #
####################################################

# Will eventually have to combine some regular expressions into one
# - For example, punt returns <-> kick returns <-> interceptions <-> fumble recoveries (?)
#   - standard_play_end_pattern <-> defensive_takeaway_run_pattern
#     - Need to combine into 1.

###########
# GENERAL #
###########

# Players name (Grabs every variation come across so far)
# - I need this to be able to grab 'A.St. Brown' & 'C.Edwards-Helaire' & 'L.Van Ness'
# - I can imagine that I will have to change this again in the future.
#   - Specifically the 'compound surnames' part

#                                   V  V <-> meant to grab initial of first name and compound surnames such as "St." in "A.St. Brown"
#               V  team abr   V V 1 name abr  V    V last name V     VV <-> name separator ( - | . )                          V     V <-> last name 2
#               V 1 name w/-  V V nam w/. nam V                          V          common words that follow name           V             V      V <-> does not end with..?
#               V no team abr V
# Raw string so that '\.', '\W' and '\d' reach the regex engine without invalid-escape warnings
name_pattern = r"(?:[A-Za-z]+-)*(?:[A-Za-z]{1,4}\.)+(?:[A-Za-z]+)?(?:[- ](?!to|pushed|INTERCEPTED|scrambles|for|pass|ran|is|at)[A-Za-z]+)*[^\W\d_\.]" # <-- this seems extra.

################
# PLAY DETAILS #
################

# Play start time
time_on_clock_pattern = r'\((\d*:\d+)\)'

# Offense play formation
formation = r'\(([A-Za-z]+ ?[A-Za-z]*,? ?[A-Za-z]*)\)'

# Yards gained on play
# - Will probably have to adjust this in the future to include 'no gain'
yardage_gained = r'for (-?[0-9]+) yards?'
# yardage_gained = r'for (no gain|-?[0-9]+)(?: yards?)?'
# for (no gain|-?[0-9]+)(?: yards?)?

# Officially, a pass for -3 yards.
# Yards gained on play (When discrepancy)
official_pass_yards_pattern = r'Officially, a pass for (-?[0-9]+) yards?'

# 4th & 8 at 50
# Positioning of the start of the play
# I do not think that this actually grabs all play starting positions.
play_start_pattern = "(?:1st|2nd|3rd|4th) & [0-9]+ at (?:([A-Z]+) )?([0-9]+)"

# Positioning at the end of the play
# - Probably needs to be able to grab 'no gain' at the end as well.
# standard_play_end_pattern = "(?:to|at) (?:([A-Z]+) )?([0-9]+) for (-?[0-9]+) yards?"
standard_play_end_pattern = "(?:to|at) (?:([A-Z]+) )?([0-9]+) for (no gain|-?[0-9]+)(?: yards?)?"

# fumble recovery field spotting
# - ball out of bounds            at BUF 25
# - recovered by BAL-K.Zeitler    at HOU 23
# - and recovers                  at TEN 9
# - RECOVERED by JAX-D.Lloyd      at JAX 46
# Team abbreviation is optional together with its trailing space, so 'at 50' also matches
fumble_recovery_spotting_pattern = f"(?:ball out of bounds|recovered by {name_pattern}|RECOVERED by {name_pattern}|and recovers) at (?:([A-Z]+) )?([0-9]+)"

interception_play_end_pattern = rf"INTERCEPTED by {name_pattern}(?: \[(?:{name_pattern})\]| \((?:{name_pattern})\))? at (?:([A-Z]+) )?([0-9]+)"

# Yardage from penalty
penalty_yardage_pattern = ", ([0-9]+) yards?, enforced at (?:([A-Z]+) )?([0-9]+)"

between_downs_penalty_yardage_pattern = ", ([0-9]+) yards?, enforced between downs"

###########
# OFFENSE #
###########

# Passer (Player passing, Player spiking, Player who got sacked)
passer_name_pattern = f"({name_pattern}) (?:pass|spiked|sacked)"

# Rushing play (Player running ball)
rusher_pattern = f"({name_pattern})(?: scrambles)? (?:left|right|up|kneels).?"

# Pass play (Returns intended receiver and the direction of the pass)
receiver_pattern = f"(short|deep) (left|right|middle) (?:to|intended for) ({name_pattern})"

# 2 Point Conversion (Pass attempt)
tp_conversion_pass_pattern = f"({name_pattern}) pass to ({name_pattern})"

# 2 Point Conversion (Rush attempt)
tp_conversion_rush_pattern = f"({name_pattern}) rushes (?:left|right|up)"

# Handoff
handoff_pattern = f"Handoff to ({name_pattern}) to(?: ([A-Z]+))? ([0-9]+) for (-?[0-9]+) yards?"

# Lateral
lateral_reception_pattern = f"Lateral to ({name_pattern}) (?:pushed ob at|ran ob at|to)(?: ([A-Z]+))? (-?[0-9]+) for (no gain|-?[0-9]+)(?: yards?)?"

###########
# DEFENSE #
###########

# Tackles

# solo / sack
solo_tackle_pattern = rf"\(({name_pattern})\)"

# shared (',' separator: 0.5 tackle each)
shared_tackle_pattern = rf"\(({name_pattern}), ({name_pattern})\)"

# assisted (';' separator: solo and assisted tackle)
assisted_tackle_pattern = rf"\(({name_pattern}); ({name_pattern})\)"

# Pressure (Who applied pressure to passer)
# - I think it might be possible for multiple defenders to apply pressure to the passer.
defense_pressure_name_pattern = rf"\[({name_pattern})\]"

# Interception (Player who intercepted pass)
interception_name_pattern = f"INTERCEPTED by ({name_pattern})"


# Quarterback Fumbles (Quarterback fumble solo, Quarterback fumble solo -> who recovers, Quarterback <-> Center discrepancy)

# How far passer went before fumbling on his own
# Team abbreviation is optional (e.g. 'to 50') and singular 'yard' is allowed
qb_fumble_pattern = f" ({name_pattern}) to(?: [A-Z]+)? [0-9]+ for -?[0-9]+ yards?$" # Passer fumbles are always the initial action of the play

# Action directly after a quarterback only fumble
qb_fumble_description_pattern = r"^FUMBLES, "

# Fumbled snap (will be either the quarterback or the center)
qb_aborted_fumble_pattern = rf"({name_pattern}) FUMBLES \(Aborted\)"
qb_center_aborted_fumble_pattern = f"({name_pattern}) Aborted"
center_aborted_fumble_pattern = f"({name_pattern}) FUMBLES at"

# Forced fumbles (Player who forced the fumble)
forced_fumble_pattern = rf"FUMBLES \(({name_pattern})\)"

# Explicit forced fumble
# Fumble Forced by IND-54-D.Odeyingbo
explicit_forced_fumble_pattern = f"Fumble Forced by ({name_pattern})"


# Who recovered the fumble
fumble_recovery_pattern = f"by ({name_pattern}) at (?:([A-Z]+) )?([0-9]+)"

# fumble touched (causing down..?)
fumble_touch_pattern = f"touched at (?:([A-Z]+) )?([0-9]+)"

# Sack (Who is credited with a sack, who split sack, how many yards was the sack)

# Fumble from sack (Player who forced the fumble on a sack)
sacked_forced_fumble_sentence = rf"FUMBLES \({name_pattern}\) \[({name_pattern})\]"

# Split sack (Players who equally received credit for sack)
split_sack_pattern = f"sack split by ({name_pattern}) and ({name_pattern})"

# Yardage of sack (starting from line of scrimmage)
yardage_from_sack = r'sacked(?: ob)? at(?: [A-Z]+)? [0-9]+ for (-?[0-9]+) yards'

# Defense takeaway (takeaway for yardage)
# D.Hill pushed ob at 50 for 20 yards (J.Wills)
# J.Bates to ATL 49 for no gain (T.Marshall)
defensive_takeaway_run_pattern = f"({name_pattern}) (?:pushed ob at|ran ob at|to)(?: ([A-Z]+))? (-?[0-9]+) for (no gain|-?[0-9]+)(?: yards?)?" # yardage after fumble recovery & yardage after interception

# Defense takeaway (takeaway for touchdown)
touchdown_after_takeaway_pattern = f"({name_pattern}) for [0-9]+ yards, TOUCHDOWN" # touchdown after a fumble recovery or interception

#################
# SPECIAL TEAMS #
#################

# Punting play (Who was the punter, How many yards the ball went, Who was the long snapper)
punting_pattern = f"({name_pattern}) punts (-?[0-9]+) yards? to(?: ([A-Z]+) (-?[0-9]+)| -?[0-9]+| end zone), Center-({name_pattern})"

# Punt return (Who was returning the punt, How many yards did they go, The player(s) that tackled the returner)
# punt_return_pattern = f"({name_pattern}) (?:pushed ob at|ran ob at|to)(?: ([A-Z]+))? ([0-9]+) for"
punt_return_pattern = f"({name_pattern}) (?:pushed ob at|ran ob at|to)(?: ([A-Z]+))? ([0-9]+) for (no gain|-?[0-9]+)(?: yards?)?"

# J.Reed (didn't try to advance) to CHI 44 for no gain.
kick_return_pattern = rf"({name_pattern})(?: \(didn't try to advance\))? (?:pushed ob at|ran ob at|to)(?: ([A-Z]+))? (-?[0-9]+) for (?:no gain|(-?[0-9]+) yards? \(({name_pattern})(?:(?:,|;) ({name_pattern}))?\))" # yardage after kickoff

# Punt return resulting in fair catch
punt_fair_catch_pattern = f", fair catch by ({name_pattern})"

# Punt or kickoff downed by
kick_downed_by_pattern = f", downed by ({name_pattern})"

# Kickoff play (Who was the kicker, How many yards the ball was kicked )
kickoff_pattern = f"({name_pattern}) kicks(?: onside)? (-?[0-9]+) yards from (?:([A-Z]+) )?([0-9]+) to(?: ([A-Z]+) (-?[0-9]+)| -?[0-9]+| end zone)"

# Field goal (Good)
field_goal_good_pattern = f"({name_pattern}) (-?[0-9]+) yard field goal is GOOD, Center-({name_pattern}), Holder-({name_pattern})."

# Field goal (no good)
field_goal_no_good_pattern = f"({name_pattern}) (-?[0-9]+) yard field goal is No Good, ([A-Za-z]+(?: [A-Za-z]+)*), Center-({name_pattern}), Holder-({name_pattern})."

# Field goal (blocked)
field_goal_blocked_pattern = rf"({name_pattern}) (-?[0-9]+) yard field goal is BLOCKED \(({name_pattern})\), Center-({name_pattern}), Holder-({name_pattern}), RECOVERED by ({name_pattern})"

# Extra point (good)
extra_point_good_pattern = f"({name_pattern}) extra point is GOOD, Center-({name_pattern}), Holder-({name_pattern})."

# Extra point (no good)
extra_point_no_good_pattern = f"({name_pattern}) extra point is No Good, ([A-Za-z]+(?: [A-Za-z]+)*), Center-({name_pattern}), Holder-({name_pattern})."

##############
#  INJURIES  #
##############

# Injuries (Returns the player(s) who got injured during the play)
injury_pattern = f"[A-Z]+-({name_pattern}) was injured during the play"

  name_pattern = "(?:[A-Za-z]+-)*(?:[A-Za-z]{1,4}\.)+(?:[A-Za-z]+)?(?:[- ](?!to|pushed|INTERCEPTED|scrambles|for|pass|ran|is|at)[A-Za-z]+)*[^\W\d_\.]" # <-- this seems extra.
  interception_play_end_pattern = f"INTERCEPTED by {name_pattern}(?: \[(?:{name_pattern})\]| \((?:{name_pattern})\))? at (?:([A-Z]+) )?([0-9]+)"
  solo_tackle_pattern = f"\(({name_pattern})\)"
  shared_tackle_pattern = f"\(({name_pattern}), ({name_pattern})\)"
  assisted_tackle_pattern = f"\(({name_pattern}); ({name_pattern})\)"
  defense_pressure
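
As a quick sanity check, two of the patterns above that do not depend on `name_pattern` can be run against sample strings. The play descriptions below are made up for illustration and only mimic the phrasing the patterns expect; they are not rows from the data.

```python
import re

# Copied from the pattern definitions above.
yardage_from_sack = r'sacked(?: ob)? at(?: [A-Z]+)? [0-9]+ for (-?[0-9]+) yards'
fumble_touch_pattern = "touched at (?:([A-Z]+) )?([0-9]+)"

# Hypothetical play descriptions (not taken from the dataset).
sack_text = "(2:15) (Shotgun) J.Doe sacked at BUF 20 for -7 yards (A.Smith)."
touch_at_50 = "ball touched at 50."
touch_in_territory = "ball touched at KC 35."

sack_yards = int(re.search(yardage_from_sack, sack_text).group(1))
print(sack_yards)  # -7

# The 50 yard line has no team abbreviation, so the territory group is empty.
print(re.search(fumble_touch_pattern, touch_at_50).groups())         # (None, '50')
print(re.search(fumble_touch_pattern, touch_in_territory).groups())  # ('KC', '35')
```

Note that `re.search` reports the missing territory as `None`, while `re.findall` would report it as an empty string.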

## TRANSFORMING DATA

In [None]:
# Value mapping the "Quarter" feature
week2_2023_plays['Quarter'].unique()

array(['1ST QUARTER', '4TH QUARTER', '2ND QUARTER', '3RD QUARTER',
       'OVERTIME'], dtype=object)

In [None]:
week2_2023_plays_modified = week2_2023_plays.copy()

dict_replace_quarter = {'1ST QUARTER': 1, '2ND QUARTER': 2, '3RD QUARTER': 3, '4TH QUARTER': 4, 'OVERTIME': 5}

week2_2023_plays_modified['Quarter'] = week2_2023_plays_modified['Quarter'].map(dict_replace_quarter)
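
`Series.map` with a dictionary leaves NaN wherever a value has no matching key, so it can be worth confirming that every label is covered before mapping. A minimal pure-Python sketch using the labels from the `unique()` output above; `find_unmapped` is a hypothetical helper name and `'2OT'` is an invented label used only to show a miss:

```python
dict_replace_quarter = {'1ST QUARTER': 1, '2ND QUARTER': 2, '3RD QUARTER': 3,
                        '4TH QUARTER': 4, 'OVERTIME': 5}


def find_unmapped(labels, mapping):
    """Return the labels that have no key in mapping, sorted for stable output."""
    return sorted(set(labels) - set(mapping))


# Labels as reported by week2_2023_plays['Quarter'].unique() above.
observed = ['1ST QUARTER', '4TH QUARTER', '2ND QUARTER', '3RD QUARTER', 'OVERTIME']

print(find_unmapped(observed, dict_replace_quarter))            # []
print(find_unmapped(observed + ['2OT'], dict_replace_quarter))  # ['2OT']
```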

## 3. CLEANING METHODS

### HELPER CLEANING METHODS

#### helper method to find yardage between 2 spottings on field

In [None]:
# PURPOSE:
# - Return the yardage between 2 spottings on the field during a play.
#   - IMPORTANT: The yardage recorded for the play must be passed in as well; its sign
#     is used to work out which direction the ball is moving.
#   - This is needed when something affects the yardage gained during a play,
#     such as a penalty or fumble.

# INPUT PARAMETERS:
# start_spotting            -  tuple   - Where the start of the play or action took place
#                                        - EXAMPLE FORMAT: ('BUF', '20')
# end_spotting              -  tuple   - Where the end of the play or action took place
#                                        - EXAMPLE FORMAT: ('BUF', '30')
# yardage                   -  String  - Yardage recorded between start and end spotting
#                                        - EXAMPLE FORMAT: '10'

# RETURN:
# Yardage gained between start_spotting and end_spotting

def yardage_between_spottings(start_spotting, end_spotting, yardage):
  # Need to figure out which zone the player with the ball is in and which direction they are fighting for given:
  # 1. position yardage (+/-)
  #    - Take the starting position and ending position,
  #      is the difference between them positive or negative?
  #      - ( ending position - starting position = (+/-) )
  #        - EXAMPLE: BUF 20 -> BUF 30 is (+)
  # 2. play yardage (+/-)
  #    - Did the intended play gain yardage or lose yardage?

  # In what zone is the line of scrimmage located?
  # - There are 2 different zones that the line of scrimmage could be located.
  #   1. 0 yard - 49 yard (e.g. 'BUF 25')
  #   2. 51 yard - 100 yard (e.g. 'KC 25')
  # ~ 3. 50 yard line (e.g. '50') (50 yard line does not have a team acronym attached to it)

  # Standard cases (start position and end position are in the same zone):
  # position yardage (+) & play yardage (+):
  # - the starting position team zone is the beginning (0-50)
  # position yardage (-) & play yardage (+):
  # - the starting position team zone is the ending (50-100)
  # position yardage (+) & play yardage (-):
  # - the starting position team zone is the ending (50-100)
  # position yardage (-) & play yardage (-):
  # - the starting position team zone is the beginning (0-50)

  # Unique cases (start position and ending position are in different zones):
  # zones switch (e.g. KC 47 -> BUF 47)
  # play yardage (+):
  # - the starting position team zone is the beginning (0-50)
  # play yardage (-):
  # - the starting position team zone is the ending (50-100)

  starting_territory = start_spotting[0]
  starting_yardage = int(start_spotting[1])
  ending_territory = end_spotting[0]
  ending_yardage = int(end_spotting[1])
  yardage = int(yardage)

  # Standard cases
  if (starting_territory == ending_territory):
    # position yardage (+)
    if starting_yardage < ending_yardage:
      # play yardage (+)
      # starting position 0-50 zone
      if yardage > 0:
        starting_position = starting_yardage
        ending_position = ending_yardage
      # play yardage (-)
      # starting position 50-100
      else:
        starting_position = 100 - starting_yardage
        ending_position = 100 - ending_yardage
    # position yardage (-)
    else:
      # play yardage (+)
      # starting position 50-100
      if yardage > 0:
        starting_position = 100 - starting_yardage
        ending_position = 100 - ending_yardage
      # play yardage (-)
      # starting position 0-50
      else:
        starting_position = starting_yardage
        ending_position = ending_yardage
  else:
    # play yardage (+)
    # starting position 0-50
    if yardage > 0:
      starting_position = starting_yardage
      ending_position = 100 - ending_yardage
    # play yardage (-)
    # starting position 50-100
    else:
      starting_position = 100 - starting_yardage
      ending_position = ending_yardage

  return int(ending_position - starting_position)
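
The sign-based zone inference above is needed because a spotting such as `('BUF', '20')` does not say which team has the ball. If the possessing team's abbreviation were available for each play (an assumption; `offense` below is a hypothetical parameter that `yardage_between_spottings` does not receive), the same yardage could be sketched by converting each spotting to a 0-100 position first:

```python
def to_position(spotting, offense):
    """Convert (territory, yard_line) to yards from the offense's own goal line.

    Assumes an empty or missing territory means the 50 yard line.
    """
    territory, yard_line = spotting[0], int(spotting[1])
    if not territory or territory == offense:
        return yard_line          # own half, or the 50 itself
    return 100 - yard_line        # opponent's half


def yards_between(start_spotting, end_spotting, offense):
    """Signed yardage from start to end, from the offense's point of view."""
    return to_position(end_spotting, offense) - to_position(start_spotting, offense)


print(yards_between(('BUF', '20'), ('BUF', '30'), 'BUF'))  # 10
print(yards_between(('KC', '47'), ('BUF', '47'), 'KC'))    # 6
print(yards_between(('BUF', '45'), ('', '50'), 'BUF'))     # 5
print(yards_between(('KC', '25'), ('KC', '20'), 'BUF'))    # 5
```

This is an alternative technique, not a description of the function above; it trades the four-way sign analysis for one extra input.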

#### helper method for fumbles

In [None]:
# ~ HELPER METHOD ~

# PURPOSE:
# - Calculate the yardage gained on a play when a fumble has occurred.
#   - On certain fumbled plays, the yardage gained is not measured from start -> fumble spotting;
#     it is measured from start -> fumble recovery.

# INPUT PARAMETERS:
# start_spotting            -   list   - Where the start of the play or action took place
#                                        - EXAMPLE FORMAT: [('BUF', '21')]
#                                        - Often the 'start' of the play is the line of scrimmage
# fumble_spotting           -   list   - Where the fumble of the play or action took place
#                                        - EXAMPLE FORMAT: [('BUF', '22', '-1')]
# recovery_spotting         -   list   - Where the fumble recovery took place
#                                        - EXAMPLE FORMAT: [('BUF', '25')]
#                                        - The spotting of when a ball goes out of bounds
#                                          after a fumble also falls under 'recovery_spotting'

# RETURN:
# Yardage gained on fumbled play





# Ultimately I need to figure out:
# 1. How to clean these plays accurately
# 2. The format of cleaned plays





# RULES (How yardage is calculated when a player has fumbled):

# - Yardage awarded to a player that has fumbled during a play could either be:
#   1. Start Spotting -> Fumble spotting
#   2. Start Spotting -> Fumble recovery
#      - 'Start Spotting' could be:
#        a. line of scrimmage
#        b. spotting of catch during return (kickoff/punt)
#        c. fumble recovery(?)

# - START SPOTTING -> FUMBLE SPOTTING:
#   1. The opposing team recovers the fumble

# - START SPOTTING -> FUMBLE RECOVERY:
#   1. The same team that fumbled makes the fumble recovery
#   2. The fumble recovery is behind the fumble spotting
#      - The player that fumbled the ball does not gain yardage if the ball was
#        fumbled forward and recovered.

# - I am going to try and work through every possible schematic of what could happen
#   during a fumble and observe what possible feature values each scheme would have.
#   - Particularly I am looking at:
#     1. Yardage
#     2. Whether or not a player is recorded having a rush or pass attempt (carry or target)
#   - Important to back this up with real plays

# - Scenarios will have different variations of these

# 1. initial fumble               # <--- Right now I am thinking
#    - Beyond line of scrimmage   # <--- that this will have to
#    - Behind line of scrimmage   # <--- be cleaned separately
#      - Here I need to think of the 1st player that fumbled
#        - Run play
#          - Quarterback
#          - Runningback
#        - Passing play
#          - Quarterback & Receiver (both are affected)
#        - kickoff/punt
#          - Returner

# 2. recovery                                         # <---       ~ LOOP ~
#    - By the same team                               # <--- - Need to be able to
#    - By the opposing team                           # <---   handle an infinite
# 3. yardage after recovery                           # <---   amount of fumbles
#    - initial fumble (Beyond line of scrimmage)      # <---
#      - Beyond fumble spotting                       # <---
#      - Behind fumble spotting                       # <---
#        - Keep LOS in mind                           # <---
#        - Keep who recovered ball in mind            # <---
#          - opposing team                            # <---
#          - same team                                # <---
#            - different player                       # <---
#            - same player                            # <---
#    - initial fumble (Behind line of scrimmage)      # <---
#      - Beyond fumble spotting                       # <---
#      - Behind fumble spotting                       # <---
#        - Keep LOS in mind                           # <---
#        - Keep who recovered ball in mind            # <---
#          - opposing team                            # <---
#          - same team                                # <---
#            - different player                       # <---
#            - same player                            # <---
# ~ 4. fumble after fumble recovery                   # <---

# Design concerns
# - Formatting play breakdowns for a player that has fumbled multiple times in a single play.
#   - EXAMPLE:
#     - player 1 (KC 1 -> KC 5) (4 yard gain)
#     - player 1 fumbles
#     - player 1 recovers at KC 10
#     - player 1 (KC 10 -> KC 15)
#     - player 1 awarded:
#       - 14 yards rushing
#       - 1 rushing attempt
#       - 1 fumble
#       - 1 recovery
# - Goal of creating this:
#   - Creating a method that will accurately return the yardage awarded to players that have
#     fumbled.
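
One possible way to represent the example above, offered as a sketch rather than a settled format: store each stretch of possession as a segment and derive the stat line from the list. `Segment` and `rushing_line` are hypothetical names, positions are plain yard lines within one territory (KC 1 is `1`), and the sketch assumes every segment belongs to the same player.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int        # yard line where the player gained possession
    end: int          # yard line where this possession ended
    ended_by: str     # 'fumble' or 'down'


def rushing_line(segments):
    """Summarise consecutive possessions by one player on a single play."""
    return {
        'yards': segments[-1].end - segments[0].start,
        'attempts': 1,
        'fumbles': sum(s.ended_by == 'fumble' for s in segments),
        'recoveries': len(segments) - 1,   # every later segment began with a recovery
    }


# KC 1 -> KC 5, fumble, recovered at KC 10, KC 10 -> KC 15
play = [Segment(1, 5, 'fumble'), Segment(10, 15, 'down')]
print(rushing_line(play))
# {'yards': 14, 'attempts': 1, 'fumbles': 1, 'recoveries': 1}
```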

#############################################
# ATTEMPT TO CREATE A FUMBLE YARDAGE LAYOUT #
#############################################

# Variables:
# 1. Start (LOS / punt or kickoff return spotting)
# 2. Fumble spotting (Spotting of where the fumble took place)
# 3. Fumble recovery (Spotting of where the recovery of the fumble took place)
# 4. End (The play is over, this is where the play stops)
# 5. Player who initially fumbled
# 6. Player who recovered the fumble

# NOTE:
# - If a ball fumbled beyond the LOS is recovered behind the LOS, or a ball fumbled
#   behind the LOS is recovered beyond the LOS, it does not matter whether the same
#   or the opposing team recovers it: the player who initially fumbled is credited
#   with the smaller of the two yardages (LOS -> fumble spotting or
#   LOS -> fumble recovery).

# ____________________________________RUN PLAY____________________________________
# - 2+ entities here
#   1. rusher
#   2. recoverer (opposing team or same team)
#  ~3.

#   - Fumble BEYOND line of scrimmage
#     1. Yardage
#        - Opposing team recovered fumble
#          - 'player who initially fumbled' Yardage
#            - LOS -> fumble spotting
#        - Same team recovered fumble
#          a. different player from same team
#             - recovered BEFORE fumble spotting
#               - 'player who initially fumbled' Yardage
#                 - LOS -> fumble recovery
#             - recovered AFTER fumble spotting
#               - 'player who initially fumbled' Yardage
#                 - LOS -> fumble spotting
#          b. same player recovered own fumble
#             - LOS -> (Down OR fumble OR lateral) (?)
#               - I need examples.



#               - I think the yardage picks up where it left off..?
#                 There would be 2 separate actions in this case.
#                 1. LOS -> fumble
#                 2. fumble recovery -> some stop
#                 - The yardage here would be
#                   - LOS -> some stop
#                     - The issue is representing this somehow.



#   - Fumble BEHIND line of scrimmage
#     1. Yardage
#        - Opposing team recovered fumble
#          - 'player who initially fumbled' Yardage
#            - LOS -> fumble spotting
#        - Same team recovered fumble
#          a. different player on same team recovered fumble
#             1. 'different player' rushes and is stopped behind LOS
#                 - 'different player' features
#                   - Yardage = fumble recovery -> (Down OR fumble OR lateral) (player who recovered fumble)
#                   - PlayType = 'Fumble Recovery for yards' (?)
#                 - 'player who initially fumbled' features
#                   - Yardage = LOS -> fumble recovery OR LOS -> fumble spotting
#                     - This depends on which is further behind LOS
#                       - This makes sense because it is the same way if the fumble was beyond the LOS.
#                         - LOS -> fumble recovery (if the recovery was behind the fumble spotting)
#                         - LOS -> fumble spotting (if the fumble recovery was beyond the fumble spotting)
#                   - PlayType = Run
#             2. 'different player' rushes and goes beyond LOS
#                 - 'different player' features
#                   - Yardage = fumble recovery -> (Down OR fumble OR lateral) (player who recovered fumble)
#                   - PlayType = 'Fumble Recovery for yards' (?)
#                 - 'player who initially fumbled' features
#                   - Yardage = 0
#                     - All players who touched the ball before the player with the ball that goes beyond
#                       LOS will receive 0 Yardage
#                     - PlayType = Run
#          b. same player recovers their own fumble
#             - Yardage = LOS -> (Down OR fumble OR lateral) (player who recovered fumble)



# ________________________PASS PLAY________________________
#   - 3+ entities here
#     1. passer
#     2. receiver
#        - I think the yardage for the initial receiver who fumbled and
#          the yardage for the passer throwing will be the same
#     3. recoverer (opposing team or same team)

# - Pass BEYOND line of scrimmage and fumbled
#   1. Yardage
#      - Opposing team recovered fumble
#        - 'player who initially fumbled' Yardage
#          - LOS -> fumble spotting
#        - 'Quarterback' passing yards
#          - LOS -> fumble spotting
#      - Same team recovered fumble
#        a. different player on same team recovered fumble
#           - recovered BEFORE fumble spotting
#             - 'player who initially fumbled' Yardage
#               - LOS -> fumble recovery
#             - 'Quarterback' passing yards
#               - LOS -> fumble recovery
#           - recovered AFTER fumble spotting
#             - 'player who initially fumbled' Yardage
#               - LOS -> fumble spotting
#             - 'Quarterback' passing yards
#               - LOS -> fumble spotting
#        b. same player recovered own fumble
#           - 'player who initially fumbled' Yardage
#             - LOS -> down
#           - 'Quarterback' passing yards
#             - LOS -> down

# - Pass BEHIND line of scrimmage and fumbled
#   1. Yardage
#      - Opposing team recovered fumble
#        - 'player who initially fumbled' Yardage
#          - LOS -> fumble spotting
#        - 'Quarterback' passing yards
#          - LOS -> fumble spotting
#      - Same team recovered fumble
#        a. different player on same team recovered fumble
#           1. 'different player' rushes and is stopped behind LOS
#               - 'different player' features
#                 - Yardage = fumble recovery -> (Down OR fumble OR lateral) (player who recovered fumble)
#                 - PlayType = 'Fumble Recovery for yards' (?)
#               - 'player who initially fumbled' features
#                 - Yardage = LOS -> fumble recovery OR LOS -> fumble spotting
#                   - This depends on which is further behind LOS
#                 - PlayType = Pass
#           2. 'different player' rushes and goes beyond LOS
#               - 'different player' features
#                 - Yardage = fumble recovery -> (Down OR fumble OR lateral) (player who recovered fumble)
#                 - PlayType = 'Fumble Recovery for yards' (?)
#               - 'player who initially fumbled' features
#                 - Yardage = 0
#                   - All players who touched the ball before the player with the ball that goes beyond
#                     LOS will receive 0 Yardage
#                   - PlayType = Pass
#        b. same player recovers their own fumble
#           - Yardage = LOS -> (Down OR fumble OR lateral) (player who recovered fumble)
#             - Their yardage continues for the string of times they recovered their own fumble.
#               - The thing is that I need to capture all those different times that they fumbled
#                 while also grabbing the yardage from the starting point to the ending point of their
#                 last fumble.



###########################################################
# ATTEMPT TO CREATE A UNIVERSAL SCHEMA FOR FUMBLE YARDAGE #
###########################################################

# Variables needed
# - I am going to write out the schema first before trying to think of this.

# - Fumble BEYOND line of scrimmage
#   - Opposing team recovered fumble
#     - 'player who initially fumbled' Yardage
#       - LOS -> fumble spotting
#         - IF passing play:
#           - 'passer' Yardage
#             - LOS -> fumble spotting
#   - Same team recovered fumble
#     a. different player on same team recovered fumble
#       - recovered BEFORE fumble spotting
#         - 'player who initially fumbled' Yardage
#           - LOS -> fumble recovery
#             - IF passing play:
#               - 'passer' Yardage
#                 - LOS -> fumble recovery
#       - recovered AFTER fumble spotting
#         - 'player who initially fumbled' Yardage
#           - LOS -> fumble spotting
#             - IF passing play:
#               - 'passer' Yardage
#                 - LOS -> fumble spotting
#     b. same player recovered own fumble <-- This is the hard part about all of this.
#        - 'player who initially fumbled' Yardage
#           - LOS -> down
#             - This could be continuous. If a player picks up their own fumble, their
#               yardage for that play strings through all of their consecutive fumble recoveries.
#               They could fumble and recover 3+ times and their yardage would be from the
#               LOS to the spot where they are downed after their 3rd recovery. The trick is
#               being able to record each individual fumble and recovery while still capturing
#               the yardage recorded for that single play.
#               - I will need to keep track of:
#                 1. the start spotting (LOS most likely)
#                 2. the very end spotting (spotting of when they last had the ball after CONSECUTIVE fumbles and own recoveries)
#             - IF passing play:
#               - 'passer' Yardage
#                 - LOS -> down
#                   - Same as yardage of initial player who fumbled.
#
# - Fumble BEHIND line of scrimmage
#   - Opposing team recovered fumble
#     - 'player who initially fumbled' Yardage
#       - LOS -> fumble spotting
#         - IF passing play:
#           - 'passer' Yardage:
#             - LOS -> fumble spotting
#   - Same team recovered fumble
#     a. different player on same team recovered fumble
#        1. 'different player' rushes and is stopped BEHIND LOS
#           a. 'different player' stopped behind fumble spotting
#              - 'player who initially fumbled' Yardage
#                - LOS -> fumble recovery
#                  - IF passing play:
#                    - 'passer' Yardage:
#                       - LOS -> fumble recovery
#           b. 'different player' stopped in front of fumble spotting
#              - 'player who initially fumbled' Yardage
#                - LOS -> fumble spotting
#                  - IF passing play:
#                    - 'passer' Yardage:
#                       - LOS -> fumble spotting
#        2. 'different player' rushes and is stopped BEYOND LOS
#           - 'player who initially fumbled' Yardage
#             - 0
#               - All players who touched the ball before the player with the ball that
#                 crossed over LOS is awarded 0 yards
#               - IF passing play:
#                 - 'passer' Yardage
#                    - 0 (?)
#     b. same player recovers their own fumble
#        - 'player who initially fumbled' Yardage
#          - LOS -> down
#            - This is confusing because it could be split up into multiple runs
#              - A player could rush for 20 yards but fumble every 5 yards in a single play
#                - IF passing play:
#                  - 'passer' Yardage
#                    - LOS -> down
#                      - This is where formatting is going to get tricky



# - qb fumbles -> recovers -> pass
# - pass behind line of scrimmage -> fumble -> pass



# I think I am only trying to figure out the yardage for the initial player who fumbled
# CLAIMS:
# 1. No matter where the fumble and fumble recovery take place, whether behind the line of scrimmage
#    or beyond it, if the fumble was recovered behind the fumble spotting, then the yardage
#    for the initial player who fumbled the ball will always be measured to the fumble recovery. If
#    the fumble was recovered beyond the fumble spotting, the yardage for the initial player who fumbled the ball
#    will always be measured to the fumble spotting.
#    - Where this gets messy is if the ball was fumbled behind the line of scrimmage and recovered beyond the line of scrimmage:
#      the yardage for the initial player who fumbled will always be 0. This applies not only to the initial player who fumbled; if there
#      was a string of players who held the ball before the player who crossed over the LOS, they would all receive 0 yards
#      gained for their separate runs.
# 2. For passing plays, the initial player who fumbled and the quarterback will have the same yardage awarded for receiving and passing.
# 3. If opposing team recovers a fumble, the yardage will always be from the LOS -> Fumble spotting.
# 4. If the same team but a different player recovered the fumbled ball, the yardage for the initial player will always be
#    LOS -> fumble spotting UNLESS the ball was recovered before fumble spotting, then it is
#    LOS -> fumble recovery.
#    - Unique situation will break this.
#      - If the fumble spotting is (-) and the same team player recovered and gained (0+) yards,
#        the yardage for initial player who fumbled (and all others who touched the ball behind LOS)
#        is 0.



# Yardage for initial player who fumbled will be
# - Yardage = LOS -> Fumble spotting
#   - UNLESS
#     - same team different player recovered before fumble spotting (This also goes for balls that have gone out of bounds)
#       - Yardage = LOS -> fumble recovery
#   - OR UNLESS
#     - fumble spotting is (-) and same team different player gained (0+) yardage
#       - Yardage = 0
#         - I would need to somehow keep going down the line of fumbles and recoveries until I find
#           whether or not the team gained positive yards.
# - If same player who fumbled recovers own fumble
#   - Yardage = LOS -> Last spotting of when initial player had the ball.
#     - I will need to somehow keep going down the line of the same player continuously
#       fumbling and recovering to find the spot at which the player ended their hold
#       of the ball or were downed.



# vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv SHARPEN THIS vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

# Yardage for initial player who fumbled:
# - IF recovered by same team OR ball fell out of bounds
#   - IF same player recovered fumble:
#     - LOS -> end
#     - RETURN YARDAGE
#   - IF (recovery/out of bounds) spotting before fumble spotting:
#     - IF recovered (by different player):
#       - IF fumble spotting before LOS (-):
#         - IF recoverer goes past LOS:
#           - RETURN YARDAGE = 0.0
#     - LOS -> recovery/out of bounds
# - ELSE:
#   - LOS -> fumble spotting

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ SHARPEN THIS ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
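
A minimal sketch of the outline above, assuming every spotting has already been converted to signed yards relative to the line of scrimmage (negative means behind it). `fumbler_yardage` and its parameter names are hypothetical. The sketch applies the zero-yardage rule whenever the fumble was behind the LOS and a teammate carried the ball past it, which is one reading of these notes and may not match how official scorers handle every case.

```python
def fumbler_yardage(fumble, recovery, end, same_team, same_player, crossed_los):
    """Yardage credited to the player who fumbled first.

    fumble, recovery, end: signed yards from the LOS (end = where the fumbler's
        final possession stopped, used only when he recovers his own fumble).
    same_team: True if the fumbling team recovered or the ball went out of bounds.
    same_player: True if the fumbler recovered his own fumble.
    crossed_los: True if a recovering teammate advanced the ball past the LOS.
    """
    if same_team:
        if same_player:
            return end                  # LOS -> end
        if fumble < 0 and crossed_los:
            return 0                    # fumbled behind LOS, teammate got past it
        if recovery < fumble:
            return recovery             # LOS -> recovery / out of bounds
    return fumble                       # LOS -> fumble spotting


# Opposing team recovers a fumble 5 yards downfield.
print(fumbler_yardage(5, 8, 8, False, False, False))    # 5
# Teammate recovers 2 yards behind the fumble spot.
print(fumbler_yardage(5, 3, 3, True, False, False))     # 3
# J.Patterson play from the passing examples in these notes: fumble at -5,
# teammate recovery at -6, then a completed pass past the LOS.
print(fumbler_yardage(-5, -6, 8, True, False, True))    # 0
```

The sketch does not try to settle the open question about a fumble followed by an incomplete pass; that case would still return the negative recovery yardage here.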




# PASSING EXAMPLES:

# (270, [277, 278])
# (3:01) (Shotgun) J.Patterson to HOU 36 for -5 yards
# FUMBLES, recovered by HOU-C.Stroud at HOU 35
# C.Stroud pass short right to R.Woods to HOU 49 for 8 yards (E.Speed).

#   - If a fumble occurred behind the line of scrimmage (rushing attempt) and is
#     recovered for positive yardage, what do the stats look like for the original
#     player who fumbled?
#     - Does not receive a rushing attempt..? (but receives a fumble on stats sheet)

#   - If a fumble occurred behind the line of scrimmage and is passed for positive yards,
#     does the person who originally fumbled receive 0 yards?
#     - YES

# (1368, [1411, 1412])
# (9:07) (Shotgun) L.Fortner to JAX 22 for -9 yards
# FUMBLES, recovered by JAX-T.Lawrence at JAX 19
# T.Lawrence pass incomplete short left to Z.Jones.

#   - If a fumble occurred behind the line of scrimmage and is thrown for an
#     incomplete pass, does the player that originally fumbled receive the amount
#     of yards behind the line of scrimmage as pass/run yards?
#     - The original player that fumbled did not receive negative yards for the fumble,
#       but did receive a fumble.
#       - Is this because the pass was attempted?

# - I need to double check scenarios such as
#   1. What happens if the same player that fumbled the ball recovers the ball?
#      - Does their yardage pick up where it left off?
#   2. When is the yardage from START SPOTTING -> FUMBLE SPOTTING?












# This method is used to find the yardage for a player that has fumbled and a player
# from the same team recovered the fumble.

def fumble_recovery_yardage(start_spotting, fumble_spotting, recovery_spotting):

  start_zone = start_spotting[0][0]
  start_yardage = int(start_spotting[0][1])

  fumble_zone = fumble_spotting[0][0]
  fumble_yardage = int(fumble_spotting[0][1])
  if fumble_spotting[0][2] == 'no gain':
    fumble_play_yardage = 0
  else:
    fumble_play_yardage = int(fumble_spotting[0][2])

  recovery_zone = recovery_spotting[0][0]
  recovery_yardage = int(recovery_spotting[0][1])

  # Standard cases (start zone and fumble zone are same)
  if start_zone == fumble_zone:
    # position yardage (+) <-- I do not like this name. Need to change for clarity
    # - EXAMPLE: BUF 20 -> BUF 30
    if start_yardage < fumble_yardage:
      # play yardage (+)
      # - (starting position 0-50)
      # - (fumble position 0-50)
      if fumble_play_yardage > 0:
        starting_position = start_yardage
        fumble_position = fumble_yardage
        # fumble recovery spotting in same zone (0-50)
        if start_zone == recovery_zone:
          recovery_position = recovery_yardage
        # fumble recovery spotting in opposite zone (50-100)
        else:
          recovery_position = 100 - recovery_yardage
      # play yardage (-)
      # - (starting position 50-100)
      # - (fumble position 50-100)
      else:
        starting_position = 100 - start_yardage
        fumble_position = 100 - fumble_yardage
        # fumble recovery spotting in same zone (50-100)
        if start_zone == recovery_zone:
          recovery_position = 100 - recovery_yardage
        # fumble recovery spotting in opposite zone (0-50)
        else:
          recovery_position = recovery_yardage
    # position yardage (-)
    # EXAMPLE: BUF 30 -> BUF 20
    else:
      # play yardage (+)
      # - (starting position 50-100)
      # - (fumble position 50-100)
      if fumble_play_yardage > 0:
        starting_position = 100 - start_yardage
        fumble_position = 100 - fumble_yardage
        # fumble recovery spotting in same zone (50-100)
        if start_zone == recovery_zone:
          recovery_position = 100 - recovery_yardage
        # fumble recovery spotting in opposite zone (0-50)
        else:
          recovery_position = recovery_yardage
      # play yardage (-)
      # - (starting position 0-50)
      # - (fumble position 0-50)
      else:
        starting_position = start_yardage
        fumble_position = fumble_yardage
        # fumble recovery spotting in same zone (0-50)
        if start_zone == recovery_zone:
          recovery_position = recovery_yardage
        # fumble recovery spotting in opposite zone (50-100)
        else:
          recovery_position = 100 - recovery_yardage
  # Unique cases (start position and ending position are in different zones)
  else:
    # play yardage (+)
    # - (starting position 0-50)
    # - (fumble position 50-100)
    if fumble_play_yardage > 0:
      starting_position = start_yardage
      fumble_position = 100 - fumble_yardage
      # fumble recovery zone same as start zone (0-50)
      if start_zone == recovery_zone:
        recovery_position = recovery_yardage
      # fumble recovery zone opposite of start zone (50-100)
      else:
        recovery_position = 100 - recovery_yardage
    # play yardage (-)
    # - (starting position 50-100)
    # - (fumble position 0-50)
    else:
      starting_position = 100 - start_yardage
      fumble_position = fumble_yardage
      # fumble recovery zone same as start zone (50-100)
      if start_zone == recovery_zone:
        recovery_position = 100 - recovery_yardage
      # fumble recovery zone opposite of start zone (0-50)
      else:
        recovery_position = recovery_yardage

  # Fumble recovery only affects play yardage recorded when the same team recovers the ball
  # behind the spotting of the fumble.

  # If fumble spotting and recovery spotting are both negative
  # - Go with recovery spotting
  # If fumble spotting is negative and recovery spotting is positive
  # - Go with the negative one (smaller)
  # If fumble spotting is positive and recovery spotting is negative
  # - Go with the negative one (smaller)
  # If fumble spotting is positive and recovery spotting is positive
  # - Go with the smaller one

  fumble_yardage = fumble_position - starting_position
  recovery_yardage = recovery_position - starting_position

  if fumble_yardage < 0 and recovery_yardage < 0:
    yardage = recovery_yardage
  else:
    yardage = min(fumble_yardage, recovery_yardage)

  return yardage

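# ILLUSTRATIVE SKETCH (not part of the cleaning pipeline):
# - A minimal example of the 0-100 scale and the "which spotting counts" rules
#   from 'fumble_recovery_yardage', assuming the team with the ball is known up
#   front rather than inferred from the sign of the yardage.
# - 'sketch_field_position' and 'sketch_credited_yardage' are hypothetical names
#   used only for this example.
# - EXAMPLE (made-up play): BUF starts at BUF 21, fumbles at BUF 25 and recovers
#   at BUF 23, so the gains are 4 and 2 and the smaller one (2) is credited.

def sketch_field_position(offense, zone, yard_line):
  # Own territory maps to 0-50, opposing territory maps to 50-100
  return yard_line if zone == offense else 100 - yard_line

def sketch_credited_yardage(start, fumble, recovery):
  fumble_gain = fumble - start
  recovery_gain = recovery - start
  # Both spottings behind the start: go with the recovery spotting
  if fumble_gain < 0 and recovery_gain < 0:
    return recovery_gain
  # Every other combination: go with the smaller of the two
  return min(fumble_gain, recovery_gain)
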

# PURPOSE:
# - Universal helper method that extracts fumbled data from every playtype.

# BASIC PLAN:
# 1. Accept a play (single row from df) that has been fumbled.
# 2. Replace that single row with a dataframe containing all extracted data.
#    - These replacement dataframes are not limited to a single row, but can be many depending on the play.
#      - Fumbled plays can sometimes be seen as many plays
#        - FOR EXAMPLE:
#          play 1 - intended play
#          play 2 - run after fumble recovery

# BASIC DESIGN STEP BY STEP:
# 1. Split play description into significant actions and put into a list
#    EXAMPLES:
#    - intended play
#    - fumble recovery for yardage
# 2. Clean significant actions as their own rows
# 3. Create and return replacement dataframe containing all cleaned significant actions (or rows)

# INPUT PARAMETERS:
# df_plays                  - dataframe - dataframe of plays
# play                      -  String   - 'PlayDescription' of the current play that is being cleaned
# play_index                -  Integer  - index of play (Almost always from main dataframe of plays)
# main_action_patterns      -    list   - A list of regular expressions that are meant to pinpoint primary
#                                         actions within a play that will be used to extract these actions
#                                         to create a row within the replacement dataframe
# map_who_fumbled_patterns  -    map    - A map with the purpose of finding who was responsible for fumbling the ball
#                                         on a play.
#                                         KEY   - fumble regular expressions
#                                         VALUE - group number containing the player name within the regular expression.
# main_cleaning_method      - function  - A callback function (the function using this helper method) which
#                                         is used to clean intended play actions

# RETURN:
# df_multi_row_play - dataframe - dataframe of organized and cleaned actions stemming from a single unclean fumbled play

def extract_fumble_data(df_plays, play, play_index, main_action_patterns, map_who_fumbled_patterns, main_cleaning_method):

  original_play_copy = df_plays.loc[play_index]

  # Breaking play description into a list of sentences
  play_elements = play.split(". ")

  #################
  # KEY VARIABLES #
  #################

  # 'play_split' info:
  # - Designed to be a 2D list (list of lists)
  # - All elements within this list collectively will represent a single play.
  # - Each element within the list will become a separate row that will replace/add to the original dataframe of plays.
  #   - Each element represents a distinct action within the single play and will have all data required for that new row.
  #   ROW CONTENTS:
  #   1. [ ( The intended play ) + ( Extra data ) , ( Who caused the fumble ) ]   <-  This row will have extra info such as (injuries / penalties / eligibility / etc...)
  #                                                                                   - "The intended play" includes 'Aborted' plays
  # ~ 2. [            ( The fumble recovery )     , ( Who caused the fumble ) ]   <-  This can happen repeatedly or not at all
  # ~ 3. [ (The fumble recovery for a touchdown) ]                                <-  This can only happen once for a single play or not at all
  play_split = []

  # Explicit forced fumble
  # - Some play descriptions will state who forced the fumble. This is semi-rare
  explicit_forced_fumble = None

  # 'extra_data' info:
  # - Will be a single string containing all additional data from the play such as (injuries / penalties / eligibility / etc...)
  # - Will be put into a single row dataframe and cleaned
  #   - Once extra data has been cleaned, the single row (now clean) dataframe will serve as a shell for
  #     the first new row that will replace the old play within the main dataframe.
  #     - This first new row will have the initial action of the play as well as all additional information from the play
  extra_data = ""

  # Handoff after fumble recovery
  # - Quarterback fumbles the ball -> recovers the ball -> performs a handoff to another player
  #   - Question: Does this cover if a handoff happens -> fumble?
  #   - Question: Can I use this approach for a qb fumble -> passes?
  handoff_attempt = False

  # Because punt/kickoff returns are formatted similarly to many other playtypes, these are needed
  # in order to distinguish between those playtypes and punt/kickoff returns.
  is_punt = False
  is_kickoff = False

  # Interception fumbles so far have only occurred after the interception.
  # This is so I can change the 'PlayType' of the action after the interception
  is_interception = False

  # - Iterate through each element within play_elements
  # - NOTE: We are iterating through actions of the play chronologically
  for string in play_elements:

    ######################################
    # ORGANIZING KEY ACTIONS WITHIN PLAY #
    ######################################

    # ACTIONS WITHIN PLAY THAT DESERVE THEIR OWN ROW:
    # These situations will have their own list element within "play_split" (meaning their own row within the new cleaned replacement dataframe)
    # 1. intended play (initial action might be a better name for plays such as ones that have been aborted)
    # 2. runs after fumble recoveries (emphasis on the plural)
    # 3. touchdown after fumble recovery (can only happen once) (looks unique for each playtype) <- this might not be true.
    # 4. handoffs
    # any() avoids re-using the loop variable after the loop, which would fail on an empty pattern list
    if any(re.search(play_pattern, string) for play_pattern in main_action_patterns):
      play_split.append([string])
      continue

    # ADD ON SECTION (Actions that will add to elements that will obtain their own row)
    # - Appends data to elements within 'play_split'
    #   - Every element within play_split is a list, this section will add to those individual lists
    #     - Specifically it will append to the last element within 'play_split' and the reason for that
    #       is because as we are iterating through sentences chronologically, the appending element
    #       will always follow directly after the element that needs it
    # These situations will add to the last element within 'play_split' (For all playtypes)
    # 1. forced fumble description (happens after regular plays & sometimes after fumble recoveries)
    # 2. fumble description describing a qb only fumble (happens after a qb only fumble)
    # 3. fumble description naming the center on an aborted play (center_aborted_fumble_pattern)
    add_on_patterns = [forced_fumble_pattern, qb_fumble_description_pattern, center_aborted_fumble_pattern]
    if any(re.search(play_pattern, string) for play_pattern in add_on_patterns):
      play_split[-1].append(string)
      continue




    # Explicit forced fumble here.
    if re.search(explicit_forced_fumble_pattern, string):
      explicit_forced_fumble = string
      continue




    # When a sentence does not fit within the top 2 sections ( 1. adding an element to the list || 2. appending to an element in the list )
    # - Glue the sentence into 'extra_data' to be cleaned separately.
    extra_data = extra_data + string + ". "

  ##################################
  ## CLEANING ACTIONS WITHIN PLAY ##
  ##################################

  ###################################
  # CLEANING INITIAL ACTION OF PLAY #
  ###################################

  # print(play_index)
  # print(play_split)

  # GRABBING: Initial action of play (e.g. Intended play / aborted fumble / qb only fumble / etc...)
  intended_play_description = play_split.pop(0)

  # Creating a single row dataframe of the intended play
  # No matter what the initial action is, the description will always be the first element of the first element within 'play_split'
  unclean_intended_play = pd.DataFrame([original_play_copy.copy()], columns=df_plays.columns)
  unclean_intended_play['PlayDescription'] = intended_play_description[0]

  ### TEAM WITH POSSESSION AFTER RECOVERY ###
  # fumble_recovery_team info:
  # - The goal with this variable is to grab the team that recovered the fumble.
  #   - The reason why this is needed is so that the feature 'TeamWithPossession'
  #     mirrors what is happening and who has control of the ball during the play.
  #   - This is also used to correct yardage gained when a team recovers their own
  #     fumble.
  #     - This is the reason why it is up top, this is a need to know before calculating
  #       yardage gained during certain fumble recoveries.
  fumble_recovery_team = None

  # Figuring out who recovered the fumble and assigning 'fumble_recovery_team' with
  # the team that this player is on.
  for action in intended_play_description:

    if action.find('and recovers') != -1:
      unclean_intended_play['FumbleRecoveredBy'] = unclean_intended_play['WhoFumbled'].iloc[0]
      break

    # Who recovered the fumble
    if 'recovered' in action.lower():
      fumble_recovery = re.findall(fumble_recovery_pattern, action)
      if len(fumble_recovery) > 0:
        fumble_recovery_team = fumble_recovery[0][0].split("-")[0]
        unclean_intended_play['FumbleRecoveredBy'] = "-".join(fumble_recovery[0][0].split("-")[1:])
        break

  ### STANDARD INTENDED PLAYTYPE FUMBLES ###
  # - Typical play types such as runs or passes
  # - Will have 2 elements:
  #   1. Intended play
  #   2. Who or how they fumbled
  if len(intended_play_description) > 1:
    unclean_intended_play['FumbleDetails'] = intended_play_description[1]

    ### FORCED FUMBLE BY ###
    forced_fumble = re.findall(forced_fumble_pattern, intended_play_description[1])
    if len(forced_fumble) > 0:
      unclean_intended_play['ForcedFumbleBy'] = forced_fumble[0]

    ### WHO FUMBLED ###
    for fumble_outcome in map_who_fumbled_patterns.keys():
      fumbled_by = re.findall(fumble_outcome, intended_play_description[0])
      if len(fumbled_by) > 0:
        if isinstance(fumbled_by[0], tuple):
          unclean_intended_play['WhoFumbled'] = fumbled_by[0][map_who_fumbled_patterns[fumble_outcome]]
        else:
          unclean_intended_play['WhoFumbled'] = fumbled_by[map_who_fumbled_patterns[fumble_outcome]]
        break

    cleaned_intended_play = main_cleaning_method(unclean_intended_play)

    ### YARDAGE AFTER RECOVERY ###
    if fumble_recovery_team is None or fumble_recovery_team == unclean_intended_play['TeamWithPossession'].iloc[0]:
      # For some reason, 'aborted' plays do not count here.
      if intended_play_description[0].find('Aborted') == -1:

        # print(play_index)
        # print(play_split)
        # print(intended_play_description)

        # Play start ( FORMAT EX. - [('BUF', '21')] )
        line_of_scrimmage = re.findall(play_start_pattern, original_play_copy['PlayStart'])
        # fumble spotting ( FORMAT EX. - [('BUF', '22', '-1')] )
        fumble_spotting = re.findall(standard_play_end_pattern, intended_play_description[0])
        # recovery spotting ( FORMAT EX. - [('BUF', '25')] )
        fumble_touch = re.findall(fumble_touch_pattern, intended_play_description[1])
        if len(fumble_touch) > 0:
          recovery_spotting = fumble_touch # I guess if a fumble is "touched" then that is what stops the yards after fumble from adding onto yardage gained?



        # I need something for fumbles that end up out of bounds in the end zone for a touchback.
        # - For now, I KNOW THIS IS WRONG, I will mark it as recovered at the 0 yard line of the opposing teams zone.
        # NOTE: str.find() returns -1 (truthy) when the text is missing, so test membership instead
        elif 'ball out of bounds in End Zone, Touchback' in intended_play_description[0]:
          # find opposing team
          opposing_team_acronym = ""
          if unclean_intended_play['TeamWithPossession'].iloc[0] == dict_teams.get(unclean_intended_play['AwayTeam'].iloc[0]):
            opposing_team_acronym = dict_teams.get(unclean_intended_play['HomeTeam'].iloc[0])
          else:
            opposing_team_acronym = dict_teams.get(unclean_intended_play['AwayTeam'].iloc[0])
          recovery_spotting = [[opposing_team_acronym, '0']]

        else:
          recovery_spotting = re.findall(fumble_recovery_spotting_pattern, intended_play_description[1])
        # print(line_of_scrimmage)
        # print(fumble_spotting)
        # print(recovery_spotting)
        yardage = fumble_recovery_yardage(line_of_scrimmage, fumble_spotting, recovery_spotting)
        cleaned_intended_play['Yardage'] = yardage

  ### ABORTED FUMBLE ###
  # - Will grab rusher, an aborted fumble still counts as a rush attempt
  # - Will grab who was at fault for aborted play
  if intended_play_description[0].find('Aborted') != -1:
    rusher_patterns = [qb_fumble_pattern, qb_aborted_fumble_pattern, qb_center_aborted_fumble_pattern]
    rusher_name = None # Stays None if no rusher pattern matches
    for pattern in rusher_patterns:
      rusher = re.findall(pattern, play)
      if len(rusher) > 0:
        rusher_name = rusher[0]
        unclean_intended_play['Rusher'] = rusher_name
        break
    # Rusher at fault for aborted play
    if intended_play_description[0].find('(Aborted)') != -1:
      unclean_intended_play['WhoFumbled'] = rusher_name
      unclean_intended_play['FumbleDetails'] = intended_play_description[0]
    # Center at fault for aborted play
    else:
      center_at_fault = re.findall(center_aborted_fumble_pattern, intended_play_description[1])
      unclean_intended_play['WhoFumbled'] = center_at_fault
      unclean_intended_play['FumbleDetails'] = intended_play_description[1]
    unclean_intended_play['Yardage'] = 0
    cleaned_intended_play = unclean_intended_play # Technically unnecessary, but doing this to show that the play is now clean.

  ### KICKOFF FUMBLE ###
  #   - Because there is the kickoff, then the kickoff return, then the fumble on the kickoff return,
  #     the intended play will not have the fumble detail but still needs to be cleaned.
  #   - Currently does not grab penalties during fumbles on kickoff plays.
  kickoff = re.findall(kickoff_pattern, intended_play_description[0])
  if len(kickoff) > 0:
    is_kickoff = True
    cleaned_intended_play = main_cleaning_method(unclean_intended_play)

  ### PUNTING FUMBLE ###
  #   - Because the fumble occurs on the punt return,
  #     the intended play will not have the fumble detail but still needs to be cleaned.
  #   - Currently does not grab penalties during fumbles on punt plays.
  punt = re.findall(punting_pattern, intended_play_description[0])
  if len(punt) > 0:
    is_punt = True
    cleaned_intended_play = main_cleaning_method(unclean_intended_play)

  ### SACKED FUMBLE ###
  if intended_play_description[0].find('sacked') != -1:
    passer_name_sack_fumble = re.findall(passer_name_pattern, intended_play_description[0])
    unclean_intended_play['WhoFumbled'] = passer_name_sack_fumble[0]
    cleaned_intended_play = unclean_intended_play

  ### INTERCEPTION FUMBLE ###
  # - Fumble occurs after interception
  if intended_play_description[0].lower().find('intercepted') != -1:
    is_interception = True
    cleaned_intended_play = main_cleaning_method(unclean_intended_play)
    # Intercepted by
    intercepted_by = re.findall(interception_name_pattern, intended_play_description[0])
    if len(intercepted_by) > 0:
      cleaned_intended_play['InterceptedBy'] = intercepted_by[0]

      if cleaned_intended_play['SoloTackle'].iloc[0] != 'nan':
        cleaned_intended_play.at[0, 'PassDefendedBy'] = (intercepted_by[0], cleaned_intended_play['SoloTackle'].iloc[0])
        cleaned_intended_play['SoloTackle'] = 'nan'
      else:
        cleaned_intended_play['PassDefendedBy'] = intercepted_by[0]

  ### EXPLICIT FORCED FUMBLE ###
  if explicit_forced_fumble is not None:
    cleaned_intended_play['ForcedFumbleBy'] = explicit_forced_fumble
    explicit_forced_fumble = None

  ### CLEANING EXTRA DATA ###
  # - Extra data includes (injuries / penalties / eligibility / official yardage ruling / etc...)
  #   - 'Official yardage ruling' is why this is done last.
  # - Extra data will be located within the first row of the replacement dataframe.
  # - Question: Not sure how an accepted penalty will react during a fumbled play.
  if extra_data:
    playdescription = cleaned_intended_play['PlayDescription'].iloc[0]
    cleaned_intended_play['PlayDescription'] = extra_data
    cleaned_intended_play = main_cleaning_method(cleaned_intended_play)
    cleaned_intended_play['PlayDescription'] = playdescription









  ##########################################
  # CLEANING SECONDARY ACTIONS WITHIN PLAY #
  ##########################################

  ####################################################################
  # FUMBLE RECOVERIES FOR YARDAGE & FUMBLE RECOVERIES FOR TOUCHDOWNS #
  ####################################################################

  # Created list for the possibility of multiple fumbles and recoveries in a single play
  list_recovery_runs = []

  for play in play_split:

    recovery_row = pd.DataFrame([original_play_copy.copy()], columns=df_plays.columns)

    recovery_row['PlayDescription'] = play[0]

    # Pass after fumble recovery
    pass_play = re.findall(passer_name_pattern, play[0])
    if len(pass_play) > 0:
      recovery_row['PlayOutcome'] = 'Pass'
      cleaned_recovery_row = clean_pass_plays(recovery_row)

    # Handoff after fumble recovery
    elif play[0].find('Handoff') != -1:
      handoff_attempt = True
      recovery_row['PlayOutcome'] = 'Run'
      cleaned_recovery_row = clean_run_plays(recovery_row)



    # Fumble on punt return
    elif is_punt == True:
      is_punt = False
      cleaned_recovery_row = clean_punt_plays(recovery_row)
      # fix yardage
      punt_breakdown = re.findall(punting_pattern, cleaned_intended_play['PlayDescription'].iloc[0])
      punt_return_breakdown = re.findall(punt_return_pattern, play[0])
      punt_fumble_recovery = re.findall(fumble_recovery_spotting_pattern, play[1])
      punt_return_start = [[punt_breakdown[0][2], punt_breakdown[0][3]]]
      punt_return_end = [[punt_return_breakdown[0][1], punt_return_breakdown[0][2], punt_return_breakdown[0][3]]]
      cleaned_recovery_row['Yardage'] = fumble_recovery_yardage(punt_return_start, punt_return_end, punt_fumble_recovery)



    # Fumble on kickoff return
    elif is_kickoff == True:
      is_kickoff = False
      cleaned_recovery_row = clean_kickoff_plays(recovery_row)

    # Fumble on run after interception
    elif is_interception == True:
      is_interception = False
      recovery_row['PlayOutcome'] = 'Run'
      cleaned_recovery_row = clean_run_plays(recovery_row)
      cleaned_recovery_row['PlayOutcome'] = cleaned_intended_play['PlayOutcome']
      cleaned_recovery_row['PlayType'] = 'Run After Interception'

    # Everything else can be labeled as a run play
    else:
      recovery_row['PlayOutcome'] = 'Run'
      cleaned_recovery_row = clean_run_plays(recovery_row)
      cleaned_recovery_row['PlayType'] = 'Fumble Return'

    # Assigning 'TeamWithPossession' as the team who recovered the fumble.
    if fumble_recovery_team is not None:
      cleaned_recovery_row['TeamWithPossession'] = fumble_recovery_team



    # Was the recovered fumble fumbled?
    # OR
    # Did the fumble occur on a punt/kickoff return?
    if len(play) > 1:

      cleaned_recovery_row['FumbleDetails'] = play[1]

      # Who forced fumble
      forced_fumble = re.findall(forced_fumble_pattern, play[1])
      if len(forced_fumble) > 0:
        cleaned_recovery_row['ForcedFumbleBy'] = forced_fumble[0]

      # Who fumbled the ball
      cleaned_recovery_row['WhoFumbled'] = cleaned_recovery_row['Rusher'].iloc[0]

      # Fumble occurred during punt/kickoff return
      if cleaned_recovery_row['Returner'].iloc[0] != 'nan':
        cleaned_recovery_row['WhoFumbled'] = cleaned_recovery_row['Returner'].iloc[0]

      # Who recovered the fumble
      if 'recovered' in play[1].lower():
        fumble_recovery = re.findall(fumble_recovery_pattern, play[1])
        if len(fumble_recovery) > 0:
          fumble_recovery_team = fumble_recovery[0][0].split("-")[0]
          # Always taking note of who recovers the fumble, so that the team with possession during that time is recorded.
          # - If the player who recovered the fumble gains yards, this variable will loop back around to assign the 'TeamWithPossession'
          #   feature to go along with the following row representing the recovery for yardage.
          cleaned_recovery_row['FumbleRecoveredBy'] = "-".join(fumble_recovery[0][0].split("-")[1:])

    cleaned_recovery_row['PlayOutcome'] = original_play_copy['PlayOutcome'] # <- Maybe this isn't correct? when a play is split by multiple rows, this becomes tricky.
    list_recovery_runs.append(cleaned_recovery_row)

  # - I need to review teams with possession when there is a fumble recovery for a touchdown.







  ###################
  # 3.NEW DATAFRAME #
  ###################
  # - Create the cleaned replacement row(s) for the original row.

  if len(list_recovery_runs) > 0:
    df_multi_row_play = pd.concat([cleaned_intended_play, *list_recovery_runs], ignore_index=True)
  else:
    df_multi_row_play = cleaned_intended_play

  #########################
  # DATAFRAME ADJUSTMENTS #
  #########################

  # - I need to double check this and possibly incorporate this above
  # Handoff after fumble recovery
  # - Quarterback fumbles the ball -> recovers the ball -> performs a handoff to another player
  #   - All actions recorded before the handoff need a change in 'PlayType'
  #     - Currently each recorded as a rushing attempt.
  # - I think this is wrong.
  #   - I think the only reason why the QB received 0 yards after fumbling the ball
  #     behind the LOS is because the handoff went past the LOS. If it did not,
  #     I have no idea what the yardage would look like.
  #   - This may have to be taken out and adjusted in the future when I look at a larger
  #     sample.
  if handoff_attempt:
    list_indexes_to_handoff = []
    for idx, action in df_multi_row_play['PlayDescription'].items():
      if action.find('Handoff') != -1:
        for i in list_indexes_to_handoff:
          df_multi_row_play.loc[i, 'PlayType'] = 'Handoff Attempt'
        break
      else:
        list_indexes_to_handoff.append(idx)

  # print()

  return df_multi_row_play

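The sentence-sorting step at the top of `extract_fumble_data` can be sketched in isolation. This is a minimal sketch under stated assumptions, not the notebook's implementation: the two regular expressions, the function name `sketch_split_play` and the play description are all made up for illustration, and the real patterns (`main_action_patterns`, `forced_fumble_pattern`, etc.) are defined elsewhere in the notebook.

```python
import re

# Hypothetical patterns, for illustration only
SKETCH_MAIN_ACTION_PATTERNS = [r"to [A-Z]{2,3} \d+ for -?\d+ yards?"]
SKETCH_ADD_ON_PATTERNS = [r"FUMBLES"]

# Made-up play description in the same general shape as the raw data
SKETCH_DESCRIPTION = (
  "(10:12) A.Runner up the middle to BUF 25 for 4 yards (B.Tackler). "
  "FUMBLES (B.Tackler), RECOVERED by KC-C.Defender at BUF 27. "
  "C.Defender to BUF 20 for 7 yards (D.Lineman). "
  "KC-C.Defender was injured during the play."
)

def sketch_split_play(description):
  play_split = []   # one inner list per future row
  extra_data = ""   # everything that is neither an action nor an add-on
  for sentence in description.split(". "):
    # 1. A main action starts a new row
    if any(re.search(p, sentence) for p in SKETCH_MAIN_ACTION_PATTERNS):
      play_split.append([sentence])
    # 2. A fumble description is attached to the most recent row
    elif play_split and any(re.search(p, sentence) for p in SKETCH_ADD_ON_PATTERNS):
      play_split[-1].append(sentence)
    # 3. Anything else is glued into the extra data
    else:
      extra_data += sentence + ". "
  return play_split, extra_data

sketch_rows, sketch_extra = sketch_split_play(SKETCH_DESCRIPTION)
```

With this description, `sketch_rows` holds two inner lists: the intended run together with its fumble sentence, and the recovery run on its own. The injury sentence ends up in `sketch_extra`.
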
#### helper methods for penalties

In [None]:
# NOTES:
# - It is crucial to note that this method only works on plays where the penalty
#   is against the team with the ball (offense or defense). Penalties that aid
#   the team with the ball are not looked at because they do not affect the stats
#   of the ball carrier, the QB, or anyone else. Only penalties against the team
#   with the ball have effects on player stats.

# Potential new features to add
# - How far do I want to take this? I do not want unnecessary features
#   that will just take up space.
# - For now I will do the bare minimum.
# 1. AcceptedPenalty
# 2. DeclinedPenalty

# FEATURE IDEAS IF I WANTED TO EXTEND
# X 1. Yardage from penalty (?)(Will not add this right now)
# X 2. Offensive penalty (?) (I do not think I want to go through with this yet)
# X 3. Defensive penalty (?) (I do not think I want to go through with this yet)
#    - I think these are features that would be beneficial but I could save space
#      by grouping accepted penalties and declined penalties. (FOR NOW)
# X 4. 'total yards gained' (?)
#    - This way, I can easily grab
#      1. Yards from play
#      2. Yards from penalty
#      3. Yards gained all together
# X 5. 'Player awarded yardage' (?)
#     - A feature strictly for yardage gained for the intended offensive player

# PURPOSE:
# - Record the accepted and declined penalties of a play.
#   - When an accepted penalty is against the team with the ball, 'Yardage' is
#     corrected by calling the helper method 'accepted_penalty_play_on_offense'

# INPUT PARAMETERS:
# df_plays       - dataframe - dataframe of plays
# play           -  String   - 'PlayDescription' of the current play that is being cleaned
# play_index     -  Integer  - index of play (Almost always from main dataframe of plays)
# start_string   -  String   - The string that contains the spotting of the beginning of the play for
#                              that specific play type
# dict_start     -   Dict    - A map that has a key of regular expressions and values of where to grab
#                              the start of the play for that particular play type
# end_string     -  String   - The string that contains the spotting of the end of the play for that
#                              specific play type
# dict_end       -   Dict    - A map that has a key of regular expressions and values of where to grab
#                              the end of the play for that particular play type
# yardage_string -  String   - A string that contains the yardage gained for that particular play type
# dict_yardage   -   Dict    - A map that has a key of regular expressions and values of where to grab
#                              the yardage of the play for that particular play type

# RETURN/OUTCOME:
# - Categorize offensive/defensive penalty to play's feature 'AcceptedPenalty' or 'DeclinedPenalty'

def extract_penalty_data(df_plays, play, play_index, start_string, dict_start, end_string, dict_end, yardage_string, dict_yardage):

  # Accepted Penalty
  if play.find('PENALTY') != -1:
    accepted_penalties = []
    play_elements = play.split(". ")
    for i in play_elements:
      if i.find('PENALTY') != -1:
        accepted_penalties.append(i)
        # Strictly looking for penalties against team with ball
        # 1. Could be offense on a traditional play (pass/rush/etc...)
        # 2. Could be defensive team on a defensive takeaway run (interception/fumble/etc..)
        penalty_player_team = re.findall(name_pattern, i)
        if len(penalty_player_team) > 0:
          if penalty_player_team[0].split("-")[0] == df_plays['TeamWithPossession'].loc[play_index]:
            # Important to note that the feature 'Yardage' is player awarded yardage for intended play.
            # - This does not factor in the yards penalized from the penalty.
            df_plays.loc[play_index, 'Yardage'] = accepted_penalty_play_on_offense(df_plays, play, play_index, start_string, dict_start, end_string, dict_end, yardage_string, dict_yardage)
    df_plays.at[play_index, 'AcceptedPenalty'] = accepted_penalties

  # Declined Penalty
  if play.find('Penalty') != -1:
    declined_penalties = []
    play_elements = play.split(". ")
    for i in play_elements:
      if i.find('Penalty') != -1:
        declined_penalties.append(i)
    df_plays.at[play_index, 'DeclinedPenalty'] = declined_penalties

# NOTES:

# PENALTY YARDAGE RULES (yardage awarded to offensive players when offensive penalty occurs):
# 1. If the penalty was beyond the line of scrimmage and brought back, the rusher is
#    awarded with any positive gained yards up to the spotting of the ball.
# 2. If the penalty was behind or at the line of scrimmage, the play does not count.
#    - This should not count as a rushing attempt. 'Play Type' should be 'No play'
#    - 'Yardage' will be 0.0

# WHAT TO FIND
# 1. Where the play started
# 2. Positive direction for the offense (field awareness?)
#    - I need to know which end zone the offense is trying to reach
#      - Break the positioning of the play (start, end, penalty) down to a scale of 0-100
#        EXAMPLE:
#        - original:
#          (start: BUF 30, end: BUF 20, yardage: 10, penalty: offense 5 yards at BUF 30)
#        - scaled:
#          (start: 70,     end: 80,     yardage: 10, penalty: -5 yards at 80)

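# ILLUSTRATIVE SKETCH (not part of the cleaning pipeline):
# - A minimal example of the two penalty yardage rules on the 0-100 scale,
#   assuming the team with the ball is known and that yardage is credited only
#   up to the spot where the penalty is enforced.
# - 'sketch_scale_to_100' and 'sketch_awarded_yardage' are hypothetical names
#   used only for this example.
# - EXAMPLE: NE has the ball at PHI 35 and the penalty is enforced at PHI 29.
#   start = 100 - 35 = 65, enforcement = 100 - 29 = 71, awarded yardage = 6.0

def sketch_scale_to_100(offense, territory, yard_line):
  # Own territory maps to 0-50, opposing territory maps to 50-100
  return yard_line if territory == offense else 100 - yard_line

def sketch_awarded_yardage(start_position, enforcement_position):
  # Rule 2: enforcement at or behind the start awards nothing
  # Rule 1: otherwise award the yards gained up to the enforcement spot
  return float(max(0, enforcement_position - start_position))
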
# PURPOSE:
# - Find yardage awarded to offensive players on penalized plays

# INPUT PARAMETERS:
# df_plays       - dataframe - dataframe of plays
# play           -  String   - 'PlayDescription' of the current play that is being cleaned
# play_index     -  Integer  - index of play (Almost always from main dataframe of plays)
# start_string   -  String   - The string that contains the spotting of the beginning of the play for
#                              that specific play type
# dict_start     -   Dict    - A map that has a key of regular expressions and values of where to grab
#                              the start of the play for that particular play type
# end_string     -  String   - The string that contains the spotting of the end of the play for that
#                              specific play type
# dict_end       -   Dict    - A map that has a key of regular expressions and values of where to grab
#                              the end of the play for that particular play type
# yardage_string -  String   - A string that contains the yardage gained for that particular play type
# dict_yardage   -   Dict    - A map that has a key of regular expressions and values of where to grab
#                              the yardage of the play for that particular play type

# RETURN/OUTCOME:
# - Fill feature 'Yardage' for play containing offensive penalty.

def accepted_penalty_play_on_offense(df_plays, play, play_index, start_string, dict_start, end_string, dict_end, yardage_string, dict_yardage):

  # Situations where a penalty does not affect an offensive players yardage during
  # a penalized play.
  # 1. An incomplete pass will always result in a 0 gained yards.
  # 2. If a 'PlayStart' is null, it is likely a 2PT conversion attempt(?)
  #    - Players are not awarded yardage on a 2PT conversion attempt.
  if df_plays['PlayOutcome'].loc[play_index] == 'Pass Incomplete':
    return 0.0
  # pd.isna catches both None and NaN, which is how pandas usually stores missing values
  if pd.isna(df_plays['PlayStart'].loc[play_index]):
    return 0.0

  # VARIABLES:
  # 1. Starting point (on 100 yard format)
  # 2. ending point (on 100 yard format)
  # 3. penalty enforcement (on 100 yard format)
  # BASIC ALGORITHM:
  # 1. penalty enforcement - starting point
  #    - If result < 0:
  #      - yardage = 0.0
  #    - else:
  #      - yardage = result

  # Starting variables (e.g. BUF 20)
  # - Normally will be the line of scrimmage.
  # - Can also be the start of when a punt was caught on a punt return.
  start_territory = None
  start_yardage = 0
  # End variables (e.g. BUF 40)
  end_territory = None
  end_yardage = 0
  # Penalty variables (e.g. BUF 30)
  penalty_territory = None
  penalty_yardage = 0

  # ALL CLEANING METHODS THAT CURRENTLY USE THIS HELPER METHOD:

  # PLAN:
  # I need to grab specific bits of data from a play, each bit
  # requiring a string from some feature of the play row and a regular
  # expression to grab the data from within the string.
  # - Instead of having an "if" statement for each playtype,
  #   I am going to send in the string required and a dictionary of regular
  #   expressions that could give me the information that I need.
  #   - each regular expression within the dictionary will have a:
  #     KEY - the regular expression
  #     VALUE - indexes of the data within the regular expression that I need.

  # 1. clean_pass_plays
  #    START:             string - 'PlayStart' feature # <---- What happens if there isn't a playstart?
  #           regular expression - play_start_pattern
  #                    territory - 0
  #                      yardage - 1
  #    END:               string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                    territory - 0
  #                      yardage - 1
  #    PLAY YARDAGE:      string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                      yardage - 2
  # NOTE:
  # - I don't think that I have passing penalties correct.
  #   - (4:00) (Shotgun) M.Jones pass short left to E.Elliott to PHI 29 for 6 yards (R.Blankenship; J.Bradberry)
  #     PENALTY on NE-A.Mafi, Offensive Holding, 10 yards, enforced at PHI 29.
  #     - With the current model, I have the penalty spotting at PHI 29,
  #       not 10 yards from PHI 29.

  # 2. clean_intercepted_plays
  #    START:             string - original PlayDescription
  #           regular expression - interception_play_end_pattern
  #                    territory - 1
  #                      yardage - 2
  #    END:               string - original PlayDescription
  #           regular expression - defensive_takeaway_run_pattern
  #                    territory - 1
  #                      yardage - 2
  #    PLAY YARDAGE:      string - original PlayDescription
  #           regular expression - defensive_takeaway_run_pattern
  #                      yardage - 3

  # 3. clean_run_plays
  #    START:             string - 'PlayStart' feature
  #           regular expression - play_start_pattern
  #                    territory - 0
  #                      yardage - 1
  #    END:               string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                    territory - 0
  #                      yardage - 1
  #    PLAY YARDAGE:      string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                      yardage - 2

  # 4. cleaning_2pt_conversion_plays
  #    START:             string - 'PlayStart' feature
  #           regular expression - play_start_pattern
  #                    territory - 0
  #                      yardage - 1
  #    END:               string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                    territory - 0
  #                      yardage - 1
  #    PLAY YARDAGE:      string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                      yardage - 2

  # 5. clean_sacked_plays
  #    START:             string - 'PlayStart' feature
  #           regular expression - play_start_pattern
  #                    territory - 0
  #                      yardage - 1
  #    END:               string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                    territory - 0
  #                      yardage - 1
  #    PLAY YARDAGE:      string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                      yardage - 2

  # 6. clean_punt_plays
  #    START:             string - original PlayDescription
  #           regular expression - punting_pattern
  #                    territory - 2
  #                      yardage - 3
  #    END:               string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                    territory - 0
  #                      yardage - 1
  #    PLAY YARDAGE:      string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                      yardage - 2

  # 7. clean_kickoff_plays
  #    START:             string - original PlayDescription
  #           regular expression - punting_pattern
  #                    territory - 2
  #                      yardage - 3
  #    END:               string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                    territory - 0
  #                      yardage - 1
  #    PLAY YARDAGE:      string - PlayDescription
  #           regular expression - standard_play_end_pattern
  #                      yardage - 2

  # Penalty data
  list_different_penalty_enforcements = [penalty_yardage_pattern, between_downs_penalty_yardage_pattern]
  for penalty_enforcement_type in list_different_penalty_enforcements:
    penalty_elements = re.findall(penalty_enforcement_type, play)
    if len(penalty_elements) > 0:
      # If penalty was enforced between downs, yardage from intended play is not affected
      if penalty_enforcement_type == between_downs_penalty_yardage_pattern:
        return df_plays['Yardage'].loc[play_index]
      break
  penalty_territory = penalty_elements[0][1]
  penalty_yardage = int(penalty_elements[0][2])

  # Start
  for start_pattern in dict_start:
    start_elements = re.findall(start_pattern, start_string)
    if len(start_elements) > 0:
      start_territory = start_elements[0][int(dict_start.get(start_pattern)[0])]
      start_yardage = int(start_elements[0][int(dict_start.get(start_pattern)[1])])
      break

  # End
  for end_pattern in dict_end:
    end_elements = re.findall(end_pattern, end_string)
    if len(end_elements) > 0:
      end_territory = end_elements[0][int(dict_end.get(end_pattern)[0])]
      end_yardage = int(end_elements[0][int(dict_end.get(end_pattern)[1])])
      break
  if end_territory is None:
    return 0.0

  # Play Yardage
  for yard_pattern in dict_yardage:
    yard_elements = re.findall(yard_pattern, yardage_string)
    if len(yard_elements) > 0:
      if yard_elements[0][int(dict_yardage.get(yard_pattern)[0])] == 'no gain':
        play_yardage = 0
      else:
        play_yardage = int(yard_elements[0][int(dict_yardage.get(yard_pattern)[0])])
      break

  # Need to figure out which zone the offense is in and which direction they are fighting for given:
  # 1. position yardage (+/-)
  #    - Take the starting position and ending position, is the difference between
  #      them positive or negative? (e.g. BUF 30 -> BUF 40 is +)
  # 2. play yardage (+/-)
  #    - Did the intended play gain yardage or lose yardage?

  # - I need to figure out which direction the penalty is going towards.
  #   - Initially I am thinking that it would be the opposite of positive yardage.

  # Standard cases:
  # position yardage (+) & play yardage (+):
  # - the starting position team zone is the beginning (0-50)
  # position yardage (-) & play yardage (+):
  # - the starting position team zone is the ending (50-100)
  # position yardage (+) & play yardage (-):
  # - the starting position team zone is the ending (50-100)
  # position yardage (-) & play yardage (-):
  # - the starting position team zone is the beginning (0-50)

  # Unique cases:
  # zones switch (e.g. KC 47 -> BUF 47)
  # play yardage (+):
  # - the starting position team zone is the beginning (0-50)
  # play yardage (-):
  # - the starting position team zone is the ending (50-100)
  # penalty occurred on the line of scrimmage
  # - doesn't matter. yardage gained is 0

  # 0-100 yard format
  # EXAMPLE: GB 30
  # GB is the 0-50 yard territory: 30 yards
  # GB is the 50-100 yard territory: 100 - 30 = 70 yards

  starting_fifty_to_end_zone = False
  penalty_enforcement_fifty_to_end_zone = False

  # Start territory and end territory are the same
  if start_territory == end_territory:
    # position yardage (+) (e.g. BUF 10 -> BUF 20)
    if start_yardage < end_yardage:
      # play yardage (-) (i.e. gained negative yards on play)
      if play_yardage < 0:
        starting_fifty_to_end_zone = True
        penalty_enforcement_fifty_to_end_zone = True
    else:
      # play yardage (+) (i.e. gained positive yards on play)
      if play_yardage > 0:
        starting_fifty_to_end_zone = True
        penalty_enforcement_fifty_to_end_zone = True
  # Start territory and end territory are different
  else:
    # play yardage (+)
    if play_yardage > 0:
      zero_to_fifty_zone = start_territory
      # penalty territory and start territory are different
      if penalty_territory != zero_to_fifty_zone:
        penalty_enforcement_fifty_to_end_zone = True
    # play yardage (-)
    else:
      fifty_to_hundred_zone = start_territory
      starting_fifty_to_end_zone = True
      if penalty_territory == fifty_to_hundred_zone:
        penalty_enforcement_fifty_to_end_zone = True

  if starting_fifty_to_end_zone:
    start_yardage = 100 - start_yardage
  if penalty_enforcement_fifty_to_end_zone:
    penalty_yardage = 100 - penalty_yardage

  resulting_player_awarded_yardage = penalty_yardage - start_yardage

  if resulting_player_awarded_yardage < 0:
    return 0.0
  else:
    return resulting_player_awarded_yardage
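
The 0-100 yard conversion and the `penalty enforcement - starting point` rule from the comments above can be tried out on their own. This is only a minimal sketch, not the helper itself: `to_hundred_yard_format` and `awarded_yardage` are hypothetical names, and the sketch assumes the caller already knows which team's territory is the 0-50 half, which is the part the helper above has to infer from the play.

```python
def to_hundred_yard_format(territory, yard_line, zero_to_fifty_team):
    """Convert a spot such as ('GB', 30) to a position on a 0-100 scale.

    zero_to_fifty_team is an assumed input: the team whose territory
    is treated as the 0-50 half of the field.
    """
    if territory == zero_to_fifty_team:
        return yard_line
    # Spot is in the other team's territory, so it sits in the 50-100 half
    return 100 - yard_line


def awarded_yardage(start, penalty_spot, zero_to_fifty_team):
    """Yardage credited to the player: penalty enforcement spot minus start.

    A negative result is clamped to 0.0, as in the basic algorithm above.
    """
    start_pos = to_hundred_yard_format(start[0], start[1], zero_to_fifty_team)
    penalty_pos = to_hundred_yard_format(penalty_spot[0], penalty_spot[1], zero_to_fifty_team)
    return float(max(penalty_pos - start_pos, 0))
```

For example, `awarded_yardage(('BUF', 20), ('BUF', 30), 'BUF')` gives `10.0`, while a penalty enforced behind the start, such as `('BUF', 15)`, gives `0.0`.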

#### helper method for laterals

In [None]:
# STILL A WORK IN PROGRESS. WILL 1000000% NEED TO ITERATE OVER TIME.

# PURPOSE:
# - To effectively clean plays that have laterals within them

# FUNCTION:
# - INPUT:
#   1. dataframe of plays
#   2. index of lateral play
#   3. cleaning method using this helper method
# - OUTPUT:
#   1. dataframe (probably multiple rows) representing play cleaned

# THOUGHTS:
# - I think that a lateral will have to be separated into multiple rows.
#   ROWS:
#   1. initial play..
#      - (Run / Pass / Punt Return / interception / etc..)
#      - Although I have not seen this yet, I believe that a lateral could set
#        up the main action of the play.
#        - For example, there could be a lateral behind the line of scrimmage
#          and then a pass following after.
#   2. A new row for every lateral.
# - I think this is a method that will have to evolve with time as more samples
#   of laterals come in. I can't create a good method that will stand without
#   more plays to clean.

# WHAT I NEED:
# - Ideally I would want parameters to be as minimal as possible.
#   - This is a method that takes a single play description and will provide
#     a potentially multi row replacement of that single play but cleaned.
# - Would the play description be all I need..?
#   - let's try

# DESIGN IDEA:

#########################
# DATA COLLECTION PHASE #
#########################

# 1. Parameters:
#    1. dataframe of plays
#    2. index of play with lateral
#    3. cleaning method using this helper method

# 2. Locate lateral play
#    - Save copy of entry to refer back to
#    - Save copy of 'PlayDescription' feature

# 3. Separate each sentence within 'PlayDescription' feature

###########################
# DATA ORGANIZATION PHASE #
###########################

# 4. Organize each sentence within play into list
#    - All main actions will have their own rows
#      - secondary actions (such as fumbles) will be add ons for those particular rows..
#        - I will not worry about this now because I do not have a sample to test this.
#    - I imagine the data structure for this will look something like this
#      - DATA STRUCTURE: (list of lists)
#        - [[action 1], [action 2, action 2 subaction], [action 3]]

##################
# CLEANING PHASE #
##################
# 5. Create list to put cleaned single row dataframes
#    - Each action that is cleaned will be its own single row dataframe
#      - Once the action is cleaned, I will append it to the list and at the end
#        will concatenate the list of cleaned single row dataframes to create a
#        single dataframe and return it.
# 6. Clean main play
#    - I need to decipher what the initial play was and clean it
#      - The initial play will be the first play within the list so I can 'pop'
#        that play out and focus on that element of the play.
#      - I could leverage where this play came from and use that
#        - What I mean is that if this lateral play was initially being cleaned by
#          'clean_pass_plays' then I will clean the initial play using that method.

# INITIAL PLAY TYPES AND HOW TO CLEAN THEM
# - Because the sample size that I am working with is so small, I will have to come back to this
#   and iterate this method as I get exposed to more lateral plays.

# PASSING PLAY
# - If the initial play before the lateral is a passing play,
#   there will be 2 rows representing the initial pass. ( 1. Passer, 2. Receiver )
#   - Use return dataframe to make an extra copy:
#     1. df_passer_row (will use original return dataframe copy)
#     2. df_receiver_row (will be created and added to return dataframe)
#        - Dataframe at this point should have a length of 2
#   ROWS:
#   1. PASSER
#      - FEATURES:
#        1. 'PlayDescription' - Sentence that contains play description of pass (Same as receiver row)
#        2. 'Passer' - Record name of passer
#        3. 'Receiver' - Record name of receiver as 'nan'
#            - Receiver will have own row
#        4. 'Yardage' - (Will be recorded during the cleaning of secondary actions)
#            - Passing yards equivalent to how far the ball goes beyond the line of scrimmage
#              - This is the main reason why the 'Passer' and 'Receiver' have separate rows.
#   2. RECEIVER
#      - FEATURES:
#        1. 'PlayDescription' - Sentence that contains play description of pass (Same as passer row)
#        2. 'Passer' - Record name of passer as 'nan'
#            - Passer has own row
#        3. 'Receiver' - Record name of receiver
#        4. 'Yardage' - (Depends on outcome of lateral)
#           - IF this receiver catches and laterals behind LOS ( (-) yards ):
#             - IF lateral player goes beyond LOS:
#               - This receiver gains 0 yardage
#             - ELSE (lateral player stays behind LOS)
#               - This receiver gains (-) yards where ball was lateraled
#           - ELSE (this receiver catches and laterals beyond LOS ( (+) yards )):
#             - this receiver gains yards until ball is lateraled (LOS -> this receiver laterals)


#  - How will secondary laterals be able to access these rows within new dataframe to
#    adjust yardage for both ( 1. Passer, 2. Receiver ) rows?
#    - I am thinking that because the 'Passer' row will be the first row, then I could
#      have the values of yardage I receive from these go straight to the first row of the dataframe.



# 7. Clean remaining actions within list
#    - Cycle through remaining actions within list (Loop)
#      - decipher which action is taking place using regular expressions
#        - If the action looks like a running play, clean it as a running play
#          - This is where I think things can get sticky and I will have to iterate
#            as time goes on and I get bigger samples.
#          - For now, I will only add actions that I have come across so far.
#        - For now, I will assume that all actions after a lateral can be
#          cleaned using the 'clean_run_plays' method. (I also might just clean it entirely within this method)
#          - The type of yardage should reflect the initial playtype
#            - So this may mean that the 'PlayType' feature should be something like
#              'lateral passing play' if the initial play was a passing play

# REMAINING PLAY TYPES AND HOW TO CLEAN THEM

# PASSING (INITIAL PLAY IS A PASSING PLAY)
# - FEATURES:
#   1. 'PlayDescription' - Sentence that contains the lateral play
#   2. 'Receiver' - Player who received the lateral
#   3. 'Yardage' - 2 different yardages will be captured:
#       1. Yardage for player who received this lateral
#          - FROM ( reception of the ball | LOS ) -> TO ( The end of their carry )
#            - FROM
#              - IF the start distance is behind the line of scrimmage:
#                - the start distance will be from the LOS ( 'Yardage' (Player) = end of carry - LOS )
#                  - IF player receives (+) yards:
#                    - All players who held the ball before will have a yardage value of 0
#              - ELSE the start distance is beyond the LOS:
#                - the start distance will be from where the ball was lateralled ('Yardage' (Player) = end of carry - reception of ball)
#            - TO
#              - End of carry (Will be within PlayDescription)
#       2. Distance from the line of scrimmage -> The end of their carry
#          - FROM ( LOS ) -> TO end of carry
#            - FROM
#              - Line of scrimmage (taken from 'PlayStart' feature)
#            - TO
#              - End of carry (Will be within PlayDescription)
#          - This will happen through every action found within loop (Every lateral)
#            - Passer receives passing yards based off how far the ball traveled during play
# - Add action row to return dataframe and go to the next

####################
# RETURN DATAFRAME #
####################

# 8. Return new cleaned lateral play replacement dataframe
#    1. concat all cleaned single row dataframes within list of cleaned single row dataframes.
#    2. return dataframe containing all cleaned actions

def extract_lateral_data(df_plays, play_index, main_cleaning_method):

  # 2. Locate lateral play
  original_play_copy = df_plays.loc[[play_index]].copy().reset_index(drop=True)
  play = original_play_copy['PlayDescription'].iloc[0]

  # In 'PlayDescription' all information before the "reversed" sentence is not needed.
  # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
  if play.find('REVERSED') != -1:
    play_elements = play.split(". ")
    for i in play_elements:
      if i.find("REVERSED") != -1:
        # df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
        play = ". ".join(play_elements[play_elements.index(i) + 1:])
        break

  # 3. Separate each sentence within 'PlayDescription' feature
  play_elements = play.split(". ")
  # print(f"ALL ELEMENTS IN PLAY\n{play_elements}")
  # print()

  # 4. Organize each sentence within play into list
  play_split = []
  for action in play_elements:

    # Main Actions
    if re.search(standard_play_end_pattern, action) is not None:
      play_split.append([action])
      continue

    # secondary actions (such as fumbles)
    # - Expecting this to be present for next iteration

  # print(f"ALL ELEMENTS GRABBED\n{play_split}")
  # print()

  # 5. Create list to put cleaned single row dataframes
  cleaned_actions_list = []
  # Variable to keep track of where lateral spotting is
  lateral_spotting = None

  # 6. Clean main play
  initial_play_description = play_split.pop(0)
  # print(f"INITIAL PLAY\n{initial_play_description}")
  # print()

  unclean_intended_play = original_play_copy.copy()
  unclean_intended_play['PlayDescription'] = initial_play_description

  # PASSING PLAY
  if main_cleaning_method == clean_pass_plays:
    # PASSER
    df_passer_row = unclean_intended_play.copy()
    df_passer_row.loc[0, 'Passer'] = re.findall(passer_name_pattern, df_passer_row['PlayDescription'].iloc[0])[0]
    cleaned_actions_list.append(df_passer_row)
    # RECEIVER
    df_receiver_row = unclean_intended_play.copy()
    df_receiver_row.loc[0, 'Receiver'] = re.findall(receiver_pattern, df_receiver_row['PlayDescription'].iloc[0])[0][2]
    end_spotting = re.findall(standard_play_end_pattern, df_receiver_row['PlayDescription'].iloc[0])[0]
    df_receiver_row.loc[0, 'Yardage'] = int(end_spotting[2])
    lateral_spotting = end_spotting
    cleaned_actions_list.append(df_receiver_row)

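  # RUSHING PLAY
  if main_cleaning_method == clean_run_plays:
    df_rusher_row = unclean_intended_play.copy()
    df_rusher_row.loc[0, 'Rusher'] = re.findall(rusher_pattern, df_rusher_row['PlayDescription'].iloc[0])[0]
    end_spotting = re.findall(standard_play_end_pattern, df_rusher_row['PlayDescription'].iloc[0])[0]
    df_rusher_row.loc[0, 'Yardage'] = int(end_spotting[2])
    # Record where the rusher's carry ended so the lateral loop below has a
    # spot to measure from (previously this was only set for passing plays).
    lateral_spotting = end_spotting
    cleaned_actions_list.append(df_rusher_row)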

  # 7. Clean remaining actions within list
  while(len(play_split) > 0):
    play = play_split.pop(0)
    lateral_row = original_play_copy.copy()
    lateral_row['PlayDescription'] = play
    lateral_data = re.findall(lateral_reception_pattern, play[0])
    lateral_receiver = lateral_data[0][0]
    lateral_end_spotting = [lateral_data[0][1], lateral_data[0][2]]
    lateral_yardage = int(lateral_data[0][3])

    # Yardage for player
    # - Is lateral reception before or after line of scrimmage
    #   - See if yardage from lateral reception and line of scrimmage is (+) or (-)
    # - 'lateral_spotting' is not sustainable like this. Need to update if there are multiple laterals.
    #   right now it grabs the lateral from the main action cleaning.
    check_lateral_before_after_los = yardage_between_spottings(lateral_spotting,
                                                               re.findall(play_start_pattern, lateral_row['PlayStart'].iloc[0])[0],
                                                               cleaned_actions_list[len(cleaned_actions_list)-1]['Yardage'].iloc[0])

    # lateral after LOS
    if check_lateral_before_after_los > 0:
      lateral_row.loc[0, 'Yardage'] = yardage_between_spottings(lateral_spotting,
                                                                lateral_end_spotting,
                                                                lateral_yardage)

    # lateral before LOS
    else:
      lateral_row.loc[0, 'Yardage'] = yardage_between_spottings(re.findall(play_start_pattern, lateral_row['PlayStart'].iloc[0])[0],
                                                                lateral_end_spotting,
                                                                lateral_yardage)

      # If the lateral took place behind the line of scrimmage &
      # ended past the line of scrimmage, then everyone who was a part of the
      # play before the lateral gets 0 yardage instead of their (-).
      if lateral_row['Yardage'].iloc[0] > 0:
        for action_row in cleaned_actions_list:
          action_row.loc[0, 'Yardage'] = 0

    if main_cleaning_method == clean_run_plays:
      # Name of rusher
      lateral_row.loc[0, 'Rusher'] = lateral_receiver
      # changing 'playtype' for lateral row
      lateral_row.loc[0, 'PlayType'] = 'lateral after run'

    if main_cleaning_method == clean_pass_plays:
      # Name of lateral receiver
      lateral_row.loc[0, 'Receiver'] = lateral_receiver
      # changing 'playtype' for lateral row
      lateral_row.loc[0, 'PlayType'] = 'lateral after pass'
      # Yardage for passer
      line_of_scrimmage = re.findall(play_start_pattern, lateral_row['PlayStart'].iloc[0])[0]
      cleaned_actions_list[0].loc[0, 'Yardage'] = yardage_between_spottings(line_of_scrimmage, lateral_end_spotting, lateral_yardage)

    # Need to see if there was another lateral, update 'lateral_spotting'
    # - Will update when time comes.
    cleaned_actions_list.append(lateral_row)

  #############
  #  DEFENSE  #
  #############

  solo_tackle = re.findall(solo_tackle_pattern, original_play_copy['PlayDescription'].iloc[0])
  if len(solo_tackle) > 0:
    cleaned_actions_list[len(cleaned_actions_list) - 1].loc[0, 'SoloTackle'] = solo_tackle[0]

  shared_tackle = re.findall(shared_tackle_pattern, original_play_copy['PlayDescription'].iloc[0])
  if len(shared_tackle) > 0:
    cleaned_actions_list[len(cleaned_actions_list) - 1].at[0, 'SharedTackle'] = shared_tackle[0]

  assisted_tackle = re.findall(assisted_tackle_pattern, original_play_copy['PlayDescription'].iloc[0])
  if len(assisted_tackle) > 0:
    cleaned_actions_list[len(cleaned_actions_list) - 1].at[0, 'AssistedTackle'] = [assisted_tackle[0][0], assisted_tackle[0][1]]

  return pd.concat(cleaned_actions_list, ignore_index=True)
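
The yardage rules for a single lateral, as described in the design notes above, can be sketched without any dataframe. This is a simplified illustration under assumptions, not the method above: `split_lateral_yardage` is a hypothetical name, and all three inputs are assumed to already be positions on a 0-100 scale that increases in the direction the offense is moving.

```python
def split_lateral_yardage(los, lateral_spot, carry_end):
    """Split yardage between the players involved in one lateral.

    los          - line of scrimmage (0-100 scale, assumed input)
    lateral_spot - where the previous ball carrier lateraled the ball
    carry_end    - where the player who received the lateral was stopped
    """
    if lateral_spot > los:
        # Lateral beyond the LOS: each player is credited for their own stretch
        previous_carrier = lateral_spot - los
        lateral_receiver = carry_end - lateral_spot
    else:
        # Lateral at or behind the LOS: the lateral receiver is measured from the LOS
        lateral_receiver = carry_end - los
        # If the play ends up gaining yards, earlier carriers get 0 instead of (-)
        previous_carrier = 0 if lateral_receiver > 0 else lateral_spot - los
    return {
        'previous_carrier': previous_carrier,
        'lateral_receiver': lateral_receiver,
        # On a passing play the passer is credited from the LOS to the end of the carry
        'passer': carry_end - los,
    }
```

With a line of scrimmage at 25, a lateral at 30 and a carry ending at 42, the previous carrier gets 5 yards, the lateral receiver 12 and the passer 17. With the lateral at 22 and the carry ending at 31, the previous carrier gets 0 and both the lateral receiver and the passer get 6.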

### OFFENSE CLEANING METHODS

#### PASS PLAYS

In [None]:
# PURPOSE:
# - Clean all passing type plays within a given dataframe.
# INPUT PARAMETERS:
# df_plays    - dataframe - NFL plays (can include play types other than passing)
# index_start -  integer  - index label within the dataframe where the method starts
#                           cleaning; later rows are cleaned in ascending order.
# RETURN:
# df_plays - dataframe - the same input df_plays but with all passing play types cleaned

# NOTE:
# - I want this to work with slices of the main dataframe as well.
#   - Within slices, I think it is crucial to keep the original indexing from the main
#     dataframe for ease to put back into the original dataframe.

def clean_pass_plays(df_plays, index_start = None):

  # Adjusting df_plays to start cleaning at a specified index (index_start)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    # Locating all passing type plays within dataframe
    df_pass_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Pass')]
  else:
    # Locating all passing type plays within dataframe
    df_pass_plays = df_plays[df_plays['PlayOutcome'].str.contains('Pass')]

  for idx, play in df_pass_plays['PlayDescription'].items():

    ################
    # Play details #
    ################

    # Play Type
    df_plays.loc[idx, 'PlayType'] = 'Pass'

    # TimeOnTheClock
    TimeOnTheClock = re.findall(time_on_clock_pattern, play)
    if len(TimeOnTheClock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = TimeOnTheClock[0]

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############################
    # REPORTING IN AS ELIGIBLE #
    ############################

    # I do not think this contains any useful data so I am going to exclude it.
    if play.find('reported in as eligible') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find('reported in as eligible') != -1:
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############
    # LATERALS #
    ############
    # - It makes sense to check for laterals here because I need to use the start
    #   of the entry with no features filled out as a template.
    #   - I do not know how this will affect laterals that are fumbled
    #   - I do not know how this will affect laterals that have penalties
    if play.lower().find('lateral') != -1:
      df_replacement_rows = extract_lateral_data(df_plays, idx, clean_pass_plays)
      df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
      index_of_last_added_row = idx + len(df_replacement_rows) - 1
      if df_pass_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_pass_plays(df_plays, index_of_last_added_row + 1)

    ###########
    # FUMBLES #
    ###########

    # Additional rows may be added after certain types of fumbled passing plays.
    # - The idea here is that, in those situations, the helper method 'extract_fumble_data'
    #   will return a small dataframe of the rows that the single play split into.
    #   - When this small dataframe is returned, it will replace the original play
    #     within the main dataframe of plays and then continue on cleaning the rest of the passing plays.

    if play.find('FUMBLES') != -1:
      main_action_patterns = [passer_name_pattern, qb_fumble_pattern, defensive_takeaway_run_pattern]
      map_who_fumbled_patterns = {
          qb_fumble_pattern : 0,
          receiver_pattern: 2
      }
      main_cleaning_method = clean_pass_plays
      df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                                main_action_patterns,
                                                map_who_fumbled_patterns,
                                                main_cleaning_method)

      # "df_plays.index.tolist().index(idx)" needed for method usage with slices of original dataframe.
      df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
      index_of_last_added_row = idx + len(df_replacement_rows) - 1
      if df_pass_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_pass_plays(df_plays, index_of_last_added_row + 1)

    ###########
    # OFFENSE #
    ###########

    # NOTE:
    # - Incomplete passes will have 'PlayOutcome' as 'Pass Incomplete' as well
    #   as yardage value being 0.0

    # Yardage gained
    yardage = re.findall(yardage_gained, play)
    if len(yardage) > 0:
      df_plays.loc[idx, 'Yardage'] = int(yardage[0])
    else:
      if df_plays.loc[idx, 'PlayOutcome'] == 'Pass Incomplete':
        df_plays.loc[idx, 'Yardage'] = 0

    # Overruled yardage gained
    official_yardage = re.findall(official_pass_yards_pattern, play)
    if len(official_yardage) > 0:
      df_plays.loc[idx, 'Yardage'] = int(official_yardage[0])

    # Formation
    Formation = re.findall(formation, play)
    if len(Formation) > 0:
      if Formation[0] == 'Aborted':
        pass
      else:
        df_plays.loc[idx, 'Formation'] = Formation[0]

    # Passer (What about spikes?)
    passer_name = re.findall(passer_name_pattern, play)
    if len(passer_name) > 0:
      df_plays.loc[idx, 'Passer'] = passer_name[0]

    receiver_name_and_passing_details = re.findall(receiver_pattern, play)
    if len(receiver_name_and_passing_details) > 0:
      df_plays.loc[idx, 'Direction'] = f"{receiver_name_and_passing_details[0][0]} {receiver_name_and_passing_details[0][1]}"
      df_plays.loc[idx, 'Receiver'] = receiver_name_and_passing_details[0][2]

    # Unique situation (offense spikes the ball)
    if play.find('spike') != -1:
      df_plays.loc[idx, 'Direction'] = 'spiked' # Direction?

    #############
    #  DEFENSE  #
    #############

    solo_tackle = re.findall(solo_tackle_pattern, play)
    if len(solo_tackle) > 0:
      if df_plays.loc[idx, 'PlayDescription'].find('pass incomplete') != -1:
        df_plays.loc[idx, 'PassDefendedBy'] = solo_tackle[0]
      else:
        df_plays.loc[idx, 'SoloTackle'] = solo_tackle[0]

    shared_tackle = re.findall(shared_tackle_pattern, play)
    if len(shared_tackle) > 0:
      if df_plays.loc[idx, 'PlayDescription'].find('pass incomplete') != -1:
        df_plays.at[idx, 'PassDefendedBy'] = shared_tackle[0]
      else:
        df_plays.at[idx, 'SharedTackle'] = shared_tackle[0]

    assisted_tackle = re.findall(assisted_tackle_pattern, play)
    if len(assisted_tackle) > 0:
      df_plays.at[idx, 'AssistedTackle'] = [assisted_tackle[0][0], assisted_tackle[0][1]]

    pressure_by = re.findall(defense_pressure_name_pattern, play)
    if len(pressure_by) > 0:
      df_plays.loc[idx, 'PressureBy'] = pressure_by[0]

    ##############
    #  INJURIES  #
    ##############

    injuries = re.findall(injury_pattern, play)
    if len(injuries) > 0:
      df_plays.at[idx, 'InjuredPlayers'] = injuries

    #############
    #  PENALTY  #
    #############

    if play.lower().find('penalty') != -1:
      # start of play
      start_string = df_plays['PlayStart'].loc[idx]
      dict_start = {
          play_start_pattern: [0, 1]
      }
      # end of play
      end_string = play
      dict_end = {
          standard_play_end_pattern: [0, 1]
      }
      # play yardage
      play_yardage_string = play
      dict_play_yardage = {
          standard_play_end_pattern: [2]
      }
      extract_penalty_data(df_plays, play, idx, start_string, dict_start, end_string, dict_end, play_yardage_string, dict_play_yardage)

  if df_pass_plays.tail(1).index.tolist()[0] == idx:
    return df_plays
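
Nearly every extraction in the cleaning methods follows the same shape: run `re.findall`, check that the result is non-empty, then take element 0. Below is a minimal sketch of a helper that captures this shape. The names `first_match` and `TIME_ON_CLOCK` are hypothetical, and the regex is an assumption written only for this example; it is not the notebook's own `time_on_clock_pattern`.

```python
import re

# Hypothetical pattern for illustration only; the notebook defines its own
# `time_on_clock_pattern` elsewhere.
TIME_ON_CLOCK = r"\((\d{1,2}:\d{2})\)"

def first_match(pattern, text, default=None):
    """Return the first regex match found in `text`, or `default` if none."""
    matches = re.findall(pattern, text)
    return matches[0] if matches else default

play = ("(9:53) (Shotgun) D.Watson pass short left intended for E.Moore "
        "INTERCEPTED by D.Hill (Z.Carter) at CIN 30.")
print(first_match(TIME_ON_CLOCK, play))             # 9:53
print(first_match(TIME_ON_CLOCK, "no clock here"))  # None
```

A helper like this would let each feature be filled only when a value was actually found, without repeating the `len(...) > 0` check.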

#### RUN PLAYS

In [None]:
# PURPOSE:
# - Clean run play types
# INPUT PARAMETERS:
# df_plays    - dataframe - dataframe of plays
# index_start -  integer  - the starting index of the associated input dataframe
#                           to begin cleaning.
# RETURN:
# df_plays - dataframe - dataframe of plays that now has all useful run play
#                        data accessible and clean.

# NOTE:
# - TODO: document that this method is also used for
#   1. fumble recoveries for yardage
#   2. fumble recoveries for a touchdown
# - I have not yet come across a rushing play that was fumbled, recovered,
#   and returned for a touchdown.

def clean_run_plays(df_plays, index_start = None):

  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_run_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Run')]
  else:
    df_run_plays = df_plays[df_plays['PlayOutcome'].str.contains('Run')]

  # Iterating through every run play within 'df_run_plays'
  for idx, play in df_run_plays['PlayDescription'].items():

    ################
    # Play details #
    ################

    # Play Type
    df_plays.loc[idx, 'PlayType'] = 'Run'

    # TimeOnTheClock
    TimeOnTheClock = re.findall(time_on_clock_pattern, play)
    if len(TimeOnTheClock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = TimeOnTheClock[0]

    # Formation
    Formation = re.findall(formation, play)
    if len(Formation) > 0:
      if Formation[0] == 'Aborted':
        pass
      else:
        df_plays.loc[idx, 'Formation'] = Formation[0]

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############################
    # REPORTING IN AS ELIGIBLE #
    ############################

    # I do not think this contains any useful data so I am going to exclude it.
    if play.find('reported in as eligible') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find('reported in as eligible') != -1:
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############
    # LATERALS #
    ############
    # - I think it makes sense to check for laterals here because I need the start
    #   of the entry, with no features filled out yet, to use as a template.
    #   - I do not know how this will affect laterals that are fumbled
    #   - I do not know how this will affect laterals that have penalties
    if play.lower().find('lateral') != -1:
      df_replacement_rows = extract_lateral_data(df_plays, idx, clean_run_plays)
      df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
      index_of_last_added_row = idx + len(df_replacement_rows) - 1

      # returning row after the last index
      if df_run_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_run_plays(df_plays, index_of_last_added_row + 1)

    ###########
    # FUMBLES #
    ###########

    if play.find('FUMBLES') != -1:

      # - I think it would help to comment on each action added
      # - Does this catch fumble recovery touchdowns? <--
      main_action_patterns = [rusher_pattern,
                              qb_aborted_fumble_pattern,
                              qb_center_aborted_fumble_pattern,
                              qb_fumble_pattern, defensive_takeaway_run_pattern,
                              handoff_pattern]

      map_who_fumbled_patterns = {
          rusher_pattern : 0,
          qb_aborted_fumble_pattern: 0,
          qb_fumble_pattern: 0,
          defensive_takeaway_run_pattern: 0,
          handoff_pattern: 0
      }

      main_cleaning_method = clean_run_plays
      df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                                main_action_patterns,
                                                map_who_fumbled_patterns,
                                                main_cleaning_method)

      # "df_plays.index.tolist().index(idx)" needed for method usage with slices of original dataframe.
      df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
      index_of_last_added_row = idx + len(df_replacement_rows) - 1

      # returning row after the last index
      if df_run_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_run_plays(df_plays, index_of_last_added_row + 1)

    #############
    #  OFFENSE  #
    #############

    # Rusher
    rusher_patterns = [rusher_pattern, defensive_takeaway_run_pattern, qb_fumble_pattern, touchdown_after_takeaway_pattern, handoff_pattern]
    # Reset for every play so a name from a previous play is never reused
    rusher_name = None
    # Loop through patterns and find the first match
    for pattern in rusher_patterns:
      rusher = re.findall(pattern, play)
      if len(rusher) > 0:
        if isinstance(rusher[0], tuple):
          rusher_name = rusher[0][0]
        else:
          rusher_name = rusher[0]
        df_plays.loc[idx, 'Rusher'] = rusher_name
        break

    # Direction
    # - The slice starts right after the rusher's name, so it is skipped when no rusher was found.
    rushing_directions = ['guard', 'middle', 'tackle', 'end', 'kneels']
    if rusher_name is not None:
      for i in rushing_directions:
        if play.find(i) != -1:
          start = play.find(rusher_name) + len(rusher_name) + 1
          end = play.find(i) + len(i)
          df_plays.loc[idx, 'Direction'] = play[start:end]
          break

    # Yardage gained
    yardage = re.findall(yardage_gained, play)
    if len(yardage) > 0:
      df_plays.loc[idx, 'Yardage'] = int(yardage[0])
    elif math.isnan(df_plays['Yardage'].loc[idx]):
      df_plays.loc[idx, 'Yardage'] = 0

    # Find yardage gained during handoff.
    # - Yardage gained (as obvious as it sounds) is
    #   yardage gained from the line of scrimmage.
    if play.find('Handoff') != -1:
      start = re.findall(play_start_pattern, df_run_plays['PlayStart'].loc[idx])[0]
      end = re.findall(handoff_pattern, play)[0]
      start_territory = start[0]
      start_yardage = int(start[1])
      end_territory = end[1]
      end_yardage = int(end[2])
      handoff_yardage_gained = int(end[3])

      if start_territory == end_territory:
        if start_yardage > end_yardage:
          if handoff_yardage_gained > 0:
            df_plays.loc[idx, 'Yardage'] = start_yardage - end_yardage
          else:
            df_plays.loc[idx, 'Yardage'] = end_yardage - start_yardage
        else:
          if handoff_yardage_gained > 0:
            df_plays.loc[idx, 'Yardage'] = end_yardage - start_yardage
          else:
            df_plays.loc[idx, 'Yardage'] = start_yardage - end_yardage
      else:
        if handoff_yardage_gained > 0:
          df_plays.loc[idx, 'Yardage'] = 100 - end_yardage - start_yardage
        else:
          df_plays.loc[idx, 'Yardage'] = end_yardage - (100 - start_yardage)

    #############
    #  DEFENSE  #
    #############

    solo_tackle = re.findall(solo_tackle_pattern, play)
    if len(solo_tackle) > 0:
      df_plays.loc[idx, 'SoloTackle'] = solo_tackle[0]

    shared_tackle = re.findall(shared_tackle_pattern, play)
    if len(shared_tackle) > 0:
      df_plays.at[idx, 'SharedTackle'] = shared_tackle[0]

    assisted_tackle = re.findall(assisted_tackle_pattern, play)
    if len(assisted_tackle) > 0:
      df_plays.at[idx, 'AssistedTackle'] = [assisted_tackle[0][0], assisted_tackle[0][1]]

    ##############
    #  INJURIES  #
    ##############

    injuries = re.findall(injury_pattern, play)
    if len(injuries) > 0:
      df_plays.at[idx, 'InjuredPlayers'] = injuries

    #############
    #  PENALTY  #
    #############

    if play.lower().find('penalty') != -1:
      # start of play
      start_string = df_plays['PlayStart'].loc[idx]
      dict_start = {
          play_start_pattern: [0, 1]
      }
      # end of play
      end_string = play
      dict_end = {
          standard_play_end_pattern: [0, 1]
      }
      # play yardage
      play_yardage_string = play
      dict_play_yardage = {
          standard_play_end_pattern: [2]
      }
      extract_penalty_data(df_plays, play, idx, start_string, dict_start, end_string, dict_end, play_yardage_string, dict_play_yardage)

    # Return if the last play has been cleaned in 'df_run_plays'
    if df_run_plays.tail(1).index.tolist()[0] == idx:
      return df_plays
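
The nested handoff branches in `clean_run_plays` reduce to two questions: how far apart are the two spots, and was the play a gain or a loss? The sketch below restates that arithmetic as a standalone function. The name `handoff_yardage` and its parameters are hypothetical, and the team abbreviations are only illustrative; it assumes yard lines are given as 0 to 50 within a territory, as in the play descriptions.

```python
def handoff_yardage(start_territory, start_yard, end_territory, end_yard, gained):
    """Signed yardage between two field positions.

    Hypothetical helper: `gained` is the yardage reported in the play text
    and is used only for its sign, mirroring the branches in the method above.
    """
    if start_territory == end_territory:
        # Same side of the field: plain difference between the yard lines.
        distance = abs(start_yard - end_yard)
    else:
        # Play crossed midfield: distance through the 50-yard line.
        distance = 100 - start_yard - end_yard
    return distance if gained > 0 else -distance

print(handoff_yardage("CIN", 30, "CIN", 35, 5))   # 5
print(handoff_yardage("CIN", 45, "CLE", 48, 7))   # 7
print(handoff_yardage("CIN", 30, "CIN", 27, -3))  # -3
```

Keeping the arithmetic in one place would make it easier to check the midfield case by hand against a few known plays.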

#### 2PT CONVERSIONS

In [None]:
# I NEED A LARGER SAMPLE SIZE FOR MORE PLAYS
# - I need a sample size that has fumbled plays (if that's possible?)
# - I need a sample size that has interceptions (if that's possible?)
# - I need a sample size with injuries (as dark as that may sound)

def cleaning_2pt_conversion_plays(df_plays, index_start = None):

  # Cut 'df_plays' to begin from 'index_start' to the last 2PT conversion play available in dataframe
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_2pt_conversion_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('2PT Conversion', case=False)]
  else:
    df_2pt_conversion_plays = df_plays[df_plays['PlayOutcome'].str.contains('2PT Conversion', case=False)]

  # Iterating through every 2PT conversion play within 'df_2pt_conversion_plays'
  for idx, play in df_2pt_conversion_plays['PlayDescription'].items():

    ###################
    # PASSING ATTEMPT #
    ###################

    pass_2ptc = re.findall(tp_conversion_pass_pattern, play)
    if len(pass_2ptc) > 0:
      df_plays.loc[idx, 'Passer'] = pass_2ptc[0][0]
      df_plays.loc[idx, 'Receiver'] = pass_2ptc[0][1]
      df_plays.loc[idx, 'PlayType'] = '2PT Conversion Pass'

    ###################
    # RUSHING ATTEMPT #
    ###################

    rush_2ptc = re.findall(tp_conversion_rush_pattern, play)
    if len(rush_2ptc) > 0:
      df_plays.loc[idx, 'Rusher'] = rush_2ptc[0]
      df_plays.loc[idx, 'PlayType'] = '2PT Conversion Run'
      # Direction
      rushing_directions = ['guard', 'middle', 'tackle', 'end', 'kneels']
      for i in rushing_directions:
        if play.find(i) != -1:
          start = play.find('rushes') + len('rushes') + 1
          end = play.find(i) + len(i)
          df_plays.loc[idx, 'Direction'] = play[start:end]
          break

    #############
    #  PENALTY  #
    #############

    if play.lower().find('penalty') != -1:
      # start of play
      start_string = df_plays['PlayStart'].loc[idx]
      dict_start = {
          play_start_pattern: [0, 1]
      }
      # end of play
      end_string = play
      dict_end = {
          standard_play_end_pattern: [0, 1]
      }
      # play yardage
      play_yardage_string = play
      dict_play_yardage = {
          standard_play_end_pattern: [2]
      }
      extract_penalty_data(df_plays, play, idx, start_string, dict_start, end_string, dict_end, play_yardage_string, dict_play_yardage)

  return df_plays

### DEFENSE CLEANING METHODS

#### INTERCEPTIONS

In [None]:
# PURPOSE:
# - Clean intercepted plays
# INPUT PARAMETERS:
# df_plays    - dataframe - dataframe of plays
# index_start -  integer  - the starting index of the associated input dataframe
#                           to begin cleaning.
# RETURN:
# df_plays - dataframe - dataframe of plays that now has all useful intercepted play
#                        data accessible and clean.

# ROUGH DESIGN
# 1. Narrow dataframe using 'index_start'
#    - This is a recursive method, the narrowing will get smaller and
#      smaller until all 'intercepted' type plays have been cleaned.
# 2. Grab first 'intercepted' play from narrowed dataframe
# 3. Create 2 single row dataframes.
#    a. intended play
#    b. yardage after interception
# 4. Break down play into sentences and clean
#    - Depending on the sentence within the play, will determine which
#      single row dataframe it will go to.
# 5. Combine both dataframes of cleaned data into one dataframe
# 6. Replace old play row with new cleaned multi row
# 7. return clean_intercepted_plays( x , y)
#    - x = updated df_plays
#    - y = index directly after the last clean added row

# Concerns:
# ~ 1 ~
# PLAY SNIP - "(9:53) (Shotgun) D.Watson pass short left intended for E.Moore INTERCEPTED by D.Hill (Z.Carter) at CIN 30."
# - The concern here is (Z.Carter)
#   - I do not know what to categorize this player as? I believe that he had an impact on the play and could possibly be a reason
#     that D.Hill was able to intercept the ball.
#     - Should I create a feature called "ImpactPlayer" or something?
# ~ 2 ~
# PLAY SNIP - "(4:16) (Shotgun) J.Allen pass deep middle intended for S.Diggs INTERCEPTED by J.Whitehead [Q.Williams] at NYJ -1. Touchback."
# - The concern here is 'touchback'
#   - I have no idea what to do with that
# ~ 3 ~
# - I do not have anything in place to handle fumbles. What happens if a QB fumbles, recovers, then throws an interception? -> And then the player who intercepted fumbles?
# ~ 4 ~
# - There are 2 rows within this single play. (Intended throwing play, yardage after interception)
#   - For both of these rows that represent a single play, they both state that the throwing team has possession
#     - I do not know how this is going to affect future analysis of the data
# - -----> GRAB DATA FOR TOUCHBACKS <-----
# - -----> GRAB DATA FOR PLAYTYPE INTERCEPTION FOR YARDAGE <-----

def clean_intercepted_plays(df_plays, index_start = None):

  # Will cut df_plays starting from index_start (narrowing our search space)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_intercepted_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Interception')]
  else:
    df_intercepted_plays = df_plays[df_plays['PlayOutcome'].str.contains('Interception')]

  # Exit case (If no more 'Interception' type plays are found)
  if df_intercepted_plays.empty:
    return df_plays

  # Retrieve the index and 'PlayDescription' of the first intercepted play in 'df_intercepted_plays'
  # - Process one play per iteration in the recursive method
  idx = df_intercepted_plays.index[0]
  play = df_plays['PlayDescription'].loc[idx]

  ############
  # REVERSES #
  ############

  # In 'PlayDescription' all information before the "reversed" sentence is not needed.
  # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
  if play.find('REVERSED') != -1:
    play_elements = play.split(". ")
    for i in play_elements:
      if i.find("REVERSED") != -1:
        df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
        play = ". ".join(play_elements[play_elements.index(i) + 1:])
        break

  ###########
  # FUMBLES #
  ###########
  # - I am worried about the types of interception fumbles that can happen that I have yet to see.
  #   - Such as a QB who fumbles, recovers, and then throws an interception

  if play.find('FUMBLES') != -1:
    outcome = df_plays['PlayOutcome'].loc[idx]
    df_plays.at[idx, 'PlayOutcome'] = 'Pass'
    main_action_patterns = [interception_name_pattern, defensive_takeaway_run_pattern]
    map_who_fumbled_patterns = None
    main_cleaning_method = clean_pass_plays
    df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                              main_action_patterns,
                                              map_who_fumbled_patterns,
                                              main_cleaning_method)
    df_replacement_rows['PlayOutcome'] = outcome
    df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
    df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
    df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
    index_of_last_added_row = idx + len(df_replacement_rows) - 1
    if df_intercepted_plays.tail(1).index.tolist()[0] == idx:
      return df_plays
    else:
      return clean_intercepted_plays(df_plays, index_of_last_added_row + 1)

  # Create 2 single row dataframes.
  # 1. intended play
  df_intended_play = df_plays.loc[idx].copy()
  df_intended_play = pd.DataFrame([df_intended_play], columns=df_plays.columns)
  df_intended_play.reset_index(drop=True, inplace=True)
  df_intended_play['PlayDescription'] = 'nan'
  # 2. yardage after interception
  df_yardage_after_interception = df_plays.loc[idx].copy()
  df_yardage_after_interception = pd.DataFrame([df_yardage_after_interception], columns=df_plays.columns)
  df_yardage_after_interception.reset_index(drop=True, inplace=True)
  df_yardage_after_interception['PlayDescription'] = 'nan'

  # break down play by sentences.
  play_elements = play.split(". ")

  penalties = [] # <- Is this needed anymore?

  # Split play elements
  # 1. intended play
  #    - Grab all elements leading up to the sentence containing interception
  #      - Clean using 'clean_pass_plays' method
  # 2. actions after interception
  #    - Grab all elements after sentence containing interception
  #      - Clean using 'clean_run_plays' method
  #      - Clean using 'clean_touchdown_plays' method..?

  # Separating play into
  # 1. intended passing play
  # 2. remaining actions following interception
  for i in play_elements:
    if i.lower().find('intercepted') != -1:
      intended_play_playdescription = ". ".join(play_elements[:play_elements.index(i)+1])
      after_interception_playdescription = ". ".join(play_elements[play_elements.index(i)+1:])
      # print(idx)
      # print(intended_play_playdescription)
      # print(after_interception_playdescription)
      break

  #################
  # INTENDED PLAY #
  #################

  df_intended_play['PlayDescription'] = intended_play_playdescription
  df_intended_play['PlayOutcome'] = 'Pass'
  df_intended_play = clean_pass_plays(df_intended_play)
  df_intended_play['PlayOutcome'] =  df_plays['PlayOutcome'].loc[idx]

  # Intercepted by
  intercepted_by = re.findall(interception_name_pattern, intended_play_playdescription)
  if len(intercepted_by) > 0:
    df_intended_play['InterceptedBy'] = intercepted_by[0]
    # - During intercepted plays, the intended play portion of the play description is cleaned
    #   by the regular pass cleaning method. A defensive player credited with a pass defended
    #   during an intercepted play is formatted exactly the same as a player credited with a solo
    #   tackle during a completed pass play. I will leverage that here and move the player
    #   to the correct feature ('SoloTackle' -> 'PassDefendedBy')
    if df_intended_play['SoloTackle'].iloc[0] != 'nan':
      df_intended_play.at[0, 'PassDefendedBy'] = (intercepted_by[0], df_intended_play['SoloTackle'].iloc[0])
      df_intended_play['SoloTackle'] = 'nan'
    else:
      df_intended_play['PassDefendedBy'] = intercepted_by[0]

  #############################################################
  # YARDAGE AFTER INTERCEPTION / TOUCHDOWN AFTER INTERCEPTION #
  #############################################################
  # - I need this to be able to clean everything.
  #   - I need it to be able to clean regular interceptions for yardage (X)
  #   - I need it to be able to clean regular interceptions for yardage and then fumbled (X)
  #   - I need it to be able to clean interceptions resulting in multiple fumbles (X)
  #   - I need it to be able to clean interceptions for touchdowns (X)

  #   - I need it to be able to clean a fumbled interception that is recovered for a touchdown
  #   - I need this to account for penalties

  for action in [touchdown_after_takeaway_pattern, defensive_takeaway_run_pattern]:
    yardage_after_interception = re.findall(action, after_interception_playdescription)
    if len(yardage_after_interception) > 0:
      df_yardage_after_interception['PlayDescription'] = after_interception_playdescription

      # Flipping team with possession when the play transitions from one team with possession to the other.
      if dict_teams_2.get(df_yardage_after_interception['TeamWithPossession'].iloc[0]) == df_yardage_after_interception['HomeTeam'].iloc[0]:
        df_yardage_after_interception['TeamWithPossession'] = dict_teams.get(df_yardage_after_interception['AwayTeam'].iloc[0])
      else:
        df_yardage_after_interception['TeamWithPossession'] = dict_teams.get(df_yardage_after_interception['HomeTeam'].iloc[0])

      # Ideally I would like to send this off to another method.
      if action == touchdown_after_takeaway_pattern:
        df_yardage_after_interception['IsScoringPlay'] = 1
      # else:
      df_yardage_after_interception['PlayOutcome'] = 'Run'
      df_yardage_after_interception = clean_run_plays(df_yardage_after_interception)
      df_yardage_after_interception['PlayOutcome'] =  df_plays['PlayOutcome'].loc[idx]

      # The 'clean_run_plays' method will change 'PlayType' so that is why I am
      # putting it down here.
      df_yardage_after_interception['PlayType'] = 'Run After Interception'
      break



  # #############
  # # PENALTIES #
  # #############

  # if len(penalties) > 0:
  #   # start of play
  #   start_string = play
  #   dict_start = {
  #       interception_play_end_pattern: [0, 1]
  #   }
  #   # end of play
  #   end_string = play
  #   dict_end = {
  #       defensive_takeaway_run_pattern: [1, 2]
  #   }
  #   # play yardage
  #   play_yardage_string = play
  #   dict_play_yardage = {
  #       defensive_takeaway_run_pattern: [3]
  #   }
  #   extract_penalty_data(df_yardage_after_interception, play, df_yardage_after_interception.index[0], start_string, dict_start, end_string, dict_end, play_yardage_string, dict_play_yardage)

  # print()



  #############################
  # NEW REPLACEMENT DATAFRAME #
  #############################

  # combine both single row dataframes into one
  if df_yardage_after_interception['PlayDescription'].iloc[0] == 'nan':
    df_cleaned_replacement = df_intended_play
  else:
    df_cleaned_replacement = pd.concat([df_intended_play, df_yardage_after_interception], ignore_index=True)

  # Replace old row with new cleaned dataframe
  df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
  df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
  df_plays = pd.concat([df_before_row, df_cleaned_replacement, df_after_row], ignore_index=True)

  # If this is the last play in the dataset
  if df_intercepted_plays.tail(1).index.tolist()[0] == idx:
    return df_plays
  else:
    return clean_intercepted_plays(df_plays, idx+len(df_cleaned_replacement))
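
Steps 3 and 4 of the rough design depend on cutting the description at the sentence that contains the interception. Below is a minimal sketch of that split on its own, using one of the play snippets quoted in the concerns. The name `split_at_keyword` is hypothetical. It uses `enumerate` rather than `list.index`, an assumption on my part that positions should stay correct even if two sentences happen to be identical.

```python
def split_at_keyword(description, keyword="intercepted"):
    """Split a play description after the first sentence containing `keyword`.

    Returns (text up to and including that sentence, remaining text).
    If the keyword never appears, the whole description is returned as the
    first element and the second element is an empty string.
    """
    sentences = description.split(". ")
    for position, sentence in enumerate(sentences):
        if keyword in sentence.lower():
            before = ". ".join(sentences[:position + 1])
            after = ". ".join(sentences[position + 1:])
            return before, after
    return description, ""

play = ("(4:16) (Shotgun) J.Allen pass deep middle intended for S.Diggs "
        "INTERCEPTED by J.Whitehead [Q.Williams] at NYJ -1. Touchback.")
intended, after_interception = split_at_keyword(play)
print(intended)            # ends with "at NYJ -1"
print(after_interception)  # Touchback.
```

Returning an explicit empty string for the "no match" case avoids relying on variables that are only assigned inside the loop.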

#### SACKS


In [None]:
def clean_sacked_plays(df_plays, index_start = None):

  # Will cut df_plays starting from index_start (narrowing our search space)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_sacked_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Sack')]
  else:
    df_sacked_plays = df_plays[df_plays['PlayOutcome'].str.contains('Sack')]

  for idx, play in df_sacked_plays['PlayDescription'].items():

    ################
    # Play details #
    ################

    # TimeOnTheClock
    TimeOnTheClock = re.findall(time_on_clock_pattern, play)
    if len(TimeOnTheClock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = TimeOnTheClock[0]

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ###########
    # FUMBLES #
    ###########

    if play.find('FUMBLES') != -1:

      main_action_patterns = [passer_name_pattern,
                              defensive_takeaway_run_pattern,
                              touchdown_after_takeaway_pattern]

      map_who_fumbled_patterns = {
          passer_name_pattern : 0,
      }

      main_cleaning_method = clean_sacked_plays
      df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                                main_action_patterns,
                                                map_who_fumbled_patterns,
                                                main_cleaning_method)

      # "df_plays.index.tolist().index(idx)" needed for method usage with slices of original dataframe.
      df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
      index_of_last_added_row = idx + len(df_replacement_rows) - 1
      # returning row after the last index
      if df_sacked_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_sacked_plays(df_plays, index_of_last_added_row + 1)

    #############
    #  OFFENSE  #
    #############

    # Formation
    Formation = re.findall(formation, play)
    if len(Formation) > 0:
      if Formation[0] == 'Aborted':
        pass
      else:
        df_plays.loc[idx, 'Formation'] = Formation[0]

    # Sacked Passer
    sacked_passer_name = re.findall(passer_name_pattern, play)
    if len(sacked_passer_name) > 0:
      df_plays.loc[idx, 'Passer'] = sacked_passer_name[0]

    # Yardage lost
    yardage = re.findall(yardage_from_sack, play)
    if len(yardage) > 0:
      df_plays.loc[idx, 'Yardage'] = int(yardage[0])

    #############
    #  DEFENSE  #
    #############

    # Solo sack (One person sacked the passer)
    solo_sack = re.findall(solo_tackle_pattern, play)
    if len(solo_sack) > 0:
      df_plays.loc[idx, 'SackedBy'] = solo_sack[0]
      df_plays.loc[idx, 'SoloTackle'] = solo_sack[0]

    # Split sack (A sack was given to the passer by multiple defenders)
    split_sack = re.findall(split_sack_pattern, play)
    if len(split_sack) > 0:
      df_plays.at[idx, 'SackedBy'] = split_sack[0]
      df_plays.at[idx, 'AssistedTackle'] = split_sack[0]

    ##############
    #  INJURIES  #
    ##############

    injuries = re.findall(injury_pattern, play)
    if len(injuries) > 0:
      df_plays.at[idx, 'InjuredPlayers'] = injuries

    #############
    #  PENALTY  #
    #############

    if play.lower().find('penalty') != -1:
      # start of play
      start_string = df_plays['PlayStart'].loc[idx]
      dict_start = {
          play_start_pattern: [0, 1]
      }
      # end of play
      end_string = play
      dict_end = {
          standard_play_end_pattern: [0, 1]
      }
      # play yardage
      play_yardage_string = play
      dict_play_yardage = {
          standard_play_end_pattern: [2]
      }
      extract_penalty_data(df_plays, play, idx, start_string, dict_start, end_string, dict_end, play_yardage_string, dict_play_yardage)

    if df_sacked_plays.tail(1).index.tolist()[0] == idx:
      return df_plays
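
Several of the methods above swap one row for a block of replacement rows using the same `df_before` / `df_after` / `pd.concat` steps. The sketch below isolates that splice. The name `replace_row` is hypothetical, and the toy `PlayType` values are only illustrative. It returns the position of the row that follows the inserted block, which is what the recursive calls need as their next starting point. This matches `idx + len(df_replacement_rows)` only when the index is a default 0-based range, so treat it as a sketch under that assumption.

```python
import pandas as pd

def replace_row(df, label, replacement_rows):
    """Replace the row at index `label` with `replacement_rows`.

    Returns the rebuilt dataframe (re-indexed from 0) and the position of
    the first row after the inserted block.
    """
    position = df.index.get_loc(label)  # label -> integer position
    before = df.iloc[:position]
    after = df.iloc[position + 1:]
    combined = pd.concat([before, replacement_rows, after], ignore_index=True)
    next_position = position + len(replacement_rows)
    return combined, next_position

plays = pd.DataFrame({"PlayType": ["Run", "Sack", "Pass"]})
split_rows = pd.DataFrame({"PlayType": ["Sack", "Fumble Return"]})
result, next_position = replace_row(plays, 1, split_rows)
print(result["PlayType"].tolist())  # ['Run', 'Sack', 'Fumble Return', 'Pass']
print(next_position)                # 3
```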

### SPECIAL TEAMS CLEANING METHODS

#### PUNTS

In [None]:
# A punt playtype will be split into 2 or more rows
#   1. The Punt
#      - 'PlayType'
#         - Punt
#      - 'Punter'
#      - 'LongSnapper'
#   2. The Punt Return
#      - 'PlayType'
#         - Punt Return
#      - 'PlayOutcome'
#         - x yard punt return
#         - fair catch
#         - touchback
#         - out of bounds
#         - downed
#      - 'Returner'
#      - 'Receiver'
#      - 'Yardage'
#      - 'TackleBy1'
#      - 'TackleBy2'
#      - 'DownedBy'

# I need to figure out a fake punt
# I need to figure out a punt that has been blocked
# I need to figure out what to do when a fumble happens
# I need to figure out what to do when a touchdown happens
# Maybe in the future, to make this more space friendly, I can combine features
# - Such as 'Punter' & 'LongSnapper' OR 'TackleBy1' & 'DownedBy'
#   OR 'Returner' & 'Receiver'

def clean_punt_plays(df_plays, index_start = None):

  # Will cut df_plays starting from index_start (narrowing our search space)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_punt_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Punt')]
  else:
    df_punt_plays = df_plays[df_plays['PlayOutcome'].str.contains('Punt')]

  if df_punt_plays.empty:
    return df_plays

  # Retrieve the index and 'PlayDescription' of the first punt play in 'df_punt_plays'
  # - Process one play per iteration in the recursive method
  idx = df_punt_plays.index[0]
  play = df_plays['PlayDescription'].loc[idx]
  row_copy = df_plays.loc[idx].copy()

  ############
  # REVERSES #
  ############

  # In 'PlayDescription' all information before the "reversed" sentence is not needed.
  # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
  if play.find('REVERSED') != -1:
    play_elements = play.split(". ")
    for i in play_elements:
      if i.find("REVERSED") != -1:
        df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
        play = ". ".join(play_elements[play_elements.index(i) + 1:])
        break

  ###########
  # FUMBLES #
  ###########

  if play.find('FUMBLES') != -1:
    main_action_patterns = [punting_pattern, punt_return_pattern, defensive_takeaway_run_pattern, handoff_pattern]

    map_who_fumbled_patterns = {
        punt_return_pattern: 0,
        defensive_takeaway_run_pattern: 0,
        handoff_pattern: 0
    }

    main_cleaning_method = clean_punt_plays
    df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                              main_action_patterns,
                                              map_who_fumbled_patterns,
                                              main_cleaning_method)

    df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
    df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
    df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
    index_of_last_added_row = idx + len(df_replacement_rows) - 1

    if df_punt_plays.tail(1).index.tolist()[0] == idx:
      return df_plays
    else:
      return clean_punt_plays(df_plays, index_of_last_added_row + 1)

  # Create 2 single row dataframes.
  # 1. The Punt
  df_punt = row_copy
  df_punt = pd.DataFrame([df_punt], columns=df_plays.columns)
  df_punt.reset_index(drop=True, inplace=True)
  df_punt['PlayDescription'] = 'nan'
  # 2. The Punt Return
  df_punt_return = row_copy
  df_punt_return = pd.DataFrame([df_punt_return], columns=df_plays.columns)
  df_punt_return.reset_index(drop=True, inplace=True)
  df_punt_return['PlayDescription'] = 'nan'

  #############
  # PLAY TIME #
  #############

  time = re.findall(time_on_clock_pattern, play)
  if len(time) > 0:
    df_punt.loc[0, 'TimeOnTheClock'] = time[0]

  # break down play by sentences.
  play_elements = play.split(". ")

  accepted_penalties = []
  declined_penalties = []

  for i in play_elements:

    ########
    # PUNT #
    ########

    # All data needed for first row in replacement dataframe
    punt = re.findall(punting_pattern, i)
    if len(punt) > 0:
      df_punt['PlayType'] = 'Punt'
      df_punt['PlayDescription'] = i
      df_punt['Kicker'] = punt[0][0]
      df_punt['Yardage'] = int(punt[0][1])
      df_punt['LongSnapper'] = punt[0][4]
      # Touchback
      if i.find('Touchback') != -1:
        df_punt['PlayOutcome'] = 'Touchback'
        continue
      # Out of bounds
      if i.find('out of bounds') != -1:
        df_punt['PlayOutcome'] = 'out of bounds'
        continue
      # Downed by
      if i.find('downed by') != -1:
        df_punt['PlayOutcome'] = 'downed'
        downed_by = re.findall(kick_downed_by_pattern, i)
        df_punt['DownedBy'] = downed_by[0][downed_by[0].find("-")+1:] # Strip the team abbreviation from the player name (e.g. IND-G.Stuard -> G.Stuard)
        continue
      # fair catch
      if i.find('fair catch') != -1:
        df_punt['PlayOutcome'] = 'fair catch'
        fair_catch_by = re.findall(punt_fair_catch_pattern, i)
        df_punt['Returner'] = fair_catch_by[0]
        continue
      continue

    ######################################
    # PUNT RETURN (Including touchdowns) #
    ######################################

    # All data needed for the second row within replacement dataframe
    # - Second row only needed when there is a punt return for yardage
    # - I think I am going to run into trouble if there is a fumble recovery for yardage
    punt_return_patterns = [punt_return_pattern, touchdown_after_takeaway_pattern]
    for return_pattern in punt_return_patterns:
      punt_return = re.findall(return_pattern, i)
      if len(punt_return) > 0:
        df_punt_return['PlayDescription'] = i
        df_punt_return['PlayOutcome'] = 'Run'
        # Change team with possession on punt returns to the team that is returning the ball
        if df_punt['TeamWithPossession'].iloc[0] == dict_teams.get(df_punt['AwayTeam'].iloc[0]):
          df_punt_return.loc[0, 'TeamWithPossession'] = dict_teams.get(df_punt['HomeTeam'].iloc[0])
        else:
          df_punt_return.loc[0, 'TeamWithPossession'] = dict_teams.get(df_punt['AwayTeam'].iloc[0])
        df_punt_return = clean_run_plays(df_punt_return)
        df_punt_return['PlayOutcome'] = row_copy['PlayOutcome']
        df_punt_return['PlayType'] = 'Punt Return'
        df_punt_return['Rusher'] = 'nan'
        if return_pattern == punt_return_pattern:
          df_punt_return['Returner'] = punt_return[0][0]
        else:
          df_punt_return['Returner'] = punt_return[0]
        break

    #############
    #  PENALTY  #
    #############

    if i.find('PENALTY') != -1:
      accepted_penalties.append(i)
    if i.find('Penalty') != -1:
      declined_penalties.append(i)

  # To keep consistency with plays being split into multiple rows,
  # I think I should have the penalties placed with the first row (punt) and
  # not the following (punt return)
  # - I don't know what to do right now.
  #   1. Punts
  #      - A punter loses yardage on their kick when:
  #        1. there is a punt return
  #        2. I think a penalty as well..? (I need more data to see)
  #   2. Punt returns
  #      - A punt return loses yardage if there is a penalty on the return team
  # - I think for now I am only going to focus on punt returns.
  #   When the time comes, I will make adjustments for yardage on punts.
  if play.lower().find('penalty') != -1:
    if df_punt_return['Returner'].iloc[0] == 'nan':
      df_punt.at[0, 'AcceptedPenalty'] = accepted_penalties
      df_punt.at[0, 'DeclinedPenalty'] = declined_penalties
    else:
      # start of play
      start_string = play
      dict_start = {
          punting_pattern: [2, 3]
      }
      # end of play
      end_string = play
      dict_end = {
          defensive_takeaway_run_pattern: [1, 2]
      }
      # play yardage
      play_yardage_string = play
      dict_play_yardage = {
          defensive_takeaway_run_pattern: [3]
      }
      extract_penalty_data(df_punt_return, play, df_punt_return.index[0], start_string, dict_start, end_string, dict_end, play_yardage_string, dict_play_yardage)

  #############################
  # NEW REPLACEMENT DATAFRAME #
  #############################

  if df_punt_return['PlayDescription'].iloc[0] == 'nan':
    df_replacement_rows = df_punt
  elif df_punt['PlayDescription'].iloc[0] == 'nan': # Will happen during fumbled punt returns.
    df_replacement_rows = df_punt_return
  else:
    df_replacement_rows = pd.concat([df_punt, df_punt_return], ignore_index=True)

  df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
  df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
  df_plays = pd.concat([df_before_row, df_replacement_rows, df_after_row], ignore_index=True)

  if df_punt_plays.tail(1).index.tolist()[0] == idx:
    return df_plays
  else:
    return clean_punt_plays(df_plays, idx+len(df_replacement_rows))
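
The "swap one raw row for one or more cleaned rows" step recurs in every cleaning method in this notebook. A minimal sketch of how it could be factored out follows. `replace_row`, `df_demo`, and `df_split` are hypothetical names that do not exist in the notebook, and the sketch assumes `df_plays` has a unique index.

```python
import pandas as pd

def replace_row(df_plays, idx, df_replacement_rows):
    """Return a new frame where the row labelled `idx` is swapped for `df_replacement_rows`.

    Hypothetical helper (not part of the notebook); assumes a unique index.
    """
    pos = df_plays.index.get_loc(idx)   # positional location of the label
    df_before = df_plays.iloc[:pos]     # everything above the raw row
    df_after = df_plays.iloc[pos + 1:]  # everything below the raw row
    return pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)

# Invented demo data: one punt row is split into a punt row and a punt-return row.
df_demo = pd.DataFrame({"PlayType": ["Run", "Punt", "Pass"], "Yardage": [3, 45, 12]})
df_split = pd.DataFrame({"PlayType": ["Punt", "Punt Return"], "Yardage": [45, 8]})

df_result = replace_row(df_demo, 1, df_split)

# Where the next search should begin, mirroring idx + len(df_replacement_rows) above.
next_index_start = 1 + len(df_split)
```

Because `ignore_index=True` renumbers the rows, the next search can start at `idx + len(df_replacement_rows)`, which is what the recursive calls above rely on.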

#### KICKOFFS

In [None]:
# A kickoff playtype will be split into 1 or more rows

# I need to figure out an onside kick (recovered by kicking team)
# I need to figure out fumbled kickoff returns
# I need to figure out returns for a touchdown
# injuries?

# Method can mirror punts method.

def clean_kickoff_plays(df_plays, index_start = None):

  # Will cut df_plays starting from index_start (narrowing our search space)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_kickoff_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('kickoff', case=False)]
  else:
    df_kickoff_plays = df_plays[df_plays['PlayOutcome'].str.contains('kickoff', case=False)]

  # exit case
  if df_kickoff_plays.empty:
    return df_plays

  # Retrieve the index and 'PlayDescription' of the first kickoff play in 'df_kickoff_plays'
  # - Process one play per iteration in the recursive method
  idx = df_kickoff_plays.index[0]
  play = df_plays['PlayDescription'].loc[idx]

  ############
  # REVERSES #
  ############

  # In 'PlayDescription' all information before the "reversed" sentence is not needed.
  # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
  if play.find('REVERSED') != -1:
    play_elements = play.split(". ")
    for i in play_elements:
      if i.find("REVERSED") != -1:
        df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
        play = ". ".join(play_elements[play_elements.index(i) + 1:])
        break

  ###########
  # FUMBLES #
  ###########

  if play.find('FUMBLES') != -1:
    main_action_patterns = [kickoff_pattern, kick_return_pattern, defensive_takeaway_run_pattern, handoff_pattern]

    map_who_fumbled_patterns = {
        kick_return_pattern: 0,
        defensive_takeaway_run_pattern: 0,
        handoff_pattern: 0
    }

    main_cleaning_method = clean_kickoff_plays
    df_replacement_rows = extract_fumble_data(df_plays, play, idx,
                                              main_action_patterns,
                                              map_who_fumbled_patterns,
                                              main_cleaning_method)

    df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
    df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
    df_plays = pd.concat([df_before, df_replacement_rows, df_after], ignore_index=True)
    index_of_last_added_row = idx + len(df_replacement_rows) - 1

    # returning row after the last index
    if df_kickoff_plays.tail(1).index.tolist()[0] == idx:
      return df_plays
    else:
      return clean_kickoff_plays(df_plays, index_of_last_added_row + 1)

  # Create 2 single row dataframes.
  # 1. The Kickoff
  df_kickoff = df_plays.loc[idx].copy()
  df_kickoff = pd.DataFrame([df_kickoff], columns=df_plays.columns)
  df_kickoff.reset_index(drop=True, inplace=True)
  df_kickoff['PlayDescription'] = 'nan'
  # 2. The Kickoff Return
  df_kickoff_return = df_plays.loc[idx].copy()
  df_kickoff_return = pd.DataFrame([df_kickoff_return], columns=df_plays.columns)
  df_kickoff_return.reset_index(drop=True, inplace=True)
  df_kickoff_return['PlayDescription'] = 'nan'

  # break down play by sentences.
  play_elements = play.split(". ")

  accepted_penalties = []
  declined_penalties = []

  for i in play_elements:

    ###########
    # KICKOFF #
    ###########

    kickoff = re.findall(kickoff_pattern, i)
    if len(kickoff) > 0:
      df_kickoff['PlayType'] = 'Kickoff'
      df_kickoff['PlayDescription'] = i

      # Change team with possession on kickoff to the team that is kicking
      if df_kickoff['TeamWithPossession'].iloc[0] == dict_teams.get(df_kickoff['AwayTeam'].iloc[0]):
        df_kickoff.loc[0, 'TeamWithPossession'] = dict_teams.get(df_kickoff['HomeTeam'].iloc[0])
      else:
        df_kickoff.loc[0, 'TeamWithPossession'] = dict_teams.get(df_kickoff['AwayTeam'].iloc[0])

      df_kickoff['Kicker'] = kickoff[0][0]
      df_kickoff['Yardage'] = int(kickoff[0][1])
      if i.find('Touchback') != -1:
        df_kickoff['PlayOutcome'] = 'Touchback'
        continue
      # I need to figure out what the difference will be when the kicking team recovers
      if i.find('onside') != -1:
        df_kickoff['PlayOutcome'] = 'onside'
        downed_by = re.findall(kick_downed_by_pattern, i)
        if len(downed_by) > 0:
          df_kickoff['DownedBy'] = downed_by[0][downed_by[0].find("-")+1:]
        continue
      continue

    #########################################
    # KICKOFF RETURN (Including touchdowns) #
    #########################################

    kick_return_patterns = [kick_return_pattern, touchdown_after_takeaway_pattern]
    for return_pattern in kick_return_patterns:
      kick_return = re.findall(return_pattern, i)
      if len(kick_return) > 0:
        df_kickoff_return['PlayDescription'] = i
        df_kickoff_return['PlayOutcome'] = 'Run'
        df_kickoff_return = clean_run_plays(df_kickoff_return)
        df_kickoff_return['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
        df_kickoff_return['PlayType'] = 'Kickoff Return'
        df_kickoff_return['Rusher'] = 'nan'
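        # Mirror the punt-return logic: 'touchdown_after_takeaway_pattern' appears to capture
        # a single group, so re.findall returns plain strings for it (not tuples).
        if return_pattern == kick_return_pattern:
          df_kickoff_return['Returner'] = kick_return[0][0]
        else:
          df_kickoff_return['Returner'] = kick_return[0]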
        break

    #############
    #  PENALTY  #
    #############

    # Accepted Penalty
    if i.find('PENALTY') != -1:
      accepted_penalties.append(i)

    # Declined Penalty
    if i.find('Penalty') != -1:
      declined_penalties.append(i)

  if play.lower().find('penalty') != -1:
    if df_kickoff_return['Returner'].iloc[0] == 'nan':
      df_kickoff.at[0, 'AcceptedPenalty'] = accepted_penalties
      df_kickoff.at[0, 'DeclinedPenalty'] = declined_penalties
    else:
      # start of play
      start_string = play
      dict_start = {
          kickoff_pattern: [4, 5]
      }
      # end of play
      end_string = play
      dict_end = {
          kick_return_pattern: [1, 2]
      }
      # play yardage
      play_yardage_string = play
      dict_play_yardage = {
          kick_return_pattern: [3]
      }
      extract_penalty_data(df_kickoff_return, play, df_kickoff_return.index[0], start_string, dict_start, end_string, dict_end, play_yardage_string, dict_play_yardage)

  #############################
  # NEW REPLACEMENT DATAFRAME #
  #############################

  if df_kickoff_return['PlayDescription'].iloc[0] == 'nan':
    df_replacement_rows = df_kickoff
  elif df_kickoff['PlayDescription'].iloc[0] == 'nan': # Will happen during fumbled kickoff returns
    df_replacement_rows = df_kickoff_return
  else:
    df_replacement_rows = pd.concat([df_kickoff, df_kickoff_return], ignore_index=True)

  df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
  df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
  df_plays = pd.concat([df_before_row, df_replacement_rows, df_after_row], ignore_index=True)

  if df_kickoff_plays.tail(1).index.tolist()[0] == idx:
    return df_plays
  else:
    return clean_kickoff_plays(df_plays, idx+len(df_replacement_rows))
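
The punt and kickoff methods both sort penalty sentences by casing: a sentence containing `PENALTY` is treated as accepted and one containing `Penalty` as declined. A minimal sketch of that step in isolation is below. `split_penalties` is a hypothetical helper, and the play description is invented for illustration rather than taken from the dataset.

```python
def split_penalties(play):
    """Split a play description into accepted and declined penalty sentences.

    Hypothetical helper (not part of the notebook). Relies on the casing
    convention used above: 'PENALTY' = accepted, 'Penalty' = declined.
    """
    accepted_penalties = []
    declined_penalties = []
    for sentence in play.split(". "):
        # str.find and the `in` operator are both case-sensitive,
        # so the two checks cannot match the same word.
        if "PENALTY" in sentence:
            accepted_penalties.append(sentence)
        if "Penalty" in sentence:
            declined_penalties.append(sentence)
    return accepted_penalties, declined_penalties

# Invented play description with made-up players and teams.
demo_play = ("J.Doe kicks 65 yards from AAA 35 to BBB 0. "
             "R.Roe to BBB 22 for 22 yards (T.Poe). "
             "PENALTY on BBB-X.Moe, Offensive Holding, 10 yards, enforced at BBB 20. "
             "Penalty on AAA-Y.Zoe, Offside, declined.")

accepted, declined = split_penalties(demo_play)
```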

### SCORING CLEANING METHODS

#### TOUCHDOWNS

In [None]:
# Still need to figure out whether or not plays that have multiple rows will all have
# 'IsScoringDrive' = 1, 'IsScoringPlay' = 1, 'PlayOutcome' = *teamname* Touchdown
# - The reasoning to not have this is because if a QB were to throw a pick-six,
#   it wouldn't count as a "Scoring Drive" for them but for the opposing team.
# - For consistency, I will have the entire play have
#   'IsScoringDrive' = 1, 'IsScoringPlay' = 1, 'PlayOutcome' = *teamname* Touchdown
# - Need larger dataset to include all other touchdown plays such as kickoff returns and field goal returns

# - I need to clean up this method and condense it. I feel like I can probably condense this a lot.

def clean_touchdown_plays(df_plays, index_start=None):

  # Cut 'df_plays' to begin from 'index_start' to the last touchdown play available in dataframe
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_touchdown_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Touchdown')]
  else:
    df_touchdown_plays = df_plays[df_plays['PlayOutcome'].str.contains('Touchdown')]

  # Iterating through every touchdown play within 'df_touchdown_plays'
  for idx, play in df_touchdown_plays['PlayDescription'].items():

    # - Once I figure out what kind of touchdown it was, then I will be able to
    #   determine the 'PlayType'

    ######################
    # PASSING TOUCHDOWNS #
    ######################

    # If a play has a passer throwing the ball, I am assuming it is a passing play
    passing_play = re.findall(passer_name_pattern, play)
    if len(passing_play) > 0 and play.find("sacked") == -1 and play.find("INTERCEPTED") == -1:

      # creating a copy of the passing touchdown play row and cleaning the copy
      passing_touchdown_row = df_plays.loc[idx].copy()
      passing_touchdown_row['PlayType'] = 'Pass'
      passing_touchdown_row['PlayOutcome'] = 'Pass'
      passing_touchdown_row['IsScoringPlay'] = 1
      passing_touchdown_row = pd.DataFrame([passing_touchdown_row], columns=df_plays.columns)
      passing_touchdown_row = clean_pass_plays(passing_touchdown_row)
      passing_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, passing_touchdown_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_touchdown_plays(df_plays, idx+len(passing_touchdown_row))

    ######################
    # RUSHING TOUCHDOWNS #
    ######################

    rusher_patterns = [rusher_pattern]
    # Loop through patterns and find the first match
    for pattern in rusher_patterns:
      rusher = re.findall(pattern, play)
      if len(rusher) > 0:
        # creating a copy of the rushing touchdown play row and cleaning the copy
        rushing_touchdown_row = df_plays.loc[idx].copy()
        rushing_touchdown_row['PlayType'] = 'Run'
        rushing_touchdown_row['PlayOutcome'] = 'Run'
        rushing_touchdown_row['IsScoringPlay'] = 1
        rushing_touchdown_row = pd.DataFrame([rushing_touchdown_row], columns=df_plays.columns)
        rushing_touchdown_row = clean_run_plays(rushing_touchdown_row)
        rushing_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

        # Replacing old row with cleaned row
        df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
        df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
        df_plays = pd.concat([df_before_row, rushing_touchdown_row, df_after_row], ignore_index=True)

        # Recursion to update 'df_plays'
        if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
          return df_plays
        else:
          return clean_touchdown_plays(df_plays, idx+len(rushing_touchdown_row))

    ##########################
    # INTERCEPTED TOUCHDOWNS #
    ##########################

    # Still need to clean intercepted play types
    if play.find("INTERCEPTED") != -1:

      # creating a copy of the intercepted touchdown play and cleaning the copy
      intercepted_touchdown_row = df_plays.loc[idx].copy()
      intercepted_touchdown_row['PlayOutcome'] = 'Interception'
      intercepted_touchdown_row['IsScoringPlay'] = 1 # This will only be the value for the team that threw the interception
      intercepted_touchdown_row = pd.DataFrame([intercepted_touchdown_row], columns=df_plays.columns)
      intercepted_touchdown_row.reset_index(drop=True, inplace=True)

      #################################################################################################### Under Construction
      # Change feature 'TeamWithPossession' for each play in drive
      # - Raw data states that the team that intercepted the ball for a touchdown had possession for each play
      #   within drive. The correct value for this feature for each play in drive is the team that threw
      #   the interception.

      wrong_team_with_possession = df_plays['TeamWithPossession'].loc[idx]
      if wrong_team_with_possession == dict_teams.get(df_plays['HomeTeam'].loc[idx]):
        correct_team_with_possession = dict_teams.get(df_plays['AwayTeam'].loc[idx])
      else:
        correct_team_with_possession = dict_teams.get(df_plays['HomeTeam'].loc[idx])

      # HERE I NEED TO CHANGE ALL 'TEAMWITHPOSSESSION' FEATURES FOR EVERY PLAY IN DRIVE
      # I need to figure out how to efficiently grab every play in drive.
      intercepted_touchdown_row['TeamWithPossession'] = correct_team_with_possession
      conditions_for_unique_drive = ((df_plays['Season'] == df_plays['Season'].loc[idx]) &
      (df_plays['Week'] == df_plays['Week'].loc[idx]) &
      (df_plays['AwayTeam'] == df_plays['AwayTeam'].loc[idx]) &
      (df_plays['HomeTeam'] == df_plays['HomeTeam'].loc[idx]) &
      (df_plays['Quarter'] == df_plays['Quarter'].loc[idx]) &
      (df_plays['DriveNumber'] == df_plays['DriveNumber'].loc[idx]))

      df_plays.loc[conditions_for_unique_drive, 'TeamWithPossession'] = correct_team_with_possession

      ####################################################################################################

      # Because this is an interception for a touchdown, the defensive team should have their team
      # with possession to end the drive.
      # REMINDER: This single play is separated into multiple actions (play will be represented with multiple rows)
      intercepted_touchdown_row = clean_intercepted_plays(intercepted_touchdown_row)
      intercepted_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, intercepted_touchdown_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_touchdown_plays(df_plays, idx+len(intercepted_touchdown_row))

    #####################################
    # SACKED FUMBLE RECOVERY TOUCHDOWNS #
    #####################################

    if play.find("sacked") != -1:

      # creating a copy of the sack touchdown play and cleaning the copy
      sacked_touchdown_row = df_plays.loc[idx].copy()
      sacked_touchdown_row['PlayOutcome'] = 'Sack'
      sacked_touchdown_row['IsScoringPlay'] = 1
      sacked_touchdown_row = pd.DataFrame([sacked_touchdown_row], columns=df_plays.columns)
      sacked_touchdown_row.reset_index(drop=True, inplace=True)

      #################################################################################################### Under Construction
      # Change feature 'TeamWithPossession' for each play in drive
      # - Raw data states that the team that recovered the ball for a touchdown had possession for each play
      #   within drive. The correct value for this feature for each play in drive is the team that fumbled.
      #   - WRONG: I think this could cause some issues in the future. During a fumbled play, the feature
      #            'TeamWithPossession' could go back and forth between both teams during that single play.
      #            - This would take each of those rows representing that single fumbled play and state that
      #              the offense had possession the entire time.

      wrong_team_with_possession = df_plays['TeamWithPossession'].loc[idx]
      if wrong_team_with_possession == dict_teams.get(df_plays['HomeTeam'].loc[idx]):
        correct_team_with_possession = dict_teams.get(df_plays['AwayTeam'].loc[idx])
      else:
        correct_team_with_possession = dict_teams.get(df_plays['HomeTeam'].loc[idx])

      # HERE I NEED TO CHANGE ALL 'TEAMWITHPOSSESSION' FEATURES FOR EVERY PLAY IN DRIVE
      # I need to figure out how to efficiently grab every play in drive.
      sacked_touchdown_row['TeamWithPossession'] = correct_team_with_possession
      conditions_for_unique_drive = ((df_plays['Season'] == df_plays['Season'].loc[idx]) &
      (df_plays['Week'] == df_plays['Week'].loc[idx]) &
      (df_plays['AwayTeam'] == df_plays['AwayTeam'].loc[idx]) &
      (df_plays['HomeTeam'] == df_plays['HomeTeam'].loc[idx]) &
      (df_plays['Quarter'] == df_plays['Quarter'].loc[idx]) &
      (df_plays['DriveNumber'] == df_plays['DriveNumber'].loc[idx]))

      df_plays.loc[conditions_for_unique_drive, 'TeamWithPossession'] = correct_team_with_possession

      ####################################################################################################

      # Because this is a fumble recovery for a touchdown, the defensive team should have their team
      # with possession to end the drive.
      sacked_touchdown_row = clean_sacked_plays(sacked_touchdown_row)
      sacked_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row (Original row can sometimes be replaced with multiple rows)
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, sacked_touchdown_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_touchdown_plays(df_plays, idx+len(sacked_touchdown_row))

    ##########################
    # PUNT RETURN TOUCHDOWNS #
    ##########################

    punt_play = re.findall(punting_pattern, play)
    if len(punt_play) > 0:

      # creating a copy of the punt touchdown play and cleaning the copy
      punt_touchdown_row = df_plays.loc[idx].copy()
      punt_touchdown_row['PlayOutcome'] = 'Punt'
      punt_touchdown_row['IsScoringPlay'] = 1 # This will only be the value for the team that punted the ball
      punt_touchdown_row = pd.DataFrame([punt_touchdown_row], columns=df_plays.columns)
      punt_touchdown_row.reset_index(drop=True, inplace=True)


      #################################################################################################### Under Construction
      # Change feature 'TeamWithPossession' for each play in drive
      # - Raw data states that the team that returned the punt for a touchdown had possession for each play
      #   within drive. The correct value for this feature for each play in drive is the team that punted.
      #   - NOTE: A punting play is separated into 2 pieces.
      #           1. The punt (Team with possession is the team that punted the ball)
      #           2. The punt return (Team with possession is the team that is returning the ball)

      wrong_team_with_possession = df_plays['TeamWithPossession'].loc[idx]
      if wrong_team_with_possession == dict_teams.get(df_plays['HomeTeam'].loc[idx]):
        correct_team_with_possession = dict_teams.get(df_plays['AwayTeam'].loc[idx])
      else:
        correct_team_with_possession = dict_teams.get(df_plays['HomeTeam'].loc[idx])

      # HERE I NEED TO CHANGE ALL 'TEAMWITHPOSSESSION' FEATURES FOR EVERY PLAY IN DRIVE
      # I need to figure out how to efficiently grab every play in drive.
      punt_touchdown_row['TeamWithPossession'] = correct_team_with_possession
      conditions_for_unique_drive = ((df_plays['Season'] == df_plays['Season'].loc[idx]) &
      (df_plays['Week'] == df_plays['Week'].loc[idx]) &
      (df_plays['AwayTeam'] == df_plays['AwayTeam'].loc[idx]) &
      (df_plays['HomeTeam'] == df_plays['HomeTeam'].loc[idx]) &
      (df_plays['Quarter'] == df_plays['Quarter'].loc[idx]) &
      (df_plays['DriveNumber'] == df_plays['DriveNumber'].loc[idx]))

      df_plays.loc[conditions_for_unique_drive, 'TeamWithPossession'] = correct_team_with_possession

      ####################################################################################################


      punt_touchdown_row = clean_punt_plays(punt_touchdown_row)
      punt_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, punt_touchdown_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_touchdown_plays(df_plays, idx+len(punt_touchdown_row))

    #################################
    # BLOCKED FIELD GOAL TOUCHDOWNS #
    #################################

    field_goal_blocked = re.findall(field_goal_blocked_pattern, play)
    if len(field_goal_blocked) > 0:

      # creating a copy of recovered blocked field goal touchdown play and cleaning the copy
      blocked_fg_touchdown_row = df_plays.loc[idx].copy()
      blocked_fg_touchdown_row['PlayOutcome'] = 'Field Goal'
      blocked_fg_touchdown_row['IsScoringPlay'] = 1 # This will only be the value for the team that attempted the field goal
      blocked_fg_touchdown_row = pd.DataFrame([blocked_fg_touchdown_row], columns=df_plays.columns)
      blocked_fg_touchdown_row.reset_index(drop=True, inplace=True)
      blocked_fg_touchdown_row = clean_field_goal_plays(blocked_fg_touchdown_row)
      blocked_fg_touchdown_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      #################################################################################################### Under Construction
      # Change feature 'TeamWithPossession' for each play in drive
      # - Raw data states that the team that blocked the field goal attempt and recovered for a touchdown had possession for each play
      #   within drive. The correct value for this feature for each play in drive is the team that attempted
      #   the field goal.

      wrong_team_with_possession = df_plays['TeamWithPossession'].loc[idx]
      if wrong_team_with_possession == dict_teams.get(df_plays['HomeTeam'].loc[idx]):
        correct_team_with_possession = dict_teams.get(df_plays['AwayTeam'].loc[idx])
      else:
        correct_team_with_possession = dict_teams.get(df_plays['HomeTeam'].loc[idx])

      # HERE I NEED TO CHANGE ALL 'TEAMWITHPOSSESSION' FEATURES FOR EVERY PLAY IN DRIVE
      # I need to figure out how to efficiently grab every play in drive.
      blocked_fg_touchdown_row['TeamWithPossession'] = correct_team_with_possession
      conditions_for_unique_drive = ((df_plays['Season'] == df_plays['Season'].loc[idx]) &
      (df_plays['Week'] == df_plays['Week'].loc[idx]) &
      (df_plays['AwayTeam'] == df_plays['AwayTeam'].loc[idx]) &
      (df_plays['HomeTeam'] == df_plays['HomeTeam'].loc[idx]) &
      (df_plays['Quarter'] == df_plays['Quarter'].loc[idx]) &
      (df_plays['DriveNumber'] == df_plays['DriveNumber'].loc[idx]))

      df_plays.loc[conditions_for_unique_drive, 'TeamWithPossession'] = correct_team_with_possession

      ####################################################################################################

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, blocked_fg_touchdown_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
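      if df_touchdown_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_touchdown_plays(df_plays, idx+len(blocked_fg_touchdown_row))

  # Exit case: no touchdown plays left, or none matched a known pattern.
  # Without this, the method would implicitly return None.
  return df_plays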
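
The "flip 'TeamWithPossession' for every play in the drive" block appears four times in `clean_touchdown_plays`. One way it might be condensed is sketched below. `DRIVE_KEYS`, `drive_mask`, and `other_team` are hypothetical names, the demo teams are invented, and `dict_teams` is assumed to map the values in 'HomeTeam'/'AwayTeam' to the values used in 'TeamWithPossession', as it appears to in the code above.

```python
import pandas as pd

# Columns that together identify one drive, taken from the conditions used above.
DRIVE_KEYS = ["Season", "Week", "AwayTeam", "HomeTeam", "Quarter", "DriveNumber"]

def drive_mask(df_plays, idx):
    """Boolean mask selecting every row in the same drive as the row labelled `idx`."""
    mask = pd.Series(True, index=df_plays.index)
    for key in DRIVE_KEYS:
        mask &= df_plays[key] == df_plays.at[idx, key]
    return mask

def other_team(df_plays, idx, dict_teams):
    """Return the team that is NOT currently listed in 'TeamWithPossession' for row `idx`."""
    current = df_plays.at[idx, "TeamWithPossession"]
    home = dict_teams.get(df_plays.at[idx, "HomeTeam"])
    away = dict_teams.get(df_plays.at[idx, "AwayTeam"])
    return away if current == home else home

# Invented demo data: rows 0 and 1 belong to drive 4, row 2 to drive 5.
dict_teams_demo = {"Alpha": "ALP", "Beta": "BET"}
df_demo = pd.DataFrame({
    "Season": [2023, 2023, 2023],
    "Week": [2, 2, 2],
    "AwayTeam": ["Alpha", "Alpha", "Alpha"],
    "HomeTeam": ["Beta", "Beta", "Beta"],
    "Quarter": [1, 1, 1],
    "DriveNumber": [4, 4, 5],
    "TeamWithPossession": ["BET", "BET", "ALP"],
})

mask = drive_mask(df_demo, 1)
df_demo.loc[mask, "TeamWithPossession"] = other_team(df_demo, 1, dict_teams_demo)
```

Note that the mask includes 'Quarter', as the original conditions do, so a drive that spans two quarters would only be partly updated. Whether that is intended is left open here.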

#### FIELD GOALS

In [None]:
# I need an example of when a player returns the field goal for yardage
# I need a larger sample size for "Blocked" field goals
# I need to figure out what to do if someone fumbles a recovery
# I need to figure out what to do on a trick play (e.g. holder runs out with the ball)
# - INCOMPLETE. NEED LARGER SAMPLE SIZE

def clean_field_goal_plays(df_plays, index_start = None):

  # Adjusting df_plays to start cleaning at a specified index (index_start)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    # Locating all field goal plays within dataframe
    df_field_goal_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Field Goal')]
  else:
    # Locating all field goal plays within dataframe
    df_field_goal_plays = df_plays[df_plays['PlayOutcome'].str.contains('Field Goal')]

  for idx, play in df_field_goal_plays['PlayDescription'].items():

    play_elements = play.split(". ")

    ###################
    # EXTRA PLAY DATA #
    ###################

    # I may have to change this later.
    # I think I will have to move this towards the end.

    # - eventually I will have to use 'extract_penalty_data'
    #   to handle field goal penalties. For now I am just
    #   going to record them.

    if len(play_elements) > 1:

      accepted_penalties = []
      declined_penalties = []
      injured_players = []

      for i in play_elements:

        # Accepted Penalty
        if i.find('PENALTY') != -1:
          accepted_penalties.append(i)

        # Declined Penalty
        if i.find('Penalty') != -1:
          declined_penalties.append(i)

        # Injuries
        injury_on_play = re.findall(injury_pattern, i)
        if len(injury_on_play) > 0:
          injured_players.append(injury_on_play[0])

      if len(accepted_penalties) > 0:
        df_plays.at[idx, 'AcceptedPenalty'] = accepted_penalties
      if len(declined_penalties) > 0:
        df_plays.at[idx, 'DeclinedPenalty'] = declined_penalties
      if len(injured_players) > 0:
        df_plays.at[idx, 'InjuredPlayers'] = injured_players

    # Time of play
    time_on_clock = re.findall(time_on_clock_pattern, play)
    if len(time_on_clock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = time_on_clock[0]

    #########################
    # FIELD GOAL SITUATIONS #
    #########################

    # Field goal good
    field_goal_good = re.findall(field_goal_good_pattern, play)
    if len(field_goal_good) > 0:
      df_plays.loc[idx, 'PlayOutcome'] = 'Field Goal Good'
      df_plays.loc[idx, 'PlayType'] = 'Field Goal'
      df_plays.loc[idx, 'Kicker'] = field_goal_good[0][0]
      df_plays.loc[idx, 'Yardage'] = int(field_goal_good[0][1])
      df_plays.loc[idx, 'LongSnapper'] = field_goal_good[0][2]
      df_plays.loc[idx, 'Holder'] = field_goal_good[0][3]
      continue

    # Field goal no good
    field_goal_no_good = re.findall(field_goal_no_good_pattern, play)
    if len(field_goal_no_good) > 0:
      df_plays.loc[idx, 'PlayOutcome'] = 'Field Goal No Good'
      df_plays.loc[idx, 'PlayType'] = 'Field Goal'
      df_plays.loc[idx, 'Kicker'] = field_goal_no_good[0][0]
      df_plays.loc[idx, 'Yardage'] = int(field_goal_no_good[0][1])
      df_plays.loc[idx, 'Direction'] = field_goal_no_good[0][2]
      df_plays.loc[idx, 'LongSnapper'] = field_goal_no_good[0][3]
      df_plays.loc[idx, 'Holder'] = field_goal_no_good[0][4]
      continue

    # Field goal blocked
    # - I NEED A LARGER SAMPLE SIZE TO CORRECTLY CLEAN THESE
    # - Should I create a feature for those who recovered the ball?
    # - This part of the method will sometimes have a play that can only be broken down
    #   into multiple sentences, each representing an individual action.
    #   - Individual actions include the field goal attempt, the recovery after the block,
    #     and possible fumbles (I have not implemented this).
    # - I need to note that if a penalty has been called during a blocked field goal
    #   with a recovery for yardage, extra data (such as penalties and injuries) will
    #   be recorded multiple times. (I am too lazy right now to clean a situation that
    #   I have not come across yet.)
    field_goal_blocked = re.findall(field_goal_blocked_pattern, play)
    if len(field_goal_blocked) > 0:

      ########################################
      # LOCATING BLOCKED FIELD GOAL SENTENCE #
      ########################################

      play_elements = play.split(". ")
      if len(play_elements) > 1:
        for i in play_elements:
          # Locating which sentence contains the field goal attempt
          field_goal_blocked = re.findall(field_goal_blocked_pattern, i)
          if len(field_goal_blocked) > 0:

            ########################################
            # CLEANING BLOCKED FIELD GOAL SENTENCE #
            ########################################

            # Isolating blocked field goal sentence
            field_goal_attempt = i

            # creating copy of row as a single row dataframe with feature 'PlayDescription' as field goal blocked
            df_field_goal_blocked_row = pd.DataFrame([df_plays.loc[idx].copy()], columns=df_plays.columns)
            df_field_goal_blocked_row['PlayDescription'] = field_goal_attempt
            df_field_goal_blocked_row['PlayOutcome'] = 'Field Goal'
            df_field_goal_blocked_row = clean_field_goal_plays(df_field_goal_blocked_row)
            df_field_goal_blocked_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

            #############################
            # CLEANING RECOVERY YARDAGE #
            #############################

            # Grabbing all actions that followed the field goal attempt (should be things such as recovery for yardage, fumbles, recovery for touchdown, etc.)
            field_goal_blocked_recovery_actions = play_elements[play_elements.index(i)+1::]
            field_goal_blocked_recovery_actions = ". ".join(field_goal_blocked_recovery_actions)

            # creating copy of row as a single row dataframe with feature 'PlayDescription' as recovery data
            df_recovery_yardage_rows = pd.DataFrame([df_plays.loc[idx].copy()], columns=df_plays.columns)
            df_recovery_yardage_rows['PlayDescription'] = field_goal_blocked_recovery_actions
            df_recovery_yardage_rows['PlayOutcome'] = 'Run'
            df_recovery_yardage_rows = clean_run_plays(df_recovery_yardage_rows)
            df_recovery_yardage_rows['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
            df_recovery_yardage_rows['PlayType'] = 'Field Goal Return'

            #############################################
            # REPLACING UNCLEAN ROW WITH CLEANED ROW(S) #
            #############################################

            df_before = df_plays.iloc[:df_plays.index.tolist().index(idx)]
            df_after = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
            df_plays = pd.concat([df_before, df_field_goal_blocked_row, df_recovery_yardage_rows, df_after], ignore_index=True)
            index_of_last_added_row = idx + len(df_field_goal_blocked_row) + len(df_recovery_yardage_rows)

            if df_field_goal_plays.tail(1).index.tolist()[0] == idx:
              return df_plays
            else:
              return clean_field_goal_plays(df_plays, index_of_last_added_row + 1)
            break

      df_plays.loc[idx, 'PlayOutcome'] = 'Field Goal Blocked'
      df_plays.loc[idx, 'PlayType'] = 'Field Goal'
      df_plays.loc[idx, 'Kicker'] = field_goal_blocked[0][0]
      df_plays.loc[idx, 'Yardage'] = int(field_goal_blocked[0][1])
      df_plays.loc[idx, 'BlockedBy'] = field_goal_blocked[0][2]
      df_plays.loc[idx, 'LongSnapper'] = field_goal_blocked[0][3]
      df_plays.loc[idx, 'Holder'] = field_goal_blocked[0][4]
      continue

  return df_plays
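
The "EXTRA PLAY DATA" step above (split the description into sentences, then bucket penalties and injuries) appears again in the extra point cleaner, so it could be pulled into one helper. Below is a minimal sketch of that idea, not the notebook's actual implementation. `collect_extra_play_data`, `INJURY_PATTERN_EXAMPLE`, and the sample play text (with made-up player names) are hypothetical; the real `injury_pattern` is defined elsewhere in the notebook and may differ.

```python
import re

# Hypothetical stand-in for the notebook's `injury_pattern` (an assumption).
INJURY_PATTERN_EXAMPLE = r"([A-Z]{2,3}-[A-Z]\.[A-Za-z'\-]+) was injured during the play"

def collect_extra_play_data(play, injury_pattern=INJURY_PATTERN_EXAMPLE):
    """Split a play description into sentences and bucket penalties and injuries."""
    accepted, declined, injured = [], [], []
    for sentence in play.split(". "):
        # Same convention as the cleaners above: uppercase 'PENALTY' is treated
        # as accepted, capitalized 'Penalty' as declined.
        if "PENALTY" in sentence:
            accepted.append(sentence)
        if "Penalty" in sentence:
            declined.append(sentence)
        # Keep only the first injury match per sentence, as the cleaners do.
        matches = re.findall(injury_pattern, sentence)
        if matches:
            injured.append(matches[0])
    return {"AcceptedPenalty": accepted,
            "DeclinedPenalty": declined,
            "InjuredPlayers": injured}

# Made-up play description, for illustration only.
sample_play = ("(2:05) K.Kicker 45 yard field goal is GOOD, Center-S.Snapper, Holder-H.Holder. "
               "PENALTY on BBB-D.Lineman, Defensive Offside, 5 yards, enforced between downs. "
               "AAA-G.Guard was injured during the play.")
extra = collect_extra_play_data(sample_play)
```

Each non-empty list in `extra` could then be written with `df_plays.at[idx, column] = values`, as the cleaners already do.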

#### EXTRA POINT

In [None]:
def clean_extra_point_plays(df_plays, index_start = None):

  # Adjusting df_plays to start cleaning at a specified index (index_start)
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    # Locating all extra point plays within dataframe
    df_extra_point_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Extra Point')]
  else:
    # Locating all extra point plays within dataframe
    df_extra_point_plays = df_plays[df_plays['PlayOutcome'].str.contains('Extra Point')]

  for idx, play in df_extra_point_plays['PlayDescription'].items():

    play_elements = play.split(". ")

    ###################
    # EXTRA PLAY DATA #
    ###################

    # - eventually I will have to use 'extract_penalty_data'
    #   to handle extra point penalties. For now I am just
    #   going to record them.

    if len(play_elements) > 1:

      accepted_penalties = []
      declined_penalties = []
      injured_players = []

      for i in play_elements:

        # Accepted Penalty
        if i.find('PENALTY') != -1:
          accepted_penalties.append(i)

        # Declined Penalty
        if i.find('Penalty') != -1:
          declined_penalties.append(i)

        # Injuries
        injury_on_play = re.findall(injury_pattern, i)
        if len(injury_on_play) > 0:
          injured_players.append(injury_on_play[0])

      if len(accepted_penalties) > 0:
        df_plays.at[idx, 'AcceptedPenalty'] = accepted_penalties
      if len(declined_penalties) > 0:
        df_plays.at[idx, 'DeclinedPenalty'] = declined_penalties
      if len(injured_players) > 0:
        df_plays.at[idx, 'InjuredPlayers'] = injured_players

    ##########################
    # EXTRA POINT SITUATIONS #
    ##########################

    # Extra point good
    extra_point_good = re.findall(extra_point_good_pattern, play)
    if len(extra_point_good) > 0:
      df_plays.loc[idx, 'PlayOutcome'] = 'Extra Point Good'
      df_plays.loc[idx, 'PlayType'] = 'Extra Point'
      df_plays.loc[idx, 'Kicker'] = extra_point_good[0][0]
      df_plays.loc[idx, 'LongSnapper'] = extra_point_good[0][1]
      df_plays.loc[idx, 'Holder'] = extra_point_good[0][2]
      continue

    # Extra point no good
    extra_point_no_good = re.findall(extra_point_no_good_pattern, play)
    if len(extra_point_no_good) > 0:
      df_plays.loc[idx, 'PlayOutcome'] = 'Extra Point No Good'
      df_plays.loc[idx, 'PlayType'] = 'Extra Point'
      df_plays.loc[idx, 'Kicker'] = extra_point_no_good[0][0]
      df_plays.loc[idx, 'Direction'] = extra_point_no_good[0][1]
      df_plays.loc[idx, 'LongSnapper'] = extra_point_no_good[0][2]
      df_plays.loc[idx, 'Holder'] = extra_point_no_good[0][3]
      continue

  return df_plays
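
The cleaners above read regex results by position (`extra_point_good[0][0]`, `[0][1]`, ...), which is easy to get wrong when a pattern changes. One possible alternative is named groups, so each capture maps straight to a column name. The sketch below is an assumption-heavy illustration: `EXTRA_POINT_GOOD_EXAMPLE`, `parse_extra_point_good`, and the sample text with made-up names are hypothetical, and the notebook's real `extra_point_good_pattern` may look different.

```python
import re

# Illustrative pattern only; not the notebook's `extra_point_good_pattern`.
EXTRA_POINT_GOOD_EXAMPLE = re.compile(
    r"(?P<Kicker>[A-Z]\.[A-Za-z'\-]+) extra point is GOOD, "
    r"Center-(?P<LongSnapper>[A-Z]\.[A-Za-z'\-]+), "
    r"Holder-(?P<Holder>[A-Z]\.[A-Za-z'\-]+)"
)

def parse_extra_point_good(play):
    """Return a {column: value} dict for a made extra point, or None if no match."""
    match = EXTRA_POINT_GOOD_EXAMPLE.search(play)
    return match.groupdict() if match else None

# Made-up play description, for illustration only.
fields = parse_extra_point_good(
    "K.Kicker extra point is GOOD, Center-S.Snapper, Holder-H.Holder."
)
missed = parse_extra_point_good("K.Kicker extra point is No Good, Wide Left.")
```

Because the dictionary keys match the column names, the result could be written back in a loop such as `for column, value in fields.items(): df_plays.loc[idx, column] = value`.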

### OTHER CLEANING METHODS

#### FUMBLE PLAYS

In [None]:
# What about punt returns?
# Might need more data on 'Aborted' fumbled plays. Currently it does not show who fumbled the ball.

def clean_fumble_plays(df_plays, index_start=None):

  # Cut 'df_plays' to begin from 'index_start' to the last fumble play available in dataframe
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_fumble_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('fumble', case=False)]
  else:
    df_fumble_plays = df_plays[df_plays['PlayOutcome'].str.contains('fumble', case=False)]

  for idx, play in df_fumble_plays['PlayDescription'].items():

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############################
    # REPORTING IN AS ELIGIBLE #
    ############################

    # I do not think this contains any useful data so I am going to exclude it.
    if play.find('reported in as eligible') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find('reported in as eligible') != -1:
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    initial_action = play.split(". ")[0]

    ##################
    # PASSING FUMBLE #
    ##################

    fumble_pass = re.findall(receiver_pattern, initial_action)
    if len(fumble_pass) > 0:

      # creating a copy of the passing fumbled play row and cleaning the copy
      passing_fumble_row = df_plays.loc[idx].copy()
      passing_fumble_row['PlayOutcome'] = 'Pass'
      passing_fumble_row = pd.DataFrame([passing_fumble_row], columns=df_plays.columns)
      passing_fumble_row = clean_pass_plays(passing_fumble_row)

      # Record whether the pass was complete or incomplete.
      if play.find('pass incomplete') != -1:
        passing_fumble_row['PlayOutcome'] = f"{df_plays['PlayOutcome'].loc[idx]} (I)"
      else:
        passing_fumble_row['PlayOutcome'] = f"{df_plays['PlayOutcome'].loc[idx]} (C)"

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, passing_fumble_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_fumble_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_fumble_plays(df_plays, idx+len(passing_fumble_row))

    ##################
    # RUSHING FUMBLE #
    ##################

    fumble_rush = re.findall(rusher_pattern, initial_action)
    qb_fumble = re.findall(qb_fumble_pattern, initial_action)
    fumble_aborted = initial_action.find('Aborted')
    if len(fumble_rush) > 0 or fumble_aborted != -1 or len(qb_fumble) > 0:

      # creating a copy of the rushing fumbled play row and cleaning the copy
      rushing_fumble_row = df_plays.loc[idx].copy()
      rushing_fumble_row['PlayOutcome'] = 'Run'
      rushing_fumble_row = pd.DataFrame([rushing_fumble_row], columns=df_plays.columns)
      rushing_fumble_row = clean_run_plays(rushing_fumble_row)
      rushing_fumble_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, rushing_fumble_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_fumble_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_fumble_plays(df_plays, idx+len(rushing_fumble_row))

    #################
    # SACKED FUMBLE #
    #################

    if initial_action.find('sacked') != -1:

      # creating a copy of the sacked fumble play row and cleaning the copy
      sacked_fumble_row = df_plays.loc[idx].copy()
      sacked_fumble_row['PlayOutcome'] = 'Sack'
      sacked_fumble_row = pd.DataFrame([sacked_fumble_row], columns=df_plays.columns)
      sacked_fumble_row = clean_sacked_plays(sacked_fumble_row)
      sacked_fumble_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, sacked_fumble_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_fumble_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_fumble_plays(df_plays, idx+len(sacked_fumble_row))

    ##################
    # KICKOFF FUMBLE #
    ##################

    kickoff_fumble = re.findall(kickoff_pattern, initial_action)
    if len(kickoff_fumble) > 0:

      # creating a copy of the kickoff fumbled play row and cleaning the copy
      kickoff_fumble_row = df_plays.loc[idx].copy()
      kickoff_fumble_row['PlayOutcome'] = 'kickoff'
      kickoff_fumble_row = pd.DataFrame([kickoff_fumble_row], columns=df_plays.columns)
      kickoff_fumble_row = clean_kickoff_plays(kickoff_fumble_row)
      kickoff_fumble_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, kickoff_fumble_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_fumble_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_fumble_plays(df_plays, idx+len(kickoff_fumble_row))

    ###############
    # PUNT FUMBLE #
    ###############

    punt_fumble = re.findall(punting_pattern, initial_action)
    if len(punt_fumble) > 0:

      # creating a copy of the fumbled play row and cleaning the copy
      punt_fumble_row = df_plays.loc[idx].copy()
      punt_fumble_row['PlayOutcome'] = 'Punt'
      punt_fumble_row = pd.DataFrame([punt_fumble_row], columns=df_plays.columns)
      punt_fumble_row = clean_punt_plays(punt_fumble_row)
      punt_fumble_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, punt_fumble_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_fumble_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_fumble_plays(df_plays, idx+len(punt_fumble_row))

  return df_plays
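
Every branch above repeats the same "replace old row with cleaned row" step. A small helper could hold that logic in one place. This is a sketch under the assumption that `df_plays` has a unique index; `replace_row` is a hypothetical name and the toy frames are made up. It also shows why the cleaners restart after each replacement: `ignore_index=True` renumbers every row after the insertion point.

```python
import pandas as pd

def replace_row(df_plays, idx, cleaned_rows):
    """Replace the row labelled `idx` with `cleaned_rows` and renumber the index."""
    # Positional location of the label; assumes the index has no duplicates.
    position = df_plays.index.get_loc(idx)
    df_before = df_plays.iloc[:position]
    df_after = df_plays.iloc[position + 1:]
    return pd.concat([df_before, cleaned_rows, df_after], ignore_index=True)

# Toy data: one fumble play is split into two cleaned rows.
df = pd.DataFrame({"PlayOutcome": ["Run", "Fumble", "Pass"]})
two_rows = pd.DataFrame({"PlayOutcome": ["Fumble", "Fumble"]})
result = replace_row(df, 1, two_rows)
```

In `result`, the "Pass" play that used to sit at label 2 now sits at label 3, so any index values collected before the replacement no longer point at the same plays.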

#### PENALTY PLAYS

In [None]:
# This probably does not cover every possible penalty play.
# For example, in this sample of plays there are no penalties during kickoffs
# when penalties during kickoffs are 100% possible.

def clean_penalty_plays(df_plays, index_start=None):

  # Cut 'df_plays' to begin from 'index_start' to the last penalty play available in dataframe
  if index_start is not None:
    df_plays_adjusted = df_plays.iloc[df_plays.index.tolist().index(index_start):]
    df_penalty_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('penalty', case=False)]
  else:
    df_penalty_plays = df_plays[df_plays['PlayOutcome'].str.contains('penalty', case=False)]

  # Iterating through every penalty play within 'df_penalty_plays'
  for idx, play in df_penalty_plays['PlayDescription'].items():

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############################
    # REPORTING IN AS ELIGIBLE #
    ############################

    # I do not think this contains any useful data so I am going to exclude it.
    if play.find('reported in as eligible') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find('reported in as eligible') != -1:
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    initial_action = play.split(". ")[0]

    ###############################
    # PENALTY DURING PASSING PLAY #
    ###############################

    penalty_pass = re.findall(receiver_pattern, initial_action)
    if len(penalty_pass) > 0 or play.find('pass incomplete') != -1:

      # creating a copy of the passing penalty play row and cleaning the copy
      passing_penalty_row = df_plays.loc[idx].copy()
      passing_penalty_row['PlayOutcome'] = 'Pass'
      passing_penalty_row = pd.DataFrame([passing_penalty_row], columns=df_plays.columns)
      passing_penalty_row = clean_pass_plays(passing_penalty_row)
      passing_penalty_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
      passing_penalty_row['PlayType'] = 'No Play'

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, passing_penalty_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_penalty_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_penalty_plays(df_plays, idx+len(passing_penalty_row))

    ###############################
    # PENALTY DURING RUSHING PLAY #
    ###############################

    penalty_rush = re.findall(rusher_pattern, initial_action)
    if len(penalty_rush) > 0 or play.find('Aborted') != -1:

      # creating a copy of the rushing penalty play row and cleaning the copy
      rushing_penalty_row = df_plays.loc[idx].copy()
      rushing_penalty_row['PlayOutcome'] = 'Run'
      rushing_penalty_row = pd.DataFrame([rushing_penalty_row], columns=df_plays.columns)
      rushing_penalty_row = clean_run_plays(rushing_penalty_row)
      rushing_penalty_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
      rushing_penalty_row['PlayType'] = 'No Play'

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, rushing_penalty_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_penalty_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_penalty_plays(df_plays, idx+len(rushing_penalty_row))

    ######################################
    # PENALTY DURING 2PT CONVERSION PLAY #
    ######################################

    if play.find('TWO-POINT CONVERSION ATTEMPT') != -1:

      # creating a copy of the 2pt conversion penalty play row and cleaning the copy
      two_pt_conversion_penalty_row = df_plays.loc[idx].copy()
      two_pt_conversion_penalty_row['PlayOutcome'] = '2PT Conversion'
      two_pt_conversion_penalty_row = pd.DataFrame([two_pt_conversion_penalty_row], columns=df_plays.columns)
      two_pt_conversion_penalty_row = cleaning_2pt_conversion_plays(two_pt_conversion_penalty_row)
      two_pt_conversion_penalty_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
      two_pt_conversion_penalty_row['PlayType'] = 'No Play'

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, two_pt_conversion_penalty_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_penalty_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_penalty_plays(df_plays, idx+len(two_pt_conversion_penalty_row))

    ############################
    # PENALTY DURING SACK PLAY #
    ############################

    if initial_action.find('sacked') != -1:

      # creating a copy of the sacked penalty play row and cleaning the copy
      sacked_penalty_row = df_plays.loc[idx].copy()
      sacked_penalty_row['PlayOutcome'] = 'Sack'
      sacked_penalty_row = pd.DataFrame([sacked_penalty_row], columns=df_plays.columns)
      sacked_penalty_row = clean_sacked_plays(sacked_penalty_row)
      sacked_penalty_row['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]
      sacked_penalty_row['PlayType'] = 'No Play'

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, sacked_penalty_row, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_penalty_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_penalty_plays(df_plays, idx+len(sacked_penalty_row))

    #########################
    # PENALTY (False Start) #
    #########################

    # Will use 'clean_run_plays' method to clean these
    # All other penalty plays (e.g. False Start, Delay of Game, Offside, Neutral Zone Infraction, Too Many Men on Field, Encroachment, Taunting)

    # if play.find('False Start') != -1 or play.find('Delay of Game') != -1:

    # TimeOnTheClock
    TimeOnTheClock = re.findall(time_on_clock_pattern, play)
    if len(TimeOnTheClock) > 0:
      df_plays.loc[idx, 'TimeOnTheClock'] = TimeOnTheClock[0]

    # Formation
    Formation = re.findall(formation, play)
    if len(Formation) > 0:
      if Formation[0] == 'Aborted':
        pass
      else:
        df_plays.loc[idx, 'Formation'] = Formation[0]

    df_plays.at[idx, 'AcceptedPenalty'] = play
    df_plays.at[idx, 'PlayType'] = 'No Play'

  return df_plays
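
The "REVERSES" step is duplicated across the fumble, penalty, and turnover-on-downs cleaners. A minimal sketch of a shared helper follows; `split_reversed` is a hypothetical name and the sample description uses made-up players. It mirrors the logic above: everything up to and including the first sentence containing "REVERSED" becomes the reverse details, and the rest is what gets cleaned.

```python
def split_reversed(play):
    """Return (reverse_details, remaining_play) for a play description.

    reverse_details is the list of sentences up to and including the first one
    containing 'REVERSED'; remaining_play is the text that still needs cleaning.
    """
    sentences = play.split(". ")
    for position, sentence in enumerate(sentences):
        if "REVERSED" in sentence:
            return sentences[:position + 1], ". ".join(sentences[position + 1:])
    # No reversal: nothing to store, clean the whole description.
    return [], play

# Made-up play description, for illustration only.
sample = ("(5:10) Q.Passer pass short left to T.End to BBB 20 for 8 yards. "
          "The Replay Official reviewed the pass completion ruling, and the play was REVERSED. "
          "(5:10) Q.Passer pass incomplete short left to T.End.")
details, remaining = split_reversed(sample)
untouched_details, untouched_play = split_reversed("(1:00) R.Back up the middle for 3 yards.")
```

A cleaner could then store `details` in `ReverseDetails` when it is non-empty and continue with `remaining`.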

#### TURNOVER ON DOWNS

In [None]:
# Looks like either a pass / run / sack play

def clean_turnover_on_downs_plays(df_plays, index_start=None):

  # Cut 'df_plays' to begin from 'index_start' to the last turnover-on-downs play available in dataframe
  if index_start is not None:
    df_plays_adjusted = df_plays.loc[index_start:]
    df_turnover_on_downs_plays = df_plays_adjusted[df_plays_adjusted['PlayOutcome'].str.contains('Turnover on Downs', case=False)]
  else:
    df_turnover_on_downs_plays = df_plays[df_plays['PlayOutcome'].str.contains('Turnover on Downs', case=False)]

  # Iterating through every turnover-on-downs play within 'df_turnover_on_downs_plays'
  for idx, play in df_turnover_on_downs_plays['PlayDescription'].items():
    # print()
    # print(idx)
    # print(play)

    ############
    # REVERSES #
    ############

    # In 'PlayDescription' all information before the "reversed" sentence is not needed.
    # - All information before is stored within 'ReverseDetails' and the remaining is cleaned.
    if play.find('REVERSED') != -1:
      play_elements = play.split(". ")
      for i in play_elements:
        if i.find("REVERSED") != -1:
          df_plays.at[idx, 'ReverseDetails'] = play_elements[:play_elements.index(i) + 1]
          play = ". ".join(play_elements[play_elements.index(i) + 1:])
          break

    ############################
    # TURNOVER ON DOWNS (PASS) #
    ############################

    passing_play = re.findall(passer_name_pattern, play)
    if len(passing_play) > 0 and play.find("sacked") == -1:

      passing_turnover_on_downs = df_plays.loc[idx].copy()
      passing_turnover_on_downs['PlayOutcome'] = 'Pass'
      passing_turnover_on_downs = pd.DataFrame([passing_turnover_on_downs], columns=df_plays.columns)
      passing_turnover_on_downs = clean_pass_plays(passing_turnover_on_downs)

      # Record whether the pass was complete or incomplete.
      if play.find('pass incomplete') != -1:
        passing_turnover_on_downs['PlayOutcome'] = f"{df_plays['PlayOutcome'].loc[idx]} (I)"
      else:
        passing_turnover_on_downs['PlayOutcome'] = f"{df_plays['PlayOutcome'].loc[idx]} (C)"

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, passing_turnover_on_downs, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_turnover_on_downs_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_turnover_on_downs_plays(df_plays, idx+len(passing_turnover_on_downs))

    ############################
    # TURNOVER ON DOWNS (RUSH) #
    ############################

    rushing_play = re.findall(rusher_pattern, play)
    if len(rushing_play) > 0:

      rushing_turnover_on_downs = df_plays.loc[idx].copy()
      rushing_turnover_on_downs['PlayOutcome'] = 'Run'
      rushing_turnover_on_downs = pd.DataFrame([rushing_turnover_on_downs], columns=df_plays.columns)
      rushing_turnover_on_downs = clean_run_plays(rushing_turnover_on_downs)
      rushing_turnover_on_downs['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, rushing_turnover_on_downs, df_after_row], ignore_index=True)

      if df_turnover_on_downs_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_turnover_on_downs_plays(df_plays, idx+len(rushing_turnover_on_downs))

    ##############################
    # TURNOVER ON DOWNS (SACKED) #
    ##############################

    if play.find("sacked") != -1:

      sacked_turnover_on_downs = df_plays.loc[idx].copy()
      sacked_turnover_on_downs['PlayOutcome'] = 'Sack'
      sacked_turnover_on_downs = pd.DataFrame([sacked_turnover_on_downs], columns=df_plays.columns)
      sacked_turnover_on_downs.reset_index(drop=True, inplace=True)
      sacked_turnover_on_downs = clean_sacked_plays(sacked_turnover_on_downs)
      sacked_turnover_on_downs['PlayOutcome'] = df_plays['PlayOutcome'].loc[idx]

      # Replacing old row with cleaned row
      df_before_row = df_plays.iloc[:df_plays.index.tolist().index(idx)]
      df_after_row = df_plays.iloc[df_plays.index.tolist().index(idx)+1:]
      df_plays = pd.concat([df_before_row, sacked_turnover_on_downs, df_after_row], ignore_index=True)

      # Recursion to update 'df_plays'
      if df_turnover_on_downs_plays.tail(1).index.tolist()[0] == idx:
        return df_plays
      else:
        return clean_turnover_on_downs_plays(df_plays, idx+len(sacked_turnover_on_downs))

  return df_plays

## 4. PIPELINE MAIN METHOD

In [None]:
# PURPOSE:
# - Accept a dataframe of plays (dataframes formatted by NFL_Scrapers) and
#   return a cleaned dataframe of those plays.
# INPUT PARAMETERS:
# df_all_plays         - dataframe - all plays in raw form from NFL_Scraper that user
#                                    would like to clean.
# OUTPUT:
# df_all_plays_cleaned - dataframe - all plays from 'df_all_plays' cleaned and data
#                                    dispersed into individual new features.

# CURRENT DESIGN PLAN:
# 1. Use uniquely designed methods for each play type to clean within dataframe
#    - (e.g. pass, run, touchdown, punt, sack, ... )
# 2. Repeat until all plays within dataframe have been cleaned.
#   NOTE:
#   - It is important to fully clean a play type before moving to the next
#      because sometimes cleaning could involve adding a new row to the dataframe,
#      causing a reset of the dataframe's indexing.
#      - If we were to separate all play types from the beginning, the indexes
#        could shift around causing, for example, an index that might originally
#        point to a run play to now instead point at a pass play.

def clean_dataframe_of_plays(df_all_plays):

  ################################
  # RAW DATA COLUMN DESCRIPTIONS #
  ################################
  # Season             - Year of the season
  # Week               - Game week of the season (e.g. 'Week 1')
  # Day                - Day of the week (e.g. 'MON')
  # Date               - Month and day of the game formatted MM/DD (e.g. '09/07')
  # AwayTeam           - Visiting team of the game
  # HomeTeam           - Home team of the game
  # Quarter            - Quarter that the focused play is in
  # DriveNumber        - Drive number that the focused play is in
  # TeamWithPossession - Team with the ball during the play.
  #                      - Can have multiple teams with possession during a single play.
  #                        - Some plays are broken down into multiple rows such as fumbled plays.
  # IsScoringDrive     - Does the drive that the focused play is in result in a score?
  # PlayOutcome        - Ultimate result of the play (e.g. '13 Yard Pass')
  # PlayDescription    - The raw description given of the focused play, entailing everything
  #                      that happened within it.
  # PlayStart          - The down and where the play started on the field (e.g. '2nd & 9 at DET 21')

  ###########################
  # NEW COLUMN DESCRIPTIONS #
  ###########################

  # PlayType           - The type of play (e.g. pass/run)
  # TimeOnTheClock     - The time that was on the clock when the play started
  # Formation          - Play formation
  # Passer             - Player that threw the ball (mostly the quarterback)
  # Rusher             - Player that ran the ball (mostly the runningback)
  # Receiver           - Player on the same team as the passer that caught the ball
  # Direction          - Where the ball is going during the play
  # Yardage            - Yards gained during the play
  #                      - (Should specify that yardage does not include extra yardage gained from penalties)
  #                      - (Player awarded yardage)
  #                      - (also includes how far kicks have gone during kickoffs and punts)
  # SoloTackle         - Player awarded a solo tackle from a play
  # AssistedTackle     - Player awarded an assisted tackle from a play
  # SharedTackle       - Player awarded a shared tackle from a play
  # PassDefendedBy     - Defender that defended the passing play
  # PressureBy         - Defender that applied pressure to the passer
  # InterceptedBy      - Defender that intercepted the passing play
  # SackedBy           - Player awarded a sack from a play. (Could be solo or split)
  # ForcedFumbleBy     - Player awarded a forced fumble from a play
  # WhoFumbled         - Player who last held the ball during a fumble.
  # FumbleRecoveredBy  - Player who recovered the fumbled ball
  # FumbleDetails      - A list that has what happened after the fumble
  #                      - [forced fumble by, recovered by, yards gained, tackled by]
  # ReverseDetails     - A list of what happened on the play leading up to the play reversal
  # InjuredPlayers     - Players that were injured during the play
  # AcceptedPenalty    - Penalty on the field that was accepted
  # DeclinedPenalty    - Penalty on the field that was declined
  # Kicker             - Player who kicked the ball during a kickoff / punt / extra point / field goal
  # LongSnapper        - Player who snapped the ball during a punt / extra point / field goal
  # Returner           - Player who returned the ball during a kickoff / punt
  # DownedBy           - Presumably the player who downed the ball on a punt / kickoff ('downed by' in the description)
  # Holder             - Player who held ball for extra point / field goal
  # BlockedBy          - Player who blocked a punt / extra point / field goal

  new_columns = ["PlayType", "TimeOnTheClock", "Formation", "Passer", "Rusher", "Receiver", "Direction", "Yardage",
                "SoloTackle", "AssistedTackle", "SharedTackle", 'PassDefendedBy', "PressureBy", "InterceptedBy", "SackedBy", "ForcedFumbleBy", "WhoFumbled", "FumbleRecoveredBy",
                "FumbleDetails", "ReverseDetails",
                "InjuredPlayers", "AcceptedPenalty", "DeclinedPenalty",
                "Kicker", "LongSnapper", "Returner", "DownedBy", "Holder", "BlockedBy"]

  string_columns = ["PlayType", "TimeOnTheClock", "Formation", "Passer", "Rusher", "Receiver", "Direction",
                    "SoloTackle", "AssistedTackle", "SharedTackle", 'PassDefendedBy', "PressureBy", "InterceptedBy", "SackedBy", "ForcedFumbleBy", "WhoFumbled", "FumbleRecoveredBy",
                    "FumbleDetails", "ReverseDetails",
                    "InjuredPlayers", "AcceptedPenalty", "DeclinedPenalty",
                    "Kicker", "LongSnapper", "Returner", "DownedBy", "Holder", "BlockedBy"]

  int_columns = ["Yardage"]

  ########################################
  # RETURN DATAFRAME WITH ADDED FEATURES #
  ########################################

  df_all_plays_cleaned = df_all_plays.copy()
  df_all_plays_cleaned = df_all_plays_cleaned.reindex(columns=df_all_plays_cleaned.columns.tolist() + new_columns)
  df_all_plays_cleaned[string_columns] = df_all_plays_cleaned[string_columns].astype(str)
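  # The new Yardage column starts out as NaN, and NaN cannot be cast to int, so it is stored as float
  df_all_plays_cleaned[int_columns] = df_all_plays_cleaned[int_columns].astype(float)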

  ########################################
  # GETTING PLAY CATEGORIES AND CLEANING #
  ########################################

  # TOUCHDOWNS MUST BE CLEANED FIRST
  # - For any touchdown resulting from a change in possession (e.g. Interception for Touchdown),
  #   the raw data states that the team on defense had possession for the entire drive.
  #   - So all plays leading up to the touchdown state that the defense has possession.
  df_all_plays_cleaned = clean_touchdown_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_run_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_pass_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = cleaning_2pt_conversion_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_intercepted_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_sacked_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_punt_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_kickoff_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_field_goal_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_extra_point_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_fumble_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_penalty_plays(df_all_plays_cleaned)
  df_all_plays_cleaned = clean_turnover_on_downs_plays(df_all_plays_cleaned)

  return df_all_plays_cleaned
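The new columns are created by `reindex`, so they start out entirely missing, which is why `Yardage` is cast to `float` rather than `int`. Here is a minimal sketch on a toy frame. The data is made up, and the nullable `Int64` dtype is shown only as a possible alternative; it is an assumption, not something the function above uses.

```python
import pandas as pd

# Toy frame standing in for df_all_plays (values are made up for illustration)
df_toy = pd.DataFrame({"PlayOutcome": ["13 Yard Pass", "Punt"]})

# reindex appends the new column; every value in it starts out missing
df_toy = df_toy.reindex(columns=df_toy.columns.tolist() + ["Yardage"])
df_toy["Yardage"] = df_toy["Yardage"].astype(float)
print(df_toy["Yardage"].isna().all())  # True

# Possible alternative (assumption): the nullable integer dtype keeps
# whole-number yardage while still allowing missing values
yardage_nullable = df_toy["Yardage"].astype("Int64")
yardage_nullable.iloc[0] = 13
print(yardage_nullable.tolist())
```

Whether the nullable dtype is worth adopting depends on how the cleaning functions write into `Yardage`; the float column works either way.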

# TESTING (Helper Methods)

In [None]:
# PURPOSE:
# - A tool that can be used to compare original plays and their cleaned versions

# I would like to return a map that has:
# KEY: index of original unclean play
# VALUE: index(es) of cleaned play

def unclean_clean_matches(df_unclean_plays, df_clean_plays):

  my_map = {}

  # This group of features is unique to each play
  # - Both the unclean and cleaned versions of the plays have these
  # - These features will be used to find the matching plays between the unclean df and the cleaned df
  # matching_features = ['Season', 'Week', 'Date', 'AwayTeam', 'HomeTeam', 'Quarter', 'DriveNumber', 'TeamWithPossession', 'PlayNumberInDrive']
  matching_features = ['Season', 'Week', 'Date', 'AwayTeam', 'HomeTeam', 'Quarter', 'DriveNumber', 'PlayNumberInDrive']

  # Iterate through each row of the dataframe of unclean plays
  for u_row in df_unclean_plays.itertuples(index=True):
    u_features = [getattr(u_row, col) for col in matching_features]

    matching_indexes = []
    matches_found = False

    # Iterate through each row of the dataframe of cleaned plays
    # - The starting position is the index of the unclean play within the main original dataframe of plays
    #   (this assumes both dataframes use the default 0..n-1 index, so labels and positions coincide)
    #   - The matching cleaned rows will either be at the exact same location or higher
    for c_row in df_clean_plays[u_row.Index:].itertuples(index=True):
      c_features = [getattr(c_row, col) for col in matching_features]

      # If a match is found, check for consecutive matching rows, because some unclean plays are cleaned into multiple rows
      # - Once a non-matching row follows a matching one, break out of the loop: the full match has been found.
      if u_features == c_features:
        matching_indexes.append(c_row.Index)
        matches_found = True
      elif matches_found:
        break

    # Record the match after the loop, so a play whose cleaned rows run to the end of the dataframe is not dropped
    if matches_found:
      my_map[u_row.Index] = matching_indexes

  return my_map
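The nested loop above rescans the cleaned dataframe for every unclean play. As a possible alternative, the cleaned rows can be grouped once by the matching features and then looked up per play. The sketch below uses toy data and a hypothetical helper name (`match_by_keys`). Unlike the function above, it collects every cleaned row with the same key, not only consecutive ones, so the two agree only when the matching features really are unique per play.

```python
import pandas as pd

def match_by_keys(df_unclean, df_clean, keys):
    # Group the cleaned rows once: key tuple -> index labels of cleaned rows with that key
    clean_groups = df_clean.groupby(keys).groups
    matches = {}
    for row in df_unclean[keys].itertuples(index=True):
        key = tuple(row[1:])  # row[0] is the index label
        if key in clean_groups:
            matches[row.Index] = clean_groups[key].tolist()
    return matches

# Toy data: the second unclean play was split into two cleaned rows
keys = ["DriveNumber", "PlayNumberInDrive"]
df_unclean_toy = pd.DataFrame({"DriveNumber": [1, 1, 2], "PlayNumberInDrive": [1, 2, 1]})
df_clean_toy = pd.DataFrame({"DriveNumber": [1, 1, 1, 2], "PlayNumberInDrive": [1, 2, 2, 1]})

print(match_by_keys(df_unclean_toy, df_clean_toy, keys))
# {0: [0], 1: [1, 2], 2: [3]}
```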

# TESTING AREA

In [None]:
df_week2_plays_cleaned = clean_dataframe_of_plays(week2_2023_plays_modified)

In [None]:
df_week2_plays_cleaned.shape

(2735, 44)

## PLAYTYPE OBSERVATIONS
- Looking at each play from each playtype

### Passing plays

In [None]:
# Number of passing type plays during 2023, Week 2

df_unclean_pass_plays = week2_2023_plays_modified.loc[week2_2023_plays_modified['PlayOutcome'].str.contains('Pass')]

map_passing_plays = unclean_clean_matches(df_unclean_pass_plays, df_week2_plays_cleaned)

len(map_passing_plays.keys())

1037

In [None]:
# Every unclean passing play and their associated cleaned play breakdown

for i in map_passing_plays.keys():
  print(f"({i}, {map_passing_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

(0, [0])
(12:21) (Shotgun) J.Dobbs pass incomplete short left to M.Brown.

(2, [2])
(3:27) (Shotgun) J.Dobbs pass short middle to Z.Ertz to ARI 42 for 17 yards (X.McKinney).

(3, [3])
(12:27) J.Dobbs pass incomplete deep left to Z.Ertz.

(5, [5])
(14:58) (Shotgun) J.Dobbs pass incomplete deep right to M.Brown.

(7, [7])
(1:04) (Shotgun) J.Dobbs pass deep middle to Mi.Wilson to NYG 43 for 16 yards (X.McKinney) [K.Thibodeaux]
Penalty on NYG-D.Banks, Illegal Contact, declined.

(13, [13])
(1:22) (Shotgun) J.Dobbs pass incomplete short left to Z.Ertz [M.McFadden].

(14, [14])
(2:13) (Shotgun) J.Dobbs pass short right to T.McBride pushed ob at NYG 43 for 16 yards (C.Basham)
PENALTY on ARI-M.Brown, Illegal Block Above the Waist, 10 yards, enforced at 50.

(15, [15])
(1:45) (No Huddle, Shotgun) J.Dobbs pass short left to M.Brown to ARI 41 for 1 yard (B.Okereke).

(20, [20])
(14:24) J.Dobbs pass short right to Z.Ertz to ARI 47 for 5 yards (X.McKinney).

(21, [21])
(15:00) (Shotgun) J.Dobbs pas
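The same print loop is repeated in most of the cells that follow. A minimal sketch of how it could be factored into one helper is below. The function name `play_breakdown_lines` is hypothetical, the second toy description is reassembled from the output above, and the helper looks rows up by label with `.loc` where the cells use `.iloc`.

```python
import pandas as pd

def play_breakdown_lines(play_map, df_unclean):
    """Return the lines the print loop would emit for every entry in play_map."""
    lines = []
    for unclean_idx, clean_idxs in play_map.items():
        lines.append(f"({unclean_idx}, {clean_idxs})")
        description = df_unclean.loc[unclean_idx, "PlayDescription"]
        lines.extend(description.split(". "))
        lines.append("")
    return lines

# Toy frame with two descriptions taken from the output above
df_toy = pd.DataFrame({"PlayDescription": [
    "(12:21) (Shotgun) J.Dobbs pass incomplete short left to M.Brown.",
    "(1:04) (Shotgun) J.Dobbs pass deep middle to Mi.Wilson to NYG 43 for 16 yards "
    "(X.McKinney) [K.Thibodeaux]. Penalty on NYG-D.Banks, Illegal Contact, declined.",
]})

for line in play_breakdown_lines({0: [0], 1: [1]}, df_toy):
    print(line)
```

Returning the lines instead of printing them directly also makes the breakdown easy to compare against expected text in a test.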

In [None]:
# passing type plays during 2023, Week 2 that have been spiked

df_unclean_pass_plays_spiked = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('Pass')) &
                                                             (week2_2023_plays_modified['PlayDescription'].str.contains('spiked', case=False))]

map_passing_spiked_plays = unclean_clean_matches(df_unclean_pass_plays_spiked, df_week2_plays_cleaned)

for i in map_passing_spiked_plays.keys():
  print(f"({i}, {map_passing_spiked_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# passing type plays during 2023, Week 2 that result in touchdown

df_unclean_pass_plays_touchdown = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('touchdown', case=False)) &
                                                                (week2_2023_plays_modified['PlayDescription'].str.contains('pass', case=False))]

map_passing_touchdown_plays = unclean_clean_matches(df_unclean_pass_plays_touchdown, df_week2_plays_cleaned)

for i in map_passing_touchdown_plays.keys():
  print(f"({i}, {map_passing_touchdown_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# touchdown plays during 2023, Week 2 that include a penalty

df_unclean_pass_plays_touchdown = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('touchdown', case=False)) &
                                                                (week2_2023_plays_modified['PlayDescription'].str.contains('PENALTY', case=False))]

map_passing_touchdown_plays = unclean_clean_matches(df_unclean_pass_plays_touchdown, df_week2_plays_cleaned)

for i in map_passing_touchdown_plays.keys():
  print(f"({i}, {map_passing_touchdown_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# every passing play that resulted in a fumble (including fumble recoveries resulting in a touchdown)
# - Touchdown plays are picked up through PlayOutcome, with the pass itself found in PlayDescription

df_unclean_pass_fumble_plays = week2_2023_plays_modified.loc[((week2_2023_plays_modified['PlayOutcome'].str.contains('Pass')) |
                                                             ((week2_2023_plays_modified['PlayOutcome'].str.contains('Touchdown', case=False)) &
                                                              (week2_2023_plays_modified['PlayDescription'].str.contains('pass', case=False)))) &
                                                              (week2_2023_plays_modified['PlayDescription'].str.contains('fumbles', case=False))]

for i in unclean_clean_matches(df_unclean_pass_fumble_plays, df_week2_plays_cleaned).items():
  print(i)

In [None]:
dict_unclean_to_clean_pass_fumble_plays = unclean_clean_matches(df_unclean_pass_fumble_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_pass_fumble_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_pass_fumble_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All passing plays with accepted penalties
# - The match on 'PENALTY' is case-sensitive on purpose: in the descriptions above, enforced penalties
#   are written 'PENALTY on ...' while declined ones are written 'Penalty on ...'

df_unclean_pass_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('Pass')) &
                                                      (week2_2023_plays_modified['PlayDescription'].str.contains('PENALTY'))]

map_passing_penalty_plays = unclean_clean_matches(df_unclean_pass_plays, df_week2_plays_cleaned)

for i in map_passing_penalty_plays.keys():
  print(f"({i}, {map_passing_penalty_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()
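The accepted-penalty filter relies on `str.contains` being case-sensitive by default. A small sketch using the two penalty sentences that appear in the passing-play output above (that uppercase `PENALTY` always marks an enforced penalty is an inference from those two plays, not a documented rule):

```python
import pandas as pd

# The two penalty sentences come from the passing-play output shown earlier
descriptions = pd.Series([
    "Penalty on NYG-D.Banks, Illegal Contact, declined.",
    "PENALTY on ARI-M.Brown, Illegal Block Above the Waist, 10 yards, enforced at 50.",
])

# Case-sensitive (default): only the uppercase, enforced penalty matches
accepted_only = descriptions.str.contains("PENALTY")
# Case-insensitive: both the declined and the enforced penalty match
any_penalty = descriptions.str.contains("penalty", case=False)

print(accepted_only.tolist())  # [False, True]
print(any_penalty.tolist())    # [True, True]
```

This is why the cells that pass `case=False` when searching for "penalty" return declined penalties as well.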

In [None]:
# All passing plays with a lateral

df_unclean_pass_lateral_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('Pass')) &
                                                              (week2_2023_plays_modified['PlayDescription'].str.contains('lateral', case=False))]

map_passing_lateral_plays = unclean_clean_matches(df_unclean_pass_lateral_plays, df_week2_plays_cleaned)

for i in map_passing_lateral_plays.keys():
  print(f"({i}, {map_passing_lateral_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### Rushing plays

In [None]:
# Number of running type plays during 2023, Week 2

df_unclean_run_plays = week2_2023_plays_modified.loc[week2_2023_plays_modified['PlayOutcome'].str.contains('Run')]

map_run_plays = unclean_clean_matches(df_unclean_run_plays, df_week2_plays_cleaned)

len(map_run_plays.keys())

In [None]:
# Every unclean running play and their associated cleaned play breakdown

for i in map_run_plays.keys():
  print(f"({i}, {map_run_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# penalty rushing plays

df_unclean_rush_penalty_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('Run')) &
                                                              (week2_2023_plays_modified['PlayDescription'].str.contains('penalty', case=False))]

dict_unclean_rush_penalty_plays = unclean_clean_matches(df_unclean_rush_penalty_plays, df_week2_plays_cleaned)

for i in dict_unclean_rush_penalty_plays.keys():
  print(f"({i}, {dict_unclean_rush_penalty_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# fumbled rushing plays (not including touchdowns)

df_unclean_rush_fumble_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('Run')) &
                                                             (week2_2023_plays_modified['PlayDescription'].str.contains('fumbles', case=False))]

for i in unclean_clean_matches(df_unclean_rush_fumble_plays, df_week2_plays_cleaned).items():
  print(i)

In [None]:
dict_unclean_to_clean_rush_fumble_plays = unclean_clean_matches(df_unclean_rush_fumble_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_rush_fumble_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_rush_fumble_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All rushing touchdowns

# Start from every touchdown play, then keep only the ones whose description matches the rusher pattern
df_unclean_pass_plays_touchdown = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('touchdown', case=False))]

list_all_touchdown_rushing_plays = []

for idx, play in df_unclean_pass_plays_touchdown['PlayDescription'].items():
  run_play = re.findall(rusher_pattern, play)
  if len(run_play) > 0:
    list_all_touchdown_rushing_plays.append(idx)

map_rushing_touchdown_plays = unclean_clean_matches(week2_2023_plays_modified.loc[list_all_touchdown_rushing_plays], df_week2_plays_cleaned)

for i in map_rushing_touchdown_plays.keys():
  print(f"({i}, {map_rushing_touchdown_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# lateral rushing plays

df_unclean_rush_lateral_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('Run')) &
                                                              (week2_2023_plays_modified['PlayDescription'].str.contains('lateral', case=False))]

dict_unclean_rush_lateral_plays = unclean_clean_matches(df_unclean_rush_lateral_plays, df_week2_plays_cleaned)

for i in dict_unclean_rush_lateral_plays.keys():
  print(f"({i}, {dict_unclean_rush_lateral_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### 2pt Conversions

In [None]:
# All 2PT conversion plays

df_unclean_2pt_conversion_week1 = week2_2023_plays_modified[week2_2023_plays_modified['PlayOutcome'].str.contains('2PT Conversion')]

dict_unclean_to_clean_2ptc = unclean_clean_matches(df_unclean_2pt_conversion_week1, df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_2ptc)} 2pt conversion attempts")
print("\n\n")
for i in dict_unclean_to_clean_2ptc.keys():
  print(f"({i}, {dict_unclean_to_clean_2ptc.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All passing 2PT conversion attempts

index_pass_2ptc = []

for i in list(df_unclean_2pt_conversion_week1.index):
  pass_2ptc = re.findall(tp_conversion_pass_pattern, week2_2023_plays_modified['PlayDescription'].iloc[i])
  if len(pass_2ptc) > 0:
    index_pass_2ptc.append(i)

dict_unclean_to_clean_pass_2ptc = unclean_clean_matches(week2_2023_plays_modified.iloc[index_pass_2ptc], df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_pass_2ptc)} 2pt conversion pass attempts")
print("\n\n")
for i in dict_unclean_to_clean_pass_2ptc.keys():
  print(f"({i}, {dict_unclean_to_clean_pass_2ptc.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All rushing 2PT conversion attempts

index_rush_2ptc = []

for i in list(df_unclean_2pt_conversion_week1.index):
  rush_2ptc = re.findall(tp_conversion_rush_pattern, week2_2023_plays_modified['PlayDescription'].iloc[i])
  if len(rush_2ptc) > 0:
    index_rush_2ptc.append(i)

dict_unclean_to_clean_rush_2ptc = unclean_clean_matches(week2_2023_plays_modified.iloc[index_rush_2ptc], df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_rush_2ptc)} 2pt conversion rush attempts")
print("\n\n")
for i in dict_unclean_to_clean_rush_2ptc.keys():
  print(f"({i}, {dict_unclean_to_clean_rush_2ptc.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### Intercepted plays

In [None]:
# All intercepted plays

df_unclean_intercepted_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayDescription'].str.contains('INTERCEPTED', case=False)) |
                                                             (week2_2023_plays_modified['PlayOutcome'].str.contains('Interception', case=False))]

dict_unclean_to_clean_intercepted_plays = unclean_clean_matches(df_unclean_intercepted_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_intercepted_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_intercepted_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All interceptions with a penalty

df_unclean_intercepted_penalty_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayDescription'].str.contains('INTERCEPTED', case=False)) &
                                                                    (week2_2023_plays_modified['PlayDescription'].str.contains('PENALTY', case=False))]

dict_unclean_to_clean_intercepted_penalty_plays = unclean_clean_matches(df_unclean_intercepted_penalty_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_intercepted_penalty_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_intercepted_penalty_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All interceptions resulting in a touchdown

df_unclean_intercepted_touchdown_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayDescription'].str.contains('INTERCEPTED', case=False)) &
                                                                       (week2_2023_plays_modified['PlayOutcome'].str.contains('touchdown', case=False))]

dict_unclean_to_clean_intercepted_touchdown_plays = unclean_clean_matches(df_unclean_intercepted_touchdown_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_intercepted_touchdown_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_intercepted_touchdown_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### Sacked Plays

In [None]:
# All sacked plays

df_unclean_sacked_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('Sack', case=False))]

dict_unclean_to_clean_sacked_plays = unclean_clean_matches(df_unclean_sacked_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_sacked_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_sacked_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All sacked plays resulting in a touchdown

df_unclean_sacked_touchdown_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('touchdown', case=False)) &
                                                                  (week2_2023_plays_modified['PlayDescription'].str.contains('sack', case=False))]

dict_unclean_to_clean_sacked_touchdown_plays = unclean_clean_matches(df_unclean_sacked_touchdown_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_sacked_touchdown_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_sacked_touchdown_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All sacked plays resulting in a fumble

df_unclean_sacked_fumble_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayDescription'].str.contains('sack', case=False)) &
                                                               (week2_2023_plays_modified['PlayDescription'].str.contains('fumbles', case=False))]

dict_unclean_to_clean_sacked_fumble_plays = unclean_clean_matches(df_unclean_sacked_fumble_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_sacked_fumble_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_sacked_fumble_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All sacked plays with a penalty

df_unclean_sacked_penalty_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayDescription'].str.contains('sack', case=False)) &
                                                                (week2_2023_plays_modified['PlayDescription'].str.contains('penalty', case=False))]

dict_unclean_to_clean_sacked_penalty_plays = unclean_clean_matches(df_unclean_sacked_penalty_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_sacked_penalty_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_sacked_penalty_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### Punt Plays

In [None]:
df_unclean_punt_plays = week2_2023_plays_modified.loc[week2_2023_plays_modified['PlayDescription'].str.contains('punts', case=False)]

dict_unclean_to_clean_punt_plays = unclean_clean_matches(df_unclean_punt_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_punt_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_punt_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# Punts that have penalties

df_punt_penalty_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayDescription'].str.contains('punts', case=False)) &
                                                      (week2_2023_plays_modified['PlayDescription'].str.contains('penalty', case=False))]

dict_unclean_to_clean_punt_penalty_plays = unclean_clean_matches(df_punt_penalty_plays, df_week2_plays_cleaned)

# Only print the punt penalty plays whose description also matches the punt return pattern
for i in dict_unclean_to_clean_punt_penalty_plays.keys():
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_check = re.findall(punt_return_pattern, play)
  if len(play_check) > 0:
    print(f"({i}, {dict_unclean_to_clean_punt_penalty_plays.get(i)})")
    play_split = play.split(". ")
    for j in play_split:
      print(j)
    print()

In [None]:
# All punt return touchdown plays

df_unclean_punt_touchdown_plays = week2_2023_plays_modified.loc[week2_2023_plays_modified['PlayDescription'].str.contains('punts', case=False) &
                                                                week2_2023_plays_modified['PlayDescription'].str.contains('touchdown', case=False)]

dict_unclean_to_clean_punt_touchdown_plays = unclean_clean_matches(df_unclean_punt_touchdown_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_punt_touchdown_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_punt_touchdown_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### Kickoffs

In [None]:
# All kickoff plays

df_unclean_kickoff_plays = week2_2023_plays_modified.loc[week2_2023_plays_modified['PlayOutcome'].str.contains('kickoff', case=False)]

dict_unclean_to_clean_kickoff_plays = unclean_clean_matches(df_unclean_kickoff_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_kickoff_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_kickoff_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All onside kicks

df_unclean_kickoff_onside_plays = week2_2023_plays_modified.loc[week2_2023_plays_modified['PlayOutcome'].str.contains('kickoff', case=False) &
                                                                week2_2023_plays_modified['PlayDescription'].str.contains('onside', case=False)]

dict_unclean_to_clean_kickoff_onside_plays = unclean_clean_matches(df_unclean_kickoff_onside_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_kickoff_onside_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_kickoff_onside_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All kickoff fumble plays

df_unclean_kickoff_fumble_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayDescription'].str.contains('kicks', case=False)) &
                                                                (week2_2023_plays_modified['PlayDescription'].str.contains('fumble', case=False))]

dict_unclean_to_clean_kickoff_fumble_plays = unclean_clean_matches(df_unclean_kickoff_fumble_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_kickoff_fumble_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_kickoff_fumble_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### Touchdown plays

In [None]:
# All touchdown plays

df_unclean_touchdown_plays = week2_2023_plays_modified.loc[week2_2023_plays_modified['PlayOutcome'].str.contains('touchdown', case=False)]

dict_unclean_to_clean_touchdown_plays = unclean_clean_matches(df_unclean_touchdown_plays, df_week2_plays_cleaned)

for i in dict_unclean_to_clean_touchdown_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_touchdown_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### Field goals

In [None]:
# All field goal plays

df_unclean_fieldgoal_week1 = week2_2023_plays_modified[week2_2023_plays_modified['PlayOutcome'].str.contains('Field Goal')]

dict_unclean_to_clean_field_goal_plays = unclean_clean_matches(df_unclean_fieldgoal_week1, df_week2_plays_cleaned)

# Number of field goal plays
print(f"{len(dict_unclean_to_clean_field_goal_plays)} field goal plays")
print("\n\n")
for i in dict_unclean_to_clean_field_goal_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_field_goal_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All field goal plays (good)

made_field_goal_play_indexes = []

# df_unclean_fieldgoal_week1 is the dataframe of field goal plays built in the "All field goal plays" cell
for i in list(df_unclean_fieldgoal_week1.index):
  made_field_goal = re.findall(field_goal_good_pattern, week2_2023_plays_modified['PlayDescription'].iloc[i])
  if len(made_field_goal) > 0:
    made_field_goal_play_indexes.append(i)

dict_unclean_to_clean_good_field_goals = unclean_clean_matches(week2_2023_plays_modified.iloc[made_field_goal_play_indexes], df_week2_plays_cleaned)

# Number of good field goal plays
print(f"{len(dict_unclean_to_clean_good_field_goals)} good field goal plays")
print("\n\n")
for i in dict_unclean_to_clean_good_field_goals.keys():
  print(f"({i}, {dict_unclean_to_clean_good_field_goals.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()
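The loop over `re.findall` can also be written as a single vectorized `str.contains` call that yields the matching index labels directly. This is only a sketch: `good_pattern_sketch` is a hypothetical stand-in for `field_goal_good_pattern` (which is defined elsewhere in the notebook), and the two descriptions and player names are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for field_goal_good_pattern; the real pattern may differ
good_pattern_sketch = r"field goal is GOOD"

# Invented descriptions with a non-default index, to show that labels are returned
df_toy = pd.DataFrame({"PlayDescription": [
    "(0:03) A.Kicker 45 yard field goal is GOOD, Center-B.Snapper, Holder-C.Holder.",
    "(2:10) A.Kicker 52 yard field goal is No Good, Wide Right, Center-B.Snapper, Holder-C.Holder.",
]}, index=[10, 25])

# Boolean mask of rows whose description matches the pattern (case-sensitive)
mask = df_toy["PlayDescription"].str.contains(good_pattern_sketch, regex=True)
made_indexes = df_toy.index[mask].tolist()

print(made_indexes)  # [10]
```

Because the result is a list of index labels, it pairs naturally with `.loc` rather than `.iloc` when selecting the rows afterwards.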

In [None]:
# All field goal plays (no good)

no_good_field_goal_play_indexes = []

# df_unclean_fieldgoal_week1 is the dataframe of field goal plays built in the "All field goal plays" cell
for i in list(df_unclean_fieldgoal_week1.index):
  no_good_field_goal = re.findall(field_goal_no_good_pattern, week2_2023_plays_modified['PlayDescription'].iloc[i])
  if len(no_good_field_goal) > 0:
    no_good_field_goal_play_indexes.append(i)

dict_unclean_to_clean_no_good_field_goals = unclean_clean_matches(week2_2023_plays_modified.iloc[no_good_field_goal_play_indexes], df_week2_plays_cleaned)

# Number of no good field goal plays
print(f"{len(dict_unclean_to_clean_no_good_field_goals)} no good field goal plays")
print("\n\n")
for i in dict_unclean_to_clean_no_good_field_goals.keys():
  print(f"({i}, {dict_unclean_to_clean_no_good_field_goals.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All field goal plays (special)

# "Special" field goal plays are the ones that matched neither the good nor the no good pattern
already_classified = set(made_field_goal_play_indexes) | set(no_good_field_goal_play_indexes)
special_field_goal_play_indexes = [i for i in df_unclean_fieldgoal_week1.index if i not in already_classified]

dict_unclean_to_clean_special_field_goals = unclean_clean_matches(week2_2023_plays_modified.iloc[special_field_goal_play_indexes], df_week2_plays_cleaned)

# Number of special field goal plays
print(f"{len(dict_unclean_to_clean_special_field_goals)} special field goal plays")
print("\n\n")
for i in dict_unclean_to_clean_special_field_goals.keys():
  print(f"({i}, {dict_unclean_to_clean_special_field_goals.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### Extra Points

In [None]:
# All extra point plays

df_unclean_extrapoint_week1 = week2_2023_plays_modified[week2_2023_plays_modified['PlayOutcome'].str.contains('Extra Point')]

dict_unclean_to_clean_extrapoint = unclean_clean_matches(df_unclean_extrapoint_week1, df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_extrapoint)} extra point plays")
print("\n\n")
for i in dict_unclean_to_clean_extrapoint.keys():
  print(f"({i}, {dict_unclean_to_clean_extrapoint.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All extra point plays (good)

extra_point_good_index_list = []

for i in list(df_2023_extrapoint_week1.index):
  made_extra_point = re.findall(extra_point_good_pattern, week1_2023_plays_modified['PlayDescription'].iloc[i])
  if len(made_extra_point) > 0:
    extra_point_good_index_list.append(i)

dict_unclean_to_clean_extrapoint_good = unclean_clean_matches(week1_2023_plays_modified.iloc[extra_point_good_index_list], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_extrapoint_good)} number of good extra point plays")
print("\n\n")
for i in dict_unclean_to_clean_extrapoint_good.keys():
  print(f"({i}, {dict_unclean_to_clean_extrapoint_good.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All extra point plays (no good)

extra_point_no_good_index_list = []

for i in list(df_2023_extrapoint_week1.index):
  no_good_extra_point = re.findall(extra_point_no_good_pattern, week1_2023_plays_modified['PlayDescription'].iloc[i])
  if len(no_good_extra_point) > 0:
    extra_point_no_good_index_list.append(i)

dict_unclean_to_clean_extrapoint_no_good = unclean_clean_matches(week1_2023_plays_modified.iloc[extra_point_no_good_index_list], df_week1_plays_cleaned)

print(f"{len(dict_unclean_to_clean_extrapoint_no_good)} number of no good extra point plays")
print("\n\n")
for i in dict_unclean_to_clean_extrapoint_no_good.keys():
  print(f"({i}, {dict_unclean_to_clean_extrapoint_no_good.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# Blocked extra point?

### Fumbles

In [None]:
# All fumbled plays

# recovered = offense recovered the ball
# RECOVERED = defense recovered the ball
# and recovers = QB recovered the ball (Or whoever initially fumbled the ball)

df_unclean_fumble_plays = week2_2023_plays_modified.loc[(week2_2023_plays_modified['PlayOutcome'].str.contains('fumble', case=False)) |
                                                        (week2_2023_plays_modified['PlayDescription'].str.contains('fumble', case=False))]

dict_unclean_to_clean_fumble_plays = unclean_clean_matches(df_unclean_fumble_plays, df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_fumble_plays)} number of fumbled plays")
print("\n\n")
for i in dict_unclean_to_clean_fumble_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_fumble_plays.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()
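
The casing convention noted in the comments above can be sketched as a small classifier. This is a minimal sketch, assuming the three markers (`and recovers`, `RECOVERED`, `recovered`) are the only ones that matter. `classify_fumble_recovery` is a hypothetical helper, not part of the notebook, and the sample strings in the usage lines are made up.

```python
def classify_fumble_recovery(description):
    """Label who recovered a fumble, based on the casing convention above.

    Hypothetical helper. The checks are case-sensitive on purpose,
    because the casing is what distinguishes offense from defense.
    """
    if "and recovers" in description:
        return "fumbler"   # the player who fumbled got the ball back
    if "RECOVERED" in description:
        return "defense"   # upper case: the defense recovered
    if "recovered" in description:
        return "offense"   # lower case: a teammate recovered
    return None            # no recovery recorded in the text


# Made-up sample descriptions, for illustration only
print(classify_fumble_recovery("FUMBLES (T.Herndon), and recovers at KC 16"))
print(classify_fumble_recovery("FUMBLES (A.Tackler), RECOVERED by JAX-B.Safety at KC 30"))
```

The `and recovers` check runs first so that a self-recovery is never mislabeled by a later `recovered` elsewhere in the same description.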

In [None]:
# All passing fumble plays

index_fumble_pass_plays = []

# Use the same Week 2 frame that df_unclean_fumble_plays was filtered from
for i in list(df_unclean_fumble_plays.index):
  fumble_pass = re.findall(receiver_pattern, week2_2023_plays_modified['PlayDescription'].iloc[i])
  if len(fumble_pass) > 0:
    index_fumble_pass_plays.append(i)

dict_unclean_to_clean_fumble_pass = unclean_clean_matches(week2_2023_plays_modified.iloc[index_fumble_pass_plays], df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_fumble_pass)} number of passing fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_fumble_pass.keys():
  print(f"({i}, {dict_unclean_to_clean_fumble_pass.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All rushing fumble plays

index_fumble_run_plays = []

# Use the same Week 2 frame that df_unclean_fumble_plays was filtered from
for i in list(df_unclean_fumble_plays.index):
  fumble_run = re.findall(rusher_pattern, week2_2023_plays_modified['PlayDescription'].iloc[i])
  if len(fumble_run) > 0:
    index_fumble_run_plays.append(i)

dict_unclean_to_clean_fumble_run = unclean_clean_matches(week2_2023_plays_modified.iloc[index_fumble_run_plays], df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_fumble_run)} number of rushing fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_fumble_run.keys():
  print(f"({i}, {dict_unclean_to_clean_fumble_run.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All sacked fumble plays

index_fumble_sacked_plays = []

# Use the same Week 2 frame that df_unclean_fumble_plays was filtered from
for i in list(df_unclean_fumble_plays.index):
  if week2_2023_plays_modified['PlayDescription'].iloc[i].find('sacked') != -1:
    index_fumble_sacked_plays.append(i)

dict_unclean_to_clean_sacked_fumble = unclean_clean_matches(week2_2023_plays_modified.iloc[index_fumble_sacked_plays], df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_sacked_fumble)} number of sacked fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_sacked_fumble.keys():
  print(f"({i}, {dict_unclean_to_clean_sacked_fumble.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All Aborted fumbled plays

# week1_2023_plays['PlayOutcome'].str.contains('fumble', case=False)

index_fumble_aborted_plays = []

# Use the same Week 2 frame that df_unclean_fumble_plays was filtered from
for i in list(df_unclean_fumble_plays.index):
  if week2_2023_plays_modified['PlayDescription'].iloc[i].find('Aborted') != -1:
    index_fumble_aborted_plays.append(i)

dict_unclean_to_clean_aborted_fumble = unclean_clean_matches(week2_2023_plays_modified.iloc[index_fumble_aborted_plays], df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_aborted_fumble)} number of aborted fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_aborted_fumble.keys():
  print(f"({i}, {dict_unclean_to_clean_aborted_fumble.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All special fumbled plays

index_fumble_special_plays = list(df_unclean_fumble_plays.index)

for i in index_fumble_pass_plays:
  index_fumble_special_plays.pop(index_fumble_special_plays.index(i))

for i in index_fumble_run_plays:
  index_fumble_special_plays.pop(index_fumble_special_plays.index(i))

for i in index_fumble_sacked_plays:
  index_fumble_special_plays.pop(index_fumble_special_plays.index(i))

for i in index_fumble_aborted_plays:
  index_fumble_special_plays.pop(index_fumble_special_plays.index(i))

# Use the same Week 2 frame that df_unclean_fumble_plays was filtered from
dict_unclean_to_clean_fumble_special = unclean_clean_matches(week2_2023_plays_modified.iloc[index_fumble_special_plays], df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_fumble_special)} number of special fumble plays")
print("\n\n")
for i in dict_unclean_to_clean_fumble_special.keys():
  print(f"({i}, {dict_unclean_to_clean_fumble_special.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()
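
The "special" list above is built by popping every categorized index out of the full list. `list.index(i)` raises `ValueError` when `i` is no longer present, which happens as soon as one play lands in two categories (for example, a description that matches the receiver pattern and also contains `sacked`). A minimal sketch of an overlap-tolerant alternative follows; `remaining_indexes` is a hypothetical helper name and the numbers in the usage line are made up.

```python
def remaining_indexes(all_indexes, *categorized):
    """Return the indexes that appear in none of the categorized lists.

    Hypothetical helper. Order of all_indexes is preserved, and an index
    that appears in several categorized lists is handled without error.
    """
    seen = set()
    for group in categorized:
        seen.update(group)
    return [i for i in all_indexes if i not in seen]


# Made-up indexes: 12 is in both categories and is removed only once
print(remaining_indexes([3, 7, 9, 12, 15], [7, 12], [12, 15]))
```

In the cell above this would be called with `list(df_unclean_fumble_plays.index)` followed by the pass, run, sacked and aborted index lists.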

### Penalties

In [None]:
# What is the difference between these penalties and penalties in other play outcomes?

# All plays with "penalty" outcomes

df_unclean_penalty_plays = week1_2023_plays_modified.loc[week1_2023_plays_modified['PlayOutcome'].str.contains('penalty', case=False)]

dict_unclean_to_clean_penalty_plays = unclean_clean_matches(df_unclean_penalty_plays, df_week1_plays_cleaned)

# Number of penalty plays
print(f"{len(df_unclean_penalty_plays)} number of penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_plays.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All passing plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays_modified.loc[week1_2023_plays_modified['PlayOutcome'].str.contains('penalty', case=False)]

# List for all indexes that are passing play type penalty plays
list_unclean_penalty_passing_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  passing_play = re.findall(receiver_pattern, play)
  if len(passing_play) > 0 or play.find('pass incomplete') != -1:
    list_unclean_penalty_passing_plays.append(idx)

# Dataframe of all passing plays with "penalty" outcomes
df_unclean_penalty_passing_plays = week1_2023_plays_modified.iloc[list_unclean_penalty_passing_plays]

dict_unclean_to_clean_penalty_passing_plays = unclean_clean_matches(df_unclean_penalty_passing_plays, df_week1_plays_cleaned)

# Number of passing penalty plays
print(f"{len(list_unclean_penalty_passing_plays)} number of passing penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_passing_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_passing_plays.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All rushing plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays_modified.loc[week1_2023_plays_modified['PlayOutcome'].str.contains('penalty', case=False)]

# List of indexes for penalty plays that are rushing plays
list_unclean_penalty_rushing_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  rushing_play = re.findall(rusher_pattern, play)
  if len(rushing_play) > 0 or play.find('Aborted') != -1:
    list_unclean_penalty_rushing_plays.append(idx)

# Dataframe of all rushing plays with "penalty" outcomes
df_unclean_penalty_rushing_plays = week1_2023_plays_modified.iloc[list_unclean_penalty_rushing_plays]

dict_unclean_to_clean_penalty_rushing_plays = unclean_clean_matches(df_unclean_penalty_rushing_plays, df_week1_plays_cleaned)

# Number of rushing penalty plays
print(f"{len(list_unclean_penalty_rushing_plays)} number of rushing penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_rushing_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_rushing_plays.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All "False Start" or "Delay of Game" plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays_modified.loc[week1_2023_plays_modified['PlayOutcome'].str.contains('penalty', case=False)]

# List of indexes for penalty plays that are "False Start" or "Delay of Game"
list_unclean_penalty_false_start_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  if play.find('False Start') != -1 or play.find('Delay of Game') != -1:
    list_unclean_penalty_false_start_plays.append(idx)

dict_unclean_to_clean_penalty_false_start_plays = unclean_clean_matches(week1_2023_plays_modified.iloc[list_unclean_penalty_false_start_plays], df_week1_plays_cleaned)

# Number of "False Start" or "Delay of Game" penalty plays
print(f"{len(list_unclean_penalty_false_start_plays)} number of false start or delay of game plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_false_start_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_false_start_plays.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All sacked plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays_modified.loc[week1_2023_plays_modified['PlayOutcome'].str.contains('penalty', case=False)]

# List of indexes for penalty plays where the quarterback was sacked
list_unclean_penalty_sacked_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  if play.find('sacked') != -1:
    list_unclean_penalty_sacked_plays.append(idx)

dict_unclean_to_clean_penalty_sacked_plays = unclean_clean_matches(week1_2023_plays_modified.iloc[list_unclean_penalty_sacked_plays], df_week1_plays_cleaned)

# Number of sacked penalty plays
print(f"{len(list_unclean_penalty_sacked_plays)} number of sacked penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_sacked_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_sacked_plays.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All TWO-POINT CONVERSION ATTEMPT plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays_modified.loc[week1_2023_plays_modified['PlayOutcome'].str.contains('penalty', case=False)]

# List of indexes for penalty plays that are two-point conversion attempts (without a rusher match)
list_unclean_penalty_2pt_plays = []

for idx, play in df_unclean_penalty_plays['PlayDescription'].items():
  rushing_play = re.findall(rusher_pattern, play)
  if play.find('TWO-POINT CONVERSION ATTEMPT') != -1 and len(rushing_play) == 0:
    list_unclean_penalty_2pt_plays.append(idx)

dict_unclean_to_clean_penalty_2pt_plays = unclean_clean_matches(week1_2023_plays_modified.iloc[list_unclean_penalty_2pt_plays], df_week1_plays_cleaned)

# Number of two-point conversion penalty plays
print(f"{len(list_unclean_penalty_2pt_plays)} number of two-point conversion penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_2pt_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_2pt_plays.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All special plays with "penalty" outcomes

# Grabbing all penalty plays within original dataframe
df_unclean_penalty_plays = week1_2023_plays_modified.loc[week1_2023_plays_modified['PlayOutcome'].str.contains('penalty', case=False)]

# Start from every penalty play index, then drop the ones already categorized
list_unclean_penalty_special_plays = list(df_unclean_penalty_plays.index)

for i in list_unclean_penalty_passing_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

for i in list_unclean_penalty_rushing_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

for i in list_unclean_penalty_false_start_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

for i in list_unclean_penalty_sacked_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

for i in list_unclean_penalty_2pt_plays:
  list_unclean_penalty_special_plays.pop(list_unclean_penalty_special_plays.index(i))

dict_unclean_to_clean_penalty_special_plays = unclean_clean_matches(week1_2023_plays_modified.iloc[list_unclean_penalty_special_plays], df_week1_plays_cleaned)

# Number of special penalty plays
print(f"{len(list_unclean_penalty_special_plays)} number of special penalty plays")
print("\n\n")
for i in dict_unclean_to_clean_penalty_special_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_special_plays.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

In [None]:
# All interceptions with some type of penalty

df_unclean_penalty_plays = week1_2023_plays_modified.loc[(week1_2023_plays_modified['PlayOutcome'].str.contains('penalty', case=False)) &
                                                         (week1_2023_plays_modified['PlayDescription'].str.contains('interception', case=False))]

dict_unclean_to_clean_penalty_plays = unclean_clean_matches(df_unclean_penalty_plays, df_week1_plays_cleaned)

# Number of penalty plays that also involve an interception
print(f"{len(df_unclean_penalty_plays)} number of penalty plays with an interception")
print("\n\n")
for i in dict_unclean_to_clean_penalty_plays.keys():
  print(f"({i}, {dict_unclean_to_clean_penalty_plays.get(i)})")
  play = week1_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()

### Turnover On Downs

In [None]:
# All turnover on downs

df_unclean_turnover_on_downs_week1 = week2_2023_plays_modified[week2_2023_plays_modified['PlayOutcome'].str.contains('Turnover on Downs')]

dict_unclean_to_clean_turnover_on_downs = unclean_clean_matches(df_unclean_turnover_on_downs_week1, df_week2_plays_cleaned)

print(f"{len(dict_unclean_to_clean_turnover_on_downs)} number of turnover on downs plays")
print("\n\n")
for i in dict_unclean_to_clean_turnover_on_downs.keys():
  print(f"({i}, {dict_unclean_to_clean_turnover_on_downs.get(i)})")
  play = week2_2023_plays_modified['PlayDescription'].iloc[i]
  play_split = play.split(". ")
  for j in play_split:
    print(j)
  print()
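
Every audit cell above repeats the same display loop, which is where the mismatched labels creep in. A minimal sketch of a shared formatter is shown below. It assumes the matches object behaves like a dict keyed by row position in the unclean frame, which is how the cells above use the return value of `unclean_clean_matches`. `format_matches` is a hypothetical name and the sample data is made up.

```python
import pandas as pd


def format_matches(matches, descriptions, label):
    """Build the audit text for one play category (hypothetical helper).

    matches      : dict-like, unclean row position -> matched cleaned rows
    descriptions : Series of play descriptions, read by position (.iloc)
    label        : text printed after the count, e.g. "fumbled plays"
    """
    lines = [f"{len(matches)} number of {label}", ""]
    for unclean_pos, clean_rows in matches.items():
        lines.append(f"({unclean_pos}, {clean_rows})")
        # One sentence of the description per line, as in the cells above
        lines.extend(descriptions.iloc[unclean_pos].split(". "))
        lines.append("")
    return "\n".join(lines)


# Made-up data, for illustration only
sample_descriptions = pd.Series(["A.Back left end for 3 yards. Tackled by B.Line.",
                                 "C.Passer pass incomplete."])
print(format_matches({0: [0, 1]}, sample_descriptions, "sample plays"))
```

Passing the label as an argument keeps the count line in step with the category being printed.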

## Index searching

In [None]:
week2_2023_plays_modified['PlayDescription'].iloc[1196]

In [None]:
# df_week2_plays_cleaned.iloc[1425]
# week2_2023_plays_modified.iloc[1943]

# (3:45) (Shotgun) P.Mahomes pass short right to K.Toney to KC 22 for -1 yards (T.Herndon)
# FUMBLES (T.Herndon), and recovers at KC 16
# K.Toney to KC 12 for -4 yards (Dari.Williams, T.Herndon).

# - How can I format this play to obtain the correct recorded stats while
#   capturing all actions within the play?

# - Recorded stats
#   Passer = P.Mahomes
#   Receiver = K.Toney
#   Yardage = -11 yards
#             - (start = KC 23)
#             - (end = KC 12)
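
The recorded yardage in the note above comes from the start and end spots rather than from the individual segments of the play. A minimal sketch of that arithmetic, assuming spots are written as `TEAM NN` (or a bare `50` at midfield) and that the offense's acronym is known; both function names are hypothetical.

```python
def yards_from_own_goal(spot, offense):
    """Convert a spot such as 'KC 23' to yards from the offense's own goal line."""
    spot = spot.strip()
    if spot == "50":
        return 50
    side, yard_line = spot.split()
    yard_line = int(yard_line)
    # On the opponent's side of the field, count down from 100
    return yard_line if side == offense else 100 - yard_line


def net_yardage(start_spot, end_spot, offense):
    """Net yards gained by the offense between two spots (negative = loss)."""
    return yards_from_own_goal(end_spot, offense) - yards_from_own_goal(start_spot, offense)


# The play from the note above: start = KC 23, end = KC 12
print(net_yardage("KC 23", "KC 12", "KC"))  # -11
```

This reproduces the -11 yards recorded for the play. The intermediate spots (KC 22, KC 16) would still need their own rows to capture each action.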

# CLEANED DATASET OBSERVATIONS
- Attempting to grab basic stats on players for a single game

## Helper Methods

In [None]:
# Get rid of duplicate rows (for plays that have multiple rows)

def no_duplicates(df_with_duplicates, index_start=None):

  # Nothing to compare in an empty frame
  if df_with_duplicates.empty:
    return df_with_duplicates

  if index_start is None:
    index_start = df_with_duplicates.index[0]

  # exit case
  # - The last element has been reached
  if df_with_duplicates.index[-1] == index_start:
    return df_with_duplicates

  first_element = df_with_duplicates.loc[index_start]

  second_element = df_with_duplicates.iloc[df_with_duplicates.index.tolist().index(index_start)+1]

  # Features that determine whether the two rows are part of the same play
  # matching_features = ['Season', 'Week', 'Date', 'AwayTeam', 'HomeTeam', 'Quarter', 'DriveNumber', 'TeamWithPossession', 'PlayNumberInDrive']
  matching_features = ['Season', 'Week', 'Date', 'AwayTeam', 'HomeTeam', 'Quarter', 'DriveNumber', 'PlayNumberInDrive']

  # - Check to see if 1st and 2nd elements are a match
  if first_element[matching_features].equals(second_element[matching_features]):
    # 1. remove 2nd element
    df_with_duplicates = df_with_duplicates.drop(df_with_duplicates.index[df_with_duplicates.index.tolist().index(index_start)+1], inplace=False)
    # 2. run method starting search from 1st element
    #    - This is in case more rows match the 1st element
    return no_duplicates(df_with_duplicates, index_start)
  else:
    # 1. run method starting search from 2nd element
    #    - 2nd element will become '1st element'
    #    - after 2nd element will become '2nd element'
    return no_duplicates(df_with_duplicates, df_with_duplicates.index[df_with_duplicates.index.tolist().index(index_start)+1])
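
`no_duplicates` makes roughly one recursive call per row, so a frame with more rows than Python's default recursion limit (1000) could raise `RecursionError`. A single game stays well under that, but a full week would not. The sketch below is a non-recursive alternative under the same assumption the method makes: rows of one play are adjacent and share the matching features. `drop_consecutive_duplicates` is a hypothetical name.

```python
import pandas as pd

MATCHING_FEATURES = ['Season', 'Week', 'Date', 'AwayTeam', 'HomeTeam',
                     'Quarter', 'DriveNumber', 'PlayNumberInDrive']


def drop_consecutive_duplicates(df, features=MATCHING_FEATURES):
    """Keep the first row of each run of adjacent rows that share `features`."""
    if df.empty:
        return df
    keys = df[features]
    prev = keys.shift()
    # A row repeats the previous one when every feature is equal
    # (two missing values in the same column are treated as equal)
    same_as_prev = (keys.eq(prev) | (keys.isna() & prev.isna())).all(axis=1)
    same_as_prev.iloc[0] = False  # the first row has nothing before it
    return df.loc[~same_as_prev]
```

Because equality is transitive, comparing each row with its predecessor keeps the same rows as comparing each row with the first row of its run.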

**Table Creation**
- Goal is to mirror tables on NFL.com for testing purposes
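
Once a table here mirrors a published one, the two can be compared cell by cell instead of by eye. A minimal sketch using `DataFrame.compare`, assuming the reference table has been entered by hand (or obtained some other way) with the same player names and column labels. `diff_stat_tables` is a hypothetical helper and the numbers below are made up.

```python
import pandas as pd


def diff_stat_tables(df_cleaned, df_reference):
    """Return only the cells where the two stat tables disagree."""
    # compare() needs identical labels, so align both tables first;
    # a player or column missing from one side shows up as NaN there
    rows = df_cleaned.index.union(df_reference.index)
    cols = df_cleaned.columns.union(df_reference.columns)
    left = df_cleaned.reindex(index=rows, columns=cols)
    right = df_reference.reindex(index=rows, columns=cols)
    return left.compare(right)  # 'self' = cleaned, 'other' = reference


# Made-up numbers, for illustration only
cleaned = pd.DataFrame({'YDS': [100, 50], 'TD': [1, 0]}, index=['A.Back', 'B.Back'])
reference = pd.DataFrame({'YDS': [100, 55], 'TD': [1, 0]}, index=['A.Back', 'B.Back'])
print(diff_stat_tables(cleaned, reference))
```

An empty result means the two tables agree. Differences caused by how each source scores a stat would still need a manual look.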

In [None]:
# Display the scoring table for a specified game.

def score_table(away_team, home_team, df_cleaned_plays, dict_of_teams):
  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team)]

  df_all_plays_in_game = no_duplicates(df_all_plays_in_game)

  teams = [[away_team],[home_team]]

  for i in range(len(teams)):
    total_score = 0
    for quarter in df_all_plays_in_game['Quarter'].unique():
      quarter_score = 0

      # touchdowns
      df_touchdowns_in_quarter = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayOutcome'].str.contains(f'touchdown {teams[i][0]}', case=False)) &
                                                          (df_all_plays_in_game['Quarter'] == quarter)]
      quarter_score += df_touchdowns_in_quarter.shape[0] * 6

      # PAT
      df_extra_points_in_quarter = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayOutcome'].str.contains('Extra Point Good', case=False)) &
                                                            (df_all_plays_in_game['Quarter'] == quarter) &
                                                            (df_all_plays_in_game['TeamWithPossession'] == dict_of_teams.get(teams[i][0]))]
      quarter_score += df_extra_points_in_quarter.shape[0] * 1

      # field goals
      df_field_goals_in_quarter = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayOutcome'].str.contains('Field Goal Good', case = False)) &
                                                          (df_all_plays_in_game['Quarter'] == quarter) &
                                                          (df_all_plays_in_game['TeamWithPossession'] == dict_of_teams.get(teams[i][0]))]
      quarter_score += df_field_goals_in_quarter.shape[0] * 3

      # 2 Pt Conversions
      df_two_pt = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayOutcome'].str.contains('2PT Conversion Success', case=False)) &
                                           (df_all_plays_in_game['Quarter'] == quarter) &
                                           (df_all_plays_in_game['TeamWithPossession'] == dict_of_teams.get(teams[i][0]))]
      quarter_score += df_two_pt.shape[0] * 2

      teams[i].append(quarter_score)
      total_score += quarter_score

    teams[i].append(total_score)
    teams[i].pop(0)

  scoring_columns = df_all_plays_in_game['Quarter'].unique().tolist()

  scoring_columns.append("Total")

  return pd.DataFrame(teams, columns = scoring_columns, index=[dict_of_teams.get(away_team), dict_of_teams.get(home_team)])

INDEX:
- Each quarterback that played in game
COLUMNS:
- CP/ATT - completions / pass attempts
- YDS - total passing yards
- TD - total touchdowns thrown
- INT - total interceptions thrown

In [None]:
# Display quarterback stats for a specified game

# def passing_table(away_team, home_team, df_cleaned_plays, dict_acronym_to_team):
def passing_table(away_team, home_team, df_cleaned_plays, dict_acronym_to_team, home_or_away):


  # User decides which passing table they want (home or away)
  if home_or_away == 0:
    team = home_team
  elif home_or_away == 1:
    team = away_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  # All plays within game
  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team)]

  # list of quarterbacks in game
  list_qbs = df_all_plays_in_game['Passer'].unique().tolist()
  if 'nan' in list_qbs:
    list_qbs.pop(list_qbs.index('nan'))

  #   key: quarterback
  # value: team
  dict_qbs_to_team = {}
  for qb in list_qbs:
    dict_qbs_to_team[qb] = dict_acronym_to_team.get(df_all_plays_in_game['TeamWithPossession'].loc[df_all_plays_in_game['Passer'] == qb].value_counts().index[0])

  # Filtering 'dict_qbs_to_team'
  #   key: quarterback(s) from desired team
  # value: team
  dict_qbs_to_team = {k:v for k,v in dict_qbs_to_team.items() if v == team}

  df_quarterback_data = pd.DataFrame(columns=["CP/ATT", "YDS", "TD", "INT"], index = list(dict_qbs_to_team.keys()))

  # Grabbing data for each quarterback in game
  for qb in df_quarterback_data.index:

    passing_attempts = df_all_plays_in_game.loc[(df_all_plays_in_game['Passer'] == qb) &
                                                (df_all_plays_in_game['PlayType'] == "Pass")]

    # Raw strings keep the escaped parentheses valid regex without invalid-escape warnings
    passing_completions = passing_attempts.loc[(passing_attempts['PlayOutcome'].str.contains('yard pass', case=False)) |
                                               (passing_attempts['PlayOutcome'].str.contains(f'touchdown {dict_qbs_to_team.get(qb)}', case=False)) |
                                               (passing_attempts['PlayOutcome'].str.contains(r"Turnover On Downs \(C\)", case=False)) |
                                               (passing_attempts['PlayOutcome'].str.contains("Pass for No Gain", case=False)) |
                                               (passing_attempts['PlayOutcome'].str.contains(r"Fumble \(C\)", case=False))]

    df_quarterback_data.loc[qb, 'CP/ATT'] = f"{passing_completions.shape[0]}/{passing_attempts.shape[0]}"

    df_quarterback_data.loc[qb, 'YDS'] = int(passing_completions['Yardage'].sum())

    total_touchdowns = passing_completions.loc[passing_completions['PlayOutcome'].str.contains(f'touchdown {dict_qbs_to_team.get(qb)}', case=False)]

    df_quarterback_data.loc[qb, 'TD'] = total_touchdowns.shape[0]

    total_interceptions = passing_attempts.loc[passing_attempts['PlayDescription'].str.contains('intercepted', case=False)]

    df_quarterback_data.loc[qb, 'INT'] = total_interceptions.shape[0]

  return df_quarterback_data
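
The completion filter above chains five `str.contains` calls, two of which need escaped parentheses. A minimal sketch of the same test as one pattern, with `re.escape` handling the special characters. The outcome labels are the ones used in `passing_table`; `completion_mask` is a hypothetical name and the sample outcomes and team name are made up.

```python
import re

import pandas as pd


def completion_mask(outcomes, team_name):
    """Boolean mask of PlayOutcome values that count as a completed pass."""
    labels = ["yard pass",
              f"touchdown {team_name}",
              "Turnover On Downs (C)",
              "Pass for No Gain",
              "Fumble (C)"]
    # Escape each label so "(C)" is matched literally, then OR them together
    pattern = "|".join(re.escape(label) for label in labels)
    return outcomes.str.contains(pattern, case=False)


# Made-up outcomes, for illustration only
sample = pd.Series(["12 Yard Pass", "Pass Incomplete", "Fumble (C)"])
print(completion_mask(sample, "Chiefs").tolist())
```

The same mask could be reused for the `REC` column in `receiving_table`, since it applies the same outcome test.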

In [None]:
# Display rushing stats for a specified game

def rushing_table(away_team, home_team, df_cleaned_plays, dict_acronym_to_team, home_or_away):

  # All plays within game
  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team)]

  # User decides which rushing table they want (home or away)
  if home_or_away == 0:
    team = home_team
  elif home_or_away == 1:
    team = away_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  print(f"Rushing Table: {team}")
  print()

  df_team_rushing_plays = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayType'] == 'Run') &
                                                   (df_all_plays_in_game['TeamWithPossession'] == dict_acronym_to_team.get(team)) &
                                                   (~df_all_plays_in_game['PlayOutcome'].str.contains('2PT Conversion', case=False))]

  list_rushers = df_team_rushing_plays['Rusher'].unique().tolist()
  if 'nan' in list_rushers:
    list_rushers.pop(list_rushers.index('nan'))

  # Dataframe that will be returned
  df_rusher_data = pd.DataFrame(columns=["CAR", "YDS", "TD", "AVG"], index = list_rushers)

  for rb in df_rusher_data.index:
    rusher_plays = df_team_rushing_plays.loc[df_team_rushing_plays['Rusher'] == rb]
    df_rusher_data.loc[rb, 'CAR'] = rusher_plays.shape[0] - rusher_plays.loc[rusher_plays['PlayType'] == 'lateral after run'].shape[0]
    df_rusher_data.loc[rb, 'YDS'] = int(rusher_plays['Yardage'].sum())
    df_rusher_data.loc[rb, 'TD'] = rusher_plays.loc[rusher_plays['PlayOutcome'].str.contains('touchdown', case=False)].shape[0]
    df_rusher_data.loc[rb, 'AVG'] = round(rusher_plays['Yardage'].mean(), 2)

  return df_rusher_data.sort_values(by="YDS", ascending=False)
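
The per-rusher loop above can also be written as a single `groupby`. This is a sketch of an alternative, not a replacement for the method: it assumes the input is already filtered the way `df_team_rushing_plays` is, and that missing rushers are stored as the string `'nan'`. `rushing_summary` is a hypothetical name.

```python
import pandas as pd


def rushing_summary(df_team_rushing_plays):
    """CAR / YDS / TD / AVG per rusher, computed with one groupby."""
    plays = df_team_rushing_plays[df_team_rushing_plays['Rusher'] != 'nan']
    is_td = plays['PlayOutcome'].str.contains('touchdown', case=False)
    summary = (plays.assign(IsTD=is_td)
                    .groupby('Rusher')
                    .agg(CAR=('Yardage', 'size'),   # one carry per row
                         YDS=('Yardage', 'sum'),
                         TD=('IsTD', 'sum'),        # True counts as 1
                         AVG=('Yardage', 'mean')))
    summary['AVG'] = summary['AVG'].round(2)
    return summary.sort_values(by='YDS', ascending=False)
```

Unlike the loop, this returns numeric columns rather than object columns, which makes later comparisons against a reference table simpler.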

In [None]:
# Display receiving stats for a specified game

def receiving_table(away_team, home_team, df_cleaned_plays, dict_acronym_to_team, home_or_away):

  # All plays within game
  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team)]

  # User decides which receiving table they want (home or away)
  if home_or_away == 0:
    team = home_team
  elif home_or_away == 1:
    team = away_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  print(f"Receiving Table: {team}")
  print()

  df_receiving_plays = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayType'].str.contains('pass', case=False)) &
                                                (df_all_plays_in_game['TeamWithPossession'] == dict_acronym_to_team.get(team)) &
                                                (~df_all_plays_in_game['PlayOutcome'].str.contains('2PT Conversion', case=False))]

  receivers = df_receiving_plays['Receiver'].unique().tolist()
  if 'nan' in receivers:
    receivers.pop(receivers.index('nan'))

  df_receiver_data = pd.DataFrame(columns=["REC", "YDS", "TD", "TGTS"], index = receivers)

  # print(df_receiver_data)

  for receiver in df_receiver_data.index:
    receiver_plays = df_receiving_plays.loc[df_receiving_plays['Receiver'] == receiver]
    # A reception is a completed pass outcome that is not a lateral after the catch
    is_completion = (
        (receiver_plays['PlayOutcome'].str.contains('yard pass', case=False)) |
        (receiver_plays['PlayOutcome'].str.contains(f'Touchdown {team}', case=False)) |
        (receiver_plays['PlayOutcome'].str.contains(r"Turnover On Downs \(C\)", case=False)) |
        (receiver_plays['PlayOutcome'].str.contains("Pass for No Gain", case=False)) |
        (receiver_plays['PlayOutcome'].str.contains(r"Fumble \(C\)", case=False)))
    is_lateral = receiver_plays['PlayType'].str.contains("lateral after pass", case=False)

    df_receiver_data.loc[receiver, 'REC'] = receiver_plays.loc[is_completion & ~is_lateral].shape[0]

    df_receiver_data.loc[receiver, 'YDS'] = int(receiver_plays['Yardage'].sum())
    df_receiver_data.loc[receiver, 'TD'] = receiver_plays.loc[receiver_plays['PlayOutcome'].str.contains(f'touchdown {team}', case=False)].shape[0]
    df_receiver_data.loc[receiver, 'TGTS'] = receiver_plays.shape[0] - receiver_plays.loc[receiver_plays['PlayType'] == 'lateral after pass'].shape[0]

  return df_receiver_data.sort_values(by="YDS", ascending=False)
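
Each table method repeats the same `home_or_away` check and returns an error string on bad input, which a caller could mistake for a result. A minimal sketch of a shared helper that raises instead; `resolve_team` is a hypothetical name and the team names in the usage line are only examples.

```python
def resolve_team(away_team, home_team, home_or_away):
    """Pick the team a table is built for (home = 0, away = 1)."""
    if home_or_away == 0:
        return home_team
    if home_or_away == 1:
        return away_team
    raise ValueError("home_or_away parameter should be a 0 or 1. (home = 0, away = 1)")


print(resolve_team("Chiefs", "Jaguars", 0))  # the home team
```

Raising `ValueError` stops the cell at the bad call, so an error message is never treated as a stats table further down.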

In [None]:
# Display all fumble stats for a specified game

# NOTE:
# - nfl.com does not have the correct table stats.
#   - nfl.com has a lot of flaws in their play by play data

# - I need to grab all players that recovered a fumbled ball.

def fumble_table(away_team, home_team, df_cleaned_plays, dict_acronym_to_team, home_or_away):

  # User decides which fumble table they want (home or away)
  if home_or_away == 0:
    team = home_team
  elif home_or_away == 1:
    team = away_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  print(f"Fumbles Table: {team}")
  print()

  # Dataframe of all plays within game
  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team) &
                                              (df_cleaned_plays['PlayType'] != 'No Play')]

  # Dataframe of all fumbles within game
  df_all_fumbles_in_game = df_all_plays_in_game.loc[df_all_plays_in_game['FumbleDetails'] != 'nan']

  # Dataframe of all desired team fumbles
  df_team_fumbles = df_all_fumbles_in_game.loc[df_all_fumbles_in_game['TeamWithPossession'] == dict_acronym_to_team.get(team)]

  # Dataframe of all fumbles that were recovered by a player (both desired team and opposing)
  df_team_fumbles_recovered = df_all_fumbles_in_game.loc[df_all_fumbles_in_game['FumbleRecoveredBy'] != 'nan']

  # Dataframe for return 'FUMBLES' table
  df_team_fumble_data = pd.DataFrame(columns=["FUM", "LOST", "FF", "REC"])


  # NOTE: I might be able to add both players who fumbled and players who recovered here in a list and make it work.

  # List of all desired team players who fumbled
  list_team_fumblers = df_team_fumbles['WhoFumbled'].unique().tolist()
  if 'nan' in list_team_fumblers:
    list_team_fumblers.pop(list_team_fumblers.index('nan'))


  # Filling 'FUMBLES' table (FUM, LOST, REC)

  # - Cycle through all players who have fumbled on the team
  for player in list_team_fumblers:
    fum = 0
    lost = 0
    rec = 0
    df_all_player_fumbles = df_team_fumbles.loc[df_team_fumbles['WhoFumbled'] == player]
    for idx, row in df_all_player_fumbles.iterrows():
      fum = fum + 1
      # Nobody recovered the ball
      if row['FumbleRecoveredBy'] == 'nan':
        continue

      # Looking at who recovered the ball (The person who fumbled also recovered)
      if row['FumbleDetails'].find('and recovers') != -1:
        rec = rec + 1
        continue

      # When a player fumbled the ball, did it result in a turnover?
      if 'recovered' in row['FumbleDetails'].lower():
        fumble_recovery = re.findall(fumble_recovery_pattern, row['FumbleDetails'])
        if len(fumble_recovery) > 0:
          fumble_recovery_team = fumble_recovery[0][0].split("-")[0]
          if fumble_recovery_team != row['TeamWithPossession']:
            lost = lost + 1

    df_team_fumble_data.loc[player, 'FUM'] = fum
    df_team_fumble_data.loc[player, 'LOST'] = lost
    df_team_fumble_data.loc[player, 'REC'] = rec


  # - Cycling through all players who recovered a fumble on both teams.
  for idx, row in df_team_fumbles_recovered.iterrows():
    fumble_recovery = re.findall(fumble_recovery_pattern, row['FumbleDetails'])
    if len(fumble_recovery) > 0:
      fumble_recovery_team = dict_teams_2.get(fumble_recovery[0][0].split("-")[0])
      player = fumble_recovery[0][0].split("-")[1:][0]
      if fumble_recovery_team == team:
        if player not in df_team_fumble_data.index.tolist():
          df_team_fumble_data.loc[player, 'REC'] = 1
        else:
          df_team_fumble_data.loc[player, 'REC'] = df_team_fumble_data.loc[player, 'REC'] + 1


  # Filling 'FUMBLES' table (FF)
  if home_or_away == 0:
    df_team_forced_fumble_data = df_all_fumbles_in_game.loc[df_all_fumbles_in_game['TeamWithPossession'] == dict_acronym_to_team.get(away_team)]
  else:
    df_team_forced_fumble_data = df_all_fumbles_in_game.loc[df_all_fumbles_in_game['TeamWithPossession'] == dict_acronym_to_team.get(home_team)]

  # All team players with a forced fumble
  list_team_forced_fumblers = df_team_forced_fumble_data['ForcedFumbleBy'].unique().tolist()
  if 'nan' in list_team_forced_fumblers:
    list_team_forced_fumblers.pop(list_team_forced_fumblers.index('nan'))

  for player in list_team_forced_fumblers:
    df_team_fumble_data.loc[player, 'FF'] = df_team_forced_fumble_data.loc[df_team_forced_fumble_data['ForcedFumbleBy'] == player].shape[0]

  for idx, row in df_team_fumble_data.iterrows():
    for col in df_team_fumble_data.columns:
      if pd.isna(df_team_fumble_data.loc[idx, col]):
        df_team_fumble_data.loc[idx, col] = 0

  return df_team_fumble_data
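
The LOST and REC columns above depend on splitting a `TEAM-Player` token out of `FumbleDetails`. The notebook's `fumble_recovery_pattern` is defined elsewhere, so the following is only a minimal sketch of that step. It assumes details shaped like "FUMBLES (T.Dye), recovered by PHI-K.Ringo at PHI 10."; `hypothetical_recovery_pattern`, `parse_fumble_recovery`, and `is_fumble_lost` are illustrative names, not part of the notebook.

```python
import re

# Hypothetical pattern (NOT the notebook's fumble_recovery_pattern):
# captures the team acronym and player name that follow "recovered by".
hypothetical_recovery_pattern = re.compile(
    r"[Rr]ecovered by ([A-Z]{2,3})-([A-Z][A-Za-z'.\-]+)"
)


def parse_fumble_recovery(details):
    """Return (team_acronym, player) for the first recovery found, else None."""
    match = hypothetical_recovery_pattern.search(details)
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_fumble_lost(details, team_with_possession):
    """A fumble counts as lost when a different team recovers it."""
    recovery = parse_fumble_recovery(details)
    return recovery is not None and recovery[0] != team_with_possession
```

Names containing spaces (for example a two-word last name) would need a wider character class than this sketch uses.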

In [None]:
# Display interception table

def interception_table(away_team, home_team, df_cleaned_plays, dict_acronym_to_team, home_or_away):

  # All plays within game
  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team)]

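  # User decides which interception table they want (home = 0, away = 1).
  # NOTE: `team` is the opposing offense (the team throwing the passes);
  #       `unwanted_team` is the defense whose interceptions are displayed.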
  if home_or_away == 0:
    team = away_team
    unwanted_team = home_team
  elif home_or_away == 1:
    team = home_team
    unwanted_team = away_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  print(f"Interception Table: {unwanted_team}")
  print()

  # All team defensive plays
  # df_team_defensive_plays = df_all_plays_in_game.loc[df_all_plays_in_game['TeamWithPossession'] == dict_acronym_to_team.get(team)]
  df_team_defensive_plays = df_all_plays_in_game.loc[(df_all_plays_in_game['TeamWithPossession'] == dict_acronym_to_team.get(team)) &
                                                     (df_all_plays_in_game['PlayType'] != 'No Play')]

  # Interception table
  df_interception_table = pd.DataFrame(columns=["INT", "YDS", "PD"])

  # All team interceptions
  df_team_interceptions = df_team_defensive_plays.loc[df_team_defensive_plays['InterceptedBy'] != 'nan']

  # list of players with interceptions
  list_team_interception_players = df_team_interceptions['InterceptedBy'].unique().tolist()

  for player in list_team_interception_players:
    df_interception_table.loc[player, 'INT'] = df_team_interceptions.loc[df_team_interceptions['InterceptedBy'] == player].shape[0]
    df_interception_table.loc[player, 'YDS'] = int(df_all_plays_in_game['Yardage'].loc[(df_all_plays_in_game['PlayType'] == 'Run After Interception') &
                                                                                          (df_all_plays_in_game['Rusher'] == player)].sum())

  # All team pass defends
  df_team_pass_defends = df_team_defensive_plays.loc[df_team_defensive_plays['PassDefendedBy'] != 'nan']

  # list of players who have defended passes (sometimes there are multiple defenders for a single play)
  list_team_pass_defenders = df_team_pass_defends['PassDefendedBy'].loc[df_team_pass_defends['PassDefendedBy'] != 'nan'].tolist()

  # list of all players involved in defended passes
  list_team_pass_defenders_split = []
  for player in list_team_pass_defenders:
    if isinstance(player, tuple):
      for p in player:
        list_team_pass_defenders_split.append(p)
    else:
      list_team_pass_defenders_split.append(player)

  for pass_defender in list_team_pass_defenders_split:
    df_interception_table.loc[pass_defender, 'PD'] = list_team_pass_defenders_split.count(pass_defender)

  # Replacing all 'nan' values with 0
  for idx, row in df_interception_table.iterrows():
    for col in df_interception_table.columns:
      if pd.isna(df_interception_table.loc[idx, col]):
        df_interception_table.loc[idx, col] = 0

  return df_interception_table
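
The PD column above flattens `PassDefendedBy` values that may be a single name or a tuple of names, then counts each name. A minimal sketch of that flatten-and-count step using `collections.Counter` is below. The helper name `count_pass_defenders` and the player names in the usage note are hypothetical, and the `'nan'` string sentinel is assumed to match the cleaned data.

```python
from collections import Counter


def count_pass_defenders(values):
    """Count pass-defended credits.

    Each entry is assumed to be a player name, a tuple of player names,
    or the string 'nan' when nobody defended the pass.
    """
    counts = Counter()
    for value in values:
        if value == 'nan':
            continue
        # Normalize a single name to a one-element tuple so both cases
        # can be counted the same way.
        names = value if isinstance(value, tuple) else (value,)
        counts.update(names)
    return counts
```

For example, `count_pass_defenders(["A.One", ("A.One", "B.Two"), "nan"])` credits `A.One` twice and `B.Two` once.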

In [None]:
# Display Defense table

# QUESTIONS:
# 1. Is it a TFL when it's '<= 0 yards' or '< 0 yards'?

def defense_table(away_team, home_team, df_cleaned_plays, dict_team_name_to_acronym, home_or_away):

  if home_or_away == 0:
    wanted_team = home_team
    unwanted_team = away_team
  elif home_or_away == 1:
    wanted_team = away_team
    unwanted_team = home_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  print(f"Defense Table: {wanted_team}")
  print()

  # Dataframe of all defensive and offensive players that contributed defensively
  df_wanted_team_defense_plays = df_cleaned_plays.loc[
      (df_cleaned_plays['TeamWithPossession'] == dict_team_name_to_acronym.get(unwanted_team))
      &
      (~df_cleaned_plays['PlayType'].str.contains('No Play', case=False))
  ]

  # List of all players that recorded a solo tackle
  list_solo_tackles = []
  for idx, solo_tackler in df_wanted_team_defense_plays['SoloTackle'].items():
    if solo_tackler == 'nan':
      continue
    else:
      list_solo_tackles.append(solo_tackler)

  # List of all recorded assisted tackles by players
  list_assisted_tackles = []
  for idx, assisted_tackler in df_wanted_team_defense_plays['AssistedTackle'].items():
    if assisted_tackler == 'nan':
      continue
    else:
      for player in assisted_tackler:
        list_assisted_tackles.append(player)

  # List of all recorded shared tackles by players
  list_shared_tackles = []
  for idx, shared_tackler in df_wanted_team_defense_plays['SharedTackle'].items():
    if shared_tackler == 'nan':
      continue
    else:
      for player in shared_tackler:
        list_shared_tackles.append(player)

  # List of every tackle recorded in the game for the wanted team by player name
  list_tackles = list_solo_tackles + list_assisted_tackles + list_shared_tackles

  # List of all players that made a defensive impact for wanted team
  # No duplicates
  list_defensive_players = []
  for name in list_tackles:
    if name not in list_defensive_players:
      list_defensive_players.append(name)

  # Return dataframe
  df_defense_data = pd.DataFrame(columns=["T-A", "SACK", "TFL", "TD", "Tackles", "LastName"])

  # Filling dataframe
  for player in list_defensive_players:
    # Grabbing all last names for dataframe sorting purposes
    df_defense_data.loc[player, 'LastName'] = player.split(".")[1]
    # Grabbing all solo tackles for each player in dataframe for sorting purposes
    df_defense_data.loc[player, 'Tackles'] = list_solo_tackles.count(player)
    # T = Solo tackles
    # A = Assisted tackles + Shared tackles
    df_defense_data.loc[player, 'T-A'] = f"{list_solo_tackles.count(player)}-{list_assisted_tackles.count(player) + list_shared_tackles.count(player)}"
    # All solo sacks
    df_defense_data.loc[player, 'SACK'] = df_wanted_team_defense_plays.loc[df_wanted_team_defense_plays['SackedBy'] == player].shape[0]

    # Tackle for loss (version 2)
    df_defense_data.loc[player, 'TFL'] = df_wanted_team_defense_plays.loc[
        (df_wanted_team_defense_plays['SoloTackle'] == player)
        &
        (df_wanted_team_defense_plays['TeamWithPossession'] != dict_team_name_to_acronym.get(wanted_team))
        &
        (df_wanted_team_defense_plays['Yardage'] < 0)].shape[0]

    # TD count for defenders that scoop or intercept and score.
    df_defense_data.loc[player, 'TD'] = df_cleaned_plays.loc[
        (df_cleaned_plays['Rusher'] == player)
        &
        (df_cleaned_plays['PlayType'].isin(['Run After Interception', 'Fumble Return']))
        &
        (df_cleaned_plays['PlayOutcome'].str.contains('touchdown', case=False))
        &
        (df_cleaned_plays['WhoFumbled'] != player)].shape[0]

  # Assisted tackles that go for a loss are also credited as a TFL here, even though they are only assists.
  # - Not every organization that records NFL stats does this. Some count an assisted tackle for loss as just an assisted tackle.
  for idx, assisted_tackler in df_wanted_team_defense_plays['AssistedTackle'].items():
    if assisted_tackler == 'nan':
      continue
    else:
      # (1:17) J.Ford right guard to CLE 29 for -4 yards (D.Hill, G.Pratt)
      # - in this scenario, I believe that the first player (D.Hill) is awarded the TFL.
      # - NOTE: the loop below currently credits every listed player, not only the first.
      for player in assisted_tackler:
        if df_wanted_team_defense_plays['Yardage'].loc[idx] < 0:
          df_defense_data.loc[player, 'TFL'] = df_defense_data.loc[player, 'TFL'] + 1

  # ^^^ shared tackles follow the same rules as above
  for idx, shared_tackler in df_wanted_team_defense_plays['SharedTackle'].items():
    if shared_tackler == 'nan':
      continue
    else:
      # for player in shared_tackler:
      #   if df_wanted_team_defense_plays['Yardage'].loc[idx] < 0:
      #     df_defense_data.loc[player, 'TFL'] = df_defense_data.loc[player, 'TFL'] + 1
      if isinstance(shared_tackler, tuple):
        if df_wanted_team_defense_plays['Yardage'].loc[idx] < 0:
          df_defense_data.loc[shared_tackler[0], 'TFL'] = df_defense_data.loc[shared_tackler[0], 'TFL'] + 1

  # I need to grab split sacks. On split sacks, a player involved is awarded 0.5 of a sack
  for idx, sack in df_wanted_team_defense_plays['SackedBy'].items():
    if sack == 'nan':
      continue
    else:
      if isinstance(sack, tuple):
        for player in sack:
          df_defense_data.loc[player, 'SACK'] = df_defense_data.loc[player, 'SACK'] + 0.5

  # Organizing dataframe
  # df_defense_data = df_defense_data.sort_values(by=["T-A", "LastName"], ascending=[False, True]) # <-- This sorting method feels better.
  df_defense_data = df_defense_data.sort_values(by=["Tackles", "LastName"], ascending=[False, True]) # <-- This sorting method is how NFL.com has it. (easier cross referencing)
  df_defense_data = df_defense_data.drop(["Tackles", "LastName"], axis=1)

  # Replacing all 'nan' values with 0
  for idx, row in df_defense_data.iterrows():
    for col in df_defense_data.columns:
      if pd.isna(df_defense_data.loc[idx, col]):
        df_defense_data.loc[idx, col] = 0

  return df_defense_data
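
The SACK column above combines solo sacks (one full sack) with split sacks (0.5 to each player involved). A minimal sketch of that crediting rule as a single pass is below. It assumes `SackedBy` entries are a player name, a tuple of names, or the string `'nan'`; `sack_credits` and the player names are hypothetical.

```python
from collections import defaultdict


def sack_credits(sacked_by_values):
    """Return {player: sacks}, giving 1.0 for a solo sack and 0.5 per split sack."""
    credits = defaultdict(float)
    for value in sacked_by_values:
        if value == 'nan':
            continue
        if isinstance(value, tuple):
            # Split sack: each player involved is awarded half a sack.
            for name in value:
                credits[name] += 0.5
        else:
            credits[value] += 1.0
    return dict(credits)
```

Doing both cases in one loop avoids relying on how an equality comparison between a column of tuples and a single name behaves.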

In [None]:
# Display KICKING table

def kicking_table(away_team, home_team, df_cleaned_plays, dict_team_name_to_acronym, home_or_away):

  if home_or_away == 0:
    team = home_team
  elif home_or_away == 1:
    team = away_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  print(f"Kicking Table: {team}")
  print()

  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team)]

  df_all_kicking_plays = df_all_plays_in_game.loc[
                                                  (
                                                      (df_all_plays_in_game['PlayOutcome'].str.contains('Field Goal', case=False))
                                                      |
                                                      (
                                                           (df_all_plays_in_game['PlayOutcome'].str.contains('Extra Point', case=False))
                                                           &
                                                           (df_all_plays_in_game['Kicker'] != 'nan')
                                                      )
                                                  )
                                                  &
                                                  (df_all_plays_in_game['TeamWithPossession'] == dict_team_name_to_acronym.get(team))]

  list_kickers = df_all_kicking_plays['Kicker'].unique().tolist()
  if 'nan' in list_kickers:
    list_kickers.pop(list_kickers.index('nan'))

  df_kicker_data = pd.DataFrame(columns=["FG", "LONG", "XP", "PTS"])

  for kicker in list_kickers:
    # FG
    df_field_goals_good = df_all_kicking_plays.loc[(df_all_kicking_plays['PlayOutcome'] == 'Field Goal Good') &
                                                  (df_all_kicking_plays['Kicker'] == kicker)]
    df_field_goals_no_good = df_all_kicking_plays.loc[(df_all_kicking_plays['PlayOutcome'] == 'Field Goal No Good') &
                                                      (df_all_kicking_plays['Kicker'] == kicker)]
    df_kicker_data.loc[kicker, "FG"] = f"{len(df_field_goals_good)}/{len(df_field_goals_good) + len(df_field_goals_no_good)}"
    # LONG
    df_kicker_data.loc[kicker, "LONG"] = df_field_goals_good['Yardage'].max()
    if pd.isna(df_kicker_data.loc[kicker, "LONG"]):
      df_kicker_data.loc[kicker, "LONG"] = 0
    # XP
    df_extra_points_good = df_all_kicking_plays.loc[(df_all_kicking_plays['PlayOutcome'] == 'Extra Point Good') &
                                                    (df_all_kicking_plays['Kicker'] == kicker)]
    df_extra_points_no_good = df_all_kicking_plays.loc[(df_all_kicking_plays['PlayOutcome'] == 'Extra Point No Good') &
                                                      (df_all_kicking_plays['Kicker'] == kicker)]
    df_kicker_data.loc[kicker, "XP"] = f"{len(df_extra_points_good)}/{len(df_extra_points_good) + len(df_extra_points_no_good)}"
    # PTS
    df_kicker_data.loc[kicker, "PTS"] = len(df_field_goals_good) * 3 + len(df_extra_points_good)

  return df_kicker_data
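
The FG, LONG, XP, and PTS values above are simple counts over `PlayOutcome`. A minimal sketch of the same arithmetic for one kicker is below, using boolean masks instead of intermediate DataFrames. `summarize_kicker` is a hypothetical helper, and the outcome labels are assumed to be exactly the four strings used in `kicking_table`.

```python
import pandas as pd


def summarize_kicker(df_kicks):
    """Summarize one kicker's plays (columns: 'PlayOutcome', 'Yardage')."""
    outcome = df_kicks['PlayOutcome']
    fg_good = int((outcome == 'Field Goal Good').sum())
    fg_miss = int((outcome == 'Field Goal No Good').sum())
    xp_good = int((outcome == 'Extra Point Good').sum())
    xp_miss = int((outcome == 'Extra Point No Good').sum())

    # Longest made field goal, or 0 when none were made.
    made_yardage = df_kicks.loc[outcome == 'Field Goal Good', 'Yardage']
    longest = int(made_yardage.max()) if fg_good else 0

    return {
        "FG": f"{fg_good}/{fg_good + fg_miss}",
        "LONG": longest,
        "XP": f"{xp_good}/{xp_good + xp_miss}",
        # A made field goal is worth 3 points, a made extra point 1.
        "PTS": 3 * fg_good + xp_good,
    }
```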

In [None]:
# Display PUNTING table

def punting_table(away_team, home_team, df_cleaned_plays, dict_team_name_to_acronym, home_or_away):

  if home_or_away == 0:
    team = home_team
  elif home_or_away == 1:
    team = away_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  print(f"Punting Table: {team}")
  print()

  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team)]

  df_all_punting_plays = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayType'].str.contains('punt', case=False)) &
                                                  (df_all_plays_in_game['TeamWithPossession'] == dict_team_name_to_acronym.get(team)) &
                                                  (df_all_plays_in_game['Kicker'] != 'nan')]

  df_all_punters = df_all_punting_plays['Kicker'].unique().tolist()

  df_punting_table = pd.DataFrame(columns=["PUNTS", "AVG", "I20", "LONG"])

  for punter in df_all_punters:

    df_all_kicker_punts = df_all_punting_plays.loc[df_all_punting_plays['Kicker'] == punter]

    # PUNTS
    df_punting_table.loc[punter, "PUNTS"] = df_all_kicker_punts.shape[0]

    # AVG
    df_punting_table.loc[punter, "AVG"] = round(float(df_all_punting_plays['Yardage'].loc[df_all_punting_plays['Kicker'] == punter].mean()), 2)

    # - I need to take into account the return. If the kick is inside the 20 and
    #   the returner goes past the 20, then it does not count for an I20.
    # - Should I find the 'punt return' that corresponds to the punt?
    #   - if punt return found:
    #     - use as 'end'
    #   - else:
    #     - use punting pattern.

    # I20
    i20_count = 0
    for idx, row in df_all_kicker_punts.iterrows():
      # start
      start_punt = re.findall(play_start_pattern, row['PlayStart'])[0]
      start_punt_territory = start_punt[0]
      start_punt_yardage = start_punt[1]

      # end
      end_punt = re.findall(punting_pattern, row['PlayDescription'])
      punt_yardage = end_punt[0][1]
      end_punt_territory = end_punt[0][2]
      end_punt_yardage = end_punt[0][3]

      # touchback / out of bounds / etc..
      if end_punt_yardage == '':
        continue

      # penalty
      # - If an accepted penalty has taken place, the description of the penalty
      #   will state the ball placement after the penalty has taken effect.
      #   - Will leverage this to identify if the spotting after the punting play
      #     is inside the opposing 20 yard line.
      # PLAN:
      # 1. Identify territory that punter is aiming towards
      # 2. Use spotting of accepted penalty to see if inside 20

      # pseudo code: (example game : BUF vs KC)
      # same territory: (BUF -> BUF)
      #   start_punt_yards > end_punt_yards (BUF 40 -> BUF 20)
      #     yardage +:
      #       aim is BUF territory
      #     yardage -:
      #       aim is KC territory
      #   start_punt_yards < end_punt_yards:
      #     yardage +:
      #       aim is KC territory
      #     yardage -:
      #       aim is BUF territory
      # different territory:
      #   yardage +:
      #     aim is ending territory
      #   yardage -:
      #     aim is starting territory
      if row['AcceptedPenalty'] != 'nan':
        # variables
        penalty = re.findall(penalty_yardage_pattern, row['AcceptedPenalty'][0])
        penalty_territory = penalty[0][1]
        penalty_yardage = penalty[0][2]

        # algorithm
        if start_punt_territory == end_punt_territory:
          if int(start_punt_yardage) > int(end_punt_yardage):
            if int(punt_yardage) > 0:
              if start_punt_territory == penalty_territory and int(penalty_yardage) <= 20:
                i20_count += 1
                continue
            else:
              if start_punt_territory != penalty_territory and int(penalty_yardage) <= 20:
                i20_count += 1
                continue
          else:
            if int(punt_yardage) > 0:
              if start_punt_territory != penalty_territory and int(penalty_yardage) <= 20:
                i20_count += 1
                continue
            else:
              if start_punt_territory == penalty_territory and int(penalty_yardage) <= 20:
                i20_count += 1
                continue
        else:
          if int(punt_yardage) > 0:
            if end_punt_territory == penalty_territory and int(penalty_yardage) <= 20:
              i20_count += 1
              continue
          else:
            if start_punt_territory == penalty_territory and int(penalty_yardage) <= 20:
              i20_count += 1
              continue

      # Trying to see if there is a punt return associated with the punt observed.
      # - Search through every play in game and see if there is a match for punting play
      #   - A match will come up if there has been an associated punt return with the punt
      # - What about punt returns for a touchdown?
      # - Does this table account for punts that result in a touchdown? <<<<<<<<<<<<<<<<<<
      #   - I need to check if this table does as well as kickoffs. <<<<<<<<<<<<<<<<<<<<<<<<<<<
      searching_punt_return = df_all_plays_in_game.loc[(df_all_plays_in_game['Season'] == row["Season"]) &
                                                       (df_all_plays_in_game['Week'] == row["Week"]) &
                                                       (df_all_plays_in_game['AwayTeam'] == row["AwayTeam"]) &
                                                       (df_all_plays_in_game['Quarter'] == row["Quarter"]) &
                                                       (df_all_plays_in_game['DriveNumber'] == row["DriveNumber"]) &
                                                       (df_all_plays_in_game['PlayNumberInDrive'] == row["PlayNumberInDrive"]) &
                                                       (df_all_plays_in_game['PlayType'] == "Punt Return")]

      # All I need from the punt return is the yardage gained
      return_yardage = 0
      if searching_punt_return.shape[0] > 0:
        return_yardage = searching_punt_return['Yardage'].iloc[0]

      # Checking to see if the punt landed inside 20
      if int(end_punt_yardage) <= 20 and int(punt_yardage) > 0:
        punt_inside_20 = False
        if start_punt_territory == end_punt_territory:
          if int(start_punt_yardage) > int(end_punt_yardage):
            punt_inside_20 = True
        else:
          punt_inside_20 = True

        # Take into account the punt return
        if punt_inside_20:
          if return_yardage > 0:
            if int(return_yardage) + int(end_punt_yardage) <= 20:
              i20_count += 1
              continue
          else:
            i20_count += 1
            continue

    df_punting_table.loc[punter, "I20"] = i20_count

    # LONG
    df_punting_table.loc[punter, "LONG"] = int(df_all_punting_plays['Yardage'].loc[df_all_punting_plays['Kicker'] == punter].max())

  return df_punting_table
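
The I20 logic above works out which end of the field the punter is aiming at from the start and end spots. One alternative, sketched below under stated assumptions, is to convert every spot to "yards from the receiving team's goal line" first, which turns the inside-20 check into a single comparison. This assumes the territory token in a spot (for example "KC" in "KC 8") uses the same acronym as the punting team's `TeamWithPossession` value; `yards_from_receiving_goal` and `is_inside_20` are hypothetical helpers, and the `<= 20` threshold simply mirrors the comparison used in `punting_table`.

```python
def yards_from_receiving_goal(territory, yard_line, punting_team):
    """Convert a spot such as ('KC', 8) to yards from the receiving team's goal line."""
    yard_line = int(yard_line)
    # Midfield has no territory attached to it.
    if yard_line == 50 or not territory:
        return 50
    # A spot in the punting team's own territory is on the far side of midfield.
    if territory == punting_team:
        return 100 - yard_line
    return yard_line


def is_inside_20(end_territory, end_yard_line, punting_team, return_yards=0):
    """True when the ball ends at or inside the receiving team's 20 after any return."""
    final_spot = yards_from_receiving_goal(end_territory, end_yard_line, punting_team)
    final_spot += int(return_yards)
    return final_spot <= 20
```

Using the BUF vs KC example from the pseudo code: a BUF punt downed at KC 8 gives `is_inside_20("KC", 8, "BUF")` as true, while the same punt with a 15-yard return ends at the KC 23 and is not counted.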

In [None]:
# Display KICKOFF RETURNS table

def kickoff_returns_table(away_team, home_team, df_cleaned_plays, dict_team_name_to_acronym, home_or_away):

  if home_or_away == 0:
    team = home_team
  elif home_or_away == 1:
    team = away_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  print(f"Kickoff Returns Table: {team}")
  print()

  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team)]

  df_all_kickoff_returns = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayType'].str.contains('kickoff return', case=False)) &
                                                    (df_all_plays_in_game['TeamWithPossession'] == dict_team_name_to_acronym.get(team))]

  df_all_kickoff_returners = df_all_kickoff_returns['Returner'].unique().tolist()

  df_kickoff_returns_table = pd.DataFrame(columns=["RET", "AVG", "TD", "LNG"])

  for returner in df_all_kickoff_returners:
    # RET
    df_kickoff_returns_table.loc[returner, "RET"] = df_all_kickoff_returns.loc[df_all_kickoff_returns['Returner'] == returner].shape[0]
    # AVG
    df_kickoff_returns_table.loc[returner, "AVG"] = round(float(df_all_kickoff_returns['Yardage'].loc[df_all_kickoff_returns['Returner'] == returner].mean()), 2)
    # TD
    df_kickoff_returns_table.loc[returner, "TD"] = df_all_kickoff_returns.loc[(df_all_kickoff_returns['Returner'] == returner) &
                                                                              (df_all_kickoff_returns['PlayOutcome'].str.contains('touchdown', case=False))].shape[0]
    # LNG
    df_kickoff_returns_table.loc[returner, "LNG"] = int(df_all_kickoff_returns.loc[df_all_kickoff_returns['Returner'] == returner]['Yardage'].max())

  return df_kickoff_returns_table
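
Each table function repeats the same home/away selection. A minimal sketch of a shared helper is below. `select_teams` is a hypothetical name, and raising `ValueError` (instead of returning the message string, as the functions above do) is a design choice of this sketch so that a bad argument cannot be mistaken for a table.

```python
def select_teams(away_team, home_team, home_or_away):
    """Return (wanted_team, other_team) for home_or_away (home = 0, away = 1)."""
    if home_or_away == 0:
        return home_team, away_team
    if home_or_away == 1:
        return away_team, home_team
    raise ValueError("home_or_away parameter should be a 0 or 1. (home = 0, away = 1)")
```

A table function could then start with `team, other_team = select_teams(away_team, home_team, home_or_away)`.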

In [None]:
# Display PUNT RETURNS table

def punt_returns_table(away_team, home_team, df_cleaned_plays, dict_team_name_to_acronym, home_or_away):

  if home_or_away == 0:
    team = home_team
  elif home_or_away == 1:
    team = away_team
  else:
    return "home_or_away parameter should be a 0 or 1. (home = 0, away = 1)"

  print(f"Punt Returns Table: {team}")
  print()

  df_all_plays_in_game = df_cleaned_plays.loc[(df_cleaned_plays['HomeTeam'] == home_team) &
                                              (df_cleaned_plays['AwayTeam'] == away_team)]

  df_all_punt_returns = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayType'].str.contains('punt return', case=False)) &
                                                 (df_all_plays_in_game['TeamWithPossession'] == dict_team_name_to_acronym.get(team))]

  df_all_punt_returners = df_all_punt_returns['Returner'].unique().tolist()

  df_punt_returns_table = pd.DataFrame(columns=["RET", "AVG", "TD", "LNG"])

  # (2445, [2575, 2576])
  # (7:45) N.Cooney punts 51 yards to WAS 19, Center-A.Brewer
  # J.Crowder pushed ob at WAS 29 for 10 yards (V.Dimukeje)
  # PENALTY on WAS-J.Martin, Illegal Block Above the Waist, 10 yards, enforced at WAS 19.

  for returner in df_all_punt_returners:
    # RET
    df_punt_returns_table.loc[returner, "RET"] = df_all_punt_returns.loc[df_all_punt_returns['Returner'] == returner].shape[0]
    # AVG
    df_punt_returns_table.loc[returner, "AVG"] = round(float(df_all_punt_returns['Yardage'].loc[df_all_punt_returns['Returner'] == returner].mean()), 2)
    # TD
    df_punt_returns_table.loc[returner, "TD"] = df_all_punt_returns.loc[(df_all_punt_returns['Returner'] == returner) &
                                                                        (df_all_punt_returns['PlayOutcome'].str.contains('touchdown', case=False))].shape[0]
    # LNG
    df_punt_returns_table.loc[returner, "LNG"] = int(df_all_punt_returns.loc[df_all_punt_returns['Returner'] == returner]['Yardage'].max())

  return df_punt_returns_table
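
The kickoff and punt return tables compute the same four columns (RET, AVG, TD, LNG) per returner. A minimal sketch of that computation with a pandas `groupby` and named aggregation is below. `summarize_returns` is a hypothetical helper; it assumes the `Returner`, `Yardage`, and `PlayOutcome` columns used above and that `PlayOutcome` holds strings.

```python
import pandas as pd


def summarize_returns(df_returns):
    """Build a RET / AVG / TD / LNG table with one row per returner."""
    # Flag touchdown returns so they can be summed per returner.
    is_td = df_returns['PlayOutcome'].str.contains('touchdown', case=False)
    grouped = df_returns.assign(IsTD=is_td).groupby('Returner')
    table = grouped.agg(
        RET=('Yardage', 'count'),
        AVG=('Yardage', 'mean'),
        TD=('IsTD', 'sum'),
        LNG=('Yardage', 'max'),
    )
    table['AVG'] = table['AVG'].round(2)
    return table
```

The same helper could be applied to either `df_all_kickoff_returns` or `df_all_punt_returns` after the team filter.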

## Home and Away teams (Week 2, 2023)

In [None]:
# Season 2023 Week 2 schedule

df_2023_week2_schedule = df_week2_plays_cleaned[['HomeTeam', 'AwayTeam', 'Season', 'Date', 'Day']].drop_duplicates().sort_values(by='Date').reset_index(drop=True)

df_2023_week2_schedule

In [None]:
dict_teams = {
    'Cardinals': 'ARI', 'Falcons': 'ATL', 'Ravens': 'BAL', 'Bills': 'BUF', 'Panthers': 'CAR', 'Bears': 'CHI',
    'Bengals': 'CIN', 'Browns': 'CLE', 'Cowboys': 'DAL', 'Broncos': 'DEN', 'Lions': 'DET', 'Packers': 'GB',
    'Texans': 'HOU', 'Colts': 'IND', 'Jaguars': 'JAX', 'Chiefs': 'KC', 'Raiders': 'LV', 'Chargers': 'LAC',
    'Rams': 'LA', 'Dolphins': 'MIA', 'Vikings': 'MIN', 'Patriots': 'NE', 'Saints': 'NO', 'Giants': 'NYG',
    'Jets': 'NYJ', 'Eagles': 'PHI', 'Steelers': 'PIT', '49ers': 'SF', 'Seahawks': 'SEA', 'Buccaneers': 'TB',
    'Titans': 'TEN', 'Commanders': 'WAS'
  }

In [None]:
dict_teams_2 = {
    'ARI': 'Cardinals', 'ATL': 'Falcons', 'BAL': 'Ravens', 'BUF': 'Bills', 'CAR': 'Panthers', 'CHI': 'Bears',
    'CIN': 'Bengals', 'CLE': 'Browns', 'DAL': 'Cowboys', 'DEN': 'Broncos', 'DET': 'Lions', 'GB': 'Packers',
    'HOU': 'Texans', 'IND': 'Colts', 'JAX': 'Jaguars', 'KC': 'Chiefs', 'LV': 'Raiders', 'LAC': 'Chargers',
    'LA': 'Rams', 'MIA': 'Dolphins', 'MIN': 'Vikings', 'NE': 'Patriots', 'NO': 'Saints', 'NYG': 'Giants',
    'NYJ': 'Jets', 'PHI': 'Eagles', 'PIT': 'Steelers', 'SF': '49ers', 'SEA': 'Seahawks', 'TB': 'Buccaneers',
    'TEN': 'Titans', 'WAS': 'Commanders'
}
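
`dict_teams_2` is the inverse of `dict_teams`, and the two are maintained by hand. As a minimal sketch, the reverse mapping could instead be derived from the forward one, with a check that no acronym is repeated. `invert_mapping` is a hypothetical helper, and the three-team dictionary is a shortened stand-in for `dict_teams`.

```python
def invert_mapping(mapping):
    """Return {value: key}, refusing mappings whose values are not unique."""
    inverted = {value: key for key, value in mapping.items()}
    if len(inverted) != len(mapping):
        raise ValueError("mapping has duplicate values and cannot be inverted safely")
    return inverted


# Shortened stand-in for dict_teams (team name -> acronym).
example_name_to_acronym = {'Cardinals': 'ARI', 'Falcons': 'ATL', 'Ravens': 'BAL'}
example_acronym_to_name = invert_mapping(example_name_to_acronym)
```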

## Scoring Table

COLUMNS:
- Each quarter of the game

ROWS:
- Each team playing in the game

In [None]:
# Some games may not have every play recorded.
# (Week 1 2023, Game 1, 3rd quarter)
# - A field goal was supposed to be recorded after the interception touchdown but
#   was not.

game_num = 10

away_team = df_2023_week2_schedule['AwayTeam'].iloc[game_num]
home_team = df_2023_week2_schedule['HomeTeam'].iloc[game_num]

score_table(away_team, home_team, df_week2_plays_cleaned, dict_teams)

## AWAY STATS

In [None]:
passing_table(away_team, home_team, df_week2_plays_cleaned, dict_teams_2, 1)

In [None]:
rushing_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 1)

In [None]:
receiving_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 1)

In [None]:
fumble_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 1)

In [None]:
interception_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 1)

In [None]:
defense_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 1)

In [None]:
kicking_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 1)

In [None]:
punting_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 1)

In [None]:
kickoff_returns_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 1)

In [None]:
punt_returns_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 1)

## HOME STATS

In [None]:
passing_table(away_team, home_team, df_week2_plays_cleaned, dict_teams_2, 0)

In [None]:
rushing_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 0)

In [None]:
receiving_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 0)

In [None]:
fumble_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 0)

In [None]:
interception_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 0)

In [None]:
defense_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 0)

In [None]:
kicking_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 0)

In [None]:
punting_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 0)

In [None]:
kickoff_returns_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 0)

In [None]:
punt_returns_table(away_team, home_team, df_week2_plays_cleaned, dict_teams, 0)

# Microscope (player observations)

In [None]:
# every play with a player's name in it

df_all_plays_in_game = df_week2_plays_cleaned.loc[(df_week2_plays_cleaned['HomeTeam'] == home_team) &
                                                  (df_week2_plays_cleaned['AwayTeam'] == away_team)]

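# regex=False so the "." in the name is matched literally instead of as "any character"
df_plays = df_all_plays_in_game.loc[(df_all_plays_in_game['PlayDescription'].str.contains("J.Patterson", regex=False))]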

for idx, play in df_plays['PlayDescription'].items():
  print(idx)
  play_split = play.split(". ")
  for i in play_split:
    print(i)
  print()

# (940, [968, 969])
# (8:20) R.Wright punts 51 yards to PHI 8, Center-A.DePaola
# B.Covey to PHI 16 for 8 yards (T.Dye)
# FUMBLES (T.Dye), recovered by PHI-K.Ringo at PHI 10.