<a href="https://colab.research.google.com/github/Keoni808/NFL_Data_Cleaning/blob/main/NFL_Plays_Week1_2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

PURPOSE:
- To view a larger sample of plays.
  - A single game, which is what I am currently breaking down, does not contain enough data to correctly break down the play descriptions for every play type.

# MOUNTING AND IMPORTS

In [None]:
# Mount your Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Used to access personal google cloud services
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [None]:
# Installs
!pip install ipdb

Collecting ipdb
  Downloading ipdb-0.13.13-py3-none-any.whl.metadata (14 kB)
Collecting jedi>=0.16 (from ipython>=7.31.1->ipdb)
  Using cached jedi-0.19.1-py2.py3-none-any.whl.metadata (22 kB)
Downloading ipdb-0.13.13-py3-none-any.whl (12 kB)
Using cached jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: jedi, ipdb
Successfully installed ipdb-0.13.13 jedi-0.19.1


In [None]:
# Imports

# Data manipulation
import pandas as pd

# Regular expressions
import re

# Grab data from database
from google.cloud import bigquery

# Debugging
import ipdb

In [None]:
# Turning on automatic debugger
%pdb on

Automatic pdb calling has been turned ON


# LOADING DATA (BigQuery queries)

In [None]:
# Client connect to bigquery project
client = bigquery.Client('nfl-data-430702')

## Season 2023 Week 1

In [None]:
# Grabbing all plays from Week 1 of the 2023 season
week1_2023_plays_query = """
                         SELECT *
                         FROM `nfl-data-430702.NFL_Scores.NFL-Plays-Week1_2023`
                         """

# Running a dry-run (pseudo) query, which returns the number of bytes the real query would process
dry_run_config = bigquery.QueryJobConfig(dry_run=True)
dry_run_query = client.query(week1_2023_plays_query, job_config=dry_run_config)
print("This query will process {} bytes.".format(dry_run_query.total_bytes_processed))

# Running query (Being mindful of the amount of data being grabbed)
# The job will fail rather than bill more than 1 GB (10**9 bytes)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**9)
safe_config_query = client.query(week1_2023_plays_query, job_config=safe_config)

This query will process 570194 bytes.


In [None]:
# Putting data obtained from the query into a dataframe
week1_2023_plays = safe_config_query.to_dataframe()

In [None]:
week1_2023_plays.head()

Unnamed: 0,Season,Week,Day,Date,AwayTeam,HomeTeam,Quarter,DriveNumber,TeamWithPossession,IsScoringDrive,PlayNumberInDrive,IsScoringPlay,PlayOutcome,PlayDescription,PlayStart
0,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,1,0,Kickoff,G.Zuerlein kicks 65 yards from NYJ 35 to end z...,Kickoff from NYJ 35
1,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,2,0,7 Yard Pass,(15:00) (Shotgun) J.Allen pass short right to ...,1st & 10 at BUF 25
2,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,3,0,5 Yard Pass,"(14:34) (No Huddle, Shotgun) J.Allen pass shor...",2nd & 3 at BUF 32
3,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,4,0,3 Yard Run,(14:01) J.Cook up the middle to BUF 40 for 3 y...,1st & 10 at BUF 37
4,2023,Week 1,MON,09/11,Bills,Jets,1ST QUARTER,1,BUF,0,5,0,2 Yard Run,(13:24) (Shotgun) J.Cook up the middle to BUF ...,2nd & 7 at BUF 40


In [None]:
# Observation of the amount of data being worked on
week1_2023_plays.shape

(2600, 15)

# CATEGORIZE PLAYS
- The goal here is to parse out the different values for 'PlayOutcome'
  - separate pass / run / kickoff / etc.

## PARSING


In [None]:
# Maybe try to fuzzywuzzy this in the future?

# All play outcomes from Week 1
# - From here we can categorize and clean plays accordingly
week1_2023_plays['PlayOutcome'].unique()

array(['Kickoff', '7 Yard Pass', '5 Yard Pass', '3 Yard Run',
       '2 Yard Run', 'Pass Incomplete', 'Punt', '-5 Yard Penalty',
       '5 Yard Run', '1 Yard Pass', '14 Yard Run', '3 Yard Pass',
       '8 Yard Run', '6 Yard Pass', '15 Yard Pass', '-9 Yard Sack',
       '4 Yard Pass', '13 Yard Pass', 'Field Goal', '-2 Yard Sack',
       'Interception', '-5 Yard Run', '18 Yard Pass', '8 Yard Pass',
       '6 Yard Run', '12 Yard Run', '-1 Yard Run', '26 Yard Pass',
       'Touchdown Bills', 'Extra Point Good', '13 Yard Run',
       '-3 Yard Sack', '7 Yard Run', '9 Yard Pass', '4 Yard Run',
       'Fumble', '-10 Yard Penalty', '10 Yard Pass', '26 Yard Run',
       '5 Yard Penalty', '-10 Yard Sack', '22 Yard Pass', '-4 Yard Run',
       '-12 Yard Sack', '83 Yard Run', '1 Yard Run', '2 Yard Pass',
       '10 Yard Run', 'Run for No Gain', '12 Yard Pass', '20 Yard Pass',
       '9 Yard Run', '-2 Yard Pass', 'Sack', '24 Yard Pass',
       '14 Yard Pass', 'Touchdown Jets', '-3 Yard Run', '-2 Yar
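
Many of the outcomes above follow a "<yards> Yard <type>" shape, while others ('Kickoff', 'Run for No Gain', 'Touchdown Bills') carry no yardage prefix. A minimal sketch of separating the two parts is shown here; the helper name `split_outcome` and the pattern are assumptions based only on the values visible in this output, not something the notebook already defines.

```python
import re

# Assumed pattern, inferred from outcomes such as "7 Yard Pass" and "-9 Yard Sack".
YARDAGE_OUTCOME = re.compile(r"^(-?\d+) Yard (.+)$")

def split_outcome(outcome):
    """Return (yards, play_type); yards is None when there is no yardage prefix."""
    match = YARDAGE_OUTCOME.match(outcome)
    if match:
        return int(match.group(1)), match.group(2)
    return None, outcome

# Sample values copied from the unique() output above.
sample_outcomes = ["7 Yard Pass", "-9 Yard Sack", "Run for No Gain", "Touchdown Bills", "Kickoff"]
for outcome in sample_outcomes:
    print(outcome, "->", split_outcome(outcome))
```

Outcomes without a yardage prefix come back unchanged, so they can still be grouped by keyword afterwards.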

In [None]:
# Only pass and run plays are split out so far; the other Week 1 play types are still to do.

# Looking at all unique play outcomes and categorizing them.
# - This approach does not feel very flexible because a play outcome can
#   arise that has not been seen yet.
# - There may be more in the future when working on a full season, let alone all seasons and future games.
df_2023_pass_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Pass')]
df_2023_run_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Run')]

# df_2023_punt_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Punt')]
# df_2023_sack_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Sack')]
# df_2023_kickoff_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Kickoff')]
# df_2023_fumble_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Fumble')]
# df_2023_interception_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Interception')]
# df_2023_penalty_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Penalty')]
# df_2023_fieldgoal_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Field Goal')]
# df_2023_touchdown_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Touchdown')]
# df_2023_extrapoint_week1 = week1_2023_plays[week1_2023_plays['PlayOutcome'].str.contains('Extra Point')]

# plays_list = [df_2023_pass_week1,
#               df_2023_run_week1,
#               df_2023_punt_week1,
#               df_2023_sack_week1,
#               df_2023_kickoff_week1,
#               df_2023_fumble_week1,
#               df_2023_interception_week1,
#               df_2023_penalty_week1,
#               df_2023_fieldgoal_week1,
#               df_2023_touchdown_week1,
#               df_2023_extrapoint_week1]
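
One possible way to soften the flexibility concern noted in the cell above is to assign a single category per play and send anything unrecognized to an explicit fallback bucket, so unseen outcomes surface instead of being silently dropped. This is only a sketch: the keyword table, the `categorize_outcome` helper, the `PlayCategory` column, and the made-up outcome "Some Unseen Outcome" are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical keyword table; the first matching keyword wins, so order matters.
CATEGORY_KEYWORDS = [
    ("Sack", "sack"),
    ("Penalty", "penalty"),
    ("Pass", "pass"),
    ("Run", "run"),
    ("Punt", "punt"),
    ("Kickoff", "kickoff"),
]

def categorize_outcome(outcome):
    """Return the first matching category, or 'uncategorized' for unseen outcomes."""
    for keyword, category in CATEGORY_KEYWORDS:
        if keyword in outcome:
            return category
    return "uncategorized"

# Toy data: real outcome strings from Week 1 plus one invented value.
toy = pd.DataFrame({"PlayOutcome": ["7 Yard Pass", "Pass Incomplete", "3 Yard Run",
                                    "Run for No Gain", "-9 Yard Sack", "Some Unseen Outcome"]})
toy["PlayCategory"] = toy["PlayOutcome"].map(categorize_outcome)
print(toy)
```

Filtering on `PlayCategory == "uncategorized"` then lists exactly the outcomes that still need a rule.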

## SANITY CHECK (All Plays Accounted for)
- NOT COMPLETE
  - Still need to grab other play types
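
Once more play types are split out, the check could look roughly like the sketch below: compare the index labels of the category dataframes against the full set of plays and report anything missing or counted twice. The function name `check_all_plays_accounted_for` and the toy data are assumptions, and the sketch expects at least one category dataframe.

```python
import pandas as pd

def check_all_plays_accounted_for(df_all, category_frames):
    """Return (missing, duplicated) index labels across the category dataframes."""
    combined = pd.concat([frame.index.to_series() for frame in category_frames])
    duplicated = sorted(set(combined[combined.duplicated()]))
    missing = sorted(set(df_all.index) - set(combined))
    return missing, duplicated

# Toy data standing in for week1_2023_plays.
toy = pd.DataFrame({"PlayOutcome": ["7 Yard Pass", "3 Yard Run", "Punt", "Pass Incomplete"]})
passes = toy[toy["PlayOutcome"].str.contains("Pass")]
runs = toy[toy["PlayOutcome"].str.contains("Run")]

missing, duplicated = check_all_plays_accounted_for(toy, [passes, runs])
print("missing:", missing)
print("duplicated:", duplicated)
```

In the toy data the punt is reported as missing because no punt dataframe was passed in.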

## HELPER METHODS

In [None]:
# PURPOSE:
# - Quick look at a section of plays
#   - Ideally the plays that the user wants to break down and clean.
# INPUT PARAMETERS:
# df_all_plays      - DataFrame - The original dataframe where the desired plays to view came from
# df_section_plays  - DataFrame - A section of the original dataframe the user wants to view
# RETURN:
# - Printing to the console:
#   1. index of play
#   2. 'PlayDescription' feature of play
#   3. 'PlayOutcome' feature of play
def print_plays(df_all_plays, df_section_plays):
  for idx, value in df_section_plays['PlayOutcome'].items():
    print("index:" + str(idx))
    # idx is an index label from the section dataframe, so look it up by label
    play = df_all_plays.loc[idx, 'PlayDescription']
    print(play)
    print(value)
    print()

# Fumbled plays (Pass & Run)
- Only looking for fumbled plays

## Pass Fumble Plays

In [None]:
# THIS IS ONLY FOR PASSING RIGHT NOW.

# Regular expression used to grab QB only fumbles.
# Example - "(14:21) J.Love to CHI 44 for -3 yards"
# NOTE:
# - There are other plays that follow this format.
#   So far I have seen:
#   1. P.Campbell to NYG 33 for -2 yards
#      - Looks like an ordinary run play
qb_fumble = r"[A-Za-z]+\.[A-Za-z]+-?[A-Za-z]* to [A-Z]+ [0-9]+ for -?[0-9]+ yards$"

# Regular expression used for players who recovered the fumbled ball.
# Example: NYG-P.Campbell
fumble_recoverer = r"[A-Z]+-[A-Z]+\.[A-Za-z]+-?[A-Za-z]*"

# PURPOSE:
# - Extract fumble data from fumble plays.
#   - Goal is to strictly grab data that can only appear during fumbled plays,
#     the rest of the data will go down through the pipeline.

def extract_fumble_data(play):

  # Every action of the play is recorded into sentences that can be broken down.
  # - Goal is to strictly grab data that only appears during fumbled plays,
  #   the rest will go through the set play type pipeline.
  play_elements = play.split(". ")
  # Collecting fumble data in the exact order in which it happened.
  extracted_fumble_details = [None] * len(play_elements)
  push_back_to_pipeline = []
  # When traversing through each element, some elements will signal that
  # the next element is a detail exclusively found in fumble plays.
  automatic_fumble_detail_add = False

  for i in play_elements:
    if automatic_fumble_detail_add:
      extracted_fumble_details.pop(play_elements.index(i))
      extracted_fumble_details.insert(play_elements.index(i), i)
      automatic_fumble_detail_add = False
      continue
    else:
      # All plays added to this list, then shaved off if necessary.
      push_back_to_pipeline.append(i)

    # QB only fumbles
    # (e.g. '(14:21) J.Love to CHI 44 for -3 yards.')
    passer = re.findall(qb_fumble, i)
    if len(passer) == 1:
      # Wanted element (QB only fumble) does not:
      # 1. Follow a sentence stating that the ball has been fumbled.
      #    - In order to check the previous sentence, we must make sure there
      #      is a sentence there to check in the first place.
      if play_elements.index(i) > 0 and play_elements[play_elements.index(i)-1].find('FUMBLES') != -1:
        continue
      else:
        push_back_to_pipeline.pop(push_back_to_pipeline.index(i))
        extracted_fumble_details.pop(play_elements.index(i))
        extracted_fumble_details.insert(play_elements.index(i), i)

    # Fumble and recovery
    # If the person who recovered the ball then goes on to run the ball after,
    # their yardage gained from that run will be automatically added to extracted_fumble_details
    if i.find('FUMBLES') != -1:
      recoverer = re.findall(fumble_recoverer, i)
      if len(recoverer) > 0:
        player_who_recovered_ball = recoverer[0][recoverer[0].find("-") + 1:]
        try:
          if play_elements[play_elements.index(i)+1].find(player_who_recovered_ball) != -1:
            automatic_fumble_detail_add = True
        except IndexError:
          pass
      push_back_to_pipeline.pop(push_back_to_pipeline.index(i))
      extracted_fumble_details.pop(play_elements.index(i))
      extracted_fumble_details.insert(play_elements.index(i), i)

    # Reversed
    # If play has been reversed, the only offensive stats recorded are the
    # sentences that follow the play reversal.
    if i.find('REVERSED') != -1:
      for j in push_back_to_pipeline:
        extracted_fumble_details.pop(play_elements.index(j))
        extracted_fumble_details.insert(play_elements.index(j), j)
      push_back_to_pipeline.clear()

  return extracted_fumble_details, push_back_to_pipeline
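
As a quick check of what the two patterns pick up, the sketch below runs them against sentences taken from the Week 1 play descriptions. The patterns are repeated here as raw strings so the block runs on its own; the `sentences` list is only for illustration.

```python
import re

# Same patterns as in the cell above, written as raw strings.
qb_fumble = r"[A-Za-z]+\.[A-Za-z]+-?[A-Za-z]* to [A-Z]+ [0-9]+ for -?[0-9]+ yards$"
fumble_recoverer = r"[A-Z]+-[A-Z]+\.[A-Za-z]+-?[A-Za-z]*"

# Sentences copied from Week 1 play descriptions.
sentences = [
    "(14:21) J.Love to CHI 44 for -3 yards",
    "FUMBLES (E.Speed), RECOVERED by IND-E.Speed at IND 49",
    "J.Love pass deep left to L.Musgrave to CHI 4 for 37 yards (T.Stevenson) [D.Walker].",
]

for sentence in sentences:
    print(sentence)
    print("  qb_fumble:       ", re.findall(qb_fumble, sentence))
    print("  fumble_recoverer:", re.findall(fumble_recoverer, sentence))
```

The pass sentence does not match `qb_fumble` because the pattern is anchored with `$` and the sentence continues past "yards" with the tackler names.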

In [None]:
for idx, value in df_2023_pass_week1['PlayOutcome'].items():
  play = week1_2023_plays['PlayDescription'].loc[idx]
  if play.find('FUMBLES') != -1:
    fumble_details, main_play = extract_fumble_data(play)
    print(fumble_details)
    print(main_play)
    print(". ".join(main_play))
    print(value)
    print()

['(14:21) J.Love to CHI 44 for -3 yards', 'FUMBLES, and recovers at CHI 46', None]
['J.Love pass deep left to L.Musgrave to CHI 4 for 37 yards (T.Stevenson) [D.Walker].']
J.Love pass deep left to L.Musgrave to CHI 4 for 37 yards (T.Stevenson) [D.Walker].
37 Yard Pass

['(14:15) T.Lawrence pass short right to C.Ridley to JAX 47 for 14 yards (R.Thomas, E.Speed)', 'FUMBLES (E.Speed), RECOVERED by IND-E.Speed at IND 49', 'E.Speed ran ob at IND 49 for no gain', 'The Replay Official reviewed the ball was inbounds ruling, and the play was REVERSED', None, 'FUMBLES (E.Speed), ball out of bounds at IND 49', None, None]
['T.Lawrence pass short right to C.Ridley to JAX 47 for 14 yards (R.Thomas, E.Speed)', 'IND-K.Moore was injured during the play', 'IND-D.Flowers was injured during the play.']
T.Lawrence pass short right to C.Ridley to JAX 47 for 14 yards (R.Thomas, E.Speed). IND-K.Moore was injured during the play. IND-D.Flowers was injured during the play.
14 Yard Pass

[None, 'FUMBLES (B.Okere

In [None]:
for idx, value in df_2023_pass_week1['PlayOutcome'].items():
  play = week1_2023_plays['PlayDescription'].loc[idx]
  if play.find('FUMBLES') != -1:
    print("index:" + str(idx))
    fumble_play_elements = play.split(". ")
    for i in fumble_play_elements:
      print(i)
    # print(play)
    print(value)
    print()

index:213
(14:21) J.Love to CHI 44 for -3 yards
FUMBLES, and recovers at CHI 46
J.Love pass deep left to L.Musgrave to CHI 4 for 37 yards (T.Stevenson) [D.Walker].
37 Yard Pass

index:423
(14:15) T.Lawrence pass short right to C.Ridley to JAX 47 for 14 yards (R.Thomas, E.Speed)
FUMBLES (E.Speed), RECOVERED by IND-E.Speed at IND 49
E.Speed ran ob at IND 49 for no gain
The Replay Official reviewed the ball was inbounds ruling, and the play was REVERSED
T.Lawrence pass short right to C.Ridley to JAX 47 for 14 yards (R.Thomas, E.Speed)
FUMBLES (E.Speed), ball out of bounds at IND 49
IND-K.Moore was injured during the play
IND-D.Flowers was injured during the play.
14 Yard Pass

index:872
(11:26) (Shotgun) D.Prescott pass short right to T.Pollard to NYG 12 for 7 yards (B.Okereke)
FUMBLES (B.Okereke), recovered by DAL-T.Biadasz at NYG 4.
7 Yard Pass

index:961
(4:45) (Shotgun) D.Jones pass short left to M.Breida to NYG 43 for 5 yards (M.Bell)
FUMBLES (M.Bell), recovered by NYG-P.Campbell at 

## Run Fumble Plays

In [None]:
##############################
# MAIN VARIABLES TO FOCUS ON #
##############################

# extracted_fumble_details
# - Extra data that is not practical enough to have its own feature. Although
#   this data is useful, it is not efficient to have columns for the data
#   that is only available on fumbled plays.
#   - So all this data will be within a list contained within a single feature
#     'extracted_fumble_details'

# push_back_to_pipeline
# - Because this method will serve as a helper method to another, specifically
#   handling fumbled run plays within the set of all run plays, I would like to
#   reduce redundancy by pushing all data that can be cleaned by the main method
#   to that method.
#   - This helper method is simply extracting data that cannot be cleaned within the
#     main method out, packaging and storing it, then sending the rest to the
#     main method.

###################
# DESIGN THOUGHTS #
###################

# - I need the ability to add multiple rows for a single play.
#   When a fumble occurs, there are multiple plays within a single play.
#   1. There is the original play
#      - Currently focusing on run plays
#   2. There is the play after the fumble recovery
#      - Could just be recovered (If so, nothing will follow the fumble action description)
#      - Could have been picked up and rushed for X amount of yards.
# - I have decided that I will collect all data from everyone who
#   touches the ball and carries it.
#   - This means that even if a defender were to recover a fumbled ball
#     the defender will have rushing yards.

##########
# DESIGN #
##########

# Idea #1
# 1. Grab all data that is needed for the additional row.
# 2. Insert row following the initial play
#    - This is tricky because each play has its own index given from
#      the original set of plays. (Set of plays could be from a single game to a season of games)
#    2.1 Split the dataframe in half, the first half ending with the
#        fumbled play and the second half containing the rest of the plays after.
#    2.2 Insert the post fumble play row of data
#        - I would have to identify that this playtype is 'post fumble recovery'.
#          Anything that will signify that this play is occurring after a fumble.
#    2.3 Reindex the entire dataframe.

#    - This is a semi broad solution?
#      - What I am worried about is how exactly to split the entire dataframe at a specific point
#        when I am working on individual slices of the dataframe?
#        1. There is the original dataframe containing all plays.
#        2. Plays are then split up based on what type of 'PlayOutcome' it has
#           (e.g. pass, run, touchdown, punt, sack, penalty etc.)
#        3. Each of these splits are dataframes with their original indexes attached to each row.
#        4. If I have to add an additional row, I would need to
#           4.1 go back to the original dataframe, or a copy of the dataframe to preserve the original
#           4.2 split that dataframe where needed
#           4.3 to add a new row (that I think at this point might be blank)
#           4.4 parse out the plays again (this is needed to maintain original order)
#               - Without this part, indexes would mix? (Can't just add 1 to the original play index to create a new row
#                 because that original play index + 1 is already taken by another play)
#           4.5 Somehow pick up where it was left off?
#               - Will the original dataframe contain filled in data..?
#               - I think that a good idea here would be to create a copy of the original dataframe that has the
#                 new added features to hold the broken down data.
#                 - Before the cleaning starts, the new features will all have blank entries.
#                   - Each row's new features will be filled once it is its turn to be cleaned.
#                     - Why I am saying this is because I want to use this copy to split and add a row
#                       and I think it may be possible this way.
#                     - What I am thinking about now is the problem with indexes. If I were to split
#                       the entire dataframe into parsed out smaller dataframes, each row of each dataframe
#                       containing their unique index from the original dataframe, adding a row anywhere
#                       within these smaller dataframes would ruin the indexing of all plays that are indexed
#                       after.
#                       - This is a problem because you would no longer be able to access plays accurately
#                         within the original dataframe. The indexing would not match up.
#                     - To solve this issue, maybe I can:
#                       1. Solve one playtype at a time.
#                          1.1 Look for a single type of play (e.g. run)
#                          1.2 Clean all of chosen single plays
#                              - If rows are added, reindex after.
#                       2. Search for the next type of play
#                          - Continue this process until every type of play has been cleaned
#                            - This will solve the indexing issue between different types of plays
#                              because those indexes will not be located until after the additional rows have been added.
#                              - There is still the issue with indexing within a specific type of play.
#                                - I need to remember that these methods are filling in features within the new dataframe.
#                                  - When cleaning any type of play, the indexing will always be from lowest to highest.
#                                    so if we were to come across a play that needs an extra row, we can be assured that
#                                    we will be able to continue from there on after.
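
One option that may sidestep the split-and-reindex problem described above is to keep the original index as an ordinary column and add a second sort key for rows created by a fumble. New rows can then be appended anywhere and sorted into place. This is a sketch of that idea, not the approach the notebook settles on; the column names `OriginalIndex` and `PlayPart` and the toy rows are assumptions.

```python
import pandas as pd

# Toy stand-in for the full dataframe of plays.
plays = pd.DataFrame({"PlayDescription": ["play A", "play B (fumble)", "play C"]})

# Keep the original label as a column so it survives any later reindexing.
plays = plays.rename_axis("OriginalIndex").reset_index()
plays["PlayPart"] = 0  # 0 = the play itself, 1+ = rows added for post-fumble action

# Hypothetical extra row describing what happened after the fumble on original play 1.
extra = pd.DataFrame({"OriginalIndex": [1],
                      "PlayDescription": ["post-fumble recovery"],
                      "PlayPart": [1]})

# Append the new row, then sort it into place directly after its parent play.
combined = (pd.concat([plays, extra], ignore_index=True)
              .sort_values(["OriginalIndex", "PlayPart"])
              .reset_index(drop=True))
print(combined)
```

Because `OriginalIndex` never changes, slices made by play type can still be matched back to the source rows even after extra rows are added.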

In [None]:
# All fumbled run plays and their play outcomes

for idx, value in df_2023_run_week1['PlayOutcome'].items():
  play = week1_2023_plays['PlayDescription'].loc[idx]
  if play.find('FUMBLES') != -1:
    print("index:" + str(idx))
    fumble_play_elements = play.split(". ")
    for i in fumble_play_elements:
      print(i)
    print(value)
    print()

index:115
(9:54) Bre.Hall left end to BUF 22 for -1 yards (G.Rousseau)
FUMBLES (G.Rousseau), ball out of bounds at BUF 25.
-4 Yard Run

index:230
(2:08) S.Clifford FUMBLES (Aborted) at CHI 35, and recovers at CHI 35.
Run for No Gain

index:756
(6:44) (Shotgun) J.Goff Aborted
F.Ragnow FUMBLES at KC 24, recovered by DET-J.Goff at KC 27
J.Goff to KC 27 for no gain (G.Karlaftis).
Run for No Gain

index:826
(8:53) (Shotgun) D.Jones Aborted
J.Schmitz FUMBLES at DAL 18, recovered by NYG-D.Jones at DAL 27.
Run for No Gain

index:933
(9:27) (Shotgun) D.Jones FUMBLES (Aborted) at NYG 30, and recovers at NYG 30
D.Jones to NYG 32 for 2 yards (M.Smith).
Run for No Gain

index:1015
(6:33) (No Huddle, Shotgun) L.Jackson scrambles right end to HOU 20 for 6 yards (T.Thomas)
FUMBLES (T.Thomas), recovered by BAL-K.Zeitler at HOU 23
HOU-H.Ridgeway was injured during the play.
3 Yard Run

index:1214
(1:39) J.Williams right tackle to TEN 9 for 11 yards (K.Byard, S.Murphy-Bunting)
FUMBLES (S.Murphy-Bunting),

In [None]:
# NOTES:

# - Need to add a continue after every check to be sure that a single sentence
#   does not have multiple categories it can fit into.

# I am considering just adding all plays into the regular stream of cleaning.
# By all plays I mean when someone either runs or passes the ball, nothing in between.

# How would I handle this situation?
# (10:16) (Shotgun) B.Young to ATL 38 for -5 yards
# FUMBLES, and recovers at ATL 36
# B.Young to ATL 36 for no gain (T.Andersen).
# -3 Yard Run

# - Would I end up having to create 2 separate rows?

# Maybe, when this method has been reached, the main dataframe can take out a row and hand it over here.
# in return, we can return 1+ rows to piece back into the original dataframe.

# I need to take a step back and try to visualize what exactly I am trying to do here.
# I would like to:
# 1. create a pipeline that you will be able to input unclean raw data from NFL_Scraper to
#    and have it output cleaned data.
#    - I don't know how I am going to do this.
#      - Maybe I will separate all different types of plays into their own categories
#      - Once the plays are in their own categories, I can create methods to clean
#        each category.
#        - What I need to be able to do is to add new rows of data,
#          these new rows need to run flush with the rest of the dataframe.

qb_fumble = r"[A-Za-z]+\.[A-Za-z]+-?[A-Za-z]* to [A-Z]+ [0-9]+ for -?[0-9]+ yards$"

fumble_recoverer = r"[A-Z]+-[A-Z]+\.[A-Za-z]+-?[A-Za-z]*"

def extract_fumble_data_run(play):
  play_elements = play.split(". ")
  extracted_fumble_details = [None] * len(play_elements)
  push_back_to_pipeline = []
  automatic_fumble_detail_add = False

  for i in play_elements:
    if automatic_fumble_detail_add:
      extracted_fumble_details.pop(play_elements.index(i))
      extracted_fumble_details.insert(play_elements.index(i), i)
      automatic_fumble_detail_add = False
      continue
    else:
      push_back_to_pipeline.append(i)

    passer = re.findall(qb_fumble, i)
    if len(passer) == 1:
      if play_elements.index(i) > 0 and play_elements[play_elements.index(i)-1].find('FUMBLES') != -1:
        continue
      else:
        push_back_to_pipeline.pop(push_back_to_pipeline.index(i))
        extracted_fumble_details.pop(play_elements.index(i))
        extracted_fumble_details.insert(play_elements.index(i), i)

    if i.find('Aborted') != -1:
      push_back_to_pipeline.pop(push_back_to_pipeline.index(i))
      extracted_fumble_details.pop(play_elements.index(i))
      extracted_fumble_details.insert(play_elements.index(i), i)
      continue

    # I may have to change this to add some plays that do count towards yards gained
    # after fumble recovery.
    if i.find('FUMBLES') != -1:
      recoverer = re.findall(fumble_recoverer, i)
      if len(recoverer) > 0:
        player_who_recovered_ball = recoverer[0][recoverer[0].find("-") + 1:]
        try:
          if play_elements[play_elements.index(i)+1].find(player_who_recovered_ball) != -1:
            automatic_fumble_detail_add = True
        except IndexError:
          pass
      push_back_to_pipeline.pop(push_back_to_pipeline.index(i))
      extracted_fumble_details.pop(play_elements.index(i))
      extracted_fumble_details.insert(play_elements.index(i), i)

    if i.find('REVERSED') != -1:
      for j in push_back_to_pipeline:
        extracted_fumble_details.pop(play_elements.index(j))
        extracted_fumble_details.insert(play_elements.index(j), j)
      push_back_to_pipeline.clear()

  return extracted_fumble_details, push_back_to_pipeline
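
The note about a sentence fitting more than one category could be handled with a single `if`/`elif` chain, so each sentence receives exactly one label. The sketch below is a simplified alternative, not a replacement for `extract_fumble_data_run`: the label names and `label_sentences` are assumptions, the check order mirrors the function above for 'Aborted' before 'FUMBLES', and it uses `enumerate` for positions so repeated sentences cannot be confused. The sample play is the B.Young play from the notes, rebuilt by joining its sentences with ". ".

```python
import re

QB_FUMBLE = re.compile(r"[A-Za-z]+\.[A-Za-z]+-?[A-Za-z]* to [A-Z]+ [0-9]+ for -?[0-9]+ yards$")

def label_sentences(play):
    """Give every sentence of a play description exactly one label."""
    sentences = play.split(". ")
    labels = []
    for position, sentence in enumerate(sentences):
        previous = sentences[position - 1] if position > 0 else ""
        if "Aborted" in sentence:
            label = "aborted"
        elif "FUMBLES" in sentence:
            label = "fumble"
        elif "REVERSED" in sentence:
            label = "reversed"
        elif QB_FUMBLE.search(sentence) and "FUMBLES" not in previous:
            label = "qb_fumble"
        else:
            label = "main_play"
        labels.append((position, label))
    return labels

# Play rebuilt from the sentences quoted in the notes above.
b_young = ("(10:16) (Shotgun) B.Young to ATL 38 for -5 yards. "
           "FUMBLES, and recovers at ATL 36. "
           "B.Young to ATL 36 for no gain (T.Andersen).")
print(label_sentences(b_young))
```

With one label per sentence, deciding which sentences go back to the main pipeline becomes a filter on the label rather than a series of list pops.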

In [None]:
for idx, value in df_2023_run_week1['PlayOutcome'].items():
  play = week1_2023_plays['PlayDescription'].loc[idx]
  if play.find('FUMBLES') != -1:
    fumble_details, main_play = extract_fumble_data_run(play)
    fumble_play_elements = play.split(". ")
    for i in fumble_play_elements:
      print(i)
    print("BREAKDOWN")
    print(fumble_details)
    print(main_play)
    print(value)
    print()

(9:54) Bre.Hall left end to BUF 22 for -1 yards (G.Rousseau)
FUMBLES (G.Rousseau), ball out of bounds at BUF 25.
BREAKDOWN
[None, 'FUMBLES (G.Rousseau), ball out of bounds at BUF 25.']
['(9:54) Bre.Hall left end to BUF 22 for -1 yards (G.Rousseau)']
-4 Yard Run

(2:08) S.Clifford FUMBLES (Aborted) at CHI 35, and recovers at CHI 35.
BREAKDOWN
['(2:08) S.Clifford FUMBLES (Aborted) at CHI 35, and recovers at CHI 35.']
[]
Run for No Gain

(6:44) (Shotgun) J.Goff Aborted
F.Ragnow FUMBLES at KC 24, recovered by DET-J.Goff at KC 27
J.Goff to KC 27 for no gain (G.Karlaftis).
BREAKDOWN
['(6:44) (Shotgun) J.Goff Aborted', 'F.Ragnow FUMBLES at KC 24, recovered by DET-J.Goff at KC 27', 'J.Goff to KC 27 for no gain (G.Karlaftis).']
[]
Run for No Gain

(8:53) (Shotgun) D.Jones Aborted
J.Schmitz FUMBLES at DAL 18, recovered by NYG-D.Jones at DAL 27.
BREAKDOWN
['(8:53) (Shotgun) D.Jones Aborted', 'J.Schmitz FUMBLES at DAL 18, recovered by NYG-D.Jones at DAL 27.']
[]
Run for No Gain

(9:27) (Shotgun) D

In [None]:
for idx, value in df_2023_run_week1['PlayOutcome'].items():
  play = week1_2023_plays['PlayDescription'].loc[idx]
  if play.find('Aborted') != -1:
    print("index:" + str(idx))
    fumble_play_elements = play.split(". ")
    for i in fumble_play_elements:
      print(i)
    # print(play)
    print(value)
    print()

index:230
(2:08) S.Clifford FUMBLES (Aborted) at CHI 35, and recovers at CHI 35.
Run for No Gain

index:756
(6:44) (Shotgun) J.Goff Aborted
F.Ragnow FUMBLES at KC 24, recovered by DET-J.Goff at KC 27
J.Goff to KC 27 for no gain (G.Karlaftis).
Run for No Gain

index:826
(8:53) (Shotgun) D.Jones Aborted
J.Schmitz FUMBLES at DAL 18, recovered by NYG-D.Jones at DAL 27.
Run for No Gain

index:933
(9:27) (Shotgun) D.Jones FUMBLES (Aborted) at NYG 30, and recovers at NYG 30
D.Jones to NYG 32 for 2 yards (M.Smith).
Run for No Gain

index:1343
(3:02) T.Munford reported in as eligible
 J.Garoppolo FUMBLES (Aborted) at DEN 1, and recovers at DEN 1.
Run for No Gain

index:1921
(13:56) (Shotgun) T.Tagovailoa FUMBLES (Aborted) at MIA 20, touched at MIA 20, and recovers at MIA 20.
Run for No Gain



In [None]:
# Full row for the play at index 933

week1_2023_plays.iloc[933]

Unnamed: 0,933
Season,2023
Week,Week 1
Day,SUN
Date,09/10
AwayTeam,Cowboys
HomeTeam,Giants
Quarter,3RD QUARTER
DriveNumber,2
TeamWithPossession,NYG
IsScoringDrive,0
