# NFL BIG DATA BOWL 2021


**Background**
![image.png](attachment:image.png)


The NFL has again launched its Kaggle competition, this year called the NFL Big Data Bowl 2021, using Next Gen Stats data.

Every NFL player on the field (on both offense and defense) is tracked during live plays, capturing real-time location, speed and acceleration.

Sensors throughout the stadium track radio-frequency identification (RFID) tags placed in the players' shoulder pads, charting their movements with high accuracy.

The actual football is also tracked as an independent entity, opening up intriguing possibilities for analysis...

All real-time data is processed on Amazon Web Services (AWS) infrastructure that includes machine learning, and the results are streamed to the NFL stats page. Data scientists call this kind of end-to-end flow a data pipeline.

RFID tags: Designed by Zebra Technologies (the NFL's official on-field player-tracking technology partner), each tag consists of an integrated circuit (IC) attached to an antenna (typically a small coil of wire), plus protective packaging suited to the application. These tags are active, not passive.

These real-time player stats create a deeper fan experience, and the NFL deserves credit for choosing to push the boundaries of analytics.


# NFL TEAMS
![image.png](attachment:image.png)



As a point of reference, an NFL game has historically had about 150 plays in total. Here we examine ONLY passing plays, so the per-game play count will be lower.

In 2018 the Raiders were based in Oakland, hence the code OAK in our datasets; they later moved to Las Vegas and are now known as LV.
Careful: there are two Los Angeles teams (the Los Angeles Chargers, LAC, and the Los Angeles Rams, LA).

READ: In a regular season each team plays 16 games over 17 weeks, with one bye week per team. Since every game involves two of the league's 32 teams, the season should contain 32 × 16 / 2 = 256 games.

Our dataset contains only 253 games, so 3 games are missing.

Why is this important? Each missing game removes data for two teams, so 6 teams will have slightly lower offensive stats, defensive stats, etc. Missing 1 of 16 games is a 1/16 ≈ 6.25% shortfall for each of those teams, which is very much worth keeping in mind when comparing raw counts.

The teams missing a game: SEA, DEN, KC, LAC, MIN, SF.
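The 256-game arithmetic and the list of short teams can be checked directly from games.csv by counting each team's home and away appearances. A minimal sketch, assuming the `homeTeamAbbr` and `visitorTeamAbbr` columns from the data dictionary; the helper names `games_per_team` and `short_teams` are hypothetical and the toy frame is made up.

```python
import pandas as pd

def games_per_team(games: pd.DataFrame) -> pd.Series:
    """Count how many games each team appears in, as home team or visitor."""
    appearances = pd.concat([games["homeTeamAbbr"], games["visitorTeamAbbr"]])
    return appearances.value_counts().sort_index()

def short_teams(games: pd.DataFrame, expected: int = 16) -> list:
    """Return the teams that appear in fewer than `expected` games."""
    counts = games_per_team(games)
    return sorted(counts[counts < expected].index)

# 32 teams x 16 games each / 2 teams per game = 256 scheduled games
assert 32 * 16 // 2 == 256

# Toy example with expected=2: teams C and D appear only once
toy = pd.DataFrame({
    "homeTeamAbbr":    ["A", "B", "C"],
    "visitorTeamAbbr": ["B", "A", "D"],
})
print(short_teams(toy, expected=2))  # ['C', 'D']
```

On the real data, `short_teams(games)` should return the six teams listed above if the count of 253 games is right.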

The 2018 NFL season was the 99th season of the National Football League (NFL). The Philadelphia Eagles entered it as the defending Super Bowl champions from the 2017 season. In Super Bowl LIII, the AFC champion New England Patriots beat the NFC champion Los Angeles Rams 13–3 for their sixth Super Bowl championship and their third title in five years.


# Dataset
1. Game data: The games.csv contains the teams playing in each game. The key variable is gameId.
1. Player data: The players.csv file contains player-level information from players that participated in any of the tracking data files. The key variable is nflId.
1. Play data: The plays.csv file contains play-level information from each game. The key variables are gameId and playId.
1. Tracking data: Files week[week].csv contain player tracking data from all games in week [week]. The key variables are gameId, playId, and nflId. There are 17 weeks to a typical NFL Regular Season, and thus 17 data frames with player tracking data are provided.
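The key variables above determine how the files join. Because playId is not unique across games, tracking rows must be matched to plays on both gameId and playId. A minimal sketch with made-up toy frames standing in for the four files; the column subsets are chosen only for illustration.

```python
import pandas as pd

# Made-up stand-ins for games.csv, plays.csv, players.csv and one week[week].csv
games = pd.DataFrame({"gameId": [1, 2], "week": [1, 1]})
plays = pd.DataFrame({"gameId": [1, 2], "playId": [75, 75], "passResult": ["C", "I"]})
players = pd.DataFrame({"nflId": [100, 200], "collegeName": ["Alabama", "Ohio State"]})
tracking = pd.DataFrame({
    "gameId": [1, 1, 2],
    "playId": [75, 75, 75],
    "nflId": [100, 200, 100],
    "frameId": [1, 1, 1],
})

# playId 75 appears in both games, so plays are joined on BOTH keys
full = tracking.merge(plays, on=["gameId", "playId"], how="left", validate="many_to_one")
# Game-level and player-level attributes attach through their own keys
full = full.merge(games, on="gameId", how="left", validate="many_to_one")
full = full.merge(players, on="nflId", how="left", validate="many_to_one")
print(full)
```

The `validate="many_to_one"` checks make pandas raise an error if a key is unexpectedly duplicated on the right-hand side, which is exactly what would happen if plays were joined on playId alone.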

**Game**

* gameId: Game identifier, unique (numeric)
* gameDate: Game date (time, mm/dd/yyyy)
* gameTimeEastern: Start time of game (time, HH:MM:SS, EST)
* homeTeamAbbr: Home team three-letter code (text)
* visitorTeamAbbr: Visiting team three-letter code (text)
* week: Week of game (numeric)

**Players**

* nflId: Player identification number, unique across players (numeric)
* height: Player height (text)
* weight: Player weight (numeric)
* birthDate: Date of birth (YYYY-MM-DD)
* collegeName: Player college (text)
* position: Player position (text)
* displayName: Player name (text)

Number of college-level football teams in the United States: 774

Number of Division I college-level football teams in the United States: 130

Number of college-level football players: 73,557

Number of college-level football players that are NFL draft eligible: 16,346

Number of high school level football players: 1,036,842

That is not a typo: over 1 million high school students play football.

Football is the most popular sport in America, a country with a population of over 327 million.

If you are a physical outlier, you may be able to reach the NFL from a Division II college, but the usual path runs through Division I.

Probability of getting into a Division I college football program from high school: 2.8%

Probability of getting into the NFL from college football program: 1.6%

Total number of college players drafted into the NFL last year: 256
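The 1.6% figure above appears consistent with dividing the number of drafted players by the number of draft-eligible players. A quick arithmetic check using only the two counts listed here:

```python
drafted = 256
draft_eligible = 16_346

# Share of draft-eligible college players who were drafted
rate = drafted / draft_eligible
print(f"{rate:.1%}")  # 1.6%
```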

**Plays**

* gameId: Game identifier, unique (numeric)
* playId: Play identifier, not unique across games (numeric)
* playDescription: Description of play (text)
* quarter: Game quarter (numeric)
* down: Down (numeric)
* yardsToGo: Distance needed for a first down (numeric)
* possessionTeam: Team on offense (text)
* playType: Outcome of dropback: sack or pass (text)
* yardlineSide: 3-letter team code corresponding to line-of-scrimmage (text)
* yardlineNumber: Yard line at line-of-scrimmage (numeric)
* offenseFormation: Formation used by possession team (text)
* personnelO: Personnel used by offensive team (text)
* defendersInTheBox: Number of defenders in close proximity to line-of-scrimmage (numeric)
* numberOfPassRushers: Number of pass rushers (numeric)
* personnelD: Personnel used by defensive team (text)
* typeDropback: Dropback categorization of quarterback (text)
* preSnapHomeScore: Home score prior to the play (numeric)
* preSnapVisitorScore: Visiting team score prior to the play (numeric)
* gameClock: Time on clock of play (MM:SS)
* absoluteYardlineNumber: Distance from end zone for possession team (numeric)
* penaltyCodes: NFL categorization of the penalties that occurred on the play. For purposes of this contest, the most important penalties are Defensive Pass Interference (DPI), Offensive Pass Interference (OPI), Illegal Contact (ICT), and Defensive Holding (DH). Multiple penalties on a play are separated by a ; (text)
* penaltyJerseyNumber: Jersey number and team code of the player committing each penalty. Multiple penalties on a play are separated by a ; (text)
* passResult: Outcome of the passing play (C: Complete pass, I: Incomplete pass, S: Quarterback sack, IN: Intercepted pass, text)
* offensePlayResult: Yards gained by the offense, excluding penalty yardage (numeric)
* playResult: Net yards gained by the offense, including penalty yardage (numeric)
* epa: Expected points added on the play, relative to the offensive team. Expected points is a metric that estimates the average of every next scoring outcome given the play's down, distance, yardline, and time remaining (numeric)
* isDefensivePI: An indicator variable for whether or not a DPI penalty occurred on a given play (TRUE/FALSE)

**Tracking data (week)**

* Each of the 17 week[week].csv files contains player tracking data from all passing plays during Week [week] of the 2018 regular season. Nearly all plays from each [gameId] are included; certain plays or games with insufficient data are dropped. Each team and player plays at most one game in a given week.

* time: Time stamp of play (time, yyyy-mm-dd, hh:mm:ss)

* x: Player position along the long axis of the field, 0 - 120 yards. See Figure 1 below. (numeric)

* y: Player position along the short axis of the field, 0 - 53.3 yards. See Figure 1 below. (numeric)

* s: Speed in yards/second (numeric)

* a: Acceleration in yards/second^2 (numeric)

* dis: Distance traveled from prior time point, in yards (numeric)

* o: Player orientation (deg), 0 - 360 degrees (numeric)

* dir: Angle of player motion (deg), 0 - 360 degrees (numeric)

* event: Tagged play details, including moment of ball snap, pass release, pass catch, tackle, etc (text)

* nflId: Player identification number, unique across players (numeric)

* displayName: Player name (text)

* jerseyNumber: Jersey number of player (numeric)

* position: Player position group (text)

* team: Team (away or home) of corresponding player (text)

* frameId: Frame identifier for each play, starting at 1 (numeric)

* gameId: Game identifier, unique (numeric)

* playId: Play identifier, not unique across games (numeric)

* playDirection: Direction that the offense is moving (text, left or right)

* route: Route run by offensive player (text)
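Because playDirection differs from play to play, a common preprocessing step in Big Data Bowl work (not part of this notebook) is to flip left-moving plays so every offense moves toward increasing x. A minimal sketch, assuming that rotating the field by 180 degrees is the right transform for `x`, `y`, `o` and `dir`; the function name `normalize_direction` is hypothetical and the toy frame is made up.

```python
import pandas as pd

FIELD_LENGTH = 120.0       # yards, including both end zones
FIELD_WIDTH = 160.0 / 3.0  # 53.3 yards

def normalize_direction(df: pd.DataFrame) -> pd.DataFrame:
    """Rotate plays moving left by 180 degrees so all offenses move right."""
    out = df.copy()
    left = out["playDirection"] == "left"
    out.loc[left, "x"] = FIELD_LENGTH - out.loc[left, "x"]
    out.loc[left, "y"] = FIELD_WIDTH - out.loc[left, "y"]
    # A 180-degree rotation of the field shifts every angle by 180 degrees
    # (missing angles, e.g. for the football, stay missing)
    for col in ("o", "dir"):
        out.loc[left, col] = (out.loc[left, col] + 180) % 360
    return out

frames = pd.DataFrame({
    "x": [10.0, 10.0], "y": [20.0, 20.0],
    "o": [90.0, 270.0], "dir": [0.0, 350.0],
    "playDirection": ["right", "left"],
})
print(normalize_direction(frames))
```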

# Exploratory Data Analysis

In [None]:
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
import statistics 
import random
import numpy as np
import operator
import matplotlib
matplotlib.use("Agg")
from matplotlib.animation import FFMpegWriter
from IPython.display import Video, Image

* Reading week tracking data 

In [None]:
week1=pd.read_csv("../input/nfl-big-data-bowl-2021/week1.csv")
week2=pd.read_csv("../input/nfl-big-data-bowl-2021/week2.csv")
week3=pd.read_csv("../input/nfl-big-data-bowl-2021/week3.csv")
week4=pd.read_csv("../input/nfl-big-data-bowl-2021/week4.csv")
week5=pd.read_csv("../input/nfl-big-data-bowl-2021/week5.csv")
week6=pd.read_csv("../input/nfl-big-data-bowl-2021/week6.csv")
week7=pd.read_csv("../input/nfl-big-data-bowl-2021/week7.csv")
week8=pd.read_csv("../input/nfl-big-data-bowl-2021/week8.csv")
week9=pd.read_csv("../input/nfl-big-data-bowl-2021/week9.csv")
week10=pd.read_csv("../input/nfl-big-data-bowl-2021/week10.csv")
week11=pd.read_csv("../input/nfl-big-data-bowl-2021/week11.csv")
week12=pd.read_csv("../input/nfl-big-data-bowl-2021/week12.csv")
week13=pd.read_csv("../input/nfl-big-data-bowl-2021/week13.csv")
week14=pd.read_csv("../input/nfl-big-data-bowl-2021/week14.csv")
week15=pd.read_csv("../input/nfl-big-data-bowl-2021/week15.csv")
week16=pd.read_csv("../input/nfl-big-data-bowl-2021/week16.csv")
week17=pd.read_csv("../input/nfl-big-data-bowl-2021/week17.csv")

* Reading games data 

In [None]:
games=pd.read_csv("../input/nfl-big-data-bowl-2021/games.csv")
games.head(100)

* Reading players data

In [None]:
players=pd.read_csv("../input/nfl-big-data-bowl-2021/players.csv")
players.head(100)

In [None]:
players.head()

* Reading plays data 

In [None]:
plays=pd.read_csv('../input/nfl-big-data-bowl-2021/plays.csv') 
plays.head()

* Importing necessary libraries 

In [None]:
import io
import re
import os
#from os import startfile
import os.path as path
import requests
from glob import glob # for combining several CSV files
import altair as alt
from math import radians

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import re
import plotly.express as px
import seaborn as sns

In [None]:
# Importing the required libraries.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from numpy import loadtxt
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error
import plotly.offline as py
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objects as go
import plotly.express as px

In [None]:
import pandas as pd
import numpy as np
import pandas_profiling as pdp
from pandas_profiling import ProfileReport

# Visualisation profiling

from PIL import Image
import scipy.misc
import plotly.express as px
import plotly.graph_objects as go
import plotly
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.patches as patches
from matplotlib import animation, rc
from matplotlib.animation import FFMpegWriter
import seaborn as sns
import dateutil
import plotly.tools as tls

# Data preprocessing

In [None]:
playernull=players.isnull().sum()
playernull

### Player data does not have any missing fields 

In [None]:
gamesnull=games.isnull().sum()
gamesnull

The games and players datasets have no missing values, so no cleaning is needed for these two files.

### Games data does not have any missing fields 

In [None]:
playnull=plays.isnull().sum()
playnull

### Combining all the weeks 

In [None]:
allweek = [week1, week2,week3,week4,week5,week6, week7,week8,week9,week10,week11, week12,week13,week14,week15,week16,week17]
week=pd.concat(allweek)

In [None]:
# Check missing values in the combined tracking data for all 17 weeks
weeknull=week.isnull().sum()
weeknull

### The tracking data has considerable missing values. They occur where displayName is given as football (the ball has no player attributes), so we fill them with 0, "null", or the most frequent value

In [None]:
# Numeric columns: fill with the number 0 (filling with the string "0" would turn them into text columns)
num_cols = ["o", "a", "dir", "jerseyNumber", "s", "dis"]
week[num_cols] = week[num_cols].fillna(0)
# Identifier/text columns: mark missing values with the string "null"
week[["position", "nflId"]] = week[["position", "nflId"]].fillna("null")

In [None]:
weekroute = week.groupby('route').size().reset_index(name='count_of_each_route')
weekroute

In [None]:
week['route'] = week['route'].replace(np.nan,"GO")
week

In [None]:
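# numeric_only=True skips text columns such as event and route, which cannot be averaged
week.groupby(week['displayName']).mean(numeric_only=True)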

In [None]:
weeknull=week.isnull().sum()
weeknull

## Data cleaning complete for the games, players, and tracking datasets

Dropping unnecessary attributes

In [None]:
week.drop(['frameId','jerseyNumber'],axis=1,inplace=True)
week.head()

### Handling missing values in the plays dataframe

In [None]:
plays.drop(['playDescription','gameClock'] ,axis = 1, inplace = True) 

* Forward filling for the plays dataset: each missing value takes the value from the previous row

In [None]:
# Columns with remaining gaps in the plays dataframe
ffill_cols = ['personnelD', 'defendersInTheBox', 'numberOfPassRushers',
              'preSnapVisitorScore', 'passResult', 'absoluteYardlineNumber',
              'preSnapHomeScore', 'personnelO', 'offenseFormation',
              'yardlineSide', 'typeDropback']
# Forward fill: a missing value takes the value of the row above it
# (a missing value in the very first row stays missing).
# DataFrame.ffill replaces the deprecated fillna(method='ffill'), and assigning
# through plays[cols] avoids chained .iloc assignment on a column.
plays[ffill_cols] = plays[ffill_cols].ffill()

In [None]:
playnull=plays.isnull().sum()
playnull

In [None]:
# Function to convert a player's height to inches.
# Heights written as feet-inches (e.g. "6-2") are converted; other values are treated as inches already.
def feettoinch(height):
    height = str(height)
    if re.search(r'[0-9]-[0-9]', height):
        feet, inches = height.split('-')
        return int(feet) * 12 + int(inches)
    # Return a number in both branches so the column ends up with a single numeric type
    return int(float(height))

# Calling the function feettoinch.
players["height"]=players["height"].apply(feettoinch)

# Display the dataframe after changes.
players.head(10)

In [None]:
# Function to extract age from birthDate.
# Input parameter for the function is the value of birthDate attribute from player dataframe.
# Note: age is measured as of today's date, not as of the 2018 season.
def from_dob_to_age(born):
    today = datetime.date.today()
    return float(today.year - born.year - ((today.month, today.day) < (born.month, born.day)))

# Calling the function from_dob_to_age.
players['birthDate'] = pd.to_datetime(players.birthDate) # Converting string type of birthDate into type date.
players['age']=players['birthDate'].apply(from_dob_to_age)

# Age column is added to the player dataframe
players.head()

In [None]:
# Function to categorize players into offensive, defensive and special teams based on their position.
# 1 : Offensive
# 2 : Defensive
# 3 : Special
def assignTeam(pos):
    offensive=['WR','HB','FB','QB','TE','RB']
    defensive=['CB','SS','MLB','OLB','FS','DE','LB','ILB','DB','S','NT','DT']
    special=['P','LS','K']
    if pos in offensive:
        return 1
    if pos in defensive:
        return 2
    if pos in special:
        return 3  

# Call the function with the position attribute of the player dataframe as input.
players["team"]=players["position"].apply(assignTeam)

# A new column team is added to the dataframe.
players.head(10)

* Splitting the timestamp into date and time

In [None]:
week["date"] = week.time.apply(lambda x: x.split("T")[0])
week.head()

In [None]:
# Keep only the time-of-day part; the date part is already stored in week["date"]
week["time"] = week.time.apply(lambda x: x.split("T")[1])
week.head()
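String splitting works, but parsing the column once with `pd.to_datetime` gives real datetime values for the date, the time of day and the elapsed time between frames. A sketch assuming ISO-8601 strings such as `2018-09-07T01:07:14.599Z` (the `split("T")` above suggests this format); the toy values are made up.

```python
import pandas as pd

# Toy timestamps in the assumed ISO-8601 format
tracking = pd.DataFrame({"time": ["2018-09-07T01:07:14.599Z", "2018-09-07T01:07:14.700Z"]})

ts = pd.to_datetime(tracking["time"], utc=True)
tracking["date"] = ts.dt.date
tracking["clock"] = ts.dt.time
# Seconds elapsed since the first frame, useful for timing features
tracking["elapsed_s"] = (ts - ts.iloc[0]).dt.total_seconds()
print(tracking)
```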

* Visualizing games, plays and players

In [None]:
#Number of games on each date
graph1 = games['gameDate'].value_counts().reset_index()
graph1.columns = ['date', 'games']
graph1 = graph1.sort_values('games')
fig = px.bar(
    graph1, 
    y='date', 
    x="games", 
    orientation='h', 
    title='Date and number of games on that date', 
    height=900, 
    width=900
)
fig.show()
fig.write_html("graph.html")  # plt.savefig cannot export Plotly figures

This chart shows how many games were played on each date. It cannot identify the best team: every team plays the same 16-game regular season over 17 weeks, with one bye week per team, so the number of games played says nothing about team strength.

In [None]:
#Number of games at each kickoff time (Eastern)
graph2 = games['gameTimeEastern'].value_counts().reset_index()
graph2.columns = ['time', 'games']
# HH:MM:SS strings sort chronologically
graph2 = graph2.sort_values('time')
fig = px.bar(
    graph2, 
    x='time', 
    y="games", 
    title='Number of games at each kickoff time (Eastern)', 
    height=400, 
    width=800
)
fig.show()
fig.write_html("graph2.html")  # plt.savefig cannot export Plotly figures


If you turn on an American football game on a Sunday afternoon, you will see a wide variety of physical builds among the players. In a way it is a bit like rugby: some positions depend on strength and power, others on speed, and American football exaggerates this even further. Take, for example, Trindon Holliday and Terrence Cody. Holliday, a wide receiver and kick returner who played for the Denver Broncos, is only 5 feet 5 inches tall, and on the field he truly looks like a boy among men.

In [None]:
#Number of games every week
graph3 = games['week'].value_counts().reset_index()
graph3.columns = ['week', 'games']
graph3 = graph3.sort_values('games')
# Append '-' so Plotly treats week numbers as categories rather than a numeric axis
graph3['week'] = graph3['week'].astype(str) + '-'
fig = px.bar(
    graph3, 
    y='week', 
    x="games", 
    orientation='h', 
    title='Number of games played every week', 
    height=600, 
    width=700
)
fig.show()
fig.write_html("graph3.html")  # plt.savefig cannot export Plotly figures

This chart shows how many games were played in each week. Weeks in which some teams have a bye contain fewer games, since every two teams on bye means one fewer game that week.

In [None]:
players['position'].value_counts()

In [None]:
#Top positions by number of players
#Graph4
plt.rcParams["figure.figsize"] = (20,10)

position_counts = players.position.value_counts()
sns.barplot(x=position_counts.index, y=position_counts.values)
plt.xticks(rotation=45)
plt.xlabel('Position')
plt.ylabel('Number of players')
sns.despine()
plt.savefig("graph4.png")

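→ Wide receivers (WR) are the most common position in the data. A wide receiver is a pass-catching specialist in gridiron football, and passing plays (the only plays in this dataset) often feature several of them. The chart only counts players, though; it does not show that fielding more wide receivers earns more points.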

In [None]:
#Number of plays every quarter
graph5 = plays['quarter'].value_counts().reset_index()
graph5.columns = ['quarter', 'plays']
graph5 = graph5.sort_values('plays')
fig = px.pie(
    graph5, 
    names='quarter', 
    values="plays",  
    title='Number of plays of every quarter',
    height=500,
    width=800
)
fig.show()
fig.write_html("graph5.html")  # plt.savefig cannot export Plotly figures

A regulation game has four 15-minute quarters; any quarter value above 4 corresponds to overtime. The chart shows the share of passing plays in each quarter.

In [None]:
#Number of plays for every yards to go category 
graph6 = plays['yardsToGo'].value_counts().reset_index()
graph6.columns = ['yardsToGo', 'plays']
graph6['yardsToGo'] = graph6['yardsToGo'].astype(str) + '-'
graph6 = graph6.sort_values('plays')
fig = px.bar(
    graph6, 
    y='yardsToGo', 
    x="plays", 
    orientation='h', 
    title='Number of plays for every yards to go category',
    height=800,
    width=800
)
fig.show()
fig.write_html("graph6.html")  # plt.savefig cannot export Plotly figures

yardsToGo is the distance the offense still needs to gain for a first down. Every new set of downs starts at 1st-and-10, which is why 10 yards to go is such a common value.

In [None]:
#No of plays for every team
graph7 = plays['possessionTeam'].value_counts().reset_index()
graph7.columns = ['team', 'plays']
graph7 = graph7.sort_values('plays')
fig = px.bar(
    graph7, 
    y='team', 
    x="plays", 
    orientation='h', 
    title='Number of plays every team has played',
    height=800,
    width=800
)
fig.show()
fig.write_html("graph7.html")  # plt.savefig cannot export Plotly figures

PIT ran the most passing plays in the dataset (738). A high count reflects a pass-heavy offense, not necessarily the best team, and the six teams missing a game (see above) had one game fewer in which to accumulate plays.
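Raw counts penalize the six teams with a missing game. Dividing each team's passing plays by the number of distinct games it appears in gives a fairer comparison. A minimal sketch, assuming the `possessionTeam`, `gameId` and `playId` columns from the plays data dictionary; the function name `plays_per_game` is hypothetical and the toy frame is made up.

```python
import pandas as pd

def plays_per_game(plays: pd.DataFrame) -> pd.DataFrame:
    """Passing plays per team, normalized by the number of games in the data."""
    summary = plays.groupby("possessionTeam").agg(
        plays=("playId", "size"),
        games=("gameId", "nunique"),
    )
    summary["plays_per_game"] = summary["plays"] / summary["games"]
    return summary.sort_values("plays_per_game", ascending=False)

toy = pd.DataFrame({
    "possessionTeam": ["PIT", "PIT", "PIT", "SEA", "SEA"],
    "gameId": [1, 1, 2, 3, 3],
    "playId": [10, 20, 30, 10, 20],
})
print(plays_per_game(toy))
```

In the toy data PIT has more plays in total, but SEA has more plays per game, which is the kind of reversal raw counts can hide.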

In [None]:
#Number of plays for every defenders in the box
graph8 = plays['defendersInTheBox'].value_counts().reset_index()
graph8.columns = ['defendersInTheBox', 'plays']
graph8 = graph8.sort_values('plays')
fig = px.bar(
    graph8, 
    x='defendersInTheBox', 
    y="plays",  
    title='Number of plays for every number of defenders in the box',
    height=500,
    width=800
)
fig.show()
fig.write_html("graph8.html")  # plt.savefig cannot export Plotly figures

The "box" is the area close to the line of scrimmage, and defendersInTheBox counts the defenders lined up there at the snap. Putting more defenders in the box strengthens the defense against the run and adds potential pass rushers, but leaves fewer defenders in pass coverage. The chart only shows how often each box count occurs; by itself it does not show how the box count relates to the play result.
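To relate the box count to the outcome, one option is to summarize playResult (net yards) for each value of defendersInTheBox. A minimal sketch, assuming the column names from the plays data dictionary; the function name `result_by_box_count` is hypothetical and the toy frame is made up.

```python
import pandas as pd

def result_by_box_count(plays: pd.DataFrame) -> pd.DataFrame:
    """Count, mean and median of playResult for each defendersInTheBox value."""
    return (plays.groupby("defendersInTheBox")["playResult"]
                 .agg(["count", "mean", "median"])
                 .sort_index())

toy = pd.DataFrame({
    "defendersInTheBox": [6, 6, 7, 7, 7],
    "playResult": [5, 15, 0, 3, -6],
})
print(result_by_box_count(toy))
```

The count column matters: box counts with only a handful of plays will have noisy means.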

In [None]:
#Number of plays for every passrushers 
graph9 = plays['numberOfPassRushers'].value_counts().reset_index()
graph9.columns = ['numberOfPassRushers', 'plays']
graph9 = graph9.sort_values('plays')
fig = px.bar(
    graph9, 
    x='numberOfPassRushers', 
    y="plays",  
    title='Number of plays for every number of pass rushers present in the play',
    height=500,
    width=800
)
fig.show()
fig.write_html("graph9.html")  # plt.savefig cannot export Plotly figures

Pass rushing means charging across the line of scrimmage toward the quarterback (or kicker) in an effort to stop or "sack" him. A pass rush can be effective even without a sack if it forces the passer to get rid of the ball before he wants to, resulting in an incomplete pass or an interception. This chart only counts plays for each number of pass rushers, so by itself it cannot show whether more rushers lead to more incompletions or interceptions.
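A cross-tabulation of pass rushers against passResult, normalized within each row, shows the share of each outcome for each rush size. A minimal sketch, assuming the `numberOfPassRushers` and `passResult` columns and codes from the data dictionary; the toy frame is made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "numberOfPassRushers": [4, 4, 4, 5, 5, 6],
    "passResult": ["C", "I", "C", "I", "S", "IN"],
})

# Share of each pass outcome within each rusher count (each row sums to 1)
outcome_share = pd.crosstab(toy["numberOfPassRushers"], toy["passResult"], normalize="index")
print(outcome_share.round(2))
```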

In [None]:
#Number of plays for every dropbacks 
graph10 = plays['typeDropback'].value_counts().reset_index()
graph10.columns = ['typeDropback','plays']
graph10 = graph10.sort_values('plays')
fig = px.pie(
    graph10, 
    names='typeDropback', 
    values="plays",  
    title='Number of plays for every Dropback type',
    height=500,
    width=800
)
fig.show()
fig.write_html("graph10.html")  # plt.savefig cannot export Plotly figures

Dropping back to pass is a passing style in which the quarterback takes a multi-step drop (commonly three, five or seven steps), backpedaling into the pocket before throwing. It is the most common way of passing the ball in gridiron football, and here the traditional dropback is the most frequent type. Popularity alone does not show that it is the most successful method, though; that requires comparing outcomes such as EPA across dropback types.
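Comparing the average expected points added (epa) per dropback type is one way to check success rather than popularity. A minimal sketch, assuming the `typeDropback` and `epa` columns from the data dictionary; the category labels and values in the toy frame are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "typeDropback": ["TRADITIONAL", "TRADITIONAL", "SCRAMBLE", "SCRAMBLE"],
    "epa": [0.5, -0.3, 1.2, 0.4],
})

# Number of plays and mean EPA per dropback type, best average first
epa_by_dropback = (toy.groupby("typeDropback")["epa"]
                      .agg(["count", "mean"])
                      .sort_values("mean", ascending=False))
print(epa_by_dropback)
```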

In [None]:
fig = px.histogram(
    plays, 
    x="playResult",
    width=800,
    height=500,
    nbins=50,
    title='Play result distribution'
)
fig.show()
fig.write_html("graph11.html")  # plt.savefig cannot export Plotly figures

playResult is the net yards gained on the play, including penalty yardage; it is not the team's score. Incomplete passes without a penalty gain 0 yards and sacks lose yards, which shapes the left side of the distribution.
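Because playResult includes penalty yardage and offensePlayResult excludes it (per the Plays data dictionary), their difference isolates the penalty yards. Splitting playResult by passResult then shows where the zeros and negative values come from. A minimal sketch on a made-up toy frame; the `penaltyYards` column is a hypothetical derived name.

```python
import pandas as pd

toy = pd.DataFrame({
    "passResult": ["C", "I", "I", "S", "IN"],
    "offensePlayResult": [12, 0, 0, -7, 0],
    "playResult": [12, 0, 15, -7, 0],
})

# By the definitions above, the gap between the two columns is the penalty yardage
toy["penaltyYards"] = toy["playResult"] - toy["offensePlayResult"]
print(toy.groupby("passResult")["playResult"].describe())
print((toy["penaltyYards"] != 0).sum(), "play(s) with penalty yardage")
```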

In [None]:
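play_Type = plays['playType'].value_counts()[:50]
plt.figure(figsize=(6,4))
# Pass x and y as keywords (recent seaborn versions reject positional data vectors)
sns.barplot(x=play_Type.index, y=play_Type.values, alpha=0.8)
plt.ylabel('Number of plays', fontsize=12)
plt.xlabel('playType', fontsize=9)
plt.xticks(rotation=90)
# Save before plt.show(), which closes the figure in inline backends
plt.savefig("graph12.png")
plt.show()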

In [None]:
week1_players = week1[['nflId', 'displayName', 'gameId']]

In [None]:
display_players = players[['nflId', 'displayName']]

In [None]:
# Stack the players table and the week-1 player rows (pd.concat appends rows; it does not join them on nflId)
namecount = pd.concat([display_players, week1_players],ignore_index=True)
namecount

In [None]:
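# Count the distinct games each player appears in across all 17 weeks (excluding the football's rows),
# then sort players in descending order of games played.
# Counting rows in week1 instead would count tracking frames, not games.
sortgame = (week[week['displayName'] != 'Football']
            .groupby('displayName')['gameId'].nunique()
            .reset_index()
            .sort_values(by='gameId', ascending=False))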

In [None]:
fig, ax1 = plt.subplots(1, figsize=(10, 12))

# Keep only the 30 players with the most games so x and y have matching lengths
top30 = sortgame.head(30)

# Figure "a": number of games played by each of the top 30 players
a = sns.barplot(x=top30.gameId, y=top30.displayName, ax=ax1,
                linewidth=1, alpha=0.7, palette='Blues_r')

# Annotate each bar with the player's game count
for i, j in enumerate(top30.gameId):
    ax1.text(.05, i + 0.15, j, weight="bold", size=15)

a.set_title("Total number of games played by the top 30 players", weight='bold', size=16)
a.set_xlabel('Total games count', size=15)
a.set_ylabel('Player', size=14)
# Save before plt.show(), which closes the figure in inline backends
plt.savefig("graph13.png")
plt.show()

In [None]:
weekgame = week.groupby('gameId').size().reset_index(name='count_of_each_gameId')
weekgame

In [None]:
offform = pd.DataFrame(plays.personnelO.value_counts())
offform.index.name = 'OffensivePersonnel'
offform.columns=['Play Count']
offform[:15].style.set_caption('Most common offensive personnel groupings')

In [None]:
defform = pd.DataFrame(plays.personnelD.value_counts())
defform.index.name = 'DefensivePersonnel'
defform.columns=['Play Count']
defform[:15].style.set_caption('Most common defensive personnel groupings')

In [None]:
from plotly.subplots import make_subplots

height = players.height.value_counts()
trace1 = go.Bar(
    x=height.index,
    y=height.values,
    marker=dict(
        color=height.values,
        colorscale = 'BrBg',
        reversescale = False
    ),
)

weight = players.weight.value_counts().head(10)

trace2 = go.Bar(
    x=weight.index,
    y=weight.values,
    orientation='v',
    marker=dict(
        color=weight.values,
        colorscale = 'BrBg',
        reversescale = False
    ),
)
fig = make_subplots(rows=1, cols=2, subplot_titles=('height', 'weight (10 most common)'))
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)

fig.update_layout(showlegend=False, template='plotly_white')
py.iplot(fig, filename="weight")

fig.write_html("graph14.html")  # plt.savefig cannot export Plotly figures

In [None]:
isDefensivePI = plays.isDefensivePI.value_counts()

trace = go.Bar(
    x=isDefensivePI.index,
    y=isDefensivePI.values,
    marker=dict(
        color=isDefensivePI.values,
        colorscale = 'BrBg',
        reversescale = False
    ),
)

layout = go.Layout(
    title='isDefensivePI',
    template="plotly_white",
)

data = [trace]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename="isDefensivePI")
fig.write_html("graph15.html")  # plt.savefig cannot export Plotly figures

In [None]:
players.groupby('position').weight.max().plot(kind='bar', figsize=(18,6), color='saddlebrown')
plt.title('The max weight by position', fontsize=16, fontweight='bold')
plt.xlabel('position',fontsize=14)
plt.ylabel('weight',fontsize=14)
plt.xticks(rotation=45)
# Save before plt.show(), which closes the figure in inline backends
plt.savefig("graph16.png")
plt.show()

In [None]:
players.groupby('position').age.max().plot(kind='bar', figsize=(18,6), color='darkblue')
plt.title('The max age by position', fontsize=16, fontweight='bold')
plt.xlabel('position',fontsize=14)
plt.ylabel('age',fontsize=14)
plt.xticks(rotation=45)
# Save before plt.show(), which closes the figure in inline backends
plt.savefig("graph17.png")
plt.show()

In [None]:
plays.groupby('offenseFormation').offensePlayResult.mean().plot(kind='bar',figsize=(9,6))
plt.xticks(rotation=45)
plt.xlabel('offenseFormation', fontsize=14)
plt.ylabel('offensePlayResult',fontsize=14)
plt.title('The mean Yards gained by offense formation', fontsize=16, fontweight='bold')
# Save before plt.show(), which closes the figure in inline backends
plt.savefig("graph18.png")
plt.show()

In [None]:
# Count plays for each (possessionTeam, passResult) pair, sort in descending order, then keep only completions (passResult == "C")
df_postion_pass_player_sort = plays.groupby(['possessionTeam', 'passResult'])['playId'].count().reset_index().sort_values( by = 'playId', ascending = False )
df_complete_postion_pass_player_sort = df_postion_pass_player_sort.query('passResult == "C"')

In [None]:
# Count all plays per possession team, sorted in descending order
df_postion_player_sort = plays.groupby('possessionTeam')['playId'].count().reset_index().sort_values( by = 'playId', ascending = False )


# Attach each team's total play count and express completions as a percentage of all its plays
df_complete_postion_pass_player_percentage = df_complete_postion_pass_player_sort.copy()
df_complete_postion_pass_player_percentage = df_complete_postion_pass_player_percentage.merge(df_postion_player_sort, how = 'left', on=['possessionTeam'])
df_complete_postion_pass_player_percentage = df_complete_postion_pass_player_percentage.rename(columns = {'playId_y' :'total_playId', 'playId_x' : 'complete_pass_playId'})
df_complete_postion_pass_player_percentage['complete_pass_%'] = (df_complete_postion_pass_player_percentage.complete_pass_playId 
                                                                 / df_complete_postion_pass_player_percentage.total_playId) * 100
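The completion percentage can also be computed without the intermediate count tables and merge: the mean of a boolean "is complete" column per team is the fraction of that team's plays ending in a completion. A minimal sketch on hypothetical toy data; one difference is that a team with no completions would appear here with 0%, whereas the merge-based version drops it because it starts from the completed-pass rows only.

```python
import pandas as pd

# Hypothetical toy stand-in for plays[['possessionTeam', 'passResult', 'playId']]
plays_toy = pd.DataFrame({
    "possessionTeam": ["KC", "KC", "KC", "KC", "NE", "NE", "NE"],
    "passResult":     ["C",  "C",  "I",  "S",  "C",  "IN", "I"],
    "playId":         [1, 2, 3, 4, 5, 6, 7],
})

# Share of each team's plays whose passResult is "C", as a percentage;
# the denominator is all of the team's plays, as in the merge-based version
complete_pct = (
    plays_toy.assign(is_complete=plays_toy["passResult"].eq("C"))
    .groupby("possessionTeam")["is_complete"]
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)
print(complete_pct)
```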

In [None]:
fig, ax1 = plt.subplots(1, figsize=(15, 10))

# Horizontal bar chart of the pass completion percentage for each possession team
a = sns.barplot(x=df_complete_postion_pass_player_percentage['complete_pass_%'], y=df_complete_postion_pass_player_percentage.possessionTeam,
                ax=ax1, linewidth=1, alpha=0.7, palette='tab10')

# Write each team's percentage as a label on its bar
for i, j in enumerate(df_complete_postion_pass_player_percentage['complete_pass_%']):
    ax1.text(.01, i + 0.1, '{:0.2f} %'.format(j), weight="bold", size=10)

# Titles and axis labels
a.set_title("Pass Completion Percentage by Possession Team", weight='bold', size=10)
a.set_xlabel('Completed passes (% of the team\'s plays)', size=10)
a.set_ylabel('Possession team', size=15)
plt.xticks([0, 10, 20, 30, 40, 50, 60, 70], ['0%', '10%', '20%', '30%', '40%', '50%', '60%', '70%'])
# Save before show(): after show() the current figure can be empty
plt.savefig("graph19.png")
plt.show()

In [None]:
displayName = week.displayName.value_counts()

trace = go.Bar(
    y=displayName.index[::-1],
    x=displayName.values[::-1],
    orientation='h',
    marker=dict(
        color=displayName.values[::-1],
        colorscale = 'RdBu',
        reversescale = True
    ),
)

layout = go.Layout(
    title='Most frequently occurring players in tracking data',
    template="plotly_white",
)

data = [trace]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename="displayName")
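# Export the Plotly figure itself (plt.savefig would save an unrelated, empty matplotlib figure)
fig.write_image("graph20.png")  # requires the kaleido package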
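Tracking data has one row per player per frame, so `value_counts()` on `displayName` counts frames, which favours players who were on the field for long plays. The ball is also stored as a "player" (`displayName == 'Football'`), and since it is present in every frame it will tend to top this chart. A minimal sketch, on hypothetical toy rows, of counting distinct plays per player instead by dropping duplicate (player, game, play) triples first:

```python
import pandas as pd

# Hypothetical toy tracking rows: one row per player per frame
tracking = pd.DataFrame({
    "gameId":      [1, 1, 1, 1, 1, 2],
    "playId":      [10, 10, 10, 11, 11, 5],
    "frameId":     [1, 2, 3, 1, 2, 1],
    "displayName": ["A", "A", "A", "A", "A", "B"],
})

# Counting raw rows gives the number of frames each player appears in
frames_per_player = tracking["displayName"].value_counts()

# Keeping one row per (player, game, play) gives the number of plays instead
plays_per_player = (
    tracking.drop_duplicates(["displayName", "gameId", "playId"])["displayName"].value_counts()
)
print(frames_per_player, plays_per_player, sep="\n")
```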

In [None]:
def find_dist(df, col_name):
    
    # Count how often each value of the column occurs
    dist = df[col_name].value_counts().reset_index()
    
    # Renaming the columns
    dist.columns = [col_name, 'frequency']
        
    # Sorting the DataFrame based on the column values
    sorted_dist = dist.sort_values(col_name, ascending=True).set_index(col_name)

    # Plotting a bar plot
    sorted_dist.plot(kind='bar', figsize=(20,4))

    # Return a boolean indicating the function was successfully executed
    return True
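`find_dist` builds a frequency table and then re-sorts it by the column's values. The same table can be obtained in one chain with `value_counts().sort_index()`, and returning the Series (instead of `True`) keeps it available for inspection. A minimal sketch; `sorted_frequency` and the toy ages are hypothetical and not part of the original notebook:

```python
import pandas as pd

def sorted_frequency(df: pd.DataFrame, col_name: str) -> pd.Series:
    """Frequency of each value in df[col_name], ordered by the value itself (hypothetical helper)."""
    # value_counts() orders by frequency; sort_index() reorders by value, as find_dist does
    return df[col_name].value_counts().sort_index().rename("frequency")

ages = pd.DataFrame({"age": [24, 31, 24, 22, 31, 24]})  # hypothetical toy data
freq = sorted_frequency(ages, "age")
# freq.plot(kind="bar", figsize=(20, 4)) draws the same kind of chart as find_dist
print(freq)
```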

In [None]:
find_dist(players,'age')
plt.savefig("graph21.png")

In [None]:
# Visualizing pass results with Seaborn
plt.figure(figsize=(10,5))
graph = sns.countplot(x='passResult',data=plays)

# Print each bar's count just above the bar
for p in graph.patches:
    height = p.get_height()
    graph.text(p.get_x() + p.get_width() / 2., height + 0.3, int(height), ha="center")

# Save before show(): after show() the current figure can be empty
plt.savefig("graph22.png")
plt.show()
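The `passResult` axis shows short codes. A minimal sketch of mapping them to readable labels before counting; the code meanings below are our reading of the competition's data dictionary and are an assumption to verify, and the toy values are hypothetical:

```python
import pandas as pd

# Assumed meanings of the passResult codes (verify against the competition data dictionary)
PASS_RESULT_LABELS = {
    "C": "Complete",
    "I": "Incomplete",
    "S": "Sack",
    "IN": "Interception",
    "R": "Scramble",
}

pass_results = pd.Series(["C", "I", "C", "S", "IN", "C", "R"])  # hypothetical toy values

# Readable labels; any code missing from the mapping is kept as-is instead of becoming NaN
labels = pass_results.map(PASS_RESULT_LABELS).fillna(pass_results)
counts = labels.value_counts()
print(counts)
```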

## Tracking player movement

In [None]:
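# .copy() makes an independent DataFrame, so the in-place edits below do not trigger SettingWithCopyWarning
football = week1[week1['displayName'] == 'Football'].copy()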

In [None]:
football.drop(['s','a','dis','o','dir','nflId','position'],axis=1,inplace=True)

In [None]:
football.set_index(['gameId','playId'],inplace=True)

In [None]:
football.head()

In [None]:
# Every tracked row that is not the ball; .copy() allows the in-place edits below
players_week1 = week1[week1['displayName'] != 'Football'].copy()

In [None]:
players_week1.set_index(['gameId','playId'],inplace=True)

In [None]:
players_week1.drop(['s','a','dis','o','dir'],axis=1,inplace=True)
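The cells above split one week of tracking data into ball rows and player rows, each indexed by `(gameId, playId)` so that a single play can be selected with `.loc`. A minimal sketch of the same split on hypothetical toy rows; calling `set_index` without `inplace=True` returns a new DataFrame, which also sidesteps `SettingWithCopyWarning` on the filtered slices:

```python
import pandas as pd

# Hypothetical toy tracking rows for one play
week_toy = pd.DataFrame({
    "gameId":      [2018090600] * 4,
    "playId":      [75] * 4,
    "displayName": ["Football", "Player A", "Football", "Player A"],
    "x":           [50.0, 48.0, 51.0, 49.5],
    "y":           [26.7, 20.0, 26.9, 21.0],
})

is_ball = week_toy["displayName"].eq("Football")

# Separate ball and player rows, each indexed by (gameId, playId)
ball = week_toy[is_ball].set_index(["gameId", "playId"])
players_only = week_toy[~is_ball].set_index(["gameId", "playId"])

# A (gameId, playId) tuple selects every row of that play from the MultiIndex
one_play_ball = ball.loc[(2018090600, 75)]
print(one_play_ball)
```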

In [None]:
trialfootball=football.loc[2018090600,75]

In [None]:
trialplayers=players_week1.loc[2018090600,75]

In [None]:
trialfootball.info()

In [None]:
trialplayers.head()

In [None]:
trial = trialplayers[["displayName", "x", "y", "position"]].copy()
trial.reset_index(inplace=True)

In [None]:
trial.drop(['gameId','playId'],axis=1,inplace=True)

In [None]:
triallast = trial.drop_duplicates('displayName', keep='last')

In [None]:
trialfirst = trial.drop_duplicates('displayName', keep='first')
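`drop_duplicates(keep='first')` and `keep='last'` pick each player's first and last rows in whatever order the rows are stored, so they give start and end positions only if the data is already in time order, and the two results are not guaranteed to list players in the same order. A minimal sketch of a more explicit alternative on hypothetical toy rows, assuming the tracking data carries a time-ordered `frameId` column (it is not among the columns selected into `trial` above):

```python
import pandas as pd

# Hypothetical toy rows for one play; frameId is assumed to increase with time
trial_toy = pd.DataFrame({
    "displayName": ["A", "B", "B", "A", "A", "B"],
    "frameId":     [2, 1, 2, 1, 3, 3],
    "x":           [11.0, 30.0, 31.0, 10.0, 12.0, 32.5],
    "y":           [5.0, 20.0, 20.5, 5.0, 5.5, 21.0],
})

# Sorting by frame makes "first" and "last" mean earliest and latest in time
ordered = trial_toy.sort_values(["displayName", "frameId"])
start = ordered.groupby("displayName")[["x", "y"]].first()
end = ordered.groupby("displayName")[["x", "y"]].last()

# Both results are indexed by displayName, so start and end rows line up per player
print(start.join(end, lsuffix="_start", rsuffix="_end"))
```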

In [None]:
plt.rcParams["figure.figsize"] = (15, 15)

In [None]:
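ax = plt.gca()
# Starting and ending coordinates of each player in the play
fx = list(trialfirst["x"])
fy = list(trialfirst["y"])
lx = list(triallast["x"])
ly = list(triallast["y"])
# Take each label from the same DataFrame as the coordinates it is drawn at,
# so names and positions stay paired with the right player
name = list(trialfirst["displayName"])
pos = list(triallast["position"])
# Label each player's starting point with their name (maroon)
for i, txt in enumerate(name):
    ax.annotate(txt, (fx[i], fy[i]), color='maroon')
# Label each player's ending point with their position (blue)
for i, txt in enumerate(pos):
    ax.annotate(txt, (lx[i], ly[i]), color='blue')
ax.scatter(trialplayers["x"], trialplayers["y"], color="green", label='Players')
ax.scatter(trialfootball["x"], trialfootball["y"], color="black", label='Football')
ax.legend()
plt.savefig("graph23.png")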
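The plot marks where each player starts and ends the play. The straight-line distance between those two points (field coordinates are in yards) summarizes how far each player moved on net. A minimal sketch with hypothetical start and end tables indexed by player name, as produced for example by the first/last positions above; note this is net displacement, not path length (the per-frame `dis` column dropped earlier would be the natural input for total distance travelled):

```python
import numpy as np
import pandas as pd

# Hypothetical start and end positions, indexed by player name
start = pd.DataFrame({"x": [10.0, 30.0], "y": [5.0, 20.0]}, index=["A", "B"])
end = pd.DataFrame({"x": [13.0, 30.0], "y": [9.0, 26.0]}, index=["A", "B"])

# Euclidean displacement between first and last position; pandas aligns rows by player name
displacement = np.hypot(end["x"] - start["x"], end["y"] - start["y"])
print(displacement)
```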

In [None]:
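def animated(df):
    # Scatter of tracking coordinates coloured by team, with one animation frame per playId
    fig = px.scatter(
        df,
        x='x', y='y', text='position', color='team', animation_frame='playId', animation_group='position',
        # Axis ranges pad the 120 x 53.3 yard field (end zones included) on every side
        range_x=[-6, 128], range_y=[-13, 62],
        hover_data=['displayName', 'jerseyNumber', 's', 'a', 'dis', 'o'])
    # Show each marker's position label above it, on a green "field" background
    fig.update_traces(textposition='top center', marker_size=10)
    fig.update_layout(paper_bgcolor='darkgreen', plot_bgcolor='darkgreen', font_color='white')

    return fig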

In [None]:
game=week4.gameId.unique()

# gameId is an integer column, so compare against an integer rather than a quoted string
single = week4.query("gameId == 2018093003")
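The `gameId` column holds integers (the `.loc[2018090600, 75]` lookups above rely on that), so a query comparing it to a quoted string will not match the integer values. A minimal sketch, on hypothetical toy rows, of filtering by a game ID held in a Python variable; `@game_id` tells `DataFrame.query` to read the local variable, which avoids quoting mistakes:

```python
import pandas as pd

# Hypothetical toy stand-in for week4
week_toy = pd.DataFrame({
    "gameId": [2018093003, 2018093003, 2018093004],
    "playId": [75, 75, 90],
})

game_id = 2018093003  # integer, matching the integer dtype of gameId

# "@game_id" refers to the local Python variable inside the query string
single_game = week_toy.query("gameId == @game_id")
print(single_game)
```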

In [None]:
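# Keep every 5th game ID of the week (adjust the slice to choose which games to animate)
time_slice = game[0:1000:5]
df_slice = week4[week4.gameId.isin(time_slice)]
fig = animated(df_slice)
# Slow the animation down via the Play button's settings: 500 ms per frame, 300 ms transitions
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 500
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 300
fig.show()

# Matplotlib has no savevideo; a Plotly animation can be saved as interactive HTML instead
# fig.write_html("animation.html")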