Published on September 28, 2025. By Marília Prata, mpwolke.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

import plotly.graph_objs as go
import plotly.offline as py
import plotly.express as px

#Two lines Required to Plot Plotly
import plotly.io as pio
pio.renderers.default = 'iframe'

#Ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Predict NFL player movement after ball is thrown

You are tasked to predict NFL player movement during the video frames after the ball is thrown.

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQyJvCCYvw-XMeGPWfdddvTbMUHHj-svmvKLA&s)Instagram

## Competition Citation

@misc{nfl-big-data-bowl-2026-prediction,
    author = {Michael Lopez and Tom Bliss and Ally Blake and Yao Yan and Martyna Plomecka and Addison Howard},
    title = {NFL Big Data Bowl 2026 - Prediction},
    year = {2025},
    howpublished = {\url{https://kaggle.com/competitions/nfl-big-data-bowl-2026-prediction}},
    note = {Kaggle}
}

## One input file: input_2023_w[01-18].csv

There is one input file per week (01 to 18). I picked week 12.

In [None]:
player = pd.read_csv("../input/nfl-big-data-bowl-2026-prediction/train/input_2023_w12.csv")
pd.set_option('display.max_columns', None)
player.head(3)
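
The `w[01-18]` pattern in the heading stands for eighteen weekly files. Here is a minimal sketch of how those paths could be built and, if wanted, stacked into one frame. `week_paths` and `load_weeks` are hypothetical helper names introduced only for illustration; the directory is the one used in the cell above.

```python
import os
import pandas as pd

# Same directory as the read_csv call above.
TRAIN_DIR = "../input/nfl-big-data-bowl-2026-prediction/train"

def week_paths(prefix="input", weeks=range(1, 19), train_dir=TRAIN_DIR):
    """Build one path per week, zero-padded to two digits (w01 .. w18)."""
    return [f"{train_dir}/{prefix}_2023_w{week:02d}.csv" for week in weeks]

def load_weeks(paths):
    """Read every path that exists and stack the rows; missing files are skipped."""
    frames = [pd.read_csv(p) for p in paths if os.path.exists(p)]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

paths = week_paths()
print(len(paths))   # number of weekly files
print(paths[11])    # the week 12 file used in this notebook
```

Loading all eighteen files at once may be slow or memory-hungry, which is one reason to explore a single week first.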

## Output file 4

In [None]:
output4 = pd.read_csv("../input/nfl-big-data-bowl-2026-prediction/train/output_2023_w04.csv")
pd.set_option('display.max_columns', None)
output4.head(3)

### Player birth date

If any value doesn't match the "%Y-%m-%d" format, use format='mixed'.

The second line of the snippet below follows the pandas.to_datetime documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html

In [None]:
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html

print("Data type of player_birth_date column before parsing : ", player["player_birth_date"].dtypes)
player["player_birth_date"] = pd.to_datetime(player["player_birth_date"], format='mixed')
print("Data type of player_birth_date column after parsing : ", player["player_birth_date"].dtypes)
print(player["player_birth_date"].head())
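
To see what `format='mixed'` does, here is a small sketch on toy values (not taken from the competition data). It assumes pandas 2.0 or later, where `format='mixed'` is available. Adding `errors='coerce'` is an optional extra not used in the notebook: it turns unparseable entries into `NaT` instead of raising an error.

```python
import pandas as pd

# Toy values: two different date layouts plus one entry that is not a date.
raw = pd.Series(["1999-05-01", "05/02/1998", "not a date"])

# format="mixed" infers the layout element by element;
# errors="coerce" replaces anything unparseable with NaT.
parsed = pd.to_datetime(raw, format="mixed", errors="coerce")

print(parsed)
print("Unparseable rows:", int(parsed.isna().sum()))
```

Counting the `NaT` values afterwards is a quick way to check how many birth dates could not be read.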

### Player Birth year

Extracting the year (YYYY) from the YYYY-MM-DD date.

In [None]:
player['player_birth_year'] = pd.DatetimeIndex(player['player_birth_date']).year
print(player["player_birth_year"])

## Unique values of the player_birth_year column (input file 12 only)

In [None]:
print("Unique birth year values and their counts :")
print(player["player_birth_year"].value_counts())
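
One caveat: if the input file holds one row per player per frame, `value_counts()` counts rows, not players. The sketch below uses toy data to show the difference. The `nfl_id` column name is an assumption here; it is taken to be a unique player identifier in the input file.

```python
import pandas as pd

# Toy frame: player 1 appears in three frames, player 2 in one.
toy = pd.DataFrame({
    "nfl_id": [1, 1, 1, 2],            # assumed player identifier column
    "frame_id": [1, 2, 3, 1],
    "player_birth_year": [1999, 1999, 1999, 1995],
})

# Counting rows: a player tracked in many frames is counted many times.
row_counts = toy["player_birth_year"].value_counts()

# Counting players: keep one row per player before counting.
player_counts = toy.drop_duplicates(subset="nfl_id")["player_birth_year"].value_counts()

print(row_counts.to_dict())
print(player_counts.to_dict())
```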

## Youngest and Oldest Player birth year

In [None]:
#Sanjay V https://www.kaggle.com/code/sanjayv007/nfl-big-data-bowl-beginner-s-complete-eda

# Youngest and oldest player
print("Youngest player birth year : ",max(player["player_birth_year"]))
print("Oldest player birth year : ",min(player["player_birth_year"]))

## Players Birth years histogram

The tallest bar is at 1999: that birth year has the most observations in input file 12, so more player records come from that year than from any other.

It's a left skewed distribution.

https://www.labxchange.org/library/items/lb:LabXchange:10d3270e:html:1

In [None]:
hist = player["player_birth_year"].plot.hist(bins=20, color="orange", edgecolor="black")

## Unique values of Position column

In [None]:
print("Unique position values and their counts :")
pos_val = player.pivot_table(index = ['player_position'], aggfunc = 'size') 
pos_val = pos_val.reset_index()
pos_val.columns= ["Positions", "Counts"]
pos_val = pos_val.sort_values("Counts", ascending = False)
print(pos_val)
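
The same table can also be built with `value_counts()`, which already sorts from most to least frequent. This is a sketch on toy data, offered as an alternative rather than a correction; the column names `Positions` and `Counts` match the cell above.

```python
import pandas as pd

# Toy positions, not taken from the competition data.
toy = pd.DataFrame({"player_position": ["WR", "CB", "WR", "QB", "WR", "CB"]})

pos_counts = (
    toy["player_position"]
    .value_counts()                 # counts per position, descending
    .rename_axis("Positions")       # name the index before turning it into a column
    .reset_index(name="Counts")     # name the count column
)
print(pos_counts)
```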

## Players: tallest, height in feet (input file 12)

In [None]:
height = player[player['player_height'] == max(player["player_height"])]
height.head(2)

## Players: shortest, height in feet (input file 12)

In [None]:
low_height = player[player['player_height'] == min(player["player_height"])]
low_height.head(2)
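
A caution about `max` and `min` on heights: if `player_height` is stored as text in feet-inches form such as "6-1" (an assumption here; check `player["player_height"].dtype`), the comparison is alphabetical, so "6-9" ranks above "6-10". The sketch below converts to inches first. `height_to_inches` is a hypothetical helper and the values are toy data.

```python
import pandas as pd

def height_to_inches(height):
    """Convert a 'feet-inches' string such as '6-1' to total inches."""
    feet, inches = str(height).split("-")
    return int(feet) * 12 + int(inches)

# Toy heights, not taken from the competition data.
toy = pd.DataFrame({"player_height": ["6-1", "6-10", "5-9", "6-9"]})
toy["height_in"] = toy["player_height"].map(height_to_inches)

tallest_as_text = max(toy["player_height"])                            # alphabetical comparison
tallest_numeric = toy.loc[toy["height_in"].idxmax(), "player_height"]  # numeric comparison

print("Tallest by text comparison:", tallest_as_text)
print("Tallest by inches:", tallest_numeric)
```

If the column turns out to be numeric already, the conversion is unnecessary and the cells above work as written.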

### Oldest and youngest players (input file 12)

In [None]:
oldest = player[player['player_birth_year'] == min(player["player_birth_year"])]
oldest.head(2)

In [None]:
youngest = player[player['player_birth_year'] == max(player["player_birth_year"])]
youngest.head(2)

In [None]:
# np.ceil rounds up to the next whole number
mean = np.ceil(player['player_weight'].mean())
median = np.ceil(player['player_weight'].median())

## Players Weight Distribution (input file 12)

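The distribution is roughly bell-shaped, with a mean of 211.0 and a median of 206.0 pounds (both rounded up with np.ceil). A mean above the median suggests a slight right skew rather than a perfectly normal distribution.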

https://www.labxchange.org/library/items/lb:LabXchange:10d3270e:html:1

In [None]:
#Code by Chinta https://www.kaggle.com/chinta/mlb-is-player-age-important

plt.figure(figsize=(10, 5))
sns.set_style('white')
hist_plot = sns.histplot(player['player_weight'])
hist_plot.axvline(mean, color='r', linestyle='--', linewidth = 4, label = f'mean-{mean}')
hist_plot.axvline(median, color='g', linestyle='-', linewidth = 4, label = f'median-{median}')
plt.suptitle("Players Weight Distribution")
plt.legend();
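
Besides reading the histogram by eye, the shape can be checked with numbers. This sketch uses toy weights (not the competition data) and the pandas `skew()` method: a value near 0 suggests a symmetric distribution, a positive value a longer right tail, a negative value a longer left tail.

```python
import pandas as pd

# Toy weights in pounds with a few heavy players, not taken from the competition data.
weights = pd.Series([180, 185, 190, 195, 200, 205, 210, 250, 300, 330])

mean = weights.mean()
median = weights.median()
skew = weights.skew()   # sample skewness

print(f"mean={mean:.1f}, median={median:.1f}, skew={skew:.2f}")
if skew > 0:
    print("Longer right tail: the mean is pulled above the median.")
```

On the real data, the same three lines could be run on `player['player_weight']`.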

## Creating a new dataset for one play, filtered by play_id and game_id

In [None]:
# Select one play from one game and show its x, y and player_role columns

data = player.query('play_id == 55 and game_id == 2023112300')
print(data[["x", "y", "player_role"]])

## Scatter of player roles: alien-like :D

It's cool: it looks 3D without being 3D.

However, it's hard for a non-coder like me to interpret this scatter plot. In fact, it looks like an alien :D

I think **Rob Mulla was the first to plot** something like what I tried below, and many Kagglers copied him. It's always **Robikscube** who brings that awesome stuff.

After that, I tried the same scatter with Plotly.

In [None]:
# define a custom palette
my_palette = ["#95a5a6", "#e74c3c", "#34495e"]
sns.pairplot(
    data,
    x_vars=["x"],
    y_vars=["y"],
    height=3.5,
    hue="player_role", #hue define the color-code variable
    palette=my_palette,   # <-- see here, custom palette
)
plt.title('Players Roles')
plt.show()

## Same scatter with Plotly

In [None]:
## AI Overview

import plotly.express as px

#Two lines Required to Plot Plotly
import plotly.io as pio
pio.renderers.default = 'iframe'

# Create a scatter plot
fig = px.scatter(data, x="x", y="y", color="player_role",
                 title="Interactive Touchdown scatter with Plotly Express")
fig.show()

In [None]:
#By Lennart Haupts https://www.kaggle.com/code/lennarthaupts/getting-a-feel-for-the-tracking-data

# only looking at plays where play_direction is 'right' (the offense is moving to the right)
right = player[player['play_direction'] == 'right']
# only looking at a specific game
match = right[right['game_id'] == 2023112300]

## Passers Heatmap (original was tackles)

I'm still trying to work out how to show those Passers (originally tackles), and mostly this heatmap.

The original code filtered on event (I changed it to player_role) with the value tackle (I changed it to Passer). I had to improvise to get my heatmap. Very likely, it doesn't make sense to either programmers or football lovers.

In [None]:
#By Lennart Haupts https://www.kaggle.com/code/lennarthaupts/getting-a-feel-for-the-tracking-data

fig, ax = plt.subplots(figsize=(15,10))
plt.hist2d(right['x'][right['player_role'] == 'Passer'], right['y'][right['player_role'] == 'Passer'],bins=70, cmap='summer')
plt.xlim(3 , 105) #Original 0, 120  Adjust these limits to fit the lines of the plot.
plt.ylim(10,  43) #Original 0, 53.3
plt.title('Heatmap of Passer positions when the offense is moving to the right')
plt.show()
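
The heatmap above keeps only plays moving to the right. A common alternative, not something this notebook does, is to mirror the plays moving left so that every play points the same way and all rows can be used. The sketch below assumes the field dimensions noted in the comments above (x from 0 to 120, y from 0 to 53.3) and that `play_direction` takes the values 'left' and 'right'. `normalize_direction` is a hypothetical helper and the data is a toy example.

```python
import pandas as pd

FIELD_LENGTH = 120.0  # x range noted above: 0 to 120
FIELD_WIDTH = 53.3    # y range noted above: 0 to 53.3

def normalize_direction(df):
    """Return a copy where plays moving left are mirrored to move right."""
    out = df.copy()
    left = out["play_direction"] == "left"
    # Mirror both axes so the play keeps its shape but points the other way.
    out.loc[left, "x"] = FIELD_LENGTH - out.loc[left, "x"]
    out.loc[left, "y"] = FIELD_WIDTH - out.loc[left, "y"]
    out["play_direction"] = "right"
    return out

# Toy rows, not taken from the competition data.
toy = pd.DataFrame({
    "play_direction": ["right", "left"],
    "x": [40.0, 30.0],
    "y": [20.0, 10.0],
})
flipped = normalize_direction(toy)
print(flipped)
```

After this step, the `hist2d` call could be run on the whole frame instead of only the `right` subset.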

## Draft Session: 1h:56m  of my Touchdown in Rio de Janeiro 2026. 

![](https://www.manhadadiario.com.br/noticias/imagens/11757/1758933520.jpg)

## Acknowledgements

Sanjay V https://www.kaggle.com/code/sanjayv007/nfl-big-data-bowl-beginner-s-complete-eda

Chinta https://www.kaggle.com/chinta/mlb-is-player-age-important

Lennart Haupts https://www.kaggle.com/code/lennarthaupts/getting-a-feel-for-the-tracking-data

Rob Mulla https://www.kaggle.com/code/robikscube/sign-language-recognition-eda-twitch-stream