## Data Type Encoding & Mapping, Version 1

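- Encodes the play_type target as binary: pass = 1, run = 0
- One-hot encodes the object-dtype feature columns:
    - Creates one indicator column per category, except the first category of each feature, which is dropped (drop_first=True)
    - Values in the new columns are boolean
- Final shape: (318668, 121)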

In [1]:
# Imports
import pandas as pd

In [2]:
# Load filtered dataset
print('Loading CSV (this may take a while for large files)...')
nfl_filtered = pd.read_csv("../dataset/nfl_filtered.csv", low_memory=False, index_col=0)
print("Shape of original dataset: ", nfl_filtered.shape)

Loading CSV (this may take a while for large files)...
Shape of original dataset:  (318668, 20)


## Display feature datatypes

In [3]:
nfl_filtered.dtypes

posteam                        object
defteam                        object
posteam_type                   object
yardline_100                  float64
qtr                             int64
down                          float64
ydstogo                         int64
goal_to_go                    float64
score_differential            float64
game_half                      object
drive                           int64
posteam_timeouts_remaining    float64
defteam_timeouts_remaining    float64
shotgun                         int64
no_huddle                       int64
quarter_seconds_remaining     float64
half_seconds_remaining        float64
game_seconds_remaining        float64
side_of_field                  object
play_type                      object
dtype: object

## Encoding the play_type target

In [4]:
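# Work on a copy so the filtered DataFrame stays unchanged
nfl_encoded = nfl_filtered.copy()
# Binary target: pass -> 1, run -> 0 (any other label would become NaN)
nfl_encoded["play_type"] = nfl_filtered["play_type"].map({"pass": 1, "run": 0})
nfl_encoded["play_type"].head(5)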

play_id
68     1
92     0
113    1
162    0
183    1
Name: play_type, dtype: int64
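
Series.map returns NaN for any label that is missing from the mapping dict. The int64 dtype in the output above suggests the filtered data contains only pass and run, but a guard makes that assumption explicit. The sketch below is illustrative only: `encode_binary_target` is a hypothetical helper, not part of the notebook, and the toy labels are made up.

```python
import pandas as pd


def encode_binary_target(labels: pd.Series, mapping: dict) -> pd.Series:
    """Map labels to integers; raise if any label has no entry in the mapping."""
    encoded = labels.map(mapping)
    if encoded.isna().any():
        # Collect the offending labels (this includes missing values, if any)
        bad = labels[encoded.isna()].unique().tolist()
        raise ValueError(f"Labels without a mapping: {bad}")
    return encoded.astype("int64")


# Toy labels (hypothetical), mirroring the pass/run mapping used above
toy_labels = pd.Series(["pass", "run", "pass"], name="play_type")
toy_target = encode_binary_target(toy_labels, {"pass": 1, "run": 0})
print(toy_target.tolist())
```

Failing loudly here is a design choice: a silent NaN in the target would otherwise only surface later, for example as a float64 column or an error at model-fitting time.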

## One-Hot Encoding for object dtypes

Columns to be encoded:
- posteam
- defteam
- posteam_type
- game_half
- side_of_field

In [5]:
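# Object-dtype feature columns to one-hot encode
cols_to_encode = ["posteam", "defteam", "posteam_type", "game_half", "side_of_field"]

# drop_first=True drops the first category of each column, leaving k-1 indicators per feature
nfl_encoded = pd.get_dummies(nfl_encoded, columns=cols_to_encode, drop_first=True)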
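
To see what drop_first=True does on a small scale, here is a minimal sketch on a toy frame. The values are hypothetical and are not taken from the NFL dataset; only the pd.get_dummies call mirrors the cell above.

```python
import pandas as pd

# Toy frame with made-up values, for illustration only
toy = pd.DataFrame({
    "posteam_type": ["home", "away", "home"],
    "game_half": ["Half1", "Half2", "Overtime"],
    "down": [1.0, 2.0, 3.0],
})

toy_encoded = pd.get_dummies(
    toy, columns=["posteam_type", "game_half"], drop_first=True
)

# Untouched columns come first, followed by the indicators for each encoded
# column; the first category in sorted order ("away", "Half1") gets no column.
print(list(toy_encoded.columns))
```

A row whose indicators are all false for a given feature belongs to the dropped category, so no information is lost.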

In [6]:
nfl_encoded.info()
print("Shape of encoded dataset: ", nfl_encoded.shape)

<class 'pandas.core.frame.DataFrame'>
Index: 318668 entries, 68 to 4106
Columns: 121 entries, yardline_100 to side_of_field_WAS
dtypes: bool(106), float64(9), int64(6)
memory usage: 71.1 MB
Shape of encoded dataset:  (318668, 121)
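
The column count can be sanity-checked by hand: 20 original columns minus the 5 encoded ones leaves 15, and the 106 bool indicator columns bring the total to 121. The sketch below generalizes that arithmetic. `expected_width` is a hypothetical helper, shown on a toy frame with made-up values, and it assumes the default dummy_na=False, so missing values get no indicator column.

```python
import pandas as pd


def expected_width(df: pd.DataFrame, cols: list, drop_first: bool = True) -> int:
    """Predict the column count after pd.get_dummies(df, columns=cols, drop_first=...)."""
    kept = df.shape[1] - len(cols)  # columns that are passed through unchanged
    dropped_per_col = 1 if drop_first else 0
    # nunique() ignores NaN, matching get_dummies with dummy_na=False
    dummies = sum(df[c].nunique() - dropped_per_col for c in cols)
    return kept + dummies


# Toy data (hypothetical)
toy = pd.DataFrame({
    "team": ["A", "B", "C", "A"],
    "half": ["Half1", "Half2", "Half1", "Half2"],
    "down": [1.0, 2.0, 3.0, 4.0],
})
toy_cols = ["team", "half"]

predicted = expected_width(toy, toy_cols)
actual = pd.get_dummies(toy, columns=toy_cols, drop_first=True).shape[1]
print(predicted, actual)
```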


## Save encoded dataset

In [7]:
print('Saving CSV (this may take a while for large files)...')
nfl_encoded.to_csv("../dataset/nfl_encoded_v1.csv", index=True)

Saving CSV (this may take a while for large files)...
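
When the saved file is loaded again, passing index_col=0 restores the play_id index, and pandas parses the True/False text back into bool columns. The round trip below is a minimal sketch that uses an in-memory buffer and a toy frame with made-up values instead of the real CSV file.

```python
import io

import pandas as pd

# Toy stand-in for the encoded dataset (hypothetical values)
toy = pd.DataFrame(
    {"down": [1.0, 2.0], "play_type": [1, 0], "posteam_type_home": [True, False]},
    index=pd.Index([68, 92], name="play_id"),
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=True)
buffer.seek(0)

# index_col=0 turns the first CSV column back into the index
restored = pd.read_csv(buffer, index_col=0)
print(restored.dtypes)
```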
