## Data Type Encoding & Mapping Version 2

- Encodes play_type as 1 (pass) or 0 (run)
- Converts object types to categorical types for XGBoost and CatBoost models
- Converts down, goal_to_go, posteam_timeouts_remaining, defteam_timeouts_remaining from float to int (qtr is included in the cast but is already int64)
- Final Shape: (318668, 20)

In [4]:
# Imports
import pandas as pd

In [5]:
# Load filtered dataset
print('Loading CSV (this may take a while for large files)...')
nfl_filtered = pd.read_csv("../dataset/nfl_filtered.csv", low_memory=False, index_col=0)
print("Shape of original dataset: ", nfl_filtered.shape)

Loading CSV (this may take a while for large files)...
Shape of original dataset:  (318668, 20)


## Display feature datatypes

In [6]:
nfl_filtered.dtypes

posteam                        object
defteam                        object
posteam_type                   object
yardline_100                  float64
qtr                             int64
down                          float64
ydstogo                         int64
goal_to_go                    float64
score_differential            float64
game_half                      object
drive                           int64
posteam_timeouts_remaining    float64
defteam_timeouts_remaining    float64
shotgun                         int64
no_huddle                       int64
quarter_seconds_remaining     float64
half_seconds_remaining        float64
game_seconds_remaining        float64
side_of_field                  object
play_type                      object
dtype: object

## Encoding play_type target value

In [7]:
nfl_encoded = nfl_filtered.copy()

# Encode target
nfl_encoded["play_type"] = nfl_encoded["play_type"].map({"pass": 1, "run": 0})
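
`Series.map` with a dict returns NaN for any value that is not a key, so a label other than "pass" or "run" would silently become a missing target. The final dtypes in this notebook show play_type as int64, which suggests no such values were present. The sketch below shows one way to check this explicitly; the toy frame and the "punt" label are hypothetical stand-ins, not values from the dataset.

```python
import pandas as pd

# Toy frame standing in for nfl_filtered; "punt" is a hypothetical unexpected label
toy = pd.DataFrame({"play_type": ["pass", "run", "pass", "punt"]})

mapping = {"pass": 1, "run": 0}
encoded = toy["play_type"].map(mapping)

# Values missing from the mapping come back as NaN; list the offending labels
unmapped = toy.loc[encoded.isna(), "play_type"].unique().tolist()
print("Unmapped play_type values:", unmapped)
```

An empty list means every row was mapped and the column can safely hold integers.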

## Encode object types as categories for XGBoost and CatBoost models

In [8]:
# Convert categoricals to category dtype for XGBoost and CatBoost models

categorical_cols = ["posteam", "defteam", "posteam_type", "game_half", "side_of_field"]
for col in categorical_cols:
    nfl_encoded[col] = nfl_encoded[col].astype('category')
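
To see what the conversion does, the sketch below applies the same `astype('category')` call to a small hypothetical frame (the team codes and half labels are illustrative values, not taken from the dataset) and inspects the resulting categories and integer codes.

```python
import pandas as pd

# Hypothetical sample rows; the real notebook uses nfl_encoded
toy = pd.DataFrame({
    "posteam": ["KC", "BUF", "KC"],
    "game_half": ["Half1", "Half1", "Half2"],
})

for col in ["posteam", "game_half"]:
    toy[col] = toy[col].astype("category")

# Each category column stores a sorted list of labels plus one integer code per row
categories = list(toy["posteam"].cat.categories)
codes = toy["posteam"].cat.codes.tolist()
print(categories, codes)
```

The category dtype alone does not change how a model treats the column; the library still has to be told to use it. XGBoost's scikit-learn wrapper, for example, has an `enable_categorical` option for this.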

## Convert numeric data types

In [14]:
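# Columns that hold whole numbers; qtr is already int64, so its cast is a no-op
integer_cols = ['qtr', 'down', 'posteam_timeouts_remaining',
                'defteam_timeouts_remaining', 'goal_to_go']

# astype('int64') raises if a column still contains NaN
for col in integer_cols:
    nfl_encoded[col] = nfl_encoded[col].astype('int64')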
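
Casting float to int64 raises an error on NaN and silently truncates fractional values, so it can help to check both conditions first. The following is a minimal sketch, assuming the columns should contain only whole numbers; `safe_to_int` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def safe_to_int(frame, cols):
    """Cast columns to int64 only if they hold no NaN and no fractional values."""
    out = frame.copy()
    for col in cols:
        if out[col].isna().any():
            raise ValueError(f"{col} contains missing values")
        if not (out[col] % 1 == 0).all():
            raise ValueError(f"{col} contains non-integer values")
        out[col] = out[col].astype("int64")
    return out

# Hypothetical sample rows standing in for nfl_encoded
toy = pd.DataFrame({"down": [1.0, 2.0, 3.0], "goal_to_go": [0.0, 1.0, 0.0]})
converted = safe_to_int(toy, ["down", "goal_to_go"])
print(converted.dtypes)
```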


## Display feature datatypes after encoding

In [16]:
print("Shape of encoded dataset: ", nfl_encoded.shape)
nfl_encoded.dtypes


Shape of encoded dataset:  (318668, 20)


posteam                       category
defteam                       category
posteam_type                  category
yardline_100                   float64
qtr                              int64
down                             int64
ydstogo                          int64
goal_to_go                       int64
score_differential             float64
game_half                     category
drive                            int64
posteam_timeouts_remaining       int64
defteam_timeouts_remaining       int64
shotgun                          int64
no_huddle                        int64
quarter_seconds_remaining      float64
half_seconds_remaining         float64
game_seconds_remaining         float64
side_of_field                 category
play_type                        int64
dtype: object

## Save Encoded Dataset

In [17]:
print('Saving CSV (this may take a while for large files)...')
nfl_encoded.to_csv("../dataset/nfl_encoded_v2.csv", index=False)

Saving CSV (this may take a while for large files)...
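
CSV files carry no dtype information, so the category columns set above come back as plain text columns when the file is reloaded unless the dtype is given again. The sketch below illustrates this with an in-memory buffer and hypothetical sample rows. Because this notebook saves with `index=False`, the file would presumably be read back without `index_col=0`.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for nfl_encoded
toy = pd.DataFrame({"posteam": pd.Categorical(["KC", "BUF"]), "down": [1, 2]})

# Write to an in-memory buffer instead of ../dataset/nfl_encoded_v2.csv
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# A plain read loses the category dtype
buffer.seek(0)
plain = pd.read_csv(buffer)

# Passing dtype restores it for the listed columns
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"posteam": "category"})

print(plain.dtypes)
print(restored.dtypes)
```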
