# Flight Delay Classifier--PyTorch Edition
Because PyTorch has stopped publishing conda packages, I am making this separate notebook to classify flights as either delayed or on-time with a PyTorch neural net. The workflow is the same as in flight_delay_classifier_v1, only with the PyTorch framework, so we will gloss over some of the details.

In [27]:
import pandas as pd
import numpy as np
import os
import fnmatch
import matplotlib.pyplot as plt
from sklearn.preprocessing import FunctionTransformer, RobustScaler, OneHotEncoder
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.utils.validation import check_array, check_is_fitted
import torch

## Data Preparation
Here we prepare the functions that load and transform our data. We will use scikit-learn transformers to preprocess the features and then convert the resulting arrays to PyTorch tensors.

As one other note, the carrier code was much less important than departure date and time for our random forest model. In the cells below, the feature that is actually dropped is the destination airport; the carrier code is still one-hot encoded and passed to the network.

In [4]:
# Load all the PDX files
def load_flight_info(data_dir, pat_str):
    """
    Loads a single data frame containing all the data in data_dir with files matching the pat_str
    """
    df = pd.DataFrame()
    for entry in os.listdir(data_dir):
        # Construct full path
        full_path = os.path.join(data_dir, entry)

        # Check if it is actually a file and if it matches the pattern
        if os.path.isfile(full_path) and fnmatch.fnmatch(entry, pat_str):
            df = pd.concat([df, pd.read_csv(full_path, skiprows=7)])
            print(f'Loaded {full_path} added to dataframe.')

    return df

# sine and cosine transformer classes to be used for date and time variables
class SinTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, period=1):
        self.period = period

    def fit(self, X, y=None):
        X = check_array(X)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self)
        X = check_array(X)
        assert self.n_features_in_ == X.shape[1]
        return np.sin(2*np.pi * X / self.period)

    def get_feature_names_out(self, input_features=None):
        if input_features is None:
            # Use feature_names_in_ if available (set during fit)
            input_features = getattr(self, "feature_names_in_", [f'x{i}' for i in range(self.n_features_in_)])

        # Define how feature names are transformed
        return [f'{col}_sin' for col in input_features]

class CosTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, period=1):
        self.period = period

    def fit(self, X, y=None):
        X = check_array(X)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self)
        X = check_array(X)
        assert self.n_features_in_ == X.shape[1]
        return np.cos(2*np.pi * X / self.period)

    def get_feature_names_out(self, input_features=None):
        if input_features is None:
            # Use feature_names_in_ if available (set during fit)
            input_features = getattr(self, "feature_names_in_", [f'x{i}' for i in range(self.n_features_in_)])

        # Define how feature names are transformed
        return [f'{col}_cos' for col in input_features]
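
To illustrate why the date and time variables are passed through sine and cosine transformers, here is a minimal standalone sketch. The helper `cyclic_encode` and the sample hours are hypothetical and not part of the notebook; the formula is the same `2*pi*X/period` mapping used by the classes above.

```python
import numpy as np

def cyclic_encode(values, period):
    """Map values on a cycle of length `period` to (sin, cos) pairs."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.column_stack([np.sin(angle), np.cos(angle)])

# Departure hours just before and just after midnight, plus midday
hours = np.array([23.5, 0.5, 12.0])
encoded = cyclic_encode(hours, period=24)

# The raw values treat 23:30 and 00:30 as 23 hours apart
raw_gap = abs(hours[0] - hours[1])

# In (sin, cos) space the same two times are close neighbours,
# while 23:30 and 12:00 sit on nearly opposite sides of the circle
encoded_gap = np.linalg.norm(encoded[0] - encoded[1])
midday_gap = np.linalg.norm(encoded[0] - encoded[2])

print(f'raw gap: {raw_gap:.1f} h')
print(f'encoded gap, 23:30 vs 00:30: {encoded_gap:.3f}')
print(f'encoded gap, 23:30 vs 12:00: {midday_gap:.3f}')
```

Using both sine and cosine matters: either one alone maps two different times of day to the same value.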

In [69]:
df = load_flight_info('C:/Users/dloso/Documents/Data Science/flight-delay-forecasting/data/delays', 'PDX*')

Loaded C:/Users/dloso/Documents/Data Science/flight-delay-forecasting/data/delays\PDX_AA_Delays.csv added to dataframe.
Loaded C:/Users/dloso/Documents/Data Science/flight-delay-forecasting/data/delays\PDX_AS_Delays.csv added to dataframe.
Loaded C:/Users/dloso/Documents/Data Science/flight-delay-forecasting/data/delays\PDX_B6_Delays.csv added to dataframe.
Loaded C:/Users/dloso/Documents/Data Science/flight-delay-forecasting/data/delays\PDX_DL_Delays.csv added to dataframe.
Loaded C:/Users/dloso/Documents/Data Science/flight-delay-forecasting/data/delays\PDX_F9_Delays.csv added to dataframe.
Loaded C:/Users/dloso/Documents/Data Science/flight-delay-forecasting/data/delays\PDX_G4_Delays.csv added to dataframe.
Loaded C:/Users/dloso/Documents/Data Science/flight-delay-forecasting/data/delays\PDX_HA_Delays.csv added to dataframe.
Loaded C:/Users/dloso/Documents/Data Science/flight-delay-forecasting/data/delays\PDX_MQ_2_Delays.csv added to dataframe.
Loaded C:/Users/dloso/Documents/Data S

In [70]:
# one hot encoder for the categorical values
one_hot_encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

cat_pipeline = Pipeline([
    ('encoder', one_hot_encoder)
])

# Now set up which attributes get which transforms using a ColumnTransformer
time_attribs = ['DepartureTimeHour']
date_attribs = ['DayOfYear']
cat_attribs = ['Carrier Code']#, 'Destination Airport']

cyclic_cossin_transformer = ColumnTransformer(
    transformers=[
        ('categorical', one_hot_encoder, cat_attribs),
        ('day_sin', SinTransformer(period=365), date_attribs),
        ('day_cos', CosTransformer(period=365), date_attribs),
        ('hour_sin', SinTransformer(period=24), time_attribs),
        ('hour_cos', CosTransformer(period=24), time_attribs),
])

In [71]:
# Drop rows with no recorded delay before building the label
df.dropna(subset=['Departure delay (Minutes)'], inplace=True)

# A flight counts as delayed if it left any later than scheduled
df['Delayed'] = np.where(df['Departure delay (Minutes)'] > 0, 1, 0)

df['Date'] = pd.to_datetime(df['Date (MM/DD/YYYY)'])
df['DayOfYear'] = df['Date'].dt.dayofyear

# Scheduled departure as a fractional hour, e.g. 14:30 -> 14.5
scheduled = pd.to_datetime(df['Scheduled departure time'], format='%H:%M')
df['DepartureTimeHour'] = scheduled.dt.hour + scheduled.dt.minute / 60

df = df[['DayOfYear', 'DepartureTimeHour', 'Carrier Code', 'Destination Airport', 'Delayed']]
df = df[df['Carrier Code'] != ' SOURCE: Bureau of Transportation Statistics']

In [72]:
# Drop the destination airport
df = df[['DayOfYear', 'DepartureTimeHour', 'Carrier Code', 'Delayed']]

In [73]:
df_labels = pd.DataFrame(df['Delayed'])
df_features = df.drop('Delayed', axis=1)
df_features = cyclic_cossin_transformer.fit_transform(df_features)
X_train, X_test, y_train, y_test = train_test_split(df_features, df_labels, test_size=0.2)

## PyTorch Conversion
Luckily, PyTorch makes it simple to convert NumPy arrays into tensors, so we can quickly change over our features and labels for training. Note that we had the OneHotEncoder output a dense matrix (`sparse_output=False`), because `torch.from_numpy` accepts only NumPy arrays, not SciPy sparse matrices.

We will also build out Dataset and DataLoader classes to make sure that we are handling data efficiently in batches.

In [74]:
X_train_tensor = torch.from_numpy(X_train).float()
y_train_tensor = torch.from_numpy(y_train.values).long() # Use .long() for classification labels

X_test_tensor = torch.from_numpy(X_test).float()
y_test_tensor = torch.from_numpy(y_test.values).long()

In [75]:
from torch.utils.data import Dataset, DataLoader

class CustomTabularDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels
        self.length = len(labels)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

    def __len__(self):
        return self.length

# Create dataset instances
train_dataset = CustomTabularDataset(X_train_tensor, y_train_tensor)
test_dataset = CustomTabularDataset(X_test_tensor, y_test_tensor)

# Define DataLoaders
batch_size = 64
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
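
For plain in-memory tensors like these, PyTorch's built-in `TensorDataset` returns the same `(features, labels)` pairs as the custom class above. A minimal sketch, using randomly generated stand-in tensors (hypothetical, not the flight data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for X_train_tensor and y_train_tensor
features = torch.randn(100, 5)
labels = torch.randint(0, 2, (100, 1))

# TensorDataset indexes every tensor along the first dimension
dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Pull one batch to confirm the shapes the training loop will see
batch_features, batch_labels = next(iter(loader))
print(batch_features.shape, batch_labels.shape)
```

A custom `Dataset` is still the better choice once per-item logic is needed, such as lazy loading from disk.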

## Building the Neural Net
Finally, let's build out the neural net and see how it performs. We'll use 2 hidden layers of 10 nodes each. A more detailed investigation of the most accurate and efficient hidden-layer layout will wait for a future iteration.

We'll use binary cross-entropy for the loss function and Adam as the optimizer as standard choices and then run the training for 30 epochs.
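
As an aside on the loss choice, a common alternative to a sigmoid output with `nn.BCELoss` is to have the model return raw logits and use `nn.BCEWithLogitsLoss`, which applies the sigmoid internally and is documented as more numerically stable. This notebook does not use that variant. The sketch below, on made-up logits and targets, only shows that the two formulations compute the same quantity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical raw scores (before any sigmoid) and binary targets
logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()

# Formulation used in this notebook: sigmoid first, then BCELoss
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

# Alternative: pass the logits directly to BCEWithLogitsLoss
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_from_probs.item(), loss_from_logits.item())
```

If the logits variant were adopted, the final `torch.sigmoid` in `forward` would have to be removed and applied only at prediction time.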

In [76]:
import torch.nn as nn
import torch.nn.functional as F

class BinaryClassifier(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.layer_1 = nn.Linear(input_dim, 10)
        self.layer_2 = nn.Linear(10, 10)
        self.layer_out = nn.Linear(10, 1) # 1 output unit for binary classification

    def forward(self, x):
        x = F.relu(self.layer_1(x))
        x = F.relu(self.layer_2(x))
        x = torch.sigmoid(self.layer_out(x)) # Sigmoid activation for binary output
        return x

model = BinaryClassifier(input_dim=X_train.shape[1])

In [77]:
import torch.optim as optim

criterion = nn.BCELoss() # Binary Cross-Entropy Loss
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [78]:
epochs = 30

for epoch in range(epochs):
    for inputs, labels in train_loader:
        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        
        # Labels already have shape (batch_size, 1) because they come from a
        # single-column DataFrame, so BCELoss only needs them cast to float
        labels = labels.float()
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

    if epoch % 5 == 0:
        print(f'Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}')

Epoch 1/30, Loss: 0.4585
Epoch 6/30, Loss: 0.6080
Epoch 11/30, Loss: 0.6455
Epoch 16/30, Loss: 0.6248
Epoch 21/30, Loss: 0.5758
Epoch 26/30, Loss: 0.6656


## Testing and Evaluating the Model
We notice that the printed loss does not decrease monotonically with increasing epochs. Part of this is expected: the value printed is the loss of the last mini-batch in the epoch, not an epoch average, so it is noisy by construction. Beyond that, a few possibilities remain:
1. The learning rate is too large.
2. The loss function is unstable for the problem.
3. The features do not carry enough signal to learn from.

Having altered the learning rate and the loss function without any change in behavior, my conclusion is that there is simply not that much to be learned from this dataset. Interestingly, we arrive at 68.5% accuracy on the test set, which is nearly identical to every other model we have attempted. This again suggests some limitation in the data (or in my transformations of it).
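
One way to separate mini-batch noise from a genuine lack of convergence is to average the loss over every batch in an epoch before reporting it. The sketch below does this on synthetic data. The data, the small `nn.Sequential` model, and `lr=0.01` are all assumptions chosen to keep the example self-contained, not the notebook's settings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical synthetic data: the label depends only on the first feature
X = torch.randn(512, 4)
y = (X[:, :1] > 0).float()
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(5):
    running_loss = 0.0
    n_samples = 0
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        # Weight each batch by its size so a short final batch is not over-counted
        running_loss += loss.item() * inputs.size(0)
        n_samples += inputs.size(0)
    epoch_losses.append(running_loss / n_samples)
    print(f'Epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}')
```

If the epoch-average loss also stays flat on the flight data, that would support the conclusion that the features carry little signal.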

In [83]:
correct = 0
total = 0

# Set model to evaluation mode
model.eval() 

with torch.no_grad():
    for inputs, labels in test_loader:
        # Forward pass: the model ends in a sigmoid, so outputs are probabilities
        outputs = model(inputs)

        # Threshold at 0.5 to get the predicted class (0 = on-time, 1 = delayed)
        predicted = torch.round(outputs)
        
        # Accumulate metrics
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f'Accuracy on test set: {accuracy:.2f}%')

Accuracy on test set: 68.47%
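
A useful reference point for this number is the accuracy of always predicting the more common class. If the model's accuracy sits close to that baseline, the network may have learned little beyond the class balance. The sketch below shows the calculation on made-up labels. `majority_class_accuracy` and `example_labels` are hypothetical, and the notebook's actual class balance is not reported here.

```python
import numpy as np

def majority_class_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent class (binary labels)."""
    y_true = np.asarray(y_true).ravel()
    positive_rate = y_true.mean()
    return max(positive_rate, 1 - positive_rate)

# Hypothetical labels: 7 on-time flights and 3 delayed ones
example_labels = np.array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])
baseline = majority_class_accuracy(example_labels)
print(f'Majority-class baseline: {100 * baseline:.2f}%')
```

In this notebook the same function could be called as `majority_class_accuracy(y_test.values)`.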
