<h1> Task 1:  CIFAR-10 </h1>

In this assignment, you will build your own architecture for a (convolutional) neural network and try to get a reasonable model.  You may use the code below, or you may build your own architecture.  Create an architecture that achieves over 70% accuracy on the testing set for CIFAR-10.
   


In [2]:
import torch
import torchvision
import torchvision.transforms as transforms
#you may need to pip install the appropriate modules

In [3]:
# Preprocessing: torchvision makes it easy to download the data.

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
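
As a quick check on the preprocessing above: `ToTensor()` scales pixel values to [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 then computes (x - 0.5) / 0.5 per channel, so the inputs end up in [-1, 1]. The sketch below reproduces that arithmetic in plain Python. The helper names `normalize` and `unnormalize` are hypothetical and not part of torchvision; the inverse matches the `img / 2 + 0.5` step used when displaying images.

```python
# Plain-Python sketch of the per-channel arithmetic; helper names are hypothetical.
MEAN, STD = 0.5, 0.5

def normalize(x, mean=MEAN, std=STD):
    """Map a pixel value in [0, 1] to (x - mean) / std."""
    return (x - mean) / std

def unnormalize(y, mean=MEAN, std=STD):
    """Invert normalize(): y * std + mean, which is y / 2 + 0.5 here."""
    return y * std + mean

for pixel in (0.0, 0.25, 0.5, 1.0):
    z = normalize(pixel)
    print(f"{pixel:.2f} -> {z:+.2f} -> {unnormalize(z):.2f}")
```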

Let's show some images.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))


The code below is the one we used in class, which yielded an accuracy of around 56% on the testing set.  You will be creating your own CNN to get better accuracy.  You may either start from scratch or use the CNN below as a starting point.

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class Net_Improved(nn.Module):
    def __init__(self):
        super().__init__()
        # 1st Convolutional Block: 3 -> 64 filters
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.pool = nn.MaxPool2d(2, 2)
        
        # 2nd Convolutional Block: 64 -> 128 filters
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.conv4 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
        self.bn4 = nn.BatchNorm2d(128)

        # Output size after two 2x2 max-pools (32x32 -> 16x16 -> 8x8) 
        # is 128 channels * 8 * 8.
        self.fc1 = nn.Linear(128 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 128)
        self.fc3 = nn.Linear(128, 10)
        
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # 1st Block
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.pool(F.relu(self.bn2(self.conv2(x)))) # 64 channels, 16x16
        
        # 2nd Block
        x = F.relu(self.bn3(self.conv3(x)))
        x = self.pool(F.relu(self.bn4(self.conv4(x)))) # 128 channels, 8x8
        
        # Flatten and Fully Connected Layers
        x = torch.flatten(x, 1) 
        x = self.dropout(F.relu(self.fc1(x)))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net_Improved()
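
The `128 * 8 * 8` input size of `fc1` follows from the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The sketch below traces a 32x32 CIFAR-10 image through the two blocks above; `conv_out` is a hypothetical helper written for this check, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32  # CIFAR-10 images are 32x32
for block in (1, 2):
    size = conv_out(size, kernel=3, padding=1)  # first 3x3 conv, padding 1
    size = conv_out(size, kernel=3, padding=1)  # second 3x3 conv, padding 1
    size = conv_out(size, kernel=2, stride=2)   # 2x2 max-pool, stride 2
    print(f"after block {block}: {size}x{size}")

flat_features = 128 * size * size  # 128 channels after the second block
print(flat_features)  # 8192, the in_features of fc1
```

If you add a third pooling stage or change the padding, rerunning this arithmetic gives the new `in_features` for the first linear layer.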

In [None]:
# Loss function
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

The code below trains the network, but you may of course use your own code/your own version.

In [None]:
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

<h1>Include a chart of the training loss as a function of training progress (for example, per epoch or per logged mini-batch checkpoint).  Your chart should also include the accuracy.</h1>

In [None]:
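loss_history = []  # mean training loss per 2000 mini-batches
for epoch in range(5):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(trainloader, 0):
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)  # forward pass
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            loss_history.append(running_loss / 2000)
            running_loss = 0.0
print('Finished Training')
plt.plot(loss_history)  # training loss at each logged checkpoint
plt.xlabel('checkpoint (every 2000 mini-batches)')
plt.ylabel('training loss')
plt.show()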

In [None]:
# Evaluate on the testing set
net.eval()  # evaluation mode: BatchNorm uses running statistics, dropout is off

correct = 0
total = 0

with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total:.1f} %')

In [None]:
net.eval()  # evaluation mode: BatchNorm uses running statistics, dropout is off
# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')

<h1> Task 2:  Language Model </h1>

Consider the language model from class.  This was the one on Friday, April 4th.  Modify the "intents.json" and train a chatbot that can respond to more queries.  You may use the architecture in class, or implement the transformer architecture (optional).  Please include an output of your conversation with the bot with the following inputs:

1.  "Hi."
2.  "How are you doing?"
3.  "What is the weather like today."
4.  "What are your plans for the weekend?"
5.  "What are your plans for the summer?"


In [None]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import numpy as np
import json
import nltk
from nltk.stem.snowball import SnowballStemmer

# Initialize stemmer
stemmer = SnowballStemmer("english")

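# Helper function for tokenization
# (word_tokenize needs NLTK's tokenizer data; if it is missing, a one-time
# nltk.download('punkt'), or 'punkt_tab' on newer NLTK releases, should fetch it)
def tokenize(sentence):
    return nltk.word_tokenize(sentence)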

# Helper function for stemming
def stem(word):
    return stemmer.stem(word.lower())

# Helper function to create Bag-of-Words
def bag_of_words(tokenized_sentence, all_words):
    tokenized_sentence = [stem(w) for w in tokenized_sentence]
    bag = np.zeros(len(all_words), dtype=np.float32)
    for idx, w in enumerate(all_words):
        if w in tokenized_sentence:
            bag[idx] = 1.0
    return bag

# Load intents.json
with open('intents.json', 'r') as f:
    intents = json.load(f)

all_words = []
tags = []
xy = [] # list of (pattern, tag)

# Loop through each intent and pattern
for intent in intents['intents']:
    tag = intent['tag']
    tags.append(tag)
    for pattern in intent['patterns']:
        w = tokenize(pattern)
        all_words.extend(w)
        xy.append((w, tag))

# Filter and stem all words
ignore_words = ['?', '!', '.', ',']
all_words = [stem(w) for w in all_words if w not in ignore_words]
# Sort and remove duplicates
all_words = sorted(list(set(all_words)))
tags = sorted(list(set(tags)))

# Create training data
X_train = []
y_train = []
for (pattern_sentence, tag) in xy:
    bag = bag_of_words(pattern_sentence, all_words)
    X_train.append(bag)

    label = tags.index(tag)
    y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)

# Create a custom dataset class
class ChatDataset(Dataset):
    def __init__(self):
        self.n_samples = len(X_train)
        self.x_data = torch.from_numpy(X_train)
        self.y_data = torch.from_numpy(y_train).type(torch.LongTensor)

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.n_samples

# Hyperparameters and DataLoader
batch_size = 8
hidden_size = 8
output_size = len(tags)
input_size = len(X_train[0]) 
learning_rate = 0.001
num_epochs = 1000 # Increased for better performance

dataset = ChatDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=0)
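
To make the bag-of-words encoding above concrete, here is a small worked example in plain Python. It is only a sketch: `toy_tokenize` and `toy_stem` are hypothetical stand-ins for the NLTK tokenizer and the Snowball stemmer, and the six-word vocabulary is made up for illustration.

```python
# Toy stand-ins for the notebook's NLTK helpers (assumptions for illustration only).
def toy_tokenize(sentence):
    return sentence.replace("?", " ?").replace(".", " .").split()

def toy_stem(word):
    return word.lower()  # the notebook uses SnowballStemmer instead

def bag_of_words(tokenized_sentence, all_words):
    """Return a 0/1 vector marking which vocabulary words occur in the sentence."""
    stems = {toy_stem(w) for w in tokenized_sentence}
    return [1.0 if w in stems else 0.0 for w in all_words]

all_words = sorted({"hi", "how", "are", "you", "weather", "today"})
vector = bag_of_words(toy_tokenize("How are you doing?"), all_words)
print(all_words)  # ['are', 'hi', 'how', 'today', 'weather', 'you']
print(vector)     # [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
```

Words outside the vocabulary ("doing" here) are simply dropped, which is why adding more patterns to "intents.json" directly widens what the bot can recognize.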

In [None]:
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        # 3 fully connected layers
        self.l1 = nn.Linear(input_size, hidden_size)
        self.l2 = nn.Linear(hidden_size, hidden_size)
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU() # Activation function

    def forward(self, x):
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        out = self.relu(out)
        out = self.l3(out)
        # no activation and no softmax at the end (CrossEntropyLoss handles this)
        return out

In [None]:
# Instantiate model, loss, and optimizer
model = NeuralNet(input_size, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
for epoch in range(num_epochs):
    for (words, labels) in train_loader:
        outputs = model(words)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print(f'Final loss: {loss.item():.4f}')

In [None]:
import random

# Set model to evaluation mode
model.eval()

# Chat loop
print("Let's chat! Type 'quit' to exit.")
while True:
    sentence = input("You: ")
    if sentence == "quit":
        break

    # 1. Preprocess input
    sentence = tokenize(sentence)
    X = bag_of_words(sentence, all_words)
    X = X.reshape(1, X.shape[0])
    X = torch.from_numpy(X)

    # 2. Get model prediction
    output = model(X)
    _, predicted = torch.max(output, dim=1)
    
    # 3. Get predicted tag
    tag = tags[predicted.item()]

    # 4. Check confidence (optional, but good practice)
    # The output logit needs softmax to become probability
    probs = torch.softmax(output, dim=1)
    prob = probs[0][predicted.item()]

    if prob.item() > 0.75: # Confidence threshold
        for intent in intents['intents']:
            if tag == intent["tag"]:
                # 5. Output response
                response = random.choice(intent['responses'])
                print(f"Bot: {response}")
                break
    else:
        print(f"Bot: I do not understand...")
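
The confidence check in the chat loop can be traced by hand. The sketch below reimplements softmax in plain Python and applies the same 0.75 threshold; the tag names and logits are made up for illustration, and `respond` is a hypothetical helper rather than part of the notebook.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def respond(logits, tags, threshold=0.75):
    """Return the predicted tag, or None when the model is not confident enough."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return tags[best] if probs[best] > threshold else None

tags = ["greeting", "weather", "weekend"]  # hypothetical tags
print(respond([4.0, 0.5, 0.1], tags))  # one logit dominates: 'greeting'
print(respond([1.0, 0.9, 0.8], tags))  # nearly uniform probabilities: None
```

With many intents and few patterns per intent, the probabilities tend to be spread out, so a lower threshold may be needed before the bot answers at all.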

<h1> Task 3: Gradient boosting </h1>

In this task, you will be using a gradient boosting method to predict flight departure delays.
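
For intuition about what a gradient boosting method does, the following is a minimal sketch in plain Python: each round fits a one-split decision stump to the current residuals (the negative gradient of squared loss) and adds a damped copy of it to the ensemble. It is a toy regression example on made-up data, not how XGBoost is implemented; XGBoost uses deeper trees, a classification loss here, second-order gradient information, and regularization.

```python
def fit_stump(x, residuals):
    """Best single-threshold split on a 1-D feature, minimising squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def gradient_boost(x, y, n_rounds=20, learning_rate=0.5):
    """Additive model: mean of y plus damped stumps fitted to residuals."""
    base = sum(y) / len(y)
    stumps = []
    preds = [base] * len(y)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is just the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * stump(xi) for xi, pi in zip(x, preds)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]                  # toy feature
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]      # toy target
model = gradient_boost(x, y)
mse = sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)
print(f"training MSE after boosting: {mse:.4f}")
```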

In [None]:
import warnings

warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier #you may need to pip install this

In [None]:
train = pd.read_csv("flight_delays_train.csv")
test = pd.read_csv("flight_delays_test.csv")

<h2> Given the flight departure time, the carrier's code, the departure airport, the destination, and the flight distance, predict whether the departure will be delayed by more than 15 minutes. </h2>
The code below uses only two features (Distance and departure time).  

In [None]:
X_train = train[["Distance", "DepTime"]].values
y_train = train["dep_delayed_15min"].map({"Y": 1, "N": 0}).values
X_test = test[["Distance", "DepTime"]].values

X_train_part, X_valid, y_train_part, y_valid = train_test_split(
    X_train, y_train, test_size=0.3, random_state=17
)

In [None]:
xgb_model = XGBClassifier(random_state=17)

xgb_model.fit(X_train_part, y_train_part)
xgb_valid_pred = xgb_model.predict_proba(X_valid)[:, 1]

<h2> Your task!</h2>
Try to create a better model!  What counts as 'better' is up to you; you could spend a lot of time tuning a good model. The evaluation metric is the ROC AUC (a quick Google search will turn up more about it).  Some starting points: change which features you use, and try creating new features.  Evaluate your model on the testing set (see "flight_delays_test.csv" above).  This was originally a Kaggle competition: https://www.kaggle.com/c/flight-delays-spring-2018.  If you want, you can submit your results there to see how well you did (the testing labels are deliberately not part of this data set).
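
One way to read the ROC AUC: it is the probability that a randomly chosen delayed flight receives a higher predicted score than a randomly chosen on-time flight, with ties counted as one half. The sketch below computes it directly from that definition on four made-up examples. The `roc_auc` helper is hypothetical and quadratic in the number of samples, so use `roc_auc_score` from scikit-learn on the real data.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1]           # toy labels
s = [0.1, 0.4, 0.35, 0.8]  # toy predicted probabilities
print(roc_auc(y, s))  # 0.75: three of the four positive/negative pairs are ordered correctly
```

Because only the ordering of the scores matters, the metric does not depend on the 0.5 decision threshold, which is why the notebook scores `predict_proba` output rather than hard class predictions.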

In [None]:
# Convert target variable from 'Y'/'N' to 1/0
train["dep_delayed_15min"] = train["dep_delayed_15min"].map({"Y": 1, "N": 0})
y_train = train["dep_delayed_15min"].values

# Combine train and test for consistent feature engineering (but keep labels separate!)
full_data = pd.concat([train.drop("dep_delayed_15min", axis=1), test], ignore_index=True)

# 1. Feature Engineering: Departure Hour
# DepTime is in HHMM format (e.g., 1934 for 7:34 PM). Extract the hour.
# The modulo wraps an hour of 24 (DepTime 2400, midnight) to 0; it also wraps any
# later values, on the assumption that times past 2400 roll over to the next day.
full_data['DepHour'] = (full_data['DepTime'] // 100) % 24

# 2. Categorical Features for Target Encoding
cat_features = ['UniqueCarrier', 'Origin', 'Dest']

# Target Encoding (mean of the target for each category)
# The means use training labels only (the test set has none). They are computed
# on all training rows, including those later held out for validation, so the
# validation AUC is likely somewhat optimistic.
global_mean = train['dep_delayed_15min'].mean()
for feature in cat_features:
    # Calculate the mean target for each category in the training set
    target_mean = train.groupby(feature)['dep_delayed_15min'].mean()

    # Create the new encoded feature name
    encoded_feature_name = f'{feature}_Target_Encoded'

    # Map the mean values to the full dataset; categories not seen in train
    # map to NaN and are filled with the global mean
    full_data[encoded_feature_name] = full_data[feature].map(target_mean).fillna(global_mean)
    
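# 3. Final Feature Set
# Month and DayOfWeek may be stored as strings such as 'c-8' (an assumption about
# this copy of the data); strip that prefix so XGBoost receives plain integers.
for col in ['Month', 'DayOfWeek']:
    full_data[col] = full_data[col].astype(str).str.replace('c-', '', regex=False).astype(int)
# These could also be treated as categorical or binned for better performance.
# We'll use the simplest effective set for improvement:
features = ['Distance', 'DepTime', 'DepHour', 
            'UniqueCarrier_Target_Encoded', 'Origin_Target_Encoded', 'Dest_Target_Encoded',
            'Month', 'DayOfWeek']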

# Split back into train and test
X_train = full_data.iloc[:len(train)][features].values
X_test = full_data.iloc[len(train):][features].values

# Re-define train/validation split
X_train_part, X_valid, y_train_part, y_valid = train_test_split(
    X_train, y_train, test_size=0.3, random_state=17
)
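
A stricter way to use target encoding is to fit the category means on the training part only and then apply them to the validation part, so that validation labels never influence the features. The sketch below shows that pattern with plain Python dictionaries; the carrier codes and labels are toy values, and `fit_target_encoding` and `apply_target_encoding` are hypothetical helper names.

```python
from collections import defaultdict

def fit_target_encoding(categories, targets):
    """Mean target per category, plus the global mean as a fallback."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    global_mean = sum(targets) / len(targets)
    return {c: sums[c] / counts[c] for c in sums}, global_mean

def apply_target_encoding(categories, means, global_mean):
    """Encode categories; unseen ones fall back to the global mean."""
    return [means.get(c, global_mean) for c in categories]

# Toy training part and validation part (made-up values)
train_carrier = ["AA", "AA", "UA", "UA", "UA", "DL"]
train_delayed = [1, 0, 1, 1, 0, 0]
valid_carrier = ["AA", "DL", "WN"]  # "WN" never appears in the training part

means, global_mean = fit_target_encoding(train_carrier, train_delayed)
print(apply_target_encoding(valid_carrier, means, global_mean))  # [0.5, 0.0, 0.5]
```

In the notebook this would mean splitting the rows first and computing the `groupby` means on the training part only; smoothing the means toward the global mean is a common further refinement for rare categories.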

In [None]:
xgb_model = XGBClassifier(random_state=17)

xgb_model.fit(X_train_part, y_train_part)
xgb_valid_pred = xgb_model.predict_proba(X_valid)[:, 1]

In [None]:
roc_auc_score(y_valid, xgb_valid_pred)

<h1> Task 4: Some questions (no code) </h1>
 
1. What was the architecture you used in the CNN for CIFAR-10?  How many features did you have?

2. How would you attempt to improve your model in (1)?

3. When do you think gradient-boosting methods would perform well, and when would they perform poorly?

4. (Optional) Compare XGBoost to logistic regression in Task 3, and implement transformers in Task 2.

In [7]:
#     1. Convolution --> Batch Normalization --> ReLU --> Convolution --> Batch Normalization
#     --> ReLU --> max pooling, repeated for two blocks (3-->64 and 64-->128), followed by three
#     fully connected layers. There were 8192 features (128 channels * 8 * 8) after flattening.
#     2. May want to use deep residual networks.
#     3. GB models perform well on structured (tabular) data, data with mixed feature types, and large data sets.
#     GB models do not perform well on unstructured data, on tasks requiring real-time prediction,
#     and on extremely large data sets.