# [Model Compression](https://www.cs.cornell.edu/~caruana/compression.kdd06.pdf)

## Intro

* **Gap**: In some applications a model must not only be highly accurate but also meet strict time and space requirements. However, the best-performing models are usually large and slow, while fast, compact models tend to be less accurate.
* **Improvement**: A neural network can mimic the function learned by an ensemble. The ensemble first labels the unlabeled data with its predictions, and the neural network is then trained on this extended dataset. The paper reports that the network achieves similar performance while being 1000x smaller and faster.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
from sklearn.model_selection import train_test_split

## Data

In [2]:
df = pd.read_csv('../data/AirlineSatisfaction/prepared.csv')
target_column = "isSatisfied"
X = df.drop(columns=[target_column])
y = df[target_column]

In [3]:
X, X_unlabeled, y, y_unlabeled = train_test_split(X, y, test_size=0.1, random_state=42, stratify=y)
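
As a quick sanity check on the data budget, the arithmetic below works out what share of the original rows ends up in each subset once this 10% hold-out is combined with the 80/20 train/test split applied in the Ensemble section. It is only a sketch of the proportions; actual row counts depend on the dataset size and on rounding inside `train_test_split`.

```python
from fractions import Fraction

# Fractions taken from the two train_test_split calls in this notebook
unlabeled_share = Fraction(1, 10)          # test_size=0.1 -> "unlabeled" pool
remaining = 1 - unlabeled_share            # rows kept as labeled data
test_share = remaining * Fraction(2, 10)   # test_size=0.2 of the labeled rows
train_share = remaining - test_share

shares = {"train": train_share, "test": test_share, "unlabeled": unlabeled_share}
for name, share in shares.items():
    print(f"{name}: {float(share):.0%}")
```

So the student's extra training data in the Distillation section amounts to a tenth of the original rows, on top of the labeled training share.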

## Ensemble

In [4]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from tqdm import tqdm

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
lr = LogisticRegression(max_iter=500)
# svm = SVC(probability=True, max_iter=1000, random_state=42)
knn = KNeighborsClassifier()
# Named `mlp` rather than `nn` so it is not shadowed by `import torch.nn as nn` later on
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=250, random_state=42, verbose=True)

ensemble = VotingClassifier(estimators=[('rf', rf), ('gb', gb), ('lr', lr), ('knn', knn), ('nn', mlp)], voting='soft')

ensemble.fit(X_train, y_train)

y_pred = ensemble.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Ensemble Model Accuracy: {accuracy:.4f}")

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Iteration 1, loss = 11.37410151
Iteration 2, loss = 7.32667968
Iteration 3, loss = 6.62887457
Iteration 4, loss = 4.69135660
Iteration 5, loss = 6.45999439
Iteration 6, loss = 3.68290125
Iteration 7, loss = 5.28672309
Iteration 8, loss = 5.35402399
Iteration 9, loss = 4.95053401
Iteration 10, loss = 3.55727248
Iteration 11, loss = 5.74648083
Iteration 12, loss = 4.45347841
Iteration 13, loss = 4.01830633
Iteration 14, loss = 4.82615149
Iteration 15, loss = 4.25791396
Iteration 16, loss = 4.69685177
Iteration 17, loss = 4.81809310
Iteration 18, loss = 4.25753665
Iteration 19, loss = 5.35811330
Iteration 20, loss = 4.21231975
Iteration 21, loss = 5.15376243
Training loss did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping.
Ensemble Model Accuracy: 0.9127
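
To make `voting='soft'` concrete: soft voting averages the class probabilities returned by each estimator's `predict_proba` and picks the class with the highest mean. The sketch below reproduces that rule with NumPy on made-up probabilities for three hypothetical models; the numbers are illustrative, not outputs from this notebook.

```python
import numpy as np

# Hypothetical predict_proba outputs of three models for two samples.
# Shape: (n_models, n_samples, n_classes); values are invented for illustration.
probas = np.array([
    [[0.90, 0.10], [0.4, 0.6]],
    [[0.45, 0.55], [0.3, 0.7]],
    [[0.45, 0.55], [0.6, 0.4]],
])

mean_proba = probas.mean(axis=0)        # average over models (equal weights)
soft_vote = mean_proba.argmax(axis=1)   # class with the highest mean probability

# Hard voting for comparison: majority of each model's own argmax (binary case)
hard_vote = (probas.argmax(axis=2).sum(axis=0) > probas.shape[0] / 2).astype(int)

print(mean_proba)
print(soft_vote, hard_vote)
```

For the first sample the two rules disagree: one very confident model outweighs two lukewarm ones under soft voting, while hard voting follows the majority.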


## Neural Network

In [18]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

In [19]:
# Define the neural network model
class BinaryClassifier(nn.Module):
    def __init__(self):
        super(BinaryClassifier, self).__init__()
        self.fc1 = nn.Linear(23, 5)  # 23 input features to 5 hidden units
        self.fc2 = nn.Linear(5, 1)  # 5 hidden units to 1 output logit

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # ReLU activation after first layer
        x = self.fc2(x)              # No activation here, since we'll apply sigmoid in loss
        return x
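
To give a feel for how small this student network is, the sketch below counts weights and biases for a stack of fully connected layers. The comparison with the ensemble's `MLPClassifier` assumes that scikit-learn uses a single output unit for binary problems; the tree-based and k-NN members of the ensemble are not counted here because they have no comparable parameter count.

```python
def count_dense_params(layer_sizes):
    """Weights plus biases of a fully connected net with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

student_params = count_dense_params([23, 5, 1])   # BinaryClassifier: 23 -> 5 -> 1
mlp_params = count_dense_params([23, 64, 1])      # ensemble MLP: 23 -> 64 -> 1 (assumed)
print(student_params, mlp_params)
```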

In [20]:
model = BinaryClassifier()
criterion = nn.BCEWithLogitsLoss()  # Binary cross-entropy on raw logits (sigmoid is applied inside the loss)
optimizer = optim.Adam(model.parameters(), lr=0.005)
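
`BCEWithLogitsLoss` fuses the sigmoid and the binary cross-entropy into one numerically stable expression, which is why the model's `forward` returns raw logits. The pure-Python sketch below shows a per-sample version of that computation next to the textbook form; it is meant to illustrate the formula, not to replicate PyTorch's implementation, which also averages over the batch by default.

```python
import math

def bce_with_logits(z, y):
    """Numerically stable binary cross-entropy for a single logit z and target y."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def naive_bce(z, y):
    """Textbook form: sigmoid first, then cross-entropy (less stable for large |z|)."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

for z, y in [(2.0, 1.0), (-1.5, 0.0), (0.3, 1.0)]:
    print(z, y, bce_with_logits(z, y), naive_bce(z, y))
```

The stable form stays finite for very large logits, where the naive version would end up taking the logarithm of zero.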

In [21]:
# Convert data to tensor
X_train_tensor = torch.tensor(X_train.to_numpy(), dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.to_numpy(), dtype=torch.float32)

# Create DataLoader for batching
dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(dataset, batch_size=128, shuffle=True)

In [22]:
# Training loop
num_epochs = 20
for epoch in range(num_epochs):
    model.train()  # Set the model to training mode
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()  # Clear the gradients

        # Forward pass
        outputs = model(inputs)
        
        # Compute loss (BCEWithLogitsLoss applies the sigmoid internally)
        loss = criterion(outputs, labels.unsqueeze(1))
        
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()

    # Print loss for every epoch
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader)}")

Epoch [1/20], Loss: 0.34164809515843025
Epoch [2/20], Loss: 0.2697305103143056
Epoch [3/20], Loss: 0.25815077085270843
Epoch [4/20], Loss: 0.2491979707764764
Epoch [5/20], Loss: 0.24076046637999707
Epoch [6/20], Loss: 0.229980640571851
Epoch [7/20], Loss: 0.2216198118069233
Epoch [8/20], Loss: 0.21552911436455882
Epoch [9/20], Loss: 0.20998504518443703
Epoch [10/20], Loss: 0.20746769293760642
Epoch [11/20], Loss: 0.20467149634391835
Epoch [12/20], Loss: 0.20337636689854482
Epoch [13/20], Loss: 0.2018525931570265
Epoch [14/20], Loss: 0.19998608691315364
Epoch [15/20], Loss: 0.1985171880859595
Epoch [16/20], Loss: 0.19794716728039277
Epoch [17/20], Loss: 0.19689710854719847
Epoch [18/20], Loss: 0.19580699935173376
Epoch [19/20], Loss: 0.19479812843422603
Epoch [20/20], Loss: 0.19416763638584023


In [23]:
# Convert data to tensor
X_test_tensor = torch.tensor(X_test.to_numpy(), dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.to_numpy(), dtype=torch.float32)

# Create DataLoader for batching
dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(dataset, batch_size=64, shuffle=False)  # Shuffling is unnecessary for evaluation

In [24]:
correct = 0
total = 0

model.eval()  # Set the model to evaluation mode (turns off dropout, batchnorm, etc.)

with torch.no_grad():  # No need to calculate gradients during evaluation
    for inputs, labels in test_loader:
        outputs = model(inputs)
        
        # Apply sigmoid to turn logits into probabilities, then threshold at 0.5
        preds = torch.sigmoid(outputs) > 0.5
        
        # Count correct predictions
        correct += (preds.squeeze() == labels.squeeze()).sum().item()
        total += labels.size(0)

accuracy = correct / total * 100
print(f'Accuracy: {accuracy:.2f}%')

Accuracy: 92.11%
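
Because the sigmoid is monotonic and maps 0 to 0.5, thresholding the probability at 0.5 gives the same decision as thresholding the raw logit at 0. The short check below illustrates this on a few made-up logits; it is a side note on the evaluation step, not a change to it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

logits = [-3.0, -0.2, 0.0, 0.4, 5.0]   # made-up raw model outputs

via_sigmoid = [sigmoid(z) > 0.5 for z in logits]   # what the evaluation loop does
via_logit = [z > 0.0 for z in logits]              # equivalent shortcut on the logits

print(via_sigmoid == via_logit)
```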


## Distillation

In [25]:
# Pseudo-label the held-out pool with the ensemble's hard (0/1) predictions
y_unlabeled_pred = ensemble.predict(X_unlabeled)

In [26]:
accuracy = accuracy_score(y_unlabeled, y_unlabeled_pred)
print(f"Ensemble Model Accuracy: {accuracy:.4f}")

Ensemble Model Accuracy: 0.9571


In [27]:
X_train_new = np.concatenate([X_train.to_numpy(), X_unlabeled.to_numpy()])
y_train_new = np.concatenate([y_train.to_numpy(), y_unlabeled_pred])
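
The two `np.concatenate` calls above are the core of the compression step: rows labeled by the ensemble are appended to the rows with true labels. The toy sketch below shows the same pattern with NumPy only; `teacher_predict` is a hypothetical stand-in for `ensemble.predict`, and the data is randomly generated rather than taken from the airline dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_predict(X):
    """Hypothetical teacher: stands in for ensemble.predict in this toy example."""
    return (X.sum(axis=1) > 0).astype(int)

# Small labeled set and a larger pool without labels (synthetic data)
X_labeled = rng.normal(size=(100, 3))
y_labeled = (X_labeled.sum(axis=1) > 0).astype(int)
X_pool = rng.normal(size=(400, 3))

# The teacher supplies labels for the pool; both parts are stacked in the same order
y_pool_pseudo = teacher_predict(X_pool)
X_student = np.concatenate([X_labeled, X_pool])
y_student = np.concatenate([y_labeled, y_pool_pseudo])

print(X_student.shape, y_student.shape)
```

The student then treats `X_student` and `y_student` like any ordinary training set. Like the notebook, this sketch uses hard 0/1 labels; training on `predict_proba` outputs instead would be a possible variation, since `BCEWithLogitsLoss` also accepts targets between 0 and 1.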

In [28]:
model = BinaryClassifier()
criterion = nn.BCEWithLogitsLoss()  # Binary cross-entropy on raw logits (sigmoid is applied inside the loss)
optimizer = optim.Adam(model.parameters(), lr=0.005)

In [30]:
# Convert data to tensor
X_train_tensor = torch.tensor(X_train_new, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train_new, dtype=torch.float32)

# Create DataLoader for batching
dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(dataset, batch_size=128, shuffle=True)

In [31]:
# Training loop
num_epochs = 20
for epoch in range(num_epochs):
    model.train()  # Set the model to training mode
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()  # Clear the gradients

        # Forward pass
        outputs = model(inputs)
        
        # Compute loss (BCEWithLogitsLoss applies the sigmoid internally)
        loss = criterion(outputs, labels.unsqueeze(1))
        
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()

    # Print loss for every epoch
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader)}")

Epoch [1/20], Loss: 0.2940400201860849
Epoch [2/20], Loss: 0.1887842782803842
Epoch [3/20], Loss: 0.17141822811674784
Epoch [4/20], Loss: 0.16301834965879852
Epoch [5/20], Loss: 0.15718934070822355
Epoch [6/20], Loss: 0.15329945792217511
Epoch [7/20], Loss: 0.15042429366470636
Epoch [8/20], Loss: 0.14892071888253494
Epoch [9/20], Loss: 0.14738787536707906
Epoch [10/20], Loss: 0.14636520295775868
Epoch [11/20], Loss: 0.1455314329156915
Epoch [12/20], Loss: 0.1451026990383237
Epoch [13/20], Loss: 0.1444618572500554
Epoch [14/20], Loss: 0.1438194095036826
Epoch [15/20], Loss: 0.1429416001424775
Epoch [16/20], Loss: 0.14319644922631103
Epoch [17/20], Loss: 0.1424717718215139
Epoch [18/20], Loss: 0.14307365318139395
Epoch [19/20], Loss: 0.14244857138475855
Epoch [20/20], Loss: 0.14176064166422184


In [32]:
# Convert data to tensor
X_test_tensor = torch.tensor(X_test.to_numpy(), dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.to_numpy(), dtype=torch.float32)

# Create DataLoader for batching
dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(dataset, batch_size=64, shuffle=False)  # Shuffling is unnecessary for evaluation

In [33]:
correct = 0
total = 0

model.eval()  # Set the model to evaluation mode (turns off dropout, batchnorm, etc.)

with torch.no_grad():  # No need to calculate gradients during evaluation
    for inputs, labels in test_loader:
        outputs = model(inputs)
        
        # Apply sigmoid to turn logits into probabilities, then threshold at 0.5
        preds = torch.sigmoid(outputs) > 0.5
        
        # Count correct predictions
        correct += (preds.squeeze() == labels.squeeze()).sum().item()
        total += labels.size(0)

accuracy = correct / total * 100
print(f'Accuracy: {accuracy:.2f}%')

Accuracy: 93.88%


In [None]:
# Ensemble: 0.9555
# Small Neural Net: 0.9211
# Small Neural Net w/ Model Compression: 0.9388

# roughly 1.8 percentage points higher test accuracy (92.11% -> 93.88%)
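
One way to put the gain in perspective is to express it relative to the errors the baseline network makes. The sketch below does that arithmetic from the two test accuracies printed above; it reflects a single run with fixed seeds, so it should be read as indicative rather than as a statistically robust estimate.

```python
baseline_acc = 92.11     # small net trained on labeled data only (printed above)
compressed_acc = 93.88   # small net trained with ensemble-labeled data added

gain_pp = compressed_acc - baseline_acc                   # absolute gain in percentage points
relative_error_reduction = gain_pp / (100 - baseline_acc)  # share of baseline errors removed

print(f"{gain_pp:.2f} percentage points, {relative_error_reduction:.1%} fewer test errors")
```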