**Vector Quantization Overview:** (Do Not Run The Code In This Cell)

In [None]:
Original word vector for 'cat': [ 0.45 -0.32  0.12  0.91 -0.56]

Quantized word vector (using centroids):
[ 0.445 -0.56   0.12   0.445 -0.56 ]

Cluster centroids (lookup table):
[ 0.445 -0.56   0.12 ]

Indices of the quantized word vector (pointing to centroids):
[0 1 2 0 1]

Dequantized word vector for 'cat': [ 0.445 -0.56   0.12   0.445 -0.56 ]
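
The mapping from a vector to centroid indices and back can be sketched in a few lines of NumPy. This is a minimal illustration that assumes each entry is assigned to its nearest centroid by absolute distance; the helper names `quantize` and `dequantize` are hypothetical and not part of any library.

```python
import numpy as np

def quantize(vector, centroids):
    """Return, for each entry of `vector`, the index of its nearest centroid."""
    # Pairwise absolute distances, shape (len(vector), len(centroids))
    distances = np.abs(vector[:, None] - centroids[None, :])
    return distances.argmin(axis=1)

def dequantize(indices, centroids):
    """Rebuild an approximate vector by looking each index up in the codebook."""
    return centroids[indices]

cat = np.array([0.45, -0.32, 0.12, 0.91, -0.56])
centroids = np.array([0.445, -0.56, 0.12])  # lookup table from the overview

indices = quantize(cat, centroids)
approx = dequantize(indices, centroids)
print(indices)
print(approx)
```

Only `indices` and `centroids` need to be stored; `approx` is rebuilt on demand.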

**VPTQ:**

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from torch.utils.data import DataLoader, TensorDataset

# 1. Simple Neural Network in PyTorch
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(2, 16)
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))
        return x

# 2. Generate a simple dataset (two moons)
def generate_data():
    X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
    X, y = torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)
    return X, y

# 3. Train function for the simple neural network
def train(model, loader, criterion, optimizer, epochs=10):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        for inputs, labels in loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs.squeeze(), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch {epoch+1}, Loss: {running_loss / len(loader)}")

# 4. Function to perform Vector Quantization (VQ)
def vector_quantization(weights, num_clusters=8):
    shape = weights.shape
    # Reshape to a single column so that each scalar weight is one sample for KMeans
    weights_flat = weights.reshape(-1, 1)

    # Use KMeans for vector quantization
    kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(weights_flat)

    # Replace weights with the nearest centroid
    quantized_weights = kmeans.cluster_centers_[kmeans.labels_]

    # Reshape back to the original weight shape
    return quantized_weights.reshape(shape)

# 5. Function to apply quantization to the model
def apply_quantization(model, num_clusters=8):
    with torch.no_grad():
        for param in model.parameters():
            if len(param.shape) > 1:  # Only quantize weights (not biases)
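                # detach() and cpu() make the NumPy conversion safe regardless of grad mode or device
                quantized_weights = vector_quantization(param.detach().cpu().numpy(), num_clusters)
                param.copy_(torch.tensor(quantized_weights, dtype=param.dtype))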

# 6. Main script
def main():
    # Generate dataset and create DataLoader
    X, y = generate_data()
    dataset = TensorDataset(X, y)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    # Initialize model, loss function, and optimizer
    model = SimpleNN()
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    print("Training the model before quantization:")
    # Train the model for a few epochs
    train(model, loader, criterion, optimizer, epochs=10)

    # Apply vector quantization to the weights
    print("\nApplying vector quantization to model weights...")
    apply_quantization(model, num_clusters=8)

    print("Training the model after quantization:")
    # Continue training from the quantized weights. The optimizer updates are
    # unconstrained, so the weights drift away from the 8 centroids again.
    train(model, loader, criterion, optimizer, epochs=10)

if __name__ == "__main__":
    main()

Training the model before quantization:
Epoch 1, Loss: 0.6754213478416204
Epoch 2, Loss: 0.6464092042297125
Epoch 3, Loss: 0.6132115349173546
Epoch 4, Loss: 0.5718646757304668
Epoch 5, Loss: 0.5307172844186425
Epoch 6, Loss: 0.49559501465409994
Epoch 7, Loss: 0.46544008888304234
Epoch 8, Loss: 0.44852482434362173
Epoch 9, Loss: 0.43288429733365774
Epoch 10, Loss: 0.41632851865142584

Applying vector quantization to model weights...
Training the model after quantization:
Epoch 1, Loss: 0.41042427998036146
Epoch 2, Loss: 0.3944588592275977
Epoch 3, Loss: 0.38029393134638667
Epoch 4, Loss: 0.3777327071875334
Epoch 5, Loss: 0.364715694449842
Epoch 6, Loss: 0.3652488263323903
Epoch 7, Loss: 0.3485293942503631
Epoch 8, Loss: 0.34981080051511526
Epoch 9, Loss: 0.336352757178247
Epoch 10, Loss: 0.3363899355754256


From the training results, we can observe the following:

**Before Quantization:**

The loss decreases steadily over 10 epochs, from 0.675 to 0.416, so the model is learning but has not yet converged.

**After Quantization:**

The first epoch after quantization averages a loss of 0.410, slightly below the 0.416 of the last epoch before quantization. This figure is an average over an epoch in which the weights were already being updated again, so it shows only that quantizing each layer to 8 centroids did not cause a large jump in loss; it is not a measurement of the quantized model itself.

Over the next 10 epochs the loss keeps falling, more slowly than before, and ends at 0.336.

**Interpretation:**

Quantization did not noticeably degrade the model. The lower final loss (0.336 versus 0.416) comes mainly from the 10 additional epochs of training rather than from quantization itself: the model was still improving when it was quantized.

Retraining also updates the weights freely, so after the second training phase they are no longer restricted to 8 centroids per layer. The run therefore shows that the model tolerates quantization and recovers quickly with fine-tuning. Showing that a compressed model retains this performance would require evaluating the loss immediately after quantization, or re-quantizing after fine-tuning.
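
To see how further training interacts with the codebook, the following NumPy-only sketch counts the distinct weight values at each stage. It rests on several assumptions: a small Lloyd's-style `kmeans_1d` (a hypothetical helper standing in for scikit-learn's `KMeans`), a random matrix standing in for one layer's weights, and a small random perturbation standing in for a gradient update.

```python
import numpy as np

def kmeans_1d(values, k, iters=25, seed=0):
    """Tiny Lloyd's k-means on scalars (hypothetical stand-in for sklearn's KMeans)."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False)
    for _ in range(iters):
        labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean()
    return centroids

def project(weights, centroids):
    """Snap every weight to its nearest centroid."""
    flat = weights.reshape(-1)
    labels = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[labels].reshape(weights.shape)

rng = np.random.default_rng(42)
weights = rng.normal(size=(16, 8))  # stand-in for one layer's weight matrix
centroids = kmeans_1d(weights.reshape(-1), k=8)

quantized = project(weights, centroids)
# Stand-in for one unconstrained gradient step
updated = quantized + 0.01 * rng.normal(size=quantized.shape)
reprojected = project(updated, centroids)

print(len(np.unique(quantized)))    # at most 8 distinct values
print(len(np.unique(updated)))      # far more than 8 distinct values
print(len(np.unique(reprojected)))  # back to at most 8
```

Under these assumptions, a single unconstrained update is enough to move the weights off the codebook, and projecting them back onto the centroids restores the quantized form.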

**Independent Layer Quantization:**

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from torch.utils.data import DataLoader, TensorDataset

# 1. Simple Neural Network in PyTorch
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(2, 16)
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))
        return x

# 2. Generate a simple dataset (two moons)
def generate_data():
    X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
    X, y = torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)
    return X, y

# 3. Train function for the simple neural network
def train(model, loader, criterion, optimizer, epochs=10):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        for inputs, labels in loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs.squeeze(), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch {epoch+1}, Loss: {running_loss / len(loader)}")

# 4. Function to perform Vector Quantization (VQ)
def vector_quantization(weights, num_clusters=8):
    shape = weights.shape
    weights_flat = weights.reshape(-1, 1)

    kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(weights_flat)
    quantized_weights = kmeans.cluster_centers_[kmeans.labels_]

    return quantized_weights.reshape(shape)

# 5. Function to quantize specific layers independently
def apply_layerwise_quantization(model, layer_quantization_params):
    with torch.no_grad():
        for name, param in model.named_parameters():
            if len(param.shape) > 1:  # Only quantize weights (not biases)
                if name in layer_quantization_params:  # Quantize only specified layers
                    num_clusters = layer_quantization_params[name]
                    # detach() and cpu() make the NumPy conversion safe regardless of grad mode or device
                    quantized_weights = vector_quantization(param.detach().cpu().numpy(), num_clusters)
                    param.copy_(torch.tensor(quantized_weights, dtype=param.dtype))

# 6. Main script
def main():
    # Generate dataset and create DataLoader
    X, y = generate_data()
    dataset = TensorDataset(X, y)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    # Initialize model, loss function, and optimizer
    model = SimpleNN()
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    print("Training the model before quantization:")
    train(model, loader, criterion, optimizer, epochs=10)

    # Define quantization parameters for each layer
    # fc1: 8 clusters, fc2: 16 clusters, fc3: 4 clusters
    layer_quantization_params = {
        'fc1.weight': 8,
        'fc2.weight': 16,
        'fc3.weight': 4
    }

    # Apply layer-wise quantization
    print("\nApplying layer-wise quantization...")
    apply_layerwise_quantization(model, layer_quantization_params)

    print("Training the model after layer-wise quantization:")
    train(model, loader, criterion, optimizer, epochs=10)

if __name__ == "__main__":
    main()

Training the model before quantization:
Epoch 1, Loss: 0.661501893773675
Epoch 2, Loss: 0.6172461025416851
Epoch 3, Loss: 0.5786951873451471
Epoch 4, Loss: 0.5380938546732068
Epoch 5, Loss: 0.506526131182909
Epoch 6, Loss: 0.4669571053236723
Epoch 7, Loss: 0.42523473780602217
Epoch 8, Loss: 0.3854773547500372
Epoch 9, Loss: 0.3539395099505782
Epoch 10, Loss: 0.3242013403214514

Applying layer-wise quantization...
Training the model after layer-wise quantization:
Epoch 1, Loss: 0.29677776945754886
Epoch 2, Loss: 0.2770363623276353
Epoch 3, Loss: 0.25871247006580234
Epoch 4, Loss: 0.24440406332723796
Epoch 5, Loss: 0.2334425817243755
Epoch 6, Loss: 0.22327196062542498
Epoch 7, Loss: 0.21568034356459975
Epoch 8, Loss: 0.20416234119329602
Epoch 9, Loss: 0.1966654050629586
Epoch 10, Loss: 0.1880769394338131
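
The per-layer cluster counts also determine how much storage each layer would need if it were saved as indices plus a codebook. The script above writes centroid values back into ordinary float parameters, so it does not shrink the model by itself. The following is a rough estimate only, assuming 32-bit floats for the original weights and the codebook and fixed-width indices of `ceil(log2(k))` bits; `compressed_bits` is a hypothetical helper.

```python
import math

def compressed_bits(num_weights, num_clusters, float_bits=32):
    """Bits needed for the index array plus the codebook of one layer."""
    index_bits = math.ceil(math.log2(num_clusters))
    return num_weights * index_bits + num_clusters * float_bits

# (number of weights, number of clusters) for each layer of SimpleNN
layers = {
    "fc1.weight": (2 * 16, 8),
    "fc2.weight": (16 * 8, 16),
    "fc3.weight": (8 * 1, 4),
}

for name, (n, k) in layers.items():
    original = n * 32
    compressed = compressed_bits(n, k)
    print(f"{name}: {original} bits -> {compressed} bits")
```

For layers this small the codebook is a large share of the total, which is why the relative savings grow with layer size.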


**Illustration of the Process:**

Let’s assume we have 3 weights: 0.52, 1.3, and -0.9.

**Step 1: Cluster the Weights**

Assume we perform K-Means clustering and obtain the following 3 centroids:

Centroid 1: 0.5

Centroid 2: 1.25

Centroid 3: -1.0

The lookup table (codebook) would look like:

| Index | Centroid |
|-------|----------|
| 1     | 0.5      |
| 2     | 1.25     |
| 3     | -1.0     |

**Step 2: Quantize the Weights**

Each weight is replaced with the index of its nearest centroid:

* 0.52 is closest to centroid 0.5 → replace 0.52 with index 1
* 1.3 is closest to centroid 1.25 → replace 1.3 with index 2
* -0.9 is closest to centroid -1.0 → replace -0.9 with index 3

So, the weights [0.52, 1.3, -0.9] are replaced by the indices [1, 2, 3].

**Step 3: Use Lookup Table During Inference**

When the model uses these weights during inference:

* It sees index 1 and looks up 0.5 in the codebook.
* It sees index 2 and looks up 1.25 in the codebook.
* It sees index 3 and looks up -1.0 in the codebook.

So, even though the original weights are not stored in the model, the model can reconstruct approximate versions of them using the lookup table.

**Lookup Table Illustration**

In [None]:
import numpy as np

# Example centroids (lookup table)
centroids = np.array([0.5, 1.25, -1.0])

# Quantized weights (indices referring to centroids)
quantized_weights = np.array([1, 2, 3])

# Lookup table to retrieve the actual weights
def lookup_weights(quantized_weights, centroids):
    # Subtract 1 because index is 1-based (in our example), but arrays are 0-based
    actual_weights = centroids[quantized_weights - 1]
    return actual_weights

# Get the dequantized (approximate) weights
dequantized_weights = lookup_weights(quantized_weights, centroids)
print(dequantized_weights)  # Output: [0.5 1.25 -1.0]

[ 0.5   1.25 -1.  ]
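
The same lookup extends from a flat list of weights to a whole layer. The sketch below assumes a hypothetical 2x3 weight matrix stored as 0-based indices into the example codebook, with a made-up bias kept in full precision; `quantized_linear` is an illustrative name, not a library function.

```python
import numpy as np

codebook = np.array([0.5, 1.25, -1.0])  # centroids from the example
# 0-based indices for a hypothetical 2x3 weight matrix
indices = np.array([[0, 1, 2],
                    [2, 0, 1]])
bias = np.array([0.1, -0.1])  # hypothetical bias, kept in full precision

def quantized_linear(x, indices, codebook, bias):
    """Dequantize the weights on the fly, then apply a linear layer."""
    weights = codebook[indices]  # shape (2, 3), rebuilt from the lookup table
    return x @ weights.T + bias

x = np.array([1.0, 2.0, 3.0])
y = quantized_linear(x, indices, codebook, bias)
print(y)
```

Only `indices`, `codebook`, and `bias` are stored; the full-precision weight matrix exists only temporarily inside the function.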
