# Theme B: Comparing the Efficiency of Vector and Scalar Weight Quantization and Exploring Methods That Combine Them
[Open In Colab](https://colab.research.google.com/github/ArtIC-TITECH/b3-proj-2025/blob/main/theme_B/theme_B.ipynb)

## Loading the modules

In [6]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.nn.init as init
import numpy as np
import matplotlib.pyplot as plt
import torchvision
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import DataLoader
from torch.autograd import Function
from sklearn.cluster import KMeans

## Building the MNIST datasets and the training/accuracy-evaluation functions

In [2]:
# Device to run on (falls back to the CPU when no GPU is available)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Standard transform: Normalize((0.5,), (0.5,)) maps pixel values from [0, 1] to [-1, 1]
transform_normal = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Always use the standard transform for the test data
transform_for_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform_normal) # dataset used to train the model
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform_for_test) # dataset used to evaluate the model
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

def compute_accuracy(model, test_loader, device='cuda:0'):
    model.eval()  # evaluation mode
    model.to(device)
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            outputs = model(images.to(device))
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels.to(device)).sum().item()
    accuracy = 100 * correct / total
    print(f'Accuracy: {accuracy:.2f}%')
    return accuracy

def train(model, lr=0.05, epochs=5, device='cuda:0'):
    # Define the loss function and the optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    model.to(device)
    model.train()  # back to training mode (compute_accuracy switches the model to eval mode)
    for epoch in range(epochs):
        loss_sum = 0
        for images, labels in train_loader:
            # Forward pass
            outputs = model(images.to(device))

            # Compute the loss
            loss = criterion(outputs, labels.to(device))
            loss_sum += loss.item()

            # Reset the gradients
            optimizer.zero_grad()

            # Backpropagation
            loss.backward()

            # Update the parameters
            optimizer.step()

        # Print the mean loss over the epoch
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss_sum/len(train_loader):.4f}')
    return model



## Training the baseline (unquantized) model

In [3]:
class SimpleModel(nn.Module):
    def __init__(self):  # set up the layers
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):  # forward pass of the model
        x = x.view(-1, 28 * 28)  # flatten each 28x28 image into a 1-D vector
        x = self.fc1(x)
        x = F.relu(x)  # activation function
        x = self.fc2(x)
        x = F.relu(x)  # activation function
        x = self.fc3(x)
        return x

# Create a model instance
model = SimpleModel().to(device)

model = train(model, lr=0.1, epochs=5, device=device)

accuracy = compute_accuracy(model, test_loader, device=device)

Epoch [1/5], Loss: 0.4591
Epoch [2/5], Loss: 0.1775
Epoch [3/5], Loss: 0.1298
Epoch [4/5], Loss: 0.1030
Epoch [5/5], Loss: 0.0844
Accuracy: 96.68%


## Running scalar quantization (uniform symmetric quantization)

### Process: convert to quantized layers --> quantization-aware training

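For simplicity, the quantization parameter (the scale $s$) is determined here by min-max scaling.
Because the quantization is symmetric, $s$ is half the difference between the maximum and minimum of the matrix $X$, divided by $q_{\max}$, the largest integer level of the $p$-bit range. For $p = 1$, where $q_{\max}$ would be 0, the code in this section instead uses $s = \text{mean}(|X|)$ and $X_{q} = s \cdot \text{sgn}(X)$.

$q_{\max} = 2^{p-1} - 1$

$s = \frac{\max(X) - \min(X)}{2 q_{\max}}$

$X_{q} = s \cdot \text{clip}\left(\text{round}\left(\frac{X}{s}\right), -q_{\max}, q_{\max}\right)$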
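
As a quick numerical check of the formulas above, the following minimal sketch quantizes a small tensor to 3 bits. The helper name `sym_quantize` and the sample values are made up for illustration and are not part of the notebook's layers.

```python
import torch

def sym_quantize(x: torch.Tensor, num_bits: int):
    """Min-max symmetric quantization following the formulas above (num_bits >= 2)."""
    qmax = 2 ** (num_bits - 1) - 1                     # largest integer level
    s = (x.max() - x.min()) / (2 * qmax)               # min-max scale
    q = torch.clamp(torch.round(x / s), -qmax, qmax)   # integer levels
    return s * q, q, s

# Hypothetical sample values, chosen only for illustration
x = torch.tensor([-1.0, -0.4, 0.0, 0.3, 0.75, 1.0])
x_q, q, s = sym_quantize(x, num_bits=3)

print("scale s :", s.item())
print("levels q:", q.tolist())   # [-3.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print("X_q     :", x_q.tolist())
```

With 3 bits, $q_{\max} = 3$, so at most 7 distinct values survive, and every in-range element moves by at most $s/2$.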

In [None]:

class SymQuantSTE(Function):
    @staticmethod
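    def forward(ctx, input: torch.Tensor, scale: torch.Tensor, num_bits: int):
        if num_bits == 1:
            # 1-bit: keep only the sign, scaled by |scale|
            s = scale.abs()
            output = s * torch.sgn(input)
        else:
            # Multi-bit: round to the nearest integer level and clip to [-qmax, qmax]
            s = scale.abs().clamp_min(1e-8)  # guard against division by zero
            qmax = 2 ** (num_bits - 1) - 1
            q = torch.clamp(torch.round(input / s), -qmax, qmax)
            output = q * s

        # Save tensors for the backward pass
        ctx.save_for_backward(input, s)
        ctx.num_bits = num_bits
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, s = ctx.saved_tensors   # retrieve what forward saved
        num_bits = ctx.num_bits
        if num_bits == 1:
            # 1-bit: pass the gradient through, clamped element-wise to [-1, 1]
            grad_input = torch.clamp(grad_output, -1, 1)
        else:
            # Multi-bit: pass the gradient only where the input lay inside the clipping range
            qmax = 2 ** (num_bits - 1) - 1
            mask = (input.abs() <= qmax * s).to(grad_output.dtype)
            grad_input = grad_output * mask

        # No gradient flows to scale or num_bits
        return grad_input, None, None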




class SymQuantLinear(nn.Linear):
    def __init__(self, in_features, out_features, bias=True, weight_bits=8, act_bits=None):
        super().__init__(in_features, out_features, bias)
        self.weight_bits = weight_bits
        self.act_bits = act_bits

    def forward(self, input):
        # Scale for the weights
        if self.weight_bits == 1:
            weight_scale = self.weight.abs().sum() / self.weight.numel()
        else:
            qmax_w = 2 ** (self.weight_bits - 1) - 1
            weight_scale = (self.weight.max() - self.weight.min()) / (2 * qmax_w)

        # Scale for the activations (quantized only when act_bits is set)
        if self.act_bits is not None:
            if self.act_bits == 1:
                act_scale = input.abs().sum() / input.numel()
            else:
                qmax_a = 2 ** (self.act_bits - 1) - 1
                act_scale = (input.max() - input.min()) / (2 * qmax_a)
            input = SymQuantSTE.apply(input, act_scale, self.act_bits)

        # Quantized weights
        w_q = SymQuantSTE.apply(self.weight, weight_scale, self.weight_bits)

        return F.linear(input, w_q, self.bias)



def replace_linear_with_quantizedlinear(module, weight_bits=8, act_bits=None):
    for name, child in module.named_children():
        # Skip layers that are already SymQuantLinear
        if isinstance(child, SymQuantLinear):
            continue
        if isinstance(child, nn.Linear):
            qlinear = SymQuantLinear(
                child.in_features,
                child.out_features,
                bias=(child.bias is not None),
                weight_bits=weight_bits,
                act_bits=act_bits
            )
            # Copy the weights and bias
            qlinear.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                qlinear.bias.data.copy_(child.bias.data)
            setattr(module, name, qlinear)
        else:
            replace_linear_with_quantizedlinear(child, weight_bits, act_bits)
    return module

# Define the model
model = SimpleModel().to(device)
# Ordinary (full-precision) training
print('warming up by no-quantized training...')
model = train(model, lr=0.1, epochs=5, device=device)
# Replace the Linear layers with SymQuantLinear
model_q = replace_linear_with_quantizedlinear(model, weight_bits=4, act_bits=4)
print('quantization aware training...')
model_q = train(model_q, lr=1e-3, epochs=5, device=device)
accuracy = compute_accuracy(model_q, test_loader, device=device)

warming up by no-quantized training...
Epoch [1/5], Loss: 0.0736
Epoch [2/5], Loss: 0.0658
Epoch [3/5], Loss: 0.0576
Epoch [4/5], Loss: 0.0513
Epoch [5/5], Loss: 0.0448
quantization aware training...
Epoch [1/5], Loss: 0.0591
Epoch [2/5], Loss: 0.0373
Epoch [3/5], Loss: 0.0331
Epoch [4/5], Loss: 0.0310
Epoch [5/5], Loss: 0.0296
Accuracy: 97.77%
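
Quantization-aware training needs a straight-through estimator (STE) because `torch.round` has zero gradient, so without it no learning signal reaches the weights. The sketch below illustrates this with the common `detach` trick rather than a custom `Function`. The helper `fake_quant_ste` and the sample values are assumptions made for illustration, and unlike `SymQuantSTE` this simplified version does not mask gradients outside the clipping range.

```python
import torch

def fake_quant_ste(x: torch.Tensor, scale: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Forward: quantized value. Backward: identity with respect to x (detach trick)."""
    qmax = 2 ** (num_bits - 1) - 1
    x_q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    # (x_q - x) is treated as a constant, so d(output)/dx = 1
    return x + (x_q - x).detach()

# Hypothetical sample values, chosen only for illustration
x = torch.tensor([-0.9, -0.2, 0.4, 1.0], requires_grad=True)
scale = torch.tensor(0.25)

# Plain rounding: the gradient is zero everywhere
hard = torch.round(x / scale) * scale
hard.sum().backward()
grad_hard = x.grad.clone()
x.grad = None

# STE: same forward values, but the gradient passes straight through
soft = fake_quant_ste(x, scale, num_bits=4)
soft.sum().backward()
grad_ste = x.grad.clone()

print("grad without STE:", grad_hard.tolist())
print("grad with STE   :", grad_ste.tolist())
```

`SymQuantSTE` follows the same idea in its multi-bit branch, with an additional mask that zeroes the gradient for inputs outside $[-q_{\max} s, q_{\max} s]$.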


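## Running vector quantization

### Process: cluster the row vectors of each weight matrix with k-means --> share weights among the row vectors of the same cluster --> continue training with the shared weights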
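
The first two steps of this process can be sketched on a toy matrix before applying them to a real layer. The matrix below is synthetic (three prototype rows, each duplicated with small noise) and serves only to show how `KMeans` labels and centroids rebuild a weight-shared matrix.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 6x4 "weight matrix": three prototype rows, each appearing twice with small noise
rng = np.random.default_rng(0)
prototypes = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
assignment = np.array([0, 0, 1, 1, 2, 2])
W = prototypes[assignment] + 0.01 * rng.standard_normal((6, 4))

# Step 1: cluster the rows (each row is one sample)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(W)          # cluster index of each row
centroids = kmeans.cluster_centers_     # shape (3, 4)

# Step 2: weight sharing, i.e. every row is replaced by its cluster center
W_q = centroids[labels]

print("labels:", labels.tolist())
print("max reconstruction error:", np.abs(W - W_q).max())
```

Only the 3 centroids and the 6 row indices need to be stored instead of all 6 rows; the third step, retraining, then updates the centroids while the labels stay fixed.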



In [None]:
class VQLinear(nn.Module):
    """
    Linear layer whose weight rows are vector-quantized with k-means
    """
    def __init__(self, linear_layer: nn.Linear, n_clusters_percentage: int):
        super().__init__()
        self.in_features = linear_layer.in_features
        self.out_features = linear_layer.out_features
        # Number of row vectors = out_features
        self.n_clusters = max(int(self.out_features * n_clusters_percentage * 0.01), 1)

        # Copy the original weights and bias
        W = linear_layer.weight.data.clone()  # (out_features, in_features), handled row by row
        if linear_layer.bias is not None:
            self.bias = nn.Parameter(linear_layer.bias.data.clone())
        else:
            # nn.Parameter(None) would create an empty tensor, so register "no bias" explicitly
            self.register_parameter("bias", None)

        # k-means clustering (each row is one sample)
        kmeans = KMeans(n_clusters=self.n_clusters, n_init=10, random_state=0)
        labels = kmeans.fit_predict(W.cpu().numpy())  # cluster index of each row vector
        centroids = torch.tensor(kmeans.cluster_centers_, dtype=W.dtype, device=W.device)  # (n_clusters, in_features)

        # Cluster index that each weight row belongs to (fixed, not trained)
        self.register_buffer("labels", torch.tensor(labels, dtype=torch.long, device=W.device))
        # Cluster centers, registered as a Parameter so that the additional training updates them
        self.centroids = nn.Parameter(centroids)

    def forward(self, x: torch.Tensor):
        # Rebuild the weight matrix by looking up the cluster center of each row
        W_q = self.centroids[self.labels]  # (out_features, in_features)
        return F.linear(x, W_q, self.bias)


def replace_linear_with_vqlinear(model: nn.Module, n_clusters_percentage: int):
    """
    Replace the Linear layers in the model with VQLinear
    """
    for name, module in model.named_children():
        if isinstance(module, nn.Linear):
            setattr(model, name, VQLinear(module, n_clusters_percentage))
        else:
            replace_linear_with_vqlinear(module, n_clusters_percentage)
    return model



In [None]:
# Define the model
model = SimpleModel().to(device)
# Ordinary (full-precision) training
print('warming up by no-quantized training...')
model = train(model, lr=0.1, epochs=5, device=device)
# Replace the Linear layers with VQLinear (each weight matrix is compressed to n_clusters_percentage% of its rows plus the per-row indices)
model_q = replace_linear_with_vqlinear(model, n_clusters_percentage=10)
print('quantization aware training...')
model_q = train(model_q, lr=1e-3, epochs=5, device=device)
accuracy = compute_accuracy(model_q, test_loader, device=device)

warming up by no-quantized training...
Epoch [1/5], Loss: 0.0088
Epoch [2/5], Loss: 0.0050
Epoch [3/5], Loss: 0.0032
Epoch [4/5], Loss: 0.0030
Epoch [5/5], Loss: 0.0022
quantization aware training...
Epoch [1/5], Loss: 0.0020
Epoch [2/5], Loss: 0.0016
Epoch [3/5], Loss: 0.0014
Epoch [4/5], Loss: 0.0013
Epoch [5/5], Loss: 0.0012
Accuracy: 98.21%


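## Tasks

### ・Investigate the trade-off between model size and accuracy for vector quantization and for scalar quantization
### ・Apply scalar quantization to the vector-quantized centroids to attempt further compression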
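
As a starting point for both tasks, the storage cost of each scheme can be estimated from the layer shapes of `SimpleModel`. The following is a rough sketch under stated assumptions, not a definitive size model: biases stay in FP32, scalar quantization stores one FP32 scale per layer, and each vector-quantization index takes the minimum number of bits needed to address the clusters. All function names are hypothetical.

```python
# (in_features, out_features) of fc1, fc2, fc3 in SimpleModel
LAYERS = [(784, 128), (128, 64), (64, 10)]

def fp32_bits(layers):
    """All weights and biases stored as 32-bit floats."""
    return sum((i * o + o) * 32 for i, o in layers)

def scalar_bits(layers, weight_bits):
    """Weights at weight_bits; assumes FP32 bias and one FP32 scale per layer."""
    return sum(i * o * weight_bits + o * 32 + 32 for i, o in layers)

def n_clusters(out_features, pct):
    """Same rule as VQLinear: pct% of the rows, at least one cluster."""
    return max(int(out_features * pct * 0.01), 1)

def vq_bits(layers, pct, centroid_bits=32):
    """Centroids plus per-row indices plus FP32 bias.
    centroid_bits < 32 models scalar-quantized centroids (one extra FP32 scale per layer)."""
    total = 0
    for i, o in layers:
        k = n_clusters(o, pct)
        index_bits = (k - 1).bit_length()   # ceil(log2(k)); 0 when k == 1
        total += k * i * centroid_bits + o * index_bits + o * 32
        if centroid_bits < 32:
            total += 32
    return total

for name, bits in [
    ("FP32 baseline", fp32_bits(LAYERS)),
    ("scalar, 4-bit weights", scalar_bits(LAYERS, 4)),
    ("VQ, 10% clusters", vq_bits(LAYERS, 10)),
    ("VQ 10% + 4-bit centroids", vq_bits(LAYERS, 10, centroid_bits=4)),
]:
    print(f"{name:26s} {bits / 8 / 1024:8.1f} KiB")
```

Note that with `n_clusters_percentage=10` the rule gives 12, 6, and 1 clusters for `fc1`, `fc2`, and `fc3`, so the per-layer cluster count is worth checking when sweeping the percentage. Pairing these size estimates with the measured accuracy of each configuration gives the trade-off curve asked for in the first task.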