## KMNIST Attack Model
This notebook contains code for a CNN classifier on KMNIST dataset.

In [1]:
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

kmnist = datasets.KMNIST("data", train=True, download=True, transform=transforms.Compose([transforms.ToTensor()]))

Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-images-idx3-ubyte.gz
Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-images-idx3-ubyte.gz to data/KMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/18165135 [00:00<?, ?it/s]

Extracting data/KMNIST/raw/train-images-idx3-ubyte.gz to data/KMNIST/raw

Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-labels-idx1-ubyte.gz
Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-labels-idx1-ubyte.gz to data/KMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29497 [00:00<?, ?it/s]

Extracting data/KMNIST/raw/train-labels-idx1-ubyte.gz to data/KMNIST/raw

Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-images-idx3-ubyte.gz
Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-images-idx3-ubyte.gz to data/KMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/3041136 [00:00<?, ?it/s]

Extracting data/KMNIST/raw/t10k-images-idx3-ubyte.gz to data/KMNIST/raw

Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-labels-idx1-ubyte.gz
Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-labels-idx1-ubyte.gz to data/KMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5120 [00:00<?, ?it/s]

Extracting data/KMNIST/raw/t10k-labels-idx1-ubyte.gz to data/KMNIST/raw



In [2]:
kmnist_loader = DataLoader(kmnist, batch_size=len(kmnist))

In [3]:
kmnist_data = next(iter(kmnist_loader))[0]

In [4]:
kmnist_data.shape

torch.Size([60000, 1, 28, 28])

In [5]:
KMNIST_MEAN = kmnist_data.mean()
KMNIST_STD = kmnist_data.std()
KMNIST_MEAN, KMNIST_STD

(tensor(0.1918), tensor(0.3483))

In [8]:
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

train_loader = DataLoader(datasets.KMNIST("data", train=True, download=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((KMNIST_MEAN,), (KMNIST_STD,))
    ])), batch_size=128, shuffle=True, num_workers=32)
val_loader = DataLoader(datasets.KMNIST("data", train=False, download=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((KMNIST_MEAN,), (KMNIST_STD,))
    ])), batch_size=128, shuffle=False, num_workers=32)

In [7]:
import pytorch_lightning as pl
import torchmetrics
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from optimizee.mnist import KMnistConvModel

class KMNISTClassifier(pl.LightningModule):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.save_hyperparameters()
        self.model = KMnistConvModel()
        self.criterion = nn.NLLLoss()
        self.metrics = {
            "accuracy": {
                "train": torchmetrics.Accuracy(),
                "val": torchmetrics.Accuracy()
            }
        }

    def step(self, batch, step_name="train"):
        X, y = batch
        outputs = self.model(X)
        loss = self.criterion(outputs, y)
        preds = self.forward(X)
        metric = self.metrics["accuracy"][step_name]
        metric.update(preds.cpu(), y.cpu())
        metric_val = metric.compute()
        self.log(f"{step_name}_loss", loss, on_epoch=True)
        self.log(f"{step_name}_accuracy", metric_val, on_epoch=True)
        return loss

    def forward(self, X, *args):
        return self.model(X)

    def training_step(self, batch, batch_idx):
        return self.step(batch, "train")
    
    def validation_step(self, batch, batch_idx):
        return self.step(batch, "val")

    def predict_step(self, batch, batch_idx):
        X, y = batch
        return self.forward(X)

    def configure_optimizers(self):
        optimizer = optim.Adam(self.model.parameters(), lr=self.hparams.lr)
        return optimizer

In [9]:
import wandb
from pytorch_lightning.loggers import WandbLogger

NUM_EPOCHS = 10

wandb_logger = WandbLogger(project="optml-project", name=f"kmnist")

model = KMNISTClassifier(lr=1e-4)
trainer = pl.Trainer(default_root_dir="models/kmnist", max_epochs=NUM_EPOCHS, logger=wandb_logger, accelerator="gpu")
trainer.fit(model, train_dataloaders=train_loader, val_dataloaders=val_loader)
wandb.finish()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mmismayil[0m. Use [1m`wandb login --relogin`[0m to force relogin


GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type            | Params
----------------------------------------------
0 | model     | KMnistConvModel | 431 K 
1 | criterion | NLLLoss         | 0     
----------------------------------------------
431 K     Trainable params
0         Non-trainable params
431 K     Total params
1.724     Total estimated model params size (MB)


Sanity Checking: 0it [00:00, ?it/s]

Training: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▁▁▁▂▂▂▂▃▃▃▃▃▃▃▃▄▄▄▄▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇▇████
train_accuracy_epoch,▁▅▆▇▇▇████
train_accuracy_step,▁▃▄▅▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇████████████████████
train_loss_epoch,█▃▂▂▂▁▁▁▁▁
train_loss_step,█▅▄▃▃▂▂▂▂▂▂▂▂▁▁▁▁▂▂▂▁▁▁▂▂▁▁▁▁▁▁▁▁▂▁▁▁▂▁▁
trainer/global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
val_accuracy,▁▄▅▆▆▇▇▇██
val_loss,█▅▄▃▃▂▂▂▁▁

0,1
epoch,9.0
train_accuracy_epoch,0.93996
train_accuracy_step,0.94183
train_loss_epoch,0.05874
train_loss_step,0.06235
trainer/global_step,4689.0
val_accuracy,0.88006
val_loss,0.2326


Save the model

In [10]:
from collections import OrderedDict
kmnist_model_dict = OrderedDict({name.replace("model.", ""): parameter for name, parameter in model.state_dict().items()})
torch.save(kmnist_model_dict, "ckpt/attack_model/kmnist_cnn.pt")

Predict and save indices for correctly classified examples

In [11]:
preds = trainer.predict(model, val_loader)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Exception in thread SockSrvRdThr:
Traceback (most recent call last):
  File "/root/.conda/envs/optml/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/.conda/envs/optml/lib/python3.8/site-packages/wandb/sdk/service/server_sock.py", line 113, in run
    shandler(sreq)
  File "/root/.conda/envs/optml/lib/python3.8/site-packages/wandb/sdk/service/server_sock.py", line 172, in server_record_publish
    iface = self._mux.get_stream(stream_id).interface
  File "/root/.conda/envs/optml/lib/python3.8/site-packages/wandb/sdk/service/streams.py", line 186, in get_stream
    stream = self._streams[stream_id]
KeyError: '2w4teydj'


Predicting: 469it [00:00, ?it/s]

In [12]:
preds = torch.cat(preds)

In [13]:
preds.shape

torch.Size([10000, 10])

In [14]:
preds = preds.argmax(dim=1)

In [15]:
val_targets = []

for _, y in val_loader:
    val_targets.append(y)

val_targets = torch.cat(val_targets)

In [16]:
val_targets.shape

torch.Size([10000])

In [17]:
(preds == val_targets).sum() / len(val_targets)

tensor(0.9334)

In [18]:
correct_indices = torch.where(preds == val_targets)[0]

In [19]:
import numpy as np

with open("data/kmnist_correct/label_correct_index.npy", "wb") as f:
    np.save(f, correct_indices.numpy())