# Machine Unlearning

Can you unlearn something?

Your task: given a network pre-trained on some data, fine-tune it so that it selectively forgets one class and learns a new one.

As an initial approach, you may do the following.

Start with an MNIST classifier pre-trained on a subset of the digits.

Now replace one of the learned digits, say the class “6”, with a new digit, say “3”.
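
To run this experiment, the data has to be split into the digits to keep, the digit to forget, and the digit to learn. The sketch below shows one way to do that with `torch.utils.data.Subset`. The helper names `indices_for_classes` and `split_by_class` are hypothetical (they are not part of `source.mnist_net`), and a tiny synthetic dataset stands in for MNIST so the block runs without a download.

```python
import torch
from torch.utils.data import Subset, TensorDataset


def indices_for_classes(labels, classes):
    """Return the indices of the samples whose label is in `classes`."""
    labels = torch.as_tensor(labels)
    wanted = torch.as_tensor(sorted(classes))
    mask = torch.isin(labels, wanted)
    return torch.nonzero(mask, as_tuple=True)[0].tolist()


def split_by_class(dataset, labels, forget_class=6, new_class=3):
    """Split `dataset` into (retained, forget, new) subsets by label."""
    all_classes = set(torch.as_tensor(labels).unique().tolist())
    retained = all_classes - {forget_class, new_class}
    return (
        Subset(dataset, indices_for_classes(labels, retained)),
        Subset(dataset, indices_for_classes(labels, {forget_class})),
        Subset(dataset, indices_for_classes(labels, {new_class})),
    )


# Synthetic stand-in for MNIST: 20 blank "images", labels 0..9 repeated twice.
labels = torch.arange(10).repeat(2)
images = torch.zeros(20, 1, 28, 28)
toy = TensorDataset(images, labels)

retain_set, forget_set, new_set = split_by_class(toy, labels)
print(len(retain_set), len(forget_set), len(new_set))  # 16 2 2
```

With torchvision's MNIST, the labels are available as the dataset's `targets` attribute, so the same call would take `dataset.targets` as its second argument.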

A possible way to proceed is to identify which weights are most involved in predicting class “6”, freeze all the others, and train with a loss that favors “3” while penalizing “6”.
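
One possible reading of this baseline is sketched below. It is not a reference solution and rests on several assumptions: "involvement" is measured by the gradient magnitude of the class-“6” logit, the model has one output logit per digit, freezing is done by zeroing gradients before a plain SGD step, and the loss is cross-entropy on “3” samples plus a penalty on the probability assigned to “6”. The names `saliency_masks` and `masked_step` are hypothetical, and a toy linear model with random inputs replaces the real classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def saliency_masks(model, x_forget, forget_class, keep_fraction=0.1):
    """0/1 mask per parameter: 1 where the gradient magnitude of the
    forget-class logit is among the largest `keep_fraction` entries
    (ties at the threshold are all kept)."""
    model.zero_grad()
    model(x_forget)[:, forget_class].sum().backward()
    masks = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            masks[name] = torch.zeros_like(p)
            continue
        score = p.grad.abs()
        k = max(1, int(keep_fraction * score.numel()))
        threshold = torch.topk(score.flatten(), k).values.min()
        masks[name] = (score >= threshold).float()
    model.zero_grad()
    return masks


def masked_step(model, optimizer, masks, x_new, x_forget,
                forget_class, new_class, penalty=1.0):
    """One update that favors `new_class` and penalizes `forget_class`,
    touching only the weights selected by `masks`."""
    optimizer.zero_grad()
    targets = torch.full((x_new.shape[0],), new_class,
                         dtype=torch.long, device=x_new.device)
    learn = F.cross_entropy(model(x_new), targets)
    forget = F.softmax(model(x_forget), dim=1)[:, forget_class].mean()
    loss = learn + penalty * forget
    loss.backward()
    for name, p in model.named_parameters():
        if p.grad is not None:
            # Zero gradient means no update under plain SGD
            # (no momentum, no weight decay).
            p.grad.mul_(masks[name])
    optimizer.step()
    return loss.item()


# Toy demo: a linear classifier and random inputs (assumptions, not MNIST).
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x_forget = torch.rand(8, 1, 28, 28)
x_new = torch.rand(8, 1, 28, 28)

masks = saliency_masks(model, x_forget, forget_class=6)
before = {n: p.detach().clone() for n, p in model.named_parameters()}
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
masked_step(model, optimizer, masks, x_new, x_forget,
            forget_class=6, new_class=3)
```

In this toy linear model the gradient of the “6” logit is nonzero only for the weights feeding that logit, so the mask freezes the weights feeding the “3” logit as well. For a model with BatchNorm layers, calling `model.eval()` before computing the masks avoids updating the running statistics during the saliency pass.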

Test this baseline and see whether it gets you anywhere. Are there any pitfalls in this idea? Does it work? Use it as a first line of attack to understand the problem.
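
A single aggregate accuracy can hide what happened to individual digits, so a per-class breakdown is a useful way to test the baseline: it shows separately whether “6” was forgotten, whether “3” was learned, and whether the retained digits were damaged. The sketch below assumes ten output classes; `per_class_accuracy` is a hypothetical helper, and `AlwaysThree` is a dummy model used only to make the example deterministic.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def per_class_accuracy(model, loader, num_classes=10, device="cpu"):
    """Return a tensor with the accuracy of `model` on each class."""
    model.eval()
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    for x, y in loader:
        pred = model(x.to(device)).argmax(dim=1).cpu()
        total += torch.bincount(y, minlength=num_classes).float()
        correct += torch.bincount(y[pred == y], minlength=num_classes).float()
    # Classes with no samples report 0 rather than dividing by zero.
    return correct / total.clamp(min=1)


class AlwaysThree(nn.Module):
    """Dummy model that predicts class 3 for every input."""

    def forward(self, x):
        logits = torch.zeros(x.shape[0], 10)
        logits[:, 3] = 1.0
        return logits


labels = torch.arange(10).repeat(2)
images = torch.zeros(20, 1, 28, 28)
loader = DataLoader(TensorDataset(images, labels), batch_size=8)

acc = per_class_accuracy(AlwaysThree(), loader)
print(acc.tolist())  # 1.0 at index 3, 0.0 everywhere else
```

Running this on the model before and after fine-tuning gives a direct view of which classes changed.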

Starting from these baseline tests, devise a new unlearning procedure.

You can improve upon this baseline, make up your own idea from scratch, or check the literature to get ideas.

If you use an existing approach, you must add something new, for example by testing it on some new data modality (e.g., audio), by studying more extreme cases, failures, weaknesses, or by making it more efficient, and so on.

In [8]:
# test model
import source.mnist_net as mnist_net
import torch

simple_model = mnist_net.mnist_classifier()
simple_model.load_state_dict(torch.load("data/models/MNIST/simple/weights.pth", weights_only=True))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
simple_model.to(device)

mnist_classifier(
  (features): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), bias=False)
    (1): LeakyReLU(negative_slope=0.01)
    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), bias=False)
    (6): LeakyReLU(negative_slope=0.01)
    (7): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
    (8): ReLU()
    (9): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(64, 128, kernel_size=(5, 5), stride=(1, 1), padding=(3, 3), bias=False)
    (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (12): LeakyReLU(negative_slope=0.01)
    (13): AvgPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0)
    (14): Conv2d(128, 256, kernel_size=(5, 5),

In [11]:
# test simple_model on MNIST
from source.mnist_net import train, test
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

transform = transforms.Compose(
    [
        transforms.Grayscale(),
        transforms.ToTensor(),
    ]
)

mnist_test = torchvision.datasets.MNIST(
    root="data/db",
    train=False,
    transform=transform,
    download=True,
)
test_loader = DataLoader(
    mnist_test,
    batch_size=1024,
    shuffle=False,  # sample order does not affect evaluation metrics
    num_workers=4,
)

loss_fn = torch.nn.CrossEntropyLoss()

mnist_net.test(simple_model, test_loader, loss_fn, device, True)


ImportError: cannot import name 'test' from 'source.mnist_net' (/home/stefano/Documents/GitHub/Machine_Unlearning/source/mnist_net.py)
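
The `ImportError` above shows that `source.mnist_net` does not expose a `test` function, so the call `mnist_net.test(...)` cannot work as written. Since the contents of that module are not shown here, the following is only a stand-in: a minimal evaluation loop named `evaluate` (a hypothetical helper) that returns the average loss and the accuracy. The meaning of the final `True` argument in the original call is unknown, so it has no counterpart in this sketch. The demo uses a hand-built linear model and one-hot inputs so that the result is known in advance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model, loader, loss_fn, device):
    """Return (average loss per sample, accuracy) of `model` on `loader`.

    Assumes `loss_fn` averages over the batch, as the default
    `torch.nn.CrossEntropyLoss()` does.
    """
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        logits = model(x)
        # Undo the per-batch mean so that the last, smaller batch
        # is weighted correctly.
        total_loss += loss_fn(logits, y).item() * y.shape[0]
        correct += (logits.argmax(dim=1) == y).sum().item()
        seen += y.shape[0]
    return total_loss / seen, correct / seen


# Demo: identity weights on one-hot inputs classify every sample correctly.
model = nn.Linear(10, 10, bias=False)
with torch.no_grad():
    model.weight.copy_(torch.eye(10))

labels = torch.arange(10).repeat(3)
inputs = F.one_hot(labels, 10).float()
loader = DataLoader(TensorDataset(inputs, labels), batch_size=8)

loss, acc = evaluate(model, loader, nn.CrossEntropyLoss(), torch.device("cpu"))
print(acc)  # 1.0
```

In the notebook, the equivalent call would be `evaluate(simple_model, test_loader, loss_fn, device)`, with the unused `from source.mnist_net import train, test` line removed.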