This notebook contains a simple example of how to compute influence functions
for deep neural networks, following the method described in [[KH17]](https://proceedings.mlr.press/v70/koh17a.html). We have
based our implementation on [this
repository](https://github.com/alstonlo/torch-influence/) and have introduced
only minor changes aimed at highlighting the main idea behind the method.

In [1]:
%cd /mnt/xfs/home/krisgrg/projects/tutorial-md/code/

/mnt/xfs/home/krisgrg/projects/tutorial-md/code




In [2]:
import torch
from tqdm.auto import tqdm
import scipy.sparse.linalg as L


# let's abstract away the "boring" parts in utils.py
from utils import get_model, get_loader

model = get_model()
# we'll use a smaller training set containing only samples from the cat & dog classes
train_loader = get_loader(split="train", batch_size=500)
val_loader = get_loader(split="val")

Files already downloaded and verified
Files already downloaded and verified


Suppose we have a trained model already. Let's load it.

For convenience, we're also adding the code to train this model from scratch.

In [3]:
sd = torch.load("./artifacts/model_0.pt")
model.load_state_dict(sd)

# training from scratch (in case you want to regenerate the above checkpoints yourself)
want_to_retrain = False
if want_to_retrain:
    from utils import train
    model = train(model, train_loader)

Now we compute the gradient of the loss on our target example with respect to the model parameters.

In [4]:
# let's use the first sample in the validation set as the target sample
x_target, y_target = next(iter(val_loader))
x_target = x_target[0:1].to("cuda")
y_target = y_target[0:1].to("cuda")


loss = torch.nn.CrossEntropyLoss(reduction="none")
l = loss(model(x_target), y_target)
phi = torch.autograd.grad(l, model.parameters(), retain_graph=True)
phi = torch.cat([p.flatten() for p in phi])  # flatten the gradients into a single vector

Next comes the critical part: we will compute the inverse-Hessian-vector product `stest` (using the naming convention introduced in [KH17]) via the conjugate gradient method, which only needs Hessian-vector products and never forms the Hessian itself.
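
As a reference for the cell below, the quantity being solved for can be written out explicitly. This is a sketch in the notation of [KH17]; the symbols $\hat\theta$ (the trained parameters), $z_i$ (the training samples), $n$ (the number of training samples), and $\lambda$ (the damping factor, `damp` in the code) are notation assumed here rather than names from the code.

$$
H = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta),
\qquad
(H + \lambda I)\, s_{\text{test}} = \nabla_\theta L(z_{\text{test}}, \hat\theta)
$$

In the code, the training loss inside $H$ also carries a small L2 penalty, and the right-hand side is the vector `phi` computed above. Conjugate gradient solves this linear system using only products of the form $(H + \lambda I)v$.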

In [5]:
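damp = 0.001  # damping factor, i.e., regularization for the Hessian
loss = torch.nn.CrossEntropyLoss()
flat_params = torch.cat([p.flatten() for p in model.parameters()])
# parameter names and shapes, used to map a flat vector back onto the model
param_shapes = [(name, p.shape) for name, p in model.named_parameters()]

def unflatten(flat):
    # split a flat parameter vector into a {name: tensor} dict
    out, i = {}, 0
    for name, shape in param_shapes:
        out[name] = flat[i:i + shape.numel()].view(shape)
        i += shape.numel()
    return out

def hvp_fn(v):
    # scipy passes NumPy arrays (possibly float64), so cast to the parameter dtype
    v = torch.tensor(v, dtype=flat_params.dtype, requires_grad=False, device="cuda")

    hvp = 0.0
    for batch in train_loader:
        x, y = batch[0].to("cuda"), batch[1].to("cuda")
        def f(params_):
            # evaluate the model at params_ so that the loss actually depends on them
            # (torch.func.functional_call requires PyTorch 2.0 or later)
            outputs = torch.func.functional_call(model, unflatten(params_), (x,))
            return loss(outputs, y) + 1e-4 * torch.square(params_.norm())
        hvp_batch = torch.autograd.functional.hvp(f, flat_params, v)[1]
        batch_size = x.shape[0]
        hvp = hvp + hvp_batch.detach() * batch_size

    hvp = hvp / len(train_loader.dataset)
    hvp = hvp + damp * v

    return hvp.cpu().numpy()

d = phi.shape[0]
linop = L.LinearOperator((d, d), matvec=hvp_fn)
stest = L.cg(A=linop, b=phi.cpu().numpy(), atol=1e-8, maxiter=1000)[0]
stest = torch.from_numpy(stest).to(torch.float32).to("cuda")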
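
To see why a Hessian-vector product function is all the solver needs, here is a minimal pure-Python sketch of conjugate gradient on a toy 2x2 system. It is not the SciPy implementation used above; `conjugate_gradient`, `toy_hvp`, `H_toy`, and `phi_toy` are hypothetical names chosen for illustration, and the toy matrix simply stands in for the damped Hessian.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x (x starts at zero, so r = b)
    p = list(r)  # current search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break  # residual norm is small enough
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy stand-in for the Hessian (an assumption, not taken from the model above).
H_toy = [[4.0, 1.0], [1.0, 3.0]]
damp_toy = 0.001


def toy_hvp(v):
    # Returns (H_toy + damp_toy * I) v; the matrix is never inverted.
    return [sum(h * vj for h, vj in zip(row, v)) + damp_toy * vi
            for row, vi in zip(H_toy, v)]


phi_toy = [1.0, 2.0]
s_toy = conjugate_gradient(toy_hvp, phi_toy)
residual = [a - b for a, b in zip(toy_hvp(s_toy), phi_toy)]
```

The solver only ever calls `matvec`, which is the role `hvp_fn` plays for `L.cg` in the cell above.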


Finally, we compute the loss gradient of each training sample (here, the first 500) and take its dot product with `stest` to obtain that sample's influence score.
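
In symbols (again a sketch, with $z_i$ denoting the $i$-th training sample and $\hat\theta$ the trained parameters, both notation assumed here), the score computed for each sample is

$$
\text{score}_i = \nabla_\theta L(z_i, \hat\theta)^\top s_{\text{test}}
$$

[KH17] defines the influence of upweighting $z_i$ on the test loss as the negative of this product, so the sign convention is worth keeping in mind when interpreting the scores.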

In [6]:
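scores = []
model_params = list(model.parameters())

train_loader_single_samples = get_loader(split="train", batch_size=1, indices=list(range(500)))
for batch in tqdm(train_loader_single_samples):
    outputs = model(batch[0].to("cuda"))
    # same regularized objective as in hvp_fn
    l = loss(outputs, batch[1].to("cuda")) + 1e-4 * sum(p.square().sum() for p in model_params)
    # differentiate w.r.t. the model's own parameters, then flatten to match stest
    grads = torch.autograd.grad(l, model_params)
    grad_z = torch.cat([g.flatten() for g in grads])
    scores.append(grad_z @ stest)

scores = torch.stack(scores)
scores.shape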

Files already downloaded and verified



torch.Size([500])
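
Once the scores are available, a natural next step is to look at the training samples with the most extreme scores. The following is a minimal sketch in plain Python; `top_and_bottom` and `toy_scores` are hypothetical names used for illustration, and with the tensor above one would pass `scores.tolist()` instead of the toy list.

```python
def top_and_bottom(scores, k=3):
    """Return (indices of the k largest scores, indices of the k smallest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k], order[::-1][:k]


# Made-up values standing in for real influence scores.
toy_scores = [0.2, -1.5, 3.0, 0.0, -0.3]
largest, smallest = top_and_bottom(toy_scores, k=2)
```

The returned indices refer to positions in the score list, which here correspond to the order in which samples were drawn from `train_loader_single_samples`.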

That's it!