Start with vanilla training of VGG11 on CIFAR10, using a fixed learning rate and number of epochs.
We train two models: one on the true labels and the other on random labels.

Note that models trained with random labels usually take longer to overfit the training set, so at least 75 epochs of training are recommended.
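
How `get_dataloader` builds the random-label dataset is not shown in this notebook. As a rough illustration only, random labels can be drawn uniformly over the classes, independently of the images. The helper `randomize_labels` below is hypothetical and is not part of `vgg_linear_fit_utils`.

```python
import numpy as np

def randomize_labels(labels, num_classes=10, seed=0):
    """Hypothetical helper: draw one uniform random class per example,
    ignoring the true labels (CIFAR10 has 10 classes)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, num_classes, size=len(labels))

# Toy stand-in for a few CIFAR10 labels.
true_labels = np.array([3, 3, 7, 1, 0, 9, 9, 2])
random_labels = randomize_labels(true_labels)
```

Fixing the seed keeps the random labels identical across epochs, which is what lets a model memorize them.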

In [None]:
from copy import deepcopy
from vgg_linear_fit_utils import *
from matplotlib import pyplot as plt
import numpy as np
import torch  # used directly in the patch PCA cell below

# Get Model and data
model_name = 'vgg11_bn'  # 'resnet20', 'vgg13_bn',...
epochs = 100
learning_rate = 0.01

true_model = get_model(model_name)

random_model = get_model(model_name)

train_loader = get_dataloader(False, train=True)
random_train_loader = get_dataloader(True, train=True)
test_loader = get_dataloader(False, train=False)  # evaluate on the held-out test split



true_model_initialization = deepcopy(get_first_layer_weights(true_model))
random_model_initialization = deepcopy(get_first_layer_weights(random_model))

train_model(true_model, train_loader, test_loader, num_epochs=epochs, lr=learning_rate)
train_model(random_model, random_train_loader, test_loader, num_epochs=epochs, lr=learning_rate)

Compute the patch PCA to obtain the eigenvalues $\lambda_i^2$ and the components $\{u_i\}_{i=1}^p$, where $p$ is the patch dimension (here $p = 3 \cdot 3 \cdot 3 = 27$).
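
The same computation can be sketched in plain NumPy on toy data. This is only an illustration under stated assumptions: `extract_patches` is a hypothetical helper, the images are random noise rather than CIFAR10, and the eigenvalues are scaled by $1/(n-1)$ as for a sample covariance, whereas the cell below keeps the unscaled squared singular values.

```python
import numpy as np

def extract_patches(images, k=3):
    """Hypothetical helper: (N, C, H, W) -> (num_patches, C*k*k), stride 1."""
    windows = np.lib.stride_tricks.sliding_window_view(images, (k, k), axis=(2, 3))
    # windows has shape (N, C, H-k+1, W-k+1, k, k); move C next to the kernel axes
    windows = windows.transpose(0, 2, 3, 1, 4, 5)
    return windows.reshape(-1, images.shape[1] * k * k)

rng = np.random.default_rng(0)
images = rng.normal(size=(8, 3, 6, 6))       # toy stand-in for CIFAR10 images
patches = extract_patches(images)             # shape (8 * 4 * 4, 27)

# PCA via SVD of the centered patch matrix.
centered = patches - patches.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
components = vt.T                             # columns are the directions u_i
eigenvalues = singular_values**2 / (centered.shape[0] - 1)
```

The squared singular values are proportional to the eigenvalues of the patch covariance, so either can be used when only the relative spectrum matters.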

In [None]:
images = torch.cat([batch[0] for batch in train_loader])
# compute pca from 10K images, no need for all 50K.
images = images[torch.randperm(images.shape[0])][:10000]
num_images = images.shape[0]

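# Extract every 3x3x3 patch with stride 1; for 32x32 inputs the shape is (N, 1, 30, 30, 3, 3, 3).
patches = images.unfold(1, 3, 1).unfold(2, 3, 1).unfold(3, 3, 1).to(torch.float64)

# Flatten to (num_patches, 27) and run PCA; pca_lowrank returns (U, S, V).
pca = torch.pca_lowrank(patches.flatten(-3).flatten(0, -2), q=27)

components = pca[-1]  # V: columns are the principal directions u_i
lambdas = pca[1]  # S: singular values of the centered patch matrix
lambda_squared = lambdas**2  # proportional to the eigenvalues of the patch covariance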

Plot the energy profiles of the two models against each other. Models trained with random labels usually move less from their initialization, so subtracting the initialization is recommended.
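
The implementation of `calc_energy_profile` lives in `vgg_linear_fit_utils` and is not shown here. One plausible reading, given as a sketch under that assumption, is to project each first-layer filter (minus its initialization) onto the PCA components and sum the squared coefficients per component. The function `energy_profile_sketch` and the toy weights below are hypothetical.

```python
import numpy as np

def energy_profile_sketch(weights, init_weights, components,
                          subtract_init=True, normalize=True):
    """Hypothetical stand-in for calc_energy_profile (assumed behaviour)."""
    w = weights - init_weights if subtract_init else weights
    flat = w.reshape(w.shape[0], -1)      # (num_filters, p)
    coeffs = flat @ components            # coefficient of each filter on each u_i
    energy = (coeffs**2).sum(axis=0)      # total energy per component, shape (p,)
    if normalize:
        energy = energy / energy.sum()    # profile sums to 1
    return energy

rng = np.random.default_rng(0)
components, _ = np.linalg.qr(rng.normal(size=(27, 27)))  # toy orthonormal basis
init = rng.normal(size=(64, 3, 3, 3))                    # toy first-layer init
trained = init + 0.1 * rng.normal(size=(64, 3, 3, 3))    # toy trained weights
profile = energy_profile_sketch(trained, init, components)
```

Because the components are orthonormal, the unnormalized energies sum to the squared norm of the weight change, so normalizing makes profiles comparable between models that moved by different amounts.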


In [None]:
trained_first_layer = get_first_layer_weights(true_model)
random_trained_first_layer = get_first_layer_weights(random_model)

true_profile = calc_energy_profile(trained_first_layer, true_model_initialization,
                                   components, subtract_init=True, normalize=True)
random_profile = calc_energy_profile(random_trained_first_layer, random_model_initialization,
                                     components, subtract_init=True, normalize=True)


corr = np.corrcoef(true_profile, random_profile)[0,1]

# plot the normalized energy profiles
plt.plot(true_profile / true_profile.max(), label="True")
plt.plot(random_profile / random_profile.max(), label="Random")
plt.title(f"True vs Random Profiles\ncorrelation={corr:.2f}")
plt.legend()
plt.show()