# Fine Tuning The Game of Life

The experiment to be run here is the following. We take `ethan_m8_n2_checkpoint1000.pt` as our base model, since the `ethan` model converged to 0/1000 examples wrong within 200 training epochs of its 1000th-epoch checkpoint. At the checkpoint itself the model still gets 1000/1000 examples wrong, which suggests it is close to a viable solution but not quite at one. We copy this model three times and slightly perturb the first-layer weights of each copy. We then fine-tune each copy for another 500 epochs (through epoch 1500, matching `epochs = 500` below) and compare the resulting models. The current hypothesis is that the behavior of the intermediary step depends entirely on the example data seen, so two of the perturbed models fine-tune on the same data and the third fine-tunes on different data. That way, we can hopefully answer the following questions:

1) Is the intermediary behavior dependent on the data seen?
2) Will enough variability in the intermediary layer be introduced by fine tuning a model very close to convergence?
3) Is a model that is close to convergence still close to converging after slight perturbations? 
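
One possible way to quantify the first two questions is to feed the same board through the first layer of two models and measure how far apart the activations are. The sketch below is only an illustration under stated assumptions: `activation_gap` is a hypothetical helper, and a plain `nn.Conv2d` stands in for the notebook's `model.type1_layers[0]`, whose real shape is not shown here.

```python
import copy

import torch
import torch.nn as nn


def activation_gap(layer_a: nn.Module, layer_b: nn.Module, board: torch.Tensor) -> float:
    """Mean absolute difference between two layers' outputs on the same input."""
    with torch.no_grad():
        return (layer_a(board) - layer_b(board)).abs().mean().item()


torch.manual_seed(0)

# Stand-in first layer (assumption: the real one is model.type1_layers[0])
base = nn.Conv2d(1, 16, kernel_size=3, padding=1)
identical_copy = copy.deepcopy(base)
perturbed_copy = copy.deepcopy(base)

# Apply the same kind of uniform perturbation the notebook uses
with torch.no_grad():
    perturbed_copy.weight += (torch.rand_like(perturbed_copy.weight) - 0.5) * 1e-2

# A random binary 32x32 board, shaped (batch, channel, height, width)
board = torch.randint(0, 2, (1, 1, 32, 32)).float()

gap_identical = activation_gap(base, identical_copy, board)
gap_perturbed = activation_gap(base, perturbed_copy, board)
print(gap_identical, gap_perturbed)
```

Applied to the fine-tuned checkpoints, a gap between the two same-data models that is small relative to the gap against the different-data model would be consistent with the hypothesis; a weight-space distance could be used instead if activations are inconvenient to extract.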

In [1]:
import numpy as np
import torch
import torch.nn as nn
import pandas as pd

from GOLDataset import generateDataset
from GOLCNN import OPNet, train_epoch, test_model
from MinimalSolution import MinNet

device = "cuda"

In [2]:
# Seed everything for reproducibility
# seed = 14 for irene, jeff, karol m=8, n=2
seed = 14
np.random.seed(seed)
torch.manual_seed(seed)

<torch._C.Generator at 0x7fee9fda0db0>

In [None]:
name1 = 'irene'
name2 = 'jeff'
name3 = 'karol'

In [3]:
# Data parameters
dataset_size = 1000
datapoint_size = 32

# Training Parameters
learning_rate = 1e-3
perturbation_coefficient = 1e-2
batch_size_param = 1
checkpoint_epochs = 1000
epochs = 500
checkpoint_rate = 100

m = 8 # Overparameterization Factor
n = 2  # Steps of GOL simulation

In [4]:
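# Load three independent copies of the base checkpoint onto the target device
model1 = torch.load('./models/ethan_m8_n2_checkpoint1000.pt', map_location=device)
model2 = torch.load('./models/ethan_m8_n2_checkpoint1000.pt', map_location=device)
model3 = torch.load('./models/ethan_m8_n2_checkpoint1000.pt', map_location=device)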

criterion1 = nn.MSELoss()
criterion2 = nn.MSELoss()
criterion3 = nn.MSELoss()

optimizer1 = torch.optim.SGD(model1.parameters(), lr=learning_rate)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=learning_rate)
optimizer3 = torch.optim.SGD(model3.parameters(), lr=learning_rate)

print('model checkpoints loaded successfully')

model checkpoints loaded successfully


In [5]:
# Check the integrity of loaded model
# Expected loss around 0.05
# Expected number Incorrect around 1000/1000
dataloader = generateDataset(dataSetSize=dataset_size, size=datapoint_size, n_steps=n)
acc, epoch_test_loss, num_correct, num_wrong = test_model(model1, dataloader, m, criterion1)
print(f'{name1}: Accuracy: {acc}, Test Loss: {epoch_test_loss}, Correct: {num_correct}/{dataset_size}, Incorrect: {num_wrong}/{dataset_size}')

Irene: Accuracy: 0.0, Test Loss: 0.05050164833664894, Correct: 0/1000, Incorrect: 1000/1000


In [6]:
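# Perturb first-layer weights: each copy gets its own independent uniform noise
# drawn from [-perturbation_coefficient/2, perturbation_coefficient/2)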
fl_weight_shape = model1.type1_layers[0].weight.shape
random_tensor = torch.rand(fl_weight_shape).to(device)
perturbation_tensor = (random_tensor - 0.5)*(perturbation_coefficient)
model1.type1_layers[0].weight.data += perturbation_tensor

random_tensor2 = torch.rand(fl_weight_shape).to(device)
perturbation_tensor2 = (random_tensor2 - 0.5)*(perturbation_coefficient)
model2.type1_layers[0].weight.data += perturbation_tensor2

random_tensor3 = torch.rand(fl_weight_shape).to(device)
perturbation_tensor3 = (random_tensor3 - 0.5)*(perturbation_coefficient)
model3.type1_layers[0].weight.data += perturbation_tensor3
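
The perturbation above is uniform noise centered on zero: `torch.rand` samples from [0, 1), so subtracting 0.5 and scaling by the coefficient bounds every entry by half the coefficient. The following standalone sketch checks that bound; `uniform_perturbation` is a hypothetical helper and the tensor shape is illustrative, not the real first-layer shape.

```python
import torch


def uniform_perturbation(shape, coefficient):
    """Uniform noise in [-coefficient/2, coefficient/2), mirroring the cell above."""
    return (torch.rand(shape) - 0.5) * coefficient


torch.manual_seed(0)
coefficient = 1e-2
noise = uniform_perturbation((16, 1, 3, 3), coefficient)  # illustrative shape

# Every entry should lie within half the coefficient of zero
max_abs = noise.abs().max().item()
print(max_abs, max_abs <= coefficient / 2 + 1e-9)
```

With `perturbation_coefficient = 1e-2`, each first-layer weight therefore moves by at most 0.005.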

In [7]:
# Check the integrity of perturbed models
# Expected loss around 0.0523
# Expected number Incorrect around 1000/1000
dataloader = generateDataset(dataSetSize=dataset_size, size=datapoint_size, n_steps=n)
acc, epoch_test_loss, num_correct, num_wrong = test_model(model1, dataloader, m, criterion1)
print(f'{name1}: Accuracy: {acc}, Test Loss: {epoch_test_loss}, Correct: {num_correct}/{dataset_size}, Incorrect: {num_wrong}/{dataset_size}')

acc, epoch_test_loss, num_correct, num_wrong = test_model(model2, dataloader, m, criterion2)
print(f'{name2}: Accuracy: {acc}, Test Loss: {epoch_test_loss}, Correct: {num_correct}/{dataset_size}, Incorrect: {num_wrong}/{dataset_size}')

acc, epoch_test_loss, num_correct, num_wrong = test_model(model3, dataloader, m, criterion3)
print(f'{name3}: Accuracy: {acc}, Test Loss: {epoch_test_loss}, Correct: {num_correct}/{dataset_size}, Incorrect: {num_wrong}/{dataset_size}')

Irene: Accuracy: 0.0, Test Loss: 0.05237091705203056, Correct: 0/1000, Incorrect: 1000/1000
Jeff: Accuracy: 0.0, Test Loss: 0.05266406759619713, Correct: 0/1000, Incorrect: 1000/1000
Karol: Accuracy: 0.0, Test Loss: 0.05132685601711273, Correct: 0/1000, Incorrect: 1000/1000


In [8]:
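# Training sequence
# model1 and model2 train on the same freshly generated data each epoch;
# model3 trains on its own independently generated data (dataloader2).
# The range includes checkpoint_epochs + epochs so the final checkpoint is saved.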
full_data1 = []
full_data2 = []
full_data3 = []

checkpoint_data1 = []
checkpoint_data2 = []
checkpoint_data3 = []

for t in range(checkpoint_epochs, checkpoint_epochs + epochs + 1):
    dataloader = generateDataset(dataSetSize=dataset_size, 
                                 size=datapoint_size, 
                                 n_steps=n)
    
    epoch_train_loss1 = train_epoch(model1, optimizer1, criterion1, dataloader, m)
    full_data1.append([t, epoch_train_loss1])
    
    epoch_train_loss2 = train_epoch(model2, optimizer2, criterion2, dataloader, m)
    full_data2.append([t, epoch_train_loss2])
    
    dataloader2 = generateDataset(dataSetSize=dataset_size, 
                                 size=datapoint_size, 
                                 n_steps=n)
    
    epoch_train_loss3 = train_epoch(model3, optimizer3, criterion3, dataloader2, m)
    full_data3.append([t, epoch_train_loss3])
    
    if t % checkpoint_rate == 0:
        print(f'Epoch: {t}')
              
        acc1, epoch_test_loss1, num_correct1, num_wrong1 = test_model(model1, dataloader, m, criterion1)
        checkpoint_name1 = f'{name1}_m{m}_n{n}_checkpoint{t}.pt'
        checkpoint_data1.append([t, checkpoint_name1, acc1, epoch_test_loss1, num_correct1, num_wrong1])
        print(f'{name1}: Epoch: {t}/{checkpoint_epochs + epochs}, Test Loss: {epoch_test_loss1}, Incorrect: {num_wrong1}/{dataset_size} examples')
        torch.save(model1, f'./models/{checkpoint_name1}')
        
        acc2, epoch_test_loss2, num_correct2, num_wrong2 = test_model(model2, dataloader, m, criterion2)
        checkpoint_name2 = f'{name2}_m{m}_n{n}_checkpoint{t}.pt'
        checkpoint_data2.append([t, checkpoint_name2, acc2, epoch_test_loss2, num_correct2, num_wrong2])
        print(f'{name2}: Epoch: {t}/{checkpoint_epochs + epochs}, Test Loss: {epoch_test_loss2}, Incorrect: {num_wrong2}/{dataset_size} examples')
        torch.save(model2, f'./models/{checkpoint_name2}')
        
        acc3, epoch_test_loss3, num_correct3, num_wrong3 = test_model(model3, dataloader, m, criterion3)
        checkpoint_name3 = f'{name3}_m{m}_n{n}_checkpoint{t}.pt'
        checkpoint_data3.append([t, checkpoint_name3, acc3, epoch_test_loss3, num_correct3, num_wrong3])
        print(f'{name3}: Epoch: {t}/{checkpoint_epochs + epochs}, Test Loss: {epoch_test_loss3}, Incorrect: {num_wrong3}/{dataset_size} examples')
        torch.save(model3, f'./models/{checkpoint_name3}')

Epoch: 1000
Irene: Epoch: 1000/1500, Test Loss: 0.05025511234998703, Incorrect: 1000/1000 examples
Jeff: Epoch: 1000/1500, Test Loss: 0.05021347105503082, Incorrect: 1000/1000 examples
Karol: Epoch: 1000/1500, Test Loss: 0.05021686479449272, Incorrect: 1000/1000 examples
Epoch: 1100
Irene: Epoch: 1100/1500, Test Loss: 0.0018787500448524952, Incorrect: 51/1000 examples
Jeff: Epoch: 1100/1500, Test Loss: 0.0018816253868862987, Incorrect: 51/1000 examples
Karol: Epoch: 1100/1500, Test Loss: 0.0018658285262063146, Incorrect: 57/1000 examples
Epoch: 1200
Irene: Epoch: 1200/1500, Test Loss: 0.0005328803090378642, Incorrect: 0/1000 examples
Jeff: Epoch: 1200/1500, Test Loss: 0.0005357351037673652, Incorrect: 0/1000 examples
Karol: Epoch: 1200/1500, Test Loss: 0.0005295646260492504, Incorrect: 0/1000 examples
Epoch: 1300
Irene: Epoch: 1300/1500, Test Loss: 0.00026804316439665854, Incorrect: 0/1000 examples
Jeff: Epoch: 1300/1500, Test Loss: 0.0002687508531380445, Incorrect: 0/1000 examples
Kar

In [9]:
# Column layouts shared by the three models' logs
full_columns = ['epoch', 'training_loss']
checkpoint_columns = ['epoch', 'checkpoint_name', 'accuracy', 'test_loss', 'num_correct', 'num_wrong']

df_full_data1 = pd.DataFrame(full_data1, columns=full_columns)
df_full_data2 = pd.DataFrame(full_data2, columns=full_columns)
df_full_data3 = pd.DataFrame(full_data3, columns=full_columns)

df_checkpoint_data1 = pd.DataFrame(checkpoint_data1, columns=checkpoint_columns)
df_checkpoint_data2 = pd.DataFrame(checkpoint_data2, columns=checkpoint_columns)
df_checkpoint_data3 = pd.DataFrame(checkpoint_data3, columns=checkpoint_columns)

In [10]:
df_full_data1.to_csv(f'./logs/{name1}_full_data.csv')
df_full_data2.to_csv(f'./logs/{name2}_full_data.csv')
df_full_data3.to_csv(f'./logs/{name3}_full_data.csv')

df_checkpoint_data1.to_csv(f'./logs/{name1}_checkpoint_data.csv')
df_checkpoint_data2.to_csv(f'./logs/{name2}_checkpoint_data.csv')
df_checkpoint_data3.to_csv(f'./logs/{name3}_checkpoint_data.csv')
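
Because `to_csv` is called without `index=False`, each log file carries the DataFrame index as an unnamed first column. A minimal sketch of reading such a file back for later comparison follows; the file name and the two rows are placeholders chosen for illustration, not results from this experiment.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the notebook writes to ./logs/{name}_full_data.csv
    log_path = Path(tmp) / "example_full_data.csv"

    # Placeholder rows, only to give the file the same column layout
    df = pd.DataFrame([[1000, 0.5], [1001, 0.25]], columns=["epoch", "training_loss"])
    df.to_csv(log_path)

    # index_col=0 restores the saved index instead of creating an "Unnamed: 0" column
    reloaded = pd.read_csv(log_path, index_col=0)

print(list(reloaded.columns))
```

Reading with `index_col=0` keeps the reloaded frame's columns identical to the ones written, which makes it straightforward to compare the training-loss curves of the three models afterwards.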