# Learning The Game of Life

This notebook trains a CNN to learn Conway's Game of Life. The CNN is defined in `GOLCNN.py`, and the dataset it is trained on is generated by `GOLDataset.py`. The notebook trains a pair of networks on the same sequence of data. The parameters used throughout the notebook are:
- `seed` allows you to specify a seed for any random number generators used. This ensures reproducibility of the training sequence.
- `name1` is the name of the first CNN. Providing a name is important because the model checkpoints and logs are saved using this name.
- `name2` is the name of the second CNN.
- `dataset_size` specifies the number of examples provided in each training epoch. The total number of examples the models see during the first training era is `epochs` times `dataset_size`.
- `datapoint_size` is the side length of each individual training data point. Since we are learning the game of life, each data point is a 2D array. By default, this value is 32 (indicating the data points are 32 by 32 arrays). In practice, the training procedure does not seem to respond well when this value is less than 32. This is a known bug that has not yet been resolved.
- `learning_rate` is the learning rate passed on to the optimizer in training the CNN.
- `epochs` is the number of epochs in the first training era. In one epoch, each model sees `dataset_size` examples.
- `era2epochs` is the number of epochs in the second training era. The second era is for fine-tuning and uses a learning rate of 0.1 times the original `learning_rate`.
- `checkpoint_rate` is the interval, in epochs, at which we evaluate the model using the test routine and record the accuracy data. A checkpoint of the model is also saved at each interval.
- `m` is the overparameterization factor for the CNN, originally defined in the Kenyon and Springer paper. Effectively, it is the factor by which the CNN's parameter count exceeds the minimal number of parameters needed to learn the game of life.
- `n` is also a parameter of the CNN defined in the Kenyon and Springer paper. It is the number of steps of the Game of Life the CNN attempts to simulate in one pass.
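
To make the meaning of `n` concrete, here is a minimal NumPy sketch of the rule the CNN is trying to learn. It is illustrative only: `life_step` and `life_steps` are hypothetical helpers, not functions from `GOLDataset.py`, and treating cells beyond the edge of the board as dead is an assumption about the boundary handling.

```python
import numpy as np


def life_step(board: np.ndarray) -> np.ndarray:
    """Advance a 2D 0/1 board by one Game of Life step.

    Assumption: cells outside the board are treated as dead.
    """
    height, width = board.shape
    padded = np.pad(board, 1, mode="constant", constant_values=0)
    # Sum the eight shifted copies of the board to count live neighbors.
    neighbors = sum(
        padded[1 + dy : 1 + dy + height, 1 + dx : 1 + dx + width]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    born = (board == 0) & (neighbors == 3)
    survives = (board == 1) & ((neighbors == 2) | (neighbors == 3))
    return (born | survives).astype(board.dtype)


def life_steps(board: np.ndarray, n: int) -> np.ndarray:
    """Apply life_step n times; with n = 2 this is the map a network with n = 2 targets."""
    for _ in range(n):
        board = life_step(board)
    return board


# A "blinker" oscillates with period 2: horizontal, vertical, horizontal, ...
blinker = np.zeros((5, 5), dtype=np.int64)
blinker[2, 1:4] = 1
after_one = life_step(blinker)
after_two = life_steps(blinker, 2)
```

In this sketch `after_one` is the vertical phase of the blinker and `after_two` equals the starting board, which is the kind of input/output pair a network with `n = 2` would be asked to reproduce.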

Model checkpoints are saved to `./models/`, and logs of epochs, loss, accuracy, etc. are saved to `./logs/`.
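
Both directories need to exist before anything is written to them. A minimal sketch of creating them up front is shown here; `ensure_output_dirs` is a hypothetical helper and not part of `GOLCNN.py` or `GOLDataset.py`.

```python
from pathlib import Path
from typing import Tuple


def ensure_output_dirs(base: str = ".") -> Tuple[Path, Path]:
    """Create <base>/models and <base>/logs if they do not exist yet."""
    models_dir = Path(base) / "models"
    logs_dir = Path(base) / "logs"
    for directory in (models_dir, logs_dir):
        # exist_ok=True makes the call safe to repeat on later runs.
        directory.mkdir(parents=True, exist_ok=True)
    return models_dir, logs_dir
```

Calling `ensure_output_dirs()` once before training would avoid a failed save at the first checkpoint.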

In [1]:
import numpy as np
import torch
import torch.nn as nn
import pandas as pd

from GOLDataset import generateDataset
from GOLCNN import OPNet, train_epoch, test_model
from MinimalSolution import MinNet

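# Use the GPU when one is available; otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"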

In [2]:
# Seed everything for reproducibility
# seed = 11 for carl m=16, n=2
# seed = 12 for ethan m=8, n=2
# seed = 13 for greg m=8, n=2
# seed = 20 for lionel and melissa m=8, n=2
seed = 20
np.random.seed(seed)
torch.manual_seed(seed)

<torch._C.Generator at 0x7f9be5725e10>
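
A quick way to check that seeding behaves as expected is to reseed and compare random draws. The sketch below assumes a hypothetical helper, `seed_everything`, which also seeds Python's built-in `random` module in case any code relies on it. Note that seeding alone may not make GPU runs bit-for-bit identical, since some CUDA operations can be nondeterministic.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed the Python, NumPy, and PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def draw_sample():
    """Draw one value from each generator so two runs can be compared."""
    return random.random(), float(np.random.rand()), torch.rand(1).item()


seed_everything(20)
first = draw_sample()
seed_everything(20)
second = draw_sample()
print("identical draws after reseeding:", first == second)
```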

In [1]:
name1 = "amber"
name2 = "brian"

In [3]:
# Ensure test_model() works on the minimal solution CNN
dataset_size = 1000
dataloader = generateDataset(dataSetSize=dataset_size, size=32, n_steps=3)
min_model = MinNet(3)
min_model.to(device)
criterion = nn.MSELoss()
acc, epoch_test_loss, num_correct, num_wrong = test_model(min_model, dataloader, 1, criterion)
print(f'Accuracy: {acc}, Test Loss: {epoch_test_loss}, Correct: {num_correct}/{dataset_size}, Incorrect: {num_wrong}/{dataset_size}')

Accuracy: 1.0, Test Loss: 3.1845403063913623e-18, Correct: 1000/1000, Incorrect: 0/1000
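
The counts above suggest that an example is scored as correct only when the whole predicted board matches the target. The sketch below shows one way such a metric could be computed. It is an assumption about what `test_model` does, not its actual implementation; `exact_match_counts` and the 0.5 threshold are hypothetical.

```python
import numpy as np


def exact_match_counts(predictions: np.ndarray, targets: np.ndarray, threshold: float = 0.5):
    """Return (accuracy, num_correct, num_wrong) over a batch of boards.

    A board counts as correct only if every thresholded cell matches the target.
    """
    predicted_cells = predictions >= threshold
    target_cells = targets >= threshold
    num_boards = len(targets)
    per_board_correct = (predicted_cells == target_cells).reshape(num_boards, -1).all(axis=1)
    num_correct = int(per_board_correct.sum())
    num_wrong = num_boards - num_correct
    return num_correct / num_boards, num_correct, num_wrong


# Made-up batch of three 2x2 boards, for illustration only.
targets = np.array(
    [[[0, 1], [1, 0]], [[1, 1], [0, 0]], [[0, 0], [0, 1]]], dtype=float
)
predictions = targets * 0.9 + 0.05  # close to the targets everywhere
predictions[2, 0, 0] = 0.8          # one wrong cell makes the third board incorrect
acc, num_correct, num_wrong = exact_match_counts(predictions, targets)
```

Under this kind of all-or-nothing scoring, a model can have a fairly low per-cell loss and still get every example wrong, because a single mispredicted cell fails the whole board.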


In [4]:
# Data parameters
dataset_size = 1000
datapoint_size = 32

# Training Parameters
learning_rate = 1e-3
epochs = 1500
era2epochs = 0
checkpoint_rate = 100

m = 8  # Overparameterization factor
n = 2  # Steps of GOL simulation

model1 = OPNet(m, n)
model2 = OPNet(m, n)

criterion1 = nn.MSELoss()
criterion2 = nn.MSELoss()
optimizer1 = torch.optim.SGD(model1.parameters(), lr=learning_rate)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=learning_rate)
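
As a sanity check on these settings, the short sketch below works out how much data each model sees and how many checkpoints a run produces. The values are copied from the cell above; the derived names (`total_epochs`, `checkpoint_epochs`, and so on) are only illustrative.

```python
# Values copied from the parameter cell above
dataset_size = 1000
learning_rate = 1e-3
epochs = 1500
era2epochs = 0
checkpoint_rate = 100

# Both eras draw a fresh dataset of dataset_size examples every epoch.
total_epochs = epochs + era2epochs
total_examples = total_epochs * dataset_size

# A checkpoint is written whenever the epoch number is a multiple of checkpoint_rate.
checkpoint_epochs = [t for t in range(1, total_epochs + 1) if t % checkpoint_rate == 0]

# Era 2 fine-tunes with a tenth of the original learning rate.
era2_learning_rate = learning_rate * 0.1

print(f"examples seen per model: {total_examples}")
print(f"checkpoints per model: {len(checkpoint_epochs)}")
print(f"era 2 learning rate: {era2_learning_rate}")
```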

In [5]:
model1.to(device)
model2.to(device)
print('models loaded to device')

models loaded to device


In [6]:
full_data1 = []
full_data2 = []
checkpoint_data1 = []
checkpoint_data2 = []

for t in range(1, epochs + 1):
    dataloader = generateDataset(dataSetSize=dataset_size, 
                                 size=datapoint_size, 
                                 n_steps=n)
    
    epoch_train_loss1 = train_epoch(model1, optimizer1, criterion1, dataloader, m)
    full_data1.append([t, epoch_train_loss1])
    
    epoch_train_loss2 = train_epoch(model2, optimizer2, criterion2, dataloader, m)
    full_data2.append([t, epoch_train_loss2])
    
    if t % checkpoint_rate == 0:
        print(f'Epoch: {t}')
        
        acc1, epoch_test_loss1, num_correct1, num_wrong1 = test_model(model1, dataloader, m, criterion1)
        checkpoint_name1 = f'{name1}_m{m}_n{n}_checkpoint{t}.pt'
        checkpoint_data1.append([t, checkpoint_name1, acc1, epoch_test_loss1, num_correct1, num_wrong1])
        print(f'{name1}: Epoch: {t}/{epochs}, Test Loss: {epoch_test_loss1}, Incorrect: {num_wrong1}/{dataset_size} examples')
        torch.save(model1, f'./models/{checkpoint_name1}')
        
        acc2, epoch_test_loss2, num_correct2, num_wrong2 = test_model(model2, dataloader, m, criterion2)
        checkpoint_name2 = f'{name2}_m{m}_n{n}_checkpoint{t}.pt'
        checkpoint_data2.append([t, checkpoint_name2, acc2, epoch_test_loss2, num_correct2, num_wrong2])
        print(f'{name2}: Epoch: {t}/{epochs}, Test Loss: {epoch_test_loss2}, Incorrect: {num_wrong2}/{dataset_size} examples')
        torch.save(model2, f'./models/{checkpoint_name2}')
        
print("END OF ERA 1")

optimizer1 = torch.optim.SGD(model1.parameters(), lr=learning_rate*0.1)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=learning_rate*0.1)

for t in range(epochs + 1, epochs+era2epochs+1):
    dataloader = generateDataset(dataSetSize=dataset_size, 
                                 size=datapoint_size, 
                                 n_steps=n)
    
    epoch_train_loss1 = train_epoch(model1, optimizer1, criterion1, dataloader, m)
    full_data1.append([t, epoch_train_loss1])
    
    epoch_train_loss2 = train_epoch(model2, optimizer2, criterion2, dataloader, m)
    full_data2.append([t, epoch_train_loss2])
    
    if t % checkpoint_rate == 0:
        print(f'Epoch: {t}')

        acc1, epoch_test_loss1, num_correct1, num_wrong1 = test_model(model1, dataloader, m, criterion1)
        checkpoint_name1 = f'{name1}_m{m}_n{n}_checkpoint{t}.pt'
        checkpoint_data1.append([t, checkpoint_name1, acc1, epoch_test_loss1, num_correct1, num_wrong1])
        print(f'{name1}: Epoch: {t}/{epochs+era2epochs}, Test Loss: {epoch_test_loss1}, Incorrect: {num_wrong1}/{dataset_size} examples')
        torch.save(model1, f'./models/{checkpoint_name1}')
        
        acc2, epoch_test_loss2, num_correct2, num_wrong2 = test_model(model2, dataloader, m, criterion2)
        checkpoint_name2 = f'{name2}_m{m}_n{n}_checkpoint{t}.pt'
        checkpoint_data2.append([t, checkpoint_name2, acc2, epoch_test_loss2, num_correct2, num_wrong2])
        print(f'{name2}: Epoch: {t}/{epochs+era2epochs}, Test Loss: {epoch_test_loss2}, Incorrect: {num_wrong2}/{dataset_size} examples')
        torch.save(model2, f'./models/{checkpoint_name2}')
        
print("END OF ERA 2")
print("DONE!")

lionel: Epoch: 100/1500, Test Loss: 0.18643927574157715, Incorrect: 1000/1000 examples
melissa: Epoch: 100/1500, Test Loss: 0.1789475381374359, Incorrect: 1000/1000 examples
lionel: Epoch: 200/1500, Test Loss: 0.17555750906467438, Incorrect: 1000/1000 examples
melissa: Epoch: 200/1500, Test Loss: 0.1678985357284546, Incorrect: 1000/1000 examples
lionel: Epoch: 300/1500, Test Loss: 0.16865594685077667, Incorrect: 1000/1000 examples
melissa: Epoch: 300/1500, Test Loss: 0.16071560978889465, Incorrect: 1000/1000 examples
lionel: Epoch: 400/1500, Test Loss: 0.16200457513332367, Incorrect: 1000/1000 examples
melissa: Epoch: 400/1500, Test Loss: 0.15792113542556763, Incorrect: 1000/1000 examples
lionel: Epoch: 500/1500, Test Loss: 0.14897002279758453, Incorrect: 1000/1000 examples
melissa: Epoch: 500/1500, Test Loss: 0.14863507449626923, Incorrect: 1000/1000 examples
lionel: Epoch: 600/1500, Test Loss: 0.1362263560295105, Incorrect: 1000/1000 examples
melissa: Epoch: 600/1500, Test Loss: 0.13
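
The training loop saves each checkpoint by passing the whole model object to `torch.save`, which pickles it and requires the model class to be importable when the file is loaded. A common alternative is to save only the `state_dict`. The sketch below illustrates that pattern with a small `nn.Conv2d` standing in for `OPNet`; `save_checkpoint`, `load_checkpoint`, and the file name are hypothetical.

```python
import tempfile
from pathlib import Path

import torch
import torch.nn as nn


def save_checkpoint(model: nn.Module, path: Path) -> None:
    """Save only the learned parameters, not the pickled model object."""
    torch.save(model.state_dict(), path)


def load_checkpoint(model: nn.Module, path: Path) -> nn.Module:
    """Load saved parameters into a freshly constructed model of the same shape."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    return model


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_checkpoint.pt"
    original = nn.Conv2d(1, 2, kernel_size=3, padding=1)  # stand-in for OPNet(m, n)
    save_checkpoint(original, path)
    restored = load_checkpoint(nn.Conv2d(1, 2, kernel_size=3, padding=1), path)
    weights_match = all(
        torch.equal(a, b)
        for a, b in zip(original.state_dict().values(), restored.state_dict().values())
    )
```

The trade-off is that loading then requires constructing the model first (for this notebook, presumably `OPNet(m, n)` with the same `m` and `n`), which is why the checkpoint names already encode both values.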

In [7]:
df_full_data1 = pd.DataFrame(full_data1, columns=['epoch', 'training_loss'])
df_full_data2 = pd.DataFrame(full_data2, columns=['epoch', 'training_loss'])

df_checkpoint_data1 = pd.DataFrame(checkpoint_data1, columns=['epoch', 'checkpoint_name', 'accuracy', 'test_loss', 'num_correct', 'num_wrong'])
df_checkpoint_data2 = pd.DataFrame(checkpoint_data2, columns=['epoch', 'checkpoint_name', 'accuracy', 'test_loss', 'num_correct', 'num_wrong'])

In [8]:
df_full_data1.to_csv(f'./logs/{name1}_full_data.csv')
df_full_data2.to_csv(f'./logs/{name2}_full_data.csv')

df_checkpoint_data1.to_csv(f'./logs/{name1}_checkpoint_data.csv')
df_checkpoint_data2.to_csv(f'./logs/{name2}_checkpoint_data.csv')
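
By default `to_csv` also writes the DataFrame index as an unnamed first column, so reading a log back needs `index_col=0` to recover the original columns. The sketch below shows the round trip using an in-memory buffer and made-up loss values rather than a real log file.

```python
import io

import pandas as pd

# Made-up rows in the same layout as full_data1: [epoch, training_loss]
rows = [[1, 0.25], [2, 0.2]]
df = pd.DataFrame(rows, columns=["epoch", "training_loss"])

buffer = io.StringIO()
df.to_csv(buffer)  # the index is written as an unnamed first column
buffer.seek(0)

# index_col=0 turns that first column back into the index on load.
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

Passing `index=False` to `to_csv` is the other option if the index column is not wanted in the log files.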