In [None]:
import numpy as np
import torch
import time
import tqdm
import pickle

# Load the move dictionary created during data processing
with open("./model/move_dictionary.p", "rb") as move_dictionary_file:
    MOVE_DICTIONARY = pickle.load(move_dictionary_file)

# Initialize Dataset and Dataloader
## Dataset
Our dataset consists of around 10M moves played by humans with a Lichess Elo rating of at least 2100, to ensure quality moves. It is pre-processed in the file `data_processing.ipynb`.
The dataset is stored in CSV files with just two columns: `bitmaps`, which represents the game state, and `move played`, which represents the move played by a human in that game state.

Because our dataset contains a very large number of examples, we load it in chunks of `NUM_EXAMPLES_TO_LOAD_PER_FETCH`, normally 640k examples at a time.
This is necessary because fetching 1.2M examples uses around 5 GB of RAM and, since training ran on a GPU with 6 GB of VRAM, we couldn't load the entire dataset at once.

Each loaded chunk is shuffled to prevent any inherent order or pattern in the data from affecting training, especially since our data consists of moves from games that are read in order.
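
The chunk-then-shuffle idea described above can be sketched as follows. This is a minimal illustration and not the actual `ChessEvalDataset` implementation: the function name `iter_shuffled_chunks`, the `seed` parameter, and the in-memory demo CSV are all assumptions made for the example.

```python
import csv
import io
import random
from typing import Iterator, List, Tuple


def iter_shuffled_chunks(
    csv_file, chunk_size: int, seed: int = 0
) -> Iterator[List[Tuple[str, str]]]:
    """Yield lists of (bitmaps, move) rows, at most chunk_size rows each,
    shuffled within each chunk. Hypothetical helper, for illustration only."""
    rng = random.Random(seed)
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed to be present)
    chunk: List[Tuple[str, str]] = []
    for row in reader:
        chunk.append((row[0], row[1]))
        if len(chunk) == chunk_size:
            rng.shuffle(chunk)  # break up the in-game move order
            yield chunk
            chunk = []
    if chunk:  # last, possibly smaller, chunk
        rng.shuffle(chunk)
        yield chunk


# Tiny in-memory stand-in for the real CSV (values are placeholders)
demo_csv = io.StringIO(
    "bitmaps,movePlayed\n" + "".join(f"b{i},m{i}\n" for i in range(10))
)
chunks = list(iter_shuffled_chunks(demo_csv, chunk_size=4))
print([len(c) for c in chunks])  # [4, 4, 2]
```

Only one chunk is held in memory at a time, so peak RAM is bounded by the chunk size rather than by the size of the whole file. Shuffling happens within a chunk only, so examples from the same stretch of the file still end up in the same chunk.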

## Dataloader
The dataloader fetches data from our dataset and divides the examples into batches of `TRAINING_BATCH_SIZE` elements.

In [2]:
from torch.utils.data import DataLoader
from ChessDataset import ChessEvalDataset

DATASET_PATH = '../dataset/processed/test_elite/results_black.csv'
## !IMPORTANT: This dictates how much RAM will be used and how much data will be loaded per fetch
# 640_000 loads around 5 GB; don't push this too high, as the process will crash if RAM is depleted
# NUM_EXAMPLES_TO_LOAD_PER_FETCH = 1_280_000
NUM_EXAMPLES_TO_LOAD_PER_FETCH = 320_000
TRAINING_BATCH_SIZE = 64

HEADERS = ("bitmaps", "movePlayed")
dataset = ChessEvalDataset(
    file = DATASET_PATH, 
    validation_size = 25_000,
    load_batch_size = NUM_EXAMPLES_TO_LOAD_PER_FETCH,
    headers = HEADERS
)
dataloader = DataLoader(dataset, batch_size=TRAINING_BATCH_SIZE, shuffle=False) # Shuffling is done manually inside the dataset

# Model
We use a model saved in the file `model.py`, located at `./model/models/architecture_2Conv/model.py` (this gave us a versioning system for our models).

The final model consists of two convolutional layers and two fully connected layers. It receives an 8x8x12 tensor that represents the game state (8x8 squares, with 6 white piece types and 6 black piece types).
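
To make the 8x8x12 input concrete, here is a minimal sketch of one way to build such a tensor from a position. It assumes a FEN string as input and the channel order `PNBRQKpnbrqk`; both are assumptions for illustration, and the actual encoding produced by `data_processing.ipynb` may differ.

```python
from typing import List

# Assumed channel order: 6 white piece types, then 6 black piece types
PIECES = "PNBRQKpnbrqk"


def fen_to_planes(fen: str) -> List[List[List[int]]]:
    """Return an 8x8x12 nested list where planes[rank][file][channel] is 1
    if the piece for that channel stands on that square, else 0.
    Rank index 0 is the 8th rank, following FEN order."""
    planes = [[[0] * 12 for _ in range(8)] for _ in range(8)]
    placement = fen.split()[0]  # piece-placement field only
    for rank_idx, rank in enumerate(placement.split("/")):
        file_idx = 0
        for ch in rank:
            if ch.isdigit():
                file_idx += int(ch)  # a run of empty squares
            else:
                planes[rank_idx][file_idx][PIECES.index(ch)] = 1
                file_idx += 1
    return planes


start = fen_to_planes("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
total = sum(v for rank in start for square in rank for v in square)
print(total)  # 32 pieces on the starting board
```

Each square has at most one active channel, so the tensor is a one-hot encoding of the piece on every square.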

It has 1800 outputs, each representing a move (the moves played in our dataset).
We didn't use all possible moves (64x63 moves) because not all of them are represented in our dataset, and with the full move set our model wasn't converging to acceptable values (40% validation accuracy).
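
Restricting the output layer to moves that actually occur amounts to building a lookup from each observed move to a class index. The sketch below shows the idea. The helper `build_move_dictionary` and the UCI-style move strings are hypothetical, and the real structure of `MOVE_DICTIONARY` may differ.

```python
from typing import Dict, Iterable


def build_move_dictionary(moves: Iterable[str]) -> Dict[str, int]:
    """Map each distinct move string to a class index, in order of first
    appearance. Hypothetical helper, for illustration only."""
    move_to_index: Dict[str, int] = {}
    for move in moves:
        if move not in move_to_index:
            move_to_index[move] = len(move_to_index)
    return move_to_index


played = ["e2e4", "e7e5", "g1f3", "e2e4", "b8c6"]
move_dictionary = build_move_dictionary(played)
print(len(move_dictionary))  # 4 distinct moves, so 4 output classes
print(64 * 63)               # 4032 ordered (from, to) square pairs
```

The number of output classes is then `len(move_dictionary)`, which matches how the notebook sizes the network with `len(MOVE_DICTIONARY)`.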

In [None]:
import gc

# from models.architecture_batchnorm_2Conv.model import CompleteChessBotNetwork
from model.architecture_2Conv_classes.model import ChessModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ChessModel(len(MOVE_DICTIONARY), kernel_size=7).to(device)

# Training Loop
Our training loop, written in pseudocode:
```
For each epoch:
    For batch in dataloader:
        bitmaps, expected_moves = batch
        optimizer.zero_grad()
        predictions = model(bitmaps)

        loss = CrossEntropyLoss(predictions, expected_moves)
        loss.backward()
        clip_gradient_norm(model, max_norm=1.0)
        optimizer.step()

    scheduler.step()

    validation_dataset = get_validation_dataset()
    loss, accuracy = evaluate_accuracy(validation_dataset)

    if epoch % 5 == 0:
        save_model(model)
```
We also save the weights of our model every 5 epochs.

## Optimizer and Scheduler
We use the Adam (Adaptive Moment Estimation) optimizer, which adjusts learning rates during training. It works well with large datasets and complex models because it uses memory efficiently and adapts the learning rate for each parameter automatically. A `StepLR` scheduler then halves the base learning rate every 30 epochs.
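
The training cell in this notebook pairs Adam (`lr=1e-4`) with `StepLR(step_size=30, gamma=0.5)`. As a rough sketch of what that schedule does, the helper below (a hypothetical function, not part of PyTorch) reproduces the closed-form rule that `StepLR` follows: the base learning rate is multiplied by `gamma` once every `step_size` scheduler steps.

```python
def step_lr(base_lr: float, steps_done: int, step_size: int = 30, gamma: float = 0.5) -> float:
    """Learning rate in effect after `steps_done` calls to scheduler.step(),
    following the StepLR rule base_lr * gamma ** (steps_done // step_size)."""
    return base_lr * gamma ** (steps_done // step_size)


# Print the learning rate at a few points of a 60-epoch run
for steps_done in (0, 29, 30, 59, 60):
    print(steps_done, step_lr(1e-4, steps_done))
```

With one `scheduler.step()` per epoch, epochs 1 to 30 train at the base rate and epochs 31 to 60 at half of it.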

## Loss Function
Since this is a classification problem, we use cross-entropy loss to calculate the loss of each batch.
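
For a single example, cross-entropy is the negative log of the probability that the softmax assigns to the move actually played. The sketch below computes it in plain Python to make the scale of the loss values easier to read. The function `cross_entropy` here is an illustrative re-implementation, not the `torch.nn.CrossEntropyLoss` code.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the target class under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


# A model that spreads probability evenly over 1800 moves
uniform_loss = cross_entropy([0.0] * 1800, target=0)
print(uniform_loss)  # equals ln(1800), roughly 7.5

# Converting an average loss back into a probability
print(math.exp(-1.65))  # roughly 0.19
```

A uniform guess over 1800 classes therefore scores about 7.5, and an average loss of 1.65 corresponds to a geometric-mean probability of roughly 0.19 on the correct move.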

## Model Accuracy Evaluation
To evaluate our model, we extract a validation set at the beginning (`validation_size = 25_000` examples in this notebook's configuration) that is never used in the training phase, which allows us to see how well our model generalizes.
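
The accuracy reported on held-out data is top-1 accuracy: a prediction counts as correct only when the highest-scoring class matches the move that was played. Below is a minimal plain-Python sketch of that computation. The function name `top1_accuracy` and the toy logits are assumptions for illustration, and the notebook itself does the equivalent with `pred.argmax(dim=1)` on tensors.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Percentage of rows whose highest-scoring class equals the target."""
    correct = 0
    for row, target in zip(logits, targets):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        if predicted == target:
            correct += 1
    return 100.0 * correct / len(targets)


toy_logits = [
    [0.1, 2.0, 0.3],  # predicts class 1
    [1.5, 0.2, 0.1],  # predicts class 0
    [0.0, 0.1, 0.9],  # predicts class 2
    [0.3, 0.2, 0.1],  # predicts class 0
]
toy_targets = [1, 0, 1, 0]
print(top1_accuracy(toy_logits, toy_targets))  # 75.0
```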

In [None]:
NUM_EPOCHS = 60
OUTPUT_PATH = './model/architecture_2Conv_classes/blackOnly_7kernel'

# Continue with pretrained weights
# model.load_state_dict(torch.load("./model/architecture_2Conv_classes/blackOnly_7kernel/epoch-20.pth"))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)
loss_fn = torch.nn.CrossEntropyLoss()


print(torch.cuda.is_available())
print("Using device: ", device)
with open(f"{OUTPUT_PATH}/training.log", "a+") as f:
    for epoch in range(1, NUM_EPOCHS+1):
        model.train()
        t0 = time.time()
        avg_loss = 0.0
        correct = 0
        for board_tensor, target_eval in (pbar := tqdm.tqdm(dataloader)):        
            board_tensor, target_eval = board_tensor.to(device), target_eval.to(device)  # Move data to GPU
            optimizer.zero_grad()
            pred = model(board_tensor)

            # Compute the cross-entropy loss against the move that was actually played
            loss = loss_fn(pred, target_eval.squeeze(1))
            loss.backward()
            
            # Gradient clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            
            optimizer.step()
            avg_loss += loss.item()

            batch_correct = (pred.argmax(dim=1) == target_eval[:, 0]).sum().item()
            correct += batch_correct
            pbar.set_description(f"Batch Accuracy: {batch_correct*100 / target_eval.size(0):.2f}%")  # size(0) handles a smaller final batch
        scheduler.step()
        
        # Validation set
        model.eval()
        validation_features, validation_targets = dataset.get_validation_set()
        validation_features = validation_features.to(device)
        validation_targets = validation_targets.to(device)
        with torch.no_grad():
            pred = model(validation_features)
            validation_set_loss = loss_fn(pred, validation_targets.squeeze(1))
            validation_set_correct = (pred.argmax(dim=1) == validation_targets[:, 0]).sum().item()
            validation_set_accuracy = 100 * validation_set_correct / len(validation_targets)
            pred = pred.cpu()
            validation_features = validation_features.cpu()
            validation_targets = validation_targets.cpu()

        accuracy = 100 * correct / (len(dataloader) * TRAINING_BATCH_SIZE)
        tf = time.time()
        f.write(f"Epoch {epoch} - {avg_loss / len(dataloader):.4f} | Training Accuracy: {accuracy:.2f}%| Time: {tf-t0}\n")
        f.write(f"Validation set - accuracy: {validation_set_accuracy:.2f}% | loss: {validation_set_loss:.4f}\n\n")
        f.flush()
        print(f"Epoch {epoch} - {avg_loss / len(dataloader):.4f} | Training Accuracy: {accuracy:.2f}%| Time: {tf-t0}")
        print(f"Validation set - accuracy: {validation_set_accuracy:.2f}% | loss: {validation_set_loss:.4f}\n")

        # Free GPU memory
        del validation_features, validation_targets, validation_set_loss, validation_set_accuracy
        gc.collect()
        torch.cuda.empty_cache()
        
        if epoch % 5 == 0:
            torch.save(model.state_dict(), f"{OUTPUT_PATH}/epoch-{epoch}.pth")

True
Using device:  cuda


Batch Accuracy: 42.19%: 100%|██████████| 157405/157405 [18:48<00:00, 139.49it/s]


Epoch 21 - 1.6497 | Training Accuracy: 49.05%| Time: 1129.2576994895935
Validation set - accuracy: 41.93% | loss: 1.9877



Batch Accuracy: 46.88%: 100%|██████████| 157405/157405 [18:27<00:00, 142.15it/s]


Epoch 22 - 1.6450 | Training Accuracy: 49.18%| Time: 1108.1505796909332
Validation set - accuracy: 42.02% | loss: 1.9964



Batch Accuracy: 46.88%: 100%|██████████| 157405/157405 [18:35<00:00, 141.17it/s]


Epoch 23 - 1.6408 | Training Accuracy: 49.30%| Time: 1115.7941944599152
Validation set - accuracy: 42.02% | loss: 1.9888



Batch Accuracy: 48.44%: 100%|██████████| 157405/157405 [18:07<00:00, 144.75it/s]


Epoch 24 - 1.6369 | Training Accuracy: 49.41%| Time: 1088.2141144275665
Validation set - accuracy: 42.11% | loss: 1.9917



Batch Accuracy: 39.06%: 100%|██████████| 157405/157405 [18:02<00:00, 145.45it/s]


Epoch 25 - 1.6333 | Training Accuracy: 49.49%| Time: 1082.9902493953705
Validation set - accuracy: 42.28% | loss: 1.9992



Batch Accuracy: 51.56%: 100%|██████████| 157405/157405 [17:57<00:00, 146.12it/s]


Epoch 26 - 1.6297 | Training Accuracy: 49.60%| Time: 1077.987300157547
Validation set - accuracy: 41.84% | loss: 1.9976



Batch Accuracy: 50.00%: 100%|██████████| 157405/157405 [19:18<00:00, 135.88it/s] 


Epoch 27 - 1.6267 | Training Accuracy: 49.67%| Time: 1159.1764605045319
Validation set - accuracy: 42.04% | loss: 2.0029



Batch Accuracy: 45.31%: 100%|██████████| 157405/157405 [17:46<00:00, 147.58it/s]


Epoch 28 - 1.6237 | Training Accuracy: 49.75%| Time: 1067.38348031044
Validation set - accuracy: 41.85% | loss: 2.0169



Batch Accuracy: 42.19%: 100%|██████████| 157405/157405 [17:43<00:00, 148.00it/s]


Epoch 29 - 1.6210 | Training Accuracy: 49.83%| Time: 1064.3305485248566
Validation set - accuracy: 42.02% | loss: 2.0055



Batch Accuracy: 62.50%: 100%|██████████| 157405/157405 [18:45<00:00, 139.83it/s] 


Epoch 30 - 1.6184 | Training Accuracy: 49.89%| Time: 1126.4589579105377
Validation set - accuracy: 41.87% | loss: 2.0168



Batch Accuracy: 42.19%: 100%|██████████| 157405/157405 [20:14<00:00, 129.59it/s] 


Epoch 31 - 1.6161 | Training Accuracy: 49.95%| Time: 1215.4824821949005
Validation set - accuracy: 42.14% | loss: 2.0055



Batch Accuracy: 37.50%:  70%|██████▉   | 109763/157405 [13:56<06:12, 127.95it/s] 