**It is recommended to run this notebook on GPU.**

# Initializations

Instructions:

1. Put the files in your Google Drive.
2. Set the **`PATH`** variable to the path of the project folder in your Drive.
3. Run the cells to train the model.

In [None]:
PATH = "PATH_TO_Blockudoku-ai_FOLDER"  # path to the "Blockudoku-ai" folder in Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
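import sys
sys.path.append(PATH)  # make the project modules importable
import importlib
import PolicyGradientAgent
import Engine
# reload so that edits to the modules are picked up without restarting the runtime
importlib.reload(PolicyGradientAgent)
importlib.reload(Engine)
csv_path = None  # stays None unless the optional CSV cell below is run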

pygame 2.6.0 (SDL 2.28.4, Python 3.10.12)
Hello from the pygame community. https://www.pygame.org/contribute.html


# Pretraining Uniform Policy Network
In this section, you can pretrain the policy network to learn which actions are valid in each state.
The training objective is to minimize the KL divergence between the network's output and the uniform distribution across all valid actions in each state.
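
To make the objective concrete, here is a minimal sketch in plain Python. It is not the code in `PolicyGradientUniform`: the function name `kl_uniform_to_policy`, the boolean mask representation, and the direction of the divergence (uniform target relative to the policy) are assumptions made for illustration.

```python
import math

def kl_uniform_to_policy(policy_probs, valid_mask, eps=1e-12):
    """KL(U || pi), where U is uniform over the valid actions (assumed direction).

    policy_probs: probabilities output by the network, one per action.
    valid_mask:   booleans, True where the action is valid in this state.
    """
    n_valid = sum(valid_mask)
    if n_valid == 0:
        raise ValueError("state has no valid actions")
    target = 1.0 / n_valid  # uniform probability of each valid action
    # Invalid actions have target probability 0 and contribute nothing.
    return sum(
        target * math.log(target / max(p, eps))
        for p, valid in zip(policy_probs, valid_mask)
        if valid
    )

mask = [True, True, False, False]
# Policy already uniform over the two valid actions: the divergence is zero.
print(kl_uniform_to_policy([0.5, 0.5, 0.0, 0.0], mask))
# Half of the probability mass sits on invalid actions: the divergence is log(2).
print(kl_uniform_to_policy([0.25, 0.25, 0.25, 0.25], mask))
```

Under this formulation the loss reaches zero only when the network puts equal probability on every valid action and none on invalid ones.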

In [None]:
import importlib
import PolicyGradientUniform
import Engine
importlib.reload(PolicyGradientUniform)
importlib.reload(Engine)

In [None]:
PRETRAIN_PATH = f"{PATH}/checkpoints/pg_unif/pg_unif_.pth"
game = Engine.Blockudoku()
agent = PolicyGradientUniform.PGUniformAgent(game)
agent.train(100000000, 100, save=True, save_path=PRETRAIN_PATH, lr=0.00001)

# Training Policy Gradient Agent

Optional: create a CSV file to hold the training records.

In [None]:
import csv
csv_path = f"{PATH}/checkpoints/records/new_train.csv"

header = ["batch", "steps", "invalids", "reward", "score"]
# Open the file in write mode ('w'), which overwrites any existing file
with open(csv_path, 'w', newline='') as file:
  writer = csv.writer(file)
  writer.writerow(header)  # Write the column names as the first row
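
Once training has written some rows, the records can be read back for inspection. The sketch below assumes each row follows the header's column order and holds numeric values; the helper `load_records` and the demo rows are hypothetical and are not real training results.

```python
import csv
import os
import tempfile

def load_records(path):
    """Return the training records as a list of dicts with float values."""
    with open(path, newline="") as file:
        return [{key: float(value) for key, value in row.items()}
                for row in csv.DictReader(file)]

# Demo on a temporary file with made-up rows (not real training results).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_train.csv")
with open(demo_path, "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["batch", "steps", "invalids", "reward", "score"])
    writer.writerow([0, 120, 30, -4.5, 18])
    writer.writerow([1, 150, 12, 2.0, 27])

records = load_records(demo_path)
best = max(records, key=lambda r: r["score"])
print(f"{len(records)} batches logged, best score {best['score']:.0f} in batch {best['batch']:.0f}")
```

To inspect an actual run, call `load_records(csv_path)` with the path set in the cell above.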

Load the agent

In [None]:
game = Engine.Blockudoku()
agent = PolicyGradientAgent.PolicyGradientAgent(game)
# load the pretrained model
agent.load_model(PRETRAIN_PATH)  # comment out this line to train the model directly, without the pretraining phase

Option 1: Train the agent with a gradually increasing discount factor.

In [None]:
gamma_stepsize = 0.1
agent.set_epsilon(0.2)

for lesson in range(5, 20):
  discount_factor = gamma_stepsize * lesson
  if discount_factor >= 0.99: break
  print(f"Training with discount factor: {discount_factor:.1f}")  # .1f hides floating-point noise such as 0.6000000000000001
  agent.set_gamma(discount_factor)
  # agent.train(20001, 100, render=False, save=True,
  agent.train(50001, 100, render=False, save=True,
              save_path=f"{PATH}/checkpoints/new_train/pg_model_lesson_{lesson}.pth",
              lr=0.00001,
              csv_path=csv_path)
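
To see which lessons the loop above actually runs, the schedule can be computed without training anything. The helper `gamma_schedule` is a hypothetical name used only for this sketch; it mirrors the loop's range, step size, and stopping threshold.

```python
def gamma_schedule(stepsize=0.1, first=5, last=20, cap=0.99):
    """List the (lesson, discount factor) pairs the loop would train on."""
    schedule = []
    for lesson in range(first, last):
        gamma = stepsize * lesson
        if gamma >= cap:
            break  # same stopping rule as the training loop
        schedule.append((lesson, round(gamma, 2)))  # rounded for display only
    return schedule

print(gamma_schedule())
# [(5, 0.5), (6, 0.6), (7, 0.7), (8, 0.8), (9, 0.9)]
```

With a step size of 0.1, the loop stops at lesson 10 (discount factor 1.0), so only five lessons run even though the range extends to 19.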

Option 2: Train the agent using a constant discount factor.

In [None]:
agent.set_epsilon(0.2)
agent.set_gamma(0.85)

agent.train(10000000, 100,
            render=False, save=True,
            save_path=f"{PATH}/checkpoints/new_train/pg_model_.pth",
            lr=0.00001,
            csv_path=csv_path)
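
Both options set the discount factor through `set_gamma`. As a reminder of what that value controls, the sketch below computes discounted returns for a short episode. It assumes the agent uses the standard discounted-return definition; the notebook does not show the agent's internals, and `discounted_returns` is a hypothetical helper.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):  # accumulate from the last step backwards
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

rewards = [0.0, 0.0, 1.0]  # illustrative episode: reward only at the final step
print(discounted_returns(rewards, 0.5))   # [0.25, 0.5, 1.0]
print(discounted_returns(rewards, 0.85))  # earlier steps receive more credit
```

A larger discount factor propagates more of a late reward back to earlier actions, which is why a low value gives a more short-sighted learning signal.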