# **EECS 492 HW4**

# **Setup**

When you run this cell, grant the script permission so that it can mount your Google Drive in this Colab notebook:

In [2]:
from google.colab import drive
from google.colab import runtime
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [3]:
%cd gdrive/MyDrive/EECS492HW4Coding/alpha-zero-general/

/content/gdrive/MyDrive/EECS492HW4Coding/alpha-zero-general


Clone the EECS 492 Git repository to get the files needed for this assignment. To see where it lands in the Colab file system, click the folder icon on the left side of the notebook.

In [None]:
!git clone https://github.com/saumit01/eecs492-hw3.git


Run the following cell to install the Python packages listed in `requirements.txt`.

In [5]:
!pip3 install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting absl-py==1.1.0
  Downloading absl_py-1.1.0-py3-none-any.whl (123 kB)
     |████████████████████████████████| 123 kB 8.9 MB/s
Collecting cachetools==5.2.0
  Downloading cachetools-5.2.0-py3-none-any.whl (9.3 kB)
Collecting certifi==2022.5.18.1
  Downloading certifi-2022.5.18.1-py3-none-any.whl (155 kB)
     |████████████████████████████████| 155 kB 47.4 MB/s
Collecting charset-normalizer==2.0.12
  Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Collecting coloredlogs==15.0.1
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
     |████████████████████████████████| 46 kB 4.3 MB/s
Collecting google-auth==2.8.0
  Downloading google_auth-2.8.0-py2.py3-none-any.whl (164 kB)
     |████████████████████████████████| 164 kB 58.3 MB/s
Collecting grpcio==1.46.3
  Downloading grpcio-1.46.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.

This cell imports the necessary libraries and sets a random seed so that the algorithm runs deterministically for grading.

In [4]:
import logging

import coloredlogs

from Coach import Coach
from othello.OthelloGame import OthelloGame as Game
from othello.pytorch.NNet import NNetWrapper as nn
from utils import *

import torch
import random
import numpy as np

seed = 492
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)
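The seeding above can be wrapped in a single helper so it is easy to re-run before each experiment. This is a minimal sketch, not part of the assignment code: `seed_everything` is a hypothetical name, NumPy and PyTorch are seeded only if they are installed, and the two `torch.backends.cudnn` flags are an extra step the notebook does not take (even with them, some GPU operations may remain nondeterministic).

```python
import random


def seed_everything(seed: int, deterministic_cudnn: bool = True) -> None:
    """Seed Python's, NumPy's, and (if installed) PyTorch's RNGs with one value."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:  # NumPy not installed: nothing to seed
        pass
    try:
        import torch
    except ImportError:  # PyTorch not installed: nothing more to seed
        return
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    if deterministic_cudnn:
        # Extra step not in the notebook: prefer deterministic cuDNN kernels (may be slower).
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


seed_everything(492)
first = [random.random() for _ in range(3)]
seed_everything(492)
second = [random.random() for _ in range(3)]
print(first == second)  # True: same seed, same sequence
```

Re-seeding with the same value replays the same random sequence, which is what makes the graded `actionsTaken` output reproducible on the same hardware and library versions.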

This cell specifies the hyperparameters used by the algorithm. Don't modify them until you have received full points on Gradescope. After that, feel free to tune them (this is completely optional)!

In [5]:
log = logging.getLogger(__name__)

coloredlogs.install(level='INFO')  # Change this to DEBUG to see more info.

# the only hyperparameters I modified from the original are numIters, numEps, and arenaCompare
# feel free to tweak them once your code is working
# original values for reference: numIters = 1000, numEps = 100, arenaCompare = 25
args = dotdict({
    'numIters': 2,
    'numEps': 2,              # Number of complete self-play games to simulate during a new iteration.
    'tempThreshold': 15,
    'updateThreshold': 0.6,     # During arena play, the new neural net is accepted if it wins at least this fraction of the games.
    'maxlenOfQueue': 200000,    # Maximum number of game examples kept per iteration for training the neural network.
    'numMCTSSims': 10,          # Number of MCTS simulations to run for each move.
    'arenaCompare': 15,         # Number of games to play during arena play to determine if new net will be accepted.
    'cpuct': 1,

    'checkpoint': './temp/',
    'load_model': False,
    'load_folder_file': ('/dev/models/8x100x50','best.pth.tar'),
    'numItersForTrainExamplesHistory': 20,

})
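The cell reads settings as `args.numIters` rather than `args['numIters']`. `dotdict` comes from the repository's `utils` module, which this notebook does not show; the sketch below assumes it is a `dict` subclass that forwards attribute lookups to keys. `DotDict` is a hypothetical stand-in, not the repository's class.

```python
class DotDict(dict):
    """A dict whose keys can also be read as attributes (hypothetical stand-in for utils.dotdict)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            # Raise AttributeError so hasattr() and getattr(obj, name, default) behave normally.
            raise AttributeError(name) from exc


args = DotDict({'numIters': 2, 'numEps': 2, 'arenaCompare': 15, 'load_model': False})
print(args.numIters, args['numEps'])  # prints: 2 2
```

Raising `AttributeError` instead of letting `KeyError` escape keeps the object well-behaved with Python's attribute protocol, so a typo such as `args.numIter` fails with a clear error.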

This cell initializes all the objects needed to start training.

In [6]:
log.info('Loading %s...', Game.__name__)
g = Game(6)

log.info('Loading %s...', nn.__name__)
nnet = nn(g)

if args.load_model:
    log.info('Loading checkpoint "%s/%s"...', args.load_folder_file[0], args.load_folder_file[1])
    nnet.load_checkpoint(args.load_folder_file[0], args.load_folder_file[1])
else:
    log.warning('Not loading a checkpoint!')

log.info('Loading the Coach...')
c = Coach(g, nnet, args)

if args.load_model:
    log.info("Loading 'trainExamples' from file...")
    c.loadTrainExamples()

2022-10-26 02:40:48 b7321da9e726 __main__[1155] INFO Loading OthelloGame...
2022-10-26 02:40:48 b7321da9e726 __main__[1155] INFO Loading NNetWrapper...
2022-10-26 02:40:48 b7321da9e726 __main__[1155] INFO Loading the Coach...


This cell starts the training process.
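In the training log, Iter #2 runs 16 training batches per epoch versus 8 in Iter #1, which is consistent with the Coach keeping self-play examples from earlier iterations rather than discarding them. The sketch below shows one way such a sliding window can work, assuming (as in upstream alpha-zero-general) that each iteration's examples go into a `deque` capped at `maxlenOfQueue` and that at most `numItersForTrainExamplesHistory` iterations are retained. `update_history` and `training_set` are hypothetical names, not the Coach's API.

```python
from collections import deque


def update_history(history: list, iteration_examples: deque, max_iters: int) -> None:
    """Append one iteration's examples; drop the oldest iteration once the window is full."""
    history.append(iteration_examples)
    if len(history) > max_iters:
        history.pop(0)


def training_set(history: list) -> list:
    """Flatten every retained iteration into a single list of training examples."""
    return [example for iteration in history for example in iteration]


max_len_of_queue = 200000   # mirrors args.maxlenOfQueue
max_iters = 20              # mirrors args.numItersForTrainExamplesHistory

history = []
for it in range(1, 4):
    # Stand-in for self-play: each "example" is just an (iteration, index) tag.
    iteration_examples = deque(((it, i) for i in range(5)), maxlen=max_len_of_queue)
    update_history(history, iteration_examples, max_iters)
    print(it, len(training_set(history)))  # 1 5, then 2 10, then 3 15
```

Because the window holds whole iterations, the training set grows for the first `numItersForTrainExamplesHistory` iterations and then stays roughly constant in size.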

In [7]:
log.info('Starting the learning process 🎉')
c.learn()

2022-10-26 02:40:51 b7321da9e726 __main__[1155] INFO Starting the learning process 🎉
2022-10-26 02:40:51 b7321da9e726 Coach[1155] INFO Starting Iter #1 ...
Self Play: 100%|██████████| 2/2 [00:07<00:00,  3.79s/it]


Checkpoint Directory exists! 
EPOCH ::: 1


Training Net: 100%|██████████| 8/8 [00:10<00:00,  1.25s/it, Loss_pi=3.42e+00, Loss_v=9.08e-01]


EPOCH ::: 2


Training Net: 100%|██████████| 8/8 [00:09<00:00,  1.21s/it, Loss_pi=3.02e+00, Loss_v=6.25e-01]


EPOCH ::: 3


Training Net: 100%|██████████| 8/8 [00:11<00:00,  1.45s/it, Loss_pi=2.76e+00, Loss_v=4.54e-01]


EPOCH ::: 4


Training Net: 100%|██████████| 8/8 [00:10<00:00,  1.28s/it, Loss_pi=2.60e+00, Loss_v=3.43e-01]


EPOCH ::: 5


Training Net: 100%|██████████| 8/8 [00:09<00:00,  1.22s/it, Loss_pi=2.39e+00, Loss_v=3.13e-01]


EPOCH ::: 6


Training Net: 100%|██████████| 8/8 [00:09<00:00,  1.24s/it, Loss_pi=2.28e+00, Loss_v=2.17e-01]


EPOCH ::: 7


Training Net: 100%|██████████| 8/8 [00:13<00:00,  1.63s/it, Loss_pi=2.11e+00, Loss_v=1.43e-01]


EPOCH ::: 8


Training Net: 100%|██████████| 8/8 [00:11<00:00,  1.44s/it, Loss_pi=1.95e+00, Loss_v=2.06e-01]


EPOCH ::: 9


Training Net: 100%|██████████| 8/8 [00:09<00:00,  1.22s/it, Loss_pi=1.79e+00, Loss_v=1.67e-01]


EPOCH ::: 10


Training Net: 100%|██████████| 8/8 [00:09<00:00,  1.22s/it, Loss_pi=1.75e+00, Loss_v=1.57e-01]
2022-10-26 02:42:45 b7321da9e726 Coach[1155] INFO PITTING AGAINST PREVIOUS VERSION
Arena.playGames (1): 100%|██████████| 7/7 [00:25<00:00,  3.58s/it]
Arena.playGames (2): 100%|██████████| 7/7 [00:24<00:00,  3.55s/it]
2022-10-26 02:43:35 b7321da9e726 Coach[1155] INFO NEW/PREV WINS : 7 / 7 ; DRAWS : 0
2022-10-26 02:43:35 b7321da9e726 Coach[1155] INFO REJECTING NEW MODEL
2022-10-26 02:43:35 b7321da9e726 Coach[1155] INFO Starting Iter #2 ...
Self Play: 100%|██████████| 2/2 [00:07<00:00,  3.85s/it]


Checkpoint Directory exists! 
EPOCH ::: 1


Training Net: 100%|██████████| 16/16 [00:19<00:00,  1.23s/it, Loss_pi=3.31e+00, Loss_v=1.23e+00]


EPOCH ::: 2


Training Net: 100%|██████████| 16/16 [00:20<00:00,  1.27s/it, Loss_pi=2.92e+00, Loss_v=1.06e+00]


EPOCH ::: 3


Training Net: 100%|██████████| 16/16 [00:19<00:00,  1.22s/it, Loss_pi=2.73e+00, Loss_v=8.11e-01]


EPOCH ::: 4


Training Net: 100%|██████████| 16/16 [00:20<00:00,  1.25s/it, Loss_pi=2.58e+00, Loss_v=6.52e-01]


EPOCH ::: 5


Training Net: 100%|██████████| 16/16 [00:23<00:00,  1.48s/it, Loss_pi=2.41e+00, Loss_v=5.34e-01]


EPOCH ::: 6


Training Net: 100%|██████████| 16/16 [00:19<00:00,  1.21s/it, Loss_pi=2.25e+00, Loss_v=4.71e-01]


EPOCH ::: 7


Training Net: 100%|██████████| 16/16 [00:19<00:00,  1.21s/it, Loss_pi=2.19e+00, Loss_v=3.82e-01]


EPOCH ::: 8


Training Net: 100%|██████████| 16/16 [00:20<00:00,  1.30s/it, Loss_pi=2.00e+00, Loss_v=3.26e-01]


EPOCH ::: 9


Training Net: 100%|██████████| 16/16 [00:19<00:00,  1.21s/it, Loss_pi=1.91e+00, Loss_v=3.24e-01]


EPOCH ::: 10


Training Net: 100%|██████████| 16/16 [00:21<00:00,  1.31s/it, Loss_pi=1.68e+00, Loss_v=2.68e-01]
2022-10-26 02:47:07 b7321da9e726 Coach[1155] INFO PITTING AGAINST PREVIOUS VERSION
Arena.playGames (1): 100%|██████████| 7/7 [00:25<00:00,  3.60s/it]
Arena.playGames (2): 100%|██████████| 7/7 [00:24<00:00,  3.50s/it]
2022-10-26 02:47:57 b7321da9e726 Coach[1155] INFO NEW/PREV WINS : 9 / 5 ; DRAWS : 0
2022-10-26 02:47:57 b7321da9e726 Coach[1155] INFO ACCEPTING NEW MODEL


Checkpoint Directory exists! 
Checkpoint Directory exists! 
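The arena results in the log can be checked by hand. This sketch assumes the upstream alpha-zero-general rules: `arenaCompare` games are split evenly by integer division (each net moves first in half of them), draws are excluded, and the new net is accepted only if it wins at least `updateThreshold` of the decisive games. The function names are hypothetical.

```python
def games_per_side(arena_compare: int) -> int:
    """Half the arena games start with the new net, half with the previous one."""
    return arena_compare // 2


def accept_new_model(nwins: int, pwins: int, threshold: float) -> bool:
    """Accept only if the new net wins at least `threshold` of the decisive (non-draw) games."""
    decisive = nwins + pwins
    if decisive == 0:
        return False  # every game drawn: keep the previous model
    return nwins / decisive >= threshold


print(games_per_side(15) * 2)          # 14 games in total (7 + 7 in the log)
print(accept_new_model(7, 7, 0.6))     # False: 7/14 = 0.50, "REJECTING NEW MODEL"
print(accept_new_model(9, 5, 0.6))     # True: 9/14 ≈ 0.64, "ACCEPTING NEW MODEL"
```

Under this assumption, an odd `arenaCompare` such as 15 silently loses one game to the integer division, which matches the two progress bars of 7 games each; an even value avoids that.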


Once training finishes, print the actions the algorithm took during the selection phase of MCTS. Your output should match the instructor's.

In [8]:
print(c.actionsTaken)

[[(3, 4), (4, 2), (3, 1), (4, 0), (3, 0), (4, 4), (2, 1), (1, 2), (0, 3), (1, 0), (1, 1), (2, 4), (5, 0), (0, 1), (3, 5), (4, 1), (5, 2), (4, 3), (1, 3), (1, 5), (2, 5), (2, 0), (4, 5), (5, 1), (0, 0), (0, 4), (0, 2), (5, 5), (0, 5), (1, 4), (5, 4), (5, 3), (3, 4), (4, 2), (5, 1), (2, 4), (1, 1), (4, 5), (3, 1), (2, 1), (1, 3), (1, 2), (1, 0), (4, 1), (0, 3), (2, 0), (4, 0), (0, 1), (1, 5), (2, 5), (0, 2), (0, 5), (3, 5), (0, 4), (3, 0), (4, 4), (5, 4), (5, 2), (5, 3), (4, 3), (5, 5), (5, 0), (0, 0), (1, 4)], [(3, 4), (4, 2), (4, 1), (4, 0), (5, 2), (4, 4), (5, 0), (4, 3), (5, 3), (2, 4), (3, 5), (5, 4), (3, 0), (3, 1), (2, 0), (5, 1), (2, 5), (4, 5), (2, 1), (1, 3), (0, 3), (1, 1), (1, 2), (1, 5), (0, 1), (0, 0), (5, 5), (0, 4), (0, 5), (0, 2), (1, 4), (6, 0), (1, 0), (2, 1), (1, 3), (1, 4), (1, 5), (3, 4), (4, 3), (0, 5), (3, 1), (4, 1), (3, 5), (0, 3), (3, 0), (4, 5), (0, 4), (5, 3), (1, 0), (2, 5), (2, 4), (1, 2), (4, 4), (4, 0), (5, 2), (1, 1), (0, 1), (4, 2), (5, 4), (5, 5), (0, 
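The board is 6×6 (`Game(6)`), yet the printed lists contain `(6, 0)`, which is not a square on the board. In upstream alpha-zero-general, Othello has `n*n + 1` actions and the last index, `n*n`, is the pass move. If `actionsTaken` records each action as `(action // n, action % n)` (an assumption, since the homework's recording code isn't shown), then index 36 becomes `(6, 0)`. `action_to_coords` is a hypothetical helper.

```python
def action_to_coords(action: int, n: int = 6) -> tuple:
    """Map a flat action index to (row, col) on an n x n board.

    Index n*n is the pass move; divmod maps it to (n, 0), i.e. (6, 0) on a 6x6 board.
    """
    if not 0 <= action <= n * n:
        raise ValueError(f"action {action} outside 0..{n * n}")
    return divmod(action, n)


print(action_to_coords(22))  # (3, 4)
print(action_to_coords(36))  # (6, 0): the pass action
```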

In [None]:
[[(3, 4), (4, 2), (3, 1), (4, 0), (3, 0), (4, 4), (2, 1), (1, 2), (0, 3), (1, 0), (1, 1), (2, 4), (5, 0), (0, 1), (3, 5), (4, 1), (5, 2), (4, 3), (1, 3), (1, 5), (2, 5), (2, 0), (4, 5), (5, 1), (0, 0), (0, 4), (0, 2), (5, 5), (0, 5), (1, 4), (5, 4), (5, 3), (3, 4), (4, 2), (5, 1), (2, 4), (1, 1), (4, 5), (3, 1), (2, 1), (1, 3), (1, 2), (1, 0), (4, 1), (0, 3), (2, 0), (4, 0), (0, 1), (1, 5), (2, 5), (0, 2), (0, 5), (3, 5), (0, 4), (3, 0), (4, 4), (5, 4), (5, 2), (5, 3), (4, 3), (5, 5), (5, 0), (0, 0), (1, 4)], [(3, 4), (4, 2), (4, 1), (4, 0), (5, 2), (4, 4), (5, 0), (4, 3), (5, 3), (2, 4), (3, 5), (5, 4), (3, 0), (3, 1), (2, 0), (5, 1), (2, 5), (4, 5), (2, 1), (1, 3), (0, 3), (1, 1), (1, 2), (1, 5), (0, 1), (0, 0), (5, 5), (0, 4), (0, 5), (0, 2), (1, 4), (6, 0), (1, 0), (2, 1), (1, 3), (1, 4), (1, 5), (3, 4), (4, 3), (0, 5), (3, 1), (4, 1), (3, 5), (0, 3), (3, 0), (4, 5), (0, 4), (5, 3), (1, 0), (2, 5), (2, 4), (1, 2), (4, 4), (4, 0), (5, 2), (1, 1), (0, 1), (4, 2), (5, 4), (5, 5), (0, 2), (0, 0), (5, 1), (5, 0), (6, 0), (2, 0)]]

[[(3, 4),
  (4, 2),
  (3, 1),
  (4, 0),
  (3, 0),
  (4, 4),
  (2, 1),
  (1, 2),
  (0, 3),
  (1, 0),
  (1, 1),
  (2, 4),
  (5, 0),
  (0, 1),
  (3, 5),
  (4, 1),
  (5, 2),
  (4, 3),
  (1, 3),
  (1, 5),
  (2, 5),
  (2, 0),
  (4, 5),
  (5, 1),
  (0, 0),
  (0, 4),
  (0, 2),
  (5, 5),
  (0, 5),
  (1, 4),
  (5, 4),
  (5, 3),
  (3, 4),
  (4, 2),
  (5, 1),
  (2, 4),
  (1, 1),
  (4, 5),
  (3, 1),
  (2, 1),
  (1, 3),
  (1, 2),
  (1, 0),
  (4, 1),
  (0, 3),
  (2, 0),
  (4, 0),
  (0, 1),
  (1, 5),
  (2, 5),
  (0, 2),
  (0, 5),
  (3, 5),
  (0, 4),
  (3, 0),
  (4, 4),
  (5, 4),
  (5, 2),
  (5, 3),
  (4, 3),
  (5, 5),
  (5, 0),
  (0, 0),
  (1, 4)],
 [(3, 4),
  (4, 2),
  (4, 1),
  (4, 0),
  (5, 2),
  (4, 4),
  (5, 0),
  (4, 3),
  (5, 3),
  (2, 4),
  (3, 5),
  (5, 4),
  (3, 0),
  (3, 1),
  (2, 0),
  (5, 1),
  (2, 5),
  (4, 5),
  (2, 1),
  (1, 3),
  (0, 3),
  (1, 1),
  (1, 2),
  (1, 5),
  (0, 1),
  (0, 0),
  (5, 5),
  (0, 4),
  (0, 5),
  (0, 2),
  (1, 4),
  (6, 0),
  (1, 0),
  (2, 1),
  (1, 3),
  (1, 4),