<a href="https://colab.research.google.com/github/andrewfeikema/alpha-zero-general/blob/master/Othello_Train_Trial_using_AlphaZero_General.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Setup, install dependencies**

In [1]:
# Clone repo and install requirements

!git clone https://github.com/andrewfeikema/alpha-zero-general.git

fatal: destination path 'alpha-zero-general' already exists and is not an empty directory.
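
The `fatal: destination path ... already exists` message above is what `git clone` prints when the cell is re-run in a session that already holds the repository. A minimal sketch of a re-runnable alternative follows; the helper name `clone_if_missing` and its `run` parameter are hypothetical and not part of the repository.

```python
import os
import subprocess

REPO_URL = "https://github.com/andrewfeikema/alpha-zero-general.git"


def clone_if_missing(url, dest, run=subprocess.run):
    """Clone `url` into `dest` unless `dest` already exists.

    Returns True if a clone was attempted, False if it was skipped.
    `run` is injectable so the helper can be exercised without network access.
    """
    if os.path.isdir(dest):
        print(f"'{dest}' already exists, skipping clone")
        return False
    # check=True raises CalledProcessError if git exits with a non-zero status.
    run(["git", "clone", url, dest], check=True)
    return True
```

Calling `clone_if_missing(REPO_URL, "alpha-zero-general")` in place of the `!git clone` line would make the cell safe to run more than once.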


In [2]:
%cd '/content/alpha-zero-general'

/content/alpha-zero-general


In [3]:
!git checkout -t origin/master

fatal: A branch named 'master' already exists.


In [20]:
!pip install -r docker/requirements.txt

Collecting pandas==0.23.1
  Using cached https://files.pythonhosted.org/packages/27/85/f9e4f0e47a6f1410b1d737b74a1764868e9197e3197a2be843507b505636/pandas-0.23.1.tar.gz
Collecting scikit-learn==0.19.1
  Using cached https://files.pythonhosted.org/packages/f5/2c/5edf2488897cad4fb8c4ace86369833552615bf264460ae4ef6e1f258982/scikit-learn-0.19.1.tar.gz
Collecting scikit-image==0.14.0
  Using cached https://files.pythonhosted.org/packages/d6/ae/c9ea76fb37724596bd031e98f7f356936cabc39e5c57f27d56f08e6d52f2/scikit_image-0.14.0-cp37-cp37m-manylinux1_x86_64.whl
Processing /root/.cache/pip/wheels/b1/c3/d6/9a1cc8f3a99a0fc1124cae20153f36af59a6e683daca0a0814/torchfile-0.1.0-cp37-none-any.whl
ERROR: Operation cancelled by user


# **Train the AlphaZero model**

In [5]:
import logging
import coloredlogs
from Coach import Coach
from utils import dotdict
from othello.pytorch.NNet import NNetWrapper
from othello.OthelloGame import OthelloGame

In [6]:
log = logging.getLogger(__name__)
coloredlogs.install(level='INFO')  # Change this to DEBUG to see more info.

In [7]:
args = dotdict({
    'numIters': 1000,
    'numEps': 100,              # Number of complete self-play games to simulate during a new iteration.
    'tempThreshold': 15,        # Number of opening moves played with temperature 1 (exploratory); later moves use temperature 0.
    'updateThreshold': 0.6,     # During the arena playoff, the new neural net is accepted if it wins at least this fraction of games.
    'maxlenOfQueue': 200000,    # Maximum number of game examples kept to train the neural network.
    'numMCTSSims': 25,          # Number of MCTS simulations to run per move.
    'arenaCompare': 40,         # Number of games to play during arena play to determine if new net will be accepted.
    'cpuct': 1,
    'checkpoint': './temp/',
    'load_model': False,
    'numItersForTrainExamplesHistory': 20,
})
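
The `tempThreshold` and `numMCTSSims` settings interact during self-play: each move runs a fixed number of MCTS simulations, and the resulting visit counts are turned into a move distribution whose sharpness depends on a temperature. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, the exact move indexing used by the repository may differ by one, and tie-breaking between equally visited moves is simplified to "first best".

```python
def temperature_for_move(move_number, temp_threshold):
    """Temperature 1 for early moves, 0 once the threshold is reached (assumed rule)."""
    return 1 if move_number < temp_threshold else 0


def action_probs(visit_counts, temp):
    """Convert MCTS visit counts into a probability distribution over moves."""
    if temp == 0:
        # Greedy: all probability mass on the most visited move.
        best = max(range(len(visit_counts)), key=lambda a: visit_counts[a])
        return [1.0 if a == best else 0.0 for a in range(len(visit_counts))]
    # Otherwise weight each move by count ** (1 / temp) and normalize.
    scaled = [c ** (1.0 / temp) for c in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]


counts = [10, 5, 5, 5]  # 25 simulations spread over four legal moves
print(action_probs(counts, temperature_for_move(3, 15)))   # [0.4, 0.2, 0.2, 0.2]
print(action_probs(counts, temperature_for_move(20, 15)))  # [1.0, 0.0, 0.0, 0.0]
```

Early in a game the distribution follows the visit counts, which keeps self-play games varied; later the most visited move is always chosen.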

In [8]:
# If you have a pre-trained model, you can load it here.
import os

# The PyTorch wrapper saves checkpoints under the exact file name,
# so check for 'best.pth.tar' itself (no '.index' suffix).
if os.path.exists(os.path.join('./temp', 'best.pth.tar')):
    print("Using best pre-existing model")
    args['load_model'] = True
    args['load_folder_file'] = ('./temp', 'best.pth.tar')
else:
    print("Not using best pre-existing model")

Not using best pre-existing model


In [9]:
# Set very low iterations to let this notebook run in its entirety.

# In practice, training even a simple model such as this Othello one can take several hours or days.
args['numIters'] = 1
args['numEps'] = 1
args['arenaCompare'] = 2

In [10]:
game = OthelloGame(n=8)

In [11]:
nnet = NNetWrapper(game)

In [12]:
if args.load_model:
    print('Loading checkpoint "{}/{}"...'.format(args.load_folder_file[0], args.load_folder_file[1]))
    nnet.load_checkpoint(args.load_folder_file[0], args.load_folder_file[1])
else:
    print('Not loading a checkpoint.')

Not loading a checkpoint.
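
The cell above reads `args.load_model` as an attribute, while earlier cells set `args['load_model']` by key. Both work because `args` is a `dotdict`. The class below is a minimal sketch of how such a wrapper can be written; it is named `DotDict` to make clear it is an illustration, and the repository's own `utils.dotdict` may differ in detail.

```python
class DotDict(dict):
    """Dictionary whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods
        # such as .keys() and .items() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


demo_args = DotDict({'numIters': 1000, 'load_model': False})
demo_args['numIters'] = 1  # key-style write
print(demo_args.numIters, demo_args.load_model)  # 1 False
```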


In [13]:
coach = Coach(game, nnet, args)

In [14]:
%time coach.learn()

2021-05-12 18:41:05 22635aaa6a54 Coach[31036] INFO Starting Iter #1 ...
Self Play: 100%|██████████| 1/1 [00:03<00:00,  3.21s/it]
Training Net:   0%|          | 0/7 [00:00<?, ?it/s]

Checkpoint Directory exists! 
EPOCH ::: 1


Training Net: 100%|██████████| 7/7 [00:00<00:00, 50.04it/s, Loss_pi=3.99e+00, Loss_v=6.77e-01]
Training Net: 100%|██████████| 7/7 [00:00<00:00, 87.95it/s, Loss_pi=3.56e+00, Loss_v=4.01e-01]
Training Net: 100%|██████████| 7/7 [00:00<00:00, 91.33it/s, Loss_pi=3.35e+00, Loss_v=2.36e-01]
Training Net:   0%|          | 0/7 [00:00<?, ?it/s, Loss_pi=3.16e+00, Loss_v=1.51e-01]

EPOCH ::: 2
EPOCH ::: 3
EPOCH ::: 4


Training Net: 100%|██████████| 7/7 [00:00<00:00, 94.38it/s, Loss_pi=3.11e+00, Loss_v=1.18e-01]
Training Net: 100%|██████████| 7/7 [00:00<00:00, 93.29it/s, Loss_pi=2.92e+00, Loss_v=1.10e-01]
Training Net: 100%|██████████| 7/7 [00:00<00:00, 93.50it/s, Loss_pi=2.65e+00, Loss_v=5.71e-02]
Training Net:   0%|          | 0/7 [00:00<?, ?it/s, Loss_pi=2.40e+00, Loss_v=6.77e-02]

EPOCH ::: 5
EPOCH ::: 6
EPOCH ::: 7


Training Net: 100%|██████████| 7/7 [00:00<00:00, 95.16it/s, Loss_pi=2.38e+00, Loss_v=6.58e-02]
Training Net: 100%|██████████| 7/7 [00:00<00:00, 91.59it/s, Loss_pi=2.21e+00, Loss_v=2.46e-02]
Training Net: 100%|██████████| 7/7 [00:00<00:00, 93.69it/s, Loss_pi=1.95e+00, Loss_v=2.09e-02]
Training Net:   0%|          | 0/7 [00:00<?, ?it/s, Loss_pi=1.75e+00, Loss_v=3.73e-02]

EPOCH ::: 8
EPOCH ::: 9
EPOCH ::: 10


Training Net: 100%|██████████| 7/7 [00:00<00:00, 96.75it/s, Loss_pi=1.70e+00, Loss_v=3.09e-02]
2021-05-12 18:41:09 22635aaa6a54 Coach[31036] INFO PITTING AGAINST PREVIOUS VERSION
Arena.playGames (1): 100%|██████████| 1/1 [00:03<00:00,  3.09s/it]
Arena.playGames (2): 100%|██████████| 1/1 [00:03<00:00,  3.26s/it]
2021-05-12 18:41:16 22635aaa6a54 Coach[31036] INFO NEW/PREV WINS : 1 / 1 ; DRAWS : 0
2021-05-12 18:41:16 22635aaa6a54 Coach[31036] INFO REJECTING NEW MODEL


CPU times: user 10.6 s, sys: 153 ms, total: 10.7 s
Wall time: 10.8 s
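
The log ends with `NEW/PREV WINS : 1 / 1` followed by `REJECTING NEW MODEL`. With `updateThreshold` at 0.6, a 1-1 split gives the new network a win rate of 0.5, which falls short. The sketch below reproduces that decision; the function name is hypothetical, and ignoring draws in the denominator is an assumption that happens to match this log (there were no draws).

```python
def accept_new_model(new_wins, prev_wins, update_threshold):
    """Return True if the new network's share of decided games meets the threshold."""
    decided = new_wins + prev_wins
    if decided == 0:
        # No decisive games: nothing shows the new network is better.
        return False
    return new_wins / decided >= update_threshold


print(accept_new_model(1, 1, 0.6))  # False, matching the log above
print(accept_new_model(2, 0, 0.6))  # True
```

With `arenaCompare` set to 2, the new network has to win both games to be accepted, so a rejection in this trial run says little about training quality.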


# **Exploring the model**

In [19]:
!ls ./temp/

checkpoint_0.pth.tar.examples  temp.pth.tar
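
The `.examples` file holds the self-play data gathered during the iteration. The sketch below shows one way to inspect such a file, assuming it is a pickle of a list with one collection of `(board, policy, value)` tuples per iteration; that layout is an assumption about how the coach saves its history, and the helper names are hypothetical. To keep the block self-contained it writes and reads a small stand-in file rather than the real checkpoint.

```python
import os
import pickle
import tempfile
from collections import deque


def load_examples(path):
    """Load a pickled training-example history. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(history):
    """Return the number of training examples stored for each iteration."""
    return [len(iteration) for iteration in history]


# Stand-in data in the assumed layout: one iteration with two examples.
demo_history = [deque([("board0", [0.5, 0.5], 1), ("board1", [1.0, 0.0], -1)])]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.examples")
with open(demo_path, "wb") as f:
    pickle.dump(demo_history, f)

print(summarize(load_examples(demo_path)))  # [2]
```

If the assumption holds, pointing `load_examples` at `./temp/checkpoint_0.pth.tar.examples` would show how many positions the single self-play game produced.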
