# Exercise 8: Model-based Monte Carlo tree search on the Santorini environment

## Goals:
- Be able to adapt an environment to work with MCTS
- Implement MCTS

## Prerequisites

- Pull the latest updates from https://github.com/phizaz/chula_rl
- You will need to write code in Python files (not just in the notebooks)
- A good code editor is recommended

## Things you need to implement:

1. **MCTS** `chula_rl.alphazero.mcts`: the Monte Carlo tree search file. We have written the template, but you need to fill in the blanks.
2. **Game interface** `chula_rl.alphazero.santorini.game`: you need to complete the interface between the environment and MCTS. We have written the template, but you need to fill in the blanks.
3. **Network** `chula_rl.alphazero.santorini.net.tf`: the main neural network should be implemented in TF2.

## Hints

`chula_rl.alphazero.othello` is an example of adapting an environment (the game of Othello) to work with the MCTS interface. You can use it as inspiration for Santorini.

However, you still need to implement the MCTS code itself, which is not provided.



Disclaimer: most of the code in the repo was not developed by us. Its origin is anonymous for now.

## Step 0 Understanding the game of Santorini

YouTube explanation from the author of the game: https://www.youtube.com/watch?v=EzHAykZTCHU

We don't play the "original" Santorini. We play a simpler version without the "god powers" cards.

A Python implementation of the game is available at: https://github.com/cstorm125/santorini (hats off to Charin 🙏)

We have made some changes to it, so use the one in `chula_rl.alphazero.santorini.santorinigo.environment` instead.

Details can be found in the repo under the sections "Setup" and "Each turn".


In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import numpy as np
import tensorflow as tf

## Step 1 Implement the game interface

You might want to play around with the interface to make sure that it does what it should. Checking each method one by one is a good idea.
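
One way to check methods one by one is to assert simple invariants during a random playout. The sketch below uses a hypothetical toy game (`ToyNim`) with made-up method names (`initial_state`, `valid_moves`, `next_state`, `game_ended`). The `SantoriniGame` template defines its own names and signatures, so treat this only as a pattern to adapt.

```python
import random


class ToyNim:
    """Hypothetical stand-in game, not part of chula_rl.

    Players alternately take 1 or 2 stones; whoever takes the last stone wins.
    """

    n_actions = 2  # action 0 takes 1 stone, action 1 takes 2 stones

    def initial_state(self):
        return 5  # stones on the table

    def valid_moves(self, state):
        # binary mask over the fixed action space
        return [1 if state >= take else 0 for take in (1, 2)]

    def next_state(self, state, player, action):
        # returns (new state, next player); players are +1 and -1
        return state - (action + 1), -player

    def game_ended(self, state, player):
        # 0 while running; otherwise -1, because the opponent of the
        # player to move has just taken the last stone
        return 0 if state > 0 else -1


def check_random_playout(game, seed=0, max_steps=100):
    """Play random legal moves and assert basic invariants at every step."""
    rng = random.Random(seed)
    state, player = game.initial_state(), 1
    for _ in range(max_steps):
        if game.game_ended(state, player) != 0:
            return state, player
        mask = game.valid_moves(state)
        assert len(mask) == game.n_actions  # mask covers every action
        assert sum(mask) > 0                # a running game has a legal move
        action = rng.choice([a for a, ok in enumerate(mask) if ok])
        state, next_player = game.next_state(state, player, action)
        assert next_player == -player       # turns alternate
        player = next_player
    raise AssertionError("playout did not terminate")


final_state, final_player = check_random_playout(ToyNim())
```

The same kind of check (mask length, at least one legal move while the game is running, alternating turns, guaranteed termination) can be pointed at your own implementation once its methods are filled in.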

In [None]:
from chula_rl.alphazero.santorini.game import SantoriniGame

game = SantoriniGame()

## Step 2 Implement the neural network

The neural network in this case has two heads: one for the action distribution and one for state-value prediction.

Make sure that the output shape makes sense.
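
As a rough illustration of what "makes sense" can mean, here is a NumPy stand-in for the two heads (not the TF2 network itself). It assumes the common AlphaZero-style convention of a softmax policy head and a tanh value head in [-1, 1], which this exercise does not confirm. The sizes are illustrative and are not the real Santorini action count.

```python
import numpy as np


def two_head_forward(features, w_policy, w_value):
    """NumPy stand-in for two heads on top of a shared feature vector."""
    logits = features @ w_policy                          # (batch, n_actions)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    policy = np.exp(logits)
    policy = policy / policy.sum(axis=1, keepdims=True)   # softmax over actions
    value = np.tanh(features @ w_value)                   # (batch, 1), in [-1, 1]
    return policy, value


rng = np.random.default_rng(0)
batch, n_features, n_actions = 4, 16, 10  # illustrative sizes only
features = rng.normal(size=(batch, n_features))
policy, value = two_head_forward(
    features,
    rng.normal(size=(n_features, n_actions)),
    rng.normal(size=(n_features, 1)),
)

# The checks worth repeating on the real network's outputs:
assert policy.shape == (batch, n_actions)    # one probability per action
assert np.allclose(policy.sum(axis=1), 1.0)  # each row is a distribution
assert value.shape == (batch, 1)             # one scalar value per state
assert np.all(np.abs(value) <= 1.0)          # bounded like a game outcome
```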

**Feel free to change the existing code.**

In [None]:
from chula_rl.alphazero.santorini.net.tf import Backbone, SantoriniNet

## Step 3 Implement MCTS

You need to implement the file `chula_rl.alphazero.mcts`. We have already provided some of its structure. The code should be fairly close to the pseudocode from the classroom.
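
For orientation only, here is a minimal sketch of an AlphaZero-style search on a toy Nim game (take 1 or 2 stones, last stone wins). It assumes a PUCT selection rule and a stand-in evaluator with a uniform prior and a neutral value. All names (`ToyMCTS`, `evaluate`, `c_puct`) are hypothetical; the classroom pseudocode and the template in `chula_rl.alphazero.mcts` may be organized differently.

```python
import math

N_ACTIONS = 2  # toy Nim: action a removes a + 1 stones; last stone wins


def valid_moves(stones):
    return [a for a in range(N_ACTIONS) if a + 1 <= stones]


def evaluate(stones):
    """Stand-in for the network: uniform prior, neutral value."""
    moves = valid_moves(stones)
    return {a: 1.0 / len(moves) for a in moves}, 0.0


class ToyMCTS:
    def __init__(self, c_puct=1.0):
        self.c_puct = c_puct
        self.Q = {}  # (state, action) -> mean value for the player to move
        self.N = {}  # (state, action) -> visit count
        self.P = {}  # state -> prior over valid actions

    def search(self, stones):
        """Run one simulation; return the value for the player to move."""
        if stones == 0:
            return -1.0  # the opponent took the last stone: a loss
        if stones not in self.P:
            # expansion: query the evaluator and back up its value
            self.P[stones], value = evaluate(stones)
            return value
        # selection: pick the action with the highest PUCT score
        total = sum(self.N.get((stones, a), 0) for a in self.P[stones])

        def puct(a):
            q = self.Q.get((stones, a), 0.0)
            n = self.N.get((stones, a), 0)
            prior = self.P[stones][a]
            return q + self.c_puct * prior * math.sqrt(total + 1e-8) / (1 + n)

        action = max(self.P[stones], key=puct)
        # the child's value is from the opponent's point of view, so negate it
        value = -self.search(stones - (action + 1))
        # backup: update the running mean and the visit count
        n = self.N.get((stones, action), 0)
        q = self.Q.get((stones, action), 0.0)
        self.Q[(stones, action)] = (n * q + value) / (n + 1)
        self.N[(stones, action)] = n + 1
        return value

    def action_probs(self, stones, n_sims=200):
        """Visit-count distribution at the root after n_sims simulations."""
        for _ in range(n_sims):
            self.search(stones)
        counts = [self.N.get((stones, a), 0) for a in range(N_ACTIONS)]
        total = sum(counts)
        return [c / total for c in counts]


probs = ToyMCTS().action_probs(stones=4)
```

Taking one stone from four leaves the opponent with three, which is a losing position in this toy game, so the visit counts are expected to concentrate on action 0. The sign flip on the recursive call is the detail that is easiest to get wrong in a two-player search.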

In [None]:
from chula_rl.alphazero.mcts import MCTS

## Step 4 Train an agent with MCTS

To train an agent under the MCTS framework there are a few ingredients to consider (all of which are provided). 

We have provided:

- **Coach** `chula_rl.alphazero.coach`: this is the main "training loop". Basically, you call `coach.learn`.
- **Arena**: used by **Coach** to pit two policies against each other and see which one fares better. **Coach** uses this to determine whether the improved policy is really better; if not, it is discarded.

Coach will automatically save the neural nets at each step of progress. You can control where they are saved in `Args` (default `./checkpoint/`).

### Advice

Training takes a very longggg time. If you want to debug, set the args so that training runs fast, just to make sure your code works. Only when you are pretty sure about your code should you run with a larger setting! 👌

In [None]:
from chula_rl.alphazero.coach import *

# config for the coach
args = Args() 

def make_net():
    # implement this 
    # ... it should return the neural network
    # return SantoriniNet()
    raise NotImplementedError()

g = SantoriniGame()

# logging is very useful to see if we make any progress!
writer = tf.summary.create_file_writer('tensorboard')
with writer.as_default():
    c = Coach(g, make_net, args)
    c.learn()

## Step 5 Comparing agents

You can use **Arena** to compare many kinds of agents. For example, you can pit the best agent in `./checkpoint/` against other versions of itself, or against a random agent.
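
The idea behind such a comparison can be sketched on a toy Nim game (take 1 or 2 stones, last stone wins). Everything here (`play_game`, `compare`, `greedy_player`, `random_player`) is hypothetical and is not the `Arena` API; check `chula_rl.alphazero.arena` for the real interface.

```python
import random


def play_game(first, second, stones=7):
    """Play one toy Nim game. Returns 0 if `first` wins, 1 if `second` wins."""
    players = (first, second)
    turn = 0
    while True:
        stones -= players[turn](stones)
        if stones == 0:
            return turn  # the player who took the last stone wins
        turn = 1 - turn


def random_player(rng):
    # picks uniformly among the legal numbers of stones to take
    return lambda stones: rng.choice([t for t in (1, 2) if t <= stones])


def greedy_player(stones):
    # leave a multiple of 3 when possible, otherwise take 1
    return stones % 3 if stones % 3 in (1, 2) else 1


def compare(agent_a, agent_b, n_games=100):
    """Win rate of agent_a, alternating who moves first in each game."""
    wins_a = 0
    for i in range(n_games):
        if i % 2 == 0:
            wins_a += play_game(agent_a, agent_b) == 0
        else:
            wins_a += play_game(agent_b, agent_a) == 1
    return wins_a / n_games


win_rate = compare(greedy_player, random_player(random.Random(0)))
```

Alternating the starting player matters because many games, this toy one included, favor one side of the board or one move order.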

Note: a baseline from us might be released later on.

### Provided agents in `chula_rl.alphazero.santorini.players`

- Human: you can control an agent yourself, but you need to know the exact commands for it (which means you need to dig deeper into the codebase 😉)
- Random: a random player. Your policy should be better than a random agent at the very least. Show us!

In [None]:
from chula_rl.alphazero.arena import Arena

## How do we grade?

We look for the effort you put into making MCTS work with Santorini. 