# Deep Tree Search Variant

The goal here is to train a neural network to imitate a tree search of arbitrary depth. The algorithm is as follows:

1. Start with agent.depth = 0
1. Generate a set of training states (pods distributed throughout the state space)
1. For each training cycle:
    1. Generate a label for each training state
    1. Train the agent on the training states and their labels
    1. Increment the agent's depth by one
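
The outer loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `DeepTreeController` code: `TableAgent`, `train`, and `label_fn` are hypothetical names, and the "network" is replaced by a lookup table so the example runs without TensorFlow. `label_fn` stands in for the label-generation function described next.

```python
from typing import Callable, Dict, Hashable, List


class TableAgent:
    """Hypothetical stand-in for the NN: memorizes one action per state."""

    def __init__(self, default_action: int) -> None:
        self.depth = 0  # step 1: start at depth 0
        self.default_action = default_action
        self.table: Dict[Hashable, int] = {}

    def act(self, state: Hashable) -> int:
        # Fall back to the default action for states never trained on.
        return self.table.get(state, self.default_action)

    def fit(self, states: List[Hashable], labels: List[int]) -> None:
        # "Training" here is just memorization of the (state, label) pairs.
        self.table.update(zip(states, labels))


def train(
    agent: TableAgent,
    states: List[Hashable],
    label_fn: Callable[[TableAgent, Hashable], int],
    cycles: int,
) -> TableAgent:
    for _ in range(cycles):
        # Labels are computed with the agent as it stands at the current depth,
        # before any fitting happens in this cycle.
        labels = [label_fn(agent, s) for s in states]
        agent.fit(states, labels)
        agent.depth += 1
    return agent


# Toy usage: the label is an arbitrary function of state and current depth,
# purely to make the order of operations visible.
agent = train(TableAgent(default_action=0), [0, 1, 2],
              lambda a, s: s + a.depth, cycles=3)
print(agent.depth, agent.table)
```

Generating all labels before fitting matters: within a cycle, every label is produced by the same version of the agent, and the depth only increases once that agent has been replaced by the newly trained one.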

The function for generating the label for a given state is the following:

1. For each possible action a:
    1. Set current_state = environment.step(state, a)
    1. For (agent.depth) turns:
        1. Use the agent's NN to determine the best action aa at the current state
        1. Set current_state = environment.step(current_state, aa)
    1. Set the value of taking action a from the initial state to reward(current_state)
1. Return the action that produced the highest reward
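
The labeling procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not the `DeepTreeController` implementation: `make_label`, `toy_step`, `toy_reward`, and `toward_goal` are hypothetical names, and the environment is a one-dimensional toy (an integer position with a goal at 10) rather than the pod simulator.

```python
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type


def make_label(
    state: S,
    actions: Sequence[A],
    step: Callable[[S, A], S],
    reward: Callable[[S], float],
    policy: Callable[[S], A],
    depth: int,
) -> Optional[A]:
    """Return the first action whose policy-guided rollout ends with the highest reward."""
    best_action: Optional[A] = None
    best_value = float("-inf")
    for a in actions:
        # Take the candidate action once from the initial state...
        current = step(state, a)
        # ...then let the current policy play `depth` further turns.
        for _ in range(depth):
            current = step(current, policy(current))
        value = reward(current)
        if value > best_value:  # strict comparison: ties keep the earlier action
            best_action, best_value = a, value
    return best_action


# Toy environment (an assumption for illustration only).
ACTIONS = (-1, 0, 1)


def toy_step(s: int, a: int) -> int:
    return s + a


def toy_reward(s: int) -> float:
    return -abs(s - 10)


def toward_goal(s: int) -> int:
    # Hand-written policy standing in for the trained NN.
    return 1 if s < 10 else (-1 if s > 10 else 0)


print(make_label(0, ACTIONS, toy_step, toy_reward, toward_goal, depth=0))
print(make_label(0, ACTIONS, toy_step, toy_reward, toward_goal, depth=3))
```

With `depth=0` the function reduces to a one-step greedy choice on the reward, which is why the first training cycle only needs the reward function and an untrained agent.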

Put another way, this is equivalent to the **DeepRewardController** with a custom reward function that plays (agent.depth) additional steps, chosen by the NN, before calculating the reward.

In [None]:
import tensorflow as tf
import numpy as np

from pod.board import PodBoard
from pod.drawer import Drawer
from pod.ai.deep_tree_controller import DeepTreeController
from pod.ai.rewards import regood

board = PodBoard.trainer(4)
controller = DeepTreeController(board, regood)

In [None]:
import math

from pod.ai.ai_utils import gen_pods, play_gen_pods
from pod.ai.misc_controllers import RandomController
from pod.controller import SimpleController
from pod.constants import Constants

# Step 1: get a bunch of pods spread around the board
print("Generating pods...")
pods = gen_pods(
    board.checkpoints,
    [2 * i * math.pi / 5 for i in range(5)],
    [
        Constants.check_radius() * 1.01,
        Constants.check_radius() * 1.1,
        Constants.check_radius() * 1.3,
        Constants.check_radius() * 1.6,
        Constants.check_radius() * 2,
        Constants.check_radius() * 3,
        Constants.check_radius() * 4,
        Constants.check_radius() * 6,
    ],
    [i * math.pi for i in [1, 0.75, -0.75, 0.5, -0.5, 0.3, -0.3, 0.2, -0.2, 0]],
    [i * math.pi / 3 for i in range(6)],
    [i * Constants.max_vel() / 2 for i in range(3)]
)

# Step 2: play them a few turns to build even more
print("Generating even more pods...")
pods = play_gen_pods(pods, SimpleController(board), 5)

# Step 3: Vectorize each pod
print("Vectorizing...")
pods = [(pod, controller.vectorizer.to_vector(board, pod)) for pod in pods]

print("Done!")

In [None]:
import matplotlib.pyplot as plt

board = PodBoard.trainer(4)
controller.board = board

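# Run three training cycles; per the algorithm above, each cycle
# increases the controller's depth by one.
for i in range(3):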
    history = controller.train(pods, 50)
    print("Controller now at depth {}".format(controller.depth))

In [None]:
from pod.ai.reward_controller import RewardController
from pod.ai.tree_search_controller import TreeSearchController

board = PodBoard.grid().shuffle()
controller.board = board
drawer = Drawer(board, controllers=[
    controller,
    RewardController(board, regood),
    TreeSearchController(board, regood, 2),
    TreeSearchController(board, regood, 3),
    TreeSearchController(board, regood, 4),
], labels=[
    'Deep Tree',
    'Tree 1',
    'Tree 2',
    'Tree 3',
    'Tree 4',
])
drawer.animate(max_laps=3)

In [None]:
drawer.chart_rewards(regood)

# Scratchpad

In [None]:
import tensorflow as tf
import numpy as np
from pod.util import PodState
from pod.board import PodBoard
from pod.player import Player
from pod.drawer import Drawer
from pod.ai.rewards import regood
from pod.ai.tree_search_controller import TreeSearchController


board = PodBoard.grid().shuffle()
drawer = Drawer(board, controllers=[
    TreeSearchController(board, regood, max_depth=1),
    TreeSearchController(board, regood, max_depth=2),
    TreeSearchController(board, regood, max_depth=3),
    TreeSearchController(board, regood, max_depth=4),
], labels=['depth 1', 'depth 2', 'depth 3', 'depth 4'])
drawer.animate(200, trail_len=100)