# Deep learning of rewards

This notebook illustrates the **DeepRewardController**, which uses a neural network to predict the action with the highest reward.

```python
from pod.board import PodBoard
from pod.ai.deep_reward_controller import DeepRewardController

board = PodBoard()
controller = DeepRewardController(board)
```
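The snippet below is a minimal sketch of how a reward-predicting controller could pick its move. It assumes a small discrete set of actions and a model that returns one predicted reward per action. `choose_action`, `ACTIONS` and `toy_model` are hypothetical names used only for illustration. They are not part of the `pod` package, and the real **DeepRewardController** may encode states and actions differently.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical discrete action: (rotation offset in radians, thrust).
Action = Tuple[float, int]


def choose_action(
    state: Sequence[float],
    predict_rewards: Callable[[Sequence[float]], Sequence[float]],
    actions: Sequence[Action],
) -> Action:
    """Return the action whose predicted reward is highest.

    `predict_rewards` stands in for the neural network: it maps a state
    to one predicted reward per entry in `actions`.
    """
    scores = predict_rewards(state)
    if len(scores) != len(actions):
        raise ValueError("model must return one score per action")
    # Greedy policy: pick the index with the largest predicted reward.
    best_index = max(range(len(actions)), key=lambda i: scores[i])
    return actions[best_index]


# Hypothetical action set and a toy stand-in "model" that favours
# full thrust and penalises turning.
ACTIONS = [(-0.3, 0), (-0.3, 100), (0.0, 0), (0.0, 100), (0.3, 0), (0.3, 100)]


def toy_model(state):
    return [thrust / 100 - abs(turn) for turn, thrust in ACTIONS]


print(choose_action([0.0, 0.0], toy_model, ACTIONS))  # (0.0, 100)
```

Once the network has learned to score actions well, the policy is just an argmax over its outputs. All of the difficulty lies in training the model, not in using it.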

### Training

First, we create some training data: a bunch of pods in various states around the target checkpoint.

(TODO: training goes much better if I add extra pods pointing towards the checkpoint. Why?)

```python
import math

from pod.ai.ai_utils import gen_pods, frange, MAX_VEL

pods_everywhere = gen_pods(
    board.checkpoints[0],
    frange(1000, 10000, 5),
    frange(math.pi * -0.9, math.pi * 0.9, 11),
    frange(math.pi * -0.9, math.pi * 0.9, 11),
    frange(0, MAX_VEL, 5)
)

pods_focused = gen_pods(
    board.checkpoints[0],
    frange(1000, 10000, 5),
    frange(-0.3, 0.3, 11),
    frange(math.pi * -0.9, math.pi * 0.9, 11),
    frange(0, MAX_VEL, 5)
)

pods = [*pods_everywhere, *pods_focused]

print(f"{len(pods)} total pods")
```
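The two `gen_pods` calls build grids of states, one state for every combination of the values produced by the `frange` calls. The sketch below reproduces that counting with plain tuples. It assumes that `frange(start, end, n)` returns `n` evenly spaced values including both endpoints, and that `gen_pods` emits one pod per combination. The local `frange`, `gen_states` and the `MAX_VEL` value are hypothetical placeholders, not the library's own code. The meaning of the two angle ranges is also not documented here, so they are left generic.

```python
import itertools
import math


def frange(start, end, n):
    """Hypothetical stand-in for ai_utils.frange: n evenly spaced values
    from start to end, both endpoints included (the real helper may differ)."""
    if n == 1:
        return [start]
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


def gen_states(distances, angles_a, angles_b, speeds):
    """Every combination of the four parameter lists, as plain tuples.
    Hypothetical: the real gen_pods builds pod states around a checkpoint."""
    return list(itertools.product(distances, angles_a, angles_b, speeds))


MAX_VEL = 1000  # placeholder value; the real MAX_VEL comes from pod.ai.ai_utils

everywhere = gen_states(
    frange(1000, 10000, 5),
    frange(math.pi * -0.9, math.pi * 0.9, 11),
    frange(math.pi * -0.9, math.pi * 0.9, 11),
    frange(0, MAX_VEL, 5),
)
focused = gen_states(
    frange(1000, 10000, 5),
    frange(-0.3, 0.3, 11),  # the only range narrowed in the "focused" grid
    frange(math.pi * -0.9, math.pi * 0.9, 11),
    frange(0, MAX_VEL, 5),
)

print(len(everywhere), len(focused), len(everywhere) + len(focused))
# 3025 3025 6050
```

Under these assumptions, each grid holds 5 × 11 × 11 × 5 = 3025 states. Because the grid is a full Cartesian product, refining any single range multiplies the dataset size.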

Now that we have a set of pod states, we can train the model. The label for each state (i.e. its target output) is the action that produces the highest reward.
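The sketch below shows one way to compute these labels. Everything in it is an assumption made for illustration. This includes the discrete action set and the simplified one-turn physics in `step`, which is loosely modelled on CodinGame-style pod racing with 0.85 friction. The reward, defined as how much closer the pod gets to the checkpoint, is likewise an assumption. The real controller's actions and reward function may differ. What the sketch shows is the shape of the computation: simulate every candidate action, score it, and use the index of the best one as the label.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Pod:
    x: float
    y: float
    vx: float
    vy: float
    heading: float  # radians


# Hypothetical discrete actions: (turn in radians, thrust).
ACTIONS: List[Tuple[float, float]] = [
    (turn, thrust)
    for turn in (-math.pi / 10, 0.0, math.pi / 10)
    for thrust in (0.0, 50.0, 100.0)
]


def step(pod: Pod, turn: float, thrust: float) -> Pod:
    """One simplified turn: rotate, accelerate along the heading, move,
    then apply friction. Not the pod library's actual rules."""
    heading = pod.heading + turn
    vx = pod.vx + thrust * math.cos(heading)
    vy = pod.vy + thrust * math.sin(heading)
    return Pod(pod.x + vx, pod.y + vy, vx * 0.85, vy * 0.85, heading)


def reward(before: Pod, after: Pod, target: Tuple[float, float]) -> float:
    """Hypothetical reward: how much closer the pod got to the target."""
    d0 = math.dist((before.x, before.y), target)
    d1 = math.dist((after.x, after.y), target)
    return d0 - d1


def best_action_label(pod: Pod, target: Tuple[float, float]) -> int:
    """Label = index of the action with the highest one-step reward."""
    rewards = [reward(pod, step(pod, turn, thrust), target) for turn, thrust in ACTIONS]
    return max(range(len(ACTIONS)), key=rewards.__getitem__)


def make_labels(pods: Sequence[Pod], target: Tuple[float, float]) -> List[int]:
    return [best_action_label(p, target) for p in pods]


# A stationary pod facing the target head-on should pick "no turn, full thrust".
pod = Pod(x=0.0, y=0.0, vx=0.0, vy=0.0, heading=0.0)
label = best_action_label(pod, target=(5000.0, 0.0))
print(ACTIONS[label])  # (0.0, 100.0)
```

Because `max` returns the first maximal index, ties resolve to the earliest action in `ACTIONS`. For example, several zero-thrust actions may all score 0 when the pod faces away from the target, and the first of them wins.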

```python
loss = controller.train(pods, 75)
```

### Play

Now that the model has been trained, let's see what she can do!

```python
from IPython.display import Image

from pod.util import PodState
from pod.game import Player
from pod.drawer import Drawer

player = Player(controller, pod=PodState(board.checkpoints[1]))
player.pod.nextCheckId = 1
drawer = Drawer(board, [player])

file = '/tf/notebooks/pods.gif'
drawer.animate(file, 200)
Image(filename=file)
```