# Deep learning of rewards

This notebook demonstrates the **DeepRewardController**, which uses a neural network to predict the action that yields the highest reward. A perfectly trained controller would therefore behave identically to the **RewardController**, which evaluates the reward of each action directly and picks the best one.
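To make that relationship concrete, here is a minimal sketch in plain Python. It assumes a small discrete set of candidate actions and a toy reward function. `ACTIONS`, `toy_reward`, `best_action`, `fit_lookup_model`, and `predict` are all hypothetical names, not part of the `pod` package. A nearest-neighbour lookup table stands in for the neural network. The reward-based decision scores every action. The learned decision only queries the model. On states where the model reproduces the reward-maximizing label, the two agree exactly.

```python
import math
from typing import Dict, List, Tuple

Action = Tuple[int, float]  # hypothetical (thrust, heading offset in radians)

# Hypothetical discrete action set: 3 thrust levels x 3 heading offsets
ACTIONS: List[Action] = [
    (thrust, offset)
    for thrust in (0, 50, 100)
    for offset in (-math.pi / 10, 0.0, math.pi / 10)
]


def toy_reward(angle_to_check: float, action: Action) -> float:
    """Toy stand-in for a reward function such as `regood`:
    thrust is worth more the closer the heading is to the checkpoint."""
    thrust, offset = action
    return thrust * math.cos(angle_to_check - offset)


def best_action(angle_to_check: float) -> int:
    """RewardController-style decision: score every action, keep the best index."""
    return max(range(len(ACTIONS)), key=lambda i: toy_reward(angle_to_check, ACTIONS[i]))


def fit_lookup_model(states: List[float]) -> Dict[float, int]:
    """Stand-in for training: memorise the best-action label of each state."""
    return {s: best_action(s) for s in states}


def predict(model: Dict[float, int], angle_to_check: float) -> int:
    """Learned decision: return the label of the nearest memorised state."""
    nearest = min(model, key=lambda s: abs(s - angle_to_check))
    return model[nearest]


train_states = [i * math.pi / 8 for i in range(-4, 5)]
model = fit_lookup_model(train_states)

# On the states it was fitted to, the "model" matches the reward-based choice.
print(all(predict(model, s) == best_action(s) for s in train_states))  # True
```

Away from the fitted states the lookup can disagree with `best_action`. That gap is what training tries to close.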

```python
from pod.board import PodBoard
from pod.ai.deep_reward_controller import DeepRewardController
from pod.ai.rewards import regood
from pod.util import PodState
from pod.drawer import Drawer
from pod.ai.reward_controller import RewardController
from pod.controller import SimpleController

# A circular board with 4 checkpoints, in shuffled order
board = PodBoard.circle(4).shuffle()
# The controller under test learns to pick the action that maximizes `regood`
controller = DeepRewardController(board, regood)
```

### Training

First, we generate training data: a large set of pod states around the first checkpoint, covering a range of angles, distances, and speeds.

```python
from pod.ai.ai_utils import gen_pods, play_gen_pods
from pod.constants import Constants
import math

# Generate pods around the first checkpoint from lists of angles, distances, and speeds
pods = gen_pods(
    [board.checkpoints[0]],
    [i * math.pi / 5 for i in range(5)],
    list(range(Constants.check_radius(), 10000, 750)),
    [i * math.pi / 10 for i in range(10)],
    [i * math.pi / 7 for i in range(7)],
    [i * Constants.max_vel() / 3 for i in range(4)]
)

# TODO: training goes much better when extra pods pointing toward the checkpoint are added. Why?

# Play the generated pods forward 3 turns with a SimpleController to collect more states
pods = play_gen_pods(pods, SimpleController(board), 3)

print(f"{len(pods)} total states")
```

With the pod states in hand, we can train the model. The label for each state (its target output) is the action that produces the highest reward according to the reward function, here `regood`.

```python
import matplotlib.pyplot as plt

# Train the network; `history` is a Keras-style History object
history = controller.train(pods, 20)

plt.plot(history.history['accuracy'], label="Accuracy")
# plt.plot(history.history['loss'], label="Loss")
plt.xlabel("Epoch")
plt.legend()
plt.show()
```
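The accuracy curve measures how often the network's predicted action matches the label. Assuming each label is the index of the reward-maximizing action among a discrete set (the section does not show how `DeepRewardController` encodes actions), both the labels and the metric can be sketched with NumPy. `make_labels` and `accuracy` are hypothetical helpers, and all numbers are made up.

```python
import numpy as np


def make_labels(rewards: np.ndarray) -> np.ndarray:
    """rewards[s, a] is the reward of action a in state s.
    The label for each state is the index of its highest-reward action."""
    return np.argmax(rewards, axis=1)


def accuracy(predicted_scores: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of states whose highest-scoring predicted action equals the label."""
    return float(np.mean(np.argmax(predicted_scores, axis=1) == labels))


# Made-up rewards for 4 states x 3 actions
rewards = np.array([
    [0.1, 0.9, 0.3],
    [0.7, 0.2, 0.1],
    [0.0, 0.4, 0.5],
    [0.6, 0.6, 0.1],   # tie: argmax picks the first maximum (action 0)
])
labels = make_labels(rewards)
print(labels)  # [1 0 2 0]

# Made-up network outputs for the same states
predicted = np.array([
    [0.2, 0.7, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.8, 0.1, 0.1],
])
print(accuracy(predicted, labels))  # 0.75
```

An accuracy of 1.0 on every state would mean the network always picks the same action as the **RewardController**.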

### Play

Now that the model has been trained, let's see what it can do!

For comparison, we also add a **SimpleController** (which drives at full speed toward the next checkpoint) and a **RewardController** (which takes whatever action produces the highest reward).
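The **SimpleController** rule, full thrust straight at the next checkpoint, is simple enough to sketch directly. `Pod`, `simple_play`, `MAX_THRUST`, and the `(heading, thrust)` return value are hypothetical stand-ins; the real controller's state and output types are not shown in this notebook.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

MAX_THRUST = 100  # hypothetical maximum thrust


@dataclass
class Pod:
    """Hypothetical minimal pod state."""
    x: float
    y: float
    next_check: int  # index of the checkpoint the pod is heading for


def simple_play(pod: Pod, checkpoints: List[Tuple[float, float]]) -> Tuple[float, int]:
    """Point straight at the next checkpoint and use full thrust.

    Returns (desired heading in radians, thrust)."""
    # Wrapping the index around the track is an assumption of this sketch
    tx, ty = checkpoints[pod.next_check % len(checkpoints)]
    heading = math.atan2(ty - pod.y, tx - pod.x)
    return heading, MAX_THRUST


checkpoints = [(1000.0, 0.0), (1000.0, 1000.0)]
print(simple_play(Pod(0.0, 0.0, 0), checkpoints))  # (0.0, 100)
```

Unlike the two reward-based controllers, this rule evaluates no rewards at all, which makes it a cheap baseline.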

```python
TURNS = 200

# Race the trained controller against the two baselines
drawer = Drawer(board, controllers=[
    controller,
    RewardController(board, regood),
    SimpleController(board),
])

drawer.animate(TURNS)
```

The following chart shows the rewards earned by each controller in the run above.

```python
drawer.chart_rewards(regood)
```