# Deep learning of rewards

This notebook trains a neural network to predict, for a given state, the action with the highest reward value. In other words, a perfectly trained controller would choose the same actions as the **GreedyController**.
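
To make the training target concrete, here is a minimal sketch of greedy labelling on a toy one-dimensional problem. Everything in it (`greedy_label`, `ACTIONS`, `TARGET`, `step`, `reward`) is hypothetical and is not part of the `pod` package; it only illustrates the idea that the label for a state is the index of the action with the highest reward.

```python
from typing import Callable, Sequence


def greedy_label(
    state: float,
    actions: Sequence[float],
    step: Callable[[float, float], float],
    reward: Callable[[float], float],
) -> int:
    """Return the index of the action whose resulting state has the highest reward."""
    rewards = [reward(step(state, action)) for action in actions]
    # On ties, max() returns the first index with the best reward.
    return max(range(len(actions)), key=rewards.__getitem__)


# Toy problem (an assumption for illustration): move left, stay, or move right
# along a line, with a target at x = 10.
ACTIONS = (-1.0, 0.0, 1.0)
TARGET = 10.0


def step(x: float, action: float) -> float:
    """Apply an action to a position."""
    return x + action


def reward(x: float) -> float:
    """Higher reward the closer the position is to the target."""
    return -abs(TARGET - x)


label = greedy_label(3.0, ACTIONS, step, reward)
```

A classifier trained on many such (state, label) pairs would, if it fit them perfectly, reproduce the greedy choice for each of those states.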

In [None]:
from pod.board import PodBoard
from pod.ai.imitating_controller import ImitatingController
from pod.ai.rewards import re_dca
from pod.util import PodState
from pod.drawer import Drawer
from pod.ai.greedy_controller import GreedyController
from pod.controller import SimpleController
import matplotlib.pyplot as plt
from pod.ai.ai_utils import gen_pods, play_gen_pods
from pod.constants import Constants
import math

board = PodBoard.grid().shuffle()
controller = ImitatingController(GreedyController(board, re_dca))

## Training

### Using predefined states

One way to generate training data is to sample the state space systematically, taking every combination of a coarse grid of values for each state variable.
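
The grid idea can be sketched with a Cartesian product. This is not how `gen_pods` is necessarily implemented; `grid_states` is a hypothetical helper, and the radius and speed values below are placeholders rather than the real `Constants` values.

```python
import itertools
import math


def grid_states(*axes):
    """Return every combination that takes one value from each axis."""
    return list(itertools.product(*axes))


# Placeholder axes (assumed values, chosen only for illustration).
angles = [i * math.pi / 5 for i in range(5)]
distances = list(range(1000, 10000, 750))  # 1000 stands in for the checkpoint radius
speeds = [i * 100.0 for i in range(4)]     # 100.0 stands in for a velocity step

states = grid_states(angles, distances, speeds)
print("{} total states".format(len(states)))
```

The number of states is the product of the axis lengths, so each extra axis or finer step multiplies the size of the training set.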

In [None]:
pods = gen_pods(
    [board.checkpoints[0]],
    [i * math.pi / 5 for i in range(5)],
    list(range(Constants.check_radius(), 10000, 750)),
    [i * math.pi / 10 for i in range(10)],
    [i * math.pi / 7 for i in range(7)],
    [i * Constants.max_vel() / 3 for i in range(4)]
)

# TODO: training goes much better if I add extra pods pointing towards the check...why?

print("{} total states".format(len(pods)))

In [None]:
controller.board = board
accuracy = controller.train_from_states(pods, 1, 30, 3)

plt.plot(accuracy)
plt.show()

### Generating states by playing

Another way to generate training data is to start at a random position and play through a few turns, recording the states visited along the way.
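
As a rough sketch of this approach on a toy one-dimensional problem: `collect_states`, `toward_origin`, and `move` are hypothetical names, and the sketch makes no claim about how `train_by_playing` works internally or what its arguments mean.

```python
import random
from typing import Callable, List


def collect_states(
    policy: Callable[[float], float],
    step: Callable[[float, float], float],
    episodes: int,
    turns: int,
    rng: random.Random,
    low: float = -100.0,
    high: float = 100.0,
) -> List[float]:
    """Start each episode at a random position and record every state visited."""
    visited = []
    for _ in range(episodes):
        state = rng.uniform(low, high)
        for _ in range(turns):
            visited.append(state)
            state = step(state, policy(state))
    return visited


def toward_origin(x: float) -> float:
    """Toy policy (an assumption): move one unit towards x = 0."""
    return -1.0 if x > 0 else 1.0


def move(x: float, action: float) -> float:
    """Apply an action to a position."""
    return x + action


played = collect_states(toward_origin, move, episodes=3, turns=5, rng=random.Random(0))
```

Unlike a fixed grid, states gathered this way are concentrated along the paths the policy actually follows.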

In [None]:
controller.board = board
accuracy = controller.train_by_playing(200, 100, 30, 3)

plt.plot(accuracy)
plt.show()

## Results

Now that the model has been trained, let's see what it can do!

As a comparison, we also add a **GreedyController** (to which our trained controller should be identical).
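
Besides the visual comparison, one simple numeric check is the fraction of states on which two controllers choose the same action. The sketch below uses toy policies; `agreement_rate`, `greedy_policy`, and `imitating_policy` are hypothetical and do not call the `pod` API.

```python
from typing import Callable, Hashable, Sequence


def agreement_rate(
    policy_a: Callable[[float], Hashable],
    policy_b: Callable[[float], Hashable],
    states: Sequence[float],
) -> float:
    """Return the fraction of states on which both policies pick the same action."""
    if not states:
        raise ValueError("states must not be empty")
    matches = sum(1 for state in states if policy_a(state) == policy_b(state))
    return matches / len(states)


def greedy_policy(x: float) -> int:
    """Toy reference policy (an assumption): action 0 if x > 0, else action 1."""
    return 0 if x > 0 else 1


def imitating_policy(x: float) -> int:
    """Toy imitation that has learned a slightly wrong decision boundary."""
    return 0 if x > 2 else 1


rate = agreement_rate(greedy_policy, imitating_policy, [-3.0, -1.0, 1.0, 3.0, 5.0])
```

A rate of 1.0 on a set of states means the two policies are indistinguishable on those states, though not necessarily elsewhere.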

In [None]:
test_board = PodBoard.tester()
controller.board = test_board

drawer = Drawer(test_board, controllers=[controller, GreedyController(test_board, re_dca)])
drawer.animate(max_laps=2)

In [None]:
drawer.chart_rewards(re_dca)