# Q-Learning example

The following trains a **QController** to play the game. The **QController** keeps a Q-table that stores a Q-value for each (state, action) pair. The pod's real states and actions are continuous, so both are discretized before they are used as table keys.
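To make the Q-table idea concrete, here is a minimal sketch of a tabular Q-learner over discretized states. It is not the `QController` implementation: the `discretize` helper, the `TabularQ` class, the bin ranges, the action set and the `alpha`/`gamma` values are all hypothetical. It applies the standard Q-learning update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)].

```python
import math

def discretize(value, low, high, n_bins):
    """Map a continuous value to a bin index in [0, n_bins - 1] (hypothetical helper)."""
    clipped = min(max(value, low), high)
    frac = (clipped - low) / (high - low)
    return min(int(frac * n_bins), n_bins - 1)

class TabularQ:
    """Hypothetical Q-table: maps (discrete_state, discrete_action) -> Q-value."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = {}                # only visited (state, action) pairs are stored
        self.actions = list(actions)
        self.alpha = alpha         # learning rate
        self.gamma = gamma         # discount factor

    def value(self, state, action):
        # Unvisited pairs default to 0.0 without being inserted into the table
        return self.q.get((state, action), 0.0)

    def best_value(self, state):
        return max(self.value(state, a) for a in self.actions)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update toward r + gamma * max_a' Q(s', a')
        target = reward + self.gamma * self.best_value(next_state)
        old = self.value(state, action)
        self.q[(state, action)] = old + self.alpha * (target - old)

# Example: state = (angle-to-checkpoint bin, distance bin); the ranges are made up
state = (discretize(0.0, -math.pi, math.pi, 8), discretize(5000, 0, 20000, 5))
next_state = (discretize(0.3, -math.pi, math.pi, 8), discretize(3000, 0, 20000, 5))
actions = [(turn, thrust) for turn in (-18, 0, 18) for thrust in (0, 100)]

table = TabularQ(actions, alpha=0.5, gamma=0.9)
table.update(state, (0, 100), reward=1.0, next_state=next_state)
print(table.value(state, (0, 100)))  # 0.5
```

Storing only visited pairs keeps the table sparse, which matters because the number of possible discretized states grows multiplicatively with the number of bins per dimension.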

```python
from pod.board import PodBoard
from pod.ai.q_controller import QController
from pod.ai.rewards import re_dca, re_dcat
import matplotlib.pyplot as plt

board = PodBoard.trainer()
q_con = QController(board, re_dca)
```

Here we train the controller for 10 epochs of 20,000 episodes each, progressively reducing the amount of random exploration: the probability of a random move starts at 1.0 and drops by 0.1 per epoch, down to 0.1.
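This schedule resembles standard ε-greedy exploration, assuming `prob_rand_action` plays the role of ε (the probability of ignoring the Q-table and moving at random). A minimal sketch of that assumption follows; `exploration_schedule` and `choose_action` are hypothetical helpers, not part of `pod.ai`.

```python
import random

def exploration_schedule(num_epochs=10):
    """Linear decay of P(random move): 1.0, 0.9, ..., 0.1 for 10 epochs."""
    return [(num_epochs - p) / num_epochs for p in range(num_epochs)]

def choose_action(q, state, actions, prob_rand_action, rng=random):
    """Epsilon-greedy: explore with probability prob_rand_action, else act greedily."""
    if rng.random() < prob_rand_action:
        return rng.choice(actions)                             # explore
    return max(actions, key=lambda a: q.get((state, a), 0.0))  # exploit

rng = random.Random(0)
q = {("s0", "thrust"): 1.0, ("s0", "coast"): 0.2}
actions = ["coast", "thrust", "turn"]

print(exploration_schedule())
print(choose_action(q, "s0", actions, prob_rand_action=0.0, rng=rng))  # thrust
```

Starting fully random and shifting toward greedy play lets the table be populated broadly before the controller starts exploiting what it has learned.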

```python
# Start training on a fresh trainer board
q_con.board = PodBoard.trainer()

rewards = []
for p in range(10):
    # Exploration schedule: P(random move) goes 1.0, 0.9, ..., 0.1
    prob = (10 - p) / 10
    print(f"P(random move) = {prob}")
    results = q_con.train(
        num_episodes=20000,
        prob_rand_action=prob
    )
    # Average of the per-episode results returned by train()
    avg = sum(results) / len(results)
    print(f" ---> Average best reward: {avg}")
    rewards.append(avg)

print(f"Number of states in Q-table: {len(q_con.q_table)}")

plt.plot(rewards)
plt.xlabel("Epoch")
plt.legend(["Average best reward per epoch"])
plt.show()
```

Now that it has been trained, let's see the result! The trained controller races a **SimpleController** for two laps on a test board.

```python
from pod.drawer import Drawer
from pod.controller import SimpleController

board = PodBoard.tester()
q_con.board = board
drawer = Drawer(board, controllers=[q_con, SimpleController(board)])

drawer.animate(max_laps=2)
```

```python
drawer.chart_rewards(re_dcat)
```