# CS440/ECE448 Spring 2023
# MP11: Reinforcement Learning

The first thing you need to do is to download this file: <a href="mp11.zip">mp11.zip</a>.  If you want, you can also download <a href="mp11_extra.zip">mp11_extra.zip</a>, the extra credit assignment.  `mp11.zip` has the following content:

* `submitted.py`: Your homework. Edit, and then submit to <a href="https://www.gradescope.com/courses/486387">Gradescope</a>.
* `mp11_notebook.ipynb`: This is a <a href="https://anaconda.org/anaconda/jupyter">Jupyter</a> notebook to help you debug.  You can completely ignore it if you want, although you might find that it gives you useful instructions.
* `pong.py`: This is a program that plays Pong.  If called interactively, it will call the module `pong_display.py` to create a display, so that you can play.  If told to use a Q-learner, it will call your `submitted.py` to do Q-learning.
* `grade.py`: Once your homework seems to be working, you can test it by typing `python grade.py`, which will run the tests in `tests/test_visible.py`.
* `tests/test_visible.py`: This file contains about half of the <a href="https://docs.python.org/3/library/unittest.html">unit tests</a> that Gradescope will run in order to grade your homework.  If you can get a perfect score on these tests, then you should also get a perfect score on the additional hidden tests that Gradescope uses.
* `requirements.txt`: This tells you which python packages you need to have installed, in order to run `grade.py`.  You can install all of those packages by typing `pip install -r requirements.txt` or `pip3 install -r requirements.txt`.

This file (`mp11_notebook.ipynb`) will walk you through the whole MP, giving you instructions and debugging tips as you go.

### Table of Contents

1. <a href="#section1">Playing Pong</a>
1. <a href="#section2">Creating a Q-Learner Object</a>
1. <a href="#section3">Epsilon-First Exploration</a>
1. <a href="#section4">Q-Learning</a>
1. <a href="#section5">Saving and Loading Your Q and N Tables</a>
1. <a href="#section6">Exploitation</a>
1. <a href="#section7">Acting</a>
1. <a href="#section8">Training</a>
1. <a href="#section9">Extra Credit</a>
1. <a href="#grade">Grade Your Homework</a>

<a id='section1'></a>

## Playing Pong

Pong was the <a href="https://en.wikipedia.org/wiki/Pong">first video game produced by Atari.</a>  It is a simple game, based on table tennis.  Here is a two-person version of the game: https://commons.wikimedia.org/wiki/File:Pong_Game_Test2.gif

We will be playing a one-person version of the game:

* When the ball hits the top, bottom, or left wall of the playing field, it bounces.
* The right end of the playing field is open, except for the paddle.  If the ball hits the paddle, it bounces, and the player's score increments by one.  If the ball hits the open space, the game is over; the score resets to zero, and a new game begins.

The game is pretty simple, but in order to get a better feeling for it, you may want to try playing it yourself.  Use the up arrow to move the paddle upward, and the down arrow to move the paddle downward.  See how high you can make your score:


In [None]:
!python pong.py

Once you figure out how to use the arrow keys to control your paddle, we hope you will find that the game is not too hard for a human to play.  However, for a computer, it's difficult to know: where should the paddle be moved at each time step?  In order to see how difficult it is for a computer to play, let's ask the "random" player to play the game.

**WARNING:** The following line will open a pygame window.  The pygame window will be hidden behind this window; in order to see it, you will need to minimize this window.  The pygame window keeps consuming CPU time even while it is idle, so in order to kill it, you will need to come back to this window, click on the block below, then click the Jupyter "stop" button (the square button at the top of this window) to stop processing.

In [None]:
!python pong.py --player random

<a id='section2'></a>

## Creating a Q-Learner Object

The first thing you will do is to create a `q_learner` object that can store your learned Q table and your N table (table of exploration counts).  

Like any other object-oriented language, python permits you to create new object classes in order to store data that will be needed from time to time.  If you are not already very, very familiar with python classes, you might want to study the python class tutorial: https://docs.python.org/3/tutorial/classes.html

Like any other object in python, a `q_learner` object is created by calling its name as a function, e.g., `my_q_learner=submitted.q_learner()`.  Doing so calls the function `submitted.q_learner.__init__()`.  Let's look at the docstring to see what it should do.

In [1]:
import submitted, importlib
importlib.reload(submitted)
help(submitted.q_learner.__init__)

Help on function __init__ in module submitted:

__init__(self, alpha, epsilon, gamma, nfirst, state_cardinality)
    Create a new q_learner object.
    Your q_learner object should store the provided values of alpha,
    epsilon, gamma, and nfirst.
    It should also create a Q table and an N table.
    Q[...state..., ...action...] = expected utility of state/action pair.
    N[...state..., ...action...] = # times state/action has been explored.
    Both are initialized to all zeros.
    Up to you: how will you encode the state and action in order to
    define these two lookup tables?  The state will be a list of 5 integers,
    such that 0 <= state[i] < state_cardinality[i] for 0 <= i < 5.
    The action will be either -1, 0, or 1.
    It is up to you to decide how to convert an input state and action
    into indices that you can use to access your stored Q and N tables.
    
    @params:
    alpha (scalar) - learning rate of the Q-learner
    epsilon (scalar) - probability of takin

Write your `__init__` function to meet the requirements specified in the docstring.  Once you have completed it, the following code should run without errors:

In [2]:
importlib.reload(submitted)

q_learner = submitted.q_learner(0.05,0.05,0.99,5,[10,10,2,2,10])

print(q_learner)


<submitted.q_learner object at 0x7f8c131773d0>
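The table layout is left up to you, so the following is only a minimal sketch of one possible encoding, assuming the tables are numpy arrays with one axis per state variable plus a final axis for the action.  The names `table_index`, `Q`, and `N` are hypothetical, and the `action + 1` shift is just one choice of action-to-index mapping; the sample outputs in this notebook may reflect a different one.

```python
import numpy as np

# Hypothetical layout: one axis per state variable, plus one axis for the action.
state_cardinality = [10, 10, 2, 2, 10]
n_actions = 3  # the actions are -1, 0, and 1

# One entry per (state, action) pair, all initialized to zero.
Q = np.zeros(state_cardinality + [n_actions])
N = np.zeros(state_cardinality + [n_actions])

def table_index(state, action):
    """Map a state (list of 5 ints) and an action in {-1, 0, 1} to a tuple index."""
    return tuple(state) + (action + 1,)  # shift the action to 0, 1, or 2

Q[table_index([9, 9, 1, 1, 9], -1)] = 0.5
print(Q.shape)  # (10, 10, 2, 2, 10, 3)
```

With this layout, `Q[tuple(state)]` is the length-3 row of Q-values for one state, which is the kind of array that a reporting method could return.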


<a id='section3'></a>

## Epsilon-First Exploration

In order to manage the exploration/exploitation tradeoff, we will be using both "epsilon-first" and "epsilon-greedy" (https://en.wikipedia.org/wiki/Multi-armed_bandit#Semi-uniform_strategies).  

The epsilon-first strategy explores every state/action pair at least `nfirst` times before it ever starts to exploit any strategy.  Your `q_learner` should have a table to keep track of how many times it has explored a state/action pair prior to the start of any exploitation.  The method for storing that table is up to you; in order to have some standardized API, therefore, you need to write a method called `report_exploration_counts` that returns a list of the three exploration counts for a given state. 

In [3]:
importlib.reload(submitted)
help(submitted.q_learner.report_exploration_counts)

Help on function report_exploration_counts in module submitted:

report_exploration_counts(self, state)
    Check to see how many times each action has been explored in this state.
    @params:
    state (list of 5 ints): ball_x, ball_y, ball_vx, ball_vy, paddle_y.
      These are the (x,y) position of the ball, the (vx,vy) velocity of the ball,
      and the y-position of the paddle, all quantized.
      0 <= state[i] < state_cardinality[i], for all i in [0,4].
    
    @return:
    explored_count (array of 3 ints): 
      number of times that each action has been explored from this state.
      The mapping from actions to integers is up to you, but there must be three of them.



Write `report_exploration_counts` so that it returns a list or array for any given state.  Test your code with the following:

In [4]:
importlib.reload(submitted)
q_learner = submitted.q_learner(0.05,0.05,0.99,5,[10,10,2,2,10])
print('This is how many times state [0,0,0,0,0] has been explored so far:')
print(q_learner.report_exploration_counts([0,0,0,0,0]))
print('This is how many times state [9,9,1,1,9] has been explored so far:')
print(q_learner.report_exploration_counts([9,9,1,1,9]))

This is how many times state [0,0,0,0,0] has been explored so far:
[0. 0. 0.]
This is how many times state [9,9,1,1,9] has been explored so far:
[0. 0. 0.]


When your learner first starts learning, it will call the function `choose_unexplored_action` to choose an unexplored action.  This function should choose an action uniformly at random from the set of unexplored actions in a given state, if there are any:

In [5]:
importlib.reload(submitted)
help(submitted.q_learner.choose_unexplored_action)

Help on function choose_unexplored_action in module submitted:

choose_unexplored_action(self, state)
    Choose an action that has been explored less than nfirst times.
    If many actions are underexplored, you should choose uniformly
    from among those actions; don't just choose the first one all
    the time.
    
    @params:
    state (list of 5 ints): ball_x, ball_y, ball_vx, ball_vy, paddle_y.
       These are the (x,y) position of the ball, the (vx,vy) velocity of the ball,
      and the y-position of the paddle, all quantized.
      0 <= state[i] < state_cardinality[i], for all i in [0,4].
    
    @return:
    action (scalar): either -1, or 0, or 1, or None
      If all actions have been explored at least n_explore times, return None.
      Otherwise, choose one uniformly at random from those w/count less than n_explore.
      When you choose an action, you should increment its count in your counter table.



If this has been written correctly, the following block should generate a random sequence of actions.  If it produces the same action 5 times in a row, your code is not choosing uniformly at random, and it will not pass the autograder.

In [6]:
importlib.reload(submitted)
q_learner = submitted.q_learner(0.05,0.05,0.99,5,[10,10,2,2,10])
print('Next action:',q_learner.choose_unexplored_action([9,9,1,1,9]))
print('Next action:',q_learner.choose_unexplored_action([9,9,1,1,9]))
print('Next action:',q_learner.choose_unexplored_action([9,9,1,1,9]))
print('Next action:',q_learner.choose_unexplored_action([9,9,1,1,9]))
print('Next action:',q_learner.choose_unexplored_action([9,9,1,1,9]))


Next action: 1
Next action: -1
Next action: 0
Next action: 1
Next action: 0


After all three actions have been explored `nfirst` times, the function `choose_unexplored_action` should return `None`, as shown here:

In [7]:
importlib.reload(submitted)
q_learner = submitted.q_learner(0.05,0.05,0.99,1,[10,10,2,2,10])
print('Next action:',q_learner.choose_unexplored_action([9,9,1,1,9]))
print('Next action:',q_learner.choose_unexplored_action([9,9,1,1,9]))
print('Next action:',q_learner.choose_unexplored_action([9,9,1,1,9]))
print('Next action:',q_learner.choose_unexplored_action([9,9,1,1,9]))


Next action: 0
Next action: -1
Next action: 1
Next action: None
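The epsilon-first rule described above can be sketched on a single row of counts.  This is an illustration under assumptions, not the required implementation: `pick_underexplored` is a hypothetical helper, and the row is assumed to store counts for actions `-1`, `0`, `1` in that order.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

def pick_underexplored(counts, nfirst):
    """counts: length-3 array of exploration counts for actions (-1, 0, 1).

    Returns an action in {-1, 0, 1}, or None if every count is >= nfirst.
    The chosen action's count is incremented in place.
    """
    candidates = np.flatnonzero(counts < nfirst)
    if candidates.size == 0:
        return None
    i = rng.choice(candidates)  # uniform over the under-explored actions
    counts[i] += 1
    return int(i) - 1           # map index 0, 1, 2 back to action -1, 0, 1

counts = np.zeros(3)
actions = [pick_underexplored(counts, 1) for _ in range(4)]
print(actions)
```

With `nfirst=1`, the first three results are some ordering of `-1`, `0`, and `1`, and the fourth is `None`.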


<a id='section4'></a>

## Q-Learning

The reinforcement learning we are implementing is called Q-learning (https://en.wikipedia.org/wiki/Q-learning).  

Q-learning keeps a table $Q[s,a]$ that specifies the expected utility of action $a$ in state $s$.  The organization of this table is up to you.  In order to have a standard API, the first thing you should implement is a function `report_q` with the following docstring:

In [8]:
importlib.reload(submitted)
help(submitted.q_learner.report_q)

Help on function report_q in module submitted:

report_q(self, state)
    Report the current Q values for the given state.
    @params:
    state (list of 5 ints): ball_x, ball_y, ball_vx, ball_vy, paddle_y.
      These are the (x,y) position of the ball, the (vx,vy) velocity of the ball,
      and the y-position of the paddle, all quantized.
      0 <= state[i] < state_cardinality[i], for all i in [0,4].
    
    @return:
    Q (array of 3 floats): 
      reward plus expected future utility of each of the three actions. 
      The mapping from actions to integers is up to you, but there must be three of them.



When your `q_learner` is first initialized, the value of $Q[state,action]$ should be zero for all state/action pairs, thus the `report_q` function should return lists of zeros:

In [9]:
importlib.reload(submitted)
q_learner=submitted.q_learner(0.05,0.05,0.99,5,[10,10,2,2,10])
print('Q[0,0,0,0,0] is now:',q_learner.report_q([0,0,0,0,0]))
print('Q[9,9,1,1,9] is now:',q_learner.report_q([9,9,1,1,9]))

Q[0,0,0,0,0] is now: [0. 0. 0.]
Q[9,9,1,1,9] is now: [0. 0. 0.]


There are actually many different Q-learning algorithms available, but when people refer to Q-learning with no modifier, they usually mean the temporal-difference (TD) algorithm.  For example, this is the algorithm that's described on the wikipedia page (https://en.wikipedia.org/wiki/Q-learning).  This is the algorithm you will implement for this MP.

In supervised machine learning, the learner tries to imitate a reference label.  In reinforcement learning, there is no reference label.  Q-learning replaces the reference label with a "local Q" value, which is the utility that was obtained by performing action $a$ in state $s$ one time.  It is usually calculated like this:

$$Q_{local}(s_t,a_t) = r_t + \gamma\max_{a_{t+1}}Q(s_{t+1},a_{t+1})$$

where $r_t$ is the reward that was achieved by performing action $a_t$ in state $s_t$, $s_{t+1}$ is the state into which the game transitioned, and $a_{t+1}$ is one of the actions that could be performed in that state.  $Q_{local}$ is computed by your `q_local` function, which has this docstring:  

In [10]:
importlib.reload(submitted)
help(submitted.q_learner.q_local)

Help on function q_local in module submitted:

q_local(self, reward, newstate)
    The update to Q estimated from a single step of game play:
    reward plus gamma times the max of Q[newstate, ...].
    
    @param:
    reward (scalar float): the reward achieved from the current step of game play.
    newstate (list of 5 ints): ball_x, ball_y, ball_vx, ball_vy, paddle_y.
      These are the (x,y) position of the ball, the (vx,vy) velocity of the ball,
      and the y-position of the paddle, all quantized.
      0 <= state[i] < state_cardinality[i], for all i in [0,4].
    
    @return:
    Q_local (scalar float): the local value of Q



Initially, `q_local` should just return the given reward, because initially, all Q values are 0:

In [11]:
importlib.reload(submitted)
q_learner = submitted.q_learner(0.05,0.05,0.99,5,[10,10,2,2,10])
print('Q_local(6.25,[9,9,1,1,9]) is currently:',q_learner.q_local(6.25,[9,9,1,1,9]))

Q_local(6.25,[9,9,1,1,9]) is currently: 6.25
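As a small illustration of the $Q_{local}$ formula, here is a sketch that works on a bare row of next-state Q-values rather than on a full table.  The function name `q_local_sketch` and the example numbers are made up for this illustration.

```python
import numpy as np

def q_local_sketch(reward, q_next, gamma):
    """Return reward + gamma * max over the next state's Q-values.

    q_next is the length-3 array of Q-values for the next state.
    """
    return reward + gamma * np.max(q_next)

# With an all-zero next-state row, the result is just the reward.
print(q_local_sketch(6.25, np.zeros(3), 0.99))  # 6.25

# With a nonzero row, only the largest next-state value contributes.
value = q_local_sketch(-1.0, np.array([1.0, -2.0, 0.5]), 0.99)
print(value)
```

The second call evaluates to approximately `-1 + 0.99 * 1.0`, since `1.0` is the largest of the three next-state values.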


Now you can use `q_learner.q_local` as the target for `q_learner.learn`.  The basic algorithm is

$$Q(s,a) = Q(s,a) + \alpha (Q_{local}(s,a)-Q(s,a))$$

Here is the docstring:

In [12]:
importlib.reload(submitted)
help(submitted.q_learner.learn)

Help on function learn in module submitted:

learn(self, state, action, reward, newstate)
    Update the internal Q-table on the basis of an observed
    state, action, reward, newstate sequence.
    
    @params:
    state: a list of 5 numbers: ball_x, ball_y, ball_vx, ball_vy, paddle_y.
      These are the (x,y) position of the ball, the (vx,vy) velocity of the ball,
      and the y-position of the paddle.
    action: an integer, one of -1, 0, or +1
    reward: a reward; positive for hitting the ball, negative for losing a game
    newstate: a list of 5 numbers, in the same format as state
    
    @return:
    None



The following block checks a sequence of Q updates:

1. First, $Q([9,9,1,1,9],-1)$ is updated.  Since all Q values start at zero, it will be updated to just have a value equal to $\alpha$ (0.05) times the given reward (6.25) for a total value of 0.3125.
1. When we print out $Q([9,9,1,1,9],:)$, we see that one of the elements has been updated.
1. Next, update $Q([9,9,1,1,8],1)$ with a given reward, and with $[9,9,1,1,9]$ as the given next state.  Since $Q([9,9,1,1,9],-1)$ is larger than zero, the next-state Q-value should be multiplied by $\gamma$ (0.99) and added to the reward (3.1), then multiplied by $\alpha$, giving a total value of 0.17046875.
1. The resulting Q-value is reported.

In [13]:
importlib.reload(submitted)
q_learner = submitted.q_learner(0.05,0.05,0.99,5,[10,10,2,2,10])
q_learner.learn([9,9,1,1,9],-1,6.25,[0,0,0,0,0])
print('Q[9,9,1,1,9] is now',q_learner.report_q([9,9,1,1,9]))
q_learner.learn([9,9,1,1,8],1,3.1,[9,9,1,1,9])
print('Q[9,9,1,1,8] is now',q_learner.report_q([9,9,1,1,8]))

Q[9,9,1,1,9] is now [0.     0.     0.3125]
Q[9,9,1,1,8] is now [0.         0.17046875 0.        ]
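The two numbers in this sequence can be checked by hand with the update rule.  The sketch below redoes the arithmetic on scalars only; `td_update` is a hypothetical helper, not part of `submitted.py`.

```python
alpha, gamma = 0.05, 0.99

def td_update(q_old, reward, max_q_next):
    """One TD step: move q_old a fraction alpha of the way toward Q_local."""
    q_local = reward + gamma * max_q_next
    return q_old + alpha * (q_local - q_old)

# Step 1: all Q values are zero, so the target is just the reward 6.25.
q1 = td_update(0.0, 6.25, 0.0)
# Step 2: the next state's best Q value is now q1, and the reward is 3.1.
q2 = td_update(0.0, 3.1, q1)
print(round(q1, 8), round(q2, 8))
```

The second value is $0.05 \times (3.1 + 0.99 \times 0.3125) = 0.05 \times 3.409375 = 0.17046875$.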


<a id='section5'></a>

## Saving and Loading Your Q and N Tables

After you've spent a long time training your `q_learner`, you will want to save your Q and N tables so that you can reload them later.  The format of Q and N is up to you, therefore it's also up to you to write the `save` and `load` functions.  Here are the docstrings:

In [14]:
importlib.reload(submitted)
help(submitted.q_learner.save)

Help on function save in module submitted:

save(self, filename)
    Save your Q and N tables to a file.
    This can save in any format you like, as long as your "load" 
    function uses the same file format.  We recommend numpy.savez,
    but you can use something else if you prefer.
    
    @params:
    filename (str) - filename to which it should be saved
    @return:
    None



In [15]:
importlib.reload(submitted)
help(submitted.q_learner.load)

Help on function load in module submitted:

load(self, filename)
    Load the Q and N tables from a file.
    This should load from whatever file format your save function
    used.  We recommend numpy.load, but you can use something
    else if you prefer.
    
    @params:
    filename (str) - filename from which it should be loaded
    @return:
    None



These functions can be tested by doing one step of training one `q_learner`, then saving its results, then loading them into another `q_learner`:

In [16]:
importlib.reload(submitted)
q_learner1 = submitted.q_learner(0.05,0.05,0.99,5,[10,10,2,2,10])
print('Next action:',q_learner1.choose_unexplored_action([9,9,1,1,9]))
q_learner1.learn([9,9,1,1,9],-1,6.25,[0,0,0,0,0])
print('N1[9,9,1,1,9] is now',q_learner1.report_exploration_counts([9,9,1,1,9]))
print('Q1[9,9,1,1,9] is now',q_learner1.report_q([9,9,1,1,9]))
q_learner1.save('test.npz')

q_learner2 = submitted.q_learner(0.05,0.05,0.99,5,[10,10,2,2,10])
print('N2[9,9,1,1,9] starts out as',q_learner2.report_exploration_counts([9,9,1,1,9]))
print('Q2[9,9,1,1,9] starts out as',q_learner2.report_q([9,9,1,1,9]))
q_learner2.load('test.npz')
print('N2[9,9,1,1,9] is now',q_learner2.report_exploration_counts([9,9,1,1,9]))
print('Q2[9,9,1,1,9] is now',q_learner2.report_q([9,9,1,1,9]))


Next action: 0
N1[9,9,1,1,9] is now [0. 1. 1.]
Q1[9,9,1,1,9] is now [0.     0.     0.3125]
N2[9,9,1,1,9] starts out as [0. 0. 0.]
Q2[9,9,1,1,9] starts out as [0. 0. 0.]
N2[9,9,1,1,9] is now [0. 1. 1.]
Q2[9,9,1,1,9] is now [0.     0.     0.3125]
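If your tables are numpy arrays, the recommended `numpy.savez` and `numpy.load` calls can round-trip them as in the sketch below.  The array shapes, the keyword names `Q` and `N`, and the temporary file path are assumptions made for this illustration.

```python
import os
import tempfile
import numpy as np

# Stand-in tables; the real ones would live inside your q_learner object.
Q = np.zeros((10, 10, 2, 2, 10, 3))
N = np.zeros((10, 10, 2, 2, 10, 3))
Q[9, 9, 1, 1, 9, 0] = 0.3125
N[9, 9, 1, 1, 9, 0] = 1

# Save both arrays into one .npz archive under the keys 'Q' and 'N'.
path = os.path.join(tempfile.mkdtemp(), 'tables.npz')
np.savez(path, Q=Q, N=N)

# Load them back; each array is read out of the archive by its key.
with np.load(path) as data:
    Q2, N2 = data['Q'], data['N']

print(np.array_equal(Q, Q2), np.array_equal(N, N2))  # True True
```

Whatever keys `save` writes, `load` has to read the same keys back and assign them to the object's own tables.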


<a id='section6'></a>

## Exploitation

A reinforcement learner always has to trade off between exploration (choosing an action at random) versus exploitation (choosing the action with the maximum expected utility).  Before we worry about that tradeoff, though, let's first make sure that exploitation works.

In [17]:
importlib.reload(submitted)
help(submitted.q_learner.exploit)

Help on function exploit in module submitted:

exploit(self, state)
    Return the action that has the highest Q-value for the current state, and its Q-value.
    @params:
    state (list of 5 ints): ball_x, ball_y, ball_vx, ball_vy, paddle_y.
      These are the (x,y) position of the ball, the (vx,vy) velocity of the ball,
      and the y-position of the paddle, all quantized.
      0 <= state[i] < state_cardinality[i], for all i in [0,4].
    
    @return:
    action (scalar int): either -1, or 0, or 1.
      The action that has the highest Q-value.  Ties can be broken any way you want.
    Q (scalar float): 
      The Q-value of the selected action



In [18]:
importlib.reload(submitted)
q_learner1 = submitted.q_learner(0.05,0.05,0.99,5,[10,10,2,2,10])
q_learner1.learn([9,9,1,1,9],-1,6.25,[0,0,0,0,0])
print('Q1[9,9,1,1,9] is now',q_learner1.report_q([9,9,1,1,9]))
print('The best action and Q from state [9,9,1,1,9] are',q_learner1.exploit([9,9,1,1,9]))

Q1[9,9,1,1,9] is now [0.     0.     0.3125]
The best action and Q from state [9,9,1,1,9] are (-1, 0.3125)
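Exploitation amounts to an argmax over one row of the Q table.  Here is a minimal sketch, assuming the row stores the Q-values of actions `-1`, `0`, `1` in that order; `best_action` and `ACTIONS` are hypothetical names.

```python
import numpy as np

ACTIONS = (-1, 0, 1)  # assumed order of the three entries in a Q row

def best_action(q_row):
    """Return (action, Q-value) for the largest entry of a length-3 Q row."""
    i = int(np.argmax(q_row))  # on ties, argmax returns the first maximum
    return ACTIONS[i], float(q_row[i])

print(best_action(np.array([0.3125, 0.0, 0.0])))  # (-1, 0.3125)
```

Since ties can be broken any way you want, taking the first maximum is acceptable.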


<a id='section7'></a>

## Acting

When your learner decides which action to perform, it should trade off exploration vs. exploitation using both the epsilon-first and the epsilon-greedy strategies:
1. If there is any action that has been explored fewer than `nfirst` times, then choose one of those actions at random.  Otherwise...
1. With probability `epsilon`, choose an action at random.  Otherwise...
1. Exploit.


In [19]:
importlib.reload(submitted)
help(submitted.q_learner.act)

Help on function act in module submitted:

act(self, state)
    Decide what action to take in the current state.
    If any action has been taken less than nfirst times, then choose one of those
    actions, uniformly at random.
    Otherwise, with probability epsilon, choose an action uniformly at random.
    Otherwise, choose the action with the best Q(state,action).
    
    @params: 
    state: a list of 5 integers: ball_x, ball_y, ball_vx, ball_vy, paddle_y.
      These are the (x,y) position of the ball, the (vx,vy) velocity of the ball,
      and the y-position of the paddle, all quantized.
      0 <= state[i] < state_cardinality[i], for all i in [0,4].
    
    @return:
    -1 if the paddle should move upward
    0 if the paddle should be stationary
    1 if the paddle should move downward



In order to test all three types of action (epsilon-first exploration, epsilon-greedy exploration, and exploitation), let's create a learner with `nfirst=1` and `epsilon=0.25`, and set it so that the best action from state `[9,9,1,1,9]` is `-1`.  With these settings, a sequence of calls to `q_learner.act` should produce the following sequence of actions:

1. The first three actions should include each possible action once.
1. After the first three actions, about 3/4 of the remaining actions should be exploitation, which here means `-1`.  The other 1/4 should be chosen uniformly at random, so some of those will also be `-1`.

In [20]:
importlib.reload(submitted)
q_learner=submitted.q_learner(0.05,0.25,0.99,1,[10,10,2,2,10])
q_learner.learn([9,9,1,1,9],-1,6.25,[0,0,0,0,0])
print('An epsilon-first action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-first action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-first action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))
print('An epsilon-greedy explore/exploit action:',q_learner.act([9,9,1,1,9]))

An epsilon-first action: 0
An epsilon-first action: -1
An epsilon-first action: -1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: 1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: -1
An epsilon-greedy explore/exploit action: -1
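To see why `-1` dominates the later actions, the epsilon-greedy step alone (after epsilon-first exploration has finished) can be simulated.  This is a sketch under assumptions: `epsilon_greedy` is a hypothetical helper, and the Q row is assumed to list actions `-1`, `0`, `1` in that order.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
ACTIONS = (-1, 0, 1)

def epsilon_greedy(q_row, epsilon):
    """With probability epsilon pick uniformly at random, else pick the argmax."""
    if rng.random() < epsilon:
        return ACTIONS[rng.integers(3)]
    return ACTIONS[int(np.argmax(q_row))]

q_row = np.array([0.3125, 0.0, 0.0])  # action -1 has the largest Q-value
draws = [epsilon_greedy(q_row, 0.25) for _ in range(20000)]
frac_best = draws.count(-1) / len(draws)
print(frac_best)
```

The printed fraction should land near $0.75 + 0.25/3 \approx 0.83$, because a random draw also picks `-1` one third of the time.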


<a id='section8'></a>

## Training

Now that all of your components work, you can try training your algorithm.  Do this by giving your `q_learner` as a player to a new `pong.PongGame` object.  Set `visible=False` so that the `PongGame` doesn't create a new window.

In [21]:
import pong, importlib, submitted
importlib.reload(pong)
help(pong.PongGame.__init__)

Help on function __init__ in module pong:

__init__(self, ball_speed=4, paddle_speed=8, learner=None, visible=True, state_quantization=[10, 10, 2, 2, 10])
    Create a new pong game, with a specified player.
    
    @params:
    ball_speed (scalar int) - average ball speed in pixels/frame
    paddle_speed (scalar int) - paddle moves 0, +paddle_speed, or -paddle_speed
    learner - can be None if the player is human.  If not None, should be an 
      object of type random_learner, submitted.q_learner, or submitted.deep_q.
    visible (bool) - should this game have an attached pygame window?
    state_quantization (list) - if not None, state variables are quantized
      into integers of these cardinalities before being passed to the learner.



As you can see, we should set `visible=False` so that the `PongGame` doesn't create a new window.  We should also make sure that the PongGame uses the same state quantization as the learner.

In [22]:
importlib.reload(pong)
importlib.reload(submitted)
state_quantization = [10,10,2,2,10]
q_learner=submitted.q_learner(0.05,0.05,0.99,5,state_quantization)

pong_game = pong.PongGame(learner=q_learner, visible=False, state_quantization=state_quantization)
print(pong_game)

<pong.PongGame object at 0x7f8c12d42a60>


In order to train our learner, we want it to play the game many times.  To do that we use the PongGame.run function:

In [23]:
help(pong_game.run)


Help on method run in module pong:

run(m_rewards=inf, m_games=inf, m_frames=inf, states=[]) method of pong.PongGame instance
    Run the game.
    @param
    m_frames (scalar int): maximum number of frames to be played
    m_rewards (scalar int): maximum number of rewards earned (+ or -)
    m_games (scalar int): maximum number of games
    states (list): list of states whose Q-values should be returned
       each state is a list of 5 ints: ball_x, ball_y, ball_vx, ball_vy, paddle_y.
       These are the (x,y) position of the ball, the (vx,vy) velocity of the ball,
       and the y-position of the paddle, all quantized.
       0 <= state[i] < state_cardinality[i], for all i in [0,4].
    
      
    @return
    scores (list): list of scores of all completed games
    
    The following will be returned only if the player is q_learning or deep_q.
    New elements will be added to these lists once/frame if m_frames is specified,
    else once/reward if m_rewards is specified, else once

In order to make sure our learner is learning, let's tell `pong_game.run` to output all 3 Q-values of all of the 4000 states in every time step.

To make sure that's not an outrageous amount of data, let's tell it to only output the Q values once/reward, and ask it to only collect 500 rewards:


In [24]:
states = [[x,y,vx,vy,py] for x in range(10) for y in range(10) for vx in range(2) for vy in range(2) for py in range(10) ]

scores, q_achieved, q_states = pong_game.run(m_rewards=500, states=states)

print('The number of games played was',len(scores))
print('The number of rewards was',len(q_states))
print('The size of each returned Q-matrix was',q_states[0].shape)


Completed 0 games, 1 rewards, 209 frames, score 0, max score 0
Completed 1 games, 2 rewards, 417 frames, score 0, max score 0
Completed 2 games, 5 rewards, 912 frames, score 2, max score 2
Completed 3 games, 6 rewards, 1166 frames, score 0, max score 2
Completed 4 games, 7 rewards, 1420 frames, score 0, max score 2
Completed 5 games, 8 rewards, 1674 frames, score 0, max score 2
Completed 6 games, 9 rewards, 1855 frames, score 0, max score 2
Completed 7 games, 10 rewards, 2063 frames, score 0, max score 2
Completed 8 games, 11 rewards, 2317 frames, score 0, max score 2
Completed 9 games, 12 rewards, 2525 frames, score 0, max score 2
Completed 10 games, 13 rewards, 2779 frames, score 0, max score 2
Completed 11 games, 14 rewards, 2987 frames, score 0, max score 2
Completed 12 games, 15 rewards, 3241 frames, score 0, max score 2
Completed 13 games, 20 rewards, 4123 frames, score 4, max score 4
Completed 14 games, 21 rewards, 4377 frames, score 0, max score 4
Completed 15 games, 22 rewards

Completed 127 games, 209 rewards, 47067 frames, score 1, max score 5
Completed 128 games, 212 rewards, 47701 frames, score 2, max score 5
Completed 129 games, 213 rewards, 47955 frames, score 0, max score 5
Completed 130 games, 215 rewards, 48322 frames, score 1, max score 5
Completed 131 games, 220 rewards, 49366 frames, score 4, max score 5
Completed 132 games, 221 rewards, 49620 frames, score 0, max score 5
Completed 133 games, 222 rewards, 49801 frames, score 0, max score 5
Completed 134 games, 223 rewards, 50009 frames, score 0, max score 5
Completed 135 games, 224 rewards, 50217 frames, score 0, max score 5
Completed 136 games, 225 rewards, 50398 frames, score 0, max score 5
Completed 137 games, 226 rewards, 50606 frames, score 0, max score 5
Completed 138 games, 227 rewards, 50860 frames, score 0, max score 5
Completed 139 games, 228 rewards, 51068 frames, score 0, max score 5
Completed 140 games, 229 rewards, 51276 frames, score 0, max score 5
Completed 141 games, 230 rewards, 

Completed 248 games, 437 rewards, 101612 frames, score 6, max score 6
Completed 249 games, 440 rewards, 103636 frames, score 2, max score 6
Completed 250 games, 443 rewards, 104377 frames, score 2, max score 6
Completed 251 games, 451 rewards, 107356 frames, score 7, max score 7
Completed 252 games, 455 rewards, 108110 frames, score 3, max score 7
Completed 253 games, 460 rewards, 109065 frames, score 4, max score 7
Completed 254 games, 461 rewards, 109319 frames, score 0, max score 7
Completed 255 games, 462 rewards, 109527 frames, score 0, max score 7
Completed 256 games, 467 rewards, 110508 frames, score 4, max score 7
Completed 257 games, 471 rewards, 111258 frames, score 3, max score 7
Completed 258 games, 473 rewards, 111721 frames, score 1, max score 7
Completed 259 games, 476 rewards, 112478 frames, score 2, max score 7
Completed 260 games, 480 rewards, 113766 frames, score 3, max score 7
Completed 261 games, 481 rewards, 113947 frames, score 0, max score 7
Completed 262 games,

The returned value of `q_states` is a list of 4000x3 numpy arrays (4000 states, 3 actions).  The list contains `m_rewards` of these. We want to convert it into something that matplotlib can plot.  

In [25]:
import numpy as np

Q = np.array([np.reshape(q,-1) for q in q_states])
print('Q is now of shape',Q.shape)
print('the max absolute value of Q is ',np.amax(abs(Q)))

Q is now of shape (500, 12000)
the max absolute value of Q is  2.2079640765998665


In [26]:
%matplotlib inline

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(14,6))
ax = [ fig.add_subplot(2,1,x) for x in range(1,3) ]
ax[0].plot(np.arange(0,len(q_states)),Q)
ax[0].set_title('Q values of all states')
ax[1].plot(np.arange(0,len(q_states)),q_achieved)
ax[1].set_title('Q values of state achieved at each time')
ax[1].set_xlabel('Reward number')
# fig.tight_layout() also works on matplotlib versions whose figure() rejects layout=
fig.tight_layout()

OK, now let's try running it for a much longer period -- say, 5000 complete games.  We won't ask it to print out any states this time.

In [27]:
scores, q_achieved, q_states = pong_game.run(m_games=5000, states=[])

print('The number of games played was',len(scores))
print('The number of video frames was',len(q_states))
print('The size of each returned Q-matrix was',q_states[0].shape)


Completed 0 games, 6 rewards, 1084 frames, score 5, max score 8
Completed 1 games, 9 rewards, 1499 frames, score 2, max score 8
Completed 2 games, 12 rewards, 2235 frames, score 2, max score 8
Completed 3 games, 13 rewards, 2416 frames, score 0, max score 8
Completed 4 games, 18 rewards, 3228 frames, score 4, max score 8
Completed 5 games, 23 rewards, 4190 frames, score 4, max score 8
Completed 6 games, 24 rewards, 4444 frames, score 0, max score 8
Completed 7 games, 26 rewards, 4956 frames, score 1, max score 8
Completed 8 games, 28 rewards, 5690 frames, score 1, max score 8
Completed 9 games, 33 rewards, 6869 frames, score 4, max score 8
Completed 10 games, 39 rewards, 8241 frames, score 5, max score 8
Completed 11 games, 43 rewards, 9126 frames, score 3, max score 8
Completed 12 games, 44 rewards, 9334 frames, score 0, max score 8
Completed 13 games, 45 rewards, 9515 frames, score 0, max score 8
Completed 14 games, 50 rewards, 10784 frames, score 4, max score 8
Completed 15 games, ...

Completed 137 games, 294 rewards, 74534 frames, score 5, max score 8
Completed 138 games, 295 rewards, 74742 frames, score 0, max score 8
Completed 139 games, 297 rewards, 75125 frames, score 1, max score 8
Completed 140 games, 303 rewards, 78263 frames, score 5, max score 8
Completed 141 games, 304 rewards, 78517 frames, score 0, max score 8
Completed 142 games, 313 rewards, 80065 frames, score 8, max score 8
Completed 143 games, 314 rewards, 80319 frames, score 0, max score 8
Completed 144 games, 316 rewards, 80921 frames, score 1, max score 8
Completed 145 games, 317 rewards, 81129 frames, score 0, max score 8
Completed 146 games, 321 rewards, 82182 frames, score 3, max score 8
Completed 147 games, 325 rewards, 83480 frames, score 3, max score 8
Completed 148 games, 326 rewards, 83661 frames, score 0, max score 8
Completed 149 games, 328 rewards, 84044 frames, score 1, max score 8
Completed 150 games, 330 rewards, 84778 frames, score 1, max score 8
Completed 151 games, 331 rewards, ...

Completed 257 games, 601 rewards, 152741 frames, score 2, max score 10
Completed 258 games, 602 rewards, 152949 frames, score 0, max score 10
Completed 259 games, 604 rewards, 153551 frames, score 1, max score 10
Completed 260 games, 607 rewards, 154187 frames, score 2, max score 10
Completed 261 games, 610 rewards, 155141 frames, score 2, max score 10
Completed 262 games, 614 rewards, 155878 frames, score 3, max score 10
Completed 263 games, 615 rewards, 156059 frames, score 0, max score 10
Completed 264 games, 616 rewards, 156267 frames, score 0, max score 10
Completed 265 games, 617 rewards, 156521 frames, score 0, max score 10
Completed 266 games, 620 rewards, 157240 frames, score 2, max score 10
Completed 267 games, 622 rewards, 157830 frames, score 1, max score 10
Completed 268 games, 623 rewards, 158038 frames, score 0, max score 10
Completed 269 games, 624 rewards, 158219 frames, score 0, max score 10
Completed 270 games, 628 rewards, 158940 frames, score 3, max score 10
...

Completed 386 games, 995 rewards, 254573 frames, score 2, max score 10
Completed 387 games, 998 rewards, 255292 frames, score 2, max score 10
Completed 388 games, 1001 rewards, 256240 frames, score 2, max score 10
Completed 389 games, 1005 rewards, 257115 frames, score 3, max score 10
Completed 390 games, 1008 rewards, 258490 frames, score 2, max score 10
Completed 391 games, 1010 rewards, 259409 frames, score 1, max score 10
Completed 392 games, 1011 rewards, 259590 frames, score 0, max score 10
Completed 393 games, 1015 rewards, 260265 frames, score 3, max score 10
Completed 394 games, 1020 rewards, 261463 frames, score 4, max score 10
Completed 395 games, 1025 rewards, 262735 frames, score 4, max score 10
Completed 396 games, 1028 rewards, 263875 frames, score 2, max score 10
Completed 397 games, 1030 rewards, 264216 frames, score 1, max score 10
Completed 398 games, 1035 rewards, 265268 frames, score 4, max score 10
Completed 399 games, 1037 rewards, 265916 frames, score 1, max score 10
...

Completed 505 games, 1400 rewards, 356526 frames, score 7, max score 24
Completed 506 games, 1402 rewards, 357260 frames, score 1, max score 24
Completed 507 games, 1408 rewards, 358103 frames, score 5, max score 24
Completed 508 games, 1413 rewards, 359750 frames, score 4, max score 24
Completed 509 games, 1420 rewards, 361585 frames, score 6, max score 24
Completed 510 games, 1423 rewards, 362358 frames, score 2, max score 24
Completed 511 games, 1425 rewards, 362846 frames, score 1, max score 24
Completed 512 games, 1427 rewards, 363306 frames, score 1, max score 24
Completed 513 games, 1429 rewards, 363857 frames, score 1, max score 24
Completed 514 games, 1431 rewards, 364354 frames, score 1, max score 24
Completed 515 games, 1433 rewards, 364956 frames, score 1, max score 24
Completed 516 games, 1435 rewards, 365490 frames, score 1, max score 24
Completed 517 games, 1437 rewards, 365873 frames, score 1, max score 24
Completed 518 games, 1439 rewards, 366333 frames, score 1, max score 24
...

Completed 622 games, 1790 rewards, 452738 frames, score 0, max score 24
Completed 623 games, 1791 rewards, 452919 frames, score 0, max score 24
Completed 624 games, 1793 rewards, 453509 frames, score 1, max score 24
Completed 625 games, 1796 rewards, 454110 frames, score 2, max score 24
Completed 626 games, 1800 rewards, 455294 frames, score 3, max score 24
Completed 627 games, 1806 rewards, 456655 frames, score 5, max score 24
Completed 628 games, 1812 rewards, 457828 frames, score 5, max score 24
Completed 629 games, 1815 rewards, 458527 frames, score 2, max score 24
Completed 630 games, 1817 rewards, 458941 frames, score 1, max score 24
Completed 631 games, 1819 rewards, 459308 frames, score 1, max score 24
Completed 632 games, 1820 rewards, 459489 frames, score 0, max score 24
Completed 633 games, 1823 rewards, 460077 frames, score 2, max score 24
Completed 634 games, 1825 rewards, 460739 frames, score 1, max score 24
Completed 635 games, 1826 rewards, 460947 frames, score 0, max score 24
...

Completed 739 games, 2240 rewards, 577687 frames, score 7, max score 24
Completed 740 games, 2242 rewards, 578070 frames, score 1, max score 24
Completed 741 games, 2244 rewards, 578613 frames, score 1, max score 24
Completed 742 games, 2245 rewards, 578794 frames, score 0, max score 24
Completed 743 games, 2249 rewards, 579708 frames, score 3, max score 24
Completed 744 games, 2251 rewards, 580196 frames, score 1, max score 24
Completed 745 games, 2254 rewards, 581005 frames, score 2, max score 24
Completed 746 games, 2259 rewards, 582789 frames, score 4, max score 24
Completed 747 games, 2263 rewards, 583605 frames, score 3, max score 24
Completed 748 games, 2266 rewards, 584246 frames, score 2, max score 24
Completed 749 games, 2269 rewards, 584972 frames, score 2, max score 24
Completed 750 games, 2272 rewards, 585619 frames, score 2, max score 24
Completed 751 games, 2278 rewards, 587449 frames, score 5, max score 24
Completed 752 games, 2280 rewards, 587863 frames, score 1, max score 24
...

Completed 855 games, 2762 rewards, 719348 frames, score 1, max score 25
Completed 856 games, 2768 rewards, 720330 frames, score 5, max score 25
Completed 857 games, 2778 rewards, 722576 frames, score 9, max score 25
Completed 858 games, 2780 rewards, 723065 frames, score 1, max score 25
Completed 859 games, 2783 rewards, 723703 frames, score 2, max score 25
Completed 860 games, 2785 rewards, 724140 frames, score 1, max score 25
Completed 861 games, 2790 rewards, 725134 frames, score 4, max score 25
Completed 862 games, 2794 rewards, 725915 frames, score 3, max score 25
Completed 863 games, 2795 rewards, 726096 frames, score 0, max score 25
Completed 864 games, 2800 rewards, 727538 frames, score 4, max score 25
Completed 865 games, 2804 rewards, 728363 frames, score 3, max score 25
Completed 866 games, 2806 rewards, 728746 frames, score 1, max score 25
Completed 867 games, 2814 rewards, 730605 frames, score 7, max score 25
Completed 868 games, 2816 rewards, 731005 frames, score 1, max score 25
...

Completed 975 games, 3223 rewards, 832600 frames, score 10, max score 25
Completed 976 games, 3227 rewards, 833877 frames, score 3, max score 25
Completed 977 games, 3229 rewards, 834611 frames, score 1, max score 25
Completed 978 games, 3240 rewards, 836493 frames, score 10, max score 25
Completed 979 games, 3242 rewards, 837412 frames, score 1, max score 25
Completed 980 games, 3247 rewards, 838364 frames, score 4, max score 25
Completed 981 games, 3250 rewards, 838952 frames, score 2, max score 25
Completed 982 games, 3253 rewards, 839452 frames, score 2, max score 25
Completed 983 games, 3255 rewards, 839930 frames, score 1, max score 25
Completed 984 games, 3265 rewards, 843458 frames, score 9, max score 25
Completed 985 games, 3267 rewards, 843992 frames, score 1, max score 25
Completed 986 games, 3275 rewards, 846178 frames, score 7, max score 25
Completed 987 games, 3278 rewards, 847226 frames, score 2, max score 25
Completed 988 games, 3283 rewards, 848728 frames, score 4, max score 25
...

Completed 1093 games, 3889 rewards, 1013115 frames, score 6, max score 27
Completed 1094 games, 3895 rewards, 1014235 frames, score 5, max score 27
Completed 1095 games, 3900 rewards, 1015247 frames, score 4, max score 27
Completed 1096 games, 3908 rewards, 1016763 frames, score 7, max score 27
Completed 1097 games, 3909 rewards, 1016971 frames, score 0, max score 27
Completed 1098 games, 3917 rewards, 1019060 frames, score 7, max score 27
Completed 1099 games, 3927 rewards, 1022376 frames, score 9, max score 27
Completed 1100 games, 3930 rewards, 1023278 frames, score 2, max score 27
Completed 1101 games, 3933 rewards, 1023825 frames, score 2, max score 27
Completed 1102 games, 3944 rewards, 1026956 frames, score 10, max score 27
Completed 1103 games, 3956 rewards, 1028780 frames, score 11, max score 27
Completed 1104 games, 3962 rewards, 1029933 frames, score 5, max score 27
Completed 1105 games, 3965 rewards, 1030887 frames, score 2, max score 27
Completed 1106 games, 3968 rewards, ...

Completed 1216 games, 4505 rewards, 1160089 frames, score 5, max score 27
Completed 1217 games, 4510 rewards, 1161030 frames, score 4, max score 27
Completed 1218 games, 4515 rewards, 1162084 frames, score 4, max score 27
Completed 1219 games, 4517 rewards, 1162436 frames, score 1, max score 27
Completed 1220 games, 4518 rewards, 1162690 frames, score 0, max score 27
Completed 1221 games, 4525 rewards, 1164172 frames, score 6, max score 27
Completed 1222 games, 4527 rewards, 1164555 frames, score 1, max score 27
Completed 1223 games, 4530 rewards, 1165312 frames, score 2, max score 27
Completed 1224 games, 4533 rewards, 1166323 frames, score 2, max score 27
Completed 1225 games, 4535 rewards, 1166820 frames, score 1, max score 27
Completed 1226 games, 4545 rewards, 1168520 frames, score 9, max score 27
Completed 1227 games, 4548 rewards, 1169085 frames, score 2, max score 27
Completed 1228 games, 4552 rewards, 1169777 frames, score 3, max score 27
Completed 1229 games, 4554 rewards, ...

Completed 1334 games, 5129 rewards, 1317498 frames, score 2, max score 27
Completed 1335 games, 5132 rewards, 1319537 frames, score 2, max score 27
Completed 1336 games, 5141 rewards, 1323095 frames, score 8, max score 27
Completed 1337 games, 5144 rewards, 1323991 frames, score 2, max score 27
Completed 1338 games, 5145 rewards, 1324199 frames, score 0, max score 27
Completed 1339 games, 5146 rewards, 1324407 frames, score 0, max score 27
Completed 1340 games, 5148 rewards, 1324861 frames, score 1, max score 27
Completed 1341 games, 5153 rewards, 1325989 frames, score 4, max score 27
Completed 1342 games, 5155 rewards, 1326540 frames, score 1, max score 27
Completed 1343 games, 5159 rewards, 1327214 frames, score 3, max score 27
Completed 1344 games, 5162 rewards, 1328324 frames, score 2, max score 27
Completed 1345 games, 5168 rewards, 1329300 frames, score 5, max score 27
Completed 1346 games, 5174 rewards, 1331981 frames, score 5, max score 27
Completed 1347 games, 5184 rewards, ...

Completed 1445 games, 5807 rewards, 1474320 frames, score 9, max score 30
Completed 1446 games, 5814 rewards, 1476558 frames, score 6, max score 30
Completed 1447 games, 5817 rewards, 1477872 frames, score 2, max score 30
Completed 1448 games, 5819 rewards, 1478423 frames, score 1, max score 30
Completed 1449 games, 5822 rewards, 1479104 frames, score 2, max score 30
Completed 1450 games, 5823 rewards, 1479312 frames, score 0, max score 30
Completed 1451 games, 5827 rewards, 1480269 frames, score 3, max score 30
Completed 1452 games, 5830 rewards, 1480821 frames, score 2, max score 30
Completed 1453 games, 5847 rewards, 1484104 frames, score 16, max score 30
Completed 1454 games, 5852 rewards, 1486089 frames, score 4, max score 30
Completed 1455 games, 5855 rewards, 1486715 frames, score 2, max score 30
Completed 1456 games, 5876 rewards, 1490284 frames, score 20, max score 30
Completed 1457 games, 5877 rewards, 1490492 frames, score 0, max score 30
Completed 1458 games, 5886 rewards, ...

Completed 1561 games, 6433 rewards, 1628953 frames, score 17, max score 30
Completed 1562 games, 6442 rewards, 1632300 frames, score 8, max score 30
Completed 1563 games, 6449 rewards, 1633532 frames, score 6, max score 30
Completed 1564 games, 6452 rewards, 1634071 frames, score 2, max score 30
Completed 1565 games, 6458 rewards, 1635241 frames, score 5, max score 30
Completed 1566 games, 6465 rewards, 1636777 frames, score 6, max score 30
Completed 1567 games, 6470 rewards, 1637838 frames, score 4, max score 30
Completed 1568 games, 6478 rewards, 1639164 frames, score 7, max score 30
Completed 1569 games, 6488 rewards, 1640873 frames, score 9, max score 30
Completed 1570 games, 6491 rewards, 1641377 frames, score 2, max score 30
Completed 1571 games, 6495 rewards, 1642328 frames, score 3, max score 30
Completed 1572 games, 6499 rewards, 1643031 frames, score 3, max score 30
Completed 1573 games, 6513 rewards, 1645579 frames, score 13, max score 30
Completed 1574 games, 6514 rewards, ...

Completed 1673 games, 7164 rewards, 1821172 frames, score 1, max score 32
Completed 1674 games, 7165 rewards, 1821380 frames, score 0, max score 32
Completed 1675 games, 7171 rewards, 1822723 frames, score 5, max score 32
Completed 1676 games, 7184 rewards, 1826184 frames, score 12, max score 32
Completed 1677 games, 7186 rewards, 1826918 frames, score 1, max score 32
Completed 1678 games, 7189 rewards, 1827593 frames, score 2, max score 32
Completed 1679 games, 7192 rewards, 1828604 frames, score 2, max score 32
Completed 1680 games, 7195 rewards, 1829406 frames, score 2, max score 32
Completed 1681 games, 7196 rewards, 1829587 frames, score 0, max score 32
Completed 1682 games, 7198 rewards, 1829993 frames, score 1, max score 32
Completed 1683 games, 7201 rewards, 1830672 frames, score 2, max score 32
Completed 1684 games, 7203 rewards, 1831077 frames, score 1, max score 32
Completed 1685 games, 7210 rewards, 1832289 frames, score 6, max score 32
Completed 1686 games, 7213 rewards, ...

Completed 1786 games, 7715 rewards, 1955521 frames, score 0, max score 32
Completed 1787 games, 7717 rewards, 1955873 frames, score 1, max score 32
Completed 1788 games, 7718 rewards, 1956054 frames, score 0, max score 32
Completed 1789 games, 7727 rewards, 1957599 frames, score 8, max score 32
Completed 1790 games, 7737 rewards, 1959208 frames, score 9, max score 32
Completed 1791 games, 7741 rewards, 1959866 frames, score 3, max score 32
Completed 1792 games, 7744 rewards, 1960372 frames, score 2, max score 32
Completed 1793 games, 7749 rewards, 1961580 frames, score 4, max score 32
Completed 1794 games, 7752 rewards, 1962134 frames, score 2, max score 32
Completed 1795 games, 7754 rewards, 1962622 frames, score 1, max score 32
Completed 1796 games, 7758 rewards, 1963676 frames, score 3, max score 32
Completed 1797 games, 7767 rewards, 1965064 frames, score 8, max score 32
Completed 1798 games, 7769 rewards, 1965552 frames, score 1, max score 32
Completed 1799 games, 7773 rewards, ...

Completed 1900 games, 8342 rewards, 2102929 frames, score 10, max score 32
Completed 1901 games, 8355 rewards, 2105077 frames, score 12, max score 32
Completed 1902 games, 8359 rewards, 2105804 frames, score 3, max score 32
Completed 1903 games, 8363 rewards, 2106622 frames, score 3, max score 32
Completed 1904 games, 8370 rewards, 2107879 frames, score 6, max score 32
Completed 1905 games, 8377 rewards, 2109051 frames, score 6, max score 32
Completed 1906 games, 8384 rewards, 2110107 frames, score 6, max score 32
Completed 1907 games, 8387 rewards, 2110970 frames, score 2, max score 32
Completed 1908 games, 8389 rewards, 2111368 frames, score 1, max score 32
Completed 1909 games, 8390 rewards, 2111576 frames, score 0, max score 32
Completed 1910 games, 8393 rewards, 2112341 frames, score 2, max score 32
Completed 1911 games, 8397 rewards, 2113435 frames, score 3, max score 32
Completed 1912 games, 8403 rewards, 2114372 frames, score 5, max score 32
Completed 1913 games, 8411 rewards, ...

Completed 2014 games, 9016 rewards, 2265986 frames, score 11, max score 32
Completed 2015 games, 9021 rewards, 2267033 frames, score 4, max score 32
Completed 2016 games, 9022 rewards, 2267287 frames, score 0, max score 32
Completed 2017 games, 9026 rewards, 2267973 frames, score 3, max score 32
Completed 2018 games, 9029 rewards, 2269113 frames, score 2, max score 32
Completed 2019 games, 9030 rewards, 2269294 frames, score 0, max score 32
Completed 2020 games, 9035 rewards, 2270710 frames, score 4, max score 32
Completed 2021 games, 9036 rewards, 2270918 frames, score 0, max score 32
Completed 2022 games, 9037 rewards, 2271172 frames, score 0, max score 32
Completed 2023 games, 9049 rewards, 2272855 frames, score 11, max score 32
Completed 2024 games, 9053 rewards, 2273795 frames, score 3, max score 32
Completed 2025 games, 9061 rewards, 2275044 frames, score 7, max score 32
Completed 2026 games, 9064 rewards, 2275659 frames, score 2, max score 32
Completed 2027 games, 9068 rewards, ...

Completed 2125 games, 9942 rewards, 2475054 frames, score 6, max score 44
Completed 2126 games, 9946 rewards, 2476343 frames, score 3, max score 44
Completed 2127 games, 9953 rewards, 2477525 frames, score 6, max score 44
Completed 2128 games, 9956 rewards, 2478410 frames, score 2, max score 44
Completed 2129 games, 9978 rewards, 2482018 frames, score 21, max score 44
Completed 2130 games, 9986 rewards, 2484145 frames, score 7, max score 44
Completed 2131 games, 9995 rewards, 2485712 frames, score 8, max score 44
Completed 2132 games, 9999 rewards, 2486494 frames, score 3, max score 44
Completed 2133 games, 10007 rewards, 2489236 frames, score 7, max score 44
Completed 2134 games, 10027 rewards, 2492723 frames, score 19, max score 44
Completed 2135 games, 10032 rewards, 2493904 frames, score 4, max score 44
Completed 2136 games, 10049 rewards, 2496771 frames, score 16, max score 44
Completed 2137 games, 10060 rewards, 2499514 frames, score 10, max score 44
Completed 2138 games, 10061 rewards, ...

Completed 2234 games, 11530 rewards, 2800953 frames, score 22, max score 80
Completed 2235 games, 11543 rewards, 2802865 frames, score 12, max score 80
Completed 2236 games, 11546 rewards, 2803455 frames, score 2, max score 80
Completed 2237 games, 11563 rewards, 2806255 frames, score 16, max score 80
Completed 2238 games, 11571 rewards, 2808277 frames, score 7, max score 80
Completed 2239 games, 11590 rewards, 2812030 frames, score 18, max score 80
Completed 2240 games, 11613 rewards, 2815665 frames, score 22, max score 80
Completed 2241 games, 11622 rewards, 2817038 frames, score 8, max score 80
Completed 2242 games, 11626 rewards, 2817900 frames, score 3, max score 80
Completed 2243 games, 11653 rewards, 2825589 frames, score 26, max score 80
Completed 2244 games, 11658 rewards, 2826824 frames, score 4, max score 80
Completed 2245 games, 11669 rewards, 2828622 frames, score 10, max score 80
Completed 2246 games, 11686 rewards, 2831680 frames, score 16, max score 80
Completed 2247 games, ...

Completed 2348 games, 13362 rewards, 3221660 frames, score 11, max score 94
Completed 2349 games, 13365 rewards, 3222303 frames, score 2, max score 94
Completed 2350 games, 13376 rewards, 3224988 frames, score 10, max score 94
Completed 2351 games, 13379 rewards, 3225645 frames, score 2, max score 94
Completed 2352 games, 13396 rewards, 3228991 frames, score 16, max score 94
Completed 2353 games, 13397 rewards, 3229245 frames, score 0, max score 94
Completed 2354 games, 13412 rewards, 3235402 frames, score 14, max score 94
Completed 2355 games, 13414 rewards, 3235770 frames, score 1, max score 94
Completed 2356 games, 13435 rewards, 3239461 frames, score 20, max score 94
Completed 2357 games, 13437 rewards, 3239859 frames, score 1, max score 94
Completed 2358 games, 13465 rewards, 3244967 frames, score 27, max score 94
Completed 2359 games, 13480 rewards, 3247422 frames, score 14, max score 94
Completed 2360 games, 13489 rewards, 3249572 frames, score 8, max score 94
Completed 2361 games, ...

Completed 2458 games, 14972 rewards, 3562809 frames, score 13, max score 94
Completed 2459 games, 14982 rewards, 3565112 frames, score 9, max score 94
Completed 2460 games, 14984 rewards, 3566031 frames, score 1, max score 94
Completed 2461 games, 14989 rewards, 3567566 frames, score 4, max score 94
Completed 2462 games, 15070 rewards, 3593443 frames, score 80, max score 94
Completed 2463 games, 15117 rewards, 3604469 frames, score 46, max score 94
Completed 2464 games, 15142 rewards, 3615701 frames, score 24, max score 94
Completed 2465 games, 15149 rewards, 3616920 frames, score 6, max score 94
Completed 2466 games, 15155 rewards, 3618171 frames, score 5, max score 94
Completed 2467 games, 15168 rewards, 3620692 frames, score 12, max score 94
Completed 2468 games, 15192 rewards, 3625944 frames, score 23, max score 94
Completed 2469 games, 15214 rewards, 3631340 frames, score 21, max score 94
Completed 2470 games, 15223 rewards, 3633343 frames, score 8, max score 94
Completed 2471 games, ...

Completed 2571 games, 16603 rewards, 3988919 frames, score 19, max score 94
Completed 2572 games, 16627 rewards, 3993211 frames, score 23, max score 94
Completed 2573 games, 16663 rewards, 4002943 frames, score 35, max score 94
Completed 2574 games, 16673 rewards, 4004690 frames, score 9, max score 94
Completed 2575 games, 16682 rewards, 4006134 frames, score 8, max score 94
Completed 2576 games, 16685 rewards, 4006614 frames, score 2, max score 94
Completed 2577 games, 16706 rewards, 4010261 frames, score 20, max score 94
Completed 2578 games, 16714 rewards, 4011640 frames, score 7, max score 94
Completed 2579 games, 16716 rewards, 4012374 frames, score 1, max score 94
Completed 2580 games, 16717 rewards, 4012628 frames, score 0, max score 94
Completed 2581 games, 16721 rewards, 4013397 frames, score 3, max score 94
Completed 2582 games, 16744 rewards, 4018378 frames, score 22, max score 94
Completed 2583 games, 16749 rewards, 4019200 frames, score 4, max score 94
Completed 2584 games, ...

Completed 2693 games, 17866 rewards, 4292065 frames, score 12, max score 94
Completed 2694 games, 17869 rewards, 4292566 frames, score 2, max score 94
Completed 2695 games, 17876 rewards, 4293957 frames, score 6, max score 94
Completed 2696 games, 17891 rewards, 4299720 frames, score 14, max score 94
Completed 2697 games, 17893 rewards, 4300072 frames, score 1, max score 94
Completed 2698 games, 17906 rewards, 4303042 frames, score 12, max score 94
Completed 2699 games, 17908 rewards, 4303407 frames, score 1, max score 94
Completed 2700 games, 17910 rewards, 4303844 frames, score 1, max score 94
Completed 2701 games, 17922 rewards, 4306037 frames, score 11, max score 94
Completed 2702 games, 17929 rewards, 4309833 frames, score 6, max score 94
Completed 2703 games, 17936 rewards, 4311069 frames, score 6, max score 94
Completed 2704 games, 17941 rewards, 4311976 frames, score 4, max score 94
Completed 2705 games, 17949 rewards, 4313655 frames, score 7, max score 94
Completed 2706 games, ...

Completed 2806 games, 18422 rewards, 4434593 frames, score 6, max score 94
Completed 2807 games, 18431 rewards, 4436262 frames, score 8, max score 94
Completed 2808 games, 18436 rewards, 4437428 frames, score 4, max score 94
Completed 2809 games, 18440 rewards, 4438442 frames, score 3, max score 94
Completed 2810 games, 18445 rewards, 4441198 frames, score 4, max score 94
Completed 2811 games, 18446 rewards, 4441452 frames, score 0, max score 94
Completed 2812 games, 18450 rewards, 4442204 frames, score 3, max score 94
Completed 2813 games, 18463 rewards, 4444773 frames, score 12, max score 94
Completed 2814 games, 18467 rewards, 4445569 frames, score 3, max score 94
Completed 2815 games, 18471 rewards, 4446686 frames, score 3, max score 94
Completed 2816 games, 18476 rewards, 4447531 frames, score 4, max score 94
Completed 2817 games, 18480 rewards, 4448187 frames, score 3, max score 94
Completed 2818 games, 18481 rewards, 4448395 frames, score 0, max score 94
Completed 2819 games, ...

Completed 2923 games, 19102 rewards, 4604092 frames, score 26, max score 94
Completed 2924 games, 19108 rewards, 4605079 frames, score 5, max score 94
Completed 2925 games, 19120 rewards, 4608511 frames, score 11, max score 94
Completed 2926 games, 19124 rewards, 4609869 frames, score 3, max score 94
Completed 2927 games, 19138 rewards, 4614205 frames, score 13, max score 94
Completed 2928 games, 19141 rewards, 4615035 frames, score 2, max score 94
Completed 2929 games, 19146 rewards, 4616403 frames, score 4, max score 94
Completed 2930 games, 19151 rewards, 4617780 frames, score 4, max score 94
Completed 2931 games, 19157 rewards, 4619024 frames, score 5, max score 94
Completed 2932 games, 19159 rewards, 4619387 frames, score 1, max score 94
Completed 2933 games, 19167 rewards, 4621069 frames, score 7, max score 94
Completed 2934 games, 19176 rewards, 4624406 frames, score 8, max score 94
Completed 2935 games, 19184 rewards, 4627580 frames, score 7, max score 94
Completed 2936 games, ...

Completed 3042 games, 19783 rewards, 4785328 frames, score 7, max score 94
Completed 3043 games, 19785 rewards, 4785879 frames, score 1, max score 94
Completed 3044 games, 19786 rewards, 4786133 frames, score 0, max score 94
Completed 3045 games, 19790 rewards, 4786923 frames, score 3, max score 94
Completed 3046 games, 19795 rewards, 4788266 frames, score 4, max score 94
Completed 3047 games, 19796 rewards, 4788474 frames, score 0, max score 94
Completed 3048 games, 19802 rewards, 4789622 frames, score 5, max score 94
Completed 3049 games, 19804 rewards, 4790028 frames, score 1, max score 94
Completed 3050 games, 19806 rewards, 4790947 frames, score 1, max score 94
Completed 3051 games, 19809 rewards, 4791599 frames, score 2, max score 94
Completed 3052 games, 19815 rewards, 4792558 frames, score 5, max score 94
Completed 3053 games, 19820 rewards, 4793343 frames, score 4, max score 94
Completed 3054 games, 19829 rewards, 4795116 frames, score 8, max score 94
Completed 3055 games, ...

Completed 3157 games, 20332 rewards, 4922816 frames, score 5, max score 94
Completed 3158 games, 20340 rewards, 4924254 frames, score 7, max score 94
Completed 3159 games, 20346 rewards, 4925694 frames, score 5, max score 94
Completed 3160 games, 20349 rewards, 4926228 frames, score 2, max score 94
Completed 3161 games, 20352 rewards, 4926811 frames, score 2, max score 94
Completed 3162 games, 20356 rewards, 4929249 frames, score 3, max score 94
Completed 3163 games, 20357 rewards, 4929457 frames, score 0, max score 94
Completed 3164 games, 20361 rewards, 4930309 frames, score 3, max score 94
Completed 3165 games, 20367 rewards, 4931328 frames, score 5, max score 94
Completed 3166 games, 20368 rewards, 4931582 frames, score 0, max score 94
Completed 3167 games, 20370 rewards, 4932316 frames, score 1, max score 94
Completed 3168 games, 20372 rewards, 4932722 frames, score 1, max score 94
Completed 3169 games, 20385 rewards, 4939459 frames, score 12, max score 94
Completed 3170 games, ...

Completed 3270 games, 20899 rewards, 5077382 frames, score 6, max score 94
Completed 3271 games, 20901 rewards, 5078044 frames, score 1, max score 94
Completed 3272 games, 20906 rewards, 5078842 frames, score 4, max score 94
Completed 3273 games, 20908 rewards, 5079279 frames, score 1, max score 94
Completed 3274 games, 20910 rewards, 5079869 frames, score 1, max score 94
Completed 3275 games, 20916 rewards, 5080909 frames, score 5, max score 94
Completed 3276 games, 20918 rewards, 5081643 frames, score 1, max score 94
Completed 3277 games, 20919 rewards, 5081897 frames, score 0, max score 94
Completed 3278 games, 20922 rewards, 5082508 frames, score 2, max score 94
Completed 3279 games, 20923 rewards, 5082689 frames, score 0, max score 94
Completed 3280 games, 20931 rewards, 5085518 frames, score 7, max score 94
Completed 3281 games, 20933 rewards, 5086069 frames, score 1, max score 94
Completed 3282 games, 20936 rewards, 5086560 frames, score 2, max score 94
Completed 3283 games, ...

Completed 3387 games, 21569 rewards, 5243657 frames, score 4, max score 94
Completed 3388 games, 21576 rewards, 5245119 frames, score 6, max score 94
Completed 3389 games, 21581 rewards, 5246127 frames, score 4, max score 94
Completed 3390 games, 21586 rewards, 5247213 frames, score 4, max score 94
Completed 3391 games, 21594 rewards, 5248975 frames, score 7, max score 94
Completed 3392 games, 21599 rewards, 5249783 frames, score 4, max score 94
Completed 3393 games, 21609 rewards, 5251484 frames, score 9, max score 94
Completed 3394 games, 21622 rewards, 5253538 frames, score 12, max score 94
Completed 3395 games, 21626 rewards, 5254366 frames, score 3, max score 94
Completed 3396 games, 21634 rewards, 5255763 frames, score 7, max score 94
Completed 3397 games, 21645 rewards, 5257750 frames, score 10, max score 94
Completed 3398 games, 21662 rewards, 5260649 frames, score 16, max score 94
Completed 3399 games, 21665 rewards, 5261564 frames, score 2, max score 94
Completed 3400 games, ...

Completed 3500 games, 22519 rewards, 5470833 frames, score 17, max score 94
Completed 3501 games, 22539 rewards, 5482420 frames, score 19, max score 94
Completed 3502 games, 22558 rewards, 5485438 frames, score 18, max score 94
Completed 3503 games, 22564 rewards, 5486528 frames, score 5, max score 94
Completed 3504 games, 22573 rewards, 5488287 frames, score 8, max score 94
Completed 3505 games, 22595 rewards, 5491556 frames, score 21, max score 94
Completed 3506 games, 22608 rewards, 5495371 frames, score 12, max score 94
Completed 3507 games, 22624 rewards, 5498797 frames, score 15, max score 94
Completed 3508 games, 22636 rewards, 5501325 frames, score 11, max score 94
Completed 3509 games, 22655 rewards, 5504450 frames, score 18, max score 94
Completed 3510 games, 22687 rewards, 5516768 frames, score 31, max score 94
Completed 3511 games, 22711 rewards, 5521460 frames, score 23, max score 94
Completed 3512 games, 22713 rewards, 5521920 frames, score 1, max score 94
Completed 3513 games, ...

Completed 3608 games, 25896 rewards, 6159738 frames, score 160, max score 223
Completed 3609 games, 25926 rewards, 6165898 frames, score 29, max score 223
Completed 3610 games, 25972 rewards, 6173233 frames, score 45, max score 223
Completed 3611 games, 25979 rewards, 6174991 frames, score 6, max score 223
Completed 3612 games, 26091 rewards, 6193196 frames, score 111, max score 223
Completed 3613 games, 26093 rewards, 6193930 frames, score 1, max score 223
Completed 3614 games, 26224 rewards, 6215810 frames, score 130, max score 223
Completed 3615 games, 26246 rewards, 6225888 frames, score 21, max score 223
Completed 3616 games, 26259 rewards, 6228701 frames, score 12, max score 223
Completed 3617 games, 26394 rewards, 6252040 frames, score 134, max score 223
Completed 3618 games, 26430 rewards, 6257783 frames, score 35, max score 223
Completed 3619 games, 26472 rewards, 6269139 frames, score 41, max score 223
Completed 3620 games, 26478 rewards, 6270124 frames, score 5, max score 22

Completed 3716 games, 29753 rewards, 6904347 frames, score 32, max score 236
Completed 3717 games, 29773 rewards, 6911444 frames, score 19, max score 236
Completed 3718 games, 29796 rewards, 6915090 frames, score 22, max score 236
Completed 3719 games, 29799 rewards, 6915884 frames, score 2, max score 236
Completed 3720 games, 29818 rewards, 6919039 frames, score 18, max score 236
Completed 3721 games, 29837 rewards, 6923210 frames, score 18, max score 236
Completed 3722 games, 29851 rewards, 6931729 frames, score 13, max score 236
Completed 3723 games, 29938 rewards, 6945381 frames, score 86, max score 236
Completed 3724 games, 29983 rewards, 6952389 frames, score 44, max score 236
Completed 3725 games, 30006 rewards, 6956903 frames, score 22, max score 236
Completed 3726 games, 30061 rewards, 6966017 frames, score 54, max score 236
Completed 3727 games, 30067 rewards, 6967805 frames, score 5, max score 236
Completed 3728 games, 30079 rewards, 6969870 frames, score 11, max score 236
...

Completed 3826 games, 32607 rewards, 7461705 frames, score 59, max score 236
Completed 3827 games, 32741 rewards, 7481754 frames, score 133, max score 236
Completed 3828 games, 32817 rewards, 7496858 frames, score 75, max score 236
Completed 3829 games, 32933 rewards, 7514053 frames, score 115, max score 236
Completed 3830 games, 32967 rewards, 7519182 frames, score 33, max score 236
Completed 3831 games, 32972 rewards, 7520191 frames, score 4, max score 236
Completed 3832 games, 32983 rewards, 7522636 frames, score 10, max score 236
Completed 3833 games, 33026 rewards, 7532116 frames, score 42, max score 236
Completed 3834 games, 33031 rewards, 7532968 frames, score 4, max score 236
Completed 3835 games, 33050 rewards, 7535881 frames, score 18, max score 236
Completed 3836 games, 33072 rewards, 7539373 frames, score 21, max score 236
Completed 3837 games, 33151 rewards, 7555128 frames, score 78, max score 236
Completed 3838 games, 33154 rewards, 7555885 frames, score 2, max score 236


Completed 3937 games, 35746 rewards, 8065774 frames, score 5, max score 236
Completed 3938 games, 35766 rewards, 8070295 frames, score 19, max score 236
Completed 3939 games, 35792 rewards, 8080673 frames, score 25, max score 236
Completed 3940 games, 35815 rewards, 8089705 frames, score 22, max score 236
Completed 3941 games, 35816 rewards, 8089959 frames, score 0, max score 236
Completed 3942 games, 35819 rewards, 8090526 frames, score 2, max score 236
Completed 3943 games, 35824 rewards, 8092225 frames, score 4, max score 236
Completed 3944 games, 35850 rewards, 8104532 frames, score 25, max score 236
Completed 3945 games, 35855 rewards, 8105519 frames, score 4, max score 236
Completed 3946 games, 35856 rewards, 8105727 frames, score 0, max score 236
Completed 3947 games, 35863 rewards, 8108361 frames, score 6, max score 236
Completed 3948 games, 35869 rewards, 8109502 frames, score 5, max score 236
Completed 3949 games, 35882 rewards, 8117375 frames, score 12, max score 236
...

Completed 4048 games, 36954 rewards, 8355173 frames, score 63, max score 236
Completed 4049 games, 36971 rewards, 8357727 frames, score 16, max score 236
Completed 4050 games, 36977 rewards, 8358872 frames, score 5, max score 236
Completed 4051 games, 36983 rewards, 8359935 frames, score 5, max score 236
Completed 4052 games, 37011 rewards, 8364354 frames, score 27, max score 236
Completed 4053 games, 37021 rewards, 8366589 frames, score 9, max score 236
Completed 4054 games, 37035 rewards, 8368763 frames, score 13, max score 236
Completed 4055 games, 37037 rewards, 8369682 frames, score 1, max score 236
Completed 4056 games, 37041 rewards, 8370437 frames, score 3, max score 236
Completed 4057 games, 37051 rewards, 8373669 frames, score 9, max score 236
Completed 4058 games, 37052 rewards, 8373923 frames, score 0, max score 236
Completed 4059 games, 37079 rewards, 8379299 frames, score 26, max score 236
Completed 4060 games, 37081 rewards, 8379727 frames, score 1, max score 236
...

Completed 4160 games, 38448 rewards, 8670826 frames, score 10, max score 236
Completed 4161 games, 38481 rewards, 8680353 frames, score 32, max score 236
Completed 4162 games, 38487 rewards, 8681294 frames, score 5, max score 236
Completed 4163 games, 38496 rewards, 8682942 frames, score 8, max score 236
Completed 4164 games, 38534 rewards, 8693312 frames, score 37, max score 236
Completed 4165 games, 38542 rewards, 8694840 frames, score 7, max score 236
Completed 4166 games, 38555 rewards, 8697409 frames, score 12, max score 236
Completed 4167 games, 38565 rewards, 8699031 frames, score 9, max score 236
Completed 4168 games, 38581 rewards, 8702376 frames, score 15, max score 236
Completed 4169 games, 38589 rewards, 8704690 frames, score 7, max score 236
Completed 4170 games, 38605 rewards, 8709569 frames, score 15, max score 236
Completed 4171 games, 38606 rewards, 8709777 frames, score 0, max score 236
Completed 4172 games, 38624 rewards, 8712696 frames, score 17, max score 236
...

Completed 4268 games, 39693 rewards, 8994325 frames, score 11, max score 236
Completed 4269 games, 39700 rewards, 8996184 frames, score 6, max score 236
Completed 4270 games, 39718 rewards, 8998847 frames, score 17, max score 236
Completed 4271 games, 39731 rewards, 9001014 frames, score 12, max score 236
Completed 4272 games, 39732 rewards, 9001195 frames, score 0, max score 236
Completed 4273 games, 39736 rewards, 9001961 frames, score 3, max score 236
Completed 4274 games, 39744 rewards, 9003234 frames, score 7, max score 236
Completed 4275 games, 39758 rewards, 9005818 frames, score 13, max score 236
Completed 4276 games, 39767 rewards, 9007656 frames, score 8, max score 236
Completed 4277 games, 39769 rewards, 9008054 frames, score 1, max score 236
Completed 4278 games, 39771 rewards, 9008415 frames, score 1, max score 236
Completed 4279 games, 39773 rewards, 9008852 frames, score 1, max score 236
Completed 4280 games, 39797 rewards, 9022803 frames, score 23, max score 236
...

Completed 4376 games, 40578 rewards, 9226387 frames, score 23, max score 236
Completed 4377 games, 40591 rewards, 9228459 frames, score 12, max score 236
Completed 4378 games, 40595 rewards, 9229191 frames, score 3, max score 236
Completed 4379 games, 40635 rewards, 9239301 frames, score 39, max score 236
Completed 4380 games, 40667 rewards, 9244586 frames, score 31, max score 236
Completed 4381 games, 40676 rewards, 9246204 frames, score 8, max score 236
Completed 4382 games, 40692 rewards, 9249932 frames, score 15, max score 236
Completed 4383 games, 40728 rewards, 9258557 frames, score 35, max score 236
Completed 4384 games, 40736 rewards, 9262358 frames, score 7, max score 236
Completed 4385 games, 40751 rewards, 9264892 frames, score 14, max score 236
Completed 4386 games, 40758 rewards, 9266422 frames, score 6, max score 236
Completed 4387 games, 40767 rewards, 9267889 frames, score 8, max score 236
Completed 4388 games, 40774 rewards, 9269509 frames, score 6, max score 236
...

Completed 4487 games, 41669 rewards, 9501057 frames, score 17, max score 236
Completed 4488 games, 41684 rewards, 9504825 frames, score 14, max score 236
Completed 4489 games, 41688 rewards, 9505580 frames, score 3, max score 236
Completed 4490 games, 41691 rewards, 9506690 frames, score 2, max score 236
Completed 4491 games, 41710 rewards, 9512108 frames, score 18, max score 236
Completed 4492 games, 41716 rewards, 9513522 frames, score 5, max score 236
Completed 4493 games, 41725 rewards, 9514550 frames, score 8, max score 236
Completed 4494 games, 41726 rewards, 9514758 frames, score 0, max score 236
Completed 4495 games, 41734 rewards, 9516265 frames, score 7, max score 236
Completed 4496 games, 41737 rewards, 9516801 frames, score 2, max score 236
Completed 4497 games, 41747 rewards, 9519486 frames, score 9, max score 236
Completed 4498 games, 41749 rewards, 9519891 frames, score 1, max score 236
Completed 4499 games, 41756 rewards, 9521356 frames, score 6, max score 236
...

Completed 4598 games, 42434 rewards, 9688919 frames, score 10, max score 236
Completed 4599 games, 42440 rewards, 9691312 frames, score 5, max score 236
Completed 4600 games, 42445 rewards, 9692306 frames, score 4, max score 236
Completed 4601 games, 42447 rewards, 9692711 frames, score 1, max score 236
Completed 4602 games, 42459 rewards, 9695956 frames, score 11, max score 236
Completed 4603 games, 42462 rewards, 9696757 frames, score 2, max score 236
Completed 4604 games, 42469 rewards, 9698127 frames, score 6, max score 236
Completed 4605 games, 42475 rewards, 9699244 frames, score 5, max score 236
Completed 4606 games, 42481 rewards, 9701280 frames, score 5, max score 236
Completed 4607 games, 42483 rewards, 9701722 frames, score 1, max score 236
Completed 4608 games, 42487 rewards, 9702453 frames, score 3, max score 236
Completed 4609 games, 42489 rewards, 9702794 frames, score 1, max score 236
Completed 4610 games, 42492 rewards, 9703675 frames, score 2, max score 236
...

Completed 4709 games, 43558 rewards, 10003723 frames, score 20, max score 236
Completed 4710 games, 43559 rewards, 10003977 frames, score 0, max score 236
Completed 4711 games, 43567 rewards, 10005342 frames, score 7, max score 236
Completed 4712 games, 43579 rewards, 10011785 frames, score 11, max score 236
Completed 4713 games, 43581 rewards, 10012213 frames, score 1, max score 236
Completed 4714 games, 43586 rewards, 10013583 frames, score 4, max score 236
Completed 4715 games, 43594 rewards, 10014871 frames, score 7, max score 236
Completed 4716 games, 43599 rewards, 10015723 frames, score 4, max score 236
Completed 4717 games, 43608 rewards, 10018339 frames, score 8, max score 236
Completed 4718 games, 43612 rewards, 10019092 frames, score 3, max score 236
Completed 4719 games, 43625 rewards, 10025209 frames, score 12, max score 236
Completed 4720 games, 43632 rewards, 10027709 frames, score 6, max score 236
Completed 4721 games, 43639 rewards, 10029138 frames, score 6, max score 

Completed 4815 games, 44684 rewards, 10274839 frames, score 14, max score 236
Completed 4816 games, 44688 rewards, 10275570 frames, score 3, max score 236
Completed 4817 games, 44710 rewards, 10283886 frames, score 21, max score 236
Completed 4818 games, 44713 rewards, 10284443 frames, score 2, max score 236
Completed 4819 games, 44742 rewards, 10295036 frames, score 28, max score 236
Completed 4820 games, 44753 rewards, 10296875 frames, score 10, max score 236
Completed 4821 games, 44757 rewards, 10297578 frames, score 3, max score 236
Completed 4822 games, 44759 rewards, 10297946 frames, score 1, max score 236
Completed 4823 games, 44774 rewards, 10300820 frames, score 14, max score 236
Completed 4824 games, 44782 rewards, 10303599 frames, score 7, max score 236
Completed 4825 games, 44785 rewards, 10304841 frames, score 2, max score 236
Completed 4826 games, 44789 rewards, 10305960 frames, score 3, max score 236
Completed 4827 games, 44794 rewards, 10306886 frames, score 4, max scor

Completed 4924 games, 46072 rewards, 10644576 frames, score 9, max score 236
Completed 4925 games, 46088 rewards, 10647293 frames, score 15, max score 236
Completed 4926 games, 46093 rewards, 10648105 frames, score 4, max score 236
Completed 4927 games, 46099 rewards, 10649634 frames, score 5, max score 236
Completed 4928 games, 46103 rewards, 10650842 frames, score 3, max score 236
Completed 4929 games, 46112 rewards, 10652507 frames, score 8, max score 236
Completed 4930 games, 46125 rewards, 10654400 frames, score 12, max score 236
Completed 4931 games, 46155 rewards, 10660175 frames, score 29, max score 236
Completed 4932 games, 46174 rewards, 10663471 frames, score 18, max score 236
Completed 4933 games, 46177 rewards, 10664110 frames, score 2, max score 236
Completed 4934 games, 46186 rewards, 10666464 frames, score 8, max score 236
Completed 4935 games, 46197 rewards, 10668459 frames, score 10, max score 236
Completed 4936 games, 46200 rewards, 10669915 frames, score 2, max scor

Now let's plot the score, to see if it improved over time.  We will also plot the local average, averaged over 10 consecutive games, to see if that has improved.  Notice that we can use `np.convolve` to compute the local average.

These numbers are really noisy, with a really large maximum.  We will plot `np.log10(1+x)`, rather than x, so that we can better see the average numbers, and ignore the very large noisy spikes.
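
As a small illustration of the two ideas above, the sketch below computes a ten-game local average with `np.convolve` and then applies the `np.log10(1+x)` transform. The list `toy_scores` is made-up data, not output from the learner; it only shows how one very large score affects the average and how the log scale compresses it.

```python
import numpy as np

# Hypothetical scores: a steady ramp followed by one very large outlier.
toy_scores = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 1000]

# Convolving with a length-10 kernel of 1/10 gives the mean of each run of
# 10 consecutive games.  mode='valid' keeps only the windows that fit
# entirely inside the data, so the result has len(toy_scores) - 10 + 1 entries.
window = 10
local_avg = np.convolve(np.ones(window) / window, toy_scores, mode='valid')

# log10(1 + x) compresses large spikes and is still defined when x == 0.
log_avg = np.log10(1 + local_avg)

print(local_avg)
print(log_avg)
```

With 12 scores and a window of 10, only 3 averages come out, which is why the moving-average curve is 9 points shorter than the raw score curve.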

In [None]:
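%matplotlib inline

import matplotlib.pyplot as plt

# Ten-game moving average; mode='valid' keeps only full windows (len(scores)-9 points)
avg_scores = np.convolve(np.ones(10)/10,scores,mode='valid')

fig = plt.figure(figsize=(14,9),layout='tight')
ax = [ fig.add_subplot(3,1,x) for x in range(1,4) ]
# Top: per-game scores on a log10(1+x) scale; the dashed line marks a score of 6
ax[0].plot(np.arange(0,len(scores)),np.log10(1+np.array(scores)))
ax[0].plot([0,len(scores)],np.log10([7,7]),'k--')
ax[0].set_title('Game scores')
# Middle: ten-game moving average on the same scale
ax[1].plot(np.arange(len(avg_scores)),np.log10(1+avg_scores))
ax[1].plot([0,len(avg_scores)],np.log10([7,7]),'k--')
ax[1].set_title('Game scores, average 10 consecutive games')
# Bottom: Q values of the states achieved
ax[2].plot(np.arange(0,len(q_achieved)),q_achieved)
ax[2].set_title('Q values of state achieved at each time')
ax[2].set_xlabel('Game number')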

Hooray, it has learned!  If your ten-game average score is better than 6, then you are ready to submit your model for grading.  To do that, you first need to save the model:

In [28]:
q_learner.save('trained_model.npz')
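
For readers who want to see what a save to an `.npz` file can look like, here is a minimal sketch using `np.savez` and `np.load`. It is not the code inside `q_learner.save`: the array names `Q` and `N`, their shape, and the file name `toy_model.npz` are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical table shape: (state dimensions..., number of actions).
shape = (10, 10, 2, 2, 10, 3)
Q = np.random.rand(*shape)          # made-up Q values
N = np.zeros(shape, dtype=int)      # made-up visit counts

# Write both arrays into one .npz archive under the keys 'Q' and 'N'.
path = os.path.join(tempfile.mkdtemp(), 'toy_model.npz')
np.savez(path, Q=Q, N=N)

# Load the archive back and pull each array out by its key.
with np.load(path) as data:
    Q_loaded = data['Q']
    N_loaded = data['N']

print(np.array_equal(Q, Q_loaded), np.array_equal(N, N_loaded))
```

Storing both tables in one file means a single upload carries everything needed to restore the learner.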

<a id='section9'></a>

## Extra Credit

For extra credit, download the file <a href="mp11_extra.zip">mp11_extra.zip</a>.  The only important file in this package is:
* `tests/test_extra.py`: this contains extra tests that will evaluate your pre-trained `deep_q` learner, which should be in a file called `trained_model.pkl`.  For full credit, your model should achieve an average score of greater than 20, averaged over 10 consecutive games. 

With a quantized lookup table, it's probably not possible to achieve an average score of 20.  With a deep-Q learner, however, it is eminently possible.  To do the extra credit, therefore, you should fill in the part of `submitted.py` that implements the `deep_q` learner, using PyTorch to define a model structure, train it, save it, load it, and choose actions with it.  This learner only needs to have five methods: `__init__`, `act`, `learn`, `save`, and `load`:

In [None]:
importlib.reload(submitted)
help(submitted.deep_q.__init__)

In [None]:
help(submitted.deep_q.act)

In [None]:
help(submitted.deep_q.learn)

In [None]:
help(submitted.deep_q.save)

In [None]:
help(submitted.deep_q.load)
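
To make the idea of a deep-Q learner more concrete, the sketch below shows one possible PyTorch network and a single temporal-difference update on one made-up transition. It is an illustration under stated assumptions, not the required `deep_q` interface: the class name `TinyQNet`, the 5-dimensional state, the 3 actions, the hidden size, the learning rate, and the discount factor are all hypothetical choices.

```python
import torch
import torch.nn as nn

class TinyQNet(nn.Module):
    """Hypothetical network mapping a state vector to one Q value per action."""
    def __init__(self, state_dim=5, n_actions=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

torch.manual_seed(0)
model = TinyQNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed learning rate
gamma = 0.99                                               # assumed discount factor

# One made-up transition (state, action, reward, newstate).
state = torch.tensor([[0.50, 0.50, 0.03, 0.01, 0.40]])
action = 2
reward = 1.0
newstate = torch.tensor([[0.53, 0.51, 0.03, 0.01, 0.44]])

# TD target: reward plus discounted best Q value of the next state.
# no_grad() keeps the target fixed while we differentiate the prediction.
with torch.no_grad():
    target = reward + gamma * model(newstate).max(dim=1).values

# Prediction: the Q value of the action that was actually taken.
q_sa = model(state)[:, action]

# One gradient step on the squared TD error.
loss = nn.functional.mse_loss(q_sa, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Greedy action for the same state after the update.
greedy_action = int(model(state).argmax(dim=1).item())
print(loss.item(), greedy_action)
```

A real learner would repeat this update over many transitions and combine the greedy choice with some exploration.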

<a id='grade'></a>

## Grade your homework

If you've reached this point, and all of the above sections work, then you're ready to try grading your homework!  Before you submit it to Gradescope, try grading it on your own machine.  This will run some visible test cases (which you can read in `tests/test_visible.py`), and compare the results to the solutions (which you can read in `solution.json`).

The exclamation point (!) tells Jupyter to run the rest of the line as a shell command.  You don't need to run the grader this way; this usage is here just to remind you that you can also, if you wish, run the same command in a terminal window.

In [None]:
!python grade.py

Now you should try uploading your code to <a href="https://www.gradescope.com/courses/486387">Gradescope</a>.  

**Warning:** For this MP you need to upload two files, not just one.
* `trained_model.npz` should contain your trained model
* `submitted.py` should contain your code that loads and runs your trained model.

**Warning:** The autograder calculates the average score over ten random games.  If you are getting an average score above 10 almost all the time on your own computer, and if the autograder says you had a score below 10, try resubmitting to see if the next round of random games is better.
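
One rough way to judge "almost all the time" on your own machine is to count how many ten-game windows in your recorded scores have an average above the threshold. The sketch below does this on made-up data; the function name `fraction_of_windows_above` and the list `toy_scores` are hypothetical, and this is not how the autograder itself is implemented.

```python
import numpy as np

def fraction_of_windows_above(scores, threshold, window=10):
    """Fraction of runs of `window` consecutive games whose mean exceeds `threshold`."""
    averages = np.convolve(np.ones(window) / window, scores, mode='valid')
    return float(np.mean(averages > threshold))

# Made-up scores: fifteen good games followed by five games with score 0.
toy_scores = [12] * 15 + [0] * 5

frac = fraction_of_windows_above(toy_scores, threshold=10)
print(frac)
```

A fraction close to 1 suggests that a single unlucky autograder run is unlikely; a much lower fraction suggests more training is needed before resubmitting.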

**Extra Credit:** Your extra credit should also be uploaded as two files:
* `trained_model.pkl` should contain your trained model
* `submitted.py` should contain your code that loads and runs your trained model.