## Training Data Display and Analysis
This notebook is for viewing the data gathered during training runs of our agent. We currently support only one training file, but in the future we will probably split the data across multiple files.

#### Data Structure
Each state in the dataset is stored as an array with the following fields:
0. Board State: 2D Array representation of board
1. Reward: Integer reward at end of state
2. Done: True if game is over
3. Starting Anchor: Anchor where the shape dropped
4. End Anchor: Anchor after completing the move
5. Shape: Shape that was dropped
6. Height: How far the blocks are from the top of the board, greater is better
7. Success: Were we able to complete the move?
8. Actions: An array of the actions we took to complete the state
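
Reading fields by position gets hard to follow once several indices are in play, so here is a minimal sketch of wrapping a record in a `namedtuple`. The field names, the `as_state` helper, and the example values (including the tuple form of the anchors and the integer action codes) are assumptions made for illustration; only the field order comes from the list above.

```python
from collections import namedtuple

# Hypothetical field names, chosen to mirror the index list above
State = namedtuple(
    "State",
    ["board", "reward", "done", "start_anchor", "end_anchor",
     "shape", "height", "success", "actions"],
)

def as_state(record):
    """Wrap one raw 9-element record so its fields can be read by name."""
    return State(*record)

# Made-up example record, only to show the access pattern
example = as_state([
    [[0] * 10 for _ in range(20)],        # 0: board, 20 rows x 10 columns
    -1,                                   # 1: reward
    False,                                # 2: done
    (5, 0),                               # 3: starting anchor (assumed format)
    (5, 18),                              # 4: end anchor (assumed format)
    [(0, 0), (-1, 0), (1, 0), (0, -1)],   # 5: shape (the T piece)
    18,                                   # 6: height
    True,                                 # 7: success
    [0, 0, 2],                            # 8: actions (assumed encoding)
])
print(example.reward, example.done, example.shape)
```

Positional access still works on the wrapped record, so `example[1]` and `example.reward` refer to the same value.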

In [8]:
import numpy as np
import matplotlib
from matplotlib import pyplot as plt

BRD_HEIGHT = 20
BRD_WIDTH = 10

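# Each state mixes arrays, numbers, and booleans, so the file is assumed to
# hold an object array, which recent NumPy versions only load with allow_pickle
dat_array = np.load('training_data.npy', allow_pickle=True)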

print("Total number of states in dataset: ", len(dat_array))

Total number of states in dataset:  20325


Now we can compute some average rewards from this dataset.

In [55]:
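# Running average of the reward (index 1) over the dataset, optionally
# restricted to states whose field at index iy equals y (e.g. iy = 5 for shape)
def average(dat, iy = -1, y = None):
    n = len(dat)
    m = 0
    
    if iy < 0:
        # No filter: incremental mean over the reward of every state
        for j in range(n):
            a = 1/(j+1)
            m = m + a*(dat[j][1] - m)
            
    elif y is not None:
        for j in range(n):
            if np.array_equal(dat[j][iy], y):
                # The step size uses the dataset index j, not the number of
                # matches seen so far, so this is a weighted running estimate
                a = 1/(j+1)
                m = m + a*(dat[j][1] - m)
        
    return m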

In [53]:
avgR = average(dat_array)
avgR

-0.39778597785977754
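
The update `m = m + a*(x - m)` gives the arithmetic mean of the first `k` values when `a = 1/k`. As a sketch of how the same update could be applied to a filtered subset, the hypothetical helper below (`filtered_mean`, not part of the notebook) keeps a separate counter of matching states and uses it for the step size. The toy dataset is made up and only fills in the reward and shape fields.

```python
import numpy as np

def filtered_mean(dat, iy=-1, y=None):
    """Arithmetic mean of the reward (index 1) over matching states.

    Hypothetical helper: the step size is 1/count, where count is the
    number of matching states seen so far.
    """
    m = 0.0
    count = 0
    for state in dat:
        # With no filter (iy < 0) every state counts as a match
        if iy < 0 or np.array_equal(state[iy], y):
            count += 1
            m += (state[1] - m) / count
    return m

# Made-up records: only reward (index 1) and shape (index 5) matter here
t_shape = [(0, 0), (-1, 0), (1, 0), (0, -1)]
i_shape = [(0, 0), (0, -1), (0, -2), (0, -3)]
toy = [
    [None, -1, False, None, None, t_shape],
    [None, 2, False, None, None, i_shape],
    [None, -3, False, None, None, t_shape],
]
print(filtered_mean(toy))                    # mean of -1, 2, -3
print(filtered_mean(toy, iy=5, y=t_shape))   # mean of -1, -3
```

With no filter this matches the plain mean of all rewards, so it can be compared against `np.mean` of the reward column as a sanity check.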

Definitions for shapes and their rotations

In [71]:
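# Rotate a shape by a quarter turn: each (i, j) offset maps to (j, -i)
def rot_shape(shape):
    return [(j, -i) for i, j in shape]

# Reward average for one specific orientation of a shape (field index 5)
def shp_avg_ind(shape):
    return average(dat_array, iy = 5, y = shape)

# Sum of the per-orientation averages over the four rotations of a shape
def shp_avg(shape):
    avg = 0
    for i in range(4):
        avg = avg + shp_avg_ind(shape)
        shape = rot_shape(shape)
    return avg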

shapes = {
    'T': [(0, 0), (-1, 0), (1, 0), (0, -1)],
    'J': [(0, 0), (-1, 0), (0, -1), (0, -2)],
    'L': [(0, 0), (1, 0), (0, -1), (0, -2)],
    'Z': [(0, 0), (-1, 0), (0, -1), (1, -1)],
    'S': [(0, 0), (-1, -1), (0, -1), (1, 0)],
    'I': [(0, 0), (0, -1), (0, -2), (0, -3)],
    'O': [(0, 0), (0, -1), (-1, 0), (-1, -1)],
}
shape_names = ['T', 'J', 'L', 'Z', 'S', 'I', 'O']

for name in shape_names:
    print("Mean reward when dropped shape is " + name + ": ", shp_avg(shapes[name]))

Mean reward when dropped shape is T:  -0.49278016001
Mean reward when dropped shape is J:  -0.380693683179
Mean reward when dropped shape is L:  -0.653110381455
Mean reward when dropped shape is Z:  -0.731789845654
Mean reward when dropped shape is S:  -0.871091878069
Mean reward when dropped shape is I:  0.399109375712
Mean reward when dropped shape is O:  -0.318258374206
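
To make the rotation rule concrete, here is a small sketch that lists the four orientations produced by repeatedly applying `rot_shape` and checks that a fourth quarter turn returns the original offsets. The `orientations` helper is hypothetical and exists only for this illustration; `rot_shape` is copied from the cell above.

```python
def rot_shape(shape):
    # Same rule as in the notebook: (i, j) -> (j, -i)
    return [(j, -i) for i, j in shape]

def orientations(shape):
    """Hypothetical helper: the shape followed by its three successive rotations."""
    result = []
    for _ in range(4):
        result.append(shape)
        shape = rot_shape(shape)
    return result

t_shape = [(0, 0), (-1, 0), (1, 0), (0, -1)]
for k, oriented in enumerate(orientations(t_shape)):
    print(k, oriented)

# A fourth quarter turn brings every block back to where it started
assert rot_shape(orientations(t_shape)[3]) == t_shape
```

Because `np.array_equal` compares element by element, a stored shape only matches one of these orientations if it lists its blocks in the same order.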
