# Meaningful Play Score Assigner

This program takes an adjacency matrix describing a topology of non-looping, non-backtracking linear choices and applies q-learning to estimate how meaningful that set of choices is from the perspective of the actor.

## Setup

Initial setup before we begin:

#### Import Statements

All libraries used in this notebook are imported here.

In [2]:
from math import *
from decimal import Decimal
import numpy as np
import xlsxwriter

#### Gather User Input

We'll need to know the input file for the adjacency matrix, the number of non-ending layers, and the number of endings.

TODO: (Can modify layer and ending counts to be automatically calculated from adjacency matrix)

In [3]:
filename = input("Please input the name of the topology .txt file you want to score: ")
layers = int(input("Please input the number of non-ending layers your topology has: "))
endings = int(input("Please input the number of endings in your topology: "))

Please input the name of the topology .txt file you want to score: TopNonIntegrated.txt
Please input the number of non-ending layers your topology has: 5
Please input the number of endings in your topology: 4
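
A minimal sketch of how the TODO above might be automated, assuming the matrix is acyclic (as the topology is described), that state 0 is the start, and that any state with no outgoing edges counts as an ending. The helper name `count_endings_and_layers` is hypothetical and not part of the original notebook.

```python
from functools import lru_cache

import numpy as np


def count_endings_and_layers(adjacency, start=0):
    """Return (endings, layers) for an acyclic adjacency matrix.

    endings: number of states with no outgoing edges.
    layers:  number of edges on the longest path from `start` to an ending,
             i.e. the number of non-ending layers, start included.
    """
    adjacency = np.asarray(adjacency)
    endings = int(np.sum(adjacency.sum(axis=1) == 0))

    @lru_cache(maxsize=None)
    def depth(state):
        # Longest path (in edges) from `state` down to any ending.
        successors = np.flatnonzero(adjacency[state])
        if successors.size == 0:
            return 0
        return 1 + max(depth(int(s)) for s in successors)

    return endings, depth(start)


# Toy topology: Start -> {A, B}, A -> E1, B -> E2
toy = [[0, 1, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]
toy_endings, toy_layers = count_endings_and_layers(toy)
```

For the topology scored in this notebook, the same approach should agree with the values entered by hand (5 layers and 4 endings), provided the assumptions above hold.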


#### Create State Mapping

The states need to be put into a map for identification purposes.

TODO: (Can automate this step based on information from the adjacency matrix)

In [4]:
#Mapping for the states of TopNonIntegrated.txt
location_to_state = {
    'Start' : 0,
    '1D' : 1,
    '1N1' : 2,
    '1N2' : 3,
    '1C' : 4,
    '2D' : 5,
    '2N1' : 6,
    '2N2' : 7,
    '2C' : 8,
    '3D' : 9,
    '3N1' : 10,
    '3N2' : 11,
    '3C' : 12,
    '4D' : 13,
    '4N1' : 14,
    '4N2' : 15,
    '4C' : 16,
    'E1' : 17,
    'E2' : 18,
    'E3' : 19,
    'E4' : 20
}

# Map indices to locations
state_to_location = dict((state,location) for location,state in location_to_state.items())

#### Define q-learning Functions

In [5]:
class QAgent():
    
    def __init__(self, alpha, gamma, location_to_state, rewards, state_to_location, Q):
        """ Initialize alpha, gamma, states, actions, rewards, and Q-values
        """
        self.gamma = gamma  
        self.alpha = alpha 
        
        self.location_to_state = location_to_state
        self.rewards = rewards
        self.state_to_location = state_to_location
        
        self.Q = Q
        
    def training(self, start_location, end_location, iterations):
        """Training the system in the given environment to move from a start state to an end state
        """
        rewards_new = np.copy(self.rewards)
        
        #set reward for end state to 100 to incentivize reaching desired end
        ending_state = self.location_to_state[end_location]
        rewards_new[ending_state, ending_state] = 100

        #Loop for iterations
        for i in range(iterations):
            #Randomly pick a state to observe
            current_state = np.random.randint(0,len(self.rewards)) 
            playable_actions = []

            #Construct list of possible actions
            for j in range(len(self.rewards)):
                if rewards_new[current_state,j] > 0:
                    playable_actions.append(j)

            #Only run updates if observed state has performable actions
            if len(playable_actions) > 0:
                next_state = np.random.choice(playable_actions)

                #Calculate temporal difference
                TD = rewards_new[current_state,next_state] + \
                        self.gamma * self.Q[next_state, np.argmax(self.Q[next_state,])] - self.Q[current_state,next_state]

                #updates Q-value using Bellman equation
                self.Q[current_state,next_state] += self.alpha * TD

        route = [start_location]
        next_location = start_location
        
        # Get the route 
        return self.get_optimal_route(start_location, end_location, next_location, route, self.Q)
        
    # Get the optimal route
    def get_optimal_route(self, start_location, end_location, next_location, route, Q):
        
        while(next_location != end_location):
            starting_state = self.location_to_state[start_location]
            next_state = np.argmax(Q[starting_state,])
            next_location = self.state_to_location[next_state]
            route.append(next_location)
            start_location = next_location
        
        return route
    
#Take set of q-tables and average them into one q-table
def qaverage(table_set):
    num = 0
    output_table = table_set[0].copy()
    for i in range(len(table_set[0][0])):
        for j in range(len(table_set[0])):
            for k in range(len(table_set)):
                num += table_set[k][j][i]
            output_table[j][i] = num / len(table_set)
            num = 0

    return output_table
    
# Initialize parameters
gamma = 0.75 # Discount factor (discounts future rewards)
alpha = 0.9 # Learning rate
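
To make the update inside `training` concrete, here is a small sketch of the same temporal-difference step on a toy three-state chain. The toy reward matrix and the helper name `td_update` are assumptions made for illustration; `alpha` and `gamma` use the values set above.

```python
import numpy as np

alpha, gamma = 0.9, 0.75  # learning rate and discount factor, as in the notebook

# Toy chain: 0 -> 1 -> 2, with the goal state 2 rewarded through a self-loop
toy_rewards = np.array([[0, 1, 0],
                        [0, 0, 1],
                        [0, 0, 100]])
toy_Q = np.zeros((3, 3))


def td_update(Q, rewards, state, next_state, alpha, gamma):
    """Apply one q-learning update in place and return the temporal difference."""
    td = (rewards[state, next_state]
          + gamma * np.max(Q[next_state])
          - Q[state, next_state])
    Q[state, next_state] += alpha * td
    return td


# Goal self-loop: TD = 100 + 0.75 * 0 - 0 = 100, so Q[2, 2] becomes 90
td_goal = td_update(toy_Q, toy_rewards, 2, 2, alpha, gamma)

# Step into the goal: TD = 1 + 0.75 * 90 - 0 = 68.5, so Q[1, 2] becomes 61.65
td_step = td_update(toy_Q, toy_rewards, 1, 2, alpha, gamma)
```

The second update shows how the goal reward propagates backwards: the value of reaching state 2 is discounted by `gamma` before it is credited to the action that leads there.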

## q-learning

With the setup done, we can run the q-learning algorithm and then process its output.

#### Define q-learning Execution Functions

We need a function to drive the q-learning, because training is repeated many times for each ending and the resulting q-tables are averaged.

In [6]:
#Handle all q-learning for a given topology
def qmaster(final_state, output_tables):
    #list to store the final Q-table from each of the 100 training runs
    qtables = []
    num_states = len(location_to_state)
    for i in range(100):
        #each run starts from a fresh, zeroed Q-table and trains for 1000 iterations
        qagent = QAgent(alpha, gamma, location_to_state, rewards, state_to_location,
                        np.zeros([num_states, num_states]))
        qagent.training('Start', final_state, 1000)
        qtables.append(qagent.Q)

    #average the runs into a single Q-table for this ending
    output_tables.append(qaverage(qtables))
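
The element-wise averaging that `qaverage` performs can also be written with NumPy directly. This is only a sketch of an equivalent formulation, assuming every table in the set has the same shape; the name `qaverage_vectorized` is hypothetical.

```python
import numpy as np


def qaverage_vectorized(table_set):
    """Average a list of equally shaped Q-tables element by element."""
    return np.mean(np.stack(table_set), axis=0)


sample_tables = [np.array([[0.0, 2.0], [4.0, 6.0]]),
                 np.array([[2.0, 4.0], [6.0, 8.0]])]
sample_average = qaverage_vectorized(sample_tables)
# Each cell is the mean of the matching cells: [[1, 3], [5, 7]]
```

Stacking first gives an array of shape (runs, states, states), so taking the mean over axis 0 collapses the runs and leaves one table.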

#### Open File

Now we open the file and parse the comma-separated adjacency matrix into a NumPy array, which also serves as the reward matrix for q-learning.

In [7]:
#get topology from file
with open(filename) as textFile:
    rewards = np.array([[int(digit) for digit in line.strip().split(",")] for line in textFile])
    
print(rewards)

[[0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
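
As an alternative sketch, the same comma-separated format could be read with `np.loadtxt`, with a shape check added so a malformed file fails early. The helper name `load_adjacency` is hypothetical, and an in-memory string stands in for the topology file here.

```python
import io

import numpy as np


def load_adjacency(source):
    """Read a comma-separated adjacency matrix and check that it is square."""
    matrix = np.loadtxt(source, delimiter=",", dtype=int, ndmin=2)
    if matrix.shape[0] != matrix.shape[1]:
        raise ValueError(f"adjacency matrix must be square, got {matrix.shape}")
    return matrix


# In-memory stand-in for a small topology file
sample_file = io.StringIO("0,1,1\n0,0,0\n0,0,0\n")
sample_matrix = load_adjacency(sample_file)
```

`np.loadtxt` accepts either a filename or a file-like object, so the same function could be called with `filename` from the input cell.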


#### Run q-learning algorithm

Finally, we can run `qmaster` once per ending and store the output.

In [49]:
#a list to hold the outputs of qmaster, one averaged Q-table per ending
averaged_tables = []

#run qmaster for each ending state
for i in range(endings):
    qmaster('E' + str(i + 1), averaged_tables)

#### Get weights

Calculate and apply weights to each of the averaged q-tables

In [50]:
#Calculate weights
def weight_calculator(layers):
    #get slope
    slope = 1 / (layers * layers)
    #get sum (named total to avoid shadowing the built-in sum)
    total = 0
    for i in range(layers):
        total += (i * slope)
    
    #get amount to add so the weights sum to 1
    toAdd = (1 - total) / layers
    
    #Finally, set up and return array of weights
    weights = []
    for i in range(layers):
        weights.append((i * slope) + toAdd)
        
    #print(weights)
    return weights

#Zero out the ending-state rows, then apply the weighting function to give high scores to early states
def apply_weights_helper(array, layers, endings):
    for i in range(1, endings + 1):
        for j in range(len(array[0])):
            array[-1*i][j] = 0
    
    weights = weight_calculator(int(layers))
    weights.reverse()
    apply_weights(array, weights, 0, 0)
    
def apply_weights(array, weights, x, level):
    for i in range(len(array[x])):
        if(level < len(weights)):
            array[x][i] = array[x][i] * weights[level]
        if(array[x][i] > 0):
            apply_weights(array, weights, i, (level + 1))
            
for i in range(len(averaged_tables)):
    apply_weights_helper(averaged_tables[i], layers, endings)
    
print(averaged_tables)

[array([[0.00000000e+00, 2.70962589e+01, 2.74005871e+01, 2.72985645e+01,
        2.73669342e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 3.10249597e+01, 6.56249040e-01, 6.56249867e-01,
        6.56249842e-01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 3.10251726e+01, 6.56249095e-01, 6.56242474e-01,
        6.56247222e-01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.000
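
To see what `weight_calculator` produces, here is a compact restatement of the same arithmetic worked through for five layers. The name `layer_weights` is hypothetical; the formula follows the function above (a linear ramp with slope 1 / layers², shifted so the weights sum to 1).

```python
def layer_weights(layers):
    """Linearly increasing weights that sum to 1."""
    slope = 1 / (layers * layers)
    # Shift every weight by the same offset so the total equals 1
    offset = (1 - slope * sum(range(layers))) / layers
    return [i * slope + offset for i in range(layers)]


weights_5 = layer_weights(5)
# slope = 0.04, ramp total = 0.4, offset = 0.6 / 5 = 0.12
# so the weights are roughly [0.12, 0.16, 0.20, 0.24, 0.28]

# apply_weights_helper reverses the list, so the first layer gets the largest weight
reversed_weights_5 = list(reversed(weights_5))
```

After the reversal, the choice made at the start is scaled by about 0.28 and the last non-ending layer by about 0.12.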

#### Normalizing

Before we take the pairwise Minkowski distance to get our score, we want to normalize our weighted q-tables.
For each table, the positive entries in the non-ending rows are passed through the softmax function together, so the normalized values of each table sum to 1.

In [51]:
#Softmax implementation modified from
#https://intellipaat.com/community/942/how-to-implement-the-softmax-function-in-python

def softmax(x): 
    """Compute softmax values for the set of scores in x.""" 
    e_x = np.exp(x - np.max(x)) 
    return (e_x / e_x.sum(axis=0)).tolist()

def normalize(array, endings):
    processedList = []
    indices = []
    pair = []
    for i in range(len(array) - endings):
        for j in range(len(array)):
            if(array[i][j] > 0):
                pair.append(i)
                pair.append(j)
                indices.append(pair.copy())
                pair = []
                processedList.append(array[i][j])
        
    processedList = softmax(processedList)
    for j in range(len(indices)):
        array[indices[j][0]][indices[j][1]] = processedList[j]
        
            
#generates excel spreadsheet containing all q-tables in a given path
def to_excel(qtables):
    """store data in excel
    """
    workbook = xlsxwriter.Workbook(filename + '.xlsx')

    #write each q-table to its own worksheet
    for i in range(len(qtables)):
        worksheet = workbook.add_worksheet()
        for j in range(len(qtables[i])):
            for k in range(len(qtables[i])):
                worksheet.write(j, k, qtables[i][j][k])

    workbook.close()
            
for i in range(len(averaged_tables)):
    normalize(averaged_tables[i], endings)

#write the workbook once, after every table has been normalized
to_excel(averaged_tables)
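
The effect of `normalize` can be illustrated on a toy table. This sketch uses a boolean mask instead of the index list, and assumes (as in the weighted tables above) that the ending rows have already been zeroed; the toy values are made up for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1D collection of scores."""
    x = np.asarray(x, dtype=float)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()


toy_table = np.array([[0.0, 2.0, 1.0],
                      [0.0, 0.0, 3.0],
                      [0.0, 0.0, 0.0]])  # last row plays the role of an ending

# Softmax is taken over all positive entries of the table at once
positive = toy_table > 0
toy_normalized = toy_table.copy()
toy_normalized[positive] = softmax(toy_table[positive])
```

Because the softmax runs over the whole table rather than row by row, the normalized entries of a table sum to 1 and zero entries stay zero.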

#### Calculate Minkowski Distances and Output the Score

We're finally ready to calculate the pairwise Minkowski distances (with p = 1, each halved) between the normalized tables and output their average as the meaningfulness score.

In [52]:
#Minkowski Distance implementation altered from
#https://www.geeksforgeeks.org/minkowski-distance-python/

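#convert array into 1D vector for ease of manipulation
def vectorize(input_array):
    output_array = []
    #iterate over the table's actual dimensions instead of a hard-coded 21
    for i in range(len(input_array)):
        for j in range(len(input_array[i])):
            output_array.append(input_array[i][j])
            
    return output_array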

#Calculate Minkowski distance between arrays

#Return the given root of value, rounded to 3 decimal places
#(for example, root = 2 gives the square root)
def p_root(value, root): 
    root_value = 1 / float(root) 
    return round(Decimal(value) ** Decimal(root_value), 3) 
  
def minkowski_distance(x, y, p_value): 
    #sum |a - b|^p over the paired elements, then take the p-th root
    return p_root(sum(pow(abs(a - b), p_value) 
                      for a, b in zip(x, y)), p_value)

#vectorize averaged tables
vectors = []
for i in range(len(averaged_tables)):
    vectors.append(vectorize(averaged_tables[i]))
    
#calculate minkowski distances in a pairwise fashion
distances = []
acc = 0
for i in range(len(averaged_tables) - 1):
    for j in range(i + 1, len(vectors)):
        distance = minkowski_distance(vectors[i], vectors[j], 1) / 2
        distances.append(distance)
        acc += distance

print(acc / len(distances))

0.9764166666666666666666666667
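
To help read the score, here is a small sketch of the quantity computed for each pair: half of the Minkowski distance with p = 1. When both vectors are non-negative and sum to 1, as the normalized tables do, this value lies between 0 (identical) and 1 (no overlap). The helper name `half_l1_distance` and the toy vectors are assumptions for illustration, and the sketch skips the rounding done by `p_root`.

```python
import numpy as np


def half_l1_distance(p, q):
    """Half of the Minkowski distance with p = 1 between two vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.sum(np.abs(p - q))


# Identical distributions: distance 0
d_identical = half_l1_distance([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])

# Half of the mass moved elsewhere: distance 0.5
d_partial = half_l1_distance([0.5, 0.5, 0.0], [0.5, 0.0, 0.5])

# No shared mass at all: distance 1
d_disjoint = half_l1_distance([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

Under this reading, a score near 1 like the one printed above suggests that the q-tables learned for different endings place their normalized weight on largely different choices.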
