# Meaningful Play Score Assigner

This program takes a topology of non-looping, non-backtracking linear choices (drawn as a graph and saved as XML), builds its adjacency matrix, and applies q-learning to estimate how meaningful the set of choices would be from the perspective of the actor.

## Setup

Initial setup before we begin:

#### Import Statements

All libraries used in this notebook are imported here.

In [37]:
from math import *
from decimal import Decimal
import numpy as np
import random
import xlsxwriter
import xml.dom.minidom as minidom

#### Gather User Input

We'll need to know the input file for the graph, the number of non-ending layers, and the number of endings.

TODO: The layer and ending counts could be calculated automatically from the adjacency matrix instead of being entered by hand.
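One possible way to address the TODO, sketched under stated assumptions: the topology is acyclic, every node without outgoing edges is an ending, and `layers` means the number of edges on the longest path from the start state (which equals the number of non-ending nodes on that path). `infer_counts` is a hypothetical helper, not part of the notebook, and expects the adjacency matrix built in the next section. On the matrix printed for `OnlyIntegrated.xml`, the start row (index 0) leads through paths such as 0 → 17 → 1 → 2 → 3 → 4, so it should reproduce `layers = 5` and `endings = 4`.

```python
import numpy as np


def infer_counts(adjacency, start):
    """Hypothetical helper: return (layers, endings) for an acyclic choice graph.

    endings = nodes with no outgoing edges (assumed to be exactly the endings)
    layers  = number of edges on the longest path from `start`
    """
    adjacency = np.asarray(adjacency)
    endings = int(np.sum(~adjacency.any(axis=1)))

    def longest(node):
        # depth-first search for the longest path; fine for small acyclic graphs
        successors = np.flatnonzero(adjacency[node])
        if successors.size == 0:
            return 0
        return 1 + max(longest(int(s)) for s in successors)

    return longest(start), endings


# Small hypothetical topology: start 0 branches to 1 and 2, each leading to one ending
example = [[0, 1, 1, 0, 0],
           [0, 0, 0, 1, 0],
           [0, 0, 0, 0, 1],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
print(infer_counts(example, 0))  # (2, 2)
```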

In [38]:
filename = "OnlyIntegrated.xml"  # input("Please input the name of the topology .xml file you want to score: ")
layers = 5  # int(input("Please input the number of non-ending layers your topology has: "))
endings = 4  # int(input("Please input the number of endings in your topology: "))

#### Create State Mapping and Adjacency matrix

Each node (an `ellipse` cell in the XML) is mapped to a state index for identification, and each edge (an `endArrow` cell) becomes a 1 in the adjacency matrix, so both can be generated from the same file.
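A minimal sketch of the same parsing rules on a tiny, hand-written XML string. The data is hypothetical and only loosely modeled on the mxGraph cell format (as used by draw.io) that the notebook reads; a real exported file wraps these cells in more elements. Cells whose `style` starts with `ellipse` become nodes, and cells whose `style` starts with `endArrow` become directed edges from `source` to `target`. It uses `minidom.parseString` instead of `minidom.parse` so it runs without a file; `adjacency_from_xml` is an illustrative name.

```python
import numpy as np
import xml.dom.minidom as minidom

# Hypothetical, simplified XML: ellipses are choice nodes, endArrow cells are edges
SAMPLE_XML = """<mxGraphModel><root>
  <mxCell id="a" value="start" style="ellipse;" vertex="1"/>
  <mxCell id="b" value="choice" style="ellipse;" vertex="1"/>
  <mxCell id="c" value="ending_1" style="ellipse;" vertex="1"/>
  <mxCell id="e1" style="endArrow=classic;" edge="1" source="a" target="b"/>
  <mxCell id="e2" style="endArrow=classic;" edge="1" source="b" target="c"/>
</root></mxGraphModel>"""


def adjacency_from_xml(xml_text):
    """Return (id -> index map, 0/1 adjacency matrix) using the notebook's style-prefix rules."""
    doc = minidom.parseString(xml_text)
    cells = doc.getElementsByTagName("mxCell")
    # getAttribute returns "" when the attribute is missing, so startswith is safe
    nodes = [c for c in cells if c.getAttribute("style").startswith("ellipse")]
    edges = [c for c in cells if c.getAttribute("style").startswith("endArrow")]
    index = {n.getAttribute("id"): i for i, n in enumerate(nodes)}
    matrix = np.zeros((len(nodes), len(nodes)), dtype=int)
    for e in edges:
        matrix[index[e.getAttribute("source")], index[e.getAttribute("target")]] = 1
    return index, matrix


index, matrix = adjacency_from_xml(SAMPLE_XML)
print(index)  # {'a': 0, 'b': 1, 'c': 2}
print(matrix)
```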

In [39]:
#info: https://www.guru99.com/manipulating-xml-with-python.html#3

#Mapping for the states
location_to_state = {}

#Map from every node's raw XML id to its index, used to resolve edge endpoints (unlike location_to_state, start and ending nodes are not renamed)
tempMap = {}

doc = minidom.parse(filename)

#load in all the relevant elements to a list called "cells"
cells = doc.getElementsByTagName("mxCell")

#list to hold nodes
nodes = []
#list to hold connections between nodes
connections = []
#Number to keep track of how many endings have been added
endingIndex = 1

#loop through cells to sort ellipses and endArrows into their respective lists
for label in cells:
    if(label.getAttribute("style") != ""):
        if(label.getAttribute("style").startswith("ellipse")):
            nodes.append(label)
        elif(label.getAttribute("style").startswith("endArrow")):
            connections.append(label)

#declare empty adjacency matrix to populate with data on the graph
rewards = []

#populate matrix and map simultaneously as you iterate through the nodes
for i in range(len(nodes)):
    rewards.append([])
    if "ending" in nodes[i].getAttribute("value"):
        location_to_state["E" + str(endingIndex)] = i
        print("E" + str(endingIndex) + " is: " + nodes[i].getAttribute("value"))
        endingIndex = endingIndex + 1
    elif "start" in nodes[i].getAttribute("value"):
        location_to_state["Start"] = i
    else:
        location_to_state[nodes[i].getAttribute("id")] = i
    
    tempMap[nodes[i].getAttribute("id")] = i
    
    for j in range(len(nodes)):
        rewards[i].append(0)

#finish populating the adjacency matrix with data on the connections between nodes
for link in connections:
    x = tempMap[link.getAttribute("source")]
    y = tempMap[link.getAttribute("target")]
    rewards[x][y] = 1

#convert matrix into nparray and print it and the map for manual inspection
rewards = np.asarray(rewards)
print("Matrix:\n")
print(rewards)
print("\nMap:\n")
print(location_to_state)

print(tempMap)

# Map indices to locations
state_to_location = dict((state,location) for location,state in location_to_state.items())

E1 is: ending_1
E2 is: ending_2
E3 is: ending_3
E4 is: ending_4
Matrix:

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1]
 [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0

#### Define q-learning Functions

In [40]:
class QAgent():
    
    def __init__(self, alpha, gamma, epsilon, location_to_state, rewards, state_to_location, Q):
        """ Initialize alpha, gamma, epsilon, states, actions, rewards, and Q-values
        """
        self.gamma = gamma  
        self.alpha = alpha 
        self.epsilon = epsilon
        
        self.location_to_state = location_to_state
        self.rewards = rewards
        self.state_to_location = state_to_location
        
        self.Q = Q
        
    def training(self, start_location, end_location, iterations):
        """Training the system in the given environment to move from a start state to an end state
        """
        rewards_new = np.copy(self.rewards)
        
        #set reward for end state to 100 to incentivize reaching desired end
        ending_state = self.location_to_state[end_location]
        rewards_new[ending_state, ending_state] = 100

        #Loop for specified number of iterations
        for i in range(iterations):
            
            #initialize current state as the start state
            #current_state = self.location_to_state[start_location]
            
            #Randomly pick a state to observe
            current_state = np.random.randint(0,len(self.rewards)) 
            
            #step counter, so a route that never reaches an end is cut off after `limit` steps
            counter = 0
            
            #infinite loop runs through route, updating until it hits a dead-end
            while(True):
                
                #Iterate counter. If it hits the limit, break the loop and print error message
                counter += 1
                if(counter == limit):
                    print("Error: Hit limit before reaching an end while training")
                    break
                
                #Construct list of possible actions
                playable_actions = []
                
                for j in range(len(self.rewards)):
                    if rewards_new[current_state,j] > 0:
                        playable_actions.append(j)

                #Only run updates if observed state has performable actions
                if playable_actions:
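                    #Decide whether to random walk (explore) or follow the standard policy (exploit)
                    if random.uniform(0, 1) < self.epsilon:
                        next_state = np.random.choice(playable_actions)
                    else:
                        #Note: the greedy step follows the largest immediate reward; standard q-learning would take the argmax of Q over the playable actions
                        next_state = np.argmax(rewards_new[current_state,])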

                    #Calculate temporal difference
                    TD = rewards_new[current_state,next_state] + \
                            self.gamma * np.max(self.Q[next_state,]) - self.Q[current_state,next_state]

                    #updates Q-value using Bellman equation
                    self.Q[current_state,next_state] += self.alpha * TD
                    
                    #only the desired ending has a self-loop, so staying put means the agent has arrived
                    if next_state == current_state:
                        break
                    
                    #update current state to move forward
                    current_state = next_state
                else:
                    break

        route = [start_location]
        next_location = start_location
        
        # Get the route 
        return self.get_optimal_route(start_location, end_location, next_location, route, self.Q)
        
    # Get the optimal route
    def get_optimal_route(self, start_location, end_location, next_location, route, Q):
        
        #set counter to break if it learns wrong route
        counter = 0
        
        while(next_location != end_location):
            #Iterate counter. If it hits the limit, break the loop and print error message
            counter += 1
            if(counter >= limit):
                print("Error: Failed to learn correct route")
                return None
            
            starting_state = self.location_to_state[start_location]
            next_state = np.argmax(Q[starting_state,])
            next_location = self.state_to_location[next_state]
            route.append(next_location)
            start_location = next_location            

        return route
    
#Take set of q-tables and average them into one q-table
def qaverage(table_set):
    num = 0
    output_table = table_set[0].copy()
    for i in range(len(table_set[0][0])):
        for j in range(len(table_set[0])):
            for k in range(len(table_set)):
                num += table_set[k][j][i]
            output_table[j][i] = num / len(table_set)
            num = 0

    return output_table
    
# Initialize parameters
gamma = 0.75 # Discount factor (discounts future rewards)
alpha = 0.9 # Learning rate
epsilon = 0.2 # Exploration rate: probability of taking a random action instead of the greedy one

limit = 100000 # Maximum number of steps in one episode or route before giving up
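The training loop above applies the one-step q-learning update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$, where in this notebook an action is simply the index of the next state, so $s' = a$. The sketch below runs that same update on a hypothetical three-node chain (0 → 1 → 2, with the ending 2 carrying the self-loop reward of 100) so the result can be checked by hand: with $\gamma = 0.75$ the ending's self-loop should converge to $100 / (1 - \gamma) = 400$, the edge into it to $1 + 0.75 \cdot 400 = 301$, and the first edge to $1 + 0.75 \cdot 301 = 226.75$. Start states are cycled deterministically instead of drawn at random, and there is only one playable action per state, so no exploration is needed here.

```python
import numpy as np


def q_update(Q, R, s, a, alpha, gamma):
    """One temporal-difference update; the action index a is also the next state."""
    td = R[s, a] + gamma * np.max(Q[a]) - Q[s, a]
    Q[s, a] += alpha * td


# Hypothetical chain 0 -> 1 -> 2; node 2 is the target ending with a self-loop reward of 100
R = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 100]], dtype=float)
Q = np.zeros_like(R)
alpha, gamma = 0.9, 0.75

for episode in range(2000):
    s = episode % 3                  # cycle through start states deterministically
    while True:
        a = int(np.argmax(R[s]))     # only one playable action per state in this toy graph
        q_update(Q, R, s, a, alpha, gamma)
        if a == s:                   # a self-loop means the ending was reached
            break
        s = a

print(np.round(Q, 2))
```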

## q-learning

Now that the setup is done, we can run the q-learning algorithm and then process its output.

#### Define q-learning Execution Functions

We need a couple of functions to handle our q-learning, since each ending is trained 10 times and the resulting q-tables are averaged.

In [41]:
#Handle all q-learning for a given topology
def qmaster(final_state, output_tables):
    #list to store the final Q-table of each of the 10 training runs (100000 iterations each)
    qtables = []
    for i in range(10):
        qagent = QAgent(alpha, gamma, epsilon, location_to_state, rewards, state_to_location,
                        np.zeros((len(location_to_state), len(location_to_state))))
        #TODO: REMOVE
        print("Starting training with ending: ", final_state)
    
        training_results = qagent.training("Start", final_state, 100000)
        
        #TODO: REMOVE
        print("Done with training")
        if training_results is not None:
            qtables.append(qagent.Q)

    if len(qtables) > 0:
        output_tables.append(qaverage(qtables))
        
#generates excel spreadsheet containing all q-tables in a given path
def to_excel(qtables, excel_name):
    """store data in excel
    """
    workbook = xlsxwriter.Workbook(excel_name + ".xlsx")

    #write each q-table to another worksheet
    for i in range(len(qtables)):
        worksheet = workbook.add_worksheet()
        for j in range(len(qtables[i])):
            for k in range(len(qtables[i])):
                worksheet.write(j, k, qtables[i][j][k])

    workbook.close()
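`qaverage` walks every cell with three nested loops, but what it computes is an element-wise mean over the list of Q-tables. A minimal NumPy equivalent, shown on made-up 2×2 tables (`average_tables` is an illustrative name), is sketched below. Note that `qmaster` only appends a table when `training` returns a route, so runs that failed to learn a route never enter the average.

```python
import numpy as np


def average_tables(tables):
    """Element-wise mean of equally shaped Q-tables, the same quantity qaverage computes."""
    return np.mean(np.stack([np.asarray(t, dtype=float) for t in tables]), axis=0)


# Three hypothetical 2x2 Q-tables from repeated training runs
runs = [[[0.0, 2.0], [4.0, 0.0]],
        [[0.0, 4.0], [2.0, 0.0]],
        [[0.0, 0.0], [0.0, 0.0]]]
averaged = average_tables(runs)
print(averaged)
```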

#### Run q-learning algorithm

Finally, we can run `qmaster` once per ending and write the averaged q-tables to a spreadsheet.

In [42]:
#an array to hold the outputs of q-master.
averaged_tables = []

#run qmaster for each ending state
for i in range(endings):
    qmaster("E" + str(i + 1), averaged_tables)
    
to_excel(averaged_tables, "Raw")

Starting training with ending:  E1
Done with training
Starting training with ending:  E1
Done with training
Starting training with ending:  E1
Done with training
Starting training with ending:  E1
Done with training
Starting training with ending:  E1
Done with training
Starting training with ending:  E1
Done with training
Starting training with ending:  E1
Done with training
Starting training with ending:  E1
Done with training
Starting training with ending:  E1
Done with training
Starting training with ending:  E1
Done with training
Starting training with ending:  E2
Done with training
Starting training with ending:  E2
Done with training
Starting training with ending:  E2
Done with training
Starting training with ending:  E2
Done with training
Starting training with ending:  E2
Done with training
Starting training with ending:  E2
Done with training
Starting training with ending:  E2
Done with training
Starting training with ending:  E2
Done with training
Starting training with endin

#### Get weights

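Calculate linearly increasing weights that sum to 1, reverse them so the earliest choice layer gets the largest weight, and apply them to each averaged q-table (after zeroing its diagonal, which holds the ending reward values) by walking the graph depth-first from the start state.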
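`weight_calculator` builds $L$ evenly spaced weights $i / L^2$ and then shifts all of them by the same amount so they sum to 1. Since $\sum_{i=1}^{L} i/L^2 = (L+1)/(2L)$, the shift is $(L-1)/(2L^2)$ and each weight works out to $w_i = (2i + L - 1)/(2L^2)$. The sketch below checks this closed form with exact fractions (an illustrative choice, not what the notebook uses) against the values printed by the cell for $L = 5$; the printed `0.15999999999999998` is just floating-point rounding of 0.16.

```python
from fractions import Fraction


def closed_form_weights(layers):
    """w_i = (2i + L - 1) / (2 L^2) for i = 1..L, a rearrangement of weight_calculator."""
    L = layers
    return [Fraction(2 * i + L - 1, 2 * L * L) for i in range(1, L + 1)]


w = closed_form_weights(5)
print([float(x) for x in w])  # [0.12, 0.16, 0.2, 0.24, 0.28]
print(sum(w))                 # 1
```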

In [43]:
#Calculate weights
def weight_calculator(layers):
    #get slope
    slope = 1 / (layers * layers)
    #get the sum of the unshifted weights
    total = 0
    for i in range(1, layers + 1):
        total += (i * slope)
    
    #get amount to add to each weight so that they sum to 1
    toAdd = (1 - total) / layers
    
    #Finally, set up and return array of weights
    weights = []
    for i in range(1, layers + 1):
        weights.append((i * slope) + toAdd)
        
    print(weights)
    return weights

#list of visited nodes for weighting function to ignore
visited = []

#Apply weighting function to give higher weights to earlier states
def apply_weights_helper(array, layers, endings):
    global visited
    #zero out the diagonal of the array to remove the ending reward values (zeroing links into the start state is disabled below)
    for i in range(len(array)):
        array[i][i] = 0
        #array[i][0] = 0
    
    weights = weight_calculator(int(layers))
    weights.reverse()
    apply_weights(array, weights, location_to_state['Start'], 0)
    visited = []
    
def apply_weights(array, weights, x, level):
    for i in range(len(array[x])):
        if(level < len(weights)):
            array[x][i] = array[x][i] * weights[level]
        if(array[x][i] > 0 and i not in visited):
            print("Going to:" + str(x) + "," + str(i))
            visited.append(i)
            apply_weights(array, weights, i, (level + 1))
       
for i in range(len(averaged_tables)):
    apply_weights_helper(averaged_tables[i], layers, endings)
    
to_excel(averaged_tables, "Weighted")

[0.12, 0.15999999999999998, 0.19999999999999998, 0.24, 0.28]
Going to:0,17
Going to:17,1
Going to:1,2
Going to:2,3
Going to:3,4
Going to:0,18
Going to:18,5
Going to:5,6
Going to:6,7
Going to:7,8
Going to:0,19
Going to:19,9
Going to:9,10
Going to:10,11
Going to:11,12
Going to:0,20
Going to:20,13
Going to:13,14
Going to:14,15
Going to:15,16
[0.12, 0.15999999999999998, 0.19999999999999998, 0.24, 0.28]
Going to:0,17
Going to:17,1
Going to:1,2
Going to:2,3
Going to:3,4
Going to:0,18
Going to:18,5
Going to:5,6
Going to:6,7
Going to:7,8
Going to:0,19
Going to:19,9
Going to:9,10
Going to:10,11
Going to:11,12
Going to:0,20
Going to:20,13
Going to:13,14
Going to:14,15
Going to:15,16
[0.12, 0.15999999999999998, 0.19999999999999998, 0.24, 0.28]
Going to:0,17
Going to:17,1
Going to:1,2
Going to:2,3
Going to:3,4
Going to:0,18
Going to:18,5
Going to:5,6
Going to:6,7
Going to:7,8
Going to:0,19
Going to:19,9
Going to:9,10
Going to:10,11
Going to:11,12
Going to:0,20
Going to:20,13
Going to:13,14
Going t

#### Normalizing

Before we take the pairwise Minkowski distance to get our score, we want to normalize our weighted q-tables.
We do this by collecting all positive entries of each table and running them together through the softmax function, so that each table's nonzero entries sum to 1.
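The notebook's `normalize` collects every positive entry of a table in row-major order, applies one softmax to that whole list, and writes the results back. The same operation can be sketched with a boolean mask (`normalize_positive` is a hypothetical name); NumPy boolean indexing reads and writes entries in the same row-major order as the nested loops. Because softmax exponentiates, entries much smaller than a table's largest Q-values end up near zero, which is why many of the AFTER values printed by the cell are on the order of 1e-16.

```python
import numpy as np


def softmax(values):
    """Numerically stable softmax: subtracting the max does not change the result."""
    shifted = np.exp(values - np.max(values))
    return shifted / shifted.sum()


def normalize_positive(table):
    """Replace every positive entry with its softmax over all positive entries of the
    whole table (row-major order, as in the notebook's normalize); zeros stay zero."""
    out = np.array(table, dtype=float)
    mask = out > 0
    if mask.any():                 # softmax of an empty list is undefined
        out[mask] = softmax(out[mask])
    return out


example = np.array([[0.0, 1.0, 1.0],
                    [0.0, 0.0, 2.0],
                    [0.0, 0.0, 0.0]])
normalized = normalize_positive(example)
print(normalized.sum())  # 1.0 up to floating-point rounding
```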

In [44]:
#Softmax implementation modified from
#https://intellipaat.com/community/942/how-to-implement-the-softmax-function-in-python

def softmax(x): 
    """Compute softmax values for each set of scores in x.""" 
    e_x = np.exp(x - np.max(x))
    return (e_x / e_x.sum(axis=0)).tolist()

def normalize(array, endings):
    processedList = []
    indices = []
    pair = []
    for i in range(len(array)):
        for j in range(len(array)):
            if(array[i][j] > 0):
                pair.append(i)
                pair.append(j)
                indices.append(pair.copy())
                pair = []
                processedList.append(array[i][j])
        
    print("BEFORE--------------------------------------")
    print(processedList)
    processedList = softmax(processedList)
    print("AFTER---------------------------------------")
    print(processedList)
    for j in range(len(indices)):
        array[indices[j][0]][indices[j][1]] = processedList[j]
            
for i in range(len(averaged_tables)):
    normalize(averaged_tables[i], endings)
    
to_excel(averaged_tables, "Normalized")

BEFORE--------------------------------------
[27.432343749999994, 0.85421875, 0.85421875, 0.85421875, 34.21249999999999, 36.27999999999999, 36.11999999999999, 0.46249999999999997, 0.27999999999999997, 0.12, 0.46249999999999997, 0.27999999999999997, 0.12, 0.46249999999999997, 0.27999999999999997, 0.12, 31.031249999999993, 0.65625, 0.65625, 0.65625]
AFTER---------------------------------------
[7.243702111673905e-05, 2.076000985538585e-16, 2.076000985538585e-16, 2.076000985538585e-16, 0.06375951839885427, 0.5040212758207727, 0.4294985996974986, 1.4031569889271156e-16, 1.16908885619479e-16, 9.962318075559989e-17, 1.4031569889271156e-16, 1.16908885619479e-16, 9.962318075559989e-17, 1.4031569889271156e-16, 1.16908885619479e-16, 9.962318075559989e-17, 0.0026481690617552755, 1.7031418459713627e-16, 1.7031418459713627e-16, 1.7031418459713627e-16]
BEFORE--------------------------------------
[0.85421875, 27.432343749999994, 0.85421875, 0.85421875, 0.46249999999999997, 0.27999999999999997, 0.12,

#### Calculate Minkowski Distances and Output the Score

We're finally ready to calculate the pairwise Minkowski distances between the normalized q-tables and output their mean as the meaningfulness score.
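The scoring step can be sketched compactly with NumPy and `itertools.combinations`. This is a simplified restatement, not the notebook's code: it skips the rounding to 3 decimals that `p_root` applies, and the function names are illustrative. With `p = 1` the Minkowski distance is the Manhattan distance, and because each normalized table's entries are nonnegative and sum to 1, the halved distance lies between 0 (identical tables) and 1 (no nonzero entries in common); for `p = 1` it is the total variation distance between the two normalized tables. A score near 1, like the 1.000 printed below, therefore suggests that the endings' normalized tables put almost all of their mass on different entries.

```python
import itertools

import numpy as np


def minkowski(x, y, p):
    """Minkowski distance (sum_i |x_i - y_i|^p)^(1/p); p = 1 is the Manhattan distance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


def meaningfulness_score(tables, p=1):
    """Mean of the halved pairwise Minkowski distances between flattened tables."""
    vectors = [np.ravel(np.asarray(t, dtype=float)) for t in tables]
    distances = [minkowski(a, b, p) / 2 for a, b in itertools.combinations(vectors, 2)]
    return sum(distances) / len(distances)


# Two hypothetical normalized tables with no nonzero entries in common
a = [[1.0, 0.0], [0.0, 0.0]]
b = [[0.0, 1.0], [0.0, 0.0]]
print(meaningfulness_score([a, b]))     # 1.0
print(meaningfulness_score([a, b, a]))  # mean of the pair distances 1.0, 0.0, 1.0
```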

In [45]:
#Minkowski Distance implementation altered from
#https://www.geeksforgeeks.org/minkowski-distance-python/

#convert array into 1D vector for ease of manipulation
def vectorize(input_array):
    output_array = []
    for i in range(len(input_array)):
        for j in range(len(input_array[0])):
            output_array.append(input_array[i][j])
            
    return output_array

#Calculate Minkowski distance between arrays

# Take the p-th root of a value (p is the root value),
# rounded to 3 decimal places
def p_root(value, root): 
      
    root_value = 1 / float(root) 
    return round (Decimal(value) **
             Decimal(root_value), 3) 
  
def minkowski_distance(x, y, p_value): 
    # sum |a - b|^p over all coordinate pairs,
    # then take the p-th root with p_root
    return (p_root(sum(pow(abs(a-b), p_value) 
            for a, b in zip(x, y)), p_value))

#vectorize averaged tables
vectors = []
for i in range(len(averaged_tables)):
    vectors.append(vectorize(averaged_tables[i]))
    
#calculate minkowski distances in a pairwise fashion
distances = []
acc = 0
for i in range(len(averaged_tables) - 1):
    for j in range(i + 1, len(vectors)):
        #Note: the distance is divided by 2 because when two topologies differ at a node,
        #the q-values differ in two places, so the Minkowski distance counts that difference twice
        distance = minkowski_distance(vectors[i], vectors[j], 1) / 2
        distances.append(distance)
        acc += distance
        
print((acc / len(distances)))

1.000
