# EVAC-2 Code And Details
This notebook contains all of the code together with an explanation of each part. The code is split into blocks, organised by purpose.

Agents are adapted throughout gameplay by a neural network whose weights are evolved generation by generation. Further detail is given alongside the relevant code sections of this report.

## Agent Representation
Each agent must belong to a group. Groups are represented as integers:
* `0` - Saints
* `1` - Buddies
* `2` - Fight Club
* `3` - Vandals

Each agent is represented by a class, which keeps track of all important agent attributes:
* `wealth` - The total wealth of the agent.
* `startingGroup` - The initial group assignment of the agent, randomly allocated when the agent is first created. The agent is returned to this group at the start of each generation.
* `group` - The current group assignment of the agent.
* `gameCount` - The number of games played by the agent.
* `weights` - List of floats used in the neural network to make a decision on which group to join. Evolved by the evolutionary algorithm.
* `fitness` - Current fitness of the agent. 

A game is played by calling the `addPayoff()` method with the opponent as its only parameter. It is called once for each of the two agents chosen in a single game, as each agent acts as the opponent of the other. Representing each group as an integer makes payoff calculation a simple table lookup: `[[4,0,4,0],[6,4,6,1],[4,0,1,0],[6,1,6,0]]`. The row is the agent's own group and the column is the opponent's group, giving 16 possible interactions in total.
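The lookup described above can be sketched on its own as follows. The names `PAYOFFS` and `payoff` are hypothetical and used only for illustration; the table values and the group numbering are taken from this notebook.

```python
# Row = the agent's own group, column = the opponent's group.
# 0 = Saints, 1 = Buddies, 2 = Fight Club, 3 = Vandals
PAYOFFS = [
    [4, 0, 4, 0],  # Saints
    [6, 4, 6, 1],  # Buddies
    [4, 0, 1, 0],  # Fight Club
    [6, 1, 6, 0],  # Vandals
]


def payoff(own_group, opponent_group):
    """Return the payoff for an agent in own_group meeting opponent_group."""
    return PAYOFFS[own_group][opponent_group]


# A Buddy (1) meets a Saint (0): each side looks up its own row.
buddy_gain = payoff(1, 0)
saint_gain = payoff(0, 1)
print(buddy_gain, saint_gain)
```

Because each agent reads its own row, a single game needs two lookups, one per participant.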

The evolutionary aspect of the `player` class is invoked with `evaluate()`, which takes the current game opponent and an instance of the neural network class. This method loads the agent's weights into the neural network, computes the group the agent should now belong to (this is the agent deciding whether to move group), and then sets the agent's fitness to its wealth divided by its game count. This fitness measure allows a fair comparison between agents regardless of how many games they were randomly chosen to play.
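As a small illustration of why wealth is divided by the game count, consider the sketch below. The helper `fitness` is hypothetical, and returning 0.0 for an agent that has not played is an assumption of this sketch; in the notebook, `evaluate()` is only called after a payoff has been added.

```python
def fitness(wealth, game_count):
    """Average payoff per game.

    Returning 0.0 when no games were played is an assumption of this sketch.
    """
    if game_count == 0:
        return 0.0
    return wealth / game_count


# Two agents with very different totals but the same average payoff.
busy = fitness(wealth=40, game_count=10)
idle = fitness(wealth=12, game_count=3)
print(busy, idle)
```

Comparing raw wealth would favour the agent that happened to be sampled more often; the per-game average treats both agents as equally fit.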

The `reset()` method prepares an agent for a new generation. It restores the agent's group to its starting group and sets the agent's wealth, fitness, and game counter to 0.

In [135]:
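import random

import numpy as np

# `tools` and `toolbox` are defined in the DEAP setup cell further down;
# that cell must be run before any player is created.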

class player():
  def __init__(self, IND_SIZE):
    self.wealth = 0
    self.startingGroup = random.randint(0,3)
    self.group = self.startingGroup
    self.gameCount = 0
    self.weights = tools.initRepeat(list, toolbox.attr_float, IND_SIZE)
    self.fitness = 0
  
  def evaluate(self,opponent,network):
    # Load this agent's genome into the shared network.
    network.setWeightsLinear(self.weights)
    output = network.feedForward([self.wealth, self.group, opponent.wealth, opponent.group])
    # The group with the largest softmax output becomes the agent's group.
    self.group = int(np.argmax(output, axis=0))
    # Fitness is the average payoff per game played so far.
    self.fitness = self.wealth / self.gameCount

  def addPayoff(self, opponent):
    payoffs = [[4,0,4,0],[6,4,6,1],[4,0,1,0],[6,1,6,0]]
    self.wealth += payoffs[self.group][opponent.group]
    self.gameCount += 1

  def reset(self):
    self.wealth = 0
    self.gameCount = 0
    self.group = self.startingGroup
    self.fitness = 0

## Agent Evolution Environment

### Neural Network
The neural network used by all agents in the system has a single hidden layer, with 4 input nodes, 8 hidden nodes and 4 output nodes. The 4 output nodes correspond to the four groups: the outputs are passed through a `softmax` function and the agent joins the group with the largest output. The inputs are the agent's own wealth, the agent's own group, the opponent's wealth and the opponent's group. These inputs were settled on after experimenting with others, such as the number of games the agent had played or the current game number. The additional inputs did not improve performance and were removed to reduce the chance of overfitting.

A constant bias input of 1 was appended to the input layer to improve performance.
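The forward pass and the resulting genome length can be sketched without NumPy as follows. This is a minimal illustration rather than the notebook's implementation: the function `decide_group`, the constant names and the example inputs are hypothetical, and the flat genome is assumed to store the input-to-hidden weights first in row-major order, followed by the hidden-to-output weights.

```python
import math
import random

NUM_INPUTS, NUM_HIDDEN, NUM_OUTPUTS = 4, 8, 4
# One extra input column for the bias; the hidden layer has no bias.
GENOME_LENGTH = (NUM_INPUTS + 1) * NUM_HIDDEN + NUM_HIDDEN * NUM_OUTPUTS


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decide_group(genome, inputs):
    """Map a flat genome and 4 inputs to (group index, output probabilities)."""
    row = NUM_INPUTS + 1
    split = row * NUM_HIDDEN
    with_bias = list(inputs) + [1.0]  # append the constant bias input
    hidden = []
    for h in range(NUM_HIDDEN):
        weights = genome[h * row:(h + 1) * row]
        total = sum(w * x for w, x in zip(weights, with_bias))
        hidden.append(max(0.0, total))  # ReLU activation
    outputs = []
    for o in range(NUM_OUTPUTS):
        start = split + o * NUM_HIDDEN
        weights = genome[start:start + NUM_HIDDEN]
        outputs.append(sum(w * x for w, x in zip(weights, hidden)))
    probs = softmax(outputs)
    return probs.index(max(probs)), probs


rng = random.Random(0)
genome = [rng.uniform(-1.0, 1.0) for _ in range(GENOME_LENGTH)]
# Inputs: own wealth, own group, opponent wealth, opponent group.
group, probs = decide_group(genome, [12, 1, 30, 3])
print(GENOME_LENGTH, group)
```

With 4 inputs plus the bias, 8 hidden nodes and 4 outputs, the genome holds 5 × 8 + 8 × 4 = 72 weights. Since `softmax` preserves ordering, taking the largest probability selects the same group as taking the largest raw output.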

In [136]:
import numpy as np

numInputNodes = 4
numHiddenNodes = 8
numOutputNodes = 4

# Genome length: input-to-hidden weights (including the bias input) plus hidden-to-output weights.
IND_SIZE = ((numInputNodes+1) * numHiddenNodes) + (numHiddenNodes * numOutputNodes)

class NeuralNetwork(object):
  def __init__(self, numInput, numHidden, numOutput):
    self.numInput = numInput + 1
    self.numHidden = numHidden
    self.numOutput = numOutput

    self.wh = np.random.randn(self.numHidden, self.numInput) 
    self.wo = np.random.randn(self.numOutput, self.numHidden)

    self.ReLU = lambda x : max(0,x)

  def softmax(self, x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

  def feedForward(self, inputs):
    inputsBias = inputs[:]
    inputsBias.insert(len(inputs), 1)

    h1 = np.dot(self.wh, inputsBias)
    h1 = [self.ReLU(x) for x in h1]

    output = np.dot(self.wo, h1)
    return self.softmax(output)

  def getWeightsLinear(self):
    flat_wh = list(self.wh.flatten())
    flat_wo = list(self.wo.flatten())
    return( flat_wh + flat_wo )

  def setWeightsLinear(self, Wgenome):
    numWeights_IH = self.numHidden * (self.numInput)
    self.wh = np.array(Wgenome[:numWeights_IH])
    self.wh = self.wh.reshape((self.numHidden, self.numInput))
    self.wo = np.array(Wgenome[numWeights_IH:])
    self.wo = self.wo.reshape((self.numOutput, self.numHidden))

### Running the Simulation
The game simulation is defined as: for a set number of games, two agents are selected at random from a population to play against each other. Payoffs are calculated and each agent is then given the opportunity to change groups. 

This is implemented in the method `playGameAndEvolve`, using the constant `NGAMES` to define the number of games played and `POP` to specify the population size. The two agents are selected with `random.sample(range(POP),2)`, which picks two distinct population indexes. These agents then calculate their respective payoffs using `addPayoff`, and use `evaluate` to update their fitness and decide whether to migrate to another group. Details of these methods are found in the Agent Representation section.

`playBasicGame` is an additional method that runs the game without allowing agents to change group. It will be used to evaluate the behaviour developed through adaptation.
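A single round of the basic game can be sketched in isolation as follows. `StubAgent` and `play_round` are hypothetical stand-ins for illustration only (there is no network and no group switching here); the payoff table is the one used in this notebook.

```python
import random

PAYOFFS = [[4, 0, 4, 0], [6, 4, 6, 1], [4, 0, 1, 0], [6, 1, 6, 0]]


class StubAgent:
    """Hypothetical stand-in for the notebook's player class."""

    def __init__(self, group):
        self.group = group
        self.wealth = 0
        self.game_count = 0


def play_round(population, rng):
    """Pick two distinct agents and credit each with its own payoff."""
    i, j = rng.sample(range(len(population)), 2)
    a, b = population[i], population[j]
    # Compute both payoffs before updating either agent.
    gain_a = PAYOFFS[a.group][b.group]
    gain_b = PAYOFFS[b.group][a.group]
    a.wealth += gain_a
    b.wealth += gain_b
    a.game_count += 1
    b.game_count += 1
    return i, j


rng = random.Random(1)
population = [StubAgent(group=k % 4) for k in range(8)]
pairs = [play_round(population, rng) for _ in range(100)]
total_games = sum(agent.game_count for agent in population)
print(total_games)
```

`random.sample` draws without replacement, so an agent never plays against itself, and every round adds exactly two game counts across the population.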

In [137]:
NGAMES = 10000
POP = 4*400

def playGameAndEvolve(pop, network):
  for r in range(NGAMES):
    selection = random.sample(range(POP),2)
    pop[selection[0]].addPayoff(pop[selection[1]])
    pop[selection[1]].addPayoff(pop[selection[0]])
    pop[selection[0]].evaluate(pop[selection[1]],network)
    pop[selection[1]].evaluate(pop[selection[0]],network)
  return pop

def playBasicGame(pop):
  for r in range(NGAMES):
    selection = random.sample(range(POP),2)
    pop[selection[0]].addPayoff(pop[selection[1]])
    pop[selection[1]].addPayoff(pop[selection[0]])
  return pop

## Agent Adaptation Procedure (Training through Evolution)

In [138]:
def sumGroups(groups):
  groupTotals=[0,0,0,0]
  for person in groups:
    groupTotals[person.startingGroup] += person.wealth
  return groupTotals

def avgGroups(groups):
  groupTotals=[0,0,0,0]
  counts=[0,0,0,0]
  for person in groups:
    if (person.gameCount > 0):
      groupTotals[person.startingGroup] += person.wealth
      counts[person.startingGroup] += 1
  for i in range(0,4):
    # Guard on the divisor so an empty group never causes a division by zero.
    if(counts[i] > 0):
      groupTotals[i] = groupTotals[i] / counts[i] 
  return groupTotals

def countGroups(pop):
  groupTotals=[0,0,0,0]
  for person in pop:
    groupTotals[person.group] += 1
  return groupTotals

def fitnessStats(pop):
  fitnessSum = 0
  fitnessMax = 0 
  fitnessMin = -1
  for person in pop:
    if(person.fitness > fitnessMax):
      fitnessMax = person.fitness
    if(person.fitness < fitnessMin or fitnessMin == -1):
      fitnessMin = person.fitness
    fitnessSum += person.fitness
  return { "mean": fitnessSum/len(pop), "max":fitnessMax, "min":fitnessMin }

In [139]:
from deap import base
from deap import tools
import random

INDPB=0.1
NGEN = 50
CXPB = 0.1
MUTPB = 0.5

toolbox = base.Toolbox()
toolbox.register("attr_float", random.uniform, -1.0, 1.0)
toolbox.register("individual", player, IND_SIZE)
toolbox.register("mate", tools.cxOnePoint)
toolbox.register("select", tools.selTournament, tournsize=5)
toolbox.register("mutate", tools.mutGaussian, mu=0.0, sigma=0.5, indpb=INDPB)
toolbox.register("pop", tools.initRepeat, list, toolbox.individual)

network = NeuralNetwork(numInputNodes, numHiddenNodes, numOutputNodes)

pop = toolbox.pop(n=POP)
for g in range(NGEN):
  print("-- Generation %i --" % g)
  offspring = toolbox.select(pop, len(pop))

  offspring = list(map(toolbox.clone, offspring))

  for child1, child2 in zip(offspring[::2], offspring[1::2]):
    if random.random() < CXPB:
      toolbox.mate(child1.weights, child2.weights)

  for mutant in offspring:
    if random.random() < MUTPB:
      toolbox.mutate(mutant.weights)
      if (random.random() < INDPB):
        mutant.startingGroup = random.randint(0,3)

  for person in offspring:
    person.reset()
  
  pop[:] = playGameAndEvolve(offspring, network)
  print("Group avg    : " + str(avgGroups(pop)))
  print("Group sum    : " + str(sumGroups(pop)))
  print("Group count  : "+str(countGroups(pop)))

