Copyright **`(c)`** 2022 Giovanni Squillero `<squillero@polito.it>`  
[`https://github.com/squillero/computational-intelligence`](https://github.com/squillero/computational-intelligence)  
Free for personal or classroom use; see [`LICENSE.md`](https://github.com/squillero/computational-intelligence/blob/master/LICENSE.md) for details.  


# Lab 2: ES

## Task

Write agents able to play [*Nim*](https://en.wikipedia.org/wiki/Nim), with an arbitrary number of rows and an upper bound $k$ on the number of objects that can be removed in a turn (a.k.a., *subtraction game*).

The goal of the game is to **avoid** taking the last object.

* Task2.1: An agent using fixed rules based on *nim-sum* (i.e., an *expert system*)
* Task2.2: An agent using evolved rules using ES

## Instructions

* Create the directory `lab2` inside your personal course repository for the course 
* Put a `README.md` and your solution (all the files, code and auxiliary data if needed)

## Notes

* Working in group is not only allowed, but recommended (see: [Ubuntu](https://en.wikipedia.org/wiki/Ubuntu_philosophy) and [Cooperative Learning](https://files.eric.ed.gov/fulltext/EJ1096789.pdf)). Collaborations must be explicitly declared in the `README.md`.
* [Yanking](https://www.emacswiki.org/emacs/KillingAndYanking) from the internet is allowed, but sources must be explicitly declared in the `README.md`.



In [17]:
import logging
from pprint import pprint, pformat
from collections import namedtuple
import random
from copy import deepcopy


## The *Nim* and *Nimply* classes

In [18]:
Nimply = namedtuple("Nimply", "row, num_objects")


In [19]:
class Nim:
    def __init__(self, num_rows: int, k: int = None) -> None:
        self._rows = [i * 2 + 1 for i in range(num_rows)]
        self._k = k if k is not None else self._rows[-1]

    def __bool__(self):
        return sum(self._rows) > 0

    def __str__(self):
        return "<" + " ".join(str(_) for _ in self._rows) + ">"

    @property
    def rows(self) -> tuple:
        return tuple(self._rows)
    
    @property
    def k(self):
        return self._k

    def nimming(self, ply: Nimply) -> None:
        row, num_objects = ply
        assert self._rows[row] >= num_objects
        assert self._k is None or num_objects <= self._k
        self._rows[row] -= num_objects


## Sample (and silly) startegies 

In [20]:
def pure_random(state: Nim) -> Nimply:
    """A completely random move"""
    row = random.choice([r for r, c in enumerate(state.rows) if c > 0])
    num_objects = random.randint(1, min(state.rows[row], state.k))
    return Nimply(row, num_objects)


In [21]:
def gabriele(state: Nim) -> Nimply:
    """Pick always the maximum possible number of the lowest row"""
    possible_moves = [(r, o) for r, c in enumerate(state.rows) for o in range(1, min(c + 1, state.k))]
    return Nimply(*max(possible_moves, key=lambda m: (-m[0], m[1])))


This strategies consists in select the rows with the max number of objects and take from it a number proportional to the ratio between the max number of remaining moves and the total number of moves

In [22]:
def adaptive(state: Nim) -> Nimply:
    """A strategy that can adapt its parameters"""
    remaining_moves = sum(state.rows) - 1;
    tot_moves = sum([i * 2 + 1 for i in range(NIM_ROWS)])
    genome = remaining_moves/tot_moves
    
    index_max_rows = state.rows.index(max(state.rows))
    objects_to_take = max(min(int(genome * max(state.rows)), state.k), 1)
    return Nimply(index_max_rows, objects_to_take)


In [23]:
import numpy as np


def nim_sum(state: Nim) -> int:
    tmp = np.array([tuple(int(x) for x in f"{c:032b}") for c in state.rows])
    xor = tmp.sum(axis=0) % 2
    return int("".join(str(_) for _ in xor), base=2)


def analize(raw: Nim) -> dict:
    cooked = dict()
    cooked["possible_moves"] = dict()
    for ply in (Nimply(r, o) for r, c in enumerate(raw.rows) for o in range(1, c+1 if raw.k is None else min(c+1, raw.k))):
        tmp = deepcopy(raw)
        tmp.nimming(ply)
        cooked["possible_moves"][ply] = nim_sum(tmp)
    return cooked


def optimal(state: Nim) -> Nimply:
    analysis = analize(state)
    logging.debug(f"analysis:\n{pformat(analysis)}")
    spicy_moves = [ply for ply, ns in analysis["possible_moves"].items() if ns != 0 and ply.num_objects <= state.k]
    if not spicy_moves:
        spicy_moves = list(analysis["possible_moves"].keys())
    ply = random.choice(spicy_moves)
    return ply

This function enhanced the previous optimal strategy adding the case in which the game end up in a position with only one row of size 2 or more and at this point the nim sum is not equal to zero so the best move is to reduce this to a size of 0 or 1 and leaving an odd number of rows with size 1, from which all the moves are constrained

In [24]:
def expert_agent(state: Nim) -> Nimply:
    analysis = analize(state)
    
    #check if remains only one rows with size 2 or more and eventually remove 2 objects from the rows with 2 elements if the number of rows with one object is even, in the other case remove one 
    if state.rows.count(1) == (len(state.rows) - state.rows.count(0))-1:
        row, objects = [(row, objects) for row, objects in enumerate(state.rows) if objects > 1][0]
        objects_to_remove = objects
        if (state.rows.count(1) % 2) != 1:
            objects_to_remove = objects-1
        return Nimply(row, objects_to_remove if state.k is None else min(objects_to_remove, state.k))  
    spicy_moves = [ply for ply, ns in analysis["possible_moves"].items() if ns != 0]
    if not spicy_moves:
        spicy_moves = list(analysis["possible_moves"].keys())
    ply = random.choice(spicy_moves)
    return ply

this function simulate a match between the evolutionary and the expert agent

In [25]:

def play_game_against_expert(nim: Nim, agent) -> int:
    
    logging.info(f"init : {nim}")
    player = random.randint(0, 1)

    while nim: 
        current_strategy = expert_agent if player == 0 else random.choices(agent[0], weights=agent[1], k=1)[0]
        #print("CURRENT STRATEGIES IS: " + current_strategy.__name__)
        ply = current_strategy(nim)
        logging.info(f"ply: player {player} plays {ply}")
        nim.nimming(ply)
        logging.info(f"status: {nim}")
        player = 1 - player
    logging.info(f"status: Player {player} won!")
    return player


Here the evolutionary strategy that consist in creating as population agent with 4 different strategies, each one with a different weight to compute the probabilities of being chosen during the game to compute the move
The evolutionary parameters are:
* Parent selection $\mu$: 1/3
* Reproduction $\rho$ : 1
* Mutation rate $\gamma$: 0.2

The other are:
* POPULATION_SIZE: How many agents are part of every generation
* STRATEGIES: The list of all the different strategies that can be used by an agent
* AGENT_STRATEGIES: How many strategies an agent has

The steps are:
* The population is reduced in every generation by first selecting the remaining parents assigning a weight that depends on the fitness value and randomly choose which parent eliminate proportionally to the weight
* Then based on the mutation rate probability, it is applied to every remaining parent to restore the population size:
    * a mutation by changing randomly one strategy and one weight
    * or a crossover by mix the strategies and the weights of 2 different parents

Every 5 generation is printed the best and the average fitness to check how the evolution is going.
Expectation are that the expert agent wins the most part of the games and that the agent tend to preserve and maximize the weight of the optimal strategy which is like the one used by the expert agent with except for the final part where it does not make the optimal move

In [26]:
import random

# Evolutionary parameters
POPULATION_SIZE = 30
MUTATION_RATE = 0.2
NUMBER_GENERATIONS = 50
NIM_ROWS = 10
STRATEGIES = [gabriele, pure_random, optimal, adaptive]
AGENT_STRATEGIES = 8

def generate_random_agent():
    
    agent_strategies =  [random.choices(STRATEGIES, weights=[random.randint(1, 4) for _ in range(len(STRATEGIES))])[0] for _ in range(AGENT_STRATEGIES)]
    strategies_weights = random.choices([1, 2, 3, 4], k=AGENT_STRATEGIES)
    return (agent_strategies, strategies_weights)


def fitness(agent, number_of_matches=20):
    victories_agent = 0
    for i in range(number_of_matches):
        number_of_rows = random.randint(2, 10)
        max_k = random.randint(2, 10)
        nim = Nim(NIM_ROWS, max_k)
        results = play_game_against_expert(nim, agent)
        victories_agent += 1 if results == 1 else 0
        
    return victories_agent/number_of_matches 


def mutate(agent):
    strategies, weights = agent
    mutated_strategies = strategies[:]  
    idx_to_mutate = random.randint(0, len(mutated_strategies) - 1)
    mutated_strategies[idx_to_mutate] = random.choice(STRATEGIES)
    mutated_weights = weights[:]
    mutated_weights[random.randint(0, len(mutated_strategies) -1 )] = random.choice([1, 2, 3, 4])
    return mutated_strategies, mutated_weights


def reproduce(agent1, agent2):
    strategies1, weights1 = agent1
    strategies2, weights2 = agent2
    crossover_point = random.randint(0, len(strategies1))
    child_strategies = strategies1[:crossover_point] + strategies2[crossover_point:]
    child_weights = weights1[:crossover_point] + weights2[crossover_point:]
    return child_strategies, child_weights  

# Initialize the population
population = [generate_random_agent() for _ in range(POPULATION_SIZE - 1)]

# Evolutionary loop
for generation in range(NUMBER_GENERATIONS):
    # Evaluate current generation
    fitness_scores = [fitness(agent) for agent in population]

    if generation % 5 == 0:
        max_fitness = max(fitness_scores)
        print(f"- Generation: {generation} - Best Fitness: {max_fitness} - Avg Fitness: {sum(fitness_scores) / len(fitness_scores)}")

    # Keep the best agent from the previous generation
    best_of_generation = max(population, key=fitness)
    
    # Select parents
    selected_parents = random.choices(population, weights=fitness_scores, k=POPULATION_SIZE // 3)

    # Create next generation
    new_population = [best_of_generation]  

    for i in range(POPULATION_SIZE - 1):
        if random.random() < MUTATION_RATE:
            new_population.append(mutate(random.choice(selected_parents)))
        else:
            agent1 = random.choice(selected_parents)
            agent2 = random.choice(selected_parents)
            new_population.append(reproduce(agent1, agent2))

    population = new_population

# Print the best agents
best_3_agents = sorted(population, key=fitness, reverse=True)[:3]
for i, agent in enumerate(best_3_agents):
    print(f"Agent {i} using functions: {[func.__name__ for func in agent[0]]}, weights: {agent[1]} with a fitness value of: {fitness(agent)}")



- Generation: 0 - Best Fitness: 0.3 - Avg Fitness: 0.18793103448275864
- Generation: 5 - Best Fitness: 0.4 - Avg Fitness: 0.1716666666666667
- Generation: 10 - Best Fitness: 0.4 - Avg Fitness: 0.15666666666666665
- Generation: 15 - Best Fitness: 0.35 - Avg Fitness: 0.1716666666666667
- Generation: 20 - Best Fitness: 0.35 - Avg Fitness: 0.19166666666666668
- Generation: 25 - Best Fitness: 0.35 - Avg Fitness: 0.18333333333333338
- Generation: 30 - Best Fitness: 0.35 - Avg Fitness: 0.205
- Generation: 35 - Best Fitness: 0.35 - Avg Fitness: 0.17833333333333334
- Generation: 40 - Best Fitness: 0.45 - Avg Fitness: 0.2033333333333333
- Generation: 45 - Best Fitness: 0.35 - Avg Fitness: 0.19333333333333333
Agent 0 using functions: ['optimal', 'adaptive', 'optimal', 'adaptive', 'pure_random', 'optimal', 'gabriele', 'adaptive'], weights: [1, 3, 1, 4, 2, 2, 4, 2] with a fitness value of: 0.2
Agent 1 using functions: ['gabriele', 'adaptive', 'optimal', 'adaptive', 'pure_random', 'optimal', 'adapti

From the results the winning strategy is the adaptive one which is used more than the optimal 
To test it, here there are 4 simulations to test all the four strategies against the expert agent to confirm the result

In [27]:
def test_strategies(agent):
    sum_fitness = 0
    for _ in range(100):
        sum_fitness += fitness(agent)
    print(f"The {agent[0][0].__name__} strategy has a mean fitness value of {sum_fitness/100}")
    
test_strategies(([pure_random], [1]))
test_strategies(([gabriele], [1]))
test_strategies(([optimal], [1]))
test_strategies(([adaptive], [1]))

The pure_random strategy has a mean fitness value of 0.15000000000000005
The gabriele strategy has a mean fitness value of 0.1305
The optimal strategy has a mean fitness value of 0.19499999999999995
The adaptive strategy has a mean fitness value of 0.2795000000000001


The tests confirm that the most successful strategy is adaptive, despite the fact that the expert agent wins much more than half the time