Copyright **`(c)`** 2022 Giovanni Squillero `<squillero@polito.it>`  
[`https://github.com/squillero/computational-intelligence`](https://github.com/squillero/computational-intelligence)  
Free for personal or classroom use; see [`LICENSE.md`](https://github.com/squillero/computational-intelligence/blob/master/LICENSE.md) for details.  


# Lab 3: ES

## Task

Write agents able to play [*Nim*](https://en.wikipedia.org/wiki/Nim), with an arbitrary number of rows and an upper bound $k$ on the number of objects that can be removed in a turn (a.k.a., *subtraction game*).

The goal of the game is to **avoid** taking the last object.

* Task2.1: An agent using fixed rules based on *nim-sum* (i.e., an *expert system*)
* Task2.2: An agent using evolved rules using ES

## Instructions

* Create the directory `lab2` inside your personal course repository
* Put a `README.md` and your solution (all the files, code and auxiliary data if needed)

## Notes

* Working in groups is not only allowed, but recommended (see: [Ubuntu](https://en.wikipedia.org/wiki/Ubuntu_philosophy) and [Cooperative Learning](https://files.eric.ed.gov/fulltext/EJ1096789.pdf)). Collaborations must be explicitly declared in the `README.md`.
* [Yanking](https://www.emacswiki.org/emacs/KillingAndYanking) from the internet is allowed, but sources must be explicitly declared in the `README.md`.



In [1]:
import logging
from pprint import pprint, pformat
from collections import namedtuple
import random
import matplotlib.pyplot as plt
from copy import deepcopy

`Nimply` is the individual: a `namedtuple` holding the row and the number of objects to remove, i.e. the candidate move under consideration. The search can be framed as a $(1 + \lambda)$ ES: the parent is the current move and the $\lambda$ offspring are the candidate next moves. Since the current move may be repeated, the *plus* strategy applies, so the move just played also competes at the next step. The **genotype** is the pair (row, number of objects to remove), the **phenotype** is the resulting configuration of the game, and the **fitness** is a function of the `nim_sum`.
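
The *plus* selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the agent defined later in this notebook: `nim_sum_after` and `plus_selection` are hypothetical helpers, moves are plain `(row, num_objects)` tuples, and a lower nim-sum after the move is assumed to be fitter.

```python
from functools import reduce
from operator import xor


def nim_sum_after(rows, move):
    """Phenotype -> fitness: nim-sum of the board obtained by applying `move`."""
    row, num_objects = move
    board = list(rows)
    board[row] -= num_objects
    return reduce(xor, board, 0)


def plus_selection(rows, parent, offspring):
    """(1 + lambda) selection: the parent competes with its lambda offspring."""
    candidates = [parent] + list(offspring)
    # Assumed fitness: the lower the resulting nim-sum, the better the move.
    return min(candidates, key=lambda move: nim_sum_after(rows, move))


rows = (1, 3, 5)
parent = (2, 1)                       # leaves <1 3 4>
offspring = [(2, 3), (1, 2), (0, 1)]  # leave <1 3 2>, <1 1 5>, <0 3 5>
best = plus_selection(rows, parent, offspring)
print(best, nim_sum_after(rows, best))

# With only weaker offspring, the parent survives to the next step.
survivor = plus_selection(rows, best, [(1, 2), (0, 1)])
print(survivor)
```

Because the parent is always among the candidates, the selected move can never be worse than the current one under the chosen fitness.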

---

#### **IDEAS**
Some ideas are listed in the following.

- **Possible Mutations**:
Increase or decrease the number of objects to remove, either keeping the row of the previous move or changing it. Offspring with a negative number of objects, or with more objects than the row currently holds, are killed. The size of the change could be drawn from a Poisson or another probability distribution;

- **Possible Recombinations**:
Given the rigid data structure, it would be difficult to find a meaningful recombination, so skip this for the moment. The only possibility would be to take a sort of average of `row` or `num_objects` between the offspring;

- **Self Adaptiveness**:
To control the magnitude of the mutations and/or the frequency of recombinations, try a dynamic parameter $\alpha$ that takes into account the amount of entropy in the problem (i.e. it evaluates how far we are from the solution). Notice that, by embedding this value in the problem, we may also optimize it;

- **Evaluation of the Fitness**:
To give a fitness score, check at each move whether the move sets the Nim-Sum to zero. However, this could be considered an expert system. Alternatively, take the move with the lowest Nim-Sum, which performs worse but is fairer. In the latter case, one must also decide how to score *bad moves* and *really bad moves* (i.e. moves that give the opponent an advantage). For example, a move that allows the opponent to reach a Nim-Sum equal to zero should be strongly avoided.
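
The mutation idea listed above can be sketched as follows. This is only an illustration under stated assumptions: `poisson` and `mutate_move` are hypothetical helpers, a textbook Knuth sampler stands in for a library Poisson generator, the row changes with probability 0.5, and an offspring with zero objects is killed just like a negative one.

```python
import math
import random


def poisson(lam, rng):
    """Draw from a Poisson(lam) distribution with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def mutate_move(rows, parent, lam, rng):
    """Return a mutated (row, num_objects) move, or None if the offspring is killed."""
    row, num_objects = parent
    # Either keep the parent's row or jump to a random non-empty one.
    if rng.random() < 0.5:
        row = rng.choice([r for r, c in enumerate(rows) if c > 0])
    # Increase or decrease the amount by a Poisson-distributed step.
    step = poisson(lam, rng)
    num_objects += step if rng.random() < 0.5 else -step
    # Kill offspring that are not legal moves (too few or too many objects).
    if num_objects < 1 or num_objects > rows[row]:
        return None
    return (row, num_objects)


rng = random.Random(42)
rows = (1, 3, 5, 7)
parent = (3, 4)
offspring = [mutate_move(rows, parent, lam=1.5, rng=rng) for _ in range(20)]
viable = [move for move in offspring if move is not None]
print(f"{len(viable)} viable offspring out of {len(offspring)}")
```

A larger `lam` explores moves further from the parent but also kills more offspring, which is the trade-off a self-adaptive parameter would have to balance.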


#### **TUNING**
In order to perform the mutation, a value for $\sigma$ is needed. Run several games, changing the parameter every time, and evolve it as well! How do we do this?

- **Constant**:
Keep the same $\sigma$ for each move.

- **Linear Decrease**:
Decrease $\sigma$ in a linear way at each step.

- **Meta Learning**:
Run several games for each candidate $\sigma$, keeping it fixed within a game, and record the average win rate. Compute the correlation between $\sigma$ and the win rate and keep the best result.
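
The three tuning options above can be sketched as small functions. All names here (`constant_sigma`, `linear_sigma`, `best_sigma`, `sigma_min`) are hypothetical, and the scoring function in the usage lines is a stand-in for illustration only, not a measured win rate.

```python
def constant_sigma(sigma0, step, total_steps):
    """Constant: the same sigma at every move."""
    return sigma0


def linear_sigma(sigma0, step, total_steps, sigma_min=0.1):
    """Linear decrease from sigma0 at step 0 down to sigma_min at the last step."""
    if total_steps <= 1:
        return sigma0
    fraction = step / (total_steps - 1)
    return sigma0 + (sigma_min - sigma0) * fraction


def best_sigma(candidates, average_win_rate):
    """Meta learning: keep the candidate sigma with the highest average win rate."""
    return max(candidates, key=average_win_rate)


print([round(linear_sigma(4.0, s, 5), 3) for s in range(5)])

# Stand-in scorer for illustration only; a real one would play several games per sigma.
toy_win_rate = lambda sigma: -abs(sigma - 2.0)
print(best_sigma([0.5, 1.0, 2.0, 4.0], toy_win_rate))
```

In a real experiment `average_win_rate` would run a batch of matches for each candidate and return the fraction of games won.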



## The *Nim* and *Nimply* classes

In [187]:
Nimply = namedtuple("Nimply", "row, num_objects")

class Nim:
    def __init__(self, num_rows: int, k: int = None) -> None:
        self._rows = [i * 2 + 1 for i in range(num_rows)]
        self._k = k

    def __bool__(self):
        return sum(self._rows) > 0

    def __str__(self):
        return "<" + " ".join(str(_) for _ in self._rows) + ">"

    @property
    def rows(self) -> tuple:
        return tuple(self._rows)

    def nimming(self, ply: Nimply) -> None:
        '''
        A player makes a move, updating the current state of the game.

        Parameters
        ---
            ply : `Nimply`,
        `namedtuple` in the format `(row, num_objects)` representing the
        next move.  
        '''
        
        row, num_objects = ply
        assert self._rows[row] >= num_objects
        assert self._k is None or num_objects <= self._k
        self._rows[row] -= num_objects


## Sample (and silly) strategies 

In [188]:
def pure_random(state: Nim) -> Nimply:
    """A completely random move"""
    row = random.choice([r for r, c in enumerate(state.rows) if c > 0]) # Pick randomly one of the available rows.
    num_objects = random.randint(1, state.rows[row])                    # Pick a random number of objects.
    return Nimply(row, num_objects)

def gabriele(state: Nim) -> Nimply:
    """Always pick the maximum possible number of objects from the lowest row"""
    possible_moves = [(r, o) for r, c in enumerate(state.rows) for o in range(1, c + 1)]    # Greedy approach.
    return Nimply(*max(possible_moves, key=lambda m: (-m[0], m[1])))

def adaptive(state: Nim) -> Nimply:
    """A strategy that can adapt its parameters"""
    genome = {"love_small": 0.5}

In [189]:
import numpy as np


def nim_sum(state: Nim) -> int:
    tmp = np.array([tuple(int(x) for x in f"{c:032b}") for c in state.rows])    # Get the binary representation of each row.
    xor = tmp.sum(axis=0) % 2                                                   # Compute the XOR across the columns.
    return int("".join(str(_) for _ in xor), base=2)                            # Convert the XOR bits back to an integer.

# tmp
#       |      001          1
#      |||     011        2 + 1
#     |||||    101        4 + 1
# 
# xor
#      0 0 1
#      + + +
#      0 1 1
#      + + +
#      1 1 1
#      % % %
#      2 2 2
#      = = =
#      1 1 1
# 
# int 
#      111 = 7 
# 
# The goal is to keep the nim sum equal to 0 -> winning configuration 


def analize(raw: Nim) -> dict:
    # Iterate over all the rows and check the number of objects per each.
    cooked = dict()
    cooked["possible_moves"] = dict()
    for ply in (Nimply(r, o) for r, c in enumerate(raw.rows) for o in range(1, c + 1)):
        tmp = deepcopy(raw)
        tmp.nimming(ply)                                # Check what happens making the move.
        cooked["possible_moves"][ply] = nim_sum(tmp)    # Assign the corresponding nim sum.
    return cooked


def optimal(state: Nim) -> Nimply:
    # Analyze the state.
    analysis = analize(state)
    logging.debug(f"analysis:\n{pformat(analysis)}")
    
    # Collect the moves that leave a nim sum different from zero.
    spicy_moves = [ply for ply, ns in analysis["possible_moves"].items() if ns != 0]
    
    # If every move leaves a nim sum equal to zero, fall back to all the legal moves.
    if not spicy_moves:
        spicy_moves = list(analysis["possible_moves"].keys())

    # Choose one of them at random.
    ply = random.choice(spicy_moves)
    return ply
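
Under the rule stated in the task (whoever takes the last object loses), the classic nim-sum strategy needs an endgame exception: keep leaving a nim-sum of zero while some row still holds two or more objects, and otherwise leave an odd number of single-object rows. The sketch below illustrates that textbook rule. It is not the `optimal` function above: `misere_move` is a hypothetical name, boards are plain tuples with at least one object left, and the bound $k$ is ignored.

```python
from functools import reduce
from operator import xor


def misere_move(rows):
    """Return a (row, num_objects) move following the misère nim-sum rule."""
    moves = [(r, n) for r, c in enumerate(rows) for n in range(1, c + 1)]
    for row, num_objects in moves:
        after = list(rows)
        after[row] -= num_objects
        if max(after) <= 1:
            # Endgame: leave an odd number of single-object rows.
            if sum(after) % 2 == 1:
                return (row, num_objects)
        elif reduce(xor, after) == 0:
            # Regular play: leave a nim-sum equal to zero.
            return (row, num_objects)
    # No winning move exists: any legal move will do.
    return moves[0]


print(misere_move((1, 3, 5)))
print(misere_move((1, 1, 4)))
print(misere_move((0, 0, 4)))
```

On `<1 1 4>` the rule takes three objects rather than four, leaving three single-object rows so that the opponent is the one forced to take the last object.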


## Oversimplified match

In [195]:
logging.getLogger().setLevel(logging.INFO)

strategy = (optimal, pure_random)

nim = Nim(5)
logging.info(f"init : {nim}")
player = 0
while nim:
    ply = strategy[player](nim)
    logging.info(f"ply: player {player} plays {ply}")
    nim.nimming(ply)
    logging.info(f"status: {nim}")
    player = 1 - player
logging.info(f"status: Player {player} won!")


INFO:root:init : <1 3 5 7 9>
INFO:root:ply: player 0 plays Nimply(row=4, num_objects=7)
INFO:root:status: <1 3 5 7 2>
INFO:root:ply: player 1 plays Nimply(row=1, num_objects=1)
INFO:root:status: <1 2 5 7 2>
INFO:root:ply: player 0 plays Nimply(row=0, num_objects=1)
INFO:root:status: <0 2 5 7 2>
INFO:root:ply: player 1 plays Nimply(row=2, num_objects=5)
INFO:root:status: <0 2 0 7 2>
INFO:root:ply: player 0 plays Nimply(row=1, num_objects=1)
INFO:root:status: <0 1 0 7 2>
INFO:root:ply: player 1 plays Nimply(row=4, num_objects=1)
INFO:root:status: <0 1 0 7 1>
INFO:root:ply: player 0 plays Nimply(row=3, num_objects=1)
INFO:root:status: <0 1 0 6 1>
INFO:root:ply: player 1 plays Nimply(row=4, num_objects=1)
INFO:root:status: <0 1 0 6 0>
INFO:root:ply: player 0 plays Nimply(row=3, num_objects=6)
INFO:root:status: <0 1 0 0 0>
INFO:root:ply: player 1 plays Nimply(row=1, num_objects=1)
INFO:root:status: <0 0 0 0 0>
INFO:root:status: Player 0 won!


## Functions

In [196]:
POPULATION_SIZE = 30

In [197]:
def mutate(state: Nim) -> Nimply:
    """Propose a random move: a random non-empty row and a Poisson-distributed number of objects."""

    # Select a random row, by filtering out the empty ones.
    non_zero_rows = [index for index, value in enumerate(state.rows) if value != 0]
    row_index = random.choice(non_zero_rows)

    # Generate a random number to be subtracted. How to choose lambda?
    mut = int(np.random.poisson(np.mean(state.rows) + 1))

    # Fall back to a single object when the draw is not a legal amount
    # (zero, or more than the row currently holds).
    if mut < 1 or mut > state.rows[row_index]:
        mut = 1
    return Nimply(row_index, mut)

def fitness(state: Nim, move: Nimply) -> float:

    player_actual_advantage = 0
    player_future_advantage = 0
    opponent_future_advantage = 0

    # Check whether the opponent can get a Nim sum equal to 0 at the next move. If it can, but the
    # player can then reach a Nim sum equal to 0 as well, it is okay. If it can, but the player
    # cannot, it is not okay. If the player can get a Nim sum equal to zero, it is always okay.
    
    multiverse = deepcopy(state)
    multiverse.nimming(move)

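    # Check first whether the move ends the match: a state with a single object
    # has Nim Sum == 1, so this test cannot live inside the Nim Sum == 0 branch.
    remaining = sum(multiverse.rows)
    if remaining == 1:
        return np.inf       # The opponent is forced to take the last object.
    if remaining == 0:
        return -np.inf      # The agent took the last object itself.

    # Increase if with the actual mutation the agent gets Nim Sum == 0.
    if nim_sum(multiverse) == 0:
        player_actual_advantage += 1
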
    else:
        # The opponent does the best move he can.
        optimal_ply = optimal(multiverse)
        multiverse.nimming(optimal_ply)
        
        if sum(multiverse.rows) == 1:
            return -np.inf
        
        if sum(multiverse.rows) == 0:
            return np.inf

        elif nim_sum(multiverse) == 0:
            # Increase if from the actual mutation the opponent can reach Nim Sum == 0.
            opponent_future_advantage += 1

            # The agent does the best move he can.
            optimal_ply = optimal(multiverse)
            multiverse.nimming(optimal_ply)
            
            if sum(multiverse.rows) == 1:
                return np.inf

            elif nim_sum(multiverse) == 1:
                player_future_advantage += 1
    
    return player_actual_advantage + player_future_advantage - opponent_future_advantage


In [209]:
def greedy(state: Nim) -> Nimply:
    """Take every object from the row that currently holds the most objects"""
    row = int(np.argmax(state.rows))
    return Nimply(row, state.rows[row])

def clairvoyant(state: Nim) -> Nimply:
    """Sample POPULATION_SIZE candidate moves and usually play the fittest one"""

    universe = deepcopy(state)

    # With a single object left there is only one legal move: take it.
    if sum(universe.rows) == 1:
        return Nimply(universe.rows.index(1), 1)

    # Build the population: each individual is a candidate move with its fitness.
    population = []
    for _ in range(POPULATION_SIZE):
        mutation = mutate(universe)                                     # Nimply
        population.append((mutation, fitness(universe, mutation)))      # Use Poisson and tune sigma and mean based on the problem.

    best = max(population, key=lambda individual: individual[1])[0]

    # The fewer objects are left, the more the agent trusts its own evaluation.
    # AVG is the mean row size of the initial board and must be set before playing.
    clarity = 1 - 0.40 * (np.mean(universe.rows) / AVG)
    logging.debug(f"clarity: {clarity}")

    # Otherwise fall back to the gabriele strategy rather than a random move.
    if random.random() < clarity:
        return best
    return gabriele(universe)

In [None]:
logging.getLogger().setLevel(logging.INFO)

strategy = (optimal, clairvoyant)

nim = Nim(25)

AVG = np.mean(nim.rows)
logging.info(f"init : {nim}")
player = 0

while nim:
    ply = strategy[player](nim)
    logging.info(f"ply: player {player} plays {ply}")
    nim.nimming(ply)
    logging.info(f"status: {nim}")
    player = 1 - player
logging.info(f"status: Player {player} won!")
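
The `clarity` value used by `clairvoyant` is the probability of playing the fittest sampled move instead of falling back to `gabriele`. A small worked example shows how it grows as the board empties. It assumes the `Nim(25)` layout above; `clarity_of` is a hypothetical standalone helper and the mid-game board is made up for illustration.

```python
def clarity_of(rows, initial_avg, weight=0.40):
    """Probability of trusting the fittest move: 1 - weight * mean(rows) / initial_avg."""
    return 1 - weight * (sum(rows) / len(rows)) / initial_avg


start = [i * 2 + 1 for i in range(25)]   # same layout as Nim(25)
initial_avg = sum(start) / len(start)    # plays the role of AVG

half_empty = start[:12] + [0] * 13       # hypothetical mid-game board
print(clarity_of(start, initial_avg))
print(clarity_of(half_empty, initial_avg))
print(clarity_of([0] * 24 + [1], initial_avg))
```

At the start the mean row size equals `AVG`, so the agent follows its best candidate about 60% of the time, and the probability approaches 1 as the rows empty.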


To evaluate the mutation, compute both the actual nim sum and the expected worst behaviour of the opponent. We start from the assumption that a stupid agent will keep repeating the same move or play a random one. A greedy agent will just select the best move on its turn, without considering the opponent.
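
A one-move lookahead along these lines can be sketched as follows. This is a simplified illustration, not the `fitness` function above: `legal_moves`, `apply_move` and `evaluate` are hypothetical helpers working on plain tuples, and the "worst behaviour" is taken to be the lowest nim sum the opponent can reach with a single reply.

```python
from functools import reduce
from operator import xor


def nim_sum(rows):
    return reduce(xor, rows, 0)


def legal_moves(rows, k=None):
    """All (row, num_objects) moves, optionally capped at k objects per turn."""
    return [
        (r, n)
        for r, c in enumerate(rows)
        for n in range(1, (c if k is None else min(c, k)) + 1)
    ]


def apply_move(rows, move):
    row, num_objects = move
    return tuple(c - num_objects if r == row else c for r, c in enumerate(rows))


def evaluate(rows, move, k=None):
    """Return (nim sum after the move, lowest nim sum the opponent can reach next)."""
    after = apply_move(rows, move)
    replies = legal_moves(after, k)
    worst = min((nim_sum(apply_move(after, m)) for m in replies), default=None)
    return nim_sum(after), worst


print(evaluate((1, 3, 5), (2, 3)))
print(evaluate((1, 3, 5), (0, 1)))
```

A move whose second value is 0 hands the opponent a zero nim sum, which is the kind of move the fitness should penalise most. The bound `k` only matters when it actually restricts the replies.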

In [None]:
#       |      001          1
#      |||     011        2 + 1
#     |||||    101        4 + 1

#       |      001          1
#      |||     011        2 + 1
#     ||||     100        4 + 0

#       |      001          1
#       |      001        0 + 1
#     ||||     100        4 + 0

#              000          0
#              000        0 + 0
#     ||||     100        4 + 0

#              000          0
#       |      001        0 + 1
#     ||||     100        4 + 0

## Test

In [184]:
temp = Nim(3)
temp.rows
np.array([tuple(int(x) for x in f"{c:032b}") for c in temp.rows])

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 1, 0, 1]])

In [None]:
# tmp
#       |      001          1
#      |||     011        2 + 1
#     |||||    101        4 + 1
# 
# xor
#      0 0 1
#      + + +
#      0 1 1
#      + + +
#      1 1 1
#      % % %
#      2 2 2
#      = = =
#      1 1 1
# 
# int 
#      111 = 7 
# 
# The goal is to keep the nim sum equal to 0 -> winning configuration 



# tmp
#       |      001          1
#      |||     011        2 + 1
#     ||||     100        4 + 0
# 
# xor
#      0 0 1
#      + + +
#      0 1 1
#      + + +
#      1 0 0
#      % % %
#      2 2 2
#      = = =
#      1 1 0
# 
# int 
#      110 = 6 
# 
# The goal is to keep the nim sum equal to 0 -> winning configuration



# tmp
#       |      001          1
#       |      001        0 + 1
#     ||||     100        4 + 0
# 
# xor
#      0 0 1
#      + + +
#      0 0 1
#      + + +
#      1 0 0
#      % % %
#      2 2 2
#      = = =
#      1 0 0
# 
# int 
#      100 = 4 
# 
# The goal is to keep the nim sum equal to 0 -> winning configuration



# tmp
#              000          0
#              000        0 + 0
#     ||||     100        4 + 0
# 
# xor
#      0 0 0
#      + + +
#      0 0 0
#      + + +
#      1 0 0
#      % % %
#      2 2 2
#      = = =
#      1 0 0
# 
# int 
#      100 = 4 
# 
# The goal is to keep the nim sum equal to 0 -> winning configuration



# tmp
#              000          0
#       |      001        0 + 1
#     ||||     100        4 + 0
# 
# xor
#      0 0 0
#      + + +
#      0 0 1
#      + + +
#      1 0 0
#      % % %
#      2 2 2
#      = = =
#      1 0 1
# 
# int 
#      101 = 5 
# 
# The goal is to keep the nim sum equal to 0 -> winning configuration
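
The column-wise sums modulo 2 worked out by hand above are the bitwise XOR of the row sizes, so the five boards can be checked in a few lines. This is a minimal sketch on plain tuples, independent of the `Nim` class.

```python
from functools import reduce
from operator import xor

# The five boards from the worked examples, as (row 0, row 1, row 2).
states = [(1, 3, 5), (1, 3, 4), (1, 1, 4), (0, 0, 4), (0, 1, 4)]

for rows in states:
    value = reduce(xor, rows, 0)
    # Print the board, the nim sum as three bits, and the nim sum as an integer.
    print(rows, format(value, "03b"), value)
```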