Copyright **`(c)`** 2022 Giovanni Squillero `<squillero@polito.it>`  
[`https://github.com/squillero/computational-intelligence`](https://github.com/squillero/computational-intelligence)  
Free for personal or classroom use; see [`LICENSE.md`](https://github.com/squillero/computational-intelligence/blob/master/LICENSE.md) for details.


# Lab 3: ES

## Task

Write agents able to play [_Nim_](https://en.wikipedia.org/wiki/Nim), with an arbitrary number of rows and an upper bound $k$ on the number of objects that can be removed in a turn (a.k.a., _subtraction game_).

The goal of the game is to **avoid** taking the last object.

- Task2.1: An agent using fixed rules based on _nim-sum_ (i.e., an _expert system_)
- Task2.2: An agent using rules evolved with ES
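A minimal sketch of the classic nim-sum rule for Task2.1 under the misère goal above. It assumes no bound $k$, because the bounded case needs a different analysis. `misere_move` is a hypothetical helper that works on a plain tuple of row counts. It plays like normal Nim, leaving a nim-sum of zero, until a move would leave only rows with at most one object. At that point it leaves an odd number of single-object rows, so the opponent is forced to take the last one.

```python
from functools import reduce
from operator import xor


def misere_move(rows: tuple) -> tuple:
    """Hypothetical expert rule for misère Nim without a bound k.

    Returns (row, num_objects) for the given tuple of row counts.
    """
    nim_sum = reduce(xor, rows, 0)
    big_rows = [r for r, c in enumerate(rows) if c > 1]

    if not big_rows:
        # only rows with 0 or 1 objects are left: the only legal move takes one object
        row = next(r for r, c in enumerate(rows) if c == 1)
        return row, 1

    if len(big_rows) == 1:
        # endgame: shrink the only big row to 0 or 1 so that an odd number of
        # single-object rows remains and the opponent must take the last one
        row = big_rows[0]
        ones = sum(1 for c in rows if c == 1)
        keep = 0 if ones % 2 == 1 else 1
        return row, rows[row] - keep

    if nim_sum != 0:
        # normal-play rule: move to a position whose nim-sum is zero
        for row, c in enumerate(rows):
            if c ^ nim_sum < c:
                return row, c - (c ^ nim_sum)

    # losing position: remove a single object from the largest row and hope for an error
    row = max(range(len(rows)), key=lambda r: rows[r])
    return row, 1
```

On the 5-row board used below, `(1, 3, 5, 7, 9)` has nim-sum 9, and the rule empties the last row, leaving `(1, 3, 5, 7, 0)` with nim-sum 0.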

## Instructions

- Create the directory `lab2` inside the course repo
- Put a `README.md` and your solution (all the files, code and auxiliary data if needed)

## Notes

- Working in groups is not only allowed, but recommended (see: [Ubuntu](https://en.wikipedia.org/wiki/Ubuntu_philosophy) and [Cooperative Learning](https://files.eric.ed.gov/fulltext/EJ1096789.pdf)). Collaborations must be explicitly declared in the `README.md`.
- [Yanking](https://www.emacswiki.org/emacs/KillingAndYanking) from the internet is allowed, but sources must be explicitly declared in the `README.md`.


In [None]:
import logging
from pprint import pprint, pformat
from collections import namedtuple
import random
from copy import deepcopy

## The _Nim_ and _Nimply_ classes


In [None]:
Nimply = namedtuple("Nimply", "row, num_objects")

In [None]:
class Nim:
    def __init__(self, num_rows: int, k: int = None) -> None:
        self._rows = [i * 2 + 1 for i in range(num_rows)]
        self._k = k

    def __bool__(self):
        return sum(self._rows) > 0

    def __str__(self):
        return "<" + " ".join(str(_) for _ in self._rows) + ">"

    @property
    def rows(self) -> tuple:
        return tuple(self._rows)

    def nimming(self, ply: Nimply) -> None:
        row, num_objects = ply
        assert self._rows[row] >= num_objects
        assert self._k is None or num_objects <= self._k
        self._rows[row] -= num_objects

- Class `Nim` defines an initializer `__init__`, three methods (`__bool__`, `__str__`, and `nimming`), and a property `rows`.
- `__init__` takes two arguments, `num_rows` and `k` (default `None`). It initializes `self._rows` with the first `num_rows` odd numbers (1, 3, 5, ...) and stores `k` in `self._k`.
- `__bool__` returns `True` while at least one object remains, i.e., while the sum of `self._rows` is greater than 0.
- `__str__` returns the row counts separated by spaces and enclosed in angle brackets, e.g. `<1 3 5>`.
- The `rows` property returns the row counts as a tuple.
- `nimming` takes a `Nimply` instance (or any 2-tuple) and unpacks it into `row` and `num_objects`. It asserts that the row holds at least `num_objects` objects and, when `k` is set, that `num_objects` does not exceed `k`. If both assertions pass, it subtracts `num_objects` from that row. Neither assertion rejects `num_objects == 0`.
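The assertions in `nimming` define which plies are legal. As a minimal sketch, a hypothetical helper `legal_moves` (not part of the lab code) can enumerate those plies on a plain tuple of row counts. It also honors the bound `k` and skips moves that remove zero objects, which the assertions alone do not reject.

```python
from collections import namedtuple

Nimply = namedtuple("Nimply", "row, num_objects")


def legal_moves(rows: tuple, k: int = None) -> list:
    """Hypothetical helper: all plies with 1 <= num_objects <= min(row count, k)."""
    return [
        Nimply(r, o)
        for r, c in enumerate(rows)
        # without a bound every object of the row can be taken
        for o in range(1, (c if k is None else min(c, k)) + 1)
    ]
```

For `Nim(3)`, whose rows start as `(1, 3, 5)`, this gives 9 moves without a bound and 5 moves with `k=2`.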


## Sample (and silly) strategies


In [None]:
def pure_random(state: Nim) -> Nimply:
    """A completely random move"""
    row = random.choice(
        [r for r, c in enumerate(state.rows) if c > 0]
    )  # select a random non-empty row
    num_objects = random.randint(
        1, state.rows[row]
    )  # select a random number of objects to remove from that row
    return Nimply(
        row, num_objects
    )  # the move: remove num_objects objects from row

In [None]:
def gabriele(state: Nim) -> Nimply:
    """Always take the maximum possible number of objects from the lowest (first non-empty) row"""
    possible_moves = [
        (r, o) for r, c in enumerate(state.rows) for o in range(1, c + 1)
    ]  # create list of all possible moves
    return Nimply(
        *max(possible_moves, key=lambda m: (-m[0], m[1]))
    )  # lowest row index first, then the largest number of objects from that row

In [None]:
def adaptive1(state: Nim) -> Nimply:
    """A strategy that can adapt its parameters"""
    genome = {"love_small": 0.5}  # set initial value for love_small

    if state.rows[0] <= 3:  # if the first row has 3 or fewer objects
        genome["love_small"] = 0.9  # increase love_small
    elif state.rows[0] >= 7:  # if the first row has 7 or more objects
        genome["love_small"] = 0.1  # decrease love_small

    row = min(
        range(len(state.rows)), key=lambda r: state.rows[r]
    )  # select the row with the fewest objects (empty rows included)

    num_objects = int(
        genome["love_small"] * state.rows[row]
    )  # objects to remove; int() truncates, so this can be 0

    return Nimply(
        row, num_objects
    )  # the move: remove num_objects objects from row
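To see what `adaptive1` actually returns, the sketch below reproduces its row and quantity selection on a plain tuple of row counts. `adaptive1_move` is a hypothetical stand-alone copy, not the notebook function. On boards built by `Nim`, row 0 starts with a single object. As a result, the row with the fewest objects holds at most one object, and `int()` truncates the removal to zero, so the resulting ply removes nothing.

```python
def adaptive1_move(rows: tuple) -> tuple:
    """Stand-alone copy of the selection logic of adaptive1 (hypothetical helper)."""
    love_small = 0.5
    if rows[0] <= 3:
        love_small = 0.9
    elif rows[0] >= 7:
        love_small = 0.1
    # row with the fewest objects, empty rows included
    row = min(range(len(rows)), key=lambda r: rows[r])
    # int() truncates toward zero
    num_objects = int(love_small * rows[row])
    return row, num_objects


print(adaptive1_move((1, 3, 5, 7, 9)))  # (0, 0)
print(adaptive1_move((0, 3, 5, 7, 9)))  # (0, 0)
print(adaptive1_move((5, 6, 8)))        # (0, 2)
```

Since `nimming` accepts a ply that removes zero objects, `adaptive1` effectively passes the turn on the boards used in this notebook.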

In [None]:
import numpy as np


def nim_sum(state: Nim) -> int:
    """
    takes a `Nim` object and calculates the Nim-sum of the current game
    state by converting the counts of objects in the rows to binary, summing
    each bit column, and taking the result modulo 2 (i.e., the bitwise XOR
    of the row counts)
    """
    tmp = np.array([tuple(int(x) for x in f"{c:032b}") for c in state.rows])
    xor = tmp.sum(axis=0) % 2
    return int("".join(str(_) for _ in xor), base=2)


def analize(raw: Nim) -> dict:
    """
    takes a `Nim` object and returns a dictionary with a single key,
    "possible_moves," which maps each possible move to its resulting Nim-sum. To
    calculate this, it iterates over all possible moves, makes a deep copy of
    the game state, applies the move to the copy, and calculates the Nim-sum of
    the resulting state
    """
    cooked = dict()
    cooked["possible_moves"] = dict()
    for ply in (Nimply(r, o) for r, c in enumerate(raw.rows) for o in range(1, c + 1)):
        tmp = deepcopy(raw)
        tmp.nimming(ply)
        cooked["possible_moves"][ply] = nim_sum(tmp)
    return cooked


def optimal(state: Nim) -> Nimply:
    """
    takes a `Nim` object and returns an optimal move by analyzing the
    current game state to get the Nim-sums of all possible moves. It selects
    the moves that leave a Nim-sum of zero, i.e., the "winning" moves.
    If no such move exists, it considers all moves and returns one chosen
    at random
    """

    # from archimedes-lab.org:

    # To win at Nim-game, always make a move, whenever possible, that leaves a
    # configuration with a ZERO “Nim sum”, that is with ZERO unpaired multiple(s)
    # of 4, 2 or 1. Otherwise, your opponent has the advantage, and you have to
    # depend on his/her committing an error in order to win.

    analysis = analize(state)
    logging.debug(f"analysis:\n{pformat(analysis)}")
    spicy_moves = [ply for ply, ns in analysis["possible_moves"].items() if ns == 0]
    if not spicy_moves:
        spicy_moves = list(analysis["possible_moves"].keys())
    ply = random.choice(spicy_moves)
    return ply
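The column-parity computation in `nim_sum` is the bitwise XOR of the row counts. A minimal sketch comparing the two on plain tuples (both function names are hypothetical):

```python
import random
from functools import reduce
from operator import xor


def nim_sum_xor(rows: tuple) -> int:
    # XOR of all row counts
    return reduce(xor, rows, 0)


def nim_sum_parity(rows: tuple, bits: int = 32) -> int:
    # same idea as nim_sum: parity of each binary column, then rebuild the integer
    parity = [sum((c >> b) & 1 for c in rows) % 2 for b in range(bits)]
    return sum(p << b for b, p in enumerate(parity))


assert nim_sum_xor((1, 3, 5, 7, 9)) == nim_sum_parity((1, 3, 5, 7, 9)) == 9
for _ in range(1000):
    rows = tuple(random.randint(0, 100) for _ in range(5))
    assert nim_sum_xor(rows) == nim_sum_parity(rows)
```

With XOR, the zero-nim-sum moves could also be found directly (for a row with `c` objects the target count is `c ^ nim_sum`), instead of deep-copying the state for every possible move as `analize` does.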

## Oversimplified match

- The line `logging.getLogger().setLevel(logging.INFO)` sets the logging level to `INFO`, indicating that all logging messages with level `INFO` or higher will be displayed.

- The `strategy` tuple contains two functions, `optimal` and `pure_random`, which are used to determine the moves of the two players.

- The `while nim:` loop continues as long as the game is not over, meaning there are objects remaining in the game. Inside the loop, the current player's strategy function is invoked with the current game state to determine the next move (`ply = strategy[player](nim)`). This move is then applied to the game (`nim.nimming(ply)`), and both the game state and the move are logged. Finally, the current player is switched using `player = 1 - player`.

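- After the loop, the winner is logged together with the name of its strategy. Because the player who takes the last object loses, the winner is `player`, i.e., the player whose turn it would be next.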


In [None]:
# logging.getLogger().setLevel(logging.INFO)

# # strategy = (optimal, pure_random)
# # strategy = (gabriele, adaptive1)
# strategy = (optimal, adaptive1)

# nim = Nim(5)
# logging.info(f"init : {nim}\n")
# player = 0
# while nim:
#     ply = strategy[player](nim)
#     logging.info(f"ply: player {player} plays {ply}")
#     nim.nimming(ply)
#     logging.info(f"status: {nim}\n")
#     player = 1 - player
# logging.info(f"status: Player {player} won using {strategy[player].__name__} strategy!")
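The loop above can be wrapped in a function. The following is a minimal sketch on a plain list of row counts. It assumes that strategies are callables taking a tuple of row counts and returning `(row, num_objects)`. `play_match` and `take_one` are hypothetical names.

```python
def play_match(rows, strategies, first: int = 0) -> int:
    """Play one misère match and return the index of the winner."""
    rows = list(rows)
    player = first
    while any(rows):
        row, num_objects = strategies[player](tuple(rows))
        rows[row] -= num_objects
        player = 1 - player
    # the previous player took the last object and loses, so `player` wins
    return player


def take_one(rows: tuple) -> tuple:
    # remove a single object from the first non-empty row
    return next(r for r, c in enumerate(rows) if c > 0), 1


print(play_match((1, 3, 5, 7, 9), (take_one, take_one)))  # 1
```

With 25 objects removed one at a time, player 0 makes the 25th move, takes the last object, and loses.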

## Nim ES


In [None]:
# evolutionary params
POPULATION_SIZE = 20
MUTATION_RATE = 0.01
NUMBER_GENERATIONS = 100

# game params
NIM_DIM = 5
# Longest Nim match: every player removes a single object per turn, so at most one move per object
MAX_NUMBER_MOVES = sum([i * 2 + 1 for i in range(NIM_DIM)])
STRATEGIES = [pure_random, gabriele, adaptive1, optimal]
NUMBER_STRATEGIES = len(STRATEGIES)

# number on matches in fitness
FITNESS_MATCHES = 10

# expert agent -> each move is optimal move
EXPERT_AGENT = [optimal] * MAX_NUMBER_MOVES
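Row $i$ starts with $2i+1$ objects. Assuming every move removes at least one object, the number of moves is therefore bounded by the total number of objects, which for `NIM_DIM` $= n$ rows is

$$
\sum_{i=0}^{n-1} (2i+1) = 2\cdot\frac{(n-1)\,n}{2} + n = n^2 .
$$

For `NIM_DIM = 5` this gives `MAX_NUMBER_MOVES` $= 5^2 = 25$, which is also the length of each agent's genome.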

In [None]:
# some randomly generated agents already beat optimal??


def generate_random_agent_1():
    # only the non-optimal strategies (optimal is the last one)
    return [random.choice(STRATEGIES[:-1]) for _ in range(MAX_NUMBER_MOVES)]


def generate_random_agent_2():
    # with a small chance (1 in 13) of performing an optimal move
    return [
        random.choices(STRATEGIES, weights=[4, 4, 4, 1])[0]
        for _ in range(MAX_NUMBER_MOVES)
    ]


print(generate_random_agent_1())
print(generate_random_agent_2())
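With weights `[4, 4, 4, 1]`, each gene of `generate_random_agent_2` is `optimal` with probability $1/13$. The small worked computation below uses a dictionary that simply mirrors those weights. It gives the expected number of `optimal` genes in a 25-move genome and the probability that a genome contains none.

```python
from fractions import Fraction

weights = {"pure_random": 4, "gabriele": 4, "adaptive1": 4, "optimal": 1}
genome_length = 25  # MAX_NUMBER_MOVES for NIM_DIM = 5

p_optimal = Fraction(weights["optimal"], sum(weights.values()))
expected_optimal_genes = genome_length * p_optimal
p_no_optimal_gene = (1 - p_optimal) ** genome_length

print(p_optimal)                       # 1/13
print(float(expected_optimal_genes))  # about 1.92
print(float(p_no_optimal_gene))       # about 0.135
```

So roughly one genome in seven starts without any `optimal` gene.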

In [None]:
def nim_match(agent1):
    strategy = (agent1, EXPERT_AGENT)

    nim = Nim(NIM_DIM)
    player = random.randint(0, 1)
    number_moves = 0
    while nim:
        ply = strategy[player][number_moves](nim)
        nim.nimming(ply)
        player = 1 - player
        number_moves += 1

    # `player` is now the player who did NOT take the last object;
    # the number of moves is returned too, so a fitness can reward winning in fewer moves (see fitness1)
    return player, number_moves

In [None]:
from multiprocessing import Pool


def fitness1(agent):
    # plays FITNESS_MATCHES matches against the expert agent, executing in order the moves of the agent and of the expert agent
    # fitness is the sum of (match won / number of moves)
    with Pool() as pool:
        results = pool.map(
            nim_match, [agent] * FITNESS_MATCHES
        )  # [(victory, moves), ...]
    return sum(
        [res[0] / res[1] for res in results]
    )  # rewards more aggressive play, i.e., fewer turns to win -> doesn't perform as well in the final boss as fitness2


def fitness2(agent):
    # plays FITNESS_MATCHES matches against the expert agent, executing in order the moves of the agent and of the expert agent
    # fitness is the number of matches won by the agent (at most FITNESS_MATCHES)
    with Pool() as pool:
        results = pool.map(
            nim_match, [agent] * FITNESS_MATCHES
        )  # [(victory, moves), ...]
    return sum(
        [res[0] for res in results]
    )  # counts wins only, regardless of the number of moves


fitness = fitness2


# test
print(fitness(EXPERT_AGENT))
print(fitness([STRATEGIES[0]] * MAX_NUMBER_MOVES))  # rand
print(fitness([STRATEGIES[1]] * MAX_NUMBER_MOVES))  # gab
print(fitness([STRATEGIES[2]] * MAX_NUMBER_MOVES))  # adap
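To make the difference between the two fitness functions concrete, here is the arithmetic on a hypothetical list of `(victory, moves)` results. The values are made up for illustration and are not taken from a run.

```python
# hypothetical output of pool.map(nim_match, ...): (victory, number of moves)
results = [(1, 10), (0, 12), (1, 20)]

fitness1_value = sum(victory / moves for victory, moves in results)
fitness2_value = sum(victory for victory, _ in results)

print(fitness1_value)  # about 0.15
print(fitness2_value)  # 2
```

Under `fitness1` a win in 10 moves is worth twice a win in 20 moves, while `fitness2` counts both as 1.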

In [None]:
def mutate(agent):
    agent = list(agent)  # work on a copy so that the selected parent is not modified in place
    # swap two move strategies
    if random.randint(0, 1):
        swap_index1, swap_index2 = random.sample(range(MAX_NUMBER_MOVES), 2)
        agent[swap_index1], agent[swap_index2] = (
            agent[swap_index2],
            agent[swap_index1],
        )
    # change one move strategy to another strategy
    else:
        agent[random.randint(0, MAX_NUMBER_MOVES - 1)] = random.choice(STRATEGIES)

    return agent

In [None]:
def reproduce(agent1, agent2):
    # one-point crossover: split both agents at the same random index
    # and join the head of agent1 with the tail of agent2
    agent1_index = random.randint(0, MAX_NUMBER_MOVES - 1)
    return agent1[:agent1_index] + agent2[agent1_index:]
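A deterministic version of the same one-point crossover passes the cut index explicitly. `one_point_crossover` is a hypothetical helper, and the sketch shows what `reproduce` builds. Slicing creates a new list, so the parents are left untouched.

```python
def one_point_crossover(parent1: list, parent2: list, cut: int) -> list:
    # head of parent1 up to `cut` (excluded) followed by the tail of parent2
    return parent1[:cut] + parent2[cut:]


p1 = ["optimal"] * 5
p2 = ["gabriele"] * 5
child = one_point_crossover(p1, p2, 2)
print(child)  # ['optimal', 'optimal', 'gabriele', 'gabriele', 'gabriele']
```

With a cut of 0 the child is a copy of `agent2`. Since `random.randint(0, MAX_NUMBER_MOVES - 1)` never returns `MAX_NUMBER_MOVES`, the child always inherits at least the last gene from `agent2`.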

In [None]:
# population = [generate_random_agent_1() for _ in range(POPULATION_SIZE)]  # no optimal -> doesn't work if every agent loses all its matches in the first generation (common), because random.choices then gets all-zero weights

population = [
    generate_random_agent_2() for _ in range(POPULATION_SIZE)
]  # with a small chance of performing an optimal move

In [None]:
for generation in range(NUMBER_GENERATIONS):
    # evaluate current generation
    fitness_scores = [fitness(agent) for agent in population]

    if generation % 5 == 0:
        max_fitness = max(fitness_scores)
        print(
            "Generation",
            generation,
            "- Best Fitness:",
            max_fitness,
            "- Avg Fitness:",
            np.mean(fitness_scores),
        )

    # next gen parents
    selected_parents = random.choices(
        population, weights=fitness_scores, k=POPULATION_SIZE // 3
    )

    # create next gen
    new_population = []
    for i in range(POPULATION_SIZE):
        if random.random() < MUTATION_RATE:
            new_population.append(mutate(random.choice(selected_parents)))
        else:
            agent1 = random.choice(selected_parents)
            agent2 = random.choice(selected_parents)
            new_population.append(reproduce(agent1, agent2))

    population = new_population

# print best agent
best_agent = max(population, key=fitness)
print()
print("Best Agent ->", [strat.__name__ for strat in best_agent])

# about 5 minutes on an Intel i5
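Parent selection above is fitness-proportional (roulette-wheel) sampling through `random.choices`. In recent Python versions this raises a `ValueError` when all weights are zero, which is the first-generation failure mentioned earlier. Below is a minimal sketch of a guarded variant. It assumes that falling back to uniform sampling is acceptable, and `select_parents` is a hypothetical name.

```python
import random


def select_parents(population: list, scores: list, k: int) -> list:
    """Roulette-wheel selection that falls back to uniform sampling when all scores are zero."""
    if sum(scores) > 0:
        return random.choices(population, weights=scores, k=k)
    # no agent won a match: every agent is equally (un)fit
    return random.choices(population, k=k)


print(select_parents(["a", "b", "c"], [0, 0, 5], k=4))       # ['c', 'c', 'c', 'c']
print(len(select_parents(["a", "b", "c"], [0, 0, 0], k=4)))  # 4
```

In the loop, `select_parents(population, fitness_scores, POPULATION_SIZE // 3)` would replace the direct `random.choices` call.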

In [None]:
print("!FINAL BOSS!\n1000 matches VS EXPERT AGENT")  # about 30 seconds on an Intel i5
# around 500 wins means the agent is basically as good as the EXPERT AGENT
print(
    "Evolved Agent ->",
    sum([nim_match(best_agent)[0] for _ in range(1000)]),
    "won!",
)


print(
    "Random Agent  ->",
    sum([nim_match(generate_random_agent_2())[0] for _ in range(1000)]),
    "won!",
)

### LATEST RUN
```
Generation 0 - Best Fitness: 6 - Avg Fitness: 1.6
Generation 5 - Best Fitness: 8 - Avg Fitness: 4.35
Generation 10 - Best Fitness: 9 - Avg Fitness: 5.2
Generation 15 - Best Fitness: 9 - Avg Fitness: 5.65
Generation 20 - Best Fitness: 9 - Avg Fitness: 5.85
Generation 25 - Best Fitness: 8 - Avg Fitness: 4.95
Generation 30 - Best Fitness: 8 - Avg Fitness: 4.95
Generation 35 - Best Fitness: 8 - Avg Fitness: 5.35
Generation 40 - Best Fitness: 9 - Avg Fitness: 5.35
Generation 45 - Best Fitness: 7 - Avg Fitness: 4.35
Generation 50 - Best Fitness: 10 - Avg Fitness: 5.9
Generation 55 - Best Fitness: 7 - Avg Fitness: 5.05
Generation 60 - Best Fitness: 8 - Avg Fitness: 5.8
Generation 65 - Best Fitness: 8 - Avg Fitness: 5.15
Generation 70 - Best Fitness: 7 - Avg Fitness: 4.5
Generation 75 - Best Fitness: 8 - Avg Fitness: 5.4
Generation 80 - Best Fitness: 8 - Avg Fitness: 5.25
Generation 85 - Best Fitness: 8 - Avg Fitness: 4.7
Generation 90 - Best Fitness: 9 - Avg Fitness: 5.6
Generation 95 - Best Fitness: 8 - Avg Fitness: 5.55

Best Agent -> ['adaptive1', 'adaptive1', 'adaptive1', 'pure_random', 'pure_random', 'pure_random', 'pure_random', 'adaptive1', 'adaptive1', 'pure_random', 'gabriele', 'pure_random', 'gabriele', 'adaptive1', 'pure_random', 'gabriele', 'gabriele', 'pure_random', 'pure_random', 'gabriele', 'gabriele', 'gabriele', 'pure_random', 'pure_random', 'gabriele']

!FINAL BOSS!
1000 matches VS EXPERT AGENT
Evolved Agent -> 519 won!
Random Agent  -> 151 won!
```
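A normal-approximation (Wald) 95% confidence interval for the win rate can check whether about 500 wins out of 1000 really means "as good as the expert". The sketch below applies it to the two counts of the run above. It is a rough estimate that treats the 1000 matches as independent trials.

```python
import math


def wald_interval(wins: int, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation (Wald) confidence interval for a win rate."""
    p = wins / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


print(wald_interval(519, 1000))  # roughly (0.488, 0.550)
print(wald_interval(151, 1000))  # roughly (0.129, 0.173)
```

The evolved agent's interval contains 0.5, so that run is consistent with a 50% win rate against the expert. The random agent's interval lies well below 0.5.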