# Lab 3: Policy Search

## Task

Write agents able to play [*Nim*](https://en.wikipedia.org/wiki/Nim), with an arbitrary number of rows and an upper bound $k$ on the number of objects that can be removed in a turn (a.k.a., *subtraction game*).

The player **taking the last object wins**.

* Task3.1: An agent using fixed rules based on *nim-sum* (i.e., an *expert system*)
* Task3.2: An agent using evolved rules
* Task3.3: An agent using minimax
* Task3.4: An agent using reinforcement learning
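
For Task3.1, the nim-sum is the bitwise XOR of the row sizes. The following is a minimal sketch, assuming the "last object wins" rule stated above and ignoring the bound $k$. The helper names `nim_sum` and `winning_move` are illustrative and not part of the lab code.

```python
from functools import reduce
from operator import xor


def nim_sum(rows):
    """Bitwise XOR of all row sizes."""
    return reduce(xor, rows, 0)


def winning_move(rows):
    """Return (row, num_objects) that leaves a nim-sum of 0, or None if no such move exists."""
    total = nim_sum(rows)
    if total == 0:
        # Every move from here leaves a non-zero nim-sum.
        return None
    for row, size in enumerate(rows):
        target = size ^ total  # size this row must shrink to
        if target < size:
            return row, size - target
    return None
```

With rows `[1, 3, 5]` the nim-sum is 7, and removing 3 objects from the last row leaves `[1, 3, 2]`, whose nim-sum is 0.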

## Instructions

* Create the directory `lab3` inside the course repo
* Put a `README.md` and your solution (all files, code, and auxiliary data if needed) in it

In [1]:
%load_ext autoreload
%autoreload 2

from task1_lib import gabriele, pure_random, strategy_v3
from task2_lib import run_GA, strategy_ga
from nim_utils import evaluate, evaluate_GA, play_match

# Task 1

In [2]:
NUM_MATCHES = 100
NIM_SIZE = 10
K_SIZE = None

#print(f"Win-rate against {gabriele.__name__}: {evaluate(strategy_v3, gabriele, NUM_MATCHES, NIM_SIZE, k_size=K_SIZE)}")
#print(f"Win-rate against {pure_random.__name__}: {evaluate(strategy_v3, pure_random, NUM_MATCHES, NIM_SIZE, k_size=K_SIZE)}")


# Task 2

In [7]:
best_genome = run_GA()
print(f"Win-rate against gabriele: {evaluate_GA(best_genome, strategy_ga, gabriele, NUM_MATCHES, NIM_SIZE, k_size=K_SIZE)}")
print(f"Win-rate against pure_random: {evaluate_GA(best_genome, strategy_ga, pure_random, NUM_MATCHES, NIM_SIZE, k_size=K_SIZE)}")

[info] - Start generating the population


100%|██████████| 30/30 [00:01<00:00, 27.44it/s]


[info] - Evolving...


100%|██████████| 100/100 [02:42<00:00,  1.63s/it]

[info] - Best genome found is {'alpha': 0.13455870481228693, 'beta': 0.9799679641612153, 'gamma': 0.9016061679554783} with fitness: (1.0, 0.96)
Win-rate against gabriele: 1.0
Win-rate against pure_random: 0.89





# Fr vs Al
## Al genome: {'alpha': 0.47840317321043324, 'beta': 0.015350681808749878, 'gamma': 0.9791301988440648}, {'alpha': 0.10717668577545536, 'beta': 0.10019660149353893, 'gamma': 0.9943791635871431}
## Fr genome:


In [4]:
import random
from nimply import Nim, Nimply, cook_status_t2

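def strategy_gaAl(state: Nim, genome) -> Nimply:
    """Pick a move: `gamma` selects the rule-based branch, `alpha` and `beta` weight the random one."""
    cooked = cook_status_t2(state)
    alpha = genome["alpha"]
    beta = genome["beta"]
    gamma = genome["gamma"]
    # With probability gamma follow the fixed rules, otherwise make a weighted random move.
    outcome = random.choices([True, False], weights=[gamma, 1 - gamma], k=1)[0]

    if outcome:
        if cooked["active_rows_number"] % 2 == 1:
            # Odd number of non-empty rows: empty a random non-empty row.
            row = random.choice([i for i, e in enumerate(state.rows) if e > 0])
            num_objects = state.rows[row]
        else:
            if state.rows[cooked["longest_row"]] > 1:
                # Even number of non-empty rows: reduce a random row with more than one object to exactly one.
                row = random.choice([i for i, e in enumerate(state.rows) if e > 1])
                num_objects = state.rows[row] - 1
            else:
                # Every non-empty row holds a single object: take one of them.
                row = random.choice([i for i, e in enumerate(state.rows) if e > 0])
                num_objects = state.rows[row]
    else:
        # Choose an over-average row with probability alpha, an under-average row otherwise.
        row = random.choices(
            [
                random.choice(cooked["over_avg_rows"]),
                random.choice(cooked["under_avg_rows"])
            ],
            weights=[alpha, 1 - alpha],
            k=1)[0]

        # Take a single object with probability beta, a random amount otherwise.
        num_objects = random.choices(
            [
                1,
                random.randint(1, state.rows[row])
            ],
            weights=[beta, 1 - beta],
            k=1)[0]

    return Nimply(row, num_objects)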


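def challenge(genomeFr, strategy_gaFr, genomeAl, strategy_gaAl, num_matches=100, nim_size=10, k_size=None):
    """Return the fraction of matches won by the Fr strategy (player 0) against the Al strategy (player 1)."""
    won = 0

    for _ in range(num_matches):
        nim = Nim(nim_size, k=k_size)
        # Pick the starting player at random.
        player = random.randint(0, 1)
        while nim:
            if player == 0:
                ply = strategy_gaFr(nim, genomeFr)
            else:
                ply = strategy_gaAl(nim, genomeAl)
            nim.nimming(ply)
            player = 1 - player
        # The turn is flipped after the last move, so player == 1 means player 0 took the last object.
        if player == 1:
            won += 1
    return won / num_matches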
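
The winner bookkeeping in `challenge` depends on the turn flip that follows the final move: when the loop exits, `player` names the side that did *not* take the last object. The sketch below isolates that logic for a single match. The `Nim` and `Nimply` definitions here are hypothetical stand-ins, not the real `nimply` classes, and the row layout (row `i` starting with `2 * i + 1` objects) is an assumption inferred from the match log in this notebook. `take_first_row` and `play_one` are illustrative names.

```python
from collections import namedtuple

# Hypothetical stand-ins, NOT the real `nimply` classes.
Nimply = namedtuple("Nimply", "row, num_objects")


class Nim:
    """Assumed layout: row i starts with 2 * i + 1 objects."""

    def __init__(self, num_rows, k=None):
        self.rows = [2 * i + 1 for i in range(num_rows)]
        self.k = k

    def __bool__(self):
        # The game goes on while any object is left.
        return sum(self.rows) > 0

    def nimming(self, ply):
        assert 1 <= ply.num_objects <= self.rows[ply.row]
        assert self.k is None or ply.num_objects <= self.k
        self.rows[ply.row] -= ply.num_objects


def take_first_row(state):
    """Toy strategy: remove as much as allowed from the first non-empty row."""
    row = next(i for i, e in enumerate(state.rows) if e > 0)
    limit = state.rows[row] if state.k is None else min(state.rows[row], state.k)
    return Nimply(row, limit)


def play_one(strategy_0, strategy_1, first_player, nim_size, k_size=None):
    """Play one match and return the index of the player who took the last object."""
    nim = Nim(nim_size, k=k_size)
    player = first_player
    while nim:
        strategy = strategy_0 if player == 0 else strategy_1
        nim.nimming(strategy(nim))
        player = 1 - player
    # `player` was flipped after the final move, so the winner is the other side.
    return 1 - player
```

With both sides using `take_first_row` and no bound on removals, a match on `n` rows lasts exactly `n` moves, so the starting player wins when `n` is odd and loses when `n` is even.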

# Task 3

# Task 4

## Oversimplified match

In [5]:
play_match(strategy_v3, pure_random, 10, k_size=None)

status: After player 0 -> <1 3 5 7 9 1 13 15 17 19>
status: After player 1 -> <0 3 5 7 9 1 13 15 17 19>
status: After player 0 -> <0 3 5 0 9 1 13 15 17 19>
status: After player 1 -> <0 3 5 0 9 1 13 15 15 19>
status: After player 0 -> <0 3 5 0 9 1 13 15 1 19>
status: After player 1 -> <0 3 5 0 9 1 13 13 1 19>
status: After player 0 -> <0 1 5 0 9 1 13 13 1 19>
status: After player 1 -> <0 1 5 0 3 1 13 13 1 19>
status: After player 0 -> <0 1 5 0 3 1 1 13 1 19>
status: After player 1 -> <0 1 5 0 3 1 1 13 1 15>
status: After player 0 -> <0 1 5 0 3 1 1 1 1 15>
status: After player 1 -> <0 0 5 0 3 1 1 1 1 15>
status: After player 0 -> <0 0 5 0 3 1 1 0 1 15>
status: After player 1 -> <0 0 0 0 3 1 1 0 1 15>
status: After player 0 -> <0 0 0 0 3 1 0 0 1 15>
status: After player 1 -> <0 0 0 0 3 1 0 0 0 15>
status: After player 0 -> <0 0 0 0 3 1 0 0 0 0>
status: After player 1 -> <0 0 0 0 2 1 0 0 0 0>
status: After player 0 -> <0 0 0 0 1 1 0 0 0 0>
status: After player 1 -> <0 0 0 0 1 0 0 0 0 0>
st