# Assignment 1b: Adversarial Minesweeper
In the previous assignment, you were introduced to heuristic-based graph search using the game *Minesweeper*. In that assignment, you were also introduced to several techniques for creating and training agents to play Minesweeper. In this assignment, you will explore ways to understand and hinder the performance of such agents. As usual, execute the following cell to begin the assignment.

In [None]:
!pip install -q git+https://github.com/RobertTLange/evosax.git@main

import random
import statistics
import evosax
from evosax import ParameterReshaper, FitnessShaper
import jax
import jax.numpy as jnp
from flax import linen as nn
import numpy as np
from minesweeper import MinesweeperGame, render_map
import json
import utils1b


example_instances = {
    'beginner' : [
        [0,0,0,0,0,1,0,0,0],
        [0,0,0,0,0,0,0,0,0],
        [0,0,0,0,0,0,0,0,0],
        [0,0,0,0,0,0,0,0,1],
        [0,0,0,1,0,1,0,0,0],
        [0,1,0,0,0,0,0,0,1],
        [0,0,1,0,0,0,0,1,1],
        [0,0,0,0,0,0,0,1,0],
        [0,0,0,0,0,0,0,0,0]
    ],
    
    'intermediate' : [
        [0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
        [0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0],
        [1,1,0,1,0,0,0,1,0,0,1,0,0,0,0,1],
        [0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0],
        [0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1],
        [1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0],
        [0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1],
        [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
        [0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0],
        [0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0],
        [0,0,0,0,1,1,0,0,0,1,0,0,1,0,0,0],
        [0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0],
        [0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0],
        [1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0],
        [1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0],
        [0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1]
    ],
    
    'expert' : [
        [1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1],
        [0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0],
        [0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0],
        [0,0,0,0,1,1,0,0,1,0,0,1,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1],
        [0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
        [0,1,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0],
        [0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0],
        [0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0,1,0,1,1,0],
        [0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
        [0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0],
        [0,1,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0],
        [0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0],
        [0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
        [0,0,0,0,0,1,0,0,0,1,1,0,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0],
        [0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0],
        [0,0,1,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0]
    ]
}

print('The first cell has been executed!')

In this assignment, we'll provide you with a pretrained agent that performs reasonably well on the Minesweeper instances used in the previous assignment. Execute the following cell to load the parameters of this agent and evaluate its performance.

In [None]:
with open('./reference_agent.json') as file:
    encoded_params = json.load(file)
reference_agent = utils1b.agent_decoder(encoded_params)
network = utils1b.MLP(num_hidden_units=48, num_hidden_layers=2, num_output_units=1)
network_apply = jax.jit(network.apply)
for name, map_instance in example_instances.items():
    score, solution, num_actions = utils1b.neural_network_agent(map_instance, network_apply, reference_agent, enforce_reachability=True)
    print(f'{name} score: {score}')
    print(f'number of actions on {name} instance until game over: {num_actions}')
    print(f'{name} solution:')
    render_map(solution, compact=False)

Recall that in Assignment 1a, we used neuroevolution to train an agent with a deterministic heuristic, in an effort to match the best scores achieved by a stochastic agent that selected actions uniformly at random. However, we overlooked an important assumption behind the decision to use a deterministic heuristic. Namely, for a deterministic agent of this nature to perform optimally on every instance of Minesweeper, the agent would need to be able to discern the best action without ambiguity. Simply put, this is impossible in our formulation of Minesweeper.

Consider a Minesweeper instance where the initial space revealed at the start of the game contains a `1`. This value indicates that one of the eight surrounding cells contains a mine. At this point, the agent must select one of the surrounding eight cells for the next move, but there simply isn't enough information to *guarantee* the selection of a cell without a mine. Furthermore, the deterministic behavior of our agent guarantees that the same action would be selected every time. In other words, regardless of where other mines may be placed on the map, when our agent encounters a starting cell with a value of `1`, it will always select the same cell. The agent may select a different cell if the value were `2`, for example, but, because the agent is deterministic, the agent effectively has a hard-coded action it will always make for each non-zero value of the starting cell. The behavior we've just discussed is inherent to any *deterministic agent* we might try on our version of Minesweeper and not, for example, a consequence of using a neural network, specifically.
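
To make this concrete, here is a minimal sketch of a deterministic first move. The `toy_heuristic` function and the `NEIGHBOR_OFFSETS` list are hypothetical stand-ins invented for illustration; they are not the reference agent or anything from `utils1b`. The sketch assumes that, on the first move, all the agent can see is the value of the starting cell, so the heuristic takes only that value and a candidate cell's relative position as input.

```python
# Offsets (row, column) of the eight cells surrounding the starting cell.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def toy_heuristic(start_value, offset):
    """Hypothetical deterministic heuristic: a fixed function of what the agent sees."""
    dr, dc = offset
    return (start_value * (3 * dr + dc)) % 7

def first_action(start_value):
    """Pick the neighbor with the highest heuristic value."""
    # max() returns the first maximal item, so even ties are broken deterministically.
    return max(NEIGHBOR_OFFSETS, key=lambda offset: toy_heuristic(start_value, offset))

def first_move_hits_mine(mine_offset, start_value=1):
    """True if a single mine at mine_offset is selected by the first action."""
    return first_action(start_value) == mine_offset

print(first_action(1))  # (0, -1), no matter where the mine actually is
print(first_action(2))  # (-1, -1), a different but equally fixed choice
# Exactly one of the eight possible single-mine placements defeats this agent.
print(sum(first_move_hits_mine(offset) for offset in NEIGHBOR_OFFSETS))  # 1
```

Because nothing in `first_action` depends on the mine's true location, anyone who knows the agent's fixed choice can place a mine there.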

Even though the parameters of the neural network are unintelligible, and even if you are unfamiliar with neural networks on the whole, you can now begin to characterize and (to a limited extent) reason about the agent's behavior. We'll leverage this insight to create Minesweeper problem instances that exploit this understanding and affect the agent's performance.

## Minesweeper Problem Instances
As was demonstrated earlier in this notebook and in Assignment 1a, Minesweeper problem instances are defined as 2D binary lists where cells with a value of `1` contain a mine and cells with a value of `0` do not contain a mine. While there are almost certainly Minesweeper instances which will cause crashes and unintended behaviors in the Minesweeper environment, our goal is not to find them. In general, adhere to the following rules when completing exercises in this assignment:
1. the minimum board size is 5x5, as this is the size of the box used to generate observations;
2. the starting location must not contain a mine (by default, this is the middle of the board);
3. rows and columns in the binary list must be of a consistent size; and
4. the binary values in the 2D list are expressed as integers (as opposed to Bools), for conciseness.
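
The rules above can be sketched as a small checker. This is only an illustration: `check_instance` is a hypothetical helper, not part of `utils1b` (the exercises below use `utils1b.map_check` instead), and it assumes the default starting location is the cell at `(rows // 2, columns // 2)`.

```python
def check_instance(instance):
    """Return a list of rule violations for a 2D binary list (empty if none are found)."""
    problems = []
    row_lengths = {len(row) for row in instance}
    if len(row_lengths) != 1:
        # Rule 3: every row must have the same length (an empty board also ends up here).
        return ['rows have inconsistent lengths']
    num_rows, num_cols = len(instance), row_lengths.pop()
    if num_rows < 5 or num_cols < 5:
        # Rule 1: the board must be at least 5x5.
        problems.append('board is smaller than 5x5')
    cells = [cell for row in instance for cell in row]
    # Rule 4: `type(cell) is not int` rejects bools, since bool is a subclass of int.
    if any(type(cell) is not int or cell not in (0, 1) for cell in cells):
        problems.append('cells must be the integers 0 or 1')
    # Rule 2, under the assumption that the default start is the middle cell.
    if num_cols > 0 and instance[num_rows // 2][num_cols // 2] == 1:
        problems.append('mine in the assumed start location')
    return problems

empty_board = [[0] * 5 for _ in range(5)]
print(check_instance(empty_board))  # []

blocked_start = [row[:] for row in empty_board]
blocked_start[2][2] = 1
print(check_instance(blocked_start))  # ['mine in the assumed start location']
```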

## Minesweeper Scoring
To understand agent performance on Minesweeper, we should understand how the score is calculated. This was briefly mentioned in Assignment 1a but, for the sake of clarity, the score is calculated using the following equation:  
$score = revealed\_cells/mineless\_cells$.

Recall that our version of Minesweeper is modified from the original to make it a graph traversal problem. We can only select cells immediately diagonal or adjacent to already revealed cells. As a consequence of the location selected at the start of the game and the graph of cells connected to that cell, it is possible to define Minesweeper problem instances with inaccessible locations (as reaching these cells would require traversing cells containing mines). As such, when `enforce_reachability=True`, $mineless\_cells$ is calculated using only cells without mines that are reachable from the start location. When `enforce_reachability=False`, we calculate $mineless\_cells$ using all cells without mines, regardless of the ability to reach these locations from the start location.
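
The effect of `enforce_reachability` on the denominator can be sketched with a breadth-first flood fill. This is one reading of the description above, not the implementation used by the `minesweeper` module: `reachable_mineless_cells` and `score_denominator` are hypothetical helpers, and the sketch assumes the start is the middle cell and that a mineless cell is reachable when a path of diagonally or orthogonally adjacent mineless cells connects it to the start.

```python
from collections import deque

def reachable_mineless_cells(instance, start=None):
    """Flood fill from the start over mineless cells, using 8-way adjacency."""
    num_rows, num_cols = len(instance), len(instance[0])
    if start is None:
        start = (num_rows // 2, num_cols // 2)  # assumed default start location
    seen = {start}
    frontier = deque([start])
    while frontier:
        row, col = frontier.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = row + dr, col + dc
                in_bounds = 0 <= nr < num_rows and 0 <= nc < num_cols
                if in_bounds and instance[nr][nc] == 0 and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    frontier.append((nr, nc))
    return seen

def score_denominator(instance, enforce_reachability=True):
    """Count the cells that play the role of mineless_cells in the score."""
    if enforce_reachability:
        return len(reachable_mineless_cells(instance))
    return sum(cell == 0 for row in instance for cell in row)

# A wall of mines in the second column cuts the first column off from the start.
walled = [[0, 1, 0, 0, 0] for _ in range(5)]
print(score_denominator(walled, enforce_reachability=True))   # 15
print(score_denominator(walled, enforce_reachability=False))  # 20
```

On this board, an agent that reveals all 15 reachable cells would score 15/15 = 1.0 with `enforce_reachability=True` but only 15/20 = 0.75 with `enforce_reachability=False`.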

## Exercise 0
Implement a Minesweeper problem instance, with a single mine, in which the agent fails on its first action. We define failure in this case as selecting a cell containing a mine.

In [None]:
# Modify this problem instance to define mine placement
exercise_0 = [
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0]
]

starting_mine, mine_count = utils1b.map_check(exercise_0)
if starting_mine:
    print('Error: mine placed in default starting location')
elif mine_count == 0:
    print('Please define mine placement in exercise_0')
elif mine_count > 1:
    print('Please place only a single mine for this exercise')
else:
    score, solution, num_actions = utils1b.neural_network_agent(exercise_0, network_apply, reference_agent, enforce_reachability=True)
    print(f'Exercise 0 score: {score}')
    print(f'Number of actions: {num_actions}')
    print(f'Exercise 0 solution:')
    render_map(solution, compact=False)

## Exercise 1
Implement a Minesweeper problem instance, with a single mine, in which the agent achieves a complete solution in their first action. In other words, the first action of the agent should reveal all cells without a mine.

In [None]:
# Modify this problem instance to define mine placement
exercise_1 = [
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0]
]

starting_mine, mine_count = utils1b.map_check(exercise_1)
if starting_mine:
    print('Error: mine placed in default starting location')
elif mine_count == 0:
    print('Please define mine placement in exercise_1')
elif mine_count > 1:
    print('Please place only a single mine for this exercise')
else:
    score, solution, num_actions = utils1b.neural_network_agent(exercise_1, network_apply, reference_agent, enforce_reachability=True)
    print(f'Exercise 1 score: {score}')
    print(f'Number of actions: {num_actions}')
    print(f'Exercise 1 solution:')
    render_map(solution, compact=False)

## Exercise 2
Implement a Minesweeper problem instance in which the agent selects a different cell for their starting move. Hint: you can use as many mines as you like.

In [None]:
# Modify this problem instance to define mine placement
exercise_2 = [
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0]
]

starting_mine, mine_count = utils1b.map_check(exercise_2)
if starting_mine:
    print('Error: mine placed in default starting location')
elif mine_count == 0:
    print('Please define mine placement in exercise_2')
else:
    score, solution, num_actions = utils1b.neural_network_agent(exercise_2, network_apply, reference_agent, enforce_reachability=True)
    print(f'Exercise 2 score: {score}')
    print(f'Number of actions: {num_actions}')
    print(f'Exercise 2 solution:')
    render_map(solution, compact=False)

## Exercise 3
Implement a Minesweeper problem instance in which the agent achieves a score less than 0.01. You are welcome to use a board of whatever size and number of mines you deem appropriate, but be sure to assign your problem instance to the `exercise_3` variable.

In [None]:
# Define your problem instance here
exercise_3 = None

starting_mine, mine_count = utils1b.map_check(exercise_3)
if starting_mine:
    print('Error: mine placed in default starting location')
elif mine_count == 0:
    print('Please define mine placement in exercise_3')
else:
    score, solution, num_actions = utils1b.neural_network_agent(exercise_3, network_apply, reference_agent, enforce_reachability=True)
    print(f'Exercise 3 score: {score}')
    print(f'Number of actions: {num_actions}')
    print(f'Exercise 3 solution:')
    render_map(solution, compact=False)

## Exercise 4
Implement a Minesweeper problem instance in which the agent achieves a score of 0.50 $\pm$ 0.01. To simplify this exercise, we'll calculate game score with `enforce_reachability=False`, so that you can accomplish this task by simply making spaces inaccessible to the normal traversal of the agent. You are welcome to use a board of whatever size and number of mines you deem appropriate, but be sure to assign your problem instance to the `exercise_4` variable.

In [None]:
# Define your problem instance here
exercise_4 = None

starting_mine, mine_count = utils1b.map_check(exercise_4)
if starting_mine:
    print('Error: mine placed in default starting location')
elif mine_count == 0:
    print('Please define mine placement in exercise_4')
else:
    score, solution, num_actions = utils1b.neural_network_agent(exercise_4, network_apply, reference_agent, enforce_reachability=False)
    print(f'Exercise 4 score: {score}')
    print(f'Number of actions: {num_actions}')
    print(f'Exercise 4 solution:')
    render_map(solution, compact=False)

## Exercise 5
Implement a Minesweeper problem instance in which the agent achieves a score of 0.50 $\pm$ 0.01. This time, however, we'll calculate the game score with `enforce_reachability=True`. You must guarantee, then, that it is possible to reach the ~50% of non-mine cells the agent fails to select. You are welcome to use a board of whatever size and number of mines you deem appropriate, but be sure to assign your problem instance to the `exercise_5` variable.

In [None]:
# Define your problem instance here
exercise_5 = None

starting_mine, mine_count = utils1b.map_check(exercise_5)
if starting_mine:
    print('Error: mine placed in default starting location')
elif mine_count == 0:
    print('Please define mine placement in exercise_5')
else:
    score, solution, num_actions = utils1b.neural_network_agent(exercise_5, network_apply, reference_agent, enforce_reachability=True)
    print(f'Exercise 5 score: {score}')
    print(f'Number of actions: {num_actions}')
    print(f'Exercise 5 solution:')
    render_map(solution, compact=False)