# **Games**
We can view games against an opponent (a multi-agent environment) as search problems, in a much larger search space, with the following properties:
* Two-player games (MIN and MAX) in which moves alternate and players have **complementary objective functions**;
* Games with **perfect knowledge** in which both players have the same information.

The development of a match can be interpreted as a tree, in which:
* The root is the starting position;
* Leaves are the final positions.


# The minimax algorithm
The minimax algorithm is designed to determine the optimal strategy for MAX and to suggest the best first move, assuming that MIN plays at its best.
We are not interested in the whole path but only in the next move: the values of the leaves are propagated back up to the root to choose it.

>To evaluate a node $n$ in a game tree:
1. Put $n$ in $L$ (a list of open nodes);
1. Let $x$ be the first node in $L$;
1. If $x=n$ and there's a value assigned to it, then return the value.
1. Otherwise, if $x$ has an assigned value $V_x$, let $p$ be the father of $x$ and $V_p$ its provisional value;
1. If $p$ is a MIN node then $V_p=\min(V_p,V_x)$, otherwise $p$ is a MAX node and $V_p=\max(V_p,V_x)$;
1. Remove $x$ from $L$ and return to step 2.
1. If $x$ has no value assigned and is a leaf node, assign it 1, 0 or -1 and keep it in $L$, because its ancestors still need to be updated; return to step 2.
1. If $x$ has no value assigned and isn't a terminal node, assign $V_x=-\infty$ if it's a MAX node or $V_x=+\infty$ if it's a MIN node, then add its children to the front of $L$ and return to step 2.
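
As a minimal sketch, the same values can be computed recursively instead of with the explicit list $L$. The nested-list encoding of the tree and the names `minimax` and `best_move` are assumptions made for this example: a leaf is an integer in $\{1, 0, -1\}$ and an inner node is the list of its children.

```python
def minimax(node, is_max):
    """Return the minimax value of `node` (hypothetical nested-list encoding)."""
    if isinstance(node, int):      # leaf: final position worth 1, 0 or -1
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


def best_move(root):
    """Index of the child of a MAX root with the highest minimax value."""
    values = [minimax(child, False) for child in root]
    return values.index(max(values))


# MAX moves at the root; each inner list is a MIN node.
tree = [[1, 0], [-1, 1], [0, 0]]
print(minimax(tree, True), best_move(tree))
```

Only the index of the first move is returned, since the rest of the path is not needed.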

Property | Value
--- | ---
Complete | Yes, if the tree is finite (although it can be huge)
Optimal | Yes, against an opponent that plays at its best
Time complexity | $\mathcal{O}(b^m)$ for branching factor $b$ and maximum depth $m$; pruning is needed
Space complexity | $\mathcal{O}(bm)$
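
To get a feeling for the exponential growth, the short sketch below counts the nodes of a complete tree, reading $b$ as the branching factor and $m$ as the depth. The helper name `tree_size` and the sample values are illustrative assumptions, not figures for any specific game.

```python
def tree_size(b, m):
    """Number of nodes in a complete tree with branching factor b and depth m."""
    return sum(b ** d for d in range(m + 1))


for depth in (2, 4, 8):
    print(depth, tree_size(10, depth))
```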

If we have to develop the entire tree, the procedure is very inefficient.
A solution is to look ahead only a few levels and assess the configuration of non-terminal nodes, using an evaluation function $e(n)$ that estimates the quality of a node:
* $e(n)=+1$ if MAX wins;
* $e(n)=-1$ if MIN wins;
* $e(n)=0$ if both players have the same probability of winning.

Intermediate values of $e(n)$ can also be used.

For example, in chess we can add up the values of the pieces of each player and normalize the result. More generally, the evaluation is a weighted sum of features $f_i$ of the state $s$: $\text{Eval}(s)= w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$.
There is a trade-off between the effort spent on search and on the evaluation function, but in any case the heuristic should capture knowledge of the game (from domain experts or learned from data).
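
A possible material-only instance of this weighted sum is sketched below. The piece values (1, 3, 3, 5, 9) are the conventional ones and are used here only as illustrative weights $w_i$; the string encoding of the pieces and the name `material_eval` are assumptions for this example.

```python
# Conventional piece values, used here only as illustrative weights w_i.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}
# Material of a full starting set without the king: 8 + 6 + 6 + 10 + 9 = 39.
START_MATERIAL = 8 * 1 + 2 * 3 + 2 * 3 + 2 * 5 + 9


def material_eval(max_pieces, min_pieces):
    """Weighted material difference, scaled and clamped to [-1, 1].

    Each argument is a string of piece letters (hypothetical encoding),
    e.g. "QRRPPP" for a queen, two rooks and three pawns.
    """
    score = 0
    for piece, weight in PIECE_VALUES.items():
        # f_i(s): difference in the number of pieces of type i
        score += weight * (max_pieces.count(piece) - min_pieces.count(piece))
    return max(-1.0, min(1.0, score / START_MATERIAL))


print(material_eval("QRP", "RP"))   # MAX is a queen up
```

A positive result favours MAX, a negative one favours MIN, and 0 means equal material.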

>To evaluate a node $n$ in a game tree:
1. Put $n$ in $L$ (a list of open nodes);
1. Let $x$ be the first node in $L$;
1. If $x=n$ and there's a value assigned to it, then return the value.
1. Otherwise, if $x$ has an assigned value $V_x$, let $p$ be the father of $x$ and $V_p$ its provisional value;
1. If $p$ is a MIN node then $V_p=\min(V_p,V_x)$, otherwise $p$ is a MAX node and $V_p=\max(V_p,V_x)$;
1. Remove $x$ from $L$ and return to step 2.
1. If $x$ has no value assigned and is a leaf node, **or you decide not to expand the tree further, assign its value using the evaluation function: $V_x=e(x)$**. Then keep the node in $L$, because its ancestors still need to be updated, and return to step 2.
1. If $x$ has no value assigned and isn't a terminal node, assign $V_x=-\infty$ if it's a MAX node or $V_x=+\infty$ if it's a MIN node, then add its children to the front of $L$ and return to step 2.
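
The modified procedure can be sketched recursively with a fixed depth limit. The callbacks `children` and `evaluate`, the toy tree and its estimates are all hypothetical and only serve to show that looking one level deeper can change the result.

```python
def depth_limited_minimax(state, depth, is_max, children, evaluate):
    """Minimax that stops at `depth` and falls back on the evaluation function.

    `children(state)` returns the successor states (empty for a final position)
    and `evaluate(state)` plays the role of e(n); both are supplied by the caller.
    """
    successors = children(state)
    if depth == 0 or not successors:
        return evaluate(state)
    values = [
        depth_limited_minimax(s, depth - 1, not is_max, children, evaluate)
        for s in successors
    ]
    return max(values) if is_max else min(values)


# Toy tree and made-up estimates, for illustration only.
TREE = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g"]}
ESTIMATE = {"a": 0.0, "b": 0.2, "c": -0.1, "d": 0.5, "e": -0.3, "f": 0.4, "g": 0.9}


def children(state):
    return TREE.get(state, [])


def evaluate(state):
    return ESTIMATE[state]


print(depth_limited_minimax("a", 1, True, children, evaluate))  # one level ahead
print(depth_limited_minimax("a", 2, True, children, evaluate))  # two levels ahead
```

With depth 1 the estimate of `b` looks best, while with depth 2 the value backed up through `c` is higher, so the preferred move changes.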

*But how do we decide whether to expand a node or not?* The simplest solution, from a computational point of view, is to always expand up to a fixed depth.
However, tactically complicated positions, where $e(n)$ has a higher variance, should be searched more deeply, until quiescence is reached, that is, until the value of $e(n)$ changes more slowly.
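
One way to express such a rule is a cutoff test that stops at the nominal depth only when the position is quiet. This is a sketch under assumptions: the `is_quiet` flag (how quiescence is detected is game-specific) and the `hard_limit` that bounds the extra search are hypothetical parameters, not part of the procedure above.

```python
def cutoff(depth, depth_limit, hard_limit, is_quiet):
    """Decide whether to stop expanding and call e(n).

    Stop at `depth_limit` only if the position is quiet; a non-quiet
    position is searched further, but never beyond `hard_limit`.
    """
    if depth >= hard_limit:
        return True
    return depth >= depth_limit and is_quiet


print(cutoff(4, 4, 8, is_quiet=True))    # quiet at the nominal depth: stop
print(cutoff(4, 4, 8, is_quiet=False))   # still unstable: keep expanding
```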

**Horizon effect**: suppose a chess program looks exactly 4 moves ahead, so it cannot see anything that happens from move 5 onwards. If a bad outcome is approaching, the program will sometimes make poor moves just to push that outcome beyond move 4 (the “horizon”); since it can no longer see the outcome, it believes it has avoided it.

**Solution**: sometimes it pays to run a secondary search, focused on the move chosen as best.




# Alpha-beta cuts
The depth-limited search seen so far consists in the computer playing all possible matches up to a certain depth, evaluating the leaves and propagating the values back.
In this way it also considers moves and nodes that will never occur.
We can try to reduce the search space further.

Consider a node $N$ in the tree. If the player has a better choice $M$ at the parent node or at any point further up the path, $N$ will never be selected, and so we can **eliminate it**.
* ALPHA - the value of the best choice found so far on the path for MAX (the highest);
* BETA - the value of the best choice found so far on the path for MIN (the lowest).

The algorithm updates ALPHA and BETA during the search and cuts a branch as soon as its choice is known to be worse.

# The principle
We generate the search tree depth-first, left to right, and propagate the estimated values back from the leaves:
* The temporary values in MAX nodes are ALPHA values;
* The temporary values in MIN nodes are BETA values;
* If an ALPHA value is greater than or equal to the BETA value of a descendant node, stop generating the children of that descendant (MIN) node;
* If a BETA value is smaller than or equal to the ALPHA value of a descendant node, stop generating the children of that descendant (MAX) node.

# The algorithm
>To evaluate a node $n$ in a game tree:
1. Put $n$ in $L$ (a list of open nodes);
1. Let $x$ be the first node in $L$;
1. If $x=n$ and there's a value assigned to it, then return the value.
1. Otherwise, if $x$ has an assigned value $V_x$, let $p$ be the father of $x$. If $x$ has no assigned value, jump to step 6;
    * We need to determine whether $p$ and its children can be removed. If $p$ is a MIN node, then ALPHA is the maximum of all current values assigned to the brothers of $p$ and to the nodes that are ancestors of $p$;
    * If there are no such values, then $\text{ALPHA}=-\infty$;
    * If $V_x \le \text{ALPHA}$, then remove $p$ and all its descendants from $L$;
    * Dually, if $p$ is a MAX node, BETA is the minimum of those values ($+\infty$ if there are none), and $p$ and all its descendants are removed from $L$ if $V_x \ge \text{BETA}$;
1. If $p$ cannot be eliminated, let $V_p$ be its current value. If $p$ is a MIN node then $V_p=\min(V_p,V_x)$, otherwise $p$ is a MAX node and $V_p=\max(V_p,V_x)$. Remove $x$ from $L$ and return to step 2;
1. If $x$ has no value assigned and is a terminal node, or we decide not to expand the tree further, assign its value using the evaluation function: $V_x=e(x)$. Leave $x$ in $L$, because its ancestors still need to be updated, and return to step 2;
1. If $x$ has no value assigned and isn't a terminal node, assign $V_x=-\infty$ if it's a MAX node or $V_x=+\infty$ if it's a MIN node, then add its children to the front of $L$ and return to step 2.
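
The sketch below is a recursive variant of these steps, not the list-based procedure itself: it combines the cuts with a depth limit and an evaluation function, and records which positions are actually evaluated. The callbacks `children` and `evaluate`, the `visited` list and the toy data are hypothetical names introduced for illustration.

```python
import math


def alphabeta_limited(state, depth, is_max, children, evaluate,
                      alpha=-math.inf, beta=math.inf, visited=None):
    """Depth-limited alpha-beta; appends every evaluated state to `visited`."""
    successors = children(state)
    if depth == 0 or not successors:
        if visited is not None:
            visited.append(state)
        return evaluate(state)
    value = -math.inf if is_max else math.inf
    for s in successors:
        child = alphabeta_limited(s, depth - 1, not is_max, children,
                                  evaluate, alpha, beta, visited)
        if is_max:
            value = max(value, child)
            alpha = max(alpha, value)
        else:
            value = min(value, child)
            beta = min(beta, value)
        if alpha >= beta:          # remaining children cannot change the result
            break
    return value


# Toy tree and made-up scores, for illustration only.
TREE = {"a": ["b", "c", "d"], "b": ["b1", "b2", "b3"],
        "c": ["c1", "c2", "c3"], "d": ["d1", "d2", "d3"]}
SCORE = {"b1": 3, "b2": 12, "b3": 8, "c1": 2, "c2": 4, "c3": 6,
         "d1": 14, "d2": 5, "d3": 2}


def children(state):
    return TREE.get(state, [])


def evaluate(state):
    return SCORE[state]


visited = []
print(alphabeta_limited("a", 2, True, children, evaluate, visited=visited))
print(visited)
```

In this toy tree the positions `c2` and `c3` never appear in `visited`: after `c1` the MIN node `c` is already worse for MAX than `b`, so the rest of its children are cut.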