# Board Games

The current situation is represented by a position from a finite set of positions. Players choose from a finite set of moves, and the effects of moves are deterministic. The game ends when a terminal position is reached, which happens after a finite number of steps. Terminal positions yield a utility. There is no randomness and no hidden information.

For now we consider two players, called MAX and MIN. Both observe the entire position. In each non-terminal position, it is the turn of exactly one player. The utility for MIN is the negation of the utility for MAX (zero-sum), so MAX aims to maximize the utility and MIN aims to minimize it.

## Classification
- Static
- Deterministic
- Fully observable
- Discrete
- Multi agent (adversarial)
- Problem specific

The objective of the agent is to
- Compute a strategy
- That determines which move to execute
- In the current position or in any reachable position

The performance measure is then to maximize the utility.

## Definition

A game is a 7-tuple $\mathcal{S} = \langle S, A, T, s_I, S_G, utility, player \rangle$ with
- Finite set of positions $S$
- Finite set of moves $A$
- Deterministic transition relation $T \subseteq S \times A \times S$
- Initial position $s_I \in S$
- Set of terminal positions $S_G \subseteq S$
- Utility function $utility: S_G \rightarrow \mathbb{R}$
- Player function $player: S \setminus S_G \rightarrow \{MAX, MIN\}$
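
One way to make the 7-tuple concrete is to store every component explicitly. The following Python sketch checks the conditions of the definition, including that $T$ is deterministic. The class name `Game`, its field names and the example game are illustrative assumptions, not part of the definition. It does not check that every play terminates.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

MAX, MIN = "MAX", "MIN"


@dataclass
class Game:
    """Explicit encoding of the 7-tuple <S, A, T, s_I, S_G, utility, player>."""
    positions: FrozenSet[str]                       # S
    moves: FrozenSet[str]                           # A
    transitions: FrozenSet[Tuple[str, str, str]]    # T, a subset of S x A x S
    initial: str                                    # s_I
    terminal: FrozenSet[str]                        # S_G
    utility: Dict[str, float]                       # utility: S_G -> R
    player: Dict[str, str]                          # player: S \ S_G -> {MAX, MIN}

    def validate(self) -> None:
        """Check the conditions of the definition (termination is not checked)."""
        assert self.initial in self.positions
        assert self.terminal <= self.positions
        seen: Set[Tuple[str, str]] = set()
        for s, a, s2 in self.transitions:
            assert s in self.positions and s2 in self.positions and a in self.moves
            # Determinism: a move has at most one successor in a given position.
            assert (s, a) not in seen, f"move {a} is non-deterministic in {s}"
            seen.add((s, a))
        assert set(self.utility) == set(self.terminal)
        assert set(self.player) == set(self.positions - self.terminal)
        assert set(self.player.values()) <= {MAX, MIN}

    def succ(self, s: str) -> List[Tuple[str, str]]:
        """Return the (move, successor) pairs applicable in s, sorted by move."""
        return sorted((a, s2) for (p, a, s2) in self.transitions if p == s)


# Hypothetical two-ply example: MAX moves in s0, MIN moves in a and b.
tiny = Game(
    positions=frozenset({"s0", "a", "b", "t1", "t2", "t3", "t4"}),
    moves=frozenset({"L", "R"}),
    transitions=frozenset({
        ("s0", "L", "a"), ("s0", "R", "b"),
        ("a", "L", "t1"), ("a", "R", "t2"),
        ("b", "L", "t3"), ("b", "R", "t4"),
    }),
    initial="s0",
    terminal=frozenset({"t1", "t2", "t3", "t4"}),
    utility={"t1": 3, "t2": 12, "t3": 2, "t4": 8},
    player={"s0": MAX, "a": MIN, "b": MIN},
)
tiny.validate()
print(tiny.succ("s0"))  # [('L', 'a'), ('R', 'b')]
```

Listing $T$ as explicit triples only works for tiny games. Real board games generate the successors of a position from the rules on demand.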

## Strategies

Let $\mathcal{S} = \langle S, A, T, s_I, S_G, utility, player \rangle$ be a game and let $S_{MAX} = \{s \in S \setminus S_G \mid player(s) = MAX\}$. A partial strategy for player MAX is a function

$$
\pi: S'_{MAX} \rightarrow A
$$

where $S'_{MAX} \subseteq S_{MAX}$, and $\pi(s) = a$ implies that $a$ is applicable in $s$, i.e., $\langle s, a, s' \rangle \in T$ for some $s'$. If $S'_{MAX} = S_{MAX}$, then $\pi$ is called a total strategy (or simply a strategy).
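
The definition can be checked mechanically. This sketch assumes a hypothetical toy game given as successor lists. The names `SUCC`, `PLAYER` and the positions are illustrative. A strategy is represented as a dictionary from MAX positions to moves.

```python
from typing import Dict, List, Tuple

MAX, MIN = "MAX", "MIN"

# Hypothetical toy game: (move, successor) lists and player to move for the
# non-terminal positions; t1..t5 are terminal.
SUCC: Dict[str, List[Tuple[str, str]]] = {
    "s0": [("L", "a"), ("R", "b")],
    "a": [("L", "c"), ("R", "t1")],
    "b": [("L", "t2"), ("R", "t3")],
    "c": [("L", "t4"), ("R", "t5")],
}
PLAYER = {"s0": MAX, "a": MIN, "b": MIN, "c": MAX}
S_MAX = {s for s, who in PLAYER.items() if who == MAX}


def applicable(s: str, a: str) -> bool:
    """A move a is applicable in s if some transition <s, a, s'> exists."""
    return any(move == a for move, _ in SUCC.get(s, []))


def is_partial_strategy(pi: Dict[str, str]) -> bool:
    """pi is defined on a subset of S_MAX and only prescribes applicable moves."""
    return set(pi) <= S_MAX and all(applicable(s, a) for s, a in pi.items())


def is_total_strategy(pi: Dict[str, str]) -> bool:
    """A total strategy is a partial strategy defined on all of S_MAX."""
    return is_partial_strategy(pi) and set(pi) == S_MAX


print(is_partial_strategy({"s0": "L"}), is_total_strategy({"s0": "L"}))  # True False
print(is_total_strategy({"s0": "L", "c": "R"}))                          # True
print(is_partial_strategy({"a": "L"}))    # False: a is a MIN position
print(is_partial_strategy({"s0": "X"}))   # False: X is not applicable in s0
```

Position `c` is reached only if MAX plays L in `s0` and MIN replies L. A total strategy must nevertheless prescribe a move there.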

We consider approaches that must be tailored to a specific board game for good performance, e.g., by using a suitable evaluation function.

## Algorithms

A good algorithm for board games has the following properties:
- Look ahead as far as possible (deep search)
- Consider only interesting parts of the game tree (selective search, analogous to heuristic search algorithms)
- Evaluate the current position as accurately as possible (evaluation function, analogous to heuristics)

## Minimax Search

Idea
- DFS in game tree
- Determine utility value of terminal position with utility function
- Strategy: action that maximizes utility value (minimax decision)
- Compute utility values of inner nodes bottom-up through the tree:
    - MIN's turn: utility value is minimum of utility values of children
    - MAX's turn: utility value is maximum of utility values of children

function minimax(p)

if p is terminal position:

- return $\langle utility(p), none \rangle$

best_move = none

if player(p) = MAX:

- v = $-\infty$

else:

- v = $\infty$

for each $\langle move, p' \rangle \in succ(p)$:
- $\langle v', \_ \rangle = minimax(p')$ (the child's best move is not needed)
- if (player(p) = MAX and $v'>v$) or (player(p) = MIN and $v' < v$):
    - $v = v'$
    - best_move = move

return $\langle v, best\_move \rangle$
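
The pseudocode translates almost line by line into Python. This sketch assumes a hypothetical explicit game tree stored in dictionaries. `SUCC`, `PLAYER`, `UTILITY` and the leaf values are made up for illustration. A real game would generate successors from its rules instead.

```python
import math
from typing import Dict, List, Optional, Tuple

MAX, MIN = "MAX", "MIN"

# Hypothetical two-ply tree: MAX moves at "root", MIN moves at "a", "b", "c";
# leaves carry utilities for MAX.
SUCC: Dict[str, List[Tuple[str, str]]] = {
    "root": [("a1", "a"), ("a2", "b"), ("a3", "c")],
    "a": [("b1", "a_1"), ("b2", "a_2"), ("b3", "a_3")],
    "b": [("c1", "b_1"), ("c2", "b_2"), ("c3", "b_3")],
    "c": [("d1", "c_1"), ("d2", "c_2"), ("d3", "c_3")],
}
PLAYER = {"root": MAX, "a": MIN, "b": MIN, "c": MIN}
UTILITY = {"a_1": 3, "a_2": 12, "a_3": 8, "b_1": 2, "b_2": 4, "b_3": 6,
           "c_1": 14, "c_2": 5, "c_3": 2}


def minimax(p: str) -> Tuple[float, Optional[str]]:
    """Return (minimax value of p, best move in p) by depth-first search."""
    if p in UTILITY:                       # terminal position
        return UTILITY[p], None
    best_move = None
    v = -math.inf if PLAYER[p] == MAX else math.inf
    for move, child in SUCC[p]:
        v_child, _ = minimax(child)        # the child's own best move is ignored
        if (PLAYER[p] == MAX and v_child > v) or (PLAYER[p] == MIN and v_child < v):
            v, best_move = v_child, move
    return v, best_move


print(minimax("root"))  # (3, 'a1')
```

Because the comparison is strict, ties are broken in favor of the first best move in successor order.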

Minimax is the simplest search algorithm for games. It yields an optimal strategy in the game-theoretic sense, i.e., under the assumption that the opponent plays perfectly.
MAX obtains at least the utility value computed for the root, no matter how MIN plays.
If MIN plays perfectly, MAX obtains exactly the computed value.
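
These guarantees can be checked by brute force on a small example. This sketch uses a hypothetical two-ply tree. `SUCC`, `PLAYER` and `UTILITY` are illustrative names, not from the source. It fixes MAX's minimax strategy and plays it against every possible MIN strategy. No outcome falls below the root value, and MIN's best reply achieves it exactly.

```python
import itertools

MAX, MIN = "MAX", "MIN"

# Hypothetical two-ply tree: MAX moves at "root", MIN moves at "a", "b", "c".
SUCC = {
    "root": [("a1", "a"), ("a2", "b"), ("a3", "c")],
    "a": [("b1", "a_1"), ("b2", "a_2"), ("b3", "a_3")],
    "b": [("c1", "b_1"), ("c2", "b_2"), ("c3", "b_3")],
    "c": [("d1", "c_1"), ("d2", "c_2"), ("d3", "c_3")],
}
PLAYER = {"root": MAX, "a": MIN, "b": MIN, "c": MIN}
UTILITY = {"a_1": 3, "a_2": 12, "a_3": 8, "b_1": 2, "b_2": 4, "b_3": 6,
           "c_1": 14, "c_2": 5, "c_3": 2}


def minimax_value(p):
    """Minimax value of p (utility for MAX under perfect play by both sides)."""
    if p in UTILITY:
        return UTILITY[p]
    values = [minimax_value(child) for _, child in SUCC[p]]
    return max(values) if PLAYER[p] == MAX else min(values)


# MAX's minimax strategy: in every MAX position, a move leading to a child of maximal value.
max_strategy = {p: max(SUCC[p], key=lambda mc: minimax_value(mc[1]))[0]
                for p, who in PLAYER.items() if who == MAX}


def play(max_pi, min_pi, p="root"):
    """Follow both strategies from p to a terminal position and return its utility."""
    while p not in UTILITY:
        move = (max_pi if PLAYER[p] == MAX else min_pi)[p]
        p = dict(SUCC[p])[move]
    return UTILITY[p]


# Enumerate every total strategy for MIN (one move per MIN position).
min_positions = [p for p, who in PLAYER.items() if who == MIN]
outcomes = [play(max_strategy, dict(zip(min_positions, (mc[0] for mc in choice))))
            for choice in itertools.product(*(SUCC[p] for p in min_positions))]

root_value = minimax_value("root")
print(root_value, min(outcomes), max(outcomes))  # 3 3 12
```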

If the game tree is too large for minimax, alpha-beta search is an alternative: it computes the same minimax value but prunes subtrees that cannot influence the decision.

## Evaluation Functions

Let $\mathcal{S}$ be a game with set of positions $S$. An evaluation function for $\mathcal{S}$ is a function

$$
h:S \rightarrow \mathbb{R}
$$

which assigns a real number to each position $s \in S$.

Because the game tree is usually too large to search completely, we search only up to a predefined depth. When this depth is reached, we estimate the utility value according to heuristic criteria, i.e., with the evaluation function. High values should correspond to high "winning chances" for MAX. At the same time, the evaluation function should be efficiently computable so that the search can go deep.
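
A depth-limited version of minimax needs only one extra case: when the depth budget is used up at a non-terminal position, return $h(s)$ instead of recursing. This sketch assumes a toy subtraction game that is not part of the source. In that game there is a pile of tokens, each move removes 1, 2 or 3 of them, and whoever takes the last token wins, so the utility for MAX is +1 or -1. The evaluation function `h` is hypothetical. It uses the fact that a pile that is a multiple of 4 is lost for the player to move, and it is scaled to ±0.5 so that it never outweighs a real win or loss.

```python
import math
from typing import List, Optional, Tuple

MAX, MIN = "MAX", "MIN"
Position = Tuple[int, str]  # (tokens left, player to move)


def succ(s: Position) -> List[Tuple[int, Position]]:
    tokens, to_move = s
    nxt = MIN if to_move == MAX else MAX
    return [(k, (tokens - k, nxt)) for k in (1, 2, 3) if k <= tokens]


def is_terminal(s: Position) -> bool:
    return s[0] == 0


def utility(s: Position) -> float:
    # Whoever took the last token won, so the player to move has lost.
    return -1.0 if s[1] == MAX else 1.0


def h(s: Position) -> float:
    """Hypothetical evaluation: a multiple of 4 is bad for the player to move."""
    good_for_mover = s[0] % 4 != 0
    mover_sign = 1.0 if s[1] == MAX else -1.0
    return 0.5 * mover_sign * (1.0 if good_for_mover else -1.0)


def depth_limited_minimax(s: Position, depth: int) -> Tuple[float, Optional[int]]:
    """Minimax that evaluates non-terminal positions with h at the depth cutoff."""
    if is_terminal(s):
        return utility(s), None
    if depth == 0:                         # cutoff: estimate instead of searching on
        return h(s), None
    maximizing = s[1] == MAX
    best_v, best_move = (-math.inf if maximizing else math.inf), None
    for move, child in succ(s):
        v, _ = depth_limited_minimax(child, depth - 1)
        if (maximizing and v > best_v) or (not maximizing and v < best_v):
            best_v, best_move = v, move
    return best_v, best_move


print(depth_limited_minimax((10, MAX), 2))  # (0.5, 2): taking 2 leaves 8 tokens
```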

### Linear

Expert knowledge is often represented with weighted linear functions:

$$
h(s) = w_0 + w_1 f_1(s) + ... + w_n f_n(s)
$$

where $w_i$ are weights and $f_i$ are features.

This assumes that the contributions of the features are mutually independent. Features are usually provided by human experts. Weights are provided by human experts or learned automatically.
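
A weighted linear evaluation function can be assembled from a list of feature functions and a weight vector. This sketch assumes a hypothetical position representation that only records piece counts, with uppercase letters for MAX's pieces and lowercase letters for MIN's, as in chess notation. It uses material-difference features with the values 1, 3, 3, 5, 9 that are often quoted for pawn, knight, bishop, rook and queen.

```python
from typing import Callable, Dict, Sequence

Position = Dict[str, int]  # hypothetical: piece counts, e.g. {"P": 8, "p": 7}
Feature = Callable[[Position], float]


def material_feature(piece: str) -> Feature:
    """f_i(s) = (# of MAX's pieces of this kind) - (# of MIN's pieces of this kind)."""
    def f(s: Position) -> float:
        return s.get(piece.upper(), 0) - s.get(piece.lower(), 0)
    return f


def linear_eval(w0: float, weights: Sequence[float],
                features: Sequence[Feature]) -> Callable[[Position], float]:
    """Build h(s) = w0 + w1*f1(s) + ... + wn*fn(s)."""
    assert len(weights) == len(features)

    def h(s: Position) -> float:
        return w0 + sum(w * f(s) for w, f in zip(weights, features))
    return h


features = [material_feature(p) for p in "PNBRQ"]
h = linear_eval(0.0, [1, 3, 3, 5, 9], features)

# MAX is a pawn and a knight up but has lost its queen: 1 + 3 - 9 = -5.
s = {"P": 8, "p": 7, "N": 2, "n": 1, "B": 2, "b": 2, "R": 2, "r": 2, "Q": 0, "q": 1}
print(h(s))  # -5.0
```

Because each term is added separately, an interaction between features can be captured only by adding a dedicated feature for it. This is exactly the independence assumption stated above.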

Alternative: Evaluation function based on neural networks
- Value network takes position features as input (usually provided by human experts)
- And outputs utility value prediction
- Weights of network learned automatically

## Iterative Deepening

- Objective: Search as deeply as possible within a given time
- Problem: Search time is difficult to predict
- Solution: Iterative deepening
    - Sequence of searches of increasing depth
    - Time expires: Return result of previously finished search
    - Overhead acceptable, since the deepest search dominates the total cost
- Refinement: Search deeper in turbulent positions -> quiescence search
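
Iterative deepening with a time budget can be sketched as follows. The example assumes a toy subtraction game that is not from the source: each move removes 1, 2 or 3 tokens and whoever takes the last token wins. It also assumes a placeholder evaluation function `h` that always returns 0. The names `Timeout`, `search` and `iterative_deepening` are hypothetical. The search checks the clock at every node and aborts with an exception, so a partially finished iteration is simply discarded.

```python
import math
import time
from typing import List, Optional, Tuple

MAX, MIN = "MAX", "MIN"
Position = Tuple[int, str]  # (tokens left, player to move) in a toy subtraction game


class Timeout(Exception):
    """Raised when the time budget is used up in the middle of a search."""


def succ(s: Position) -> List[Tuple[int, Position]]:
    tokens, to_move = s
    nxt = MIN if to_move == MAX else MAX
    return [(k, (tokens - k, nxt)) for k in (1, 2, 3) if k <= tokens]


def utility(s: Position) -> float:
    # Whoever took the last token won, so the player to move has lost.
    return -1.0 if s[1] == MAX else 1.0


def h(s: Position) -> float:
    return 0.0  # placeholder evaluation function: no knowledge at all


def search(s: Position, depth: int, deadline: float) -> Tuple[float, Optional[int]]:
    """Depth-limited minimax that raises Timeout once the deadline has passed."""
    if time.monotonic() >= deadline:
        raise Timeout
    if s[0] == 0:
        return utility(s), None
    if depth == 0:
        return h(s), None
    maximizing = s[1] == MAX
    best_v, best_move = (-math.inf if maximizing else math.inf), None
    for move, child in succ(s):
        v, _ = search(child, depth - 1, deadline)
        if (maximizing and v > best_v) or (not maximizing and v < best_v):
            best_v, best_move = v, move
    return best_v, best_move


def iterative_deepening(s: Position, budget: float, max_depth: int = 20):
    """Search depth 1, 2, ... until time runs out; return the last finished result."""
    deadline = time.monotonic() + budget
    result: Tuple[float, Optional[int]] = (h(s), None)  # used if depth 1 never finishes
    finished = 0
    for depth in range(1, max_depth + 1):
        try:
            result = search(s, depth, deadline)
        except Timeout:
            break  # discard the unfinished search
        finished = depth
    return result, finished


print(iterative_deepening((10, MAX), budget=5.0, max_depth=10))  # ((1.0, 2), 10)
```

Repeating the shallow searches is cheap. With branching factor $b$, all iterations before the last one together visit roughly a $1/(b-1)$ fraction of the nodes that the last one visits. A practical version would also stop early once a search reaches only terminal positions, because deeper iterations can no longer change the result. Here `max_depth` caps the loop instead.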