In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from GomokuTools import N_9x9
from IPython.display import HTML
from tools import python_to_html

# Heuristics: Learning from Humans
AlphaZero started from scratch and explored the world of Go with no prior knowledge other than the rules. Typically, though, some human knowledge can jump-start the learning process. That's why we give our agent a head start with some admittedly not-so-rigorous (but still quite smart) heuristics. We'll make sure, though, that the agent is able to extend its understanding beyond its initial knowledge and eventually even abandon the heuristics in favour of what it has learned by itself.

---
## Heuristic Score

In [3]:
from HeuristicScore import HeuristicScore

The heuristic score tries to naively measure the *value* of a position, i.e. the importance with which one should consider putting a stone onto it.

```HeuristicScore``` uses a 2-byte (2 x 8 bit) representation of a line of 9 positions: one byte per color, with the center position itself left out. For example:
```
[[0, 0, 1, 0,      0, 0, 0, 0],
 [0, 0, 0, 0,      0, 0, 1, 1]]
```
means there is a black stone in one direction (say: left) with distance 2 and there are 2 white stones in the opposite direction (say: right) with distances 3 and 4. The line could as well be represented by a string like:
```
- - x - * - - o o 
```
The actual direction of the viewpoint doesn't matter here. What matters is that a stone on the position marked by ```*``` would create a wide-split-3 threat in favor of white.
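
As a rough illustration of this representation, here is a minimal, self-contained sketch that converts between a list of 8 cell values, the two bit rows, and the string form. The helper names `split_colors` and `render` are hypothetical and not part of `GomokuTools`; the encoding 0 = empty, 1 = black, 2 = white is assumed from the examples in this notebook.

```python
def split_colors(line):
    """Turn 8 cell values (0 empty, 1 black, 2 white) into two 8-bit rows."""
    black = [1 if v == 1 else 0 for v in line]
    white = [1 if v == 2 else 0 for v in line]
    return [black, white]


def render(bits):
    """Draw the two bit rows as a 9-character line with '*' as the center."""
    black, white = bits
    cells = ["x" if b else "o" if w else "-" for b, w in zip(black, white)]
    # The center field is not stored in the 8 bits; it sits between index 3 and 4.
    return " ".join(cells[:4] + ["*"] + cells[4:])


example = [[0, 0, 1, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 1, 1]]
print(render(example))  # - - x - * - - o o
```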

In [4]:
# Uncomment to view the code
# HTML(python_to_html("HeuristicScore.py"))

Let's create a particular line with an obvious threat and see what our heuristic score function says about it:

In [5]:
h = HeuristicScore(kappa0=1.6, kappa1=5)

See [TuningKappa.ipynb](TuningKappa.ipynb) to play with other values for $\kappa_i$. I just tried a bit and ended up with the above.

In [6]:
def to_bits(line):
    n = N_9x9().setline('e', line)
    return n.as_bits()[0]
               
to_bits([ 0, 1, 1, 0, 1, 0, 0, 2])

[[0, 1, 1, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1]]

Hint: Check out [GomokuTools.ipynb](GomokuTools.ipynb) to understand what N_9x9 is all about.


To estimate the *value* of the center position, we need to look at the largest free sub-line.

---
#### Adversary-free range

In [7]:
h.f_range?

Signature: h.f_range(line, fof=0, edge=None)
Docstring:
The largest adversary-free range within a given line

Args:
    line: 8x2 integer array that represents the stones
    fof:  friend or foe? 0 to look at black, 1 to consider white
File:      ~/workspace/tutorials/other_stuff/DeepGomoku/HeuristicScore.py
Type:      method


In [8]:
h.f_range(to_bits([ 0, 1, 1, 0, 1, 0, 2, 2]), fof=0)

array([0, 1, 1, 0, 1, 0])
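
To make the idea concrete, here is a small stand-alone sketch that reproduces the output above. It is an independent re-implementation under assumptions, not the code in `HeuristicScore.py`: the name `adversary_free_range` is hypothetical, the `edge` argument is ignored, and a plain list is returned instead of a NumPy array.

```python
def adversary_free_range(bits, fof=0):
    """Own stones in the contiguous stretch around the center that holds no adversary stone."""
    own, other = bits[fof], bits[1 - fof]
    lo = 4  # inclusive start; the (unstored) center sits between index 3 and 4
    while lo > 0 and not other[lo - 1]:
        lo -= 1  # walk left until an adversary stone blocks the way
    hi = 4  # exclusive end
    while hi < 8 and not other[hi]:
        hi += 1  # walk right until an adversary stone blocks the way
    return own[lo:hi]


bits = [[0, 1, 1, 0, 1, 0, 0, 0],   # black
        [0, 0, 0, 0, 0, 0, 1, 1]]   # white
print(adversary_free_range(bits, fof=0))  # [0, 1, 1, 0, 1, 0]
```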

---
#### Scoring a single line
First, find the maximum number of my stones in any group of 5 adjacent positions (including the empty center field), then count how many such groups reach that maximum:

In [9]:
h.cscore(to_bits([ 0, 1, 1, 0, 1, 0, 0, 2]), fof=0)

(3, 2)

An adversary on one end of the line reduces the value:

In [10]:
print(h.score(to_bits([ 0, 1, 1, 0, 1, 0, 0, 2]), fof=0))
print(h.score(to_bits([ 2, 1, 1, 0, 1, 0, 0, 2]), fof=0))

3.4460950649911055
3.0


The position amongst the three is better than the one on the edge:

In [11]:
print(h.score(to_bits([ 0, 1, 1, 0, 1, 0, 0, 2]), fof=0))
print(h.score(to_bits([ 1, 1, 0, 1, 0, 0, 2, 0]), fof=0))

3.4460950649911055
3.0


A score of $3$ is considered a serious threat that could participate in a sure-win threat sequence.
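
The values printed in this section are consistent with a simple way of folding the pair returned by `cscore` into one number: with a maximum of $n$ stones reached by $c$ groups, $(c \cdot n^{\kappa_1})^{1/\kappa_1}$ gives about $3.446$ for $(3, 2)$ and $3$ for a single group of three. This is an inference from the outputs shown here, not a reading of `HeuristicScore.py`, and the function name `combine` is hypothetical.

```python
def combine(n, c, kappa1=5):
    """Fold 'best stone count n, reached by c groups' into a single number."""
    return (c * n ** kappa1) ** (1 / kappa1)


print(combine(3, 2))  # roughly 3.446, compare the first value printed above
print(combine(3, 1))  # roughly 3.0
```

With $\kappa_1 = 5$ the extra groups add comparatively little: the result is dominated by $n$, and $c$ only nudges it upwards.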

---
### Scoring crossing lines

The main idea of the implementation is that the value of the field is some kind of sum of the values of the intersecting threat lines. Since the ideas implemented here lack any rigor, it's even hard to explain. I just felt that some kind of Pythagorean sum with an exponent $\kappa_0$ other than $2$ (```kappa0``` in the code) would be a natural choice. We'll need to find the actual values for the hyper-parameters $\kappa_0$ and $\kappa_1$ by some form of hyperparameter tuning. That's the impurity of heuristics, folks!
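
To get a feeling for what such a generalized Pythagorean sum does, here is a minimal stand-alone sketch. The function name `kappa_sum` is hypothetical, and the input scores are illustrative values rather than outputs of `HeuristicScore`.

```python
def kappa_sum(scores, kappa0=1.6):
    """Generalized Pythagorean sum: (sum of s**kappa0) ** (1/kappa0)."""
    return sum(s ** kappa0 for s in scores) ** (1 / kappa0)


# Two crossing lines that each score 3.0, combined with different exponents.
for k in (1.0, 1.6, 2.0, 10.0):
    print(k, kappa_sum([3.0, 3.0], kappa0=k))
```

An exponent of $1$ gives the plain sum, $2$ gives the Euclidean length, and large exponents approach the maximum of the individual scores. A value like $1.6$ therefore rewards crossing threats more than the Euclidean sum does, but less than simply adding them up.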

In [12]:
import inspect
print(inspect.getsource(HeuristicScore.total_score))

    def total_score(self, lines, fof=0, edges=[None, None, None, None]):
        """
        total score of the given list of lines
        """
        scores = [self.score(line, fof, edge=edge) for line, edge in zip(lines, edges)]
        return sum(s**self.kappa0 for s in scores)**(1/self.kappa0) 



Two open-3s, neither of them too critical on its own:

In [16]:
print(h.score(to_bits([ 0, 0, 1, 1, 0, 0, 0, 2]), fof=0))
print(h.score(to_bits([ 0, 1, 1, 0, 0, 2, 0, 2]), fof=0))

2.29739670999407
2.29739670999407


But if these lines cross, things get tough.
Also, observe that the simple heuristic rule is indeed capable of estimating the effect of a defensive stone. 

In [15]:
print(h.total_score([
    to_bits([ 0, 0, 1, 0, 1, 0, 0, 2]),
    to_bits([ 1, 1, 0, 0, 0, 2, 0, 2])], fof=0))

print(h.total_score([

    #            |-- A single defensive stone            
    #            V    
    to_bits([ 0, 2, 1, 0, 1, 0, 0, 2]),
    to_bits([ 1, 1, 0, 0, 0, 2, 0, 2])], fof=0))

3.3185059187118733
3.084421650815882
