# Exercise 3: MENACE <img src="kaip_logo_header.png" align="right">

## Learning in machines

The ability to learn is not unique to the biological world, and is captured in our neural network models. However, the concept of machine learning goes against many of the commonly held beliefs about computers: that they can only do what they are programmed to do and cannot adapt to their surroundings.

Whilst it is true at the lowest level that the program controls the machine, the behaviour that results does not have to be so rigid and deterministic. Having a computer learn to respond correctly to any given input, or learn to play a game, is not a simple concept, and it is often felt that complicated programs and systems are required to achieve such behaviour.

*MENACE* (Matchbox Educable Noughts And Crosses Engine) is a machine developed by Donald Michie in the early 1960s that learns to play the game of tic-tac-toe (noughts and crosses).

<img src="tic-tac-toe.png" width="25%"/>

MENACE consists of 288 matchboxes, one for every possible distinct board position that the opening player can encounter. Each matchbox is then filled with a random selection of coloured beads, each colour representing a move to a corresponding square on the board. The game is played by selecting at random a bead from the matchbox that corresponds to the current board position, with the colour selected determining the machine's move. The first game is played with the machine moving completely at random. When the game is over, the outcome is fed back into the machine so that it can adapt its behaviour in light of the outcome. In other words, **it can learn to play better next time**.
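The bead-drawing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `new_box` and `choose_move`, the string key for a board, and the starting count of three beads per move are all hypothetical choices, not details of Michie's machine.

```python
import random

def new_box(board, beads_per_move=3):
    """Fill a matchbox: one (row, col) bead per empty square, repeated.

    beads_per_move is an assumed starting count, not taken from the text.
    """
    return [(r, c)
            for r in range(3)
            for c in range(3)
            if board[r][c] == ' '
            for _ in range(beads_per_move)]

def choose_move(boxes, board):
    """Look up (or create) the matchbox for this position and draw a bead."""
    key = ''.join(''.join(row) for row in board)  # hypothetical state key
    box = boxes.setdefault(key, new_box(board))
    return key, random.choice(box)

boxes = {}
board = [['X', ' ', ' '], [' ', 'O', ' '], ['X', ' ', ' ']]
key, move = choose_move(boxes, board)
print(move)  # a random (row, col) pair pointing at an empty square
```

Because every move starts with the same number of beads, the first draws are uniformly random, which matches the description of the first game above.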

<img src="MENACE.jpg">

This is achieved by reinforcing all the moves that were ultimately successful when the machine won, and by decreasing the chance of it making the same bad moves that led to a defeat. Learning therefore becomes adding a bead of the same colour to the boxes representing a successful series of moves, or removing a bead of the colour that led to defeat. A draw means that the number of beads remains the same. This slow process of learning continues until the probability of the machine making a good move far outweighs the chance of it making a bad one. Once it has learnt, the machine is then almost invincible, and the best that can be hoped for is to consistently draw with it.
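The update rule described above might look as follows in Python. This is a minimal sketch, assuming each matchbox is a list of beads stored in a dictionary and that the machine has recorded which bead it drew from which box; the function name `learn`, the outcome labels, and the rule of never removing the last bead from a box are assumptions made for illustration.

```python
def learn(boxes, history, outcome):
    """Adjust the beads after one game.

    boxes:   dict mapping a state key to a list of beads (moves)
    history: list of (key, move) pairs the machine played this game
    outcome: 'win', 'loss' or 'draw' from the machine's point of view
    """
    for key, move in history:
        box = boxes[key]
        if outcome == 'win':
            box.append(move)          # reinforce: one more bead of that colour
        elif outcome == 'loss':
            # Assumption: keep at least one bead so the box never runs empty.
            if move in box and len(box) > 1:
                box.remove(move)      # punish: one bead fewer of that colour
        # A draw leaves the box unchanged.

boxes = {'a': [(0, 0), (0, 1), (0, 1)], 'b': [(2, 2), (1, 1)]}
history = [('a', (0, 1)), ('b', (2, 2))]
learn(boxes, history, 'win')
print(boxes['a'].count((0, 1)), boxes['b'].count((2, 2)))  # 3 2
```

Since a move is drawn with probability proportional to its bead count, each win makes the moves in that game slightly more likely and each loss makes them slightly less likely.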



**What are the possible states in tic-tac-toe?**

**Why does MENACE require far fewer states?**

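(Much) fewer than 3^9 = 19683, the number of ways to mark nine squares as empty, X or O, because many of those combinations can never occur in a real game. Today we can afford to represent them all.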
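One way to explore these questions is to enumerate the positions directly. The sketch below assumes X always opens and that play stops as soon as someone has three in a row; the board is a tuple of nine characters, and the helper names (`winner`, `reachable_states`, `canonical`) are hypothetical. It counts the positions that can actually occur, and then how many remain once rotations and reflections of the same position are merged.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(b):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def reachable_states():
    """Collect every position reachable from the empty board."""
    start = (' ',) * 9
    seen = {start}
    stack = [start]
    while stack:
        b = stack.pop()
        if winner(b) or ' ' not in b:
            continue                          # game over: no further moves
        player = 'X' if b.count('X') == b.count('O') else 'O'
        for i in range(9):
            if b[i] == ' ':
                nb = b[:i] + (player,) + b[i + 1:]
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return seen

def canonical(b):
    """Pick one representative among the 8 rotations/reflections of b."""
    def rot(t):   # rotate the 3x3 grid by 90 degrees
        return tuple(t[(2 - c) * 3 + r] for r in range(3) for c in range(3))
    def flip(t):  # mirror the grid left to right
        return tuple(t[r * 3 + (2 - c)] for r in range(3) for c in range(3))
    variants = []
    for t in (b, flip(b)):
        for _ in range(4):
            variants.append(t)
            t = rot(t)
    return min(variants)

states = reachable_states()
distinct = {canonical(b) for b in states}
print(3 ** 9, len(states), len(distinct))  # 19683 5478 765
```

MENACE needs fewer boxes still, since it only ever faces positions in which it is its own turn to move and the game is not yet over.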

For this exercise, we can model the mechanisms of MENACE with Python code, and play tic-tac-toe using a set of pre-defined functions.

In [None]:
# Idea: a dictionary with board states as keys, and gum drops (standing in for the beads) for the possible moves.

def board2string(board): # : Board -> String
    # Render each row as |a|b|c| followed by a newline.
    s = ''  # avoid shadowing the built-in str
    for i in range(3):
        for j in range(3):
            s += '|' + board[i][j]
        s += '|\n'
    return s

def printBoard(board):  # : Board -> IO
    print(board2string(board))
    
testBoard = [['X', ' ', ' '], [' ', 'O', ' '], ['X', ' ', ' ']]

def initGumDrops(board):   # : Board -> List Pos
    gd = []
    for i in range(3):
        for j in range(3):
            if board[i][j] == ' ':
                gd += [(i, j)] * 30  # 30 gum drops for every empty square
    return gd

import random

def pickGumDrop(gs):       # : List Pos ~> Pos  Not a pure function: random, and removes the pick from gs
    g = random.choice(gs)
    gs.remove(g)
    return g

def putBackGumDrop(gs, g): # : (List Pos, Pos) -> None  Not a pure function: mutates gs
    gs.append(g)

emptyRow = [' '] * 3
# Build three independent rows; [emptyRow] * 3 would repeat one shared list.
initBoard = [list(emptyRow) for _ in range(3)]

def countOs(board):  # Board -> Nat
    n = 0
    for row in range(3):
        n += board[row].count('O')
    return n

def countXs(board):  # Board -> Nat
    n = 0
    for row in range(3):
        n += board[row].count('X')
    return n

def copyBoard(board): # Board -> Board
    newBoard = []
    for i in range(len(board)):
        newRow = []
        for j in range(len(board[i])):
            newRow.append(board[i][j])
        newBoard.append(newRow)
    return newBoard

def moveAt(board, pos): # (Board, Pos) -> Board
    newBoard = copyBoard(board)
    # X moves first, so X is to play whenever the counts are level.
    if countXs(board) <= countOs(board):
        player = 'X'
    else:
        player = 'O'
    i = pos[0]
    j = pos[1]
    newBoard[i][j] = player
    return newBoard

In [None]:
printBoard(initBoard)

In [None]:
initGumDrops(initBoard)

In [None]:
initBoard

In [None]:
printBoard(testBoard)