# Expectiminimax

Der Vollständigkeits halber der ganze Expectiminimax Algorithmus. <br>
Während 1-ply, 2-ply und 3-ply nur den ersten, die ersten beiden, bzw. ersten drei Schritte von Expectiminimax ausgeführt haben, kann man alle mit dem Expectiminmax Algorithmus zusammenfassen. Das erlaubt einem eine saubere Notation und kann (mit einem ausreichend starken Rechner und genug Geduld) eventuell noch tiefer suchen!

In [13]:
from Player import ValuePlayer

class ExpectiminimaxValuePlayer(ValuePlayer):

    # Konstruktor braucht einen Parameter für die maximal Suchtiefe
    # 0 = 1-ply, 1= 2-ply, 2 = 3-ply, usw.
    def __init__(self, player, valuefunction, max_depth):
        ValuePlayer.__init__(self, player, valuefunction)
        self.max_depth = max_depth
    
    def get_action(self, actions, game):
        # Spielstatus speichern
        old_state = game.get_state()
        # Variablen initialisieren
        best_value = -1
        best_action = None
        # Alle Züge durchsuchen
        for a in actions:
            # Zug ausführen
            game.execute_moves(a, self.player)
            # Spielstatus bewerten
            value = self.expectiminimax(game, 0)
            # Besten merken
            if value > best_value:
                best_value = value
                best_action = a
            # Spiel zurücksetzen
            game.reset_to_state(old_state)
        return best_action
        
    def expectiminimax(self, game, depth):
        # Blatt in unserem Baum
        if depth == self.max_depth:
            return self.value(game, self.player)
        else:
            # Alle möglichen Würfe betrachten
            all_rolls = [(a,b) for a in range(1,7) for b in range(a,7)]
            value = 0
            for roll in all_rolls:
                # Wahrscheinlichkeiten von jedem Wurf
                probability = 1/18 if roll[0] != roll[1] else 1/36
                state = game.get_state()
                # Min-Knoten
                if depth % 2 == 0:
                    moves = game.get_moves(roll, game.get_opponent(self.player))
                    temp_val = 1
                    for move in moves:
                        game.execute_moves(move, game.get_opponent(self.player))
                        # Bewertet wird aber aus unserer Perspektive
                        v = self.expectiminimax(game, depth + 1)
                        if v < temp_val:
                            temp_val = v
                # Max-Knoten
                else:
                    moves = game.get_moves(roll, self.player)
                    temp_val = 0
                    for move in moves:
                        game.execute_moves(move, self.player)
                        # Bewertet wird aber aus unserer Perspektive
                        v = self.expectiminimax(game, depth + 1)
                        if v > temp_val:
                            temp_val = v
                # Spiel zurücksetzen    
                game.reset_to_state(state)
                # Wert gewichtet addieren
                value += probability * temp_val
            return value
    
    def get_name(self):
        return "ExpectiminimaxValuePlayer [" + self.value.__name__ + "]"
    

class ExpectiminimaxModelPlayer(ExpectiminimaxValuePlayer):
    
    def __init__(self, player, model, depth):
        ExpectiminimaxValuePlayer.__init__(self, player, self.get_value, depth)
        self.model = model
        
    def get_value(self, game, player):
        features = game.extractFeatures(player)
        v = self.model.get_output(features)
        v = 1 - v if self.player == game.players[0] else v
        return v
    
    def get_name(self):
        return "EMinMaxModelPlayer [" + self.model.get_name() +"]"
    

In [14]:
import Player
from NeuralNetModel import TDGammonModel
import tensorflow as tf

graph = tf.Graph()
sess = tf.Session(graph=graph)
with sess.as_default(), graph.as_default():
    model = TDGammonModel(sess, restore=True)
    model.test(games = 100, enemyPlayer = ExpectiminimaxModelPlayer('white', model, 1))

Restoring checkpoint: checkpoints/TD-Gammon/checkpoint.ckpt-1593683
INFO:tensorflow:Restoring parameters from checkpoints/TD-Gammon/checkpoint.ckpt-1593683
[Game 0] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 0:1 of 1 games (0.00%)
[Game 1] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 1:1 of 2 games (50.00%)
[Game 2] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 1:2 of 3 games (33.33%)
[Game 3] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 2:2 of 4 games (50.00%)
[Game 4] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 3:2 of 5 games (60.00%)
[Game 5] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 4:2 of 6 games (66.67%)
[Game 6] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 4:3 of 7 games (57.14%)
[Game 7] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 4:4 of 8 game

[Game 73] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 27:47 of 74 games (36.49%)
[Game 74] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 27:48 of 75 games (36.00%)
[Game 75] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 27:49 of 76 games (35.53%)
[Game 76] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 27:50 of 77 games (35.06%)
[Game 77] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 27:51 of 78 games (34.62%)
[Game 78] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 27:52 of 79 games (34.18%)
[Game 79] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 28:52 of 80 games (35.00%)
[Game 80] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 29:52 of 81 games (35.80%)
[Game 81] ModelPlayer [TD-Gammon] (black) vs EMinMaxModelPlayer [TD-Gammon] (white) 29:53 of 82 games (35.37%)
[

In [15]:
import Player
import PlayerTest

players = [Player.ValuePlayer('black', Player.blocker), ExpectiminimaxValuePlayer('white', Player.blocker, 1)]
PlayerTest.test(players, 100)

Spiel 0 von 100 geht an ExpectiminimaxValuePlayer [blocker] ( white )
Spiel 1 von 100 geht an ExpectiminimaxValuePlayer [blocker] ( white )
Spiel 2 von 100 geht an ValuePlayer [blocker] ( black )
Spiel 3 von 100 geht an ValuePlayer [blocker] ( black )
Spiel 4 von 100 geht an ExpectiminimaxValuePlayer [blocker] ( white )
Spiel 5 von 100 geht an ValuePlayer [blocker] ( black )
Spiel 6 von 100 geht an ValuePlayer [blocker] ( black )
Spiel 7 von 100 geht an ExpectiminimaxValuePlayer [blocker] ( white )
Spiel 8 von 100 geht an ExpectiminimaxValuePlayer [blocker] ( white )
Spiel 9 von 100 geht an ValuePlayer [blocker] ( black )
Spiel 10 von 100 geht an ValuePlayer [blocker] ( black )
Spiel 11 von 100 geht an ValuePlayer [blocker] ( black )
Spiel 12 von 100 geht an ExpectiminimaxValuePlayer [blocker] ( white )
Spiel 13 von 100 geht an ExpectiminimaxValuePlayer [blocker] ( white )
Spiel 14 von 100 geht an ExpectiminimaxValuePlayer [blocker] ( white )
Spiel 15 von 100 geht an ValuePlayer [block

In [2]:
import Player
import PlayerTest
from NeuralNetModel import TDGammonModel
import tensorflow as tf

graph = tf.Graph()
sess = tf.Session(graph=graph)
with sess.as_default(), graph.as_default():
    model = TDGammonModel(sess, restore=True)
    players = [Player.ModelPlayer('black', model), Player.ExpectiminimaxModelPlayer('white', model, 2)]
    PlayerTest.test(players, 10)

Restoring checkpoint: checkpoints/TD-Gammon/checkpoint.ckpt-1593683
INFO:tensorflow:Restoring parameters from checkpoints/TD-Gammon/checkpoint.ckpt-1593683
Spiel 0 von 10 geht an ModelPlayer [TD-Gammon] ( black )
Spiel 1 von 10 geht an ModelPlayer [TD-Gammon] ( black )
Spiel 2 von 10 geht an EMinMaxModelPlayer [TD-Gammon] ( white )


KeyboardInterrupt: 

Diese 3 Spiele haben 24 Stunden gedauert....