## Intro

There were a bunch of questions on how to validate an agent or a neural net for connect 4. So I decided to release a dataset that I use to validate the nets that I use. The dataset was generated with a modified version of the C++ connect4 solver provided by http://connect4.gamesolver.org It contains 1000 samples of board positions from ply 8 to 20. 

With each of the positions it has the perfect score for the position as well as the scores of all positions after the next move. This allows to estimate how good an agent or a net is by comparing its move with a perfect solution. 

Format of the dataset: 
> {"board": [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 2, 1, 0, 2, 0, 0, 0, 1, 1, 1, 2, 1, 0, 1, 2], "score": -2, "move score": [-3, -4, -4, -2, -6, -5, -4]}

Each row is a json dictionary with the following fields:
* "board": kaggle format of a connect4 baord, 
* "score": Score for the position, 
* "move score": Array of 7 scores corresponding to a play in each of the 7 columns

A note on the scores in the dataset:
* Score = 0: Game will be a draw
* Score > 0: Current player will win (the bigger the number the sooner the player will win). The score is the half the ammount of plies from the end the game will be won. So +5 is means the win will be in ply 42 - 2*5 = 32. 
* Score < 0: Current player will lose (the bigger the number the sooner the player will lose)
* Score = -99: simply indicates that that was not a legal move.

There are 2 metrics that I use are:
* Perfect Move: Here the agent picks a move with the same score as the perfect player.
* Good Move: The agent picks a move in the same categoty (win, loss or draw) as the perfect player. If an agent play 100% good moves it will play as well as a perfect player, but the win might be later in the game.

## Let's analyze the built in agents

In [None]:
!pip install 'tensorflow==1.15.0'
import tensorflow as tf
assert tf.__version__=='1.15.0'
!apt-get update
!apt-get install -y cmake libopenmpi-dev python3-dev zlib1g-dev
!pip install "stable-baselines[mpi]==2.9.0"

In [None]:
assert tf.__version__=='1.15.0'
from stable_baselines import PPO1, A2C
!pip install kaggle_environments
from kaggle_environments import make

In [3]:
def score(agent, max_lines = 1000):
    ''' scores an agent against a set of "perfect moves" '''
    
    # required imports
    import json
    #!pip install kaggle_environments
    import kaggle_environments
    from kaggle_environments.utils import structify
    #also requires a copy of 'refmoves1k_kaggle.csv'
    
    def win_loss_draw(score):
        if score>0: 
            return 'win'
        if score<0: 
            return 'loss'
        return 'draw'


    print("scoring ",agent)
    count = 0
    good_move_count = 0
    perfect_move_count = 0
    observation = structify({'mark': None, 'board': None})
    #with open("/kaggle/input/1k-connect4-validation-set/refmoves1k_kaggle") as f:
    with open("/content/refmoves1k_kaggle.csv") as f:
        for line in f:
            count += 1
            data = json.loads(line)
            observation.board = data["board"]
            # find out how many moves are played to set the correct mark.
            ply = len([x for x in data["board"] if x>0])
            #if ply&1:
            if ply % 2 == 0:
                observation.mark = 2
            else:
                observation.mark = 1
            
            #call the agent
            agent_move = agent(observation,env.configuration)
            
            moves = data["move score"]
            perfect_score = max(moves)
            perfect_moves = [ i for i in range(7) if moves[i]==perfect_score]

            if(agent_move in perfect_moves):
                perfect_move_count += 1

            if win_loss_draw(moves[agent_move]) == win_loss_draw(perfect_score):
                good_move_count += 1

            if count == max_lines:
                break

        print("perfect move percentage: ",perfect_move_count/count)
        print("good moves percentage: ",good_move_count/count)

#to call:
#from kaggle_environments import make
#env = make("connectx")
#score(env.agents["random"],100)
#score(agentX,1000)

###Output should be:

> scoring **random_agent**  
> perfect move percentage:  0.22  
> good moves percentage:  0.67  
---
> scoring **negamax_agent**  
> perfect move percentage:  0.4  
> good moves percentage:  0.71  
___
Some more references:  
A neural net that I use in my best agent (1267 score on 2/24/20) score as follows:
> perfect move percentage:  0.737  
> good moves percentage:  0.939


In [4]:
# Score the 2 built in agents
from kaggle_environments import make
env = make("connectx")
#score(env.agents["random"],10)
# the built in agents are remarkably slow so only evaluating on 100 moves here
#score(env.agents["random"],100)  
#score(env.agents["negamax"],100)

## Results

>scoring  **test_agent_v9**
* perfect move percentage:  0.681
* good moves percentage:  0.884
___
>scoring  **test_agent_v6**
* perfect move percentage:  0.68
* good moves percentage:  0.886
___
>scoring  **heuristic**
* perfect move percentage:  0.667
* good moves percentage:  0.889
___
>scoring  **quick_look**
* perfect move percentage:  0.65
* good moves percentage:  0.863
___
>scoring  **strong_coeffs**
* perfect move percentage:  0.639
* good moves percentage: 0.856 

####Heuristic Agents

In [None]:
from deep_lookahead import debug_agent as deep_agent
from test_agent_v9 import my_agent as test_agent_v9
#from strong_coeffs import my_agent as strongCs
#from quick import my_agent as quick
#from quick_pick_submit import my_agent as quick_pick

In [None]:
score(test_agent_v9,100)
#score(strong,1000)

####Trained agents:

In [None]:
xtrain = PPO1.load('/content/xtrain.zip', env=None)
xtrain_agent = lambda x,y: agentX(x,y,model=xtrain)
modelX = PPO1.load('/content/modelX.zip', env=None, verbose=0)
#trained_model = PPO1.load('/content/trained.zip', env=None, verbose=0)
#trained_256 = PPO1.load('/content/trained_256.zip', env=None, verbose=0)
scoresetA = PPO1.load('/content/scoresetA.zip', env=None, verbose=0)
ssagent = lambda x,y: agentX(x,y,model=scoresetA,debug=False)

In [7]:
import numpy as np
import random

def agentX(obs, config, model=None, debug=False):
    col, _ = model.predict(np.array(obs['board']).reshape(6,7,1))
    is_valid = (obs['board'][int(col)] == 0)
    if is_valid:
        return int(col)
    else:
        return random.choice([col for col in range(config.columns) if obs.board[int(col)] == 0])

In [None]:
#@ title Combineer
def combineer(obs, config, model1=None, model2=None, model3=None, model4=None, model5=None, debug=False):
    import time
    start = time.time()
    board = np.array(obs['board']).reshape(6,7,1)

    p1, _ = model1.predict(board)
    p2, _ = model2.predict(board)
    p3, _ = model3.predict(board)
    #p4, _ = model3.predict(board)
    #p5, _ = model3.predict(board)

    #If two or three agree >> col, else if none agree col = p1
    if p1!=p2 and p2==p3:
        col = p2
    else:  #all other combos, go with p1
        col = p1
    
    if (obs['board'][int(col)] != 0):
        col = random.choice([col for col in range(config.columns) if obs.board[int(col)] == 0])
        if debug:
            print("\n>>>> Agent is guessing... column",col)      
        return int(col)
    
    if debug:
        print("\nModel1 predicted: {}, Model2 predicted: {}, Model3 predicted: {}".format(p1,p2,p3))
        print("Consensus is: column", col)
        print("Time taken =", time.time() - start)

    return int(col)

agentC = lambda x,y: combineer(x,y,model1=modelX,model2=ssmodelB30,model3=scoresetA,debug=False)
#perfect move percentage:  0.289
#good moves percentage:  0.676

In [None]:
score(agentC)

In [None]:
#ssmodelB10 = PPO1.load('/ssB_10k.zip', env=None, verbose=0)
#ssagent10 = lambda x,y: agentX(x,y,model=ssmodelB10,debug=False)
#perfect move percentage:  0.282
#good moves percentage:  0.687

#ssmodelB20 = PPO1.load('/ssB_20k.zip', env=None, verbose=0)
#ssagent20 = lambda x,y: agentX(x,y,model=ssmodelB20,debug=False)
#perfect move percentage:  0.301
#good moves percentage:  0.694

#ssmodelB30 = PPO1.load('/ssB_30k.zip', env=None, verbose=0)
#ssagent30 = lambda x,y: agentX(x,y,model=ssmodelB30,debug=False)
#perfect move percentage:  0.311
#good moves percentage:  0.712

#ssmodelB40 = PPO1.load('/ssB_40k.zip', env=None, verbose=0)
#ssagent40 = lambda x,y: agentX(x,y,model=ssmodelB40,debug=False)
#perfect move percentage:  0.285
#good moves percentage:  0.695

In [None]:
score(ssagent40)

>scoring  **combineer**
* perfect move percentage:  0.29
* good moves percentage:  0.681
___
>scoring  **agentX**
* perfect move percentage:  0.273
* good moves percentage:  0.675
___
>scoring  **xtrain**
* perfect move percentage:  0.288
* good moves percentage:  0.682
___
>scoring  **scoresetA**
* perfect move percentage:  0.295
* good moves percentage:  0.695
___

In [None]:
#scoreset_A: 10/42, -100/42, 1/42, -10
#scoreset_B: 1/42, -100/42, 1/42, -420/42
#scoreset_C: 1/2*42, -210/42, 1/42, -420/42
#scoreset_D: -1/42, -300/42, 1/42, -420/42
#scoreset_E: -1/42, -300/42, 2/42, -420/42
#scoreset_F: -50/42, -300/42, 1/42, -400/42

#scoreset_O: 1, -1, 1/42, -10

In [None]:
#scoresetA = PPO1.load('/content/scoresetA.zip', env=None, verbose=0)
#scoresetB = PPO1.load('/content/scoresetB.zip', env=None, verbose=0)
#scoresetC = PPO1.load('/content/scoresetC.zip', env=None, verbose=0)
#scoresetD = PPO1.load('/content/scoresetD.zip', env=None, verbose=0)
#scoresetE = PPO1.load('/content/scoresetE.zip', env=None, verbose=0)
#scoresetF = PPO1.load('/content/scoresetF.zip', env=None, verbose=0)

>scoring  **scoreset A**
* perfect move percentage:  0.295
* good moves percentage:  0.699
___
>scoring  **scoreset B**
* perfect move percentage:  0.318
* good moves percentage:  0.700
___
>scoring  **scoreset C**
* perfect move percentage:  0.307
* good moves percentage:  0.705
___
>scoring  **scoreset D**
* perfect move percentage:  0.299, 0.302
* good moves percentage:  0.691, 0.699
___
>scoring  **scoreset E**
* perfect move percentage:  0.33
* good moves percentage:  0.709
___
>scoring  **scoreset F_50k**
* perfect move percentage:  0.231
* good moves percentage:  0.645
___


In [None]:
scoresetD100 = PPO1.load('/content/scoresetD_100k.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoresetD100,debug=False)
score(score_agent)
scoresetD50 = PPO1.load('/content/scoresetD_50k.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoresetD50,debug=False)
score(score_agent)

>scoring  **scoreset D50**
* perfect move percentage:  0.294
* good moves percentage:  0.692
___
>scoring  **scoreset D100**
* perfect move percentage:  0.294
* good moves percentage:  0.683
___

In [None]:
scoresetE_1000k = PPO1.load('/content/scoresetE_1000k.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoresetE_1000k,debug=False)
score(score_agent)

scoresetE_2000k = PPO1.load('/content/scoresetE_2000k.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoresetE_2000k,debug=False)
score(score_agent)

scoresetE_3000k = PPO1.load('/content/scoresetE_3000k.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoresetE_3000k,debug=False)
score(score_agent)

>scoring  **scoreset E_100k**
* perfect move percentage:  0.32
* good moves percentage:  0.702
___
>scoring  **scoreset E_1000k**
* perfect move percentage:  0.306
* good moves percentage:  0.691
___
>scoring  **scoreset E_2000k**
* perfect move percentage:  0.305
* good moves percentage:  0.698
___
>scoring  **scoreset E_3000k**
* perfect move percentage:  0.285
* good moves percentage:  0.695
___

In [None]:
scoresetE_OG1 = PPO1.load('/content/scoresetE_OG1.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoresetE_OG1,debug=False)
score(score_agent)
scoresetE_OG2 = PPO1.load('/content/scoresetE_OG2.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoresetE_OG2,debug=False)
score(score_agent)
scoresetO_OG = PPO1.load('/content/scoresetO_OG.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoresetO_OG,debug=False)
score(score_agent)

'''scoring  <function <lambda> at 0x7fb5d4c2aa60>
perfect move percentage:  0.321
good moves percentage:  0.705
scoring  <function <lambda> at 0x7fb5d4ce6c80>
perfect move percentage:  0.315
good moves percentage:  0.696
scoring  <function <lambda> at 0x7fb5d4120268>
perfect move percentage:  0.292
good moves percentage:  0.685'''

In [None]:
scoreset = PPO1.load('/content/big_scoresetH_10k.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoreset,debug=False)
score(score_agent)
scoreset = PPO1.load('/content/big_scoresetH_30k.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoreset,debug=False)
score(score_agent)
scoreset = PPO1.load('/content/big_scoresetJ_100k.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoreset,debug=False)
score(score_agent)

'''scoring  <function <lambda> at 0x7fb5d2833ae8>
perfect move percentage:  0.305
good moves percentage:  0.702
scoring  <function <lambda> at 0x7fb607dc4e18>
perfect move percentage:  0.313
good moves percentage:  0.693
scoring  <function <lambda> at 0x7fb6592fa730>
perfect move percentage:  0.283
good moves percentage:  0.694 '''

In [None]:
scoreset = A2C.load('/content/modelA2C_A.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoreset,debug=False)
score(score_agent)
scoreset = A2C.load('/content/modelA2C_B.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoreset,debug=False)
score(score_agent)
scoreset = A2C.load('/content/model_OG_A2C.zip', env=None, verbose=0)
score_agent = lambda x,y: agentX(x,y,model=scoreset,debug=False)
score(score_agent)

'''scoring  <function <lambda> at 0x7fb5d02c2f28>
perfect move percentage:  0.253
good moves percentage:  0.67
scoring  <function <lambda> at 0x7fb607dc4e18>
perfect move percentage:  0.289
good moves percentage:  0.701
scoring  <function <lambda> at 0x7fb5d02c21e0>
perfect move percentage:  0.299
good moves percentage:  0.705'''