# Linear TD($\lambda$) Agent
As a first pass at building an agent to play Connect 4 in the Kaggle [ConnectX tournament](https://www.kaggle.com/c/connectx), I'll construct an agent that acts greedily with respect to a linear value function, approximated using coarse coding and trained with the TD($\lambda$) algorithm. First, import the Kaggle ConnectX environment and create the linear agent that we will train.

In [1]:
from kaggle_environments import evaluate, make
from linear_TD_agent import TDAgent


env = make("connectx", debug=True)
agent = TDAgent(agent_info=env.configuration)
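
For reference, a single TD($\lambda$) step for a linear value function $v(s) = w \cdot x(s)$ can be sketched as below. This is a minimal illustration with accumulating eligibility traces, not the contents of `linear_TD_agent`: the function name `td_lambda_update`, its parameters, and the plain-list feature vectors are assumptions made for the example, and the coarse-coded features used by `TDAgent` are not reproduced here.

```python
def td_lambda_update(w, z, x, x_next, reward,
                     alpha=0.1, gamma=1.0, lam=0.8, terminal=False):
    """One TD(lambda) step with accumulating traces (hypothetical helper).

    w       -- weight vector of the linear value function
    z       -- eligibility trace vector (same length as w)
    x       -- feature vector of the current state
    x_next  -- feature vector of the next state (ignored if terminal)
    Returns the updated (w, z) as new lists.
    """
    # Linear value estimates for the current and next state.
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = 0.0 if terminal else sum(wi * xi for wi, xi in zip(w, x_next))

    # TD error: how much better or worse the transition was than predicted.
    delta = reward + gamma * v_next - v

    # Decay the trace and add the current features (the gradient of w . x).
    z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]

    # Move every weight in proportion to its eligibility.
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w, z


# Two-step toy episode with two binary features.
w, z = [0.0, 0.0], [0.0, 0.0]
w, z = td_lambda_update(w, z, [1, 0], [0, 1], reward=1, alpha=0.5, lam=0.5)
w, z = td_lambda_update(w, z, [0, 1], None, reward=2, alpha=0.5, lam=0.5,
                        terminal=True)
```

The eligibility trace is what lets the terminal reward in the second step also adjust the weight of the feature that was active only in the first step.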

In [2]:
from random import choice

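def train_episode(env, agent, adversary="random"):
    """Play one training episode against `adversary`, updating `agent` after every move."""
    agent.agent_start_episode()

    # Randomly decide whether the agent moves first (1) or second (0).
    first_play = choice([1, 0])
    if first_play == 1:
        trainer = env.train([None, adversary])
    else:
        trainer = env.train([adversary, None])

    observation = trainer.reset()
    while not env.done:
        my_action = agent.select_action(observation)
        observation, reward, done, info = trainer.step(my_action)
        if not done:
            # Step reward of -1 for every non-terminal move.
            agent.agent_update(-1, observation)
        elif reward == -1:
            # Terminal reward of -10 for a loss.
            agent.last_agent_update(-10)
        else:
            # Every other terminal outcome, including a draw, receives +10.
            agent.last_agent_update(10)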

In [3]:
import copy

# Copy the current weights so the agent can also train against a copy of itself.
new_w = copy.copy(agent.w)
config = copy.copy(env.configuration)
config['w'] = new_w
mirror_agent = TDAgent(agent_info=config)

In [4]:
for _ in range(1000):
    # Sample an opponent for each episode: random, negamax, or the mirror agent.
    opponent = choice(['random', 'negamax', lambda x: mirror_agent.select_action(x)])
    train_episode(env, agent, adversary=opponent)

In [5]:
def mean_reward(rewards):
    return sum(r[0] for r in rewards) / float(len(rewards))

# Run multiple episodes to estimate its performance.
print("My Agent vs Random Agent:", mean_reward(evaluate("connectx", [lambda x: agent.select_action(x), "random"], num_episodes=50)))
print("My Agent vs Negamax Agent:", mean_reward(evaluate("connectx", [lambda x: agent.select_action(x), "negamax"], num_episodes=50)))

My Agent vs Random Agent: 0.12
My Agent vs Negamax Agent: -1.0
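
A mean reward hides how the individual games ended, so it can help to count outcomes as well. The sketch below assumes the same list-of-reward-pairs shape that `mean_reward` consumes, and assumes that a positive reward means a win, a negative reward a loss, zero a draw, and `None` an episode that ended without a numeric reward. `tally_outcomes` is a hypothetical helper, not part of `kaggle_environments`.

```python
from collections import Counter


def tally_outcomes(rewards, player=0):
    """Count wins, losses, draws and missing rewards for one player (hypothetical helper)."""
    counts = Counter()
    for episode in rewards:
        r = episode[player]
        if r is None:
            # No numeric reward was reported for this episode.
            counts["invalid"] += 1
        elif r > 0:
            counts["win"] += 1
        elif r < 0:
            counts["loss"] += 1
        else:
            counts["draw"] += 1
    return counts


# Hand-written reward pairs in the assumed format, not real results.
example = [[1, -1], [-1, 1], [0, 0], [None, 0]]
outcome_counts = tally_outcomes(example)
```

Passing the result of `evaluate(...)` instead of `example` would give the breakdown behind each printed mean, provided the assumptions above hold for the installed version.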


In [6]:
env.run([lambda x: agent.select_action(x), "negamax"])
env.render(mode="ipython", width=500, height=450)