# [ConnectX with Reinforcement Learning](https://www.kaggle.com/c/connectx)

## Description

We’re excited to announce a beta version of a brand-new type of ML competition called Simulations. In Simulation Competitions, you’ll compete against a set of rules, rather than against an evaluation metric. To enter, [accept the rules](https://www.kaggle.com/c/connectx/rules) and create a Python submission file that can “play” against a computer, or another user.

### The Challenge

In this game, your objective is to get a certain number of your checkers in a row horizontally, vertically, or diagonally on the game board before your opponent does. When it's your turn, you “drop” one of your checkers into one of the columns at the top of the board. Then your opponent takes their turn. Each move is therefore either an attempt to win or an attempt to stop your opponent from winning. The default is four in a row, but other options are coming soon.

### Background History

For the past 10 years, our competitions have been mostly focused on supervised machine learning. The field has grown, and we want to continue to provide the data science community cutting-edge opportunities to challenge themselves and grow their skills.

So, what’s next? Reinforcement learning is clearly a crucial piece in the next wave of data science learning. We hope that Simulation Competitions will provide the opportunity for Kagglers to practice and hone this burgeoning skill.

### How is this Competition Different?

Instead of submitting a CSV file, or a Kaggle Notebook, you will submit a Python .py file (more submission options are in development). You’ll also notice that the leaderboard is not based on how accurate your model is but rather how well you’ve performed against other users. See [Evaluation](https://www.kaggle.com/c/connectx/overview/evaluation) for more details.

### We’d Love Your Feedback

This competition is a low-stakes, trial-run introduction. We’re considering this a beta launch – there are complicated new mechanics in play and we’re still working on refining the process. We’d love your help testing the experience and want to hear your feedback.

Please note that we may make changes throughout the competition that could include resetting the leaderboard, invalidating episodes, changing the interface, or changing the environment configuration (e.g. modifying the number of columns, rows, or tokens in a row required to win).

## Evaluation

Each Submission has an estimated Skill Rating which is modeled by a Gaussian N(μ, σ²) where μ is the estimated skill and σ represents our uncertainty of that estimate.

When you upload a Submission, we first play a Validation Episode where that Submission plays against itself to make sure it works properly. If the Episode fails, the Submission is marked as Error. Otherwise, we initialize the Submission with μ₀ = 600 and it joins the pool of All Submissions for ongoing evaluation.

We repeatedly run Episodes from the pool of All Submissions, and try to pick Submissions with similar ratings for fair matches. We aim to run ~8 Episodes a day per Submission, with a slightly higher rate for newer Submissions to give you feedback faster.

After an Episode finishes, we'll update the Rating estimate for both Submissions. If one Submission won, we'll increase its μ and decrease its opponent's μ; if the result was a draw, we'll move the two μ values closer to their mean. The size of each update depends on how far the result deviated from the result expected from the previous μ values, and on each Submission's uncertainty σ. We also reduce the σ terms according to the amount of information gained from the result.
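The exact update formula is not given here, so the following is only a rough sketch of the behaviour described above, not the competition's actual rating code. It assumes an Elo-style expected score, a step size proportional to σ, and a fixed σ shrink factor; `expected_score`, `update_ratings`, `scale`, and `shrink` are all hypothetical names and values.

```python
def expected_score(mu_a, mu_b, scale=400.0):
    """Expected result for A (1 = win, 0.5 = draw, 0 = loss).

    Assumption: an Elo-style logistic curve; the real model is not documented here.
    """
    return 1.0 / (1.0 + 10 ** ((mu_b - mu_a) / scale))


def update_ratings(a, b, result_a, shrink=0.98):
    """Update two (mu, sigma) ratings after one episode.

    result_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    """
    mu_a, sigma_a = a
    mu_b, sigma_b = b
    # How far the actual result deviated from the expected one.
    surprise = result_a - expected_score(mu_a, mu_b)
    # More uncertain ratings (larger sigma) move further.
    new_a = (mu_a + sigma_a * surprise, sigma_a * shrink)
    new_b = (mu_b - sigma_b * surprise, sigma_b * shrink)
    return new_a, new_b
```

With two fresh Submissions at μ = 600 and a made-up σ of 200, a win moves the winner up and the loser down by the same amount. A draw between a higher-rated and a lower-rated Submission pulls both μ values toward each other, because the higher-rated one was expected to win.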

All valid Submissions will therefore keep playing matches, and their scores will change dynamically as the pool grows. The Leaderboard will show the μ value of each Team's best Submission.

## Getting Started

**TL;DR**

Create `submission.py` with the following source and submit!

```python
def act(observation, configuration):
    board = observation.board
    columns = configuration.columns
    return [c for c in range(columns) if board[c] == 0][0]
```

**Starter Notebook**

Fork the [ConnectX Starter Notebook](https://www.kaggle.com/ajeffries/connectx-getting-started) and submit the generated `submission.py` file.

**Client Library**

Read the [README](https://github.com/Kaggle/kaggle-environments/blob/master/README.md) for the [kaggle-environments](https://pypi.org/project/kaggle-environments/) Python package and check out the [ConnectX Notebook](https://github.com/Kaggle/kaggle-environments/blob/master/kaggle_environments/envs/connectx/connectx.ipynb).

```bash
pip install kaggle-environments
```

## Environment Rules

### Episode Objective

Use your Agent to get a certain number of your checkers in a row horizontally, vertically, or diagonally on the game board before your opponent.

### How To Play

Player 1 will take the first turn. When it's your turn, you add, or “drop”, one of your checkers into the top of a column on the board and the checker will land in the last empty row in that column. The following can occur after dropping your checker in a column:

1. If the column you chose has no empty rows or is out of range of the number of columns, you lose the episode.
2. If the checker placed creates an "X-in-a-row", you win the episode. X represents the number specified in the parameters, for example 4, and to be “in a row”, the checkers can be in a row horizontally, vertically, or diagonally.
3. If there are no empty cells, you tie the episode.
4. Otherwise, it's your opponent’s turn.

The episode continues until a win, loss, or tie occurs.
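The drop rule above can be sketched in a few lines. This is an illustration rather than the environment's own implementation: it assumes the board is a flat, row-major list with row 0 at the top, and `drop_checker` and `is_tie` are hypothetical helper names.

```python
def drop_checker(board, column, mark, columns, rows):
    """Return a new board with `mark` dropped into `column`.

    Raises ValueError for a move that would lose the episode
    (column out of range or already full).
    """
    if not 0 <= column < columns:
        raise ValueError(f"column {column} is out of range")
    if board[column] != 0:  # the top cell of the column is occupied
        raise ValueError(f"column {column} is full")
    new_board = list(board)
    # Scan from the bottom row upwards for the first empty cell.
    for row in range(rows - 1, -1, -1):
        index = row * columns + column
        if new_board[index] == 0:
            new_board[index] = mark
            break
    return new_board


def is_tie(board):
    """True when no empty cells remain (check for a win first)."""
    return all(cell != 0 for cell in board)
```

On an empty 7-column, 6-row board, dropping into column 3 fills the cell at index `5 * 7 + 3`, the bottom of that column; a second drop into the same column lands one row higher.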

### Writing Agents

An Agent will receive the following parameters:

1. The episode configuration:
    - Number of Columns on the board.
    - Number of Rows on the board.
    - How many checkers, X, "in a row" are required to win.
2. The current state of the board (serialized grid of cells; rows by cols).
    - Empty cells are represented by "0".
    - Player 1's checkers are represented by "1".
    - Player 2's checkers are represented by "2".
3. Which player you are ("1" or "2").

An Agent should return the column in which to place a checker. The column is an integer in [0, configuration.columns), counting columns from left to right. Rows are indexed by an integer in [0, configuration.rows), counting from top to bottom.

Here’s what that looks like as code:

```python
def agent(observation, configuration):
    # Number of Columns on the Board.
    columns = configuration.columns
    # Number of Rows on the Board.
    rows = configuration.rows
    # Number of Checkers "in a row" needed to win.
    inarow = configuration.inarow
    # The current serialized Board (rows x columns).
    board = observation.board
    # Which player the agent is playing as (1 or 2).
    mark = observation.mark

    # Return which column to drop a checker (action).
    return 0
```
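To reason about the board inside an agent, it helps to map the serialized list back to rows and columns. The sketch below assumes the row-major layout described above (cell at `row * columns + col`) and checks whether a player already has `inarow` checkers in a line; `has_won` is a hypothetical helper, not part of `kaggle-environments`.

```python
def has_won(board, mark, columns, rows, inarow):
    """Return True if `mark` has `inarow` checkers in a line on the board."""
    def cell(row, col):
        # Row-major lookup into the flat board list.
        return board[row * columns + col]

    # Directions to scan: right, down, down-right, down-left.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for row in range(rows):
        for col in range(columns):
            for d_row, d_col in directions:
                end_row = row + (inarow - 1) * d_row
                end_col = col + (inarow - 1) * d_col
                # Skip lines that would run off the board.
                if not (0 <= end_row < rows and 0 <= end_col < columns):
                    continue
                if all(cell(row + i * d_row, col + i * d_col) == mark
                       for i in range(inarow)):
                    return True
    return False
```

An agent could call this on a copy of the board after a trial move to see whether that move wins immediately.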

### Agent Rules

1. Your Submission must be an “Agent”.
2. An Agent may only use modules from "The Python Standard Library", "numpy", "gym", "pytorch", and "scipy".
3. An Agent’s sole purpose is to generate an action. Activities/code which do not directly contribute to this will be considered malicious and handled according to the Rules.
4. An Agent's file size is limited to 1 MB.
5. An Agent must return an action within 5 seconds of being invoked. If the Agent does not, it will lose the episode and may be invalidated.
6. An Agent which throws errors or returns an invalid action will lose the episode and may be invalidated.
7. An Agent cannot store information between invocations.
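Because an invalid action loses the episode (rule 6), it can be worth guarding whatever your strategy proposes before returning it. The following is a minimal sketch under the same row-major board assumption; `legal_columns` and `safe_action` are hypothetical helper names.

```python
def legal_columns(board, columns):
    """Columns whose top cell is still empty."""
    return [c for c in range(columns) if board[c] == 0]


def safe_action(proposed, board, columns):
    """Return `proposed` if it is a legal column, else the first legal column."""
    legal = legal_columns(board, columns)
    try:
        column = int(proposed)
    except (TypeError, ValueError):
        column = None  # not something that can be read as a column index
    if column in legal:
        return column
    # Fall back to any legal move; 0 is only reached if the board is full.
    return legal[0] if legal else 0
```

Inside an agent this would be used as `return safe_action(my_choice, observation.board, configuration.columns)`, where `my_choice` is whatever your strategy computed.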

## Install `kaggle-environments`

```python
import sys

# Notebook cell: the leading "!" runs the command in a shell.
!{sys.executable} -m pip install 'kaggle-environments>=0.1.6'
```


## Create ConnectX Environment

```python
from kaggle_environments import evaluate, make, utils

env = make("connectx", debug=True)
env.render()
```

## Create an Agent

To create the submission, an agent function should be fully encapsulated (no external dependencies).

When your agent is being evaluated against others, it will not have access to the Kaggle docker image. Only the following can be imported: Python Standard Library modules, gym, numpy, scipy, and pytorch (1.3.1, CPU only). More may be added later.

```python
# This agent randomly chooses a column that is not yet full.
def my_agent(observation, configuration):
    # Import inside the function so the agent stays fully encapsulated.
    from random import choice
    return choice([c for c in range(configuration.columns) if observation.board[c] == 0])
```

## Test your Agent

```python
env.reset()
# Play as the first agent against the default "random" agent.
env.run([my_agent, "random"])
env.render(mode="ipython", width=500, height=450)
```

## Debug/Train your Agent

```python
# Play as first position against random agent.
trainer = env.train([None, "random"])

observation = trainer.reset()

while not env.done:
    my_action = my_agent(observation, env.configuration)
    print("My Action", my_action)
    observation, reward, done, info = trainer.step(my_action)
    # env.render(mode="ipython", width=100, height=90, header=False, controls=False)
env.render()
```

```
My Action 2
My Action 5
My Action 1
My Action 2
My Action 4
```


## Evaluate your Agent

```python
def mean_reward(rewards):
    # Each entry is [first player's reward, second player's reward].
    return sum(r[0] for r in rewards) / float(len(rewards))

# Run multiple episodes to estimate its performance.
print("My Agent vs Random Agent:", mean_reward(evaluate("connectx", [my_agent, "random"], num_episodes=10)))
print("My Agent vs Negamax Agent:", mean_reward(evaluate("connectx", [my_agent, "negamax"], num_episodes=10)))
```

```
My Agent vs Random Agent: 0.2
My Agent vs Negamax Agent: -1.0
```
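A mean hides how the episodes were actually decided, so a breakdown by outcome can be more informative. The sketch below works on the same list of reward pairs that `evaluate` returns. It assumes a positive reward means a win, a negative one a loss, zero a draw, and `None` an errored episode; that convention is an assumption and may differ between `kaggle-environments` versions. `tally_results` is a hypothetical helper.

```python
from collections import Counter


def tally_results(rewards):
    """Count outcomes for the first agent from a list of [r0, r1] pairs."""
    counts = Counter()
    for first, _second in rewards:
        if first is None:
            counts["error"] += 1  # assumed: no reward means the episode errored
        elif first > 0:
            counts["win"] += 1
        elif first < 0:
            counts["loss"] += 1
        else:
            counts["draw"] += 1
    return counts
```

For example, `tally_results(evaluate("connectx", [my_agent, "random"], num_episodes=10))` would return a counter of wins, losses, draws, and errors for `my_agent`.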


## Play your Agent

Click on any column to place a checker there ("manually select action").

```python
# "None" represents which agent you'll manually play as (first or second player).
env.play([None, "negamax"], width=500, height=450)
```

## Write Submission File

```python
import inspect
import os

def write_agent_to_file(function, file):
    # Append if the file already exists, otherwise create it.
    with open(file, "a" if os.path.exists(file) else "w") as f:
        f.write(inspect.getsource(function))
        print(function, "written to", file)

write_agent_to_file(my_agent, "submission.py")
```

```
<function my_agent at 0x7fdddb339158> written to submission.py
```


## Validate Submission

Play your submission against itself. This is the first episode the competition will run to weed out erroneous agents.

Why validate? This roughly verifies that your submission is fully encapsulated and can be run remotely.

```python
# Note: Stdout replacement is a temporary workaround.
import sys
out = sys.stdout
submission = utils.read_file("submission.py")
agent = utils.get_last_callable(submission)
sys.stdout = out

env = make("connectx", debug=True)
env.run([agent, agent])
print("Success!" if env.state[0].status == env.state[1].status == "DONE" else "Failed...")
```

```
Success!
```


## Submit to Competition

1. Commit this kernel.
2. View the committed version.
3. Go to the "Data" section and find the `submission.py` file.
4. Click "Submit to Competition".
5. Go to My Submissions to view your score and the episodes being played.