# Tic Tac Toe

This cookbook shows how multi-step environments work in CAMEL, using the Tic-Tac-Toe environment as an example.

The Tic-Tac-Toe environment can be used to evaluate agents at Tic-Tac-Toe, to generate synthetic data for distillation, or, of course, to train an agent to play Tic-Tac-Toe!

First, we need to define our environment and set it up. Then we can call `reset` to get our initial observation.

In [1]:
import asyncio
from camel.environments.models import Action
from camel.environments.tic_tac_toe import TicTacToeEnv, Opponent

# Initialize and set up the environment
env = TicTacToeEnv(opponent=Opponent(play_style="random"))
await env.setup()

# Reset environment and get initial observation
observation = await env.reset()
print("Initial Observation:\n")
print(observation.question)

Initial Observation:

You are playing Tic Tac Toe with standard rules.
You are the player with X.
Choose a number between 1 and 9 to place an X.
This is the current state of the board:
1 | 2 | 3
---------
4 | 5 | 6
---------
7 | 8 | 9
Each number that you can see is still an empty field that you can place your 'X' in. Please end your response with <Action> [a number from 1 to 9]
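
The prompt asks the model to end its reply with `<Action>` followed by a number from 1 to 9. The environment extracts the move itself, so you do not need to parse anything. Still, a rough sketch of what such an extraction could look like may help when debugging replies. The `parse_action` helper and its regular expression below are hypothetical and are not CAMEL's actual extraction logic, which may behave differently.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; CAMEL's own extractor may differ.
# It expects the reply to end with "<Action>" followed by a single digit 1-9.
ACTION_PATTERN = re.compile(r"<Action>\s*([1-9])\s*$")


def parse_action(llm_response: str) -> Optional[int]:
    """Return the move at the end of the reply, or None if the tag is missing."""
    match = ACTION_PATTERN.search(llm_response.strip())
    return int(match.group(1)) if match else None


# A reply that follows the requested format
move = parse_action("I'll place my X in position 5. \n\n<Action> 5")

# A reply that omits the tag cannot be parsed
missing = parse_action("I'll place my X in position 5.")
```

Returning `None` instead of raising keeps the caller in control of how to treat malformed replies, for example by retrying or by counting them as illegal moves.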


We will use GPT-4o-mini, so we need to provide an OpenAI API key.

In [2]:
import os
from getpass import getpass

openai_api_key = getpass('Enter your API key: ')
os.environ["OPENAI_API_KEY"] = openai_api_key

Next, let's define the model backend and the agent.

You can also add a system prompt or equip your agent with tools, but for the sake of simplicity we create a bare agent.

In [3]:
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from camel.configs import ChatGPTConfig
from camel.agents import ChatAgent

model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O_MINI,
    model_config_dict=ChatGPTConfig().as_dict(),
)

agent = ChatAgent(model=model)

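Next, we will simulate one episode. In each iteration, the agent answers the current observation, we wrap its reply in an `Action`, and the environment advances by one step until `env.is_done()` returns `True`. We reset the agent after every move so that it decides from the current board alone.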

In [4]:
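while not env.is_done():
    # Ask the agent for a move based on the current observation
    llm_response = agent.step(observation.question).msgs[0].content
    agent.reset()  # clear context window

    # Wrap the raw reply in an Action and advance the environment by one step
    action = Action(llm_response=llm_response)
    result = await env.step(action)

    # Unpack the step result: next observation, reward, done flag, and info
    observation, reward, done, info = result

    print("\nAgent Move:", action)
    print("Observation:")
    print(observation.question)
    print("Reward:", reward)
    print("Done:", done)
    print("Info:", info)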



Agent Move: index=0 llm_response="I'll place my X in position 5. \n\n<Action> 5" metadata={} timestamp=datetime.datetime(2025, 4, 8, 22, 46, 49, 501898, tzinfo=datetime.timezone.utc)
Observation:
You are playing Tic Tac Toe with standard rules.
You are the player with X.
Choose a number between 1 and 9 to place an X.
This is the current state of the board:
O | 2 | 3
---------
4 | X | 6
---------
7 | 8 | 9
Each number that you can see is still an empty field that you can place your 'X' in. Please end your response with <Action> [a number from 1 to 9]
Reward: 0.5
Done: False
Info: {'extraction_result': '5', 'step': 1, 'state': {'board': ['O', ' ', ' ', ' ', 'X', ' ', ' ', ' ', ' '], 'game_over': False, 'winner': None, 'last_move_illegal': False, 'last_move': 5, 'extraction_error': None}, 'rewards_dict': {'x_non_loss_value': 0.5}}

Agent Move: index=0 llm_response="I'll place my X in position 3.\n\nAction 3" metadata={} timestamp=datetime.datetime(2025, 4, 8, 22, 46, 50, 241265, tzinfo=d
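
The `Info` dictionary printed after the first move contains the raw board under `info['state']['board']` as a list of nine cells. As a small illustration of how that list relates to the board shown in the observation, the sketch below rebuilds the text board and lists the free positions. Both `render_board` and `legal_moves` are hypothetical helpers written for this cookbook, not part of CAMEL, and they assume that empty cells are stored as `' '` as in the printed output.

```python
from typing import List


def render_board(board: List[str]) -> str:
    """Render a 9-cell board, showing the position number for empty cells."""
    # Assumption: empty cells are " ", matching the printed info dictionary.
    cells = [mark if mark != " " else str(i + 1) for i, mark in enumerate(board)]
    rows = [" | ".join(cells[start:start + 3]) for start in range(0, 9, 3)]
    return "\n---------\n".join(rows)


def legal_moves(board: List[str]) -> List[int]:
    """Return the 1-based positions that are still empty."""
    return [i + 1 for i, mark in enumerate(board) if mark == " "]


# Board copied from the info dictionary printed after the first move
board = ["O", " ", " ", " ", "X", " ", " ", " ", " "]
text = render_board(board)
free = legal_moves(board)
print(text)
print("Free positions:", free)
```

Note that positions are 1-based in the prompt while the list is 0-based, so the agent's move at position 5 ends up at index 4.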

Finally, we close the environment.

In [5]:
await env.close()
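
The cells above rely on the notebook's support for top-level `await`. In a plain Python script you would instead wrap the same setup, reset, step, and close sequence in a coroutine and start it with `asyncio.run`. The sketch below shows that pattern with a hypothetical `StubEnv` class that only mimics the method names used in this cookbook; it is not CAMEL code and does not play Tic-Tac-Toe.

```python
import asyncio
from typing import Any, Dict, Tuple


class StubEnv:
    """Hypothetical stand-in for an async multi-step environment."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0
        self.closed = False

    async def setup(self) -> None:
        self.steps = 0

    async def reset(self) -> str:
        self.steps = 0
        return "step 0"

    def is_done(self) -> bool:
        return self.steps >= self.max_steps

    async def step(self, action: str) -> Tuple[str, float, bool, Dict[str, Any]]:
        self.steps += 1
        return f"step {self.steps}", 0.0, self.is_done(), {"action": action}

    async def close(self) -> None:
        self.closed = True


async def run_episode(env: StubEnv) -> int:
    """Run one episode and always close the environment afterwards."""
    await env.setup()
    try:
        observation = await env.reset()
        while not env.is_done():
            # A real agent would produce the action from the observation here.
            action = f"reply to {observation}"
            observation, reward, done, info = await env.step(action)
        return env.steps
    finally:
        await env.close()


env_stub = StubEnv()
steps_taken = asyncio.run(run_episode(env_stub))
print("Steps taken:", steps_taken)
```

Putting `close` in a `finally` block means the environment is released even if a step raises an exception partway through the episode.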