# Working with the Loong environment

You can also open this cookbook in Colab [here](https://colab.research.google.com/drive/1RbNiZMcn5eW_lwwJ4uGZfWKcYIsR5p_4?usp=sharing).

<div class="align-center">
  <a href="https://www.camel-ai.org/"><img src="https://i.postimg.cc/KzQ5rfBC/button.png" width="150"></a>
  <a href="https://discord.camel-ai.org"><img src="https://i.postimg.cc/L4wPdG9N/join-2.png" width="150"></a>
  
⭐ <i>Star us on [*GitHub*](https://github.com/camel-ai/camel), join our [*Discord*](https://discord.camel-ai.org) or follow us on [*X*](https://x.com/camelaiorg)</i>
</div>

The Loong *environment* is a unified interface for synthetic data generation, RL training, and benchmarking agents. It integrates all the primitives we implemented at CAMEL into a single interface for developers and researchers. In this cookbook, we explain how to initialize a *Single Step Environment* to generate synthetic data. More cookbooks about RL training and how to customize the environment are coming soon.

This type of environment is called a *single step* environment because the agent takes only one step. It receives a question sampled from the dataset (the initial state / observation) and then answers. The answer is scored according to the reward function. Recently, rule-based reward functions, i.e. functions without any learnable parameters, have been used successfully to do RL with LLMs as the policy.
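
To make the single-step idea concrete, here is a minimal sketch in plain Python. `ToyDataPoint` and `ToySingleStepEnv` are hypothetical stand-ins written for this illustration, not the CAMEL classes, and the reward values of 10.0 and 0.0 are an assumption chosen to mirror the outputs shown later in this cookbook.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyDataPoint:
    question: str
    final_answer: str


class ToySingleStepEnv:
    """Hypothetical stand-in for a single-step environment (not the CAMEL class)."""

    def __init__(self, dataset, seed=0):
        self._dataset = list(dataset)
        self._rng = random.Random(seed)
        self._current = None

    def reset(self) -> str:
        # Sample one question; it is the initial (and only) observation.
        self._current = self._rng.choice(self._dataset)
        return self._current.question

    def step(self, answer: str):
        if self._current is None:
            raise RuntimeError("Call reset() before step().")
        # Rule-based reward: a fixed comparison with no learnable parameters.
        reward = 10.0 if answer.strip() == self._current.final_answer else 0.0
        info = {"reference": self._current.final_answer}
        self._current = None  # the episode ends after a single step
        return None, reward, True, info


toy_env = ToySingleStepEnv([ToyDataPoint("What is 2 + 3?", "5")])
question = toy_env.reset()
_, reward, done, _ = toy_env.step("5")
print(question, reward, done)
```

The episode always terminates after one `step`, which is why `done` is unconditionally `True` in this sketch.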

Since many RL algorithms (such as GRPO) need multiple rollouts at each step, batching is important to guarantee concurrency / parallelism. This notebook will show how to use batched environments.
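
As a rough illustration of why concurrency matters here, the sketch below runs several rollouts for the same question at once with `asyncio.gather`. `fake_policy` and `rollout_group` are hypothetical helpers that stand in for an LLM call and a scoring step; they are not part of the CAMEL API, and the reward values are assumptions.

```python
import asyncio


async def fake_policy(question: str, rollout_id: int) -> str:
    """Hypothetical policy: stands in for an LLM call and just waits briefly."""
    await asyncio.sleep(0.01)
    # Even-numbered rollouts answer "5", odd-numbered ones answer "6".
    return "5" if rollout_id % 2 == 0 else "6"


async def rollout_group(question: str, reference: str, n: int) -> list[float]:
    """Run n rollouts for one question concurrently and score each answer."""
    # gather() returns results in the order the coroutines were passed in.
    answers = await asyncio.gather(
        *(fake_policy(question, i) for i in range(n))
    )
    return [10.0 if answer == reference else 0.0 for answer in answers]


if __name__ == "__main__":
    print(asyncio.run(rollout_group("What is 2 + 3?", "5", n=4)))
```

Inside a notebook, where an event loop is already running, you would write `await rollout_group(...)` instead of wrapping the call in `asyncio.run`.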

First, we have to load a dataset from which we will sample questions. The dataset can be either a `StaticDataset`, which is finite, or a `BaseGenerator`, which is an infinite supply of question-answer pairs that are generated synthetically in an implementation-dependent way. A `BaseGenerator` must be initialized with a *seed dataset*, which it then uses to generate new data.

In this cookbook, we will use the `FewShotGenerator`, which will generate new data points by doing simple few-shot prompting, using random data points from the seed dataset as examples.
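
The few-shot idea can be sketched as follows. This is only an illustration: `build_few_shot_prompt`, the prompt wording, and the dictionary keys `question` and `rationale` are assumptions made for this example, and the actual prompt used by `FewShotGenerator` may look quite different.

```python
import random


def build_few_shot_prompt(seed_examples, k=3, seed=0):
    """Assemble a prompt from k randomly chosen seed examples (hypothetical format)."""
    rng = random.Random(seed)
    # Never ask for more examples than the seed dataset contains.
    shots = rng.sample(seed_examples, k=min(k, len(seed_examples)))

    parts = ["Write one new problem with Python code that computes its answer.", ""]
    for i, example in enumerate(shots, start=1):
        parts.append(f"Example {i}")
        parts.append(f"Question: {example['question']}")
        parts.append(f"Code:\n{example['rationale']}")
        parts.append("")
    parts.append("New problem:")
    return "\n".join(parts)


seed_examples = [
    {"question": "How many edges does K(3,3) have?",
     "rationale": "import networkx as nx\nprint(nx.complete_bipartite_graph(3, 3).number_of_edges())"},
    {"question": "How many nodes does a path graph with 10 vertices have?",
     "rationale": "import networkx as nx\nprint(nx.path_graph(10).number_of_nodes())"},
]
print(build_few_shot_prompt(seed_examples, k=2))
```

Fixing the random seed makes the choice of examples reproducible, which helps when comparing generated data across runs.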

A seed dataset is naturally a kind of `StaticDataset`, so let's initialize ours as one.

In [None]:
# 🐍 Install CAMEL from a pinned commit with all optional dependencies
!pip install "git+https://github.com/camel-ai/camel.git@bec98152d3df3dd1731b78208608b4a9438a010e#egg=camel-ai[all]"
# ⬅️ Return to notebook root
%cd ..

In [None]:
from camel.datasets import StaticDataset

from datasets import load_dataset

dataset_dict = load_dataset("camel-ai/loong")
dataset = dataset_dict["train"].filter(lambda example: example['source_type'] == 'graph_discrete_math')

seed_dataset = StaticDataset(dataset)

In [3]:
seed_dataset[0]

DataPoint(question='Given an undirected path graph with 10 vertices, what is the largest independent node set and the list of maximal cliques that can be obtained by repeatedly removing cliques from the graph? Return the result as a 2-tuple, i.e., (largest_independent_node_set, list_of_maximal_cliques), where the first element is a set of nodes in sorted order and the second element is a list of maximal cliques in sorted order (each clique is represented as a set of nodes).', final_answer='({0, 2, 4, 6, 9}, [{0, 1}, {2, 3}, {4, 5}, {6, 7}, {8, 9}])', rationale='import networkx as nx\n\nG = nx.path_graph(10)\nprint(nx.approximation.clique_removal(G))', metadata=None)

The `FewShotGenerator` needs a python interpreter to compute a pseudo ground truth from the code it generated. For this, let's define a `PythonVerifier`.

Note: We will soon use dedicated CAMEL-based code interpreters instead of repurposing our Python verifier for this.
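
To illustrate what "computing a pseudo ground truth from code" means, here is a minimal sketch that runs a code string in a fresh Python process and captures what it prints. `run_rationale` is a hypothetical helper, not how `PythonVerifier` is implemented, and it offers no sandboxing or package isolation.

```python
import subprocess
import sys


def run_rationale(code: str, timeout: float = 10.0) -> str:
    """Execute code in a separate Python process and return its trimmed stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if completed.returncode != 0:
        # Surface the traceback so that a failing rationale can be discarded.
        raise RuntimeError(completed.stderr.strip())
    return completed.stdout.strip()


print(run_rationale("print(sorted({3, 1, 2}))"))
```

The printed output plays the role of the reference answer for the generated question, so a rationale that crashes or times out would simply yield no data point.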

In [4]:
from camel.verifiers import PythonVerifier
from camel.agents import ChatAgent
from camel.extractors import BaseExtractor, BoxedStrategy

interpreter = PythonVerifier(required_packages=["numpy", "networkx"])
await interpreter.setup(uv=True)

Lastly, we need a model backend for the generation agent. Let's use the `ModelFactory` to create one.

Note: We use GPT-4o mini as a default here, hence we load our OpenAI API key. Feel free to use other models!

In [5]:
import os
from getpass import getpass

# Prompt for the API key securely
openai_api_key = getpass('Enter your API key: ')
os.environ["OPENAI_API_KEY"] = openai_api_key

Enter your API key: ··········


In [6]:
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from camel.configs import ChatGPTConfig
from camel.datasets import FewShotGenerator

model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O_MINI,
    model_config_dict=ChatGPTConfig().as_dict(),
)

# Note: when the generator needs new data points, it creates 20 of them by default.
# Since we are paying for the API, let's set this number to 2 instead.
generator = FewShotGenerator(
    buffer=2, seed_dataset=seed_dataset, verifier=interpreter, model=model
)

Next, let's create a verifier that extracts the content inside `\boxed{...}` from the LLM response and compares it semantically to the reference answer.

In [7]:
from camel.verifiers import PythonVerifier
from camel.agents import ChatAgent
from camel.extractors import BaseExtractor, BoxedStrategy

# Initialize extractor
extractor = BaseExtractor([[BoxedStrategy()]])


verifier = PythonVerifier(extractor=extractor, required_packages=["numpy", "networkx"])
await verifier.setup(uv=True)
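
For intuition, the extract-then-compare idea can be sketched in plain Python. `extract_boxed` and `semantically_equal` are hypothetical helpers written for this illustration; they are not the implementation of `BoxedStrategy` or `PythonVerifier`, and "semantic" comparison is assumed here to mean comparing parsed Python literals.

```python
import ast


def extract_boxed(text: str):
    r"""Return the content of the last \boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1
    # Track brace depth so that nested braces (e.g. Python sets) are kept intact.
    for i in range(content_start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # unbalanced braces


def semantically_equal(candidate: str, reference: str) -> bool:
    """Compare two answers as Python literals, falling back to trimmed strings."""
    try:
        return ast.literal_eval(candidate) == ast.literal_eval(reference)
    except (ValueError, SyntaxError, TypeError):
        return candidate.strip() == reference.strip()


response = r"So the independent set is \boxed{{4, 0, 2}}"
print(semantically_equal(extract_boxed(response), "{0, 2, 4}"))
```

Comparing parsed values rather than raw strings means that differences in spacing or set ordering do not count as wrong answers.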

Now that our generator and verifier are set up, let's create a `SingleStepEnv` from them.

We can then call `env.reset()` to sample a question from the underlying generator. The question is returned as an observation, which we can feed into the CoT agent.

In [8]:
from camel.environments import Action, SingleStepEnv

env = SingleStepEnv(generator, verifier)

obs = await env.reset(seed=42)

print(obs)



question='In a complete bipartite graph K(3,3), what are the edges of the graph represented as a list of tuples, where each tuple represents an edge between two nodes?' context={} metadata={}


The agent then processes this observation and selects an action, which we pass to the environment's `step` function. The environment forwards the action to the verifier, which returns a reward based on whether the LLM response matches the reference answer.

Let's first define a CAMEL agent and feed it the observation. Afterwards, we use the `step` function of the environment to get a reward.

In [9]:
agent = ChatAgent(model=model)

USER_PROMPT = r"""
You are an agent designed to answer mathematical questions with clarity and precision. Your task is to provide a step-by-step explanation for
any mathematical problem posed by the user, ensuring the response is easy to follow. Adhere to these guidelines:
Analyze the mathematical question carefully and break down the solution process into clear, logical steps.
Use natural language to explain each step, incorporating LaTeX notation (e.g., $x + 2$)
for mathematical expressions when helpful. Conclude your response with the final answer enclosed
in a LaTeX \boxed{} environment (e.g., \boxed{5}).
Place this at the end of your explanation as a standalone statement.
It should be a Python expression, for example "[1, 2, 3]" for a list.

The question you should answer is: """



Finally, let's compare the agent's output to our pseudo ground truth:

In [10]:
response = agent.step(USER_PROMPT + obs.question).msgs[0].content

result = await env.step(Action(index=0, llm_response=response))

agent.reset()

print(result)

(Observation(question='Episode ended. This is just a placeholder.', context={}, metadata=None), 0.0, True, {'proposed_solution': 'To determine the edges of the complete bipartite graph \\( K(3,3) \\), we need to understand what a complete bipartite graph is. A bipartite graph consists of two distinct sets of vertices, with edges only existing between vertices from different sets, and in \\( K(m,n) \\), there are \\( m \\) vertices in the first set and \\( n \\) vertices in the second set.\n\nIn \\( K(3,3) \\):\n- Let’s denote the vertices in the first set as \\( A_1, A_2, A_3 \\).\n- Let’s denote the vertices in the second set as \\( B_1, B_2, B_3 \\).\n\n### Step-by-Step Explanation\n\n1. **Identify the sets of vertices**:\n   - Set 1 (the first group) contains vertices \\( \\{ A_1, A_2, A_3 \\} \\).\n   - Set 2 (the second group) contains vertices \\( \\{ B_1, B_2, B_3 \\} \\).\n\n2. **Establish edges between sets**:\n   - Each vertex in the first set can connect to each vertex in th

### Environment Loop

Let's see what this looks like in a loop.

In [16]:
for i in range(2):
    obs = await env.reset()
    response = agent.step(USER_PROMPT + obs.question).msgs[0].content

    next_obs, reward, done, info = await env.step(Action(llm_response=response))
    print(f"Reward at step {i}: {reward}")
    agent.reset()  # clear the agent's context window

Reward at step 0: 10.0
Reward at step 1: 10.0
