# Working with the Loong environment

The Loong *environment* is a unified interface that can be used for synthetic data generation, RL training, and benchmarking agents. It integrates the primitives we implemented in CAMEL into a single, convenient interface for developers and researchers. In this cookbook, we explain how to initialize a *single-step environment* to generate synthetic data. More cookbooks on RL training and on customizing the environment are coming soon.

This type of environment is called a *single-step* environment because the agent takes only one step: it receives a question sampled from the dataset (the initial state, or observation) and answers it. The answer is then scored by a reward function. Recently, rule-based reward functions, i.e. functions without any learnable parameters, have been used successfully for RL with LLMs as the policy.
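As an illustration of a rule-based reward, here is a minimal sketch that returns 1.0 when the answer inside `\boxed{}` matches the reference string (ignoring spaces) and 0.0 otherwise. The function `rule_based_reward` is hypothetical and is not the reward used by `SingleStepEnv`; it only shows that such a reward needs no learnable parameters.

```python
import re


def rule_based_reward(response: str, ground_truth: str) -> float:
    """Hypothetical rule-based reward: fixed rules, no learnable parameters."""
    # Find the (last) answer written as \boxed{...}; nested braces are not handled
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    if not matches:
        return 0.0  # no answer in the expected format
    answer = matches[-1].replace(" ", "")
    return 1.0 if answer == ground_truth.replace(" ", "") else 0.0


print(rule_based_reward(r"... so the limit is \boxed{-9/2}", "-9/2"))  # 1.0
print(rule_based_reward(r"... so the limit is \boxed{-4}", "-9/2"))  # 0.0
print(rule_based_reward("no boxed answer here", "-9/2"))  # 0.0
```

Exact string matching is the simplest possible rule; it would reject `-4.5` even though it equals `-9/2`, which is why verifiers usually check for semantic equivalence instead.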

Since many RL algorithms (such as GRPO) need multiple rollouts per step, batching is important for running those rollouts concurrently. This notebook shows how to use batched environments.
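To see why a single rollout per question is not enough, here is a minimal sketch of the group-relative advantage used in GRPO-style training: each rollout's reward is normalized by the mean and standard deviation of the rewards in its group. This is an illustration, not CAMEL code; the function name is hypothetical, and whether the population or sample standard deviation is used varies between implementations.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against the other rollouts for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for the same question, each scored 1.0 (correct) or 0.0 (wrong)
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1.0, -1.0, -1.0, 1.0]

# A single rollout carries no relative signal: its advantage is always 0
print(group_relative_advantages([1.0]))  # [0.0]
```

Because the learning signal comes from comparing rollouts of the same question, the environment has to serve several of them per step, which is where batching pays off.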

First, we load a dataset from which to sample questions. The dataset can be either a `StaticDataset`, which is finite, or a `BaseGenerator`, which is an infinite supply of question-answer pairs, synthetically generated in a way that depends on the implementation. A `BaseGenerator` has to be seeded with a *seed dataset*: each generator uses the seed dataset it was initialized with to generate new data.
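The difference between a finite dataset and an infinite, seeded generator can be sketched with a toy class. `ToyGenerator` is hypothetical and not CAMEL's `BaseGenerator`: it refills an internal queue with `buffer` new items whenever the queue runs dry, and its "generation" step is a placeholder where a real generator would, for example, call an LLM.

```python
import itertools
import random


class ToyGenerator:
    """Hypothetical infinite generator seeded by a static dataset.

    Whenever its queue is empty, it produces `buffer` new data points
    from randomly chosen seed examples.
    """

    def __init__(self, seed_data, buffer=20, seed=42):
        self.seed_data = seed_data
        self.buffer = buffer
        self.rng = random.Random(seed)
        self.queue = []
        self.batches_generated = 0

    def _generate_batch(self):
        self.batches_generated += 1
        batch = []
        for _ in range(self.buffer):
            example = self.rng.choice(self.seed_data)
            # Placeholder for the real generation step (e.g. an LLM call):
            # here we only tag the seed question with a variant number.
            batch.append({
                "question": f"{example['question']} (variant {self.rng.randint(0, 999)})",
                "final_answer": example["final_answer"],
            })
        return batch

    def __iter__(self):
        return self

    def __next__(self):
        if not self.queue:
            self.queue = self._generate_batch()
        return self.queue.pop(0)


seed_data = [
    {"question": "Compute 2 + 2.", "final_answer": "4"},
    {"question": "Compute 3 * 3.", "final_answer": "9"},
]

# A static dataset is finite: iterating over it ends after two items
print(len(list(seed_data)))  # 2

# The generator never runs out; five draws with buffer=2 trigger three refills
gen = ToyGenerator(seed_data, buffer=2)
samples = list(itertools.islice(gen, 5))
print(len(samples), gen.batches_generated)  # 5 3
```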

In this cookbook, we use the `FewShotGenerator`, which generates new data points by simple few-shot prompting, using random data points from the seed dataset as examples.
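The few-shot idea can be sketched as follows. This is a minimal, hypothetical illustration of the prompting pattern, not the prompt that `FewShotGenerator` actually uses: it samples seed examples at random, formats them as demonstrations, and asks the model for a new question with a rationale.

```python
import random


def build_few_shot_prompt(seed_data, num_examples=2, rng=None):
    """Hypothetical helper: format random seed examples as demonstrations."""
    if rng is None:
        rng = random.Random()
    # Sample without replacement so the same example is not shown twice
    examples = rng.sample(seed_data, k=min(num_examples, len(seed_data)))
    parts = []
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}:\n"
            f"Question: {ex['question']}\n"
            f"Rationale:\n{ex['rationale']}\n"
        )
    # Ask the model for a new data point in the same format
    parts.append(
        "Write a new question in the same style, plus a Python rationale "
        "that prints its final answer."
    )
    return "\n".join(parts)


toy_seed = [
    {"question": "Compute 2 + 2.", "rationale": "print(2 + 2)"},
    {"question": "Compute 3 * 3.", "rationale": "print(3 * 3)"},
    {"question": "Compute 10 - 4.", "rationale": "print(10 - 4)"},
]
print(build_few_shot_prompt(toy_seed, num_examples=2, rng=random.Random(0)))
```

Sampling different examples for each request is what keeps the generated questions varied while staying close to the seed distribution.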

A seed dataset is simply a `StaticDataset`, so let's initialize ours as one.

In [None]:
from camel.datasets import StaticDataset
from camel.logger import get_logger

logger = get_logger(__name__)

raw_data = [
    {
        "question": "Evaluate the limit as x approaches 0 of (sin(3*x) - 3*x) / x**3.",  # noqa: E501
        "final_answer": "-9/2",
        "rationale": '''from sympy import symbols, limit, sin
x = symbols('x')
expr = (sin(3*x) - 3*x) / x**3
result = limit(expr, x, 0)
print(result)''',
    },
    {
        "question": "Evaluate the definite integral of (1 - x**2)**3 from x = 0 to x = 1.",  # noqa: E501
        "final_answer": "16/35",
        "rationale": '''from sympy import symbols, integrate
x = symbols('x')
expr = (1 - x**2)**3
result = integrate(expr, (x, 0, 1))
print(result)''',
    },
    {
        "question": "Evaluate the limit as n approaches infinity of n*(sqrt(n**2 + 1) - n).",  # noqa: E501
        "final_answer": "1/2",
        "rationale": '''from sympy import symbols, limit, sqrt, oo
n = symbols('n', positive=True)
expr = n*(sqrt(n**2 + 1) - n)
result = limit(expr, n, oo)
print(result)''',
    },
    {
        "question": "Compute the sum of the series sum from n = 1 to 50 of 1/(n*(n+1)).",  # noqa: E501
        "final_answer": "50/51",
        "rationale": '''from sympy import symbols, summation
n = symbols('n', positive=True, integer=True)
expr = 1/(n*(n+1))
result = summation(expr, (n, 1, 50))
print(result)''',
    },
]

seed_dataset = StaticDataset(raw_data)
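Since every record above carries a `question`, a `final_answer`, and a `rationale`, a quick sanity check before wrapping new data can catch incomplete records early. The helper `missing_fields` below is a hypothetical sketch, not part of CAMEL; `StaticDataset` may perform its own validation, and the list of required fields is an assumption based on the records in this cookbook.

```python
REQUIRED_FIELDS = ("question", "final_answer", "rationale")


def missing_fields(records):
    """Hypothetical check: list (index, field) pairs that are absent or empty."""
    problems = []
    for i, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            if not str(record.get(field, "")).strip():
                problems.append((i, field))
    return problems


toy_records = [
    {"question": "Compute 1 + 1.", "final_answer": "2", "rationale": "print(1 + 1)"},
    {"question": "Compute 2 * 3.", "final_answer": ""},
]
print(missing_fields(toy_records))  # [(1, 'final_answer'), (1, 'rationale')]
```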

The `FewShotGenerator` needs a Python interpreter to compute a pseudo ground truth by running the code it generated. For this, let's define a `PythonVerifier`.

In [None]:
from camel.verifiers import PythonVerifier

verifier = PythonVerifier(required_packages=["sympy"])
await verifier.setup(uv=True)
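Conceptually, computing a pseudo ground truth means executing the rationale and capturing what it prints. The sketch below illustrates only that idea with the standard library: it runs the code in a fresh process of the current interpreter, without any package installation or isolation that `PythonVerifier` may provide, so `pseudo_ground_truth` is a hypothetical helper rather than the verifier's implementation. To stay dependency-free, it recomputes the last seed example with `fractions.Fraction` instead of `sympy`.

```python
import subprocess
import sys


def pseudo_ground_truth(rationale: str, timeout: float = 10.0) -> str:
    """Run a rationale in a fresh Python process and return what it printed."""
    completed = subprocess.run(
        [sys.executable, "-c", rationale],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    completed.check_returncode()  # raise CalledProcessError if the rationale crashed
    return completed.stdout.strip()


# Same series as the last seed example, computed with exact fractions instead of sympy
rationale = """from fractions import Fraction
print(sum(Fraction(1, n * (n + 1)) for n in range(1, 51)))"""
print(pseudo_ground_truth(rationale))  # 50/51
```

Running the code in a separate process means a crashing or hanging rationale cannot take down the notebook itself, although it is not a security sandbox.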

Lastly, we need a model backend for the generation agent. Let's use the `ModelFactory` to create one.

Note: We use GPT-4o mini as the default here, so we load our OpenAI API key (e.g. from a `.env` file). Feel free to use other models!

In [None]:
from dotenv import load_dotenv

load_dotenv()

In [None]:
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from camel.configs import ChatGPTConfig
from camel.datasets import FewShotGenerator

model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O_MINI,
    model_config_dict=ChatGPTConfig().as_dict(),
)

# Note: When the generator runs out of data points, it creates 20 new ones by default.
# Since we are paying for the API, let's set this number to 2 instead.
generator = FewShotGenerator(
    buffer=2, seed_dataset=seed_dataset, verifier=verifier, model=model
)

Now that our generator is all set up, let's create a `SingleStepEnv` with it.

We also need to supply a verifier that checks for semantic equivalence between the response of the CoT agent (the one that we want to train with RL) and the synthetic answer.
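"Semantic equivalence" means that answers such as `-9/2`, `-4.5`, and `-\frac{9}{2}` should count as the same. A minimal sketch of that idea for rational-valued answers follows; `answers_equivalent` is hypothetical, handles only integers, decimals, simple fractions, and `\frac{a}{b}`, and falls back to string comparison otherwise. It is not how `PythonVerifier` compares answers.

```python
import re
from fractions import Fraction


def _normalize(answer: str) -> str:
    # Drop surrounding whitespace and $...$ math delimiters, then inner spaces
    s = answer.strip().strip("$").replace(" ", "")
    # Rewrite \frac{a}{b} as a/b so Fraction can parse it
    return re.sub(r"\\frac\{(-?\d+)\}\{(\d+)\}", r"\1/\2", s)


def answers_equivalent(a: str, b: str) -> bool:
    """Hypothetical check: equal if both parse to the same rational number."""
    na, nb = _normalize(a), _normalize(b)
    try:
        return Fraction(na) == Fraction(nb)
    except (ValueError, ZeroDivisionError):
        # Not a plain rational number: fall back to exact string comparison
        return na == nb


print(answers_equivalent("-9/2", "-4.5"))  # True
print(answers_equivalent(r"$-\frac{9}{2}$", "-9/2"))  # True
print(answers_equivalent("16/35", "0.457"))  # False
```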

We can then call `env.reset()` to sample a question from the underlying generator and return it as an observation, which we then feed into the CoT agent.
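The reset/step cycle of a single-step environment can be sketched with a toy class. `ToySingleStepEnv` is hypothetical and synchronous, whereas the CAMEL environment is used with `await`; it only illustrates that one episode is one observation, one answer, and one reward, after which the episode is done. The variables are prefixed with `toy_` so that running the sketch does not overwrite `env` or `obs` in this notebook.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyObservation:
    question: str


@dataclass
class ToyStepResult:
    reward: float
    done: bool


class ToySingleStepEnv:
    """Hypothetical single-step environment: reset() samples a question,
    step() scores one answer and immediately ends the episode."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.current = None

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.current = rng.choice(self.dataset)
        return ToyObservation(question=self.current["question"])

    def step(self, answer: str) -> ToyStepResult:
        if self.current is None:
            raise RuntimeError("call reset() before step()")
        reward = 1.0 if answer.strip() == self.current["final_answer"] else 0.0
        self.current = None  # exactly one step per episode
        return ToyStepResult(reward=reward, done=True)


toy_env = ToySingleStepEnv([{"question": "Compute 2 + 2.", "final_answer": "4"}])
toy_obs = toy_env.reset(seed=42)
print(toy_obs.question)  # Compute 2 + 2.
print(toy_env.step("4"))  # ToyStepResult(reward=1.0, done=True)
```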

In [None]:
from camel.environments import Action, SingleStepEnv

env = SingleStepEnv(generator, verifier)

obs = await env.reset(seed=42)

print(obs)

The agent then processes this observation and selects an action, which is passed to the `step` function. For this, we use an *extractor*: it takes the LLM response and extracts its verifiable part. Extractors can be initialized with different strategies, which modify the extraction behavior. The extracted answer is then compared to the pseudo ground truth that was computed internally by running the code in the rationale with the `PythonVerifier`.

In [None]:
from camel.agents import ChatAgent
from camel.extractors import BaseExtractor, BoxedStrategy

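# Initialize the extractor with a strategy that pulls out the answer inside \boxed{}
extractor = BaseExtractor([[BoxedStrategy()]])
await extractor.setup()

# The CoT agent whose answers we want to verify
agent = ChatAgent(model=model)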

SYSTEM_PROMPT = r"""
You are an agent designed to answer mathematical questions with clarity and precision. Your task is to provide a step-by-step explanation for 
any mathematical problem posed by the user, ensuring the response is easy to follow. Adhere to these guidelines:
Analyze the mathematical question carefully and break down the solution process into clear, logical steps.
Use natural language to explain each step, incorporating LaTeX notation (e.g., $x + 2$) 
for mathematical expressions when helpful. Conclude your response with the final answer enclosed 
in a LaTeX \boxed{} environment (e.g., \boxed{5}). 
Place this at the end of your explanation as a standalone statement.

The question you should answer is: """
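A boxed-answer strategy conceptually has to find the last `\boxed{...}` in the response and return its contents, matching braces so that nested LaTeX such as `\frac{9}{2}` survives. The function below is a minimal, hypothetical sketch of that idea, not CAMEL's `BoxedStrategy` implementation.

```python
def extract_last_boxed(text: str):
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Braces are matched by depth so nested LaTeX like \\frac{9}{2} survives.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces


print(extract_last_boxed(r"So the limit is $\boxed{-\frac{9}{2}}$."))  # -\frac{9}{2}
```

This is also why the system prompt asks for the final answer in a standalone `\boxed{}` at the end: a predictable format makes extraction reliable.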



Finally, let's compare the agent's output to our pseudo ground truth:

In [None]:
response = agent.step(SYSTEM_PROMPT + obs.question)

proposed_solution = await extractor.extract(response.msgs[0].content)

result = await env.step(Action(index=0, llm_response=proposed_solution))

print(result)