In [1]:
# First, install the required dependencies: TorchRL with LLM support, and
# the Playwright browsers. Uncomment the lines below to run the installation.

# Install TorchRL with LLM dependencies
# %pip install "torchrl[llm]" --quiet

# Install the Playwright browsers used for browser automation
# import subprocess
# import sys

# try:
#     result = subprocess.run(
#         [sys.executable, "-m", "playwright", "install"],
#         capture_output=True, text=True, check=True,
#     )
#     print("✅ Playwright browsers installed successfully")
# except subprocess.CalledProcessError as e:
#     print(f"❌ Error installing playwright browsers: {e}")
#     print("Stdout:", e.stdout)
#     print("Stderr:", e.stderr)

Note: you may need to restart the kernel to use updated packages.
✅ Playwright browsers installed successfully



# TorchRL LLM: Building Tool-Enabled Environments

**Author**: [Vincent Moens](https://github.com/vmoens)


This tutorial demonstrates how to build and compose LLM environments with tool capabilities
in TorchRL. We'll show how to create a complete environment that can execute tools,
format responses, and handle interactions between the LLM and external tools.

The tutorial uses web browsing as a concrete example, but the concepts apply to any
tool integration in TorchRL's LLM framework.

Main takeaways:

- Understanding TorchRL's LLM environment composition
- Creating and appending tool transforms
- Formatting tool responses and LLM interactions
- Handling tool execution and state management

Prerequisites: Basic familiarity with TorchRL's environment concepts.


## Installation

First, install TorchRL with LLM support. If you're running this in a Jupyter
notebook, you can install the packages using:

```bash
%pip install "torchrl[llm]"    # Install TorchRL with all LLM dependencies
```
The `torchrl[llm]` extra installs the dependencies needed for LLM functionality,
including transformers, vllm, and playwright for browser automation.

After installation, you'll need to set up the browser automation components:

```bash
!playwright install            # Install browser binaries
```
Note: The `!` and `%pip` prefixes are specific to Jupyter notebooks. In a regular
terminal, use these commands without the prefixes.
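
Before moving on, it can help to confirm that the packages are visible to the current kernel. The snippet below is a minimal sketch using only the standard library. The list of module names is an assumption based on the dependencies named above plus the imports used later in this tutorial, not an official manifest, and `find_missing` is a hypothetical helper.

```python
import importlib.util
from typing import List

# Import names to probe. This list is an assumption, not an official manifest.
REQUIRED_MODULES = ["torchrl", "tensordict", "transformers", "vllm", "playwright"]


def find_missing(modules: List[str]) -> List[str]:
    """Return the subset of `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialized modules.
            found = False
        if not found:
            missing.append(name)
    return missing


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Note that this only checks that the Python packages can be found; it does not verify that the Playwright browser binaries were downloaded.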



## Environment Setup

TorchRL's LLM interface is built around composable environments and transforms.
The key components are:

1. A base environment (ChatEnv)
2. Tool execution transforms
3. Data loading transforms
4. Reward computation transforms

Let's import the necessary components and set up our environment.



In [20]:
from __future__ import annotations

import warnings

import torch

from tensordict import set_list_to_stack, TensorDict
from torchrl import torchrl_logger
from torchrl.data import CompositeSpec, Unbounded
from torchrl.envs import Transform
from torchrl.envs.llm import ChatEnv
from torchrl.envs.llm.transforms.browser import BrowserTransform
from transformers import AutoTokenizer

warnings.filterwarnings("ignore")
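
To make the composition idea concrete before using the real classes, here is a toy sketch in plain Python. `ToyEnv`, `run_tool`, and `add_reward` are hypothetical stand-ins invented for illustration; they are not TorchRL classes and do not reflect how `ChatEnv` is implemented. The sketch only shows the pattern: a base environment holds an ordered list of transforms, and each step passes the state through them in turn.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
TransformFn = Callable[[State], State]


class ToyEnv:
    """A toy stand-in for an environment (NOT TorchRL's ChatEnv)."""

    def __init__(self) -> None:
        self.transforms: List[TransformFn] = []

    def append_transform(self, fn: TransformFn) -> "ToyEnv":
        # Return self so that calls can be chained.
        self.transforms.append(fn)
        return self

    def step(self, state: State) -> State:
        out = dict(state)  # shallow copy so the caller's dict is not modified
        for fn in self.transforms:
            out = fn(out)
        return out


def run_tool(state: State) -> State:
    """Hypothetical tool transform: append a canned tool response."""
    history = list(state["history"]) + ["<tool_response>ok</tool_response>"]
    return {**state, "history": history}


def add_reward(state: State) -> State:
    """Hypothetical reward transform: 1.0 if a tool response is present."""
    has_response = any("tool_response" in msg for msg in state["history"])
    return {**state, "reward": 1.0 if has_response else 0.0}


toy_env = ToyEnv().append_transform(run_tool).append_transform(add_reward)
result = toy_env.step({"history": ["<tool>browser {}</tool>"]})
print(result["reward"])
```

The order of the transforms matters in this sketch: the reward transform only sees the tool response because it is appended after the tool transform.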

## Step 1: Basic Environment Configuration

We'll create a ChatEnv and configure it with browser automation capabilities.
First, we enable list-to-stack conversion for TensorDict, which is required
for proper batch handling in LLM environments.



In [21]:
# Enable list-to-stack conversion for TensorDict
set_list_to_stack(True).set()

Now we'll create the tokenizer and base environment. The environment requires
a batch size, even if we're only running a single instance.



In [22]:
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Note: an earlier version of this tutorial also passed `apply_template=True`;
# that argument is not accepted by the ChatEnv version used here, so it is omitted.
env = ChatEnv(
    batch_size=(1,),
    tokenizer=tokenizer,
    system_prompt=(
        "You are a helpful assistant that can use tools to accomplish tasks. "
        "Tools will be executed and their responses will be added to our conversation."
    ),
)

Next, we'll add the browser transform with safety configurations. This transform
enables web browsing capabilities with domain restrictions for security.



In [23]:
# Restrict browsing to an explicit allow-list of domains.
browser_transform = BrowserTransform(
    allowed_domains=["google.com", "github.com"],
    headless=False,  # False opens a visible browser window; set to True to hide it
)
env = env.append_transform(browser_transform)

# If construction fails with your TorchRL version, inspect the accepted arguments:
# import inspect
# print(inspect.signature(BrowserTransform.__init__))
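
To illustrate what a domain restriction of this kind can look like, the sketch below checks a URL against an allow-list with the standard library. This is an assumption about the general technique, not `BrowserTransform`'s actual implementation, and `is_allowed` is a hypothetical helper.

```python
from typing import Sequence
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: Sequence[str]) -> bool:
    """Return True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


allowed = ["google.com", "github.com"]
print(is_allowed("https://www.google.com/", allowed))
print(is_allowed("https://google.com.example.org/", allowed))
```

Matching on the parsed hostname, rather than searching the raw URL string, avoids accepting look-alike hosts that merely contain an allowed domain as a substring.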

We can also design a transform to assign rewards to the environment.
For example, we can parse the result of the browser transform to assign a reward
whenever specific goals are achieved. Very simply, in this example, we will assign
a reward of 2 if the LLM finds the answer to the question (Paris), a reward of 1 if it
reaches the desired website, and a reward of 0 otherwise.



In [24]:
class RewardTransform(Transform):
    """A transform that assigns rewards based on the LLM's responses.

    This transform parses the browser responses in the environment's history and assigns
    rewards based on specific achievements:

    - Finding the correct answer (Paris): reward = 2.0
    - Successfully reaching Google: reward = 1.0
    - Otherwise: reward = 0.0

    """

    def _call(self, tensordict: TensorDict) -> TensorDict:
        """Process the tensordict and assign rewards based on the LLM's response.

        Args:
            tensordict (TensorDict): The tensordict containing the environment state.
                Must have a "history" key containing the conversation history.

        Returns:
            TensorDict: The tensordict with an added "reward" key containing the
                computed reward with shape (B, 1) where B is the batch size.
        """
        # ChatEnv maintains the conversation history. We take the last entry of
        # `history.prompt` and check whether "Paris" appears in its content.
        # The hard-coded (1, 1) reward shape assumes a single-instance environment.
        history = tensordict["history"]
        
        # Check if history has prompt attribute and it's not None
        if hasattr(history, 'prompt') and history.prompt is not None:
            last_item = history.prompt[-1]
            # Check if the last item has content attribute
            if hasattr(last_item, 'content') and "Paris" in str(last_item.content):
                torchrl_logger.info("Found the answer to the question: Paris")
                # Recall that rewards have a trailing singleton dimension.
                tensordict["reward"] = torch.full((1, 1), 2.0)
            # Check if we successfully reached the website
            elif (
                hasattr(last_item, 'content') and 
                "google.com" in str(last_item.content) and
                "executed successfully" in str(last_item.content)
            ):
                torchrl_logger.info("Reached the website google.com")
                tensordict["reward"] = torch.full((1, 1), 1.0)
            else:
                tensordict["reward"] = torch.full((1, 1), 0.0)
        else:
            # If no proper history available, give zero reward
            tensordict["reward"] = torch.full((1, 1), 0.0)
        return tensordict

    def transform_reward_spec(self, reward_spec: CompositeSpec) -> CompositeSpec:
        """Transform the reward spec to include our custom reward.

        This method is required to override the reward spec since the environment
        is initially reward-agnostic.

        Args:
            reward_spec (CompositeSpec): The original reward spec from the environment.

        Returns:
            CompositeSpec: The transformed reward spec with our custom reward definition.
                The reward will have shape (B, 1) where B is the batch size.
        """
        reward_spec["reward"] = Unbounded(
            shape=reward_spec.shape + (1,), dtype=torch.float32
        )
        return reward_spec


# We append the reward transform to the environment.
env = env.append_transform(RewardTransform())
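
The scoring rule itself is independent of TorchRL and can be checked in isolation. The sketch below restates it as a plain function over a string; `score_response` is a hypothetical helper written for illustration, and the sample strings are adapted from the tool responses this tutorial produces.

```python
def score_response(content: str) -> float:
    """Mirror the reward rule of RewardTransform on a plain string."""
    if "Paris" in content:
        return 2.0  # the answer was found
    if "google.com" in content and "executed successfully" in content:
        return 1.0  # the target website was reached
    return 0.0


navigated = (
    "Tool browser executed successfully:\n"
    "{'success': True, 'result': {'url': 'https://www.google.com/', 'status': 200}}"
)
failed = "Tool browser failed:\nTool execution failed: This event loop is already running"

print(score_response(navigated))
print(score_response(failed))
print(score_response("The capital of France is Paris."))
```

Because the rule is a simple substring match, any response mentioning "Paris" earns the full reward, regardless of context. That is acceptable for a demonstration but too permissive for real training.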

## Step 2: Tool Execution Helper

To make our interaction with tools more organized, we'll create a helper function
that executes tool actions and displays the results.



In [25]:
# Note: an earlier version of this helper wrote the action to a "text_response"
# entry; the version below passes it as a History wrapped in a ChatHistory instead.
from torchrl.data.llm.history import History
from torchrl.modules.llm.policies.common import ChatHistory

def execute_tool_action(
    env: ChatEnv,
    current_state: TensorDict,
    action: str,
    verbose: bool = True,
) -> tuple[TensorDict, TensorDict]:
    """Execute a tool action and show the formatted interaction."""
    
    # Create a simple assistant response history that matches the expected structure
    # Based on the ChatEnv design, we need to create a ChatHistory with "full" containing the response
    assistant_history = History(
        role="assistant", 
        content=action, 
        is_complete=True
    ).unsqueeze(0)  # Add batch dimension to match environment expectations
    
    # Create ChatHistory object with the assistant response as "full"
    # This is what ChatEnv expects for input during step
    chat_history = ChatHistory(full=assistant_history)
    
    # Create the input tensordict
    s = current_state.clone().set("history", chat_history)
    
    # Step the environment
    s, s_ = env.step_and_maybe_reset(s)

    if verbose:
        print("\nLLM Action:")
        print("-----------")
        print(action)
        print("\nEnvironment Response:")
        print("--------------------")
        if "history" in s_ and s_["history"].prompt is not None:
            torchrl_logger.info(s_["history"].prompt.apply_chat_template(tokenizer=env.tokenizer))
        else:
            print("No history available in response")

    return s, s_
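
The helper prints the conversation through the tokenizer's chat template. As a rough illustration of the layout that appears in the logged outputs of this tutorial, the sketch below renders a list of messages in the same `<|im_start|>` / `<|im_end|>` style. `render_chatml` is a hypothetical function that approximates the format by hand; the real template ships with the tokenizer and may differ in details.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in a ChatML-like layout (an approximation, not the tokenizer's template)."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should respond next.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


rendered = render_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ]
)
print(rendered)
```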

## Step 3: Starting the Interaction

Let's begin by initializing the environment with a question and navigating
to a search engine. Note that the tensordict used as input to the environment
must share the same batch size as the environment. The text query is put in a list
of length 1, such that it is compatible with the environment's batch size.
The corresponding `env.reset` call appears in the event-loop section below.



Now we'll navigate to Google using the browser transform. The transform
expects actions in a specific JSON format wrapped in `<tool>` tags.
In practice, this action would be the output of an LLM. Here we write the
response string by hand, and `execute_tool_action` wraps it in a `ChatHistory`
stored under the `"history"` key.

**Note:** The browser actions may fail in Jupyter notebooks due to event loop conflicts 
("This event loop is already running"). This is a known limitation when mixing async 
operations with Jupyter's event loop. The tool calling framework and conversation 
management still work correctly - this demonstrates the core TorchRL LLM concepts 
even if the actual browser automation fails.
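
To make the action format concrete, the sketch below builds a tool call in the `<tool>name {json}</tool>` layout used throughout this tutorial and parses it back. `format_tool_call`, `parse_tool_calls`, and the regular expression are hypothetical and written for illustration; TorchRL's own parser may accept a different or stricter syntax.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# Tool name, then a JSON object, wrapped in <tool>...</tool> (assumed layout).
TOOL_PATTERN = re.compile(r"<tool>\s*(\w+)\s*(\{.*?\})\s*</tool>", re.DOTALL)


def format_tool_call(name: str, payload: Dict[str, Any]) -> str:
    """Serialize a tool call into the tagged text form."""
    return f"<tool>{name}\n{json.dumps(payload, indent=4)}\n</tool>"


def parse_tool_calls(text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Extract (tool_name, arguments) pairs from a model response."""
    return [(name, json.loads(raw)) for name, raw in TOOL_PATTERN.findall(text)]


message = "Let me search for that:\n" + format_tool_call(
    "browser", {"action": "navigate", "url": "https://google.com"}
)
print(message)
print(parse_tool_calls(message))
```

Building the call with `json.dumps` rather than by hand guarantees valid JSON, which matters once the strings come from code instead of being typed into a notebook.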

In [None]:
# s, s_ = execute_tool_action(
#     env,
#     reset,
#     """
#     Let me search for that:
#     <tool>browser
#     {
#         "action": "navigate",
#         "url": "https://google.com"
#     }
#     </tool><|im_end|>
#     """,
# )


LLM Action:
-----------

    Let me search for that:
    <tool>browser
    {
        "action": "navigate",
        "url": "https://google.com"
    }
    </tool><|im_end|>
    

Environment Response:
--------------------
2025-09-16 14:55:32,573 [torchrl][INFO]    ['<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>assistant\n\n    Let me search for that:\n    <tool>browser\n    {\n        "action": "navigate",\n        "url": "https://google.com"\n    }\n    </tool><|im_end|>\n    <|im_end|>\n    <|im_start|>user\n<tool_response>\nTool browser failed:\nTool execution failed: This event loop is already running\n</tool_response><|im_end|>\n<|im_start|>assistant\n'] [END]


### Fix Event Loop (Advanced)

If you want to use the real browser automation, you can patch the event loop with `nest_asyncio`. First, reset the environment with the question, then apply the patch:

In [26]:
reset = env.reset(
    TensorDict(
        # text=["What is the capital of France?"],
        query=["What is the capital of France?"],
        batch_size=(1,),
    )
)

In [27]:
# Patch the event loop so that real browser automation works inside Jupyter.
# CAUTION: This modifies the global event loop and may have side effects.
# Requires the `nest_asyncio` package.

import nest_asyncio

# Enable nested event loops (allows async code to run in Jupyter)
nest_asyncio.apply()

print("✅ Event loop patched! You can now use the real BrowserTransform.")
print("Note: You may need to restart the kernel if you encounter issues.")

✅ Event loop patched! You can now use the real BrowserTransform.
Note: You may need to restart the kernel if you encounter issues.
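
For context, the error message seen earlier can be reproduced with the standard library alone. The sketch below is an assumption about the likely mechanism (code trying to drive an event loop that is already running, as Jupyter's is); it does not show what `BrowserTransform` does internally. Run it as a plain script rather than inside a notebook, since `asyncio.run` itself cannot be called from a running loop.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    """Try to drive the already-running loop, as a blocking call would."""
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        # A loop cannot be re-entered while it is running.
        loop.run_until_complete(coro)
    except RuntimeError as err:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(err)
    return "no error"


message = asyncio.run(outer())
print(message)
```

`nest_asyncio.apply()` patches asyncio so that this kind of re-entrant call is permitted, which is why the browser steps below succeed after the patch.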


## Step 4: Navigate to Google

Now that we've fixed the event loop issue with `nest_asyncio.apply()`, we can properly navigate to Google. The previous navigation attempt failed due to the async conflict, so we need to retry it.

In [28]:
# With the event loop patched, navigate to google.com.
# This is the step that failed earlier because of the async conflict,
# so we start again from the `reset` state.
s, s_ = execute_tool_action(
    env,
    reset,  # Start from the reset state; no valid `s_` exists yet
    """
    Let me navigate to Google to search for the capital of France:
    <tool>browser
    {
        "action": "navigate",
        "url": "https://google.com"
    }
    </tool>
    """,
)

2025-09-16 16:14:47,428 [torchrl][INFO]    Reached the website google.com [END]

LLM Action:
-----------

    Let me navigate to Google to search for the capital of France:
    <tool>browser
    {
        "action": "navigate",
        "url": "https://google.com"
    }
    </tool>
    

Environment Response:
--------------------
2025-09-16 16:14:47,432 [torchrl][INFO]    ['<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>assistant\n\n    Let me navigate to Google to search for the capital of France:\n    <tool>browser\n    {\n        "action": "navigate",\n        "url": "https://google.com"\n    }\n    </tool>\n    <|im_end|>\n    <|im_start|>user\n<tool_response>\nTool browser executed successfully:\n{\'success\': True, \'result\': {\'url\': \'https://www.google.com/\', \'status\': 200}}\n</tool_response><|im_end|>\n<|im_start|>assistant\n'] [END]


## Step 5: Performing the Search

With the browser open, we can now type our query and execute the search.
First, we'll type the search query into Google's search box.



In [29]:
s, s_ = execute_tool_action(
    env,
    s_,
    """
    Let me type the search query:
    <tool>browser
    {
        "action": "type",
        "selector": "[name='q']",
        "text": "What is the capital of France?"
    }
    </tool><|im_end|>
    """,
)


LLM Action:
-----------

    Let me type the search query:
    <tool>browser
    {
        "action": "type",
        "selector": "[name='q']",
        "text": "What is the capital of France?"
    }
    </tool><|im_end|>
    

Environment Response:
--------------------
2025-09-16 16:15:21,666 [torchrl][INFO]    ['<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>assistant\n\n    Let me type the search query:\n    <tool>browser\n    {\n        "action": "type",\n        "selector": "[name=\'q\']",\n        "text": "What is the capital of France?"\n    }\n    </tool><|im_end|>\n    <|im_end|>\n    <|im_start|>user\n<tool_response>\nTool browser executed successfully:\n{\'success\': True, \'result\': {\'typed\': \'What is the capital of France?\', \'into\': "[name=\'q\']"}}\n</tool_response><|im_end|>\n<|im_start|>assistant\n'] [END]


Next, we'll click the search button to execute the search. Note how we
use CSS selectors to identify elements on the page.



In [31]:
s, s_ = execute_tool_action(
    env,
    s_,
    """
    Now let me click the search button:
    <tool>browser
    {
        "action": "click",
        "selector": "[name='btnK']"
    }
    </tool><|im_end|>
    """,
)


LLM Action:
-----------

    Now let me click the search button:
    <tool>browser
    {
        "action": "click",
        "selector": "[name='btnK']"
    }
    </tool><|im_end|>
    

Environment Response:
--------------------
2025-09-16 16:16:42,419 [torchrl][INFO]    ['<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>assistant\n\n    Now let me click the search button:\n    <tool>browser\n    {\n        "action": "click",\n        "selector": "[name=\'btnK\']"\n    }\n    </tool><|im_end|>\n    <|im_end|>\n    <|im_start|>user\n<tool_response>\nTool browser executed successfully:\n{\'success\': True, \'result\': {\'clicked\': "[name=\'btnK\']"}}\n</tool_response><|im_end|>\n<|im_start|>assistant\n'] [END]


## Step 6: Extracting Results

Finally, we'll extract the search results from the page. The browser transform
can extract both text content and HTML from specified elements.



In [32]:
s, s_ = execute_tool_action(
    env,
    s_,
    """
    Let me extract the results:
    <tool>browser
    {
        "action": "extract",
        "selector": "#search",
        "extract_type": "text"
    }
    </tool><|im_end|>
    """,
)

2025-09-16 16:16:54,108 [torchrl][INFO]    Found the answer to the question: Paris [END]

LLM Action:
-----------

    Let me extract the results:
    <tool>browser
    {
        "action": "extract",
        "selector": "#search",
        "extract_type": "text"
    }
    </tool><|im_end|>
    

Environment Response:
--------------------
2025-09-16 16:16:54,110 [torchrl][INFO]    ['<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>assistant\n\n    Let me extract the results:\n    <tool>browser\n    {\n        "action": "extract",\n        "selector": "#search",\n        "extract_type": "text"\n    }\n    </tool><|im_end|>\n    <|im_end|>\n    <|im_start|>user\n<tool_response>\nTool browser executed successfully:\n{\'success\': True, \'result\': {\'content\': \'People also askWhat are the two capitals of France?Brazzaville (1940–1943), with metropolitan France under Axis powers rule, Brazzaville was announced as the seat of the Free France

Let's close the environment.



In [34]:
env.close()

## Conclusion

This tutorial demonstrates how to build and compose LLM environments with tool capabilities
in TorchRL. We've shown how to create a complete environment that can execute tools,
format responses, and handle interactions between the LLM and external tools.

The key concepts are:

1. **Environment Composition**: Understanding TorchRL's LLM environment composition
2. **Tool Integration**: Creating and appending tool transforms
3. **Conversation Management**: Formatting tool responses and LLM interactions using History and ChatHistory
4. **State Management**: Handling tool execution and state management
5. **Wrapper Integration**: In practice, the hand-written actions used here would come from an LLM wrapper (vLLM, Transformers)

### Troubleshooting Browser Issues

If you encounter "This event loop is already running" errors:

1. **Use a Mock Transform**: For educational purposes, replace BrowserTransform with a mock such as a `MockBrowserTransform` that returns canned responses (no such class is defined in this tutorial)
2. **Apply nest_asyncio**: Use `nest_asyncio.apply()` to fix event loop conflicts  
3. **Run in Terminal**: Execute the script outside Jupyter for full browser automation

The core TorchRL LLM framework concepts work correctly regardless of browser automation issues.
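
As a starting point for the first option above, the sketch below shows the core of a mock: a function that maps each browser action to a canned result, plus a formatter for the tool response. The result dictionaries and the `<tool_response>` wrapper mirror the logged outputs in this tutorial. The function names, the `"error"` key, and the canned extract text are assumptions, and wrapping this logic in an actual TorchRL transform is left out.

```python
from typing import Any, Dict


def mock_browser(action: Dict[str, Any]) -> Dict[str, Any]:
    """Return a canned result for a browser action (no real browser involved)."""
    kind = action.get("action")
    if kind == "navigate":
        return {"success": True, "result": {"url": action["url"], "status": 200}}
    if kind == "type":
        return {"success": True, "result": {"typed": action["text"], "into": action["selector"]}}
    if kind == "click":
        return {"success": True, "result": {"clicked": action["selector"]}}
    if kind == "extract":
        # Canned page content, chosen so that the reward rule can fire.
        return {"success": True, "result": {"content": "The capital of France is Paris."}}
    return {"success": False, "error": f"Unknown action: {kind}"}


def format_tool_response(tool_name: str, result: Dict[str, Any]) -> str:
    """Wrap a result in the <tool_response> layout seen in the logs."""
    if result.get("success"):
        body = f"Tool {tool_name} executed successfully:\n{result}"
    else:
        body = f"Tool {tool_name} failed:\n{result.get('error')}"
    return f"<tool_response>\n{body}\n</tool_response>"


response = format_tool_response(
    "browser", mock_browser({"action": "click", "selector": "[name='btnK']"})
)
print(response)
```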

### Next Steps

- Explore other tool transforms (file operations, API calls, etc.)
- Integrate with actual LLM inference using vLLMWrapper or TransformersWrapper
- Build custom reward functions for reinforcement learning scenarios
- Create multi-turn conversations with complex tool interactions

See the `ref_llms` tutorial for more information on how to build tool-enabled
environments with TorchRL.