# Building a ReAct Agent for Computational Chemistry

This notebook demonstrates how to build a **ReAct-style agent** using:

- **LangGraph**
- **ALCF Inference Endpoint** (via `ChatOpenAI`)
- Three domain-specific tools:
  - `molecule_name_to_smiles`
  - `smiles_to_coordinate_file`
  - `run_mace_calculation`

The agent can:
1. Take a molecule name.
2. Convert it to a SMILES string.
3. Generate a coordinate file from the SMILES.
4. Run a MACE-based calculation on the structure.

> ⚠️ **Note:** The agent may occasionally skip tool calls and answer from the model's internal knowledge. This is expected in ReAct-style agents, because the LLM itself decides at each step whether to call a tool.


## Imports and Setup

In this cell, we import all the Python packages and tools we need:

- **TypedDict / Annotated**: Define the graph state schema.
- **LangGraph**: Build the agentic workflow as a state machine.
- **LangChain OpenAI**: Connect to the ALCF-hosted LLM.
- **ToolNode**: Execute tool calls automatically.
- **get_access_token**: Helper to authenticate with the ALCF Inference Endpoint.
- **tools**: Domain-specific tools for computational chemistry.

In [1]:
from typing import TypedDict, Annotated

from langgraph.graph import add_messages
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode

from langchain_openai import ChatOpenAI

from inference_auth_token import get_access_token
from tools import (
    molecule_name_to_smiles,
    smiles_to_coordinate_file,
    run_mace_calculation,
)


## Define the Graph State

LangGraph represents the conversation as a **state**.  
In this simple example, the state has a single field:

- `messages`: a list of messages that represent the conversation history.

We use `Annotated[..., add_messages]` so that LangGraph knows how to **append** new messages as the graph runs.


In [2]:
class State(TypedDict):
    # A list of LangChain messages (HumanMessage, AIMessage, ToolMessage, etc.)
    messages: Annotated[list, add_messages]
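
To make the reducer idea concrete, here is a minimal pure-Python sketch of what an append-style reducer does when a node returns a partial update. The names `append_reducer` and `apply_update` are hypothetical and exist only for illustration; the real `add_messages` does more (for example, it coerces inputs into LangChain message objects and can replace an existing message that has the same ID).

```python
from typing import Any


def append_reducer(existing: list, update: Any) -> list:
    """Hypothetical stand-in for an append-style reducer (not LangGraph's code)."""
    # Accept either a single item or a list of items, then append.
    new_items = update if isinstance(update, list) else [update]
    return existing + new_items


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state using the reducer."""
    merged = dict(state)
    merged["messages"] = append_reducer(state.get("messages", []), update["messages"])
    return merged


state = {"messages": ["user: What is the SMILES string of methanol?"]}
# A node returns only the new message, not the whole history.
state = apply_update(state, {"messages": ["ai: (tool call) molecule_name_to_smiles"]})
print(len(state["messages"]))  # 2
```

The point of the sketch is that a node only has to return what is new; the reducer attached to the field decides how that update is folded into the existing history.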


## Routing Logic: When Should We Call Tools?

After the LLM responds, we need to decide:

- Should we send the result to the **tool node** (because the LLM requested tool calls)?
- Or are we **done** (no tool calls, just a final answer)?

The function `route_tools` looks at the **last AI message** and checks if it has `tool_calls`. If yes, we route to `"tools"`. Otherwise, we route to `"done"` (which will map to `END` in our graph).


In [3]:
def route_tools(state: State) -> str:
    """Route to the 'tools' node if the last message has tool calls; otherwise, route to 'done'.

    Parameters
    ----------
    state : State
        The current state containing messages.

    Returns
    -------
    str
        Either 'tools' or 'done' based on whether the last AI message requested tool calls.
    """
    # Handle the case where LangGraph might pass a list directly
    if isinstance(state, list):
        ai_message = state[-1]
    elif messages := state.get("messages", []):
        ai_message = messages[-1]
    else:
        raise ValueError(f"No messages found in input state to route_tools: {state}")

    # If the AI message contains tool_calls, we route to the tools node
    if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0:
        return "tools"

    # Otherwise, we are done (no tools to call)
    return "done"
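
A quick way to sanity-check this routing rule without calling an LLM is to feed it stand-in messages. The sketch below restates the rule in a compact helper, `route`, and uses a hypothetical `FakeAIMessage` dataclass in place of LangChain's `AIMessage`; both names are illustrative assumptions and not part of the notebook's code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for an AI message; only the fields the router reads."""
    content: str = ""
    tool_calls: list = field(default_factory=list)


def route(messages: list) -> str:
    """Compact restatement of the rule: 'tools' if the last message requests tools."""
    if not messages:
        raise ValueError("No messages to route on")
    last = messages[-1]
    # getattr with a default covers message types that carry no tool_calls attribute
    return "tools" if getattr(last, "tool_calls", None) else "done"


with_tool = FakeAIMessage(
    tool_calls=[{"name": "molecule_name_to_smiles", "args": {"name": "methanol"}}]
)
final_answer = FakeAIMessage(content="The SMILES string of methanol is CO.")

print(route([with_tool]))     # tools
print(route([final_answer]))  # done
```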


## Define the LLM Node: `chem_agent`

This node is our **agent**. It:

1. Takes the current state (the conversation history).
2. Prepends a **system prompt**.
3. **Binds** the available tools to the LLM and asks it to respond.

If the LLM decides that tools are needed, it will output `tool_calls` in its response.  
Our routing function from the previous cell will detect that and send control to the `tools` node.

In [4]:
def chem_agent(
    state: State,
    llm: ChatOpenAI,
    tools: list,
    system_prompt: str = "You are an assistant that uses tools to solve problems.",
):
    """Core agent node that calls the LLM with tools enabled.

    Parameters
    ----------
    state : State
        Current graph state containing prior messages.
    llm : ChatOpenAI
        LLM client connected to the ALCF Inference Endpoint.
    tools : list
        List of tool functions the LLM is allowed to call.
    system_prompt : str, optional
        System message guiding the agent's behavior.

    Returns
    -------
    dict
        Updated state with a new AI message appended under 'messages'.
    """
    # Build the message list for the LLM
    messages = [
        {"role": "system", "content": system_prompt},
        # We pass the full message history as a single user message for simplicity
        {"role": "user", "content": f"{state['messages']}"},
    ]

    # Bind tools so the LLM is allowed to call them
    llm_with_tools = llm.bind_tools(tools=tools)

    # Invoke the LLM and return the new AI message inside the state
    ai_message = llm_with_tools.invoke(messages)
    return {"messages": [ai_message]}
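
Flattening the history into one string is simple, but it drops the role and tool-call structure of each message. As a hedged alternative, the sketch below keeps each prior message as its own entry and only prepends the system prompt. `build_messages` is a hypothetical helper, and the dict-based messages (including the simplified `tool_calls` entry) are stand-ins for LangChain message objects; whether a given endpoint handles the structured history well is an assumption you would need to verify.

```python
def build_messages(system_prompt: str, history: list) -> list:
    """Hypothetical helper: prepend the system prompt and keep the history structured."""
    # Build a new list so the history held in the graph state is not mutated.
    return [{"role": "system", "content": system_prompt}, *history]


# Illustrative history; the message shapes are simplified stand-ins.
history = [
    {"role": "user", "content": "What is the SMILES string of methanol?"},
    {"role": "assistant", "content": "", "tool_calls": [{"name": "molecule_name_to_smiles"}]},
    {"role": "tool", "content": "CO"},
]

messages = build_messages("You are an assistant that uses tools to solve problems.", history)
for m in messages:
    print(m["role"])
```

Under that assumption, the analogous change inside `chem_agent` would be to pass the system message followed by the entries of `state["messages"]` to `llm_with_tools.invoke(...)`, instead of a single user message containing the stringified list.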


## Configure the LLM and Tools

Now we:

1. Grab an **access token** for the ALCF Inference Endpoint.
2. Initialize a `ChatOpenAI` model that points to the ALCF endpoint.
3. Define the **tool list** the LLM can use.

In [5]:
# Get token for your ALCF inference endpoint
access_token = get_access_token()

# Initialize the model hosted on the ALCF endpoint
llm = ChatOpenAI(
    model_name="openai/gpt-oss-120b",
    api_key=access_token,
    base_url="https://data-portal-dev.cels.anl.gov/resource_server/sophia/vllm/v1",
    temperature=0,
)

# Tool list that the LLM can call
tools = [molecule_name_to_smiles, smiles_to_coordinate_file, run_mace_calculation]


## Build the LangGraph State Machine

We now use **LangGraph** to build a small state machine:

1. `START → chem_agent`
2. From `chem_agent`, we decide:
   - `tools` if tool calls are present
   - `END` if no tools are needed
3. After `tools` run, we go back to `chem_agent` so that the LLM can:
   - See the tool results
   - Decide if more tools are needed
   - Or generate a final answer

This creates a **loop**:  
`chem_agent → tools → chem_agent → ... → END`


In [6]:
graph_builder = StateGraph(State)

# Agent node: calls LLM, which may decide to call tools
graph_builder.add_node(
    "chem_agent",
    lambda state: chem_agent(state, llm=llm, tools=tools),
)

# Tool node: executes tool calls emitted by the LLM
tool_node = ToolNode(tools)
graph_builder.add_node("tools", tool_node)

# Graph logic:
# 1. START -> chem_agent
graph_builder.add_edge(START, "chem_agent")

# 2. After chem_agent runs, route based on whether there are tool calls
graph_builder.add_conditional_edges(
    "chem_agent",
    route_tools,
    {
        "tools": "tools",  # go to tools node if tool calls are present
        "done": END,       # otherwise, end the graph
    },
)

# 3. After tools run, go back to the agent so it can use the tool results
graph_builder.add_edge("tools", "chem_agent")

# Compile the graph into an executable object
graph = graph_builder.compile()
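
To trace the control flow that this graph encodes without an LLM or LangGraph, the following sketch runs the same agent/tools loop in plain Python. Everything in it is a stand-in: `scripted_agent`, `fake_tools`, and `run_loop` are hypothetical names, the agent's decisions are hard-coded, and the tool returns a canned value. It only illustrates the `chem_agent → tools → chem_agent → ... → END` cycle, not how LangGraph executes it internally.

```python
def scripted_agent(messages: list) -> dict:
    """Hypothetical agent: requests one tool call, then gives a final answer."""
    if any(m["role"] == "tool" for m in messages):
        # A tool result is available, so answer and request no further tools.
        return {"role": "ai", "content": f"SMILES: {messages[-1]['content']}", "tool_calls": []}
    call = {"name": "name_to_smiles", "args": {"name": "methanol"}}
    return {"role": "ai", "content": "", "tool_calls": [call]}


def fake_tools(ai_message: dict) -> list:
    """Hypothetical tool executor: returns a canned result per requested call."""
    canned = {"methanol": "CO"}
    return [
        {"role": "tool", "content": canned[call["args"]["name"]]}
        for call in ai_message["tool_calls"]
    ]


def run_loop(prompt: str, max_steps: int = 10) -> list:
    """Alternate between agent and tools until the agent stops requesting tools."""
    messages = [{"role": "user", "content": prompt}]
    visited = []
    for _ in range(max_steps):
        visited.append("chem_agent")
        ai_message = scripted_agent(messages)
        messages.append(ai_message)
        if not ai_message["tool_calls"]:  # the same decision route_tools makes
            visited.append("END")
            break
        visited.append("tools")
        messages.extend(fake_tools(ai_message))
    return visited


print(run_loop("What is the SMILES string of methanol?"))
# ['chem_agent', 'tools', 'chem_agent', 'END']
```

The `max_steps` guard in this sketch keeps a misbehaving agent from looping forever; it is part of the illustration and not taken from the notebook.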


## Visualize the graph

You can visualize the graph with the `draw_ascii()` method or with `draw_mermaid_png()`.

In [8]:
print(graph.get_graph().draw_ascii())

        +-----------+         
        | __start__ |         
        +-----------+         
               *              
               *              
               *              
        +------------+        
        | chem_agent |        
        +------------+        
          .         .         
        ..           ..       
       .               .      
+---------+         +-------+ 
| __end__ |         | tools | 
+---------+         +-------+ 


## Run and Stream the Graph

Finally, we run the graph with a **user prompt** and stream the intermediate states.

In [12]:
prompt = (
    "What is the SMILES string of methanol and the optimized structure of a carbon dioxide molecule?"
)

for chunk in graph.stream(
    {"messages": prompt},
    stream_mode="values",
):
    new_message = chunk["messages"][-1]
    # pretty_print() is a LangChain helper to show messages nicely
    new_message.pretty_print()



What is the SMILES string of methanol and the optimized structure of a carbon dioxide molecule?
Tool Calls:
  molecule_name_to_smiles (chatcmpl-tool-27e260cffa5d46798348810e64fd0340)
 Call ID: chatcmpl-tool-27e260cffa5d46798348810e64fd0340
  Args:
    name: methanol
Name: molecule_name_to_smiles

CO
Tool Calls:
  smiles_to_coordinate_file (chatcmpl-tool-e40af4df748a4caca0d3cbdccbb0137e)
 Call ID: chatcmpl-tool-e40af4df748a4caca0d3cbdccbb0137e
  Args:
    smiles: O=C=O
    output_file: co2.xyz
    randomSeed: 2025
    fmt: xyz
Name: smiles_to_coordinate_file

{"ok": true, "artifact": "coordinate_file", "path": "/lus/grand/projects/IQC/thang/ALCF_contributions/ai-science-training-series/04-Inference-Workflows/Agentic-workflows/co2.xyz", "smiles": "O=C=O", "natoms": 3}
Tool Calls:
  run_mace_calculation (chatcmpl-tool-5c92e8e7b2fd4249bfc5e52293570ab3)
 Call ID: chatcmpl-tool-5c92e8e7b2fd4249bfc5e52293570ab3
  Args:
    input_file: /lus/grand/projects/IQC/thang/ALCF_contributions/ai-scien

cuequivariance or cuequivariance_torch is not available. Cuequivariance acceleration will be disabled.
Using Materials Project MACE for MACECalculator with /home/tdpham2/.cache/mace/20231210mace128L0_energy_epoch249model
Using float32 for MACECalculator, which is faster but less accurate. Recommended for MD. Use float64 for geometry optimization.


Using head Default out of ['Default']
Default dtype float32 does not match model dtype float64, converting models to float32.
      Step     Time          Energy          fmax
BFGS:    0 20:39:53      -22.486820        5.389489
BFGS:    1 20:39:53      -22.794083        2.073127
BFGS:    2 20:39:53      -22.828390        0.410957
BFGS:    3 20:39:53      -22.829935        0.023935
Name: run_mace_calculation

{"status": "success", "message": "MACE geometry optimization completed.", "mode": "geometry_optimization", "converged": true, "input_file": "/lus/grand/projects/IQC/thang/ALCF_contributions/ai-science-training-series/04-Inference-Workflows/Agentic-workflows/co2.xyz", "mace_model_name": "small", "device": "cpu", "final_energy_eV": -22.82993507385254, "final_positions": [[-1.1782012963728021, -0.017800747019835143, -2.1457061410080795e-22], [1.9271610064685546e-06, -1.9288522034269548e-07, -1.1372469341102953e-22], [1.1781993688828778, 0.017800939871510348, -2.639680559781154e-23]], 