# Building a multi-agent system for Computational Chemistry

This notebook demonstrates how to build a **ReAct-style agent** using:

- **LangGraph**
- **ALCF Inference Endpoint** (via `ChatOpenAI`)
- Three domain-specific tools:
  - `molecule_name_to_smiles`
  - `smiles_to_coordinate_file`
  - `run_mace_calculation`

The agent can:
1. Take a molecule name.
2. Convert it to a SMILES string.
3. Generate a coordinate file from the SMILES.
4. Run a MACE-based calculation on the structure.

> ⚠️ **Note:** The agent may sometimes skip tool calls and answer from the model's internal knowledge. This is expected in ReAct-style agents, because the model decides on every turn whether a tool call is needed.
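If you need to know whether a given run actually used the tools, you can inspect the final message list. The sketch below is a hypothetical helper (`tools_used` is not part of LangChain or LangGraph). It only assumes that AI messages expose a `tool_calls` list of dicts with a `"name"` key, and it is exercised on `SimpleNamespace` stand-ins so it runs without an endpoint:

```python
from types import SimpleNamespace


def tools_used(messages) -> list:
    """Return the names of the tools the agent called, in order.

    Duck-typed: only assumes that AI messages expose a `tool_calls` list of
    dicts with a "name" key, as LangChain's AIMessage does. Messages without
    that attribute (human or tool messages) are skipped.
    """
    names = []
    for msg in messages:
        for call in getattr(msg, "tool_calls", None) or []:
            names.append(call["name"])
    return names


# Hypothetical stand-ins for a run where the model answered directly.
demo_messages = [
    SimpleNamespace(type="human", content="Optimize formic acid with MACE."),
    SimpleNamespace(type="ai", content="Formic acid is HCOOH.", tool_calls=[]),
]
if not tools_used(demo_messages):
    print("Warning: the agent answered without calling any tools.")
```

On a real run you would pass the `"messages"` list returned by `graph.invoke(...)` instead of the stand-ins.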

In [None]:
from typing import Annotated, TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph, add_messages
from langgraph.prebuilt import ToolNode

# Local helpers: ALCF access-token retrieval and the chemistry tools
from inference_auth_token import get_access_token
from tools import molecule_name_to_smiles, smiles_to_coordinate_file, run_mace_calculation


In [None]:
# ============================================================
# 1. State definition
# ============================================================


class State(TypedDict):
    # LangGraph will automatically append/merge messages using add_messages
    messages: Annotated[list, add_messages]
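The `add_messages` reducer is what lets each node return only its *new* messages. The dict-based `merge_messages` below is a simplified illustration of the idea, not LangGraph's implementation (the real reducer works on LangChain message objects and handles further cases such as message removal): new messages are appended, and a message whose `id` matches an existing one replaces it.

```python
import uuid


def _with_id(msg: dict) -> dict:
    # Give every message an id so later updates can target it.
    return {**msg, "id": msg.get("id") or str(uuid.uuid4())}


def merge_messages(left: list, right) -> list:
    """Simplified, dict-based illustration of an add_messages-style reducer.

    - A single new message is treated as a one-element list.
    - A new message whose id matches an existing one replaces it in place.
    - Any other new message is appended.
    """
    if not isinstance(right, list):
        right = [right]
    merged = [_with_id(m) for m in left]
    index_by_id = {m["id"]: i for i, m in enumerate(merged)}
    for msg in map(_with_id, right):
        if msg["id"] in index_by_id:
            merged[index_by_id[msg["id"]]] = msg
        else:
            index_by_id[msg["id"]] = len(merged)
            merged.append(msg)
    return merged


msgs = merge_messages([], {"role": "user", "content": "Optimize formic acid"})
msgs = merge_messages(msgs, [{"role": "assistant", "content": "draft", "id": "a1"}])
msgs = merge_messages(msgs, {"role": "assistant", "content": "final", "id": "a1"})
print([m["content"] for m in msgs])  # ['Optimize formic acid', 'final']
```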


In [None]:
# ============================================================
# 2. Routing logic
# ============================================================


def route_tools(state: State):
    """Route to the 'tools' node if the last message has tool calls; otherwise, route to 'done'.

    Parameters
    ----------
    state : State
        The current graph state, holding the "messages" list

    Returns
    -------
    str
        'tools' if the last message requests tool calls, otherwise 'done'
    """
    # The state may sometimes be a list of messages; handle that case
    if isinstance(state, list):
        ai_message = state[-1]
    elif messages := state.get("messages", []):
        ai_message = messages[-1]
    else:
        raise ValueError(f"No messages found in input state to route_tools: {state}")

    # If the last AI message has tool_calls, route to the tools node
    if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0:
        return "tools"
    return "done"
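`route_tools` keeps sending the conversation back to `"tools"` for as long as the model keeps requesting tools. LangGraph stops a runaway loop with its recursion limit by raising an error; if you would rather fall through to the JSON agent, a capped variant is one option. The `route_tools_capped` function below is a hypothetical sketch: it takes the message list directly and is exercised on `SimpleNamespace` stand-ins, so it runs without an endpoint.

```python
from types import SimpleNamespace


def route_tools_capped(messages: list, max_tool_rounds: int = 5) -> str:
    """Variant of route_tools that allows at most `max_tool_rounds` tool rounds.

    `messages` is the list stored under state["messages"].
    """
    if not messages:
        raise ValueError("No messages found in input state to route_tools_capped")
    wants_tools = bool(getattr(messages[-1], "tool_calls", None))
    # Count the AI turns that requested tools, including the current one.
    rounds = sum(1 for m in messages if getattr(m, "tool_calls", None))
    if wants_tools and rounds <= max_tool_rounds:
        return "tools"
    return "done"


# Hypothetical stand-ins for AI messages.
call = SimpleNamespace(type="ai", content="", tool_calls=[{"name": "run_mace_calculation", "args": {}, "id": "1"}])
answer = SimpleNamespace(type="ai", content="{}", tool_calls=[])
print(route_tools_capped([call], max_tool_rounds=2))              # tools
print(route_tools_capped([call, call, call], max_tool_rounds=2))  # done
print(route_tools_capped([call, answer]))                         # done
```

To use it, pass `lambda state: route_tools_capped(state["messages"])` to `add_conditional_edges` in place of `route_tools`. Because `structured_output_agent` flattens the history into text, an unanswered tool call left over when the cap is hit should not break that final request.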


In [None]:
# ============================================================
# 3. LLM node: the "agent"
# ============================================================


def chem_agent(
    state: State,
    llm: ChatOpenAI,
    tools: list,
    system_prompt: str = "You are an assistant that uses tools to solve problems.",
):
    # For simplicity, the entire message history (including tool calls and
    # tool results) is serialized into a single user message via its repr.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{state['messages']}"},
    ]

    # Bind tools to the LLM so it can decide to call them
    llm_with_tools = llm.bind_tools(tools=tools)

    # Invoke the LLM and return the updated messages
    return {"messages": [llm_with_tools.invoke(messages)]}
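Flattening the history into one user message is a design choice: the model then sees roles and tool results only as text. The plain-dict sketch below contrasts it with keeping one message per turn. Both helpers are hypothetical illustrations, not part of the notebook's code:

```python
def build_flat_prompt(history: list, system_prompt: str) -> list:
    # What chem_agent does: the whole history becomes ONE user message,
    # so roles and tool calls reach the model only as text.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{history}"},
    ]


def build_chat_prompt(history: list, system_prompt: str) -> list:
    # Alternative: keep one message per turn so the roles are preserved.
    return [{"role": "system", "content": system_prompt}, *history]


demo_history = [
    {"role": "user", "content": "Convert formic acid to SMILES."},
    {"role": "assistant", "content": "C(=O)O"},
]
prompt_text = "You are an assistant that uses tools to solve problems."
flat = build_flat_prompt(demo_history, prompt_text)
chat = build_chat_prompt(demo_history, prompt_text)
print(len(flat), len(chat))  # 2 3
print(flat[1]["content"])
```

In the notebook, the second style would correspond to passing `[("system", system_prompt), *state["messages"]]` to `llm_with_tools.invoke`, since LangChain accepts a mix of tuples and message objects. Whether a given model's chat template handles the resulting tool messages well is worth checking per model.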


## Second LLM Node – `structured_output_agent`

Once tools have finished (or if no tools were needed), we hand the state to a
**second agent** whose only job is to produce **JSON-only output**.

This is useful for downstream consumption (e.g., other scripts, dashboards, or pipelines).
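The system prompt only *asks* for JSON; nothing enforces it, and models sometimes wrap the JSON in a Markdown fence or add a sentence around it. Before handing the final message to downstream code, it can help to parse it defensively. `parse_json_reply` below is a hypothetical helper (not part of LangChain), a minimal sketch of one fallback strategy:

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to avoid a literal one
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_reply(text: str):
    """Parse an LLM reply that is supposed to contain only JSON.

    Tries, in order: the whole reply, the body of the first fenced code
    block, and the span from the first "{" to the last "}".
    Raises ValueError if none of them parses.
    """
    candidates = [text.strip()]
    fenced = _FENCED.search(text)
    if fenced:
        candidates.append(fenced.group(1).strip())
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start : end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("Reply did not contain valid JSON")


print(parse_json_reply('Sure! {"molecule": "formic acid", "converged": true}'))
# {'molecule': 'formic acid', 'converged': True}
```

You would call it as `parse_json_reply(final_message.content)` on the last message of a run. If the `ValueError` fires, re-prompting is one option; LangChain's `with_structured_output` is another, if the endpoint supports it.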


In [None]:
# ============================================================
# 3b. A second agent: produce structured (JSON) output
# ============================================================


def structured_output_agent(
    state: State,
    llm: ChatOpenAI,
    system_prompt: str = "You are an assistant that returns ONLY JSON.",
):
    # No tools are bound here; like chem_agent, the history is flattened into one user message.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{state['messages']}"},
    ]

    result = llm.invoke(messages)
    return {"messages": [result]}


In [6]:
# ============================================================
# 4. LLM / tools setup
# ============================================================

# Get token for your ALCF inference endpoint
access_token = get_access_token()

# Initialize the model hosted on the ALCF endpoint
llm = ChatOpenAI(
    model_name="openai/gpt-oss-20b",
    # model_name="Qwen/Qwen3-32B",
    api_key=access_token,
    base_url="https://data-portal-dev.cels.anl.gov/resource_server/sophia/vllm/v1",
    temperature=0,
)

# Tool list that the LLM can call
tools = [molecule_name_to_smiles, smiles_to_coordinate_file, run_mace_calculation]
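`bind_tools` sends each tool to the endpoint as a description the model can read: a name, a docstring-derived description, and a parameter schema. The stdlib sketch below builds a simplified OpenAI-style function-tool description from a Python function to show roughly what that looks like. `tool_schema` and `hypothetical_name_to_smiles` are illustrative only; in the notebook itself, `convert_to_openai_tool` from `langchain_core.utils.function_calling` shows the actual payload for a real tool.

```python
import inspect
import json

# Minimal mapping from Python annotations to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict:
    """Build an OpenAI-style function-tool description from a Python function.

    Simplified illustration: the name comes from __name__, the description from
    the docstring, and the parameters from the signature. Parameters without a
    default are listed as required; unknown annotations fall back to "string".
    """
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def hypothetical_name_to_smiles(name: str) -> str:
    """Convert a molecule name to a SMILES string."""
    raise NotImplementedError  # stub for illustration only; not the real tool


print(json.dumps(tool_schema(hypothetical_name_to_smiles), indent=2))
```

Clear docstrings and precise parameter names in `tools.py` matter for the same reason: they are most of what the model sees when deciding which tool to call.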


## Build the LangGraph

We build a `StateGraph` with the following nodes:

- `"chem_agent"` – decides whether tools are needed.
- `"tools"` – executes any requested tools.
- `"structured_output_agent"` – converts everything into final JSON.

**Flow:**

1. `START` → `chem_agent`
2. After `chem_agent`:
   - If it requested tools → `"tools"` → back to `"chem_agent"`
   - If no tools needed → `"structured_output_agent"` → `END`
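The flow above can be traced in plain Python. `run_flow` and the `fake_*` nodes below are hypothetical stand-ins (no LLM, no tools, no LangGraph); they only mirror the order in which the graph visits its nodes. `max_steps` is loosely analogous to LangGraph's recursion limit, which counts graph steps rather than agent turns.

```python
def run_flow(agent, run_tools, finalize, user_message: dict, max_steps: int = 25) -> list:
    """Plain-Python trace of the graph's control flow.

    agent(messages)     -> one new assistant message (dict)
    run_tools(messages) -> list of tool-result messages for the last reply
    finalize(messages)  -> one final (JSON) message
    """
    messages = [user_message]                    # START -> chem_agent
    for _ in range(max_steps):
        reply = agent(messages)                  # "chem_agent"
        messages.append(reply)
        if not reply.get("tool_calls"):          # route_tools -> "done"
            messages.append(finalize(messages))  # "structured_output_agent" -> END
            return messages
        messages.extend(run_tools(messages))     # "tools", then back to "chem_agent"
    raise RuntimeError(f"No final answer after {max_steps} agent steps")


def fake_agent(messages):
    # Request one tool call, then answer once a tool result is present.
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "done", "tool_calls": []}
    call = {"name": "molecule_name_to_smiles", "args": {"name": "formic acid"}}
    return {"role": "assistant", "content": "", "tool_calls": [call]}


def fake_tools(messages):
    # One tool-result message per requested call.
    return [{"role": "tool", "content": "C(=O)O"} for _ in messages[-1]["tool_calls"]]


def fake_finalize(messages):
    return {"role": "assistant", "content": '{"smiles": "C(=O)O"}'}


trace = run_flow(fake_agent, fake_tools, fake_finalize, {"role": "user", "content": "formic acid"})
print([m["role"] for m in trace])
# ['user', 'assistant', 'tool', 'assistant', 'assistant']
```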


In [7]:
# ============================================================
# 5. Build the graph
# ============================================================

graph_builder = StateGraph(State)

# Agent node: calls LLM, which may decide to call tools
graph_builder.add_node(
    "chem_agent",
    lambda state: chem_agent(state, llm=llm, tools=tools),
)
graph_builder.add_node(
    "structured_output_agent",
    lambda state: structured_output_agent(state, llm=llm),
)

# Tool node: executes tool calls emitted by the LLM
tool_node = ToolNode(tools)
graph_builder.add_node("tools", tool_node)

# Graph logic
# START -> chem_agent
graph_builder.add_edge(START, "chem_agent")

# After chem_agent runs, check if we need to run tools
graph_builder.add_conditional_edges(
    "chem_agent", route_tools, {"tools": "tools", "done": "structured_output_agent"}
)

# After tools run, go back to the agent so it can use tool results
graph_builder.add_edge("tools", "chem_agent")

# After structured_output_agent, terminate the graph
graph_builder.add_edge("structured_output_agent", END)

# Compile the graph
graph = graph_builder.compile()


## Visualize the graph

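You can visualize the compiled graph as text with `graph.get_graph().draw_ascii()` (this requires the `grandalf` package) or as an image with `graph.get_graph().draw_mermaid_png()`.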

In [10]:
print(graph.get_graph().draw_ascii())

            +-----------+                     
            | __start__ |                     
            +-----------+                     
                  *                           
                  *                           
                  *                           
           +------------+                     
           | chem_agent |                     
           +------------+                     
           **           ..                    
         **               ..                  
       **                   ..                
+-------+         +-------------------------+ 
| tools |         | structured_output_agent | 
+-------+         +-------------------------+ 
                                *             
                                *             
                                *             
                          +---------+         
                          | __end__ |         
                          +---------+         


## Run the Graph – Example Prompt

Now we can **stream** the graph execution for a chemistry task:

> “Optimize formic acid and acetic acid with MACE. Return the results in a JSON.”

As the graph runs, we print the **latest message** at each step.  
Depending on the model and tools, you should see:

- Tool calls that look up each SMILES string, write a coordinate file, and run the MACE optimization.
- Final JSON-style result from the `structured_output_agent`.
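Besides the model's final JSON, the raw tool results can be summarized directly, which does not depend on the model reporting numbers correctly. The hypothetical `summarize_mace_results` below assumes each `run_mace_calculation` result is a JSON string with the `"status"`, `"converged"`, `"input_file"` and `"final_energy_eV"` keys seen in the output of the run below. In the notebook, the inputs could be collected as `[m.content for m in final_state["messages"] if getattr(m, "name", None) == "run_mace_calculation"]`, where `final_state` is the return value of `graph.invoke(...)`.

```python
import json
import os


def summarize_mace_results(tool_outputs: list) -> list:
    """Collect file name, convergence flag and final energy from MACE results.

    Assumes each entry is the JSON string returned by run_mace_calculation.
    Entries whose status is not "success" are skipped.
    """
    rows = []
    for raw in tool_outputs:
        result = json.loads(raw)
        if result.get("status") != "success":
            continue
        rows.append(
            {
                "file": os.path.basename(result["input_file"]),
                "converged": result.get("converged"),
                "final_energy_eV": result.get("final_energy_eV"),
            }
        )
    return rows


# Shortened, hypothetical tool result in the same shape as the real ones.
sample = ['{"status": "success", "converged": true, "input_file": "/tmp/formic.xyz", "final_energy_eV": -29.3932}']
print(summarize_mace_results(sample))
# [{'file': 'formic.xyz', 'converged': True, 'final_energy_eV': -29.3932}]
```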



In [13]:
# ============================================================
# 6. Run / stream the graph
# ============================================================

prompt = "Optimize formic acid and acetic acid with MACE. Return the results in a JSON."

for chunk in graph.stream(
    {"messages": prompt},  # add_messages turns the plain string into a HumanMessage
    stream_mode="values",
):
    new_message = chunk["messages"][-1]
    # pretty_print is a LangChain helper for nicely formatted output
    if hasattr(new_message, "pretty_print"):
        new_message.pretty_print()
    else:
        # Fallback in case pretty_print is not available
        print(new_message)



Optimize formic acid and acetic acid with MACE. Return the results in a JSON.
Tool Calls:
  molecule_name_to_smiles (chatcmpl-tool-e1a2cd214dc74bd69dd1299891659a81)
 Call ID: chatcmpl-tool-e1a2cd214dc74bd69dd1299891659a81
  Args:
    name: formic acid
Name: molecule_name_to_smiles

C(=O)O
Tool Calls:
  molecule_name_to_smiles (chatcmpl-tool-0b221f37a38b40dd8f76aefeaefc7f4a)
 Call ID: chatcmpl-tool-0b221f37a38b40dd8f76aefeaefc7f4a
  Args:
    name: acetic acid
Name: molecule_name_to_smiles

CC(=O)O
Tool Calls:
  smiles_to_coordinate_file (chatcmpl-tool-6c3b1b7997cd45339e21da88947a3500)
 Call ID: chatcmpl-tool-6c3b1b7997cd45339e21da88947a3500
  Args:
    smiles: C(=O)O
    output_file: formic.xyz
    randomSeed: 2025
    fmt: xyz
Name: smiles_to_coordinate_file

{"ok": true, "artifact": "coordinate_file", "path": "/lus/grand/projects/IQC/thang/ALCF_contributions/ai-science-training-series/04-Inference-Workflows/Agentic-workflows/formic.xyz", "smiles": "C(=O)O", "natoms": 5}
Tool Calls:


cuequivariance or cuequivariance_torch is not available. Cuequivariance acceleration will be disabled.
Using Materials Project MACE for MACECalculator with /home/tdpham2/.cache/mace/20231210mace128L0_energy_epoch249model
Using float32 for MACECalculator, which is faster but less accurate. Recommended for MD. Use float64 for geometry optimization.
Using head Default out of ['Default']
Default dtype float32 does not match model dtype float64, converting models to float32.


      Step     Time          Energy          fmax
BFGS:    0 20:46:01      -29.060833        4.386633
BFGS:    1 20:46:01      -29.179813        3.443640
BFGS:    2 20:46:01      -29.272917        0.968001
BFGS:    3 20:46:01      -29.314631        0.749908
BFGS:    4 20:46:01      -29.381224        0.561765
BFGS:    5 20:46:01      -29.386503        0.306008
BFGS:    6 20:46:01      -29.391729        0.284840
BFGS:    7 20:46:01      -29.392881        0.167910
BFGS:    8 20:46:01      -29.393208        0.027594
Name: run_mace_calculation

{"status": "success", "message": "MACE geometry optimization completed.", "mode": "geometry_optimization", "converged": true, "input_file": "/lus/grand/projects/IQC/thang/ALCF_contributions/ai-science-training-series/04-Inference-Workflows/Agentic-workflows/formic.xyz", "mace_model_name": "small", "device": "cpu", "final_energy_eV": -29.393207550048828, "final_positions": [[-0.4298449031878176, 0.028635591103541944, 0.10156060565193815], [-0.57517125

BFGS:    3 20:46:03      -46.202526        0.791176
BFGS:    4 20:46:03      -46.240646        0.873693
BFGS:    5 20:46:03      -46.256691        0.522984
BFGS:    6 20:46:03      -46.266163        0.219253
BFGS:    7 20:46:03      -46.268791        0.139251
BFGS:    8 20:46:03      -46.270576        0.116754
BFGS:    9 20:46:03      -46.271477        0.070092
BFGS:   10 20:46:03      -46.272083        0.069516
BFGS:   11 20:46:03      -46.272594        0.070193
BFGS:   12 20:46:03      -46.273178        0.078776
BFGS:   13 20:46:03      -46.273720        0.077531
BFGS:   14 20:46:03      -46.274151        0.057216
BFGS:   15 20:46:03      -46.274509        0.073081
BFGS:   16 20:46:03      -46.274906        0.077480
BFGS:   17 20:46:03      -46.275295        0.055988
BFGS:   18 20:46:03      -46.275551        0.043609
Name: run_mace_calculation

{"status": "success", "message": "MACE geometry optimization completed.", "mode": "geometry_optimization", "converged": true, "input_file": 