# Injected State

While working with [RAGV4](https://github.com/MatteoFalcioni/experiments/blob/main/RAG_V4.0_.ipynb) I noticed a huge problem. 

I was building an agent supervisor system that managed two worker agents: a data analyst and a visualizer. The problem was that, once the data analyst had completed its analysis of the datasets, the visualizer couldn't access the analysed data, so it would try to run the analysis again. 

Why did this happen? The datasets were loaded in memory and stored as global variables in a dictionary. This made it hard for different agents to retrieve the same data: the global had to be shared, and access to it had to be enforced through prompting. Isn't there a standard way of letting all agents access the same data at runtime?

Of course there is, and it's actually a basic concept in LangGraph: the graph's **State**. If we want some object to be accessible and visible to all agents at any time, then that object should be in the graph state.

So we'd just need to subclass the basic `MessagesState` class from LangGraph and add our relevant object to state to define our *"state schema"*, like this: 

```python
from langgraph.graph import MessagesState

class MyState(MessagesState):   # already defines `messages: Annotated[list[AnyMessage], add_messages]`
    dataframes: dict   # add the fields you need
```

Perfect, right...? No! If we only did this and tried to access the `dataframes` dict inside our tools, we wouldn't manage to do so. 

We have to follow a specific syntax in order to access state data in our tools and modify it. We need to use the [`InjectedState`](https://langchain-ai.github.io/langgraph/reference/agents/#langgraph.prebuilt.tool_node.InjectedState) annotation.

### Using `InjectedState` in tools

The following is an example from the [`InjectedState`](https://langchain-ai.github.io/langgraph/reference/agents/#langgraph.prebuilt.tool_node.InjectedState) documentation.

Here they don't subclass `MessagesState` but the principle is the same. 

In [1]:
from typing import List
from typing_extensions import Annotated, TypedDict

from langchain_core.messages import BaseMessage, AIMessage
from langchain_core.tools import tool

from langgraph.prebuilt import InjectedState, ToolNode


class AgentState(TypedDict):    # create your state schema
    messages: List[BaseMessage]
    foo: str

@tool
def state_tool(x: int, state: Annotated[dict, InjectedState]) -> str:   # use Annotated[dict, InjectedState]
    '''Do something with state.'''
    if len(state["messages"]) > 2:      # here we use the whole state
        return state["foo"] + str(x)
    else:
        return "not enough messages"

@tool
def foo_tool(x: int, foo: Annotated[str, InjectedState("foo")]) -> str: # we can select a specific field to pass with InjectedState("<field_name>")
    '''Do something else with state.'''
    return foo + str(x + 1)

node = ToolNode([state_tool, foo_tool])

tool_call1 = {"name": "state_tool", "args": {"x": 1}, "id": "1", "type": "tool_call"}
tool_call2 = {"name": "foo_tool", "args": {"x": 1}, "id": "2", "type": "tool_call"}
state = {
    "messages": [AIMessage("", tool_calls=[tool_call1, tool_call2])],
    "foo": "bar",
}
node.invoke(state)

{'messages': [ToolMessage(content='not enough messages', name='state_tool', tool_call_id='1'),
  ToolMessage(content='bar2', name='foo_tool', tool_call_id='2')]}

### Integrating `InjectedState` with agents

The simplest way to integrate `InjectedState` with an agent is to use the [`create_react_agent()`](https://langchain-ai.github.io/langgraph/reference/agents/#langgraph.prebuilt.chat_agent_executor.create_react_agent) function from LangGraph. 

We need to pass our custom state as the `state_schema` parameter, like this:

```python
agent = create_react_agent(
    model=..., 
    tools=[state_tool, foo_tool],
    state_schema=MyState
)
```

This way the graph knows which keys its state contains, so they can be injected into the tools.

### Other Context Management practices 

Before moving to a practical example, let's mention [other common context management practices in LangGraph](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/#context-management).

`InjectedState` is not the only way to make context data available to tools. It is the most flexible and "lightweight" option, but we can also use: 

* [`Configuration`](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/#configuration) : *"Use configuration when you have static, immutable runtime data that tools require, such as user identifiers"*. 

* [Long-term memory](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/#long-term-memory) : *"Use long-term memory to store user-specific or application-specific data across different sessions"*. The main difference here is that the data is persisted to external storage (such as disk) and survives the end of the current session. This is useful for working with heavy datasets, or for sharing memory across different runs of a chatbot. See [Stores](https://langchain-ai.github.io/langgraph/reference/store/#storage) for further reference.

## Example #1 of `InjectedState` workflow

Let's make a simple example to recap the actual workflow with `InjectedState`:

### 1. Create your custom "`state_schema`":

In [2]:
from langgraph.graph import MessagesState

class CustomState(MessagesState): 
    username : str 
    remaining_steps : int

>**Note:** from the [create_react_agent() doc](https://langchain-ai.github.io/langgraph/reference/agents/#langgraph.prebuilt.chat_agent_executor.create_react_agent):  
>
>*`state_schema` : An optional state schema that defines graph state. Must have `messages` and `remaining_steps` keys. Defaults to AgentState that defines those two keys.*
>
> `messages` is implemented by `MessagesState`, but we need to add `remaining_steps` ourselves, otherwise it will error.

### 2. Write your tools using `InjectedState`:

In [3]:
from typing_extensions import Annotated

from langchain_core.messages import ToolMessage
from langchain_core.tools import tool, InjectedToolCallId
from langgraph.prebuilt import InjectedState
from langgraph.types import Command

@tool 
def get_internal_value(state : Annotated[CustomState, InjectedState]) -> str:
    """tool to retrieve the username"""
    return state.get('username')

@tool 
def update_username(new_name : str, tool_call_id : Annotated[str, InjectedToolCallId]
) -> Command:
    """Update username in short-term memory."""
    
    return Command(update={
        "username" : new_name,
        "messages" : [
            ToolMessage(f"Updated username to {new_name}", tool_call_id=tool_call_id)
        ]
    })

>**Note:** Notice how we used: 
>   - `state.get()` to read the value
>   - a `Command` return value to update the state. Here we also need to append a `ToolMessage` to `messages`, otherwise it will error. To build it we injected the tool call ID with `Annotated[str, InjectedToolCallId]`, which is the correct approach. The message content itself could be as simple as `ToolMessage("Success", tool_call_id=tool_call_id)`, as the error suggests:
>
>   *Expected to have a matching ToolMessage in Command.update for tool 'update_username', got: []. Every tool call (LLM requesting to call a tool) in the message history MUST have a corresponding ToolMessage. You can fix it by modifying the tool to return `Command(update=[ToolMessage("Success", tool_call_id=tool_call_id), ...], ...)`*

### 3. Create the agent passing the custom `state_schema` 

In [4]:
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o"),
    tools=[update_username, get_internal_value],
    state_schema=CustomState
)

In [5]:
from langchain_core.messages import HumanMessage

initial_state = {
    "messages": [HumanMessage(content="Whats my username? Update it to Mario. What's the username now?")],
    "username": "Matteo",
    "remaining_steps": 15
}

print(agent.invoke(initial_state)["messages"][-1].content)

Your previous username was "Matteo". I have updated it to "Mario". The username now is "Mario".


## RAGV4 Example

How about a practical application? 

Let's correctly build RAGV4 using `InjectedState`. We will only build the data analyst.

In [6]:
# setup keys

import getpass
import os
from dotenv import load_dotenv

load_dotenv()


True

In [7]:
def _set_if_undefined(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"Please provide your {var}")


_set_if_undefined("OPENAI_API_KEY")
_set_if_undefined("ANTHROPIC_API_KEY")

In [8]:
from langchain_core.tools import tool, InjectedToolCallId
from langchain_core.messages import ToolMessage
import geopandas as gpd
import pandas as pd
import os
from pathlib import Path
from langgraph.prebuilt import InjectedState
from typing_extensions import Annotated
from typing import Dict, Union
from langchain_experimental.utilities import PythonREPL
from langgraph.types import Command
from langgraph.graph import MessagesState


DATASET_FOLDER = "./LLM_data"

# ----------------------
# Define state schema
# ----------------------

class DatasetState(MessagesState):
    loaded: Dict[str, Union[pd.DataFrame, gpd.GeoDataFrame]]  # will store either pd.DataFrame or gpd.GeoDataFrame
    descriptions: Dict[str, str]    # will store datasets descriptions
    remaining_steps: int

# ----------------------
# Tool: list datasets
# ----------------------
@tool
def list_loadable_datasets() -> str:
    """Lists all available parquet datasets in the dataset folder."""
    files = [f for f in os.listdir(DATASET_FOLDER) if f.endswith(".parquet")]
    return "\n".join(files) if files else "No parquet datasets found."

@tool
def list_inmemory_datasets(state: Annotated[DatasetState, InjectedState]) -> str:
    """Lists all loaded datasets and their type (DataFrame or GeoDataFrame)."""
    if not state["loaded"]:
        return "No loaded datasets in memory. Use list_loadable_datasets() to see available files."
    
    lines = []
    for name, df in state["loaded"].items():
        dtype = "GeoDataFrame" if isinstance(df, gpd.GeoDataFrame) else "DataFrame"
        lines.append(f"- {name}: {dtype} (shape={df.shape})")

    return "\n".join(lines)

# ----------------------
# Tool: load dataset
# ----------------------
@tool
def load_dataset_named(file_name: str, state: Annotated[DatasetState, InjectedState], tool_call_id: Annotated[str, InjectedToolCallId]) -> Command:
    """
    Loads a Parquet dataset and its description.
    If geometry is present, loads as GeoDataFrame.
    Updates state['loaded'][name] and state['descriptions'][name].
    """
    loaded = state.get('loaded')
    descriptions = state.get('descriptions')

    file_stem = Path(file_name).stem
    file_name = f"{file_stem}.parquet"
    path = Path(DATASET_FOLDER) / file_name

    if not path.exists():
        available_files = os.listdir(DATASET_FOLDER)
        return f"File '{file_name}' not found. Available files: {available_files}"

    # Load DataFrame
    try:
        df = pd.read_parquet(path)
        if "geometry" in df.columns:
            try:
                df = gpd.read_parquet(path)
            except Exception as geo_err:
                return f"Geometry column found but failed to load as GeoDataFrame: {geo_err}"
        loaded[file_stem] = df
    except Exception as e:
        return f"Error loading dataset '{file_name}': {e}"

    # Load description
    desc_path = Path(DATASET_FOLDER) / f"{file_stem}.txt"
    try:
        with open(desc_path, "r", encoding="utf-8") as f:
            raw_desc = f.read()
    except Exception:
        raw_desc = "Description file missing or unreadable."

    # Enrich and save
    dtype_str = type(df).__name__   # DataFrame or GeoDataFrame
    head_str = df.head().to_string(index=False)
    cols_str = ", ".join(df.columns)
    enriched_desc = f"{dtype_str}\n{raw_desc}\n\n---\nPreview (first rows):\n{head_str}\n\nColumns: {cols_str}"

    descriptions[file_stem] = enriched_desc

    return Command(update={
        "loaded" : loaded,
        "descriptions" : descriptions,
        "messages" : [
            ToolMessage(f"Updated state dictionaries with loaded[{file_stem}] and descriptions[{file_stem}]", tool_call_id=tool_call_id)
        ]
    })

>**Incredibly important:** As we saw above, **all state updates from tools must be made through `Command`**. If you try to update the state just by reassigning it inside the tool, the update will fail silently! 
>
>The correct way is to return a `Command` with the updates. Since I wanted to add entries to my existing dictionaries, I read them with `.get()`, added the new entry to each, and then passed the updated dictionaries in `Command`. 
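
The read, extend, return pattern from the note can be sketched without LangGraph at all. The helper below is hypothetical (`build_dataset_update` is not part of the notebook): it returns the plain dict that would be passed as `Command(update=...)`, and, as a design choice of this sketch, it builds new dictionaries instead of modifying the ones read from state.

```python
from typing import Any, Dict


def build_dataset_update(
    state: Dict[str, Any], name: str, dataset: Any, description: str
) -> Dict[str, Any]:
    """Return the payload for Command(update=...) without touching `state`."""
    # Read the current dictionaries, falling back to empty ones if unset.
    loaded = state.get("loaded") or {}
    descriptions = state.get("descriptions") or {}

    # Build new dictionaries: the existing entries plus the new one.
    return {
        "loaded": {**loaded, name: dataset},
        "descriptions": {**descriptions, name: description},
    }


# Plain lists stand in for DataFrames to keep the sketch dependency-free.
state = {"loaded": {"a": [1, 2]}, "descriptions": {"a": "first"}}
update = build_dataset_update(state, "b", [3], "second")

print(sorted(update["loaded"]))  # ['a', 'b']
print(sorted(state["loaded"]))   # ['a']
```

Inside a tool, the returned dict would be combined with the `ToolMessage` entry under `"messages"` before being handed to `Command`.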

In [9]:
# ----------------------
# Tool: python repl
# ----------------------
repl = PythonREPL()
# Now use the tool with your injected REPL
@tool
def python_repl_tool(
    code: Annotated[str, "The python code to execute"], state: Annotated[DatasetState, InjectedState]
):
    """
    Use this to execute python code. If you want to see the output of a value,
    you should print it out with `print(...)`. This is visible to the user.
    Datasets are available as variables named after the file stem (e.g. `neighborhoods`), and descriptions as a dict, e.g. `descriptions['neighborhoods']`.
    """

    for name, df in state["loaded"].items():
        repl.globals[name] = df

    # Inject descriptions as a dictionary
    repl.globals["descriptions"] = state["descriptions"]
    
    try:
        result = repl.run(code)
    except BaseException as e:
        return f"Failed to execute. Error: {repr(e)}"
    return f"Successfully executed:\n```python\n{code}\n```\nStdout: {result}"



>**Note for `python_repl_tool`:** a subtle pitfall is that, unless the docstring says so, the LLM will not print results in the REPL and therefore will not see the output of the code it writes. *So always specify it!*
>
>Also, in my experience Claude 4 follows this instruction better than gpt-4o. 
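
To see why the `print(...)` instruction matters, recall that the tool only reports `Stdout: {result}`. The sketch below is a simplified stand-in built on the standard library (`run_and_capture` is a hypothetical helper, not `PythonREPL` itself), under the assumption that the REPL returns captured stdout and nothing else.

```python
import contextlib
import io


def run_and_capture(code: str) -> str:
    """Execute code and return only what it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # fresh globals for each run
    return buffer.getvalue()


# Computing a value without printing it produces no visible output.
silent = run_and_capture("total = 1 + 1")
# The same computation followed by print() does.
printed = run_and_capture("total = 1 + 1\nprint(total)")

print(repr(silent))   # ''
print(repr(printed))  # '2\n'
```

If the model never prints, every tool result looks like the first case, and it has nothing to reason about.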

In [10]:
# Adjacent string literals are used instead of a backslash continuation,
# which would embed the source indentation in the prompt text.
prompt = (
    "You are a data analyst. Use your tools to explore and load datasets relevant to the task.\n"
    "The files you need to load are in the subdirectory at ./LLM_data\n\n"
    "Always check the description before working with a dataframe, in order to see what columns and values you are working with.\n"
    "You can also check the datasets loaded in memory with your list_inmemory_datasets() tool, "
    "and check available datasets to load with the list_loadable_datasets() tool."
)

In [11]:
from langgraph.prebuilt import create_react_agent

analyst_agent = create_react_agent(
    model="openai:gpt-4o",
    tools=[list_loadable_datasets, list_inmemory_datasets, load_dataset_named, python_repl_tool],
    prompt=prompt,
    name="data_analyst",
    state_schema=DatasetState
)

>**Note:** Don't forget `state_schema`, otherwise the graph won't know about the extra state keys (`loaded`, `descriptions`) that the tools rely on.

In [12]:
from langchain_core.messages import convert_to_messages


def pretty_print_message(message, indent=False):
    pretty_message = message.pretty_repr(html=True)
    if not indent:
        print(pretty_message)
        return

    indented = "\n".join("\t" + c for c in pretty_message.split("\n"))
    print(indented)


def pretty_print_messages(update, last_message=False):
    is_subgraph = False
    if isinstance(update, tuple):
        ns, update = update
        # skip parent graph updates in the printouts
        if len(ns) == 0:
            return

        graph_id = ns[-1].split(":")[0]
        print(f"Update from subgraph {graph_id}:")
        print("\n")
        is_subgraph = True

    for node_name, node_update in update.items():
        update_label = f"Update from node {node_name}:"
        if is_subgraph:
            update_label = "\t" + update_label

        print(update_label)
        print("\n")

        messages = convert_to_messages(node_update["messages"])
        if last_message:
            messages = messages[-1:]

        for m in messages:
            pretty_print_message(m, indent=is_subgraph)
        print("\n")

>**Final note:** invoking the agent with an initial input that does not initialize these values will error. This should probably be handled better, for example by giving the values a default or by making them `Optional` (which could make sense if some agents don't need datasets).
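
One possible way to tolerate missing dataset fields, sketched with a plain `TypedDict` so it runs without LangGraph. `OptionalDatasetFields` and `get_loaded` are hypothetical names, and I have not verified how the real `DatasetState` behaves with optional keys, so treat this as an idea rather than a tested fix.

```python
from typing import Any, Dict, TypedDict


class OptionalDatasetFields(TypedDict, total=False):
    # total=False marks every key as optional for type checkers.
    loaded: Dict[str, Any]
    descriptions: Dict[str, str]


def get_loaded(state: OptionalDatasetFields) -> Dict[str, Any]:
    """Return the loaded datasets, or an empty dict if the key was never set."""
    return state.get("loaded") or {}


empty_state: OptionalDatasetFields = {}
filled_state: OptionalDatasetFields = {"loaded": {"neighborhoods": "df"}}

print(get_loaded(empty_state))   # {}
print(get_loaded(filled_state))  # {'neighborhoods': 'df'}
```

The same defensive read (`state.get(...) or {}`) could replace direct indexing such as `state["loaded"]` inside the tools.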

In [15]:
from langchain_core.messages import HumanMessage

initial_state_2 = {
    "messages": [HumanMessage(content="Load a dataset of your choice, read its description, then load another. Then list the datasets you have in memory.")],
    "remaining_steps": 22,
    "loaded": {},
    "descriptions": {}
}

for chunk in analyst_agent.stream(initial_state_2):
    pretty_print_messages(chunk)

Update from node agent:


Name: data_analyst
Tool Calls:
  list_loadable_datasets (call_59o3GIDynOIRFWMX8OuaUxE7)
 Call ID: call_59o3GIDynOIRFWMX8OuaUxE7
  Args:
  list_inmemory_datasets (call_E6XjxCCL8Pz4dpYpxcToR2P9)
 Call ID: call_E6XjxCCL8Pz4dpYpxcToR2P9
  Args:


Update from node tools:


Name: list_loadable_datasets

neighborhoods.parquet
public_bathrooms.parquet
median_income_by_statistical_area.parquet
neighborhood_residents_data_1986to2024.parquet
neighborhood_socio_demographic_data_lastupdated2019.parquet
statistical_zones.parquet
pharmacies.parquet
points_of_interest.parquet


Update from node tools:


Name: list_inmemory_datasets

No loaded datasets in memory. Use list_loadable_datasets() to see available files.


Update from node agent:


Name: data_analyst
Tool Calls:
  load_dataset_named (call_dBFl381YyEbjKKXTzcZM5kdz)
 Call ID: call_dBFl381YyEbjKKXTzcZM5kdz
  Args:
    file_name: neighborhoods.parquet
  load_dataset_named (call_sFYxEDLPv8RZPXUUn2aCLhbA)
 Call ID: call_s

InvalidUpdateError: At key 'loaded': Can receive only one value per step. Use an Annotated key to handle multiple values.
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/INVALID_CONCURRENT_GRAPH_UPDATE

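For this error, see [this LangGraph discussion](https://github.com/langchain-ai/langgraph/discussions/1787): the model requested two `load_dataset_named` calls in the same step, so both tool calls tried to write the `loaded` key in parallel, and without a reducer the graph cannot decide how to combine the two values. As the error message suggests, the fix should be simple: annotate the key so that multiple updates can be merged. See also [this Reddit thread on state reducers](https://www.reddit.com/r/LangChain/comments/1hxt5t7/help_me_understand_state_reducers_in_langgraph/).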
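
A minimal sketch of what such an "Annotated key" could look like. `merge_dicts` and `ReducedDatasetState` are hypothetical names, and a plain `TypedDict` stands in for `MessagesState` so the snippet runs on its own. The loop only simulates how two updates arriving in the same step would be folded together; I have not run this against the agent above.

```python
from typing import Annotated, Any, Dict, Optional, TypedDict


def merge_dicts(
    left: Optional[Dict[str, Any]], right: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Reducer: combine the current value with one incoming update."""
    return {**(left or {}), **(right or {})}


class ReducedDatasetState(TypedDict):
    # The second Annotated argument is the reducer applied to this key.
    loaded: Annotated[Dict[str, Any], merge_dicts]
    descriptions: Annotated[Dict[str, str], merge_dicts]


# Simulate two tool calls writing to 'loaded' in the same step.
current: Dict[str, Any] = {}
updates = [{"neighborhoods": "df_1"}, {"pharmacies": "df_2"}]
for update in updates:
    current = merge_dicts(current, update)

print(current)  # {'neighborhoods': 'df_1', 'pharmacies': 'df_2'}
```

With a reducer like this, each tool call would presumably return only its own entry (for example `{"loaded": {file_stem: df}}`) rather than the whole dictionary, since the reducer takes care of merging.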