# Project 2 - Building a LangGraph Model - Document Analysis Graph


### Follow the instructions 
Here are [instructions](https://huggingface.co/learn/agents-course/unit2/langgraph/first_graph) for the tutorial that are helpful for this notebook.


### Description:
This project uses `langgraph`, a library that provides a framework for developing your agents with ease.

This notebook focuses on building an agent.

### The Problem this Notebook Attempts to Solve

Let’s create a document analysis system using LangGraph to serve Mr. Wayne’s needs. This system can:

1. Process image documents

2. Extract text using vision models (Vision Language Model)

3. Perform calculations when needed (to demonstrate normal tools)

4. Analyze content and provide concise summaries

5. Execute specific instructions related to documents


### For this course, I am using 


See also the [GitHub code](https://github.com/huggingface/agents-course).

Specifically, I look at the [course notebook](https://huggingface.co/agents-course/notebooks/blob/main/unit2/langgraph/agent.ipynb)


## Load Imports

`LangGraph` provides the graph structure, while `LangChain` offers convenient interfaces for working with LLMs.

In [None]:
import base64
import os
from typing import List, TypedDict, Annotated, Optional

from langchain_openai import ChatOpenAI
from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage
from langgraph.graph.message import add_messages
from langgraph.graph import START, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition
from IPython.display import Image, display

### This notebook relies on the idea of (Reason-Act-Observe) - `ReAct`

As seen in Unit 1, an agent works in three steps, as introduced in [ReAct](https://react-lm.github.io/), a general agent architecture:

* `reason` - let the model reason about the documents, the request, and any tool output to decide what to do next (e.g., call another tool or respond directly)

* `act` - let the model call specific tools that are appropriate for the task

* `observe` - pass the tool output back to the model

* `repeat` - loop as necessary until the request has been fully addressed



<p>
  <img alt="Langgraph ReAct Figure" src="/imgs/langgraph_react.png"/>
</p>
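
The loop described above can be sketched in plain Python before any graph is built. This is only an illustration: `fake_model`, `TOOLS`, and `react_loop` are hypothetical names, and `fake_model` is a hard-coded stand-in for an LLM, not a real model call.

```python
from typing import Callable, Dict, List, Optional, Tuple


def divide(a: int, b: int) -> float:
    """Toy tool; mirrors the `divide` tool defined later in this notebook."""
    return a / b


# Hypothetical tool registry: tool name -> callable.
TOOLS: Dict[str, Callable[..., float]] = {"divide": divide}


def fake_model(history: List[str]) -> Tuple[Optional[str], dict, str]:
    """Stand-in for an LLM (an assumption for illustration only).

    Returns (tool_name, tool_args, answer). tool_name is None when the
    "model" decides to answer directly instead of calling a tool.
    """
    # Reason: if no tool output has been observed yet, request a tool call.
    if not any(line.startswith("observation: ") for line in history):
        return "divide", {"a": 6790, "b": 5}, ""
    # Otherwise, answer using the most recent observation.
    last_observation = history[-1].split(": ", 1)[1]
    return None, {}, f"The result is {last_observation}."


def react_loop(question: str, max_steps: int = 5) -> str:
    history = [f"question: {question}"]
    for _ in range(max_steps):
        tool_name, tool_args, answer = fake_model(history)  # reason
        if tool_name is None:
            return answer  # no tool requested: the loop ends
        result = TOOLS[tool_name](**tool_args)  # act
        history.append(f"observation: {result}")  # observe
    return "Stopped: step limit reached."


print(react_loop("Divide 6790 by 5"))
```

The graph built later in this notebook expresses the same control flow with nodes and edges instead of a `for` loop.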





### Preparing Tools


In [None]:
# Set your OpenAI API key here
os.environ["OPENAI_API_KEY"] = "sk-xxxxx"  # Replace with your actual API key

# Initialize our vision LLM to handle image documents
vision_llm = ChatOpenAI(model="gpt-4o", temperature=0)  # temperature=0 keeps the output as deterministic as possible

# Initialize the text LLM
llm = ChatOpenAI(model="gpt-4o")

In [None]:
def extract_text(img_path: str) -> str:
    """
    Extract text from an image file using a multimodal model.

    Args:
        img_path: A local image file path (string).

    Returns:
        A single string containing the text extracted from the image.
    """
    all_text = ""
    try:

        # Read image and encode as base64
        with open(img_path, "rb") as image_file:
            image_bytes = image_file.read()

        image_base64 = base64.b64encode(image_bytes).decode("utf-8")

        # Prepare the prompt including the base64 image data
        message = [
            HumanMessage(
                content=[
                    {
                        "type": "text",
                        "text": (
                            "Extract all the text from this image. "
                            "Return only the extracted text, no explanations."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{image_base64}"
                        },
                    },
                ]
            )
        ]

        # Call the vision-capable model
        response = vision_llm.invoke(message)

        # Append extracted text
        all_text += response.content + "\n\n"

        return all_text.strip()
    except Exception as e:
        # You can choose whether to raise or just return an empty string / error message
        error_msg = f"Error extracting text: {str(e)}"
        print(error_msg)
        return ""



def divide(a: int, b: int) -> float:
    """Divide a by b."""
    return a / b


tools = [
    divide,
    extract_text
]
llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=False)

### Create LLM and Prompt It with the Overall Desired Agent Behavior

#### This Agent's States are More Complex
This state is a little more complex than the previous ones we have seen. **AnyMessage** is a LangChain type that covers the different kinds of messages, and **add_messages** is an operator that appends new messages to the existing list rather than overwriting the list with the latest value.

This is a new concept in `LangGraph`: you can attach operators to the keys of your state to define how an update is combined with the value already stored there.
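
To make the idea of an operator attached to a state key concrete, here is a minimal standard-library sketch. It is not how `LangGraph` is implemented: `ToyState` and `apply_update` are hypothetical names, and `operator.add` (list concatenation) stands in for `add_messages`, which does more than plain concatenation (for example, it can update an existing message that has the same ID).

```python
import operator
from typing import Annotated, Optional, TypedDict, get_type_hints


class ToyState(TypedDict):
    input_file: Optional[str]                # plain key: an update overwrites it
    messages: Annotated[list, operator.add]  # annotated key: an update is combined


def apply_update(state: dict, update: dict, schema: type) -> dict:
    """Toy merge: apply the operator found in the Annotated metadata, if any."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)  # combine old and new
        else:
            merged[key] = value  # no operator: overwrite
    return merged


state = {"input_file": None, "messages": ["Divide 6790 by 5"]}
state = apply_update(
    state, {"messages": ["1358.0"], "input_file": "note.png"}, ToyState
)
print(state)
```

After the update, `messages` holds both entries while `input_file` has simply been replaced, which is the behavior the `messages` key of `AgentState` relies on.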

In [None]:
class AgentState(TypedDict):
    # The input document
    input_file: Optional[str]  # Contains file path, type (PNG)
    messages: Annotated[list[AnyMessage], add_messages]

###  Create the Nodes 


##### First Define Assistant

In [None]:
def assistant(state: AgentState):
    # System message
    textual_description_of_tool = """
extract_text(img_path: str) -> str:
    Extract text from an image file using a multimodal model.

    Args:
        img_path: A local image file path (string).

    Returns:
        A single string containing the text extracted from the image.
divide(a: int, b: int) -> float:
    Divide a by b
"""
    image = state["input_file"]
    sys_msg = SystemMessage(content=f"You are a helpful agent that can analyse some images and run some computations with the provided tools:\n{textual_description_of_tool}\nYou have access to some optional images. Currently the loaded image is: {image}")

    return {"messages": [llm_with_tools.invoke([sys_msg] + state["messages"])], "input_file": state["input_file"]}

### Building Nodes

We define a **tools** node with our list of tools.

The **assistant** node is just our model with bound tools.

We create a graph with **assistant** and **tools** nodes.

We add a **tools_condition** edge, which routes to **END** or to **tools** based on whether the **assistant** calls a tool.

Now, we add one new step:

We connect the **tools** node back to the **assistant**, forming a loop.

* After the **assistant** node executes, **tools_condition** checks if the model's output is a tool call.

* If it is a tool call, the flow is directed to the **tools** node.

* The **tools** node connects back to **assistant**.

* This loop continues as long as the model decides to call tools.

* If the model response is not a tool call, the flow is directed to **END**, terminating the process.
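
The routing decision in the list above can be sketched as a small hand-written function. This is a simplified stand-in for `tools_condition`, not its actual source: `FakeAIMessage` and `route_after_assistant` are hypothetical names, and the returned strings `"tools"` and `"__end__"` are assumed labels for the **tools** node and **END**.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for a model reply that may request tool calls."""
    content: str = ""
    tool_calls: List[dict] = field(default_factory=list)


def route_after_assistant(state: dict) -> str:
    """Pick the next node by inspecting the latest message in the state."""
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "tools"    # the model asked for a tool: run it, then loop back
    return "__end__"      # plain answer: stop


wants_tool = FakeAIMessage(tool_calls=[{"name": "divide", "args": {"a": 6790, "b": 5}}])
plain_answer = FakeAIMessage(content="The result is 1358.0.")

print(route_after_assistant({"messages": [wants_tool]}))
print(route_after_assistant({"messages": [wants_tool, plain_answer]}))
```

Only the last message is inspected, so the loop ends as soon as the model replies without requesting a tool.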




In [None]:
# Graph
builder = StateGraph(AgentState)

# Define nodes: these do the work
builder.add_node("assistant", assistant)
builder.add_node("tools", ToolNode(tools))

# Define edges: these determine how the control flow moves
builder.add_edge(START, "assistant")
builder.add_conditional_edges(
    "assistant",
    # If the latest message (result) from assistant is a tool call -> tools_condition routes to tools
    # If the latest message (result) from assistant is not a tool call -> tools_condition routes to END
    tools_condition,
)
builder.add_edge("tools", "assistant")
react_graph = builder.compile()

# Show
display(Image(react_graph.get_graph(xray=True).draw_mermaid_png()))

#### The Butler LangGraph Agentic Assistant in Action

Here is an example to show a simple use case of an agent using a tool in LangGraph.

In [None]:
messages = [HumanMessage(content="Divide 6790 by 5")]

messages = react_graph.invoke({"messages": messages, "input_file": None})

In [None]:
for m in messages['messages']:
    m.pretty_print()

The results:

`Human`: Divide 6790 by 5

`AI Tool Call`: divide(a=6790, b=5)

`Tool Response`: 1358.0

`Alfred`: The result of dividing 6790 by 5 is 1358.0.

---
---
# Example Building Out a Training Program

The figure `Batman_training_and_meals.png` is available [here](https://huggingface.co/datasets/agents-course/course-images/blob/main/en/unit2/LangGraph/Batman_training_and_meals.png).


When Master Wayne leaves his training and meal notes:

In [None]:
messages = [HumanMessage(content="According to the note provided by Mr. Wayne in the provided images. What's the list of items I should buy for the dinner menu?")]

messages = react_graph.invoke({"messages": messages, "input_file": "Batman_training_and_meals.png"})

In [None]:
for m in messages['messages']:
    m.pretty_print()

The Results:

`Human`: According to the note provided by Mr. Wayne in the provided images. What's the list of items I should buy for the dinner menu?

`AI Tool Call`: extract_text(img_path="Batman_training_and_meals.png")

`Tool Response`: [Extracted text with training schedule and menu details]

`Alfred`: For the dinner menu, you should buy the following items:

1. Grass-fed local sirloin steak

2. Organic spinach

3. Piquillo peppers

4. Potatoes (for oven-baked golden herb potato)

5. Fish oil (2 grams)


Ensure the steak is grass-fed and the spinach and peppers are organic for the best quality meal.

## Key Takeaways

Should you wish to create your own document analysis butler, here are key considerations:

* Define clear tools for specific document-related tasks

* Create a robust state tracker to maintain context between tool calls

* Consider error handling for tool failures

* Maintain contextual awareness of previous interactions (ensured by the `add_messages` operator)

With these principles, you too can provide exemplary document analysis service worthy of Wayne Manor.
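
As an illustration of the error-handling consideration above, a tool can return a readable error description instead of raising, so the model can observe the failure and decide what to do next. This is a minimal sketch: `safe_divide` is a hypothetical variant of the `divide` tool, and returning a string (rather than a float) is a design choice made here for illustration.

```python
def safe_divide(a: float, b: float) -> str:
    """Divide a by b, returning an error description instead of raising."""
    try:
        return str(a / b)
    except ZeroDivisionError:
        # The message is passed back to the model as the tool output.
        return "Error: division by zero is undefined. Ask for a non-zero divisor."


print(safe_divide(6790, 5))
print(safe_divide(1, 0))
```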