# Implementing the `write_report()` tool and testing it out

## Brainstorming



We want the model to be able to produce a report of the analysis performed. 

What it should do:

1. *Get the full chat history in order to see all relevant analysis performed*
    
    **challenges:** 

    - If the chat gets summarized, the details of previous analyses are lost. We need the detailed history, but at the same time we cannot input more than a certain number of tokens to the model that produces the analysis.
    
    **solutions:**

    - We need to keep the full chat state somewhere when we summarize. Maybe store it in state: it could be a long `analysis` str, or a virtual file in the virtual file system. The `write_report()` tool could then split the full history into sections so as not to exceed the context length.

2. *Collect all sources used - meaning, the list of all datasets analyzed*
    
    **challenges:**

    - We have to keep track of all used datasets.
    
    **solutions:**
    
    - Just add a state var `sources`; it gets filled every time we **select** a dataset with the `select_dataset` tool. The model that writes the report can then cite the source throughout the report, understanding when it was used from chat history.

3. *Produce an extensive report (.md format probably)*
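
The splitting idea from point 1 could look roughly like the sketch below. It is only an illustration: `split_history` is a hypothetical helper, messages are assumed to be plain strings, and the default token estimate (about 4 characters per token) is a rough assumption that should be replaced by a real tokenizer.

```python
from typing import Callable, Optional


def split_history(
    messages: list[str],
    max_tokens: int,
    count_tokens: Optional[Callable[[str], int]] = None,
) -> list[list[str]]:
    """Group consecutive messages into chunks that stay within max_tokens.

    A single message larger than max_tokens still gets its own chunk,
    so the caller has to handle (e.g. truncate) oversized messages.
    """
    if count_tokens is None:
        # Assumption: roughly 4 characters per token. Not a real tokenizer.
        count_tokens = lambda text: max(1, len(text) // 4)

    chunks: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0

    for message in messages:
        n = count_tokens(message)
        # Close the current chunk if adding this message would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(message)
        current_tokens += n

    if current:
        chunks.append(current)
    return chunks


# Three messages of about 10 estimated tokens each, with a budget of 20 per chunk.
example_chunks = split_history(["a" * 40, "b" * 40, "c" * 40], max_tokens=20)
```

Each chunk could then be passed to the report-writing model separately, with the partial results merged afterwards.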

----

> **? Should the report include Python code ?**
>
> It could be useful to include in the report, or somewhere else, all the Python code written during the analysis.
> It could be a section/appendix in the report, or a separate file that the user can ask for.
> Or simply another state str that gets filled every time the code exec runs; we then manually combine it into an appendix in the report. I just don't want it to be 3000 lines at the end of the report, so maybe they should be separate files.

> *DB Related:* This extra stuff needs to remain visible to the user when they go back to previous chats, so store it in the DB.
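
As a rough sketch of the "separate appendix" option, the collected code blocks could be rendered into their own markdown string and written to a separate file. The helper name `render_code_appendix` is hypothetical, and the block format (dicts with `input` and `output` keys) is assumed to match what the fake `code_exec` tool later in this notebook stores in state.

```python
def render_code_appendix(code_blocks: list[dict[str, str]]) -> str:
    """Render executed code blocks and their output as a markdown appendix."""
    fence = "`" * 3  # built at runtime so this example does not break its own fence
    parts = ["## Appendix: code"]

    for i, block in enumerate(code_blocks, start=1):
        parts.append(f"### Block {i}")
        parts.append(f"{fence}python\n{block['input']}\n{fence}")
        parts.append(f"Output:\n\n{fence}\n{block['output']}\n{fence}")

    return "\n\n".join(parts) + "\n"


# Example with a single fake code block.
appendix = render_code_appendix([{"input": "print(1)", "output": "1"}])
```

Keeping the appendix as its own string means it can be saved to a separate file instead of being appended to the end of the report.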


## Implementation

Let's replicate the graph below, but simulate the analysis in some way...

We can 

1. create a new state with the new state vars we need
2. give the model a simpler repl tool that can still generate python code and get stdout and stderr - or make it fake
3. give the model a fake select ds tool that adds the sources to state var
4. create the real write_report() tool (and a nice prompt) for the model to use all the rest. 
5. make it write to file so we can read it.   

### 1. State Vars

In [None]:
from langgraph.prebuilt.chat_agent_executor import AgentState
from typing import Annotated

def list_add(left: list | None = None, right: list | None = None) -> list:
    """
    Reducer that appends new items to a list. Used for sources, code and report.
    Both sides must be lists, so every update has to be wrapped in a list.
    """
    if left is None:
        left = []
    if right is None:
        right = []

    return left + right

class MyState(AgentState):
    # here we would have also summary and token count
    sources: Annotated[list[dict[str, str]], list_add]  # list of dicts: key is the dataset id, value is the dataset description
    code: Annotated[list[dict[str, str]], list_add]  # list of dicts: each dict is the input and output of a code block (output can be stdout or stderr)
    report: Annotated[list[dict[str, str]], list_add]  # list of dicts: key is the title, value is the content (there could be more than one report)

Will these reducers work well with left + right if they are dicts?

### 2. & 3. Fake tools

In [None]:
from langchain_core.tools import InjectedToolCallId, tool
from langchain_core.messages import ToolMessage
from langgraph.types import Command


@tool
def code_exec(
    code: Annotated[str, "The python code to execute to generate your chart."],
    tool_call_id: Annotated[str, InjectedToolCallId]
)->Command: 
    """Use this to execute python code."""

    result = """# code executed successfully"""  # this would be stdout
    code_dict = {"input": code, "output": result}

    return Command(
        update = {
            "messages" : [ToolMessage(content = result, tool_call_id = tool_call_id)],
            "code": [code_dict], # wrap in list for reducer
        }
    )

@tool
def list_datasets(
    tool_call_id: Annotated[str, InjectedToolCallId]
)->Command:
    """
    List all available datasets.
    """

    fake_ds = ["dataset1", "dataset2", "dataset3"]

    return Command(
        update = {
            "messages" : [ToolMessage(content = f"Here are the available datasets: {fake_ds}", tool_call_id = tool_call_id)],
        }
    )

@tool 
def select_dataset(
    dataset_id: str,
    tool_call_id: Annotated[str, InjectedToolCallId]
)->Command:
    """
    Select a dataset from the list of available datasets. Adds it to the list of sources.
    """
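    return Command(
        update = {
            "messages" : [ToolMessage(content = f"Selected and loaded dataset {dataset_id}", tool_call_id = tool_call_id)],
            # MyState.sources is a list of {dataset_id: description} dicts;
            # the description is a placeholder because this tool is fake
            "sources" : [{dataset_id: "placeholder description (fake dataset)"}]
        }
    )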

### 4. & 5. Write report tool

In [None]:
from langgraph.prebuilt import InjectedState

@tool
def write_report(
    report_title: Annotated[str, "The title of the report"],
    report_content: Annotated[str, "The content of the report"],
    state: Annotated[MyState, InjectedState],  # type first, InjectedState marker second
    tool_call_id: Annotated[str, InjectedToolCallId]
)->Command:
    """
    Write a report of the analysis performed.
    """

    report_dict = {report_title: report_content}

    # also write to file in dev
    with open("report.md", "a") as f:
        f.write(f"# {report_title}\n{report_content}\n\n")

    return Command(
        update = {
            "messages" : [ToolMessage(content = "Report written", tool_call_id = tool_call_id)],
            "report" : [report_dict]
        }
    )

# we should probably make a modify_report() tool that can be used to add to or modify an existing report


How should I make it add sources? First write the report, then do another pass for sources? A `read_sources` tool?
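
One possible option, sketched under assumptions: since `write_report()` already receives the state, it could build a "Sources" section from `state["sources"]` itself and append it to the report content, with no extra model pass. The helper `format_sources` is hypothetical and assumes each entry is a `{dataset_id: description}` dict, as described in the `MyState` comments.

```python
def format_sources(sources: list[dict[str, str]]) -> str:
    """Build a markdown 'Sources' section, listing each dataset id only once."""
    unique: dict[str, str] = {}
    for entry in sources:
        for dataset_id, description in entry.items():
            # Keep the first description seen for a dataset selected several times.
            unique.setdefault(dataset_id, description)

    if not unique:
        return ""

    lines = ["## Sources", ""]
    lines += [f"- `{dataset_id}`: {description}" for dataset_id, description in unique.items()]
    return "\n".join(lines) + "\n"


# Example: dataset1 was selected twice during the chat.
sources_section = format_sources([
    {"dataset1": "first fake dataset"},
    {"dataset1": "first fake dataset"},
    {"dataset2": "second fake dataset"},
])
```

This only lists the sources at the end of the report; citing them inline would still require giving the list to the model that writes the report.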

### 6. Create a well rounded prompt

In [None]:
report_prompt = """
You are an AI assistant that works together with a data analyst colleague.

After the data analyst performs an analysis, you will be asked to write a report of the analysis performed.

You will be given a list of sources used in the analysis.
"""