<a href="https://colab.research.google.com/github/sugarforever/LangChain-Advanced/blob/main/claude_3_xml_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Claude 3 XML Agent

## Why Use an Agent?

The core idea of agents is to use a language model to choose a sequence of actions to take.

## What Is an XML Agent?

XML tags are a powerful tool for structuring prompts and guiding Claude's responses. Claude is particularly familiar with prompts that have XML tags as Claude was exposed to such prompts during training. By wrapping key parts of your prompt (such as instructions, examples, or input data) in XML tags, you can help Claude better understand the context and generate more accurate outputs. This technique is especially useful when working with complex prompts or variable inputs.

An agent built on Claude 3 models that structures its prompts and tool calls with XML tags can be called an XML agent.
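
As a minimal illustration of the idea above, the sketch below wraps each part of a prompt in its own XML tag. The helper names (`wrap`, `build_prompt`) and the tag names (`instructions`, `document`, `question`) are arbitrary choices made for this example, not part of any library.

```python
def wrap(tag: str, content: str) -> str:
    """Wrap content in a matching pair of XML-style tags."""
    return f"<{tag}>\n{content.strip()}\n</{tag}>"


def build_prompt(instructions: str, document: str, question: str) -> str:
    """Assemble a prompt whose parts are clearly delimited by tags."""
    sections = [
        wrap("instructions", instructions),
        wrap("document", document),
        wrap("question", question),
    ]
    return "\n\n".join(sections)


# Hypothetical sample inputs, used only to show the resulting layout.
prompt_text = build_prompt(
    instructions="Answer using only the document. Put the answer in <answer> tags.",
    document="AesopAgent is an agent-driven system for story-to-video production.",
    question="What does AesopAgent produce?",
)
print(prompt_text)
```

Keeping each input in its own tag makes it easy to swap the document or question without touching the instructions.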

## Why Use XML Tags?

Please refer to the following user guide for more details:

https://docs.anthropic.com/claude/docs/use-xml-tags#why-use-xml-tags

## About This Tutorial

In this tutorial, we will build an XML agent on Claude 3 with the following tools:

1. Claude 3 (Opus or Sonnet)
2. LangChain
3. Faiss
4. PyPDF

In [1]:
!pip install -q -U langchain langchainhub langchain-anthropic langchain-openai faiss-cpu pypdf

In [10]:
!wget -O aesopagent.pdf https://arxiv.org/pdf/2403.07952.pdf

--2024-03-18 21:24:20--  https://arxiv.org/pdf/2403.07952.pdf
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.67.42, 151.101.195.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2024-03-18 21:24:20 ERROR 403: Forbidden.



In [3]:
import os
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["ANTHROPIC_API_KEY"] = userdata.get('ANTHROPIC_API_KEY')

In [4]:
from langchain_community.document_loaders.pdf import PyPDFLoader
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings

In [11]:
splits = PyPDFLoader("./aesopagent.pdf").load_and_split()

In [12]:
vectorstore = FAISS.from_documents(splits, OpenAIEmbeddings())

In [13]:
from langchain.agents import tool

@tool
def aesop_agent_search(query: str) -> str:
    "Use this tool when answering questions about Aesop Agent"

    similar_docs = vectorstore.similarity_search(query)
    results_str = "\n\n".join(
        [doc.page_content for doc in similar_docs]
    )
    return results_str

tools = [aesop_agent_search]

In [52]:
print(
    aesop_agent_search.run(tool_input={"query": "What is Aesop Agent?"})
)

AesopAgent: Agent-driven Evolutionary System on
Story-to-Video Production
Jiuniu Wang∗Zehua Du∗Yuyuan Zhao∗Bo Yuan Kexiang Wang Jian Liang
Yaxi Zhao Yihen Lu Gengliang Li Junlong Gao Xin Tu Zhenyu Guo†
DAMO Academy, Alibaba Group
Abstract
The Agent and AIGC (Artificial Intelligence Generated Content) technologies have
recently made significant progress. We propose Aesop Agent, an Agent-driven
Evolutionary System on Story-to-Video Production. AesopAgent is a practical
application of agent technology for multimodal content generation. The system
integrates multiple generative capabilities within a unified framework, so that
individual users can leverage these modules easily. This innovative system would
convert user story proposals into scripts, images, and audio, and then integrate
these multimodal contents into videos. Additionally, the animating units (e.g.,
Gen-2 and Sora) could make the videos more infectious. The AesopAgent system
could orchestrate task workflow for video generatio

In [14]:
from langchain import hub

prompt = hub.pull("hwchase17/xml-agent-convo")
prompt

ChatPromptTemplate(input_variables=['agent_scratchpad', 'input', 'tools'], partial_variables={'chat_history': ''}, metadata={'lc_hub_owner': 'hwchase17', 'lc_hub_repo': 'xml-agent-convo', 'lc_hub_commit_hash': '00f6b7470fa25a24eef6e4e3c1e44ba07189f3e91c4d987223ad232490673be8'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['agent_scratchpad', 'chat_history', 'input', 'tools'], template="You are a helpful assistant. Help the user answer any questions.\n\nYou have access to the following tools:\n\n{tools}\n\nIn order to use a tool, you can use <tool></tool> and <tool_input></tool_input> tags. You will then get back a response in the form <observation></observation>\nFor example, if you have a tool called 'search' that could run a google search, in order to search for the weather in SF you would respond:\n\n<tool>search</tool><tool_input>weather in SF</tool_input>\n<observation>64 degrees</observation>\n\nWhen you are done, respond with a final answer between

In [15]:
from langchain_anthropic import ChatAnthropic

# Claude 3 model family:
# - Opus: most powerful model, delivering state-of-the-art performance on highly complex tasks and demonstrating fluency and human-like understanding
# - Sonnet: most balanced model between intelligence and speed, a great choice for enterprise workloads and scaled AI deployments
# - Haiku: fastest and most compact model, designed for near-instant responsiveness and seamless AI experiences that mimic human interactions
llm = ChatAnthropic(
    model_name="claude-3-sonnet-20240229",
    temperature=0.0
)

In [None]:
llm.invoke("Who developed Aesop Agent")

In [16]:
def convert_intermediate_steps(intermediate_steps):
    log = ""
    for action, observation in intermediate_steps:
        log += (
            f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
            f"</tool_input><observation>{observation}</observation>"
        )
    return log

In [17]:
def convert_tools(tools):
    return "\n".join([f"{tool.name}: {tool.description}" for tool in tools])
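
To see what the two helpers above produce, here is a standalone sketch. It repeats their bodies so that it runs by itself, and it uses `SimpleNamespace` objects as stand-ins for LangChain's action and tool objects (an assumption for illustration: the helpers only read the `.tool`, `.tool_input`, `.name`, and `.description` attributes). The sample values mirror the weather example in the hub prompt.

```python
from types import SimpleNamespace


def convert_intermediate_steps(intermediate_steps):
    log = ""
    for action, observation in intermediate_steps:
        log += (
            f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
            f"</tool_input><observation>{observation}</observation>"
        )
    return log


def convert_tools(tools):
    return "\n".join(f"{tool.name}: {tool.description}" for tool in tools)


# Stand-ins (assumptions for illustration, not LangChain classes).
fake_action = SimpleNamespace(tool="search", tool_input="weather in SF")
fake_tool = SimpleNamespace(name="search", description="Run a web search")

scratchpad = convert_intermediate_steps([(fake_action, "64 degrees")])
tool_list = convert_tools([fake_tool])

print(scratchpad)
# <tool>search</tool><tool_input>weather in SF</tool_input><observation>64 degrees</observation>
print(tool_list)
# search: Run a web search
```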

In [18]:
from langchain.agents.output_parsers import XMLAgentOutputParser

agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: convert_intermediate_steps(
            x["intermediate_steps"]
        ),
    }
    | prompt.partial(tools=convert_tools(tools))
    | llm.bind(stop=["</tool_input>", "</final_answer>"])
    | XMLAgentOutputParser()
)
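
Because generation stops at `</tool_input>` or `</final_answer>`, the model's text usually arrives without the closing tag (the `log` field in the intermediate steps printed at the end of this notebook shows this). The sketch below is a simplified, standard-library stand-in for what an XML output parser has to do with such text. It is not the source of `XMLAgentOutputParser`; the function name `parse_xml_agent_output` and its dictionary return shape are hypothetical.

```python
def parse_xml_agent_output(text: str) -> dict:
    """Classify model text as a tool call or a final answer.

    Closing tags may be missing because generation stopped on them.
    """
    if "<final_answer>" in text:
        answer = text.split("<final_answer>", 1)[1]
        answer = answer.split("</final_answer>", 1)[0]
        return {"type": "finish", "output": answer.strip()}

    if "<tool>" in text and "</tool>" in text:
        tool_name = text.split("<tool>", 1)[1].split("</tool>", 1)[0]
        tool_input = ""
        if "<tool_input>" in text:
            tool_input = text.split("<tool_input>", 1)[1]
            tool_input = tool_input.split("</tool_input>", 1)[0]
        return {
            "type": "action",
            "tool": tool_name.strip(),
            "tool_input": tool_input.strip(),
        }

    raise ValueError(f"Could not parse agent output: {text!r}")


# Text shaped like the agent log in this notebook (no closing </tool_input>).
raw = (
    "To find information about who developed Aesop Agent, "
    "I will use the provided tool:\n\n"
    "<tool>aesop_agent_search</tool>\n"
    "<tool_input>Who developed Aesop Agent?"
)
print(parse_xml_agent_output(raw))
print(parse_xml_agent_output("<final_answer>DAMO Academy, Alibaba Group"))
```

Stopping on the closing tags keeps the model from inventing its own `<observation>` after a tool call, so the parser only needs to tolerate the missing tag.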

In [26]:
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

# conversational memory
conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,
    return_messages=True
)

In [20]:
from langchain_core.messages.human import HumanMessage

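def memory2str(memory: ConversationBufferWindowMemory) -> str:
    # chat_memory.messages holds the full history, so apply the window
    # (last k exchanges = 2 * k messages) here before formatting.
    messages = memory.chat_memory.messages
    window = messages[-2 * memory.k:] if memory.k > 0 else []
    return "\n".join(
        f"Human: {mem.content}" if isinstance(mem, HumanMessage)
        else f"AI: {mem.content}"
        for mem in window
    )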
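
The helper above turns message objects into a plain `Human:` / `AI:` transcript for the `{chat_history}` slot of the prompt. The sketch below reproduces that formatting with stand-in message classes (`FakeHuman` and `FakeAI` are hypothetical, not LangChain types) and keeps only the last `k` exchanges, which is the intent behind `k=5` in the memory configuration.

```python
from dataclasses import dataclass


@dataclass
class FakeHuman:
    content: str


@dataclass
class FakeAI:
    content: str


def transcript(messages, k: int) -> str:
    """Render the last k human/AI exchanges as 'Human: ...' / 'AI: ...' lines."""
    window = messages[-2 * k:] if k > 0 else []
    return "\n".join(
        f"Human: {m.content}" if isinstance(m, FakeHuman) else f"AI: {m.content}"
        for m in window
    )


# Hypothetical two-turn history, used only to show the windowing.
history = [
    FakeHuman("Who developed Aesop Agent?"),
    FakeAI("Researchers at DAMO Academy, Alibaba Group."),
    FakeHuman("What does it produce?"),
    FakeAI("Videos generated from user story proposals."),
]

print(transcript(history, k=1))
# Human: What does it produce?
# AI: Videos generated from user story proposals.
```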

In [27]:
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent, tools=tools, return_intermediate_steps=True, verbose=True
)
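
Conceptually, the executor alternates between asking the agent for its next step and running the requested tool, feeding each observation back in until a final answer appears. The loop below is a toy version of that cycle using only the standard library. `scripted_agent`, `run_agent_loop`, and the `lookup` tool are hypothetical stand-ins (the "agent" is a hard-coded function, not a model), and the real `AgentExecutor` does considerably more, such as callbacks and error handling.

```python
def scripted_agent(question, intermediate_steps):
    """Stand-in for the model: call the tool once, then answer."""
    if not intermediate_steps:
        return {"type": "action", "tool": "lookup", "tool_input": question}
    _, last_observation = intermediate_steps[-1]
    return {"type": "finish", "output": f"Based on the tool: {last_observation}"}


def run_agent_loop(agent, tools, question, max_iterations=5):
    """Alternate between agent decisions and tool calls until a final answer."""
    intermediate_steps = []
    for _ in range(max_iterations):
        step = agent(question, intermediate_steps)
        if step["type"] == "finish":
            return {
                "output": step["output"],
                "intermediate_steps": intermediate_steps,
            }
        # Run the requested tool and record (action, observation).
        observation = tools[step["tool"]](step["tool_input"])
        intermediate_steps.append((step, observation))
    raise RuntimeError("Agent did not finish within max_iterations")


# A fake tool that always returns the same string (illustration only).
toy_tools = {"lookup": lambda query: "DAMO Academy, Alibaba Group"}
result = run_agent_loop(scripted_agent, toy_tools, "Who developed Aesop Agent?")
print(result["output"])
# Based on the tool: DAMO Academy, Alibaba Group
```

The accumulated `intermediate_steps` list plays the same role as the scratchpad that `convert_intermediate_steps` serializes for the prompt.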


In [22]:
def chat(text: str):
    output = agent_executor.invoke({
        "input": text,
        "chat_history": memory2str(conversational_memory)
    })
    conversational_memory.chat_memory.add_user_message(text)
    conversational_memory.chat_memory.add_ai_message(output["output"])

    return output

In [28]:
user_msg = "Who developed Aesop Agent?"
output = chat(user_msg)

answer = output["output"]

> Entering new AgentExecutor chain...
To find information about who developed Aesop Agent, I will use the provided tool:

<tool>aesop_agent_search</tool>
<tool_input>Who developed Aesop Agent?
AesopAgent: Agent-driven Evolutionary System on
Story-to-Video Production
Jiuniu Wang∗Zehua Du∗Yuyuan Zhao∗Bo Yuan Kexiang Wang Jian Liang
Yaxi Zhao Yihen Lu Gengliang Li Junlong Gao Xin Tu Zhenyu Guo†
DAMO Academy, Alibaba Group
Abstract
The Agent and AIGC (Artificial Intelligence Generated Content) technologies have
recently made significant progress. We propose Aesop Agent, an Agent-driven
Evolutionary System on Story-to-Video Production. AesopAgent is a practical
application of agent technology for multimodal content generation. The system
integrates multiple generative capabilities within a unified framework, so that
individual users can leverage these modules easily. This innovative system would
convert user story proposals into scripts, images, and au

In [29]:
print(answer)


- Aesop Agent is an agent-driven evolutionary system for story-to-video production, proposed by researchers at DAMO Academy, Alibaba Group.

- It integrates multiple generative capabilities like script generation, image generation, and video assembly into a unified framework to convert user story proposals into videos.

- The system utilizes agent-based approaches, retrieval-augmented generation (RAG) techniques, and incorporates expert knowledge and experience to iteratively optimize the video generation workflow.

- Aesop Agent consists of a Horizontal Layer that manages the overall workflow, and a Utility Layer that provides specialized utilities for tasks like image composition, character consistency, style consistency etc.

- It was developed by researchers including Jiuniu Wang, Zehua Du, Yuyuan Zhao, Bo Yuan, Kexiang Wang, Jian Liang, Yaxi Zhao, Yihen Lu, Gengliang Li, Junlong Gao, Xin Tu, and Zhenyu Guo at DAMO Academy, Alibaba Group.



In [30]:
print(output['intermediate_steps'])

[(AgentAction(tool='aesop_agent_search', tool_input='Who developed Aesop Agent?', log='To find information about who developed Aesop Agent, I will use the provided tool:\n\n<tool>aesop_agent_search</tool>\n<tool_input>Who developed Aesop Agent?'), 'AesopAgent: Agent-driven Evolutionary System on\nStory-to-Video Production\nJiuniu Wang∗Zehua Du∗Yuyuan Zhao∗Bo Yuan Kexiang Wang Jian Liang\nYaxi Zhao Yihen Lu Gengliang Li Junlong Gao Xin Tu Zhenyu Guo†\nDAMO Academy, Alibaba Group\nAbstract\nThe Agent and AIGC (Artificial Intelligence Generated Content) technologies have\nrecently made significant progress. We propose Aesop Agent, an Agent-driven\nEvolutionary System on Story-to-Video Production. AesopAgent is a practical\napplication of agent technology for multimodal content generation. The system\nintegrates multiple generative capabilities within a unified framework, so that\nindividual users can leverage these modules easily. This innovative system would\nconvert user story proposals