In [None]:
%%capture
!pip install langchain==0.1.1 openai==1.10.0 langchainhub langchain-openai wikipedia duckduckgo-search

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

Enter your OpenAI API Key:··········


In [None]:
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass('Enter your LangSmith API key: ')

Enter your LangSmith API key: ··········


In [None]:
from uuid import uuid4

unique_id = uuid4().hex[0:8]

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"Managing_Agent_Prompt_Size_{unique_id}"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

In [6]:
from operator import itemgetter

from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain_community.tools import WikipediaQueryRun, DuckDuckGoSearchRun
from langchain_community.utilities import WikipediaAPIWrapper, DuckDuckGoSearchAPIWrapper
from langchain_core.prompt_values import ChatPromptValue
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

# Managing Prompt Length

- Agents call tools dynamically and append each tool result to the prompt for the next model call.

- With long tool outputs or many tool calls, the prompt can exceed the model's context window.

- LCEL lets you place a custom function inside the agent chain to manage prompt size.

- Example: an agent that searches the web and Wikipedia and gets long results back.

- To stay within the limit, trim the prompt: keep the messages the model needs and drop the rest.
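To make the growth concrete, here is a minimal sketch of how a prompt swells as tool results accumulate. It is not LangChain code: `rough_token_count` is a hypothetical heuristic (about four characters per token) rather than a real tokenizer, and the observation sizes are made up for illustration.

```python
def rough_token_count(text: str) -> int:
    """Hypothetical estimate: roughly 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def simulate_prompt_growth(base_prompt, observations, context_window):
    """Return the running token total after each tool call and the first step that overflows."""
    total = rough_token_count(base_prompt)
    totals = []
    overflow_step = None
    for step, observation in enumerate(observations, start=1):
        # Each tool result is appended to the prompt and stays there for later calls.
        total += rough_token_count(observation)
        totals.append(total)
        if overflow_step is None and total > context_window:
            overflow_step = step
    return totals, overflow_step


base = "system + user question " * 20   # stand-in for the system and user messages
observations = ["x" * 8000] * 3         # three made-up tool results of 8,000 characters each

totals, overflow_step = simulate_prompt_growth(base, observations, context_window=4097)
print(totals, overflow_step)
```

Under these assumptions the prompt passes a 4,097-token window on the second tool call, which is why trimming has to happen inside the agent loop rather than once up front.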

In [7]:
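# Large limits: up to 10 articles and 10,000 characters per Wikipedia call,
# so a single tool result can take up much of the context window.
wiki = WikipediaQueryRun(
    api_wrapper=WikipediaAPIWrapper(top_k_results=10, doc_content_chars_max=10_000)
)

# DuckDuckGo search for the US-English region, restricted to the past day.
ddg_search = DuckDuckGoSearchRun(
    api_wrapper=DuckDuckGoSearchAPIWrapper(region="us-en", time="d", max_results=5)
)

tools = [ddg_search, wiki]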

In [8]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", """You are the world's greatest research assistant. You know exactly
        where and what to search for given a query. You've been a research assistant to
        people like Yann LeCun, Geoffrey Hinton, and Francois Chollet."""),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

llm = ChatOpenAI(model="gpt-3.5-turbo")

In [9]:
agent = (
    {
        "input": itemgetter("input"),
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm.bind_functions(tools)
    | OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [10]:
agent_executor.invoke(
    {
        "input": """Why does deep learning work? Is there some inherent property
        of a model's architecture that makes it capable of learning? Or is it more
        about the data it is trained on?  Are there any theorems or hypothesis that you can find that
        support this? Is there something special about how humans generate data?
        What philosophical implications does this have about why deep learning works.
        How will people in the future talk about what deep learning is?"""
    }
)



> Entering new AgentExecutor chain...

Invoking: `duckduckgo_search` with `{'query': 'Why does deep learning work?'}`


Deep learning is a branch of machine learning that models high level abstractions in data by using a deep graph with many processing layers. According to the Universal approximation theorem , deep-ness isn't necessary for a neural network to be able to approximate arbitrary continuous functions. A phrenological mapping of the brain by Friedrich Eduard Bilz.Phrenology, a pseudoscience, was among the first attempts to correlate mental functions with specific parts of the brain.. The mind (adjective form: mental) is that which thinks, imagines, remembers, wills, and senses, or is the set of faculties responsible for such phenomena. The mind is also associated with experiencing perception ... Frank Vincent Zappa [nb 1] (December 21, 1940 - December 4, 1993) was an American musician, composer, and bandleader. His work is characterized by nonconformity free-form improvisation, sound experimentation, musical virtuosity of American culture. [2] English is either the official langu

Quantum mechanics is a fundamental theory in physics that describes the behavior of nature at and below the scale of atoms.: 1.1 It is the foundation of all quantum physics, which includes quantum chemistry, quantum field theory, quantum technology, and quantum information science. Quantum mechanics can describe many systems that classical physics cannot. Biography Youth and education House of birth in Brunswick (destroyed in World War II) Caricature of Abraham Gotthelf Kästner by Gauss (1795) Johann Carl Friedrich Gauss was born on 30 April 1777 in Brunswick (Braunschweig), in the Duchy of Brunswick-Wolfenbüttel (now part of Lower Saxony, Germany), to a family of lower social status. His father Gebhard Dietrich Gauss (1744-1808) worked in ... In physics, string theory is a theoretical framework in which the point-like particles of particle physics are replaced by one-dimensional objects called strings.String theory describes how these strings propagate through space and i

BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens. However, your messages resulted in 4808 tokens (4688 in the messages, 120 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
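The numbers in this error can be turned into a budget for the messages. The sketch below only redoes the arithmetic from the error text; `message_budget` and the 500-token reply reserve are hypothetical choices for illustration, not values from LangChain or OpenAI.

```python
# Figures copied from the BadRequestError message.
CONTEXT_WINDOW = 4097
MESSAGE_TOKENS = 4688
FUNCTION_TOKENS = 120


def message_budget(context_window: int, function_tokens: int, completion_reserve: int) -> int:
    """Tokens left for messages after the function schemas and a reserve for the reply."""
    budget = context_window - function_tokens - completion_reserve
    if budget <= 0:
        raise ValueError("No room left for messages; lower the reserve or use a larger model.")
    return budget


requested = MESSAGE_TOKENS + FUNCTION_TOKENS   # what the request actually used
overflow = requested - CONTEXT_WINDOW          # how far over the limit it was

# Assumed reserve of 500 tokens for the model's reply (an illustrative value).
budget = message_budget(CONTEXT_WINDOW, FUNCTION_TOKENS, completion_reserve=500)
print(requested, overflow, budget)
```

The point is that the limit applies to messages, function schemas, and the reply together, so any cap on message tokens should sit below the model's context length.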

In [11]:
def condense_prompt(prompt: ChatPromptValue, max_tokens: int = 4_000) -> ChatPromptValue:
    """
    Drop the oldest tool-call messages until the prompt fits within `max_tokens`.

    The first two messages (system and user) are always kept. The remaining messages
    come from the agent scratchpad in pairs (an AI function call followed by its
    function result), so they are removed two at a time, oldest first.

    Parameters:
    - prompt (ChatPromptValue): The formatted prompt, including the agent scratchpad.
    - max_tokens (int, optional): The maximum number of message tokens. Defaults to 4,000.

    Returns:
    - ChatPromptValue: A new ChatPromptValue whose messages fit within the limit, or only
      the first two messages if nothing else can be dropped.

    Note:
    - `max_tokens` counts message tokens only. The function schemas and the model's reply
      also consume context, so leave headroom below the model's context length.
    """
    messages = prompt.to_messages()
    head, ai_function_messages = messages[:2], messages[2:]
    num_tokens = llm.get_num_tokens_from_messages(messages)

    # Stop once the scratchpad is empty; otherwise the loop would never end
    # when the first two messages alone exceed the limit.
    while num_tokens > max_tokens and ai_function_messages:
        ai_function_messages = ai_function_messages[2:]
        num_tokens = llm.get_num_tokens_from_messages(head + ai_function_messages)

    return ChatPromptValue(messages=head + ai_function_messages)
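
The trimming loop can be checked without calling the API. The following is a dry-run sketch of the same drop-oldest-pair idea on plain `(role, content)` tuples. `count_tokens` is a hypothetical stand-in that counts whitespace-separated words, and `condense` is an illustrative name, not part of LangChain.

```python
def count_tokens(messages):
    """Hypothetical counter: one token per whitespace-separated word."""
    return sum(len(content.split()) for _, content in messages)


def condense(messages, max_tokens):
    """Keep the first two messages; drop the oldest call/result pair until the rest fits."""
    head, scratchpad = messages[:2], messages[2:]
    # The `scratchpad` check guarantees termination even if `head` alone is too long.
    while scratchpad and count_tokens(head + scratchpad) > max_tokens:
        scratchpad = scratchpad[2:]
    return head + scratchpad


messages = [
    ("system", "You are a research assistant"),
    ("user", "Why does deep learning work"),
    ("ai", "call search one"),
    ("function", "result " * 10),   # long, older tool result
    ("ai", "call search two"),
    ("function", "result " * 4),    # shorter, newer tool result
]

trimmed = condense(messages, max_tokens=20)
print([role for role, _ in trimmed])
```

Dropping from the front keeps the most recent tool result, which is usually what the model needs for its next step; the trade-off is that earlier findings are lost entirely rather than summarized.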


In [12]:
agent = (
    {
        "input": itemgetter("input"),
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | condense_prompt
    | llm.bind_functions(tools)
    | OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke(
    {
        "input": """Why does deep learning work? Is there some inherent property
        of a model's architecture that makes it capable of learning? Or is it more
        about the data it is trained on?  Are there any theorems or hypothesis that you can find that
        support this? Is there something special about how humans generate data?
        What philosophical implications does this have about why deep learning works.
        How will people in the future talk about what deep learning is?"""
    }
)



> Entering new AgentExecutor chain...

Invoking: `duckduckgo_search` with `{'query': 'Why does deep learning work?'}`


Deep learning is a branch of machine learning that models high level abstractions in data by using a deep graph with many processing layers. According to the Universal approximation theorem , deep-ness isn't necessary for a neural network to be able to approximate arbitrary continuous functions. A phrenological mapping of the brain by Friedrich Eduard Bilz.Phrenology, a pseudoscience, was among the first attempts to correlate mental functions with specific parts of the brain.. The mind (adjective form: mental) is that which thinks, imagines, remembers, wills, and senses, or is the set of faculties responsible for such phenomena. The mind is also associated with experiencing perception ... English is either the official language or one of the official languages in 59 sovereign states (such as in India, Ireland, and Canada). In some other countries, it is the sole or dominant language for historical reasons without being explicitly defined by law (such as in the United State

{'input': "Why does deep learning work? Is there some inherent property\n        of a model's architecture that makes it capable of learning? Or is it more\n        about the data it is trained on?  Are there any theorems or hypothesis that you can find that\n        support this? Is there something special about how humans generate data?\n        What philosophical implications does this have about why deep learning works.\n        How will people in the future talk about what deep learning is?",
 'output': "Deep learning works due to a combination of factors related to the model's architecture and the data it is trained on. One important aspect is the universal approximation theorem, which states that neural networks with a sufficient number of parameters can approximate any continuous function. This theorem"}