# Example of LangChain agents

In this notebook we test the agent functionality of LangChain, a way to give an LLM access to different tools, for example:
- access to a vector store
- access to external APIs
- ability to browse websites

In [2]:
!pip install -r requirements.txt

The API keys must be set with the `%env` magic command before running the cells below.
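As a quick sanity check, the sketch below lists which keys are still missing from the environment. The variable names are assumptions, not taken from this notebook: `OPENAI_API_KEY` is the name the OpenAI integrations conventionally read, and `GOOGLE_API_KEY` / `GOOGLE_CSE_ID` are the names the Google search wrapper used later conventionally reads. The helper `missing_keys` is hypothetical.

```python
import os

# Assumed key names (not stated in the notebook): adjust to your own setup.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CSE_ID")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# Report what still has to be set with %env before continuing
print("Missing keys:", missing_keys(os.environ))
```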


Playwright must be installed on the system. Playwright is a framework for web testing and automation; here it is used to browse websites with a headless Chromium browser.

In [7]:
!playwright install

The `load()` method of LangChain's `AsyncChromiumLoader` needs `nest_asyncio` to be applied because we are running in a Jupyter notebook, which already runs its own event loop (nested event loops).

In [8]:
import nest_asyncio
nest_asyncio.apply()
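To see why nested event loops are a problem, here is a minimal stdlib-only sketch. It assumes the loader drives its coroutine with something like `asyncio.run()`; the functions `fetch` and `inside_running_loop` are hypothetical stand-ins. Without `nest_asyncio`, starting a loop from inside a running one raises a `RuntimeError`.

```python
import asyncio


async def fetch():
    """Hypothetical stand-in for the coroutine that scrapes a page."""
    return "page content"


async def inside_running_loop():
    # Simulates a notebook cell: an event loop is already running here.
    coro = fetch()
    try:
        asyncio.run(coro)  # what a synchronous load() would effectively attempt
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(inside_running_loop())
print(message)
```

`nest_asyncio.apply()` patches asyncio so that this kind of re-entrant call is allowed.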

## Using LangChain's premade tools
LangChain has a set of document loaders, including:
- html
- csv
- document folder
- pdf
- markdown

We can use one of the HTML loaders, `AsyncChromiumLoader`, which drives a headless Chromium browser instance.

We also use a document transformer from LangChain. Document transformers are tools for different file types, each with useful methods. Since we want to extract data from web pages, we use `BeautifulSoupTransformer`.

In [12]:
from langchain.document_loaders import AsyncChromiumLoader
from langchain.document_transformers import BeautifulSoupTransformer
import asyncio
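To illustrate what extracting text from selected tags means, here is a stdlib-only sketch. It is not how `BeautifulSoupTransformer` is implemented; the class `TagTextExtractor`, the function `extract_tag_text`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect text that appears inside any of the requested tags."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.depth = 0  # how many requested tags are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_tag_text(html, tags):
    parser = TagTextExtractor(tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = "<div><h2>Menu</h2><p>Pizza <b>margherita</b></p><span>ignored</span></div>"
print(extract_tag_text(sample, ["p", "h2"]))  # prints "Menu Pizza margherita"
```

Text outside the requested tags (the `span` here) is dropped, which is why the choice of tags matters so much for the extraction step later on.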

## Test loading the contents of a page

In [25]:
loader = AsyncChromiumLoader(["https://www.wsj.com"])

html = loader.load()

## Transform html to text

In [20]:
from langchain.document_transformers import Html2TextTransformer

html2text = Html2TextTransformer()
docs_transformed = html2text.transform_documents(html)
docs_transformed[0].page_content[0:1000]

"Skip to Main ContentSkip to SearchSkip to... Select * Top News * The Future of\nEverything * Featured Stories * Science * Longevity & Retirement * Food &\nCooking * Arts & Entertainment * Video * Economy * Real Estate * Sports * CMO\n* CIO * CFO * Risk & Compliance * Logistics Report * Sustainable Business *\nHeard on the Street * Barron’s * MarketWatch * Mansion Global * Penta *\nOpinion * Journal Reports * Sponsored Offers Explore Our Brands * WSJ * * * *\n* Barron's * * * * * MarketWatch * * * * * IBD # The Wall Street Journal\nSubscribeSign In Intro Offer The Wall Street Journal Save on a WSJ Membership\nGain Trusted Insights on 2023’s Biggest Stories Become a WSJ Member Today\nSubscribe Now English Edition * English * 中文 (Chinese) * 日本語 (Japanese) Print\nEdition Video Audio Latest Headlines More Other Products from WSJ * Buy Side\nfrom WSJ * WSJ Shop * WSJ Wine # SubscribeSign In * World #### Topics * Africa\n* Americas * Asia * China * Europe * Middle East * India * Oceania * Ru

## Extract content from the web page

## Init the language model

Using `gpt-3.5-turbo-0613` ensures the model supports OpenAI function calling: given a description of the desired output schema, the model returns JSON in that format. The property names and types in the schema serve as instructions to the model.

In [38]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

## Create an extraction chain 

In [45]:
from langchain.chains import create_extraction_chain

schema = {
    "properties": {
        "news_article_title": {"type": "string"},
        "news_article_summary": {"type": "string"},
    },
    "required": ["news_article_title", "news_article_summary"],
}


def extract(content: str, schema: dict):
    return create_extraction_chain(schema=schema, llm=llm).run(content)
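Since the model is only instructed by the schema, it can be useful to check the returned records against the `required` list. The sketch below assumes the chain returns a list of dicts keyed by property name, as the outputs later in this notebook suggest; the helper `missing_required` and the example record are hypothetical.

```python
def missing_required(record, schema):
    """Return the required schema fields that are absent or empty in record."""
    return [key for key in schema.get("required", []) if not record.get(key)]


example_schema = {
    "properties": {
        "news_article_title": {"type": "string"},
        "news_article_summary": {"type": "string"},
    },
    "required": ["news_article_title", "news_article_summary"],
}

# Hypothetical record with one field missing
example_record = {"news_article_title": "Some headline"}
print(missing_required(example_record, example_schema))  # prints ['news_article_summary']
```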



## Scrape and extract
This function combines the scraping done with Playwright and the extraction done with the language model.
It chunks the extracted text and feeds the first chunk to the language model to extract the desired data from the page.
If the schema asks for fields that are not easy to find in the page text, the model tends to hallucinate values.

In [110]:
import pprint
from langchain.text_splitter import RecursiveCharacterTextSplitter


def scrape_with_playwright(urls, schema, tags_to_extract=["span"]):
    """Scrape each URL in `urls` and extract the fields described by `schema`.

    `schema` lists the properties with their types and the required properties;
    `tags_to_extract` selects the HTML tags whose text is kept."""
    loader = AsyncChromiumLoader(urls)
    docs = loader.load()
    bs_transformer = BeautifulSoupTransformer()
    docs_transformed = bs_transformer.transform_documents(
        docs, tags_to_extract=tags_to_extract
    )
    print("Extracting content with LLM")

    # Grab the first 1000 tokens of the site
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=1000, chunk_overlap=0
    )
    splits = splitter.split_documents(docs_transformed)

    # Process the first split
    extracted_content = extract(schema=schema, content=splits[0].page_content)
    pprint.pprint(extracted_content)
    return extracted_content
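The chunking step can be pictured with a small sketch. The real splitter counts tiktoken tokens and splits recursively on separators; as a simplifying assumption, the hypothetical `chunk_words` below counts whitespace-separated words instead, with zero overlap as in the function above.

```python
def chunk_words(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words (no overlap)."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


chunks = chunk_words("one two three four five", chunk_size=2)
print(chunks)     # prints ['one two', 'three four', 'five']
print(chunks[0])  # only the first chunk is sent to the model in this notebook
```

Because only the first chunk is processed, anything beyond the first 1000 tokens of a page is never seen by the model.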

In [121]:
urls = ["https://www.thefork.com/restaurants/strasbourg-c529135/pizzeria-t896"]
schema_res = {
    "properties": {
        "restaurant_name": {"type": "string"},
        "restaurant_address": {"type": "string"},
    },
    "required": ["restaurant_name", "restaurant_address"]
}
extracted_content_res = scrape_with_playwright(urls, schema=schema_res, tags_to_extract=["p", "h2"])

Extracting content with LLM


IndexError: list index out of range
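The `IndexError` is consistent with `splits[0]` being evaluated on an empty list, which would happen if no text was found inside the requested tags (an assumption, since the full traceback is not shown). A guard along these lines would avoid the crash; `extract_from_splits` and the stand-in `extract_fn` are hypothetical.

```python
def extract_from_splits(splits, extract_fn):
    """Run extract_fn on the first split, or return [] when nothing was scraped."""
    if not splits:
        print("No text found for the requested tags; nothing to extract")
        return []
    return extract_fn(splits[0])


# Stand-in for the LLM extraction call, for illustration only
def fake_extract(text):
    return [{"text": text}]


print(extract_from_splits([], fake_extract))
print(extract_from_splits(["first chunk", "second chunk"], fake_extract))
```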

In [118]:
urls = ["https://www.wsj.com"]
extracted_content = scrape_with_playwright(urls, schema=schema)

Extracting content with LLM
[{'news_article_link': 'https://www.wsj.com/articles/will-joe-manchin-run-for-president-democrats-fear-a-disaster-11638912000',
  'news_article_summary': 'The centrist’s decision to not pursue Senate '
                          're-election stirs speculation of a presidential bid '
                          'as groups like No Labels seek alternatives to Biden '
                          'and Trump.',
  'news_article_title': 'Will Joe Manchin Run for President? Democrats Fear a '
                        'Disaster.'},
 {'news_article_link': 'https://www.wsj.com/articles/how-democrats-are-leveraging-abortion-rights-to-win-in-2024-11638912001',
  'news_article_summary': 'The push to get abortion protections on the ballot '
                          'promises to galvanize voters and affect races for '
                          'the White House and control of Congress.',
  'news_article_title': 'How Democrats Are Leveraging Abortion Rights to Win '
               

In [91]:
schema_recipe = {
    "properties": {
        "recipe_name": {"type": "string"},
        "recipe_description": {"type": "string"},
        "recipe_ingredients":{"type":"string"},
    },
    "required": ["recipe_name", "recipe_description", "recipe_ingredients"],
}

urls_recipes = ["https://www.marieclaire.fr/cuisine/baeckeofe-d-adrienne,1191975.asp"]
extracted_content = scrape_with_playwright(urls_recipes, schema=schema_recipe, tags_to_extract=["span", "li", "p", "div", "h1", "h2", "h3"])

Extracting content with LLM
[{'recipe_description': 'Plat alsacien traditionnel à base de viandes et de '
                        'pommes de terre',
  'recipe_ingredients': "650 g d'épaule d'agneau désossée\n"
                        "650 g d'échine de porc désossée\n"
                        '650 g de paleron de bœuf\n'
                        '2 kg de pommes de terre à chair ferme\n'
                        "1 kg d'oignons\n"
                        '1 blanc de poireau\n'
                        '1 carotte\n'
                        "1 gousse d'ail\n"
                        '1 échalote\n'
                        '1/2 branche de céleri\n'
                        '2 brins de thym\n'
                        '1 feuille de laurier\n'
                        '5 baies de genièvre écrasées\n'
                        "1 bouteille de vin blanc d'Alsace\n"
                        '200 g de farine\n'
                        'sel\n'
                        'poivre en grains et moulu',
  'recipe_

This works very well, provided we give it the right tags to extract and a suitable website. If we want to build a recipe book by scraping the recipes we usually cook, we can do so relatively easily.

One possible improvement is to give this function the ability to search Google, removing the manual selection of URLs.

In [92]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models.openai import ChatOpenAI
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.retrievers.web_research import WebResearchRetriever

In [93]:
# Vectorstore
vectorstore = Chroma(
    embedding_function=OpenAIEmbeddings(), persist_directory="./chroma_db_oai"
)

# LLM
llm = ChatOpenAI(temperature=0)

# Search
search = GoogleSearchAPIWrapper()

In [94]:
# Initialize
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore, llm=llm, search=search
)

In [73]:
# Run
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.web_research").setLevel(logging.INFO)
from langchain.chains import RetrievalQAWithSourcesChain

user_input = "How do LLM Powered Autonomous Agents work?"
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm, retriever=web_research_retriever
)
result = qa_chain({"question": user_input})
result

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'How do LLM Powered Autonomous Agents work?', 'text': LineList(lines=['1. What is the functioning principle of LLM Powered Autonomous Agents?\n', '2. How do LLM Powered Autonomous Agents operate?\n', '3. Can you explain the working mechanism of LLM Powered Autonomous Agents?'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. What is the functioning principle of LLM Powered Autonomous Agents?\n', '2. How do LLM Powered Autonomous Agents operate?\n', '3. Can you explain the working mechanism of LLM Powered Autonomous Agents?']
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Search results: [{'title': 'Web scraping | 🦜️ Langchain', 'link': 'https://python.langchain.com

{'question': 'How do LLM Powered Autonomous Agents work?',
 'answer': "LLM-powered autonomous agents work by using LLM as the agent's brain, complemented by several key components such as planning, memory, and tool use. In terms of planning, the agent breaks down large tasks into smaller subgoals and can reflect and refine its actions based on past experiences. Memory is divided into short-term memory, which is used for in-context learning, and long-term memory, which allows the agent to retain and recall information over extended periods. Tool use involves the agent calling external APIs for additional information. These agents have been used in various applications, including scientific discovery and generative agents simulation.\n",
 'sources': 'https://python.langchain.com/docs/use_cases/web_scraping, https://lilianweng.github.io/posts/2023-06-23-agent/'}

In [95]:
user_input = "Ecris moi la liste d'ingredients pour le Baeckeoffe Alsacian"
result = qa_chain({"question": user_input})
result

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': "Ecris moi la liste d'ingredients pour le Baeckeoffe Alsacian", 'text': LineList(lines=['1. Quels sont les ingrédients nécessaires pour préparer le Baeckeoffe alsacien ?\n', '2. Quelle est la liste des ingrédients pour réaliser le Baeckeoffe alsacien ?\n', '3. Quels ingrédients faut-il utiliser pour faire le Baeckeoffe alsacien ?'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. Quels sont les ingrédients nécessaires pour préparer le Baeckeoffe alsacien ?\n', '2. Quelle est la liste des ingrédients pour réaliser le Baeckeoffe alsacien ?\n', '3. Quels ingrédients faut-il utiliser pour faire le Baeckeoffe alsacien ?']
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:S

{'question': "Ecris moi la liste d'ingredients pour le Baeckeoffe Alsacian",
 'answer': 'The ingredients for Baeckeoffe Alsacian are:\n- Echine de porc (pork shoulder) 800 g\n- Boe',
 'sources': ''}
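The result above has an empty `sources` field and a truncated answer, unlike the earlier query. A small check can flag such answers before they are trusted. The sketch assumes `sources` is a comma-separated string, as in the earlier output; `split_sources` and `is_sourced` are hypothetical helpers.

```python
def split_sources(result):
    """Turn the comma-separated 'sources' string into a list of URLs."""
    raw = result.get("sources", "")
    return [source.strip() for source in raw.split(",") if source.strip()]


def is_sourced(result):
    """True when the chain cited at least one source for its answer."""
    return bool(split_sources(result))


sourced = {
    "answer": "...",
    "sources": "https://python.langchain.com/docs/use_cases/web_scraping, "
               "https://lilianweng.github.io/posts/2023-06-23-agent/",
}
unsourced = {"answer": "...", "sources": ""}

print(split_sources(sourced))
print(is_sourced(unsourced))  # prints False
```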

In [117]:
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.memory import ConversationBufferMemory

# Expose the web-research QA chain to the agent as a single "Search" tool
tools = [
    Tool.from_function(
        name="Search",
        func=qa_chain.__call__,
        coroutine=qa_chain.acall,
        description="useful to find the url containing the answer to a factual question",
    ),
]
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
llm = ChatOpenAI(temperature=0)
agent_chain = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, memory=memory
)
agent_chain.run(input="What are the ingredients in the alsacian baeckeoffe?")



> Entering new AgentExecutor chain...


INFO:langchain.retrievers.web_research:Generating questions for Google Search ...


[32;1m[1;3mI should search for the recipe of alsacian baeckeoffe to find the ingredients.
Action: Search
Action Input: Alsacian baeckeoffe recipe[0m

INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'Alsacian baeckeoffe recipe', 'text': LineList(lines=['1. "What is the traditional recipe for Alsacian baeckeoffe?"\n', '2. "How to make Alsacian baeckeoffe at home?"\n', '3. "Ingredients and cooking instructions for Alsacian baeckeoffe recipe?"'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. "What is the traditional recipe for Alsacian baeckeoffe?"\n', '2. "How to make Alsacian baeckeoffe at home?"\n', '3. "Ingredients and cooking instructions for Alsacian baeckeoffe recipe?"']
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Search results: [{'title': "Baeckeoffe- Alsatian Stew - G'day Soufflé", 'link': 'https://gdaysouffle.com/baeckeoffe-alsatian-stew/', 'snippet': "Feb 10, 2021 ... Baeckeoffe means 'Baker's Oven' in the Alsatian dialect and


Observation: {'question': 'Alsacian baeckeoffe recipe', 'answer': 'Here is a recipe for', 'sources': ''}
Thought: I now know the final answer
Final Answer: The ingredients in the Alsacian baeckeoffe are typically potatoes, onions, carrots, leeks, pork, beef, and lamb, all marinated in white wine and baked in a casserole dish.

> Finished chain.


'The ingredients in the Alsacian baeckeoffe are typically potatoes, onions, carrots, leeks, pork, beef, and lamb, all marinated in white wine and baked in a casserole dish.'
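In the trace above, the Observation handed back to the agent is the raw dict returned by the chain, with a truncated answer and no sources, and the final answer does not obviously come from it. One option, sketched below, is to wrap the chain so the tool returns a short string that makes a missing source explicit. `make_search_func` and `fake_chain` are hypothetical; the sketch only assumes the chain is callable with `{"question": ...}` and returns `answer` and `sources` keys, as shown in the outputs of this notebook.

```python
def make_search_func(chain):
    """Wrap a QA chain so the tool returns a compact string instead of a dict."""

    def search(question: str) -> str:
        result = chain({"question": question})
        answer = result.get("answer", "").strip()
        sources = result.get("sources", "").strip()
        if not sources:
            return f"{answer}\n(no sources returned)"
        return f"{answer}\nSources: {sources}"

    return search


# Stand-in for qa_chain so the sketch runs without network access
def fake_chain(inputs):
    return {"question": inputs["question"], "answer": "Stub answer.\n", "sources": ""}


search = make_search_func(fake_chain)
print(search("baeckeoffe recipe"))
```

The resulting function could be passed as `func` to the tool in place of `qa_chain.__call__`.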

In [97]:
agent_chain.run(input="Scrape a webpage that has the baeckeoffe recipe and find me related ones")



> Entering new AgentExecutor chain...


INFO:langchain.retrievers.web_research:Generating questions for Google Search ...


[32;1m[1;3m{
    "action": "Search",
    "action_input": "baeckeoffe recipe"
}[0m

INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'baeckeoffe recipe', 'text': LineList(lines=['1. "What is the traditional recipe for baeckeoffe?"\n', '2. "How to make baeckeoffe at home?"\n', '3. "Are there any variations of the baeckeoffe recipe?"'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. "What is the traditional recipe for baeckeoffe?"\n', '2. "How to make baeckeoffe at home?"\n', '3. "Are there any variations of the baeckeoffe recipe?"']
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Search results: [{'title': "Baeckeoffe- Alsatian Stew - G'day Soufflé", 'link': 'https://gdaysouffle.com/baeckeoffe-alsatian-stew/', 'snippet': 'Feb 10, 2021 ... Baeckeoffe is a very hearty stew comprising three types of meat (beef, lamb, and pork), topped with potatoes and marinated in a dry white wi


Observation: {'question': 'baeckeoffe recipe', 'answer': 'Baeckeoffe is a traditional Alsatian dish. It is a mix of marinated mutton, beef, and pork, with mixed vegetables, marjoram, thyme, and parsley. The dish is typically sealed with a lid of bread dough and cooked slowly in the oven. It is believed to have originated from the practice of housewives bringing their prepared casserole to the village baker to cook in his oven while they attended church services. Baeckeoffe is a hearty stew that is often served with potatoes and is marinated in a dry white wine, preferably a Riesling. It is a popular dish in the Alsace-Lorraine region of eastern France. \n', 'sources': ''}
Thought: {
    "action": "Final Answer",
    "action_input": "Baeckeoffe is a traditional Alsatian dish. It is a mix of marinated mutton, beef, and pork, with mixed vegetables, marjoram, thyme, and parsley. The dish is typically sealed with a lid of bread dough and cooked slowly in the ov

'Baeckeoffe is a traditional Alsatian dish. It is a mix of marinated mutton, beef, and pork, with mixed vegetables, marjoram, thyme, and parsley. The dish is typically sealed with a lid of bread dough and cooked slowly in the oven. It is believed to have originated from the practice of housewives bringing their prepared casserole to the village baker to cook in his oven while they attended church services. Baeckeoffe is a hearty stew that is often served with potatoes and is marinated in a dry white wine, preferably a Riesling. It is a popular dish in the Alsace-Lorraine region of eastern France.'