A reproduction from the video: "Building a Research Assistant from Scratch" (https://www.youtube.com/watch?v=DjuXACWYkkU).

#### Load env

In [13]:
import dotenv
dotenv.load_dotenv("../.env")

True

#### Step 1

The simplest version retrieves the text of a single web page and summarizes it to answer a question.

In [14]:
from langchain.chat_models import AzureChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
import requests
from bs4 import BeautifulSoup

template = """Summarize the following question based on the context:

Question: {question}

Context:

{context}"""

prompt = ChatPromptTemplate.from_template(template)

def scrape_text(url: str):
    # Send a GET request to the webpage
    try:
        response = requests.get(url)

        # Check if the request was successful
        if response.status_code == 200:
            # Parse the content of the request with BeautifulSoup
            soup = BeautifulSoup(response.text, "html.parser")

            # Extract all text from the webpage
            page_text = soup.get_text(separator=" ", strip=True)

            # Return the extracted text
            return page_text
        else:
            return f"Failed to retrieve the webpage: Status code {response.status_code}"
    except Exception as e:
        print(e)
        return f"Failed to retrieve the webpage: {e}"


url = "https://blog.langchain.dev/announcing-langsmith/"

page_content = scrape_text(url)[:10000]
llm = AzureChatOpenAI(azure_deployment="gpt-35-turbo-16k", openai_api_version="2023-08-01-preview", temperature=0)

chain = prompt | llm | StrOutputParser()

chain.invoke(
    {
        "question": "what is langsmith",
        "context": page_content
    }
)

# langsmith output: https://smith.langchain.com/public/7c48f66b-2ec3-4bfd-b043-d8e1baa12bfb/r


'The question is asking for an explanation of what LangSmith is.'
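
The output above restates the question instead of answering it, which seems to follow from the template's wording ("Summarize the following question"). As a rough sketch, rendering the same template with plain `str.format` shows the instruction the model receives. `str.format` is only a stand-in for `ChatPromptTemplate`, and `sample_context` is a hypothetical, shortened replacement for the scraped page text.

```python
# Same template text as in the cell above.
template = """Summarize the following question based on the context:

Question: {question}

Context:

{context}"""

# Hypothetical, shortened context standing in for the scraped page text.
sample_context = "LangSmith is a platform for debugging and monitoring LLM applications."

# Plain string formatting as a stand-in for ChatPromptTemplate.
rendered = template.format(question="what is langsmith", context=sample_context)
print(rendered)
```

The first rendered line asks for a summary of the question, not of the context.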

#### Step 2

Switch to the summary prompt from the gpt-researcher repository.

In [15]:
SUMMARY_TEMPLATE = """{text} 
-----------
Using the above text, answer in short the following question: 
> {question}
-----------
if the question cannot be answered using the text, simply summarize the text. Include all factual information, numbers, stats etc if available."""  # noqa: E501
SUMMARY_PROMPT = ChatPromptTemplate.from_template(SUMMARY_TEMPLATE)

chain = SUMMARY_PROMPT | llm | StrOutputParser()
chain.invoke(
    {
        "text": page_content,
        "question": "what is langsmith"
    }
)

# langsmith output: https://smith.langchain.com/public/403bcb28-e149-4d67-95cc-2efa825fe354/r

'LangSmith is a unified platform that helps developers debug, test, evaluate, and monitor their LLM (Large Language Model) applications. It aims to bridge the gap between prototype and production by addressing issues such as application performance. LangSmith provides visibility into model inputs and outputs, allows for experimentation with chains and prompt templates, helps identify and resolve issues like unexpected results or latency problems, and offers tools for dataset creation and evaluation. It also enables monitoring of system-level performance, model/chain performance, and user interaction with the application. LangSmith has been tested and used by various companies and organizations to improve their LLM-powered applications.'

#### Step 3

Refactor using RunnablePassthrough to accept a URL and question, and retrieve text from the URL to summarize and answer the question.

In [16]:
from langchain.schema.runnable import RunnablePassthrough

chain = RunnablePassthrough.assign(
    text = lambda x: scrape_text(x["url"])[:10000],
) | SUMMARY_PROMPT | llm | StrOutputParser()
chain.invoke(
    {
        "url": url,
        "question": "what is langsmith"
    }
)

# langsmith output: https://smith.langchain.com/public/24dc2e21-02ff-4fbb-98b9-3e4ea7da38b4/r

'LangSmith is a unified platform that helps developers debug, test, evaluate, and monitor their LLM (Large Language Model) applications. It aims to bridge the gap between prototype and production by providing tools and features to address common challenges faced by LLM application developers. LangSmith offers visibility into model inputs and outputs, allows for experimentation with chains and prompt templates, helps identify issues such as unexpected results or latency problems, and provides features for dataset creation, testing, and evaluation. It also enables monitoring of system-level performance, model/chain performance, and user interactions. LangSmith has been tested and used by various companies and organizations, including Snowflake, Boston Consulting Group, DeepLearningAI, Klarna, Mendable, Multi-On, and Quivr.'
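
To make the data flow of `RunnablePassthrough.assign` concrete, here is a minimal plain-Python sketch of the same idea: copy the input dict and add keys computed from it. The helper `assign` and the stub `fake_scrape` are hypothetical names used for illustration only; this is not LangChain's implementation.

```python
from typing import Any, Callable, Dict


def assign(**computed: Callable[[Dict[str, Any]], Any]) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Return a step that copies its input dict and adds one key per keyword argument."""
    def step(inputs: Dict[str, Any]) -> Dict[str, Any]:
        outputs = dict(inputs)  # keep the original keys ("url", "question")
        for key, fn in computed.items():
            outputs[key] = fn(inputs)  # each function sees the original input
        return outputs
    return step


def fake_scrape(url: str) -> str:
    # Hypothetical stand-in for scrape_text; performs no network access.
    return f"<text of {url}>"


add_text = assign(text=lambda x: fake_scrape(x["url"])[:10000])
result = add_text({"url": "https://example.com", "question": "what is langsmith"})
print(result)
```

The resulting dict has `url`, `question`, and `text`. `SUMMARY_PROMPT` only needs `text` and `question`, so the extra `url` key is carried along unused.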

#### Step 4

Search DuckDuckGo for a question and return a fixed number of links. The chain then becomes: for a given question, search DuckDuckGo for the required number of links, scrape the text behind each link, and summarize each page with respect to the question.


In [17]:
from langchain.utilities import DuckDuckGoSearchAPIWrapper
from langchain.schema.runnable import RunnablePassthrough

RESULTS_PER_QUESTION = 3

ddg_search = DuckDuckGoSearchAPIWrapper()

def web_search(query: str, num_results: int = RESULTS_PER_QUESTION):
    results = ddg_search.results(query, num_results)
    return [r["link"] for r in results]

scrape_and_summarize_chain = RunnablePassthrough.assign(
    text = lambda x: scrape_text(x["url"])[:10000],
) | SUMMARY_PROMPT | AzureChatOpenAI(azure_deployment="gpt-35-turbo-16k", openai_api_version="2023-08-01-preview") | StrOutputParser()

chain = RunnablePassthrough.assign(
    urls = lambda x: web_search(x["question"])
) | (lambda x: [{"question": x["question"], "url": u} for u in x["urls"]]) | scrape_and_summarize_chain.map()

chain.invoke(
    {
        "question": "what is langsmith"
    }
)

# langsmith output: https://smith.langchain.com/public/cf28125b-f920-49fa-a1e7-9d7bd991ac66/r

['LangSmith is a product developed by the creators of LangChain, a popular language model software tool. It aims to address the challenges of deploying and maintaining large language models (LLMs) in production. LangSmith focuses on providing features for reliability, such as debugging, testing, evaluating, monitoring, and usage metrics. It also offers a user-friendly UI to visualize and understand LLM workflows. While there may be potential competitors in the market, LangSmith aims to differentiate itself by integrating with other tools and providing value-add features. The author suggests that LangSmith should focus on extensibility to be more impactful and competitive in the long run.',
 'LangSmith is a platform that works in tandem with LangChain and other LLM frameworks. It is used for tracing and debugging LLM applications and intelligent agents. LangSmith allows you to debug, test, evaluate, and monitor chains and intelligent agents built on any LLM framework. It provides a visu
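
The fan-out step in the chain above turns one `{"question", "urls"}` dict into a list of `{"question", "url"}` dicts, and `.map()` then runs the summarizer once per item. A minimal plain-Python sketch of that shape follows. `fake_summarize` and the example URLs are hypothetical, and the loop here is sequential, which may differ from how `.map()` schedules work.

```python
from typing import Dict, List


def fan_out(state: Dict) -> List[Dict[str, str]]:
    """Pair the single question with every URL found by the search step."""
    return [{"question": state["question"], "url": u} for u in state["urls"]]


def fake_summarize(item: Dict[str, str]) -> str:
    # Hypothetical stand-in for scrape_and_summarize_chain; no network, no LLM.
    return f"summary of {item['url']} for '{item['question']}'"


state = {
    "question": "what is langsmith",
    "urls": ["https://a.example", "https://b.example"],
}

# Equivalent in shape to `... | fan_out | scrape_and_summarize_chain.map()`.
summaries = [fake_summarize(item) for item in fan_out(state)]
print(summaries)
```

This is why the output above is a list with one summary per link.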

#### Step 5

Ask the LLM to generate alternative search queries for a given question, so that each query can be sent to the search engine.

In [18]:
import json

SEARCH_PROMPT = ChatPromptTemplate.from_messages(
    [
        (
            "user",
            "Write 3 google search queries to search online that form an "
            "objective opinion from the following: {question}\n"
            "You must respond with a list of strings in the following format: "
            '["query 1", "query 2", "query 3"].',
        ),
    ]
)

search_question_chain = SEARCH_PROMPT | llm | StrOutputParser() | json.loads

search_question_chain.invoke(
    {
        "question": "what is the difference between langsmith and langchain"
    }
)

# langsmith output: https://smith.langchain.com/public/061fe805-6e9c-4c8b-82c8-a4ca1c18fa14/r

['Langsmith vs Langchain differences',
 'Comparison between Langsmith and Langchain',
 'Langsmith and Langchain: contrasting features']
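
The chain above pipes the raw model output straight into `json.loads`, which raises if the model wraps the list in extra prose. One possible hardening, sketched below under that assumption, is a more tolerant parser. `parse_query_list` is a hypothetical helper, not part of the original notebook or of LangChain.

```python
import json
import re
from typing import List


def parse_query_list(raw: str) -> List[str]:
    """Parse a JSON list of strings, tolerating text before or after the list."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the span from the first "[" to the last "]".
        match = re.search(r"\[.*\]", raw, flags=re.DOTALL)
        if match is None:
            raise ValueError(f"no JSON list found in model output: {raw!r}")
        parsed = json.loads(match.group(0))
    if not isinstance(parsed, list) or not all(isinstance(q, str) for q in parsed):
        raise ValueError(f"expected a list of strings, got: {parsed!r}")
    return parsed


clean = parse_query_list('["query 1", "query 2", "query 3"]')
wrapped = parse_query_list('Here are the queries:\n["query 1", "query 2"]\nGood luck!')
print(clean, wrapped)
```

It could replace `json.loads` as the last step of `search_question_chain`, since a plain function can be piped into a chain in the same way.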

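#### Step 6

Combine Steps 4 and 5: generate search queries for the question, then search, scrape, and summarize for each query.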

In [19]:
web_search_chain = RunnablePassthrough.assign(
    urls = lambda x: web_search(x["question"])
) | (lambda x: [{"question": x["question"], "url": u} for u in x["urls"]]) | scrape_and_summarize_chain.map()

chain = search_question_chain | (lambda x: [{"question": q} for q in x]) | web_search_chain.map()

chain.invoke(
    {
        "question": "what is the difference between langsmith and langchain"
    }
)

# langsmith output: https://smith.langchain.com/public/eb43427c-5667-4ef0-933c-2d5447c08017/r

[['The given text does not mention anything about "Langsmith." Therefore, there is no information available to compare Langsmith and LangChain.',
  "LangSmith is a new product created by the team behind LangChain, a popular software tool for large language models (LLMs). LangChain was initially designed for building prototypes, while LangSmith focuses on helping developers bring LLM applications into production in a reliable and maintainable way. LangSmith addresses challenges related to reliability by providing features in the areas of debugging, testing, evaluating, monitoring, and usage metrics. Its intuitive UI reduces the barrier to entry for developers without a software background and allows for visualizations of the LLM system's processes. While there are no direct competitors, organizations like Vercel and embeddings providers may develop similar tooling in the future. LangSmith aims to connect with various tools and integrate with platforms such as OpenAI evals and fine-tunin

#### Step 7

Pass the collected summaries to a report-writing prompt to produce the final research report.

In [20]:

full_research_chain = search_question_chain | (lambda x: [{"question": q} for q in x]) | web_search_chain.map()

WRITER_SYSTEM_PROMPT = "You are an AI critical thinker research assistant. Your sole purpose is to write well written, critically acclaimed, objective and structured reports on given text."  # noqa: E501


# Report prompts from https://github.com/assafelovic/gpt-researcher/blob/master/gpt_researcher/master/prompts.py
RESEARCH_REPORT_TEMPLATE = """Information:
--------
{research_summary}
--------
Using the above information, answer the following question or topic: "{question}" in a detailed report -- \
The report should focus on the answer to the question, should be well structured, informative, \
in depth, with facts and numbers if available and a minimum of 1,200 words.
You should strive to write the report as long as you can using all relevant and necessary information provided.
You must write the report with markdown syntax.
You MUST determine your own concrete and valid opinion based on the given information. Do NOT defer to general and meaningless conclusions.
Write all used source urls at the end of the report, and make sure to not add duplicated sources, but only one reference for each.
You must write the report in apa format.
Please do your best, this is very important to my career."""  # noqa: E501

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", WRITER_SYSTEM_PROMPT),
        ("user", RESEARCH_REPORT_TEMPLATE),
    ]
)

def collapse_list_of_lists(list_of_lists):
    # Join the summaries for each query, then join the per-query blocks
    content = []
    for summaries in list_of_lists:
        content.append("\n\n".join(summaries))
    return "\n\n".join(content)

chain = RunnablePassthrough.assign(
    research_summary= full_research_chain | collapse_list_of_lists
) | prompt | llm | StrOutputParser()

chain.invoke(
    {
        "question": "what is the difference between langsmith and langchain"
    }
)

# langsmith output: https://smith.langchain.com/public/6e104657-b6e9-4ce9-8634-2bf350301517/r

"# Report: The Difference Between LangSmith and LangChain\n\n## Introduction\n\nLangSmith and LangChain are two AI frameworks developed by the same team. While both tools are used for testing and evaluating language models and AI applications, they have distinct features and purposes. This report aims to provide a detailed analysis of the differences between LangSmith and LangChain based on the information provided.\n\n## LangSmith: A Framework for Production-Grade Applications\n\nLangSmith is a new product created by the team behind LangChain. It is specifically designed to address the challenges of getting language model (LLM) applications into production in a reliable and maintainable way. Unlike LangChain, which focuses on prototyping, LangSmith aims to provide features that assist developers in building production-grade LLM applications.\n\n### Features of LangSmith\n\nLangSmith offers a range of features that are essential for developing and managing LLM applications. These featu
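
To show what ends up in `{research_summary}`, here is a small self-contained sketch of the flattening step on hypothetical sample data. The nested list mimics the shape returned in Step 6: one inner list per search query, one string per scraped link.

```python
from typing import List


def collapse_list_of_lists(list_of_lists: List[List[str]]) -> str:
    """Flatten per-query lists of summaries into one blank-line-separated string."""
    return "\n\n".join("\n\n".join(inner) for inner in list_of_lists)


# Hypothetical sample data with the same nesting as the Step 6 output.
nested = [
    ["summary A1", "summary A2"],  # results for the first search query
    ["summary B1"],                # results for the second search query
]

research_summary = collapse_list_of_lists(nested)
print(research_summary)
```

After flattening, the writer prompt can no longer tell which query or page a summary came from.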

#### Step 8

The report above cannot list its source URLs, because the URLs were dropped when each page was summarized. Attach the URL to every summary so the writer prompt can cite it.

In [21]:
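scrape_and_summarize_chain = RunnablePassthrough.assign(
    summary = RunnablePassthrough.assign(
        text = lambda x: scrape_text(x["url"])[:10000],
    ) | SUMMARY_PROMPT | llm | StrOutputParser()
) | (lambda x: f"URL: {x['url']}\n\nSUMMARY: {x['summary']}")
# Rebuild the downstream chains; the ones defined earlier still hold the old summarizer
web_search_chain = RunnablePassthrough.assign(
    urls = lambda x: web_search(x["question"])
) | (lambda x: [{"question": x["question"], "url": u} for u in x["urls"]]) | scrape_and_summarize_chain.map()
full_research_chain = search_question_chain | (lambda x: [{"question": q} for q in x]) | web_search_chain.map()
chain = RunnablePassthrough.assign(
    research_summary= full_research_chain | collapse_list_of_lists
) | prompt | llm | StrOutputParser()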

chain.invoke(
    {
        "question": "what is the difference between langsmith and langchain"
    }
)

# langsmith output: https://smith.langchain.com/public/e37286b3-500f-48e5-b245-369283b9a94c/r

"# Report: The Difference Between LangSmith and LangChain\n\n## Introduction\n\nThe purpose of this report is to analyze and compare the differences between LangSmith and LangChain based on the information provided. LangSmith and LangChain are both products developed by the same team, but they serve different purposes in the context of large language models (LLMs) and AI applications. This report will provide an in-depth analysis of the features, functionalities, and target audiences of both LangSmith and LangChain.\n\n## LangSmith: A Reliable and Maintainable LLM Production Platform\n\nLangSmith is a new product developed by the creators of LangChain. While LangChain focuses on reducing the barrier to entry for building prototypes with LLMs, LangSmith aims to help LLM applications move into production in a reliable and maintainable way. It addresses challenges related to reliability by providing features for debugging, testing, evaluating, monitoring, and usage metrics. LangSmith also

#### Step 9

Serve the chain as a FastAPI app with LangServe.

In [None]:
! pip install sse_starlette
from fastapi import FastAPI
from langserve import add_routes


app = FastAPI(
  title="LangChain Server",
  version="1.0",
  description="A simple api server using Langchain's Runnable interfaces",
)

add_routes(
    app,
    chain,
    path="/research-assistant",
)


if __name__ == "__main__":
    import nest_asyncio
    nest_asyncio.apply()

    import uvicorn
    uvicorn.run(app, host="localhost", port=8000)

# open browser to http://localhost:8000/research-assistant/playground/ and play
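
Besides the playground, the route can also be called over HTTP. The sketch below uses only the standard library and assumes that LangServe exposes an `invoke` endpoint under the route path, accepts a JSON body of the form `{"input": ...}`, and returns the result under an `"output"` key. Verify these details against the server's own `/docs` page. `build_invoke_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed base URL, matching the host, port, and path used in the cell above.
BASE_URL = "http://localhost:8000/research-assistant"


def build_invoke_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the (assumed) LangServe invoke endpoint."""
    payload = json.dumps({"input": {"question": question}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/invoke",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the request; requires the server from the cell above to be running."""
    with urllib.request.urlopen(build_invoke_request(question)) as response:
        return json.loads(response.read())["output"]  # assumed response shape


request = build_invoke_request("what is langsmith")
print(request.get_method(), request.full_url)
```

Only `build_invoke_request` runs here. `ask("what is langsmith")` needs the server to be up and may take a while, since it triggers the whole research chain.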