# How to do retrieval

Retrieval is a common technique chatbots use to augment their responses with data outside a chat model’s training data. This section will cover how to implement retrieval in the context of chatbots, but it’s worth noting that retrieval is a very subtle and deep topic.

## Setup

You’ll need to install a few packages, and set any LLM API keys:

```{=mdx}
import Npm2Yarn from "@theme/Npm2Yarn";

<Npm2Yarn>
  langchain @langchain/core @langchain/openai cheerio
</Npm2Yarn>
```

Let’s also set up a chat model that we’ll use for the examples below.

```{=mdx}
import ChatModelTabs from "@theme/ChatModelTabs";

<ChatModelTabs />
```

## Creating a retriever

We’ll use [the LangSmith documentation](https://docs.smith.langchain.com) as source material and store the content in a vectorstore for later retrieval. Note that this example will gloss over some of the specifics around parsing and storing a data source - you can see more [in-depth documentation on creating retrieval systems here](/docs/use_cases/question_answering/).

Let’s use a document loader to pull text from the docs:

In [2]:
import "cheerio";

In [3]:
import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";

const loader = new CheerioWebBaseLoader(
  "https://docs.smith.langchain.com/user_guide"
);

const rawDocs = await loader.load();

Next, we split it into smaller chunks that the LLM’s context window can handle:

In [4]:
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 0,
});

const allSplits = await textSplitter.splitDocuments(rawDocs);
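
To make the effect of `chunkSize` and `chunkOverlap` concrete, here is a minimal sketch of fixed-size chunking in plain TypeScript. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on natural separators), and the `chunkText` helper is hypothetical; it only illustrates the two parameters.

```typescript
// Hypothetical helper (not a LangChain API): naive fixed-size chunking.
// Each chunk has at most `chunkSize` characters, and consecutive chunks
// share `chunkOverlap` characters.
function chunkText(text: string, chunkSize: number, chunkOverlap = 0): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 0).join(" | ")); // abcd | efgh | ij
console.log(chunkText("abcdefghij", 4, 2).join(" | ")); // abcd | cdef | efgh | ghij
```

With an overlap of 0, as in the cell above, no text is repeated between chunks; a non-zero overlap trades some duplication for a lower chance of cutting a relevant passage in half.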

Then we embed and store those chunks in a vector database:

In [5]:
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const vectorstore = await MemoryVectorStore.fromDocuments(
  allSplits,
  new OpenAIEmbeddings()
);
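
Conceptually, a vector store keeps one embedding per chunk and, at query time, returns the chunks whose embeddings are most similar to the query embedding. The sketch below illustrates that idea with hand-written 3-dimensional vectors and cosine similarity, a common similarity measure. The `Entry` type, the `topK` helper, and the toy vectors are all hypothetical; real embeddings come from a model such as `OpenAIEmbeddings` and have many more dimensions.

```typescript
// Hypothetical in-memory "store": each entry pairs a chunk with its embedding.
type Entry = { text: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat zero vectors as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every entry against the query and keep the k best matches.
function topK(entries: Entry[], query: number[], k: number): Entry[] {
  return entries
    .map((entry) => ({ entry, score: cosineSimilarity(entry.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ entry }) => entry);
}

// Toy vectors chosen by hand for illustration only.
const toyStore: Entry[] = [
  { text: "testing and datasets", vector: [0.9, 0.1, 0.0] },
  { text: "monitoring in production", vector: [0.1, 0.9, 0.0] },
  { text: "prompt playground", vector: [0.0, 0.2, 0.9] },
];

const toyQuery = [0.8, 0.2, 0.1];
console.log(topK(toyStore, toyQuery, 2).map((e) => e.text).join(" | "));
// testing and datasets | monitoring in production
```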

And finally, let’s create a retriever from our initialized vectorstore:

In [6]:
const retriever = vectorstore.asRetriever(4);

const docs = await retriever.invoke("how can langsmith help with testing?");

console.log(docs);

[
  Document {
    pageContent: "These test cases can be uploaded in bulk, created on the fly, or exported from application traces. L"... 294 more characters,
    metadata: {
      source: "https://docs.smith.langchain.com/user_guide",
      loc: { lines: { from: 7, to: 7 } }
    }
  },
  Document {
    pageContent: "We provide native rendering of chat messages, functions, and retrieve documents.Initial Test Set​Whi"... 347 more characters,
    metadata: {
      source: "https://docs.smith.langchain.com/user_guide",
      loc: { lines: { from: 6, to: 6 } }
    }
  },
  Document {
    pageContent: "will help in curation of test cases that can help track regressions/improvements and development of "... 393 more characters,
    metadata: {
      source: "https://docs.smith.langchain.com/user_guide",
      loc: { lines: { from: 11, to: 11 } }
    }
  },
  Document {
    pageContent: "that time period — this is especially handy for debugging production issues.LangSmith also allows fo"... 39

Invoking the retriever returns chunks of the LangSmith docs that contain information about testing, which our chatbot can use as context when answering questions. We now have a retriever that can return related data from the LangSmith docs!

## Document chains

Now that we have a retriever that can return LangSmith docs, let’s create a chain that can use them as context to answer questions. We’ll use a `createStuffDocumentsChain` helper function to "stuff" all of the input documents into the prompt. It will also handle formatting the docs as strings.

In addition to a chat model, the function also expects a prompt that has a `context` variable, as well as a placeholder for chat history messages named `messages`. We’ll create an appropriate prompt and pass it as shown below:

In [7]:
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from "@langchain/core/prompts";

const SYSTEM_TEMPLATE = `Answer the user's questions based on the below context. 
If the context doesn't contain any relevant information to the question, don't make something up and just say "I don't know":

<context>
{context}
</context>
`;

const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
  ["system", SYSTEM_TEMPLATE],
  new MessagesPlaceholder("messages"),
]);

const documentChain = await createStuffDocumentsChain({
  llm,
  prompt: questionAnsweringPrompt,
});
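
As a rough mental model, "stuffing" amounts to joining the text of every document and substituting the result into the `{context}` slot of the prompt. The sketch below shows that idea in plain TypeScript. `stuffDocuments` is a hypothetical helper, and the blank-line separator is an assumption for illustration rather than a statement about the internals of `createStuffDocumentsChain`.

```typescript
// Minimal stand-in for a document: only the text matters for stuffing.
type Doc = { pageContent: string; metadata?: Record<string, unknown> };

// Hypothetical helper: format docs as one string and place it in the template.
function stuffDocuments(template: string, docs: Doc[], separator = "\n\n"): string {
  const context = docs.map((doc) => doc.pageContent).join(separator);
  // A replacer function avoids special handling of "$" sequences in the text.
  return template.replace("{context}", () => context);
}

const toyTemplate = "Answer using only this context:\n<context>\n{context}\n</context>";

const stuffedPrompt = stuffDocuments(toyTemplate, [
  { pageContent: "LangSmith lets you create datasets." },
  { pageContent: "Datasets can be used to run tests." },
]);

console.log(stuffedPrompt);
```

With an empty document list the context slot is simply left empty, which is why the system prompt above spells out what the model should do when no relevant information is available.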

We can invoke this `documentChain` by itself to answer questions. Let’s use the docs we retrieved above and a closely related question, `Can LangSmith help test my LLM applications?`:

In [8]:
import { HumanMessage, AIMessage } from "@langchain/core/messages";

await documentChain.invoke({
  messages: [new HumanMessage("Can LangSmith help test my LLM applications?")],
  context: docs,
});

"Yes, LangSmith allows developers to create datasets for their LLM applications and run tests using t"... 128 more characters

Looks good! For comparison, we can try it with no context docs:

In [9]:
await documentChain.invoke({
  messages: [new HumanMessage("Can LangSmith help test my LLM applications?")],
  context: [],
});

"I don't know."

We can see that, without any context documents, the LLM follows the prompt’s instructions and declines to answer.

## Retrieval chains

Let’s combine this document chain with the retriever. Here’s one way this can look:

In [10]:
import type { BaseMessage } from "@langchain/core/messages";
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";

const parseRetrieverInput = (params: { messages: BaseMessage[] }) => {
  return params.messages[params.messages.length - 1].content;
};

const retrievalChain = RunnablePassthrough.assign({
  context: RunnableSequence.from([parseRetrieverInput, retriever]),
}).assign({
  answer: documentChain,
});

Given a list of input messages, we extract the content of the last message in the list and pass that to the retriever to fetch some documents. Then, we pass those documents as context to our document chain to generate a final response.

Invoking this chain combines both steps outlined above:

In [11]:
await retrievalChain.invoke({
  messages: [new HumanMessage("Can LangSmith help test my LLM applications?")],
});

{
  messages: [
    HumanMessage {
      lc_serializable: true,
      lc_kwargs: {
        content: "Can LangSmith help test my LLM applications?",
        additional_kwargs: {},
        response_metadata: {}
      },
      lc_namespace: [ "langchain_core", "messages" ],
      content: "Can LangSmith help test my LLM applications?",
      name: undefined,
      additional_kwargs: {},
      response_metadata: {}
    }
  ],
  context: [
    Document {
      pageContent: "These test cases can be uploaded in bulk, created on the fly, or exported from application traces. L"... 294 more characters,
      metadata: {
        source: "https://docs.smith.langchain.com/user_guide",
        loc: { lines: [Object] }
      }
    },
    Document {
      pageContent: "this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each s"... 343 more characters,
      meta

Looks good!
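
The `RunnablePassthrough.assign` calls in the retrieval chain above can be read as "keep the input object and add one computed key to it". The sketch below mimics that behavior with ordinary functions. `assign`, `fakeRetriever`, and `fakeAnswer` are hypothetical stand-ins written for this illustration, not LangChain APIs.

```typescript
type Step<I, O> = (input: I) => Promise<O> | O;

// Hypothetical helper: run `step` on the input and merge its result under `key`,
// leaving every existing key untouched.
function assign<I extends object, K extends string, O>(key: K, step: Step<I, O>) {
  return async (input: I): Promise<I & Record<K, O>> => {
    const value = await step(input);
    return { ...input, [key]: value } as I & Record<K, O>;
  };
}

type ChainInput = { messages: string[] };
type WithContext = ChainInput & { context: string[] };

// Stand-in for "take the last message and look up documents for it".
const fakeRetriever = (input: ChainInput): string[] => [
  `doc about: ${input.messages[input.messages.length - 1]}`,
];

// Stand-in for the document chain, which reads both messages and context.
const fakeAnswer = (input: WithContext): string =>
  `answer based on ${input.context.length} document(s)`;

const addContext = assign<ChainInput, "context", string[]>("context", fakeRetriever);
const addAnswer = assign<WithContext, "answer", string>("answer", fakeAnswer);

async function runChain(input: ChainInput) {
  return addAnswer(await addContext(input));
}

runChain({ messages: ["Can LangSmith help test my LLM applications?"] }).then((result) => {
  console.log(Object.keys(result).join(", ")); // messages, context, answer
});
```

This is why the chain output above contains the original `messages` and the retrieved `context` alongside the final `answer`.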

## Query transformation

Our retrieval chain is capable of answering questions about LangSmith, but there’s a problem - chatbots interact with users conversationally, and therefore have to deal with followup questions.

The chain in its current form will struggle with this. Consider a followup question to our original question like `Tell me more!`. If we invoke our retriever with that query directly, we get documents irrelevant to LLM application testing:

In [12]:
await retriever.invoke("Tell me more!");

[
  Document {
    pageContent: "Oftentimes, changes in the prompt, retrieval strategy, or model choice can have huge implications in"... 40 more characters,
    metadata: {
      source: "https://docs.smith.langchain.com/user_guide",
      loc: { lines: { from: 8, to: 8 } }
    }
  },
  Document {
    pageContent: "This allows you to quickly test out different prompts and models. You can open the playground from a"... 37 more characters,
    metadata: {
      source: "https://docs.smith.langchain.com/user_guide",
      loc: { lines: { from: 10, to: 10 } }
    }
  },
  Document {
    pageContent: "We provide native rendering of chat messages, functions, and retrieve documents.Initial Test Set​Whi"... 347 more characters,
    metadata: {
      source: "https://docs.smith.langchain.com/user_guide",
      loc: { lines: { from: 6, to: 6 } }
    }
  },
  Document {
    pag

This is because the retriever has no innate concept of state, and will only pull documents most similar to the query given. To solve this, we can use an LLM to transform the query into a standalone query without any external references.

Here’s an example:

In [14]:
const queryTransformPrompt = ChatPromptTemplate.fromMessages([
  new MessagesPlaceholder("messages"),
  [
    "user",
    "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation. Only respond with the query, nothing else.",
  ],
]);

const queryTransformationChain = queryTransformPrompt.pipe(llm);

await queryTransformationChain.invoke({
  messages: [
    new HumanMessage("Can LangSmith help test my LLM applications?"),
    new AIMessage(
      "Yes, LangSmith can help test and evaluate your LLM applications. It allows you to quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs. Additionally, LangSmith can be used to monitor your application, log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise."
    ),
    new HumanMessage("Tell me more!"),
  ],
});

AIMessage {
  lc_serializable: true,
  lc_kwargs: {
    content: "How can LangSmith help test LLM applications?",
    tool_calls: [],
    invalid_tool_calls: [],
    additional_kwargs: { function_call: undefined, tool_calls: undefined },
    response_metadata: {}
  },
  lc_namespace: [ "langchain_core", "messages" ],
  content: "How can LangSmith help test LLM applications?",
  name: undefined,
  additional_kwargs: { function_call: undefined, tool_calls: undefined },
  response_metadata: {
    tokenUsage: { completionTokens: 10, promptTokens: 145, totalTokens: 155 },
    finish_reason: "stop"
  },
  tool_calls: [],
  invalid_tool_calls: []
}

Awesome! That transformed query would pull up context documents related to LLM application testing.

Let’s add this to our retrieval chain. We can wrap our retriever as follows:

In [16]:
import { RunnableBranch } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

const queryTransformingRetrieverChain = RunnableBranch.from([
  [
    (params: { messages: BaseMessage[] }) => params.messages.length === 1,
    RunnableSequence.from([parseRetrieverInput, retriever]),
  ],
  queryTransformPrompt
    .pipe(llm)
    .pipe(new StringOutputParser())
    .pipe(retriever),
]).withConfig({ runName: "chat_retriever_chain" });
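
The branching logic above can be summarized in a few lines: with a single message, the message text is already a standalone query; otherwise, the conversation is first rewritten into one. The sketch below shows only this control flow. `ChatMessage`, `buildSearchQuery`, and `fakeRewrite` are hypothetical names, and the stubbed rewriter stands in for the LLM call made by `queryTransformPrompt.pipe(llm)`.

```typescript
type ChatMessage = { role: "human" | "ai"; content: string };
type Rewriter = (messages: ChatMessage[]) => Promise<string>;

// Hypothetical helper mirroring the branch: skip the rewrite step when
// there is no prior conversation to resolve references against.
async function buildSearchQuery(messages: ChatMessage[], rewrite: Rewriter): Promise<string> {
  if (messages.length === 0) throw new Error("expected at least one message");
  if (messages.length === 1) return messages[0].content;
  return rewrite(messages);
}

// Stub rewriter for illustration: it reuses the first human message
// instead of calling an LLM.
const fakeRewrite: Rewriter = async (messages) => {
  const firstHuman = messages.find((m) => m.role === "human");
  return firstHuman ? firstHuman.content : "";
};

const toyConversation: ChatMessage[] = [
  { role: "human", content: "Can LangSmith help test my LLM applications?" },
  { role: "ai", content: "Yes, it can." },
  { role: "human", content: "Tell me more!" },
];

buildSearchQuery(toyConversation, fakeRewrite).then((query) => console.log(query));
// Can LangSmith help test my LLM applications?
```

Skipping the rewrite for the first message saves one LLM call per new conversation, at the cost of a slightly more complex chain.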

Then, we can use this query transformation chain to make our retrieval chain better able to handle such followup questions:


In [17]:
const conversationalRetrievalChain = RunnablePassthrough.assign({
  context: queryTransformingRetrieverChain,
}).assign({
  answer: documentChain,
});

Awesome! Let’s invoke this new chain with the same inputs as earlier:


In [18]:
await conversationalRetrievalChain.invoke({
  messages: [new HumanMessage("Can LangSmith help test my LLM applications?")],
});

{
  messages: [
    HumanMessage {
      lc_serializable: true,
      lc_kwargs: {
        content: "Can LangSmith help test my LLM applications?",
        additional_kwargs: {},
        response_metadata: {}
      },
      lc_namespace: [ "langchain_core", "messages" ],
      content: "Can LangSmith help test my LLM applications?",
      name: undefined,
      additional_kwargs: {},
      response_metadata: {}
    }
  ],
  context: [
    Document {
      pageContent: "These test cases can be uploaded in bulk, created on the fly, or exported from application traces. L"... 294 more characters,
      metadata: {
        source: "https://docs.smith.langchain.com/user_guide",
        loc: { lines: [Object] }
      }
    },
    Document {
      pageContent: "this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each s"... 343 more characters,
      meta

In [19]:
await conversationalRetrievalChain.invoke({
  messages: [
    new HumanMessage("Can LangSmith help test my LLM applications?"),
    new AIMessage(
      "Yes, LangSmith can help test and evaluate your LLM applications. It allows you to quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs. Additionally, LangSmith can be used to monitor your application, log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise."
    ),
    new HumanMessage("Tell me more!"),
  ],
});

{
  messages: [
    HumanMessage {
      lc_serializable: true,
      lc_kwargs: {
        content: "Can LangSmith help test my LLM applications?",
        additional_kwargs: {},
        response_metadata: {}
      },
      lc_namespace: [ "langchain_core", "messages" ],
      content: "Can LangSmith help test my LLM applications?",
      name: undefined,
      additional_kwargs: {},
      response_metadata: {}
    },
    AIMessage {
      lc_serializable: true,
      lc_kwargs: {
        content: "Yes, LangSmith can help test and evaluate your LLM applications. It allows you to quickly edit examp"... 317 more characters,
        tool_calls: [],
        invalid_tool_calls: [],
        additional_kwargs: {},
        response_metadata: {}
      },
      lc_namespace: [ "langchain_core", "messages" ],
      content: "Yes, LangSmith can help test and evaluate your LLM applications. It a

You can check out [this LangSmith trace](https://smith.langchain.com/public/dc4d6bd4-fea5-45df-be94-06ad18882ae9/r) to see the internal query transformation step for yourself.

## Streaming

Because this chain is constructed with LCEL, you can use familiar methods like `.stream()` with it:

In [20]:
const stream = await conversationalRetrievalChain.stream({
  messages: [
    new HumanMessage("Can LangSmith help test my LLM applications?"),
    new AIMessage(
      "Yes, LangSmith can help test and evaluate your LLM applications. It allows you to quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs. Additionally, LangSmith can be used to monitor your application, log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise."
    ),
    new HumanMessage("Tell me more!"),
  ],
});

for await (const chunk of stream) {
  console.log(chunk);
}

{
  messages: [
    HumanMessage {
      lc_serializable: true,
      lc_kwargs: {
        content: "Can LangSmith help test my LLM applications?",
        additional_kwargs: {},
        response_metadata: {}
      },
      lc_namespace: [ "langchain_core", "messages" ],
      content: "Can LangSmith help test my LLM applications?",
      name: undefined,
      additional_kwargs: {},
      response_metadata: {}
    },
    AIMessage {
      lc_serializable: true,
      lc_kwargs: {
        content: "Yes, LangSmith can help test and evaluate your LLM applications. It allows you to quickly edit examp"... 317 more characters,
        tool_calls: [],
        invalid_tool_calls: [],
        additional_kwargs: {},
        response_metadata: {}
      },
      lc_namespace: [ "langchain_core", "messages" ],
      content: "Yes, LangSmith can help test and evaluate your LLM applications. It allows you to quickly edit examp"... 317 more characters,
      name: undefined,
      additional_kwargs: 
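
When consuming a stream like the one above, a common pattern is to accumulate only the pieces you care about. The sketch below assumes each chunk is a partial object that may carry a `messages`, `context`, or `answer` key, with the answer arriving in several string pieces; that chunk shape, the `fakeStream` generator, and the `collectAnswer` helper are assumptions made for illustration, so check the actual chunks your chain emits.

```typescript
// Assumed chunk shape: each chunk carries at most a few of these keys.
type Chunk = { messages?: unknown[]; context?: unknown[]; answer?: string };

// Hypothetical stand-in for `conversationalRetrievalChain.stream(...)`.
async function* fakeStream(): AsyncGenerator<Chunk> {
  yield { messages: ["Tell me more!"] };
  yield { context: ["doc 1", "doc 2"] };
  for (const piece of ["Lang", "Smith ", "can ", "help."]) {
    yield { answer: piece };
  }
}

// Concatenate the streamed answer pieces and ignore all other keys.
async function collectAnswer(stream: AsyncIterable<Chunk>): Promise<string> {
  let answer = "";
  for await (const chunk of stream) {
    if (chunk.answer !== undefined) answer += chunk.answer;
  }
  return answer;
}

collectAnswer(fakeStream()).then((answer) => console.log(answer));
// LangSmith can help.
```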

## Further reading

This guide only scratches the surface of retrieval techniques. For more on different ways of ingesting, preparing, and retrieving the most relevant data, check out [this section](/docs/modules/data_connection/) of the docs.
