[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/generation/gpt4-retrieval-augmentation/gpt-4-langchain-docs.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/generation/gpt4-retrieval-augmentation/gpt-4-langchain-docs.ipynb)

# GPT-4 with Retrieval Augmentation over LangChain Docs

[![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/fast-link.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/generation/gpt4-retrieval-augmentation/gpt-4-langchain-docs-fast.ipynb)

In this notebook we'll work through an example of using GPT-4 with retrieval augmentation to answer questions about the LangChain Python library.

In [1]:
!pip install -qU \
  tiktoken==0.4.0 \
  openai==0.27.7 \
  langchain==0.0.179 \
  "pinecone-client[grpc]"==2.2.1

---

🚨 _Note: the above `pip install` is formatted for Jupyter notebooks. If running elsewhere you may need to drop the `!`._

---

In this example, we will download the LangChain API docs from [api.python.langchain.com](https://api.python.langchain.com/en/latest/). We fetch every `.html` file on the site like so:

In [2]:
!wget -r -A.html -P rtdocs https://api.python.langchain.com/en/latest/

Streaming output truncated to the last 5000 lines.
Length: unspecified [text/html]
Saving to: ‘rtdocs/api.python.langchain.com/en/stable/modules/llms.html’

api.python.langchai     [ <=>                ]   2.36M  --.-KB/s    in 0.008s  

2023-06-27 12:14:56 (312 MB/s) - ‘rtdocs/api.python.langchain.com/en/stable/modules/llms.html’ saved [2477496]

--2023-06-27 12:14:56--  https://api.python.langchain.com/en/stable/modules/retrievers.html
Reusing existing connection to api.python.langchain.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘rtdocs/api.python.langchain.com/en/stable/modules/retrievers.html’

api.python.langchai     [ <=>                ] 272.60K  --.-KB/s    in 0.001s  

2023-06-27 12:14:56 (343 MB/s) - ‘rtdocs/api.python.langchain.com/en/stable/modules/retrievers.html’ saved [279143]

--2023-06-27 12:14:56--  https://api.python.langchain.com/en/stable/modules/vectorstores.html
Reusing existing connection to a

This downloads all HTML into the `rtdocs` directory. Now we can use LangChain itself to process these docs. We do this using the `ReadTheDocsLoader` like so:

In [47]:
from langchain.document_loaders import ReadTheDocsLoader

loader = ReadTheDocsLoader('rtdocs')
docs = loader.load()
len(docs)





1021

This leaves us with `1021` processed doc pages. Let's take a look at the format of each one:

In [48]:
docs[20]



We access the plaintext page content like so:

In [49]:
print(docs[20].page_content)

Chains
Chains are easily reusable components which can be linked together.
class langchain.chains.APIChain(*, memory=None, callbacks=None, callback_manager=None, verbose=None, tags=None, api_request_chain, api_answer_chain, requests_wrapper, api_docs, question_key='question', output_key='output')[source]
Bases: langchain.chains.base.Chain
Chain that makes API calls and summarizes the responses to answer a question.
Parameters
memory (Optional[langchain.schema.BaseMemory]) – 
callbacks (Optional[Union[List[langchain.callbacks.base.BaseCallbackHandler], langchain.callbacks.base.BaseCallbackManager]]) – 
callback_manager (Optional[langchain.callbacks.base.BaseCallbackManager]) – 
verbose (bool) – 
tags (Optional[List[str]]) – 
api_request_chain (langchain.chains.llm.LLMChain) – 
api_answer_chain (langchain.chains.llm.LLMChain) – 
requests_wrapper (langchain.requests.TextRequestsWrapper) – 
api_docs (str) – 
question_key (str) – 
output_key (str) – 
Return type
None
attribute api_answer_

In [50]:
print(docs[35].page_content)

Source code for langchain.vectorstores.rocksetdb
"""Wrapper around Rockset vector database."""
from __future__ import annotations
import logging
from enum import Enum
from typing import Any, Iterable, List, Optional, Tuple
from langchain.docstore.document import Document
from langchain.embeddings.base import Embeddings
from langchain.vectorstores.base import VectorStore
logger = logging.getLogger(__name__)
[docs]class Rockset(VectorStore):
    """Wrapper arpund Rockset vector database.
    To use, you should have the `rockset` python package installed. Note that to use
    this, the collection being used must already exist in your Rockset instance.
    You must also ensure you use a Rockset ingest transformation to apply
    `VECTOR_ENFORCE` on the column being used to store `embedding_key` in the
    collection.
    See: https://rockset.com/blog/introducing-vector-search-on-rockset/ for more details
    Everything below assumes `commons` Rockset workspace.
    TODO: Add support for wo

We can also find the source of each document:

In [51]:
docs[35].metadata['source'].replace('rtdocs/', 'https://')

'https://api.python.langchain.com/en/latest/_modules/langchain/vectorstores/rocksetdb.html'
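
The `replace` call works because `wget -P rtdocs` mirrors the site under a directory named after its host. A slightly more defensive variant strips the prefix only when it appears at the very start of the path. This is a minimal sketch; the helper name `source_to_url` is hypothetical and not part of LangChain.

```python
def source_to_url(source: str, root: str = "rtdocs/") -> str:
    """Map a locally mirrored file path back to the URL it was downloaded from."""
    # Only strip the download directory when it is a true prefix, so a path
    # that merely contains "rtdocs/" somewhere in the middle is left alone.
    if source.startswith(root):
        return "https://" + source[len(root):]
    return source


print(source_to_url("rtdocs/api.python.langchain.com/en/latest/index.html"))
```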

Now let's see how we can process all of these. We will split everything into chunks of roughly 500 tokens, which is easy to do with `langchain` and `tiktoken`:

In [52]:
import tiktoken

tokenizer_name = tiktoken.encoding_for_model('gpt-4')
tokenizer_name.name

'cl100k_base'

In [53]:
tokenizer = tiktoken.get_encoding(tokenizer_name.name)

# create the length function
def tiktoken_len(text):
    tokens = tokenizer.encode(
        text,
        disallowed_special=()
    )
    return len(tokens)

In [54]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=20,
    length_function=tiktoken_len,
    separators=["\n\n", "\n", " ", ""]
)
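
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class first splits on the separators and then merges the pieces); the whitespace tokenizer and the function name `sliding_window_chunks` are stand-ins assumed purely for illustration.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of `chunk_size` tokens that share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()  # crude stand-in for a real tokenizer such as tiktoken
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


print(sliding_window_chunks("a b c d e f g h", chunk_size=4, chunk_overlap=1))
# prints ['a b c d', 'd e f g', 'g h']
```

Each chunk repeats the last token of the previous one, which is the role `chunk_overlap=20` plays in the splitter configured above.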

Now we split the `docs` into chunks using this approach.

In [55]:
from uuid import uuid4
from tqdm.auto import tqdm

chunks = []

for idx, page in enumerate(tqdm(docs)):
    content = page.page_content
    if len(content) > 100:
        url = page.metadata['source'].replace('rtdocs/', 'https://')
        if '/stable/' in url or ('/genindex.html' in url or '/index.html' in url):
            # this is not /latest/ docs or is index page, we don't want to include this
            continue
        texts = text_splitter.split_text(content)
        chunks.extend([{
            'id': str(uuid4()),
            'text': texts[i],
            'chunk': i,
            'url': url
        } for i in range(len(texts))])

  0%|          | 0/1021 [00:00<?, ?it/s]
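
The filtering and record-building logic in the loop above can be pulled into two small functions, which makes it easier to try on a handful of URLs before processing every page. This is a sketch that follows the same rules as the loop; the names `should_index` and `make_records` are hypothetical.

```python
from typing import Dict, List
from uuid import uuid4


def should_index(url: str) -> bool:
    """Return False for /stable/ pages and index pages, mirroring the loop above."""
    if "/stable/" in url:
        return False
    return not ("/genindex.html" in url or "/index.html" in url)


def make_records(url: str, texts: List[str]) -> List[Dict]:
    """Attach a unique ID, the chunk position, and the source URL to each chunk."""
    return [
        {"id": str(uuid4()), "text": text, "chunk": i, "url": url}
        for i, text in enumerate(texts)
    ]


url = "https://api.python.langchain.com/en/latest/modules/chains.html"
if should_index(url):
    records = make_records(url, ["first chunk", "second chunk"])
    print([(r["chunk"], r["text"]) for r in records])
```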

Our chunks are ready so now we move onto embedding and indexing everything.

## Initialize Embedding Model

We use `text-embedding-ada-002` as the embedding model. We can embed text like so:

In [56]:
import os
import openai

# get API key from top-right dropdown on OpenAI website
openai.api_key = os.getenv("OPENAI_API_KEY") or "OPENAI_API_KEY"

openai.Engine.list()  # check we have authenticated

<OpenAIObject list at 0x7f15a27b61b0> JSON: {
  "data": [
    {
      "created": null,
      "id": "whisper-1",
      "object": "engine",
      "owner": "openai-internal",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "babbage",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "davinci",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-davinci-edit-001",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "babbage-code-search-code",
      "object": "engine",
      "owner": "openai-dev",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-similarity-babbage-001",
      "object": "engine",
      "owner"

In [57]:
embed_model = "text-embedding-ada-002"

res = openai.Embedding.create(
    input=[
        "Sample document text goes here",
        "there will be several phrases in each batch"
    ], engine=embed_model
)

In the response `res` we will find a JSON-like object containing our new embeddings within the `'data'` field.

In [58]:
res.keys()

dict_keys(['object', 'data', 'model', 'usage'])

Inside `'data'` we will find two records, one for each of the two sentences we just embedded. Each vector embedding contains `1536` dimensions (the output dimensionality of the `text-embedding-ada-002` model).

In [59]:
len(res['data'])

2

In [60]:
len(res['data'][0]['embedding']), len(res['data'][1]['embedding'])

(1536, 1536)
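
Embeddings are compared by how closely they point in the same direction. The index we create below uses the cosine metric, which can be sketched in plain Python as follows. The three-dimensional vectors are made-up stand-ins for the 1536-dimensional embeddings in `res`, and `cosine_similarity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# toy vectors standing in for real embeddings
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

With the real data, the same function could be called as `cosine_similarity(res['data'][0]['embedding'], res['data'][1]['embedding'])`.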

We will apply this same embedding logic to the LangChain docs we've just scraped. But before doing so, we must create a place to store the embeddings.

## Initializing the Index

To store these embeddings and enable efficient vector search over them, we use Pinecone. We can get a [free API key](https://app.pinecone.io/) and enter it below, where we initialize our connection to Pinecone and create a new index.

In [61]:
import pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = os.getenv("PINECONE_API_KEY") or "PINECONE_API_KEY"
# find your environment next to the api key in pinecone console
env = os.getenv("PINECONE_ENVIRONMENT") or "PINECONE_ENVIRONMENT"

pinecone.init(api_key=api_key, environment=env)
pinecone.whoami()

WhoAmIResponse(username='c78f2bd', user_label='default', projectname='3947fb1')

In [62]:
index_name = 'gpt-4-langchain-docs'

In [64]:
import time

# check if index already exists (it shouldn't if this is first time)
if index_name not in pinecone.list_indexes():
    # if does not exist, create index
    pinecone.create_index(
        index_name,
        dimension=len(res['data'][0]['embedding']),
        metric='cosine'
    )
    # wait for index to be initialized
    time.sleep(1)

# connect to index
index = pinecone.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

We can see the index is currently empty, with a `total_vector_count` of `0`. We can begin populating it with embeddings built by OpenAI's `text-embedding-ada-002` model like so:

In [65]:
from tqdm.auto import tqdm

batch_size = 100  # how many embeddings we create and insert at once

for i in tqdm(range(0, len(chunks), batch_size)):
    # find end of batch
    i_end = min(len(chunks), i+batch_size)
    meta_batch = chunks[i:i_end]
    # get ids
    ids_batch = [x['id'] for x in meta_batch]
    # get texts to encode
    texts = [x['text'] for x in meta_batch]
    # create embeddings, retrying every 5 seconds if we hit a RateLimitError
    while True:
        try:
            res = openai.Embedding.create(input=texts, engine=embed_model)
            break
        except openai.error.RateLimitError:
            time.sleep(5)
    embeds = [record['embedding'] for record in res['data']]
    # cleanup metadata
    meta_batch = [{
        'text': x['text'],
        'chunk': x['chunk'],
        'url': x['url']
    } for x in meta_batch]
    to_upsert = list(zip(ids_batch, embeds, meta_batch))
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)

  0%|          | 0/25 [00:00<?, ?it/s]
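
The loop above retries every five seconds until the embedding call succeeds. A common refinement is exponential backoff with a cap on the number of attempts, so that a persistent failure eventually surfaces instead of looping forever. Below is a generic sketch; `with_retries` and its parameters are hypothetical names, and in the notebook you would pass the rate-limit exception class of your `openai` version as `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, waiting base_delay * 2**attempt seconds between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")


# simulate a call that fails twice before succeeding
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


print(with_retries(flaky, retry_on=(TimeoutError,), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the example instant to run; in real use the default `time.sleep` applies.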

Now we've added all of our LangChain docs to the index. With that, we can move on to retrieval and then answer generation using GPT-4.

## Retrieval

To search through our documents we first need to create a query vector `xq`. Using `xq` we will retrieve the most relevant chunks from the LangChain docs, like so:

In [66]:
query = "how do I use the LLMChain in LangChain?"

res = openai.Embedding.create(
    input=[query],
    engine=embed_model
)

# use the query embedding as our query vector
xq = res['data'][0]['embedding']

# retrieve the five most relevant chunks (with their metadata) from Pinecone
res = index.query(xq, top_k=5, include_metadata=True)
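
Conceptually, a vector query scores the stored vectors against `xq` and returns the `top_k` best matches; a vector database does this with specialized index structures rather than a full scan. A brute-force version over an in-memory list shows the idea. The records and two-dimensional vectors below are invented for illustration, and `top_k_matches` is a hypothetical helper, not a Pinecone API.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_matches(
    xq: Sequence[float],
    records: List[Tuple[str, Sequence[float], Dict]],
    top_k: int = 5,
) -> List[Dict]:
    """Score every (id, vector, metadata) record against xq and keep the best top_k."""
    scored = (
        {"id": rid, "score": cosine(xq, vec), "metadata": meta}
        for rid, vec, meta in records
    )
    return heapq.nlargest(top_k, scored, key=lambda m: m["score"])


# toy records in the same (id, vector, metadata) shape used for upserting
records = [
    ("a", [1.0, 0.0], {"text": "about chains"}),
    ("b", [0.0, 1.0], {"text": "about loaders"}),
    ("c", [0.9, 0.1], {"text": "about LLMChain"}),
]
for match in top_k_matches([1.0, 0.0], records, top_k=2):
    print(match["id"], round(match["score"], 3))
```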

In [67]:
res

{'matches': [{'id': '1c576e13-7f97-4109-96cf-b0c0b3626512',
              'metadata': {'chunk': 0.0,
                           'text': 'Models\uf0c1\n'
                                   'LangChain provides interfaces and '
                                   'integrations for a number of different '
                                   'types of models.\n'
                                   'LLMs\n'
                                   'Chat Models',
                           'url': 'https://api.python.langchain.com/en/latest/models.html'},
              'score': 0.827638745,
              'values': []},
             {'id': 'a352fb94-ce42-4707-ab27-351c484d5021',
              'metadata': {'chunk': 0.0,
                           'text': 'Data connection\uf0c1\n'
                                   'LangChain has a number of modules that '
                                   'help you load, structure, store, and '
                                   'retrieve documents.\n'
                 

With retrieval complete, we move on to feeding these into GPT-4 to produce answers.

## Retrieval Augmented Generation

GPT-4 is currently accessed via OpenAI's `ChatCompletion` endpoint. To give the model the information we retrieved, we pass it in the user prompt *alongside* our original query. We can do that like so:

In [68]:
# get list of retrieved text
contexts = [item['metadata']['text'] for item in res['matches']]

augmented_query = "\n\n---\n\n".join(contexts)+"\n\n-----\n\n"+query
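
Joining every retrieved chunk works here because five chunks of roughly 500 tokens each make a fairly small prompt. If you raise `top_k`, it can help to cap how much context is added. The sketch below keeps chunks in ranked order until a budget is reached; the function name, the word-count length function, and the `max_tokens` default are assumptions for illustration (in the notebook you could pass `tiktoken_len` instead).

```python
from typing import Callable, List


def build_augmented_query(
    contexts: List[str],
    query: str,
    max_tokens: int = 3000,
    length_function: Callable[[str], int] = lambda text: len(text.split()),
) -> str:
    """Join ranked contexts above the query, stopping once the budget is used up."""
    kept, used = [], 0
    for context in contexts:  # contexts are assumed to be sorted best-first
        cost = length_function(context)
        if used + cost > max_tokens:
            break
        kept.append(context)
        used += cost
    return "\n\n---\n\n".join(kept) + "\n\n-----\n\n" + query


print(build_augmented_query(["first context", "second context"], "my question?"))
```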

In [69]:
print(augmented_query)

Models
LangChain provides interfaces and integrations for a number of different types of models.
LLMs
Chat Models

---

Data connection
LangChain has a number of modules that help you load, structure, store, and retrieve documents.
Document Loaders
Document Transformers
Embeddings
Vector Stores
Retrievers

---

Source code for langchain.chains.llm
"""Chain that just formats a prompt and calls an LLM."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union
from pydantic import Extra, Field
from langchain.base_language import BaseLanguageModel
from langchain.callbacks.manager import (
    AsyncCallbackManager,
    AsyncCallbackManagerForChainRun,
    CallbackManager,
    CallbackManagerForChainRun,
    Callbacks,
)
from langchain.chains.base import Chain
from langchain.input import get_colored_text
from langchain.load.dump import dumpd
from langchain.prompts.base import BasePromptTemplate
from langchain.prompts.prompt import PromptTemp

Now we ask the question:

In [70]:
# system message to 'prime' the model
primer = """You are Q&A bot. A highly intelligent system that answers
user questions based on the information provided by the user above
each question. If the information can not be found in the information
provided by the user you truthfully say "I don't know".
"""

res = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": primer},
        {"role": "user", "content": augmented_query}
    ]
)

To display this response nicely, we render it as Markdown.

In [71]:
from IPython.display import Markdown

display(Markdown(res['choices'][0]['message']['content']))

To use the LLMChain in LangChain, follow these steps:

1. Import the necessary modules:

```python
from langchain import LLMChain, OpenAI, PromptTemplate
```

2. Create a prompt template by specifying the input variables and template string:

```python
prompt_template = "Tell me a {adjective} joke"
prompt = PromptTemplate(
    input_variables=["adjective"], template=prompt_template
)
```

3. Create an instance of LLMChain by providing the language model (e.g. OpenAI) and the prompt:

```python
llm = LLMChain(llm=OpenAI(), prompt=prompt)
```

4. Run the LLMChain with the desired input:

```python
result = llm.run(adjective="funny")
```

5. Print the result:

```python
print(result)
```

Here's the complete example:

```python
from langchain import LLMChain, OpenAI, PromptTemplate

prompt_template = "Tell me a {adjective} joke"
prompt = PromptTemplate(
    input_variables=["adjective"], template=prompt_template
)
llm = LLMChain(llm=OpenAI(), prompt=prompt)

result = llm.run(adjective="funny")
print(result)
```

Replace `OpenAI()` with the appropriate language model instance for your use case.

Let's compare this to a non-augmented query...

In [72]:
res = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": primer},
        {"role": "user", "content": query}
    ]
)
display(Markdown(res['choices'][0]['message']['content']))

I don't know.

What if we drop the `"I don't know"` part of the `primer`?

In [73]:
res = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are Q&A bot. A highly intelligent system that answers user questions"},
        {"role": "user", "content": query}
    ]
)
display(Markdown(res['choices'][0]['message']['content']))

LangChain is a language model developed by OpenAI, and currently, there isn't any documentation available mentioning "LLMChain" or any component named as such. It seems that the term "LLMChain" does not exist or may be a user-specific term.

In order to give you more helpful information or guide you through the process you're looking for, I might need more context or details on what you're trying to accomplish. Please provide more information or clarify your request, and I'll do my best to help you.

Now we see something even worse than `"I don't know"`: hallucinations. Clearly, augmenting our queries with additional context can make a huge difference to the performance of our system.

Great, we've seen how to augment GPT-4 with semantic search so that it can answer LangChain-specific queries.
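
The whole flow can be summarized as one function: embed the query, retrieve matching chunks, build the augmented prompt, and ask the chat model. In this sketch the three service calls are passed in as plain callables so that the control flow can be read and run without network access. `answer_with_retrieval` and its parameter names are hypothetical; in the notebook they would wrap `openai.Embedding.create`, `index.query`, and `openai.ChatCompletion.create`.

```python
from typing import Callable, Dict, List, Sequence


def answer_with_retrieval(
    query: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], List[str]],
    chat: Callable[[List[Dict[str, str]]], str],
    primer: str,
    top_k: int = 5,
) -> str:
    """Embed the query, fetch top_k context chunks, and ask the chat model."""
    xq = embed(query)
    contexts = search(xq, top_k)
    augmented_query = "\n\n---\n\n".join(contexts) + "\n\n-----\n\n" + query
    messages = [
        {"role": "system", "content": primer},
        {"role": "user", "content": augmented_query},
    ]
    return chat(messages)


# stub services so the example runs offline
reply = answer_with_retrieval(
    "how do I use the LLMChain in LangChain?",
    embed=lambda text: [float(len(text))],
    search=lambda xq, k: ["stub context about LLMChain"][:k],
    chat=lambda messages: f"received {len(messages)} messages",
    primer="You answer using only the provided context.",
)
print(reply)  # prints: received 2 messages
```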

Once you're finished, delete the index to save resources.

In [74]:
pinecone.delete_index(index_name)

---