[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/gpt-4-langchain-docs.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/gpt-4-langchain-docs.ipynb)

# GPT-4 with Retrieval Augmentation over LangChain Docs

In this notebook we'll work through an example of using GPT-4 with retrieval augmentation to answer questions about the LangChain Python library.

[![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/full-link.svg)](https://github.com/pinecone-io/examples/blob/master/learn/generation/openai/gpt-4-langchain-docs.ipynb)

To begin we must install the prerequisite libraries:

In [3]:
!pip install -qU \
    openai==1.66.3 \
    pinecone==5.4.2 \
    pinecone-datasets==1.0.2 \
    pinecone-notebooks==0.1.1 \
    tqdm


In [29]:
from google.colab import userdata
from pinecone_datasets import load_dataset
import openai
from openai import OpenAI
import os
from pinecone import Pinecone

In [31]:
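openai_api_key = userdata.get('OPENAI_API_KEY')

index_name = 'gpt-4-langchain-docs-fast'
# Must match the output size of the embedding model (1536 for text-embedding-ada-002)
dimensions = 1536
# Configure the Pinecone client (key from the environment, falling back to Colab secrets)
pc = Pinecone(api_key=os.environ.get('PINECONE_API_KEY') or userdata.get('PINECONE_API_KEY'))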

---

🚨 _Note: the above `pip install` is formatted for Jupyter notebooks. If running elsewhere you may need to drop the `!`._

---

In this example, we will use a pre-embedded dataset of the LangChain docs from [python.langchain.com](https://python.langchain.com/en/latest/). If you'd like to see how we perform the data preparation, refer to [this notebook]().

The embeddings were produced with OpenAI's `text-embedding-ada-002` model which outputs embeddings with dimension `1536`.
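Pinecone rejects any vector whose length differs from the index's dimension, so it helps to keep the model-to-dimension mapping in one place and check it before indexing or querying. The sketch below assumes OpenAI's default output sizes (1536 for `text-embedding-ada-002` and `text-embedding-3-small`, 3072 for `text-embedding-3-large`, when the optional `dimensions` request parameter is not used); `EMBEDDING_DIMS` and `check_dimension` are hypothetical names, not part of either SDK.

```python
# Hypothetical lookup of default output sizes for OpenAI embedding models
# (assumes the optional `dimensions` request parameter is not used)
EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_dimension(model: str, index_dimension: int) -> None:
    """Raise early if `model` cannot produce vectors for an index of `index_dimension`."""
    expected = EMBEDDING_DIMS[model]
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-d vectors, but the index expects {index_dimension}"
        )


check_dimension("text-embedding-ada-002", 1536)  # passes silently
```

Calling a check like this before creating or querying the index turns a server-side dimension mismatch into an immediate, descriptive local error.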

Let's go ahead and download the dataset.

In [5]:
dataset = load_dataset('langchain-python-docs-text-embedding-ada-002')

In [32]:
# Work on a copy of the documents DataFrame
df = dataset.documents.copy()
# Use the same model that produced the dataset's embeddings (1536 dimensions)
model = "text-embedding-ada-002"


In [33]:
# Function to generate embeddings for a piece of text with the chosen model
def generate_embeddings(text):
    # Uses the openai_api_key loaded from Colab secrets above
    client = OpenAI(api_key=openai_api_key)
    response = client.embeddings.create(
        input=text,
        model=model
    )
    # Return the full response; the vector itself is response.data[0].embedding
    return response


In [None]:
x = generate_embeddings("how do i make chimichangas?")
x.data[0].embedding

In [35]:
print(df.values[:2])

[['417ede5d-39be-498f-b518-f47ed4e53b90'
  array([ 0.00594974,  0.01983248, -0.00384342, ..., -0.00060922,
         -0.01963556, -0.04669775])                               None
  None
  {'chunk': 0, 'text': '.rst\n.pdf\nWelcome to LangChain\n Contents \nGetting Started\nModules\nUse Cases\nReference Docs\nEcosystem\nAdditional Resources\nWelcome to LangChain#\nLangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model, but will also be:\nData-aware: connect a language model to other sources of data\nAgentic: allow a language model to interact with its environment\nThe LangChain framework is designed around these principles.\nThis is the Python specific portion of the documentation. For a purely conceptual guide to LangChain, see here. For the JavaScript documentation, see here.\nGetting Started#\nHow to get started using LangChain to create an Language Mod

In [None]:

# Optional: re-embed every chunk's text yourself instead of using the precomputed vectors.
# This makes one API call per chunk (6952 for the full dataset), so it is slow.
# At this point the chunk text still lives in the 'blob' column; it is renamed to 'metadata' below.
from tqdm.auto import tqdm

df['values'] = [
    generate_embeddings(blob['text']).data[0].embedding
    for blob in tqdm(df['blob'])
]

# Inspect the first 5 embeddings
print(df['values'].head())

In [11]:
# We drop the sparse_values and (empty) metadata columns since they are not needed in this demo
df.drop(['sparse_values', 'metadata'], axis=1, inplace=True)
# We rename the blob column, which holds the chunk text, to metadata
df.rename(columns={'blob': 'metadata'}, inplace=True)

df.head()

|   | id | values | metadata |
|---|----|--------|----------|
| 0 | 417ede5d-39be-498f-b518-f47ed4e53b90 | [0.005949743557721376, 0.01983247883617878, -0... | {'chunk': 0, 'text': '.rst .pdf Welcome to Lan... |
| 1 | 110f550d-110b-4378-b95e-141397fa21bc | [0.009401749819517136, 0.02443608082830906, 0.... | {'chunk': 1, 'text': 'Use Cases# Best practice... |
| 2 | d5f00f02-3295-4567-b297-5e3262dc2728 | [-0.005517194513231516, 0.0208403542637825, 0.... | {'chunk': 2, 'text': 'Gallery: A collection of... |
| 3 | 0b6fe3c6-1f0e-4608-a950-43231e46b08a | [-0.006499645300209522, 0.0011573900701478124,... | {'chunk': 0, 'text': 'Search Error Please acti... |
| 4 | 39d5f15f-b973-42c0-8c9b-a2df49b627dc | [-0.005658374633640051, 0.00817849114537239, 0... | {'chunk': 0, 'text': '.md .pdf Dependents Depe... |


Let's take a look at what sort of metadata we're working with in this dataset.

In [12]:
from pprint import pprint

print("Here are some example entries in our Knowledge Base:\n")
for r in df.iloc[0:1].to_dict(orient="records"):
    pprint(r['metadata'])

Here are some example entries in our Knowledge Base:

{'chunk': 0,
 'text': '.rst\n'
         '.pdf\n'
         'Welcome to LangChain\n'
         ' Contents \n'
         'Getting Started\n'
         'Modules\n'
         'Use Cases\n'
         'Reference Docs\n'
         'Ecosystem\n'
         'Additional Resources\n'
         'Welcome to LangChain#\n'
         'LangChain is a framework for developing applications powered by '
         'language models. We believe that the most powerful and '
         'differentiated applications will not only call out to a language '
         'model, but will also be:\n'
         'Data-aware: connect a language model to other sources of data\n'
         'Agentic: allow a language model to interact with its environment\n'
         'The LangChain framework is designed around these principles.\n'
         'This is the Python specific portion of the documentation. For a '
         'purely conceptual guide to LangChain, see here. For the JavaScript '
      

Our chunks and their embeddings are ready, so now we move on to indexing everything.

## Initializing the Pinecone client

Now that the data is ready, we can set up our index to store it.

We begin by instantiating the Pinecone client. To do this we need a [free API key](https://app.pinecone.io).

In [28]:
import os

if not os.environ.get("PINECONE_API_KEY"):
    from pinecone_notebooks.colab import Authenticate
    Authenticate()

In [21]:
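# Remove an index left over from a previous run (e.g. one built with a different dimension)
if pc.has_index(name=index_name):
    pc.delete_index(name=index_name)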

### Creating a Pinecone Index

When creating the index we need to define several configuration properties.

- `name` can be anything we like. The name is used as an identifier for the index when performing other operations such as `describe_index`, `delete_index`, and so on.
- `metric` specifies the similarity metric that will be used later when you make queries to the index.
- `dimension` should correspond to the dimension of the dense vectors produced by your embedding model. Here that is `1536`, the output size of the `text-embedding-ada-002` model used to build the dataset.
- `spec` holds a specification which tells Pinecone how you would like to deploy your index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

There are more configurations available, but this minimal set will get us started.
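To make `metric='cosine'` concrete: the score is the cosine of the angle between two vectors, i.e. their dot product divided by the product of their lengths, so `1` means they point in the same direction. A minimal pure-Python sketch of the formula (Pinecone computes this server-side; this is not its implementation):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # about 0.7071 (45 degrees apart)
```

OpenAI's embeddings FAQ states that its embeddings are normalized to length 1, so for them cosine similarity reduces to the plain dot product.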

In [24]:
from pinecone import ServerlessSpec

# check if index already exists (it shouldn't if this is first time)
if not pc.has_index(name=index_name):
    # if does not exist, create index
    pc.create_index(
        name=index_name,
        dimension=dimensions,  # dimensionality of text-embedding-ada-002
        metric='cosine',
        spec=ServerlessSpec(cloud='aws', region='us-east-1')

    )
    print("Created index ", index_name)
pc.describe_index(name=index_name)

{
    "name": "gpt-4-langchain-docs-fast",
    "dimension": 1536,
    "metric": "cosine",
    "host": "gpt-4-langchain-docs-fast-b78vvil.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "deletion_protection": "disabled"
}

## Storing data in the Index

First we need to instantiate an Index client that can interact with the index we just created.

In [25]:
# Instantiate an Index client
index = pc.Index(name=index_name)

# View index stats for the new index
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

We can see the index is currently empty with a `total_vector_count` of `0`. We can begin populating it with the precomputed `text-embedding-ada-002` embeddings like so:

In [21]:
index.upsert_from_dataframe(
    df=df,
    batch_size=100
)

{'upserted_count': 6952}
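`upsert_from_dataframe` sends the DataFrame to Pinecone in requests of `batch_size` rows. To make that arithmetic concrete, and to show the record shape that `index.upsert(vectors=...)` accepts, here is a minimal offline sketch; `to_upsert_payload` and `batched` are hypothetical helpers, not part of the Pinecone client, and the toy records stand in for `df.to_dict(orient="records")`.

```python
from typing import Any, Iterator


def to_upsert_payload(record: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields a Pinecone upsert needs: id, values and metadata."""
    return {
        "id": record["id"],
        "values": list(record["values"]),
        "metadata": record["metadata"],
    }


def batched(items: list[Any], batch_size: int) -> Iterator[list[Any]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Toy records standing in for df.to_dict(orient="records")
records = [
    {"id": f"doc-{i}", "values": [0.1, 0.2], "metadata": {"chunk": 0, "text": "example"}}
    for i in range(6952)
]
batches = list(batched([to_upsert_payload(r) for r in records], 100))
print(len(batches), len(batches[-1]))  # 70 52

# Against a live index, each batch could then be sent with:
# for batch in batches:
#     index.upsert(vectors=batch)
```

With 6952 rows and `batch_size=100` that is 70 requests, the last one carrying 52 vectors; larger batches mean fewer round trips at the cost of bigger request payloads.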

Now we've added all of our LangChain docs to the index. With that, we can move on to retrieval and then answer generation using GPT-4.

## Retrieval

To search through our documents we first need to create a query vector `xq`. Using `xq` we will retrieve the most relevant chunks from the LangChain docs. The query must be embedded with the same `text-embedding-ada-002` model that produced the indexed vectors; otherwise its dimension (and vector space) will not match the index. For this, you need an [OpenAI API key](https://platform.openai.com/).
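Conceptually, `index.query(vector=xq, top_k=k)` scores the stored vectors against `xq` with the index's metric and returns the `k` best matches. Pinecone performs this search server-side at scale, so the exhaustive version below only illustrates the idea; `brute_force_top_k` is a hypothetical helper and the 3-d vectors are made up.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_top_k(xq: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[float, str]]:
    """Score every stored vector against the query and keep the k best (score, id) pairs."""
    scored = ((cosine(xq, vec), vec_id) for vec_id, vec in vectors.items())
    return heapq.nlargest(k, scored)


# Toy 3-d "embeddings"; real text-embedding-ada-002 vectors have 1536 dimensions
vectors = {
    "chains": [0.9, 0.1, 0.0],
    "agents": [0.1, 0.9, 0.0],
    "prompts": [0.7, 0.3, 0.1],
}
top = brute_force_top_k([1.0, 0.0, 0.0], vectors, k=2)
print([vec_id for _, vec_id in top])  # ['chains', 'prompts']
```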

In [17]:
def create_embedding(query):
    from openai import OpenAI
    from google.colab import userdata

    # Get the OpenAI API key (from platform.openai.com) stored in Colab secrets
    openai_api_key = userdata.get('OPENAI_API_KEY')

    # Instantiate the OpenAI client
    client = OpenAI(api_key=openai_api_key)
    # Must be the same model that produced the indexed vectors (1536 dimensions)
    model = "text-embedding-ada-002"

    # Create an embedding for the query
    res = client.embeddings.create(
        model=model,
        input=[query],
    )
    return res.data[0].embedding

In [18]:
query = "how do I use the LLMChain in LangChain?"

# embed the query with the same model used to build the index
xq = create_embedding(query)

# retrieve the 5 most relevant chunks, including their metadata (the chunk text)
res = index.query(vector=xq, top_k=5, include_metadata=True)
res



With retrieval complete, we move on to feeding the retrieved chunks into GPT-4 to produce answers.

## Retrieval Augmented Generation

GPT-4 models (this notebook uses `gpt-4o-mini`) are accessed via OpenAI's Chat Completions endpoint (`client.chat.completions.create`).

To get a richer response from the LLM that includes context from our knowledge base, we need to retrieve context relevant to the query and then include it in the chat completion prompt.

In [31]:
def retrieval_augmented_prompt(query):
    context_limit = 3750
    xq = create_embedding(query)

    # Get relevant contexts
    query_results = index.query(vector=xq, top_k=3, include_metadata=True)
    contexts = [
        x.metadata['text'] for x in query_results.matches
    ]

    # Build our prompt with the retrieved contexts included
    prompt_start = (
        "Answer the question based on the context below.\n\n"+
        "Context:\n"
    )
    prompt_end = (
        f"\n\nQuestion: {query}\nAnswer:"
    )
    context_separator = "\n\n---\n\n"

    # Join contexts and trim to fit within limit
    combined_contexts = []
    total_length = 0

    for context in contexts:
        new_length = total_length + len(context) + len(context_separator)
        if new_length >= context_limit:
            break
        combined_contexts.append(context)
        total_length = new_length

    return prompt_start + context_separator.join(combined_contexts) + prompt_end

In [32]:
prompt = retrieval_augmented_prompt(query)
print(prompt)

Answer the question based on the context below.

Context:
for full documentation on:\n\nGetting started (installation, setting up the environment, simple examples)\n\nHow-To examples (demos, integrations, helper functions)\n\nReference (full API docs)\n\nResources (high-level explanation of core concepts)\n\n🚀 What can this help with?\n\nThere are six main areas that LangChain is designed to help with.\nThese are, in increasing order of complexity:\n\n📃 LLMs and Prompts:\n\nThis includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.\n\n🔗 Chains:\n\nChains go beyond a single LLM call and involve sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.\n\n📚 Data Augmented Generation:\n\nData Augmented Generation involves specific types o

Now we ask the question of the LLM using chat completion:

In [36]:
def chat_completion(prompt):
    from openai import OpenAI
    from google.colab import userdata

    # Get the OpenAI API key (from platform.openai.com) stored in Colab secrets
    openai_api_key = userdata.get('OPENAI_API_KEY')

    # Instantiate the OpenAI client
    client = OpenAI(api_key=openai_api_key)

    # Instructions: answer only from the supplied context, otherwise say "I don't know"
    sys_prompt = """You are a Q&A bot. A highly intelligent system that answers
    user questions based on the information provided by the user above
    each question. If the information cannot be found in the information
    provided by the user, you truthfully say "I don't know".
    """

    res = client.chat.completions.create(
        model='gpt-4o-mini-2024-07-18',
        messages=[
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": prompt}
        ],
        temperature=0
    )
    return res.choices[0].message.content.strip()

def rag(query):
    prompt = retrieval_augmented_prompt(query)
    return chat_completion(prompt)

In [47]:
question="how do i make chimichangas?"
answer = rag(question)
answer

"I don't know."

To display this response nicely, we render it as Markdown.

In [43]:
from IPython.display import Markdown, display

display(Markdown(answer))

- LangChain is an intuitive framework for developing applications driven by language models, such as OpenAI or Hugging Face.
- It is a Python library that provides out-of-the-box support for building NLP applications using LLMs.
- LangChain offers a standard interface for creating chains, enabling sequences of calls that go beyond a single LLM call.
- The framework simplifies the process of building advanced language model applications.
- It includes various modules to assist with different aspects of language model application development.
- LangChain is designed to help with six main areas: LLMs and Prompts, Chains, Data Augmented Generation, and Agents.
- It supports prompt management, prompt optimization, and common utilities for working with LLMs.
- LangChain facilitates integrations with other tools and provides end-to-end chains for common applications.

Let's compare this to a non-augmented query...

In [44]:
def non_augmented_prompt(query):
    return f"""
Question: {query}
Answer:
"""

answer2 = chat_completion(non_augmented_prompt(question))

display(Markdown(answer2))

I don't know.

If we drop the `"I don't know"` part of the `sys_prompt`, the LLM will try to pull an answer out of things it already knows. These may or may not be correct.

In [46]:
def hallucinating_chat_completion(prompt):
    from openai import OpenAI

    from google.colab import userdata
    # Get OpenAI api key from platform.openai.com
    openai_api_key = userdata.get('OPENAI_API_KEY')

    # Instantiate the OpenAI client
    client = OpenAI(api_key=openai_api_key)

    # Instructions (no "I don't know" fallback this time)
    sys_prompt = """You are a helpful Q&A bot."""

    res = client.chat.completions.create(
        model='gpt-4o-mini-2024-07-18',
        messages=[
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": prompt}
        ],
        temperature=0
    )
    return res.choices[0].message.content.strip()

answer3 = hallucinating_chat_completion(non_augmented_prompt(question))
display(Markdown(answer3))

- **Definition**: LangChain is a framework designed for developing applications powered by language models.

- **Modular Components**: It consists of various modules that can be combined to create complex applications, including:
  - **Prompt Templates**: Tools for creating and managing prompts for language models.
  - **Chains**: Sequences of calls to language models or other tools, allowing for multi-step workflows.
  - **Agents**: Components that can make decisions based on user input and dynamically choose actions.

- **Integration**: LangChain supports integration with various language models, including OpenAI's GPT, Hugging Face models, and others.

- **Data Handling**: It provides utilities for managing and processing data, including document loaders and vector stores for efficient retrieval.

- **Use Cases**: Common applications include chatbots, question-answering systems, content generation, and more.

- **Extensibility**: Developers can extend LangChain with custom components to fit specific needs.

- **Community and Resources**: It has an active community and offers extensive documentation, tutorials, and examples to help users get started.

- **Deployment**: LangChain can be used in various environments, from local development to cloud-based applications.

Then we see something even worse than `"I don't know"`: hallucinations. Clearly, augmenting our queries with additional context can make a huge difference to the performance of our system and helps ensure that trusted information is given priority when composing a response.

Great, we've seen how to augment GPT-4 with semantic search to answer LangChain-specific queries.

## Demo cleanup

Once you're finished, we delete the index to save resources.

In [None]:
pc.delete_index(name=index_name)

---