#### [LangChain Handbook](https://pinecone.io/learn/langchain)

# Retrieval Augmentation

Large Language Models (LLMs) have a data freshness problem. The most powerful LLMs in the world, like GPT-4, have no idea about recent world events.

LLMs are frozen in time. What they know is a static snapshot of the world as it appeared in their training data.

A solution to this problem is *retrieval augmentation*. The idea behind this is that we retrieve relevant information from an external knowledge base and give that information to our LLM. In this notebook we will learn how to do that.

[![Visit the notebook](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/full-link.svg)](https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/05-langchain-retrieval-augmentation.ipynb)


To begin, we must install the prerequisite libraries that we will be using in this notebook.

In [None]:
# !pip install -qU \
#   langchain==0.1.1 \
#   langchain-community==0.0.13 \
#   openai==0.27.7 \
#   tiktoken==0.4.0 \
#   pinecone-client==3.0.0 \
#   pinecone-datasets==0.7.0

#### Building the Knowledge Base

We will download a pre-embedded dataset from `pinecone-datasets`, which allows us to skip the embedding and preprocessing steps. If you'd rather work through those steps, you can find the [full notebook here](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-augmentation.ipynb).

In [None]:
import pinecone_datasets

dataset = pinecone_datasets.load_dataset('wikipedia-simple-text-embedding-ada-002-100K')
dataset.head()

In [None]:
len(dataset)

In [None]:
# Format the dataset for upsert and keep only a subset of it.
# Drop the existing `metadata` column, then rename `blob` to `metadata`
# so that the `blob` contents are used as the metadata of each record.
dataset.documents.drop(['metadata'], axis=1, inplace=True)
dataset.documents.rename(columns={'blob': 'metadata'}, inplace=True)
# Keep only the first 30,000 rows of the dataset.
dataset.documents.drop(dataset.documents.index[30_000:], inplace=True)
len(dataset)

In [None]:
# Next, we initialize Pinecone as our vector database.

#### Vector Database

In [None]:
# Creating a vector database requires a free API key from Pinecone.
# Choose whether to use a serverless or a pod-based index.
import os

use_serverless = os.environ.get("USE_SERVERLESS", "False").lower() == "true"

In [None]:
# Initialize the connection to Pinecone using an API key.
from pinecone import Pinecone

# Get an API key at app.pinecone.io.
api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'
environment = os.environ.get('PINECONE_ENVIRONMENT') or 'PINECONE_ENVIRONMENT'

# configure client
pc = Pinecone(api_key=api_key)
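
The `or 'PINECONE_API_KEY'` fallback above means that a missing environment variable silently becomes a placeholder string, so the mistake only surfaces later as an authentication error. A minimal sketch of a fail-fast alternative follows. The helper name `require_env` is hypothetical and is not part of the Pinecone client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    `require_env` is a hypothetical helper used for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, `api_key = require_env('PINECONE_API_KEY')` would stop the notebook at the cell where the key is actually missing.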

Next, we set up the index specification, which allows us to define the cloud provider and region where we want to deploy the index. You can find the [available providers and regions here](https://docs.pinecone.io/docs/projects).

In [None]:
from pinecone import ServerlessSpec, PodSpec

if use_serverless:
    spec = ServerlessSpec(cloud='aws', region='us-west-2')
else:
    spec = PodSpec(environment=environment)

In [None]:
index_name = 'langchain-retrieval-augmentation-fast'

In [None]:
import time

if index_name in pc.list_indexes().names():
    pc.delete_index(index_name)

# we create a new index
pc.create_index(
        index_name,
        dimension=1536,  # dimensionality of text-embedding-ada-002
        metric='dotproduct',
        spec=spec
    )

# wait for index to be initialized
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)
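
The wait loop above polls until the index reports ready, but it would spin forever if the index never became ready. A minimal sketch of the same polling pattern with a timeout follows. The helper `wait_until` and its parameters are hypothetical names chosen for this example.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool], timeout: float = 300.0, interval: float = 1.0) -> None:
    """Poll `is_ready` until it returns True, raising TimeoutError after `timeout` seconds.

    `wait_until` is a hypothetical helper used for illustration only.
    """
    deadline = time.monotonic() + timeout
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not ready after {timeout} seconds")
        time.sleep(interval)
```

In this notebook it could be called as `wait_until(lambda: pc.describe_index(index_name).status['ready'])`, reusing the same readiness check as the loop above.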

In [None]:
# connect to the index 
index = pc.Index(index_name)
# wait a moment for connection
time.sleep(1)

index.describe_index_stats()

In [None]:
# We should see that the new Pinecone index has a `total_vector_count` of `0`, as we haven't added any vectors yet.
# Now we upsert the data to Pinecone:


In [None]:
for batch in dataset.iter_documents(batch_size=100):
    index.upsert(batch)
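
Upserting in batches keeps each request small instead of sending all records at once. The sketch below shows the general batching pattern on toy records. It is an illustration of the idea, not the implementation of `iter_documents`, and the helper name `iter_batches` and the record values are made up.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Toy records standing in for dataset rows; the values are made up.
records = [{"id": str(i), "values": [0.0, 1.0]} for i in range(250)]
batch_sizes = [len(batch) for batch in iter_batches(records, batch_size=100)]
print(batch_sizes)  # [100, 100, 50]
```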

In [None]:
# We've now indexed everything. We can check the number of vectors in our index like so:
index.describe_index_stats()

### Creating a Vector Store to Query

In [None]:
# With the index built, we switch to LangChain and initialize a LangChain vector store.
# We also need a LangChain embedding object, which we initialize like so:
from langchain.embeddings.openai import OpenAIEmbeddings

# get openai api key from platform.openai.com
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or 'OPENAI_API_KEY'

model_name = 'text-embedding-ada-002'

embed = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

In [None]:
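# Initialize the vector store.
# Note: this import rebinds the name `Pinecone`, shadowing the client class imported
# earlier. The existing `pc` client object is unaffected.
from langchain.vectorstores import Pinecone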

text_field = "text"

# switch back to normal index for langchain
index = pc.Index(index_name)

vectorstore = Pinecone(
    index, embed.embed_query, text_field
)

In [None]:
# Query the vector store using `vectorstore.similarity_search`:
query = "who was Benito Mussolini?"

vectorstore.similarity_search(
    query,  # our search query
    k=3  # return 3 most relevant docs
)
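
Conceptually, a similarity search embeds the query and returns the `k` stored vectors that score highest against it under the index metric (`dotproduct` for the index created above). The sketch below shows that idea as a brute-force search over made-up three-dimensional vectors. It is a simplification for illustration and does not describe how Pinecone searches internally. The document ids and vectors are invented.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: List[float], vectors: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs with the highest dot-product score."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up toy vectors; real ada-002 embeddings have 1536 dimensions.
toy_vectors = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.8, 0.6, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
    "doc-d": [0.0, 0.0, 1.0],
}
query_vec = [0.6, 0.8, 0.0]
results = top_k(query_vec, toy_vectors, k=3)
print([doc_id for doc_id, _ in results])  # ['doc-b', 'doc-c', 'doc-a']
```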

##### Generative Question Answering

In [None]:
# Initialize the RetrievalQA object.
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# completion llm
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-3.5-turbo',
    temperature=0.0
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)
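
The `"stuff"` chain type places all retrieved documents into a single prompt alongside the question. The sketch below illustrates that idea with plain strings. The function name `build_stuff_prompt` and the prompt wording are assumptions made for this example and are not LangChain's actual template.

```python
from typing import List


def build_stuff_prompt(question: str, documents: List[str]) -> str:
    """Join all retrieved documents into one context block and append the question.

    The wording here is illustrative only, not LangChain's prompt.
    """
    context = "\n\n".join(documents)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up snippets standing in for retrieved documents.
toy_documents = ["Alpha is the first letter.", "Beta is the second letter."]
toy_prompt = build_stuff_prompt("What is alpha?", toy_documents)
print(toy_prompt)
```

Because every document goes into one prompt, this approach only works while the combined text fits within the model's context window.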

In [None]:
qa.run(query)

In [None]:
# We can also include the sources of information that the LLM uses to answer the question, using a slightly different version of `RetrievalQA` called `RetrievalQAWithSourcesChain`:
from langchain.chains import RetrievalQAWithSourcesChain

qa_with_sources = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

In [None]:
qa_with_sources(query)
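
Returning sources amounts to carrying the provenance of each retrieved chunk alongside the answer. A minimal sketch of that bookkeeping follows. It assumes each retrieved document is a plain dict with a `metadata["source"]` field, which is an assumption made for this example rather than a description of LangChain's implementation. The helper name and URLs are made up.

```python
from typing import Dict, List


def collect_sources(documents: List[Dict]) -> List[str]:
    """Return the unique `source` values of the documents, in first-seen order."""
    seen = set()
    sources = []
    for doc in documents:
        source = doc.get("metadata", {}).get("source")
        if source and source not in seen:
            seen.add(source)
            sources.append(source)
    return sources


# Made-up retrieved documents; two chunks share the same source page.
retrieved = [
    {"text": "chunk 1", "metadata": {"source": "https://example.com/page-a"}},
    {"text": "chunk 2", "metadata": {"source": "https://example.com/page-b"}},
    {"text": "chunk 3", "metadata": {"source": "https://example.com/page-a"}},
    {"text": "chunk 4", "metadata": {}},
]
print(collect_sources(retrieved))
# ['https://example.com/page-a', 'https://example.com/page-b']
```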

In [None]:
# Now we answer the question being asked *and* return the sources of the information used by the LLM.
# Once done, we can delete the index to save resources.
pc.delete_index(index_name)