# RAG Augmentation 
### Reference: https://towardsdatascience.com/retrieval-augmented-generation-rag-from-theory-to-langchain-implementation-4e9bd5f6a4f2

* Prerequisites

1. langchain for orchestration
2. openai for the embedding model and LLM
3. weaviate-client for the vector database
4. langchain-openai for the `ChatOpenAI` wrapper imported in the generation step


In [29]:
import os

## Step 1: Collect and load your data

In [19]:
import requests
import tempfile
from langchain.document_loaders import TextLoader

# URL to fetch the text content from
url = "https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/modules/state_of_the_union.txt"
res = requests.get(url)

text_content = res.text

# Use the tempfile module to create a temporary file
with tempfile.NamedTemporaryFile(mode="w", delete=False) as tmp_file:
    tmp_file.write(text_content)
    tmp_file_path = tmp_file.name

# Use this path with TextLoader
loader = TextLoader(tmp_file_path)
documents = loader.load()

/var/folders/97/r7d175650dg4fx0f_n5km01w0000gn/T/tmpto6xjula
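
Because the temporary file is created with `delete=False`, it stays on disk after the `with` block exits, which is what lets `TextLoader` open it by path afterwards. It also means nothing removes the file automatically. Below is a minimal sketch of the same write-then-read round trip with explicit cleanup, using only the standard library. The sample text, the `.txt` suffix, and the `encoding="utf-8"` argument are assumptions added for illustration, and a plain `open()` stands in for `TextLoader`.

```python
import os
import tempfile

# Hypothetical sample text standing in for res.text
text_content = "Sample text standing in for the downloaded speech."

# delete=False keeps the file on disk after the with-block closes it
with tempfile.NamedTemporaryFile(
    mode="w", suffix=".txt", encoding="utf-8", delete=False
) as tmp_file:
    tmp_file.write(text_content)
    tmp_file_path = tmp_file.name

try:
    # Plain open() stands in for TextLoader(tmp_file_path).load()
    with open(tmp_file_path, encoding="utf-8") as f:
        loaded_text = f.read()
finally:
    # Remove the temporary file once its content has been loaded
    os.remove(tmp_file_path)

print(loaded_text == text_content)
print(os.path.exists(tmp_file_path))
```

Setting the encoding explicitly avoids depending on the platform default when the downloaded text contains non-ASCII characters.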


## Step 2: Chunk your documents

In [22]:
from langchain.text_splitter import CharacterTextSplitter

# Split into chunks of about 500 characters, with a 50-character overlap to preserve continuity between chunks.
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
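
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch in plain Python. It is not LangChain's algorithm: `CharacterTextSplitter` first splits on a separator (by default a blank line) and then merges the pieces, so its chunks follow paragraph boundaries and can differ in length. The function name `split_with_overlap` and the digit-string sample are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return pieces


# Hypothetical 1200-character sample
sample = "".join(str(i % 10) for i in range(1200))
pieces = split_with_overlap(sample, chunk_size=500, chunk_overlap=50)

print([len(p) for p in pieces])         # [500, 500, 300]
print(pieces[0][-50:] == pieces[1][:50])  # True
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.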

## Step 3: Embed and store the chunks

In [34]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate
import weaviate
from weaviate.embedded import EmbeddedOptions

# Replace the placeholder with your own OpenAI API key; avoid committing a real key to version control.
os.environ["OPENAI_API_KEY"] = "API_KEY"

# OpenAI's embedding model generates the vector embeddings; an embedded Weaviate instance stores them.
client = weaviate.Client(embedded_options=EmbeddedOptions())

# By calling .from_documents() the vector database is automatically populated with the chunks.
vectorstore = Weaviate.from_documents(
    client=client, documents=chunks, embedding=OpenAIEmbeddings(), by_text=False
)

            Consider upgrading to the new and improved v4 client instead!
            See here for usage: https://weaviate.io/developers/weaviate/client-libraries/python
            


embedded weaviate is already listening on port 8079


{"level":"info","msg":"Created shard langchain_fedd420c736a4924973ff76ec4593068_qVrQM2bgZjAU in 2.469998ms","time":"2024-04-06T15:48:37-05:00"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-04-06T15:48:37-05:00","took":89818}
/Users/qilinzhou/anaconda3/lib/python3.11/site-packages/pydantic/main.py:1024: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.6/migration/
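
Conceptually, the step above turns each chunk into a vector and stores it so that chunks can later be ranked by similarity to a query. The sketch below shows that idea with the standard library only. Everything in it is a stand-in: `toy_embed` is a bag-of-words counter rather than an OpenAI embedding, `ToyVectorStore` is a hypothetical in-memory list rather than Weaviate, and the three sample sentences are invented rather than quoted from the speech.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs in a list."""

    def __init__(self) -> None:
        self._rows: list[tuple[Counter, str]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._rows.append((toy_embed(text), text))

    def similarity_search(self, query: str, k: int = 1) -> list[str]:
        query_vec = toy_embed(query)
        ranked = sorted(
            self._rows,
            key=lambda row: cosine_similarity(query_vec, row[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "Justice Breyer is retiring from the Supreme Court.",
    "We will lower the cost of prescription drugs.",
    "Infrastructure investment will rebuild roads and bridges.",
])
top = store.similarity_search("What was said about Justice Breyer", k=1)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, and a vector database uses an approximate index instead of a full sort, but the store-then-rank flow is the same.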


## Step 4: RAG

In [35]:
# Retrieve: Once the vector database is populated, you can define it as the retriever component,
# which fetches the additional context based on the semantic similarity between the user query and the embedded chunks.
retriever = vectorstore.as_retriever()

# Augment: Next, to augment the prompt with the additional context, you need to prepare a prompt template.
# The prompt can be easily customized from a prompt template, as shown below.

from langchain.prompts import ChatPromptTemplate

template = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

print(prompt)

input_variables=['context', 'question'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:\n"))]
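
The printed template shows two input variables, `context` and `question`. A minimal sketch of what filling them looks like, using plain `str.format` instead of `ChatPromptTemplate`; the helper `fill_prompt`, the placeholder chunk strings, and the choice to join chunks with blank lines are assumptions made for readability here.

```python
# Same wording as the notebook's template
template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""


def fill_prompt(question: str, context_chunks: list[str]) -> str:
    """Substitute both placeholders; chunks are joined with blank lines (an assumption)."""
    return template.format(question=question, context="\n\n".join(context_chunks))


filled = fill_prompt(
    "What did the president say about Justice Breyer",
    ["(retrieved chunk 1)", "(retrieved chunk 2)"],
)
print(filled)
```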


In [40]:
# Generate: Build a chain for the RAG pipeline, chaining together the retriever, the prompt template and the LLM.
# Once the RAG chain is defined, you can invoke it.
from langchain_openai import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

# temperature=0 makes decoding close to greedy: the model favors the most likely next token, so answers are largely repeatable.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

query = "What did the president say about Justice Breyer"
rag_chain.invoke(query)


"The president honored Justice Stephen Breyer for his service and dedication to the country. He mentioned nominating Judge Ketanji Brown Jackson to continue Justice Breyer's legacy of excellence. The president highlighted Judge Jackson's background and broad support since her nomination."