# Simple Retrieval-Augmented Generation (RAG) Example with LangChain, OpenAI and Weaviate

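A RAG pipeline typically needs four components: an LLM, an embedding model, a vector database, and an orchestration framework. This example uses OpenAI for the LLM and the embeddings, Weaviate as the vector database, and LangChain for orchestration.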

**Resources:**
- https://towardsdatascience.com/retrieval-augmented-generation-rag-from-theory-to-langchain-implementation-4e9bd5f6a4f2


In [3]:
import creds

OPENAI_API_KEY = creds.OPENAI_TOKEN


In [4]:
## Collect and Load Data

import requests
from langchain.document_loaders import TextLoader

url = "https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/modules/state_of_the_union.txt"
res = requests.get(url)
res.raise_for_status()  # fail early if the download did not succeed
with open("state_of_the_union.txt", "w") as f:
    f.write(res.text)

loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()

In [16]:
## Chunk the document

from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)

In [18]:
len(chunks)

90
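
To make `chunk_size` and `chunk_overlap` more concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a deliberate simplification and not how `CharacterTextSplitter` works internally (that class splits on a separator and then merges the pieces up to the size limit). The helper name `sliding_chunks` and the sample string are hypothetical, used only for illustration.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Hypothetical helper: cut text into windows of chunk_size characters,
    each sharing chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return pieces


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.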

In [22]:
## Embed and store the chunks

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate
import weaviate
from weaviate.embedded import EmbeddedOptions

client = weaviate.Client(
    embedded_options=EmbeddedOptions()
)

vectorstore = Weaviate.from_documents(
    client=client,
    documents=chunks,
    embedding=OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY),
    by_text=False
)



In [23]:
vectorstore

<langchain_community.vectorstores.weaviate.Weaviate at 0x7f871e49b1f0>

In [24]:
## Step 1: Retrieve
retriever = vectorstore.as_retriever()

In [25]:
retriever

VectorStoreRetriever(tags=['Weaviate', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.weaviate.Weaviate object at 0x7f871e49b1f0>)
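
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are most similar to it. The sketch below illustrates that idea with an exhaustive cosine-similarity search over hand-written toy vectors. It is not Weaviate's implementation (a vector database would typically use an approximate index instead of scanning every vector), and `cosine_similarity`, `top_k`, `toy_store`, and the vector values are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float],
          store: Dict[str, List[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings": made-up 3-dimensional vectors, not real model output.
toy_store = {
    "inflation and prices": [0.9, 0.1, 0.0],
    "foreign policy": [0.0, 0.2, 0.9],
    "lowering costs": [0.8, 0.3, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

results = top_k(toy_query, toy_store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

With these toy vectors, the two economy-related entries rank above "foreign policy" because their vectors point in nearly the same direction as the query.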

In [41]:
## Step 2: Augment
from langchain.prompts import ChatPromptTemplate

template = """Try to respond in one sentence.
Question: {question}
Context: {context}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

print(prompt)

input_variables=['context', 'question'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='Try to respond in one sentence.\nQuestion: {question}\nContext: {context}\nAnswer:\n'))]
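
The "augment" step boils down to string substitution: the retrieved text and the user's question are placed into the two placeholders. The sketch below shows this with plain `str.format` on a template of the same shape. The names `toy_template` and `build_prompt` and the sample strings are hypothetical, and joining the chunks with blank lines is an assumption made here for readability, not necessarily how the chain formats retrieved documents.

```python
from typing import List

# A template with the same two placeholders as the one above (hypothetical copy).
toy_template = """Try to respond in one sentence.
Question: {question}
Context: {context}
Answer:
"""


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Fill the placeholders with the question and the retrieved chunks."""
    context = "\n\n".join(context_chunks)  # assumption: chunks separated by blank lines
    return toy_template.format(question=question, context=context)


filled = build_prompt(
    "What is RAG?",
    ["RAG retrieves documents.", "Then an LLM answers."],
)
print(filled)
```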


In [43]:
## Step 3: Generate
from langchain.chat_models import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key=OPENAI_API_KEY)

rag_chain = (
    {"context": retriever,  "question": RunnablePassthrough()} 
    | prompt 
    | llm
    | StrOutputParser() 
)

query = "What did the president say about fighting inflation?"
rag_chain.invoke(query)



'The president said that his plan to fight inflation will lower costs and lower the deficit.'
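
To see how data flows through the chain, it can help to write the same steps as ordinary function calls: the question goes to the retriever, the question and retrieved context go into the prompt, and the prompt text goes to the model. The sketch below uses stand-ins only, so it runs without API keys. Every name in it (`toy_corpus`, `toy_retrieve`, `toy_prompt`, `toy_llm`, `toy_rag_chain`) is hypothetical, the keyword-overlap retriever is a placeholder for vector search, and the "model" just returns a fixed-format string.

```python
from typing import Dict, List, Set

toy_corpus = ["The plan will lower costs.", "The weather is sunny."]


def tokens(text: str) -> Set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def toy_retrieve(question: str) -> List[str]:
    """Stand-in retriever: keep documents sharing at least one word with the question."""
    return [doc for doc in toy_corpus if tokens(doc) & tokens(question)]


def toy_prompt(inputs: Dict[str, object]) -> str:
    """Stand-in prompt step: place the question and the context into one string."""
    context = "\n".join(inputs["context"])
    return f"Question: {inputs['question']}\nContext: {context}\nAnswer:\n"


def toy_llm(prompt_text: str) -> str:
    """Stand-in model: returns a canned string instead of calling an API."""
    return f"(stub) received a prompt of {len(prompt_text)} characters"


def toy_rag_chain(question: str) -> str:
    # Mirrors the dict step: the same question feeds both branches.
    inputs = {"context": toy_retrieve(question), "question": question}
    return toy_llm(toy_prompt(inputs))


print(toy_rag_chain("How will costs change?"))
```

Swapping any stand-in for a real component leaves the flow unchanged, which is the main appeal of composing the chain from small steps.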