# Basics of RAG from Scratch

Install and import all requirements

In [1]:
! pip install openai langchain langchain-openai langchain-community faiss-cpu






In [7]:
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS  # current import path; langchain.vectorstores is deprecated
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

Load the data you want your LLM to be aware of when answering inquiries

In [8]:
def load_str(path: str) -> str:
    """
    Simple file reader that loads data into string format
    """
    with open(path, "r", encoding="utf-8") as f:
        docs = f.read()
    return docs

# loads the file in as a str
data = load_str("what_is_rag.txt") # (optional) print to see if you get the right data

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, # max length of each chunk, measured with len() -> roughly a 1000-character str
    chunk_overlap=200, # allowed overlap, so chunk 1 could be 1-1000 and chunk 2 could be 800-1800 -> context carries over between chunks
    separators=[":"] # the characters we try to split on
)

# splits text up into chunks
docs = text_splitter.split_text(data) # (optional) print and see
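
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping chunks in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class splits on the separators first and then merges the pieces), and `chunk_with_overlap` is a hypothetical helper written only for this illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: fixed-size character windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Toy input: 2500 characters of repeating digits (made up for the example)
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_with_overlap(sample, chunk_size=1000, chunk_overlap=200)

print([len(c) for c in chunks])
print(chunks[0][-200:] == chunks[1][:200])  # the tail of chunk 1 is the head of chunk 2
```

With these settings each window starts 800 characters after the previous one, so neighbouring chunks share 200 characters of context.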

In [None]:
# Get your OpenAI key
openai_key = os.getenv("OPENAI_API_KEY")

# Instantiate a client that answers the questions
client = ChatOpenAI(openai_api_key=openai_key, model="gpt-4o-mini", temperature=0.7)

# Instantiate a client that will turn text into vector embeddings
embeddings = OpenAIEmbeddings(api_key=openai_key)

# Vector store: embeds each chunk and indexes the resulting vectors so similar text can be looked up quickly
vector_store = FAISS.from_texts(docs, embedding=embeddings)

# The retrieval chain -> combines the chat model with a retriever built on the vector store
retrieval = RetrievalQA.from_chain_type(
    llm=client,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 3}),  # return the 3 most similar chunks
    return_source_documents=True
)
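
The chain above hides two steps: finding the `k` chunks closest to the question, then "stuffing" them into a single prompt. The sketch below imitates both with plain Python and hand-made 3-dimensional vectors. Everything in it is an assumption for illustration: the toy vectors, the prompt wording, and the helper names `cosine_similarity`, `top_k`, and `stuff_prompt` are hypothetical, and the real vector store may rank by a different metric (for example Euclidean distance) and uses real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int) -> list[int]:
    """Indices of the k chunk vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate all retrieved chunks into one prompt (the idea behind chain_type="stuff")."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy data: made-up chunks and made-up embeddings, not real model output
toy_chunks = [
    "RAG retrieves documents.",
    "Bananas are yellow.",
    "Vector stores index embeddings.",
]
toy_vectors = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.7, 0.6, 0.1]]
query_vector = [1.0, 0.2, 0.0]

best = top_k(query_vector, toy_vectors, k=2)
prompt = stuff_prompt("What does RAG do?", [toy_chunks[i] for i in best])
print(best)
print(prompt)
```

The unrelated chunk never reaches the prompt, which is the point of retrieval: the model only sees the text most likely to help with the question.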

In [20]:
question = "How would I use RAG to implement my own chat bot?"

# Ask the question
response = retrieval.invoke(question)

print(response["result"] + "\n")
for i, doc in enumerate(response["source_documents"], 1):
    snippet = doc.page_content.strip().replace("\n", " ")
    print(f"Chunk {i}: {snippet[:200]}…")  # show only the first 200 characters of each chunk

To implement your own chatbot using Retrieval-Augmented Generation (RAG), you can follow these typical workflow steps:

1. **Indexing Phase:**
   - **Source Documents:** Collect domain-specific documents or FAQs that you want your chatbot to reference.
   - **Chunking:** Split these documents into manageable chunks, such as sentences or paragraphs.
   - **Embedding Generation:** Use an embedding model to transform each chunk into high-dimensional vectors.
   - **Storage:** Store these embeddings in a vector store for efficient retrieval.

2. **Query Phase:**
   - **User Query:** When a user sends a message to the chatbot, generate an embedding for this query.
   - **Retrieval:** Use techniques like k-NN search to find the top-k closest document chunks from your vector store based on similarity to the query embedding.
   - **Prompt Construction:** Create a prompt that includes the user query and the retrieved snippets. Ensure the most relevant snippets are placed first and separate cont