# Basic RAG Implementation - Step by Step

## Set environment

**Run requirements**



In [None]:
! pip install --upgrade --quiet langchain langchain_community langchain-openai langchainhub faiss-cpu

**Dev requirements**

In [None]:
! pip install --upgrade --quiet tiktoken

**Environment Variables**

The example below requires an OpenAI API key, so please create an account and a key if you don't have one yet.

- [OpenAI docs](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)

- [Video](https://www.youtube.com/watch?v=aVog4J6nIAU&pp=ygUOb3BlbmFpIGFwaSBrZXk%3D)


The LangChain API key is optional, but it lets you monitor each step of the process with LangSmith.

- [LangChain docs](https://docs.smith.langchain.com/setup#create-an-api-key)

In [None]:
from getpass import getpass
import os

OPENAI_API_KEY = getpass('Please enter the secret value for OpenAI Key: ')
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

# Optional: press Enter without typing anything if you don't have a LangChain API key
LANGCHAIN_API_KEY = getpass('Please enter the secret value for LangChain Key (optional): ')
if LANGCHAIN_API_KEY:
    # Enable LangSmith tracing only when a key was provided
    os.environ['LANGCHAIN_API_KEY'] = LANGCHAIN_API_KEY
    os.environ['LANGCHAIN_TRACING_V2'] = 'true'
    os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'


## Star ExpertBot - Simple RAG Example

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


# ====================
#      Indexing
# ====================

## Load knowledgebase
loader = TextLoader("./stardust_serenade.txt")
knowledgebase = loader.load()

## Split knowledgebase in chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100)
chunks = text_splitter.split_documents(knowledgebase)

## Embed knowledgebase
embedder = OpenAIEmbeddings() # Default: model=text-embedding-ada-002
store = LocalFileStore("/home/cache/")

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    embedder, store, namespace=embedder.model
)
db_vector = FAISS.from_documents(chunks, cached_embedder)


# ====================
#      Retrieval
# ====================

retriever = db_vector.as_retriever()


# =============================
#     Retrieval and Generation
# =============================

retrieval_chain = (
    (lambda x: x["input"])
    | retriever
    | (lambda docs: "\n\n".join(doc.page_content for doc in docs))
    )


template = """Answer the question based only on the following context:
{context}

Question: {input}
"""
prompt = ChatPromptTemplate.from_template(template)
chat_model = ChatOpenAI() # Default: model=gpt-3.5-turbo, temperature=0.7


simple_chain = prompt | chat_model | StrOutputParser()


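# Keep the original {"input": ...} dict and add the retrieved "context" to it,
# so the prompt receives the question string rather than the whole input dict.
rag_chain = (
    RunnablePassthrough.assign(context=retrieval_chain)
    | simple_chain
)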

In [None]:
# ====================
#        Invoke
# ====================
question = (
    "What were the names of the three stars in the story, "
    "and what were their unique characteristics?"
)

result = rag_chain.invoke({"input": question})
print(result)

**Monitoring**

Open LangSmith to see the workflow steps -> [LangSmith](https://smith.langchain.com/) 🦜

### Step by Step

#### **1. Load knowledgebase**


LangChain can load documents from many sources, such as Markdown, CSV, and plain-text files, URLs, GitHub, Google Drive, images, Datadog, WhatsApp, etc. In this case, we are going to use a plain-text file.

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document
from typing import List


def load_documents_from_plain_text(knowledgebase_file_path: str) -> List[Document]:
    """Load the knowledgebase from a plain-text file."""

    loader = TextLoader(knowledgebase_file_path)
    documents = loader.load()

    return documents

knowledgebase = load_documents_from_plain_text("./stardust_serenade.txt")
print(f"{knowledgebase = }\n{len(knowledgebase) = }")
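To make the loading step less of a black box, here is a minimal plain-Python sketch of what a text loader does: read the file and wrap its content, together with the file path, in a document object. `SimpleDocument` and `load_plain_text` are hypothetical names used only for illustration; they are not part of LangChain, and the real `TextLoader` may handle more cases (for example, encodings).

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document: the text plus some metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_plain_text(path: str, encoding: str = "utf-8") -> list[SimpleDocument]:
    """Read the whole file into a single document and record its source."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Demo with a throwaway file, so the sketch runs without the story file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = str(Path(tmp_dir) / "demo_story.txt")
    Path(demo_path).write_text("Azure, Sunny, and Rosette were three stars.", encoding="utf-8")
    demo_docs = load_plain_text(demo_path)

print(f"{len(demo_docs) = }")
print(f"{demo_docs[0].page_content = }")
```

As in the cell above, a single text file produces a list with a single document.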


**Source:** [Document Loaders](https://python.langchain.com/docs/integrations/document_loaders/)

#### **2. Format knowledgebase**

LangChain has different text splitters, such as `CharacterTextSplitter`, `MarkdownHeaderTextSplitter`, and `RecursiveCharacterTextSplitter`.

The most important parameters when configuring a splitter are:

- **chunk_size:** The maximum size of each chunk, measured in characters or tokens depending on the splitter type. Each chunk must fit within the context window supported by the embedding model.

- **chunk_overlap:** The amount of text, in the same unit as `chunk_size`, that consecutive chunks share when the splitter has to cut a paragraph.
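
To get a feel for how these two parameters interact, here is a simplified sketch of a fixed-size character splitter. It is not the algorithm used by `RecursiveCharacterTextSplitter` (which prefers to cut at separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical helper written only for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"
sample_chunks = split_with_overlap(sample_text, chunk_size=10, chunk_overlap=3)
print(sample_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, which is the shared text that `chunk_overlap` controls.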

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def format_knowledgebase(knowledgebase: List[Document]) -> List[Document]:
    """Split the knowledgebase into chunks."""

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    formatted_documents = text_splitter.split_documents(knowledgebase)

    return formatted_documents

chunks = format_knowledgebase(knowledgebase)
print(f"Number of chunks: {len(chunks)}")
print(f"First chunk: {chunks[0]}\n")

for n, chunk in enumerate(chunks):
    print(f"Chunk {1 + n}:")
    content = chunk.page_content
    print(f"Number of characters: {len(content)}")

    # The first chunk has no previous chunk to overlap with
    if n > 0:
        # Check whether the tail of the previous chunk reappears in this one
        overlap = chunks[n - 1].page_content[-30:]
        if overlap in content:
            print("There is an overlap")
    print("\n")


**Source:** [Text Splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/)

#### **3. Embedding**

**Count tokens**

Requests to the OpenAI embedding models are billed based on the number of tokens in the input.

The following example uses the `cl100k_base` encoding to count tokens because it is the encoding used by second-generation embedding models like `text-embedding-ada-002`.

In [None]:
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

mini_knowledgebase = ["Once upon a time, in the vast cosmic expanse, there existed three remarkable stars: Azure, Sunny, and Rosette.",
                      "On the other side, they discovered a realm of Rainbow Stars—each one a fusion of their colors."]
num_tokens = num_tokens_from_string(mini_knowledgebase[0], "cl100k_base")
print(f"{num_tokens = }")
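
Since billing depends on the token count, a rough cost estimate is just a multiplication. The sketch below assumes a hypothetical helper `estimate_embedding_cost`; the price and the token counts are placeholders chosen for the example, not real values, so check the current OpenAI pricing page before relying on any number.

```python
def estimate_embedding_cost(token_counts: list[int], usd_per_1k_tokens: float) -> float:
    """Estimate the cost of embedding texts from their token counts."""
    total_tokens = sum(token_counts)
    return total_tokens / 1000 * usd_per_1k_tokens


# Placeholder values for illustration only (NOT a real price or real counts).
PLACEHOLDER_USD_PER_1K_TOKENS = 0.01
example_token_counts = [250, 240, 260]

estimated = estimate_embedding_cost(example_token_counts, PLACEHOLDER_USD_PER_1K_TOKENS)
print(f"Total tokens: {sum(example_token_counts)}")
print(f"Estimated cost (USD): {estimated:.4f}")
```

In practice, the token counts would come from a function like `num_tokens_from_string` applied to each chunk.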


**Embed Text**

LangChain supports different embedding models. Here we use the `text-embedding-ada-002` model from OpenAI.

In [None]:
from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings() # Default: text-embedding-ada-002

In [None]:
mini_knowledgebase_vector = embedder.embed_documents(mini_knowledgebase)

print(f"{mini_knowledgebase_vector = }")
print(f"{len(mini_knowledgebase_vector) = }")
print(f"Vector's dimension: {len(mini_knowledgebase_vector[0])}")

**Cache Backed Embeddings**

Caching the embeddings avoids recomputing them every time the notebook runs, which speeds up design, development, and deployment. If the embedding model isn't free, it also saves money.

In [None]:
from langchain.storage import LocalFileStore
from langchain.embeddings import CacheBackedEmbeddings

store = LocalFileStore("./cache/")

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    embedder, store, namespace=embedder.model
)
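
Conceptually, a cache-backed embedder looks up each text before calling the model and only embeds the texts it has not seen yet. The following toy sketch shows that idea with an in-memory dictionary and a fake embedding function; `TinyCachedEmbedder` and `fake_embed` are hypothetical, and the way the real `CacheBackedEmbeddings` builds its keys and stores bytes is an implementation detail not reproduced here.

```python
import hashlib
from typing import Callable


class TinyCachedEmbedder:
    """Toy cache wrapper: each distinct text is embedded only once per namespace."""

    def __init__(self, embed_fn: Callable[[str], list[float]], namespace: str):
        self.embed_fn = embed_fn
        self.namespace = namespace  # keeps vectors from different models apart
        self.store: dict[str, list[float]] = {}  # in-memory stand-in for a file store

    def _key(self, text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{self.namespace}:{digest}"

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            key = self._key(text)
            if key not in self.store:  # cache miss: call the (expensive) model
                self.store[key] = self.embed_fn(text)
            vectors.append(self.store[key])
        return vectors


calls = []  # records every text that reaches the fake model


def fake_embed(text: str) -> list[float]:
    """Fake embedding model: returns two simple numeric features of the text."""
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


cached = TinyCachedEmbedder(fake_embed, namespace="fake-model")
first = cached.embed_documents(["three stars", "rainbow"])
second = cached.embed_documents(["three stars", "rainbow"])
print(f"{len(calls) = }")  # the fake model was only called for the first batch
```

The namespace plays the same role as `namespace=embedder.model` in the cell above: the same text embedded with two different models should not share a cache entry.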

**Store Embeddings**

Instead of calling the embedder's `embed_documents` method directly, we let the vector store generate the embeddings when it is created.

Let's see how to store the embeddings using a cache.

In [None]:
from langchain_community.vectorstores import FAISS

# If you are not using a cache, pass `embedder` instead of `cached_embedder`.
# Note: `%time` only measures the statement written on the same line.
%time vector_store = FAISS.from_documents(chunks, cached_embedder)

Now you can see the embedding files in the cache directory given to `LocalFileStore` (`./cache/` in this section).

If we create the vector store again, it takes the embeddings from the cache directory instead of calling the embedding model, which makes the storing process faster.

In [None]:
%time vector_store_2 = FAISS.from_documents(chunks, cached_embedder)

**Sources:**
- [What are tokens and how to count them?](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them#h_cd01d4bb9a)
- [Tokenizer tool](https://platform.openai.com/tokenizer)
- [Embedding Models](https://python.langchain.com/docs/integrations/text_embedding/)
- [Cache Backed Embeddings](https://python.langchain.com/docs/modules/data_connection/text_embedding/caching_embeddings)

#### **4. Retrieve the context**

To get the relevant context, we need to create a retriever. Let's first see how this process works manually, and then how it works using the retriever.

**Calculate similarity manually**

Cosine similarity is the recommended measure for finding similar vectors. For OpenAI embeddings, a value of 1 indicates identical vectors, and lower values indicate less similar ones.
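
Before using real embeddings, it can help to check the measure on tiny hand-made vectors. The sketch below uses only the standard library; the function name `cosine` and the two-dimensional vectors are illustrative assumptions, not values taken from the story or the model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_direction = cosine([1.0, 2.0], [2.0, 4.0])   # same direction, different length
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])       # nothing in common
opposite = cosine([1.0, 2.0], [-1.0, -2.0])       # pointing the opposite way

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Vectors pointing the same way score close to 1, unrelated ones close to 0, and opposite ones close to -1, regardless of their length.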

Let's find the similarity between a question and two phrases manually.

- **Question:** What were the names of the three stars in the story?
- **Phrase 1:** Once upon a time, in the vast cosmic expanse, there existed three remarkable stars: Azure, Sunny, and Rosette.
- **Phrase 2:** On the other side, they discovered a realm of Rainbow Stars—each one a fusion of their colors.

First, we need to create the embedding of the question.

In [None]:
question = "What were the names of the three stars in the story?"
question_vector = embedder.embed_query(question)

Now we can compute the cosine similarity and find the most similar phrase.



In [None]:
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

most_similar = ""
max_similarity = float("-inf")  # Cosine similarity can be negative, so don't start at 0

for i, (phrase, vector) in enumerate(zip(mini_knowledgebase, mini_knowledgebase_vector)):

    similarity = cosine_similarity(question_vector, vector)

    tag = f"phrase_{i + 1}"
    print(f"{tag} similarity: {similarity}")

    # Keep the phrase with the highest similarity seen so far
    if similarity > max_similarity:
        most_similar = phrase
        max_similarity = similarity

print(f"\n{most_similar = }")

**Using a Retriever**

The code below uses a retriever to find the context needed to answer the question (the two most similar chunks).

LangChain supports many retriever types. In this case, we'll use the vector store itself as the retriever (vector store-backed retriever).

A vector store-backed retriever supports different types of search. In the example below, we'll use similarity search and specify the maximum number of chunks it must retrieve.

In [None]:
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

In [None]:
# `invoke` is the standard Runnable entry point; it replaces the older `get_relevant_documents` method
context = retriever.invoke(question)

print(f"{context = }")
print(f"{len(context) = }")
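
Conceptually, a similarity search with `k` scores every stored chunk against the question vector and keeps the `k` best ones. Here is a brute-force sketch of that idea in plain Python. The names `top_k` and `toy_index`, as well as the three-dimensional vectors, are made up for illustration; FAISS uses its own optimized index structures, so this shows the idea rather than what the library does internally.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vector: list[float], indexed_chunks: list[tuple[str, list[float]]], k: int = 2):
    """Score every chunk against the query and return the k most similar ones."""
    scored = [(cosine(query_vector, vector), text) for text, vector in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return scored[:k]


# Toy "vector store": (chunk text, made-up embedding)
toy_index = [
    ("chunk about three stars", [0.9, 0.1, 0.0]),
    ("chunk about rainbow stars", [0.6, 0.4, 0.1]),
    ("chunk about a comet", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.0, 0.0]

best = top_k(toy_query, toy_index, k=2)
for score, text in best:
    print(f"{score:.3f}  {text}")
```

With `k=2`, the chunk about the comet is dropped because its vector points in a very different direction from the query.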

**Sources:**
- [Cosine similarity](https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions)
- [Retrievers Types](https://python.langchain.com/docs/modules/data_connection/retrievers/)
- [Vector store-backed retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore)

#### **5. Generate the Answer**

Now that we have the context, we need an LLM to answer the question based on it. But let's first see how the LLM responds without the context.

**Simple Chain**

We'll use LCEL (LangChain Expression Language) to build the chain. A simple chain is composed of:

- A prompt template: The instruction we'll give to the LLM.
- An LLM: The model we'll use to answer the question. In this case, it's `gpt-3.5-turbo`, accessed through `ChatOpenAI`.
- An output parser: The parser we'll use to extract the answer from the LLM response and format it.

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI


prompt = ChatPromptTemplate.from_template(
    "You are an expert in children’s stories, please answer the following question."
    "\nQuestion: {input}"
    )
llm = ChatOpenAI()  # Default: Model=gpt-3.5-turbo Temperature=0.7
output_parser = StrOutputParser()

simple_chain = prompt | llm | output_parser

In [None]:
# Test
question = "What were the names of the three stars in the story, and what were their unique characteristics?"
answer = simple_chain.invoke({"input": question})
print(f"{answer =}")

A chain works by passing the output of each element as the input of the next one, so we need to make sure that consecutive inputs and outputs are compatible.
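
To see why compatibility matters, here is a toy version of the `|` composition in plain Python. The `Step` class and the three toy steps are hypothetical and only mimic the idea; LangChain's runnables do much more (batching, streaming, tracing), so read this as a sketch rather than as how LCEL is implemented.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` to build a pipeline."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # The output of this step becomes the input of the next one
        return Step(lambda value: other.invoke(self.invoke(value)))


# dict -> str (like a prompt template)
toy_prompt = Step(lambda inputs: f"Question: {inputs['input']}")
# str -> dict (like a model returning a message object)
toy_model = Step(lambda prompt_text: {"content": prompt_text.upper()})
# dict -> str (like an output parser)
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
toy_answer = toy_chain.invoke({"input": "who is Azure?"})
print(toy_answer)
```

If the steps were reordered, for example `toy_parser | toy_prompt`, the chain would fail at run time because the parser expects a dict with a `content` key, which is the kind of mismatch the schemas help to spot.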

The example below shows how to see the input and output schemas.

In [None]:
prompt.output_schema.schema()["anyOf"]

In [None]:
llm.input_schema.schema()["anyOf"]

In [None]:
llm.output_schema.schema()["anyOf"]

In [None]:
output_parser.input_schema.schema()["anyOf"]

In [None]:
output_parser.output_schema.schema()["type"]

**RAG Chain**

A RAG chain is composed of a simple chain and a retrieval chain.

In this example, the retrieval chain is composed of:
- An input parser function
- A retriever
- A context formatter

In [None]:
retrieval_chain = (
    (lambda x: x["input"])
    | retriever
    | (lambda docs: "\n\n".join(doc.page_content for doc in docs))
    )
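
The data flow of the retrieval chain can be followed step by step with plain functions. In the sketch below, `stub_retriever` is a hypothetical stand-in that returns fixed, made-up chunks instead of querying a vector store, and `build_prompt_inputs` is an illustrative helper; the goal is only to show what the prompt template receives once the three steps have run.

```python
def stub_retriever(query: str) -> list[str]:
    """Hypothetical retriever: returns fixed, made-up chunks regardless of the query."""
    return ["Azure was a blue star.", "Sunny was a golden star."]


def build_prompt_inputs(inputs: dict) -> dict:
    question = inputs["input"]            # 1. input parser: extract the question
    docs = stub_retriever(question)       # 2. retriever: fetch the relevant chunks
    context = "\n\n".join(docs)           # 3. context formatter: join chunks into one string
    return {"context": context, "input": question}


TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n\n"
    "Question: {input}\n"
)

prompt_inputs = build_prompt_inputs({"input": "Which stars are in the story?"})
filled_prompt = TEMPLATE.format(**prompt_inputs)
print(filled_prompt)
```

Note that the template needs both `context` and `input` as plain strings, which is what the chain has to provide to the prompt.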

In [None]:
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:
{context}

Question: {input}
"""

prompt = ChatPromptTemplate.from_template(template)
chat_model = ChatOpenAI()


simple_chain = prompt | chat_model | StrOutputParser()


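# Keep the original {"input": ...} dict and add the retrieved "context" to it,
# so the prompt receives the question string rather than the whole input dict.
rag_chain = (
    RunnablePassthrough.assign(context=retrieval_chain)
    | simple_chain
)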

In [None]:
answer = rag_chain.invoke({"input": question})
print(f"{answer =}")

**Prebuilt Chains**

LangChain has some prebuilt prompts and chains.

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain_openai import ChatOpenAI
from langchain import hub


prompt = hub.pull("langchain-ai/retrieval-qa-chat")
chat_model = ChatOpenAI()

document_chain = create_stuff_documents_chain(chat_model, prompt)

rag_chain = create_retrieval_chain(retriever, document_chain)

In [None]:
answer = rag_chain.invoke({"input": question})["answer"]
print(f"{answer =}")

**Sources**:

- [LangChain Expression Language](https://python.langchain.com/docs/expression_language/get_started)
- [Prompt](https://python.langchain.com/docs/modules/model_io/prompts/quick_start)
- [Chat Model](https://python.langchain.com/docs/modules/model_io/chat/quick_start)
- [Output Parser](https://python.langchain.com/docs/modules/model_io/output_parsers/quick_start)
- [Chains](https://python.langchain.com/docs/modules/chains)