# Basic RAG Pipeline

In this notebook we will build a basic RAG pipeline with LlamaIndex. It has the following two sections:

1. Understanding Retrieval Augmented Generation (RAG).
2. Building basic RAG with LlamaIndex.

**Retrieval Augmented Generation (RAG)**

LLMs are trained on vast datasets, but these will not include your specific data. Retrieval-Augmented Generation (RAG) addresses this by dynamically incorporating your data during the generation process. This is done not by altering the training data of LLMs, but by letting the model access and use your data at query time to provide more tailored and contextually relevant responses.

In RAG, your data is loaded and prepared for queries, or “indexed”. User queries act on the index, which filters your data down to the most relevant context. This context and your query then go to the LLM along with a prompt, and the LLM provides a response.
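The flow described above can be sketched in plain Python. This is only a toy illustration, not how LlamaIndex works internally: word overlap stands in for embedding similarity, the function names (`build_index`, `retrieve`, `build_prompt`) and the sample chunks are made up for this example, and the final call to an LLM is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """'Index' each chunk as a bag of words (a crude stand-in for an embedding)."""
    return [(chunk, Counter(tokenize(chunk))) for chunk in chunks]


def retrieve(index, query, top_k=2):
    """Return the top_k chunks sharing the most word occurrences with the query."""
    query_counts = Counter(tokenize(query))
    scored = [
        (sum((query_counts & counts).values()), chunk)
        for chunk, counts in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(context_chunks, query):
    """Combine the retrieved context and the user query into one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )


# Made-up sample data, used only to exercise the sketch
chunks = [
    "The warranty covers battery defects for two years.",
    "Shipping usually takes three to five business days.",
    "Returns are accepted within thirty days of delivery.",
]
toy_index = build_index(chunks)
question = "How long is the battery warranty?"
context = retrieve(toy_index, question, top_k=1)
prompt = build_prompt(context, question)
print(prompt)  # in a real pipeline this string would be sent to the LLM
```

Only the chunk about the warranty shares words with the question, so it is the one placed into the prompt.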

Even if what you’re building is a chatbot or an agent, you’ll want to know RAG techniques for getting data into your application.

![RAG Overview](../data/llamaindex_rag_overview.png)

**Stages within RAG**

There are five key stages within RAG, which in turn will be a part of any larger application you build. These are:

**Loading:** This refers to getting your data from where it lives (text files, PDFs, a website, a database, or an API) into your pipeline. LlamaHub provides hundreds of connectors to choose from.

**Indexing:** This means creating a data structure that allows for querying the data. For LLMs this nearly always means creating vector embeddings, which are numerical representations of the meaning of your data, along with metadata strategies that make it easy to accurately find contextually relevant data.

**Storing:** Once your data is indexed, you will want to store your index, along with any other metadata, to avoid the need to re-index it.

**Querying:** For any given indexing strategy there are many ways you can use LLMs and LlamaIndex data structures to query, including sub-queries, multi-step queries and hybrid strategies.

**Evaluation:** A critical step in any pipeline is checking how effective it is relative to other strategies, or when you make changes. Evaluation provides objective measures of how accurate, faithful and fast your responses to queries are. This stage is not covered in this notebook.
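To make the indexing, storing and querying stages more concrete, here is a minimal sketch under heavy assumptions. The three-dimensional vectors are invented by hand (real embeddings come from an embedding model and have many more dimensions), the JSON file is a stand-in for a real index store, and none of the names below are LlamaIndex APIs.

```python
import json
import math
import os
import tempfile


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Indexing: map each chunk of text to a vector.
# These vectors are invented for illustration only.
toy_index = {
    "Cats are small domesticated felines.": [0.9, 0.1, 0.0],
    "Dogs are loyal companions.": [0.7, 0.3, 0.1],
    "The stock market fell sharply today.": [0.0, 0.2, 0.9],
}

# Storing: persist the index so it does not have to be rebuilt next time.
path = os.path.join(tempfile.mkdtemp(), "toy_index.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(toy_index, f)

# Later (for example in another session): reload instead of re-indexing.
with open(path, encoding="utf-8") as f:
    loaded_index = json.load(f)

# Querying: rank chunks by similarity to a (hypothetical) query embedding.
query_vector = [0.8, 0.2, 0.0]
ranked = sorted(
    loaded_index,
    key=lambda text: cosine_similarity(query_vector, loaded_index[text]),
    reverse=True,
)
top_two = ranked[:2]
print(top_two)
```

The two pet-related chunks point in nearly the same direction as the query vector, so they rank ahead of the unrelated one.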

## Build a RAG System

Now that we understand why RAG matters, let's build a simple RAG pipeline.

Set Your OpenAI API Key

In [1]:
import sys
sys.path.insert(0, '..')

import common.utils
import os
import openai
openai.api_key = common.utils.get_openai_api_key()



✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input response will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .


#### Load Data and Build Index

In [2]:
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_files=["../data/Henry.txt"]
).load_data()

In [3]:
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])

<class 'list'> 

1 

<class 'llama_index.schema.Document'>
Doc ID: 9e4ed240-1cd5-4c00-bd39-57392ec1ffcb
Text: History   Henry, with his striking features and undeniable
charm, has captivated the hearts of many in Hong Kong, earning him the
title of the most handsome boy in the city. His chiseled jawline,
expressive eyes, and perfectly styled hair make heads turn wherever he
goes. Beyond his physical appearance, Henry possesses an innate grace
and confid...


Concatenate all documents into a single `Document`.

In [4]:
from llama_index import Document

document = Document(text="\n\n".join([doc.text for doc in documents]))

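Split the documents into nodes (chunks) and build a vector index over them. The cell below also defines an LLM and imports `ServiceContext`, but neither is passed to the index, so the index is built with LlamaIndex's default settings, including the default embedding model.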

In [5]:
from llama_index.node_parser import SimpleNodeParser
from llama_index import VectorStoreIndex
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# Define an LLM
llm = OpenAI(model="gpt-3.5-turbo")

# Split the documents into small nodes (chunk_size=64, chunk_overlap=2) and index them
node_parser = SimpleNodeParser.from_defaults(chunk_size=64, chunk_overlap=2)
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)
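To build intuition for what `chunk_size` and `chunk_overlap` control, here is a simplified sliding-window chunker. It is a sketch only: it counts whitespace-separated words, whereas the real node parser measures sizes with a tokenizer and applies its own splitting rules, so its chunks will differ. The function name `chunk_words` is made up for this example.

```python
def chunk_words(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size words, where each window
    shares chunk_overlap words with the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
word_chunks = chunk_words(sample, chunk_size=4, chunk_overlap=1)
for chunk in word_chunks:
    print(chunk)
```

Each chunk starts with the last word of the previous one. Overlap is meant to keep text that straddles a chunk boundary available in both neighbouring chunks.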

Build a QueryEngine and start querying.

In [6]:
query_engine = index.as_query_engine()

In [7]:
response = query_engine.query(
    "Who is the pretty boy in Hong Kong?"
)
print(str(response))

Henry is the pretty boy in Hong Kong.


In [8]:
response = query_engine.query(
    "香港谁最帅?"  # "Who is the most handsome in Hong Kong?"
)
print(str(response))

Henry是香港最帅的男孩。

(English translation of the output: "Henry is the most handsome boy in Hong Kong.")


In [9]:
response = query_engine.query(
    "Who is the beautiful person in Hong Kong?"
)
print(str(response))

Henri is the beautiful person in Hong Kong.


By default the query engine retrieves the `two` most similar nodes (chunks). You can change that with `index.as_query_engine(similarity_top_k=k)`.
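When experimenting with `similarity_top_k`, it can help to list everything that was retrieved in one place. The helper below is a hedged sketch: `summarize_sources` is a made-up name, and it assumes each item in `response.source_nodes` offers a `get_text()` method (used later in this notebook) and a `score` attribute (an assumption to verify against your installed LlamaIndex version). To keep the example runnable without an API key, it is exercised with a stand-in class and invented scores.

```python
from dataclasses import dataclass


def summarize_sources(source_nodes, max_chars=60):
    """Return one line per retrieved node: rank, score and a text preview."""
    lines = []
    for rank, node in enumerate(source_nodes, start=1):
        text = node.get_text().replace("\n", " ")
        preview = text if len(text) <= max_chars else text[:max_chars] + "..."
        score = "n/a" if node.score is None else f"{node.score:.3f}"
        lines.append(f"{rank}. score={score} | {preview}")
    return lines


@dataclass
class FakeNode:
    """Hypothetical stand-in for a retrieved node, used only to run this sketch."""
    text: str
    score: float

    def get_text(self):
        return self.text


# Placeholder text and invented scores, not real retrieval results
fake_nodes = [
    FakeNode("First retrieved chunk of text.", 0.87),
    FakeNode("Second retrieved chunk of text.", 0.85),
]
summary = summarize_sources(fake_nodes)
for line in summary:
    print(line)
```

With a real query you would pass `response.source_nodes` instead of `fake_nodes`, assuming the attributes named above are available.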

**Let's check the text in each of these retrieved nodes.**

In [10]:
# First retrieved node
response.source_nodes[0].get_text()

"It's no wonder that he has become an icon of attractiveness in Hong Kong, leaving a lasting impression on everyone fortunate enough to encounter him.\n\nHenri, with her radiant presence and captivating allure, is hailed as the most beautiful girl in Hong Kong."

In [11]:
# Second retrieved node
response.source_nodes[1].get_text()

"Whether she is engaged in a conversation or simply walking down the streets of Hong Kong, Henri's beauty is captivating, leaving an indelible impression on those fortunate enough to cross her path."