# Basic RAG Pipeline

In this notebook we will build a basic RAG pipeline with LlamaIndex. It has the following two sections.

1. Understanding Retrieval Augmented Generation (RAG).
2. Building basic RAG with LlamaIndex.

**Retrieval Augmented Generation (RAG)**

LLMs are trained on vast datasets, but these will not include your specific data. Retrieval-Augmented Generation (RAG) addresses this by dynamically incorporating your data during the generation process. This is done not by altering the training data of LLMs, but by allowing the model to access and use your data at query time to provide more tailored and contextually relevant responses.

In RAG, your data is loaded and prepared for queries or “indexed”. User queries act on the index, which filters your data down to the most relevant context. This context and your query then go to the LLM along with a prompt, and the LLM provides a response.
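
The flow described above can be sketched in plain Python. This is a minimal illustration, not how LlamaIndex implements it: word overlap stands in for embedding similarity, and the helper names (`tokenize`, `retrieve`, `build_prompt`), the sample chunks, and the prompt wording are all assumptions made for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks that share the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_tokens & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context and the query into a single prompt."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Illustrative data: in a real pipeline these chunks come from your documents.
chunks = [
    "LlamaIndex builds a vector index over your documents.",
    "The index is queried to find the most relevant chunks.",
    "Bananas are rich in potassium.",
]
query = "How is the index queried?"

context = retrieve(query, chunks, top_k=2)
prompt = build_prompt(query, context)
print(prompt)  # this string is what would be sent to the LLM
```

Only the two chunks that overlap with the query reach the prompt; the unrelated chunk is filtered out before the LLM is ever called.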

Even if what you’re building is a chatbot or an agent, you’ll want to know RAG techniques for getting data into your application.

![RAG Overview](../data/llamaindex_rag_overview.png)

**Stages within RAG**

There are five key stages within RAG, which in turn will be a part of any larger application you build. These are:

**Loading:** this refers to getting your data from where it lives – whether it’s text files, PDFs, another website, a database, or an API – into your pipeline. LlamaHub provides hundreds of connectors to choose from.

**Indexing:** this means creating a data structure that allows for querying the data. For LLMs this nearly always means creating vector embeddings (numerical representations of the meaning of your data), along with metadata strategies that make it easy to accurately find contextually relevant data.

**Storing:** once your data is indexed, you will want to store your index, along with any other metadata, to avoid the need to re-index it.

**Querying:** for any given indexing strategy there are many ways you can utilize LLMs and LlamaIndex data structures to query, including sub-queries, multi-step queries and hybrid strategies.

**Evaluation:** a critical step in any pipeline is checking how effective it is relative to other strategies, or when you make changes. Evaluation provides objective measures of how accurate, faithful and fast your responses to queries are. However, this part is not covered in this notebook.
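
To make the indexing, storing, and querying stages concrete, here is a toy sketch in plain Python. It assumes hand-written three-dimensional vectors in place of real embeddings (which would come from an embedding model), and a hypothetical `toy_index.json` file in place of a real vector store. LlamaIndex's own index and storage classes work differently.

```python
import json
import math
import os
import tempfile


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Indexing: pair each chunk with its embedding (illustrative values only).
index = [
    {"text": "RAG retrieves context before generation.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Embeddings map text to vectors.", "embedding": [0.1, 0.9, 0.0]},
    {"text": "Evaluation measures answer quality.", "embedding": [0.0, 0.1, 0.9]},
]

# Storing: write the index to disk so it does not have to be rebuilt.
path = os.path.join(tempfile.mkdtemp(), "toy_index.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(index, f)

# Later (for example in a new session): reload instead of re-indexing.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

# Querying: embed the query (hand-written here) and pick the closest entry.
query_embedding = [0.2, 0.8, 0.1]
best = max(loaded, key=lambda entry: cosine(query_embedding, entry["embedding"]))
print(best["text"])
```

The reloaded index answers the query without recomputing any embeddings, which is the point of the storing stage.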

## Build a RAG System

Now that we understand why RAG matters, let's build a basic RAG pipeline.

#### Load Data and Build Index

In [None]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

documents = SimpleDirectoryReader(
    input_files=["../data/Henry.txt"]
).load_data()

# Path to a local copy of the bge-base-zh-v1.5 embedding model; adjust for your environment.
local_model_path = "/data-extend/zhengwenhao/workspace/RAG/tex2vec/bge-base-zh-v1.5"
Settings.embed_model = HuggingFaceEmbedding(model_name=local_model_path)

# Create the VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)

#### Build a QueryEngine and Start Querying

In [None]:
# OurLLM is a custom LLM wrapper defined in the local load_llama2 module.
from load_llama2 import OurLLM
Settings.llm = OurLLM()

query_engine = index.as_query_engine()

In [None]:
response = query_engine.query(
    "Who is the pretty boy in Hong Kong?"
)
print(str(response))

By default, the query engine retrieves the two most similar nodes (chunks). You can change this with `index.as_query_engine(similarity_top_k=k)`.

**Let's check the text in each of these retrieved nodes.**

In [None]:
# First retrieved node
response.source_nodes[0].get_text()

In [None]:
# Second retrieved node
response.source_nodes[1].get_text()
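
Indexing `source_nodes` by position works for the default of two nodes, but a loop generalizes to any `similarity_top_k`. The sketch below is one way to do that. So that it runs without an index or an LLM, it uses a hypothetical `StubNode` class that only mimics the `score` attribute and `get_text()` method of a retrieved node. `summarize_nodes` and the sample values are likewise assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubNode:
    """Hypothetical stand-in exposing `score` and `get_text()` like a retrieved node."""

    text: str
    score: Optional[float]

    def get_text(self) -> str:
        return self.text


def summarize_nodes(source_nodes, max_chars: int = 60) -> list[str]:
    """Return one line per node: rank, similarity score, and a text snippet."""
    lines = []
    for rank, node in enumerate(source_nodes, start=1):
        # A score may be missing, so format it defensively.
        score = "n/a" if node.score is None else f"{node.score:.3f}"
        snippet = node.get_text()[:max_chars].replace("\n", " ")
        lines.append(f"[{rank}] score={score} {snippet}")
    return lines


# Illustrative values; in the notebook you would pass response.source_nodes.
source_nodes = [
    StubNode("First retrieved chunk of text.", 0.82),
    StubNode("Second retrieved chunk of text.", 0.74),
]
for line in summarize_nodes(source_nodes):
    print(line)
```

In the notebook itself, `summarize_nodes(response.source_nodes)` should give the same kind of overview for however many nodes were retrieved.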