## RAG Pipeline using LlamaIndex
This notebook builds a RAG (retrieval-augmented generation) pipeline using LlamaIndex. A RAG pipeline has five key stages: loading the data, indexing the data, storing the data, querying the data, and finally evaluating the results.

#### Loading and Embedding Documents
There are three main ways to load data into LlamaIndex:

1. SimpleDirectoryReader: A built-in loader for various file types from a local directory.
2. LlamaParse: LlamaIndex’s official tool for PDF parsing, available as a managed API.
3. LlamaHub: A registry of hundreds of data-loading libraries to ingest data from any source.
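
To make the first option more concrete, here is a minimal standard-library sketch of what a directory loader does conceptually: walk a folder, read each supported file, and return one record per file with its text and some metadata. This is not LlamaIndex's implementation; the helper `load_text_documents`, the dict layout, and the extension filter are assumptions made purely for illustration.

```python
import tempfile
from pathlib import Path


def load_text_documents(input_dir, extensions=(".txt", ".md")):
    """Hypothetical helper: read every matching file under input_dir into a dict."""
    documents = []
    # sorted() keeps the loading order deterministic across runs
    for path in sorted(Path(input_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            documents.append({
                "text": path.read_text(encoding="utf-8"),
                "metadata": {"file_name": path.name},
            })
    return documents


# Build a throwaway directory so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("First document.", encoding="utf-8")
    Path(tmp, "b.md").write_text("Second document.", encoding="utf-8")
    Path(tmp, "skip.bin").write_bytes(b"\x00\x01")  # ignored: unsupported extension
    sample_docs = load_text_documents(tmp)

print([d["metadata"]["file_name"] for d in sample_docs])
```

A real loader additionally handles formats such as PDF or DOCX, which is why a dedicated reader is preferable to hand-rolled code for anything beyond plain text.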


In [15]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir = "Documents")     # Replace "Documents" with the path to the directory that holds your files
documents = reader.load_data()

After loading the documents, we break them into smaller pieces called nodes. The IngestionPipeline creates these nodes by applying two transformations in order: SentenceSplitter breaks each document into manageable chunks, and HuggingFaceEmbedding converts each chunk into a numerical embedding.
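
The effect of a chunk size and a chunk overlap can be illustrated with a small sliding-window sketch. It is a simplification and not how SentenceSplitter is implemented: the real splitter measures length in tokens and tries to respect sentence boundaries, whereas this version counts whitespace-separated words. The helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(words, chunk_size, chunk_overlap):
    """Hypothetical helper: slide a fixed-size window over a list of words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


words = "one two three four five six seven eight nine ten".split()
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

With an overlap of 1, the last word of each chunk reappears as the first word of the next one, so context that straddles a chunk boundary is not lost entirely.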

In [16]:
from llama_index.core import Document
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

pipeline = IngestionPipeline(
    transformations = [
        SentenceSplitter(chunk_overlap = 0),
        HuggingFaceEmbedding(model_name = "BAAI/bge-small-en-v1.5"), # If the model download requires authentication, log in to Hugging Face first by running `huggingface-cli login` in a terminal
    ]
)

# The pipeline defined here is SentenceSplitter -> HuggingFaceEmbedding

nodes = await pipeline.arun(documents = documents)
# The transformed, embedded chunks are stored in the `nodes` variable

#### Storing and Indexing Documents
We will use a vector database, ChromaDB, to index the node objects and make them searchable. To install the LlamaIndex integration for ChromaDB:

pip install llama-index-vector-stores-chroma

In [23]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path = "./mental_files_db")                # Opens (or creates) an on-disk database at the given path
chroma_collection = db.get_or_create_collection("mental")                 # Gets the collection, creating it if it does not exist (a collection is like a namespace or table that stores embeddings and documents)
vector_store = ChromaVectorStore(chroma_collection = chroma_collection)

pipeline = IngestionPipeline(
    transformations = [
        SentenceSplitter(chunk_size = 512, chunk_overlap = 50),
        HuggingFaceEmbedding(model_name = "BAAI/bge-small-en-v1.5"),
    ],
    vector_store = vector_store
)

nodes = await pipeline.arun(documents = documents)

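The embeddings are now stored in ChromaDB. Next, we build an index on top of the vector store using VectorStoreIndex from LlamaIndex, passing the same embedding model that was used during ingestion so that queries are embedded consistently with the stored chunks.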

In [24]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name = "BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model = embed_model)
# This wraps the vector store with the embedding model, creating an index from which you can query
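
Conceptually, querying a vector index means embedding the query and ranking the stored chunks by similarity to it. The sketch below shows that idea with hand-written 3-dimensional vectors and cosine similarity. It is an illustration only: the helper names and the toy vectors are assumptions, real embeddings have hundreds of dimensions, and the similarity metric actually used depends on how the vector store is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Hypothetical helper: return the k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings" written by hand for illustration
stored = [
    ("chunk about sleep", [0.9, 0.1, 0.0]),
    ("chunk about diet", [0.1, 0.9, 0.0]),
    ("chunk about exercise", [0.0, 0.2, 0.9]),
]
results = top_k([0.8, 0.2, 0.1], stored, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

This also shows why the query must be embedded with the same model as the documents: similarity scores are only meaningful when both vectors live in the same embedding space.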

#### Querying a VectorStoreIndex with Prompts and LLMs
Before querying the index, we need to convert it into a query interface. The most common options are:

1. as_retriever: For basic document retrieval.
2. as_query_engine: For single question-and-answer interactions.
3. as_chat_engine: For conversational interactions that maintain memory across multiple messages.

In [None]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(model_name = "Qwen/Qwen2.5-7B-Instruct")
query_engine = index.as_query_engine(
    llm = llm,
    response_mode = "tree_summarize"    # Builds a detailed answer hierarchically: summarizes the retrieved text chunks, then recursively combines those summaries into a single response
)

query_engine.query("What is the best method of therapy?")

HfHubHTTPError: 404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/Qwen/Qwen2.5-7B-Instruct/v1/chat/completions (Request ID: Root=1-68d39fa7-2c1742e75c03e7d00a70c385;596b4f8e-b83f-4bc9-ba30-2694ad31eac8)
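
Independently of the error above, the idea behind the `tree_summarize` response mode can be sketched without any LLM. In the toy version below, a stand-in `summarize` function replaces the model call, and groups of partial results are merged level by level until one answer remains. The names `summarize`, `tree_summarize`, and `fan_in` are hypothetical, and this is an illustration of the general technique rather than LlamaIndex's implementation, which packs chunks according to the model's context window.

```python
def summarize(texts):
    """Stand-in for an LLM call: wraps its inputs so the tree structure stays visible."""
    return "summary(" + " + ".join(texts) + ")"


def tree_summarize(chunks, fan_in=2):
    """Hypothetical helper: merge chunks in groups of fan_in until one result is left."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while True:
        # Summarize each group of neighbouring items to form the next level of the tree
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


answer = tree_summarize(["c1", "c2", "c3"], fan_in=2)
print(answer)
```

The nested output makes the tree visible: the first two chunks are combined, the third is summarized on its own, and the two intermediate summaries are then merged into the final answer.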