# Customized Standard RAG Pipeline in LlamaIndex - with Qdrant Vector DB
* Notebook by Adam Lang
* Date: 3/22/2024
* In this notebook we will go through a customized 'standard' RAG pipeline using LlamaIndex and the Qdrant Vector DB.
* About the 'dataset' we will use for this:
  - I will query the privacy policy of a digital health application available on the iPhone App Store. The app, iCare HOME2, is used for remote monitoring of glaucoma. I am particularly interested in what the privacy policy says about how users' data is used and shared, so I will build a RAG application to query it.

# Steps for Customized RAG Pipeline
1. Configure a different LLM
2. Use a different embedding model
3. Configure different Vector Store - **Using Qdrant Vector Database**
4. Customize with different indices
5. Synthesize response for a user query
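
The steps above amount to swapping components in and out of one fixed flow: embed, store, retrieve, synthesize. The sketch below illustrates that idea with toy stand-ins only. `toy_embed`, `toy_llm`, and `ToyRagPipeline` are hypothetical names made up for this illustration; they are not LlamaIndex classes and do not reflect how LlamaIndex is implemented.

```python
from typing import Callable, List, Tuple

Embedder = Callable[[str], List[float]]
LLM = Callable[[str], str]


def toy_embed(text: str) -> List[float]:
    """Hypothetical embedder: counts words starting with a few fixed stems."""
    stems = ["data", "share", "privacy", "device"]
    words = text.lower().split()
    return [float(sum(w.startswith(s) for w in words)) for s in stems]


def toy_llm(prompt: str) -> str:
    """Hypothetical LLM stand-in: repeats the retrieved context as the answer."""
    context = prompt.split("Question:")[0].replace("Context:", "").strip()
    return f"Based on the document: {context}"


class ToyRagPipeline:
    """Fixed RAG flow whose embedder and LLM are swappable callables."""

    def __init__(self, embed: Embedder, llm: LLM) -> None:
        self.embed = embed
        self.llm = llm
        self.store: List[Tuple[List[float], str]] = []  # in-memory "vector store"

    def index(self, texts: List[str]) -> None:
        # Embed every text once and keep (vector, text) pairs.
        self.store = [(self.embed(t), t) for t in texts]

    def query(self, question: str, top_k: int = 1) -> str:
        q = self.embed(question)

        def score(pair: Tuple[List[float], str]) -> float:
            # Dot product between the query vector and a stored vector.
            return sum(a * b for a, b in zip(q, pair[0]))

        best = sorted(self.store, key=score, reverse=True)[:top_k]
        context = "\n".join(text for _, text in best)
        return self.llm(f"Context:\n{context}\n\nQuestion: {question}")


pipeline = ToyRagPipeline(embed=toy_embed, llm=toy_llm)
pipeline.index(["We share data with partners.", "The device measures eye pressure."])
answer = pipeline.query("Is data shared?")
```

Replacing `toy_embed` or `toy_llm` with another callable changes the behaviour without touching the flow, which is the same kind of customization the rest of this notebook performs with real components.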

In [20]:
!pip install llama-index qdrant_client llama-index-vector-stores-qdrant

Collecting llama-index-vector-stores-qdrant
  Downloading llama_index_vector_stores_qdrant-0.1.4-py3-none-any.whl (8.6 kB)
Installing collected packages: llama-index-vector-stores-qdrant
Successfully installed llama-index-vector-stores-qdrant-0.1.4


### Set up OpenAI access

In [5]:
import os
os.environ['OPENAI_API_KEY'] = 'your_key'
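
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched below, is to reuse a key already present in the environment and only prompt for it when it is missing. `ensure_api_key` is a hypothetical helper written for this illustration, not part of any library.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(
    env_var: str = "OPENAI_API_KEY",
    prompt_fn: Callable[[str], str] = getpass,
) -> str:
    """Return the key stored in env_var, prompting for it only if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value so it does not end up in cell output.
        key = prompt_fn(f"Enter {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} was not provided")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` once near the top of the notebook would take the place of the hard-coded assignment above.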

### Load dataset

In [6]:
from pathlib import Path
from llama_index.core import download_loader

# pdf reader
PDFReader = download_loader("PDFReader")

# create loader
loader = PDFReader()
documents = loader.load_data(file=Path('/content/drive/MyDrive/Colab Notebooks/Building Production Ready RAG Systems using LlamaIndex/icare_PATIENT_glaucoma_app_privacy_policy.pdf'))


In [7]:
# number of documents loaded
len(documents)

13

In [10]:
# print a document
print(documents[2].text)




### Configure OpenAI LLM

In [12]:
!pip install llama-index-llms-openai



In [14]:
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)


### Load BGE embeddings from HuggingFace

In [15]:
!pip install llama-index-embeddings-huggingface

Collecting llama-index-embeddings-huggingface
  Downloading llama_index_embeddings_huggingface-0.1.4-py3-none-any.whl (7.7 kB)

In [16]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

In [17]:
# instantiate the embedding model - use the small BGE embeddings model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")


### Create Service Context by providing LLM and Embedding model

In [19]:
from llama_index.core import ServiceContext
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model
)


### Configure Qdrant VectorDB

In [21]:
import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore

# initialize client, setting path to save data
client = qdrant_client.QdrantClient(path="./qdrant_db")

# create the vector store, pointing it at a named collection
vector_store = QdrantVectorStore(client=client, collection_name="rag_customization")

### Create Storage Context from the vector store

In [23]:
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)

# 1. VectorStore Index
* Define the vector store index by passing storage context and service context

In [54]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents[:200],
                                        storage_context=storage_context,
                                        service_context=service_context,
                                        show_progress=True)

Parsing nodes:   0%|          | 0/13 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/13 [00:00<?, ?it/s]

### Build the query engine for the index

In [55]:
query_engine = index.as_query_engine(similarity_top_k=5)
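
The `similarity_top_k=5` argument asks the retriever to pass the five most similar nodes to the LLM. The sketch below shows the general idea of top-k retrieval in plain Python, assuming cosine similarity as the metric. `cosine_similarity` and `top_k` are hypothetical helpers for illustration and are not how Qdrant or LlamaIndex implement the search internally.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          node_vecs: Dict[str, Sequence[float]],
          k: int) -> List[str]:
    """Return the ids of the k nodes most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), node_id)
              for node_id, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node_id for _, node_id in scored[:k]]


# Toy 2-dimensional "embeddings"; real BGE vectors have many more dimensions.
node_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = top_k([1.0, 0.1], node_vecs, k=2)
```

A larger `k` gives the LLM more context to draw on but also more text that may be irrelevant to the question, so the value is a trade-off worth experimenting with.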

In [56]:
response = query_engine.query(
    "Is the users data shared with a 3rd party?"
)

In [57]:
# print response
print(response)

The user's data is not shared with a 3rd party.


# 2. Keyword Table

In [97]:
from llama_index.core import KeywordTableIndex

keyword_table_index = KeywordTableIndex.from_documents(
    documents[:600],
    service_context=service_context,
    show_progress=True
)

Parsing nodes:   0%|          | 0/13 [00:00<?, ?it/s]

Extracting keywords from nodes:   0%|          | 0/13 [00:00<?, ?it/s]

In [98]:
# build retriever
keyword_table_retriever = keyword_table_index.as_retriever()

In [99]:
# build query engine
query_engine = keyword_table_index.as_query_engine()

In [111]:
# run a query against the keyword table index
response = query_engine.query(
    """What is the name of the device?"""
)

In [112]:
print(response)

The name of the device is "icare_PATIENT_glaucoma_app_privacy_policy.pdf".
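
Unlike the vector index, a keyword table retrieves nodes by matching keywords rather than by embedding similarity. The sketch below shows the general idea with a toy index. The regex tokenizer and stopword list are assumptions standing in for whatever keyword extractor the real index uses, and `extract_keywords`, `build_keyword_table`, and `retrieve` are hypothetical names for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Assumed stopword list for this toy example only.
STOPWORDS = {"the", "is", "of", "a", "an", "what", "with", "and", "to", "in"}


def extract_keywords(text: str) -> Set[str]:
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def build_keyword_table(nodes: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each keyword to the set of node ids whose text contains it."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for node_id, text in nodes.items():
        for keyword in extract_keywords(text):
            table[keyword].add(node_id)
    return table


def retrieve(table: Dict[str, Set[str]], query: str) -> List[str]:
    """Rank node ids by how many query keywords they share, ties broken by id."""
    counts: Dict[str, int] = defaultdict(int)
    for keyword in extract_keywords(query):
        for node_id in table.get(keyword, ()):
            counts[node_id] += 1
    return sorted(counts, key=lambda node_id: (-counts[node_id], node_id))


# Made-up node texts for illustration; not taken from the privacy policy.
nodes = {
    "n1": "The iCare HOME2 device monitors glaucoma.",
    "n2": "Data is stored securely.",
}
table = build_keyword_table(nodes)
matches = retrieve(table, "What is the name of the device?")
```

Because retrieval depends on literal keyword overlap, a query only finds nodes that share its exact terms, which is one reason results from a keyword table can differ noticeably from those of a vector index.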
