## Document Search and Question-Answering System with Qdrant and LlamaIndex

This Jupyter Notebook demonstrates the following steps:
1. Documents are read and converted into numerical embeddings.
2. The embeddings are stored in a Qdrant vector database for efficient similarity searches.
3. Each query is embedded, the most similar document chunks are retrieved, and a Large Language Model (LLM) generates an answer from them.
4. Results are refined through reranking, and a structured prompt template ensures clarity in the final response.

### Set up Asyncio

In [30]:
import nest_asyncio

nest_asyncio.apply()

### Set up the Qdrant vector database

In [None]:
import qdrant_client

collection_name="chat_with_docs"

client = qdrant_client.QdrantClient(
    host="localhost",
    port=6333
)


* Qdrant: A vector database for storing and searching embeddings (numerical representations of text).
* client = qdrant_client.QdrantClient(...) initializes a QdrantClient instance connected to a Qdrant server running locally on port 6333.
* A collection in Qdrant is comparable to a table in a relational database. The chat_with_docs collection will store the document embeddings.
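
To make "searching embeddings" concrete, here is a minimal sketch of similarity search in plain Python. It assumes cosine similarity as the metric and uses made-up 3-dimensional vectors; the names cosine_similarity, search, and toy_collection are hypothetical and not part of the Qdrant API. A real vector database uses an approximate index rather than this linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(collection, query_vector, top_k=2):
    """Return the top_k (score, payload) pairs most similar to query_vector."""
    scored = [
        (cosine_similarity(vector, query_vector), payload)
        for vector, payload in collection
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Hypothetical toy "embeddings"; real models produce far more dimensions.
toy_collection = [
    ([1.0, 0.0, 0.0], "chunk about databases"),
    ([0.0, 1.0, 0.0], "chunk about cooking"),
    ([0.9, 0.1, 0.0], "chunk about vector search"),
]

results = search(toy_collection, query_vector=[1.0, 0.0, 0.0], top_k=2)
for score, payload in results:
    print(f"{score:.3f}  {payload}")
```

The query vector points in the same direction as the first stored vector, so that chunk ranks first, and the nearly parallel third vector ranks second.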

### Read the documents

In [32]:
from llama_index.core import SimpleDirectoryReader

input_dir_path = './docs'

loader = SimpleDirectoryReader(
            input_dir = input_dir_path,
            required_exts=[".pdf"],
            recursive=True
        )
docs = loader.load_data()

* The loader is a SimpleDirectoryReader instance that reads and extracts text from files in a specified directory. With recursive=True, subdirectories are included.
* required_exts=[".pdf"] restricts loading to PDF files.
* loader.load_data() reads all PDF files under the ./docs directory and loads their content into docs, a format we can work with.
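
The file-discovery part of this step can be sketched with the standard library alone. This is an illustration of what recursive and required_exts mean, not the reader's actual implementation; find_files is a hypothetical helper, and it only lists files (it does not parse PDF content).

```python
from pathlib import Path
import tempfile

def find_files(input_dir, required_exts, recursive=True):
    """List files under input_dir whose suffix is in required_exts."""
    root = Path(input_dir)
    # rglob descends into subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        path for path in candidates
        if path.is_file() and path.suffix in required_exts
    )

# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.pdf").touch()
    (root / "notes.txt").touch()
    (root / "nested" / "b.pdf").touch()

    found = [
        p.relative_to(root).as_posix()
        for p in find_files(root, [".pdf"])
    ]
    top_level_only = [
        p.relative_to(root).as_posix()
        for p in find_files(root, [".pdf"], recursive=False)
    ]

print(found)
print(top_level_only)
```

With recursive=False, the PDF in the nested directory is skipped, which is why the notebook sets recursive=True for a docs folder that may contain subfolders.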

### Function to index data

In [33]:
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext

def create_index(documents):
    vector_store = QdrantVectorStore(client=client,
                                     collection_name=collection_name)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    index = VectorStoreIndex.from_documents(documents,
                                            storage_context=storage_context)
    
    return index

* Initialize a QdrantVectorStore object by passing the previously created Qdrant client and a name for the collection.
* QdrantVectorStore: Stores the embeddings in Qdrant.
* VectorStoreIndex: Manages how documents and their embeddings are organized and queried.
* Creates an index for the documents (docs) by converting them into embeddings and saving these embeddings in Qdrant.

### Load the embedding model and index data

In [40]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5",
                                   trust_remote_code=True)

Settings.embed_model = embed_model


In [42]:
index = create_index(docs)

* The HuggingFaceEmbedding class uses Hugging Face models to generate embeddings for text data. Here we use the pretrained model "BAAI/bge-large-en-v1.5" from the Beijing Academy of Artificial Intelligence (BAAI).
* Next, set embed_model as the default embedding model in Settings. This ensures that the same model embeds both documents and queries throughout the RAG pipeline.
* Finally, call the create_index function defined earlier, passing in docs (the list of loaded documents). The function splits the documents into chunks, converts each chunk into an embedding using embed_model, and stores the embeddings in Qdrant.
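
Before embedding, the indexing step breaks each document into smaller chunks so that retrieval can return focused passages. The sketch below shows the general idea with fixed-size, overlapping character windows. It is a simplification under stated assumptions: chunk_text, chunk_size, and overlap are hypothetical names chosen for illustration, and the library's own splitter works on sentences and tokens rather than raw characters.

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample_text = "abcdefghij" * 10  # 100 characters of placeholder text
chunks = chunk_text(sample_text, chunk_size=40, overlap=10)

for i, chunk in enumerate(chunks):
    print(i, len(chunk), chunk)
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary is still fully present in at least one chunk.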

### Load the LLM

In [43]:
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.2:1b", request_timeout=120)
Settings.llm = llm

* Ollama: A tool for running language models locally. Here it serves the llama3.2:1b model, which processes and generates responses to user queries.
* request_timeout=120 sets a 120-second limit on requests to the LLM, so the system doesn’t get stuck if the model takes too long to respond.
* Settings.llm = llm sets this LLM instance as the default language model, making it the primary model used in this RAG pipeline.

### Define the prompt template

In [46]:
from llama_index.core import PromptTemplate

template = """
            Context information is below:
              ---------------------
              {context_str}
              ---------------------
              Given the context information above I want you to think
              step by step to answer the query in a crisp manner,
              in case you don't know the answer say 'I don't know!'
            
              Query: {query_str}
        
              Answer:
"""
qa_prompt_tmpl = PromptTemplate(template)

* PromptTemplate: A predefined structure for the prompt sent to the LLM. At query time, {context_str} is filled with the retrieved text and {query_str} with the user's question.
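
The placeholder substitution can be illustrated with plain Python string formatting. This is a sketch of the idea only: toy_template is a shortened, hypothetical version of the template above, and retrieved_chunks stands in for text that the retriever would supply.

```python
toy_template = (
    "Context information is below:\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer:"
)

# Hypothetical retrieved chunks; in the notebook these come from the retriever.
retrieved_chunks = ["First retrieved passage.", "Second retrieved passage."]

# Join the chunks into one context string and fill both placeholders.
prompt = toy_template.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="What exactly is DSPy?",
)
print(prompt)
```

The LLM only ever sees the filled-in string, so the wording of the template directly shapes how it uses the context.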

### Reranking

In [47]:
from llama_index.core.postprocessor import SentenceTransformerRerank

rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", 
    top_n=3
)

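* SentenceTransformerRerank: Refines query results by re-ranking them with a cross-encoder model, which scores each query-passage pair jointly and is typically more accurate than embedding similarity alone.
* top_n=3 ensures that only the 3 most relevant results are kept and passed to the LLM.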
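
The rerank-then-truncate pattern can be sketched without any model. In the example below, overlap_score is a deliberately crude stand-in (word overlap) for the cross-encoder's relevance score, and rerank_passages and candidates are hypothetical names; only the control flow mirrors what the reranker does.

```python
def overlap_score(query, passage):
    """Stand-in relevance scorer: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)

def rerank_passages(query, candidates, top_n=3):
    """Rescore every candidate against the query and keep the best top_n."""
    scored = [(overlap_score(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

# Hypothetical first-stage retrieval results, in their original order.
candidates = [
    "qdrant stores vectors",
    "dspy is a programming model",
    "what dspy modules compile into",
    "bananas are yellow",
    "what is dspy exactly",
]

top = rerank_passages("what is dspy", candidates, top_n=3)
for score, passage in top:
    print(f"{score:.2f}  {passage}")
```

The passage that was last in the original order moves to first place after rescoring, and the two irrelevant passages are dropped, which is the effect the reranker is meant to have on the retrieved chunks.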


### Query the document

In [48]:
query_engine = index.as_query_engine(similarity_top_k=10,
                                     node_postprocessors=[rerank])

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)

response = query_engine.query("What exactly is DSPy?")

* The query engine integrates the retrieval, re-ranking, and prompt-based response generation steps.
* similarity_top_k=10 retrieves the 10 chunks most similar to the query from the index.
* The reranker refines these results, keeping the top 3.
* update_prompts replaces the default question-answering template with qa_prompt_tmpl.
* The LLM generates the final answer based on the query and the reranked document context.

In [50]:
response

Response(response='DSPy stands for Demonstrating Self-Improving Pipelines, and it refers to a programming model that translates prompting techniques into parameterized declarative modules. This means that DSPy allows developers to define prompts (or instructions) in a specific way using natural language signatures, which can be compiled into efficient and effective LMs (Language Models).', source_nodes=[NodeWithScore(node=TextNode(id_='a24aa8d1-0a0c-477c-b5e2-e8089870e161', embedding=None, metadata={'page_label': '2', 'file_name': 'dspy.pdf', 'file_path': 'e:\\ML Practice\\DailyDose of DS\\basic_RAG_application\\rag_project\\docs\\dspy.pdf', 'file_type': 'application/pdf', 'file_size': 460814, 'creation_date': '2024-11-02', 'last_modified_date': '2024-11-02'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified

In [49]:
from IPython.display import Markdown, display

display(Markdown(str(response)))

DSPy stands for Demonstrating Self-Improving Pipelines, and it refers to a programming model that translates prompting techniques into parameterized declarative modules. This means that DSPy allows developers to define prompts (or instructions) in a specific way using natural language signatures, which can be compiled into efficient and effective LMs (Language Models).