# SciQuery: Advanced RAG System using LlamaIndex

In [1]:
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.core.settings import Settings

# Project helpers used throughout this notebook.
from app.llama_index_utils import (
    create_vector_collection,
    get_doc_store,
    get_embedding_model,
    get_llms,
    get_node_parser,
    load_vector_collection,
    prepare_document,
    save_simple_doc_store,
)
from config import QDRANT_COLLECTION_NAME





### Prepare Documents from PDF Files

There are two options: LlamaIndex's built-in PDF parser (`method="simple"`) or manual parsing with `pymupdf` (`method="manual_parsing"`).

In [2]:

method = "simple"  # alternative: "manual_parsing"
pdf_paths = "data/pdf_documents"

documents = prepare_document(pdf_paths=pdf_paths, method=method)

### LLM and Embedding

We load Llama 3.1 8B through Ollama and the embedding model from Sentence Transformers.

In [3]:

Settings.llm = get_llms()
Settings.embed_model = get_embedding_model()



### Convert Documents to Nodes (Chunks)

The semantic splitter groups adjacent sentences that are semantically similar into a single node.
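To make the idea of semantic splitting concrete, here is a minimal sketch that does not use LlamaIndex. It assumes a toy bag-of-words `embed` function as a stand-in for the real embedding model and a fixed similarity `threshold`; both are illustrative assumptions, and the actual node parser may choose breakpoints differently.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy stand-in for an embedding model: bag-of-words counts (assumption)."""
    return Counter(sentence.lower().replace(".", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk when adjacent sentences are less similar than threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])      # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)    # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


sentences = [
    "BERT is a language model.",
    "The BERT model is pretrained on text.",
    "Qdrant stores vectors.",
    "Vectors in Qdrant are indexed.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

The two BERT sentences share enough vocabulary to stay together, while the jump to the Qdrant sentences falls below the threshold and opens a second chunk.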

In [5]:
semantic_node_parser = get_node_parser(Settings.embed_model, parsing_method="semantic")

transforms = [semantic_node_parser, Settings.embed_model]

pipeline = IngestionPipeline(transformations=transforms)

# Uncomment to rebuild the nodes from the documents:
# nodes = pipeline.run(nodes=documents)
# len(documents), len(nodes)

(840, 2614)

### Persist Nodes to an On-Disk Document Store
The nodes are saved once and loaded from disk in later runs.

In [6]:
doc_store_name = "semantic-document-nodes"
# save_simple_doc_store(nodes,doc_store_name)

nodes_ = get_doc_store(doc_store_name)
nodes = list(nodes_.docs.values())
len(nodes)

### Create Embeddings for Each Node

The embedded nodes are indexed and persisted in a Qdrant vector store, then loaded back for querying.
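As a rough illustration of what a vector index does at query time, the sketch below ranks stored node embeddings by cosine similarity to a query embedding and returns the top `k`. The node ids and three-dimensional vectors are made up for the example; a real vector store such as Qdrant typically uses approximate nearest-neighbour search rather than this brute-force scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, node_vecs, k=2):
    """Return the ids of the k nodes most similar to the query vector."""
    scored = [(cosine(query_vec, vec), node_id) for node_id, vec in node_vecs.items()]
    scored.sort(reverse=True)
    return [node_id for _, node_id in scored[:k]]


# Hypothetical node embeddings, for illustration only.
node_vecs = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.9, 0.1, 0.0],
    "node-c": [0.0, 0.0, 1.0],
}
nearest = top_k([1.0, 0.0, 0.0], node_vecs, k=2)
print(nearest)
```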

In [8]:

# create_vector_collection(nodes, "semantic-document-collection")
index = load_vector_collection("semantic-document-collection", embed_model=Settings.embed_model)

Both client and aclient are provided. If using `:memory:` mode, the data between clients is not synced.




<llama_index.core.indices.vector_store.base.VectorStoreIndex at 0x17d4b1ba0>

In [38]:
from llama_index.core import PromptTemplate

# Instruction prefix applied to the query before it is sent to the retriever
prompt_str1 = "Represent this sentence for searching relevant passages: {topic}"
prompt_tmpl1 = PromptTemplate(prompt_str1)

prompt_str2 = """
    <|begin_of_text|>
    <|start_header_id|>
        system
    <|end_header_id|>
        You are an AI assistant tasked with answering questions based on provided context.
        Your goal is to provide accurate, relevant, and concise responses using only the information given in the context.
        If the context doesn't contain enough information to answer the question fully, state that clearly. 
        Do not make up or infer information beyond what's explicitly stated in the context.

        [INSTRUCTIONS]
        1. Carefully read the provided context and query.
        2. Analyze the information in the context that is relevant to the query.
        3. Formulate a clear and concise answer based solely on the given context.
        4. If the context doesn't provide sufficient information to answer the query, state this explicitly.
        5. Do not include any information that is not present in the given context.
    <|eot_id|>
    <|start_header_id|>
        user
    <|end_header_id|>
    Answer the user question based on the context provided below

    [CONTEXT]:
    {context_str}
    
    [QUERY]
    {input}
    
    <|eot_id|>
    <|start_header_id|>
        assistant
    <|end_header_id|>
    """
prompt_tmpl2 = PromptTemplate(prompt_str2)


### Simple RAG using Semantic Chunking

In [7]:
from llama_index.core.query_pipeline import InputComponent, FunctionComponent

# Function to concatenate retrieved nodes
def concatenate_nodes(retrieved_nodes):
    return "\n\n".join([node.node.get_content() for node in retrieved_nodes])

# Create a FunctionComponent for concatenation
concat_component = FunctionComponent(concatenate_nodes)

retriever = index.as_retriever(similarity_top_k=5)

query_pipeline = QueryPipeline(
        verbose=True
    )

query_pipeline.add_modules(
    {
        "input":InputComponent(),
        "prompt_tmpl1":prompt_tmpl1,
        "retriever":retriever,
        "concat_component": concat_component,
        "prompt_tmpl2":prompt_tmpl2,
        "llm":Settings.llm
    }
)

query_pipeline.add_link("input","prompt_tmpl1",dest_key="topic")
query_pipeline.add_link("prompt_tmpl1","retriever")
query_pipeline.add_link("retriever","concat_component")
query_pipeline.add_link("input","prompt_tmpl2",dest_key="input")
query_pipeline.add_link("concat_component","prompt_tmpl2",dest_key="context_str")
query_pipeline.add_link("prompt_tmpl2","llm")
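The links above describe a directed acyclic graph of modules. The sketch below only illustrates the resulting execution order with the standard-library `graphlib`; it is not a description of how `QueryPipeline` is implemented internally.

```python
from graphlib import TopologicalSorter

# (source, destination) pairs mirroring the add_link calls above.
links = [
    ("input", "prompt_tmpl1"),
    ("prompt_tmpl1", "retriever"),
    ("retriever", "concat_component"),
    ("input", "prompt_tmpl2"),
    ("concat_component", "prompt_tmpl2"),
    ("prompt_tmpl2", "llm"),
]

# TopologicalSorter expects a mapping of node -> set of predecessors.
graph = {}
for src, dst in links:
    graph.setdefault(src, set())
    graph.setdefault(dst, set()).add(src)

order = list(TopologicalSorter(graph).static_order())
print(" -> ".join(order))
```

Because `prompt_tmpl2` needs both the raw input and the concatenated context, it can only run after the retrieval branch has finished.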



In [8]:
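query = "What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?"

output, intermediates = query_pipeline.run_with_intermediates(topic=query)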

[1;3;38;2;155;135;227m> Running module input with input: 
topic: What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?

[0m[1;3;38;2;155;135;227m> Running module prompt_tmpl1 with input: 
topic: What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?

[0m[1;3;38;2;155;135;227m> Running module retriever with input: 
input: Represent this sentence for searching relevant passages: What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are ...

[0m[1;3;38;2;155;135;227m> Running module concat_component with input: 
retrieved_nodes: [NodeWithScore(node=TextNode(id_='6f27def0-4877-4535-aac5-001f41acc5d1', embedding=None, metadata={'page_label': '1', 'file_name': 'libov2020_how_neutral_is_mbert.pdf', 'file_

ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='According to the context, one limitation of using mBERT for zero-shot cross-lingual transfer is that it can be hindered by the "curse of multilinguality", where adding more pretraining languages up to a point hurts downstream transfer. This effect can be mitigated by increasing model capacity or additional training for particular language pairs.\n\nAdditionally, when new scripts or languages are introduced, mBERT may not have been specifically trained on these languages, which can lead to decreased performance. The text mentions that mBERT is pre-trained on the 104 languages with the largest Wikipedias, implying that languages without a large Wikipedia presence may not be well-represented in the model.\n\nThere is also a mention of the "overt-fitting" issue when training models for zero-shot transfer, where the model overfits to the original language rather than generalizing to new languages. This can 

In [9]:
intermediates

{'input': ComponentIntermediates(inputs={'topic': 'What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?'}, outputs={'topic': 'What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?'}),
 'prompt_tmpl1': ComponentIntermediates(inputs={'topic': 'What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?'}, outputs={'prompt': 'Represent this sentence for searching relevant passages: What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?'}),
 'retriever': ComponentIntermediates(inputs={'input': 'Represent this sentence for searching relevant passages: What are the limitations of using mBERT for zero-

In [10]:
for i in intermediates["retriever"].outputs["output"]:
    print(i.metadata)

{'page_label': '1', 'file_name': 'libov2020_how_neutral_is_mbert.pdf', 'file_path': 'data/pdf_documents/libov2020_how_neutral_is_mbert.pdf', 'file_type': 'application/pdf', 'file_size': 305799, 'creation_date': '2024-08-24', 'last_modified_date': '2024-08-01'}
{'page_label': '1', 'file_name': 'wudredze2019_betobentzbecas.pdf', 'file_path': 'data/pdf_documents/wudredze2019_betobentzbecas.pdf', 'file_type': 'application/pdf', 'file_size': 400689, 'creation_date': '2024-08-24', 'last_modified_date': '2024-08-01'}
{'page_label': '8', 'file_name': 'wudredze2019_betobentzbecas.pdf', 'file_path': 'data/pdf_documents/wudredze2019_betobentzbecas.pdf', 'file_type': 'application/pdf', 'file_size': 400689, 'creation_date': '2024-08-24', 'last_modified_date': '2024-08-01'}
{'page_label': '3', 'file_name': 'lauscher2020_fromzero_to_hero.pdf', 'file_path': 'data/pdf_documents/lauscher2020_fromzero_to_hero.pdf', 'file_type': 'application/pdf', 'file_size': 4938400, 'creation_date': '2024-08-24', 'last

In [11]:
print(output.message.content)

According to the context, one limitation of using mBERT for zero-shot cross-lingual transfer is that it can be hindered by the "curse of multilinguality", where adding more pretraining languages up to a point hurts downstream transfer. This effect can be mitigated by increasing model capacity or additional training for particular language pairs.

Additionally, when new scripts or languages are introduced, mBERT may not have been specifically trained on these languages, which can lead to decreased performance. The text mentions that mBERT is pre-trained on the 104 languages with the largest Wikipedias, implying that languages without a large Wikipedia presence may not be well-represented in the model.

There is also a mention of the "overt-fitting" issue when training models for zero-shot transfer, where the model overfits to the original language rather than generalizing to new languages. This can make it difficult to ensure that the probing model did not overfit to the original langua

### Fusion Retrieval with Semantic Chunking and Query Transformation

This code implements a Fusion Retrieval system that combines vector-based similarity search with keyword-based BM25 retrieval. The approach aims to leverage the strengths of both methods to improve the overall quality and relevance of document retrieval.
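One common way to merge ranked lists from a vector retriever and a keyword retriever is reciprocal rank fusion (RRF). The sketch below is a minimal, library-free version; the node ids are invented, and `k=60` is the constant commonly used with RRF rather than a value taken from this notebook.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda node_id: scores[node_id], reverse=True)


# Hypothetical results from the two retrievers, best hit first.
vector_hits = ["n1", "n2", "n3"]
bm25_hits = ["n3", "n1", "n4"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Nodes returned by both retrievers (`n1`, `n3`) rise above nodes returned by only one, which is the behaviour the fusion step relies on.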

In [12]:
from llama_index.retrievers.bm25 import BM25Retriever

doc_store_name = "semantic-document-nodes"
nodes_ = get_doc_store(doc_store_name)


vector_retriever = index.as_retriever(similarity_top_k=2)

bm25_retriever = BM25Retriever.from_defaults(
    docstore=nodes_, similarity_top_k=2
)
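For intuition about the keyword side of the fusion, here is a small sketch of BM25 scoring over whitespace-tokenized text. The documents, the parameters `k1=1.5` and `b=0.75`, and the tokenization are illustrative assumptions; `BM25Retriever` may tokenize and weight terms differently.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Made-up documents, for illustration only.
docs = [
    "mbert supports 104 languages",
    "qdrant is a vector store",
    "bm25 ranks documents by keyword overlap",
]
scores = bm25_scores("mbert languages", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```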

In [30]:
from llama_index.core.retrievers import QueryFusionRetriever

from typing import List
from llama_index.core.schema import QueryBundle
from types import MethodType 

def get_queries(self, original_query: str) -> List[QueryBundle]:
    """Generate extra queries with the LLM and add the retrieval instruction prefix to each."""
    instruction_tmpl = "Represent this sentence for searching relevant passages: {input}"
    prompt_str = self.query_gen_prompt.format(
        num_queries=self.num_queries - 1,
        query=original_query,
    )
    response = self._llm.complete(prompt_str)

    # Assume the LLM puts each query on its own line.
    queries = response.text.split("\n")
    queries = [instruction_tmpl.format(input=q.strip()) for q in queries if q.strip()]
    if self._verbose:
        queries_str = "\n".join(queries)
        print(f"Generated queries:\n{queries_str}")

    # The LLM often returns more queries than we asked for, so trim the list.
    return [QueryBundle(q) for q in queries[: self.num_queries - 1]]


retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=2,
    num_queries=4,  # set this to 1 to disable query generation
    mode="reciprocal_rerank",
    use_async=True,
    verbose=True,
    # query_gen_prompt="...",  # we could override the query generation prompt here
)

retriever._get_queries = MethodType(get_queries, retriever)
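Splitting the LLM response on newlines can let through a preamble line or list numbering, as the "Generated queries" output further down shows. The helper below is a hypothetical sketch of one way to clean such a response before building query bundles. `clean_generated_queries` is not part of LlamaIndex, and treating lines that end in a colon as preamble is a heuristic assumption. The second and third queries in `raw` are invented for the example.

```python
import re


def clean_generated_queries(text, max_queries):
    """Drop preamble lines, list markers, and surrounding quotes from LLM output."""
    queries = []
    for line in text.split("\n"):
        line = line.strip()
        # Skip blank lines and preamble such as "Here are three queries:".
        if not line or line.endswith(":"):
            continue
        # Remove leading numbering ("1.", "2)") or bullet markers ("-", "*").
        line = re.sub(r"^(?:\d+[.)]|[-*])\s*", "", line)
        line = line.strip().strip('"')
        if line:
            queries.append(line)
    return queries[:max_queries]


raw = (
    "Here are three search queries related to the input query:\n"
    '1. "mBERT limitations in zero-shot cross-lingual NLP models"\n'
    '2. "example second query"\n'
    "\n"
    '3. "example third query"\n'
)
cleaned = clean_generated_queries(raw, max_queries=3)
for query in cleaned:
    print(query)
```

With this cleaning step the preamble no longer takes up one of the `num_queries - 1` slots.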

In [31]:
# Allow nested event loops so the async retriever can run inside a notebook
import nest_asyncio

nest_asyncio.apply()

In [32]:

# Pass the query through unchanged; get_queries adds the instruction prefix to the generated queries
prompt_str1 = "{topic}"
prompt_tmpl1 = PromptTemplate(prompt_str1)


query_pipeline = QueryPipeline(
        verbose=True
    )

query_pipeline.add_modules(
    {
        "input":InputComponent(),
        "prompt_tmpl1":prompt_tmpl1,
        "retriever":retriever,
        "concat_component": concat_component,
        "prompt_tmpl2":prompt_tmpl2,
        "llm":Settings.llm
    }
)

query_pipeline.add_link("input","prompt_tmpl1")
query_pipeline.add_link("prompt_tmpl1","retriever")
query_pipeline.add_link("retriever","concat_component")
query_pipeline.add_link("input","prompt_tmpl2",dest_key="input")
query_pipeline.add_link("concat_component","prompt_tmpl2",dest_key="context_str")
query_pipeline.add_link("prompt_tmpl2","llm")

In [33]:


# response_1 = query_pipeline.run(topic=query)
output, intermediates = query_pipeline.run_with_intermediates(input=query)

[1;3;38;2;155;135;227m> Running module input with input: 
input: What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?

[0m[1;3;38;2;155;135;227m> Running module prompt_tmpl1 with input: 
topic: What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?

[0m[1;3;38;2;155;135;227m> Running module retriever with input: 
input: What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?

[0mGenerated queries:
Represent this sentence for searching relevant passages: Here are three search queries related to the input query:
Represent this sentence for searching relevant passages: 1. "mBERT limitations in zero-shot cross-lingual NLP models"
Represent this sentence for searching relevant passages: 2.

ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='Based on the provided context, there is no specific information that lists the limitations of using mBERT for zero-shot cross-lingual tasks. However, the paper mentions that mBERT does not outperform other models (specifically, DistilBERT) on certain tasks (XNLI and NER), which could be considered a limitation.\n\nRegarding new scripts or languages being introduced, there is no explicit information in the provided context to discuss how these might affect the limitations of using mBERT for zero-shot cross-lingual tasks.', additional_kwargs={'tool_calls': []}), raw={'model': 'llama3.1:latest', 'created_at': '2024-08-25T19:13:05.243769Z', 'message': {'role': 'assistant', 'content': 'Based on the provided context, there is no specific information that lists the limitations of using mBERT for zero-shot cross-lingual tasks. However, the paper mentions that mBERT does not outperform other models (specificall

In [36]:
print(output.message.content)


Based on the provided context, there is no specific information that lists the limitations of using mBERT for zero-shot cross-lingual tasks. However, the paper mentions that mBERT does not outperform other models (specifically, DistilBERT) on certain tasks (XNLI and NER), which could be considered a limitation.

Regarding new scripts or languages being introduced, there is no explicit information in the provided context to discuss how these might affect the limitations of using mBERT for zero-shot cross-lingual tasks.


In [37]:
intermediates

{'input': ComponentIntermediates(inputs={'input': 'What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?'}, outputs={'input': 'What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?'}),
 'prompt_tmpl1': ComponentIntermediates(inputs={'topic': 'What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?'}, outputs={'prompt': 'What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?'}),
 'retriever': ComponentIntermediates(inputs={'input': 'What are the limitations of using mBERT for zero-shot cross-lingual tasks, and how do these limitations vary when new scripts or languages are introduced?'}, outpu

### TODO: RAG with Hierarchical Indexing
### Contextual Compression
### Adaptive Retrieval
### Sophisticated Controllable Agent
### Recursive Retrieval
### Context Enrichment using QA, Summary
### DocumentSummaryIndex Retrieval