At its core, LlamaIndex is a toolkit designed to connect LLMs with your external data.

In [None]:
!pip install llama-index pypdf sentence_transformers llama-index-question-gen-openai -q

In [None]:
import os
import openai
from IPython.display import display
import textwrap
from llama_index.core import Settings, get_response_synthesizer
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext, PromptTemplate
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from google.colab import userdata


# Never hard-code an API key in the notebook; load it from Colab secrets instead.
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

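# Use the OpenAI API for the LLM and the embedding model
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")

# To use a locally deployed model through Ollama instead
# (requires the llama-index-llms-ollama package):
# from llama_index.llms.ollama import Ollama
# Settings.llm = Ollama(model="deepseek-r1", request_timeout=120.0)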



# Naive RAG

https://gpt-index.readthedocs.io/en/latest/guides/primer/index_guide.html

In [None]:


# Loading the data (you can add any folder name here)
documents = SimpleDirectoryReader('handbook').load_data()

# Index the documents and create the vector store
index = VectorStoreIndex.from_documents(documents, show_progress=True)

# Retrieve the most relevant nodes from the index
retriever = VectorIndexRetriever(index=index, similarity_top_k=5)

response_synthesizer = get_response_synthesizer(
    response_mode="simple_summarize",
)

# Combine retrieval and response synthesis into a query engine
simple_query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)


Parsing nodes:   0%|          | 0/336 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/336 [00:00<?, ?it/s]

In [None]:
response = simple_query_engine.query("What are the requirements for CS BS degree.")
print(textwrap.fill(response.response, width=80))

To obtain a Bachelor of Science in Computer Science (BS CS) degree, students
must complete a total of 132 credits, which includes the following components:
1. **Core Courses**: 58 credits (18 courses) 2. **Supporting Courses**: 22
credits (7 courses) 3. **Computing Electives**: 12 credits (4 courses) 4.
**General Education**: 47 credits 5. **University Electives**: 6 credits (2
courses)  Additionally, students must fulfill specific course requirements,
including a lab course in the Science and Mathematics category as part of
General Education.


In [None]:
response = simple_query_engine.query("List down all the requirements for Double Major, in bullets")
print(response)

- Students are allowed to pursue more than one major.
- The Higher Education Commission (HEC) recognizes only the first major listed on the transcript.
- If a student intends to graduate with multiple majors or a combination of a major and a minor, all requirements must be completed according to the same catalog.


# Cases where naive RAG can fail

In [None]:
response = simple_query_engine.query("How many courses are offered by FCCU")
print(textwrap.fill(response.response, width=80))

FCCU offers foreign language courses at two levels in four languages: French,
German, Chinese, and Korean. Additionally, there are various courses across
different departments as indicated in the catalog. However, the exact total
number of courses offered is not specified in the provided information.


In [None]:
response = simple_query_engine.query("What is this text about?")
print(textwrap.fill(response.response, width=80))

The text provides an overview of various courses offered in a Bachelor’s Degree
Program, focusing on literature, drama, poetry, linguistics, and the humanities.
It outlines course descriptions, prerequisites, and the educational philosophy
of the institution, emphasizing the importance of critical thinking,
communication skills, and a commitment to lifelong learning. Additionally, it
discusses the expectations of conduct for students and the institution's
dedication to providing a co-educational environment and an American-style
education.
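Both queries above need information spread across the whole catalog, but the retriever hands only `similarity_top_k` chunks to the synthesizer. The following is a minimal sketch of that limitation, not LlamaIndex code: `top_k_chunks` is a hypothetical helper that ranks by shared words instead of embeddings, and the chunk texts are made up for illustration.

```python
import re

def words(text):
    """Lowercase word set used as a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z]+", text.lower()))

def top_k_chunks(query, chunks, k):
    """Rank chunks by the number of words shared with the query; keep the best k."""
    ranked = sorted(chunks, key=lambda c: len(words(query) & words(c)), reverse=True)
    return ranked[:k]

# Made-up chunks standing in for catalog pages (assumption, not real data).
chunks = [
    "Courses offered in French and German",
    "Courses offered in Chinese and Korean",
    "Core courses for the computer science major",
    "Fee structure and financial aid",
    "Dress code and student conduct",
]

retrieved = top_k_chunks("How many courses are offered", chunks, k=2)
for chunk in retrieved:
    print(chunk)
```

Only two chunks reach the answer step here, so the third course-related chunk is never seen. Any count or summary built from the retrieved set reflects those few chunks rather than the whole document.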


# Response Object

In [None]:
response.metadata

{'121be295-f238-47e3-aa05-c3ea2a576f56': {'page_label': '144',
  'file_name': 'Bachelors-Catalog-2025-26-New.pdf',
  'file_path': '/content/handbook/Bachelors-Catalog-2025-26-New.pdf',
  'file_type': 'application/pdf',
  'file_size': 4205922,
  'creation_date': '2025-11-28',
  'last_modified_date': '2025-11-28'},
 'cc36068a-f145-4052-8cde-38a7df29e493': {'page_label': '145',
  'file_name': 'Bachelors-Catalog-2025-26-New.pdf',
  'file_path': '/content/handbook/Bachelors-Catalog-2025-26-New.pdf',
  'file_type': 'application/pdf',
  'file_size': 4205922,
  'creation_date': '2025-11-28',
  'last_modified_date': '2025-11-28'},
 '8342bb90-2fad-481b-beb3-be569d99800e': {'page_label': '154',
  'file_name': 'Bachelors-Catalog-2025-26-New.pdf',
  'file_path': '/content/handbook/Bachelors-Catalog-2025-26-New.pdf',
  'file_type': 'application/pdf',
  'file_size': 4205922,
  'creation_date': '2025-11-28',
  'last_modified_date': '2025-11-28'},
 '668ab9bf-b017-41a2-9109-a94af774a216': {'page_label':

In [None]:
response.source_nodes

[NodeWithScore(node=TextNode(id_='121be295-f238-47e3-aa05-c3ea2a576f56', embedding=None, metadata={'page_label': '144', 'file_name': 'Bachelors-Catalog-2025-26-New.pdf', 'file_path': '/content/handbook/Bachelors-Catalog-2025-26-New.pdf', 'file_type': 'application/pdf', 'file_size': 4205922, 'creation_date': '2025-11-28', 'last_modified_date': '2025-11-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='bec1f71a-b64f-48e8-ad38-dd89aa20ff53', node_type='4', metadata={'page_label': '144', 'file_name': 'Bachelors-Catalog-2025-26-New.pdf', 'file_path': '/content/handbook/Bachelors-Catalog-2025-26-New.pdf', 'file_type': 'application/pdf', 'file_size': 4205922, 'creation_date': '2025-11-28', 'last_modified_date'
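
The metadata dictionary above maps each retrieved node ID to its file and page information. Below is a minimal sketch of turning that mapping into a compact source list, assuming each entry carries the `file_name` and `page_label` keys shown above. `summarize_sources` is a hypothetical helper, not part of LlamaIndex, and the sample dictionary is abbreviated from the output above.

```python
from collections import defaultdict

def summarize_sources(metadata):
    """Group page labels by file name from a response.metadata-style dict."""
    pages_by_file = defaultdict(set)
    for node_id, info in metadata.items():
        # Skip entries that are not per-node dicts with a file name.
        if not isinstance(info, dict) or "file_name" not in info:
            continue
        pages_by_file[info["file_name"]].add(info.get("page_label", "?"))
    # Sort shorter labels first so that "9" comes before "10".
    return {
        name: sorted(pages, key=lambda p: (len(p), p))
        for name, pages in pages_by_file.items()
    }

# Abbreviated copy of the metadata printed above.
sample_metadata = {
    "121be295-f238-47e3-aa05-c3ea2a576f56": {
        "page_label": "144",
        "file_name": "Bachelors-Catalog-2025-26-New.pdf",
    },
    "cc36068a-f145-4052-8cde-38a7df29e493": {
        "page_label": "145",
        "file_name": "Bachelors-Catalog-2025-26-New.pdf",
    },
    "8342bb90-2fad-481b-beb3-be569d99800e": {
        "page_label": "154",
        "file_name": "Bachelors-Catalog-2025-26-New.pdf",
    },
}

print(summarize_sources(sample_metadata))
```

In the notebook, the same helper could be called as `summarize_sources(response.metadata)` to show which pages an answer was drawn from.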

# Advanced RAG

## SubQuestion Query Engine

In [None]:
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
import nest_asyncio

nest_asyncio.apply()

In [None]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=simple_query_engine,
        metadata=ToolMetadata(
            name="handbook",
            description="FCCU Student Handbook",
        ),
    ),
]

In [None]:
sub_question_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)


In [None]:
response = sub_question_query_engine.query("What is this text about")
print(textwrap.fill(response.response, width=80))

Generated 5 sub questions.
[handbook] Q: What topics are covered in the FCCU Student Handbook?
[handbook] Q: What are the key policies outlined in the FCCU Student Handbook?
[handbook] Q: What resources are available to students according to the FCCU Student Handbook?
[handbook] Q: What are the academic requirements mentioned in the FCCU Student Handbook?
[handbook] Q: What support services are detailed in the FCCU Student Handbook?
[handbook] A: The FCCU Student Handbook covers a variety of topics including the dress code, sexual harassment policy, commitment to core values, faculty commitment to students, general education, equality of opportunity, and financial integrity. Additionally, it includes sections on student success, student life, fee structure, financial aid and merit scholarships, academic policies and procedures, academi

# Routing

In [None]:

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector, LLMMultiSelector
from llama_index.core.selectors import (
    PydanticMultiSelector,
    PydanticSingleSelector,
)

In [None]:
# Load the WHO documents
who_documents = SimpleDirectoryReader('who').load_data()

# Index the documents and create the vector store
who_index = VectorStoreIndex.from_documents(who_documents, show_progress=True)

who_query_engine = who_index.as_query_engine()

Parsing nodes:   0%|          | 0/114 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/114 [00:00<?, ?it/s]

In [None]:
from llama_index.core.tools import QueryEngineTool

lums_tool = QueryEngineTool.from_defaults(
    query_engine=simple_query_engine,
    description=(
        "Provide information on rules and policy of lums"
    ),
)

who_tool = QueryEngineTool.from_defaults(
    query_engine=who_query_engine,
    description=(
        "Provides information on self-care instructions provided by WHO"
    ),
)

In [None]:
router_query_engine = RouterQueryEngine(
    selector=PydanticSingleSelector.from_defaults(),
    query_engine_tools=[
        lums_tool,
        who_tool,
    ],
)


In [None]:
response = router_query_engine.query("who is the VC of lums")
print(response.metadata["selector_result"])
print(textwrap.fill(response.response, width=80))

selections=[SingleSelection(index=0, reason='The question pertains to the rules and policy of LUMS, which may include information about the Vice Chancellor.')]
The Vice Chancellor of LUMS is Ali Cheema.


In [None]:
response = router_query_engine.query("importance of mantainging good health")
print(response.metadata["selector_result"])
print(textwrap.fill(response.response, width=80))

selections=[SingleSelection(index=1, reason='The question pertains to maintaining good health, and the self-care instructions provided by WHO are directly related to health maintenance.')]
Maintaining good health is crucial as it significantly impacts overall well-
being and quality of life. Good health enables individuals to perform daily
activities effectively, reduces the risk of chronic diseases, and enhances
mental and emotional stability. It also contributes to increased productivity
and longevity, allowing individuals to engage more fully in their communities
and relationships. Prioritizing health through proper nutrition, regular
exercise, and preventive care can lead to a healthier population and reduce
healthcare costs in the long run.
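
The router above picks one tool by comparing the query against each tool's description. The selection step can be sketched without an LLM as follows. This is a keyword-overlap stand-in for the LLM-based selector, not how `PydanticSingleSelector` works internally. `pick_tool` and the two descriptions are hypothetical and exist only for illustration.

```python
import re

def pick_tool(query, descriptions):
    """Return the index of the description sharing the most words with the query."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    scores = [
        len(query_words & set(re.findall(r"[a-z]+", description.lower())))
        for description in descriptions
    ]
    # On ties, max() keeps the first (lowest) index.
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical tool descriptions, one per query engine.
descriptions = [
    "Provides information on university rules and policies",
    "Provides information on self-care and health instructions",
]

print(pick_tool("importance of maintaining good health", descriptions))
print(pick_tool("rules about attendance policies", descriptions))
```

The sketch makes one design point visible: routing quality depends on how well each description separates the tools, so descriptions should name the document and its subject precisely.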


# Query Rewriting

In [None]:
from llama_index.core import PromptTemplate

query_gen_str = """\
You are a helpful assistant that generates multiple search queries based on a \
single input query. Generate {num_queries} search queries, one on each line, \
related to the following input query:
Query: {query}
Queries:
"""
query_gen_prompt = PromptTemplate(query_gen_str)

def generate_queries(query: str, llm, num_queries: int = 4):
    response = llm.predict(
        query_gen_prompt, num_queries=num_queries, query=query
    )
    # Assume the LLM put each query on its own line
    queries = response.split("\n")
    return queries


queries = generate_queries("What happened at Interleaf and Viaweb?", Settings.llm)
queries

['1. History and events surrounding Interleaf and Viaweb companies  ',
 "2. Key milestones and outcomes of Interleaf and Viaweb's operations  ",
 '3. The rise and fall of Interleaf and Viaweb in the tech industry  ',
 '4. Impact of Interleaf and Viaweb on software development and e-commerce']
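
The generated queries above still carry list numbering and trailing spaces, which would be embedded along with the query text. Below is a minimal sketch of a cleanup step, assuming the LLM returns one query per line with an optional `1.`, `2)` or `-` prefix. `clean_queries` is a hypothetical helper, not part of LlamaIndex.

```python
import re

def clean_queries(raw_lines):
    """Strip list markers and surrounding whitespace, and drop empty lines."""
    cleaned = []
    for line in raw_lines:
        # Remove a leading "1. ", "2) ", "- " or "* " marker, then trim whitespace.
        text = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if text:
            cleaned.append(text)
    return cleaned

# Two of the queries printed above, plus an empty line as an LLM might emit.
raw = [
    "1. History and events surrounding Interleaf and Viaweb companies  ",
    "",
    "2. Key milestones and outcomes of Interleaf and Viaweb's operations  ",
]

print(clean_queries(raw))
```

In the notebook this could be applied as `clean_queries(generate_queries(...))` before passing each query to a retriever.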

# HyDE Query Transform

In [None]:
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(simple_query_engine, hyde)

query_str = "What are profits of ABC TECH"

# response = hyde_query_engine.query(query_str)
# print(textwrap.fill(response.response, width=80))

In [None]:
query_bundle = hyde(query_str)
print(textwrap.fill(query_bundle.custom_embedding_strs[0], width=80))

ABC TECH has experienced significant profits over the past fiscal year, driven
by several key factors that highlight its robust business model and strategic
initiatives. The company reported a net profit margin of 25%, which is notably
higher than the industry average of 15%. This impressive margin can be
attributed to a combination of innovative product offerings, efficient cost
management, and a strong market presence.  One of the primary contributors to
ABC TECH's profitability is its diverse portfolio of technology solutions,
including software development, cloud services, and cybersecurity products. The
demand for these services has surged, particularly in the wake of increased
digital transformation efforts across various sectors. ABC TECH's flagship
product, a cloud-based platform, has seen a 40% increase in subscriptions,
significantly boosting recurring revenue streams.  Additionally, ABC TECH has
implemented rigorous cost-control measures, optimizing operational efficiencies
