In [None]:
!pip install -q -U llama-index llama-index-llms-groq llama-index-vector-stores-chroma llama-index-embeddings-huggingface


In [None]:
from google.colab import userdata
GROQ_API_KEY = userdata.get('GROQ_API_KEY')

### Groq API + LlamaIndex

In [None]:
from llama_index.core.llms import ChatMessage
from llama_index.llms.groq import Groq

llm = Groq(model="llama3-70b-8192", api_key=GROQ_API_KEY)

A list of the available LLM models can be found here:

https://console.groq.com/docs/models

In [None]:
messages = [
    ChatMessage(
        role="system",
        # Adjacent string literals are joined, so no stray indentation
        # ends up inside the prompt.
        content=(
            "You will be provided with text, and your task is "
            "to translate it into emojis. Do not use any regular text. "
            "Do your best with emojis only."
        ),
    ),
    ChatMessage(role="user", content="Learn AI at manipal"),
]
resp = llm.chat(messages)

print(resp.message.content)

TypeError: Client.__init__() got an unexpected keyword argument 'proxies'

In [None]:
print(llm.complete("What are the courses at Manipal School of Information Science"))

TypeError: Client.__init__() got an unexpected keyword argument 'proxies'

### RAG with LlamaIndex

In [None]:
!pip install -q -U llama-index-readers-smart-pdf-loader

#### **Introduction**
LlamaIndex is a library that provides all the components needed to perform **Retrieval-Augmented Generation (RAG)**, a technique that augments language model responses with relevant context from external data sources. Here’s a breakdown of key components and how they work together in LlamaIndex:

1. **Readers**  
Readers are modules that connect LlamaIndex to various data sources. They are responsible for reading documents from these sources, whether they're databases, websites, local files, or APIs. Readers format the data into structures that LlamaIndex can process.

2. **Documents**
Documents are the raw data chunks or text that the Reader ingests from sources. A document can be a single entity, like a page of text, an article, or any block of information that you want to be available to the model. Documents are divided into **nodes** for easier processing.

3. **Nodes**
Nodes are smaller units of information within documents. By breaking down documents into nodes, LlamaIndex can handle large datasets more effectively, processing information at a manageable granularity. Each node might represent a paragraph, a sentence, or a specific section. Nodes enable finer control over which parts of the document are retrieved based on the query.

4. **Vector Index**
The vector index is where LlamaIndex stores nodes in a way that enables fast, relevant retrieval. Each node is transformed into a **vector**—a numerical representation of the content, using embeddings from a language model. The vector index then organizes these embeddings for similarity search, so the system can efficiently retrieve nodes most relevant to a given query.

5. **Retrieval**
Retrieval is the process of searching the vector index for nodes that are most relevant to a user’s query. The retrieval process identifies nodes with vectors that closely match the query's vector representation. This enables LlamaIndex to pull out specific, relevant information from potentially large datasets, narrowing down the data needed to answer the query.

6. **Response Synthesis**
Once relevant nodes are retrieved, response synthesis involves combining this information with the language model to generate a coherent, contextually informed response. LlamaIndex processes the retrieved nodes and feeds them to the language model, which uses the information to construct an answer that directly addresses the query.
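To make the Vector Index and Retrieval steps more concrete, here is a minimal pure-Python sketch of similarity search. It assumes cosine similarity as the metric and uses made-up 3-dimensional vectors; the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical and not part of LlamaIndex. The actual metric and data structure depend on the vector store you use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k(query_vec, node_vecs, k=2):
    """Return (node_id, score) pairs for the k most similar nodes."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embedding models produce
# vectors with hundreds of dimensions.
toy_index = {
    "node_a": [1.0, 0.0, 0.0],
    "node_b": [0.7, 0.7, 0.0],
    "node_c": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.1, 0.0], toy_index, k=2)
for node_id, score in results:
    print(node_id, round(score, 3))
```

The query vector points mostly in the same direction as `node_a`, so that node ranks first, followed by `node_b`; `node_c` is orthogonal to the query and is left out.
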
____________
#### **Putting It All Together**

In the context of **RAG (Retrieval-Augmented Generation)**, LlamaIndex works as follows:
- A **Reader** collects information from various data sources and structures it into **Documents**.
- The Documents are broken into **Nodes** for easy processing, which are then embedded and stored in the **Vector Index**.
- When a query is made, LlamaIndex performs **Retrieval** to find nodes with content relevant to the query.
- Finally, **Response Synthesis** uses these retrieved nodes to generate a response, blending the model’s generative power with the specific information from the dataset.

ref: [Building RAG from Scratch](https://docs.llamaindex.ai/en/stable/examples/low_level/oss_ingestion_retrieval/)

#### Load data from PDF
We use `PDFReader` from `llama_index.readers.file` to read the text from a PDF file.

You can check out other readers [here](https://docs.llamaindex.ai/en/stable/api_reference/readers/).

Other useful readers cover the web, GitHub, YouTube, and Google Docs.

In [None]:
from llama_index.readers.file import PDFReader

# pdf_url = "/content/2024 Manipal Prospectus.pdf"
pdf_url = "/content/note1.pdf"
pdf_reader_obj = PDFReader(return_full_document=True)
documents = pdf_reader_obj.load_data(pdf_url)

In [None]:
print(f"{len(documents) = }\n")
for doc in documents[:3]:
  print(doc.metadata)

len(documents) = 1

{'file_name': 'note1.pdf'}


In [None]:
# concatenate the text from pages (documents) into a single string
full_text = ""
for doc in documents:
  full_text += doc.text + "\n"

print(full_text[:500])

Lecture 1: Stationary Time Series∗
1 Introduction
If a random variable Xis indexed to time, usually denoted by t, the observations {Xt, t∈T}is
called a time series, where Tis a time index set (for example, T=Z, the integer set).
Time series data are very common in empirical economic studies. Figure 1 plots some frequently
used variables. The upper left ﬁgure plots the quarterly GDP from 1947 to 2001; the upper right
ﬁgure plots the the residuals after linear-detrending the logarithm of GDP; the 


In [None]:
# index2 = VectorStoreIndex.from_documents(documents, embed_model=embed_model,
# chunk_size=128,
#     chunk_overlap=12)

# query_engine = index2.as_query_engine(llm=llm)
# query_engine.query("time series")

Response(response='A time series is a collection of observations {Xt, t∈T} where Xt is a random variable indexed to time, usually denoted by t, and T is a time index set.', source_nodes=[NodeWithScore(node=TextNode(id_='430ba336-2030-4be2-9381-8d4c41a17416', embedding=None, metadata={'file_name': 'note1.pdf'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='e2eca413-c717-4743-890b-b7020fc14f92', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_name': 'note1.pdf'}, hash='887d298ca117e2d7a0afa7b7242cafc1b4dbfa15d8e9c02eaf38ac80c65d79f6'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='9c6b1bd4-a1dc-4ea1-aa97-f576fc654899', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='5e23d34c350a0d4cd160eaa442ee7f78f1b7bdf46b353b2bf87585a01aeaab26')}, text='Lecture 1: Stationary Time Series∗\n1 Introduction\nIf a random variable Xis indexed to time, usually denoted by t, the observations {Xt, t∈T}

#### Split text into Chunks

In [None]:
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core.schema import TextNode

text_parser = TokenTextSplitter(
    chunk_size=1024,
    chunk_overlap=128
)

chunks = text_parser.split_text(text=full_text)

len(chunks)

ModuleNotFoundError: No module named 'llama_index'
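To see how `chunk_size` and `chunk_overlap` interact, here is a simplified sketch of overlapping chunking. It assumes whitespace-separated words stand in for tokens, and `split_with_overlap` is a hypothetical helper written for illustration. `TokenTextSplitter` uses a real tokenizer and prefers to break on separators, so its chunk boundaries will differ.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into chunks that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the text
    return chunks


# Words stand in for tokens here (an assumption made for readability).
tokens = "a b c d e f g h i j".split()
toy_chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in toy_chunks:
    print(chunk)
```

Each chunk starts with the last token of the previous one. The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.
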

In [None]:
# convert chunks into llama nodes
nodes = []
for chunk_text in chunks:
  node = TextNode(text=chunk_text)
  nodes.append(node)

# nodes[0]

#### Store in Vector db

We use [chromadb](https://docs.llamaindex.ai/en/stable/examples/vector_stores/ChromaIndexDemo/) here, but there are many other vector databases that you can explore.


In [None]:
# load the embedding model from Hugging Face
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")


In [None]:
from tqdm import tqdm

# Create embeddings for the chunks
for node in tqdm(nodes):
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding

100%|██████████| 371/371 [11:42<00:00,  1.89s/it]


In [None]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext, VectorStoreIndex

# Create a collection called "manipal_docs" in chromadb where our chunks
# can be stored
db = chromadb.EphemeralClient()
chroma_collection = db.get_or_create_collection("manipal_docs")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex(
    nodes=nodes, storage_context=storage_context, embed_model=embed_model
)

ModuleNotFoundError: No module named 'chromadb'

#### Querying

Now that you've loaded your data and built an index, you're ready for the most significant part of an LLM application: querying.

https://docs.llamaindex.ai/en/stable/understanding/querying/querying/

Querying involves two stages:
- Retrieval
- Response synthesis

#### **Retriever**
Retrieval is the process of extracting the chunks most similar to the query from the vector index.

#### **Response Synthesizer**
A Response Synthesizer is what generates a response from an LLM, using a user query and a given set of text chunks (retrieved chunks). The output of a response synthesizer is a Response object.

https://docs.llamaindex.ai/en/stable/module_guides/querying/response_synthesizers/

##### Retriever

In [None]:
from llama_index.core.retrievers import VectorIndexRetriever

# Create a retriever object

retriever = index.as_retriever(similarity_top_k=3)

# OR

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=3,
)

top_chunks = retriever.retrieve("What are the courses offered at MSIS?")

print(len(top_chunks))

print(top_chunks[0].score)
print(top_chunks[1].score)
print(top_chunks[2].score)

3
0.5175268917617181
0.4784979544547596
0.4765330732143537


##### Response synthesizer

In [None]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core import get_response_synthesizer

from llama_index.core import PromptTemplate

# Create prompt
template = (
        "Context information is below.\n"
        "---------------------\n"
        "{context_str}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, "
        "answer the query.\n"
        "Query: {query_str}\n"
        "Answer: "
)
qa_template = PromptTemplate(template)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(llm=llm, text_qa_template=qa_template)
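To illustrate what the synthesizer does with the template in the cell above, here is a rough sketch of the simplest case: the retrieved chunks are joined into `{context_str}`, the user query fills `{query_str}`, and the resulting string is sent to the LLM in a single call. `build_prompt` is a hypothetical helper and the chunk texts are made up. The real synthesizer may also split the context across several LLM calls when it does not fit in the context window.

```python
# Same wording as the QA template used in this notebook.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_prompt(query, retrieved_chunks):
    """Fill the template with the retrieved chunks and the user query."""
    context = "\n\n".join(retrieved_chunks)
    return QA_TEMPLATE.format(context_str=context, query_str=query)


# Made-up chunks standing in for the retriever's output.
example_prompt = build_prompt(
    "What is a time series?",
    ["Chunk one text.", "Chunk two text."],
)
print(example_prompt)
```

Printing the filled prompt like this is a handy way to check what the LLM actually sees when an answer looks wrong.
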

##### Query engine

Retrieval + response synthesis

In [None]:
# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.3)]
)

# query
response = query_engine.query("What are the courses offered at MSIS?")
print(response)

According to the prospectus, the courses offered at MSIS (Manipal School of Information Sciences) are:

1. Master of Engineering (M.E) in:
	* Artificial Intelligence and Machine Learning
	* Big Data Analytics
	* Cloud Computing
	* Cyber Security
	* Embedded Systems
	* Microelectronics and VLSI Technology
	* VLSI Design
2. Research Program:
	* Doctor of Philosophy (PhD)

Additionally, MSIS also offers industry-driven elective courses that facilitate industry certifications from Microsoft, Infineon, and Intel.


In [None]:
# check the similarity scores of source nodes that were passed as context
for node in response.source_nodes:
  print(node.score)

0.5460719425383277
0.5071821032476492
0.4638452332629387
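
All three scores are above the `similarity_cutoff=0.3` passed to `SimilarityPostprocessor`, so every retrieved node reached the LLM. The filtering idea can be sketched in plain Python as below; `apply_similarity_cutoff` is a hypothetical helper, the chunk labels are placeholders, and the scores are the ones printed above, rounded.

```python
def apply_similarity_cutoff(scored_nodes, cutoff):
    """Keep only (text, score) pairs whose score is at least `cutoff`."""
    return [(text, score) for text, score in scored_nodes if score >= cutoff]


# Placeholder chunk labels paired with the rounded scores from the output above.
retrieved = [("chunk 1", 0.546), ("chunk 2", 0.507), ("chunk 3", 0.464)]

kept_loose = apply_similarity_cutoff(retrieved, cutoff=0.3)
kept_strict = apply_similarity_cutoff(retrieved, cutoff=0.5)

print(len(kept_loose))   # cutoff used in this notebook
print(len(kept_strict))  # a stricter cutoff drops the weakest match
```

A higher cutoff trims weakly related context from the prompt, but if it is set too high no nodes survive and the LLM has nothing to answer from.
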


In [None]:
# print(response.source_nodes[2].get_content())

Further reading: https://docs.llamaindex.ai/en/stable/examples/query_engine/custom_query_engine/