# Imports

In [1]:
!pip install llama-index
!pip install llama-index-llms-huggingface
!pip install llama-index-embeddings-huggingface
!pip install --upgrade huggingface_hub

!pip install accelerate
!pip install bitsandbytes
!pip install transformers
!pip install peft
!pip install einops
!pip install safetensors
!pip install torch



In [2]:
import transformers
import torch
import warnings
warnings.filterwarnings('ignore')
import nest_asyncio
import numpy as np
import pandas as pd

from transformers import BitsAndBytesConfig, AutoTokenizer

from llama_index.core import SimpleDirectoryReader, Document, Settings, VectorStoreIndex, ServiceContext
from llama_index.core.prompts import PromptTemplate
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM

# Load Dataset (Knowledge Base)

In [3]:
documents = SimpleDirectoryReader(
    # input_files=["eBook-How-to-Build-a-Career-in-AI.pdf"]
    input_files=["hg-interview.pdf"]
).load_data()

In [4]:
print("Type: ", type(documents), "\n")
print("Number of documents: ", len(documents), "\n")
print("Type of document: ", type(documents[0]))
print("Document example: \n", documents[10].text)

Type:  <class 'list'> 

Number of documents:  25 

Type of document:  <class 'llama_index.core.schema.Document'>
Document example: 
 “Twirl for me,” he says. I hold out my arms and spin in a 
circle. The prep team screams in admiration. 
Cinna dismisses the team and has me move around in 
the dress and shoes, which are infinitely more manageable than Effie’s. The dress hangs in such a way that I don’t have to lift the skirt when I walk, leaving me with one less thing to worry about. 
“So, all ready for the interview then?” asks Cinna. I can 
see by his expression that he’s been talking to Haymitch. That he knows how dreadful I am. 
“I’m awful. Haymitch called me a dead slug. No matter 
what we tried, I couldn’t do it. I just can’t be one of those people he wants me to be,” I say. 
Cinna thinks about this a moment. “Why don’t you just 
be yourself?” 
“Myself? That’s no good, either. Haymitch says I’m sullen 
and hostile,” I say. 
“Well, you are . . . around Haymitch,” says Cinna with a 

In [5]:
document = Document(text="\n\n".join([doc.text for doc in documents]))

# Basic RAG pipeline

## Load the LLM

In [6]:
# LLM Configs
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
llm_name = "mistralai/Mistral-7B-Instruct-v0.1"
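# Mistral-Instruct expects "<s>[INST] ... [/INST]"; the closing </s> belongs after the
# model's answer, so it is left out of the prompt.
prompt_template = PromptTemplate("<s>[INST] {query_str} [/INST]")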
llm_configs = {
    "do_sample": True,
    "temperature": 0.1,
    "top_k": 5,
    "top_p": 0.95
}

# Set the LLM
llm = HuggingFaceLLM(
    model_name=llm_name,
    tokenizer_name=llm_name,
    query_wrapper_prompt=prompt_template,
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    generate_kwargs=llm_configs,
    device_map="auto",
)
Settings.llm = llm
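
Rough arithmetic shows why the model is loaded in 4-bit. The sketch below uses the two safetensors shard sizes reported when the checkpoint downloads (9.94 GB and 4.54 GB) and assumes the weights are stored in 16-bit precision, which the notebook does not state. It ignores quantization constants, activations, and the KV cache, so the figures are estimates of weight memory only, not measurements.

```python
# Back-of-the-envelope weight-memory estimate (an approximation, not a measurement).
SHARD_SIZES_GB = [9.94, 4.54]   # shard sizes reported during the model download
BYTES_PER_PARAM_16BIT = 2       # assumption: checkpoint weights are stored in 16-bit


def estimate_params(shard_sizes_gb, bytes_per_param):
    """Approximate parameter count from the total checkpoint size."""
    total_bytes = sum(shard_sizes_gb) * 1e9
    return total_bytes / bytes_per_param


def weights_gb(n_params, bits_per_param):
    """Approximate weight memory in GB for a given bit width."""
    return n_params * bits_per_param / 8 / 1e9


n_params = estimate_params(SHARD_SIZES_GB, BYTES_PER_PARAM_16BIT)
print(f"~{n_params / 1e9:.2f}B parameters")
print(f"16-bit weights: ~{weights_gb(n_params, 16):.2f} GB")
print(f"4-bit weights:  ~{weights_gb(n_params, 4):.2f} GB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the 16-bit footprint, which is what makes a 7B model practical on a single consumer or free-tier GPU.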

## Load the RAG Embedding Model

In [7]:
# set the embed model
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

## Build the Index

In [8]:
index = VectorStoreIndex.from_documents([document])

# Use index as a query engine
query_engine = index.as_query_engine()

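# Query the index. Note: this question does not match the loaded document
# (hg-interview.pdf), so the answer below falls back on generic advice.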
response = query_engine.query(
    "What are steps to take when finding projects to build your experience?"
)
print(str(response))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



To answer the query, it is important to first understand the context information provided. The story is about a young girl named Katniss Everdeen who is participating in the Hunger Games, a televised event where 24 children from the twelve districts of Panem are selected to fight to the death in an arena. The story follows Katniss as she navigates the challenges of the Hunger Games and tries to make a good impression on the audience in order to gain sponsors and increase her chances of survival.

In terms of finding projects to build experience, it is important to consider your interests, skills, and goals. Some steps to take when finding projects to build your experience could include:

1. Identify your interests and skills: Consider what you are passionate about and what skills you have to offer. This will help you identify potential projects that align with your interests and skills.
2. Research opportunities: Look for opportunities that align with your interests and skills. This c

In [9]:
response = query_engine.query(
    "What were Peeta and Caesar joking about during the interview?"
)
print(str(response))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



During Peeta's interview, Caesar and Peeta were joking about Peeta's love for bread and how he would compare the tributes to the breads from their districts. They also joked about Peeta's unrequited love for a girl back home and how winning the Hunger Games would help him in his pursuit of her.
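
To make the retrieval step behind `index.as_query_engine()` less of a black box, here is a minimal sketch of top-k similarity search. It is not LlamaIndex's implementation: the bag-of-words `embed` function is a toy stand-in for the neural embedding model, and the example chunks and function names are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Score every chunk against the question and keep the top_k best matches."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


toy_chunks = [
    "the cat sat on the mat",
    "dogs chase the ball in the park",
    "the stock market fell today",
]
top = retrieve("where did the cat sit", toy_chunks, top_k=1)
print(top)
```

In the real pipeline the retrieved chunks are then pasted into the LLM prompt as context, which is why an off-topic question still gets an answer: the model simply has little relevant context to draw on.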


# Sentence-Window RAG pipeline

## Build the Node Parser

In [10]:
# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

In [11]:
text = "Hello. How are you? I am fine!"

nodes = node_parser.get_nodes_from_documents([Document(text=text)])
print([x.text for x in nodes])

['Hello. ', 'How are you? ', 'I am fine!']


In [12]:
print(nodes[1].metadata["window"])

Hello.  How are you?  I am fine!


In [13]:
text = "hello. foo bar. cat dog. mouse"

nodes = node_parser.get_nodes_from_documents([Document(text=text)])
print([x.text for x in nodes])

['hello. ', 'foo bar. ', 'cat dog. ', 'mouse']


In [14]:
print(nodes[0].metadata["window"])

hello.  foo bar.  cat dog.  mouse
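
The window logic can be sketched in a few lines of plain Python. This is an illustration of the idea, not the library's code: `build_windows` is a hypothetical helper that joins each sentence with up to `window_size` neighbours on either side, and it assumes the sentences have already been split as in the output above.

```python
def build_windows(sentences, window_size=3):
    """For each sentence, join it with up to window_size neighbours on each side."""
    windows = []
    for i in range(len(sentences)):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        windows.append(" ".join(sentences[start:end]))
    return windows


# Sentences as split by the parser above (note the trailing spaces).
toy_sentences = ["hello. ", "foo bar. ", "cat dog. ", "mouse"]

wide = build_windows(toy_sentences, window_size=3)
narrow = build_windows(toy_sentences, window_size=1)

print(repr(wide[0]))    # window for "hello. " with window_size=3
print(repr(narrow[0]))  # window for "hello. " with window_size=1
```

With `window_size=3` the text is short enough that every window covers all four sentences, which matches the window printed above. A smaller window shows the effect more clearly: the first sentence only picks up its right-hand neighbour.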


## Build the Index

In [15]:
from llama_index.core import ServiceContext

sentence_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    node_parser=node_parser,
)

In [16]:
import os
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage

if not os.path.exists("./sentence_index_3"):
    sentence_index = VectorStoreIndex.from_documents(
        [document], service_context=sentence_context
    )

    sentence_index.storage_context.persist(persist_dir="./sentence_index_3")
else:
    sentence_index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir="./sentence_index_3"),
        service_context=sentence_context
    )

## Build the Postprocessor

In [17]:
from llama_index.core.indices.postprocessor import MetadataReplacementPostProcessor

postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)

In [18]:
from llama_index.core.schema import NodeWithScore
from copy import deepcopy

scored_nodes = [NodeWithScore(node=x, score=1.0) for x in nodes]
nodes_old = [deepcopy(n) for n in nodes]
nodes_old[1].text

'foo bar. '

In [19]:
replaced_nodes = postproc.postprocess_nodes(scored_nodes)
print(replaced_nodes[1].text)

hello.  foo bar.  cat dog.  mouse
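
Conceptually, the postprocessor swaps each node's text for the value stored under a metadata key. The sketch below mimics that behaviour with a hypothetical `ToyNode` dataclass and `replace_with_metadata` helper; falling back to the original text when the key is missing is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a retrieved node: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def replace_with_metadata(toy_nodes, target_key):
    """Return copies whose text is replaced by metadata[target_key] when present."""
    return [
        ToyNode(text=n.metadata.get(target_key, n.text), metadata=dict(n.metadata))
        for n in toy_nodes
    ]


nodes_demo = [
    ToyNode("foo bar. ", {"window": "hello.  foo bar.  cat dog.  mouse"}),
    ToyNode("no window"),  # no "window" key, so the text is kept as-is
]
replaced_demo = replace_with_metadata(nodes_demo, "window")
print([n.text for n in replaced_demo])
```

The point of the design is that the small sentence is what gets embedded and matched, while the larger window is what the LLM eventually reads.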


## Add a Reranker

In [20]:
from llama_index.core.indices.postprocessor import SentenceTransformerRerank

# link: https://huggingface.co/BAAI/bge-reranker-base
rerank = SentenceTransformerRerank(
    top_n=2, model="BAAI/bge-reranker-base"
)

In [21]:
from llama_index.core import QueryBundle
from llama_index.core.schema import TextNode, NodeWithScore

query = QueryBundle("I want a dog.")

scored_nodes = [
    NodeWithScore(node=TextNode(text="This is a cat"), score=0.6),
    NodeWithScore(node=TextNode(text="This is a dog"), score=0.4),
]

In [22]:
reranked_nodes = rerank.postprocess_nodes(
    scored_nodes, query_bundle=query
)

print([(x.text, x.score) for x in reranked_nodes])

[('This is a dog', 0.91827404), ('This is a cat', 0.0014040867)]
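
The reranking step itself is simple: discard the retriever's scores, rescore every candidate against the query, sort, and keep the best `top_n`. The sketch below shows that flow with a hypothetical word-overlap scorer, `toy_relevance`, standing in for the cross-encoder model. Its scores are not comparable to the ones printed above.

```python
def toy_relevance(question, text):
    """Hypothetical scorer: fraction of question words that also appear in the text."""
    q_words = set(question.lower().replace(".", "").split())
    t_words = set(text.lower().replace(".", "").split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def rerank_nodes(question, scored_texts, top_n, score_fn=toy_relevance):
    """Rescore (text, old_score) pairs with score_fn and keep the top_n best."""
    rescored = [(text, score_fn(question, text)) for text, _old_score in scored_texts]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


candidates = [("This is a cat", 0.6), ("This is a dog", 0.4)]
reranked_demo = rerank_nodes("I want a dog.", candidates, top_n=2)
print(reranked_demo)
```

As in the real output, the "dog" sentence moves to the top even though the retriever originally scored it lower.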


## Run the Query Engine

In [23]:
sentence_window_engine_3 = sentence_index.as_query_engine(
    similarity_top_k=6, node_postprocessors=[postproc, rerank]
)

In [33]:
from llama_index.core.response.notebook_utils import display_response

window_response = sentence_window_engine_3.query(
    "What all did Katniss talk about in her interview with Caesar?"
)

display_response(window_response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** Katniss talked about her crush on the girl she was being interviewed by, Caesar, and how she was pretty sure the girl didn't know she was alive until the reaping. She also mentioned that a lot of boys liked the girl. Additionally, she discussed the fact that if she won the Hunger Games, she would go home.

# Auto-Merging Retrieval

## Build the Node Parser

In [25]:
from llama_index.core.node_parser import HierarchicalNodeParser

# create the hierarchical node parser w/ default settings
node_parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]
)

nodes = node_parser.get_nodes_from_documents([document])

In [26]:
from llama_index.core.node_parser import get_leaf_nodes

leaf_nodes = get_leaf_nodes(nodes)
print(leaf_nodes[30].text)

Cinna takes my icy hands in his warm ones. “Suppose, 
when you answer the questions, you think you’re addressing a friend back home. Who would your best friend be?” asks Cinna. 
“Gale,” I say instantly. “Only it doesn’t make sense, 
Cinna. I would never be telling Gale those things about me. He already knows them.” 
“What about me? Could you think of me as a friend?” 
asks Cinna.


In [27]:
nodes_by_id = {node.node_id: node for node in nodes}

parent_node = nodes_by_id[leaf_nodes[30].parent_node.node_id]
print(parent_node.text)

Cinna takes my icy hands in his warm ones. “Suppose, 
when you answer the questions, you think you’re addressing a friend back home. Who would your best friend be?” asks Cinna. 
“Gale,” I say instantly. “Only it doesn’t make sense, 
Cinna. I would never be telling Gale those things about me. He already knows them.” 
“What about me? Could you think of me as a friend?” 
asks Cinna. 
Of all the people I’ve met since I left home, Cinna is by 
far my favorite. I liked him right off and he hasn’t 

disappointed me yet. “I think so, but —” 
“I’ll be sitting on the main platform with the other 
stylists. You’ll be able to look right at me. When you’re asked a question, find me, and answer it as honestly as possible,” says Cinna. 
“Even if what I think is horrible?” I ask. Because it might 
be, really. 
“Especially if what you think is horrible,” says Cinna. 
“You’ll try it?” 
I nod. It’s a plan. Or at least a straw to grasp at. Too soon it’s time to go. The interviews take place on a 
stage co
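
The parent and leaf relationship shown above comes from splitting the text at several granularities and linking each chunk to the larger chunk it was cut from. Here is a minimal sketch of that idea. It splits on word counts with made-up sizes `[8, 4, 2]`, whereas the real parser works on tokens and respects sentence boundaries; `Chunk` and `hierarchical_chunks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical chunk record with a link to its parent chunk."""
    id: int
    text: str
    parent_id: Optional[int]
    level: int


def hierarchical_chunks(words, chunk_sizes):
    """Recursively split words into nested chunks, largest chunk size first."""
    out = []

    def split(tokens, level, parent_id):
        size = chunk_sizes[level]
        for start in range(0, len(tokens), size):
            piece = tokens[start:start + size]
            chunk = Chunk(len(out), " ".join(piece), parent_id, level)
            out.append(chunk)
            if level + 1 < len(chunk_sizes):
                split(piece, level + 1, chunk.id)

    split(words, 0, None)
    return out


toy_words = [f"w{i}" for i in range(16)]
toy_nodes = hierarchical_chunks(toy_words, chunk_sizes=[8, 4, 2])

toy_by_id = {c.id: c for c in toy_nodes}
toy_leaves = [c for c in toy_nodes if c.level == 2]
toy_parent = toy_by_id[toy_leaves[0].parent_id]
print(toy_leaves[0].text, "->", toy_parent.text)
```

Only the leaves are embedded and indexed; the larger chunks live in the docstore so they can be looked up by ID when a merge is triggered.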

## Build the Index

In [28]:
# Reuses ServiceContext imported earlier, with the same LLM and embedding model
# as the sentence-window index.

auto_merging_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    node_parser=node_parser,
)

In [29]:
# Load the index from disk if it already exists; otherwise build and persist it.
# Relies on os, VectorStoreIndex, StorageContext, and load_index_from_storage
# imported in the sentence-window section.

if not os.path.exists("./merging_index"):
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    automerging_index = VectorStoreIndex(
            leaf_nodes,
            storage_context=storage_context,
            service_context=auto_merging_context
        )

    automerging_index.storage_context.persist(persist_dir="./merging_index")
else:
    automerging_index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir="./merging_index"),
        service_context=auto_merging_context
    )

## Add a Reranker

In [30]:
from llama_index.core.indices.postprocessor import SentenceTransformerRerank
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

automerging_retriever = automerging_index.as_retriever(
    similarity_top_k=12
)

retriever = AutoMergingRetriever(
    automerging_retriever,
    automerging_index.storage_context,
    verbose=True
)

rerank = SentenceTransformerRerank(top_n=6, model="BAAI/bge-reranker-base")

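# Pass the AutoMergingRetriever (not the base retriever) so that retrieved leaf
# nodes can be merged into their parents before reranking.
auto_merging_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[rerank]
)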
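
The merging rule can be sketched as follows: group the retrieved leaves by parent, and when enough of a parent's children were retrieved, return the parent instead of the individual leaves. This is a simplified illustration rather than the library's code. The `auto_merge` helper, the 0.5 ratio threshold, and the strict `>` comparison are assumptions of this sketch, and it merges only one level up.

```python
from collections import defaultdict


def auto_merge(retrieved_leaf_ids, parent_of, children_of, ratio_threshold=0.5):
    """Replace leaves by their parent when enough of its children were retrieved."""
    hits_by_parent = defaultdict(list)
    for leaf_id in retrieved_leaf_ids:
        hits_by_parent[parent_of[leaf_id]].append(leaf_id)

    merged = []
    for parent_id, hits in hits_by_parent.items():
        ratio = len(hits) / len(children_of[parent_id])
        if ratio > ratio_threshold:
            merged.append(parent_id)   # enough siblings retrieved: use the parent
        else:
            merged.extend(hits)        # too few: keep the individual leaves
    return merged


# Toy hierarchy: two parents with four leaves each (IDs are made up).
toy_children_of = {"P1": ["a", "b", "c", "d"], "P2": ["e", "f", "g", "h"]}
toy_parent_of = {c: p for p, kids in toy_children_of.items() for c in kids}

merged_ids = auto_merge(["a", "b", "c", "e"], toy_parent_of, toy_children_of)
print(merged_ids)
```

Three of `P1`'s four leaves were retrieved, so they collapse into `P1`, while the single hit under `P2` stays a leaf. The LLM therefore sees one coherent passage instead of three fragments.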

## Run the Query Engine

In [31]:
auto_merging_response = auto_merging_engine.query(
    "What is the strategy of each tribute from different districts?"
)

display_response(auto_merging_response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** The strategy of each tribute from different districts is not explicitly stated in the given context information. However, we can infer some strategies based on the characteristics and actions of the tributes.

For example, the monstrous boy from District 2 seems to be playing up his ruthless killing machine image, as he is described as a "killing machine" and is not shown to be interested in training or forming alliances.

The fox-faced girl from District 5 is described as being sly and elusive, and her strategy may involve using her wit and charm to gain allies and outsmart her opponents.

Cinna, who is from District 12, is described as being quiet and reserved, and his strategy may involve using his intelligence and resourcefulness to survive and gain allies.

Thresh, who is from District 11, is described as being solitary and uninterested in training, but his strategy may involve using his strength and intimidation to gain allies and outsmart his opponents.

Overall, each tribute's strategy may involve using their unique strengths and characteristics to gain allies, outsmart their opponents, and survive in the Hunger Games.