## Components in LlamaIndex

### Installing dependencies

In [1]:
!pip install --upgrade pip wheel setuptools -q


In [2]:
!pip install llama-index datasets llama-index-callbacks-arize-phoenix llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -q


In [19]:
from huggingface_hub import login

login()

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


### Creating a query engine for RAG

#### Setting up the persona database

We will use personas from the [dvilasuero/finepersonas-v0.1-tiny](https://huggingface.co/datasets/dvilasuero/finepersonas-v0.1-tiny) dataset. It contains 5K personas that will be attending the party!

Let's load the dataset and store each persona as a text file in the data directory.


In [3]:
from datasets import load_dataset
from pathlib import Path

dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")

Path("data").mkdir(parents=True, exist_ok=True)
for i, persona in enumerate(dataset):
    # Write with an explicit encoding so non-ASCII personas are saved consistently
    with open(Path("data") / f"persona_{i}.txt", "w", encoding="utf-8") as f:
        f.write(persona["persona"])


Now that we have a local directory with all the personas that will be attending the party, we can load and index them!

### Loading and embedding persona documents

We will use the SimpleDirectoryReader to load the persona descriptions from the data directory. This will return a list of Document objects.

In [4]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()
len(documents)

5000

Now that we have a list of Document objects, we can use the IngestionPipeline to create nodes from the documents and prepare them for the QueryEngine. We will use the SentenceSplitter to split the documents into smaller chunks and the HuggingFaceEmbedding model to embed the chunks.

In [20]:
from llama_index.core import Document
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_overlap=0),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

nodes = await pipeline.arun(documents=[Document.example()])

In [21]:
nodes

[TextNode(id_='0a4fbdcb-51bc-4a15-9fc4-2aff565f72ca', embedding=[-0.0777985155582428, -0.029093926772475243, 0.0022167284041643143, -0.010391009971499443, 0.0900288000702858, -0.06914414465427399, 0.00825283583253622, -0.009724952280521393, 0.019781211391091347, -0.043472062796354294, 0.02659621275961399, -0.009204167872667313, 0.0787544846534729, 0.0025490913540124893, 0.04259764030575752, 0.03183037042617798, 0.003503598039969802, 0.02002204954624176, -0.020237090066075325, -0.00946001335978508, 0.04732130095362663, 0.015019409358501434, -2.0768840840901248e-05, 0.005134056322276592, 0.012225973419845104, 0.0963548868894577, -0.03063950501382351, -0.03588945046067238, -0.021210353821516037, -0.1618298888206482, 0.03443855047225952, -0.022891050204634666, 0.06059262901544571, 0.03163774311542511, -0.02275249920785427, 0.004881291184574366, -0.011713894084095955, 0.0013522659428417683, -0.032165948301553726, 0.047184307128190994, 0.006247331388294697, 0.011993998661637306, -0.025094186
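
To build some intuition for what `chunk_size` and `chunk_overlap` control, here is a toy word-based splitter. It is only a sketch: `chunk_words` is a hypothetical helper, and it is not how SentenceSplitter works internally (the real splitter counts tokens and tries to respect sentence boundaries).

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Toy splitter: windows of `chunk_size` words, sharing `chunk_overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = "a b c d e f g"
    print(chunk_words(sample, chunk_size=3, chunk_overlap=0))
    print(chunk_words(sample, chunk_size=3, chunk_overlap=1))
```

With an overlap of 0 every word lands in exactly one chunk; a positive overlap repeats the tail of one chunk at the head of the next, which can help when a relevant passage straddles a chunk boundary.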

### Storing and indexing documents

In [27]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection("alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)
nodes = await pipeline.arun(documents=documents[:10])
len(nodes)

Metadata length (21) is close to chunk size (25). Resulting chunks are less than 50 tokens. Consider increasing the chunk size or decreasing the size of your metadata to avoid this.

86

In [28]:
# Create the index from our vector store, using the same embedding model as at ingestion time

from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
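
Conceptually, querying a vector index means embedding the query and returning the stored nodes whose embeddings are closest to it. The sketch below illustrates that idea with cosine similarity in plain Python. The names `cosine_similarity` and `top_k` and the two-dimensional vectors are made up for illustration; in the notebook the actual search is delegated to the Chroma collection, which may use a different distance metric and an approximate index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int = 2):
    """Return the k (node_id, score) pairs most similar to the query vector."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec)) for node_id, vec in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings"; real ones have hundreds of dimensions
    stored = {
        "persona_0": [1.0, 0.0],
        "persona_1": [0.0, 1.0],
        "persona_2": [1.0, 1.0],
    }
    for node_id, score in top_k([1.0, 0.1], stored, k=2):
        print(node_id, round(score, 3))
```

This is also why the index is given the same embedding model that the pipeline used: query vectors and stored vectors are only comparable if they come from the same model.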

#### Querying a VectorStoreIndex with prompts and LLMs

In [29]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
query_engine.query("What is the meaning of life?")
# The personas say nothing on this topic, so expect an answer noting that the context does not cover it

Response(response='The meaning of life is not addressed in the provided information.', source_nodes=[NodeWithScore(node=TextNode(id_='68f3d4b5-3430-4783-b30a-806eb9acf02c', embedding=None, metadata={'file_path': '/Users/loicsteve/Desktop/LlamaIndexAgents/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-04-16', 'last_modified_date': '2025-04-16'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='24e36899-ca32-4944-b60f-175d9b9aba26', node_type='4', metadata={'file_path': '/Users/loicsteve/Desktop/LlamaIndexAgents/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-04-16', 'l

In [30]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
import nest_asyncio

nest_asyncio.apply()  # Allow nested event loops so the query engine can run inside the notebook's running loop
llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
response1 = query_engine.query(
    "Respond using a persona that describes author and travel experiences?"
)
response1

Response(response="Certainly! Here's a response using the persona described:\n\nAs a cultural expert and anthropologist, I have had the privilege of exploring diverse cultures around the world. My travels have taken me from the bustling markets of Marrakech to the serene landscapes of the Andes, where I've lived among indigenous communities, learning about their traditions and ways of life. Each journey has enriched my understanding of human diversity and the intricate tapestry of global cultures.", source_nodes=[NodeWithScore(node=TextNode(id_='10749000-c49f-4368-a98c-9e2dfc0d6de3', embedding=None, metadata={'file_path': '/Users/loicsteve/Desktop/LlamaIndexAgents/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-04-16', 'last_modified_date': '2025-04-16'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['fil
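
The raw `Response` repr is hard to read. A small helper like the one below can pull out just the retrieved sources. This is a sketch: `summarize_sources` is a hypothetical helper, and it assumes each entry in `source_nodes` exposes `.score`, `.node.metadata`, and `.node.get_content()`, as the objects in the output above do. The stand-in objects in the demo are placeholders so the block runs without calling an LLM.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars: int = 60) -> list[dict]:
    """Return one small dict per retrieved source node of a query response."""
    rows = []
    for source in response.source_nodes:
        metadata = source.node.metadata
        rows.append(
            {
                "file_name": metadata.get("file_name"),
                "score": source.score,
                "preview": source.node.get_content()[:max_chars],
            }
        )
    return rows


if __name__ == "__main__":
    # Stand-in objects with made-up values, shaped like a query response
    fake_node = SimpleNamespace(
        metadata={"file_name": "persona_1.txt"},
        get_content=lambda: "placeholder persona text",
    )
    fake_response = SimpleNamespace(
        response="placeholder answer",
        source_nodes=[SimpleNamespace(node=fake_node, score=0.5)],
    )
    print(summarize_sources(fake_response))
```

In the notebook, calling `summarize_sources(response1)` would show which persona files the answer was built from, which helps when judging whether retrieval found relevant context.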

### Evaluation and observability

LlamaIndex provides built-in evaluation tools to assess response quality.
These evaluators use an LLM to analyze responses along different dimensions. Let's look at the three main evaluators available:

- FaithfulnessEvaluator: evaluates the faithfulness of the answer by checking whether it is supported by the retrieved context.
- AnswerRelevancyEvaluator: evaluates the relevance of the answer by checking whether it addresses the question.
- CorrectnessEvaluator: evaluates the correctness of the answer by checking whether it is correct, optionally against a reference answer.


In [38]:
from llama_index.core.evaluation import FaithfulnessEvaluator
from llama_index.core.evaluation import AnswerRelevancyEvaluator
from llama_index.core.evaluation import CorrectnessEvaluator

# Instantiate the evaluators, then query the index and check the response for faithfulness
evaluator = FaithfulnessEvaluator(llm=llm)
evaluator1 = AnswerRelevancyEvaluator(llm=llm)
evaluator2 = CorrectnessEvaluator(llm=llm)
response = query_engine.query(
    "What battles took place in New York City in the American Revolution?"
)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing


False

In [40]:
eval_result1 = evaluator.evaluate_response(response=response1)
eval_result1.passing

True
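
The cells above only exercise the faithfulness evaluator. One way to run several evaluators over the same response is a small loop like the sketch below. `run_evaluators` is a hypothetical helper; it assumes each evaluator exposes `evaluate_response(query=..., response=...)` and returns a result with `passing` and `score` attributes, as the built-in evaluators do. The stub evaluator in the demo is a placeholder so the block runs without an LLM.

```python
from types import SimpleNamespace


def run_evaluators(evaluators: dict, query: str, response) -> dict:
    """Run each named evaluator on the same query/response pair."""
    report = {}
    for name, evaluator in evaluators.items():
        result = evaluator.evaluate_response(query=query, response=response)
        report[name] = {"passing": result.passing, "score": result.score}
    return report


class StubEvaluator:
    """Placeholder that mimics the evaluator interface with a fixed verdict."""

    def __init__(self, passing: bool, score: float):
        self._result = SimpleNamespace(passing=passing, score=score)

    def evaluate_response(self, query=None, response=None):
        return self._result


if __name__ == "__main__":
    stubs = {
        "faithfulness": StubEvaluator(passing=True, score=1.0),
        "relevancy": StubEvaluator(passing=False, score=0.0),
    }
    print(run_evaluators(stubs, "example question", response=None))

    # In the notebook, the real evaluators could be passed in instead:
    # run_evaluators(
    #     {"faithfulness": evaluator, "relevancy": evaluator1, "correctness": evaluator2},
    #     "Respond using a persona that describes author and travel experiences?",
    #     response1,
    # )
```

Passing the query matters for the relevancy and correctness checks, since both judge the answer relative to the question rather than only against the retrieved context.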

If these LLM-based evaluators do not give enough context, we can inspect the response with the Arize Phoenix tool, after creating an account at LlamaTrace and generating an API key.

In [43]:
import llama_index
import os

PHOENIX_API_KEY = "<PHOENIX_API_KEY>"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
)


Now, we can query the index and see the response in the Arize Phoenix tool.



In [44]:
response = query_engine.query(
    "What is the name of someone who is interested in AI and technology?"
)
response

Response(response='The provided information does not mention anyone interested in AI and technology. It only refers to an anthropologist or a cultural expert.', source_nodes=[NodeWithScore(node=TextNode(id_='927bdba9-4d8f-4f30-8616-aaea825a8582', embedding=None, metadata={'file_path': '/Users/loicsteve/Desktop/LlamaIndexAgents/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-04-16', 'last_modified_date': '2025-04-16'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='24e36899-ca32-4944-b60f-175d9b9aba26', node_type='4', metadata={'file_path': '/Users/loicsteve/Desktop/LlamaIndexAgents/data/persona_1.txt', 'file_name': 'persona_1.txt', 'fil