# Components in LlamaIndex

This notebook is part of the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course), a free course that takes you from beginner to expert in building agents.

![Agents course share](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png)

Alfred is hosting a party and needs to find relevant information about the personas who will attend. We will therefore use a `QueryEngine` to index and search a database of personas.

## Let's install the dependencies

We will install the dependencies for this unit.

In [1]:
!pip install llama-index datasets llama-index-callbacks-arize-phoenix arize-phoenix llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -U -q

Next, let's log in to Hugging Face so that we can use the serverless Inference API.

In [2]:
from huggingface_hub import login

login()



## Create a `QueryEngine` for retrieval augmented generation

### Setting up the persona database

We will be using personas from the [dvilasuero/finepersonas-v0.1-tiny dataset](https://huggingface.co/datasets/dvilasuero/finepersonas-v0.1-tiny). This dataset contains 5K personas that will be attending the party!

Let's load the dataset and store each persona as a text file in the `data` directory.

In [3]:
from datasets import load_dataset
from pathlib import Path

dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")

Path("data").mkdir(parents=True, exist_ok=True)
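# write one text file per persona; UTF-8 keeps non-ASCII characters intact on every platform
for i, persona in enumerate(dataset):
    with open(Path("data") / f"persona_{i}.txt", "w", encoding="utf-8") as f:
        f.write(persona["persona"])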

Awesome! Now that we have a local directory with all the personas who will be attending the party, we can load and index them.

### Loading and embedding persona documents

We will use the `SimpleDirectoryReader` to load the persona descriptions from the `data` directory. This will return a list of `Document` objects.

In [4]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()
len(documents)

5000

In [15]:
documents[0].text_resource.text

'A local art historian and museum professional interested in 19th-century American art and the local cultural heritage of Cincinnati.'

In [16]:
documents[0].id_

'634da32b-7b6c-4446-beb3-5dd1436c5fe1'

In [27]:
documents[0]

Document(id_='634da32b-7b6c-4446-beb3-5dd1436c5fe1', embedding=None, metadata={'file_path': '/Users/ron/Documents/github/AI_Agents_HF/Llamaindex/data/persona_0.txt', 'file_name': 'persona_0.txt', 'file_type': 'text/plain', 'file_size': 132, 'creation_date': '2025-07-20', 'last_modified_date': '2025-07-20'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text_resource=MediaResource(embeddings=None, data=None, text='A local art historian and museum professional interested in 19th-century American art and the local cultural heritage of Cincinnati.', path=None, url=None, mimetype=None), image_resource=None, audio_resource=None, video_resource=None, text_template='{metadata_str}\n\n{content}')

Now that we have a list of `Document` objects, we can use the `IngestionPipeline` to create nodes from the documents and prepare them for the `QueryEngine`. We will use the `SentenceSplitter` to split the documents into smaller chunks and the `HuggingFaceEmbedding` to embed the chunks.

In [17]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

# run the pipeline asynchronously on the first 10 documents
# (pipeline.run(...) is the synchronous equivalent)
nodes = await pipeline.arun(documents=documents[:10])
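
To build intuition for what the splitting step does, here is a minimal pure-Python sketch of sentence-aware chunking. It is not the `SentenceSplitter` implementation; the function name `split_into_chunks` and the word-based `max_words` limit are assumptions made purely for illustration, and the sample text is adapted from the first persona.

```python
import re


def split_into_chunks(text: str, max_words: int = 20) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `max_words` words.

    Toy illustration only: a sentence longer than `max_words` becomes its own chunk.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and current_len + n_words > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


persona = (
    "A local art historian and museum professional. "
    "Interested in 19th-century American art. "
    "Also studies the cultural heritage of Cincinnati."
)
chunks = split_into_chunks(persona, max_words=12)
print(chunks)
```

Keeping sentences whole means a chunk never cuts a thought in half, which tends to give the embedding model cleaner input than splitting at a fixed character offset.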

In [18]:
nodes[0]

TextNode(id_='15adb2fb-e11a-4975-9fb4-aa2d0646bc27', embedding=[-0.032320331782102585, -0.001098306616768241, 0.034757986664772034, 0.0065787420608103275, 0.0028960176277905703, -0.03161643072962761, 0.013406763784587383, 0.01667366921901703, -0.03175484016537666, -0.05845069885253906, 0.009463371708989143, -0.02386711910367012, -0.01347836572676897, 0.037097904831171036, -0.01255935337394476, -0.0039766267873346806, 0.0027470597997307777, 0.07235545665025711, 0.0005140079301781952, 0.010244878940284252, 0.018759561702609062, -0.06781893968582153, 0.05165273696184158, -0.024823851883411407, -0.02610919438302517, 0.011951661668717861, 0.0066216145642101765, -0.016723517328500748, -0.013858274556696415, -0.1329323649406433, -0.02920284867286682, 0.02286929078400135, -0.002473181812092662, 0.01840539649128914, 0.056015826761722565, 0.01120830699801445, -0.01652495563030243, 0.05420403182506561, -0.008843641728162766, 0.02515135146677494, 0.03171534463763237, 0.00745045579969883, -0.022080

In [20]:
len(nodes[0].embedding)

384

As you can see, we have created a list of `Node` objects, which are chunks of text from the original documents, each with its embedding attached. Let's explore how we can add these nodes to a vector store.
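
To see why these embedding vectors are useful, the following sketch ranks a few nodes by cosine similarity to a query vector. It uses tiny made-up 3-dimensional vectors in place of the real 384-dimensional ones, and the names `cosine_similarity`, `top_k`, and `toy_nodes` are hypothetical. The metric a real vector store uses depends on its configuration; cosine similarity is shown here only as a common choice.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], node_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k nodes whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), node_id)
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]


# Made-up 3-dimensional stand-ins for real embeddings.
toy_nodes = {
    "art_historian": [0.9, 0.1, 0.0],
    "respiratory_specialist": [0.0, 0.2, 0.9],
    "travel_writer": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]
ranked = top_k(query, toy_nodes, k=2)
print(ranked)
```

Retrieval in a vector store follows the same pattern at scale: embed the query with the same model, then return the nearest stored vectors.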

### Storing and indexing documents

Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it.
In this case, we will use `Chroma` to store our documents.
Let's run the pipeline again with the vector store attached.
The `IngestionPipeline` caches the operations so this should be fast!
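
The general idea behind this kind of caching can be sketched in a few lines: key each result by a hash of the input text plus the transformation's name, and skip the work when the same key shows up again. This illustrates the technique only and does not reproduce LlamaIndex's cache; `CachedTransform` and `fake_embed` are hypothetical names.

```python
import hashlib
from typing import Callable


class CachedTransform:
    """Wrap a function so repeated inputs are served from an in-memory cache."""

    def __init__(self, name: str, fn: Callable[[str], list[float]]):
        self.name = name
        self.fn = fn
        self.cache: dict[str, list[float]] = {}
        self.calls = 0  # number of times the wrapped function actually ran

    def __call__(self, text: str) -> list[float]:
        # The key combines the transformation name and the input content.
        key = hashlib.sha256(f"{self.name}:{text}".encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.fn(text)
        return self.cache[key]


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an expensive embedding call."""
    return [float(len(text)), float(text.count(" "))]


embed = CachedTransform("fake_embed", fake_embed)
first = embed("A local art historian")
second = embed("A local art historian")  # served from the cache
print(embed.calls)
```
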

In [21]:
!pip install llama-index-vector-stores-chroma

In [22]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection(name="alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)

# ingest the same 10-document subset as before; the embedded nodes are written to Chroma
nodes = await pipeline.arun(documents=documents[:10])
len(nodes)

10

We can create a `VectorStoreIndex` from the vector store and use it to query the documents by passing the vector store and embedding model to the `from_vector_store()` method.

In [23]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)

We don't need to persist the index to disk ourselves: the Chroma `PersistentClient` automatically saves the collection under the directory path we passed (`./alfred_chroma_db`), and the `ChromaVectorStore` reads from and writes to that collection.
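
In a later session, the stored collection can be reopened without re-running the ingestion pipeline. The helper below is a sketch that reuses only the calls already shown in this notebook; the function name `load_persisted_index` is hypothetical, and it assumes the same path, collection name, and embedding model as above.

```python
def load_persisted_index(
    db_path: str = "./alfred_chroma_db", collection_name: str = "alfred"
):
    """Rebuild a VectorStoreIndex on top of an existing on-disk Chroma collection."""
    # Imports are local so that defining this helper does not require the packages.
    import chromadb
    from llama_index.core import VectorStoreIndex
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.vector_stores.chroma import ChromaVectorStore

    db = chromadb.PersistentClient(path=db_path)
    collection = db.get_or_create_collection(name=collection_name)
    vector_store = ChromaVectorStore(chroma_collection=collection)
    # Use the same embedding model that was used at ingestion time,
    # otherwise query vectors and stored vectors are not comparable.
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
    return VectorStoreIndex.from_vector_store(
        vector_store=vector_store, embed_model=embed_model
    )
```

Calling `load_persisted_index()` in a fresh kernel should give an index backed by the previously stored vectors, provided the directory still exists.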

### Querying the index

Now that we have our index, we can use it to query the documents.
Let's create a `QueryEngine` from the index and use it to query the documents using a specific response mode.


In [24]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
import nest_asyncio

nest_asyncio.apply()  # This is needed to run the query engine
llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct", provider="auto")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
response = query_engine.query(
    "Respond using a persona that describes author and travel experiences?"
)
response

Response(response="An individual deeply versed in the nuances of Cypriot culture, history, and society, having dedicated significant time to research and reside in Cyprus. This person's expertise encompasses a rich understanding of the island's people, traditions, and lifestyle, making them a valuable resource for anyone seeking insight into Cypriot heritage.", source_nodes=[NodeWithScore(node=TextNode(id_='da6eafda-2673-4eb6-b04b-8b36ac6677b4', embedding=None, metadata={'file_path': '/Users/ron/Documents/github/AI_Agents_HF/Llamaindex/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-07-20', 'last_modified_date': '2025-07-20'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.S
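
As a rough mental model of the `tree_summarize` response mode: the retrieved chunks are summarized in groups, and the resulting summaries are summarized again until a single answer remains. The sketch below illustrates that bottom-up recursion. The `toy_summarize` function is a hypothetical stand-in that only brackets and joins text so the tree shape is visible; the real synthesizer calls the LLM at each step and groups chunks by prompt size rather than by the fixed `fan_in` assumed here.

```python
from typing import Callable


def tree_summarize(
    chunks: list[str], summarize: Callable[[list[str]], str], fan_in: int = 2
) -> str:
    """Merge groups of `fan_in` texts level by level until one summary remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while True:
        # Summarize each group of neighbors into one parent text.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


def toy_summarize(texts: list[str]) -> str:
    """Hypothetical stand-in for an LLM call: bracket the joined inputs."""
    return "(" + " + ".join(texts) + ")"


result = tree_summarize(["anthropologist", "historian", "traveler"], toy_summarize)
print(result)
```

The nesting of the brackets in the printed result mirrors the tree: leaf chunks are combined first, and their summaries are combined at the root.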

## Evaluation and observability

LlamaIndex provides **built-in evaluation tools to assess response quality.**
These evaluators leverage LLMs to analyze responses across different dimensions.
We can now check whether the response is faithful to the retrieved personas, that is, whether it is supported by the source nodes rather than made up.

In [25]:
from llama_index.core.evaluation import FaithfulnessEvaluator

# check whether the response is supported by its retrieved source nodes
evaluator = FaithfulnessEvaluator(llm=llm)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing

True
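
Conceptually, an LLM-based faithfulness check hands the retrieved context and the generated answer to a judge model and asks for a yes/no verdict. The sketch below shows that shape only. The names `check_faithfulness`, `judge`, and `toy_judge`, as well as the prompt wording, are hypothetical; `toy_judge` uses crude substring matching in place of an LLM call so the example runs offline, and none of this reproduces the prompts used by `FaithfulnessEvaluator`.

```python
from typing import Callable


def check_faithfulness(
    response: str, contexts: list[str], judge: Callable[[str], str]
) -> bool:
    """Ask `judge` whether `response` is supported by `contexts`; True means passing."""
    prompt = (
        "Is the statement supported by the context? Answer YES or NO.\n"
        f"Statement: {response}\n"
        "Context:\n" + "\n".join(contexts)
    )
    verdict = judge(prompt)
    return verdict.strip().upper().startswith("YES")


def toy_judge(prompt: str) -> str:
    """Offline stand-in for an LLM: YES if every statement word occurs in the context."""
    statement = prompt.split("Statement: ", 1)[1].split("\nContext:\n", 1)[0]
    context = prompt.split("\nContext:\n", 1)[1].lower()
    words = [w.strip(".,").lower() for w in statement.split()]
    return "YES" if all(w in context for w in words) else "NO"


contexts = ["An anthropologist who has lived in Cyprus and studies its culture."]
supported = check_faithfulness("An anthropologist studies Cyprus.", contexts, toy_judge)
unsupported = check_faithfulness(
    "A respiratory specialist from Cincinnati.", contexts, toy_judge
)
print(supported, unsupported)
```

Because the verdict itself comes from a model, a passing result is evidence of faithfulness rather than proof, which is one reason to also look at traces.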

If one of these LLM-based evaluators does not give enough insight, we can inspect the response with the Arize Phoenix tool, after creating an account at [LlamaTrace](https://llamatrace.com/login) and generating an API key.

In [None]:
import llama_index.core  # import the subpackage explicitly so llama_index.core resolves in a fresh kernel
import os

PHOENIX_API_KEY = "<PHOENIX_API_KEY>"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
)
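
Rather than pasting the key into the notebook, one option is to read it from an environment variable. This is a small sketch, assuming the key was exported before the notebook started under the variable name `PHOENIX_API_KEY` (an assumed name, not one required by the tool); the helper name `build_otlp_headers` is also hypothetical.

```python
import os


def build_otlp_headers(env_var: str = "PHOENIX_API_KEY") -> str:
    """Return the `api_key=...` header string, or raise if the variable is unset."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return f"api_key={api_key}"


# Example wiring (commented out so the sketch does not fail when the key is unset):
# os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = build_otlp_headers()
```

Failing early with a clear message is usually easier to debug than a rejected trace upload later on.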


Now, we can query the index and see the response in the Arize Phoenix tool.

In [None]:
response = query_engine.query(
    "What is the name of someone who is interested in AI and technology?"
)
response

Response(response=' I couldn\'t find any information about a specific person in the provided text. The text only contains information about two individuals, an anthropologist and a respiratory specialist. There is no mention of AI or technology. Therefore, I couldn\'t find an answer to the query. \n\nHowever, I can provide a response that is not present in the text, but based on general knowledge.\n\nA possible answer could be "David Berenstein" since the query mentions the file path, which is located on a user\'s computer. However, this answer is not present in the text and is based on external information. \n\nPlease let me know if you would like me to provide any additional information or clarification. \n\nIs the answer "David Berenstein"? \n\nPlease note that the answer is not present in the text, but rather based on external information. \n\nThe final answer is: No, the answer is not present in the text. \n\nHowever, based on general knowledge, a possible answer could be "David B

We can then go to [LlamaTrace](https://llamatrace.com/login) and explore the trace of the query and its response.

![arize-phoenix](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/arize.png)    