# Project 2: Making an AI Agent with LlamaIndex - Components

### Follow the instructions
Here are the [instructions](https://huggingface.co/learn/agents-course/unit2/llama-index/components) for the tutorial that this notebook follows.


### Description:
This project uses LlamaIndex, a library that provides a framework for developing agents.

This notebook focuses on components, the fundamental objects used to build agents.



### For this course, I am using 


See also the [GitHub code](https://github.com/huggingface/agents-course).



## Load Imports

In [1]:
import os
from pathlib import Path

import chromadb
from datasets import load_dataset
from dotenv import load_dotenv
from huggingface_hub import login

import llama_index
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.evaluation import FaithfulnessEvaluator
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.vector_stores.chroma import ChromaVectorStore

import nest_asyncio




##### Log in to the Hugging Face Hub to access the Serverless Inference API

Note: You will need to add your token when prompted. In this notebook, the token is read from the `.env` file under the name `HF_TOKEN_INFERENCE2`.

In [2]:
# Load environment variables from .env
load_dotenv()

True

In [3]:
# Get Hugging Face Token
HF_TOKEN_INFERENCE2 = os.environ.get("HF_TOKEN_INFERENCE2")
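`os.environ.get` returns `None` when the variable is missing, so a typo in `.env` would go unnoticed until a later call fails. Below is a minimal sketch of a guard that fails early. The helper name `require_env` and the variable `DEMO_HF_TOKEN` are hypothetical and are not part of any library used in this notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value


# Demonstration with a placeholder value (not a real token).
os.environ["DEMO_HF_TOKEN"] = "hf_placeholder"
token = require_env("DEMO_HF_TOKEN")
```

In the notebook, the same guard could wrap `HF_TOKEN_INFERENCE2`. The value could then be passed as `login(token=...)` to skip the interactive prompt.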

In [4]:
# Log in to Hugging Face to use the serverless Inference API.
login()


----

# The QueryEngine component

We focus on the `QueryEngine` component because it can be used as a Retrieval-Augmented Generation (RAG) tool for an agent.



Now, let’s dive a bit deeper into the components and see how you can combine components to create a RAG pipeline.

#### The Problem

Alfred is hosting a party and needs to be able to find relevant information on personas that will be attending the party. Therefore, we will use a QueryEngine to index and search through a database of personas.


#### The Dependencies

```
!pip install llama-index datasets llama-index-callbacks-arize-phoenix arize-phoenix llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-hugging
```
----

### Create a QueryEngine for retrieval augmented generation

#### Setting up the persona database

We will be using personas from the [dvilasuero/finepersonas-v0.1-tiny dataset](https://huggingface.co/datasets/dvilasuero/finepersonas-v0.1-tiny). This dataset contains 5K personas that will be attending the party!

Let's load the dataset and store it as files in the `data` directory.

#### Understanding the coding syntax

- `from pathlib import Path` imports the `Path` class from the `pathlib` module.
- `Path("data") / f"persona_{i}.txt"` creates a platform-independent file path. `Path("data")` creates a `Path` object representing the `data` directory, and the `/` operator joins path components in a way that works across operating systems.
- `f"persona_{i}.txt"` is an f-string that builds the filename dynamically from the loop index `i`.
- `open(..., "w")` uses the built-in `open()` function to open the file at that path in write mode. If the file exists, it is truncated (emptied) before writing. If the file doesn't exist, a new file is created.
- `with ... as f:` is a context manager. It assigns the opened file object to `f` within the block and ensures the file is properly closed, even if errors occur.

In [5]:
dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")

Path("data").mkdir(parents=True, exist_ok=True)
for i, persona in enumerate(dataset):
    with open(Path("data") / f"persona_{i}.txt", "w") as f:
        f.write(persona["persona"])


Awesome! Now that we have a local directory with all the personas that will be attending the party, we can load and index them.
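To check the write pattern without downloading the dataset, here is a small sketch that runs it against a temporary directory. The two sample personas are made up for illustration and do not come from the dataset. The explicit `encoding="utf-8"` is an addition that avoids platform-dependent defaults.

```python
import tempfile
from pathlib import Path

# Hypothetical sample rows standing in for the dataset.
sample_personas = [
    {"persona": "A historian who studies medieval trade routes."},
    {"persona": "A software engineer interested in robotics."},
]

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    for i, persona in enumerate(sample_personas):
        # One file per persona, named by its position in the dataset.
        with open(data_dir / f"persona_{i}.txt", "w", encoding="utf-8") as f:
            f.write(persona["persona"])

    # Read back what was written before the temporary directory is removed.
    written = sorted(p.name for p in data_dir.glob("persona_*.txt"))
    first_text = (data_dir / "persona_0.txt").read_text(encoding="utf-8")

print(written)
```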

### Loading and embedding persona documents

We will use the `SimpleDirectoryReader` to load the persona descriptions from the `data` directory. This will return a list of `Document` objects.

In [6]:
reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()
len(documents)

5000

Now that we have a list of `Document` objects, we can use the `IngestionPipeline` to create nodes from the documents and prepare them for the `QueryEngine`. We will use the `SentenceSplitter` to split the documents into smaller chunks and the `HuggingFaceEmbedding` to embed the chunks.

In [7]:
# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

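# run the pipeline asynchronously on the first 10 documents
# (top-level await works in a notebook; pipeline.run(...) is the synchronous variant)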
nodes = await pipeline.arun(documents=documents[:10])
nodes


[TextNode(id_='ff3dd894-040f-41be-a4a5-c16b949f67ab', embedding=[-0.05372092127799988, 0.014647825621068478, 0.036100421100854874, 0.0031414267141371965, 0.012114139273762703, -0.02996452897787094, 0.0004030904092360288, -0.005145682487636805, -0.056750111281871796, -0.06477732956409454, 0.0030367623548954725, -0.013929042033851147, -0.027465028688311577, 0.04322900250554085, -0.0027089305222034454, -0.020054981112480164, 0.005541058722883463, 0.09114041924476624, 0.01144470926374197, 0.012648425064980984, -0.0011150541249662638, -0.05305922403931618, 0.06926539540290833, -0.02496507577598095, -0.053892262279987335, 0.03098149411380291, 0.02734014391899109, -0.02506164275109768, 0.010642355307936668, -0.12227146327495575, -0.03303486853837967, -0.002437483286485076, -0.009284256026148796, 0.036685481667518616, 0.06627106666564941, 0.01911538653075695, -0.00779915414750576, 0.07250063866376877, -0.01302420161664486, 0.002205104101449251, 0.01316052582114935, -0.00316694681532681, -0.032

As you can see, we have created a list of `Node` objects, which are chunks of text from the original documents together with their embeddings. Let's explore how we can add these nodes to a vector store.
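To give an intuition for what "splitting into chunks" means, here is a toy sketch that greedily packs whole sentences into chunks with a character limit. This is not how `SentenceSplitter` is implemented. The function `split_into_chunks` and its `max_chars` parameter are hypothetical and only illustrate the idea.

```python
import re


def split_into_chunks(text: str, max_chars: int = 80) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The next sentence does not fit: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "Alfred hosts a party. Guests arrive at eight. Dinner is served at nine."
chunks = split_into_chunks(text, max_chars=50)
print(chunks)
```

Keeping sentences intact means each chunk stays readable on its own, which generally helps both the embedding step and the later answer synthesis.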

### Storing and indexing documents

Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it. In this case, we will use `Chroma` to store our documents. Let's run the pipeline again with the vector store attached. The `IngestionPipeline` caches the operations so this should be fast!

In [8]:
db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection(name="alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)

nodes = await pipeline.arun(documents=documents[:10])
len(nodes)

Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event CollectionAddEvent: capture() takes 1 positional argument but 3 were given


10
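The general idea behind caching a transformation can be sketched in a few lines: look up the result by a hash of the input before recomputing it. This is a toy illustration under that assumption, not LlamaIndex's cache. The names `expensive_transform` and `cached_transform` are hypothetical.

```python
import hashlib

cache: dict[str, list[str]] = {}
calls = 0  # counts how often the expensive step actually runs


def expensive_transform(text: str) -> list[str]:
    """Stand-in for a costly step such as embedding; here it only splits words."""
    global calls
    calls += 1
    return text.split()


def cached_transform(text: str) -> list[str]:
    """Return the cached result for this input, computing it only once."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = expensive_transform(text)
    return cache[key]


first = cached_transform("a curious historian")
second = cached_transform("a curious historian")  # served from the cache
print(calls)
```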

We can create a `VectorStoreIndex` from the vector store and use it to query the documents by passing the vector store and embedding model to the `from_vector_store()` method.

In [9]:
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)

We don't need to worry about persisting the index to disk ourselves. The data lives in the Chroma collection behind the `ChromaVectorStore` object, and the `PersistentClient` saves it under the directory path we passed (`./alfred_chroma_db`).
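Conceptually, querying a vector store means comparing the embedding of the query with the stored embeddings and returning the closest matches. The sketch below does this by brute force with cosine similarity. The three-dimensional vectors and the `top_k` helper are hypothetical. Real vector stores typically use approximate nearest-neighbour indexes rather than a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in store.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Made-up embeddings for three persona files.
store = {
    "persona_0.txt": [1.0, 0.0, 0.0],
    "persona_1.txt": [0.0, 1.0, 0.0],
    "persona_2.txt": [0.7, 0.7, 0.0],
}
best = top_k([1.0, 0.1, 0.0], store, k=2)
print(best)
```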

## Querying the index

Now that we have our index, we can use it to query the documents. Let's create a `QueryEngine` from the index and use it to query the documents using a specific response mode.

In [10]:
nest_asyncio.apply()  # Allows nested event loops in the notebook; needed to run the query engine
llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
response = query_engine.query(
    "Respond using a persona that describes author and travel experiences?"
)
response

Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given


Response(response='An anthropologist or cultural expert with a deep dive into Cypriot culture, history, and society, having spent significant time researching and living in Cyprus to understand its people, customs, and way of life.', source_nodes=[NodeWithScore(node=TextNode(id_='4dcbb3aa-897a-4952-a548-b915a7cc4eae', embedding=None, metadata={'file_path': '/Users/lancehester/Documents/ai_agent_hugging_face_course/src/llamaindex/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-07-01', 'last_modified_date': '2025-07-01'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='d4176133-6a42-4183-bcc1-d7233a8673f4', node_type='4', metadata={'file_p
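The `tree_summarize` response mode combines the retrieved chunks hierarchically: groups of chunks are summarized, and the partial summaries are summarized again until one answer remains. The toy sketch below shows only that control flow. The `summarize` function is a stand-in for an LLM call, and the `fan_in` parameter is hypothetical. In LlamaIndex the grouping depends on how much text fits into the prompt.

```python
def summarize(texts: list[str]) -> str:
    """Stand-in for an LLM call: merges its inputs into one string."""
    return "(" + " + ".join(texts) + ")"


def tree_summarize(chunks: list[str], fan_in: int = 2) -> str:
    """Summarize chunks in groups, then recurse on the partial summaries."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    level = chunks
    while len(level) > 1:
        # Each pass reduces the number of items by roughly a factor of fan_in.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


result = tree_summarize(["a", "b", "c"])
print(result)
```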

### Evaluation and observability

LlamaIndex provides **built-in evaluation tools to assess response quality**. These evaluators leverage LLMs to analyze responses across different dimensions. We can now check whether the response is faithful to the retrieved persona text.

In [11]:
# evaluate whether the response is supported by its retrieved source nodes
evaluator = FaithfulnessEvaluator(llm=llm)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing

True

If one of these LLM-based evaluators does not give enough context, we can inspect the response with the Arize Phoenix tool after creating an account at [LlamaTrace](https://llamatrace.com/login) and generating an API key.

In [None]:
PHOENIX_API_KEY = "<PHOENIX_API_KEY>"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
)

Now, we can query the index and see the response in the Arize Phoenix tool.

In [None]:
response = query_engine.query(
    "What is the name of someone who is interested in AI and technology?"
)
response

We can then go to [LlamaTrace](https://llamatrace.com/login) and explore the process and response.