# Components in LlamaIndex


Alfred is hosting a party and needs to find relevant information about the personas who will be attending. Therefore, we will use a `QueryEngine` to index and search through a database of personas.

## Let's install the dependencies

We will install the dependencies for this unit.

In [1]:
!pip install llama-index datasets llama-index-callbacks-arize-phoenix arize-phoenix llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -U -q


Next, let's log in to Hugging Face to use the serverless Inference API.

In [2]:
from huggingface_hub import login

login()


## Create a `QueryEngine` for retrieval augmented generation

### Setting up the persona database

We will be using personas from the [dvilasuero/finepersonas-v0.1-tiny dataset](https://huggingface.co/datasets/dvilasuero/finepersonas-v0.1-tiny). This dataset contains 5K personas that will be attending the party!

Let's load the dataset and store each persona as a text file in the `data` directory.

In [3]:
from datasets import load_dataset
from pathlib import Path

dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")
dataset

Dataset({
    features: ['id', 'persona', 'model_name_embeddings', 'embedding', 'nn_indices', 'nn_scores', 'projection', 'cluster_label', 'summary_label'],
    num_rows: 5000
})

In [4]:
dataset[0]['persona']

'A local art historian and museum professional interested in 19th-century American art and the local cultural heritage of Cincinnati.'

In [5]:
Path("data").mkdir(parents=True, exist_ok=True)
for i, persona in enumerate(dataset):
    with open(Path("data") / f"persona_{i}.txt", "w") as f:
        f.write(persona["persona"])

In [11]:
# import os

# os.listdir('data')

Awesome! Now that we have a local directory with all the personas that will be attending the party, we can load and index them.
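Before indexing, it can help to confirm that the files were written as expected. The following is a minimal sketch of that check, not part of the original notebook: `write_personas` and `sample_rows` are hypothetical names, and a temporary directory stands in for `data` so the sketch can run on its own.

```python
import tempfile
from pathlib import Path

# Toy stand-ins for dataset rows; in the notebook the rows come from `dataset`
sample_rows = [
    {"persona": "A local art historian interested in 19th-century American art."},
    {"persona": "A travel writer who documents journeys across the Mediterranean."},
]


def write_personas(rows, out_dir):
    """Write one text file per row and return the paths that were written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, row in enumerate(rows):
        path = out_dir / f"persona_{i}.txt"
        # An explicit encoding avoids platform-dependent defaults
        path.write_text(row["persona"], encoding="utf-8")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    written = write_personas(sample_rows, tmp)
    # Count the files on disk and read the first one back
    file_count = len(list(Path(tmp).glob("persona_*.txt")))
    first_text = written[0].read_text(encoding="utf-8")

print(file_count, first_text)
```

Applied to the real `data` directory, the same `glob` pattern should report 5000 files, one per row of the dataset.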

### Loading and embedding persona documents

We will use the `SimpleDirectoryReader` to load the persona descriptions from the `data` directory. This will return a list of `Document` objects.

In [12]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()
len(documents)

5000

Now that we have a list of `Document` objects, we can use the `IngestionPipeline` to create nodes from the documents and prepare them for the `QueryEngine`. We will use the `SentenceSplitter` to split the documents into smaller chunks and the `HuggingFaceEmbedding` to embed the chunks.

In [13]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

# run the pipeline asynchronously on the first 10 documents
# (pipeline.run(...) is the synchronous equivalent)
nodes = await pipeline.arun(documents=documents[:10])
nodes


[TextNode(id_='d1186234-342d-49e2-8257-62d9a87ea0b2', embedding=[-0.0012166757369413972, 0.01964840106666088, 0.04393020272254944, -0.0022481302730739117, -0.018407437950372696, -0.017943954095244408, 0.009850725531578064, 0.03162040188908577, -0.06904757767915726, -0.048689112067222595, 0.0006960766040720046, -0.03956495225429535, -0.04958916828036308, 0.03078891895711422, -0.02168801613152027, -0.002734903944656253, 0.009126011282205582, 0.08719325065612793, 0.0089624784886837, 0.012264142744243145, 0.014265628531575203, -0.06638183444738388, 0.03564688563346863, -0.019090453162789345, -0.029022356495261192, -0.000252155470661819, 0.016371164470911026, -0.006577865220606327, -0.016838740557432175, -0.11301659792661667, -0.0613473579287529, 0.00975888967514038, -0.02667418122291565, 0.014211353845894337, 0.06158551201224327, 0.01061816606670618, 0.013383885845541954, 0.066158227622509, 0.007938407361507416, 0.017128722742199898, 0.029465196654200554, 0.01628757454454899, -0.0257763154

As you can see, we have created a list of `Node` objects, which are chunks of text from the original documents, each with its embedding attached. Let's explore how we can add these nodes to a vector store.
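To build some intuition for what "splitting into chunks" means, here is a deliberately simplified sketch. It is not the `SentenceSplitter` implementation (which works on tokens and supports overlap); `split_into_chunks` and its `max_chars` parameter are hypothetical and only illustrate the idea of packing whole sentences into size-limited chunks.

```python
import re


def split_into_chunks(text, max_chars=80):
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The next sentence does not fit: close the current chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


short_persona = "A local art historian interested in 19th-century American art."
long_text = (
    "Alfred plans the party. He invites many guests. "
    "Each guest has a persona. The personas are stored as files."
)

short_chunks = split_into_chunks(short_persona)
long_chunks = split_into_chunks(long_text, max_chars=50)
print(len(short_chunks), len(long_chunks))
```

A short text fits in a single chunk, which matches what we see in this notebook: the persona descriptions are brief, so each document yields one node.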

### Storing and indexing documents

Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it.
In this case, we will use `Chroma` to store our documents.
Let's run the pipeline again with the vector store attached, again on the first 10 documents.
The `IngestionPipeline` caches the output of each transformation, so re-running a pipeline on inputs it has already processed should be fast!

In [14]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection(name="alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)

nodes = await pipeline.arun(documents=documents[:10])
len(nodes)


10

We can create a `VectorStoreIndex` from the vector store and use it to query the documents by passing the vector store and embedding model to the `from_vector_store()` method.

In [15]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)

In [16]:
index

<llama_index.core.indices.vector_store.base.VectorStoreIndex at 0x783034864b50>

We don't need to worry about persisting the index to disk: the `chromadb.PersistentClient` behind the `ChromaVectorStore` automatically saves the collection under the directory path we passed (`./alfred_chroma_db`).
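Conceptually, a vector index answers a query by embedding it and returning the stored chunks whose embeddings are most similar. The sketch below illustrates that idea with hand-made three-dimensional vectors and cosine similarity. It is an illustration only, not how Chroma or `VectorStoreIndex` is implemented, and `toy_store`, `top_k`, and the vectors themselves are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store.items()]
    return sorted(scored, reverse=True)[:k]


# Hypothetical embeddings; real ones from bge-small-en-v1.5 have many more dimensions
toy_store = {
    "art historian": [0.9, 0.1, 0.0],
    "travel writer": [0.1, 0.9, 0.1],
    "pulmonologist": [0.0, 0.1, 0.9],
}
query_vec = [0.2, 0.8, 0.0]

results = top_k(query_vec, toy_store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The retrieved chunks are what the query engine later hands to the LLM as context.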

### Querying the index

Now that we have our index, we can use it to query the documents.
Let's create a `QueryEngine` from the index and use it to query the documents using a specific response mode.
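The cell below uses the `tree_summarize` response mode. As a rough mental model, this mode answers over groups of retrieved chunks and then recursively combines those answers until a single one remains. The sketch below shows only that recursive shape: `tree_summarize`, `fake_summarize`, and `fan_in` are hypothetical names, and the real response synthesizer packs chunks by prompt size and calls the LLM with the query rather than joining strings.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Combine texts in groups of `fan_in`, level by level, until one remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while True:
        # Summarize each group of up to `fan_in` texts into one text
        level = [
            summarize(level[i : i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]


def fake_summarize(texts):
    # Stand-in for an LLM call; it only records which texts were combined
    return "(" + " + ".join(texts) + ")"


answer = tree_summarize(["a", "b", "c", "d"], fake_summarize)
print(answer)
```

The nesting in the printed string mirrors the tree: `a` and `b` are combined, `c` and `d` are combined, and the two intermediate results are combined into the final answer.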


In [18]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
import nest_asyncio

nest_asyncio.apply()  # Allow nested event loops so the query engine can run inside the notebook
# llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
llm = HuggingFaceInferenceAPI(model_name="HuggingFaceTB/SmolLM3-3B")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
response = query_engine.query(
    "Respond using a persona that describes author and travel experiences?"
)
response

Response(response='<think>\nOkay, let\'s see. The user wants me to respond using a persona that describes an author and their travel experiences. The context provided includes two personas: one is an anthropologist or cultural expert with experience in Cyprus, and another is a pulmonologist interested in respiratory education.\n\nHmm, the user is asking for a persona that combines both aspects? Or maybe they want a persona that\'s an author who has both these backgrounds? Wait, the query says "respond using a persona that describes author and travel experiences." The context mentions personas, but the user is asking for a persona that\'s an author. Maybe the personas in the context are examples of authors with specific expertise. \n\nSo, the first persona is an anthropologist in Cyprus, and the second is a pulmonologist. The user wants a persona that\'s an author, so perhaps combining elements from both? Or maybe the user is asking for a persona that\'s an author who has both these bac
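The response text starts with a `<think>` block, which is the model's reasoning rather than the answer itself. If only the final answer is wanted, one option is to strip that block after the fact. The helper below is a minimal sketch under the assumption that the reasoning is closed with a matching `</think>` tag; `strip_think` is a hypothetical name, not a LlamaIndex function.

```python
import re

# Non-greedy match so several separate <think> blocks are each removed
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text):
    """Remove <think>...</think> reasoning blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nOkay, let's see.\n</think>\n\nThe travel writer persona fits best."
clean = strip_think(raw)
print(clean)
```

In the notebook this could be applied to `response.response`. An unclosed `<think>` block (for example, in a truncated generation) is left untouched by this pattern.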

## Evaluation and observability

LlamaIndex provides **built-in evaluation tools to assess response quality.**
These evaluators leverage LLMs to analyze responses across different dimensions.
We can now check whether the response is faithful to the retrieved persona context, that is, whether the answer is supported by the source nodes rather than made up.

In [19]:
from llama_index.core.evaluation import FaithfulnessEvaluator

# query index
evaluator = FaithfulnessEvaluator(llm=llm)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing

True
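To make the idea of a faithfulness check more concrete, here is a conceptual sketch: a judge is asked whether the response is supported by the retrieved context, and the check passes if it is. This is not the `FaithfulnessEvaluator` implementation. `is_faithful` and `toy_judge` are hypothetical, and the word-overlap judge is only a stand-in for the LLM call a real evaluator would make.

```python
def is_faithful(response_text, contexts, judge):
    """Return True if the judge finds the response supported by any context."""
    return any(judge(response_text, context) for context in contexts)


def toy_judge(response_text, context):
    # Stand-in for an LLM verdict: "supported" means every word of the
    # response also appears in the context (far cruder than a real judge)
    response_words = set(response_text.lower().split())
    context_words = set(context.lower().split())
    return response_words <= context_words


contexts = [
    "an anthropologist with experience in cyprus",
    "a pulmonologist interested in respiratory education",
]

supported = is_faithful("a pulmonologist interested in education", contexts, toy_judge)
unsupported = is_faithful("a chef from paris", contexts, toy_judge)
print(supported, unsupported)
```

A `True` result therefore says the answer is grounded in what was retrieved. It does not say the retrieved personas were the right ones for the question.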

If these LLM-based evaluators do not give enough insight, we can inspect the full trace of a query with the Arize Phoenix tool, after creating an account at [LlamaTrace](https://llamatrace.com/login) and generating an API key.

In [34]:
# import llama_index
# import os

# PHOENIX_API_KEY = "<PHOENIX_API_KEY>"
# os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
# llama_index.core.set_global_handler(
#     "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
# )




In [36]:
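import os
from llama_index.core import set_global_handler

# Read the API key from the environment rather than hard-coding it
PHOENIX_API_KEY = os.getenv("PHOENIX_API_KEY")
if not PHOENIX_API_KEY:
    raise ValueError("Set the PHOENIX_API_KEY environment variable first")

# The key is sent to LlamaTrace through the OTLP exporter headers;
# a missing or rejected key shows up as 401 errors when spans are exported
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"

# Configure Phoenix
set_global_handler("arize_phoenix", endpoint="https://llamatrace.com/v1/traces")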



Now, we can query the index and see the response in the Arize Phoenix tool.

In [37]:
response = query_engine.query(
    "What is the name of someone who is interested in AI and technology?"
)
response

ERROR:opentelemetry.exporter.otlp.proto.http.trace_exporter:Failed to export span batch code: 401, reason: 

Response(response='<think>\nOkay, let\'s see. The user is asking for the name of someone who is interested in AI and technology. The context provided includes two personas: one is an anthropologist or cultural expert focused on Cyprus, and the other is an art historian and museum professional interested in 19th-century American art and Cincinnati\'s cultural heritage.\n\nHmm, neither of these personas mention AI or technology. The first persona is about Cypriot culture, history, and society. The second is about art history and local heritage in Cincinnati. There\'s no indication that either of them has an interest in AI or technology. The query is asking for a name, but the context doesn\'t provide any names. The personas are described as "someone" but not given specific names. \n\nWait, maybe the user is expecting me to infer a name from the context, but there\'s no name provided. The personas are just descriptions. The answer should be that there\'s no information in the context abou

We can then go to [LlamaTrace](https://llamatrace.com/login) and explore the process and response.

![arize-phoenix](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/arize.png)    