# Introduction to LlamaIndex
LlamaIndex is a complete toolkit for creating LLM-powered agents over your data using indexes and workflows. In this course we’ll focus on three main parts that help build agents in LlamaIndex: Components, Agents and Tools, and Workflows.

In [None]:
# !pip install llama-index-llms-huggingface-api llama-index-embeddings-huggingface

In [1]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
import os
from dotenv import load_dotenv

# Load the .env file
load_dotenv()

# Retrieve HF_TOKEN from the environment variables
hf_token = os.getenv("HF_TOKEN")

llm = HuggingFaceInferenceAPI(
    model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
    temperature=0.7,
    max_tokens=100,
    token=hf_token,
)

response = llm.complete("Hello, how are you?")
print(response)

Hello! I'm just a computer program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?


## What are components in LlamaIndex?
While LlamaIndex has many components, we’ll focus specifically on the QueryEngine component. Why? Because it can be used as a Retrieval-Augmented Generation (RAG) tool for an agent.

So, what is RAG? LLMs are trained on enormous bodies of data to learn general knowledge. However, they may not be trained on relevant and up-to-date data. RAG solves this problem by finding and retrieving relevant information from your data and giving that to the LLM.
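
To make the idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern in plain Python. It is only an illustration, not how LlamaIndex implements retrieval: the `DOCUMENTS` list, the word-overlap scoring, and the helper names `tokenize`, `retrieve`, and `build_prompt` are all assumptions made for this example. Real pipelines rank chunks using embeddings rather than shared words.

```python
import re

# Toy "knowledge base" (hypothetical data); in practice these are your own documents.
DOCUMENTS = [
    "The dinner party is scheduled for Saturday at 7 pm.",
    "Two guests are vegetarian and one guest is allergic to nuts.",
    "Last year's most popular menu was mushroom risotto.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved context in front of the question for the LLM."""
    context_block = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:"


query = "Which guests are vegetarian?"
context = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, context)
print(prompt)
```

The resulting prompt is what would be sent to the LLM: the model answers from the retrieved context instead of relying only on what it saw during training.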

Now, think about how Alfred works:

1. You ask Alfred to help plan a dinner party
2. Alfred needs to check your calendar, dietary preferences, and past successful menus
3. The `QueryEngine` helps Alfred find this information and use it to plan the dinner party

This makes the `QueryEngine` a **key component for building agentic RAG workflows** in LlamaIndex. Just as Alfred needs to search through your household information to be helpful, any agent needs a way to find and understand relevant data. The QueryEngine provides exactly this capability.

Now, let’s dive a bit deeper into the components and see how you can combine components to create a RAG pipeline.

## Creating a RAG pipeline using components

There are five key stages within RAG, which in turn will be a part of most larger applications you build. These are:

1. **Loading**: this refers to getting your data from where it lives — whether it’s text files, PDFs, another website, a database, or an API — into your workflow. LlamaHub provides hundreds of integrations to choose from.
2. **Indexing**: this means creating a data structure that allows for querying the data. For LLMs, this nearly always means creating vector embeddings, which are numerical representations of the meaning of the data. Indexing can also refer to numerous other metadata strategies that make it easy to accurately find contextually relevant data based on properties.
3. **Storing**: once your data is indexed you will want to store your index, as well as other metadata, to avoid having to re-index it.
4. **Querying**: for any given indexing strategy there are many ways you can utilize LLMs and LlamaIndex data structures to query, including sub-queries, multi-step queries and hybrid strategies.
5. **Evaluation**: a critical step in any flow is checking how effective it is relative to other strategies, or when you make changes. Evaluation provides objective measures of how accurate, faithful and fast your responses to queries are.


In [None]:
!pip install datasets

We will be using personas from the `dvilasuero/finepersonas-v0.1-tiny` dataset. This dataset contains 5K personas that will be attending the party!

Let's load the dataset and store it as files in the `data` directory.

In [1]:
from datasets import load_dataset
from pathlib import Path

dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")

Path("data").mkdir(parents=True, exist_ok=True)
for i, persona in enumerate(dataset):
    with open(Path("data") / f"persona_{i}.txt", "w") as f:
        f.write(persona["persona"])

## Loading and embedding documents

Before we can access our data, we need to load it. There are three ways to do this:
1. `SimpleDirectoryReader`: A built-in loader for various file types from a local directory.
2. `LlamaParse`: LlamaIndex’s official tool for PDF parsing, available as a managed API.
3. `LlamaHub`: A registry of hundreds of data-loading libraries to ingest data from any source.


The simplest way to load data is with `SimpleDirectoryReader`. This versatile component can load various file types from a folder and convert them into Document objects that LlamaIndex can work with. Let’s see how we can use SimpleDirectoryReader to load data from a folder.

In [2]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()
len(documents)

5000

After loading our documents, we need to break them into smaller pieces called Node objects. A Node is just a chunk of text from the original document that’s easier for the AI to work with, while it still has references to the original Document object.

The `IngestionPipeline` helps us create these nodes through two key transformations.

1. `SentenceSplitter` breaks down documents into manageable chunks by splitting them at natural sentence boundaries.
2. `HuggingFaceEmbedding` converts each chunk into numerical embeddings - vector representations that capture the semantic meaning in a way AI can process efficiently.
   
This process helps us organise our documents in a way that’s more useful for searching and analysis.
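
To get a feel for what splitting at sentence boundaries means, here is a rough sketch in plain Python. It is not the `SentenceSplitter` implementation: the function name `split_into_chunks`, the regex-based sentence detection, and the character-based `max_chars` limit are assumptions chosen to keep the example small.

```python
import re


def split_into_chunks(text: str, max_chars: int = 80) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk.
    """
    # Split after '.', '!' or '?' followed by whitespace (a crude sentence boundary).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = (
    "Alfred plans the dinner party. He checks the calendar first. "
    "Then he reviews dietary preferences. Finally he picks a menu."
)
chunks = split_into_chunks(text, max_chars=70)
for chunk in chunks:
    print(repr(chunk))
```

Because chunks are only ever cut between sentences, each piece stays readable on its own, which is the property that makes sentence-based splitting attractive for retrieval.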

In [10]:
from llama_index.core import Document
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_overlap=0),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

# run the pipeline asynchronously (use pipeline.run(...) for a synchronous call)
nodes = await pipeline.arun(documents=documents[:10])
nodes

[TextNode(id_='ca5711b6-9768-4232-a595-86553c979b6c', embedding=[-0.05761207640171051, 0.019227761775255203, 0.02992495335638523, 0.007753055077046156, -0.0121120261028409, -0.03004390560090542, 0.013232488185167313, -0.008162317797541618, -0.08974718302488327, -0.03916358947753906, -0.003696898929774761, -0.04986422508955002, -0.008050402626395226, 0.04177147522568703, -0.0032207658514380455, -0.010836731642484665, -0.005486790090799332, 0.08179862052202225, 0.027424100786447525, 0.003029109677299857, -0.018713850528001785, -0.06928347796201706, 0.06025458127260208, -0.034376759082078934, -0.023837808519601822, 0.026188673451542854, 0.036787547171115875, -0.01453328225761652, 0.032120831310749054, -0.12900204956531525, -0.04033482447266579, 0.0188222024589777, -0.017645327374339104, 0.01944986917078495, 0.06629305332899094, 0.013320490717887878, 0.0027979989536106586, 0.060686659067869186, 0.012141973711550236, 0.013918831013143063, 0.014468561857938766, 0.021381059661507607, -0.02834

## Storing and indexing documents

After creating our `Node` objects we need to index them to make them searchable, but before we can do that, we need a place to store our data.

Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it. In this case, we will use Chroma to store our documents.

In [None]:
!pip install llama-index-vector-stores-chroma

In [12]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection("alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        # SentenceSplitter(chunk_size=25, chunk_overlap=0),
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)
nodes = await pipeline.arun(documents=documents[:10])
len(nodes)

10

To search the stored nodes, we need a way to compare a query against them. This is where vector embeddings come in: by embedding both the query and the nodes in the same vector space, we can find relevant matches. The `VectorStoreIndex` handles this for us, using the same embedding model we used during ingestion to ensure consistency.
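
As a small illustration of matching in a shared vector space, the sketch below ranks a few nodes against a query using cosine similarity. The three-dimensional vectors and persona labels are made up for the example, and cosine similarity is only one common choice: the metric actually used depends on how the vector store is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings; real embedding models
# produce much longer vectors.
node_embeddings = {
    "persona about travel writing": [0.9, 0.1, 0.0],
    "persona about machine learning": [0.1, 0.9, 0.2],
    "persona about cooking": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]

# Sort nodes from most to least similar to the query.
ranked = sorted(
    node_embeddings.items(),
    key=lambda item: cosine_similarity(query_embedding, item[1]),
    reverse=True,
)
for text, vector in ranked:
    print(f"{cosine_similarity(query_embedding, vector):.3f}  {text}")
```

The comparison is only meaningful when the query and the nodes were embedded with the same model, which is why the same `embed_model` is passed at ingestion and at query time.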

Let’s see how to create this index from our vector store and embeddings:

In [13]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(
    vector_store, 
    embed_model=embed_model
)

All of this information is persisted automatically by the `ChromaVectorStore`, at the directory path we passed to the Chroma client.

Great! Now that we can save and load our index easily, let’s explore how to query it in different ways.

## Querying a VectorStoreIndex with prompts and LLMs
Before we can query our index, we need to convert it to a query interface. The most common conversion options are:

- `as_retriever`: For basic document retrieval, returning a list of NodeWithScore objects with similarity scores
- `as_query_engine`: For single question-answer interactions, returning a written response
- `as_chat_engine`: For conversational interactions that maintain memory across multiple messages, returning a written response using chat history and indexed context

We’ll focus on the query engine since it is more common for agent-like interactions. We also pass in an LLM to the query engine to use for the response.



In [18]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
response = query_engine.query(
    "Respond using a persona that describes author and travel experiences?"
)

response

Response(response='An anthropologist or cultural expert with a deep dive into Cypriot culture, history, and society, having spent significant time researching and living in Cyprus to understand its people, customs, and way of life.', source_nodes=[NodeWithScore(node=TextNode(id_='0aff1520-474a-47c3-8f25-5c8cc9bacff0', embedding=None, metadata={'file_path': '/Users/pandagan/workspace/projects/hugging_face_agent_course/unit_2/2.2_LlamaIndex/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-06-10', 'last_modified_date': '2025-06-10'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='3340a5df-67a7-4920-b42e-1dc79e610e94', node_type='4', metadat

> The language model won’t always perform in predictable ways, so we can’t be sure that the answer we get is always correct. We can deal with this by evaluating the quality of the answer.

## Evaluation and Observability

LlamaIndex provides **built-in evaluation tools to assess response quality**. These evaluators leverage LLMs to analyze responses across different dimensions. Let’s look at the three main evaluators available:

- `FaithfulnessEvaluator`: Evaluates the faithfulness of the answer by checking whether it is supported by the context.
- `AnswerRelevancyEvaluator`: Evaluates the relevance of the answer by checking whether it addresses the question.
- `CorrectnessEvaluator`: Evaluates the correctness of the answer by checking whether it is correct.
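
To illustrate the question a faithfulness check asks ("is the answer supported by the context?"), here is a deliberately crude stand-in in plain Python. LlamaIndex's evaluators use an LLM as the judge; this sketch instead uses word overlap. The names `ToyEvalResult` and `toy_faithfulness`, and the `threshold` parameter, are hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class ToyEvalResult:
    """Hypothetical result type: a score plus a pass/fail flag."""
    score: float
    passing: bool


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_faithfulness(
    answer: str, contexts: list[str], threshold: float = 0.8
) -> ToyEvalResult:
    """Score the fraction of answer words that also appear in the contexts."""
    answer_words = tokenize(answer)
    context_words = set().union(*(tokenize(c) for c in contexts))
    if not answer_words:
        return ToyEvalResult(score=0.0, passing=False)
    score = len(answer_words & context_words) / len(answer_words)
    return ToyEvalResult(score=score, passing=score >= threshold)


contexts = ["An anthropologist who spent years living in Cyprus studying its customs."]
supported = toy_faithfulness("An anthropologist living in Cyprus.", contexts)
unsupported = toy_faithfulness("A pilot from Norway who collects stamps.", contexts)
print(supported)
print(unsupported)
```

A lexical check like this misses paraphrases and cannot catch contradictions, which is why an LLM-based judge is used in practice.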

In [19]:
from llama_index.core.evaluation import FaithfulnessEvaluator

# query index
evaluator = FaithfulnessEvaluator(llm=llm)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing

True

If one of these LLM-based evaluators does not give enough context, we can inspect the response using the Arize Phoenix tool, after creating an account at [LlamaTrace](https://llamatrace.com/login) and generating an API key.

In [None]:
import llama_index.core
import os

PHOENIX_API_KEY = "<PHOENIX_API_KEY>"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix", 
    endpoint="https://llamatrace.com/v1/traces"
)

Now, we can query the index and see the response in the Arize Phoenix tool.



In [None]:
response = query_engine.query(
    "What is the name of someone who is interested in AI and technology?"
)
response

We can then go to [LlamaTrace](https://llamatrace.com/login) and explore the process and response.

![](../../image/arize.png)

We have seen how to use components to create a `QueryEngine`. Now, let’s see how we can use the **QueryEngine as a tool for an agent**!