# Getting Started with LlamaCloud Indexing

This notebook walks through the Index feature in LlamaCloud and shows how to build RAG and agentic applications over a knowledge base of documents.

In this example we use Apple and Tesla 10-K filings from 2019-2023. We show you how to use the `LlamaCloudIndex` class from `llama_cloud_services` for retrieval.

## Structure

**🚀 Getting Started with the Basics** 
Learn the fundamentals of LlamaCloudIndex:
1. **Index Creation**: Setting up a LlamaCloudIndex with 10 documents (5 Apple + 5 Tesla 10-Ks)
2. **Chunk Retrieval**: Standard retrieval for specific information
3. **Simple Query Engine**: Building basic RAG without complexity
4. **Basic Agent**: Simple agent with conversational capability

**🎯 Advanced Features**
Explore more sophisticated patterns:
1. **File-Level Retrieval**: Retrieving entire documents for comprehensive analysis
2. **Smart Agent**: Agent that can choose between chunk and file retrieval
3. **Query Engine with Citations**: Enhanced RAG with source citations

**Status:**
| Last Updated | Version | State      |
|--------------|---------|------------|
| Sep-02-2025  | 0.6.62  | Active     |


## Setup and Installation

First, let's install the required packages and set up our environment.


In [None]:
# Install required packages
%pip install "llama-index>=0.13.0,<0.14.0" llama-cloud-services
%pip install llama-index-llms-openai llama-index-embeddings-openai

## API Keys Setup

Set your API keys. You'll need:
- **LlamaCloud API Key**: Get from [cloud.llamaindex.ai](https://cloud.llamaindex.ai)
- **OpenAI API Key**: For LLM and embedding models


In [None]:
import os

# Set your API keys
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."
os.environ["OPENAI_API_KEY"] = "sk-..."
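
Before moving on, it can help to confirm that both keys are actually set and are not still the `"llx-..."` / `"sk-..."` placeholders. The helper below is a minimal sketch using only the standard library; `missing_keys` is a hypothetical name introduced here, not part of any LlamaCloud or OpenAI package.

```python
import os

REQUIRED_KEYS = ("LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY")


def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the names of required variables that are unset or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = env.get(name, "")
        # Treat empty values and the "..." placeholders from the cell above as missing
        if not value or value.endswith("..."):
            missing.append(name)
    return missing


print("Missing or placeholder keys:", missing_keys())
```

An empty list means both variables look usable; it does not prove the keys are valid, only that they were filled in.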

## Data Download

Let's download the Apple and Tesla 10-K filings for 2019-2023. These are publicly available SEC filings that provide comprehensive annual business information.


In [None]:
# Create the target directory first (wget -O fails if it does not exist)
!mkdir -p data

# Download Apple 10-K filings (2019-2023)
print("Downloading Apple 10-K filings...")
!wget "https://s2.q4cdn.com/470004039/files/doc_earnings/2023/q4/filing/_10-K-Q4-2023-As-Filed.pdf" -O data/apple_2023.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2022/q4/_10-K-2022-(As-Filed).pdf" -O data/apple_2022.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf" -O data/apple_2021.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2020/ar/_10-K-2020-(As-Filed).pdf" -O data/apple_2020.pdf
!wget "https://www.dropbox.com/scl/fi/i6vk884ggtq382mu3whfz/apple_2019_10k.pdf?rlkey=eudxh3muxh7kop43ov4bgaj5i&dl=1" -O data/apple_2019.pdf

# Download Tesla 10-K filings (2019-2023)
print("Downloading Tesla 10-K filings...")
!wget "https://ir.tesla.com/_flysystem/s3/sec/000162828024002390/tsla-20231231-gen.pdf" -O data/tesla_2023.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000095017023001409/tsla-20221231-gen.pdf" -O data/tesla_2022.pdf
!wget "https://www.dropbox.com/scl/fi/ptk83fmye7lqr7pz9r6dm/tesla_2021_10k.pdf?rlkey=24kxixeajbw9nru1sd6tg3bye&dl=1" -O data/tesla_2021.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000156459021004599/tsla-10k_20201231-gen.pdf" -O data/tesla_2020.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000156459020004475/tsla-10k_20191231-gen_0.pdf" -O data/tesla_2019.pdf

print("\nDownload complete! Files available:")
!ls -la data/
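
A download can silently save an HTML error page under a `.pdf` name, for example when a link has moved. The sketch below checks that each file starts with the `%PDF-` marker that PDF files normally begin with. `find_invalid_pdfs` is a hypothetical helper, and the check is a quick sanity test rather than full PDF validation.

```python
from pathlib import Path


def find_invalid_pdfs(directory="data"):
    """Return names of .pdf files that are empty or do not start with %PDF-."""
    invalid = []
    for path in sorted(Path(directory).glob("*.pdf")):
        with path.open("rb") as fh:
            header = fh.read(5)
        # A well-formed PDF normally begins with the bytes "%PDF-"
        if header != b"%PDF-":
            invalid.append(path.name)
    return invalid


print("Files that do not look like PDFs:", find_invalid_pdfs("data"))
```

Any file listed here is worth re-downloading before it is uploaded to the index.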

## 1. Creating a LlamaCloudIndex

Now we'll create a LlamaCloudIndex by uploading our documents. LlamaCloudIndex provides a managed indexing solution that handles:
- Document parsing and chunking
- Embedding generation
- Hybrid search (dense + sparse)
- Advanced retrieval with reranking

In [None]:
from llama_cloud_services import LlamaCloudIndex

# Create an empty index.
# Replace organization_id below with your own LlamaCloud organization ID.
index = await LlamaCloudIndex.acreate_index(
    name="apple_tesla_10k",
    project_name="Default",
    organization_id="43b88c8f-e488-46f6-9013-698e3d2e374a",
)

In [None]:
print(f"Index ID: {index.id}")

Index ID: 250a1689-712e-4698-8b5d-9c973cb05ef1


Now with our empty index, we can upload our documents.

In [None]:
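# Upload only the downloaded PDFs, in a stable order
paths = sorted(p for p in os.listdir("data") if p.endswith(".pdf"))

file_ids = []
for path in paths:
    # Start each upload without blocking on ingestion; we wait for all files below
    file_ids.append(
        await index.aupload_file(os.path.join("data", path), wait_for_ingestion=False)
    )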

print(
    f"Uploaded {len(file_ids)} files. Waiting for completion... (this may take a few minutes)"
)
await index.await_for_completion(file_ids=file_ids)

Uploaded 10 files. Waiting for completion... (this may take a few minutes)


ManagedIngestionStatusResponse(job_id=None, deployment_date=None, status=<ManagedIngestionStatus.SUCCESS: 'SUCCESS'>, error=None, effective_at=datetime.datetime(2025, 9, 2, 20, 3, 31, 983235))
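
The upload loop above sends files one at a time. If you have many files, one option is to run several uploads at once with a cap on how many are in flight. The sketch below shows the generic `asyncio` pattern with a stand-in coroutine. `run_limited` and `fake_upload` are hypothetical names, and whether `aupload_file` tolerates concurrent calls is an assumption this notebook does not confirm.

```python
import asyncio


async def run_limited(items, worker, limit=3):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as `items`
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_upload(path):
    # Stand-in for a real upload call; returns a made-up ID
    await asyncio.sleep(0.01)
    return f"id-for-{path}"
```

In a notebook you would call `await run_limited(paths, fake_upload)` directly; in a plain script, wrap the call in `asyncio.run(...)`.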

## 🚀 Getting Started with the Basics

Now that we have our index, let's explore the fundamental patterns for building RAG applications.

### Chunk-Level Retrieval

Let's start with the most common retrieval pattern: finding relevant chunks of text from our documents.
The chunk-level retriever performs hybrid search (dense + sparse) with reranking to find the most relevant pieces of information.
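
To build some intuition for hybrid search, here is a toy sketch of reciprocal rank fusion (RRF), one common way to merge a dense ranking and a sparse ranking into a single list. This is an illustration only: the notebook does not say how LlamaCloud fuses results internally, the chunk IDs are made up, and the reranking step (which typically uses a separate model) is not modeled here.

```python
def reciprocal_rank_fusion(dense_ids, sparse_ids, k=60, limit=3):
    """Fuse two ranked ID lists with reciprocal rank fusion and keep `limit` IDs."""
    scores = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); items found in both lists add up
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the result is deterministic
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:limit]]


# Made-up chunk IDs: five dense hits and five sparse hits, two of them shared
dense = ["c1", "c2", "c3", "c4", "c5"]
sparse = ["c3", "c6", "c1", "c7", "c8"]
print(reciprocal_rank_fusion(dense, sparse))
```

Chunks that appear in both rankings rise to the top, which is why two lists of five candidates can still produce a short, high-confidence final list.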


In [None]:
# Create a chunk-level retriever
chunk_retriever = index.as_retriever(
    dense_similarity_top_k=5,
    sparse_similarity_top_k=5,
    enable_reranking=True,
    rerank_top_n=3,
)

# Example query
query = "What are the main revenue sources for Apple in 2023?"
chunk_nodes = await chunk_retriever.aretrieve(query)

print(f"Retrieved {len(chunk_nodes)} chunks for query: '{query}'\n")

for i, node in enumerate(chunk_nodes):
    print(f"**Chunk {i+1}** (Score: {node.score:.3f})")
    print(f"Source: {node.metadata.get('file_name', 'Unknown')}")
    print(f"Content: {node.text[:300]}...\n")

Retrieved 3 chunks for query: 'What are the main revenue sources for Apple in 2023?'

**Chunk 1** (Score: 0.875)
Source: apple_2023.pdf
Content: Apple Inc. | 2023 Form 10-K | 35

Net sales disaggregated by significant products and services for 2023, 2022 and 2021 were as follows (in millions):

|                                     | 2023      | 2022      | 2021      |
| ----------------------------------- | --------- | --------- | ---------...

**Chunk 2** (Score: 0.777)
Source: apple_2023.pdf
Content: Services

# Advertising

The Company’s advertising services include third-party licensing arrangements and the Company’s own advertising platforms.

# AppleCare

The Company offers a portfolio of fee-based service and support products under the AppleCare® brand. The offerings provide priority access...

**Chunk 3** (Score: 0.769)
Source: apple_2023.pdf
Content: Apple Inc.
# Notes to Consolidated Financial Statements

# Note 1 – Summary of Significant Accounting Policies

# Basis of Pres
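
When a query returns several chunks, it is often useful to see which files they came from at a glance. The sketch below groups retrieved nodes by `file_name` and reports the chunk count and best score per file. `summarize_by_file` is a hypothetical helper; the sample objects are stand-ins with made-up values that only mimic the `metadata` and `score` attributes used in the cell above, and the helper assumes every score is a number.

```python
from collections import defaultdict
from types import SimpleNamespace


def summarize_by_file(nodes):
    """Map each source file to (chunk count, best score) for retrieved nodes."""
    grouped = defaultdict(list)
    for node in nodes:
        file_name = node.metadata.get("file_name", "Unknown")
        grouped[file_name].append(node.score)
    return {name: (len(scores), max(scores)) for name, scores in grouped.items()}


# Stand-in nodes with illustrative values (not real retrieval results)
sample_nodes = [
    SimpleNamespace(metadata={"file_name": "apple_2023.pdf"}, score=0.875),
    SimpleNamespace(metadata={"file_name": "apple_2023.pdf"}, score=0.777),
    SimpleNamespace(metadata={"file_name": "apple_2022.pdf"}, score=0.5),
]
print(summarize_by_file(sample_nodes))
```

With real results you would pass `chunk_nodes` instead of `sample_nodes`.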

### Simple Query Engine

Now let's build a basic RAG query engine that combines retrieval with response generation.


In [None]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.openai import OpenAI

# Set up LLM
llm = OpenAI(model="gpt-5-mini")

# Create a simple query engine
query_engine = RetrieverQueryEngine.from_args(
    chunk_retriever, llm=llm, response_mode="compact"
)

print("🤖 **Simple RAG Query Engine Created**\n")

🤖 **Simple RAG Query Engine Created**



Let's test our query engine with some basic questions about Apple and Tesla:


In [None]:
# Test with simple questions
basic_queries = [
    "What was Apple's total revenue in 2023?",
    "What are Tesla's main products?",
]

for query in basic_queries:
    print(f"**Q: {query}**")
    response = await query_engine.aquery(query)
    print(f"**A:** {response}\n")
    for source in response.source_nodes:
        print(
            f"""**Top source:**
{source.node.metadata}
"""
        )
    print("-" * 60 + "\n")

**Q: What was Apple's total revenue in 2023?**
**A:** Apple's total revenue (net sales) in 2023 was $383.3 billion (approximately $383,285 million).

**Top source:**
{'id': 'apple_2023.pdf', 'file_size': 714094, 'last_modified_at': '2025-09-02T19:59:21', 'file_path': 'apple_2023.pdf', 'file_name': 'apple_2023.pdf', 'external_file_id': 'apple_2023.pdf', 'file_id': '960f2aa6-4113-41b3-9ff6-1a66ee83b84a', 'pipeline_file_id': '04eee09b-2cca-482d-8540-34ad5151e24d', 'pipeline_id': '250a1689-712e-4698-8b5d-9c973cb05ef1', 'page_label': 23, 'start_page_index': 22, 'start_page_label': 23, 'end_page_index': 22, 'end_page_label': 23, 'document_id': 'efc64ba579b1f5b4a28c92a110591bdbb6b33843fab096f5d8', 'start_char_idx': 206312, 'end_char_idx': 209383}

**Top source:**
{'id': 'apple_2023.pdf', 'file_size': 714094, 'last_modified_at': '2025-09-02T19:59:21', 'file_path': 'apple_2023.pdf', 'file_name': 'apple_2023.pdf', 'external_file_id': 'apple_2023.pdf', 'file_id': '960f2aa6-4113-41b3-9ff6-1a66ee83
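
The raw metadata dictionaries above are hard to scan. A small formatter can reduce each one to a file name and page range. This sketch relies only on keys visible in the output above (`file_name`, `start_page_label`, `end_page_label`); `format_source` is a hypothetical helper name.

```python
def format_source(metadata):
    """Build a short 'file (p. N)' label from a source node's metadata dict."""
    file_name = metadata.get("file_name", "unknown")
    start = metadata.get("start_page_label")
    end = metadata.get("end_page_label")
    if start is None:
        # No page information available; fall back to the file name alone
        return file_name
    if end is None or end == start:
        return f"{file_name} (p. {start})"
    return f"{file_name} (pp. {start}-{end})"


example = {"file_name": "apple_2023.pdf", "start_page_label": 23, "end_page_label": 23}
print(format_source(example))
```

Inside the query loop, you could print `format_source(source.node.metadata)` in place of the full dictionary.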

### Basic Agent

Let's create a simple agent that can have conversations and maintain memory while answering questions about our documents.


In [None]:
from llama_index.core.tools import QueryEngineTool
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.memory import Memory

# Create a tool for the agent - single tool for both companies
document_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="document_search",
    description="Search and analyze Apple and Tesla 10-K filings (2019-2023). Use for questions about either company's business, financials, products, or strategy.",
)

# Create a memory buffer; it is passed to agent.run() on each turn
memory = Memory.from_defaults(token_limit=40000, token_flush_size=2000)

# Create the agent using FunctionAgent
agent = FunctionAgent(
    tools=[document_tool],
    llm=llm,
    system_prompt="You are a helpful assistant that can search and analyze financial documents from Apple and Tesla. Use the available tools to answer questions accurately.",
)

print("🤖 **Basic Agent Ready!**")
print("The agent can answer questions about both Apple and Tesla using FunctionAgent.")

🤖 **Basic Agent Ready!**
The agent can answer questions about both Apple and Tesla using FunctionAgent.


Let's test the agent with a conversational flow where it remembers previous questions:


In [None]:
# Example conversation showing the agent capabilities
conversation = [
    "What was Apple's revenue in 2023?",
    "How does that compare to Tesla's revenue?",
    "What do you think is the main reason for the difference?",
]

print("💬 **Agent Conversation**\n")


async def run_conversation():
    for i, query in enumerate(conversation, 1):
        print(f"**Turn {i}: {query}**")
        response = await agent.run(query, memory=memory)
        print(f"**Agent:** {response}\n")
        print("-" * 60 + "\n")


# Run the conversation
await run_conversation()

💬 **Agent Conversation**

**Turn 1: What was Apple's revenue in 2023?**
**Agent:** Apple’s revenue (total net sales) for fiscal 2023 was $383,285 million — about $383.3 billion (per Apple’s 2023 Form 10-K).  

Do you want a breakdown by product category or region?

------------------------------------------------------------

**Turn 2: How does that compare to Tesla's revenue?**
**Agent:** Tesla’s total revenue for 2023 was $96,773 million (about $96.8 billion).

Comparison:
- Apple (fiscal 2023): $383,285 million (~$383.3B)  
- Tesla (calendar 2023): $96,773 million (~$96.8B)

- Apple’s revenue was about 3.96× Tesla’s (Apple ≈ $286.5B more).  
- Put another way, Tesla’s 2023 revenue was ≈25.3% of Apple’s, and Apple’s revenue was ≈296% higher than Tesla’s.

Note: Apple reports a fiscal year ending Sept 30, 2023; Tesla’s is calendar year ending Dec 31, 2023. Want a year‑over‑year growth comparison or breakdown by product/segment?

--------------------------------------------------------
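
Since the agent does arithmetic on retrieved figures, it is worth rechecking the numbers. The sketch below recomputes the comparison from the two revenue figures quoted in the conversation above. The ratio, the difference, and the "percent higher" figure match the agent's answer; Tesla's share of Apple's revenue comes out nearer 25.2% than the 25.3% the agent quoted.

```python
apple_revenue_musd = 383_285  # Apple fiscal 2023 net sales, in $ millions
tesla_revenue_musd = 96_773  # Tesla 2023 total revenue, in $ millions

ratio = apple_revenue_musd / tesla_revenue_musd
difference_musd = apple_revenue_musd - tesla_revenue_musd
tesla_share_pct = 100 * tesla_revenue_musd / apple_revenue_musd
apple_excess_pct = 100 * (ratio - 1)

print(f"Apple / Tesla ratio:      {ratio:.2f}x")
print(f"Difference:               ${difference_musd:,} million")
print(f"Tesla as share of Apple:  {tesla_share_pct:.1f}%")
print(f"Apple higher than Tesla:  {apple_excess_pct:.0f}%")
```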

## 🎯 Advanced Features

Now let's explore more sophisticated patterns that showcase the full power of LlamaCloudIndex.

### File-Level Retrieval

For questions requiring comprehensive document context, file-level retrieval returns entire documents rather than just chunks.


In [None]:
# Create a file-level retriever
file_retriever = index.as_retriever(
    retrieval_mode="files_via_content", files_top_k=1  # or "files_via_metadata"
)

# Example query that benefits from full document context
query = "Give me a comprehensive analysis of Tesla's financials in 2023."
file_nodes = await file_retriever.aretrieve(query)

print(f"Retrieved {len(file_nodes)} documents for query: '{query}'\n")

for i, node in enumerate(file_nodes):
    print(f"**Document {i+1}**")
    print(f"Source: {node.metadata.get('file_name', 'Unknown')}")
    print(f"Content length: {len(node.text):,} characters")
    print(f"First 300 chars: {node.text[:300]}...\n")

Retrieved 1 documents for query: 'Give me a comprehensive analysis of Tesla's financials in 2023.'

**Document 1**
Source: tesla_2023.pdf
Content length: 536,235 characters
First 300 chars: 
UNITED STATES SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549 FORM 10-K

# FORM 10-K

(Mark One)

x ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934

For the fiscal year ended December 31, 2023

OR

o TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF...
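
A document of 536,235 characters is large relative to a single LLM prompt. The back-of-the-envelope sketch below estimates its token count and how many prompt-sized pieces it would need. Both the 4-characters-per-token ratio and the 32,000-token budget are rough assumptions for illustration, not properties of any particular model or tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic for English text, not an exact tokenizer count


def estimate_tokens(num_chars, chars_per_token=CHARS_PER_TOKEN):
    """Approximate the token count of a text from its character count."""
    return math.ceil(num_chars / chars_per_token)


def estimate_pieces(num_chars, budget_tokens, chars_per_token=CHARS_PER_TOKEN):
    """Estimate how many budget-sized pieces the text would be split into."""
    return math.ceil(estimate_tokens(num_chars, chars_per_token) / budget_tokens)


tesla_2023_chars = 536_235  # content length reported in the output above
print("Approximate tokens:", estimate_tokens(tesla_2023_chars))
print("Pieces at a 32,000-token budget:", estimate_pieces(tesla_2023_chars, 32_000))
```

This is why file-level results are usually paired with a response strategy that summarizes in several passes rather than stuffing the whole document into one prompt.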



### Smart Agent with Multiple Retrieval Strategies

Let's create an advanced agent that can choose between chunk-level and file-level retrieval based on the question type.

This agent will have two tools - one for specific factual queries (chunk-level) and one for comprehensive analysis (file-level).


In [None]:
fast_llm = OpenAI(model="gpt-4o-mini", temperature=0.1)

# Create query engines for both retrieval strategies
chunk_query_engine = RetrieverQueryEngine.from_args(
    chunk_retriever, llm=fast_llm, response_mode="compact"
)

file_query_engine = RetrieverQueryEngine.from_args(
    file_retriever,
    llm=fast_llm,
    response_mode="tree_summarize",  # Better for large documents
)

# Create tools for the smart agent
smart_tools = [
    QueryEngineTool.from_defaults(
        query_engine=chunk_query_engine,
        name="specific_search",
        description="Search for specific facts, numbers, or targeted information from Apple and Tesla 10-K filings. Best for precise factual queries.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=file_query_engine,
        name="comprehensive_analysis",
        description="Perform comprehensive document-level analysis requiring full context. Use for complex questions, comparisons, or broad business analysis.",
    ),
]

# Create the smart agent using FunctionAgent
smart_agent = FunctionAgent(
    tools=smart_tools,
    llm=llm,
    system_prompt="You are an intelligent assistant with access to two search strategies. Use 'specific_search' for targeted factual queries and 'comprehensive_analysis' for complex analysis requiring full document context. Choose the appropriate tool based on the question type.",
)

print("🧠 **Smart Agent Ready!**")
print("The agent can choose between:")
print("  - Specific search (chunk-level) for targeted queries")
print("  - Comprehensive analysis (file-level) for complex questions")

🧠 **Smart Agent Ready!**
The agent can choose between:
  - Specific search (chunk-level) for targeted queries
  - Comprehensive analysis (file-level) for complex questions


Let's test queries that should trigger different retrieval strategies:


In [None]:
from llama_index.core.agent import ToolCall, ToolCallResult, AgentStream

# Test different types of queries to see tool selection
test_queries = [
    "What was Apple's exact revenue in 2023?",  # Should use specific_search
    "Analyze Tesla's overall business transformation from 2022 to 2023",  # Should use comprehensive_analysis
]

print("🧠 **Smart Agent Tool Selection Test**\n")


async def run_smart_tests():
    smart_memory = Memory.from_defaults(token_limit=40000, token_flush_size=2000)
    for i, query in enumerate(test_queries, 1):
        print(f"**Query {i}: {query}**")
        handler = smart_agent.run(query, memory=smart_memory)
        async for ev in handler.stream_events():
            if isinstance(ev, ToolCall):
                print(f"Call {ev.tool_name} with args {ev.tool_kwargs}")
            elif isinstance(ev, ToolCallResult):
                print(f"Returned: {ev.tool_output}")
            elif isinstance(ev, AgentStream):
                print(ev.delta, end="", flush=True)

        response = await handler
        # Response already printed in the stream
        # print(f"**Agent:** {response}\n")
        print("\n" + "=" * 80 + "\n")


# Run the smart agent tests
await run_smart_tests()

🧠 **Smart Agent Tool Selection Test**

**Query 1: What was Apple's exact revenue in 2023?**
Call specific_search with args {'input': 'Apple 2023 total revenue exact amount (from Apple 10-K or Form 10-K for fiscal year 2023)'}
Returned: The total revenue for Apple in fiscal year 2023 was $383.3 billion.
Call specific_search with args {'input': 'Exact total net sales reported by Apple in fiscal year 2023 in its 2023 Form 10-K (full figure in dollars)'}
Returned: $383,285,000,000
Apple's total net sales (fiscal year 2023, year ended Sept. 30, 2023) were $383,285,000,000 (about $383.3 billion), per Apple’s 2023 Form 10-K.

**Query 2: Analyze Tesla's overall business transformation from 2022 to 2023**
Call comprehensive_analysis with args {'input': "Analyze Tesla's overall business transformation from 2022 to 2023 using Tesla's 2023 Form 10-K and relevant figures: changes in revenue, gross margin, operating income, net income, vehicle production and deliveries, energy and storage, services 

### Query Engine with Citations

For transparency and verification, let's create a query engine that provides inline citations linking back to source documents.


In [None]:
# Define citation components
from typing import List, Optional
from llama_index.core import QueryBundle
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore


class NodeCitationProcessor(BaseNodePostprocessor):
    """Add node_id to metadata for citation linking."""

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        for node_with_score in nodes:
            node_with_score.node.metadata["node_id"] = node_with_score.node.node_id
        return nodes


# Citation system prompt
SYSTEM_CITATION_PROMPT = """You are answering questions using information from a knowledge base.

IMPORTANT: You must cite your sources by adding [cite:NODE_ID] after each fact or claim.

The NODE_ID is the unique identifier found in each document chunk's metadata.

Example:
- If you reference information from node "abc123", write: [cite:abc123]
- Multiple facts from the same source: "Revenue was $100B [cite:abc123] and grew 15% [cite:abc123]"
- Different sources: "Apple grew 10% [cite:node1] while Tesla grew 20% [cite:node2]"

Always cite every factual claim - this ensures transparency and allows readers to verify information."""

In [None]:
# Create query engine with citations
# Note: a system prompt attached to the LLM is applied only when the LLM is used inside a query engine
citation_llm = OpenAI(model="gpt-5-mini", system_prompt=SYSTEM_CITATION_PROMPT)

citation_query_engine = RetrieverQueryEngine.from_args(
    chunk_retriever,
    llm=citation_llm,
    response_mode="tree_summarize",
    node_postprocessors=[NodeCitationProcessor()],
)

# Function to process citations and create readable links
import re


def process_citations_with_sources(response) -> str:
    content = str(response)
    source_nodes = response.source_nodes

    # Create a lookup: citation_id -> file info
    id_to_metadata = {str(node.id_): node.metadata for node in source_nodes}

    # Track citation order and assign human-friendly numbers
    citation_order = {}
    citation_counter = 1

    def replace(match):
        nonlocal citation_counter
        citation_id = match.group(1).strip()

        if citation_id not in citation_order:
            citation_order[citation_id] = citation_counter
            citation_counter += 1
        number = citation_order[citation_id]

        return f"({number})"

    # Replace citations with numbered references
    # Support both old [citation:id]() and new [cite:id] formats
    citation_regex_old = re.compile(r"\[citation:([^\]]+)\]\(\)")
    citation_regex_new = re.compile(r"\[cite:([^\]]+)\]")
    content = citation_regex_old.sub(replace, content)
    content = citation_regex_new.sub(replace, content)

    # Remove incomplete (truncated) citation tags in either format
    incomplete_regex = re.compile(r"\[(?:citation|cite):[^\]]*$")
    content = incomplete_regex.sub("", content)

    # Add references section if there are citations
    if citation_order:
        content += "\n\n**References:**\n"
        for citation_id, number in sorted(citation_order.items(), key=lambda x: x[1]):
            metadata = id_to_metadata.get(citation_id, {})
            file_name = metadata.get("file_name", "unknown")
            page_number = metadata.get("page_label", "unknown")
            content += f"{number}. {file_name} (Page {page_number})\n"

    return content


print("📚 **Citations Query Engine Ready!**")

📚 **Citations Query Engine Ready!**


In [None]:
# Test queries with citations
citation_queries = ["How did Tesla's automotive sales perform in 2023?"]

print("📚 **Query Engine with Citations**\n")

for query in citation_queries:
    print(f"**Q: {query}**")
    response = await citation_query_engine.aquery(query)

    # Process and display the response with citations
    content_with_citations = process_citations_with_sources(response)
    print(f"**A:** {content_with_citations}\n")
    print("-" * 60 + "\n")

📚 **Query Engine with Citations**

**Q: How did Tesla's automotive sales perform in 2023?**
**A:** Automotive sales revenue rose $11.30 billion (17%) in 2023 versus 2022 (1).  
The increase was driven primarily by an additional 473,382 combined Model 3 and Model Y cash deliveries from the global ramp of Model Y (1).  
This revenue growth was partially offset by a lower average selling price, sales mix effects, and a negative impact from a stronger U.S. dollar (1).  
Cost of automotive sales rose $15.52 billion (31%) in 2023, largely in line with higher deliveries, though average combined cost per unit declined due to sales mix, lower inbound freight and material costs, better fixed cost absorption, and IRA manufacturing credits (2).  
As a result, total automotive gross margin fell from 28.5% to 19.4% year-over-year, primarily because of the lower average selling price partially offset by the favorable change in cost per unit and IRA credits (2).  
Separately, automotive leasing revenu
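
An LLM can occasionally emit a citation ID that matches none of the retrieved nodes. The sketch below scans an answer for `[cite:...]` tags and lists any IDs that are not in a known set, so they can be flagged instead of showing up as "unknown" references. `find_unknown_citations` is a hypothetical helper, and the sample text and IDs are made up.

```python
import re

CITE_PATTERN = re.compile(r"\[cite:([^\]]+)\]")


def find_unknown_citations(answer_text, known_node_ids):
    """Return cited IDs (in first-seen order) that match no retrieved node."""
    known = set(known_node_ids)
    unknown = []
    for match in CITE_PATTERN.finditer(answer_text):
        cited = match.group(1).strip()
        if cited not in known and cited not in unknown:
            unknown.append(cited)
    return unknown


sample_answer = "Revenue grew [cite:abc123] while margin fell [cite:zzz999]."
print(find_unknown_citations(sample_answer, ["abc123"]))
```

With a real response, the known IDs would come from the response's source nodes, and the check should run on the raw answer text, before the tags are replaced with numbered references.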

## Summary

In this notebook, we've explored LlamaCloudIndex capabilities in a structured way:

### 🚀 Basics Covered:
1. **✅ Index Creation**: Built a managed index with 10 financial documents
2. **✅ Chunk Retrieval**: Standard retrieval for specific information
3. **✅ Simple Query Engine**: Basic RAG without complexity
4. **✅ Basic Agent**: FunctionAgent with conversational capability

### 🎯 Advanced Features:
1. **✅ File-Level Retrieval**: Comprehensive document-level analysis
2. **✅ Smart Agent**: Intelligent tool selection between retrieval strategies
3. **✅ Citations**: Enhanced transparency with source linking

### Key Takeaways:

- **Progressive Learning**: Start simple, then add complexity
- **LlamaCloudIndex** provides enterprise-grade document search
- **Multiple Retrieval Modes** optimize for different query types
- **FunctionAgent Integration** uses the LLM's native function calling for efficiency
- **Citations** add transparency and trust

### Next Steps:

- Experiment with different retrieval parameters
- Try other document types or domains
- Build custom workflows and integrations
- Explore advanced agent patterns

For more examples and documentation:
- [LlamaCloudIndex Documentation](https://docs.cloud.llamaindex.ai)
- [LlamaIndex Core Documentation](https://docs.llamaindex.ai)
