# LlamaIndex framework
> LlamaIndex is the leading data framework for building LLM applications.

## 1. Installation

### Default installation for OpenAI integration

```bash
pip install llama-index
```
This installs the following packages:
- llama-index-core
- llama-index-llms-openai
- llama-index-embeddings-openai
- llama-index-program-openai
- llama-index-question-gen-openai
- llama-index-agent-openai
- llama-index-readers-file
- llama-index-multi-modal-llms-openai

**LLAMA_INDEX_CACHE_DIR**: environment variable that sets the cache directory, for example `export LLAMA_INDEX_CACHE_DIR=/path/to/cache`.
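
The variable can also be set from Python. The sketch below assumes it runs before `llama_index` is imported, which is the safe ordering in case the cache path is resolved early. The directory name `llama_index_cache` is a made-up example.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical cache location; any writable directory works.
cache_dir = Path(tempfile.gettempdir()) / "llama_index_cache"
cache_dir.mkdir(parents=True, exist_ok=True)

# Set the variable before importing llama_index so the setting is in place
# whenever the library looks up its cache directory.
os.environ["LLAMA_INDEX_CACHE_DIR"] = str(cache_dir)

print(os.environ["LLAMA_INDEX_CACHE_DIR"])
```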

### Custom installation for local LLMs

```bash
pip install llama-index-core llama-index-readers-file llama-index-llms-ollama llama-index-embeddings-huggingface
```
This setup runs local LLMs through Ollama and computes embeddings with Hugging Face models.

## 2. Load data and build an in-memory index

In [None]:
# !pip install llama-index
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load data from the directory
documents = SimpleDirectoryReader("data").load_data()
print(documents)
# Load only files with a given extension, e.g. ".pdf"
documents1 = SimpleDirectoryReader("data", required_exts=[".pdf"]).load_data()
print(documents1)
# Load specific files; when `input_files` is given, it overrides `input_dir`.
documents2 = SimpleDirectoryReader(input_dir="data", input_files=["/path/to/data/requirements.txt"]).load_data()
print(documents2)

# Build index
index = VectorStoreIndex.from_documents(documents)
print(index)
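
Before loading, it can help to preview which files a directory contains. The helper below is a rough sketch that only approximates the reader's selection: by default `SimpleDirectoryReader` skips hidden files and does not recurse into subdirectories, and the sketch mirrors just those two rules. The name `list_candidate_files` is hypothetical and not part of LlamaIndex.

```python
from pathlib import Path


def list_candidate_files(input_dir, required_exts=None):
    """Return the files directly inside input_dir, optionally filtered by extension.

    Hidden files and subdirectories are skipped.
    """
    exts = {e.lower() for e in required_exts} if required_exts else None
    files = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue
        if exts is None or path.suffix.lower() in exts:
            files.append(path)
    return files


if __name__ == "__main__":
    data_dir = Path("data")
    if data_dir.is_dir():
        for f in list_candidate_files(data_dir, required_exts=[".pdf"]):
            print(f)
    else:
        print("No 'data' directory found")
```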

## 3. Query your data



In [None]:
query_engine = index.as_query_engine()

# Ask a question about the indexed documents
response = query_engine.query("Which company is `688396.SH`?")
print(response)
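
The response object also carries the retrieved chunks in `response.source_nodes`, each with a `score` and a `node`. The sketch below formats them for inspection. It relies only on those attributes, so the demo uses stand-in objects rather than a real index. `summarize_sources` is a hypothetical helper, and `file_name` is the metadata key that `SimpleDirectoryReader` attaches to loaded documents.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars=80):
    """Return one line per retrieved chunk: score, source file, text preview."""
    lines = []
    for item in getattr(response, "source_nodes", []):
        node = item.node
        file_name = node.metadata.get("file_name", "<unknown>")
        preview = node.get_content().strip().replace("\n", " ")[:max_chars]
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{score}  {file_name}  {preview}")
    return lines


# Stand-in objects that mimic the attributes used above, so the sketch runs
# without an index or an API key. With LlamaIndex, pass the real response.
fake_node = SimpleNamespace(
    metadata={"file_name": "report.pdf"},
    get_content=lambda: "First line\nsecond line",
)
fake_response = SimpleNamespace(
    source_nodes=[SimpleNamespace(node=fake_node, score=0.8123)]
)

for line in summarize_sources(fake_response):
    print(line)
```

With a real query, `summarize_sources(response)` shows which files the answer was drawn from.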

## 4. Logging - Queries and Events

In [None]:
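import logging
import sys

# Use logging.DEBUG for verbose output of queries and events
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# basicConfig already attaches a stdout handler to the root logger;
# adding a second StreamHandler would print every message twice.
logging.basicConfig(stream=sys.stdout, level=logging.WARNING)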

## 5. Storing your index locally

By default, the data you just loaded is stored in memory as a series of vector embeddings.
You can save time (and requests to OpenAI) by persisting the embeddings to disk. The call below writes to `./storage` by default:

In [None]:
index.storage_context.persist()

Persisting only pays off if you load the index back later. The code below builds and stores the index if none exists, and loads it otherwise:

In [None]:
import os.path
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, load_index_from_storage

PERSIST_DIR = "storage"

# Show whether a persisted index directory is already present
# (os.path.getsize would raise FileNotFoundError on a missing directory)
print(os.path.isdir(PERSIST_DIR))

if os.path.isdir(PERSIST_DIR) and any(os.scandir(PERSIST_DIR)):
    # Load the existing index from disk
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
else:
    # Build the index from the documents and persist it for the next run
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)

query_engine = index.as_query_engine()
response = query_engine.query("Which company is `688396.SH`?")
print(response)
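
The "directory exists and is not empty" test used above can be isolated and tried on its own. This is a sketch using only the standard library. `has_persisted_index` is a hypothetical helper name, and `docstore.json` stands in for whatever files `persist()` writes.

```python
import os
import tempfile


def has_persisted_index(persist_dir):
    """Return True if persist_dir is an existing directory with at least one entry."""
    if not os.path.isdir(persist_dir):
        return False
    with os.scandir(persist_dir) as entries:
        return any(entries)


with tempfile.TemporaryDirectory() as tmp:
    storage = os.path.join(tmp, "storage")
    print(has_persisted_index(storage))   # directory does not exist yet
    os.makedirs(storage)
    print(has_persisted_index(storage))   # exists but is empty
    open(os.path.join(storage, "docstore.json"), "w").close()
    print(has_persisted_index(storage))   # now contains a file
```

Checking for a non-empty directory avoids trying to load from a folder that was created but never written to.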

## 6. Using a local LLM instead of the OpenAI API

### Start the Ollama server with a chosen model

```bash
# Start the Ollama server in a container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Run a model inside the container (downloads it on first use)
docker exec -it ollama ollama run deepseek-r1:1.5b
# docker exec -it ollama ollama run deepseek-r1:7b
```
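
Before pointing LlamaIndex at the container, it can be useful to confirm that the server answers on port 11434. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models, using only the standard library. `list_ollama_models` is a hypothetical helper, and the base URL assumes the port mapping from the `docker run` command above.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=5.0):
    """Return the names of locally available models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("name") for m in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable on port 11434")
else:
    print("Available models:", models)
```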

### HuggingFace Embeddings + Ollama Inference

In [None]:
# !pip install llama-index-core llama-index-readers-file llama-index-llms-ollama llama-index-embeddings-huggingface
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

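# Configure the local embedding model and chat LLM.
# Note: this checkpoint is a generative model; a dedicated embedding model,
# such as the sentence-transformers model used in section 7, is usually a better fit.
Settings.embed_model = HuggingFaceEmbedding(model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
Settings.llm = Ollama(model="deepseek-r1:1.5b", request_timeout=360.0)

# Settings.embed_model = HuggingFaceEmbedding(model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
# Settings.llm = Ollama(model="deepseek-r1:7b", request_timeout=360.0)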

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Which company is `688396.SH`?")
print(response)

## 7. Using a Chroma vector store

In [2]:
# pip install llama-index-llms-huggingface
# pip install llama-index-embeddings-huggingface
# pip install llama-index-vector-stores-chroma
# pip install chromadb

#
# 1. Setting Embedding model
#
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
)
Settings.embed_model = embed_model

#
# 2. Setting LLM model
#
from llama_index.llms.huggingface import HuggingFaceLLM
llm = HuggingFaceLLM(
    model_name="Qwen/Qwen1.5-1.8B-Chat",
    tokenizer_name="Qwen/Qwen1.5-1.8B-Chat",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={"trust_remote_code": True},
)
Settings.llm = llm

#
# 3. Setup Chroma VectorStore collection
#
import chromadb
chroma_client = chromadb.PersistentClient(path="chroma_db")
chroma_collection = chroma_client.get_or_create_collection("quickstart")
print(chroma_collection)

#
# 4. Loading Chroma VectorStore
#
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

#
# 5. Loading documents, building index and persisting to VectorStore
#
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, show_progress=True)
# print(index)

#
# 6. Query the index
#
query_engine = index.as_query_engine()
response = query_engine.query("What is Xtuner?")
print(response)

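
Because the Chroma collection is persisted on disk, later runs do not have to embed the documents again. The sketch below shows only the decision logic: a Chroma collection exposes `count()`, and an existing store can be wrapped with `VectorStoreIndex.from_vector_store(vector_store)`. `load_or_build_index` and `FakeCollection` are hypothetical names, and the fake collection exists only so the example runs without Chroma installed.

```python
def load_or_build_index(collection, load_existing, build_new):
    """Reuse stored embeddings when the collection already has entries.

    collection: any object with a count() method, e.g. a Chroma collection
    load_existing: zero-argument callable that wraps the existing store
    build_new: zero-argument callable that embeds and stores the documents
    """
    if collection.count() > 0:
        return load_existing()
    return build_new()


class FakeCollection:
    """Stand-in for a Chroma collection, used only to make this sketch runnable."""

    def __init__(self, n):
        self._n = n

    def count(self):
        return self._n


print(load_or_build_index(FakeCollection(0), lambda: "loaded", lambda: "built"))
print(load_or_build_index(FakeCollection(12), lambda: "loaded", lambda: "built"))
```

With the objects from the cell above, the two callables could be `lambda: VectorStoreIndex.from_vector_store(vector_store)` and `lambda: VectorStoreIndex.from_documents(documents, storage_context=storage_context)`.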