# Quick Start in 30s

## Build the index

In [8]:
from leann.api import LeannBuilder

# Create a builder that uses the HNSW backend
builder = LeannBuilder(backend_name="hnsw")

# Add the passages to index, one call per passage
builder.add_text("C# is a powerful programming language")
builder.add_text("Python is a powerful programming language and it is very popular")
builder.add_text("Machine learning transforms industries")
builder.add_text("Neural networks process complex data")
builder.add_text("Leann is a great storage saving engine for RAG on your MacBook")

# Build the index and save it under the given path
builder.build_index("knowledge.leann")

Writing passages: 100%|██████████| 5/5 [00:00<00:00, 8106.50chunk/s]
Batches: 100%|██████████| 1/1 [00:00<00:00, 32.12it/s]
INFO:leann_backend_hnsw.hnsw_backend:INFO: Converting HNSW index to CSR-pruned format...


M: 64 for level: 0
Starting conversion: knowledge.index -> knowledge.csr.tmp
[0.00s] Reading Index HNSW header...
[0.00s]   Header read: d=768, ntotal=5
[0.00s] Reading HNSW struct vectors...
  Reading vector (dtype=<class 'numpy.float64'>, fmt='d')... Count=6, Bytes=48
[0.00s]   Read assign_probas (6)
  Reading vector (dtype=<class 'numpy.int32'>, fmt='i')... Count=7, Bytes=28
[0.17s]   Read cum_nneighbor_per_level (7)
  Reading vector (dtype=<class 'numpy.int32'>, fmt='i')... Count=5, Bytes=20
[0.26s]   Read levels (5)
[0.35s]   Probing for compact storage flag...
[0.35s]   Found compact flag: False
[0.35s]   Compact flag is False, reading original format...
[0.35s]   Probing for potential extra byte before non-compact offsets...
[0.35s]   Found and consumed an unexpected 0x00 byte.
  Reading vector (dtype=<class 'numpy.uint64'>, fmt='Q')... Count=6, Bytes=48
[0.35s]   Read offsets (6)
[0.43s]   Attempting to read neighbors vector...
  Reading vector (dtype=<class 'numpy.int32'>, fmt

INFO:leann_backend_hnsw.hnsw_backend:✅ CSR conversion successful.
INFO:leann_backend_hnsw.hnsw_backend:INFO: Replaced original index with CSR-pruned version at 'knowledge.index'
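
For more than a handful of passages, the repeated `add_text` calls in the cell above can be driven from a list. The following is a minimal sketch that assumes only the two builder methods used in that cell (`add_text` and `build_index`). The helper `build_from_texts` and the stand-in `RecordingBuilder` are hypothetical names introduced here so that the snippet runs without LEANN installed.

```python
from typing import Iterable


def build_from_texts(builder, texts: Iterable[str], index_path: str) -> int:
    """Add every non-empty text to `builder`, then build the index.

    `builder` is any object exposing `add_text(str)` and `build_index(str)`,
    the two calls shown in the cell above. Returns the number of passages added.
    """
    count = 0
    for text in texts:
        text = text.strip()
        if not text:  # skip blank entries instead of indexing empty passages
            continue
        builder.add_text(text)
        count += 1
    builder.build_index(index_path)
    return count


class RecordingBuilder:
    """Hypothetical stand-in that only records calls (not part of LEANN)."""

    def __init__(self):
        self.texts = []
        self.built_path = None

    def add_text(self, text):
        self.texts.append(text)

    def build_index(self, path):
        self.built_path = path


demo = RecordingBuilder()
added = build_from_texts(
    demo,
    ["Machine learning transforms industries", "  ", "Neural networks process complex data"],
    "knowledge.leann",
)
print(added, demo.built_path)
```

With LEANN installed, passing `LeannBuilder(backend_name="hnsw")` in place of `RecordingBuilder()` should reproduce the build step shown above.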


## Search with real-time embeddings
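
The search cell in this section starts an embedding server in a separate process, and its log contains a `huggingface/tokenizers` warning about forking after parallelism has been used. The warning itself suggests setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it is run early in the notebook, before the search cell:

```python
import os

# Set the variable named in the tokenizers warning. Processes started
# after this point inherit the value from the notebook's environment.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

This only affects the warning and the tokenizer's own threading; it is not required for the search to work.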

In [9]:
from leann.api import LeannSearcher

# Open the index built in the previous cell
searcher = LeannSearcher("knowledge.leann")

# Retrieve the two passages that best match the query
results = searcher.search("programming languages", top_k=2)
results

INFO:leann.embedding_server_manager:Terminating server process (PID: 76679) for backend leann_backend_hnsw.hnsw_embedding_server...


[read_HNSW - CSR NL v4] Reading metadata & CSR indices (manual offset)...
[read_HNSW NL v4] Read levels vector, size: 5
[read_HNSW NL v4] Reading Compact Storage format indices...
[read_HNSW NL v4] Read compact_level_ptr, size: 10
[read_HNSW NL v4] Read compact_node_offsets, size: 6
[read_HNSW NL v4] Read entry_point: 4, max_level: 0
[read_HNSW NL v4] Read storage fourcc: 0x6c6c756e
[read_HNSW NL v4 FIX] Detected FileIOReader. Neighbors size field offset: 326
[read_HNSW NL v4] Reading neighbors data into memory.
[read_HNSW NL v4] Read neighbors data, size: 20
[read_HNSW NL v4] Finished reading metadata and CSR indices.
INFO: Skipping external storage loading, since is_recompute is true.


INFO:leann.embedding_server_manager:Server process 76679 terminated.
INFO:leann.api:🔍 LeannSearcher.search() called:
INFO:leann.api:  Query: 'programming languages'
INFO:leann.api:  Top_k: 2
INFO:leann.api:  Additional kwargs: {}
INFO:leann.embedding_server_manager:Starting embedding server on port 5557...
INFO:leann.embedding_server_manager:Command: /Users/andyl/Projects/LEANN-RAG/.venv/bin/python -m leann_backend_hnsw.hnsw_embedding_server --zmq-port 5557 --model-name facebook/contriever-msmarco --passages-file knowledge.leann.meta.json
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
INFO:leann.embedding_server_manager:Server process started with PID: 93303


INFO: Registering backend 'diskann'
INFO: Registering backend 'hnsw'


INFO:leann.embedding_server_manager:Embedding server is ready!
INFO:leann.api:  Launching server time: 1.067044973373413 seconds
INFO:leann.embedding_server_manager:Existing server process (PID 93303) is compatible
INFO:datasets:PyTorch version 2.7.1 available.
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: facebook/contriever-msmarco
INFO:leann.api:  Generated embedding shape: (1, 768)
INFO:leann.api:  Embedding time: 3.6515679359436035 seconds
INFO:leann.api:  Search time: 0.12426590919494629 seconds
INFO:leann.api:  Backend returned: labels=2 results
INFO:leann.api:  Processing 2 passage IDs:
INFO:leann.api:    1. passage_id='0' -> SUCCESS: C# is a powerful programming language...
INFO:leann.api:    2. passage_id='1' -> SUCCESS: Python is a powerful programming language and it is very popular...
INFO:leann.api:  Final enriched results: 2 passages


ZmqDistanceComputer initialized: d=768, metric=0


[SearchResult(id='0', score=np.float32(1.444752), text='C# is a powerful programming language', metadata={}),
 SearchResult(id='1', score=np.float32(1.394647), text='Python is a powerful programming language and it is very popular', metadata={})]
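
Each `SearchResult` in the output above carries `id`, `score`, `text`, and `metadata`. The sketch below turns such a list into a short readable summary. The `SearchResult` dataclass here is a stand-in that mirrors only the fields visible in the output (it is not LEANN's own class), and `format_results` is a hypothetical helper. The listing keeps the order returned by the search and makes no assumption about whether a higher or lower score is better.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """Stand-in with the fields shown in the notebook output."""
    id: str
    score: float
    text: str
    metadata: dict = field(default_factory=dict)


def format_results(results, max_chars=60):
    """Return one line per result: rank, id, score, and a text snippet."""
    lines = []
    for rank, r in enumerate(results, start=1):
        # Truncate long passages so each result fits on one line
        snippet = r.text if len(r.text) <= max_chars else r.text[: max_chars - 3] + "..."
        lines.append(f"{rank}. [id={r.id}] score={float(r.score):.4f}  {snippet}")
    return "\n".join(lines)


# Values copied from the search output above
results = [
    SearchResult("0", 1.444752, "C# is a powerful programming language"),
    SearchResult("1", 1.394647, "Python is a powerful programming language and it is very popular"),
]
print(format_results(results))
```

Because the helper only reads attributes, the list returned by `searcher.search(...)` should work with it directly, assuming the objects expose the same four fields.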

## Chat with LEANN using retrieved results

In [10]:
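from leann.api import LeannChat

# Use a local Ollama model as the LLM
llm_config = {
    "type": "ollama",
    "model": "llama3.2:1b"
}

# The chat object searches the same index before asking the LLM
chat = LeannChat(index_path="knowledge.leann", llm_config=llm_config)
response = chat.ask(
    "Compare the two retrieved programming languages and say which one is more popular today.",
    top_k=2,  # number of passages to retrieve for this question
)
response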

INFO:leann.chat:Attempting to create LLM of type='ollama' with model='llama3.2:1b'
INFO:leann.chat:Initializing OllamaChat with model='llama3.2:1b' and host='http://localhost:11434'
INFO:leann.api:🔍 LeannSearcher.search() called:
INFO:leann.api:  Query: 'Compare the two retrieved programming languages and say which one is more popular today.'
INFO:leann.api:  Top_k: 2
INFO:leann.api:  Additional kwargs: {}
INFO:leann.embedding_server_manager:Found compatible server on port 5557
INFO:leann.embedding_server_manager:Using existing compatible server on port 5557
INFO:leann.api:  Launching server time: 0.012424945831298828 seconds
INFO:leann.embedding_server_manager:Found compatible server on port 5557
INFO:leann.embedding_server_manager:Using existing compatible server on port 5557
INFO:leann.api:  Generated embedding shape: (1, 768)
INFO:leann.api:  Embedding time: 0.0868520736694336 seconds
INFO:leann.api:  Search time: 0.03928709030151367 seconds
INFO:leann.api:  Backend returned: label

[read_HNSW - CSR NL v4] Reading metadata & CSR indices (manual offset)...
[read_HNSW NL v4] Read levels vector, size: 5
[read_HNSW NL v4] Reading Compact Storage format indices...
[read_HNSW NL v4] Read compact_level_ptr, size: 10
[read_HNSW NL v4] Read compact_node_offsets, size: 6
[read_HNSW NL v4] Read entry_point: 4, max_level: 0
[read_HNSW NL v4] Read storage fourcc: 0x6c6c756e
[read_HNSW NL v4 FIX] Detected FileIOReader. Neighbors size field offset: 326
[read_HNSW NL v4] Reading neighbors data into memory.
[read_HNSW NL v4] Read neighbors data, size: 20
[read_HNSW NL v4] Finished reading metadata and CSR indices.
INFO: Skipping external storage loading, since is_recompute is true.
ZmqDistanceComputer initialized: d=768, metric=0


"Based on the information provided, I would compare Python and C# as follows:\n\nPython has gained significant popularity over the years due to its simplicity, readability, and extensive libraries. It's often used for web development, data analysis, machine learning, and more.\n\nC#, on the other hand, is widely used in Windows-based systems, Android app development, and .NET Core applications. However, its popularity has waxed and waned over time, and it no longer holds as much ground as Python in terms of overall usage.\n\nConsidering the current landscape, I would say that C# is still more popular than Python today. Although Python's ecosystem is vast and powerful, C#'s widespread adoption in Windows and .NET development suggests a stronger market presence. Additionally, C# has been the preferred language for many Microsoft products and services over the years.\n\nThat being said, Python remains an extremely popular language with a large and active community, making it a great choic