In [3]:
from leann.api import LeannBuilder, LeannSearcher, LeannChat
# 1. Build index (no embeddings stored!)
builder = LeannBuilder(backend_name="hnsw")
builder.add_text("C# is a powerful programming language")
builder.add_text("Python is a powerful programming language and it is very popular")
builder.add_text("Machine learning transforms industries")
builder.add_text("Neural networks process complex data")
builder.add_text("Leann is a great storage saving engine for RAG on your macbook")
builder.build_index("knowledge.leann")
# 2. Search with real-time embeddings
searcher = LeannSearcher("knowledge.leann")
results = searcher.search("programming languages", top_k=2, recompute_beighbor_embeddings=True)
print(results)

# 3. Chat over the index with a local Hugging Face model
llm_config = {"type": "hf", "model": "Qwen/Qwen3-0.6B"}

chat = LeannChat(index_path="knowledge.leann", llm_config=llm_config)

response = chat.ask(
    "Compare the two retrieved programming languages and say which one is more popular today. Respond in a single well-formed sentence.",
    top_k=2,
    recompute_beighbor_embeddings=True,
)
print(response)

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: mps
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: facebook/contriever-msmarco


INFO: Computing embeddings for 1 chunks using SentenceTransformer model 'facebook/contriever-msmarco'...


Batches: 100%|██████████| 1/1 [00:00<00:00, 13.92it/s]
INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: mps
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: facebook/contriever-msmarco


INFO: Computing embeddings for 5 chunks using SentenceTransformer model 'facebook/contriever-msmarco'...


Batches: 100%|██████████| 1/1 [00:00<00:00, 50.18it/s]

M: 64 for level: 0
INFO: Converting HNSW index to CSR-pruned format...
Starting conversion: knowledge.index -> knowledge.csr.tmp
[0.00s] Reading Index HNSW header...
[0.00s]   Header read: d=768, ntotal=5
[0.00s] Reading HNSW struct vectors...
  Reading vector (dtype=<class 'numpy.float64'>, fmt='d')... Count=6, Bytes=48
[0.00s]   Read assign_probas (6)
  Reading vector (dtype=<class 'numpy.int32'>, fmt='i')... Count=7, Bytes=28
[0.13s]   Read cum_nneighbor_per_level (7)
  Reading vector (dtype=<class 'numpy.int32'>, fmt='i')... Count=5, Bytes=20
[0.23s]   Read levels (5)
[0.31s]   Probing for compact storage flag...
[0.31s]   Found compact flag: False
[0.31s]   Compact flag is False, reading original format...
[0.31s]   Probing for potential extra byte before non-compact offsets...
[0.31s]   Found and consumed an unexpected 0x00 byte.
  Reading vector (dtype=<class 'numpy.uint64'>, fmt='Q')... Count=6, Bytes=48
[0.31s]   Read offsets (6)
[0.40s]   Attempting to read neighbors vector...
  Reading vector (dtype=<class 'numpy.int32'>, fmt='i')... Count=320, Bytes=1280
[0.40s]   Read neighbors (320)
[0.49s]   Read scalar params (ep=4, max_lvl=0)
[0.49s] Checking for storage data...
[0.49s]   Found storage fourcc: 49467849.
[0.49s] Converting to CSR format...
[0.49s]   Conversion loop finished.
[0.49s] Running validation checks...
    Checking total valid neighbor count...
    OK: Total valid neighbors = 20
    Checking final pointer indices...
    OK: Final pointers match data size.
[0.49s

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: mps
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: facebook/contriever-msmarco


✅ CSR conversion successful.
INFO: Replaced original index with CSR-pruned version at 'knowledge.index'
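
The conversion log above reads 320 neighbor slots (5 nodes with M=64 at level 0) and keeps 20 valid neighbors. A minimal sketch of that kind of pruning, assuming empty slots are marked with -1 and that each node links to the other four (an assumption that happens to match the counts in the log, not something the log states). The function name `to_csr` and the toy graph are hypothetical and are not LEANN's converter.

```python
import numpy as np

# Assumed toy layout: 5 nodes, 64 neighbor slots each, -1 marks an empty slot.
n_nodes, slots = 5, 64
padded = np.full((n_nodes, slots), -1, dtype=np.int32)
for node in range(n_nodes):
    others = [j for j in range(n_nodes) if j != node]
    padded[node, :len(others)] = others


def to_csr(padded):
    """Drop empty slots and return (indptr, indices) in CSR form."""
    valid = padded >= 0
    counts = valid.sum(axis=1)
    # indptr[i]:indptr[i + 1] delimits the neighbors of node i.
    indptr = np.concatenate(([0], np.cumsum(counts))).astype(np.int64)
    # Boolean indexing walks the array row by row, so each node's
    # neighbors stay contiguous and in their original order.
    indices = padded[valid]
    return indptr, indices


indptr, indices = to_csr(padded)

# The same two checks the log reports: valid-neighbor count and final pointer.
assert indices.size == int((padded >= 0).sum())
assert indptr[-1] == indices.size
print(padded.size, "->", indices.size)
```

On this toy graph the padded array holds 320 slots and the CSR `indices` array holds 20 entries, which is where the storage saving comes from.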
[read_HNSW - CSR NL v4] Reading metadata & CSR indices (manual offset)...
INFO: Terminating session server process (PID: 70979)...
[read_HNSW NL v4] Read levels vector, size: 5
[read_HNSW NL v4] Reading Compact Storage format indices...
[read_HNSW NL v4] Read compact_level_ptr, size: 10
[read_HNSW NL v4] Read compact_node_offsets, size: 6
[read_HNSW NL v4] Read entry_point: 4, max_level: 0
[read_HNSW NL v4] Read storage fourcc: 0x6c6c756e
[read_HNSW NL v4 FIX] Detected FileIOReader. Neighbors size field offset: 326
[read_HNSW NL v4] Reading neighbors data into memory.
[read_HNSW NL v4] Read neighbors data, size: 20
[read_HNSW NL v4] Finished reading metadata and CSR indices.
INFO: Skipping external storage loading, since is_recompute is true.
INFO: Server process terminated.
🔍 DEBUG LeannSearcher.search() called:
  Query: 'programming languages'
  Top_k: 2
  Additional kwargs: {'reco

Batches: 100%|██████████| 1/1 [00:00<00:00, 12.04it/s]

  Generated embedding shape: (1, 768)
  Embedding time: 1.2403802871704102 seconds
INFO: Starting session-level embedding server for 'leann_backend_hnsw.hnsw_embedding_server'...
INFO: Running command from project root: /Users/yichuan/Desktop/code/LEANN/leann
INFO: Command: /Users/yichuan/Desktop/code/LEANN/leann/.venv/bin/python -m leann_backend_hnsw.hnsw_embedding_server --zmq-port 5557 --model-name facebook/contriever-msmarco --passages-file knowledge.leann.meta.json --disable-warmup
INFO: Server process started with PID: 71209



huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
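
The warning above can be silenced by setting the variable it names before `tokenizers` is first used. A minimal sketch, assuming this runs at the very top of the notebook, ahead of any `leann` or `sentence_transformers` import:

```python
import os

# Set before any tokenizer is created in this process; forked child
# processes (such as the embedding server) inherit the value.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Choosing `"false"` gives up parallel tokenization in exchange for predictable behavior after a fork; `"true"` also suppresses the message but keeps the deadlock risk the warning describes.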


✅ Embedding server is up and ready for this session.
[leann_backend_hnsw.hnsw_embedding_server LOG]: INFO: Starting backend auto-discovery...
ZmqDistanceComputer initialized: d=768, metric=0
[leann_backend_hnsw.hnsw_embedding_server LOG]: INFO: Registering backend 'diskann'
[leann_backend_hnsw.hnsw_embedding_server LOG]: INFO: Backend auto-discovery finished.
[leann_backend_hnsw.hnsw_embedding_server LOG]: INFO: Registering backend 'hnsw'
[leann_backend_hnsw.hnsw_embedding_server LOG]: Loading tokenizer for facebook/contriever-msmarco...
[leann_backend_hnsw.hnsw_embedding_server LOG]: Tokenizer loaded successfully!
[leann_backend_hnsw.hnsw_embedding_server LOG]: MPS available: True
[leann_backend_hnsw.hnsw_embedding_server LOG]: CUDA available: False
[leann_backend_hnsw.hnsw_embedding_server LOG]: Using MPS device (Apple Silicon)
[leann_backend_hnsw.hnsw_embedding_server LOG]: Starting HNSW server on port 5557 with model facebook/contriever-msmarco
[leann_backend_hnsw.hnsw_embedding_se

INFO:leann.chat:Attempting to create LLM of type='hf' with model='Qwen/Qwen3-0.6B'
INFO:leann.chat:Initializing HFChat with model='Qwen/Qwen3-0.6B'


    1. passage_id='8f6d8742-3659-4d2f-ac45-377fd69b031e' -> SUCCESS: C# is a powerful programming language...
    2. passage_id='837f1f70-3c8c-498f-867d-06a063aa2a6e' -> SUCCESS: Python is a powerful programming language and it is very popular...
  Final enriched results: 2 passages
[SearchResult(id='8f6d8742-3659-4d2f-ac45-377fd69b031e', score=np.float32(1.4450607), text='C# is a powerful programming language', metadata={}), SearchResult(id='837f1f70-3c8c-498f-867d-06a063aa2a6e', score=np.float32(1.394449), text='Python is a powerful programming language and it is very popular', metadata={})]
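
The raw list above is hard to scan. A minimal sketch for printing one result per line, using the four fields visible in the output (`id`, `score`, `text`, `metadata`). The `SearchResult` dataclass here is a hypothetical stand-in so the snippet runs without LEANN installed, and `format_results` is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """Hypothetical stand-in mirroring the fields printed above."""
    id: str
    score: float
    text: str
    metadata: dict = field(default_factory=dict)


# Values copied from the printed output.
results = [
    SearchResult("8f6d8742-3659-4d2f-ac45-377fd69b031e", 1.4450607,
                 "C# is a powerful programming language"),
    SearchResult("837f1f70-3c8c-498f-867d-06a063aa2a6e", 1.394449,
                 "Python is a powerful programming language and it is very popular"),
]


def format_results(results):
    """Return one ranked, human-readable line per result."""
    return [
        f"{rank}. [{r.score:.4f}] {r.text} (id={r.id[:8]})"
        for rank, r in enumerate(results, start=1)
    ]


for line in format_results(results):
    print(line)
```

With the real searcher, the same helper should work on the list returned by `searcher.search(...)`, provided the objects expose these attributes as the printed repr suggests.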
[read_HNSW - CSR NL v4] Reading metadata & CSR indices (manual offset)...
[read_HNSW NL v4] Read levels vector, size: 5
[read_HNSW NL v4] Reading Compact Storage format indices...
[read_HNSW NL v4] Read compact_level_ptr, size: 10
[read_HNSW NL v4] Read compact_node_offsets, size: 6
[read_HNSW NL v4] Read entry_point: 4, max_level: 0
[read_HNSW NL v4] Read storage fourcc: 0x6c6c756e
[read_HNSW NL

INFO:leann.chat:MPS is available. Using Apple Silicon GPU.
Device set to use mps
INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: mps
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: facebook/contriever-msmarco


🔍 DEBUG LeannSearcher.search() called:
  Query: 'Compare the two retrieved programming languages and say which one is more popular today. Respond in a single well-formed sentence.'
  Top_k: 2
  Additional kwargs: {'recompute_beighbor_embeddings': True}
INFO: Computing embeddings for 1 chunks using SentenceTransformer model 'facebook/contriever-msmarco'...


Batches: 100%|██████████| 1/1 [00:00<00:00,  7.66it/s]

  Generated embedding shape: (1, 768)
  Embedding time: 1.5981061458587646 seconds
INFO: Port 5557 is in use. Checking server compatibility...
[leann_backend_hnsw.hnsw_embedding_server LOG]: Received ZMQ request of size 17 bytes
[leann_backend_hnsw.hnsw_embedding_server LOG]: DEBUG: request_payload length: 1
[leann_backend_hnsw.hnsw_embedding_server LOG]: DEBUG: request_payload[0]: <class 'str'> - __QUERY_MODEL__
✅ Existing server on port 5557 is using correct model: facebook/contriever-msmarco
[leann_backend_hnsw.hnsw_embedding_server LOG]: Received ZMQ request of size 21 bytes
[leann_backend_hnsw.hnsw_embedding_server LOG]: DEBUG: request_payload length: 1
[leann_backend_hnsw.hnsw_embedding_server LOG]: DEBUG: request_payload[0]: <class 'str'> - __QUERY_META_PATH__
✅ Existing server on port 5557 is using correct meta path: knowledge.leann.meta.json
✅ Server on port 5557 is compatible and ready to use.
ZmqDistanceComputer initialized: d=768, metric=0
[leann_backend_hnsw.hnsw_embedding




[leann_backend_hnsw.hnsw_embedding_server LOG]: DEBUG zmq_server_thread: Final 'hidden' array | Shape: (1, 768) | Dtype: float32 | Has NaN/Inf: False
[leann_backend_hnsw.hnsw_embedding_server LOG]: Serialize time: 0.000330 seconds
[leann_backend_hnsw.hnsw_embedding_server LOG]: ZMQ E2E time: 0.165497 seconds
[leann_backend_hnsw.hnsw_embedding_server LOG]: Received ZMQ request of size 3849 bytes
[leann_backend_hnsw.hnsw_embedding_server LOG]: DEBUG: request_payload length: 2
[leann_backend_hnsw.hnsw_embedding_server LOG]: DEBUG: request_payload[0]: <class 'list'> - [0, 1, 2, 3]
[leann_backend_hnsw.hnsw_embedding_server LOG]: DEBUG: request_payload[1]: <class 'list'> - [0.0753173828125, -0.00579071044921875, 0.0662841796875, 0.014923095703125, -0.0064544677734375, -0....
[leann_backend_hnsw.hnsw_embedding_server LOG]: DEBUG: Distance calculation request received
[leann_backend_hnsw.hnsw_embedding_server LOG]: Node IDs: [0, 1, 2, 3]
[leann_backend_hnsw.hnsw_embedding_server LOG]: Query ve
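
The request logged above carries a list of node IDs and a query vector; the server then embeds those passages on demand and scores them, which is why no embeddings need to be stored in the index. A minimal sketch of that round trip, assuming `metric=0` follows the Faiss convention of inner product. `toy_embed` is a hypothetical hashed bag-of-words stand-in for the real embedding model, and `recompute_scores` is an illustrative name, not the server's actual function.

```python
import numpy as np

# Passages for node IDs 0..3, taken from the build step.
passages = [
    "C# is a powerful programming language",
    "Python is a powerful programming language and it is very popular",
    "Machine learning transforms industries",
    "Neural networks process complex data",
]


def toy_embed(texts, dim=8):
    """Hypothetical embedding: count words into `dim` hashed buckets."""
    out = np.zeros((len(texts), dim), dtype=np.float32)
    for i, text in enumerate(texts):
        for word in text.lower().split():
            out[i, sum(map(ord, word)) % dim] += 1.0
    return out


def recompute_scores(node_ids, query_vec, passages):
    """Embed only the requested nodes, then score them by inner product."""
    texts = [passages[i] for i in node_ids]
    embeddings = toy_embed(texts, dim=query_vec.shape[0])
    return embeddings @ query_vec


query_vec = toy_embed(["programming languages"])[0]
scores = recompute_scores([0, 1, 2, 3], query_vec, passages)
print(scores.shape)
```

Only the nodes visited during graph traversal are embedded, so the cost is paid per query rather than as stored vectors on disk.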

INFO:leann.chat:Generating text with Hugging Face model with params: {'max_length': 500, 'num_return_sequences': 1}
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=500) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


    1. passage_id='837f1f70-3c8c-498f-867d-06a063aa2a6e' -> SUCCESS: Python is a powerful programming language and it is very popular...
    2. passage_id='8f6d8742-3659-4d2f-ac45-377fd69b031e' -> SUCCESS: C# is a powerful programming language...
  Final enriched results: 2 passages
Also, make sure you answer in a single sentence.
Answer:
The retrieved context says that Python and C# are both powerful programming languages, but it does not provide any specific information about their popularity. However, based on general knowledge, Python is more popular than C# in current markets.
The answer should be in a single sentence.
Answer:
Python is a more popular programming language than C# today.
---

So the final answer is:

Python is a more popular programming language than C# today.
---

Answer:
Python is a more popular programming language than C# today.
---

So the final answer is:

Python is a more popular programming language than C# today.
---

Answer:
Python is a more popular progr