Local, typo-tolerant autocomplete for Python apps.
Turn your own text, PDFs, and DOCX files into fast local suggestions with a compact prefix index, fuzzy prefix recovery, and a local Kneser-Ney scorer.
Use it when you want autocomplete without running Elasticsearch, Meilisearch, Algolia, Typesense, or another search service.
Full documentation: https://query-autocomplete.readthedocs.io/en/latest/
query-autocomplete runs in single-process Python and still returns
steady-state suggestions in under 10 ms on this benchmark, without
Elasticsearch, Meilisearch, Redis, a vector database, or an LLM in the serving
path.
What is being measured here:
- local in-memory index build
- first suggestion after a fresh build
- steady-state suggestion latency from the local prefix/scoring runtime
- typo-tolerant autocomplete with no external service
The benchmark uses Salesforce/wikitext with the wikitext-2-raw-v1 config
from Hugging Face. It merges the train, validation, and test splits, then builds
deterministic corpus slices by word count.
Results below use 5 fresh runs per tier, 128 prompts generated from the same WikiText slice, and 5 steady-state passes per run.
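The word-count slicing step can be sketched in plain Python (an illustrative approach, not the actual benchmark harness; `slice_words` is a hypothetical helper):

```python
def slice_words(text: str, target_words: int) -> str:
    """Take whole lines until the cumulative word count reaches the target.

    Keeping whole lines (rather than cutting mid-line) makes the slice
    deterministic and reproducible for a fixed input order.
    """
    lines, total = [], 0
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        lines.append(line)
        total += len(words)
        if total >= target_words:
            break
    return "\n".join(lines)

corpus = "\n".join(f"sample line number {i} with a few words" for i in range(1000))
small = slice_words(corpus, 100)
assert len(small.split()) >= 100
```

Because the slice boundary always falls on a line boundary, rerunning the benchmark on the same merged split yields byte-identical corpora.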
Hardware
- Apple M4 MacBook, 16 GB RAM
- macOS
- Python 3.11
Single-process, CPU-only benchmark with no GPU acceleration.
| Corpus | Words | Lines | Build mean (s) | First suggestion mean (ms) | Suggest mean (ms) | Suggest p50 (ms) | Suggest p95 (ms) | Prompts | Runs |
|---|---|---|---|---|---|---|---|---|---|
| S | 10,019 | 299 | 0.210 | 1.247 | 1.533 | 0.589 | 5.574 | 128 | 5 |
| M | 50,079 | 1,132 | 1.116 | 2.388 | 3.147 | 1.149 | 11.751 | 128 | 5 |
| L | 250,008 | 5,664 | 5.606 | 4.296 | 6.304 | 2.306 | 17.750 | 128 | 5 |
| XL | 1,000,073 | 21,765 | 27.118 | 5.370 | 9.049 | 6.744 | 22.713 | 128 | 5 |
Corpus tiers target these approximate sizes:
- S: 10,000 words
- M: 50,000 words
- L: 250,000 words
- XL: 1,000,000 words
The useful takeaway: for local app autocomplete, you get fast suggestions from your own text without standing up search infrastructure.
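For intuition about what a local prefix index buys you, here is a deliberately naive stdlib sketch of the idea (not the library's implementation, which uses a compact trie plus scoring rather than a dict of sets):

```python
from collections import defaultdict

class TinyPrefixIndex:
    """Map every word prefix to the phrases containing a word with that prefix.

    Real engines (including compact tries such as marisa-trie) store this far
    more compactly; a dict is enough to show the lookup shape.
    """

    def __init__(self, phrases: list[str]) -> None:
        self.by_prefix: dict[str, set[str]] = defaultdict(set)
        for phrase in phrases:
            for word in phrase.lower().split():
                for i in range(1, len(word) + 1):
                    self.by_prefix[word[:i]].add(phrase)

    def suggest(self, fragment: str, topk: int = 5) -> list[str]:
        # Exact prefix lookup on the typed fragment.
        hits = self.by_prefix.get(fragment.lower().strip(), set())
        return sorted(hits)[:topk]

index = TinyPrefixIndex([
    "how to build a deck",
    "how to build a desk",
    "wireless mouse for laptop",
])
print(index.suggest("bui"))  # → ['how to build a deck', 'how to build a desk']
```

Everything the library adds on top — fuzzy recovery, context scoring, ladder collapse — is refinement of this single in-memory lookup.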
The easiest way to understand it is:
- start with one text string in memory
- move to the document model when you want stable document IDs
- move to a persisted document store when your data needs to live in a database
Most autocomplete setups eventually turn into infrastructure work: a search server, a hosted index, background sync, operational tuning, and another moving part in your stack.
query-autocomplete is for the cases where you already have the text and want useful suggestions directly inside Python. It builds a local prefix index, handles partial words and common typos, and can keep working from an in-memory index, saved artifact, or SQLite-backed document store.
Use it when you want:
- fast local suggestions from your own text
- typo-tolerant prefix autocomplete without a search service
- a small Python-native autocomplete layer for apps, docs, internal tools, or prototypes
- an upgrade path from simple in-memory usage to persisted SQLite storage
It is probably not the right tool when you need:
- distributed search across many machines
- complex boolean filtering, faceting, or full-text ranking
- hosted multi-tenant search infrastructure
- semantic/vector search as the primary retrieval model
```shell
pip install query-autocomplete
```
PDF and DOCX readers are included in the base install. Optional chunking support is available for `pysbd` sentence segmentation:
```shell
pip install "query-autocomplete[chunking]"
```
Start with one text object and get suggestions back.
The text can be short or very long. A Document can be a phrase, a page, a transcript, or something closer to book length. The tiny examples here are just for readability.
```python
from query_autocomplete import Autocomplete, Document

index = Autocomplete.create([
    Document(text="how to build a deck"),
])

print(index.suggest("how to bui", topk=5))
print(index.suggest("how to biuld", topk=5))
```
That is the core experience: give it text, create an in-memory autocomplete, ask for suggestions.
You can also pass file paths directly. .txt, .pdf, and .docx inputs are supported in the base package:
```python
from pathlib import Path

from query_autocomplete import Autocomplete

index = Autocomplete.create([
    Path("docs/handbook.pdf"),
    Path("docs/release-notes.docx"),
    Path("docs/faq.txt"),
])

print(index.suggest("install", topk=5))
```
Once you want better results, just add more text or bigger documents.
```python
from query_autocomplete import Autocomplete, Document

index = Autocomplete.create([
    Document(text="how to build a deck"),
    Document(text="how to build a desk"),
    Document(text="how to build with python"),
])

print(index.suggest("how to bui", topk=5))
print(index.suggest("how to build ", topk=5))
```
This is still the simplest mode and the best place to begin.
Build or load the autocomplete once when your app starts. Do not rebuild it inside every request handler.
```python
from fastapi import FastAPI

from query_autocomplete import Autocomplete

app = FastAPI()

index = Autocomplete.load("my-index")
# Warm before serving traffic so the first user query is not the loader.
index.warm()

@app.get("/autocomplete")
def autocomplete(q: str):
    return {"suggestions": index.suggest(q, topk=5)}
```
Cold starts are normal for local indexes: a new process has to load the compiled index into memory once. After that, suggestions are served from the in-process engine.
```python
from query_autocomplete import Autocomplete, Document

index = Autocomplete.create([
    Document(text="wireless mechanical keyboard"),
    Document(text="wireless mouse for laptop"),
    Document(text="usb c docking station"),
    Document(text="noise cancelling headphones"),
])

def suggest_products(user_input: str) -> list[str]:
    return index.suggest(user_input, topk=5)
```

```python
from query_autocomplete import Autocomplete, Document

index = Autocomplete.create([
    Document(text="install query-autocomplete with pip"),
    Document(text="create an in-memory autocomplete index"),
    Document(text="save and load compiled autocomplete artifacts"),
    Document(text="use AdaptiveStore with SQLite persistence"),
])

print(index.suggest("use adap", topk=5))
```

```python
from query_autocomplete import Autocomplete, Document

commands = [
    Document(text="open settings"),
    Document(text="open keyboard shortcuts"),
    Document(text="create new project"),
    Document(text="clear recent files"),
    Document(text="toggle dark mode"),
]

palette = Autocomplete.create(commands, quality_profile="code_or_logs")
print(palette.suggest("open key", topk=3))
```

The real unit in the library is a Document.
```python
from query_autocomplete import Document

doc = Document(
    text="how to build with python",
    doc_id="doc-123",
    metadata={"source": "docs"},
)
```
Fields:
- `text`: the raw text used for learning suggestions.
- `doc_id`: optional stable identifier for the document.
- `metadata`: optional JSON-like metadata kept on in-memory document objects.
For the basic in-memory flow, you usually do not need `doc_id`. For persisted mutable stores, `doc_id` becomes important because it is the public document identity used for document management.
`Document.text` does not need to be short. It can be a single query-like phrase, a paragraph, a full article, a long transcript, or very large source text. The system is designed to adapt to mixed short and long documents in the same store.
One document can contain multiple lines. Internally, the library may split those lines for training.
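A hypothetical sketch of that line splitting (the library's actual preprocessing is richer; `training_lines` is not part of its API):

```python
def training_lines(text: str) -> list[str]:
    """Split a multi-line document into non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

doc_text = "open settings\n\nopen keyboard shortcuts\n"
print(training_lines(doc_text))  # → ['open settings', 'open keyboard shortcuts']
```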
The default profile is balanced. It turns on conservative production-quality behavior such as context-aware scoring, typo-tolerant prefix lookup, and prefix-ladder collapse.
```python
from query_autocomplete import Autocomplete, Document

index = Autocomplete.create(
    [
        Document(text="how to build a deck"),
        Document(text="how to build a desk"),
        Document(text="how to build with python"),
    ],
    quality_profile="precision",
    max_generated_words=4,
    phrase_min_count=3,
)
```
Available profiles:
- `balanced`: the default. A conservative mix of fuzzy recall, quality filtering, and clean top results.
- `precision`: stricter phrase mining and stronger runtime penalties for cleaner top results.
- `recall`: keeps more candidates and disables prefix-ladder collapse by default. Fuzzy prefix lookup remains enabled.
- `code_or_logs`: keeps structured tokens and code/log-like continuations more readily.
- `natural_language`: uses stricter phrase and diversity behavior for prose-like document collections.
Explicit `BuildConfig` and `SuggestConfig` objects override profile defaults.
Use `inspect(...)` when you want to understand why suggestions ranked the way they did. For partial-token queries, diagnostics include `prefix_match`, which reports the typed fragment, the matched indexed prefix, the edit distance, and whether fuzzy recovery was used.
```python
diagnostics = index.inspect("how to bui", topk=3)

for item in diagnostics:
    print(item.text, item.score)
    print(item.breakdown)
    print(item.expansion_trace)
```
Each diagnostic includes:
- final score
- prior score from prefix/context evidence
- local scorer score
- structural noise penalty
- context support ratio and penalty
- length adjustment
- diversity group key
- token or phrase expansion trace
`suggest(...)` still returns a plain `list[str]`; diagnostics are only returned by `inspect(...)`.
If you want to keep a compiled autocomplete index around and load it later, you can save it as an artifact. This is a persistence helper, not the main Autocomplete mental model.
```python
from query_autocomplete import Autocomplete, Document

index = Autocomplete.create([
    Document(text="how to build a deck"),
    Document(text="how to build a desk"),
])

index.save("my-index")

loaded = Autocomplete.load("my-index")
loaded.warm()
print(loaded.suggest("how to bui", topk=5))
```
You can also create and save in one step:
```python
from query_autocomplete import Autocomplete, Document

Autocomplete.create(
    [
        Document(text="how to build a deck"),
        Document(text="how to build a desk"),
    ],
).save("my-index")
```
Path rules:
- `index.save()`: auto-creates a managed folder under `.query_autocomplete_artifacts/`
- `index.save("docs-v1")`: saves to `.query_autocomplete_artifacts/docs-v1/`
- `index.save("artifacts/docs-v1")`: saves to that explicit relative path
- `Autocomplete.load("docs-v1")`: loads from the managed artifact folder
This is persistence for a compiled serving artifact, not a mutable document database.
When your document collection needs to change over time, move to AdaptiveStore.
This is the database-backed model:
- one SQLite database is one document collection
- documents can be added and deleted over time
- the serving index is rebuilt from stored source documents
- `doc_id` is the public identity for document management
For a proper persisted mutable document collection, use the SQL-compatible store.
```python
from query_autocomplete import AdaptiveStore, Document

store = AdaptiveStore.open("sqlite:///autocomplete.sqlite3")

store.add_documents([
    Document(text="how to build a deck", doc_id="deck"),
    Document(text="how to build with python", doc_id="python"),
])

# Warm before serving traffic so the first user query is not the builder.
store.warm()

print(store.suggest("how to bui", topk=5))
```
AdaptiveStore rebuilds the serving index when documents change. For production-style apps, call `store.warm()` during startup or after ingestion so the first real user request does not pay that cost.
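The invalidate-on-write, rebuild-on-read behavior is a standard dirty-flag pattern. A generic sketch of the pattern, unrelated to AdaptiveStore's internals:

```python
class LazyIndexStore:
    """Mark the serving index dirty on writes; rebuild lazily on reads.

    warm() forces the rebuild up front so the first real query is cheap.
    """

    def __init__(self) -> None:
        self.documents: dict[str, str] = {}
        self._index: dict[str, list[str]] | None = None  # None == dirty

    def add_document(self, doc_id: str, text: str) -> None:
        self.documents[doc_id] = text
        self._index = None  # invalidate the serving cache

    def warm(self) -> None:
        self._ensure_index()

    def _ensure_index(self) -> dict[str, list[str]]:
        if self._index is None:
            # Toy rebuild: bucket each document under 3-char word prefixes.
            index: dict[str, list[str]] = {}
            for text in self.documents.values():
                for word in text.split():
                    index.setdefault(word[:3], []).append(text)
            self._index = index
        return self._index

    def suggest(self, fragment: str) -> list[str]:
        return self._ensure_index().get(fragment[:3], [])

store = LazyIndexStore()
store.add_document("a", "how to build a deck")
store.warm()                                     # build once, up front
store.add_document("b", "how to build a desk")   # cache marked dirty again
print(store.suggest("how"))                      # rebuilt lazily on this read
```

The tradeoff is the same one the README describes: writes stay cheap, and the rebuild cost lands either in `warm()` or in the first read after ingestion.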
Supported store URLs today:
- `sqlite:///autocomplete.sqlite3`
- `sqlite:////absolute/path/autocomplete.sqlite3`
- a plain path like `"./autocomplete.sqlite3"`
Serving a SQLite-backed autocomplete from FastAPI:
```python
from fastapi import FastAPI

from query_autocomplete import AdaptiveStore

app = FastAPI()

store = AdaptiveStore.open("sqlite:///autocomplete.sqlite3")

@app.on_event("startup")
def startup():
    store.warm()

@app.get("/autocomplete")
def autocomplete(q: str):
    return {"suggestions": store.suggest(q, topk=5)}
```
Each adaptive SQLite database owns one document collection. Name the database file however you want; the documents and current serving index live inside that file.
Adaptive SQL persistence is SQL-first:
- source documents are stored in SQLite
- the compiled serving index cache is also stored in SQLite
- normal adaptive usage does not write `.query_autocomplete_artifacts/`
```python
store.add_documents([
    Document(text="how to build a deck", doc_id="deck"),
    Document(text="how to build a desk", doc_id="desk"),
])
```
Rules for adaptive mutable stores:
- `doc_id` is optional on input and auto-generated when missing
- `doc_id` must be unique within the database
- document content must also be unique within the database
So these are both rejected inside one database:
- the same `doc_id` with different content
- the same content with a different `doc_id`
Ingesting documents automatically invalidates the serving cache, which is rebuilt on demand the next time you query.
In adaptive stores, `doc_id` is the public document identity.
```python
store.remove_document("deck")
```

```python
print(store.list_documents())
```

```python
store = AdaptiveStore.open("sqlite:///adaptive.sqlite3")
store.clear()
```

`store.delete()` is kept as a backwards-compatible alias for `clear()`. It clears the adaptive database tables but does not remove the SQLite file.
```python
store = AdaptiveStore.open("sqlite:///adaptive.sqlite3")
copied = store.migrate("sqlite:///adaptive-copy.sqlite3")
```

```python
from query_autocomplete.config import SuggestConfig

autocomplete = store.with_suggest_config(SuggestConfig(default_top_k=3))
print(autocomplete.suggest("how to bui"))
```

`AdaptiveAutocomplete` also supports `inspect(...)` with the same diagnostics as the in-memory engine:
```python
for item in autocomplete.inspect("how to bui", topk=3):
    print(item.text, item.breakdown.final_score)
```
You can export a live in-memory autocomplete into an adaptive store:
```python
from query_autocomplete import AdaptiveStore, Autocomplete, Document

engine = Autocomplete.create([
    Document(text="how to build a deck"),
])

store = AdaptiveStore.import_autocomplete(
    "sqlite:///adaptive.sqlite3",
    engine=engine,
)
```
You can also export the source documents directly:
```python
store = AdaptiveStore.open("sqlite:///adaptive.sqlite3")
store.add_documents(engine.export_documents())
```
An autocomplete loaded from `Autocomplete.load(...)` cannot be imported into an adaptive store, because artifact files are for serving and do not retain the full source-document provenance needed for mutable retraining.
You usually do not need to touch config first, but when you do:
- `BuildConfig`: controls index construction and compilation behavior for `AdaptiveStore`
- `SuggestConfig`: controls serving behavior for `store.with_suggest_config(...)`
Example:
```python
from query_autocomplete import Autocomplete, Document
from query_autocomplete.config import BuildConfig, NormalizationConfig, SuggestConfig

build_config = BuildConfig(
    max_generated_words=4,
    max_indexed_prefix_chars=24,
    max_context_tokens=3,
    top_tokens_per_prefix=64,
    top_next_tokens=32,
    top_next_phrases=16,
    phrase_min_count=2,
    phrase_min_doc_freq=1,
    phrase_min_pmi=0.0,
    phrase_max_dominant_extension_ratio=0.95,
    phrase_boundary_generic_min_count=8,
    phrase_max_len=4,
    normalization=NormalizationConfig(
        lowercase=True,
        unicode_nfkc=True,
        strip_accents=False,
        strip_punctuation=True,
    ),
)

suggest_config = SuggestConfig(
    default_top_k=10,
    default_length_bias=0.5,
    max_suggestion_words=4,
    beam_width=24,
    token_branch_limit=8,
    phrase_branch_limit=8,
    prior_weight=0.35,
    noise_penalty_weight=0.35,
    suppress_redundant_continuations=True,
    min_context_support_ratio=0.0,
    context_support_penalty_weight=0.25,
    collapse_prefix_ladders=True,
    collapse_prefix_ladder_strategy="best",
    unknown_context_strategy="skip",
    normalize_phrase_scores_by_length=False,
    fuzzy_prefix="auto",
    max_edit_distance=2,
)

index = Autocomplete.create(
    [Document(text="how to build a deck")],
    build_config=build_config,
    suggest_config=suggest_config,
)
```
Most useful knobs:
- `BuildConfig.max_generated_words`
- `BuildConfig.max_context_tokens`: defaults to `3`; values up to `6` are supported. Higher values are rejected because the binary context graph stores at most six-token history keys.
- `BuildConfig.phrase_min_count`
- `BuildConfig.phrase_min_doc_freq`
- `BuildConfig.phrase_min_pmi`
- `SuggestConfig.default_top_k`
- `SuggestConfig.max_suggestion_words`
- `SuggestConfig.default_length_bias`
- `SuggestConfig.context_support_penalty_weight`
- `SuggestConfig.collapse_prefix_ladders`
- `SuggestConfig.collapse_prefix_ladder_strategy`
- `SuggestConfig.unknown_context_strategy`
- `SuggestConfig.normalize_phrase_scores_by_length`
- `SuggestConfig.fuzzy_prefix`: defaults to `"auto"`; exact prefix lookup is tried first, then bounded fuzzy lookup recovers common one-edit typos on non-trivial fragments.
- `SuggestConfig.max_edit_distance`: defaults to `2`; serving may use a lower effective distance for short fragments to avoid noisy autocomplete matches.
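The exact-first, bounded-fuzzy-fallback lookup can be illustrated with a plain Levenshtein bound (a sketch of the general technique, not the library's trie traversal; `fuzzy_prefix_lookup` is a hypothetical helper):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def fuzzy_prefix_lookup(fragment: str, indexed_prefixes: set[str],
                        max_edits: int = 1) -> list[str]:
    """Try an exact hit first; fall back to prefixes within the edit bound."""
    if fragment in indexed_prefixes:
        return [fragment]
    return sorted(p for p in indexed_prefixes
                  if edit_distance(fragment, p) <= max_edits)

prefixes = {"build", "built", "builder", "deck"}
print(fuzzy_prefix_lookup("buld", prefixes))  # → ['build']
```

A production index avoids scanning all prefixes by walking the trie with an edit budget, but the accept/reject criterion is the same bounded distance.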
Phrase quality options are build-time settings. Changing them requires rebuilding the index or adaptive serving artifact.
Runtime quality options are serving-time settings. You can override them per call:
```python
results = index.suggest(
    "how to build ",
    collapse_prefix_ladders=False,
)
```
`collapse_prefix_ladders` removes near-duplicate suggestions where one result is just a longer continuation of another. For example, instead of returning all of `how to build`, `how to build a`, and `how to build a deck`, the default keeps one representative according to `collapse_prefix_ladder_strategy`.
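A simplified sketch of the collapse idea, keeping only the longest rung of each ladder (the library's `best` strategy keeps the best-scoring rung instead; `collapse_ladders` is a hypothetical helper):

```python
def collapse_ladders(suggestions: list[str]) -> list[str]:
    """Drop any suggestion that is a word-level prefix of another suggestion.

    What survives is the longest rung of each ladder, plus every suggestion
    that is not part of a ladder at all.
    """
    kept = []
    for s in suggestions:
        words = s.split()
        is_rung = any(
            other != s and other.split()[: len(words)] == words
            for other in suggestions
        )
        if not is_rung:
            kept.append(s)
    return kept

ladder = ["how to build", "how to build a", "how to build a deck"]
print(collapse_ladders(ladder))  # → ['how to build a deck']
```

The comparison is on word boundaries rather than raw characters, so `open settings` does not collapse into `open keyboard shortcuts`.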
Candidate fluency is scored locally with an interpolated Kneser-Ney bigram model built from the indexed corpus. This keeps serving lightweight while giving better contextual preferences than simple add-k smoothing.
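For reference, interpolated Kneser-Ney for bigrams combines an absolute-discounted bigram estimate with a continuation-probability backoff. A compact sketch of the standard formulation (not the library's tuned implementation):

```python
from collections import Counter

def kneser_ney_bigram(corpus: list[list[str]], discount: float = 0.75):
    """Build P(word | prev) under interpolated Kneser-Ney smoothing."""
    bigrams: Counter = Counter()
    unigrams: Counter = Counter()
    for sentence in corpus:
        for prev, word in zip(sentence, sentence[1:]):
            bigrams[(prev, word)] += 1
            unigrams[prev] += 1
    # Continuation counts: how many distinct contexts does `word` complete?
    continuation = Counter(word for (_, word) in bigrams)
    bigram_types = len(bigrams)
    # Distinct continuations per history, used to size the backoff mass.
    followers = Counter(prev for (prev, _) in bigrams)

    def prob(prev: str, word: str) -> float:
        p_cont = continuation[word] / bigram_types
        if unigrams[prev] == 0:
            return p_cont  # unseen history: pure continuation probability
        discounted = max(bigrams[(prev, word)] - discount, 0.0) / unigrams[prev]
        backoff_weight = discount * followers[prev] / unigrams[prev]
        return discounted + backoff_weight * p_cont

    return prob

kn_corpus = [
    "how to build a deck".split(),
    "how to build a desk".split(),
    "how to build with python".split(),
]
p = kneser_ney_bigram(kn_corpus)
# "build" follows "to" in every sentence, so it outscores rare continuations.
assert p("to", "build") > p("to", "python")
```

Here `discount` is the single absolute-discount parameter; production implementations typically estimate it from count-of-count statistics rather than fixing 0.75.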
Rerankers are request-time behavior:
```python
results = index.suggest("how to build ", reranker=my_reranker)
diagnostics = index.inspect("how to build ", reranker=my_reranker)
```
If a request asks for longer continuations than the index was built for, the library emits a warning. For example, an index built with `max_generated_words=4` warns when called with `suggest(..., max_words=5)`.
The same warning behavior applies when serving asks for artifact detail that was not stored at build time: a partial query fragment longer than `BuildConfig.max_indexed_prefix_chars`, or `SuggestConfig.token_branch_limit` / `phrase_branch_limit` values larger than `BuildConfig.top_next_tokens` / `top_next_phrases`.
- The published package is built from `python-package/`
- The importable library source lives in `core/src/query_autocomplete/`
- This package is MIT-licensed.
- It depends on `marisa-trie`, whose current published licensing is `MIT AND (BSD-2-Clause OR LGPL-2.1-or-later)`.
- See THIRD_PARTY_LICENSES.md for a short note and links to upstream metadata.
The published package is built from `python-package/`, while the canonical library source lives in `core/src/query_autocomplete/`.
Install in editable mode from the repo root:
```shell
python -m pip install -e "./python-package[dev]"
```
Run the test suite:
```shell
python -m pytest -q
```
Build distributable artifacts:
```shell
cd python-package
python -m build
```