Merged
4 changes: 3 additions & 1 deletion docs/reference/index-config.md
@@ -71,7 +71,7 @@ The default match index options are:
- `token_filters`: a list of filters to apply to normalize tokens before indexing.
- `tokenizer`: determines how input text is split into tokens.
- `bf`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
-- `k`: The maximum number of bits set in the bloom filter per term. Defaults to `6`.
+- `k`: The number of hash functions to use per term (each sets one bit in the bloom filter). Defaults to `6`.

**Token filters**

@@ -93,6 +93,8 @@ There are two `tokenizer`s provided: `standard` and `ngram`.
This determines the maximum number of bits that will be set in the bloom filter per term.
`k` must be an integer from `3` to `16` and defaults to `6`.

+To calculate optimal values for your use case, see this [Bloom filter calculator](https://di-mgt.com.au/bloom-calculator.html).
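
The relationship between `bf`, `k`, and the number of distinct terms can also be checked directly with the standard Bloom filter formulas. The sketch below is an illustration of that textbook math, not the indexer's actual implementation; the term count of 200 is an arbitrary example:

```python
import math

def false_positive_rate(m_bits: int, n_terms: int, k: int) -> float:
    """Approximate false-positive rate for a Bloom filter with m_bits
    bits, n_terms inserted terms, and k hash functions:
    p ~= (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n_terms / m_bits)) ** k

def optimal_k(m_bits: int, n_terms: int) -> int:
    """The k that minimizes the false-positive rate: (m/n) * ln 2."""
    return max(1, round(m_bits / n_terms * math.log(2)))

# With the defaults (bf=2048 bits, k=6) and a hypothetical 200 distinct
# terms per document, the false-positive rate is well under 1%:
p = false_positive_rate(2048, 200, 6)
best_k = optimal_k(2048, 200)
```

Note that `k` trades indexing cost against accuracy: more hash functions lower the false-positive rate up to the optimum, after which the filter saturates and the rate climbs again.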

**Caveats around n-gram tokenization**

While n-gram tokenization allows greater flexibility for arbitrary substring matches, it is important to bear in mind the limitations of this approach.