use stats #31

Closed
wants to merge 618 commits into from

Owner

@PSeitz PSeitz commented Jan 13, 2023

waywardmonkeys and others added 30 commits September 19, 2022 18:10
Improvements to doc linking, grammar, etc.
Update the docs to reflect the lack of LockParams, correct an error,
and improve cross-linking.
unused and in the wrong place
remove fast_field_cardinality from FastValue
add benchmarks for multivalued fastfield merge
* add support for deleting all documents matching query

#1494
This started showing up with clippy in rust 1.64.
Use u8::from(bool), u64::from(bool).
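
A minimal sketch of the conversion named in the commit message above. `count_set` is a hypothetical helper for illustration, not code from this PR; `u8::from(bool)` and `u64::from(bool)` are the standard library's `From<bool>` implementations.

```rust
/// Hypothetical helper: counts how many flags are set, converting each
/// `bool` with `u64::from` instead of an `as` cast.
fn count_set(flags: &[bool]) -> u64 {
    flags.iter().map(|&flag| u64::from(flag)).sum()
}

fn main() {
    // `true` converts to 1 and `false` to 0.
    assert_eq!(u8::from(true), 1);
    assert_eq!(u8::from(false), 0);
    assert_eq!(count_set(&[true, false, true]), 2);
}
```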
Fixes the multivalue fast field regression by avoiding `get_val`. `Line::train` calls `get_val` repeatedly, but the `get_val` implementation on `Column` for multivalues is very slow. The fix is to use the iterator instead. The long-term fix should be to remove `get_val` access in serialization.

Old Code

test fastfield::bench::bench_multi_value_ff_merge_few_segments                                                           ... bench:  46,103,960 ns/iter (+/- 2,066,083)
test fastfield::bench::bench_multi_value_ff_merge_many_segments                                                          ... bench:  83,073,036 ns/iter (+/- 4,373,615)
test fastfield::bench::bench_multi_value_ff_merge_many_segments_log_merge                                               ... bench:  64,178,576 ns/iter (+/- 1,466,700)

Current

running 3 tests
test fastfield::multivalued::bench::bench_multi_value_ff_merge_few_segments                                              ... bench:  57,379,523 ns/iter (+/- 3,220,787)
test fastfield::multivalued::bench::bench_multi_value_ff_merge_many_segments                                             ... bench:  90,831,688 ns/iter (+/- 1,445,486)
test fastfield::multivalued::bench::bench_multi_value_ff_merge_many_segments_log_merge                                   ... bench: 158,313,264 ns/iter (+/- 28,823,250)

With Fix

running 3 tests
test fastfield::multivalued::bench::bench_multi_value_ff_merge_few_segments                                              ... bench:  57,635,671 ns/iter (+/- 2,707,361)
test fastfield::multivalued::bench::bench_multi_value_ff_merge_many_segments                                             ... bench:  91,468,712 ns/iter (+/- 11,393,581)
test fastfield::multivalued::bench::bench_multi_value_ff_merge_many_segments_log_merge                                   ... bench:  73,909,138 ns/iter (+/- 15,846,097)
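
The difference between repeated positional access and iteration can be sketched as follows. `MergedColumn` and its methods are hypothetical stand-ins written for this illustration, not tantivy's actual `Column` API; the sketch only assumes that positional lookup has a per-call cost that iteration avoids.

```rust
/// Hypothetical stand-in for a column stitched together from several segments.
struct MergedColumn {
    segments: Vec<Vec<u64>>,
}

impl MergedColumn {
    /// Positional access: walks the segments until it finds the one holding
    /// `idx`, so every call costs O(number of segments).
    fn get_val(&self, mut idx: usize) -> u64 {
        for segment in &self.segments {
            if idx < segment.len() {
                return segment[idx];
            }
            idx -= segment.len();
        }
        panic!("index out of bounds");
    }

    /// Total number of values across all segments.
    fn num_vals(&self) -> usize {
        self.segments.iter().map(Vec::len).sum()
    }

    /// Sequential access: visits every value exactly once.
    fn iter(&self) -> impl Iterator<Item = u64> + '_ {
        self.segments.iter().flatten().copied()
    }
}

/// Folds a stream of values into `(min, max)`, or `None` if it is empty.
fn min_max(values: impl Iterator<Item = u64>) -> Option<(u64, u64)> {
    values.fold(None, |acc, v| match acc {
        None => Some((v, v)),
        Some((lo, hi)) => Some((lo.min(v), hi.max(v))),
    })
}

/// Training-style pass that pays the lookup cost for every value.
fn min_max_via_get_val(col: &MergedColumn) -> Option<(u64, u64)> {
    min_max((0..col.num_vals()).map(|idx| col.get_val(idx)))
}

/// Same result, computed with a single sequential pass.
fn min_max_via_iter(col: &MergedColumn) -> Option<(u64, u64)> {
    min_max(col.iter())
}

fn main() {
    let col = MergedColumn { segments: vec![vec![5, 1], vec![9], vec![3, 7]] };
    assert_eq!(min_max_via_get_val(&col), Some((1, 9)));
    assert_eq!(min_max_via_iter(&col), Some((1, 9)));
}
```

Both functions return the same statistics; only the access pattern differs.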
clippy: Remove borrows that the compiler will do.
* Checking cfg_attr is no longer necessary.
* Don't need multiple `clippy::` prefixes on a name.
* use binary search instead of linear for get_val in merge

* use partition_point
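
As a rough illustration of the change described above, the sketch below locates the segment that owns a position, first with a linear scan and then with the standard library's `slice::partition_point`. The function names and the `start_offsets` layout are assumptions made for this example, not code from the PR.

```rust
/// Linear scan: index of the last segment whose start offset is <= `pos`.
/// Assumes `start_offsets` is sorted ascending and begins with 0.
fn segment_linear(start_offsets: &[usize], pos: usize) -> usize {
    let mut found = 0;
    for (i, &start) in start_offsets.iter().enumerate() {
        if start <= pos {
            found = i;
        } else {
            break;
        }
    }
    found
}

/// Binary search: `partition_point` returns the number of leading offsets
/// that satisfy `start <= pos`; the owning segment is the last of those.
/// Same assumptions as above, so the result of `partition_point` is >= 1.
fn segment_binary(start_offsets: &[usize], pos: usize) -> usize {
    start_offsets.partition_point(|&start| start <= pos) - 1
}

fn main() {
    // Three segments starting at positions 0, 3 and 7.
    let start_offsets = [0, 3, 7];
    for pos in 0..12 {
        assert_eq!(
            segment_linear(&start_offsets, pos),
            segment_binary(&start_offsets, pos)
        );
    }
}
```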
This was removed in 2018, so these should be fine to remove now.
This is only used within the file watcher and is const, so it
can't be configured.
When building without default features (so without mmap, etc),
there are some warnings about unused things. This fixes the
ones related to `ArcBytes` and `WeakArcBytes`, which are only
used with the `mmap_directory` code.
Updates the requirements on [tantivy-fst](https://github.com/tantivy-search/fst) to permit the latest version.
- [Release notes](https://github.com/tantivy-search/fst/releases)
- [Commits](https://github.com/tantivy-search/fst/commits)

---
updated-dependencies:
- dependency-name: tantivy-fst
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
…st-0.4.0

Update tantivy-fst requirement from 0.3.0 to 0.4.0
fulmicoton and others added 27 commits December 22, 2022 14:29
* Supporting PartialCmp in VectorColumn.
* Apply suggestions from code review

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
* doc: update comments in the faceted search example

* chore: format
…1750)

* Make nightly Clippy mostly happy.

* Document how to produce TermSetQuery queries using QueryParser.

* Enable construction of queries using FuzzyTermQuery via the QueryParser

* Use FxHashMap instead of HashMap in the QueryParser as these hash tables are not exposed to DoS attacks.

* Use a struct instead of a tuple to improve readability.
Introduce MakeZero trait, remove make_zero from FastValue
Merge two multivalue fastfield implementations into one
prepare range query on fastfield for different types
* Added support for dynamic fast field.

See README for more information.

* Apply suggestions from code review

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
closes #1766

Finding tantivy tokenizers is a frustrating experience currently, since
they need to be updated for each tantivy version. That's unnecessary since
the API is rather stable anyway.
* enable range query on fast field for u64 compatible types

* rename, update benches
Updates the requirements on [base64](https://github.com/marshallpierce/rust-base64) to permit the latest version.
- [Release notes](https://github.com/marshallpierce/rust-base64/releases)
- [Changelog](https://github.com/marshallpierce/rust-base64/blob/master/RELEASE-NOTES.md)
- [Commits](marshallpierce/rust-base64@v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: base64
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This adds a regex tokenizer which tokenizes the text by using a
regex pattern to split.

Co-authored-by: Michael Kleen <mkleen@gmailw.com>
…-not-indexed-fields

Allow range queries via fast fields on non-indexed fields
* handle user input on get_docid_for_value_range

fixes #1757

* pass range as parameter
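
One way to picture a range lookup over a fast field that tolerates arbitrary caller-supplied bounds is sketched below. `docids_in_value_range`, its signature, and the clamping behaviour are all assumptions made for illustration; they are not the signature or behaviour of tantivy's `get_docid_for_value_range`.

```rust
use std::ops::{Range, RangeInclusive};

/// Hypothetical helper: appends to `out` every doc id in `doc_range` whose
/// value in `column` lies within `value_range`.
fn docids_in_value_range(
    column: &[u64],
    value_range: RangeInclusive<u64>,
    doc_range: Range<u32>,
    out: &mut Vec<u32>,
) {
    // Clamp caller-supplied bounds so out-of-range input cannot cause a panic.
    let num_docs = u32::try_from(column.len()).unwrap_or(u32::MAX);
    let end = doc_range.end.min(num_docs);
    let start = doc_range.start.min(end);
    for doc in start..end {
        if value_range.contains(&column[doc as usize]) {
            out.push(doc);
        }
    }
}

fn main() {
    let column = [10, 25, 30, 5, 25];
    let mut hits = Vec::new();
    // The doc range deliberately extends past the end of the column.
    docids_in_value_range(&column, 20..=30, 0..100, &mut hits);
    assert_eq!(hits, vec![1, 2, 4]);
}
```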
@codecov-commenter

Codecov Report

Merging #31 (78273bf) into main (2e255c4) will increase coverage by 0.06%.
The diff coverage is 94.55%.

@@            Coverage Diff             @@
##             main      #31      +/-   ##
==========================================
+ Coverage   94.05%   94.12%   +0.06%     
==========================================
  Files         230      282      +52     
  Lines       39350    53079   +13729     
==========================================
+ Hits        37012    49960   +12948     
- Misses       2338     3119     +781     
| Impacted Files | Coverage Δ | |
|---|---|---|
| common/src/writer.rs | 94.11% <ø> (ø) | |
| fastfield_codecs/src/main.rs | 0.51% <0.00%> (-0.44%) | ⬇️ |
| query-grammar/src/occur.rs | 100.00% <ø> (ø) | |
| src/collector/custom_score_top_collector.rs | 100.00% <ø> (ø) | |
| src/collector/filter_collector_wrapper.rs | 84.84% <ø> (ø) | |
| src/collector/histogram_collector.rs | 98.98% <ø> (-0.51%) | ⬇️ |
| src/collector/mod.rs | 47.75% <ø> (-0.47%) | ⬇️ |
| src/collector/multi_collector.rs | 99.30% <ø> (ø) | |
| src/collector/tests.rs | 98.18% <ø> (+0.02%) | ⬆️ |
| src/collector/top_collector.rs | 97.60% <ø> (ø) | |
| ... and 395 more | | |


@PSeitz PSeitz closed this Jan 18, 2023