feat: implement Eq and Hash for Property#848
Open
RobertJacobsonCDC wants to merge 5 commits intomainfrom
Open
Conversation
Benchmark ResultsHyperfine
CriterionRegressions (slower)
Improvements (faster)
Unchanged / inconclusive (CI crosses 0%)
|
Collaborator
Author
Local Benchmark ResultsThis is what I get running the benchmarks on my local machine. The "regressions" all seem to be the usual high variance benchmarks. They don't appear to be real regressions. I actually don't fully understand why I'm seeing a significant performance bump in a lot of these benchmarks. Some of them I can explain. I found a better way of dealing with the hashing of multi-properties and queries. Also, when Property implements
|
| Benchmark | Base | Current | Change |
|---|---|---|---|
| init/population/10k | 3.4157 ms | 3.1341 ms | -8.2% faster ✓ |
| init/population/100k | 44.433 ms | 41.496 ms | -6.6% faster ✓ |
| execute/population/10k | 81.324 ms | 78.589 ms | -3.4% faster ✓ |
| execute/population/100k | 1.2189 s | 1.1451 s | -6.1% faster ✓ |
Criterion
Regressions
| Group | Bench | Change | CI Lower | CI Upper |
|---|---|---|---|---|
| large_dataset | bench_query_population_property_entities | 15.768% | 13.029% | 18.341% |
| large_dataset | bench_query_population_multi_unindexed_entities | 4.053% | 2.790% | 5.417% |
| large_dataset | bench_query_population_derived_property_entities | 1.719% | 1.222% | 2.196% |
| set_property | set_property_no_dependents | 6.073% | 5.853% | 6.287% |
| set_property | set_property_three_dependents | 4.675% | 3.794% | 5.582% |
| algorithm_benches | algorithm_sampling_multiple_l_reservoir | 1.733% | 1.506% | 1.934% |
| algorithm_benches | algorithm_sampling_multiple_known_length | 1.415% | 1.293% | 1.542% |
| sampling | sampling_single_unindexed_concrete_plus_derived_entities | 2.414% | 2.288% | 2.536% |
| sampling | count_and_sampling_single_unindexed_concrete_plus_derived_entiti | 2.516% | 2.409% | 2.640% |
| sample_entity_single_property_unindexed | 10000 | 6.227% | 5.731% | 6.709% |
| sample_entity_single_property_unindexed | 100000 | 28.688% | 28.287% | 29.111% |
| counts | single_property_unindexed_entities | 16.558% | 14.548% | 18.681% |
| counts | concrete_plus_derived_unindexed_entities | 2.895% | 1.689% | 4.694% |
Improvements
| Group | Bench | Change | CI Lower | CI Upper |
|---|---|---|---|---|
| large_dataset | bench_query_population_indexed_property_entities | -88.866% | -88.943% | -88.790% |
| large_dataset | bench_filter_indexed_entity | -38.729% | -39.516% | -37.909% |
| large_dataset | bench_match_entity | -4.233% | -5.034% | -3.428% |
| large_dataset | bench_query_population_multi_indexed_entities | -51.022% | -51.142% | -50.902% |
| set_property | set_property_three_dependents_mixed | -2.006% | -2.549% | -1.439% |
| sample_entity_single_property_indexed | 10000 | -63.064% | -63.214% | -62.908% |
| sample_entity_single_property_indexed | 100000 | -62.047% | -62.487% | -61.601% |
| sample_entity_single_property_indexed | 1000 | -63.172% | -63.549% | -62.741% |
| property_semantics_float_queries | float_query_with_query_results_indexed | -20.865% | -21.922% | -19.835% |
| property_semantics_float_queries | float_query_entity_count_indexed | -21.155% | -22.165% | -20.178% |
| sample_entity_whole_population | 10000 | -2.098% | -2.280% | -1.905% |
| algorithm_benches | algorithm_sampling_single_rand_reservoir | -1.216% | -1.320% | -1.095% |
| sampling | sampling_single_l_reservoir_entities | -20.510% | -20.576% | -20.429% |
| sampling | sampling_multiple_known_length_entities | -8.332% | -8.470% | -8.185% |
| sampling | sampling_multiple_l_reservoir_entities | -9.317% | -9.373% | -9.259% |
| sampling | sampling_single_known_length_entities | -62.470% | -62.706% | -62.244% |
| sampling | count_and_sampling_single_known_length_entities | -61.926% | -62.203% | -61.649% |
| property_semantics_value_change_counts | float_value_change_counter_execute | -6.790% | -7.906% | -5.705% |
| examples | example-births-deaths | -5.968% | -6.162% | -5.787% |
| sample_entity_single_property_unindexed | 1000 | -6.364% | -6.884% | -5.898% |
| property_semantics_hashing | raw_one_shot_hash_scalar | -59.481% | -60.060% | -58.905% |
| property_semantics_hashing | struct_property_hash | -46.316% | -46.846% | -45.789% |
| property_semantics_hashing | scalar_property_hash | -54.853% | -55.481% | -54.250% |
| property_semantics_hashing | multi_property_hash | -48.030% | -48.732% | -47.374% |
| property_semantics_hashing | float_property_hash | -8.910% | -10.501% | -7.306% |
| sample_entity_multi_property_indexed | 10000 | -57.980% | -58.309% | -57.649% |
| sample_entity_multi_property_indexed | 100000 | -58.471% | -58.770% | -58.175% |
| sample_entity_multi_property_indexed | 1000 | -56.881% | -57.276% | -56.457% |
| indexing | query_people_count_multiple_individually_indexed_properties_enti | -1.228% | -1.290% | -1.171% |
| indexing | query_people_count_single_indexed_property_entities | -89.242% | -89.261% | -89.224% |
| indexing | query_people_count_indexed_multi-property_entities | -56.706% | -57.692% | -55.912% |
| indexing | with_query_results_multiple_individually_indexed_properties_enti | -58.218% | -58.300% | -58.143% |
| indexing | with_query_results_indexed_multi-property_entities | -55.982% | -56.194% | -55.795% |
| indexing | query_people_multiple_individually_indexed_properties_entities | -7.542% | -7.722% | -7.293% |
| indexing | with_query_results_single_indexed_property_entities | -88.365% | -88.414% | -88.333% |
| indexing | query_people_indexed_multi-property_entities | -5.076% | -6.663% | -3.539% |
| counts | multi_property_indexed_entities | -49.622% | -49.744% | -49.492% |
| counts | index_after_adding_entities | -57.814% | -57.893% | -57.733% |
| counts | single_property_indexed_entities | -89.710% | -89.731% | -89.689% |
| counts | reindex_after_adding_more_entities | -25.211% | -25.630% | -24.694% |
| counts | multi_property_unindexed_entities | -21.721% | -22.141% | -21.227% |
Unchanged
| Group | Bench | Change | CI Lower | CI Upper |
|---|---|---|---|---|
| large_dataset | bench_filter_unindexed_entity | 0.515% | -0.750% | 1.518% |
| sample_entity_whole_population | 100000 | -1.610% | -2.682% | -0.487% |
| sample_entity_whole_population | 1000 | 1.058% | -0.092% | 2.508% |
| algorithm_benches | algorithm_sampling_single_known_length | 0.006% | -0.082% | 0.106% |
| algorithm_benches | algorithm_sampling_single_l_reservoir | -0.053% | -0.163% | 0.048% |
| sampling | sampling_single_unindexed_entities | -0.531% | -0.583% | -0.478% |
| sampling | sampling_multiple_unindexed_entities | 0.115% | 0.077% | 0.157% |
| examples | example-basic-infection | -1.336% | -1.740% | -0.912% |
| property_semantics_hashing | raw_bincode_hash_serialized_scalar | 1.178% | 0.529% | 1.833% |
| indexing | query_people_single_indexed_property_entities | -0.516% | -0.573% | -0.459% |
02f305a to
2f8b226
Compare
Collaborator
Author
|
Added to To-Do list:
|
2f8b226 to
5644270
Compare
5644270 to
2f0c638
Compare
This comment was marked as duplicate.
This comment was marked as duplicate.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR completes the transition of property indexing and query lookup from serialized surrogate hashes to standard Rust
Eq/Hashsemantics. Property values and canonical values are now first-class map keys, the property macros support both normal derived implementations and manual float-style equality/hash behavior, and shared multi-property indexes use allocation-free canonical-value reconstruction instead of byte serialization.The PR is split into commits corresponding to the following phases:
Phase 0: Added baseline benchmarks for property hashing, equality, and value count indexes
ixa-bench/criterion/property_semantics.rs.ixaproper.mise run bench:create '' mainon this commit to establish a baseline for comparison.Phase 1: Make Properties
Eq + HashProperty::CanonicalValueimplementEqandHash.define_property!withimpl_eq_hash = Eq | Hash | both | neither.EqandHashautomatically.Eq/Hashsupport for types that cannot use ordinary derives.Hashviarkyvstreaming into a hasher withHasherWriter. Does not allocate.rkyvusing fixed-size archived-byte comparison withEqualityBufferWriter. Does not allocate.rkyvfor macro-generated code and updated property/macro docs so the required trait list now includesEqandHash.Phase 2: Switch Hashing To Use
HashHashpath instead ofbincode/serde.u128hashing inhashing::one_shot_128.bincodere-export/dependency from the property/indexing path.Vec<u8>just to compute a lookup hash.ixa-derivesupport for reconstructing canonical multi-property values from sorted query parts withcanonical_from_sorted_query_parts_closure.Phase 3: Remove Hash-Only Plumbing
FullIndextoHashMap<P::CanonicalValue, IndexSet<EntityId<E>>>.ValueCountIndextoHashMap<P::CanonicalValue, usize>.Property::hash_property_value,Query::multi_property_value_hash,Query::get_query, and the remaining*_with_hashquery/index plumbing.PropertyIndex,PropertyValueStore, andPropertyStoreso indexed lookup operates on canonical values rather than precomputedu128property hashes.Phase 4: Integration Tests and Macro Bug Fixes
integration-tests/ixa-runner-tests/tests/macros.rs.Notable Implementation Details
Multi-Property Query Lookup
We need a mechanism to look up queries in an index. The challenge is, the query doesn't have access to its equivalent multi-property type and so cannot directly construct the canonical value of that type. However, it only ever needs to do so in cases when that type does in fact exist.
The previous implementation used serialization to erase the type and hashing for the lookup, but it was slow and copied/allocated. The new implementation uses
&dyn Anyto erase the type but references the existing (stack local) values rather than making a heap allocation. It then forwards the erased value to the innerPropertyValueStoreCore<E, P>, which knows how to downcast and reconstitute aP:CanonicalValuefrom the type erased representation.Property::QueryParts<'a>Query::QueryParts<'a>[&dyn Any; 1]view.&dyn Anyvalues.Self::CanonicalValuefrom those already-sorted query parts usingcanonical_from_sorted_query_parts.(Age, Weight, Height)and(Weight, Height, Age)without reintroducing hash-only lookup machinery.Generic Generated
EqandHashImplementationsThe
impl_eq_hash = ...option fordefine_property/define_derived_propertygeneratesEqandHashimplementations that operate byte-wise on the property type. Previously, while we didn't use these traits, we nonetheless generated a general byte-wise hashing mechanism forProperty::hash_property_valueandQuery::multi_property_value_hash.Generating an implementation that works for any type is really tricky. You can't simply reinterpret values of a type as bytes and then operate on the bytes, because, for example, you might have a struct with padding, and the padding might contain junk data that causes values that are equal as typed values to become unequal as raw bytes. You can either develop a trait with recursive blanket implementations (as
serdeand similar crates do), or you piggyback on some crate that does this already, like an serialization crate.Our old strategy serialized the value to a
Vec<u8>(usingbincodeandserde) and then hashed those vectors in canonical order. Our new strategy usesrkyv, which uses a mechanism that avoids copying and allocation, unlikeserde. TheHashimplementation can avoid copying. TheEqimplementation still makes a copy of the values and compares the copies byte-wise, but they are copied into fixed-sized arrays allocated on the stack. While this isn't as performant as a native value comparison, it's probably the best we can expect for something applicable to completely generalCopy/ inline types. If client code needs something faster, they can either implement their ownPartialEq/EqandHash(which is generally pretty easy), or they can useordered-floatordecorumtypes and just deriveEqandHash.Out of Scope
serdeusage in reporting/global-property code untouched.PropertySourceId.value_hashforEntitySetstructural simplification—we use a hash for type-erased equality checks.HashTabledata structure forHashMapand the falloutEq/Hashserdederive/import outside the ixa internal property layer.Open Questions and Issues
impl_eq_hash = bothforf64-containing types just as obnoxious as requiring them to usedecorumorordered-float? If so, let's not even bother with synthesizedEqandHashimpls.ordered-floatis much more obnoxious. See this.HashandEqmight be reasonably performant in the common cases but are unlikely to be optimal. If this is a problem, the user can usedecorumorordered-floator implementEqandHashthemselves.deriveserde::Serializefor all properties? I droppedserde::Serializefrom the property-layer contract and macro-generated property derives.Serializeout of list of trait constraints onProperty.define_property/define_derived_propertygenerated types for compatibility with reports.derives different from the list we automatically give them, they have to declare the type themselves and then useimpl_property. (This isn't new.)To Do Before Merge
Add user documentation forUser documentation deferred to a follow-up PR so it's easier to review.f64in properties,impl_eq_hash = ...fordefine_*_propertymacrosimpl_eq_hash = ...parameter fordefine_derived_propertyquery_parts_for_valueand remove the associated typeProperty:QueryPartsintegration-tests/ixa-runner-tests/tests/macros.rsdefine_propertywith all variations ofimpl_eq_hash = ...define_derived_propertywith all variations ofimpl_eq_hash = ...