feat: implement Eq and Hash for Property #848

Open
RobertJacobsonCDC wants to merge 5 commits into main from RobertJacobsonCDC_814_hash_for_property

@RobertJacobsonCDC (Collaborator) commented Apr 17, 2026

This PR completes the transition of property indexing and query lookup from serialized surrogate hashes to standard Rust Eq/Hash semantics. Property values and canonical values are now first-class map keys, the property macros support both normal derived implementations and manual float-style equality/hash behavior, and shared multi-property indexes use allocation-free canonical-value reconstruction instead of byte serialization.

The PR is split into commits corresponding to the following phases:

Phase 0: Added baseline benchmarks for property hashing, equality, and value count indexes

  • Added ixa-bench/criterion/property_semantics.rs.
  • No code changes in ixa proper.
  • Run mise run bench:create '' main on this commit to establish a baseline for comparison.

Phase 1: Make Properties Eq + Hash

  • Tightened the property contract so both property values and Property::CanonicalValue implement Eq and Hash.
  • Extended define_property! with impl_eq_hash = Eq | Hash | both | neither.
  • Kept the default case simple: ordinary properties still derive Eq and Hash automatically.
  • Added macro-generated Eq/Hash support for types that cannot use ordinary derives.
  • Implemented generated Hash by streaming rkyv output into a hasher through HasherWriter, without allocating.
  • Implemented generated equality by comparing fixed-size archived bytes produced by rkyv through EqualityBufferWriter, without allocating.
  • Re-exported rkyv for macro-generated code and updated property/macro docs so the required trait list now includes Eq and Hash.
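
As a rough illustration of the allocation-free Hash path in Phase 1, the sketch below forwards bytes straight into a hasher instead of collecting them in a Vec<u8> first. It is an assumption-laden stand-in: it uses std::io::Write in place of rkyv's writer interface, and the HasherWriter shown here is a hypothetical reimplementation, not the type added by this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::io::{self, Write};

/// Hypothetical adapter: every byte written is fed to the wrapped hasher,
/// so no intermediate buffer is allocated.
struct HasherWriter<'a, H: Hasher> {
    hasher: &'a mut H,
}

impl<'a, H: Hasher> Write for HasherWriter<'a, H> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Forward the whole chunk to the hasher and report it as consumed.
        self.hasher.write(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hash a byte encoding by streaming it through the adapter.
fn hash_streamed(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    HasherWriter { hasher: &mut hasher }
        .write_all(bytes)
        .expect("writing to a hasher cannot fail");
    hasher.finish()
}

/// Reference path: feed the same bytes to the hasher directly.
fn hash_direct(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}
```

Any serializer that can write into such an adapter can hash a value without materializing its encoding, which is the property the Phase 1 bullets rely on.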

Phase 2: Switch Hashing To Use Hash

  • Rebased property hashing on the crate’s deterministic Hash path instead of bincode/serde.
  • Centralized deterministic u128 hashing in hashing::one_shot_128.
  • Removed the old serialized-hash implementation and the bincode re-export/dependency from the property/indexing path.
  • Updated multi-property lookup so it no longer serializes components into Vec<u8> just to compute a lookup hash.
  • Added ixa-derive support for reconstructing canonical multi-property values from sorted query parts with canonical_from_sorted_query_parts_closure.
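
To make the idea of a deterministic Hash-based u128 concrete, here is a minimal sketch. The real hashing::one_shot_128 is not shown in this description, so everything below is an assumption: the function body is hypothetical, and an FNV-1a-style 128-bit hasher is swapped in purely because it is seed-free and easy to read.

```rust
use std::hash::{Hash, Hasher};

/// FNV-1a-style hasher with a 128-bit state. A simple seed-free stand-in,
/// not the hash function used by ixa.
struct Fnv128(u128);

impl Fnv128 {
    const OFFSET_BASIS: u128 = 0x6c62_272e_07bb_0142_62b8_2175_6295_c58d;
    const PRIME: u128 = 0x0000_0000_0100_0000_0000_0000_0000_013b;

    fn new() -> Self {
        Fnv128(Self::OFFSET_BASIS)
    }
}

impl Hasher for Fnv128 {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u128::from(b);
            self.0 = self.0.wrapping_mul(Self::PRIME);
        }
    }

    // `Hasher` requires a 64-bit result; fold the two halves together.
    fn finish(&self) -> u64 {
        (self.0 as u64) ^ ((self.0 >> 64) as u64)
    }
}

/// Hypothetical one-shot helper: hash any `Hash` value to a `u128`
/// by reading the full hasher state instead of calling `finish`.
fn one_shot_128<T: Hash>(value: &T) -> u128 {
    let mut hasher = Fnv128::new();
    value.hash(&mut hasher);
    hasher.0
}
```

Because no random seed is involved, the same value hashes to the same u128 on every run. Note that the standard integer Hash impls feed native-endian bytes, so a scheme like this is only stable across machines of the same endianness.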

Phase 3: Remove Hash-Only Plumbing

  • Converted FullIndex to HashMap<P::CanonicalValue, IndexSet<EntityId<E>>>.
  • Converted ValueCountIndex to HashMap<P::CanonicalValue, usize>.
  • Removed the old pattern of storing canonical values inside index payloads just to compensate for hash-surrogate lookup.
  • Removed Property::hash_property_value, Query::multi_property_value_hash, Query::get_query, and the remaining *_with_hash query/index plumbing.
  • Simplified PropertyIndex, PropertyValueStore, and PropertyStore so indexed lookup operates on canonical values rather than precomputed u128 property hashes.
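
The shape of the converted indexes can be sketched with the standard library alone. This is a simplified model under stated assumptions: EntityId is a plain usize here, HashSet stands in for IndexSet, and the method names are hypothetical rather than taken from ixa.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;

/// Hypothetical stand-in for `EntityId<E>`.
type EntityId = usize;

/// Maps each canonical value to the entities that hold it.
/// (`HashSet` stands in for `IndexSet` in this sketch.)
struct FullIndex<V: Eq + Hash> {
    map: HashMap<V, HashSet<EntityId>>,
}

impl<V: Eq + Hash> FullIndex<V> {
    fn new() -> Self {
        Self { map: HashMap::new() }
    }

    fn insert(&mut self, value: V, entity: EntityId) {
        self.map.entry(value).or_default().insert(entity);
    }

    fn entities(&self, value: &V) -> Option<&HashSet<EntityId>> {
        self.map.get(value)
    }
}

/// Maps each canonical value to the number of entities that hold it.
struct ValueCountIndex<V: Eq + Hash> {
    map: HashMap<V, usize>,
}

impl<V: Eq + Hash> ValueCountIndex<V> {
    fn new() -> Self {
        Self { map: HashMap::new() }
    }

    fn increment(&mut self, value: V) {
        *self.map.entry(value).or_insert(0) += 1;
    }

    fn count(&self, value: &V) -> usize {
        self.map.get(value).copied().unwrap_or(0)
    }
}
```

With the value itself as the key, the map resolves collisions through Eq, so the payload no longer has to carry a copy of the canonical value to disambiguate entries.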

Phase 4: Integration Tests and Macro Bug Fixes

  • Implementation work uncovered pre-existing bugs in property-related macro implementations, which this PR fixes.
  • Added crate-external integration tests of property-related macros to integration-tests/ixa-runner-tests/tests/macros.rs.

Notable Implementation Details

Multi-Property Query Lookup

We need a mechanism to look up queries in an index. The challenge is, the query doesn't have access to its equivalent multi-property type and so cannot directly construct the canonical value of that type. However, it only ever needs to do so in cases when that type does in fact exist.

The previous implementation used serialization to erase the type and hashing for the lookup, but it was slow and both copied and allocated. The new implementation uses &dyn Any to erase the type and references the existing (stack-local) values rather than making a heap allocation. It then forwards the erased value to the inner PropertyValueStoreCore<E, P>, which knows how to downcast and reconstitute a P::CanonicalValue from the type-erased representation.

  • Introduced an allocation-free query-parts bridge:
    • Property::QueryParts<'a>
    • Query::QueryParts<'a>
  • Ordinary properties expose a single-part [&dyn Any; 1] view.
  • Tuple queries expose sorted arrays of component &dyn Any values.
  • Multi-properties reconstruct Self::CanonicalValue from those already-sorted query parts using canonical_from_sorted_query_parts.
  • This preserves shared-index behavior for equivalent multi-properties like (Age, Weight, Height) and (Weight, Height, Age) without reintroducing hash-only lookup machinery.
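
The query-parts bridge can be pictured roughly as follows. This is a sketch under assumptions, not the PR's code: Age, Weight, and the AgeWeight alias are hypothetical types, and the free functions only mimic what a generated canonical_from_sorted_query_parts might do for a two-component multi-property.

```rust
use std::any::Any;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Age(u8);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Weight(u32);

/// Canonical value of a hypothetical (Age, Weight) multi-property.
type AgeWeight = (Age, Weight);

/// Rebuild the canonical value from type-erased parts that are already in
/// canonical order. Returns `None` if a part has an unexpected type.
fn canonical_from_sorted_query_parts(parts: [&dyn Any; 2]) -> Option<AgeWeight> {
    let [age, weight] = parts;
    Some((
        *age.downcast_ref::<Age>()?,
        *weight.downcast_ref::<Weight>()?,
    ))
}

/// Look up a count in an index keyed by the canonical value, starting from
/// borrowed, type-erased query parts. Nothing is serialized or allocated.
fn lookup_count(index: &HashMap<AgeWeight, usize>, parts: [&dyn Any; 2]) -> Option<usize> {
    let key = canonical_from_sorted_query_parts(parts)?;
    index.get(&key).copied()
}
```

A caller builds the parts array from references to values it already holds, for example `let parts: [&dyn Any; 2] = [&age, &weight];`, so only the index side needs to know the concrete multi-property type.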

Generic Generated Eq and Hash Implementations

The impl_eq_hash = ... option for define_property / define_derived_property generates Eq and Hash implementations that operate byte-wise on the property type. Previously, while we didn't use these traits, we nonetheless generated a general byte-wise hashing mechanism for Property::hash_property_value and Query::multi_property_value_hash.

Generating an implementation that works for any type is tricky. You can't simply reinterpret a value as bytes and operate on those bytes: a struct may contain padding, for example, and the padding may hold junk data, so two values that are equal as typed values can be unequal as raw bytes. You can either develop a trait with recursive blanket implementations (as serde and similar crates do) or piggyback on a crate that already does this, such as a serialization crate.
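
The padding problem is easy to see by comparing a struct's size with the sum of its field sizes. The Padded type below is a hypothetical example; the numbers in the comments assume a typical target where u32 is 4-byte aligned.

```rust
use std::mem::size_of;

/// With `repr(C)`, fields are laid out in declaration order, so padding
/// must be inserted after `flag` to align `value`.
#[repr(C)]
#[derive(Clone, Copy, PartialEq)]
struct Padded {
    flag: u8,   // 1 byte, followed by padding
    value: u32, // 4 bytes, 4-byte aligned on typical targets
}

/// Number of bytes in `Padded` that belong to no field.
fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>())
}
```

The derived PartialEq compares only flag and value, while a raw byte comparison would also read the padding bytes, whose contents are unspecified.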

Our old strategy serialized the value to a Vec<u8> (using bincode and serde) and then hashed those vectors in canonical order. Our new strategy uses rkyv, which uses a mechanism that avoids copying and allocation, unlike serde. The Hash implementation can avoid copying. The Eq implementation still copies the values and compares the copies byte-wise, but the copies go into fixed-size arrays allocated on the stack. While this isn't as performant as a native value comparison, it's probably the best we can expect for something applicable to completely general Copy / inline types. If client code needs something faster, its authors can either implement their own PartialEq / Eq and Hash (which is generally pretty easy) or use ordered-float or decorum types and just derive Eq and Hash.
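
For the hand-written route, a minimal sketch might look like the following. Temperature is a hypothetical property type, and comparing bit patterns is only one possible policy (it is not necessarily what the generated impls do): under it NaN equals NaN when the bits match, and 0.0 differs from -0.0.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical float-valued property type.
#[derive(Debug, Clone, Copy)]
struct Temperature(f64);

impl PartialEq for Temperature {
    fn eq(&self, other: &Self) -> bool {
        // Compare bit patterns so that equality is reflexive even for NaN.
        self.0.to_bits() == other.0.to_bits()
    }
}

impl Eq for Temperature {}

impl Hash for Temperature {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the same bits that `eq` compares, keeping Eq and Hash consistent.
        self.0.to_bits().hash(state);
    }
}

/// Small helper for demonstration: hash a value with the default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

The key invariant is that values that compare equal also hash equally, which holds here because both impls go through to_bits.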

Out of Scope

  • Left unrelated serde usage in reporting/global-property code untouched.
  • Kept narrow internal hash-based identifiers where they are still useful:
    • PropertySourceId.value_hash for EntitySet structural simplification—we use a hash for type-erased equality checks.
  • Multi-property type-ID registry
  • Changes to indexing architecture beyond
    • swapping internal HashTable data structure for HashMap and the fallout
    • removing obsolete hash-oriented API
  • Changes to query system beyond indexed property lookup.
  • Changes to canonicalization mechanism for multi-properties
  • Float-specific policy, API, helpers, etc. The changes were kept generic to types that cannot derive Eq / Hash
  • Detailed audit of serde derive/import outside the ixa internal property layer.

Open Questions and Issues

  • Is expecting users to use impl_eq_hash = both for f64-containing types just as obnoxious as requiring them to use decorum or ordered-float? If so, let's not even bother with synthesized Eq and Hash impls.
    • Answer: Using ordered-float is much more obnoxious. See this.
  • Generic impls of Hash and Eq might be reasonably performant in the common cases but are unlikely to be optimal. If this is a problem, the user can use decorum or ordered-float or implement Eq and Hash themselves.
  • Should we derive serde::Serialize for all properties? I dropped serde::Serialize from the property-layer contract and macro-generated property derives.
    • Suggest leaving Serialize out of list of trait constraints on Property.
    • We might reintroduce them in define_property / define_derived_property generated types for compatibility with reports.
    • Related: If client code needs some other derives different from the list we automatically give them, they have to declare the type themselves and then use impl_property. (This isn't new.)

To Do Before Merge

  • Add user documentation for f64 in properties and for impl_eq_hash = ... in the define_*_property macros. (Deferred to a follow-up PR so it's easier to review.)
  • Derive serde::Serialize for all properties
  • Implement impl_eq_hash = ... parameter for define_derived_property
  • Inline query_parts_for_value and remove the associated type Property::QueryParts
  • Add integration tests for property macros in integration-tests/ixa-runner-tests/tests/macros.rs
    • define_property with all variations of impl_eq_hash = ...
    • define_derived_property with all variations of impl_eq_hash = ...
  • Satisfy lints

@github-actions

Benchmark Results

Hyperfine

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| large_sir::baseline | 2.8 ± 0.1 | 2.8 | 3.0 | 1.00 |
| large_sir::entities | 6.5 ± 0.1 | 6.4 | 7.1 | 2.30 ± 0.06 |

Criterion

Regressions (slower)

| Group | Bench | Param | Change | CI Lower | CI Upper |
|---|---|---|---|---|---|
| sample_entity | sample_entity_whole_population | 100000 | 48.080% | 46.189% | 50.092% |
| indexing | query_people_single_indexed_property_entities | | 15.216% | 15.117% | 15.293% |
| examples | example-basic-infection | | 4.315% | 4.044% | 4.584% |
| algorithm_benches | algorithm_sampling_multiple_known_length | | 3.701% | 2.950% | 4.492% |
| sample_entity | sample_entity_single_property_unindexed | 10000 | 2.485% | 2.139% | 2.747% |

Improvements (faster)

| Group | Bench | Param | Change | CI Lower | CI Upper |
|---|---|---|---|---|---|
| counts | single_property_indexed_entities | | -87.160% | -87.231% | -87.096% |
| indexing | query_people_count_single_indexed_property_entities | | -85.171% | -85.238% | -85.060% |
| large_dataset | bench_query_population_indexed_property_entities | | -82.799% | -82.890% | -82.721% |
| indexing | with_query_results_single_indexed_property_entities | | -82.561% | -82.625% | -82.515% |
| sampling | sampling_single_known_length_entities | | -57.144% | -57.424% | -56.884% |
| sampling | count_and_sampling_single_known_length_entities | | -56.508% | -56.683% | -56.337% |
| sample_entity | sample_entity_single_property_indexed | 1000 | -56.072% | -56.334% | -55.797% |
| sample_entity | sample_entity_single_property_indexed | 100000 | -55.865% | -56.134% | -55.636% |
| indexing | with_query_results_indexed_multi-property_entities | | -54.821% | -55.331% | -54.465% |
| sample_entity | sample_entity_single_property_indexed | 10000 | -54.734% | -55.202% | -54.385% |
| indexing | query_people_count_indexed_multi-property_entities | | -52.456% | -52.676% | -52.215% |
| counts | index_after_adding_entities | | -51.943% | -52.086% | -51.848% |
| indexing | with_query_results_multiple_individually_indexed_properties_enti | | -50.664% | -50.874% | -50.473% |
| sample_entity | sample_entity_multi_property_indexed | 1000 | -41.014% | -41.200% | -40.840% |
| large_dataset | bench_query_population_multi_indexed_entities | | -40.765% | -41.064% | -40.523% |
| sample_entity | sample_entity_multi_property_indexed | 100000 | -40.497% | -40.809% | -40.193% |
| counts | multi_property_indexed_entities | | -40.490% | -40.796% | -40.152% |
| sample_entity | sample_entity_multi_property_indexed | 10000 | -40.050% | -40.318% | -39.789% |
| sampling | sampling_single_unindexed_concrete_plus_derived_entities | | -34.358% | -34.826% | -33.800% |
| sampling | count_and_sampling_single_unindexed_concrete_plus_derived_entiti | | -32.713% | -32.786% | -32.622% |
| counts | reindex_after_adding_more_entities | | -20.564% | -20.855% | -20.336% |
| large_dataset | bench_match_entity | | -16.974% | -17.115% | -16.822% |
| sample_entity | sample_entity_single_property_unindexed | 100000 | -16.018% | -17.313% | -14.638% |
| large_dataset | bench_filter_indexed_entity | | -10.448% | -19.193% | -1.494% |
| sampling | sampling_single_l_reservoir_entities | | -8.999% | -9.615% | -8.303% |
| indexing | query_people_indexed_multi-property_entities | | -6.908% | -7.481% | -6.432% |
| counts | multi_property_unindexed_entities | | -6.844% | -7.180% | -6.612% |
| sampling | sampling_multiple_known_length_entities | | -6.239% | -6.864% | -5.601% |
| large_dataset | bench_query_population_derived_property_entities | | -4.884% | -5.620% | -4.324% |
| sample_entity | sample_entity_single_property_unindexed | 1000 | -4.563% | -5.210% | -3.950% |
| sampling | sampling_multiple_l_reservoir_entities | | -3.738% | -3.836% | -3.638% |
| indexing | query_people_count_multiple_individually_indexed_properties_enti | | -2.683% | -2.792% | -2.576% |
| algorithm_benches | algorithm_sampling_single_known_length | | -2.396% | -3.490% | -1.564% |
| large_dataset | bench_query_population_multi_unindexed_entities | | -2.171% | -2.978% | -1.517% |
| large_dataset | bench_query_population_property_entities | | -2.164% | -2.632% | -1.736% |
| counts | single_property_unindexed_entities | | -1.891% | -2.543% | -1.226% |

Unchanged / inconclusive (CI crosses 0%)

| Group | Bench | Param | Change | CI Lower | CI Upper |
|---|---|---|---|---|---|
| large_dataset | bench_filter_unindexed_entity | | 2.317% | -1.806% | 6.496% |
| algorithm_benches | algorithm_sampling_multiple_l_reservoir | | 1.264% | 0.661% | 1.704% |
| indexing | query_people_multiple_individually_indexed_properties_entities | | 0.864% | 0.701% | 1.068% |
| counts | concrete_plus_derived_unindexed_entities | | -0.643% | -1.126% | -0.108% |
| examples | example-births-deaths | | 0.589% | 0.294% | 0.942% |
| algorithm_benches | algorithm_sampling_single_l_reservoir | | 0.547% | 0.205% | 0.798% |
| sample_entity | sample_entity_whole_population | 10000 | 0.364% | 0.124% | 0.595% |
| sample_entity | sample_entity_whole_population | 1000 | -0.360% | -0.801% | 0.021% |
| sampling | sampling_multiple_unindexed_entities | | -0.353% | -0.495% | -0.210% |
| algorithm_benches | algorithm_sampling_single_rand_reservoir | | -0.028% | -0.271% | 0.201% |
| sampling | sampling_single_unindexed_entities | | -0.006% | -0.131% | 0.160% |

@RobertJacobsonCDC (Collaborator, Author) commented

Local Benchmark Results

This is what I get running the benchmarks on my local machine. The "regressions" all seem to be the usual high variance benchmarks. They don't appear to be real regressions.

I actually don't fully understand why I'm seeing a significant performance bump in a lot of these benchmarks. Some of them I can explain. I found a better way of dealing with the hashing of multi-properties and queries. Also, when Property implements Eq and Hash, we can simplify indexing so it only uses 64-bit hashes instead of 128-bit hashes, which is a little faster to compute with. The rest of it? Maybe the compiler is just able to do a better job of optimizing the code? Not entirely sure.

ixa-epi-covid

main (33ab8db) vs RobertJacobsonCDC_eq_hash

| Benchmark | Base | Current | Change |
|---|---|---|---|
| init/population/10k | 3.4157 ms | 3.1341 ms | -8.2% faster ✓ |
| init/population/100k | 44.433 ms | 41.496 ms | -6.6% faster ✓ |
| execute/population/10k | 81.324 ms | 78.589 ms | -3.4% faster ✓ |
| execute/population/100k | 1.2189 s | 1.1451 s | -6.1% faster ✓ |

Criterion

Regressions

| Group | Bench | Change | CI Lower | CI Upper |
|---|---|---|---|---|
| large_dataset | bench_query_population_property_entities | 15.768% | 13.029% | 18.341% |
| large_dataset | bench_query_population_multi_unindexed_entities | 4.053% | 2.790% | 5.417% |
| large_dataset | bench_query_population_derived_property_entities | 1.719% | 1.222% | 2.196% |
| set_property | set_property_no_dependents | 6.073% | 5.853% | 6.287% |
| set_property | set_property_three_dependents | 4.675% | 3.794% | 5.582% |
| algorithm_benches | algorithm_sampling_multiple_l_reservoir | 1.733% | 1.506% | 1.934% |
| algorithm_benches | algorithm_sampling_multiple_known_length | 1.415% | 1.293% | 1.542% |
| sampling | sampling_single_unindexed_concrete_plus_derived_entities | 2.414% | 2.288% | 2.536% |
| sampling | count_and_sampling_single_unindexed_concrete_plus_derived_entiti | 2.516% | 2.409% | 2.640% |
| sample_entity_single_property_unindexed | 10000 | 6.227% | 5.731% | 6.709% |
| sample_entity_single_property_unindexed | 100000 | 28.688% | 28.287% | 29.111% |
| counts | single_property_unindexed_entities | 16.558% | 14.548% | 18.681% |
| counts | concrete_plus_derived_unindexed_entities | 2.895% | 1.689% | 4.694% |

Improvements

| Group | Bench | Change | CI Lower | CI Upper |
|---|---|---|---|---|
| large_dataset | bench_query_population_indexed_property_entities | -88.866% | -88.943% | -88.790% |
| large_dataset | bench_filter_indexed_entity | -38.729% | -39.516% | -37.909% |
| large_dataset | bench_match_entity | -4.233% | -5.034% | -3.428% |
| large_dataset | bench_query_population_multi_indexed_entities | -51.022% | -51.142% | -50.902% |
| set_property | set_property_three_dependents_mixed | -2.006% | -2.549% | -1.439% |
| sample_entity_single_property_indexed | 10000 | -63.064% | -63.214% | -62.908% |
| sample_entity_single_property_indexed | 100000 | -62.047% | -62.487% | -61.601% |
| sample_entity_single_property_indexed | 1000 | -63.172% | -63.549% | -62.741% |
| property_semantics_float_queries | float_query_with_query_results_indexed | -20.865% | -21.922% | -19.835% |
| property_semantics_float_queries | float_query_entity_count_indexed | -21.155% | -22.165% | -20.178% |
| sample_entity_whole_population | 10000 | -2.098% | -2.280% | -1.905% |
| algorithm_benches | algorithm_sampling_single_rand_reservoir | -1.216% | -1.320% | -1.095% |
| sampling | sampling_single_l_reservoir_entities | -20.510% | -20.576% | -20.429% |
| sampling | sampling_multiple_known_length_entities | -8.332% | -8.470% | -8.185% |
| sampling | sampling_multiple_l_reservoir_entities | -9.317% | -9.373% | -9.259% |
| sampling | sampling_single_known_length_entities | -62.470% | -62.706% | -62.244% |
| sampling | count_and_sampling_single_known_length_entities | -61.926% | -62.203% | -61.649% |
| property_semantics_value_change_counts | float_value_change_counter_execute | -6.790% | -7.906% | -5.705% |
| examples | example-births-deaths | -5.968% | -6.162% | -5.787% |
| sample_entity_single_property_unindexed | 1000 | -6.364% | -6.884% | -5.898% |
| property_semantics_hashing | raw_one_shot_hash_scalar | -59.481% | -60.060% | -58.905% |
| property_semantics_hashing | struct_property_hash | -46.316% | -46.846% | -45.789% |
| property_semantics_hashing | scalar_property_hash | -54.853% | -55.481% | -54.250% |
| property_semantics_hashing | multi_property_hash | -48.030% | -48.732% | -47.374% |
| property_semantics_hashing | float_property_hash | -8.910% | -10.501% | -7.306% |
| sample_entity_multi_property_indexed | 10000 | -57.980% | -58.309% | -57.649% |
| sample_entity_multi_property_indexed | 100000 | -58.471% | -58.770% | -58.175% |
| sample_entity_multi_property_indexed | 1000 | -56.881% | -57.276% | -56.457% |
| indexing | query_people_count_multiple_individually_indexed_properties_enti | -1.228% | -1.290% | -1.171% |
| indexing | query_people_count_single_indexed_property_entities | -89.242% | -89.261% | -89.224% |
| indexing | query_people_count_indexed_multi-property_entities | -56.706% | -57.692% | -55.912% |
| indexing | with_query_results_multiple_individually_indexed_properties_enti | -58.218% | -58.300% | -58.143% |
| indexing | with_query_results_indexed_multi-property_entities | -55.982% | -56.194% | -55.795% |
| indexing | query_people_multiple_individually_indexed_properties_entities | -7.542% | -7.722% | -7.293% |
| indexing | with_query_results_single_indexed_property_entities | -88.365% | -88.414% | -88.333% |
| indexing | query_people_indexed_multi-property_entities | -5.076% | -6.663% | -3.539% |
| counts | multi_property_indexed_entities | -49.622% | -49.744% | -49.492% |
| counts | index_after_adding_entities | -57.814% | -57.893% | -57.733% |
| counts | single_property_indexed_entities | -89.710% | -89.731% | -89.689% |
| counts | reindex_after_adding_more_entities | -25.211% | -25.630% | -24.694% |
| counts | multi_property_unindexed_entities | -21.721% | -22.141% | -21.227% |

Unchanged

| Group | Bench | Change | CI Lower | CI Upper |
|---|---|---|---|---|
| large_dataset | bench_filter_unindexed_entity | 0.515% | -0.750% | 1.518% |
| sample_entity_whole_population | 100000 | -1.610% | -2.682% | -0.487% |
| sample_entity_whole_population | 1000 | 1.058% | -0.092% | 2.508% |
| algorithm_benches | algorithm_sampling_single_known_length | 0.006% | -0.082% | 0.106% |
| algorithm_benches | algorithm_sampling_single_l_reservoir | -0.053% | -0.163% | 0.048% |
| sampling | sampling_single_unindexed_entities | -0.531% | -0.583% | -0.478% |
| sampling | sampling_multiple_unindexed_entities | 0.115% | 0.077% | 0.157% |
| examples | example-basic-infection | -1.336% | -1.740% | -0.912% |
| property_semantics_hashing | raw_bincode_hash_serialized_scalar | 1.178% | 0.529% | 1.833% |
| indexing | query_people_single_indexed_property_entities | -0.516% | -0.573% | -0.459% |

@RobertJacobsonCDC RobertJacobsonCDC linked an issue Apr 17, 2026 that may be closed by this pull request
Comment thread examples/basic-infection/src/people.rs Outdated
@RobertJacobsonCDC RobertJacobsonCDC force-pushed the RobertJacobsonCDC_814_hash_for_property branch from 02f305a to 2f8b226 Compare April 17, 2026 22:22
@RobertJacobsonCDC RobertJacobsonCDC marked this pull request as ready for review April 20, 2026 17:00
@RobertJacobsonCDC RobertJacobsonCDC linked an issue Apr 20, 2026 that may be closed by this pull request
@RobertJacobsonCDC (Collaborator, Author) commented Apr 20, 2026

Added to To-Do list:

  • Derive serde::Serialize for all properties
  • Update docs (The Ixa Book) to cover f64 in properties, impl_eq_hash = ... parameter in define_*_property. Let's defer this to a follow-up PR for ease of review. I'd like to add a whole chapter on properties.

@RobertJacobsonCDC RobertJacobsonCDC force-pushed the RobertJacobsonCDC_814_hash_for_property branch from 2f8b226 to 5644270 Compare April 22, 2026 16:59
github-actions Bot added a commit that referenced this pull request Apr 22, 2026
@RobertJacobsonCDC RobertJacobsonCDC force-pushed the RobertJacobsonCDC_814_hash_for_property branch from 5644270 to 2f0c638 Compare April 22, 2026 17:18
@CDCgov CDCgov deleted a comment from github-actions Bot Apr 22, 2026
@github-actions

This comment was marked as duplicate.

Development

Successfully merging this pull request may close these issues:

  • Remove Query::get_query
  • Research Hash implementation for Property types

2 participants