
feat: SourceSet::PopulationRange subsumes Empty, Entity, and WholePopulation variants#854

Merged
RobertJacobsonCDC merged 1 commit into main from RobertJacobsonCDC_range_entity_set
Apr 28, 2026

@RobertJacobsonCDC (Collaborator) commented Apr 23, 2026

Added SourceSet::PopulationRange Variant

This PR consolidates contiguous entity-ID sources around PopulationRange and carries that representation through the entity_set stack. SourceSet no longer has separate empty, whole-population, or singleton variants; those cases are now expressed canonically as ranges, with helper constructors for empty, singleton, and general population ranges. The iterator side was simplified in the same direction: SourceIterator now relies on the range-backed path for empty and whole-population iteration, and EntitySetIteratorInner no longer has its own dedicated empty variant.
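To make the canonical forms concrete, here is a minimal Rust sketch, assuming a simplified `SourceSet` that models only the `PopulationRange` variant as a half-open `Range<usize>`. The `EntityId` newtype and the constructor names `empty`, `singleton`, and `population_range` are hypothetical stand-ins for the helper constructors described above, not the crate's actual API. It shows how empty, singleton, and whole-population sets share one representation and one iteration path.

```rust
use std::ops::Range;

/// Hypothetical entity ID newtype (the real crate's type may differ).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EntityId(pub usize);

/// Simplified sketch: only the range-backed variant is modeled here.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum SourceSet {
    /// Half-open range `start..end` of entity indices.
    PopulationRange(Range<usize>),
}

impl SourceSet {
    /// Canonical empty set: an empty range (assumed to be `0..0`).
    pub fn empty() -> Self {
        SourceSet::PopulationRange(0..0)
    }

    /// A single entity expressed as a one-element range.
    pub fn singleton(id: EntityId) -> Self {
        SourceSet::PopulationRange(id.0..id.0 + 1)
    }

    /// A general contiguous range; the whole population is `0..population_size`.
    pub fn population_range(range: Range<usize>) -> Self {
        SourceSet::PopulationRange(range)
    }

    /// Number of entities in the set.
    pub fn len(&self) -> usize {
        match self {
            SourceSet::PopulationRange(r) => r.len(),
        }
    }

    /// Membership is a simple bounds check.
    pub fn contains(&self, id: EntityId) -> bool {
        match self {
            SourceSet::PopulationRange(r) => r.contains(&id.0),
        }
    }

    /// One iteration path serves empty, singleton, and whole-population sets.
    pub fn iter(&self) -> impl Iterator<Item = EntityId> {
        match self {
            SourceSet::PopulationRange(r) => r.clone().map(EntityId),
        }
    }
}
```

Under this model a whole population of `n` entities is just `population_range(0..n)`, so no dedicated variant (and no dedicated empty iterator) is needed.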

On top of that representation cleanup, EntitySet now has basic range-aware set algebra. Unions of overlapping or adjacent ranges collapse to a single range, intersections of ranges collapse to their overlap, and range differences simplify whenever the result is empty or a single contiguous range; differences that would split a range into two pieces continue to use the generic expression path. The work also removed the older helper-driven empty/universal/singleton reductions that depended on the deleted variants and replaced them with a single as_range() helper that is used only by the new range optimizations.
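The range optimizations can be illustrated with plain half-open ranges. This is a sketch under stated assumptions, not the PR's implementation: `RangeOp`, `range_union`, `range_intersection`, and `range_difference` are hypothetical names, `RangeOp::Generic` stands in for falling back to the generic expression path, and empty results are normalized to `0..0` as an assumed canonical empty range.

```rust
use std::ops::Range;

/// Outcome of combining two ranges: a single contiguous range, or a signal
/// to fall back to the generic (non-range) expression path.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum RangeOp {
    Range(Range<usize>),
    Generic,
}

/// Normalize every empty range to `0..0` (assumed canonical empty form).
fn canonical(r: Range<usize>) -> Range<usize> {
    if r.start >= r.end { 0..0 } else { r }
}

/// Union collapses to one range when the inputs overlap or are adjacent.
pub fn range_union(a: &Range<usize>, b: &Range<usize>) -> RangeOp {
    if a.is_empty() {
        return RangeOp::Range(canonical(b.clone()));
    }
    if b.is_empty() {
        return RangeOp::Range(a.clone());
    }
    // Overlapping or touching (e.g. 0..5 and 5..10): neither lies strictly past the other.
    if a.start <= b.end && b.start <= a.end {
        RangeOp::Range(a.start.min(b.start)..a.end.max(b.end))
    } else {
        RangeOp::Generic // a gap between them: not one contiguous range
    }
}

/// Intersection is always a single (possibly empty) range: the overlap.
pub fn range_intersection(a: &Range<usize>, b: &Range<usize>) -> RangeOp {
    RangeOp::Range(canonical(a.start.max(b.start)..a.end.min(b.end)))
}

/// Difference `a \ b` stays a range unless `b` punches a hole in the middle of `a`.
pub fn range_difference(a: &Range<usize>, b: &Range<usize>) -> RangeOp {
    if a.is_empty() {
        return RangeOp::Range(0..0);
    }
    // Disjoint (or b empty): nothing is removed from a.
    if b.is_empty() || b.end <= a.start || a.end <= b.start {
        return RangeOp::Range(a.clone());
    }
    let left = a.start..b.start.max(a.start); // part of a before b
    let right = b.end.min(a.end)..a.end; // part of a after b
    match (left.is_empty(), right.is_empty()) {
        (true, true) => RangeOp::Range(0..0),
        (false, true) => RangeOp::Range(left),
        (true, false) => RangeOp::Range(right),
        (false, false) => RangeOp::Generic, // split into two pieces
    }
}
```

For example, `0..10` minus `5..20` simplifies to `0..5`, while `0..10` minus `3..6` leaves two pieces and therefore returns `Generic`.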

The branch also updates downstream query construction and tests to match the new canonical form. Empty queries and full-population query results now construct range-backed sources, docs/comments in entity_set were refreshed to describe the new model, and the entity_set/iterator test coverage was rewritten and expanded to cover canonical empty/singleton/full-population ranges, optimized range operations, and the remaining non-optimized split-difference behavior.

This PR replaces #826.

@RobertJacobsonCDC linked an issue Apr 23, 2026 that may be closed by this pull request
github-actions Bot added a commit that referenced this pull request Apr 23, 2026
@RobertJacobsonCDC force-pushed the RobertJacobsonCDC_range_entity_set branch from b3bb846 to 4e1b971 April 23, 2026 22:00
@github-actions

Benchmark Results

Hyperfine

Command Mean [ms] Min [ms] Max [ms] Relative
large_sir::baseline 3.0 ± 0.0 3.0 3.1 1.00
large_sir::entities 12.2 ± 0.3 11.8 13.9 4.00 ± 0.11

Criterion

Regressions (slower)
Group Bench Param Change CI Lower CI Upper
sample_entity sample_entity_single_property_unindexed 1000 18.752% 17.767% 19.748%
sample_entity sample_entity_single_property_unindexed 10000 8.144% 7.424% 8.940%
algorithm_benches algorithm_sampling_multiple_known_length 4.507% 4.102% 4.995%
algorithm_benches algorithm_sampling_multiple_l_reservoir 4.038% 3.313% 4.913%
examples example-births-deaths 2.647% 2.311% 3.162%
large_dataset bench_query_population_multi_indexed_entities 2.371% 1.811% 2.921%
large_dataset bench_query_population_derived_property_entities 1.961% 1.627% 2.390%
sampling sampling_single_unindexed_entities 1.910% 1.659% 2.202%
examples example-basic-infection 1.733% 1.061% 2.282%
sampling sampling_multiple_known_length_entities 1.552% 1.148% 1.894%
counts single_property_indexed_entities 1.463% 1.050% 1.864%
Improvements (faster)
Group Bench Param Change CI Lower CI Upper
sample_entity sample_entity_whole_population 100000 -42.560% -43.097% -41.984%
sample_entity sample_entity_whole_population 10000 -40.817% -41.313% -40.320%
sample_entity sample_entity_whole_population 1000 -28.117% -28.551% -27.697%
indexing query_people_single_indexed_property_entities -14.159% -15.780% -12.423%
sample_entity sample_entity_multi_property_indexed 1000 -10.238% -10.514% -9.878%
sample_entity sample_entity_multi_property_indexed 10000 -9.778% -10.197% -9.491%
sample_entity sample_entity_multi_property_indexed 100000 -9.541% -9.816% -9.127%
indexing query_people_indexed_multi-property_entities -9.540% -9.833% -9.259%
large_dataset bench_filter_unindexed_entity -8.657% -11.444% -5.947%
sample_entity sample_entity_single_property_indexed 1000 -6.536% -6.982% -6.203%
sampling sampling_multiple_l_reservoir_entities -6.476% -6.859% -6.207%
sample_entity sample_entity_single_property_indexed 100000 -5.635% -6.136% -5.146%
sampling sampling_single_l_reservoir_entities -5.614% -6.043% -5.179%
sample_entity sample_entity_single_property_indexed 10000 -5.595% -5.894% -5.176%
indexing query_people_count_multiple_individually_indexed_properties_enti -3.243% -3.684% -2.661%
sampling count_and_sampling_single_unindexed_concrete_plus_derived_entiti -2.332% -2.594% -2.085%
sampling sampling_single_unindexed_concrete_plus_derived_entities -2.177% -2.503% -1.857%
counts reindex_after_adding_more_entities -1.752% -2.012% -1.484%
sample_entity sample_entity_single_property_unindexed 100000 -1.632% -1.987% -1.256%
indexing query_people_count_single_indexed_property_entities -1.547% -1.776% -1.293%
Unchanged / inconclusive (CI crosses 0%)
Group Bench Param Change CI Lower CI Upper
indexing query_people_multiple_individually_indexed_properties_entities -1.223% -1.732% -0.793%
sampling sampling_single_known_length_entities -1.167% -1.737% -0.568%
counts multi_property_indexed_entities -0.993% -1.234% -0.805%
indexing with_query_results_single_indexed_property_entities -0.987% -1.357% -0.642%
sampling count_and_sampling_single_known_length_entities -0.898% -1.726% -0.015%
indexing query_people_count_indexed_multi-property_entities -0.768% -1.052% -0.488%
counts index_after_adding_entities 0.761% 0.413% 1.023%
algorithm_benches algorithm_sampling_single_known_length 0.731% 0.435% 1.151%
counts multi_property_unindexed_entities -0.676% -0.879% -0.483%
indexing with_query_results_multiple_individually_indexed_properties_enti 0.511% 0.243% 0.867%
large_dataset bench_query_population_property_entities -0.483% -1.242% 0.202%
large_dataset bench_query_population_multi_unindexed_entities 0.357% -0.188% 0.728%
large_dataset bench_query_population_indexed_property_entities -0.344% -0.682% 0.115%
sampling sampling_multiple_unindexed_entities -0.273% -0.684% 0.122%
large_dataset bench_filter_indexed_entity 0.249% -7.399% 8.230%
counts concrete_plus_derived_unindexed_entities 0.220% -0.377% 0.855%
algorithm_benches algorithm_sampling_single_rand_reservoir -0.176% -0.490% 0.071%
large_dataset bench_match_entity -0.158% -0.726% 0.330%
indexing with_query_results_indexed_multi-property_entities -0.150% -0.483% 0.187%
counts single_property_unindexed_entities -0.044% -0.566% 0.409%
algorithm_benches algorithm_sampling_single_l_reservoir 0.037% -0.119% 0.206%

github-actions Bot added a commit that referenced this pull request Apr 23, 2026
@RobertJacobsonCDC force-pushed the RobertJacobsonCDC_range_entity_set branch from 4e1b971 to 6183d8e April 23, 2026 22:26
@CDCgov deleted a comment from github-actions bot Apr 23, 2026
@github-actions

Benchmark Results

Hyperfine

Command Mean [ms] Min [ms] Max [ms] Relative
large_sir::baseline 2.9 ± 0.0 2.8 3.0 1.00
large_sir::entities 12.6 ± 0.1 12.4 13.0 4.37 ± 0.08

Criterion

Regressions (slower)
Group Bench Param Change CI Lower CI Upper
examples example-basic-infection 16.239% 15.603% 16.852%
sample_entity sample_entity_whole_population 1000 14.457% 12.752% 16.142%
large_dataset bench_filter_indexed_entity 14.299% 4.598% 25.165%
indexing with_query_results_indexed_multi-property_entities 6.992% 6.015% 8.012%
large_dataset bench_match_entity 6.731% 6.302% 7.220%
indexing query_people_count_indexed_multi-property_entities 6.399% 6.135% 6.632%
algorithm_benches algorithm_sampling_multiple_known_length 6.397% 5.930% 6.925%
sampling count_and_sampling_single_known_length_entities 5.973% 4.142% 8.092%
large_dataset bench_query_population_derived_property_entities 3.487% 2.955% 4.163%
algorithm_benches algorithm_sampling_multiple_l_reservoir 2.361% 1.809% 2.882%
sampling sampling_multiple_known_length_entities 2.120% 1.708% 2.574%
sampling sampling_multiple_unindexed_entities 2.041% 1.669% 2.415%
large_dataset bench_query_population_indexed_property_entities 1.809% 1.283% 2.356%
sampling sampling_single_unindexed_entities 1.177% 1.109% 1.246%
Improvements (faster)
Group Bench Param Change CI Lower CI Upper
sample_entity sample_entity_whole_population 100000 -39.753% -40.323% -39.208%
sample_entity sample_entity_whole_population 10000 -38.858% -39.520% -38.102%
indexing with_query_results_single_indexed_property_entities -32.381% -32.516% -32.243%
sample_entity sample_entity_single_property_unindexed 100000 -22.316% -22.516% -22.130%
indexing query_people_single_indexed_property_entities -19.642% -19.776% -19.484%
indexing query_people_indexed_multi-property_entities -19.082% -19.399% -18.725%
sampling sampling_single_l_reservoir_entities -14.698% -18.061% -11.607%
indexing query_people_count_multiple_individually_indexed_properties_enti -5.808% -5.913% -5.709%
indexing query_people_multiple_individually_indexed_properties_entities -5.120% -5.410% -4.812%
indexing query_people_count_single_indexed_property_entities -4.566% -4.881% -4.109%
sample_entity sample_entity_single_property_unindexed 10000 -3.840% -4.241% -3.578%
sample_entity sample_entity_multi_property_indexed 1000 -3.806% -3.994% -3.639%
indexing with_query_results_multiple_individually_indexed_properties_enti -3.649% -3.883% -3.444%
counts single_property_indexed_entities -2.902% -3.320% -2.476%
sample_entity sample_entity_single_property_indexed 100000 -2.892% -3.314% -2.584%
sample_entity sample_entity_multi_property_indexed 10000 -2.624% -3.047% -2.265%
large_dataset bench_query_population_multi_unindexed_entities -2.413% -3.964% -1.060%
sample_entity sample_entity_multi_property_indexed 100000 -2.161% -2.730% -1.262%
sampling sampling_single_unindexed_concrete_plus_derived_entities -1.997% -2.264% -1.713%
sample_entity sample_entity_single_property_indexed 1000 -1.793% -2.395% -1.389%
sampling count_and_sampling_single_unindexed_concrete_plus_derived_entiti -1.538% -1.927% -1.193%
Unchanged / inconclusive (CI crosses 0%)
Group Bench Param Change CI Lower CI Upper
large_dataset bench_filter_unindexed_entity -2.169% -5.569% 1.080%
examples example-births-deaths 1.955% 0.472% 2.825%
sampling sampling_multiple_l_reservoir_entities -1.468% -2.294% -0.453%
large_dataset bench_query_population_multi_indexed_entities 1.422% 0.910% 1.851%
algorithm_benches algorithm_sampling_single_known_length -1.241% -2.133% -0.499%
counts index_after_adding_entities 0.837% 0.680% 0.988%
sampling sampling_single_known_length_entities 0.770% -0.143% 1.693%
counts multi_property_unindexed_entities 0.717% 0.399% 1.087%
algorithm_benches algorithm_sampling_single_l_reservoir -0.706% -1.023% -0.493%
counts concrete_plus_derived_unindexed_entities -0.440% -0.851% -0.088%
sample_entity sample_entity_single_property_unindexed 1000 -0.412% -0.825% -0.040%
algorithm_benches algorithm_sampling_single_rand_reservoir -0.283% -0.686% 0.146%
sample_entity sample_entity_single_property_indexed 10000 -0.281% -1.272% 0.804%
counts single_property_unindexed_entities -0.163% -0.640% 0.180%
large_dataset bench_query_population_property_entities 0.151% -0.539% 1.060%
counts reindex_after_adding_more_entities -0.109% -0.394% 0.167%
counts multi_property_indexed_entities 0.005% -0.214% 0.217%

github-actions Bot added a commit that referenced this pull request Apr 23, 2026
@RobertJacobsonCDC (Collaborator, Author)

Trivial optimizations are currently implemented only when both operands are SourceSet::PopulationRange, but it would also make sense to include trivial optimizations for mixed cases, for example an arbitrary set A combined with the empty set.
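The mixed-operand case suggested above could look roughly like the following. It is a hypothetical sketch: `SetExpr`, `Opaque`, and the `make_*` constructors are invented for illustration and do not come from the PR. Each constructor applies the identities A ∪ ∅ = A, A ∩ ∅ = ∅, A \ ∅ = A, and ∅ \ A = ∅ before building a generic expression node, so the shortcut works whatever kind of set A is.

```rust
use std::ops::Range;

/// Hypothetical simplified set expression tree.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum SetExpr {
    /// A contiguous range of entity indices.
    Range(Range<usize>),
    /// Any non-range set (e.g. an index-backed source), named for illustration.
    Opaque(&'static str),
    Union(Box<SetExpr>, Box<SetExpr>),
    Intersection(Box<SetExpr>, Box<SetExpr>),
    Difference(Box<SetExpr>, Box<SetExpr>),
}

impl SetExpr {
    /// True only for an empty range; opaque sets are never assumed empty.
    fn is_empty_range(&self) -> bool {
        matches!(self, SetExpr::Range(r) if r.is_empty())
    }
}

/// Canonical empty set, assumed here to be the range `0..0`.
fn empty() -> SetExpr {
    SetExpr::Range(0..0)
}

/// A ∪ ∅ = A and ∅ ∪ A = A.
pub fn make_union(a: SetExpr, b: SetExpr) -> SetExpr {
    if a.is_empty_range() {
        b
    } else if b.is_empty_range() {
        a
    } else {
        SetExpr::Union(Box::new(a), Box::new(b))
    }
}

/// A ∩ ∅ = ∅ and ∅ ∩ A = ∅.
pub fn make_intersection(a: SetExpr, b: SetExpr) -> SetExpr {
    if a.is_empty_range() || b.is_empty_range() {
        empty()
    } else {
        SetExpr::Intersection(Box::new(a), Box::new(b))
    }
}

/// ∅ \ A = ∅ and A \ ∅ = A.
pub fn make_difference(a: SetExpr, b: SetExpr) -> SetExpr {
    if a.is_empty_range() {
        empty()
    } else if b.is_empty_range() {
        a
    } else {
        SetExpr::Difference(Box::new(a), Box::new(b))
    }
}
```

Checking for the empty operand at construction time means the generic path never iterates over an operand that cannot change the result.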

@cdc-as81 (Collaborator) left a comment

LGTM

@RobertJacobsonCDC force-pushed the RobertJacobsonCDC_range_entity_set branch from 6183d8e to 2bd5f82 April 28, 2026 17:38
@github-actions

Benchmark Results

Hyperfine

Command Mean [ms] Min [ms] Max [ms] Relative
large_sir::baseline 3.1 ± 0.1 3.0 3.2 1.00
large_sir::entities 12.4 ± 0.2 12.1 13.1 4.03 ± 0.10

Criterion

Regressions (slower)
Group Bench Param Change CI Lower CI Upper
large_dataset bench_query_population_multi_indexed_entities 75.055% 73.766% 76.424%
counts multi_property_indexed_entities 56.993% 56.110% 57.788%
indexing query_people_count_indexed_multi-property_entities 40.645% 40.121% 41.192%
indexing with_query_results_indexed_multi-property_entities 37.206% 36.314% 38.098%
sample_entity sample_entity_single_property_unindexed 10000 27.306% 26.861% 27.819%
sample_entity sample_entity_multi_property_indexed 1000 24.893% 24.270% 25.397%
sample_entity sample_entity_multi_property_indexed 10000 23.896% 22.044% 25.370%
sample_entity sample_entity_multi_property_indexed 100000 20.020% 16.481% 23.234%
sample_entity sample_entity_single_property_unindexed 1000 18.068% 15.813% 20.379%
sampling count_and_sampling_single_known_length_entities 7.216% 6.261% 8.267%
indexing query_people_single_indexed_property_entities 4.965% 4.234% 5.744%
algorithm_benches algorithm_sampling_multiple_l_reservoir 3.770% 3.166% 4.511%
sampling sampling_single_known_length_entities 3.616% 2.645% 4.609%
large_dataset bench_query_population_derived_property_entities 3.058% 2.642% 3.427%
examples example-basic-infection 2.982% 2.113% 3.631%
sampling sampling_multiple_known_length_entities 2.223% 1.868% 2.614%
Improvements (faster)
Group Bench Param Change CI Lower CI Upper
large_dataset bench_filter_unindexed_entity -58.032% -58.946% -57.086%
large_dataset bench_query_population_property_entities -23.614% -24.177% -23.028%
indexing query_people_indexed_multi-property_entities -19.987% -20.606% -19.539%
large_dataset bench_query_population_multi_unindexed_entities -17.379% -17.628% -16.968%
sample_entity sample_entity_whole_population 10000 -11.394% -11.910% -10.953%
sample_entity sample_entity_whole_population 100000 -8.589% -8.853% -8.198%
sample_entity sample_entity_whole_population 1000 -7.884% -8.430% -7.395%
indexing query_people_count_multiple_individually_indexed_properties_enti -6.272% -6.541% -5.947%
sampling sampling_multiple_l_reservoir_entities -5.981% -6.280% -5.701%
examples example-births-deaths -5.428% -5.779% -4.975%
large_dataset bench_query_population_indexed_property_entities -4.354% -4.768% -3.768%
indexing query_people_count_single_indexed_property_entities -4.284% -4.711% -3.936%
sample_entity sample_entity_single_property_indexed 1000 -4.199% -4.615% -3.912%
counts multi_property_unindexed_entities -4.185% -5.932% -2.551%
sample_entity sample_entity_single_property_indexed 10000 -4.000% -4.536% -3.572%
sampling sampling_single_l_reservoir_entities -3.811% -4.738% -2.914%
indexing with_query_results_single_indexed_property_entities -3.644% -4.039% -3.358%
sample_entity sample_entity_single_property_indexed 100000 -3.497% -4.113% -2.610%
sampling count_and_sampling_single_unindexed_concrete_plus_derived_entiti -3.175% -3.243% -3.096%
large_dataset bench_match_entity -3.066% -4.478% -2.216%
indexing query_people_multiple_individually_indexed_properties_entities -3.035% -3.126% -2.932%
sampling sampling_single_unindexed_concrete_plus_derived_entities -2.460% -2.596% -2.283%
counts reindex_after_adding_more_entities -1.569% -1.691% -1.445%
counts single_property_indexed_entities -1.456% -1.735% -1.073%
Unchanged / inconclusive (CI crosses 0%)
Group Bench Param Change CI Lower CI Upper
large_dataset bench_filter_indexed_entity 6.252% -1.010% 13.721%
sample_entity sample_entity_single_property_unindexed 100000 -0.975% -1.232% -0.638%
sampling sampling_multiple_unindexed_entities -0.955% -1.056% -0.811%
counts single_property_unindexed_entities -0.645% -1.580% 0.171%
algorithm_benches algorithm_sampling_multiple_known_length 0.524% 0.063% 0.992%
counts concrete_plus_derived_unindexed_entities -0.237% -0.765% 0.251%
algorithm_benches algorithm_sampling_single_l_reservoir 0.180% -0.367% 0.655%
counts index_after_adding_entities -0.108% -0.337% 0.074%
indexing with_query_results_multiple_individually_indexed_properties_enti -0.042% -0.227% 0.137%
algorithm_benches algorithm_sampling_single_known_length -0.042% -0.342% 0.190%
sampling sampling_single_unindexed_entities 0.033% -0.051% 0.118%
algorithm_benches algorithm_sampling_single_rand_reservoir -0.032% -0.284% 0.242%

github-actions Bot added a commit that referenced this pull request Apr 28, 2026
@RobertJacobsonCDC force-pushed the RobertJacobsonCDC_range_entity_set branch from 2bd5f82 to 3b16021 April 28, 2026 18:34
github-actions Bot added a commit that referenced this pull request Apr 28, 2026
@github-actions

Benchmark Results

Hyperfine

Command Mean [ms] Min [ms] Max [ms] Relative
large_sir::baseline 2.8 ± 0.1 2.7 3.2 1.00
large_sir::entities 6.7 ± 0.1 6.5 6.9 2.35 ± 0.08

Criterion

Regressions (slower)
Group Bench Param Change CI Lower CI Upper
large_dataset bench_match_entity 14.191% 13.612% 14.704%
large_dataset bench_query_population_indexed_property_entities 7.766% 7.577% 7.944%
large_dataset bench_query_population_multi_indexed_entities 5.954% 5.564% 6.365%
counts single_property_indexed_entities 4.778% 4.612% 4.938%
indexing query_people_count_single_indexed_property_entities 3.955% 3.795% 4.152%
sampling sampling_single_known_length_entities 3.130% 2.655% 3.605%
counts reindex_after_adding_more_entities 3.058% 2.785% 3.340%
sampling count_and_sampling_single_known_length_entities 3.000% 2.683% 3.276%
examples example-basic-infection 2.401% 1.769% 2.953%
indexing query_people_multiple_individually_indexed_properties_entities 2.169% 1.918% 2.405%
sample_entity sample_entity_whole_population 1000 2.102% 1.845% 2.331%
sample_entity sample_entity_whole_population 10000 1.847% 1.578% 2.110%
Improvements (faster)
Group Bench Param Change CI Lower CI Upper
sample_entity sample_entity_single_property_unindexed 10000 -44.222% -44.643% -43.883%
indexing query_people_single_indexed_property_entities -8.728% -8.839% -8.619%
sample_entity sample_entity_single_property_unindexed 1000 -8.504% -9.111% -7.887%
sample_entity sample_entity_single_property_indexed 100000 -8.098% -8.703% -7.543%
indexing with_query_results_indexed_multi-property_entities -6.660% -6.786% -6.525%
sample_entity sample_entity_whole_population 100000 -6.190% -7.148% -5.205%
sampling sampling_single_unindexed_concrete_plus_derived_entities -5.960% -6.174% -5.717%
indexing query_people_indexed_multi-property_entities -5.607% -6.421% -4.707%
sample_entity sample_entity_multi_property_indexed 10000 -5.513% -6.021% -5.133%
indexing with_query_results_single_indexed_property_entities -5.450% -5.608% -5.295%
counts multi_property_indexed_entities -5.164% -5.576% -4.742%
indexing query_people_count_indexed_multi-property_entities -4.779% -4.981% -4.596%
sample_entity sample_entity_single_property_indexed 10000 -4.021% -4.265% -3.834%
sample_entity sample_entity_single_property_indexed 1000 -3.547% -4.161% -2.731%
sampling count_and_sampling_single_unindexed_concrete_plus_derived_entiti -2.807% -3.084% -2.511%
large_dataset bench_query_population_derived_property_entities -2.795% -3.344% -2.213%
algorithm_benches algorithm_sampling_multiple_known_length -2.503% -2.838% -2.278%
sampling sampling_multiple_l_reservoir_entities -1.988% -2.318% -1.737%
indexing with_query_results_multiple_individually_indexed_properties_enti -1.911% -2.089% -1.759%
Unchanged / inconclusive (CI crosses 0%)
Group Bench Param Change CI Lower CI Upper
sampling sampling_single_l_reservoir_entities 1.868% 0.770% 2.703%
examples example-births-deaths 1.044% 0.759% 1.309%
sample_entity sample_entity_single_property_unindexed 100000 -0.957% -1.237% -0.683%
algorithm_benches algorithm_sampling_multiple_l_reservoir -0.955% -1.431% -0.424%
large_dataset bench_query_population_multi_unindexed_entities -0.935% -1.200% -0.690%
sample_entity sample_entity_multi_property_indexed 1000 -0.871% -1.071% -0.706%
large_dataset bench_filter_unindexed_entity -0.846% -5.509% 3.758%
algorithm_benches algorithm_sampling_single_l_reservoir -0.828% -0.953% -0.701%
counts multi_property_unindexed_entities 0.638% 0.349% 0.890%
counts index_after_adding_entities -0.478% -0.753% -0.269%
large_dataset bench_filter_indexed_entity 0.451% -12.253% 13.453%
indexing query_people_count_multiple_individually_indexed_properties_enti 0.319% 0.045% 0.567%
algorithm_benches algorithm_sampling_single_known_length -0.210% -0.474% 0.068%
counts concrete_plus_derived_unindexed_entities 0.163% -0.187% 0.473%
sampling sampling_single_unindexed_entities -0.132% -0.256% -0.037%
large_dataset bench_query_population_property_entities 0.108% -0.332% 0.786%
sampling sampling_multiple_known_length_entities -0.023% -0.782% 0.642%
algorithm_benches algorithm_sampling_single_rand_reservoir -0.023% -0.135% 0.076%
sampling sampling_multiple_unindexed_entities -0.021% -0.166% 0.148%
sample_entity sample_entity_multi_property_indexed 100000 -0.012% -0.363% 0.374%
counts single_property_unindexed_entities 0.008% -0.815% 0.922%

@RobertJacobsonCDC merged commit aa7b702 into main Apr 28, 2026
22 checks passed
@RobertJacobsonCDC deleted the RobertJacobsonCDC_range_entity_set branch April 28, 2026 18:44


Development

Successfully merging this pull request may close these issues.

Add a SourceSet variant for a range of EntityIds

3 participants