Skip to content

b-vitamins/fastset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

fastset

Crates.io docs.rs License GitHub Workflow Status

Fast set implementation for dense, bounded integer collections, offering quick updates and random access.

Features

  • Tailored for unsigned integer (usize) elements, ideal for index-based applications
  • Fast insertion, removal, and membership check
  • random method for uniform random sampling
  • Paging mechanism to somewhat mitigate the large memory footprint1

Note that while paging improves the existing memory footprint, fastset::Set is still not a good solution for memory constrained applications or for applications with storage need for sparse elements spread over an extended range. For integers twice as sparse as the page size, the fastset::Set with paging has peak heap allocation ~ 8x that of std::collections::HashSet.

Benchmarks

Operation fastset::Set hashbrown::HashSet std::collections::HashSet
insert 1.1632 ns 4.7105 ns 14.136 ns
remove 1.1647 ns 3.0459 ns 10.625 ns
contains 932.81 ps 985.38 ps 13.827 ns
random 651.26 ps N/A N/A
  • CPU: AMD Ryzen™ 5 5600G with Radeon™ Graphics x 12
  • RAM: 58.8 GiB
  • OS: Guix System, 64-bit

Usage

use fastset::{set, Set};
use nanorand::WyRand;

   let mut set = set![5, 10, 15, 20, 25, 30]; // Initialize set with elements
   assert!(set.contains(&5)); // Check for element presence

   set.insert(35); // Insert a new element
   assert!(set.contains(&35));

   set.remove(&5); // Remove an element
   assert!(!set.contains(&5));

   if let Some(taken) = set.take(&10) { // Remove and return an element
       assert_eq!(taken, 10);
   }

   let mut rng = WyRand::new();
   if let Some(element) = set.random(&mut rng) { // Get a random element
       set.remove(&element); // Remove the randomly selected element
       assert!(!set.contains(&element));
   }

   println!("Set: {:?}, Length: {}", set, set.len()); // Display the set and its length

Delphic Sets

fastset::Set, as implemented here, meets the conditions for being a Delphic set [1], [2]:

Let Ω be a discrete universe. A set (S ⊆ Ω) is considered a member of a Delphic family if it supports the following operations within O(log |Ω|) time:

  • Membership: Verify if any element (x ∈ Ω) exists within (S).
  • Cardinality: Determine the size of (S), i.e., (|S|).
  • Sampling: Draw a uniform random sample from (S).

A unit test in src/set.rs verifies the uniform sampling property with a basic Chi-squared test.

use fastset::Set;
use nanorand::WyRand;
use statrs::distribution::{ChiSquared, ContinuousCDF};

fn sampling_is_uniformly_at_random() {
    const SAMPLES: usize = 1_000_000;
    const EDGE_OF_THE_UNIVERSE: usize = 10000;

    let elements = (1..=EDGE_OF_THE_UNIVERSE).collect::<Vec<_>>();
    let set = Set::from(elements.clone());
    let mut rng = WyRand::new_seed(42u64);
    let mut counts = vec![0f64; elements.len()];

(0..SAMPLES).for_each(|_| {
   if let Some(value) = set.random(&mut rng) {
       counts[value - 1] += 1.0;
   }
});

    let e = SAMPLES as f64 / elements.len() as f64;
    let statistic: f64 = counts.iter().map(|&o| { (o - e) * (o - e) / e }).sum();

    let dof = elements.len() - 1;
    let chi = ChiSquared::new(dof as f64).unwrap();
    let acceptable = chi.inverse_cdf(0.99);

    // Null hypothesis: Elements are sampled uniformly at random
    assert!(
        statistic < acceptable,
        "Chi-square statistic {} is greater than what's acceptable ({})",
        statistic,
        acceptable,
    );
}

References

[1]: Chakraborty, Sourav, N. V. Vinodchandran, and Kuldeep S. Meel. "Distinct Elements in Streams: An Algorithm for the (Text) Book." arXiv preprint arXiv:2301.10191 (2023).

[2]: Meel, Kuldeep S., Sourav Chakraborty, and N. V. Vinodchandran. "Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size." Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 2022.

Footnotes

  1. A paging mechanism is introduced in 0.4.0 that reduces the memory-footprint of fastset::Set. With the paging feature, fastset::Set achieves ~ 50% reduction in peak heap memory allocations with no additional performance overhead.

About

Fast set implementation for dense, bounded integer collections. Provides quick updates and random access.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages