Skip to content
/ DCF Public
forked from CGCL-codes/DCF

Dynamic Cuckoo Filter (DCF) is succinct data structure of approximate set representing and membership testing for large-scale dynamic data sets. DCF supports item insertion/deletion/query, and can flexibly adjust its capacity. A DCF reduces the memory space of the state-of-the-art Dynamic Bloom Filter significantly by 75% as well as greatly impr…

License

Notifications You must be signed in to change notification settings

GerHobbelt/DCF

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

73 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The Dynamic Cuckoo Filter

Overview

The Dynamic Cuckoo Filter (DCF) is an efficient approximate membership test data structure. Different from the classic Bloom filter and its variants, DCF is especially designed for highly dynamic dataset and support extending and reducing its capacity. The advantages of DCF are as follows

  • The DCF design is the first to achieve both reliable item deletion and flexibly extending/reducing for approximate set representation and membership testing
  • DCF outperforms the state-of-the-art DBF design in terms of the capability of reliable item deletion
  • A DCF reduces the required memory space of the DBF by 75% as well as improving the speeds of insert/query/delete operation by 30% to 80%.

Structure of DCF

DCF example

A DCF leverages CF as building block and consists of a number of s linked homogeneous CFs. The DCF leverages fingerprints to represent items. Each fingerprints owns two candidate bucket addresses generated by partial-key cuckoo hashing. A DCF achieves dynamic capacity by appending new building blocks and removing empty building blocks generated by compact operation.

API

Generate DCF according to the expected maximum item number, expected false positive rate.

DynamicCuckooFilter* dcf = new DynamicCuckooFilter(config.item_num, config.exp_FPR);

The expected building block number of DCF is set to 6 by default and can also be modified as follow

DynamicCuckooFilter* dcf = new DynamicCuckooFilter(config.item_num, config.exp_FPR, config.exp_BBN);

Four operations of DCF: Insert, Query, Delete and Compact

dcf->insertItem(item) // insert item ,item is a char* format
dcf->queryItem(item)  //query item
dcf->deleteItem(item) //delete item
dcf->compact() //compact DCF

How to use?

Environment

We implement DCF in a Linux (Ubuntu 14.04.3 LTS) with an Intel(R) Core(TM) i5-2430M CPU @2.4GHz and OpenSSL library environment.

Install OpenSSL (Please refer to https://www.openssl.org to learn more).

sudo apt-get install openssl
sudo apt-get install libssl-dev

Build and run the example

cd src/
make test
./test

Configurations

Configurations including false pisitive, item number and dataset path can be costomized in "configuration/config.txt".

false positive = 0.02
item number = 1000000
input file path = input/input2.txt

Results

Results are shown in "output/results.txt", including false positive, fingerprint size, building block number, operation time consumed and etc. In the following is the comparison of DCF and DBF when dealing with 1,000,000 items (including insert/query/delete/compact operation).

Metrics:

item_num: total number inserted/queried/deleted

exp_FPR: the expected false positive rate

actual_FPR: the false positive rate that we measured

actual_BBN: the building block number that we observed

F_size: fingerprint size

space_cost: space overhead of data structure

I_time: insert time

Q_time: query time

D_time: delete time

C_rate: compact rate

///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// item_num   exp_FPR    actual_FPR    actual_BBN  F_size(bits)  space_cost(MB)   I_time(s)  Q_time(s)  D_time(s)  C_rate  ///
/// 1000000     0.0005      0.000502    5           16            2.5              1.08       1.05       1.36       1       ///
/// 1000000     0.0005      0.000505    7           0             10.8754          1.69       1.92       3.35       0       ///
///                                                               4X               1.5X       1.8X       2.4X               ///
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

After transform operation time to speed, the DCF improving the speeds of insert/query/delete by 30% to 80% and 4X space efficiency.

Author and Copyright

DCF is developed in Big Data Technology and System Lab, Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China by Liangyi Liao (liaoliangyi@hust.edu.cn), Hanhua Chen (chen@hust.edu.cn), Hai Jin (hjin@hust.edu.cn)

Copyright (C) 2017, STCS & CGCL and Huazhong University of Science and Technology.

Publication

If you want to know more detailed information, please refer to this paper:

Hanhua Chen, Liangyi Liao, Hai Jin, Jie Wu, "The Dynamic Cuckoo Filter," Proceedings of the 25th IEEE International Conference on Network Protocols (ICNP 2017), Toronto, Canada, Oct. 10-13, 2017.

About

Dynamic Cuckoo Filter (DCF) is succinct data structure of approximate set representing and membership testing for large-scale dynamic data sets. DCF supports item insertion/deletion/query, and can flexibly adjust its capacity. A DCF reduces the memory space of the state-of-the-art Dynamic Bloom Filter significantly by 75% as well as greatly impr…

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 94.8%
  • C 3.6%
  • Makefile 1.6%