add cuckoo filter for redis cache #2261

AlexStocks · 2024-01-03T07:08:17Z

Which PikiwiDB functionalities are relevant/related to the feature request?

No response

Description

from https://cloud.tencent.com/developer/article/1815554

Proposed solution

add cuckoo filter for redis cache

Alternatives considered

add cuckoo filter for redis cache

infdahai · 2024-01-03T08:43:10Z

@AlexStocks I would like to take this assignment.

AlexStocks · 2024-01-03T10:24:56Z

@AlexStocks I would like to take this assignment.

Great. Thanks for you. Two suggestions:
1 add cuckoo filter in https://github.com/pikiwidb/rediscache
2 add cuckoo filter in pika
3 if len(key) > 64 { key := key[:32] + "@" + key[:32] }; balance between memory and cache quickly

Maybe you can give us your plan firstly.

578223592 · 2024-02-24T13:00:19Z

@AlexStocks 我的计划如下，请帮忙指点下：

使用`cuckoo filter`的意义：

Redis缓存并不能全量存储数据，因此Redis缓存中没有命中的时候容易出现缓存相关的问题，如：缓存穿透等，添加一个filter可以解决。
相比于bloom filter，cuckoo filter可以支持删除元素。

本项目中`cuckoo filter`的设计思考：

cuckoo filter中怎么支持操作：元素过期删除。

主动删除：
1. 装饰“key”加上过期时间的标志，再启动一个线程或者每次查询之前，开启扫描任务，扫描到过期的再删除。
  
  这样的操作会带来额外空间消耗（存储过期时间），而且耗时（扫描key）
2. 启动定时器线程，定时器时间到的时候删除对应key。
  
  设计相对复杂，而且需要额外添加定时器线程（复用pika中的定时器任务线程），且多线程的引入要额外考虑锁的设计。
  
  无论如何，设计起来都比较复杂，而且cuckoo filter中存放的是“fprint”，改动还是很大的。
惰性删除：即过期不处理，等到在底层读取的时候发现该key不存在再在cuckoo filter中“删除该key”
1. 好处：设计简单
2. 坏处：相当于惰性删除，可能会导致cuckoo filter中的元素比rocksdb数据库中key元素更多。
主动删除和惰性删除结合：
上面两种方案的结合，即定期去扫描，每次扫描一部分数据这样，like redis的过期键值处理。

cuckoo filter中的动态扩容

调研了https://github.com/efficient/cuckoofilter 等常见的仓库，发现目前并不支持动态扩容，目前想到两个方案。

固定大小，预设一个很大的值

对于一个key来说，平均每个key的占用可能就是12bit，那么对于10w级别的数据量，总的存储也就是10000*12bit约为15Mb左右，完全是一个可以接受的数据量。

增加动态扩容功能

也许可以像bloom filter的扩展一样，扩展成多层的cuckoo filter来实现动态扩容的功能。
如果选用这个方案，待设计。

AlexStocks · 2024-02-25T02:13:34Z

@AlexStocks 我的计划如下，请帮忙指点下：

使用 cuckoo filter 的意义：

Redis 缓存并不能全量存储数据，因此 Redis 缓存中没有命中的时候容易出现缓存相关的问题，如：缓存穿透等，添加一个 filter 可以解决。

相比于 bloom filter，cuckoo filter 可以支持删除元素。

本项目中 cuckoo filter 的设计思考：

cuckoo filter 中怎么支持操作：元素过期删除。

主动删除：

装饰 “key” 加上过期时间的标志，再启动一个线程或者每次查询之前，开启扫描任务，扫描到过期的再删除。
这样的操作会带来额外空间消耗（存储过期时间），而且耗时（扫描 key）

启动定时器线程，定时器时间到的时候删除对应 key。
设计相对复杂，而且需要额外添加定时器线程（复用 pika 中的定时器任务线程），且多线程的引入要额外考虑锁的设计。

无论如何，设计起来都比较复杂，而且 cuckoo filter 中存放的是 “fprint”，改动还是很大的。

惰性删除：即过期不处理，等到在底层读取的时候发现该 key 不存在再在 cuckoo filter 中 “删除该 key”

好处：设计简单

坏处：相当于惰性删除，可能会导致 cuckoo filter 中的元素比 rocksdb 数据库中 key 元素更多。

主动删除和惰性删除结合：
上面两种方案的结合，即定期去扫描，每次扫描一部分数据这样，like redis 的过期键值处理。

cuckoo filter 中的动态扩容

调研了 https://github.com/efficient/cuckoofilter 等常见的仓库，发现目前并不支持动态扩容，目前想到两个方案。

固定大小，预设一个很大的值

对于一个 key 来说，平均每个 key 的占用可能就是 12bit，那么对于 10w 级别的数据量，总的存储也就是 10000*12bit 约为 15Mb 左右，完全是一个可以接受的数据量。

增加动态扩容功能

也许可以像 bloom filter 的扩展一样，扩展成多层的 cuckoo filter 来实现动态扩容的功能。如果选用这个方案，待设计。

1 filter 总大小我建议设定一个固定值，没必要考虑扩容；
2 key 的总长度可以设定为 16bit，如果某些 key 超过 16bit，则截取前 16bit，无论读还是写都这样判断；

578223592 · 2024-02-25T04:22:16Z

经过wechat沟通，考虑到cuckoo filter中的hash可能比较耗时，测试了一下hash运算的时间。

结果：
开启O3优化之后
一次substr的时间为：2纳秒
hash时间大概为：4纳秒

目前pika极限为600000qps（from readme.md），平均命令时间大概为1500纳秒。

时间程序逻辑如下：

// hash 和 截取长度的速度测试
  CuckooFilter<std::string, 12> filter2(total_items);
  size_t yyy = 0;
  std::string y = "12345678901234567890";
  size_t index;uint32_t tag;


  int i = 0;
  // 开始计时
  auto start = std::chrono::high_resolution_clock::now();


  for(;i<5000;++i) {
    // cuckoofilter::HashUtil::MurmurHash(y);
    // cuckoofilter::HashUtil::SuperFastHash(y);
    std::string tmp = y.substr(0,12);
  }


  // 结束计时
  auto end = std::chrono::high_resolution_clock::now();

  // 计算耗时并输出结果
  auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
  std::cout << "Function execution time: " << duration.count()/ (i+1)<< " nanoseconds." << std::endl;

经过讨论，最后选择使用hash

luky116 · 2024-08-09T13:24:20Z

不需要支持这个功能

Issues-translate-bot · 2024-08-09T13:24:32Z

Bot detected the issue body's language is not English, translate it automatically.

No need to support this feature

AlexStocks added the ✏️ Feature New feature or request label Jan 3, 2024

OpenAtomFoundation deleted a comment from infdahai Feb 25, 2024

AlexStocks mentioned this issue Feb 26, 2024

feat：add cuckoo filter pikiwidb/rediscache#6

Open

luky116 closed this as completed Aug 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add cuckoo filter for redis cache #2261

add cuckoo filter for redis cache #2261

AlexStocks commented Jan 3, 2024 •

edited

Loading

infdahai commented Jan 3, 2024

AlexStocks commented Jan 3, 2024 •

edited

Loading

578223592 commented Feb 24, 2024

AlexStocks commented Feb 25, 2024

使用 `cuckoo filter` 的意义：

本项目中 `cuckoo filter` 的设计思考：

cuckoo filter 中怎么支持操作：元素过期删除。

cuckoo filter 中的动态扩容

固定大小，预设一个很大的值

增加动态扩容功能

578223592 commented Feb 25, 2024 •

edited

Loading

luky116 commented Aug 9, 2024

Issues-translate-bot commented Aug 9, 2024

add cuckoo filter for redis cache #2261

add cuckoo filter for redis cache #2261

Comments

AlexStocks commented Jan 3, 2024 • edited Loading

Which PikiwiDB functionalities are relevant/related to the feature request?

Description

Proposed solution

Alternatives considered

infdahai commented Jan 3, 2024

AlexStocks commented Jan 3, 2024 • edited Loading

578223592 commented Feb 24, 2024

使用cuckoo filter的意义：

本项目中cuckoo filter的设计思考：

cuckoo filter中怎么支持操作：元素过期删除。

cuckoo filter中的动态扩容

固定大小，预设一个很大的值

增加动态扩容功能

AlexStocks commented Feb 25, 2024

使用 cuckoo filter 的意义：

本项目中 cuckoo filter 的设计思考：

cuckoo filter 中怎么支持操作：元素过期删除。

cuckoo filter 中的动态扩容

固定大小，预设一个很大的值

增加动态扩容功能

578223592 commented Feb 25, 2024 • edited Loading

luky116 commented Aug 9, 2024

Issues-translate-bot commented Aug 9, 2024

AlexStocks commented Jan 3, 2024 •

edited

Loading

AlexStocks commented Jan 3, 2024 •

edited

Loading

使用`cuckoo filter`的意义：

本项目中`cuckoo filter`的设计思考：

使用 `cuckoo filter` 的意义：

本项目中 `cuckoo filter` 的设计思考：

578223592 commented Feb 25, 2024 •

edited

Loading