Serialize / Deserialize Xor8 type. #1

Open
1 of 4 tasks
prataprc opened this issue Dec 21, 2019 · 9 comments
Labels
help wanted Extra attention is needed

Comments

@prataprc
Owner

prataprc commented Dec 21, 2019

So that it can be persisted onto disk and retrieved later for membership checks.

  • Serialization for Xor8.
  • Serialization for Fuse8 and Fuse16.
  • Add finger-print checksum as part of header for Xor8, Fuse8 and Fuse16
  • Resolve update 1
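
The checksum item in the list above could be sketched as follows. This is only an illustration: the issue does not say which checksum is intended, so CRC-32 is an assumption, and the `seal`/`open` helpers and the 4-byte header layout are hypothetical names, not part of the crate.

```rust
/// Bitwise CRC-32 (IEEE polynomial in reflected form, 0xEDB88320).
/// Assumption: the issue does not name a checksum; CRC-32 is one common choice.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical layout: 4-byte little-endian checksum, then the fingerprints.
pub fn seal(finger_prints: &[u8]) -> Vec<u8> {
    let mut out = crc32(finger_prints).to_le_bytes().to_vec();
    out.extend_from_slice(finger_prints);
    out
}

/// Returns the fingerprint bytes only if the stored checksum matches.
pub fn open(buf: &[u8]) -> Option<&[u8]> {
    if buf.len() < 4 {
        return None;
    }
    let (head, body) = buf.split_at(4);
    let stored = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    if stored == crc32(body) {
        Some(body)
    } else {
        None
    }
}
```

A reader would call `open` on the bytes loaded from disk and refuse to build the filter when it returns `None`.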

Update 1: Serializing and de-serializing the hasher behind Xor8::build_hasher() is more challenging, because std makes no stability guarantee for its own hashers.

If RandomState is used as BuildHasher, the std documentation has this to say:

A particular instance RandomState will create the same instances
of Hasher, but the hashers created by two different RandomState
instances are unlikely to produce the same result for the same values.

If DefaultHasher is used instead, the std documentation has this to say:

The internal algorithm is not specified, and so its hashes
should not be relied upon over releases.

So unless we have a BuildHasher type whose output is stable across releases and across instances, we may not be able to provide a stable serialization and de-serialization API.
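
As an illustration of what such a stable type could look like, here is a minimal sketch of a BuildHasher with a fixed, fully specified algorithm (FNV-1a, 64-bit). The names `Fnv1aBuilder`, `Fnv1a` and `digest` are hypothetical, and FNV-1a is only an example choice, not something this crate is known to use.

```rust
use std::hash::{BuildHasher, Hasher};

/// Hypothetical builder: it carries no random state, so every instance
/// creates identical hashers.
#[derive(Clone, Copy, Default)]
pub struct Fnv1aBuilder;

/// 64-bit FNV-1a running state.
pub struct Fnv1a(u64);

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    // The default write_u64 feeds native-endian bytes; pin it to
    // little-endian so integer keys hash the same on every platform.
    fn write_u64(&mut self, n: u64) {
        self.write(&n.to_le_bytes());
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

impl BuildHasher for Fnv1aBuilder {
    type Hasher = Fnv1a;

    fn build_hasher(&self) -> Fnv1a {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV offset basis
    }
}

/// Hash raw bytes directly, bypassing `Hash` impls whose byte layout
/// is not itself guaranteed to be stable.
pub fn digest<B: BuildHasher>(builder: &B, bytes: &[u8]) -> u64 {
    let mut hasher = builder.build_hasher();
    hasher.write(bytes);
    hasher.finish()
}
```

Because the algorithm is pinned in the source rather than left to std, a filter written to disk today could be queried by a later build, provided the keys are fed to the hasher as the same bytes.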

@uijin
Contributor

uijin commented Dec 26, 2019

Hi, @prataprc,

I would like to write the persistence (SerDes) function :)

@prataprc
Owner Author

There are many serialization formats, and IMHO SerDe aims to serialize any Rust type to any of them.

In this case, I think, we only need binary serialization. So to begin with we can implement a simple encode() / decode() API and add SerDe support at a later point?
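
A minimal sketch of what such an encode() / decode() pair might look like is given below. The `FilterState` struct, its field names and the byte layout are assumptions made for illustration; they are not the crate's actual on-disk format.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for the parts of a filter worth persisting.
#[derive(Debug, Clone, PartialEq)]
pub struct FilterState {
    pub seed: u64,
    pub block_length: u32,
    pub finger_prints: Vec<u8>,
}

impl FilterState {
    /// Assumed layout, all little-endian:
    /// seed (8 bytes) | block_length (4) | fingerprint count (4) | fingerprints.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(16 + self.finger_prints.len());
        out.extend_from_slice(&self.seed.to_le_bytes());
        out.extend_from_slice(&self.block_length.to_le_bytes());
        out.extend_from_slice(&(self.finger_prints.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.finger_prints);
        out
    }

    /// Returns None when the buffer is truncated or its length field
    /// disagrees with the number of bytes that follow the header.
    pub fn decode(buf: &[u8]) -> Option<FilterState> {
        if buf.len() < 16 {
            return None;
        }
        let seed = u64::from_le_bytes(buf[0..8].try_into().ok()?);
        let block_length = u32::from_le_bytes(buf[8..12].try_into().ok()?);
        let count = u32::from_le_bytes(buf[12..16].try_into().ok()?) as usize;
        if buf.len() - 16 != count {
            return None;
        }
        Some(FilterState {
            seed,
            block_length,
            finger_prints: buf[16..].to_vec(),
        })
    }
}
```

Fixing the byte order and writing an explicit length keeps the format independent of the host platform and lets decode() reject short reads.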

And thanks for the offer.

@prataprc
Owner Author

https://lemire.me/blog/2019/12/19/xor-filters-faster-and-smaller-than-bloom-filters/
^ This blog post gives some ideas about serializing the filter.

@ayazhafiz

FWIW, I have another impl of the xor filters in Rust with optional serialization/deserialization with serde behind a feature flag: https://github.com/ayazhafiz/xorf.

Feel free to use that implementation, or we can even merge these two libraries. Let me know what you think.

@uijin
Contributor

uijin commented Dec 27, 2019

@ayazhafiz,
As @prataprc says, we currently need binary serialization only. I will develop a simple file persistence function first. Your SerDe impl is a useful reference.

@prataprc
Owner Author

Feel free to use that implementation, or we can even merge these two libraries. Let me know what you think.

@ayazhafiz thanks for the offer, will give a shout-out when the need arises. Cheers,

@uijin
Contributor

uijin commented Jan 20, 2020

For the new filter data structure, I would add an upgraded version of the persistence function that can also save the new attributes (keys and hash_builder).

@prataprc
Owner Author

IMHO, in the case of Xor8, serialization / de-serialization applies only to the bitmap index and its associated fields. That is, we only need the fields required to execute the "contain()" API.
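
To make the point concrete, the sketch below shows a membership check in the style of the published xor filter reference algorithm, which reads only a seed, a block length and the fingerprint array. The struct and function names are hypothetical, the key is assumed to be already reduced to a u64 digest, and the crate's real code may differ in detail.

```rust
/// Hypothetical minimal state: everything the lookup touches.
pub struct Xor8State {
    pub seed: u64,
    pub block_length: u32,
    pub finger_prints: Vec<u8>, // 3 * block_length entries
}

/// 64-bit finalizer from MurmurHash3, used as the mixing step.
pub fn murmur64(mut h: u64) -> u64 {
    h ^= h >> 33;
    h = h.wrapping_mul(0xff51_afd7_ed55_8ccd);
    h ^= h >> 33;
    h = h.wrapping_mul(0xc4ce_b9fe_1a85_ec53);
    h ^= h >> 33;
    h
}

/// Map a 32-bit hash into [0, n) without a modulo.
pub fn reduce(hash: u32, n: u32) -> usize {
    ((hash as u64 * n as u64) >> 32) as usize
}

/// Fingerprint and the three slots (one per block) for a key digest.
pub fn probe(st: &Xor8State, key: u64) -> (u8, [usize; 3]) {
    let hash = murmur64(key.wrapping_add(st.seed));
    let fp = (hash ^ (hash >> 32)) as u8;
    let n = st.block_length as usize;
    let h0 = reduce(hash as u32, st.block_length);
    let h1 = reduce(hash.rotate_left(21) as u32, st.block_length) + n;
    let h2 = reduce(hash.rotate_left(42) as u32, st.block_length) + 2 * n;
    (fp, [h0, h1, h2])
}

/// A key is reported present when the XOR of its three slots
/// equals its fingerprint.
pub fn contains(st: &Xor8State, key: u64) -> bool {
    let (fp, [h0, h1, h2]) = probe(st, key);
    fp == (st.finger_prints[h0] ^ st.finger_prints[h1] ^ st.finger_prints[h2])
}
```

Nothing here needs the original key set, which is why persisting these three fields (plus whatever turns a key into its u64 digest) is enough for lookups.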

I have tried to scope the problem of handling a really large set of keys in #9.

@uijin
Contributor

uijin commented Jan 21, 2020

@prataprc
Thanks for the explanation, I agree with you.

prataprc added a commit that referenced this issue Aug 21, 2021
* Now includes `hash_builder` field as part of Xor8 serialization.
* Test cases for TL1 (backward compatibility) and TL2.
* File version moves from `TL1` to `TL2`.
* METADATA includes length of the serialized `hash_builder`.
* Shape of the serialized file has changed.
* `Xor8::write_file`, `Xor8::read_file`, `Xor8::to_bytes`, `Xor8::from_bytes`
methods expect that type parameter implements `Default`, `Clone`,
`From<Vec<u8>>`, `Into<Vec<u8>>` traits.
* Having said this, the new change is backward compatible for `Xor8::read_file`
and `Xor8::from_bytes` to de-serialize Xor8 from previous version (TL1).
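
The trait bounds listed in the commit message above can be illustrated with a small, hypothetical hash-builder type. `SeededBuilder`, `SeededHasher` and `roundtrip` are made-up names, and the 8-byte encoding is an assumption; only the set of required traits comes from the commit message.

```rust
use std::hash::{BuildHasher, Hasher};

/// Hypothetical builder whose whole state is one seed.
#[derive(Clone, Default, Debug, PartialEq)]
pub struct SeededBuilder {
    pub seed: u64,
}

/// FNV-1a style hasher whose start state is perturbed by the seed.
pub struct SeededHasher(u64);

impl Hasher for SeededHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 ^ b as u64).wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

impl BuildHasher for SeededBuilder {
    type Hasher = SeededHasher;

    fn build_hasher(&self) -> SeededHasher {
        SeededHasher(0xcbf2_9ce4_8422_2325 ^ self.seed)
    }
}

/// Provides `Into<Vec<u8>>` for the builder (assumed 8-byte little-endian form).
impl From<SeededBuilder> for Vec<u8> {
    fn from(b: SeededBuilder) -> Vec<u8> {
        b.seed.to_le_bytes().to_vec()
    }
}

/// `From` cannot fail, so malformed input falls back to seed 0 in this sketch.
impl From<Vec<u8>> for SeededBuilder {
    fn from(bytes: Vec<u8>) -> SeededBuilder {
        let mut raw = [0u8; 8];
        if bytes.len() == 8 {
            raw.copy_from_slice(&bytes);
        }
        SeededBuilder { seed: u64::from_le_bytes(raw) }
    }
}

/// Generic over the same bounds the commit message lists.
pub fn roundtrip<H>(h: H) -> H
where
    H: BuildHasher + Default + Clone + From<Vec<u8>> + Into<Vec<u8>>,
{
    let bytes: Vec<u8> = h.into();
    H::from(bytes)
}
```

A builder restored this way produces the same hashes as the one that was saved, which is the property the serialized filter depends on.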
@prataprc prataprc added the help wanted Extra attention is needed label Aug 27, 2021

3 participants