minimal xxh3 for very simple c code as a pure simple minimal function. #808

Closed
kolinfluence opened this issue Mar 2, 2023 · 11 comments

@kolinfluence

kolinfluence commented Mar 2, 2023

hi @Cyan4973

I've been using xxh3 and it's extremely good.
Thank you very much; it's a crucial module across all my app deployments.

i have read through the code. I need something really simple for use across platforms and also for kernel development.

If you were to distill xxh3 into a really minimal function of maybe 50 lines, what would the C function look like, using code that is as simple as possible?

The reason I'm asking for a minimal xxh3 (or something really simple other than FNV) is that I need a hash function that is much faster than FNV for inputs of 1 KB to 4 MB, yet can be written as a single function in any programming language.
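For context, here is a minimal sketch of the kind of FNV loop being compared against. The post does not say which FNV variant is in use, so the 64-bit FNV-1a variant is assumed here, and the name `fnv1a64` is made up for this example. The loop consumes one byte per multiply, and every step depends on the previous one, which is why throughput on inputs in the 1 KB to 4 MB range is limited.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a (assumed variant; the thread only says "fnv"). */
uint64_t fnv1a64(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint64_t h = 0xCBF29CE484222325ULL;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];                           /* mix in one byte */
        h *= 0x100000001B3ULL;               /* FNV prime; serial dependency on h */
    }
    return h;
}
```

By contrast, xxh32 and xxh64 consume 16 or 32 bytes per loop iteration across four independent accumulators.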

so i just need some basic idea if you can help with giving advice for a pure minimal function that i can rewrite in most major programming languages including c of course.

Hash quality is not that crucial as long as it is good enough: on a scale of 1-10, a 7 will do, with xxh at 9-10.

Is it possible to provide such a simple C function, a distilled version of xxh3 that can be rewritten in any major programming language without special tools? FNV is currently too slow for me, and other built-in functions like CRC are not applicable. I need a complete rewrite in extremely simple code that I can use across multiple platforms and programming languages.

thx in advance.
there's a huge application value in something this simple. you may want to call it xxhXmin or something function.

@Cyan4973
Owner

Cyan4973 commented Mar 2, 2023

Unfortunately, xxh3 is likely too complex to reach the 50 lines target.

But xxh64 and xxh32 can reach this target quite easily if that can be good enough for you.
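As a rough illustration of how compact xxh64 can be, here is a minimal one-shot sketch in portable C. It is not the project's official code: the names `xxh64_min`, `round64`, `merge64`, `read64le`, and `read32le` are invented for this example, while the constants and round structure follow the published XXH64 algorithm. Input is read byte by byte in little-endian order, which trades some speed for independence from host endianness and alignment.

```c
#include <stddef.h>
#include <stdint.h>

#define P64_1 0x9E3779B185EBCA87ULL
#define P64_2 0xC2B2AE3D27D4EB4FULL
#define P64_3 0x165667B19E3779F9ULL
#define P64_4 0x85EBCA77C2B2AE63ULL
#define P64_5 0x27D4EB2F165667C5ULL

static uint64_t rotl64(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

/* Byte-wise little-endian loads: no alignment or endianness assumptions. */
static uint64_t read64le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

static uint32_t read32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t round64(uint64_t acc, uint64_t lane)
{
    acc += lane * P64_2;
    return rotl64(acc, 31) * P64_1;
}

static uint64_t merge64(uint64_t h, uint64_t v)
{
    h ^= round64(0, v);
    return h * P64_1 + P64_4;
}

/* One-shot XXH64 sketch; hypothetical name, not part of xxhash.h. */
uint64_t xxh64_min(const void *data, size_t len, uint64_t seed)
{
    const uint8_t *p = (const uint8_t *)data;
    const uint8_t *end = p + len;
    uint64_t h;

    if (len >= 32) {
        /* Four independent accumulators, 32 bytes per iteration. */
        uint64_t v1 = seed + P64_1 + P64_2, v2 = seed + P64_2;
        uint64_t v3 = seed, v4 = seed - P64_1;
        do {
            v1 = round64(v1, read64le(p));
            v2 = round64(v2, read64le(p + 8));
            v3 = round64(v3, read64le(p + 16));
            v4 = round64(v4, read64le(p + 24));
            p += 32;
        } while ((size_t)(end - p) >= 32);
        h = rotl64(v1, 1) + rotl64(v2, 7) + rotl64(v3, 12) + rotl64(v4, 18);
        h = merge64(h, v1); h = merge64(h, v2);
        h = merge64(h, v3); h = merge64(h, v4);
    } else {
        h = seed + P64_5;
    }
    h += (uint64_t)len;

    /* Tail: remaining 8-byte words, then one 4-byte word, then single bytes. */
    while ((size_t)(end - p) >= 8) {
        h ^= round64(0, read64le(p));
        h = rotl64(h, 27) * P64_1 + P64_4;
        p += 8;
    }
    if ((size_t)(end - p) >= 4) {
        h ^= (uint64_t)read32le(p) * P64_1;
        h = rotl64(h, 23) * P64_2 + P64_3;
        p += 4;
    }
    while (p < end) {
        h ^= (uint64_t)(*p++) * P64_5;
        h = rotl64(h, 11) * P64_1;
    }

    /* Final avalanche. */
    h ^= h >> 33; h *= P64_2;
    h ^= h >> 29; h *= P64_3;
    h ^= h >> 32;
    return h;
}
```

With this sketch, `xxh64_min("", 0, 0)` should match the reference value for empty input with seed 0, `0xEF46DB3751D8E999`. It uses only 64-bit integer arithmetic, shifts, and a few local variables, so it allocates nothing and should translate almost line for line into other languages.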

@kolinfluence
Author

kolinfluence commented Mar 2, 2023

Is it possible to do it with xxh64 or xxh32 then?
150 lines? How many lines do you need for something extremely minimal and portable across platforms and programming languages? A quality of around 7 is good enough.

Why cross-language and cross-platform? Because the hash output needs to be compared across many devices, platforms, and even programming languages, so that would be extremely useful. Calling C through FFI is not very practical, as C interop is not easy to set up in some languages.

actually 1024 lines is fine too. actually as many lines as possible is fine as well. (i'm just wondering if other garbage collected languages will be able to perform as optimally.) most probably keep within the if else and a bit of array use. things that wont "in most usual cases" generate a lot of garbage to be collected.

xxhXmin will be called every time data goes through it, so the garbage generated in GC languages would be huge.

Can you write the code?

thx in advance!

@kolinfluence
Author

kolinfluence commented Mar 2, 2023

Can you come up with the fastest, simplest implementation? All data will be routed through it.
XXH3 (SSE2) | 64 | 31.5 GB/s | 133.1 | 10

this is extremely impressive. truly need something purely as a function for kernel development, compatibility across all programming languages and portability across all devices.

(would i be asking a lot if u can make it faster than xxh3? but it's fine. i know it's pushing the limits.)
as long as quality is 6-7 and around the speed of xxh3 will be great. anything that can uniquely hash the data as "storage slots" will do.

pls help with this. if i can buy u a coffee i would. do show the sponsor link to buy u one. thx.

@Cyan4973
Owner

Cyan4973 commented Mar 2, 2023

> actually 1024 lines is fine too. actually as many lines as possible is fine as well.

In which case, why not use xxhash.h directly?
It does all the things you are looking for: it is portable across all architectures, can be invoked from any language, always produces the same hash, etc.

@kolinfluence
Author

kolinfluence commented Mar 2, 2023

@Cyan4973 I've checked through xxhash.h.
It's too complicated.
Let me define the parameters for a globally compatible xxhXmin that can also be written for BPF / eBPF:

  1. 64-bit architecture only (no need to consider 32-bit).
  2. ~2500 lines, 5-8 functions max, is "ideal" as a reference.
  3. Easily portable and usable as an eBPF function (highly necessary).
  4. Well suited to conversion to a garbage-collected language (like Go): use as little memory as possible (at most 256 bytes of declared variables or arrays), or ideally none at all.
  5. Quality of 6-7, with xxh3 as 10.

I've checked xxhash.h and the idea is very simple, which is great, but is it possible to make it even simpler?
If it can be copied and pasted as an eBPF function, it will be easy to port to many other programming languages.

Those are all the requirements. Is it possible to get this done as xxhXmin? Thanks in advance.

@kolinfluence
Author

FYI, there's a limit of 1 million instructions for eBPF programs, so I definitely hope to have this as simple as possible. I'm currently using FNV because of this, but I hope xxh can fill the gap more easily (and faster, of course).

@t-mat
Contributor

t-mat commented Mar 3, 2023

I think (e)BPF doesn't have any SIMD instructions.
https://www.kernel.org/doc/html/latest/bpf/instruction-set.html
So I suppose you're using xxh3 with XXH_SCALAR.

It depends, but perhaps xxh64 may fit your needs. See also: #793

@kolinfluence
Author

where's the code to xxh64?

@t-mat
Contributor

t-mat commented Mar 3, 2023

xxhash.h contains XXH64_ functions.

@kolinfluence
Author

@Cyan4973 dear expert hash programmer,

i'm sure given your skill, u can come up with a pure function that is a streamlined minimalist xxh that is around 6-7 quality rating and quick way to generate these hash right?

can you just provide a quick few lines maybe 1000 lines that will give the gist of a 6-7 hash that is minimal sufficiently for me to port to use across deployments? including bpf etc.

will appreciate this. in fact, this can be more useful than xxh3. may not need so extreme speed or quality but something that can be used across programming languages, platforms and devices.

pls pls pls give your feedback on what u think can be a minimalist function just for the purpose i mentioned.

I've read the current XXH64_ functions and, based on those, I think it would be very difficult to rewrite them in many programming languages without digging deep into the details. Can you help distill them down to a minimalist, bare-bones version?

thx in advance.

@Cyan4973
Owner

Cyan4973 commented Mar 3, 2023

You might want to look at https://create.stephan-brumme.com/xxhash/ .
It's C++, but it's very well streamlined, and it offers xxh32 and xxh64 only.
This is much simpler than xxh3.
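To give a sense of how small xxh32 is, here is a minimal one-shot sketch in C. It is an independent illustration and not code taken from the linked page or from xxhash.h: the names `xxh32_min`, `round32`, and `read32le` are invented for this example, while the constants and structure follow the published XXH32 algorithm. Input is read byte by byte in little-endian order so the result does not depend on the host CPU.

```c
#include <stddef.h>
#include <stdint.h>

#define P32_1 0x9E3779B1U
#define P32_2 0x85EBCA77U
#define P32_3 0xC2B2AE3DU
#define P32_4 0x27D4EB2FU
#define P32_5 0x165667B1U

static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* Byte-wise little-endian load: no alignment or endianness assumptions. */
static uint32_t read32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t round32(uint32_t acc, uint32_t lane)
{
    acc += lane * P32_2;
    return rotl32(acc, 13) * P32_1;
}

/* One-shot XXH32 sketch; hypothetical name, not part of any library. */
uint32_t xxh32_min(const void *data, size_t len, uint32_t seed)
{
    const uint8_t *p = (const uint8_t *)data;
    const uint8_t *end = p + len;
    uint32_t h;

    if (len >= 16) {
        /* Four independent accumulators, 16 bytes per iteration. */
        uint32_t v1 = seed + P32_1 + P32_2, v2 = seed + P32_2;
        uint32_t v3 = seed, v4 = seed - P32_1;
        do {
            v1 = round32(v1, read32le(p));
            v2 = round32(v2, read32le(p + 4));
            v3 = round32(v3, read32le(p + 8));
            v4 = round32(v4, read32le(p + 12));
            p += 16;
        } while ((size_t)(end - p) >= 16);
        h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
    } else {
        h = seed + P32_5;
    }
    h += (uint32_t)len;

    /* Tail: remaining 4-byte words, then single bytes. */
    while ((size_t)(end - p) >= 4) {
        h += read32le(p) * P32_3;
        h = rotl32(h, 17) * P32_4;
        p += 4;
    }
    while (p < end) {
        h += (uint32_t)(*p++) * P32_5;
        h = rotl32(h, 11) * P32_1;
    }

    /* Final avalanche. */
    h ^= h >> 15; h *= P32_2;
    h ^= h >> 13; h *= P32_3;
    h ^= h >> 16;
    return h;
}
```

With this sketch, `xxh32_min("", 0, 0)` should match the reference value for empty input with seed 0, `0x02CC5D05`. On a 64-bit target xxh64 is generally the faster of the two, so this variant is mostly of interest where 32-bit arithmetic is preferable.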


3 participants