Smart string hashing #174

Closed
wants to merge 3 commits into from

funny-falcon commented May 4, 2016

Use full string hashing when a lot of collisions are generated.

  • detect when a collision chain is too long.
    The "average maximum" chain length of a table with fill factor 1.0 is near 7, so a collision chain longer than 10 is a bad sign.
  • calculate a "full" hash for new strings that land in a long collision chain.
  • use a "bloom" filter to keep track of whether strings with a "full" hash exist.
  • refill the "bloom" filter on string sweeping.

Separately:

  • the first commit adds a hash equality check while the collision chain is traversed.
  • the last commit adds optional use of a strong hash function for the "full hash".
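
For readers who want to see the idea in miniature, here is a rough Python sketch of a string intern table that switches to a full hash once a chain grows past the limit quoted above. It is not the patch's C code. The names `sparse_hash`, `full_hash`, `InternTable` and `unsafe_string` are made up for illustration, the sampling windows are only loosely modeled on a sparse hash that reads a few 4-byte windows, and CRC-32 and BLAKE2 are stand-ins chosen because they ship with Python. The sketch never removes strings, which is what keeps the two-step lookup correct.

```python
import hashlib
import zlib

CHAIN_LIMIT = 10  # "longer than 10 is a bad sign", per the description above


def sparse_hash(s: bytes) -> int:
    """Toy sparse hash: for long strings, mix only four 4-byte windows."""
    n = len(s)
    if n < 16:
        return zlib.crc32(s)
    quarter, middle = (n >> 2) - 1, (n >> 1) - 2
    sample = s[:4] + s[quarter:quarter + 4] + s[middle:middle + 4] + s[-4:]
    return zlib.crc32(sample)


def full_hash(s: bytes) -> int:
    """Hash over every byte (BLAKE2 is only a stand-in from the stdlib)."""
    return int.from_bytes(hashlib.blake2b(s, digest_size=4).digest(), "little")


class InternTable:
    def __init__(self, nbuckets: int = 1024):
        self.mask = nbuckets - 1  # nbuckets must be a power of two
        self.buckets = [[] for _ in range(nbuckets)]
        self.full_hashed = 0      # strings stored under the full hash

    def _find(self, h: int, s: bytes):
        for stored_hash, stored in self.buckets[h & self.mask]:
            # Compare the cheap hash first, the bytes only on a hash match.
            if stored_hash == h and stored == s:
                return stored
        return None

    def intern(self, s: bytes) -> bytes:
        h = sparse_hash(s)
        found = self._find(h, s)
        if found is not None:
            return found
        if len(self.buckets[h & self.mask]) > CHAIN_LIMIT:
            # Chain is suspiciously long: store this string under the full hash.
            h = full_hash(s)
            found = self._find(h, s)
            if found is not None:
                return found
            self.full_hashed += 1
        self.buckets[h & self.mask].append((h, s))
        return s

    def longest_chain(self) -> int:
        return max(len(chain) for chain in self.buckets)


def unsafe_string(i: int) -> bytes:
    """392 bytes that differ only at offset 40, outside every sampled window."""
    return b"x" * 40 + b"%08d" % i + b"x" * 344


table = InternTable()
for i in range(1000):
    table.intern(unsafe_string(i))
print(table.full_hashed, table.longest_chain())
```

With a sparse hash alone, all 1000 strings would sit in one chain. Here only the first 11 do, and the rest are spread out by the full hash.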

Obsoletes #171 and #169, closes #168

@funny-falcon
Author

Copying the benchmark numbers here.
Benchmarks:

  1. small safe strings: the strings are short and fully covered by the sparse hash
  2. safe strings: 136 bytes, differing in the last bytes (which are covered by the sparse hash)
  3. long half-safe strings: 264 bytes, the diff is in the middle; the sparse hash covers only the less frequently changing bytes of the diff
  4. long unsafe strings: 392 bytes, the diff is in the first third and is not covered by the sparse hash
  5. long unsafe strings: 392 bytes, the diff is in the second third and is not covered by the sparse hash
  6. long safe strings: 392 bytes, the diff is at the end, so the sparse hash covers the frequently changing bytes

Bench  Count    Unpatched  Patched  PatchedStrong
1      1000000  0.19s      0.19s    0.19s
2      500000   0.16s      0.15s    0.15s
3      100000   0.66s      0.11s    0.13s
4      10000    2.99s      0.012s   0.016s
5      10000    5.12s      0.016s   0.024s
6      300000   0.16s      0.14s    0.14s
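
To make the "safe" versus "unsafe" distinction concrete, the following Python sketch builds strings shaped roughly like benchmarks 4 and 6 and counts how many distinct hash values they produce. Everything here is an assumption for illustration: `sparse_hash` samples four 4-byte windows (start, quarter, middle, end) and is not LuaJIT's actual function, `make_string` is a hypothetical generator, and CRC-32 is used only as a convenient mixer.

```python
import zlib


def sparse_hash(s: bytes) -> int:
    """Toy sparse hash: for long strings, mix only four 4-byte windows."""
    n = len(s)
    if n < 16:
        return zlib.crc32(s)
    quarter, middle = (n >> 2) - 1, (n >> 1) - 2
    sample = s[:4] + s[quarter:quarter + 4] + s[middle:middle + 4] + s[-4:]
    return zlib.crc32(sample)


def full_hash(s: bytes) -> int:
    """Hash that reads every byte of the string."""
    return zlib.crc32(s)


def make_string(i: int, length: int, diff_at: int) -> bytes:
    """A constant filler string with an 8-digit counter placed at diff_at."""
    field = b"%08d" % i
    return b"x" * diff_at + field + b"x" * (length - diff_at - len(field))


def distinct(strings, hash_fn) -> int:
    """Number of different hash values the strings produce."""
    return len({hash_fn(s) for s in strings})


# Diff in the first third, outside every sampled window (like benchmark 4).
unsafe = [make_string(i, 392, 40) for i in range(1000)]
# Diff at the very end, inside the last sampled window (like benchmark 6).
safe = [make_string(i, 392, 384) for i in range(1000)]

print(distinct(unsafe, sparse_hash))  # 1: every string collides
print(distinct(unsafe, full_hash))    # 1000
print(distinct(safe, sparse_hash))    # 1000
```

When all strings share one hash value, each insertion has to walk the whole chain, which is consistent with the large Unpatched times in rows 4 and 5.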

@bungle

bungle commented May 9, 2016

This fixes the issues in my benchmark¹ as well, and I cannot see any side effects either. Without the patch, running something like 100 000 iterations totally kills LuaJIT, while the smart_str patched version scales linearly: 17 secs, about 10x more than for 10 000 iterations.

Something like this:

# Patched LuaJIT
$ luajit -e 'require "resty.template.microbenchmark".run(100000)'
Running 100 000 iterations in each test
    Parsing Time:    0.393554
Compilation Time:    2.614543 (template)
Compilation Time:    0.010143 (template, cached)
  Execution Time:    3.539249 (same template)
  Execution Time:    0.332168 (same template, cached)
  Execution Time:    4.767025 (different template)
  Execution Time:    0.677392 (different template, cached)
  Execution Time:    4.409578 (different template, different context)
  Execution Time:    0.696215 (different template, different context, cached)
      Total Time:   17.439867

# Unpatched LuaJIT
$ luajit -e 'require "resty.template.microbenchmark".run(100000)'
Running 100 000 iterations in each test
    Parsing Time:    0.387828
Compilation Time:    2.541598 (template)
Compilation Time:    0.010231 (template, cached)
  Execution Time:    3.388479 (same template)
  Execution Time:    0.310524 (same template, cached)
  Execution Time:  965.385754 (different template)
  Execution Time:  566.065327 (different template, cached)
  Execution Time: 1113.138669 (different template, different context)
  Execution Time:  653.596720 (different template, different context, cached)
      Total Time: 3304.825130

¹ https://github.com/bungle/lua-resty-template/blob/master/lib/resty/template/microbenchmark.lua

@data-man

Only for *nix?
It's bad.

@funny-falcon
Author

Why "only for *nix"? Is it because of /dev/urandom?
I just don't know how to easily get secure random numbers on Windows :-(
If you know how, please help me.
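
As a portable illustration of seeding from the operating system rather than from /dev/urandom directly, here is a small Python sketch. It relies on `os.urandom`, which draws from the system's cryptographic random source on both Unix-like systems and Windows. The helper name `random_seed32` is hypothetical, and this says nothing about how the patch obtains its seed in C, where a Windows build would need a system API such as `BCryptGenRandom`.

```python
import os
import struct


def random_seed32() -> int:
    """Return a 32-bit seed taken from the OS random source."""
    # os.urandom(4) returns 4 random bytes; unpack them as one unsigned int.
    return struct.unpack("<I", os.urandom(4))[0]


seed = random_seed32()
print(hex(seed))
```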

@funny-falcon
Author

funny-falcon commented Jun 27, 2016

Could you try compiling on Windows now with LUAJIT_SMART_STRING set to 2?
Unfortunately, I have no access to Windows, so please tell me what to fix if it does not compile.

@data-man

Sorry, I'm writing from my phone.
Is this slow (but strong) mode necessary?

As an experiment, try xxHash32 (xxHash64 for x64). It's really fast.
https://github.com/Cyan4973/xxHash

@funny-falcon
Author

funny-falcon commented Jun 27, 2016 via email

@funny-falcon
Author

@data-man, I'm quite sure that the "secure" hash could be at least twice as fast (by mixing 2 blocks, i.e. 8 bytes, per siphash round instead of 1 block, i.e. 4 bytes). It would still be safe for this use case.

But I don't have enough authority for people to believe such a claim :-(

@funny-falcon
Author

@bungle can you rerun the tests, please? Both with SMART_STRING=1 and 2 (and unpatched).

I've noticed that the previous version was a bit slower in the uncontended case, so I've changed the heuristic a bit. I'd like to see whether there is a difference in a real-world case.

@bungle

bungle commented Jun 27, 2016

SMART_STRING = 1:

luajit -e 'require "resty.template.microbenchmark".run(10000)'

Running 10000 iterations in each test
    Parsing Time: 0.041476
Compilation Time: 0.269675 (template)
Compilation Time: 0.000118 (template, cached)
  Execution Time: 0.354958 (same template)
  Execution Time: 0.036355 (same template, cached)
  Execution Time: 0.438314 (different template)
  Execution Time: 0.074922 (different template, cached)
  Execution Time: 0.426644 (different template, different context)
  Execution Time: 0.065288 (different template, different context, cached)
      Total Time: 1.707750

SMART_STRING = 2:

luajit -e 'require "resty.template.microbenchmark".run(10000)'

Running 10000 iterations in each test
    Parsing Time: 0.043237
Compilation Time: 0.265141 (template)
Compilation Time: 0.000123 (template, cached)
  Execution Time: 0.356360 (same template)
  Execution Time: 0.037365 (same template, cached)
  Execution Time: 0.453068 (different template)
  Execution Time: 0.068437 (different template, cached)
  Execution Time: 0.428845 (different template, different context)
  Execution Time: 0.071860 (different template, different context, cached)
      Total Time: 1.724436

SMART_STRING = 0:

luajit -e 'require "resty.template.microbenchmark".run(10000)'

Running 10000 iterations in each test
    Parsing Time: 0.042821
Compilation Time: 0.262019 (template)
Compilation Time: 0.000114 (template, cached)
  Execution Time: 0.364316 (same template)
  Execution Time: 0.038646 (same template, cached)
  Execution Time: 5.617718 (different template)
  Execution Time: 1.313856 (different template, cached)
  Execution Time: 5.867694 (different template, different context)
  Execution Time: 1.365527 (different template, different context, cached)
      Total Time: 14.872711

I'm not sure if this is a "real world" test, but it shows a potential problem that is easy to trigger even with much smaller iteration counts than the 10,000 above. SMART_STRING = 1 and SMART_STRING = 2 perform about the same (the difference is probably within the margin of error), but SMART_STRING = 0 gives far worse, superlinear results (SMART_STRING = 1 and SMART_STRING = 2 scale linearly as the iteration count is raised). So, I still hope this gets merged.
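
As a rough check on that scaling claim, the "different template" timings posted in this thread for 10 000 and 100 000 iterations can be turned into an empirical growth exponent. This is only a back-of-the-envelope sketch: the two data points per configuration come from runs on different dates and patch revisions, and it assumes they were taken on the same machine. The function name `scaling_exponent` is made up for this example.

```python
import math


def scaling_exponent(t_small: float, t_large: float,
                     n_small: int = 10_000, n_large: int = 100_000) -> float:
    """Exponent k in time ~ n**k, estimated from two measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


# "Execution Time ... (different template)" lines quoted in this thread.
unpatched = scaling_exponent(5.617718, 965.385754)  # SMART_STRING = 0 vs unpatched run
patched = scaling_exponent(0.438314, 4.767025)      # SMART_STRING = 1 vs patched run

print(f"unpatched: time grows like n^{unpatched:.2f}")  # about n^2.2
print(f"patched:   time grows like n^{patched:.2f}")    # about n^1.04
```

An exponent near 2 is what one would expect if every new string has to walk a chain that keeps growing with the number of strings.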

@funny-falcon
Author

@bungle, thank you. Now the "template" and "same template" lines no longer seem to suffer from the patch.

@funny-falcon
Author

funny-falcon commented Jun 28, 2016

I've run the tests on my computer (i7-4770) with both the 64-bit and the 32-bit version.
The results show that enabling "escape from collisions" doesn't cause any degradation in the collision-free case.

Could someone run the test on ARM hardware?

64bit version:

$ # unpatched v2.1
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.024381
Compilation Time: 0.133061 (template)
Compilation Time: 0.000048 (template, cached)
  Execution Time: 0.169258 (same template)
  Execution Time: 0.018488 (same template, cached)
  Execution Time: 2.218973 (different template)
  Execution Time: 0.808557 (different template, cached)
  Execution Time: 2.336497 (different template, different context)
  Execution Time: 0.815447 (different template, different context, cached)
      Total Time: 6.524710
$ # LUAJIT_SMART_STRINGS=0
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.023924
Compilation Time: 0.131804 (template)
Compilation Time: 0.000046 (template, cached)
  Execution Time: 0.167594 (same template)
  Execution Time: 0.017966 (same template, cached)
  Execution Time: 2.067680 (different template)
  Execution Time: 0.835178 (different template, cached)
  Execution Time: 2.190304 (different template, different context)
  Execution Time: 0.843430 (different template, different context, cached)
      Total Time: 6.277926
$ # LUAJIT_SMART_STRINGS=1
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.024267
Compilation Time: 0.130701 (template)
Compilation Time: 0.000046 (template, cached)
  Execution Time: 0.166630 (same template)
  Execution Time: 0.017941 (same template, cached)
  Execution Time: 0.187001 (different template)
  Execution Time: 0.031751 (different template, cached)
  Execution Time: 0.187633 (different template, different context)
  Execution Time: 0.031479 (different template, different context, cached)
      Total Time: 0.777449
$ # LUAJIT_SMART_STRINGS=2
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.025608
Compilation Time: 0.131915 (template)
Compilation Time: 0.000045 (template, cached)
  Execution Time: 0.167586 (same template)
  Execution Time: 0.018868 (same template, cached)
  Execution Time: 0.187702 (different template)
  Execution Time: 0.033191 (different template, cached)
  Execution Time: 0.182604 (different template, different context)
  Execution Time: 0.031786 (different template, different context, cached)
      Total Time: 0.779305

And 32bit version:

$ # unpatched v2.1
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.026917
Compilation Time: 0.141205 (template)
Compilation Time: 0.000047 (template, cached)
  Execution Time: 0.177869 (same template)
  Execution Time: 0.020420 (same template, cached)
  Execution Time: 2.132768 (different template)
  Execution Time: 0.794488 (different template, cached)
  Execution Time: 2.256810 (different template, different context)
  Execution Time: 0.818921 (different template, different context, cached)
      Total Time: 6.369445
$ # LUAJIT_SMART_STRINGS=0
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.027170
Compilation Time: 0.141821 (template)
Compilation Time: 0.000049 (template, cached)
  Execution Time: 0.179753 (same template)
  Execution Time: 0.020086 (same template, cached)
  Execution Time: 2.084573 (different template)
  Execution Time: 0.799549 (different template, cached)
  Execution Time: 2.251725 (different template, different context)
  Execution Time: 0.827272 (different template, different context, cached)
      Total Time: 6.331998
$ # LUAJIT_SMART_STRINGS=1
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.027351
Compilation Time: 0.141296 (template)
Compilation Time: 0.000048 (template, cached)
  Execution Time: 0.180427 (same template)
  Execution Time: 0.020526 (same template, cached)
  Execution Time: 0.197217 (different template)
  Execution Time: 0.033653 (different template, cached)
  Execution Time: 0.192512 (different template, different context)
  Execution Time: 0.033335 (different template, different context, cached)
      Total Time: 0.826365
$ # LUAJIT_SMART_STRINGS=2
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.027580
Compilation Time: 0.141737 (template)
Compilation Time: 0.000048 (template, cached)
  Execution Time: 0.179490 (same template)
  Execution Time: 0.020845 (same template, cached)
  Execution Time: 0.202999 (different template)
  Execution Time: 0.034854 (different template, cached)
  Execution Time: 0.197867 (different template, different context)
  Execution Time: 0.033563 (different template, different context, cached)
      Total Time: 0.838983

- detect when a lot of collisions are generated
  - if two full collisions are found (i.e. hash value and length are equal)
  - if a collision chain is longer than 18
    (the "average maximum" chain with fill factor 1.0 is near 7)
- calculate a "full" hash for strings in a long collision chain
- use a "bloom" filter to keep track of whether strings with a "full" hash exist
- refill the "bloom" filter on string sweeping.
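
The Bloom filter part of the list above can be sketched as follows. This is an illustration in Python, not the patch's C code, and it rests on assumptions: the class `Bloom`, its size, and the choice of deriving two bit positions from one 32-bit hash are all hypothetical, and the thread does not show what value the patch feeds into its filter. The point it shows is general: a Bloom filter never gives a false "no", and it cannot forget a single entry, so it has to be rebuilt ("refilled") from the surviving strings after a sweep.

```python
class Bloom:
    """Tiny Bloom filter over 32-bit hash values (illustrative only)."""

    def __init__(self, nbits: int = 1024):
        self.nbits = nbits  # must be a power of two
        self.bits = 0       # bit set stored in one Python integer

    def _positions(self, h: int):
        # Two bit positions taken from different parts of the hash.
        return (h & (self.nbits - 1), (h >> 16) & (self.nbits - 1))

    def add(self, h: int) -> None:
        for p in self._positions(h):
            self.bits |= 1 << p

    def may_contain(self, h: int) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> p) & 1 for p in self._positions(h))

    def refill(self, surviving_hashes) -> None:
        # Entries cannot be removed one by one, so rebuild from scratch.
        self.bits = 0
        for h in surviving_hashes:
            self.add(h)


bloom = Bloom()
bloom.add(0x12345678)
print(bloom.may_contain(0x12345678))  # True
print(bloom.may_contain(0x00010001))  # False

# After a sweep in which only the string with hash 0x00010001 survives:
bloom.refill([0x00010001])
print(bloom.may_contain(0x12345678))  # False
```

A lookup that gets a "definitely absent" answer can skip computing the full hash, which is what keeps the common, collision-free path cheap.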
@funny-falcon
Author

I've updated the pull request: squashed into 3 commits.

@data-man

@funny-falcon
Thanks for the explanations.

lj_saphash

Maybe lj_siphash?

@funny-falcon
Author

@data-man well, it is not quite "siphash". It is a 32-bit cousin/offspring. It uses a SipHash-like permutation, but over 32-bit values (the shifts are taken from Chaskey, another 32-bit SipHash offspring). So it cannot be named "SipHash", because "SipHash" is defined as a permutation over 64-bit values.

I think a comment about the relation between "saphash" and "siphash" is enough.
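
For readers unfamiliar with this family of functions, here is a rough Python sketch of one add-rotate-xor round over four 32-bit words, laid out like a SipHash round. The rotation amounts (5, 16, 8, 13, 7, 16) are the ones used in Chaskey's round as far as I recall; the actual lj_saphash code is not shown in this thread and may differ, so treat `arx_round` and its inverse as illustrative names only. The inverse is included to show that the round is a permutation of the state.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def rotr32(x: int, r: int) -> int:
    """Rotate a 32-bit value right by r bits (0 < r < 32)."""
    return rotl32(x, 32 - r)


def arx_round(v0: int, v1: int, v2: int, v3: int):
    """One SipHash-shaped round on 32-bit words (rotations assumed from Chaskey)."""
    v0 = (v0 + v1) & MASK32
    v1 = rotl32(v1, 5) ^ v0
    v0 = rotl32(v0, 16)
    v2 = (v2 + v3) & MASK32
    v3 = rotl32(v3, 8) ^ v2
    v0 = (v0 + v3) & MASK32
    v3 = rotl32(v3, 13) ^ v0
    v2 = (v2 + v1) & MASK32
    v1 = rotl32(v1, 7) ^ v2
    v2 = rotl32(v2, 16)
    return v0, v1, v2, v3


def arx_round_inverse(v0: int, v1: int, v2: int, v3: int):
    """Undo arx_round by reversing each step in the opposite order."""
    v2 = rotr32(v2, 16)
    v1 = rotr32(v1 ^ v2, 7)
    v2 = (v2 - v1) & MASK32
    v3 = rotr32(v3 ^ v0, 13)
    v0 = (v0 - v3) & MASK32
    v3 = rotr32(v3 ^ v2, 8)
    v2 = (v2 - v3) & MASK32
    v0 = rotr32(v0, 16)
    v1 = rotr32(v1 ^ v0, 5)
    v0 = (v0 - v1) & MASK32
    return v0, v1, v2, v3


state = (1, 2, 3, 4)
mixed = arx_round(*state)
print([hex(w) for w in mixed])
print(arx_round_inverse(*mixed) == state)
```

Because every step is invertible, distinct states always map to distinct states, which is the property that makes such a round usable as the core of a keyed hash.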

@funny-falcon
Author

@MikePall , I'm ready to work on it further if you describe what is wrong with this code.

@bungle

bungle commented Nov 21, 2016

Sad to see this closing with wontfix, I also hope this could be sorted out somehow.
