Smart string hashing #174

funny-falcon · 2016-05-04T05:34:00Z

Use full string hashing when a lot of collisions are generated.

Detect when a collision chain is too long.
The "average maximum" chain length of a table with fill factor 1.0 is near 7, so a collision chain longer than 10 is a bad sign.
Calculate the "full" hash for new strings in a long collision chain.
Use a "bloom" filter to keep track of the existence of strings with a "full" hash.
Refill the "bloom" filter on string sweeping.
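
The steps listed above can be sketched roughly as follows. This is an illustrative Python model, not the patch's C code: `sparse_hash`, `full_hash`, `InternTable` and `BLOOM_BITS` are hypothetical names, CRC-32 merely stands in for the "full" hash, and keying the filter by the sparse hash is an assumption. Only the chain limit of 10 comes from the description.

```python
import zlib

CHAIN_LIMIT = 10   # threshold quoted in the description above
BLOOM_BITS = 64    # hypothetical filter size


def sparse_hash(s: bytes) -> int:
    """Hypothetical cheap hash: mixes only the length and three sampled bytes."""
    n = len(s)
    if n == 0:
        return 0
    h = n
    for b in (s[0], s[n // 2], s[-1]):
        h = (h * 31 + b) & 0xFFFFFFFF
    return h


def full_hash(s: bytes) -> int:
    """Stand-in for a hash that reads every byte (CRC-32, not the PR's hash)."""
    return zlib.crc32(s)


class InternTable:
    def __init__(self, nbuckets: int = 1024):
        # Each chain holds (hash, used_full_hash, string) entries.
        self.buckets = [[] for _ in range(nbuckets)]
        self.bloom = 0

    def _chain(self, h):
        return self.buckets[h % len(self.buckets)]

    @staticmethod
    def _bit(h):
        return 1 << (h % BLOOM_BITS)

    def _find(self, h, s):
        for entry_hash, _, entry in self._chain(h):
            if entry_hash == h and entry == s:
                return entry
        return None

    def intern(self, s: bytes) -> bytes:
        h = sparse_hash(s)
        found = self._find(h, s)
        if found is not None:
            return found
        fh = None
        if self.bloom & self._bit(h):
            # A string with this sparse hash may be filed under its full hash.
            fh = full_hash(s)
            found = self._find(fh, s)
            if found is not None:
                return found
        if len(self._chain(h)) > CHAIN_LIMIT:
            # Suspiciously long chain: file the new string under its full hash
            # and remember that fact in the filter.
            fh = full_hash(s) if fh is None else fh
            self.bloom |= self._bit(h)
            self._chain(fh).append((fh, True, s))
        else:
            self._chain(h).append((h, False, s))
        return s

    def sweep(self, is_live):
        """Drop dead strings, then rebuild the filter from the survivors."""
        self.bloom = 0
        for i, chain in enumerate(self.buckets):
            self.buckets[i] = [e for e in chain if is_live(e[2])]
            for _, used_full, s in self.buckets[i]:
                if used_full:
                    self.bloom |= self._bit(sparse_hash(s))
```

In this model, strings that collide under the sparse hash fill one chain up to the limit; later ones are spread over the table by their full hash, and the filter tells a lookup whether the second probe is worth doing.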

Separately:

The first commit adds a hash equality check when a collision chain is traversed.
The last commit adds optional use of a strong hash function for the "full hash".
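
To illustrate why comparing hashes during chain traversal helps, here is a small Python sketch. It assumes each chain entry stores its hash next to the string; `find_in_chain` and its return shape are hypothetical and only model the idea, not the actual C change.

```python
def find_in_chain(chain, h, s, check_hash=True):
    """Search a chain of (hash, string) entries.

    Returns (match_or_None, number_of_full_string_comparisons).
    """
    compares = 0
    for entry_hash, entry in chain:
        if check_hash and entry_hash != h:
            continue  # a cheap integer test rejects the entry
        if len(entry) != len(s):
            continue
        compares += 1  # this models the expensive byte-by-byte comparison
        if entry == s:
            return entry, compares
    return None, compares


# Four same-length strings whose hashes (1, 5, 9, 13) share a bucket when
# the bucket index is taken as hash & 3.
chain = [(1, b"aaaa"), (5, b"bbbb"), (9, b"cccc"), (13, b"dddd")]
print(find_in_chain(chain, 13, b"dddd"))                    # 1 comparison
print(find_in_chain(chain, 13, b"dddd", check_hash=False))  # 4 comparisons
```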

Obsoletes #171 and #169, closes #168

funny-falcon · 2016-05-04T05:43:09Z

Benchmark: https://gist.github.com/funny-falcon/685dbfaea16b5919e6c84ab1b156d2f6

funny-falcon · 2016-05-06T14:05:46Z

Copying the benchmark numbers here.
Benchmarks:

1. small safe strings: the strings are short and fully covered by the sparse hash
2. safe strings: 136 bytes, differing in the last bytes (which are covered by the sparse hash)
3. long half-safe strings: 264 bytes, the diff is in the middle; the sparse hash covers the less frequently changed bytes of the diff
4. long unsafe strings: 392 bytes, the diff is not covered by the sparse hash; the diff is in the first third
5. long unsafe strings: 392 bytes, the diff is not covered by the sparse hash; the diff is in the second third
6. long safe strings: 392 bytes, the diff is at the end, so the sparse hash covers the frequently changed bytes

Bench	Count	Unpatched	Patched	PatchedStrong
1	1000000	0.19s	0.19s	0.19s
2	500000	0.16s	0.15s	0.15s
3	100000	0.66s	0.11s	0.13s
4	10000	2.99s	0.012s	0.016s
5	10000	5.12s	0.016s	0.024s
6	300000	0.16s	0.14s	0.14s
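
The "safe" versus "unsafe" distinction in the benchmark descriptions above can be reproduced with a toy model. The sketch below assumes a sparse hash that reads only the length, the first 4 and the last 4 bytes; that sampling pattern, and the names `sampled_hash`, `make_strings` and `distinct_hashes`, are hypothetical and do not describe LuaJIT's real hash.

```python
def sampled_hash(s: bytes) -> int:
    """Hypothetical sparse hash: mixes the length, the first 4 and the last 4 bytes."""
    h = len(s)
    for b in s[:4] + s[-4:]:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h


def make_strings(count: int, length: int, diff_offset: int) -> list:
    """Build `count` strings of `length` bytes that differ only in an 8-byte counter."""
    pad_after = length - diff_offset - 8
    return [
        b"a" * diff_offset + (b"%08d" % i) + b"a" * pad_after
        for i in range(count)
    ]


def distinct_hashes(strings) -> int:
    return len({sampled_hash(s) for s in strings})


safe = make_strings(1000, 392, 384)    # counter at the very end: sampled
unsafe = make_strings(1000, 392, 100)  # counter in the first third: never sampled
print(distinct_hashes(safe), distinct_hashes(unsafe))  # prints: 1000 1
```

With the counter outside the sampled bytes, all 1000 strings share one hash value and therefore one chain, which is the situation benchmarks 4 and 5 are built to provoke.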

bungle · 2016-05-09T08:25:16Z

This fixes the issues in my benchmark¹ as well, and I cannot see any side effects either. Without the patch, running something like 100,000 iterations totally kills unpatched LuaJIT, while the smart_str patched version scales linearly (17 secs, about 10x more than for 10,000 iterations).

Something like this:

# Patched LuaJIT
$ luajit -e 'require "resty.template.microbenchmark".run(100000)'
Running 100 000 iterations in each test
    Parsing Time:    0.393554
Compilation Time:    2.614543 (template)
Compilation Time:    0.010143 (template, cached)
  Execution Time:    3.539249 (same template)
  Execution Time:    0.332168 (same template, cached)
  Execution Time:    4.767025 (different template)
  Execution Time:    0.677392 (different template, cached)
  Execution Time:    4.409578 (different template, different context)
  Execution Time:    0.696215 (different template, different context, cached)
      Total Time:   17.439867

# Unpatched LuaJIT
$ luajit -e 'require "resty.template.microbenchmark".run(100000)'
Running 100 000 iterations in each test
    Parsing Time:    0.387828
Compilation Time:    2.541598 (template)
Compilation Time:    0.010231 (template, cached)
  Execution Time:    3.388479 (same template)
  Execution Time:    0.310524 (same template, cached)
  Execution Time:  965.385754 (different template)
  Execution Time:  566.065327 (different template, cached)
  Execution Time: 1113.138669 (different template, different context)
  Execution Time:  653.596720 (different template, different context, cached)
      Total Time: 3304.825130

¹ https://github.com/bungle/lua-resty-template/blob/master/lib/resty/template/microbenchmark.lua

data-man · 2016-06-27T15:30:45Z

Only for *nix?
It's bad.

funny-falcon · 2016-06-27T16:52:46Z

Why "only for *nix"? Is it because of /dev/urandom?
I just don't know how to easily get secure random data on Windows :-(
If you know how, please help me.
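
For illustration, this is roughly what the random data is needed for: a secret per-process key that makes a keyed hash unpredictable to an attacker. The Python sketch below is a stand-in under stated assumptions. `os.urandom` draws from the operating system's secure source on both Unix-like systems and Windows, whereas the patch is C and has to call the platform API itself. `new_hash_key` and `keyed_hash` are hypothetical names, and BLAKE2b only stands in for the PR's strong hash.

```python
import hashlib
import os


def new_hash_key(nbytes: int = 16) -> bytes:
    """Fetch key material from the operating system's secure random source."""
    return os.urandom(nbytes)


def keyed_hash(data: bytes, key: bytes) -> int:
    """Stand-in keyed hash (BLAKE2b truncated to 64 bits), not the PR's function."""
    digest = hashlib.blake2b(data, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


key = new_hash_key()
# The same key always gives the same hash within a process; a different key
# gives an unrelated value, so collisions cannot be precomputed offline.
print(keyed_hash(b"hello", key) == keyed_hash(b"hello", key))
```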

refi64 · 2016-06-27T16:57:30Z

@funny-falcon https://msdn.microsoft.com/en-us/library/windows/desktop/aa379942(v=vs.85).aspx

funny-falcon · 2016-06-27T18:36:08Z

Could you try to compile on Windows now with LUAJIT_SMART_STRINGS set to 2?
Unfortunately, I have no access to Windows, so please tell me what to fix if it does not compile.

data-man · 2016-06-27T19:48:17Z

Sorry, I'm writing from my phone.
Is this slow (but strong) mode really necessary?

As an experiment, try xxHash32 (xxHash64 for x64). It's really fast.
https://github.com/Cyan4973/xxHash

funny-falcon · 2016-06-27T21:12:56Z

Dmitry, I've played a lot with hash functions, and I know about xxHash. xxHash32 is not more secure and not much faster than my fast hash (LUAJIT_SMART_STRINGS=1); it may even be slower, I don't remember the exact numbers. But it is much more complex. Secure mode has to exist for those who want more security. It is a compile-time option and it is not the default, so if you don't want it, you don't pay for it.

funny-falcon · 2016-06-27T21:20:09Z

@data-man, I'm quite sure that the "secure" hash could be at least twice as fast (by mixing 2 blocks, i.e. 8 bytes, per siphash round instead of 1 block, i.e. 4 bytes). It would still be safe for this use case.

But I don't have enough authority for people to believe such a claim :-(

funny-falcon · 2016-06-27T21:24:42Z

@bungle, can you rerun the tests, please? Both with SMART_STRING=1 and 2 (and unpatched).

I've noticed that the previous version was a bit slower in the uncontended case, so I've changed the heuristic a bit. I'd like to see whether there is a difference in a real-world case.

bungle · 2016-06-27T22:18:49Z

SMART_STRING = 1:

luajit -e 'require "resty.template.microbenchmark".run(10000)'

Running 10000 iterations in each test
    Parsing Time: 0.041476
Compilation Time: 0.269675 (template)
Compilation Time: 0.000118 (template, cached)
  Execution Time: 0.354958 (same template)
  Execution Time: 0.036355 (same template, cached)
  Execution Time: 0.438314 (different template)
  Execution Time: 0.074922 (different template, cached)
  Execution Time: 0.426644 (different template, different context)
  Execution Time: 0.065288 (different template, different context, cached)
      Total Time: 1.707750

SMART_STRING = 2:

luajit -e 'require "resty.template.microbenchmark".run(10000)'

Running 10000 iterations in each test
    Parsing Time: 0.043237
Compilation Time: 0.265141 (template)
Compilation Time: 0.000123 (template, cached)
  Execution Time: 0.356360 (same template)
  Execution Time: 0.037365 (same template, cached)
  Execution Time: 0.453068 (different template)
  Execution Time: 0.068437 (different template, cached)
  Execution Time: 0.428845 (different template, different context)
  Execution Time: 0.071860 (different template, different context, cached)
      Total Time: 1.724436

SMART_STRING = 0:

luajit -e 'require "resty.template.microbenchmark".run(10000)'

Running 10000 iterations in each test
    Parsing Time: 0.042821
Compilation Time: 0.262019 (template)
Compilation Time: 0.000114 (template, cached)
  Execution Time: 0.364316 (same template)
  Execution Time: 0.038646 (same template, cached)
  Execution Time: 5.617718 (different template)
  Execution Time: 1.313856 (different template, cached)
  Execution Time: 5.867694 (different template, different context)
  Execution Time: 1.365527 (different template, different context, cached)
      Total Time: 14.872711

I'm not sure if this is a "real world" test, but it shows a potential problem that is easy to trigger even with much smaller iteration counts than the 10,000 above. SMART_STRING = 1 and SMART_STRING = 2 perform about the same (the difference is probably within the margin of error), but SMART_STRING = 0 gives far worse results that grow much faster than linearly with the iteration count, while SMART_STRING = 1 and SMART_STRING = 2 scale linearly. So, I still hope this gets merged.
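
The superlinear growth is what one would expect when most strings end up in a single chain: inserting the n-th colliding string walks past the n-1 strings already there. A minimal Python sketch of that effect follows; `count_comparisons`, `make_keys` and `colliding_hash` are hypothetical helpers, and CRC-32 is just a convenient well-spread hash for comparison.

```python
import zlib


def count_comparisons(keys, hash_fn, nbuckets=1024):
    """Insert distinct keys into a chained table and count string comparisons."""
    buckets = [[] for _ in range(nbuckets)]
    comparisons = 0
    for key in keys:
        chain = buckets[hash_fn(key) % nbuckets]
        for other in chain:
            comparisons += 1
            if other == key:
                break
        else:
            chain.append(key)  # not found: append to the end of the chain
    return comparisons


def make_keys(n):
    return [b"template-%d" % i for i in range(n)]


def colliding_hash(_key):
    return 0  # worst case: every key lands in the same chain


# One chain costs n * (n - 1) / 2 comparisons:
# 100 keys -> 4950, 1000 keys -> 499500 (10x the keys, about 100x the work).
print(count_comparisons(make_keys(100), colliding_hash))
print(count_comparisons(make_keys(1000), colliding_hash))
# A hash that spreads the keys keeps the count far lower.
print(count_comparisons(make_keys(1000), zlib.crc32))
```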

funny-falcon · 2016-06-28T05:45:32Z

@bungle, thank you. Now the "template" and "same template" lines no longer seem to suffer from the patch.

funny-falcon · 2016-06-28T08:42:48Z

I've run the tests on my computer (i7-4770) with both 64-bit and 32-bit builds.
The results show that enabling "escape from collisions" doesn't degrade the non-colliding case.

Could someone run the test on ARM hardware?

64bit version:

$ # unpatched v2.1
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.024381
Compilation Time: 0.133061 (template)
Compilation Time: 0.000048 (template, cached)
  Execution Time: 0.169258 (same template)
  Execution Time: 0.018488 (same template, cached)
  Execution Time: 2.218973 (different template)
  Execution Time: 0.808557 (different template, cached)
  Execution Time: 2.336497 (different template, different context)
  Execution Time: 0.815447 (different template, different context, cached)
      Total Time: 6.524710
$ # LUAJIT_SMART_STRINGS=0
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.023924
Compilation Time: 0.131804 (template)
Compilation Time: 0.000046 (template, cached)
  Execution Time: 0.167594 (same template)
  Execution Time: 0.017966 (same template, cached)
  Execution Time: 2.067680 (different template)
  Execution Time: 0.835178 (different template, cached)
  Execution Time: 2.190304 (different template, different context)
  Execution Time: 0.843430 (different template, different context, cached)
      Total Time: 6.277926
$ # LUAJIT_SMART_STRINGS=1
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.024267
Compilation Time: 0.130701 (template)
Compilation Time: 0.000046 (template, cached)
  Execution Time: 0.166630 (same template)
  Execution Time: 0.017941 (same template, cached)
  Execution Time: 0.187001 (different template)
  Execution Time: 0.031751 (different template, cached)
  Execution Time: 0.187633 (different template, different context)
  Execution Time: 0.031479 (different template, different context, cached)
      Total Time: 0.777449
$ # LUAJIT_SMART_STRINGS=2
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.025608
Compilation Time: 0.131915 (template)
Compilation Time: 0.000045 (template, cached)
  Execution Time: 0.167586 (same template)
  Execution Time: 0.018868 (same template, cached)
  Execution Time: 0.187702 (different template)
  Execution Time: 0.033191 (different template, cached)
  Execution Time: 0.182604 (different template, different context)
  Execution Time: 0.031786 (different template, different context, cached)
      Total Time: 0.779305

And 32bit version:

$ # unpatched v2.1
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.026917
Compilation Time: 0.141205 (template)
Compilation Time: 0.000047 (template, cached)
  Execution Time: 0.177869 (same template)
  Execution Time: 0.020420 (same template, cached)
  Execution Time: 2.132768 (different template)
  Execution Time: 0.794488 (different template, cached)
  Execution Time: 2.256810 (different template, different context)
  Execution Time: 0.818921 (different template, different context, cached)
      Total Time: 6.369445
$ # LUAJIT_SMART_STRINGS=0
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.027170
Compilation Time: 0.141821 (template)
Compilation Time: 0.000049 (template, cached)
  Execution Time: 0.179753 (same template)
  Execution Time: 0.020086 (same template, cached)
  Execution Time: 2.084573 (different template)
  Execution Time: 0.799549 (different template, cached)
  Execution Time: 2.251725 (different template, different context)
  Execution Time: 0.827272 (different template, different context, cached)
      Total Time: 6.331998
$ # LUAJIT_SMART_STRINGS=1
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.027351
Compilation Time: 0.141296 (template)
Compilation Time: 0.000048 (template, cached)
  Execution Time: 0.180427 (same template)
  Execution Time: 0.020526 (same template, cached)
  Execution Time: 0.197217 (different template)
  Execution Time: 0.033653 (different template, cached)
  Execution Time: 0.192512 (different template, different context)
  Execution Time: 0.033335 (different template, different context, cached)
      Total Time: 0.826365
$ # LUAJIT_SMART_STRINGS=2
$ luajit -e 'require("resty.template.microbenchmark").run(10000)'
Running 10000 iterations in each test
    Parsing Time: 0.027580
Compilation Time: 0.141737 (template)
Compilation Time: 0.000048 (template, cached)
  Execution Time: 0.179490 (same template)
  Execution Time: 0.020845 (same template, cached)
  Execution Time: 0.202999 (different template)
  Execution Time: 0.034854 (different template, cached)
  Execution Time: 0.197867 (different template, different context)
  Execution Time: 0.033563 (different template, different context, cached)
      Total Time: 0.838983

- detect when a lot of collisions are generated:
  - if two full collisions are found (i.e. hash value and length are equal)
  - if the collision chain is longer than 18 ("average maximum" chain with fill factor 1.0 is near 7)
- calculate "full" hash for strings in a long collision chain
- use "bloom" filter to keep track of the existence of strings with "full" hash
- refill "bloom" on string sweeping.

funny-falcon · 2016-06-28T09:19:32Z

I've updated the pull request: squashed into 3 commits.

data-man · 2016-06-28T10:31:29Z

@funny-falcon
Thanks for the explanations.

lj_saphash

Maybe lj_siphash?

funny-falcon · 2016-06-28T10:40:08Z

@data-man well, it is not quite "SipHash". It is a 32-bit cousin/offspring. It uses a SipHash-like permutation, but over 32-bit values (the shifts are taken from Chaskey, another 32-bit SipHash offspring). So it cannot be named "SipHash", because "SipHash" is defined as a permutation over 64-bit values.

I think a comment about the relation between "saphash" and "siphash" is enough.
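
For readers unfamiliar with the construction, a SipHash-shaped add-rotate-xor round over four 32-bit words looks roughly like the Python sketch below. It is not `lj_saphash`: the function names are hypothetical, and the rotation amounts (5, 16, 8, 13, 7, 16) are the ones I recall from the Chaskey permutation, so treat them as an assumption and check them before relying on them.

```python
MASK = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit word left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK


def arx_round(v0: int, v1: int, v2: int, v3: int):
    """One SipHash-shaped round on 32-bit words (rotation amounts assumed from Chaskey)."""
    v0 = (v0 + v1) & MASK
    v1 = rotl32(v1, 5) ^ v0
    v0 = rotl32(v0, 16)
    v2 = (v2 + v3) & MASK
    v3 = rotl32(v3, 8) ^ v2
    v0 = (v0 + v3) & MASK
    v3 = rotl32(v3, 13) ^ v0
    v2 = (v2 + v1) & MASK
    v1 = rotl32(v1, 7) ^ v2
    v2 = rotl32(v2, 16)
    return v0, v1, v2, v3


state = arx_round(0x01234567, 0x89ABCDEF, 0xDEADBEEF, 0x0BADF00D)
print(["%08x" % word for word in state])
```

Each step is invertible (addition, rotation and xor all are), so the round is a permutation of the 128-bit state, which is the property the comment above refers to.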

funny-falcon · 2016-11-21T06:19:34Z

@MikePall , I'm ready to work on it further if you describe what is wrong with this code.

bungle · 2016-11-21T09:56:14Z

Sad to see this closed with wontfix; I also hope this can be sorted out somehow.

This was referenced May 4, 2016

Proposal: Three String Tables #171

Closed

Simple string hash for large strings #169

Closed

funny-falcon force-pushed the smart_str branch 2 times, most recently from e2d8539 to e49e91a Compare May 4, 2016 12:10

funny-falcon mentioned this pull request May 4, 2016

Reduce string hash collisions #168

Closed

funny-falcon force-pushed the smart_str branch from e49e91a to 595b4be Compare May 4, 2016 19:13

bungle mentioned this pull request May 9, 2016

"different template cached" is more slowy than "different template, different context cached" bungle/lua-resty-template#18

Closed

funny-falcon force-pushed the smart_str branch from 773e2f8 to 9661aba Compare June 27, 2016 18:34

funny-falcon force-pushed the smart_str branch from 9661aba to af276ca Compare June 27, 2016 18:38

funny-falcon force-pushed the smart_str branch from d0fe42f to 4446ec3 Compare June 27, 2016 20:18

strings: compare hash as well

33af3ea

funny-falcon force-pushed the smart_str branch from 4446ec3 to bb5df5a Compare June 28, 2016 08:20

funny-falcon force-pushed the smart_str branch from bb5df5a to 9d54736 Compare June 28, 2016 08:47

funny-falcon added 2 commits June 28, 2016 11:57

strings: strong string hash for LUAJIT_SMART_STRINGS==2

f24deca

funny-falcon force-pushed the smart_str branch from 9d54736 to f24deca Compare June 28, 2016 08:59

MikePall added quality issues wontfix labels Nov 20, 2016

MikePall closed this Nov 20, 2016

funny-falcon mentioned this pull request Apr 4, 2017

Fix severe slowdown on certain strings #294

Closed

gonzalezjo mentioned this pull request Aug 31, 2018

Crash the server w/a malformed table Facepunch/garrysmod-issues#3526

Open
