
[SR-3268] Hash-Dependent Iteration Order Produces Quadratic Behaviour in Dictionary/Set #45856

Closed
swift-ci opened this issue Nov 24, 2016 · 12 comments
Assignees
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. performance standard library

Comments


swift-ci commented Nov 24, 2016

Previous ID SR-3268
Radar None
Original Reporter Gankro (JIRA User)
Type Bug
Status Resolved
Resolution Done

Attachment: Download

Environment

Reproduces in at least Swift 2 and Swift 3, on all platforms.

Additional Detail from JIRA
Votes 2
Component/s Standard Library
Labels Bug, Performance
Assignee @lorentey
Priority Medium

md5: ee40175f0f6337a3ee378a50f0a30cdf

Issue Description:

Rust recently ran into a nasty bug which led to quadratic behaviour in appending the contents of one HashMap to another. Swift has the same problem for Dictionary/Set. The problem is explained excellently here: http://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion

In essence, the problem requires:

  • iteration order to be tied to where elements hash in the collection (linear scan of the array)

  • a linear-probing-based collision resolution algorithm (FCFS for Swift, Robin Hood for Rust)

  • hashes mapped to indices via modulo (masking off bits)

  • max load factor greater than 50% (75% for Swift, 90% for Rust) (not 100% sure whether a load factor below 50% is enough to avoid the problem)

  • inserting the elements of one map into another with less capacity (as would naturally occur when inserting all the elements of a map into a freshly created one without reserving space)

With all of these combined, the larger map has become a soft hash collision computer for the smaller map. Elements near the start of the first and second half of the larger map are guaranteed to be mapped near the start of the smaller map. Once iteration of the larger map hits the midpoint of its array, this leads to tons of collisions at the start of the smaller map's array, snowballing into linear search times for empty buckets.
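
The collapse can be sketched concretely. This is a toy model (identity hashing of small Ints and a hypothetical `bucket` helper), not the stdlib's actual code:

```swift
// Toy model of power-of-two masking (not stdlib internals). With identity
// hashing, an element's bucket is just its low bits.
func bucket(_ hash: Int, capacity: Int) -> Int {
    // `capacity` must be a power of two; masking keeps only the low bits.
    return hash & (capacity - 1)
}

// Elements sitting near the start of the first and second halves of a
// 16-bucket table (buckets 0, 1, 2 and 8, 9, 10)...
let fromBigTable = [0, 1, 2, 8, 9, 10]

// ...all collapse onto the first few buckets of an 8-bucket table,
// piling collisions at its start:
let collapsed = fromBigTable.map { bucket($0, capacity: 8) }
print(collapsed)  // [0, 1, 2, 0, 1, 2]
```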

Rust's solution was to use a different seed for its hash function for every hashmap (in fact, older versions of Rust already did this, the change to a global seed was an attempted optimization). I'm not sure if this is even possible while we're using Foundation's hashcode system.

Alternatively, you could come up with an iteration scheme that is decoupled from the hash; naively shifting the linear scan is inadequate. Any such scheme is likely to hurt iteration performance under normal operation, as it uses the cache less effectively.

Alternatively we could move to a fancier collision resolution scheme (quadratic probing?). This is likely to hurt performance in normal operation, as it thrashes the cache.

Alternatively we could lower the load factor. This will hurt iteration speed and increase memory usage in common usage.

We could also just decide that this code is "bad" and it performing poorly is acceptable. I'm not a fan of this solution, but y'know, it's there.

Here's a sample program exhibiting the problematic behaviour.

let size = Int(CommandLine.arguments[1])!

// Fill the first dictionary.
var dict1 = [Int: Int]()

for i in 0..<size {
    dict1[i] = i * 2
}

// Copy its contents into a second, freshly created dictionary (no capacity
// reserved), triggering the pathological reinsertion order.
var dict2 = [Int: Int]()

for (k, v) in dict1 {
    dict2[k] = v
}

print("computed value: \(dict2[size/2]!)")

Running this on different input sizes shows clearly quadratic behaviour:

swiftc -O dict.swift

time ./dict 1000000
computed value: 1000000

real    0m0.250s
user    0m0.197s
sys 0m0.045s

--------
time ./dict 2000000
computed value: 2000000

real    0m0.498s
user    0m0.400s
sys 0m0.094s

-----
time ./dict 3000000
computed value: 3000000

real    0m9.617s
user    0m9.498s
sys 0m0.102s

-----------
time ./dict 4000000
computed value: 4000000

real    0m1.089s
user    0m0.880s
sys 0m0.195s

---------
time ./dict 5000000
computed value: 5000000

real    4m45.307s
user    4m43.069s
sys 0m1.143s

The dip in run time at 4,000,000 is a result of the behaviour being very load factor sensitive.


jtbandes commented Nov 24, 2016

This is pretty interesting. It would be great to get some benchmarks added here: https://github.com/apple/swift/tree/master/benchmark


swift-ci commented Nov 24, 2016

Comment by Alexis Beingessner (JIRA)

Adding a benchmark now would probably be inappropriate, as it would DDoS the benchmarking infra. One should definitely be added along with a fix.


jtbandes commented Nov 24, 2016

It could be a small benchmark for now 🙂


natecook1000 commented Dec 8, 2016

This is such a neat bug. I read the Rust article but hadn’t looked at whether it affects Swift collections as well—good find!

One way to fix this without storing an additional per-collection seed is to XOR the storage capacity into an element's hash value before squeezing it down to a bucket index. If the source and destination collections have the same capacity, the quadratic behavior doesn't show up; if they don't, the XOR guarantees a different hash order for the same elements, eliminating the blowups. Bucket lookups get slightly more expensive this way, but you avoid the quadratic behavior.
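
A rough sketch of that idea, with an illustrative mixing step (the function name and the Fibonacci-style constant are my own, not stdlib code); the multiplicative mix is what lets the capacity perturbation reach the low bits that the mask keeps:

```swift
// Illustrative sketch: fold the table's capacity into the hash, then
// "squeeze" with a multiplicative mix before masking down to a bucket.
func perturbedBucket(_ hash: Int, capacity: Int) -> Int {
    var h = UInt64(bitPattern: Int64(hash ^ capacity))
    h = h &* 0x9E3779B97F4A7C15   // Fibonacci-style mixing constant (illustrative)
    h ^= h >> 32                   // fold high bits into the low bits
    return Int(h & UInt64(capacity - 1))
}

// Tables of different capacities now order the same elements differently,
// so iterating one while inserting into the other no longer front-loads
// collisions.
let order8  = (0..<16).map { perturbedBucket($0, capacity: 8) }
let order16 = (0..<16).map { perturbedBucket($0, capacity: 16) }
```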


milseman mannequin commented Mar 12, 2018

You don't need to store a per-collection seed, because each collection already has a source of identity: the memory address of its storage allocation. You can use the relevant bits from there, e.g. `addressBits[48:4]`, to seed hashing.
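
A hypothetical helper for that bit range (the name is mine): heap allocations are 16-byte aligned, so the low 4 bits of an address carry no entropy, and bits above 48 are typically unused on today's 64-bit platforms.

```swift
// Hypothetical sketch: extract bits [48:4] of a storage address as a
// per-allocation hash seed.
func seed(fromStorageAddress address: UInt64) -> UInt64 {
    return (address >> 4) & ((1 << 44) - 1)   // keep 44 bits: [48:4]
}
```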


lorentey commented Mar 12, 2018

We're certainly susceptible to this effect; I attached some benchmarks (from #15017 (comment)) demonstrating quadratic behavior for `dict.filter { _ in true }`. It's a wonderful gem of a benchmark curve, beautifully alternating between linear and quadratic segments. It's almost a shame we need to get rid of it! 🙂

Set and Dictionary currently use a global random seed; we should perturb that seed with something to eliminate this issue. Simply XORing either the storage address or the capacity to the seed would eliminate the O(n^2) behavior.

Per-instance seeding with the storage address is generally a good idea, except in cases where we want repeatable results – in test suites and the like. In those environments, we disable random hash seeding, but these operations should still not regress to quadratic performance. So when a hash seed override is in effect, we should still use the storage capacity as the hash seed.
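
That policy can be sketched as follows (illustrative names and a hypothetical free function, not the stdlib implementation):

```swift
// Illustrative sketch of the seeding policy described above: per-allocation
// seeds normally, capacity-derived seeds when deterministic hashing is
// requested (e.g. in test suites).
func hashSeed(storageAddress: UInt, capacity: Int,
              deterministicHashing: Bool) -> Int {
    if deterministicHashing {
        // Repeatable across runs, but still different for tables of
        // different sizes, which is enough to break the quadratic case.
        return capacity
    }
    // Every storage allocation gets its own seed.
    return Int(bitPattern: storageAddress)
}
```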


milseman mannequin commented Mar 12, 2018

Is the storage capacity the requested capacity, or the real spare capacity of the allocation? If the latter, then that's still a source of instability.

edit: This is assuming that malloc_size could give different answers for the same sized malloc.


lorentey commented Mar 12, 2018

That's true, but Set/Dictionary ordering already depends on the allocated capacity, so making the dependency even tighter would not cause any regressions.

Set/Dictionary is pretty good about having repeatable allocation patterns, so capacity-dependent ordering doesn't typically lead to nondeterminism in tests that are otherwise deterministic.

edit: We don't currently make use of any extra space that malloc gives us. Our hash tables are restricted to power-of-two sizes, which makes it difficult to change this.


milseman mannequin commented Mar 13, 2018

Why are power-of-two sizes required? I know that such restrictions are common in quadratically probed hash tables (e.g. LLVM's), so that the probe sequence is more evenly distributed under the modulus, but I'm unfamiliar with why a linearly probed hash table would require this.


swift-ci commented Mar 13, 2018

Comment by Alexis Beingessner (JIRA)

> Why are power-of-two sizes required?

It's so we can store the cap as `mask = realCap - 1` and implement all accesses as `idx & mask`.
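
In other words, the power-of-two restriction turns the modulo reduction into a single AND (this small sketch just demonstrates the equivalence for non-negative hashes):

```swift
// Why power-of-two capacities: reducing a hash to a bucket index becomes a
// single AND instead of an integer division.
let realCap = 64                 // power of two
let mask = realCap - 1           // 0b111111
let h = 1_234_567
let idx = h & mask               // same as h % realCap for non-negative h
assert(idx == h % realCap)       // both are 7
```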


milseman mannequin commented Mar 13, 2018

That sounds like a good reason.


lorentey commented Mar 19, 2018

Fixed in #15239; the "Quadratic.attaresult" chart (attached) plots the per-element costs of `Dictionary.filter { _ in true }` before and after the fix.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
This issue was closed.

4 participants