Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HashObjIntMap performance problem #35

Closed
vsonnier opened this issue Feb 7, 2015 · 3 comments
Closed

HashObjIntMap performance problem #35

vsonnier opened this issue Feb 7, 2015 · 3 comments

Comments

@vsonnier
Copy link

vsonnier commented Feb 7, 2015

Hello Roman,
I'm currently trying to brew my own benchmarks using Mikhail Vorontsov's as inspiration (http://java-performance.info) using my own put() version modified for different inputs. Well, it turns out Koloboke is badly misbehaving, to the point of pathological, if you compare to Fastutil for instance.
I've reproduced the case using a Random distrib [- 6000000; 6000000] with 6000000 as target preallocated size, 0.75 load factor, using the same allocation logic as Mikhail did:

testedMap = HashObjIntMaps. getDefaultFactory()
.withHashConfig(HashConfig.fromLoads(KolobokeObjectIntIssue.LOAD_FACTOR / 2, KolobokeObjectIntIssue.LOAD_FACTOR, KolobokeObjectIntIssue.LOAD_FACTOR))
.newMutableMap(KolobokeObjectIntIssue.TARGET_SIZE);

Here is the code:
https://gist.github.com/vsonnier/35b68b109528884e2bdd

The int ==> int version does not seem to suffer the same problem, under the same input.

@leventov
Copy link
Owner

leventov commented Feb 7, 2015

This is because keys set is artificial and have unfortunate hash code distribution. Integer.hashCode() is the value primitive itself. If hash codes are not mixed, this performs really bad with linear hashing.

By default Koloboke doesn't mix Object keys' hashCodes (Koloboke does mix only primitive keys by default), assuming hash distribution is good because hashCode() on the keys is implemented properly. This doesn't hold in the cases like your test; but there is ability to workaround that by providing custom equivalence, if you cannot fix the keys' hashCode() implementation, like in the case with Integer:

        this.testedMap = HashObjIntMaps.<Integer> getDefaultFactory()
                .withHashConfig(HashConfig.fromLoads(KolobokeObjectIntIssue.LOAD_FACTOR / 2,
                        KolobokeObjectIntIssue.LOAD_FACTOR, KolobokeObjectIntIssue.LOAD_FACTOR))
                .withKeyEquivalence(new StatelessEquivalence<Integer>() {
                    @Override
                    public boolean equivalent(@Nonnull Integer i1, @Nonnull Integer i2) {
                        return i1.intValue() == i2.intValue();
                    }

                    @Override
                    public int hash(@Nonnull Integer i) {
                        return i * -1640531527; // magic mix
                    }
                })
                .newMutableMap(KolobokeObjectIntIssue.TARGET_SIZE);

Apparently, Fastutil does always mix keys' hash codes, including when they are Objects.

Applying this fix, you will see that libs are performing ~ equally. (Your might see considerably different numbers, but that's because of poor benchmarking; implement tests in JMH and you will see that they are indeed performing very close.)

@leventov leventov closed this as completed Feb 7, 2015
leventov added a commit to leventov/hashmapTest that referenced this issue Feb 7, 2015
@vsonnier
Copy link
Author

vsonnier commented Feb 7, 2015

Thanks for your reply, Roman !
I think I see your point : trust the hashCode(), else switch to "custom hash" approach. Why not ? Performance-wise, inlining offers comparable performance for both in the end. The only problem I see is that both are not stricly equivalent, since the custom hash may only access Object public data.

Indeed, Fastutil, HPPC* are re-hashing systematically, so are more resillient to bad-behaving Objects.
Anyway I'll be able to include Koloboke back to my JMH set, now.

@leventov
Copy link
Owner

leventov commented Feb 8, 2015

The only problem I see is that both are not stricly equivalent, since the custom hash may only access Object public data.

@vsonnier if incapsulated object has poor hash code implemention (for example, not accounting all fields, participating equivalence checks), Fastutil / HPPC approach won't help either, because it also could work only with hash value, returned from keys' hashCode() calls. I. e. it is strictly equivalent to Koloboke with cusom key equivalence (mixing).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants