Proof of Concept - library AA #1282
Conversation
This will also allow us to provide nice documentation for the AA. Example from the unittest:

AA!(int, int) aa;
assert(aa.length == 0);
aa[0] = 1;
assert(aa.length == 1 && aa[0] == 1);
aa[1] = 2;
assert(aa.length == 2 && aa[1] == 2);

import core.stdc.stdio;

int[int] rtaa = aa.toBuiltinAA();
assert(rtaa.length == 2);
puts("length");
assert(rtaa[0] == 1);
assert(rtaa[1] == 2);
rtaa[2] = 3;
assert(aa[2] == 3);
Forum discussion: http://forum.dlang.org/post/mjsma6$196h$1@digitalmars.com
impl = new Impl(INIT_NUM_BUCKETS);

// get hash and bucket for key
immutable hash = hashOf(key);
It causes an error because `filled` is broken.
Looks like it should be `hashOf(key) | HASH_FILLED_MARK`. This fixes my unittests.
Indeed, that's how it's done here: https://github.com/D-Programming-Language/druntime/blob/64cceb13fb16556c00c1993e8621a2241d667367/src/rt/aaA.d#L305.
Please do not squash future commits, so I can synchronise this PR with the mirror https://github.com/arexeu/aammm
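For reference, a minimal sketch of how the filled marker interacts with hashOf, modeled on the linked rt/aaA.d; the constant values and the stripped-down Bucket are simplifying assumptions for illustration:

```d
// Sketch of the filled-marker scheme, modeled on rt/aaA.d (values assumed).
enum HASH_EMPTY = 0;
enum HASH_DELETED = 0x1;
enum HASH_FILLED_MARK = size_t(1) << 8 * size_t.sizeof - 1;

struct Bucket
{
    size_t hash;

    @property bool empty() const   { return hash == HASH_EMPTY; }
    @property bool deleted() const { return hash == HASH_DELETED; }
    @property bool filled() const  { return cast(ptrdiff_t) hash < 0; } // top bit set
}

// OR-ing in the top bit guarantees filled is true even when hashOf(key)
// happens to collide with the empty (0) or deleted (1) markers.
size_t calcHash(K)(in K key)
{
    return hashOf(key) | HASH_FILLED_MARK;
}
```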
This is really just a prototype to get feedback on the concept.
@MartinNowak |
Slower than what, the builtin runtime AA?
It's using open addressing instead of separate chaining, so findSlot is the main change.
Yes.
Looks like they both use open addressing:

// find the first slot to insert a value with hash
inout(Bucket)* findSlotInsert(size_t hash) inout pure nothrow @nogc
{
for (size_t i = hash & mask, j = 1;; ++j)
{
if (!buckets[i].filled)
return &buckets[i];
i = (i + j) & mask;
}
}
// lookup a key
inout(Bucket)* findSlotLookup(size_t hash, in Key key) inout
{
for (size_t i = hash & mask, j = 1;; ++j)
{
if (buckets[i].hash == hash && key == buckets[i].entry.key)
return &buckets[i];
else if (buckets[i].empty)
return null;
i = (i + j) & mask;
}
}

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L101

// find the first slot to insert a value with hash
inout(Bucket)* findSlotInsert(size_t hash) inout pure nothrow @nogc
{
for (size_t i = hash & mask, j = 1;; ++j)
{
if (!buckets[i].filled)
return &buckets[i];
i = (i + j) & mask;
}
}
// lookup a key
inout(Bucket)* findSlotLookup(size_t hash, in void* pkey, in TypeInfo keyti) inout
{
for (size_t i = hash & mask, j = 1;; ++j)
{
if (buckets[i].hash == hash && keyti.equals(pkey, buckets[i].entry))
return &buckets[i];
else if (buckets[i].empty)
return null;
i = (i + j) & mask;
}
}
@MartinNowak where is this going?
I still think it's the best way forward, but I haven't yet drawn enough attention to this, and I currently have different priorities myself.
How about rebasing this and seeing where we are with it? It looks like a great concept.
- core.aa with a prototype of a strongly typed library AA
- vtable interface for the runtime AA to use the new AA
I did rebase the PR, but that's not the point.
What I'd like to see is a fully templated alternate version in druntime that people can use, and over time make it indistinguishable enough from the builtin ones that they can eventually be merged. By doing it as a library type, the problems can be found and dealt with without the breakages we've had with previous attempts. This appears to be what you're doing (I don't thoroughly understand it).
This mostly means deprecating the magic and broken behavior of the builtin AA; see the first paragraph of the description.
We already know most of those issues, so it's mostly a question of how to deal with them.
The plan consists of 4 essential parts.
Please allow the new AA implementation to hold more than 4B elements.
This would help to build server software.
Yes, more than 4B elements might make sense for certain use-cases, but it might also be left to specialized implementations, since it's a performance liability for the majority of users.
No, it's supposed to become the language AA replacement and the semantics seem just too different.
Writing a hash table is a fairly simple undertaking. As mentioned initially, core.aa is supposed to be a general purpose hash table with good performance for common use-cases. Depending on your use-case and performance needs, you might have to use something more specialized. Good advice would be to use a DoS-proof hash function, e.g. SipHash. What we should do is add a fully configurable/templated AA to std.container.
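As a rough sketch of what such a configurable std.container AA could look like, the hash function becomes a template policy so callers can plug in a keyed, DoS-resistant hash. Everything below (HashMap, the simple linear probing, sipHash24) is hypothetical and not part of this PR:

```d
// Hypothetical sketch of a templated AA with a pluggable hash policy.
struct HashMap(Key, Value, alias hasher = hashOf)
{
    private static struct Bucket { size_t hash; Key key; Value value; }
    private Bucket[] buckets;
    private size_t used;

    private size_t mask() const { return buckets.length - 1; }
    // set the top bit so a filled slot never stores hash 0 (the empty marker)
    private static size_t mix(size_t h) { return h | (size_t(1) << 8 * size_t.sizeof - 1); }

    void opIndexAssign(Value value, Key key)
    {
        if (buckets.length == 0) buckets = new Bucket[8];
        else if (2 * (used + 1) > buckets.length) grow(); // keep load factor <= 0.5
        immutable h = mix(hasher(key));
        for (size_t i = h & mask; ; i = (i + 1) & mask) // linear probing
        {
            if (buckets[i].hash == 0) { buckets[i] = Bucket(h, key, value); ++used; return; }
            if (buckets[i].hash == h && buckets[i].key == key) { buckets[i].value = value; return; }
        }
    }

    Value opIndex(Key key) const
    {
        if (buckets.length)
        {
            immutable h = mix(hasher(key));
            for (size_t i = h & mask; buckets[i].hash != 0; i = (i + 1) & mask)
                if (buckets[i].hash == h && buckets[i].key == key)
                    return buckets[i].value;
        }
        assert(0, "key not found");
    }

    private void grow()
    {
        auto old = buckets;
        buckets = new Bucket[2 * old.length];
        used = 0;
        foreach (ref b; old)
            if (b.hash != 0) this[b.key] = b.value; // rehash into the larger table
    }
}

unittest
{
    HashMap!(string, int) counts;   // default hasher: object.hashOf
    counts["hello"] = 1;
    counts["hello"] = 2;
    counts["world"] = 3;
    assert(counts["hello"] == 2 && counts["world"] == 3);
    // A DoS-hardened variant would pass a keyed hash instead, e.g.
    // HashMap!(string, int, k => sipHash24(secretSeed, k)) (sipHash24 is hypothetical).
}
```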
We can't easily turn the builtin AA into a library type because that
would involve too many semantic changes and break a lot of code
due to attribute correctness. Hacking around those issues is very difficult,
and no easy solution exists for the `++aa[key1][key2]` case.

This proof of concept adds a new strongly typed AA type to core.aa that can
be cheaply converted to the builtin AA using its toBuiltinAA method.
The runtime knows about core.aa and will use a vtable interface to forward
all operations on such a converted AA to core.aa (this might involve casting
around attributes, as done by the current AA).
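For illustration, this is how the conversion looks in the unittest quoted earlier in this thread (assuming the module layout proposed in this PR):

```d
import core.aa; // module added by this PR

void example()
{
    AA!(int, int) aa;                 // strongly typed library AA
    aa[0] = 1;
    aa[1] = 2;

    int[int] rtaa = aa.toBuiltinAA(); // cheap conversion to the builtin AA
    assert(rtaa.length == 2 && rtaa[0] == 1 && rtaa[1] == 2);

    rtaa[2] = 3;                      // the runtime forwards this through the
    assert(aa[2] == 3);               // vtable interface back to core.aa
}
```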
The implementation is supposed to be a general purpose AA with good performance for a
broad range of use-cases. It won't be parameterizable (allocator, load factor, comparison),
because in the long term it's intended to become a replacement for the builtin AA.

Using a typed AA gives an immediate and substantial performance improvement
from dropping the virtual typeinfo interface for hash calculation and key comparison,
more efficient initialization (e.g. move construction), and optimizations for small keys
and values. It also makes it possible to precompute immutable AAs during CTFE and store them
in the data segment.
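The CTFE point would, for example, allow lookup tables like the following to be built at compile time. This is a hedged sketch; whether the cast and data-segment placement work exactly like this depends on the final core.aa implementation:

```d
import core.aa;

// Sketch only: the table is built during CTFE so it can live in the data
// segment instead of being constructed at program startup.
immutable statusText = ()
{
    AA!(int, string) aa;
    aa[200] = "OK";
    aa[404] = "Not Found";
    aa[500] = "Internal Server Error";
    return cast(immutable) aa;
}();
```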
This provides a strong incentive to no longer use the magic semantics of the builtin AAs,
which at some point should be slowly deprecated to prepare the switch.
Before doing that, we can extend core.aa with support for the builtin AA literal syntax
as soon as this becomes available to any UDT (see ER 11658, https://issues.dlang.org/show_bug.cgi?id=11658).
Even if we never manage to fully make the switch, this one-way compatible type will
be a huge improvement for many programs.

A more specializable AA should be implemented as std.container.aa.