This repository was archived by the owner on Oct 12, 2022. It is now read-only.

Conversation

MartinNowak (Member)

  • core.aa with a prototype of a strongly typed library AA
  • vtable interface for runtime AA to use new AA

We can't easily turn the builtin AA into a library type because that
would involve too many semantic changes and break a lot of code
due to attribute correctness. Hacking around those issues is very difficult,
and no easy solution exists for the `++aa[key1][key2]` case.
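
A minimal sketch of the problem, using only the builtin AA; the comments describe the hypothetical lowering a library type would see, not code from this PR:

void main()
{
    int[int][int] builtin;
    // compiler/runtime magic auto-inserts builtin[1] (an empty inner AA)
    // and builtin[1][2] (initialized to 0) before applying the increment
    ++builtin[1][2];
    assert(builtin[1][2] == 1);

    // A library type would only see something like
    //     aa.opIndex(key1).opIndexUnary!"++"(key2)
    // where opIndex(key1) must either throw on the missing key or silently
    // default-construct an inner AA; neither choice reproduces the builtin
    // semantics without compiler support.
}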

This proof of concept adds a new strongly typed AA type to core.aa that can
be cheaply converted to the builtin AA using its toBuiltinAA method.
The runtime knows about core.aa and will use a vtable interface to forward
all operations on such a converted AA to core.aa (this might involve casting
around attributes, as the current AA does).
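
Purely as an illustration of the forwarding idea (all names and signatures here are made up for this sketch, not taken from the PR), the vtable could be a struct of function pointers that the typed AA hands to the runtime together with a pointer to its implementation:

struct AAVTable
{
    size_t function(void* impl) length;
    void*  function(void* impl, const void* pkey) getOrAdd; // indexing for assignment
    void*  function(void* impl, const void* pkey) lookup;   // indexing for reads / `in`
    bool   function(void* impl, const void* pkey) remove;
}

// The converted builtin AA would carry (impl, &vtable), and the runtime hooks
// behind indexing, `in`, length, and removal would dispatch through it back to core.aa.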

The implementation is supposed to be a general-purpose AA with good performance for a
broad range of use cases. It won't be parameterizable (allocator, load factor, comparison),
because in the long term it's intended to become a replacement for the builtin AA.

Using a typed AA immediately brings a huge performance improvement from dropping
the virtual typeinfo interface for hash calculation and key comparison, from
more efficient initialization (e.g. move construction), and from optimizations for small keys
and values. It also makes it possible to precompute immutable AAs during CTFE and store them in
the data segment.
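
As an illustration of the CTFE point (whether this exact code compiles against the prototype is not guaranteed; the only API assumed is what the unittest example below shows):

import core.aa : AA;

// built entirely at compile time and stored in the data segment
immutable statusNames = ()
{
    AA!(int, string) t;
    t[200] = "ok";
    t[404] = "not found";
    return t;
}();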

This provides a strong incentive to no longer use the magic semantics of the builtin AAs,
which at some point should be slowly deprecated to prepare the switch.
Before doing that, we can extend core.aa with support for the builtin AA literal syntax
as soon as this becomes available to any UDT (see [ER 11658](https://issues.dlang.org/show_bug.cgi?id=11658)).

Even if we never manage to fully make the switch, this one-way compatible type will
be a huge improvement for many programs.

A more specializable AA should be implemented as std.container.aa.

@MartinNowak (Member, Author)

This will also allow us to provide nice documentation for the AA.

Example from the unittest:

AA!(int, int) aa;
assert(aa.length == 0);
aa[0] = 1;
assert(aa.length == 1 && aa[0] == 1);
aa[1] = 2;
assert(aa.length == 2 && aa[1] == 2);

// cheap conversion to a builtin AA view of the same table
int[int] rtaa = aa.toBuiltinAA();
assert(rtaa.length == 2);
assert(rtaa[0] == 1);
assert(rtaa[1] == 2);

// mutations through the builtin view are visible in the typed AA
rtaa[2] = 3;
assert(aa[2] == 3);

@MartinNowak (Member, Author)

impl = new Impl(INIT_NUM_BUCKETS);

// get hash and bucket for key
immutable hash = hashOf(key);
Member

It causes an error because `filled` is broken.
Looks like it should be `hashOf(key) | HASH_FILLED_MARK`. This fixes my unittests.
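
For context, the suggested change in the snippet quoted above would be (assuming HASH_FILLED_MARK is the bit that the filled check tests):

// mark the computed hash as occupying its bucket, so Bucket.filled sees it
immutable hash = hashOf(key) | HASH_FILLED_MARK;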

@9il (Member)

9il commented Aug 23, 2015

Please do not squash future commits, so I can synchronise this PR with the mirror https://github.com/arexeu/aammm

@MartinNowak (Member, Author)

This is really just a prototype to get feedback on the concept.
Meanwhile the runtime AA implementation was rewritten, and this implementation is no longer up to date.

@9il (Member)

9il commented Aug 26, 2015

This is really just a prototype to get feedback on the concept.
Meanwhile the runtime AA implementation was rewritten, and this implementation is no longer up to date.

@MartinNowak findSlotInsert and findSlotLookup look identical to the current RT implementation. However, this (templated) code is slower with both DMD and LDC.
Could you please point out the main new changes in the algorithm?

@MartinNowak (Member, Author)

However, this (templated) code is slower with both DMD and LDC.

Slower than what, the builtin runtime AA?

Could you please point out the main new changes in the algorithm?

It's using open addressing instead of separate chaining, so findSlot is the main change.
And the templated version is a lot faster because it doesn't have to go through virtual methods with void* arguments for comparison.
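
For contrast, a generic sketch of what separate-chaining lookup looks like (illustrative only, not the old druntime code):

struct Node { Node* next; size_t hash; int key; int value; }

// each bucket holds a linked list; lookup walks the chain for that bucket
// (assumes a power-of-two bucket count)
Node* chainLookup(Node*[] buckets, size_t hash, int key)
{
    for (auto n = buckets[hash & (buckets.length - 1)]; n !is null; n = n.next)
        if (n.hash == hash && n.key == key)
            return n;
    return null;
}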

@9il (Member)

9il commented Aug 26, 2015

Slower than what, the builtin runtime AA?

Yes.

It's using open addressing instead of separate chaining, so findSlot is the main change.

Looks like they both use open addressing:

This PR

 // find the first slot to insert a value with hash
 inout(Bucket)* findSlotInsert(size_t hash) inout pure nothrow @nogc
 {
     for (size_t i = hash & mask, j = 1;; ++j)
     {
         if (!buckets[i].filled)
             return &buckets[i];
         i = (i + j) & mask;
     }
 }
 // lookup a key
 inout(Bucket)* findSlotLookup(size_t hash, in Key key) inout
 {
     for (size_t i = hash & mask, j = 1;; ++j)
     {
         if (buckets[i].hash == hash && key == buckets[i].entry.key)
             return &buckets[i];
         else if (buckets[i].empty)
             return null;
         i = (i + j) & mask;
     }
 }

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L101

// find the first slot to insert a value with hash
inout(Bucket)* findSlotInsert(size_t hash) inout pure nothrow @nogc
{
    for (size_t i = hash & mask, j = 1;; ++j)
    {
        if (!buckets[i].filled)
            return &buckets[i];
        i = (i + j) & mask;
    }
}

// lookup a key
inout(Bucket)* findSlotLookup(size_t hash, in void* pkey, in TypeInfo keyti) inout
{
    for (size_t i = hash & mask, j = 1;; ++j)
    {
        if (buckets[i].hash == hash && keyti.equals(pkey, buckets[i].entry))
            return &buckets[i];
        else if (buckets[i].empty)
            return null;
        i = (i + j) & mask;
    }
}

@DmitryOlshansky (Member)

@MartinNowak where is this going?

  • Close and wait for a remake of this pull request
  • Wait on this for an update

@MartinNowak (Member, Author)

@MartinNowak where is this going?

I still think it's the best way forward, but I haven't yet drawn enough attention to this, and I currently have different priorities myself.
To repeat the reasons why simply replacing the builtin AA won't work:
Proof of concept - library AA - D Programming Language Discussion Forum

@WalterBright (Member)

How about rebasing this and seeing where we are with it? It looks like a great concept.

@MartinNowak (Member, Author)

How about rebasing this and seeing where we are with it? It looks like a great concept.

I did rebase the PR, but that's not the point.
The implementation is still a draft and needs some more testing and profiling/optimization.
If I correctly understood your comment as approval of the concept, I'll allocate some time to finish the implementation in September. Meanwhile we can send this to the newsgroup for another round of review.

@WalterBright (Member)

What I'd like to see is a fully templated alternate version in druntime that people can use, and over time make it indistinguishable enough from the builtin ones that they can be eventually merged. By doing it as a library type the problems can be found and dealt with without the breakages we've had with previous attempts.

This appears to be what you're doing (I don't thoroughly understand it).

@MartinNowak (Member, Author)

What I'd like to see is a fully templated alternate version in druntime that people can use, and over time make it indistinguishable enough from the builtin ones that they can be eventually merged.

This mostly means deprecating the magic and broken behaviors of the builtin AA; see the first paragraph of the description.

By doing it as a library type the problems can be found and dealt with without the breakages we've had with previous attempts.

We already know most of them, so it's mostly a question of how to deal with them.

This appears to be what you're doing (I don't thoroughly understand it).

The plan consists of 4 essential parts.

  • We provide a new core.aa library type w/ better performance, correct attributes, and CTFE-ability.
    In order to replace the usage of the builtin AA, people will have to fix attributes and replace magic builtin-AA behaviors.
  • To facilitate the transition to the new library AA, we add a cheap-to-construct adapter for the builtin AA (see https://github.com/dlang/druntime/pull/1282/files#diff-fdc0da51523ff831dd6cbe33a5bb8b4cR382).
    This allows the new AA to remain compatible with existing libraries.
  • Before we can start to deprecate behaviors of the builtin AA, we have to add a decent way to construct UDTs from AA literals; see ER 11658 for a proposal and the sketch below this list.
  • When the behaviors are identical (fully deprecated), we can switch the builtin AA to the library type.
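
A rough sketch of such a stop-gap until something like ER 11658 exists; the AA struct here is only a placeholder to make the example compile, and the constructor taking a builtin AA literal is an assumption, not part of the prototype:

struct AA(Key, Value)
{
    private Value[Key] impl;   // placeholder storage for the sketch only

    // the stop-gap: construct the library AA from a builtin AA literal
    this(Value[Key] literal)
    {
        foreach (key, value; literal)
            this[key] = value;
    }

    void opIndexAssign(Value value, Key key) { impl[key] = value; }
    inout(Value) opIndex(Key key) inout { return impl[key]; }
    @property size_t length() const { return impl.length; }
}

unittest
{
    auto aa = AA!(string, int)(["one": 1, "two": 2]);
    assert(aa.length == 2 && aa["one"] == 1);
}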

@9il (Member)

9il commented Jul 24, 2016

Please allow the new AA implementation to

  • have more than 4 billion elements: we just need to switch from uint to size_t.
  • work with custom allocators from std.allocator.

This would help with building server software.

@MartinNowak (Member, Author)

MartinNowak commented Jul 24, 2016

Please allow the new AA implementation to have more than 4 billion elements: we just need to switch from uint to size_t.

Yes, more than 4 billion elements might make sense for certain use cases, but that might also be left to specialized implementations, since it's a performance liability for the majority of users.

work with custom allocators from std.allocator.

No, it's supposed to become the language AA replacement and the semantics seem just too different.
Also entangling this already complex plan w/ RC and escape analysis is a recipe for failure.

This would help with building server software.

Writing a hash table is a fairly simple undertaking. As mentioned initially, core.aa is supposed to be a general-purpose hash table w/ good performance for common use cases. Depending on your use case and performance needs, you might have to use something more specialized. Good advice would be to use a DoS-proof hash function, e.g. SipHash.
For languages used almost exclusively for web apps, such a slower but safer hash might be a good choice, but D is used in many different contexts and therefore we came to a different conclusion.

What we should do is add a fully configurable/templated AA to std.container, something that allows you to provide a custom hash function, choose a grow policy, tune the collision resolution, use allocators, ... It would be great to start this by gathering all the people that have already forked this AA (or wrote their own), e.g. aammm and w0rp's dstruct.
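
A rough sketch of the kind of parameterization such a std.container.aa could expose (every name here is hypothetical, not an existing API):

import std.experimental.allocator.gc_allocator : GCAllocator;

struct HashMap(Key, Value,
               alias hasher = hashOf,         // e.g. swap in a SipHash-based hasher
               Allocator = GCAllocator,       // custom allocation strategy
               size_t growFactor = 2,         // grow policy
               double maxLoadFactor = 0.75)   // when to rehash
{
    // buckets, collision resolution, etc. elided
}

// usage sketch: HashMap!(string, int, mySipHash) map;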

Labels: Needs Rebase (needs a `git rebase` performed), Needs Work, stalled