Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Implement AA .byPair forward range #574

Closed
wants to merge 1 commit into from

Conversation

quickfur
Copy link
Member

Fixes http://d.puremagic.com/issues/show_bug.cgi?id=9119

This pull also adds forward range capability to byKey and byValue, since byPair already provides that functionality. Exporting a forward range API makes AA's usable with many more Phobos algorithms, and costs almost nothing.

@bearophile
Copy link

We already have TypeTuples and Tuples. Here you are suggesting to add a third kind of "tuples" just for associative arrays. In my opinion that's a bad idea, and we should go the opposite direction and use a single syntax/semantic for all tuples (http://wiki.dlang.org/DIP32 ). This means your code becomes like:

aa
.byPair
.map!(t{k, v} => text("key=", k, " value=", v))
.joiner("\n")
.writeln;

An alternative bad solution is to put byPair in std.range, and let it return Tuples.

@@ -465,6 +466,7 @@ public:
@property bool empty() { return _aaRangeEmpty(r); }
@property ref Key front() { return *cast(Key*)_aaRangeFrontKey(r); }
void popFront() { _aaRangePopFront(r); }
Result save() { return Result(_aaRangeSave(r)); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AARange is a value type so Result save() { return this; } should suffice.

@bearophile
Copy link

In an orthogonal language zip(aa.byKey, aa.byValue) should be the same as aa.byPair. Please don't introduce even more special cases in D, there are plenty already.

copy.impl = r.impl;
copy.current = r.current;
return copy;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function isn't really needed, see above.

@MartinNowak
Copy link
Member

Other than that, looks good.

@quickfur
Copy link
Member Author

Rebased to make a cleaner diff.

@quickfur
Copy link
Member Author

@bearophile: i'm not introducing a new kind of tuple here. I'm just making byPair return a range of structs that contain .key and .value. Or perhaps byPair is a bad name, what about byKeyValue?

The problem with making it return the same type as zip, is that zip is a Phobos-defined type, and druntime isn't allowed to reference Phobos types.

@quickfur
Copy link
Member Author

P.S. putting byPair in std.range is not really an option because the AA range is supposed to be an opaque type.

@MartinNowak
Copy link
Member

He's right that byPair should be a tuple range though returning a tuple in front wouldn't allow lvalue access.
This would also support auto expansion as in foreach (k, v; aa.byPair()) {}.

@quickfur
Copy link
Member Author

Wait, are we talking about Tuple, as in std.typecons.Tuple, or are we talking about compiler tuples, as in that which is wrapped by TypeTuple?

Come to think of it, I think it's wrong to provide ref access to AA keys; I'll have to fix that.

@quickfur
Copy link
Member Author

I experimented with some code where .front returns a tuple, but it has two drawbacks: (1) it involves copying the key and value, which may be expensive if either is large; (2) it doesn't allow lvalue access to AA values, which limits its usefulness.

What do you think? Should we leave byPair as-is, or should we use the tuple version?

@JakobOvrum
Copy link
Member

I think the only acceptable way this can go is to move it to Phobos and make it return a range of Tuple instances. Returning an opaque, domain-specific Pair struct with the non-generic accessors key and value is just too unwieldy in generic code to be useful. Tuple, on the other hand, has special recognition in many generic algorithms and even in the language (in the case of foreach).

I also recognize that Phobos shouldn't be able to access AA internals, so if that is to be enforced, I don't think this function has any future at all.

@bearophile, language tuples and Phobos tuples, while sharing the name "tuple", are conceptually very different. Let's not conflate them. Language tuples have nothing to do with this pull request.

@jmdavis
Copy link
Member

jmdavis commented Aug 16, 2013

Honestly, I have no problem with this returning a range of an opaque pair type with key and value. It arguably makes for a cleaner API in this case anyway. Sure, it would be nice if we could use Tuple, but we can't, and this seems like a perfectly reasonable alternative to me. And we don't really need the auto-expansion of byPair, because we already can iterate over AAs that way directly, because foreach supports AAs.

@JakobOvrum
Copy link
Member

Yes, the auto-expansion is a non-issue, but surely the point of having a forward range of key-value pairs is to use that range in generic code. Since the element type is conceptually a tuple exactly like Tuple, yet of a completely different type, there's just so much lost potential. Tuple can also have named fields while still functioning as a generic tuple, so this opaque struct is in no way cleaner. This way, byPair is basically only useful when followed by map because the opaque Pair type has no useful behaviour or information besides the domain-specific fields key and value. Every other Phobos algorithm like this one returns a Tuple, and this consistency enables generic algorithms like LuaD's input range overload of luad.state.LuaState.newTable.

I just don't think introducing this function and this opaque tuple-yet-not type is worth it, when the only gain is the ability to follow up with map.

@quickfur
Copy link
Member Author

The only practical solution I can see is to rename byPair to some internal name, then implement byPair in std.range that access AA internals indirectly via this internal name so that it can wrap the result in a Tuple.

One issue with using Tuple is that you will incur copying overhead for the key and value (I don't think Tuples can store ref members), so I'd rather keep the current Pair as-is, and have std.range wrap around it to provide a more generic interface.

@bearophile
Copy link

I'd like a bit of threading in the comments here.
@quickfur: your last comment contains an interesting idea worth exploring.

@JakobOvrum: a universal syntax like t{} could be used for both "language tuples" and "Phobos tuples".

@JakobOvrum
Copy link
Member

I like the idea of splitting it in two to land the user-facing interface in Phobos.

As for the reference semantics, whatever the solution is, I don't think providing a mutable reference to the key is acceptable. Surely if a key is mutated in-place the table must be rehashed. Being able to mutate the value in-place is probably worth trying to keep, though.

How about making it a range of a type that has the return-by-ref value property, but subtypes Tuple through an AliasThis declaration?

@quickfur
Copy link
Member Author

Alright, since this pull seems to be stuck, I think I'm going to split it up a little, as there are some non-controversial bits that I feel should be checked in regardless of whether byPair is accepted or not. One is to make byKey and byValue forward ranges in order to improve their usefulness, another is the disable ref semantics for byKey (or make it const ref, if it has to be).

@quickfur
Copy link
Member Author

Actually, I just realized that the compiler always forces AA keys to be const, so returning by ref is OK since the language doesn't allow mutation via const references.

In any case, the changes for making byKey and byValue forward ranges has been submitted: #582 . I'll rebase this pull soon.

@quickfur
Copy link
Member Author

Oops... what have I done? I rebased this pull onto #582 and now both commits are showing up here? Will this cause problems during merging?

@quickfur
Copy link
Member Author

Alright, now that the other pull has merged, a rebase solved the problem.

@quickfur
Copy link
Member Author

Alright, since the idea of putting a Tuple version of byPair in Phobos seems to be well-accepted, what should I rename byPair? byPairInternal? byPairRaw? Or maybe leave it as-is, and call the Phobos version byTuple?

@MartinNowak
Copy link
Member

Alright, since the idea of putting a Tuple version of byPair in Phobos seems to be well-accepted

I don't think that using std.typecons.Tuple and thus phobos is necessary.
Something along this line would work fine.

struct Pair
{
    Key* _key;
    Value* _value;

    @property ref Key key() { return *_key; }
    @property ref Value value() { return *_value; }
    @property auto tuple()
    {
        static struct Tuple(KV...) { KV _kv; alias _kv this; }
        return Tuple!(Key, Value)(*_key, *_value);
    }
    alias tuple this;
}

@quickfur
Copy link
Member Author

Oh? Is a Tuple defined this way interchangeable with the Phobos Tuple? I had thought of this before, but rejected it on the grounds that this Tuple would be a distinct type from std.typecons.tuple.Tuple.

@MartinNowak
Copy link
Member

What do you mean by interchangeable?

@quickfur
Copy link
Member Author

I agree.

@quickfur quickfur closed this Sep 25, 2013
@quickfur quickfur deleted the issue9119 branch October 9, 2013 16:42
@timotheecour
Copy link
Contributor

In lack of any "standardized" tuple in druntime I'm inclined to close this pull request in favor of zip(aa.byKey, aa.byValue).

We shouldn't sacrifice performance. Won't there be overhead with zip(aa.byKey, aa.byValue) ?

@JakobOvrum
Copy link
Member

We shouldn't sacrifice performance. Won't there be overhead with zip(aa.byKey, aa.byValue) ?

Yes, but worse, it's not guaranteed to work (but it does with the current implementation). It's a workaround, not a solution.

@MartinNowak
Copy link
Member

Why doesn't it work?

@JakobOvrum
Copy link
Member

Why doesn't it work?

byKey is not guaranteed to return keys in the same order as byValue returns values.

@MartinNowak
Copy link
Member

byKey is not guaranteed to return keys in the same order as byValue returns values.

You must not change the content of a container during iteration. And it's fairly unlikely that an implementation would ever chose different iteration orders.
Anyhow, it seems like we'll also need a tuple type for the upcoming AA implementation, so I'd like to delay this pull a little longer.

@schveiguy
Copy link
Member

In lack of any "standardized" tuple in druntime I'm inclined to close this pull request in favor of zip(aa.byKey, aa.byValue).

Adding a byWhatever property that gets a range that can access fields via tuples does not preclude the use of zip(aa.byKey, aa.byValue). "standardizing" on zip is not a good answer at all.

I think it's kind of foolish that we don't provide a range access for the key/value pairs that requires keeping 2 copies of the exact same range.

byKey is not guaranteed to return keys in the same order as byValue returns values.

It's defacto guaranteed. No pull request to create a separate iteration order for keys and values would ever be accepted by anyone on the team.

@quickfur
Copy link
Member Author

I think we should just implement byPair in druntime, and add a wrapper to turn .front into a Tuple in Phobos for those who insist on it. Personally, I have trouble seeing why generic code should depend on the exact type of std.typecons.Tuple when it really should work with any type that has the requisite fields/operations, but it's not an insurmountable issue. All it takes is to do a map!(a => tuple(a[0],a[1])) after byPair. Wrap this in a function if one insists:

auto byTuple(V,K)(V[K] aa)
{
    return aa.byPair().map!(a => tuple(a[0],a[1]));
}
int[string] myAa;
myAa.byTuple. /*... do whatever you wish here */

@bearophile
Copy link

  • It's important to have a range of tuples because tuples can be sorted, and they are supported by several other std.typecons functions and templates.
  • byTuple (or how you want to call it) needs to be efficient.
  • Perhaps it can be added a pairs() or similar property function similar to AA.values and AA.keys.
  • This proposal should be future-proof, allowing the future change to built-in tuples (instead of Phobos ones) if someday they get accepted into D.

@nordlow
Copy link
Contributor

nordlow commented Dec 16, 2014

What about renaming byPair to byItem? Python names it items(). I like it.

https://stackoverflow.com/questions/10458437/what-is-the-difference-between-dict-items-and-dict-iteritems

BTW: Why would someone ever want to have different orders for byKey and byValue?

@quickfur
Copy link
Member Author

The name itself isn't that important to me, whatever you guys vote on is fine.

What's more pertinent to me is to get this functionality checked into druntime, then we can have a Phobos wrapper to turn it into Tuples. I'm thinking that to justify having a Phobos wrapper, it should be more generic, i.e., it should be able to take any AA-like user-defined types and return a range of Tuple, provided that type has standard methods for iteration.

@schveiguy
Copy link
Member

It's important to have a range of tuples because tuples can be sorted, and they are supported by several other std.typecons functions and templates.

Result of byPair should be sortable. And my answer to the latter part is, it is not on druntime to make sure phobos is implemented properly. Let's add the mechanism to druntime, and worry about phobos inconsistencies later.

The entire phobos range library has delusions of treating char[] as not an array. Why can't we add exceptions for this type?

byTuple (or how you want to call it) needs to be efficient.

Yes, it should be.

Perhaps it can be added a pairs() or similar property function similar to AA.values and AA.keys.

I'm not caring too much about this. keys and values should not be a model to emulate.

This proposal should be future-proof, allowing the future change to built-in tuples (instead of Phobos ones) if someday they get accepted into D.

Having it be a voldemort type definitely allows us to change to whatever we want in the future.

@schveiguy
Copy link
Member

I'm thinking that to justify having a Phobos wrapper, it should be more generic

I think your map solution is far superior to adding yet another one-liner.

@quickfur
Copy link
Member Author

@nordlow I don't think anyone would deliberately make them return different orders; it would probably be something that arises from the implementation. It's quite unlikely, though conceivable, that perhaps one day somebody discovers a superior-performing AA implementation where keys/values are traversed in a non-deterministic order, so they are not guaranteed to be returned in the same order unless you specifically ask for byPair.

@quickfur
Copy link
Member Author

@schveiguy In that case, I should just reopen this pull then?

@schveiguy
Copy link
Member

@quickfur I don't know how @MartinNowak feels. He is the one who nixed it originally. I'm personally not yearning for this so much that I want to override his position, but perhaps he can be persuaded :)

@schveiguy
Copy link
Member

Re: order of iteration, it's not going to ever be traversed differently for keys than values, and it's not going to be better to traverse them in a different order in 2 separate calls. There just isn't any possibility of that happening.

@quickfur
Copy link
Member Author

@schveiguy I'm certainly not expecting it to ever happen, but it can happen, for example, if looking up an item moves it to a more efficient place (a common optimization technique for frequently-referenced objects in a container). So byKey and byValue would return the same order if they are called at the same time, but the order may change if you call them in sequence. Again, I don't expect this to ever happen, but the possibility is not zero, is all I'm saying.

@quickfur quickfur mentioned this pull request Dec 16, 2014
@quickfur
Copy link
Member Author

OK, seems I can't reopen this PR 'cos I deleted the original branch from github (though not from my local repo) and repushed it, so I opened a new PR instead: #1070

@JakobOvrum
Copy link
Member

It should be of an identifiable tuple type, not an is-tuple-like concept, as was discussed before:

There's no way to differentiate a type that is intended to be a tuple from a type that just happens to look like a tuple without checking for specific implementations. The capability to differentiate is essential for algorithms that wish to overload and support both inputs, supporting a wide range of element types.

@MartinNowak
Copy link
Member

It should be of an identifiable tuple type, not an is-tuple-like concept, as was discussed before:

We cannot use std.typecons.Tuple in druntime, if that is what you mean.
But byPair should iterate over tuples identifiable by supporting the following operations.

static assert(tup.length == 2);
static assert(is(typeof(tup[0]) == Key));
static assert(is(typeof(tup[1]) == Value));
static assert(is(typeof(tup[]) == TypeTuple!(Key, Value)));

@schveiguy
Copy link
Member

It should be of an identifiable tuple type, not an is-tuple-like concept, as was discussed before:

The fact remains, you can shoehorn into whatever type you want if you need a specific flavor of tuple. Phobos has plenty of tools to do this.

But byPair should iterate over tuples identifiable by supporting the following operations.

I think these are not hard requirements, but I think it's trivial to support them. I would stipulate that tup.key and tup.value should be properties as well.

@bearophile
Copy link

@schveiguy >The fact remains, you can shoehorn into whatever type you want if you need a specific flavor of tuple. Phobos has plenty of tools to do this.

The less kinds of tuples are used, the better.

@bearophile
Copy link

@schveiguy >Having it be a voldemort type definitely allows us to change to whatever we want in the future.

Are hypothetical future built-in tuples going to have field names like key and value? If the answer is probably negative, then the new proposal doesn't seem future-proof.

@MartinNowak
Copy link
Member

Are hypothetical future built-in tuples going to have field names like key and value? If the answer is probably negative, then the new proposal doesn't seem future-proof.

Right, we shouldn't support access via .key and .value.

@schveiguy
Copy link
Member

Right, we shouldn't support access via .key and .value.

Yes we should.

  1. The idea that at some point we will return just a builtin tuple means no ref access
  2. D has plenty of capability to return something that devolves to something else (e.g. @quickfur's new PR).

@schveiguy
Copy link
Member

The less kinds of tuples are used, the better.

This is not a new kind of tuple. The fact that std.typecons.isTuple relies on the name of the template defined in std.typecons is flawed in so many ways.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
8 participants