-
-
Notifications
You must be signed in to change notification settings - Fork 379
Dynamic arrays: Fix some doc errors, and introduce the term 'array reference' to reduce confusion. #623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I don't like 'slice' for this but I like the rest of these changes. - Where it says 'dynamic array' say 'slice' generally. - Fix some example code which didn't compile (especially around pointers). - Add code cases around "s = <foo>" where s is a static array. - Reorder - introduce slices early, 'slicing' much later. Move "Array Declarations" later still. Discussion thread: http://forum.dlang.org/post/loekwubzykmysrdzwpak@forum.dlang.org
This works better. Note the one case where "rvalue array reference" applies.
- Fix an error in expression.dd - the definition of identity for array references. - Use 'array reference' term instead of 'dynamic array' in most cases (but for a few, 'dynamic array' was more appropriate).
I'm completely against the terminology changes. |
fyi Jonathan's position (which he's summarized above, thanks Jonathan!) was already clear in the discussion thread. I submitted a PR anyway because he seems to be in the minority - most other people find the current terminology confusing and misleading - some extremely so. In rebuttal (copying from the discussion thread):
|
P.S. I'm submitting this PR for 2 possible reasons:
|
I think you've been to eager with replacing the terms. For example, "The variadic part is converted to a dynamic array" is IMO more clear than using "array reference" here. I'm also not happy with the word "reference". It has come up several times on D.learn that people were confused by the hybrid reference/value semantics. When people hear "reference", they expect that changes to the length are visible in all aliasing references, and they don't expect the current behaviour WRT to reallocation on appending. |
Yah, it doesn't quite look "array reference" is a winner. For one thing it would be quite overloading the notion of "reference", which in D already stands for "class object reference" and "ref parameter to function". Practical suggestion: separate this into two pull requests, one that fixes obvious mistakes in documentation and one that's more debatable. |
I like "array reference" if and only if we think the word "reference" is the same concept as is used in "class object reference". (Note I said "concept", obviously it's not the same implementation, because it refers to something that needs its length tracked). Compare with C arrays - a C array acts like an rvalue pointer, but we call that "an array". The fact that it's only a pointer is the root of its unsafety. D effectively adds a second field to this thing, but also (in the case of T[]) makes it an lvalue. The latter change is what makes it problematic to continue calling it "an array" - because the 1-to-1 mapping that C has is not present. So we need a name for the separate thing, and my thinking is that - just like with D objects - we have here a hidden repointable thing; in this case it's not just a pointer, but conceptually it seems the same. i.e. if you think of "reference" in terms of its hidden implementation, you'd disagree; if you think of "reference" as a first-class concept (which I admit I always found a bit weird, but here we are), then you'd probably agree. [Marc] "When people hear "reference", they expect that changes to the length are visible in all aliasing references" I had the same expectation and the term "reference" isn't what misled me there. People need to realize that in D a reference to an array (whatever we choose to call it) includes the length. This length is not a property of the underlying array. This is highly confusing and different from every other dynamic array I have used. But it is the truth in D, and it makes a certain amount of sense. If we can make it make sense, that is. So IMO we should stop claiming to fully support "resizing of dynamic arrays". I mentioned the example that claims to demonstrate this, yet only shows growing (not shrinking) and only a single reference (so hides the complexity). What D supports is semantics more akin to copy-on-write, which avoids copying so long as existing references to the same array are unaffected. [Marc] "and they don't expect the current behaviour WRT to reallocation on appending." Neither did I, and by my estimation almost nobody does. Calling the reference "a dynamic array" makes this hugely worse. |
@AndrewGodfrey The problem is that you're thinking of the underlying druntime buffer as being the array. It's not. It's just the memory that's currently backing the array. D's arrays act essentially the same as C++'s std::vector does save for slicing. Reducing its length will reduce the number of objects in the array, and it won't reallocate. Appending or increasing the length directly will avoid allocation if the buffer behind the array or std::vector has more space available. If it doesn't it reallocates, and you get a new buffer backing the array or std::vector. All of that behavior is consistent with the general concept of dynamic arrays work, making it so that D's dynamic arrays are as much dynamic arrays as C++ std::vector is. Where the differences between D's dynamic arrays and C++ std::vector come in is slices. Slices allow the dynamic arrays to share the buffers that back them until they have to reallocate. It's nothing like COW, because as long as the elements aren't The fact that dynamic arrays can affect each other if they're a slice of the same buffer can be a bit confusing for those who basically are expecting std::vector, because std::vector can't do slicing, and understanding exactly which operations can reallocate can affect when two dynamic arrays remain slices of each other, so understanding it can be important, but it's the same rules as std::vector except for the fact that another array / vector can affect whether the current one has any remaining capacity or not, because they're sharing memory. Given reactions like yours, clearly, the documentation can and should be improved, but I think that the terminology that it uses is correct and that you're only harming your understanding of it (and the understanding of others) by trying to insist that the buffer that backs a dynamic array is the dynamic array instead of |
First, thank you for engaging so thoroughly. This may be frustrating but it's important.
No, this is not the problem. I am not doing this out of confusion. IMO this is a difference of opinion about (a) whether T[] is a complete enough abstraction to warrant calling it "an array", or perhaps (b) a difference of opinion about whether that matters. My position is that the semantics of T[] are far from that of an array and so conflating the two does a lot of harm. Here's why it matters (i.e. (b)): The current state is a huge red flag to someone evaluating D. Many developers will not appreciate D's other strengths as much as I do, and so for them I believe this is a huge blocker in the adoption of D. Dynamic arrays are such a basic requirement, that getting them wrong or appearing to is a major liability. Returning to why T[] is not complete enough (i.e. (a)) - I've mentioned already T[] as lvalue (compared to std::vector whose operator= does not repoint - it copies the values, avoiding reallocation when possible; and compared to D's own static arrays which also do not repoint given the same syntax). Responding to other points above:
This is either trivially true because you define "slicing" to cover all the semantics differences between the two, or not true because you're glossing over some differences I've pointed out earlier. I'm not sure which is the case.
True but incomplete: std::vector doesn't reallocate, but it also doesn't modify its capacity. i.e. I understand that this is a conscious design choice brought about by the slicing semantics. Nevertheless, it means that T[] does not behave like a regular dynamic array.
I'm less bothered by this part but it is also a little incomplete. T[] and std::vector share the abstraction-breaking behavior that if reallocation happens, existing pointers to elements are invalid.
|
As mentioned in the forum, the whole terminology debate is bonk IMO. We are pretending that dynamic arrays, reference arrays or whatever we want to call them are different than slice when they aren't. |
@deadalnix: Here is an example:
The second line takes a slice and then uses array-set on the slice.
|
d IS a slice. d[] is not a slice, this is a bizarro construct that would scare any PL person that implicitly cast to a slice, but has an overloaded behavior when assigned to. If you want to name something, name that. |
@deadalnix Okay let's say d is a slice. So you can't array-set to a slice. |
Seems like it's not going to fly. Ok to close? |
Discussion thread: http://forum.dlang.org/post/loekwubzykmysrdzwpak@forum.dlang.org
Changes:
Rationale:
reallocation occurs. This behavior seems reasonable, but it is not expected behavior
for a dynamic array.
that these types often refer to. Those act like arrays and are somewhat dynamic.
incorrect, but I'd misread. Changing it to clarify.)
In arrays.dd:
seems to think this is meaningless, but I found otherwise through experimentation.
Notes (things I didn't do):
an exception to what it says there.
the need for it is eliminated but perhaps not.
I feel overstates it. The example given only shows growing, not shrinking, and
it shows only a single array reference giving the impression of a one-to-one relationship
where there is actually in general a many-to-one relationship.
in the .dd file. I noticed one case in overview.dd (but for that one, I prefer to leave it saying
"dynamic array").
I'm inclined to think we don't need the "static" qualifier anymore, so
maybe these should now read "array or array reference".