Semantics of start / end keywords for sequence operations #843
Comments
I think that all the collection operations are assumed to be implemented via I think this is also the better approach from a software engineering perspective. If we do something special beyond a naive implementation that uses
|
Because of that difference in the way |
I feel like you've presented a lot of strawman arguments. Whether or not As for:
We have to specify the semantics, no matter what they are. And Gwydion already demonstrates that even within a single codebase, without a clear definition of those semantics, different specializations will end up implementing different behavior. That's why I'd like to see a single semantics for how
define sealed generic string-equal?
(string1 :: <string>, string2 :: <string>, #key start1, end1, start2, end2, test)
=> (equal? :: <boolean>);
I'd like to be able to compare a chunk of a string without having to worry at the call site about whether or not I've gone beyond the bounds of one of the strings. That's a pretty reasonable thing; otherwise, I end up with
if (string-equal?(uri, "//", start1: scheme-end, end1: min(uri.size, scheme-end + 2)))
Looking afield more at other languages ... In Apple's Python is pretty lenient. C++
I don't want to try to claim that one side is on the side of "better software engineering". There are valid arguments for both behaviors, and there's no reason for one side to try to claim some moral high ground. It isn't productive. As for |
I would suggest using the above implementation of copy-sequence for string from Gwydion, with the end: and start: keywords bounded by the length of the sequence, as the definition. This would best match programmers' expectations and Common Lisp semantics. An out-of-bounds access situation would never occur. |
I think that they should all signal an error if:
Silently doing the wrong thing is the wrong thing. The fill! situation on stretchy sequences… extending the sequence might be right. |
Returning an empty sequence is the right thing for all 3 'error' cases. Compare to the CL-HyperSpec: http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/glo_b.html#bounding_index_designator
bounding index n. (of a sequence with length n) either of a conceptual pair of integers, istart and iend, respectively called the
bounding index designator (for a sequence) one of two objects that, taken together as an ordered pair, behave as a designator for bounding indices of the sequence; that is, they denote bounding indices of the sequence, and are either: an integer (denoting itself) and nil (denoting the length of the sequence), or two integers (each denoting themselves). |
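As a rough illustration of the designator idea in the glossary entry, here is a minimal Python sketch. The names resolve_bounds and bounds_valid are hypothetical, None stands in for CL's nil, and the validity condition 0 <= start <= end <= length is my reading of the CLHS rather than something stated in the text quoted here. The sketch deliberately does not decide what happens for an invalid pair, since that is the open question in this thread.

```python
def resolve_bounds(length, start=0, end=None):
    """Turn a (start, end) designator pair into two concrete integers.

    Hypothetical helper: None plays the role of nil and denotes the
    length of the sequence; integers denote themselves.
    """
    if end is None:
        end = length
    return start, end


def bounds_valid(length, start=0, end=None):
    """Report whether the resolved pair satisfies 0 <= start <= end <= length.

    Assumption: this condition is taken from my reading of the CLHS,
    not from the thread itself.
    """
    start, end = resolve_bounds(length, start, end)
    return 0 <= start <= end <= length
```

For a three-element sequence, the pair (2, None) resolves to (2, 3) and is valid, while (2, 5) resolves to itself and is not.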
let s = make(<string-stream>, contents: "hello");
write(s, "goodbye", start: 0)
I think we can describe the different semantics in these terms:
I will use Swift range notation to describe usable start and end indexes. I can think of the following semantics (this would be easier if GitHub allowed tables or monospace font while editing):
Conservative: Only indexes that are valid with
Virtual: The file or sequence is assumed to be as big as you need it to be.
Permissive: The system doesn’t complain if you access the file or sequence beyond its end.
Permissive/Virtual: If the file or sequence is stretchy, it is assumed to be as big as you need it to be. If not stretchy, the system doesn’t complain if you access it beyond its end.
Conservative/Virtual: If the file or sequence is stretchy, it is assumed to be as big as you need it to be. If not stretchy, only indexes that are valid with
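To make the difference between two of these semantics concrete for a non-stretchy sequence, here is a small Python sketch. Both function names are hypothetical, and the details are my assumptions: "Conservative" is modeled as raising an error for any pair outside 0 <= start <= end <= size, and "Permissive" is modeled as silently clamping both indexes to the sequence, which is one possible reading of "doesn't complain".

```python
def copy_conservative(seq, start=0, end=None):
    """Conservative model: signal an error for any out-of-range bounds."""
    size = len(seq)
    if end is None:
        end = size
    if not (0 <= start <= end <= size):
        raise IndexError(f"invalid bounds {start}..{end} for size {size}")
    return seq[start:end]


def copy_permissive(seq, start=0, end=None):
    """Permissive model (assumed): clamp the bounds instead of complaining."""
    size = len(seq)
    if end is None:
        end = size
    # Clamp start into 0..size, then clamp end into start..size.
    start = max(0, min(start, size))
    end = max(start, min(end, size))
    return seq[start:end]
```

With these definitions, copying "abc" from 2 to 5 yields "c" under the permissive model and raises IndexError under the conservative one, which mirrors the Gwydion versus Open Dylan split described in the issue.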
Let’s consider some advantages and disadvantages.
Of these, then, the only semantic that does not make assumptions about desirability of pre-populated data or the equivalence of empty and null data is the Conservative semantic. More precisely, it does not prevent the application from making those distinctions. |
@BarAgent, that's a great comment, thanks! I think that you and @swmckay agree that the semantics described by your "Conservative" section are what you would prefer.
@DHovekamp42, I don't see a clear statement in the CLHS that an empty sequence is what should be returned in those situations. It defines a bounded sequence, but it doesn't state what should happen when the bounds are invalid. This is true in the docs for subseq as well by my reading. I have no horse in this race and am not against that result at all. I just want something to be clearly specified and followed everywhere.
An interesting extension to this question is when the bounds should be verified and a condition signaled. This is particularly true in the case of The current implementation does not verify the bounds prior to mutating the list, so you can end up with a situation where the list is mutated and a condition is signaled. |
Thanks to some help from http://www.lispworks.com/documentation/HyperSpec/Issues/iss332_w.htm is about out of bounds indices with While things like
beach indicates that:
The usage of the term should be prepared to signal an error is described here:
|
After rereading the given comments and the http://www.lispworks.com/documentation/HyperSpec/Issues/iss332_w.htm clean-up clarification in CLHS For convenience I still prefer always-working code, i.e. getting an empty sequence as a plausible result from out-of-bounds indexing, but I can confirm that current CL implementations like LispWorks & CCL verify bounds and signal an out-of-bounds condition as an error. Also, it will be easier to wrap some code around an error-signaling standard to get my desired behavior, by catching and handling out-of-bounds conditions accordingly, than the other way around. The case of having fill! on a leaving a modified list AND signaling a condition is worse: if possible, the implementation should handle it in such a way that the mutation is done only after the indices have been verified as valid. |
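The "verify first, then mutate" ordering suggested here can be sketched in Python as follows. The name fill_checked is hypothetical and this is not how any Dylan implementation is written; it only shows that checking both indexes up front leaves the sequence untouched when an error is signaled.

```python
def fill_checked(seq, value, start=0, end=None):
    """Fill seq[start:end] with value, validating bounds before any write."""
    size = len(seq)
    if end is None:
        end = size
    # All validation happens before the first element is modified.
    if not (0 <= start <= end <= size):
        raise IndexError(f"invalid bounds {start}..{end} for size {size}")
    for i in range(start, end):
        seq[i] = value
    return seq
```

Calling fill_checked on a three-element list with end 5 raises IndexError and leaves all three elements as they were, so a caller never sees a half-filled sequence together with an error.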
By the way, code that runs without signaling an error is not the same What you want isn't really very sound engineering practice. I've learned --S On Tue, Jan 13, 2015 at 9:22 AM, Dieter Hovekamp notifications@github.com
|
Thanks @swmckay for elaborating, and agreed: better to have errors signaled than unpredictable results! My statement that an empty sequence is the right thing to return for the index-out-of-bounds 'error' should be seen as a conceptual semantic expectation of implicit lower and upper bounding pairs of start and end into a sequence, as written up. Lifting this assumption puts the burden back on the user at each call, as seen in the example So trading in readability and potentially expensive double checking on start & end limits in user & library code for making the programmer's assertions explicit. (The implicit semantic is working for me, but I may have just been lucky that I wasn't screwed up yet and read my simpler code with pleasure years later. It won't harm if I now have to write down my expectations in a kind of API to some library calls and/or make my "favors" explicit for others.) |
For the record, I'm very sympathetic to this kind of example. So your example would look like this: I just think the language itself should be more strict where possible, --S On Tue, Jan 13, 2015 at 3:45 PM, Dieter Hovekamp notifications@github.com
|
For the record, starts-with?: https://github.com/dylan-lang/strings/blob/master/strings.dylan#L426 For what it's worth, although I've been spoiled by the laxness of Python string operations over the past years, I'm totally on board with the conservative approach. The way I see it we can always provide a more lax implementation as part of a scripting library or similar. |
In the DRM, there are operations like copy-sequence and fill! which are documented to take start and end keyword arguments.
In other libraries, there are things like string-equal? which do the same.
None of these appear to be clearly specified as to what should happen when end is greater than the size of the sequence.
It ends up that this is also an area where the implementations in Open Dylan and Gwydion Dylan weren't entirely consistent for the DRM-provided operations.
The DRM says (for copy-sequence):
This is in contrast to other operations which are documented to signal errors on out-of-bounds accesses or other conditions.
Creating a copy-sequence("abc", 2, 5) could be an entirely valid reading of the DRM ... and in fact, it is implemented that way in the specialization of copy-sequence for <string> in Gwydion:
That said, it appears that other copy-sequence methods are not implemented in quite the same way in Gwydion Dylan. In Open Dylan, there isn't any ambiguity: they're all implemented to result in failures.
For someone familiar with C and other libraries, this wasn't expected behavior for me. From strncmp:
And from strncpy:
There are a few different possible things that copy-sequence and other operations could do in this situation, but I'm not sure that just resulting in an error is the nicest, most convenient, or most expected of them.
and other operations could do in this situation, but I'm not sure that just resulting in an error is the nicest, most convenient, or most expected of them.The text was updated successfully, but these errors were encountered: