Skip to content

Conversation

@non-Jedi
Copy link
Contributor

@non-Jedi non-Jedi commented Jun 5, 2020

Closes #36132.

skipuntil illustrates the purpose of this function much more clearly
than skipchars and improves discoverability.

My only concern with this change is that the skipuntil name now
mirrors the readuntil name without providing a similar
interface:

  1. readuntil only takes a Char/Int/String and finds
    the first occurrence of that rather than having the full flexibility
    of taking a predicate.
  2. readuntil has a signature of readuntil(io, searchedfor) rather
    than readuntil(searchedfor, io).
  3. readuntil is inclusive rather than exclusive. It includes the
    object being searched for and leaves the stream position after that
    object whereas skipuntil leaves the stream position before.

Point 2 means that point 1 can't be easily addressed by adding a
predicate-accepting version of readuntil (unless we're willing to
forgo do-block notation). Should I add a keyword-argument to
skipchars to toggle it between inclusive/exclusive to resolve point
3?

This was originally part of #36158 and is broken out now since the change is separate. If both are to be merged, I'd like to update this PR to make use of the function introduced in #36158.

skipuntil is similar to skipchars but advances a stream until
predicate returns true rather than until predicate returns false. This
should be more discoverable and improve clarity.
Closes JuliaLang#36132

The name skipchars does not clearly communicate the functionality
available.

Tests for skipchars were left in place to ensure the @deprecate macro
correctly converted those calls to using skipuntil. Although CI only
runs tests with --depwarn=error, I have confirmed these tests pass
successfully with --depwarn=yes/no locally.
@StefanKarpinski
Copy link
Member

@JeffBezanson, do you have any thoughts on API coherence here?

@non-Jedi
Copy link
Contributor Author

Polite bump to request any feedback on whether this is worthwhile to deal with before 2.0 given the issues described in OP.

@StefanKarpinski StefanKarpinski added the triage This should be discussed on a triage call label Jul 17, 2020
@StefanKarpinski
Copy link
Member

Marking for discussion on the next triage call.

@JeffBezanson
Copy link
Member

I think it's helpful to separate readuntil and skipchars --- I don't know how intentional it was, but the different names are good because the functions are used in very different ways. skipchars is used e.g. to skip whitespace, in which case you want the next read to give you the first non-whitespace character. But readuntil is used e.g. to read delimited fields, where you usually want to throw out the delimiter. So I'm fine with skipchars as-is. I find it very intuitive for the classic example of skipping whitespace.

@JeffBezanson
Copy link
Member

We also now have readeach which can be composed with Iterators.dropwhile.

@StefanKarpinski
Copy link
Member

Some notes from triage discussion:

  • Leaving the name as skipchars isn't appropriate since this change generalizes to not just skipping chars;
  • The name skipuntil gives me pause because it suggests that this should be like readuntil except that it skips data instead of reading it into a string.

@JeffBezanson feels that this function is fundamentally different from readuntil in that they want different behaviors with respect to whether the IO cursor is left before or after the sought value:

  • for readuntil you're reading a delimited field and you typically want to discard the delimiter and continue reading after it, which is exactly what readuntil defaults to;
  • for skipchars, let's say you're trying to skip whitespace, then you want the cursor after the last character that was isspace.

Note that this change doesn't generalize skip to other types, so changing the name from skipchars isn't actually necessitated by this change. If we were to also generalize this by adding a type argument, then the skipchars name would become inappropriate. In that case this could be made a method of skip which currently only has skip(io, count) methods. Since the natural sense of the predicate for skipping seems to be the opposite of what it is for readuntil, this could be called skipwhile instead. But again, that name change doesn't seem necessary unless the function is generalized to skip other things besides characters.

@non-Jedi
Copy link
Contributor Author

I don't understand some of these comments and would appreciate any help resolving my confusion.

  • Leaving the name as skipchars isn't appropriate since this change generalizes to not just skipping chars;

This pull-request does not generalize skipchars. All it does is reverse the meaning of the predicate passed to skipchars.

@JeffBezanson feels that this function is fundamentally different from readuntil in that they want different behaviors with respect to whether the IO cursor is left before or after the sought value:

  • for readuntil you're reading a delimited field and you typically want to discard the delimiter and continue reading after it, which is exactly what readuntil defaults to;
  • for skipchars, let's say you're trying to skip whitespace, then you want the cursor after the last character that was isspace.

I agree that both behaviors are desirable, but I don't think having separate functions for them makes for a desirable API. I think ideally both the skip... and the read... variants should accept a greedy keyword arg that controls whether the cursor ends before or after the first character not satisfying the condition. Ideally calling skipchars should be more efficient than calling readuntil because of not needing to allocate strings, but this is not currently the case in practice because of how IO is implemented.

Note that this change doesn't generalize skip to other types, so changing the name from skipchars isn't actually necessitated by this change.

The change as currently proposed is merely for api consistency and discover-ability without changing functionality at all. The fact that skipchars (name indicates to me that it should take as its arguments an IO and a list of chars to skip) takes a predicate while readuntil takes a delimiter feels particularly off.

If we were to also generalize this by adding a type argument, then the skipchars name would become inappropriate. In that case this could be made a method of skip which currently only has skip(io, count) methods. Since the natural sense of the predicate for skipping seems to be the opposite of what it is for readuntil, this could be called skipwhile instead. But again, that name change doesn't seem necessary unless the function is generalized to skip other things besides characters.

So is the overall conclusion here that this change isn't worthwhile for the mere sake of discoverability? Adding support for other types than Char does seem worthwhile and would be largely trivial. And calling the resulting function skipwhile (more clear/discoverable name than skipchars in my opinion) neatly sidesteps the issue of api consistency with readuntil. Maybe that's the best way forward?

On the alternative of making it an additional method of skip, I don't think that's a good idea unless we also make readuntil an additional method of read. It seems like there should be symmetry between the two.

@StefanKarpinski
Copy link
Member

I don't understand some of these comments and would appreciate any help resolving my confusion.

  • Leaving the name as skipchars isn't appropriate since this change generalizes to not just skipping chars;

This pull-request does not generalize skipchars. All it does is reverse the meaning of the predicate passed to skipchars.

I was mistaken when I wrote that. You're right, of course, that the PR doesn't generalize to non-Char iteration.

@brenhinkeller brenhinkeller added breaking This change will break code and removed triage This should be discussed on a triage call labels Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking This change will break code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

seekuntil function to mirror readuntil

4 participants