-
Notifications
You must be signed in to change notification settings - Fork 418
-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I/O module: replace ioLiteral and ioNewline #19487
Comments
Would you be willing to put together an example showing these proposed features in action? I'm finding it too abstract to reason about very well. |
Sure. Suppose you are working with a channel configured with a JSON encoder but you want to read some stuff that is not JSON. E.g. your input file is this:
This is not in JSON format on its own, but You could write the above file like this var output = ... // create a channel with JSON encoding
var A:[1..3] int = 1..3;
output.writeLiteral("arraydata"); // writes literally arraydata, ignoring Encoder
output.writeNewline(); // writes the newline, ignoring Encoder
output.write(A); // writes [1,2,3] Note in the above, since the channel is configured for JSON encoding, simply writing output.write("arraydata"); would produce As a result, writeLiteral / writeNewline will be useful when implementing the JSON encoder itself. As we have talked about in other issues, I do expect to be able to choose a different Encoder/Decoder for specific write calls. However I'd like to have these more purpose-built functions available. For one thing, that way, when writing the JSON Encoder, it doesn't have to use the default encoder internally. Now a reading example: var input = ... // create a channel with JSON decoding
var A:[1..3] int;
if input.matchLiteral("arraydata") &&
input.matchNewline() {
// this block runs if the next data in input is literally arraydata followed by newline
input.read(A); // reads [1,2,3]
} |
Great, thanks. I'm a fan of the general approach, though I'd prefer to replace |
Re readLiteral vs matchLiteral, actually there are two reasons we might not want readLiteral:
I think I'd rather have matchLiteral and have it return false if the literal wasn't found (instead of having readLiteral and having it return false only on EOF and throw if the literal wasn't found). |
Can it return the literal it just read? Which is redundant, but consistent.
My 2 cents: for this method in question it sounds OK to me if it throws a FormatError if the literal doesn't match rather than returning false. But I am not familiar with the use cases of these methods to argue strongly about it. |
Since it's reading a literal, we already know what it will read. So the only useful thing it can do is to tell you if it was able to read the literal or not. Hence my preference for it to return |
In terms of an example that would use |
This may be tangential enough for its own issue, and may not be relevant for stabilization, but: Would it make sense to have a kind of The ability to write Or the ability to write |
Regarding So, IMO anyway, it's pretty clear what the signatures of these methods should be. What I think needs more discussion is which names we would like: Option 1: matchLiteral / matchNewline 👍proc reader.matchLiteral(lit: string): bool throws;
proc reader.matchLiteral(lit: bytes): bool throws;
proc reader.matchNewline(): bool throws; Option 2: readLiteral / readNewline 🎉proc reader.readLiteral(lit: string): bool throws;
proc reader.readLiteral(lit: bytes): bool throws;
proc reader.readNewline(): bool throws; Option 3: consumeLiteral / consumeNewline 🚀proc reader.consumeLiteral(lit: string): bool throws;
proc reader.consumeLiteral(lit: bytes): bool throws;
proc reader.consumeNewline(): bool throws; |
Ben, I take it your concept is motivated by wanting to process/consume a value of a given type but not spend the memory on it (?). Like maybe I'd want to do For me, this seems like the sort of thing we might keep in mind but put off until we / users have a motivating use-case for it (since adding it later could be a non-breaking change?). |
Yeah, that's the gist of it. I agree that we should put it off. I prefer Option 1 (match*) of the three. |
I updated #19487 (comment) to give each option an emoji so it is easy for people to indicate your preference. I turned the 👍 you already put on it into a vote for Option 1. |
I'm working on fixing up my IO operator deprecation PR, and finding that the pair of Also, I just want to confirm: do you still see |
Now the voting is very close. Perhaps the next step is to ask for opinions from more people.
Yes, I think that's important, I had just left it out of the name discussion. It was in the original post:
Today my reaction is that it should be |
I'm fine with I also noticed that ioNewline has a field |
No, I want |
Reduce IO operator usage with ioLiteral and ioNewline This PR replaces a number of uses of the IO operator (``<~>``) with calls to ``channel._readLiteral`` and ``channel._writeLiteral``, and their newline counterparts. These methods are no-doc'd draft versions of those proposed in #19487 and their names are subject to change. The relevant changes are generally fall under the following patterns: - <~> becoming ``f._readLiteral`` - <~> becoming ``f._writeLiteral`` - <~> becoming a wrapper for a nested ``rwLiteral`` function that handles both reads and writes - Replacing some uses of ``readIt`` with ``f._readLiteral`` Some convenient cleanup tasks were done at this time: - Removed an old, unused, untested branch of ``chpl_serialReadWriteRectangularHelper`` - Simplified the literal constants in ``_tuple._readWriteHelper`` - Some ``var``s becoming ``const`` - bug fix for default readThis implementation for unions [reviewed-by @lydia-duncan]
Should use IO;
proc main() {
var f = openmem();
{
var w = f.writer();
for i in 1..100 do w.write("\n");
w.write(5);
}
{
var r = f.reader();
var ionl = new ioNewline();
r.read(ionl);
var x : int;
r.read(x);
writeln(x); // read x as '5'
}
} Though this could certainly be a bug. I had overlooked the bool-returning functionality here, and I'm less certain it's the right choice. While returning // reading the end of a list bounded by square brackets...
if !r.readLiteral("]") then throw <something>; // probably a FormatError? I would find this more annoying than using try-catch to speculatively test for the presence of a literal. Could we address this by having both I think this question extends beyond just literals though, so maybe we're asking "How should users speculatively read from a channel?". |
Offline we discussed the naming of these methods, and |
This argument doesn't carry a lot of weight for me—specifically, I think this sort of thing implies to lots of routines (doesn't it?), and would prefer that we address the concern by implementing the proposed warning for ignored return values rather than designing in a way that assumes users won't. Specifically (and I've lost track now), wouldn't the same concern potentially apply to the I could also imagine cases where the lack of a match wouldn't indicate an error so much as the end of a section, such as the last thing in a CSV file (s.t. being able to look at a boolean might be convenient and my reaction wouldn't be to throw but to terminate a
I think that's a good question (and potentially relates to my varargs read example above). |
Yes, and I believe I've expressed concern about that case elsewhere as well. If I expect to read something, and cannot, then that feels more like an error to be thrown than a true/false result. If I have to check a boolean result here then I need to forward some kind of error anyways in the instances where I expected something. Writing this is kind of annoying even if I get a warning from the compiler about an unused result. // annoying
if !f.readLiteral("[") then
throw new FormatError(...);
do {
push_back(f.read(int));
} while f.readLiteral(", "); // nice
// annoying
if !.freadLiteral("]") then
throw new FormatError(...); If we don't return a boolean, then this pattern might look like: f.readLiteral("[");
var done = false;
while !done {
push_back(f.read(int));
try { f.readLiteral(", "); }
catch { done = false; }
}
f.readLiteral("]"); I guess the utility of using the boolean in a while-loop isn't valuable enough to me in comparison with the extra checking in non-speculative cases, and the possibility of ignoring errors. |
My expectation is that in that case the error would be caught - error handling doesn't mean the behavior is bad in all scenarios, just that it can be and we're providing information about what happened. By using error handling, we give the user the ability to respond to the behavior - I think people would object if this became a halt for this very reason |
To clarify, I wasn't suggesting this routine should halt (nor even that we remove the // reading the end of a list bounded by square brackets...
if !r.readLiteral("]") then throw <something>; // probably a FormatError? by imagining the ease of reading a CSV of ints by writing: do {
const v = r.read(int);
} while (r.readLiteral(","); where I wouldn't be interested in throwing my own error if the read failed, because it feels unnecessary and like a more roundabout / non-productive way of writing the pattern. Ben's example of reading a CSV within square brackets is compelling to me for the square brackets, though being forced to write the while-loop as in his "always throws" version makes me sad (again, it feels very roundabout and anti-productive). I asked about the "both read and match" proposal above in today's meeting, and re-read the description above, but still am not confident I've got the right understanding of it. Is the concept that the read*() versions would throw on mismatches and return nothing while the match*() versions would return a bool (and only throw on other file systems-y errors? Such that one could write my CSV of ints case as: do {
const v = r.read(int);
} while (r.matchLiteral(","); and Ben's case with squre brackets as: f.readLiteral("[");
do {
push_back(f.read(int));
} while f.matchLiteral(", ");
f.readLiteral("]"); or am I off on the wrong path? If I've got it right, that seems like a pretty nice "cake and eat it too" solution (though I don't have an obvious mnemonic for remembering which is which... though I'm also not sure I would let that stand in the way). [Ben, can you remind me whether your AoC codes are available somewhere? I ask because I used the bool-returning forms heavily in those codes, and every time I had to use a throw...catch form, I bristled at the additional typing given that the nature of the codes were quick-and-dirty / scripty]. |
You've got it right. My suggestion was something like this: // always throws errors upwards
proc channel.readLiteral(lit:string, ignoreWhitespace=true) : void throws
// returns false if the literal was not present
// I think this means FormatErrors and EOF errors, not just any old error
proc channel.matchLiteral(lit:string, ignoreWhitespace=true) : bool throws And then we could use those in the manner that you demonstrate in your second example. An alternative approach would be to try and think of ways to generally handle speculative reads, and turning errors into bools. We could approach this from the language level with some hypothetical new try-based expression. Another idea I've had is to create a kind of "matching-view" over a reader channel, which supports the same methods and signatures with the difference of returning a boolean "false" if there was an error caught.
They aren't, and I'd be interested in going back over them having spent more time thinking about IO stuff recently. I agree that the try-catch stuff isn't good for quick/scripty/fun programs, but I suppose that I haven't been placing much importance on that use-case, personally. |
I'm a fan, then. The only reservations I have are (a) where do we draw the line? when do we not create two versions of a routine? what makes this case warrant it more than others? and (b) what are the implications for other status-returning I/O routines.
I think that continues to remain an important case for us to keep in mind: That we should continue trying to support both quick, scripty-style codes (for cases that don't need more; to attract new users) and also more bullet-proof approaches (for when the sketch needs to start becoming production code).
I thought about this too, remembering your previous sketches around such features, and continue to be intrigued by them, particularly since they'd help with the "when do we create two versions?" question above. But I'm also not sure I'd want to wait for that feature or rush it in for this case's sake. And even with the feature, I think there'd be extra typing/cruft that would feel not as nice as the above. But popping back to the top, I agree that making sure we have a unified/coherent story for other speculative reads (or other status-returning reads? are those necessarily the same thing?) feels important before taking the step of adding both overloads.
That's intriguing as well. Do you have a sketch of what it would look like? |
In the while-bool case I was thinking of some kind of method on a channel that created a matching-view, so from the user's perspective: // 'r' is a 'reader'
// the name 'matcher' is just a dummy example
while r.matcher().readLiteral(",") do ...
// or ...
manage r.matcher() as m { ... } The development cost of maintaining this thing could be significant, as forwarding doesn't allow for us to generically intercept errors or returned values from forwarded methods. I'm not sure how we could (easily) make sure that our wrapper methods stayed in sync with the actual channel methods. |
Well, one issue is whether the channel position changes or not. Other than saying that a user can use mark/revert if they don't want it to change - I will ignore that for now. Other than that, it seems that if we have a proc checkForComma(ch) : bool {
try {
ch.readLiteral(",");
return true;
} catch FormatError {
return false;
}
} This does not bother me. I don't think all the patterns have to be short. (I do think it is reasonable to think about language features that can do this kind of thing; I don't remember if we have issues about them.) Anyway, even if we have some language feature to write the more general case, I think the situation coming up when reading literals is going to be frequent enough to justify both readLiteral and matchLiteral so having both sounds great to me. |
OK, I think we're converging on supporting a "read" form and a "match" form. In that case, I'm going to propose the following unstable methods for the 1.28 release: // All of these (read/write/match) will be marked unstable for 1.28
proc channel.writeLiteral(lit:string) : void throws
proc channel.writeLiteral(lit:bytes) : void throws
proc channel.writeNewline() : void throws
// Returns nothing, throws all errors
proc channel.readLiteral(lit:string, ignoreWhitespace=true) : void throws
proc channel.readLiteral(lit:bytes, ignoreWhitespace=true) : void throws
proc channel.readNewline() : void throws
// Returns "true" if the literal was found, in which case the channel advances
// Returns "false" on any FormatError or EOFError encountered, in which case the channel does *not* advance
proc channel.matchLiteral(lit:string, ignoreWhitespace=true) : bool throws
proc channel.matchLiteral(lit:bytes, ignoreWhitespace=true) : bool throws
proc channel.matchNewline() : bool throws Some open questions:
I should also point out that |
I would think it would only deal with the first and that one would have to put it into a while loop to get more than that. What would be the rationale for doing otherwise? Oh, is it the comment above about
I definitely don't have a mnemonic for keeping track of which is which, but am also not sure that it's a big enough deal to worry about (in the "I can read the manual to remind myself sense"). One other option which I'm not expecting to be greeted with enthusiasm would be to just have |
I don't know why it is that way today. The only thing I can guess is that it is related to the "ignore whitespace" rule for ioLiteral and maybe it thinks of these earlier newlines as whitespace. I don't have any particular preference for what it should do, though.
I'm happy with the names "match" vs "read" here. To me, "match" is something I want to try doing in a conditional, but "read" is something I want to do unconditionally, so it matches my thinking. Of course we could have
I wouldn't be upset by this strategy in this case. However I like |
For what it's worth, I don't think the write variants need to be marked unstable, I think it was pretty clear what we'd do with them |
I wanted to follow-up on this briefly. Reading a newline was not in fact skipping over multiple newlines. It was the following call to var x : int;
r.read(x); Sorry for the false alarm! |
Introduce unstable read/match/write methods for literals and newlines This PR introduces new methods on channels to support reading and writing of literal strings, bytes, and newlines. The new methods, as discussed in #19487: ```chpl proc channel.readLiteral(literal:string, ignoreWhitespace=true) : void throws proc channel.readLiteral(literal:bytes, ignoreWhitespace=true) : void throws proc channel.readNewline() : void throws // returns 'false' if literal could not be matched, or on EOF proc channel.matchLiteral(literal:string, ignoreWhitespace=true) : bool throws proc channel.matchLiteral(literal:bytes, ignoreWhitespace=true) : bool throws proc channel.matchNewline() : bool throws proc channel.writeLiteral(literal:string) : void throws proc channel.writeLiteral(literal:bytes) : void throws proc channel.writeNewline() : void throws ``` These methods will be marked as unstable for the time being until we can conclude a couple of questions: - What if a literal contains leading whitespace, and ``ignoreWhitespace`` is ``true`` (for read/match methods)? - Should the read/match newline methods contain an ``ignoreWhitespace=true`` argument? [reviewed-by @mppf]
There were a couple of open questions I raised when I first implemented this functionality. I have a couple of proposed answers to these questions that would hopefully allow for us to move forward with what we have already implemented.
Currently the behavior of We have a couple of options here: A) Regard the current behavior as a bug, and update the implementation to never ignore leading whitespace in the desired string literal. B) Regard the current behavior as a "feature", where I am currently leaning towards option A. I don't think that either of these should stand in the way of stabilizing these methods.
The current behavior for |
@benharsh : With respect to the readLiteral()/matchLiteral() behavior w.r.t. whitespace: During AoC 2022 (specifically the monkey exercise), I struggled a lot with when a What I'm less certain of is what |
Sorry @bradcray, I was just reviewing this issue and realized I didn't reply to your comment.
I'm not exactly sure whether you're saying the readf behavior here is desirable or not. Either way, the current behavior for use IO;
proc main() {
var f = openMemFile();
{
f.writer().write("Hello\nWorld\n");
}
// readf
{
try {
var r = f.reader();
r.readf("Hello");
r.readf("World"); // will throw a bad format error
writeln("no readf errors");
} catch e {
writeln(e.message());
}
}
writeln("-----");
// matchLiteral
{
try {
var r = f.reader();
writeln(r.matchLiteral("Hello"));
writeln(r.matchLiteral("World")); // true
} catch e {
writeln(e.message());
}
}
writeln("-----");
// matchLiteral: ignoring whitespace
{
try {
var r = f.reader();
writeln(r.matchLiteral("Hello", ignoreWhitespace=false));
writeln(r.matchLiteral("World", ignoreWhitespace=false)); // false
} catch e {
writeln(e.message());
}
}
}
To me, the answer here depends on the
If But perhaps the real question is what the default should be for |
Summary for the ad-hoc sub-team discussion next week: 1. What is the API being presented?We're intending to replace ioLiteral and ioNewline with: proc fileReader.readLiteral(literal:string, ignoreWhitespace=true) : void throws
proc fileReader.readLiteral(literal:bytes, ignoreWhitespace=true) : void throws
proc fileReader.readNewline() : void throws
// returns 'false' if literal could not be matched, or on EOF
proc fileReader.matchLiteral(literal:string, ignoreWhitespace=true) : bool throws
proc fileReader.matchLiteral(literal:bytes, ignoreWhitespace=true) : bool throws
proc fileReader.matchNewline() : bool throws
proc fileWriter.writeLiteral(literal:string) : void throws
proc fileWriter.writeLiteral(literal:bytes) : void throws
proc fileWriter.writeNewline() : void throws These methods ignore the serializer/deserializer. Both string and bytes are accepted. They are intended to use when the contents are expected to be consistent, rather than varying based on the value of a field or variable. For instance, if a file should always start with a specific header, that header could be read or written using these methods. This also useful when using serializers/deserializers where strings are otherwise expected to have surrounding quotation marks. We still have a couple of questions we want to resolve: The current behavior today is to ignore leading whitespace when matching, meaning that a given string like B. Should The current behavior today is consistent with having that option and setting it to How is it being used in Arkouda and CHAMPS?Arkouda has 12 uses in src/compat/*/ArkoudaTimeCompat.chpl that it copied from the Time module to maintain compatibility between versions. There are no uses outside of the compatibility module. CHAMPS has 6 uses of ioNewline in its vec3 file's writeThis routines. 2. What's the history of the feature, if it already exists?ioLiteral and ioNewline are records that were added with the rest of the QIO support 12 years ago. They were intended especially to enable code to be written that handled both reading and writing, but we've seen little evidence that supporting both in the same method is really useful, hence splitting them into reading and writing methods. The record aspect of them in particular has seemed unnecessary and also can be problematic when resolving other problems (see related issues below) 3. What's the precedent in other languages, if they support it?a. In Python, you can pass 4. Are there known Github issues with the feature?
5. Are there features it is related to? What impact would changing or adding this feature have on those other features?
6. How do you propose to solve the problem?For question A (if a literal contains leading whitespace and A2. Regard the current behavior as a "feature", where Ben favors (and I agree with) option A1. For question B (Should B2. Add the option now B3. Add the option now and set the default to false, where the methods would return true only if the newline was the first whitespace character encountered from where the fileReader currently is in the file. Ben favors (and I agree with) option B1 |
Discussion summary:
Discussion notes:
|
Per the discussion summarized here: #19487 (comment), update `readLiteral` and `matchLiteral` to respect leading whitespace in the literal string when `ignoreWhitespace=true`. - [x] local paratest - [x] gasnet paratest [ reviewed by @benharsh ] - Thanks!
Deprecate: - `ioLiteral` - `ioNewline` - `readWriteLiteral` - `readWriteNewline` In favor of: - `readLitearal` - `matchLiteral` - `readNewline` - `matchNewline` - `writeLiteral` - `writeNewline` This involves removing unstable warnings from the above replacement symbols, as their respective design issues have now been resolved. Resolves: #19487, #20402 - [x] local paratest - [x] gasnet paratest - [x] inspect docs [ reviewed by @benharsh ] - Thanks!
Today we have records ‘ioLiteral’ and ‘ioNewline’ to differentiate from regular string reads/writes:
This situation continues with our current Encoder/Decoder plans [18504]
However, ‘ioLiteral’ and ‘ioNewline’ as separate types seems unnecessarily confusing
Proposal:
‘matchLiteral’ will accept a string and an optional ignoreWhiteSpace=true argument
The text was updated successfully, but these errors were encountered: