fix issue 14861 - Error in stdio.d in LockingTextReader.readFront() #3622
immutable c = chars[$ - 1 - i];
enforce(ungetc(c, cast(FILE*) _f._p.handle) == c);
}
_f.seek(start); // rewind
This is horrible. Do we ungetc the file only to maintain range abstraction?? I think it's apparent that C's I/O doesn't map well to codepoint ranges so one more point for native i/o package.
CC @schveiguy - any news btw ? ;)
Calling ungetc more than once is not guaranteed to work. Replacing the ungetc calls with ftell/fseek.
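For illustration, here is a minimal C sketch of peeking several bytes with ftell/fseek instead of repeated ungetc calls. The C standard only guarantees one character of pushback, so the sketch records the stream position and seeks back to it. `peek_bytes` is a hypothetical helper written for this example, not the Phobos code, and it assumes a seekable stream opened in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy up to n bytes into buf without consuming them.
   Returns the number of bytes peeked, or -1 if the stream cannot be
   repositioned (for example a pipe or a terminal). */
long peek_bytes(FILE *f, char *buf, size_t n)
{
    assert(f != NULL && buf != NULL);

    long start = ftell(f);          /* remember where we are */
    if (start < 0)
        return -1;

    size_t got = fread(buf, 1, n, f);

    /* Rewind to the saved position; a successful fseek also clears EOF. */
    if (fseek(f, start, SEEK_SET) != 0)
        return -1;

    return (long)got;
}
```

After a successful call, the next `fgetc` on the same stream returns the first peeked byte again. The cost is two extra library calls per peek, which is consistent with the performance concern raised later in this thread.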
Is this going to affect performance much? Maybe it can cringe do an
I guess the main problem is DMC++ runtime which is lacking proper support for most basic features, other libc-s do just fine.
I think the algorithm with ungetc is a bad idea. The read-ahead should be kept not in the ring buffer of the I/O driver but in this structure's own buffer.
Well the problem is that if somebody stops reading this range and creates a new one with the current stream, then whatever extra buffering on top of C's stdio will stay with the first range. Ehm, now the 2nd one will miss a lot of data that's supposed to be there. The whole "just wrap C's IO in ranges" idea is bankrupt from the start and I'm really surprised how long it managed to survive.
Another idea would be to explicitly disable C's buffering and replicate that in std.stdio.File; however, that won't work if any I/O was done on a stream by a 3rd party, e.g. a C library.
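The buffering problem described above (a second range missing data that the first one read ahead) can be sketched in C. `BufReader`, `reader_open` and `reader_next` are hypothetical names invented for this example; the 4-byte buffer is an arbitrary assumption chosen to keep the effect easy to see.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reader that keeps its own read-ahead on top of stdio. */
typedef struct {
    FILE  *f;
    char   buf[4];   /* private read-ahead, invisible to other readers */
    size_t len;      /* number of valid bytes in buf */
    size_t pos;      /* index of the next byte to hand out */
} BufReader;

BufReader reader_open(FILE *f)
{
    assert(f != NULL);
    BufReader r = { f, {0}, 0, 0 };
    return r;
}

/* Returns the next byte, or EOF. Refills the private buffer in 4-byte chunks. */
int reader_next(BufReader *r)
{
    if (r->pos == r->len) {
        r->len = fread(r->buf, 1, sizeof r->buf, r->f);
        r->pos = 0;
        if (r->len == 0)
            return EOF;
    }
    return (unsigned char)r->buf[r->pos++];
}
```

If a stream holds `abcdefgh` and a first reader returns `a`, the bytes `bcd` are already sitting in that reader's private buffer. A second reader opened on the same `FILE*` starts at `e`, so `bcd` are only reachable through the first reader.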
The performance hit is big. I tested reading 10 MB of some patterns (test code). Results:
Multiple "a" 575 ms, 344 μs, and 1 hnsec
No "a" 1 sec, 975 ms, 638 μs, and 9 hnsecs Not good.
Call "a" 755 ms, 866 μs, and 4 hnsecs Better, but not good.
Don't call "a" 556 ms, 168 μs, and 3 hnsecs
I'm not sure how portable SEEK_CUR is. Tested on linux with dmd and no compilation flags whatsoever. Seems that How public is
As in lockingTextReader public. The latter being quite convenient input range for dchars.
You mean there's a lockingTextReader (lower L) function somewhere? I can't find it.
Mistook for lockingTextWriter. Anyway the thing is public so it very well may be used... even if undocumented.
@DmitryOlshansky I agree wrapping around C's Maybe what we really need to do is to implement our own
Stay tuned.
LOL... I've been tuned for the past, oh, how many years now? and have yet to see anything materialize.
And I've participated in 3 iterations of the std.io effort. Patience, patience it's coming.
On a more serious note, I agree that any algorithm that uses
As for
This is a fair and deserved dig. I have been very poor at completing this.
I am ashamed that I have not pushed any code to github for my latest effort, because I've designed it as I was writing (and a lot of things have changed), and I didn't want an ugly history with all kinds of deletes and rewrites, plus some files there now probably will be removed. However, it IS forthcoming, I'm going to commit more time for this. The previous efforts are here: https://github.com/schveiguy/phobos/tree/new-io The last one is very similar to what I have now, but I've redesigned a crucial portion of it.
@schveiguy I apologize for making light of the situation; I was not aware of all the work that has gone into the prospective new Maybe you could put the code up on github in a separate repo or something like that, which will be merged into Phobos when it's ready. That way people could see what's happening, and more importantly, contribute. For something as crucial as
Not necessary, I agree that I should have finished it by now :)
I will push the code to github as soon as it's working to any degree (and that should be soon). At this point, I'm unsure where it will fit in, the design is a drastic change from standard io mechanisms.
Seems to me that the problem is that the 'fix' for LockingTextReader: attempts to do decoding. NONONONONONONONONONONONOOOO There is no reason whatsoever to decode when reading characters and putting them into a
Stop it. LockingTextReader was SUPPOSED to do so, except that it was broken.
Low level code should not be decoding. It's just nothing but trouble, as repeated regressions show.
That ship has long sailed when std.stdio was designed. Now we either fix it or not, but breaking API is out of question.
It's already broken. Before the last 'fix', it simply cast char to dchar. And frankly, I don't believe in throwing the whole thing out for std.io just because of this. And lastly, as the benchmarks show, this decoding is absolutely miserable.
Frankly I never like std.stdio nor wrapping C's IO was EVER fastest and/or easiest of things. The cost of maintenance goes up with every run-time we try to support. There is a case for
Want no decoding - use raw reads. Anyhow I don't decode and it takes some WORK to avoid it in every case. The mistake with decoding was made, now we need to clean up the consequences properly not by waving hands and silently disabling decoding in existing code.
LockingTextReader isn't even documented. Fixing it won't break things. The only user of it is readf(), which can use .byDchar if it must, although the decoding should only be done as necessary by formattedRead. The fix is being applied in the wrong place.
Then let's start with making it private. But you know, bug reports came for it even though it's undocumented. Keeping private stuff undocumented doesn't prevent people from using it. Keeping it private does.
That might be true.
3 bug reports. 2 were reporting a bug in readf. The third https://issues.dlang.org/show_bug.cgi?id=12320 doesn't say where the use comes from - could have been isolated down from a call to readf as well. We don't need to maintain undocumented interfaces. They are implicitly use at your own risk.
Cool - so let's close this pull and start with making lockingTextReader private.
The original test case does not mention LockingTextReader anywhere, only a reduced test case in a comment.
New PR that makes LockingTextReader private and turns it into a char range: #3696
Things seem to be worse than I expected. formattedRead is doing one character lookahead, but readf closes down its copy of LockingTextReader after formattedRead returns, which causes the buffered looked-ahead character to be lost. When readf is called again, that character is missing from the input. This isn't the protocol for InputRange.
What about
ungetc fails if a dchar is pushed back.
Right, I thought the new thing was going to use
Right, but when you use byDchar to read more than one in order to decode, then you've looked ahead more than one.
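To make the point about decoding concrete: a UTF-8 encoded code point occupies one to four bytes, so reading one decoded character can consume more bytes than the single pushback ungetc guarantees. The C sketch below derives the sequence length from the lead byte; `utf8_seq_len` is a hypothetical helper for this example and does not validate continuation bytes.

```c
#include <assert.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence that starts
   with this lead byte, or 0 if the byte cannot start a sequence
   (a continuation byte or an invalid lead). */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}
```

Any result greater than 1 means that un-reading the whole character would need more than one ungetc call, which is exactly the case the C standard does not promise to support.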
I have to do a bit more investigating to see just where the fault lies.
It's looking like formattedRead, when it is reading dchars, does a popFront after the lookahead. If formattedRead is just reading chars, this does not happen. Correction: it's byDchar doing the extra popFront.
More research. Fixing byDchar won't fix the lookahead problem. The fix has to be in formattedRead: for char ranges, it should read only as far as necessary (which should only be one lookahead) and decode only when absolutely necessary. This should run faster, too.
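The "one lookahead" idea can be sketched in C: a parser that works on bytes reads until the first byte it cannot use and hands exactly that one byte back with ungetc, which is the single case the C standard guarantees. `read_uint` is a hypothetical helper for this example (it ignores overflow), not the formattedRead implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: parse a run of decimal digits from f.
   Returns 1 and stores the value in *out on success, 0 if no digit was found.
   At most one byte of lookahead is consumed and it is pushed back. */
int read_uint(FILE *f, unsigned long *out)
{
    assert(f != NULL && out != NULL);

    unsigned long value = 0;
    int ndigits = 0;
    int c;

    while ((c = fgetc(f)) != EOF && isdigit(c)) {
        value = value * 10 + (unsigned long)(c - '0');
        ndigits++;
    }

    if (c != EOF)
        ungetc(c, f);   /* single pushback of the lookahead byte */

    if (ndigits == 0)
        return 0;

    *out = value;
    return 1;
}
```

Because the delimiter is returned to the stream before the function exits, a later call on the same `FILE*` sees it again, so no lookahead is lost between calls.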
Calling ungetc more than once is not guaranteed to work. Replacing the
ungetc calls with ftell/fseek.
https://issues.dlang.org/show_bug.cgi?id=14861