fix issue 14861 - Error in stdio.d in LockingTextReader.readFront() #3622

aG0aep6G · 2015-09-01T18:56:31Z

Calling ungetc more than once is not guaranteed to work. Replacing the
ungetc calls with ftell/fseek.

https://issues.dlang.org/show_bug.cgi?id=14861

DmitryOlshansky · 2015-09-01T19:00:50Z

std/stdio.d

-            immutable c = chars[$ - 1 - i];
-            enforce(ungetc(c, cast(FILE*) _f._p.handle) == c);
-        }
+        _f.seek(start); // rewind


This is horrible. Do we ungetc the file only to maintain range abstraction?? I think it's apparent that C's I/O doesn't map well to codepoint ranges so one more point for native i/o package.

CC @schveiguy - any news btw ? ;)

Calling ungetc more than once is not guaranteed to work. Replacing the ungetc calls with ftell/fseek.

CyberShadow · 2015-09-02T02:01:28Z

Is this going to affect performance much? Maybe it can cringe do an ungetc if it read only one character?

DmitryOlshansky · 2015-09-02T05:27:24Z

Maybe it can cringe do an ungetc if it read only one character?

I guess the main problem is DMC++ runtime which is lacking proper support for most basic features, other libc-s do just fine.

MGWL · 2015-09-02T07:14:24Z

I think algorithm with ungetc bad idea. The read-ahead should be carried not in the ring buffer of the driver input-output, and in own this to structure.

MGWL · 2015-09-02T07:14:53Z

I think algorithm with ungetc bad idea. The read-ahead should be carried not in the ring buffer of the driver input-output, and in own this to structure.

DmitryOlshansky · 2015-09-02T07:25:12Z

I think algorithm with ungetc bad idea. The read-ahead should be carried not in the ring buffer of the driver input-output, and in own this to structure.

Well the problem is that if somebody stops reading this range and craetes a new one with the current stream then whatever extra buffering on top of C's stdio will stay with the first range.. Ehm now the 2nd one will miss a lot of data that's supposed to be there.

The whole "just wrap C's IO in ranges" idea is bunkrupt from the start and I'm really surprized how long it managed to survive.

DmitryOlshansky · 2015-09-02T07:27:18Z

Another idea would be to explicitly disable C's buffering and replicate that in std.stdio.File however that won't work if any I/O was done on a stream by a 3rd party e.g. a C library.

aG0aep6G · 2015-09-02T15:00:57Z

Is this going to affect performance much? Maybe it can cringe do an ungetc if it read only one character?

The performance hit is big. I tested reading 10 MB of some patterns (test code). Results:

Multiple ungetc calls (master):

"a" 575 ms, 344 μs, and 1 hnsec
"ä" 456 ms, 378 μs, and 7 hnsecs
"aaaaaaaaä" 559 ms, 995 μs, and 9 hnsecs

No ungetc, always tell and seek (what's currently in this PR):

"a" 1 sec, 975 ms, 638 μs, and 9 hnsecs
"ä" 1 sec, 122 ms, 810 μs, and 9 hnsecs
"aaaaaaaaä" 1 sec, 804 ms, 806 μs, and 4 hnsecs

Not good.

Call ungetc instead of seek when chars.length == 1 (always call tell):

"a" 755 ms, 866 μs, and 4 hnsecs
"ä" 1 sec, 120 ms, 895 μs, and 7 hnsecs
"aaaaaaaaä" 831 ms, 562 μs, and 4 hnsecs

Better, but not good.

Don't call tell and use relative seeking with SEEK_CUR instead:

"a" 556 ms, 168 μs, and 3 hnsecs
"ä" 1 sec, 57 ms, 159 μs, and 1 hnsec
"aaaaaaaaä" 673 ms, 818 μs, and 5 hnsecs

I'm not sure how portable SEEK_CUR is.

Tested on linux with dmd and no compilation flags whatsoever.

Seems that seek is just costly.

How public is LockingTextReader exactly? I mean, it's not private, but it's not documented either. Could we change its interface or drop it completely? Looks like it's only used in readf.

DmitryOlshansky · 2015-09-02T15:04:20Z

How public is LockingTextReader exactly? I mean, it's not private, but it's not documented either. Could we change its interface or drop it completely? Looks like it's only used in readf.

As in lockingTextReader public. The latter being quite convenient input range for dchars.

aG0aep6G · 2015-09-02T15:10:17Z

As in lockingTextReader public. The latter being quite convenient input range for dchars.

You mean there's a lockingTextReader (lower L) function somewhere? I can't find it.

DmitryOlshansky · 2015-09-02T15:44:09Z

You mean there's a lockingTextReader (lower L) function somewhere? I can't find it.

Mistook for lockingTextWriter. Anyway the thing is public so it very well may be used... even if undocumented.

quickfur · 2015-09-02T17:07:17Z

@DmitryOlshansky I agree wrapping around C's stdio is kludgy. It's the reason for a lot of back-and-forth in std.stdio, e.g., the drama with File originally being a "thin" wrapper around FILE*, then we have to add hacks to make range API work, then we have to use ungetc, then we have to turn it into a struct-wrapped reference type, then we have to make it a ref-counted type, ad nauseaum.

Maybe what we really need to do is to implement our own stdio, that directly uses OS APIs to read file data. Handle all buffering in D instead of trying to kludge around C's API. (Where's the long-promised but never materialized std.io?!)

DmitryOlshansky · 2015-09-02T17:08:45Z

(Where's the long-promised but never materialized std.io?!)

Stay tuned.

quickfur · 2015-09-02T17:12:33Z

LOL... I've been tuned for the past, oh, how many years now? and have yet to see anything materialize.

DmitryOlshansky · 2015-09-02T17:16:00Z

LOL... I've been tuned for the past, oh, how many years now? and have yet to see anything materialize.

And I've participated in 3 iterations of the std.io effort. Patience, patience it's coming.

quickfur · 2015-09-02T17:21:28Z

On a more serious note, I agree that any algorithm that uses ungetc is fundamentally flawed. Any read-ahead should be kept in the range itself. As for creating a new range that "loses" some characters because of read-ahead, why not make lockingTextReader a reference type to a unique instance per File? The idea of multiple readers per File already doesn't work anyway (e.g., if I create multiple readers then hand them to different threads -- the data read will basically be randomly interspersed bytes from the underlying file, which is almost always not what is wanted).

quickfur · 2015-09-02T17:21:50Z

As for std.io, is the current effort on github somewhere?

schveiguy · 2015-09-02T18:21:00Z

LOL... I've been tuned for the past, oh, how many years now?

This is a fair and deserved dig. I have been very poor at completing this.

As for std.io, is the current effort on github somewhere?

I am ashamed that I have not pushed any code to github for my latest effort, because I've designed it as I was writing (and a lot of things have changed), and I didn't want an ugly history with all kinds of deletes and rewrites, plus some files there now probably will be removed. However, it IS forthcoming, I'm going to commit more time for this.

The previous efforts are here:

https://github.com/schveiguy/phobos/tree/new-io
https://github.com/schveiguy/phobos/tree/new-io2
https://github.com/schveiguy/phobos/tree/new-io3

The last one is very similar to what I have now, but I've redesigned a crucial portion of it.

quickfur · 2015-09-02T18:50:44Z

@schveiguy I apologize for making light of the situation; I was not aware of all the work that has gone into the prospective new std.io.

Maybe you could put the code up on github in a separate repo or something like that, which will be merged into Phobos when it's ready. That way people could see what's happening, and more importantly, contribute. For something as crucial as std.io, I think a group effort would do a lot of good.

schveiguy · 2015-09-02T19:18:36Z

I apologize for making light of the situation

Not necessary, I agree that I should have finished it by now :)

Maybe you could put the code up on github in a separate repo or something like that

I will push the code to github as soon as it's working to any degree (and that should be soon). At this point, I'm unsure where it will fit in, the design is a drastic change from standard io mechanisms.

WalterBright · 2015-10-05T05:58:32Z

Seems to me that the problem is that the 'fix' for LockingTextReader:

#2663

attempts to do decoding. NONONONONONONONONONONONOOOO

There is no reason whatsoever to decode when reading characters and putting them into a string. It should simply have made it a range over char.

DmitryOlshansky · 2015-10-05T06:09:57Z

attempts to do decoding. NONONONONONONONONONONONOOOO

Stop it. LockingTextReader was SUPPOSED to do so, except that it was broken.
And we can't silently change dchar to char[] API anyhow.

WalterBright · 2015-10-05T06:11:19Z

Low level code should not be decoding. It's just nothing but trouble, as repeated regressions show.

DmitryOlshansky · 2015-10-05T06:13:30Z

It's just nothing but trouble, as repeated regressions show.

That ship has long sailed when std.stdio was designed. Now we either fix it or not, but breaking API is out of question.

WalterBright · 2015-10-05T06:17:08Z

It's already broken. Before the last 'fix', it simply cast char to dchar. And frankly, I don't believe in throwing the whole thing out for std.io just because of this. And lastly, as the benchmarks show, this decoding is absolutely miserable.

DmitryOlshansky · 2015-10-05T06:24:29Z

And frankly, I don't believe in throwing the whole thing out for std.io just because of this

Frankly I never like std.stdio nor wrapping C's IO was EVER fastest and/or easiest of things. The cost of maintenance goes up with every run-time we try to support. There is a case for std.io and I'm helping the effort as time permits.

And lastly, as the benchmarks show, this decoding is absolutely miserable.

Want no decoding - use raw reads. Anyhow I don't decode and it takes some WORK to avoid it in every case. The mistake with decoding was made, now we need to clean up the consequences properly not by waving hands and silently disabling decoding in existing code.

WalterBright · 2015-10-05T06:53:40Z

LockingTextReader isn't even documented. Fixing it won't break things. The only user of it is readf(), which can use .byDchar if it must, although the decoding should only be done as necessary by formattedRead.

The fix is being applied in the wrong place.

DmitryOlshansky · 2015-10-05T06:55:24Z

LockingTextReader isn't even documented

Then let's start with making it private. But you know, bug reports came for it even though it's undocumented. Keeping private stuff undocumented doesn't prevent people from using it. Keeping it private does.

The fix is being applied in the wrong place.

That might be true.

WalterBright · 2015-10-05T07:14:45Z

bug reports came for it even though it's undocumented

3 bug reports. 2 were reporting a bug in readf. The third https://issues.dlang.org/show_bug.cgi?id=12320 doesn't say where the use comes from - could have been isolated down from a call to readf as well.

We don't need to maintain undocumented interfaces. They are implicitly use at your own risk.

DmitryOlshansky · 2015-10-05T07:24:40Z

We don't need to maintain undocumented interfaces. They are implicitly use at your own risk.

Cool - so let's close this pull and start with making lockingTextReader private.

CyberShadow · 2015-10-05T10:42:40Z

Then let's start with making it private. But you know, bug reports came for it even though it's undocumented.

The original test case does not mention LockingTextReader anywhere, only a reduced test case in a comment.

aG0aep6G · 2015-10-05T19:27:25Z

New PR that makes LockingTextReader private and turns it into a char range: #3696

WalterBright · 2015-10-05T19:55:48Z

Things seem to be worse than I expected. formattedRead is doing one character lookahead, but readf closes down its copy of LockingTextReader after formattedRead returns, which causes the buffered looked ahead character to be lost. When readf is called again, the looked ahead character is lost.

This isn't the protocol for InputRange.

CyberShadow · 2015-10-05T19:57:24Z

What about ungetc?

WalterBright · 2015-10-05T19:58:38Z

ungetc fails if a dchar is pushed back.

CyberShadow · 2015-10-05T19:59:05Z

Right, I thought the new thing was going to use chars.

WalterBright · 2015-10-05T20:01:11Z

Right, but when you use byDchar to read more than one in order to decode, then you've looked ahead more than one.

WalterBright · 2015-10-05T20:07:07Z

I have to do a bit more investigating to see just where the fault lies.

WalterBright · 2015-10-05T20:19:31Z

It's looking like formattedRead, when it is reading dchars, does a popFront after the lookahead. If formattedRead is just reading chars, this does not happen.

Correction: it's byDchar doing the extra popFront.

https://issues.dlang.org/show_bug.cgi?id=15164

WalterBright · 2015-10-06T00:48:34Z

More research. Fixing byDchar won't fix the lookahead problem. The fix has to be in formattedRead, and the fix for that is for char ranges, it only reads as far as necessary (which should only be one lookahead) and decode only when absolutely necessary. This should run faster, too.

DmitryOlshansky reviewed Sep 1, 2015
View reviewed changes

fix issue 14861 - Error in stdio.d in LockingTextReader.readFront()

e1b1336

Calling ungetc more than once is not guaranteed to work. Replacing the ungetc calls with ftell/fseek.

aG0aep6G force-pushed the 14861 branch from 24123bd to e1b1336 Compare September 1, 2015 19:11

DmitryOlshansky closed this Oct 5, 2015

aG0aep6G mentioned this pull request Oct 5, 2015

make LockingTextReader a char range #3696

Closed

WalterBright mentioned this pull request Oct 6, 2015

[REG] fix Issue 14861 - Error in stdio.d in LockingTextReader.readFront() #3698

Merged

fix issue 14861 - Error in stdio.d in LockingTextReader.readFront() #3622

fix issue 14861 - Error in stdio.d in LockingTextReader.readFront() #3622

Conversation

aG0aep6G commented Sep 1, 2015

DmitryOlshansky Sep 1, 2015

Choose a reason for hiding this comment

DmitryOlshansky Sep 1, 2015

Choose a reason for hiding this comment

CyberShadow commented Sep 2, 2015

DmitryOlshansky commented Sep 2, 2015

MGWL commented Sep 2, 2015

MGWL commented Sep 2, 2015

DmitryOlshansky commented Sep 2, 2015

DmitryOlshansky commented Sep 2, 2015

aG0aep6G commented Sep 2, 2015

DmitryOlshansky commented Sep 2, 2015

aG0aep6G commented Sep 2, 2015

DmitryOlshansky commented Sep 2, 2015

quickfur commented Sep 2, 2015

DmitryOlshansky commented Sep 2, 2015

quickfur commented Sep 2, 2015

DmitryOlshansky commented Sep 2, 2015

quickfur commented Sep 2, 2015

quickfur commented Sep 2, 2015

schveiguy commented Sep 2, 2015

quickfur commented Sep 2, 2015

schveiguy commented Sep 2, 2015

WalterBright commented Oct 5, 2015

DmitryOlshansky commented Oct 5, 2015

WalterBright commented Oct 5, 2015

DmitryOlshansky commented Oct 5, 2015

WalterBright commented Oct 5, 2015

DmitryOlshansky commented Oct 5, 2015

WalterBright commented Oct 5, 2015

DmitryOlshansky commented Oct 5, 2015

WalterBright commented Oct 5, 2015

DmitryOlshansky commented Oct 5, 2015

CyberShadow commented Oct 5, 2015

aG0aep6G commented Oct 5, 2015

WalterBright commented Oct 5, 2015

CyberShadow commented Oct 5, 2015

WalterBright commented Oct 5, 2015

CyberShadow commented Oct 5, 2015

WalterBright commented Oct 5, 2015

WalterBright commented Oct 5, 2015

WalterBright commented Oct 5, 2015

WalterBright commented Oct 6, 2015