-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FilePlugin>>primitiveFileAtEnd for non-regular files #232
FilePlugin>>primitiveFileAtEnd for non-regular files #232
Conversation
FilePlugin currently splits files in to two groups: 1) Stdio streams and 2) everything else. To test for the end of file, FilePlugin>>primitiveFileAtEnd: 1) Uses feof() for stdio streams. 2) Compares the current position to the file size for everything else. This returns the expected results for regular files, but fails for non-regular files, e.g.: (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8. returns an empty array. On Unix, the proper way to check is to read past the end of the file and then call feof() to confirm that it is in fact at the end. Pharo has plenty of code that assumes that a file position >= the file size means that the file is at the end, and as stated above this generally works for regular files. This patch modifies FilePlugin>>primitiveFileAtEnd to: a) Keep the current behaviour of using the file position test for regular files. b) Keep the current behaviour of using feof() for stdio streams. c) Use feof() for non-regular files, e.g. /dev/urandom. This allows existing code to continue to function, and allows non-regular files to be tested correctly. After applying the patch, the example code above answers the expected result, e.g.: (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8. " #[179 136 227 226 28 147 197 125]" On Windows, as far as I can tell, all files are regular, and the position test is used regularly, so no change is requried. Fogbugz: https://pharo.fogbugz.com/f/cases/21643/
The discussion on vm-dev shows that this tries to solve a problem which rather is at image side in latest Pharo. And excuse the naive question, but why feof() would not work for all cases? |
Hi Nicolas, There are actually 3 or 4 issues (depending on what you want to include at the moment):
It might, but currently there are tests in the Pharo automated test suite which explicitly check that #atEnd returns true when the stream has returned the last character (but not past it). I don't know how important these tests are. Cheers, |
On Sat, Mar 31, 2018 at 09:01:49AM -0700, Nicolas Cellier wrote:
The discussion on vm-dev shows that this tries to solve a problem which rather is at image side in latest Pharo.
See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027341.html
and that we should aim at simplifying the implementation rather than complexifying
See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027343.html
About the simplification, wasn't there another effort to remove the cached file size?
And excuse the naive question, but why feof() would not work for all cases?
It is not naive, and it is definitely not obvious.
The answer I would give is that the feof() test has a different meaning from
"at end of file". The feof() test determines if an end of file flag was set
in some previous operation. If the flag is set, then you know that you are
at the end of file. But if it is not set, you could still be at the end of
file position, in which case the end of file flag will be set the next time
you attempt to do a read.
Thus if we want atEnd to mean "at the end now, and the next read will produce
an error", then the feof() cannot answer that.
Dave
|
Hi Both,
thanks for the feof() explanation, I should have known this.
So, it shows that we are not on the right track.
The right way to do it on Unix is to try to read and check if atEnd in
post-condition rather than trying to test atEnd in pre-condition.
This is exactly as suggested by Levente.
Or even better, get some end-of-file error condition directly in response
to read primitive, and handle that gracefully at image side.
IMO primitiveFileAtEnd should rather be deprecated.
2018-03-31 18:35 GMT+02:00 OpenSmalltalk-Bot <notifications@github.com>:
… On Sat, Mar 31, 2018 at 09:01:49AM -0700, Nicolas Cellier wrote:
>
> The discussion on vm-dev shows that this tries to solve a problem which
rather is at image side in latest Pharo.
> See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-
March/027341.html
> and that we should aim at simplifying the implementation rather than
complexifying
> See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-
March/027343.html
>
> About the simplification, wasn't there another effort to remove the
cached file size?
> And excuse the naive question, but why feof() would not work for all
cases?
>
It is not naive, and it is definitely not obvious.
The answer I would give is that the feof() test has a different meaning
from
"at end of file". The feof() test determines if an end of file flag was set
in some previous operation. If the flag is set, then you know that you are
at the end of file. But if it is not set, you could still be at the end of
file position, in which case the end of file flag will be set the next time
you attempt to do a read.
Thus if we want atEnd to mean "at the end now, and the next read will
produce
an error", then the feof() cannot answer that.
Dave
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#232 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAscIrim1_6lcCunTBfwxNA7jtNToY5Kks5tj7BugaJpZM4S6rMa>
.
|
For example in Squeak, we can rewrite atEnd like this:
then entirely remove |
Hi Nicolas,
But that is exactly what this PR is about - it provides the ability for a post-condition check to be performed.
The correct usage, as you say above, is to read until there is no more data, and then check to see if the end of file has been reached. Without this (modified) primitive there's no way to perform that check. We could change the image to handle no-more-data and end-of-file differently, but it would still require the check to be made, and I wouldn't combine the read and eof checks in to one primitive.
Based on my understanding, I disagree. Please let me know if I'm missing something. Cheers, |
I think this will return incorrect results. No-more-data is not the same is end-of-file. No-more-data just says that there isn't any data available now, but that may change. Don't forget we are talking about non-regular files, e.g. /dev/urandom, that isn't a real stream, it is calculated and may be waiting on external resources. |
I'll never learn to not reply when I'm in a hurry...
It is, of course, a real stream. Just not a regular file. |
Hi Alistair,
It's kind of brainfuck. I say we should simplify and keep only one, post-condition with a simple feof and leave smartness at image side.
And leave current primitive only for backward compatibility with deprecation warning in the comment? |
On Sun, Apr 01, 2018 at 12:26:56PM -0700, Nicolas Cellier wrote:
Hi Alistair,
OK, I see your point, my proposal would work for /dev/random but maybe not all possible kind of stream
But then primitiveFileAtEnd is trying to be too smart:
- being able to test atEnd in pre-condition for regular files
- being able to test atEnd in post-condition for the rest of the world (pipe socket special devices etc...).
It's kind of brainfuck. I say we should simplify and keep only one, post-condition with a simple feof and leave smartness at image side.
What if we add
primitiveEncounteredEndOfFile
"Answer true if the end of file was encountered at last read operation".
And leave current primitive only for backward compatibility with deprecation warning in the comment?
As a point of reference, in OSProcess it is implemented as StandardFileStream>>atEndOfFile
as an extension in *OSProcess-Base, as distinct from StandardFileStream>>atEnd. The EOF
test is done in the OSProcessPlugin (not FilePlugin), and is implemented as #primitiveTestEndOfFileFlag.
The most important use case is for a file stream on an OS pipe (e.g. OSPipe new reader).
Similar to /dev/random, the stream on an OS pipe is represented as a kind of
PositionableStream, but it is a positionable stream that does not know its
position, and for which the position is not settable.
I do not know of a perfect solution to this, but if you want to experiment on
the image side with the different semantics of "at end now" versus "last operation
hit the EOF condition so do not bother asking for more" then the primitiveTestEndOfFileFlag
that is present in any VM with the UnixOSProcessPlugin will probably give you
what is needed to try out some new ideas.
Dave
|
Hi David, I will have a look. |
Hi Nicolas and Dave,
You're correct (as I understand it). I think it is more a matter of Right now the behaviour of #atEnd is:
So we already have both behaviours. My natural inclination is to minimise the primitive code and let the The primitive should also be checking ferror(), so should be
Cheers, |
Alistair, you are perfectly right. |
Hi Nicolas, Thanks. I agree that this isn't simplifying the existing code, although I also think it isn't really adding complexity. It does result in an arguably correct response where before it was clearly incorrect. Just for the record, the same incorrect behaviour currently exists in Squeak (#atEnd returns the wrong result for a non-regular file). Cheers, |
Hi Alistair,
On Apr 3, 2018, at 4:19 AM, akgrant43 ***@***.***> wrote:
Hi Nicolas,
Thanks. I agree that this isn't simplifying the existing code, although I also think it isn't really adding complexity. It does result in an arguably correct response where before it was clearly incorrect.
Just for the record, the same incorrect behaviour currently exists in Squeak (#atEnd returns the wrong result for a non-regular file).
Please add the gist of David's description of feof() to the code when you fix this. That's extremely important information. TIA
… Cheers,
Alistair
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Modify sqFileAtEnd() (FilePlugin>>primitiveFileAtEnd) so that #atEnd returns true when the last character has been read. Both Squeak and Pharo expect that #atEnd can be used to control iteration over a stream, answering true as soon as the last character has been read, and importantly not requiring #next values to be tested for an end-of-stream. Previously, stdio streams just used feof(), which requires reading past the end of the stream, and so required the value returned by #next to be tested for nil. This causes problems with streams over other collections where nil is a valid value. sqFileAtEnd() would also fail to return the correct value for other files such as kernel virtual files, e.g. /proc/cpuinfo, and devices like /dev/urandom, because they have a reported file size of 0. This change also takes a step towards removing the special handling of stdio files on Unix, significantly simplifying the code base. This change doesn't break any existing code that uses #next == nil to test for end-of-stream.
Done. |
wow, if it works, it looks good! |
@@ -30,6 +31,7 @@ typedef int mode_t; | |||
typedef struct { | |||
int sessionID; /* ikp: must be first */ | |||
void *file; | |||
mode_t st_mode; /* from stat() */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, do we still need st_mode and setMode now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I just saw that. I don't know how I managed to lose part of the commit. There's also code left behind in sqFileBasicPrims.c.
I'm fixing it now. Sorry about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, there should be a special prime to whoever remove code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Hopefully I've got them all this time.
Remove references to stat.h and st_mode that were accidentally left behind.
I've run the full automated test suite on Ubuntu 64 with Squeak5.1-16549-64bit.image. There were 4 failures and 3 errors, but none looked remotely related. Pharo 7 I filtered the packages in Test Runner with 'Zinc|File' and there were no failures. |
@akgrant43 Since this PR was merged OSProcess is broken if I use it with the latest Pharo 61 and the latest VM. Here is an example of error:
|
Hi Cyril, Can you provide some sample code that triggers the problem? I've never used BufferedAsyncFileReadStream directly, and it's quite a while since I've used OSProcess. Also: 32 or 64 bit? Thanks, |
Yes sorry I was a little in a hurry :) OS:
Pharo version: Pharo 61 60540 VM: Latest VM 64bits (we do not test the latest 32bits vm so it's maybe the same) We use OSProcess this way: compileFile: aFile
| thisProcess currentPath result command |
thisProcess := OSProcess thisOSProcess.
currentPath := thisProcess environment at: #PATH.
thisProcess environment at: #PATH put: self binDirectory , ':' , currentPath.
command := PipeableOSProcess command: ( '"' , self binPath , '" "' , aFile fullName , '"').
[ result := command outputAndError ] ensure: [ command close ].
thisProcess environment at: #PATH put: currentPath.
^ result first |
Hi Cyril, I wasn't able to reproduce it, although I'm obviously missing the definition of "self binPath". Are you able to adapt the script below to trigger the failure?
System Report:
|
I could not include everything because it's a proprietary code (I already had to remove some part in this example ) It'll be a little hard to get a reproducible case since I only have a windows :( I saw the failure on a Jenkins build. I'll try to find a way to reproduce it. |
I added to our Jenkins a command to print the version and I see that we used the same VM.
|
More interpreter tests
FilePlugin currently splits files in to two groups: 1) Stdio streams and
2) everything else.
To test for the end of file, FilePlugin>>primitiveFileAtEnd:
This returns the expected results for regular files, but fails for
non-regular files, e.g.:
(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
returns an empty array.
On Unix, the proper way to check is to read past the end of the file and
then call feof() to confirm that it is in fact at the end.
Pharo has plenty of code that assumes that a file position >= the file
size means that the file is at the end, and as stated above this
generally works for regular files.
This patch modifies FilePlugin>>primitiveFileAtEnd to:
a) Keep the current behaviour of using the file position test for
regular files.
b) Keep the current behaviour of using feof() for stdio streams.
c) Use feof() for non-regular files, e.g. /dev/urandom.
This allows existing code to continue to function, and allows
non-regular files to be tested correctly.
After applying the patch, the example code above answers the expected
result, e.g.:
(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
" #[179 136 227 226 28 147 197 125]"
On Windows, as far as I can tell, all files are regular, and the
position test is used regularly, so no change is requried.
Fogbugz: https://pharo.fogbugz.com/f/cases/21643/