Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FilePlugin>>primitiveFileAtEnd for non-regular files #232

Merged
merged 3 commits into from
Apr 5, 2018
Merged

FilePlugin>>primitiveFileAtEnd for non-regular files #232

merged 3 commits into from
Apr 5, 2018

Conversation

akgrant43
Copy link
Contributor

FilePlugin currently splits files in to two groups: 1) Stdio streams and
2) everything else.

To test for the end of file, FilePlugin>>primitiveFileAtEnd:

  1. Uses feof() for stdio streams.
  2. Compares the current position to the file size for everything else.

This returns the expected results for regular files, but fails for
non-regular files, e.g.:

(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.

returns an empty array.

On Unix, the proper way to check is to read past the end of the file and
then call feof() to confirm that it is in fact at the end.

Pharo has plenty of code that assumes that a file position >= the file
size means that the file is at the end, and as stated above this
generally works for regular files.

This patch modifies FilePlugin>>primitiveFileAtEnd to:

a) Keep the current behaviour of using the file position test for
regular files.
b) Keep the current behaviour of using feof() for stdio streams.
c) Use feof() for non-regular files, e.g. /dev/urandom.

This allows existing code to continue to function, and allows
non-regular files to be tested correctly.

After applying the patch, the example code above answers the expected
result, e.g.:

(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
" #[179 136 227 226 28 147 197 125]"

On Windows, as far as I can tell, all files are regular, and the
position test is used regularly, so no change is requried.

Fogbugz: https://pharo.fogbugz.com/f/cases/21643/

FilePlugin currently splits files in to two groups: 1) Stdio streams and
2) everything else.

To test for the end of file, FilePlugin>>primitiveFileAtEnd:

1) Uses feof() for stdio streams.
2) Compares the current position to the file size for everything else.

This returns the expected results for regular files, but fails for
non-regular files, e.g.:

(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.

returns an empty array.

On Unix, the proper way to check is to read past the end of the file and
then call feof() to confirm that it is in fact at the end.

Pharo has plenty of code that assumes that a file position >= the file
size means that the file is at the end, and as stated above this
generally works for regular files.

This patch modifies FilePlugin>>primitiveFileAtEnd to:

a) Keep the current behaviour of using the file position test for
regular files.
b) Keep the current behaviour of using feof() for stdio streams.
c) Use feof() for non-regular files, e.g. /dev/urandom.

This allows existing code to continue to function, and allows
non-regular files to be tested correctly.

After applying the patch, the example code above answers the expected
result, e.g.:

(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
" #[179 136 227 226 28 147 197 125]"

On Windows, as far as I can tell, all files are regular, and the
position test is used regularly, so no change is requried.

Fogbugz: https://pharo.fogbugz.com/f/cases/21643/
@nicolas-cellier-aka-nice
Copy link
Contributor

nicolas-cellier-aka-nice commented Mar 31, 2018

The discussion on vm-dev shows that this tries to solve a problem which rather is at image side in latest Pharo.
See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027341.html
and that we should aim at simplifying the implementation rather than complexifying
See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027343.html

And excuse the naive question, but why feof() would not work for all cases?

@akgrant43
Copy link
Contributor Author

Hi Nicolas,

There are actually 3 or 4 issues (depending on what you want to include at the moment):

  1. ZnBufferedReadStream was doing the wrong thing (the image side issue you mentioned).
    That has been fixed.

  2. #atEnd returns the wrong information for non-regular files on Unix.
    That's what this PR will fix.
    http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027342.html demonstrates the issue.

  3. Stdio streams are treated as a special case on Unix.
    Now that we can open streams by file descriptor we're almost at the point where we can remove special handling for stdio streams on Unix. I haven't gone through all the code to be sure though. That would obviously simplify things.

  4. Stdio streams are treated as a special case on Windows.
    I haven't looked at this at all.

And excuse the naive question, but why feof() would not work for all cases?

It might, but currently there are tests in the Pharo automated test suite which explicitly check that #atEnd returns true when the stream has returned the last character (but not past it). I don't know how important these tests are.

Cheers,
Alistair

@OpenSmalltalk-Bot
Copy link

OpenSmalltalk-Bot commented Mar 31, 2018 via email

@nicolas-cellier-aka-nice
Copy link
Contributor

@nicolas-cellier-aka-nice
Copy link
Contributor

nicolas-cellier-aka-nice commented Apr 1, 2018

For example in Squeak, we can rewrite atEnd like this:

StandardFileStream>>atEnd
    "Answer whether the receiver is at its end. "
    collection ifNotNil: [ position < readLimit ifTrue: [ ^false ] ].
    self basicNext ifNil: [ ^true ].
    self skip: -1.
    ^false

then entirely remove primAtEnd: which has no more sender.
I presume it should be more or less equivalent in Pharo.
Then tell about limitation and deprecation in primitiveFileAtEnd comment.

@akgrant43
Copy link
Contributor Author

Hi Nicolas,

So, it shows that we are not on the right track.
The right way to do it on Unix is to try to read and check if atEnd in
post-condition rather than trying to test atEnd in pre-condition.

But that is exactly what this PR is about - it provides the ability for a post-condition check to be performed.

This is exactly as suggested by Levente.
Or even better, get some end-of-file error condition directly in response
to read primitive, and handle that gracefully at image side.

The correct usage, as you say above, is to read until there is no more data, and then check to see if the end of file has been reached. Without this (modified) primitive there's no way to perform that check.

We could change the image to handle no-more-data and end-of-file differently, but it would still require the check to be made, and I wouldn't combine the read and eof checks in to one primitive.

IMO primitiveFileAtEnd should rather be deprecated.

Based on my understanding, I disagree. Please let me know if I'm missing something.

Cheers,
Alistair

@akgrant43
Copy link
Contributor Author

For example in Squeak, we can rewrite atEnd like this:
...

I think this will return incorrect results. No-more-data is not the same is end-of-file. No-more-data just says that there isn't any data available now, but that may change.

Don't forget we are talking about non-regular files, e.g. /dev/urandom, that isn't a real stream, it is calculated and may be waiting on external resources.

@akgrant43
Copy link
Contributor Author

I'll never learn to not reply when I'm in a hurry...

Don't forget we are talking about non-regular files, e.g. /dev/urandom, that isn't a real stream,
it is calculated and may be waiting on external resources.

It is, of course, a real stream. Just not a regular file.

@nicolas-cellier-aka-nice
Copy link
Contributor

nicolas-cellier-aka-nice commented Apr 1, 2018

Hi Alistair,
OK, I see your point, my proposal would work for /dev/random but maybe not all possible kind of stream
But then primitiveFileAtEnd is trying to be too smart:

  • being able to test atEnd in pre-condition for regular files
  • being able to test atEnd in post-condition for the rest of the world (pipe socket special devices etc...).

It's kind of brainfuck. I say we should simplify and keep only one, post-condition with a simple feof and leave smartness at image side.
What if we add

primitiveEncounteredEndOfFile
    "Answer true if the end of file was encountered at last read operation".

And leave current primitive only for backward compatibility with deprecation warning in the comment?

@OpenSmalltalk-Bot
Copy link

OpenSmalltalk-Bot commented Apr 1, 2018 via email

@nicolas-cellier-aka-nice
Copy link
Contributor

Hi David, I will have a look.
We still have ungetc to just skip: -1 even on a pipe, as long as called only once.
This could be used to test atEndNow in precondition, Am I wrong?

@akgrant43
Copy link
Contributor Author

akgrant43 commented Apr 3, 2018

Hi Nicolas and Dave,

We still have ungetc to just skip: -1 even on a pipe, as long as called only once.
This could be used to test atEndNow in precondition, Am I wrong?

You're correct (as I understand it). I think it is more a matter of
what do we want to try and move to in the long term.

Right now the behaviour of #atEnd is:

  • True after moving past the end for stdio files
  • True when at the end for regular (disk) files
  • Broken for non-regular files

So we already have both behaviours.

My natural inclination is to minimise the primitive code and let the
image handle it. In this case, that means keeping the feof() behaviour
and eventually changing the definition for regular files.

The primitive should also be checking ferror(), so should be
something like (untested):

sqInt
sqFileAtEnd(SQFile *f) {
	sqInt status;
	/* Return true if the file's read/write head is at the end of the file. */

	if (!sqFileValid(f))
		return interpreterProxy->success(false);
	pentry(sqFileAtEnd);
	/* Regular files test based on current position vs size,
	 * all other files use feof() */
	if (f->isStdioStream || !S_ISREG(f->st_mode))
		status = feof(getFile(f))
	else
		status = ftell(getFile(f)) >= getSize(f);
	if (ferror(getFile(f)))
		primitiveFailForOSError(errno);
	return pexit(status);
}

Cheers,
Alistair

@nicolas-cellier-aka-nice
Copy link
Contributor

nicolas-cellier-aka-nice commented Apr 3, 2018

Alistair, you are perfectly right.
The dual pre/post condition did already exist, and you are proposing an improvement.
My only concern was about complexifying a piece that we will keep forever for compatibility reasons, while we may better make deeper changes at the image side.
But 'the best is the enemy of the good', if you think that this change is vital for Pharo, i have no further objection.

@akgrant43
Copy link
Contributor Author

Hi Nicolas,

Thanks. I agree that this isn't simplifying the existing code, although I also think it isn't really adding complexity. It does result in an arguably correct response where before it was clearly incorrect.

Just for the record, the same incorrect behaviour currently exists in Squeak (#atEnd returns the wrong result for a non-regular file).

Cheers,
Alistair

@akgrant43 akgrant43 self-assigned this Apr 3, 2018
@eliotmiranda
Copy link
Contributor

eliotmiranda commented Apr 3, 2018 via email

Modify sqFileAtEnd() (FilePlugin>>primitiveFileAtEnd) so that #atEnd
returns true when the last character has been read.

Both Squeak and Pharo expect that #atEnd can be used to control
iteration over a stream, answering true as soon as the last character
has been read, and importantly not requiring #next values to be tested
for an end-of-stream.

Previously, stdio streams just used feof(), which requires reading past
the end of the stream, and so required the value returned by #next to be
tested for nil.  This causes problems with streams over other
collections where nil is a valid value.

sqFileAtEnd() would also fail to return the correct value for other
files such as kernel virtual files, e.g. /proc/cpuinfo, and devices like
/dev/urandom, because they have a reported file size of 0.

This change also takes a step towards removing the special handling of
stdio files on Unix, significantly simplifying the code base.

This change doesn't break any existing code that uses #next == nil to
test for end-of-stream.
@akgrant43
Copy link
Contributor Author

Please add the gist of David's description of feof() to the code when you fix this. That's extremely important information. TIA

Done.

@nicolas-cellier-aka-nice
Copy link
Contributor

wow, if it works, it looks good!

@@ -30,6 +31,7 @@ typedef int mode_t;
typedef struct {
int sessionID; /* ikp: must be first */
void *file;
mode_t st_mode; /* from stat() */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, do we still need st_mode and setMode now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just saw that. I don't know how I managed to lose part of the commit. There's also code left behind in sqFileBasicPrims.c.

I'm fixing it now. Sorry about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, there should be a special prime to whoever remove code!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Hopefully I've got them all this time.

Remove references to stat.h and st_mode that were accidentally left
behind.
@akgrant43
Copy link
Contributor Author

wow, if it works, it looks good!

I've run the full automated test suite on Ubuntu 64 with Squeak5.1-16549-64bit.image. There were 4 failures and 3 errors, but none looked remotely related.

Pharo 7 I filtered the packages in Test Runner with 'Zinc|File' and there were no failures.

@akgrant43 akgrant43 merged commit 3421494 into OpenSmalltalk:Cog Apr 5, 2018
@jecisc
Copy link
Contributor

jecisc commented Apr 6, 2018

@akgrant43
Hi,

Since this PR was merged OSProcess is broken if I use it with the latest Pharo 61 and the latest VM.

Here is an example of error:

BufferedAsyncFileReadStream(Object)>>primitiveFailed:
BufferedAsyncFileReadStream(Object)>>primitiveFailed
BufferedAsyncFileReadStream(StandardFileStream)>>primAtEnd:
BufferedAsyncFileReadStream(StandardFileStream)>>atEnd
[ self readBuffer atEnd and: [ super atEnd ] ] in BufferedAsyncFileReadStream>>atEnd
[ caught := true.
self wait.
blockValue := mutuallyExcludedBlock value ] in Semaphore>>critical:
BlockClosure>>ensure:
Semaphore>>critical:
BufferedAsyncFileReadStream>>atEnd
BufferedAsyncFileReadStream(AttachableFileStream)>>upToEnd
ExternalPipe>>upToEnd
PipeableOSProcess(PipeJunction)>>upToEnd
[ super upToEnd ] in PipeableOSProcess>>upToEnd
[ caught := true.
self wait.
blockValue := mutuallyExcludedBlock value ] in Semaphore>>critical:
BlockClosure>>ensure:
Semaphore>>critical:
PipeableOSProcess>>upToEnd
PipeableOSProcess(PipeJunction)>>outputOn:
PipeableOSProcess(PipeJunction)>>output
PipeableOSProcess(PipeJunction)>>outputAndError

@akgrant43
Copy link
Contributor Author

Hi Cyril,

Can you provide some sample code that triggers the problem? I've never used BufferedAsyncFileReadStream directly, and it's quite a while since I've used OSProcess.

Also:

32 or 64 bit?
OS?

Thanks,
Alistair

@jecisc
Copy link
Contributor

jecisc commented Apr 6, 2018

Yes sorry I was a little in a hurry :)

OS:

 ~> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

Pharo version: Pharo 61 60540

VM: Latest VM 64bits (we do not test the latest 32bits vm so it's maybe the same)

We use OSProcess this way:

compileFile: aFile
	| thisProcess currentPath result command |
	thisProcess := OSProcess thisOSProcess.
	currentPath := thisProcess environment at: #PATH.
	thisProcess environment at: #PATH put: self binDirectory , ':' , currentPath.
	command := PipeableOSProcess command: ( '"' , self binPath , '" "' , aFile fullName , '"').
	[ result := command outputAndError ] ensure: [ command close ].
	thisProcess environment at: #PATH put: currentPath.
	^ result first

@akgrant43
Copy link
Contributor Author

Hi Cyril,

I wasn't able to reproduce it, although I'm obviously missing the definition of "self binPath".

Are you able to adapt the script below to trigger the failure?

| thisProcess currentPath result command aFile |

	aFile := 'PharoDebug.log' asFileReference.
	aFile exists ifFalse: [ self error: 'I want a reasonable size file (several kB)' ].
	thisProcess := OSProcess thisOSProcess.
	currentPath := thisProcess environment at: #PATH.
	command := PipeableOSProcess command: ( 'cat ' , aFile fullName).
	[ result := command outputAndError ] ensure: [ command close ].
	thisProcess environment at: #PATH put: currentPath.
	result


" #('THERE_BE_DRAGONS_HERE
...
"

System Report:

Image
-----
/home/alistair/pharo7/IssueOSProcess6/Pharo.image
Pharo6.0
Latest update: #60540
Unnamed

Virtual Machine
---------------
/home/alistair/pharo7/IssueOSProcess6/vm/lib/pharo/5.0-201804051426/pharo
CoInterpreter VMMaker.oscog-eem.2361 uuid: 7ca2f89a-de70-422f-b92b-54f91ac4e47b Apr  5 2018
StackToRegisterMappingCogit VMMaker.oscog-eem.2361 uuid: 7ca2f89a-de70-422f-b92b-54f91ac4e47b Apr  5 2018
VM: 201804051426 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Date: Thu Apr 5 16:26:17 2018 +0200 $ CommitHash: 3421494 $ Plugins: 201804051426 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $

@jecisc
Copy link
Contributor

jecisc commented Apr 6, 2018

I could not include everything because it's a proprietary code (I already had to remove some part in this example )

It'll be a little hard to get a reproducible case since I only have a windows :( I saw the failure on a Jenkins build. I'll try to find a way to reproduce it.

@jecisc
Copy link
Contributor

jecisc commented Apr 6, 2018

I added to our Jenkins a command to print the version and I see that we used the same VM.

5.0-201804051426  Thu Apr  5 14:34:45 UTC 2018 gcc 4.8 [Production Spur 64-bit VM]
CoInterpreter VMMaker.oscog-eem.2361 uuid: 7ca2f89a-de70-422f-b92b-54f91ac4e47b Apr  5 2018
StackToRegisterMappingCogit VMMaker.oscog-eem.2361 uuid: 7ca2f89a-de70-422f-b92b-54f91ac4e47b Apr  5 2018
VM: 201804051426 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
Date: Thu Apr 5 16:26:17 2018 +0200 $ CommitHash: 3421494 $
Plugins: 201804051426 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
Linux travis-job-8fe7d8cc-2ee3-4c21-8897-143e18d7a979 4.4.0-101-generic #124~14.04.1-Ubuntu SMP Fri Nov 10 19:05:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
plugin path: /var/lib/jenkins/workspace/Ada-Generator/ARCHITECTURE/64/PHARO/61/VERSION/development/pharo-vm/lib/pharo/5.0-201804051426 [default: /var/lib/jenkins/workspace/Ada-Generator/ARCHITECTURE/64/PHARO/61/VERSION/development/pharo-vm/lib/pharo/5.0-201804051426/]

@akgrant43 akgrant43 deleted the 21643-FilePlugin-primitiveFileAtEnd branch April 8, 2018 15:12
tesonep pushed a commit to tesonep/opensmalltalk-vm that referenced this pull request Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants