[BEAM-778] Make filesystem._CompressedFile seekable.#2392
Conversation
|
R: @chamikaramj |
|
Refer to this link for build results (access rights to CI server needed): |
|
Thanks Tibor. I'm looking into this. |
| elif whence == os.SEEK_CUR: | ||
| absolute_offset = self._uncompressed_position + offset | ||
| elif whence == os.SEEK_END: | ||
| # Determine and cache the uncompressed size of the file |
There was a problem hiding this comment.
This could be very expensive for large files. How about producing a warning along with the time it took to re-read the file ?
There was a problem hiding this comment.
Created a warning message using logger. Note that I've used the per module logger what i've suggested in 22th of March in the Dev list and was accepted by Ahmet (BEAM-1825).
We might consider reducing the log level to debug if the users find that it floods the logs.
| return False | ||
| return self._file.mode == 'r' | ||
|
|
||
| def _rewind_read_buffer(self): |
There was a problem hiding this comment.
Could you add comments describing what each of the _rewind* methods does ?
There was a problem hiding this comment.
Done. I found a better name for _rewind_read_buffer: _clear_read_buffer. I think it better represents the purpose.
| * if the new offset is out of bound, it is adjusted to either 0 or EOF. | ||
|
|
||
| Args: | ||
| offset: seek offset as number. |
There was a problem hiding this comment.
Is this the offset of the compressed file or is it the offset after file is uncompressed ? Please clarify in the comment.
|
|
||
| Raises: | ||
| IOError: When this buffer is closed. | ||
| ValueError: When whence is invalid or the file is not seekable |
There was a problem hiding this comment.
Please mention what happens if offset is out of bound.
There was a problem hiding this comment.
Out of bound seek is explained in 'seeking behavior' section in this docstring. I'd prefer to keep it there as the behavior does not yield to exceptions.
|
|
||
| # Determine how many bytes needs to be read before we reach | ||
| # the requested offset. Rewind if we already passed the position. | ||
| if offset < self._uncompressed_position: |
There was a problem hiding this comment.
Shouldn't this be "absolute_offset < self._uncompressed_position" ?
There was a problem hiding this comment.
Very good catch! Fixed.
| # the requested offset. Rewind if we already passed the position. | ||
| if offset < self._uncompressed_position: | ||
| self._rewind() | ||
| bytes_to_skip = absolute_offset - self._uncompressed_position |
There was a problem hiding this comment.
Shouldn't we skip up to absolute_offset if we ran self._rewind()
There was a problem hiding this comment.
We do skip up to absolute_offset as _rewind sets _uncompressed_position to 0 through _rewind_file.
| read_size=self.read_block_size) | ||
| reference_fd = StringIO(self.content) | ||
|
|
||
| random_position = randint(0, len(self.content) - 1) |
There was a problem hiding this comment.
Let's run this for several fixed positions within the range (0, size - 1) instead of a random position to make the test consistent.
There was a problem hiding this comment.
Done: added several positional seeks around the boundaries and inside.
| self.assertEqual(uncompressed_position, expected_position) | ||
| self.assertEqual(reference_position, expected_position) | ||
|
|
||
| seek_position = randint(-1 * mid_point, mid_point) |
There was a problem hiding this comment.
Ditto. Let's run this for several fixed positions instead of a random position.
There was a problem hiding this comment.
Done. This exercise revealed an interesting behavior of cStringIO:
If we seek out of bound of a buffer through cStringIO then call tell() it will report
the out of bound position. If we do a read() or readline() prior to tell() then tell()
will report the correct position which is the size of the buffer.
CompressedFile's tell() never returns out of bound numbers. I've documented this difference in the testcase.
|
@chamikaramj : PTAL |
|
Refer to this link for build results (access rights to CI server needed): Build result: FAILURE[...truncated 2.18 MB...] at java.lang.Thread.run(Thread.java:745)Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution failed. at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:302) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) ... 31 moreCaused by: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404) at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166) at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:764) at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:711) at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:289) ... 33 more2017-04-01T15:14:49.415 [ERROR] 2017-04-01T15:14:49.415 [ERROR] Re-run Maven using the -X switch to enable full debug logging.2017-04-01T15:14:49.415 [ERROR] 2017-04-01T15:14:49.415 [ERROR] For more information about the errors and possible solutions, please read the following articles:2017-04-01T15:14:49.415 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException2017-04-01T15:14:49.415 [ERROR] 2017-04-01T15:14:49.415 [ERROR] After correcting the problems, you can resume the build with the command2017-04-01T15:14:49.415 [ERROR] mvn -rf :beam-sdks-pythonchannel stoppedSetting status of 9870ab0 to FAILURE with url https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/9048/ and message: 'Build finished. 'Using context: Jenkins: Maven clean install--none-- |
|
Could you fix the Jenkins (lint) failure ? LGTM other than that. |
|
@chamikaramj : Fixed the lint warnings. Thanks for the review! |
|
Refer to this link for build results (access rights to CI server needed): Failed Tests: 1beam_PreCommit_Java_MavenInstall/org.apache.beam:beam-runners-apex: 1--none-- |
|
retest this please (rerunning Jenkins) |
|
Refer to this link for build results (access rights to CI server needed): |
|
LGTM |
|
R: @aaltay regarding extra dependency |
|
I would argue against taking the additional dependency in this case. Historically we had issues with not-stable dependencies. And in this case:
For all of that, @tibkiss could you remove the extra dependency please. |
|
@aaltay : Thanks for the review. I've removed 'parameterized' dependency as you suggested. Please take a look. Thanks! |
|
Refer to this link for build results (access rights to CI server needed): |
|
LGTM. Thanks for the updates. I'll merge. |
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
[BEAM-<Jira issue #>] Description of pull requestmvn clean verify. (Even better, enableTravis-CI on your fork and ensure the whole test matrix passes).
<Jira issue #>in the title with the actual Jira issuenumber, if there is one.
Individual Contributor License Agreement.