Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid corrupting data when calling seekend #149

Closed
wants to merge 2 commits into from

Conversation

nhz2
Copy link
Member

@nhz2 nhz2 commented Jun 14, 2023

Fixes #109

I changed the default seekstart to error when applied to a stream in :write mode, and reset the stream if it is in :read mode. It also calls seekstart on the wrapped IO.

seekend(s) should be equivalent to skip(s, big number) in which case it should currently error if the stream is not in :read mode.

Also changed skip to return the stream to be more consistent.

@nhz2
Copy link
Member Author

nhz2 commented Jun 22, 2023

@Marlin-Na Does this solve your issue? @quinnj @bicycle1885 This is ready for review.

The current behavior of seekend on a stream in :write mode can very easily lead to corrupt data.

For example currently:

julia> stream = GzipCompressorStream(open("foo.txt.gz"; write=true))
TranscodingStreams.TranscodingStream{GzipCompressor, IOStream}(<mode=idle>)

julia> write(stream, collect(1:10000))
80000

julia> seekend(stream)
TranscodingStreams.TranscodingStream{GzipCompressor, IOStream}(<mode=write>)

julia> write(stream, collect(10001:20000))
80000

julia> close(stream)

julia> reinterpret(Int,transcode(GzipDecompressor,read("foo.txt.gz")))
ERROR: zlib error: invalid block type (code: -3)

With this PR the error happens while writing, but no data is lost

julia> stream = GzipCompressorStream(open("foo.txt.gz"; write=true))
TranscodingStreams.TranscodingStream{GzipCompressor, IOStream}(<mode=idle>)

julia> write(stream, collect(1:10000))
80000

julia> seekend(stream)
ERROR: ArgumentError: not in read mode
Stacktrace:
 [1] seekend(stream::TranscodingStreams.TranscodingStream{GzipCompressor, IOStream})
   @ TranscodingStreams ~/juliadev/TranscodingStreams/src/stream.jl:308
 [2] top-level scope
   @ REPL[49]:1

julia> write(stream, collect(10001:20000))
80000

julia> close(stream)

julia> reinterpret(Int,transcode(GzipDecompressor,read("foo.txt.gz")))
20000-element reinterpret(Int64, ::Vector{UInt8}):
     1
     2
     3
     4
     5
     6
     
 19996
 19997
 19998
 19999
 20000

@nhz2 nhz2 changed the title Change default behavior of seekend and seekstart Avoid corrupting data when calling seekend Oct 11, 2023
Comment on lines +302 to +305
if mode == :read
while !eof(stream)
emptybuffer!(buffer1)
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to read the entire file here. Normally, when a user uses seek, it's specifically to avoid processing a whole file.
I understand that it may not be possible to seek into a compressed file and get the position of the decompressed stream, but I believe in that case, seekend should simply not be implemented, else we risk frustrating users when an operation they believe to be constant time suddenly takes ages.
We can then provide the functionality of reading the whole file under a function with another name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that it is much better for Base.seekend(stream::TranscodingStream) not to be implemented than to silently do the wrong thing. I'm not sure how to go about removing this method though. I created a PR JuliaIO/CodecZlib.jl#68 to remove its use in CodecZlib.jl's tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can split this PR into three parts, one for each change, to make reviewing easier.

@nhz2 nhz2 marked this pull request as draft October 16, 2023 14:54
@nhz2 nhz2 closed this Mar 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

seekstart/seek/seekend leads to invalid position
2 participants