Fix reader out of bounds#9

Merged
j-vizcaino merged 4 commits into master from fix-reader-out-of-bounds
Jan 21, 2020

Conversation


@j-vizcaino j-vizcaino commented Jan 20, 2020

The first commit illustrates the issue (see CI: https://circleci.com/gh/DataDog/golz4/36).

When decompressing data, the size of dst was not checked, and slicing it for writing caused an out-of-bounds panic.

The fix relies on a separate buffer to hold the decompressed bytes that cannot be written right away.
The implementation is fairly straightforward: on Read() either consume from the internal buffer or decompress more data.

The input data needs to be at least 512 bytes long.
Use an internal buffer to store decompressed data between Read() calls when the dst buffer is not big enough.
Member

@HippoBaro HippoBaro left a comment


Nice catch! However, IMO passing a read buffer of the right size is the responsibility of the caller.

For example, on the write path we do:

golz4/lz4.go

Lines 84 to 86 in 4bea358

if len(src) > streamingBlockSize+4 {
	return 0, fmt.Errorf("block is too large: %d > %d", len(src), streamingBlockSize+4)
}

Your solution adds pure overhead for callers that do the right thing.

@j-vizcaino
Author

@HippoBaro Thanks for the review, but I disagree with your point: there are lots of cases where the user has no control over the size of the buffer that is passed to the Read function. Just look at the unit test in 3a126b7: bytes.Buffer reads from the LZ4 reader, and we have no control over the size of the buffer.
Here is another real-world scenario that illustrates this: https://github.com/DataDog/dd-go/pull/17942/commits/1a38124a4c2a92f4d46c8b7bac18b14e6c878e33#diff-a48157fe8a7dcfeb8fa1bc89948fea81R17

Furthermore, I don't think the performance argument stands either: when the caller provides a buffer that is big enough, no pending slice gets created or used. Unless I missed something, the current implementation performs no extra data copies compared to the previous one.

FWIW, I first tried to patch the Read function to make it return io.ErrShortBuffer, as I thought this would make the caller grow the buffer and try again, but it turns out this is not handled correctly (by bytes.Buffer, at least).
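For reference, the grow-and-retry behavior that io.ErrShortBuffer implies would look roughly like this on the caller side. This is a hypothetical sketch (readGrowing and shortReader are illustrative names); as noted above, standard-library consumers such as bytes.Buffer do not implement this loop, which is why the PR buffers internally instead.

```go
package main

import (
	"fmt"
	"io"
)

// readGrowing reacts to io.ErrShortBuffer by doubling its buffer and
// retrying until the read succeeds or fails with a different error.
func readGrowing(r io.Reader, start int) ([]byte, error) {
	buf := make([]byte, start)
	for {
		n, err := r.Read(buf)
		if err == io.ErrShortBuffer {
			buf = make([]byte, 2*len(buf)) // dst too small: grow and retry
			continue
		}
		if err != nil {
			return nil, err
		}
		return buf[:n], nil
	}
}

// shortReader stands in for a decompressor that refuses undersized buffers.
type shortReader struct{ payload []byte }

func (s *shortReader) Read(dst []byte) (int, error) {
	if len(dst) < len(s.payload) {
		return 0, io.ErrShortBuffer
	}
	return copy(dst, s.payload), nil
}

func main() {
	out, _ := readGrowing(&shortReader{payload: []byte("decompressed block")}, 1)
	fmt.Println(string(out))
}
```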

@HippoBaro
Member

I'm not very opinionated on the topic, but I would use a buffered reader on the client side in this case.

https://golang.org/pkg/bufio/#NewReaderSize

@j-vizcaino
Author

While using a buffered reader would probably help, I do not think that having the reader know the max read size beforehand is a good pattern.
How is the reader supposed to know the maximum size a given inflate operation can return anyway?
And even if it did, based on how the byte stream was written, it would couple two components or apps for no obvious reason, making future changes harder than they should be.

@HippoBaro
Member

That's a fair point; my approach comes from using the lz4 library itself. It solves this problem by providing a function returning a buffer length to external users.

https://github.com/lz4/lz4/blob/c7ad96e299545330617e95eebc1369edd4e5fdf0/lib/lz4.h#L173-L182

I misread the code btw, you're not adding overhead; apologies.

@j-vizcaino
Author

> That's a fair point; my approach comes from using the lz4 library itself. It solves this problem by providing a function returning a buffer length to external users.
>
> https://github.com/lz4/lz4/blob/c7ad96e299545330617e95eebc1369edd4e5fdf0/lib/lz4.h#L173-L182

Thanks for the pointer but this function is only meant to be used when compressing (it is already provided by https://github.com/DataDog/golz4/blob/master/lz4.go#L50 FWIW).

This PR is about reads, not writes ;)
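For completeness, the write-path bound both comments refer to is the classic worst case from lz4.h, LZ4_COMPRESSBOUND(n) = n + n/255 + 16: the most that n incompressible input bytes can grow to when compressed. A Go transcription (lz4CompressBound is an illustrative name, not the golz4 API) is:

```go
package main

import "fmt"

// lz4CompressBound mirrors LZ4_COMPRESSBOUND from lz4.h: the worst-case
// compressed size for n input bytes (incompressible input grows slightly).
func lz4CompressBound(n int) int {
	return n + n/255 + 16
}

func main() {
	fmt.Println(lz4CompressBound(0))   // 16
	fmt.Println(lz4CompressBound(255)) // 272
}
```

Note this only bounds the compress direction; as the comment above says, there is no equivalent bound for how much a block may inflate on read.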

@HippoBaro
Member

All good then; your code works and does the job. :shipit: !

Collaborator

@zzzzssss zzzzssss left a comment


I think the fix gives the lz4 reader more general usage, without making the caller worry about the block size, so go ahead.

@j-vizcaino j-vizcaino merged commit bdbee0b into master Jan 21, 2020
@j-vizcaino j-vizcaino deleted the fix-reader-out-of-bounds branch January 21, 2020 17:18