
Clear decoder buffer #166

Merged · 6 commits merged into master from nate/clear-decoder-buffer on Jun 7, 2024
Conversation

@n8maninger (Member) commented Jun 6, 2024

Ensures the decoder's buffer is cleared when Read encounters an error. This should fix the OOM issues we're seeing when syncing Mainnet. It may also fix SiaFoundation/hostd#406, but I haven't looked deeply into that one yet.

@n8maninger (Member, Author) commented Jun 6, 2024

@lukechampine I think there's another potential attack vector: a huge length prefix for an array can be sent that is technically smaller than `d.lr.N`. Since we use `make([]foo, d.ReadPrefix())` everywhere, the decoder allocates the entire slice immediately, which could expand significantly.

For example, when decoding a block sent by a peer, `d.lr.N` is 50 MB. That would enable a slice like `block.FileContracts` to allocate up to 50M items. I propose changing `ReadPrefix` to `ReadPrefix(max int64)` to prevent that. An alternative is adding `MaxSize()` to `DecoderFrom`.
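A minimal sketch of what the proposed `ReadPrefix(max int64)` might look like (hypothetical, not the actual diff; it assumes the Decoder's existing `ReadUint64` and `SetErr` helpers in types/encoding.go):

```go
// Hypothetical bounded version of ReadPrefix. The caller passes the largest
// element count that makes sense for the field being decoded; anything
// larger is rejected before make() can allocate it.
func (d *Decoder) ReadPrefix(max int64) int {
	n := d.ReadUint64()
	if n > uint64(max) {
		d.SetErr(fmt.Errorf("length prefix %v exceeds maximum %v", n, max))
		return 0
	}
	return int(n)
}
```

A call site could then read `fcs := make([]types.FileContract, d.ReadPrefix(maxFileContracts))`, with `maxFileContracts` being a hypothetical per-field cap.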

@lukechampine (Member) commented:

> For example, when decoding a block sent by a peer, `d.lr.N` is 50 MB. That would enable a slice like `block.FileContracts` to allocate up to 50M items. I propose changing `ReadPrefix` to `ReadPrefix(max int64)` to prevent that. An alternative is adding `MaxSize()` to `DecoderFrom`.

The old encoder took a different approach: https://github.com/NebulousLabs/Sia/blob/master/encoding/marshal.go#L341

```go
// NextPrefix is like NextUint64, but performs sanity checks on the prefix.
// Specifically, if the prefix multiplied by elemSize exceeds MaxSliceSize,
// NextPrefix returns 0 and sets d.Err().
func (d *Decoder) NextPrefix(elemSize uintptr) uint64 {
```

But this kinda sucked, because your call sites end up looking like:

```go
addrs := make([]types.Address, d.NextPrefix(unsafe.Sizeof(types.Address{})))
```

i.e. you had to import `unsafe` all over the place (or hardcode a value). I would be open to a generic version that calls `unsafe.Sizeof` internally, so you could write:

```go
addrs := d.ReadSlice[[]types.Address]()
```

...but Go doesn't support generic methods, so I guess you'd have to do `types.ReadSlice[[]types.Address](d)`?
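A sketch of one possible shape for that helper, parameterized over the element type rather than the slice type (so the call would read `types.ReadSlice[types.Address](d)`). It assumes `ReadPrefix`, `SetErr`, and the unexported `lr` field are reachable because the function lives in the types package, and it uses a pointer-constrained type parameter so `DecodeFrom` can have a pointer receiver:

```go
// ReadSlice is a hypothetical package-level generic helper; Go doesn't
// support generic methods, so it can't hang off *Decoder directly. The
// sanity check mirrors the old NextPrefix: the claimed element count times
// the element size must fit in the bytes remaining on the wire (d.lr.N),
// so a malicious prefix can't force an allocation larger than the message.
func ReadSlice[T any, TP interface {
	*T
	DecoderFrom
}](d *Decoder) []T {
	n := uint64(d.ReadPrefix())
	if elemSize := uint64(unsafe.Sizeof(*new(T))); elemSize > 0 && n > uint64(d.lr.N)/elemSize {
		d.SetErr(fmt.Errorf("encoded slice length (%v) exceeds remaining input", n))
		return nil
	}
	s := make([]T, n)
	for i := range s {
		TP(&s[i]).DecodeFrom(d)
	}
	return s
}
```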

Backing up, though: I did consider this issue when I implemented the decoder. My feeling at the time was that, although it does allow amplification attacks, the amplification is linear, and in practice the scaling factor is not enormous; the biggest is probably `V2FileContract`, at 368 bytes (if my math is correct). So yeah, if you're decoding a 50 MB `SendBlocks` buffer, that can potentially blow up to an 18.4 GB allocation (50M elements × 368 bytes). Not great.

But is this actually a problem? When you call `make([]byte, HUGE_NUMBER)`, the OS doesn't immediately reserve all that memory. To a first approximation, you only pay for what you use. Here's a Go Playground where I allocate an 18 GB slice and modify parts of it. You can see it runs instantly. If you change it to modify the entire slice, it errors out. But in the context of our decoder, this is totally fine: the attack can make us allocate a bunch of memory, but it can't make us write to that memory unless the encoded message is just as big. So in practice, we'll "allocate" 18 GB, page ~50 MB of it into RAM, and then the decoder will hit EOF and the slice will get garbage-collected.
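A self-contained version of that playground experiment (behavior assumed to match a typical overcommitting Linux/macOS setup): the huge `make` reserves virtual address space, but physical pages are only committed as they're written, so touching a handful of pages completes instantly while writing the whole slice would actually consume 18 GB.

```go
package main

import "fmt"

func main() {
	// "Allocate" an 18 GB slice. The runtime reserves virtual address
	// space, but the OS commits physical pages lazily, so this returns
	// immediately on a typical overcommitting system.
	b := make([]byte, 18<<30)

	// Touch one byte per GiB: only these ~18 pages get backed by RAM.
	for i := 0; i < len(b); i += 1 << 30 {
		b[i] = 1
	}
	fmt.Println("allocated", len(b), "bytes; touched only a few pages")

	// Writing every byte would actually commit all 18 GB and likely OOM:
	//   for i := range b { b[i] = 1 }
}
```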

@n8maninger merged commit c89664e into master on Jun 7, 2024 · 8 checks passed
@n8maninger deleted the nate/clear-decoder-buffer branch on Jun 7, 2024 16:11
Successfully merging this pull request may close: "Figure out what is causing hostd to return massive OutputLength in RPCExecuteProgramResponse"