
[BUG] Potential block on compaction on read from S3 timeout #2108

@SCNieh

Description

How does this happen

Due to the memory limit on SSO compaction, all the data blocks that need to be compacted are grouped and split into multiple iterations, each of which stays under the memory limit.
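A minimal sketch of that grouping step, assuming a size-annotated block type (`CompactedBlock`, `splitIntoIterations`, and `memoryLimit` are hypothetical names for illustration, not the actual identifiers in this repository):

```java
import java.util.ArrayList;
import java.util.List;

public class IterationPlanner {
    // Hypothetical stand-in for a data block selected for compaction.
    record CompactedBlock(long size) {}

    // Greedily pack blocks into iterations so each iteration's total
    // size stays under the compaction memory limit.
    static List<List<CompactedBlock>> splitIntoIterations(List<CompactedBlock> blocks, long memoryLimit) {
        List<List<CompactedBlock>> iterations = new ArrayList<>();
        List<CompactedBlock> current = new ArrayList<>();
        long currentSize = 0;
        for (CompactedBlock block : blocks) {
            if (!current.isEmpty() && currentSize + block.size() > memoryLimit) {
                iterations.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(block);
            currentSize += block.size();
        }
        if (!current.isEmpty()) {
            iterations.add(current);
        }
        return iterations;
    }
}
```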

At the end of each iteration, the compaction thread waits for all the data blocks to be successfully uploaded to S3 before clearing the cache and executing the next iteration. However, if the data written into the underlying MultiPartWriter is smaller than MIN_PART_SIZE, which is 5 MB by default, the data remains in memory until the writer is closed or the accumulated data size exceeds that limit.
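To make that buffering behavior concrete, here is a hedged sketch of how a multipart writer typically handles small writes (the field and method names are illustrative, not the actual MultiPartWriter API):

```java
public class MultiPartWriterSketch {
    // 5 MiB: the S3 minimum part size for every part except the last.
    static final long MIN_PART_SIZE = 5L * 1024 * 1024;

    private long bufferedBytes = 0;

    // Writes below MIN_PART_SIZE only accumulate in memory; no part upload
    // is issued, so no upload-complete signal will ever fire for them.
    void write(byte[] data) {
        bufferedBytes += data.length;
        if (bufferedBytes >= MIN_PART_SIZE) {
            uploadPart();      // data actually reaches S3 only here
            bufferedBytes = 0;
        }
    }

    // Closing flushes the final, possibly undersized, part. Since the writer
    // is closed only once at the end of compaction, a waiter in the middle of
    // compaction cannot rely on this flush happening.
    void close() {
        if (bufferedBytes > 0) {
            uploadPart();
        }
    }

    private void uploadPart() {
        // Issue an S3 UploadPart request (stubbed for this sketch).
    }
}
```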

Since the writer can only be closed at the end of compaction, in this case we need to move on to the next iteration to avoid waiting endlessly for data that will never be uploaded.

However, the current implementation only handles the normal case, not the situation where the underlying cfs (CompletableFutures) complete exceptionally; when that happens, the iteration cannot move on and the thread blocks forever.
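A minimal sketch of the hazard and one way to guard against it, assuming the iteration waits on a list of upload CompletableFutures (`awaitIteration` and the timeout value are illustrative, not the actual fix in this repository):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class CompactionWaitSketch {
    static void awaitIteration(List<CompletableFuture<Void>> uploadCfs) throws Exception {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(uploadCfs.toArray(new CompletableFuture[0]));
        try {
            // Bound the wait: if an S3 read timeout leaves some future
            // never completed, a bare join() here would block forever.
            all.get(5, TimeUnit.MINUTES);
        } catch (Exception e) {
            // Propagate the failure to every still-pending future (a no-op
            // for futures that already completed) so downstream stages see
            // it, then rethrow to abort the compaction instead of hanging.
            uploadCfs.forEach(cf -> cf.completeExceptionally(e));
            throw e;
        }
    }
}
```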
