Compaction: Abort the job smartly when partial commit starts to fail #6579

@qinghui-xu

Feature Request / Improvement

We have a streaming pipeline serving (upsert) data to a table, and a Spark compaction job that rewrites files asynchronously.
The compaction job fails to commit whenever the streaming pipeline commits deletions against existing data. To address this, we enabled partial progress in the compaction job.
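For reference, partial progress is enabled through the rewrite options, e.g. via the Spark `rewrite_data_files` procedure (the catalog and table names below are placeholders):

```sql
CALL my_catalog.system.rewrite_data_files(
  table => 'db.my_table',
  options => map(
    'partial-progress.enabled', 'true',
    'partial-progress.max-commits', '10'
  )
)
```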

What we observe after enabling partial progress (say, with `partial-progress.max-commits = 10`):

  • The first few partial commits succeed
  • The streaming job commits a snapshot with an upsert
  • All subsequent partial commits fail

In our case, the streaming pipeline constantly writes to all partitions at once, so when the first partial commit fails because of a conflict, all subsequent partial commits are almost certain to fail as well. It would be nice to abort the job sooner instead of wasting resources on processing that is doomed to fail.

Proposal:
I can think of three modes for handling commit failures:

  • default: continue the compaction and attempt the remaining partial commits, as in the current behavior
  • conservative: abort the compaction as soon as the first commit failure happens; useful when we know in advance that subsequent commits are almost certain to fail as well
  • smart: inspect the commit conflict, determine the impacted files/partitions, and abort only the tasks that would be affected by the conflict
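To make the proposal concrete, here is a hypothetical sketch of the three modes — not Iceberg's actual rewrite implementation. `commit` stands in for one partial-commit attempt and is assumed to return `(ok, conflicted_partitions)`; all names are illustrative:

```python
# Hypothetical sketch of the three proposed failure-handling modes.
# `commit` stands in for a partial-commit attempt on one file group.

def run_partial_commits(groups, commit, mode="default"):
    """Try to commit each file group, handling failures per `mode`.

    groups: list of dicts like {"id": 1, "partitions": {"p0"}}
    mode:   "default"      - keep attempting every remaining group (current behavior)
            "conservative" - abort the whole job on the first commit failure
            "smart"        - skip only groups touching an already-conflicted partition
    Returns (committed_ids, skipped_or_failed_ids).
    """
    committed, skipped = [], []
    conflicted = set()  # partitions known to conflict with concurrent commits
    for i, group in enumerate(groups):
        if mode == "smart" and conflicted & group["partitions"]:
            skipped.append(group["id"])  # doomed: overlaps a known conflict
            continue
        ok, bad_partitions = commit(group)
        if ok:
            committed.append(group["id"])
            continue
        conflicted |= set(bad_partitions)
        skipped.append(group["id"])  # this group's commit failed
        if mode == "conservative":
            # Give up on everything that is left.
            skipped.extend(g["id"] for g in groups[i + 1:])
            break
    return committed, skipped
```

In this sketch, if a streaming upsert conflicts on one partition, smart mode still attempts groups on unaffected partitions but never wastes an attempt on a group that overlaps the known conflict.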

Query engine

None
