
branch-4.0: [fix](streaming-job) Fix PG replication slot leak when streaming task is cancelled during pause/resume #62010 #62736

Merged
yiguolei merged 1 commit into branch-4.0 from auto-pick-62010-branch-4.0 on May 9, 2026


@github-actions
Contributor

Cherry-picked from #62010

… is cancelled during pause/resume (#62010)

### What problem does this PR solve?

Problem Summary:

When a PostgreSQL CDC streaming job is paused and resumed, the PG replication slot can be permanently leaked, causing all subsequent tasks to fail with:
`replication slot "doris_cdc_xxx" is active for PID xxx`

**Root cause:**

The CDC client reuses a single `SourceReader` instance per jobId (`Env.getOrCreateReader`). When FE cancels a task (PAUSE), the BE HTTP connection is closed, but the CDC client's `buildStreamRecords` thread may still be blocked in `pollRecords` (for up to the 15s timeout). If the new task (after RESUME) reaches the same CDC client before the old task finishes, it calls `prepareStreamSplit`, which overwrites `this.streamReader` with a new Fetcher without closing the old one. The old Debezium reader, which holds the PG replication connection, is leaked: its reference is lost, so `finishSplitRecords` in the old task's finally block closes the new Fetcher instead, and the PG slot is never released.
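To make the interleaving concrete, here is a minimal single-threaded Java sketch of the pre-fix behavior. `SketchFetcher`, `LeakySourceReader` and `LeakDemo` are hypothetical stand-ins, not Doris classes. The two `prepareStreamSplit` calls model the old task and the resumed task reaching the same shared reader, and the later `finishSplitRecords` call models the old task's finally block running after the new task has already replaced the field.

```java
// Hypothetical demo: replays the PAUSE/RESUME interleaving sequentially.
class LeakDemo {
    public static void main(String[] args) {
        LeakySourceReader reader = new LeakySourceReader(); // one shared instance per jobId
        SketchFetcher oldTaskFetcher = reader.prepareStreamSplit(); // task started before PAUSE
        SketchFetcher newTaskFetcher = reader.prepareStreamSplit(); // task after RESUME, old task still polling
        reader.finishSplitRecords(); // old task's finally block runs last

        System.out.println("old fetcher closed: " + oldTaskFetcher.closed); // false: slot stays active
        System.out.println("new fetcher closed: " + newTaskFetcher.closed); // true: the wrong fetcher was closed
    }
}

// Hypothetical stand-in for the Debezium-backed Fetcher; it only records whether close() ran.
class SketchFetcher implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true; // in the real client this would release the PG replication connection
    }
}

// Hypothetical model of the shared reader before the fix.
class LeakySourceReader {
    private SketchFetcher streamReader;

    SketchFetcher prepareStreamSplit() {
        // Bug being illustrated: the previous fetcher is overwritten without being closed.
        streamReader = new SketchFetcher();
        return streamReader;
    }

    void finishSplitRecords() {
        // Closes whatever the field points to now, which may belong to a newer task.
        if (streamReader != null) {
            streamReader.close();
        }
    }
}
```

Running `LeakDemo` shows the reference to the first fetcher being lost while the stale cleanup closes the second one.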

From the logs, the slot remained occupied for 25+ minutes until the test timed out:
`Failed to start replication stream at LSN{0/318EBC8}; when setting up multiple connectors for the same database host, please make sure to use a distinct replication slot name for each.`

**Fix:**

Close the previous stream/binlog reader before creating a new one in `prepareStreamSplit` (PG) and `prepareBinlogSplit` (MySQL). This ensures the old Debezium connection is properly released when a new task reuses the same `SourceReader`.
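The fix pattern can be sketched the same way. In this hypothetical `FixedSourceReader` (not the actual Doris code), `prepareStreamSplit` closes whatever reader the field still holds before replacing it; the same idea would apply to `prepareBinlogSplit` on the MySQL side. Swallowing close failures in the hypothetical `closeQuietly` helper is an assumption of the sketch, and the sketch only models releasing the previous reader, not the rest of the task lifecycle.

```java
// Hypothetical demo: the same PAUSE/RESUME sequence against the fixed reader.
class FixDemo {
    public static void main(String[] args) {
        FixedSourceReader reader = new FixedSourceReader();
        StreamFetcher beforePause = reader.prepareStreamSplit();
        StreamFetcher afterResume = reader.prepareStreamSplit(); // closes beforePause first
        System.out.println("old reader closed: " + beforePause.closed); // true
        System.out.println("new reader closed: " + afterResume.closed); // false
    }
}

// Hypothetical stand-in for the Debezium-backed stream reader.
class StreamFetcher implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true; // would release the PG replication connection and its slot
    }
}

// Hypothetical model of the shared per-jobId reader with the fix applied.
class FixedSourceReader {
    private StreamFetcher streamReader;

    StreamFetcher prepareStreamSplit() {
        // Fix: release the previous reader before replacing the reference to it.
        closeQuietly(streamReader);
        streamReader = new StreamFetcher();
        return streamReader;
    }

    // Assumption: a failure to close the old reader should not block the new task.
    private static void closeQuietly(AutoCloseable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (Exception e) {
            // a real implementation would log this
        }
    }
}
```

Because the close happens at the moment the reference is overwritten, the old connection is released even if the old task never reaches its own cleanup.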
@yiguolei
Contributor

yiguolei commented May 7, 2026

run buildall

yiguolei closed this on May 9, 2026
yiguolei reopened this on May 9, 2026
@github-actions
Contributor Author

github-actions Bot commented May 9, 2026

PR approved by at least one committer and no changes requested.

github-actions Bot added the approved and reviewed labels on May 9, 2026
@github-actions
Contributor Author

github-actions Bot commented May 9, 2026

PR approved by anyone and no changes requested.

yiguolei merged commit 2375fb4 into branch-4.0 on May 9, 2026
39 of 43 checks passed
github-actions Bot deleted the auto-pick-62010-branch-4.0 branch on May 9, 2026 08:21

Labels: approved (Indicates a PR has been approved by one committer.), reviewed


2 participants