[fix](audit) Mark internal query failures as ERR in audit log by yujun777 · Pull Request #62908 · apache/doris

yujun777 · 2026-04-28T10:09:16Z

What problem does this PR solve?

Problem Summary:
When an internal query inside StmtExecutor.executeInternalQuery() failed (for example, the column-statistics gather SQL that ANALYZE issues against a user table when the underlying tablet hits a BE IO error), the audit_log entry recorded:

state=OK | error_code=0 | error_message=<empty> | return_rows=0

This is misleading: the gather query actually failed, but the audit log makes it look like it succeeded with zero rows. ANALYZE itself still surfaces the failure to the user, but the per-internal-query audit entries hide the root cause, complicating triage.

Root cause: executeInternalQuery() wraps the inner work in try { ... } finally { AuditLogHelper.logAuditLog(context, ...) }, but the inner catch handlers only re-throw the exception and never set ConnectContext state to ERR. The default OK state is therefore what gets logged.

Fix: add an outer catch (Exception e) around the inner try that, when state has not already been moved to ERR, records ERR_INTERNAL_ERROR together with the message (falling back to root-cause message when the exception message is empty), then re-throws so callers behave as before. The setNereids/setIsQuery/setInternal flags are also moved above the parse step so audit entries for parse/plan failures still carry the right metadata.
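
The root cause and the fix can be illustrated with a minimal, self-contained sketch. This is not the Doris code: `QueryState`, `logAudit`, `describe`, `executeBuggy`, `executeFixed` and `auditLineFor` are hypothetical stand-ins for `ConnectContext` state, `AuditLogHelper.logAuditLog` and `executeInternalQuery()`. The value 1815 is taken from the manual test output in this PR. The sketch only shows why a `finally`-based audit call logs OK when nothing sets the error state, and how an outer `catch` that records the error before re-throwing changes the logged entry.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class InternalQueryAuditSketch {
    enum State { OK, ERR }

    // Assumption: value reported for ERR_INTERNAL_ERROR in the PR's manual test.
    static final int ERR_INTERNAL_ERROR_CODE = 1815;

    // Hypothetical stand-in for the connection state that the audit logger reads.
    static final class QueryState {
        State state = State.OK;
        int errorCode = 0;
        String errorMessage = "";

        void setError(int code, String message) {
            state = State.ERR;
            errorCode = code;
            errorMessage = message;
        }
    }

    // Hypothetical in-memory audit log, one line per internal query.
    static final List<String> AUDIT_LOG = new ArrayList<>();

    static void logAudit(QueryState s) {
        AUDIT_LOG.add("state=" + s.state + " | error_code=" + s.errorCode
                + " | error_message=" + s.errorMessage);
    }

    // Use the exception message, falling back to the root cause when it is empty.
    static String describe(Throwable e) {
        if (e.getMessage() != null && !e.getMessage().isEmpty()) {
            return e.getMessage();
        }
        Throwable root = e;
        while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
        }
        String msg = root.getMessage();
        return (msg != null && !msg.isEmpty()) ? msg : root.getClass().getSimpleName();
    }

    // Before: the inner handler only re-throws, so finally logs the default OK state.
    static <T> T executeBuggy(QueryState s, Callable<T> work) throws Exception {
        try {
            try {
                return work.call();
            } catch (Exception e) {
                throw e;
            }
        } finally {
            logAudit(s);
        }
    }

    // After: an outer catch records the error (unless already ERR), then re-throws.
    static <T> T executeFixed(QueryState s, Callable<T> work) throws Exception {
        try {
            try {
                return work.call();
            } catch (Exception e) {
                throw e;
            }
        } catch (Exception e) {
            if (s.state != State.ERR) {
                s.setError(ERR_INTERNAL_ERROR_CODE, describe(e));
            }
            throw e; // callers still see the failure
        } finally {
            logAudit(s);
        }
    }

    // Runs one variant and returns the audit line it produced.
    public static String auditLineFor(boolean fixed, Callable<Integer> work) {
        QueryState s = new QueryState();
        try {
            if (fixed) {
                executeFixed(s, work);
            } else {
                executeBuggy(s, work);
            }
        } catch (Exception expected) {
            // The failure still propagates; only the audit entry differs.
        }
        return AUDIT_LOG.get(AUDIT_LOG.size() - 1);
    }

    public static void main(String[] args) {
        // An exception with no message of its own, wrapping the real cause.
        Callable<Integer> failing = () -> {
            throw new RuntimeException(null, new IOException("IO_ERROR: read failed"));
        };
        System.out.println(auditLineFor(false, failing));
        // prints: state=OK | error_code=0 | error_message=
        System.out.println(auditLineFor(true, failing));
        // prints: state=ERR | error_code=1815 | error_message=IO_ERROR: read failed
    }
}
```

Because the outer handler re-throws, the caller's control flow is the same in both variants. The `s.state != State.ERR` guard keeps a more specific error that an inner layer may already have recorded.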

Release note

Internal query failures are now correctly recorded as ERR in audit_log instead of misleadingly showing OK with empty error info.

Check List (For Author)

Test:
- Unit Test: StmtExecutorInternalQueryTest#testExecuteInternalQuerySetsErrorStateOnFailure
- Regression test: fault_injection_p0/test_audit_log_internal_query_failure
- Manual test: reproduced on a local cluster with the LocalFileReader::read_at_impl.io_error debug point; before the fix audit_log shows state=OK / error_message=, after the fix it shows state=ERR / error_code=1815 / full IO_ERROR description.
Behavior changed: Yes (audit_log entries for failed internal queries now show ERR; previously they showed OK).
Does this need documentation: No.

Issue Number: https://jira.selectdb-in.cc/browse/CIR-20019

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

yujun777 · 2026-04-28T10:09:24Z

run buildall

yujun777 · 2026-04-28T10:09:27Z

/review

github-actions · 2026-04-28T10:48:22Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-04-28T10:48:25Z

PR approved by anyone and no changes requested.

morrySnow · 2026-04-29T02:27:47Z

/review

hello-stephen · 2026-04-29T02:46:10Z

skip buildall

hello-stephen · 2026-04-29T02:46:59Z

skip buildall


yujun777 marked this pull request as draft April 28, 2026 10:10

yujun777 marked this pull request as ready for review April 28, 2026 10:10

morrySnow approved these changes Apr 28, 2026


morrySnow added dev/4.0.x dev/4.1.x labels Apr 28, 2026

github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 28, 2026

github-actions Bot added the reviewed label Apr 28, 2026

hello-stephen merged commit 53c52f5 into apache:master Apr 29, 2026
39 checks passed

github-actions Bot mentioned this pull request Apr 29, 2026

branch-4.0: [fix](audit) Mark internal query failures as ERR in audit log #62908 #62919

Open

github-actions Bot mentioned this pull request Apr 29, 2026

branch-4.1: [fix](audit) Mark internal query failures as ERR in audit log #62908 #62920

Open

This was referenced May 5, 2026

branch-4.1: [fix](audit) Mark internal query failures as ERR in audit log #62996

Open

branch-4.0: [fix](audit) Mark internal query failures as ERR in audit log #62997

Open

hello-stephen merged 1 commit into apache:master from yujun777:fix/audit-log-internal-query-failure
