[fix](fe) Mask sensitive headers in stream load logs #62108

Merged
gavinchou merged 2 commits into apache:master from liaoxin01:fix-streamload-mask-headers-master
Apr 17, 2026

@liaoxin01
Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: FE stream load REST logs printed full request headers, which could leak Authorization and token values into INFO logs.

Release note

None

Check List (For Author)

  • Test: No completed automated test run in this environment. I attempted a FE build in a worktree and reached fe-core, but did not wait for full FE packaging to finish.
  • Behavior changed: No.
  • Does this need documentation: No

Copilot AI review requested due to automatic review settings April 3, 2026 16:02
@Thearas
Contributor

Thearas commented Apr 3, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

Copilot AI left a comment

Pull request overview

This PR reduces the risk of credential leakage by masking selected sensitive HTTP request headers when FE logs stream load REST requests.

Changes:

  • Mask values for a small set of sensitive headers (e.g., Authorization, token) in getAllHeaders().
  • Add isSensitiveHeader() helper to centralize the masking decision.
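The two changes listed above can be pictured with a minimal sketch. It is not the code from this PR: the class name, the mask string, the exact header set, and the use of a plain `Map` in place of the servlet request are assumptions made to keep the example self-contained. Only the method names `getAllHeaders()` and `isSensitiveHeader()` and the header names Authorization and token come from the PR text.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

final class HeaderMaskingSketch {
    // Assumed header set; the PR text names Authorization and token as examples.
    private static final Set<String> SENSITIVE_HEADERS = Set.of("authorization", "token");
    // Assumed placeholder; the real mask string is not shown in the PR text.
    private static final String MASK = "***MASKED***";

    // Single place that decides whether a header value may be logged.
    // HTTP header names are case-insensitive, so compare in lower case.
    public static boolean isSensitiveHeader(String name) {
        return name != null && SENSITIVE_HEADERS.contains(name.toLowerCase(Locale.ROOT));
    }

    // Builds the string that would be written to the log, masking sensitive values.
    public static String getAllHeaders(Map<String, String> headers) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Map.Entry<String, String> e : headers.entrySet()) {
            String value = isSensitiveHeader(e.getKey()) ? MASK : e.getValue();
            joiner.add(e.getKey() + ": " + value);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", "Basic cm9vdDo=");
        headers.put("label", "load_1");
        // prints "Authorization: ***MASKED***, label: load_1"
        System.out.println(getAllHeaders(headers));
    }
}
```

Keeping the decision in one helper means a later addition to the header set touches a single constant rather than every logging call site.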

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: FE stream load REST logs printed full request headers, which could leak Authorization and token values into INFO logs.

### Release note

None

### Check List (For Author)

- Test: No need to test (log sanitization only; no completed automated test run in this environment)
- Behavior changed: No
- Does this need documentation: No
@liaoxin01 liaoxin01 force-pushed the fix-streamload-mask-headers-master branch from c55096e to 012d589 April 3, 2026 16:14
@dataroaring
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 29072 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 012d5896d8b4d9ce6cd66e6b7c0e9d4d651d03fc, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17598	3689	3694	3689
q2	q3	10691	851	595	595
q4	4687	459	368	368
q5	7448	1350	1149	1149
q6	186	166	134	134
q7	903	952	783	783
q8	9316	1439	1275	1275
q9	5530	5350	5272	5272
q10	6307	2033	1775	1775
q11	485	269	271	269
q12	639	408	279	279
q13	18066	2786	2149	2149
q14	280	287	263	263
q15	q16	890	855	787	787
q17	1090	1116	903	903
q18	6359	5688	5619	5619
q19	1322	1216	1101	1101
q20	576	415	300	300
q21	5082	2421	2024	2024
q22	470	413	338	338
Total cold run time: 97925 ms
Total hot run time: 29072 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4622	4353	4334	4334
q2	q3	4639	4726	4153	4153
q4	2037	2102	1362	1362
q5	4928	4997	5144	4997
q6	196	171	136	136
q7	2011	1799	1639	1639
q8	3267	3082	3077	3077
q9	8243	8334	8381	8334
q10	4459	4511	4303	4303
q11	624	462	405	405
q12	676	709	501	501
q13	2858	3095	2353	2353
q14	306	305	281	281
q15	q16	746	773	690	690
q17	1258	1327	1201	1201
q18	7949	7012	6930	6930
q19	1156	1161	1119	1119
q20	2231	2188	1930	1930
q21	6057	5333	4799	4799
q22	537	511	441	441
Total cold run time: 58800 ms
Total hot run time: 52985 ms

### What problem does this PR solve?

Issue Number: None

Related PR: apache#62108

Problem Summary: Expand stream load header masking to cover cookie headers and add a regression test for sensitive header masking.

### Release note

Mask Cookie and Set-Cookie headers in FE stream load logs.

### Check List (For Author)

- Test: FE unit test
    - ./run-fe-ut.sh --run org.apache.doris.httpv2.rest.LoadActionTest
- Behavior changed: Yes (Cookie and Set-Cookie headers are now masked in logs)
- Does this need documentation: No
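
The regression check described in this commit message might look roughly like the sketch below. It is an illustration only and not the actual `LoadActionTest`: the class name, the `mask` helper, the `***` placeholder, and the sample header values are all hypothetical, and plain checks are used instead of a test framework so the example runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class CookieMaskingCheck {
    // Assumed set after this commit: the earlier examples plus the two cookie headers.
    static final Set<String> SENSITIVE =
            Set.of("authorization", "token", "cookie", "set-cookie");

    // Hypothetical stand-in for the FE helper: returns a copy with sensitive values replaced.
    public static Map<String, String> mask(Map<String, String> headers) {
        Map<String, String> out = new LinkedHashMap<>();
        headers.forEach((name, value) ->
                out.put(name, SENSITIVE.contains(name.toLowerCase(Locale.ROOT)) ? "***" : value));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("Cookie", "session=abc123");
        in.put("Set-Cookie", "session=abc123; HttpOnly");
        in.put("format", "csv");

        String logged = mask(in).toString();
        // The secret must not appear anywhere in what would be logged.
        if (logged.contains("abc123")) {
            throw new AssertionError("cookie value leaked: " + logged);
        }
        // Non-sensitive headers should pass through unchanged.
        if (!logged.contains("format=csv")) {
            throw new AssertionError("non-sensitive header lost: " + logged);
        }
        System.out.println(logged);
    }
}
```

Asserting on the final logged string, rather than on the helper's return value for a single header, also catches a leak introduced by any later formatting step.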
@liaoxin01
Contributor Author

run buildall

@liaoxin01
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

One blocking issue remains, so I cannot approve this PR yet.

Critical checkpoint conclusions:

  • Goal of the task: Partially accomplished. The patch masks sensitive headers in getAllHeaders() and adds a focused FE unit test, but it does not fully prevent sensitive credential exposure because the invalid cluster-token path still echoes the raw token back in the UnauthorizedException message.
  • Modification size/focus: Yes. The change is small and focused on stream-load header logging.
  • Concurrency: Not applicable. No shared-state or locking changes are introduced here.
  • Lifecycle/static initialization: Not applicable.
  • Configuration changes: None.
  • Compatibility/incompatible change: None.
  • Functionally parallel code paths: Not fully handled. The same raw-token echo pattern still exists in this controller flow and in the matching StreamingJobAction path.
  • Special conditional checks: The new sensitive-header filter is straightforward, but because this is security-sensitive, partial masking is insufficient when another path still exposes the same secret.
  • Test coverage: Improved but incomplete. The added FE unit test verifies header masking, but there is no coverage for the invalid-token/error-response path that still leaks the credential.
  • Observability: Sufficient for this scope; no extra metrics/logging needed.
  • Transaction/persistence: Not applicable.
  • Data write/modification correctness: Not applicable.
  • FE-BE variable passing: Not applicable.
  • Performance: No meaningful concern in this change.
  • Other issues: Blocking security issue described inline.

Please remove raw token values from unauthorized/error messages before this is approved.
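
For illustration, the request above could be satisfied along the lines of the following sketch. It is not taken from the Doris code base: `TokenErrorSketch`, `checkClusterToken`, the nested exception class, and the message text are hypothetical stand-ins, and the sketch only shows the idea of reporting an invalid token without echoing its value.

```java
final class TokenErrorSketch {
    // Hypothetical stand-in for the FE's UnauthorizedException.
    static final class UnauthorizedException extends RuntimeException {
        UnauthorizedException(String message) {
            super(message);
        }
    }

    // Hypothetical check: compares the provided token with the expected one.
    public static void checkClusterToken(String expected, String provided) {
        if (expected == null || !expected.equals(provided)) {
            // Report only that the token is invalid; the provided value is
            // deliberately left out of the message so it cannot reach logs
            // or HTTP error responses.
            throw new UnauthorizedException("Invalid token");
        }
    }

    public static void main(String[] args) {
        try {
            checkClusterToken("expected-token", "wrong-token");
        } catch (UnauthorizedException e) {
            // prints "Invalid token"
            System.out.println(e.getMessage());
        }
    }
}
```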

@liaoxin01 liaoxin01 dismissed github-actions[bot]’s stale review April 15, 2026 03:51

An invalid token doesn't need to be masked.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 17, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@gavinchou gavinchou merged commit 4c421c0 into apache:master Apr 17, 2026
32 of 33 checks passed
github-actions Bot pushed a commit that referenced this pull request Apr 17, 2026
FE stream load REST logs printed full request headers, which could leak Authorization and token values into INFO logs.

Changes:
- Mask values for a small set of sensitive headers (e.g., Authorization, token) in getAllHeaders()
- Add isSensitiveHeader() helper to centralize the masking decision
yiguolei pushed a commit that referenced this pull request Apr 20, 2026
… (#62594)

Cherry-picked from #62108

Co-authored-by: Xin Liao <liaoxin@selectdb.com>
Labels

approved Indicates a PR has been approved by one committer. dev/4.0.x dev/4.1.1-merged reviewed
