
[SPARK-57262][SQL][WEBUI] Job description derived from a query should respect spark.sql.redaction.string.regex #56326

Closed
sarutak wants to merge 2 commits into apache:master from sarutak:fix-redact-sql-description


@sarutak
Member

@sarutak sarutak commented Jun 4, 2026

What changes were proposed in this pull request?

This PR changes `SparkSQLDriver.scala` to redact the query string (using `spark.sql.redaction.string.regex`) before passing it to `setJobDescription`.
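
A minimal sketch of the idea, assuming a regex-based replacement similar in spirit to Spark's internal redaction: the query text is redacted first and only the redacted string is handed to the job-description setter, so whatever reads the description later never sees the raw query. `JobDescriptionSink`, `QueryRedaction`, `redactQuery`, `describeQuery`, and the replacement marker are hypothetical names chosen for illustration; they are not Spark APIs.

```scala
import scala.util.matching.Regex

// Hypothetical stand-in for wherever the job description ends up
// (in Spark this would be SparkContext.setJobDescription).
trait JobDescriptionSink {
  def setJobDescription(value: String): Unit
}

object QueryRedaction {
  // Assumed marker text; Spark uses its own internal replacement string.
  val Replacement = "*********(redacted)"

  // Replace every match of the configured regex; no regex means no redaction.
  def redactQuery(regex: Option[Regex], sql: String): String = regex match {
    case Some(r) if sql != null && sql.nonEmpty =>
      r.replaceAllIn(sql, Regex.quoteReplacement(Replacement))
    case _ => sql
  }

  // Redact first, then publish: the sink only ever receives the redacted text.
  def describeQuery(sink: JobDescriptionSink, regex: Option[Regex], sql: String): Unit =
    sink.setJobDescription(redactQuery(regex, sql))
}
```

With the regex from the manual test below (`secret.*=.*`), `SELECT * FROM test1 WHERE secret=1` would be published as `SELECT * FROM test1 WHERE *********(redacted)`, while `CREATE TABLE test1(secret string)` contains no `=` after `secret` and would be left unchanged.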

Why are the changes needed?

In the current implementation, redaction is done in `SQLExecution.scala`, so the description in the table at the top of the `/SQL/execution` page is redacted.
[Screenshot: sql-execution-page-top-table]

However, the descriptions in the table on the `/jobs` page and in the table at the bottom of the `/SQL/execution` page are not redacted.
[Screenshots: jobs-page-before, sql-execution-page-before]

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Added a new test.
Also confirmed manually that the descriptions are redacted in the UI:

```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)> CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
[Screenshots: jobs-page-after, sql-execution-page-after]

Was this patch authored or co-authored using generative AI tooling?

Kiro CLI / Claude

@sarutak sarutak changed the title [SPARK-57262][SQL] Job description derived from a query should respect spark.sql.redaction.string.regex [SPARK-57262][SQL][WEBUI] Job description derived from a query should respect spark.sql.redaction.string.regex Jun 4, 2026
@dongjoon-hyun
Member

cc @cloud-fan, too

@sarutak sarutak closed this in 583e5bb Jun 6, 2026
sarutak added a commit that referenced this pull request Jun 6, 2026
… respect `spark.sql.redaction.string.regex`

### What changes were proposed in this pull request?
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.

### Why are the changes needed?
In the current implementation, redaction is done in `SQLExecution.scala` so the description in the table on the top of `/SQL/execution` is redacted.
<img width="1083" height="349" alt="sql-execution-page-top-table" src="https://github.com/user-attachments/assets/b06fb255-2b46-473d-9046-1b2d578e3bda" />

But the description in the table on the `/jobs` page and the one in the table on the bottom of `/SQL/execution` page are not redacted.
<img width="525" height="692" alt="jobs-page-before" src="https://github.com/user-attachments/assets/0a5a8ce8-e4be-4669-bd7d-a6c62fe316ca" />
<img width="515" height="274" alt="sql-execution-page-before" src="https://github.com/user-attachments/assets/bd0406cc-5b0b-40a0-96c4-9f9fa1aa048a" />

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added new test.
Also confirmed descriptions are redacted in UI.
```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)>  CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
<img width="852" height="690" alt="jobs-page-after" src="https://github.com/user-attachments/assets/8e28e37e-369f-479c-9711-999b431756db" />
<img width="598" height="272" alt="sql-execution-page-after" src="https://github.com/user-attachments/assets/cb734556-619b-45c6-a7f6-d52e60132aff" />

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude

Closes #56326 from sarutak/fix-redact-sql-description.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
(cherry picked from commit 583e5bb)
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
sarutak added a commit that referenced this pull request Jun 6, 2026
@sarutak
Member Author

sarutak commented Jun 6, 2026

Merged to master/branch-4.x/branch-4.2. Will open backport PRs for other branches later.
Thank you @dongjoon-hyun for reviewing!

Member

@dongjoon-hyun dongjoon-hyun left a comment


Oh, very sorry, @sarutak.

I found a critical regression in the following use case. For SQL plan executions that do not go through `SparkSQLDriver`, this PR loses the security redaction. For the record, Hive JDBC and Spark Connect have their own redaction, so they are not affected by this regression.

```
$ bin/spark-shell -c spark.sql.redaction.string.regex="secret.*=.*"
scala> val s = "SELECT * FROM (SELECT 'secret=1')"
scala> sc.setJobDescription(s)
scala> sql(s).show()
+--------+
|secret=1|
+--------+
|secret=1|
+--------+
```

BEFORE (4.2.0-preview5): [screenshot]

AFTER (this PR): [screenshot]

Let me revert this commit, @sarutak, because, as you know from the security channel, we are under security scans these days.
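
The regression comes down to where redaction happens. A minimal sketch, using hypothetical names (`RedactionPlacement`, `RecordingSink`, `driverPath`, `directPath`) and an assumed replacement marker: if only one caller (the `SparkSQLDriver`-like path) redacts before setting the description, a caller that sets the description directly, such as `sc.setJobDescription(s)` in `spark-shell`, publishes the raw text. Redacting at a shared layer that every description passes through covers both callers. This only illustrates the trade-off; it does not model Spark's actual internals.

```scala
import scala.util.matching.Regex
import scala.collection.mutable.ArrayBuffer

object RedactionPlacement {
  // Assumed marker text for illustration.
  val Replacement = "*********(redacted)"

  def redact(regex: Option[Regex], text: String): String = regex match {
    case Some(r) if text != null && text.nonEmpty =>
      r.replaceAllIn(text, Regex.quoteReplacement(Replacement))
    case _ => text
  }

  // Hypothetical UI-facing store of job descriptions.
  // When redactOnWrite is set, redaction happens here, for every caller.
  final class RecordingSink(redactOnWrite: Option[Regex]) {
    val shown: ArrayBuffer[String] = ArrayBuffer.empty[String]
    def setJobDescription(value: String): Unit =
      shown.append(redact(redactOnWrite, value))
  }

  // Caller-side redaction: only this path (like the spark-sql driver) is protected.
  def driverPath(sink: RecordingSink, regex: Option[Regex], sql: String): Unit =
    sink.setJobDescription(redact(regex, sql))

  // A user setting the description directly (like sc.setJobDescription in spark-shell).
  def directPath(sink: RecordingSink, description: String): Unit =
    sink.setJobDescription(description)
}
```

With caller-side redaction only, `directPath` leaks `SELECT * FROM (SELECT 'secret=1')` unchanged; with `RecordingSink(Some("secret.*=.*".r))`, both paths record `SELECT * FROM (SELECT '*********(redacted)`. Redacting twice is harmless in this sketch because the marker text itself does not match the regex.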

@dongjoon-hyun
Member

This has been reverted from master/branch-4.x/branch-4.2.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 6, 2026

For the record, the usage above is a common use case that we need to consider.
