[SPARK-57262][SQL][WEBUI] Job description derived from a query should respect spark.sql.redaction.string.regex #56326
sarutak wants to merge 2 commits into
…act-sql-description
cc @cloud-fan, too
… respect `spark.sql.redaction.string.regex`

### What changes were proposed in this pull request?
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.

### Why are the changes needed?
In the current implementation, redaction is done in `SQLExecution.scala` so the description in the table on the top of `/SQL/execution` is redacted.
<img width="1083" height="349" alt="sql-execution-page-top-table" src="https://github.com/user-attachments/assets/b06fb255-2b46-473d-9046-1b2d578e3bda" />

But the description in the table on the `/jobs` page and the one in the table on the bottom of `/SQL/execution` page are not redacted.
<img width="525" height="692" alt="jobs-page-before" src="https://github.com/user-attachments/assets/0a5a8ce8-e4be-4669-bd7d-a6c62fe316ca" />
<img width="515" height="274" alt="sql-execution-page-before" src="https://github.com/user-attachments/assets/bd0406cc-5b0b-40a0-96c4-9f9fa1aa048a" />

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added new test.
Also confirmed descriptions are redacted in UI.
```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)> CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
<img width="852" height="690" alt="jobs-page-after" src="https://github.com/user-attachments/assets/8e28e37e-369f-479c-9711-999b431756db" />
<img width="598" height="272" alt="sql-execution-page-after" src="https://github.com/user-attachments/assets/cb734556-619b-45c6-a7f6-d52e60132aff" />

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude

Closes #56326 from sarutak/fix-redact-sql-description.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
(cherry picked from commit 583e5bb)
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
Merged to
dongjoon-hyun left a comment
Oh, very sorry, @sarutak.
I found a critical regression in the following use case. For SQL plan executions that do not go through `SparkSQLDriver`, the security redaction is lost with this PR. For the record, Hive JDBC and Spark Connect have their own redaction, so they are not affected by this regression.
```
$ bin/spark-shell -c spark.sql.redaction.string.regex="secret.*=.*"
scala> val s = "SELECT * FROM (SELECT 'secret=1')"
scala> sc.setJobDescription(s)
scala> sql(s).show()
+--------+
|secret=1|
+--------+
|secret=1|
+--------+
```
BEFORE (4.2.0-preview5)
Let me revert this commit, @sarutak, because we are under a security scan these days, as you know from the security channel.
This has been reverted from master/4.x/4.2.

For the record, the above usage is a common use case that we need to consider.

I made a PR that adds a test case to prevent a regression.

### What changes were proposed in this pull request?
This PR changes `SparkSQLDriver.scala` to redact a query before `setJobDescription`.

### Why are the changes needed?
In the current implementation, redaction is done in `SQLExecution.scala` so the description in the table on the top of `/SQL/execution` is redacted.

But the description in the table on the `/jobs` page and the one in the table on the bottom of `/SQL/execution` page are not redacted.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added new test.
Also confirmed descriptions are redacted in UI.

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude
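
For readers unfamiliar with regex-based redaction, here is a minimal, standalone sketch of the idea behind redacting a query string before it is used as a job description. It is not the code changed by this PR: the object name `RedactionSketch`, the `redact` helper, and the `Placeholder` text are hypothetical and use only the Scala standard library. It assumes that every match of the configured pattern is replaced with a fixed placeholder, and that text passes through unchanged when no pattern is configured.

```scala
import scala.util.matching.Regex

// Hypothetical illustration only; not Spark's implementation.
object RedactionSketch {
  // Placeholder text chosen for this sketch (an assumption).
  val Placeholder = "*********(redacted)"

  // Replace every match of the pattern with the placeholder.
  // With no pattern, or with null/empty text, return the text unchanged.
  def redact(pattern: Option[Regex], text: String): String =
    pattern match {
      case Some(r) if text != null && text.nonEmpty =>
        // quoteReplacement keeps the placeholder literal in the replacement string
        r.replaceAllIn(text, Regex.quoteReplacement(Placeholder))
      case _ => text
    }

  def main(args: Array[String]): Unit = {
    // Same pattern and query as in the manual test above
    val pattern = Some("secret.*=.*".r)
    println(redact(pattern, "SELECT * FROM test1 WHERE secret=1"))
  }
}
```

With the pattern from the manual test, everything from `secret` to the end of the line matches, so only the `SELECT * FROM test1 WHERE ` prefix survives in front of the placeholder. A description that is set without passing through such a step, as in the `sc.setJobDescription(s)` case raised in the review, keeps the raw text.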