[SPARK-36510][DOCS] Add spark.redaction.string.regex property to the docs #33740
dnskr wants to merge 2 commits into apache:master
…docs Signed-off-by: dnskr <dnskrv88@gmail.com>
Can one of the admins verify this patch?
<td>
Regex to decide which parts of strings produced by Spark contain sensitive
information. When this regex matches a string part, that string part is replaced by a
dummy value. This is currently used to redact the output of SQL explain commands.
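To illustrate the behavior this description refers to, here is a minimal sketch in plain Python (not Spark code). The function name `redact`, the sample plan string, and the replacement text are assumptions chosen for illustration; only the idea that matching string parts are replaced by a dummy value comes from the description above.

```python
import re

# Dummy value substituted for matches. The exact text is an assumption
# made for this sketch, not taken from the PR.
REPLACEMENT = "*********(redacted)"


def redact(pattern, text):
    """Replace every part of `text` matching `pattern` with the dummy value.

    A pattern of None means no redaction regex is configured, so the
    text is returned unchanged.
    """
    if pattern is None:
        return text
    return re.sub(pattern, REPLACEMENT, text)


# Hypothetical explain-style output containing a secret.
plan = "Scan jdbc [url=jdbc:postgresql://db/prod, password=hunter2]"
redacted = redact(r"password=[^,\]]+", plan)
print(redacted)
```

Only the matched part is replaced; the rest of the string is left as it was.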
Hm, sounds like it's currently only used as the fallback value for the SQL config, and we already documented spark.sql.redaction.string.regex. I wouldn't document this one alone for now, as it's unlikely to be set by users at this moment.
Yes, you are right. It looks like the usage of STRING_REDACTION_PATTERN (spark.redaction.string.regex) was removed in commit 2831571 on Dec 19, 2017. So it is no longer used in the source code, other than as the fallback value for the spark.sql.redaction.string.regex property.
I found the mention of this property in the spark.sql.redaction.string.regex description confusing, because it gives the impression that spark.redaction.string.regex is needed to redact sensitive data in non-SQL places. It might confuse others as well.
Would it be a good idea to remove it from the spark.sql.redaction.string.regex description to avoid this misunderstanding? Also, in the source code we could mark spark.redaction.string.regex as deprecated, or just add a note that it is not supposed to be set by users.
What do you think?
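The fallback relationship discussed in this thread can be sketched as follows. This is a plain-Python model under stated assumptions, not Spark's actual config code: the helper name `resolve_sql_redaction_regex` and the dict-based `conf` are hypothetical, and only the two property names come from the discussion.

```python
SQL_KEY = "spark.sql.redaction.string.regex"
CORE_KEY = "spark.redaction.string.regex"


def resolve_sql_redaction_regex(conf):
    """Return the regex the SQL side would use (hypothetical helper).

    The SQL-specific property wins when it is set; otherwise the core
    property is used as the fallback; if neither is set, return None.
    """
    if SQL_KEY in conf:
        return conf[SQL_KEY]
    return conf.get(CORE_KEY)


# Only the core property set: the SQL side picks it up via the fallback.
only_core = resolve_sql_redaction_regex({CORE_KEY: "secret=\\w+"})

# Both set: the SQL-specific value takes precedence.
both = resolve_sql_redaction_regex(
    {CORE_KEY: "secret=\\w+", SQL_KEY: "password=\\w+"}
)
```

Under this model, setting spark.redaction.string.regex has an effect only through the SQL property that falls back to it, which is the point both commenters agree on.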
I am not sure about this for now. If we happen to add redaction in other places outside of core, spark.redaction.string.regex could make sense. We won't necessarily have to deprecate it. I would just prefer to leave it undocumented, because semantically it looks fine now, although the configuration is virtually useless at this moment.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
The PR fixes SPARK-36510 by adding the missing spark.redaction.string.regex property to the docs.
Why are the changes needed?
The property is referred to in the spark.sql.redaction.string.regex description as its default value.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Not needed for docs