Skip to content

[SPARK-36510][DOCS] Add spark.redaction.string.regex property to the docs#33740

Closed
dnskr wants to merge 2 commits intoapache:masterfrom
dnskr:docs/SPARK-36510
Closed

[SPARK-36510][DOCS] Add spark.redaction.string.regex property to the docs#33740
dnskr wants to merge 2 commits intoapache:masterfrom
dnskr:docs/SPARK-36510

Conversation

@dnskr
Copy link

@dnskr dnskr commented Aug 13, 2021

What changes were proposed in this pull request?

The PR fixes SPARK-36510 by adding missing spark.redaction.string.regex property to the docs

Why are the changes needed?

The property referred by spark.sql.redaction.string.regex description as its default value

Does this PR introduce any user-facing change?

No

How was this patch tested?

Not needed for docs

…docs

Signed-off-by: dnskr <dnskrv88@gmail.com>
@github-actions github-actions bot added the DOCS label Aug 13, 2021
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

<td>
Regex to decide which parts of strings produced by Spark contain sensitive
information. When this regex matches a string part, that string part is replaced by a
dummy value. This is currently used to redact the output of SQL explain commands.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, sounds like it's currently only used by SQL's config fallback value, and we already documented spark.sql.redaction.string.regex. I wouldn't document this one alone for now as it's unlikely set by users at this moment.

Copy link
Author

@dnskr dnskr Aug 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. Looks like STRING_REDACTION_PATTERN (spark.redaction.string.regex) usage has been removed in commit#2831571 in Dec 19, 2017. So it is not used in source code anymore rather than fallback value for spark.sql.redaction.string.regex property.

I found having this property in spark.sql.redaction.string.regex description confusing because it creates a feeling that spark.redaction.string.regex is needed to redact sensetive data in non-SQL places. It might confuse others as well.
Would it be a good idea to remove it from spark.sql.redaction.string.regex description to avoid this misunderstanding? Also in source code we can mark spark.redaction.string.regex as Deprecated or just add a note that it is not supposed to be set by users.
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure on this for now. If we happen to add the redaction on other places not in core, spark.redaction.string.regex could make sense. We won't necessarily have to deprecate it - I would just prefer to leave it undocumented because semantically it looks fine now although the configuration is virtually useless at this moment.

@dnskr dnskr requested a review from HyukjinKwon August 21, 2021 12:16
@github-actions
Copy link

github-actions bot commented Dec 1, 2021

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 1, 2021
@github-actions github-actions bot closed this Dec 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants