[SPARK-40688][SQL] Support data masking built-in function 'mask_first_n' #39449
@gengliangwang Can you please review this PR?
This is very helpful. Thank you @vinodkc for working on this. We will take a look.
@vinodkc please ping again when you're ready for another review.
@dtenedor, thank you for the code review comments. I applied the suggested code changes. Can you please do another review?
This looks good. Thanks again for doing the work to implement this functionality!
@gengliangwang do you want to take a look and possibly help with merging this once all the reviewers have approved?
@gengliangwang Could you please review this PR?
@cloud-fan would you mind helping review this as well? It LGTM as of now.
@cloud-fan Would you please help review this PR?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This PR adds the built-in data masking function mask_first_n, which returns a masked version of str with the first n characters masked.
By default, the first 4 characters are masked: upper case letters are converted to "X", lower case letters are converted to "x", and digits are converted to "n".
For example, mask_first_n("1234-5678-8765-4321", 4) results in nnnn-5678-8765-4321.
The defaults can be overridden by supplying additional arguments: the second argument controls the number of characters to be masked, the third argument controls the mask character for upper case letters, the fourth argument the mask character for lower case letters, and the fifth argument the mask character for digits.
For example, mask_first_n("abcd-EFGH-8765-4321-abcd-EFGH-8765-4321", 19, "U", "l", "#") will return
llll-UUUU-####-####-abcd-EFGH-8765-4321
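The behavior described above can be sketched in plain Scala. This is only an illustration of the semantics, not the Catalyst expression added in maskExpressions.scala: the object name `MaskFirstNSketch` and the helper `maskFirstN` are hypothetical, null input is not handled, and clamping a negative n to 0 is an assumption rather than something this PR states.

```scala
// Illustrative sketch only; not the implementation from this PR.
object MaskFirstNSketch {
  // Hypothetical helper mirroring the described semantics of mask_first_n.
  // Defaults follow the description: n = 4, "X" for upper case,
  // "x" for lower case, "n" for digits.
  def maskFirstN(
      str: String,
      n: Int = 4,
      upperChar: Char = 'X',
      lowerChar: Char = 'x',
      digitChar: Char = 'n'): String = {
    // Assumption: n is clamped to the range [0, str.length].
    val limit = math.max(0, math.min(n, str.length))
    val masked = str.take(limit).map {
      case c if c.isUpper => upperChar
      case c if c.isLower => lowerChar
      case c if c.isDigit => digitChar
      case c => c // other characters, such as '-', pass through unchanged
    }
    // Characters after the first n are left untouched.
    masked + str.drop(limit)
  }
}
```

With this sketch, `MaskFirstNSketch.maskFirstN("1234-5678-8765-4321", 4)` yields `nnnn-5678-8765-4321`, and passing 19, 'U', 'l', '#' for the longer input yields `llll-UUUU-####-####-abcd-EFGH-8765-4321`, matching the two examples in the description.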
Examples:
Why are the changes needed?
To support the built-in data masking function mask_first_n, which returns a masked version of the input string.
Ref: Data masking functions
Does this PR introduce any user-facing change?
Yes, this PR adds a new built-in function named 'mask_first_n'.
How was this patch tested?
Added test cases