[SPARK-40688][SQL] Support data masking built-in function 'mask_first_n' #39449

Closed
wants to merge 1 commit into from


vinodkc
Contributor

@vinodkc vinodkc commented Jan 7, 2023

What changes were proposed in this pull request?

This PR adds the built-in data masking function mask_first_n.

It returns a masked version of str with the first n characters masked.
By default, the first 4 characters are masked: upper case letters are converted to "X", lower case letters to "x", and digits to "n".

For example, mask_first_n("1234-5678-8765-4321", 4) returns nnnn-5678-8765-4321.
The defaults can be overridden by supplying additional arguments: the second argument controls the number of characters to be masked, the third argument controls the mask character for upper case letters, the fourth for lower case letters, and the fifth for digits. In the examples below, a sixth argument masks the remaining characters, and passing -1 leaves that character class unchanged.

For example, mask_first_n("abcd-EFGH-8765-4321-abcd-EFGH-8765-4321", 19, "U", "l", "#") will return
llll-UUUU-####-####-abcd-EFGH-8765-4321

Examples:

  > SELECT _FUNC_('abcd-EFGH-8765-4321');
    xxxx-EFGH-8765-4321
  > SELECT _FUNC_('abcd-EFGH-8765-4321', 9);
    xxxx-XXXX-8765-4321
  > SELECT _FUNC_('abcd-EFGH-8765-@$#', 14);
    xxxx-XXXX-nnnn-@$#
  > SELECT _FUNC_('abcd-EFGH-8765-@$#', 15, 'x', 'X', 'n', 'o');
    XXXXoxxxxonnnno@$#
  > SELECT _FUNC_('abcd-EFGH-8765-@$#', 20, 'x', 'X', 'n', 'o');
    XXXXoxxxxonnnnoooo
  > SELECT _FUNC_('AbCD123-@$#', 10,'Q', 'q', 'd', 'o');
    QqQQdddooo#
  > SELECT _FUNC_('AbCD123-@$#', 10, -1, 'q', 'd', 'o');
    AqCDdddooo#
  > SELECT _FUNC_('AbCD123-@$#', 10, -1, -1, 'd', 'o');
    AbCDdddooo#
  > SELECT _FUNC_('AbCD123-@$#', 10, -1, -1, -1, 'o');
    AbCD123ooo#
  > SELECT _FUNC_(NULL);
    NULL
  > SELECT _FUNC_(NULL, 1, -1, -1, 'o');
    NULL
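
To make the semantics in the examples above easier to check, here is a minimal Python sketch of the behavior they describe. It is only an illustrative model, not the code in this PR: the parameter names, the `KEEP` constant, and the treatment of a negative `n` as 0 are assumptions made for the sketch. The default of leaving other characters unmasked is inferred from the examples.

```python
from typing import Optional, Union

# Sentinel taken from the examples above: -1 leaves that character class unmasked.
KEEP = -1

MaskArg = Union[str, int]


def mask_first_n(
    s: Optional[str],
    n: int = 4,
    upper: MaskArg = "X",
    lower: MaskArg = "x",
    digit: MaskArg = "n",
    other: MaskArg = KEEP,
) -> Optional[str]:
    """Illustrative model of mask_first_n; names and defaults are assumptions."""
    # A NULL input yields NULL, as in the last two examples.
    if s is None:
        return None

    def mask_char(ch: str) -> str:
        # Pick the replacement for the character's class.
        if ch.isupper():
            repl = upper
        elif ch.islower():
            repl = lower
        elif ch.isdigit():
            repl = digit
        else:
            repl = other
        # KEEP (-1) means the character is returned unchanged.
        return ch if repl == KEEP else str(repl)

    # Assumption: a negative n masks nothing. The description does not cover this case.
    n = max(n, 0)
    # Mask only the first n characters; the tail is copied through as-is.
    return "".join(mask_char(ch) for ch in s[:n]) + s[n:]
```

Calling `mask_first_n('AbCD123-@$#', 10, -1, 'q', 'd', 'o')` on this sketch gives `AqCDdddooo#`, matching the corresponding SQL example.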

Why are the changes needed?

To provide a built-in data masking function, mask_first_n, which returns a masked version of the input string.
Ref: Data masking functions

Does this PR introduce any user-facing change?

Yes, it adds a new built-in function named 'mask_first_n'.

How was this patch tested?

Added test cases

@github-actions github-actions bot added the SQL label Jan 7, 2023
@vinodkc vinodkc force-pushed the br_udf_mask_first_n branch 5 times, most recently from 66994b5 to 420c4ed Compare January 10, 2023 05:23
@vinodkc
Contributor Author

vinodkc commented Jan 10, 2023

@gengliangwang Can you please review this PR?

@vinodkc vinodkc changed the title [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n' [WIP][SPARK-40688][SQL] Support data masking built-in function 'mask_first_n' Jan 11, 2023
@vinodkc vinodkc changed the title [WIP][SPARK-40688][SQL] Support data masking built-in function 'mask_first_n' [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n' Jan 22, 2023
@vinodkc
Contributor Author

vinodkc commented Jan 22, 2023

@dtenedor @srielau, could you please review this PR?

@dtenedor
Contributor

This is very helpful. Thank you @vinodkc for working on this. We will take a look.

@vinodkc vinodkc force-pushed the br_udf_mask_first_n branch 3 times, most recently from 12c09e8 to b162820 Compare January 25, 2023 17:54
@dtenedor
Contributor

@vinodkc please ping again when you're ready for another review.

@vinodkc
Contributor Author

vinodkc commented Jan 25, 2023

@dtenedor, thank you for the code review comments. I applied the suggested code changes. Can you please do another review?

@vinodkc vinodkc force-pushed the br_udf_mask_first_n branch 2 times, most recently from 33f35fc to 47850ba Compare January 26, 2023 22:25
Contributor

@dtenedor dtenedor left a comment


This looks good. Thanks again for doing the work to implement this functionality!

@dtenedor
Contributor

@gengliangwang do you want to take a look and possibly help with merging this once we feel all the reviewers have approved?

@vinodkc
Contributor Author

vinodkc commented Feb 8, 2023

@gengliangwang Could you please review this PR?

@vinodkc vinodkc force-pushed the br_udf_mask_first_n branch 3 times, most recently from 8f6122f to f7d8359 Compare February 9, 2023 19:18
@dtenedor
Contributor

@cloud-fan would you mind helping review this as well? It LGTM as of now.

@vinodkc
Contributor Author

vinodkc commented Jun 2, 2023

@cloud-fan Would you please help review this PR?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Sep 11, 2023
@github-actions github-actions bot closed this Sep 12, 2023
2 participants