[ADAP-383] [Feature] Support Dynamic Data Masking in CTAS Statements #85
Comments
This has also been brought up in the Slack community a few times, adding links for reference:
@jtcohen6 I'm sorry if we already discussed this... but is that something on your radar for contracts?
Thanks @jdoldis for the write-up, you make a great case for it :)
Thanks @Fleid, I'm writing a custom materialization to support the syntax as described above. Let me know if you'd like those changes contributed here. Otherwise, I look forward to seeing it at some point down the road 🙂
I have the exact same requirement, but for row access policies! I think this can be implemented in one go, as it shares the same CTAS syntax:
Config with masking and row access policies could look like this:
Currently the only option is via post_hook. Since one has to check the information_schema first to see whether that relation (view or table) already has said RAP applied, because otherwise the ALTER command fails, this whole process can take up to 10 seconds [Edit: that was mainly due to a
@jdoldis Would you release your custom materialization as a package, until this gets implemented into dbt-snowflake? I'd be very interested as well!
Hey @ingolevin, the materialisation I wrote is essentially a copy-paste of the standard v1.5 table materialisation. The difference is I have modified the table_columns_and_constraints macro to support masking policies. In this modified macro I build the DDL by looping through the model columns and outputting the name/datatype, and then adding
Since I raised this, it seems the DDL logic has moved around a bit; the relevant code that could be modified in the Snowflake adapter to support masking policies would now be this function, I think.
Regarding creating a package, it would be good to hear back from @Fleid first. Ideally we could implement here, but if that's not possible I would be open to it 🙂
I'm overdue responding here!
It's true that, starting in v1.5, for models with enforced contracts, dbt will be able to template out
That's the prerequisite to defining row-level access policies & column-level masking policies while the table is being created, rather than via an
I hadn't had row-level & column-level access/masking policies in scope for
For the moment, it would be possible to stand this up via some macro overrides. (Maybe these policies could even be a constraint of
Sounds great @jtcohen6, let me know if I can help!
I'm following up here after a good chat with @graciegoheen @dbeatty10 @dataders, given the prompt to support similar functionality in
I think a good implementation of this functionality would look like:
*These map to DWH objects, so they are members of the DAG, and models / other functions could call (= depend on) them. I don't think these are models because they aren't
(1) is a bigger lift than (2), and it's not something we have the capacity to prioritize right now, but in the meantime, I've asked @dataders to do a bit more thinking about what a good UX might look like :)
Databricks also allows data masking in CTAS!
Moving this feature request to the dbt-adapters repo for further refinement since the underlying functionality is supported on many cloud data warehouses now (Redshift, Snowflake (Enterprise only), BigQuery, Databricks, Azure, etc.)
Is this your first time submitting a feature request?
Describe the feature
Currently you cannot specify column masking policies in Snowflake CTAS statements with dbt-snowflake. For example, CREATE TABLE <table_name>(<col_name> <col_type> WITH MASKING POLICY <policy_name>) AS SELECT <query>.
As a workaround, masking policies can be applied to columns in a dbt post hook using an ALTER TABLE statement. The issue with doing this is that the CTAS and ALTER TABLE statements cannot be issued in the same transaction, as per the Snowflake documentation: "Each DDL statement executes as a separate transaction". As a result, there is a small window of time between the CTAS and ALTER TABLE <table_name> MODIFY COLUMN <column_name> SET MASKING POLICY <policy_name> statements where the data is not masked, and if the ALTER TABLE statement fails it would remain that way.
Supporting masking policy specification in the CTAS statement would fix this. As per the Snowflake documentation, "Executing a CREATE TABLE … AS SELECT (CTAS) statement applies any masking policies on columns included in the statement before the data is populated in the new table".
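To make the unmasked window concrete, here is a minimal Python sketch of the statements the post-hook workaround ends up issuing. It is an illustration only, not dbt-snowflake code: the function name `post_hook_statements` and the table, column, and policy names are hypothetical.

```python
def post_hook_statements(table, select_sql, masked_columns):
    """Return, in order, the statements issued by the post-hook workaround.

    masked_columns maps a column name to a masking policy name.
    All names here are hypothetical and only serve the illustration.
    """
    # Statement 1: the table is created and populated without any policy.
    statements = [f"CREATE TABLE {table} AS {select_sql}"]
    # Statements 2..n: one ALTER TABLE per masked column. Snowflake runs each
    # DDL statement as its own transaction, so the data is readable unmasked
    # until these succeed.
    for column, policy in masked_columns.items():
        statements.append(
            f"ALTER TABLE {table} MODIFY COLUMN {column} "
            f"SET MASKING POLICY {policy}"
        )
    return statements


stmts = post_hook_statements(
    "customers",
    "SELECT id, email FROM raw_customers",
    {"email": "email_mask"},
)
for stmt in stmts:
    print(stmt)
```

Every statement after the first is a separate point of failure, which is the gap the single CTAS form would close.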
It also may not be difficult to support this given the recent work on model contracts, which provides the CREATE TABLE <table_name>(<col_name> <col_type>) AS SELECT <query> syntax. All that would need to be added is the WITH MASKING POLICY <policy_name> part of the statement.
One way to provide the config would be something like:
The masking policy could then be applied in get_columns_spec_ddl.
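As a rough sketch of that idea, the Python below renders a column spec that appends WITH MASKING POLICY only for columns that declare one. It does not reproduce get_columns_spec_ddl or any dbt macro; the function names and the `masking_policy` key are assumptions made for the example.

```python
def render_column_spec(columns):
    """Render '<col_name> <col_type> [WITH MASKING POLICY <policy_name>]' per column.

    Each column is a dict with 'name' and 'data_type'; the optional
    'masking_policy' key is a hypothetical config name, not an existing
    dbt setting.
    """
    parts = []
    for col in columns:
        spec = f"{col['name']} {col['data_type']}"
        policy = col.get("masking_policy")
        if policy:
            spec += f" WITH MASKING POLICY {policy}"
        parts.append(spec)
    return ", ".join(parts)


def render_ctas(table, columns, select_sql):
    """Build a single CTAS statement so the policy is attached at creation time."""
    return f"CREATE TABLE {table} ({render_column_spec(columns)}) AS {select_sql}"


columns = [
    {"name": "id", "data_type": "integer"},
    {"name": "email", "data_type": "varchar", "masking_policy": "email_mask"},
]
ddl = render_ctas("customers", columns, "SELECT id, email FROM raw_customers")
print(ddl)
```

Columns without a policy render exactly as they do under the existing contract syntax, so the change would be additive.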
Describe alternatives you've considered
Using an ALTER TABLE statement in a post hook to apply the masking policy. As described above, due to Snowflake DDL statements always being executed in separate transactions, this leaves the possibility of unmasked data.
Who will this benefit?
Anyone that wants to take advantage of dynamic data masking in Snowflake using dbt.
Are you interested in contributing this feature?
Yes
Anything else?
No response