Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ct-581 grants as configs #5230

Merged
merged 7 commits into from
May 19, 2022
Merged

Ct-581 grants as configs #5230

merged 7 commits into from
May 19, 2022

Conversation

gshank
Copy link
Contributor

@gshank gshank commented May 11, 2022

resolves #5189

Description

Add "grants" attribute to NodeConfig. Fix up all the tests.

This also contains a fix for a partial parsing error that was encountered when writing the test.

Checklist

@gshank gshank requested review from a team as code owners May 11, 2022 00:52
@cla-bot cla-bot bot added the cla:yes label May 11, 2022
@gshank gshank marked this pull request as draft May 11, 2022 00:52
@gshank gshank requested a review from jtcohen6 May 11, 2022 00:53
@gshank
Copy link
Contributor Author

gshank commented May 11, 2022

@jtcohen6 We weren't really clear on whether we want to limit the keys that can be contained in the 'grants' dictionary. I went forward assuming that people might want to create and support their own particular keys, and also that using something really specific like "select" would feel wrong on some warehouses. Or would you prefer to limit the legal keys in the grant dictionary?

I used "my_select" in the test instead of "select" because the test would probably break when we actually implement a "select" grant macro.

@gshank
Copy link
Contributor Author

gshank commented May 11, 2022

I also noticed that the jsonschema artifact test doesn't fail when we add a new attribute to NodeConfig, probably because of the really clever things we do to allow random keys in configs. I would have updated the jsonschemas, but I'm not sure whether we want to update to a v6 jsonschema now (and keep updating as things change) or do later.

@gshank
Copy link
Contributor Author

gshank commented May 11, 2022

The partial parsing error was that if you have config in a SQL file that's been parsed and add a new schema file that also has config, the config from the SQL file will get lost because I wasn't saving the 'config_call_dict' in the serialized manifest. I switched to saving that dictionary and deleting config_call_dict for the manifest.json instead. Updates to the schema file will work correctly. Any update to either file would correct the error.

This feels like an edge case and possibly not worth doing as a backport, but if you'd like it backported let me know and I can separate it out.

model_config = model.config
assert hasattr(model_config, "grants")

expected = {"my_select": ["reporter", "bi", "other_user"]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be some mechanism to remove grants that have been applied at a broader level of the configuration hierarchy.

So: I think we need other_user to clobber the existing values of the my_select key (["reporter", "bi"]), rather than append to the list of recipients for the my_select privilege.

The ideal case is one in which users have some control over this merging behavior. Is now a good time to revisit the discussion in #4108? (Even if we don't / can't actually enable that now)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I'm not sure how we could do the one with the '+=' and '-=' syntax. Might need our own jinja parser?

One alternative would be to encode the behavior in the key, like '+select'. Of course we already heavily use that for a different purpose... We could also do something more complicated with having the value of the grant key ("select") be either a list (clobbering) or a dictionary like: "select": { "override": ["bi"]} or "select": {"merge": ["bi"]}

I can switch to using MergeBehavior.Update, which would replace selects with the more specific versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm into the + behavior! We'll need testing for its various permutations, but I like how it does the sensible thing by default (clobber—fewer grants is safer), while preserving flexibility for the more-advanced end user.

I think by far the hardest part here might be writing the docs for this in a way that doesn't totally confuse people who are already stymied by + in dbt_project.yml: https://docs.getdbt.com/reference/resource-configs/plus-prefix

I left some more "where could we go from here" thoughts about this in #4108 (comment), which are out of scope for the changes in this PR. The only thing worth considering now is whether we should support + on the grants dict key, or the values of those keys, or both.

@jtcohen6
Copy link
Contributor

I went forward assuming that people might want to create and support their own particular keys, and also that using something really specific like "select" would feel wrong on some warehouses. Or would you prefer to limit the legal keys in the grant dictionary?

I agree with your working assumption. I wouldn't want to have to update this every time a warehouse released a new privilege type. It's also infeasible on BigQuery, which supports tons of privileges, and custom privileges.

I would have updated the jsonschemas, but I'm not sure whether we want to update to a v6 jsonschema now (and keep updating as things change) or do later.

Worth thinking about:

  • Pairing this with the short-term salve for users of Slim CI, proposed in [CT-599] [Bug] Upgrading from 1.0 to 1.1 breaks defer/state #5213 (comment), for artifact updates that actually have backwards-compatible changes
  • Whether we actually need to update the artifact version for non-breaking changes, such as an additional default value in node.config (which can already contain any additional properties)

Nice catch on the partial parsing error! I haven't heard of someone running into this yet, so I'm happy to keep it as a forthcoming bug fix for now, without the backport.

@gshank
Copy link
Contributor Author

gshank commented May 11, 2022

I changed to default of clobbering the grants with the ability of extending them if the grant key starts with '+'. Take a look and see what you think.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming along swimmingly!

Comment on lines +452 to +454
grants: Dict[str, Any] = field(
default_factory=dict, metadata=MergeBehavior.DictKeyAppend.meta()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related tech debt that we discussed yesterday, which is out of scope for this PR, opened as a new issue: #5236

model_config = model.config
assert hasattr(model_config, "grants")

expected = {"my_select": ["reporter", "bi", "other_user"]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm into the + behavior! We'll need testing for its various permutations, but I like how it does the sensible thing by default (clobber—fewer grants is safer), while preserving flexibility for the more-advanced end user.

I think by far the hardest part here might be writing the docs for this in a way that doesn't totally confuse people who are already stymied by + in dbt_project.yml: https://docs.getdbt.com/reference/resource-configs/plus-prefix

I left some more "where could we go from here" thoughts about this in #4108 (comment), which are out of scope for the changes in this PR. The only thing worth considering now is whether we should support + on the grants dict key, or the values of those keys, or both.

raise InternalException(f"expected dict, got {other_value}")
new_dict = {}
for key in self_value.keys():
new_dict[key] = _listify(self_value[key])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think _listify might make sense in all cases. Consider:

{{ config(grants = {'select': 'model_level'}) }}

Shows up as expected: "grants": {"select": ["model_level"]}

Whereas:

{{ config(grants = {'+select': 'model_level'}) }}

Shows up as: "grants": {"select": ["project_level", "m", "o", "d", "e", "l", "_", "l", "e", "v", "e", "l"]}, when I would expect "grants": {"select": ["model_level, project_level"]}, i.e. the same as what I get if I just wrap it in [].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wish Python hadn't chosen to treat strings as sequences... So yeah, listify in all cases would be better.

@gshank gshank marked this pull request as ready for review May 16, 2022 21:41
@gshank gshank requested a review from emmyoop May 16, 2022 21:41
Copy link
Contributor

@ChenyuLInx ChenyuLInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! Left some clearify questions.
To sum up, this PR

  1. added the grant option in config
  2. added the merge logic for dict with list as values
  3. moved where we remove the config_call_dict.(I asked some questions about why we move it)

Is the sum up correct?

config_call_dict[k].extend(v)
elif k in config_call_dict and isinstance(config_call_dict[k], dict):
config_call_dict[k].update(v)
if k in config_call_dict and isinstance(config_call_dict[k], list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need the second part of the check? looks like the current behavior is going to be if it is not a dictionary we would just overwrite config_call_dict[k]. Could this cause misterious behavior in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would cause any problems, but you're right that it's unnecessary. I've removed the isinstance check.

elif k in BaseConfig.mergebehavior["dict_key_append"]:
if not isinstance(v, dict):
raise InternalException(f"expected dict, got {v}")
if k in config_call_dict and isinstance(config_call_dict[k], dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the isinstance check. It ought to be unnecessary.

@@ -1135,6 +1135,12 @@ class WritableManifest(ArtifactMixin):
)
)

def __post_serialize__(self, dct):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason we move this logic from parsed.py to here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem we were running into with the partial parsing bug was that the config_call_dict wasn't being saved with the partial parsing manifest. So when a new schema file was added the config was being applied to the existing model node, but without the config_call_dict we lose the config from the SQL file.

There were two options to fix it, 1) save the config_call_dict or 2) every time we add a new schema file reload all of the nodes that might be affected. The second option would complicate partial parsing substantially and felt more inconsistent. The config_call_dict wasn't being saved to start with only because my imagination did not think about this case. I clean it up in the WritableManifest post_serialize method because we don't really want or need it in the manifest artifact.

@gshank
Copy link
Contributor Author

gshank commented May 19, 2022

@ChenyuLInx Your sum up is correct.

@gshank gshank requested a review from ChenyuLInx May 19, 2022 16:49
Copy link
Contributor

@ChenyuLInx ChenyuLInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@gshank gshank merged commit e50678c into main May 19, 2022
@gshank gshank deleted the ct-581-grants_as_configs branch May 19, 2022 18:50
agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request May 20, 2022
* Handle 'grants' in NodeConfig, with correct merge behavior

* Fix a bunch of tests

* Add changie

* Actually add the test

* Change to default replace of grants with '+' extending them

* Additional tests, fix config_call_dict handling

* Tweak _add_config_call to remove unnecessary isinstance checks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-581] [Feature] Grants as Node Configs
3 participants