Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2605] Add restrict_access option to package dependencies #7713

Closed
Tracked by #7372
jtcohen6 opened this issue May 26, 2023 · 3 comments · Fixed by #7962
Closed
Tracked by #7372

[CT-2605] Add restrict_access option to package dependencies #7713

jtcohen6 opened this issue May 26, 2023 · 3 comments · Fixed by #7962
Assignees

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 26, 2023

If the same project is specified as both a "package" dependency and "project" dependency:

  • If resolving ref to a public model in that project, prefer the installed package (source code) over publication artifact (metadata)
  • If resolving ref to non-public models, raise an error*

*How do we avoid breaking existing projects with installed modeling packages (e.g. Fivetran packages), where users regularly have single-arg ref to protected models in another installed package/project?

Options:

  1. If the project is specified in both the projects and packages dependencies, we respect the distinction between protected & public access. If it's only specified in the packages dependencies, we allow ref (two-arg or single-arg) to both protected & public models in the package.
  2. A package-level config, require_public_access. False by default to avoid breaking existing projects
@github-actions github-actions bot changed the title Create test (and handle issues) with same model in publication project and packages.yml project [CT-2605] Create test (and handle issues) with same model in publication project and packages.yml project May 26, 2023
@jtcohen6 jtcohen6 added Refinement Maintainer input needed multi_project labels May 26, 2023
@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jun 9, 2023

A package-level config, require_public_access. False by default to avoid breaking existing projects

Let's do this one!

IMO differentiation between public & private access is an important part of the dbt-core language construct for model access. It should be possible to enforce this, even if you're just adopting a "mesh" paradigm via packages. It should be opt-in (for now) to avoid breaking status quo, but we might want to make this "on" by default in a future version of dbt (perhaps v2).

Let's call it enforce_access. When set to True, we should raise an error during ref resolution if a model with access: protected is being referenced across projects.

# packages.yml
packages:
  - git: https://my-org/upstream-project
    enforce_access: True  # default is False

@jtcohen6 jtcohen6 removed the Refinement Maintainer input needed label Jun 9, 2023
@jtcohen6 jtcohen6 removed their assignment Jun 9, 2023
@MichelleArk MichelleArk changed the title [CT-2605] Create test (and handle issues) with same model in publication project and packages.yml project [CT-2605] Add enforce_access option to package dependencies Jun 21, 2023
@MichelleArk
Copy link
Contributor

Digging into this more, it could be pretty tricky to get this config within PackageSpec because we'll need to map package_name values to associated PackageSpecs in ref resolution. I believe the right place for this config internally would be on the Project object so that it's accessible during ref resolution via config.dependencies. However, we don't currently do this mapping anywhere (that I could find) and it would introduce complexity to plumb configurations from the running project to project (package) parsing / initialization.

Have we considered having this configuration be something the package maintainer specifies at the project-level (i.e. in dbt_project.yml)? Are there other (current or suggested in issues) configurations that would make sense in the context of the project being installed as a package in a downstream project? Having this value defined in the package's dbt_project.yml would also enable configuring transitive dependencies with enforce_access without having to explicitly require them in packages.yml.

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jun 26, 2023

Have we considered having this configuration be something the package maintainer specifies at the project-level (i.e. in dbt_project.yml)?

The more I think about this, the more sense it makes. It's better, much better.

Considerations

  • Let's call this restrict_access instead of enforce_access. "Enforce" sounds like, Do I want to enforce access rules (including for private models) within my own project? Whereas "restrict" sounds like, Do I want to restrict others from accessing my private/protected models?
  • Still boolean, still default False
  • We don't have any other top-level project configs for "when I am installed as a package," so let's just make this one top-level. I expect the majority of users to never set it or think about it.
  • If restrict_access = True, then any reference MUST be to a model with access: public, and (ideally) must also use two-argument ref.
  • I expect Hub modeling packages, e.g. Fivetran's, to never set restrict_access = True. Why? When I'm installing an OSS package maintained by someone else, it's fairly common that I need to reimplement some logic from that package, by disabling the model in the package and creating another model with the same name in my root project. This should still be possible, even if that model is selecting from private/protected models in the package, so long as restrict_access = False for that package.

This should still be possible, even if that model is selecting from private/protected models in the package

Confirmed that this "override" is possible, even when private models are in the mix.

My package:

# mypkg/models/schema.yml
groups:
  - name: mygroup
    owner:
      name: jerco

models:
  - name: my_pkg_private_model
    config:
      group: mygroup
    access: private
-- mypkg/models/private_model.sql
select 1 as id
-- mypkg/models/protected_model.sql
{{ config(group = "mygroup") }}
select * from {{ ref("private_model") }}
# mypkg/dbt_project.yml
name: mypkg

My root project:

# dbt_project.yml
name: my_dbt_project
profile: <profile>

models:
  mypkg:
    protected_model:
      +enabled: false
# dependencies.yml
packages:
  - local: mypkg
-- models/protected_model.sql
{{ config(group = "mygroup") }}
-- my override
select * from {{ ref("private_model") }}
$ dbt compile -s protected_model
...
11:40:47  Compiled node 'protected_model' is:

-- my override
select * from "jerco"."dbt_jcohen"."private_model"

Following the proposal in this issue, if mypkg were to set restrict_access: True, that ref should start raising an error because my_pkg_private_model does not have access: public.

@MichelleArk MichelleArk changed the title [CT-2605] Add enforce_access option to package dependencies [CT-2605] Add restrict_access option to package dependencies Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants