[CT-2619] [Feature] Group and access configuration for sources and exposures #7750

Closed
3 tasks done
Tracked by #7979
Thrasi opened this issue Jun 1, 2023 · 7 comments
Labels
enhancement New feature or request model_groups_access Issues related to groups multi_project

Comments

Thrasi commented Jun 1, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Hi

We would like to control access to sources, as well as models, using groups and access, so that private sources are only accessible from within their assigned group.
Currently this is possible for models, seeds, snapshots, tests, analyses and metrics, but not for sources.
Groups
Model Access

Acceptance criteria

group and access work for sources
group works for exposures
As described here, referencing a resource outside of its supported access should raise an error (but otherwise should not).

So a private source should not be referencable outside of its group. And a private model should be referencable by an exposure inside of its group.
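
To make the criteria concrete, here is a minimal Python sketch of the rule. It assumes the access semantics documented for models (private: same group; protected: same project; public: anywhere) would carry over unchanged to sources and exposures. `Resource` and `can_reference` are hypothetical names for illustration, not dbt internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a dbt resource; not dbt's internal node class."""
    name: str
    package: str
    group: Optional[str] = None
    access: str = "protected"  # "private", "protected", or "public"


def can_reference(referrer: Resource, target: Resource) -> bool:
    """Return True if `referrer` may depend on `target` under the assumed rules."""
    if target.access == "private":
        # Private: only resources in the same group may reference the target.
        return target.group is not None and referrer.group == target.group
    if target.access == "protected":
        # Protected: any resource in the same project (package) may reference it.
        return referrer.package == target.package
    # Public: referenceable from anywhere.
    return target.access == "public"


# Illustrative resources; every name here is made up.
private_source = Resource("raw.orders", "analytics", group="finance", access="private")
private_model = Resource("stg_orders", "analytics", group="finance", access="private")
same_group_exposure = Resource("finance_dashboard", "analytics", group="finance")
other_group_model = Resource("marketing_report", "analytics", group="marketing")
```

With these definitions, `can_reference(other_group_model, private_source)` is `False` (this is where dbt should raise an error), while `can_reference(same_group_exposure, private_model)` is `True`.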

@Thrasi added the enhancement and triage labels on Jun 1, 2023
@github-actions bot changed the title from "[Feature] Group and access configuration for sources" to "[CT-2619] [Feature] Group and access configuration for sources" on Jun 1, 2023
@jtcohen6
Contributor

@Thrasi Thanks for opening, and sorry for the delay getting back to you!

Could you talk to me more about your use case for "private" sources? Are these source tables containing sensitive data (PII), and is your idea to prepare them (perhaps anonymize in a staging layer) before surfacing to the rest of the project / analytics universe? If so, thinking within the realm of model access: I'm wondering if these are "private" sources, or if this is really a use case for splitting out these sources & staging models into an entirely separate project, and surfacing those as public models for use in other projects?


In the initial implementation for v1.5, we didn't include sources within model/resource groups because:

  • We've implemented this logic for ref, but not yet for source
  • Sources are already "grouped" within their parent source — a loader instead of an owner, etc
  • My belief: Guarantees that need to be enforced on a source (contract for data shape, data quality rules, etc) are more enforceable if defined on a staging model that wraps around that source. While dbt can run introspective queries against sources, it can't actually control the shape or content of the source table.

@Thrasi
Author

Thrasi commented Jul 26, 2023

Hi! @jtcohen6 Thanks for the response!
I think you are probably right about:

this is really a use case for splitting out these sources & staging models into an entirely separate project, and surfacing those as public models for use in other projects

One of the reasons we haven't is that we want access to the full lineage graph.
Hoping for great results from this

I can still describe our situation:
Instead of separate projects, we use subdirectories and groups. Each team owns its own subdirectory and group, with its own sources and models.
Through groups, teams can make some models public and accessible to others while keeping staging models private.
In this setup, however, sources in one subdirectory are always accessible to models in other subdirectories.
We can limit access to them using CI. For this specific setup it may seem natural to extend the concept of groups to sources.
But then again, this setup may not be optimal to begin with.

@nicholasyager
Contributor

nicholasyager commented Sep 14, 2023

Given that this has lingered for a little bit, I'll add my two cents.

Suppose you're working in a large (1,000+ models) project with dozens of analytics engineers across multiple distributed teams, each owning different parts of the DAG. Now, suppose that the project leverages groups and access to create interfaces to core marts that can be used between groups.

While we'd love for everyone to follow best practices, without group and access configurations on sources, there's no technical mechanism within dbt Core to prevent someone from referencing the source within a new model, fundamentally bypassing any dbt Core access controls in place.

flowchart LR

    classDef source fill:#60b826, stroke:#60b826, color:#fff
    classDef model fill:#0094b3, stroke:#0094b3, color:#fff

    classDef privateModel fill:#ffffff00, stroke:#0094b3, color:#0094b3

    style product fill:#ffffff00, stroke:#bbb
    style revenue fill:#ffffff00, stroke:#bbb
    style customer_success fill:#ffffff00, stroke:#bbb

    crm.deals:::source
    crm.customers:::source
    product.events:::source

    subgraph product
        stg_product__events:::privateModel
        user_events:::model
    end

    subgraph revenue
        stg_crm__deals:::privateModel
        stg_crm__customers:::privateModel
        deals:::model
    end

    subgraph customer_success
        report_product_outcomes:::model
    end

    crm.deals --> stg_crm__deals
    crm.customers --> stg_crm__customers

    product.events --> stg_product__events

    stg_crm__deals --> deals
    stg_crm__customers --> deals
    
    stg_product__events --> user_events

    crm.deals -- Bypass access and groups by using `source` --> report_product_outcomes
    user_events --> report_product_outcomes

Other options

Something something macros?

I'm confident that an enterprising community member could implement a macro that runs on-run-start to check metadata on the source and raise a compilation exception if cross-group source references are being made. But, this is also a bit of a kludge fix for what (in my opinion) should be part of dbt Core's functionality.
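
As a rough illustration of that kind of check, here is a Python sketch that scans `target/manifest.json` after parsing, rather than the Jinja macro described above. It assumes a hypothetical convention where each source declares its owning group under `meta.group` (sources cannot take a real `group` today), and it relies on the manifest's `nodes`, `sources`, `depends_on.nodes`, and `group` fields. `find_cross_group_source_refs` and `check_manifest` are illustrative names, not part of dbt.

```python
import json
from pathlib import Path
from typing import Any


def find_cross_group_source_refs(
    manifest: dict[str, Any], meta_key: str = "group"
) -> list[tuple[str, str]]:
    """Return (node_id, source_id) pairs where a node reads another group's source.

    ASSUMPTION: a source records its owning group under meta[meta_key];
    sources without that key are treated as unrestricted.
    """
    sources = manifest.get("sources", {})
    violations = []
    for node_id, node in manifest.get("nodes", {}).items():
        node_group = node.get("group")
        for dep_id in node.get("depends_on", {}).get("nodes", []):
            source = sources.get(dep_id)
            if source is None:
                continue  # the dependency is not a source
            owner = (source.get("meta") or {}).get(meta_key)
            if owner is not None and owner != node_group:
                violations.append((node_id, dep_id))
    return violations


def check_manifest(path: str = "target/manifest.json") -> None:
    """Exit with a non-zero status if any cross-group source reference exists."""
    manifest = json.loads(Path(path).read_text())
    violations = find_cross_group_source_refs(manifest)
    if violations:
        lines = [f"{node} must not read {source}" for node, source in violations]
        raise SystemExit("Cross-group source references:\n" + "\n".join(lines))
```

Calling `check_manifest()` in CI after `dbt parse` would fail the build on a violation. It shares the kludge problem: the rule lives outside dbt Core and depends on everyone filling in `meta` consistently.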

Use multiple projects

I'm wondering if [...] this is really a use case for splitting out these sources & staging models into an entirely separate project, and surfacing those as public models for use in other projects?

My primary rebuttal to this is that there is no dbt-core-specific way to actually do this!

  • If we create a project to stage our staging models and import that project as a package into our main project, the sources are also defined in the upstream package. This leaves us where we started, since sources cannot be in a group and cannot have access assigned.
  • As expressed by Jeremy Cohen, proper cross-project references and multi-project deployments are wholly in the domain of dbt Labs' proprietary software and not in-scope for dbt Core. On a philosophical note, I'd rather take the plunge and purchase dbt Cloud seats and pay for Successful Model Builds because I need dbt Cloud functionality, and not because I want to skip working around gaps in dbt Core's governance functionality.

@dbeatty10
Contributor

dbeatty10 commented Sep 21, 2023

Okay, @Thrasi and @nicholasyager we buy it!

After discussing this further with @graciegoheen and @jtcohen6, we are aligned on adding support for groups + access on sources 👍

For consistency, we'll add it to exposures too so that all resource types can be grouped (see #8550)

We haven't seen convincing use-cases for exposures yet, so we will document as a best practice that you probably shouldn't be "exposing" a private model 🙂

@dbeatty10
Contributor

Acceptance criteria

As described here, referencing a resource outside of its supported access should raise an error (but otherwise should not).

So a private source should not be referencable outside of its group. And a private model should be referencable by an exposure inside of its group.

@graciegoheen changed the title from "[CT-2619] [Feature] Group and access configuration for sources" to "[CT-2619] [Feature] Group and access configuration for sources and exposures" on Jan 2, 2024
@graciegoheen
Contributor

Notes from technical refinement:

  • would allow folks to set this for the source, and also for a specific table within a source
  • it's a config (enabled is the only one right now https://docs.getdbt.com/reference/source-configs), may be more complicated
  • we could split this up into 1 ticket for group, and 1 ticket for access - but we'd need to do the group piece first
  • if we are going to support public for sources, then we'd need to update the plugin and publication artifact and inject source nodes (the same way we do for public models) - is there a use case for this? we could start with only supporting private and protected for source access? do we want to allow public sources? protected is the default for other resources, but we could do it differently for sources
  • sources can be overridden in a project; if we have public sources, would we then need a 3-argument {{ source ... }}? if we don't need a cross-project source, then should we explicitly not support public sources?
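
One possible reading of these notes, sketched in Python under stated assumptions: a table-level setting overrides the source-level one, `protected` is the default, and `public` is rejected for sources in a first iteration. None of this is decided. `resolve_source_table_config` and the constants are hypothetical names, not dbt's implementation.

```python
# ASSUMPTION: only these access levels are allowed on sources initially.
ALLOWED_SOURCE_ACCESS = {"private", "protected"}
# ASSUMPTION: same default as other resource types.
DEFAULT_ACCESS = "protected"


def resolve_source_table_config(source_cfg: dict, table_cfg: dict) -> dict:
    """Merge source-level and table-level settings; the table wins where both are set."""
    resolved = {
        "group": table_cfg.get("group", source_cfg.get("group")),
        "access": table_cfg.get("access", source_cfg.get("access", DEFAULT_ACCESS)),
    }
    if resolved["access"] not in ALLOWED_SOURCE_ACCESS:
        raise ValueError(f"unsupported access for a source: {resolved['access']!r}")
    if resolved["access"] == "private" and resolved["group"] is None:
        raise ValueError("a private source table needs a group")
    return resolved
```

For example, a source configured with `{"group": "revenue", "access": "private"}` and a table configured with `{"access": "protected"}` would resolve to group `revenue` with `protected` access for that one table.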

@graciegoheen
Contributor

Thanks for all of the input, I'm going to close this one in favor of an implementation ticket -> #9339

@graciegoheen closed this as not planned on Jan 4, 2024
5 participants