Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] copy_grants option for seeds (for use in Snowflake Data Share) #1030

Open
2 tasks done
sarah-tuva opened this issue May 9, 2024 · 4 comments
Open
2 tasks done
Labels
enhancement New feature or request triage

Comments

@sarah-tuva
Copy link

Is this a new bug in dbt-snowflake?

  • I believe this is a new bug in dbt-snowflake
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

We added the copy_grants: true config to our dbt_project.yml. We are setting this option for the base project and a package. We also have a data share set up in Snowflake to share the tables created by the dbt project with another Snowflake account.

models:
  +copy_grants: true
  the_tuva_project:
    +copy_grants: true

seeds:
  +copy_grants: true
  the_tuva_project:
    terminology:
      +copy_grants: true
    value_sets:
      +copy_grants: true

(note: we also tried just setting the copy_grants variable at the highest levels of the models and seeds.)

When we run dbt build --full-refresh, the model tables are still shared, but the seed tables become unshared (i.e. the seed tables become unchecked in the Snowflake data share settings and we have to reshare them). When we run dbt build, we do not see this problem.

Expected Behavior

Both the model and seed tables should be fully refreshed and stay shared in the Snowflake data share. We should not have to re-share them each time we run dbt build --full-refresh.

Steps To Reproduce

  1. Add the copy_grants config to the dbt_project.yml
  2. Run dbt build
  3. Create a data share in Snowflake and share the tables created by models and seeds
  4. Run dbt build --full-refresh
  5. The seed tables become unchecked and are no longer shared, the model tables are still shared

Relevant log output

No response

Environment

- OS: macOS-14.4.1-arm64-arm-64bit
- Python: 3.9.6
- dbt-core: 1.7.4
- dbt-snowflake: 1.7.1

Additional Context

No response

@sarah-tuva sarah-tuva added bug Something isn't working triage labels May 9, 2024
@dataders
Copy link
Contributor

@sarah-tuva great callout! Before I get into the nitty gritty, I want to mention that:

  1. a valid workaround would be to make a wrapper view or table model of this seed table and share that instead
  2. I'm especially interested in this workaround because I don't understand the use case for sharing a .csv seed table via a Snowflake data share. Seeds are sort of like sources, right? Not products we want to share? Change my view!

nitty gritty

Seed is a strange fruit compared to it's materialization brethren.

Table and Incremental models materializations share use of a single macro to make tables: snowflake__create_table_as() which uses the copy_grants config

{% endif %}
{% if copy_grants and not temporary -%} copy grants {%- endif %} as
(

However, Snowflake's seed materialization is actually dbt's default implementation, therefore does not use make use of copy_grants at all. Specifically it uses: default__create_csv_table()

  {% set sql %}
    create table {{ this.render() }} (
        {%- for col_name in agate_table.column_names -%}
            {%- set inferred_type = adapter.convert_type(agate_table, loop.index0) -%}
            {%- set type = column_override.get(col_name, inferred_type) -%}
            {%- set column_name = (col_name | string) -%}
            {{ adapter.quote_seed_column(column_name, quote_seed_column) }} {{ type }} {%- if not loop.last -%}, {%- endif -%}
        {%- endfor -%}
    )
  {% endset %}

If you really wanted to get this working, you could do so by introducing a new macro called snowflake__create_csv_table() in your projects macros/ directory that looks almost exactly like but the default__ version but instead has the following lines peppered in

-- before DDL
{%- set copy_grants = config.get('copy_grants') -%}

-- before the line starting with "as (sql)"
{% if copy_grants is not none -%}
   copy grants
{%- endif -%}

@dataders dataders added enhancement New feature or request awaiting_response and removed bug Something isn't working triage labels May 14, 2024
@sarah-tuva
Copy link
Author

Hi @dataders! Thanks for providing this background and a potential workaround. We'll try to implement this and let you know how it goes.

For us, sharing seeds is a huge part of what we do. It is one of our key products. We are building open-source healthcare data analytics tools. Part of what we do is create and maintain a set of healthcare-specific terminology seed files. In most cases, our users can just install the Tuva package and run it in their own environment. In some cases, though, we manage this for customers and need to be able to share it back with them. Hence, the need to share seed files as part of the full analytics suite. Feel free to check out our repo.

@sarah-tuva
Copy link
Author

Hello! I just wanted to report back that I was able to create a workaround for this. Thank you so much for leading me to the right place. Through testing, I figured out that the default reset_csv_table() macro, which is triggered by using the --full-refresh flag was dropping the seed table and then inserting the CSV data. This means the grants for the data share were being dropped as well. I ended up having to create two new versions of the default macros, create_csv_table() and reset_csv_table().

Here is how the fix works.

  1. Add the copy_grants config to the dbt_project.yml:
models:
  +copy_grants: true

seeds:
  +copy_grants: true
  1. Create two custom macros and add them to the project's macros/ folder:
  {% macro snowflake__create_csv_table(model, agate_table) %}
  {%- set column_override = model['config'].get('column_types', {}) -%}
  {%- set quote_seed_column = model['config'].get('quote_columns', None) -%}
  {%- set copy_grants = model['config'].get('copy_grants') -%}

  {% set sql %}
    create or replace table {{ this.render() }} (
        {%- for col_name in agate_table.column_names -%}
            {%- set inferred_type = adapter.convert_type(agate_table, loop.index0) -%}
            {%- set type = column_override.get(col_name, inferred_type) -%}
            {%- set column_name = (col_name | string) -%}
            {{ adapter.quote_seed_column(column_name, quote_seed_column) }} {{ type }} {%- if not loop.last -%}, {%- endif -%}
        {%- endfor -%}
    ) {% if copy_grants and not temporary -%} copy grants {%- endif %}
  {% endset %}

  {% call statement('_') -%}
    {{ sql }}
  {%- endcall %}

  {{ return(sql) }}
{% endmacro %}


{% macro snowflake__reset_csv_table(model, full_refresh, old_relation, agate_table) %}
    {% set sql = "" %}
    {% if full_refresh %}
        {% set sql = snowflake__create_csv_table(model, agate_table) %}
    {% else %}
        {{ adapter.truncate_relation(old_relation) }}
        {% set sql = "truncate table " ~ old_relation %}
    {% endif %}

    {{ return(sql) }}
{% endmacro %}

I would be happy to create a PR if you think this fix should go in the adapter. It might only be an issue for us since we do share seeds with Snowflake data share.

@dataders dataders changed the title [Bug] copy_grants option not working for seeds in a Snowflake data share [Feature] copy_grants option not working for seeds in a Snowflake data share May 28, 2024
@dataders dataders changed the title [Feature] copy_grants option not working for seeds in a Snowflake data share [Feature] copy_grants option for seeds (for use in Snowflake Data Share) May 28, 2024
@dataders
Copy link
Contributor

great to hear that you're unblocked and that I pointed you in a helpful direction.

You can certainly open a PR, however, given that we're not certain if there's other's who would benefit, it doesn't strike me as a priority to get merged.

In the meantime, we can leave this issue open for a while to see if there's interest from others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triage
Projects
None yet
Development

No branches or pull requests

2 participants