Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt_utils.get_column_values should use the Relation object it's given instead of making its own #424

Closed
1 of 5 tasks
fivetran-joemarkiewicz opened this issue Sep 27, 2021 · 13 comments · Fixed by #440
Closed
1 of 5 tasks
Labels
bug Something isn't working good first issue

Comments

@fivetran-joemarkiewicz
Copy link
Contributor

fivetran-joemarkiewicz commented Sep 27, 2021

Describe the bug

The get_column_values and star macros do not seem to properly work when the source is passed through as an argument and the src.yml is leveraging the non-default quoting. Specifically for Snowflake.

Steps to reproduce

Use quoting in your src.yml:

version: 2

sources:
  - name: salesforce
    schema: 'salesforce'
    database: mydatabase
    quoting: 
          database: false
          schema: true
          identifier: true

    tables:
      - name: fivetran_formula
      - name: test_order_c

With the above src.yml config if I attempt to use the get_column_values and star macro like below:

{%- set old_formula_fields = dbt_utils.get_column_values(source('salesforce', 'fivetran_formula'),'field') | upper -%}

select
{{ dbt_utils.star(source('salesforce','test_order_c'), except=old_formula_fields) }}
from {{ source('salesforce', 'test_order_c') }}

The resulting compiled output is completely empty.

select

from DATABASE."SCHEMA"."test_order_c"

Expected results

If I remove the quoting config in the src.yml then I see the output as expected. The expected results would be to allow users that have to use the quoting config to still be able to use the combination of these macros.

select
    "FIELDS_I_WANT",
    "ETC"
from DATABASE."SCHEMA"."test_order_c"

Actual results

Compiled output below.

select

from DATABASE."SCHEMA"."test_order_c"

Screenshots and log output

System information

The contents of your packages.yml file:

packages:
  - package: fivetran/salesforce_formula_utils
    version: [">=0.4.0", "<0.5.0"]
  - package: fivetran/fivetran_utils
    version: [">=0.2.0", "<0.3.0"]
  - package: dbt-labs/dbt_utils
    version: [">=0.7.0", "<0.8.0"]

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.20.2
   latest version: 0.20.2

Up to date!

Plugins:
  - bigquery: 0.20.2
  - snowflake: 0.20.2
  - redshift: 0.20.2
  - postgres: 0.20.2
  - spark: 0.20.2

Additional context

Are you interested in contributing the fix?

I would be happy to contribute to this fix! However, I am not entirely sure where the issue is arising? Once we locate, I would be happy to help in any way I can!

@fivetran-joemarkiewicz fivetran-joemarkiewicz added bug Something isn't working triage labels Sep 27, 2021
@morgankrey-amplitude
Copy link

Additional context is that we discovered this issue when trying to install salesforce_formula_utils, and have narrowed down the bug hunt to specifically how the get_column_values macro works with Snowflake quoting. Happy to provide more details if helpful

@joellabes
Copy link
Contributor

Hi @fivetran-joemarkiewicz and @morgankrey-amplitude, thanks for opening this!

From a cursory glance at get_column_values()' implementation, I'm surprised to see a new relation being created here, given that it's being passed in a perfectly good relation from ref (or source) in the first place. This seems like the most likely culprit - if you tried setting target_relation to table directly I think it would solve your problem. (Why not completely replace target_relation with table? IMO target_relation is more accurate since, well, views aren't tables.)

As for star, it's using the adapter.get_columns_in_relation() function. Since that's provided by dbt Core, I would expect it to understand quoting permutations - if it doesn’t, we might need to open an issue over there.

@joellabes joellabes removed the triage label Oct 19, 2021
@fivetran-joemarkiewicz
Copy link
Contributor Author

Hey @joellabes fun seeing you on the dbt-labs side! 🎉

Thanks for taking a look over this. I'll try setting the target_relation to table in this instance and see if it does the trick.

@morgankrey-amplitude
Copy link

Hey @joellabes after @fivetran-joemarkiewicz and I sat down to debug for a while, it seems that there's a general issue with quoting permutations in get_column_values().

First, we confirmed that our explicitly-lowercased table is queriable from Snowflake natively
Screen Shot 2021-10-28 at 2 38 23 PM

Next, we confirmed that we can query the table as a source directly, successfully
Screen Shot 2021-10-28 at 2 38 37 PM

Finally, we confirmed that querying the source using get_column_values does NOT work
Screen Shot 2021-10-28 at 2 39 53 PM

Our quoting setup in our YAML file is below
Screen Shot 2021-10-28 at 3 09 01 PM

@fivetran-joemarkiewicz please feel free to add any context you think I missed!

@joellabes
Copy link
Contributor

Hey @morgankrey-amplitude, thanks for the writeup - were these issues after trying the patch I described above?

if you tried setting target_relation to table directly I think it would solve your problem

You can try locally by making a copy of the macro and referencing it without a dbt_utils. namespace

@fivetran-joemarkiewicz
Copy link
Contributor Author

Hey @joellabes I tried this locally but it didn't seem to do the trick. Although, I may have not done this correctly at all! Please excuse my noviceness, but I updated the macro locally to set target relation to table and still seemed to get the same error.

{%- set target_relation = table) -%}

However, @morgankrey-amplitude would you be able to try this on your end as well as your warehouse has the mixed casing and mine doesn't? You can copy the below as a new macro and like @joellabes said you could just reference the macro name without dbt_utils. in the query we did early and see if you still get the same error.

## Named get_column_values.sql
{% macro get_column_values(table, column, order_by='count(*) desc', max_records=none, default=none) -%}

    {#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
    {%- if not execute -%}
        {{ return('') }}
    {% endif %}

    {%- set target_relation = table -%}

    {%- call statement('get_column_values', fetch_result=true) %}

        {%- if not target_relation and default is none -%}

          {{ exceptions.raise_compiler_error("In get_column_values(): relation " ~ table ~ " does not exist and no default value was provided.") }}

        {%- elif not target_relation and default is not none -%}

          {{ log("Relation " ~ table ~ " does not exist. Returning the default value: " ~ default) }}

          {{ return(default) }}

        {%- else -%}


            select
                {{ column }} as value

            from {{ target_relation }}
            group by {{ column }}
            order by {{ order_by }}

            {% if max_records is not none %}
            limit {{ max_records }}
            {% endif %}

        {% endif %}

    {%- endcall -%}

    {%- set value_list = load_result('get_column_values') -%}

    {%- if value_list and value_list['data'] -%}
        {%- set values = value_list['data'] | map(attribute=0) | list %}
        {{ return(values) }}
    {%- else -%}
        {{ return(default) }}
    {%- endif -%}

{%- endmacro %}

and then in your query you would use the below.

{%- set old_formula_fields = get_column_values(source('salesforce', 'fivetran_formula'),'field') | upper -%}

select
{{ dbt_utils.star(source('salesforce','test_order_c'), except=old_formula_fields) }}
from {{ source('salesforce', 'test_order_c') }}

@joellabes feel free to call out if this is not what you were referring to 😄

@morgankrey-amplitude
Copy link

Interestingly, it's at least a different error!

Once I replaced my code with what you had above @fivetran-joemarkiewicz, I got the following
Screen Shot 2021-10-28 at 8 07 55 PM

Quoting the "field" reference, since the field column in the fivetran_formula Snowflake table is lowercase, gets me effectively an empty macro return
Screen Shot 2021-10-28 at 8 08 28 PM

@joellabes
Copy link
Contributor

OK! This is good news.

I infer that you've got the | upper filter to uppercase the results of the old_formula_fields variable, is that right? I think that what it's doing instead is uppercasing the field param 😑

Could you remove that upper filter, and check whether you're getting results back from the get_column_values macro? You should be able to log them by just putting {{ old_formula_values }} on line 2 and compiling that.

If you are, then we have the solution for this original bug - update the macro to use the Relation provided instead of making our own.

From there, assuming that the output is lowercase, we just need to work out a way to uppercase them without breaking the introspective query. Worst case we can do a loop, but there should be a tidier way than that.

@morgankrey-amplitude
Copy link

Confirmed that there are mixed-case column values returning when I specifically quote the field parameter (like below)
Screen Shot 2021-11-01 at 9 22 19 AM

@joellabes
Copy link
Contributor

joellabes commented Nov 9, 2021

@morgankrey-amplitude sorry for the delay in getting back to you!

I've done some experimentation today, and I have a couple of approaches that should work.

Option 1 (my preference): you should just be able to wrap the get_column_values call in another set of parens.

{% set old_formula_fields = (get_column_values(source('salesforce', 'fivetran_formula'), '"field"')) | upper %}
{{ old_formula_fields }}

Option 2: (pretty much guaranteed to work, but a bit ugly): just set uppercase them separately

{% set old_formula_fields = get_column_values(source('salesforce', 'fivetran_formula'), '"field"') %}
{% set old_formula_fields = old_formula_fields | upper %}
{{ old_formula_fields }}

Let me know how you get on with those

@joellabes joellabes changed the title dbt_utils.get_column_values and dbt_utils.star macro not working with database and schema quoting dbt_utils.get_column_values should use the Relation object it's given instead of making its own Nov 9, 2021
@morgankrey-amplitude
Copy link

At this point @joellabes I think everything is working! The issue now is pulling this knowledge into the salesforce formula package from Fivetran that I think @fivetran-joemarkiewicz and I need to figure out from here. I've confirmed that I am able to get all the column values in lower, upper, and original casing using this modified macro and macro call above

@fivetran-joemarkiewicz
Copy link
Contributor Author

Thanks so much @joellabes. I agree with @morgankrey-amplitude that we should move this back to the dbt_salesforce_formula_utils package to apply what we learned here to fix the issue.

@morgankrey-amplitude I just opened an issue on the formula_utils package so we may take this issue back to the package.

Thanks again @joellabes 🎉

@joellabes
Copy link
Contributor

Yay glad to hear it! In addition to any chances on the formula_utils side, there's definitely a bug on dbt_utils' side around that artisinal, handcrafted Relation object instead of just using the one that's passed in. I'll tidy that up shortly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants