Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Make sql_header configuration available on tests #9775

Open
3 tasks done
epapineau opened this issue Mar 19, 2024 · 8 comments
Open
3 tasks done

[Feature] Make sql_header configuration available on tests #9775

epapineau opened this issue Mar 19, 2024 · 8 comments
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors

Comments

@epapineau
Copy link
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The sql_header model configuration is a convenient way to define a temporary BigQuery UDF, as described in the dbt docs. It would be helpful to have this configuration available on tests as well for the same purposes.

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

@epapineau epapineau added enhancement New feature or request triage labels Mar 19, 2024
@dbeatty10
Copy link
Contributor

@epapineau ! So good to see you 😄

Are you thinking primarily data tests or unit tests or both? If data tests is included, singular or generic or both?

Could you share a simplified code example in which you'd want to define a temporary BigQuery UDF and a test that would use it?

@dbeatty10 dbeatty10 added dbt tests Issues related to built-in dbt testing functionality awaiting_response and removed triage labels Mar 19, 2024
@epapineau
Copy link
Contributor Author

Are you thinking primarily data tests or unit tests or both? If data tests is included, singular or generic or both?

Data tests! And probably both singular and generic - can't imagine having the ability in one and not the other

@epapineau
Copy link
Contributor Author

Could you share a simplified code example in which you'd want to define a temporary BigQuery UDF and a test that would use it?

I've been working with the temporary UDF recommended here this week. I'm creating a temporary UDF using sql_headers in my model, but I'd like to be able to test the key values in a singular data test to confirm they can successfully cast to integers. However, I'm not able to create the temporary UDF in the test like I am in the model. LMK if you want something more explicit here 👍

@dbeatty10
Copy link
Contributor

LMK if you want something more explicit here 👍

Something more explicit would be great! It would serve as a two-fer-one:

  1. It will help us more tangibly consider this proposal 🧠
  2. If we end up adopting this proposal, then we can drop a variation of the explicit example into pytest to make sure it works as expected 💪

@epapineau
Copy link
Contributor Author

Okay, here's an example using the UDF from the article linked above to get keys and values from JSON. In this example, I want the test to fail where test is null. I currently can't define a temp UDF in a test w/o sql_header config on a test node.

{%- call set_sql_header(config) -%}
    CREATE TEMP FUNCTION EXTRACT_KV_PAIRS(json_str STRING)
    RETURNS ARRAY<STRUCT<key STRING, value STRING>>
    LANGUAGE js AS """
    try{ 
        const json_dict = JSON.parse(json_str); 
        const all_kv = Object.entries(json_dict).map(
            (r)=>Object.fromEntries([["key", r[0]],["value",  
                                    JSON.stringify(r[1])]]));
        return all_kv;
    } catch(e) { return [{"key": "error","value": e}];}
    """;
{%- endcall -%}

-- sample JSON requiring a UDF to extract keys
WITH
sample_json AS(
  SELECT '{"1":{"status":"success"},"2":{"status":"success"},"NaN":{"status":"unsuccess"}}' AS payload
),
-- use UDF
extracted_pairs as (
  SELECT EXTRACT_KV_PAIRS(payload) as kv_list FROM sample_json
)
SELECT
    items.key as id,
    SAFE_CAST(items.key AS INTEGER) as test  -- Test fails for rows where test is null
FROM extracted_pairs
CROSS JOIN UNNEST(kv_list) as items

@dbeatty10
Copy link
Contributor

Thanks for that example @epapineau !

I tried out your example, and surprising it works if (and only if) store_failures_as="table" (or store_failures=true).

Regardless, it makes sense for us to do the following:

  • Enable sql_header to apply to both singular and generic data tests regardless if they are storing failures or not.

I did a proof-of-concept in these draft PRs:

Labeling this as help_wanted for someone in the community to bring across the finish line (adding test cases, etc.).

@dbeatty10 dbeatty10 removed the triage label Apr 4, 2024
@dbeatty10 dbeatty10 removed their assignment Apr 4, 2024
@dbeatty10 dbeatty10 added the help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors label Apr 5, 2024
@jx2lee
Copy link
Contributor

jx2lee commented May 23, 2024

@dbeatty10 hi! Could i take your draft PRs?
I've worked on good_first_issue issue in dbt-core, but I didn't contribute to dbt-adapter.

  • I'm wondering if I could do it,
  • and if so, is it okay to �do on ref branch of draft-PR (e.g. dbeatty/9775-sql-header)?

@github-christophe-oudar

@dbeatty10 sql_header is not working on unit tests too, maybe we could take care of that one as well?
Would dbt-labs/dbt-adapters#145 changes also affect them? I would need to rework my local setup to test if you don't know 👀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors
Projects
None yet
Development

No branches or pull requests

4 participants