Unit test fixture (csv) returns "" for empty value #9881

leesiongchan · 2024-04-09T01:31:44Z

Current Behavior

I have fixture with empty value, the unit test will convert them to "" instead of null. It looks like seeding is working by converting them to null but not the case for unit test.

Expected Behavior

csv with empty value should be treating as null.

Steps To Reproduce

create a unit test with fixture
in the fixture provide empty value
run dbt test
error - Database Error: invalid input syntax for integer: ""

Relevant log output

No response

Environment

- OS:
- Python: 3.12.2
- dbt: 1.8.0-b2 + redshift 1.8.0b2

Which database adapter are you using with dbt?

redshift

Additional Context

No response

dbeatty10 · 2024-04-09T14:13:57Z

Thanks for trying out unit testing and reporting this @leesiongchan !

How dbt seeds treat empty values

dbt-labs/docs.getdbt.com#4867 describes how dbt seeds handle empty strings by converting them to SQL NULL values.

It also describes that there are two values that you can't load directly with CSV seeds:

An empty / blank string ("" / '')

A string with the value "null"

It totally makes sense that you'd expect CSV-formatted unit test fixtures to behave the same way as dbt seeds.

But we'll need to take another look at this and choose between a couple options for moving forward.

Our options

What's your take on this @graciegoheen? We'd need to choose between one of these two options:

Update the behavior of blanks in CSV-formatted unit test fixtures to align with CSV-formatted seeds, accepting that it won't be possible to represent blank strings. Users that need to represent blank strings for unit tests will need to use dict rather than csv for the format.
Keep current behavior of blanks in CSV-formatted unit test fixtures, accepting that it is different than CSV-formatted seeds. We'd want to add appropriate caveats in our documentation.

To me the 1st option has more congruence and the 2nd option has more dissonance. What do you think, Grace?

Reprex

seeds/my_seed.csv

animal,legs
duck,2
snake,""

tests/fixtures/my_model_fixture.csv

animal,legs
duck,2
snake,

models/animals.sql

select 'duck' as animal, 2 as legs union all
select 'snake' as animal, null as legs

models/_unit.yml

unit_tests:

  - name: test_a_9881
    description: https://github.com/dbt-labs/dbt-core/issues/9881
    model: animals
    given: []
    expect:
      # This doesn't work as expected
      format: csv
      fixture: my_model_fixture

  - name: test_b_9881
    description: https://github.com/dbt-labs/dbt-core/issues/9881
    model: animals
    given: []
    expect:
      # But this works as expected
      rows:
        - {animal: duck, legs: 2}
        - {animal: snake, legs: null}

Run this command to see how seeds work with a blank value (a blank string "" converts to a SQL NULL):

dbt show -s seeds/my_seed.csv --output json

Run this command and get an unexpected failure:

dbt build -s my_seed animals --full-refresh

graciegoheen · 2024-04-09T15:13:26Z

Hello! @leesiongchan - thanks for opening up this request. I'd like to know more about your use case here. We've designed unit tests such that you only have to supply input mock data for the columns that are relevant to your unit test.

If we imagine a unit test that only needs col_a and col_c…

Instead of:

# tests/fixtures/my_fixture.csv

col_a,col_b,col_c
1,,doug
2,,grace

You should do:

# tests/fixtures/my_fixture.csv

col_a,col_c
1,doug
2,grace

Now, that won’t be relevant if you do need all columns but want to test 1 case where col_b is null.

In that case, could you explicitly call out the null like so:

# tests/fixtures/my_fixture.csv

col_a,col_b,col_c
1,blue,doug
2,null,grace

Seeds behave differently than fixtures (though understand the comparison since they're both csv files). I am also hesitant to convert empty values to null for fixtures, if it would prevent someone from using an empty string in their mock input/output data for a unit test.

dbeatty10 · 2024-04-09T16:16:35Z

@graciegoheen how does this behave for you when you add it as a fixture?

animal,legs
duck,2
snake,null

Here's the error message I get when I use the project files described in #9881 (comment):

16:15:10    Runtime Error in unit_test test_a_9881 (models/_unit.yml)
  An error occurred during execution of unit test 'test_a_9881'. There may be an error in the unit test definition: check the data types.
   Database Error
    invalid input syntax for type integer: "null"
    LINE 29:     cast('null' as integer)

But maybe I have a bjorked local environment somehow?

leesiongchan · 2024-04-10T05:54:28Z

In that case, could you explicitly call out the null like so:
# tests/fixtures/my_fixture.csv

col_a,col_b,col_c
1,blue,doug
2,null,grace

If this is supported perhaps it is the best solution for both worlds (seeding & unit test fixture)? In our case we need all columns because our update/delete event consists only partial of the data.

graciegoheen · 2024-04-11T17:35:02Z

So I think this is going to be dependent on the warehouse you're using. For example, Snowflake is totally fine with try_cast('null' as NUMBER(38,0)), so this actually works fine for Snowflake (defining a numeric column as null in a fixture file).

In postgres (which is what @dbeatty10 is using), cast('null' as integer) does not work but cast(null as integer) does.

Things get funky as well when trying to cast to a string, because try_cast('null' as text) will result in just the actual word "null" rather than NULL.

@dbeatty10 did a deep dive on what you can and can't cast NULL to in Snowflake here.

We've had a tradeoff for seeds for awhile where:

you cannot define an empty string in a seed
you can define NULL

I understand it's confusing for us to decide the opposite for fixtures:

you can define an empty string
you cannot define NULL

That being said, you can define NULL using the other formats available (currently dict and coming soon sql). I believe this is because we're wrapping the values inputted in the csv fixture in single quotes, but we don't do that for the other formats.

At minimum, I think we have a few things to do here:

add a test to core to cover this case
update our docs on data types for unit testing with an example of how you'd supply an intended null in each of the warehouses

In addition, we should decide how we want to handle this. A few options include:

allowing seeds and fixtures to support both empty strings and explicit NULLs

col_a,col_b,col_c,col_d
1,blue,doug,hat
2,null,,shoe

making fixtures consistent with seeds (not support empty strings, but support NULLs)
documenting that fixtures don't support NULLs (must use alternative format for this case)
something else?

I'm going to move this over to technical refinement to get input from our engineers! Thanks for the discussion.

dbeatty10 · 2024-04-12T00:23:02Z

dbt-labs/dbt-snowflake#894 (comment) has the deep dive of casting NULL (without quotes) across all known standard data types for the following databases:

snowflake
bigquery
redshift
postgres
databricks
spark

graciegoheen · 2024-04-15T19:18:01Z

Acceptance Criteria

My seeds and fixtures should treat empty values the same

col_a,col_b,col_c,col_d
1,blue,doug,hat
2,red,,shoe

,, becomes NULL
there is no current way to specify an empty string using csv format

FishtownBuildBot · 2024-05-09T14:10:41Z

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#5476

leesiongchan added bug Something isn't working triage labels Apr 9, 2024

dbeatty10 self-assigned this Apr 9, 2024

dbeatty10 added Refinement Maintainer input needed and removed triage labels Apr 9, 2024

dbeatty10 assigned graciegoheen Apr 9, 2024

dbeatty10 mentioned this issue Apr 9, 2024

For dbt seeds and csv format of unit testing, empty strings ("") and "null" both convert to NULL dbt-labs/docs.getdbt.com#4867

Open

1 task

dbeatty10 removed their assignment Apr 9, 2024

graciegoheen added enhancement New feature or request and removed bug Something isn't working labels Apr 9, 2024

graciegoheen changed the title ~~[Bug] Unit test fixture (csv) returns "" for empty value~~ Unit test fixture (csv) returns "" for empty value Apr 9, 2024

graciegoheen added the awaiting_response label Apr 9, 2024

github-actions bot added triage and removed awaiting_response labels Apr 9, 2024

graciegoheen removed triage Refinement Maintainer input needed labels Apr 11, 2024

graciegoheen mentioned this issue Apr 11, 2024

[CT-2911] [Epic] Unit testing dbt models #8283

Closed

graciegoheen added pre-release Bug not yet in a stable release bug Something isn't working and removed enhancement New feature or request labels Apr 11, 2024

dbeatty10 mentioned this issue Apr 12, 2024

[Bug] Type object 'MacroFuzzEnvironment' has no attribute 'concat' #9896

Open

2 tasks

graciegoheen added this to the v1.8 milestone Apr 15, 2024

graciegoheen removed their assignment Apr 24, 2024

martynydbt assigned ChenyuLInx Apr 25, 2024

MichelleArk assigned MichelleArk and unassigned ChenyuLInx May 9, 2024

MichelleArk mentioned this issue May 9, 2024

empty unit test csv fixture values default to null #10117

Merged

5 tasks

MichelleArk added the backport 1.8.latest label May 9, 2024

MichelleArk closed this as completed in #10117 May 9, 2024

MichelleArk added the user docs [docs.getdbt.com] Needs better documentation label May 9, 2024

dbeatty10 mentioned this issue May 9, 2024

For csv format of unit testing, empty strings ("") and "null" both convert to SQL NULL dbt-labs/docs.getdbt.com#5476

Open

graciegoheen mentioned this issue May 20, 2024

[Bug] unit tests' comparisons are sometimes sensitive to the order of records returned #10167

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unit test fixture (csv) returns "" for empty value #9881

Unit test fixture (csv) returns "" for empty value #9881

leesiongchan commented Apr 9, 2024 •

edited by graciegoheen

dbeatty10 commented Apr 9, 2024

Reprex

graciegoheen commented Apr 9, 2024

dbeatty10 commented Apr 9, 2024

leesiongchan commented Apr 10, 2024

graciegoheen commented Apr 11, 2024 •

edited

dbeatty10 commented Apr 12, 2024

graciegoheen commented Apr 15, 2024 •

edited

FishtownBuildBot commented May 9, 2024

Unit test fixture (csv) returns "" for empty value #9881

Unit test fixture (csv) returns "" for empty value #9881

Comments

leesiongchan commented Apr 9, 2024 • edited by graciegoheen

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Which database adapter are you using with dbt?

Additional Context

dbeatty10 commented Apr 9, 2024

How dbt seeds treat empty values

Our options

Reprex

graciegoheen commented Apr 9, 2024

dbeatty10 commented Apr 9, 2024

leesiongchan commented Apr 10, 2024

graciegoheen commented Apr 11, 2024 • edited

dbeatty10 commented Apr 12, 2024

graciegoheen commented Apr 15, 2024 • edited

Acceptance Criteria

FishtownBuildBot commented May 9, 2024

leesiongchan commented Apr 9, 2024 •

edited by graciegoheen

graciegoheen commented Apr 11, 2024 •

edited

graciegoheen commented Apr 15, 2024 •

edited