feat: clickhouse safe_cast macro #552
Open: MaelleBaillet5 wants to merge 8 commits into ClickHouse:main from MaelleBaillet5:clickhouse_safe_cast
8 commits
45e401a feat: add clickhouse safe_cast (MaelleBaillet5)
2c07277 feat: add changelog (MaelleBaillet5)
88ae97e fix: improve safe_cast test (MaelleBaillet5)
0006cd9 docs: add comment to explain clickhouse__safe_cast macro goal (MaelleBaillet5)
161857f fix: add setup for test_safe_cast (MaelleBaillet5)
3abb80e fix: take into account DateTime(Europe/Paris) datatype (MaelleBaillet5)
2579e7e fix: adapt fixedstring to pass ci (MaelleBaillet5)
aa5534c feat: add unit testing based on issue 315 (MaelleBaillet5)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| -- This macro provides type-safe casting with automatic default values for ClickHouse types. | ||
| -- When the literal string 'null' is passed as the field parameter, it returns the ClickHouse | ||
| -- default value for the specified type. This is primarily used in unit test fixtures to avoid | ||
| -- having to specify all non-nullable columns. | ||
| | ||
| {% macro clickhouse__safe_cast(field, dtype) %} | ||
| {%- if field == 'null' -%} | ||
| CAST(defaultValueOfTypeName('{{ dtype | replace("'", "\\'") }}') AS {{ dtype }}) | ||
| {%- else -%} | ||
| CAST({{ field }} AS {{ dtype }}) | ||
| {%- endif -%} | ||
| {% endmacro %} | ||
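
To make the two branches of the macro above easier to follow, here is a minimal Python sketch that mirrors its string rendering. `render_safe_cast` is a hypothetical helper written for illustration and is not part of the PR; it assumes the macro receives `field` as a SQL fragment and `dtype` as a ClickHouse type name.

```python
def render_safe_cast(field: str, dtype: str) -> str:
    """Mirror the branching of the clickhouse__safe_cast macro in plain Python."""
    if field == "null":
        # Escape single quotes so that types such as DateTime('Europe/Paris')
        # survive inside the string literal passed to defaultValueOfTypeName.
        escaped = dtype.replace("'", "\\'")
        return f"CAST(defaultValueOfTypeName('{escaped}') AS {dtype})"
    # Any other input is treated as a SQL fragment and cast as-is.
    return f"CAST({field} AS {dtype})"


print(render_safe_cast("42", "Int32"))
# CAST(42 AS Int32)
print(render_safe_cast("null", "DateTime('Europe/Paris')"))
# CAST(defaultValueOfTypeName('DateTime(\'Europe/Paris\')') AS DateTime('Europe/Paris'))
```

The outer `CAST` in the default branch repeats the type, which appears to be what keeps the resulting column typed exactly as requested.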
tests/integration/adapter/unit_testing/test_missing_column_values.py (84 changes: 84 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| """ | ||
| Test that unit tests work correctly when column values are omitted from input rows. | ||
| The safe_cast macro should provide default values instead of NULL for missing columns. | ||
| """ | ||
| import pytest | ||
| from dbt.tests.util import run_dbt | ||
| | ||
| | ||
| # First model: a table with non-nullable columns | ||
| my_first_dbt_model_sql = """ | ||
| select 1 as id, 'a' AS foo | ||
| union all | ||
| select 2 as id, 'b' AS foo | ||
| """ | ||
| | ||
| # Second model: filters the first model | ||
| my_second_dbt_model_sql = """ | ||
| select * | ||
| from {{ ref('my_first_dbt_model') }} | ||
| where id = 1 | ||
| """ | ||
| | ||
| # Unit test with missing column values (foo is omitted from input rows) | ||
| test_my_model_yml = """ | ||
| version: 2 | ||
| | ||
| models: | ||
| - name: my_first_dbt_model | ||
| description: "A starter dbt model" | ||
| columns: | ||
| - name: id | ||
| data_type: uint64 | ||
| - name: foo | ||
| data_type: string | ||
| - name: my_second_dbt_model | ||
| description: "A starter dbt model" | ||
| columns: | ||
| - name: id | ||
| data_type: uint64 | ||
| - name: foo | ||
| data_type: string | ||
| unit_tests: | ||
| - name: test_not_null | ||
| model: my_second_dbt_model | ||
| given: | ||
| - input: ref('my_first_dbt_model') | ||
| rows: | ||
| - {id: 1} | ||
| - {id: 2} | ||
| expect: | ||
| rows: | ||
| - {id: 1} | ||
| """ | ||
| | ||
| | ||
| class TestMissingColumnValues: | ||
| """ | ||
| Test that unit tests handle missing column values correctly. | ||
| The safe_cast macro should provide appropriate default values instead. | ||
| """ | ||
| | ||
| @pytest.fixture(scope="class") | ||
| def models(self): | ||
| return { | ||
| "my_first_dbt_model.sql": my_first_dbt_model_sql, | ||
| "my_second_dbt_model.sql": my_second_dbt_model_sql, | ||
| "unit_tests.yml": test_my_model_yml, | ||
| } | ||
| | ||
| def test_missing_column_values(self, project): | ||
| """ | ||
| Test that unit tests work when column values are omitted from input rows. | ||
| | ||
| This test should pass without errors, demonstrating that the safe_cast macro | ||
| correctly handles NULL values by providing appropriate defaults for ClickHouse | ||
| non-nullable types. | ||
| """ | ||
| # Run the models | ||
| results = run_dbt(["run"]) | ||
| assert len(results) == 2 | ||
| | ||
| # Run the unit test - this should pass without ClickHouse type conversion errors | ||
| results = run_dbt(["test", "--select", "test_type:unit"]) | ||
| assert len(results) == 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,160 @@ | ||
| import pytest | ||
| from datetime import datetime, date, timezone | ||
| from uuid import UUID | ||
| from dbt.tests.util import run_dbt | ||
| | ||
| | ||
| # Model that tests safe_cast with various ClickHouse types | ||
| | ||
| safe_cast_model_sql = """ | ||
| select | ||
| -- String types | ||
| {{ safe_cast("null", "String") }} as string_default, | ||
| {{ safe_cast("null", "FixedString(10)") }} as fixedstring_default, | ||
| | ||
| -- Integer types | ||
| {{ safe_cast("null", "Int32") }} as int_default, | ||
| {{ safe_cast("null", "UInt32") }} as uint_default, | ||
| | ||
| -- Floating point types | ||
| {{ safe_cast("null", "Float32") }} as float_default, | ||
| {{ safe_cast("null", "Decimal(10, 2)") }} as decimal_default, | ||
| | ||
| -- Date/Time types | ||
| {{ safe_cast("null", "Date") }} as date_default, | ||
| {{ safe_cast("null", "DateTime") }} as datetime_default, | ||
| {{ safe_cast("null", "DateTime64(3)") }} as datetime64_default, | ||
| {{ safe_cast("null", "DateTime('Europe/Paris')") }} as datetime_tz_default, | ||
| | ||
| -- Other types | ||
| {{ safe_cast("null", "UUID") }} as uuid_default, | ||
| {{ safe_cast("null", "Bool") }} as bool_default, | ||
| | ||
| -- Complex types | ||
| {{ safe_cast("null", "Array(String)") }} as array_default, | ||
| {{ safe_cast("null", "Map(String, Int32)") }} as map_default, | ||
| {{ safe_cast("null", "Tuple(String, Int32)") }} as tuple_default, | ||
| | ||
| -- Nullable | ||
| {{ safe_cast("null", "Nullable(String)") }} as nullable_default, | ||
| | ||
| -- Provided values (non-null) | ||
| {{ safe_cast("'Alice'", "String") }} as provided_string, | ||
| {{ safe_cast("42", "Int32") }} as provided_int, | ||
| {{ safe_cast("toUUID('00000000-0000-0000-0000-000000000001')", "UUID") }} as provided_uuid | ||
| """ | ||
| | ||
| | ||
| class TestSafeCast: | ||
| """Test ClickHouse-specific safe_cast functionality""" | ||
| | ||
| @pytest.fixture(scope="class") | ||
| def models(self): | ||
| return { | ||
| "safe_cast_test.sql": safe_cast_model_sql, | ||
| } | ||
| | ||
| @pytest.fixture(scope="class", autouse=True) | ||
| def setup(self, project): | ||
| """Run the model once for all tests in this class""" | ||
| results = run_dbt(["run", "--select", "safe_cast_test"]) | ||
| assert len(results) == 1 | ||
| yield | ||
| | ||
| def test_safe_cast_defaults(self, project): | ||
| """Test that safe_cast generates correct default values for ClickHouse types""" | ||
| | ||
| # Query the results | ||
| result = project.run_sql( | ||
| "select * from safe_cast_test", | ||
| fetch="one" | ||
| ) | ||
| | ||
| # String types | ||
| assert result[0] == '' # String default | ||
| # FixedString(10) default: some drivers return bytes of nulls, others empty string | ||
| if isinstance(result[1], (bytes, bytearray)): | ||
| assert result[1] == b'\x00' * 10 | ||
| else: | ||
| # In some environments, trailing nulls are stripped and returned as empty string | ||
| assert result[1] in ('', '\x00' * 10) | ||
| | ||
| # Integer types | ||
| assert result[2] == 0 # Int32 default | ||
| assert result[3] == 0 # UInt32 default | ||
| | ||
| # Floating point types | ||
| assert result[4] == 0.0 # Float32 default | ||
| assert result[5] == 0.0 # Decimal default | ||
| | ||
| # Date/Time types | ||
| assert result[6] == date(1970, 1, 1) # Date default | ||
| assert result[7] == datetime(1970, 1, 1, 0, 0, 0) # DateTime default | ||
| assert result[8] == datetime(1970, 1, 1, 0, 0, 0) # DateTime64 default | ||
| # For timezone-aware DateTime, compare in UTC to avoid local TZ shifts | ||
| assert result[9].astimezone(timezone.utc) == datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc) # DateTime with timezone default | ||
| | ||
| # Other types | ||
| assert result[10] == UUID('00000000-0000-0000-0000-000000000000') # UUID default | ||
| assert result[11] is False # Bool default | ||
| | ||
| # Complex types | ||
| assert result[12] == [] # Array default | ||
| assert result[13] == {} # Map default | ||
| assert result[14] == ('', 0) # Tuple default | ||
| | ||
| # Nullable | ||
| assert result[15] is None # Nullable default | ||
| | ||
| # Provided values (should be kept as-is) | ||
| assert result[16] == 'Alice' # Provided string | ||
| assert result[17] == 42 # Provided int | ||
| assert result[18] == UUID('00000000-0000-0000-0000-000000000001') # Provided UUID | ||
| | ||
| def test_safe_cast_types(self, project): | ||
| """Test that safe_cast preserves the expected data types""" | ||
| # Get column types from ClickHouse | ||
| columns = project.run_sql( | ||
| "SELECT name, type FROM system.columns WHERE table = 'safe_cast_test' AND database = currentDatabase() ORDER BY name", | ||
| fetch="all" | ||
| ) | ||
| | ||
| # Create a dict for easier lookup | ||
| column_types = {col[0]: col[1] for col in columns} | ||
| | ||
| # Verify each column has the expected type | ||
| # String types | ||
| assert column_types['string_default'] == 'String' | ||
| assert column_types['fixedstring_default'] == 'FixedString(10)' | ||
| | ||
| # Integer types | ||
| assert column_types['int_default'] == 'Int32' | ||
| assert column_types['uint_default'] == 'UInt32' | ||
| | ||
| # Floating point types | ||
| assert column_types['float_default'] == 'Float32' | ||
| assert column_types['decimal_default'] == 'Decimal(10, 2)' | ||
| | ||
| # Date/Time types | ||
| assert column_types['date_default'] == 'Date' | ||
| assert column_types['datetime_default'] == 'DateTime' | ||
| assert column_types['datetime64_default'] == 'DateTime64(3)' | ||
| assert column_types['datetime_tz_default'] == "DateTime('Europe/Paris')" | ||
| | ||
| # Other types | ||
| assert column_types['uuid_default'] == 'UUID' | ||
| assert column_types['bool_default'] == 'Bool' | ||
| | ||
| # Complex types | ||
| assert column_types['array_default'] == 'Array(String)' | ||
| assert column_types['map_default'] == 'Map(String, Int32)' | ||
| assert column_types['tuple_default'] == 'Tuple(String, Int32)' | ||
| | ||
| # Nullable | ||
| assert column_types['nullable_default'] == 'Nullable(String)' | ||
| | ||
| # Provided values | ||
| assert column_types['provided_string'] == 'String' | ||
| assert column_types['provided_int'] == 'Int32' | ||
| assert column_types['provided_uuid'] == 'UUID' | ||
|
|
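
The two driver-dependent assertions in the test above (FixedString padding and the timezone-aware DateTime) can be read as small normalization checks. The following is a minimal sketch of that idea, not code from the PR: `fixed_string_is_default` and `is_epoch` are hypothetical helper names, and a fixed +01:00 offset is assumed to stand in for Europe/Paris in January 1970 so that the example does not depend on a timezone database.

```python
from datetime import datetime, timedelta, timezone


def fixed_string_is_default(value, length: int) -> bool:
    """Return True if `value` looks like the default of FixedString(length)."""
    if isinstance(value, (bytes, bytearray)):
        # Byte-oriented drivers return the full null padding.
        return bytes(value) == b"\x00" * length
    # Other drivers decode to str and may strip the trailing null bytes.
    return value in ("", "\x00" * length)


def is_epoch(value: datetime) -> bool:
    """Compare a timezone-aware datetime with the Unix epoch, in UTC."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return value.astimezone(timezone.utc) == epoch


# Assumption: +01:00 approximates Europe/Paris at the epoch.
paris_1970 = timezone(timedelta(hours=1))

print(fixed_string_is_default(b"\x00" * 10, 10))         # True
print(fixed_string_is_default("", 10))                   # True
print(is_epoch(datetime(1970, 1, 1, 1, tzinfo=paris_1970)))  # True
```

Comparing in UTC means the check holds whether the driver returns 00:00 UTC or 01:00 local Paris time for the same instant.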
The condition `field == 'null'` checks whether the field is literally the string 'null', not whether it is a SQL NULL value. For SQL NULL values, you should use `{{ field }} is null` instead. The current implementation will only work if 'null' is passed as a string literal.
You're correct that the condition field == 'null' checks for the literal string 'null' rather than a SQL NULL value. This is actually intentional for this macro's specific use case.
The safe_cast macro is designed specifically to solve issue #315 - handling non-nullable ClickHouse columns in dbt unit tests.
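
A short sketch may help show why the literal-string check fits this use case: the macro runs while the SQL is being rendered, so `field` is a SQL fragment and not a runtime value. The code below is an illustration under assumptions, not dbt's actual fixture code. `render_fixture_row` is a hypothetical helper, the rule that omitted columns are passed as the string "null" is assumed from the macro's comment, and the cast helper skips quote escaping for brevity.

```python
def safe_cast(field: str, dtype: str) -> str:
    # Simplified stand-in for the macro: `field` is SQL text, so the only
    # thing that can be detected at render time is the literal "null".
    if field == "null":
        return f"CAST(defaultValueOfTypeName('{dtype}') AS {dtype})"
    return f"CAST({field} AS {dtype})"


def render_fixture_row(row: dict, column_types: dict) -> str:
    """Render one unit-test input row; omitted columns fall back to "null"."""
    parts = []
    for name, dtype in column_types.items():
        # Hypothetical fill rule: a column missing from the row is handed
        # to the cast helper as the literal string "null". Values are
        # assumed to be numeric or already valid SQL fragments.
        field = str(row[name]) if name in row else "null"
        parts.append(f"{safe_cast(field, dtype)} AS {name}")
    return "SELECT " + ", ".join(parts)


sql = render_fixture_row({"id": 1}, {"id": "UInt64", "foo": "String"})
print(sql)
# SELECT CAST(1 AS UInt64) AS id, CAST(defaultValueOfTypeName('String') AS String) AS foo
```

With the row `{id: 1}` from the reproduction case, the omitted `foo` column gets the String default rather than a NULL that a non-nullable column would reject.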
Hi @MaelleBaillet5! Thanks for contributing this. This PR is looking good and I'd like to merge it ASAP.
The only thing missing now is a proper test for the unit-testing part. As you mention, this is specifically designed to solve the issue in #315, so it would be great to add a test that ensures #315 does not happen again. I think creating an additional test that contains the unit test listed in the "Steps to reproduce" section would be enough. Could you add it?