
Fix dbt incremental_strategy behavior by fixing schema table existing check #530

Merged

Conversation

@case-k-git (Contributor) commented Dec 9, 2023

Resolves #523

Description

I have made a modification so that case sensitivity is taken into account when checking whether a schema exists. Databricks manages schema names in lowercase, so if there isn't an exact match, it causes problems in subsequent processes. For example, when setting materialized='incremental' in the SQL config for an append operation, dbt checks whether the target table exists before appending the new data. If the schema is passed in uppercase, then since the schema on Databricks is in lowercase, the check for the table's existence fails, and instead of appending, dbt recreates the existing table even though the incremental option is used. This change ensures an exact match during the initial schema check.

{{ config(
    materialized='incremental',
    incremental_strategy='append',
) }}
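
In rough terms, the behavior described above comes down to the following decision (an illustrative Python sketch with a hypothetical helper, not dbt-databricks code):

# Illustrative sketch only (hypothetical helper, not dbt-databricks code):
# the incremental materialization only appends when the target table is found.
def choose_incremental_action(schema: str, table: str, existing: set) -> str:
    # Databricks reports relation names lower-cased, e.g. ("schema_demo", "daily_model_2"),
    # so an exact-string lookup with "Schema_Demo" misses and triggers a rebuild.
    if (schema, table) in existing:
        return "append"            # expected: write only the new rows
    return "create_or_replace"     # what actually happens with an uppercase schema

choose_incremental_action("Schema_Demo", "daily_model_2", {("schema_demo", "daily_model_2")})
# -> "create_or_replace", even though the table already exists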

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@case-k-git case-k-git changed the title update-schema-check-validation update schema existing check validation considering the distinction between uppercase and lowercase Dec 9, 2023
@benc-db (Collaborator) commented Dec 11, 2023

It's unclear how the change you're proposing accomplishes your stated goal. If anything, it seems that your change will try to compare a potentially mixed-case schema against a list that I think is natively lower-cased.

@case-k-git (Contributor, Author) commented Dec 12, 2023

@benc-db
Thank you for your reply. Let me explain both the normal (expected) case and the abnormal (unexpected) case.

🔹 Normal (expected) case: defining the schema in lowercase
We use the following profiles.yml and execute the incremental SQL. When the schema is defined in lowercase, there is no problem.

databricks_demo:
  target: dev
  outputs:
    dev:
      catalog: catalog_demo
      host: <host>
      http_path: <http_path>
      schema: schema_demo
      threads: 1
      token:  "<token>"
      type: databricks
  • Execution Log
    Checking the target table shows that it has been incremented.
dbt run
01:48:33  2 of 2 START sql incremental model schema_demo.daily_model_2 ................... [RUN]
01:48:39  2 of 2 OK created sql incremental model schema_demo.daily_model_2 .............. [OK in 6.12s]
  • Table History
    Checking the history shows that the data was appended (a WRITE operation is recorded).

🔹 Abnormal (unexpected) case: defining the schema in uppercase
We use the following profiles.yml and execute the incremental SQL. When the schema is defined in uppercase, it behaves unexpectedly.

databricks_demo:
  target: dev
  outputs:
    dev:
      catalog: catalog_demo
      host: <host>
      http_path: <http_path>
      schema: Schema_Demo
      threads: 1
      token:  "<token>"
      type: databricks
  • Execution Log
    Checking the run output, the model appears to have been built incrementally, but the table history tells a different story.
dbt run
01:48:33  2 of 2 START sql incremental model Schema_Demo.daily_model_2 ................... [RUN]
01:48:39  2 of 2 OK created sql incremental model Schema_Demo.daily_model_2 .............. [OK in 6.12s]
  • Table History
    However, checking the history shows that the table was recreated (a CREATE OR REPLACE operation is recorded), when it should have been appended to.

This happens because the incremental update first checks whether the target table exists. When the schema is passed in uppercase, the check concludes that the table does not exist, so a recreate is triggered. While it would be possible to fix this on the incremental side, I thought it would be more considerate to verify up front that the schema defined in profiles.yml matches exactly. That is what I would like to solve, and that is the change I made.

@benc-db (Collaborator) left a comment

I believe you that the problem exists, but the code you're proposing to remove should be exactly preventing what you describe, as it compares the fully lower-cased version of the schema in question to the lower-cased name of every schema in the database. Given that, as you said, Databricks returns the schema name lower-cased, how does removing schema.lower() achieve your ends? Have you tested that using your local version fixes your problem?
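
In rough form, the check I am referring to looks something like this (a simplified sketch, not the exact adapter code):

# Simplified sketch of the existing check (not the exact adapter code).
def schema_exists(requested: str, schemas_in_database: list) -> bool:
    # Both sides are lower-cased, so "Schema_Demo" still matches "schema_demo";
    # dropping .lower() from the requested side would make this lookup fail instead.
    return requested.lower() in (s.lower() for s in schemas_in_database)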

@case-k-git (Contributor, Author)

Thank you for your reply. I am attempting to run the integration tests, but I have not yet been able to set up the environment for that. I am following the approach described in this document:
https://github.com/databricks/dbt-databricks/blob/main/CONTRIBUTING.MD#functional--integration-tests

What I want to achieve is that if the profiles schema is passed in uppercase, it is treated as a non-existent schema, so the connection fails with a clear error. Currently, even when the schema is passed in uppercase, the connection does not fail, which causes issues in subsequent processes, as I shared above. By removing the lower(), I believe the uppercase schema from profiles will be compared against the lowercase schema retrieved from Databricks, and the connection will fail because the schema is treated as non-existent.

@benc-db (Collaborator) commented Dec 14, 2023

If the schema is passed in uppercase, then since the schema on Databricks is in lowercase, the check for the table's existence fails

So, I'm now understanding that you are trying to make it fail at the schema step so that you don't recreate the table. I didn't realize that was your goal initially. OK, so I think the fix that is more in line with our principles is to find out why the table check is failing and fix that instead. In general we try to make everything lowercase for comparison because, to my knowledge, Databricks is not supposed to be case sensitive. I think getting rid of the lowercasing here will be too disruptive. Can we instead identify why the table check is failing?

@case-k-git (Contributor, Author) commented Dec 16, 2023

I think getting rid of the lowercasing here will be too disruptive. Can we instead identify why the table check is failing?

Yes, and I apologize for the confusion. Of course we can investigate this and find the ideal way to solve the issue. Users could potentially lose history data, so I think it is better to solve it somehow. Thank you!

@case-k-git (Contributor, Author) commented Dec 19, 2023

I have created a new PR to investigate this schema issue. Could you also check that PR?
#538

Currently I am investigating by passing the schema explicitly. That PR makes it possible to pass a default schema option and will be useful for investigating this issue.

@case-k-git case-k-git changed the title update schema existing check validation considering the distinction between uppercase and lowercase Fix dbt incremental_strategy behavior by fixing schema table existing check Dec 29, 2023
@case-k-git (Contributor, Author) commented Dec 29, 2023

Hi @benc-db

I have fixed this issue; could you please check this change? I have tested it locally and confirmed the issue is resolved: when I pass the schema in uppercase, the rows are appended to the existing table instead of the table being recreated.

To pass an uppercase schema from the dbt integration tests, I needed to fix the existing integration environment. I have created that change as a separate PR, so please check it as well:
#541

  • Test Results
 .tox/integration-databricks-uc-sql-endpoint/bin/python -m pytest -v -m profile_databricks_uc_sql_endpoint -n4 tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint
=========================================== test session starts ===========================================
platform darwin -- Python 3.11.5, pytest-7.4.3, pluggy-1.3.0 -- /Users/user/Library/CloudStorage/OneDrive/Desktop/workspace/dbt-databricks/.tox/integration-databricks-uc-sql-endpoint/bin/python
cachedir: .pytest_cache
rootdir: /Users/user/Library/CloudStorage/OneDrive/Desktop/workspace/dbt-databricks
configfile: pytest.ini
plugins: csv-3.0.0, flaky-3.7.0, dotenv-0.5.2, xdist-3.5.0
4 workers [1 item]      
scheduling tests via LoadScheduling

tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint 

[gw0] [100%] PASSED tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint 

============================================ warnings summary =============================================
tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint
tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint
tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint
tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint
tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint
tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint
tests/integration/incremental_strategies/test_incremental_strategies.py::TestDeltaStrategiesWarehouse::test_delta_strategies_databricks_uc_sql_endpoint
  /Users/user/Library/CloudStorage/OneDrive/Desktop/workspace/dbt-databricks/dbt/adapters/databricks/connections.py:693: UserWarning: The cursor was closed by destructor.

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================ 1 passed, 7 warnings in 85.58s (0:01:25) =================================
  • Query History (※ image updated 2024/01/03)

Checked that the uppercase schema is passed from the integration test and that the list-tables check is performed with the schema converted to lowercase. The table existence check now passes, and with this change the rows are appended to the existing table instead of the table being recreated.
(screenshot: query_history_logs)

  • Table History (※ image updated 2024/01/03)

Checked that the rows are appended as expected: in the second entry of the table history, "WRITE" is recorded.
(screenshot: append_delta_write)

def list_tables(self, database: str, schema: str, identifier: Optional[str] = None) -> Table:
    database = database.strip("`")
    schema = schema.strip("`").lower()  # with .lower(): the test passed successfully

I also tested without the lowercase conversion in the list_tables method and confirmed that, instead of appending rows, the table is recreated and the integration test fails, as expected. In the second entry of the table history, "CREATE OR REPLACE" is recorded.
(screenshot: append_delta_create)

def list_tables(self, database: str, schema: str, identifier: Optional[str] = None) -> Table:
    database = database.strip("`")
    schema = schema.strip("`")  # without .lower(): the test failed because the table was recreated
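
In other words, the two variants boil down to this comparison (illustrative values only, not the real test schema):

# Illustrative values only: Databricks returns schema names lower-cased.
schemas_on_databricks = ["schema_demo"]
schema_from_profile = "Schema_Demo".strip("`")

print(schema_from_profile.lower() in schemas_on_databricks)  # True  -> table found, rows appended
print(schema_from_profile in schemas_on_databricks)          # False -> treated as missing, table recreated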

@case-k-git (Contributor, Author)

I think this is the same issue:
#523

@benc-db (Collaborator) commented Jan 2, 2024

Sorry for the delay, was taking holidays. Will take a look today.

@case-k-git (Contributor, Author) commented Jan 3, 2024

@benc-db Thank you! I modified the query and table history images to make the verification results easier to understand.

@case-k-git (Contributor, Author)

Hello @benc-db, thank you for reviewing the other PR I requested! I would appreciate it if you could check this PR as well when you have time. Thank you.

@benc-db (Collaborator) commented Jan 11, 2024

@case-k-git please resolve the conflict, and then I will run tests and review :)

@case-k-git (Contributor, Author)

@benc-db Thank you, I have fixed the conflict.

benc-db previously approved these changes Jan 12, 2024
@benc-db (Collaborator) commented Jan 12, 2024

Thanks for your help and patience. Will merge today, and this will be in the next release (which I expect will come out next week).

@benc-db benc-db merged commit 7421491 into databricks:main Jan 12, 2024
18 checks passed
@case-k-git (Contributor, Author)

@benc-db Thank you !!
