Make DuckDB data diffs work better #716

Merged: 9 commits, Oct 10, 2023


@sungchun12 sungchun12 commented Sep 26, 2023

Get DuckDB/MotherDuck to work with data-diff using joindiff within the dbt-core integration.

Right now, it defaults to hashdiff, which is less performant.

Scrub sensitive MotherDuck tokens in logs.

Known limitation: MotherDuck's SaaS-only mode does not work with data-diff. Supporting it would require changing multiple query runner functions to avoid race conditions, which is outside the scope of this PR.
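To illustrate the token scrubbing mentioned above, here is a minimal sketch of how a `motherduck_token` value could be masked before a config is logged. The helper names (`scrub_motherduck_token`, `scrub_config`) and the regex are assumptions made for illustration; they are not the functions added in this PR.

```python
import re

# Hypothetical helpers for illustration; not the code merged in this PR.
# Matches the token value up to the next '&', whitespace, or quote.
_TOKEN_RE = re.compile(r"(motherduck_token=)[^&\s'\"]+")


def scrub_motherduck_token(text: str, mask: str = "**********") -> str:
    """Replace the value of any motherduck_token query parameter with a mask."""
    return _TOKEN_RE.sub(lambda m: m.group(1) + mask, text)


def scrub_config(conf: dict) -> dict:
    """Return a copy of a connection config with secrets masked.

    Recurses into nested dicts, masks 'password' values, and scrubs
    MotherDuck tokens embedded in any string value.
    """
    out = {}
    for key, value in conf.items():
        if isinstance(value, dict):
            out[key] = scrub_config(value)
        elif key == "password":
            out[key] = "*" * len(str(value))
        elif isinstance(value, str):
            out[key] = scrub_motherduck_token(value)
        else:
            out[key] = value
    return out
```

Logging `scrub_config(conf)` instead of `conf` would yield output shaped like the debug line in the run below, where both the token and the Snowflake password appear as asterisks.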

$ data-diff --conf datadiff.toml --run demo_xdb_duckdb -k "id" -c status --debug
12:31:03 DEBUG    Applied run configuration: {'verbose': False, 'database1': {'driver': 'duckdb',         __main__.py:296
                  'filepath': 'md:datafold_demo?motherduck_token=**********', 'database':                                
                  'datafold_demo'}, 'table1': 'development.raw_orders', 'database2': {'driver':                          
                  'snowflake', 'database': 'DEV', 'user': 'sung', 'password': '*************', 'account':                
                  'bya42734', 'schema': 'DEVELOPMENT_SUNG', 'warehouse': 'DEMO', 'role': 'DEMO_ROLE'},                   
                  'table2': 'RAW_ORDERS'}                                                                                
12:31:06 DEBUG    Running SQL (DuckDB): SET GLOBAL TimeZone='UTC'                                             base.py:879
         ERROR    Invalid Input Error: Cannot change configuration option "TimeZone" - the configuration  __main__.py:332
                  has been locked                                                                                        
Traceback (most recent call last):
  File "/Users/sung/Desktop/data-diff/datafold-demo-sung/venv/bin/data-diff", line 8, in <module>
    sys.exit(main())
  File "/Users/sung/Desktop/data-diff/datafold-demo-sung/venv/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/sung/Desktop/data-diff/datafold-demo-sung/venv/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/sung/Desktop/data-diff/datafold-demo-sung/venv/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/sung/Desktop/data-diff/datafold-demo-sung/venv/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/sung/Desktop/data-diff/data_diff/__main__.py", line 328, in main
    return _data_diff(
  File "/Users/sung/Desktop/data-diff/data_diff/__main__.py", line 408, in _data_diff
    db1 = connect(database1, threads1 or threads)
  File "/Users/sung/Desktop/data-diff/data_diff/databases/_connect.py", line 278, in __call__
    conn = self.connect_with_dict(db_conf, thread_count, **kwargs)
  File "/Users/sung/Desktop/data-diff/data_diff/databases/_connect.py", line 222, in connect_with_dict
    return self._connection_created(db)
  File "/Users/sung/Desktop/data-diff/data_diff/databases/_connect.py", line 302, in _connection_created
    db.query(db.dialect.set_timezone_to_utc())
  File "/Users/sung/Desktop/data-diff/data_diff/databases/base.py", line 893, in query
    res = self._query(sql_code)
  File "/Users/sung/Desktop/data-diff/data_diff/databases/duckdb.py", line 162, in _query
    return self._query_conn(self._conn, sql_code)
  File "/Users/sung/Desktop/data-diff/data_diff/databases/base.py", line 1071, in _query_conn
    return apply_query(callback, sql_code)
  File "/Users/sung/Desktop/data-diff/data_diff/databases/base.py", line 201, in apply_query
    return callback(sql_code)
  File "/Users/sung/Desktop/data-diff/data_diff/databases/base.py", line 1056, in _query_cursor
    c.execute(sql_code)
duckdb.InvalidInputException: Invalid Input Error: Cannot change configuration option "TimeZone" - the configuration has been locked
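The traceback above shows the connection setup failing because `SET GLOBAL TimeZone='UTC'` is rejected once the configuration is locked. One possible way to tolerate that, sketched here under assumptions and not necessarily what this PR does, is to treat the locked-configuration error as non-fatal. `try_set_utc` and `run_sql` are hypothetical names; `run_sql` stands in for whatever callable executes SQL on the connection.

```python
from typing import Callable

# Substring taken from the error message in the traceback above.
LOCKED_MARKER = "configuration has been locked"


def try_set_utc(run_sql: Callable[[str], object]) -> bool:
    """Try to set the session time zone to UTC.

    Returns True on success and False if the database reports that its
    configuration is locked. Any other error is re-raised unchanged.
    """
    try:
        run_sql("SET GLOBAL TimeZone='UTC'")
    except Exception as exc:  # a real caller would catch the driver's exception type
        if LOCKED_MARKER in str(exc):
            return False
        raise
    return True
```

A caller that gets `False` back should be aware that timestamps may then be compared in the server's time zone rather than UTC, so skipping the setting is a trade-off rather than a free fix.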

@sungchun12 sungchun12 linked an issue Sep 26, 2023 that may be closed by this pull request
@sungchun12 sungchun12 changed the title from "Feature/motherduck-support" to "Make DuckDB data diffs work better" Sep 26, 2023
@sungchun12 sungchun12 self-assigned this Sep 26, 2023
@sungchun12 sungchun12 added the enhancement New feature or request label Sep 26, 2023
@sungchun12 sungchun12 marked this pull request as ready for review October 9, 2023 19:38
@sungchun12 sungchun12 requested a review from dlawin October 9, 2023 19:38
@sungchun12

Working dry run of joindiff against MotherDuck, using the dbt profile below.

datafold_demo:

  target: dev
  outputs:
    dev:
      type: duckdb
      schema: development
      # path: 'datafold_demo.duckdb'
      path: 'md:datafold_demo?motherduck_token={{ env_var("motherduck_token") }}'
      threads: 16
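The profile above builds the MotherDuck path by reading the token from an environment variable through dbt's `env_var`. Outside dbt, the same string could be assembled as in the sketch below. The function name `motherduck_path` is hypothetical; only the `md:<database>?motherduck_token=<token>` shape comes from the profile itself.

```python
import os


def motherduck_path(database: str, env_name: str = "motherduck_token") -> str:
    """Build an 'md:' path with the token taken from the environment.

    Raises RuntimeError if the variable is unset or empty, so that a
    missing token fails early instead of producing a broken path.
    """
    token = os.environ.get(env_name)
    if not token:
        raise RuntimeError(f"environment variable {env_name!r} is not set")
    return f"md:{database}?motherduck_token={token}"
```

Because the returned string contains the secret, it should be scrubbed before being written to any log.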
$ data-diff --dbt --debug
Running with data-diff=0.9.2 (Update 0.9.3 is available!)
12:39:25 INFO     Parsing file dbt_project.yml                                                          dbt_parser.py:287
         INFO     Parsing file /Users/sung/Desktop/data-diff/data_diff_demo/target/manifest.json        dbt_parser.py:280
         INFO     Parsing file target/run_results.json                                                  dbt_parser.py:253
         INFO     config: prod_database='datafold_demo' prod_schema='production'                        dbt_parser.py:159
                  prod_custom_schema=None datasource_id=6357                                                             
         INFO     Parsing file /Users/sung/Desktop/data-diff/data_diff_demo/profiles.yml                dbt_parser.py:294
         DEBUG    Found PKs via Uniqueness tests: {'order_id'}                                          dbt_parser.py:458
12:39:27 DEBUG    Running SQL (DuckDB): SET GLOBAL TimeZone='UTC'                                             base.py:879
         DEBUG    Running SQL (DuckDB): SELECT column_name, data_type, datetime_precision, numeric_precision, base.py:879
                  numeric_scale FROM datafold_demo.information_schema.columns WHERE table_name = 'orders' AND            
                  table_schema = 'production'                                                                            
         DEBUG    Running SQL (DuckDB): SELECT column_name, data_type, datetime_precision, numeric_precision, base.py:879
                  numeric_scale FROM datafold_demo.information_schema.columns WHERE table_name = 'orders' AND            
                  table_schema = 'development'                                                                           
         DEBUG    Running SQL (DuckDB): SELECT column_name, data_type, datetime_precision, numeric_precision, base.py:879
                  numeric_scale FROM datafold_demo.information_schema.columns WHERE table_name = 'orders' AND            
                  table_schema = 'production'                                                                            
         DEBUG    Running SQL (DuckDB): SELECT TRIM("status") FROM "datafold_demo"."production"."orders"      base.py:879
                  LIMIT 64                                                                                               
         DEBUG    [DuckDB] Schema = {'order_id': Integer(precision=0, python_type=<class 'int'>),            schema.py:12
                  'customer_id': Integer(precision=0, python_type=<class 'int'>), 'order_date':                          
                  UnknownColType(text='DATE'), 'status': String_VaryingAlphanum(), 'credit_card_amount':                 
                  Float(precision=13), 'coupon_amount': Float(precision=13), 'bank_transfer_amount':                     
                  Float(precision=13), 'gift_card_amount': Float(precision=13), 'amount':                                
                  Float(precision=13)}                                                                                   
         DEBUG    Running SQL (DuckDB): SELECT column_name, data_type, datetime_precision, numeric_precision, base.py:879
                  numeric_scale FROM datafold_demo.information_schema.columns WHERE table_name = 'orders' AND            
                  table_schema = 'development'                                                                           
         DEBUG    Running SQL (DuckDB): SELECT TRIM("status") FROM "datafold_demo"."development"."orders"     base.py:879
                  LIMIT 64                                                                                               
12:39:28 DEBUG    [DuckDB] Schema = {'order_id': Integer(precision=0, python_type=<class 'int'>),            schema.py:12
                  'customer_id': Integer(precision=0, python_type=<class 'int'>), 'order_date':                          
                  UnknownColType(text='DATE'), 'status': String_VaryingAlphanum(), 'credit_card_amount':                 
                  Float(precision=13), 'coupon_amount': Float(precision=13), 'bank_transfer_amount':                     
                  Float(precision=13), 'gift_card_amount': Float(precision=13), 'amount':                                
                  Float(precision=13)}                                                                                   
         DEBUG    Testing for duplicate keys                                                       joindiff_tables.py:230
         INFO     Validating that the are no duplicate keys in columns: ['order_id']               joindiff_tables.py:243
         DEBUG    Running SQL (DuckDB): SELECT count(*) AS "total", count(distinct                            base.py:879
                  coalesce("order_id"::VARCHAR, '<null>')) AS "total_distinct" FROM                                      
                  "datafold_demo"."production"."orders"                                                                  
         DEBUG    Collecting stats for table #1                                                    joindiff_tables.py:270
         DEBUG    Querying for different rows                                                      joindiff_tables.py:208
         DEBUG    Running SQL (DuckDB): SELECT sum("amount") AS "sum_amount", sum("credit_card_amount") AS    base.py:879
                  "sum_credit_card_amount", sum("customer_id") AS "sum_customer_id", sum("gift_card_amount")             
                  AS "sum_gift_card_amount", sum("bank_transfer_amount") AS "sum_bank_transfer_amount",                  
                  sum("coupon_amount") AS "sum_coupon_amount", count(*) AS "count" FROM                                  
                  "datafold_demo"."production"."orders"                                                                  
         DEBUG    Running SQL (DuckDB): SELECT * FROM (SELECT ("tmp2"."order_id" IS NULL) AS                  base.py:879
                  "is_exclusive_a", ("tmp1"."order_id" IS NULL) AS "is_exclusive_b", CASE WHEN                           
                  "tmp1"."order_id" is distinct from "tmp2"."order_id" THEN 1 ELSE 0 END AS                              
                  "is_diff_order_id", CASE WHEN "tmp1"."amount" is distinct from "tmp2"."amount" THEN 1 ELSE             
                  0 END AS "is_diff_amount", CASE WHEN "tmp1"."order_date" is distinct from                              
                  "tmp2"."order_date" THEN 1 ELSE 0 END AS "is_diff_order_date", CASE WHEN                               
                  "tmp1"."credit_card_amount" is distinct from "tmp2"."credit_card_amount" THEN 1 ELSE 0 END             
                  AS "is_diff_credit_card_amount", CASE WHEN "tmp1"."customer_id" is distinct from                       
                  "tmp2"."customer_id" THEN 1 ELSE 0 END AS "is_diff_customer_id", CASE WHEN                             
                  "tmp1"."gift_card_amount" is distinct from "tmp2"."gift_card_amount" THEN 1 ELSE 0 END AS              
                  "is_diff_gift_card_amount", CASE WHEN "tmp1"."bank_transfer_amount" is distinct from                   
                  "tmp2"."bank_transfer_amount" THEN 1 ELSE 0 END AS "is_diff_bank_transfer_amount", CASE                
                  WHEN "tmp1"."status" is distinct from "tmp2"."status" THEN 1 ELSE 0 END AS                             
                  "is_diff_status", CASE WHEN "tmp1"."coupon_amount" is distinct from "tmp2"."coupon_amount"             
                  THEN 1 ELSE 0 END AS "is_diff_coupon_amount", "tmp1"."order_id"::VARCHAR AS "order_id_a",              
                  "tmp2"."order_id"::VARCHAR AS "order_id_b", "tmp1"."amount"::DECIMAL(38, 13)::VARCHAR AS               
                  "amount_a", "tmp2"."amount"::DECIMAL(38, 13)::VARCHAR AS "amount_b",                                   
                  "tmp1"."order_date"::VARCHAR AS "order_date_a", "tmp2"."order_date"::VARCHAR AS                        
                  "order_date_b", "tmp1"."credit_card_amount"::DECIMAL(38, 13)::VARCHAR AS                               
                  "credit_card_amount_a", "tmp2"."credit_card_amount"::DECIMAL(38, 13)::VARCHAR AS                       
                  "credit_card_amount_b", "tmp1"."customer_id"::VARCHAR AS "customer_id_a",                              
                  "tmp2"."customer_id"::VARCHAR AS "customer_id_b", "tmp1"."gift_card_amount"::DECIMAL(38,               
                  13)::VARCHAR AS "gift_card_amount_a", "tmp2"."gift_card_amount"::DECIMAL(38, 13)::VARCHAR              
                  AS "gift_card_amount_b", "tmp1"."bank_transfer_amount"::DECIMAL(38, 13)::VARCHAR AS                    
                  "bank_transfer_amount_a", "tmp2"."bank_transfer_amount"::DECIMAL(38, 13)::VARCHAR AS                   
                  "bank_transfer_amount_b", "tmp1"."status"::VARCHAR AS "status_a", "tmp2"."status"::VARCHAR             
                  AS "status_b", "tmp1"."coupon_amount"::DECIMAL(38, 13)::VARCHAR AS "coupon_amount_a",                  
                  "tmp2"."coupon_amount"::DECIMAL(38, 13)::VARCHAR AS "coupon_amount_b" FROM                             
                  "datafold_demo"."production"."orders" "tmp1" FULL OUTER JOIN                                           
                  "datafold_demo"."development"."orders" "tmp2" ON ("tmp1"."order_id" = "tmp2"."order_id"))              
                  tmp3 WHERE (("is_diff_order_id" = 1) OR ("is_diff_amount" = 1) OR ("is_diff_order_date" =              
                  1) OR ("is_diff_credit_card_amount" = 1) OR ("is_diff_customer_id" = 1) OR                             
                  ("is_diff_gift_card_amount" = 1) OR ("is_diff_bank_transfer_amount" = 1) OR                            
                  ("is_diff_status" = 1) OR ("is_diff_coupon_amount" = 1))                                               
         INFO     Validating that the are no duplicate keys in columns: ['order_id']               joindiff_tables.py:243
         DEBUG    Running SQL (DuckDB): SELECT count(*) AS "total", count(distinct                            base.py:879
                  coalesce("order_id"::VARCHAR, '<null>')) AS "total_distinct" FROM                                      
                  "datafold_demo"."development"."orders"                                                                 
         DEBUG    Done collecting stats for table #1                                               joindiff_tables.py:306
         DEBUG    Collecting stats for table #2                                                    joindiff_tables.py:270
         DEBUG    Running SQL (DuckDB): SELECT sum("amount") AS "sum_amount", sum("credit_card_amount") AS    base.py:879
                  "sum_credit_card_amount", sum("customer_id") AS "sum_customer_id", sum("gift_card_amount")             
                  AS "sum_gift_card_amount", sum("bank_transfer_amount") AS "sum_bank_transfer_amount",                  
                  sum("coupon_amount") AS "sum_coupon_amount", count(*) AS "count" FROM                                  
                  "datafold_demo"."development"."orders"                                                                 
         DEBUG    Done collecting stats for table #2                                               joindiff_tables.py:306
         DEBUG    Testing for null keys                                                            joindiff_tables.py:252
         DEBUG    Running SQL (DuckDB): SELECT "order_id" FROM "datafold_demo"."production"."orders" WHERE    base.py:879
                  ("order_id" IS NULL)                                                                                   
         DEBUG    Running SQL (DuckDB): SELECT "order_id" FROM "datafold_demo"."development"."orders" WHERE   base.py:879
                  ("order_id" IS NULL)                                                                                   
         DEBUG    Counting exclusive rows                                                          joindiff_tables.py:352
         DEBUG    Running SQL (DuckDB): SELECT count(*) FROM (SELECT * FROM (SELECT ("tmp2"."order_id" IS     base.py:879
                  NULL) AS "is_exclusive_a", ("tmp1"."order_id" IS NULL) AS "is_exclusive_b", CASE WHEN                  
                  "tmp1"."order_id" is distinct from "tmp2"."order_id" THEN 1 ELSE 0 END AS                              
                  "is_diff_order_id", CASE WHEN "tmp1"."amount" is distinct from "tmp2"."amount" THEN 1 ELSE             
                  0 END AS "is_diff_amount", CASE WHEN "tmp1"."order_date" is distinct from                              
                  "tmp2"."order_date" THEN 1 ELSE 0 END AS "is_diff_order_date", CASE WHEN                               
                  "tmp1"."credit_card_amount" is distinct from "tmp2"."credit_card_amount" THEN 1 ELSE 0 END             
                  AS "is_diff_credit_card_amount", CASE WHEN "tmp1"."customer_id" is distinct from                       
                  "tmp2"."customer_id" THEN 1 ELSE 0 END AS "is_diff_customer_id", CASE WHEN                             
                  "tmp1"."gift_card_amount" is distinct from "tmp2"."gift_card_amount" THEN 1 ELSE 0 END AS              
                  "is_diff_gift_card_amount", CASE WHEN "tmp1"."bank_transfer_amount" is distinct from                   
                  "tmp2"."bank_transfer_amount" THEN 1 ELSE 0 END AS "is_diff_bank_transfer_amount", CASE                
                  WHEN "tmp1"."status" is distinct from "tmp2"."status" THEN 1 ELSE 0 END AS                             
                  "is_diff_status", CASE WHEN "tmp1"."coupon_amount" is distinct from "tmp2"."coupon_amount"             
                  THEN 1 ELSE 0 END AS "is_diff_coupon_amount", "tmp1"."order_id"::VARCHAR AS "order_id_a",              
                  "tmp2"."order_id"::VARCHAR AS "order_id_b", "tmp1"."amount"::DECIMAL(38, 13)::VARCHAR AS               
                  "amount_a", "tmp2"."amount"::DECIMAL(38, 13)::VARCHAR AS "amount_b",                                   
                  "tmp1"."order_date"::VARCHAR AS "order_date_a", "tmp2"."order_date"::VARCHAR AS                        
                  "order_date_b", "tmp1"."credit_card_amount"::DECIMAL(38, 13)::VARCHAR AS                               
                  "credit_card_amount_a", "tmp2"."credit_card_amount"::DECIMAL(38, 13)::VARCHAR AS                       
                  "credit_card_amount_b", "tmp1"."customer_id"::VARCHAR AS "customer_id_a",                              
                  "tmp2"."customer_id"::VARCHAR AS "customer_id_b", "tmp1"."gift_card_amount"::DECIMAL(38,               
                  13)::VARCHAR AS "gift_card_amount_a", "tmp2"."gift_card_amount"::DECIMAL(38, 13)::VARCHAR              
                  AS "gift_card_amount_b", "tmp1"."bank_transfer_amount"::DECIMAL(38, 13)::VARCHAR AS                    
                  "bank_transfer_amount_a", "tmp2"."bank_transfer_amount"::DECIMAL(38, 13)::VARCHAR AS                   
                  "bank_transfer_amount_b", "tmp1"."status"::VARCHAR AS "status_a", "tmp2"."status"::VARCHAR             
                  AS "status_b", "tmp1"."coupon_amount"::DECIMAL(38, 13)::VARCHAR AS "coupon_amount_a",                  
                  "tmp2"."coupon_amount"::DECIMAL(38, 13)::VARCHAR AS "coupon_amount_b" FROM                             
                  "datafold_demo"."production"."orders" "tmp1" FULL OUTER JOIN                                           
                  "datafold_demo"."development"."orders" "tmp2" ON ("tmp1"."order_id" = "tmp2"."order_id"))              
                  tmp3 WHERE (("is_diff_order_id" = 1) OR ("is_diff_amount" = 1) OR ("is_diff_order_date" =              
                  1) OR ("is_diff_credit_card_amount" = 1) OR ("is_diff_customer_id" = 1) OR                             
                  ("is_diff_gift_card_amount" = 1) OR ("is_diff_bank_transfer_amount" = 1) OR                            
                  ("is_diff_status" = 1) OR ("is_diff_coupon_amount" = 1)) AND ("is_exclusive_a" OR                      
                  "is_exclusive_b")) tmp4                                                                                
         DEBUG    Counting differences per column                                                  joindiff_tables.py:338
         DEBUG    Running SQL (DuckDB): SELECT sum("is_diff_order_id"), sum("is_diff_amount"),                base.py:879
                  sum("is_diff_order_date"), sum("is_diff_credit_card_amount"), sum("is_diff_customer_id"),              
                  sum("is_diff_gift_card_amount"), sum("is_diff_bank_transfer_amount"),                                  
                  sum("is_diff_status"), sum("is_diff_coupon_amount") FROM (SELECT ("tmp2"."order_id" IS                 
                  NULL) AS "is_exclusive_a", ("tmp1"."order_id" IS NULL) AS "is_exclusive_b", CASE WHEN                  
                  "tmp1"."order_id" is distinct from "tmp2"."order_id" THEN 1 ELSE 0 END AS                              
                  "is_diff_order_id", CASE WHEN "tmp1"."amount" is distinct from "tmp2"."amount" THEN 1 ELSE             
                  0 END AS "is_diff_amount", CASE WHEN "tmp1"."order_date" is distinct from                              
                  "tmp2"."order_date" THEN 1 ELSE 0 END AS "is_diff_order_date", CASE WHEN                               
                  "tmp1"."credit_card_amount" is distinct from "tmp2"."credit_card_amount" THEN 1 ELSE 0 END             
                  AS "is_diff_credit_card_amount", CASE WHEN "tmp1"."customer_id" is distinct from                       
                  "tmp2"."customer_id" THEN 1 ELSE 0 END AS "is_diff_customer_id", CASE WHEN                             
                  "tmp1"."gift_card_amount" is distinct from "tmp2"."gift_card_amount" THEN 1 ELSE 0 END AS              
                  "is_diff_gift_card_amount", CASE WHEN "tmp1"."bank_transfer_amount" is distinct from                   
                  "tmp2"."bank_transfer_amount" THEN 1 ELSE 0 END AS "is_diff_bank_transfer_amount", CASE                
                  WHEN "tmp1"."status" is distinct from "tmp2"."status" THEN 1 ELSE 0 END AS                             
                  "is_diff_status", CASE WHEN "tmp1"."coupon_amount" is distinct from "tmp2"."coupon_amount"             
                  THEN 1 ELSE 0 END AS "is_diff_coupon_amount", "tmp1"."order_id"::VARCHAR AS "order_id_a",              
                  "tmp2"."order_id"::VARCHAR AS "order_id_b", "tmp1"."amount"::DECIMAL(38, 13)::VARCHAR AS               
                  "amount_a", "tmp2"."amount"::DECIMAL(38, 13)::VARCHAR AS "amount_b",                                   
                  "tmp1"."order_date"::VARCHAR AS "order_date_a", "tmp2"."order_date"::VARCHAR AS                        
                  "order_date_b", "tmp1"."credit_card_amount"::DECIMAL(38, 13)::VARCHAR AS                               
                  "credit_card_amount_a", "tmp2"."credit_card_amount"::DECIMAL(38, 13)::VARCHAR AS                       
                  "credit_card_amount_b", "tmp1"."customer_id"::VARCHAR AS "customer_id_a",                              
                  "tmp2"."customer_id"::VARCHAR AS "customer_id_b", "tmp1"."gift_card_amount"::DECIMAL(38,               
                  13)::VARCHAR AS "gift_card_amount_a", "tmp2"."gift_card_amount"::DECIMAL(38, 13)::VARCHAR              
                  AS "gift_card_amount_b", "tmp1"."bank_transfer_amount"::DECIMAL(38, 13)::VARCHAR AS                    
                  "bank_transfer_amount_a", "tmp2"."bank_transfer_amount"::DECIMAL(38, 13)::VARCHAR AS                   
                  "bank_transfer_amount_b", "tmp1"."status"::VARCHAR AS "status_a", "tmp2"."status"::VARCHAR             
                  AS "status_b", "tmp1"."coupon_amount"::DECIMAL(38, 13)::VARCHAR AS "coupon_amount_a",                  
                  "tmp2"."coupon_amount"::DECIMAL(38, 13)::VARCHAR AS "coupon_amount_b" FROM                             
                  "datafold_demo"."production"."orders" "tmp1" FULL OUTER JOIN                                           
                  "datafold_demo"."development"."orders" "tmp2" ON ("tmp1"."order_id" = "tmp2"."order_id"))              
                  tmp3 WHERE (("is_diff_order_id" = 1) OR ("is_diff_amount" = 1) OR ("is_diff_order_date" =              
                  1) OR ("is_diff_credit_card_amount" = 1) OR ("is_diff_customer_id" = 1) OR                             
                  ("is_diff_gift_card_amount" = 1) OR ("is_diff_bank_transfer_amount" = 1) OR                            
                  ("is_diff_status" = 1) OR ("is_diff_coupon_amount" = 1))                                               
12:39:29 INFO     Diffing complete                                                                 joindiff_tables.py:165

datafold_demo.production.orders <> datafold_demo.development.orders 

  Rows Added    Rows Removed
------------  --------------
           0              94

Updated Rows: 0
Unchanged Rows: 5

Values Updated:
amount: 0
order_date: 0
credit_card_amount: 0
customer_id: 0
gift_card_amount: 0
bank_transfer_amount: 0
status: 0
coupon_amount: 0 
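
The SQL in the log above does a `FULL OUTER JOIN` on the primary key, flags rows that exist on only one side (`is_exclusive_a`, `is_exclusive_b`), and sets an `is_diff_<column>` flag per column. The following plain-Python sketch mimics that logic on in-memory dicts to make the bookkeeping easier to follow. It is an illustration only: `join_diff` and its return shape are assumptions, not data-diff's API.

```python
from typing import Any, Dict, Mapping

Row = Mapping[str, Any]


def join_diff(table_a: Dict[Any, Row], table_b: Dict[Any, Row]) -> dict:
    """Compare two tables keyed by primary key, in the spirit of joindiff.

    Keys present on one side only are "exclusive"; for shared keys, each
    column whose values differ is counted once per row.
    """
    # Rows with no match on the other side of the outer join.
    exclusive_a = [k for k in table_a if k not in table_b]
    exclusive_b = [k for k in table_b if k not in table_a]

    per_column: Dict[str, int] = {}
    updated = unchanged = 0
    for key in table_a.keys() & table_b.keys():
        row_a, row_b = table_a[key], table_b[key]
        # None == None in Python, matching SQL's "is distinct from" for NULLs.
        changed = [col for col in row_a if row_a[col] != row_b.get(col)]
        for col in changed:
            per_column[col] = per_column.get(col, 0) + 1
        if changed:
            updated += 1
        else:
            unchanged += 1

    return {
        "exclusive_a": exclusive_a,
        "exclusive_b": exclusive_b,
        "updated": updated,
        "unchanged": unchanged,
        "per_column": per_column,
    }
```

For example, with `prod = {1: {"status": "placed"}, 2: {"status": "shipped"}, 3: {"status": "placed"}}` and `dev = {1: {"status": "placed"}, 2: {"status": "returned"}}`, the result has one exclusive row on the `prod` side, one updated row, one unchanged row, and a `status` difference count of 1.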

@dlawin dlawin requested a review from nolar October 9, 2023 23:21

@dlawin dlawin left a comment


My comment isn't blocking: it's an existing code smell that could be handled in a subsequent PR.

@dlawin dlawin self-requested a review October 9, 2023 23:37
@sungchun12

Did a git rebase to remove all the duplicative commit history from past merges.

@sungchun12 sungchun12 removed the request for review from nolar October 10, 2023 16:38
@sungchun12 sungchun12 merged commit fd54d2d into master Oct 10, 2023
6 checks passed
@sungchun12 sungchun12 deleted the feature/motherduck-support branch October 10, 2023 16:39
Linked issue: Add support for [motherduck](https://motherduck.com/)