
fix: elevate timezone warning and implement set_timezone_to_utc() for BigQuery and ClickHouse #29

@dtsong


Problem

_connect.py:298-303 catches the NotImplementedError raised by set_timezone_to_utc(), logs it at DEBUG level, and proceeds normally. BigQuery and ClickHouse both raise it, so their sessions keep whatever timezone they started with.

This is silent data corruption. When timestamps are compared across databases and one session is in a non-UTC timezone, the bisection algorithm can produce wrong ranges and phantom diffs or, worse, mask real diffs.
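A minimal illustration of the failure mode, assuming timestamps reach the comparison after being rendered in each session's timezone without an offset. The `render` helper and the UTC-5 session are hypothetical and only stand in for whatever the two databases do:

```python
from datetime import datetime, timedelta, timezone

# One instant, stored identically in both databases.
instant = datetime(2024, 3, 1, 2, 30, tzinfo=timezone.utc)


def render(ts, session_tz):
    """Mimic a session that formats timestamps in its own timezone and drops the offset."""
    return ts.astimezone(session_tz).strftime("%Y-%m-%d %H:%M:%S")


utc_session = timezone.utc
other_session = timezone(timedelta(hours=-5))  # hypothetical non-UTC session

a = render(instant, utc_session)
b = render(instant, other_session)

# The underlying rows are equal, yet the rendered values differ,
# so a value-based comparison reports a diff that does not exist.
phantom_diff = a != b
print(a, b, phantom_diff)
```

The same shift applies to range boundaries, which is how a bisection over a timestamp column could end up splitting the two sides at different instants.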

Scope

  1. BigQuery (bigquery.py:154): Implement using SET @@time_zone = 'UTC' session variable
  2. ClickHouse (clickhouse.py:98): Implement using SET session_timezone = 'UTC'
  3. _connect.py:298-303: Elevate the caught NotImplementedError log from DEBUG to WARNING. Consider adding a --require-utc flag that makes this an error.
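The three scope items could fit together roughly as sketched below. This is not the code in _connect.py: `ensure_utc`, `FakeDialect`, and `FakeDatabase` are hypothetical stand-ins, and the sketch assumes set_timezone_to_utc() returns a SQL statement that the connection then executes. The `require_utc` parameter mirrors the proposed --require-utc flag.

```python
import logging

logger = logging.getLogger("timezone_sketch")


class FakeDialect:
    """Hypothetical dialect: returns the SET statement, or raises if unsupported."""

    def __init__(self, sql=None):
        self.sql = sql

    def set_timezone_to_utc(self):
        if self.sql is None:
            raise NotImplementedError
        return self.sql


class FakeDatabase:
    """Hypothetical connection that records the statements it is asked to run."""

    def __init__(self, name, dialect):
        self.name = name
        self.dialect = dialect
        self.executed = []

    def query(self, sql):
        self.executed.append(sql)


def ensure_utc(db, require_utc=False):
    """Try to force the session to UTC; warn (or fail) when the dialect cannot."""
    try:
        sql = db.dialect.set_timezone_to_utc()
    except NotImplementedError:
        msg = f"{db.name}: session timezone could not be set to UTC"
        if require_utc:
            raise RuntimeError(msg) from None
        logger.warning("%s; timestamp comparisons may be wrong", msg)  # was DEBUG
        return False
    db.query(sql)
    return True


bigquery = FakeDatabase("bigquery", FakeDialect("SET @@time_zone = 'UTC'"))
clickhouse = FakeDatabase("clickhouse", FakeDialect("SET session_timezone = 'UTC'"))
unsupported = FakeDatabase("unsupported", FakeDialect())

for db in (bigquery, clickhouse, unsupported):
    print(db.name, ensure_utc(db), db.executed)
```

Returning a boolean rather than nothing lets the caller decide how to react, which keeps the strict mode a thin layer on top of the warning path.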

Key Files

  • data_diff/databases/_connect.py:296-304
  • data_diff/databases/bigquery.py:154
  • data_diff/databases/clickhouse.py:98

Acceptance Criteria

  • BigQuery set_timezone_to_utc() sets session to UTC
  • ClickHouse set_timezone_to_utc() sets session to UTC
  • Connection factory logs WARNING (not DEBUG) when timezone cannot be set
  • Tests cover timezone normalization for both databases

Metadata

    Labels

    P0-critical (Ship-blocking, fix immediately), bug (Something isn't working), triage
