Fix(clickhouse): support non-datetime time column partitioning#3357

Merged
treysp merged 8 commits into main from trey/ch-time-partition on Nov 18, 2024

treysp (Contributor) commented Nov 11, 2024

The Clickhouse adapter automatically partitions incremental by time models if the time column is not included in any partitioning expression.

Clickhouse strongly recommends against fine-grained/small partitions. Therefore, the automatic partitioning "floors" the time column to weekly granularity with the toMonday() function.

toMonday() only accepts date/datetime input types, so we currently error if the time column is string type.

This PR allows non-date/datetime time columns by:

  • If the time column type is known and is date/datetime, pass the column directly
  • If the time column type is known and is NOT date/datetime, cast the time column to DateTime64 before passing it to toMonday(), then cast the output back to the original time column type
  • If the time column type is not known, cast the time column to DateTime64 before passing it to toMonday()
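
The three cases above can be sketched as a small string-building helper. This is a minimal illustration, not the adapter's actual code: the names `weekly_partition_expr` and `DATE_LIKE_TYPES` are hypothetical, and the `scale` default of 9 is an assumption (ClickHouse's `DateTime64` takes a precision as its first argument).

```python
from typing import Optional

# Hypothetical set of ClickHouse base type names treated as date/datetime.
DATE_LIKE_TYPES = ("date", "date32", "datetime", "datetime64")


def weekly_partition_expr(column: str, column_type: Optional[str], scale: int = 9) -> str:
    """Build an expression that floors `column` to the Monday of its week.

    `scale` is the DateTime64 precision; the default of 9 is an assumption.
    """
    cast_in = f"CAST({column} AS DateTime64({scale}, 'UTC'))"

    if column_type is None:
        # Type unknown: cast to DateTime64 before calling toMonday().
        return f"toMonday({cast_in})"

    # Compare on the base type name only, e.g. "DateTime64(3, 'UTC')" -> "datetime64".
    # Wrapper types such as Nullable(...) are not unwrapped in this sketch.
    base = column_type.strip().lower().split("(")[0]
    if base in DATE_LIKE_TYPES:
        # Type known and date/datetime: pass the column directly.
        return f"toMonday({column})"

    # Type known but not date/datetime: cast in, floor, then cast back.
    return f"CAST(toMonday({cast_in}) AS {column_type})"


print(weekly_partition_expr("ds", "Date"))
print(weekly_partition_expr("ds", "String"))
print(weekly_partition_expr("ds", None))
```

Casting back to the original type in the second case keeps the partition expression's type consistent with the time column's declared type.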

Implementation note

This PR moves the partitioned_by property from the ModelMeta class to the _Model class so it can access the _Model columns_to_types property.

treysp requested a review from a team on November 11, 2024 20:23
treysp force-pushed the trey/ch-time-partition branch 2 times, most recently from 19fba89 to ed66bbd, on November 13, 2024 16:21
treysp marked this pull request as ready for review on November 13, 2024 17:31
treysp force-pushed the trey/ch-time-partition branch from 53151e0 to 14dee18 on November 18, 2024 18:02
treysp force-pushed the trey/ch-time-partition branch from 14dee18 to 19e2b06 on November 18, 2024 18:35

georgesittas (Contributor) left a comment

Nice, thanks for addressing comments 👍

treysp merged commit 7b9277e into main on Nov 18, 2024
treysp deleted the trey/ch-time-partition branch on November 18, 2024 20:01

swt30 commented Jan 3, 2025

Hey @treysp, what version of Clickhouse are you using? I am experimenting with sqlmesh + clickhouse and finding that this causes an error because DateTime64('UTC') needs a scale as its first parameter, at least in the latest Clickhouse. I'm new to Clickhouse, so perhaps I'm missing something here?

Adding a scale of 6 here and just below fixes the problem. But the choice of 6 feels a bit arbitrary. What if the column were a nanosecond-precision string-formatted timestamp?

treysp (Contributor, Author) commented Jan 3, 2025

Thanks for reporting - that looks like a bug.

Precision is essentially arbitrary, so my inclination would be to go for nanosecond to prevent truncation. Do you see any issues with that? Pandas handles nanosecond timestamps, so conversion to Python should be ok.
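
To make the truncation concern concrete, here is a pure-Python sketch of how many fractional-second digits survive at a given scale. It only models the idea of dropping digits beyond the scale; the helper name `keep_fraction_digits` is hypothetical, and this does not claim to reproduce how ClickHouse itself parses or rounds such strings.

```python
def keep_fraction_digits(timestamp: str, scale: int) -> str:
    """Keep at most `scale` fractional-second digits of a timestamp string."""
    # DateTime64 precision ranges from 0 (seconds) to 9 (nanoseconds).
    if not 0 <= scale <= 9:
        raise ValueError("DateTime64 precision must be between 0 and 9")
    head, _, fraction = timestamp.partition(".")
    kept = fraction[:scale]
    return f"{head}.{kept}" if kept else head


ts = "2025-01-03 12:34:56.123456789"  # nanosecond-precision string timestamp
print(keep_fraction_digits(ts, 6))  # 2025-01-03 12:34:56.123456
print(keep_fraction_digits(ts, 9))  # 2025-01-03 12:34:56.123456789
```

With a scale of 6 the last three digits are lost, while a scale of 9 keeps the full nanosecond value.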

treysp (Contributor, Author) commented Jan 3, 2025

Addressing this in #3582
