[Spark] Implement Hilbert clustering
This PR is part of delta-io#1874.

This PR implements a new data clustering algorithm based on the Hilbert curve. No code uses the new implementation yet; incremental clustering using ZCube will be implemented in follow-up PRs.
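
For readers unfamiliar with the technique, here is a minimal sketch of the textbook two-dimensional Hilbert index. It is not the code added by this PR: the object name `HilbertSketch` and the method `xy2d` are hypothetical, and the sketch assumes two non-negative integer coordinates on an n x n grid where n is a power of two. It only illustrates the property clustering relies on: cells that are consecutive along the curve are always neighbors in the grid.

```scala
object HilbertSketch {
  /**
   * Distance of cell (x0, y0) along a 2-D Hilbert curve that fills an
   * n x n grid. Illustrative only; `n` must be a power of two.
   */
  def xy2d(n: Int, x0: Int, y0: Int): Long = {
    require(n > 0 && (n & (n - 1)) == 0, "n must be a power of two")
    require(x0 >= 0 && x0 < n && y0 >= 0 && y0 < n, "coordinates out of range")
    var x = x0
    var y = y0
    var d = 0L
    var s = n / 2
    while (s > 0) {
      // Which quadrant of the current sub-square holds the point?
      val rx = if ((x & s) > 0) 1 else 0
      val ry = if ((y & s) > 0) 1 else 0
      // Each quadrant contributes s * s cells per step along the curve.
      d += s.toLong * s * ((3 * rx) ^ ry)
      // Rotate/flip so the next level sees the curve in canonical orientation.
      if (ry == 0) {
        if (rx == 1) {
          x = n - 1 - x
          y = n - 1 - y
        }
        val t = x
        x = y
        y = t
      }
      s /= 2
    }
    d
  }
}
```

On a 2 x 2 grid this visits the cells in the order (0,0), (0,1), (1,1), (1,0). Sorting rows by such an index keeps rows with nearby values in both columns close together in the output files, which is the motivation for curve-based clustering.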

Design can be found at: https://docs.google.com/document/d/1FWR3odjOw4v4-hjFy_hVaNdxHVs4WuK1asfB6M6XEMw/edit#heading=h.uubbjjd24plb.

Closes delta-io#2314

GitOrigin-RevId: abafaa717ba8f7d8809114858c0fd2a25861fcb8
weiluo-db authored and andreaschat-db committed Jan 5, 2024
1 parent 63a6d94 commit 2f33bf6
Showing 10 changed files with 1,148 additions and 47 deletions.
@@ -391,7 +391,8 @@ class OptimizeExecutor(
 MultiDimClustering.cluster(
 input,
 approxNumFiles,
-zOrderByColumns)
+zOrderByColumns,
+"zorder")
 } else {
 val useRepartition = sparkSession.sessionState.conf.getConf(
 DeltaSQLConf.DELTA_OPTIMIZE_REPARTITION_ENABLED)