
[Spark] Implement Hilbert clustering
This PR is part of #1874.

This PR implements a new data clustering algorithm based on the Hilbert curve. No code uses this new implementation yet; incremental clustering using ZCube will be implemented in follow-up PRs.
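To illustrate what clustering "based on the Hilbert curve" involves, here is a minimal sketch of the classic iterative algorithm that maps a 2-D grid cell to its position along a Hilbert curve. It is not the implementation from this PR. The object name `HilbertIndex`, the restriction to two dimensions, and the assumption that column values have already been bucketed into integers in `[0, n)` with `n` a power of two are all illustrative assumptions.

```scala
// Minimal sketch of 2-D Hilbert-curve indexing (classic iterative xy2d).
// Hypothetical helper for illustration; not Delta Lake's implementation.
object HilbertIndex {
  /** Maps cell (x0, y0) of an n x n grid (n a power of two) to its position d along the curve. */
  def xy2d(n: Int, x0: Int, y0: Int): Long = {
    require(n > 0 && (n & (n - 1)) == 0, "n must be a power of two")
    require(x0 >= 0 && x0 < n && y0 >= 0 && y0 < n, "point out of range")
    var x = x0
    var y = y0
    var d = 0L
    var s = n / 2
    while (s > 0) {
      val rx = if ((x & s) > 0) 1 else 0
      val ry = if ((y & s) > 0) 1 else 0
      // Each quadrant at this level holds s * s cells; (3 * rx) ^ ry gives its
      // visiting order: (0,0) -> 0, (0,1) -> 1, (1,1) -> 2, (1,0) -> 3.
      d += s.toLong * s * ((3 * rx) ^ ry)
      // Rotate/flip so the next sub-quadrant is in canonical orientation.
      if (ry == 0) {
        if (rx == 1) {
          x = n - 1 - x
          y = n - 1 - y
        }
        val t = x; x = y; y = t
      }
      s /= 2
    }
    d
  }
}
```

Sorting rows by `xy2d` of their bucketed column values places rows that are close in both columns close together in the sorted output, which a clustering step can exploit when it cuts the sorted data into files. A useful property of the Hilbert curve is that consecutive positions always fall on adjacent grid cells.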

The design can be found at: https://docs.google.com/document/d/1FWR3odjOw4v4-hjFy_hVaNdxHVs4WuK1asfB6M6XEMw/edit#heading=h.uubbjjd24plb.

Closes #2314

GitOrigin-RevId: abafaa717ba8f7d8809114858c0fd2a25861fcb8
weiluo-db authored and vkorukanti committed Dec 20, 2023
1 parent feb1258 commit 2940429
Showing 10 changed files with 1,148 additions and 47 deletions.
@@ -391,7 +391,8 @@ class OptimizeExecutor(
 MultiDimClustering.cluster(
 input,
 approxNumFiles,
-zOrderByColumns)
+zOrderByColumns,
+"zorder")
 } else {
 val useRepartition = sparkSession.sessionState.conf.getConf(
 DeltaSQLConf.DELTA_OPTIMIZE_REPARTITION_ENABLED)
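The hunk above passes the curve name `"zorder"` to `MultiDimClustering.cluster`, presumably so the existing Z-order path names its curve explicitly once a second curve is available. The sketch below illustrates that kind of name-based selection in isolation, together with a common formulation of the Z-order (Morton) index as bit interleaving. The object `CurveSelection`, its methods, and the bit layout are hypothetical and are not Delta Lake's API; only the `"zorder"` string comes from the diff.

```scala
// Hypothetical illustration of selecting a space-filling curve by name.
// Only the "zorder" key comes from the diff; everything else is assumed.
object CurveSelection {
  /** Z-order (Morton) index: interleave the low `bits` bits of x and y (x in odd positions). */
  def zOrder(x: Int, y: Int, bits: Int): Long = {
    var d = 0L
    for (i <- 0 until bits) {
      d |= ((x >> i) & 1L) << (2 * i + 1)
      d |= ((y >> i) & 1L) << (2 * i)
    }
    d
  }

  /** Returns the index function for the named curve; unknown names fail fast. */
  def indexFor(curve: String, bits: Int): (Int, Int) => Long = curve match {
    case "zorder" => (x, y) => zOrder(x, y, bits)
    case other    => throw new IllegalArgumentException(s"Unknown curve: $other")
  }
}
```

In a 4 x 4 grid, Z-order positions 7 and 8 fall on cells (1, 3) and (2, 0), a Manhattan distance of 4 apart, whereas consecutive Hilbert positions are always adjacent cells. This locality difference is a commonly cited reason to consider Hilbert curves for clustering, although the PR description does not state its motivation.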
