
[docs] Improve POC guide with bucket sizing rules and heading cleanup #3445

Merged
dataroaring merged 21 commits into apache:master from dataroaring:docs/poc-guide-and-bucket-sizing-v2
Mar 10, 2026



@dataroaring
Contributor

Summary

  • Replace bucket count guidance with clear four-rule approach (BE multiple, minimize count, 20GB/10GB cap, 128 max per partition)
  • Rename section headings for clarity: Key Columns → Sort Key, Typical Use Cases → Example Templates, Common Performance Pitfalls → Performance Pitfalls
  • Merge sparse partition section into a single paragraph
  • Remove unnecessary "Fixing Mistakes" section
  • Fix broken CDC sync link and rule count reference
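
The four bucket rules in the first bullet can be sketched as a small calculator. This is a minimal illustration, not the guide's own code: the function name `suggest_bucket_count` is hypothetical, and it assumes the "20GB/10GB cap" means at most 20 GB of data per bucket for most tables and at most 10 GB for Unique Key tables. How to react when the rules conflict (here, raising an error) is also an assumption.

```python
import math

MAX_BUCKETS_PER_PARTITION = 128  # rule 4, as stated in the summary


def suggest_bucket_count(partition_size_gb: float, num_backends: int,
                         unique_key: bool = False) -> int:
    """Return the smallest bucket count that satisfies all four rules.

    Assumption: the per-bucket cap is 20 GB, or 10 GB for Unique Key tables.
    """
    if partition_size_gb < 0 or num_backends < 1:
        raise ValueError("size must be >= 0 and there must be at least one BE")

    cap_gb = 10 if unique_key else 20

    # Rule 3: enough buckets that no bucket exceeds the size cap.
    needed = max(1, math.ceil(partition_size_gb / cap_gb))

    # Rules 1 and 2: the smallest multiple of the BE count that is >= needed.
    buckets = math.ceil(needed / num_backends) * num_backends

    # Rule 4: never exceed 128 buckets in one partition.
    if buckets > MAX_BUCKETS_PER_PARTITION:
        raise ValueError(
            "no bucket count satisfies all four rules; consider smaller partitions"
        )
    return buckets


print(suggest_bucket_count(100, 3))                   # 6
print(suggest_bucket_count(100, 3, unique_key=True))  # 12
print(suggest_bucket_count(5, 3))                     # 3
```

For a 100 GB partition on three BEs, five buckets would satisfy the size cap, and rounding up to a multiple of three gives six.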

Test plan

  • Verify docs build successfully
  • Check EN and ZH POC guide pages render correctly
  • Confirm all internal links resolve

🤖 Generated with Claude Code

- Replace bucket guidance with clear four-rule approach
- Rename section headings (Sort Key, Example Templates, Performance Pitfalls)
- Merge sparse partition section into single paragraph
- Remove unnecessary Fixing Mistakes section
- Fix broken link and rule count reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 10, 2026 09:55
Demote sections 1-4 and Important Notes from h2 to h3, nested
under a new 'Table Design' (建表设计) h2 parent heading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Improves the POC table-design guide by clarifying bucket sizing guidance (four-rule approach), cleaning up headings, and simplifying/streamlining sections in both EN and ZH docs.

Changes:

  • Replaces prior bucket-count guidance with a clearer four-rule approach and updates related references.
  • Renames several headings for clarity and consistency (e.g., Key Columns → Sort Key).
  • Consolidates/simplifies partitioning guidance and removes the “choose wrong” remediation section.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/must-read-before-poc.md | Updates ZH headings/wording to match the new bucket rules and streamlined layout. |
| docs/gettingStarted/must-read-before-poc.md | Updates EN headings/wording to match the new bucket rules and streamlined layout. |


For a POC, **Duplicate Key works for most scenarios**. Switch only if you have a clear need for upsert or pre-aggregation. For a detailed comparison, see [Data Model Overview](../table-design/data-model/overview).

-## 2. Key Columns
+### 2. Sort Key

Copilot AI Mar 10, 2026


The section heading was renamed to “Sort Key”, but this paragraph still refers to “Key columns” throughout. To avoid confusing readers, update the terminology to consistently use “Sort Key” (or “sort key columns”) in this explanation (including “first 36 bytes of …”).

Suggested change
### 2. Sort Key
**Why it matters:** The sort key determines the **physical sort order** on disk. Doris builds a [prefix index](../table-design/index/prefix-index) on the first 36 bytes of the sort key columns, so queries that filter on these columns run significantly faster. However, when a `VARCHAR` column is encountered, the prefix index stops immediately — no subsequent columns are included. So place fixed-size columns (INT, BIGINT, DATE) before VARCHAR to maximize index coverage.
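
The column-ordering advice in this suggestion can be illustrated with a simplified model of prefix-index coverage. This is a sketch under stated assumptions, not Doris code: `prefix_index_columns` and the byte widths in `ASSUMED_WIDTHS` are hypothetical, a fixed-size column is counted only if it fits entirely in the remaining budget, and the VARCHAR column is assumed to contribute a truncated prefix before the index stops.

```python
PREFIX_BUDGET_BYTES = 36  # "first 36 bytes of the sort key columns"

# Assumed on-disk widths for illustration only; check the Doris docs for real values.
ASSUMED_WIDTHS = {"INT": 4, "BIGINT": 8, "DATE": 4}


def prefix_index_columns(sort_key):
    """Return the names of sort key columns covered by the prefix index.

    sort_key is an ordered list of (column_name, column_type) pairs.
    """
    covered = []
    remaining = PREFIX_BUDGET_BYTES
    for name, col_type in sort_key:
        kind = col_type.upper()
        if kind.startswith("VARCHAR"):
            # Assumption: the VARCHAR itself is included as a truncated prefix,
            # and no later column is considered.
            if remaining > 0:
                covered.append(name)
            break
        width = ASSUMED_WIDTHS[kind]
        if width > remaining:
            break  # simplification: a column that does not fit is left out
        covered.append(name)
        remaining -= width
    return covered


fixed_first = [("event_date", "DATE"), ("user_id", "BIGINT"), ("city", "VARCHAR(32)")]
varchar_first = [("city", "VARCHAR(32)"), ("event_date", "DATE"), ("user_id", "BIGINT")]

print(prefix_index_columns(fixed_first))    # ['event_date', 'user_id', 'city']
print(prefix_index_columns(varchar_first))  # ['city']
```

With the fixed-size columns first, all three columns take part in the index. With the VARCHAR first, only that column does.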

@@ -41,27 +43,21 @@ CREATE TABLE my_table

POC 阶段,**Duplicate Key 适用于大多数场景**。只有在明确需要更新或预聚合时才切换。详细对比见[数据模型概述](../table-design/data-model/overview)。

Copilot AI Mar 10, 2026


In the ZH doc, most headings are primarily Chinese, but this one is English-first (“Sort Key(排序键)”). For consistency/readability in the localized page, consider switching to a Chinese-first form like “排序键(Sort Key)” (or fully Chinese if that matches the rest of the page style).

Suggested change
POC 阶段,**Duplicate Key 适用于大多数场景**。只有在明确需要更新或预聚合时才切换。详细对比见[数据模型概述](../table-design/data-model/overview)
## 2. 排序键(Sort Key)

dataroaring and others added 2 commits March 10, 2026 03:01
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Significantly shorten Table Design section — remove verbose explanations,
keep only actionable guidance, and link to existing docs for details.
Also trim Example Templates descriptions and Performance Pitfalls to
one-liners with references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dataroaring and others added 16 commits March 10, 2026 05:37
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix "performance can sustain" → "performance holds up"
- Merge competing intros under Table Design
- Replace em dashes with periods/commas throughout
- Remove "small tablets" bullet (overlaps with bucket rule 2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove repeated "data model cannot be changed" (already in intro)
- Replace repeated sort key advice with anchor link

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ify load_to_single_tablet

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dataroaring dataroaring merged commit 5791beb into apache:master Mar 10, 2026
1 check passed