
[docs] Add file path pattern documentation for S3 TVF and Broker Load #3337

Merged
morningman merged 1 commit into apache:master from dataroaring:docs/file-path-pattern on Feb 5, 2026

Conversation

@dataroaring
Contributor

Summary

  • Add new documentation page for file path patterns under sql-manual/basic-element
  • Document supported URI formats (S3, HDFS, and other cloud providers)
  • Document wildcard patterns (*, ?, [...]) and range expansion ({1..10})
  • Add usage examples for S3 TVF, Broker Load, and INSERT INTO SELECT
  • Include performance considerations and troubleshooting guide
  • Update S3 TVF documentation to reference the new file-path-pattern page
  • Update Broker Load documentation to reference the new file-path-pattern page
  • Add sidebar entry for new documentation
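
For illustration, the three wildcard forms listed above (`*`, `?`, `[...]`) can be tried with Python's standard `fnmatch` module. This is only a sketch of general glob semantics, not Doris's matcher: the object keys are invented for the example, and Doris's own rules may differ (for example, in how `*` treats `/`).

```python
from fnmatch import fnmatchcase

# Hypothetical object keys, invented for illustration only.
keys = [
    "data/part-1.csv",
    "data/part-2.csv",
    "data/part-10.csv",
    "data/part-a.csv",
    "data/readme.txt",
]


def match(pattern, names):
    """Return the names that match a glob-style pattern (fnmatch semantics)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# `*` matches any run of characters.
star = match("data/*.csv", keys)

# `?` matches exactly one character, so part-10.csv is excluded.
question = match("data/part-?.csv", keys)

# `[0-9]` matches exactly one character from the set, so part-a.csv is excluded too.
bracket = match("data/part-[0-9].csv", keys)

print(star)
print(question)
print(bracket)
```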

Changes

  1. New file: docs/sql-manual/basic-element/file-path-pattern.md

    • Comprehensive documentation on file path patterns
    • Covers S3-style URIs, HDFS URIs, and other cloud storage
    • Documents wildcard patterns and range expansion syntax
    • Includes practical examples and performance tips
  2. Updated: docs/sql-manual/sql-functions/table-valued-functions/s3.md

    • Added reference to file-path-pattern documentation in URI parameter description
    • Updated "URI with Wildcards" section with reference to comprehensive docs
  3. Updated: docs/data-operate/import/import-way/broker-load-manual.md

    • Added "Supported file path patterns" section in Limitations
    • Added reference to file-path-pattern documentation in wildcard example section
  4. Updated: sidebars.ts

    • Added sidebar entry for file-path-pattern under Basic Elements

Test plan

  • Verify documentation builds correctly
  • Verify sidebar navigation works
  • Verify cross-references are correct

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 4, 2026 20:03
@dataroaring force-pushed the docs/file-path-pattern branch from d1ba844 to 3df2417 on February 4, 2026 20:05

Copilot AI left a comment

Pull request overview

This PR introduces a centralized documentation page for file path patterns (wildcards and range expansion) and wires it into the S3 TVF and Broker Load docs, plus the SQL manual sidebar.

Changes:

  • Add file-path-pattern.md under SQL basic elements, documenting supported URI schemes, glob-style wildcards, brace-based range expansion, examples, performance tips, and troubleshooting.
  • Update the S3 TVF and Broker Load manuals to describe wildcard/range support succinctly and link to the new shared documentation.
  • Register the new page in sidebars.ts under “Basic Elements” so it appears in navigation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| sidebars.ts | Adds sql-manual/basic-element/file-path-pattern to the “Basic Elements” sidebar section so the new page is navigable. |
| docs/sql-manual/sql-functions/table-valued-functions/s3.md | Extends the uri parameter description and the “URI with Wildcards” section to mention wildcard and range support and link to the central file-path-pattern doc. |
| docs/sql-manual/basic-element/file-path-pattern.md | New comprehensive documentation for URI schemes, wildcard syntax, brace range expansion, usage examples (S3 TVF, Broker Load, INSERT INTO SELECT), performance considerations, and troubleshooting. |
| docs/data-operate/import/import-way/broker-load-manual.md | Documents supported file path patterns under Limitations and adds an explicit note (with link) in the HDFS wildcard example section. |

Notable issue to address before merge:

  • In file-path-pattern.md, the description of brace expansion for zero-padded numeric ranges (stating that leading zeros are stripped) conflicts with later examples that rely on fixed-width, zero-padded filenames. The behavior description and/or examples should be aligned with the actual implementation.

Comment on lines 70 to 71
| `{01..05}` | `{1,2,3,4,5}` | `1`, `2`, `3`, `4`, `5` (leading zeros stripped) |
| `{3..1}` | `{1,2,3}` | `1`, `2`, `3` (reverse ranges supported) |

Copilot AI Feb 4, 2026

The description for brace range {01..05} indicates that leading zeros are stripped ({1,2,3,4,5}), but later examples (e.g., month={01..12} and part-{00000..00099}.csv) rely on zero-padded values being preserved in the expanded filenames. This is internally inconsistent documentation: either the note about stripping leading zeros or the later examples are incorrect. Please update the behavior description and/or the examples so they accurately reflect how Doris actually expands zero-padded numeric ranges.

Suggested change (the first two rows are the current text, the last two are the proposed replacement)
| `{01..05}` | `{1,2,3,4,5}` | `1`, `2`, `3`, `4`, `5` (leading zeros stripped) |
| `{3..1}` | `{1,2,3}` | `1`, `2`, `3` (reverse ranges supported) |
| `{01..05}` | `{01,02,03,04,05}` | `01`, `02`, `03`, `04`, `05` (leading zeros preserved) |
| `{3..1}` | `{3,2,1}` | `3`, `2`, `1` (reverse ranges supported) |
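
The two readings of `{01..05}` and `{3..1}` at issue here can be made concrete with a small Python sketch. `expand_range` is a hypothetical helper written for this comment, not Doris code, and it handles a single numeric range per pattern. The `keep_padding` and `keep_order` flags exist only to switch between the behavior the table currently describes and the behavior the suggestion describes; which one Doris implements still has to be confirmed against the implementation.

```python
import re

# Matches one numeric brace range such as {1..10} or {01..05}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_range(pattern, keep_padding, keep_order):
    """Expand the first {a..b} range in `pattern` (hypothetical helper).

    keep_padding: True pads each number to the width of the range bounds,
                  False drops leading zeros.
    keep_order:   True keeps a descending range descending,
                  False always returns ascending order.
    """
    m = _RANGE.search(pattern)
    if m is None:
        return [pattern]
    first_text, last_text = m.group(1), m.group(2)
    first, last = int(first_text), int(last_text)
    step = 1 if first <= last else -1
    numbers = list(range(first, last + step, step))
    if not keep_order:
        numbers.sort()
    width = max(len(first_text), len(last_text)) if keep_padding else 1
    prefix, suffix = pattern[:m.start()], pattern[m.end():]
    return [prefix + str(n).zfill(width) + suffix for n in numbers]


# Behavior as the table currently documents it.
stripped = expand_range("month={01..05}", keep_padding=False, keep_order=False)
reverse_sorted = expand_range("{3..1}", keep_padding=False, keep_order=False)

# Behavior as the suggested change documents it.
padded = expand_range("month={01..05}", keep_padding=True, keep_order=True)
reverse_kept = expand_range("{3..1}", keep_padding=True, keep_order=True)
```

Under the first reading, `month={01..05}` expands to `month=1` through `month=5` and would not match zero-padded directories such as `month=01`; under the second it expands to `month=01` through `month=05`.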

@dataroaring force-pushed the docs/file-path-pattern branch 4 times, most recently from 314d317 to 3f15079, on February 4, 2026 20:23
- Add new documentation page for file path patterns under sql-manual/basic-element
- Document supported URI formats (S3, HDFS, cloud providers)
- Document wildcard patterns (*, ?, [...]) and range expansion ({1..10})
- Add examples for S3 TVF, Broker Load, and INSERT INTO SELECT
- Include performance considerations and troubleshooting guide
- Add caution about zero-padded directory names with range patterns
- Update S3 TVF documentation to reference file-path-pattern
- Update Broker Load documentation to reference file-path-pattern
- Update INSERT INTO SELECT documentation to reference file-path-pattern
- Simplify file-analysis.md by replacing duplicate content with reference
- Add sidebar entry for new documentation
- Add Chinese translations for all documentation
- Add versioned docs for 4.x (both English and Chinese)
Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@morningman morningman merged commit 7466507 into apache:master Feb 5, 2026
2 of 3 checks passed

3 participants