docs: Add release 1.2.0 documentation #18880

Merged: yihua merged 5 commits into apache:asf-site from yihua:cut-1.2.0-docs on May 29, 2026

Conversation

@yihua yihua commented May 29, 2026

Describe the issue this Pull Request addresses

Cuts the 1.2.0 documentation by snapshotting the current docs into versioned_docs/version-1.2.0 and making 1.2.0 the default version in docusaurus.config.js. Also refreshes hardcoded Hudi versions in code examples (1.0.x / 1.1.1 → 1.2.0) within the 1.2.0 docs.

Release-page redirects continue to point at release-1.1; they will be flipped to release-1.2 in the follow-up PR that adds releases/release-1.2.md.

Summary and Changelog

  • Add versioned_docs/version-1.2.0 + sidebar, register 1.2.0 in versions.json
  • Set 1.2.0 as the default released version in docusaurus.config.js
  • Bump hardcoded Maven / bundle / Dataproc examples to 1.2.0 in the 1.2.0 docs

Impact

As above

Risk Level

none

Documentation Update

Release 1.2.0 documentation

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

Snapshot the current docs into versioned_docs/version-1.2.0 and make 1.2.0
the default in docusaurus.config.js. Bump release-1.1 redirect targets to
release-1.2 and refresh hardcoded Hudi versions (1.0.x/1.1.1) in code
examples to 1.2.0.
@yihua yihua changed the title from "[DOCS] Cut 1.2.0 docs" to "docs: Cut release 1.2.0 documentation" May 29, 2026
@yihua yihua changed the title from "docs: Cut release 1.2.0 documentation" to "docs: Add release 1.2.0 documentation" May 29, 2026
@github-actions github-actions Bot added the docs and size:XL (PR with lines of changes > 1000) labels May 29, 2026

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR snapshots the current docs into versioned_docs/version-1.2.0 and bumps hardcoded versions, registering 1.2.0 as the default, while also introducing new 1.2.0 content around Lance, vector search, BLOB/VARIANT types, HMS 4.x JDBC fallback, and new Flink/Spark configs. The version bumps are mostly consistent, but a few stale version strings appear to have been missed (e.g., hudi-gcp-bundle-0.13.0.jar, hudi-flink-bundle_2.11-0.9.0.jar, the CLI table-version mapping), and there are some typos, a leftover author TODO, and minor inconsistencies in the new feature descriptions worth a cleanup pass. Please take a look at the inline comments, and this should be ready for a Hudi committer or PMC member to take it from here.

Comment thread website/versioned_docs/version-1.2.0/gcp_bigquery.md Outdated
|----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Flink global RLI with native MDT access | `index.type=GLOBAL_RECORD_LEVEL_INDEX`<br />`metadata.enabled=true`<br />`index.global.enabled=true`<br />`index.bootstrap.enabled=false`<br />`hoodie.metadata.record.level.index.defer.init=false` | Flink reads global record locations directly from the metadata table and uses the in-task RLI cache for recently accessed keys. Use this when a shared metadata-table index is preferred over task-local RocksDB state. |
| Flink global RLI with local RocksDB cache | `index.type=GLOBAL_RECORD_LEVEL_INDEX`<br />`metadata.enabled=true`<br />`index.global.enabled=true`<br />`index.bootstrap.enabled=true`<br />`index.bootstrap.rocksdb.path=<local-path>`<br />`hoodie.metadata.record.level.index.defer.init=false` | Flink bootstraps the global RLI into a local RocksDB backend. This can reduce repeated metadata-table lookups at the cost of local disk usage and bootstrap time. |
| Dynamic bucket scaling with partitioned RLI | `index.type=RECORD_LEVEL_INDEX`<br />`metadata.enabled=true`<br />`index.global.enabled=false`<br />`hoodie.metadata.record.level.index.min.filegroup.count=<initial-file-groups-per-partition>`<br />`hoodie.metadata.record.level.index.max.filegroup.count=<max-file-groups-per-partition>`<br />Optionally tune `index.rli.cache.size` and `index.rli.cache.concurrent.partitions.num` for the partition cache. | Flink uses partition-scoped RLI to route existing keys to their recorded file group and assigns new keys through dynamic bucket assignment. This supports streaming upsert and insert overwrite workloads. |

🤖 The default column for `index.bootstrap.enabled` says `false` (Optional), but the description says "If not explicitly set for global RLI, Flink enables bootstrap by default." These appear contradictory. The companion table in `ingestion_flink.md` simply lists the default as `false`. Could you clarify whether the actual runtime default flips to `true` for `GLOBAL_RECORD_LEVEL_INDEX`, or whether the description is overstating it? Otherwise users wiring this up for the first time may set it explicitly only because they're unsure.

- AI-generated; verify before applying. React 👍/👎 to flag quality.
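
For readers following this thread, the "Flink global RLI with native MDT access" row quoted above can be written out as the index-related part of a Flink SQL `WITH` clause. This is a minimal sketch: the option keys and values are copied from that table row, the helper `render_with_clause` is hypothetical, and other options a real table needs (connector, path, and so on) are deliberately left out. It does not settle the open question about the runtime default of `index.bootstrap.enabled`.

```python
# Option keys and values copied from the
# "Flink global RLI with native MDT access" row of the quoted table.
GLOBAL_RLI_NATIVE_MDT = {
    "index.type": "GLOBAL_RECORD_LEVEL_INDEX",
    "metadata.enabled": "true",
    "index.global.enabled": "true",
    "index.bootstrap.enabled": "false",
    "hoodie.metadata.record.level.index.defer.init": "false",
}


def render_with_clause(options: dict) -> str:
    """Render options as a Flink SQL WITH (...) clause (hypothetical helper)."""
    lines = [f"  '{key}' = '{value}'" for key, value in options.items()]
    return "WITH (\n" + ",\n".join(lines) + "\n)"


print(render_with_clause(GLOBAL_RLI_NATIVE_MDT))
```

Setting `index.bootstrap.enabled` explicitly, as this row does, sidesteps the ambiguity raised in the comment until the documented default is clarified.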


| Option Name | Description | Default | Remarks |
| ----------- | ------- | ------- | ------- |
| `write.buffer.disruptor.ring.size` | Size of the Disruptor ring buffer (must be a power of 2) | `16384` | Larger values absorb write bursts but consume more heap memory |

🤖 The wording "As of Hudi 1.2.0, Flink writers support `TIMELINE_SERVER_BASED` marker type" reads as if this is a brand-new capability in 1.2.0. `TIMELINE_SERVER_BASED` markers themselves have been around since 0.9.0. Is the intent to call out that Flink specifically gained support in 1.2.0, or is this restating an existing feature? @danny0405 could you confirm when Flink writers gained `TIMELINE_SERVER_BASED` marker support so this can be worded accurately?

- AI-generated; verify before applying. React 👍/👎 to flag quality.
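
The `write.buffer.disruptor.ring.size` row in the table above requires a power of 2. A candidate value can be checked before it is set with a one-line bit test. This is a minimal sketch: the function name is hypothetical and is not part of Hudi.

```python
def is_power_of_two(n: int) -> bool:
    """Return True when n is a positive power of two."""
    # A power of two has exactly one bit set, so clearing the lowest
    # set bit with n & (n - 1) leaves zero.
    return n > 0 and (n & (n - 1)) == 0


assert is_power_of_two(16384)      # documented default, 2**14
assert not is_power_of_two(20000)  # not a power of two
```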

Comment thread website/versioned_docs/version-1.2.0/flink-quick-start-guide.md Outdated
Comment thread website/versioned_docs/version-1.2.0/flink-quick-start-guide.md
Comment thread website/versioned_docs/version-1.2.0/sql_queries.md
Comment thread website/versioned_docs/version-1.2.0/timeline.md
a query vector under the chosen distance metric.

```sql
SELECT *

🤖 Minor wording cleanup: "secondary keys stores in the secondary index partition" → "secondary keys stored in the secondary index partition". Also two lines later, "All the different index types can be queries by specifying..." → "queried".

- AI-generated; verify before applying. React 👍/👎 to flag quality.

-- COW
CREATE TABLE my_ai_table (
id STRING,
embedding VECTOR(768),

🤖 The Lance `CREATE TABLE` example sets `hoodie.record.merger.impls = 'org.apache.hudi.DefaultSparkRecordMerger'`. This config typically accepts a comma-separated list of merger class names. Is `DefaultSparkRecordMerger` the correct FQCN required here, and is it a required property for Lance-backed tables, or only recommended? A short sentence clarifying why this needs to be set explicitly (and that it's a single value here, not the full list) would help users not familiar with the merger API. @yihua could you confirm the recommended value here?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:10000
hoodie.datasource.hive_sync.username=<username>
hoodie.datasource.hive_sync.password=<password>
```

🤖 The HMS 4.x JDBC fallback section says fallback only happens for callers that go through Thrift first, and the flag never resets within a `HoodieHiveSyncClient` instance. Should we explicitly note here whether long-running writers (e.g., Hudi Streamer continuous mode) will re-probe Thrift on each sync run, or whether the same sync-client instance is reused across runs and therefore stays in JDBC-fallback mode for the lifetime of the process? Users tuning continuous pipelines on HMS 4.x will want to know.

- AI-generated; verify before applying. React 👍/👎 to flag quality.
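
The behavior this comment describes, a client that tries Thrift first, switches to JDBC on failure, and never switches back within one instance, can be modeled with a small sketch. This is an illustration only: `StickyFallbackClient` and the callables passed to it are hypothetical and do not mirror the actual `HoodieHiveSyncClient` code.

```python
from typing import Callable


class StickyFallbackClient:
    """Toy model: try the primary path first; after one failure, stay on the fallback."""

    def __init__(self, primary: Callable[[], str], fallback: Callable[[], str]):
        self._primary = primary
        self._fallback = fallback
        # Never reset once set, matching the behavior described in the comment.
        self.use_fallback = False

    def sync(self) -> str:
        if not self.use_fallback:
            try:
                return self._primary()
            except Exception:
                # Sticky for the lifetime of this instance.
                self.use_fallback = True
        return self._fallback()
```

Under this model, a process that reuses one client across sync runs probes the primary path only once, while a process that builds a new client per run probes it every time. Which of the two applies to continuous-mode writers is exactly the question the comment leaves open.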

@yihua yihua merged commit 51a185c into apache:asf-site May 29, 2026
1 check passed