docs: Update Parquet scan documentation by andygrove · Pull Request #3433 · apache/datafusion-comet

andygrove · 2026-02-06T14:09:19Z

Overview

This PR removes all references to the deprecated native_comet scan implementation from the documentation and
configuration, and improves the accuracy and clarity of the Parquet scan documentation.

Changed Files

`common/src/main/scala/org/apache/comet/CometConf.scala`

- Changed the category of `spark.comet.scan.impl` from `CATEGORY_SCAN` to `CATEGORY_PARQUET`
- Rewrote the doc string to describe `native_datafusion` and `native_iceberg_compat` without referencing
  `native_comet`
- Removed the `.internal()` marker, making this configuration visible to users
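
For readers who want to try the now-visible setting, here is a minimal sketch of passing it on the command line. The helper `scan_impl_conf` and the `VALID_SCAN_IMPLS` tuple are hypothetical and not part of Comet. The accepted values (`auto`, `native_datafusion`, `native_iceberg_compat`) are assumed from this PR description rather than read from `CometConf.scala`, so the generated configuration docs remain the authoritative list.

```python
from typing import List

# Assumed from this PR description, not read from CometConf.scala.
VALID_SCAN_IMPLS = ("auto", "native_datafusion", "native_iceberg_compat")


def scan_impl_conf(impl: str = "auto") -> List[str]:
    """Return spark-submit arguments that set spark.comet.scan.impl.

    Hypothetical helper for illustration only. It raises ValueError for any
    value outside the assumed list above.
    """
    if impl not in VALID_SCAN_IMPLS:
        raise ValueError(
            f"unsupported scan implementation {impl!r}; "
            f"expected one of {', '.join(VALID_SCAN_IMPLS)}"
        )
    return ["--conf", f"spark.comet.scan.impl={impl}"]


if __name__ == "__main__":
    # Prints the two arguments on one line, ready to append to spark-submit.
    print(" ".join(scan_impl_conf("native_datafusion")))
```

The returned list can be appended to a `spark-submit` argument vector. Per this PR's description, `auto` currently always resolves to `native_iceberg_compat`, so selecting `native_datafusion` has to be done explicitly.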

`docs/source/contributor-guide/parquet_scans.md`

Major rewrite of the Parquet scan documentation:

- Removed all references to the deprecated `native_comet` scan (previously listed as one of three implementations)
- Removed the comparison table that included `native_comet` and the "benefits over `native_comet`" section
- Removed the separate `native_comet` S3 section (which described Hadoop-AWS-based S3 access)
- Updated the S3 configuration and examples sections to reference both `native_datafusion` and `native_iceberg_compat`
  (previously only referenced `native_datafusion`)
- Clarified that `auto` mode currently always selects `native_iceberg_compat`
- Separated limitations into two clear categories:
  - Fallback to Spark (safe): unsupported features that cause Comet to fall back to Spark, producing correct
    results with reduced performance
  - Potential incorrect results: issues that do not fall back and may produce wrong answers (datetime rebasing
    for both scans, hard-coded config defaults for `native_iceberg_compat`)
- Added previously undocumented `native_datafusion` limitations that cause fallback:
  - Dynamic Partition Pruning (DPP)
  - `input_file_name()`, `input_file_block_start()`, `input_file_block_length()` SQL functions
  - Spark metadata columns (e.g., `_metadata.file_path`)
- Added Parquet encryption as a shared fallback limitation
- Fixed misleading wording for `ignoreMissingFiles`/`ignoreCorruptFiles` (previously said "not compatible with Spark",
  now clarifies it falls back to Spark)
- Removed stale issue links (#1545, #1758) that referenced old `native_datafusion` issues

`docs/source/contributor-guide/ffi.md`

- Replaced reference to `native_comet` with a general description of scans that use mutable buffers

`docs/source/contributor-guide/roadmap.md`

- Removed the "Removing the `native_comet` scan implementation" roadmap section (now completed)
- Simplified the Iceberg integration description by removing the mention of the `native_comet` to
  `native_iceberg_compat` transition

Fix grammar, add encryption fallback and native_iceberg_compat hard-coded config limitations, clarify S3 section applies to both scan implementations, and remove orphaned link references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Clarify which limitations fall back to Spark vs which may produce incorrect results. Add missing documented limitations for native_datafusion (DPP, input_file_name, metadata columns). Fix misleading wording for ignoreCorruptFiles/ignoreMissingFiles. Note that auto mode currently always selects native_iceberg_compat. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The section intro already states all limitations fall back to Spark, so individual bullet points don't need to repeat it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Restructure shared and per-scan limitation lists into two clear categories: features that fall back to Spark (safe) and issues that may produce incorrect results without falling back. Remove redundant "Comet falls back to Spark" from individual bullets where the section intro already states it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

andygrove and others added 5 commits February 6, 2026 07:08

docs: remove all mentions of native_comet scan

8f3d2de

update

87cd794

prettier

e394f2a

update config docs

fbe2f33

andygrove changed the title ~~docs: remove all mentions of native_comet scan~~ docs: Update Parquet scan documentation Feb 9, 2026

prettier

c25a7cd

andygrove added this to the 0.14.0 milestone Feb 9, 2026

andygrove and others added 5 commits February 13, 2026 12:52

Merge remote-tracking branch 'apache/main' into native_comet_docs

69a4e0b

docs: remove redundant fallback language in native_datafusion section

32334bd

fix

2789c36

andygrove marked this pull request as ready for review February 13, 2026 20:07

update

0266613

andygrove requested review from comphead, mbutrovich and parthchandra February 13, 2026 20:12

andygrove and others added 2 commits February 13, 2026 13:53

remove encryption from unsupported list, move DPP to common list

ba192a1

Merge branch 'main' into native_comet_docs

a696cf1

andygrove wants to merge 14 commits into apache:main from andygrove:native_comet_docs

2 participants
