8 changes: 4 additions & 4 deletions docs/source/contributor-guide/parquet_scans.md
@@ -45,8 +45,8 @@ The `native_datafusion` and `native_iceberg_compat` scans share the following limitations:
- When reading Parquet files written by systems other than Spark that contain columns with the logical types `UINT_8`
or `UINT_16`, Comet will produce different results than Spark because Spark does not preserve or understand these
logical types. Arrow-based readers, such as DataFusion and Comet, do respect these types and read the data as unsigned
rather than signed. By default, Comet will fall back to Spark's native scan when scanning Parquet files containing
`byte` or `short` types (regardless of the logical type). This behavior can be disabled by setting
`spark.comet.scan.allowIncompatible=true` (see the example after this list).
- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported.
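
If the difference in unsigned semantics is acceptable, the fallback described above can be disabled per session. A minimal sketch, assuming the config can be set at runtime (the Parquet path is a hypothetical placeholder):

```scala
// Allow Comet's native scans to read byte/short columns, accepting that
// UINT_8/UINT_16 logical types are read as unsigned rather than signed.
spark.conf.set("spark.comet.scan.allowIncompatible", "true")

// Placeholder path; any Parquet file containing byte/short columns applies.
val df = spark.read.parquet("/data/events_with_tinyint_columns.parquet")
df.printSchema()
```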

@@ -59,11 +59,11 @@ The `native_datafusion` scan has some additional limitations:

## S3 Support

There are some differences in S3 support between the scan implementations.

### `native_comet`

The `native_comet` Parquet scan implementation reads data from S3 using the [Hadoop-AWS module](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html), which
is identical to the approach commonly used with vanilla Spark. AWS credential configuration and other Hadoop S3A
configurations work the same way as in vanilla Spark.
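
For example, credentials and endpoint settings use the standard Hadoop S3A keys, just as they would without Comet. A hedged sketch (bucket, path, and environment variable names are placeholders):

```scala
// Standard Hadoop S3A settings, configured exactly as for vanilla Spark.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
hadoopConf.set("fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")

// Reads go through the Hadoop-AWS (s3a://) connector, same as vanilla Spark.
val df = spark.read.parquet("s3a://my-bucket/path/to/table")
df.show()
```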

13 changes: 6 additions & 7 deletions docs/source/contributor-guide/roadmap.md
@@ -27,11 +27,10 @@ helpful to have a roadmap for some of the major items that require coordination
### Iceberg Integration

Iceberg integration is still a work-in-progress ([#2060]), with major improvements expected in the next few
releases. The default `auto` scan mode now uses `native_iceberg_compat` instead of `native_comet`, enabling
support for complex types.

[#2060]: https://github.com/apache/datafusion-comet/issues/2060

### Spark 4.0 Support

@@ -44,12 +43,12 @@ more Spark SQL tests and fully implementing ANSI support ([#313]) for all suppor
### Removing the native_comet scan implementation

We are working towards deprecating ([#2186]) and removing ([#2177]) the `native_comet` scan implementation, which
is the original scan implementation that uses mutable buffers (which is incompatible with best practices around
Arrow FFI) and does not support complex types.

Now that the default `auto` scan mode uses `native_iceberg_compat` (which is based on DataFusion's `DataSourceExec`),
we can proceed with removing the `native_comet` scan implementation, and then improve the efficiency of our use of
Arrow FFI ([#2171]).

[#2186]: https://github.com/apache/datafusion-comet/issues/2186
[#2171]: https://github.com/apache/datafusion-comet/issues/2171
6 changes: 5 additions & 1 deletion docs/source/user-guide/latest/datasources.md
@@ -36,7 +36,11 @@ Comet accelerates Iceberg scans of Parquet files. See the [Iceberg Guide] for mo

### CSV

Comet provides experimental native CSV scan support. When `spark.comet.scan.csv.v2.enabled` is enabled, CSV files
are read natively for improved performance, although the benefits are workload-dependent.

Alternatively, when `spark.comet.convert.csv.enabled` is enabled, data from Spark's CSV reader is immediately
converted into Arrow format, allowing native execution to happen after that.
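
A minimal sketch of trying either option from a Spark session, assuming both configs can be toggled at session level (the file path is a placeholder):

```scala
// Experimental native CSV scan.
spark.conf.set("spark.comet.scan.csv.v2.enabled", "true")

// Alternative: keep Spark's CSV reader but convert its output to Arrow so
// downstream operators can still run natively.
// spark.conf.set("spark.comet.convert.csv.enabled", "true")

// Placeholder path; these are ordinary Spark CSV reader options.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/example.csv")
df.show()
```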

### JSON
22 changes: 15 additions & 7 deletions docs/source/user-guide/latest/expressions.md
@@ -68,6 +68,7 @@ Expressions that are not Spark-compatible will fall back to Spark by default and
| Contains | Yes | |
| EndsWith | Yes | |
| InitCap | No | Behavior is different in some cases, such as hyphenated names. |
| Left | Yes | Length argument must be a literal value |
| Length | Yes | |
| Like | Yes | |
| Lower | No | Results can vary depending on locale and character set. Requires `spark.comet.caseConversion.enabled=true` |
@@ -94,15 +95,20 @@ Expressions that are not Spark-compatible will fall back to Spark by default and
| Expression | SQL | Spark-Compatible? | Compatibility Notes |
| -------------- | ---------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------- |
| DateAdd | `date_add` | Yes | |
| DateDiff | `datediff` | Yes | |
| DateFormat | `date_format` | Yes | Partial support. Only specific format patterns are supported. |
| DateSub | `date_sub` | Yes | |
| DatePart | `date_part(field, source)` | Yes | Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` |
| Extract | `extract(field FROM source)` | Yes | Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` |
| FromUnixTime | `from_unixtime` | No | Does not support format, supports only -8334601211038 <= sec <= 8210266876799 |
| Hour | `hour` | Yes | |
| LastDay | `last_day` | Yes | |
| Minute | `minute` | Yes | |
| Second | `second` | Yes | |
| TruncDate | `trunc` | Yes | |
| TruncTimestamp | `date_trunc` | Yes | |
| UnixDate | `unix_date` | Yes | |
| UnixTimestamp | `unix_timestamp` | Yes | |
| Year | `year` | Yes | |
| Month | `month` | Yes | |
| DayOfMonth | `day`/`dayofmonth` | Yes | |
@@ -163,6 +169,7 @@ Expressions that are not Spark-compatible will fall back to Spark by default and
| ----------- | ----------------- |
| Md5 | Yes |
| Murmur3Hash | Yes |
| Sha1 | Yes |
| Sha2 | Yes |
| XxHash64 | Yes |

@@ -256,12 +263,13 @@ Comet supports using the following aggregate functions within window contexts wi

## Struct Expressions

| Expression | Spark-Compatible? | Compatibility Notes |
| -------------------- | ----------------- | ------------------------------------------ |
| CreateNamedStruct | Yes | |
| GetArrayStructFields | Yes | |
| GetStructField | Yes | |
| JsonToStructs | No | Partial support. Requires explicit schema. |
| StructsToJson | Yes | |
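
As the table above notes, `JsonToStructs` (`from_json`) requires an explicit schema. A minimal sketch of both directions (column names and schema are illustrative):

```scala
import spark.implicits._
import org.apache.spark.sql.functions.{col, from_json, to_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// JsonToStructs (from_json) needs the schema spelled out explicitly.
val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)
))

val df = Seq("""{"id": 1, "name": "a"}""").toDF("json_col")
val parsed = df.select(from_json(col("json_col"), schema).alias("s"))

// StructsToJson (to_json) converts the struct back to a JSON string.
parsed.select(to_json(col("s")).alias("json_out")).show(false)
```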

## Conversion Expressions

95 changes: 87 additions & 8 deletions docs/source/user-guide/latest/iceberg.md
@@ -183,14 +183,93 @@ $SPARK_HOME/bin/spark-shell \
The same sample queries from above can be used to test Comet's fully-native Iceberg integration;
however, the scan node to look for is `CometIcebergNativeScan`.
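
For example, a quick way to confirm that the native scan is being used (the table name is a placeholder):

```scala
// The physical plan for an accelerated query should contain a
// CometIcebergNativeScan node rather than Spark's own BatchScan.
val df = spark.sql("SELECT * FROM my_catalog.db.events WHERE id > 100")
df.explain()
```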

### Supported features

The native Iceberg reader supports the following features:

**Table specifications:**

- Iceberg table spec v1 and v2 (v3 will fall back to Spark)

**Schema and data types:**

- All primitive types including UUID
- Complex types: arrays, maps, and structs
- Schema evolution (adding and dropping columns)

**Time travel and branching:**

- `VERSION AS OF` queries to read historical snapshots
- Branch reads for accessing named branches
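
For example, both features use Iceberg's SQL time-travel syntax in Spark; the table name, snapshot ID, and branch name below are placeholders:

```scala
// Read a historical snapshot by snapshot ID (placeholder value).
spark.sql("SELECT * FROM my_catalog.db.events VERSION AS OF 4925868687881398371").show()

// Read a named branch (placeholder branch name).
spark.sql("SELECT * FROM my_catalog.db.events VERSION AS OF 'audit-branch'").show()
```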

**Delete handling (Merge-On-Read tables):**

- Positional deletes
- Equality deletes
- Mixed delete types

**Filter pushdown:**

- Equality and comparison predicates (`=`, `!=`, `>`, `>=`, `<`, `<=`)
- Logical operators (`AND`, `OR`)
- NULL checks (`IS NULL`, `IS NOT NULL`)
- `IN` and `NOT IN` list operations
- `BETWEEN` operations
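
A sketch of a query whose predicates all fall into the supported pushdown categories above (table and column names are placeholders):

```scala
// All of these predicates can be pushed down into the native Iceberg scan.
spark.sql("""
  SELECT id, status, amount
  FROM my_catalog.db.orders
  WHERE status IN ('shipped', 'delivered')
    AND amount BETWEEN 10.0 AND 500.0
    AND customer_id IS NOT NULL
    AND region = 'us-east'
""").show()
```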

**Partitioning:**

- Standard partitioning with partition pruning
- Date partitioning with `days()` transform
- Bucket partitioning
- Truncate transform
- Hour transform
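
These transforms use Iceberg's standard Spark DDL syntax. A minimal sketch with a hypothetical catalog and table (only the `days` and `bucket` transforms are shown):

```scala
// Day and bucket partition transforms; queries that filter on event_ts or id
// can benefit from partition pruning in the native scan.
spark.sql("""
  CREATE TABLE my_catalog.db.events (
    id BIGINT,
    category STRING,
    event_ts TIMESTAMP
  ) USING iceberg
  PARTITIONED BY (days(event_ts), bucket(16, id))
""")
```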

**Storage:**

- Local filesystem
- Hadoop Distributed File System (HDFS)
- S3-compatible storage (AWS S3, MinIO)

### REST Catalog

Comet's native Iceberg reader also supports REST catalogs. The following example shows how to
configure Spark to use a REST catalog with Comet's native Iceberg scan:

```shell
$SPARK_HOME/bin/spark-shell \
--packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.12.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \
--repositories https://repo1.maven.org/maven2/ \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.rest_cat=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.rest_cat.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
--conf spark.sql.catalog.rest_cat.uri=http://localhost:8181 \
--conf spark.sql.catalog.rest_cat.warehouse=/tmp/warehouse \
--conf spark.plugins=org.apache.spark.CometPlugin \
--conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
--conf spark.comet.scan.icebergNative.enabled=true \
--conf spark.comet.explainFallback.enabled=true \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=2g
```

Note that REST catalogs require explicit namespace creation before creating tables:

```scala
scala> spark.sql("CREATE NAMESPACE rest_cat.db")
scala> spark.sql("CREATE TABLE rest_cat.db.test_table (id INT, name STRING) USING iceberg")
scala> spark.sql("INSERT INTO rest_cat.db.test_table VALUES (1, 'Alice'), (2, 'Bob')")
scala> spark.sql("SELECT * FROM rest_cat.db.test_table").show()
```

### Current limitations

The following scenarios will fall back to Spark's native Iceberg reader:

- Iceberg table spec v3 scans
- Iceberg writes (reads are accelerated, writes use Spark)
- Tables backed by Avro or ORC data files (only Parquet is accelerated)
- Tables partitioned on `BINARY` or `DECIMAL` (with precision >28) columns
- Scans with residual filters using `truncate`, `bucket`, `year`, `month`, `day`, or `hour`
transform functions (partition pruning still works, but row-level filtering of these
transforms falls back)