Expose parquet row-number virtual column (RowNumber) on ParquetSource/ParquetOpener #22517

## Is your feature request related to a problem or challenge?

`ParquetSource` / `ParquetOpener` (in `datafusion-datasource-parquet`) cannot emit the parquet reader's **row-number virtual column**, even though the underlying `parquet` crate (58.x) fully supports it:

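```rust
// Non-nullable Int64 field tagged with the parquet crate's RowNumber extension type.
let row_number_field = Field::new("row_number", DataType::Int64, false)
    .with_extension_type(parquet::arrow::...::RowNumber);
// Ask the stream builder to emit it alongside the regular columns.
let builder = builder.with_virtual_columns(vec![row_number_field])?;
```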

The row-number virtual column gives each row its **true physical position within the file even under row-group / page / row-filter pruning**. This is exactly what engines need to reconstruct stable per-row identity while still benefiting from predicate pushdown.
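To make the pruning point concrete, here is a small self-contained model (plain Rust, no parquet dependency) of how physical row numbers relate to the rows that actually come out of a pruned scan. `Run` and `surviving_row_numbers` are hypothetical names; `Run` only loosely mirrors the skip/select runs parquet readers use for row selections. The sketch assumes row groups are pruned first and the selection runs are then applied to the concatenation of the surviving row groups.

```rust
/// One run of a row selection (hypothetical type, loosely modelled on the
/// skip/select runs parquet readers use for page / row-filter pruning).
#[derive(Clone, Copy, Debug)]
enum Run {
    Skip(usize),
    Select(usize),
}

/// Returns the physical (file-level, 0-based) row number of every row that
/// survives pruning, in output order.
/// - `row_group_rows`: number of rows in each row group of the file
/// - `keep_group`: whether each row group survived statistics pruning
/// - `selection`: skip/select runs over the rows of the kept row groups
fn surviving_row_numbers(row_group_rows: &[usize], keep_group: &[bool], selection: &[Run]) -> Vec<i64> {
    // Physical indices of all rows in the kept row groups, in read order.
    let mut physical: Vec<usize> = Vec::new();
    let mut offset = 0usize;
    for (&rows, &keep) in row_group_rows.iter().zip(keep_group.iter()) {
        if keep {
            physical.extend(offset..offset + rows);
        }
        // Pruned row groups still advance the physical offset.
        offset += rows;
    }

    // Apply the selection runs; skipped rows never reach the output.
    let mut out = Vec::new();
    let mut pos = 0usize;
    for run in selection {
        match *run {
            Run::Skip(n) => pos += n,
            Run::Select(n) => {
                out.extend(physical[pos..pos + n].iter().map(|&i| i as i64));
                pos += n;
            }
        }
    }
    out
}
```

For example, with three 4-row groups where the middle group is pruned and the selection is `Select(1), Skip(2), Select(3), Skip(2)`, the rows that come out are physical rows 0, 3, 8 and 9, whereas a position counter over the output would report 0, 1, 2, 3. The row-number virtual column reports the former.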

Concretely, this blocks **Delta Lake row tracking** (`_metadata.row_id` = `baseRowId + physical_row_index`) on top of DataFusion: to keep the synthesized `row_id`/`row_index` correct, an integrating engine must currently *disable* data-filter pushdown (so the reader returns every row in physical order and a running counter stays aligned). That defeats row-group skipping whenever `_metadata.row_id` is projected alongside a selective filter.
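A minimal sketch of the row-id arithmetic, assuming the reader supplies the physical row index for every surviving row (as the row-number virtual column would). `row_ids_from_row_numbers` and `row_ids_from_counter` are hypothetical helpers, not Delta or DataFusion APIs; the second one models today's running-counter workaround.

```rust
/// Delta row tracking: row_id = baseRowId + physical_row_index, where the
/// physical indices come from the reader's row-number virtual column.
fn row_ids_from_row_numbers(base_row_id: i64, row_numbers: &[i64]) -> Vec<i64> {
    row_numbers.iter().map(|&idx| base_row_id + idx).collect()
}

/// Running-counter workaround: numbers rows in the order the reader returns
/// them. Only correct if every row is returned in physical order, i.e. with
/// data-filter pushdown disabled.
fn row_ids_from_counter(base_row_id: i64, rows_returned: usize) -> Vec<i64> {
    (0..rows_returned as i64).map(|i| base_row_id + i).collect()
}
```

With `base_row_id = 1000` and surviving physical rows `[0, 3, 8, 9]`, the first helper yields `[1000, 1003, 1008, 1009]` while the counter yields `[1000, 1001, 1002, 1003]`. The two agree only when no rows are dropped, which is why the counter approach forces pushdown off.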

There is no hook to inject this today:
- `ParquetOpener` never calls `with_virtual_columns`, and its `expr_adapter_factory` field is `pub(crate)`, so the opener can't be reused/extended from outside the crate.
- `ParquetSource` exposes no builder-customization hook.
- The `ParquetFileReaderFactory` provides only the `AsyncFileReader`, not builder configuration.

So the only workaround is to re-implement a custom `FileOpener` (duplicating projection / row-filter / pruning plumbing), which is what we're doing downstream in Apache DataFusion Comet (apache/datafusion-comet — Delta contrib).

## Describe the solution you'd like

Expose virtual columns on `ParquetSource` / `ParquetOpener`, e.g.:

```rust
let source = ParquetSource::new(schema)
    .with_virtual_columns(vec![row_number_field]); // RowNumber-extension field(s)
```

…and have `ParquetOpener` forward them to `ParquetRecordBatchStreamBuilder::with_virtual_columns(...)` and include them in the projected output schema, so the rest of the existing pruning/row-filter/projection logic is reused unchanged.
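The intended data flow can be modelled without any DataFusion types. In this sketch `Field`, `SourceSketch` and `output_fields` are hypothetical stand-ins (the real change would use `arrow_schema::Field`/`FieldRef` on `ParquetSource`); appending virtual columns after the projected file columns and rejecting name collisions are assumptions about the API shape, not settled behaviour. In the real opener, the same `virtual_columns` list would also be handed to the stream builder's `with_virtual_columns`.

```rust
/// Minimal stand-in for an Arrow field (hypothetical; not arrow_schema::Field).
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Hypothetical stand-in for ParquetSource carrying requested virtual columns.
#[derive(Default)]
struct SourceSketch {
    virtual_columns: Vec<Field>,
}

impl SourceSketch {
    /// Builder-style setter mirroring the proposed `with_virtual_columns`.
    fn with_virtual_columns(mut self, cols: Vec<Field>) -> Self {
        self.virtual_columns = cols;
        self
    }

    /// Output schema the opener would emit: projected file columns first,
    /// then the virtual columns. A virtual column whose name matches a
    /// projected file column is rejected (assumed policy).
    fn output_fields(&self, projected: &[Field]) -> Result<Vec<Field>, String> {
        let mut out = projected.to_vec();
        for v in &self.virtual_columns {
            if out.iter().any(|f| f.name == v.name) {
                return Err(format!("virtual column `{}` collides with an existing column", v.name));
            }
            out.push(v.clone());
        }
        Ok(out)
    }
}
```

Rejecting collisions up front seems safer than silently shadowing a physical column of the same name, but that choice is open for discussion.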

## Describe alternatives you've considered

- Re-implementing a custom `FileOpener` that builds the stream with `with_virtual_columns` (our current downstream approach — works, but duplicates a lot of well-tested opener logic and is a maintenance burden).
- A reader-factory hook — insufficient, since virtual columns are configured on the stream *builder*, not the reader.

## Additional context

Downstream consumer: Apache DataFusion Comet's native Delta Lake scan (apache/datafusion-comet#4366). We'd be happy to contribute a PR if the API shape above is agreeable.