Currently, a Substrait plan with a RelRoot containing a ReadRel will produce extra, unexpected fields in its output, namely __fragment_index et al. Right now they are always included by default. There are a few things to be done:
- ReadRel's base_schema could be converted into a ScanOptions.dataset_schema to limit the fields read. (Also see ARROW-15585; these fields should be used for pushdown projection.)
- The scanner always adds these extra fields - maybe it should be opt-in instead
- There's no way to manually insert a Project to "fix" things because, as implemented, a Project can only add new columns, not drop existing ones
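
To make the first item concrete, here is a minimal sketch of the projection a consumer would need to apply: keep only the columns named in base_schema and drop the scanner's augmented fields. It is plain Python and does not use any Arrow API. The helper name select_base_columns and the constant AUGMENTED_FIELDS are hypothetical, and the augmented names other than __fragment_index are an assumption about what "et al." covers.

```python
# Hypothetical illustration only; none of these names are Arrow APIs.
# Assumed set of fields the scanner appends (the issue names only __fragment_index).
AUGMENTED_FIELDS = (
    "__fragment_index",
    "__batch_index",
    "__last_in_fragment",
    "__filename",
)


def select_base_columns(scanned_names, base_schema_names):
    """Return indices into scanned_names for each base_schema column, in base_schema order."""
    positions = {name: index for index, name in enumerate(scanned_names)}
    missing = [name for name in base_schema_names if name not in positions]
    if missing:
        raise KeyError(f"columns not produced by the scan: {missing}")
    return [positions[name] for name in base_schema_names]


# A scan of a two-column dataset, with the augmented fields appended.
scanned = ["id", "value", *AUGMENTED_FIELDS]
keep = select_base_columns(scanned, ["id", "value"])
projected = [scanned[index] for index in keep]
```

Here projected contains only the base_schema columns, which is the output a Substrait consumer would expect from the ReadRel.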
Reporter: David Li / @lidavidm
Related issues:
Note: This issue was originally created as ARROW-17229. Please see the migration documentation for further details.