ArrowScan to_table fails if the data is mixed between dict-encoded strings and plain strings. #3260

### Apache Iceberg version

0.11.0 (latest release)

### Please describe the bug 🐞

We recently updated our functions to call pyiceberg's table.append() with dict-encoded Arrow tables. Our Iceberg tables now contain mixed data: data written before this change, which is stored as plain strings, and data written after it, which is stored as dict-encoded strings.

If we now call to_arrow() on a DataScan of this table, we get this error:

```
pyarrow.lib.ArrowTypeError: Unable to merge: Field col has incompatible types: string vs dictionary<values=string, indices=int32, ordered=0>
```

Here is a minimal example that reproduces this error:
```python
from pyiceberg.io.pyarrow import ArrowScan
from pyiceberg.table import ALWAYS_TRUE
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField
from pyiceberg.types import StringType

import pyarrow as pa


def create_scan_with_mixed_dict_encode_not_encode() -> ArrowScan:
    schema = Schema(
        NestedField(field_id=1, name="col", field_type=StringType(), required=False)
    )

    class FakeTableMetadata:
        def schema(self) -> Schema:
            return schema

    scan = ArrowScan(table_metadata=FakeTableMetadata(),
                     io=object(),
                     projected_schema=schema,
                     row_filter=ALWAYS_TRUE)

    def _batches_for_repro(self, _tasks):
        str_values = pa.array(["a"], type=pa.string())
        yield pa.record_batch([str_values], names=["col"])
        yield pa.record_batch([str_values.dictionary_encode()], names=["col"])

    ArrowScan.to_record_batches = _batches_for_repro
    return scan


if __name__ == "__main__":
    scan = create_scan_with_mixed_dict_encode_not_encode()
    arrow_table = ArrowScan.to_table(scan, tasks=[])
```

I am happy to provide a bugfix PR, but I need some guidance on the best approach.
One idea is to cast each batch in to_table to the arrow_schema. A more performant way would be to check each batch's schema and, if it differs from the expected one, find the dict-encoded column and cast only that one to string.


### Willingness to contribute

- [x] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
