Tables with files imported to Iceberg using `add_files` contain inconsistent `format-version` manifest fields when using `format-version` 2

### Apache Iceberg version

1.9.2 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

Hi, I have noticed a potential issue when using the `add_files` procedure.

The manifest files written by this procedure have their `format-version` header set to 1, even though the table was created with `format-version` 2 and the metadata JSON file also states `format-version` 2.
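One way to inspect the manifest header without any Iceberg or Avro dependency is to read the Avro object container header directly. The following is a minimal sketch, assuming the manifest is a standard Avro container file and that the version is stored under the `format-version` metadata key as described above. The class and method names (`AvroHeaderReader`, `readHeader`, `parseHeader`) are hypothetical and not part of Iceberg.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads only the key/value metadata map from an Avro container header.
final class AvroHeaderReader {
  private AvroHeaderReader() {}

  // Decodes one Avro long (variable-length, zigzag-encoded).
  static long readLong(InputStream in) throws IOException {
    long raw = 0;
    int shift = 0;
    while (true) {
      int b = in.read();
      if (b < 0) throw new IOException("truncated varint");
      raw |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) break;
      shift += 7;
      if (shift > 63) throw new IOException("varint too long");
    }
    return (raw >>> 1) ^ -(raw & 1);
  }

  // Reads a length-prefixed byte sequence and decodes it as UTF-8 text.
  static String readString(DataInputStream in) throws IOException {
    long len = readLong(in);
    if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("invalid length: " + len);
    byte[] buf = new byte[(int) len];
    in.readFully(buf);
    return new String(buf, StandardCharsets.UTF_8);
  }

  // Returns the header metadata map; values are raw bytes in Avro, decoded here as UTF-8.
  static Map<String, String> readHeader(InputStream raw) throws IOException {
    DataInputStream in = new DataInputStream(raw);
    byte[] magic = new byte[4];
    in.readFully(magic);
    if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
      throw new IOException("not an Avro object container file");
    }
    Map<String, String> meta = new LinkedHashMap<>();
    while (true) {
      long count = readLong(in);
      if (count == 0) break; // a zero-count block terminates the map
      if (count < 0) {
        readLong(in); // negative count: a block byte size follows, which is skipped here
        count = -count;
      }
      for (long i = 0; i < count; i++) {
        String key = readString(in);
        meta.put(key, readString(in));
      }
    }
    return meta;
  }

  // Convenience wrapper for in-memory bytes that avoids checked exceptions.
  static Map<String, String> parseHeader(byte[] fileBytes) {
    try {
      return readHeader(new ByteArrayInputStream(fileBytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling `readHeader` on a `FileInputStream` for a manifest and looking up `format-version` in the returned map should show the value written by `add_files`.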

I've been documenting this issue in duckdb/duckdb-iceberg#374, since DuckDB is unable to read manifest files whose `format-version` differs from that of the table's metadata JSON file. DuckDB throws an error because the `content` field of the `data_file` struct is missing from every record in the manifest.
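The mismatch itself can be expressed as a small consistency check. This is a rough sketch, assuming the metadata JSON carries a top-level integer `format-version` field and that the manifest header value has already been extracted as a string. The regex lookup is a shortcut rather than a real JSON parser, and `FormatVersionCheck` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: compares the table's format-version with a manifest header value.
final class FormatVersionCheck {
  // Matches the first "format-version" : <int> pair; a shortcut, not a full JSON parse.
  private static final Pattern FORMAT_VERSION =
      Pattern.compile("\"format-version\"\\s*:\\s*(\\d+)");

  private FormatVersionCheck() {}

  // Extracts the format-version declared in the table metadata JSON text.
  static int tableFormatVersion(String metadataJson) {
    Matcher m = FORMAT_VERSION.matcher(metadataJson);
    if (!m.find()) {
      throw new IllegalArgumentException("no format-version field found");
    }
    return Integer.parseInt(m.group(1));
  }

  // True only when the manifest header value is present and equals the table's version.
  static boolean isConsistent(String metadataJson, String manifestHeaderVersion) {
    return manifestHeaderVersion != null
        && tableFormatVersion(metadataJson) == Integer.parseInt(manifestHeaderVersion.trim());
  }
}
```

For the table described in this report, the check would return `false`: the metadata JSON declares 2 while the manifest header declares 1.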

I believe that this occurs in the `add_files` procedure. While tracing the serialization of the output Avro manifest file, I noticed that the `content` field is optional in the `DataFile` interface:

https://github.com/apache/iceberg/blob/f25e07db6318184272d5c98dede4d268ee5288ab/api/src/main/java/org/apache/iceberg/DataFile.java#L37-L42

I also noticed that there is no `format-version`-dependent logic (to account for the required `content` field) when constructing the `DataFile`s:

https://github.com/apache/iceberg/blob/799925a4ef41e7b4231930377b83bd686001c2c0/data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java#L191

Or serializing them:

https://github.com/apache/iceberg/blob/54a62aeb14155c2c82a9008d9e2679646c2d703a/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java#L804-L807

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

Snippet from `DataFile.java` (lines 37-42 at the permalink above):

	Types.NestedField CONTENT =
	    optional(
	        134,
	        "content",
	        IntegerType.get(),
	        "Contents of the file: 0=data, 1=position deletes, 2=equality deletes");

Snippet from `SparkTableUtil.java` (lines 804-807 at the permalink above):

	.map(
	    (MapFunction<DataFile, Tuple2<String, DataFile>>)
	        file -> Tuple2.apply(file.location(), file),
	    Encoders.tuple(Encoders.STRING(), Encoders.javaSerialization(DataFile.class)))

Issue #13667