The estimated table size is inaccurate #15684

@lurnagao-dahua

Description

Apache Iceberg version

1.10.1 (latest release)

Query engine

Spark

Please describe the bug 🐞

Hi, team!
The SparkSchemaUtil.estimateSize method calculates the table size from the default size of each field type multiplied by the number of rows, which may differ significantly from the actual size.
May I ask whether there are any areas that could be improved?

public static long estimateSize(StructType tableSchema, long totalRecords) {

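For illustration, here is a minimal, self-contained sketch of how an estimate of this shape behaves (the field sizes and the helper below are hypothetical stand-ins, not Iceberg's actual defaults): the per-row width is fixed by the schema, so compression and the real lengths of variable-width values in the data files never enter the result.

```java
public class SchemaEstimateSketch {
    // Hypothetical per-field default sizes in bytes; the real defaults come
    // from the type definitions and may differ.
    static long estimateSize(long[] defaultFieldSizes, long totalRecords) {
        long rowWidth = 0;
        for (long size : defaultFieldSizes) {
            rowWidth += size;
        }
        // Estimate = fixed row width * row count: compression, encoding, and
        // actual string lengths in the data files do not affect the result.
        return rowWidth * totalRecords;
    }

    public static void main(String[] args) {
        // e.g. a schema of (long id, string name), assumed 8 + 20 bytes per row
        System.out.println(estimateSize(new long[] {8, 20}, 1_000_000L));
    }
}
```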
We discovered this issue because Spark produces different execution plans when querying a Parquet-source table versus the same data as an Iceberg table.
Spark derives per-column size fractions from the total file size:
https://github.com/apache/spark/blob/10dd228d4c09166c2cb744cb0e3e7f15385afae0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L34

May I ask whether it would be possible to use the manifest files for more accurate statistics?
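As a hedged sketch of the idea: Iceberg manifests record each data file's size (fileSizeInBytes), so summing those values over the files of a scan would reflect actual on-disk size; in the real API this would iterate something like table.newScan().planFiles() and read task.file().fileSizeInBytes(). The class below only models the summation with plain values and is not the actual Iceberg code.

```java
import java.util.List;

public class ManifestSizeEstimate {
    // Sum of per-file sizes, as a manifest-based estimate would compute them.
    // Plain longs stand in for the fileSizeInBytes values that Iceberg
    // records per data file in its manifest entries.
    static long estimateFromFileSizes(List<Long> fileSizesInBytes) {
        return fileSizesInBytes.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // e.g. three Parquet data files of 128 MiB, 64 MiB, and 32 MiB
        List<Long> sizes = List.of(128L << 20, 64L << 20, 32L << 20);
        System.out.println(estimateFromFileSizes(sizes));
    }
}
```

Since these sizes are already tracked in metadata, the estimate would need no extra file I/O beyond reading the manifests.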

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Metadata

Labels

bug (Something isn't working)