Description
Currently the metadata validation config (spark.gluten.sql.fallbackUnexpectedMetadataParquet) defaults to false. When it is set to true, validation runs for each root path, bounded by the file limit (spark.gluten.sql.fallbackUnexpectedMetadataParquet.limit). If there are too many partitions, the validation becomes expensive.
A possible solution is to sample the root paths and validate only a subset of the files.
The number of files to sample should be determined by two values: the overall file limit, and the total number of files under the root paths scaled by a configurable percentage.
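A rough sketch of the sampling idea, written in Python for brevity rather than in Gluten's own Scala. It assumes the sample size is the smaller of the file limit and the percentage of the total file count; that reading of the proposal is an assumption. The function name, the `sample_percent` parameter, and the `seed` argument are all hypothetical and do not correspond to any existing Gluten API or config key.

```python
import random


def sample_files_for_validation(files_by_root, file_limit, sample_percent, seed=0):
    """Pick a bounded random subset of files across all root paths.

    files_by_root: dict mapping a root path to the list of files under it.
    file_limit: overall cap on validated files (hypothetical stand-in for
        spark.gluten.sql.fallbackUnexpectedMetadataParquet.limit).
    sample_percent: integer percentage of the total file count to sample
        (hypothetical; no such config exists today).
    """
    all_files = [f for files in files_by_root.values() for f in files]
    if not all_files or file_limit <= 0 or sample_percent <= 0:
        return []

    # Percentage-based budget, rounded up with integer arithmetic so that
    # small inputs still yield at least one file.
    percent_budget = (len(all_files) * sample_percent + 99) // 100

    # Never exceed the overall limit or the number of files available.
    sample_size = min(file_limit, percent_budget, len(all_files))

    # A fixed seed keeps the selection reproducible between runs.
    rng = random.Random(seed)
    return rng.sample(all_files, sample_size)


if __name__ == "__main__":
    roots = {
        "/warehouse/t1": [f"/warehouse/t1/p={i}/part-0.parquet" for i in range(100)],
        "/warehouse/t2": [f"/warehouse/t2/p={i}/part-0.parquet" for i in range(50)],
    }
    picked = sample_files_for_validation(roots, file_limit=10, sample_percent=10)
    print(len(picked), "files selected for metadata validation")
```

Sampling over the flattened file list means that root paths with more files contribute proportionally more samples. Sampling each root path separately would be an alternative if every root must be covered.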
Gluten version
None