Description
Feature Request / Improvement
In Spark readers, scan tasks are currently processed sequentially. The iteration logic in BaseReader.next() opens one task at a time, fully consumes it, and only then proceeds to the next task.
iceberg/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java
Lines 131 to 145 in f49b2fd
```java
public boolean next() throws IOException {
  try {
    while (true) {
      if (currentIterator.hasNext()) {
        this.current = currentIterator.next();
        return true;
      } else if (tasks.hasNext()) {
        this.currentIterator.close();
        this.currentTask = tasks.next();
        this.currentIterator = open(currentTask);
      } else {
        this.currentIterator.close();
        return false;
      }
    }
```
With a large number of small files (say, hundreds or thousands of 5–10 KB files), this sequential task processing can lead to significant overhead.
Each task is opened and read independently, which may underutilize available CPU and I/O parallelism, especially on object stores with non-trivial per-request latency.
Possible Improvement:
It may be beneficial to optionally allow Spark readers to process multiple small-file tasks concurrently, buffering rows into a shared iterator for downstream consumption, while preserving the existing sequential behavior by default (see the sketch below).
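As a minimal sketch of one possible shape for this, independent of Iceberg's actual classes: worker threads each open and drain one task, pushing rows into a bounded blocking queue that the caller consumes as a single iterator. The task type `T`, row type `R`, and the `openTask` function here are placeholders for illustration, not existing Iceberg or Spark APIs.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Sketch: opens multiple scan tasks concurrently and buffers their rows into
 * a single bounded queue that a downstream consumer drains as one iterator.
 * T (task) and R (row) are placeholder types, not Iceberg APIs.
 */
public class ConcurrentTaskIterator<T, R> implements Iterator<R> {
  private static final Object DONE = new Object(); // sentinel marking end of stream

  private final BlockingQueue<Object> buffer;
  private final AtomicInteger remainingTasks;
  private Object nextElement;

  public ConcurrentTaskIterator(
      List<T> tasks, Function<T, Iterator<R>> openTask, int parallelism, int bufferSize) {
    this.buffer = new ArrayBlockingQueue<>(bufferSize);
    this.remainingTasks = new AtomicInteger(tasks.size());
    if (tasks.isEmpty()) {
      buffer.offer(DONE); // nothing to read; signal completion immediately
      return;
    }
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    for (T task : tasks) {
      pool.submit(() -> {
        try {
          Iterator<R> rows = openTask.apply(task); // open and fully consume one task
          while (rows.hasNext()) {
            buffer.put(rows.next()); // blocks when the buffer is full (backpressure)
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          if (remainingTasks.decrementAndGet() == 0) {
            try {
              buffer.put(DONE); // the last task to finish signals completion
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          }
        }
      });
    }
    pool.shutdown(); // no new submissions; workers drain the queued tasks
  }

  @Override
  public boolean hasNext() {
    if (nextElement == null) {
      try {
        nextElement = buffer.take(); // blocks until a row or the DONE sentinel arrives
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return nextElement != DONE;
  }

  @SuppressWarnings("unchecked")
  @Override
  public R next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    R row = (R) nextElement;
    nextElement = null;
    return row;
  }
}
```

Rows from different tasks would interleave under this scheme, which should be acceptable given that row order across scan tasks is not otherwise guaranteed. A real implementation would additionally need error propagation from workers, cancellation on early close, and proper closing of each task's underlying `CloseableIterator`.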
Query engine
None
Willingness to contribute
- I would be willing to contribute this improvement/feature with guidance from the Iceberg community