[format] ParquetInputStream doesn't support vectored reads #6657

## Description

`ParquetInputStream` does not override `readVectored()`, which it inherits through its parent class `DelegatingSeekableInputStream`. Calls therefore fall back to the default implementation, which throws `UnsupportedOperationException`.

## Problem

When `ParquetFileReader` attempts to use vectored reads for improved I/O performance, the operation fails because:

1. **ParquetFileReader** calls `readVectored()` on the underlying stream (line 667 in ParquetFileReader.java)
2. The stream type is `org.apache.parquet.io.SeekableInputStream`, whose default `readVectored()` implementation throws `UnsupportedOperationException`
3. **ParquetInputStream** (Paimon's wrapper) extends `DelegatingSeekableInputStream` but doesn't override `readVectored()`
4. This causes the exception to be thrown, preventing vectored reads from working
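The failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual Parquet or Paimon code: all three classes are hypothetical stand-ins that only mirror the inheritance chain described above, and the range type is simplified to `long[]`.

```java
import java.util.List;

// Hypothetical stand-in for the base stream class: vectored reads are
// opt-in, so the default implementation simply refuses.
abstract class SketchSeekableStream {
    void readVectored(List<long[]> ranges) {
        throw new UnsupportedOperationException("Vectored IO is not supported for " + this);
    }
}

// Hypothetical stand-in for the delegating parent class: it adds
// delegation for ordinary reads but does not override readVectored().
abstract class SketchDelegatingStream extends SketchSeekableStream {}

// Hypothetical stand-in for the wrapper: no override either, so any
// call to readVectored() reaches the throwing default.
final class SketchWrapperStream extends SketchDelegatingStream {}
```

Calling `new SketchWrapperStream().readVectored(...)` throws `UnsupportedOperationException`, which is the behavior the reader runs into in step 4.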

## Impact

- **Performance**: Cannot leverage Parquet's vectored read optimization for parallel I/O
- **Efficiency**: Falls back to sequential reads even when the underlying FileIO supports vectored reads
- **Cloud Storage**: Missing optimization opportunities for S3, OSS, and other cloud storage systems that benefit from batch reads

## Root Cause

The gap exists between two interface systems:

**Paimon's Interface**:
- `VectoredReadable` interface with `readVectored(List<FileRange>)` 
- `FileRange` uses `CompletableFuture<byte[]>` for async results

**Parquet's Interface**:
- `SeekableInputStream.readVectored(List<ParquetFileRange>, ByteBufferAllocator)`
- `ParquetFileRange` uses `CompletableFuture<ByteBuffer>` for async results

**ParquetInputStream** needs to bridge these two interfaces.
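The core of the bridge is the result-type conversion. As a rough sketch, a `CompletableFuture<byte[]>` can be adapted to a `CompletableFuture<ByteBuffer>` without blocking by chaining a wrap step. The class and method names here (`FutureBridge`, `toByteBuffer`) are hypothetical, and the sketch assumes that wrapping the array on the heap is acceptable; a real implementation may instead need to copy into a buffer obtained from the `ByteBufferAllocator`.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper, not part of Paimon or Parquet.
final class FutureBridge {
    private FutureBridge() {}

    // Returns a future that completes with a heap ByteBuffer view of the
    // bytes once the source future completes. No copy is made, and a
    // failure of the source future propagates to the returned future.
    static CompletableFuture<ByteBuffer> toByteBuffer(CompletableFuture<byte[]> bytes) {
        return bytes.thenApply(ByteBuffer::wrap);
    }
}
```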

## Proposed Solution

Implement `readVectored()` in `ParquetInputStream` to:

1. **Check capability**: Detect if underlying stream supports `VectoredReadable`
2. **Convert ranges**: Transform `ParquetFileRange` to `FileRange` 
3. **Delegate to Paimon**: Use Paimon's `VectoredReadable.readVectored()` 
4. **Transform data**: Convert `CompletableFuture<byte[]>` to `CompletableFuture<ByteBuffer>`
5. **Fallback**: Use serial reads when vectored reads are unavailable
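The five steps above could look roughly like the following. This is a self-contained sketch under stated assumptions, not the proposed patch: every type prefixed with `Sketch`, plus `VectoredBridge`, `ArrayStream`, and `VectoredArrayStream`, is a hypothetical stand-in for the real Paimon and Parquet classes, and the `ByteBufferAllocator` parameter is left out for brevity.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for Paimon's FileRange (byte[] result).
final class SketchFileRange {
    final long offset;
    final int length;
    final CompletableFuture<byte[]> data = new CompletableFuture<>();
    SketchFileRange(long offset, int length) { this.offset = offset; this.length = length; }
}

// Hypothetical stand-in for ParquetFileRange (ByteBuffer result).
final class SketchParquetRange {
    final long offset;
    final int length;
    CompletableFuture<ByteBuffer> data;
    SketchParquetRange(long offset, int length) { this.offset = offset; this.length = length; }
}

// Hypothetical stand-ins for the underlying stream and VectoredReadable.
interface SketchStream {
    void seek(long pos) throws IOException;
    void readFully(byte[] dst) throws IOException;
}

interface SketchVectoredReadable {
    void readVectored(List<SketchFileRange> ranges) throws IOException;
}

final class VectoredBridge {
    private VectoredBridge() {}

    static void readVectored(SketchStream in, List<SketchParquetRange> ranges) throws IOException {
        if (ranges.isEmpty()) {
            return; // nothing to read
        }
        if (in instanceof SketchVectoredReadable) {           // step 1: check capability
            List<SketchFileRange> converted = new ArrayList<>(ranges.size());
            for (SketchParquetRange r : ranges) {
                SketchFileRange fr = new SketchFileRange(r.offset, r.length); // step 2
                r.data = fr.data.thenApply(ByteBuffer::wrap);                 // step 4
                converted.add(fr);
            }
            ((SketchVectoredReadable) in).readVectored(converted);            // step 3
        } else {                                               // step 5: serial fallback
            for (SketchParquetRange r : ranges) {
                byte[] buf = new byte[r.length];
                in.seek(r.offset);
                in.readFully(buf);
                r.data = CompletableFuture.completedFuture(ByteBuffer.wrap(buf));
            }
        }
    }
}

// In-memory streams used only to exercise the sketch.
class ArrayStream implements SketchStream {
    protected final byte[] bytes;
    private int pos;
    ArrayStream(byte[] bytes) { this.bytes = bytes; }
    @Override public void seek(long newPos) { pos = (int) newPos; }
    @Override public void readFully(byte[] dst) throws IOException {
        if (pos + dst.length > bytes.length) throw new EOFException("range exceeds stream length");
        System.arraycopy(bytes, pos, dst, 0, dst.length);
        pos += dst.length;
    }
}

final class VectoredArrayStream extends ArrayStream implements SketchVectoredReadable {
    VectoredArrayStream(byte[] bytes) { super(bytes); }
    @Override public void readVectored(List<SketchFileRange> ranges) {
        for (SketchFileRange r : ranges) {
            int from = (int) r.offset;
            r.data.complete(Arrays.copyOfRange(bytes, from, from + r.length));
        }
    }
}
```

Passing an `ArrayStream` exercises the serial fallback, while a `VectoredArrayStream` takes the vectored path; in both cases each range's `data` future ends up holding the requested bytes. The result futures are attached before delegating so that callers always see a non-null future, even if the underlying read completes later.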

## Benefits

- ✅ **Performance**: Enable vectored reads for Parquet files in Paimon
- ✅ **Compatibility**: Work with both vectored and non-vectored FileIO implementations
- ✅ **Cloud Optimization**: Better I/O performance on S3, OSS, Azure, GCS
- ✅ **Backward Compatible**: Graceful fallback for older FileIO implementations

## Testing

Tests will cover:
- Vectored reads with `VectoredReadable` support
- Fallback to serial reads without vectored support
- Empty ranges handling
- End-to-end testing with real Parquet files

## Related Files

- `paimon-format/src/main/java/org/apache/paimon/format/parquet/ParquetInputStream.java`
- `paimon-format/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java`
- `paimon-common/src/main/java/org/apache/paimon/fs/VectoredReadable.java`
- `paimon-common/src/main/java/org/apache/paimon/fs/FileRange.java`
