[VL] Support GPU shuffle reader

### Description

1. Move the lock from the `WholeStageResultIterator` constructor to the shuffle reader, so that threads can prepare the first batch in advance. The batch is currently small; for GPU, the reader will produce a large batch of 1 GB.
2. Implement decompression, conversion of the buffer to a cudf table, and batch resizing on the GPU.
3. The threads can then prepare the read buffer from the file.
4. The other thread, the one waiting to be notified, needs to fetch more data than the batch size and prepare batches for the GPU to process. It reserves the bytes from the pool; if the reservation succeeds, CPU threads fetch data in the background and save the batches in a vector.
5. The waiting CPU threads may be able to do the decompression and batch resizing themselves, then wait for the GPU to fetch the result.
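
The locking and reservation scheme in steps 1 and 4 can be sketched roughly as follows. This is a minimal illustration in plain C++, not Gluten or Velox code: `BytePool`, `HostBatch`, `prefetch`, and `runReaders` are hypothetical names, the GPU step is simulated by summing byte counts, and a `std::vector` is assumed as the staging container. Each reader stages batches on the CPU while the pool reservation succeeds, and takes the GPU lock only for the GPU step.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical byte budget: a reservation fails instead of blocking when full.
class BytePool {
 public:
  explicit BytePool(std::size_t capacity) : capacity_(capacity) {}

  bool tryReserve(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mu_);
    if (bytes > capacity_ - used_) return false;
    used_ += bytes;
    return true;
  }

  void release(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mu_);
    assert(bytes <= used_);
    used_ -= bytes;
  }

 private:
  std::mutex mu_;
  std::size_t capacity_;
  std::size_t used_ = 0;
};

// Stand-in for a batch read from a shuffle file (only its size is modeled).
struct HostBatch {
  std::size_t bytes;
};

// CPU side: stage batches, starting at `next`, until a reservation fails.
std::vector<HostBatch> prefetch(BytePool& pool,
                                const std::vector<std::size_t>& sizes,
                                std::size_t& next) {
  std::vector<HostBatch> staged;
  while (next < sizes.size() && pool.tryReserve(sizes[next])) {
    staged.push_back(HostBatch{sizes[next]});
    ++next;
  }
  return staged;
}

// Every reader consumes the same list of batch sizes. Staging happens without
// the GPU lock; the lock is held only for the simulated GPU step.
std::size_t runReaders(std::size_t numReaders, std::size_t poolCapacity,
                       const std::vector<std::size_t>& sizes) {
  for (std::size_t s : sizes) {
    // A batch larger than the whole pool could never be reserved.
    if (s > poolCapacity) throw std::invalid_argument("batch larger than pool");
  }
  BytePool pool(poolCapacity);
  std::mutex gpuLock;
  std::size_t processed = 0;  // guarded by gpuLock
  std::vector<std::thread> readers;
  for (std::size_t i = 0; i < numReaders; ++i) {
    readers.emplace_back([&pool, &gpuLock, &processed, &sizes] {
      std::size_t next = 0;
      while (next < sizes.size()) {
        std::vector<HostBatch> staged = prefetch(pool, sizes, next);
        if (staged.empty()) {  // pool is full: let other readers drain it
          std::this_thread::yield();
          continue;
        }
        std::lock_guard<std::mutex> gpu(gpuLock);
        for (const HostBatch& b : staged) {
          processed += b.bytes;   // simulated GPU work
          pool.release(b.bytes);  // return the budget once consumed
        }
      }
    });
  }
  for (std::thread& t : readers) t.join();
  return processed;
}
```

Calling `runReaders(4, 1024, {100, 200, 300})` runs four readers against a 1024-byte budget. Because the GPU lock is not taken until a reader already has staged data, a reader that is waiting for the GPU has done its file reads first, which is the effect step 1 is after.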

### Gluten version

None

[VL] Support GPU shuffle reader #10933
