Add runtime module to enable concurrent load of manifest files. #124

Closed
Tracked by #123
liurenjie1024 opened this issue Dec 19, 2023 · 19 comments

Comments

@liurenjie1024
Collaborator

Currently we implement manifest loading in a sequential approach, i.e. we load the files one by one. We should load them concurrently. This requires submitting tasks to a Rust async runtime, and we should be careful to stay runtime agnostic.

@odysa
Contributor

odysa commented Feb 2, 2024

Hi, is this what you refer to? Can you please explain more about "careful to stay runtime agnostic"? Is there anything we need to be careful about when implementing concurrent scanning?

let mut file_scan_tasks = Vec::with_capacity(manifest_list.entries().len());
for manifest_list_entry in manifest_list.entries().iter() {
    // Load each manifest sequentially: this is the bottleneck in question.
    let manifest = manifest_list_entry.load_manifest(&self.file_io).await?;
    for manifest_entry in manifest.entries().iter().filter(|e| e.is_alive()) {
        // ...
    }
}

@liurenjie1024
Collaborator Author

Hi, is this what you refer to?

Yes, exactly.

Can you please explain more about "careful to stay runtime agnostic"? Is there anything we need to be careful about when implementing concurrent scanning?

I mean we may need an extra layer for task scheduling, so that we can adapt to any async runtime such as tokio or async-std.

@odysa
Contributor

odysa commented Feb 2, 2024

I mean we may need an extra layer for task scheduling, so that we can adapt to any async runtime such as tokio or async-std.

Do you want users to choose their own runtime, like sqlx does?
They built an abstraction layer (Runtime) so sqlx can run on many blocking/non-blocking runtimes.

# tokio (no TLS)
sqlx = { version = "0.7", features = [ "runtime-tokio" ] }
# async-std (no TLS)
sqlx = { version = "0.7", features = [ "runtime-async-std" ] }

I am interested in this feature, but it will take some time for me to draft a design.

@odysa
Contributor

odysa commented Feb 2, 2024

Do you want users to choose their own runtime, like sqlx does? They built an abstraction layer (Runtime) so sqlx can run on many blocking/non-blocking runtimes.

Follow-up on this: sqlx uses a relatively simple solution, for example the spawn function.

https://github.com/launchbadge/sqlx/blob/84d576004c93a32133688426eacb50434bb5c5f0/sqlx-core/src/rt/mod.rs#L66-L74

@liurenjie1024
Collaborator Author

Do you want users to choose their own runtime, like sqlx does?

Yes, exactly. I don't think we should bind to some specific runtime.

Follow-up on this: sqlx uses a relatively simple solution.

I agree that we may need to think about this carefully; sqlx's solution can only use the current runtime.

I am interested in this feature, but it will take some time for me to draft a design.

Yeah, welcome to contribute; we can work together on this.

@marvinlanhenke
Contributor

@odysa
Just to follow up on this: any progress regarding some design ideas?

@liurenjie1024
Do we have any reference implementation we can get "inspired" by on how things could be done, or do you have a particular design idea in mind already?

Is this something we should perhaps track in #348 for the v0.4.0 release?

@liurenjie1024
Collaborator Author

It's already tracked here: #123

@marvinlanhenke
Contributor

In order to verify my understanding and possibly kick off a design discussion, we could follow the approach of sqlx:

  • have a runtime.rs
    • to define a Runtime trait
    • based on a feature-flag import / re-export the implementors of that trait
  • have a /runtime module with specific runtime implementations
    • that implement the Runtime trait

For our particular use case (loading manifests or DataFiles on multiple threads) we could e.g. wrap tokio::spawn with our runtime and we're good to go? The Runtime trait would then evolve to support more "use cases"?
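To make the shape concrete, here is a minimal sketch of what such a Runtime trait could look like. All names here are hypothetical, not the actual iceberg-rust API, and the thread-backed implementation exists only to illustrate the abstraction with no dependencies; the real implementors would delegate to tokio, async-std, etc. behind feature flags.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical trait capturing the two operations discussed below.
trait Runtime {
    fn block_on<F: Future>(&self, fut: F) -> F::Output;

    fn spawn<F>(&self, fut: F)
    where
        F: Future<Output = ()> + Send + 'static;
}

// A waker that unparks the thread driving `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Toy implementor backed by OS threads, just to show the abstraction works.
struct ThreadRuntime;

impl Runtime for ThreadRuntime {
    fn block_on<F: Future>(&self, fut: F) -> F::Output {
        let mut fut = Box::pin(fut);
        let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
        let mut cx = Context::from_waker(&waker);
        // Poll until ready, parking the thread between polls.
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(out) => return out,
                Poll::Pending => thread::park(),
            }
        }
    }

    fn spawn<F>(&self, fut: F)
    where
        F: Future<Output = ()> + Send + 'static,
    {
        // One OS thread per task; a real runtime would schedule onto a pool.
        thread::spawn(move || ThreadRuntime.block_on(fut));
    }
}
```

A tokio-backed implementor would simply forward `spawn` to `tokio::spawn` and `block_on` to a runtime handle, selected via a feature flag as in the sqlx example above.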

@liurenjie1024
Collaborator Author

liurenjie1024 commented May 2, 2024

Maybe we don't need a Runtime trait for now? From what we have learned, we currently need two methods:

  1. spawn
  2. block_on

I think the method here already provides a good example of what we need. Allowing users to specify a Runtime for task scheduling could be treated as an advanced feature.

@marvinlanhenke
Contributor

marvinlanhenke commented May 3, 2024

... so as a first step, simply wrap tokio::spawn (for example) like here, and not even use a feature flag for now; just the simplest possible layer of abstraction?

@liurenjie1024
Have you already made up your mind about how we want to read & filter the manifest files concurrently? An idea would be to:

  • async load ManifestList
  • for each entry spawn a new task
    • apply ManifestEvaluator
    • async load Manifest
    • for each ManifestEntry / DataFile spawn a new task
      • apply ExpressionEvaluator
      • apply InclusiveMetricsEvaluator
      • if the data file has not been pruned yet, send a FileScanTask as a result via channel back to the main stream
  • yield each FileScanTask from the stream

I think this way we can achieve maximum parallelism, throughput and performance;
however, the tradeoffs are the added complexity and a spike in resource consumption, due to spawning multiple tasks and loading files into memory concurrently.

Here is a toy-example to illustrate the idea for further discussion:

use std::time::Duration;

use async_stream::try_stream;
use futures::stream::BoxStream;
use futures::StreamExt;
use tokio::sync::mpsc;
use tokio::time;

// Stand-ins for the real types; `Result` is the crate-wide alias.
struct DataFile {}
struct FileScanTask;

async fn process_data_file(sender: mpsc::Sender<FileScanTask>) {
    // apply the evaluators, then report the surviving data file
    let _ = sender.send(FileScanTask).await;
}

async fn create_stream() -> Result<BoxStream<'static, Result<FileScanTask>>> {
    let (tx, mut rx) = mpsc::channel::<FileScanTask>(32);

    // manifest list with entries
    let manifests = Vec::from_iter(0..12);

    for entry in manifests {
        let sender = tx.clone();

        // for each entry spawn a new task
        tokio::spawn(async move {
            // apply `ManifestEvaluator`
            // if not pruned: load manifest
            println!("loading manifest {}", entry);
            time::sleep(Duration::from_millis(1000)).await;

            let data_files: Vec<_> = (0..48).map(|_| DataFile {}).collect();
            // for each DataFile spawn a new task
            for _ in data_files {
                let sender = sender.clone();
                tokio::spawn(async move {
                    // apply ExpressionEvaluator
                    // apply InclusiveMetricsEvaluator
                    process_data_file(sender).await;
                });
            }
        });
    }
    drop(tx);

    let stream = try_stream! {
        while let Some(file_scan_task) = rx.recv().await {
            yield file_scan_task;
        }
    };

    Ok(stream.boxed())
}

@liurenjie1024
Collaborator Author

Hi @marvinlanhenke, after #233 is merged, we will have a basic runtime framework.

Have you already made up your mind?

Not yet.

I think your solution generally LGTM. Creating one task for each manifest entry looks like too much to me, though: even if spawning a task is lightweight in Rust, it still consumes some memory. How do you feel about starting with one task per manifest file?

@Fokko
Contributor

Fokko commented May 6, 2024

With Iceberg, manifests are written to a target size (8 megabytes by default). Each manifest is bound to the same schema and partition, so you can reuse the evaluators here. I would not go overboard with the parallelism: just create one task per manifest, and do not spawn a task per manifest entry.

@marvinlanhenke
Contributor

How do you feel about starting with one task per manifest file

you mean:

  • spawn a new task for each manifest, load the manifest (entry.load_manifest(...).await?)
  • and handle DataFiles synchronously on the same task

so if we have a manifest_list with e.g. 5 entries and 1 is pruned (by the ManifestEvaluator), we'd effectively spawn 4 tasks to load the manifests and handle all their data files; is this correct?

@Fokko
Contributor

Fokko commented May 6, 2024

so if we have a manifest_list with e.g. 5 entries and 1 is pruned (by the ManifestEvaluator), we'd effectively spawn 4 tasks to load the manifests and handle all their data files; is this correct?

That is correct 👍 I think there might be some confusion around the naming. In the spec we have the Manifest List, which contains Manifests. Within a Manifest there are manifest entries that each point to one DataFile.

@liurenjie1024
Collaborator Author

How do you feel about starting with one task per manifest file

you mean:

  • spawn a new task for each manifest, load the manifest (entry.load_manifest(...).await?)
  • and handle DataFiles synchronously on the same task

so if we have a manifest_list with e.g. 5 entries and 1 is pruned (by the ManifestEvaluator), we'd effectively spawn 4 tasks to load the manifests and handle all their data files; is this correct?

Yeah, that's exactly what I mean.
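The revised plan (one spawned task per surviving manifest, with data files handled synchronously on that task) might look like the following sketch. It deliberately uses std threads and channels in place of tokio tasks to stay dependency-free, and names like Manifest, FileScanTask, and plan_files are illustrative stand-ins for the real types.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for the real manifest types.
struct Manifest {
    entries: Vec<u32>,
}

struct FileScanTask {
    data_file: u32,
}

fn plan_files(manifests: Vec<Manifest>) -> Vec<FileScanTask> {
    let (tx, rx) = mpsc::channel();

    for manifest in manifests {
        // One task per (unpruned) manifest; tokio::spawn in the real code.
        let tx = tx.clone();
        thread::spawn(move || {
            // Real code would apply ManifestEvaluator, then load_manifest().
            for entry in manifest.entries {
                // Each data file is handled synchronously on this same task:
                // apply ExpressionEvaluator / InclusiveMetricsEvaluator here,
                // and only forward the files that survive pruning.
                let _ = tx.send(FileScanTask { data_file: entry });
            }
        });
    }
    drop(tx); // close the channel once all spawned tasks finish

    rx.into_iter().collect()
}
```

This keeps the task count proportional to the number of manifests rather than the number of data files, matching the suggestion above.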

@sdd
Contributor

sdd commented May 13, 2024

Using try_for_each_concurrent here, rather than just spawning in a for loop, will allow us to tune the concurrency, since it accepts a max-concurrent-tasks argument. I'd advocate a data-driven approach to determine a sensible default value, alongside the ability to override the default with optional config: probably a with_max_concurrency() method on the builder and some guidance in the docs.

@liurenjie1024
Collaborator Author

try_for_each_concurrent

Do you mean this method? It looks OK to me.

@liurenjie1024
Collaborator Author

Closed by #233

@liurenjie1024
Collaborator Author

Closed by #373
