
Conversation

@XiaoHongbo-Hope (Contributor)

Purpose

Linked issue: close #xxx

Tests

API and Format

Documentation

@XiaoHongbo-Hope XiaoHongbo-Hope changed the title [python] Add multi-threaded prefetch for PyTorch streaming read [python] Add multi-threaded prefetch for pytorch streaming read Jan 28, 2026
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review January 28, 2026 14:22
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft January 28, 2026 15:22
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review January 31, 2026 07:26
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft January 31, 2026 08:21
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review January 31, 2026 08:54
@JingsongLi (Contributor) left a comment


Maybe async is enough? Multiple threads may consume too much memory. And has there been any performance test to measure the improvement?

@XiaoHongbo-Hope (Contributor, Author) commented Jan 31, 2026

> Maybe async is enough? Multiple threads may consume too much memory. And has there been any performance test to measure the improvement?

The current read is synchronous. And yes, we ran a performance test of the data loader: a single thread reaches about 200 MB/s, versus 260~270 MB/s with 16 workers (processes) and 10 prefetch threads.
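For context, here is a minimal sketch of the kind of prefetch pattern being discussed, not the code in this PR: worker threads pull read tasks and push batches into a bounded queue, so the buffer size, rather than the thread count, caps peak memory. All names here (`prefetch`, `tasks`, `num_prefetch_threads`, `max_buffered`) are hypothetical.

```python
import queue
import threading

_SENTINEL = object()

def prefetch(tasks, num_prefetch_threads=10, max_buffered=32):
    """Run zero-arg read tasks on a thread pool and yield their results.

    `tasks` is an iterable of callables, each reading and returning one
    batch. The bounded queue caps how many batches can sit in memory.
    """
    buffer = queue.Queue(maxsize=max_buffered)
    lock = threading.Lock()
    task_iter = iter(tasks)

    def worker():
        while True:
            with lock:  # hand out one task at a time
                task = next(task_iter, _SENTINEL)
            if task is _SENTINEL:
                break
            buffer.put(task())  # blocks while the buffer is full
        buffer.put(_SENTINEL)  # signal that this worker is done

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_prefetch_threads)]
    for t in threads:
        t.start()

    done = 0
    while done < num_prefetch_threads:
        batch = buffer.get()
        if batch is _SENTINEL:
            done += 1
        else:
            yield batch
```

With at most `max_buffered` batches in flight, memory stays bounded even with many prefetch threads, which is one common way to address the memory concern raised above.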

@XiaoHongbo-Hope (Contributor, Author)

> Maybe async is enough? Multiple threads may consume too much memory. And has there been any performance test to measure the improvement?

The multi-threaded prefetch idea in this PR comes from the configuration of the OSS connector for PyTorch.

@JingsongLi (Contributor)

>> Maybe async is enough? Multiple threads may consume too much memory. And has there been any performance test to measure the improvement?
>
> The multi-threaded prefetch idea in this PR comes from the configuration of the OSS connector for PyTorch.

Can you share the code link?

@XiaoHongbo-Hope (Contributor, Author) commented Feb 2, 2026

>>> Maybe async is enough? Multiple threads may consume too much memory. And has there been any performance test to measure the improvement?
>>
>> The multi-threaded prefetch idea in this PR comes from the configuration of the OSS connector for PyTorch.
>
> Can you share the code link?

It seems the native code is not open source. The Python code is at https://github.com/aliyun/oss-connector-for-ai-ml (doc: https://github.com/aliyun/oss-connector-for-ai-ml/blob/a9b536d174163f0cd6db8e83261fcffc628e5f8c/docs/torchconnector/configuration.md?plain=1#L94), but the Python code does nothing itself; the logic lives on the native side.
