add threading functionality to dataflux#19

Merged
jdnurme merged 4 commits into main from download-threading on Mar 12, 2024


@jdnurme jdnurme commented Mar 12, 2024

Parallel processes are not functional with daemonic workers, so threading must be used to integrate download parallelization with real ML workloads. When running on an already-distributed system (e.g. with Ray or DLIO managing parallelization), the daemons executing each portion of the training do not allow further sub-processes to be spun up.
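
For context, Python's `multiprocessing` refuses to start a child process from a daemonic process (`Process.start()` fails with "daemonic processes are not allowed to have children"), whereas threads carry no such restriction. Below is a minimal sketch of the threaded approach, not the code in this PR: `fake_download` and `download_threaded` are hypothetical names used only for illustration, and the fetch is a stand-in for a real object download.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(object_name):
    # Hypothetical stand-in for a real object fetch (assumption, not dataflux code).
    return f"contents of {object_name}".encode()


def download_threaded(object_names, fetch=fake_download, max_workers=4):
    """Fetch objects concurrently with threads; results keep the input order.

    Threads can be started from inside a daemonic worker process, unlike
    multiprocessing.Process, which raises when a daemonic process tries to
    start a child.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(fetch, object_names))
```

Because downloads are I/O-bound, threads can still overlap network waits despite the GIL, which is why this pattern is a plausible substitute for a process pool here.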

It's a good idea to open an issue first for discussion.

  • Tests pass
  • Appropriate changes to documentation are included in the PR

@jdnurme jdnurme requested review from MattIrv and bernardhan33 March 12, 2024 21:16
@jdnurme jdnurme self-assigned this Mar 12, 2024

@bernardhan33 bernardhan33 left a comment

"Parallel processes are not functional with daemonic workers" --> Could you provide more context there? What are "daemonic workers" and what happens there?

@bernardhan33 bernardhan33 self-requested a review March 12, 2024 21:43
@jdnurme jdnurme merged commit c680914 into main Mar 12, 2024
@jdnurme jdnurme deleted the download-threading branch April 5, 2024 17:12