Specify which platforms can run parallel ETL flows #55
This PR closes #46 and #47 by documenting which platforms can execute parallel ETL flows implemented using pygrametl. Currently, pygrametl supports executing parallel ETL flows using CPython on platforms that start new processes using `fork`, and using Jython. Thus, executing a parallel ETL flow natively on Microsoft Windows using CPython is not supported. On macOS, CPython must be configured to use `fork` with `multiprocessing.set_start_method('fork')`, because since Python 3.8 CPython defaults to `spawn` on macOS due to the issues with macOS's `fork` implementation documented in CPython Issue 77906 (thanks to @mFeigeInvia).

An attempt to support `spawn` was made; however, it became clear that this would require major changes to pygrametl. This is primarily due to limitations of `pickle` and the additional requirements imposed when using `spawn` or `forkserver` compared to `fork`. As CPython generally performs worse than Jython when executing parallel ETL flows, @chrthomsen, @fromm1990, and I agreed to prioritize other improvements to pygrametl.
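For illustration, here is a minimal sketch of how a CPython program might select `fork` before running a parallel flow, and why `fork` sidesteps some `pickle` limitations. It uses only the standard `multiprocessing` module, not pygrametl's own API. The helper names `use_fork_start_method` and `run_in_child` are hypothetical. The lambda target is just one example of something `pickle` cannot serialize and is not necessarily the specific limitation encountered in pygrametl.

```python
import multiprocessing
import sys


def use_fork_start_method() -> str:
    """Hypothetical helper: make multiprocessing create children with fork.

    Raises RuntimeError where fork does not exist (e.g. native Windows),
    i.e. where parallel ETL flows are not supported with CPython.
    """
    if "fork" not in multiprocessing.get_all_start_methods():
        raise RuntimeError(
            f"fork is unavailable on {sys.platform}; parallel ETL flows "
            "are not supported here with CPython"
        )
    # macOS defaults to spawn, so override it; force=True replaces a start
    # method that may already have been set elsewhere in the program.
    if multiprocessing.get_start_method(allow_none=True) != "fork":
        multiprocessing.set_start_method("fork", force=True)
    return multiprocessing.get_start_method()


def run_in_child(value: int) -> int:
    """Hypothetical helper: double value in a child process.

    The target is a lambda, which pickle cannot serialize. With fork the
    child inherits the parent's memory, so the target is never pickled.
    With spawn or forkserver the Process object is pickled, and start()
    would fail instead.
    """
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=lambda: queue.put(value * 2))
    worker.start()
    result = queue.get()  # read before join() to avoid blocking on the queue
    worker.join()
    return result


if __name__ == "__main__":
    print(use_fork_start_method())  # prints fork where fork is available
    print(run_in_child(21))  # prints 42
```

Calling the helper once at program start, before any worker processes are created, mirrors the `multiprocessing.set_start_method('fork')` configuration described above for macOS.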