when max_threads=1, do not create a thread pool #16
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
@glevava
As you know from our discussions on Slack, I found that during checksum calculation in
esgprep drs
, there were some threads deadlock problems. I understand that you are implementing use ofhashlib
for the checksums in place of runningsha256sum
viaos.Popen
, and hopefully this will be thread-safe. However, I have not succeeded in setting up a reliable test environment to be sure that it will solve all the problems, and I do not want some unpleasant surprise during CMIP6 operations.I tried to investigate solving this by using processes in place of threads (that is, using
multiprocessing
in place ofmultiprocessing.dummy
), but this would require the input to be able to be pickled, which seems very difficult to achieve, given the complexity of theProcessingContext
object.So now I propose a simple "safety valve", just in case of any problems. Simply, if
--max-threads
is set to 1, then it will not create any pool, but it will instead use the simpleitertools.imap
for the processing, so then we can have good confidence that there will be no problems with interactions with the threading library (it will still be imported, but not used). It will be reassuring to have this option available, even if in the end we do not need it. In fact if it arises during operations that we need to set--max-threads=1
, then we can easily compensate for the slower run time by processing more datasets simultaneously in the calling script.I have tested
drs
andmapfile
functions to check that the result with--max-threads=1
is still the same as with more threads, and it seems to be (apart from possibly the ordering of the commands in drs or the order of the lines in the mapfiles).