-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multithreading implementation as a replacement of multiprocessing #131
Labels
enhancement
something nice to have but it is not neither critical nor urgent
request for comments
a draft idea that needs more brainstorming, comments are welcome!
Milestone
Comments
eldipa
added
enhancement
something nice to have but it is not neither critical nor urgent
request for comments
a draft idea that needs more brainstorming, comments are welcome!
labels
Nov 25, 2020
Profile results of 4 test scenarios: using multithreading / multiprocessing with/without writing to stdout. In all the scenarios we used 2 jobs and the default
|
eldipa
added a commit
that referenced
this issue
Mar 10, 2021
eldipa
added a commit
that referenced
this issue
Mar 10, 2021
Multiple threads can log at the same time. By default Python's logging system uses a lock to protect the shared resources for each handler. But this may generate a lot of contention on the lock if there are a lot of messages to be handled. More over the handling of the message writes it to disk/console which it is slow. Instead a QueueHandler is used to receive the messages and pass them to the QueueListener using a thread-safe queue. The listener sends the messages to the original handler (XStreamHandler) in background without blocking the rest of the threads.
eldipa
added a commit
that referenced
this issue
Mar 10, 2021
Originally `byexample` used a global stack of loggers to support contextual logging. This works fine when we work with subprocesses because each one will have its own stack. But with threads, all of them will see the same shared stack. This RC will confuse the logging system: while one thread enter in `'byexample.exec'` context it may end logging as `'byexample.init'` because other thread entered in that context. Using a thread-local logger stack fixes the RC. Now the `init_thread_specific_log_system` function needs to be called at the begin of each thread to initialize it.
eldipa
added a commit
that referenced
this issue
Mar 11, 2021
We found some small bugs that were only triggered in super-verbose level. To have a quick regression test we run the lib tests with that verbose level and see if something fails.
eldipa
added a commit
that referenced
this issue
Mar 11, 2021
This code was legacy from the old multiprocess impl. Now that we use threads the PID makes no sense.
eldipa
added a commit
that referenced
this issue
Mar 11, 2021
eldipa
added a commit
that referenced
this issue
Mar 11, 2021
The use of a queue made the log system to emit the messages out of order with respect the ones emitted by the concerns by themselves (like `Progress` printing stuff). So the queue impl was removed. Instead, to make the log system thread safe, `XStreamHandler` uses a lock to protect only the `StreamHandler` (stderr) access only. It relays on concerns' thread-safety for the rest. For that to make sense, `XStreamHandler` now uses a concerns set per thread: on init of each thread, the thread must configure its own thread-specific system with its own concerns set. This "thread-specific concerns set" concept is not new. Each thread (worker) has its own copy of the harvester, executor and concerns. This commits only ensures that that thread-specific concerns sets is available to the `XStreamHandler`.
eldipa
added a commit
that referenced
this issue
Mar 11, 2021
The use of a queue made the log system to emit the messages out of order with respect the ones emitted by the concerns by themselves (like `Progress` printing stuff). So the queue impl was removed. Instead, to make the log system thread safe, `XStreamHandler` uses a lock to protect only the `StreamHandler` (stderr) access only. It relays on concerns' thread-safety for the rest. For that to make sense, `XStreamHandler` now uses a concerns set per thread: on init of each thread, the thread must configure its own thread-specific system with its own concerns set. This "thread-specific concerns set" concept is not new. Each thread (worker) has its own copy of the harvester, executor and concerns. This commits only ensures that that thread-specific concerns sets is available to the `XStreamHandler`.
This was referenced Mar 13, 2021
eldipa
added a commit
that referenced
this issue
Mar 24, 2021
eldipa
added a commit
that referenced
this issue
Mar 24, 2021
Multiple threads can log at the same time. By default Python's logging system uses a lock to protect the shared resources for each handler. But this may generate a lot of contention on the lock if there are a lot of messages to be handled. More over the handling of the message writes it to disk/console which it is slow. Instead a QueueHandler is used to receive the messages and pass them to the QueueListener using a thread-safe queue. The listener sends the messages to the original handler (XStreamHandler) in background without blocking the rest of the threads.
eldipa
added a commit
that referenced
this issue
Mar 24, 2021
Originally `byexample` used a global stack of loggers to support contextual logging. This works fine when we work with subprocesses because each one will have its own stack. But with threads, all of them will see the same shared stack. This RC will confuse the logging system: while one thread enter in `'byexample.exec'` context it may end logging as `'byexample.init'` because other thread entered in that context. Using a thread-local logger stack fixes the RC. Now the `init_thread_specific_log_system` function needs to be called at the begin of each thread to initialize it.
eldipa
added a commit
that referenced
this issue
Mar 24, 2021
We found some small bugs that were only triggered in super-verbose level. To have a quick regression test we run the lib tests with that verbose level and see if something fails.
eldipa
added a commit
that referenced
this issue
Mar 24, 2021
This code was legacy from the old multiprocess impl. Now that we use threads the PID makes no sense.
eldipa
added a commit
that referenced
this issue
Mar 24, 2021
eldipa
added a commit
that referenced
this issue
Mar 24, 2021
The use of a queue made the log system to emit the messages out of order with respect the ones emitted by the concerns by themselves (like `Progress` printing stuff). So the queue impl was removed. Instead, to make the log system thread safe, `XStreamHandler` uses a lock to protect only the `StreamHandler` (stderr) access only. It relays on concerns' thread-safety for the rest. For that to make sense, `XStreamHandler` now uses a concerns set per thread: on init of each thread, the thread must configure its own thread-specific system with its own concerns set. This "thread-specific concerns set" concept is not new. Each thread (worker) has its own copy of the harvester, executor and concerns. This commits only ensures that that thread-specific concerns sets is available to the `XStreamHandler`.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
enhancement
something nice to have but it is not neither critical nor urgent
request for comments
a draft idea that needs more brainstorming, comments are welcome!
byexample 9.x
uses multiprocessing to have concurrency using thefork
model.This model have a very low barrier in terms of implementation: the whole process is forked and the objects are automatically shared and made independent (no RC).
But
fork
is not well supported in OSX (#116 ).Multithreading has a better support but it requires a very precise separation of the objects to avoid a race condition. And that's only the begin:
regex
package which implementation releases the GIL (Use theregex
package in replacement of the built-inre
module #128 )pexpect
when an interpreter is spawned gets into conflict with a thread in a similar way than in Python 3.8+: NameError: name 'cache' is not defined #116 (RC) or because the threads break the optimization of copy-on-write that makes the forks really fast.See #118 (comment)
The text was updated successfully, but these errors were encountered: