Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multithreading implementation as a replacement of multiprocessing #131

Closed
eldipa opened this issue Nov 25, 2020 · 1 comment
Closed

Multithreading implementation as a replacement of multiprocessing #131

eldipa opened this issue Nov 25, 2020 · 1 comment
Labels
enhancement something nice to have but it is not neither critical nor urgent request for comments a draft idea that needs more brainstorming, comments are welcome!
Milestone

Comments

@eldipa
Copy link
Collaborator

eldipa commented Nov 25, 2020

byexample 9.x uses multiprocessing to have concurrency using the fork model.

This model have a very low barrier in terms of implementation: the whole process is forked and the objects are automatically shared and made independent (no RC).

But fork is not well supported in OSX (#116 ).

Multithreading has a better support but it requires a very precise separation of the objects to avoid a race condition. And that's only the begin:

See #118 (comment)

@eldipa eldipa added enhancement something nice to have but it is not neither critical nor urgent request for comments a draft idea that needs more brainstorming, comments are welcome! labels Nov 25, 2020
@eldipa
Copy link
Collaborator Author

eldipa commented Nov 25, 2020

Profile results of 4 test scenarios: using multithreading / multiprocessing with/without writing to stdout. In all the scenarios we used 2 jobs and the default re regex engine (that does not release the GIL)

threads - 2 jobs

real    0m5.130s
real    0m5.874s
real    0m5.257s
real    0m5.693s


threads - 2 jobs - quiet

real    0m4.770s
real    0m4.556s
real    0m4.649s
real    0m4.530s


process - 2 jobs

real    0m4.916s
real    0m4.871s
real    0m5.221s
real    0m5.202s


process - 2 jobs - quiet

real    0m4.023s
real    0m4.234s
real    0m4.619s
real    0m4.447s

@eldipa eldipa added this to the 10.0.0 milestone Mar 6, 2021
eldipa added a commit that referenced this issue Mar 10, 2021
Multiple threads can log at the same time. By default Python's logging
system uses a lock to protect the shared resources for each handler.

But this may generate a lot of contention on the lock if there are a lot
of messages to be handled. More over the handling of the message writes
it to disk/console which it is slow.

Instead a QueueHandler is used to receive the messages and pass them to
the QueueListener using a thread-safe queue.

The listener sends the messages to the original handler (XStreamHandler)
in background without blocking the rest of the threads.
eldipa added a commit that referenced this issue Mar 10, 2021
Originally `byexample` used a global stack of loggers to support
contextual logging.

This works fine when we work with subprocesses because each one will
have its own stack. But with threads, all of them will see the same
shared stack.

This RC will confuse the logging system: while one thread enter in
`'byexample.exec'` context it may end logging as `'byexample.init'` because
other thread entered in that context.

Using a thread-local logger stack fixes the RC. Now the
`init_thread_specific_log_system` function needs to be called at the begin
of each thread to initialize it.
eldipa added a commit that referenced this issue Mar 11, 2021
We found some small bugs that were only triggered in super-verbose
level. To have a quick regression test we run the lib tests with that
verbose level and see if something fails.
eldipa added a commit that referenced this issue Mar 11, 2021
This code was legacy from the old multiprocess impl. Now that we use
threads the PID makes no sense.
eldipa added a commit that referenced this issue Mar 11, 2021
eldipa added a commit that referenced this issue Mar 11, 2021
The use of a queue made the log system to emit the messages out of order
with respect the ones emitted by the concerns by themselves (like
`Progress` printing stuff).

So the queue impl was removed.

Instead, to make the log system thread safe, `XStreamHandler` uses a lock
to protect only the `StreamHandler` (stderr) access only. It relays on
concerns' thread-safety for the rest.

For that to make sense, `XStreamHandler` now uses a concerns set per
thread: on init of each thread, the thread must configure its own
thread-specific system with its own concerns set.

This "thread-specific concerns set" concept is not new. Each thread
(worker) has its own copy of the harvester, executor and concerns. This
commits only ensures that that thread-specific concerns sets is
available to the `XStreamHandler`.
eldipa added a commit that referenced this issue Mar 11, 2021
The use of a queue made the log system to emit the messages out of order
with respect the ones emitted by the concerns by themselves (like
`Progress` printing stuff).

So the queue impl was removed.

Instead, to make the log system thread safe, `XStreamHandler` uses a lock
to protect only the `StreamHandler` (stderr) access only. It relays on
concerns' thread-safety for the rest.

For that to make sense, `XStreamHandler` now uses a concerns set per
thread: on init of each thread, the thread must configure its own
thread-specific system with its own concerns set.

This "thread-specific concerns set" concept is not new. Each thread
(worker) has its own copy of the harvester, executor and concerns. This
commits only ensures that that thread-specific concerns sets is
available to the `XStreamHandler`.
@eldipa eldipa closed this as completed Mar 13, 2021
eldipa added a commit that referenced this issue Mar 24, 2021
Multiple threads can log at the same time. By default Python's logging
system uses a lock to protect the shared resources for each handler.

But this may generate a lot of contention on the lock if there are a lot
of messages to be handled. More over the handling of the message writes
it to disk/console which it is slow.

Instead a QueueHandler is used to receive the messages and pass them to
the QueueListener using a thread-safe queue.

The listener sends the messages to the original handler (XStreamHandler)
in background without blocking the rest of the threads.
eldipa added a commit that referenced this issue Mar 24, 2021
Originally `byexample` used a global stack of loggers to support
contextual logging.

This works fine when we work with subprocesses because each one will
have its own stack. But with threads, all of them will see the same
shared stack.

This RC will confuse the logging system: while one thread enter in
`'byexample.exec'` context it may end logging as `'byexample.init'` because
other thread entered in that context.

Using a thread-local logger stack fixes the RC. Now the
`init_thread_specific_log_system` function needs to be called at the begin
of each thread to initialize it.
eldipa added a commit that referenced this issue Mar 24, 2021
We found some small bugs that were only triggered in super-verbose
level. To have a quick regression test we run the lib tests with that
verbose level and see if something fails.
eldipa added a commit that referenced this issue Mar 24, 2021
This code was legacy from the old multiprocess impl. Now that we use
threads the PID makes no sense.
eldipa added a commit that referenced this issue Mar 24, 2021
eldipa added a commit that referenced this issue Mar 24, 2021
The use of a queue made the log system to emit the messages out of order
with respect the ones emitted by the concerns by themselves (like
`Progress` printing stuff).

So the queue impl was removed.

Instead, to make the log system thread safe, `XStreamHandler` uses a lock
to protect only the `StreamHandler` (stderr) access only. It relays on
concerns' thread-safety for the rest.

For that to make sense, `XStreamHandler` now uses a concerns set per
thread: on init of each thread, the thread must configure its own
thread-specific system with its own concerns set.

This "thread-specific concerns set" concept is not new. Each thread
(worker) has its own copy of the harvester, executor and concerns. This
commits only ensures that that thread-specific concerns sets is
available to the `XStreamHandler`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement something nice to have but it is not neither critical nor urgent request for comments a draft idea that needs more brainstorming, comments are welcome!
Projects
None yet
Development

No branches or pull requests

1 participant