Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker thread refactoring #12

Closed
adamcharnock opened this issue Dec 6, 2019 · 7 comments
Closed

Worker thread refactoring #12

adamcharnock opened this issue Dec 6, 2019 · 7 comments
Assignees
Milestone

Comments

@adamcharnock
Copy link
Owner

adamcharnock commented Dec 6, 2019

Here I will discuss the refactoring of the worker thread system used by Lightbus. I want to refactor this because:

  • It has specific cases in which it can fail. I wish to avoid these cases. Specifically WorkerNotReady and WorkerDeadlock errors.
  • I want to remove the ThreadSerializedTask as this constricts asyncio (which thread safety will allow us to do)
  • I want the Lightbus client to be thread safe

Requirements

Any solution must provide the following:

  • Thread-safety
  • Prevention of deadlock cases
  • Minimise other error cases
  • Allow user callables to start their own event loops
  • Does not complect different system components

How to achieve this

Thread-safety

Thread safety requires that access to global resources is carefully controlled to prevent race conditions. This either requires the removal of global state or – where this is not possible – marshalling of access to it. For example:

  • Network connections (e.g Redis connections) – This is global state, and only one thread should be able to read from and write to the connection at any one time. We cannot remove this global state.
  • Internal bus client state (e.g. listeners, schema).

Thread safety can be provided though mechanisms such as locking or queues.

Prevention of deadlock cases

Careful system design is the first step here. If A requires B and B requires A, then no amount of locking primitives will prevent that deadlocking. Rather, removing of blocking (in the threading sense) requirements is an important step.

Queues will likely be the preferable coordination mechanism as they can help keep different system components distinct (not complected).

Minimise other error cases

Hopefully this should fall out of a good design

Does not complect different system components

Again, thoughtful systems design, and isolation of system components.

@adamcharnock
Copy link
Owner Author

adamcharnock commented Dec 7, 2019

How do we deal with exceptions in sub-tasks?

So far we've been passing around a BusClient instance and calling the shutdown_server method upon error. This then puts a flag in the BusClients _server_shutdown_queue. The handler for this stops the loop, which then does the cleanup.

This involves passing the busclient everywhere, which is aweful.

In the test implementation we now have the command/invoker pattern. This passes commands in a single direction. Options:

  • Invoker can also pass errors in the opposite direction
  • Invoker can un-gradefully handle errors on the grounds that they should be handled outside the invoker. For example, a transport should catch an error and pass it to the ClientInvoker, rather than expecting the TrasportInvoker to work in reverse.

@adamcharnock
Copy link
Owner Author

This is ongoing and I’m working on it over the holiday period. The refactoring is largely done and I’m working on getting the tests sorted, plus a number of ‘TODO’s I’ve left in my wake.

@adamcharnock
Copy link
Owner Author

Tests now passing. TODOs next

@adamcharnock
Copy link
Owner Author

@apollo13, as we discussed on Discord, I'm going to add the on_error parameter back in as an argument to .listen(). Error's in handled RPCs should always be logged rather than cause a processes exit.

TODOs added in commit 9d7f2c5.

@adamcharnock
Copy link
Owner Author

adamcharnock commented Jan 9, 2020

@apollo13 I've just implemented a first pass at the on_error parameter (5b0fc13, 89c57df, c8cabbd)

What I didn't say earlier is: I actually think it makes much more sense for this to be a parameter to the listener, rather than a config option. How each service wants to handle errors is a choice for that service. There is no reason that every service should treat event-handling errors the same for a given API. Thanks for the good suggestion 👍

@adamcharnock
Copy link
Owner Author

I think this is now largely complete. I'll likely merge an do an 0.10.0 alpha1 release soon.

@adamcharnock adamcharnock added this to the 0.10.0 milestone Jan 11, 2020
@adamcharnock
Copy link
Owner Author

Merged in commit a8c106d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant