Fix thread safety in http pool manager #102
Conversation
I have identified another issue with thread safety. The default session_id is generated from uuid.uuid1, which is not generated in a multiprocessing-safe way. This means that if two threads call the function at very nearly the same time, you can get the same uuid. This breaks the session locking on the server side. Due to its timing-dependent nature, it can be a bit annoying to trigger. I have switched uuid.uuid1 to uuid.uuid4, which has significantly higher entropy.
Thanks for your investigations into multiprocessing! It's an area that we simply haven't had the time to test the limits of. I'm very curious about the error you see when using the default_pool_manager. It's my understanding that the urllib3 PoolManager is thread safe, and even before that PR the window where it was not thread safe was tiny (and would raise a different exception). In particular, I'm fairly sure the actual connections are never shared between threads, although it's possible that the underlying Python http_client somehow gets shared. Is it possible that the "bad status" line is just the result of some other ClickHouse server or loopback issue, possibly related to the load of 100 clients hitting the same ClickHouse server at once?

In any case, the choice to use a single PoolManager by default was intended for the simple, and I assume most common, use case of having a few clients always hitting the same host. Reusing connections in that case is desirable, especially if they can reuse "keep alive" HTTP connections. (However, looking through the code I did notice that the original …)

In the case of a large number of clients/threads, my intended solution was to construct those clients with their own PoolManager, preferably obtained via the get_pool_manager method. I consider that an advanced use case, but it should fix your particular issue without a code change. Just change this line: `client = get_client()` to `client = get_client(pool_mgr=get_pool_manager())`. You could also use that mechanism to share a few different pool managers among multiple hosts, instead of having a 1 to 1 assignment.

As for uuid1 vs uuid4, there are still 14 bits of random entropy on a single host in the same timestamp, vs 64 bits of entropy across all possible clients hitting the same ClickHouse server. I think there's no practical difference, although I can see uuid4 as being marginally "safer".
But this is also something you can solve without a code change by setting the session_id parameter in the get_client factory method call.
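For reference, the structural difference between the two generators can be seen directly in the standard library. This is a quick illustration of the entropy argument above, not part of the patch:

```python
import uuid

# uuid1 embeds a 60-bit timestamp and the host's node id (usually the
# MAC address); only a 14-bit clock sequence distinguishes ids created
# in the same 100-nanosecond tick on the same host.
u1 = uuid.uuid1()
print(u1.version)  # 1

# uuid4 draws 122 of its 128 bits from the OS entropy source, so
# collisions are negligible even across unrelated processes and hosts.
u4 = uuid.uuid4()
print(u4.version)  # 4

# Back-to-back uuid4 values share no structure at all.
assert uuid.uuid4() != uuid.uuid4()
```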
By the way, your concurrency code snippet does not trigger any exceptions on my M1 Mac -- Python 3.9.15, urllib3 1.26.13.
It is occurring on every single run on Arch Linux with my 3950X (16 cores, 32 threads). I also confirmed this issue on my XPS 9710 running Manjaro Linux. If you are running your Python through the hacky emulation layer thing they made for M1s, I suspect that Python multiprocessing in general may be handled very differently.
It's native ARM64, not Rosetta, but again that doesn't actually look like a concurrency error.
As it turns out, I was sort of right in my assumption that macOS handles multiprocessing differently: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
Updating my example:

```python
import multiprocessing

import clickhouse_connect


def get_client() -> clickhouse_connect.driver.client.Client:
    return clickhouse_connect.get_client(host='localhost')


def test_client(*_):
    client = get_client()
    client.command('SELECT version();')
    client.close()


def test_concurrency():
    # Force non-default spawn start method
    multiprocessing.set_start_method('spawn')
    # Call test client in original thread, before creating new
    # process. This ensures that any globals that may be initialized
    # in the library are initialized before forking next
    test_client()
    # Fork 8 processes, each initializing a fresh client. Any globals
    # from this process that end up being used in child processes may
    # very well become broken.
    with multiprocessing.Pool(8) as pool:
        pool.map(test_client, range(100))


if __name__ == '__main__':
    test_concurrency()
```

I would not consider doing this. If this library is being developed only for macOS, then you need to disclose this up front and in the documentation. My org is simply not going to use ClickHouse if this is the case.
We build and test on Linux, but I happen to develop on a Mac. And as I alluded to above, we don't currently have automated tests for large-scale parallel processing. ClickHouse Connect is still very much beta and open source, so you know everything there is to know about tests and builds.

Again, the urllib3 PoolManager is considered thread safe, but it appears you have found an exception. If you can reproduce the problem independent of ClickHouse Connect (which shouldn't be difficult, it really amounts to a bunch of parallel HTTP calls), I would suggest you open an issue in that project. It also seems like if you're going to use multiprocessing on Linux/Unix, you should not use a start method that is "problematic" according to the documentation and then blame the library.

In any case, there's a simple workaround for the issue, as I described above for your use case, that doesn't require a code change and doesn't disable the current default connection sharing, and you have identified yet another workaround. As we have time and resources to update the documentation and move toward production status, those things will be improved. In the meantime, you are of course free to fork this project or use another Python client (…)
For the record, the Python documentation is clear in saying that the "problematic" fork start method is only problematic on macOS. It is still the standard on Linux because Linux can handle POSIX forking. The issue they reference makes clear that it is only a macOS problem:
https://bugs.python.org/issue40379 -- not just macOS
Problem
There is a thread safety issue in the way the HttpClient initializes. Here is a snippet to trigger it:
This gives us this lovely error:
The problem is the global `clickhouse_connect.driver.httputil.default_pool_manager`, which is used as the default HTTP pool in `clickhouse_connect.driver.httpclient.HttpClient.__init__`.
When this global is initialized in one process and then referenced in another, it completely breaks any attempt at concurrency.
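The failure mode can be reproduced without ClickHouse Connect at all. The sketch below (Unix only, since it uses the 'fork' start method) substitutes a plain dict for default_pool_manager; the names POOL and child are illustrative stand-ins, not library code. The point is that a module-level singleton created in the parent is inherited wholesale by forked children:

```python
import multiprocessing
import os

# Stand-in for a module-level singleton like default_pool_manager:
# created once, in whichever process first imports the module.
POOL = {"created_in_pid": os.getpid()}


def child(queue):
    # Under the 'fork' start method the child receives a copy of the
    # parent's memory, so POOL still records the parent's pid. For a
    # real PoolManager that copy can include live sockets, which the
    # parent and child may then read and write concurrently.
    queue.put(POOL["created_in_pid"])


if __name__ == '__main__':
    ctx = multiprocessing.get_context('fork')  # the Linux default
    q = ctx.Queue()
    p = ctx.Process(target=child, args=(q,))
    p.start()
    seen = q.get()
    p.join()
    print(seen == os.getpid())  # True: the child inherited the parent's object
```

Under the 'spawn' start method the child re-imports the module instead, so it builds its own POOL; that is the behavioral difference the multiprocessing documentation linked above describes.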
Solution
Remove the clickhouse_connect.driver.httputil.default_pool_manager global, and initialize a new pool manager on each new HttpClient.
Unless there is some huge performance cost to initializing new pool managers that I am unaware of, this should be the way it is done.
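The shape of the proposed fix can be sketched without the library itself. FakePoolManager and the simplified HttpClient below are hypothetical stand-ins for urllib3.PoolManager and the real client class, purely to show the default-argument pattern, not the actual implementation:

```python
class FakePoolManager:
    """Stand-in for urllib3.PoolManager (thread safe within one
    process, but unsafe to share across a fork)."""


class HttpClient:
    # Before: every client fell back to one module-level
    # default_pool_manager, created in whichever process imported the
    # module first. After: a client that is not handed an explicit
    # pool gets its own, created in its own process at __init__ time.
    def __init__(self, pool_mgr=None):
        self.http = pool_mgr if pool_mgr is not None else FakePoolManager()


# Each default-constructed client now owns a distinct pool...
a, b = HttpClient(), HttpClient()
print(a.http is b.http)  # False

# ...while callers can still opt in to sharing one pool explicitly,
# preserving keep-alive connection reuse for the common case.
shared = FakePoolManager()
c, d = HttpClient(shared), HttpClient(shared)
print(c.http is d.http)  # True
```

This keeps both behaviors available: per-client pools by default (fork safe), and explicit sharing for the few-clients-one-host case discussed earlier in the thread.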