Question: What is your recommended approach to MultiThreading? #158

Closed
zurferr opened this issue Feb 27, 2021 · 7 comments

zurferr commented Feb 27, 2021

Hi,
I want to use python-arango in a multithreaded environment (the web server spawns a thread for each request). Hundreds of concurrent requests are possible.

After reading the documentation, I understand that I might run into problems with the Session, and indeed I had occasional 'Connection Refused' errors.
https://docs.python-arango.com/en/main/threading.html

I found the following issue, where a custom NoSessionHttpClient was used:
#92 (comment)

Is this the recommended solution for dealing with multithreading?
It seems sub-optimal because no connection pooling is used at all, so performance, especially latency, will suffer.

Do you know better solutions? Thanks for reading. :)

Contributor

joowani commented Feb 27, 2021

Hi @zurferr,

According to this python-requests GitHub thread, the Session object seems to be "almost" thread-safe. The recommended approach is to use a separate session per thread, or no session at all, depending on your use case. So unfortunately yes, it is sub-optimal, and I don't see an easy way around it. There are some solutions like requests-futures available, but python-arango is not equipped to use them (yet).
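
The "separate session per thread" idea can be sketched with `threading.local`. This is only a minimal illustration, not something python-arango ships: `PerThread` and `handle_request` are hypothetical names, and a plain `object` stands in for `requests.Session` so the snippet runs with the standard library alone.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThread:
    """Lazily create one object per thread using a factory callable."""

    def __init__(self, factory):
        self._factory = factory          # assumption: e.g. requests.Session
        self._local = threading.local()  # attribute storage private to each thread

    def get(self):
        # The first call on a thread builds the object; later calls reuse it.
        obj = getattr(self._local, "obj", None)
        if obj is None:
            obj = self._factory()
            self._local.obj = obj
        return obj


# Stand-in factory so the sketch needs no third-party packages.
sessions = PerThread(factory=object)


def handle_request(_):
    # Within one thread, repeated calls return the same object.
    return sessions.get() is sessions.get()


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, range(8)))
```

With requests installed, `PerThread(requests.Session)` would give each worker thread its own session. How that session is then handed to python-arango is not covered by this sketch.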

What you could do is find out why the Session objects are not considered thread-safe. If the reasons do not apply to you, you might be able to get away with simply increasing the connection pool count like this:

import requests

session = requests.Session()
adapter = requests.adapters.HTTPAdapter(
    pool_connections=100,
    pool_maxsize=100)
session.mount('http://', adapter)

See here for customizing your HTTP client.

Author

zurferr commented Feb 27, 2021

Hi @joowani,
thanks a lot for the quick response. I went down the rabbit hole of your links.
It seems that urllib3, which requests uses, is thread-safe. I tried to write a CustomHttpClient with just that.
In the end I was just reimplementing parts of the requests package. But these parts seem thread-safe with basic authentication (which I use). So I concluded/hope that my usage is safe as well. :)

On the way I found a small typo in https://docs.python-arango.com/en/main/http.html.

        http_adapter = HTTPAdapter(max_retries=retry_strategy)
        session.mount('https://', adapter) # should be http_adapter 
        session.mount('http://', adapter)

Author

zurferr commented Feb 27, 2021

Alright, I now have a custom client that only uses urllib3 and so should be thread-safe.
It works in my case. I would be interested in what you think.

Right now, it only works with basic auth. This way I avoid a session but can still reuse the TCP connection pool handled by urllib3.

import urllib3
from urllib3.response import HTTPResponse

from arango.http import HTTPClient
from arango.response import Response


class ThreadSafeHTTPClient(HTTPClient):
    """HTTP client that relies only on urllib3, which is thread-safe."""

    def create_session(self, host):
        # Not a requests.Session: a urllib3 pool manager that caches up to 50 host pools.
        http = urllib3.PoolManager(num_pools=50)
        return http

    def send_request(self,
                     session: urllib3.PoolManager,
                     method,
                     url,
                     params=None,
                     data=None,
                     headers=None,
                     auth=None):

        # Join headers with basic auth; tolerate missing headers or auth.
        headers = dict(headers or {})
        if auth is not None:
            headers.update(urllib3.make_headers(basic_auth=':'.join(auth)))

        # Send a request.
        response: HTTPResponse = session.request(
            method=method,
            url=url,
            fields=params,
            body=data,
            headers=headers,
            timeout=urllib3.Timeout(connect=10.0, read=60.0)
        )

        # Return an instance of arango.response.Response.
        return Response(
            method=method,
            url=response.geturl(),
            headers=response.headers,
            status_code=response.status,
            status_text=response.reason,
            raw_body=response.data,
        )
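
For reference, the header-merging step in the client above amounts to roughly the following, written with the standard library only. This is a sketch: `basic_auth_header` and `merge_headers` are hypothetical helper names, and it assumes `auth` is a `(username, password)` tuple or `None`.

```python
import base64


def basic_auth_header(username, password):
    """Return a dict holding the HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def merge_headers(headers, auth):
    """Merge request headers with Basic auth; either argument may be None."""
    merged = dict(headers or {})
    if auth is not None:
        username, password = auth
        merged.update(basic_auth_header(username, password))
    return merged
```

Because the credentials travel as a plain header on every request, no per-session auth state is needed, which is what makes dropping the session possible for basic auth.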

Contributor

joowani commented Feb 27, 2021

Hi @zurferr,

This is great! When I find some time I'll play around with this myself. If it passes all the tests I'll also put it in the threading documentation page.

Contributor

joowani commented Feb 27, 2021

Oh and I'll also fix the typo. Thanks for pointing it out.

Author

zurferr commented Mar 1, 2021

Hi @joowani,

I am currently moving apartments and probably should not publish code right now.
The above version only works for the most trivial requests.

Here is a version that should work for all sorts of requests.
At least it works for the ones I use. But it also only supports basic authentication.

from urllib.parse import urlencode

import urllib3
from urllib3.response import HTTPResponse

from arango.http import HTTPClient
from arango.response import Response


class ThreadSafeHTTPClient(HTTPClient):
    """Custom HTTP client backed by a urllib3 connection pool (basic auth only)."""

    def create_session(self, host):
        """Not a real session, only a connection pool manager."""
        # Allow up to 100 concurrent connections per host; further requests
        # block until a connection is free.
        http = urllib3.PoolManager(maxsize=100, block=True)
        return http

    def send_request(self,
                     session: urllib3.PoolManager,
                     method,
                     url,
                     params=None,
                     data=None,
                     headers=None,
                     auth=None):
        # Join headers with basic auth; tolerate missing headers or auth.
        headers = dict(headers or {})
        if auth is not None:
            headers.update(urllib3.make_headers(basic_auth=':'.join(auth)))

        # Prepare URL parameters (skip when there are none).
        if params:
            url = url + '?' + urlencode(params)

        # Send the request.
        response: HTTPResponse = session.request(
            method=method,
            url=url,
            body=data,
            headers=headers,
            timeout=urllib3.Timeout(connect=10.0, read=60.0)
        )

        # Return an instance of arango.response.Response.
        return Response(
            method=method,
            url=response.geturl(),
            headers=response.headers,
            status_code=response.status,
            status_text=response.reason,
            raw_body=response.data,
        )
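
The query-string step in the client above can be isolated into a small helper, which makes it easy to check on its own. A sketch only: `append_query` is a hypothetical name, and the handling of URLs that already contain a `?` is an addition that the posted client does not have.

```python
from urllib.parse import urlencode


def append_query(url, params):
    """Append params to url as a query string, if there are any."""
    if not params:
        # None or an empty mapping: leave the URL untouched.
        return url
    # Use '&' when the URL already carries a query string.
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(params)
```

Building the URL by hand like this avoids passing `fields=` to urllib3 together with `body=`, so the request body is sent exactly as given.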

Contributor

joowani commented Mar 9, 2021

Closing this out. Feel free to reopen if you have any more questions. Thanks.

joowani closed this as completed Mar 9, 2021