Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for HTTPConnectionPool maxsize. #106

Closed
wants to merge 4 commits into from

Conversation

laixintao
Copy link

pool_connections is for urllib3.ConnectionPools's maxsize, while
pool_maxsize is for urllib3.PoolManager's num_pools.

I think there are differences.

For example, if threads > ConnectionPool's maxsize, some connections won't be reuse. If maxsize < threads, connections may be wasted.

urllib3 doc:

By default, the pool will cache just one connection. If you’re planning on using such a pool in a multithreaded environment, you should set the maxsize of the pool to a higher number, such as the number of threads.

http://urllib3.readthedocs.io/en/1.2.1/pools.html

@CLAassistant
Copy link

CLAassistant commented Feb 9, 2018

CLA assistant check
All committers have signed the CLA.

@yami
Copy link
Contributor

yami commented Feb 9, 2018

看了一下requests的代码,num_pools (pool_connections) 的含义是PoolManager中包含的Pool数,也就是不同的(scheme, host, port)数。maxsize(pool_maxsize)的含义是每个pool中缓存的connection数。

现在我们的代码比较粗暴,两个值是设成一样的。注意到在绝大多数情况下(scheme, host, port)三元组只有一个,所以真实的场景是

  • Pool只有一个
  • 每个Pool里的连接数最多是connection_pool_size个

所以当前的代码运行能够符合预期。当然,我们现在num_threads和connection_pool_size是两个没有联动的配置,就会出现 @laixintao 说的场景,有时候threads被阻塞了,有时候connection浪费了。但是现在给的patch并不能解决这个问题。

@laixintao
Copy link
Author

@yami Hi, 我没想到实际的场景。不过按照这个情况:

Pool里面connections的数量 = num_threads
Poolsize默认是1

这样是不是比较合理?

有时候threads被阻塞了,有时候connection浪费了。但是现在给的patch并不能解决这个问题。

这个我有点疑惑,当前这个patch没有解决吗?

@laixintao
Copy link
Author

ps 我觉得只会出现connections浪费的问题,不会出现thread被阻塞的问题,连接池默认有一个block=False

class urllib3.connectionpool.HTTPConnectionPool(host, port=None, strict=False, timeout=None, maxsize=1, block=False, headers=None)

http://urllib3.readthedocs.io/en/1.2.1/pools.html#api

@yami
Copy link
Contributor

yami commented Feb 9, 2018

当前master的代码是
self.session.mount('http://', requests.adapters.HTTPAdapter(pool_connections=psize, pool_maxsize=psize))

两个值都是一样的,缺省是10。也就是说现在的代码不需要修改。

@laixintao
Copy link
Author

假如说应用的线程池有20个线程,事实上只有1个pool的10个连接缓存可用,其中有10个线程会不断创建和断开TCP连接。

默认让连接池的连接数=线程数不是更好吗?

@yami
Copy link
Contributor

yami commented Feb 11, 2018

设成一样是更好,但是你这个patch解不了这个问题,而且也不是单纯的和某一次multithread get/upload的参数相关,要看用户最大的并发线程数。

@laixintao
Copy link
Author

laixintao commented Feb 11, 2018

嗯,是应该设置成最大的。我以为这个参数是全局的默认参数。

@laixintao
Copy link
Author

现在这样呢? 只是暴露出来pool size可以让用户自己设置。

或者 如 #86 提到的这样,暴露出来Adapter可以让用户自己mount

@yami
Copy link
Contributor

yami commented Feb 11, 2018

#86 的问题怎么解,目前还没怎么想。今天就当前这个pr来说,我们可以先保持原有代码。

@laixintao
Copy link
Author

嗯好的。

@laixintao laixintao closed this Feb 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants