New DBSConcurrency module for concurrent execution of HTTP queries to DBS via pycurl manager #11913
Alan, this is a first draft; feel free to provide your feedback. If you are satisfied with this code, I can import this module in #11884 and use it there after the latter is merged.
Valentin, I have two general comments for this PR.
- Ideally, we should make this pycurl-based API as generic as possible, hence accepting kwargs as input parameters that match the actual DBS files API. The problem with that is that we have to keep all the data returned by DBS in memory, pass it upstream, and let the upstream code parse and use the data as it needs.
  Another approach would be to create a specific function for the pileup use case, similar to what is written here. But then we had better state that clearly in the code so as not to mislead people.
- For the unit test, we have two options: a) either we mark it as an integration test, so that it does not run in Jenkins all the time; or b) we use a different dataset that has only a handful of blocks. That Neutrino dataset is huge, and we should not put significant load on the production system by running unit tests. In addition, let us please use a dataset from cmsweb-testbed instead, so that we do not make any requests to the production system.
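To make the generic option concrete, here is a minimal sketch of a kwargs-driven URL builder, assuming one call to the DBS files API per block and assuming that block_name plus any extra keyword arguments map directly onto query parameters. The helper name build_files_urls, the example base URL and the parameter names are hypothetical illustrations, not the code in this PR or a confirmed DBS contract.

```python
from urllib.parse import urlencode


def build_files_urls(base_url, block_names, **kwargs):
    """Build one DBS 'files' API URL per block (hypothetical helper).

    Assumption: the files API accepts a 'block_name' query parameter, and any
    extra keyword arguments (e.g. detail=1) are forwarded unchanged as query
    parameters, so the caller decides which API options to use.
    """
    urls = []
    for block in block_names:
        params = {"block_name": block}
        params.update(kwargs)
        # doseq=True expands list-valued kwargs into repeated parameters
        query = urlencode(params, doseq=True)
        urls.append(f"{base_url.rstrip('/')}/files?{query}")
    return urls
```

The trade-off Alan describes still applies: a generic builder like this says nothing about how much of each response the caller keeps in memory.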
Alan, the second issue is addressed now. Regarding the first one, I think you are mixing two different concepts here:
They have nothing to do with each other. For the first one, I can put For the second, (memory) optimization part. The code generalization can be done by converting it into a generator that yields individual records, like
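The generator idea can be sketched as follows, assuming each HTTP response body is a JSON list of records, as DBS list-style APIs typically return. The function name iter_records is hypothetical. Note that each individual response is still parsed in full; the saving comes from never merging all responses into one large in-memory list.

```python
import json


def iter_records(responses):
    """Yield individual records from raw JSON response bodies (hypothetical).

    Assumption: every element of `responses` is a JSON string holding a list
    of records. Records are yielded one at a time, so the caller can process
    them without building the merged list of all records.
    """
    for body in responses:
        # Only this one response body is decoded at a time
        for record in json.loads(body):
            yield record
```

A caller that only needs, say, the logical file names can then consume the generator lazily, for example in a loop or a set comprehension, instead of receiving one big list.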
Alan, I added memory measurement to my gist and measured memory allocation before, after
Here are a few observations:
This confirms my observation that
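The gist itself is not reproduced here. As a hypothetical way to run a similar before/after comparison, the sketch below uses the standard-library tracemalloc module to record the peak memory of materializing all records in a list versus consuming them through a generator. The helper names and the synthetic records are illustrative assumptions, not the measurement code from the gist.

```python
import tracemalloc


def peak_memory(func, *args):
    """Run func(*args) and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak) since start()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def count_with_list(n):
    # Materializes all synthetic records at once before counting them
    records = [{"id": i} for i in range(n)]
    return len(records)


def count_with_generator(n):
    # Creates one synthetic record at a time; each is dropped after counting
    return sum(1 for _ in ({"id": i} for i in range(n)))
```

With a large n the list version's peak grows with the number of records, while the generator version stays roughly constant.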
Now, it is ready for another round of review.
Thank you for confirming the behavior of multi_getdata(), Valentin.
These changes look good to me and I think this closes the loop of partial pileup support in the agent.
Fixes #11899
Status
ready
Description
New DBSConcurrency module to hold helper functions for DBS queries via concurrent execution of HTTP calls.
Is it backward compatible (if not, which system does it affect?)
YES
Related PRs
#11884
External dependencies / deployment changes
It requires the pycurl module.
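The module described above runs DBS HTTP calls concurrently through the pycurl manager. To illustrate the same fan-out pattern without the pycurl dependency, here is a sketch that swaps in a standard-library thread pool (concurrent.futures) instead of pycurl's multi interface. The names fetch_url and fetch_concurrently are hypothetical, and the default fetcher deliberately omits the certificate-based authentication that cmsweb services normally require.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_url(url, timeout=30):
    """Default fetcher: plain HTTP GET via urllib (no DBS authentication)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fetch_concurrently(urls, fetch=fetch_url, max_workers=10):
    """Run `fetch` for every URL in a thread pool (stand-in for pycurl).

    Returns a dict mapping each URL to {"data": ...} on success or
    {"error": ...} on failure, so one bad request does not abort the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = {"data": future.result()}
            except Exception as exc:  # record per-URL failures and keep going
                results[url] = {"error": str(exc)}
    return results
```

Making the fetcher injectable keeps the concurrency logic testable without network access, which also sidesteps the concern about unit tests loading the production DBS.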