Support for large numbers of features #103
@wasade, looks good, thank you.
Some minor comments. Also, would it be worth exposing the batch sizes as global or environment variables, defaulting to the values currently in the code when they are not set? That would allow changing those values on the fly when debugging or testing batch sizes. If you agree but consider it out of scope for this PR, it is fine to open an issue instead.
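For illustration only, the suggestion above could look roughly like the sketch below. The variable name `EXAMPLE_BATCH_SIZE`, the default of 1000, and the helper `get_batch_size` are all hypothetical; none of them come from this PR or from the project's code.

```python
import os

# Hypothetical default; the real batch sizes live in the project's code.
DEFAULT_BATCH_SIZE = 1000


def get_batch_size(env_var="EXAMPLE_BATCH_SIZE", default=DEFAULT_BATCH_SIZE):
    """Return a positive batch size from the environment, or the default."""
    raw = os.environ.get(env_var)
    if raw is None:
        # Variable not set: fall back to the value hard-coded today.
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{env_var} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{env_var} must be positive, got {value}")
    return value
```

With this shape, running a test with `EXAMPLE_BATCH_SIZE=250` set in the shell would change the batch size without touching the code, and leaving it unset keeps the current behavior.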
Co-authored-by: Antonio Gonzalez <antgonza@gmail.com>
* TST: sample id content type bug
* MAINT: fix issue where samples with .raw as a suffix were triggering unexpected returns
* Don't suffix twice
* Adjust to account for force of json
A couple of extra comments + the one about batch size.
    req = s.post(config['hostname'],
                 data=_format_request(context, cmd, payload))

    if verbose:
        print(context, cmd, payload[:100])
Not necessary, but have you considered the Python logging package?
Yes, but I would like to consider that out of scope for this PR.
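As a reference for a possible follow-up, a minimal sketch of the same verbose output routed through the standard `logging` module might look like this. The logger name `example.client` and the helper `log_request` are hypothetical and are not part of this PR.

```python
import logging

# Hypothetical logger name; a real project would pick its own.
logger = logging.getLogger("example.client")


def log_request(context, cmd, payload, limit=100):
    """Log a request at DEBUG level, truncating the payload to `limit` characters."""
    # Lazy %-style formatting: the message is only built if DEBUG is enabled.
    logger.debug("%s %s %s", context, cmd, payload[:limit])
```

Callers would then enable the output with something like `logging.basicConfig(level=logging.DEBUG)` instead of passing a `verbose` flag through each function.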
    Notes
    -----
    This method only supports count data.
Should this be checked?
Sure, added
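The check that was actually added is not shown in this conversation. Purely as an illustration of what "count data" validation can mean, the sketch below treats a value as a count if it is finite, non-negative, and integer-valued; the function name `is_count_data` and that definition are assumptions, not the PR's implementation.

```python
import math


def is_count_data(values):
    """Return True if every value is a finite, non-negative whole number."""
    return all(
        math.isfinite(v) and v >= 0 and float(v).is_integer()
        for v in values
    )
```

A caller could raise a `ValueError` when this returns `False`, so that non-count tables are rejected before any request is sent.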
Sorry, I missed the batch size comment. I don't think batch size is understood well enough yet to motivate centralizing it. Based on what I know at this point, I suspect that effort would not pay off.
This pull request expands the server-side logic to support bulk load operations for obtaining indices from identifiers, and for loading identifier-specific data.
The motivation is to support tables containing millions of identifiers. When performed individually, these operations take milliseconds per query, and 1 million queries at 1 millisecond each is already about 1,000 seconds (roughly 17 minutes). In effect, what we're doing here is packing more data into each request to reduce the HTTP request/response overhead.
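The general idea of packing many identifiers into one request can be sketched as below. This is a generic chunking helper, not the code from this PR: the name `batched` and the batch size used in the usage note are hypothetical.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

With a hypothetical batch size of 10,000, one million identifiers would be sent in 100 requests rather than one million, so the per-request HTTP overhead is paid 100 times instead of a million times.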