Support for large numbers of features #103

wasade · 2021-04-27T17:38:25Z

This pull request expands the server-side logic to support bulk operations for obtaining indices from identifiers and for loading identifier-specific data.

The motivation is to support tables containing millions of identifiers. When performed individually, these operations take milliseconds per query, so even at 1 ms each, one million queries add up to roughly 1,000 seconds (about 17 minutes). In effect, this change packs more data into each request to reduce the HTTP request/response overhead.
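A minimal sketch of the batching idea, assuming identifiers are sent to the server in fixed-size chunks. The helpers `chunked` and `round_trips` and the batch size of 100 are hypothetical and chosen for illustration; they are not redbiom's API. The point is that the number of HTTP round trips falls in proportion to the batch size.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def round_trips(n_items: int, batch_size: int) -> int:
    """Number of requests needed when sending `batch_size` items per request."""
    return -(-n_items // batch_size)  # ceiling division


print(round_trips(1_000_000, 1))    # 1000000 requests, one identifier each
print(round_trips(1_000_000, 100))  # 10000 requests
print([len(b) for b in chunked(["a", "b", "c", "d", "e"], 2)])  # [2, 2, 1]
```

Each batched request carries a larger payload, so the saving is in per-request overhead rather than in total work done by the server.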

antgonza

@wasade, looks good, thank you.

Some minor comments. Also, would it be worth making the batch sizes global or environment variables, defaulting to the current values in the code when they are not set? This would allow changing those values on the fly when debugging or testing batch sizes. If you agree but consider it out of scope for this PR, feel free to open an issue.
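A minimal sketch of the suggestion, assuming a hypothetical environment variable `REDBIOM_BATCH_SIZE` and a placeholder default of 100. Neither the variable name nor the default is taken from redbiom's code; the default stands in for whatever value is currently hard-coded.

```python
import os

DEFAULT_BATCH_SIZE = 100  # placeholder for the value currently hard-coded


def get_batch_size(var: str = "REDBIOM_BATCH_SIZE",
                   default: int = DEFAULT_BATCH_SIZE) -> int:
    """Read a batch size from the environment, falling back to `default`.

    `REDBIOM_BATCH_SIZE` is a hypothetical variable name used for illustration.
    """
    raw = os.environ.get(var)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value < 1:
        raise ValueError(f"{var} must be a positive integer, got {raw!r}")
    return value
```

Reading the variable at call time rather than at import time is what makes on-the-fly changes possible, e.g. `REDBIOM_BATCH_SIZE=500 python script.py`.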

redbiom/_requests.py

redbiom/admin.py


antgonza

A couple of extra comments, plus the one about batch size.

antgonza · 2021-10-21T15:22:36Z

redbiom/_requests.py

            req = s.post(config['hostname'],
                         data=_format_request(context, cmd, payload))
+
+            if verbose:
+                print(context, cmd, payload[:100])


Not necessary, but have you considered the Python `logging` package?

Yes, but I would like to consider that out of scope for this PR.
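For comparison, a minimal sketch of what the `logging` suggestion could look like for the `verbose` print in the diff. The logger name `redbiom.requests` and the function `log_request` are hypothetical; only the standard-library `logging` calls are real.

```python
import logging

logger = logging.getLogger("redbiom.requests")  # hypothetical logger name


def log_request(context: str, cmd: str, payload: str) -> None:
    """Log the first 100 characters of a request payload at DEBUG level.

    Stands in for an `if verbose: print(...)` guard: whether anything is
    emitted is decided by the logging configuration, not a function argument.
    """
    # The message string is only formatted if a handler accepts DEBUG records.
    logger.debug("%s %s %s", context, cmd, payload[:100])
```

A caller would enable the output with, for example, `logging.basicConfig(level=logging.DEBUG)`.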

antgonza · 2021-10-21T15:25:36Z

redbiom/admin.py

+
+    Notes
+    -----
+    This method only supports count data.


Should this be checked?

Sure, added
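The check that was added is not shown in this conversation (a later commit is titled "Verify data appear count"). The following is only a minimal sketch of one way to test that values look like counts, assuming the table's values are available as an iterable of numbers; `looks_like_counts` and `require_counts` are hypothetical helpers, not the code committed in this PR.

```python
import math
from numbers import Real
from typing import Iterable


def looks_like_counts(values: Iterable[Real]) -> bool:
    """Return True if every value is a finite, non-negative whole number."""
    for v in values:
        f = float(v)
        if not math.isfinite(f) or f < 0 or not f.is_integer():
            return False
    return True


def require_counts(values: Iterable[Real]) -> None:
    """Raise ValueError when the data do not appear to be counts."""
    if not looks_like_counts(values):
        raise ValueError("This method only supports count data.")
```

Accepting whole-valued floats such as `2.0` matters because count tables are often stored with a floating-point dtype.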

wasade · 2021-10-21T15:40:33Z

Sorry, missed the batchsize comment. I don't think it's understood well enough to motivate centralizing. Based on what I know at this point, I suspect that effort would not pay off.


wasade added 6 commits April 20, 2021 14:27

TST: bulk get index test

1531aae

Batch index requests

769b4de

A little cleanup on debug items

b809f9b

Additional comments

db30151

Merge branch 'master' of github.com:biocore/redbiom into batch-get-index

3b6d075

sty

f1af821

wasade requested a review from antgonza April 28, 2021 01:53

antgonza reviewed Apr 28, 2021


redbiom/_requests.py (outdated, resolved)

redbiom/admin.py (outdated, resolved)

redbiom/admin.py (outdated, resolved)

wasade and others added 22 commits April 28, 2021 08:21

Update redbiom/admin.py

b6b2d2f

Co-authored-by: Antonio Gonzalez <antgonza@gmail.com>

BUG: fixes biocore#108, thanks @cotillau!

764ef0c

VER: actually bump it, partially resolves biocore#107, thanks @BenKaehler

c40ae27

tentative py3738 support

43654f0

update actions

d49782c

update actions

498cfa5

update actions

7730ae0

update actions

ad74b9e

update actions

287c7aa

update actions

94a9faa

update actions

3b95e10

update actions

76f176c

update actions

12900d2

update actions

0ed0cb6

update actions

f1254d2

update actions

90e248e

update actions

3aca8e1

BUG: fixes biocore#93

0f54e9b

BUG: fixes biocore#92

a9ff228

remove unhelpful print

d5ecbd6

Bump version

35cfe2c

Force json (biocore#113)

e294f0f

* TST: sample id content type bug
* MAINT: fix issue where samples with .raw as a suffix were triggering unexpected returns
* Don't suffix twice
* Adjust to account for force of json

wasade added 2 commits October 21, 2021 08:19

Address @antgonza's comments

6d80fa2

use the right variable name

e55b110

antgonza reviewed Oct 21, 2021


wasade added 2 commits October 21, 2021 08:35

Verify data appear count

83f4b3f

Verify data appear count

56820e1

antgonza approved these changes Oct 21, 2021


antgonza merged commit d4af5a2 into biocore:master Oct 21, 2021

