Add an experimental service with bulk load APIs. #59
Conversation
// BulkLoadRelationshipsResponse is returned on successful completion of the
// bulk load stream, and contains the total number of relationships loaded.
message BulkLoadRelationshipsResponse {
  uint64 num_loaded = 1;
Is the order guaranteed? Can I figure out which rels haven't been loaded if I get back a number that is half of my request? I guess not, since these are streaming requests?
Right now, all of the implementations are transactional. Yes, they are in order, and you will get back either a response counting ALL of the relationships that you fed the server, or an error with 0. That's up for debate as an implementation detail, though. We could auto-chunk into smaller transactions and make this number represent what you're suggesting, but we never made a decision on auto-chunking.
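To make that contract concrete, here's a minimal client-side sketch in Go. It assumes generated gRPC stubs; the package path, `ExperimentalServiceClient`, and the `Relationships`/`NumLoaded` field names are placeholders, not the final generated names:

```go
package main

import (
	"context"
	"fmt"

	v1alpha1 "example.com/authzed/api/v1alpha1" // hypothetical generated package
	"google.golang.org/grpc"
)

// bulkLoad streams every batch, closes the stream, and checks that
// num_loaded accounts for everything sent. With the current
// transactional implementations, a successful response means ALL
// relationships were committed; an error means none were.
func bulkLoad(ctx context.Context, conn *grpc.ClientConn, batches [][]*v1alpha1.Relationship) error {
	client := v1alpha1.NewExperimentalServiceClient(conn)

	stream, err := client.BulkLoadRelationships(ctx)
	if err != nil {
		return err
	}

	var sent uint64
	for _, batch := range batches {
		if err := stream.Send(&v1alpha1.BulkLoadRelationshipsRequest{Relationships: batch}); err != nil {
			return err
		}
		sent += uint64(len(batch))
	}

	resp, err := stream.CloseAndRecv()
	if err != nil {
		return err // transactional: nothing was committed
	}
	if resp.NumLoaded != sent {
		return fmt.Errorf("expected %d relationships loaded, got %d", sent, resp.NumLoaded)
	}
	return nil
}
```

The all-or-nothing check at the end only holds while the implementations stay transactional; if we ever auto-chunk, `NumLoaded` would become a progress count instead.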
Force-pushed from 2bbdca7 to f2c6de6.
// BulkLoadRelationshipsRequest represents one batch of the streaming
// BulkLoadRelationships API. The maximum size is unlimited, but optimal size
// should be determined by the calling client.
message BulkLoadRelationshipsRequest {
> The maximum size is unlimited
I think we should still set a limit
How does the client figure out the optimal size?
Is the datastore going to make proper batch sizes for the most efficient writes? If so, maybe it's ideal that this is a multiple of that number.
Anyway, there should be some limit here to prevent massive request memory usage/DoS.
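To illustrate, a hedged sketch of what server-side enforcement might look like; `maxBatchSize`, the generated stream interface, and the embedded `Unimplemented...` type are all assumptions, not anything proposed in this PR:

```go
package main

import (
	"io"

	v1alpha1 "example.com/authzed/api/v1alpha1" // hypothetical generated package
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// maxBatchSize is a placeholder; the actual cap would still need to be chosen.
const maxBatchSize = 10_000

type experimentalServer struct {
	v1alpha1.UnimplementedExperimentalServiceServer // hypothetical embed
}

func (s *experimentalServer) BulkLoadRelationships(stream v1alpha1.ExperimentalService_BulkLoadRelationshipsServer) error {
	var loaded uint64
	for {
		req, err := stream.Recv()
		if err == io.EOF {
			// End of stream: report the total committed.
			return stream.SendAndClose(&v1alpha1.BulkLoadRelationshipsResponse{NumLoaded: loaded})
		}
		if err != nil {
			return err
		}
		// Reject oversized batches up front instead of buffering them.
		if len(req.Relationships) > maxBatchSize {
			return status.Errorf(codes.InvalidArgument,
				"batch of %d relationships exceeds the limit of %d",
				len(req.Relationships), maxBatchSize)
		}
		// ... write the batch to the datastore here ...
		loaded += uint64(len(req.Relationships))
	}
}
```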
I think the issue is that the number is going to vary wildly based on:
- datastore choice
- datastore scaling parameters
- compressibility of the data on the wire
- how many concurrent writers we're using
- how far we are from the receiving servers
This is why I left it as "an exercise for the reader": I don't think there's any way to arrive at those numbers except experimentally. Setting any kind of limit would just cap our throughput in many cases for no reason.
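And since it's experimental by nature, the tuning can live entirely client-side. Something as small as this chunking helper (a sketch using Go generics; `chunk` is not part of the proposed API) would let operators sweep the batch size against their own deployment and keep whatever is fastest:

```go
package main

// chunk splits a large load into batches of a caller-chosen size.
// Because the optimal size depends on all of the factors listed above,
// batchSize is meant to be tuned experimentally, not fixed by the API.
func chunk[T any](items []T, batchSize int) [][]T {
	if batchSize <= 0 {
		return [][]T{items} // degenerate case: send everything as one batch
	}
	var batches [][]T
	for len(items) > batchSize {
		batches = append(batches, items[:batchSize])
		items = items[batchSize:]
	}
	if len(items) > 0 {
		batches = append(batches, items)
	}
	return batches
}
```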
I think we shouldn't say "unlimited" here then; "depends on the datastore" might be better
// subject.object.object_id, subject.optional_relation)
//
// EXPERIMENTAL
rpc BulkLoadRelationships(stream BulkLoadRelationshipsRequest)
I think it might be worth requiring that any experimental API have a referenced GitHub issue, placed here.
Force-pushed from f2c6de6 to 7f33957.
LGTM