Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an experimental service with bulk load APIs. #59

Merged
merged 1 commit into from
May 10, 2023

Conversation

jakedt
Copy link
Member

@jakedt jakedt commented May 4, 2023

No description provided.

// BulkLoadRelationshipsResponse is returned on successful completion of the
// bulk load stream, and contains the total number of relationships loaded.
message BulkLoadRelationshipsResponse {
uint64 num_loaded = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this guaranteed order? Can I figure out which rels haven't been loaded if I get back a number that is half of my request? I guess not since streaming requests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, all of the implementations are transactional. Yes they are in order, and you will get back a response with a number of ALL that you have fed the server, or an error with 0. That's up for debate as an implementation detail though. We could auto-chunk into smaller transactions and make this number represent what you're suggesting. We never made a decision on the auto-chunking decision though.

authzed/api/v1/experimental_service.proto Outdated Show resolved Hide resolved
// BulkLoadRelationshipsRequest represents one batch of the streaming
// BulkLoadRelationships API. The maximum size is unlimited, but optimal size
// should be determined by the calling client.
message BulkLoadRelationshipsRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maximum size is unlimited

I think we should still set a limit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the client figure out the optimal size?

Is the datastore going to make proper batch sizes for the most efficient writes? If so, maybe it's ideal that this is a multiple of that number.

Anyways, there should be some limit on here to prevent massive request memory usage/DOS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that the number is going to vary wildly based on:

  1. datastore choice
  2. datastore scaling parameters
  3. compressibility of the data on the wire
  4. how many concurrent writers we're using
  5. how far we are from the receiving servers

This is the reason why I left it as "an exercise for the reader". I don't think there's any way to come up with those numbers except experimentally. Setting any kind of limit will just be limiting our throughput in many cases for no reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't say "unlimited" here then; "depends on the datastore" might be better

authzed/api/v1/experimental_service.proto Show resolved Hide resolved
// subject.object.object_id, subject.optional_relation)
//
// EXPERIMENTAL
rpc BulkLoadRelationships(stream BulkLoadRelationshipsRequest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth a requirement that any experimental API has a referenced Github issue with it, placed here

Copy link
Member

@josephschorr josephschorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jakedt jakedt marked this pull request as draft May 10, 2023 18:08
@jakedt jakedt marked this pull request as ready for review May 10, 2023 22:10
@jakedt jakedt merged commit af4020f into main May 10, 2023
@jakedt jakedt deleted the bulk-load-experimental branch May 10, 2023 22:10
@github-actions github-actions bot locked and limited conversation to collaborators May 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants