specs: repository diffs #354
Should we link to https://ipld.io/specs/transport/car/carv1/#summary in there somewhere? It's helpful to have a diagram to look at that explains the layout of bytes in a CAR file.
@ericvolp12 there is an existing section in the same specs page ("Repository") about CAR files generally (e.g., repo exports) which links to that exact page
Oh lmao okay whoops
not the first to mention it!
One concept that supports efficient synchronization of data between independent services is the "diff" of a repository tree between two revisions. The basic principle is that a repository diff contains all the data (commit object, MST nodes, and records) that has changed between an older revision and the current revision of a repo. The diff can be "applied" to an older mirror of the repository, and the result will be the complete MST at the current (newer) commit revision.
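To make the diff idea concrete, here is a minimal sketch over a toy block store. It assumes, for illustration only, that each block is keyed by a string standing in for its CID and records just the CIDs it links to; real repos use DAG-CBOR blocks and binary CIDs. The names `reachable`, `make_diff`, and `apply_diff` are hypothetical and not part of any atproto library. Because blocks are content-addressed, any block the older revision already holds is unchanged and can be left out of the diff.

```python
from typing import Dict, List, Set

# Toy model (hypothetical): CID string -> CIDs of the blocks it links to.
Blocks = Dict[str, List[str]]


def reachable(blocks: Blocks, root: str) -> Set[str]:
    """Collect every CID reachable from root (commit -> MST nodes -> records).

    Raises KeyError if a linked block is missing from `blocks`.
    """
    seen: Set[str] = set()
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(blocks[cid])
    return seen


def make_diff(new_blocks: Blocks, new_commit: str, old_cids: Set[str]) -> Blocks:
    """Blocks in the new revision's tree that the older revision does not have."""
    return {
        cid: new_blocks[cid]
        for cid in reachable(new_blocks, new_commit)
        if cid not in old_cids
    }


def apply_diff(old_store: Blocks, diff: Blocks, new_commit: str) -> Blocks:
    """Merge a diff into an older mirror and return exactly the new tree."""
    merged = {**old_store, **diff}
    # Keep only blocks reachable from the new commit; this raises KeyError
    # if the diff left out a block the new tree needs.
    return {cid: merged[cid] for cid in reachable(merged, new_commit)}
```

For example, if the older tree is `commit1 -> mstA -> [rec1, rec2]` and the newer one is `commit2 -> mstB -> [rec1, rec3]`, the diff carries only `commit2`, `mstB`, and `rec3`. A real implementation would likely stop descending at any MST node whose CID the older revision already has, since an identical CID implies an identical subtree; the toy version walks the whole new tree for simplicity.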
Repo diffs can be serialized as CAR files, sometimes referred to as "CAR slices". Some details about diff CAR slices:
Should we a. link to the CAR spec, b. specify that it's CAR v1?
there is discussion elsewhere in this same spec file about CAR files generally (actually in the section just above this one). I didn't want to duplicate that level of granularity so I kind of just reference it. I think the desire is to have it be "the same" more than any specific format.
- same format, version, and atproto-specific constraints as full repo export CAR files
- blocks "should" be de-duplicated by CID (only one copy included), though receiving implementations must be resilient to duplication
- the root CID indicated in the CAR header (the first element of `roots`) should point to the commit block (which must be included)
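The receiving-side rules in the list above can be sketched as a small ingest function. This assumes a hypothetical CAR parser has already produced the header's `roots` list and the sequence of `(cid, bytes)` blocks, with CIDs represented as strings; `load_car_slice` is an illustrative name, and checking that each block's bytes actually hash to its CID is left out.

```python
from typing import Dict, List, Tuple


def load_car_slice(
    roots: List[str], blocks: List[Tuple[str, bytes]]
) -> Tuple[str, Dict[str, bytes]]:
    """Ingest an already-parsed CAR slice (hypothetical input shape).

    Returns the commit CID and a de-duplicated CID -> bytes map.
    """
    if not roots:
        raise ValueError("CAR header has no roots")
    # The first element of `roots` names the commit block; any extra roots
    # are tolerated rather than rejected.
    commit_cid = roots[0]

    store: Dict[str, bytes] = {}
    for cid, data in blocks:
        prev = store.get(cid)
        # With real CIDs, hash verification makes a conflict impossible;
        # this check stands in for that verification in the toy model.
        if prev is not None and prev != data:
            raise ValueError(f"conflicting data for duplicate CID {cid}")
        store[cid] = data  # duplicate copies collapse to one entry

    if commit_cid not in store:
        raise ValueError("commit block named by the root CID is missing")
    return commit_cid, store
```

Collapsing duplicates into a map keeps the receiver resilient to senders that do not de-duplicate, while still requiring the commit block that the root CID points to.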
We currently require that the CAR only has 1 root (though we could adapt the implementation)
I think this is fine / consistent enough? I'd like other folks to handle multiple roots on the "processing a CAR slice" side, but not assume they can have multiple roots on the "creating CAR slices" side, and I think this language achieves that?
IMO the next step would be a bit more flexibility in receiving implementation, but that this won't cause problems going out in the spec this way.
Describes the format and contents of "repo diffs", aka "CAR slices". These are returned by some sync API endpoints and included in the firehose.
The firehose and sync details will be described in a separate spec document.