
specs: repository diffs #354

Merged: 2 commits, Nov 9, 2024


@bnewbold (Contributor) commented Oct 17, 2024

Describes the format and contents of "repo diffs" (also known as "CAR slices"). These are returned by some sync API endpoints and are included in the firehose.

The firehose and sync details will be described in a separate spec document.

@ericvolp12 (Contributor) commented:

Should we link to https://ipld.io/specs/transport/car/carv1/#summary in there somewhere? It's helpful to have a diagram to look at that explains the layout of bytes in a CAR file.

@bnewbold (Contributor, Author) commented Oct 17, 2024

@ericvolp12 There is already a section in the same spec page ("Repository") about CAR files in general (e.g., repo exports), which links to that exact page.

@ericvolp12 (Contributor) commented:

Oh lmao okay whoops

@bnewbold (Contributor, Author) commented:

Not the first to mention it!


One concept that supports efficient synchronization of data between independent services is a "diff" of a repository tree between two revisions. The basic principle is that a repository diff contains all the data (commit object, MST nodes, and records) that changed between an older revision and the current revision of a repo. The diff can be "applied" to an older mirror of the repository, and the result is the complete MST at the current (newer) commit revision.
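A minimal Python sketch of what "applying" a diff means at the block level follows. It rests on loudly stated assumptions: `Block`, `apply_diff`, and `missing_blocks` are hypothetical names, CIDs are plain strings, and each block's outgoing links are listed explicitly rather than decoded from DAG-CBOR. Applying is modeled as merging the diff's blocks into the older mirror's blockstore; the check then walks from the new commit and confirms that every reachable block (commit, MST nodes, records) can be resolved.

```python
from dataclasses import dataclass, field

# Hypothetical model: CIDs are plain strings, and links are given explicitly
# instead of being decoded from DAG-CBOR block contents.
@dataclass
class Block:
    data: bytes
    links: list[str] = field(default_factory=list)

def apply_diff(mirror: dict[str, Block], diff: dict[str, Block]) -> dict[str, Block]:
    """Return a new blockstore: the older mirror plus every block in the diff."""
    merged = dict(mirror)  # leave the caller's mirror untouched
    merged.update(diff)    # same CID implies same content, so overwriting is harmless
    return merged

def missing_blocks(store: dict[str, Block], root: str) -> set[str]:
    """Walk all blocks reachable from `root`; return CIDs absent from `store`."""
    missing: set[str] = set()
    seen: set[str] = set()
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        block = store.get(cid)
        if block is None:
            missing.add(cid)
            continue
        stack.extend(block.links)
    return missing

# Older mirror: commit c1 -> MST node n1 -> records r1, r2.
mirror = {
    "c1": Block(b"commit v1", ["n1"]),
    "n1": Block(b"mst v1", ["r1", "r2"]),
    "r1": Block(b"record 1"),
    "r2": Block(b"record 2"),
}
# Diff to the next revision: new commit, new MST node, one new record.
# r1 and r2 are unchanged, so they are not in the diff.
diff = {
    "c2": Block(b"commit v2", ["n2"]),
    "n2": Block(b"mst v2", ["r1", "r2", "r3"]),
    "r3": Block(b"record 3"),
}
store = apply_diff(mirror, diff)
assert missing_blocks(store, "c2") == set()          # full tree resolves
assert missing_blocks(diff, "c2") == {"r1", "r2"}    # the diff alone does not
```

The diff by itself cannot resolve the new tree (`r1` and `r2` live only in the older mirror); only the combination does. Pruning blocks that are no longer reachable from the new commit is deliberately left out of this sketch.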

Repo diffs can be serialized as CAR files, sometimes referred to as "CAR slices". Some details about diff CAR slices:
Review comment (Contributor):

Should we (a) link to the CAR spec and (b) specify that it's CAR v1?

Reply (Contributor, Author):

There is discussion elsewhere in this same spec file about CAR files in general (actually in the section just above this one). I didn't want to duplicate that level of detail, so I just reference it. I think the intent is for slices to be "the same" as repo export CAR files, rather than to pin down any specific format here.


- same format, version, and atproto-specific constraints as full repo export CAR files
- blocks "should" be de-duplicated by CID (only one copy included), though receiving implementations must be resilient to duplication
- the root CID indicated in the CAR header (the first element of `roots`) should point to the commit block (which must be included)
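The constraints above split into a strict writer side and a lenient reader side. The Python sketch below illustrates both under stated assumptions: `CarSlice`, `build_car_slice`, and `read_car_slice` are hypothetical names, CIDs are plain strings, and the sketch works on an already-parsed header and block list rather than on binary CAR bytes. Accepting extra roots on the reader side follows the preference expressed later in this thread; it is not necessarily how the reference implementation behaves (which, per the review, currently requires exactly one root).

```python
from dataclasses import dataclass

# Hypothetical parsed form of a CAR v1 file: header roots plus block sections.
# Binary CID parsing and DAG-CBOR header decoding are out of scope here.
@dataclass
class CarSlice:
    roots: list[str]
    sections: list[tuple[str, bytes]]

def build_car_slice(commit_cid: str, blocks: list[tuple[str, bytes]]) -> CarSlice:
    """Writer side: a single root (the commit CID), blocks de-duplicated by CID."""
    seen: set[str] = set()
    sections: list[tuple[str, bytes]] = []
    for cid, data in blocks:
        if cid not in seen:  # emit only the first copy of each CID
            seen.add(cid)
            sections.append((cid, data))
    if commit_cid not in seen:
        raise ValueError("commit block must be included in the CAR slice")
    return CarSlice(roots=[commit_cid], sections=sections)

def read_car_slice(car: CarSlice) -> tuple[str, dict[str, bytes]]:
    """Reader side: tolerate duplicate blocks and extra roots; roots[0] is the commit."""
    if not car.roots:
        raise ValueError("CAR header has no roots")
    commit_cid = car.roots[0]  # any additional roots are ignored
    blocks: dict[str, bytes] = {}
    for cid, data in car.sections:
        blocks.setdefault(cid, data)  # duplicates collapse to the first copy
    if commit_cid not in blocks:
        raise ValueError("commit block referenced by roots[0] is missing")
    return commit_cid, blocks

slice_out = build_car_slice("c2", [("c2", b"commit"), ("n2", b"mst"), ("n2", b"mst")])
assert slice_out.roots == ["c2"]
assert len(slice_out.sections) == 2

# A sloppier producer: a duplicated block and an extra root.
incoming = CarSlice(roots=["c2", "x"], sections=[("c2", b"commit"), ("r3", b"rec"), ("r3", b"rec")])
commit, blocks = read_car_slice(incoming)
assert commit == "c2"
assert sorted(blocks) == ["c2", "r3"]
```

Keeping the first copy of a duplicated CID is safe only because content addressing means identical CIDs should carry identical bytes; a more defensive receiver might also re-hash each block and compare it against its CID.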
@dholms (Contributor) commented Nov 8, 2024:

We currently require that the CAR has only one root (though we could adapt the implementation).

Reply (Contributor, Author):

I think this is fine / consistent enough? I'd like other folks to handle multiple roots on the "processing a CAR slice" side, but not assume they can emit multiple roots on the "creating CAR slices" side, and I think this language achieves that?

IMO the next step would be a bit more flexibility in the receiving implementation, but this won't cause problems going out in the spec this way.

@bnewbold merged commit 901c193 into main on Nov 9, 2024.
@bnewbold deleted the bnewbold/repo-diffs branch on Nov 9, 2024, 07:16.
6 participants