
(PREVIEW BRANCH) all the new specs combined #120

Closed
wants to merge 29 commits

Conversation


@bnewbold bnewbold commented Jun 15, 2023

@bnewbold bnewbold marked this pull request as ready for review June 15, 2023 02:42
@bnewbold bnewbold marked this pull request as draft June 15, 2023 02:43
cloudflare-workers-and-pages bot commented Jun 15, 2023

Deploying with Cloudflare Pages

Latest commit: 9b4560a
Status: ✅ Deploy successful!
Preview URL: https://50b1fc1b.atproto-website.pages.dev
Branch Preview URL: https://bnewbold-draft-specs-june.atproto-website.pages.dev

@warpfork

There are quite a few links to www.notion.so/blueskyweb/* which are non-public -- if those docs live in Notion but are public, maybe you want to get their share links instead? Notion isn't very good about redirecting non-logged-in users to the public versions of pages if they arrive at the internal links.

@bnewbold
Contributor Author

@warpfork missed some, thanks! These don't link to actual documents, they were relative links between spec pages in Markdown which Notion mangled. Will resolve these in the individual PRs.

@bnewbold
Contributor Author

TODO(bnewbold): "The first entry in the array for a given node should contain the full key, and a common prefix length of 0." => "must", not "should"

Clarify that prefix compression is required and deterministic.
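The prefix compression being discussed can be sketched as follows. This is a hypothetical illustration (function names are not from any real atproto implementation): within a node, each entry stores only the suffix after the longest common prefix with the previous entry's full key, and the first entry must carry the full key with a prefix length of 0.

```python
# Hypothetical sketch of deterministic key prefix compression for MST node
# entries. Each entry is (prefix_len, suffix); the first entry in a node
# must have prefix_len 0 and carry the full key.

def compress_keys(keys: list[bytes]) -> list[tuple[int, bytes]]:
    entries = []
    prev = b""
    for i, key in enumerate(keys):
        if i == 0:
            plen = 0  # first entry: full key, prefix length must be 0
        else:
            plen = 0
            while plen < min(len(prev), len(key)) and prev[plen] == key[plen]:
                plen += 1
        entries.append((plen, key[plen:]))
        prev = key
    return entries

def decompress_keys(entries: list[tuple[int, bytes]]) -> list[bytes]:
    keys, prev = [], b""
    for plen, suffix in entries:
        key = prev[:plen] + suffix
        keys.append(key)
        prev = key
    return keys
```

Because the prefix length is always the longest common prefix with the previous key, two implementations compressing the same sorted key list produce byte-identical entries, which is what "required and deterministic" buys for content addressing.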


## Server-to-server API
**Application:** APIs and record schemas for applications built on atproto are specified in [Lexicons](/specs/lexicon), which are referenced by [Namespaced Identifiers](/specs/nsid) (NSIDs). Application-specific aggregations (such as search) are provided by an Application View (AppView) service. Clients can include mobile apps, desktop software, or web interfaces.
Contributor

Hmm! AppView is new to me. Sounds like it overlaps a lot with BGS. Is BGS the architectural concept, and AppView is more of a service/interface?

@devinivy devinivy Jun 17, 2023

The BGS is primarily a "dumb" rehoster of content across PDSes, whereas the AppView provides the views specific to the Bluesky application. The BGS doesn't need to have semantic awareness of the content it's hosting, whereas the AppView is interpreting "like", "follow", "post", "profile", etc. records into more complex views to service an application. Roughly speaking, BGS is concerned with com.atproto.sync.* and com.atproto.repo.* lexicons, and the Bluesky AppView is concerned with app.bsky.* lexicons.


## Possible Future Changes

The validation rules for unexpected additional fields may change. For example, there could be a mechanism for Lexicons to indicate that a schema is "closed" and unexpected fields are not allowed, or a convention of field name prefixes (`x-`) to indicate unofficial extensions.
@snarfed snarfed Jun 17, 2023

Just FYI, the standards community has generally moved away from experimental X- prefixes, and agreed that they caused more harm than good. https://datatracker.ietf.org/doc/html/rfc6648#appendix-B . (Also relevant to your idea of including experimental in new lexicon NSIDs.) Same with vendor prefixes more recently, https://www.webstandards.org/2012/02/09/call-for-action-on-vendor-prefixes/index.html

Contributor Author

Thanks for the refs! We do need to think through best practices here more carefully. I guess we think of the primary extension mechanism (new lexicons) as being pretty clear, but that might be because we are in the unusual position of having full control over the most popular lexicons. Other folks sure do seem to want to stuff fields into these lexicons, and we should have a better answer.


Unexpected fields in data which otherwise conforms to the Lexicon should be ignored. When doing schema validation, they should be treated at worst as warnings. This is necessary to allow evolution of the schema by the controlling authority, and to be robust in the case of out-of-date Lexicons.

Third parties can technically insert any additional fields they want into data. This is not the recommended way to extend applications, but it is not specifically disallowed. One danger with this is that the Lexicon may be updated to include fields with the same field names but different types, which would make existing data invalid.
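The lenient behavior the excerpt describes (unknown fields ignored, surfaced at worst as warnings, declared fields still type-checked) could be sketched like this. All names here are illustrative, not from any real atproto SDK:

```python
# Hypothetical sketch of lenient Lexicon-style validation: fields not
# declared in the schema are ignored (collected as warnings) rather than
# failing validation; declared fields are still type-checked strictly.

def validate_record(record: dict, schema_fields: dict) -> list[str]:
    warnings = []
    for name, value in record.items():
        if name not in schema_fields:
            # Unknown field: ignore, but note it for diagnostics.
            warnings.append(f"unknown field ignored: {name}")
            continue
        expected_type = schema_fields[name]
        if not isinstance(value, expected_type):
            raise TypeError(f"field {name!r}: expected {expected_type.__name__}")
    return warnings
```

This keeps old validators forward-compatible with newer schema revisions: a record written against a newer Lexicon still validates, and the extra fields merely produce warnings.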
Contributor

💯

Contributor

...also one thing that's missing here is guidance on whether PDSes should accept records with unknown lexicons, and if so, how they should validate them, etc. Depends on lexicon resolution I guess?

Contributor

@pfrazee implied/said yes in bluesky-social/atproto#855:

> the goal is to make the PDS indifferent to application schemas and provide generic behaviors only; that way adding some new schema doesn't run the risk of losing forward progress

Contributor Author

Yeah, we definitely want PDS implementations to be more agnostic to the lexicons of data they host. Right now, in prod, we still have PDS+AppView conjoined, which is why we haven't loosened up on this yet.

We wanted to build a bit more confidence here by first actually deploying federation with the app.bsky lexicons, running PDS+BGS+AppView as distinct network services (a slow-roll transition happening right now), and probably by building an example separate/independent lexicon ourselves as a secondary app to see how that goes, before committing to this entire architecture. That does take time, though, and we should probably just get out of the way and let devs experiment in other spaces. One step would be to allow arbitrary un-validated records outside the com.atproto and app.bsky spaces, keeping those two on a shorter leash to start.

We also haven't worked out the "fetch Lexicon automatically over the network given the NSID" flow. There are a couple of options for how to do that, none super complex, but it isn't clear whether that will be reliable enough to require fetching and validation for new record types, or whether we'll end up with a scheme where PDS instances have some "popular" or "common" set of lexicons built-in (compiled-in?) and work with just those.


Note that very long handles cannot be resolved using this method if the additional `_atproto.` name segment pushes the overall name over the 253-character maximum for DNS queries. The HTTPS method will work for such handles.

DNSSEC is not required.
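The length constraint in the excerpt can be checked up front. A minimal sketch, assuming the `_atproto.` prefix and 253-character DNS limit described above (the function name is illustrative):

```python
# Hypothetical sketch: a handle resolved via DNS gets an "_atproto." label
# prepended, and the resulting name must fit within the 253-character DNS
# name limit. Handles that don't fit must fall back to the HTTPS method.

MAX_DNS_NAME = 253
DNS_PREFIX = "_atproto."

def can_resolve_via_dns(handle: str) -> bool:
    return len(DNS_PREFIX + handle) <= MAX_DNS_NAME
```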
Contributor

👍 😭

Contributor Author

Yeah... sort of poking the hornet's nest there.

maybe we should require IPv6 though? 😈

```
@@ -92,11 +92,68 @@ laptop.local
blah.arpa
```

### Usage and Implementation Guidelines
## Handle Resolution
@snarfed snarfed Jun 17, 2023

Thanks! This is all great.

As mentioned before, definitely questions here around how this interacts with federation, but that's for the future. And ideally guidance on when/how to deal with handles in records as alternatives to DIDs. (Sounds like consumers are generally expected to prefer DIDs but accept both interchangeably.)

| Type | HTTP Method |
|-|-|
|`query`|`GET`|
|`procedure`|`POST`|

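The mapping above is small enough to express directly. A minimal sketch (names are illustrative, not from any real atproto SDK) of dispatching a lexicon method type to its HTTP verb:

```python
# Sketch of the XRPC mapping shown above: "query" lexicon methods use
# HTTP GET, "procedure" methods use POST; anything else is rejected.
XRPC_HTTP_METHOD = {"query": "GET", "procedure": "POST"}

def http_method_for(lexicon_type: str) -> str:
    try:
        return XRPC_HTTP_METHOD[lexicon_type]
    except KeyError:
        raise ValueError(f"unsupported XRPC method type: {lexicon_type}")
```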
### Admin Token
Contributor

Interesting! I'm guessing this is primarily a legacy holdover, so it's more descriptive of the current prod PDS than normative, right? Otherwise it seems like this could be an implementation detail that doesn't really need to be standardized, or if it does, maybe it would fit better in the protocol or repository docs?

Contributor Author

Yeah, all the auth stuff is sort of temporary and we expect to improve a lot on this. But that isn't going to be a quick fix, and we want to be transparent and clear how the existing scheme works in the meanwhile.

We do expect to have some of the other things, like repo history and sync stream, firmed up sooner than auth, which is why we just don't include those at all yet.


If a client cannot keep up with the rate of messages, the server may send a "too slow" error and close the connection.

### Sequence Numbers
Contributor

Oof, this is a lot. Sure would be nice if we could get some of it for free by switching the namespace from PDS hostname + endpoint to user DID + collection and then depending on monotonically increasing TIDs instead. I guess that would mean servers and clients would have to keep track of O(users) sequence TIDs instead of just O(servers/clients + endpoints) sequence numbers though.

Contributor Author

This is under pretty active discussion internally, and with a few external tech advisors. Hard to compress all the conversation to a short summary. Also, realistically, this is fairly high up in the protocol stack and the sort of thing that might have multiple implementations over time. Eg, internally and at scale, can't really imagine not using Kafka for the big firehoses; maybe we'll have a formal "how to do this in Kafka within an org/consortium" spec or recommendation.

Contributor Author

Also, clarified that the sequence stuff is optional. Can totally create subscription Lexicons which don't include it.

@bnewbold
Contributor Author

Thanks for feedback from everybody on this group of changes!

Everything in this branch has now been merged and deployed to the live website. Future corrections should go in new issues or PRs.

There are some active discussions here, and this branch was linked from a couple places. I'll leave it for another day or two then close the PR (which IIUC will remove the PR preview site).

@bnewbold bnewbold closed this Jun 20, 2023