Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record references in lexicon typed as unknown #999

Open
Matrix89 opened this issue May 8, 2023 · 10 comments
Open

Record references in lexicon typed as unknown #999

Matrix89 opened this issue May 8, 2023 · 10 comments

Comments

@Matrix89
Copy link
Contributor

Matrix89 commented May 8, 2023

According to feed/defs.json post view's record type is unknown. But the API calls which return post view, have the record $type set to app.bsky.feed.post.

Is this intentional or am I missing something about the implementation?

Couldn't postView#record be typed as { type: "recordRef", schema: "app.bsky.feed.post" }?

Edit:
Just to add, having them typed would allow for better codegen

@devinivy
Copy link
Collaborator

devinivy commented May 8, 2023

I agree that it would be convenient to have concrete types for records embedded in responses like this. The issues we run into here are that a. technically users have the ability to publish invalid records in their repository, and b. there's the possibility that the schema for a record (for example) expands over time. It should do this in such a way that it remains both forward- and backward-compatible, but there are still lots of difficult cases that can arise or the schema author may not perfectly follow that rule.

So programs ingesting raw record data should make sure they understand the semantics of the records that they're consuming when choosing how to deal with them. The way we typically handle this is by validating the record contents based on our application's version of the schema for the record. At that point the application can start to use a more specific type that it's prepared to handle, or decide to use some fallback behavior if it doesn't understand the record data it received.

@Matrix89
Copy link
Contributor Author

Matrix89 commented May 9, 2023

technically users have the ability to publish invalid records in their repository

I don't think that's a problem, more of a consideration that should be kept in mind when implementing deserializers.

there's the possibility that the schema for a record (for example) expands over time. It should do this in such a way that it remains both forward- and backward-compatible, but there are still lots of difficult cases that can arise or the schema author may not perfectly follow that rule.

I see that, without record schema versioning it sounds impossible. I think It could be solved either by:

  • Adding some kind of version indicator, but that makes devs responsible for properly incrementing the version and it doesn't address the backward compatibility issue completely
  • Storing schemas in a content addressable fashion(IPLD?), but that doesn't sound dev friendly

In summary, the issue can't be easily solved and all of the solutions I could think of have some drawbacks.

Now I have one more question, what is the meaning of unknown? It sounds like it could refer to any lexicon type not only records

@devinivy
Copy link
Collaborator

devinivy commented May 9, 2023

Yeah unknown can be thought of as the type Record<string, unknown> in typescript: it's guaranteed to be an object (i.e. not an array or primitive).

@tom-sherman
Copy link

tom-sherman commented Jun 4, 2023

Yeah unknown can be thought of as the type Record<string, unknown> in typescript: it's guaranteed to be an object (i.e. not an array or primitive).

This is not correct. unknown, along with any, is one of TypeScript's "top types" ie. it describes the set of all possible types. Therefore it's not safe to assume anything about the underlying value as it could be anything including primitives, arrays, or null.

If the semantics that the library is aiming for is "any object" then Record<string, unknown> is indeed the correct type.

@dholms
Copy link
Collaborator

dholms commented Jun 6, 2023

hey @tom-sherman, it may not have come across clearly, but what @devinivy was saying was that unknown in lexicon can be thought of as the type Record<string, unknown> in typescript

However, you're correct in your comment here that I believe we're generated the wrong type there by generating {} (any non-nullish value) instead of Record<string, unknown>

@dholms
Copy link
Collaborator

dholms commented Jun 6, 2023

important to note that unknown comes with one additional requirement that the contained object has a $type key on it

an unknown in lexicon is the same thing as an open union with no refs provided

@tom-sherman
Copy link

tom-sherman commented Jun 6, 2023

Yeah sorry about that, I think I got a little mixed up between those two threads!

unknown maybe can be best represented as {$type: string} & Record<string, unknown> then?

@dholms
Copy link
Collaborator

dholms commented Jun 7, 2023

yup that sounds accurate 👍

@str4d
Copy link

str4d commented Jul 28, 2024

@dholms wrote:

an unknown in lexicon is the same thing as an open union with no refs provided

This does match the specification for union (though the spec says "similar to", not "the same as".

important to note that unknown comes with one additional requirement that the contained object has a $type key on it

This does not match my reading of the specification for unknown (emphasis added):

Indicates than any data could appear at this location, with no specific validation. Note that the data must still be valid under the data model: it can't contain unsupported things like floats.

No type-specific fields.

as well as the specification for when to use $type, which states that $type is only required for record and union (emphasis added):

Data objects sometimes include a $type field which indicates their Lexicon type. The general principle is that this field needs to be included any time there could be ambiguity about the content type when validating data.

The specific rules are:

  • record objects must always include $type. While the type is often known from context (eg, the collection part of the path for records stored in a repository), record objects can also be passed around outside of repositories and need to be self-describing
  • union variants must always include $type, except at the top level of subscription messages

Note that blob objects always include $type, which allows generic processing.

As a reminder, main types must be referenced in $type fields as just the NSID, not including a #main suffix.

If it is the intention that unknown is always an object and always contains a $type key, it would be great to add this to the specification for unknown. It would also be useful to describe how unknowns should be generally handled, so that implementors don't presume they can automatically parse as records.

If instead the unknown spec is correct, that it can be any (data-model-valid) type, and if an object is not required to have a $type field, then the union documentation should be updated to clarify that a union with no refs is a subset of unknown in that it only supports objects and requires $type on them.

The reason I came across this is that I've found at least one Lexicon that violates @dholms's assertion: com.atproto.identity.getRecommendedDidCredentials. The verificationMethods and services fields of its response have type unknown, but do not include $type fields in their objects. I encountered this because currently atrium parses unknown fields as records (because many of them are), which currently forces $type to exist.

@bnewbold
Copy link
Collaborator

Unfortunately this is still an soft spot and under-specified. We have a tracking issue here: bluesky-social/atproto-website#319

which references a discussion where other questions come up: #2615

The current "vibes" (not a spec!) are:

  • unknown data should probably be either null or an object, not (eg) a string. not finalized though!
  • unknown objects don't need to have $type

Something like a DID document, or fragment of a DID document, is basically exactly why we have unknown in Lexicon: an object with arbitrary fields which might be hard to spec out within the Lexicon schemas and evolution rules, and lack a $type field.

Maybe we should add an additional lexicon type for arbitrary record! which would have to be an object and would need to have $type. that pattern shows up in a couple places and would probably be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants