Proposal to allow defining schema format other than default one (AsyncAPI Schema) #622

Closed
Tracked by #944
magicmatatjahu opened this issue Sep 10, 2021 · 57 comments
Labels
stale 💭 Strawman (RFC 0) RFC Stage 0 (See CONTRIBUTING.md)

Comments

@magicmatatjahu
Member

magicmatatjahu commented Sep 10, 2021

Introduction

Currently we can define a schema in a format other than the default (AsyncAPI Schema) only in the message's payload field (#528). The user cannot define a schema in a different format for the message's headers field, for fields in bindings (see issue asyncapi/avro-schema-parser#67), or in extensions.

Proposal

My proposal is mainly based on extending the Schema Object to a union type:

  • the current form
  • a second form with schema and schemaFormat fields, so that the following two examples are equivalent:
components:
  schemas:
  
    schemaWithFormat:
      schemaFormat: 'application/vnd.aai.asyncapi;version=2.1.0'
      schema:
        type: object
        required:
        - name
        properties:
          name:
            type: string
              
    plainSchema:
      type: object
      required:
      - name
      properties:
        name:
          type: string

In addition, we should define a defaultSchemaFormat field at the root of the document, similar to defaultContentType. Any schema that does not define the schemaFormat field will be treated as having the format indicated by defaultSchemaFormat.

asyncapi: 3.0.0
info: ...

defaultSchemaFormat: 'application/vnd.apache.avro;version=1.9.0'

components:
  schemas:
    mySchema: # infer `schemaFormat` from `defaultSchemaFormat`
      type: record
      doc: User information
      fields:
        - name: displayName
          type: string

This solution also makes it easier to define schemas that have a string format (GraphQL types, Protobuf or XSD):

components:
  schemas:
    # Schema described by protobuf
    mySchema:
      schemaFormat: ...protobuf
      schema: |
        message Person {
          required string name = 1;
          required int32 id = 2;
          optional string email = 3;
        }

NOTE: Currently (in versions 2.0.0 and 2.1.0) you can define such a format, but only in the message's payload.

How will this work with references? Very simply. Imagine two files, the main one and one with models, e.g. with Protobuf:

# models.yaml
components:
  schemas:
    protobuf:
      schemaFormat: ...protobuf
      schema: |
        message Person {
          required string name = 1;
          required int32 id = 2;
          optional string email = 3;
        } 
      

# asyncapi.yaml
asyncapi: 3.0.0
info: ...

channels:
  avroExample:
    publish:
      message:
        payload:
          $ref: '#/components/schemas/avroSchema'
  protobufExample:
    publish:
      message:
        payload:
          $ref: './models.yaml#/components/schemas/protobuf'
  protobufExampleWithSchemaFormat:
    publish:
      message:
        payload:
          schemaFormat: ...protobuf
          schema:
            $ref: './models.yaml#/components/schemas/protobuf/schema'

components:
  schemas:
    avroSchema:
      schemaFormat: 'application/vnd.apache.avro;version=1.9.0'
      schema:
        type: record
        doc: User information
        fields:
          - name: displayName
            type: string

Additionally, custom formats can also be used in bindings and extensions (example taken from the asyncapi/avro-schema-parser#67 issue):

asyncapi: '3.0.0'
info: ... 

channels:
  example:
    subscribe:
      summary: Get updated products
      message:
        $ref: "#/components/messages/exampleMessage"

components:
  messages:
    exampleMessage:
      payload:
        schemaFormat: 'application/vnd.apache.avro;version=1.9.0'
        schema:
          $ref: "SampleRecord.avsc" 
      bindings:
        kafka:
          key: 
            schemaFormat: 'application/vnd.apache.avro;version=1.9.0'
            schema: 
              $ref: "SampleRecord.avsc" 

x-extension:
  needSchema:
    schemaFormat: 'application/vnd.apache.avro;version=1.9.0'
    schema: 
      $ref: "SampleRecord.avsc" 

# SampleRecord.avsc
{
  "namespace": "sample",
  "type": "record",
  "name": "SampleRecord",
  "version": 1,
  "fields": [
    {"name": "NoUnionField", "type": { "type": "string",  "avro.java.string": "String" }, "doc": "Any doc"},
    {"name": "UnionField", "type": ["null", { "type": "string",  "avro.java.string": "String" }], "doc": "any doc"}
  ]
}

Parser implementation

Are we facing a breaking change in the parser and the corresponding Schema model? Surprisingly, no. Why? Every function in a Schema instance will first check whether it is dealing with a schema that has schema and schemaFormat fields, and fall back appropriately. I know this is an implementation detail, but here is a simple example:

class Schema extends Base {
  ...

  /**
   * @returns {number}
   */
  maximum() {
    return this.retrieveValue('maximum');
  }

  /**
   * @param {string} key - Name of field.
   */
  retrieveValue(key) {
    // we have case with `schema` and `schemaFormat` field
    if (this._json.schemaFormat && this._json.schema) {
      return this._json.schema[key];
    }
    return this._json[key];
  }
}

Obviously this will need to be optimized :)

Notes

  • The schemaFormat field at message level will be deprecated.
  • A new field, defaultSchemaFormat, will be added to the root of the spec.
  • It may happen that one of the custom formats treats schemaFormat and schema as keywords - then we can use $schemaFormat, but I don't think it's necessary.
  • The parser should look for all the places where schemas are used in this second option (with schemaFormat) and transform them.
  • The proposal doesn't solve the problem of referencing deep values in a custom schema format (issue #216, "[2.0.0 REVIEW] Clarify usage of JSON References (/Pointers) for non-JSON data structures"); please see the update below.
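
The "look for all the places where schemas are used and transform them" note could be sketched roughly as below. Everything here is an assumption for illustration: the function name transformSchemas, the parsers registry keyed by schemaFormat, and the choice to visit only components.schemas plus message payload and headers. It is not how the actual parser is implemented.

```javascript
// Hypothetical sketch; none of these names come from the AsyncAPI parser.
// `parsers` maps a schemaFormat string to a function that converts the raw
// schema (object or string) into an AsyncAPI Schema object.
function transformSchemas(doc, parsers) {
  const fallback = doc.defaultSchemaFormat;

  const transform = (node) => {
    const wrapped =
      node !== null &&
      typeof node === 'object' &&
      'schemaFormat' in node &&
      'schema' in node;
    const format = wrapped ? node.schemaFormat : fallback;
    const raw = wrapped ? node.schema : node;
    const parse = parsers[format];
    // Unknown or missing format: leave the node exactly as it was.
    return parse ? parse(raw) : node;
  };

  // Work on a deep copy so the input document is not mutated.
  const out = JSON.parse(JSON.stringify(doc));
  const components = out.components || {};

  for (const [name, schema] of Object.entries(components.schemas || {})) {
    components.schemas[name] = transform(schema);
  }
  for (const message of Object.values(components.messages || {})) {
    if ('payload' in message) message.payload = transform(message.payload);
    if ('headers' in message) message.headers = transform(message.headers);
  }
  return out;
}
```

A real implementation would also have to visit bindings, extensions and inline channel messages, and resolve $ref values before dispatching.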

Update

The above proposal was not intended to support references/pointers for non-JSON data structures, but after reading Jesse's comment (thanks @jessemenning), I realized that it could also support references (aka pointers) to objects in non-JSON/YAML schemas. In addition to the schema and schemaFormat fields, we should also allow a schemaRef field, which is a reference too, but into the schema defined in schema. This is a crucial difference from $ref in JSON/YAML. I will use some examples to illustrate the solution.

Avro example

Please forgive me, I am not an expert in Avro (I could even say that I do not know it at all). Based on the information I found on the Internet, references to nested objects are written using namespaces separated by dots.

// avro schema - avro.avsc file
{
 "type": "record",
 "name": "Pet",
 "namespace": "com.intergral.example.avro",
 "fields": [
   {
     "name": "name",
     "type": "string"
   },
   {
     "name": "toys",
     "type": {
       "type": "array",
       "items": {
         "type": "record",
         "name": "Toy",
         "namespace": "com.intergral.example.avro",
         "fields": [
           {
             "name": "name",
             "type": "string"
           },
           {
             "name": "price",
             "type": "long"
           }
         ]
       },
       "java-class": "java.util.List"
     }
   }
 ]
}

To retrieve the Toy record using JSON references, we would do something like this:

# asyncapi.yaml
asyncapi: 3.0.0
info: ...

channels:
  avroExample:
    publish:
      message:
        schemaFormat: 'application/vnd.apache.avro;version=1.9.0'
        payload:
          $ref: './avro.avsc#/fields/1/type/items'
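
For context, the part of the $ref after # is a JSON Pointer, which walks the parsed document by key and array index. Below is a small sketch of that lookup, assuming the RFC 6901 rules for ~0 and ~1 escapes; the function name resolveJsonPointer and the trimmed-down pet object are made up for the example.

```javascript
// Hypothetical helper illustrating JSON Pointer lookup (RFC 6901).
function resolveJsonPointer(document, pointer) {
  if (pointer === '') return document; // empty pointer means the whole document
  if (!pointer.startsWith('/')) {
    throw new Error(`Invalid JSON Pointer: ${pointer}`);
  }
  return pointer
    .slice(1)
    .split('/')
    // Unescape ~1 before ~0, as the RFC requires.
    .map((token) => token.replace(/~1/g, '/').replace(/~0/g, '~'))
    .reduce((node, token) => {
      if (node === null || typeof node !== 'object' || !(token in node)) {
        throw new Error(`Cannot resolve "${token}" in ${pointer}`);
      }
      return node[token];
    }, document);
}

// Trimmed-down version of the Pet schema from avro.avsc above.
const pet = {
  type: 'record',
  name: 'Pet',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'toys', type: { type: 'array', items: { type: 'record', name: 'Toy' } } }
  ]
};
console.log(resolveJsonPointer(pet, '/fields/1/type/items').name);
```

Note that such a pointer depends on the position of the toys field, so reordering the fields in the .avsc file would silently change what it points to.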

With the new schemaRef field, we would write this:

# asyncapi.yaml
asyncapi: 3.0.0
info: ...

channels:
  avroExample:
    publish:
      message:
        payload:
          schemaFormat: 'application/vnd.apache.avro;version=1.9.0'
          schema: 
            $ref: "./avro.avsc" 
          schemaRef: "com.intergral.example.avro.Toy"

Protobuf example

package my.org;

enum MyEnum {
  UNKNOWN = 0;
  STARTED = 1;
  RUNNING = 2;
}

message Outer {
  message Inner {
      int32 test = 1;
  }
  MyEnum enum_field = 9;
}

Using the new schemaRef field:

# asyncapi.yaml
asyncapi: 3.0.0
info: ...

channels:
  protoExample:
    publish:
      message:
        payload:
          schemaFormat: ...protobuf
          schema: 
            $ref: "./example.proto" 
          schemaRef: "Outer.Inner" # resolve 

Remarks:

  • $ref works as it did for JSON/YAML schemas

  • People often don't know this, but $ref can reference files that are not JSON/YAML. The content of such a file is then treated as a regular string (which is also a valid JSON value).

  • Responsibility for resolving references/pointers moves to the custom parsers. (Please note that currently we can define schemas other than JSON only in messages, but with my proposal we can define them everywhere.) After parsing the schema, the custom parser should also extract the item pointed to by the schemaRef field (following the syntax of the given schema format) and "inject" it into the schema object, so (based on the Avro example) the resolved schema will be:

    asyncapi: 3.0.0
    info: ...
    
    channels:
      avroExample:
        publish:
          message:
            payload:
              type: object
              properties: 
                name: 
                  type: string
                price:
                  type: string
                  format: long
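
As a rough illustration of the "extract the reference pointed to by schemaRef" step for Avro, a custom parser could search the parsed .avsc for a named type by its full name. This sketch assumes the Avro rule that a full name is the namespace plus the type name, with nested types inheriting the enclosing namespace, so field names such as toys are not part of it. The function name findAvroNamedType is hypothetical and the conversion to an AsyncAPI schema is left out.

```javascript
// Hypothetical helper: find a named Avro type (record, enum, fixed) by its
// full name inside an already parsed .avsc document.
const NAMED_TYPES = new Set(['record', 'enum', 'fixed']);

function findAvroNamedType(schema, fullName, enclosingNamespace = '') {
  if (Array.isArray(schema)) {
    // Union: search every branch.
    for (const branch of schema) {
      const found = findAvroNamedType(branch, fullName, enclosingNamespace);
      if (found) return found;
    }
    return undefined;
  }
  if (schema === null || typeof schema !== 'object') return undefined;

  let namespace = enclosingNamespace;
  if (NAMED_TYPES.has(schema.type) && typeof schema.name === 'string') {
    const lastDot = schema.name.lastIndexOf('.');
    if (lastDot !== -1) {
      namespace = schema.name.slice(0, lastDot); // dotted name carries its namespace
    } else if (typeof schema.namespace === 'string') {
      namespace = schema.namespace;
    }
    const simpleName = schema.name.slice(lastDot + 1);
    const qualified = namespace ? `${namespace}.${simpleName}` : simpleName;
    if (qualified === fullName) return schema;
  }

  // Descend into record fields, array items, map values and inline types.
  const children = [
    ...(Array.isArray(schema.fields) ? schema.fields.map((f) => f.type) : []),
    schema.items,
    schema.values,
    typeof schema.type === 'object' ? schema.type : undefined
  ];
  for (const child of children) {
    const found = findAvroNamedType(child, fullName, namespace);
    if (found) return found;
  }
  return undefined;
}
```

Given the Pet schema above, findAvroNamedType(petSchema, 'com.intergral.example.avro.Toy') would return the nested Toy record, which the custom parser could then convert into the resolved payload.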

Any feedback is more than welcome :)

@derberg
Member

derberg commented Sep 13, 2021

I like it. It is really a pain that you can only use AsyncAPI schema in components. Real pain for people using Avro and other formats.

This solution also allows us to define schemas that have a string format (GraphQL types or Protobuf)

I didn't really get this, how is this proposal solving this thing? It could be done with the current document structure too, right? Not sure if this is not adding unneeded complexity to the proposal, or I'm not getting something

Are we facing a breaking change in the parser and the corresponding Schema model

just keep in mind this is still a spec-breaking change. Cool that on the parser side we won't complicate things

@magicmatatjahu
Member Author

magicmatatjahu commented Sep 13, 2021

@derberg Thanks for the comment!

I didn't really get this, how is this proposal solving this thing? It could be done with the current document structure too, right? Not sure if this is not adding unneeded complexity to the proposal, or I'm not getting something

At the moment you can only define a custom format in the message's payload. Instead of the schema and schemaFormat fields, we could define the schemaFormat field at the level of the schema itself, like this:

components:
  schemas:
  
    schemaWithFormat:
      schemaFormat: 'application/vnd.aai.asyncapi;version=2.1.0'
      type: object
      required:
      - name
      properties:
        name:
          type: string

But that only works with schemas that can be written in JSON/YAML. The problem is with "string" formats (with their own SDL) like GraphQL or Protobuf. You cannot do this (because you would be mixing a JSON/YAML field with a string):

components:
  schemas:
  
    schemaWithFormat: |
      schemaFormat: ...protobuf
      message Person {
          required string name = 1;
          required int32 id = 2;
          optional string email = 3;
        } 

I originally described this problem here -> #528 (comment)

So as you can see, allowing a schema to be defined in two ways, with and without the schema and schemaFormat fields, solves this problem.

just keep in mind this is still a spec-breaking change. Cool that on the parser side we won't complicate things

Yeah, it's a spec-breaking change, but in the sentence

Are we facing a breaking change in the parser and the corresponding Schema model

I had in mind a breaking change on the parser side.

@derberg
Member

derberg commented Sep 14, 2021

Sorry but I still don't get how enabling support for GraphQL and Protobuf is better:

payload:
      schemaFormat: ...protobuf
      schema: |
        message Person {
          required string name = 1;
          required int32 id = 2;
          optional string email = 3;
        } 

vs

schemaFormat: ...protobuf
payload:
      schema: |
        message Person {
          required string name = 1;
          required int32 id = 2;
          optional string email = 3;
        } 

I just want to make sure whether this proposal is really the only way to do it, or whether it is already doable and just a side effect, for the clarity of the proposal.

@magicmatatjahu
Member Author

Maybe I misspoke in my previous comments, but what I meant (in the GraphQL and Protobuf support) was that without separating the schema into schema and schemaFormat fields it is not (and would not be) possible to use the mentioned schemas, I also gave an example:

components:
  schemas:
  
    schemaWithFormat: |
      schemaFormat: ...protobuf # how to support it?
      message Person {
          required string name = 1;
          required int32 id = 2;
          optional string email = 3;
        } 

In addition, in the current version we can use Protobuf and GraphQL, but only for the message's payload. My proposal allows you to use such schemas wherever you want, and more easily.

I will update the relevant line about supporting string schemas to avoid misunderstandings.

@derberg
Member

derberg commented Sep 14, 2021

@magicmatatjahu now I got it, thanks for your patience. You were referring only explicitly again to schemas from components. This is why I was confused, cause we could already do it for schemas in message 😄 Sorry for the confusion, and missing it from the examples. Please just update description, add a kind of note where you mention graphql and protobuf that atm this would be possible but only for schemas (payload) under the message but not under components directly.

Great proposal! I think this approach could also be suggested in question asked by Fran, on how to use kind and $ref in the server (haven't check the proposal yet, only saw a dedicated message in slack), just have servers["myserver"].server.$ref and servers["myserver"].kind -> but yeah, completely different topic, brain dump from my side, let us not pollute this proposal with it :D

@magicmatatjahu
Member Author

@derberg No worries :) You had patience with me at our previous job, I have patience with you now :trollface: I updated the description of the Protobuf example.

Great proposal! I think this approach could also be suggested in question asked by Fran, on how to use kind and $ref in the server (haven't check the proposal yet, only saw a dedicated message in slack), just have servers["myserver"].server.$ref and servers["myserver"].kind -> but yeah, completely different topic, brain dump from my side, let us not pollute this proposal with it :D

Yeah, it's one of the solutions, but I also suggested another (as a comment) that solves that problem in a different, probably easier, way :)

@magicmatatjahu
Member Author

@derberg I extended the proposal to support references to nested non-JSON schema objects. Please see the Update section; it should interest you :)

@smoya
Member

smoya commented Oct 12, 2021

@magicmatatjahu one question:

In the example you wrote about retrieving a schema from avro using schemaRef, the value you wrote is:

schemaRef: "com.intergral.example.avro.Toy"

Wouldn't it be:

schemaRef: "com.intergral.example.avro.toys.Toy"

Note the toys level before Toy.

If it isn't, would you mind clarifying it for me, please?

Thank you!

@magicmatatjahu
Member Author

magicmatatjahu commented Oct 12, 2021

@smoya Thanks for your comment! I'll be honest, I don't know avro at all (I even gave a comment about it in proposal), but I found an example I used and it's there - https://www.nerd.vision/post/reusing-schema-definitions-in-avro Maybe your example is actually correct 🤷 We would need someone who knows avro :)

Whether my example or yours is correct, doesn't matter, it seems to me that the idea with schemaRef was understood :) The reference itself should have the same syntax that the format allows.

@smoya
Member

smoya commented Oct 12, 2021

@smoya Thanks for your comment! I'll be honest, I don't know avro at all (I even gave a comment about it in proposal), but I found an example I used and it's there - https://www.nerd.vision/post/reusing-schema-definitions-in-avro Maybe your example is actually correct 🤷 We would need someone who knows avro :)

Whether my example or yours is correct, doesn't matter, it seems to me that the idea with schemaRef was understood :) The reference itself should have the same syntax that the format allows.

Yeah, the idea is totally understood, but since I have no idea about Avro either, I wanted to understand how the schemaRef field would work and what kind of ref path we would find in there.

@jessemenning

@magicmatatjahu, for clarity, could you walk through how this proposal would address the 4 XSD use cases in my comment in #624?

@magicmatatjahu
Member Author

@jessemenning Thanks for the comment!

  1. I want to have entire .xsd imported into AsyncAPI, as a string:
components:
  messages:
    UserSignedUp:
      payload:
        schemaParse: false
        schema:
          $ref: ./some_xsd.xsd
      contentType: "application/xml"

Here we can add a schemaParse field that tells the parser not to parse the schema and to just import it as a string without any additional operations.

  2. I just want a pointer from AsyncAPI to a schema registry/file, not bringing in the whole thing (maybe because it's huge):

Here we can use an extension, as we talked about on Slack, or use a new $remoteRef field:

components:
  messages:
    UserSignedUp:
      payload:
        x-remote-ref: "https://example.com/myschema.xsd"
      contentType: "application/xml"

components:
  messages:
    UserSignedUp:
      payload:
        $remoteRef: "https://example.com/myschema.xsd"
      contentType: "application/xml"

I remember I wrote to you on Slack that using x-remote-ref (or x-payload-remote-ref) in a schema would be a bad idea, but now I don't see such problems, because in JSON Schema we can add additional keywords that don't affect validation. However, if there are problems (e.g. with parsing), we could use a construction like this:

components:
  messages:
    UserSignedUp:
      payload:
        schemaParse: false
        schema:
          $remoteRef: "https://example.com/myschema.xsd"
          # or
          x-remote-ref: "https://example.com/myschema.xsd"
      contentType: "application/xml"

I hope you understand :)

  3. Provide a pointer to a particular element:
components:
  messages:
    UserSignedUp:
      payload:
        schemaParse: false
        schemaRef: "/PurchaseOrder"
        schema:
          $remoteRef: "https://example.com/myschema.xsd"
          # or
          x-remote-ref: "https://example.com/myschema.xsd"
      contentType: "application/xml"
  4. Pointer to a particular element if importing the schema:
components:
  messages:
    UserSignedUp:
      payload:
        schemaRef: "/PurchaseOrder"
        schema:
          $ref: ./some_xsd.xsd
      contentType: "application/xml"

I remember I proposed $pointer in our Slack conversation, but in the end I decided to use the schemaRef field.

If you have questions, please ask! :)

@zmt-Eason

I think it's a great proposal, and it also considers various scenarios. I am looking forward to seeing this proposal adopted in the next version of AsyncAPI.

@GeraldLoeffler
Contributor

GeraldLoeffler commented Nov 22, 2021

This proposal would also allow the definition of mixed schemas, such as the one for the payload of the UserSignedUp message in the following example (which is currently not possible):

components:
  schemas:
    UserSchema:
      $ref: user.raml
    TimestampSchema:
      type: object
      properties:
        timestamp:
          type: string
          format: date-time
  messages:
    SignupUser:
      contentType: application/json
      schemaFormat: application/raml+yaml;version=1.0
      payload:
        $ref: "#/components/schemas/UserSchema"
    UserSignedUp:
      contentType: application/json
      schemaFormat: ??? no unique value possible here ???
      payload:
        allOf:
          - $ref: "#/components/schemas/TimestampSchema"
          - $ref: "#/components/schemas/UserSchema"

@jonaslagoni
Sponsor Member

jonaslagoni commented Nov 22, 2021

Based on this discussion, I am a bit "scared" that tooling will have a hard time implementing the expected behavior. I created a separate issue, as it is only partly related to this: #656

@magicmatatjahu
Member Author

magicmatatjahu commented Nov 22, 2021

@GeraldLoeffler With my proposal we should be able to do:

components:
  schemas:
    UserSchema:
      schemaFormat: application/raml+yaml;version=1.0
      schema:
        $ref: 'user.raml'
    TimestampSchema:
      type: object
      properties:
        timestamp:
          type: string
          format: date-time
  messages:
    SignupUser:
      contentType: application/json
      payload:
        $ref: "#/components/schemas/UserSchema"
    UserSignedUp:
      contentType: application/json
      payload:
        allOf:
          - $ref: "#/components/schemas/TimestampSchema"
          - $ref: "#/components/schemas/UserSchema"

Thanks for the great example. While writing this proposal I didn't look at nested custom schemas at all, but as you can see it would be possible to support them 😅

@magicmatatjahu
Member Author

Sorry guys. I don't have time to champion this right now. Feel free to pick it up.

@GreenRover
Collaborator

As I said: I want to be the champion (for #622, #216, #881), and I am already trying to push it as hard as I can.
I created a PR for this issue: #910.

@GreenRover
Collaborator

I was thinking twice about the "AsyncAPI Schema object".
If I remove this again, that would allow:

  • having avro schema for header
  • having avro schema for parameters

And the complexity will grow:

messageId: userSignup
name: UserSignup
title: User signup
summary: Action to sign a user up.
description: A longer description
contentType: application/json
tags:
  - name: user
  - name: signup
  - name: register
headers:
  schemaFormat: application/vnd.aai.asyncapi;version=3.0.0
  schema:
    type: object
    properties:
      correlationId:
        description: Correlation ID set by application
        type: string
      applicationInstanceId:
        description: Unique identifier for a given instance of the publishing application
        type: string
payload:
  schemaFormat: application/vnd.aai.asyncapi;version=2.2.0
  schema:
    type: object
    properties:
      schema:
        user:
          $ref: "#/components/asyncApiSchemas/userCreate"
        signup:
          $ref: "#/components/asyncApiSchemas/signup"

Do we really want that?

@derberg
Member

derberg commented Mar 16, 2023

Is it a problem with headers? Is there no use case for using Avro there?

In the case of parameters, we discussed that on one call and the conclusion was that it is not a problem, because we need to change parameters anyway: not allow JSON Schema there, and go in the server variables direction instead.

@GreenRover
Collaborator

I see no use case for Avro-based headers. Headers are always serialized by the messaging solution (I know of no product where this is different). The point of supporting Avro schemas is that the message payload is Avro-serialized and you don't want a schema mapping.

I would vote for allowing custom schemas only for the payload:

  • I see no use case for, e.g., Avro in headers or parameters
  • it will keep schemas simpler (one level less) and easier to read

@derberg
Member

derberg commented Mar 30, 2023

I see no use case for avro based header. Headers are always (i know none products where it would be different) serialized by the messaging solution.

Hard for me to clarify really as before only AsyncAPI Schema was used anyway

@dalelane can you have a look: Avro schema for headers?

@fmvilas mentioned at the last 3.0 meeting (https://youtu.be/O4TQWBEaXy0?t=92) that he knows people doing it.

As a last resort, we can still reuse the new Schema object for payload and headers, but specify that in headers only AsyncAPI Schema is allowed, and then we can expand in 3.1, 3.2, 3.3, etc.

@smoya
Member

smoya commented Mar 31, 2023

I created a draft PR with the initial changes to the JSON Schema files: asyncapi/spec-json-schemas#370

cc @GreenRover

@smoya
Member

smoya commented Mar 31, 2023

Changes in Parser-JS and Parser-API will also be added.

@GreenRover
Collaborator

Recap of discussion from Slack:
@fmvilas:

  • Kafka headers are a Map<String, byte[]>. The byte array part can also use an Avro schema.
  • We should add a paragraph to the spec explaining that mixing schemas should not be supported by implementations.

@smoya
Member

smoya commented Jun 22, 2023

@GreenRover @derberg I can't find any further discussion about allowing other schema formats in the headers. Did we drop that in the end?

@derberg
Member

derberg commented Jun 27, 2023

tbh I do not remember, but definitely it got in -> https://github.com/asyncapi/spec/pull/910/files so new multi format is in the headers too

@smoya
Member

smoya commented Jun 27, 2023

tbh I do not remember, but definitely it got in -> https://github.com/asyncapi/spec/pull/910/files so new multi format is in the headers too

Ouch, for some reason I skipped that when reading the messageObject spec doc 😮‍💨. Thanks for pointing that out!

@smoya
Member

smoya commented Jun 27, 2023

I created a new issue to discuss the shape of headers now that we allow multi-format schemas: #948

cc @GreenRover @derberg

@fmvilas
Member

fmvilas commented Jul 14, 2023

This one is only missing the JS parser implementation. Anyone volunteering?


This issue has been automatically marked as stale because it has not had recent activity 😴

It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.

There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model.

Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.

Thank you for your patience ❤️

@github-actions github-actions bot added the stale label Nov 16, 2023
@smoya smoya closed this as completed Dec 7, 2023