
Proposal for Unparsable Remote References (mainly to support XML.) #624

Closed
damaru-inc opened this issue Sep 10, 2021 · 21 comments
Labels
stale 💭 Strawman (RFC 0) RFC Stage 0 (See CONTRIBUTING.md)

Comments

@damaru-inc
Contributor

damaru-inc commented Sep 10, 2021

Solace has customers who love AsyncAPI but use XML payloads in their messaging systems. Also, the question of whether things like XSD schemas are supported has come up more than once in the Slack channels.

We would like to propose the notion of an Unparsable Remote Reference. These would be, at minimum, URLs represented by simple strings. By Unparsable we mean that in general, AsyncAPI parsers would not be expected to retrieve and/or parse the entities pointed to by these references. Code generators, on the other hand, could use these references.

The use case we are trying to solve immediately is how to provide a URL to an XSD schema, so that a code generator could create a model class from the schema and use it with XML libraries for serializing messages.

One simple way to do this requires no change to the specification or to the parser. A message could look like this:

messages:
    myXmlMessage:
      payload:
        remoteReference: "https://example.com/myschema.xsd"
      contentType: "application/xml"

(This fragment works fine with the current parser; you can try it in the playground.)

When this message is passed back from the parser, the payload contains an anonymous schema containing the field remoteReference with its value.
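
As a rough illustration of how a code generator might consume this, here is a minimal sketch. It assumes the parsed messages are available as a plain JavaScript object shaped like the fragment above; `collectRemoteReferences` is a hypothetical helper name, not something parser-js provides.

```javascript
// Hypothetical helper: not part of parser-js or the AsyncAPI spec.
// Collects payload.remoteReference values from a map of messages.
function collectRemoteReferences(messages) {
  const found = [];
  for (const [name, message] of Object.entries(messages || {})) {
    const ref = message && message.payload && message.payload.remoteReference;
    if (typeof ref === "string") {
      found.push({ message: name, contentType: message.contentType, url: ref });
    }
  }
  return found;
}

// Example input mirroring the YAML fragment above, plus one ordinary message.
const messages = {
  myXmlMessage: {
    payload: { remoteReference: "https://example.com/myschema.xsd" },
    contentType: "application/xml",
  },
  plainJsonMessage: {
    payload: { type: "object" },
  },
};

const remoteRefs = collectRemoteReferences(messages);
console.log(remoteRefs);
```

A generator could then fetch each `url` itself and hand it to whatever XSD tooling it uses; the parser never needs to understand the referenced file.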

An improvement would be to create a parser plugin (similar to the Avro parser). That would also allow us to specify the schemaFormat. Currently the parser fails if you try to set the schemaFormat to application/xml, because there is no schema parser defined for that format.

Yet another improvement would be to allow an object representing a schema registry, in cases where it would be desirable to add more fields besides just a URL.

This mechanism would also allow users to use Avro files in their original form (the current avro parser translates to JSON schema), and it could also be used to support protobuf or any other kind of schema.

The name remoteReference was intended to be general enough to be applied to other use cases, not just non-JSON schemas.

Ideally it would be nice to have a standard, documented way to do this.

@damaru-inc damaru-inc added the 💭 Strawman (RFC 0) RFC Stage 0 (See CONTRIBUTING.md) label Sep 10, 2021
@fmvilas fmvilas added this to the 3.0.0 Release milestone Sep 15, 2021
@jessemenning

jessemenning commented Sep 16, 2021

I think this proposal addresses, in a generic way, an issue that will likely continue to re-emerge.

OpenAPI naturally uses JSON due to its focus on synchronous RESTful interactions. The async world is much more diverse in both protocols and data formats. Data formats include things like Avro, Protobuf, XML, EDI, and the inevitable "cool new format" that will emerge next year. It feels like we need an extensible way to accommodate that diversity without needing to explicitly include it in the spec.

Longer term, we may need something like a "format binding" (a parallel to protocol bindings). The format binding would provide format-specific fields. For example, an XML format binding could include a namespace. But that seems like a lengthy, heavy lift, and @damaru-inc's proposal seems like a good first step that will get early adopters off the ground.

Also, here is an example of what a protobuf implementation would look like:

messages:
  myProtobufMessage:
    remoteReference: "https://example.com/myschema.proto"
    contentType: "application/x-protobuf"

@magicmatatjahu
Member

magicmatatjahu commented Sep 16, 2021

Interesting concept. Maybe it would be enough to add a flag which would specify whether a given schema should be parsed/formatted or not. We could extend my proposal to define schemas in other formats in different places (currently it is possible only in message's payload) - #622 and add to Schema Object a parse field:

messages:
    myXmlMessage:
      payload:
        parse: false
        schema:
          $ref: "https://example.com/myschema.xsd"
      contentType: "application/xml"

which would mean that it should not be parsed and transformed to JSON, but only resolved/fetched.

We can also use the remote: true field.

remoteReference isn't a good solution for pointing to a remote source that should be fetched, because the link must then be handled by a tool in order to fetch it, and $ref is already standardized for this purpose. We shouldn't reinvent the wheel.

@magicmatatjahu
Member

@jmenning-solace Could you create an issue about the format binding (as you described it)? It's a very interesting idea and we should not forget it :)

@assets-cg

Currently we have many objects defined in XSD. These historical objects can be repurposed as event attributes. This unlocks tremendous value (financial, technical) if reuse can be accomplished. Looking forward to this discussion and feature availability.

@jessemenning

remoteReference isn't a good solution for pointing to a remote source that should be fetched, because the link must then be handled by a tool in order to fetch it, and $ref is already standardized for this purpose. We shouldn't reinvent the wheel.

@magicmatatjahu, my concern is this would create valid AsyncAPI documents that are invalid JSON documents. $ref is defined within JSON Schema as a reference to another JSON schema, not an ad hoc schema type. Standard JSON parsers are coded to the spec. When standardized JSON/JSON Schema parsers encounter a non-JSON schema (protobuf, XML, COBOL copybook, etc.) they will either fail ungracefully or return unpredictable results.

So where do we go from there? I see two options (would love to hear others):

  • Custom code the AsyncAPI specific parser. But the divergence between OpenAPI and JSON Schema was so painful it took years to reconcile them. And if there are standard JSON parsers baked into products like databases, we will never be able to apply workarounds to them.
  • Use something like remoteReference. This keeps AsyncAPI documents as valid JSON documents. Standard parsers will not throw up when they encounter them, and will just treat the URL as a string. While not ideal, this seems like a reasonable fallback. And an "AsyncAPI-aware" parser can import the full body of the external non-JSON schema into the main AsyncAPI document as a string that can be parsed by code generators, etc.

My preference would be the second.

We could extend my proposal to define schemas in other formats in different places (currently it is possible only in message's payload)

This is an interesting concept, and maybe that's the appropriate place for the payload binding. Let me think more on that. It seems extendable to these use cases, with the caution that Avro is an easier format to deal with because it's JSON.

@jessemenning

Currently we have many objects defined in XSD. These historical objects can be repurposed as event attributes. This unlocks tremendous value (financial, technical) if reuse can be accomplished. Looking forward to this discussion and feature availability.

Thanks for chiming in @masterhead, it's nice to get an end-user perspective. Can I ask you a couple of questions about your use case?

  • Do you need the full text of the XML schema imported into the AsyncAPI spec, or is simply having a URL pointer to it sufficient?
  • Do your XML documents typically have a single root element? Or do they have multiple root elements?

@magicmatatjahu
Member

magicmatatjahu commented Sep 18, 2021

@jmenning-solace

my concern is this would create valid AsyncAPI documents that are invalid JSON documents. $ref is defined within JSON Schema as a reference to another JSON schema, not an ad hoc schema type. Standard JSON parsers are coded to the spec. When standardized JSON/JSON Schema parsers encounter a non-JSON schema (protobuf, XML, COBOL copybook, etc.) they will either fail ungracefully or return unpredictable results.

Did you know that $ref doesn't have to point to a valid JSON schema? Also, a string is valid JSON too, not in the sense of schema validation but as a value (the JSON spec treats a string value as a normal JSON instance 😄). For example:

// some_xsd.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="http://tempuri.org/PurchaseOrderSchema.xsd"
           targetNamespace="http://tempuri.org/PurchaseOrderSchema.xsd"
           elementFormDefault="qualified">
 <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>
 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="ShipTo" type="tns:USAddress" maxOccurs="2"/>
   <xsd:element name="BillTo" type="tns:USAddress"/>
  </xsd:sequence>
  <xsd:attribute name="OrderDate" type="xsd:date"/>
 </xsd:complexType>

 <xsd:complexType name="USAddress">
  <xsd:sequence>
   <xsd:element name="name"   type="xsd:string"/>
   <xsd:element name="street" type="xsd:string"/>
   <xsd:element name="city"   type="xsd:string"/>
   <xsd:element name="state"  type="xsd:string"/>
   <xsd:element name="zip"    type="xsd:integer"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
 </xsd:complexType>
</xsd:schema>

# asyncapi.yaml
asyncapi: '2.1.0'
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups

channels:
  user/signedup:
    subscribe:
      message:
        $ref: '#/components/messages/UserSignedUp'

components:
  messages:
    UserSignedUp:
      payload:
        type: object
        properties:
          displayName:
            type: string
            description: Name of the user
          email:
            type: string
            format: email
            description: Email of the user
        someCustomProp:
          $ref: ./some_xsd.xsd

and then after dereferencing I have:

{
  "asyncapi": "2.1.0",
  "info": {
    "title": "Account Service",
    "version": "1.0.0",
    "description": "This service is in charge of processing user signups"
  },
  ...
  "components": {
    "messages": {
      "UserSignedUp": {
        "payload": {
          "type": "object",
          "properties": {
            "displayName": {
              "type": "string",
              "description": "Name of the user",
              "x-parser-schema-id": "<anonymous-schema-2>"
            },
            "email": {
              "type": "string",
              "format": "email",
              "description": "Email of the user",
              "x-parser-schema-id": "<anonymous-schema-3>"
            }
          },
          "someCustomProp": "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:tns=\"http://tempuri.org/PurchaseOrderSchema.xsd\" targetNamespace=\"http://tempuri.org/PurchaseOrderSchema.xsd\" elementFormDefault=\"qualified\"> <xsd:element name=\"PurchaseOrder\" type=\"tns:PurchaseOrderType\"/> <xsd:complexType name=\"PurchaseOrderType\"> <xsd:sequence> <xsd:element name=\"ShipTo\" type=\"tns:USAddress\" maxOccurs=\"2\"/> <xsd:element name=\"BillTo\" type=\"tns:USAddress\"/> </xsd:sequence> <xsd:attribute name=\"OrderDate\" type=\"xsd:date\"/> </xsd:complexType>\n<xsd:complexType name=\"USAddress\"> <xsd:sequence> <xsd:element name=\"name\"   type=\"xsd:string\"/> <xsd:element name=\"street\" type=\"xsd:string\"/> <xsd:element name=\"city\"   type=\"xsd:string\"/> <xsd:element name=\"state\"  type=\"xsd:string\"/> <xsd:element name=\"zip\"    type=\"xsd:integer\"/> </xsd:sequence> <xsd:attribute name=\"country\" type=\"xsd:NMTOKEN\" fixed=\"US\"/> </xsd:complexType> </xsd:schema>",
          "x-parser-schema-id": "<anonymous-schema-1>"
        },
        ...
      }
    }
  },
  "x-parser-spec-parsed": true
}

So even when the dereferencer fetches something from the web, it still treats it as a string value. Only if the content is valid JSON (i.e. a value starting with {) does it treat it as JSON; otherwise it keeps it as a string. This is how it works in JS (and also in our ParserJS). I don't know about other languages, but it should be similar, because JSON has been standardized for a long time.
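
To make that behavior concrete, here is a small self-contained sketch. It is a simplified model of what is described above, not the actual json-schema-ref-parser code: `interpretFetchedContent` and `dereferenceSketch` are hypothetical names, and an in-memory `Map` stands in for files or URLs that would really be fetched.

```javascript
// Simplified model only: NOT the real json-schema-ref-parser implementation.
// Content that parses as JSON becomes a JSON value; anything else stays a string.
function interpretFetchedContent(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return text;
  }
}

// Replaces every { $ref: "<key>" } node whose key is present in `files`
// with the interpreted content. Unknown references are left untouched.
function dereferenceSketch(node, files) {
  if (Array.isArray(node)) {
    return node.map((item) => dereferenceSketch(item, files));
  }
  if (node !== null && typeof node === "object") {
    if (typeof node.$ref === "string" && files.has(node.$ref)) {
      return interpretFetchedContent(files.get(node.$ref));
    }
    const out = {};
    for (const [key, value] of Object.entries(node)) {
      out[key] = dereferenceSketch(value, files);
    }
    return out;
  }
  return node;
}

// In-memory stand-ins for referenced files (hypothetical contents).
const files = new Map([
  ["./some_xsd.xsd", '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>'],
  ["./address.json", '{"type":"object"}'],
]);

const dereferenced = dereferenceSketch(
  {
    payload: {
      someCustomProp: { $ref: "./some_xsd.xsd" },
      address: { $ref: "./address.json" },
    },
  },
  files
);
console.log(dereferenced);
```

In this model the XSD reference ends up as a plain string while the JSON reference becomes an object, which matches the dereferenced output shown above.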

If you mean references that should not be fetched, it is already possible to use an extension for that, e.g.:

messages:
    myXmlMessage:
      $ref: "https://example.com/myschema.xsd" # it will be fetched and treated as a string value
      contentType: "application/xml"
      x-remote-ref: "https://example.com/myschema.xsd" # point to this reference that can be used in generators

Another possibility is for the parser to record each $ref, before resolving it, on the schema (or other part of the document) as x-parser-original-ref; then you have both the resolved value of the reference and the link to it.

@assets-cg

assets-cg commented Sep 21, 2021

Currently we have many objects defined in XSD. These historical objects can be repurposed as event attributes. This unlocks tremendous value (financial, technical) if reuse can be accomplished. Looking forward to this discussion and feature availability.

Thanks for chiming in @masterhead, it's nice to get an end-user perspective. Can I ask you a couple of questions about your use case?

  • Do you need the full text of the XML schema imported into the AsyncAPI spec, or is simply having a URL pointer to it sufficient?

We would not want the full text of the XML schema in the AsyncAPI file. We want something more along the lines of a remote pointer, which our tools can parse to show users what a sample payload looks like. If the schema is hosted remotely at a URL such as https://domain/ssss/schema_def, that works well. But sometimes the reference may be a path relative to the AsyncAPI file, e.g.
Root Folder
  AsyncApi File
  SomeSchemaDef.xsd
with the reference pointer being "./SomeSchemaDef.xsd" or something along similar lines.
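
A tool could handle both the absolute and the relative case with the standard WHATWG `URL` constructor. The sketch below is only an illustration; the folder and host names are placeholders and `resolveSchemaLocation` is a hypothetical helper.

```javascript
// Hypothetical helper: resolves a schema reference against the location
// of the AsyncAPI file. All paths and hosts below are placeholders.
function resolveSchemaLocation(ref, asyncApiLocation) {
  // An absolute `ref` is returned unchanged; a relative one is resolved
  // against the directory that contains the AsyncAPI file.
  return new URL(ref, asyncApiLocation).href;
}

const asyncApiFile = "file:///root-folder/asyncapi.yaml";

const relativeSchema = resolveSchemaLocation("./SomeSchemaDef.xsd", asyncApiFile);
const absoluteSchema = resolveSchemaLocation("https://example.com/ssss/schema_def", asyncApiFile);

console.log(relativeSchema); // file:///root-folder/SomeSchemaDef.xsd
console.log(absoluteSchema); // https://example.com/ssss/schema_def
```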

  • Do your XML documents typically have a single root element? Or do they have multiple root elements?

For historical reasons, the majority of our objects are currently defined in WSDL and have to be liberated and defined as events. It would be ideal if the spec could support XSD for any combination of multiple objects. (We could reuse the existing infrastructure of already-defined XSDs as-is instead of manually editing and creating an XSD for each event object.)

  1. Independent objects: Obj-1 has no relationship with Obj-2

  2. Dependent objects: Obj-2 and Obj-3 have Obj-1 as their parent (acyclic graphs/trees)

@damaru-inc
Contributor Author

Did you know that $ref doesn't have to point to a valid JSON schema? Also, a string is valid JSON too, not in the sense of schema validation but as a value (the JSON spec treats a string value as a normal JSON instance 😄). For example:

I tried your example, and it does work. My concern here isn't so much with your proposal, but with the fact that at least our parser-js treats $ref differently depending on whether it's under message/payload or above it.

And what should we call 'someCustomProp'? At least with remoteReference, we can put that somewhere like message/payload and then its purpose becomes clear, and that also gives us a standard name for a property that we can use elsewhere.

@magicmatatjahu
Member

@damaru-inc

I tried your example, and it does work. My concern here isn't so much with your proposal, but with the fact that at least our parser-js treats $ref differently depending on whether it's under message/payload or above it.

Most probably you mean the situation when you make a reference to the schema, e.g. xml, but then get an error from the parser that it can't parse that? Here I have to tell you that our parser doesn't treat $ref differently. The $ref is used only to de-reference the given reference and replace it with a value. Only after that comes the validation phase and parsing against the schema format, so our parser doesn't change/adjust the logic of $ref.

And what should we call 'someCustomProp'? At least with remoteReference, we can put that somewhere like message/payload and then its purpose becomes clear, and that also gives us a standard name for a property that we can use elsewhere.

If one only needs a reference and not the value behind it, then remoteRef is OK, but I would prefer to reuse $ref for this use case, so that after parsing you still have the reference.

@damaru-inc
Contributor Author

damaru-inc commented Sep 22, 2021

Most probably you mean the situation when you make a reference to the schema, e.g. xml, but then get an error from the parser that it can't parse that? Here I have to tell you that our parser doesn't treat $ref differently. The $ref is used only to de-reference the given reference and replace it with a value. Only after that comes the validation phase and parsing against the schema format, so our parser doesn't change/adjust the logic of $ref.

But I guess again (because I haven't yet studied the source code; clearly I should): our parser-js doesn't throw errors every time it sees a $ref, probably because there are only a few situations where it needs to interpret the reference as a JSON Schema. If it's in a place in the parse tree where it doesn't care, it just returns the string (contents of the file), right?

If one only needs a reference and not the value behind it, then remoteRef is OK, but I would prefer to reuse $ref for this use case, so that after parsing you still have the reference.

I agree that we need a way to keep the reference. The current parser always attaches its own internal schema-id to anything that parses as a schema, e.g.

 'x-parser-schema-id': '<anonymous-schema-1>'

so that would be the logical place to put whatever kind of schema id we want.

cheers
Michael

@TamimiGitHub

TamimiGitHub commented Sep 29, 2021

Hey @magicmatatjahu, I wrote a simple Node.js application using json-schema-ref-parser (which the AsyncAPI parser-js uses) to test XML parsing. The parser threw an error when it encountered an XML schema file. Since $ref resolution relies on json-schema-ref-parser's $RefParser, wouldn't it be problematic to assume that it will handle any non-JSON format as a string? This is what I did for a quick local test:

const $RefParser = require("@apidevtools/json-schema-ref-parser");

const myJSON = "./myJSON.json";

const myXML = "./sampleXML.xsd";

$RefParser.dereference(myJSON, (err, schema) => {
  if (err) {
    console.error(err);
  } else {
    console.log(schema);
  }
});

$RefParser.dereference(myXML, (err, schema) => {
  if (err) {
    console.error(err);
  } else {
    console.log(schema);
  }
});

{
  stack: 'SyntaxError: "/Users/taltamimi/hacks/json-schema-parser-test/sampleXML.xsd" is not a valid JSON Schema\n' +
    '    at $RefParser.parse (/Users/taltamimi/hacks/json-schema-parser-test/node_modules/@apidevtools/json-schema-ref-parser/lib/index.js:131:17)\n' +
    '    at async $RefParser.resolve (/Users/taltamimi/hacks/json-schema-parser-test/node_modules/@apidevtools/json-schema-ref-parser/lib/index.js:184:5)\n' +
    '    at async $RefParser.dereference (/Users/taltamimi/hacks/json-schema-parser-test/node_modules/@apidevtools/json-schema-ref-parser/lib/index.js:268:5)',
  message: '"/Users/taltamimi/hacks/json-schema-parser-test/sampleXML.xsd" is not a valid JSON Schema',
  toJSON: [Function: toJSON],
  name: 'SyntaxError',
  toString: [Function: toString]
}

sampleXML: https://gist.githubusercontent.com/TamimiGitHub/4bfcd8e553b83c86a0f7fb65a9a23726/raw/ce9b574437627a4c56a1c7924f32fed6d28db85e/sampleXML.xsd

What are your thoughts on this?

@magicmatatjahu
Member

magicmatatjahu commented Sep 29, 2021

@TamimiGitHub Hi! You get the error because you are passing a non-JSON file as the argument to $RefParser.dereference, which isn't supported. The root object for dereferencing must be JSON. What I meant in my comment about references to non-JSON schemas is that it works when, for example, an XSD schema is referenced (with the $ref keyword) from within a JSON document. I used your XSD schema in this JSON:

{
  "test": "test",
  "reference": {
    "$ref": "./sampleXML.xsd"
  }
}

and then after dereferencing I have:

{
  "test": "test",
  "reference": "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:tns=\"http://tempuri.org/PurchaseOrderSchema.xsd\" targetNamespace=\"http://tempuri.org/PurchaseOrderSchema.xsd\" elementFormDefault=\"qualified\"> <xsd:element name=\"PurchaseOrder\" type=\"tns:PurchaseOrderType\"/> <xsd:complexType name=\"PurchaseOrderType\"> <xsd:sequence> <xsd:element name=\"ShipTo\" type=\"tns:USAddress\" maxOccurs=\"2\"/> <xsd:element name=\"BillTo\" type=\"tns:USAddress\"/> </xsd:sequence> <xsd:attribute name=\"OrderDate\" type=\"xsd:date\"/> </xsd:complexType> <xsd:complexType name=\"USAddress\"> <xsd:sequence> <xsd:element name=\"name\"   type=\"xsd:string\"/> <xsd:element name=\"street\" type=\"xsd:string\"/> <xsd:element name=\"city\"   type=\"xsd:string\"/> <xsd:element name=\"state\"  type=\"xsd:string\"/> <xsd:element name=\"zip\"    type=\"xsd:integer\"/> </xsd:sequence> <xsd:attribute name=\"country\" type=\"xsd:NMTOKEN\" fixed=\"US\"/> </xsd:complexType> </xsd:schema>"
}

@jessemenning

Many thanks to @magicmatatjahu, @TamimiGitHub, @masterhead and @damaru-inc for helping me understand the implications here. I've learned a lot thanks to you all.

In an attempt to summarize, I wanted to walk through a couple of scenarios and see if we are on the same page.

  1. I want to have entire .xsd imported into AsyncAPI, as a string

components:
  messages:
    UserSignedUp:
      payload:
        $ref: ./some_xsd.xsd
      contentType: "application/xml"

  2. I just want a pointer from AsyncAPI to a schema registry/file, not bringing in the whole thing (maybe because it's huge)

components:
  messages:
    UserSignedUp:
      contentType: "application/xml"
      x-payload-remote-ref: "https://example.com/myschema.xsd"

  3. Provide a pointer to a particular element

components:
  messages:
    UserSignedUp:
      contentType: "application/xml"
      x-payload-remote-ref: "https://example.com/myschema.xsd"
      x-payload-remote-pointer: "/PurchaseOrder"

For instance, if xsd has multiple root elements:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="http://tempuri.org/PurchaseOrderSchema.xsd"
           targetNamespace="http://tempuri.org/PurchaseOrderSchema.xsd"
           elementFormDefault="qualified">
 <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>
 <xsd:element name="AnotherRootElement" type="tns:PurchaseOrderType"/>

  4. Pointer to a particular element if importing the schema

components:
  messages:
    UserSignedUp:
      payload:
        $ref: ./some_xsd.xsd
        $pointer: "/PurchaseOrder"
      contentType: "application/xml"
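
To check that these scenarios can be told apart mechanically, here is a minimal sketch of how a tool might classify a message. It assumes the message is inspected before any dereferencing (so `$ref` is still present) and uses the field names from the scenarios above; none of them are part of the spec, and `describePayloadSource` is a hypothetical helper.

```javascript
// Hypothetical helper: classifies a raw (not yet dereferenced) message
// according to the scenarios listed above. Field names are proposals only.
function describePayloadSource(message) {
  const payload = message.payload;
  if (payload && typeof payload.$ref === "string") {
    // Import scenarios: the whole schema is pulled in, optionally
    // narrowed to one element via the proposed $pointer field.
    return { mode: "import", ref: payload.$ref, pointer: payload.$pointer || null };
  }
  if (typeof message["x-payload-remote-ref"] === "string") {
    // Pointer-only scenarios: only the URL (and optional element) is kept.
    return {
      mode: "pointer-only",
      ref: message["x-payload-remote-ref"],
      pointer: message["x-payload-remote-pointer"] || null,
    };
  }
  // Neither convention is used: the payload is an ordinary inline schema.
  return { mode: "inline", ref: null, pointer: null };
}

const imported = describePayloadSource({
  payload: { $ref: "./some_xsd.xsd", $pointer: "/PurchaseOrder" },
  contentType: "application/xml",
});

const pointerOnly = describePayloadSource({
  contentType: "application/xml",
  "x-payload-remote-ref": "https://example.com/myschema.xsd",
  "x-payload-remote-pointer": "/PurchaseOrder",
});

console.log(imported, pointerOnly);
```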

@magicmatatjahu
Member

The described proposal/problem itself is related to my proposal, which I extended to use references to nested non-JSON schema objects - Proposal to allow defining schema format other than default one (AsyncAPI Schema) - please see section Update

@jessemenning You may be interested in this :)

@github-actions

This issue has been automatically marked as stale because it has not had recent activity 😴

It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.

There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model.

Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.

Thank you for your patience ❤️

@github-actions github-actions bot added the stale label Feb 10, 2022
@magicmatatjahu
Member

Still valid. @derberg Could you remove stale label?

@derberg derberg removed the stale label Feb 10, 2022
@rober15

rober15 commented Apr 26, 2022

Any conclusion on this topic?
I agree that the most important thing would be to make it possible to reference other schemas while skipping any automated parsing.
It is better to be able to describe any event, whatever its format, in an industry standard than to support only JSON.

What needs to happen to get this into the next releases?

@derberg
Member

derberg commented May 9, 2022

We definitely need a champion that wants to drive the change, come up with proposal, respond to feedback, and present it to others

@MichaelDavisSolace

Apologies for letting this lie dormant for so long. Anyway, the proposal, as I see it, is what I wrote here with the added refinements that Jessie made. I think we've responded to feedback (tell me if I missed something.) Here it is presented to others. Is the next step, then, to merge Jessie's suggestions in with my original proposal and re-present it? Or are the next steps to actually do PRs against the spec and the parser? If the latter, I'd be happy to do a PR against the spec, but I'm not the best person to add features to the parser.


@github-actions github-actions bot added the stale label Sep 25, 2022
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 25, 2023

9 participants