Host schema publicly #151

Closed
webron opened this issue Oct 13, 2014 · 34 comments

@webron
Member

webron commented Oct 13, 2014

In order to accomplish #103 elegantly, we should host the json schema for the 2.0 spec under the swagger.io domain. Probably something like http://swagger.io/schemas/v2.0/schema (based on json-schema's id declaration "id": "http://json-schema.org/draft-04/schema#").

@fehguy
Contributor

fehguy commented Oct 14, 2014

@webron
Member Author

webron commented Oct 14, 2014

Is it possible to set the content-type to application/json?

@fehguy
Contributor

fehguy commented Oct 14, 2014

Sure, just updated it

@fehguy
Contributor

fehguy commented Oct 14, 2014

note, i can change the name to schema instead of schema.json if you prefer

@fehguy
Contributor

fehguy commented Oct 14, 2014

ok before we make this the "official" schema location, I may add a subdomain so it can live on a CDN instead of our server for performance.

@mohsen1
Contributor

mohsen1 commented Oct 14, 2014

How do you handle versioning with this? Is it always going to serve the latest? How can one get a fixed version?

NPM solves all those problems! Why don't you split the schema folder into a separate repo and use it via npm here? That way it's always going to be up to date (because swagger-spec uses it) and people can use older versions much more easily.

@fehguy
Contributor

fehguy commented Oct 14, 2014

Because many people want to reference the spec outside of a copy that they maintain. It's not a requirement to use but this will make the latest schema available without npm, etc

Eventually I'd prefer to have this hosted on a cdn

@mohsen1
Contributor

mohsen1 commented Oct 14, 2014

Understood. Please make sure you leave older versions there too. I would like to lock my version if I were to use the schema from the CDN

@earth2marsh
Member

👍 to having versions addressable. Can this be treated the way Google hosts libraries like jQuery?

Specify a version like:
//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js

So for the schema, it might live at:
//cdn.swagger.io/schemas/2.0.0.json

… and a minor rev might be:
//cdn.swagger.io/schemas/2.0.1.json

A nice-to-have might be if a digit could be replaced by an x, such as:
//cdn.swagger.io/schemas/2.x.x.json
… for all revs of the 2nd gen schema.

Minor revs could be:
//cdn.swagger.io/schemas/2.0.x.json

Also, this connotes a semver approach to schemas. :)
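
A minimal sketch of how the `x` wildcard proposed above could be resolved to a concrete file, assuming the host keeps a list of published three-part versions. The function names and the version list are hypothetical and only illustrate the idea; nothing here describes an existing swagger.io service.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints so versions sort numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(pattern, available):
    """Return the highest version in `available` that matches `pattern`.

    `pattern` has three dot-separated fields, each a number or the wildcard 'x'.
    Returns None when no published version matches.
    """
    fields = pattern.split(".")
    if len(fields) != 3:
        raise ValueError("pattern must look like '2.0.x'")
    matches = []
    for candidate in available:
        parts = candidate.split(".")
        if len(parts) != 3:
            continue
        # A field matches when it is the wildcard or the same number.
        if all(f == "x" or f == p for f, p in zip(fields, parts)):
            matches.append(candidate)
    if not matches:
        return None
    return max(matches, key=parse_version)


# Hypothetical list of published schema versions.
published = ["1.2.0", "2.0.0", "2.0.1", "2.0.10", "2.1.0"]
print(resolve("2.0.x", published))
print(resolve("2.x.x", published))
```

Comparing parsed integers rather than strings matters here: a plain string comparison would rank `2.0.10` below `2.0.2`.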

@webron
Member Author

webron commented Oct 14, 2014

I don't think that's the right approach. I'm trying to find out if we can add a property to the schema saying which version it is, but the schema is and always will be 2.0 until we release 2.1. Adding revisions or versions to the schema name itself will just confuse users. Since the schema itself is work in progress at the moment, it will have updates. Once that's done, it may have fixes but no additions.

@earth2marsh
Member

Over the last 6 weeks changes to the schema to correct bugs have broken projects like swagger-tools. Having a minor version like 2.0.1 be directly addressable would allow folks to control their dependency on the schema. If it were a property of the schema, then the ground could shift underneath your app, no?

@mohsen1
Contributor

mohsen1 commented Oct 14, 2014

In swagger-schema-official I used the short commit sha1 for versioning

versions:
   [ '1.2.0',
     '2.0.0-a33091a',
     '2.0.0-b6411e9' ]

It would be more semantic to follow the same approach as the npm package.

@webron
Member Author

webron commented Oct 14, 2014

Yes, this is what happens when things are being worked on concurrently. I'm not saying there shouldn't be version control; I just don't think the file name is the correct way to do it. The current schema has an "id" tag which points to a URL hosting the schema, and that's going to stay the same (though I admit we may change the location of the schema as stated in this issue; again, work in progress).

2.0 should always be the most up to date version. If you want to have additional copies per revision, that's fine by me, but we need to see how and where that is managed.

@webron
Member Author

webron commented Oct 14, 2014

@earth2marsh, @mohsen1 - I have a feeling we're talking about two separate things (and please correct me if I'm wrong).

Your concern with the versioning of the schema is with regards to the packaged schema and keeping proper versions of it for development purposes.

I'm talking about the schema as it is presented in this repository and as it is exposed publicly per this issue.

If that's the case, then I think the solution should come in a different way, but I'll go into details if you confirm this.

@mohsen1
Contributor

mohsen1 commented Oct 14, 2014

@webron You're right. I'm more concerned about package.json and bower.json versioning. The version of the schema itself is in a weird state. We claim we released 2.0 a while back, but we keep pushing new changes to 2.0. So the schema's version doesn't behave the way a version should. I would say start versioning at 2.0.x and keep incrementing the version of the schema whenever we push new changes. Then I could use the same version number for package.json.

@webron
Member Author

webron commented Oct 14, 2014

I agree the release process of the spec is far from being perfect, but the version of both the spec file and the schema need to remain 2.0 as long as they describe what we intended to include in 2.0 (which currently includes missing features (there's still a couple of those) and fixes). Once that's stabilized, it's possible that there would be fixes but no additions.

As for the packaging process - I understand the importance of it and the importance of version control for it, but this is indeed a separate issue. We've discussed this briefly in the past in a different location, and unfortunately due to some miscommunication the packaging process was merged into this repository even though it shouldn't have been.

I believe I have a concept that could work into solving both requirements. It would require some extra work but would keep this repository clean and hopefully allow for better version control for the packaging. I think going over it in comments would take too long, and I would love to have a chat with you and if needed @earth2marsh and @whitlockjc to find a solution that would serve everyone.

@whitlockjc
Member

I don't have any preferences how it's done but I would love for there to be something. The initial problem will be that different needs will require different things. For example, putting these files on a CDN makes sense for web consumption but for Node.js, it might make sense to put them into an NPM module, or more than one if necessary to support multiple concurrent stable releases like 1.2 and 2.0.

Right now, I manually check to see if there are changes to the 1.2 or 2.0 schemas and when there are, I pull them down into my project and bundle these files with my project. It would be great not to have to do that.

If I can help further, let me know.

@earth2marsh
Member

@webron I agree we have some gap between us that I'd love to reconcile.

The Swagger spec today is 2.0. While it will certainly change in the future, we don't know when that will be. For the foreseeable future the 2.0 specification will remain at 2.0.

The formal schema that is intended to be a formal representation of that spec, however, is a constant work-in-progress. There are gaps (like security) and bugs (like a misshapen regex) that get addressed along the way. But because tooling depends on that schema, when the schema changes, it breaks the tooling.

In order for Swagger tooling to succeed, it has to have a stable foundation. If the schema is hosted and can be accessed at a version level, then it will have that stability. Let's say the schema currently is 2.0.1. If a regex is fixed tomorrow and published to the CDN, then it would become 2.0.2. Until the spec increments to 2.1, any update is versioned at the least significant dotted integer.
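
The numbering described above can be sketched as follows. This is only an illustration of the proposal in this comment; the function names are made up, and it assumes schema versions are always written as three dot-separated integers.

```python
def bump_patch(schema_version):
    """Return the next schema version after a fix, e.g. '2.0.1' -> '2.0.2'."""
    major, minor, patch = (int(part) for part in schema_version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def spec_version(schema_version):
    """Return the spec version a schema revision describes, e.g. '2.0.2' -> '2.0'."""
    major, minor, _patch = schema_version.split(".")
    return f"{major}.{minor}"


current = "2.0.1"
fixed = bump_patch(current)
# Both revisions still describe the same spec version.
print(fixed, spec_version(current), spec_version(fixed))
```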

Does that jibe with your view?

@jewest27

Now that the schema has changed to reflect 'version' as a string instead of an integer, every API specification written in Swagger 2.0 becomes invalid. This affects our tooling and every PoC at every customer who is using Apigee-127, as well as all internal projects and documentation. Changes in the schema are causing pain for people who are trying to define an API specification in Swagger. Customers don't like 'specifications' that are constantly 'changing'.

I don't agree with "2.0 should be the most up-to-date version" because 2.0 is one single version which should be static. The schema represents the spec. If the schema is changing, the spec is changing. If the spec is changing, there should be a corresponding version change.

If it is not final, and we want to drive adoption and promote Swagger 2.0 for use with customers, we need to finalize it quickly. When we finalize the spec, the schema should not change :)

I have serious concerns about promoting swagger at this point - until we can have a version of the spec and corresponding schema that do not change.

@fehguy
Contributor

fehguy commented Oct 15, 2014

@jwest-apigee I completely understand the concern. We have to do two things, though:

  1. Understand that the written spec IS the source of truth. The schema is NOT the spec, it's an attempt (that is quite close) to model the spec. That said, the JSON schema is a tool that we should rely upon to validate our swagger specifications. It is NOT however the source of truth.

  2. I do not want the schema to break clients when something is "fixed". This is impossible for two reasons: (1) the schema is not perfect and may allow people to not follow the spec. If that's the case, one's tooling, which may produce a swagger specification which is not valid, will correctly be flagged as invalid.

I understand that we don't want this to happen, and there are many solutions, two are below.

  1. Many people are making copies of the JSON schema to lock them into that point-in-time representation. I strongly suggest this (more in a minute)

  2. Have your tooling validate against your own, hosted copy of the schema. Similar to point (1) above, but slightly different in that you have wider control over updates affecting your install base.

But we have to acknowledge the first point, which is the written spec (https://github.com/wordnik/swagger-spec/blob/master/versions/2.0.md) IS the source of truth for swagger 2.0.
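
One way to make the point-in-time copy suggested above explicit is to record a digest of the schema file when it is copied and to refuse to validate if the local file later differs. This is a minimal sketch under that assumption; the helper names and the sample bytes are hypothetical, and it is not something the thread prescribes.

```python
import hashlib


def fingerprint(schema_bytes):
    """Return the SHA-256 hex digest of a schema file's raw bytes."""
    return hashlib.sha256(schema_bytes).hexdigest()


def check_pinned(schema_bytes, expected_digest):
    """Raise ValueError if the local copy is not the one the tooling was pinned to."""
    actual = fingerprint(schema_bytes)
    if actual != expected_digest:
        raise ValueError(
            f"schema copy changed: expected {expected_digest}, got {actual}"
        )
    return True


# Placeholder bytes standing in for a locally stored copy of the schema.
local_copy = b'{"title": "placeholder schema"}'
pinned_digest = fingerprint(local_copy)  # recorded once, when the copy is made
print(check_pinned(local_copy, pinned_digest))
```

With a check like this, picking up a newer schema becomes a deliberate step (replace the file and update the recorded digest) rather than something that happens underneath the tooling.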

@jewest27

@fehguy thanks for the clarification - I now understand that the source of truth is not the schema.

I feel strongly that we need to make the schema validate the spec reliably ASAP. When people were building SOAP services with WSDL they didn't read http://www.w3.org/TR/wsdl each time - the WSDL was validated against a schema and if a WSDL did not, then it wasn't considered valid.

We still need to be able to reference a specific version of the schema from the spec document, i.e. 2.0.1. Otherwise, something which was previously valid can become invalid if validated against "2.0" without the API Spec having changed. If we're all hosting our own versions then the version can/will appear arbitrary.

@webron
Member Author

webron commented Oct 16, 2014

Sorry for replying relatively late, but with the time difference there's not much I can do.

Honestly, I completely understand all the points you're making regarding the changes in the schema. I believe that had I been in your shoes, I would have found it confusing and actually quite annoying. I think we've made quite a few mistakes in the process and hope to learn from them for future releases.

First, I'll try to reply to points that were raised.

@jwest-apigee

  1. Specifically regarding the change in the version field from a number to a string - that's indeed frustrating. I opened an issue about it as soon as I found out the problem exists, even before going and fixing the schema (which I should have probably done at that stage as well). However, the original schema that was available from the reverb fork of swagger-spec (the one we used for the WIP on 2.0) had so many other issues with it that honestly, almost anything that was built based on it would have ended up being faulty in some way. I believe this was one of our major mistakes with regards to how we treated that schema.
  2. I agree that we need to finalize the schema as soon as possible; however, no JSON schema that we currently build will be able to validate against the spec reliably. JSON Schema has a few limitations and some of the constraints defined in the written spec cannot be represented by a schema. This is something that any developer who wants to rely on the schema needs to understand. I think @whitlockjc has integrated some non-schema-based constraints in the swagger-tools project, which I believe should be further expanded (if needed, not sure) and then integrated with JavaScript-based projects. That would help avoid many possible issues with written specs.
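
To illustrate what a non-schema-based constraint can look like, here is a minimal sketch of one check. It uses the written 2.0 spec's rule that an `operationId` must be unique among all operations, a cross-document condition that is awkward or impossible to state in a draft-04 JSON Schema. The function is hypothetical and is not taken from swagger-tools.

```python
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch")


def duplicate_operation_ids(swagger):
    """Return (operationId, first location, repeated location) for each duplicate."""
    seen = {}
    duplicates = []
    for path, item in swagger.get("paths", {}).items():
        for method in HTTP_METHODS:
            operation = item.get(method)
            if not operation or "operationId" not in operation:
                continue
            op_id = operation["operationId"]
            location = f"{method.upper()} {path}"
            if op_id in seen:
                duplicates.append((op_id, seen[op_id], location))
            else:
                seen[op_id] = location
    return duplicates


# Illustrative document: two operations share the same operationId.
doc = {
    "swagger": "2.0",
    "paths": {
        "/pets": {
            "get": {"operationId": "listPets"},
            "post": {"operationId": "listPets"},
        },
        "/pets/{id}": {"get": {"operationId": "getPet"}},
    },
}
print(duplicate_operation_ids(doc))
```

A document like `doc` can pass structural schema validation and still violate the written spec, which is why checks of this kind have to live in tooling.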

@earth2marsh, @jwest-apigee and @mohsen1
Regarding the version management - I really think we can find a solution that will satisfy everyone's needs, but I fear we may not have the full understanding of the requirements from each side. At first, I thought you were talking about packaged schema files that can be used with NPM. Now, I understand you're referring to hosted schema files.

"2.0" is the version of the spec, not the schema. The schema may have changes that fix "bugs" in how it validates against the spec. This does not affect the version of the spec. The thing that's important to me is that the schema file that's reflected here in the swagger-spec repository and the one we're going to host will always be the latest one when referenced directly. That is, https://github.com/wordnik/swagger-spec/blob/master/schemas/v2.0/schema.json will always be the latest version and so will http://swagger.io/v2/schema.json (or whichever URL we may change it to with the usage of a CDN). I'd rather not have people having to search for the latest version. I don't want people having to browse through 2.0.X to find the highest X for the latest version (and unfortunately, git or github don't have support for symlinks). Regarding @jwest-apigee's last comment, that's one of the reasons I'm trying to find out if there's a way to contain the revision of the schema inside the schema itself.

The various tools that rely on the schema internally should strive to use the latest version, especially if it fixes bugs. When we want to say that a Swagger spec file is 2.0 compliant, validating against a buggy schema file doesn't make it compliant. As Tony said, the written spec is the source of truth.

@earth2marsh - I think the text above explains why I have an issue with the wording of what you wrote in your last comment. The revision of the schema can change, but it always represents "2.0". Incremental changes will not lead to "2.1". It's not like we're going to say that some tools support Swagger 2.0.1 and some support 2.0.3. We can't expect the tools to be able to process each revision of the schema itself.

That said, it's definitely important for tool developers to know which revision of the schema they are using, to be able to follow the modifications that were made to the schema, and that's definitely something we need to solve (and I'm open for suggestions here). First, I'm trying to understand if we need to keep revisions in both hosted schema files and packaged schema files. Second, we need to find a way to automate this process. If we rely on manual maintenance of this, it will fail.

Another solution we may offer on top of what's mentioned above is to have some kind of a conversion tool that will fix the changes between schema revisions. There are three problems with that:

  1. It will make the schema updates slower (that may not necessarily be a bad thing).
  2. We need to clarify how such a tool is written and maintained.
  3. Not all fixes could be done automatically.

I am confident that we can find a solution here that would work for all parties involved, we just need to work on it together. I think we started off on the wrong foot because we each had a different view on what the version/revision means and how it affects the ecosystem around Swagger, but once everyone has a clear view on the issues, we can definitely reach a common ground. Of course, feel free to disagree with some things or everything that I wrote. With more information I can get better understanding.

I do have a few additional points I'd want to raise (regarding foreseeable future changes to the schema and how the revision numbering should work), but this comment is already long enough and I'd like to hear your opinions before diving into that.

@mohsen1
Contributor

mohsen1 commented Oct 16, 2014

For versioning of the schema, if we host the bower.json here (as we do) it will conflict with the git tags of the parent repository. It has to have its own repository in order for bower to work.

@webron
Member Author

webron commented Oct 16, 2014

Right, as part of the overall solution, I was going to suggest a separate repository for the packaging, where the build system would push updated schemas once it is committed and passes the tests. The packaging repo could then issue its own build routine to deal with the versioning, packaging and publishing of the new version. It would require some travis-ci-fu.

@webron
Member Author

webron commented Nov 5, 2014

Okay, so one solution to create the revision id is to use:
git rev-list --count HEAD -- schema.json

That would basically give the number of revisions that the schema has gone through.

The idea is to create a script that runs as part of the build process, which will modify the schema.json file and set a "revision" property in it to the new number (possibly +1 since the edit commit is going to add to the revision). According to the feedback I got from the JSON Schema google group, there should be no problem adding such a property to the schema without breaking the validators.
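
A rough sketch of such a build script, assuming the schema is plain JSON and that the "revision" property sits at the top level of the document. The function names are hypothetical. `count_revisions` simply wraps the git command shown above, and the `+ 1` follows the "possibly +1" note, so it is an assumption rather than a settled rule.

```python
import json
import subprocess


def count_revisions(path):
    """Return how many commits on HEAD touched `path` (needs a git checkout)."""
    out = subprocess.check_output(
        ["git", "rev-list", "--count", "HEAD", "--", path]
    )
    return int(out.decode().strip())


def stamp_revision(schema_text, commit_count):
    """Return the schema text with a top-level 'revision' property set.

    Adds 1 to the count because committing the stamped file creates
    one more revision of it (an assumption taken from the comment above).
    """
    schema = json.loads(schema_text)
    schema["revision"] = commit_count + 1
    return json.dumps(schema, indent=2) + "\n"


# Example with a placeholder schema and a made-up commit count.
print(stamp_revision('{"title": "placeholder", "type": "object"}', 64))
```

In a real build, `commit_count` would come from `count_revisions("schema.json")` run inside the repository.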

The next step would be to wait for the build to complete - the build runs validations and whatnot to make sure the schema is valid and doesn't break anything.

If two conditions apply:

  1. The build finishes successfully
  2. The schema file was indeed updated

then the build script would push the change to a separate repository for the bower packaging.
On that repository, the build system can get the revision from the schema.json, modify the bower.json and proceed with the packaging and publishing of the new schema version.

Since the revision number is an increasing integer, it would be easy to see the revision order (unlike the git hashes). I imagine the packaging numbers could be something along the lines of 2.0-65 or 2.0.65 (I prefer the former).

It's possible that the second condition above (the schema file was indeed updated) is not going to be so simple, since there may be commits that cause failures unrelated to the schema, and combining everything together may lead to unnecessary publishing or possibly to skipping necessary ones. The solution to that would be to check the latest bower package version and see if the current revision is indeed newer. This again may be a tad risky with quick consecutive commits that may cause a build to run before the bower repository finishes. I don't know what would happen when publishing a bower package with the same version twice.
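
The "is the current revision newer than the latest published package" check could be sketched like this. The helper names are hypothetical, both candidate formats mentioned above (`2.0-65` and `2.0.65`) are accepted, and the sketch does not address the race between quick successive builds.

```python
def package_version(spec_version, revision, separator="-"):
    """Format a package version such as '2.0-65' (or '2.0.65' with separator='.')."""
    return f"{spec_version}{separator}{revision}"


def revision_of(package, spec_version="2.0"):
    """Extract the integer revision from '2.0-65' or '2.0.65'."""
    if not package.startswith(spec_version):
        raise ValueError(f"{package!r} is not a {spec_version} package version")
    suffix = package[len(spec_version):]
    return int(suffix.lstrip("-."))


def should_publish(current_revision, latest_package, spec_version="2.0"):
    """Publish only when nothing is published yet or the revision is newer."""
    if latest_package is None:
        return True
    return current_revision > revision_of(latest_package, spec_version)


print(package_version("2.0", 65))
print(should_publish(65, "2.0-64"))
print(should_publish(65, "2.0-65"))
```

Because the revision is compared as an integer, revision 100 correctly counts as newer than 99, which would not hold for a plain string comparison or for git hashes.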

As a final step, the build process may also create copies of the older revisions so that they are easily accessible. I'm not sure this is necessary since they are all available in the git commit history.

All this requires some scripting magic which is not my strong suit, but I wanted to run this by you before making any changes. There are some pending schema modifications that I'm holding off for now until we finalize this process.

@mohsen1
Contributor

mohsen1 commented Nov 5, 2014

This works for me. You can use mversion to update bower and npm together with one command.

@webron
Member Author

webron commented Nov 10, 2014

@mohsen1 - since I'm not familiar with npm's and bower's publishing mechanisms, can you provide me with the required commands to run during the build?

@mohsen1
Contributor

mohsen1 commented Nov 10, 2014

So basically in every patch all you need to do is

mversion patch
npm publish
git push --tags origin #for bower

You need to use the Travis gem to store your npm password as an environment variable in the Travis environment, and run npm login before doing this.

@webron
Member Author

webron commented Nov 10, 2014

You'd have to walk me through further than that. I've never done anything related to either npm or bower (not a javascript/node developer). I don't have a password or anything.

And I don't entirely understand how pushing the tags would affect bower (without publishing it).

@mohsen1
Contributor

mohsen1 commented Nov 10, 2014

Bower uses git repo as "the registry" and git tags for "versions".

Actually, Travis supports pushing to npm out of the box: http://docs.travis-ci.com/user/deployment/npm/
For bower, all it needs is git tags

@handrews
Contributor

handrews commented May 21, 2023

@webron I think we agreed this should be the issue tracking how/when we deploy schema updates. Including deploying new revisions of the OAS 3.1 JSON Schema vocabulary and dialect meta-schemas which don't currently have dates in their $ids. Which I think happened because I wrote them as tentative proposals and then by the time OAS 3.1 shipped I wasn't around to catch that sort of thing.

@jdesrosiers
Contributor

I've been handling schema deployment. I don't usually prepare a deployment for every change, but tend to wait a little while in case more changes come in (they usually come in waves). There are currently two changes that have been merged but not deployed yet and one open PR.

Here's the process I've been following for deployment

  1. Update release dates in
    • schema.yaml
    • schema-base.yaml (Don't forget the one in the $ref)
    • tests/v3.1/test.js
  2. Generate the JSON versions from the YAML versions
    • The README says to only modify the YAML version, but contributors usually include changes to both. So, this step is usually just a sanity check.
  3. Run the tests
    • The tests are very minimal and haven't been added to since they were first added, but I make sure that we at least don't break what we have.
  4. Deploy the schemas
    • Add the schemas to the gh-pages branch
    • Update the latest symlink to point to the newest version
  5. Other stuff I'm forgetting about ???
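
The "update the latest symlink" part of step 4 could be scripted roughly as below. This is a sketch only: it assumes each release lives in its own directory next to a `latest` link, and the dated directory names in the usage example are made up. It is not the script actually used for deployment.

```python
import os
import tempfile


def point_latest_at(directory, target_name, link_name="latest"):
    """Make `directory/link_name` a symlink to `target_name`, replacing any old link.

    The new link is created under a temporary name and then renamed over
    the old one, so readers never see a missing `latest`.
    """
    link_path = os.path.join(directory, link_name)
    temp_path = link_path + ".tmp"
    if os.path.lexists(temp_path):
        os.remove(temp_path)
    os.symlink(target_name, temp_path)  # relative target, resolved inside `directory`
    os.replace(temp_path, link_path)


# Usage with throwaway directories standing in for two releases.
with tempfile.TemporaryDirectory() as root:
    for release in ("2022-10-07", "2023-01-01"):  # hypothetical release dates
        os.mkdir(os.path.join(root, release))
    point_latest_at(root, "2022-10-07")
    point_latest_at(root, "2023-01-01")
    print(os.readlink(os.path.join(root, "latest")))
```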

Here's an example of a deploy I've done in the past

> Including deploying new revisions of the OAS 3.1 JSON Schema vocabulary and dialect meta-schemas which don't currently have dates in their $ids. Which I think happened because I wrote them as tentative proposals and then by the time OAS 3.1 shipped I wasn't around to catch that sort of thing.

Actually, I wrote those meta-schemas. I didn't include a date in the schema for the same reasons we don't include one in JSON Schema meta-schemas. The $ids of meta-schemas are what schema authors use in $schema to indicate the version of JSON Schema their schema is using. If we version meta-schemas, users would need to update all of their schemas that declare this dialect every time we release a bugfix. There are lots of reasons that's not a good user experience.

To avoid this, at JSON Schema, any time we need to make a fix in the meta-schema, we've always just fixed it and kept the same $id. This has never caused a problem. I intended for these meta-schemas to be maintained the same way, but so far, there hasn't been a need to update the vocabulary/dialect meta-schemas.

Ideally, I'd think that meta-schemas probably should be versioned, but there should also be an unversioned URI that's effectively a redirect to the latest version, which the vast majority of users would use for $schema rather than the versioned URI.

I'm happy to hand the schema maintenance and deployment process over to whoever wants it. I've always just done the minimum to keep it afloat, so it would benefit from someone actually putting effort into improving this process.

@handrews
Contributor

The date-in-$id policy should be the same for all the OAS schemas (whether they are JSON Schema meta-schemas or not). I'm fine with whatever policy @OAI/tsc wants (date or no date), as long as it is consistent. Having two different policies is worse UX than either individual policy. (Also, my apologies for mis-remembering who wrote those!)

@handrews
Contributor

I'm closing this now very long and very out-of-date (aside from the last three comments) issue as too confusing for current tracking. Please use the following:

9 participants