Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for SchemaStore #110

Closed
henryiii opened this issue Oct 6, 2023 · 21 comments
Closed

Support for SchemaStore #110

henryiii opened this issue Oct 6, 2023 · 21 comments

Comments

@henryiii
Copy link
Collaborator

henryiii commented Oct 6, 2023

Would it make sense to support SchemaStore? Currently, there's only one subschema (Ruff), though I'm asking to see if Poetry can be pulled out as a subschema (currently part of the core schema for pyproject.toml) and there's a proposed addition for Hatch. I'm hoping to add my schemas there too, for VSCode and such support. It would be nice to have in validate-pyproject too, I think - you wouldn't have to install the various projects, just pull from one place (schemastore). I don't think it can be implemented as a third party plugin very easily, since you don't know what tool sections are provided beforehand (unless it was a regularly updated copy or something).

I don't have a good idea of how it would work, how someone would opt-in to it (I guess it might require getting the schemas from a URL? I'm not that sure how schemastore works or what would be ideal. I know there's also a repo.) Maybe a regular updated copy would not be too bad. Thoughts?

@abravalheri
Copy link
Owner

Hi @henryiii , in principle, I think this feature is OK.

Something that I was already considering before is to have some form of loading a schema from a local path (opt-in), so this could be a variation of that, e.g.:

validate-pyproject --schema tool_name=./file.schema.json other-tool-name=https://... ...
# or even a custom uri schema-store://tool-name

Something to have in mind is that fastjsonschema is compatible with draft 07, so we have to be careful with external sources that were not explicitly designed to interoperate with validate-pyproject.

Do you have any thoughts on that?

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 9, 2023

SchemaStore is designed around draft 07. :)

I've been looking into this a bit more, and there's one fundamental issue we'd need to solve, but I think it's easy. I'll open a separate issue about that #111. (In an online conference for three days, so depends on how interesting/boring the talks are. :) )

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 9, 2023

Is this still true:

"Please notice for the time being the ``setuptools`` project does not specify",
"a way of configuring builds via ``pyproject.toml``.",
"Therefore this schema should be taken just as a *\"thought experiment\"* on how",
"this *might be done*, by following the principles established in",

I'd like to also submit at least setuptools.schema.json to SchemaStore, but this seems like it needs updating. :)

@abravalheri
Copy link
Owner

Is this still true:

Oops. Fix in #112.

@henryiii
Copy link
Collaborator Author

Is "format" (https://json-schema.org/understanding-json-schema/reference/string#format) used in fastjsonschema? Or was it just part of some future work?

@abravalheri
Copy link
Owner

abravalheri commented Oct 11, 2023

Hi @henryiii , fastjsonschema allows using format.
If I am not wrong, there are some default ones, but I wrote a bunch of custom ones in the formats sub-module.

@henryiii
Copy link
Collaborator Author

Ahh! Not sure why I couldn’t find them via grep.

@abravalheri
Copy link
Owner

Maybe that is relevant in this context:


Side note:

BTW, thank you very much @henryiii for all the interest you have been showing in this project and all the contributions. I really appreciate that.

Should I go ahead and create a new release with the PRs that were already merged, or do you have any pending change that you have in mind?

@henryiii
Copy link
Collaborator Author

Probably fine to make a release. I'm going to work on making a contribution to SchemaStore, and that requires processing setuptools.schema.json quite a bit to make it standards compliant - no format, no $$description, etc. How often do you expect it to change? I'm building a little conversion script, but if changes are pretty rare, we could just make them twice if we need to in the future. In either case, though, it's unlikely to change the existing file unless I find any bugs with it.

I'd also like to try to work on this issue itself (validate-pyproject pulling schemas from SchemaStore), but that's fine for a future release.

@abravalheri
Copy link
Owner

it standards compliant - no format...

Just one minor comment: format is standard, but what to do with it is pretty much up to the tool. The "standard" (or docs website?) says:

By default, format is just an annotation and does not effect validation.

Optionally, validator implementations can provide a configuration option to enable format to function as an assertion rather than just an annotation.
...
A JSON Schema validator will ignore any format type that it does not understand.

So, it should possible to keep it there for the sake of "semantics" (with the understanding that this way they will be more "informational" than "enforcement").

How often do you expect it to change? I'm building a little conversion script, but if changes are pretty rare, we could just make them twice if we need to in the future. In either case, though, it's unlikely to change the existing file unless I find any bugs with it.

It is been a while that it does not change... So I don't think it should change too much.
But it will follow the changes in setuptools (there are some stuff like ext_modules that are in the back of my mind).

One option is to add this script to this repository and run it as part of the test suite to ensure compatibility1.

In the future, I was considering to have the schema files themselves directly on the setuptools repository to facilitate contributors (other than me) to introduce new configurations, but this is something open for debate.

Footnotes

  1. I am happy to add you as a maintainer to this repo if you are interested, BTW. You have always given great contributions.

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 12, 2023

I think the problem is SchemaStore doesn't allow custom formats1, since it's used by about a dozen different editors, etc., and they aren't familiar with non-standard formats. Also things like

{"format": "python-module-name"},
{"format": "pep561-stub-name"}
kind of assume the validator knows how to use the format.

I think it's fine, we can just replace format with a ref to a regex in the conversion script.

Sure, I'm fine to be added, as long as you don't expect me to do too much - happy to help a bit. The main benefit, IMO, is I can fix PRs like #109, and I get to put it in the "My Projects" listing https://iscinumpy.gitlab.io. ;)

Footnotes

  1. Those are also missing "type": "string". There's a chance adding that would make SchemaStore happy.

@abravalheri
Copy link
Owner

Sure, I'm fine to be added, as long as you don't expect me to do too much - happy to help a bit.

No expectations 😝

Those are also missing "type": "string". There's a chance adding that would make SchemaStore happy.

I don't remember now, but I remember seeing in the reference docs at that point in time type being treated as string by default... but my memory is fuzzy. Anyway that was my mistake 😅

I think the problem is SchemaStore doesn't allow custom formats

That looks like a bug on SchemaStore... The "spec/website" explicitly say "By default, format is just an annotation and does not effect validation ... A JSON Schema validator will ignore any format type that it does not understand".

But anyway, if problematic, processing it out is a good option.

@abravalheri
Copy link
Owner

abravalheri commented Oct 12, 2023

The reason why I decided to go with format in some cases rather than regex was also because some validations where tricky to express as regex (e.g. the PEP 508 is usually expressed in terms of a BNF grammar right?), or too long to be humanly understandable in a JSON format.

In other cases I could not find a good regex definition (like Python UTF-8 identifiers), or it would be better to use a validation function, so if a PEP updates some definition, we are not surprised with changes.

So in the end, I settled with format because it gave me just the right amount of "editing"/"comprehending error message" capabilities.

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 13, 2023

Quick question: is this valid?

[tool.setuptools.packages.find]
where = "src" 

This is in one of the samples already in SchemaStore and doesn't pass validation (which would require

[tool.setuptools.packages.find]
where = ["src"] 

)

@henryiii
Copy link
Collaborator Author

Nope:

$ cookiecutter gh:scientific-python/cookie
...
  [8/9] Choose a build backend
    1 - Hatchling                      - Pure Python (recommended)
    2 - Flit-core                      - Pure Python (minimal)
    3 - PDM-backend                    - Pure Python
    4 - Whey                           - Pure Python
    5 - Poetry                         - Pure Python
    6 - Setuptools with pyproject.toml - Pure Python
    7 - Setuptools with setup.py       - Pure Python
    8 - Setuptools and pybind11        - Compiled C++
    9 - Scikit-build-core              - Compiled C++ (recommended)
    10 - Meson-python                  - Compiled C++ (also good)
    11 - Maturin                       - Compiled Rust (recommended)
    Choose from [1/2/3/4/5/6/7/8/9/10/11] (1): 6
...
$ cd package
$ vi pyproject.toml
$ pipx run build
...
ValueError: invalid pyproject.toml config: `tool.setuptools.packages`.
configuration error: `tool.setuptools.packages` must be valid exactly by one definition (0 matches found):

    - type: array
      items:
        type: string
        at least one of the following:
          - {format: 'python-module-name'}
          - {format: 'pep561-stub-name'}
    - type: table
      additional keys: False
      keys:
        'find':
          type: table
          additional keys: False
          keys:
            'where':
              type: array
              items: {type: string}
            'exclude':
              type: array
              items: {type: string}
            'include':
              type: array
              items: {type: string}
            'namespaces': {type: boolean}


ERROR Backend subprocess exited when trying to invoke get_requires_for_build_sdist

So the example in SchemaStore is wrong. :)

@abravalheri
Copy link
Owner

Setuptools expects to find an array there. I remember it being a suggestion from the discord server.

@henryiii
Copy link
Collaborator Author

FYI, my PR is here: SchemaStore/schemastore#3307

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 17, 2023

Seems like SchemaStore's validator is not the only one to not accept unknown formats:

$ validate-pyproject tests/invalid-examples/poetry/poetry-bad-multiline.toml --tools=http://json.schemastore.org/pyproject.json#properties/tool -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[INFO] https://json.schemastore.org/cibuildwheel.json defines `tool.cibuildwheel` schema
[INFO] https://json.schemastore.org/ruff.json defines `tool.ruff` schema
[INFO] https://json.schemastore.org/scikit-build.json defines `tool.scikit-build` schema
[INFO] https://json.schemastore.org/poetry.json defines `tool.poetry` schema
[ERROR] JsonSchemaDefinitionException: Unknown format: uint8
...
fastjsonschema.exceptions.JsonSchemaDefinitionException: Unknown format: uint8

😅

PS: I did find the place in SchemaStore where I can ignore unknown formats.

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 17, 2023

Okay, this is my current proposal, based on #110 (comment):

Stage 1: we add a way to load external files, either via path or URL. I've made a draft implementation:

$ validate-pyproject tests/examples/poetry/poetry-complete.toml -t poetry=http://json.schemastore.org/poetry.json -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[INFO] https://json.schemastore.org/poetry.json defines `tool.poetry` schema
Valid file: tests/examples/poetry/poetry-complete.toml
$ validate-pyproject tests/invalid-examples/poetry/poetry-bad-multiline.toml -t poetry=http://json.schemastore.org/poetry.json
Invalid file: tests/invalid-examples/poetry/poetry-bad-multiline.toml
[ERROR] `tool.poetry.description` must match pattern ^[^
]*$

Nested JSON docs can be loaded, too, like t poetry=http://json.schemastore.org/cibuildwheel.json#properties/tool/properties/cibuildwheel.

Stage 2 will be to add a way to load all sections, something like --tools http//json.schemastore.org/pyproject.json#properties/tool. This will read the file, get the properties/tool section, and load all the sections present, including resolving $refs, that are not already loaded as plugins. (Also testing in draft)

We could provide a --schemastore shortcut for the above with the canonical URL.

@abravalheri
Copy link
Owner

Seems like SchemaStore's validator is not the only one to not accept unknown formats:

$ validate-pyproject tests/invalid-examples/poetry/poetry-bad-multiline.toml --tools=http://json.schemastore.org/pyproject.json#properties/tool -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[INFO] https://json.schemastore.org/cibuildwheel.json defines `tool.cibuildwheel` schema
[INFO] https://json.schemastore.org/ruff.json defines `tool.ruff` schema
[INFO] https://json.schemastore.org/scikit-build.json defines `tool.scikit-build` schema
[INFO] https://json.schemastore.org/poetry.json defines `tool.poetry` schema
[ERROR] JsonSchemaDefinitionException: Unknown format: uint8
...
fastjsonschema.exceptions.JsonSchemaDefinitionException: Unknown format: uint8

😅

PS: I did find the place in SchemaStore where I can ignore unknown formats.

Ouch... I think we can work with that to be compliant with the spec.
Probably by modifying FORMAT_FUNCTIONS to use a defaultdict we can achieve a "lenient" by default behaviour.

I would also be happy to add int/uintX functions to the formats.py module.
Do you think this approach is worthy?

@abravalheri
Copy link
Owner

Okay, this is my current proposal, based on #110 (comment):

Stage 1: we add a way to load external files, either via path or URL. I've made a draft implementation:
...
Stage 2 will be to add a way to load all sections, something like --tools http//json.schemastore.org/pyproject.json#properties/tool. This will read the file, get the properties/tool section, and load all the sections present, including resolving $refs, that are not already loaded as plugins.
We could provide a --schemastore shortcut for the above with the canonical URL.

I like this approach. Thank you very much @henryiii for taking the time to look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants