Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the schema for MONAI Bundle #7303

Closed
KumoLiu opened this issue Dec 8, 2023 · 7 comments · Fixed by #7409
Closed

Update the schema for MONAI Bundle #7303

KumoLiu opened this issue Dec 8, 2023 · 7 comments · Fixed by #7409
Assignees
Labels
Bundles Anything related to bundles enhancement New feature or request

Comments

@KumoLiu
Copy link
Contributor

KumoLiu commented Dec 8, 2023

There are several things we want to update in the schema.
https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json

@KumoLiu KumoLiu added enhancement New feature or request Bundles Anything related to bundles labels Dec 8, 2023
@ericspod
Copy link
Member

ericspod commented Jan 19, 2024

I have prepared a new bundle schema which should address these open issues and a few other issues I've considered/discovered. The following will include a "required_packages_version" component, an ability to differentiate between raw network output and postprocess output, a way to specify the types of arguments and return values which can be other than tensors, and an ability to specify the data format for multiple networks using pattern-based properties.

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$defs": {
    "tensor": {
      "$comment": "Represents a tensor object argument/return value, can be MetaTensor.",
      "type": "object",
      "properties": {
        "type": { "type": "string" },
        "format": { "type": "string" },
        "num_channels": { "type": "integer" },
        "spatial_shape": {
          "type": "array",
          "items": {
            "type": ["string", "integer"]
          }
        },
        "dtype": { "type": "string" },
        "value_range": {
          "type": "array",
          "items": { "type": "number" }
        },
        "is_patch_data": { "type": "boolean" },
        "channel_def": { "type": "object" }
      },
      "required": [
        "type",
        "format",
        "num_channels",
        "spatial_shape",
        "dtype",
        "value_range"
      ]
    },
    "argument": {
      "$comment": "Acceptable input types for model forward pass, arbitrary Python objects not covered.",
      "anyOf": [
        { "type": "number" },
        { "type": "integer" },
        { "type": "string" },
        { "type": "boolean" },
        { "$ref": "#/$defs/tensor" }
      ]
    },
    "result": {
      "$comment": "Return value from a model's forward pass, same as an argument for now.",
      "$ref": "#/$defs/argument"
    },
    "network_io": {
      "description": "Defines the format, shape, and meaning of inputs and outputs to the model.",
      "type": "object",
      "$comment": "Arguments/return values described by pattern property, order considered significant.",
      "properties": {
        "inputs": {
          "type": "object",
          "patternProperties": {
            "^.+$": { "$ref": "#/$defs/argument" }
          }
        },
        "outputs": {
          "type": "object",
          "patternProperties": {
            "^.+$": { "$ref": "#/$defs/result" }
          }
        },
        "post_processed_outputs": {
          "$comment": "Return value format after post-processing, not needed if not changed.",
          "type": "object",
          "patternProperties": {
            "^.+$": { "$ref": "#/$defs/result" }
          }
        }
      },
      "required": [
        "inputs",
        "outputs"
      ]
    }
  },
  "type": "object",
  "properties": {
    "schema": {
      "description": "URL of the schema file.",
      "type": "string"
    },
    "version": {
      "description": "Version number of the bundle.",
      "type": "string"
    },
    "changelog": {
      "description": "Dictionary relating previous version names to strings describing the version.",
      "type": "object"
    },
    "monai_version": {
      "description": "Version of MONAI the bundle was generated with.",
      "type": "string"
    },
    "pytorch_version": {
      "description": "Version of PyTorch the bundle was generated with.",
      "type": "string"
    },
    "numpy_version": {
      "description": "Version of NumPy the bundlewas generated with.",
      "type": "string"
    },
    "required_packages_version": {
      "description": "Dictionary relating required additional package names to their versions.",
      "type": "object"
    },
    "optional_packages_version": {
      "description": "Dictionary relating optional additional package names to their versions.",
      "type": "object"
    },
    "task": {
      "description": "Plain-language description of what the bundle is meant to do.",
      "type": "string"
    },
    "description": {
      "description": "Longer form description of what the bundle is, what it does, etc.",
      "type": "string"
    },
    "authors": {
      "description": "State author(s) of the bundle.",
      "type": "string"
    },
    "copyright": {
      "description": "State copyright of the bundle.",
      "type": "string"
    },
    "data_source": {
      "description": "Where to download or prepare the data used in this bundle.",
      "type": "string"
    },
    "data_type": {
      "description": "Type of the data, like: `dicom`, `nibabel`, etc.",
      "type": "string"
    },
    "image_classes": {
      "description": "Description for every class of the input tensors.",
      "type": "string"
    },
    "label_classes": {
      "description": "Description for every class of the input tensors if present.",
      "type": "string"
    },
    "pred_classes": {
      "description": "Description for every class of the output prediction(s).",
      "type": "string"
    },
    "eval_metrics": {
      "description": "Dictionary relating evaluation metrics to the achieved scores.",
      "type": "object"
    },
    "intended_use": {
      "description": "What the bundle is to be used for, ie. what task it accomplishes.",
      "type": "string"
    },
    "references": {
      "description": "List of published referenced relating to the bundle.",
      "type": "array"
    },
    "network_data_format": { "$ref": "#/$defs/network_io" }
  },
  "$comment": "This permits definitions for multiple networks with format <network-name>_data_format",
  "patternProperties": {
    "^[_a-zA-Z0-9]+_data_format$": { "$ref": "#/$defs/network_io" }
  },
  "required": [
    "schema",
    "version",
    "monai_version",
    "pytorch_version",
    "numpy_version",
    "required_packages_version",
    "optional_packages_version",
    "task",
    "description",
    "authors",
    "copyright",
    "network_data_format"
  ]
}

@ericspod
Copy link
Member

Points about this schema:

  • A lot of the structural description was moved to $defs for reuse. This is the clean way of doing things it seems with $comment keys for comment information pertaining to the schema only.
  • With "patternProperties": { "^[_a-zA-Z0-9]+_data_format$": { "$ref": "#/$defs/network_io" }}, the schema now permits multiple network definitions to be included based on the naming pattern, so a network called foo in any of the scripts is specified with foo_data_format. This is different from what we had thought of before of having one bundle per network.
  • This schema validates against all the existing model zoo models except for 2 recent generative models which don't have a network_data_format field. This could be fixed with a rename of the networks in those bundles or by removing network_data_format from the schema and relying on the pattern property instead (but this would allow schemas without any data format).
  • The CI/CD checks in the model zoo should enforce this schema using the jsonschema package and a simple script to check every schema and ensure there's at least one *_data_format value.
  • We can also add scripts to format all json and yaml files automatically, I have simple versions for these two tasks that produces a slightly more compact result for both than is typically done.

@ericspod
Copy link
Member

Both required_packages_version and optional_packages_version fields are required, the current model zoon metadata.json files will validate as said except for the absense of required_packages_version.

@KumoLiu
Copy link
Contributor Author

KumoLiu commented Jan 22, 2024

Both required_packages_version and optional_packages_version fields are required, the current model zoon metadata.json files will validate as said except for the absense of required_packages_version.

Since we already know all of the packages we need after the bundle was created, we could remove 'optional_packages_version', which may be confusing, and instead, directly use 'required_packages_version'. What do you think? cc @Nic-Ma

@KumoLiu
Copy link
Contributor Author

KumoLiu commented Jan 22, 2024

This is different from what we had thought of before of having one bundle per network.

So do we currently have such a need to add multiple networks in a bundle?
It seems generative models have such requirements.
https://github.com/Project-MONAI/model-zoo/blob/e99e61aeb6fd21fc8613fd6099463ca8d7420aca/models/brats_mri_axial_slices_generative_diffusion/configs/metadata.json#L33

This could be fixed with a rename of the networks in those bundles or by removing network_data_format from the schema and relying on the pattern property instead (but this would allow schemas without any data format).

Instead of renaming and replying on the pattern property, I think these two generative models missing the network_data_format, so we should add them. network_data_format should still be a required field.
https://github.com/Project-MONAI/model-zoo/blob/e99e61aeb6fd21fc8613fd6099463ca8d7420aca/models/brats_mri_axial_slices_generative_diffusion/configs/train_diffusion.json#L18C6-L18C17

@KumoLiu
Copy link
Contributor Author

KumoLiu commented Jan 22, 2024

Hi @ericspod, thanks for the new schema you updated, I agree with the idea of $defs for reuse.
We may need more people to help review the updated schema, would you mind creating a draft PR so that people can easily leave comments?

@ericspod
Copy link
Member

I've made the PR #7409 but I don't know if we want the schema to be here. The schema lives in a release storage data here rather than in a repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bundles Anything related to bundles enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants