cwl-input-schema implementation #288

Open
wants to merge 3 commits into
base: main

@alexiswl (Contributor) commented Feb 15, 2024

Extension of #282

Related issues #273

Usage example

cwl-inputs-schema-gen \
"https://raw.githubusercontent.com/umccr/cwl-ica/main/workflows/illumina-interop-qc/1.2.0--1.14.0/illumina-interop-qc__1.2.0--1.14.0.cwl"
Gives
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "File": {
      "additionalProperties": false,
      "description": "Represents a file (or group of files when `secondaryFiles` is provided) that\nwill be accessible by tools using standard POSIX file system call API such as\nopen(2) and read(2).\n\nFiles are represented as objects with `class` of `File`.  File objects have\na number of properties that provide metadata about the file.\n\nThe `location` property of a File is a URI that uniquely identifies the\nfile.  Implementations must support the `file://` URI scheme and may support\nother schemes such as `http://` and `https://`.  The value of `location` may also be a\nrelative reference, in which case it must be resolved relative to the URI\nof the document it appears in.  Alternately to `location`, implementations\nmust also accept the `path` property on File, which must be a filesystem\npath available on the same host as the CWL runner (for inputs) or the\nruntime environment of a command line tool execution (for command line tool\noutputs).\n\nIf no `location` or `path` is specified, a file object must specify\n`contents` with the UTF-8 text content of the file.  This is a \"file\nliteral\".  File literals do not correspond to external resources, but are\ncreated on disk with `contents` with when needed for executing a tool.\nWhere appropriate, expressions can return file literals to define new files\non a runtime.  The maximum size of `contents` is 64 kilobytes.\n\nThe `basename` property defines the filename on disk where the file is\nstaged.  This may differ from the resource name.  If not provided,\n`basename` must be computed from the last path part of `location` and made\navailable to expressions.\n\nThe `secondaryFiles` property is a list of File or Directory objects that\nmust be staged in the same directory as the primary file.  It is an error\nfor file names to be duplicated in `secondaryFiles`.\n\nThe `size` property is the size in bytes of the File.  It must be computed\nfrom the resource and made available to expressions.  
The `checksum` field\ncontains a cryptographic hash of the file content for use it verifying file\ncontents.  Implementations may, at user option, enable or disable\ncomputation of the `checksum` field for performance or other reasons.\nHowever, the ability to compute output checksums is required to pass the\nCWL conformance test suite.\n\nWhen executing a CommandLineTool, the files and secondary files may be\nstaged to an arbitrary directory, but must use the value of `basename` for\nthe filename.  The `path` property must be file path in the context of the\ntool execution runtime (local to the compute node, or within the executing\ncontainer).  All computed properties should be available to expressions.\nFile literals also must be staged and `path` must be set.\n\nWhen collecting CommandLineTool outputs, `glob` matching returns file paths\n(with the `path` property) and the derived properties. This can all be\nmodified by `outputEval`.  Alternately, if the file `cwl.output.json` is\npresent in the output, `outputBinding` is ignored.\n\nFile objects in the output must provide either a `location` URI or a `path`\nproperty in the context of the tool execution runtime (local to the compute\nnode, or within the executing container).\n\nWhen evaluating an ExpressionTool, file objects must be referenced via\n`location` (the expression tool does not have access to files on disk so\n`path` is meaningless) or as file literals.  It is legal to return a file\nobject with an existing `location` but a different `basename`.  The\n`loadContents` field of ExpressionTool inputs behaves the same as on\nCommandLineTool inputs, however it is not meaningful on the outputs.\n\nAn ExpressionTool may forward file references from input to output by using\nthe same value for `location`.",
      "properties": {
        "basename": {
          "description": "The base name of the file, that is, the name of the file without any\nleading directory path.  The base name must not contain a slash `/`.\n\nIf not provided, the implementation must set this field based on the\n`location` field by taking the final path component after parsing\n`location` as an IRI.  If `basename` is provided, it is not required to\nmatch the value from `location`.\n\nWhen this file is made available to a CommandLineTool, it must be named\nwith `basename`, i.e. the final component of the `path` field must match\n`basename`.",
          "type": "string"
        },
        "checksum": {
          "description": "Optional hash code for validating file integrity.  Currently, must be in the form\n\"sha1$ + hexadecimal string\" using the SHA-1 algorithm.",
          "type": "string"
        },
        "class": {
          "const": "File",
          "description": "Must be `File` to indicate this object describes a file.",
          "type": "string"
        },
        "contents": {
          "description": "File contents literal.\n\nIf neither `location` nor `path` is provided, `contents` must be\nnon-null.  The implementation must assign a unique identifier for the\n`location` field.  When the file is staged as input to CommandLineTool,\nthe value of `contents` must be written to a file.\n\nIf `contents` is set as a result of a Javascript expression,\nan `entry` in `InitialWorkDirRequirement`, or read in from\n`cwl.output.json`, there is no specified upper limit on the\nsize of `contents`.  Implementations may have practical limits\non the size of `contents` based on memory and storage\navailable to the workflow runner or other factors.\n\nIf the `loadContents` field of an `InputParameter` or\n`OutputParameter` is true, and the input or output File object\n`location` is valid, the file must be a UTF-8 text file 64 KiB\nor smaller, and the implementation must read the entire\ncontents of the file and place it in the `contents` field.  If\nthe size of the file is greater than 64 KiB, the\nimplementation must raise a fatal error.",
          "type": "string"
        },
        "dirname": {
          "description": "The name of the directory containing file, that is, the path leading up\nto the final slash in the path such that `dirname + '/' + basename ==\npath`.\n\nThe implementation must set this field based on the value of `path`\nprior to evaluating parameter references or expressions in a\nCommandLineTool document.  This field must not be used in any other\ncontext.",
          "type": "string"
        },
        "format": {
          "description": "The format of the file: this must be an IRI of a concept node that\nrepresents the file format, preferably defined within an ontology.\nIf no ontology is available, file formats may be tested by exact match.\n\nReasoning about format compatibility must be done by checking that an\ninput file format is the same, `owl:equivalentClass` or\n`rdfs:subClassOf` the format required by the input parameter.\n`owl:equivalentClass` is transitive with `rdfs:subClassOf`, e.g. if\n`<B> owl:equivalentClass <C>` and `<B> owl:subclassOf <A>` then infer\n`<C> owl:subclassOf <A>`.\n\nFile format ontologies may be provided in the \"$schemas\" metadata at the\nroot of the document.  If no ontologies are specified in `$schemas`, the\nruntime may perform exact file format matches.",
          "type": "string"
        },
        "location": {
          "description": "An IRI that identifies the file resource.  This may be a relative\nreference, in which case it must be resolved using the base IRI of the\ndocument.  The location may refer to a local or remote resource; the\nimplementation must use the IRI to retrieve file content.  If an\nimplementation is unable to retrieve the file content stored at a\nremote resource (due to unsupported protocol, access denied, or other\nissue) it must signal an error.\n\nIf the `location` field is not provided, the `contents` field must be\nprovided.  The implementation must assign a unique identifier for\nthe `location` field.\n\nIf the `path` field is provided but the `location` field is not, an\nimplementation may assign the value of the `path` field to `location`,\nthen follow the rules above.",
          "type": "string"
        },
        "nameext": {
          "description": "The basename extension such that `nameroot + nameext == basename`, and\n`nameext` is empty or begins with a period and contains at most one\nperiod.  Leading periods on the basename are ignored; a basename of\n`.cshrc` will have an empty `nameext`.\n\nThe implementation must set this field automatically based on the value\nof `basename` prior to evaluating parameter references or expressions.",
          "type": "string"
        },
        "nameroot": {
          "description": "The basename root such that `nameroot + nameext == basename`, and\n`nameext` is empty or begins with a period and contains at most one\nperiod.  For the purposes of path splitting leading periods on the\nbasename are ignored; a basename of `.cshrc` will have a nameroot of\n`.cshrc`.\n\nThe implementation must set this field automatically based on the value\nof `basename` prior to evaluating parameter references or expressions.",
          "type": "string"
        },
        "path": {
          "description": "The local host path where the File is available when a CommandLineTool is\nexecuted.  This field must be set by the implementation.  The final\npath component must match the value of `basename`.  This field\nmust not be used in any other context.  The command line tool being\nexecuted must be able to access the file at `path` using the POSIX\n`open(2)` syscall.\n\nAs a special case, if the `path` field is provided but the `location`\nfield is not, an implementation may assign the value of the `path`\nfield to `location`, and remove the `path` field.\n\nIf the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02)\n(`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\\`, `\"`, `'`,\n`<space>`, `<tab>`, and `<newline>`) or characters\n[not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml)\nfor [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452)\nthen implementations may terminate the process with a\n`permanentFailure`.",
          "type": "string"
        },
        "secondaryFiles": {
          "description": "A list of additional files or directories that are associated with the\nprimary file and must be transferred alongside the primary file.\nExamples include indexes of the primary file, or external references\nwhich must be included when loading primary document.  A file object\nlisted in `secondaryFiles` may itself include `secondaryFiles` for\nwhich the same rules apply.",
          "items": {
            "anyOf": [
              {
                "$ref": "#/definitions/File"
              },
              {
                "$ref": "#/definitions/Directory"
              }
            ]
          },
          "type": "array"
        },
        "size": {
          "description": "Optional file size (in bytes)",
          "type": "number"
        }
      },
      "required": [
        "class"
      ],
      "type": "object"
    },
    "Directory": {
      "additionalProperties": false,
      "description": "Represents a directory to present to a command line tool.\n\nDirectories are represented as objects with `class` of `Directory`.  Directory objects have\na number of properties that provide metadata about the directory.\n\nThe `location` property of a Directory is a URI that uniquely identifies\nthe directory.  Implementations must support the file:// URI scheme and may\nsupport other schemes such as http://.  Alternately to `location`,\nimplementations must also accept the `path` property on Directory, which\nmust be a filesystem path available on the same host as the CWL runner (for\ninputs) or the runtime environment of a command line tool execution (for\ncommand line tool outputs).\n\nA Directory object may have a `listing` field.  This is a list of File and\nDirectory objects that are contained in the Directory.  For each entry in\n`listing`, the `basename` property defines the name of the File or\nSubdirectory when staged to disk.  If `listing` is not provided, the\nimplementation must have some way of fetching the Directory listing at\nruntime based on the `location` field.\n\nIf a Directory does not have `location`, it is a Directory literal.  A\nDirectory literal must provide `listing`.  Directory literals must be\ncreated on disk at runtime as needed.\n\nThe resources in a Directory literal do not need to have any implied\nrelationship in their `location`.  For example, a Directory listing may\ncontain two files located on different hosts.  
It is the responsibility of\nthe runtime to ensure that those files are staged to disk appropriately.\nSecondary files associated with files in `listing` must also be staged to\nthe same Directory.\n\nWhen executing a CommandLineTool, Directories must be recursively staged\nfirst and have local values of `path` assigned.\n\nDirectory objects in CommandLineTool output must provide either a\n`location` URI or a `path` property in the context of the tool execution\nruntime (local to the compute node, or within the executing container).\n\nAn ExpressionTool may forward file references from input to output by using\nthe same value for `location`.\n\nName conflicts (the same `basename` appearing multiple times in `listing`\nor in any entry in `secondaryFiles` in the listing) is a fatal error.",
      "properties": {
        "basename": {
          "description": "The base name of the directory, that is, the name of the file without any\nleading directory path.  The base name must not contain a slash `/`.\n\nIf not provided, the implementation must set this field based on the\n`location` field by taking the final path component after parsing\n`location` as an IRI.  If `basename` is provided, it is not required to\nmatch the value from `location`.\n\nWhen this file is made available to a CommandLineTool, it must be named\nwith `basename`, i.e. the final component of the `path` field must match\n`basename`.",
          "type": "string"
        },
        "class": {
          "const": "Directory",
          "description": "Must be `Directory` to indicate this object describes a Directory.",
          "type": "string"
        },
        "listing": {
          "description": "List of files or subdirectories contained in this directory.  The name\nof each file or subdirectory is determined by the `basename` field of\neach `File` or `Directory` object.  It is an error if a `File` shares a\n`basename` with any other entry in `listing`.  If two or more\n`Directory` object share the same `basename`, this must be treated as\nequivalent to a single subdirectory with the listings recursively\nmerged.",
          "items": {
            "anyOf": [
              {
                "$ref": "#/definitions/File"
              },
              {
                "$ref": "#/definitions/Directory"
              }
            ]
          },
          "type": "array"
        },
        "location": {
          "description": "An IRI that identifies the directory resource.  This may be a relative\nreference, in which case it must be resolved using the base IRI of the\ndocument.  The location may refer to a local or remote resource.  If\nthe `listing` field is not set, the implementation must use the\nlocation IRI to retrieve directory listing.  If an implementation is\nunable to retrieve the directory listing stored at a remote resource (due to\nunsupported protocol, access denied, or other issue) it must signal an\nerror.\n\nIf the `location` field is not provided, the `listing` field must be\nprovided.  The implementation must assign a unique identifier for\nthe `location` field.\n\nIf the `path` field is provided but the `location` field is not, an\nimplementation may assign the value of the `path` field to `location`,\nthen follow the rules above.",
          "type": "string"
        },
        "path": {
          "description": "The local path where the Directory is made available prior to executing a\nCommandLineTool.  This must be set by the implementation.  This field\nmust not be used in any other context.  The command line tool being\nexecuted must be able to access the directory at `path` using the POSIX\n`opendir(2)` syscall.\n\nIf the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02)\n(`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\\`, `\"`, `'`,\n`<space>`, `<tab>`, and `<newline>`) or characters\n[not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml)\nfor [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452)\nthen implementations may terminate the process with a\n`permanentFailure`.",
          "type": "string"
        }
      },
      "required": [
        "class"
      ],
      "type": "object"
    }
  },
  "description": "Auto-generated class implementation for https://w3id.org/cwl/cwl#WorkflowInputParameter",
  "type": "object",
  "properties": {
    "input_run_dir": {
      "$ref": "#/definitions/Directory",
      "description": "The bcl directory\n"
    },
    "multiqc_cl_config": {
      "oneOf": [
        {
          "type": "null"
        },
        {
          "type": "string",
          "description": "Configuration via the cli for multiqc\n"
        }
      ]
    },
    "multiqc_comment": {
      "oneOf": [
        {
          "type": "null"
        },
        {
          "type": "string",
          "description": "Any commentary to place in the multiqc report\n"
        }
      ]
    },
    "multiqc_config": {
      "oneOf": [
        {
          "type": "null"
        },
        {
          "$ref": "#/definitions/File",
          "description": "Configuration file for multiqc\n"
        }
      ]
    },
    "multiqc_output_directory_name": {
      "type": "string",
      "description": "Name of the output directory for multiqc\n"
    },
    "multiqc_output_filename": {
      "type": "string",
      "description": "The name of the multiqc output file\n"
    },
    "multiqc_title": {
      "type": "string",
      "description": "The name of the title for multiqc\n"
    }
  },
  "required": [
    "input_run_dir",
    "multiqc_output_directory_name",
    "multiqc_output_filename",
    "multiqc_title"
  ]
}
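
As a reading aid for the generated schema, the sketch below lists each workflow input with its type, whether it is required, and whether it accepts `null`. It is only an illustration and not part of this PR: `SCHEMA_FRAGMENT` is an abbreviated copy of the `properties` and `required` sections above, and the helper names `describe_type` and `summarise_inputs` are hypothetical.

```python
import json

# Abbreviated copy of the "properties" and "required" sections shown above.
# The File/Directory definitions are omitted because they are not needed here.
SCHEMA_FRAGMENT = json.loads("""
{
  "properties": {
    "input_run_dir": {"$ref": "#/definitions/Directory"},
    "multiqc_cl_config": {"oneOf": [{"type": "null"}, {"type": "string"}]},
    "multiqc_config": {"oneOf": [{"type": "null"}, {"$ref": "#/definitions/File"}]},
    "multiqc_title": {"type": "string"}
  },
  "required": ["input_run_dir", "multiqc_title"]
}
""")


def describe_type(prop):
    """Return a short label for a property: a $ref target name or a JSON type."""
    if "$ref" in prop:
        return prop["$ref"].rsplit("/", 1)[-1]
    return prop.get("type", "unknown")


def summarise_inputs(schema):
    """Map each input name to (type label, is required, accepts null)."""
    required = set(schema.get("required", []))
    summary = {}
    for name, prop in schema.get("properties", {}).items():
        # Optional CWL inputs appear as oneOf [null, <type>]; others are plain.
        branches = prop.get("oneOf", [prop])
        nullable = any(b.get("type") == "null" for b in branches)
        labels = [describe_type(b) for b in branches if b.get("type") != "null"]
        summary[name] = ("|".join(labels), name in required, nullable)
    return summary


if __name__ == "__main__":
    for name, (label, is_required, nullable) in summarise_inputs(SCHEMA_FRAGMENT).items():
        print(f"{name}: {label} required={is_required} nullable={nullable}")
```

In this output, optional CWL inputs show up as a `oneOf` with a `null` branch and are left out of `required`, which is the pattern the sketch relies on. For full validation of a job file, a JSON Schema draft-07 validator would be the more appropriate tool.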

Complex usage example

cwl-inputs-schema-gen https://raw.githubusercontent.com/umccr/cwl-ica/main/workflows/bclconvert-with-qc-pipeline/4.0.3/bclconvert-with-qc-pipeline__4.0.3.cwl

Gives

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "File": {
      "additionalProperties": false,
      "description": "Represents a file (or group of files when `secondaryFiles` is provided) that\nwill be accessible by tools using standard POSIX file system call API such as\nopen(2) and read(2).\n\nFiles are represented as objects with `class` of `File`.  File objects have\na number of properties that provide metadata about the file.\n\nThe `location` property of a File is a URI that uniquely identifies the\nfile.  Implementations must support the `file://` URI scheme and may support\nother schemes such as `http://` and `https://`.  The value of `location` may also be a\nrelative reference, in which case it must be resolved relative to the URI\nof the document it appears in.  Alternately to `location`, implementations\nmust also accept the `path` property on File, which must be a filesystem\npath available on the same host as the CWL runner (for inputs) or the\nruntime environment of a command line tool execution (for command line tool\noutputs).\n\nIf no `location` or `path` is specified, a file object must specify\n`contents` with the UTF-8 text content of the file.  This is a \"file\nliteral\".  File literals do not correspond to external resources, but are\ncreated on disk with `contents` with when needed for executing a tool.\nWhere appropriate, expressions can return file literals to define new files\non a runtime.  The maximum size of `contents` is 64 kilobytes.\n\nThe `basename` property defines the filename on disk where the file is\nstaged.  This may differ from the resource name.  If not provided,\n`basename` must be computed from the last path part of `location` and made\navailable to expressions.\n\nThe `secondaryFiles` property is a list of File or Directory objects that\nmust be staged in the same directory as the primary file.  It is an error\nfor file names to be duplicated in `secondaryFiles`.\n\nThe `size` property is the size in bytes of the File.  It must be computed\nfrom the resource and made available to expressions.  
The `checksum` field\ncontains a cryptographic hash of the file content for use it verifying file\ncontents.  Implementations may, at user option, enable or disable\ncomputation of the `checksum` field for performance or other reasons.\nHowever, the ability to compute output checksums is required to pass the\nCWL conformance test suite.\n\nWhen executing a CommandLineTool, the files and secondary files may be\nstaged to an arbitrary directory, but must use the value of `basename` for\nthe filename.  The `path` property must be file path in the context of the\ntool execution runtime (local to the compute node, or within the executing\ncontainer).  All computed properties should be available to expressions.\nFile literals also must be staged and `path` must be set.\n\nWhen collecting CommandLineTool outputs, `glob` matching returns file paths\n(with the `path` property) and the derived properties. This can all be\nmodified by `outputEval`.  Alternately, if the file `cwl.output.json` is\npresent in the output, `outputBinding` is ignored.\n\nFile objects in the output must provide either a `location` URI or a `path`\nproperty in the context of the tool execution runtime (local to the compute\nnode, or within the executing container).\n\nWhen evaluating an ExpressionTool, file objects must be referenced via\n`location` (the expression tool does not have access to files on disk so\n`path` is meaningless) or as file literals.  It is legal to return a file\nobject with an existing `location` but a different `basename`.  The\n`loadContents` field of ExpressionTool inputs behaves the same as on\nCommandLineTool inputs, however it is not meaningful on the outputs.\n\nAn ExpressionTool may forward file references from input to output by using\nthe same value for `location`.",
      "properties": {
        "basename": {
          "description": "The base name of the file, that is, the name of the file without any\nleading directory path.  The base name must not contain a slash `/`.\n\nIf not provided, the implementation must set this field based on the\n`location` field by taking the final path component after parsing\n`location` as an IRI.  If `basename` is provided, it is not required to\nmatch the value from `location`.\n\nWhen this file is made available to a CommandLineTool, it must be named\nwith `basename`, i.e. the final component of the `path` field must match\n`basename`.",
          "type": "string"
        },
        "checksum": {
          "description": "Optional hash code for validating file integrity.  Currently, must be in the form\n\"sha1$ + hexadecimal string\" using the SHA-1 algorithm.",
          "type": "string"
        },
        "class": {
          "const": "File",
          "description": "Must be `File` to indicate this object describes a file.",
          "type": "string"
        },
        "contents": {
          "description": "File contents literal.\n\nIf neither `location` nor `path` is provided, `contents` must be\nnon-null.  The implementation must assign a unique identifier for the\n`location` field.  When the file is staged as input to CommandLineTool,\nthe value of `contents` must be written to a file.\n\nIf `contents` is set as a result of a Javascript expression,\nan `entry` in `InitialWorkDirRequirement`, or read in from\n`cwl.output.json`, there is no specified upper limit on the\nsize of `contents`.  Implementations may have practical limits\non the size of `contents` based on memory and storage\navailable to the workflow runner or other factors.\n\nIf the `loadContents` field of an `InputParameter` or\n`OutputParameter` is true, and the input or output File object\n`location` is valid, the file must be a UTF-8 text file 64 KiB\nor smaller, and the implementation must read the entire\ncontents of the file and place it in the `contents` field.  If\nthe size of the file is greater than 64 KiB, the\nimplementation must raise a fatal error.",
          "type": "string"
        },
        "dirname": {
          "description": "The name of the directory containing file, that is, the path leading up\nto the final slash in the path such that `dirname + '/' + basename ==\npath`.\n\nThe implementation must set this field based on the value of `path`\nprior to evaluating parameter references or expressions in a\nCommandLineTool document.  This field must not be used in any other\ncontext.",
          "type": "string"
        },
        "format": {
          "description": "The format of the file: this must be an IRI of a concept node that\nrepresents the file format, preferably defined within an ontology.\nIf no ontology is available, file formats may be tested by exact match.\n\nReasoning about format compatibility must be done by checking that an\ninput file format is the same, `owl:equivalentClass` or\n`rdfs:subClassOf` the format required by the input parameter.\n`owl:equivalentClass` is transitive with `rdfs:subClassOf`, e.g. if\n`<B> owl:equivalentClass <C>` and `<B> owl:subclassOf <A>` then infer\n`<C> owl:subclassOf <A>`.\n\nFile format ontologies may be provided in the \"$schemas\" metadata at the\nroot of the document.  If no ontologies are specified in `$schemas`, the\nruntime may perform exact file format matches.",
          "type": "string"
        },
        "location": {
          "description": "An IRI that identifies the file resource.  This may be a relative\nreference, in which case it must be resolved using the base IRI of the\ndocument.  The location may refer to a local or remote resource; the\nimplementation must use the IRI to retrieve file content.  If an\nimplementation is unable to retrieve the file content stored at a\nremote resource (due to unsupported protocol, access denied, or other\nissue) it must signal an error.\n\nIf the `location` field is not provided, the `contents` field must be\nprovided.  The implementation must assign a unique identifier for\nthe `location` field.\n\nIf the `path` field is provided but the `location` field is not, an\nimplementation may assign the value of the `path` field to `location`,\nthen follow the rules above.",
          "type": "string"
        },
        "nameext": {
          "description": "The basename extension such that `nameroot + nameext == basename`, and\n`nameext` is empty or begins with a period and contains at most one\nperiod.  Leading periods on the basename are ignored; a basename of\n`.cshrc` will have an empty `nameext`.\n\nThe implementation must set this field automatically based on the value\nof `basename` prior to evaluating parameter references or expressions.",
          "type": "string"
        },
        "nameroot": {
          "description": "The basename root such that `nameroot + nameext == basename`, and\n`nameext` is empty or begins with a period and contains at most one\nperiod.  For the purposes of path splitting leading periods on the\nbasename are ignored; a basename of `.cshrc` will have a nameroot of\n`.cshrc`.\n\nThe implementation must set this field automatically based on the value\nof `basename` prior to evaluating parameter references or expressions.",
          "type": "string"
        },
        "path": {
          "description": "The local host path where the File is available when a CommandLineTool is\nexecuted.  This field must be set by the implementation.  The final\npath component must match the value of `basename`.  This field\nmust not be used in any other context.  The command line tool being\nexecuted must be able to access the file at `path` using the POSIX\n`open(2)` syscall.\n\nAs a special case, if the `path` field is provided but the `location`\nfield is not, an implementation may assign the value of the `path`\nfield to `location`, and remove the `path` field.\n\nIf the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02)\n(`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\\`, `\"`, `'`,\n`<space>`, `<tab>`, and `<newline>`) or characters\n[not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml)\nfor [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452)\nthen implementations may terminate the process with a\n`permanentFailure`.",
          "type": "string"
        },
        "secondaryFiles": {
          "description": "A list of additional files or directories that are associated with the\nprimary file and must be transferred alongside the primary file.\nExamples include indexes of the primary file, or external references\nwhich must be included when loading primary document.  A file object\nlisted in `secondaryFiles` may itself include `secondaryFiles` for\nwhich the same rules apply.",
          "items": {
            "anyOf": [
              {
                "$ref": "#/definitions/File"
              },
              {
                "$ref": "#/definitions/Directory"
              }
            ]
          },
          "type": "array"
        },
        "size": {
          "description": "Optional file size (in bytes)",
          "type": "number"
        }
      },
      "required": [
        "class"
      ],
      "type": "object"
    },
    "Directory": {
      "additionalProperties": false,
      "description": "Represents a directory to present to a command line tool.\n\nDirectories are represented as objects with `class` of `Directory`.  Directory objects have\na number of properties that provide metadata about the directory.\n\nThe `location` property of a Directory is a URI that uniquely identifies\nthe directory.  Implementations must support the file:// URI scheme and may\nsupport other schemes such as http://.  Alternately to `location`,\nimplementations must also accept the `path` property on Directory, which\nmust be a filesystem path available on the same host as the CWL runner (for\ninputs) or the runtime environment of a command line tool execution (for\ncommand line tool outputs).\n\nA Directory object may have a `listing` field.  This is a list of File and\nDirectory objects that are contained in the Directory.  For each entry in\n`listing`, the `basename` property defines the name of the File or\nSubdirectory when staged to disk.  If `listing` is not provided, the\nimplementation must have some way of fetching the Directory listing at\nruntime based on the `location` field.\n\nIf a Directory does not have `location`, it is a Directory literal.  A\nDirectory literal must provide `listing`.  Directory literals must be\ncreated on disk at runtime as needed.\n\nThe resources in a Directory literal do not need to have any implied\nrelationship in their `location`.  For example, a Directory listing may\ncontain two files located on different hosts.  It is the responsibility of\nthe runtime to ensure that those files are staged to disk appropriately.\nSecondary files associated with files in `listing` must also be staged to\nthe same Directory.\n\nWhen executing a CommandLineTool, Directories must be recursively staged\nfirst and have local values of `path` assigned.\n\nDirectory objects in CommandLineTool output must provide either a\n`location` URI or a `path` property in the context of the tool execution\nruntime (local to the compute node, or within the executing container).\n\nAn ExpressionTool may forward file references from input to output by using\nthe same value for `location`.\n\nName conflicts (the same `basename` appearing multiple times in `listing`\nor in any entry in `secondaryFiles` in the listing) is a fatal error.",
      "properties": {
        "basename": {
          "description": "The base name of the directory, that is, the name of the file without any\nleading directory path.  The base name must not contain a slash `/`.\n\nIf not provided, the implementation must set this field based on the\n`location` field by taking the final path component after parsing\n`location` as an IRI.  If `basename` is provided, it is not required to\nmatch the value from `location`.\n\nWhen this file is made available to a CommandLineTool, it must be named\nwith `basename`, i.e. the final component of the `path` field must match\n`basename`.",
          "type": "string"
        },
        "class": {
          "const": "Directory",
          "description": "Must be `Directory` to indicate this object describes a Directory.",
          "type": "string"
        },
        "listing": {
          "description": "List of files or subdirectories contained in this directory.  The name\nof each file or subdirectory is determined by the `basename` field of\neach `File` or `Directory` object.  It is an error if a `File` shares a\n`basename` with any other entry in `listing`.  If two or more\n`Directory` object share the same `basename`, this must be treated as\nequivalent to a single subdirectory with the listings recursively\nmerged.",
          "items": {
            "anyOf": [
              {
                "$ref": "#/definitions/File"
              },
              {
                "$ref": "#/definitions/Directory"
              }
            ]
          },
          "type": "array"
        },
        "location": {
          "description": "An IRI that identifies the directory resource.  This may be a relative\nreference, in which case it must be resolved using the base IRI of the\ndocument.  The location may refer to a local or remote resource.  If\nthe `listing` field is not set, the implementation must use the\nlocation IRI to retrieve directory listing.  If an implementation is\nunable to retrieve the directory listing stored at a remote resource (due to\nunsupported protocol, access denied, or other issue) it must signal an\nerror.\n\nIf the `location` field is not provided, the `listing` field must be\nprovided.  The implementation must assign a unique identifier for\nthe `location` field.\n\nIf the `path` field is provided but the `location` field is not, an\nimplementation may assign the value of the `path` field to `location`,\nthen follow the rules above.",
          "type": "string"
        },
        "path": {
          "description": "The local path where the Directory is made available prior to executing a\nCommandLineTool.  This must be set by the implementation.  This field\nmust not be used in any other context.  The command line tool being\nexecuted must be able to access the directory at `path` using the POSIX\n`opendir(2)` syscall.\n\nIf the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02)\n(`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\\`, `\"`, `'`,\n`<space>`, `<tab>`, and `<newline>`) or characters\n[not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml)\nfor [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452)\nthen implementations may terminate the process with a\n`permanentFailure`.",
          "type": "string"
        }
      },
      "required": [
        "class"
      ],
      "type": "object"
    },
    "BclconvertRunConfiguration": {
      "type": "object",
      "properties": {
        "bcl_conversion_threads": {
          "type": "integer",
          "description": ""
        },
        "bcl_num_compression_threads": {
          "type": "integer",
          "description": ""
        },
        "bcl_num_decompression_threads": {
          "type": "integer",
          "description": ""
        },
        "bcl_num_parallel_tiles": {
          "type": "integer",
          "description": ""
        },
        "bcl_only_lane": {
          "type": "integer",
          "description": ""
        },
        "bcl_only_matched_reads": {
          "type": "boolean",
          "description": ""
        },
        "bcl_sampleproject_subdirectories": {
          "type": "boolean",
          "description": ""
        },
        "bcl_validate_sample_sheet_only": {
          "type": "boolean",
          "description": ""
        },
        "exclude_tiles": {
          "type": "string",
          "description": ""
        },
        "fastq_compression_format": {
          "type": "string",
          "description": ""
        },
        "fastq_gzip_compression_level": {
          "type": "integer",
          "description": ""
        },
        "first_tile_only": {
          "type": "boolean",
          "description": ""
        },
        "no_lane_splitting": {
          "type": "boolean",
          "description": ""
        },
        "num_unknown_barcodes_reported": {
          "type": "integer",
          "description": ""
        },
        "ora_reference": {
          "$ref": "#/definitions/Directory",
          "description": ""
        },
        "output_directory": {
          "type": "string",
          "description": ""
        },
        "output_legacy_stats": {
          "type": "boolean",
          "description": ""
        },
        "run_info": {
          "$ref": "#/definitions/File",
          "description": ""
        },
        "sample_name_column_enabled": {
          "type": "boolean",
          "description": ""
        },
        "samplesheet": {
          "oneOf": [
            {
              "type": "object",
              "properties": {
                "bclconvert_data": {
                  "type": "array",
                  "items": {
                    "type": "object",
                    "properties": {
                      "index": {
                        "type": "string"
                      },
                      "index2": {
                        "oneOf": [
                          {
                            "type": "null"
                          },
                          {
                            "type": "string"
                          }
                        ]
                      },
                      "lane": {
                        "type": "integer"
                      },
                      "override_cycles": {
                        "oneOf": [
                          {
                            "type": "null"
                          },
                          {
                            "type": "string"
                          }
                        ]
                      },
                      "sample_id": {
                        "type": "string"
                      },
                      "sample_name": {
                        "oneOf": [
                          {
                            "type": "null"
                          },
                          {
                            "type": "string"
                          }
                        ]
                      },
                      "sample_project": {
                        "oneOf": [
                          {
                            "type": "null"
                          },
                          {
                            "type": "string"
                          }
                        ]
                      }
                    }
                  }
                },
                "bclconvert_settings": {
                  "oneOf": [
                    {
                      "type": "object",
                      "properties": {
                        "adapter_behavior": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "string"
                            }
                          ]
                        },
                        "adapter_read_1": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "string"
                            }
                          ]
                        },
                        "adapter_read_2": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "string"
                            }
                          ]
                        },
                        "adapter_stringency": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "number"
                            }
                          ]
                        },
                        "barcode_mismatches_index_1": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "integer"
                            }
                          ]
                        },
                        "barcode_mismatches_index_2": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "integer"
                            }
                          ]
                        },
                        "create_fastq_for_index_reads": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "boolean"
                            }
                          ]
                        },
                        "fastq_compression_format": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "string"
                            }
                          ]
                        },
                        "find_adapter_with_indels": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "boolean"
                            }
                          ]
                        },
                        "mask_short_reads": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "integer"
                            }
                          ]
                        },
                        "minimum_adapter_overlap": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "integer"
                            }
                          ]
                        },
                        "minimum_trimmed_read_length": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "integer"
                            }
                          ]
                        },
                        "no_lane_splitting": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "boolean"
                            }
                          ]
                        },
                        "override_cycles": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "string"
                            }
                          ]
                        },
                        "trim_umi": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "boolean"
                            }
                          ]
                        }
                      }
                    }
                  ]
                },
                "header": {
                  "oneOf": [
                    {
                      "type": "object",
                      "properties": {
                        "application": {
                          "type": "string"
                        },
                        "assay": {
                          "type": "string"
                        },
                        "chemistry": {
                          "type": "string"
                        },
                        "date": {
                          "type": "string"
                        },
                        "experiment_name": {
                          "type": "string"
                        },
                        "file_format_version": {
                          "type": "integer"
                        },
                        "iem_file_version": {
                          "type": "integer"
                        },
                        "index_adapters": {
                          "type": "string"
                        },
                        "instrument_type": {
                          "type": "string"
                        },
                        "workflow": {
                          "type": "string"
                        }
                      }
                    }
                  ]
                },
                "reads": {
                  "oneOf": [
                    {
                      "type": "object",
                      "properties": {
                        "read_1_cycles": {
                          "type": "integer"
                        },
                        "read_2_cycles": {
                          "oneOf": [
                            {
                              "type": "null"
                            },
                            {
                              "type": "integer"
                            }
                          ]
                        }
                      }
                    }
                  ]
                }
              }
            },
            {
              "$ref": "#/definitions/File"
            }
          ],
          "description": ""
        },
        "shared_thread_odirect_output": {
          "type": "boolean",
          "description": ""
        },
        "strict_mode": {
          "type": "boolean",
          "description": ""
        },
        "tiles": {
          "type": "string",
          "description": ""
        }
      },
      "required": [
        "output_directory"
      ]
    }
  },
  "description": "Auto-generated class implementation for https://w3id.org/cwl/cwl#WorkflowInputParameter",
  "type": "object",
  "properties": {
    "bclconvert_run_configurations": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/BclconvertRunConfiguration"
      },
      "description": "The BCLConvert run configuration jsons\n"
    },
    "bclconvert_run_input_directory": {
      "$ref": "#/definitions/Directory",
      "description": "The input directory for BCLConvert\n"
    },
    "runfolder_name": {
      "type": "string",
      "description": "Name to use in multiqc outputs\n"
    }
  },
  "required": [
    "bclconvert_run_configurations",
    "bclconvert_run_input_directory",
    "runfolder_name"
  ]
}
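
To make the shape of the generated schema easier to read, here is a minimal sketch of how an inputs object could be checked against it. It is not the implementation in this PR: `SCHEMA` is a hand-trimmed fragment of the output above, and `resolve_ref` and `check` are hypothetical helpers that cover only `$ref`, `type`, `const`, `required`, `properties`, `items` and `oneOf`. Keywords such as `additionalProperties` are ignored, so a full JSON Schema validator (for example the `jsonschema` package) is still what you would want in practice.

```python
from typing import Any, Dict, List

# Hand-trimmed fragment of the generated schema above (illustrative only).
SCHEMA: Dict[str, Any] = {
    "definitions": {
        "Directory": {
            "type": "object",
            "properties": {
                "class": {"const": "Directory", "type": "string"},
                "location": {"type": "string"},
            },
            "required": ["class"],
        },
    },
    "type": "object",
    "properties": {
        "bclconvert_run_input_directory": {"$ref": "#/definitions/Directory"},
        "runfolder_name": {"type": "string"},
    },
    "required": ["bclconvert_run_input_directory", "runfolder_name"],
}

# JSON Schema type names mapped to Python checks (bool is not an integer here).
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "null": lambda v: v is None,
}


def resolve_ref(ref: str, root: Dict[str, Any]) -> Dict[str, Any]:
    """Follow a local reference such as '#/definitions/Directory'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled in this sketch: {ref}")
    node: Any = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def check(value: Any, schema: Dict[str, Any], root: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of human-readable problems; an empty list means no problem found."""
    if "$ref" in schema:
        # In draft-07, siblings of $ref (e.g. "description") are ignored.
        return check(value, resolve_ref(schema["$ref"], root), root, path)
    problems: List[str] = []
    if "oneOf" in schema:
        matches = sum(not check(value, branch, root, path) for branch in schema["oneOf"])
        if matches != 1:
            problems.append(f"{path}: matched {matches} oneOf branches, expected exactly 1")
    if "const" in schema and value != schema["const"]:
        problems.append(f"{path}: expected constant {schema['const']!r}")
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](value):
        return problems + [f"{path}: expected type {expected}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}: missing required property '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                problems += check(value[key], sub, root, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            problems += check(item, schema["items"], root, f"{path}[{index}]")
    return problems


good = {
    "bclconvert_run_input_directory": {"class": "Directory", "location": "file:///data/run"},
    "runfolder_name": "run_1",
}
bad = {"bclconvert_run_input_directory": {"class": "File"}}

good_problems = check(good, SCHEMA, SCHEMA)
bad_problems = check(bad, SCHEMA, SCHEMA)
for problem in bad_problems:
    print(problem)
```

The `bad` input is rejected for two independent reasons: the top-level `required` list names `runfolder_name`, and the `class` constant in the `Directory` definition does not accept `File`.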

TODO

  • Run validation against inputs in the cwl-v1.2 tests repository
  • Generate tests that shouldn't pass
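
One possible way to approach the second TODO item is to derive failing inputs mechanically from a known-good input. The sketch below is an assumption about how that could look, not code from this PR: `negative_cases` is a hypothetical helper that removes each `required` key in turn and swaps each plain string property for an integer, so every case it yields should be rejected by the schema.

```python
import copy
from typing import Any, Dict, Iterator, Tuple


def negative_cases(
    schema: Dict[str, Any], valid_input: Dict[str, Any]
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (label, mutated_input) pairs, each expected to fail validation."""
    # Case 1: drop one required property at a time.
    for key in schema.get("required", []):
        mutated = copy.deepcopy(valid_input)
        mutated.pop(key, None)
        yield f"missing-{key}", mutated
    # Case 2: put an integer where the schema declares a plain string.
    for key, sub in schema.get("properties", {}).items():
        if sub.get("type") == "string" and key in valid_input:
            mutated = copy.deepcopy(valid_input)
            mutated[key] = 0
            yield f"wrong-type-{key}", mutated


# Top-level part of the generated schema above, trimmed by hand for illustration.
top_level = {
    "properties": {
        "bclconvert_run_input_directory": {"$ref": "#/definitions/Directory"},
        "runfolder_name": {"type": "string"},
    },
    "required": ["bclconvert_run_input_directory", "runfolder_name"],
}
valid = {
    "bclconvert_run_input_directory": {"class": "Directory", "location": "file:///data/run"},
    "runfolder_name": "run_1",
}

cases = dict(negative_cases(top_level, valid))
for label in cases:
    print(label)
```

Each mutated input changes exactly one thing, which makes it easier to tell which constraint a validator failed to enforce when an expected failure unexpectedly passes.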

codecov bot commented Mar 19, 2024

Codecov Report

Attention: Patch coverage is 46.57040%, with 148 lines in your changes missing coverage. Please review.

Project coverage is 33.69%. Comparing base (d2af76f) to head (1d65704).

Files                            Patch %   Lines
cwl_utils/inputs_schema_gen.py   50.68%    93 Missing and 15 partials ⚠️
cwl_utils/utils.py               21.56%    40 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #288      +/-   ##
==========================================
+ Coverage   33.26%   33.69%   +0.43%     
==========================================
  Files          30       31       +1     
  Lines       31015    31292     +277     
  Branches     8998     9087      +89     
==========================================
+ Hits        10317    10545     +228     
- Misses      18282    18317      +35     
- Partials     2416     2430      +14     

@alexiswl alexiswl force-pushed the enhancement/cwl-inputs-schema-gen branch 2 times, most recently from b4b829a to 6294b44 Compare March 19, 2024 21:57
@alexiswl
Contributor Author

@mr-c, @suecharo this is passing all tests (finally!). I had a few issues where make format-check and make diff_pydocstyle_report contradicted each other on best practices.

Note: the checks are currently failing because I re-pushed some documentation notes and the Codecov token limit has been exceeded. I will retrigger them in an hour.

Shown here, I've provided a script that runs the JSON schema generation against all tools in all conformance tests and then validates the corresponding inputs using jsonschema-validate.

@cwl-bot
cwl-bot commented Mar 19, 2024

This pull request has been mentioned on Common Workflow Language Discourse. There might be relevant details there:

https://cwl.discourse.group/t/creating-config-json-file-from-a-record/172/4

@alexiswl alexiswl force-pushed the enhancement/cwl-inputs-schema-gen branch from b91a3d6 to 7b2655c Compare April 10, 2024 00:38
* Use default definitions
* Collect slim definition schema
* Use Any as having any properties
* Add passing and expected failing tests
* Use type objects to generate schemas
* Recurse through record schema objects

Remove unused imports

Fix test input urls

Conformance tests should be their own header in input schema gen readme

Also move part headers to secondary headers and add a TOC
Re-trigger CI/CD GitHub Actions testing
@alexiswl alexiswl force-pushed the enhancement/cwl-inputs-schema-gen branch from 7b2655c to 1d65704 Compare April 10, 2024 01:02

4 participants