Suggested enhancement to validation-mode #350

Open
stevorobs3 opened this issue Dec 6, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@stevorobs3

stevorobs3 commented Dec 6, 2023

Currently, validation mode reports validation errors using csv2rdf.logging/log-warning, which prints them to stdout using either clojure.tools.logging (the default) or println (on GraalVM), depending on the configured logger. This works fine when using the tool from the command line, but it may be useful to report the errors in a more structured form, so that they can be ingested by another system (e.g. Airflow) and interrogated more thoroughly. The proposal is as follows:

Allow an extra command-line argument, --error-formatter, which specifies how the errors should be formatted. It would take the following values:

  • "default" -> the current behaviour of printing to stdout; this could also be named "stdout"
  • "csv" -> the errors are written to a csv file

In the latter case, we can reuse the existing --output-file argument to optionally specify the local file to write the errors to.
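The formatter selection could be a simple name-to-function dispatch. The Python sketch below is illustrative only (csv2rdf itself is written in Clojure); the record shape, function names, program name and the message text of the default formatter are hypothetical. It also assumes that, when --output-file is omitted, CSV output falls back to stdout.

```python
import argparse
import csv
import sys
from typing import Callable, Dict, List, Optional, Sequence, TextIO, Tuple

# Hypothetical error record: (file, row, col, message, error_type)
ErrorRecord = Tuple[str, int, int, str, str]
Formatter = Callable[[Sequence[ErrorRecord], TextIO], None]


def format_default(errors: Sequence[ErrorRecord], out: TextIO) -> None:
    """Human-readable lines, loosely approximating today's warning output."""
    for file, row, col, message, error_type in errors:
        out.write(f"WARN {error_type} error in {file} (row {row}, col {col}): {message}\n")


def format_csv(errors: Sequence[ErrorRecord], out: TextIO) -> None:
    """One header row followed by one CSV row per error."""
    writer = csv.writer(out)
    writer.writerow(["file", "row", "col", "message", "error_type"])
    writer.writerows(errors)


# "stdout" is accepted as an alias for "default", as the proposal suggests.
FORMATTERS: Dict[str, Formatter] = {
    "default": format_default,
    "stdout": format_default,
    "csv": format_csv,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="validate")  # hypothetical program name
    parser.add_argument("--error-formatter", choices=sorted(FORMATTERS), default="default")
    parser.add_argument("--output-file", default=None)
    return parser


def report_errors(argv: List[str], errors: Sequence[ErrorRecord],
                  stdout: Optional[TextIO] = None) -> None:
    args = build_parser().parse_args(argv)
    formatter = FORMATTERS[args.error_formatter]
    if args.error_formatter == "csv" and args.output_file:
        # newline="" lets the csv module control line endings itself.
        with open(args.output_file, "w", newline="", encoding="utf-8") as f:
            formatter(errors, f)
    else:
        # Default output, or CSV without --output-file (assumption): write to stdout.
        formatter(errors, stdout if stdout is not None else sys.stdout)
```

Keeping the formatters in a dictionary means a new output format only needs one extra entry, and argparse rejects unknown formatter names automatically via choices.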

The CSV file would have the following columns:

  • file (the name of the data file where the error is located)
  • row (the row of the invalid cell)
  • col (the column of the invalid cell)
  • message (text describing the error in detail; this could include any metadata needed to contextualise the error, although that context/metadata could arguably be placed in a separate column)
  • error_type (either cell or schema, indicating the type of error)
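A typed record keeps the column order in one place and rejects unknown error types early. This is a hypothetical Python sketch: the class name ErrorRow and the choice to leave row and col empty for schema-level errors (which may not point at a single cell) are assumptions, not part of the proposal.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional, TextIO


@dataclass(frozen=True)
class ErrorRow:
    file: str
    row: Optional[int]  # assumed None for schema errors not tied to a cell
    col: Optional[int]
    message: str
    error_type: str     # "cell" or "schema"

    def __post_init__(self) -> None:
        if self.error_type not in ("cell", "schema"):
            raise ValueError(f"unknown error_type: {self.error_type!r}")


# Header order follows the dataclass field order: file,row,col,message,error_type
COLUMNS = [f.name for f in fields(ErrorRow)]


def write_errors_csv(errors: Iterable[ErrorRow], out: TextIO) -> None:
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for error in errors:
        # DictWriter writes None as an empty cell.
        writer.writerow(asdict(error))
```

For a cell error at row 4, column 2 and a schema error without a location, the output rows would be `data.csv,4,2,...,cell` and `data.csv,,,...,schema`.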

Further to this, it may be worth generating a schema file describing the contents of the error output, but this is up for discussion.
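Since csv2rdf works with W3C CSV on the Web (CSVW) metadata, one option (an assumption; the issue leaves the format open) is to emit a CSVW metadata document alongside the errors CSV. The sketch below uses the standard CSVW properties url, tableSchema, columns, name, titles, datatype and required; the function name and the regular expression restricting error_type are illustrative.

```python
import json
from typing import Any, Dict

# (name, CSVW datatype, required?) for each proposed column.
ERROR_COLUMNS = [
    ("file", "string", True),
    ("row", "integer", False),  # assumed empty for schema-level errors
    ("col", "integer", False),
    ("message", "string", True),
    # For string datatypes, a CSVW "format" is a regular expression.
    ("error_type", {"base": "string", "format": "^(cell|schema)$"}, True),
]


def errors_csvw_metadata(csv_url: str) -> Dict[str, Any]:
    """Build a CSVW metadata document describing an errors CSV file."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {"name": name, "titles": name, "datatype": datatype, "required": required}
                for name, datatype, required in ERROR_COLUMNS
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(errors_csvw_metadata("errors.csv"), indent=2))
```

A side benefit of this choice is that the error output could itself be checked by a CSVW validator.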

stevorobs3 added the enhancement label Dec 6, 2023
@stevorobs3
Author

We could also add another column, something like error_level or error_severity, which might at some point in the future be extended to range from hard errors like the ones we have now to softer, "linted" best-practice violations and warnings.
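A severity column is easiest to consume if its values are ordered, so a downstream job can filter by threshold. In this hypothetical Python sketch the names Severity, Issue and at_least, the two levels WARNING and ERROR, and the lower-case CSV labels are all assumptions; existing errors default to ERROR so today's behaviour is unchanged.

```python
import enum
from dataclasses import dataclass
from typing import Iterable, List, Optional


class Severity(enum.IntEnum):
    """Hypothetical severity levels, ordered so they can be compared."""
    WARNING = 1  # lint-style best-practice violation (possible future level)
    ERROR = 2    # hard validation error, like those reported today


@dataclass(frozen=True)
class Issue:
    file: str
    row: Optional[int]
    col: Optional[int]
    message: str
    error_type: str                            # "cell" or "schema"
    error_severity: Severity = Severity.ERROR  # existing errors stay hard errors

    def csv_row(self) -> List[str]:
        """Render as CSV cells; severity is written as a lower-case label."""
        return [self.file,
                "" if self.row is None else str(self.row),
                "" if self.col is None else str(self.col),
                self.message, self.error_type, self.error_severity.name.lower()]


def at_least(issues: Iterable[Issue], minimum: Severity) -> List[Issue]:
    """Keep only issues at or above the given severity."""
    return [i for i in issues if i.error_severity >= minimum]
```

With an ordered type rather than free text, a consumer such as an Airflow task could ignore warnings while still failing on hard errors.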
