ISAITB/csv-validator

CSV validator

The CSV validator is a web application to validate CSV data against Table Schema. The application provides a fully reusable core that requires only configuration to determine the supported specifications, configured validation types and other validator customisations. The web application allows validation via:

  • A SOAP web service API for contract-based machine-machine integrations.
  • A REST web service API for machine-machine integrations.
  • A web form for validation via user interface.

The SOAP web service API conforms to the GITB validation service API, which makes it usable as a building block in GITB Test Description Language (TDL) conformance test cases for the verification of content (as a verify step handler). The validator can also be used to build a command-line tool as an executable JAR with pre-packaged or provided configuration.

This validator is maintained by the European Commission's DIGIT and specifically the Interoperability Test Bed, a conformance testing service for projects involved in the delivery of cross-border public services. Find out more here.

Usage

Usage and configuration of this validator are documented as a step-by-step tutorial in the Test Bed's CSV validation guide.

The validator's key principle is that the software is built as a generic core that can be configured to validate any supported specifications. Configuration is organised in domains which represent logically separate setups supported by the same application instance. Each such domain defines the offered validation types and their options along with the validation artefacts needed to carry out validation (local, remote or user-provided). A domain's configuration is grouped in a folder that contains a configuration property file along with any other necessary resources.

When built from source, the simplest way to get started using the validator is to use the all-in-one executable JAR built from the csvvalidator-war module. You can use and configure this as described in the validator installation guide.

If you do not plan on modifying the validator's source you can reuse the Test Bed's provided packages. Specifically:

Note that the second option (the executable web application JAR) matches what you would build from this repository. The command-line package is produced from the csvvalidator-jar module, although this requires an additional JAR post-processing step to configure the validator's domain(s).

Once the validator's web application is up you can use it as follows:

Note that the DOMAIN placeholder in the above URLs is the name of a domain configuration folder beneath your configured validator.resourceRoot. This can be adapted by providing validator.domainName.DOMAIN mapping(s) for your domain(s). These are application-level configuration properties that can be set in the default application.properties or via environment variables and system properties.

Building

The validator is a multi-module Maven project from which the artefact to use is the web application package, produced from module csvvalidator-war. This is an all-in-one Spring Boot web application. To build, run mvn clean install.

Prerequisites

To build this project's libraries you require:

  • A JDK installation (17+).
  • Maven (3+).
  • Locally installed itb-commons dependencies (see below).

Building this validator from source depends on libraries that are available on public repositories. The exception is currently the set of itb-commons dependencies, common libraries that are shared by all Test Bed validators. To be able to build you need to first clone itb-commons and install its artefacts in your local Maven repository.

Configuration for development

When building for development you should provide the validator's basic configuration to allow it to bootstrap. The simplest way to do this is to use environment variables that set the validator's configuration properties.

The minimum properties you should define this way are:

  • validator.resourceRoot: The root folder from which all domain configurations will be loaded.
  • logging.file.path: The validator’s log output folder.
  • validator.tmpFolder: The validator’s temporary work folder.

In addition, you should include within the validator.resourceRoot folder additional folder(s) for your configuration domains, each with its configuration property file and any other needed resources. A simple example of such a configuration, which you can also download and reuse, is provided in the CSV validation guide's configuration step.
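
As a rough illustration of the setup described above, the following Java sketch creates a resource root containing one domain folder, plus log and temporary folders, and assembles a launch command that passes the three minimum properties as system properties (environment variables would work equally well). The class name DevSetup, the folder names, the domain name mydomain and the JAR name validator.jar are all hypothetical. The domain folder is left empty here and still needs its configuration property file and resources.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper that prepares a development layout for the validator. */
class DevSetup {

    /**
     * Creates the folders under the given base directory and returns the
     * command that would launch the validator JAR with the minimum properties.
     */
    static List<String> prepare(Path base, String domain, Path jar) throws IOException {
        Path resourceRoot = base.resolve("resources"); // assumed folder name
        Path logs = base.resolve("logs");              // assumed folder name
        Path tmp = base.resolve("tmp");                // assumed folder name

        // Each domain is a folder beneath the resource root.
        Files.createDirectories(resourceRoot.resolve(domain));
        Files.createDirectories(logs);
        Files.createDirectories(tmp);

        return List.of(
                "java",
                // The trailing separator mirrors the form used in this README's Docker example.
                "-Dvalidator.resourceRoot=" + resourceRoot + File.separator,
                "-Dlogging.file.path=" + logs,
                "-Dvalidator.tmpFolder=" + tmp,
                "-jar", jar.toString());
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("csv-validator-dev");
        List<String> command = prepare(base, "mydomain", Path.of("validator.jar"));
        // Print the command; start it with ProcessBuilder once the JAR has been built.
        System.out.println(String.join(" ", command));
    }
}
```

The command is only printed, not executed, so the sketch can be run before the JAR has been built.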

If you decide not to load any configurations, the generic CSV validator configuration will be loaded instead. This generic configuration does not contain any validation artefacts and will instead allow you to upload your own validation schemas from the validator's UI and APIs. The domain of this generic validator is called any. You can access it like any other domain, from http://localhost:8080/csv/any/upload.

Using Docker

If you choose Docker to run your validator you can use the sample Dockerfile as a starting point. To use this:

  1. Create a folder and copy within it the Dockerfile and JAR produced from the csvvalidator-war module.
  2. (Optional) Create a sub-folder (e.g. resources) as your resource root within which you place your domain configuration folder(s).
  3. (Optional) Adapt the Dockerfile to also copy the resources folder and set its path within the image as the validator.resourceRoot:
...
COPY resources /validator/resources/
ENV validator.resourceRoot /validator/resources/
...

Note: If you skip steps 2 and 3, only the generic validator will be available under the any domain.

Plugin development

The CSV validator supports custom plugins to extend the validation report. Plugins are implementations of the GITB validation service API for which the following applies. Note that plugin JAR files need to be built as "all-in-one" JARs.

Input to plugins

The CSV validator calls plugins in sequence passing in the following input:

| Input name | Type | Description |
| --- | --- | --- |
| contentToValidate | String | The absolute and full path to the input provided to the validator. |
| domain | String | The validation domain relevant to the specific validation call. |
| validationType | String | The validation type of the domain that is selected for the specific validation call. |
| tempFolder | String | The absolute and full path to a temporary folder for plugins. This will be automatically deleted after all plugins complete validation. |
| hasHeaders | String | true or false depending on whether the input should be considered as having a header row. |
| delimiter | String | The character to use as the field delimiter. |
| quote | String | The character to use for the quote character. |
| locale | String | The locale (language code) to use for reporting of results (e.g. "fr", "fr_FR"). |
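
Since every input arrives as a String, a plugin will typically want to convert the values into more convenient types before use. The following Java sketch shows one possible way to do this. The PluginInputs record and its helper methods are hypothetical and not part of the validator or the GITB API; the Map stands in for whatever name/value pairs the plugin extracts from the incoming request, and the sketch assumes all eight inputs are present.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;

/** Hypothetical typed view over the plugin inputs listed in the table above. */
record PluginInputs(Path contentToValidate, String domain, String validationType,
                    Path tempFolder, boolean hasHeaders, char delimiter, char quote,
                    Locale locale) {

    /** Builds the typed view from raw name/value pairs (all values are strings). */
    static PluginInputs from(Map<String, String> raw) {
        return new PluginInputs(
                Path.of(required(raw, "contentToValidate")),
                required(raw, "domain"),
                required(raw, "validationType"),
                Path.of(required(raw, "tempFolder")),
                Boolean.parseBoolean(required(raw, "hasHeaders")),
                singleChar(raw, "delimiter"),
                singleChar(raw, "quote"),
                toLocale(required(raw, "locale")));
    }

    /** Returns the named input or fails if it is missing or empty. */
    private static String required(Map<String, String> raw, String name) {
        String value = raw.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing plugin input: " + name);
        }
        return value;
    }

    /** Returns the named input as a single character. */
    private static char singleChar(Map<String, String> raw, String name) {
        String value = required(raw, name);
        if (value.length() != 1) {
            throw new IllegalArgumentException("Expected one character for: " + name);
        }
        return value.charAt(0);
    }

    /** Accepts both "fr" and "fr_FR" by converting the code to a BCP 47 language tag. */
    private static Locale toLocale(String code) {
        return Locale.forLanguageTag(code.replace('_', '-'));
    }

    public static void main(String[] args) {
        // Example values only; the paths, domain and validation type are made up.
        PluginInputs inputs = PluginInputs.from(Map.of(
                "contentToValidate", "/tmp/work/input.csv",
                "domain", "any",
                "validationType", "basic",
                "tempFolder", "/tmp/work/plugins",
                "hasHeaders", "true",
                "delimiter", ",",
                "quote", "\"",
                "locale", "fr_FR"));
        System.out.println(inputs);
    }
}
```

Failing fast on a missing or malformed input is a design choice of this sketch; a plugin could equally fall back to defaults.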

Output from plugins

The output of plugins is essentially a GITB ValidationResponse that wraps a TAR instance. The report items within this TAR instance are merged with any reports produced by Table Schema validation.

Plugin configuration

Plugin configuration for a validator instance is part of its domain configuration. Once you have your plugins implemented you can configure them using the validator.defaultPlugins and validator.plugins properties where you list each plugin by providing:

  • The path to its JAR file.
  • The fully qualified class name of the plugin entry point.

Licence

This software is shared using the European Union Public Licence (EUPL) version 1.2.

Legal notice

The authors of this library waive any and all liability linked to its usage or the interpretation of results produced by its downstream validators.

Contact

For feedback or questions regarding this library you are invited to post issues in the current repository. In addition, feel free to contact the Test Bed team via email at DIGIT-ITB@ec.europa.eu.

See also

The Test Bed provides similar validators for other content types. Check these out for more information: