The BioPAX Validator

MIT licence

Java CI with Maven

The BioPAX Validator is a command-line tool, Java library, and online web service for validating BioPAX-formatted pathway data. It checks more than a hundred BioPAX Level3 rules and best practices, produces human-readable reports, and can automatically fix some common mistakes in the data. Level1 and Level2 data can also be processed: they are first auto-converted to Level3, and then the Level3 rules apply. The validator is in use by the BioPAX community and is continuously being improved and expanded based on community feedback.

BioPAX is a community-developed standard language for the integration, exchange, and analysis of biological pathway data. BioPAX is defined in the Web Ontology Language (OWL) and can represent a broad spectrum of biological processes, including metabolic and signaling pathways, molecular interactions, and gene networks. Pathguide.org lists the pathway databases and tools that support BioPAX.

Usage

Download and expand the latest ZIP distribution from http://www.biopax.org/downloads/validator/

Or build it from the sources:

mvn clean install
cd dist
mvn assembly:assembly

Then use the resulting biopax-validator-${version}-all.zip (move it wherever you like and expand it).

Console (batch)

When run without arguments, as in

sh validate.sh

the script prints the available command-line options.

The following command checks all the BioPAX (.owl) files in a directory:

sh validate.sh input_dir --profile=notstrict

Validation results are saved to the current working directory.

For data files under ~100 MB, you can also use biopax-validator-client.jar, which can be faster because it does not repeat the initialization (loading large ontology files) on every run; run it without arguments for more information:

java -jar biopax-validator-client.jar <in> <out> [optional parameters...]

Client app parameters are slightly different from the command-line ones.

Smaller files validate quickly; however, the actual time may vary for data of the same size. Networks that contain loops (e.g., in a nextStep->PathwayStep sequence) take longer.

If you want to validate several files, it is always much more efficient to copy them all into one directory and pass that directory as the first parameter to validate.sh. This is because the Validator's initialization is very time- and resource-consuming (mainly due to parsing the OBO files); once it is done, subsequent validations run much faster.
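The batching advice above can be sketched as a small shell helper. This is only an illustration: stage_owl_files is a hypothetical function, not part of the distribution, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper (not shipped with the validator): copy the given
# BioPAX files into one directory, so that validate.sh is initialized once
# for the whole batch instead of once per file.
stage_owl_files() {
  dest=$1
  shift
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    cp -- "$f" "$dest/" || return 1
  done
}

# Usage (placeholder paths):
#   stage_owl_files input_dir pathways/a.owl pathways/b.owl
#   sh validate.sh input_dir --profile=notstrict
```

Files with the same base name would overwrite each other in the target directory, so rename them first if that can happen.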

Web service

sh server.sh

starts the BioPAX Validator app with a built-in application server; then open http://localhost:8080 in a browser.

Use the --help parameter to see all the server options (e.g., httpPort, ajpPort).

Developer notes

Validation Rules are Java objects that implement the Rule<E> interface and extend AbstractRule<E>, where E is usually either a BioPAX class or Model. Controlled vocabulary rules extend AbstractCvRule and use CvRestriction and the OBO Ontology Manager (derived from PSIDEV EBI code) to look up valid ontology terms and synonyms. Validation Rules can call other Rules, but this is not recommended (it is better to keep rules simple and independent).

Post-model validation mode checks all the rules/objects after the BioPAX model is built (created in memory or read from a file). Fail-fast mode fails early on critical BioPAX and RDF/XML errors while the model is being read and built (with Paxtools). The Validator turns Paxtools' default fail-fast mode into a greedy validation mode in order to collect and report all the issues at once. The maximum number of (not auto-fixed) errors per validation/report can be configured. Various specific BioPAX error types, levels, categories, messages, and cases are reported.

Spring AOP, MessageSource, resource bundles, and OXM help collect the errors, translate them into human-readable messages, and write the validation report (XML or HTML). Settings such as behavior (level), error code, category, and message templates are configured via the resource bundles: rules.properties, codes.properties and profiles.properties (e.g., /rules_fr_CA.properties can be added to see messages in French).

To disable LTW AOP, set <context:load-time-weaver aspectj-weaving="off"/> in applicationContext.xml or edit META-INF/aop.xml. To "physically" exclude a rule from being checked, comment out the @Component annotation in its Java source file (the corresponding singleton rule will then be neither created automatically nor added to the validator bean). You can also set <ruleClass>.behavior=ignore in the profiles.properties file (good for testing the AOP and beans configuration without a real validation job).
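As an illustration of the <ruleClass>.behavior=ignore setting, here is a minimal sketch of a shell helper that sets such a key in a properties file. It assumes plain key=value lines; set_rule_behavior and the rule name in the usage comment are hypothetical, and the exact key names should be checked against your own profiles.properties.

```shell
# Hypothetical helper: set "<ruleClass>.behavior=<value>" in a properties
# file, replacing an earlier line for the same key or appending a new one.
set_rule_behavior() {
  file=$1
  rule=$2
  value=$3
  key="${rule}.behavior"
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp) || return 1
  # Keep every line that does not start with "<key>=".
  awk -v k="$key=" 'index($0, k) != 1' "$file" > "$tmp" || return 1
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (myRuleClass is a placeholder, not a real rule name):
#   set_rule_behavior profiles.properties myRuleClass ignore
```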

You can also edit the links in the obo.properties file and the classpath in the validate.sh script to use alternative OBO files (e.g., the latest versions). However, when an ontology is unavailable or broken, the validator fails with a message such as:

Caused by: psidev.ontology_manager.impl.OntologyLoaderException: Failed loading/parsing ontology CL 
from http://obo.cvs.sourceforge.net/*checkout*/obo/obo/ontology/anatomy/cell_type/cell.obo

(Check the logs; try another revision/location of the failing ontology, or revert to the default configuration.)
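Before a long run, it may help to check that the ontology links still resolve. The sketch below makes no assumption about the exact layout of obo.properties: it simply extracts every http(s) URL found in the file and probes each one with curl, which is assumed to be installed. Both function names are hypothetical.

```shell
# List every http(s) URL mentioned in a file, one per line, without duplicates.
list_ontology_urls() {
  grep -Eo 'https?://[^[:space:],]+' "$1" | sort -u
}

# Probe each URL with an HTTP HEAD request and report the outcome.
check_ontology_urls() {
  list_ontology_urls "$1" | while IFS= read -r url; do
    if curl -fsSIL --max-time 20 "$url" >/dev/null 2>&1; then
      echo "OK    $url"
    else
      echo "FAIL  $url"
    fi
  done
}

# Usage:
#   check_ontology_urls obo.properties
```

Some servers reject HEAD requests, so a FAIL line is a hint to look closer rather than proof that the ontology is gone. Entries that are not http(s) links are not checked.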

The validator-core module is not specific to BioPAX; it could be used for another domain with an alternative validation rules module.

You can also start the web app with Maven (via the Spring Boot plugin):

mvn clean install
cd biopax-validator-web
mvn spring-boot:run

and access it at localhost:8080 as well (live reload is enabled, so you can edit js/css/html/jsp and immediately see the effect without restarting the app).

This is convenient for development, but LTW does not fully work in this mode for some reason: the validator will not catch syntax errors or unknown property errors, unlike when it is run via sh server.sh or in a Docker container (i.e., when the app is started with a command such as java -javaagent:agent.jar ... -jar app.war).

Docker

Build the project and image(s) from sources:

mvn clean install
cd biopax-validator-web
mvn dockerfile:build
#mvn dockerfile:tag
#mvn dockerfile:push

run

Run with Docker (Compose or Terraform can also be used):

docker run --name validator -it -p 8080:8080 pathwaycommons/biopax-validator