Implement XML validator, with quarantining of invalid files. #3

Open
bzillig1 opened this Issue Jan 12, 2012 · 2 comments

Owner

bzillig1 commented Jan 12, 2012

Validation in the MONK version of Abbot was accomplished using MSV, the Multi Schema Validator. For each file processed by the meta-stylesheet, the MSV jar file produced a log file that specified what, if anything, failed to validate in that file. A Bash script moved any invalid files from the output directory to the quarantined directory, and reported the proportion of valid to invalid files. Can we implement this in Clojure?
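For concreteness, the quarantine step described above might look roughly like the following on the JVM. This is only a sketch in Java, not the original MONK Bash script: the class name `Quarantine`, the method signature, and the `isValid` predicate (a stand-in for whatever validator is actually used) are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper; none of these names come from Abbot or MONK.
public class Quarantine {

    /**
     * Moves every regular *.xml file in outputDir that fails isValid into
     * quarantineDir and returns the counts as {valid, invalid}.
     */
    public static int[] quarantine(Path outputDir, Path quarantineDir,
                                   Predicate<Path> isValid) throws IOException {
        Files.createDirectories(quarantineDir);

        // Collect the file list first so that we never move files
        // while the directory is still being iterated.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "*.xml")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }

        int valid = 0;
        int invalid = 0;
        for (Path p : files) {
            if (isValid.test(p)) {
                valid++;
            } else {
                Files.move(p, quarantineDir.resolve(p.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
                invalid++;
            }
        }
        return new int[] {valid, invalid};
    }
}
```

The returned pair is enough to report the proportion of valid to invalid files; how that report is worded is left to the caller.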

@ghost ghost assigned sramsay Jan 12, 2012

Owner

sramsay commented Feb 29, 2012

We can, but a couple of questions arise.

  1. Should the Abbot core be responsible for validating the files?

If validation always has to be performed (and I suspect it does, at least for the Web version) then the most efficient thing to do is have the core do it. This restricts the user's options slightly, because I can imagine scenarios (particularly with the command-line version) in which the user is certain the resulting texts will validate and doesn't want to incur the extra overhead of having Abbot run through the process. We could make it so that validation is mandatory for any conversion performed through the Java API, but optional on the command line.

Having it optional in the command-line version would also allow users to use another validator (MSV, for example) as their taste dictates.

  2. Should the Abbot core physically sequester the files?

This sounds like a matter of taste on the part of the user. For the Web UI, the caller really just needs to know which files validated and which didn't -- whether the invalid files reside in a separate directory is immaterial (and might just complicate things). A CLI user might want to physically quarantine the files somehow, but not necessarily. So my suggestion would be to create a log that lists invalid files in some standard form that is easily grepped out. If the user wants to write a script that physically moves those files, they can easily do that. But the core shouldn't define in advance how those files are moved around the file system -- particularly since "moving files around" is OS-dependent to some degree.
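A minimal sketch of what such a greppable log line might look like, written in Java against the JDK's built-in `javax.xml.validation` API. Several things here are assumptions rather than anything decided in this thread: the class name `ValidationLog`, the tab-separated `VALID`/`INVALID` line format, and the use of W3C XML Schema (chosen only because the JDK ships a validator for it; the schema language Abbot actually needs may differ).

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper; the names and the log format are illustrative only.
public class ValidationLog {

    /** Compiles a W3C XML Schema supplied as a string. */
    public static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    /**
     * Validates one file and returns a single tab-separated log line:
     *   VALID<TAB>name
     *   INVALID<TAB>name<TAB>first error message
     */
    public static String check(File xml, Schema schema) throws IOException {
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validation error is thrown.
            validator.validate(new StreamSource(xml));
            return "VALID\t" + xml.getName();
        } catch (SAXException e) {
            // Collapse whitespace so each file occupies exactly one line.
            String message = String.valueOf(e.getMessage()).replaceAll("\\s+", " ");
            return "INVALID\t" + xml.getName() + "\t" + message;
        }
    }
}
```

With one such line per file, something like `grep '^INVALID' validation.log | cut -f2` (the log file name is hypothetical) would yield the list of invalid files for a user script to move wherever it likes.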

Owner

bzillig1 commented Feb 29, 2012

  1. My preference would be for the Abbot core to perform validation, as an on/off option specified by the user. Maybe it makes sense for the default position to be set to 'on'. As a user, the first thing I want to know is "Did my changes break the validation, and where is the problem?" As for whether this default should be different for the API versus the command line, I don't have a strong opinion, but I do feel certain that I want Abbot to validate.
  2. Your argument makes sense. I think a log list would be fine.