Support list of DITAVAL filter files #2637

Merged
merged 7 commits into from May 1, 2017
+358 −46

Conversation

Projects
None yet
8 participants
Member

robander commented Mar 23, 2017 • edited

Opening this early in order to get feedback on parameter name, design, etc.

The user goal / use case here is to allow a build to specify a list of DITAVAL files to be used instead of the existing args.filter parameter. Currently, the single parameter means that for every unique set of conditions, a separate build file must be used that also includes all common conditions. Supporting a list allows the maintenance of common conditions where they make sense, while storing unique conditions where they make sense.

I've seen this most often with teams that want to store a large set of revision conditions separately from other conditions. As a real world example, a list of 4 files for a single PDF build might include:

• A set of revisions that applies to everything within a single product
• A set of filter conditions that applies to all printed material
• A final set of conditions used within this single document

Design plans [now revised! see below]

Looking for review / input:

• Parameter named args.filter.list specifies a text file, with one filter file listed per line. This parameter is ignored if args.filter is also specified.
• Listed files are evaluated in order, and concatenated into a generated file within a single <val> root
• Conflicting conditions do not present a new issue here, so should not need to be addressed separately. That is - anything you can do with 4 or 10 files, you can do in a single filter file today. You can define a single file with 10 conditions for otherprops="testy". Whether that is done in one file, or results from using 10 files, our processing should respond the same way.
• The resulting single file is generated in the temp directory, and the existing args.filter property is set to that value; this allows all remaining processing to function as it does today without change

I generally don't like "text file with one file per line" but other options (an XML file, comma separated list, etc) seem worse.

Comments welcome, as I start playing with a larger implementation. Goal is to have this completed for 2.5.

Updated design

• args.filter updated to handle a list of files separated by path separator ; character
• For any file(s) specified with args.filter, a generated version is placed in the temp dir, named based on a configuration option (but maybe this should be hard coded?)
• For any image that uses a relative path, a basedir attribute is added with the name of the ditaval directory; DitaValReader.java is updated to use this when finding images for copying to [wherever]
• Because the image reference itself isn't changed, any steps that process a relative path entry continue to work as they did (whether this is appropriate or not)
• gen-list, debug-filter, and flag-module now take in the name of the generated file, rather than taking in args.filter or a URL version of args.filter; parameters to those steps do not need to change

Member

jelovirt commented Mar 23, 2017 • edited

 Since args.filter value is a system path, we can just add support for a list of paths, separated by the platform path separator character. That way we would not need to add a new property that works exactly like args.filter but takes a list/array as a value. If we concat the filter files into a single temporary file, we cannot change the value of args.filter because Ant properties are immutable. Instead, and intermediate property would need to be used.
Contributor

shaneataylor commented Mar 23, 2017

 @robander This would be hugely useful. The method I have thought of in the past to address this problem is for ditaval files to support including other ditaval files, but that involves extending the spec. In practice, I've used XML entities and ditaval fragment files to implement this kind of inclusion, but the resulting ditaval file is kind of equivalent to your text file with one per line (except more complicated). What about — instead of creating a list of ditaval files in a separate (text or XML) file, we simply support args.filter._XX_ where XX is a two-digit number specifying the sequence in which the specified ditaval file should be applied? My sense is that even 0-9 would cover 99% of use cases.
Contributor

shaneataylor commented Mar 23, 2017

 @jelovirt I like the simplicity of your suggestion, but for even a handful of chained ditaval files, the resulting property value could be quite long & harder for users to troubleshoot (I see the main issues being filters applied out of order or missing/unwanted filters). Including each as a separate property (especially in a .properties file) makes them easier to manage.
Member

robander commented Mar 23, 2017 • edited

 @jelovirt I haven't used parameters as a list that way ... feels a bit painful to author but does feel a bit cleaner than having a separate file to reference other files (I've had pain with that in the past). Although, as @shaneataylor points out, the ones we'd be using will have long paths from our CMS systems, so that one property would probably be a few hundred characters long. That's based on some lists we have today. It does seem like doing it as a separate parameter would still be preferable, after which we set args.filter to the merged file. Otherwise, we (and everybody who has special processing for ditaval) will need to change scripts to use a new parameter. If starting from scratch, I could buy an argument that one parameter is sufficient. At this point, the changes it requires + plus the fact that it's a different type of value seem like enough to use a new parameter. (FWIW we have an internal implementation for this, but it's part of a wrapper for DITA-OT, not integrated. It follows the rough logic above. Teams that use it often have several in the list, and each path is quite long. We did use a single parameter for both, which proved endlessly confusing, but partly that's because we had to read the file to figure out if it was a list or a DITAVAL.) I ... really couldn't go for parameters named something1, something2, ... somethingN. It immediately feels like a hack with an artificial limit, that limit being "only as many as we decided you should need". I remember having the same reaction when somebody said that they already used @otherprops for a specific thing in DITA topics, so could OASIS please add @otherprops2 and @otherprops3 to hold other unrelated values.
Contributor

shaneataylor commented Mar 23, 2017 • edited

 @robander My first impulse was actually to suggest not limiting the parameter's numeric component, but to parse all args.filter.X parameters (where X matches [0-9]+) in numerical order so users could chain an arbitrary number of ditaval files, but I figured that introduced complexity that just wasn't useful. Maybe the single args.filter.list parameter would become more usable if we added an optional args.filter.dir against which relative paths to each listed .ditaval would be resolved? For example: args.filter.dir = /my/path/to/my/cms/and/repo/ditaval/dir args.filter.list = first.ditaval second.ditaval third.ditaval special_subdir/fourth.ditaval  I think that would simplify most use cases in that at least the base CMS path could be specified with args.filter.dir. If we did add args.filter.dir, should args.filter also be resolved against that? It's a change, but only if you specify args.filter.dir, and I think it would be expected behavior.
Member

robander commented Mar 23, 2017 • edited

 *.basedir generally make me nervous just because I've had so many problems over the years (DITA-OT and elsewhere) dealing with and explaining paths -- "OK this image will be located relative to DITAVAL, which you referenced with a relative parameter, which is evaluated against this other specified *.basedir directory, and if *.basedir is relative, then all of this is relative to your properties file when calling the build." That said ... I can understand the use here, and it would make passing in a long list of ditaval files easier. Common case: they're all located either in the same directory, or very near each other, so base dir gets rid of most of the path Edge case: they're located all over the drive so basedir isn't helpful ... but in that case don't use it. And I think this is really an edge case, at least with my customers, the common case is to store them together.
Member

robander commented Mar 23, 2017

 RE: one parameter versus two (regardless of path length) ... we currently set up internal parameters dita.input.valfile and dita.input.valfile.url from the existing args.filter parameter. (Arguably, dita.input.valfile.url should have been based on the calculated internal dita.input.valfile rather than going back to the original.) Processing extensions from that point on probably should be working with the validated internal property rather than the original. So I've reconsidered, and I as long as we properly set up the internal property, extending args.filter to support a list shouldn't cause compatibility issues or any real changes to later steps. (This is true whether or not a *.basedir parameter is created.)

Open

Member

robander commented Apr 7, 2017

 Discussed this just a bit at the contributor call today, and followed up with @jelovirt with some design questions - sharing a summary (and question) here now. Just to see what was possible - I initially got this working by updating both gen-list and debug-filter so that they took in a list of files (from a parameter), processed them in sequence, and continued. This was mostly to play around because I haven't played much in that code. As confirmed with Jarno - a better design is really to merge the ditaval files before gen-list. To keep it simple, this process should run whenever the parameter is used (whether just one file or multiple). The merged result will be placed in the temp directory, with some adjustment of any relative image paths. Today -- the code takes in args.filter, validates it, and uses that to set dita.input.valfile, which is used for the remainder of preprocess. A warning is generated whenever dita.input.valfile is passed into the build ("This is deprecated, use args.filter), but it still works. As part of this pull request, passing in that parameter will become a fatal error. The new process will thus be able to reliably set dita.input.valfile to a generated file in the temp dir, meaning that all remaining steps (including any customizations outside of DITA-OT core) can continue to rely on dita.input.valfile to reflect all conditions. To merge the files, I need to create a new Java module that runs before gen-list. As I haven't done that in a long time, I have a question (probably for @jelovirt). I started by tweaking a module that sounded similar (TopicMergeModule). It extends AbstractPipelineModuleImpl, while a lot of the newer code extends SourceReaderModule. Is there a reason to pick one over the other for merging DITAVAL files?
Member

robander commented Apr 12, 2017

 Update: still making progress on this and still planning to have it in 2.5. Have not pushed code up to github yet because it's still about 30% code / 70% debug logging. I've now got a standalone preprocess step that reads in the DITAVAL files before gen-list, and sets up a generated file to contain the conditions from each file; next up is to write them back out, with any image paths adjusted as needed.
Member

robander commented Apr 18, 2017

 [Sigh]. I was using "default Java coding conventions" as laid out by Eclipse, modified to replace tabs with spaces. I did not expect it to reflow comments and longer lines.
Member

robander commented Apr 19, 2017

 Curious ... the test case testsubjectschema_case is failing in Travis but does not fail for me locally. When running on my system, the only failing tests are the same 14 Chunk test cases that always fail on Windows.
Member

robander commented Apr 19, 2017

 Working on the subjectschema_case test case, it looks like this is the only test case in our IntegrationTest which: has a DITAVAL has  within the ditaval (not just include/exclude) has an imageref attribute to specify a flag (other tests just use text for flagging) The test works fine on Windows - filter conditions are properly evaluated, flags are evaluated, and the flag image is copied to the output directory. Sorry for all the noise with the various commits - easiest way to test if / why it fails in Travis is to add more debug info and push up another commit.
Member

robander commented Apr 20, 2017

 Found the issues and merged my 20 or 25 debug commits. The merge processing adds @basedir attribute to any filter element that uses imageref. it's supposed to be the path to the DITAVAL. On Windows, that value (retrieved runningtoURI(currentDitaval.getParent())) started with file:/ and ended with a file separator. In the travis build, it did not start with file:/ (causing the isAbsolute() test to fail later). Once that was fixed, the missing slash caused basedir + imageref to return an invalid path. Latest commit fixes both of those, but it's ... not very pretty so another one is coming. (Not that the rest of the code is exactly pretty yet, but ... 🙄

robander reviewed Apr 20, 2017

src/main/java/org/dita/dost/util/Constants.java Outdated

robander reviewed Apr 20, 2017

src/main/plugins/org.dita.base/plugin.xml Outdated

jelovirt requested changes Apr 21, 2017

src/main/plugins/org.dita.base/plugin.xml Outdated
src/main/java/org/dita/dost/module/MergeDitavalModule.java Outdated

jelovirt reviewed Apr 25, 2017

 Update args.filter to support list of ditaval files 
Signed-off-by: Robert D Anderson <robander@us.ibm.com>
 5c7a2db 

SanFanDocu commented Apr 26, 2017

 We are using DITAVAL modules that are transformed in a separate XSLT to txt files which are included in turn as entities in the final publishing DITAVALs. The benefit of this solution: The final publishing view for each set is available within the authoring environment (Oxygen) and authors are able to preview the final filtering situation. If I understand your solution correctly the final DITAVAL sets are only temporarily available during publishing procedure, right? So there should be a transformation that publishes the final DITAVALS to a selected location for permanently storage from where authors can import them e.g. as profiling sets in Oxygen.

robander added some commits Apr 28, 2017

 Add test case for filter file list 
Signed-off-by: Robert D Anderson <robander@us.ibm.com>
 362850b 
 Update test case copy of messages.xml with DOTJ071E 
Signed-off-by: Robert D Anderson <robander@us.ibm.com>
 e0481f3 
Member

robander commented Apr 28, 2017

 Hi @SanFanDocu - I want to make sure I understand what you're saying. Today - you've got multiple DITAVAL files. You use an XSLT process to create one "merged" file by importing each individual DITAVAL (done by importing them as entities) That merged file is what you use in Oxygen when editing, and what you pass into DITA-OT today for processing Does that all sound right? If so - I can see how that would be useful and why you'd like to get a copy from the temp directory. I don't really think that should be part of the core feature though. For the core feature - it's really just writing the file to temp as a convenience that works well with our current process that reads/writes a lot of files in temp. Ideally, it would only exist in memory and no file would be written out. I'd be nervous to write this back to my own source directory, because I think it would get out of sync with the actual source DITAVAL files. Clearly this hasn't been an issue for you, though it might once you no longer need to pass the merged version in as a DITA-OT parameter. For what it's worth ... with the code as posted at this point, it would be trivial to add a post-process step in any transform type that automatically copied this file out:  
Member

robander commented Apr 28, 2017

 Going back to a comment from @jelovirt well up the chain: Since args.filter value is a system path, we can just add support for a list of paths, separated by the platform path separator character. That way we would not need to add a new property that works exactly like args.filter but takes a list/array as a value. This is how the latest commits work, and I'm getting nervous about it. We've got sets of build properties that are transferred around between users and build systems - Windows, Mac, and Linux. The different systems all have different separator characters. Currently, the properties work fine whichever system pulls them down. I don't think that can work with the platform path separator - my properties file on Windows uses semi-colons, but would have to change to colons to run on a Mac ... at which point I couldn't use it anymore. Are there common ways to resolve this, or are we stuck with one of either 1) property values that can't work across platforms or 2) defining our own separator?

jelovirt added some commits May 1, 2017

 Fix integration test 
 b29951b 
 Use Stream processing to collect filters in Main class 
 703c430 
 Refactor MergeDitavalModule 
* Change super class to be abstract pipeline module, not reader because this module
* Remove uplevels counting because it’s never actually done by the module
* Don’t treat module as a filter because it’s not a filter
 4118ca0 
 Clean ditaval merge Ant 
 4d274e2 

jelovirt merged commit 4ae2b44 into dita-ot:develop May 1, 2017 1 check passed

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

Merged

Contributor

mironovalexey commented May 4, 2017 • edited

 @robander Robert, this is a very useful feature. Thank you. Do I understand correctly, now I can pass multiple filters in the following syntax (ant)? 
Member

robander commented May 4, 2017 • edited

 @mironovalexey yes, that is correct, though the path separator character is platform-dependent so the semicolon ; will only work on Windows. For Mac/Linux, use a colon : instead.
Member

 Ideally we should have the same path separator work, no matter for what platform the publishing is done. In this way the value of the args.filter would be portable no matter on what OS.
Member

robander commented May 4, 2017

 @raducoravu that would be my preference as well, I'm already thinking about how to get the same properties file to work that my customers create on Windows but then run on Linux or Mac (when transferred to another user or to a build system). If I understand correctly @jelovirt was pushing for the path character because it's More Proper...?
Contributor

xephon2 commented May 4, 2017 • edited

 This is probably not the correct solution, but this works with pure Ant.  BEFORE: ${ditavals} AFTER:${ditavals.normalized} Is Windows OS but uses Unix delimiter: Delimiter should be replaced Is Unix OS but uses Windows delimiter: Delimiter should be replaced 
Member

robander commented May 4, 2017

 I think that's right, but we should be able to do something simpler, making use of ${path.separator}. Because we don't really care what OS we're on or what OS the property was written for, we just care about converting to a value that will be recognized on the current OS. So regardless of input value, if we converted ; and : to ${path.separator}, we'd end up with a normalized value that could be passed into the merge module.
Member

robander commented May 4, 2017

 Ah, reminding myself -- if you call the build with the dita command, the --filter parameter parses any value expecting the platform separator, and then puts the value into args.filter. It's not possible to set args.filter directly using the dita command. So, with dita, the only way to fix this would be to update the Java code that parses the property; I'm guessing @jelovirt doesn't like that idea: https://github.com/dita-ot/dita-ot/blob/develop/src/main/java/org/dita/dost/invoker/Main.java#L145 I guess realistically that's lower priority -- if you're calling dita and specifying --format, you're doing it on a specific OS and can use the right value. If you're specifying args.filter directly in a property file or by calling ant, it looks like this variant of @xephon2 's suggestion works (at least on Windows, it lets me specify either ; or :):  
Contributor

xephon2 commented May 4, 2017 • edited

 @robander, this would also replace the colon in file: or http:. I fixed the code snippet. This would work, but is still ugly. I'm curious about the final solution. 😄  
Member

robander commented May 4, 2017

 this would also replace the colon in file: or http:. Details, details. ;-) But yeah ... seems the general feeling is that we need a property that can work across platforms. I'm not sure of the ideal solution.
Member

drmacro commented May 4, 2017

 Surely a list of URIs is fundamentally different from an OS-specific path--the two are not directly interchangeable.
Member

jelovirt commented May 5, 2017

 The Ant Way of doing this is to use filesets to create a path and combining them with pathconvert into a single property. I understand the wish for multi-OS solution, but if you have eg just a single filter path C:\foo\bar.ditaval, the path itself will never work on Linux. The problem is not the separator character.
Member

robander commented May 5, 2017

 RE: the Ant way of doing things ... I think that would only work (??) if we have a different property for each ditaval, and I don't want to do that. I get that an absolute path is never going to work on different systems. That's the reason we never use absolute paths in our args.filter parameter today -- our DITA builds always have to be portable between machines (Win/AIX/MacOS), because any user on a team might be running the build on their local system (either Win or MacOS), and eventually they will be running on an AIX server. We will always be setting up our property files to use something like args.filter=first.ditaval;second.ditaval. In that case the separator character does end up causing problems, unless I'm missing something. Thinking through simple workarounds in my environment, I might end up using my own property and tell people to always use ; or ,, and then convert to the system separator in my own init step, but I'd love to hear better suggestions.
Member

jelovirt commented May 8, 2017

 The root issue here is that some users want to use Java properties file to define a list of filter files. Since properties file does not have support for array or list types, the values need to be separated with some delimiter. The is no character that could be used for URIs, Windows paths and UNIX paths.

Merged

infotexture added a commit to dita-ot/docs that referenced this pull request May 31, 2017

 Add multi-file support info to filter descriptions 
Per dita-ot/dita-ot#2637

Signed-off-by: Roger Sheen <roger@infotexture.net>
 6aa92bc