Profile DITA OT speed on very large DITA project #3568

Closed
raducoravu opened this issue Sep 2, 2020 · 36 comments
Labels
priority/high High severity or high priority issue

@raducoravu
Member

raducoravu commented Sep 2, 2020

One of our end users shared their 50k-topic DITA project with us. It has about 2k DITA submaps.
I started profiling it using JProfiler. One of the problems I encountered is that we are using LinkedLists in GenMapAndTopicListModule instead of LinkedHashSets, which would yield better results when checked with contains().
Screen Shot 2020-09-02 at 2 38 03 PM
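
To illustrate why the collection type matters here, the following is a minimal sketch, not the actual GenMapAndTopicListModule code. The class name `ContainsDemo` and the generated file names are assumptions made for the example. `LinkedList.contains()` scans the list element by element, while `LinkedHashSet.contains()` is a hash lookup that still preserves insertion order.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.LinkedList;

// Hypothetical demo class; not part of DITA-OT.
class ContainsDemo {
    // Adds "topic0.dita" .. "topic(size-1).dita" to the given collection.
    static Collection<String> fill(Collection<String> target, int size) {
        for (int i = 0; i < size; i++) {
            target.add("topic" + i + ".dita");
        }
        return target;
    }

    // Calls contains() once per probe and counts how many probes are found.
    static int countHits(Collection<String> files, int probes) {
        int hits = 0;
        for (int i = 0; i < probes; i++) {
            if (files.contains("topic" + i + ".dita")) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int n = 10_000;
        Collection<String> list = fill(new LinkedList<>(), n);
        Collection<String> set = fill(new LinkedHashSet<>(), n);

        long t0 = System.nanoTime();
        int listHits = countHits(list, n); // linear scan per lookup
        long t1 = System.nanoTime();
        int setHits = countHits(set, n);   // hash lookup per lookup
        long t2 = System.nanoTime();

        System.out.println("LinkedList:    hits=" + listHits + ", ms=" + (t1 - t0) / 1_000_000);
        System.out.println("LinkedHashSet: hits=" + setHits + ", ms=" + (t2 - t1) / 1_000_000);
    }
}
```

Both collections return the same answers; only the cost per lookup differs, which is what shows up in a profile once the lists hold tens of thousands of entries.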

No need to document this separately in the 3.6 release notes; the fixes have been implemented in PRs for 3.6.

@raducoravu
Member Author

Querying the Ant entity resolver for each parsed topic's DOCTYPE also seems to take some time:
Screen Shot 2020-09-02 at 3 01 14 PM

@raducoravu
Member Author

The configureSaxonCollationResolver() method also seems to take its time:

Screen Shot 2020-09-02 at 3 13 14 PM

@raducoravu
Member Author

What I like least is that 3 GB is not enough for the HTML-based output; I need to look into the memory allocations as well.

@raducoravu
Member Author

In org.dita.dost.writer.DebugFilter, maybe we could keep a cache that maps class attribute values to DitaClass objects.
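
A minimal sketch of such a cache follows. It is an illustration under assumptions, not the DITA-OT implementation: `ParsedClass` is a hypothetical stand-in for the real class, and only the idea of reusing one parsed object per distinct attribute value is shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a parsed DITA class attribute value.
final class ParsedClass {
    final String[] tokens;

    ParsedClass(String classValue) {
        // Split "- topic/p " into its whitespace-separated tokens.
        this.tokens = classValue.trim().split("\\s+");
    }
}

// Hypothetical cache: one ParsedClass instance per distinct attribute value.
class ParsedClassCache {
    private static final Map<String, ParsedClass> CACHE = new ConcurrentHashMap<>();

    static ParsedClass get(String classValue) {
        // Parses only on the first request for a given value.
        return CACHE.computeIfAbsent(classValue, ParsedClass::new);
    }
}
```

Since a project uses only a small number of distinct class values across all elements, the map stays small while the parsing work is done once per value rather than once per element.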

@raducoravu
Member Author

raducoravu commented Sep 2, 2020

The project has a great many folders, about 10k, each containing a couple of topics.
The DITA OT has a grammar pool, org.dita.dost.util.XMLGrammarPoolImplUtils, which should cache DTDs based on their XMLGrammarDescription descriptions.
An org.apache.xerces.impl.dtd.XMLDTDDescription has an expanded system ID and a public ID; these two fields are used in its equals() and hashCode() methods.
The expanded system ID for each DOCTYPE declaration:

<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">

is an absolute URL resolved relative to the DITA topic location. So for a DITA topic placed in "c:\Downloads\dita\someFolder", the expanded system ID is "file:/c:/Downloads/dita/someFolder/topic.dtd".
This means that the cache keeps accumulating DTDs, because it can only reuse grammars for topics located in the same folder.
Our end user needed to disable this cache in order to avoid running out of memory early in the gen-list stage.
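
The effect can be reproduced without Xerces. The sketch below is an assumption-laden illustration: `GrammarKeyDemo` and its methods are hypothetical, and the cache key is modelled as a plain string. It resolves the relative system ID against each topic's location and counts how many distinct keys result.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo; models the grammar pool key as a string.
class GrammarKeyDemo {
    // Resolves a relative system ID such as "topic.dtd" against the topic's own URI.
    static String expandSystemId(String topicUri, String systemId) {
        return URI.create(topicUri).resolve(systemId).toString();
    }

    // Counts the distinct cache keys produced by the given topics.
    static int poolSize(boolean publicIdOnly, String publicId, String systemId, String... topicUris) {
        Set<String> keys = new HashSet<>();
        for (String topicUri : topicUris) {
            String expanded = expandSystemId(topicUri, systemId);
            // Either key on the public ID alone, or on public ID plus expanded system ID.
            keys.add(publicIdOnly ? publicId : publicId + "|" + expanded);
        }
        return keys.size();
    }
}
```

With topics in two different folders, keying on both fields yields two entries for the same DTD, while keying on the public ID alone yields one. Scaled to about 10k folders, that is about 10k copies of each grammar.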

@raducoravu
Member Author

If, in "org.dita.dost.util.XMLGrammarPoolImplUtils", we override the equals and hashCode methods to look only at public IDs, the grammars no longer accumulate in the pool and are reused across different folders containing topics:

public int hashCode(final XMLGrammarDescription desc) {
    if (desc instanceof XSDDescription) {
        // return -1 for XSD grammar hashcode because we want to disable XSD grammar caching
        return -1;
    } else {
        // prefer the public ID so that the same DTD is shared across folders
        if (desc.getPublicId() != null) {
            return desc.getPublicId().hashCode();
        }
        return desc.hashCode();
    }
}

public boolean equals(final XMLGrammarDescription desc1,
        final XMLGrammarDescription desc2) {
    if (desc1 instanceof XSDDescription
            && desc2 instanceof XSDDescription) {
        // always return false for XSD grammar to disable XSD grammar caching
        return false;
    } else {
        // compare by public ID only when both descriptions have one
        if (desc1.getPublicId() != null && desc2.getPublicId() != null) {
            return desc1.getPublicId().equals(desc2.getPublicId());
        }
        return desc1.equals(desc2);
    }
}

@robander
Member

robander commented Sep 2, 2020

I hit a problem in my last job with grammar cache and having more than a few hundred directories -- this sounds like it may be hitting the same thing. In that case, we started with about 1,000 folders and a few thousand topics. My test case eventually got to about 400 folders (each with only one topic), and that hit 2 gig memory limits and crashed. Turning off the grammar cache allowed that sample to build quickly.

Moving the same sample files into one directory built very quickly with grammar cache on (even adding a few thousand topics was not a problem). It was definitely an issue with grammar cache + number of directories. I assumed the problem was internal to Xerces at that point though. (I no longer have access to the sample files but I remember that issue pretty clearly from a year ago.)

@raducoravu
Member Author

Right, our end user also sets these parameters:

args.grammar.cache=no
generate-debug-attributes=false
conserve-memory=true

The build takes about 4 hours to finish after this, but it no longer runs out of memory.

@GCWait

GCWait commented Sep 3, 2020

I joined today's call a little late. Is the above solution planned to go into 3.6?

@jelovirt jelovirt added this to Needs triage in Radu via automation Sep 3, 2020
@jelovirt jelovirt added the priority/high High severity or high priority issue label Sep 3, 2020
@jelovirt
Member

jelovirt commented Sep 3, 2020

in org.dita.dost.writer.DebugFilter maybe we could keep a cache between attribute class value and DITAClass objects.

Implemented in #3569

@jelovirt
Member

jelovirt commented Sep 3, 2020

GenMapAndTopicListModule instead of Linkedhashsets which would yield better results when checked with contains()

Implemented in #3570

@raducoravu
Member Author

@GCWait there are a couple of possible problems described here. I was told to add a fix for the grammar pool problem on the 3.5.4 branch because it's a bug. The rest of the enhancements were committed by @jelovirt on the development branch, meaning that indeed they will probably be available in 3.6.

@raducoravu
Member Author

@jelovirt to avoid searching the classpath for extension function definitions in "org.dita.dost.util.XMLUtils.configureSaxonExtensions(Configuration)" each time a parser is created, we could maybe search for extension function definitions only once, lazily, and then keep them in a static list.
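
One way to sketch the "search once, lazily, keep in a static list" idea is the initialization-on-demand holder idiom. The code below is an illustration under assumptions: `ExtensionRegistry` and its nested `Extension` interface are hypothetical stand-ins, and the lookup is assumed to be a ServiceLoader-style classpath scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registry; not the DITA-OT XMLUtils code.
class ExtensionRegistry {
    // Hypothetical stand-in for the extension function definition type.
    interface Extension {
        String name();
    }

    // Counts classpath scans, for demonstration only.
    static final AtomicInteger SCANS = new AtomicInteger();

    // The JVM initializes Holder on first access, exactly once and thread-safely.
    private static final class Holder {
        static final List<Extension> EXTENSIONS = load();
    }

    private static List<Extension> load() {
        SCANS.incrementAndGet();
        List<Extension> found = new ArrayList<>();
        for (Extension e : ServiceLoader.load(Extension.class)) {
            found.add(e);
        }
        return Collections.unmodifiableList(found);
    }

    // Every caller after the first gets the cached list without rescanning.
    static List<Extension> extensions() {
        return Holder.EXTENSIONS;
    }
}
```

A configuration method could then iterate over `ExtensionRegistry.extensions()` and register each entry, paying the scan cost only on the first call.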

@raducoravu
Member Author

raducoravu commented Sep 4, 2020

I moved on to the "filter" stage. On each start element, Job.getFileInfo seems to stall (or it's just that it's called very often).
Screen Shot 2020-09-04 at 2 10 01 PM

@raducoravu
Member Author

To remove the delays in org.dita.dost.util.Job.getFileInfo(URI) we would need two additional maps in the Job class, something like:

private final Map<URI, FileInfo> fileSrcToFileInfo = new ConcurrentHashMap<>();
private final Map<URI, FileInfo> fileOutToFileInfo = new ConcurrentHashMap<>();

In this way we would not need to iterate over the values to find certain mappings.
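
A self-contained sketch of how such index maps could be used is shown below. `FileIndex` and its `Info` class are hypothetical, trimmed-down stand-ins for Job and Job.FileInfo, and the lookup order (temporary URI, then source, then output) is an assumption made for the example.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Job; only the lookup structure is modelled.
class FileIndex {
    // Hypothetical stand-in for Job.FileInfo.
    static final class Info {
        final URI uri;    // location in the temporary directory
        final URI src;    // source location, may be null
        final URI result; // output location, may be null

        Info(URI uri, URI src, URI result) {
            this.uri = uri;
            this.src = src;
            this.result = result;
        }
    }

    private final Map<URI, Info> byTemp = new ConcurrentHashMap<>();
    private final Map<URI, Info> fileSrcToFileInfo = new ConcurrentHashMap<>();
    private final Map<URI, Info> fileOutToFileInfo = new ConcurrentHashMap<>();

    // Keeps all three indexes in sync; ConcurrentHashMap rejects null keys, hence the guards.
    void add(Info info) {
        byTemp.put(info.uri, info);
        if (info.src != null) {
            fileSrcToFileInfo.put(info.src, info);
        }
        if (info.result != null) {
            fileOutToFileInfo.put(info.result, info);
        }
    }

    // Three hash lookups instead of a scan over all values.
    Info find(URI file) {
        Info info = byTemp.get(file);
        if (info == null) {
            info = fileSrcToFileInfo.get(file);
        }
        if (info == null) {
            info = fileOutToFileInfo.get(file);
        }
        return info;
    }
}
```

The trade-off is extra memory and the need to update the secondary maps wherever entries are added, replaced, or removed.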

@raducoravu
Member Author

The serialization of the job also seems to take a while, not sure how that could be enhanced though:

Screen Shot 2020-09-04 at 2 01 57 PM

@raducoravu
Member Author

The three improvements in the pre-process Java code for which no PRs have been opened are:

XMLUtils.configureSaxonExtensions - search for the classes that implement that specific interface only once.
Job.getFileInfo - keep extra maps from a file's input/output URI to its FileInfo object to speed up the "getFileInfo" method.
Job.write - somehow make the serialization of the job faster.

@jelovirt
Member

jelovirt commented Sep 8, 2020

If in the "org.dita.dost.util.XMLGrammarPoolImplUtils" we override the equals and hashcode methods to look only at public IDs...

Fixed in #3574

@jelovirt
Member

jelovirt commented Sep 8, 2020

@raducoravu are you able to test the Job serialization if we add a BufferedOutputStream there?

public void write() throws IOException {
    if (!tempDir.exists() && !tempDir.mkdirs()) {
        throw new IOException("Failed to create " + tempDir + " directory");
    }
    // buffer the file stream so the XML writer does not hit the disk on every small write
    try (OutputStream outStream = new BufferedOutputStream(new FileOutputStream(jobFile))) {
        XMLStreamWriter out = null;
        try {
            out = XMLOutputFactory.newInstance().createXMLStreamWriter(outStream, "UTF-8");
            serialize(out, prop, byTemp.values());
        } catch (final XMLStreamException e) {
            throw new IOException("Failed to serialize job file: " + e.getMessage());
        } finally {
            if (out != null) {
                try {
                    out.close();
                } catch (final XMLStreamException e) {
                    throw new IOException("Failed to close file: " + e.getMessage());
                }
            }
        }
    } catch (final IOException e) {
        throw new IOException("Failed to write file: " + e.getMessage());
    }
    lastModified = jobFile.lastModified();
}

Or, in case OutputStreamWriter is faster at UTF-8 encoding:

try (Writer writer = Files.newBufferedWriter(jobFile.toPath())) {
    XMLStreamWriter out = null;
    try {
        out = XMLOutputFactory.newInstance().createXMLStreamWriter(writer);
        serialize(out, prop, byTemp.values());
…

@jelovirt
Member

jelovirt commented Sep 8, 2020

The serialization of the job also seems to take a while, not sure how that could be enhanced though:

So how large is the resulting .job.xml file in your test case?

@raducoravu
Member Author

raducoravu commented Sep 9, 2020

Feedback from end user after testing the grammar cache fix:

I can confirm I managed to run the publication on a commodity laptop without issue in 5 GB. I was also able to run the PDF version of the same scenario; however, that took over 5 hours. The memory was set at 15 GB, having failed at 8 GB.

I suspect the huge amount of memory for PDF is taken by Apache FOP.

@raducoravu
Member Author

raducoravu commented Sep 9, 2020

@jelovirt the .job.xml size is 22 megabytes. After adding the BufferedOutputStream I did not see any performance problems during the serialization; it was so fast (1-2 seconds) that I did not have time to start the CPU recording after the log announced that it had begun.

@jelovirt
Member

Tested with mock data that results in a job file of about 23 MB. Using a buffer takes the serialization down to about 0.5% of the original time, so a 200-fold performance improvement. I'll create a PR for this.

@jelovirt
Member

Did some more investigation on the serialization. Actually, it seems XMLOutputFactory returns a serializer that is really slow at UTF-8 encoding. Using the JDK's OutputStreamWriter to handle the encoding gives the 200-fold improvement. Just using the BufferedOutputStream gives almost nothing.
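
As a self-contained illustration of the Writer-based approach, the sketch below lets a JDK `BufferedWriter` do the UTF-8 encoding and hands that Writer to the StAX factory. The class `JobWriterSketch`, the element names, and the property map are assumptions made for the example; this is not the actual Job.write code.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch; element and attribute names are made up.
class JobWriterSketch {
    static void write(Path jobFile, Map<String, String> props) throws IOException {
        // The JDK writer handles buffering and UTF-8 encoding.
        try (Writer writer = Files.newBufferedWriter(jobFile, StandardCharsets.UTF_8)) {
            // The XML serializer only ever sees characters, never bytes.
            XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(writer);
            try {
                out.writeStartDocument();
                out.writeStartElement("job");
                for (Map.Entry<String, String> e : props.entrySet()) {
                    out.writeStartElement("property");
                    out.writeAttribute("name", e.getKey());
                    out.writeCharacters(e.getValue());
                    out.writeEndElement();
                }
                out.writeEndElement();
                out.writeEndDocument();
            } finally {
                out.close();
            }
        } catch (XMLStreamException e) {
            // Keep the cause so the original stack trace is not lost.
            throw new IOException("Failed to serialize job file", e);
        }
    }
}
```

Passing the exception as the cause, rather than only its message, is a small design choice that makes failures in this code path easier to diagnose.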

@raducoravu
Member Author

raducoravu commented Sep 11, 2020

Further along in the pre-processing, at the [keyref] stage, when the very large DITA map is processed (30 MB, 40k keys, no keyscopes, no keyrefs):
Screen Shot 2020-09-11 at 9 14 00 AM

There is probably only one key scope there (the default one) which is hashed multiple times, each hash hashing its 40K descendant keys.
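
If repeated hashing of one large, immutable scope object is indeed the cost, one common remedy is to compute the hash once and cache it, as java.lang.String does. The sketch below illustrates that technique only: `CachedHashScope` is a hypothetical class, not the DITA-OT key scope type, and it is safe only because its key map is copied and never modified afterwards.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable scope with a lazily cached hash code.
final class CachedHashScope {
    private final String name;
    private final Map<String, String> keys;
    private int hash;        // 0 means "not computed yet"
    int hashComputations;    // demonstration counter only

    CachedHashScope(String name, Map<String, String> keys) {
        this.name = name;
        // Defensive copy so the cached hash can never become stale.
        this.keys = Collections.unmodifiableMap(new HashMap<>(keys));
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // The expensive part: hashes every key definition.
            h = 31 * name.hashCode() + keys.hashCode();
            hash = h;
            hashComputations++;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CachedHashScope)) {
            return false;
        }
        CachedHashScope other = (CachedHashScope) o;
        return name.equals(other.name) && keys.equals(other.keys);
    }
}
```

With 40K keys, the first call pays the full cost and later calls return the stored value, so using the scope repeatedly as a map key no longer rehashes its descendants.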

@raducoravu
Member Author

The same problem appears in "KeyrefModule.adjustResourceRenames":
Screen Shot 2020-09-11 at 9 22 43 AM

@raducoravu
Member Author

The [topic-fragment] stage seems to be applied to all topics. Its purpose seems to be to normalize table content, normalize codeblocks, and expand coderefs. I think few topics in this project have tables and codeblocks. Maybe it could be applied only to topics for which it was previously determined that they have such elements inside...

@raducoravu
Member Author

A screenshot from the "chunk" stage; they have about 9000 chunk=to-content. I don't see anything to be improved there, but I'm attaching the screenshot nonetheless:
Screen Shot 2020-09-11 at 9 45 22 AM

@raducoravu
Member Author

The move-meta stage takes a large amount of time on the map.ditamap (which by now is about 40 MB).
By now it's not our Java code anymore; it's the mappull.xsl being applied by Saxon. Such timing problems in the XSLT would probably need to be approached separately, by trying to optimize the XSLTs.
Screen Shot 2020-09-11 at 10 45 58 AM

@raducoravu
Member Author

The image-metadata module seems to go through all files, although there are only 4 images in the entire project.

@raducoravu
Member Author

As an overview, the client said that initially publishing their map to WebHelp took 5 hours (with grammar caching and XSLT document caching disabled), and they did it on a server with about 20 GB of RAM. As you might suspect, I never tested that on my side.
After the grammar cache fixes I ran the publishing on my side without disabling any caches; it took about 111 minutes and no longer seemed to require much memory, 1 GB seems to be almost enough.
After adding the speed improvements in GenMapAndTopicListModule and Job.getFileInfo, today it took about 60 minutes.

I did not have the latest Job serialization fix installed.
I also did not have the DitaClass fix installed.
I'm not sure how much time these two fixes would shave off, maybe a couple of minutes.

I did have an extra fix in "XMLUtils.configureSaxonExtensions" to avoid looking for XSLT extensions every time a transformer is created, a fix which in my opinion should also be included.

@raducoravu
Member Author

When having so many files, we need to try to avoid going through those files, loading and serializing them back in all stages.
For example the "image-metadata" is as fast as it can be, but just going through 50K files and writing them back with an identity transform takes time. Same for "topic-fragment".

@markgif

markgif commented Sep 11, 2020

Just want to say I'm really enjoying reading this bug.

@raducoravu
Member Author

@jelovirt should we close this issue? I think various performance improvements have been done, also we now have those independent parameters for using the memory as disk and for parallel processing. Independent issues can be further added in the future for example for XSLT profiling if at some point

@jelovirt
Member

jelovirt commented Dec 3, 2020

Closing as fixed. New issues should be opened for hot spots found in released code.

@jelovirt jelovirt closed this as completed Dec 3, 2020
Radu automation moved this from Needs triage to Closed Dec 3, 2020
@jelovirt jelovirt added this to the Next milestone Dec 3, 2020
@raducoravu
Member Author

The project from which this discussion started is now available here:
https://github.com/NHSDigital/DataDictionaryPublication

It is available under Open Government Licence (nationalarchives.gov.uk).

kostya2011 pushed a commit to Tweddle-SE-Team/dita-ot that referenced this issue Sep 15, 2021
* Generate correct metadata wrapper for maps

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix getting language of a composite document without nested topic

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Throw warning on file reference what uses different case

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update message text

Co-authored-by: Roger Sheen <roger@infotexture.net>
Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Split OASIS table test into individual tables

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Inline simple table normalization filter super class

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Refactor OASIS table normalization based on simpletable normalization

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Extract colspec generation into own method

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix vertical spanning

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update version number

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Split simpletable tests

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Support nested simple table normalization

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add support for over 2 row spans

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix two equal parallel row spans

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Simplify test DITA resource

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Ignore foreign content

When map contains flagging, ditaval-startprop element will contain prop elements without @Class. This will cause dita-ot:matches-shortdesc-class() in template patterns to throw errors due to missing @Class attribute.

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add rotate outputclass to entry output in HTML5

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Change rotate direction

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Save map after rewriting references

Fixes regression from cf7d7be

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update version to 3.5.2 and docs submodule

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Use argument long form in Docker deploy configuration

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Use unfiltered source topic as basis for topic fileinfo features

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update file info fields for all topics that share the same source URI

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add steps wrapper for single step task

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Automatically bump Homebrew formula on release

https://github.com/mislav/bump-homebrew-formula-action
Signed-off-by: Roger Sheen <roger@infotexture.net>

* Fix download URL generation

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Use manual dispatch

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Test with fork of homebrew-core repo

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Test workflow using forked action and tap

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Pass root map FileInfo to functions to reduce calls to Job for input map

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add support for multiple filter arguments

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add conversion arguments test

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add info on options that can be repeated

Per dita-ot#3556 (comment)

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Fix incorrect conversion usage line

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Refactor to use store for all temp IO

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add Store test

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Remove deprecated msgprefix variable

Signed-off by: Robert D Anderson <gorodki@gmail.com>

Update message DOTX071E to use proper params

Signed-off-by: Robert D Anderson <gorodki@gmail.com>

* Update version number and docs submodule

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Use Homebrew specific secret

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Use debug action for Homebrew workflow

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix input name in Brew bump action

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add second line to commit message

The second line of the commit message is used for PR description body.

* Retain mapref keyscope in retable

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Inject parent keyscope into reltable elements instead of using topicgroup

Since reltable cannot have a @keyscope, we can push parent @keyscope there without
worrying that we override the original @keyscope.

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Remove unused JAXP Transformer logging utility method

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add S9api message listener

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Throw exception on terminating xsl:message

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Log unsupported level as an info message and log an error about unsupported level

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix child key cascading to enable sibling keys

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update version to 3.5.4

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Cache DitaClass instances

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix for dita-ot#3568, prefer public IDs when comparing descriptions for grammars stored in grammar cache.

Signed-off-by: Radu Coravu <radu_coravu@sync.ro>

* Use concurrent sets or lists

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add test for dita-ot#3573

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Reset table stack on tgroup dita-ot#3573

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add test case for broken table dita-ot#3566

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix indentation and remove change log comments

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Improve grammar pool test

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix indentation

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Enable nonworking dita.xsl.html5.cover extension point (dita-ot#2981)

Signed-off-by: Shane Taylor <shane.taylor@cengage.com>

* Use BufferedWriter to output Job for 200 fold performance improvement

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Make DitaClass package private and add tests

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update docs submodule

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add support for reading debug information from S9api node

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add support for S9api node in DitaClass match method

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Use XdmNode for key reader

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Migrate tests to use XdmNode

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix bugs in keyref replacement filter

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix root element select expression

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix namespace binding in receiver output

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix walking document to pass all nodes, not just topicref elements

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix map processing on key rewrite

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Clean

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Allow FOP to generate changebars

DITA-OT 3.4 upgraded the bundled Apache™ FOP library to version 2.4, which includes support for changebars, but did not enable that support.

These changes remove the FOP-specific flagging overrides that disabled changebars in FOP, allowing the default PDF2 flagging routines to be applied.

Fixes dita-ot#3511.

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Fix build init order to initialise project after temp dir

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add generator=DITA-OT metadata to HTML header

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Extract Dublin Core into own stylesheet

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Clean metadata stylesheets

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Move Dublin Core to separate plug-in

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix tests for Dublin Core

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* [fix] note type notice missing from xhtml

Signed-off-by: thendarion <16006640+thendarion@users.noreply.github.com>

* Remove serialization of schemekeydef.xml and obsolete flag stylesheet

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Remove obsolete key processing from XHTML

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* In-memory Store that caches resources when possible

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Read and write subject scheme files with Store

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Generate subject scheme files into Store

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix HTML5 without list files

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add last modified to store

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Read and write job to store

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Don't create temp dir unless needed and verify temp dir URI has trailing slash

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Clean test

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Use store to get input/output stream

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix typo in parameter description attribute

When the `store-type` parameter was added in dita-ot/dita-ot@b402e2f for dita-ot#3548, it looks like the `@deprecated` attribute name may have been chosen from an IDE autocompletion process instead of the `@desc` attribute that provides the parameter description.

This commit fixes the attribute name, so the description will appear in the docs at <https://www.dita-ot.org/dev/parameters/parameters-base.html#base__store-type>.

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Add missing terminal punctuation

Information about default values is appended to the parameter description in the documentation, so it's important that each description is a complete sentence.

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Fix setting html5.map.url property

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix broken Slack badge

After DNS settings were changed a while back, the previous badge is no longer available at http://slack.dita-ot.org, which previously pointed to a self-hosted app on Heroku, but now redirects to the native Slack invite form at https://dita-ot.slack.com.

This replaces the old badge with an SVG variant from https://shields.io.

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Use constant for root scope ID

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add repeat argument to CLI to run process N times

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Share latest from oasis-tcs/dita and oasis-tcs/dita-techcomm

Signed-off-by: Robert D Anderson <gorodki@gmail.com>

* Add support for parallel XSLT

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add pool

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add parallel support to modules

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Refactor SAX pipe configuration

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Enable parallel XML filter pipe

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add parallel to topicpull

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add parallel processing to conref

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Use parallel processing in preprocess2

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add parallel processing to keyref

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Only allow parallel XSLT task without Ant's xmlcatalog

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Remove xmlcatalog from parallel XSTL task

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Document

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Clean

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Remove duplicate setter call

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Remove unused code

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix development build folder name

Development versions of the distribution package are downloaded as `dita-ot-develop.zip`, which unpacks to reveal a folder named with the current version number and a suffix with the commit hash that the package is based on, such as `dita-ot-3.6.0-SNAPSHOT+61f95e5`

Builds fail when run from this folder as described in dita-ot#2414.

This commit changes the commit hash separator to the `@` sign, so the folder name is generated as `dita-ot-3.6.0-SNAPSHOT@61f95e5`, which allows builds to run from the folder without errors, and follows the convention GitHub uses to represent commits, such as dita-ot/dita-ot@61f95e5.

In dita-ot#2414 (comment), @jelovirt expressed a preference for resolving the underlying issue rather than a workaround, but until that happens, we should ensure that our dev build packages can be run without renaming the folder.

Fixes dita-ot#2414.

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Fix workdir-uri generation bug

Make sure URI for a directory will always end in a trailing slash. If the File points to a path that doesn't exist, File.toURI() will not output a trailing slash.

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix JobSourceSet cases where src is null

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix current file directory URI resolution when file doesn't exist

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update Gradle wrapper to 6.7

Update Gradle distribution version & regenerate wrapper to pick up recent changes per
https://docs.gradle.org/current/userguide/gradle_wrapper.html#sec:upgrading_wrapper

    ./gradlew wrapper --gradle-version=6.7

https://docs.gradle.org/6.7/release-notes.html

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Fix for dita-ot#3558, update Apache FOP to 2.5, Batik to 1.13, PDF Box to 2.0.21, fop-pdf-images to 2.5, jcl-over-slf4j to 1.7.30

Signed-off-by: Radu Coravu <radu_coravu@sync.ro>

* Add missing terminal punctuation

Information about default values is appended to the parameter description in the documentation, so it's important that each description is a complete sentence.

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Sort partial imports

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Move hard-coded highlight domain styles to partial

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Move hard-coded syntax diagram styles to partial

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Move hard-coded long quote link styles to partial

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Move hard-coded Boolean state styles to partial

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Add Gradle Wrapper Validation Action

https://github.com/marketplace/actions/gradle-wrapper-validation

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Remove invalid code remnant from HTMLHelp plug-in

- Amends f88cc9b
- Fixes dita-ot#3627

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Only validate Gradle Wrapper on pull requests

Per dita-ot#3629 (comment)

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Remove hard-coded long quote citation alignment

Amends 52e0b48, which adds the necessary rule to the Sass partial

Per dita-ot#3632 (comment)

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Fix repeat count in install and uninstall subcommands

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add TopicRefWriter unit test

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

Use parameterized test

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add GitHub Actions for unit tests

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add integration tests to GA and remove tests from Travis CI

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Remove disabled pdf2.index.skip param and deprecate old PDF indexing code

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update Ant to 1.10.9 dita-ot#3613

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update Commons IO to 2.8.0

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Move dist snapshot upload to GA

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix snapshot condition expression

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Remove snapshot GA condition

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix org/repo name in condition

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Catch exception from reading JPEG image dita-ot#3604

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update version to 3.6

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add RELAX NG schemas to catalog

Fixes dita-ot#3641.

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Exclude RELAX NG schemas from 'dita-ot.jar'

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Exclude resources catalog from 'dita-ot.jar'

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Add license for DITA 2.0 grammar files dita-ot#3608

Signed-off-by: Robert D Anderson <gorodki@gmail.com>

* Restore RELAX NG schemas to 'dita-ot.jar'

Without these schemas in the .jar file, builds with project files fail with the following error:

> Error: invalid input for CompactParseable

- Amends 09c1d94 from dita-ot#3645

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Switch rotated table entries to 'vertical-rl' mode

The 'sideways-lr' writing mode implemented in dita-ot#3541 is only supported in Firefox.

Switching the writing mode to 'vertical-rl' should allow rotated text to appear in Chrome & Safari as well.

The initial 'sideways-lr' implementation rotated the contents of the cell 90 degrees counterclockwise, as stipulated by the [DITA 1.3 spec](http://docs.oasis-open.org/dita/dita/v1.3/errata02/os/complete/part3-all-inclusive/langRef/base/entry.html#entry). Because that value is currently only supported in Firefox, it seems better to rotate clockwise with 'vertical-rl' instead, which should be supported by more browsers (except IE).

See https://developer.mozilla.org/en-US/docs/Web/CSS/writing-mode

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Add support for multimedia oasis-tcs/dita#351

Signed-off-by: Robert D Anderson <gorodki@gmail.com>

* Update doc to 3.6

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix NPE when initializing store

Fixes dita-ot#3656

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Compile CSS with Dart Sass via sass-gradle-plugin

The `jsass` compiler used up to DITA-OT 3.6 relies on LibSass, which has been [deprecated in favor of Dart Sass](https://sass-lang.com/blog/libsass-is-deprecated), and throws errors when building on the 64-bit extension of the ARM architecture.

These changes replace jsass with the Sass Compile plugin for Gradle, which compiles sass or scss files using the official Dart Sass compiler: https://github.com/EtienneMiret/sass-gradle-plugin

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Remove plug-in source comment

Per dita-ot#3659 (comment)

Co-authored-by: Jarno Elovirta <jarno@elovirta.com>
Signed-off-by: Roger Sheen <roger@infotexture.net>

* Update DOTX037W warning message

Fixes dita-ot#3669.

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Align `args.input` CLI description with docs

Per dita-ot/docs#328

Signed-off-by: Roger Sheen <roger@infotexture.net>

* Fix resolution of custom PDF resources

Fix for dita-ot#3688

Signed-off-by: Radu Coravu <radu_coravu@sync.ro>

* Transform URIs which might have an anchor

Fix for dita-ot#3677

Signed-off-by: Radu Coravu <radu_coravu@sync.ro>

* Use SAX source directly instead of its toString() representation

Fix for dita-ot#3689

Signed-off-by: Radu Coravu <radu_coravu@sync.ro>

* Take into account that a URI may not be a file when checking its existence

Fix for dita-ot#3677

Signed-off-by: Radu Coravu <radu_coravu@sync.ro>

* Properly compute absolute locations in merge map and topic parsers

Fix for dita-ot#3687

Signed-off-by: Radu Coravu <radu_coravu@sync.ro>

* Fix copy of topicmeta between keydef and keyref

Fix for dita-ot#3694

Signed-off-by: Radu Coravu <radu_coravu@sync.ro>

* Use delegating resolver dita-ot#3688

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Use utility method to null URI fragment and simplify test

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Refactor test to reduce setup boilerplate

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Test for file scheme for methods that use File operations

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Clean tests and remove tabs

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Fix formatting

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Refactor catalog setter code

Extract catalog resolver into own variable to handle issue with varargs

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Add guard for unset logger

Fixes dita-ot#3667

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

* Update version to 3.6.1 and docs submodule

Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

Co-authored-by: Jarno Elovirta <jarno@elovirta.com>
Co-authored-by: Roger Sheen <roger@infotexture.net>
Co-authored-by: Robert D Anderson <gorodki@gmail.com>
Co-authored-by: Radu Coravu <radu_coravu@sync.ro>
Co-authored-by: Shane Taylor <shane.taylor@cengage.com>
Co-authored-by: thendarion <16006640+thendarion@users.noreply.github.com>
5 participants