New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove leading spaces in coderefs #2907 #2953

Merged
merged 8 commits into from May 17, 2018

Conversation

Projects
3 participants
@jelovirt
Member

jelovirt commented May 5, 2018

Add filter to remove leading whitespace from codeblock elements. Implements #2907.

Signed-off-by: Jarno Elovirta jarno@elovirta.com

@jelovirt

This comment has been minimized.

Member

jelovirt commented May 10, 2018

@robander Do you think we should make the outputclass that triggers whitespace normalization configurable via plugin extension point, or just hardcode normalize-space for now?

@jelovirt jelovirt added this to To Do in 3.1 via automation May 10, 2018

@jelovirt jelovirt moved this from To Do to Review in 3.1 May 10, 2018

@jelovirt jelovirt requested a review from robander May 10, 2018

@@ -354,6 +354,7 @@ See the accompanying LICENSE file for applicable license.
<param name="processing-mode" value="${processing-mode}" if:set="processing-mode"/>
</filter>
<filter class="org.dita.dost.writer.CoderefResolver" unless:set="preprocess.coderef.skip"/>
<filter class="org.dita.dost.writer.NormalizeCodeblock" unless:set="preprocess.codeblock.skip"/>

This comment has been minimized.

@jelovirt

jelovirt May 10, 2018

Member

Is this a good guard condition property name? Maybe preprocess.normalize-codeblock.skip would be better?

This comment has been minimized.

@robander

robander May 16, 2018

Member

I think preprocess.normalize-codeblock.skip is probably better? It feels too long -- but like many of the skip conditions, it will probably only be seen by / used by a small number of people, so length is not a big issue.

This comment has been minimized.

@jelovirt

jelovirt May 16, 2018

Member

Changed

public final int length;
public CharactersEvent(char ch[], int start, int length) {
this.ch = ch;

This comment has been minimized.

@jelovirt

jelovirt May 10, 2018

Member

Maybe this should make a copy of the char array, to prevent modification and allow GC to collect the array.

@jelovirt jelovirt moved this from Review to In progress in 3.1 May 10, 2018

@jelovirt jelovirt moved this from In progress to Review in 3.1 May 10, 2018

jelovirt added some commits May 5, 2018

Add initial codeblock filter
Signed-off-by: Jarno Elovirta <jarno@elovirta.com>
Implement stripping leading spaces from codeblock
Signed-off-by: Jarno Elovirta <jarno@elovirta.com>
Trigger codeblock normalization by outputclass
Signed-off-by: Jarno Elovirta <jarno@elovirta.com>
Move SAX cache to own class
Signed-off-by: Jarno Elovirta <jarno@elovirta.com>
Create protective copy of char array and attributes
Signed-off-by: Jarno Elovirta <jarno@elovirta.com>
Handle case where characters events are split between linefeeds
Signed-off-by: Jarno Elovirta <jarno@elovirta.com>
@robander

This comment has been minimized.

Member

robander commented May 16, 2018

Missed that question 6 days ago -- I don't think it's really necessary to make that configurable, seems like if you want to use the feature you can use that token?

Looks good, only added one comment (to answer existing question from @jelovirt ).

Only thing I wonder is if there is any need to add a test case like this, where the sub-element includes newlines; I remember, way back before DITA in our SGML days, we had some parsers that reliably broke down on this - but it may not be an issue anymore. I've usually seen this as a creative way to use add a profiled line into a codeblock (filtered out for some conditions):

<codeblock>   test<b>
      stuff here</b>
   and finish</codeblock>
@jelovirt

This comment has been minimized.

Member

jelovirt commented May 16, 2018

Additional test with nested linebreak added

jelovirt added some commits May 16, 2018

Add test for nested element with newline inside
Signed-off-by: Jarno Elovirta <jarno@elovirta.com>
Change skip property name for normalizing codeblock whitespace
Signed-off-by: Jarno Elovirta <jarno@elovirta.com>

@jelovirt jelovirt merged commit b768808 into dita-ot:develop May 17, 2018

2 checks passed

DCO All commits have a DCO sign-off from the author
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

3.1 automation moved this from Review to Done May 17, 2018

@jelovirt jelovirt added this to the Next milestone May 17, 2018

infotexture added a commit to dita-ot/docs that referenced this pull request May 17, 2018

Remove codeblock/coderef indentation workarounds
(No longer necessary after implementation of dita-ot/dita-ot#2907 in dita-ot/dita-ot#2953)

Signed-off-by: Roger Sheen <roger@infotexture.net>

infotexture added a commit to dita-ot/docs that referenced this pull request May 17, 2018

Add output class to normalize codeblock whitespace
Amends effac80 per dita-ot/dita-ot#2953.

Signed-off-by: Roger Sheen <roger@infotexture.net>
@infotexture

This comment has been minimized.

Member

infotexture commented May 26, 2018

@jelovirt Found an edge case where non-whitespace characters are trimmed from the first line in the codeblock after these changes.

In the “Branch filtering” section of the DITA features in docs topic, there's a code block with hanging indentation, where the first line is indented less than the lines that follow. The source of this code block is a coderef on Line 55 that pulls a range of several lines from the publishing.ditamap.

<topicref href="using-dita-command.dita" keys="build-using-dita-command" locktitle="yes">
  <topicmeta>
    <navtitle>Using the <cmdname>dita</cmdname> command</navtitle>
  </topicmeta>
  <ditavalref href="../resources/expert.ditaval">
    <ditavalmeta>
      <dvrResourcePrefix>build-</dvrResourcePrefix>
    </ditavalmeta>
  </ditavalref>

After the changes in this PR, the hanging indentation is removed when the leading whitespace is normalized — the first two characters are missing from the first line: <t.

opicref href="using-dita-command.dita" keys="build-using-dita-command" locktitle="yes">
<topicmeta>
  <navtitle>Using the <cmdname>dita</cmdname> command</navtitle>
</topicmeta>
<ditavalref href="../resources/expert.ditaval">
  <ditavalmeta>
    <dvrResourcePrefix>build-</dvrResourcePrefix>
  </ditavalmeta>
</ditavalref>

While this particular case is certainly unusual and could be worked around in source by referencing a different range of lines with uniform indentation, the normalization process should probably guard against this sort of thing to ensure that only whitespace characters are deleted — never actual content.

@infotexture

This comment has been minimized.

Member

infotexture commented May 28, 2018

Found another case today where non-whitespace content is removed from the beginning of each line.

In a codeblock that pulls in the following properties file via <coderef>,

output.dir = out/pdf
args.gen.task.lbl = YES
args.rellinks = nofamily

the first 6 characters are stripped from each line when @outputclass="normalize-space":

.dir = out/pdf
en.task.lbl = YES
ellinks = nofamily

Of course in this case there's no leading whitespace to adjust, so the problem can be solved by simply removing the normalize-space keyword from the outputclass.

But since the primary use case for this option is to normalize the content of external <coderef> sources that may change after the <codeblock> markup is authored, it might be useful to be able to enable normalization in advance, and trust that it will do no harm.

infotexture added a commit to dita-ot/docs that referenced this pull request May 29, 2018

Disable normalize-space to prevent character loss
Per dita-ot/dita-ot#2953 (comment)

Signed-off-by: Roger Sheen <roger@infotexture.net>
@infotexture

This comment has been minimized.

Member

infotexture commented May 29, 2018

The remaining character loss issues described above were resolved with 8f16633.

infotexture added a commit to dita-ot/docs that referenced this pull request May 29, 2018

Restore whitespace normalization to codeblocks
The remaining character loss issues were resolved with dita-ot/dita-ot@8f16633:

* dita-ot/dita-ot#2953 (comment)
* dita-ot/dita-ot#2953 (comment)

Signed-off-by: Roger Sheen <roger@infotexture.net>

@jelovirt jelovirt deleted the jelovirt:feature/codeblock_normalize_space branch Jun 14, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment