Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve Children Order in case of mixed content. #317

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

chavdam
Copy link

@chavdam chavdam commented Aug 4, 2016

In current functionality, in order to maintain Chilren Order in case if mixed content like this:

<TailParas fid="td">
    <Para>That capitulation reflects the conundrum facing Mr. Ballmer's successor, <En cat="pe" fpRef="958861" instRef="7" djRef="9589153">Satya Nadella</En>, who took over two years ago, as he runs out of options to bolster <En cat="co" fpRef="27147355" instRef="8" djRef="mcrost">Microsoft</En>'s standing in mobile technology. His new strategy aims to help information-technology managers cope with the increasingly central position of smartphones in corporate networks. The company is betting that corporate customers will want to use <En cat="co" fpRef="27147355" instRef="9" djRef="mcrost">Microsoft</En>'s technology to lock down and control devices on their networks.</Para>
    <Para>In a statement, Mr. Nadella also touted its Continuum feature, which enables a smartphone running Windows 10 to function as a surrogate PC when connected to a monitor and keyboard.</Para>
    <Para>
        <En cat="co" fpRef="27147355" instRef="10" djRef="mcrost">Microsoft</En> must establish its Windows operating system on a substantial portion of smartphones or accept a diminished presence in the technology landscape, where mobile devices outnumber personal computers—and it may not get another chance, said Forrester Research Inc. analyst Julie Ask.</Para>
</TailParas>

you need to pass these flags:
explicitChildren: true, preserveChildrenOrder : true, charsAsChildren: true, includeWhiteChars: true
And these flag generates lots of extraneous json nodes.

With this change I have added a flag called preserveChildrenOrderForMixedContent (default is false). But if it's turned on, it will generate following output. Basically, it's smart enough to detect that it has mixed content and will wrap the object only if required in order to maintain the order.

{
  "$": {
    "fid": "td"
  },
  "Para": [
    {
      "$$": [
        {
          "_": "That capitulation reflects the conundrum facing Mr. Ballmer\'s successor, "
        },
        {
          "En": {
            "_": "Satya Nadella",
            "$": {
              "cat": "pe",
              "fpRef": "958861",
              "instRef": "7",
              "djRef": "9589153"
            }
          }
        },
        {
          "_": ", who took over two years ago, as he runs out of options to bolster "
        },
        {
          "En": {
            "_": "Microsoft",
            "$": {
              "cat": "co",
              "fpRef": "27147355",
              "instRef": "8",
              "djRef": "mcrost"
            }
          }
        },
        {
          "_": "\'s standing in mobile technology. His new strategy aims to help information-technology managers cope with the increasingly central position of smartphones in corporate networks. The company is betting that corporate customers will want to use "
        },
        {
          "En": {
            "_": "Microsoft",
            "$": {
              "cat": "co",
              "fpRef": "27147355",
              "instRef": "9",
              "djRef": "mcrost"
            }
          }
        },
        {
          "_": "\'s technology to lock down and control devices on their networks."
        }
      ]
    },
    "In a statement, Mr. Nadella also touted its Continuum feature, which enables a smartphone running Windows 10 to function as a surrogate PC when connected to a monitor and keyboard.",
    {
      "$$": [
        {
          "_": "\\n            "
        },
        {
          "En": {
            "_": "Microsoft",
            "$": {
              "cat": "co",
              "fpRef": "27147355",
              "instRef": "10",
              "djRef": "mcrost"
            }
          }
        },
        {
          "_": " must establish its Windows operating system on a substantial portion of smartphones or accept a diminished presence in the technology landscape, where mobile devices outnumber personal computers—and it may not get another chance, said Forrester Research Inc. analyst Julie Ask."
        }
      ]
    }
  ]
}

In current functionality, in order to maintain Chilren Order in case if mixed content like this:

<TailParas fid="td">
	<Para>That capitulation reflects the conundrum facing Mr. Ballmer's successor, <En cat="pe" fpRef="958861" instRef="7" djRef="9589153">Satya Nadella</En>, who took over two years ago, as he runs out of options to bolster <En cat="co" fpRef="27147355" instRef="8" djRef="mcrost">Microsoft</En>'s standing in mobile technology. His new strategy aims to help information-technology managers cope with the increasingly central position of smartphones in corporate networks. The company is betting that corporate customers will want to use <En cat="co" fpRef="27147355" instRef="9" djRef="mcrost">Microsoft</En>'s technology to lock down and control devices on their networks.</Para>
	<Para>In a statement, Mr. Nadella also touted its Continuum feature, which enables a smartphone running Windows 10 to function as a surrogate PC when connected to a monitor and keyboard.</Para>
	<Para>
		<En cat="co" fpRef="27147355" instRef="10" djRef="mcrost">Microsoft</En> must establish its Windows operating system on a substantial portion of smartphones or accept a diminished presence in the technology landscape, where mobile devices outnumber personal computers—and it may not get another chance, said Forrester Research Inc. analyst Julie Ask.</Para>
</TailParas>

you need to pass these flags: 
explicitChildren: true, preserveChildrenOrder : true, charsAsChildren: true, includeWhiteChars: true
And these flag generates lots of extraneous json nodes.

With this change I have added a flag called preserveChildrenOrderForMixedContent (default is false). But if it's turned on, it will generate following output. Basically, it's smart enough to detect that it has mixed content and will wrap the object only if required in order to maintain the order.

{
  "$": {
    "fid": "td"
  },
  "Para": [
    {
      "$$": [
        {
          "_": "That capitulation reflects the conundrum facing Mr. Ballmer\'s successor, "
        },
        {
          "En": {
            "_": "Satya Nadella",
            "$": {
              "cat": "pe",
              "fpRef": "958861",
              "instRef": "7",
              "djRef": "9589153"
            }
          }
        },
        {
          "_": ", who took over two years ago, as he runs out of options to bolster "
        },
        {
          "En": {
            "_": "Microsoft",
            "$": {
              "cat": "co",
              "fpRef": "27147355",
              "instRef": "8",
              "djRef": "mcrost"
            }
          }
        },
        {
          "_": "\'s standing in mobile technology. His new strategy aims to help information-technology managers cope with the increasingly central position of smartphones in corporate networks. The company is betting that corporate customers will want to use "
        },
        {
          "En": {
            "_": "Microsoft",
            "$": {
              "cat": "co",
              "fpRef": "27147355",
              "instRef": "9",
              "djRef": "mcrost"
            }
          }
        },
        {
          "_": "\'s technology to lock down and control devices on their networks."
        }
      ]
    },
    "In a statement, Mr. Nadella also touted its Continuum feature, which enables a smartphone running Windows 10 to function as a surrogate PC when connected to a monitor and keyboard.",
    {
      "$$": [
        {
          "_": "\\n            "
        },
        {
          "En": {
            "_": "Microsoft",
            "$": {
              "cat": "co",
              "fpRef": "27147355",
              "instRef": "10",
              "djRef": "mcrost"
            }
          }
        },
        {
          "_": " must establish its Windows operating system on a substantial portion of smartphones or accept a diminished presence in the technology landscape, where mobile devices outnumber personal computers—and it may not get another chance, said Forrester Research Inc. analyst Julie Ask."
        }
      ]
    }
  ]
}
Compiled xml2js file for the change I added in corresponding coffee script.
@coveralls
Copy link

coveralls commented Aug 5, 2016

Coverage Status

Coverage decreased (-13.7%) to 83.333% when pulling 6841f84 on chavdam:patch-1 into 129ebba on Leonidas-from-XIV:master.

Added Test case for preserveChildrenOrderForMixedContent to improve the test coverage.
@coveralls
Copy link

coveralls commented Aug 5, 2016

Coverage Status

Coverage decreased (-13.7%) to 83.333% when pulling 12cbab1 on chavdam:patch-1 into 129ebba on Leonidas-from-XIV:master.

@coveralls
Copy link

coveralls commented Aug 5, 2016

Coverage Status

Coverage decreased (-4.5%) to 92.473% when pulling 0652162 on chavdam:patch-1 into 129ebba on Leonidas-from-XIV:master.

Updated Input to cover more conditions
@coveralls
Copy link

coveralls commented Aug 5, 2016

Coverage Status

Coverage increased (+0.3%) to 97.312% when pulling 0f768e0 on chavdam:patch-1 into 129ebba on Leonidas-from-XIV:master.

@sarikav1986
Copy link

👍 Change was very useful for my use case. Can we merge this with master branch?

Preserve space between consecutive nodes for mixed content
Preserve space between two consecutive tags.
Preserve space between two consecutive nodes
@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 97.067% when pulling e3218af on chavdam:patch-1 into 129ebba on Leonidas-from-XIV:master.

@chavdam
Copy link
Author

chavdam commented Aug 20, 2016

@Leonidas-from-XIV - Could you please look at the pull request I created and see if there is any possibility of merging this to master ?

Thanks,
Mahesh

@paumayr
Copy link

paumayr commented Oct 5, 2016

@Leonidas-from-XIV I would also need this, as I am trying to process XLIFF (xlf) files generated by angular2 for merging and difference operations.
@chavdam will this allow round-tripping (e.g. loading and saving back) the example you mentioned initially?

@chavdam
Copy link
Author

chavdam commented Oct 5, 2016

@paumayr - If I understand it correctly, you are asking if we can have bidirectional conversion to/from xml to json. Right? It should work.

@paumayr
Copy link

paumayr commented Oct 6, 2016

@chavdam Awesome... I currently have the following issue: angular2s ng-xi18n generates an xlf file that contains something like this:

<trans-unit id="aa69f7913cb1d24946f9aae8366cc24bd405dc8d" datatype="html">
   <source>Version: <x id="INTERPOLATION"/>, 
BuildInfrastructureVersion: <x id="INTERPOLATION_1"/>.<x id="INTERPOLATION_2"/>.<x id="INTERPOLATION_3"/>.<x id="INTERPOLATION_4"/>,
BuildInfrastructureInformation: <x id="INTERPOLATION_5"/>
   </source>
   <note priority="1" from="description">setting page server versions, build infrastructure version field</note>
</trans-unit>

I writing tools to deal with such files. ng-xi18n currently always generates localizations for all packages, including dependencies. Now, once I load and store such a file, it reorders mixed content like this (this time including the translation node):

  <trans-unit id="aa69f7913cb1d24946f9aae8366cc24bd405dc8d" datatype="html">
    <source>
      &#xD;
     ASDFJKLMVersion: , &#xD;
BuildInfrastructureVersion: ...,&#xD;
BuildInfrastructureInformation:&#xD;
&#xD;

      <x id="INTERPOLATION"/>
      <x id="INTERPOLATION_1"/>
      <x id="INTERPOLATION_2"/>
      <x id="INTERPOLATION_3"/>
      <x id="INTERPOLATION_4"/>
      <x id="INTERPOLATION_5"/>
    </source>
    <target state="translated" xml:lang="de">
      &#xD;
Version: , &#xD;
BuildInfrastructureVersion: ...,&#xD;
BuildInfrastructureInformation:&#xD;

      <x id="INTERPOLATION"/>
      <x id="INTERPOLATION_1"/>
      <x id="INTERPOLATION_2"/>
      <x id="INTERPOLATION_3"/>
      <x id="INTERPOLATION_4"/>
      <x id="INTERPOLATION_5"/>
    </target>
<note priority="1" from="description">setting page server versions, build infrastructure version field</note>
</trans-unit>

So obviously it causes confusion in the UI with the mixed-up text.

@chavdam
Copy link
Author

chavdam commented Oct 7, 2016

@paumayr - Did you test using my patch to see if it satisfies your requirement ?

@paumayr
Copy link

paumayr commented Nov 14, 2016

@chavdam sorry for the late (really late) reply. I just tested your patch, and I think it works for parsing. But I think the builder requires support for converting $$ back to text, by only printing the content of $$ nodes.

source:

<trans-unit id="aa69f7913cb1d24946f9aae8366cc24bd405dc8d" datatype="html">
    <source>BuildInfrastructureVersion: <x id="INTERPOLATION"/>.<x id="INTERPOLATION_1"/>.<x id="INTERPOLATION_2"/>.<x id="INTERPOLATION_3"/></source>
    <target state="translated" xml:lang="de">
      Build Infrastruktur Version: <x id="INTERPOLATION"/>.<x id="INTERPOLATION_1"/>.<x id="INTERPOLATION_2"/>.<x id="INTERPOLATION_3"/>
    </target>
    <note priority="1" from="description">setting page server versions, build infrastructure version field</note>
</trans-unit>

The result is the following (with pretty printing)

<trans-unit id="aa69f7913cb1d24946f9aae8366cc24bd405dc8d" datatype="html">
    <source>
      <$$>BuildInfrastructureVersion: </$$>
      <$$>
        <x id="INTERPOLATION"/>
      </$$>
      <$$>.</$$>
      <$$>
        <x id="INTERPOLATION_1"/>
      </$$>
      <$$>.</$$>
      <$$>
        <x id="INTERPOLATION_2"/>
      </$$>
      <$$>.</$$>
      <$$>
        <x id="INTERPOLATION_3"/>
      </$$>
    </source>
    <target state="translated" xml:lang="de">
      <$$>&#xD;
      Build Infrastruktur Version: </$$>
      <$$>
        <x id="INTERPOLATION"/>
      </$$>
      <$$>.</$$>
      <$$>
        <x id="INTERPOLATION_1"/>
      </$$>
      <$$>.</$$>
      <$$>
        <x id="INTERPOLATION_2"/>
      </$$>
      <$$>.</$$>
      <$$>
        <x id="INTERPOLATION_3"/>
      </$$>
      <$$>&#xD;
    </$$>
    </target>
    <note priority="1" from="description">setting page server versions, build infrastructure version field</note>
</trans-unit>

When disabling pretty-printing and removing <$$> and </$$> tags using something like this after writing I get the results I need:

xml = xml.replace(/(<\$\$>|<\/\$\$>)/g, "");

soo, almost :)..

Could we add an option to write the content of a specific element tag without writing the element tags?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants