Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xml2js does not preserve element order within a level #31

Closed
DaBlick opened this issue Sep 22, 2011 · 33 comments
Closed

xml2js does not preserve element order within a level #31

DaBlick opened this issue Sep 22, 2011 · 33 comments

Comments

@DaBlick
Copy link

DaBlick commented Sep 22, 2011

Here's the input XML

<nj:entitySection>
    <nj:entity entityName="Student">
        <nj:subEntity subEntityName="Contact" >
            <nj:fieldGroup displayName="group1">
                <nj:inModelField displayName="field1"  />
                <nj:foobar>x</nj:foobar>
                <nj:inModelField  displayName="field2"  />
            </nj:fieldGroup>
            <nj:fieldGroup displayName="group2">
                <nj:inModelField displayName="field3"/>
            </nj:fieldGroup>
        </nj:subEntity>
    </nj:entity>
</nj:entitySection>
</nj:ReportingStandard>

And here's what xml2js reads in:

{ 'nj:entitySection': 
  [ { 'nj:entity': 
       [ { '@': { entityName: 'Student' },
           'nj:subEntity': 
            [ { 'nj:fieldGroup': 
                 [ { 'nj:foobar': [ 'x', [length]: 1 ],
                     '@': { displayName: 'group1' },
                     'nj:inModelField': [ [Object], [Object], [length]: 2 ] },
                   { '@': { displayName: 'group2' },
                     'nj:inModelField': [ [Object], [length]: 1 ] },
                   [length]: 2 ],
                '@': { subEntityName: 'Contact' } },
              [length]: 1 ] },
         [length]: 1 ] },
    [length]: 1 ],

The key thing to notice is that in the xml its:
Originally the order was:

  • inModelField
  • foo
  • inModelField

but in the result, it's more like:

  • foo
  • inModelField
  • inModelField

it basically combines all tags with the same element name at a particular level, and in so doing, BLOWS away the (XML proscribed) preservation of tag order. And there's no way to re-establish the original order.

So if you tried to represent your XHTML with this, all your p, h, a and such tags would all scrunched with identical tags

H P P A H A P P H   would become
A A H H H P P P P   !!!!!

Note: I'm not trying to read in XHTML - i just used those tags as an example of why it might be a problem for some schemas.

xml2js does have this sexy sounding option:

explicitArray (default: false): Always put child nodes in an array if true;    otherwise an array is created only if there is more than one.

While it does indeed "put child nodes in an array", it doesn't preserve the document order.

@Leonidas-from-XIV
Copy link
Owner

I guess I need to add an additional option somewhere around line 92 in xml2js.coffee to check whether to squash these items. Probably needs another option not to break what other people use already. Or enhance explicitArray to do this properly.

Fancy, the original xml2js output style was broken in so many ways, there's really a lot of border-cases prohibiting sane parsing.

@DaBlick
Copy link
Author

DaBlick commented Sep 22, 2011

Another difficult is that if you use namespace prefixes, you get property keys that must be strings because of the colon:

xml:output ----> {"xml:output" :"x"} rather than just {output : "x"}

This isn't a big deal, but it means you code a lot of ugly element["xml:output"] rather than element.output

We coded up a quick SAX parser that his this basic model:

{ name : "xml",
 children : [list of children],
 attrs  : {name : value, name : value,...}
}

I'm almost tempted to tweak it to:

  { name : { name : "name-of-tag",  prefix : "xml", etc....}

@matthewjcarlson
Copy link

Any chance order preservation might be added in the near future? Would love to see it in the library (without, ahem, necessarily teaching myself coffeescript)

@Leonidas-from-XIV
Copy link
Owner

Well, not before 0.2 release, sorry. I plan to rewrite it a little and keep things like this in mind but time and motivation are a bit of a problem.

Oh and sorry for replying so late, I saw your comment but somehow it slipped my mind to answer!

@matthewjcarlson
Copy link

No worries, I think your library is great. Would it help if I could raise
some funding to assist with the motivation?

On Mon, Apr 30, 2012 at 10:49 PM, Marek Kubica <
reply@reply.github.com

wrote:

Well, not before 0.2 release, sorry. I plan to rewrite it a little and
keep things like this in mind but time and motivation are a bit of a
problem.

Oh and sorry for replying so late, I saw your comment but somehow it
slipped my mind to answer!


Reply to this email directly or view it on GitHub:

#31 (comment)

@Leonidas-from-XIV
Copy link
Owner

Not sure. I'll try to push 0.2 as soon as possible and then start thinking about fixing this problem. The way it is done currently is rather ugly and hackish, rewriting will make the code way clearer.

@kaero
Copy link

kaero commented Nov 9, 2012

Looks like following issue has the same background:

<suggest>o<fix>p</fix>en issue</suggest>

translates to

{ suggest: { _: 'oen issue', fix: [ 'p' ] } }

This case is more painful than squashing elements, because all text nodes are merged into one string instead of array of strings at least.

@Flackus
Copy link

Flackus commented Nov 9, 2012

I agree with @kaero, that is really painful :(

@zoobestik
Copy link

👍 Yep, one entity for all text nodes looks not good.

@Leonidas-from-XIV
Copy link
Owner

Yes. Some parsers handle it by having a tail attribute but I think it is rather confusing.

Boy I wish I had the time to start a proper 0.3 branch and rewrite the whole ugly element mess, maybe even on top of some other parser library.

@vjpr
Copy link

vjpr commented May 17, 2013

+1

@kaero
Copy link

kaero commented May 17, 2013

@vjpr try xamel – it preserve nodes order ;)

@jpravetz
Copy link

jpravetz commented Dec 1, 2013

I'm parsing SVG where node order matters. xamel does a good job and is easy to use. +1 on xamel.

@PigBoT
Copy link

PigBoT commented Jan 3, 2014

Looking really good. But I wish it would preserve the order. Would it be a major re-write?--I haven't looked through the code yet.

@Leonidas-from-XIV
Copy link
Owner

Well, I did plan a rewrite in 0.3, but that kinda didn't work out due to time limitations. My plan was to use htmlparser2 which is faster and already provides a xml2js-like object that could be transformed to the easy to use object that xml2js provides. Maybe I'll be able to do this with xml2js 0.5, contributions are welcome :-)

@amitgur
Copy link

amitgur commented Jan 7, 2014

Also need that order despertly. anyone has an idea how to do it ?

@kaero
Copy link

kaero commented Jan 7, 2014

@amitgur try the xamel module. It's easy to use, preserves a nodes order and can neatly serialize a node-set back to an XML string (since version 0.2.0-pre).

@amitgur
Copy link

amitgur commented Jan 7, 2014

thanks, i'll try it

@mgan59
Copy link

mgan59 commented Jan 29, 2014

Just ran into this issue trying to parse svg documents and I was losing my mind. thanks for the heads up on xamel.

@stevenvachon
Copy link

Thanks as well for the xamel mention. Another worth checking out is cheerio.

@ryedin
Copy link
Contributor

ryedin commented Apr 15, 2015

Any update on this issue?

This is a very common issue in a lot of XML docs. I would think the solution here could be as simple as changing the object model to add a _children_ array to all nodes, which populates in order of child node occurrence... and yes, probably as an optional behavior.

@Leonidas-from-XIV
Copy link
Owner

Currently no work is being done, but if you feel like submitting a PR that changes the current behaviour to something more sane (a.k.a preserving the order), I'd definitely be interested.

@ryedin
Copy link
Contributor

ryedin commented Apr 16, 2015

OK, I'll make a fork and take a look at the code to see if I can come up with something. No promises of course :)

@ryedin
Copy link
Contributor

ryedin commented Apr 16, 2015

OK, so, the above PR provides us with what I view is a viable workaround for this issue. In this configuration the parser behaves like normal (still collects nodes in named properties for convenience) with the "children" field now becoming and ordered array. The nodes all keep the #name field now so the children array can be iterated and node names checked if needed (very common case).

Note: both explicitChildren and preserveChildrenOrder must be true, and it does modify the behavior of explicitChildren in a way that will break code that relied on it previously ($$ is no longer a single object but the actual array of children) -- this breakage is why I made the preserveChildrenOrder an explicit option to override the behavior of explicitChildren instead of simply changing how explicitChildren works.

@apendua
Copy link

apendua commented May 26, 2015

@ryedin @Leonidas-from-XIV Is it already published in the latest release, i.e. 0.4.8? To me it still looks like the $$ property is an object rather than an array even though I am using both explicitChildren and 'explicitChildren' modifiers. Maybe I am doing something wrong.

@Leonidas-from-XIV
Copy link
Owner

@apendua As the README says, these options are only available starting with 0.4.9 (when it will be released) and git master.

@ryedin
Copy link
Contributor

ryedin commented May 26, 2015

@apendua if you need the feature right away, you can change your package.json to point to Leonidas-from-XIV/node-xml2js which will tell npm to pull directly from the github repo instead of the last published npm package.

@apendua
Copy link

apendua commented May 26, 2015

@ryedin Thanks for response. I was confused because for some reason I thought that your latest PR ended up in 0.4.8 - probably because of the commit date. I can see now that in fact it's the other way around:

0.4.8...master

This said I also thought 0.4.9 was a typo error in your README.md @Leonidas-from-XIV because such a release does not even exists. But anyway, thanks for explanation. A lot of confusion but everything seems to be clear now.

@Leonidas-from-XIV
Copy link
Owner

This said I also thought 0.4.9 was a typo error in your README.md @Leonidas-from-XIV because such a release does not even exists.

The master version will be released as 0.4.9, so the README.md is correct. If it references a version that is in the future, it means that it will be part of a future release.

I released 0.4.8 before merging @ryedin's patch, to make sure there is a recent release in case things will break, so there is a version to quickly roll back to. I suppose since there have't been many complaints, I'll release 0.4.9 soon.

@boopathi
Copy link

While rebuilding this ordered js to xml, the builder doesn't rebuild it in the ordered form. The order gets messed up again while building it back. Is there any workaround for that ?

@drasive
Copy link

drasive commented Dec 5, 2019

This is still an issue. "At the moment, a one to one bi-directional conversion is guaranteed only for default configuration" (from the README) is therefore not correct.

@Leonidas-from-XIV
Copy link
Owner

Yes, I should remove the note about that for now until I have time to rewrite it.

@bluenote10
Copy link

It is also weird that it is still the default behavior to mess up the order. That is not what one expects an XML parser does, because in XML order matters. A bit of an annoying gotcha to default to something that doesn't make sense with XML.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests