Improve support for URI parts in namespaces #78

Open
trygvis opened this Issue · 8 comments

Trygve Laugstøl

I've been trying to use anti-xml to parse and generate some XML documents that use some elements from the atom namespace 1. My code uses our own private namespace, but in the documents it is bound to the default namespace. An example XML looks like this:

<profile xmlns="http://.." xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link id=".." href=".."/>
</profile>

I've run into two issues with this.

The first is that the conversions drop the namespace entirely (see 2). I've fixed this so that it's more in line with the existing API.

The second issue I see now is that the entire API is oriented around the "prefix" parts instead of the namespace part. When I'm converting the XML to my Profile object I don't care about the prefix, I just want the elements named "link" inside the Atom namespace. Right now I'll have to do this by hand.

I would like to adjust the API so that it's more in line with javax.xml.namespace.QName and the W3C's definition, see 3 and 4. The "prefix" is not part of the qualified name of an element, and it is not very interesting when it comes to matching on objects.

I'm hoping to be able to write something like this:

def atomLinks(e: Elem) = (e / (Namespaces.atom -> "link"))
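
To make the request concrete, here is a minimal sketch of matching on namespace plus local name while ignoring the prefix. It does not use the anti-xml API: `QName`, `Elem`, the `/` selector, and the `Namespaces` object are hypothetical stand-ins, and the profile namespace URI is a placeholder.

```scala
object QNameSketch {
  // Hypothetical model: the qualified name is (namespace URI, local name).
  final case class QName(namespace: Option[String], localName: String)

  // The prefix is kept only as a serialization hint; it plays no part in matching.
  final case class Elem(prefix: Option[String], name: QName, children: List[Elem] = Nil) {
    // Select direct children by (namespace, local name), whatever their prefix is.
    def /(selector: (String, String)): List[Elem] = {
      val wanted = QName(Some(selector._1), selector._2)
      children.filter(_.name == wanted)
    }
  }

  object Namespaces {
    val atom = "http://www.w3.org/2005/Atom"
  }

  def atomLinks(e: Elem): List[Elem] = e / (Namespaces.atom -> "link")

  // Placeholder namespace for the private vocabulary (an assumption).
  val profileNs = "http://example.invalid/profile"

  val profile = Elem(None, QName(Some(profileNs), "profile"), List(
    Elem(Some("atom"), QName(Some(Namespaces.atom), "link")),
    Elem(Some("a"), QName(Some(Namespaces.atom), "link")), // same namespace, other prefix
    Elem(None, QName(Some(profileNs), "link"))             // same local name, other namespace
  ))
}
```

Here `QNameSketch.atomLinks(QNameSketch.profile)` selects the first two children only, because both live in the Atom namespace even though their prefixes differ.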

Does this make sense? I really like anti-xml and we're using it for most of our XML stuff now but this came up as an issue the other day and I don't see a way to fix it without changing anti-xml.

I've implemented my ideas in my own repository, 5. I'm not entirely satisfied with the current solution, but it shows what I want to achieve. All the existing tests pass and I've added some more too.

Erlend Hamnaberg

+1.
This is a very annoying problem when using lots of namespaces.
I am building an atom library for Scala using anti-xml.

Daniel Spiewak
Owner

So, this is one of those areas where we're consciously compromising in order to get a more usable functional tree. I believe @jespersm raised these same points. Theoretically speaking, this is a symptom of XML's scoping semantics. In XML, data flows down the tree (from the root to the leaves). However, a functional tree is built bottom up (from the leaves to the root). This creates a fairly annoying impedance mismatch. Consider the following fragment:

val foo = Elem(Some("ns"), "foo", Attributes(), Map(), Group())

This would map to an XML node <ns:foo/> where ns is an unbound prefix, and thus corresponds to no namespace. In XML, even unbound prefixes are significant and we need to preserve them. Unfortunately (and here is where the bottom-up impedance comes into play), it's not difficult to exploit this to generate absurdities:

val bar = Elem(None, "bar", Attributes(), Map("ns" -> "http://www.google.com"), Group(foo))

We have explicitly put foo into scope of a parent element which has bound ns to a specific URI. However, foo's scope doesn't reflect this because foo's scope was built before bar's! There is no way to serialize this back into XML without losing some information (from the functional tree).

Even worse, we can create trees that are self-contradicting:

val foo = Elem(Some("ns"), "foo", Attributes(), Map("ns" -> "http://www.google.com"), Group())
val bar = Elem(None, "bar", Attributes(), Map("ns" -> "http://www.yahoo.com"), Group(foo))

Now what? And just to illustrate that this problem is fundamental to the bottom-up nature of functional trees, we can create a pair of trees (bar and baz) that use the exact same element in different ways:

val foo = Elem(Some("ns"), "foo", Attributes(), Map("ns" -> "http://www.google.com"), Group())
val bar = Elem(None, "bar", Attributes(), Map("ns" -> "http://www.yahoo.com"), Group(foo))
val baz = Elem(None, "baz", Attributes(), Map("ns" -> "http://www.bing.com"), Group(foo))

There's just no way we can solve this problem in general with functional, bottom-up trees. Making the namespace primary only serves to make it harder for users to manually untangle these cases. By focusing on the prefix, we're forcing the user to maintain context and scoping information in their top-down traversal of the tree if they need this information. It's a bit painful for applications like Atom, where you have multiple namespaces, but at present, I'm not sure I see a good alternative.
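
The top-down bookkeeping described here can be sketched as follows. This is not anti-xml code: the `Elem` case class is a simplified, hypothetical stand-in for a prefix-oriented element, and the default (unprefixed) namespace is ignored to keep the example short.

```scala
object ScopeSketch {
  // Hypothetical stand-in: an optional prefix, a local name, the
  // prefix -> URI bindings declared on this element, and its children.
  final case class Elem(prefix: Option[String], name: String,
                        scope: Map[String, String], children: List[Elem] = Nil)

  // Walk top-down, threading the inherited bindings; inner declarations
  // override outer ones. Returns (local name, resolved namespace) for
  // every element in document order. An unbound prefix resolves to None.
  def resolve(e: Elem, inherited: Map[String, String] = Map.empty): List[(String, Option[String])] = {
    val inScope = inherited ++ e.scope
    val ns = e.prefix.flatMap(inScope.get)
    (e.name -> ns) :: e.children.flatMap(resolve(_, inScope))
  }

  // The same foo value is shared by two parents that bind "ns" differently.
  val foo = Elem(Some("ns"), "foo", Map.empty)
  val bar = Elem(None, "bar", Map("ns" -> "http://www.yahoo.com"), List(foo))
  val baz = Elem(None, "baz", Map("ns" -> "http://www.bing.com"), List(foo))
}
```

With these definitions, `foo` resolves to no namespace on its own, to the Yahoo URI under `bar`, and to the Bing URI under `baz`. The meaning of the shared node depends entirely on the path taken to reach it, which is why the caller has to carry the scope during traversal.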

Anyway, I'll spend some quality time with your code and see how it addresses these issues. I'm always open to being wrong!

Jesper Steen Møller

Well, there is a way which keeps the nice bottom up semantics, which is the one I suggested in the original patch for the namespace handling. The idea is to enforce namespace mappings at every element and then optimize when generating XML.

It doesn't really allow for upper-level nodes to change the meaning of the children's namespace mappings, that must be done by rewriting the tree.

Daniel Spiewak
Owner

> It doesn't really allow for upper-level nodes to change the meaning of the children's namespace mappings, that must be done by rewriting the tree.

Right, so one way or another, we're compromising on some aspect of the functionality. Without inverting the parenting of the tree, I don't see a way to avoid this.

Jesper Steen Møller

> > It doesn't really allow for upper-level nodes to change the meaning of the children's namespace mappings, that must be done by rewriting the tree.
>
> Right, so one way or another, we're compromising on some aspect of the functionality. Without inverting the parenting of the tree, I don't see a way to avoid this.

Agreed, but how often would you want to change the namespaces like that? I never do, since it's very rare that whole structures of local names in one namespace are directly transferable to another, except for some tricky versioning transformation cases.
It'd be like having the surname at the top of the (paper) phonebook, just so you can change it from Jones to Smith. Sure, it's cheap, but it makes little sense :-)

Trygve Laugstøl

I fail to see the issue here. As far as I can read from the XML Namespaces spec, an unbound prefix is not allowed. See 1, "Namespace constraint: Prefix Declared".

This

val foo = Elem(Some("ns"), "foo", Attributes(), Map("ns" -> "http://www.google.com"), Group())
val bar = Elem(None, "bar", Attributes(), Map("ns" -> "http://www.yahoo.com"), Group(foo))

is the same as

<bar xmlns:ns="http://www.yahoo.com">
  <ns:foo xmlns:ns="http://www.google.com"/>
</bar>

which is fine. bar itself wouldn't have a namespace, but at the same time it declares a binding for the prefix ns. foo is in the http://www.google.com namespace, which it declares itself.

Any namespace can be declared at any level in the tree and they apply from that point and down. Any child elements can override whatever they want. From what I can tell this meshes just fine with how I would expect XML generation to happen.

Ideally the prefix could be dropped from the model entirely, as it's only the namespace that's really relevant, and anti-xml could just generate prefixes when serializing the data (like most SOAP implementations do). What I would like to see is something similar to this:

def foo2entry(foo: Foo) = Elem(Atom.namespace, "entry", Attributes(),
  Group(foo.bars.map(bar2link)) ++ Group(..))

def bar2link(bar: Bar) = Elem(Atom.namespace, "bar",
  Attributes("href" -> "http://..", "rel" -> ".."), Group())

where Atom.namespace is a NSRepr wrapping Atom's namespace.

Ideally I would like something like this:

object Atom {
  val namespace = NSRepr("http://www.w3.org/2005/Atom")
  val entry = namespace.elem("entry")
  val link = namespace.elem("link")
  val href = namespace.attr("href")
  val rel = namespace.attr("rel")
}

import Atom._

def foo2entry(foo: Foo) = entry(Attributes(), NBS.empty,
  Group(foo.bars.map(bar2link)) ++ Group(..))

def bar2link(bar: Bar) = link(Attributes(href("http://.."), rel("..")), NBS.empty, Group())

but that's another issue :)
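
One possible shape for such an `NSRepr` is sketched below. Everything in it is an assumption made for illustration: the `Attr` and `Elem` case classes are simplified stand-ins rather than anti-xml types, plain `List`s replace `Attributes` and `Group`, the `NBS` argument is left out, and the URL is a placeholder.

```scala
object NSReprSketch {
  // Simplified stand-ins for attribute and element types (hypothetical).
  final case class Attr(namespace: Option[String], name: String, value: String)
  final case class Elem(namespace: String, name: String, attrs: List[Attr], children: List[Elem])

  // Hypothetical NSRepr: wraps a namespace URI and hands out builders
  // for elements and attributes that live in that namespace.
  final case class NSRepr(uri: String) {
    def elem(name: String): (List[Attr], List[Elem]) => Elem =
      (attrs, children) => Elem(uri, name, attrs, children)

    def attr(name: String): String => Attr =
      value => Attr(Some(uri), name, value)
  }

  object Atom {
    val namespace = NSRepr("http://www.w3.org/2005/Atom")
    val entry = namespace.elem("entry")
    val link = namespace.elem("link")
    val href = namespace.attr("href")
    val rel = namespace.attr("rel")
  }

  import Atom._

  // Build an entry with one link; no prefix appears anywhere in the model.
  val example: Elem =
    entry(Nil, List(link(List(href("http://example.invalid/a"), rel("self")), Nil)))
}
```

The builders are ordinary functions, so the namespace is fixed once in `Atom` and call sites only supply attributes and children. Note that this follows the sketch above in putting `href` and `rel` in the Atom namespace; in real Atom documents those attributes are unqualified, so a fuller design would probably also want a builder for attributes with no namespace.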

Jesper Steen Møller

What if:

val foo = Elem("http://www.google.com", "foo", Attributes(), None, Group())
val bar = Elem("http://www.yahoo.com", "bar", Attributes(), None, Group(foo, foo))

would give you

<bar xmlns="http://www.yahoo.com">
  <foo xmlns="http://www.google.com"/>
  <foo xmlns="http://www.google.com"/>
</bar>

But then it also had an explicit "preferred" binding of prefixes to namespaces, like this:

val foo = Elem("http://www.google.com", "foo", Attributes(), None, None, Group())
val baz = Elem("http://www.bing.com", "baz", Attributes(), None, None, Group())
val bar = Elem("http://www.yahoo.com", "bar", Attributes(),
    Some(Map("go" -> "http://www.google.com")), Group(foo, baz, foo))

would give you explicit control over the namespaces you were interested in, and handle the rest automatically:

<bar xmlns="http://www.yahoo.com" xmlns:go="http://www.google.com">
  <go:foo/>
  <baz xmlns="http://www.bing.com"/>
  <go:foo/>
</bar>

Finally, there could be an optimizing traversal which would pick up the namespaces and bubble them as far up as needed (it only needs to replace nodes near the top in most cases).

I think this would strike a fair balance between bottom-up functional-ness and still provide the control needed for special applications like XSD, XSLT and other languages which use XPath and similar expressions to reference qualified elements.
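
A rough sketch of the serialization rule described above: use a preferred prefix when one is in scope, and otherwise fall back to redeclaring the default namespace. The `Elem` type and `serialize` function are hypothetical and not part of anti-xml. Attributes, escaping, indentation, and prefix shadowing are all left out.

```scala
object PreferredPrefixSketch {
  // Hypothetical model: each element carries its namespace URI plus an
  // optional set of preferred prefix -> URI bindings.
  final case class Elem(namespace: String, name: String,
                        preferred: Map[String, String] = Map.empty,
                        children: List[Elem] = Nil)

  private def attr(name: String, value: String): String =
    " " + name + "=\"" + value + "\""

  // prefixes: URI -> prefix bindings in scope; default: current default namespace.
  def serialize(e: Elem,
                prefixes: Map[String, String] = Map.empty,
                default: Option[String] = None): String = {
    val inScope = prefixes ++ e.preferred.map { case (prefix, uri) => uri -> prefix }
    val prefixDecls = e.preferred.toList.sorted
      .map { case (prefix, uri) => attr("xmlns:" + prefix, uri) }.mkString

    val (tag, defaultDecl, newDefault) = inScope.get(e.namespace) match {
      case Some(prefix) => (prefix + ":" + e.name, "", default)            // preferred prefix wins
      case None if default.contains(e.namespace) => (e.name, "", default)  // already the default
      case None => (e.name, attr("xmlns", e.namespace), Some(e.namespace)) // redeclare default
    }

    val open = "<" + tag + defaultDecl + prefixDecls
    if (e.children.isEmpty) open + "/>"
    else open + ">" + e.children.map(serialize(_, inScope, newDefault)).mkString + "</" + tag + ">"
  }

  val foo = Elem("http://www.google.com", "foo")
  val baz = Elem("http://www.bing.com", "baz")
  val bar = Elem("http://www.yahoo.com", "bar",
    Map("go" -> "http://www.google.com"), List(foo, baz, foo))
}
```

Serializing `bar` this way yields the same declarations as the XML above, only without the line breaks: `foo` picks up the `go` prefix from its parent, while `baz` gets its own default-namespace declaration.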

Trygve Laugstøl (trygvis) referenced this issue from a commit in trygvis/anti-xml (4a7e808):
Getting closer to a solution for #78.
o Storing all previously declared prefixed namespaces and unprefixed namespaces in two different stacks.

Trygve Laugstøl

@djspiewak: I've pushed some code that's getting quite close to what I want. I haven't implemented the NSRepr stuff, as I couldn't figure out how it should be implemented. It would be nice to have some more explanation of exactly how you'd like that part to work.

@jespersm: It's easy to walk the tree, find all namespaces and put them in the namespace list in the root node. It's CPU-intensive, but it'll only replace the root object. It should be easy to implement too. Getting exactly your result is a bit harder.
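
The "walk the tree and put everything in the root" idea could look roughly like this. It is a sketch over a hypothetical, simplified `Elem` type, not the code from the commit; the generated prefix names (`ns1`, `ns2`, ...) are an arbitrary choice, and collisions with prefixes already bound on the root are not handled.

```scala
object HoistSketch {
  // Hypothetical model: namespace URI, local name, prefix -> URI bindings, children.
  final case class Elem(namespace: String, name: String,
                        bindings: Map[String, String] = Map.empty,
                        children: List[Elem] = Nil)

  // All namespace URIs used in the subtree, in document order, without duplicates.
  def namespaces(e: Elem): List[String] =
    (e.namespace :: e.children.flatMap(namespaces)).distinct

  // Bind every namespace that the root does not already cover to a generated
  // prefix on the root. The whole tree is read, but only the root is copied.
  def hoist(root: Elem): Elem = {
    val alreadyBound = root.bindings.values.toSet
    val generated = namespaces(root)
      .filterNot(_ == root.namespace)
      .filterNot(alreadyBound.contains)
      .zipWithIndex
      .map { case (uri, i) => ("ns" + (i + 1)) -> uri }
    root.copy(bindings = root.bindings ++ generated)
  }

  val foo = Elem("http://www.google.com", "foo")
  val baz = Elem("http://www.bing.com", "baz")
  val bar = Elem("http://www.yahoo.com", "bar", children = List(foo, baz, foo))
}
```

Because `hoist` only calls `copy` on the root, the children of the result are the same objects as before, which matches the point that only the root object is replaced.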

The commit: trygvis@4a7e808.
