
XPath not working as expected with findall #155

Closed
anandijain opened this issue Mar 5, 2021 · 4 comments

Comments

@anandijain

I cannot seem to make findall work as intended. According to https://www.w3schools.com/xml/xpath_syntax.asp, the XPath //ci should select the three ci elements (compartment, k1, S1), but findall returns an empty Node vector.

<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply>
    <times/>
    <ci> compartment 
    </ci>
    <ci> k1 
    </ci>
    <ci> S1 
    </ci>
  </apply>
</math>
julia> findall("//ci", xml)
EzXML.Node[]
@kescobo
Member

kescobo commented Mar 9, 2021

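Your document has a default namespace (xmlns="http://www.w3.org/1998/Math/MathML" on the math element), so the unprefixed XPath //ci does not match its elements. The EzXML documentation covers this case.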

In particular:

There is a caveat on the combination of XPath and namespaces: if a document contains elements with a default namespace, you need to specify its prefix to the find* function. For example, in the following example, the root element and its descendants have a default namespace "http://www.foobar.org", but it does not have its own prefix. In this case, you need to assign a prefix to the namespace when finding elements in the namespace:

julia> doc = parsexml("""
       <parent xmlns="http://www.foobar.org">
           <child/>
       </parent>
       """)
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x00007fdc67710030>))

julia> findall("/parent/child", doc.root)  # nothing will be found
0-element Array{EzXML.Node,1}

julia> namespaces(doc.root)  # the default namespace has an empty prefix
1-element Array{Pair{String,String},1}:
 "" => "http://www.foobar.org"

julia> ns = namespace(doc.root)  # get the namespace
"http://www.foobar.org"

julia> findall("/x:parent/x:child", doc.root, ["x"=>ns])  # specify its prefix as "x"
1-element Array{EzXML.Node,1}:
 EzXML.Node(<ELEMENT_NODE[child]@0x00007fdc6774c990>)

So for yours:

julia> str = """
              <math xmlns="http://www.w3.org/1998/Math/MathML">
                <apply>
                  <times/>
                  <ci> compartment
                  </ci>
                  <ci> k1
                  </ci>
                  <ci> S1
                  </ci>
                </apply>
              </math>
              """
"<math xmlns=\"http://www.w3.org/1998/Math/MathML\">\n  <apply>\n    <times/>\n    <ci> compartment\n    </ci>\n    <ci> k1\n    </ci>\n    <ci> S1\n    </ci>\n  </apply>\n</math>\n"

julia> xml = parsexml(str)
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000558771efacd0>))

julia> ns = namespace(xml.root)
"http://www.w3.org/1998/Math/MathML"

julia> findall("//x:ci", xml.root, ["x"=>ns])
3-element Vector{EzXML.Node}:
 EzXML.Node(<ELEMENT_NODE[ci]@0x00005587718755e0>)
 EzXML.Node(<ELEMENT_NODE[ci]@0x000055877175d460>)
 EzXML.Node(<ELEMENT_NODE[ci]@0x0000558772b529d0>)
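If binding a prefix is inconvenient, a namespace-agnostic alternative may also work: XPath 1.0's local-name() compares only the local part of an element's name, so it matches ci regardless of which namespace it is in. This is a minimal sketch (not from the EzXML docs) reusing the MathML fragment above; the variable names cis and labels are just for illustration.

```julia
using EzXML

# The MathML fragment from the original question (default namespace on <math>).
str = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply>
    <times/>
    <ci> compartment
    </ci>
    <ci> k1
    </ci>
    <ci> S1
    </ci>
  </apply>
</math>
"""

doc = parsexml(str)

# local-name() ignores the namespace URI, so no prefix binding is needed here.
cis = findall("//*[local-name()='ci']", doc)

# nodecontent includes the surrounding whitespace, so strip it off.
labels = [strip(nodecontent(n)) for n in cis]
```

The trade-off: this also matches ci elements from any other namespace, so in documents that mix vocabularies the prefixed form above is the more precise choice.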

@anandijain
Author

Thanks!

@anandijain
Author

@kescobo I have another case that I think is better recorded here than lost to the Slackhole.

julia> str = """<cn type="e-notation" cellml:units="molar_per_minute">5   <sep/>-2</cn>"""
"<cn type=\"e-notation\" cellml:units=\"molar_per_minute\">5   <sep/>-2</cn>"

julia> parsexml(str)
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000000002880610>))

julia> parsexml(str).root
ERROR: AssertionError: isempty(XML_GLOBAL_ERROR_STACK)

I'm wondering whether I can ignore the undefined namespace (basically, remove all cellml: attributes from a Document), or what the standard workaround is.

I suppose I could do findall("//*[@cellml:*]", node) and then just delete! the attributes.
This is sort of an annoying hack, as other formats might have other namespaces.

If you have any thoughts, I'd really appreciate it!
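One possible workaround, sketched here as an untested assumption rather than a confirmed fix: the fragment fails because the cellml prefix is never declared, so wrapping it in an element that declares the prefix lets libxml2 parse it cleanly. The findall idea above then works once the prefix is bound through findall's third argument. CELLML_NS is an assumed CellML 1.1 namespace URI; substitute whatever URI the original model declares. The wrapper element name is hypothetical.

```julia
using EzXML

# ASSUMPTION: CellML 1.1 namespace URI; use the URI the source document declares.
const CELLML_NS = "http://www.cellml.org/cellml/1.1#"

fragment = """<cn type="e-notation" cellml:units="molar_per_minute">5   <sep/>-2</cn>"""

# Hypothetical wrapper element that declares the missing prefix,
# so parsing no longer reports an undefined-namespace error.
wrapped = """<wrapper xmlns:cellml="$CELLML_NS">$fragment</wrapper>"""
doc = parsexml(wrapped)

# Bind the URI to a prefix of our choosing ("c") and select every element
# that carries at least one attribute in that namespace.
tagged = findall("//*[@c:*]", doc.root, ["c" => CELLML_NS])
```

This keeps the cellml attributes rather than deleting them; removing them would be a separate step on the nodes in tagged.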

@kescobo
Member

kescobo commented Mar 11, 2021

Alas, I have no idea. The only reason I knew how to answer the previous question is because I'd run into the same issue before and someone else helped me. I'm no expert! Good luck :-/
