Order of declaring namespace for attributes and using said namespace should not matter. #22

Open
jbrayfaithlife opened this issue Apr 18, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@jbrayfaithlife
Contributor

Bug Report

According to the [Namespaces in XML 1.1 spec](https://www.w3.org/TR/2006/REC-xml-names11-20060816/#sec-namespaces):

The namespace prefix, unless it is xml or xmlns, must have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e. an element in whose content the prefixed markup occurs). Furthermore, the attribute value in the innermost such declaration must not be an empty string.

Though the second is admittedly harder to read, both of these declarations should be valid uses of the prefix:
<div xmlns:epub="http://www.idpf.org/2007/ops" epub:type="footnote">Test</div>
<div epub:type="footnote" xmlns:epub="http://www.idpf.org/2007/ops">Test</div>

Unfortunately, the parser processes attributes in the order they are written and resolves each prefix as it goes. The first example therefore resolves to the expected namespace URI, but in the second the prefix is used before its declaration has been seen, so it does not resolve.
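To make the failure mode concrete, here is a minimal, self-contained model of single-pass, in-order prefix resolution. It is not AngleSharp.Xml code: `ResolveInOrder` and the tuple representation of attributes are hypothetical names used only for illustration, and the model ignores declarations inherited from ancestor elements.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical model, NOT the AngleSharp.Xml implementation.
// Resolves each attribute's prefix in one pass, in document order.
static List<(string Name, string? NamespaceUri)> ResolveInOrder(
    IEnumerable<(string Name, string Value)> attributes)
{
    var prefixes = new Dictionary<string, string>();
    var resolved = new List<(string Name, string? NamespaceUri)>();

    foreach (var (name, value) in attributes)
    {
        if (name.StartsWith("xmlns:", StringComparison.Ordinal))
        {
            // Namespace declaration: the prefix only becomes known here.
            prefixes[name.Substring("xmlns:".Length)] = value;
            resolved.Add((name, "http://www.w3.org/2000/xmlns/"));
            continue;
        }

        string? uri = null;
        var colon = name.IndexOf(':');
        if (colon > 0)
        {
            // Lookup fails if the declaration appears later on the element.
            prefixes.TryGetValue(name.Substring(0, colon), out uri);
        }
        resolved.Add((name, uri));
    }

    return resolved;
}

var declaredFirst = ResolveInOrder(new[]
{
    ("xmlns:epub", "http://www.idpf.org/2007/ops"),
    ("epub:type", "footnote"),
});
var declaredLast = ResolveInOrder(new[]
{
    ("epub:type", "footnote"),
    ("xmlns:epub", "http://www.idpf.org/2007/ops"),
});

Console.WriteLine(declaredFirst[1].NamespaceUri ?? "null"); // prints http://www.idpf.org/2007/ops
Console.WriteLine(declaredLast[0].NamespaceUri ?? "null");  // prints null
```

The second call shows the same symptom as the report: the prefixed attribute is resolved before its declaration has been registered.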

Prerequisites

  • [/] Can you reproduce the problem in a MWE?
  • [/] Are you running the latest version of AngleSharp?
  • [/] Did you check the FAQs to see if that helps you?
  • [/] Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • [/] Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

Namespace declarations need to be parsed before other attributes on an element.

Steps to Reproduce

// Dump() is the LINQPad output helper; First() requires System.Linq.
// Case 1: the namespace is declared before the prefixed attribute.
var document = new XmlParser().ParseDocument(@"<xml xmlns:epub=""http://www.idpf.org/2007/ops"" epub:type=""noteref"">1</xml>");
var root = document.DocumentElement;
root.Attributes.First(att => att.LocalName == "type").NamespaceUri.Dump();

// Case 2: the namespace is declared after the prefixed attribute.
document = new XmlParser().ParseDocument(@"<xml epub:type=""noteref"" xmlns:epub=""http://www.idpf.org/2007/ops"" >1</xml>");
root = document.DocumentElement;
root.Attributes.First(att => att.LocalName == "type").NamespaceUri.Dump();

Expected behavior: both Dump() calls should print http://www.idpf.org/2007/ops.

Actual behavior: the first call to Dump() prints the correct URI; the second prints null.

Environment details: Windows 10, .NET 6.0.15

Possible Solution

There are two approaches that could be taken, both centered on this loop, which creates the attributes for an element:

for (var i = 0; i < tagToken.Attributes.Count; i++)
{
    var attr = tagToken.Attributes[i];
    var item = CreateAttribute(attr.Key, attr.Value.Trim());
    element.AddAttribute(item);
}

First, we could make sure to process any namespace declarations before any other attributes, which seems like the simplest approach. I have a PR to this effect that I will put up for your review.

Second, we could make a second pass over the created attributes and re-resolve their namespaces once all the attributes have been processed.
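As a rough illustration of the first approach, the sketch below registers every namespace declaration on the element before resolving any prefixed attribute. It is a standalone model under stated assumptions, not the AngleSharp.Xml code or the PR mentioned above: `ResolveDeclarationsFirst` and the tuple representation are hypothetical, and bindings inherited from ancestor elements are left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sketch, NOT the AngleSharp.Xml implementation or the actual PR.
static List<(string Name, string? NamespaceUri)> ResolveDeclarationsFirst(
    IReadOnlyList<(string Name, string Value)> attributes)
{
    const string XmlnsPrefix = "xmlns:";
    var prefixes = new Dictionary<string, string>();

    // Pass 1: register every prefix binding declared on this element.
    foreach (var (name, value) in attributes)
    {
        if (name.StartsWith(XmlnsPrefix, StringComparison.Ordinal))
        {
            prefixes[name.Substring(XmlnsPrefix.Length)] = value;
        }
    }

    // Pass 2: resolve all attributes, keeping their original order.
    var resolved = new List<(string Name, string? NamespaceUri)>();
    foreach (var (name, _) in attributes)
    {
        string? uri = null;
        if (name.StartsWith(XmlnsPrefix, StringComparison.Ordinal))
        {
            uri = "http://www.w3.org/2000/xmlns/";
        }
        else
        {
            var colon = name.IndexOf(':');
            if (colon > 0)
            {
                prefixes.TryGetValue(name.Substring(0, colon), out uri);
            }
        }
        resolved.Add((name, uri));
    }

    return resolved;
}

var result = ResolveDeclarationsFirst(new[]
{
    ("epub:type", "footnote"),
    ("xmlns:epub", "http://www.idpf.org/2007/ops"),
});

Console.WriteLine(result[0].NamespaceUri ?? "null"); // prints http://www.idpf.org/2007/ops
```

Collecting the declarations in a separate pass, rather than sorting them to the front, keeps the attributes in the order they were written, which may matter to callers that serialize the element again.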

@jbrayfaithlife jbrayfaithlife added the bug Something isn't working label Apr 18, 2023
FlorianRappl added a commit to jbrayfaithlife/AngleSharp.Xml that referenced this issue Jan 18, 2024