Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavior of XDocumentType/XmlDocumentType.InternalSubset parsing is inconsistent to manual creation #44866

Open
krwq opened this issue Nov 18, 2020 · 2 comments

Comments

@krwq
Copy link
Member

krwq commented Nov 18, 2020

As can be seen below when manually passing null to the constructor/CreateDocumentType as internal subset it prints nothing when serializing and reports null in the property but when parsing it prints extra when serialized [] (empty internal subset) and doesn't serialize to the same as input and property reports string.Empty.

See following example:

using System;
using System.Xml.Linq;

XDocumentType dtd1 = new XDocumentType("foo", null, null, null);
Console.WriteLine(dtd1); // Prints: <!DOCTYPE foo >
Console.WriteLine(dtd1.InternalSubset ?? "<null>"); // Prints: "<null>"

XDocumentType dtd2 = new XDocumentType("foo", null, null, string.Empty);
Console.WriteLine(dtd2); // Prints: <!DOCTYPE foo []>
Console.WriteLine(dtd2.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

XDocument xd1 = XDocument.Parse("<!DOCTYPE foo ><foo/>");
Console.WriteLine(xd1.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)
Console.WriteLine(xd1.DocumentType.ToString()); // Prints: <!DOCTYPE foo []> (differs from input)

XDocument xd2 = XDocument.Parse("<!DOCTYPE foo []><foo/>");
Console.WriteLine(xd2.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)
Console.WriteLine(xd2.DocumentType.ToString()); // Prints: <!DOCTYPE foo []>

and similar for XmlDocument

using System;
using System.Xml;

XmlDocument xmldoc3 = new XmlDocument();
XmlDocumentType xdocType1 = xmldoc3.CreateDocumentType("foo", null, null, null);
Console.WriteLine(xdocType1.OuterXml); // Prints: <!DOCTYPE foo>
Console.WriteLine(xdocType1.InternalSubset ?? "<null>"); // Prints: "<null>"

XmlDocumentType xdocType2 = xmldoc3.CreateDocumentType("foo", null, null, string.Empty);
Console.WriteLine(xdocType2.OuterXml); // Prints: <!DOCTYPE foo[]>
Console.WriteLine(xdocType2.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

XmlDocument xmldoc1 = new XmlDocument();
xmldoc1.LoadXml("<!DOCTYPE foo ><foo/>");
Console.WriteLine(xmldoc1.DocumentType.OuterXml); // Prints: <!DOCTYPE foo[]>
Console.WriteLine(xmldoc1.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

XmlDocument xmldoc2 = new XmlDocument();
xmldoc2.LoadXml("<!DOCTYPE foo []><foo/>");
Console.WriteLine(xmldoc2.DocumentType.OuterXml); // Prints: <!DOCTYPE foo[]>
Console.WriteLine(xmldoc2.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

For XmlDocument the docs mention that "The DTD internal subset on the DOCTYPE. If there is no DTD internal subset, String.Empty is returned.". But that does not seem to be true as seen in the first example.

Good thing they're at least consistent between each other.

Additionally recently suggested by the docs telling that XmlDocument.InternalSubset doesn't return null I changed XDocumentType to treat null and string.Empty the same (if passed null you would get same output as for the second example rather than null). I plan to revert this change as part of fix for #44489 - we can re-think the behavior in this issue.

My suggestion is to do one of the following (first and last option would fix consistency):

  • change both parsers and update doc to distinguish between null and string.Empty (third example would show null) - this would be a bit breaking (when parsing you might get null where you wouldn't expect it although rather rare scenario) but probably match with what people would expect - there is a clear distinction what null and empty means here
  • do not change anything (state before my change to XDocumentType), update the docs to reflect reality - not breaking but docs currently document something else and expectations might differ
  • make it treat null and string.Empty the same and always produce <!DOCTYPE foo[]> when serializing and return string.Empty - assuming users do not care about having extra [] in the doctype (cosmetic difference as they have no semantic meaning) this would make it slightly easier to use and at least be consistent between parsing and manually passing subset

cc: @buyaa-n @Jozkee @stephentoub

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Xml untriaged New issue has not been triaged by the area owner labels Nov 18, 2020
@ghost
Copy link

ghost commented Nov 18, 2020

Tagging subscribers to this area: @buyaa-n, @krwq, @jeffhandley
See info in area-owners.md if you want to be subscribed.

Issue Details
Description:

As can be seen below when manually passing null to the constructor/CreateDocumentType as internal subset it prints nothing when serializing and reports null in the property but when parsing it prints extra when serialized [] (empty internal subset) and doesn't serialize to the same as input and property reports string.Empty.

See following example:

using System;
using System.Xml.Linq;

XDocumentType dtd1 = new XDocumentType("foo", null, null, null);
Console.WriteLine(dtd1); // Prints: <!DOCTYPE foo >
Console.WriteLine(dtd1.InternalSubset ?? "<null>"); // Prints: "<null>"

XDocumentType dtd2 = new XDocumentType("foo", null, null, string.Empty);
Console.WriteLine(dtd2); // Prints: <!DOCTYPE foo []>
Console.WriteLine(dtd2.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

XDocument xd1 = XDocument.Parse("<!DOCTYPE foo ><foo/>");
Console.WriteLine(xd1.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)
Console.WriteLine(xd1.DocumentType.ToString()); // Prints: <!DOCTYPE foo []> (differs from input)

XDocument xd2 = XDocument.Parse("<!DOCTYPE foo []><foo/>");
Console.WriteLine(xd2.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)
Console.WriteLine(xd2.DocumentType.ToString()); // Prints: <!DOCTYPE foo []>

and similar for XmlDocument

using System;
using System.Xml;

XmlDocument xmldoc3 = new XmlDocument();
XmlDocumentType xdocType1 = xmldoc3.CreateDocumentType("foo", null, null, null);
Console.WriteLine(xdocType1.OuterXml); // Prints: <!DOCTYPE foo>
Console.WriteLine(xdocType1.InternalSubset ?? "<null>"); // Prints: "<null>"

XmlDocumentType xdocType2 = xmldoc3.CreateDocumentType("foo", null, null, string.Empty);
Console.WriteLine(xdocType2.OuterXml); // Prints: <!DOCTYPE foo[]>
Console.WriteLine(xdocType2.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

XmlDocument xmldoc1 = new XmlDocument();
xmldoc1.LoadXml("<!DOCTYPE foo ><foo/>");
Console.WriteLine(xmldoc1.DocumentType.OuterXml); // Prints: <!DOCTYPE foo[]>
Console.WriteLine(xmldoc1.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

XmlDocument xmldoc2 = new XmlDocument();
xmldoc2.LoadXml("<!DOCTYPE foo []><foo/>");
Console.WriteLine(xmldoc2.DocumentType.OuterXml); // Prints: <!DOCTYPE foo[]>
Console.WriteLine(xmldoc2.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

For XmlDocument the docs mention that "The DTD internal subset on the DOCTYPE. If there is no DTD internal subset, String.Empty is returned.". But that does not seem to be true as seen in the first example.

Good thing they're at least consistent between each other.

Additionally recently suggested by the docs telling that XmlDocument.InternalSubset doesn't return null I changed XDocumentType to treat null and string.Empty the same (if passed null you would get same output as for the second example rather than null). I plan to revert this change as part of fix for #44489 - we can re-think the behavior in this issue.

My suggestion is to do one of the following (first and last option would fix consistency):

  • change both parsers and update doc to distinguish between null and string.Empty (third example would show null) - this would be a bit breaking (when parsing you might get null where you wouldn't expect it although rather rare scenario) but probably match with what people would expect - there is a clear distinction what null and empty means here
  • do not change anything (state before my change to XDocumentType), update the docs to reflect reality - not breaking but docs currently document something else and expectations might differ
  • make it treat null and string.Empty the same and always produce <!DOCTYPE foo[]> when serializing and return string.Empty - assuming users do not care about having extra [] in the doctype (cosmetic difference as they have no semantic meaning) this would make it slightly easier to use and at least be consistent between parsing and manually passing subset

cc: @buyaa-n @Jozkee @stephentoub

Author: krwq
Assignees: -
Labels:

area-System.Xml, untriaged

Milestone: -

@krwq krwq removed the untriaged New issue has not been triaged by the area owner label Nov 18, 2020
@krwq krwq added this to the 6.0.0 milestone Nov 18, 2020
@krwq
Copy link
Member Author

krwq commented Nov 18, 2020

We should fix this in 6.0 timeline. If we do not plan to make product change we should update the docs to reflect reality.

@krwq krwq modified the milestones: 6.0.0, Future Jul 23, 2021
@krwq krwq moved this from 6.0 to Future in Triage POD for Meta, Reflection, etc Jul 23, 2021
@ghost ghost moved this from Future to Needs triage in Triage POD for Meta, Reflection, etc Jul 23, 2021
@ghost ghost added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jul 23, 2021
@krwq krwq removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jul 23, 2021
@buyaa-n buyaa-n moved this from Needs triage to Future in Triage POD for Meta, Reflection, etc Jul 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

2 participants