Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Xml.Resolvers.XmlPreloadedResolver does not support XmlKnownDtds and fails with surprising error #29280

Closed
idg10 opened this issue Apr 16, 2019 · 5 comments
Milestone

Comments

@idg10
Copy link
Contributor

idg10 commented Apr 16, 2019

In desktop .NET, the XmlPreloadedResolver offers some baked in DTDs, as listed in the XmlKnownDtds enumeration. If you construct a new XmlPreloadedResolver(XmlKnownDtds.Xhtml10), it will be able to resolve DTDs with common ids such as ""-//W3C//DTD XHTML 1.0 Strict//EN";". (In fact, this will make a family of related DTDs available. It makes it possible to parse XHTML documents. Without these DTDs, certain standard entities that are valid in XHTML but which are not standard XML entities, will cause failures during XML parsing.)

However, in .NET core, these DTDs are not compiled in. See commit dotnet/corefx@9a4b239 for where the relevant <EmbeddedResource> was commented out.

It seems that this wasn't a deliberate decision. The comment in the .csproj where the code embedding these DTDs was commented out indicates that the committer didn't know why they were being compiled in in the first place.

It may well be a reasonable choice not to embed these DTDs in .NET Core, because the space they take up might not be justified. (Maybe I'm the first person ever to try to read an XHTML document that contains standard XHTML entities that are not standard XML entites using XmlReader in .NET Core.) But if that is the case, the XmlPreloadedResolver should probably throw an exception (e.g., NotSupportedException) when you try to construct it in a way that asks it to make these well-known DTDs available.

As it is, it lets you construct it just fine, and then fails at the point where something first tries to resolve the relevant DTD. This makes it hard to work out what has gone on, because it's not at all obvious that the XmlPreloadedResolver simply doesn't support this particular use case in the same way that it always has done in desktop .NET. If it's not going to support the usage model you specify at construction, then it would be better for it to let you know that at construction.

@krwq
Copy link
Member

krwq commented Apr 16, 2019

@idg10 thanks for reporting this. Do you think you can send a test case and corresponding fix (presumably uncommenting the stuff I commented out)? If those files are missing in here and were never in the history of git I can dig them up and share.

@idg10
Copy link
Contributor Author

idg10 commented Apr 17, 2019

So just to confirm, you consider the correct fix here to be to bring .NET Core's behaviour into line with the long-standing behaviour in the desktop .NET Framework?

(The alternative was to have the XmlPreloadedResolver constructor throw an exception if any XmlKnownDtds other than None is passed.)

@krwq
Copy link
Member

krwq commented Apr 17, 2019

@idg10 we need to check 2 things first:

  • test which confirms we worked correctly on full framework and now we don't
  • check the resources size we are planning to add

assuming resources are not unreasonably large I think the correct fix is to add this behavior back - if they are large we can bring back only most common DTDs and throw in other cases to not increase overall framework size too much.

idg10 referenced this issue in idg10/corefx Apr 17, 2019
The XmlPreloadedResolver class purports to provide certain well-known DTDs - its constructor accepts a value from the XmlKnownDtds enumeration, and if you specify either of the supported DTD sets (sXHTML 1.0, and RSS 0.91) it is meant to make the relevant DTDs available.

In desktop .NET, this works as advertised. In .NET Core, it does not, for the reasons described in https://github.com/dotnet/corefx/issues/36929

This was missed because there are no tests to verify that the XmlPreloadedResolver makes the relevant DTDs available when asked to. (In fact, the tests that existed seemed to be based on a misunderstanding of how this class works.) This adds tests that verify that the known DTDs are provided when requested. And it reinstates the relevant EmbeddedResource entries required to enable the functionality. (The code for this has been in .NET Core's XmlPreloadedResolver all along. It was only the absence of the necessary embedded resources preventing it from working.)
@idg10
Copy link
Contributor Author

idg10 commented Apr 17, 2019

OK, as you can see I just created a commit for this in my fork of the repo. I've not created a PR yet because I don't know if we're ready for that.

This commit does the following:

  • Adds tests to verify that the well-known DTDs are supplied when requested
  • Modifies existing tests that didn't quite seem to understand how XmlPreloadedResolver was meant to be used (they were supplying their own content for the well-known URIs, which isn't the intended use model - either you use the well-known URIs, or you supply your own content for non-well-known URIs; this changes the tests to reflect this)
  • Reinstates the <EmbeddedResource> elements that had been commented out.

If I run these tests against the desktop FX, they pass. If I run them in situ in the corefx repo without the modification to System.Private.Xml.csproj they fail. If I run them against the modified System.Private.Xml.csproj they succeed.

@krwq
Copy link
Member

krwq commented Apr 17, 2019

@idg10 tests look good to me - I've just checked the sizes of the files and they seem reasonable (~ 65K total). I think you can go ahead with the PR

krwq referenced this issue in dotnet/corefx Apr 19, 2019
* Support XmlKnownDtds in XmlPreloadedResolver

The XmlPreloadedResolver class purports to provide certain well-known DTDs - its constructor accepts a value from the XmlKnownDtds enumeration, and if you specify either of the supported DTD sets (sXHTML 1.0, and RSS 0.91) it is meant to make the relevant DTDs available.

In desktop .NET, this works as advertised. In .NET Core, it does not, for the reasons described in https://github.com/dotnet/corefx/issues/36929

This was missed because there are no tests to verify that the XmlPreloadedResolver makes the relevant DTDs available when asked to. (In fact, the tests that existed seemed to be based on a misunderstanding of how this class works.) This adds tests that verify that the known DTDs are provided when requested. And it reinstates the relevant EmbeddedResource entries required to enable the functionality. (The code for this has been in .NET Core's XmlPreloadedResolver all along. It was only the absence of the necessary embedded resources preventing it from working.)

* Completed refactoring that was half-complete

In the previous commit I introduced the NormalizeContent method to avoid duplication of various calls to string.Replace but I left in one such call. This replaces it with a call to the new NormalizeContent, as intended.

* Replace \ with / in test paths

Tests failed on Linux with this:

Could not find a part of the path '/root/helix/work/workitem/Utils\\DTDs\\/XHTML10\

I'm hoping that replacing the backslashes with forward slashes will get the tests passing.

* Replaced fixed / and \ with Path.Combine

Although using / everywhere did seem to work, the preferred way to generate paths is Path.Combine.

Note that this still uses string concatenation for the system ID URI because that's not a path, it's just an identifier.
@krwq krwq closed this as completed Apr 24, 2019
@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 3.0 milestone Feb 1, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants