System.Xml.Resolvers.XmlPreloadedResolver does not support XmlKnownDtds and fails with surprising error #29280
Comments
@idg10 thanks for reporting this. Do you think you can send a test case and corresponding fix (presumably uncommenting the stuff I commented out)? If those files are missing here and were never in the git history, I can dig them up and share them.
So just to confirm, you consider the correct fix here to be to bring .NET Core's behaviour into line with the long-standing behaviour in the desktop .NET Framework? (The alternative was to have the |
@idg10 we need to check 2 things first:
Assuming the resources are not unreasonably large, I think the correct fix is to add this behavior back. If they are large, we can bring back only the most common DTDs and throw in other cases, so as not to increase the overall framework size too much.
The XmlPreloadedResolver class purports to provide certain well-known DTDs: its constructor accepts a value from the XmlKnownDtds enumeration, and if you specify either of the supported DTD sets (XHTML 1.0 and RSS 0.91), it is meant to make the relevant DTDs available. In desktop .NET, this works as advertised. In .NET Core, it does not, for the reasons described in https://github.com/dotnet/corefx/issues/36929. This was missed because there are no tests to verify that the XmlPreloadedResolver makes the relevant DTDs available when asked to. (In fact, the tests that existed seemed to be based on a misunderstanding of how this class works.) This adds tests that verify that the known DTDs are provided when requested, and it reinstates the EmbeddedResource entries required to enable the functionality. (The code for this has been in .NET Core's XmlPreloadedResolver all along. Only the absence of the necessary embedded resources prevented it from working.)
OK, as you can see I just created a commit for this in my fork of the repo. I've not created a PR yet because I don't know if we're ready for that. This commit does the following:
If I run these tests against the desktop FX, they pass. If I run them in situ in the |
@idg10 tests look good to me - I've just checked the sizes of the files and they seem reasonable (~65K total). I think you can go ahead with the PR.
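As a rough illustration of the kind of check discussed above (not the actual tests from the commit), the sketch below asks a resolver constructed with XmlKnownDtds.Xhtml10 for the DTD behind a well-known public ID and confirms that some content comes back. The class name KnownDtdCheck and the method GetPreloadedLength are hypothetical names introduced only for this example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

// Hypothetical helper, not part of the corefx test suite.
public static class KnownDtdCheck
{
    // Returns the number of bytes the resolver supplies for the given public ID.
    // Throws if the resolver cannot supply the DTD.
    public static long GetPreloadedLength(XmlKnownDtds dtds, string publicId)
    {
        var resolver = new XmlPreloadedResolver(dtds);

        // For known public IDs the resolver returns a URI it can look up itself.
        Uri uri = resolver.ResolveUri(null, publicId);

        using (var stream = (Stream)resolver.GetEntity(uri, null, typeof(Stream)))
        {
            if (stream == null)
            {
                throw new InvalidOperationException("No content for " + publicId);
            }

            using (var copy = new MemoryStream())
            {
                stream.CopyTo(copy);
                return copy.Length;
            }
        }
    }

    public static void Main()
    {
        long length = GetPreloadedLength(
            XmlKnownDtds.Xhtml10,
            "-//W3C//DTD XHTML 1.0 Strict//EN");
        Console.WriteLine("Preloaded DTD length in bytes: " + length);
    }
}
```

On a runtime where the embedded resources are missing, a check like this would be expected to fail rather than return a length, which is the gap the new tests are meant to close.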
* Support XmlKnownDtds in XmlPreloadedResolver
  The XmlPreloadedResolver class purports to provide certain well-known DTDs: its constructor accepts a value from the XmlKnownDtds enumeration, and if you specify either of the supported DTD sets (XHTML 1.0 and RSS 0.91), it is meant to make the relevant DTDs available. In desktop .NET, this works as advertised. In .NET Core, it does not, for the reasons described in https://github.com/dotnet/corefx/issues/36929. This was missed because there are no tests to verify that the XmlPreloadedResolver makes the relevant DTDs available when asked to. (In fact, the tests that existed seemed to be based on a misunderstanding of how this class works.) This adds tests that verify that the known DTDs are provided when requested, and it reinstates the EmbeddedResource entries required to enable the functionality. (The code for this has been in .NET Core's XmlPreloadedResolver all along. Only the absence of the necessary embedded resources prevented it from working.)
* Completed refactoring that was half-complete
  In the previous commit I introduced the NormalizeContent method to avoid duplicating various calls to string.Replace, but I left in one such call. This replaces it with a call to the new NormalizeContent, as intended.
* Replace \ with / in test paths
  Tests failed on Linux with this: Could not find a part of the path '/root/helix/work/workitem/Utils\\DTDs\\/XHTML10\
  I'm hoping that replacing the backslashes with forward slashes will get the tests passing.
* Replaced fixed / and \ with Path.Combine
  Although using / everywhere did seem to work, the preferred way to generate paths is Path.Combine. Note that this still uses string concatenation for the system ID URI, because that's not a path; it's just an identifier.
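The distinction drawn in the last commit message can be sketched as follows. This is only an illustration: the class DtdTestPaths, its method names, and the file name xhtml1-strict.dtd are assumptions, while the Utils/DTDs/XHTML10 folder layout is taken from the error message quoted above.

```csharp
using System;
using System.IO;

// Hypothetical helper illustrating the two kinds of string being built.
public static class DtdTestPaths
{
    // A file system location: Path.Combine picks the right separator
    // for the platform the tests run on.
    public static string GetDtdFilePath(string folder, string fileName)
    {
        return Path.Combine("Utils", "DTDs", folder, fileName);
    }

    // A system ID is an identifier rather than a file path, so it always
    // uses '/' and is built with plain concatenation.
    public static string GetSystemId(string baseUri, string fileName)
    {
        return baseUri + fileName;
    }

    public static void Main()
    {
        Console.WriteLine(GetDtdFilePath("XHTML10", "xhtml1-strict.dtd"));
        Console.WriteLine(GetSystemId("http://www.w3.org/TR/xhtml1/DTD/", "xhtml1-strict.dtd"));
    }
}
```

The first line printed differs between Windows and Linux, while the second is the same everywhere, which is why only the former goes through Path.Combine.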
In desktop .NET, the XmlPreloadedResolver offers some baked-in DTDs, as listed in the XmlKnownDtds enumeration. If you construct a new XmlPreloadedResolver(XmlKnownDtds.Xhtml10), it will be able to resolve DTDs with common IDs such as "-//W3C//DTD XHTML 1.0 Strict//EN". (In fact, this makes a family of related DTDs available, which makes it possible to parse XHTML documents. Without these DTDs, certain standard entities that are valid in XHTML but are not standard XML entities will cause failures during XML parsing.)

However, in .NET Core, these DTDs are not compiled in. See commit dotnet/corefx@9a4b239 for where the relevant <EmbeddedResource> was commented out.

It seems that this wasn't a deliberate decision. The comment in the .csproj where the code embedding these DTDs was commented out indicates that the committer didn't know why they were being compiled in in the first place.

It may well be a reasonable choice not to embed these DTDs in .NET Core, because the space they take up might not be justified. (Maybe I'm the first person ever to try to read an XHTML document that contains standard XHTML entities that are not standard XML entities using XmlReader in .NET Core.) But if that is the case, the XmlPreloadedResolver should probably throw an exception (e.g., NotSupportedException) when you try to construct it in a way that asks it to make these well-known DTDs available.

As it is, it lets you construct it just fine, and then fails at the point where something first tries to resolve the relevant DTD. This makes it hard to work out what has gone wrong, because it's not at all obvious that the XmlPreloadedResolver simply doesn't support this particular use case in the way that it always has in desktop .NET. If it's not going to support the usage model you specify at construction, then it would be better for it to let you know that at construction.
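For anyone wanting to reproduce the scenario described above, here is a minimal sketch. The sample document, the class name XhtmlEntityRepro, and the method ReadText are made up for illustration; the resolver, enumeration, and public ID are the ones named in this issue. It parses a small XHTML document that uses &nbsp; and &eacute;, which are defined by the XHTML DTDs but are not standard XML entities.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Resolvers;

// Hypothetical repro harness; names and the sample document are illustrative only.
public static class XhtmlEntityRepro
{
    private const string Document =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" " +
        "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">" +
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
        "<head><title>Demo</title></head>" +
        "<body><p>caf&eacute;&nbsp;bar</p></body></html>";

    // Parses the sample document with the supplied resolver and returns
    // the concatenated text content.
    public static string ReadText(XmlResolver resolver)
    {
        var settings = new XmlReaderSettings
        {
            // The DTD has to be processed for the XHTML entities to be defined.
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = resolver
        };

        var text = new StringBuilder();
        using (var reader = XmlReader.Create(new StringReader(Document), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                {
                    text.Append(reader.Value);
                }
            }
        }
        return text.ToString();
    }

    public static void Main()
    {
        try
        {
            // Construction succeeds either way; any failure surfaces only
            // once the reader asks the resolver for the DTD.
            var resolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10);
            Console.WriteLine(ReadText(resolver));
        }
        catch (Exception ex)
        {
            Console.WriteLine("Parsing failed: " + ex.GetType().Name + ": " + ex.Message);
        }
    }
}
```

Where the DTDs are embedded, the entities should be expanded into the returned text. Where they are not, the failure appears inside ReadText rather than at the constructor call, which is the behaviour this issue describes.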