Description
I noticed this issue while using PreMailer.Net and was able to trace the cause to AngleSharp. I describe it in more detail in an issue on PreMailer.Net's repository.
When the HTML I parse contains a special character, like an ampersand, AngleSharp encodes the character when the document is written back out. For example:
static async Task FirstExample()
{
    // Parse a document whose paragraph contains a bare ampersand.
    var config = Configuration.Default;
    var context = BrowsingContext.New(config);
    var document = await context.OpenAsync(req => req.Content("<html><head></head><body><p>&</p></body></html>"));

    // Serialize the whole document back to markup.
    Console.WriteLine(document.DocumentElement.OuterHtml);
}
The code above takes the input:
<html><head></head><body><p>&</p></body></html>
And will output:
<html><head></head><body><p>&amp;</p></body></html>
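One way to narrow down where the encoding happens is to compare the text held by the parsed node with the markup produced on serialization. The following is only a minimal sketch, not part of the original repro: it assumes AngleSharp 0.10 or later (where `HtmlParser.ParseDocument` is available), and the `EncodingCheck` class and `Inspect` helper are hypothetical names used for illustration.

```csharp
using System;
using AngleSharp.Html.Parser;

// Hypothetical helper for illustration; not part of AngleSharp itself.
public static class EncodingCheck
{
    // Returns the first <p> element's parsed text and its serialized markup.
    public static (string Text, string Markup) Inspect(string html)
    {
        var parser = new HtmlParser();
        var document = parser.ParseDocument(html);
        var paragraph = document.QuerySelector("p");

        // TextContent is the text stored in the DOM after parsing;
        // OuterHtml is the string produced when the node is serialized.
        return (paragraph.TextContent, paragraph.OuterHtml);
    }

    public static void Main()
    {
        var result = Inspect("<html><head></head><body><p>&</p></body></html>");
        Console.WriteLine(result.Text);
        Console.WriteLine(result.Markup);
    }
}
```

If the two printed lines differ, the character is stored unencoded in the DOM and the encoding is applied only when the markup is serialized.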
While researching this issue, which has been reported before and has caused problems for many users, I found the following statement from @FlorianRappl in this closed issue.
This is by specification, see the string escaping that needs to be applied on attribute values.
However, as my example above demonstrates, the encoding is not limited to attribute values. It is also applied to innerHTML content, which, unless I am mistaken, is valid HTML in its unencoded form. I am not aware of any standard that requires HTML content to contain only encoded strings.