Option to bypass GetContent HTML encoding in TagHelper #643

khalidabuhakmeh · 2015-12-14T01:04:38Z

The current implementation of TagHelperContent.GetContent automatically HTML encodes when called from within the TagHelper implementation. Using Dave Paquette's implementation of Markdown Taghelper, I am able to reproduce the unexpected behavior of premature encoding.

Implementation

[HtmlTargetElement("p", Attributes = "markdown")]
[HtmlTargetElement("markdown")]
[OutputElementHint("p")]
public class MarkdownTagHelper : TagHelper
{
    public async override Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
    {
        if (output.TagName == "markdown")
        {
            output.TagName = "p";
        }
        var childContent = await context.GetChildContentAsync();

        // This call to get markdownContent is HTML encoded
        // prematurely and strips meaningful newline characters
        // necessary for the markdown output
        var markdownContent = childContent.GetContent();

        var markdown = new MarkdownSharp.Markdown();
        var htmlContent = markdown.Transform(markdownContent);
        output.Content.SetContentEncoded(htmlContent);
    }
}

Input

### Features\r\n\r\n* API namespace cleanup for tag helpers ([#578](https://github.com/aspnet/Razor/issues/578))\r\n* Refactor WriteAttribute API surface. ([#177](https://github.com/aspnet/Razor/issues/177))\r\n\r\n### Bugs Fixed\r\n\r\n* Multiple TagHelpers targeting same element cannot effectively communicate to children. ([#571](https://github.com/aspnet/Razor/issues/571))\r\n* Compilation error within tag attribute shows generated code instead of markup ([#569](https://github.com/aspnet/Razor/issues/569))\r\n* Specifying `RestrictChildren` and empty `HtmlTargetElement` results in an error ([#562](https://github.com/aspnet/Razor/issues/562))\r\n* `TreeStructureChanged` marked as `true` when whitespace is added to a page with `TagHelper`s ([#553](https://github.com/aspnet/Razor/issues/553))\r\n* Expose `FilePath` on `MappingLocation` to let Razor editor know mapping origination. ([#552](https://github.com/aspnet/Razor/issues/552))\r\n\r\n

Expected

Actual

Proposed Solution

I expect to be able to pull the exact contents of the tag element, even if there is content that is potentially unsafe at runtime. The responsibility of HTML encoding should be left up to the TagHelper developer.

http://www.khalidabuhakmeh.com/asp-net-5-mvc-6-taghelpers-and-the-cake

The text was updated successfully, but these errors were encountered:

DamianEdwards · 2015-12-14T01:13:17Z

Thanks for the write up. Here's a gist I did to demonstrate the issue too: https://gist.github.com/DamianEdwards/d57810d28d977058e702

khalidabuhakmeh · 2015-12-14T01:27:28Z

I did notice that this "fixes" the issue, like you suggested.

<markdown>@(new HtmlString(release.Body))</markdown>

It just smells.

DamianEdwards · 2015-12-14T01:29:45Z

Yeah this is a layering issue. The Razor parser is emitting code that always evaluates the content inside a Tag Helper as a Razor expression, and thus writes it out that way, which always HTML encodes. I think we need to either:

Allow for the Tag Helper to declare that it doesn't want the content encoded
Always capture the encoded and unencoded result of the expression then let the Tag Helper get at each via API

dougbu · 2015-12-14T02:00:05Z

@DamianEdwards suggestion 1 is similar to letting tag helpers declare authors should treat the element content like a bound string property. As such we'd need some new tooling support.

For suggestion 2, note one issue with concepts such as "encoded and unencoded result of the expression" is the mix of scopes in a .cshtml file. There is often no single expression. Even an attribute bound to a string property of a tag helper can contain more than just C#. Doubled HTML encoding won't be eradicated.

Separately, binding to a string property (instead of using the element body) is almost a workaround for this issue. Except in <text> and similar corner cases, nothing is encoded before the tag helper gets it.

DamianEdwards · 2015-12-14T03:23:19Z

@dougbu why would suggestion 1 require tooling support? It's just changing the behavior of contained expressions, not literal content.

dougbu · 2015-12-14T04:31:43Z

why would suggestion 1 require tooling support? It's just changing the behavior of contained expressions, not literal content.

Because users HTML encode literal content when they're typing in an HTML context.

DamianEdwards · 2015-12-14T04:33:38Z

Example?

dougbu · 2015-12-14T04:36:58Z

<tagHelper>I love me some <tagHelper>.</TagHelper>

DamianEdwards · 2015-12-14T04:40:55Z

@dougbu ah yes, of course. A tooling feature isn't a huge deal, and frankly we could ship a version without it. This will be applicable only to very particular Tag Helpers. Let's chat about it in person this week.

NTaylorMullen · 2015-12-14T22:42:15Z

@DamianEdwards and I worked through the implications of this issue and decided that in order to fix this we'll add an overload to GetChildContentAsync that takes an encoder. It's not ideal since you'll also have a GetContent(HtmlEncoder) overload; the idea behind it is that TagHelper rendering has two pieces:

Prepare the body to be rendered. This essentially builds out a buffer used to render final content, in cases like @("<p>This should be encoded</p>") it needs to execute the code and encode it when preparing. - GetChildContentAsync
Render the body with the specified encoder. - GetContent

Both parts need the encoder.

dougbu · 2015-12-14T22:54:14Z

@NTaylorMullen please loop in @Eilon on these conversations or talk to me about it tomorrow. What you're suggesting will be incorrect in at least some cases e.g. literal text between the start and end tags. Note as well that this approach assumes all encoding is deferred until GetContent(HtmlEncoder) (ick) is called.

NTaylorMullen · 2015-12-14T22:59:19Z

@dougbu I think you misunderstood the suggestion.

Note as well that this approach assumes all encoding is deferred until GetContent(HtmlEncoder) (ick) is called.

The suggested approach does not defer all encoding. Anything that is @_____ (aka RazorPage.WriteTo where the value is not a string and not an IHtmlContent will be encoded with the encoder passed to GetChildContentAsync. Therefore, to get the "raw" value of the body users would be able to pass in no-op encoders into GetChildContentAsync and GetContent methods.

dougbu · 2015-12-14T23:05:24Z

suggested approach does not defer all encoding

Huh? It assumes everything inside that @ expression has not yet been encoded -- that's deferral.

The approach also assumes literal strings contain neither characters that are already encoded <tag> nor characters that need to be encoded <tag>, depending on how the .cshtml author thought it would be handled.

NTaylorMullen · 2015-12-14T23:10:01Z

Huh? It assumes everything inside that @ expression has not yet been encoded -- that's deferral.

Oh, I wouldn't consider that deferred. That's just how Razor has always worked. Anything after @ is run through WriteTo which may or may not encode the value. In this new case, it'd just use a different encoder IF it decided a value should be encoded.

The approach also assumes literal strings contain neither characters that are already encoded <tag> nor characters that need to be encoded , depending on how the .cshtml author thought it would be handled.

It's just Razor, it doesn't make any assumptions, I'm not seeing how this is different than classic Razor. If you want your stuff to not be encoded and its a string, wrap it in @Html.Raw.

Eilon · 2015-12-30T20:31:15Z

@dougbu please also add to this bug the final decisions that were made when you met with @DamianEdwards and @NTaylorMullen so that we can track those here.

dougbu · 2015-12-31T00:41:41Z

@DamianEdwards, @NTaylorMullen and I talked about this before the ⛄🎁Days of Quietude🎆🎉

As I remember it, we all recognize literal text in the element's body involves lots of odd cases e.g. when the .cshtml author types in <. But @DamianEdwards wants to provide an escape hatch that disables further encoding, with lots of warnings glowing around it.

So we'll go with what @NTaylorMullen described above:

an overload to GetChildContentAsync that takes an encoder

Ideally we'd also have Tooling support of some kind -- something to let the .cshtml author know they're not in a regular HTML context.

But I won't be doing the work this year and this extends the ITagHelper API. 🆗 to do in January?

Eilon · 2015-12-31T21:08:52Z

I'm cool with this. @DamianEdwards please confirm.

DamianEdwards · 2015-12-31T21:56:00Z

👍

…ding an `HtmlEncoder` - #643 - change is viral and requires an update to `RazorPage.StartTagHelperWritingScope()` - note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected WIP: - new tests of the new `TagHelperOutput` methods yet - test do not pass a non-`null` `HtmlEncoder` into `TagHelperScopeManager.Begin()` Nit: remove unused `using`s in files I had open

- aspnet/Razor#643 part 2 - add `HtmlEncoder` parameter to `RazorPage.StartTagHelperWritingScope()` - note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected Also simplify scope management, removing bizarre assertion when user does something odd. - see code comments to reviewers about other options here. Loads of test fallout though this is a WIP: Not yet testing out the new feature.

…ding an `HtmlEncoder` - #643 - change is viral and requires an update to `RazorPage.StartTagHelperWritingScope()` - note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected WIP: - new tests of the new `TagHelperOutput` methods yet - test do not pass a non-`null` `HtmlEncoder` into `TagHelperScopeManager.Begin()` Nit: remove unused `using`s in files I had open

- aspnet/Razor#643 part 2 - add `HtmlEncoder` parameter to `RazorPage.StartTagHelperWritingScope()` - note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected Also simplify scope management, removing bizarre assertion when user does something odd. - see code comments to reviewers about other options here. Loads of test fallout though this is a WIP: Not yet testing out the new feature.

- improve doc comments

…HtmlEncoder` - #643 part 1 - change is viral and requires an update to `RazorPage.StartTagHelperWritingScope()` - memoize `GetChildContentAsync()` per-encoder - update generation tests to match and to test new behaviour - note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected Add `NullHtmlEncoder` Nits: - generally clean up affected doc comments and make them more consistent - remove unused `using`s in files I had open

- aspnet/Razor#643 part 2: react to aspnet/Razor#653 - change test calls and delegates to include new parameter - add tests specifically of the new feature - add tag helpers using new feature to `TagHelpersTest` functional test - note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected - i.e. changing one `RazorPage.HtmlEncoder` property only affects C# expressions in that page Also simplify scope management, removing bizarre assertion when user does something odd. nits: - correct `// Act` and `// Act & Assert` content - remove #YOLO wrapping

dougbu · 2016-01-18T07:02:30Z

DFieldhouse · 2017-09-22T16:22:18Z

The way to get the raw content is like this:

var markdown = (await output.GetChildContentAsync(NullHtmlEncoder.Default)).GetContent(NullHtmlEncoder.Default);

DamianEdwards mentioned this issue Dec 14, 2015

Discussion: TagHelperContent APIs now HTML encode strings by default #566

Closed

Eilon added bug 1 - Ready labels Dec 30, 2015

Eilon added this to the 4.0.0-rc2 milestone Dec 30, 2015

Eilon assigned dougbu Dec 30, 2015

dougbu mentioned this issue Jan 6, 2016

Add TagHelperOutput.GetChildContentAsync() overloads including an HtmlEncoder #653

Closed

dougbu added 2 - Working and removed 1 - Ready labels Jan 6, 2016

dougbu mentioned this issue Jan 6, 2016

React to aspnet/Razor#653 aspnet/Mvc#3871

Closed

dougbu added a commit to aspnet/Mvc that referenced this issue Jan 10, 2016

Address aspnet/Razor#643 PR comments

8d8dd16

- improve doc comments

dougbu closed this as completed Jan 18, 2016

dougbu added 3 - Done and removed 2 - Working labels Jan 18, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Option to bypass GetContent HTML encoding in TagHelper #643

Option to bypass GetContent HTML encoding in TagHelper #643

khalidabuhakmeh commented Dec 14, 2015

DamianEdwards commented Dec 14, 2015

khalidabuhakmeh commented Dec 14, 2015

DamianEdwards commented Dec 14, 2015

dougbu commented Dec 14, 2015

DamianEdwards commented Dec 14, 2015

dougbu commented Dec 14, 2015

DamianEdwards commented Dec 14, 2015

dougbu commented Dec 14, 2015

DamianEdwards commented Dec 14, 2015

NTaylorMullen commented Dec 14, 2015

dougbu commented Dec 14, 2015

NTaylorMullen commented Dec 14, 2015

dougbu commented Dec 14, 2015

NTaylorMullen commented Dec 14, 2015

Eilon commented Dec 30, 2015

dougbu commented Dec 31, 2015

Eilon commented Dec 31, 2015

DamianEdwards commented Dec 31, 2015

dougbu commented Jan 18, 2016

DFieldhouse commented Sep 22, 2017

Option to bypass GetContent HTML encoding in TagHelper #643

Option to bypass GetContent HTML encoding in TagHelper #643

Comments

khalidabuhakmeh commented Dec 14, 2015

Implementation

Input

Expected

Actual

Proposed Solution

DamianEdwards commented Dec 14, 2015

khalidabuhakmeh commented Dec 14, 2015

DamianEdwards commented Dec 14, 2015

dougbu commented Dec 14, 2015

DamianEdwards commented Dec 14, 2015

dougbu commented Dec 14, 2015

DamianEdwards commented Dec 14, 2015

dougbu commented Dec 14, 2015

DamianEdwards commented Dec 14, 2015

NTaylorMullen commented Dec 14, 2015

dougbu commented Dec 14, 2015

NTaylorMullen commented Dec 14, 2015

dougbu commented Dec 14, 2015

NTaylorMullen commented Dec 14, 2015

Eilon commented Dec 30, 2015

dougbu commented Dec 31, 2015

Eilon commented Dec 31, 2015

DamianEdwards commented Dec 31, 2015

dougbu commented Jan 18, 2016

DFieldhouse commented Sep 22, 2017