Skip to content
This repository has been archived by the owner on Dec 19, 2018. It is now read-only.

Option to bypass GetContent HTML encoding in TagHelper #643

Closed
khalidabuhakmeh opened this issue Dec 14, 2015 · 20 comments
Closed

Option to bypass GetContent HTML encoding in TagHelper #643

khalidabuhakmeh opened this issue Dec 14, 2015 · 20 comments
Assignees
Milestone

Comments

@khalidabuhakmeh
Copy link

The current implementation of TagHelperContent.GetContent automatically HTML encodes when called from within the TagHelper implementation. Using Dave Paquette's implementation of Markdown Taghelper, I am able to reproduce the unexpected behavior of premature encoding.

Implementation

[HtmlTargetElement("p", Attributes = "markdown")]
[HtmlTargetElement("markdown")]
[OutputElementHint("p")]
public class MarkdownTagHelper : TagHelper
{
    public async override Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
    {
        if (output.TagName == "markdown")
        {
            output.TagName = "p";
        }
        var childContent = await context.GetChildContentAsync();

        // This call to get markdownContent is HTML encoded
        // prematurely and strips meaningful newline characters
        // necessary for the markdown output
        var markdownContent = childContent.GetContent();

        var markdown = new MarkdownSharp.Markdown();
        var htmlContent = markdown.Transform(markdownContent);
        output.Content.SetContentEncoded(htmlContent);
    }
}

Input

### Features\r\n\r\n* API namespace cleanup for tag helpers ([#578](https://github.com/aspnet/Razor/issues/578))\r\n* Refactor WriteAttribute API surface. ([#177](https://github.com/aspnet/Razor/issues/177))\r\n\r\n### Bugs Fixed\r\n\r\n* Multiple TagHelpers targeting same element cannot effectively communicate to children. ([#571](https://github.com/aspnet/Razor/issues/571))\r\n* Compilation error within tag attribute shows generated code instead of markup ([#569](https://github.com/aspnet/Razor/issues/569))\r\n* Specifying `RestrictChildren` and empty `HtmlTargetElement` results in an error ([#562](https://github.com/aspnet/Razor/issues/562))\r\n* `TreeStructureChanged` marked as `true` when whitespace is added to a page with `TagHelper`s ([#553](https://github.com/aspnet/Razor/issues/553))\r\n* Expose `FilePath` on `MappingLocation` to let Razor editor know mapping origination. ([#552](https://github.com/aspnet/Razor/issues/552))\r\n\r\n

Expected

Actual

Proposed Solution

I expect to be able to pull the exact contents of the tag element, even if there is content that is potentially unsafe at runtime. The responsibility of HTML encoding should be left up to the TagHelper developer.


http://www.khalidabuhakmeh.com/asp-net-5-mvc-6-taghelpers-and-the-cake

@DamianEdwards
Copy link
Member

Thanks for the write up. Here's a gist I did to demonstrate the issue too: https://gist.github.com/DamianEdwards/d57810d28d977058e702

@khalidabuhakmeh
Copy link
Author

I did notice that this "fixes" the issue, like you suggested.

<markdown>@(new HtmlString(release.Body))</markdown>

It just smells.

@DamianEdwards
Copy link
Member

Yeah this is a layering issue. The Razor parser is emitting code that always evaluates the content inside a Tag Helper as a Razor expression, and thus writes it out that way, which always HTML encodes. I think we need to either:

  1. Allow for the Tag Helper to declare that it doesn't want the content encoded
  2. Always capture the encoded and unencoded result of the expression then let the Tag Helper get at each via API

@dougbu
Copy link
Member

dougbu commented Dec 14, 2015

@DamianEdwards suggestion 1 is similar to letting tag helpers declare authors should treat the element content like a bound string property. As such we'd need some new tooling support.

For suggestion 2, note one issue with concepts such as "encoded and unencoded result of the expression" is the mix of scopes in a .cshtml file. There is often no single expression. Even an attribute bound to a string property of a tag helper can contain more than just C#. Doubled HTML encoding won't be eradicated.

Separately, binding to a string property (instead of using the element body) is almost a workaround for this issue. Except in <text> and similar corner cases, nothing is encoded before the tag helper gets it.

@DamianEdwards
Copy link
Member

@dougbu why would suggestion 1 require tooling support? It's just changing the behavior of contained expressions, not literal content.

@dougbu
Copy link
Member

dougbu commented Dec 14, 2015

why would suggestion 1 require tooling support? It's just changing the behavior of contained expressions, not literal content.

Because users HTML encode literal content when they're typing in an HTML context.

@DamianEdwards
Copy link
Member

Example?

@dougbu
Copy link
Member

dougbu commented Dec 14, 2015

<tagHelper>I love me some &lt;tagHelper&gt;.</TagHelper>

@DamianEdwards
Copy link
Member

@dougbu ah yes, of course. A tooling feature isn't a huge deal, and frankly we could ship a version without it. This will be applicable only to very particular Tag Helpers. Let's chat about it in person this week.

@NTaylorMullen
Copy link
Member

@DamianEdwards and I worked through the implications of this issue and decided that in order to fix this we'll add an overload to GetChildContentAsync that takes an encoder. It's not ideal since you'll also have a GetContent(HtmlEncoder) overload; the idea behind it is that TagHelper rendering has two pieces:

  1. Prepare the body to be rendered. This essentially builds out a buffer used to render final content, in cases like @("<p>This should be encoded</p>") it needs to execute the code and encode it when preparing. - GetChildContentAsync
  2. Render the body with the specified encoder. - GetContent

Both parts need the encoder.

@dougbu
Copy link
Member

dougbu commented Dec 14, 2015

@NTaylorMullen please loop in @Eilon on these conversations or talk to me about it tomorrow. What you're suggesting will be incorrect in at least some cases e.g. literal text between the start and end tags. Note as well that this approach assumes all encoding is deferred until GetContent(HtmlEncoder) (ick) is called.

@NTaylorMullen
Copy link
Member

@dougbu I think you misunderstood the suggestion.

Note as well that this approach assumes all encoding is deferred until GetContent(HtmlEncoder) (ick) is called.

The suggested approach does not defer all encoding. Anything that is @_____ (aka RazorPage.WriteTo where the value is not a string and not an IHtmlContent will be encoded with the encoder passed to GetChildContentAsync. Therefore, to get the "raw" value of the body users would be able to pass in no-op encoders into GetChildContentAsync and GetContent methods.

@dougbu
Copy link
Member

dougbu commented Dec 14, 2015

suggested approach does not defer all encoding

Huh? It assumes everything inside that @ expression has not yet been encoded -- that's deferral.

The approach also assumes literal strings contain neither characters that are already encoded &lt;tag&gt; nor characters that need to be encoded <tag>, depending on how the .cshtml author thought it would be handled.

@NTaylorMullen
Copy link
Member

Huh? It assumes everything inside that @ expression has not yet been encoded -- that's deferral.

Oh, I wouldn't consider that deferred. That's just how Razor has always worked. Anything after @ is run through WriteTo which may or may not encode the value. In this new case, it'd just use a different encoder IF it decided a value should be encoded.

The approach also assumes literal strings contain neither characters that are already encoded <tag> nor characters that need to be encoded , depending on how the .cshtml author thought it would be handled.

It's just Razor, it doesn't make any assumptions, I'm not seeing how this is different than classic Razor. If you want your stuff to not be encoded and its a string, wrap it in @Html.Raw.

@Eilon Eilon added this to the 4.0.0-rc2 milestone Dec 30, 2015
@Eilon
Copy link
Member

Eilon commented Dec 30, 2015

@dougbu please also add to this bug the final decisions that were made when you met with @DamianEdwards and @NTaylorMullen so that we can track those here.

@dougbu
Copy link
Member

dougbu commented Dec 31, 2015

@DamianEdwards, @NTaylorMullen and I talked about this before the ⛄🎁Days of Quietude🎆🎉

As I remember it, we all recognize literal text in the element's body involves lots of odd cases e.g. when the .cshtml author types in &lt;. But @DamianEdwards wants to provide an escape hatch that disables further encoding, with lots of warnings glowing around it.

So we'll go with what @NTaylorMullen described above:

an overload to GetChildContentAsync that takes an encoder

Ideally we'd also have Tooling support of some kind -- something to let the .cshtml author know they're not in a regular HTML context.

But I won't be doing the work this year and this extends the ITagHelper API. 🆗 to do in January?

@Eilon
Copy link
Member

Eilon commented Dec 31, 2015

I'm cool with this. @DamianEdwards please confirm.

@DamianEdwards
Copy link
Member

👍

dougbu added a commit that referenced this issue Jan 6, 2016
…ding an `HtmlEncoder`

- #643
- change is viral and requires an update to `RazorPage.StartTagHelperWritingScope()`
- note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected

WIP:
- new tests of the new `TagHelperOutput` methods yet
- test do not pass a non-`null` `HtmlEncoder` into `TagHelperScopeManager.Begin()`

Nit: remove unused `using`s in files I had open
dougbu added a commit to aspnet/Mvc that referenced this issue Jan 6, 2016
- aspnet/Razor#643 part 2
- add `HtmlEncoder` parameter to `RazorPage.StartTagHelperWritingScope()`
- note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected

Also simplify scope management, removing bizarre assertion when user does something odd.
- see code comments to reviewers about other options here.

Loads of test fallout though this is a WIP: Not yet testing out the new feature.
dougbu added a commit that referenced this issue Jan 8, 2016
…ding an `HtmlEncoder`

- #643
- change is viral and requires an update to `RazorPage.StartTagHelperWritingScope()`
- note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected

WIP:
- new tests of the new `TagHelperOutput` methods yet
- test do not pass a non-`null` `HtmlEncoder` into `TagHelperScopeManager.Begin()`

Nit: remove unused `using`s in files I had open
dougbu added a commit to aspnet/Mvc that referenced this issue Jan 10, 2016
- aspnet/Razor#643 part 2
- add `HtmlEncoder` parameter to `RazorPage.StartTagHelperWritingScope()`
- note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected

Also simplify scope management, removing bizarre assertion when user does something odd.
- see code comments to reviewers about other options here.

Loads of test fallout though this is a WIP: Not yet testing out the new feature.
dougbu added a commit to aspnet/Mvc that referenced this issue Jan 10, 2016
- improve doc comments
dougbu added a commit that referenced this issue Jan 18, 2016
…HtmlEncoder`

- #643 part 1
- change is viral and requires an update to `RazorPage.StartTagHelperWritingScope()`
  - memoize `GetChildContentAsync()` per-encoder
  - update generation tests to match and to test new behaviour
- note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected

Add `NullHtmlEncoder`

Nits:
- generally clean up affected doc comments and make them more consistent
- remove unused `using`s in files I had open
dougbu added a commit to aspnet/Mvc that referenced this issue Jan 18, 2016
- aspnet/Razor#643 part 2: react to aspnet/Razor#653
 - change test calls and delegates to include new parameter
 - add tests specifically of the new feature
 - add tag helpers using new feature to `TagHelpersTest` functional test
- note `HtmlEncoder`s used elsewhere e.g. in other `RazorPage` instances are unaffected
 - i.e. changing one `RazorPage.HtmlEncoder` property only affects C# expressions in that page

Also simplify scope management, removing bizarre assertion when user does something odd.

nits:
- correct `// Act` and `//  Act & Assert` content
- remove #YOLO wrapping
@dougbu
Copy link
Member

dougbu commented Jan 18, 2016

@DFieldhouse
Copy link

The way to get the raw content is like this:

var markdown = (await output.GetChildContentAsync(NullHtmlEncoder.Default)).GetContent(NullHtmlEncoder.Default);

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants