Skip to content
This repository has been archived by the owner on Dec 14, 2018. It is now read-only.

FromRoute parameters aren't URL Decoded. #4599

Closed
MicahZoltu opened this issue May 8, 2016 · 33 comments
Closed

FromRoute parameters aren't URL Decoded. #4599

MicahZoltu opened this issue May 8, 2016 · 33 comments
Assignees

Comments

@MicahZoltu
Copy link

[HttpGet]
[Route("")]
public async Task<IActionResult> GetValue([FromQuery(Name = "value")] String value) { ... }

[HttpGet]
[Route("{value}")]
public async Task<IActionResult> GetValue([FromRoute(Name = "value")] String value) { ... }

In the first example, a URL encoded string passed as the value query string parameter will be decoded before being passed into the GetValue function.

In the second example, a URL encoded string passed as a route segment will NOT be URL decoded before being passed to the GetValue function.

This behavior is unexpected, I would expect both Route and Query parameters to be URL decoded before being passed to the function.

If this is expected behavior, is there a workaround or a way to tell the route to decode before passing the parameter through? The problem is made more complex in my real world example because the route parameter is a URL and I want the parameter to be a Uri rather than a String. My current solution is to accept a String parameter and then construct a new Uri(value) out of it. However, this means I don't benefit from ModelState validation since the only a string is necessary to have the model state be valid.

@Eilon
Copy link
Member

Eilon commented May 9, 2016

@Zoltu there was a bug in this in the past, but it was fixed in the RC1 release. I just tried it in the latest RC2 nightly builds and I don't see the bug for query strings or route data.

Can you clarify which build of ASP.NET Core you are using?

@MicahZoltu
Copy link
Author

RC1 final. Il try with RC2 This evening and post here whether it resolves the issue or not.

@danroth27
Copy link
Member

I am not able to reproduce this behavior on RC1 final. In both cases I see the model bound value as decoded.

@MicahZoltu
Copy link
Author

@Eilon @danroth27

I just tested this again in a brand new ASPNET Core application created in Visual Studio 2015 using the Web template (https://docs.asp.net/en/latest/tutorials/your-first-aspnet-application.html). I added Swashbuckle (via NuGet) for easy testing and made both of the methods simply return Ok(value).

The working method, where the URL encoded string is in a query parameter:
image

The not working method, where the URL encoded string is in a route parameter:
image

You can see that Swashbuckle encoded the URL the same in both requests, just one in the query string and one in the route. However, the results from the two calls were different, despite the actions being the same except for the FromRoute/FromQuery attributes.

@Eilon Eilon reopened this May 10, 2016
@Eilon
Copy link
Member

Eilon commented May 10, 2016

@Zoltu thanks for the extra info, I'll have someone try this out and see what's going on here. Very odd.

@Eilon Eilon added this to the 1.0.0 milestone May 12, 2016
@Eilon Eilon assigned ryanbrandenburg and unassigned dougbu May 17, 2016
@Eilon
Copy link
Member

Eilon commented May 17, 2016

@ryanbrandenburg, a gift for you.

@ryanbrandenburg
Copy link
Member

Hello @Zoltu! I did some investigation and found some interesting behaviors here.

The first thing I found was that in your example the route response was partially URL decoded. The only character which wasn't decoded was %2f, which is /. The other thing I learned was that you get this behavior only when self-hosting (that is, running the project without IISExpress), if you had launched this project from IISExpress you would have gotten different behavior, namely that your %2f's would be URL decoded before the route had been determined, meaning that for the URL http://localhost:5000/first%2fsecond and routes

[Route("{value}")]
public IActionResult GetValue([FromRoute(Name = "value")] string value)
...

[Route("{first}/{second}")]
public IActionResult Multi([FromRoute(Name = "first")] string first, [FromRoute(Name = "second")] string second)
...

you would hit GetValue when running self-hosted and Multi when running under IISExpress.

aspnet/KestrelHttpServer#124 appears to mention this behavior and it sounds like it's intentional (or at least known). @Tratcher and @troydai can hopefully confirm my reading of that.

If my reading was correct then I think the guidance would be not to use [FromRoute] on any values which are likely to contain %2f, and instead use [FromQuery] which will give you an entirely URL decoded string every time, regardless of which hosting method you're using.

@MicahZoltu
Copy link
Author

MicahZoltu commented May 18, 2016

Good catch on it only not decoding the / character, I didn't notice that.

I did discover the difference when IIS hosting (http://stackoverflow.com/questions/37178949/how-do-i-allow-url-encoded-path-segments-in-azure).

I am of the opinion that what IIS is doing is _incorrect behavior_ because it is negating the entire point of URL encoding. The point of URL encoding is to allow an HTTP request to pass data via the URL (route, query string, etc.) through to the application without it being interpreted by the routing engine. In this case, IIS is decoding my encoded value and then using the decoded value as part of its routing logic.

I would encourage Kestrel to _not_ follow IIS's lead on this one and instead have the web server behave to spec.

References:

Uniform Resource Identifier (URI): Generic Syntax § 2.4. When to Encode or Decode

When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters.

Uniform Resource Locators § 2.2. URL Character Encoding Issues (emphasis mine)

Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.
...
If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme.

It sounds like this is an issue with Kestrel and not with MVC. Should I move this issue over there?

@Eilon
Copy link
Member

Eilon commented May 18, 2016

Yeah I think IIS/http.sys do this for "security purposes" and doing the "extra" (i.e. wrong) decoding. Unfortunately, I think that once it does its decoding there no way to know how to get that raw data back because you don't know if what you see as a / is really / or if it started out as a %2f.

@Eilon
Copy link
Member

Eilon commented May 24, 2016

@Tratcher @blowdart need your view on this. I think this is technically by design, given how IIS/http.sys are designed to work. I think that raw Kestrel has the correct RFC behavior. And I'll perhaps regret asking this question, but should we modify Kestrel to have the same behavior as IIS/http.sys and super-decode %2f?

@Eilon
Copy link
Member

Eilon commented May 24, 2016

Talked to @Tratcher and this is presently by design. The IIS/http.sys feature was originally for security reasons when URLs typically mapped to physical files, and at this point for compat reasons that can't change. Kestrel, on the other hand, is a brand new server, and so does not have this legacy behavior.

@Eilon Eilon closed this as completed May 24, 2016
@blowdart
Copy link
Member

I like how you closed this before I could give you an answer you'll regret from me :p

@MicahZoltu
Copy link
Author

@Eilon Now that we have established that Kestrel should not be URL decoding per the RFC, we are back to the original issue of MVC not decoding the / even though it decodes everything else. I believe that the proper behavior for MVC would be to decode the route parameters the same as query string parameters.

Unless I am misunderstanding which part is by design, I think this issue should be reopened since the original problem is still outstanding. I suspect we just got side tracked on the IIS thing and which part of the pipeline should decode.

@Eilon Eilon reopened this May 25, 2016
@Eilon Eilon removed the by design label May 25, 2016
@Eilon
Copy link
Member

Eilon commented May 25, 2016

@Zoltu you are right. And when you're right, you're right. And you, you're always right! 😄

Yes, it seems that "something" (not sure what "something" is) should be URL decoding this before the app sees it...

@Tratcher
Copy link
Member

%2F is the only one that changes the definition of the path (excluding . and ..). All the others are safe to un-escape once the path has been isolated from the query, etc.. The %2F can only be unescaped once the individual segments have been separated, which only happens in routing or model binding.

@rynowak
Copy link
Member

rynowak commented May 25, 2016

If we're talking about the path, we're talking about routing. Nothing else in MVC looks at the path.

What do you think should happen for a route + path like the following?

route: /blog/posts/{*slug}
path: /blog/posts/routing/is%2Fcool

Should slug get the value routing/is/cool?

What should link generation output if you put routing/is/cool back in? /blog/posts/routing/is/cool or /blog/posts/routing%2Fis%2Fcool?

What about a case like this?

route: /blog/posts/{author}/{*subject}
path: /blog/posts/billy%2Fbobby/vaccuum-cleaners

Should author get the value billy or billy/bobby?

What should link generation output if you put billy/bobby back in? /blog/posts/billy%2Fbobby/vaccuum-cleaners or /blog/posts/billy/bobby/vaccuum-cleaners

@MicahZoltu
Copy link
Author

@Tratcher %2F is the only one that changes the definition of the path

I got a chance to sit down at a computer and throw a quick test together for the behavior that I was afraid of. My contention here is that per my interpretation of the RFC, Kestrel should not be doing any* (see last paragraph) URL decoding and MVC should not be doing URL decoding until model binding time (after routing). Based on what I have seen when looking briefly at the Kestrel code, it appears that it is doing URL decoding which leads to the incorrect MVC action being called, or in some cases the correct MVC action being called but with an incompletely decoded model. My original bug report only touched on the surface of this problem because it happened to be what I ran into but I believe the problem is much deeper and a fundamental flaw in the way URL Decoding is handled in Kestrel + MVC.

Given the following code and ASP.NET RC2 (including Kestrel RC2):

using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

namespace TestWeb.Controllers
{
    [Route("api/[controller]")]
    public class ValuesController : Controller
    {
        [Route("foo&bar")]
        public async Task<String> Foo()
        {
            return "bar";
        }

        [Route("foo%26bar")]
        public async Task<String> Apple()
        {
            return "apple";
        }

        [Route("{zip}")]
        public async Task<String> Zip([FromRoute] String zip)
        {
            return $"zip: {zip}";
        }
    }
}

GET http://localhost:5000/api/values/foo%26bar returns bar. I believe this is incorrect behavior based on my interpretation of the RFC. %26 has been URL Encoded so its decoded value should not play a role in routing, which includes the MVC routing engine (not just Kestrel). In this example, I believe the correct behavior would be to hit the Apple action or if the Apple action didn't exist then fallback to the Zip action. While this example is a bit contrived, I believe it shows the underlying problem with URL decoding too early in the pipeline. The purpose of URL Encoding is to allow users to pass characters through to the action without them changing the way routing occurs.

Now this is a bit of a can of worms but the RFC special cases "uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde." and specifically says that they should be normalized to and handled in their unencoded form. This means that given the routes [Route("apple")] and [Route("{foo}")], apple should be called if the URI has a URL Encoded apple. Personally, I think this is a bit short sighted of the RFC since it means it isn't possible to hit the foo route with the string apple, but if Kestrel wants to follow the spec then it should URL Decode those (and only those) characters.

@Tratcher
Copy link
Member

How many of these scenarios have you tested against Asp.Net 4 / MVC 5?

@Tratcher
Copy link
Member

Note when running behind IIS we'll be subject to most of the same behaviors as Asp.Net 4 because they're implemented in Http.Sys.

@MicahZoltu
Copy link
Author

@rynowak What do you think should happen for a route + path like the following?

route: /blog/posts/{*slug}
path: /blog/posts/routing/is%2Fcool
Should slug get the value routing/is/cool?

Yes, routing/is/cool should show up in the model passed to the action.

What about a case like this?

route: /blog/posts/{author}/{*subject}
path: /blog/posts/billy%2Fbobby/vaccuum-cleaners
Should author get the value billy or billy/bobby?

Author should be bill/bobby.

I'm not familiar with MVC link generation so I probably shouldn't comment on it but I will anyway. :D In an ideal world (without humans), the link generation would generate links that were as guaranteed as possible to get back to the original route. This means URL Encoding everything except for the specially-handled characters listed in my previous comment (letters, digits, etc.). I suspect though that humans won't like that much, despite it giving the most desirable behavior in terms of guaranteeing proper routing in the face of all current and future possible routes.

@MicahZoltu
Copy link
Author

@Tratcher How many of these scenarios have you tested against Asp.Net 4 / MVC 5?

None. I see Kestrel and ASP.NET Core as an opportunity to make breaking changes with IIS and MVC 5 that bring .NET in line with various RFCs and web standards. If maintaining consistent behavior (despite it being counter to the RFC) is desired then most of my arguments here fall apart. :) Microsoft has a an unfortunate negative image in the developer community with regards to not following web standards in the past (IE 6, IIS, etc.). I would love to see Kestrel / ASP.NET come into line with existing RFCs where possible to help shed this image and encourage adoption, which is why I am pushing a bit on this (my specific use case can easily be worked around in a number of ways).

I do understand that IIS will continue to behave the way it always has, which includes its URL Rewrite stuff. My long term plan is to stop using IIS if possible (I'm hoping that Kestrel can achieve similar performance to IIS now or in the future).

@Eilon Eilon modified the milestones: 1.0.1, 1.0.0 Jun 8, 2016
@pranavkm
Copy link
Contributor

pranavkm commented Aug 8, 2016

A quick observation - if we decide to take on this change, we'd need to make this change to about any middleware that deals with PathStrings. Consider,

app.Map("/a&b", app => 
{
   app.UseMvc();
});

app.Map("/a%26b", app => 
{
   app.UseStaticFiles(new StaticFileOptions
   {
       RequestPath = "/c%2fd"
   });
});

Each of the middlewares in this sample, Map, StaticFiles, and Routing needs to now work with unencoded urls and know how to consume and produce url segments that are unencoded.

@Tratcher
Copy link
Member

Tratcher commented Aug 8, 2016

@pranavkm the three fields in your samples are explicitly PathStrings, which by definition uses the decoded representation. That's too fundamental to ever change.

@pranavkm
Copy link
Contributor

pranavkm commented Aug 8, 2016

@Tratcher, where does that leave us? If we specifically have to make this change in Routing, we'd have to know how much of the unencoded path has already been consumed give the current RequestPath and route the rest. That sounds pretty complicated.

@pranavkm
Copy link
Contributor

Going back to the @rynowak's comment from #4599 (comment), it doesn't look like we could round trip the value correctly - there isn't an mechanism in place to tell what the token's value was over the wire. Given the other issue with PathStrings expressly disallowing encoded values, there's not a whole lot left to do this in this work item. I'll go ahead and close this for now.

@dustinmoris
Copy link

Just read the entire thread and I am pretty disappointed that this hasn't been fixed at all.

Currently an ASP.NET Core application will behave entirely differently depending on where it is being run:

Kestrel: Decodes partially (WTF!???)
WebHostBuilder (for testing): Decodes everything
IIS: Decodes everything?

Since we all agree that we don't want Kestrel to become as shit as IIS and therefore not match it's shitty behaviour I wonder why did Kestrel go along with this weird partial decoding strategy? What is this about? Why can Kestrel not just be cool and NOT decode route arguments, so that MVC/Nancy/{your-web-framework-of-choice} can handle it consistently?

Currently it's an absolute bloody mess. A web framework should be able to decide whether the route argument should be left entirely decoded or encoded, but please not some half baked crap which makes it impossible to EVER know what the user has really sent across the wire.

Please can this issue be looked at again with a fresh pair of eyes?

You had 3 choices:
a.) decode everything
b.) decode nothing
c.) DO SOMETHING COMPLETELY UNCONVENTIONAL WHICH WILL MOST CERTAINLY BE WRONG

why did you go with c.)?? No seriously, why?

/rantover

@dustinmoris
Copy link

dustinmoris commented Aug 12, 2018

One more thing which I would like to add, the current implementation makes web applications do funny logic, which is really not obvious to anyone who didn't study this thread.

For example, a developer might implement some logic which will find and replace all %2F in a route argument to / and afterwards a different dev who comes across the code might scratch his head thinking what is this odd behaviour. The new dev will probably change this at least to something like WebUtility.UrlDecode, which then however creates a bug because the + sign would get incorrectly decoded a second time.

It is such an obscure oddity that code has to be written specifically for Kestrel. I would suggest at least to have a method like Kestrel.DecodeRouteArguments(string arg) which can be used, so at least it will become obvious that a.) this web application has logic which is tightly coupled to Kestrel, and b.) it will help to prevent silly mistakes like above, where developers are not familiar with this issue.

@Tratcher
Copy link
Member

Comments on closed issues are not tracked, please open a new issue with the details for your scenario. E.g. which specific encoded characters are you having trouble with?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants