Router does not decode URIs #2672
Comments
@yusukebe, do you think that it is a bug? |
Hi @szmarczak The router doesn't need to decode the path. Instead, if you want to decode it, you can use the getPath option:
import { Hono } from 'hono'
const app = new Hono({
getPath: (req) => {
const url = req.url
const queryIndex = url.indexOf('?', 8)
const path = url.slice(url.indexOf('/', 8), queryIndex === -1 ? undefined : queryIndex)
return decodeURI(path)
},
}) |
I disagree. The params are decoded out-of-the-box (though in Request), so I don't see why the path can't be:
Deno/Hono
import { Hono } from 'npm:hono';
const h = new Hono();
h.get('/:foobar', async (c) => c.text(c.req.param('foobar')));
Deno.serve({
port: 8080,
}, h.fetch);
Node
import http from 'node:http';
// Request the path "/%2F" and print the response body.
http.get({ host: 'localhost', port: 8080, path: '/'+encodeURIComponent('/') }, res => res.pipe(process.stdout));
Also, the benchmark for |
Thanks. Indeed, paths may need to be decoded. But I think it should not be done by the router but by @usualoma What do you think about this? |
There's no reason to make decoding opt-in. It's bad practice and bad DX. Furthermore, the URI RFC says:
so not decoding the components is actually against the spec. |
I think paths should be decoded by default. Making decoding optional can lead to errors and poor developer experience (DX) because it introduces the possibility of inconsistent URI handling. Automatically decoding URI components ensures consistency and reduces the likelihood of bugs caused by misinterpreting encoded characters.
I think we can add an option in But default is enabled. |
I considered it for a while and concluded that we should decode the path component before processing the routing. |
@yusukebe, can I implement it? |
Why? There have been several comments saying that with no rationale behind them. |
@szmarczak Thanks for your comment. Are you saying that you think the router should do this process? |
@szmarczak
diff --git a/src/utils/url.ts b/src/utils/url.ts
index b37a5539..7109a307 100644
--- a/src/utils/url.ts
+++ b/src/utils/url.ts
@@ -73,7 +73,7 @@ export const getPath = (request: Request): string => {
// Optimized: indexOf() + slice() is faster than RegExp
const url = request.url
const queryIndex = url.indexOf('?', 8)
- return url.slice(url.indexOf('/', 8), queryIndex === -1 ? undefined : queryIndex)
+ return decodeURI(url.slice(url.indexOf('/', 8), queryIndex === -1 ? undefined : queryIndex))
}
export const getQueryStrings = (url: string): string => { |
Correct. For example, testing this behavior would be much easier. Otherwise, you'd have to test the entire Hono app to make sure the behavior is right.
Indeed, however the router can be used separately outside of Hono. Making it a user responsibility would be a bad DX.
Having said the above, I don't believe that making it a part of
path.includes('%') ? decodeURI(path) : path Also note that |
@szmarczak Thank you! I think I understand your thoughts. It will be helpful. I consider the following.
What are the responsibilities of Hono routers?
The If it is "receive a Request object and return a result", I think it should be decoded at the router, but since it is not, I feel comfortable with the API being "decode and pass it outside the router".
Who do we consider DX for?
Yes, there may be such users. But I would prefer to improve the DX for the users of Hono (or us, the developers) who implement their own router in Hono, rather than improve the DX for the users who "use the router outside of Hono". If I want to write a new router, I would appreciate it if it is decoded outside the router.
Performance
path.includes('%') ? decodeURI(path) : path
I think that snippet will meet the specification, but we need code that performs better. For example, the following code. (Whether it is decodeURI or decodeURIComponent is a separate discussion.)
export const getPath = (request: Request): string => {
const url = request.url
const start = url.indexOf('/', 8)
let i = start
for (; i < url.length; i++) {
const charCode = url.charCodeAt(i)
if (charCode === 37) {
// '%'
// If the path contains percent encoding, use `indexOf()` to find '?' and return the result immediately.
// Although this is a performance disadvantage, it is acceptable since we prefer cases that do not include percent encoding.
const queryIndex = url.indexOf('?', i)
return decodeURIComponent(url.slice(start, queryIndex === -1 ? undefined : queryIndex))
} else if (charCode === 63) {
// '?'
break
}
}
return url.slice(start, i)
} |
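The question of decodeURI versus decodeURIComponent matters mostly for escaped reserved characters such as "/". The following is a minimal sketch of the difference, using a made-up raw path (`rawPath` and the other names are illustrative and not taken from Hono):

```typescript
// Hypothetical raw path, as it might appear inside request.url.
const rawPath = '/foo%2Fbar/caf%C3%A9'

// decodeURI() leaves escapes of reserved characters such as "/" (%2F) untouched.
const viaDecodeURI = decodeURI(rawPath) // '/foo%2Fbar/café'

// decodeURIComponent() decodes every escape, so %2F becomes a literal "/".
const viaDecodeURIComponent = decodeURIComponent(rawPath) // '/foo/bar/café'

// Only the decodeURI() result keeps the original segment boundaries.
const segmentsKept = viaDecodeURI.split('/') // ['', 'foo%2Fbar', 'café']
const segmentsChanged = viaDecodeURIComponent.split('/') // ['', 'foo', 'bar', 'café']
```

Whichever function is chosen, a router that splits on "/" after decoding sees a different number of segments in the two cases.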
The string consists of URI components.
Whether the URL is extracted from
Then you can extract the custom decoding URI function into hono/src/router/trie-router/router.ts Line 25 in d87d996
and other routers too. IMO in that way DX for Hono developers is preserved.
I haven't benchmarked this but it may occur that I think that both sides (me and Hono) have expressed their opinions thoroughly. Whether it's
I think the benchmark should contain code that decodes the URI (using Hono's implementation) for routers that don't decode URIs. Otherwise the benchmark is untrue as it doesn't represent how the route is handled in a real life scenario. |
@szmarczak Thanks. I understand that there are other opinions, but I still think that |
I would like to refer to #2688 for discussion. If anyone has an opinion, please comment.
Expected Routing Results
Is this the expected routing result? There may be disagreement as to whether It's a bit of a quibble whether
Performance
I have benchmarked with benchmarks/utils/src/get-path.ts and I think this implementation will be faster in any runtime.
Handle
|
@usualoma
['foo/bar', 'baz', 'ąę'] However, if we're using strings, then to match this we need
Ideally, we'd like to decode what we can and leave malformed percent-encoded characters in their percent-encoded form. I'm almost certain that this would be much slower, therefore IMO it's ok to fall back to the original, non-decoded path. |
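The fallback described above could look roughly like this. It is only a sketch: `decodeOrKeep` is a hypothetical helper name, and the choice of decodeURI here is an assumption rather than what Hono does.

```typescript
// Hypothetical helper: decode when possible, otherwise keep the raw path.
const decodeOrKeep = (rawPath: string): string => {
  // Fast path: nothing to decode if there is no "%" at all.
  if (rawPath.indexOf('%') === -1) {
    return rawPath
  }
  try {
    return decodeURI(rawPath)
  } catch {
    // decodeURI() throws URIError on malformed escapes; keep the original.
    return rawPath
  }
}

const decodedOk = decodeOrKeep('/%C4%85%C4%99') // '/ąę'
const keptRaw = decodeOrKeep('/%E0%A4%A') // '/%E0%A4%A' (malformed, unchanged)
const untouched = decodeOrKeep('/plain') // '/plain'
```

This does not attempt the "decode what we can" variant; a single malformed escape leaves the whole path undecoded.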
@szmarczak Thanks for the comment!
If you read the code, you will see that we extract the path component and then apply
Can you provide a reason to believe that the specification is correct? I am referring to RFC 3986, 2.3, which states that they should be decoded. https://datatracker.ietf.org/doc/html/rfc3986#section-2.3
Also, |
I don't understand the question. The specification is always correct because it is the definition.
No, it does not state that.
We're not talking about |
https://www.rfc-editor.org/rfc/rfc9110.html#name-https-normalization-and-com
https://www.rfc-editor.org/rfc/rfc3986#section-2.2
I hope this clarifies. This means you must not decode |
@szmarczak
However, "apply decodeURI() -> split into segments -> apply decodeURIComponent() for each segment" doesn't work.
decodeURI("%2525a/b/c").split("/").map(decodeURIComponent)
The expected result should be as follows,
In fact, it would look like this.
If we want to get the correct result on a per-segment basis, we would need to return an array from getPath() as follows
return "%2525a/b/c".split("/").map(decodeURIComponent)
However, the Hono router does not understand the unit of “segment”. |
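To make the double-decoding problem concrete, here is a small sketch that runs both pipelines on the input from the comment above (the variable names are illustrative only):

```typescript
const doublyEncoded = '%2525a/b/c'

// Pipeline 1: decodeURI() first, then decodeURIComponent() per segment.
// "%2525a" is decoded twice: "%2525a" -> "%25a" -> "%a".
const decodedTwice = decodeURI(doublyEncoded)
  .split('/')
  .map((segment) => decodeURIComponent(segment)) // ['%a', 'b', 'c']

// Pipeline 2: split first, then decode each segment exactly once.
const decodedOnce = doublyEncoded
  .split('/')
  .map((segment) => decodeURIComponent(segment)) // ['%25a', 'b', 'c']
```

The second pipeline gives the per-segment result, but it has to return an array rather than a single path string.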
No worries! I'm glad I could clarify.
[edit: removed because the example is right] |
I believe implementing memoization could improve the speed. It also lets normal paths skip straight to a fast parse.
/**
* Parse the `req` URL with memoization.
*
* @param {ServerRequest} req
* @return {Object|undefined}
* @public
*/
function parseurl(req) {
const url = req.url;
if (url === undefined) {
// URL is undefined
return undefined;
}
const parsed = req._parsedUrl;
if (isFresh(url, parsed)) {
// Return cached URL parse
return parsed;
}
// Parse the URL
const newParsed = fastparse(url);
newParsed._raw = url;
return (req._parsedUrl = newParsed);
}
/**
* Parse the `req` original URL with fallback and memoization.
*
* @param {ServerRequest} req
* @return {Object|undefined}
* @public
*/
function originalurl(req) {
const url = req.originalUrl;
if (typeof url !== 'string') {
// Fallback
return parseurl(req);
}
const parsed = req._parsedOriginalUrl;
if (isFresh(url, parsed)) {
// Return cached URL parse
return parsed;
}
// Parse the URL
const newParsed = fastparse(url);
newParsed._raw = url;
return (req._parsedOriginalUrl = newParsed);
}
/**
* Parse the `str` URL with a fast-path shortcut.
*
* @param {string} str
* @return {Object}
* @private
*/
function fastparse(str) {
if (typeof str !== 'string' || str.charCodeAt(0) !== 0x2f /* / */) {
return decodeURIToUrlObject(str);
}
let pathname = str;
let query = null;
let search = null;
// Unroll the regexp for performance optimization
for (let i = 1; i < str.length; i++) {
switch (str.charCodeAt(i)) {
case 0x3f: /* ? */
if (search === null) {
pathname = str.substring(0, i);
query = str.substring(i + 1);
search = str.substring(i);
}
break;
case 0x09: /* \t */
case 0x0a: /* \n */
case 0x0c: /* \f */
case 0x0d: /* \r */
case 0x20: /* */
case 0x23: /* # */
case 0xa0: /* non-breaking space */
case 0xfeff: /* BOM */
return decodeURIToUrlObject(str);
}
}
const url = {};
url.path = str;
url.href = str;
url.pathname = pathname;
if (search !== null) {
url.query = query;
url.search = search;
}
return url;
}
/**
* Decode the URI and convert it to a URL object.
*
* @param {string} uri
* @return {Object}
* @private
*/
function decodeURIToUrlObject(uri) {
const decodedURI = decodeURI(uri);
const url = new URL(decodedURI);
return {
href: decodedURI,
path: url.pathname + (url.search || ''),
pathname: url.pathname,
search: url.search,
query: url.searchParams.toString(),
};
}
/**
* Determine if parsed URL is still fresh.
*
* @param {string} url
* @param {object} parsedUrl
* @return {boolean}
* @private
*/
function isFresh(url, parsedUrl) {
return (
typeof parsedUrl === 'object' &&
parsedUrl !== null &&
parsedUrl._raw === url
);
}
const req = {
url: "localhost/foo%2Fbar/baz/ąę"
}
console.log(parseurl(req).path)
Result
{
href: 'localhost/foo%2Fbar/baz/ąę',
path: '/foo%2Fbar/baz/%C4%85%C4%99',
pathname: '/foo%2Fbar/baz/%C4%85%C4%99',
search: '',
query: '',
_raw: 'localhost/foo%2Fbar/baz/ąę'
} |
@fzn0x Thanks for the suggestion! First of all, I don't think the cache will speed up the process, since we won't be calling getPath() multiple times for the same request object. Also, I don't think the code you suggested would perform well if it were not cached. If you still think it performs better, I would appreciate it if you could suggest it along with a benchmark. |
FYI I investigated how the major frameworks handle symbols and multibyte characters in routing definitions. The results vary considerably from framework to framework. I don't think "we should match the major frameworks," but rather that "if symbols and multibyte characters are not well supported in the major frameworks, it indicates that no one is having trouble in the production environment."
express
Symbols and multibyte characters do not appear to be available. Paths are treated as "/"-separated segments.
const express = require('express')
const app = express()
const port = 3333
// "|" has a special meaning, so use "^" to validate
app.get('/^hello^', (req, res) => {
res.send('^hello^')
})
app.get('/hello🔥', (req, res) => {
res.send('hello🔥')
})
app.get('/users/:id/action', (req, res) => {
res.send('users action')
})
app.listen(port, () => {
console.log(`Example app listening on port ${port}`)
})
Ruby on Rails
Only percent-encoded URLs are supported; unencoded URLs will result in a 404. Paths are treated as "/"-separated segments.
Django
Before routing, all characters are decoded in a decodeURIComponent() equivalent process, showing similar results to #2688.
|
If what we need is a result similar to Ruby on Rails, then the following changes will accomplish this. (Actually, it's a bit more complicated, but it's "encode when adding a routing.")
diff --git a/src/hono-base.ts b/src/hono-base.ts
index 293d3874..d4cb8071 100644
--- a/src/hono-base.ts
+++ b/src/hono-base.ts
@@ -290,7 +290,7 @@ class Hono<
private addRoute(method: string, path: string, handler: H) {
method = method.toUpperCase()
- path = mergePath(this._basePath, path)
+ path = encodeURI(mergePath(this._basePath, path))
const r: RouterRoute = { path: path, method: method, handler: handler }
this.router.add(method, path, [handler, r])
this.routes.push(r) |
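The "encode when adding a routing" idea can be sketched with a toy route table. This is not Hono's router; `routeTable`, `addRoute`, and `matchRaw` are hypothetical names used only to show why the raw request path can then be matched without decoding.

```typescript
// Hypothetical minimal route table keyed by the percent-encoded route path.
const routeTable: Record<string, string> = {}

// Encode once, at registration time.
const addRoute = (routePath: string, label: string): void => {
  routeTable[encodeURI(routePath)] = label
}

// Match the raw (still percent-encoded) request path without decoding it.
const matchRaw = (rawRequestPath: string): string | undefined =>
  Object.prototype.hasOwnProperty.call(routeTable, rawRequestPath)
    ? routeTable[rawRequestPath]
    : undefined

addRoute('/główna', 'home page')

const encodedHit = matchRaw('/g%C5%82%C3%B3wna') // 'home page'
const unencodedMiss = matchRaw('/główna') // undefined
```

As with the Rails result above, only the percent-encoded form matches. An exact string match like this would also miss a request that uses lowercase hex digits (`%c5%82`), since encodeURI() emits uppercase hex.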
Yes, that's what I mean, sorry for the example, I think every framework does different things to deal with paths, but Django looks good! |
If the expected behavior is equivalent to Ruby on Rails, I think #2711 can achieve this. I feel this is better than #2688 because it is less burdensome at the request time. However, I wonder if such routing is ever used in a production environment. If not, I think we can consider the option of "not supporting it". |
I created #2714 with great support from @szmarczak! Although #2711 is hard to discard for its performance advantage, at this point I consider this new #2714 to be the best. The reasons are as follows
Although I am concerned about the following points.
Also, there is still the issue of how to handle @yusukebe Looking at the discussion so far, what are your thoughts? |
Thank you for the discussion, and I'm sorry for the misunderstanding at first. I like #2714! It achieves ideal path handling, and the performance is good.
I also think it should return an error. When should it return the error? Can it return an error during the registration phase? |
@usualoma If someone sends UTF-8 (non-ASCII) path over the wire, it's incorrect as the URI spec permits only ASCII. UTF-8 characters must be percent-encoded. So the
The performance should be the same. The least performant way is when someone sends
Well, (unfortunately!) the URI spec does not define the encoding percent-encoded characters have to use (they are just bytes). It's just that the browser APIs like Whether you return |
Having developed a Node.js HTTP client (sorry for bragging!), I can definitely say that people widely send UTF-8 encoded characters (well, some even send non-UTF-8 but that's minority and we were enforcing UTF-8 for years, we had to disable UTF-8 enforcing in order to support legacy servers using other encodings). |
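The point that percent-encoded characters are "just bytes" can be illustrated with a small sketch. `percentDecodeToBytes` is a hypothetical helper, and the inputs are made-up examples: `%C5%82` is "ł" in UTF-8, while `%B3` is "ł" in ISO-8859-2.

```typescript
// Hypothetical helper: turn a percent-encoded ASCII string into raw byte values.
const percentDecodeToBytes = (encoded: string): number[] => {
  const bytes: number[] = []
  for (let i = 0; i < encoded.length; i++) {
    const hex = encoded.slice(i + 1, i + 3)
    if (encoded.charAt(i) === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16))
      i += 2 // skip the two hex digits just consumed
    } else {
      bytes.push(encoded.charCodeAt(i)) // assumes the remaining characters are ASCII
    }
  }
  return bytes
}

const utf8Bytes = percentDecodeToBytes('g%C5%82') // [0x67, 0xc5, 0x82]
const latin2Bytes = percentDecodeToBytes('g%B3') // [0x67, 0xb3]

// decodeURIComponent() assumes UTF-8, so the ISO-8859-2 form throws URIError.
let latin2Throws = false
try {
  decodeURIComponent('g%B3')
} catch (e) {
  latin2Throws = e instanceof URIError
}
```

Both inputs are valid percent-encodings at the byte level; only the UTF-8 interpretation applied by decodeURIComponent() rejects the second one.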
Difference in performance between #2711 and #2714
I think the difference between the two at request time is indicated by the following part of the benchmark: for bun, #2714 is slightly faster; for deno and node, #2711 is faster. (This is a short URL benchmark, not including the %)
Who is affected by this modification?
Even hono@v4.3.7 can process URLs containing percent encodings with the following routing
app.get('/users/:id', (c) => c.text(`id is ${c.req.param('id')}`))
What cannot be handled correctly is the case where "the routing definition contains symbols or multibyte characters".
app.get('/|', async (c) => c.text('ayy'));
I would like to merge #2714 because it is more correct for the framework to be able to handle these cases. However, I don't think there are any realistic users who would define such a routing for a web service endpoint that is widely accessed by various clients.
Handle "URIError: URI malformed"
@szmarczak
You are right, regarding the former, I think it would be best if the framework does not make it an error, but in a way that the user can handle it himself. Of course, I am considering this. |
It depends what you're developing. If you're developing an HTTP client, then you need to have a spec-compliant server to properly test the client. Or you just need static routes for simple mock server. Also, being a Polish guy, I can say there are websites exposing static routes like
I wouldn't worry about this. First, it's against the spec. Second, it's impossible to get this via HTML forms (because forms are spec compliant). Proper |
@szmarczak Thank you!
I agree with you on that point.
Yes, of course, a user trying to send a correct request would not send the Suppose we have a routing definition like this,
And suppose we have the following URL
I think it would be best if a request to this URL would result in an error before entering the router, rather than proceeding to fallback.
As I said in a previous comment, I think #2714 has the advantage of preventing this. If this feature is compromised, I think #2711 would be sufficient.
About #2711
As shown in the following test, I would expect #2711 to return the expected result for processing |
Oh, by the way, there is another issue that is difficult to handle in #2714 in terms of spec compliance. Suppose we have a routing definition like this,
app.get('/główna/:path', (c) => c.text('główna'))
And suppose we have the following URL
I think this is a spec-compliant URL, but it is difficult to get it routed in #2714 (while maintaining the spec of accepting non-UTF-8 encodings). #2711 would accomplish this. |
I've come to the conclusion that with a few adjustments #2714 should be fine. Wait a minute. |
That's a very good find! In this case it makes more sense to return I believe there's no difference in terms of the result. I'd choose what's more performant (or maintainable) and link to the other possible solution if you ever needed to change the algorithm.
It is!
The |
@szmarczak Thanks right away! I was going to comment on this, but I'm late. Added process for invalid percent-encoding and invalid UTF-8 sequences. This will prevent the following URLs from also slipping through the routing
(As was the case before this PR.) If an encoding other than UTF-8 is used, a URIError is thrown when trying to retrieve it with Invalid percent encodings, such as |
@szmarczak Thanks again for all the thought-provoking comments so far, and for your thoughtful answers to all my questions.
#2714 or #2711
I think #2711 is often marginally better in terms of performance. However, #2714 has more room for
app.get('/:path{home|główna|ホーム}')
I think #2714 is a good choice because of these various good points. |
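One way to read the regex-constrained route above: if the path segment is decoded before matching, the pattern can be written with the literal characters. The sketch below assumes that reading; `allowedSegment` and `matchesDecoded` are hypothetical names and do not reflect how Hono compiles `:path{...}` internally.

```typescript
// Hypothetical stand-in for the alternation in '/:path{home|główna|ホーム}'.
const allowedSegment = /^(?:home|główna|ホーム)$/

// Decode the raw segment first, then test it against the literal pattern.
const matchesDecoded = (rawSegment: string): boolean =>
  allowedSegment.test(decodeURIComponent(rawSegment))

const polishMatches = matchesDecoded(encodeURIComponent('główna')) // true
const japaneseMatches = matchesDecoded(encodeURIComponent('ホーム')) // true

// Without decoding, the percent-encoded segment does not match the literal pattern.
const rawMatches = allowedSegment.test(encodeURIComponent('ホーム')) // false
```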
It looks good to me! The implementation of Shall we go with it? |
@yusukebe Thanks for the confirmation. |
Let's go with it. I don't have any suggestions; I trust everyone here. :) |
What version of Hono are you using?
4.3.6
What runtime/platform is your app running on?
Deno
What steps can reproduce the bug?
Deno/Hono
Node
What is the expected behavior?
ayy
What do you see instead?
404 Not Found
Additional information
No response