Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL Decoding on server is not correct #18

Closed
olivercoad opened this issue Aug 11, 2019 · 6 comments · Fixed by #21
Closed

URL Decoding on server is not correct #18

olivercoad opened this issue Aug 11, 2019 · 6 comments · Fixed by #21
Labels
bug Something isn't working worker Issues relating to the gpu worker

Comments

@olivercoad
Copy link
Contributor

olivercoad commented Aug 11, 2019

In order to handle any user input correctly (see #4), it has to be URL encoded before placing in the URL.
For example, in javascript, encodeURIComponent("asdf ") returns asdf%20%20.
This generates the following face.
Checkface for url encoding as asdf%20%20

Unfortunately, that's not the only way spaces (and probably other things) can be URL encoded.
For example, an HTML form request or System.Web.HttpUtility.UrlEncode will encode spaces as + rather than %20: System.Web.HttpUtility.UrlEncode("asdf ") returns asdf++.
This generates the following face, noticably different to above.
Checkface for url encoding as asdf++

Furthermore, it returns the same face for asdf%2B%2B as asdf++; %2B is the url encoding of +.

I expect the problem is to do with how flask automatically decodes parts of the path to pass as the hash parameter

@app.route('/api/<path:hash>', methods=['GET'])
def image_generation(hash):

Expected Behaviour:
The generated face for the url encodings as asdf%20%20 and asdf++ should both be the same (should both be the first image), corresponding to the user input of asdf .
The generated face for the url encoding as asdf%2B%2B should be different (it should be the second image), and correspond to the user input of asdf++.

@olivercoad
Copy link
Contributor Author

olivercoad commented Aug 11, 2019

Apparently github's issue markdown won't preserve multiple spaces. The above examples are for
"asdf "
without the quotation marks and with two spaces at the end. This is inconsequetial to the issue though - it doesn't matter where or how many spaces are used.

@olivercoad olivercoad changed the title URI Decoding on server is not correct URL Decoding on server is not correct Aug 11, 2019
@olivercoad olivercoad added bug Something isn't working worker Issues relating to the gpu worker labels Aug 11, 2019
@cdilga
Copy link
Contributor

cdilga commented Aug 13, 2019

I would have thought that any client bears responsibility of encoding correctly - if it's encoded differently as ++, then we should just respect whatever is sent. If there are spaces sent by the client as UrlEncoded %20%20 then I don't see a problem.

I'm not sure why we should expect asdf%20%20 and asdf++ to both be the same.

I was wondering what the consensus was on handling this, as I presume it would come up often, and this stack overflow post summarises that they have distinct purposes depending on where they are placed

@olivercoad
Copy link
Contributor Author

Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".

This is the problem, they are two different but both valid encodings of the same user input.

I'm not sure why we should expect asdf%20%20 and asdf++ to both be the same.

I partly agree because the url encoding is an implementation detail and from the user's point of view it makes no difference (we could just as easily base64 encode it or use any other encoding instead). However we do need the same user input to produce the same image regardless of the client, and currently that's not the case.

@olivercoad
Copy link
Contributor Author

olivercoad commented Aug 13, 2019

Actually even javascript's decodeURIComponent doesn't decode + to ... https://stackoverflow.com/a/20247435

@cdilga
Copy link
Contributor

cdilga commented Aug 13, 2019

Surely we can just advise a url encoding scheme for all clients in an API doc, which should resolve this issue from a dev perspective on new clients using our API, including our own client

@olivercoad
Copy link
Contributor Author

It gets worse...

We really do need to actually fix it on the server side, otherwise we can't start with a slash.
pallets/flask#900
Luckily it will be pretty easy to fix. I realised that of course flask isn't decoding a plus to escape because it's part of the path not the query parameter (I was thinking of it like encoding a query parameter because that's what we do for the frontend site). So we just need to put the encoded request in a query parameter instead of the path and it will work correctly.
Google Colab notebook for testing

We can still leave the original endpoint path working for legacy purposes (and it's a little more elegant for handwriting in img tags 😄)

@cdilga cdilga closed this as completed in c33a8e2 Aug 14, 2019
@cdilga cdilga mentioned this issue Aug 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working worker Issues relating to the gpu worker
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants