URL not handled/encoded correctly #1303

Open
ozh opened this Issue Apr 6, 2013 · 26 comments

ozh commented Apr 6, 2013

What steps will reproduce the problem?

  1. Enter the following long URL: http://domain.com/good space/
  2. The long URL is correctly stored as http://domain.com/good%20space/
  3. Now, enter the following long URL: http://domain.com/bad%20space/
  4. The long URL is incorrectly stored as http://domain.com/bad%2520space/

What is the expected output? What do you see instead?

What I expect is that YOURLS should be smart enough to know that '%20' is a space and should be left alone. Instead, what's happening is that YOURLS is converting the '%' to '%25' and therefore, the '%20' becomes '%2520'.

Perhaps there could be a check before cleaning up the long URL to detect '%20' or whatever else it could be (e.g. '%5C' for backslash, etc).
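
As a minimal sketch of the behaviour described above, the following uses PHP's standard `rawurlencode()` to show how the '%' of an existing escape gets encoded again. The helper `encode_unless_encoded()` is hypothetical (it is not a YOURLS function) and only illustrates the kind of check suggested here.

```php
<?php
// Encoding raw text once yields a valid escape for the space.
$once = rawurlencode('good space');

// Encoding text that is already encoded escapes the "%" itself,
// which is how "%20" turns into "%2520".
$twice = rawurlencode('bad%20space');

// Hypothetical guard: skip encoding when the segment already contains
// at least one valid %XX escape.
function encode_unless_encoded($segment) {
    if (preg_match('/%[0-9A-Fa-f]{2}/', $segment)) {
        return $segment;
    }
    return rawurlencode($segment);
}
```

Such a guard is only a heuristic: a segment that mixes raw and encoded characters (for example `a b%20c`) would be left untouched.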


This is a COPY of Issue 1303: %20 in long URL not handled correctly, filed on Google Code before the project was moved to GitHub.

LeoColomb commented Apr 10, 2013

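This looks like an encode/decode ordering problem.
I found two ways to fix it:

  • Make sure the URL is decoded before it is encoded:
$string = urldecode($string);
$string = urlencode($string);
save_in_db($string); // placeholder for the actual storage call
  • Check whether the string contains only characters that suggest it is already encoded, and encode it only if it does not. For example:
if (!preg_match("@^[a-zA-Z0-9%+_-]*$@", $string))
    $string = urlencode($string);
save_in_db($string);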
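
A possible variant of the first approach, sketched under the assumption that only the path of the URL needs normalizing: decoding and re-encoding each path segment separately keeps the `/` separators intact. The function name `normalize_path()` is hypothetical and not part of YOURLS.

```php
<?php
// Decode each path segment once, then encode it once, so that raw and
// already-encoded input end up in the same form.
function normalize_path($path) {
    $segments = explode('/', $path);
    $segments = array_map(function ($segment) {
        return rawurlencode(rawurldecode($segment));
    }, $segments);
    return implode('/', $segments);
}
```

With this sketch, both `/good space/` and `/bad%20space/` come out with a single `%20`.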

adigitalife commented Apr 11, 2013

I'm not sure how to fix this properly in the code but I've implemented a crude workaround by creating a plugin:

yourls_add_filter( 'sanitize_url', 'fix_long_url' );
// Undo double encoding for a fixed list of characters once YOURLS has sanitized the URL.
function fix_long_url( $url, $unsafe_url ) {
    // Double-encoded sequences and their single-encoded equivalents, in matching order.
    $search = array ( '%2520', '%2521', '%2522', '%2523', '%2524', '%2525', '%2526', '%2527', '%2528', '%2529', '%252A', '%252B', '%252C', '%252D', '%252E', '%252F', '%253D', '%253F', '%255C', '%255F' );
    $replace = array ( '%20', '%21', '%22', '%23', '%24', '%25', '%26', '%27', '%28', '%29', '%2A', '%2B', '%2C', '%2D', '%2E', '%2F', '%3D', '%3F', '%5C', '%5F' );
    $url = str_ireplace( $search, $replace, $url );
    return yourls_apply_filter( 'after_fix_long_url', $url, $unsafe_url );
}
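
The fixed lookup table above only covers the listed characters. As a rough, untested-in-YOURLS sketch, the same idea could be generalized with a regular expression that strips one level of double encoding from any escape. The function name `collapse_double_encoding()` is hypothetical.

```php
<?php
// Turn every "%25XX" (a percent-encoded "%" followed by two hex digits)
// back into "%XX", removing exactly one level of double encoding.
function collapse_double_encoding($url) {
    return preg_replace('/%25([0-9A-Fa-f]{2})/', '%$1', $url);
}
```

Like the plugin, this cannot tell accidental double encoding from a URL that legitimately contains the literal text `%20` encoded as `%2520`, so it would alter such URLs as well.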

ColinQi commented Apr 28, 2013

It works with the code above, but the problem remains when using the API.
I am using the WordPress plugin pluginbuddy-yourls.
If the URL includes the code below, it is stored incorrectly.

code:
/t?e=zGU34CA7K%2BPkqB07S4%2FK0CITy7klxxrJ35Nnc0iK%2FBdaKhVAmXafYUmD5KGJ2oqiWhej1Yse2lEI%2FXHtNGaPeSdAuj9tBnnfDunBLi4cn5tl2g%3D%3D
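
One possible explanation, which this thread does not confirm, is that the API client passes the long URL as a query parameter without encoding it for transport, so the server-side decoding of the request strips the original escapes. The sketch below uses only standard PHP functions. The parameter names `action`, `format` and `url` are assumptions about the API request, and the long URL is a shortened, hypothetical stand-in for the one above.

```php
<?php
// Hypothetical long URL that already contains escapes such as %2B and %2F.
$long_url = 'http://example.com/t?e=zGU34CA7K%2BPkqB07S4%2FK0C';

// Client side: http_build_query() encodes the value once for transport,
// so "%2B" travels as "%252B".
$query = http_build_query(array(
    'action' => 'shorturl',
    'format' => 'json',
    'url'    => $long_url,
));

// Server side: parsing the query string decodes exactly once,
// which restores the original long URL.
parse_str($query, $received);

// If the client skips the transport encoding, the same single decode
// turns "%2B" into "+" and "%2F" into "/".
parse_str('url=' . $long_url, $received_unencoded);
```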

LeoColomb commented Apr 28, 2013

@ColinQi Please view my PR #1365.

ColinQi commented Apr 29, 2013

@LeoColomb I just can't understand it: when I go to yourls/admin/ and add such an encoded URL, it is stored correctly,
but when I use the API to add the same encoded URL, it is stored incorrectly.

uwma commented Sep 5, 2013

Thanks to @adigitalife! His solution may not be the most elegant one, but it works!
http://en.uw.ma/pYrs

innov8ion commented Sep 17, 2013

I was surprised to hear about this issue so long after 1.6 was released but glad to find a temporary fix. I created and activated the following plugin per adigitalife's info on this thread. Thanks, and will be watching for a more permanent fix.

Meanwhile, should this plugin be placed in the official plugin list here to help others? https://github.com/YOURLS/YOURLS/wiki/Plugin-List

ozh commented Sep 18, 2013

I've just committed something that should fix that. Please try to break and report :) Re-open this issue if needed.

@ozh ozh closed this Sep 18, 2013

innov8ion commented Sep 19, 2013

Very nice, ozh. Is this the one commit you made to address this? 59dff6b

ozh commented Sep 20, 2013

LeoColomb commented Mar 14, 2016

Definitely, there is an issue with encoding. It may need a full rewrite.

@LeoColomb LeoColomb reopened this Mar 14, 2016

markwaters commented May 23, 2016

Not sure if this is related, but when I try to shorten a link with a percent sign in it, for example:
http://erinjo.xyz/content/all?q=%23Eurovision
then when I visit/expand the link, it changes to:
http://erinjo.xyz/content/all?q=#Eurovision
which doesn't work as expected.
HTH.
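
To illustrate why the two URLs above behave differently, here is a small sketch using PHP's standard `parse_url()` and `parse_str()`. It only shows how the URLs are interpreted and makes no claim about where YOURLS performs the conversion.

```php
<?php
// With "%23", the hash sign is part of the query value.
$encoded = parse_url('http://erinjo.xyz/content/all?q=%23Eurovision');

// With a literal "#", everything after it becomes the fragment, which
// browsers do not send to the server, and "q" is left empty.
$literal = parse_url('http://erinjo.xyz/content/all?q=#Eurovision');

// Decoding the encoded query gives the value the target site expects.
parse_str($encoded['query'], $params);
```

Here `$params['q']` is `#Eurovision`, while the second URL carries an empty `q` and a fragment of `Eurovision`.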

adigitalife commented May 23, 2016

@markwaters I think it's tricky if the actual URL contains '%23' because it's treated as the same as '#', so it gets converted automatically. There's been a discussion that maybe YOURLS should not do any kind of encoding at all. I'm not sure what the current status of that is.

ayyoovod commented Aug 2, 2016

In my case, "%2F" is being replaced by "/" and "%2B" is being replaced by "+". This appears to happen in the formatting code (functions-formatting.php), but I can't find how to change it. I tried the plugin and it is still not working. Any guidance is appreciated.

JeffreyDunster commented Dec 1, 2016

I have a similar problem with Microsoft OneDrive links. They often include %2... and %3... in their hashed IDs, which are then converted to other characters, like /, +, " ", etc.

maustyle commented Mar 29, 2018

Hello, I still have the same issue. I have tried installing the plugin:

Fix Long URLs
https://github.com/adigitalife/yourls-fix-long-url/

to no avail.

adigitalife commented Mar 29, 2018

It depends on the exact characters in the URL that you have a problem with. My plugin only looks at some specific characters. If your problematic characters are not in the plugin, you'll need to modify the plugin to add them.

maustyle commented Apr 9, 2018

thank you !!

PopVeKind commented Apr 9, 2018

I've just read through this (current and open) thread and it seems to be evolving with different problems. I am after a better understanding of the workings of YOURLS and have a few questions about this topic.

Why Encode?

  1. The Long URL does not originate with YOURLS, so why, for any reason, does YOURLS change it?
  2. Is it not a requirement for the source (human or program) to deliver a properly working Long URL?

If an API or a Plugin or Microsoft OneDrive provides a Long URL that does not work, is it YOURLS's job to fix it? If I typed http://vekind.com/ instead of http://vekind.org/, is it the responsibility of YOURLS to correct .com to .org? Why not use just what is supplied?

Decoding

For the exact same reason that YOURLS should not encode, YOURLS should not decode. If I typed http://vekind.org/bad%20url as input, it is not YOURLS's job to try to figure out what I meant. The same applies if it comes from an API, Bookmark program, or Plugin. It is the responsibility of the sending program to send the correct Long URL, and all YOURLS should do is save it and use it as sent!

Database etc.

Database inputs should be screened for SQL injections, etc. However, this is not at all the same as URL encoding.

If a program (or human) provides the wrong information, should we not just make a note that program has a bug? So the Plugin or API or Bookmark program can be fixed?

The Real Solution

It seems to me the real solution for these errors (that fucking URL encoding problem) is to neither decode the URL nor encode the URL.
The most logical solution is to expect (demand) that the sending program (API, Bookmark, Plugin, or Human), sends the CORRECT Long URL and then YOURLS should simply use what was sent, AS-IS!

Real Reasons

Can any YOURLS Developer explain why YOURLS should try to fix wrong inputs from external sources? Why not fix the problem at the source Plugin, Bookmark, or API?

PopVeKind commented Apr 9, 2018

@ozh @LeoColomb - Please review this logic?

Suggested Solution

  1. I would suggest that all URL encode and URL decode be removed from the Core Code.
  2. URL decode and URL encode are not needed for requesting programs that send a correct LongURL.
  3. As decode and encode are not needed on all sites, they can rightly be classified as Code Bloat.
  4. URL decode and URL encode should be offered as a Plugin for those people who need it to compensate for poorly constructed API, Plugin, or Bookmark programs that do not supply a working LongURL.

Second Suggestion

If moving this out of Core Code Bloat and into a Plugin is not desired, would it be acceptable to add a Core Option to disable all URL encode and URL decode code?

PopVeKind referenced this issue Apr 9, 2018

Merge pull request #1504 from YOURLS/fix-decode
Fix multiple encoded URLs

ozh commented May 10, 2018

There is a need to encode or decode because depending on the context, URLs supplied are, or are not, raw text.

  • URL entered in the text box and click "Shorten"? Don't try to encode or decode, just use what's provided
  • but URL supplied via "prefix and shorten" (eg http://sho.rt/http://omglongurl/some~funk~chars?) will be coded
  • URL supplied via bookmarklet will be coded too, by the browser

mackaaij commented Jul 13, 2018

I think I ran into this issue when adding this long url: https://worcade.stackstorage.com/s/MzCWEihRfYldw5X?dir=/Terms of Service

If I leave the automatically generated shortcode, the shortlink works. If I customize the shortcode to 'terms', the shortlink breaks...

Jorn suggested a workaround: I now shortened the link with bit.ly and then "shortened" the link with our custom domain using Yourls.

PopVeKind commented Aug 16, 2018

@ozh I just saw your post today. It seems we can use a little logic to sort this out at the point YOURLS receives the long URL.

Decode Everything

First off, how about decoding everything upon receipt? If the text box input is not encoded, the decoded output would be the same. Someone might copy/paste an encoded URL into the text box too. So it seems everything should be decoded upon receipt by YOURLS.
I would also encode everything just before saving to the database.

Double encoding

The problem seems to be double encoding without decoding. Double decoding or decoding unencoded text is not a problem.
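
A minimal sketch of "decoding more than once", assuming PHP's standard `rawurldecode()`: the hypothetical helper `decode_fully()` decodes until the string stops changing, with a cap on the number of rounds. It is not taken from the YOURLS code base.

```php
<?php
// Decode repeatedly until the result is stable or the round limit is hit.
function decode_fully($url, $max_rounds = 5) {
    for ($i = 0; $i < $max_rounds; $i++) {
        $decoded = rawurldecode($url);
        if ($decoded === $url) {
            break; // Nothing left to decode.
        }
        $url = $decoded;
    }
    return $url;
}
```

Decoding text with no escapes leaves it unchanged, as described above. One caveat: a URL whose intended content includes a literal escape sequence (for example the text `%20` itself) would also be decoded, so repeated decoding is not always harmless.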

Encode Everything

By encoding everything just before saving in the database, it reverses the first decode and makes the URL Internet ready. Encoding an unencoded URL would have no effect on its functionality.

Examples

  • Enter the following long URL: http://domain.com/good space/
  • The long URL is correctly stored as http%3A%2F%2Fdomain.com%2Fgood%20space%2F
  • Now, enter the following long URL: http%3A%2F%2Fdomain.com%2Fgood%20space%2F
  • The long URL is correctly stored as http%3A%2F%2Fdomain.com%2Fgood%20space%2F
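
The examples above can be sketched with PHP's standard `rawurldecode()` and `rawurlencode()`. The function name `normalize_long_url()` is hypothetical, and this is only an illustration of the proposal, not code from YOURLS.

```php
<?php
// Decode on receipt, then encode everything just before storage.
function normalize_long_url($url) {
    return rawurlencode(rawurldecode($url));
}

$from_raw     = normalize_long_url('http://domain.com/good space/');
$from_encoded = normalize_long_url('http%3A%2F%2Fdomain.com%2Fgood%20space%2F');

// Caveat: an encoded slash and a literal slash become indistinguishable.
$with_escape  = normalize_long_url('http://domain.com/a%2Fb');
$with_slash   = normalize_long_url('http://domain.com/a/b');
```

Both inputs from the examples produce the same stored string. The last two lines show a trade-off of this design: `a%2Fb` and `a/b` normalize to the same value, so the distinction between them is lost, and the stored value must be decoded again before it can be used as a redirect target.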

PS

  • This is logic based on the comments on this thread.
  • I have not (yet) coded this onto a working YOURLS site.

@LeoColomb LeoColomb pinned this issue Dec 15, 2018

rinogo commented Jan 3, 2019

Update

I'll leave this here in case it helps someone. I've discovered the issue, and it's unrelated to this thread. The problem was the "Keep Query String" plugin, which was activated on prod, but not on dev.

Carry on. :)


Just to add to this issue - I haven't read the entire thread, but hopefully this adds to the discussion.

I have two configurations, both essentially identical, for our dev and prod systems. The prod system produces URLs in which URL parameters have slashes encoded as %252F (double-encoded).

The dev system produces links that work just fine. That is, slashes in URL parameters are completely unencoded - they appear as /.

The difference between these two systems is minor. Both systems run CentOS and an essentially identical stack. Both systems run version 1.7.3 of YOURLS. I'm thinking it could be a difference in DB configurations or maybe due to the fact that the dev server runs Apache and the prod server runs Litespeed.

The most curious wrinkle on this entire mess is that the url that is stored in the YOURLS database is identical on both dev and prod. In other words, this means that the problem is somewhere in the code/stack responsible for converting a shorturl into a longurl.

Test cases

We see the same behavior via the GUI and when using the API.

longurl (input): https://test.com/narf blah.html?a=hello there&b=whatever/you/want

stored longurl in the database on both dev and prod: https://test.com/narf%20blah.html?a=hello%20there&b=whatever/you/want

converted shorturl (output) on dev (functions as desired): https://www.test.com/narf%20blah.html?a=hello%20there&b=whatever/you/want

converted shorturl (output) on prod (double-encodes slashes): https://www.test.com/narf%20blah.html?a=hello%2Bthere&b=whatever%252Fyou%252Fwant

PopVeKind commented Jan 6, 2019

@rinogo - That's good intel Rich,

the URL that is stored in the YOURLS database is identical on both dev and prod

The URL spaces were encoded as stored in the database, compared to the input.

At some point, the domain name changes and adds a www. Why? Is that a YOURLS change?

Maybe we should start gathering data on sites that are failing? Example:

Are all sites that fail running under CentOS? Debian? RedHat? FreeBSD? or Ubuntu?
Are all sites that fail running on Litespeed?
Are all sites that fail running PHP5? PHP7?
Are all sites that fail running MariaDB? MySQL?

What sites are NOT failing?

I run without this problem in:
Ubuntu/Apache/MySQL/php5
Ubuntu/Apache/MySQL/php7
Debian8/Nginx/MariaDB/php7

The error does seem to be after the long URL is retrieved from the DB.
Q. Why are we encoding anything AFTER we retrieve the URL from the Database?

Perhaps if we discover what the sites that fail have in common, it will lead to why some sites fail?

rinogo commented Jan 7, 2019

Hi, @PopVeKind! I think that's a great idea! I can double-check my setup, but I think everything is working fine since I discovered that the discrepancy was due to a plugin that was installed on prod but not on dev. (I added an update to my post after the fact - perhaps you're working off of email notifications).

Regardless, I'll keep an eye on it; if we have problems, I'll definitely report back here. Thanks to you all (contributors and users alike) for making YOURLS awesome!
