New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL regex doesn't recognise important TLDs like .cat #3

Closed
FauxFaux opened this Issue Oct 9, 2011 · 7 comments

Comments

5 participants
@FauxFaux
Owner

FauxFaux commented Oct 9, 2011

Hovering URLs like http://nyan.cat/ and clicking take you to http://nyan.ca/ , as the .cat TLD isn't recognised.

TLD specific code should probably be removed anyway, due to the introduction of arbitrary TLDs.

@ljani

This comment has been minimized.

ljani commented Jan 25, 2012

There are also other problems with the regex. Hosts such as http://localhostr.com get detected as http://localhost/

@joosera

This comment has been minimized.

joosera commented Jan 29, 2012

Also .pro is affected.
.[a-zA-Z][a-zA-Z] could possibly be changed to something like .[a-zA-Z]{3} (not sure what regex markup putty tray uses, this didn't seem to work when i manually edited the regex, though it didn't break the old behaviour of only hilighting the .pr part so not really sure what's going on there)

@wodim

This comment has been minimized.

wodim commented Oct 17, 2012

Bump.

FauxFaux added a commit that referenced this issue Apr 30, 2013

FauxFaux added a commit that referenced this issue Apr 30, 2013

@incognico

This comment has been minimized.

incognico commented May 1, 2013

Why not just simplify the regexp? The current one leads to nowhere. Why not use something like (pcre syntax that is) \b((?:[hH][tT][tT][pP]://|[wW][wW][wW]\.)[^\s)'"]+) or even \b((?:[a-zA-Z]+://|[wW][wW][wW]\.)[^\s'")]+) for any protocol? I don't know if the used libary can do word boundaries but in general this regexp needs to be minimal. Imagine the new gTLDs are starting :p

@FauxFaux

This comment has been minimized.

Owner

FauxFaux commented May 1, 2013

The aim of the regex has previously been to match only plausible URLs, not anything that could conceivably be a URL; I'm guessing this is what people expect (it's what I expect).

It's customisable; if you want to use something more liberal feel free.

Also, no, the regex engine supports basically nothing. #4 is to fix that, but it's hard work.

@incognico

This comment has been minimized.

incognico commented May 2, 2013

Everything that conceivably is a URL is also a plausible URL in my opinion.

So, if anyone is interested:
I am now using (([a-zA-Z]+://|[wW][wW][wW]\.)[^ '")>]+).

In contrast to the very complex default regexp which needs a lot of maintence it allows me to click the following URIs:

www.example.newgtld
http://foo.intern
http://hostname
http://user:pass@example.com:8081/foo
whatever://protocol

I also giggled at * @(#)regexp.c 1.3 of 18 April 87

FauxFaux added a commit that referenced this issue Jun 1, 2013

FauxFaux added a commit that referenced this issue Jun 1, 2013

FauxFaux added a commit that referenced this issue Jul 13, 2013

FauxFaux added a commit that referenced this issue Jul 13, 2013

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: Remove the old browser detection code
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.
@FauxFaux

This comment has been minimized.

Owner

FauxFaux commented Jul 14, 2013

I've added a default option of @incognico's suggestion, and liberalised the "classic" default a bit.

I've additionally removed the nasty browser detection code, as this was just generally broken even launching urls like "www.google.com".

@FauxFaux FauxFaux closed this Jul 14, 2013

FauxFaux added a commit that referenced this issue Jul 14, 2013

FauxFaux added a commit that referenced this issue Jul 14, 2013

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: Remove the old browser detection code
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.

FauxFaux added a commit that referenced this issue Aug 6, 2013

FauxFaux added a commit that referenced this issue Aug 6, 2013

FauxFaux added a commit that referenced this issue Aug 6, 2013

GH-3: Remove the old browser detection code
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.

FauxFaux added a commit that referenced this issue Aug 7, 2013

FauxFaux added a commit that referenced this issue Aug 7, 2013

FauxFaux added a commit that referenced this issue Aug 7, 2013

GH-3: Remove the old browser detection code
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.

FauxFaux added a commit that referenced this issue Aug 11, 2013

FauxFaux added a commit that referenced this issue Aug 11, 2013

FauxFaux added a commit that referenced this issue Aug 11, 2013

GH-3: Remove the old browser detection code
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment