URL regex doesn't recognise important TLDs like .cat #3

Closed
FauxFaux opened this Issue Oct 9, 2011 · 7 comments

Owner

FauxFaux commented Oct 9, 2011

Hovering over URLs like http://nyan.cat/ and clicking takes you to http://nyan.ca/, as the .cat TLD isn't recognised.

TLD specific code should probably be removed anyway, due to the introduction of arbitrary TLDs.
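
A minimal Python sketch of this failure mode. The pattern is a hypothetical, heavily reduced stand-in for a TLD-list regex (the project's actual default regex is not quoted in this thread): known TLDs are enumerated, with a two-letter fallback for country codes.

```python
import re

# Hypothetical stand-in for a TLD-list URL regex, not the project's real one.
# Known TLDs are enumerated; anything else falls back to two letters.
TLD_LIST_URL = re.compile(
    r"http://[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|org|net|[a-z][a-z])"
)

def detect(text):
    """Return the first substring the pattern treats as a URL, or None."""
    m = TLD_LIST_URL.search(text)
    return m.group(0) if m else None

print(detect("see http://nyan.cat/ for details"))  # http://nyan.ca
print(detect("see http://example.com/ for details"))  # http://example.com
```

Because .cat is not in the list, the engine backtracks and settles for the two-letter fallback, which produces the truncated nyan.ca.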

ljani commented Jan 25, 2012

There are also other problems with the regex. Hosts such as http://localhostr.com get detected as http://localhost/
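
A small Python sketch of how this kind of mismatch can arise. Both patterns are hypothetical simplifications, assuming the regex has a bare "localhost" alternative that is tried before the general hostname alternative; with ordered alternation in a backtracking engine, the first alternative that succeeds wins.

```python
import re

# Hypothetical: bare "localhost" is tried first and nothing checks what follows.
LOCALHOST_FIRST = re.compile(r"http://(?:localhost|[a-z0-9-]+(?:\.[a-z0-9-]+)+)")

# Same alternatives, reordered so the dotted hostname gets the first attempt.
DOTTED_FIRST = re.compile(r"http://(?:[a-z0-9-]+(?:\.[a-z0-9-]+)+|localhost)")

def detect(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(detect(LOCALHOST_FIRST, "http://localhostr.com"))  # http://localhost
print(detect(DOTTED_FIRST, "http://localhostr.com"))     # http://localhostr.com
print(detect(DOTTED_FIRST, "http://localhost/"))         # http://localhost
```

Reordering needs no lookahead or word-boundary support, which matters if the regex engine in use is very limited.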

joosera commented Jan 29, 2012

Also .pro is affected.
.[a-zA-Z][a-zA-Z] could possibly be changed to something like .[a-zA-Z]{3}. I'm not sure what regex syntax PuTTY Tray uses; this didn't seem to work when I manually edited the regex, though it didn't break the old behaviour of only highlighting the .pr part, so I'm not really sure what's going on there.
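
For comparison, here is how the two forms behave in Python's `re` module, with the dot escaped. This is only a sketch of the regex semantics: the engine used by the terminal may not support `{n}` repetition at all, which would be consistent with the edit appearing to have no effect. The `{2,}` variant is an extra assumption added here, not something proposed above.

```python
import re

two_letters = re.compile(r"\.[a-zA-Z][a-zA-Z]")  # current two-letter form
exactly_three = re.compile(r"\.[a-zA-Z]{3}")     # suggested replacement
two_or_more = re.compile(r"\.[a-zA-Z]{2,}")      # assumed alternative: any length >= 2

def tld(pattern, host):
    """Return the first TLD-like suffix the pattern finds in host, or None."""
    m = pattern.search(host)
    return m.group(0) if m else None

print(tld(two_letters, "example.pro"))    # .pr
print(tld(exactly_three, "example.pro"))  # .pro
print(tld(exactly_three, "example.ca"))   # None
print(tld(two_or_more, "example.ca"))     # .ca
print(tld(two_or_more, "nyan.cat"))       # .cat
```

Note that `{3}` on its own would stop matching two-letter country codes, so a range is the safer shape where the engine allows it.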

wodim commented Oct 17, 2012

Bump.

@FauxFaux FauxFaux added a commit that referenced this issue Apr 30, 2013

@FauxFaux FauxFaux GH-3: Reformat regex 199180d

Why not just simplify the regexp? The current one leads nowhere. Why not use something like (in PCRE syntax) \b((?:[hH][tT][tT][pP]://|[wW][wW][wW]\.)[^\s)'"]+) or even \b((?:[a-zA-Z]+://|[wW][wW][wW]\.)[^\s'")]+) for any protocol? I don't know if the library in use can do word boundaries, but in general this regexp needs to be minimal. Imagine when the new gTLDs start :p
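
A quick way to try both patterns is Python's `re` module, which accepts this PCRE-style syntax. The sample text is made up for illustration.

```python
import re

# The two patterns from the comment above, copied unchanged.
HTTP_OR_WWW = re.compile(r'''\b((?:[hH][tT][tT][pP]://|[wW][wW][wW]\.)[^\s)'"]+)''')
ANY_SCHEME = re.compile(r'''\b((?:[a-zA-Z]+://|[wW][wW][wW]\.)[^\s'")]+)''')

# Made-up sample: URLs wrapped in parentheses and quotes, plus a non-HTTP scheme.
text = 'docs (http://nyan.cat/) and "www.example.newgtld" or ftp://host/file'

print(HTTP_OR_WWW.findall(text))
# ['http://nyan.cat/', 'www.example.newgtld']
print(ANY_SCHEME.findall(text))
# ['http://nyan.cat/', 'www.example.newgtld', 'ftp://host/file']
```

As written, the first pattern only accepts the http:// and www. prefixes, so https:// and ftp:// URLs are picked up only by the second.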

Owner

FauxFaux commented May 1, 2013

The aim of the regex has previously been to match only plausible URLs, not anything that could conceivably be a URL; I'm guessing this is what people expect (it's what I expect).

It's customisable; if you want to use something more liberal feel free.

Also, no, the regex engine supports basically nothing. #4 is to fix that, but it's hard work.

Everything that conceivably is a URL is also a plausible URL in my opinion.

So, if anyone is interested:
I am now using (([a-zA-Z]+://|[wW][wW][wW]\.)[^ '")>]+).

In contrast to the very complex default regexp, which needs a lot of maintenance, it allows me to click the following URIs:

www.example.newgtld
http://foo.intern
http://hostname
http://user:pass@example.com:8081/foo
whatever://protocol
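
The list above can be checked against the pattern with a short Python sketch. This uses Python's `re` module rather than the terminal's own regex engine, so it only confirms what the pattern itself accepts.

```python
import re

# The simplified pattern from the comment above, copied unchanged.
LIBERAL_URL = re.compile(r'''(([a-zA-Z]+://|[wW][wW][wW]\.)[^ '")>]+)''')

uris = [
    "www.example.newgtld",
    "http://foo.intern",
    "http://hostname",
    "http://user:pass@example.com:8081/foo",
    "whatever://protocol",
]

# Every listed URI should be matched in full.
all_matched = all(LIBERAL_URL.fullmatch(uri) for uri in uris)
print(all_matched)  # True

# The excluded characters end the match, e.g. a closing angle bracket.
print(LIBERAL_URL.search("<http://hostname>").group(1))  # http://hostname
```

Since only space, quotes, `)` and `>` end a match, trailing sentence punctuation such as a full stop would be included in the clicked URI.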

I also giggled at * @(#)regexp.c 1.3 of 18 April 87

@FauxFaux FauxFaux added a commit that referenced this issue Jun 1, 2013

@FauxFaux FauxFaux GH-3: Reformat regex e5b5388

@FauxFaux FauxFaux added a commit that referenced this issue Jul 13, 2013

@FauxFaux FauxFaux GH-3: Reformat regex 8f9106d

@FauxFaux FauxFaux added a commit that referenced this issue Jul 14, 2013

@FauxFaux FauxFaux GH-3: Remove the old browser detection code
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.
cffd963
Owner

FauxFaux commented Jul 14, 2013

I've added a default option of @incognico's suggestion, and liberalised the "classic" default a bit.

I've additionally removed the nasty browser detection code, as it was generally broken, even when launching URLs like "www.google.com".

@FauxFaux FauxFaux closed this Jul 14, 2013

@FauxFaux FauxFaux added a commit that referenced this issue Jul 14, 2013

@FauxFaux FauxFaux GH-3: Reformat regex 534efe1

@FauxFaux FauxFaux added a commit that referenced this issue Jul 14, 2013

@FauxFaux FauxFaux GH-3: Remove the old browser detection code b515dab

@FauxFaux FauxFaux added a commit that referenced this issue Aug 6, 2013

@FauxFaux FauxFaux GH-3: Reformat regex c5ae841

@FauxFaux FauxFaux added a commit that referenced this issue Aug 6, 2013

@FauxFaux FauxFaux GH-3: Remove the old browser detection code 801efae

@FauxFaux FauxFaux added a commit that referenced this issue Aug 7, 2013

@FauxFaux FauxFaux GH-3: Reformat regex 18198fb

@FauxFaux FauxFaux added a commit that referenced this issue Aug 7, 2013

@FauxFaux FauxFaux GH-3: Remove the old browser detection code 6a44306

@FauxFaux FauxFaux added a commit that referenced this issue Aug 11, 2013

@FauxFaux FauxFaux GH-3: Reformat regex 781afde

@FauxFaux FauxFaux added a commit that referenced this issue Aug 11, 2013

@FauxFaux FauxFaux GH-3: Remove the old browser detection code a5a801c