URL detection #76

smaragdus · 2018-10-21T17:05:47Z

AlephNote cannot detect some URLs, screen:

ResophNotes detects the same URLs just fine, screen:

Example URLs:

URL 1
URL 2

Mikescher · 2018-10-23T13:57:59Z

and once again I update the URL regex :D

but thanks for the report - I think we slowly get to the point where all strange cases are included

--> references #76

smaragdus · 2018-10-23T14:29:52Z

I am "URL" champion. 🥇
If I come across another undetected URL, I will notify you.
I hesitated whether to open a new issue or to search for similar old issues, but I was not sure whether you would see a new post in an old issue, so I decided on a new one.

smaragdus · 2018-10-23T14:34:15Z

Off-topic
When you have time please have a look at this because it makes me desperate. :(

Mikescher · 2018-10-23T14:52:55Z

@smaragdus:

I created a new issue for this:
#77

That way it's easier for me to keep track of things, because closed issues don't appear in the github overview - even if they have new comments.

> I hesitated whether to open a new issue or to search for similar old issues, but I was not sure whether you would see a new post in an old issue, so I decided on a new one.

Yeah - new issues are always better :D - it's just more organized that way

smaragdus · 2018-10-24T10:46:21Z

@Mikescher

> Yeah - new issues are always better :D - it's just more organized that way

Fine, I will keep this in mind.

> I created a new issue for this:
> #77

Thank you very much!

smaragdus · 2018-10-24T15:58:55Z

@Mikescher

I am afraid that the URLs I mentioned above are not detected in version 1.6.23, screen:

Can you confirm this?

Mikescher · 2018-10-24T16:37:00Z

Ah damnit - that minus sign is not a minus sign but an EN_DASH:

EN_DASH, does not work:
https://en.wikipedia.org/wiki/Dungan_Revolt_(1862–77)

Normal Minus, does work:
https://en.wikipedia.org/wiki/Dungan_Revolt_(1862-77)

I guess I will add all the en/em dash characters to the regex too (perhaps the whole Unicode punctuation class)

By the way: What unholy browser do you all use that lets you copy such links?
If I try to copy the link I get a properly encoded one like that:

https://en.wikipedia.org/wiki/Dungan_Revolt_%281862%E2%80%9377%29

[Edit] Perhaps I should just give up on intelligent URL parsing and allow all characters except whitespace?
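
For illustration, the relationship between the two forms of the link can be reproduced with Python's standard library. This is only a sketch of percent-encoding in general, not of anything AlephNote does internally; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Page title as it appears in the "pretty" link; \u2013 is the EN DASH.
title = "Dungan_Revolt_(1862\u201377)"

# quote() percent-encodes the UTF-8 bytes of every character outside its
# safe set, which includes the parentheses and the en dash.
encoded = quote(title)
url = "https://en.wikipedia.org/wiki/" + encoded
print(url)

# Decoding restores the original title, en dash included.
assert unquote(encoded) == title
```

The encoded form contains only ASCII characters, which is why a regex restricted to ASCII punctuation matches it while the pretty form breaks off at the en dash.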

smaragdus · 2018-10-24T18:21:20Z

@Mikescher

> [Edit] Perhaps I should just give up on intelligent URL parsing and allow all characters except whitespace?

I do not know. Actually the dashes are tricky:

hyphen (-)
en dash (–)
em dash (—)

In fact even white spaces can be part of the URLs, a couple of examples:

https://sourceforge.net/projects/portableapps/files/FileZilla Portable/
https://sourceforge.net/projects/portableapps/files/KiTTY Portable/

The same URLs copied by the 'normal' way look 'normal' too:

> By the way: What unholy browser do you all use that lets you copy such links?

The unholy browser (I agree it is unholy, in fact for me all browsers are unholy now) is a pre-Quantum version of Firefox, but I copy the URL using an add-on (UrlbarExt) which produces better-looking links for pasting.

Mikescher · 2018-10-25T16:36:56Z

Okay that explains it a bit.

I changed the URL matching now to only abort once a whitespace is found.
Unfortunately this now enables the problem I wanted to avoid from the beginning:
In this text snippet the recognized URL includes the closing parenthesis:

bullet point one (https://www.mikescher.com)

But I guess there is not really an optimal solution.

But if anyone is unsatisfied with the new regex, there is now a setting under the advanced section where you can change the URL matching mode:

> In fact even white spaces can be part of the URLs, a couple of examples:

Okay, but that is the one thing I really can't support: if I accepted spaces as valid URL parts, then every URL would only end at the end of the line (and you could even argue for accepting line breaks in URLs).
In this case you probably have to insert the "proper" %20 encoding.
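
The trade-off described in this comment can be sketched in a few lines of Python. Both patterns below are hypothetical stand-ins written for this example, not the actual AlephNote regexes: `STRICT` allows only the ASCII characters that RFC 3986 permits in a URI, while `UNTIL_WHITESPACE` accepts everything up to the next whitespace.

```python
import re

# Hypothetical patterns for illustration only (not AlephNote's real regexes).
STRICT = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")
UNTIL_WHITESPACE = re.compile(r"https?://\S+")

def first_url(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

en_dash_url = "https://en.wikipedia.org/wiki/Dungan_Revolt_(1862\u201377)"
in_parens = "bullet point one (https://www.mikescher.com)"
with_space = "https://sourceforge.net/projects/portableapps/files/FileZilla Portable/"

# The strict pattern breaks off at the en dash ...
print(first_url(STRICT, en_dash_url))
# ... the whitespace-terminated one takes the whole link,
print(first_url(UNTIL_WHITESPACE, en_dash_url))
# but it also swallows the closing parenthesis of the surrounding text,
print(first_url(UNTIL_WHITESPACE, in_parens))
# and it still stops at a literal space inside the path.
print(first_url(UNTIL_WHITESPACE, with_space))

# Writing the space as %20 makes the whole link matchable again.
encoded = with_space.replace(" ", "%20")
print(first_url(UNTIL_WHITESPACE, encoded))
```

Neither pattern is right in every case, which matches the conclusion above that there is no really optimal solution.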

--> references #76

smaragdus · 2018-10-26T11:25:35Z

> I changed the URL matching now to only abort once a whitespace is found.
> Unfortunately this now enables the problem I wanted to avoid from the beginning
> ...
> But I guess there is not really an optimal solution.

The problem really does seem unsolvable. I think I would rather have a URL go undetected than have the closing parenthesis recognized as part of it. However, I would like to show you a slightly different behaviour: ResophNotes does not recognize the opening parenthesis as part of URLs, screen:

What do you think?

Mikescher · 2018-10-26T13:16:04Z

> What do you think?

Hmm, it seems like ResophNotes forces the URL to start with a whitespace (or BeginOfLine)

I guess I could add this also - I currently can't think of a situation where this would make more problems than the current syntax

Btw: Opening and closing parentheses are valid URL components (according to the original RFC)
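
One way the "must start after whitespace or at the beginning of the line" rule could look is a negative lookbehind in front of a whitespace-terminated pattern. The sketch below is an assumption about how such a rule might be written, not the regex that ResophNotes or AlephNote actually uses; `BOUNDED` and `find_urls` are names invented for the example.

```python
import re

# Hypothetical pattern: (?<!\S) only allows a match where the preceding
# character is whitespace or where there is no preceding character at all.
BOUNDED = re.compile(r"(?<!\S)https?://\S+")

def find_urls(text):
    """Return all URLs in text that start at a whitespace boundary."""
    return BOUNDED.findall(text)

# A URL glued to an opening parenthesis is skipped entirely ...
print(find_urls("bullet point one (https://www.mikescher.com)"))
# ... while a URL preceded by a space is found as usual.
print(find_urls("see https://www.mikescher.com for details"))
# A URL at the very start of the text is also accepted.
print(find_urls("https://en.wikipedia.org/wiki/Dungan_Revolt_(1862\u201377)"))
```

With this rule the parenthesized link is not detected at all rather than being detected with a trailing `)`, so it trades one kind of miss for another.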

Mikescher self-assigned this Oct 23, 2018

Mikescher added the bug label Oct 23, 2018

Mikescher added a commit that referenced this issue Oct 23, 2018

add round braces to URL regex

861840a

--> references #76

Mikescher closed this as completed in 1038102 Oct 24, 2018

Mikescher reopened this Oct 24, 2018

Mikescher added a commit that referenced this issue Oct 25, 2018

changed URL matching mode to allow all characters except \s

1c77872

--> references #76

Mikescher closed this as completed in d7c8f8f Oct 25, 2018

Mikescher reopened this Oct 26, 2018

Mikescher closed this as completed in 07d448e Oct 27, 2018
