URL detection #76

Closed
smaragdus opened this issue Oct 21, 2018 · 11 comments

@smaragdus

smaragdus commented Oct 21, 2018

AlephNote cannot detect some URLs, screen:

[Screenshot: alephnote 1 6 19 0 - 2018-10-21 - url - 001]

ResophNotes detects the same URLs just fine, screen:

[Screenshot: resophnotes 1 5 7 - 2018-10-21 - url - 001]

Example URLs:

@Mikescher Mikescher self-assigned this Oct 23, 2018
@Mikescher Mikescher added the bug label Oct 23, 2018
@Mikescher
Owner

and once again I update the URL regex :D

but thanks for the report - I think we are slowly getting to the point where all the strange cases are covered

Mikescher added a commit that referenced this issue Oct 23, 2018
@smaragdus
Author

I am "URL" champion. 🥇
If I come across another undetected URL like this, I will notify you.
I hesitated whether to open a new issue or to search for similar old issues, but I was not sure whether you would see a new post in an old issue, so I decided on a new one.

@smaragdus
Author

smaragdus commented Oct 23, 2018

Off-topic
When you have time please have a look at this because it makes me desperate. :(

@Mikescher
Owner

Mikescher commented Oct 23, 2018

@smaragdus:

I created a new issue for this:
#77

That way it's easier for me to keep track of things, because closed issues don't appear in the GitHub overview - even if they have new comments.

> I hesitated whether to open a new issue or to search for similar old issues, but I was not sure whether you would see a new post in an old issue, so I decided on a new one.

Yeah - new issues are always better :D - it's just more organized that way

@smaragdus
Author

@Mikescher

> Yeah - new issues are always better :D - it's just more organized that way

Fine, I will keep this in mind.

> I created a new issue for this:
> #77

Thank you very much!

@smaragdus
Author

@Mikescher

I am afraid that the URLs I mentioned above are not detected in version 1.6.23, screen:

[Screenshot: alephnote 1 6 23 0 - 2018-10-24 - urls - 002]

Can you confirm this?

@Mikescher
Owner

Mikescher commented Oct 24, 2018

Ah damnit - that minus sign is not a minus sign but an EN_DASH:

EN_DASH, does not work:
https://en.wikipedia.org/wiki/Dungan_Revolt_(1862–77)

Normal Minus, does work:
https://en.wikipedia.org/wiki/Dungan_Revolt_(1862-77)

I guess I will add all the EN/EM Dash stuff to the regex too (perhaps the whole unicode punctuation class)

By the way: What unholy browser do you all use that lets you copy such links?
If I try to copy the link I get a properly encoded one like that:

https://en.wikipedia.org/wiki/Dungan_Revolt_%281862%E2%80%9377%29

[Edit] Perhaps I should just give up on intelligent URL parsing and allow all characters except whitespace?
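
Editor's sketch of the difference described in this comment. It is written in Python for brevity. The `STRICT_URL` pattern is a hypothetical stand-in for an ASCII-only URL regex, not AlephNote's actual one. It shows such a pattern stopping at the en dash, and the percent-encoded form a browser would normally hand out.

```python
import re
from urllib.parse import quote

# Hypothetical ASCII-only URL pattern (an assumption for illustration,
# not the regex AlephNote uses).
STRICT_URL = re.compile(r"https?://[A-Za-z0-9._~:/?#@!$&'()*+,;=%-]+")

# \u2013 is the en dash; the second URL uses a plain hyphen-minus.
en_dash_url = "https://en.wikipedia.org/wiki/Dungan_Revolt_(1862\u201377)"
hyphen_url = "https://en.wikipedia.org/wiki/Dungan_Revolt_(1862-77)"


def strict_match(text):
    """Return the first URL found by the ASCII-only pattern, or None."""
    m = STRICT_URL.search(text)
    return m.group(0) if m else None


# The hyphen version is matched in full ...
print(strict_match(hyphen_url) == hyphen_url)
# ... while the en dash version is cut off right before the en dash.
print(strict_match(en_dash_url))

# Percent-encoding the page title yields the "properly encoded" link.
encoded = "https://en.wikipedia.org/wiki/" + quote("Dungan_Revolt_(1862\u201377)")
print(encoded)
```

The en dash becomes `%E2%80%93` because it is encoded as its three UTF-8 bytes.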

@Mikescher Mikescher reopened this Oct 24, 2018
@smaragdus
Author

@Mikescher

> [Edit] Perhaps I should just give up on intelligent URL parsing and allow all characters except whitespace?

I do not know. Actually the dashes are tricky:

  • hyphen (-)
  • en dash (–)
  • em dash (—)
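
Editor's sketch: the three characters listed above look alike but are distinct code points. The snippet below (Python, standard library only; the helper name `dash_info` is made up for this example) prints the code point, Unicode name and general category of each. All three fall into the dash punctuation category `Pd`, which is one reason a regex built on a Unicode punctuation class would cover them together.

```python
import unicodedata


def dash_info(chars):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in chars
    ]


# hyphen, en dash, em dash
for entry in dash_info(["-", "\u2013", "\u2014"]):
    print(entry)
```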

In fact even white spaces can be part of the URLs, a couple of examples:

The same URLs copied by the 'normal' way look 'normal' too:

> By the way: What unholy browser do you all use that lets you copy such links?

The unholy browser (I agree it is unholy; in fact, for me all browsers are unholy now) is a pre-Quantum version of Firefox, but I copy the URL using an add-on (UrlbarExt), which produces better-looking links for pasting.

@Mikescher
Owner

Okay that explains it a bit.

I have now changed the URL matching so that it only stops once whitespace is found.
Unfortunately this brings back the problem I wanted to avoid from the beginning:
In this text snippet the recognized URL includes the closing parenthesis:

bullet point one (https://www.mikescher.com)


But I guess there is not really an optimal solution.

But if anyone is unsatisfied with the new regex, there is now a setting under the advanced section where you can change the URL matching mode:

> In fact even white spaces can be part of the URLs, a couple of examples:

Okay, but that is the one thing I really can't support - if I accepted spaces as valid URL parts, then every URL would only end at the end of the line (and you could even argue for accepting line breaks in URLs).
In this case you probably have to insert the "proper" %20 encoding.
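
Editor's sketch of the trade-off described in this comment, in Python. The pattern `https?://\S+` is an assumed approximation of "match until whitespace", not the exact regex used in AlephNote, and `example.com` is a placeholder host.

```python
import re
from urllib.parse import quote

# Assumed approximation of "everything up to the next whitespace".
URL_UNTIL_WHITESPACE = re.compile(r"https?://\S+")


def find_urls(text):
    """Return all whitespace-delimited URL candidates in the text."""
    return URL_UNTIL_WHITESPACE.findall(text)


# The en dash URL is now matched in full ...
print(find_urls("https://en.wikipedia.org/wiki/Dungan_Revolt_(1862\u201377) is a link"))
# ... but the closing parenthesis of the surrounding text is swallowed too.
print(find_urls("bullet point one (https://www.mikescher.com)"))
# A raw space ends the match, so it has to be written as %20 instead.
print(find_urls("https://example.com/my file.txt"))
print("https://example.com/" + quote("my file.txt"))
```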

@smaragdus
Author

> I changed the URL matching now to only abort once a whitespace is found.
> Unfortunately this now enables the problem I wanted to avoid from the beginning
> ...
> But I guess there is not really an optimal solution.

The problem really seems unsolvable. I think I would prefer a URL not to be detected at all rather than have the closing parenthesis recognized as part of it. However, I would like to show you a slightly different behaviour - ResophNotes does not recognize the opening parenthesis as part of URLs, screen:

[Screenshot: resophnotes 1 7 0 - 2018-10-26 - url - 002]

What do you think?

@Mikescher
Owner

Mikescher commented Oct 26, 2018

> What do you think?

Hmm, it seems like ResophNotes forces the URL to start after a whitespace character (or at the beginning of a line).

I guess I could add this too - I currently can't think of a situation where this would cause more problems than the current syntax.

Btw: Opening and closing parentheses are valid URL components (according to the original RFC).
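
Editor's sketch of the "URL must start after whitespace or at the beginning of the text" rule discussed here, in Python. The lookbehind `(?<!\S)` is one possible way to express it and is an assumption, not the pattern used by ResophNotes or AlephNote.

```python
import re

# (?<!\S) succeeds only if the previous character is whitespace
# or there is no previous character at all (start of the text).
URL_AFTER_WHITESPACE = re.compile(r"(?<!\S)https?://\S+")


def find_urls_strict_start(text):
    """Return URL candidates that begin at the text start or after whitespace."""
    return URL_AFTER_WHITESPACE.findall(text)


# A URL directly after "(" is not detected at all ...
print(find_urls_strict_start("bullet point one (https://www.mikescher.com)"))
# ... while URLs after a space or at the very start are still found.
print(find_urls_strict_start("see https://www.mikescher.com for details"))
print(find_urls_strict_start("https://www.mikescher.com"))
```

With this rule the parenthesized URL is skipped entirely instead of being matched together with the trailing `)`.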

@Mikescher Mikescher reopened this Oct 26, 2018