Skip to content

fixed clickability of URLs if they are surrounded by special characters#570

Closed
simensBat wants to merge 7 commits into
SevenTV:masterfrom
simensBat:fix-clickable-urls-if-special-characters
Closed

fixed clickability of URLs if they are surrounded by special characters#570
simensBat wants to merge 7 commits into
SevenTV:masterfrom
simensBat:fix-clickable-urls-if-special-characters

Conversation

@simensBat
Copy link
Copy Markdown

Fixed #548 Added an additional regular expression that fixes links inside messages if there are special characters near the links.
Special symbols:

  • []
  • ()
  • ,
  • .
  • :
  • ;

All of these characters are stripped during url validation.

As it was before:

image

As it is now:

image

Links are now clickable. But there is one feature, the URL of the link will be correct, without extra characters. But the link is displayed along with special characters.

image

I did not invent anything, because I would have to parse the string, split it into characters, and manually add each element as a token.

If this fix option is enough, I'll be glad. If not, I will try to fix it to the end.

@simensBat simensBat marked this pull request as ready for review May 7, 2023 16:42
@berghall
Copy link
Copy Markdown
Contributor

berghall commented May 8, 2023

I think it could be simpler to just use one single regex.

@AnatoleAM
Copy link
Copy Markdown
Contributor

By your own screenshot, this seems like incorrect behavior. Why are the characters around the link becoming part of it?

image

@simensBat
Copy link
Copy Markdown
Author

@AnatoleAM Because when there are special characters next to the link, it is no longer a link, but just a set of strings.

I have to remove special characters and check if there is a link in the string after removal.

There are no special characters in the transition link itself, but they are in "displayText"

If you try to remove them, you will have to render each special character as a separate token, and you will also need to follow the order so that the message text does not change.

@simensBat
Copy link
Copy Markdown
Author

Alternatively, I can offer a solution that will simply remove special characters from the message:

screencast-www.twitch.tv-2023.05.09-14_29_19.webm

This option is better, because now all text is not styled as a link. The value of these special characters is not very high. The main thing is that the link is clickable.

I tried to parse the message, separate the special characters into separate tokens and render them separately from the link, but I couldn't make a stable behavior where nothing would break into the chat message.

Maybe you have an idea how to do it better.

@kuel27
Copy link
Copy Markdown

kuel27 commented May 9, 2023

Removing the special characters isn't a good solution. You can modify the range parameter to correctly highlight only the URL part.

Comment thread src/common/chat/Tokenizer.ts Outdated
@simensBat simensBat force-pushed the fix-clickable-urls-if-special-characters branch from 459f983 to f5f0d78 Compare May 16, 2023 12:23
@simensBat simensBat force-pushed the fix-clickable-urls-if-special-characters branch from 0631999 to 9e3729c Compare May 16, 2023 12:30
@simensBat
Copy link
Copy Markdown
Author

@berghall
Hi. I still didn't understand what you wanted from me on the edits of regular expressions. The fact is that these regulars perform different tasks, one searches for a link in a line, the other removes special characters from the beginning and end of the link (special characters that go without a problem, they need to be removed in order to parse the link normally). I didn't find any examples on the Internet, with a similar search through one regular, everywhere either there is just a search for an url in a line (but I don't just have an url), or in two, as I did.

I modified the code, there were problems with determining the range in order to correctly insert a link between special characters and so that all these edits would not break the original text of the message.

I put this logic in a separate class so as not to clog up the Tokenizer

Base rules:

image

Rules for special characters:

image

image

@berghall
Copy link
Copy Markdown
Contributor

@Batyodie I mean that you don't need to use a regex to filter out special characters. What happens if I send a link surronded by characters you haven't specified in your special character regex?
You only need one regex to match URL's.

@simensBat
Copy link
Copy Markdown
Author

@berghall In different ways, a lot of curls, where the symbol is located, what kind of symbol, etc.

In the worst case, the link will become just text, another option is that this special character will be in the link display text, but it will not affect the link url.

It all comes down to the fact that we must find where the link starts and where it ends, in order to set the correct range for the token based on this information.

If you do not use 2 regular expressions to prepare the url string for parsing through tldParse, you still have to comb out these characters somehow. Otherwise, the current url validation check will not pass.

private isValidLink(link: string): URL | null {
		try {
		       // without clearing, the url comes here, where there is an extra character that breaks the check.
			const url = new URL(link);

			const { isIcann, domain } = tldParse(url.hostname);

			if (domain && isIcann) {
				return url;
			}
		} catch (e) {
			void 0;
		}

		return null;
	}

for the start of the link is even clearer if everything to the left of http | https is there, just cut it out. And what to do with the end of the link, without a regular expression, you still have to check that there are no strange characters there.

@kuel27
Copy link
Copy Markdown

kuel27 commented May 16, 2023

Here's a solution where the entire URL is matched. It could be cleaned though

https://github.com/kuel27/Extension/blob/improvements/src/common/chat/Tokenizer.ts#L139

@simensBat
Copy link
Copy Markdown
Author

@kuel27 Overall yes, your solution is better. The regularity looks terrible, but it works.

I would still make a check for links from the tails of else if.... It doesn't look very declarative, there are several conditions where we still redefine the variable inside this condition. But this is so, the little things of life.

As I understand it, this was done as part of another task, how should it be in such cases? Close this pr?

@simensBat
Copy link
Copy Markdown
Author

Or you can postpone this court decision, as part of this task.

@simensBat simensBat force-pushed the fix-clickable-urls-if-special-characters branch from 075a9f5 to 15a9694 Compare May 17, 2023 13:43
@AnatoleAM
Copy link
Copy Markdown
Contributor

Whether or not this is a desirable feature is debatable. Some popular twitch clients do not create hyperlinks when wrapped in parentheses or brackets.

The only real solution would be change the separation of text fragments (currently it's separated by spaces), but this would add extra complexity to the tokenization process, which is hard to justify for a relatively small improvement.

tl;dr this could be solved by changing a few characters on a single line, but likely unnecessary.

@AnatoleAM AnatoleAM closed this Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] URLs not clickable if surrounded by special characters

4 participants