fixed clickability of URLs if they are surrounded by special characters by simensBat · Pull Request #570 · SevenTV/Extension

simensBat · 2023-05-07T16:40:36Z

Fixed #548 Added an additional regular expression that fixes links inside messages if there are special characters near the links.
Special symbols:

[]
()
,
.
:
;

All of these characters are stripped during url validation.

As it was before:

As it is now:

Links are now clickable. But there is one feature, the URL of the link will be correct, without extra characters. But the link is displayed along with special characters.

I did not invent anything, because I would have to parse the string, split it into characters, and manually add each element as a token.

If this fix option is enough, I'll be glad. If not, I will try to fix it to the end.

berghall · 2023-05-08T18:58:57Z

I think it could be simpler to just use one single regex.

AnatoleAM · 2023-05-08T21:57:37Z

By your own screenshot, this seems like incorrect behavior. Why are the characters around the link becoming part of it?

simensBat · 2023-05-09T09:06:56Z

@AnatoleAM Because when there are special characters next to the link, it is no longer a link, but just a set of strings.

I have to remove special characters and check if there is a link in the string after removal.

There are no special characters in the transition link itself, but they are in "displayText"

If you try to remove them, you will have to render each special character as a separate token, and you will also need to follow the order so that the message text does not change.

simensBat · 2023-05-09T12:33:19Z

Alternatively, I can offer a solution that will simply remove special characters from the message:

screencast-www.twitch.tv-2023.05.09-14_29_19.webm

This option is better, because now all text is not styled as a link. The value of these special characters is not very high. The main thing is that the link is clickable.

I tried to parse the message, separate the special characters into separate tokens and render them separately from the link, but I couldn't make a stable behavior where nothing would break into the chat message.

Maybe you have an idea how to do it better.

kuel27 · 2023-05-09T13:51:26Z

Removing the special characters isn't a good solution. You can modify the range parameter to correctly highlight only the URL part.

…cher in text

…url matcher in text.

simensBat · 2023-05-16T12:45:37Z

@berghall
Hi. I still didn't understand what you wanted from me on the edits of regular expressions. The fact is that these regulars perform different tasks, one searches for a link in a line, the other removes special characters from the beginning and end of the link (special characters that go without a problem, they need to be removed in order to parse the link normally). I didn't find any examples on the Internet, with a similar search through one regular, everywhere either there is just a search for an url in a line (but I don't just have an url), or in two, as I did.

I modified the code, there were problems with determining the range in order to correctly insert a link between special characters and so that all these edits would not break the original text of the message.

I put this logic in a separate class so as not to clog up the Tokenizer

Base rules:

Rules for special characters:

berghall · 2023-05-16T15:45:41Z

@Batyodie I mean that you don't need to use a regex to filter out special characters. What happens if I send a link surronded by characters you haven't specified in your special character regex?
You only need one regex to match URL's.

simensBat · 2023-05-16T16:12:44Z

@berghall In different ways, a lot of curls, where the symbol is located, what kind of symbol, etc.

In the worst case, the link will become just text, another option is that this special character will be in the link display text, but it will not affect the link url.

It all comes down to the fact that we must find where the link starts and where it ends, in order to set the correct range for the token based on this information.

If you do not use 2 regular expressions to prepare the url string for parsing through tldParse, you still have to comb out these characters somehow. Otherwise, the current url validation check will not pass.

private isValidLink(link: string): URL | null {
		try {
		       // without clearing, the url comes here, where there is an extra character that breaks the check.
			const url = new URL(link);

			const { isIcann, domain } = tldParse(url.hostname);

			if (domain && isIcann) {
				return url;
			}
		} catch (e) {
			void 0;
		}

		return null;
	}

for the start of the link is even clearer if everything to the left of http | https is there, just cut it out. And what to do with the end of the link, without a regular expression, you still have to check that there are no strange characters there.

kuel27 · 2023-05-16T16:15:23Z

Here's a solution where the entire URL is matched. It could be cleaned though

https://github.com/kuel27/Extension/blob/improvements/src/common/chat/Tokenizer.ts#L139

simensBat · 2023-05-16T18:03:18Z

@kuel27 Overall yes, your solution is better. The regularity looks terrible, but it works.

I would still make a check for links from the tails of else if.... It doesn't look very declarative, there are several conditions where we still redefine the variable inside this condition. But this is so, the little things of life.

As I understand it, this was done as part of another task, how should it be in such cases? Close this pr?

simensBat · 2023-05-17T11:17:22Z

Or you can postpone this court decision, as part of this task.

AnatoleAM · 2023-06-20T15:21:38Z

Whether or not this is a desirable feature is debatable. Some popular twitch clients do not create hyperlinks when wrapped in parentheses or brackets.

The only real solution would be change the separation of text fragments (currently it's separated by spaces), but this would add extra complexity to the tokenization process, which is hard to justify for a relatively small improvement.

tl;dr this could be solved by changing a few characters on a single line, but likely unnecessary.

simensBat marked this pull request as ready for review May 7, 2023 16:42

simensBat mentioned this pull request May 7, 2023

[BUG] URLs not clickable if surrounded by special characters #548

Closed

2 tasks

berghall reviewed May 9, 2023

View reviewed changes

Comment thread src/common/chat/Tokenizer.ts Outdated

simensBat added 4 commits May 16, 2023 15:16

fixed modify range parameters so that special characters. Add url mat…

785c81a

…cher in text

fixed clickability of URLs if they are surrounded by special characters

f625e67

fix comma

2c4a622

fixed modify range parameters so that special characters. Add module …

f5f0d78

…url matcher in text.

simensBat force-pushed the fix-clickable-urls-if-special-characters branch from 459f983 to f5f0d78 Compare May 16, 2023 12:23

fix logger

9e3729c

simensBat force-pushed the fix-clickable-urls-if-special-characters branch from 0631999 to 9e3729c Compare May 16, 2023 12:30

fix changelog nightly

15a9694

simensBat force-pushed the fix-clickable-urls-if-special-characters branch from 075a9f5 to 15a9694 Compare May 17, 2023 13:43

Merge branch 'master' into fix-clickable-urls-if-special-characters

706b573

AnatoleAM closed this Jun 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fixed clickability of URLs if they are surrounded by special characters#570

fixed clickability of URLs if they are surrounded by special characters#570
simensBat wants to merge 7 commits into
SevenTV:masterfrom
simensBat:fix-clickable-urls-if-special-characters

simensBat commented May 7, 2023

Uh oh!

berghall commented May 8, 2023

Uh oh!

AnatoleAM commented May 8, 2023

Uh oh!

simensBat commented May 9, 2023

Uh oh!

simensBat commented May 9, 2023

Uh oh!

kuel27 commented May 9, 2023

Uh oh!

Uh oh!

simensBat commented May 16, 2023

Uh oh!

berghall commented May 16, 2023

Uh oh!

simensBat commented May 16, 2023

Uh oh!

kuel27 commented May 16, 2023

Uh oh!

simensBat commented May 16, 2023

Uh oh!

simensBat commented May 17, 2023

Uh oh!

AnatoleAM commented Jun 20, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

simensBat commented May 7, 2023

Uh oh!

berghall commented May 8, 2023

Uh oh!

AnatoleAM commented May 8, 2023

Uh oh!

simensBat commented May 9, 2023

Uh oh!

simensBat commented May 9, 2023

Uh oh!

kuel27 commented May 9, 2023

Uh oh!

Uh oh!

simensBat commented May 16, 2023

Uh oh!

berghall commented May 16, 2023

Uh oh!

simensBat commented May 16, 2023

Uh oh!

kuel27 commented May 16, 2023

Uh oh!

simensBat commented May 16, 2023

Uh oh!

simensBat commented May 17, 2023

Uh oh!

AnatoleAM commented Jun 20, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants