Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MarkdownSanitizer sanitizes URLs #2378

Open
3 tasks done
EclipsedSolari opened this issue Jan 11, 2023 · 3 comments
Open
3 tasks done

MarkdownSanitizer sanitizes URLs #2378

EclipsedSolari opened this issue Jan 11, 2023 · 3 comments

Comments

@EclipsedSolari
Copy link

General Troubleshooting

  • I have checked for similar issues on the Issue-tracker.
  • I have updated to the latest JDA version
  • I have checked the branches or the maintainers' PRs for upcoming bug fixes.

Expected Behaviour

Discord treats URLs (almost) as literals, even URLs with Markdown characters in them.

Consider the following Google search:
https://www.google.com/search?q=__test__

Although the string __test__ is Markdown for underlined text, because it's part of a URL, Discord leaves it as-is.

JDA's MarkdownSanitizer does not make the distinction between URLs and normal text, so JDA escapes this URL:
https://www.google.com/search?q=\_\_test\_\_

When trying to send this URL as a Discord message, Discord replaces the backslashes with forward slashes but doesn't apply Markdown rules, including escaping, so the end result is the broken URL https://www.google.com/search?q=/_/_test/_/_.

I expected JDA's Markdown handler to leave URLs as-is.

Code Example for Reproduction Steps

//add this to the EscapeMarkdownTest class in src/test/java/MarkdownTest.java
//all of the assertions except the first fail
@Test
public void testUrl()
{
	Assertions.assertEquals("https://www.google.com/", markdown.compute("https://www.google.com/"));
	Assertions.assertEquals("https://www.google.com/search?q=_test_", markdown.compute("https://www.google.com/search?q=_test_"));
	Assertions.assertEquals("https://www.google.com/search?q=__test__", markdown.compute("https://www.google.com/search?q=__test__"));
	Assertions.assertEquals("https://www.google.com/search?q=*test*", markdown.compute("https://www.google.com/search?q=*test*"));
	Assertions.assertEquals("https://www.google.com/search?q=**test**", markdown.compute("https://www.google.com/search?q=**test**"));
	Assertions.assertEquals("https://www.google.com/search?q=***test***", markdown.compute("https://www.google.com/search?q=***test***"));
	Assertions.assertEquals("https://www.google.com/search?q=~~test~~", markdown.compute("https://www.google.com/search?q=~test~"));
}

Code for JDABuilder or DefaultShardManagerBuilder used

N/A

Exception or Error

N/A
@EclipsedSolari
Copy link
Author

I'm willing to take a shot at this issue, but I'd like to discuss the correct way to approach it first.

The current entities in MarkdownSanitizer are represented by simple tokens or by regex. Parsing URLs with regex is an unpleasant problem, and I would probably have to roll my own regex since I'm unsure of licensing (and correctness) implications if I grab one from somewhere else on the internet.

Java does have a URL parser, but its implementation seems like a bad fit for this use case, since I'd have to construct a new URL and potentially catch an exception (MalformedURLException) on every possible text sequence.

I could explore a hybrid approach, with a regex to detect possible leading URLs in a sequence (i.e. in the text sequence https://www.google.com?q=~~test~~ some other text here, each space indicates the end of a potential URL) and feed those potential URLs to java.net.URL for validation.

I'm open to any other ideas.

@LaFriska
Copy link

I am currently giving this issue a shot. This is somewhat complicated as previously suggested, but as long as the programme detects and URL, replaces it with a placeholder and changes it back after the algorithm, it should work correctly.

I would be able to make a pull request tomorrow.

@LaFriska
Copy link

For the final assertion in that test method, I think you may have made a mistake: the expect result should be the URL with only one layer of "~" wrapped around the word "test" in the url, not two.

LaFriska added a commit to LaFriska/JDA that referenced this issue Jul 16, 2023
-Suggested by EclipsedSolari in issue discord-jda#2378
-The algorithm implemented searches and replaces all URLs by a placeholder, and replaces them back after the majority of the compute method finishes.
-Also added a test method in the MarkdownTest class.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants