Keep {} in Markdown rendering #11196

jeanprbt · 2024-04-15T11:20:03Z

This PR modifies the formatter HTMLChars used alongside the formatter Markdown to avoid the useless swallow of chars { and }. It adapts tests accordingly and adds RemoveBrackets formatter in relevant *.layout files to maintain previous behavior for non-Markdown fields.
See this comment.

Adapt tests
Adapt *.layout files

Closes #10928

Now, Markdown rendering keeps braces, see following screenshot.

Mandatory checks

Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

… adapted tests

jeanprbt · 2024-04-15T11:23:19Z

@koppor, in your comment you say the following.

Thinking more, the LaTeX command handling in HTMLChars should be removed. In the .layout files, LaTeXToUnicode should be used. - I think, HTMLChars will be much smaller then. HTML5 default encoding is unicode, thus LaTeXToUnicode is the right formatter.

Do you want me to also cover this in this current PR, or does this deserve another one ?

Also, the original issue mentions that Markdown rendering also swallows new lines, but removing the following lines in MardownFormatter solves the problem.

// workaround HTMLChars transforming "\n" into <br> by returning a one liner
        return html.replaceAll("\\r\\n|\\r|\\n", " ").trim();

The only drawback is that it seems to add more newlines than necessary, which may come from the external parser used. Do we keep the old version with no newlines ?

koppor

Small comments regarding the test

src/test/java/org/jabref/logic/layout/format/HTMLCharsTest.java

koppor · 2024-04-15T12:45:42Z

src/test/java/org/jabref/logic/layout/format/HTMLCharsTest.java

-        assertEquals("&#39;", layout.format("{\\textquotesingle}"));
+        assertEquals("&#39;", layout.format("\\textquotesingle"));


The code change is OK. Reason: { is not used as parameter for a LaTeX command.

Since the existing layouter is tested now, please add this back.

src/test/java/org/jabref/logic/layout/format/HTMLCharsTest.java

koppor · 2024-04-15T12:54:00Z

Please also add a test case to org.jabref.logic.layout.format.MarkdownFormatterTest so that it is really seen that this works.

koppor · 2024-04-15T12:55:17Z

Thinking more, the LaTeX command handling in HTMLChars should be removed. In the .layout files, LaTeXToUnicode should be used. - I think, HTMLChars will be much smaller then. HTML5 default encoding is unicode, thus LaTeXToUnicode is the right formatter.

Do you want me to also cover this in this current PR, or does this deserve another one ?

Since you went deep into fixing HTMLChars, that update can be done in a next PR. It also more about experimentation. --> Test cases in org.jabref.logic.layout.format.MarkdownFormatterTest are needed to ensure that the functionality is unchanged after rewriting HTMLChars.

koppor · 2024-04-15T12:56:54Z

The only drawback is that it seems to add more newlines than necessary,

Without going deep into the code, I cannot follow. Can you add test cases to org.jabref.logic.layout.format.MarkdownFormatterTest so that the current behavior is shown? Then, the new behavior can also be shown in test cases.

Remember that modern Java allows to use """ for multiline string constants (including newlines) which can come handy here.

jeanprbt · 2024-04-16T09:57:11Z

Small comments regarding the test

This is fixed now.

Please also add a test case to org.jabref.logic.layout.format.MarkdownFormatterTest so that it is really seen that this works.

I should have thought about this before, my bad.

Since you went deep into fixing HTMLChars, that update can be done in a next PR. It also more about experimentation. --> Test cases in org.jabref.logic.layout.format.MarkdownFormatterTest are needed to ensure that the functionality is unchanged after rewriting HTMLChars.

Ok ! I'll do this in a separate PR then. You're right it will be experimentation, since I don't really see the border between HTMLChars and LaTeXToUnicode formatters at the moment. I added many test cases to org.jabref.logic.layout.format.MarkdownFormatterTest to ensure functionnality will be unchanged in next PR.

jeanprbt · 2024-04-16T10:01:02Z

Without going deep into the code, I cannot follow. Can you add test cases to org.jabref.logic.layout.format.MarkdownFormatterTest so that the current behavior is shown? Then, the new behavior can also be shown in test cases.

Current behavior is shown using the following test.

 @Test
    void formatWhenFormattingComplexMarkupThenReturnsOnlyOneLine() {
        assertFalse(markdownFormatter.format("Markup\n\n* list item one\n* list item 2\n\n rest").contains("\n"));
    }

The formatter only removes every new line.

jeanprbt · 2024-04-16T10:06:17Z

Now I have to adapt every *.layout files, adding the formatter RemoveBrackets for non-Markdown fields. Will be done in a couple of days max.

koppor · 2024-04-16T18:34:43Z

Ok ! I'll do this in a separate PR then. You're right it will be experimentation, since I don't really see the border between HTMLChars and LaTeXToUnicode formatters at the moment.

Yeah, you are right. - I first thought LaTeXToUnicode is executed first and then some Unicode-to-HTML conversion. italics unicode back to <em>. However, this might be too complicated and the current approach is OK - even though LaTeX parsing is doubled. -- Maybe, it would make sense to use SnuggleTeX to parse LaTeX and do the conversion based on the DOM tree? (See org.jabref.logic.integrity.LatexIntegrityChecker for some code ideas).

However, LaTeX has a stable syntax, maybe, we leave everything as is.

koppor · 2024-04-16T18:36:37Z

Current behavior is shown using the following test.

Not quote. Please use assertEquals. Left: The new string, right: the original stirng.

USe

  String input = """
     test
  
     some more
     """

This is more readable for the human reader than reading \n.

koppor · 2024-04-16T18:42:48Z

Now I have to adapt every *.layout files, adding the formatter RemoveBrackets for non-Markdown fields. Will be done in a couple of days max.

This is a nightmare for backward compatiblity :)

Thinking aloud:

When using latex2unicode, these {} are removed. Therefore maybe the idea to use that in htmlchars. However, not wanted for markdown
The htmlchars converter could be parameterized to keep braces. Depending on the mode, it could behave differently. (enum BraceHandling: KEEP, REMOVE)
maybe the previewer can parameterize the htmlchars with KEEP - and the exporter using layout parameteirzed using REMOVE

OK, the default preview of JabRef needs to be adapted maybe... The normal latex2unicode should be able to handle most cases, thus I think, there probably is no adaption necessary?

jeanprbt · 2024-04-17T09:45:59Z

Yeah, you are right. - I first thought LaTeXToUnicode is executed first and then some Unicode-to-HTML conversion. italics unicode back to \<em\>. However, this might be too complicated and the current approach is OK - even though LaTeX parsing is doubled. -- Maybe, it would make sense to use SnuggleTeX to parse LaTeX and do the conversion based on the DOM tree? (See org.jabref.logic.integrity.LatexIntegrityChecker for some code ideas).

However, LaTeX has a stable syntax, maybe, we leave everything as is.

I agree with you. Since the current syntax is working properly and changes are not simply LaTeXToUnicode and then Unicode-to-HTML, but far more complex, we can leave everything as is. The issue about Markdown rendering with braces is now solved. In any case, perhaps the thymeleaf template engine will fill all these gaps this summer ? ;)

Not quote. Please use assertEquals. Left: The new string, right: the original string.

Use
  String input = """
     test
  
     some more
     """
This is more readable for the human reader than reading \n.

Understood, I implemented a test with multiline quotes in MarkdownFormatterTest.

jeanprbt · 2024-04-17T09:49:58Z

This is a nightmare for backward compatiblity :)

Thinking aloud:

When using latex2unicode, these {} are removed. Therefore maybe the idea to use that in htmlchars. However, not wanted for markdown

The htmlchars converter could be parameterized to keep braces. Depending on the mode, it could behave differently. (enum BraceHandling: KEEP, REMOVE)

maybe the previewer can parameterize the htmlchars with KEEP - and the exporter using layout parameteirzed using REMOVE

OK, the default preview of JabRef needs to be adapted maybe... The normal latex2unicode should be able to handle most cases, thus I think, there probably is no adaption necessary?

Your idea of an enum BraceHandling for HTMLChars is great, I love it. What I can do is modify HTMLChars to support both modes (i.e. remove braces or not), the default being REMOVE. I will only have to modify the preview to use the KEEP mode, which allows backwards compatibility in avoiding modifying all *.layout files. WDYT ?

koppor · 2024-04-17T11:22:20Z

Your idea of an enum BraceHandling for HTMLChars is great, I love it. What I can do is modify HTMLChars to support both modes (i.e. remove braces or not), the default being REMOVE.

Add a constructor accepting the mode. The default constructor calls the other one with REMOVE.

Then add some ifs in the code covering the two cases. - The places should be easy to find as they are the places of your modifications.

Modifying the tests will be more hard, but required, too.

Note: I am aware this is not OO style at all. However, rewriting in OO-style would make the class even more complex and not help IMHO.

I will only have to modify the preview to use the KEEP mode, which allows backwards compatibility in avoiding modifying all *.layout files. WDYT ?

Yes!

src/main/java/org/jabref/logic/layout/format/HTMLChars.java

src/main/java/org/jabref/migrations/PreferencesMigrations.java

koppor · 2024-04-21T20:41:07Z

src/test/java/org/jabref/logic/layout/format/HTMLCharsTest.java

-        assertEquals("&#39;", layout.format("{\\textquotesingle}"));
+        assertEquals("&#39;", layout.format("\\textquotesingle"));


Since the existing layouter is tested now, please add this back.

koppor · 2024-04-21T20:42:04Z

src/test/java/org/jabref/logic/layout/format/MarkdownFormatterTest.java

    }

+    @Test
+    void formatWhenFormattingHeaderThenReturnsHeaderInHtml() {


You could convert all of them to ParameterizedTests!

All the tests seems to test some markdown-to-html functionality, which JabRef does not implement by itself, but through a library.

I would remove them and keep formatWhenFormattingCodeBlockThenReturnsCodeBlockInHtml() { only.

And then add some tests with empty lines to see how the converter deals with blocks. Are they also removed - or is only the code block the issue?

What about code blocks "recommended" by MD031 of markdown-lint?

""" First line Second line ```java String test; ``` """

After having written some tests I get what's going on about new line formatting. The current formatter uses HTML <p> tags for each line. By default, HTML inserts a new line between each <p> tag, which is why in MarkdownFormatter we remove the \n before returning formatted strings, to avoid jumping two lines instead of one. But in <pre> tags which include the code in Mardown, there are no such <p> tags to indicate the new lines. When JabRef removes the \n before returning the final string, we lose track of those new lines in code blocks.

… preferences migrations and made keep braces mode name more clear in HTMLChars

src/main/java/org/jabref/logic/layout/format/HTMLChars.java

src/test/java/org/jabref/logic/layout/format/HTMLCharsTest.java

…e equality for braces keep mode in HTMLChars

src/test/java/org/jabref/logic/layout/format/MarkdownFormatterTest.java

jeanprbt added 2 commits April 15, 2024 13:10

fix: removed '{' and '}' swallow when not in command in HTMLChars and…

46863a4

… adapted tests

doc: updated CHANGELOG.md

a27c395

koppor requested changes Apr 15, 2024

View reviewed changes

jeanprbt added 6 commits April 16, 2024 11:13

merge main into current branch

23da0a4

fix: added old test cases and adapted them to apply requested changes

177a555

fix: fixed small rule in HTMLChars formatter to fix faulty tests

a0b4c64

test: added test case in MardownFormatterTest for string with braces

58c45f5

test: added more test cases in MarkdownFormatterTest

abacdaf

fix: fixed checkstyle erros in HTMLChars

7014ab5

jeanprbt added 2 commits April 17, 2024 11:57

test: added test about code between quotes in MardownFormatterTest

1354a84

Merge branch 'JabRef:main' into issue/10928

ec8f83e

jeanprbt added 4 commits April 18, 2024 09:42

merge branch main into current branch

f58e8d5

feat: added parameter to keep braces to HTMLChars

594a583

fix: merged main into current branch

564ee56

feat: modified default preview to keep braces when formatting markdown

49d8526

jeanprbt marked this pull request as ready for review April 19, 2024 17:22

jeanprbt requested a review from koppor April 21, 2024 17:59

koppor requested changes Apr 21, 2024

View reviewed changes

jeanprbt added 3 commits April 22, 2024 14:01

merged main into current branch

ddfe663

test: added parameterized test cases to MardownFormatterTest, updated…

cd8b380

… preferences migrations and made keep braces mode name more clear in HTMLChars

fix: fixed issue about preferences migrations

905df63

jeanprbt requested a review from koppor April 22, 2024 12:53

koppor requested changes Apr 22, 2024

View reviewed changes

src/main/java/org/jabref/logic/layout/format/HTMLChars.java Outdated Show resolved Hide resolved

src/test/java/org/jabref/logic/layout/format/HTMLCharsTest.java Show resolved Hide resolved

fix: rolling back modif. in HTMLCharsTest and enabled case-insensitiv…

ba2911a

…e equality for braces keep mode in HTMLChars

jeanprbt requested a review from koppor April 22, 2024 14:27

koppor requested changes Apr 22, 2024

View reviewed changes

src/test/java/org/jabref/logic/layout/format/MarkdownFormatterTest.java Show resolved Hide resolved

jeanprbt added 2 commits April 22, 2024 19:14

merged main into current branch

0c1f9b9

test: added test for markdown lists in MarkdowFormatterTest

b56debe

jeanprbt requested a review from koppor April 22, 2024 17:38

koppor approved these changes Apr 22, 2024

View reviewed changes

koppor added this pull request to the merge queue Apr 23, 2024

Merged via the queue into JabRef:main with commit dbd9257 Apr 23, 2024

		assertEquals("'", layout.format("{\\textquotesingle}"));
		assertEquals("'", layout.format("\\textquotesingle"));

Uh oh!

Keep {} in Markdown rendering #11196

Keep {} in Markdown rendering #11196

Uh oh!

Conversation

jeanprbt commented Apr 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Mandatory checks

Uh oh!

jeanprbt commented Apr 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

koppor left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

koppor Apr 15, 2024

Choose a reason for hiding this comment

Uh oh!

koppor Apr 21, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

koppor commented Apr 15, 2024

Uh oh!

koppor commented Apr 15, 2024

Uh oh!

koppor commented Apr 15, 2024

Uh oh!

jeanprbt commented Apr 16, 2024

Uh oh!

jeanprbt commented Apr 16, 2024

Uh oh!

jeanprbt commented Apr 16, 2024

Uh oh!

koppor commented Apr 16, 2024

Uh oh!

koppor commented Apr 16, 2024

Uh oh!

koppor commented Apr 16, 2024

Uh oh!

jeanprbt commented Apr 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jeanprbt commented Apr 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

koppor commented Apr 17, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

koppor Apr 21, 2024

Choose a reason for hiding this comment

Uh oh!

koppor Apr 21, 2024

Choose a reason for hiding this comment

Uh oh!

jeanprbt Apr 22, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jeanprbt commented Apr 15, 2024 •

edited

Loading

jeanprbt commented Apr 15, 2024 •

edited

Loading

jeanprbt commented Apr 17, 2024 •

edited

Loading

jeanprbt commented Apr 17, 2024 •

edited

Loading