Fix syntax colouring when inline comments contain multi-byte characters + numerical constants/punctuation #451

shapayev · 2022-10-08T18:54:44Z

When you type something like the following code in C/C++:

struct S {
    // кошка (just a cat, only 1)
    char mouse; // syntax colouring of this string will be broken
};
char cat[] = {"кошка"}; char mouse = -1;
// typing the end-of-line at the end of a line above will trigger the exception

the syntax colouring of a line below the triggering string will be broken. I've managed to reproduce this behaviour with the latest build of Eclipse Corrosion and the most current TM4E snapshot, you can see it in the animated gif below.

EDIT: Double-checked this today with the provided example: in my case (Eclipse CDT with a small patch to connect the extension point of the generic editor to the TMPresentationReconciler of TM4E) without this fix I also had the following exception:

java.lang.IllegalArgumentException: Argument not valid
	at org.eclipse.swt.SWT.error(SWT.java:4899)
	at org.eclipse.swt.SWT.error(SWT.java:4833)
	at org.eclipse.swt.SWT.error(SWT.java:4804)
	at org.eclipse.swt.custom.StyledText.setStyleRanges(StyledText.java:10665)
	at org.eclipse.swt.custom.StyledText.replaceStyleRanges(StyledText.java:8238)
	at org.eclipse.jface.text.TextViewer.addPresentation(TextViewer.java:4777)
	at org.eclipse.jface.text.TextViewer.changeTextPresentation(TextViewer.java:4854)
	at org.eclipse.tm4e.ui.text.TMPresentationReconciler.applyTextRegionCollection(TMPresentationReconciler.java:692)
	at org.eclipse.tm4e.ui.text.TMPresentationReconciler.colorize(TMPresentationReconciler.java:596)
	at org.eclipse.tm4e.ui.text.TMPresentationReconciler$InternalListener.colorize(TMPresentationReconciler.java:410)

And so debugging this one had lead me to the LineTokenizer trying to handle the captured token of negative length.

sebthom · 2022-10-09T10:42:48Z

Hi Alexander,
thank you for your contribution. Can you please describe the issue and provide some examples (e.g. screenshots) of the issue you are seeing?

shapayev · 2022-10-09T12:14:40Z

Hi Sebastian,
Sorry for the lack of initial description. Hope I have fixed it in right way now.
Should I open a separate issue for the bug that this pull request proposes to fix?

sebthom · 2022-10-09T18:30:48Z

We don't need a separate issue for this. Comments in the PR are fine.

mickaelistria · 2022-10-10T08:13:48Z

Do you think you could add some tests that cover this bug+fix? I'm afraid it's too easy to introduce a regression later without a test.

shapayev · 2022-10-10T09:15:40Z

You're right, I've had the same thought that in current state it may be easily reintroduced.
Honestly, I'm not so fluent in Java and it's patterns so writing tests will not be so fast for me as catching bugs. But if I can take some time for doing this I shall appreciate it very much as this may be a good first opportunity to start.

P.S. Please sorry me for my mistake: forced-pushed squashed commits to the wrong pull request inadvertently, hope it's fixed now...

Fix finding the end of the regexp in a multi-byte string and incorrect use of bytesCount instead of the string length in multi-byte characters. In C/C++ grammars the bug is triggered by strings/comments containing multi-byte characters and numeric constants/punctuation. Additional comment: lineTokens.produce() takes length of the string in multi-byte characters as a second parameter and captureIndices of OnigCaptureIndex are also counted as characters (see OnigNextMatchResult.captureIndicesOfMatch()).

shapayev · 2022-10-12T19:11:03Z

Added some test cases. Have checked they all fail for the current master branch.
Any suggestions/comments?

P.S. I had to add surefire-junit-platform to the top-level pom.xml for internal tests to return. Were they skipped by intention?

sebthom · 2022-10-12T19:42:45Z

The PR looks good to me. Thanks for adding the test cases.

The internal tests were not skipped by intention. Thanks for the fix.

shapayev · 2022-10-12T19:50:09Z

Sorry, can't comment inline at the moment so shall try to explain it here.
I shall fix the comment, the returned value looks good for me ;)
The dependency was added to force using JUnit5 by surefire. Could not deduce why the auto-detection is not working otherwise. Maybe there is a better fix, this one has turned to be the most compact.

sebthom · 2022-10-12T19:53:01Z

I removed my inline comments again. All good. Thanks again for the PR. Identifying this issue and fixing it was very good work!

shapayev · 2022-10-12T19:56:28Z

Thank you much very much for your warm reception! I am very glad I could be of any help to you.

shapayev force-pushed the fixMultiByteStringComments branch from 1fda72c to 556cf61 Compare October 9, 2022 06:36

shapayev force-pushed the fixMultiByteStringComments branch 2 times, most recently from cd539f4 to 556cf61 Compare October 10, 2022 08:52

shapayev force-pushed the fixMultiByteStringComments branch from 556cf61 to 782ea94 Compare October 12, 2022 18:58

sebthom merged commit e10f425 into eclipse:master Oct 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix syntax colouring when inline comments contain multi-byte characters + numerical constants/punctuation #451

Fix syntax colouring when inline comments contain multi-byte characters + numerical constants/punctuation #451

shapayev commented Oct 8, 2022 •

edited

Loading

sebthom commented Oct 9, 2022

shapayev commented Oct 9, 2022 •

edited

Loading

sebthom commented Oct 9, 2022

mickaelistria commented Oct 10, 2022

shapayev commented Oct 10, 2022

shapayev commented Oct 12, 2022

sebthom commented Oct 12, 2022

shapayev commented Oct 12, 2022

sebthom commented Oct 12, 2022

shapayev commented Oct 12, 2022

Fix syntax colouring when inline comments contain multi-byte characters + numerical constants/punctuation #451

Fix syntax colouring when inline comments contain multi-byte characters + numerical constants/punctuation #451

Conversation

shapayev commented Oct 8, 2022 • edited Loading

sebthom commented Oct 9, 2022

shapayev commented Oct 9, 2022 • edited Loading

sebthom commented Oct 9, 2022

mickaelistria commented Oct 10, 2022

shapayev commented Oct 10, 2022

shapayev commented Oct 12, 2022

sebthom commented Oct 12, 2022

shapayev commented Oct 12, 2022

sebthom commented Oct 12, 2022

shapayev commented Oct 12, 2022

shapayev commented Oct 8, 2022 •

edited

Loading

shapayev commented Oct 9, 2022 •

edited

Loading