Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix syntax colouring when inline comments contain multi-byte characters + numerical constants/punctuation #451

Merged
merged 1 commit into from
Oct 12, 2022

Conversation

shapayev
Copy link
Contributor

@shapayev shapayev commented Oct 8, 2022

When you type something like the following code in C/C++:

struct S {
    // кошка (just a cat, only 1)
    char mouse; // syntax colouring of this string will be broken
};
char cat[] = {"кошка"}; char mouse = -1;
// typing the end-of-line at the end of a line above will trigger the exception

the syntax colouring of a line below the triggering string will be broken. I've managed to reproduce this behaviour with the latest build of Eclipse Corrosion and the most current TM4E snapshot, you can see it in the animated gif below.

cat_and_mouse

EDIT: Double-checked this today with the provided example: in my case (Eclipse CDT with a small patch to connect the extension point of the generic editor to the TMPresentationReconciler of TM4E) without this fix I also had the following exception:

java.lang.IllegalArgumentException: Argument not valid
	at org.eclipse.swt.SWT.error(SWT.java:4899)
	at org.eclipse.swt.SWT.error(SWT.java:4833)
	at org.eclipse.swt.SWT.error(SWT.java:4804)
	at org.eclipse.swt.custom.StyledText.setStyleRanges(StyledText.java:10665)
	at org.eclipse.swt.custom.StyledText.replaceStyleRanges(StyledText.java:8238)
	at org.eclipse.jface.text.TextViewer.addPresentation(TextViewer.java:4777)
	at org.eclipse.jface.text.TextViewer.changeTextPresentation(TextViewer.java:4854)
	at org.eclipse.tm4e.ui.text.TMPresentationReconciler.applyTextRegionCollection(TMPresentationReconciler.java:692)
	at org.eclipse.tm4e.ui.text.TMPresentationReconciler.colorize(TMPresentationReconciler.java:596)
	at org.eclipse.tm4e.ui.text.TMPresentationReconciler$InternalListener.colorize(TMPresentationReconciler.java:410)

And so debugging this one had lead me to the LineTokenizer trying to handle the captured token of negative length.

@sebthom
Copy link
Member

sebthom commented Oct 9, 2022

Hi Alexander,
thank you for your contribution. Can you please describe the issue and provide some examples (e.g. screenshots) of the issue you are seeing?

@shapayev
Copy link
Contributor Author

shapayev commented Oct 9, 2022

Hi Sebastian,
Sorry for the lack of initial description. Hope I have fixed it in right way now.
Should I open a separate issue for the bug that this pull request proposes to fix?

@sebthom
Copy link
Member

sebthom commented Oct 9, 2022

We don't need a separate issue for this. Comments in the PR are fine.

@mickaelistria
Copy link
Contributor

Do you think you could add some tests that cover this bug+fix? I'm afraid it's too easy to introduce a regression later without a test.

@shapayev shapayev force-pushed the fixMultiByteStringComments branch 2 times, most recently from cd539f4 to 556cf61 Compare October 10, 2022 08:52
@shapayev
Copy link
Contributor Author

You're right, I've had the same thought that in current state it may be easily reintroduced.
Honestly, I'm not so fluent in Java and it's patterns so writing tests will not be so fast for me as catching bugs. But if I can take some time for doing this I shall appreciate it very much as this may be a good first opportunity to start.

P.S. Please sorry me for my mistake: forced-pushed squashed commits to the wrong pull request inadvertently, hope it's fixed now...

Fix finding the end of the regexp in a multi-byte string and incorrect use of bytesCount instead of the string length in multi-byte characters. In C/C++ grammars the bug is triggered by strings/comments containing multi-byte characters and numeric constants/punctuation.
Additional comment: lineTokens.produce() takes length of the string in multi-byte characters as a second parameter and captureIndices of OnigCaptureIndex are also counted as characters (see OnigNextMatchResult.captureIndicesOfMatch()).
@shapayev
Copy link
Contributor Author

Added some test cases. Have checked they all fail for the current master branch.
Any suggestions/comments?

P.S. I had to add surefire-junit-platform to the top-level pom.xml for internal tests to return. Were they skipped by intention?

@sebthom
Copy link
Member

sebthom commented Oct 12, 2022

The PR looks good to me. Thanks for adding the test cases.

The internal tests were not skipped by intention. Thanks for the fix.

@sebthom sebthom merged commit e10f425 into eclipse:master Oct 12, 2022
@shapayev
Copy link
Contributor Author

Sorry, can't comment inline at the moment so shall try to explain it here.
I shall fix the comment, the returned value looks good for me ;)
The dependency was added to force using JUnit5 by surefire. Could not deduce why the auto-detection is not working otherwise. Maybe there is a better fix, this one has turned to be the most compact.

@sebthom
Copy link
Member

sebthom commented Oct 12, 2022

I removed my inline comments again. All good. Thanks again for the PR. Identifying this issue and fixing it was very good work!

@shapayev
Copy link
Contributor Author

Thank you much very much for your warm reception! I am very glad I could be of any help to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants