Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong syntax highlighting for Matlab strings #18

Closed
stefano-zanotti opened this issue Jun 28, 2021 · 13 comments
Closed

Wrong syntax highlighting for Matlab strings #18

stefano-zanotti opened this issue Jun 28, 2021 · 13 comments
Labels
committed Issue fixed in repository but not in release matlab Caused by the matlab or octave lexer

Comments

@stefano-zanotti
Copy link

As reported to Notepad++ (issue #10065), where they redirected me to lexilla.

Description of the Issue

In Matlab (at least in version '9.10.0.1649659 (R2021a) Update 1', but I don't think it has ever been different), a double-quoted string does not have escape sequences. All characters represent themselves, except for double quotes - you need to double them.
E.g.

"\" -> \
"\\" -> \\
"a""a" -> a"a

C-like escape sequences are only converted when the string is passed to sprintf or similar.
Instead, Notepad++'s syntax coloring interprets those sequences as actual escape sequences.

Steps to Reproduce the Issue

Paste this text in a notepad++ tab, with Matlab as the language:

a="""";
b=1;
c='\';
d=2;
e="\";
f=3;
%" this should be a comment (colored as such), instead it closes the string
g="
h=123;
%" this is a syntax error in Matlab (about 'g'),
% followed by a valid assignment (of 'h')
% Instead, 'h' is colored as part of the string

Expected Behavior

Only the strings should be colored as strings.
Also, Matlab strings cannot span multiple lines, so the coloring should stop at the end of the line, even if no closing quote is found.

Actual Behavior

The 'e' and 'g' strings are not detected properly, so their coloring spills to the next semicolon, and also to the next line.

Debug Information

Notepad++ v7.9.5 (64-bit)
Build time : Mar 21 2021 - 02:13:17
Path : C:\Program Files\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS Name : Windows 10 Pro (64-bit)
OS Version : 2004
OS Build : 19041.1052
Current ANSI codepage : 1252
Plugins : HexEditor.dll mimeTools.dll NppConverter.dll NppExport.dll

Screenshots

Notepad++ coloring:
notepad

Matlab coloring:
matlab

@nyamatongwe nyamatongwe added the matlab Caused by the matlab or octave lexer label Jun 28, 2021
@nyamatongwe
Copy link
Member

Documentation:
https://www.mathworks.com/help/matlab/matlab_prog/represent-text-with-character-and-string-arrays.html

LexMatlab, the Matlab lexer is also responsible for GNU Octave which appears to allow escape sequences.
http://laris.fesb.hr/digitalno_vodjenje/octave/doc/octave_6.html#SEC56
A change should allow escape sequences in Octave mode.

@zufuliu
Copy link
Contributor

zufuliu commented Jun 28, 2021

diff --git a/lexers/LexMatlab.cxx b/lexers/LexMatlab.cxx
index 9c319887..b4ce6b37 100644
--- a/lexers/LexMatlab.cxx
+++ b/lexers/LexMatlab.cxx
@@ -188,12 +188,16 @@ static void ColouriseMatlabOctaveDoc(
  				}
 			}
 		} else if (sc.state == SCE_MATLAB_DOUBLEQUOTESTRING) {
-			if (sc.ch == '\\') {
+			if (sc.ch == '\\' && !ismatlab) {
 				if (sc.chNext == '\"' || sc.chNext == '\'' || sc.chNext == '\\') {
 					sc.Forward();
 				}
 			} else if (sc.ch == '\"') {
-				sc.ForwardSetState(SCE_MATLAB_DEFAULT);
+				if (sc.chNext == '\"') {
+ 					sc.Forward();
+				} else {
+					sc.ForwardSetState(SCE_MATLAB_DEFAULT);
+ 				}
 			}
 		} else if (sc.state == SCE_MATLAB_COMMAND) {
 			if (sc.atLineEnd) {

@nyamatongwe
Copy link
Member

Committed the first change to LexMatlab.cxx in the patch (checking for !ismatlab) as 1c1a4cb which seems to me to fix the problem.

The second change in the patch appears unrelated and does not affect the output.

@zufuliu
Copy link
Contributor

zufuliu commented Jul 18, 2021

That's fine as escape sequence highlighting is not implemented.

@stefano-zanotti-88
Copy link

The bug is still partially present in Notepad++ 8.4.7 (I don't know which version of lexilla it uses).
This is the current coloring:

img2

The backslash issue was fixed, but the "multiline" string is treated as a proper string by lexilla, whereas it is a syntax error in Matlab. That is, the newline character after the opening double quotes for variable g terminates the statement, so the string coloring should not spill to the next line.
In Matlab, any statement terminates at the newline, even if we are in the middle of a string, or a function call, or an array. Only "..." at the end of the line causes the statement to continue to the next line.

See the previous message for Matlab's coloring: the syntax error terminates at the newline, and the next lines are all properly parsed, as if the parser was reset at the beginning of the line.

@nyamatongwe nyamatongwe reopened this Dec 10, 2022
@zufuliu
Copy link
Contributor

zufuliu commented Dec 11, 2022

Can be fixed with following changes:

diff --git a/lexers/LexMatlab.cxx b/lexers/LexMatlab.cxx
index 4a3ae85f..39c4b0dd 100644
--- a/lexers/LexMatlab.cxx
+++ b/lexers/LexMatlab.cxx
@@ -297,6 +297,8 @@ static void ColouriseMatlabOctaveDoc(
 				} else {
 					sc.ForwardSetState(SCE_MATLAB_DEFAULT);
 				}
+			} else if (sc.atLineEnd) {
+				sc.SetState(SCE_MATLAB_DEFAULT);
 			}
 		} else if (sc.state == SCE_MATLAB_DOUBLEQUOTESTRING) {
 			if (sc.ch == '\\' && !ismatlab) {
@@ -310,6 +312,8 @@ static void ColouriseMatlabOctaveDoc(
 			if (sc.atLineEnd) {
 				sc.SetState(SCE_MATLAB_DEFAULT);
 				transpose = false;
+			} else if (sc.atLineEnd) {
+				sc.SetState(SCE_MATLAB_DEFAULT);
 			}
 		} else if (sc.state == SCE_MATLAB_COMMENT) {
 			// end or start of a nested a block comment?

@nyamatongwe
Copy link
Member

Second chunk is repeating condition from immediately preceding if so appears wrong.

@zufuliu
Copy link
Contributor

zufuliu commented Dec 11, 2022

} else if (sc.state == SCE_MATLAB_STRING) {
	if (sc.ch == '\'') {
		if (sc.chNext == '\'') {
			sc.Forward();
		} else {
			sc.ForwardSetState(SCE_MATLAB_DEFAULT);
		}
	} else if (sc.atLineEnd) {
		sc.SetState(SCE_MATLAB_DEFAULT);
	}
} else if (sc.state == SCE_MATLAB_DOUBLEQUOTESTRING) {
	if (sc.ch == '\\' && !ismatlab) {
		if (sc.chNext == '\"' || sc.chNext == '\'' || sc.chNext == '\\') {
			sc.Forward();
		}
	} else if (sc.ch == '\"') {
		sc.ForwardSetState(SCE_MATLAB_DEFAULT);
	} else if (sc.atLineEnd) {
		sc.SetState(SCE_MATLAB_DEFAULT);
	}
} else if (sc.state == SCE_MATLAB_COMMAND) {

@nyamatongwe
Copy link
Member

GNU Octave allows newlines in strings if they are preceded by a backslash escape character.

a = "octave"
disp(a)
b = "multi\
line"
disp(b)

$ octave x.m.octave
a = octave
octave
b = multiline
multiline

Without a backslash escape, it is a syntax error.

b = "multi
line"
disp(b)

$ octave x.m.octave
error: source: error sourcing file '/home/neil/merc/x.m.octave'

@stefano-zanotti-88
Copy link

Matlab does not support escaping a newline character with a backslash, instead.

@zufuliu
Copy link
Contributor

zufuliu commented Dec 12, 2022

change atLineEnd to MatchLineEnd() would fix line continuation.

} else if (sc.state == SCE_MATLAB_STRING) {
	if (sc.ch == '\'') {
		if (sc.chNext == '\'') {
			sc.Forward();
		} else {
			sc.ForwardSetState(SCE_MATLAB_DEFAULT);
		}
	} else if (sc.MatchLineEnd()) {
		sc.SetState(SCE_MATLAB_DEFAULT);
	}
} else if (sc.state == SCE_MATLAB_DOUBLEQUOTESTRING) {
	if (sc.ch == '\\' && !ismatlab) {
		sc.Forward(); // skip escape sequence, new line and others after backlash
	} else if (sc.ch == '\"') {
		sc.ForwardSetState(SCE_MATLAB_DEFAULT);
	} else if (sc.MatchLineEnd()) {
		sc.SetState(SCE_MATLAB_DEFAULT);
	}
} else if (sc.state == SCE_MATLAB_COMMAND) {

nyamatongwe pushed a commit that referenced this issue Dec 15, 2022
@nyamatongwe nyamatongwe added the committed Issue fixed in repository but not in release label Dec 15, 2022
@nyamatongwe
Copy link
Member

Committed above change.

@nyamatongwe
Copy link
Member

Fixed in 5.2.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
committed Issue fixed in repository but not in release matlab Caused by the matlab or octave lexer
Projects
None yet
Development

No branches or pull requests

4 participants