Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax Highlighting for JavaScript Doesn't Understand Template Literals #94

Closed
Lonniebiz opened this issue Jul 6, 2022 · 20 comments
Closed
Labels
committed Issue fixed in repository but not in release cpp Caused by the cpp lexer javascript Caused for JavaScript by the cpp or html lexer

Comments

@Lonniebiz
Copy link

Lonniebiz commented Jul 6, 2022

I originally reported this issue to Geany:
geany/geany#3231

The Syntax Highlighting for JavaScript Doesn't Understand Template Literals. For example if you put an expression like ${expression} inside a template literal the syntax highlighting doesn't distinguish that expression from the rest of the string.

For example, here lexilla is providing syntax highlighing for geany:
ss

Here's that same code in atom :
atom

Notice that the ${expressions} are distinguished with a different color than the rest of the string.

@nyamatongwe nyamatongwe added the cpp Caused by the cpp lexer label Jul 6, 2022
@nyamatongwe
Copy link
Member

Just to be completely clear here - this request is for files that contain only JavaScript, as distinct from HTML files that embed some JavaScript?

The example in atom shows differentiation of elements inside the ${...} with order being styled differently to its fields - is that essential to the feature?

It helps implementers for issues to include example code as text files since it is time-consuming and error-prone to type in text from images.

@zufuliu
Copy link
Contributor

zufuliu commented Jul 6, 2022

@Lonniebiz
Copy link
Author

Lonniebiz commented Jul 7, 2022

@nyamatongwe : This type of syntax highlighting would be appreciated anywhere javascript is used. While I'm primarily wanting it for .js and .mjs file extensions, if javascript is inside .HTML via a <script> tag, this too would certainly be appreciated.

As for the further color distinction, ${ocurring.withinThe.Expression}, I'm not sure whether others would consider that necessary or not. I guess it is just distinguishing an object from its properties. Later in the Atom syntax highlighting it also distinguishes methods/functions from properties, where you see it making `.toString()' blue (that's a function/method).

Mainly, I'd just appreciate distinction between the string and the expressions it contains. More info about Template Literals is here.

Sample Code:

let order = {};
order.totalWeight = 1000;
order.destination = {};
order.destination.city = "Houston";
order.destination.stateProvince = "TX";
order.destination.postalCode = "77038";

let objHeader = {};
objHeader['Content-Type'] = 'application/json';
let fetchOptions = {};
fetchOptions.headers = new Headers(objHeader);
fetchOptions.body = `
{
	"accountNumber" : "248796",
	"customerType" : "Shipper",
	"destinationCity" : "${order.destination.city}",
	"destinationState" : "${order.destination.stateProvince}",
	"destinationZip" : ${order.destination.postalCode},
	"paymentType" : "Prepaid",
	"shipmentInfo" :
	{
		"items" : [ { "shipmentClass" : "50", "shipmentWeight" : "${order.totalWeight.toString()}" } ]
	}
}`;

lexilla via Geany 1.37.1 :
geany

Some other open source highlighter via Atom :
atom

@nyamatongwe
Copy link
Member

As mentioned on the linked Scintilla Feature Tracker issue, there are complicating aspects here including expressions containing strings, nested expressions, and multi-line expressions. There's probably more issues since I don't use or really understand the details of this feature.

If all the complications are ignored then its reasonably simple to declare an 'expression' style, transition into it when ${ is found in a raw string with ` delimiters and transition back to raw string when } is found. Maybe add a property to enable/disable the feature. The question is whether a partial implementation is better than the current state or is actually worse and will mislead users.

For some perspective, Python's similar f-string feature was implemented in these change sets: 1438aa0, eb64bf6, ec6c46e, fa29d60. Its not a small feature.

@Lonniebiz
Copy link
Author

Lonniebiz commented Jul 9, 2022

I think a partial 'expression' style is better than no distinction at all. That's mainly what I need. But, think about it like this: The portion between the ${} should probably be treated like the code that's outside the entire tic tic string. But, if that's hard to figure out, just making ${} (and everything inside it) a different color is a helpful improvement.

Mainly, what I'm wanting, is color distinction between the static string parts and the expressions. I think every javascript developer would find that helpful.

If you achieve that, I'd make it default, and not worry about enabling or disabling it. I just can't imagine a complaint; everyone would rather have that than not (unless they're deliberately trying to defy my generalization for the sake of doing so).

I'm sure someone will request a more sophisticated refinement later, but the minimal idea you proposed is indeed step in the right direction.

@nyamatongwe
Copy link
Member

nyamatongwe commented Jan 2, 2024

Implemented a minimal version of a SCE_C_TEMPLATEEXPRESSION style that starts with ${ inside a backtick string and terminates with the next }.

Patch: JSTemplateExpression.patch

Since this change adds a new style, downstream projects would likely see this as unstyled or default until they are modified to support SCE_C_TEMPLATEEXPRESSION.

@zufuliu
Copy link
Contributor

zufuliu commented Jan 3, 2024

new style is not needed when highlighted like Python f-string or Ruby string interpolation:

  • \$ escapes dollar
  • ${ starts nested expression interpolation, next matching } ends interpolation.

screenshot for following code with Notepad2's PHP Page and JavaScript schemes:

<html>
<script>
var xxx = {
    '1': `\${a${1 + 1}b}`,
    '2': `\${a${ `b${1 + 2}c`.charCodeAt(2) }d}`,
    '3': `\${a${ `b${ `c${ JSON.stringify({'a':'b'}) }d` }e` }f}`,
};
document.write(JSON.stringify(xxx));
</script>
</html>

image

@zufuliu
Copy link
Contributor

zufuliu commented Jan 4, 2024

updated test case with normal escape sequence and multiline expression:

<html>
<script>
var xxx = {
    '1': `\`\u0020\${a${1 + 1}b}`,
    '2': `\${a${ `b${1 + 2}c`.charCodeAt(2) }d}`,
    '3': `\${a${ `b${ `c${ JSON.stringify({
        '4': {},
        }) }d` }e` }f}`,
};
document.write(JSON.stringify(xxx));
</script>
</html>

also a draft patch: TemplateLiteral-0104.patch

@zufuliu
Copy link
Contributor

zufuliu commented Jan 5, 2024

Just some renaming: TemplateLiteral-0105.patch

@zufuliu
Copy link
Contributor

zufuliu commented Jan 5, 2024

I don't understand how the code works:

@nyamatongwe
Copy link
Member

nyamatongwe commented Jan 5, 2024

Sci_PositionU startPos parameter is not line start position

This does not occur in normal use, since EnsureStyledTo and ScintillaBase::NotifyStyleToNeeded (for !container) use LineStartPosition(GetEndStyled()). However, there is still a hole to allow containers to implement non-line-based lexing if really wanted but they are then responsible for the results.

delete all states for line >= lineCurrent

Because styling is a forward cascade and states beyond the change position may not be valid.

@nyamatongwe
Copy link
Member

The code design seems good.

The '2' and '3' examples don't terminate. It would still be an improvement if it didn't support nested raw strings.

var xxx = {
    '1': `\`\u0020\${a${1 + 1}b}`,
    '2': `\${a${ `b${1 + 2}c`.charCodeAt(2) }d}`,
    '3': `\${a${ `b${ `c${ JSON.stringify({
        '4': {},
        }) }d` }e` }f}`,
};

JSTemplateLiteral

@zufuliu
Copy link
Contributor

zufuliu commented Jan 5, 2024

Because styling is a forward cascade and states beyond the change position may not be valid.

Got it, all text after modification position is restyled.

The '2' and '3' examples don't terminate.

Fixed by restore rawStringTerminator after pop back.
TemplateLiteral-0106.patch

It seems worth to add a template class to wrap the code (save and restore states) as interpolating become popular in languages, something like:

template <typename T>
class LineStateMap {
	std::map<Sci_Position, T> stateMap;
public:
	void Save(Sci_Position lineCurrent, const T &state) {
		stateMap[lineCurrent] = state;
	}
	void Restore(Sci_Position lineCurrent, T &state) {
		auto it = stateMap.find(lineCurrent - 1);
		if (it != stateMap.end()) {
			state = it->second;
		}
		it = stateMap.lower_bound(lineCurrent);
		if (it != stateMap.end()) {
			stateMap.erase(it, stateMap.end());
		}
	}
};

@nyamatongwe
Copy link
Member

Something went wrong with TemplateLiteral-0106.patch and the link leads to "Not Found".

A generic way to store line to value is likely worthwhile though there may be some fiddling with the details like whether Restore should restore from just the exact line or find any previous value.

@zufuliu
Copy link
Contributor

zufuliu commented Jan 6, 2024

change patch to not clear rawStringTerminator when there are nested interpolating.
TemplateLiteral-0106-2.patch

@zufuliu
Copy link
Contributor

zufuliu commented Jan 6, 2024

change patch to not clear rawStringTerminator when there are nested interpolating.

Using separator style for template literal will simplify the code, in case JavaScript added true raw string (without interpret escape sequence) in the future, e.g. https://es.discourse.group/t/raw-string-literals-that-can-contain-any-arbitrary-text-without-the-need-for-special-escape-sequences/1757, but it maybe a breaking change.

nyamatongwe pushed a commit that referenced this issue Jan 8, 2024
strings as JavaScript template literals which allow embedded ${expressions}.
@nyamatongwe
Copy link
Member

Committed TemplateLiteral-0106-2.patch.

It's a pity that almost every use of backQuotedStrings needs a cast. Perhaps there is a way to templatize the code to allow DefineProperty to point to an enum class but assume its the underlying int type for assignment. That could bridge the untyped application to the type-safe lexer.

@nyamatongwe nyamatongwe added the committed Issue fixed in repository but not in release label Jan 8, 2024
@zufuliu
Copy link
Contributor

zufuliu commented Jan 8, 2024

Don't how to add enum into OptionSet, but some helpers (needs better names) can be added into OptionsCPP:

diff --git a/lexers/LexCPP.cxx b/lexers/LexCPP.cxx
index f33a4954..e87260e7 100644
--- a/lexers/LexCPP.cxx
+++ b/lexers/LexCPP.cxx
@@ -361,6 +361,13 @@ struct OptionsCPP {
 	bool foldPreprocessorAtElse = false;
 	bool foldCompact = false;
 	bool foldAtElse = false;
+
+	bool TemplateLiteral() const noexcept {
+		return backQuotedStrings == static_cast<int>(BackQuotedString::TemplateLiteral);
+	}
+	bool StringInterpolating() const noexcept {
+		return TemplateLiteral();
+	}
 };
 
 const char *const cppWordLists[] = {
@@ -399,7 +406,7 @@ struct OptionSetCPP : public OptionSet<OptionsCPP> {
 			"Set to 1 to enable highlighting of hash-quoted strings.");
 
 		DefineProperty("lexer.cpp.backquoted.strings", &OptionsCPP::backQuotedStrings,
-			"Set how to highlighting back-quoted strings. "
+			"Set how to highlight back-quoted strings. "
 			"0 (the default) no highlighting. "
 			"1 highlighted as Go raw string. "
 			"2 highlighted as JavaScript template literal.");
@@ -792,7 +799,7 @@ void SCI_METHOD LexerCPP::Lex(Sci_PositionU startPos, Sci_Position length, int i
 	std::vector<InterpolatingState> interpolatingStack;
 
 	Sci_Position lineCurrent = styler.GetLine(startPos);
-	if (options.backQuotedStrings == static_cast<int>(BackQuotedString::TemplateLiteral)) {
+	if (options.StringInterpolating()) {
 		// code copied from LexPython
 		auto it = interpolatingAtEol.find(lineCurrent - 1);
 		if (it != interpolatingAtEol.end()) {
@@ -1158,7 +1165,7 @@ void SCI_METHOD LexerCPP::Lex(Sci_PositionU startPos, Sci_Position length, int i
 					if (interpolatingStack.empty()) {
 						rawStringTerminator.clear();
 					}
-				} else if (options.backQuotedStrings == static_cast<int>(BackQuotedString::TemplateLiteral)) {
+				} else if (options.TemplateLiteral())) {
 					if (sc.ch == '\\') {
 						if (options.escapeSequence) {
 							escapeSeq.resetEscapeState(sc.state, sc.chNext);

TemplateLiteral-0108.patch

@nyamatongwe
Copy link
Member

Enumerated options appear to be possible with this patch.

diff --git a/lexers/LexCPP.cxx b/lexers/LexCPP.cxx
index f33a4954..f39f1037 100644
--- a/lexers/LexCPP.cxx
+++ b/lexers/LexCPP.cxx
@@ -324,7 +324,7 @@ public:
 	}
 };
 
-enum class BackQuotedString {
+enum class BackQuotedString : int {
 	None,
 	RawString,
 	TemplateLiteral,
@@ -347,7 +347,7 @@ struct OptionsCPP {
 	bool verbatimStringsAllowEscapes = false;
 	bool triplequotedStrings = false;
 	bool hashquotedStrings = false;
-	int backQuotedStrings = static_cast<int>(BackQuotedString::None);
+	BackQuotedString backQuotedStrings = BackQuotedString::None;
 	bool escapeSequence = false;
 	bool fold = false;
 	bool foldSyntaxBased = true;
@@ -792,7 +792,7 @@ void SCI_METHOD LexerCPP::Lex(Sci_PositionU startPos, Sci_Position length, int i
 	std::vector<InterpolatingState> interpolatingStack;
 
 	Sci_Position lineCurrent = styler.GetLine(startPos);
-	if (options.backQuotedStrings == static_cast<int>(BackQuotedString::TemplateLiteral)) {
+	if (options.backQuotedStrings == BackQuotedString::TemplateLiteral) {
 		// code copied from LexPython
 		auto it = interpolatingAtEol.find(lineCurrent - 1);
 		if (it != interpolatingAtEol.end()) {
@@ -1158,7 +1158,7 @@ void SCI_METHOD LexerCPP::Lex(Sci_PositionU startPos, Sci_Position length, int i
 					if (interpolatingStack.empty()) {
 						rawStringTerminator.clear();
 					}
-				} else if (options.backQuotedStrings == static_cast<int>(BackQuotedString::TemplateLiteral)) {
+				} else if (options.backQuotedStrings == BackQuotedString::TemplateLiteral) {
 					if (sc.ch == '\\') {
 						if (options.escapeSequence) {
 							escapeSeq.resetEscapeState(sc.state, sc.chNext);
@@ -1260,7 +1260,7 @@ void SCI_METHOD LexerCPP::Lex(Sci_PositionU startPos, Sci_Position length, int i
 			} else if (options.hashquotedStrings && sc.Match('#', '\"')) {
 				sc.SetState(SCE_C_HASHQUOTEDSTRING|activitySet);
 				sc.Forward();
-			} else if (options.backQuotedStrings && sc.Match('`')) {
+			} else if ((options.backQuotedStrings != BackQuotedString::None) && sc.Match('`')) {
 				sc.SetState(SCE_C_STRINGRAW|activitySet);
 				rawStringTerminator = "`";
 			} else if (IsADigit(sc.ch) || (sc.ch == '.' && IsADigit(sc.chNext))) {
diff --git a/lexlib/OptionSet.h b/lexlib/OptionSet.h
index 09f6058b..a264e301 100644
--- a/lexlib/OptionSet.h
+++ b/lexlib/OptionSet.h
@@ -97,6 +97,12 @@ public:
 		nameToDef[name] = Option(ps, description);
 		AppendName(name);
 	}
+	template <typename E>
+	void DefineProperty(const char *name, E T::*pe, std::string_view description="") {
+		const plcoi pi = reinterpret_cast<plcoi>(pe);
+		nameToDef[name] = Option(pi, description);
+		AppendName(name);
+	}
 	const char *PropertyNames() const noexcept {
 		return names.c_str();
 	}

EnumerationOption.patch

@zufuliu
Copy link
Contributor

zufuliu commented Jan 9, 2024

Oh, that works. similar change can be applied to Bash lexer for commandSubstitution property.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
committed Issue fixed in repository but not in release cpp Caused by the cpp lexer javascript Caused for JavaScript by the cpp or html lexer
Projects
None yet
Development

No branches or pull requests

3 participants