Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bash] highlight nested quotes and shell expansions #154

Closed
zufuliu opened this issue Apr 30, 2023 · 37 comments
Closed

[Bash] highlight nested quotes and shell expansions #154

zufuliu opened this issue Apr 30, 2023 · 37 comments
Labels
bash Caused by the bash lexer committed Issue fixed in repository but not in release

Comments

@zufuliu
Copy link
Contributor

zufuliu commented Apr 30, 2023

This is a big change, I would like to spit it as following small changes:

  • noexcept and clang-tidy suggested changes.
  • change BASH_CMD_* and BASH_DELIM_* macros into scoped enumeration.
  • treat parameter expansion ${} same as backticks on handling nested quotes and shell expansions, as the document says:

When braces are used, the matching ending brace is the first ‘}’ not escaped by a backslash or within a quoted string, and not within an embedded arithmetic expansion, command substitution, or parameter expansion.

  • Remove QuoteCls Quote; variable as single quoted string has no inner states.
  • Move out QuoteStackCls definition, and change it to use QuoteCls array as stack.
  • Change the for loop to while loop, it currently has bug on highlighting single punctuation special parameters. e.g. whole $?1 highlighted as variable instead of just $?.
  • highlight nested quotes and shell expansions for double quoted string, backticks and parameter expansion. $() command substitution still colored as SCE_SH_BACKTICKS.
  • Simplify case SCE_SH_HERE_Q by using sc.Match() to match the delimiter.
  • highlight shell expansions in here document.
@zufuliu
Copy link
Contributor Author

zufuliu commented Apr 30, 2023

First change, add noexcept and const, use nullptr and C++ include, remove unused destructor.
bash-tidy.zip

nyamatongwe pushed a commit that referenced this issue Apr 30, 2023
… Remove inline.

Separate variable declarations.
Remove unnecessary destructors.
Assign more type-specific literals.
Include C++ headers.
@nyamatongwe nyamatongwe added the bash Caused by the bash lexer label Apr 30, 2023
@nyamatongwe
Copy link
Member

Looks good.

Changed order of includes to match lexilla/scripts/HeaderOrder.txt. The C++ versions are in a different order to the C includes but it would cause too much profitless code churn to fix.

@zufuliu
Copy link
Contributor Author

zufuliu commented Apr 30, 2023

bash-enum.zip

scoped enumeration changes: BASH_CMD_* changed to CmdState, BASH_DELIM_ change to QuoteStyle (will be style field for QuoteCls), renamed BASH_DELIM_STACK_MAX to BASH_QUOTE_STACK_MAX, added enumratin for TestExprType.

@nyamatongwe
Copy link
Member

Added in bash-enum.zip but synchronized some comments with the new enum names. Also added some more test cases.

Started working on nested and multi-line tests but am not sure which tests are good. Here is the current state:

# Nested elements

# String with backtick inclusion
"x`ls`"
# Nested string
"x`ls "*.c"`"
# Not terminated at first "
"x`ls" # "`" #

# String with command inclusion
"x$(ls)"

# Nested command
$(ls -la$(ls *.c))

# Check strings and backticks in command
$('x' "x" `ls` $'x' $"x")

# $( not terminated by ) if contains unterminated string
$('x) # ') #
$("x) # ") #
$(`x) # `) #
$($'x) # ') #
$($"x) # ") #

# Multi-line
$(ls |
more)

$(
`x`
"x"
`ls`
$'x'
$"x"
)
#end -- checks termination of previous

@zufuliu
Copy link
Contributor Author

zufuliu commented May 1, 2023

There also needs a test for parameter expansion:

var=abcdef
sub=abc
rep='& '
echo ${var/$sub/"${rep}}"} #

Change SCE_SH_PARAM to be handled same as SCE_SH_BACKTICKS:
bash-param.zip

@zufuliu
Copy link
Contributor Author

zufuliu commented May 2, 2023

Remove QuoteCls Quote; variable.
bash-single-quote.zip

@zufuliu
Copy link
Contributor Author

zufuliu commented May 2, 2023

Move out class QuoteCls and class QuoteStackCls, removed unused && QuoteStack.Up != '\\' test.
bash-quote-stack.zip

@nyamatongwe
Copy link
Member

QuoteCls::Clear is never called and QuoteCls::Open is only called from QuoteCls::Start so its code could be moved there.

@zufuliu
Copy link
Contributor Author

zufuliu commented May 3, 2023

QuoteCls::Clear can be removed from this commit, Open and Start can be merged into single method as only the later is used.

bash-quote-stack2.zip

nyamatongwe pushed a commit that referenced this issue May 3, 2023
QuoteStack.Up can only be in { '(', '{', '"', '\'', '`'}, never '\\'.
nyamatongwe pushed a commit that referenced this issue May 3, 2023
Merge QuoteCls::open into QuoteCls::Start as its only called from Start.
@zufuliu
Copy link
Contributor Author

zufuliu commented May 4, 2023

Change for loop to while loop, new test case:

echo $?Status

bash-while-loop.zip

@nyamatongwe
Copy link
Member

This seems a bit heavy to just avoid the sc.Forward() for $ and removes the for loop's reassurance of progression and thus termination.

The code could be changed to check that sc.chNext needs a pair like the following although that does state the set of paired characters twice. Alternatively use sc.chNext in the if block and sc.Forward() after.

@@ -856,6 +856,10 @@ void SCI_METHOD LexerBash::Lex(Sci_PositionU startPos, Sci_Position length, int
 					continue;
 				}
 				sc.SetState(SCE_SH_SCALAR);
+				if (!AnyOf(sc.chNext, '{', '\'', '\"', '(', '`')) {
+					// scalar has no delimiter pair
+					continue;
+				}
 				sc.Forward();
 				if (sc.ch == '{') {
 					sc.ChangeState(SCE_SH_PARAM);
@@ -872,8 +876,6 @@ void SCI_METHOD LexerBash::Lex(Sci_PositionU startPos, Sci_Position length, int
 				} else if (sc.ch == '`') {	// $` seen in a configure script, valid?
 					sc.ChangeState(SCE_SH_BACKTICKS);
 					QuoteStack.Start(sc.ch, QuoteStyle::Backtick);
-				} else {
-					continue;	// scalar has no delimiter pair
 				}
 			} else if (sc.Match('<', '<')) {
 				sc.SetState(SCE_SH_HERE_DELIM);

continue.patch

@zufuliu
Copy link
Contributor Author

zufuliu commented May 4, 2023

OK , seems better to move AnyOf as a function. .

I think convert for loop to while is still needed, current code for handling nested highlighting:
https://github.com/zufuliu/notepad2/blob/main/scintilla/lexers/LexBash.cxx#L645

case SCE_SH_STRING_DQ:	// delimited styles, can nest
case SCE_SH_PARAM: // ${parameter}
case SCE_SH_BACKTICKS:
	if (sc.ch == '\\') {
		sc.Forward();
	} else if (sc.ch == QuoteStack.Current.Down) {
		if (QuoteStack.CountDown(sc, cmdState)) {
			continue;
		}
	} else if (sc.ch == QuoteStack.Current.Up) {
		QuoteStack.Current.Count++;
	} else {

and CountDown() (so it need to avoid extra sc.Forword() by continue):
https://github.com/zufuliu/notepad2/blob/main/scintilla/lexers/LexBash.cxx#L259

if (Current.Count == 0) {
	cmdState = CmdState::Body;
	const int outer = Current.Outer;
	if (Depth > 0) {
		Pop();
	} else {
		Clear();
	}
	sc.ForwardSetState(outer);
	return true;
}

@zufuliu
Copy link
Contributor Author

zufuliu commented May 5, 2023

Whole code block for sc.ch == '$' will be extracted as method QuoteStack.Expand(sc, cmdState);, I think AnyOf(sc.chNext) change can be delayed then (e.g. to avoid continue after Expand())?
https://github.com/zufuliu/notepad2/blob/main/scintilla/lexers/LexBash.cxx#L782

} else if (sc.ch == '$') {
	QuoteStack.Expand(sc, cmdState);
	continue;

@zufuliu
Copy link
Contributor Author

zufuliu commented May 12, 2023

Implement nested highlighting, new test case:

echo "$((1 + 2))" #
echo "$[1 + 2]" #

bash-nested.patch

@nyamatongwe
Copy link
Member

The bash-nested.patch diff implements the feature well for all the examples I have tried.

However, I don't think this feature will be universally wanted, including by myself and the original lexer author Kein-Hong Man. For me, its highlighting smaller elements that I am less interested in most of the time, similar to highlighting escape sequences in C++ strings which I turn off.

This new behaviour should be a choice although it may not be just a binary choice - some may want more fine-grained styling inside $() commands but not inside "" strings.

@zufuliu
Copy link
Contributor Author

zufuliu commented May 13, 2023

OK, then need to add property for each nested quote style + scalar variable, and what would the property names?

@nyamatongwe
Copy link
Member

I'm unsure whether or not the multiple settings is required but I'm fairly sure at least one is. It could start with a single option and grow to a full set when there is demand or it seems worthwhile.

Perhaps a single lexer.bash.styling.within.quotes to match lexer.baan.styling.within.preprocessor. Or lexer.bash.styling.inside.quotes.

If there are multiple settings they should match the class name or the element name in bash documentation
SCE_SH_BACKTICKS -> lexer.bash.styling.within.backticks, or
SCE_SH_STRING -> lexer.bash.styling.inside.string.

@zufuliu
Copy link
Contributor Author

zufuliu commented May 13, 2023

styling.within or styling.inside seems suitable for #153, main change for this is highlighting Shell Expansions(Shell Parameter Expansion, Command Substitution and Arithmetic Expansion) inside string, backticks/command substitution, parameter expansion.

@zufuliu
Copy link
Contributor Author

zufuliu commented May 13, 2023

bash-nested-prop.patch

Added lexer.bash.styling.inside.[string | backticks | parameter].

nyamatongwe pushed a commit that referenced this issue May 15, 2023
Controlled with properties "lexer.bash.styling.inside.string",
"lexer.bash.styling.inside.backticks", and "lexer.bash.styling.inside.parameter".
https://sourceforge.net/p/scintilla/feature-requests/1033/
@nyamatongwe nyamatongwe added the committed Issue fixed in repository but not in release label May 15, 2023
@nyamatongwe
Copy link
Member

Committed with an added [[nodiscard]] for stylingInside.

Duplicated Nested.bsh as NestedStyledInside.bsh to test lexer.bash.styling.inside.*.

@zufuliu
Copy link
Contributor Author

zufuliu commented May 15, 2023

bash-heredoc.patch

Use sc.Match() to match here document delimiter.

@zufuliu
Copy link
Contributor Author

zufuliu commented May 15, 2023

bash-heredoc3.patch

Simplify preview patch.

@zufuliu
Copy link
Contributor Author

zufuliu commented May 16, 2023

bash-heredoc-nested.patch

Implement nested highlighting inside here document, new test case (undocumented at https://www.gnu.org/software/bash/manual/bash.html#Here-Documents, but same behavior as Perl when there are backslash inside delimiter):

cat <<-\EOF
	$((1+2))
	`date`
	$(date)
EOF

@zufuliu
Copy link
Contributor Author

zufuliu commented May 16, 2023

bash-heredoc-nested2.patch

Previous patch has bugs when stylingInsideHeredoc is not set (no quote counting), updated the patch, how ever it's only works when stylingInsideHeredoc is enabled for following code:

cat <<EOF
heredoc line 1
$(echo \
EOF
)
heredoc line 3
EOF

echo $(cat <<'EOF'
heredoc line 1
$(echo \
EOF
)
#

@nyamatongwe
Copy link
Member

new test case (undocumented ...), but same behavior as Perl when there are backslash inside delimiter

If its undocumented, where is the need coming from?

@zufuliu
Copy link
Contributor Author

zufuliu commented May 16, 2023

where is the need coming from?

it's common in configure script, e.g. there 20 <<\ in https://github.com/eblot/newlib-cygwin/blob/master/configure

Add QuoteStyle::HereDoc to fix first failure when stylingInsideHeredoc not enabled.
bash-heredoc-nested3.patch

@zufuliu
Copy link
Contributor Author

zufuliu commented May 16, 2023

Moved HereDoc enumerator to bellow LString.
bash-heredoc-nested4.patch

@zufuliu
Copy link
Contributor Author

zufuliu commented May 16, 2023

bash-heredoc-nested5.patch

Change the code to use QuoteStack.Start(-1, QuoteStyle::HereDoc, SCE_SH_DEFAULT); to avoid counting.
Fix second failure requires highlighting inside command substitution.

@nyamatongwe
Copy link
Member

With this bash-heredoc-nested5.patch change, here-docs style the following byte. With Windows \r\n line ends this styles just the \r as SCE_SH_HERE_Q.

It really is worthwhile running the tests after making a change.

Lexing bash\x.bsh
C:\u\hg\lexilla\test\examples\bash\x.bsh:21: different styles between \r and \n at 769: 13, 0

HereDocCls::Append should not be noexcept as it can fail producing undefined behaviour if the fixed-length buffer is exhausted. The caller protects against this occurring but its a logically fallible action and may fail (in different ways) if Delimiter was changed to a std::string for example.

@zufuliu
Copy link
Contributor Author

zufuliu commented May 18, 2023

bash-heredoc-nested6.patch

test failure caused by sc.ForwardSetState(outer);.

@zufuliu
Copy link
Contributor Author

zufuliu commented May 18, 2023

Another fix to not do the magic call CountDown() for SCE_SH_HERE_Q.
bash-heredoc-nested7.patch

nyamatongwe pushed a commit that referenced this issue May 20, 2023
lexer.bash.styling.inside.heredoc set.
@nyamatongwe
Copy link
Member

Also ensured initialization to avoid warnings with d56956e.

nyamatongwe pushed a commit that referenced this issue May 22, 2023
…and ")" as operators

and interior styled as bash code.
Controlled with property lexer.bash.command.substitution.
@nyamatongwe
Copy link
Member

Included in release 5.2.5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bash Caused by the bash lexer committed Issue fixed in repository but not in release
Projects
None yet
Development

No branches or pull requests

2 participants