Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested block comments handled incorrectly for Transact-SQL (Microsoft SQL Server) #87

Closed
piomiq opened this issue May 27, 2022 · 12 comments
Labels
committed Issue fixed in repository but not in release mssql Caused by the mssql lexer

Comments

@piomiq
Copy link

piomiq commented May 27, 2022

Description of the Issue

Code of SQL or C++ or Java commented by block commands which is nested (example below) is not highlighted correctly. Please notice that in development environments I tested, which I suppose doesn't use ScintillaOrg, highlighting works well.
I tested Microsoft SQ Server Management Studio, KDevelop, QtCreator.

/* aaaaaaaaaa
/* bbbbbbbbbbbbbbb */
aaaaaaaaaa */

Steps to Reproduce the Issue

Put below into editor, for example in Notepad++
/* aaaaaaaaaa
/* bbbbbbbbbbbbbbb */
aaaaaaaaaa */

Change highlighting with SQL or C++ or Java

Expected Behavior

all lines coming from example should be like as commented

Actual Behavior

last line is incorrectly highlighted - like would be not commented

Other information

I reported this issue also in Notepad++ bug tracker: notepad-plus-plus/notepad-plus-plus#11718

@nyamatongwe
Copy link
Member

Block comments do not nest in C++ or Java.

For SQL it depends on the variant: Oracle does not support nested comments but other vendors may allow them.

https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/fundamentals.htm#BEIJCJED

@nyamatongwe
Copy link
Member

This issue is invalid as currently stated. The scope of the issue should be restricted to the particular SQL variant used where nested comments are allowed. Some documentation that shows the validity of nested quotes in that variant should be linked.

@rdipardo
Copy link
Contributor

Before I (regrettably) shepherded this issue from downstream, it was only about LexSQL's treatment of nested block comments in Transact-SQL queries. T-SQL does permit nested block comments, and, at a glance, so does LexMSSQL:

lexilla/lexers/LexMSSQL.cxx

Lines 229 to 234 in 22777f4

} else if (state == SCE_MSSQL_COMMENT) {
if (ch == '/' && chPrev == '*') {
if (((i > (styler.GetStartSegment() + 2)) || ((initStyle == SCE_MSSQL_COMMENT) &&
(styler.GetStartSegment() == startPos)))) {
styler.ColourTo(i, state);
//~ state = SCE_MSSQL_COMMENT;

As in #85, the only discernible issue here is with the downstream application not providing bespoke lexing of T-SQL queries, since it does not implement LexMSSQL.

@nyamatongwe
Copy link
Member

does permit nested block comments, and, at a glance, so does LexMSSQL

Doesn't appear to be implemented from the code or running it. Since block comment nesting generally occurs over multiple lines, there needs to be a counter for the nesting lexel. This may be stored in line state or similar (there are no calls to line state in LexMSSQL) or as a limited set of styles like LexHaskell's SCE_HA_COMMENTBLOCK, SCE_HA_COMMENTBLOCK2, SCE_HA_COMMENTBLOCK3 which allows 3 levels of comment nesting.

It would be reasonable to add comment nesting to LexMSSQL. It may also be OK as an option to LexSQL as downstream projects often only support it.

@piomiq
Copy link
Author

piomiq commented Jun 1, 2022

Block comments do not nest in C++ or Java.

For SQL it depends on the variant: Oracle does not support nested comments but other vendors may allow them.

https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/fundamentals.htm#BEIJCJED

Sorry, I didn't add that I meant Microsoft SQL

@nyamatongwe nyamatongwe added the mssql Caused by the mssql lexer label Jun 1, 2022
@rdipardo
Copy link
Contributor

rdipardo commented Jun 3, 2022

It would be reasonable to add comment nesting to LexMSSQL.

I would say its lack is a bug. Nested block comments appear to be a permanent feature of the specification, unlike the optional semi-colon after statements (the root cause of #85), which was deprecated a few editions ago.

It may also be OK as an option to LexSQL as downstream projects often only support it.

Experience has shown that "multi-lexers" are more likely to be the source of future bug reports.
JavaScript template strings remain lacking because SCE_C_STRINGRAW has historic priority, and LexJSON has the annoying habit of switching styles on any string containing a single colon, a common idiom in NPM project files that incidentally resembles a compact IRI in JSON-LD.

Once you make an exception for this or that particular specification, where do you stop?

@nyamatongwe
Copy link
Member

Experience has shown that "multi-lexers" are more likely to be the source of future bug reports.

There are many SQL variants around and it's unlikely there will be specific lexers for most. Where there are dialect specific lexers, then new features and bug fixes are less likely to propagate than with shared lexers. Providing options for features like nested comments also makes it possible to specify a profile that may, for example, highlight portability or versioning concerns.

Once you make an exception for this or that particular specification, where do you stop?

The project should be able to follow language changes and extensions.

@rdipardo
Copy link
Contributor

rdipardo commented Jun 5, 2022

Since block comment nesting generally occurs over multiple lines, there needs to be a counter for the nesting level.

Along those lines, here's a draft implementation of nested T-SQL block comments, using line state:

0001-LexMSSQL-Implement-nested-stream-comments.diff.txt

These test files provide a quick proof of concept:

mssql.zip

An omnibus lexer test has to wait for a patch to an unrelated problem with SCE_MSSQL_COLUMN_NAME_2 (and likely SCE_MSSQL_COLUMN_NAME as well), discovered by TestLexers apropos this minimal script:

USE [test]
GO
~/../Issue87.tsql:1: has different per-line styles

The first line should resume SCE_MSSQL_DEFAULT before the newline character:

{6}USE{0} {16}[test]{0}
{6}GO{15}

To acheive that, however, this branch takes Unix EOLs for granted:

lexilla/lexers/LexMSSQL.cxx

Lines 267 to 271 in 55a507f

} else if (state == SCE_MSSQL_COLUMN_NAME_2) {
if (ch == ']') {
styler.ColourTo(i, state);
prevState = state;
state = SCE_MSSQL_DEFAULT_PREF_DATATYPE;

In CRLF mode, LexAcessor::ColourTo receives an insufficient range, consuming up to the \r only, so the line ends in SCE_MSSQL_DEFAULT_PREF_DATATYPE:

{6}USE{0} {16}[test]{15}
{6}GO{15}

SCE_MSSQL_DEFAULT_PREF_DATATYPE-CRLF-mode

@nyamatongwe nyamatongwe changed the title Nested block of comments in languages allowing block comments isn't highlighted correctly Nested block comments handled incorrectly for Transact-SQL (Microsoft SQL Server) Jun 6, 2022
@nyamatongwe
Copy link
Member

SCE_MSSQL_DEFAULT_PREF_DATATYPE appears to be the same as SCE_MSSQL_DEFAULT but preferences data types over other keywords for the next element. CHAR, for example, is both a data type and a string function.

DECLARE @find AS CHAR(30);
SELECT p.LastName + CHAR(13) FROM Person.Person p; 

SCE_MSSQL_DEFAULT_PREF_DATATYPE depends on the range being styled: typing on the line before the column name can switch the line end from SCE_MSSQL_DEFAULT to SCE_MSSQL_DEFAULT_PREF_DATATYPE.

@nyamatongwe
Copy link
Member

Here is a sample file that exercises most mssql styles along with some properties to show the styles.
Sample.zip

nyamatongwe added a commit that referenced this issue Jun 10, 2022
@nyamatongwe nyamatongwe added the committed Issue fixed in repository but not in release label Jun 10, 2022
@nyamatongwe
Copy link
Member

Committed the above 0001-LexMSSQL-Implement-nested-stream-comments.diff.txt and examples. Calling styler.LineStart for each character is a little expensive but not unreasonably so.

Since the change fixes this issue it gets marked committed and will be closed with the next release. That hides the SCE_MSSQL_DEFAULT_PREF_DATATYPE issue but that can have its own issue,

@rdipardo
Copy link
Contributor

Calling styler.LineStart for each character is a little expensive but not unreasonably so.

The author left no instance of StyleContext for me to use, and creating one just to call ::MatchLineEnd() seemed wasteful (to me, at least). Since the call to LexAccessor::LineStart takes its argument unchanged from ::GetLine with no side effects, maybe hoisting it to function scope would be less expensive?

The continuation of this issue is at #90, so a resolution, if any, can be made there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
committed Issue fixed in repository but not in release mssql Caused by the mssql lexer
Projects
None yet
Development

No branches or pull requests

3 participants