Improve LexJulia #13
To follow the naming convention, "fold.docstring" should be "fold.julia.docstring".
These have been merged.
Closing, as equivalent changes were merged into the 5.0.3 release.
There is a problem with docstring folding.
Result: the folding region for a docstring now extends to the end of the file. This appears to be because the test for the end of the docstring requires
Good catch. I don't remember why I introduced this variable, but removing it seems to work well.
Minimal example:
    a = [2
         for i in randn(10)
         if i > 1]
This code should fold between the two brackets.
How about tracking the nesting level of braces/brackets/parentheses on each line, and giving keywords inside braces/brackets/parentheses a different style?
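Here is a minimal sketch of this suggestion. It assumes a simplified folder that reads plain lines of Julia rather than going through Lexilla's LexAccessor. Bracket depth is carried across lines, and block keywords such as `for` and `if` only open folds outside brackets. As a result, the comprehension in the minimal example folds only between `[` and `]`. The function `FoldLevels` and its keyword set are hypothetical and are not LexJulia's code.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical, simplified folder (not LexJulia's code). Returns the fold level
// at the start of each line, plus the level after the last line.
// Brackets always change the level; block keywords only count outside brackets.
std::vector<int> FoldLevels(const std::vector<std::string> &lines) {
    std::vector<int> levels;
    int level = 0;        // current fold level
    int bracketDepth = 0; // nesting of (, [ and { carried across lines
    for (const std::string &line : lines) {
        levels.push_back(level);
        size_t i = 0;
        while (i < line.size()) {
            const unsigned char ch = static_cast<unsigned char>(line[i]);
            if (ch == '(' || ch == '[' || ch == '{') {
                ++bracketDepth;
                ++level;
                ++i;
            } else if (ch == ')' || ch == ']' || ch == '}') {
                if (bracketDepth > 0) {
                    --bracketDepth;
                    --level;
                }
                ++i;
            } else if (std::isalpha(ch) || ch == '_') {
                const size_t start = i;
                while (i < line.size() &&
                       (std::isalnum(static_cast<unsigned char>(line[i])) || line[i] == '_'))
                    ++i;
                const std::string word = line.substr(start, i - start);
                // Inside brackets, "for"/"if" belong to a comprehension and
                // "end" is an index, so none of them affect folding.
                if (bracketDepth == 0) {
                    if (word == "for" || word == "if" || word == "function" || word == "begin")
                        ++level;
                    else if (word == "end" && level > 0)
                        --level;
                }
            } else {
                ++i; // digits, operators, spaces
            }
        }
    }
    levels.push_back(level);
    return levels;
}
```

A side effect of ignoring keywords inside brackets is that `end` used as an index, as in `a[end]`, no longer closes a fold.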
It's common for lexers and folders to backtrack to a safe, consistent starting point before performing their main loop. Perhaps LexerRaku::Fold can be used as an example: it looks for a line starting with the default style and uses it as the starting point. I'm currently fixing some similar bugs in LexerRaku. If backtracking doesn't look reasonable because it's too expensive or uncertain, LineState (or similar) can be used to leave breadcrumbs marking which lines are good starting points, or to count braces, for example.
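The breadcrumb idea can be sketched with a hypothetical model. The state at the end of each line (here just the bracket depth) is saved per line, much as a Lexilla lexer might store it with LineState. Re-lexing from any line then needs only the saved state of the previous line, so no backtracking is required. `LexFrom` and `DepthAfter` are illustrative names, not Lexilla API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the bracket depth at the end of `line`, given the depth at its start.
static int DepthAfter(const std::string &line, int depth) {
    for (char ch : line) {
        if (ch == '(' || ch == '[' || ch == '{')
            ++depth;
        else if ((ch == ')' || ch == ']' || ch == '}') && depth > 0)
            --depth;
    }
    return depth;
}

// Hypothetical per-line state: lineState[n] holds the bracket depth at the end
// of line n. Lines [startLine, end) are re-lexed starting from the saved state
// of line startLine - 1 (assumed already up to date), with no backtracking.
void LexFrom(const std::vector<std::string> &lines, std::vector<int> &lineState,
             size_t startLine) {
    lineState.resize(lines.size());
    int depth = (startLine > 0) ? lineState[startLine - 1] : 0;
    for (size_t line = startLine; line < lines.size(); ++line) {
        depth = DepthAfter(lines[line], depth);
        lineState[line] = depth;
    }
}
```

After an edit on line 1, calling `LexFrom(lines, state, 1)` should give the same states as re-lexing from line 0, which is the property the line-by-line test checks.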
This bug was found by a new automatic test in TestLexers. If lexing/folding the whole file matches the known-good files, the file is then lexed and folded line by line, and any difference from the whole-file result is reported. This test found bugs in 5 folders and 1 lexer. This test is temporarily disabled for Julia by the
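The idea behind this check can be sketched in isolation; this is not the TestLexers code. A toy lexer tracks whether each line ends inside a `"""` docstring. Lexing the whole text in one call and lexing it one line per call must give the same states. A lexer that resets its state in a local variable at the start of every call, instead of resuming from the state saved for the previous line, passes the whole-file pass but fails the line-by-line one. All names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy lexer step (hypothetical): each """ toggles whether we are inside a docstring.
static bool ToggleDocstrings(const std::string &line, bool inDocstring) {
    for (size_t pos = line.find("\"\"\""); pos != std::string::npos;
         pos = line.find("\"\"\"", pos + 3))
        inDocstring = !inDocstring;
    return inDocstring;
}

// Lexes lines [start, end) and records whether each line ends inside a docstring.
// With resumeFromSavedState == false, the start state is a fresh local value,
// which is only correct when lexing begins at line 0.
void Lex(const std::vector<std::string> &lines, size_t start, size_t end,
         std::vector<bool> &states, bool resumeFromSavedState) {
    bool inDoc = (resumeFromSavedState && start > 0) ? static_cast<bool>(states[start - 1]) : false;
    for (size_t line = start; line < end; ++line) {
        inDoc = ToggleDocstrings(lines[line], inDoc);
        states[line] = inDoc;
    }
}

// Returns the first line whose state differs between a whole-file pass and a
// line-by-line pass, or -1 if the two passes agree.
int FirstMismatch(const std::vector<std::string> &lines, bool resumeFromSavedState) {
    std::vector<bool> whole(lines.size()), incremental(lines.size());
    Lex(lines, 0, lines.size(), whole, resumeFromSavedState);
    for (size_t line = 0; line < lines.size(); ++line)
        Lex(lines, line, line + 1, incremental, resumeFromSavedState);
    for (size_t line = 0; line < lines.size(); ++line)
        if (whole[line] != incremental[line])
            return static_cast<int>(line);
    return -1;
}
```

For a docstring spanning lines 0 to 2, the resuming lexer reports no mismatch, while the resetting one first differs at line 1.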
I believe the current Julia lexer has bugs for
I just pushed two commits:
Regarding 2-word keywords, as @zufuliu noticed, do you know whether this has been implemented for another language?
There are other languages that have keyword pairs (e.g.
Lexilla aims at binary compatibility between releases. One aspect of that is that lexical style numbers should remain stable. This change set adds a new style, SCE_JULIA_KEYWORD4, at 6 and then moves SCE_JULIA_CHAR and all styles from SCE_JULIA_STRING onward to new numbers.
Instead, SCE_JULIA_KEYWORD4 should be added as 20. This allows current applications to work with a new Lexilla.DLL or .so.
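To illustrate why inserting a style breaks compatibility, here is a sketch with a hypothetical list of 20 placeholder style names numbered 0 to 19. These are not the real SciLexer.h names. The sketch counts how many existing numbers would refer to a different style after the change, which is exactly what an application built against the old header would get wrong.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts how many of the old style numbers now name a different style.
// An application compiled against the old numbering misreads each of these.
int CountRenumbered(const std::vector<std::string> &oldStyles,
                    const std::vector<std::string> &newStyles) {
    int changed = 0;
    for (size_t n = 0; n < oldStyles.size(); ++n)
        if (n >= newStyles.size() || newStyles[n] != oldStyles[n])
            ++changed;
    return changed;
}

// Hypothetical placeholder list of 20 existing styles numbered 0..19.
std::vector<std::string> OldStyles() {
    std::vector<std::string> styles;
    for (int n = 0; n < 20; ++n)
        styles.push_back("STYLE_" + std::to_string(n));
    return styles;
}
```

Inserting the new style at 6 renumbers the 14 styles from 6 to 19. Appending it as 20 renumbers none.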
Handling two-word keywords completely is more work, particularly when they are split by a line end. Some lexers handle this by retaining the previous word and hard-coding some sequences: see LexFortran around line 642 for "type is". The character currently used for abbreviations or abridgements in word lists is "~", so this would be written "mutable~struct".
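Here is a sketch of the "retain the previous word" approach for Julia's two-word keywords (`mutable struct`, `abstract type`, `primitive type`). It assumes the text has already been split into words. It is modelled on the description of LexFortran above rather than copied from it, and the function names are hypothetical. In a real lexer, the first word has already been styled by the time the second is seen, so it would have to be restyled or its styling delayed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hard-coded keyword pairs (hypothetical helper).
static bool IsKeywordPair(const std::string &first, const std::string &second) {
    return (first == "mutable" && second == "struct") ||
           (first == "abstract" && second == "type") ||
           (first == "primitive" && second == "type");
}

// Returns one flag per word: true if the word belongs to a two-word keyword.
std::vector<bool> MarkKeywordPairs(const std::vector<std::string> &words) {
    std::vector<bool> isPairKeyword(words.size(), false);
    std::string previous; // the retained previous word
    for (size_t i = 0; i < words.size(); ++i) {
        if (i > 0 && IsKeywordPair(previous, words[i])) {
            isPairKeyword[i - 1] = true; // restyle the retained first word
            isPairKeyword[i] = true;
        }
        previous = words[i];
    }
    return isPairKeyword;
}
```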
There's an unused function that should be removed:
../lexers/LexJulia.cxx:518:13: warning: 'bool IsTripleStringState(Lexilla::LexAccessor&, Sci_Position)' defined but not used [-Wunused-function]
518 | static bool IsTripleStringState(LexAccessor &styler, Sci_Position i) {
| ^~~~~~~~~~~~~~~~~~~
Is this a recommendation or is it mandatory?
I'm actually wondering whether it's worth the effort. In the Geany thread, somebody commented that it is an acceptable bug :)
I regard it as important. How would the change to LexJulia be communicated to downstream projects? I suppose a second Julia lexer could be added with a new name so there would be no possibility of mismatches.
It will be the third release with LexJulia. The first was 5.0.3 and 5.1.0 was just released.
It appears that the Julia code for keywords is just choosing from sets of lexemes that look like identifiers. A better mechanism for expandable keyword sets is 'substyles', as implemented in LexCPP and LexPython, starting with AllocateSubStyles, which allocates style numbers dynamically. Distinct keyword lists are needed more when sets of keywords are also lexically distinct, like comment documentation keywords, or where there are sublanguages, like SQL embedded in C or JavaScript in HTML. Naming a word list "Raw string literals" was also a bit strange, since the code matches identifiers.
From the application's point of view, substyles are documented at https://www.scintilla.org/ScintillaDoc.html#Substyles . From the lexer's perspective, the ILexer4 methods from AllocateSubStyles to GetSubStyleBases may need to be implemented. The SubStyles class provides a basic implementation of the required methods, as can be seen in LexPython. 'Secondary' styles are probably not needed: they allow greying out inactive code for C/C++.
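The substyle idea can be modelled without Lexilla to show the flow. The application asks for a block of style numbers derived from a base style (identifiers here) and gives each allocated style its own word list. The lexer then styles matching identifiers with that substyle instead of the base style. `IdentifierSubStyles` and the style numbers 9 and 128 are hypothetical. The real interface is the set of ILexer4 substyle methods, and LexPython uses Lexilla's SubStyles class for this.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Toy model of substyles for one base style (not Lexilla's SubStyles class).
class IdentifierSubStyles {
public:
    IdentifierSubStyles(int baseStyle, int firstSubStyle)
        : base(baseStyle), start(firstSubStyle) {}

    // Allocates `count` consecutive style numbers and returns the first one.
    int Allocate(int count) {
        wordSets.assign(static_cast<size_t>(count), std::set<std::string>());
        return start;
    }

    // Assigns a space-separated word list to an allocated style.
    void SetIdentifiers(int style, const std::string &list) {
        std::set<std::string> &words = wordSets.at(static_cast<size_t>(style - start));
        words.clear();
        std::istringstream in(list);
        for (std::string word; in >> word;)
            words.insert(word);
    }

    // Style for an identifier: the first substyle whose list contains it, else the base style.
    int StyleFor(const std::string &identifier) const {
        for (size_t i = 0; i < wordSets.size(); ++i)
            if (wordSets[i].count(identifier))
                return start + static_cast<int>(i);
        return base;
    }

private:
    int base;
    int start;
    std::vector<std::set<std::string>> wordSets;
};
```

Because the word lists belong to allocated styles rather than to fixed style numbers, adding another keyword set needs no change to the lexer's style numbering.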
I ended up not using substyles. I didn't understand how they work, so I just added KEYWORD4 at the end of the style list.
The change also moved the "Raw string literal prefixes" (previously named "Raw string literals") from
This ordering is a bit better, because the string prefixes are not identifiers, whereas the other lists are. But I understand the compatibility issue.
The tags in LexicalClass definitions are more likely to be useful when they are shared as they enable applications to define appearance and behaviour based on particular tags. For example, an application may define a spell-check command that only looks inside styles with tag "comment". Look in other lexers for the tags they use and only invent new tags when there is no good existing value. The best examples are LexCPP, LexPython, and LexHTML.
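Here is a sketch of the spell-check example. It assumes the application already has the space-separated tag string for each style, and the tag values in the usage are hypothetical. Matching whole tags rather than substrings keeps "comment" from matching an unrelated tag.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Returns true if the space-separated tag list contains `tag` as a whole word.
bool HasTag(const std::string &tags, const std::string &tag) {
    std::istringstream in(tags);
    for (std::string word; in >> word;)
        if (word == tag)
            return true;
    return false;
}

// Hypothetical application logic: only spell-check styles tagged "comment".
bool ShouldSpellCheck(const std::map<int, std::string> &tagsOfStyle, int style) {
    const auto it = tagsOfStyle.find(style);
    return it != tagsOfStyle.end() && HasTag(it->second, "comment");
}
```

Because the check keys on a shared tag rather than on Julia-specific style numbers, the same command would work for any lexer that uses the "comment" tag.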
For 18, SCE_JULIA_TYPEANNOT, it is used in
For 15 and 17, SCE_JULIA_STRINGLITERAL and SCE_JULIA_COMMANDLITERAL, it is used in string literals like
I just realized that the raw string prefixes were not needed, as all prefixed strings are treated as raw strings.
As an example of tag usage there is a screen shot of an unfinished "search in tag" option for SciTE at the end.
Tags were seen as constant, so they aren't such a good fit where a property changes the meaning. If it's likely that either choice will be made, use "operator type". If one choice is likely to predominate, just use its tag.
Most lexers include any prefix text as part of the string, so "literal string" would likely be most similar to other lexers.
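Here is a sketch of treating the prefix as part of the string token. It assumes that a prefixed string is an identifier immediately followed by a double quote, as in `raw"..."` or `r"..."`. The function name is hypothetical, and escapes, interpolation and triple quotes are deliberately ignored.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical sketch: if an identifier at `pos` is immediately followed by '"',
// the prefix and the quoted text form one string-literal token.
// Returns the token length, or 0 when there is no prefixed string at `pos`.
size_t PrefixedStringLength(const std::string &text, size_t pos) {
    if (pos >= text.size() ||
        !(std::isalpha(static_cast<unsigned char>(text[pos])) || text[pos] == '_'))
        return 0;
    size_t i = pos;
    while (i < text.size() &&
           (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
        ++i;
    if (i >= text.size() || text[i] != '"')
        return 0;
    // Escapes and triple-quoted strings are ignored in this sketch.
    const size_t close = text.find('"', i + 1);
    const size_t end = (close == std::string::npos) ? text.size() : close + 1;
    return end - pos;
}
```

A lexer would then give the whole returned span, prefix included, the string-literal style.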
I actually wrote the lexer just to have syntax colorization, but I now understand that this is more powerful. So I added another style 8(). Thanks!
Standardise tags in lexical class information to follow other lexers where possible.
The changes have been merged on top of another change, 44f6ac8, which added the include of <functional> needed for a change to OptionSet. I also amended the changes to reference this pull request and to remove an unused variable. You should synchronize to the current repository state before making more changes so that they will apply.
Thanks, I think this can be closed.
From https://sourceforge.net/p/scintilla/feature-requests/1380/
lexer.julia.
fold.julia.
The fact that the end of line of a single-line comment is not styled as a comment is a bug: the whole line should be styled as a comment. But I haven't managed to make it work. The problem is likely at line 970, with the condition
if (sc.atLineEnd || sc.ch == '\r' || sc.ch == '\n')
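Here is a simplified model of the usual fix; it is not Lexilla's StyleContext. If the comment state is left as soon as the line-end character is reached, `\r` and `\n` receive the default style. Leaving it only when the next line starts, in the style of LexCPP's `sc.atLineStart` check for line comments, keeps the line end inside the comment. `StyleComments` and the two style values are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum Style { Default = 0, Comment = 1 };

// Simplified per-character styler for '#' line comments.
// endAtLineStart == false leaves the comment on the '\r' or '\n' character,
// so the line end gets the default style (the reported bug).
// endAtLineStart == true leaves it only at the first character of the next line.
std::vector<int> StyleComments(const std::string &text, bool endAtLineStart) {
    std::vector<int> styles(text.size(), Default);
    int state = Default;
    for (size_t i = 0; i < text.size(); ++i) {
        const char ch = text[i];
        const bool atLineStart = (i == 0) || text[i - 1] == '\n';
        if (state == Comment) {
            if (endAtLineStart ? atLineStart : (ch == '\r' || ch == '\n'))
                state = Default;
        }
        if (state == Default && ch == '#')
            state = Comment;
        styles[i] = state;
    }
    return styles;
}
```

With the line-start transition, both characters of a `\r\n` line end keep the comment style, because neither is preceded by `\n`.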