support new PHP features (flexible heredoc/nowdoc syntaxes, an underscore inside numbers) #19
Comments
Broke this up into two commits: one for the digit separator and one for heredoc/nowdoc. Added example file test/examples/hypertext/Issue19.php. You should check that the results in test/examples/hypertext/Issue19.php.styled (with style numbers in brackets) are exactly what is intended. The patches were credited to "ivan-u7n". If you would prefer a different name, then please reply with your preferred name. |
Thanks for a swift response! Everything in test/examples/hypertext/Issue19.php.styled looks as it should. My last name is in Cyrillic and has no true form in Latin script, so either Ivan Ustûžanin or Ivan U7n will do. The latter is in case the diacritics aren't possible. |
The credits are reasonably Unicode clean - LexillaHistory.html and SciTEHistory.html are UTF-8 and browsers are well behaved with UTF-8. For the SciTE about box, it's also UTF-8 but any character with the 8th bit set is represented in the source code with an octal escape so it doesn't get mangled by the compiler: "Ivan Ust\303\273\305\276anin", which then shows correctly. 'Иван Устюжанин' (or similar) should also survive. Where it breaks down is trying to represent names in git or other source code control systems and all the source code control clients. Sometimes it works and sometimes the bits fall on the floor, broken. It's understandable that after seeing this people just go with something ASCII that can be recognized. |
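To illustrate the octal-escape form mentioned above, here is a minimal Python sketch that turns a name into the body of a C-style string literal in which every UTF-8 byte with the 8th bit set is written as a three-digit octal escape. The function name `c_octal_escaped` is hypothetical and is not part of SciTE or Lexilla; the sketch also does not escape quotes or backslashes, which a real tool would need to handle.

```python
def c_octal_escaped(text: str) -> str:
    """Return text with each UTF-8 byte >= 0x80 written as a 3-digit octal escape.

    Hypothetical helper for illustration only; ASCII bytes are copied unchanged,
    so quotes and backslashes in the input are NOT escaped.
    """
    parts = []
    for byte in text.encode("utf-8"):
        if byte >= 0x80:
            # 8th bit set: emit a backslash followed by three octal digits
            parts.append(f"\\{byte:03o}")
        else:
            parts.append(chr(byte))
    return "".join(parts)
```

Applied to "Ivan Ustûžanin", this should give the same escaped form quoted in the comment above, and decoding those bytes as UTF-8 gives the name back.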
Yes, my name in Cyrillic is “Иван Устюжанин”. But I'm fine with the current credits, for not many people know how to read Cyrillic. The issue can now be closed. BTW, I have an additional trivial fix for annotations in PHP 8 not being highlighted as comments: lines 2287 and 2411 should be changed from <?php
#[
MultiLineAnnotation('string', 1, null)
]
#[SingleLineAnnotation('string', 1, null)] which should not contain any line comments. |
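The rule behind this fix can be sketched as follows. This is only an illustration in Python, not the actual change to the lexer source (which is not shown in this thread): in PHP 8 a `#` followed by `[` opens an attribute, so it should not be treated as the start of a line comment. The function name `starts_line_comment` is hypothetical.

```python
def starts_line_comment(text: str, pos: int) -> bool:
    """Return True if a PHP line comment begins at text[pos].

    Hypothetical helper: '//' always starts a line comment, while '#'
    starts one unless it is immediately followed by '[' (a PHP 8 attribute).
    """
    if text.startswith("//", pos):
        return True
    if text.startswith("#", pos):
        # "#[" opens an attribute in PHP 8, not a comment
        return not text.startswith("#[", pos)
    return False
```

With this check, both the multi-line `#[` and the single-line `#[SingleLineAnnotation(...)]` forms in the example fall outside the comment style, while an ordinary `# note` is still a comment.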
Committed annotation change. Issues are normally closed when the fix goes in a release. |
Are you interested in extending the PHP part of the lexer with more styles for things like annotations and syntax errors, and in increasing the number of keyword lists for tokens with special meaning? I mean the implementation is on me, I'm just asking whether it will be accepted and is something worth doing, or whether a basic lexer like the current one will do. Although support for nesting lexers would be even greater, I don't think I can tackle that one yet. Edit: My first thought was to remake the categorization of digits to be in accordance with the language grammar, and in the case of invalid tokens it'll be much better to highlight them somehow. I had an idea to just style them with the default style, but if we can detect them, why not style them accordingly. |
There are only 9 available styles in the 0-127 range and there is a potential for incompatibility with styles > 127. Other aspects of HTML (like JavaScript with template literals) may need these more than PHP. Adding more styles should only be done where there is strong justification. The current "hypertext" lexer is difficult to work with so should be replaced with a more modular design. Client-side and server-side languages should be extracted into their own modules that are then combined with the markup language code.
This doesn't seem to me to be worth another style. Using the default style would be visible enough. |
I agree that the "HTML" lexer is rather complicated and needs an overhaul. However, do you have in mind compile-time or run-time combining? Nevertheless, I've prepared the update to PHP's numeric literals: php-numbers.patch.txt. It's greedier than PHP's own lexer: it doesn't stop on the first invalid character, but goes on. It was made this way on purpose: to visually show, by applying the default style, the invalid “numeric words” which will result in parser errors. The test for this patch (“+” marks valid syntax that should be styled as a number, “-” marks invalid syntax that should get the default style):
123456; // +
123_456; // +
1234z6; // -
123456_; // -
123__456; // -
0x89Ab; // +
0x89_aB; // +
0x89zB; // -
0x89AB_; // -
0x_89AB; // -
0_x89AB; // -
0x89__AB; // -
1234.; // +
1234.e-0; // +
1234e+0; // +
1234e0; // +
1234.e-; // -
1234e+; // -
1234.-e; // -
1234+e; // -
1234e; // -
.1234; // +
.12e0; // +
.12.0e0; // -
.12e0.0; // -
.12e0e0; // -
1.234e-10; // +
1.2_34e-1_0; // +
1.234e-_10; // -
1.234e_-10; // -
1.234_e-10; // -
1._234e-10; // -
1_.234e-10; // -
01234567; // +
0_1234567; // +
012345678; // -
0...0; // + |
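The classification in the test list above can be sketched as a whole-word check. This is a minimal Python sketch, not the contents of php-numbers.patch.txt: it assumes the "numeric word" has already been collected greedily and only decides whether the whole word is a valid PHP literal. The names (`is_valid_php_number`, `LNUM`, `DNUM`) are illustrative, the binary form `0b...` is included as an assumption since the list does not cover it, and `0...0` is out of scope because it is two literals (`0.` and `.0`) joined by the concatenation operator rather than a single word.

```python
import re

# Digit runs separated by single underscores, e.g. 123_456
LNUM = r"[0-9]+(?:_[0-9]+)*"
# Floats with a dot: .12, 1234., 1.2_34
DNUM = rf"(?:(?:{LNUM})?\.{LNUM}|{LNUM}\.(?:{LNUM})?)"
# Floats with an exponent: 1234e0, 1234.e-0, .12e0
EXPONENT = rf"(?:{LNUM}|{DNUM})[eE][+-]?{LNUM}"

INT_RE = re.compile(LNUM)
FLOAT_RE = re.compile(rf"(?:{EXPONENT}|{DNUM})")
HEX_RE = re.compile(r"0[xX][0-9a-fA-F]+(?:_[0-9a-fA-F]+)*")
BIN_RE = re.compile(r"0[bB][01]+(?:_[01]+)*")  # assumption: not in the test list


def is_valid_php_number(word: str) -> bool:
    """Return True if the whole word is a valid PHP numeric literal."""
    if HEX_RE.fullmatch(word) or BIN_RE.fullmatch(word):
        return True
    if INT_RE.fullmatch(word):
        digits = word.replace("_", "")
        # A multi-digit integer with a leading zero is octal: only 0-7 allowed
        if len(digits) > 1 and digits[0] == "0":
            return all(c in "01234567" for c in digits)
        return True
    return FLOAT_RE.fullmatch(word) is not None
```

A lexer using this approach would style the word as a number when the function returns True and leave it in the default style otherwise, which matches the "+" and "-" markers in the list.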
Moved numeric literals into a new issue #20 so this issue can be closed. |
As written at https://sourceforge.net/p/scintilla/feature-requests/1378/ "This is a reasonable addition but it will require someone to provide an implementation." The someone in question turned out to be me.
The patch for the issue is below. A more detailed explanation, a downloadable file, and a test PHP file are available in a comment on the similar Notepad++ issue.