Syntax highlighting fails when next to certain operators #19

Closed
hackerb9 opened this issue Oct 24, 2022 · 8 comments · Fixed by #22

@hackerb9
Contributor

hackerb9 commented Oct 24, 2022

In BASIC, some of the characters Emacs treats as being valid within a "symbol" are actually operators and should delimit keywords. For example, only IF and THEN are syntax highlighted in the following:

IF A$<>CHR$(65) THEN Z$=CHR$(X+SGN(Y)*RND(Z))

In particular, I've noticed the problem in the following operators:

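| Symbol | Name | ASCII value |
|--------|------|-------------|
| `*` | Asterisk | 42 |
| `+` | Plus | 43 |
| `,` | Comma | 44 |
| `-` | Minus | 45 |
| `.` | Period | 46 |
| `/` | Slash | 47 |
| `<` | Less than | 60 |
| `=` | Equals | 61 |
| `>` | Greater than | 62 |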
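
One way to confirm how a stock Emacs classifies these characters is to ask the standard syntax table directly. This is only an illustrative sketch; the helper name `my-demo-default-classes` is made up for this example and is not part of basic-mode.

```elisp
(defun my-demo-default-classes (chars)
  "Return an alist mapping each character in the string CHARS to its syntax class.
Keys and values are one-character strings.  The class is looked up in
the standard syntax table, not in any mode-specific table."
  (with-syntax-table (standard-syntax-table)
    (mapcar (lambda (c) (cons (string c) (string (char-syntax c))))
            (string-to-list chars))))

;; Hypothetical usage: inspect the nine operators listed above.
(my-demo-default-classes "*+,-./<=>")
```

In the result, `_` stands for "symbol constituent" and `.` for "punctuation". In an unmodified Emacs, only comma and period should come back as punctuation.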

I believe the following change, which sets them all to "Punctuation", will fix the problem and not cause more bugs.

(defvar basic-mode-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry (cons ?* ?/)   ".   " table)   ; Operators * + , - . /
    (modify-syntax-entry (cons ?< ?>)   ".   " table)   ; Operators < = >
    (modify-syntax-entry ?_   "w   " table)             ; Underscore is valid in variable names in some BASIC dialects
    (modify-syntax-entry ?.   "w   " table)             ; xxx Is period ever allowed in variable names?  xxx
    (modify-syntax-entry ?'   "<   " table)             ; Comment starts with '
    (modify-syntax-entry ?\n  ">   " table)             ; Comment ends with newline
    (modify-syntax-entry ?\^m ">   " table)             ;                or carriage return
    table)
  "Syntax table used while in `basic-mode'.")
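
To see why this change should help, here is a small sketch that searches for a keyword using symbol boundaries (`\_<` and `\_>`), once with the standard syntax table and once with a table that marks the operators as punctuation. It assumes the mode's keyword matcher relies on symbol boundaries, which I have not verified here. The names `my-demo-operator-table` and `my-demo-keyword-found-p` are hypothetical.

```elisp
(defvar my-demo-operator-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry (cons ?* ?/) "." table)   ; * + , - . / as punctuation
    (modify-syntax-entry (cons ?< ?>) "." table)   ; < = > as punctuation
    table)
  "Hypothetical syntax table with only the operator changes applied.")

(defun my-demo-keyword-found-p (keyword line &optional table)
  "Return t if KEYWORD occurs in LINE as a complete symbol.
Symbol boundaries are decided by TABLE, or by the standard syntax
table when TABLE is nil."
  (with-temp-buffer
    (insert line)
    (goto-char (point-min))
    (with-syntax-table (or table (standard-syntax-table))
      (let ((case-fold-search nil))
        (and (re-search-forward
              (concat "\\_<" (regexp-quote keyword) "\\_>") nil t)
             t)))))

;; With the standard table, "A$<>CHR$" is one long symbol, so CHR$ is not found.
(my-demo-keyword-found-p "CHR$" "IF A$<>CHR$(65) THEN")
;; With the operators as punctuation, CHR$ starts its own symbol.
(my-demo-keyword-found-p "CHR$" "IF A$<>CHR$(65) THEN" my-demo-operator-table)
```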

Just for reference, my understanding from the Emacs manual is that in a programming language the syntax table characters have the following meanings:

| Class Name | Character | Description | Examples |
|------------|-----------|-------------|----------|
| Whitespace | Space | Characters that separate “symbols” (variable names and keywords) | Tab, Space |
| Word constituents | `w` | The characters allowed in symbols | A to Z, digits |
| Symbol constituents | `_` | Extra characters allowed in symbols, but that aren't parts of a word | C allows underscore |
| Punctuation characters | `.` | Operators that separate symbols | `+`, `-`, `*`, `/` |

(There are many more classes, but I think those are the ones relevant here.)
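
As a quick check of how these designator characters map to classes, `string-to-syntax` converts a descriptor string into Emacs's internal form, whose `car` is a numeric class code. The wrapper name `my-demo-class-code` is made up for this sketch.

```elisp
(defun my-demo-class-code (descriptor)
  "Return the numeric syntax class code that the string DESCRIPTOR denotes."
  (car (string-to-syntax descriptor)))

;; Whitespace, word, symbol, and punctuation, in the order of the table above.
(mapcar #'my-demo-class-code '(" " "w" "_" "."))
```

A padded descriptor such as `".   "`, as used in the proposed table, denotes the same class as `"."`; the extra spaces only fill the unused matching-character and flag positions.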


“Symbol” is an old term for a variable name or keyword. In modern parlance, it'd be called an “identifier.”

Comma and period are already marked as punctuation in the default syntax table; I included them only so that I can set the entire range from 42 to 47 in the solution. Also note that in BASIC mode, the syntax for period was already being changed to "word". I don't know why, as I am not familiar with any BASIC dialect in which a period can be used in the name of a variable or command. It looks like period is used in Visual BASIC as a dot operator to access structures. As in C, that should be classed with punctuation, not word characters. Or was this change made so that Emacs would treat a long floating-point number as a single word?

@dykstrom
Owner

QuickBasic allows periods (.) in identifiers. I don't think there are any reserved words including periods though.

@hackerb9
Contributor Author

Oh, how interesting about QuickBasic!

At some point would you be open to the idea of derived modes for specific BASIC dialects? Mainly it'd be to provide quick presets to change the set of keywords, but it could have different syntax tables as well.

@dykstrom
Owner

Yes, I'm open to the idea, but I'm not sure I can implement it myself. There was a similar suggestion before (#10) but it has not (yet) led to any implementation.

@hackerb9
Contributor Author

> Yes, I'm open to the idea, but I'm not sure I can implement it myself. There was a similar suggestion before (#10) but it has not (yet) led to any implementation.

I have created a new issue (#20) to describe a possible implementation, with a working example.

@dykstrom self-assigned this Nov 2, 2022
@hackerb9
Contributor Author

hackerb9 commented Nov 2, 2022

> QuickBasic allows periods (.) in identifiers. I don't think there are any reserved words including periods though.

It looks like Microsoft may have allowed decimal points in variable names as far back as GW-BASIC. http://www.antonis.de/qbebooks/gwbasman/chapter%206.html:

> GW-BASIC variable names may be any length; up to 40 characters are significant. The characters allowed in a variable name are letters, numbers, and the decimal point.
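
For a dialect with that rule, the syntax class of the period decides where Emacs thinks a name ends. The following sketch (the helper name `my-demo-leading-word` and the sample variable `TOTAL.COUNT` are invented for illustration) extracts the run of word constituents at the start of a string under either classification.

```elisp
(defun my-demo-leading-word (text period-class)
  "Return the run of word constituents at the start of TEXT.
PERIOD-CLASS is the syntax descriptor to give the period,
for example \"w\" (word) or \".\" (punctuation)."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?. period-class table)
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      (with-syntax-table table
        (when (looking-at "\\sw+")
          (match-string 0))))))

(my-demo-leading-word "TOTAL.COUNT=1" "w")   ; period is part of the name
(my-demo-leading-word "TOTAL.COUNT=1" ".")   ; period ends the name
```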

However, they didn't always support it very well. The BASIC compiler (BC.EXE) for QuickBasic 4.0 and 4.5 had problems with dots in variable names, which meant the "Make .EXE" button sometimes failed to compile an executable. Microsoft's solution? “To work around the problem, do not use a period (.) in a variable name…”

@hackerb9
Contributor Author

hackerb9 commented Nov 3, 2022

It would appear that allowing periods in identifiers goes way back. Here's Microsoft's BASIC-80 v5.0 reference manual from 1979, which describes how variables can now be up to 40 characters long and include a decimal point:

https://archive.org/details/BASIC-80_v5.0_1979_Microsoft/page/n13/mode/1up

@dykstrom
Owner

dykstrom commented Nov 3, 2022

Periods in identifiers are an interesting case. They feel very strange if you come from a language such as C or Java. For now, I will leave period as a word constituent. When implementing #20, I think I will make it punctuation by default, and derived modes that allow periods in identifiers can change that.
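
A rough sketch of that arrangement, using standalone modes with made-up names (`my-demo-generic-basic-mode`, `my-demo-quickbasic-mode`) rather than the real basic-mode or whatever #20 ends up implementing: the parent treats period as punctuation, and the dialect's table inherits from the parent's and overrides only the period.

```elisp
;; Hypothetical parent: operators, including period, are punctuation.
(defvar my-demo-generic-basic-mode-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry (cons ?* ?/) "." table)
    (modify-syntax-entry (cons ?< ?>) "." table)
    table)
  "Syntax table for the hypothetical `my-demo-generic-basic-mode'.")

(define-derived-mode my-demo-generic-basic-mode prog-mode "Demo-BASIC"
  "Hypothetical generic BASIC mode in which period is punctuation.")

;; Hypothetical dialect: inherit everything, but allow period in names.
(defvar my-demo-quickbasic-mode-syntax-table
  (let ((table (make-syntax-table my-demo-generic-basic-mode-syntax-table)))
    (modify-syntax-entry ?. "w" table)
    table)
  "Syntax table for the hypothetical `my-demo-quickbasic-mode'.")

(define-derived-mode my-demo-quickbasic-mode my-demo-generic-basic-mode "Demo-QB"
  "Hypothetical dialect mode in which period is a word constituent.")
```

Because `define-derived-mode` picks up an existing variable named after the mode with a `-syntax-table` suffix, each table has to be defined before its mode. Only the period entry differs between the two; the other operators stay punctuation in the dialect through inheritance.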

@hackerb9
Contributor Author

hackerb9 commented Nov 3, 2022

That sounds right to me.
