COBOL syntax and wrap strings #229

Closed
RiaanPretoriusRIP opened this issue Mar 13, 2024 · 13 comments
Labels
cobol Caused by the COBOL lexer committed Issue fixed in repository but not in release

Comments

@RiaanPretoriusRIP

COBOL allows long text literals to be split over two or more lines. Note that this is not related to how Notepad++ wraps lines.

Example

            DISPLAY MESSAGE BOX
              "The following process must be applied to Earnings, Deduct
      -       "ions and Company Contributions separately."

                 STRING "Earning Budget Figures for the Financial Year 
      -          "Starting " CNT-BUD-TAX-YEAR(1:4)
                 DELIMITED BY SIZE  INTO REPTITLE
				 
           MOVE "***************************   T I T L E    P A G E   **
      -         "*************************" TO PRTITLE.		

It uses a “-” sign in column 7 to indicate a continuation. Note that the first line does not end with a quote marking the end of the string. COBOL takes all the text up to and including column 72 and concatenates it with the rest of the string from the following line. There is no rule on where the opening quote on the continuation line should be. A string can also extend to a third line or more, although that is not very common, for example:

LP61A            DISPLAY MESSAGE BOX 
lp61b            "S*** strives to continually develop and improve its pr
LP61B -          "oducts and services to deliver more value to our custo
LP61B -          "mers." 
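To make the continuation rule concrete, here is a minimal C++ sketch that reassembles a continued literal from fixed-format lines. It assumes the standard reference format (sequence area in columns 1-6, indicator in column 7, text in columns 8-72), takes the first quote on each line as the start of that line's segment, pads short lines with spaces up to column 72, and does not handle doubled quotes ("") inside a literal. `TextArea` and `JoinContinuedLiteral` are hypothetical names, not part of Scintilla or any COBOL toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the program-text area, columns 8-72
// (0-based indices 7..71), padded with spaces to its full 65 characters,
// since columns missing from a short fixed-format line count as spaces.
std::string TextArea(const std::string& line) {
    std::string area = line.size() > 7 ? line.substr(7, 65) : std::string();
    area.resize(65, ' ');
    return area;
}

// Hypothetical helper: rebuilds a "..." literal that opens on lines[0] and
// may be continued on following lines that have '-' in column 7.
// Simplifications: the first quote on each line is taken as the start of
// the segment, and doubled quotes ("") inside the literal are not handled.
std::string JoinContinuedLiteral(const std::vector<std::string>& lines) {
    assert(!lines.empty());
    std::string literal;
    for (std::size_t n = 0; n < lines.size(); ++n) {
        // Only continuation lines may carry the literal further.
        if (n > 0 && (lines[n].size() < 7 || lines[n][6] != '-'))
            break;
        const std::string area = TextArea(lines[n]);
        const std::size_t open = area.find('"');
        if (open == std::string::npos)
            break;
        const std::size_t close = area.find('"', open + 1);
        if (close != std::string::npos) {
            // Closing quote found: the literal ends on this line.
            literal += area.substr(open + 1, close - open - 1);
            return literal;
        }
        // No closing quote: everything through column 72 belongs to the literal.
        literal += area.substr(open + 1);
    }
    return literal;
}
```

Fed the three LP61 lines above, and assuming each unterminated segment really ends at column 72, this returns the single string "S*** strives to continually develop and improve its products and services to deliver more value to our customers.". Padding short lines matters: trailing spaces before column 72 become part of the literal.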

In COBOL, columns 1 to 6 hold line (sequence) numbers or can be used to mark changes; they are ignored at compile time.
Notepad++ gets confused by the continuation and shows all the following lines as part of the string.
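The column layout described here can be pictured with a small C++ sketch that splits a fixed-format line into its areas and classifies the column 7 indicator. It follows standard fixed-format COBOL; the meanings of '/' and 'D' come from the standard rather than from this thread, dialects may differ, and the names `FixedFormatLine`, `SplitFixedFormatLine` and `ClassifyIndicator` are hypothetical.

```cpp
#include <string>

// Sketch of the fixed-format columns (1-based):
// 1-6 sequence area, 7 indicator area, 8-72 program text.
struct FixedFormatLine {
    std::string sequence;   // columns 1-6: line numbers or change marks, ignored by the compiler
    char indicator = ' ';   // column 7
    std::string text;       // columns 8-72
};

FixedFormatLine SplitFixedFormatLine(const std::string& line) {
    FixedFormatLine out;
    out.sequence = line.substr(0, 6);
    if (line.size() > 6)
        out.indicator = line[6];
    if (line.size() > 7)
        out.text = line.substr(7, 65);
    return out;
}

enum class IndicatorKind { Normal, Comment, Continuation, Debug };

// What the column 7 indicator means in standard fixed format.
IndicatorKind ClassifyIndicator(char indicator) {
    switch (indicator) {
    case '*':
    case '/':  // comment line that also requests a page eject in listings
        return IndicatorKind::Comment;
    case '-':
        return IndicatorKind::Continuation;
    case 'D':
    case 'd':  // debugging line (lowercase accepted by many compilers)
        return IndicatorKind::Debug;
    default:
        return IndicatorKind::Normal;
    }
}
```

For the line `LP61B -          "oducts ...`, the sequence area is "LP61B " and the indicator '-' marks a continuation, so a lexer has to carry the string state over from the previous line rather than from anything in columns 1-6.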

@nyamatongwe nyamatongwe added the cobol Caused by the COBOL lexer label Mar 13, 2024
@nyamatongwe
Member

Notepad++ gets confused by the continuation and shows all the following lines as part of the string.

That doesn't happen for me. Both Notepad++ and SciTE show line 3, with its line number and initial ", as part of the multi-line string, but the text that follows is not.

@RiaanPretoriusRIP
Author

RiaanPretoriusRIP commented Mar 13, 2024

I'm not sure your display is right. "AND" and "TO" are keywords and are shown in blue, but in this context they are part of the text string and should be the same color as the first (previous) line: that whole line should be purple. The last line is again treated as a text value.

@mpheath
Contributor

mpheath commented Mar 15, 2024

I'm not a COBOL programmer, though some searching gave me some hints.

IMO, for this issue, end the string at the newline so the style does not continue into the next line. The next, continued line will then be OK.
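A toy model of this idea, assuming a simplified scanner with only two styles ('D' for default, 'S' for string); it is not the real LexCOBOL.cxx code. An unterminated string stops just before the line end, so the continuation line is scanned afresh and its opening quote starts a new string segment instead of closing the old one.

```cpp
#include <cstddef>
#include <string>

// Toy scanner (hypothetical, not LexCOBOL.cxx): returns one style character
// per input character, 'D' = default, 'S' = string.
std::string StyleStrings(const std::string& text) {
    std::string styles(text.size(), 'D');
    bool inString = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char ch = text[i];
        if (inString) {
            if (ch == '\r' || ch == '\n') {
                // Like ColourTo(styler, i-1, state): the string ends before
                // the line end, which keeps the default style.
                inString = false;
                continue;
            }
            styles[i] = 'S';
            if (ch == '"')
                inString = false;  // the closing quote is part of the string
        } else if (ch == '"') {
            inString = true;       // opening quote starts a string
            styles[i] = 'S';
        }
    }
    return styles;
}
```

With the input `"ab`, newline, `-  "cd"`, newline, `xy`, the text after the second line stays in the default style, which is the behaviour the patch aims for.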

I looked at some other issues to build a house of cards and see whether the house topples over. Seems OK.

diff --git a/lexers/LexCOBOL.cxx b/lexers/LexCOBOL.cxx
index 39682f55..a93955f0 100644
--- a/lexers/LexCOBOL.cxx
+++ b/lexers/LexCOBOL.cxx
@@ -95,7 +95,7 @@ static int classifyWordCOBOL(Sci_PositionU start, Sci_PositionU end, /*WordList
     s[1] = '\0';
     getRange(start, end, styler, s, sizeof(s));
 
-    char chAttr = SCE_C_IDENTIFIER;
+    int chAttr = SCE_C_IDENTIFIER;
     if (isdigit(s[0]) || (s[0] == '.') || (s[0] == 'v')) {
         chAttr = SCE_C_NUMBER;
 		char *p = s + 1;
@@ -107,7 +107,7 @@ static int classifyWordCOBOL(Sci_PositionU start, Sci_PositionU end, /*WordList
 			++p;
 		}
     }
-    else {
+    if (chAttr == SCE_C_IDENTIFIER) {
         if (a_keywords.InList(s)) {
             chAttr = SCE_C_WORD;
         }
@@ -211,7 +211,7 @@ static void ColouriseCOBOLDoc(Sci_PositionU startPos, Sci_Position length, int i
             if (isCOBOLwordstart(ch) || (ch == '$' && IsASCII(chNext) && isalpha(chNext))) {
                 ColourTo(styler, i-1, state);
                 state = SCE_C_IDENTIFIER;
-            } else if (column == 6 && ch == '*') {
+            } else if (column == 6 && (ch == '*' || ch == '/')) {
             // Cobol comment line: asterisk in column 7.
                 ColourTo(styler, i-1, state);
                 state = SCE_C_COMMENTLINE;
@@ -255,7 +255,9 @@ static void ColouriseCOBOLDoc(Sci_PositionU startPos, Sci_Position length, int i
 
                 state = SCE_C_DEFAULT;
                 chNext = styler.SafeGetCharAt(i + 1);
-                if (ch == '"') {
+                if (column == 6 && (ch == '*' || ch == '/')) {
+                    state = SCE_C_COMMENTLINE;
+                } else if (ch == '"') {
                     state = SCE_C_STRING;
                 } else if (ch == '\'') {
                     state = SCE_C_CHARACTER;
@@ -292,6 +294,9 @@ static void ColouriseCOBOLDoc(Sci_PositionU startPos, Sci_Position length, int i
                 if (ch == '"') {
                     ColourTo(styler, i, state);
                     state = SCE_C_DEFAULT;
+                } else if (ch == '\r' || ch == '\n') {
+                    ColourTo(styler, i-1, state);
+                    state = SCE_C_DEFAULT;
                 }
             } else if (state == SCE_C_CHARACTER) {
                 if (ch == '\'') {
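The second hunk replaces `else` with `if (chAttr == SCE_C_IDENTIFIER)`, which appears to let a word that starts like a number (for example one starting with 'v') but fails the number test still reach the keyword lookup; this matches the "keywords starting with v not being styled" problem mentioned later. The C++ toy below illustrates that control flow only. It is not the real `classifyWordCOBOL`: the enum, the function name and the number test (digits, '.' and 'v', as in a picture string like "9v99") are assumptions, because the diff does not show the body of the real character loop.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

enum class WordStyle { Identifier, Number, Keyword };

// Toy model of the patched classification flow (hypothetical, not the real code).
WordStyle ClassifyWord(const std::string& word, const std::set<std::string>& keywords) {
    assert(!word.empty());
    WordStyle style = WordStyle::Identifier;
    const unsigned char first = static_cast<unsigned char>(word[0]);
    if (std::isdigit(first) || word[0] == '.' || word[0] == 'v') {
        style = WordStyle::Number;
        for (std::size_t k = 1; k < word.size(); ++k) {
            const unsigned char c = static_cast<unsigned char>(word[k]);
            if (!std::isdigit(c) && c != '.' && c != 'v') {
                style = WordStyle::Identifier;  // not a number after all
                break;
            }
        }
    }
    // Patched behaviour: test the style instead of using "else", so a word
    // such as "value" that failed the number test still gets the keyword check.
    if (style == WordStyle::Identifier && keywords.count(word) > 0)
        style = WordStyle::Keyword;
    return style;
}
```

With the old `else`, "value" would have been left as an identifier because its first letter sent it down the number branch.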


I could split the fixes up into individual fixes if preferred, provided the code is regarded as OK, as I'm not experienced with COBOL.

Folding is turned off in SciTE.properties because testlexers.exe would crash (Exception Code: 40000015) on the COBOL test file, with either the original or the patched lexer. I don't know why; maybe the fold code has an issue, or the GCC compiler is at fault. Folding works in the compiled Sc1.

Just noticed: the test file should state that the comment string style should not continue to the next line. :(

gh229.zip

nyamatongwe added a commit that referenced this issue Mar 15, 2024
@nyamatongwe
Member

It makes future maintenance easier when the purpose and scope of each fix can be seen in separate commits.

I added a basic test file with a86f236.

@mpheath
Contributor

mpheath commented Mar 16, 2024

I added a basic test file with a86f236.

@nyamatongwe thanks, I have pulled down the changes.

I will break the fixes up so they can be committed separately, starting with this issue. I will create an issue for keywords starting with v not being styled, and handle the issue at SourceForge about the comment styling bug so it can be closed. If you want me to open a duplicate issue on GitHub for the last open SourceForge issue, let me know.

I am not aware of any other COBOL lexer issues, so hopefully all will be good. Oh, except for the folding crash in TestLexers.exe, which I still can't explain.

Fix for only this issue:
229.zip

@nyamatongwe
Member

I am not aware of any other COBOL lexer issues

I found another one with SCE_C_COMMENTDOC in the tests, where it's inconsistent at line ends, so I dropped that test. It's just a matter of styling to i-1, but I didn't commit that as it may have interfered with your changes.

@mpheath
Contributor

mpheath commented Mar 16, 2024

I found another one with SCE_C_COMMENTDOC in the tests, where it's inconsistent at line ends, so I dropped that test. It's just a matter of styling to i-1, but I didn't commit that as it may have interfered with your changes.

I have done nothing with SCE_C_COMMENTDOC, so it may have no impact on the source updates. I never found any information on comment doc and how it works, so it is either a custom dialect or copied from another lexer, though it only applies to the first column, which seems intriguing. It could be fixed last, with the tests updated as needed, I guess. Comments are the last in the series of fixes, so they could be updated then.

I see it:

{3}** comment
{0}
{11}abc{0}

The CR is before {0} and the LF after it. I'll look into it some more. I checked with TestLexers.exe: adding i-1 should be OK, as it has no effect on the tests here.

The line I tested changing to i-1:

ColourTo(styler, i, state);

The ColourTo above that line also uses just i, without the -1. The comment seems fine in the styled test files, though.
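The i versus i-1 question can be reproduced with a toy styler in C++. `ToyStyler` and `StyleCommentLine` are hypothetical stand-ins, not Scintilla's accessor API; the only assumption carried over is that ColourTo(end, style) styles every not-yet-styled position up to and including `end`. Ending a line comment at i, where i is the CR, gives the CR the comment style and the LF the next style, which is the CR/LF split visible in the test output above; ending at i-1 keeps both line-end characters in the same style.

```cpp
#include <cstddef>
#include <string>

// Toy stand-in for the lexer's styling helper (hypothetical): ColourTo(end, style)
// styles every position not yet styled, up to and including `end`.
struct ToyStyler {
    std::string styles;     // one style character per document character
    std::size_t next = 0;   // first position not yet styled
    void ColourTo(long long end, char style) {
        while (next < styles.size() && static_cast<long long>(next) <= end)
            styles[next++] = style;
    }
};

// Styles a line comment ('C') that ends at the first CR or LF; everything
// after it gets the default style ('D'). `inclusive` selects ColourTo(i)
// versus ColourTo(i - 1) when the line end is reached.
std::string StyleCommentLine(const std::string& text, bool inclusive) {
    ToyStyler styler;
    styler.styles.assign(text.size(), '?');
    const long long last = static_cast<long long>(text.size()) - 1;
    bool closed = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' || text[i] == '\n') {
            const long long pos = static_cast<long long>(i);
            styler.ColourTo(inclusive ? pos : pos - 1, 'C');
            closed = true;
            break;
        }
    }
    if (!closed)
        styler.ColourTo(last, 'C');  // no line end: the comment runs to the end
    styler.ColourTo(last, 'D');
    return styler.styles;
}
```

For "**x\r\nab", the inclusive variant yields "CCCCDDD" (CR styled as comment, LF as default) while the i-1 variant yields "CCCDDDD".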

@nyamatongwe nyamatongwe added the committed Issue fixed in repository but not in release label Mar 16, 2024
@nyamatongwe
Member

Committed the changes.

and how it works so is a custom dialect or copied from another lexer

There are several indications of a dialect, such as a '?' in column 1 indicating a preprocessor directive. This lexer arrived with lexers for the TAL and TACL languages associated with Tandem computers.
https://sourceforge.net/p/scintilla/feature-requests/531/

@mpheath
Contributor

mpheath commented Mar 17, 2024

There are several indications of a dialect, such as a '?' in column 1 indicating a preprocessor directive. This lexer arrived with lexers for the TAL and TACL languages associated with Tandem computers. https://sourceforge.net/p/scintilla/feature-requests/531/

I cannot validate '?' in column 1 either. It could only occur in free format, which has no line continuation, so a line starting with a symbol would have to be special by design. So it may do no harm even if it is not valid.

My previous comment ended with:

The ColourTo above that line also uses just i, without the -1. The comment seems fine in the styled test files, though.

Correction: I should have specified that SCE_C_COMMENTLINE seems fine.

The ColourTo above that I referred to is in the SCE_C_COMMENT branch, and SCE_C_COMMENT is never assigned to state or used anywhere else, so that branch could be considered dead code. So there is probably no need to worry about i-1 there as well.

} else if (state == SCE_C_COMMENT) {

@nyamatongwe
Member

Added 2a90020 to style both \r and \n the same for SCE_C_COMMENTDOC and SCE_C_COMMENT even though SCE_C_COMMENT is unreachable. It may have been meant for one of the SCE_C_COMMENTLINE states.

Also fixed up indentation to use only spaces with b729628, and fixed some Cppcheck "scope can be reduced" warnings with 4af9e3e.

@RiaanPretoriusRIP
Author

Sorry, I found another issue with comments in COBOL; I only realized it last week.

RP62b *    IF XREPNR = 768 
EI53a *       IF COUNTRY = "R"
EI53a *          CHAIN "VIP768.ACU" USING XAREA
EI53a *       END-IF
EI53a *    END-IF.
RP62b *    IF XREPNR = 769 
PIR53A*       IF COUNTRY = "R"
PIR53A*          CHAIN "VIP769.ACU" USING XAREA
PIR53A*    END-IF.

An asterisk in column 7 indicates that the whole line is a comment. If there is something in columns 1 to 5, the line is treated correctly. However, when column 6 also contains text, the line is not treated as a comment.
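The rule described here depends only on character position, not on what precedes it. A minimal C++ sketch of a position-based check, assuming standard fixed format and treating '/' like '*' as in the patch earlier in this thread; the function name is hypothetical and this is not the change made for #231.

```cpp
#include <string>

// Hypothetical helper: a fixed-format line is a comment when column 7
// (index 6) holds '*' or '/'. Whatever the sequence area in columns 1-6
// contains, including text in column 6 such as "PIR53A", does not matter.
bool IsFixedFormatCommentLine(const std::string& line) {
    return line.size() > 6 && (line[6] == '*' || line[6] == '/');
}
```

A lexer scanning character by character can reach the same result by checking the column before it tries to continue an identifier that started in the sequence area.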

@nyamatongwe
Member

Sorry, I found another issue with comments

See #231.

@RiaanPretoriusRIP
Author

RiaanPretoriusRIP commented Apr 15, 2024

Thank you. #229 is tested and works as expected.
