Skip to content

How to Create Tests Using Use Case Engine

Andrei Temnikov edited this page Mar 24, 2022 · 9 revisions

The test-driven development approach assumes that every code change is preceded by creating a test. Testing the grammar might be quite tricky, especially if it integrates with wrapping programming logic. In order to simplify testing and ensure maintainability of tests, COBOL Language Support contains a special markup language to provide use cases for the language engine.

How It Works

The COBOL specification contains many rules for definition and usage of semantic elements, and errors display for typos or misuse of variables. We therefore test for the following features:

  • Diagnostics
  • Definitions
  • Usage

If something is required but is not found or does not match the expectations, then the test fails. For example, IDENTIFICATION DIVISIONs. should produce a syntax error because “DIVISIONs” is not a valid keyword. If no such error displays, then the language engine is not functioning correctly. Furthermore, the test should also check the exact error message and severity level. Manual calculation of the positions for diagnostics and variables is boring and error-prone, so an automated solution would be preferable.

Here we come up with an idea of how to build proper use cases - create an example COBOL program that contains a statement or a case that should be tested, mark up all the variables, paragraphs, copybooks, erroring tokens etc. and then pass the text, diagnostics, and copybooks to the language engine and compare the expected and observed results.

The current implementation of this idea resides in the package org.eclipse.lsp.cobol.usecases. The entry point is the UseCaseEngine class, which contains several methods that allow us to run use case tests with different configurations. This class applies syntax and semantic analysis for COBOL texts using the current Language Engine.

For test purposes, it is required to mark all the semantic elements (i.e. variables, constants, paragraphs, sections, and copybooks) in the text. If there are tokens that produce syntax errors, they should be marked too. The affected token should be wrapped with curly brackets {} with special tags. Diagnostic IDs for the elements are optional. Without special tags, a token inside curly brackets is considered a diagnostic of a syntax error.

Tags:

  • $ - Variable
  • # - Paragraph
  • @ - Section
  • ~ - Copybook
  • * - Definition
  • | - Diagnostic
  • ^ - Replacement that calculates the position of the element after the replacement
  • ` - Replacement that calculates the position of the element before the replacement
  • & - Constant (predefined variable)
  • % - Subroutine
  • {_ _} - Multi-token error

By default, tags are treated as an element usage. With '*' specified the occurrence is treated as a definition. For example, {$*VAR1} is a definition for variable VAR1, and {#PAR2} is the usage of paragraph PAR2. Copybook definitions are optional, by default they point to the beginning of the copybook. Predefined variable definitions are implicit, so they are present in the analysis result by default. It means that if you have a {&EIBCALEN} predefined variable usage in the text, you don't need to define in the DATA DIVISION, they are provided by the language engine. The full list of those variables you may find in the org.eclipse.lsp.cobol.core.semantics.PredefinedVariableContext class.

All tags are removed during processing, so "MOVE 00 TO {$CHILD|id}." is treated as "MOVE 00 TO CHILD." with positions according to the resulting text.

After the analysis is finished, the AnalysisResult of the current Language Engine is compared with the expected one. The order of elements in a list does not matter, the output shows the difference. For example, the given output:

Expected: [A, B, C]
Observed: [A, B]

means that the text example declared more variables/paragraphs/whatever than there were actually found by the language engine. If you are sure that the test is correct, it means that some semantic definition or usage was lost during the analysis. The same for the positions of usages and definitions - if they do not point to the token you expected, then it implies a problem with logic.

Creating New Tests

The simple test looks like this:

/** This class checks that a 77-level variable without PIC specified produces a semantic error. */
class Test77WithoutPICShowsError {
 private static final String TEXT =
     "       IDENTIFICATION DIVISION.\n"
         + "       PROGRAM-ID. TEST1.\n"
         + "       DATA DIVISION.\n"
         + "       WORKING-STORAGE SECTION.\n"
         + "       {_COPY {~DEF77}.|1_}\n"
         + "       PROCEDURE DIVISION.\n"
         + "           DISPLAY {$VARNAME}.";

 private static final String DEF77 = "       77 {$*VARNAME|1}.\n";

 @Test
 void test() {
   UseCaseEngine.runTest(
       TEXT,
       ImmutableList.of(new CobolText("DEF77", DEF77)),
       ImmutableMap.of(
           "1",
           new Diagnostic(
               null,
               "A \"PICTURE\" clause was not found for elementary item VARNAME",
               DiagnosticSeverity.Error,
               ERROR.getText())));
 }
}

Start the test with a descriptive JavaDoc comment. Describe the purpose of this test, e.g. checking that some code produces an error, the variable usage is found, etc. If you doubt that the text you want to test is obvious, you can clarify it in this comment.

Ensure that the class name starts with “Test” and contains a short description of what is to be tested, e.g. “TestErrorShownOnStatement…”.

We recommend that you store the text to test in a static final field. Remember, it must be a parsable COBOL program, containing a sequence area (six characters long), indicator area (one character), with the entire length not exceeding 80 characters. The text inside quotes must finish with line breaks (\n), else those lines will be treated as one long line. In order to make it more readable, each actual code line may start on a new line with concatenation.

This type of test is expected to have only one test method (one marked with @Test annotation). It should be package-private, as well as the test itself, so as not to produce a visibility code smell.

Here you can see a copybook named “DEF77”. It contains a variable definition that produces an error. All the errors inside copybooks should traverse through the copybook hierarchy to the COBOL file. The entire COPY statement, including the COPY keyword, the period in the end, and REPLACING, if present, are underlined with the same error. You can put the IDs of the nested error afterward.

The only logic you need inside the method is calling the UseCaseEngine.runTest and passing the test data to it. It will be described further in detail. Currently, there are three options to run the analysis:

  • runTest(String text, List<CobolText> copybooks, Map<String, Diagnostic> expectedDiagnostics) - the default method. If you do not test any specific features related to subroutines or processing modes, you can use this one.

  • runTest(String text, List<CobolText> copybooks, Map<String, Diagnostic> expectedDiagnostics, List<String> subroutineNames) - the same as previous, but allows specifying subroutine names. By default the list is empty.

  • runTest(String text, List<CobolText> copybooks, Map<String, Diagnostic> expectedDiagnostics, List<String> subroutineNames, CopybookProcessingMode processingMode) - the same as previously mentioned, also allows specifying the processing mode. This might be useful when you test features related to copybook processing. For details see the Javadoc for org.eclipse.lsp.cobol.service.CopybookProcessingMode

Processing mode specifies how the copybooks are processed during the analysis. For example, ENABLED means applying analysis to the copybooks and automatic downloading without asking the user. Please, refer to the org.eclipse.lsp.cobol.service.CopybookProcessingMode for the details.

The default value is ENABLED. You should not change it unless you are testing the special cases of copybook processing.

Diagnostics

A syntax error can be defined as follows: {DIVISIONs|1}. The map of expected tokens should contain '1' as a key, and a diagnostic with the message "Syntax error on 'DIVISIONs' expected DIVISION". The position of the diagnostic should be null.

Diagnostics is optional for semantic elements. To specify a semantic error, add '|', and the ID of the expected diagnostic at this position: {$CHILD|invalidDefinition}. In this case, the map of expected tokens should contain a pair "invalidDefintion" -> new Diagnostic(null, "Invalid definition for: CHILD", DiagnosticSeverity.ERROR, SourceInfoLevel.ERROR.getText(). See below the explanation.

The diagnostic ID can contain any text except whitespace or the chars from the tags list. It can be a descriptive name, e.g. “invalidDefintion” or just a number to show the order. You can reuse the same ID several times if there are errors with the same severity and message at different positions. Also, one token may have several IDs split by the | character, e.g. {$ERRORVAR|1|2|3}, where 1, 2, and 3 represent different errors on the same token. It may happen, for example, if the defined variable represents a group item without nested elementary items and its level number is higher than 01 when there is no 01 variable before it:

...
       WORKING-STORAGE SECTION.
       05 {$*SOMEVAR|1|2}.
       PROCEDURE DIVISION.
...

The UseCaseEngine uses the org.eclipse.lsp4j.Diagnostic class for syntax and semantic errors. The range is extracted automatically while the preprocessing, so it should always be null. If it is not, the value will be ignored.

The error message is a String instance that contains the expected error message. It should match the error message exactly, including the entire list of the suggestions. For example:

"Syntax error on 'COPY' expected {<EOF>, ID, IDENTIFICATION, LINKAGE, LOCAL-STORAGE, WORKING-STORAGE, PROCEDURE, "
   + "SCHEMA, END, FILE, MAP, SQL, 'EXEC SQL', EXEC, '01-49', '66', '77', '88'}"

Split lines that are too long into several lines that can be concatenated. The recommended maximum line length is 120 characters. If the error message contains a line break, the one that you provide should contain ‘\n’ or `\r\n’ (there is no difference from Java's point of view in this case).

The majority of diagnostics have the default severity level “ERROR”. There is a special class org.eclipse.lsp4j.DiagnosticSeverity that contains all the possible values.

The Diagnostic class constructor also expects text describing its source. Especially for this we have the org.eclipse.lsp.cobol.service.delegates.validation.SourceInfoLevels class. The source info level should match the severity level, i.e. if the severity level is DiagnosticSeverity.ERROR, the source should be SourceInfoLevel.ERROR. Do not pass the object itself, but call the getText() method of the required enum element.

The typical diagnostics look like this:

ImmutableMap.of(
   "1",
   new Diagnostic(
       null,
       "A \"PICTURE\" clause was not found for elementary item VARNAME",
       DiagnosticSeverity.Error,
       SourceInfoLevel.ERROR.getText()))

Here ImmutableMap.of is just a way to instantly create a map; “1” is a diagnostic ID mentioned in the text; null represents a position range that will be calculated automatically; "A \"PICTURE\" clause was not found for elementary item VARNAME" is the expected error message (notice that the chars are escaped using \); DiagnosticSeverity.Error is the expected severity; ERROR.getText() is the source info that matches the expected severity level.

Some errors may consist of several tokens. In order to show this, you may use multi-token error declaration: {_COPY {~ABCD}.|id_}. The included semantic elements will be also processed as usual. So, if it is a semantic error, e.g. starting position warning, the variable is found and added to the list, and the warning is shown. But if it is a syntax error, most likely, the variable is not recognized. It means that in this case the $ sign should be skipped.

Example 1:

000000 01 {$*VARNAME|1}.
000001 PROCEDURE DIVISION.

Here VARNAME is a syntactically correct variable that has neither a PIC statement nor nested variables. The variable definition is recognized with a semantic error “PIC clause not found”.

Example 2:

000000 DISPLA {VARNAME|1}

Here DISPLA is the misspelled token DISPLAY. It is recognized as an identifier, so the next token is expected to be SECTION due to the order of the parser rules. Even when the VARNAME is a defined variable, it is not recognized by the language engine because the variable is not expected in the matched rule. In this case, there should be no $ char.

Copybooks and Subroutines

Some code in COBOL can be extracted to a copybook and then be included in the file using the COPY <copybook name>. statement. If you have mentioned a copybook in your code example, then you must also provide it to the Engine or you will receive a "Missing copybook" error.

By default, the definition of the copybook points to the first character of the first line in the file under the copybook URI. This is done to let the “Go to Definition” function open the copybook file. It is determined, so you do not have to specify the definition explicitly - it is calculated by the Engine.

The UseCaseEngine expects a list of org.eclipse.lsp.cobol.positive.CobolText instances that contain the copybook name and its content. They both are strings and might be literals or static final fields.

As mentioned above, if the copybook contains an error, it is shown on the COPY statement. To do this, you need to wrap it with multi-token diagnostics, e.g. {_COPY {~SMTH}.|1_}. Here, “1” is the error ID, “~” is a copybook mark, “{” and “}” show the borders of the error range. The error ID should be the same as for the actual erroring token inside the copybook.

By the specification, the copybook content is pasted inside the program instead of the COPY statement. Also, the content can be replaced using the REPLACING A BY B statement. For the exact definition of the REPLACING statement logic, please refer to the IBM documentation.

The copybook URIs are based on their names and you don’t have to provide them explicitly. Still, there is no way to provide a custom copybook URI, so if you need this to test logic, the UseCaseEngine will not help you with it.

Subroutines are COBOL programs that can be called from the main program. You don’t operate with their content, so you need to provide only the name of the used copybooks. If you use the statement CALL ‘sub’. and don’t provide “sub” as a subroutine name, you will see a diagnostic.

Replacing

To show that the token will be replaced during the pre-processing on the current language engine, you can mark it with the '^' character with the result specified after. For example, {$*:TAG:-ID^CSTOUT-ID} returns ':TAG:-ID' as the text, and CSTOUT-ID as a variable definition. The position is calculated as for the replacement.

The position will be calculated as for the replacement. If you specify {~CPYNM`CPYNM_ABC}, the result will be 'CPYNM' as the text and the 'CPYNM_ABC' as a copybook usage name. The position will be calculated as for the original text.

The replacing length need not be equal to the replaceable token length. It is a known language engine limitation that it extracts the positions for tokens after the replacing, so it might not match the initial code representation.

Predefined Variables

When you reference predefined variables in your test program, e.g. Exec Interface Block (EIB), you cannot mark them with a $ sign. Use @ instead. They also don’t need a definition because it is implicit.

You can find this list in UseCasePreprocessorListener, so don't forget to update it once new predefined variables are added.

Testing Patterns

The easiest way to check the variable definition logic is to call it in the PROCEDURE DIVISION. E.g. MOVE 0 TO {$VARNAME}. allows to check that the variable VARNAME is defined correctly and does not return errors.

If you want to check that the variable is not defined, you can call it like this: DISPLAY {CHILD|invalid} OF PARENT. Here “invalid” is the identifier of the “Invalid definition for CHILD” semantic error. As you can see, there is no “$” mark here, due to this variable not being recognized, as well as “PARENT”.

Testing paragraphs is similar: you can check a definition by doing this:

000000 PROCEDURE DIVISION.
000000 {#*PARNAME}.
000000 PERFORM {#OTHER-PAR|1}.

Due to OTHER-PAR not being defined, it will return an error.

Known Limitations

The intersected ranges cannot be processed. For example, if you have a statement like ONE TWO THREE and two errors, pointing to the ONE TWO and TWO THREE tokens, this cannot be processed by the Use Case Engine. The statement like `{ONE {TWO|1} THREE|2} will be treated as nested errors on “TWO” and “ONE TWO THREE”.

If the error range contains a line break, e.g. an unexpected period at the end of a line, it also cannot be processed. All the ranges should remain on one line, including multi-token ones.

Those cases require manual asserting for the diagnostics using UseCaseUtils class.

Extending the use case engine

If you are working on a new semantic element, you might need to extend the UseCaseEngine. Let’s assume you are working on user-defined divisions. First of all, find a symbol that is not used for other semantic elements, e.g. “!”.

Now you need to add this token to the token list in the UseCasePreprocessor.g4 file. Name it descriptively and add the START token to it to make the magic work.

DIVISIONDEFINITION : START + '!*';
DIVISIONUSAGE : START + '!';

Obviously, the first token represents definition, and the second is a usage. If you don’t need anything from it, feel free to omit it.

Next, you need to add the related parser rules to the same file:

divisionStatement
  : (divisionUsage | divisionDefinition) diagnostic* STOP
  ;

divisionUsage
  : DIVISIONUSAGE word
  ;

divisionDefinition
  : DIVISIONDEFINITION word
  ;

The token name must start with an uppercase letter and the rule name with a lowercase letter.

As the last step with grammar, just reference the divisionStatement rule in the startRule and multiToken rules. Now build the project to proceed.

You also have to extend the org.eclipse.lsp.cobol.usecases.engine.UseCasePreprocessorListener and override the newly generated methods enterDivisionStatement and exitDivisionStatement. Copy the logic from some other rules, e.g. variables, replacing the “Variable” with “Division”. The requirement of duplicating the logic is caused by the nominative Java type system. You will also need to add new maps for definitions and usages, and do the same for the org.eclipse.lsp.cobol.usecases.engine.TestData class to store the extracted values.

The final change in the tests is to assert the analysis result for definitions in the UseCaseEngine#assertResultEquals(). Nothing complicated here, just repeat the logic of other elements with the new message.

Now you can add a text to the test like this:

‘{!*MAGIC} DIVISION.`

Voila, “MAGIC” will be retrieved as a division definition.

Testing with UseCaseUtils

If the UseCaseEngine cannot help you with the test, you can try the org.eclipse.lsp.cobol.service.delegates.validations.UseCaseUtils class. It contains a number of “analyze” methods with similar logic to “runTest” in the UseCaseEngine. The only difference is that the analyzeForErrors method filters out all the syntax errors with the severity below “ERROR”, so you will not see any warnings or info messages in the analysis result.

Those methods also require a document URI as the first argument. You can use the DOCUMENT_URI from this class as a default value, or provide a custom one. It doesn't change anything in the internal logic.

The AnalysisResult is the actual output of the language engine, and it contains all the diagnostics and semantic context. You have to parse it yourself. Good luck.

FAQ

Can I add several use cases to one test file?

It is possible, but not recommended. Tests with several cases are harder to debug and support. The requirement for such cases is that they should be logically connected (e.g. test similar features). You can extract the boilerplate code and concatenate it with the text to test not to repeat.

You can organize it as parameterized tests. See the Junit 5 manual for the details.

In the code it might look like this:

private static Stream<String> textsToTest() {
   return Stream.of(
           BOILERPLATE + IDMSCS_WITH_ALL_CLAUSES, BOILERPLATE + IDMSCS_WITH_ALL_CLAUSES_PUNCT, BOILERPLATE + IDMSCS_WITH_SSN_IDMSREC,
           BOILERPLATE + IDMSCS_WITH_ONLY_PROTOCOL, BOILERPLATE + IDMSCS_WITH_SSN_LEN_ERROR);
}

@ParameterizedTest
@MethodSource("textsToTest")
@DisplayName("Parameterized - varying tests")
void test(String text) {
   UseCaseEngine.runTest(text, ImmutableList.of(), ImmutableMap.of("1",
           new Diagnostic(null, MESSAGE_1, DiagnosticSeverity.Error, SourceInfoLevels.ERROR.getText())));
}

Why isn’t my variable usage found (it produces an error)?

The variable definitions and usages won’t be found if they cannot be parsed. It is expected behavior.

Why do I see area A or B warning?

Sometimes it is hard to predict which way ANTLR will parse the syntactically incorrect statement. It may falsy recognize a rule that requires the area check. If your statement is correct, check that you have a line break \n at the end of this rule.

Why does the build fail after I created a test that passes?

The project is configured to apply the style check on the build, so if your test is not formatted properly, it will fail. Just press Ctrl+Alt+L (Windows) or ⌥⌘L (Mac OS) to apply the auto-formatting.

Why is the ‘public’ keyword highlighted?

The Junit 5 tests should be package-private. Remove all the public declarations for @Test methods and class from your test.

Is it OK if the error message is too long?

Yes, it may happen that a syntax error produces a very long message. It is still required to put the entire message into your test.

When do I need a test?

Whenever you fix a bug or add a new feature.

Can I store the COBOL file outside the test?

No.

Why do incomplete statements show weird syntax errors?

If there is an incomplete statement, e.g. COPY. instead of COPY ABCD., then the parser turns on the lookahead tool that tries to find a matching grammar rule. It may not always succeed, expecially if the expected grammar rule is complicated and has a lot of options. In this case, some tokens may be falsy treated as variable or paragraph identifiers or the beginning of another statement. It may seem like an inconsistent behavior but solve it quite hard - you should provide rules that may match every possible state and throw errors on incorrect ones manually using code actions or semantic predicates. It is not a best practice and is not needed in the majority of cases. Concentrate on processing the correct statements first.