SIG-11 when running an M program that had a PATNOTFOUND error at compile time #90

Closed
nars1 opened this issue Nov 14, 2017 · 0 comments
nars1 commented Nov 14, 2017

Final Release Note

YottaDB correctly runs M programs which had PATNOTFOUND errors at compile time. Previously, in r1.10 it was possible for mumps processes to terminate abnormally with a SIG-11 as a consequence of a defect in the GT.M V6.3-002 code base. (YDB#90)

Description

The below M program correctly issues a PATNOTFOUND error when compiling. But if one tries to run the compiled object code (which should be okay since the execution does not reach the portions of the M code where the compiler error was found), a SIG-11 is observed. This happens only with GT.M V63002 (and in turn YottaDB r1.10) but not with V63001A (and in turn YottaDB r1.00).

> cat test.m
main    ;
        do good
        quit
bad     ;
        if 1?1B
        quit
good    ;
        write "hello",!
        quit

> $gtm_dist/mumps test.m
                if 1?1B
                       ^-----
                At column 9, line 5, source module test.m
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code B

> $gtm_dist/mumps -run test
%GTM-F-KILLBYSIGSINFO1, GT.M process 10860 has been killed by a signal 11 at address 0x00007FE84FE8000F (vaddr 0x00007FE84FE8000F)
%GTM-F-SIGMAPERR, Signal was caused by an address not mapped to an object

With the fix to the above issue, the M program correctly prints the "hello" string.

> $gtm_dist/mumps -run test
hello

Draft Release Note

YottaDB correctly runs M programs which had PATNOTFOUND errors at compile time. In previous versions (GT.M V6.3-002 and YottaDB r1.10), it was possible for the mumps process to terminate abnormally with a SIG-11.

@nars1 nars1 self-assigned this Nov 14, 2017
@nars1 nars1 changed the title SIG-11 at runtime when running an M program that had a PATNOTFOUND error at compile time SIG-11 when running an M program that had a PATNOTFOUND error at compile time Nov 14, 2017
nars1 added a commit to nars1/YottaDB that referenced this issue Nov 14, 2017
…me errors as they could cause SIG-11 YottaDB#90

A few issues related to compile-time errors were discovered.

1) The below M program correctly issues a PATNOTFOUND error when compiling. But if one tries to run the compiled object code (which should be okay since the execution does not reach the portions of the M code where the compiler error was found), a SIG-11 is observed. This happens only with GT.M V63002 (and in turn YottaDB r1.10) but not with V63001A (and in turn YottaDB r1.00).

```
> cat test.m
main    ;
        do good
        quit
bad     ;
        if 1?1B
        quit
good    ;
        write "hello",!
        quit
```

Related to the above, the below M program test1.m produces a GTMASSERT when run with a debug build. Unlike the previous test case (test.m), the production build did not have problems with test1.m.

```
> cat test1.m
        if 1?1B

> $gtm_dist/mumps -run test1
%GTM-F-GTMASSERT, GT.M V6.3-002 Linux x86_64 - Assert failed /Distrib/GT.M/V63002/sr_port/chktchain.c line 28
```

The primary issue in both of the above tests was in bx_boollit(). It noticed a pattern match operator whose operands were both literals and hence invoked do_patfixed(), which encountered a PATNOTFOUND error. That caused ins_errtriple() to be invoked, which in turn removed all triples corresponding to the current M line (dqdelchain() call) and returned to bx_boollit(). Unaware of this, bx_boollit() went ahead with manipulating the triple chains (dqrins() call etc.) and returned to its caller bool_expr(), which also did triple chain manipulation (dqdel() call etc.), all the while operating on triples that were no longer part of the execution chain (due to the prior dqdelchain() call). This corrupted the doubly-linked triple list in "t_orig", which resulted in incorrect object code being generated and later surfaced as the SIG-11 when one tried running this M program.

In GT.M V63002, boolean expression evaluation and literal optimization got a significant rework. As part of that change, the macros RETURN_IF_RTS_ERROR and RETURN_EXPR_IF_RTS_ERROR were introduced to check for compile-time errors and if so return from functions right away instead of manipulating triple chains. These safety checks needed to be added in a few more places. That fixed the primary issue.
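The failure mode and the guard described above can be illustrated with a minimal C sketch. This is not the GT.M source: the `node` type, the `chain_*` helpers, the `compile_error_seen` flag and the `RETURN_IF_COMPILE_ERROR` macro are all hypothetical stand-ins, loosely modeled on the idea of a triple chain and of RETURN_IF_RTS_ERROR. It only shows why touching nodes that were already unlinked corrupts a doubly-linked chain, and how an early return avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiler triple on a circular doubly-linked chain. */
typedef struct node {
    struct node *fl;   /* forward link */
    struct node *bl;   /* backward link */
    int op;
} node;

/* Hypothetical stand-in for "a compile-time error was reported". */
static bool compile_error_seen = false;

/* Hypothetical guard: bail out of a void function once an error was reported. */
#define RETURN_IF_COMPILE_ERROR do { if (compile_error_seen) return; } while (0)

static void chain_init(node *head) { head->fl = head->bl = head; }

static void chain_append(node *head, node *n)
{
    n->bl = head->bl; n->fl = head;
    head->bl->fl = n; head->bl = n;
}

/* Unlink one node using the node's own links. */
static void chain_delete(node *n) { n->bl->fl = n->fl; n->fl->bl = n->bl; }

/* Unlink the contiguous run first..last (inclusive) from its chain. */
static void chain_delete_range(node *first, node *last)
{
    first->bl->fl = last->fl;
    last->fl->bl = first->bl;
}

/* Return the number of nodes on the chain, or -1 if any back link disagrees. */
static int chain_check(const node *head)
{
    int count = 0;
    for (const node *n = head->fl; n != head; n = n->fl) {
        if (n->fl->bl != n || ++count > 1000)   /* 1000: runaway-loop guard */
            return -1;
    }
    return (head->fl->bl == head) ? count : -1;
}

/* Error path: record the error and drop all nodes of the current line. */
static void report_error_and_drop_line(node *first, node *last)
{
    compile_error_seen = true;
    chain_delete_range(first, last);
}

/* Keeps editing the chain after the error, through stale links. */
static void fold_unguarded(node *first, node *last)
{
    report_error_and_drop_line(first, last);
    chain_delete(last);           /* 'last' is already off the chain */
}

/* Same logic, but returns as soon as an error has been reported. */
static void fold_guarded(node *first, node *last)
{
    report_error_and_drop_line(first, last);
    RETURN_IF_COMPILE_ERROR;
    chain_delete(last);
}

/* Build head <-> x <-> y <-> z, fail while processing x..y, then check the chain. */
int demo(bool guarded)
{
    node head, x = {NULL, NULL, 1}, y = {NULL, NULL, 2}, z = {NULL, NULL, 3};
    compile_error_seen = false;
    chain_init(&head);
    chain_append(&head, &x); chain_append(&head, &y); chain_append(&head, &z);
    assert(chain_check(&head) == 3);
    if (guarded) fold_guarded(&x, &y); else fold_unguarded(&x, &y);
    return chain_check(&head);
}
```

In this sketch `demo(true)` leaves a consistent one-node chain, while `demo(false)` leaves the surviving node with a back link pointing at a node that is no longer on the chain, which `chain_check()` reports as -1.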

2) In addition, it was noticed that the following M program fails an assert when run with the debug build.

```
> cat test2.m
        xecute "if ""a""?1B"

> mumps -run test2
%GTM-F-ASSERT, Assert failed in /Distrib/GT.M/V63002/sr_port/zlcompile.c line 81 for expression ((FALSE == run_time) && (TRUE == TREF(compile_time)))
```

Below is the corresponding C-stack.

```
 #0  0x00007ff2e6988767 in kill () at ../sysdeps/unix/syscall-template.S:84
 #1  0x00007ff2e6014a5c in gtm_dump_core () at /Distrib/GT.M/V63002/sr_unix/gtm_dump_core.c:69
 #2  0x00007ff2e5f1de97 in gtm_fork_n_core () at /Distrib/GT.M/V63002/sr_unix/gtm_fork_n_core.c:211
 #3  0x00007ff2e6007f2b in ch_cond_core () at /Distrib/GT.M/V63002/sr_unix/ch_cond_core.c:59
 #4  0x00007ff2e5f443a2 in rts_error_va (csa=0x0, argcnt=7, var=0x7ffffc4b0a90) at /Distrib/GT.M/V63002/sr_unix/rts_error.c:153
 #5  0x00007ff2e5f439b8 in rts_error_csa (csa=0x0, argcnt=7) at /Distrib/GT.M/V63002/sr_unix/rts_error.c:85
 #6  0x00007ff2e636d64c in zlcompile (len=48 '0', addr=0x7ffffc4b0e30 "/extra1/testarea1/nars/test/temp/tmp/tmp/test2.m") at /Distrib/GT.M/V63002/sr_port/zlcompile.c:81
 #7  0x00007ff2e60e6f1c in op_zlink (v=0x7ffffc4b14a0, quals=0x7ffffc4b0cf0) at /Distrib/GT.M/V63002/sr_unix/op_zlink.c:443
 #8  0x00007ff2e5f6a2d7 in job_addr (rtn=0x7ffffc4b1590, label=0x7ffffc4b15a0, offset=0, hdr=0x7ffffc4b1518, labaddr=0x7ffffc4b1510, need_rtnobj_shm_free=0x7ffffc4b14e4) at /Distrib/GT.M/V63002/sr_port/job_addr.c:41
 #9  0x00007ff2e5f40b48 in jobchild_init () at /Distrib/GT.M/V63002/sr_unix/jobchild_init.c:146
 #10 0x00007ff2e5f3835d in gtm_startup (svec=0x7ffffc4b1d30) at /Distrib/GT.M/V63002/sr_unix/gtm_startup.c:252
 #11 0x00007ff2e5f3b2f6 in init_gtm () at /Distrib/GT.M/V63002/sr_unix/init_gtm.c:201
 #12 0x00007ff2e5f072ea in gtm_main (argc=3, argv=0x7ffffc4b4048, envp=0x7ffffc4b4068) at /Distrib/GT.M/V63002/sr_unix/gtm_main.c:162
 #13 0x0000000000400cbe in main (argc=3, argv=0x7ffffc4b4048, envp=0x7ffffc4b4068) at /Distrib/GT.M/V63002/sr_unix/gtm.c:131
```

In this case, run_time was TRUE, which caused the assert failure. This was due to the m_xecute() function (invoked by zlcompile()) temporarily setting run_time to FALSE. When the PATNOTFOUND error was encountered, the condition handler compiler_ch() was invoked and did an UNWIND back to zlcompile(), incorrectly persisting the global variable changes made by the interim m_xecute() call.

The fix for this was to reset the run_time and TREF(xecute_literal_parse) global variables, just as is done in mdb_condition_handler().
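The general pattern behind this second fix can be sketched in plain C with setjmp()/longjmp(). This is a simplified illustration under stated assumptions, not GT.M's condition-handler mechanism: `in_nested_parse`, `nested_parse()` and `compile_once()` are hypothetical names, and the single flag stands in for global state that a nested parse changes temporarily.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical global that a nested parse changes temporarily. */
static volatile bool in_nested_parse = false;

/* Frame of the outer compile, the target of the unwind. */
static jmp_buf outer_frame;

/* Nested parse: sets the flag, and normally restores it before returning. */
static void nested_parse(bool hit_error)
{
    assert(!in_nested_parse);        /* this sketch does not nest recursively */
    in_nested_parse = true;          /* temporary change */
    if (hit_error)
        longjmp(outer_frame, 1);     /* the unwind skips the restore below */
    in_nested_parse = false;         /* normal-path restore */
}

/*
 * Outer compile step. Returns the flag value the caller is left with.
 * If handler_resets is true, the error path restores the global state
 * itself, which is the idea behind the fix.
 */
bool compile_once(bool hit_error, bool handler_resets)
{
    in_nested_parse = false;
    if (setjmp(outer_frame) == 0) {
        nested_parse(hit_error);
    } else {
        /* Error path, playing the role of a condition handler. */
        if (handler_resets)
            in_nested_parse = false;
    }
    return in_nested_parse;
}
```

Here `compile_once(true, false)` returns with the flag still set, because the unwind bypassed the restore in `nested_parse()`, whereas `compile_once(true, true)` returns with the flag cleared because the error path resets it.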
nars1 added a commit to nars1/YottaDBtest that referenced this issue Nov 14, 2017
…or handling (see YottaDB/YDB#90)

See YottaDB/YDB#90 for a description of the issue and corresponding test cases. Those test cases are folded into this automated test.
There are 4 M programs, each included because it shows different failure symptoms with the R110 pro and dbg builds. Below is the output of those test programs with the older code. All of them run correctly with the code fix.

R110 pro build output
---------------------
```
 --> Running test with patnotfound1.m <---
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code B
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code

 --> Running test with patnotfound2.m <---
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code B
%GTM-E-ZLINKFILE, Error while zlinking "/usr/library/gtm_test/T998/r120/inref/patnotfound2.m"
%GTM-E-ZLNOOBJECT, No object module was produced

 --> Running test with patnotfound3.m <---
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code B
%GTM-F-KILLBYSIGSINFO1, GT.M process 14353 has been killed by a signal 11 at address 0x00007F90F6C0E8A3 (vaddr 0x0000000000000000)
%GTM-F-SIGMAPERR, Signal was caused by an address not mapped to an object

 --> Running test with patnotfound4.m <---
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code B
%GTM-F-KILLBYSIGSINFO1, GT.M process 14355 has been killed by a signal 11 at address 0x00007F7CDB8D700F (vaddr 0x00007F7CDB8D700F)
%GTM-F-SIGMAPERR, Signal was caused by an address not mapped to an object
```

R110 dbg build output
---------------------
```
 --> Running test with patnotfound1.m <---
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code B
%GTM-F-GTMASSERT, GT.M V6.3-002 Linux x86_64 - Assert failed /Distrib/GT.M/V63002/sr_port/chktchain.c line 28

 --> Running test with patnotfound2.m <---
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code B
%GTM-F-ASSERT, Assert failed in /Distrib/GT.M/V63002/sr_port/zlcompile.c line 81 for expression ((FALSE == run_time) && (TRUE == TREF(compile_time)))

 --> Running test with patnotfound3.m <---
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code B
%GTM-F-ASSERT, Assert failed in /Distrib/GT.M/V63002/sr_port/code_gen.c line 51 for expression (0 == pending_errtriplecode)

 --> Running test with patnotfound4.m <---
%GTM-E-PATNOTFOUND, Current pattern table has no characters with pattern code B
%GTM-F-GTMASSERT, GT.M V6.3-002 Linux x86_64 - Assert failed /Distrib/GT.M/V63002/sr_port/chktchain.c line 28
```
nars1 added a commit to YottaDB/YDBTest that referenced this issue Nov 15, 2017
…or handling (see YottaDB/YDB#90)

nars1 added a commit that referenced this issue Nov 15, 2017
…me errors as they could cause SIG-11 #90

@nars1 nars1 closed this as completed Nov 15, 2017
@nars1 nars1 added this to To Do in r1.20 via automation Jan 8, 2018
@nars1 nars1 added this to the r120 milestone Jan 8, 2018
@nars1 nars1 moved this from To Do to Done in r1.20 Jan 8, 2018