Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failures on Alpine Linux building notcurses 2.2.1 #1344

Closed
PureTryOut opened this issue Feb 10, 2021 · 27 comments
Closed

Test failures on Alpine Linux building notcurses 2.2.1 #1344

PureTryOut opened this issue Feb 10, 2021 · 27 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@PureTryOut
Copy link

PureTryOut commented Feb 10, 2021

https://build.alpinelinux.org/buildlogs/build-edge-x86_64/community/notcurses/notcurses-2.2.1-r0.log

Running tests...
Test project /home/buildozer/aports/community/notcurses/src/notcurses-2.2.1/build
    Start 1: notcurses-tester
1/7 Test #1: notcurses-tester .................***Failed   15.30 sec
    Start 2: ncpp_build
2/7 Test #2: ncpp_build .......................   Passed    0.02 sec
    Start 3: ncpp_build_exceptions
3/7 Test #3: ncpp_build_exceptions ............   Passed    0.02 sec
    Start 4: sgr-full
4/7 Test #4: sgr-full .........................***Exception: Numerical  0.01 sec
    Start 5: sgr-direct
5/7 Test #5: sgr-direct .......................***Failed    0.00 sec
    Start 6: rgb
6/7 Test #6: rgb ..............................   Passed    0.01 sec
    Start 7: rgbbg
7/7 Test #7: rgbbg ............................   Passed    0.02 sec

57% tests passed, 3 tests failed out of 7

Total Test time (real) =  15.50 sec

The following tests FAILED:
	  1 - notcurses-tester (Failed)
	  4 - sgr-full (NUMERICAL)
	  5 - sgr-direct (Failed)
Errors while running CTest

@dankamongmen seems you maintain our Alpine Linux package, could you look into this? It is currently blocking our x86_64 builder 😉

@PureTryOut PureTryOut added the bug Something isn't working label Feb 10, 2021
@dankamongmen
Copy link
Owner

egads! please feel free to disable/clear it for now, and thanks for the heads-up.

@dankamongmen dankamongmen self-assigned this Feb 10, 2021
@dankamongmen dankamongmen added this to the 2.3.0 milestone Feb 10, 2021
@dankamongmen
Copy link
Owner

i've got an Alpine builder in my CI pipeline, but this didn't trigger on my setup. i'm looking into it, thanks and apologies.

@dankamongmen
Copy link
Owner

i've got an Alpine builder in my CI pipeline, but this didn't trigger on my setup. i'm looking into it, thanks and apologies.

you know, i haven't updated it recently -- there may well be a new musl or something. let me check that out.

@dankamongmen
Copy link
Owner

ok, there were a healthy number of updates on my CI, so hopefully this'll flush it out on my side:

/ # apk upgrade
Upgrading critical system libraries and apk-tools:
(1/1) Upgrading apk-tools (2.12.0-r4 -> 2.12.3-r0)
Executing busybox-1.32.1-r0.trigger
Continuing the upgrade transaction with new apk-tools:
(1/50) Upgrading musl (1.2.2_pre7-r0 -> 1.2.2-r1)
(2/50) Upgrading busybox (1.32.1-r0 -> 1.33.0-r2)
Executing busybox-1.33.0-r2.post-upgrade
(3/50) Upgrading alpine-baselayout (3.2.0-r8 -> 3.2.0-r9)
Executing alpine-baselayout-3.2.0-r9.pre-upgrade
Executing alpine-baselayout-3.2.0-r9.post-upgrade
(4/50) Upgrading scanelf (1.2.8-r0 -> 1.2.9-r0)
(5/50) Upgrading ssl_client (1.32.1-r0 -> 1.33.0-r2)
(6/50) Upgrading musl-utils (1.2.2_pre7-r0 -> 1.2.2-r1)
(7/50) Upgrading tar (1.33-r0 -> 1.33-r1)
(8/50) Upgrading libgcc (10.2.1_pre1-r1 -> 10.2.1_pre1-r3)
(9/50) Upgrading libstdc++ (10.2.1_pre1-r1 -> 10.2.1_pre1-r3)
(10/50) Upgrading lzip (1.21-r0 -> 1.22-r0)
(11/50) Upgrading nghttp2-libs (1.42.0-r1 -> 1.43.0-r0)
(12/50) Upgrading libcurl (7.74.0-r0 -> 7.75.0-r0)
(13/50) Upgrading curl (7.74.0-r0 -> 7.75.0-r0)
(14/50) Upgrading abuild (3.7.0-r0 -> 3.7.0-r1)
Executing abuild-3.7.0-r1.pre-upgrade
(15/50) Upgrading libgomp (10.2.1_pre1-r1 -> 10.2.1_pre1-r3)
(16/50) Upgrading libatomic (10.2.1_pre1-r1 -> 10.2.1_pre1-r3)
(17/50) Upgrading libgphobos (10.2.1_pre1-r1 -> 10.2.1_pre1-r3)
(18/50) Upgrading mpc1 (1.2.0-r0 -> 1.2.1-r0)
(19/50) Upgrading gcc (10.2.1_pre1-r1 -> 10.2.1_pre1-r3)
(20/50) Upgrading musl-dev (1.2.2_pre7-r0 -> 1.2.2-r1)
(21/50) Upgrading g++ (10.2.1_pre1-r1 -> 10.2.1_pre1-r3)
(22/50) Upgrading expat (2.2.10-r0 -> 2.2.10-r1)
(23/50) Upgrading git (2.30.0-r0 -> 2.30.1-r0)
(24/50) Upgrading alpine-sdk (1.0-r0 -> 1.0-r1)
(25/50) Upgrading lz4-libs (1.9.2-r0 -> 1.9.3-r0)
(26/50) Upgrading ncurses-terminfo-base (6.2_p20210102-r0 -> 6.2_p20210206-r0)
(27/50) Upgrading ncurses-libs (6.2_p20210102-r0 -> 6.2_p20210206-r0)
(28/50) Upgrading rhash-libs (1.4.0-r0 -> 1.4.1-r0)
(29/50) Upgrading libuv (1.40.0-r0 -> 1.40.0-r1)
(30/50) Upgrading cmake (3.18.4-r1 -> 3.19.4-r0)
(31/50) Upgrading doctest-dev (2.4.4-r0 -> 2.4.5-r0)
(32/50) Upgrading freetype (2.10.4-r0 -> 2.10.4-r1)
(33/50) Upgrading libuuid (2.36.1-r0 -> 2.36.1-r1)
(34/50) Upgrading libintl (0.20.2-r0 -> 0.21-r0)
(35/50) Upgrading libblkid (2.36.1-r0 -> 2.36.1-r1)
(36/50) Upgrading libmount (2.36.1-r0 -> 2.36.1-r1)
(37/50) Upgrading harfbuzz (2.7.4-r0 -> 2.7.4-r1)
(38/50) Upgrading nettle (3.6-r0 -> 3.7-r0)
(39/50) Upgrading libdrm (2.4.103-r1 -> 2.4.104-r0)
(40/50) Upgrading wayland-libs-client (1.18.0-r5 -> 1.19.0-r0)
(41/50) Installing libwebp (1.2.0-r0)
(42/50) Upgrading ffmpeg-libs (4.3.1-r3 -> 4.3.1-r4)
(43/50) Upgrading ffmpeg-dev (4.3.1-r3 -> 4.3.1-r4)
(44/50) Upgrading ncurses (6.2_p20210102-r0 -> 6.2_p20210206-r0)
(45/50) Upgrading ncurses-dev (6.2_p20210102-r0 -> 6.2_p20210206-r0)
(46/50) Upgrading notcurses-libs (2.1.4-r0 -> 2.2.1-r0)
(47/50) Upgrading notcurses-dev (2.1.4-r0 -> 2.2.1-r0)
(48/50) Upgrading sudo (1.9.4p2-r0 -> 1.9.5p2-r0)
(49/50) Upgrading xxd (8.2.2303-r0 -> 8.2.2404-r0)
(50/50) Upgrading vim (8.2.2303-r0 -> 8.2.2404-r0)

because i certainly don't see anything that could have caused this breakage between 2.2.0 and 2.2.1! there were only like four commits, lol. we'll know soon...

@dankamongmen
Copy link
Owner

bah, still passes on my alpine builder :( https://drone.dsscaw.com:4443/dankamongmen/notcurses/5516/2/2.

so let's see what we can derive from your pasted output. i don't believe i've ever seen NUMERICAL like that in ctest results. let's see what it means... we have failures in notcurses-tester, sgr-full and sgr-direct....really? those last two are....nothing really. i've never seen sgr-full fail. Hrmmm.

@dankamongmen
Copy link
Owner

"Child crashed with a numerical exception." in sgr-full?!?!?!

@dankamongmen
Copy link
Owner

@PureTryOut , I'm pretty bewildered by this failure report:

  • I've never seen sgr-full nor sgr-direct fail except in cases where everything was failing
  • The 2.2.0 tests apparently ran fine for y'all, yet
  • There were only 4 commits between 2.2.0 and 2.2.1. Only one touched code, and it wouldn't apply to sgr-full
  • I have never seen the NUMERICAL failure case, and can't find anything in sgr-full that could lead to it
  • The tests all run fine on my up-to-date Alpine Edge builder

So I can't reproduce this, and moreover can't at all comprehend how this went down. I hate to ask, but were there any other weird failures on this particular autobuilder recently that might point to hardware issues? Was it changed recently?

If I can get access to the machine, or even a coredump, I'm sure I can track this down, but right now I'm sitting with a few hours invested and zilch results. =[ Very much want to get this tracked down, though.

@dankamongmen
Copy link
Owner

I've just made changes to notcurses-tester which ought leave more visible results available in the case of error.

@PureTryOut
Copy link
Author

I sadly don't have the rights to give you any access to the builders, and neither do I have the knowledge of the state of the machine. You'll have to ask some other dev on #alpine-devel on Freenode, sorry.

@dankamongmen
Copy link
Owner

I sadly don't have the rights to give you any access to the builders, and neither do I have the knowledge of the state of the machine. You'll have to ask some other dev on #alpine-devel on Freenode, sorry.

no worries, i'll jump in there and ask for some help later this evening. thanks a ton!

@dankamongmen
Copy link
Owner

If we look back at #1197, flushing this out there required a lot of piping into and out of notcurses-tester. let's try applying the methodology used there by @kaniini.

@kaniini
Copy link
Contributor

kaniini commented Mar 15, 2021

The latest issue was caused by the x86_64 builder using screen-256color. Notcurses assumes that the sgr and sgr0 escape sequences are available (or more precisely, errors instead of providing a fallback if they are not appropriately defined).

@kaniini
Copy link
Contributor

kaniini commented Mar 15, 2021

I worked around it by setting TERM=vt100 when running tests, but I think solving the sgr issue makes sense too.

@dankamongmen
Copy link
Owner

I worked around it by setting TERM=vt100 when running tests, but I think solving the sgr issue makes sense too.

definitely, how embarrassing. thanks @kaniini !!!!

@dankamongmen
Copy link
Owner

hrmmm, weren't we seeing failures in direct mode? it looks pretty well protected against NULL sgr/sgr0...

@dankamongmen
Copy link
Owner

definitely a missing check in term_setstyles(), though, fixing now

@dankamongmen
Copy link
Owner

i fixed up term_setstyles(), but i don't see any path by which that ought have affected direct mode =[

@kaniini
Copy link
Contributor

kaniini commented Mar 15, 2021

The deficiency is in ncdirect_style_emit. If sgr is null, but sgr0 is defined, then the function will fail:

1: ncdirect_style_emit/1: stylebits=40, out=0x7fb97e7ca2c0, sgr=0, sgr0=0x7fb97a7aaaed
1: ncdirect_style_emit/2: r=-1
1: ncdirect_style_emit/3: r=-1
1: ncdirect_set_styles/4

Those fprintf were at the beginning of the function, middle after sgr/sgr0 were checked, and return back to ncdirect_set_styles.

@dankamongmen
Copy link
Owner

that's correct behavior, though. we only emit sgr0 if stylebits are 0. so the first case there is going to run:

  int r = -1;
  if(stylebits == 0 && n->tcache.sgr0){
    r = term_emit(n->tcache.sgr0, n->ttyfp, false);
  }else if(n->tcache.sgr){
    r = term_emit(tiparm(n->tcache.sgr, stylebits & NCSTYLE_STANDOUT,
                         stylebits & NCSTYLE_UNDERLINE,
                         stylebits & NCSTYLE_REVERSE,
                         stylebits & NCSTYLE_BLINK,
                         stylebits & NCSTYLE_DIM,
                         stylebits & NCSTYLE_BOLD,
                         stylebits & NCSTYLE_INVIS,
                         stylebits & NCSTYLE_PROTECT, 0), out, false);
  }

it does not take the first conditional because stylebits are not 0. it does not take the second conditional because sgr is NULL. so r remains -1 (it is only changed from -1 if we emit something, which we don't). i don't see any misuse there.

@kaniini
Copy link
Contributor

kaniini commented Mar 15, 2021

The problem is that there is not a fallback for the case where sgr is null, and stylebits is non-zero. This results in -1, which leads to the assertion problems, as it is not able to set the requested styles (and thus returns -1).

@dankamongmen
Copy link
Owner

oh assertion problems i got you. i thought we were dereferencing NULL or something similar. of course.

@dankamongmen
Copy link
Owner

That should be fixed. There will be further breakage in sgr-full and sgr-direct, and possibly in some rendered mode unit tests. Getting those now.

@kaniini
Copy link
Contributor

kaniini commented Mar 15, 2021

the sgr issues are likely related to the emoji case too. or it is similar. as soon as i forced TERM=vt100, the tests passed.

@dankamongmen
Copy link
Owner

ok in the rendered mode, lack of sgr isn't going to cause anything to fail. there shouldn't be tests failing because of that. i'll be able to reproduce this locally now by just hard-coding sgr to NULL. you're the bessssssssssst <3 <3 <3

@dankamongmen
Copy link
Owner

wasn't there some manner of timeout on s390x recently?

@dankamongmen
Copy link
Owner

i'm now able to succeed despite lacking sgr.

@dankamongmen
Copy link
Owner

so sgr-full (NUMERICAL) was surely somehow derived from the NULL dereference it would have seen in rasterization.

sgr-direct is no longer run as part of the test set.

notcurses-tester has been modified to detect the case lacking sgr, and verify that these calls fail rather than succeed.

w00000000000000000000000000t. yay! couldn't have done it without you @kaniini , this was pretty much your fix.

@dankamongmen dankamongmen modified the milestones: 3.0.0, 2.3.0 Apr 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants