All alpine autobuilders fail in tests using 2.0.11 #1197

Closed
dankamongmen opened this issue Dec 10, 2020 · 41 comments

@dankamongmen
Owner

dankamongmen commented Dec 10, 2020

I cut 2.0.11 for Alpine Edge today, confident that we'd fixed the s390x problem there. The good news is that s390x no longer errors out differently from the others. The bad news is that all now fail :(. Gotta fix this before 2.1.0.

@dankamongmen dankamongmen added the bug label Dec 10, 2020
@dankamongmen dankamongmen added this to the 2.1.0 milestone Dec 10, 2020
@dankamongmen dankamongmen self-assigned this Dec 10, 2020
@dankamongmen
Owner Author

dankamongmen commented Dec 10, 2020

It looks like 2.0.12-pre is also now failing on drone :(. Though this is since 2.0.11. 4935 worked; since 4936, we're dead in the water.

@dankamongmen
Owner Author

dankamongmen commented Dec 11, 2020

I've been trying to reproduce this locally, and failing. Need to get a core file exfiltrated from the Docker container.

@dankamongmen
Owner Author

dankamongmen commented Dec 11, 2020

Finally got it reproduced!

  RefreshSameSizeternal:586:24x80 @ 0/0 → 1/1 @ 0/0 (keeping 1x1 from 0/0)0)
ncreel_redraw:672:Error drawing tablet4x80 plane "std" @ 0x0 1x1 from 0/0)
/root/notcurses/tests/notcurses.cpp:40: ERROR: CHECK( newx == x ) is NOT correct!creel_redraw:672:Error drawing tabletdard plane9fd9e60 @ 2x2ess)
  values: CHECK( 1 == 80 )60389fbf850 != 0x560389fd9e60r address)8 from 0/0)
get_tty_fd:920:File descriptor 1 was not a TTYls0/0 (keeping 1x1 from 0/0)
/root/notcurses/tests/notcurses.cpp:41: ERROR: CHECK( newy == y ) is NOT correct!x560389fbf850 is already registered for signals "std" @ 0x000000003
  values: CHECK( 1 == 24 )t is not to a terminal)"tab" @ 1x1 1x1 from 0/0)
ncplane_new_internal:419:Created new 24x80 plane "std" @ 0x02ess)from 0/0)
===============================================================================
/root/notcurses/tests/piles.cpp:3:as not a TTY@ 0/0 (keeping 1x1 from 0/0)
TEST CASE:  Pilesror opening /dev/tty (No such device or address)
  SmallerPileRenderlready registered for signals9fe55d0eping 1x1 from 0/0)
Defaulting to 24x80 (output is not to a terminal)ab" @ 0x0
/root/notcurses/tests/piles.cpp:32: FATAL ERROR: REQUIRE( nullptr != egc ) is NOT correct!w_internal:419:Created new 24x80 plane "" @ 0x0ing 1x1 from 0/0)
  values: REQUIRE( NULL != NULL ) @ 0/0 → 1/1 @ 0/0 (keeping 1x1 from 0/0)
ncplane_destroy:677:Won't destroy standard plane9fd9e60r address)
===============================================================================
/root/notcurses/tests/reel.cpp:120:s not a TTYal)ab" @ 0x0ddress)
TEST CASE:  Reelsror opening /dev/tty (No such device or address)
  ThreeCycleDownnternal:586:24x80 @ 0/0 → 1/1 @ 0/0 (keeping 1x1 from 0/0)
ncplane_destroy:677:Won't destroy standard plane "std" @ 0x0
/root/notcurses/tests/reel.cpp:335: ERROR: CHECK_LE( 0, order[n] ) is NOT correct!t_tty_fd:920:File descriptor 1 was not a TTY@ 0/0 (keeping 1x1 from 0/0)
  values: CHECK_LE( 0, -1 )g /dev/tty (No such device or address)
0x560389fbf850 is already registered for signals
/root/notcurses/tests/reel.cpp:120: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signalegistered for signals
Defaulting to 24x80 (output is not to a terminal)
===============================================================================
/root/notcurses/tests/reel.cpp:120:w 1x16 plane "plot" @ 1x1
TEST CASE:  Reelsternal:586:24x80 @ 0/0 → 1/1 @ 0/0 (keeping 1x1 from 0/0)
Couldn't drop signals: 0x560389fbf850 != 0x560389fd9e60
DEEPEST SUBCASE STACK REACHED (DIFFERENT FROM THE CURRENT ONE):
  ThreeCycleDownrror opening /dev/tty (No such device or address)
0x560389fbf850 is already registered for signals
===============================================================================
[doctest] test cases:     32 |     27 passed |      5 failed |     10 skipped
[doctest] assertions: 8409076 | 8409054 passed |     22 failed |
[doctest] Status: FAILURE!
[vps](139) $ 

@dankamongmen
Owner Author

dankamongmen commented Dec 11, 2020

/root/notcurses/tests/fds.cpp:49:
DESCRIPTION: Fdplanes and subprocedures
TEST CASE:  FdsAndSubprocs
  SubprocDestroyCmdHung

/root/notcurses/tests/fds.cpp:156: WARNING: WARN( 0 != ncsubproc_destroy(ncsubp) ) is NOT correct!
  values: WARN( 0 != 0 )

^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[
/root/notcurses/tests/fills.cpp:5:
TEST CASE:  Fills
  Ncplane_Stain

/root/notcurses/tests/fills.cpp:272: FATAL ERROR: REQUIRE( 0 < ncplane_stain(n_, 7, 7, channels, channels, channels, channels) ) is NOT correct!
  values: REQUIRE( 0 <  -1 )

^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[
/root/notcurses/tests/ncplane.cpp:20:
TEST CASE:  NCPlane
  PlaneAtCursorAttrs

/root/notcurses/tests/ncplane.cpp:638: ERROR: CHECK( newx == x ) is NOT correct!
  values: CHECK( 0 == 42 )

/root/notcurses/tests/ncplane.cpp:640: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, y - 2, x - 1) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:642: ERROR: CHECK( testcell.gcluster == (__bswap_32 (__bswap_32 (STR1[strlen(STR1) - 1]))) ) is NOT correct!
  values: CHECK( 116 == 110 )

/root/notcurses/tests/ncplane.cpp:643: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, y - 1, x - 1) ) is NOT correct^[[1;1H^[[31m^[[46m╔^[[1;1H^[[31m^[[46m╭^[[1;1H^[[39;49mI^[[1;1H^[[39;49m╭^[[1;1H ^[[1;1H^[[39;49mX^[[1;1H^[[39;49m^[[30mA^[[1;1H^[[39;49m^[[30mC!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:645: ERROR: CHECK( testcell.gcluster == (__bswap_32 (__bswap_32 (STR2[strlen(STR2) - 1]))) ) is NOT correct!
  values: CHECK( 116 == 107 )

/root/notcurses/tests/ncplane.cpp:646: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, y, x - 1) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:648: ERROR: CHECK( testcell.gcluster == (__bswap_32 (__bswap_32 (STR3[strlen(STR3) - 1]))) ) is NOT correct!
  values: CHECK( 116 == 115 )

^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m===============================================================================
/root/notcurses/tests/ncplane.cpp:20:
TEST CASE:  NCPlane
  RightToLeft

/root/notcurses/tests/ncplane.cpp:733: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, 3, 10) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:734: ERROR: CHECK( 0 < ncplane_putstr(n_, "I can write English with מילים בעברית in the same sentence.") ) is NOT correct!
  values: CHECK( 0 <  -1 )

/root/notcurses/tests/ncplane.cpp:735: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, 5, 10) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:736: ERROR: CHECK( 0 < ncplane_putstr(n_, "|🔥|I have not yet ־ begun to hack|🔥|") ) is NOT correct!
  values: CHECK( 0 <  0 )

/root/notcurses/tests/ncplane.cpp:737: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, 7, 10) ) is NOT correct!
  values: CHECK( 0 == -1 )


/root/notcurses/tests/ncplane.cpp:738: ERROR: CHECK( 0 < ncplane_putstr(n_, "㉀㉁㉂㉃㉄㉅㉆㉇㉈㉉㉊㉋㉌㉍㉎㉏㉐㉑㉒㉓㉔㉕㉖㉗㉘㉙㉚㉛㉜㉝㉞㉟") ) is NOT correct!
  values: CHECK( 0 <  0 )

^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m===============================================================================
/root/notcurses/tests/ncplane.cpp:20:
TEST CASE:  NCPlane
  EGCStained

/root/notcurses/tests/ncplane.cpp:806: ERROR: CHECK( 1 == ncplane_putegc_stained(n_, "D", &sbytes) ) is NOT correct!
  values: CHECK( 1 == -1 )

/root/notcurses/tests/ncplane.cpp:813: ERROR: CHECK( 1 == ncplane_at_yx_cell(n_, 0, 1, &c) ) is NOT correct!
  values: CHECK( 1 == -1 )

/root/notcurses/tests/ncplane.cpp:815: ERROR: CHECK( (__bswap_32 (__bswap_32 ('D'))) == c.gcluster ) is NOT correct!
  values: CHECK( 68 == 67 )

/root/notcurses/tests/ncplane.cpp:817: ERROR: CHECK( channels == c.channels ) is NOT correct!
  values: CHECK( 4650116732956966912 == 4630901375692177408 )

^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[
^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[H^[[39;49m ^[[1;1H^[[332m^[[464mX^[[1;1H^[[39;49mO^[[1;1H^[[35m^[[40m▒^[[1;1H^[[39;49m ^[[1;1H^[[37m^[[40m╭^[[1;1H^[[39;49m╭^[[1;1H^[[39;49m╭^[[1;1H^[[39;49m╭^[[1;1H^[[39;49m╭^[[1;1H^[[39;49m╭^[[1;1H╰^[[1;1H^[[39;49m╭^[[1;1H╰^[[1;1H^[[39;49m╭^[[1;1H^[[39
/root/notcurses/tests/notcurses.cpp:7:
TEST CASE:  NotcursesBase
  RefreshSameSize

/root/notcurses/tests/notcurses.cpp:40: ERROR: CHECK( newx == x ) is NOT correct!
  values: CHECK( 1 == 80 )

/root/notcurses/tests/notcurses.cpp:41: ERROR: CHECK( newy == y ) is NOT correct!
  values: CHECK( 1 == 24 )

^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[
/root/notcurses/tests/piles.cpp:3:
TEST CASE:  Piles
  SmallerPileRender

/root/notcurses/tests/piles.cpp:32: FATAL ERROR: REQUIRE( nullptr != egc ) is NOT correct!
  values: REQUIRE( NULL != NULL )

^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[m^[[?1h^[=^[[39;49m^[(B^[[m^[[39;49m^[(B^[[
/root/notcurses/tests/reel.cpp:120:
TEST CASE:  Reels
  ThreeCycleDown

/root/notcurses/tests/reel.cpp:335: ERROR: CHECK_LE( 0, order[n] ) is NOT correct!
  values: CHECK_LE( 0, -1 )

/root/notcurses/tests/reel.cpp:120: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

===============================================================================
/root/notcurses/tests/reel.cpp:120:
TEST CASE:  Reels

DEEPEST SUBCASE STACK REACHED (DIFFERENT FROM THE CURRENT ONE):
  ThreeCycleDown

===============================================================================
[doctest] test cases:     32 |     27 passed |      5 failed |     10 skipped
[doctest] assertions: 8409076 | 8409054 passed |     22 failed |
[doctest] Status: FAILURE!

@dankamongmen
Owner Author

dankamongmen commented Dec 12, 2020

OK, I can reproduce this locally now just by running in nohup and disconnecting the terminal. Yeaargh.

@dankamongmen
Owner Author

dankamongmen commented Dec 12, 2020

I see what's happening, though I have no idea why:

ncplane_cursor_move_yx:490:Target y 26 >= height 24
CURRENT: 24/80 TERM: 1/1
ncplane_resize_internal:586:24x80 @ 0/0 → 1/1 @ 0/0 (keeping 1x1 from 0/0)
****************** 0/0
ncplane_cursor_move_yx:479:Target x 79 >= length 1
ncplane_cursor_move_yx:479:Target x 79 >= length 1
ncplane_cursor_move_yx:479:Target x 79 >= length 1

in PlaneAtCursorAttrs, we're somehow dropping the standard plane down to {1, 1} dimensions, at which point we can't emit our strings, and everything goes to hell. why would we be going to 1,1?
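
To make the failure mode concrete, here is a minimal sketch of a bounds-checked cursor move. It is not the notcurses implementation: `struct plane` and `plane_cursor_move_yx` are hypothetical stand-ins, and the diagnostics only imitate the log lines above. It shows why every later cursor move and write fails once the plane has shrunk to 1x1.

```c
#include <stdio.h>

// Hypothetical stand-in for a plane: only its dimensions and cursor.
struct plane {
  int leny, lenx; // rows, columns
  int y, x;       // current cursor position
};

// Move the cursor to (y, x). Returns 0 on success, or -1 if the target
// lies outside the plane, leaving the cursor where it was.
int plane_cursor_move_yx(struct plane *p, int y, int x){
  if(y < 0 || x < 0){
    return -1;
  }
  if(y >= p->leny){
    fprintf(stderr, "Target y %d >= height %d\n", y, p->leny);
    return -1;
  }
  if(x >= p->lenx){
    fprintf(stderr, "Target x %d >= length %d\n", x, p->lenx);
    return -1;
  }
  p->y = y;
  p->x = x;
  return 0;
}
```

With a 24x80 plane, a move to column 79 succeeds. After the same plane is resized to 1x1, the identical call returns -1, which matches the run of `CHECK( 0 == -1 )` failures in the test output.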

@dankamongmen
Owner Author

dankamongmen commented Dec 12, 2020

PlaneAtCursorAttrs is resolved now, but i suspect others are broken in the same way. it looks like we possibly carry cursor information across subtests? but wouldn't that mean we carry all aspects of the standard plane across subtests? i don't think that that's going on.....hrmm....

@dankamongmen
Owner Author

dankamongmen commented Dec 12, 2020

Hrmmm, I don't like that segfault, but we have this resolved and tests are now passing. I'd like to try further to reproduce and chase down that segfault, though.

@dankamongmen
Owner Author

dankamongmen commented Dec 12, 2020

the segfault was a failure to check a result in the reels tests. resolved. we're done here!
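
The thread does not show the actual fix, so the following is only a generic sketch of the bug class: a lookup that can return -1, and a caller that must check before using the result as an index. `find_index` and `mark_seen` are hypothetical names, not functions from the reels tests.

```c
#include <stddef.h>

// Hypothetical lookup: index of `id` within `ids`, or -1 if it is absent.
int find_index(const int *ids, size_t count, int id){
  for(size_t i = 0 ; i < count ; ++i){
    if(ids[i] == id){
      return (int)i;
    }
  }
  return -1;
}

// Bump the hit counter for `id`. Returns 0 on success, -1 if `id` is unknown.
int mark_seen(const int *ids, int *hits, size_t count, int id){
  int idx = find_index(ids, count, id);
  if(idx < 0){
    return -1; // without this check, hits[-1] would be written
  }
  ++hits[idx];
  return 0;
}
```

The `CHECK_LE( 0, -1 )` failure just before the SIGSEGV in the log is consistent with this shape: a negative result was reported by the assertion, and the test then carried on with it.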

@dankamongmen
Owner Author

dankamongmen commented Dec 12, 2020

https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/15708 we look good! all alpine builds are now passing =].

algitbot pushed a commit to alpinelinux/aports that referenced this issue Dec 12, 2020
This release was all about fixing the unit tests, which have
been mysteriously breaking on the Alpine autobuilder recently:
dankamongmen/notcurses#1197
@kaniini
Contributor

kaniini commented Dec 15, 2020

2.0.12 tests are crashing on x86 :(

@dankamongmen
Owner Author

dankamongmen commented Dec 15, 2020

> 2.0.12 tests are crashing on x86 :(

so i saw. =[ do you have any insight as to why they would all be green in the pipeline attached to the PR, but then one would break somewhere further down the line? the build logs for that pipeline clearly show the tests being run on x86, and succeeding. =[

@dankamongmen
Owner Author

dankamongmen commented Dec 15, 2020

> 2.0.12 tests are crashing on x86 :(

btw is your "keeper of mazes" a reference to the Ariadne of mythology, she of the gold thread?

@kaniini
Contributor

kaniini commented Dec 15, 2020

the CI environment is a bit different than the actual buildservers. on the buildservers, we capture stdout and stderr file descriptors and redirect them to files. i do not believe we do this on CI.

@kaniini
Contributor

kaniini commented Dec 15, 2020

i had to block notcurses on x86 so that our x86 buildserver would move onto trying to build other packages, but would be happy to help debug this.

> btw is your "keeper of mazes" a reference to the Ariadne of mythology, she of the gold thread?

yes, I work on a lot of security-related code inside and outside alpine, as well. seemed like a good fit.

@kaniini
Contributor

kaniini commented Dec 15, 2020

I have set up an x86 alpine install and built notcurses in it manually with abuild:

$ abuild deps clean unpack prepare build
[...]
[100%] Linking CXX executable notcurses-tester
make[2]: Leaving directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'
[100%] Built target notcurses-tester
make[1]: Leaving directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'
make: Leaving directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'

I then ran abuild check to invoke the testsuite by hand:

$ abuild check
make: Entering directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'
Running tests...
Test project /home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build
    Start 1: notcurses-tester
1/7 Test #1: notcurses-tester .................   Passed   23.87 sec
    Start 2: ncpp_build
2/7 Test #2: ncpp_build .......................   Passed    0.01 sec
    Start 3: ncpp_build_exceptions
3/7 Test #3: ncpp_build_exceptions ............   Passed    0.01 sec
    Start 4: sgr-full
4/7 Test #4: sgr-full .........................   Passed    0.01 sec
    Start 5: sgr-direct
5/7 Test #5: sgr-direct .......................   Passed    0.01 sec
    Start 6: rgb
6/7 Test #6: rgb ..............................   Passed    0.02 sec
    Start 7: rgbbg
7/7 Test #7: rgbbg ............................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) =  24.08 sec
make: Leaving directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'

I then ran the testsuite with stdout and stderr captured like the buildservers do:

$ (abuild check >check.log 2>&1) & tail -f check.log
make: Entering directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'
Running tests...
Test project /home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build
    Start 1: notcurses-tester

In my test environment, it's just frozen, which is different behavior from the buildserver's, even though the FDs are captured in exactly the same way.

@kaniini
Contributor

kaniini commented Dec 15, 2020

Running this a few more times, I cannot get it to crash and I cannot get it to segfault on a bare x86 VM.

@kaniini
Contributor

kaniini commented Dec 15, 2020

Running the tester program directly also does not crash. I'm honestly baffled as to why the buildserver is failing reliably, but the failure is not reproducible in a test environment.

@kaniini
Contributor

kaniini commented Dec 15, 2020

Hmm, when your code sees that stdout is not a TTY, it tries to open /dev/tty. This is probably related -- I doubt /dev/tty is connected to anything useful in an LXC container, but more importantly it probably shouldn't try to use /dev/tty anyway.
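
As a rough illustration of the fallback described here, and of a defensive variant, the sketch below picks a terminal file descriptor. It is neither the notcurses code nor the patch in #1212: `pick_tty_fd` is a hypothetical name, and the path is a parameter only so that the behavior can be exercised without a real controlling terminal.

```c
#include <fcntl.h>
#include <unistd.h>

// Return a file descriptor that refers to a terminal, or -1 if none is found.
// If `outfd` is already a terminal it is returned as-is. Otherwise `ttypath`
// (normally "/dev/tty") is opened, and the result is kept only if it really
// is a terminal. *opened is set to 1 when the caller owns the returned fd
// and must close it.
int pick_tty_fd(int outfd, const char *ttypath, int *opened){
  *opened = 0;
  if(isatty(outfd)){
    return outfd;
  }
  int fd = open(ttypath, O_RDWR | O_NOCTTY | O_NONBLOCK);
  if(fd < 0){
    return -1; // e.g. "No such device or address" with no controlling tty
  }
  if(!isatty(fd)){
    close(fd); // the path opened, but it is a FIFO, regular file, etc.
    return -1;
  }
  *opened = 1;
  return fd;
}
```

The second `isatty()` check is the point of the sketch: a successful `open()` does not prove that the path is a terminal, so a caller that gets -1 can fall back to non-terminal behavior instead of issuing terminal ioctls on something else.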

@kaniini
Contributor

kaniini commented Dec 15, 2020

Bam! If I replace /dev/tty with something bogus (say, a FIFO), we get an immediate segfault in notcurses-tester.

@kaniini
Contributor

kaniini commented Dec 15, 2020

Opened #1212 with a likely fix.

algitbot pushed a commit to alpinelinux/aports that referenced this issue Dec 15, 2020
@dankamongmen
Owner Author

dankamongmen commented Dec 15, 2020

> i had to block notcurses on x86 so that our x86 buildserver would move onto trying to build other packages, but would be happy to help debug this.
>
> > btw is your "keeper of mazes" a reference to the Ariadne of mythology, she of the gold thread?
>
> yes, I work on a lot of security-related code inside and outside alpine, as well. seemed like a good fit.

awesome. i'm at work at the moment, but will be able to look at this again this evening. so distressing -- i figured out why we were breaking on s390x, only to start breaking on x86! i definitely intend to get this fixed, just didn't yet have the heart to do so the other day =].

@dankamongmen dankamongmen reopened this Dec 15, 2020
@dankamongmen dankamongmen modified the milestones: 2.1.0, 2.2.0 Dec 15, 2020
@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

Marking this tentatively closed, in the hope that @kaniini's patch fixes us up. I might go ahead and package 2.1.0 for Alpine (they're on 2.0.12 currently) with @kaniini's patch in the APKBUILD, and that way bring them up to speed while also getting a test prior to 2.1.1. I'd really love to stop breaking their autobuilder.

@kaniini
Contributor

kaniini commented Dec 16, 2020

Unfortunately on the latest try, it doesn't fix us up. With the help of @Ikke, I was able to get the CTest log from the builder:

===============================================================================
/home/buildozer/aports/community/notcurses/src/notcurses-2.0.12/tests/piles.cpp:3:
TEST CASE:  Piles
  ShufflePile

/home/buildozer/aports/community/notcurses/src/notcurses-2.0.12/tests/piles.cpp:3: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

Going to dig a little bit into this test.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

Hot damn, y'all are CHAMPIONS. I'm on it as well. The piles stuff is all new from 2.0.x, and I haven't yet written anything that really exercises it, so I'm not surprised to see potential problems there.

@dankamongmen dankamongmen reopened this Dec 16, 2020
@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

tally-ho!

[schwarzgerat](0) $ cat e
==1138495== Memcheck, a memory error detector
==1138495== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1138495== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==1138495== Command: ./notcurses-tester -p ../data/ --tc=Piles
==1138495== 
==1138495== Invalid write of size 8
==1138495==    at 0x4A6FC6E: ncplane_destroy (notcurses.c:699)
==1138495==    by 0x4A6FC6E: ncplane_destroy (notcurses.c:680)
==1138495==    by 0x1C517A: _DOCTEST_ANON_FUNC_2() (piles.cpp:129)
==1138495==    by 0x190A95: doctest::Context::run() (doctest.h:6167)
==1138495==    by 0x1458DB: main (main.cpp:133)
==1138495==  Address 0xf294aa0 is 96 bytes inside a block of size 176 free'd
==1138495==    at 0x48399AB: free (vg_replace_malloc.c:538)
==1138495==    by 0x4A6FCB3: ncplane_destroy (notcurses.c:712)
==1138495==    by 0x4A6FCB3: ncplane_destroy (notcurses.c:680)
==1138495==    by 0x1C5162: _DOCTEST_ANON_FUNC_2() (piles.cpp:127)
==1138495==    by 0x190A95: doctest::Context::run() (doctest.h:6167)
==1138495==    by 0x1458DB: main (main.cpp:133)
==1138495==  Block was alloc'd at
==1138495==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==1138495==    by 0x4A6EDDF: ncplane_new_internal (notcurses.c:358)
==1138495==    by 0x1C3E95: _DOCTEST_ANON_FUNC_2() (piles.cpp:99)
==1138495==    by 0x190A95: doctest::Context::run() (doctest.h:6167)
==1138495==    by 0x1458DB: main (main.cpp:133)
==1138495== 
==1138495== 
==1138495== HEAP SUMMARY:
==1138495==     in use at exit: 58,182 bytes in 264 blocks
==1138495==   total heap usage: 3,175 allocs, 2,911 frees, 3,427,641 bytes allocated
==1138495== 
==1138495== LEAK SUMMARY:
==1138495==    definitely lost: 0 bytes in 0 blocks
==1138495==    indirectly lost: 0 bytes in 0 blocks
==1138495==      possibly lost: 1,352 bytes in 18 blocks
==1138495==    still reachable: 56,830 bytes in 246 blocks
==1138495==                       of which reachable via heuristic:
==1138495==                         newarray           : 1,536 bytes in 16 blocks
==1138495==         suppressed: 0 bytes in 0 blocks
==1138495== Rerun with --leak-check=full to see details of leaked memory
==1138495== 
==1138495== For lists of detected and suppressed errors, rerun with: -s
==1138495== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
[schwarzgerat](0) $ 

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

for all those playing at home, it's likely that whatever's affecting us in ShufflePile is likewise going to trigger on ShufflePileFamilies, which is just not being hit due to the SIGSEGV. so if it's in the test rather than the library core, it'll need to be fixed in both places.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

[schwarzgerat](0) $ cat e
 -------------------------- notcurses debug state -----------------------------
  *************************   0x5572e3da35f0 pile ****************************
0000 off y:   0 x:   0 geom y:  70 x:  80 curs y:   0 x:   0 0x5572e3d68040 std
 bound 0x5572e3d68040 ← 0x5572e3da3600 → (nil) binds (nil)
  *************************   0x5572e3da7630 pile ****************************
0000 off y:   7 x:   7 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dabd40 new3
 bound 0x5572e3dd2710 ← 0x5572e3dd2780 → (nil) binds (nil)
0001 off y:   6 x:   6 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd25b0 new2
 bound 0x5572e3dd25b0 ← 0x5572e3da7640 → (nil) binds (nil)
0002 off y:   3 x:   3 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd2710 new1
 bound 0x5572e3dd2710 ← 0x5572e3dabda0 → (nil) binds 0x5572e3dabd40
 WARNING: expected *->bprev 0x5572e3dd2710, got (nil)
 ______________________________________________________________________________
 -------------------------- notcurses debug state -----------------------------
  *************************   0x5572e3da35f0 pile ****************************
0000 off y:   0 x:   0 geom y:  70 x:  80 curs y:   0 x:   0 0x5572e3d68040 std
 bound 0x5572e3d68040 ← 0x5572e3da3600 → (nil) binds (nil)
  *************************   0x5572e3da7630 pile ****************************
0000 off y:   6 x:   6 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd25b0 new2
 bound 0x5572e3dd25b0 ← 0x5572e3da7640 → (nil) binds (nil)
0001 off y:   3 x:   3 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd2710 new1
 bound 0x5572e3dd2710 ← 0x5572e3dabda0 → (nil) binds (nil)
 WARNING: expected *->bprev 0x5572e3dd2710, got (nil)
 ______________________________________________________________________________
 -------------------------- notcurses debug state -----------------------------
  *************************   0x5572e3da35f0 pile ****************************
0000 off y:   0 x:   0 geom y:  70 x:  80 curs y:   0 x:   0 0x5572e3d68040 std
 bound 0x5572e3d68040 ← 0x5572e3da3600 → (nil) binds (nil)
  *************************   0x5572e3da7630 pile ****************************
0000 off y:   3 x:   3 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd2710 new1
 bound 0x5572e3dd2710 ← 0x5572e3dabda0 → (nil) binds (nil)
 WARNING: expected *->bprev 0x5572e3dd2710, got (nil)
 ______________________________________________________________________________
 -------------------------- notcurses debug state -----------------------------
  *************************   0x5572e3da35f0 pile ****************************
0000 off y:   0 x:   0 geom y:  70 x:  80 curs y:   0 x:   0 0x5572e3d68040 std
 bound 0x5572e3d68040 ← 0x5572e3da3600 → (nil) binds (nil)
 ______________________________________________________________________________
[schwarzgerat](0) $ 

@kaniini
Contributor

kaniini commented Dec 16, 2020

Ah, linked list corruption. That's what my guess was going to be based on looking at the code.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

> Ah, linked list corruption. That's what my guess was going to be based on looking at the code.

absolutely, i am the suck. i'll have it fixed in 10min, but you're welcome to race if you'd like =]. SHOULDA USED RUST.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

and by the way i can't thank you and @Ikke enough, nor the rest of Alpine. i've no idea why yours is the only config that caught this, but you've done me a tremendous service.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

ok yeah, in ncplane_reparent() we're improperly splicing the reparented plane's children into the pile's root list. stupid me!
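
For anyone unfamiliar with this kind of list, here is a minimal sketch of a sibling list in which each node stores the address of the pointer that points at it, along with the invariant that the `expected *->bprev` warnings in the debug dump appear to be checking. The layout is an assumption based on that output: `struct node`, `pprev`, and the three functions are hypothetical and are not the notcurses structures or the actual fix.

```c
#include <stddef.h>

struct node {
  struct node *next;   // next sibling, or NULL
  struct node **pprev; // address of the pointer that points at this node
};

// Insert `n` at the front of the list rooted at `*head`.
void list_push(struct node **head, struct node *n){
  n->next = *head;
  if(n->next){
    n->next->pprev = &n->next;
  }
  n->pprev = head;
  *head = n;
}

// Remove `n` from whatever list it is on, fixing up both neighbors.
void list_unlink(struct node *n){
  *n->pprev = n->next;
  if(n->next){
    n->next->pprev = n->pprev;
  }
  n->next = NULL;
  n->pprev = NULL;
}

// Check the invariant: every node's pprev is the address of the pointer
// through which it was reached. Returns 1 if the list is consistent, else 0.
int list_ok(struct node **head){
  for(struct node **pp = head ; *pp ; pp = &(*pp)->next){
    if((*pp)->pprev != pp){
      return 0;
    }
  }
  return 1;
}
```

Splicing nodes onto another list without updating their back-pointers leaves `pprev` aimed at the old location. A later unlink then writes through that stale pointer, which may land in memory that has already been freed, as in the Valgrind report above.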

@kaniini
Contributor

kaniini commented Dec 16, 2020

musl's malloc-ng malloc implementation catches a lot of bugs like these. if you're interested, i could set up alpine-based CI using github actions.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

> musl's malloc-ng malloc implementation catches a lot of bugs like these. if you're interested, i could set up alpine-based CI using github actions.

i've got a ci server at https://drone.dsscaw.com:4443/ (or i did, anyway; it apparently has stopped), that i've been meaning to throw alpine onto. the musl observation is a compelling one. certainly don't let me stop you from setting up whatever you'd like, of course, but yeah throwing an alpine build into .drone.yml sounds like a great idea.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

with that said, if you're offering to do this because you're interested in getting involved with notcurses, i'd be delighted to have you aboard, and am happy to let you take over whatever you'd like.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

i've got a fix for this. valgrind now runs clear.

@kaniini
Contributor

kaniini commented Dec 16, 2020

I do have some interest in notcurses; for example I am interested in using it as a basis for a replacement Alpine installer, modelled after FreeBSD's bsdinstall.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

> I do have some interest in notcurses; for example I am interested in using it as a basis for a replacement Alpine installer, modelled after FreeBSD's bsdinstall.

well i've no idea as to what kind of time you want to put into it, but so long as you don't go committing into my C core without letting me know =], i'm happy to make you a collaborator. alternatively, you can just send PRs and know you're on the fast track for approval, heh =]. i'm honored to have people of your competence interested in my humble little project, and @joseluis can hopefully vouch for my willingness to explain my mysterious/inscrutable codes and comments via mail.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

also, i'm delighted to hear of potential use in an alpine installer. feel free to hit me at nickblack@linux.com with any questions you run into, and be liberal with the feature request button. 2.1.1 adds progress bars (already visible in the allgraph and uniblock demos from head), and if tree-based selectors would be useful, i can move up #1164 .

@kaniini
Contributor

kaniini commented Dec 16, 2020

>>> notcurses: Build complete at Wed, 16 Dec 2020 07:30:04 +0000 elapsed time 0h 5m 11s

Looks like we've got it this time.

@dankamongmen
Owner Author

dankamongmen commented Dec 16, 2020

>>> notcurses: Build complete at Wed, 16 Dec 2020 07:30:04 +0000 elapsed time 0h 5m 11s

Looks like we've got it this time.

boom! definitely a team effort. thank you for restoring a bit of my faith in humanity and free software.
