Unregisterised code size is two orders of magnitude too large #13
Comments
I checked the |
It's possible that the LLVM level LTO optimizations perform DCE, before passing on to the linker's |
For the linux unregistered build I managed to make the I got the hint about using
I tried the same thing for wasm-cross-ghc, but the output was exactly the same size as before, 27 MB.
While debugging #15285 I realized that free block lists (free_list in BlockAlloc.c) get corrupted when multiple scavenge threads allocate and release blocks concurrently. Here's a picture of one such race:

    Thread 2 (Thread 32573.32601):
    #0 check_tail (bd=0x940d40 <stg_TSO_info>) at rts/sm/BlockAlloc.c:860
    #1 0x0000000000928ef7 in checkFreeListSanity () at rts/sm/BlockAlloc.c:896
    #2 0x0000000000928979 in freeGroup (p=0x7e998ce02880) at rts/sm/BlockAlloc.c:721
    #3 0x0000000000928a17 in freeChain (bd=0x7e998ce02880) at rts/sm/BlockAlloc.c:738
    #4 0x0000000000926911 in freeChain_sync (bd=0x7e998ce02880) at rts/sm/GCUtils.c:80
    #5 0x0000000000934720 in scavenge_capability_mut_lists (cap=0x1acae80) at rts/sm/Scav.c:1665
    #6 0x000000000092b411 in gcWorkerThread (cap=0x1acae80) at rts/sm/GC.c:1157
    #7 0x000000000090be9a in yieldCapability (pCap=0x7f9994e69e20, task=0x7e9984000b70, gcAllowed=true) at rts/Capability.c:861
    #8 0x0000000000906120 in scheduleYield (pcap=0x7f9994e69e50, task=0x7e9984000b70) at rts/Schedule.c:673
    #9 0x0000000000905500 in schedule (initialCapability=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:293
    #10 0x0000000000908d4f in scheduleWorker (cap=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:2554
    #11 0x000000000091a30a in workerStart (task=0x7e9984000b70) at rts/Task.c:444
    #12 0x00007f99937fa6db in start_thread (arg=0x7f9994e6a700) at pthread_create.c:463
    #13 0x000061654d59f88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

    Thread 1 (Thread 32573.32573):
    #0 checkFreeListSanity () at rts/sm/BlockAlloc.c:887
    #1 0x0000000000928979 in freeGroup (p=0x7e998d303540) at rts/sm/BlockAlloc.c:721
    #2 0x0000000000926f23 in todo_block_full (size=513, ws=0x1aa8ce0) at rts/sm/GCUtils.c:264
    #3 0x00000000009583b9 in alloc_for_copy (size=513, gen_no=0) at rts/sm/Evac.c:80
    #4 0x000000000095850d in copy_tag_nolock (p=0x7e998c675f28, info=0x421d98 <Main_Large_con_info>, src=0x7e998d075d80, size=513, gen_no=0, tag=1) at rts/sm/Evac.c:153
    #5 0x0000000000959177 in evacuate (p=0x7e998c675f28) at rts/sm/Evac.c:715
    #6 0x0000000000932388 in scavenge_small_bitmap (p=0x7e998c675f28, size=1, bitmap=0) at rts/sm/Scav.c:271
    #7 0x0000000000934aaf in scavenge_stack (p=0x7e998c675f28, stack_end=0x7e998c676000) at rts/sm/Scav.c:1908
    #8 0x0000000000934295 in scavenge_one (p=0x7e998c66e000) at rts/sm/Scav.c:1466
    #9 0x0000000000934662 in scavenge_mutable_list (bd=0x7e998d300440, gen=0x1b1d880) at rts/sm/Scav.c:1643
    #10 0x0000000000934700 in scavenge_capability_mut_lists (cap=0x1aaa340) at rts/sm/Scav.c:1664
    #11 0x00000000009299b6 in GarbageCollect (collect_gen=0, do_heap_census=false, gc_type=2, cap=0x1aaa340, idle_cap=0x1b38aa0) at rts/sm/GC.c:378
    #12 0x0000000000907a4a in scheduleDoGC (pcap=0x7ffdec5b5310, task=0x1b36650, force_major=false) at rts/Schedule.c:1798
    #13 0x0000000000905de7 in schedule (initialCapability=0x1aaa340, task=0x1b36650) at rts/Schedule.c:546
    #14 0x0000000000908bc4 in scheduleWaitThread (tso=0x7e998c0067c8, ret=0x0, pcap=0x7ffdec5b5430) at rts/Schedule.c:2537
    #15 0x000000000091b5a0 in rts_evalLazyIO (cap=0x7ffdec5b5430, p=0x9c11f0, ret=0x0) at rts/RtsAPI.c:530
    #16 0x000000000091ca56 in hs_main (argc=1, argv=0x7ffdec5b5628, main_closure=0x9c11f0, rts_config=...) at rts/RtsMain.c:72
    #17 0x0000000000421ea0 in main ()

In particular, dbl_link_onto() which is used to add a freed block to a doubly-linked free list is not thread safe and corrupts the list when called concurrently.

Note that thread 1 is to blame here as thread 2 is properly taking the spinlock. With this patch we now take the spinlock when freeing a todo block in GC, avoiding this race.

Test Plan:

- Tried slow validate locally: this patch does not introduce new failures.
- circleci: https://circleci.com/gh/ghc/ghc-diffs/283 The test got killed because it took 5 hours but T7919 (which was previously failing on circleci) passed.

Reviewers: simonmar, bgamari, erikd

Reviewed By: simonmar

Subscribers: rwbarton, carter

GHC Trac Issues: #15285

Differential Revision: https://phabricator.haskell.org/D5115

(cherry picked from commit c6fbac6)
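The race described in this commit message can be illustrated with a small, self-contained C sketch. It is not the RTS code: `node`, `link_onto`, `free_node_sync`, and `free_list_length` are hypothetical names, and the spinlock is a plain C11 `atomic_flag` chosen only for illustration. The point is only that the unsynchronised list insertion is wrapped in a lock on every path that frees a node.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a block descriptor on a doubly-linked free list. */
typedef struct node {
    struct node *prev;
    struct node *next;
} node;

static node *free_list_head = NULL;
static atomic_flag free_list_lock = ATOMIC_FLAG_INIT;

/* Busy-wait until the flag is acquired (a minimal spinlock). */
static void spin_lock(void) {
    while (atomic_flag_test_and_set_explicit(&free_list_lock,
                                             memory_order_acquire)) {
        /* spin */
    }
}

static void spin_unlock(void) {
    atomic_flag_clear_explicit(&free_list_lock, memory_order_release);
}

/* Unsynchronised insertion at the head of the list. Two threads running
 * this at the same time can interleave the pointer updates and corrupt
 * the list, which is the kind of race the commit message describes. */
static void link_onto(node *n, node **head) {
    assert(n != NULL);
    n->prev = NULL;
    n->next = *head;
    if (*head != NULL) {
        (*head)->prev = n;
    }
    *head = n;
}

/* Synchronised wrapper: every freeing path goes through the lock. */
void free_node_sync(node *n) {
    spin_lock();
    link_onto(n, &free_list_head);
    spin_unlock();
}

/* Walk the list under the lock and count its nodes. */
size_t free_list_length(void) {
    size_t count = 0;
    spin_lock();
    for (node *n = free_list_head; n != NULL; n = n->next) {
        count++;
    }
    spin_unlock();
    return count;
}
```

In this sketch a caller that invoked `link_onto` directly, without the lock, would play the role of thread 1 in the trace above, while callers of `free_node_sync` behave like thread 2.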
Just opening an issue for something we've known about for a couple of weeks now. Binary sizes are on the order of 100 MB, which is clearly absurd. @dfordivam thinks this is due to a lack of DCE. This problem is present both with the wasm backend and the native unregisterised backend, so maybe this should be an issue on GHC Trac.
As for solutions, it seems reasonable that we could implement some kind of DCE despite the trampolining the C code does. But another idea we had was to switch to the LLVM backend before wasm has tail calls. We could write an implementation for the GHC calling convention (`ghccc`) that lowers calls into a trampoline ABI. If we enable LTO in LLVM, we might get DCE in the linker before lowering out of bitcode, just based on the `ghccc` calls.
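For readers unfamiliar with trampolining, here is a minimal C sketch of the general technique. It is an illustration under stated assumptions, not the code GHC emits: `step`, `code_ptr`, `loop_head`, `loop_body`, `done`, and `sum_to` are all hypothetical names. Each code block returns the address of the next block instead of calling it, and a driver loop performs the jump, so the C stack does not grow without tail calls. Control flows only through function pointers, which may be part of why reachability is harder for a linker to see than with direct calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not GHC's generated code. */
typedef struct step step;
typedef step (*code_ptr)(long *acc, long *n);

/* A code block returns the address of the block to run next. */
struct step {
    code_ptr next;
};

static step loop_body(long *acc, long *n);

/* Final block: returning NULL tells the driver loop to stop. */
static step done(long *acc, long *n) {
    (void)acc;
    (void)n;
    return (step){ NULL };
}

/* Loop test: "jumps" to done or loop_body by returning its address. */
static step loop_head(long *acc, long *n) {
    (void)acc;
    if (*n == 0) {
        return (step){ done };
    }
    return (step){ loop_body };
}

/* Loop body: updates the state, then "jumps" back to the loop test. */
static step loop_body(long *acc, long *n) {
    *acc += *n;
    *n -= 1;
    return (step){ loop_head };
}

/* The trampoline: repeatedly call whatever block was returned last. */
long sum_to(long n) {
    assert(n >= 0);
    long acc = 0;
    code_ptr pc = loop_head;
    while (pc != NULL) {
        pc = pc(&acc, &n).next;
    }
    return acc;
}
```

For example, `sum_to(4)` runs the loop through the trampoline and returns 10, with every transfer of control passing through the `while` loop rather than through a nested call.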