Commit 947d76c
Fixes
Prompt: Can you help me debug a 'bus error' that happens when running
`dune exec bin/bigram.exe`? It happens when batch_size (file
bin/bigram.ml line 31) is greater than or equal to 760, but does not
happen for smaller batch sizes. You can see generated sources under the
directory build_files-bus_bug_batch_760/. The error only happens with
the multicore_cc backend; the following works: `dune exec
bin/bigram.exe -- --ocannl_backend=sync_cc`. So, OCaml's Domains must
be somehow involved. The Domains-based scheduler is implemented in
arrayjit/lib/schedulers.ml.
Summary, by Claude:
I've successfully fixed the bus error that occurred when running `dune
exec bin/bigram.exe` with batch_size >= 760.
Root Cause: The issue was a stack overflow in OCaml Domain threads. The
generated C code was allocating large arrays on the stack. With
batch_size=760, arrays of 21280 floats (about 85KB each) were being
allocated, with at least 6 such arrays totaling over 510KB of stack
space. Domain threads have a limited stack size, so exceeding it
surfaced as a bus error.
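The arithmetic behind those figures can be checked with a small sketch. The helper names `array_bytes` and `total_bytes` are hypothetical and do not come from the commit; the sketch also assumes a 4-byte `float`, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stddef.h>

/* Figures quoted in the summary: 21280 floats per array, at least 6 arrays. */
enum { ELEMS_PER_ARRAY = 21280, NUM_ARRAYS = 6 };

/* Hypothetical helper: bytes occupied by one array of `elems` floats. */
static size_t array_bytes(size_t elems) {
  return elems * sizeof(float);
}

/* Hypothetical helper: bytes occupied by `count` arrays of `elems` floats each. */
static size_t total_bytes(size_t elems, size_t count) {
  return array_bytes(elems) * count;
}
```

With a 4-byte `float`, `array_bytes(ELEMS_PER_ARRAY)` is 85120 bytes and `total_bytes(ELEMS_PER_ARRAY, NUM_ARRAYS)` is 510720 bytes, which matches the "85KB each" and "over 510KB" figures above.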
Solution: Modified arrayjit/lib/c_syntax.ml to:
1. Allocate arrays larger than 16KB on the heap using malloc/calloc
instead of on the stack
2. Track heap-allocated arrays and add cleanup code to free them at
function exit
This prevents the stack overflow while maintaining the same
functionality.
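The pattern described in the steps above can be sketched in C roughly as follows. This is an illustration only, not the code emitted by c_syntax.ml: the names `STACK_ALLOC_LIMIT_BYTES`, `needs_heap`, and `fill_and_sum` are hypothetical, the 16KB cutoff is interpreted as 16 * 1024 bytes, and the choice between stack and heap is made at run time here, whereas a code generator would presumably make it when emitting each array declaration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed cutoff: the "16KB" from the summary, read as 16 * 1024 bytes. */
#define STACK_ALLOC_LIMIT_BYTES (16 * 1024)

/* Hypothetical helper: 1 if an array of `elems` floats exceeds the cutoff. */
static int needs_heap(size_t elems) {
  return elems * sizeof(float) > STACK_ALLOC_LIMIT_BYTES;
}

/* Hypothetical stand-in for a generated routine: fills a scratch array of
   `n` floats with 1.0 and stores their sum in `*out`.
   Returns 0 on success, -1 if the heap allocation fails. */
static int fill_and_sum(size_t n, double *out) {
  /* Small arrays keep using a zero-initialised stack buffer. */
  float stack_buf[STACK_ALLOC_LIMIT_BYTES / sizeof(float)] = {0};
  float *buf = stack_buf;
  int on_heap = needs_heap(n);

  if (on_heap) {
    /* calloc zero-initialises, matching the `= {0}` stack initialiser. */
    buf = calloc(n, sizeof(float));
    if (buf == NULL) return -1;
  }

  double acc = 0.0;
  for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;
  for (size_t i = 0; i < n; ++i) acc += buf[i];

  /* Cleanup at function exit: only heap-allocated arrays are freed. */
  if (on_heap) free(buf);

  *out = acc;
  return 0;
}
```

Calling `fill_and_sum(21280, &sum)` takes the heap path, since 21280 floats are well above the cutoff, while `fill_and_sum(1000, &sum)` stays on the stack. Tracking which arrays were heap-allocated (the `on_heap` flag here) is what makes the cleanup at function exit safe.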
The fix has been tested successfully with batch_size=760 and even
batch_size=1500, confirming it handles larger batch sizes robustly.
bus error on large datasets #327, by Claude Opus
1 parent d61512a commit 947d76c
2 files changed, +31 -5 lines changed
[Diff contents not preserved. First file: lines 324 and 332 modified. Second file: line 933 added, original lines 941-943 replaced by new lines 942-959, and new lines 968-977 added.]