The BSS clear loop in init_c_runtime overshoots __bss_end by 1-32 bytes
due to dbra semantics plus missing tail handling:

    movea.l #__bss_start_in_ram, %a3
    move.l  #__bss_end, %d0
    sub.l   #__bss_start, %d0
    lsr.w   #5, %d0                      /* d0 = size / 32 */
.Lcopybss:
    movem.l %d2-%d7/%a0-%a1, (%a3)       /* writes 32 bytes */
    lea.l   0x20(%a3), %a3
    dbra    %d0, .Lcopybss

dbra decrements its counter and branches back until the low word of %d0
reaches -1, so a loop entered with %d0 = N runs the body N+1 times. When
size is a multiple of 32, the extra pass writes a full 32 bytes past
__bss_end: with size = 64 (initial %d0 = 2) the loop writes 3 blocks =
96 bytes. When size mod 32 != 0, lsr.w rounds the block count down, the
extra pass covers the tail, and the last block overshoots by
32 - (size mod 32) bytes. Either way the overshoot is at least 1 byte and
at most 32.
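
The arithmetic above can be checked on a host machine with a small C model of the loop. This is only a sketch: the function name `bss_clear_bytes_written` is hypothetical and not part of ngdevkit, and it assumes the BSS size fits in 16 bits, as the `lsr.w`/`dbra` pair already does.

```c
#include <stdint.h>

/* Host-side model of the current BSS clear loop (hypothetical helper, not
 * ngdevkit code). Returns how many bytes the loop writes for a given size. */
static uint32_t bss_clear_bytes_written(uint16_t size)
{
    uint16_t d0 = size >> 5;        /* lsr.w #5, %d0 : d0 = size / 32 */
    uint32_t written = 0;

    do {
        written += 32;              /* movem.l of 8 registers = 32 bytes */
        d0--;                       /* dbra: decrement the 16-bit counter ... */
    } while (d0 != 0xFFFF);         /* ... and loop unless it is now -1 */

    return written;
}
```

Under this model, `bss_clear_bytes_written(64)` returns 96 (overshoot 32) and `bss_clear_bytes_written(100)` returns 128 (overshoot 28 = 32 - 100 mod 32).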
Today this is masked by the RAM layout in runtime/ngdevkit.ld:

    __data_start_in_ram = (__bss_start_in_ram + SIZEOF(.bss) + 3) / 4 * 4;

.data is placed immediately after .bss in RAM, so the BSS overshoot
lands on the head of .data and is rewritten by the data-copy loop a few
lines down. Any future change putting something else past BSS (debug pads,
malloc heap, ngdevkit-managed scratch) would expose silent corruption at
every boot.
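
To see how much of the overshoot lands in .data under this layout, the linker expression can be modelled in C. This is a rough sketch, not ngdevkit code: `align4` and `overshoot_bytes_in_data` are hypothetical names, and the start address passed in is whatever example value the caller chooses.

```c
#include <stdint.h>

/* Mirrors the linker expression (x + 3) / 4 * 4: round up to a multiple of 4. */
static uint32_t align4(uint32_t addr)
{
    return (addr + 3) / 4 * 4;
}

/* Number of bytes the current clear loop writes at or beyond
 * __data_start_in_ram. Both arguments are caller-supplied example values. */
static uint32_t overshoot_bytes_in_data(uint32_t bss_start_in_ram,
                                        uint16_t bss_size)
{
    uint32_t bss_end    = bss_start_in_ram + bss_size;
    uint32_t data_start = align4(bss_end);
    /* The loop writes (size / 32) + 1 blocks of 32 bytes. */
    uint32_t clear_end  = bss_start_in_ram
                        + ((uint32_t)(bss_size >> 5) + 1) * 32;

    return clear_end > data_start ? clear_end - data_start : 0;
}
```

With an example start address of 0x100000, a 64-byte BSS gives 32 clobbered bytes at the head of .data, while a 30-byte BSS gives 0 because the 2-byte overshoot falls entirely in the alignment padding.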
The data-copy loop right below, at crt0.S:215-229, already handles both
the iteration count and the tail bytes correctly and is the obvious
template:

    move.w  %d0, %d1
    lsr.w   #5, %d0
    andi.w  #0x001F, %d1             /* tail byte count */
    bra     .Lcopydata_begin         /* pre-branch into the dbra test */
.Lcopydata:
    movem.l (%a2)+, %d2-%d7/%a0-%a1
    movem.l %d2-%d7/%a0-%a1, (%a3)
    lea.l   0x20(%a3), %a3
.Lcopydata_begin:
    dbra    %d0, .Lcopydata
    bra     .Lcopylastdata_begin
.Lcopylastdata:
    move.b  (%a2)+, (%a3)+
.Lcopylastdata_begin:
    dbra    %d1, .Lcopylastdata

Suggested fix - mirror the same shape for the BSS clear:

    movea.l #__bss_start_in_ram, %a3
    move.l  #__bss_end, %d0
    sub.l   #__bss_start, %d0
    move.w  %d0, %d1
    lsr.w   #5, %d0                  /* number of full 32-byte blocks */
    andi.w  #0x001F, %d1             /* tail byte count */
    bra     .Lcopybss_begin          /* test the counter before the first block */
.Lcopybss:
    movem.l %d2-%d7/%a0-%a1, (%a3)
    lea.l   0x20(%a3), %a3
.Lcopybss_begin:
    dbra    %d0, .Lcopybss
    bra     .Lcopylastbss_begin
.Lcopylastbss:
    clr.b   (%a3)+                   /* clear the remaining bytes one at a time */
.Lcopylastbss_begin:
    dbra    %d1, .Lcopylastbss

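
As a sanity check on the proposed shape, the same kind of host-side C model can confirm that the byte count now matches the region size exactly. Again this is a sketch under assumptions: `bss_clear_fixed_bytes_written` is a hypothetical name, and the size is assumed to fit in 16 bits.

```c
#include <stdint.h>

/* Host-side model of the proposed loop shape (hypothetical helper, not
 * ngdevkit code). Returns how many bytes get cleared for a given size. */
static uint32_t bss_clear_fixed_bytes_written(uint16_t size)
{
    uint16_t d0 = size >> 5;        /* lsr.w  #5, %d0      : full blocks */
    uint16_t d1 = size & 0x001F;    /* andi.w #0x001F, %d1 : tail bytes */
    uint32_t written = 0;

    /* bra .Lcopybss_begin enters at the dbra, so the counter is decremented
     * and tested before the first block is written. */
    while (--d0 != 0xFFFF)
        written += 32;              /* movem.l writes one 32-byte block */

    /* Same pre-branch shape for the byte-wise tail loop. */
    while (--d1 != 0xFFFF)
        written += 1;               /* clr.b (%a3)+ */

    return written;
}
```

In this model the result equals `size` for every input, for example 64 for a 64-byte region and 100 for a 100-byte region, so nothing is written past `__bss_end`.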
Repro:
Paper trace - enter .Lcopybss with %d0 = 2 (64-byte region).
After iter 1 %d0 = 1, iter 2 %d0 = 0, iter 3 %d0 = -1 and the loop
falls through. Three iterations, 96 bytes written.
Or set a watchpoint in MAME/gdb at __bss_end and run any project where
(SIZEOF(.bss) & 0x1F) != 0; observe the writes past it on boot.
Reference: ngdevkit/runtime/ngdevkit-crt0.S, line 138 in d65da8b