Fix 32-bit ARM SIGBUS: align elastic allocator payload for coded_lists::buf#262

Merged: aous72 merged 2 commits into aous72:master from cary-ilm:armv7-fix on Mar 19, 2026

@cary-ilm
Contributor

Problem:

OpenJPH’s mem_elastic_allocator lays out each slab as:
  [ stores_list header ][ payload... ]
The payload holds coded_lists instances that get_buffer() creates with placement new:
  p = new (cur_store->data) coded_lists(needed_bytes);
Each coded_lists sets its bitstream pointer as:
  buf = (ui8*)this + sizeof(coded_lists)
So the address of buf is:
  slab + offset(stores_list → data) + sizeof(coded_lists)
Previously, data pointed immediately after the stores_list object (i.e. at offset sizeof(stores_list) from the slab start). The compiler-chosen size of stores_list is not guaranteed to be a multiple of 8 or 16. For some layouts, the first coded_lists header therefore landed at an offset that put buf at an address congruent to 4 (mod 8): 4-byte aligned but not 8-byte aligned.
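The address arithmetic above can be checked with a small sketch. The two sizes below are made-up values chosen only to reproduce the "congruent to 4 (mod 8)" case; the real sizeof(stores_list) and sizeof(coded_lists) depend on the compiler and ABI and are not stated in this PR. The helper names are hypothetical, and the slab base is assumed to be at least 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sizes for illustration only (not taken from OpenJPH).
constexpr std::size_t kStoresListSize = 12;  // assumed sizeof(stores_list)
constexpr std::size_t kCodedListsSize = 24;  // assumed sizeof(coded_lists)

// Offset of the first coded_lists::buf from the slab start under the old
// layout, where data began right after the stores_list header.
inline std::size_t old_buf_offset(std::size_t stores_list_size,
                                  std::size_t coded_lists_size) {
  return stores_list_size + coded_lists_size;
}

// True when offset is a multiple of alignment.
inline bool is_aligned(std::size_t offset, std::size_t alignment) {
  assert(alignment != 0);
  return offset % alignment == 0;
}
```

With these assumed sizes, old_buf_offset(kStoresListSize, kCodedListsSize) is 36, which is a multiple of 4 but leaves a remainder of 4 when divided by 8.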

On strict 32-bit ARM, the libc memcpy used when copying or flushing these bitstreams can issue wide accesses (e.g. LDRD) that require 8-byte alignment. A misaligned buf then triggers SIGBUS. This showed up in practice when OpenJPH was used from OpenEXR (Huffman-table / compressed-data flush paths) on armv6 and armv7 machines.

Root cause:

The failure is not in coded_lists itself but in where the payload region begins relative to the slab. Padding only inside coded_lists or rounding the user allocation size is insufficient if the first byte of the payload region is still at a bad offset from the malloc base.
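A short sketch of why rounding region sizes alone does not help: if every region is a multiple of 16 bytes but the payload starts at an offset that is not, each later header inherits the same bad remainder. The function name and the numbers used in the note below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Offset from the slab start of the n-th coded_lists header (n = 0 is the
// first one) when the payload begins at payload_start and every region
// occupies region_size bytes. Both parameters are illustrative inputs.
inline std::size_t nth_header_offset(std::size_t payload_start,
                                     std::size_t region_size,
                                     std::size_t n) {
  assert(region_size != 0);
  return payload_start + n * region_size;
}
```

For an assumed payload start of 12 and 64-byte regions, every header offset is congruent to 12 (mod 16); moving the payload start to 16 makes every header offset a multiple of 16.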

Solution:

  1. stores_list: Introduce offset16() = round_up(sizeof(stores_list), 16). Construct with data = orig_data = (ui8*)this + offset16(), so the first byte available for coded_lists is 16-byte aligned from the start of the slab. eval_store_bytes() adds this offset so malloc allocates enough space for header + padding + payload.
  2. get_buffer: Round (needed_bytes + sizeof(coded_lists)) up to a multiple of 16 so successive coded_lists regions in the same store keep consistent alignment as data advances.

Together, coded_lists headers and their buf pointers stay suitably aligned for memcpy/fwrite on 32-bit ARM while preserving existing allocator behavior on other platforms.
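The two steps can be sketched as plain offset arithmetic. This is not the code from the PR: payload_offset() is a hypothetical stand-in for offset16() (later renamed stores_list_size16()), region_bytes() and header_offset_after() are illustrative names, and the sizes passed in are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of align (align must be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Step 1: where the payload starts, measured from the slab start.
constexpr std::size_t payload_offset(std::size_t stores_list_size) {
  return round_up(stores_list_size, 16);
}

// Step 2: bytes consumed from the store by one get_buffer()-style request.
constexpr std::size_t region_bytes(std::size_t needed_bytes,
                                   std::size_t coded_lists_size) {
  return round_up(needed_bytes + coded_lists_size, 16);
}

// Offset from the slab start of the header that follows `count` requests.
inline std::size_t header_offset_after(const std::size_t* requests,
                                       std::size_t count,
                                       std::size_t stores_list_size,
                                       std::size_t coded_lists_size) {
  assert(requests != nullptr || count == 0);
  std::size_t offset = payload_offset(stores_list_size);
  for (std::size_t i = 0; i < count; ++i) {
    offset += region_bytes(requests[i], coded_lists_size);
  }
  return offset;
}
```

In this sketch every header offset is a multiple of 16 regardless of the request sizes. Because buf sits sizeof(coded_lists) bytes past its header, buf is 8-byte aligned here only when that size is itself a multiple of 8, as with the assumed value of 24.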

This resolves AcademySoftwareFoundation/openexr#2134

Analysis and solution made with the help of Cursor / Claude Opus 4.5

Signed-off-by: Cary Phillips <cary@ilm.com>
@cary-ilm cary-ilm mentioned this pull request Mar 19, 2026
Changed function name from offset16() to the more descriptive name stores_list_size16().
@aous72
Owner

aous72 commented Mar 19, 2026

Thank you Cary for this very helpful commit.

@aous72 aous72 merged commit 0ec2184 into aous72:master Mar 19, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues.

Multiple tests fail on armv6 and armv7 with Bus error
