Why does TEST(F16_VCMUL__NEONFP16ARITH_U8, batch_lt_8) fail with a Signal 7 error on the armv7a platform? #6201

Open
JamesWang2007 opened this issue Mar 21, 2024 · 3 comments


@JamesWang2007

Hello, on the armv7a platform, several test cases in f16-vcmul-test crash with a Signal 7 error, but I am not sure whether this is a bug. Please help.
TEST(F16_VCMUL__NEONFP16ARITH_U8, batch_lt_8)
TEST(F16_VCMUL__NEONFP16ARITH_U8, batch_gt_8)
TEST(F16_VCMUL__NEONFP16ARITH_U8, inplace_a)
TEST(F16_VCMUL__NEONFP16ARITH_U8, inplace_b)
TEST(F16_VCMUL__NEONFP16ARITH_U8, inplace_a_and_b)
TEST(F16_VCMUL__NEONFP16ARITH_U16, batch_lt_16)
TEST(F16_VCMUL__NEONFP16ARITH_U16, batch_gt_16)
TEST(F16_VCMUL__NEONFP16ARITH_U16, inplace_a)
TEST(F16_VCMUL__NEONFP16ARITH_U16, inplace_b)
TEST(F16_VCMUL__NEONFP16ARITH_U16, inplace_a_and_b)
TEST(F16_VCMUL__NEONFP16ARITH_U32, batch_lt_32)
TEST(F16_VCMUL__NEONFP16ARITH_U32, batch_gt_32)
TEST(F16_VCMUL__NEONFP16ARITH_U32, inplace_a)
TEST(F16_VCMUL__NEONFP16ARITH_U32, inplace_b)
TEST(F16_VCMUL__NEONFP16ARITH_U32, inplace_a_and_b)

Log analysis: In the XNNPACK source, xnn_f16_vcmul_ukernel__neonfp16arith_u8 casts a (uint16_t *) address to (uint32_t *). That cast requires a 4-byte-aligned address (the memory address must be a multiple of 4); otherwise the access is misaligned, causing a crash and an immediate exit with a Signal 7 error.
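For illustration, here is a minimal standalone sketch (hypothetical helper names, not the actual XNNPACK kernel) of the two cast forms. The (uint32_t*) cast promises 4-byte alignment, so the compiler may emit a vst1.32 store with an alignment hint that traps when the address is only 2-byte aligned; casting through (void*) keeps the assumed alignment at that of uint16_t.

```c
// Sketch only; build for armv7a with NEON (e.g. -march=armv7-a -mfpu=neon).
#include <arm_neon.h>
#include <stdint.h>

// BUGGY form: the (uint32_t*) cast promises 4-byte alignment, so the compiler
// may emit "vst1.32 {d0[0]}, [rN:32]"; the ":32" hint raises SIGBUS (Signal 7)
// if `out` is only 2-byte aligned.
void store2_hinted(uint16_t* out, uint32x2_t v) {
  vst1_lane_u32((uint32_t*) out, v, 0);
}

// FIXED form: casting through (void*) keeps the known alignment at 2 bytes,
// so the store is emitted without the alignment hint and works for any
// 2-byte-aligned address.
void store2_unhinted(uint16_t* out, uint32x2_t v) {
  vst1_lane_u32((void*) out, v, 0);
}
```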

1. Related source file paths:
XNNPACK/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u8.c
XNNPACK/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u16.c
XNNPACK/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u32.c

2. Declarations of the failing functions:
void xnn_f16_vcmul_ukernel__neonfp16arith_u8(
    size_t batch,
    const void* input_a,
    const void* input_b,
    void* output,
    const union xnn_f16_default_params params[restrict XNN_MIN_ELEMENTS(1)]) XNN_OOB_READS

void xnn_f16_vcmul_ukernel__neonfp16arith_u16(
    size_t batch,
    const void* input_a,
    const void* input_b,
    void* output,
    const union xnn_f16_default_params params[restrict XNN_MIN_ELEMENTS(1)]) XNN_OOB_READS

void xnn_f16_vcmul_ukernel__neonfp16arith_u32(
    size_t batch,
    const void* input_a,
    const void* input_b,
    void* output,
    const union xnn_f16_default_params params[restrict XNN_MIN_ELEMENTS(1)]) XNN_OOB_READS

3. Call logic for the failing functions (from the microkernel tester):
// Call optimized micro-kernel (f16 test, batch given in bytes of uint16_t).
vcmul(batch_size() * sizeof(uint16_t), a_data, b_data, y.data(), init_params != nullptr ? &params : nullptr);

// Call optimized micro-kernel (f32 test, batch given in bytes of float).
vcmul(batch_size() * sizeof(float), a_data, b_data, y.data(), init_params != nullptr ? &params : nullptr);
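For context, a minimal repro sketch of how the f16 test can hand the microkernel a 2-byte-aligned pointer. This is a hypothetical driver, not the gtest fixture, and it assumes the split real/imaginary layout used by the vcmul microkernels, where the imaginary halves start `batch` bytes after each base pointer.

```c
// Hypothetical repro driver (not the actual test).
#include <stddef.h>
#include <stdint.h>

union xnn_f16_default_params;  // opaque here; defined in XNNPACK headers

extern void xnn_f16_vcmul_ukernel__neonfp16arith_u8(
    size_t batch, const void* input_a, const void* input_b, void* output,
    const union xnn_f16_default_params* params);

int main(void) {
  // 3 complex elements -> batch == 3 * sizeof(uint16_t) == 6 bytes,
  // matching one of the sizes exercised by batch_lt_8.
  const size_t batch = 3 * sizeof(uint16_t);

  // Each buffer holds a real half plus an imaginary half (extra slack for
  // XNN_OOB_READS). With a 4-byte-aligned base, the imaginary half begins at
  // base + 6 bytes, i.e. an address that is only 2-byte aligned.
  _Alignas(8) uint16_t a[16] = {0}, b[16] = {0}, y[16] = {0};

  // Before the fix, the 2-element tail store through (uint32_t*) on the
  // imaginary output pointer could raise SIGBUS (Signal 7) here on armv7a.
  xnn_f16_vcmul_ukernel__neonfp16arith_u8(batch, a, b, y, NULL);
  return 0;
}
```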

@JamesWang2007
Author

JamesWang2007 commented Mar 21, 2024

I changed the cast to (void *) and the test cases now run normally. So is this really a bug?


@alankelly
Collaborator

Hi, thanks for reporting this. Fix incoming

@fbarchard
Contributor

fbarchard commented Mar 27, 2024

Thanks for catching the alignment issue

Was vst1_lane_u32((uint32_t*) or, vreinterpret_u32_f16(vaccr_lo), 0); or += 2;
   9d740: 04 00 10 e3    tst  r0, #4
   9d744: 03 00 00 0a    beq  0x9d758 <xnn_f16_vcmul_ukernel__neonfp16arith_u8+0xe4> @ imm = #12
   9d748: 3d 38 c3 f4    vst1.32  {d19[0]}, [r3:32]!
   9d74c: a3 34 f3 f2    vext.32  d19, d19, d19, #1
   9d750: 3d 18 cc f4    vst1.32  {d17[0]}, [r12:32]!
   9d754: a1 14 f1 f2    vext.32  d17, d17, d17, #1
   9d758: 02 00 10 e3    tst  r0, #2
   9d75c: 10 80 bd 08    popeq  {r4, pc}

Now vst1_lane_u32((void*) or, vreinterpret_u32_f16(vaccr_lo), 0); or += 2;
   9d740: 04 00 10 e3    tst  r0, #4
   9d744: 03 00 00 0a    beq  0x9d758 <xnn_f16_vcmul_ukernel__neonfp16arith_u8+0xe4> @ imm = #12
   9d748: 0d 38 c3 f4    vst1.32  {d19[0]}, [r3]!
   9d74c: a3 34 f3 f2    vext.32  d19, d19, d19, #1
   9d750: 0d 18 cc f4    vst1.32  {d17[0]}, [r12]!
   9d754: a1 14 f1 f2    vext.32  d17, d17, d17, #1
   9d758: 02 00 10 e3    tst  r0, #2
   9d75c: 10 80 bd 08    popeq  {r4, pc}
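For reference, a memcpy-based store is another common way to express an alignment-agnostic 4-byte store in portable C. This is just an alternative sketch with a made-up helper name, not what the fix does; the fix keeps the intrinsic and casts through (void*).

```c
#include <arm_neon.h>
#include <stdint.h>
#include <string.h>

// Alternative sketch: extract the 32-bit lane and store it with memcpy, so the
// compiler never assumes more alignment than the destination pointer provides.
void store2_memcpy(uint16_t* out, uint32x2_t v) {
  const uint32_t bits = vget_lane_u32(v, 0);
  memcpy(out, &bits, sizeof(bits));
}
```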
