Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alignement fault for graphene_vec3_dot with neon #215

Open
kwizart opened this issue Mar 9, 2021 · 23 comments
Open

Alignement fault for graphene_vec3_dot with neon #215

kwizart opened this issue Mar 9, 2021 · 23 comments

Comments

@kwizart
Copy link

kwizart commented Mar 9, 2021

Experienced behavior

Using graphene at runtime has alignment fault on armv7hl with neon.
Tried using Fedora 33/34 workstation on armhfp with a jetson-tk1.
Build is passing with/without neon enabled in theses distros. Also tests seems to pass in all cases...
(so it looks different than #97 that is related to build issues with others functions).

(*): Shared library is missing debugging information.
#0  0xb590e55c in graphene_vec3_dot () from /lib/libgraphene-1.0.so.0
#1  0xb590c320 in graphene_plane_distance () from /lib/libgraphene-1.0.so.0
#2  0xb590c540 in graphene_frustum_intersects_box () from /lib/libgraphene-1.0.so.0
#3  0xb605fafc in clutter_actor_paint () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#4  0xb605fe14 in clutter_actor_real_paint.lto_priv () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#5  0xb5eaa744 in meta_stage_paint () from /lib/libmutter-8.so.0
#6  0xb60a9834 in clutter_paint_node_paint () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#7  0xb60a984c in clutter_paint_node_paint () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#8  0xb605f5cc in clutter_actor_paint () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#9  0xb60bf84c in clutter_stage_do_paint_view () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#10 0xb5eaa8e4 in meta_stage_paint_view () from /lib/libmutter-8.so.0
#11 0xb60e6d94 in paint_stage.isra () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#12 0xb60dfa04 in clutter_stage_cogl_redraw_view () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#13 0xb5eb3e08 in meta_stage_x11_redraw_view () from /lib/libmutter-8.so.0
#14 0xb60c2868 in handle_frame_clock_frame () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#15 0xb6086bc4 in frame_clock_source_dispatch () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#16 0xb69c5a7c in g_main_context_dispatch () from /lib/libglib-2.0.so.0
#17 0xb6a2078c in g_main_context_iterate.constprop () from /lib/libglib-2.0.so.0
#18 0xb69c4eb4 in g_main_loop_run () from /lib/libglib-2.0.so.0
#19 0xb5ef0ce4 in meta_run () from /lib/libmutter-8.so.0
#20 0x0049198c in main ()
(gdb) disassemble 
Dump of assembler code for function graphene_vec3_dot:
   0xb590e554 <+0>:	vld1.64	{d18-d19}, [r1 :128]
   0xb590e558 <+4>:	vld1.64	{d16-d17}, [r0 :128]
=> 0xb590e55c <+8>:	vmul.f32	q8, q8, q9
   0xb590e560 <+12>:	vpadd.f32	d18, d16, d16
   0xb590e564 <+16>:	vadd.f32	d18, d18, d17
   0xb590e568 <+20>:	vmov.32	r3, d18[0]
   0xb590e56c <+24>:	vmov	s0, r3
   0xb590e570 <+28>:	msr	SP_hyp, lr, lsl pc
End of assembler dump.

Expected behavior

Fedora workstation should work with wayland at runtime on the device.

Steps to reproduce

install a build of graphene with neon support enabled (current workaround is to disable neon support in graphene).

Operating system in use

Reproduced on Fedora 33/34 armhfp (armv7hl)

SIMD implementation in use

ARM neon

@nullr0ute
Copy link

Fedora on ARMv7 distro flags don't enable NEON by default, it should really be a runtime detected optimization as not all arm platforms are guaranteed to have NEON available.

@ebassi
Copy link
Owner

ebassi commented Mar 9, 2021

Run time detection does not work with all the inlining going on, so it's a non starter.

@kwizart
Copy link
Author

kwizart commented Mar 9, 2021

Thanks for your answer.

I'm not sure to understand.:

  • For the Fedora perspective, we cannot assume neon, so it will have to be disabled (unless using hwcap and alternatives build).
  • In the jetson-tk1 case, neon is available. So It should works, but actually fails...

@ebassi
Copy link
Owner

ebassi commented Mar 9, 2021

If NEON support is compiled in, then yes: it should work. I only test it on an RPi 3b+, though, because it's the only ARM device I have available.

The dot3_scalar operator, which is called by the dot3 operator, is implemented as:

float32x4_t __mul = vmulq_f32 (a, b);
float32x2_t __s1 = vpadd_f32 (vget_low_f32 (__m), vget_low_f32 (__m));
float32x2_t __res = vadd_f32 (__s1, vget_high_f32 (__m));
float res = vget_lane_f32 (__res, 0);

@kalev
Copy link

kalev commented Mar 9, 2021

Just a random idea: Could it be an LTO issue? @kwizart, have you tried disabling LTO for graphene and see if it helps?

@MastaG
Copy link

MastaG commented Mar 9, 2021

This bug has been playing me for a long time as well, even in Fedora 33 being unable to start gnome-shell due to this.
I'm running it on a Odroid XU4 which properly supports neon but even building with: -mfpu=neon-vfpv4 which is the one for my platform, it will segfault.

Building without neon fixes it here as well.

@kalev
Copy link

kalev commented Mar 9, 2021

https://bodhi.fedoraproject.org/updates/FEDORA-2021-f000eb2320 (F33) and https://bodhi.fedoraproject.org/updates/FEDORA-2021-cb9771bb01 (F34) disable neon for Fedora (thanks kwizart!) if you want to test and karma the updates.

@MastaG
Copy link

MastaG commented Mar 9, 2021

Thanks!
This will make gnome-shell on F33 and F34 run on armv7 based devices again !

@kwizart
Copy link
Author

kwizart commented Mar 9, 2021

Edit: Removed tests for rpi3/4 as I don't reproduce there on f33 after double checks.

@kalev, I confirm that disabling lto is without effect.

@jeremy-hiatt
Copy link

I have encountered a very similar crash, also on an ARMv7l architecture, albeit within graphene_vec3_init() instead.

Text of alignment trap:

Alignment trap: videosrc_queue: (666) PC=0x660e1a88 Instr=0xf9400aef Address=0x6997a648 FSR 0x811

From GDB:

Thread 18 "videosrc_queue:" received signal SIGBUS, Bus error.
[Switching to Thread 0x5edf3310 (LWP 666)]
0x660e1a88 in graphene_vec3_init (v=v@entry=0x6997a648, x=x@entry=0, y=y@entry=0, z=z@entry=1) at /usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/10.2.0/include/arm_neon.h:10398
10398	/usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/10.2.0/include/arm_neon.h: No such file or directory.
(gdb) bt
#0  0x660e1a88 in graphene_vec3_init (v=v@entry=0x6997a648, x=x@entry=0, y=y@entry=0, z=z@entry=1) at /usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/10.2.0/include/arm_neon.h:10398
#1  0x6611562a in gst_gl_transformation_build_mvp (transformation=0x6997a148) at ../gst-plugins-base-1.16.3/ext/gl/gstgltransformation.c:296
#2  0x6611589c in gst_gl_transformation_set_caps (filter=<optimized out>, incaps=<optimized out>, outcaps=<optimized out>) at ../gst-plugins-base-1.16.3/ext/gl/gstgltransformation.c:487
#3  0x6927e192 in gst_gl_filter_set_caps (bt=0x6997a148, incaps=0x60778590, outcaps=0x60778590) at ../gst-plugins-base-1.16.3/gst-libs/gst/gl/gstglfilter.c:773
#4  0x75737eba in ?? () from /usr/lib/libgstbase-1.0.so.0
(gdb) disas graphene_vec3_init
Dump of assembler code for function graphene_vec3_init:
   0x660e1a58 <+0>:	push	{lr}
   0x660e1a5a <+2>:	movs	r3, #0
   0x660e1a5c <+4>:	sub	sp, #28
   0x660e1a5e <+6>:	ldr	r2, [pc, #64]	; (0x660e1aa0 <graphene_vec3_init+72>)
   0x660e1a60 <+8>:	str	r3, [sp, #16]
   0x660e1a62 <+10>:	add	r3, sp, #4
   0x660e1a64 <+12>:	vstr	s0, [sp, #4]
   0x660e1a68 <+16>:	add	r2, pc
   0x660e1a6a <+18>:	vstr	s1, [sp, #8]
   0x660e1a6e <+22>:	vstr	s2, [sp, #12]
   0x660e1a72 <+26>:	vld1.32	{d16-d17}, [r3]
   0x660e1a76 <+30>:	ldr	r3, [pc, #44]	; (0x660e1aa4 <graphene_vec3_init+76>)
   0x660e1a78 <+32>:	ldr	r3, [r2, r3]
   0x660e1a7a <+34>:	ldr	r2, [pc, #44]	; (0x660e1aa8 <graphene_vec3_init+80>)
   0x660e1a7c <+36>:	ldr	r3, [r3, #0]
   0x660e1a7e <+38>:	str	r3, [sp, #20]
   0x660e1a80 <+40>:	mov.w	r3, #0
   0x660e1a84 <+44>:	ldr	r3, [pc, #28]	; (0x660e1aa4 <graphene_vec3_init+76>)
   0x660e1a86 <+46>:	add	r2, pc
=> 0x660e1a88 <+48>:	vst1.64	{d16-d17}, [r0 :128]
   0x660e1a8c <+52>:	ldr	r3, [r2, r3]
   0x660e1a8e <+54>:	ldr	r2, [r3, #0]
   0x660e1a90 <+56>:	ldr	r3, [sp, #20]
   0x660e1a92 <+58>:	eors	r2, r3
   0x660e1a94 <+60>:	bne.n	0x660e1a9c <graphene_vec3_init+68>
   0x660e1a96 <+62>:	add	sp, #28
   0x660e1a98 <+64>:	ldr.w	pc, [sp], #4
   0x660e1a9c <+68>:	blx	0x660d69a8 <__stack_chk_fail@plt>
   0x660e1aa0 <+72>:	adds	r4, #188	; 0xbc
   0x660e1aa2 <+74>:	movs	r1, r0
   0x660e1aa4 <+76>:	lsls	r4, r1, #2
   0x660e1aa6 <+78>:	movs	r0, r0
   0x660e1aa8 <+80>:	adds	r4, #158	; 0x9e
   0x660e1aaa <+82>:	movs	r1, r0
End of assembler dump.
(gdb) info registers
r0             0x6997a648          1771546184
r1             0x60778590          1618445712
r2             0x660f4f28          1712279336
r3             0x8c                140
r4             0x6997a148          1771544904
r5             0x5edf1740          1591678784
r6             0x6997a488          1771545736
r7             0x6997a508          1771545864
r8             0x6997a4c8          1771545800
r9             0x5edf1720          1591678752
r10            0x6997a648          1771546184
r11            0x5edf1780          1591678848
r12            0x6613ab20          1712565024
sp             0x5edf16f0          0x5edf16f0
lr             0x6611562b          1712412203
pc             0x660e1a88          0x660e1a88 <graphene_vec3_init+48>
cpsr           0x600b0030          1611333680
fpscr          0x28000013          671088659

If my read is correct, it looks like the code within graphene_vec3_init() is assuming 16-byte alignment for the value member within v, but as we see from the address (0x6997a648) it's actually only aligned to an 8-byte boundary. The function is invoked from here in GStreamer.

Happy to file as a separate issue instead.

@MastaG
Copy link

MastaG commented Mar 10, 2021

I can also confirm @jeremy-hiatt segfault when the fuction is called from gstreamer.
But this one also happens when building without neon.

@MastaG
Copy link

MastaG commented Mar 10, 2021

Apologies, the error @jeremy-hiatt was referring to, isn't happening when graphene has been built without neon.

@ebassi
Copy link
Owner

ebassi commented Mar 10, 2021

In 1.10.5, released on February 9, I landed a couple of changes to the alignment annotations; looking at the links on Bodhi that @kalev posted, it seems F33 and F34 still use 1.10.4. Would it be possible for people on ARM to test 1.10.5?

@kalev
Copy link

kalev commented Mar 10, 2021

@ebassi
Copy link
Owner

ebassi commented Mar 10, 2021

@kalev You're absolutely right—I got confused with the post-release version bump.

So I'm really confused as to why there are alignment issues, considering that the structures are marked for alignment as required by vectorised types.

@jeremy-hiatt
Copy link

jeremy-hiatt commented Mar 10, 2021

@ebassi One thing I'll admit I don't understand very well is how the alignment annotations are expected to interact with struct nesting. The offending graphene_vec3_t instance is declared as the camera_position member within the GstGLTransformation type here. We can see from the stack trace I posted that the base address for that structure was 0x6997a148; i.e. the camera_position was 16-byte aligned with respect to its parent (offset of 0x500), but that allocation wasn't on a 16-byte alignment, therefore pushing camera_position off alignment as well.

The part I'm really fuzzy on is whether there's supposed to be anything within GStreamer or GLib in general that's smart enough to allocate heap memory with the correct alignment for this type, and if so, why it's not working here.

@jeremy-hiatt
Copy link

jeremy-hiatt commented Mar 12, 2021

I tested with 1.10.4 (was previously using 1.10.2) but no difference. I checked the reported alignment via G_ALIGNOF(GstGLTransformation) and it looks like it's getting calcluated correctly, since it's returning a value of 16. Looking through the GLib code though, I can't find any evidence that it cares at all about this alignment value when it comes to heap allocations made via g_object_new()/g_type_create_instance().

I posted a pretty minimal example that triggers the alignment fault reliably on my board. Perhaps @ebassi you can try this on your RPi to see if it works for you?

@doraskayo
Copy link
Contributor

doraskayo commented Oct 28, 2021

There are no special alignment guarantees for heap-allocated memory via malloc(3), the compiler can only make such guarantees for stack allocations.

Structures that may be placed on the heap must be able to manage their own alignment by padding themselves and applying an offset as part of their API according to their given address.

This indeed complictes things, like when such structures are manually copied from one heap-allocated address to another where they require a different alignment. It is advised to provide a "copy" API that properly realigns the content of the structure to the destination address.

There may be other tricks to deal with this requirement, but I'm personally not familiar with them.

@Bastian-Krause
Copy link

I run into the same alignment trap like @jeremy-hiatt, tested with graphene 1.9.2 as well as 1.10.6 on an ARM Cortex A9.

I posted a pretty minimal example that triggers the alignment fault reliably on my board.

Yes, same here.

@Bastian-Krause
Copy link

After investigating the issue pointed out by @jeremy-hiatt further, I am now of the opinion that gstreamer should pass 16-byte-aligned memory to graphene. To do this, I created this gstreamer merge request gltransformation: pass 16-byte-aligned memory to graphene. Feedback welcome!

@ericwoud
Copy link

ericwoud commented Jan 20, 2022

I'm also getting alignment faults in more then one function. I'm trying to get archlinuxarm gnome running on a rk3288 box.

If you need more info then tell me what I can send.

@knuxify
Copy link

knuxify commented Apr 7, 2022

Seems like a related issue is causing problems with GTK4 (upstream issue); there we're getting SIGBUS errors on Alpine Linux on armv7 and armhf, and the exact function that crashes is graphene_vec4_init_from_vec4 (called from gsk_color_matrix_node_new) - disabling neon fixes it. Does GTK need a similar patch to what was done with GStreamer?

@ebassi
Copy link
Owner

ebassi commented Apr 7, 2022

No, we're not going to allocate pointers to graphene structures in GTK.

The main problem is alignment of GObject: https://gitlab.gnome.org/GNOME/glib/-/issues/1231

junzhuimx pushed a commit to nxp-imx/gst-plugins-base that referenced this issue Dec 6, 2022
With NEON instructions enabled, graphene expects the memory passed to it
16-byte-aligned. Otherwise unaligned memory access faults occur causing
SIGBUS signals.

graphene has alloc functions for its structures that take care of this,
so use them.

See also: ebassi/graphene#215 (comment)

Suggested-by: Sebastian Dröge <sebastian@centricular.com>
Signed-off-by: Bastian Krause <bst@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/2128>
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Feb 8, 2023
This disables neon support on arm devices only because it crashes otherwise.

Upstream-status: Reported [ebassi/graphene#215]
Signed-off-by: Khem Raj <raj.khem@gmail.com>
kraj pushed a commit to YoeDistro/poky that referenced this issue Feb 13, 2023
Not all arm platforms support neon and runtime detection for this feature is
currently not reliable.
Disable neon support by default on ARM-32 platforms because of the
following upstream bug: ebassi/graphene#215

Enable neon for aarch64 by default

(From OE-Core rev: c29a1a2442a22f87913fad24f19eaa45ddb13e23)

Signed-off-by: Markus Volk <f_l_k@t-online.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
kraj pushed a commit to YoeDistro/poky that referenced this issue Feb 14, 2023
Not all arm platforms support neon and runtime detection for this feature is
currently not reliable.
Disable neon support by default on ARM-32 platforms because of the
following upstream bug: ebassi/graphene#215

Enable neon for aarch64 by default

(From OE-Core rev: 4225291418142a61d760b7d317eff47752c41328)

Signed-off-by: Markus Volk <f_l_k@t-online.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
kraj pushed a commit to YoeDistro/poky that referenced this issue Feb 14, 2023
Not all arm platforms support neon and runtime detection for this feature is
currently not reliable.
Disable neon support by default on ARM-32 platforms because of the
following upstream bug: ebassi/graphene#215

Enable neon for aarch64 by default

(From OE-Core rev: 4225291418142a61d760b7d317eff47752c41328)

Signed-off-by: Markus Volk <f_l_k@t-online.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
rpurdie pushed a commit to yoctoproject/poky that referenced this issue Feb 15, 2023
Not all arm platforms support neon and runtime detection for this feature is
currently not reliable.
Disable neon support by default on ARM-32 platforms because of the
following upstream bug: ebassi/graphene#215

Enable neon for aarch64 by default

(From OE-Core rev: 72778f6a647f47926c6ba1b77f0984999a22e44a)

Signed-off-by: Markus Volk <f_l_k@t-online.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
halstead pushed a commit to openembedded/openembedded-core that referenced this issue Feb 15, 2023
Not all arm platforms support neon and runtime detection for this feature is
currently not reliable.
Disable neon support by default on ARM-32 platforms because of the
following upstream bug: ebassi/graphene#215

Enable neon for aarch64 by default

Signed-off-by: Markus Volk <f_l_k@t-online.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
junzhuimx pushed a commit to nxp-imx/gst-plugins-base that referenced this issue Jun 20, 2023
With NEON instructions enabled, graphene expects the memory passed to it
16-byte-aligned. Otherwise unaligned memory access faults occur causing
SIGBUS signals.

graphene has alloc functions for its structures that take care of this,
so use them.

See also: ebassi/graphene#215 (comment)

Suggested-by: Sebastian Dröge <sebastian@centricular.com>
Signed-off-by: Bastian Krause <bst@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/1321>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants