imxvpudec_h264 incompatible with glupload? #316

Closed
Talkless opened this issue Jun 8, 2023 · 29 comments

Comments

@Talkless

Talkless commented Jun 8, 2023

Hi,

I'm trying to port our Qt application, which displays an RTSP stream using qmlglsink, to an imx8mm device.

So far I've managed to cross-compile GStreamer 1.22.3 with gstreamer-imx 2.0.0, libimxdmabuffer 1.0.1 and libimxvpuapi2 2.1.2 (I can't use more recent libimx* versions due to Freescale/libimxdmabuffer#7), but either I get caps not accepted:

0:00:08.390234700 10262 0xffff58000f40 WARN                GST_CAPS gstpad.c:5787:pre_eventfunc_check:<queue1:sink> caps video/x-h264, stream-format=(string)byte-stream, alignment=(string)au, width=(int)704, height=(int)576, framerate=(fraction)0/1, coded-picture-structure=(string)frame, chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8, colorimetry=(string)1:3:5:1, parsed=(boolean)true, profile=(string)high, level=(string)3 not accepted

for pipeline:

rtspsrc location=rtsp://... protocols=udp latency=100 ! queue name=q1 ! rtph264depay ! h264parse ! queue ! imxvpudec_h264 ! glupload ! glcolorconvert ! qmlglsink

Or I do get video displayed, but at barely 4fps, if I link imxvpudec_h264 to glupload through imxg2dvideotransform like this:

rtspsrc location=... protocols=udp latency=100 ! queue name=q1 ! rtph264depay ! h264parse ! queue ! imxvpudec_h264 ! imxg2dvideotransform ! queue ! glupload ! glcolorconvert ! qmlglsink

Does this experiment prove that imxvpudec_h264 is incompatible with glupload? Has anyone thought about how glupload could be used here with minimal overhead?

@Talkless Talkless changed the title from "imxvpudec_h264 incompatible with glpupload?" to "imxvpudec_h264 incompatible with glupload?" Jun 8, 2023
@Talkless
Author

Strangely, the frame rate DOES rise to the target 25fps, but only after 10 or so seconds!? It is not stable though, and CPU usage is ~70% instead of the ~30% I get using vpudec with the older GStreamer 1.18 available in the system/toolchain. Sometimes it stays stuck at 9 or 14fps.

Any ideas why the frame rate stays at 6-10fps for some seconds and then rises?

All I see is:

0:00:23.252891573  1602 0xaaaae554d700 WARN            videodecoder gstvideodecoder.c:3668:gst_video_decoder_clip_and_push_buf:<imxvpudech264-1> Dropping frame due to QoS. start:0:00:15.028354956 deadline:0:00:15.028354956 earliest_time:0:00:15.047418051
0:00:23.336191154  1602 0xaaaae554d700 WARN            videodecoder gstvideodecoder.c:3668:gst_video_decoder_clip_and_push_buf:<imxvpudech264-1> Dropping frame due to QoS. start:0:00:15.108372174 deadline:0:00:15.108372174 earliest_time:0:00:15.134527086
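
To see how many frames are being dropped, and by which element, warnings like the two above can be tallied from a saved debug log. This is only a minimal sketch: the function name is made up for illustration, and it assumes the log lines keep the `<element> Dropping frame due to QoS` shape shown here.

```shell
# Hypothetical helper: reads a GStreamer debug log on stdin and prints
# "<element name>: <number of QoS drops>" for each element that dropped frames.
qos_drops_per_element() {
    sed -n 's/.*:<\([^>]*\)> Dropping frame due to QoS.*/\1/p' \
        | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage (the log file name is a placeholder):
#   qos_drops_per_element < gst-debug.log
```

Comparing the counts between a run with sync=1 and a run with sync=0 would show whether the drops really come from frames arriving late at the sink.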

@Talkless
Author

@dv1 Do you believe it would be "doable" to make imxvpudec_h264 work with glupload without expensive CPU copies? Could you give a price for that work if the company I work for offered to sponsor it?

@Talkless
Author

Talkless commented Jun 20, 2023

I've discovered viv-fb option for gstreamer-plugins-base!

See: https://github.com/GStreamer/gst-plugins-base/blob/ce937bcb21412d7b3539a2da0509cc96260562f8/gst-libs/gst/gl/meson.build#L277

After enabling viv-fb while cross-building gstreamer-plugins-base I no longer need imxg2dvideotransform. But performance is still poor: CPU usage is high, and the ~10fps cap for the first ~10 seconds is still there.

@Talkless
Author

Setting qmlglsink sync=0 helped a bit; now it shows 25fps from the start.

CPU usage is still near 70%, but maybe that's unavoidable overhead of using qmlglsink?

The only issue now is that there's frame stuttering every ~2s.

Currently the pipeline looks like this:

rtspsrc location=rtsp://... protocols=udp latency=100 buffer-mode=slave ! queue max-size-buffers=0 ! rtph264depay ! queue max-size-buffers=0 ! h264parse ! queue max-size-buffers=0 ! imxvpudec_h264 ! queue max-size-buffers=0  ! glupload ! glcolorconvert ! qmlglsink sync=0

@Talkless
Author

I had to increase the rtspsrc latency to 200ms (I can do 100 on a PC with VA-API hardware decoding). Now I get a stable 25fps from the start of the stream. The only "issue" is higher CPU usage compared to running gst-launch-1.0 with glimagesink in a terminal, but I guess it's just qmlglsink overhead.

So, to wrap up:

  1. Build gst-plugins-base with viv-fb enabled.
  2. Fiddle with pipeline (play with sync=0/1, latencies, queues, etc.) to find "best" solution.
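
For step 1, the GL options of gst-plugins-base are selected at meson configure time. The sketch below only prints a plausible set of options; the helper name, the cross file and the build directory are placeholders, and the exact combination of `gl_api`, `gl_platform` and `gl_winsys` values that a given board needs is an assumption to verify against the `meson_options.txt` of the GStreamer version being built.

```shell
# Hypothetical helper: prints GL-related meson options for a Vivante
# framebuffer (viv-fb) build of gst-plugins-base, one option per line.
viv_fb_gl_options() {
    printf '%s\n' \
        '-Dgl=enabled' \
        '-Dgl_api=gles2' \
        '-Dgl_platform=egl' \
        '-Dgl_winsys=viv-fb'
}

# Usage (cross file and build directory are placeholders):
#   meson setup build-imx8mm --cross-file imx8mm-cross.ini $(viv_fb_gl_options)
```

`gl_winsys` takes a comma-separated list, so other window systems can be listed next to viv-fb if the same build also has to serve them.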

My current pipeline is:

rtspsrc location=rtsp://... protocols=udp latency=200 buffer-mode=slave ! queue max-size-buffers=0 ! rtph264depay ! queue max-size-buffers=0 ! h264parse ! queue max-size-buffers=0 ! imxvpudec_h264 ! queue max-size-buffers=0 ! glupload ! qmlglsink name=qmlglsink sync=1

@dv1
Collaborator

dv1 commented Jun 21, 2023

Sorry for the silence. I am unfortunately still kept busy by other topics. viv-fb is indeed necessary, although the direct dmabuf uploader should work too. I hope I can look at this deeper, since the reported issues are still odd. qmlglsink does have overhead though, that is true. (To be more specific, it is Qt overhead.)

@Talkless
Author

(To be more specific, it is Qt overhead.)

Yeah, I've noticed that even without a video stream playing, my QML application consumes ~70% CPU JUST BY MOVING THE MOUSE AROUND :| . It's in EGLFS mode. Maybe I have to use the Vivante EGL platform plugin from Qt for better performance; I believe some default EGL is used.

@dv1
Collaborator

dv1 commented Jun 22, 2023

Reopening since I will investigate further if there are ways to improve performance.

@Talkless What do you use as build environment? Yocto? If so, what version?

@dv1 dv1 reopened this Jun 22, 2023
@Talkless
Author

@dv1 I was provided a toolchain called fsl-imx-wayland-glibc-x86_64-core-image-base-cortexa53-crypto-imx8mmevk-toolchain-5.10-hardknott, so I guess it's a Yocto Hardknott-based toolchain. The kernel is 5.10.72-lts-5.10.y+g2a23c0cdbb9b.

@dv1
Collaborator

dv1 commented Jun 25, 2023

@Talkless I pushed a commit that changes the way G2D is used. On the imx8m plus, serious reduction in imxg2dvideotransform CPU usage can be seen. Can you check how the CPU usage is on your end now? I am not sure if you were using this element in your code.

Also, I just ran a test on an imx8m mini EVK here, and CPU usage is much lower than 70%. However, this is Yocto Kirkstone, with upstream GStreamer (that is, not the NXP fork), and the latest versions of gstreamer-imx and libimxvpuapi. In the logs (by setting GST_DEBUG to *gl*upload*:9) I can see that the GL uploader is using the DirectDmabuf method, which avoids CPU based frame copies. I recommend running your test with that log level set and checking out what upload method is being used.
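
A minimal sketch of that check: with the log level above, glupload prints lines of the form `uploader <Method> returned ...`, and those can be summarized per upload method. The helper name is made up for illustration, and the message format is assumed to match the log lines quoted in this thread.

```shell
# Hypothetical helper: reads a GStreamer debug log on stdin and prints
# "<upload method>: <number of buffers>" for each method glupload reported.
gl_upload_methods() {
    sed -n 's/.*uploader \([A-Za-z0-9_]*\) returned.*/\1/p' \
        | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage (the pipeline is a placeholder; debug output goes to stderr):
#   GST_DEBUG_NO_COLOR=1 GST_DEBUG='*gl*upload*:9' \
#       gst-launch-1.0 <pipeline> 2>&1 | gl_upload_methods
```

If only DirectDmabuf shows up, frames reach the GPU without a CPU copy; any other method name appearing in the summary would be worth a closer look.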

@Talkless
Author

Talkless commented Jun 26, 2023

  • I was using imxg2dvideotransform at first, but removed it after I built GStreamer with viv-fb support enabled, as it seemed to be no longer needed after doing that.
  • I too use upstream GStreamer rather than the NXP fork, cross-compiled manually with the latest gstreamer-imx and the latest libimx* dependencies.

I've tried to build gstreamer-imx master, but it fails for me:

In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2object.c:28:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:21:8: error: redefinition of ‘struct itimerspec’
   21 | struct itimerspec {
      |        ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/time.h:48,
                 from ../src/sys/v4l2video/gstimxv4l2object.c:25:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_itimerspec.h:8:8: note: originally defined here
    8 | struct itimerspec
      |        ^~~~~~~~~~

@dv1
Collaborator

dv1 commented Jun 27, 2023

@Talkless In sys/v4l2video/gstimxv4l2object.c , line 25, try replacing #include <time.h> with #include <sys/time.h>. Then tell me please if it fixes or doesn't fix the issue for you.

@Talkless
Author

I also fixed this error in gstimxv4l2videoformat.c:22:

In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2videoformat.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:21:8: error: redefinition of ‘struct itimerspec’
   21 | struct itimerspec {
      |        ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/time.h:48,
                 from ../src/sys/v4l2video/gstimxv4l2videoformat.c:21:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_itimerspec.h:8:8: note: originally defined here
    8 | struct itimerspec
      |        ^~~~~~~~~~

But the same fix produces even worse errors when applied to gstimxv4l2amphiondec.c:

In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:16:8: error: redefinition of ‘struct timeval’
   16 | struct timeval {
      |        ^~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/sys/time.h:25,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:21:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_timeval.h:8:8: note: originally defined here
    8 | struct timeval
      |        ^~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:21:8: error: redefinition of ‘struct itimerspec’
   21 | struct itimerspec {
      |        ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/time.h:48,
                 from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:1,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_itimerspec.h:8:8: note: originally defined here
    8 | struct itimerspec
      |        ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:26:8: error: redefinition of ‘struct itimerval’
   26 | struct itimerval {
      |        ^~~~~~~~~
In file included from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:21:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/sys/time.h:105:8: note: originally defined here
  105 | struct itimerval
      |        ^~~~~~~~~

@dv1
Collaborator

dv1 commented Jun 29, 2023

Attach /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h, /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h, and /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/sys/time.h please. Also, double check that the includes are correct - if you see that error in gstimxv4l2amphiondec.c, then gstimxv4l2videoformat.c should not build, since the #include directives in both of these files start off identically.

@Talkless
Author

Talkless commented Jun 29, 2023

Now that you have mentioned videodev2.h, I remember I had to patch the toolchain so that ffmpeg (or was it something from GStreamer?) would build:

readonly VIDEODEV_FILE="${SYSROOT}/usr/include/linux/videodev2.h"
sed -i -e "s|<sys/time.h>|<linux/time.h>|g" "${VIDEODEV_FILE}"

videodev2.h: https://paste.debian.net/1284436/
linux/time.h: https://paste.debian.net/1284437/
sys/time.h: https://paste.debian.net/1284438/

@dv1
Collaborator

dv1 commented Jun 29, 2023

Hm then it is a toolchain bug. Patch that, then retry. I will switch the includes to sys/time.h regardless though, to match videodev2.h.

@Talkless
Author

@dv1 videodev2.h is already patched, or do you mean I need to patch something more?

@dv1
Collaborator

dv1 commented Jun 29, 2023

@Talkless Wait - when you had the build errors, did you try this with the patched or unpatched videodev2.h?

@Talkless
Author

It was already patched. I needed this patch for some other package quite some time ago.

@dv1
Collaborator

dv1 commented Jun 29, 2023

This patch seems wrong. linux/time.h is not supposed to be there. Read through this kernel mailing list thread for details.
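
A quick way to see whether a sysroot's videodev2.h carries that substitution, and to undo it, might look like the sketch below. Both function names are made up for illustration; the revert is simply the inverse of the sed command quoted earlier in this thread and assumes nothing else in the header was changed.

```shell
# Hypothetical helper: prints which time header the given videodev2.h includes,
# i.e. "<sys/time.h>" (unpatched) or "<linux/time.h>" (patched as described above).
videodev2_time_header() {
    grep -E -o '<(sys|linux)/time\.h>' "$1"
}

# Hypothetical helper: reverts the <sys/time.h> -> <linux/time.h> substitution
# in place (GNU sed syntax).
revert_videodev2_patch() {
    sed -i -e 's|<linux/time.h>|<sys/time.h>|g' "$1"
}

# Usage (SYSROOT is a placeholder for the toolchain sysroot path):
#   videodev2_time_header "${SYSROOT}/usr/include/linux/videodev2.h"
#   revert_videodev2_patch "${SYSROOT}/usr/include/linux/videodev2.h"
```

Keeping a backup copy of the header before editing it in place makes it easy to compare the two states against each package that has to build.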

@Talkless
Author

Talkless commented Jun 29, 2023

I've reverted videodev2.h "fix", now master builds, but 2.1.0 does not:

In file included from ../src/sys/v4l2video/gstimxv4l2object.c:26:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:2357:20: error: field ‘timestamp’ has incomplete type
 2357 |  struct timespec   timestamp;
      |                    ^~~~~~~~~
[39/52] Compiling C object sys/v4l2video/libgstimxv4l2video.so.p/gstimxv4l2videoformat.c.o

I believe timespec was also an issue with some other software, which is why I had to "fix" videodev2.h.

Master shows this:

glupload gstglupload.c:2594:gst_gl_upload_perform_with_buffer:<glupload1> uploader DirectDmabuf returned 1, buffer: 0xffff2c05db40

If I patch videodev2.h again, to make 2.1.0 build, I see the same:

glupload gstglupload.c:2594:gst_gl_upload_perform_with_buffer:<glupload1> uploader DirectDmabuf returned 1, buffer: 0xffff380777e0

@Talkless
Author

I get the same performance with 2.1.0 and master. I believe the CPU usage is just Qt overhead, as I discovered that simply moving the mouse around in the application, without video playback, gives ~70% CPU usage... And the extra imxg2dvideotransform element is not needed if I build gstreamer-plugins-base with viv-fb support enabled.

Maybe I could use one of the GStreamer profiling/tracing utilities to measure the actual cost of elements?

@Talkless
Author

The thread with imxvpudec consumes ~21% CPU; maybe that is the number you were referring to?

Thread 0xaaaad670ac00 Statistics:
  Time: 0:00:01.653421500
  Avg CPU load: 21.2 %
  Pad Statistics:
    > queue4.src                    : buffers    4609 (live     0,dec     0,dis     1,res     0,cor     0,mar     0,hdr     0,gap     0,drop     0,dlt     0), size (min/avg/max)      17/   1083/   1452, time 0:00:09.785557548, bytes/sec 510093.265051
    > rtph264depay1.src             : buffers     248 (live     0,dec     0,dis     1,res     0,cor     0,mar   248,hdr     0,gap     0,drop     0,dlt   238), size (min/avg/max)   15306/  25808/ 143436, time 0:00:09.656997050, bytes/sec 662771.663578
    > queue5.src                    : buffers     248 (live     0,dec     0,dis     1,res     0,cor     0,mar   248,hdr     0,gap     0,drop     0,dlt   238), size (min/avg/max)   15306/  25808/ 143436, time 0:00:09.656350571, bytes/sec 662816.035203
    > h264parse1.src                : buffers     248 (live     0,dec     0,dis     1,res     0,cor     0,mar   248,hdr    10,gap     0,drop     0,dlt   238), size (min/avg/max)   15312/  25814/ 143439, time 0:00:09.642270148, bytes/sec 663938.253309
    > queue6.src                    : buffers     248 (live     0,dec     0,dis     1,res     0,cor     0,mar   248,hdr    10,gap     0,drop     0,dlt   238), size (min/avg/max)   15312/  25814/ 143439, time 0:00:09.632929197, bytes/sec 664582.067311
    > imxvpudech264-1.src           : buffers     177 (live     0,dec     0,dis    28,res     0,cor     0,mar     0,hdr     0,gap     0,drop     0,dlt     0), size (min/avg/max) ......./3655712/......., time 0:00:09.500220582, bytes/sec 68110105.277553
    > queue7.src                    : buffers     177 (live     0,dec     0,dis    28,res     0,cor     0,mar     0,hdr     0,gap     0,drop     0,dlt     0), size (min/avg/max) ......./3655712/......., time 0:00:09.499402858, bytes/sec 68115968.305847
    > gluploadelement1.src          : buffers     154 (live     0,dec     0,dis    28,res     0,cor     0,mar     0,hdr     0,gap     0,drop     0,dlt     0), size (min/avg/max) ......./8294400/......., time 0:00:09.497045559, bytes/sec 134498417.646266

So again, I guess we can close this bug...

A bigger issue is how to reduce decoding latency, because I get a total delay of ~400ms from the real world: https://community.nxp.com/t5/i-MX-Processors/How-to-achieve-lowest-latency-while-decoding-RTSP-h264-stream/m-p/1677683#M208271

@dv1
Collaborator

dv1 commented Jun 29, 2023

I have trouble keeping track of this because your sysroot seems to be really weird. So, I can only give vague recommendations. Your videodev2.h appears to be broken. I strongly recommend you redo this Yocto setup from scratch, ideally based on Kirkstone.

That said, it does not seem to me that this is really a GStreamer issue anymore - not if a test pipeline with glimagesink instead of qmlglsink uses far less CPU%. I'd suspect a Qt or GPU driver issue there.

@dv1
Collaborator

dv1 commented Jun 29, 2023

Yeah, I guess, though even 21% is higher than what I saw I think (I can't check right now).

Agreed, let's close this. The issue does not seem gstreamer-imx specific, but instead originate somewhere else.

@dv1 dv1 closed this as completed Jun 29, 2023
@Talkless
Author

I strongly recommend you redo this Yocto setup from scratch, ideally based on Kirkstone.

The toolchain is provided by some Chinese panel PC manufacturer :) .

Yes, I'd say close this, because:

  • "imxvpudec incompatible with glupload" is a misnomer, as I just needed to build GStreamer correctly
  • the 70% CPU usage is for the whole Qt application, where mere mouse movement can bump CPU usage to that level; it's not imxvpudec
  • once a new version of gstreamer-imx is tagged, I will be able to use the toolchain without workarounds.

@Talkless
Author

@dv1 should I create an issue about decoding latency? Or is it just what ARM CPUs can be expected to provide?

GStreamer tracing shows:

0xaaab043ff2a0.imxvpudech264-1.src: mean=0:00:00.166048537 min=0:00:00.067169229 max=0:00:00.254554744

That's 160-250ms latency..?

On a desktop with vah264dec we can get a TOTAL of ~220-250ms latency; with this NXP IMX8MM machine it's 400ms, or 300ms with sync=false but with jitter...
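
The tracing line above reports times as H:MM:SS.nanoseconds, which is awkward to compare with the millisecond figures. A minimal sketch for converting such lines, assuming they keep the `name: mean=... min=... max=...` shape shown above (the function name is made up for illustration):

```shell
# Hypothetical helper: reads latency statistics lines on stdin and prints
# the element pad name followed by mean/min/max converted to milliseconds.
latency_summary_ms() {
    LC_ALL=C awk '
    # Convert "H:MM:SS.NNNNNNNNN" to milliseconds.
    function to_ms(t,   p) {
        split(t, p, ":")
        return (p[1] * 3600 + p[2] * 60 + p[3]) * 1000
    }
    /mean=/ {
        name = $1
        sub(/:$/, "", name)
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            v[kv[1]] = to_ms(kv[2])
        }
        printf "%s mean=%.1fms min=%.1fms max=%.1fms\n", name, v["mean"], v["min"], v["max"]
    }'
}

# Usage (the file name is a placeholder for saved tracer statistics):
#   latency_summary_ms < latency-stats.txt
```

Running it over the statistics of every element in the pipeline, not just the decoder, would show how the total delay is split between depayloading, parsing, decoding and upload.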

@dv1
Collaborator

dv1 commented Jun 29, 2023

@Talkless Open a new issue, and provide a gst-launch-1.0 command line that reproduces the problem

@Talkless
Author

Thanks for all your time @dv1, it's really great to have a FOSS option and to be able to build from source against the latest GStreamer. The system image/toolkit provided by the manufacturers only has GStreamer & vpudec 1.18...


2 participants