Raspberry Pi 4: poor performance when playing Full HD video with WPE, plus high CPU usage #127

Closed
embeddedio opened this issue Sep 22, 2019 · 26 comments
Comments

@embeddedio

embeddedio commented Sep 22, 2019

Hi,
When I build WPE (following the wiki) with the FDO platform and launch it (cog -P fdo "url of the Video"), the video plays slowly and CPU usage is high.
Does anyone have an idea? I'm a novice with Yocto.

------ BEGIN local.conf --------
MACHINE ?= "raspberrypi4-64"
GPU_MEM = "256"
EXTRA_IMAGE_FEATURES = "debug-tweaks"
IMAGE_FEATURES_append = " ssh-server-dropbear hwcodecs"
PREFERRED_PROVIDER_virtual/wpebackend = "wpebackend-fdo"
IMAGE_INSTALL_append = " wpewebkit cog"
IMAGE_INSTALL_append = " gstreamer1.0-omx gstreamer1.0-plugins-base gstreamer1.0-plugins-bad gstreamer1.0-plugins-base-meta gstreamer1.0-plugins-bad-meta gstreamer1.0-plugins-good-meta "
IMAGE_INSTALL_append = " gstreamer1.0-libav"
LICENSE_FLAGS_WHITELIST_append = " commercial"
------ END local.conf --------

+Cap 1: Yocto Build Config
https://drive.google.com/file/d/1Ot8WgMxKMznFCFI7NY-0B7VjNmdxTmpU/view?usp=sharing

+Cap2: launch Cog
https://drive.google.com/file/d/1N5Vh2JpX3CJ55W4mz43LdqS3vHNIH5pj/view?usp=sharing

+Cap3: Top
https://drive.google.com/file/d/1CQPeW1vQYbjOYoVOmXTDnaZlM4ymB5vX/view?usp=sharing

+Cap4: Screen Capture
https://drive.google.com/file/d/1u5RT6-AtPisRYy9YE91B9qpbYg3t041N/view?usp=sharing

@clopez
Contributor

clopez commented Sep 23, 2019

Can you try playing the video directly with gstreamer?

gst-play-1.0 "url of the Video"

If the performance is as bad as with WPE, then it's an issue with gstreamer (perhaps the hardware decoder plugin (OMX) is not working as expected). If the performance with gst-play-1.0 is good, then it may be an issue with WPE; let us know.
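
Before comparing players, it may help to confirm which H.264 decoder elements GStreamer can actually see on the image. The following is only a sketch, assuming `gst-inspect-1.0` is installed. `omxh264dec` (gst-omx) and `avdec_h264` (gst-libav, software decoding) are the usual element names; `list_h264_decoders` and the `INSPECT` override are hypothetical names made up for this example.

```shell
#!/bin/sh
# INSPECT defaults to the real gst-inspect-1.0 tool; it can be overridden (hypothetical knob).
INSPECT="${INSPECT:-gst-inspect-1.0}"

# Print one line per candidate H.264 decoder, saying whether GStreamer can find it.
list_h264_decoders() {
    # v4l2h264dec: V4L2 decoder (open source stack)
    # omxh264dec:  gst-omx decoder (userland stack)
    # avdec_h264:  software decoder from gst-libav
    for element in v4l2h264dec omxh264dec avdec_h264; do
        # "--exists" makes gst-inspect-1.0 exit non-zero when the element is unknown.
        if "$INSPECT" --exists "$element" >/dev/null 2>&1; then
            echo "$element: available"
        else
            echo "$element: missing"
        fi
    done
}
```

Calling `list_h264_decoders` on the target prints three lines. If only `avdec_h264` shows up as available, playback is most likely falling back to software decoding.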

@embeddedio
Author

I tried gstreamer but got the same performance: the video (1080p) plays, but slowly, and CPU usage is high (~50%).

https://drive.google.com/file/d/1Lp7IPQeqJ3M37WvEi2B9CAivoNUsM9_F/view?usp=sharing

@clopez
Contributor

clopez commented Sep 23, 2019

OK, then it's an issue with the gstreamer drivers. I don't know whether the gstreamer component that accelerates hardware decoding on the RPi (in theory gstreamer-v4l2) already supports wayland and the open source stack, or whether it is working as expected. Maybe you are also missing some kernel driver.

The last time I checked this (about 2 years ago), it didn't work at all with the open source stack. You had to use the vc4/dispmanx userland driver stack to get hardware-accelerated video decoding, together with the gstreamer plugin for OMX.

If you can't get this working properly with the current stack, then I suggest you try a new build using the vc4/dispmanx userland driver stack (it would be option 1 at https://github.com/Igalia/meta-webkit/wiki/RPi).

Another thing to take into account is 64 bit support. I don't think vc4/dispmanx userland driver stack supports 64-bit. You will have to select raspberrypi4 (without -64 suffix) for your machine.

@nemosupremo

I'm digging into this issue, and I'm trying to understand if I will hit the same issue -

AFAIK, gst-play-1.0 is not hardware accelerated on the Pi 4. However when using gstreamer:

$ gst-launch-1.0 -v filesrc location=foo ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=4 ! autovideosink

the content plays with low CPU usage (although playback isn't as smooth as cvlc or omxplayer). Is this still an issue with gstreamer?

@clopez
Contributor

clopez commented Oct 10, 2019

So v4l2h264dec is the gstreamer component that is hardware accelerated on the RPi (with the open source stack, right?).

In theory WPE should also use the v4l2 gstreamer component for hardware-accelerated playback without issue (AFAIK) ...

One question ... is the capture-io-mode=4 parameter required to make the playback hardware accelerated?

@philn any idea what could be wrong here? Could it be related to this capture-io-mode=4 parameter?

@philn
Member

philn commented Oct 11, 2019

This property means the decoder outputs DMABufs. I don't know how well this is supposed to work currently on that platform. I've created this wiki page; I would especially need logs and .dot dumps of the WebKit pipeline... https://github.com/Igalia/meta-webkit/wiki/Providing-useful-GStreamer-Zero-copy-issue-reports

@nemosupremo

nemosupremo commented Oct 19, 2019

This is on Raspbian's Debian Buster

Hardware

Raspberry Pi 4

GPU Driver

$ /opt/vc/bin/vcgencmd version
Oct 11 2019 18:37:29 
Copyright (c) 2012 Broadcom
version d4a63855399ed5c2e6cd94e6dbcd850d881c856d (tainted) (release) (start)

Mesa Version: 1.4 (eglinfo)

gstreamer

  • gstreamer1.0-plugins-base-apps
  • gstreamer1.0-plugins-bad
  • gstreamer1.0-libav
  • gstreamer1.0-plugins-good
  • gstreamer1.0-alsa

Command

$ GST_DEBUG_FILE="3,webkit*:6" GST_DEBUG_FILE=/tmp/gst.log GST_DEBUG_DUMP_DOT_DIR=/tmp cog -P fdo http://192.168.1.23:5000/

I'm also seeing issues where a video will play fine in gst-play but refuse to start in cog.
This endpoint just plays a 1080p video. The video never starts for resolutions greater than 480p (these videos will play in gst-play-1.0).

gst.log

(empty)

0.00.00.468597582-media-player-0.NULL_READY.dot

digraph pipeline {
  rankdir=LR;
  fontname="sans";
  fontsize="10";
  labelloc=t;
  nodesep=.1;
  ranksep=.2;
  label="<GstPlayBin>\nmedia-player-0\n[-] -> [=]\ncurrent-uri=\"http://192.168.1.23:5000/DaBaby2.mp4\"\nsource=(WebKitWebSrc) source\nflags=video+audio+text+soft-volume+download+deinterlace+soft-colorbalance\naudio-sink=(GstBin) audio-sink\nvideo-sink=(GstBin) bin0\ntext-sink=(WebKitTextSink) webkittextsink0\ntext-stream-combiner=(WebKitTextCombiner) webkittextcombiner0\nmute=TRUE\naudio-filter=(GstScaletempo) scaletempo0";
  node [style="filled,rounded", shape=box, fontsize="9", fontname="sans", margin="0.0,0.0"];
  edge [labelfontsize="6", fontsize="9", fontname="monospace"];
  
  legend [
    pos="0,0!",
    margin="0.05,0.05",
    style="filled",
    label="Legend\lElement-States: [~] void-pending, [0] null, [-] ready, [=] paused, [>] playing\lPad-Activation: [-] none, [>] push, [<] pull\lPad-Flags: [b]locked, [f]lushing, [b]locking, [E]OS; upper-case is set\lPad-Task: [T] has started task, [t] has paused task\l",
  ];
  subgraph cluster_uridecodebin0_0x7e6030 {
    fontname="Bitstream Vera Sans";
    fontsize="8";
    style="filled,rounded";
    color=black;
    label="GstURIDecodeBin\nuridecodebin0\n[-] -> [=]\nparent=(GstPlayBin) media-player-0\nuri=\"http://192.168.1.23:5000/DaBaby2.mp4\"\nsource=(WebKitWebSrc) source\ncaps=video/x-raw(ANY); audio/x-raw(ANY); text/x-raw(ANY); subpicture/x-dvd; subpictur…\ndownload=TRUE";
    fillcolor="#ffffff";
    subgraph cluster_typefindelement0_0x7bc708 {
      fontname="Bitstream Vera Sans";
      fontsize="8";
      style="filled,rounded";
      color=black;
      label="GstTypeFindElement\ntypefindelement0\n[=]\nparent=(GstURIDecodeBin) uridecodebin0";
      subgraph cluster_typefindelement0_0x7bc708_sink {
        label="";
        style="invis";
        typefindelement0_0x7bc708_sink_0x7bc078 [color=black, fillcolor="#aaaaff", label="sink\n[>][bfb]", height="0.2", style="filled,solid"];
      }

      subgraph cluster_typefindelement0_0x7bc708_src {
        label="";
        style="invis";
        typefindelement0_0x7bc708_src_0x7bc888 [color=black, fillcolor="#ffaaaa", label="src\n[>][bfb]", height="0.2", style="filled,solid"];
      }

      typefindelement0_0x7bc708_sink_0x7bc078 -> typefindelement0_0x7bc708_src_0x7bc888 [style="invis"];
      fillcolor="#aaffaa";
    }

    subgraph cluster_source_0x7e8300 {
      fontname="Bitstream Vera Sans";
      fontsize="8";
      style="filled,rounded";
      color=black;
      label="WebKitWebSrc\nsource\n[=]\nparent=(GstURIDecodeBin) uridecodebin0\nlocation=\"http://192.168.1.23:5000/DaBaby2.mp4\"\nresolved-location=\"http://192.168.1.23:5000/DaBaby2.mp4\"";
      subgraph cluster_source_0x7e8300_src {
        label="";
        style="invis";
        source_0x7e8300_src_0x7bc5d8 [color=black, fillcolor="#ffaaaa", label="src\n[>][bfb][T]", height="0.2", style="filled,solid"];
      }

      fillcolor="#ffaaaa";
    }

    source_0x7e8300_src_0x7bc5d8 -> typefindelement0_0x7bc708_sink_0x7bc078 [label="ANY"]
  }

  subgraph cluster_playsink_0x788038 {
    fontname="Bitstream Vera Sans";
    fontsize="8";
    style="filled,rounded";
    color=black;
    label="GstPlaySink\nplaysink\n[-] -> [=]\nparent=(GstPlayBin) media-player-0\nflags=video+audio+text+soft-volume+download+deinterlace+soft-colorbalance\nmute=TRUE\nvideo-sink=(GstBin) bin0\naudio-sink=(GstBin) audio-sink\ntext-sink=(WebKitTextSink) webkittextsink0\nsend-event-mode=first\naudio-filter=(GstScaletempo) scaletempo0";
    fillcolor="#ffffff";
    subgraph cluster_streamsynchronizer0_0x78a038 {
      fontname="Bitstream Vera Sans";
      fontsize="8";
      style="filled,rounded";
      color=black;
      label="GstStreamSynchronizer\nstreamsynchronizer0\n[=]\nparent=(GstPlaySink) playsink";
      fillcolor="#ffffff";
    }

  }

}

More

I tried to replicate the pipeline with

gst-launch-1.0 -v filesrc location=foo ! decodebin ! waylandsink

And this plays a lot more smoothly

@philn
Member

philn commented Oct 19, 2019

There's a typo in the wiki page, add this env var and the log should no longer be empty: GST_DEBUG="3,webkit*:6"
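
With the variable name corrected, the three environment variables from the command above can be wrapped in a small helper so they are always set together. This is only a sketch: `run_with_gst_debug` is a hypothetical name, and the output directory is whatever you pass in.

```shell
#!/bin/sh
# Run a command with GStreamer logging and .dot dumps written into one directory.
# Usage: run_with_gst_debug OUT_DIR COMMAND [ARGS...]
run_with_gst_debug() {
    out_dir="$1"
    shift
    # GStreamer does not create the dump directory itself, so make sure it exists.
    mkdir -p "$out_dir" || return 1
    # GST_DEBUG selects the log categories and levels, GST_DEBUG_FILE redirects
    # the log to a file, GST_DEBUG_DUMP_DOT_DIR enables pipeline .dot dumps.
    GST_DEBUG="3,webkit*:6" \
    GST_DEBUG_FILE="$out_dir/gst.log" \
    GST_DEBUG_DUMP_DOT_DIR="$out_dir" \
    "$@"
}
```

For example, `run_with_gst_debug /tmp/gst-report cog -P fdo http://192.168.1.23:5000/` should leave `gst.log` and the `.dot` files under `/tmp/gst-report`.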

@philn
Member

philn commented Oct 19, 2019

There's only one .dot file? Please attach all here (no copy/pasta).

@nemosupremo

I'll update again, but there was only one dot file. I'll also try again with another video.

@nemosupremo

nemosupremo commented Oct 19, 2019

Ok; I'm at a bit of a loss for words. So when I don't include the GST_DEBUG="3,webkit*:6" the video doesn't play. When I do include that env var, the video plays, and generates multiple .dot files (and with decent performance, although CPU usage is still a lot higher than with gst-launch).

Here are the logs and the dot file:
issue-127.tar.gz
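
For anyone producing a similar attachment, one way to bundle the log and every .dot dump into a single archive is sketched below. `bundle_gst_report` is a hypothetical helper name; it assumes the directory contains at least one `.dot` file and one `.log` file, and that file names contain no unusual characters.

```shell
#!/bin/sh
# Pack all *.dot and *.log files from SRC_DIR into a gzip-compressed tarball.
# Usage: bundle_gst_report SRC_DIR ARCHIVE
bundle_gst_report() {
    src_dir="$1"
    archive="$2"
    # The subshell changes into src_dir so archive members get relative names.
    # The redirection is resolved in the caller's directory, so ARCHIVE may be
    # a relative path. If a glob matches nothing, tar fails and so does this function.
    ( cd "$src_dir" && set -- *.dot *.log && tar -czf - "$@" ) > "$archive"
}
```

Something like `bundle_gst_report /tmp/gst-report issue-report.tar.gz` would then produce one file to attach, rather than pasted dumps.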

Seeing it work, I think the issue may be unrelated to HW acceleration and just WPEWebProcess being inefficient? The video I'm working with is a 1080p H.264 video at 23.9 fps (downloaded from YouTube).

  • omxplayer has the best performance
  • gst-launch-1.0 -v filesrc location=~/DaBaby2.mp4 ! decodebin ! fpsdisplaysink text-overlay=false video-sink="waylandsink" fps-update-interval=1000 is good with ~23.3 fps average
  • cog is OK now too, with CPU usage similar to gst-launch, aside from the video not playing at all if GST_DEBUG="3,webkit*:6" is not set
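
To compare runs like the ones above without reading the numbers off the terminal, the average can be extracted from the verbose output. This is a rough sketch: it assumes `fpsdisplaysink` reports its statistics in `last-message` lines of the form `rendered: N, dropped: N, current: X, average: Y` (check this against your GStreamer version), and `last_average_fps` is a hypothetical helper name.

```shell
#!/bin/sh
# Read "gst-launch-1.0 -v" output on stdin and print the last reported average fps.
last_average_fps() {
    # Keep only the number after "average: " on matching lines, then take the last one.
    sed -n 's/.*average: \([0-9.][0-9.]*\).*/\1/p' | tail -n 1
}
```

Piping the `gst-launch-1.0 -v ... fpsdisplaysink ...` command from the list into `last_average_fps` should print a single number once playback ends.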

Edit: GST_DEBUG="3,webkit*:6" doesn't always cause the video to play so it's still nondeterministic.

Edit 2: Quickly browsing the logs, webkit does seem to be using gstv4l2videodec; I'm now having a separate issue where the video doesn't reliably play.

@philn
Member

philn commented Oct 20, 2019

Instead of waylandsink please try with glimagesink as mentioned in the wiki page.

@philn
Member

philn commented Oct 20, 2019

The color conversion in WebKit's pipeline might be a bottleneck though... Recently we've been working on adding YUV support in our video sink, which might help. But it won't be shipped until the 2.28 release.

@nemosupremo

The glimagesink is a bit worse, averaging one fps less at around ~22. gst-launch-1.0 uses about the same CPU, but weston uses less CPU (I assume because glimagesink renders into a tiny square).

In any case I'm not sure if anything can be done about the video playback. AFAIK the CPU usage is mostly from WPEWebProcess (I'm not sure if that's related).

@nemosupremo

I think this is a gstreamer issue on Pi 4. I tried a 60fps video like OP, and gstreamer crawls, while the video plays fine in OMX.

https://github.com/bower-media-samples/big-buck-bunny-1080p-30s/blob/master/video.mp4

$ GST_DEBUG="3" gst-launch-1.0 -v filesrc location=video.mp4 ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=4 ! fpsdisplaysink text-overlay=true video-sink='fakevideosink'

v4l2h264dec is a hardware decoder (it uses less CPU than libav's avdec_h264) but it uses twice as much CPU as omxplayer. Might be a driver issue with the v4l implementation.
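
To compare decoders on the same file, the pipeline can be generated with only the decoder element swapped. This is a sketch, not a tuned benchmark. It assumes an MP4 container, so `qtdemux` is used as the demuxer as in the earlier command in this thread, and `decoder_pipeline` is a hypothetical helper name.

```shell
#!/bin/sh
# Print a gst-launch-1.0 pipeline description that decodes FILE with DECODER
# and discards the frames, so only demuxing and decoding cost is measured.
# Usage: decoder_pipeline FILE "DECODER [properties...]"
decoder_pipeline() {
    file="$1"
    decoder="$2"
    # fakevideosink renders nothing; fpsdisplaysink still reports frame rates.
    printf '%s\n' "filesrc location=$file ! qtdemux ! h264parse ! $decoder ! fpsdisplaysink text-overlay=false video-sink=fakevideosink"
}
```

The output is meant to be word-split by the shell, so it only suits paths without spaces: `gst-launch-1.0 -v $(decoder_pipeline video.mp4 "v4l2h264dec capture-io-mode=4")` for the hardware path, and the same call with `avdec_h264` for the software one. Watching `top` during each run gives a like-for-like CPU comparison.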

@clopez
Contributor

clopez commented Oct 23, 2019

I think this is a gstreamer issue on Pi 4. I tried a 60fps video like OP, and gstreamer crawls, while the video plays fine in OMX.

https://github.com/bower-media-samples/big-buck-bunny-1080p-30s/blob/master/video.mp4

$ GST_DEBUG="3" gst-launch-1.0 -v filesrc location=video.mp4 ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=4 ! fpsdisplaysink text-overlay=true video-sink='fakevideosink'

v4l2h264dec is a hardware decoder (it uses less CPU than libav's avdec_h264) but it uses twice as much CPU as omxplayer. Might be a driver issue with the v4l implementation.

I'm a bit confused when you say that the video plays fine in OMX. Do you mean when running omxplayer from the same image/sdcard where you are running WPE (built with yocto) ... or are you using another image/sdcard for testing with omxplayer?

@nemosupremo

nemosupremo commented Oct 23, 2019

I'm a bit confused when you say that the video plays fine in OMX. Do you mean when running omxplayer from the same image/sdcard where you are running WPE (built with yocto) ... or are you using another image/sdcard for testing with omxplayer?

Running omxplayer from Raspbian plays fine. Using gstreamer (which webkit uses internally) to play the same video is slow. I believe the issue lies in the v4l drivers on the Pi 4, but I'm not sure. In general it looks like v4l has poorer performance than openmax/mmal.

@clopez
Contributor

clopez commented Oct 23, 2019

I'm a bit confused when you say that the video plays fine in OMX. Do you mean when running omxplayer from the same image/sdcard where you are running WPE (built with yocto) ... or are you using another image/sdcard for testing with omxplayer?

Running omxplayer from Raspbian plays fine. Using gstreamer (which webkit uses internally) to play the same video is slow. I believe the issue lies in the v4l drivers on the Pi 4, but I'm not sure. In general it looks like v4l has poorer performance than openmax/mmal.

Ok. Then you are comparing apples to oranges.

For the RaspberryPi there are two stacks available:

  • the open source one: weston, mesa OpenGL libraries, wpebackend-fdo and the gstreamer v4l2h264dec plugin for HW accelerated video decoding
  • the proprietary one: userland/vc4 OpenGL libraries, wpebackend-rdk and the gstreamer-omx plugin for HW accelerated video decoding.

The proprietary one is the one used by raspbian.
When building an image with Yocto and meta-webkit you can choose which stack you want to use.

And it seems you have chosen here to build an image with Yocto/meta-webkit using the open source one (since you mentioned that you run Weston and v4l2h264dec plugin).
When you build an image with Yocto/meta-webkit with the proprietary stack you don't have Weston. WPE runs directly over the framebuffer (over the "text console", without a window manager involved).

I don't know about the RPi4, but at least on the RPi3 the proprietary one performed much better the last time I tried. So I suggest you try building an image with Yocto and meta-webkit using the proprietary stack.

How can you build that image? It is documented here. It would be option 1). With this stack you no longer have Weston; you run WPE directly over the framebuffer. You also use wpebackend-rdk instead of wpebackend-fdo. To enable mouse support you have to export some environment variables before executing cog (documented there as well). And you use gstreamer1.0-omx for HW accelerated video playback.

@nemosupremo

nemosupremo commented Oct 23, 2019

  1. I didn't build a Yocto image. I'm running Raspbian. I built both stacks manually to use in Raspbian.
  2. The proprietary stack does not work on the Pi4. It works on the Pi3 and performs much better than the OSS stack - but I'm targeting the Pi4 (or at least I couldn't get it working, I think because the Pi4 doesn't have a working vc4 driver). Regardless, using the proprietary stack runs much better despite, AFAIK, both being hw accelerated on pi4.
  3. To build the stacks I cross compiled the sources (https://wpewebkit.org/) and copied them to the Pi.

@clopez
Contributor

clopez commented Jan 29, 2020

BTW, an issue with video playback on the RPi3 using the open source stack (Mesa VC4/wayland) was recently troubleshot in #140.
It would be great if you could check whether the suggested fix also works for the RPi4.

@clopez
Contributor

clopez commented Jan 29, 2020

It would be great if you could check whether the suggested fix also works for the RPi4.

Forget it, sorry for the noise. I think the issue is not related. #140 is about video not playing, which is different from this one (playing, but not accelerated).

@embeddedio
Author

First, I want to explain that I'm a beginner with Yocto and with compiling Linux systems.

After a few months I tried to rebuild an image with the Mesa VC4 driver, Weston and wpebackend-fdo for the Raspberry Pi 4 on the Zeus branch. CPU usage is now around 11%, but I have an issue with HD and Full HD video: there is a red layer over the video, both with GStreamer and with WebKit (cog). Full HD (1080p) video also plays a bit slowly.

##local.conf
MACHINE = "raspberrypi4-64"
MACHINE_FEATURES_append = " vc4graphics"
GPU_MEM = "256"
EXTRA_IMAGE_FEATURES = "debug-tweaks"
IMAGE_FEATURES_append = " ssh-server-dropbear hwcodecs"
PREFERRED_PROVIDER_virtual/wpebackend = "wpebackend-fdo"
IMAGE_INSTALL_append = " wpewebkit cog"

[Screenshot: Capture du 2020-05-04 14-20-26]

[Screenshot: Capture du 2020-05-04 14-19-57]

[Image: Webp net-resizeimage]

@kraj
Contributor

kraj commented May 8, 2020

Add gstreamer1.0-libav to IMAGE_INSTALL_append and see if that helps

@philn
Member

philn commented May 8, 2020

Add gstreamer1.0-libav to IMAGE_INSTALL_append and see if that helps

I'm not sure how that would help, it doesn't provide any hardware decoder.

@embeddedio
Author

Yes, I added gstreamer1.0-libav, but nothing changed.
I don't know exactly where the issue is, because there are two problems:

  • the video plays with the red layer
  • the video plays a bit slowly when it is 1080p

@embeddedio
Author

Closing this issue because the new version works perfectly for video playback; I opened a new issue, #185, for the red layer on video.
