Adding support for AMD Hardware encoding #697

Closed
Xaymar opened this Issue Apr 22, 2017 · 75 comments

Comments

@Xaymar

Xaymar commented Apr 22, 2017

Hi, I'm the developer of the AMD Encoder Plugin for OBS Studio. Since there is always a comment about Handbrake AMF support wherever the plugin gets mentioned, I wanted to know if there are plans to add it, or if someone could point me at the right place to start adding support for it. I found a PR for it, but as far as I can tell it was based on the Media SDK and was rejected.

I have a few questions:

  • Do I implement AMF support in Handbrake or should that be in a library itself?
  • If former: Where should the implementation be? Should it be optional (not required)?
  • If latter: Which library should it be and can you point me in the direction(website/github) of it?
@Zexaron

Zexaron commented Apr 22, 2017

Hello

While I personally cannot answer those questions, I'll just say that I'm happy about your eagerness and appreciate it; I hope you succeed in your cause. In fact, this niche* is in so much need of HW support.
(* EDIT: referring to the encoding sector in general)

However I have a few questions myself to clear something out, if possible:

  • How do you see this affecting the final performance of an encode ? Somewhat, Moderate, Significant ?
  • Would this apply only to certain components of the encoding process, such as filters ?
  • Will it work for both x264 and x265 codecs ? If not, which one would likely be chosen?
  • Possible for all GCN architectures ? If not, what is the lowest GCN version it will support ?
  • Would it be possible for the CPU to help the GPU, or would this replace CPU from the main encoding work ?
@Xaymar

Xaymar commented Apr 22, 2017

How do you see this affecting the final performance of an encode ? Somewhat, Moderate, Significant ?

It really depends on the GPU in the system. AMD's HW isn't exactly standardized when it comes to VCE, since it is technology that was first introduced in TeraScale (remember ATI?) cards and has only been slightly updated since then. An RX 480 (Polaris) can real-time encode 4k at 35 FPS, an R9 285 (Tonga) can encode 4k at 83 FPS. I've listed all the possible real time encoding frame rates here with the help of users:

Would this apply only to certain components of the encoding process, such as filters ?

AMD AMF can take over conversion from different color formats, resizing, encoding, decoding and probably more I don't remember at the moment.

Will it work for both x264 and x265 codecs ? If not, which one would likely be chosen?
Possible for all GCN architectures ? If not, what is the lowest GCN version it will support ?

H264/AVC, H264/SVC are supported by TeraScale, GCN1, GCN2, GCN3 and GCN4. H265/HEVC is supported by GCN4. There is supposedly support for VP9 decoding on GCN4, but accessing it has been difficult.
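The support list above can be written down as a small lookup. This is only a sketch of the statement in this comment, not anything from the AMF SDK: the enum names and the vce_supports function are made up for illustration, and VP9 is left out because only decoding was mentioned.

```c
#include <stdbool.h>

/* Hardware generations as named in the comment above (illustrative names). */
typedef enum {
    GEN_TERASCALE,
    GEN_GCN1,
    GEN_GCN2,
    GEN_GCN3,
    GEN_GCN4
} gpu_gen_t;

/* Codecs discussed for encoding in this thread (illustrative names). */
typedef enum { CODEC_H264, CODEC_H265 } codec_t;

/* Returns true if the generation can hardware-encode the codec,
 * following only the support list stated in this thread. */
bool vce_supports(gpu_gen_t gen, codec_t codec)
{
    switch (codec) {
    case CODEC_H264:
        /* H264 is listed for every generation from TeraScale to GCN4. */
        return gen >= GEN_TERASCALE && gen <= GEN_GCN4;
    case CODEC_H265:
        /* H265/HEVC is listed for GCN4 only. */
        return gen == GEN_GCN4;
    }
    return false;
}
```

A frontend could use a check like this to hide the H265 option on older cards, although a real implementation would presumably ask the driver rather than hard-code a table.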

Would it be possible for the CPU to help the GPU, or would this replace CPU from the main encoding work ?

This replaces the CPU encoding completely. It is a hardware encoder after all. The quality of the encode largely depends on user settings (default hovers between x264 superfast and x264 veryfast) but users of the plugin have discovered ways to push it to quality levels of x264 fast and higher.

@Zexaron

Zexaron commented Apr 23, 2017

Understood, so it depends on VCE version.

Wow, the speeds that you have presented that would be achievable ... I kinda don't want to celebrate too early just in case, and rather leave room for others to comment on that.

Personally I wouldn't mind no VP9 support (it's more meant for streaming, and some people say it's missing some stuff older codecs had). For the past few months I've been encoding 1080p into H265, and it runs at around 20 FPS under my quality settings with a Core i7 3820 at 4.0 GHz (turbo).

I have an RX480 ... and it's sitting there doing nothing in this case.

Unfortunately, I cannot say anything definite, I'm just an observer around here. I now recall that Handbrake is actually only a GUI on top of other open-source underlying projects, encoders, decoders, etc. I think it may be called "libav.."something"".

I hope that wouldn't mean some big setback for this cause, or your further interest. Fingers crossed.

  • Okay, the GPU would take over the main encoding job, but is there any way a CPU could help with some of the misc things (without creating a bottleneck situation)?
@bradleysepos

Contributor

bradleysepos commented Apr 23, 2017

Handbrake is actually only a GUI on top of other open-source underlying projects

We use really good tape.

But seriously, there's a lot more to it than that. HandBrake has a lot going on under the hood.

@bradleysepos

Contributor

bradleysepos commented Apr 23, 2017

@Xaymar Thanks very much for your consideration.

It's important to the project that people don't simply drop code and drop out (see: our OpenCL implementation). If you're willing to support your contributions with bug fixes and other considerations down the line, this should be a nice addition. 😸

I'll let the other developers chime in with specifics.

@Xaymar

Xaymar commented Apr 23, 2017

Okay, the GPU would take over the main encoding job, but is there any way a CPU could help with some of the misc things (without creating a bottleneck situation)?

A CPU will most likely bottle-neck any dedicated HW on the GPU, most of the time it already does in the plugin for OBS Studio. I implemented OpenCL Transfer (OpenCL DMA frame copy) to reduce and stabilise latency issues on CPU->GPU transfers as otherwise 4k real time encoding is next to impossible. This is obviously different per CPU, but so far no Intel CPU and no AMD CPU on Bulldozer or earlier has managed this. Ryzen seems to be completely unfazed by the amount of data that needs to be pushed to the GPU, which is why Host transfer on Ryzen works better than OpenCL transfer.

It's important to the project that people don't simply drop code and drop out (see: our OpenCL implementation). If you're willing to support your contributions with bug fixes and other considerations down the line, this should be a nice addition.

The plan is to use either the same or a similar backend to the one I use in the obs-studio plugin, so that fixes for bugs appear in both at the same time. The 2.0.0 backend has received a big makeover that makes supporting it much easier as it uses an inheritance model instead of duplicating logic. There's still places to make it better, but this is at least a pretty good start to work off of. Doubt there's any reason for me to do this hit & run style, it actively benefits me too.

I'll let the other developers chime in with specifics.

I'll be waiting then.

@Zexaron

Zexaron commented Apr 23, 2017

@bradleysepos

I was praying to be wrong but I honestly didn't expect it. Obviously I'm happy now, but at that moment I felt like I should crawl into a corner and shut up. lol.

@Xaymar

All right, so I shouldn't expect a general purpose CPU to be able to help with anything. In this case it's all good: we would actually be able to freely do other things while it's encoding in the background (web, desktop misc, etc.) as it's taking the weight off the CPU. 2 flies with 1 stone.

This brings me to another thing, not that I'm so hungry and I wouldn't be satisfied with just this, but what about combining GPUs together, including the integrated ones on APUs, or I'm just mixing that with DirectX12 and such capability is not possible in AMF ?

Sorry that I keep asking things that may be obvious to actual developers, it may however clear things out for a better overview for other readers as well.

@sr55

Member

sr55 commented Apr 23, 2017

Do I implement AMF support in Handbrake or should that be in a library itself?

If you want to write a patch, we are definitely open to it if it's done properly.

If former: Where should the implementation be? Should it be optional (not required)?

Our general preference would be for this to be supported in libav as we already have a wrapper for some of their encoders that we don't already hook directly.

The plan is to use either the same or a similar backend to the one I use in the obs-studio plugin

Low maintenance is key here. We the HandBrake developers don't make heavy use of hw encoders, partly due to a lack of supported hardware and partly due to trade-offs.

If latter: Which library should it be and can you point me in the direction(website/github) of it?

From what I understand, MediaSDK isn't being maintained by AMD anymore which would cross that off the list.

Leaving AMF. As long as the license is GPL compatible and the library is compatible with our infrastructure I don't really have an issue. It looks like AMF may be a visual studio project without a way of building it on linux out the box. (We cross-compile from linux using MinGW-w64) so that may be an issue?

As far as the AMD patch went, our requirement was that it made it into libav so we could easily hook it. However, it seems to have fallen apart there and never made it into their code base, and they appear to have disappeared.

I think before moving forward and you spending a lot of time, maybe a good idea to put together a rough plan we can sign off on.

@Xaymar

Xaymar commented Apr 23, 2017

... this to be supported in libav ...

Supporting it in ffmpeg or libav should be easy-ish. I much prefer working with ffmpeg as I already have a complete development environment set up for it. Seeing that libav apparently treats ffmpeg as if it didn't exist (bad blood between developers? Dunno, don't care) and that most distros are switching back to ffmpeg, it seems that ffmpeg would be the better choice, or else including it through a third-party library (the backend I spoke of).

... MediaSDK isn't being maintained by AMD anymore ... Leaving AMF. As long as the license is GPL compatible and the library is compatible with our infrastructure ...

AMD Media SDK is the old, dated version that supported ancient cards but was riddled with bugs and unexpected behaviour, which could partially be solved by installing the AMD Media Package, which added an even larger library overhead. This is the SDK I started off working with in the plugin (and this was later credited as one of the reasons why AMD Encoding is back on their roadmap, which resulted in the AMD AMF SDK).

AMD AMF SDK is the newer version, which fixes many of the inconsistencies in the library and actually moves the support from the application to the driver itself. Thanks to this, applications that use the API are massively smaller to distribute and build, and easier to code too.

The new SDK is also licensed under MIT/X11 which should be compatible with GPL, so there should be no issues there.

It looks like AMF may be a visual studio project without a way of building it on linux out the box. (We cross-compile from linux using MinGW-w64) so that may be an issue?

The examples are indeed Visual Studio projects, but the library headers themselves are pure C/C++, so there should be no issues unless MinGW does not support stdcall, no-vtable attributes and similar. It is important to remember that AMF support is currently Windows only; Linux has VAAPI and VDPAU for this. Support may be coming in the future, but it is currently uncertain.

... maybe a good idea to put together a rough plan we can sign off on.

Yes that would be great. My initial plan was:

  1. Extract the backend code from the obs-studio plugin and convert it to its own library so that fixes can be implemented for both tools at mostly the same time. The ideal goal would be a header-only library, which removes the need to compile an intermediate library and allows it to be included directly.
  2. Write a Handbrake encoder frontend and UI that handles the direct interaction between the two, keeping UI, frontend and backend separate enough that finding issues becomes an easy task. This frontend would be using the library from 1.
  3. Have a working version by the end of H2 2017 or earlier, depending on how well the work goes.

This plan however requires adding support directly to Handbrake instead of adding it to ffmpeg or libav.

@sr55

Member

sr55 commented Apr 23, 2017

https://github.com/HandBrake/HandBrake/blob/master/libhb/encx265.c is an example of us hooking libx265

Outside of this, the encoder is simply registered in a number of places so that the UIs can query libhb for the available encoders and the information required to set up the GUIs (hb_encoder_internal_t hb_video_encoders[] =).

Linux, Mac GUIs and CLI will automatically pick them up, whilst the WinGUI will need some minor tweaks to register it. Given it's platform specific, we'll probably need an is_available function to allow the UIs to detect when it's not available and not show it. We already have that for QuickSync.

Following the same kind of pattern is preferable. (There are a few encoders there enc*.c). I'd have a dig around to see how it all hangs together. Using this approach allows all the existing filters and a/v sync code etc to all work as it does today.
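The registration-plus-availability idea described above could look roughly like the sketch below. None of this is HandBrake's actual code: encoder_entry_t, the encoder names, amf_runtime_present and both functions are hypothetical stand-ins, and the real table (hb_video_encoders) has a different layout. The hardware probe is stubbed out as a plain flag so the example stays portable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical registry entry; NOT libhb's real hb_encoder_internal_t. */
typedef struct {
    const char *name;
    bool (*is_available)(void);   /* NULL means "always available" */
} encoder_entry_t;

/* Stand-in for a real probe (e.g. "does any GPU support the encoder?").
 * A real implementation would query the driver; here it is just a flag. */
bool amf_runtime_present = false;

bool amf_is_available(void)
{
    return amf_runtime_present;
}

/* Software encoders are always listed; hardware ones carry a probe. */
static const encoder_entry_t encoders[] = {
    { "x264",     NULL },
    { "x265",     NULL },
    { "vce_h264", amf_is_available },
    { "vce_h265", amf_is_available },
};

/* Counts the encoders a UI should show on this machine. */
size_t count_available_encoders(void)
{
    size_t total = sizeof(encoders) / sizeof(encoders[0]);
    size_t shown = 0;
    for (size_t i = 0; i < total; i++) {
        if (encoders[i].is_available == NULL || encoders[i].is_available())
            shown++;
    }
    return shown;
}
```

With a layout like this the UI never needs platform checks of its own; it only asks the library which entries pass their probe, which seems to be the point of the is_available suggestion.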

@Xaymar

Xaymar commented May 1, 2017

Sorry for the delay, it was a busy week.

Given it's platform specific, we'll probably need an is_available function to allow the UI's to detect when it's not and not show it.

That should be entirely possible. I have something similar in the obs-studio plugin which tests if any GPU even supports it and if not disables itself again.

Following the same kind of pattern is preferable. (There are a few encoders there enc*.c)

I think it would be possible to do this, though I'd much prefer the amf.c, amf_h264.c and amf_h265.c names - that would immediately tell a person working on it that these files work together. However I'll adjust it to whatever is requested.


Anyway, there has been some progress on the AMF end. There is now a C API (or at least something that looks like one) which potentially could help reducing the necessary libraries if we do decide to use it.

As for the backend, it is practically done. There are some design choices which could have been done better, but there is no point changing that now. I've spent a little bit squashing remaining bugs and adding the last requested features, which essentially makes it feature complete for now. Not bug free, but at least feature complete.

@sr55

Member

sr55 commented May 1, 2017

You'll see for QSV we have enc_qsv.c followed by qsv_* for various bits and pieces it needs. I wasn't suggesting they all be enc_*. Basically, enc_ is the libhb interface wrapper to qsv*.

@jimster480

jimster480 commented May 1, 2017

Awesome!
I really hope to see this support come to Handbrake.

I'd be willing to test & debug any versions as necessary, I have a 6670, RX 470, RX460, Fury (Non-X) and I happen to be a C/C++ low level software developer!

@Xaymar

Xaymar commented Jun 10, 2017

A bit of an update on the situation here: I've prioritized integration into ffmpeg so that it can be done in H2 of 2017, which also gives Handbrake access to it through ffmpeg's libraries.

@jimster480

jimster480 commented Jun 10, 2017

@Xaymar

Xaymar commented Aug 19, 2017

Information Update: ffmpeg integration is technically done for H264 and H265. For sources including build information and more take a look at: https://github.com/Xaymar/ffmpeg-amf .

What's left now is:

  • Formatting - Ensure that the code base matches the coding guidelines
  • Automated testing (scripted) - Ensure that the code actually works as expected in all tested situations
  • Improve code quality for use by libraries.

A PR to ffmpeg should happen within this month.

@iGamerHD

iGamerHD commented Sep 16, 2017

Hi, I have an R9 290 and would like to know if there is anything new that we should know?
Is it already possible to use the hardware acceleration in Handbrake?

@sr55

Member

sr55 commented Sep 16, 2017

Not on AMD cards yet. If you have a modern Intel CPU (i.e. 4th gen or later), then you can use QuickSync if you enable the onboard GPU as well.

@iGamerHD

iGamerHD commented Sep 16, 2017

Okay thank you anyway. I have an Ivy Bridge CPU (3rd gen) which supports intel QSV for H264 but i would like to compare it to native AMD encoding.

@mirh

mirh commented Sep 16, 2017

Quicksync is there since SB tbh.

@sr55

Member

sr55 commented Sep 16, 2017

It was, but it's best if you have a newer (preferably 4th gen or later) CPU with it. They were making substantial leaps in improvements back then.

@mudkipme

mudkipme commented Sep 16, 2017

As a user of a MacBook Pro with a dedicated graphics card, I find it difficult to access the integrated Intel graphics in Boot Camp, and HandBrake doesn't support hardware encoding on macOS. So AMD hardware encoding is my only hope.

Looking forward to AMD VCE support in HandBrake 👍

@sr55

Member

sr55 commented Jun 9, 2018

On a side note, pretty sure constant quality isn't working properly. Will need to look into that.
ABR is working fine, as is 2-pass (as far as I can tell). Not very good at hitting its target but not massively out.

@lhorace

lhorace commented Jun 11, 2018

@sr55

I believe you're mistaken when you say the use case is mostly going to be limited to lower end hardware, especially for someone who does YouTube videos.

You're going to need a Radeon Pro version or Vega for constant quality to work properly. It's not about FPS, it's about time, at least for me. You have GCN1, where the VCE is quite poor. The VCE 3.1 I have is much better (I believe it's 3.1 or 3).

RX480

@jimster480

jimster480 commented Jun 11, 2018

@kroppt

kroppt commented Jun 11, 2018

I tested it with R9 390, which is GCN 2, and I got double the framerate (30's to 70's) for typical H.264 encode compared with i7-4790K x264 encoding. Of course, what I really want is H.265 encoding :(

@MathewCNichols

MathewCNichols commented Jun 11, 2018

Tested with a Phenom II X4 945 and RX 550. (Yes I know this is a huge bottleneck!)

Using H.264 (x264), it would encode at about 1:1 speed, 5% GPU utilization, maintaining approximately 20 FPS. Using H.264 (AMD VCE), it encodes at about 10:1 speed, 75% GPU utilization, maintaining approximately 60 FPS.

The RX 550 being only entry level is still GCN 4 & VCE 3.4. The bottleneck in my CPU probably inflates the difference in my results.

For now I'll take a hit in video quality for solid FPS, speed, and less energy consumed!

Thank you so much for all of your work!

@Roph

Roph commented Jun 11, 2018

@MathewCNichols Your 550 can also encode H265, you may want to try it out.

AMD's H264 encoding is quite poor compared to nvidia's hardware NVENC encoder's offerings. The H265 from AMD/Nvidia is much closer in bitrate:quality however.

Also, RX 400/500 and vega VCE encodes H265 much faster than it can do H264. Your CPU shouldn't be a bottleneck if you're also hardware decoding and hardware processing (such as downsampling). I don't know if handbrake can do this but it's possible to do the entire process in hardware with "A's Video Converter".

@sr55

Member

sr55 commented Jun 11, 2018

@lhorace read my post again ;)

I tested with a Radeon Pro 460 4GB. That's a Polaris GCN 4, VCE 3.4 part.
I also compared that against a much older GCN 1 7850 2GB.

Updated post above to make it clearer. Also added some numbers from a 4770K for comparison.

So any desktop with Ryzen 5, 7, or an Intel quad core Haswell or newer is going to be close to, or outperform, VCE on the most common Polaris based cards.

Gets a little more dicey on laptops with low TDP quads etc. Makes more sense there, or lower end hardware in general. So I don't think I'm being unfair.

Based on the numbers I'm seeing, my guess is the design goals for the hardware were:

at least:
1080p120
4Kp60

In real time.
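To put those two guessed targets on the same scale, the pixel rate each one implies can be worked out as below. This is just arithmetic on the numbers above; pixel_rate is a made-up helper, and "4K" is assumed to mean 3840x2160 (UHD), which the comment does not state.

```c
#include <stdint.h>

/* Pixels per second the encoder must process to keep up in real time
 * with a width x height stream at the given frame rate. */
uint64_t pixel_rate(uint32_t width, uint32_t height, uint32_t fps)
{
    return (uint64_t)width * height * fps;
}
```

Under that assumption, pixel_rate(1920, 1080, 120) is 248,832,000 pixels per second and pixel_rate(3840, 2160, 60) is 497,664,000, so the 4Kp60 target would need exactly twice the throughput of 1080p120.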

@kroppt and others -> Remember, HandBrake by default uses slower x264 settings to prefer quality / filesize over raw speed. You need to adjust the preset on the video tab to "superfast" for roughly equivalent results.

@MathewCNichols Try removing decomb if it's turned on. If your source doesn't need it, it'll remove some of the bottleneck.

I need to confirm, but I would expect VCE 3.4 to perform the same on all cards within the family if it's a true ASIC.

I don't unfortunately have any Vega hardware to test.

I'll do a bit more digging into H.265. Initial impression is that it's a substantially better implementation than their H.264 encoder. (Not the quickest, but the quality is substantially better, and I suspect not all of that is codec improvement.)

@jimster480

jimster480 commented Jun 11, 2018

@sr55

Member

sr55 commented Jun 11, 2018

If you have VEGA, I'm interested in numbers from it for balanced and quality, no filters.
I expect they'll be the same as Polaris but it'd be nice if I was wrong.

@Roph

Roph commented Jun 11, 2018

Vega has much higher throughput over polaris (fps for a given resolution/quality), though iirc, quality:bitrate is the same.

@jimster480

jimster480 commented Jun 11, 2018

@jimster480

jimster480 commented Jun 11, 2018

@bradleysepos

Contributor

bradleysepos commented Jun 11, 2018

You’ll need to select the appropriate encoder instead of x265 on the video tab.

@sr55

Member

sr55 commented Jun 11, 2018

image

Usually start with the Fast 1080p30,
Disable decomb,
Change to the VCE encoder of choice.

1080p source such as "big buck bunny"

@jimster480

jimster480 commented Jun 11, 2018

@jimster480

jimster480 commented Jun 11, 2018

@jimster480

jimster480 commented Jun 11, 2018

@lhorace

lhorace commented Jun 12, 2018

@sr55 I encode with VCE, 30000 bit rate, 1 hour video, no quality lost visually at 30 minutes

I have AMD because I only use AMD hardware and I haven't tested Ryzen Chips. AMD FX-8320E Eight-Core Processor, 5 hours at superfast, 3.5 hours at very fast.

@jimster480

jimster480 commented Jun 12, 2018

@lhorace

lhorace commented Jun 12, 2018

Because I am recording video for Youtube, it's always been H.264 at 30000 bit rate. As I've said, with CPU transcoding, 1080P or 1440p, it's from 3.5 to 5 hours (CPU) versus 30 minutes hardware encoding

@jimster480

jimster480 commented Jun 12, 2018

@lhorace

lhorace commented Jun 12, 2018

I record at that higher bit rate because when you upload to YouTube, YouTube also transcodes, and if you don't give YouTube the best, it will look ugly.

@lhorace

lhorace commented Jun 12, 2018

@jimster480 You transcode H.264 to HEVC via CPU. 1 hour video, 1080P or 1440P @ 20 minutes? @ 10000 bit rate?

@lhorace

lhorace commented Jun 12, 2018

Is this 4,000 Intel i9 cpu?

@bradleysepos

Contributor

bradleysepos commented Jun 12, 2018

Just a heads up to all, there are 16 participants on this issue. It's easy to write Tweet-length comments on GitHub, but please keep in mind you're likely sending 15 emails every time you do.

@lhorace

lhorace commented Jun 12, 2018

I've personally been following this issue for the last 6 months; every email is how I track it. It's just as important to me. With that said, it seems to me people, after 6 months of effort, how useful this feature is. I like to make sure that this feature is very important to me.

@jimster480

jimster480 commented Jun 12, 2018

@jimster480

jimster480 commented Jun 12, 2018

@bradleysepos

Contributor

bradleysepos commented Jun 12, 2018

Indeed, the feature is welcome and important, which is why I fixed and merged it. 😸

Perhaps I should have been more specific: In the future, consider condensing multiple one-liners into single posts so those of us who get emails for all issues and pull requests don't have to unsubscribe from the entire thread to limit noise.

Thanks for the benchmarks as requested. Let's move further benchmarks and discussions to the HandBrake Community Forums.

@HandBrake HandBrake locked as resolved and limited conversation to collaborators Jun 12, 2018
