This repository has been archived by the owner on Aug 23, 2022. It is now read-only.

Capturing window rather than desktop? #7

Open
alex-ong opened this issue Aug 31, 2019 · 8 comments
Labels
enhancement New feature or request

Comments

@alex-ong

alex-ong commented Aug 31, 2019

I realise I could write my own wrapper that uses win32 to identify a window's rect/monitor and then calls D3DShot's API with the right settings; however, two cases aren't handled:

  1. if the window moves (I think I'd write a wrapper to check for the window moving and then reinit D3DShot with the right arguments)
  2. if there is another window covering the target window.

Windows bitblt handles this fine. I was wondering whether d3d can handle this too (so it's just a matter of implementation), or whether d3d can only do full-screen capture followed by cropping?

How does OBS studio handle window capture? Bitblt or d3d?
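For reference, the wrapper idea above boils down to converting a window rectangle into a capture region on the monitor that contains it. Below is a minimal sketch of that conversion only. `Rect` and `window_to_region` are hypothetical names, not part of D3DShot. The window rectangle is assumed to come from something like win32 `GetWindowRect`, and the resulting tuple is assumed to be usable as a `(left, top, right, bottom)` capture region. It does not address the second case (another window covering the target).

```python
from typing import NamedTuple, Optional, Tuple


class Rect(NamedTuple):
    """A rectangle in virtual-desktop pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def window_to_region(window: Rect, monitor: Rect) -> Optional[Tuple[int, int, int, int]]:
    """Clip the window to the monitor and return (left, top, right, bottom)
    relative to the monitor's top-left corner, or None if nothing is visible."""
    left = max(window.left, monitor.left)
    top = max(window.top, monitor.top)
    right = min(window.right, monitor.right)
    bottom = min(window.bottom, monitor.bottom)
    if right <= left or bottom <= top:
        return None  # window lies entirely outside this monitor
    return (
        left - monitor.left,
        top - monitor.top,
        right - monitor.left,
        bottom - monitor.top,
    )


# Example: a 700x700 window on a 2560x1440 primary monitor.
print(window_to_region(Rect(100, 200, 800, 900), Rect(0, 0, 2560, 1440)))
```

Handling a moving window would then mean polling the window rectangle and recomputing the region whenever it changes.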

@parsarahimi

I agree with Alex, most people intend to use this for AI stuff and being able to run it on just a specific window is essential, can't do any work while it's running this way. An alternative, for now, would probably be to make the window small and crop out the region that has the game.

@nbrochu
Member

nbrochu commented May 28, 2020

Apologies for the late reply.

D3Dshot uses the Desktop Duplication API through Direct3D. It's the fastest way to capture, but it's only full display + cropping. It's built for the absolute best speed at the expense of utility. If you fall back to bitblt (using a library like mss for example) you'll gain some utility back at the expense of capture speed. It's not a minor drop either. In my benchmarks, bitblt can only get 30% of the FPS of the Desktop Duplication API (with full display capture, bitblt will get faster when capturing windows I assume).

bitblt also has important problems (for my use case, at least) with some graphics APIs. For example, you can't give it the handle of a window that uses OpenGL, and you can't capture exclusive full-screen applications; these will all fail to capture.

Thanks for the question on how OBS handles window capture. I looked into it and found that they use a new Windows API (WindowsGraphicsCapture). I can start looking into it to see if there is an opportunity to hook it in Python, measure the performance and evaluate the overall quality.

I do understand the use case for window capture. I built D3DShot specifically to replace mss in SerpentAI and technically I only care about 1 window too: the game window. The only difference is that for me, cropping was sufficient since I need the window focused to send inputs to the game anyway.

@nbrochu
Member

nbrochu commented May 28, 2020

#22

@alex-ong
Author

> It's the fastest way to capture, but it's only full display + cropping. It's built for the absolute best speed at the expense of utility.

I'm using it for a little program that captures a 700x700 window. My monitor res is 2560x1440, and i get ~20 fps with D3DShot vs >60 with targeted BitBlt, using the default example code. mss was even worse of course, iirc around 5 fps, since it copies the full screen and then crops.

Hopefully if you integrate WindowsGraphicsCapture, we can have our pie and eat it too, though i use a sub-section of a 1280x1440 window; i'm not sure what WGC's performance would be, since i'm not certain whether it grabs the entire window and then crops or captures just the sub-section.

@nbrochu
Member

nbrochu commented May 30, 2020

I'm getting 58fps at the same resolution for fullscreen captures. You have to use the "numpy" capture output to get good speed as shown in the Performance section of the README.

PIL is the default capture output because it's a lighter dependency and easier to use for casual users, but it's about 3 times slower (it's still adequate for everything except capture()). The README is massive, so maybe people aren't thoroughly reading it and are drawing the wrong conclusions about the speeds that can be achieved. I should probably raise a warning when someone tries to use capture() with PIL.

Are you benchmarking your bitblt FPS as time-to-numpy-array? In my tests, the only scenario I've seen bitblt be faster is when you provide a hwnd of a smaller window. That's still not my main issue with bitblt. It can't do OpenGL or DirectX client areas and things are worse in Windows 10 with the updated DWM. In OBS Studio, if I set a window capture to use bitblt, it only works correctly with native win32 applications. Anything else (Electron, Qt, WPF etc.) was a black screen. The only solution to get them back with bitblt is fullscreen + crop and you are back in sub 20 FPS land again.

That being said, if it's faster and it captures correctly for your use case, I don't see why you wouldn't use that. I would. Just make sure that all your potential capture targets work with bitblt.
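To compare capture methods on equal terms, one option is to time the whole path from the capture call to the final array format. The sketch below is a generic timing loop, not D3DShot's own benchmark. `measure_fps` and `fake_grab` are hypothetical names, and `fake_grab` is only a stand-in for a real capture call that returns the frame in its final format.

```python
import time
from typing import Callable


def measure_fps(grab: Callable[[], object], n_frames: int = 100) -> float:
    """Call grab() n_frames times and return the average frames per second."""
    start = time.perf_counter()
    for _ in range(n_frames):
        grab()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


def fake_grab() -> bytes:
    # Stand-in for a real capture: sleep ~5 ms, then allocate a dummy
    # 700x700 buffer with 4 bytes per pixel.
    time.sleep(0.005)
    return bytes(700 * 700 * 4)


print(f"{measure_fps(fake_grab, 20):.1f} fps")
```

Passing a callable that includes any PIL or numpy conversion keeps the numbers comparable between libraries.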

@alex-ong
Author

alex-ong commented May 31, 2020

The test was a while ago (i had to install python x64) and iirc it was numpy. I remember reading the docs saying so. IIRC i didn't test the threaded workload, i believe i just called d.screenshot() in a loop and printed out timing.

The bitblt was a PIL Image but that actually slows it down because the RGBX vs BGRX formatting makes it do a full memcpy; in my app i'm planning on removing PIL at some point and making it pure numpy, keeping the BGRX data format.

i'll install it now and do a retest 🍡

@alex-ong
Author

I'm getting 55-58fps on both numpy and PIL. No idea why i was getting poor performance before. I went through all my discord logs and i was getting 20ms/frame == 50fps, not 20fps as i stated earlier.

Now i'm beginning to remember why i went against using D3DShot; the plan is to run this on peasant cpus, in a single-process setup. When running it in a thread (using d.capture()), it's still the same process as everything else, because GIL.

I was wondering what the time breakdown is when running D3DShot.screenshot(), i.e. how much of it is spent waiting for vblank. If 99% of the time is just waiting for vblank, and the actual capture -> numpy array takes 1ms, that leaves 15ms for the rest of the "threads" to do image processing.

I think I assumed (perhaps incorrectly?) that there was no non-blocking code and the copying from memory took the full 16ms, so even using threading.Thread would cripple the image processing thread, unless i started using multiprocessing.Process. It'd be interesting if you had stats on how much of the time is spent waiting for vblank. I might do some further testing now that it's getting 58fps, as i was immediately unsatisfied with 50fps.
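One rough way to estimate that split without profiling the library is to compare wall-clock time against the CPU time of the calling thread: time spent blocked in a wait does not count as thread CPU time. The sketch below shows the measurement only. `wall_and_cpu_ms` and `fake_screenshot` are hypothetical names, and the sleep is a stand-in for a vblank wait. Whether the real wait blocks rather than spins, and whether it releases the GIL, are assumptions this would have to be checked against.

```python
import time
from typing import Callable, Tuple


def wall_and_cpu_ms(call: Callable[[], object], n_calls: int = 50) -> Tuple[float, float]:
    """Return (average wall-clock ms, average CPU ms of this thread) per call."""
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()
    for _ in range(n_calls):
        call()
    cpu_ms = (time.thread_time() - cpu_start) * 1000 / n_calls
    wall_ms = (time.perf_counter() - wall_start) * 1000 / n_calls
    return wall_ms, cpu_ms


def fake_screenshot() -> int:
    # Stand-in for a capture call: a blocking wait followed by a little CPU work.
    time.sleep(0.010)
    return sum(range(10_000))


wall, cpu = wall_and_cpu_ms(fake_screenshot, 10)
print(f"wall {wall:.2f} ms, cpu {cpu:.2f} ms, blocked ~{wall - cpu:.2f} ms")
```

A large gap between the two numbers would suggest most of the call is waiting, which is the case where other threads could plausibly use the remaining frame time.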

My BitBlt thing takes 2ms to capture the 700x700 region, leaving ~14ms for processing. Once i remove the BitBlt -> PIL conversion it will be <1ms. (Yes it runs in a thread ofc but all threads are on the same process...)
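The frame-time figures in this thread follow from simple arithmetic, sketched below assuming a 60 Hz display. `fps_from_frame_ms` and `budget_ms` are hypothetical helper names used only for illustration.

```python
def fps_from_frame_ms(frame_ms: float) -> float:
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


def budget_ms(refresh_hz: float, capture_ms: float) -> float:
    """Milliseconds left per display frame after the capture itself."""
    return 1000.0 / refresh_hz - capture_ms


print(fps_from_frame_ms(20))        # 50.0
print(round(budget_ms(60, 2), 1))   # 14.7
print(round(budget_ms(60, 1), 1))   # 15.7
```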

Your statement about only native win32 applications is 100% correct; users weren't able to capture Streamlabs OBS (it just shows as blank). Fortunately most users use it to capture an NES Emulator window or OBS, and it works fine for those use-cases.

@nbrochu
Member

nbrochu commented May 31, 2020

I don't have solid answers to what you are asking so I created 2 new issues to address that.

#23
#24

Let's move discussion about those aspects there and keep this issue open for window capture.

nbrochu added the enhancement label May 31, 2020