Capturing window rather than desktop? #7
Comments
I agree with Alex. Most people intend to use this for AI work, and being able to run it on just a specific window is essential; I can't do any other work while it's running this way. An alternative, for now, would probably be to make the window small and crop out the region that contains the game.
Apologies for the late reply. D3DShot uses the Desktop Duplication API through Direct3D. It's the fastest way to capture, but it only supports full-display capture plus cropping. It's built for the absolute best speed at the expense of utility. If you fall back to BitBlt (using a library like mss, for example) you'll gain some utility back at the expense of capture speed. It's not a minor drop either. In my benchmarks, BitBlt can only get 30% of the FPS of the Desktop Duplication API (that is with full-display capture; I assume BitBlt gets faster when capturing individual windows). BitBlt also has important problems (for my use case, at least) with some graphics APIs. For example, you can't give it the handle of a window that uses OpenGL, and you can't capture exclusive fullscreen applications; these all fail to capture. Thanks for the question on how OBS handles window capture. I looked into it and found that they use a new Windows API (WindowsGraphicsCapture). I can start looking into it to see if there is an opportunity to hook into it from Python, measure the performance and evaluate the overall quality. I do understand the use case for window capture. I built D3DShot specifically to replace mss in SerpentAI, and technically I only care about one window too: the game window. The only difference is that for me, cropping was sufficient since I need the window focused to send inputs to the game anyway.
I'm using it for a little program that captures a 700x700 window. My monitor resolution is 2560x1440, and I get around 20 fps with D3DShot vs. more than 60 with targeted BitBlt, using the default example code. mss was even worse of course, IIRC around 5 fps, since it copies the full screen and then crops. Hopefully, if you integrate WindowsGraphicsCapture, we can have our pie and eat it too. That said, I use a subsection of a 1280x1440 window, and I'm not sure what WGC's performance would be, since I don't know whether it grabs the entire window and then crops or captures only the subsection.
I'm getting 58 fps at the same resolution for fullscreen captures. You have to use the "numpy" capture output to get good speed, as shown in the Performance section of the README. PIL is the default capture output because it's a lighter dependency and easier to use for casual users, but it's about 3 times slower (it's still adequate for everything Are you benchmarking your BitBlt FPS as time-to-numpy-array? In my tests, the only scenario where I've seen BitBlt be faster is when you provide a That being said, if it's faster and it captures correctly for your use case, I don't see why you wouldn't use that. I would. Just make sure that all your potential capture targets work with BitBlt.
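To make "time-to-numpy-array" comparisons like the one above concrete, here is a minimal timing sketch. The `grab` callable is a hypothetical stand-in for whatever capture call is being measured (a D3DShot screenshot, a BitBlt wrapper, and so on); it is not part of any library. The point is only that `grab` should return the frame in its final form, so that conversion cost is counted in the result.

```python
import time


def measure_fps(grab, frames=100):
    """Call grab() `frames` times and return the average frames per second.

    `grab` is a hypothetical zero-argument callable. It should return the
    frame in the format the application consumes (for example a numpy
    array), so that any conversion time is included in the measurement.
    """
    start = time.perf_counter()
    for _ in range(frames):
        frame = grab()
        if frame is None:
            # Treat a missing frame as a failed capture instead of
            # silently counting it towards the FPS figure.
            raise RuntimeError("capture returned no frame")
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return frames / elapsed


if __name__ == "__main__":
    # Stand-in capture function, used only to exercise the harness.
    def fake_grab():
        time.sleep(0.001)
        return b"\x00" * 16

    print(f"{measure_fps(fake_grab, frames=50):.1f} fps")
```

Running both capture paths through the same harness, with the same output format and region, keeps the two numbers comparable.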
The test was a while ago (I had to install Python x64) and IIRC it was numpy. I remember reading the docs saying so. IIRC I didn't test the threaded workload, I believe I just called The BitBlt result was a PIL Image, but that actually slows it down because the RGBX vs. BGRX formatting makes it do a full memcpy; in my app I'm planning on removing PIL at some point and making it pure numpy, keeping the BGRX data format. I'll install it now and do a retest 🍡
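The channel-order cost mentioned above can be illustrated with a small pure-Python sketch. It assumes the captured buffer stores each pixel as blue, green, red, padding (4 bytes per pixel), which is the usual layout for 32-bit Windows bitmaps; the function name is made up for this example. Reordering channels means every pixel has to be rewritten into a new buffer, which is the full copy being described.

```python
def bgrx_to_rgb(buf: bytes) -> bytes:
    """Convert a packed BGRX buffer (4 bytes per pixel) to packed RGB.

    Assumption: bytes are ordered B, G, R, padding for each pixel.
    """
    if len(buf) % 4:
        raise ValueError("BGRX buffer length must be a multiple of 4")
    pixels = len(buf) // 4
    rgb = bytearray(pixels * 3)
    # Each strided slice assignment copies one channel for every pixel.
    rgb[0::3] = buf[2::4]  # red
    rgb[1::3] = buf[1::4]  # green
    rgb[2::3] = buf[0::4]  # blue
    return bytes(rgb)
```

Keeping the data in its native byte order and indexing channels accordingly avoids this extra pass.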
I'm getting 55-58 fps on both numpy and PIL. No idea why I was getting poor performance before. I went through all my Discord logs and I was getting 20 ms/frame == 50 fps, not 20 fps as I stated earlier. Now I'm beginning to remember why I decided against using D3DShot: the plan is to run this on peasant CPUs, in a single-process setup. When running it in a thread, (using I was wondering how the time breaks down (in particular, how much is spent waiting for vblank) when running D3DShot.screenshot(). If 99% of the time is just waiting for vblank, and the actual capture -> numpy array takes 1 ms, that leaves 15 ms for the rest of the "threads" to do image processing. I think I assumed (perhaps incorrectly?) that there was no non-blocking code and the copy from memory took the full 16 ms, so even using threading.Thread would cripple the image processing thread unless I started using multiprocessing.Process. It'd be interesting if you had stats on how much of the time is spent waiting for vblank. I might do some further testing now that it's getting 58 fps, as I was immediately unsatisfied with 50 fps. My BitBlt code takes 2 ms to capture the 700x700 region, leaving about 14 ms for processing. Once I remove the BitBlt -> PIL conversion it will be <1 ms. (Yes, it runs in a thread of course, but all threads are in the same process...) Your statement about only native win32 applications is 100% correct; users weren't able to capture Streamlabs OBS (it just shows as blank). Fortunately most users use it to capture an NES emulator window or OBS, and it works fine for those use cases.
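The frame-time arithmetic in the comment above can be checked with a couple of one-line helpers. This is only a sketch of the budget calculation; the function names are made up for illustration, and the 60 fps target is an assumption matching the 16 ms figure quoted.

```python
def fps_from_frame_time(frame_ms: float) -> float:
    """Frames per second for a given time per frame in milliseconds."""
    return 1000.0 / frame_ms


def processing_budget_ms(target_fps: float, capture_ms: float) -> float:
    """Milliseconds left per frame for processing after the capture itself."""
    return 1000.0 / target_fps - capture_ms


if __name__ == "__main__":
    # 20 ms per frame is 50 fps, not 20 fps.
    print(fps_from_frame_time(20.0))
    # A 2 ms capture at a 60 fps target leaves roughly 14.7 ms for processing.
    print(round(processing_budget_ms(60.0, 2.0), 1))
```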
I realise I could write my own wrapper that uses win32 to identify a window's rect/monitor and then call D3DShot's API with the right settings, however two cases aren't handled:
Windows BitBlt handles this fine. I was wondering if D3D can handle this (and it's just a matter of implementation) or if D3D can only do full-screen capture followed by cropping?
How does OBS Studio handle window capture? BitBlt or D3D?
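As a rough sketch of the wrapper idea described above, the helper below turns a window rectangle into a crop region relative to one monitor. The function name and the `(left, top, right, bottom)` tuple convention are assumptions made for this example; in a real wrapper the rectangles might come from something like pywin32's `win32gui.GetWindowRect`, and the result would be passed as the crop region to the capture call. It only covers the coordinate math, not the capture itself.

```python
def window_region_on_monitor(window_rect, monitor_rect):
    """Return the part of a window that lies on a monitor, monitor-relative.

    Both rects are (left, top, right, bottom) in virtual-desktop coordinates.
    Returns None when the window does not overlap the monitor at all.
    """
    wl, wt, wr, wb = window_rect
    ml, mt, mr, mb = monitor_rect
    # Clamp the window rectangle to the monitor bounds.
    left, top = max(wl, ml), max(wt, mt)
    right, bottom = min(wr, mr), min(wb, mb)
    if right <= left or bottom <= top:
        return None
    # Shift into coordinates relative to the monitor's top-left corner.
    return (left - ml, top - mt, right - ml, bottom - mt)


if __name__ == "__main__":
    primary = (0, 0, 2560, 1440)
    # A 700x700 window fully inside the primary monitor.
    print(window_region_on_monitor((100, 100, 800, 800), primary))
    # A window hanging off the right and top edges gets clamped.
    print(window_region_on_monitor((2500, -20, 3200, 680), primary))
```

A window that straddles two monitors yields a partial region on each, so a display-based capture would need one capture per monitor to cover it.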