
Make .detect_video more memory efficient #139

Closed
ejolly opened this issue Sep 27, 2022 · 5 comments

@ejolly
Contributor

ejolly commented Sep 27, 2022

@ljchang after chatting with @TiankangXie it looks like we can fairly easily roll our own read_video function, because torchvision also provides a lower-level API with its VideoReader class.

Just like in their examples, we can write a function that wraps the next(reader) calls and returns a generator, so that we load at most batch_size frames into memory on each loop iteration. That way even long videos shouldn't be a problem on low RAM/VRAM machines, and more memory simply allows for bigger batch sizes.
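Something like this minimal sketch (the helper name is hypothetical, and it assumes torchvision was built with video_reader support):

import torch
from torchvision.io import VideoReader

def read_video_batches(path, batch_size):
    """Yield tensors of shape (<= batch_size, C, H, W), one batch at a time."""
    reader = VideoReader(path, "video")
    batch = []
    for frame in reader:  # each item is a dict with "data" and "pts" keys
        batch.append(frame["data"])
        if len(batch) == batch_size:
            yield torch.stack(batch)
            batch = []
    if batch:  # flush any leftover frames at the end of the video
        yield torch.stack(batch)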

The downside of trying to get this to work right now is that torch needs to be compiled with support for it and requires a working ffmpeg install:

*** RuntimeError: Not compiled with video_reader support, to enable video_reader support, please install ffmpeg (version 4.2 is currently supported) and build torchvision from source.
Traceback (most recent call last):
  File "/Users/Esh/anaconda3/envs/py-feat/lib/python3.8/site-packages/torchvision/io/__init__.py", line 130, in __init__
    raise RuntimeError(

So it seems like the real cost of rolling our own solution with VideoReader, until torch offers a more memory-efficient read_video(), is an added dependency on ffmpeg and potentially more installation hassle. Alternatively, we could try a different library or solution for loading video frames. From a brief search on GitHub it looks like there are lots of custom third-party solutions, because this isn't quite a "solved" problem. But most libraries "cheat" a bit IMO, e.g. by expecting that you've pre-saved each frame as a separate image file on disk.

@ejolly ejolly created this issue from a note in Refactor Detection Module (Tasks) Sep 27, 2022
@ejolly ejolly changed the title More memory efficient detect_video Make .detect_video more memory efficient Sep 27, 2022
@maltelueken

Hi,

first of all, really great work! I was very happy to see the v0.5 release.

I ran into this issue when using your VideoDataset implementation. This is how I solved it for now: https://github.com/mexca/mexca/blob/main/mexca/video.py#L36

The advantage of this solution is that it depends neither on pytorch's VideoReader, which requires building torchvision from source and currently only seems to work on Linux, nor on torchvision.datasets.video_utils.VideoClips, which does not seem to work well with batching. A disadvantage is that the entire video first needs to be decoded to read the timestamps, which can take a couple of minutes for longer videos (this is also necessary with VideoClips, btw).
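For illustration, the timestamp pass amounts to something like this (a simplified sketch using PyAV, which may differ from the linked implementation):

import av

def scan_timestamps(path):
    # Decode every frame once just to collect presentation timestamps;
    # this is the slow, one-time pass mentioned above for long videos.
    with av.open(path) as container:
        stream = container.streams.video[0]
        return [float(frame.pts * stream.time_base)
                for frame in container.decode(stream)]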

I hope this may be helpful.

@ejolly
Contributor Author

ejolly commented Jun 22, 2023

Thanks @maltelueken, this is super helpful! We're currently trying an approach in #170 to "lazy" load video frames using pyav. Would love your thoughts/any testing you might be able to do with this approach!

@juaninachon

@ejolly @ljchang Hi, I would like to understand why the time it takes to process each iteration increases ~linearly with video length. With batch_size=1, it starts at 1.00s/it. By frame 150 it has already doubled, and by frame 300 it reaches 3.10s/it. I think this makes running detect_video on large files impractical. Prediction-wise, should it make a difference if I split the video into 60s or 10s chunks? Thanks in advance for any recommendations.

Kind regards.
Juan

My system: i7-7700, 50 GB RAM, RTX 3060 (12 GB VRAM), Ubuntu 22.04, feat version '0.6.1'

My detector:
Detector(face_model="retinaface",
         landmark_model="mobilefacenet",
         au_model="xgb",
         emotion_model="resmasknet",
         facepose_model="img2pose",
         identity_model="facenet",
         device="cuda")

.detect_video(video_path=mp4_file,
              skip_frames=None,
              output_size=700,
              batch_size=1,
              num_workers=0,
              pin_memory=False,
              face_detection_threshold=0.5,
              face_identity_threshold=0.8)

My file (ffmpeg -i -f):
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'C3A02A_000.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Duration: 00:01:00.08, start: 0.000000, bitrate: 30200 kb/s
Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 30009 kb/s, 59.94 fps, 59.94 tbr, 60k tbn, 119.88 tbc (default)
Metadata:
handler_name : GoPro AVC
vendor_id : [0][0][0][0]
timecode : 14:38:07:29
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s (default)
Metadata:
handler_name : GoPro AAC
vendor_id : [0][0][0][0]
Stream #0:2(eng): Data: none (tmcd / 0x64636D74)
Metadata:
handler_name : GoPro AVC
timecode : 14:38:07:29
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Stream #0:0(eng): Video: wrapped_avframe, yuvj420p(pc, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 59.94 fps, 59.94 tbn (default)
Metadata:
handler_name : GoPro AVC
vendor_id : [0][0][0][0]
timecode : 14:38:07:29
encoder : Lavc58.134.100 wrapped_avframe
Stream #0:1(eng): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
Metadata:
handler_name : GoPro AAC
vendor_id : [0][0][0][0]
encoder : Lavc58.134.100 pcm_s16le
frame= 3600 fps=429 q=-0.0 Lsize=N/A time=00:01:00.06 bitrate=N/A speed=7.16x

@ejolly
Contributor Author

ejolly commented Jan 2, 2024

Hey @juaninachon, this was a conscious design decision on our part until PyTorch natively handles videos in a more memory-efficient way without additional installation overhead. By default, PyTorch tries to read every frame of a video file into RAM at once before passing batches to the GPU. A lot of our users complained that pyfeat would crash silently because they were running out of memory when processing videos. Our current solution "seeks" to each frame in the video and discards previous frames before passing batches to the GPU. What you're noticing is the linearly increasing "seek" time, which we decided was a better trade-off than running out of memory until we or torch have a more efficient native solution.
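Conceptually the per-frame loading works like this (an illustrative sketch in pyav, not our exact code):

import av

def load_frame(path, idx):
    # Illustrative only: decode from the start and discard frames until we
    # reach the one we want. Reaching frame idx costs O(idx) decode work,
    # which is why iteration time grows with position in the video.
    with av.open(path) as container:
        stream = container.streams.video[0]
        for i, frame in enumerate(container.decode(stream)):
            if i == idx:
                return frame.to_ndarray(format="rgb24")
    raise IndexError(idx)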

Prediction-wise, there should be no difference between cutting the video into segments and processing them independently, so go for it if that helps speed things up!
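For example (a hypothetical sketch, reusing the detector from your snippet above and assuming the chunks were pre-split, e.g. with ffmpeg's segment muxer):

import glob
import pandas as pd

# Run detection on each chunk independently and stitch the results together.
results = [detector.detect_video(video_path=f, batch_size=1)
           for f in sorted(glob.glob("chunks/*.mp4"))]
combined = pd.concat(results, ignore_index=True)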

@juaninachon

juaninachon commented Jan 4, 2024 via email
