
how can I use GPU in write_videofile #2011

Open
TANGnlp0711 opened this issue Jul 13, 2023 · 9 comments
Labels
question Questions regarding functionality, usage

Comments

@TANGnlp0711

@tburrows13 @mgaitan

I rewrote ffmpeg_writer.py to add `-hwaccel nvdec` at line 97:
```python
cmd = [
    FFMPEG_BINARY,
    "-hwaccel", "nvdec",
    "-y",
    "-loglevel",
    "error" if logfile == sp.PIPE else "info",
    "-f",
    "rawvideo",
    "-vcodec",
    "rawvideo",
    "-s",
    "%dx%d" % (size[0], size[1]),
    "-pix_fmt",
    pix_fmt,
    "-r",
    "%.02f" % fps,
    "-an",
    "-i",
    "-",
]
if audiofile is not None:
    cmd.extend(["-i", audiofile, "-acodec", "copy"])
cmd.extend(["-vcodec", codec, "-preset", preset])
if ffmpeg_params is not None:
    cmd.extend(ffmpeg_params)
if bitrate is not None:
    cmd.extend(["-b", bitrate])
```

Then I call write_videofile like this:

```python
video_clips.write_videofile(
    file_name,
    temp_audiofile=file_name.replace(VIDEO_EXT_NAME, ".mp3"),
    fps=24,
    codec="h264_nvenc",
)
```
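For context, in ffmpeg's CLI `-hwaccel` is an *input* option: it only accelerates decoding, so it cannot speed up encoding of the rawvideo frames MoviePy pipes in. The encoder is whatever `-vcodec` names after the input. A minimal sanity check on the flag placement (illustrative sketch, not MoviePy's actual code; `"ffmpeg"` stands in for `FFMPEG_BINARY`):

```python
# Illustrative sketch: -hwaccel is an input option and only accelerates
# decoding. MoviePy pipes raw frames in, so there is nothing to
# hardware-decode; encoding is chosen by the -vcodec after "-i".
cmd = ["ffmpeg", "-hwaccel", "nvdec", "-y", "-f", "rawvideo", "-i", "-"]
cmd += ["-vcodec", "h264_nvenc", "out.mp4"]

# Input options must precede "-i"; output (encoder) options follow it.
assert cmd.index("-hwaccel") < cmd.index("-i") < cmd.index("-vcodec")
print("flag order OK")
```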

The GPU memory is being occupied, but the GPU utilization is almost negligible. As a result, the time taken to write the video does not show any significant improvement.

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 37C P0 34W / 70W | 216MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3427015 C /usr/local/bin/ffmpeg 211MiB |
+-----------------------------------------------------------------------------+
```

@TANGnlp0711 TANGnlp0711 added the question Questions regarding functionality, usage label Jul 13, 2023
@sixyang

sixyang commented Sep 11, 2023

Hello, the bottleneck is not write_videofile itself; it is the for-loop and the iter_frames function.

@antsmallant

> Hello, the bottleneck is not write_videofile itself; it is the for-loop and the iter_frames function.

Yeah, look into ffmpeg_writer.py: in the write_frame function, img_array.tobytes() takes about 90% of the total time of write_videofile; this is the bottleneck.
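A quick way to see this locally (illustrative micro-benchmark, numbers vary by machine): time `ndarray.tobytes()`, which copies the whole frame, against a zero-copy `memoryview` over the same buffer.

```python
import time
import numpy as np

# One 1080p RGB frame, the shape write_frame deals with (~6 MB).
frame = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)

t0 = time.perf_counter()
for _ in range(100):
    data = frame.tobytes()       # copies the full buffer on every call
copy_time = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(100):
    view = memoryview(frame)     # zero-copy view over the same buffer
view_time = time.perf_counter() - t0

print(f"tobytes: {copy_time:.3f}s  memoryview: {view_time:.6f}s  (100 frames)")
```

If the frame array is C-contiguous, `proc.stdin.write()` accepts a memoryview directly, so the copy can in principle be skipped; whether that is safe in MoviePy depends on where the array comes from.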

@sixyang

sixyang commented Feb 5, 2024

> Hello, the bottleneck is not write_videofile itself; it is the for-loop and the iter_frames function.

> Yeah, look into ffmpeg_writer.py: in the write_frame function, img_array.tobytes() takes about 90% of the total time of write_videofile; this is the bottleneck.

You can use torch to accelerate it: in moviepy/video/tools/drawing.py, replace blit with blit_gpu, as follows:

```python
import numpy as np
import torch


def blit_gpu(im1, im2, pos=None, mask=None, ismask=False):
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    if pos is None:
        pos = [0, 0]

    xp, yp = pos
    x1 = max(0, -xp)
    y1 = max(0, -yp)
    h1, w1 = im1.shape[:2]
    h2, w2 = im2.shape[:2]
    xp2 = min(w2, xp + w1)
    yp2 = min(h2, yp + h1)
    x2 = min(w1, w2 - xp)
    y2 = min(h1, h2 - yp)
    xp1 = max(0, xp)
    yp1 = max(0, yp)

    if (xp1 >= xp2) or (yp1 >= yp2):
        return im2

    if not isinstance(im1, torch.Tensor):               # 5.43 ms per loop / 100 loops
        im1 = torch.tensor(im1, device=device)
    if not isinstance(im2, torch.Tensor):
        im2 = torch.tensor(im2, device=device)

    blitted = im1[y1:y2, x1:x2]

    new_im2 = im2.clone()

    if mask is None:
        new_im2[yp1:yp2, xp1:xp2] = blitted
    else:
        if not isinstance(mask, torch.Tensor):          # 2.71 ms per loop / 10 loops
            mask = torch.tensor(mask[y1:y2, x1:x2], device=device)  # 1.45 ms / 100 loops
        else:
            mask = mask[y1:y2, x1:x2]
        if len(im1.shape) == 3:
            mask = mask.unsqueeze(-1).repeat(1, 1, 3)
        blit_region = new_im2[yp1:yp2, xp1:xp2]
        new_im2[yp1:yp2, xp1:xp2] = mask * blitted + (1 - mask) * blit_region

    # return new_im2.cpu().numpy().astype("uint8") if not ismask else new_im2.cpu().numpy()   # 6.13 ms / 100 loops
    return new_im2 if not ismask else new_im2
```

Then modify moviepy/video/VideoClip.py line 565 to `return blit_gpu(img, picture, pos, mask=mask, ismask=self.ismask)`. This helps a lot, provided you have a GPU.
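For readers without a GPU (or without torch installed), here is a plain-NumPy reference with the same geometry, useful for checking blit_gpu's output on small arrays. `blit_numpy` is a hypothetical name for this sketch, not part of MoviePy:

```python
import numpy as np

def blit_numpy(im1, im2, pos=None, mask=None):
    # Reference implementation of the masked paste above, for verifying
    # blit_gpu on small arrays. pos is the (x, y) offset of im1 inside im2.
    if pos is None:
        pos = [0, 0]
    xp, yp = pos
    x1, y1 = max(0, -xp), max(0, -yp)
    h1, w1 = im1.shape[:2]
    h2, w2 = im2.shape[:2]
    xp1, yp1 = max(0, xp), max(0, yp)
    xp2, yp2 = min(w2, xp + w1), min(h2, yp + h1)
    x2, y2 = min(w1, w2 - xp), min(h1, h2 - yp)
    if xp1 >= xp2 or yp1 >= yp2:
        return im2
    blitted = im1[y1:y2, x1:x2]
    out = im2.copy()
    if mask is None:
        out[yp1:yp2, xp1:xp2] = blitted
    else:
        m = mask[y1:y2, x1:x2]
        if im1.ndim == 3:
            m = m[:, :, None]          # broadcast mask over RGB channels
        region = out[yp1:yp2, xp1:xp2]
        out[yp1:yp2, xp1:xp2] = m * blitted + (1 - m) * region
    return out
```

Running both on the same random inputs and comparing with `np.allclose` is a cheap way to confirm the torch version is a drop-in replacement.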

@keikoro
Collaborator

keikoro commented Feb 10, 2024

Please always include your specs, as requested in our issue templates – MoviePy version, platform used, etc.

Code samples and logs should be code-formatted for better readability.

@JasonChoate

> This works a lot, provided that you have a GPU

This is giving me the following error with my RTX 3070:

```
File "C:\Python311\Lib\site-packages\moviepy\Clip.py", line 474, in iter_frames
    frame = frame.astype(dtype)
            ^^^^^^^^^^^^
AttributeError: 'Tensor' object has no attribute 'astype'. Did you mean: 'dtype'?
```

I really do appreciate the thought being put into this though, being able to utilize a GPU to help mitigate this bottleneck would be massive.

@zhangdanq

> This is giving me the following error with my RTX 3070:
>
> ```
> File "C:\Python311\Lib\site-packages\moviepy\Clip.py", line 474, in iter_frames
>     frame = frame.astype(dtype)
>             ^^^^^^^^^^^^
> AttributeError: 'Tensor' object has no attribute 'astype'. Did you mean: 'dtype'?
> ```

You can use:

```python
return new_im2.cpu().numpy().astype("uint8") if not ismask else new_im2.cpu().numpy()  # 6.13 ms / 100 loops
```

@icynare

icynare commented Apr 18, 2024

It works for me! 3 times faster.

> ```
> File "C:\Python311\Lib\site-packages\moviepy\Clip.py", line 474, in iter_frames
>     frame = frame.astype(dtype)
>             ^^^^^^^^^^^^
> AttributeError: 'Tensor' object has no attribute 'astype'. Did you mean: 'dtype'?
> ```

@JasonChoate As for this error, just modify the iter_frames function in Clip.py as follows:

```python
if (dtype is not None) and (frame.dtype != dtype):
    # frame = frame.astype(dtype)
    frame = frame.cpu().numpy().astype(dtype)
```
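A slightly more defensive variant of that fix (sketch; `as_numpy` is a hypothetical helper, not MoviePy API) duck-types on the `.cpu` attribute, so iter_frames keeps working whether blit returns a NumPy array or a torch tensor, without importing torch unconditionally:

```python
import numpy as np

def as_numpy(frame, dtype=None):
    # torch tensors expose .cpu()/.numpy(); NumPy arrays do not, so detect
    # the tensor case by attribute instead of requiring torch to be installed.
    if hasattr(frame, "cpu"):
        frame = frame.cpu().numpy()
    if dtype is not None and frame.dtype != dtype:
        frame = frame.astype(dtype)
    return frame
```

In iter_frames, `frame = frame.astype(dtype)` would then become `frame = as_numpy(frame, dtype)`.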

@maxin9966

> You can use torch to accelerate it: in moviepy/video/tools/drawing.py, replace blit with blit_gpu [...] This works a lot, provided that you have a GPU

@sixyang Is this fully utilizing the NVENC of 40-series GPUs?

@notmmao

notmmao commented May 11, 2024

> Hello, the bottleneck is not write_videofile itself; it is the for-loop and the iter_frames function.

In my case, I used VizTracer for measurements and found that iter_frames averages 500 ms per frame, while write_frame averages 2 ms per frame.

```python
from viztracer import VizTracer

with VizTracer(ignore_frozen=True, ignore_c_function=True) as _:
    final_clip.write_videofile(
        f"{fn}.mp4",
        # threads=16,        # ffmpeg is not the bottleneck
        codec="h264_nvenc",  # 2 ms per frame, not the bottleneck
        write_logfile=f"{fn}.log",
    )
```
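Without VizTracer, the same split can be measured with a few lines of stdlib timing (sketch; `get_frame` and `write` stand in for the clip's frame generator and the ffmpeg pipe write):

```python
import time

def profile_frames(get_frame, n_frames, write=lambda data: None):
    # Accumulates time spent generating frames vs. writing them, mirroring
    # the iter_frames / write_frame split measured above.
    gen_time = write_time = 0.0
    for i in range(n_frames):
        t0 = time.perf_counter()
        frame = get_frame(i)
        gen_time += time.perf_counter() - t0
        t0 = time.perf_counter()
        write(frame)
        write_time += time.perf_counter() - t0
    return gen_time / n_frames, write_time / n_frames
```

For example, `profile_frames(lambda i: clip.get_frame(i / fps), n)` against a `write` that calls `proc.stdin.write` shows which side dominates.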

(VizTracer screenshots showing the write_frame and iter_frames timings)
