### Dependency
Note:
+ **TorchCodec uses FFmpeg to decode video and audio data directly into PyTorch tensors.** This is much faster and more memory-efficient than first dumping the video to raw frames (use raw frames only for visualization).
+ TorchCodec is not a replacement for FFmpeg: it is a lighter, PyTorch-specific layer on top of FFmpeg's decoding libraries.

Setup TorchCodec:
1. Install FFmpeg on Windows for TorchCodec (gyan.dev build): `winget install "FFmpeg (Essentials Build)"`
2. Install the CUDA build of TorchCodec into the Anaconda environment: `conda install -c conda-forge "torchcodec=*=*cuda*"`
3. Test that FFmpeg can decode on the GPU: `ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i kicking.mp4 -f null -`
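
Before running step 3, it can help to confirm from Python that the `ffmpeg` executable is visible on `PATH`. The sketch below uses only the standard library; `find_ffmpeg` is a hypothetical helper name, not part of TorchCodec. It checks only the executable, not the shared libraries that TorchCodec loads.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the FFmpeg executable, or None if it is not on PATH."""
    return shutil.which(executable)


ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    # A freshly installed FFmpeg is often invisible until the terminal is reopened.
    print("ffmpeg not found on PATH")
else:
    print(f"ffmpeg found at: {ffmpeg_path}")
```

If this prints "not found" right after the `winget` install, restarting the terminal (or the notebook kernel) so that it picks up the updated `PATH` is a reasonable first thing to try.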

In [1]:
import torch

print(f"{torch.__version__=}")
print(f"{torch.cuda.is_available()=}")
print(f"{torch.cuda.get_device_properties(0)=}")

torch.__version__='2.9.1+cu128'
torch.cuda.is_available()=True
torch.cuda.get_device_properties(0)=_CudaDeviceProperties(name='NVIDIA GeForce RTX 5060 Ti', major=12, minor=0, total_memory=16310MB, multi_processor_count=36, uuid=a44fad52-90f1-bd41-0295-00a3a33a5178, pci_bus_id=1, pci_device_id=0, pci_domain_id=0, L2_cache_size=32MB)


In [None]:
# Use Python's Windows DLL API (3.8+). Add the folder that holds avcodec/avformat/avutil DLLs.
# TorchCodec README + version matrix: https://github.com/pytorch/torchcodec  (docs)

# from pathlib import Path
# import os, sys
# # Point this to the 'bin' folder of your 'asr' environment
# os.add_dll_directory(r"D:\Utilities\miniconda\envs\asr\Library\bin")

import torch
import torchcodec
print(f"Success! TorchCodec loaded. CUDA: {torch.cuda.is_available()}")

# ffmpeg_dll_dir = Path(r"D:\Utilities\miniconda\envs\asr\Library\bin")  # adjust if your conda root differs
# assert ffmpeg_dll_dir.exists(), ffmpeg_dll_dir
# os.add_dll_directory(str(ffmpeg_dll_dir))  # Python 3.8+ DLL search

# import torch, torchcodec, platform, subprocess
# print("exe:", sys.executable)
# print("torch", torch.__version__, "torchcodec", torchcodec.__version__, "py", platform.python_version())
# subprocess.run(["ffmpeg", "-version"], check=True)

In [None]:
from torchcodec.decoders import VideoDecoder  # needed by the decoder cell below
import torch
print(torch.version.cuda)
print(f"Successfully loaded TorchCodec with CUDA: {torch.cuda.is_available()}")

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
decoder = VideoDecoder("kicking.mp4", device=device)

decoder.metadata

In [None]:
from torchcodec.samplers import clips_at_regular_timestamps

clips = clips_at_regular_timestamps(
    decoder,
    seconds_between_clip_starts=10,  # start a new clip every 10 s of video
    num_frames_per_clip=5,  # each clip contains 5 frames
    seconds_between_frames=0.2,  # frames within a clip are 0.2 s apart
)

clips
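
To make the three sampler parameters concrete, here is a rough pure-Python sketch of the timestamps they describe. `clip_timestamps` is a hypothetical helper, not TorchCodec code. It assumes clips start at 0 s, and it ignores how the library treats frames that fall past the end of the video.

```python
from typing import List


def clip_timestamps(
    duration_seconds: float,
    seconds_between_clip_starts: float,
    num_frames_per_clip: int,
    seconds_between_frames: float,
) -> List[List[float]]:
    """List the timestamps (in seconds) of every frame in every clip."""
    clips = []
    clip_index = 0
    # Multiply by the index instead of accumulating to avoid float drift.
    while clip_index * seconds_between_clip_starts < duration_seconds:
        start = clip_index * seconds_between_clip_starts
        clips.append(
            [start + i * seconds_between_frames for i in range(num_frames_per_clip)]
        )
        clip_index += 1
    return clips


# Assumed 25 s video with the same settings as the cell above.
for clip in clip_timestamps(25.0, 10, 5, 0.2):
    print([round(t, 2) for t in clip])
```

With these settings each clip covers under one second of video (5 frames, 0.2 s apart), and a new clip starts every 10 s.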

In [None]:
from typing import Optional

def plot(frames: torch.Tensor, title: Optional[str] = None):
    try:
        from torchvision.utils import make_grid
        from torchvision.transforms.v2.functional import to_pil_image
        import matplotlib.pyplot as plt
    except ImportError:
        print("Cannot plot, please run `pip install torchvision matplotlib`")
        return

    plt.rcParams["savefig.bbox"] = 'tight'
    fig, ax = plt.subplots()
    ax.imshow(to_pil_image(make_grid(frames)))
    ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
    if title is not None:
        ax.set_title(title)
    plt.tight_layout()

In [None]:
for frame in decoder:
    assert (
        isinstance(frame, torch.Tensor)
        and frame.shape == (3, decoder.metadata.height, decoder.metadata.width)
    )

In [None]:
first_frame = decoder[0]
# Flexible frame extraction: index a single frame, or slice with a step
every_twenty_frame = decoder[0 : -1 : 20]  # every 20th frame, using a slice

print(f"{first_frame.shape = }")
print(f"{first_frame.dtype = }")
print(f"{every_twenty_frame.shape = }")
print(f"{every_twenty_frame.dtype = }")
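
The slice `0 : -1 : 20` is easier to reason about on plain indices. The sketch below assumes the decoder follows standard Python slice semantics. `sliced_frame_indices` is a hypothetical helper, and the frame counts are made-up examples.

```python
from typing import List, Optional


def sliced_frame_indices(
    num_frames: int, start: Optional[int], stop: Optional[int], step: Optional[int]
) -> List[int]:
    """Return the frame indices a Python slice selects from num_frames frames."""
    return list(range(num_frames)[start:stop:step])


# stop=-1 means "up to, but not including, the last frame".
print(sliced_frame_indices(101, 0, -1, 20))    # [0, 20, 40, 60, 80]
# stop=None runs to the end, so frame 100 is included.
print(sliced_frame_indices(101, 0, None, 20))  # [0, 20, 40, 60, 80, 100]
```

Under that assumption, `decoder[::20]` would be the form to use if the last frame should also be eligible.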

In [None]:
plot(every_twenty_frame, "every 20")