Best way to dynamically letterbox video frames with longest dim as parameter? #2832
Unfortunately, DALI executes GPU operators strictly after CPU operators. This means that once the data is on the GPU, it cannot be passed back to a CPU operator within the same pipeline. Having said all that, there's one workaround: you can pad your data to a given alignment (which will produce a letterbox at the bottom and to the right) and then shift the contents to center them:

```python
videos = fn.readers.video_resize(  # type: ignore
    sequence_length=num_frames,
    filenames=source,
    skip_vfr_check=True,
    size=640,
    mode="not_larger",
    dtype=types.FLOAT,
    device="gpu",
)
# Pad H and W up to a multiple of 32; use your preferred alignment here.
padded = fn.pad(videos, axis_names="HW", shape=[1, 1], align=32)
# Half of the size difference, i.e. the offset that centers the original frame.
shift = fn.cast((fn.shapes(videos) - fn.shapes(padded)) // 2, dtype=types.FLOAT)
dx = fn.slice(shift, 2, 1, axes=[0])  # width difference (shape layout is FHWC)
dy = fn.slice(shift, 1, 1, axes=[0])  # height difference
shift = fn.stack(dx, dy, 0.0, axis=0)
matrix = fn.cat(np.identity(3, dtype=np.float32), shift, axis=1)
# Reinterpret the frame dimension as depth so that a 3D warp can be applied.
as_volume = fn.reshape(padded, layout="DHWC")
warped = fn.warp_affine(as_volume, matrix, fill_value=0, interp_type=types.INTERP_NN)
letterboxed = fn.reshape(warped, layout="FHWC")
```

Note that warp_affine doesn't natively support videos - that's why the frame dimension is reinterpreted as depth and a 3D warp is used (the depth/frame dimension is left untouched).
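The padding and centering arithmetic in the snippet above can be checked on the host. The sketch below is plain NumPy, not DALI; the helper name `centering_shift` is mine, and it only reproduces the per-sample numbers that `fn.pad` and the `shift` expression compute:

```python
import numpy as np

def centering_shift(height: int, width: int, align: int = 32):
    """Host-side check (hypothetical helper, not part of DALI): pad H and W
    up to a multiple of `align`, then return the (dx, dy) translation that
    centers the original frame inside the padded one."""
    padded_h = -(-height // align) * align  # ceil to a multiple of align
    padded_w = -(-width // align) * align
    # Matches (fn.shapes(videos) - fn.shapes(padded)) // 2: the difference is
    # zero or negative, so the warp moves the content toward the center.
    dy = (height - padded_h) // 2
    dx = (width - padded_w) // 2
    return dx, dy, (padded_h, padded_w)

# A 640x360 frame (resized with mode="not_larger") pads to 384 rows,
# so the content is shifted 12 pixels vertically to sit in the middle.
print(centering_shift(360, 640))  # -> (0, -12, (384, 640))
```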
Thank you for the detailed response! Since I was in a pinch, I currently have an implementation where I collect frame dimensions using […]. Does the ordering requirement affect whether […]? Anyways, keep up the great work!
Hi @dwrodri,
We don't have that on our roadmap right now. Performance-wise it wouldn't yield much benefit, as the operations are pretty simple; the only benefit would be accepting data that is already on the GPU. If you think that the community would benefit from it, feel free to create a PR that adds such functionality.
DALI would yield […]
I'll go ahead and close this issue as it has been answered. Thanks again for the fast and thorough responses!
@dwrodri Hi! Please tell me how you solved the problem with the `TypeError: int() argument must be a string, a bytes-like object or a number, not 'DataNode'`. What does your tagger_pipeline function look like in the end?
I am trying to implement a DALI pipeline which will feed frames of video into a single-shot object detector written in PyTorch. Specifically, I am trying to replicate the LoadImages class from YOLOv5 using only GPU operations. The preprocessing pipeline I'm referencing performs two main operations, in the following order:

1. Resize each frame so that its longest dimension matches a target size, preserving the aspect ratio.
2. Letterbox the frame, i.e. pad the shorter dimension equally on both sides to reach the target shape.

Operation 1 is pretty well-documented and can be done by calling fn.readers.video_resize with the right arguments. I'm struggling with figuring out how to perform operation 2.

My first attempt involved capturing the frame dimensions using fn.shapes, but I couldn't find the "obvious" way of padding opposite sides of the frame. Here was my first attempt: […]. This code doesn't run, because I'm generating a dali.types.Constant with a DataNode as one of the arguments for the shape, resulting in the following error:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'DataNode'

So far, I can come up with five potential alternatives which allow me to continue doing this without any CPU-only operations, among them combining paste and slice, and building the shapes from types.Constants. I don't see how I can use fn.pad, because it will only place fill values at the end, and fn.transforms.translate is CPU-only, according to the documentation.

I'd really like to be able to pass in a folder of videos where each video has a different aspect ratio, and have all the preprocessing happen on the GPU, but getting this working with just one aspect ratio that isn't known ahead of time is the first step.
Environment
DALI installed via pip […]