Is it possible to modify the labels according to the random rotation of images? #5215

Open
1 task done
WYBupup opened this issue Dec 1, 2023 · 5 comments

WYBupup commented Dec 1, 2023

Describe the question.

I am working on a training framework for an image rotation recognition task (0, 90, 180, or 270 degrees). My dataset is so large that I cannot store a copy of every image rotated to each of the four angles, because there is not enough space on the machine.
As a result, my approach is to randomly rotate each image to one of the four angles in the preprocessing step and change the label accordingly. However, this makes preprocessing the dominant part of the total time cost.
I want to use DALI to accelerate preprocessing. Is it possible to randomly rotate the image and change the label accordingly within the pipeline?

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report
WYBupup added the "question" (Further information is requested) label Dec 1, 2023
Contributor

JanuszL commented Dec 1, 2023

Hi @WYBupup,

Thank you for reaching out.
Yes, you can do that by using the rotate operator and feeding it the output of the random uniform operator, which can draw from a fixed set of values (values=[0, 90, 180, 270]). To adjust the labels, you can use the output of the same random operator: express the adjustment with mathematical operators, or write a Python operator if the adjustment is more elaborate.
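
As a rough illustration of the "mathematical operators" route, the label can be derived from the same random angle that drives the rotation. The sketch below is plain Python rather than DALI code. It assumes that classes 0 to 3 stand for 0, 90, 180, and 270 degrees; the names rotated_label and sample_rotation are hypothetical.

```python
import random

ANGLES = (0, 90, 180, 270)  # the four rotations from the question


def rotated_label(angle, base_label=0):
    """Return the orientation class after an extra rotation by `angle` degrees.

    Assumption: classes 0..3 mean 0/90/180/270 degrees. `base_label` is the
    class the stored image already has (0 if every stored image is upright).
    """
    if angle not in ANGLES:
        raise ValueError(f"unsupported angle: {angle}")
    return (base_label + angle // 90) % 4


def sample_rotation(rng, base_label=0):
    """Draw one angle and compute the label from that same draw."""
    angle = rng.choice(ANGLES)
    return angle, rotated_label(angle, base_label)


rng = random.Random(0)
angle, label = sample_rotation(rng)
```

Inside a DALI pipeline, the same arithmetic would be applied to the output of the random operator, so the rotated image and its label always come from one draw.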

jantonguirao assigned JanuszL and unassigned jantonguirao Dec 4, 2023
Author

WYBupup commented Dec 6, 2023

Thanks for your reply. I have almost completed the pipeline following your guidance.
But I have run into another problem. In my old Python-based preprocessing pipeline, I resize the image to a fixed size while maintaining the aspect ratio, and then use cv2.copyMakeBorder to place the picture in the center and pad around it. I tried to emulate this operation in DALI, but it seems that the padding operator only pads in one direction. My objective is to emulate cv2.copyMakeBorder and pad on all sides of the original image.
Is there any operator that can achieve this?
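
For reference, the behaviour described above (an aspect-preserving resize followed by centered padding) comes down to the arithmetic sketched below. This is a plain-Python illustration, not code from the original pipeline. The function name letterbox_geometry and the 256-pixel target are assumptions chosen for the example.

```python
def letterbox_geometry(width, height, target=256):
    """Compute the resized size and per-side padding for a centered letterbox.

    Returns ((new_w, new_h), (top, bottom, left, right)). The padding order
    follows the argument order of cv2.copyMakeBorder.
    """
    # Scale so that the longer side fits the target exactly
    scale = min(target / width, target / height)
    new_w = max(1, min(target, round(width * scale)))
    new_h = max(1, min(target, round(height * scale)))

    # Split the leftover space between the two sides; any odd pixel
    # goes to the bottom/right
    pad_x = target - new_w
    pad_y = target - new_h
    left = pad_x // 2
    top = pad_y // 2
    return (new_w, new_h), (top, pad_y - top, left, pad_x - left)


size, padding = letterbox_geometry(640, 480)
```

A 640x480 input is resized to 256x192 and receives 32 rows of padding above and below, which is the result any DALI replacement needs to reproduce.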

Contributor

mzient commented Dec 6, 2023

If you're already rotating the images, you can pass the size explicitly to fn.rotate - you can make it fill the borders with a constant value (monochrome!) or replicate the border. If either of those methods suits you, it will be cheaper to have one operator instead of two.

import nvidia.dali as dali
import nvidia.dali.fn as fn
import PIL.Image
import numpy as np

@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def mypipe():
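    enc, _ = fn.readers.file(file_root=".", files=["alley.png"])
    img = fn.decoders.image(enc, device="mixed")
    # Fit the image within 256x256 while keeping its aspect ratio
    img = fn.resize(img, mode="not_larger", size=256)
    # No fill_value: the border pixels are replicated
    rep = fn.rotate(img, angle=90, size=(256, 256))
    # fill_value=0: the area outside the image is filled with a constant
    pad = fn.rotate(img, angle=90, size=(256, 256), fill_value=0)
    return rep, pad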

pipe = mypipe()
pipe.build()
rep, pad = pipe.run()

The results:

PIL.Image.fromarray(np.array(rep.as_cpu()[0]))

[image: rotated output with the border pixels replicated]

PIL.Image.fromarray(np.array(pad.as_cpu()[0]))

[image: rotated output with a constant-value border]

If you need a color padding, you can, somewhat counterintuitively, use fn.crop:

    rot = fn.rotate(img, angle=90)
    crop = fn.crop(rot, crop_pos_x=0.5, crop_pos_y=0.5, crop=(256, 256), out_of_bounds_policy="pad", fill_values=[0x76, 0xb9, 0x00])

The result is:
[image: rotated output padded with a solid color]

Author

WYBupup commented Dec 6, 2023

thanks a lot! This is really helpful!

Contributor

mzient commented Dec 6, 2023

Also, if you're fine with bilinear resizing without antialiasing, then you can do all those transforms in one go with fn.warp_affine:

import nvidia.dali as dali
import nvidia.dali.fn as fn
import PIL.Image
import numpy as np

@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def mypipe():
    enc, _ = fn.readers.file(file_root=".", files=["alley.png"])
    img = fn.decoders.image(enc, device="mixed")
    shape = fn.peek_image_shape(enc)
    h = shape[0]
    w = shape[1]
    size = fn.stack(w, h)
    scale = dali.math.min(256/w, 256/h)
    out_size = fn.cast(scale * size, dtype=dali.types.INT32)
    
    # use negative angle, since here we use source-to-destination matrix
    mtx = fn.transforms.rotation(angle=-90, center=size/2)
    mtx = fn.transforms.scale(mtx, scale=fn.stack(scale, scale))
    mtx = fn.transforms.translation(mtx, offset=(256.0 - out_size) // 2)

    warped = fn.warp_affine(img, size=(256, 256), matrix=mtx, fill_value=0, inverse_map=False)
    return warped

pipe = mypipe()
pipe.build()
warped, = pipe.run()

The result is:
[image: warp_affine output]

The aliasing artifacts are quite obvious when you compare this image to the previous ones, but if it's OK for you, then this method will certainly be the most performant one. The added benefit is that you end up with a complete transformation matrix, so if your labels are in fact some points, you can use this matrix to transform them. See this tutorial to learn how to use a transformation matrix to transform keypoints alongside images.
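
To make the keypoint idea concrete, here is a minimal plain-Python sketch of composing a rotation, a scale, and a translation into one affine matrix and applying it to points. It uses the textbook convention (3x3 homogeneous matrices, column vectors) and does not reproduce DALI's matrix layout or sign conventions, so check those against the tutorial. All function names here are hypothetical.

```python
import math


def rotation(angle_deg, center):
    """Rotation about `center` (textbook convention, 3x3 homogeneous matrix)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    cx, cy = center
    # Equivalent to: move center to origin, rotate, move back
    return [[c, -s, cx - c * cx + s * cy],
            [s, c, cy - s * cx - c * cy],
            [0.0, 0.0, 1.0]]


def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def compose(*matrices):
    """Combine transforms so that the first argument is applied first."""
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for m in matrices:
        # Left-multiply: result = m @ result
        result = [[sum(m[i][k] * result[k][j] for k in range(3))
                   for j in range(3)] for i in range(3)]
    return result


def transform_points(matrix, points):
    """Apply the affine matrix to a list of (x, y) points."""
    return [(matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
             matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
            for x, y in points]


# Rotate about (50, 50), halve the size, then shift by (10, 20)
matrix = compose(rotation(90, (50, 50)), scaling(0.5, 0.5), translation(10, 20))
moved = transform_points(matrix, [(100.0, 50.0)])
```

Because the image and the points go through the same composed matrix, point labels stay aligned with the warped image without any per-step bookkeeping.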

The methods, sorted from most to least efficient:

  1. warp_affine
  2. resize + rotate with border handling
  3. resize + rotate + crop
