
[df-if II] add additional input checks to ensure the input is divisible by 8 #7844

Open · wants to merge 11 commits into base: main
Conversation

@bghira (Contributor) commented May 2, 2024

What does this PR do?

Fixes #7842

Adds logic to check_inputs for the IF SuperResolution pipeline so that the user receives a clear error when running the pipeline with input image sizes that are not divisible by 8.

This is easy to hit when using the super-resolution model to upscale evaluation images during training: if, e.g., a target resolution around 256px is aligned to 8px intervals and then divided by 4 to obtain the input image size, the stage II output resolution will be fine, but the input resolution can end up not divisible by 8.
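
For instance (the numbers here are hypothetical, just matching the scenario above):

# A target resolution aligned to 8px intervals, divided by 4 to get the
# stage II input size, can produce an input that is not divisible by 8.
target = 248              # aligned to 8px: 248 % 8 == 0
input_size = target // 4  # 62
print(input_size % 8)     # 6 -> not divisible by 8, stage II fails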

I suppose there are other ways to hit the problem, but it has always been a bit murky which input is causing it.

Before submitting

Who can review?

@yiyixuxu (Collaborator) commented May 3, 2024

ohh thanks for looking into this!
What we usually do with image inputs is resize them in the preprocessing step to a default height and width that are divisible by 8:

def get_default_height_width(

so I think instead of adding the checks, we should just resize the image. We can either add the resize step to the preprocess_image method of the IF pipeline, or refactor the method to use the VaeImageProcessor like we do in the rest of the pipelines:

self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)

what do you think?
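
(For illustration, a minimal sketch of the rounding-then-resizing step being described; round_down_to_multiple and resize_to_multiple are hypothetical helpers, not the actual diffusers implementation.)

import PIL.Image

def round_down_to_multiple(value: int, multiple: int = 8) -> int:
    # Round a dimension down to the nearest multiple (e.g. 62 -> 56).
    return value - value % multiple

def resize_to_multiple(image: PIL.Image.Image, multiple: int = 8) -> PIL.Image.Image:
    # Resize so both dimensions are divisible by `multiple`.
    width, height = image.size
    return image.resize(
        (round_down_to_multiple(width, multiple), round_down_to_multiple(height, multiple))
    )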

@bghira (Contributor, Author) commented May 3, 2024

i considered it, but because of the nature of this, i didn't feel comfortable just squishing images on the user's behalf. at the small input resolutions here it can noticeably distort the image, whereas with SD and SDXL at 512/768/1024px adjusting the size is far less destructive. @yiyixuxu how do you feel about that mindset applied to a 64px model, where we might end up adjusting by somewhere around ~5-7% of the image size?
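
(for scale, a quick back-of-the-envelope check; snapping a dimension to the nearest multiple of 8 shifts it by at most 4px)

# Relative worst-case distortion from snapping a dimension to a
# multiple of 8 (nearest-multiple rounding, at most 4px of shift):
for size in (64, 512, 768, 1024):
    print(f"{size}px: up to {4 / size:.1%} per dimension")
# 64px: up to 6.2% per dimension
# 512px: up to 0.8% per dimension
# 768px: up to 0.5% per dimension
# 1024px: up to 0.4% per dimension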

@yiyixuxu (Collaborator) left a comment


thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@bghira (Contributor, Author) commented May 7, 2024

@yiyixuxu i noticed the quality checks failed because of an "unnecessary list comprehension". but when i look at it, it seems like the most reasonable way to do it? is there a better way? i would love to learn 😁

@yiyixuxu (Collaborator) commented May 8, 2024

can you run make style again?

@bghira (Contributor, Author) commented May 11, 2024

@yiyixuxu done

@@ -543,12 +543,27 @@ def check_inputs(

if isinstance(image, list):
    image_batch_size = len(image)
    # Check that each image is the same size:
Collaborator:
I think it is better to do this in a separate code block: keep this section as it is to check image_batch_size,

and then

if isinstance(image, list):
    check_image_size = image[0]
else:
    check_image_size = image

if isinstance(check_image_size, PIL.Image.Image):
    image_size = check_image_size.size
elif isinstance(check_image_size, torch.Tensor):
    image_size = check_image_size.shape[2:]
elif isinstance(check_image_size, np.ndarray):
    image_size = check_image_size.shape[:2]

if image_size ....:
    raise ValueError(...)

The current code does not work for lists of arrays or tensors.
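
(For reference, a runnable version of this sketch with the elided condition filled in from the PR description's divisible-by-8 requirement; the function name and error message wording are illustrative, not the actual diff.)

import numpy as np
import PIL.Image
import torch

def check_image_size_divisible_by_8(image) -> None:
    # Inspect the first image if a list was passed (all assumed equal-sized).
    check_image_size = image[0] if isinstance(image, list) else image

    if isinstance(check_image_size, PIL.Image.Image):
        image_size = check_image_size.size       # (width, height)
    elif isinstance(check_image_size, torch.Tensor):
        image_size = check_image_size.shape[2:]  # (batch, channels, height, width)
    elif isinstance(check_image_size, np.ndarray):
        image_size = check_image_size.shape[:2]  # (height, width, channels)
    else:
        raise TypeError(f"unsupported image type: {type(check_image_size)}")

    if any(dim % 8 != 0 for dim in image_size):
        raise ValueError(f"image sizes must be divisible by 8, but got {tuple(image_size)}")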

image = floats_tensor((1, 3, 31, 31), rng=random.Random(0)).to(torch_device)
generator = torch.Generator(device="cpu").manual_seed(0)
with self.assertRaises(ValueError):
    self.pipeline(
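
(The call above is truncated in this view; a hypothetical completion might look like the following, with keyword arguments assumed from the IF super-resolution pipeline's usual call signature rather than taken from the actual diff.)

with self.assertRaises(ValueError):
    self.pipeline(
        prompt="a photo",
        image=image,  # 31x31 input: not divisible by 8
        generator=generator,
        num_inference_steps=2,
        output_type="np",
    )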
Collaborator:
can we make sure this test works?

Contributor (Author):

i can't run the test suite locally; i was waiting for it to run in the workflow here

Collaborator:

oh, so these are the tests that failed: https://github.com/huggingface/diffusers/actions/runs/9044175339/job/24855064736#step:7:15620
I can trigger them again now, but I think the results would be the same.

Linked issue (#7842): deepfloyd stage 2 crashes with tensor size mismatch when input image size is not divisible by 8