[df-if II] add additional input checks to ensure the input is divisible by 8 #7844
base: main
Conversation
ohh thanks for looking into this!

diffusers/src/diffusers/image_processor.py, line 407 in 5823736

so I think instead of adding the checks, we should just resize it; we could either add the resize step to

diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py, line 292 in 5823736

what do you think?
i considered it, but because of the nature of this, i didn't really feel comfortable just squishing images on the users' behalf. with the small resolution of the inputs, it really can be noticeably distorted, whereas with SD and SDXL at 512/768/1024px it's far less destructive to adjust the size.

@yiyixuxu how do you feel about that mindset applied to a 64px model, where it might be somewhere around ~5-7% of the image size we end up adjusting by?
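The distortion argument above can be made concrete with a small hypothetical sketch (not code from the PR): snapping a side length to the nearest multiple of 8 changes a 64px-scale image by a much larger fraction than a 512px-or-larger one.

```python
# Hypothetical helper, for illustration only: round a side length to the
# nearest multiple of 8 (never below 8) and report the relative change.
def snap_to_multiple(size: int, multiple: int = 8) -> int:
    """Round `size` to the nearest multiple of `multiple`, minimum one multiple."""
    return max(multiple, round(size / multiple) * multiple)

for side in (61, 509, 1021):
    snapped = snap_to_multiple(side)
    change = abs(snapped - side) / side * 100
    print(f"{side}px -> {snapped}px ({change:.1f}% change)")
# 61px -> 64px (4.9% change)      <- stage I scale: noticeable
# 509px -> 512px (0.6% change)    <- SD scale: barely visible
# 1021px -> 1024px (0.3% change)  <- SDXL scale: negligible
```

The same 3px adjustment is roughly 5% of a 64px-scale image but well under 1% at SD/SDXL resolutions, which is the trade-off being weighed here.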
thanks!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@yiyixuxu i notice the quality checks failed because of some unnecessary list comprehension. but when i look at it, it seems like the most reasonable way to do it? is there a better way? i would love to learn 😁
src/diffusers/pipelines/deepfloyd_if/pipeline_if_superresolution.py
…on.py Co-authored-by: YiYi Xu <yixu310@gmail.com>
can you run
@yiyixuxu done
    @@ -543,12 +543,27 @@ def check_inputs(

        if isinstance(image, list):
            image_batch_size = len(image)
            # Check that each image is the same size
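For context on what the comment in the diff is aiming at, here is a minimal standalone sketch (names and layout assumptions are mine, not the PR's code) of checking that every image in a batch has the same size before validating divisibility:

```python
# Hypothetical helper: verify all images in a batch share one size.
# Assumes NumPy arrays are HWC and anything else exposes PIL's
# .size attribute, which is (width, height).
import numpy as np

def validate_batch_sizes(images, multiple=8):
    sizes = []
    for img in images:
        if isinstance(img, np.ndarray):
            h, w = img.shape[:2]   # HWC layout assumed
        else:
            w, h = img.size        # PIL convention: (width, height)
        sizes.append((h, w))
    first = sizes[0]
    if any(s != first for s in sizes):
        raise ValueError(f"all images must have the same size, got {sizes}")
    h, w = first
    if h % multiple or w % multiple:
        raise ValueError(f"image sides must be divisible by {multiple}, got {h}x{w}")
    return first
```

This mirrors the intent of the diff's comment; the actual `check_inputs` implementation integrates with the existing batch-size logic discussed below.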
I think it is better to do this in a separate code block:

So we keep this section as it is to check image_batch_size, and then

    if isinstance(image, list):
        check_image_size = image[0]
    else:
        check_image_size = image

    if isinstance(check_image_size, PIL.Image.Image):
        image_size = check_image_size.size
    elif isinstance(check_image_size, torch.Tensor):
        image_size = check_image_size.shape[2:]
    elif isinstance(check_image_size, np.ndarray):
        image_size = check_image_size.shape[:2]

    if image_size ...:
        raise ValueError(...)

The current code does not work for a list of arrays or tensors.
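A runnable version of the suggestion above might look like the following sketch. The dispatch order and names are illustrative assumptions, not the PR's final code; to stay self-contained it duck-types the tensor branch via `.shape` instead of importing torch.

```python
# Sketch of the suggested size check. Assumptions: NumPy images are HWC,
# tensors are (N)CHW (so the last two dims are H, W), and PIL images
# expose .size as (width, height).
import numpy as np

def get_image_size(image):
    """Return (height, width) of `image`, or of its first element if a list."""
    check = image[0] if isinstance(image, list) else image
    if isinstance(check, np.ndarray):
        return tuple(check.shape[:2])   # HWC: first two dims
    if hasattr(check, "shape"):
        return tuple(check.shape[-2:])  # torch.Tensor, (N)CHW: last two dims
    w, h = check.size                   # PIL.Image: (width, height)
    return (h, w)

def check_image_size(image, multiple=8):
    h, w = get_image_size(image)
    if h % multiple != 0 or w % multiple != 0:
        raise ValueError(
            f"image height and width must be divisible by {multiple}, got {h}x{w}"
        )
```

Because the size is read from `image[0]` when a list is passed, this handles lists of arrays or tensors, which is exactly the case the current code misses.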
    image = floats_tensor((1, 3, 31, 31), rng=random.Random(0)).to(torch_device)
    generator = torch.Generator(device="cpu").manual_seed(0)
    with self.assertRaises(ValueError):
        self.pipeline(
can we make sure this test works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can't run the test suite locally; i was waiting for it to run in the workflow here
oh so these are the tests that failed https://github.com/huggingface/diffusers/actions/runs/9044175339/job/24855064736#step:7:15620
I can trigger them again now but I think the results would be the same
What does this PR do?
Fixes #7842
Adds logic to check_inputs in the IF super-resolution pipeline so that the user receives a clear error when attempting to run the pipeline with an invalid input image size.
This can happen when using the super-resolution model to upscale evaluation images during training if, e.g., the target 256-pixel resolution is aligned to 8px intervals and then divided by 4 to obtain the input image size. The stage II output resolution will be fine, but the input resolution will be wrong.
I suppose there are other ways to hit the problem, but it has always been a bit murky which input is causing it.
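A hypothetical worked example of the failure mode described above (the numbers are mine, chosen to illustrate it): a stage II target aligned to 8px can still produce a stage I input size that fails the divisibility check.

```python
# Hypothetical target resolution, divisible by 8, so the stage II
# output resolution looks fine on its own.
target = 280
stage1_input = target // 4  # stage I input is a quarter of the target

print(target % 8)        # 0 -> output resolution passes the check
print(stage1_input)      # 70
print(stage1_input % 8)  # 6 -> input resolution is NOT divisible by 8
```

This is why validating only the output resolution is not enough: the derived input resolution needs its own check, which is what this PR adds.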
Before submitting
Who can review?