Clearer handling of cropping and resolutions: #267

Open
MartinoCesaratto opened this issue Apr 26, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@MartinoCesaratto

MartinoCesaratto commented Apr 26, 2024

Describe your use-case.

The quick start guide suggests that I don't need to resize my dataset images, because OneTrainer handles this when resolution bucketing is enabled. However, when I select multiple training resolutions, a batch size of 1 uses all samples, while a batch size of 2 gives fewer than half as many steps, so some images are no longer used.
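A minimal sketch of how this kind of step count can arise, assuming every batch must come from a single bucket and leftover images that cannot fill a batch are dropped. Whether OneTrainer actually works this way is not confirmed in this thread; the function name and the sample dataset are hypothetical.

```python
from collections import Counter


def steps_per_epoch(bucket_of_each_image, batch_size, drop_incomplete=True):
    """Count steps when each batch may only contain images from one bucket."""
    counts = Counter(bucket_of_each_image)
    if drop_incomplete:
        # Assumption: images that cannot fill a whole batch are skipped.
        return sum(n // batch_size for n in counts.values())
    # Alternative: keep the smaller final batch of each bucket (ceil division).
    return sum(-(-n // batch_size) for n in counts.values())


# Hypothetical dataset: 10 images spread unevenly over four buckets.
buckets = ["512x640"] * 5 + ["640x512"] * 3 + ["768x768"] + ["512x512"]

steps_bs1 = steps_per_epoch(buckets, batch_size=1)
steps_bs2 = steps_per_epoch(buckets, batch_size=2)
images_used_bs2 = steps_bs2 * 2
```

Under these assumptions, buckets that hold a single image contribute no steps at batch size 2, which is one way the step count can fall below half.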

What actually happens is not clear. For example:

  • I set the training resolutions to 512, 640, 768, 960.
  • I have a 639x641 image. Is it always cropped to 512x640, or sometimes to 512x512?
  • I have a 256x320 image. Is it upscaled to 512x640, or can it sometimes end up at 768x960?
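To make the question concrete, here is one possible bucket-selection policy written out as code. It is an assumption for illustration only, not a description of what OneTrainer does: buckets are all width/height pairs from the resolution list, the bucket with the closest aspect ratio wins, and ties go to the largest bucket that needs no upscaling (or the smallest bucket if all need upscaling). The name `pick_bucket` is hypothetical.

```python
import math
from itertools import product


def pick_bucket(width, height, sides):
    """Pick a (w, h) bucket under the assumed policy described above."""
    buckets = list(product(sides, repeat=2))
    target = math.log(width / height)

    def aspect_error(bucket):
        return abs(math.log(bucket[0] / bucket[1]) - target)

    best = min(aspect_error(b) for b in buckets)
    closest = [b for b in buckets
               if math.isclose(aspect_error(b), best, abs_tol=1e-9)]
    # Prefer the largest bucket the image covers without upscaling.
    fits = [b for b in closest if b[0] <= width and b[1] <= height]
    if fits:
        return max(fits, key=lambda b: b[0] * b[1])
    # Otherwise upscale as little as possible.
    return min(closest, key=lambda b: b[0] * b[1])


sides = [512, 640, 768, 960]
bucket_a = pick_bucket(639, 641, sides)
bucket_b = pick_bucket(256, 320, sides)
```

A different policy (for example, cropping each side independently to the nearest allowed value) would give different answers, which is exactly the ambiguity raised above.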

I also noticed that the preview is static even with crop jitter enabled. If I have a 1024x512 image, do I get crops such as image[0:960, 0:512] and image[64:1024, 0:512], or are the crops always centered? Will it sometimes be cropped to resolutions other than 960x512?
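The difference between a centered crop and a jittered crop can be sketched as follows. This assumes jitter means a uniformly random offset within the valid range; `crop_window` is a hypothetical helper, not a OneTrainer function.

```python
import random


def crop_window(src_w, src_h, dst_w, dst_h, rng=None):
    """Return (left, top, right, bottom) of a crop inside the source image.

    With rng=None the crop is centered; with a random.Random instance the
    offset is drawn uniformly from every position that still fits.
    """
    max_x, max_y = src_w - dst_w, src_h - dst_h
    if max_x < 0 or max_y < 0:
        raise ValueError("crop is larger than the source image")
    if rng is None:
        x, y = max_x // 2, max_y // 2
    else:
        x, y = rng.randint(0, max_x), rng.randint(0, max_y)
    return (x, y, x + dst_w, y + dst_h)


centered = crop_window(1024, 512, 960, 512)
jittered = crop_window(1024, 512, 960, 512, rng=random.Random(0))
```

For a 1024x512 source and a 960x512 crop, the left edge can be anywhere from 0 to 64, so the two crops in the question are the extremes of that range and the centered crop starts at 32.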

What would you like to see as a solution?

I have 5 proposals to improve both clarity and training:

  1. "Use all images" option: when batch size > 1, always try to have batch_size images for every resolution, even if that means using crops that cover less of the original images.
  2. Correctly show the effect of crop jitter in the preview (assuming that right now it only shows a centered square crop and not what is actually used).
  3. "Vary scaling" option: if possible, also use samples downscaled to lower resolutions, not only the maximum one.
  4. When using samples below a set resolution (even if upscaled), optionally add a set tag to the prompt (for example "low resolution, low quality"), and likewise above a certain resolution (for example "high resolution").
  5. Allow setting both horizontal and vertical resolution, so that I can enter something like "384, 512x512, 768" and get "384x384, 384x768, 512x512, 768x768, 768x384" as the set of allowed resolutions.
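Proposal 5 could be parsed roughly like this. The sketch assumes that bare numbers combine with each other in every width/height pairing while explicit `WxH` entries are taken as-is, which reproduces the example set above; `expand_resolutions` is a hypothetical name.

```python
from itertools import product


def expand_resolutions(spec):
    """Expand a spec such as "384, 512x512, 768" into sorted (w, h) pairs."""
    bare, explicit = [], []
    for token in spec.split(","):
        token = token.strip().lower()
        if not token:
            continue
        if "x" in token:
            # Explicit entry: use exactly this width and height.
            w, h = token.split("x")
            explicit.append((int(w), int(h)))
        else:
            bare.append(int(token))
    # Bare values pair with each other in both orientations.
    return sorted(set(product(bare, repeat=2)) | set(explicit))


allowed = expand_resolutions("384, 512x512, 768")
```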

Have you considered alternatives? List them here.

Right now I could probably keep multiple copies of each image with different resolutions, aspect ratios, and crops, but it would take a lot of copies to truly cover every possible crop of each image.

@MartinoCesaratto MartinoCesaratto added the enhancement New feature or request label Apr 26, 2024
@FurkanGozukara

I really can't trust any of the scripts when it comes to bucketing and auto-resizing.

I wish there were a button that processes the images and saves them exactly as they would be used during training.

So if bucketing is enabled, they could be saved under per-bucket resolution folders.

@MartinoCesaratto
Author

The main reason I ask is that I noticed training with multiple resolutions slightly improves quality, but on some datasets it seems to overfit to a subset of the images at each available resolution, so I'd like each image to be used at multiple scales to prevent this.
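One way to use each image at multiple scales is to rotate through every training resolution that does not require upscaling. This is only a sketch of the idea, under the assumption that an image's native scale is the square root of its pixel area; both function names are hypothetical.

```python
import math


def usable_resolutions(width, height, resolutions):
    """Resolutions at or below the image's native scale (sqrt of its area)."""
    native = math.sqrt(width * height)
    fits = sorted(r for r in resolutions if r <= native)
    # Fall back to the smallest resolution if even that needs upscaling.
    return fits or [min(resolutions)]


def resolution_for_epoch(width, height, resolutions, epoch):
    """Cycle through the usable resolutions so each scale is seen in turn."""
    options = usable_resolutions(width, height, resolutions)
    return options[epoch % len(options)]


resolutions = [512, 640, 768, 960]
schedule = [resolution_for_epoch(1024, 1024, resolutions, e) for e in range(5)]
```

A deterministic rotation keeps the number of samples per resolution predictable; random selection per epoch would be an alternative.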

@dathide

dathide commented May 19, 2024

I have had similar questions about how scaling works

@gilga2024

What I can add is that if you set the number of repeats to more than 1 in your concept settings and enable crop jitter, you actually seem to get multiple versions of the same picture(s), judging by the number of items cached. This also makes sense given that each epoch is a full pass over all training data. So my guess is that a crop-jitter "instance" of each image is created for every repeat, and that each of these instances is used during one epoch.
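The guess above can be written out as a small model: one cache entry per image per repeat, each with its own random crop offset. This illustrates the hypothesis only and is not taken from OneTrainer's code; `build_cache_entries` and the sample sizes are hypothetical.

```python
import random


def build_cache_entries(image_sizes, crop_size, repeats, seed=0):
    """Create one (name, repeat_index, (x, y)) entry per image per repeat."""
    rng = random.Random(seed)
    crop_w, crop_h = crop_size
    entries = []
    for name, (w, h) in image_sizes.items():
        for repeat in range(repeats):
            # Each repeat draws its own offset, so crops can differ.
            x = rng.randint(0, w - crop_w)
            y = rng.randint(0, h - crop_h)
            entries.append((name, repeat, (x, y)))
    return entries


images = {"a.png": (1024, 512), "b.png": (1200, 600)}
entries = build_cache_entries(images, crop_size=(960, 512), repeats=3)
```

In this model the cache holds `len(images) * repeats` items, which would match the observation that the cached item count grows with the repeat setting.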
