Brainstorming changes to training preprocessing #3802

enn-nafnlaus · 2022-10-27T15:18:40Z


Not sure I'll actually have time for this, but in case I do... I'm thinking about making changes to the preprocessing tab. It's nice to have, but sorely deficient at present. I'd like to brainstorm some ideas about what would be a desirable approach.

Right now you're sort of operating blind - you just pick two directories and then do the same steps on every file in them, and have no individualized control.

General:

I think we need source and destination lists of the files in their respective directories.
When you choose a directory, it should automatically load all of the images in that directory.
Clicking on an image in either list should show it in the preview window.

Structural:

All training input and output directories should be subdirectories of a fixed location, specified in the Settings tab.
Thus you could easily select directories from a dropdown, and add or remove them. That includes a dropdown on the training tab - no more pasting the path in.
This would also remove any potential security risks of people being able to specify arbitrary directories.
Might be wise to be able to lock a directory against accidental removal / overwrite (chmod), since a lot of work goes into building up datasets.
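
To make the structural idea concrete, here is a minimal sketch of how dropdown choices could be built from a fixed root and then validated. The function names `list_datasets` and `resolve_dataset` are hypothetical and not part of the existing code, and treating only immediate subdirectories as valid datasets is an assumption.

```python
from pathlib import Path


def list_datasets(root: Path) -> list[str]:
    """Return the names of the immediate subdirectories of root, sorted,
    e.g. to populate a dropdown."""
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def resolve_dataset(root: Path, name: str) -> Path:
    """Map a dropdown choice to a path, refusing anything that is not an
    immediate child of root (assumption: datasets are never nested)."""
    root = root.resolve()
    candidate = (root / name).resolve()
    # After resolving ".." and symlinks, the parent must be exactly root.
    if candidate.parent != root:
        raise ValueError(f"{name!r} is not a dataset directory under {root}")
    return candidate
```

Because the path is resolved before the check, names like `..` or `../outside` are rejected rather than silently followed.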

Input tab:

You should be able to select many files at once (multi-select) and check or uncheck options (flip, split, etc) for them, on a per-file (not global) basis.
You should be able to delete images.
There should (eventually) be more options, esp. for hi-res images - subsampling grids at different resolutions, rotating images (within a user-specified range), excluding images where more than X percent of the pixels are within a specified distance of a specified background colour, etc. With these sorts of options, a dataset of just a few hi-res images could easily inflate into a dataset of thousands of 512x512 images, from the very zoomed out to extreme closeups.
For subsampling a grid of smaller images from a hi-res image, you should be able to specify a zoom level at which the word "Closeup" is added to the generated prompt.
You should be able to specify the total number of images you want to generate, and specify a priority on a per-image basis.
Images whose priority is too high for them to generate their proportional share of images (because they are too low-res, or have too few options for modifying them) should be flagged in red.
Maybe even some day add a button to auto-fetch images matching a given prompt from e.g. Google Images.
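
As a rough illustration of the subsampling and background-exclusion ideas above, here is a sketch in pure Python that only computes crop boxes and a background fraction. All names (`grid_boxes`, `multiscale_boxes`, `background_fraction`, `closeup_below`) are hypothetical, and measuring "distance from the background colour" per channel is an assumption. Actual cropping and resizing to 512x512 would be done by an image library.

```python
from typing import Iterable, Iterator, Sequence

Box = tuple[int, int, int, int]  # (left, top, right, bottom)
RGB = tuple[int, int, int]


def grid_boxes(width: int, height: int, tile: int) -> Iterator[Box]:
    """Yield non-overlapping tile x tile crop boxes; partial tiles at the
    right and bottom edges are dropped."""
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield (left, top, left + tile, top + tile)


def multiscale_boxes(width: int, height: int, tiles: Sequence[int],
                     closeup_below: int) -> Iterator[tuple[Box, bool]]:
    """For each source tile size, yield (box, is_closeup). A crop counts as
    a closeup when its source tile is at most closeup_below pixels wide."""
    for tile in tiles:
        for box in grid_boxes(width, height, tile):
            yield box, tile <= closeup_below


def background_fraction(pixels: Iterable[RGB], background: RGB,
                        tolerance: int) -> float:
    """Fraction of pixels whose every channel is within tolerance of the
    background colour (assumed per-channel distance)."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    near = sum(
        1 for p in pixels
        if all(abs(c - b) <= tolerance for c, b in zip(p, background))
    )
    return near / len(pixels)
```

With tile sizes 2048, 1024 and 512 on a 2048x2048 source, this yields 1 + 4 + 16 = 21 crops from a single image, before any flips or rotations are applied.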

Output tab:

You should be able to edit captions of selected files individually.
You should be able to prepend text to all selected captions (multi-select).
You should be able to do find-and-replace across all selected captions (e.g. if you're training a dataset on John Q. Public, and BLIP keeps saying "A man" or "A person", you'll want to be able to bulk-replace those statements with "John Q. Public").
You should be able to delete images.
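
The bulk caption operations could look something like the sketch below. It assumes captions are held in a dict keyed by filename; `prepend_text` and `replace_text` are hypothetical names, not existing functions.

```python
def prepend_text(captions: dict[str, str], selected: set[str],
                 prefix: str) -> dict[str, str]:
    """Return a copy of captions with prefix added to the selected files."""
    return {
        name: prefix + text if name in selected else text
        for name, text in captions.items()
    }


def replace_text(captions: dict[str, str], selected: set[str],
                 find: str, replacement: str) -> dict[str, str]:
    """Return a copy of captions with a literal, case-sensitive
    find-and-replace applied to the selected files only."""
    return {
        name: text.replace(find, replacement) if name in selected else text
        for name, text in captions.items()
    }
```

Note that a plain substring replace of "A man" would also hit "A mannequin", so a real implementation might want a whole-word option.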

What are people's thoughts on this? Obviously, if I have time to work on it at all (I've started browsing through the code), I wouldn't work on everything at once. But before starting, one has to know which direction is even desirable. What do people think? Right now, making a dataset is tedious, and it'd be nice if it were a bit less tedious.

I figured, as a series of steps, it might go:

Structural changes
Input / Output subtabs and an output loader / selector with image preview.
The rest of the output tab features
Input loader with image preview and delete
Per-image input options
More per-image processing options, esp. for high-res images (subsampling at different resolutions, rotations within a given range, elimination of subimages that contain too much background colour, etc)
Total specifiable number of images to generate, with a per-file priority
Google Images autofetch
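
For the "total number of images with a per-file priority" step, one possible sketch follows. The name `allocate` and the idea of a precomputed per-image `capacity` (the maximum number of distinct crops an image can produce with its chosen options) are assumptions for illustration.

```python
def allocate(total: int, priorities: dict[str, float],
             capacity: dict[str, int]) -> tuple[dict[str, int], list[str]]:
    """Split total across images in proportion to priority, cap each image
    at its capacity, and return (counts, flagged). Flagged images are those
    that cannot produce their proportional share."""
    weight_sum = sum(priorities.values())
    if weight_sum <= 0:
        raise ValueError("at least one priority must be positive")
    counts: dict[str, int] = {}
    flagged: list[str] = []
    for name, priority in priorities.items():
        share = round(total * priority / weight_sum)
        limit = capacity[name]
        if share > limit:
            flagged.append(name)  # would be shown in red in the UI
        counts[name] = min(share, limit)
    return counts, flagged
```

This sketch does not redistribute the shortfall from flagged images to the others, and rounding means the counts may not sum exactly to the requested total.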

In the end, it could be amazing to have a workflow like "Specify a search string and autodownload the images, delete the ones you don't like, bulk-choose reasonable parameters for subsampling the images, tell it to generate thousands of training images from your dataset, and then bulk-edit the resultant captions."

I dunno whether I'll even have time - just thinking about this. What are everyone's thoughts? It's just been tedious generating and labeling datasets manually, both in terms of turning hi-res images into numerous smaller images, and in terms of BLIP not labeling the very thing you want to train in. And I've been writing various bash and Python scripts to help myself out, but I know that's not the right solution - the RIGHT solution is that all this would be in the interface itself.
