Add options to choose min and max images per class #161

Closed
tambetm opened this issue Jul 7, 2015 · 6 comments

Comments

@tambetm

tambetm commented Jul 7, 2015

Add options to choose the min and max number of images per class when adding a new dataset. This would ignore folders (classes) that have fewer than the minimum or more than the maximum number of images. These fields could be added to the dataset creation form, below the validation set creation options. In many cases this would allow using the original dataset folder without additional preprocessing.
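The filtering requested above can be sketched roughly as follows. This is a minimal illustration of the idea as stated in the request (skip class folders whose image count falls outside the bounds), not the DIGITS implementation. The function names, parameter names, and the list of image extensions are all assumptions made for the example.

```python
import os

# Assumed set of image extensions, for illustration only.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp', '.ppm')


def count_images_per_class(root):
    """Map each class subfolder of `root` to the number of image files in it."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue  # only subfolders are treated as classes
        counts[name] = sum(
            1 for f in os.listdir(folder)
            if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


def filter_classes(counts, min_per_class=None, max_per_class=None):
    """Return the class names whose image count lies within the given bounds.

    Hypothetical helper: a bound of None means "no limit" on that side.
    """
    kept = []
    for name, n in sorted(counts.items()):
        if min_per_class is not None and n < min_per_class:
            continue  # too few images: ignore this class
        if max_per_class is not None and n > max_per_class:
            continue  # too many images: ignore this class
        kept.append(name)
    return kept
```

For example, `filter_classes(count_images_per_class('/path/to/dataset'), min_per_class=10)` would keep only the classes with at least 10 images, where the path is a placeholder.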

@lukeyeager
Member

That's a good idea. I actually have this feature implemented already in the parse_folder tool:
https://github.com/NVIDIA/DIGITS/blob/v2.0.0-preview/tools/parse_folder.py#L494-L504

I just haven't exposed it through the web interface yet.

@tambetm
Author

tambetm commented Jul 8, 2015

Thanks @lukeyeager, I knew I had seen these options somewhere in DIGITS, but had forgotten where.

Still, exposing them through the web interface would be nice.

@gheinrich
Contributor

Should the min/max conditions apply to the training set only or should they also apply to the validation/test sets? Should we ignore min/max number of images per class when samples are specified through text files? Thanks.

@tambetm
Author

tambetm commented Aug 2, 2015

I think min/max should be applied before the data is divided into train/validation/test sets. For example, if you set min to 10 and the train/val/test split to 60/20/20, then each class will have at least 6 training samples, 2 validation samples and 2 test samples. If it doesn't split evenly, I would round down the validation and test counts and use the remainder for training, so all samples are still used.
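The rounding rule described above can be written as a short sketch. It assumes the split is given as integer percentages; the function and parameter names are hypothetical and this is not taken from the DIGITS code.

```python
def split_counts(n_samples, pct_val, pct_test):
    """Split one class's sample count into (train, val, test) counts.

    Validation and test counts are rounded down; whatever remains goes
    to training, so every sample is used. Percentages are assumed to be
    integers with pct_val + pct_test <= 100.
    """
    n_val = n_samples * pct_val // 100    # integer division rounds down
    n_test = n_samples * pct_test // 100
    n_train = n_samples - n_val - n_test  # remainder goes to training
    return n_train, n_val, n_test


print(split_counts(10, 20, 20))  # (6, 2, 2)
print(split_counts(11, 20, 20))  # (7, 2, 2)
```

With 10 samples and a 60/20/20 split this gives the 6/2/2 breakdown from the example. With 11 samples, the extra sample lands in the training set.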

I agree that when samples are specified using text files, min/max constraints should be ignored.

Tambet


gheinrich added a commit to gheinrich/DIGITS that referenced this issue Aug 3, 2015
@gheinrich
Contributor

Hi @tambetm, I have created Pull Request #191 for this. Could you check whether it addresses your request? I have added the option to select the min/max number of samples for the training set only. I figured that this wasn't necessary if the user specifies the train/test sets directly.

gheinrich added a commit to gheinrich/DIGITS that referenced this issue Aug 4, 2015
Add ParseFolderTasks info in dataset job JSON

Add tests for image counts and min/max samples per category
gheinrich added a commit to gheinrich/DIGITS that referenced this issue Aug 5, 2015
Add ParseFolderTasks info in dataset job JSON

Add tests for image counts and min/max samples per category
@tambetm
Author

tambetm commented Aug 6, 2015

Sorry, I was on vacation and am now catching up on my backlog. I'll give this a try ASAP.
