Support export balance dataset from imbalance source #4107

Closed
shaojun opened this issue Jan 4, 2022 · 3 comments
Labels
question Further information is requested

Comments

Contributor

shaojun commented Jan 4, 2022

Hi, thanks for the great effort.

I'm using transfer learning to train an object detection model that covers 3 classes:

  • people
  • bicycle
  • custom sign

The people and bicycle data will come from a public dataset such as PASCAL VOC, which is large, while the custom sign data is a small private dataset, say 1000 images with a single label in each.

Since the training tool needs a balanced dataset across classes to achieve good accuracy, I'm planning to import the full PASCAL VOC and the custom dataset into 2 CVAT tasks (both tasks are in one project). After labeling the custom sign data, I expect the exported dataset (I prefer the KITTI format) to have:

  • Limited classes
    PASCAL VOC has 20 classes, but I only want to export people, bicycle, and custom sign.
  • Balanced distribution across those 3 classes
    PASCAL VOC has thousands of samples for each class, so to balance with custom sign, each class needs to be limited to about 1000.

Does the tool currently support this?

shaojun changed the title from "Support export balance dataset from imbalance source(project)" to "Support export balance dataset from imbalance source" on Jan 4, 2022
Contributor

efcy commented Jan 4, 2022

I use datumaro (https://github.com/openvinotoolkit/datumaro) for this use case. It used to be part of CVAT and is now its own project. You can download the tasks in one format, convert them to the format you want, and remove the classes you don't need. Balancing can be easily achieved by just sampling the annotations and removing the rest.
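To make the sampling idea concrete, here is a minimal sketch in plain Python. It does not use datumaro's API; the `(image_id, labels)` record layout and the `balanced_sample` helper are assumptions made up for illustration. It drops unwanted labels and then caps how many picked images may contain each class.

```python
import random
from collections import Counter


def balanced_sample(records, classes, per_class, seed=0):
    """Pick records so that no wanted class appears in more than
    `per_class` picked images.

    records: list of (image_id, labels) pairs (hypothetical layout).
    Returns (picked_records, per_class_counts).
    """
    wanted = set(classes)
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # random order, reproducible via `seed`

    counts = Counter()
    picked = []
    for image_id, labels in shuffled:
        kept = wanted & set(labels)  # drop labels outside the wanted classes
        if not kept:
            continue  # nothing of interest in this image
        if any(counts[c] >= per_class for c in kept):
            continue  # would push an already full class over the cap
        counts.update(kept)
        picked.append((image_id, sorted(kept)))
    return picked, counts
```

With single-label images and enough data, each class ends up with exactly `per_class` images. With multi-label images (common in PASCAL VOC) the cap is still respected, but some classes may end up below it, so the counts should be checked afterwards.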

Contributor Author

shaojun commented Jan 4, 2022

@StellaASchlotter thanks.

Balancing can be easily achieved by just sampling the annotations and removing the rest.

Could you elaborate on this? Is it done manually, or in some other way?

Contributor

efcy commented Jan 4, 2022

You could do it manually if you really want to. I was thinking more like this: openvinotoolkit/datumaro#522

There is no command which does exactly what you want. But you can write a small script to achieve your goal. In the issue I linked there are some suggestions on how to get it working.

It's also possible to not use datumaro or other libraries like that and write the code for it completely yourself.
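As a rough illustration of the do-it-yourself route, the sketch below removes unwanted classes from a single PASCAL VOC annotation file using only the Python standard library. The `keep_classes` helper and the sample annotation are made up for this example; note that PASCAL VOC names the people class `person`.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the PASCAL VOC XML layout, for illustration only.
SAMPLE = """<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def keep_classes(voc_xml, wanted):
    """Return the annotation XML with only the <object> entries
    whose <name> is in `wanted` (hypothetical helper)."""
    root = ET.fromstring(voc_xml)
    # findall() returns a list, so removing while looping is safe.
    for obj in root.findall("object"):
        if obj.findtext("name") not in wanted:
            root.remove(obj)
    return ET.tostring(root, encoding="unicode")


filtered = keep_classes(SAMPLE, {"person", "bicycle"})
```

Running this over every annotation file, skipping images left with no objects, and then sampling a fixed number of images per class would cover both requirements from the original question. Converting the result to KITTI is a separate step not shown here.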

nmanovic added the "question" (Further information is requested) label on Jan 5, 2022
nmanovic added this to "To do" in Dataset framework (Datumaro) via automation on Jan 5, 2022
Dataset framework (Datumaro) automation moved this from "To do" to "Done" on Apr 7, 2022

4 participants