Download data in different formats #675

Open
hlydecker opened this issue Apr 27, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@hlydecker
Contributor

It would be very useful to let users download data directly in an ML-ready format such as YOLO.

I could probably repurpose some of the dataset utilities I developed for camera traps to do this, since they already turn a COCO-like dataset into YOLO.

YOLO requires a dataset.yaml with:

path: weed_ai_dataset
train: train/images
val: val/images
test: test/images

nc: 2
names: ['Weed: Lolium rigidum','Weed: Sonchus oleraceus']

Here, path points to the dataset root directory, given relative to the YOLO model install; train, val and test are the image directories relative to path.
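As a rough sketch of how an exporter could produce that file from a COCO-style `categories` list: the function name `build_dataset_yaml` and the default root are hypothetical, and the split directory names are copied from the example above. It relies on the fact that a JSON array is also a valid YAML flow sequence, so no YAML library is needed.

```python
import json


def build_dataset_yaml(categories, root="weed_ai_dataset"):
    """Render a YOLOv5-style dataset.yaml from COCO-style categories.

    `categories` is a list of dicts with "id" and "name" keys.
    The function name and default root are illustrative assumptions.
    """
    # Sort by COCO id so YOLO class indices are assigned in a stable order.
    names = [c["name"] for c in sorted(categories, key=lambda c: c["id"])]
    lines = [
        f"path: {root}",
        "train: train/images",
        "val: val/images",
        "test: test/images",
        "",
        f"nc: {len(names)}",
        # json.dumps takes care of quoting names that contain ": ".
        f"names: {json.dumps(names, ensure_ascii=False)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cats = [
        {"id": 0, "name": "Weed: Lolium rigidum"},
        {"id": 1, "name": "Weed: Sonchus oleraceus"},
    ]
    print(build_dataset_yaml(cats))
```

The position of each name in `names` is the class index that the label files would need to use.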

Individual annotations are stored for each image in text files with the same name as the image, but with the .txt extension. These are in the "labels" directory that sits adjacent to the relevant "images" directory.
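A minimal sketch of that naming rule, assuming every image sits directly inside a directory called `images` (the helper name `label_path_for` is made up for illustration):

```python
from pathlib import Path


def label_path_for(image_path):
    """Return the label file path that corresponds to an image path.

    Assumes the layout <split>/images/<name>.<ext> with a sibling
    <split>/labels directory, as described above.
    """
    image_path = Path(image_path)
    if image_path.parent.name != "images":
        raise ValueError(f"expected an 'images' directory, got {image_path.parent}")
    # Same stem as the image, .txt extension, in the adjacent "labels" directory.
    return image_path.parent.parent / "labels" / (image_path.stem + ".txt")


if __name__ == "__main__":
    print(label_path_for("weed_ai_dataset/train/images/img_001.jpg"))
```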

These are space-separated text files, with one line per bounding box:
<class_id> <x_center> <y_center> <width> <height>

0 0.25 0.1 0.43 0.3

Note that x, y, w and h are all normalized to the range 0 to 1: x and w are fractions of the image width, y and h are fractions of the image height.
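For reference, the conversion from a COCO box could look roughly like this. COCO stores boxes as `[x_min, y_min, width, height]` in pixels, while a YOLO label line has five fields: the class index, then the box centre, width and height as fractions of the image size. The function name is hypothetical, and it assumes `class_id` has already been mapped from the COCO category id to a zero-based index.

```python
def coco_bbox_to_yolo_line(class_id, bbox, img_width, img_height):
    """Format one COCO bounding box as a YOLO label line.

    bbox is [x_min, y_min, width, height] in pixels (COCO convention).
    class_id is assumed to be the zero-based YOLO class index already.
    """
    x_min, y_min, box_w, box_h = bbox
    # YOLO wants the box centre, not the top-left corner.
    x_center = (x_min + box_w / 2) / img_width
    y_center = (y_min + box_h / 2) / img_height
    # Width and height are normalized by the matching image dimension.
    width = box_w / img_width
    height = box_h / img_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    print(coco_bbox_to_yolo_line(0, [100, 50, 200, 100], 400, 200))
```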

This is not for us to deal with now, but would be useful in the future!

@hlydecker hlydecker added the enhancement New feature or request label Apr 27, 2022
@geezacoleman
Collaborator

This would be a great feature. I've been putting together a Google Colab notebook to train a YOLOv5 model with Weed-AI datasets, which could be a good interim solution. CVAT also offers export/upload in various formats, which might help the process.

Came across this converter from Ultralytics which might also help.

@hlydecker
Contributor Author

So some more thoughts:

  1. Minimum viable product: WeedCOCO -> COCO converter. A button on the dataset page, "Download Model Ready" (or something similar), would download a zip file containing a COCO-format dataset, with the AgContext object split off into a separate dataset description document (JSON, YAML or Markdown).
  2. Full-featured version: selected dataset export. Allow COCO, YOLOv5 and VOC. We would need to build in train/val/test split functionality to make it work with YOLO.
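To make the two steps a bit more concrete, here is a rough sketch of both pieces. Everything in it is an assumption rather than existing Weed-AI code: the function names, the 70/20/10 default split, and in particular the top-level key name `agcontexts`, which should be checked against the actual WeedCOCO schema.

```python
import random


def separate_agcontext(weedcoco, key="agcontexts"):
    """Split a WeedCOCO dict into a COCO dict and a description dict.

    The key name "agcontexts" is an assumption about the WeedCOCO layout.
    """
    coco = {k: v for k, v in weedcoco.items() if k != key}
    description = {key: weedcoco.get(key, [])}
    return coco, description


def split_image_ids(image_ids, fractions=(0.7, 0.2, 0.1), seed=0):
    """Assign image ids to train/val/test with a reproducible shuffle.

    fractions are (train, val, test); test receives whatever remains.
    """
    ids = sorted(image_ids)  # sort first so the result only depends on the seed
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * fractions[0])
    n_val = round(len(ids) * fractions[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


if __name__ == "__main__":
    weedcoco = {"images": [{"id": i} for i in range(10)], "agcontexts": [{"id": 0}]}
    coco, description = separate_agcontext(weedcoco)
    splits = split_image_ids(img["id"] for img in coco["images"])
    print({name: len(ids) for name, ids in splits.items()})
```

A fixed seed keeps the split stable between downloads, so two users exporting the same dataset would get the same train/val/test partition.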
