Community `data` contributions API #33

masonearles · 2022-10-31T14:07:19Z

We're opening this issue to discuss how to enable easy yet high-quality data contributions to AgML. This was first raised in Issue 15. If you are interested in contributing to the discussion and code development, let's have the conversation below.

geezacoleman · 2022-10-31T14:32:05Z

What you're doing here with AgML is really awesome, and will make using image datasets for testing/developing so much easier! I think this was briefly mentioned some time ago, but it would be great to form a connection between Weed-AI and AgML for the weeds image side of things.

Weed-AI now supports annotation through CVAT, so unannotated data can be annotated publicly before being uploaded to the platform. We've also worked on establishing agricultural metadata reporting standards for weeds called AgContext, so each dataset has information on where/how it was collected. There are also version control and dataset editing functions. One limitation is that it is currently only for weeds, not all the various forms of image data used in agriculture.

Helping make the API is a little beyond my skillset, but if it's something of interest I'd be happy to help some other way. At least it might help make the connection between annotation/upload > standardised metadata > use/editing > download.

KeynesYouDigIt · 2024-04-14T21:29:03Z

@amogh7joshi / @masonearles where is the S3 bucket located where the data goes now? I can for sure build an API no sweat, but I need a target :)

Also for you 2 plus @geezacoleman / other users

(To be clear your answers are going to depend on your use case and data sets, so just answer for what you know!)

What are the MOST important data quality specs, if any? When should we say about a data set "this is good enough to accept and use"?
What formats is data typically uploaded in?
What formats would we like download to support?
What else should I know about the most minimal version of this API?

Excited to get started!

KeynesYouDigIt · 2024-04-14T21:41:49Z

To be clear, are we just looking for an automated API to do all of this stuff? Does the data just live here in the repo? https://github.com/Project-AgML/AgML/blob/main/CONTRIBUTING.md

KeynesYouDigIt · 2024-04-20T16:50:14Z

Perhaps this should be our target landing spot? https://www.tensorflow.org/datasets

masonearles · 2024-05-01T13:41:10Z

@KeynesYouDigIt As mentioned offline, the data currently lives in a publicly readable S3 bucket. We manage write access as admins. It would be great to create a pipeline for AgML users to contribute data, but with a gate for an admin to run a QA check before uploading to the S3 bucket.
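
For discussion, here is a rough sketch of what such a gated flow could look like. Everything in it is an assumption for illustration: `Submission`, `ContributionQueue`, and the injected `upload` callable are hypothetical names, not part of AgML, and the real S3 write is deliberately left out. The only point is that contributors add to a staging queue, and nothing is uploaded until an admin explicitly approves it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    """A contributed dataset waiting in staging (hypothetical model)."""
    name: str
    files: List[str]
    status: Status = Status.PENDING
    notes: List[str] = field(default_factory=list)


class ContributionQueue:
    """Holds submissions until an admin reviews them.

    `upload` is an injected callable standing in for the admin-only write
    to the S3 bucket; nothing in this sketch talks to AWS.
    """

    def __init__(self, upload: Callable[[Submission], None]) -> None:
        self._upload = upload
        self._items: Dict[str, Submission] = {}

    def submit(self, submission: Submission) -> None:
        # Contributors only add to staging; they never call upload directly.
        if submission.name in self._items:
            raise ValueError(f"{submission.name!r} is already submitted")
        self._items[submission.name] = submission

    def pending(self) -> List[Submission]:
        # Submissions still waiting for an admin decision, in arrival order.
        return [s for s in self._items.values() if s.status is Status.PENDING]

    def review(self, name: str, approved: bool, note: str = "") -> Submission:
        # The admin gate: only an explicit approval triggers the upload.
        sub = self._items[name]
        if sub.status is not Status.PENDING:
            raise ValueError(f"{name!r} was already reviewed")
        if note:
            sub.notes.append(note)
        if approved:
            self._upload(sub)
            sub.status = Status.APPROVED
        else:
            sub.status = Status.REJECTED
        return sub
```

In practice the `upload` callable would wrap whatever admin-credentialed write the maintainers already use, and the QA check itself (formats, annotation validity, metadata) would run before `review` is called with `approved=True`.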

masonearles added the enhancement New feature or request label Oct 31, 2022

masonearles assigned dariojavo, masonearles and amogh7joshi Oct 31, 2022

masonearles mentioned this issue Oct 31, 2022

Datasets on plant diseases, pests detection & miscellaneous #15

Open

masonearles pinned this issue Oct 31, 2022

geezacoleman mentioned this issue Oct 31, 2022

Integration with AgML Weed-AI/Weed-AI#703

Open

masonearles unpinned this issue Apr 28, 2024
