Community data contributions API #33

Open
masonearles opened this issue Oct 31, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@masonearles
Contributor

We're opening up this issue regarding how to enable easy, yet high quality, data contributions to AgML. This was raised initially in Issue 15. If you are interested in contributing to this discussion and code development, let's have this conversation below.

@geezacoleman

What you're doing here with AgML is really awesome and will make using image datasets for testing/developing so much easier! I think this was briefly mentioned some time ago, but it would be great to form a connection between Weed-AI and AgML on the weed image side of things.

Weed-AI now supports annotation through CVAT, so unannotated data can be annotated publicly before being uploaded to the platform. We've also worked on establishing agricultural metadata reporting standards for weeds, called AgContext, so each dataset has information on where and how it was collected. There are also version control and dataset editing functions. One limitation is that it currently covers only weeds, not all the various forms of image data used in agriculture.

Building the API is a little beyond my skill set, but if this is of interest, I'd be happy to help in some other way. At the very least, it might help connect the steps: annotation/upload > standardised metadata > use/editing > download.

@KeynesYouDigIt

KeynesYouDigIt commented Apr 14, 2024

@amogh7joshi / @masonearles, where is the S3 bucket that the data goes to now? I can definitely build an API, no sweat, but I need a target :)

Also, for you two plus @geezacoleman and other users:

(To be clear, your answers will depend on your use cases and datasets, so just answer for what you know!)

  1. What are the MOST important data quality specs, if any? When should we say of a dataset, "this is good enough to accept and use"?

  2. What formats is data typically uploaded in?

  3. What formats would we like downloads to support?

  4. What else should I know about the most minimal version of this API?

Excited to get started!
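For illustration only: if contributions arrived as COCO-style JSON (a common format for object-detection annotations, assumed here because question 2 above is still open), a first automated quality gate could check the annotation file's internal consistency before a human reviews it. The function name `check_coco` and the specific checks are hypothetical; which checks make a dataset "good enough" is a decision for the maintainers.

```python
from typing import Any, Dict, List


def check_coco(coco: Dict[str, Any]) -> List[str]:
    """Hypothetical first-pass QA for a COCO-style detection dataset.

    Returns human-readable problems; an empty list means every check passed.
    """
    problems: List[str] = []
    for key in ("images", "annotations", "categories"):
        if key not in coco:
            problems.append(f"missing top-level key {key!r}")
    if problems:
        return problems

    # Image ids must be unique so annotations can refer to them unambiguously.
    image_ids = [img.get("id") for img in coco["images"]]
    if len(image_ids) != len(set(image_ids)):
        problems.append("duplicate image ids")
    sizes = {img.get("id"): (img.get("width"), img.get("height")) for img in coco["images"]}
    category_ids = {cat.get("id") for cat in coco["categories"]}

    for ann in coco["annotations"]:
        aid, image_id = ann.get("id"), ann.get("image_id")
        # Every annotation must point at a known image and a declared category.
        if image_id not in sizes:
            problems.append(f"annotation {aid}: unknown image_id {image_id}")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {aid}: unknown category_id {ann.get('category_id')}")
        # COCO boxes are [x, y, width, height] in pixels.
        bbox = ann.get("bbox")
        if bbox is None:
            continue
        if len(bbox) != 4 or min(bbox) < 0 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {aid}: invalid bbox {bbox}")
        elif image_id in sizes:
            width, height = sizes[image_id]
            if width and height and (bbox[0] + bbox[2] > width or bbox[1] + bbox[3] > height):
                problems.append(f"annotation {aid}: bbox extends outside the image")
    return problems
```

A check like this only catches structural errors; image quality, label accuracy and metadata completeness would still need human review.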

@KeynesYouDigIt

To be clear, are we just looking for an automated API to do all of this? Does the data just live here in the repo? https://github.com/Project-AgML/AgML/blob/main/CONTRIBUTING.md

@KeynesYouDigIt

Perhaps this should be our target landing spot? https://www.tensorflow.org/datasets

@masonearles masonearles unpinned this issue Apr 28, 2024
@masonearles
Contributor Author

@KeynesYouDigIt As mentioned offline, the data currently lives in a publicly readable S3 bucket, and we manage write access as admins. It would be great to create a pipeline for AgML users to contribute data, with a gate that lets an admin run a QA check before anything is uploaded to the S3 bucket.
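A minimal sketch of the contribute, admin QA, then upload flow described above, assuming an in-memory staging queue. `ContributionQueue`, the status names and the `datasets/<name>.zip` key layout are all hypothetical. The upload step is injected as a plain callable so the sketch runs without AWS credentials; in practice it could wrap boto3's S3 client `upload_file`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    UPLOADED = "uploaded"


@dataclass
class Contribution:
    name: str
    local_path: str
    status: Status = Status.PENDING
    notes: List[str] = field(default_factory=list)


class ContributionQueue:
    """Hypothetical staging area: contributions wait here until an admin reviews them."""

    def __init__(self, uploader: Callable[[str, str], None]):
        # uploader(local_path, key) is injected; a real one might be
        # lambda path, key: s3_client.upload_file(path, BUCKET, key)
        self._uploader = uploader
        self._items: Dict[str, Contribution] = {}

    def submit(self, name: str, local_path: str) -> Contribution:
        if name in self._items:
            raise ValueError(f"dataset {name!r} already submitted")
        item = Contribution(name, local_path)
        self._items[name] = item
        return item

    def review(self, name: str, approve: bool, note: str = "") -> Contribution:
        # Only an admin is expected to call this, after running the QA checks.
        item = self._items[name]
        if item.status is not Status.PENDING:
            raise ValueError(f"{name!r} is not pending review")
        item.status = Status.APPROVED if approve else Status.REJECTED
        if note:
            item.notes.append(note)
        return item

    def publish(self, name: str) -> Contribution:
        item = self._items[name]
        # The admin gate: nothing reaches the public bucket without approval.
        if item.status is not Status.APPROVED:
            raise PermissionError(f"{name!r} has not been approved by an admin")
        # Hypothetical key layout inside the bucket.
        self._uploader(item.local_path, f"datasets/{name}.zip")
        item.status = Status.UPLOADED
        return item
```

Keeping `publish` separate from `review` means an admin can approve a dataset and upload it later, while anything still pending or rejected can never reach the public bucket.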

5 participants