Datasets: best practices and questions #147

Closed
constantinpape opened this issue Jul 14, 2021 · 21 comments

Comments

@constantinpape
Contributor

I want to add a few datasets to the website to document the training for my models better. However I am currently not quite sure what the best practices for this are. In particular:

  • Are there any example rdfs available? I could not find any.
  • The "source" button leads to the url of the dataset, and does not show the rdf (as is the case for the model).
    • I think it would be good to have the buttons do the same thing in both cards, so maybe we could change the symbol in the dataset card to something more indicative that this will take you to a new url?
    • I think it would be good to also have a button that displays an rdf in the dataset card.
  • What are the requirements for sources? Does it have to be zenodo? I would also like to just link to e.g. https://cremi.org/
  • Is it possible to update the dataset rdfs? (I think that would be fairly important)
@oeway
Collaborator

oeway commented Jul 14, 2021

Hi @constantinpape, it's good that you started to look into this. We have a very loose definition at the moment; here are some examples from ZeroCostDL4Mic: https://github.com/HenriquesLab/ZeroCostDL4Mic/blob/8ba67e98a894cb271b98fa8da2ec46af46a541fe/manifest.bioimage.io.yaml#L29-L55

The source can be any URI at the moment, so you can basically upload an rdf.yaml file with source=https://cremi.org/... to zenodo. You should be able to do this the same way as you do for models, and you should be able to update it as well (we don't have a dedicated UI for datasets though, so you need to edit the yaml file directly).

(also see the dataset RDF discussion here)
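
As a rough illustration of the "source can be any URI" rule, the sketch below builds a minimal dataset RDF as a plain Python dict and checks the `source` value. The helper `make_dataset_rdf`, the field names, and the example values are assumptions for illustration and are not taken from a published spec. The check also only accepts URIs with a scheme and a host, which is narrower than "any URI".

```python
from urllib.parse import urlparse


def make_dataset_rdf(name, description, source, license_id, covers=()):
    """Build a minimal dataset RDF as a plain dict (hypothetical field names)."""
    parsed = urlparse(source)
    # Assumption: "any URI" is read here as "has a scheme and a host".
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"source must be an absolute URI, got {source!r}")
    return {
        "type": "dataset",
        "name": name,
        "description": description,
        "license": license_id,
        "source": source,
        "covers": list(covers),
    }


# Example values are placeholders, not the real CREMI metadata.
rdf = make_dataset_rdf(
    name="CREMI",
    description="Link to the CREMI challenge data.",
    source="https://cremi.org/",
    license_id="CC-BY-4.0",
    covers=["cover.png"],
)
```

Serializing the dict to rdf.yaml is left out on purpose, since that would need a YAML library.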

@constantinpape
Contributor Author

Thanks, that's a good starting point! I will see how upload via the website works for this.

@constantinpape
Contributor Author

@oeway, I tried uploading a dataset rdf.yaml, but it fails:

[Video attachment: ds_upload-2021-07-14_15.34.20.mp4]

The console shows the following error:

Error: Can't find end of central directory : is this a zip file ? If it is, see https://stuk.github.io/jszip/documentation/howto/read_zip.html
    readEndOfCentral chunk-vendors.470b33a2.js:67758
    load chunk-vendors.470b33a2.js:67758
    exports chunk-vendors.470b33a2.js:67758

@oeway
Collaborator

oeway commented Jul 14, 2021

Ok, I will fix that!

@oeway
Collaborator

oeway commented Jul 14, 2021

@constantinpape It should be fixed now! Sorry, I forgot to implement the yaml loading at all; we treated the file as a zip and tried to parse it with jszip. Please give it a try in the preview (BTW, you can find a bug report button in the lower left corner of the netlify preview page; if needed, you can use it to record your screen and file a bug report directly ;).

BTW, I am also thinking of replacing jszip with uzip in the website.
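
The kind of check that avoids the error above can be sketched in Python as follows. This is not the website's actual fix (the site is JavaScript and uses jszip); the helper name `classify_upload` and the rule "anything that is not a zip is treated as yaml" are assumptions for illustration.

```python
import io
import zipfile


def classify_upload(data: bytes) -> str:
    """Return "zip" for a zip archive, otherwise "yaml" (assumed plain-text RDF)."""
    # zipfile.is_zipfile looks for the end-of-central-directory record,
    # the same structure the jszip error message complains about.
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    return "yaml"


# Build a tiny zip archive in memory for comparison.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("rdf.yaml", "type: dataset\n")

print(classify_upload(buffer.getvalue()))   # zip
print(classify_upload(b"type: dataset\n"))  # yaml
```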

@constantinpape
Contributor Author

constantinpape commented Jul 14, 2021

@oeway thanks for fixing this, and in theory it works now, but it requires a license in order to upload it.
For datasets that can be a bit difficult, because for many of the datasets I am not actually hosting the data, but just adding a link to the resource so that we can reference it from inside the model-zoo.
For these datasets I am not sure what license to use...

Edit: I will go with CC-BY-4 for now, but this is something we should discuss.

@oeway
Collaborator

oeway commented Jul 14, 2021

If we just add a link to the RDF, I would actually interpret the license as the license for the RDF file only. The license would apply to the entire dataset if the files are referenced by relative paths in the same zenodo deposit.

How about that? We can also add a note for the license definition.

@constantinpape
Contributor Author

If we just add a link to the RDF, I would actually interpret the license as the license for the RDF file only. The license would apply to the entire dataset if the files are referenced by relative paths in the same zenodo deposit.

Yes, that makes sense but it feels a bit unnecessary to add a license if we just link to some external resource.
But it's not a big deal, we can keep it this way and add some clarification on this.

@constantinpape
Contributor Author

@oeway did you see the 2 datasets I uploaded, and could you add them to the zenodo community so that they show up on the website?

@oeway
Collaborator

oeway commented Jul 14, 2021

@constantinpape I already accepted it a while ago. I just checked the console log of the website: it doesn't like the cover URL, because we assume the cover images are uploaded to zenodo.

I made it like that because an arbitrary cover image may disappear, or simply because the server may not have CORS enabled.

So it would be better if you could include the cover image in the upload and use a relative file path, at least for now. BTW, the image from the URL you provided seems rather big; could you provide a smaller file to improve the loading time? E.g. a few hundred pixels.

Edit: in the future, I can also implement in the upload dialog that we will (try to) pull the external cover image, resize it and upload it to zenodo.
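
A minimal sketch of the cover rule described above: relative paths are accepted, external URLs are flagged. The helper `split_covers` and the example entries are hypothetical, and the real website check may differ.

```python
from urllib.parse import urlparse


def split_covers(covers):
    """Split cover entries into relative paths and external URLs."""
    relative, external = [], []
    for cover in covers:
        # Entries with an http(s) scheme point outside the deposit.
        if urlparse(cover).scheme in ("http", "https"):
            external.append(cover)
        else:
            relative.append(cover)
    return relative, external


relative, external = split_covers(["cover.png", "https://example.org/banner.jpeg"])
print(relative)  # ['cover.png']
print(external)  # ['https://example.org/banner.jpeg']
```

Resizing the image is left out here, since that would need an imaging library.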

@constantinpape
Contributor Author

So it would be better if you could include the cover image in the upload and use a relative file path, at least for now.

Ok, I will fix it.

BTW, the image from the URL you provided seems rather big; could you provide a smaller file to improve the loading time? E.g. a few hundred pixels.

Ok. Btw, do we support gifs?

@constantinpape
Contributor Author

@oeway I updated the two datasets on zenodo, but still can't see them on the website.

@oeway
Collaborator

oeway commented Jul 14, 2021

Ok. Btw, do we support gifs?

Yes, we do

@oeway I updated the two datasets on zenodo, but still can't see them on the website.

I still see the error in the console on bioimage.io:

app.234425ac.js:1 Error: Invalid file identifier: https://grand-challenge-public-prod.s3.amazonaws.com/b/566/banner.x10.jpeg

One issue might be that when we update the covers, we somehow miss the metadata on zenodo; these identifiers need to be updated too:
[Screenshot: Screen Shot 2021-07-14 at 10 29 28 PM]

You updated from the bioimageio upload page, right? I will need to look into why it doesn't fix the metadata.

@constantinpape
Contributor Author

You updated from the bioimageio upload page, right? I will need to look into why it doesn't fix the metadata.

No, I updated directly on zenodo. Updating via the page was not so easy because it wasn't listed there yet.

@oeway
Collaborator

oeway commented Jul 14, 2021

That's true. Ok, then you only need to update the URL identifiers manually: copy the URL of the cover image from zenodo, then edit and change the identifier URLs for the cover image.

An example URL for the cover image: https://sandbox.zenodo.org/record/880817/files/input.png

EDIT: Now you can see the two: https://bioimage.io/#/?type=dataset&id=10.5072%2Fzenodo.881020 so you can also use the edit button to update from bioimage.io.
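
The two URL shapes in this comment can be reproduced with a short sketch. The helper names are hypothetical, and the patterns are inferred only from the two example URLs above, so they may not hold for other records or for the production zenodo site.

```python
from urllib.parse import quote


def zenodo_file_url(record_id, filename, base="https://sandbox.zenodo.org"):
    """Build a file URL following the pattern of the example cover URL."""
    return f"{base}/record/{record_id}/files/{quote(filename)}"


def bioimage_dataset_link(doi):
    """Build a bioimage.io dataset link; the DOI is percent-encoded in the id."""
    # safe="" makes quote() encode the "/" in the DOI as %2F.
    return f"https://bioimage.io/#/?type=dataset&id={quote(doi, safe='')}"


print(zenodo_file_url(880817, "input.png"))
# https://sandbox.zenodo.org/record/880817/files/input.png
print(bioimage_dataset_link("10.5072/zenodo.881020"))
# https://bioimage.io/#/?type=dataset&id=10.5072%2Fzenodo.881020
```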

@constantinpape
Contributor Author

Thanks, @oeway. It works now.

@oeway
Collaborator

oeway commented Jul 15, 2021

Thanks, @oeway. It works now.

Great! BTW, did you see that the cover image for this one is broken: https://bioimage.io/#/?type=dataset&id=10.5072%2Fzenodo.881018 Could you try to update it from the upload page?

@constantinpape
Contributor Author

Great! BTW, did you see that the cover image for this one is broken: https://bioimage.io/#/?type=dataset&id=10.5072%2Fzenodo.881018

For me it works:
[Screenshot: Screenshot from 2021-07-15 14-41-46]

@oeway
Collaborator

oeway commented Jul 15, 2021

@constantinpape I just checked: it only works in Firefox, not Chrome. Here is the error:

Failed to load resource: net::ERR_CERT_COMMON_NAME_INVALID
https://brainiac2.mit.edu/isbi_challenge/sites/default/files/Challenge-ISBI-2012-sample-image.png

This is another reason for us to upload the cover image, I think. I guess you already did, but you just need to update the zenodo meta info; could you click the edit button and go through the upload process once more?

@constantinpape
Contributor Author

Ok, I will update it later. Unrelated: I pinged you in gitter, could you have a quick look and check why the isbi model is not displayed on the website rn?

@constantinpape
Contributor Author

I updated the isbi dataset now and checked that it displays correctly in chrome.
