Features for first release #26

Open
3 of 8 tasks
kevinyamauchi opened this issue Mar 1, 2024 · 8 comments

Comments

@kevinyamauchi
Collaborator

kevinyamauchi commented Mar 1, 2024

This issue is to discuss the minimal features needed before our first "public" release (i.e., a release we would be comfortable sharing with a broad audience).

The broader goals and scope of CellCanvas can be discussed here: #24

@kevinyamauchi
Collaborator Author

cc @kephale

@kephale
Collaborator

kephale commented Mar 1, 2024

It probably doesn't hurt to provide the sample data plugin as well, but I would just note that so far the strategy has been to run something like this album solution to do a fetch of the zarr, which will then be used to store the painting and prediction layers.

My impression with a sample data plugin is that it could make it easy to provide the input image and embedding, but wouldn't help with storing the painting/predictions, so this would require some changes with the approach.

One of the reasons that I like the current approach is that provenance is 100% clear. The data used for painting/prediction is contained within the same zarr. While writing this though I realize that maybe the sample data plugin would simply build a local zarr on the user's machine which could help with the provenance tracking.

@kephale kephale added this to the Initial Release milestone Mar 1, 2024
@kevinyamauchi
Collaborator Author

> While writing this though I realize that maybe the sample data plugin would simply build a local zarr on the user's machine which could help with the provenance tracking.

We can also download directly from Zenodo using pooch. See this sample data plugin that @alisterburt and I made as an example:

https://github.com/teamtomo/cryo-et-sample-data/blob/main/src/cryo_et_sample_data/_hiv.py
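
For reference, a minimal sketch of what a pooch-based fetch could look like. The registry entry, the URL, and the cache folder name are placeholders, not real CellCanvas sample data; `pooch.retrieve` and `pooch.os_cache` are the only actual pooch calls used.

```python
from pathlib import Path

# Hypothetical registry: the URL and hash are placeholders to be filled in
# once a real sample file is published.
SAMPLE_REGISTRY = {
    "example_tomogram": {
        "url": "https://zenodo.org/records/<record-id>/files/<file-name>",
        "known_hash": None,  # replace with "sha256:<hex digest>" to verify downloads
    },
}


def sample_entry(name: str) -> dict:
    """Look up a sample dataset, failing with a readable message."""
    try:
        return SAMPLE_REGISTRY[name]
    except KeyError:
        available = ", ".join(sorted(SAMPLE_REGISTRY))
        raise ValueError(f"Unknown sample {name!r}; available: {available}") from None


def fetch_sample(name: str) -> Path:
    """Download a sample file into the user's cache and return its local path."""
    import pooch  # imported lazily so the registry can be inspected without pooch

    entry = sample_entry(name)
    local_path = pooch.retrieve(
        url=entry["url"],
        known_hash=entry["known_hash"],
        path=pooch.os_cache("cellcanvas-sample-data"),  # assumed cache folder name
    )
    return Path(local_path)
```

Since pooch caches the download, repeated calls to `fetch_sample` with the same entry would reuse the local copy instead of fetching again.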

> My impression with a sample data plugin is that it could make it easy to provide the input image and embedding, but wouldn't help with storing the painting/predictions, so this would require some changes with the approach.

I don't think we want to force the users into storing the labels back into the same file the image came from. I think this is especially true if we want to make working with remote data easy, as the user might not have write privileges to where the image and/or embedding are stored. I think we can add some convenience functions for saving the labels to various standard formats, but I would hope that the users can save from the labels layer via the standard napari machinery.

@kevinyamauchi
Collaborator Author

> It probably doesn't hurt to provide the sample data plugin as well, but I would just note that so far the strategy has been to run something like this album solution to do a fetch of the zarr, which will then be used to store the painting and prediction layers.

I don't fully understand how album works. Can you use album to install into an existing environment? Can the user install extra tools into an environment managed by album?

@kephale
Collaborator

kephale commented Mar 1, 2024

Mmm I've used pooch for this stuff before, and I do agree about not forcing users to keep the outputs in the same place (in the long term). I'm not sure when I'm comfortable breaking the current assumption because it makes development much simpler; it would be really easy to lose track of which embeddings were used to support which painting/predictions. That should probably just go into metadata though....
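
A rough sketch of the metadata idea, assuming provenance is stored as a small JSON-friendly record on whatever attribute mapping the labels container exposes (for example a zarr group's `.attrs`, which behaves like a mutable mapping). The key name and record fields are hypothetical, not an existing CellCanvas convention.

```python
from collections.abc import MutableMapping
from typing import Optional

PROVENANCE_KEY = "cellcanvas_provenance"  # hypothetical attribute name


def record_provenance(
    attrs: MutableMapping,
    *,
    image_uri: str,
    embedding_uri: str,
    model_name: Optional[str] = None,
) -> dict:
    """Store which image and embedding a painting/prediction was derived from."""
    record = {
        "image": image_uri,
        "embedding": embedding_uri,
        "model": model_name,
    }
    attrs[PROVENANCE_KEY] = record
    return record


def matches_embedding(attrs: MutableMapping, embedding_uri: str) -> bool:
    """Check whether stored labels were produced from the given embedding."""
    record = attrs.get(PROVENANCE_KEY)
    return record is not None and record["embedding"] == embedding_uri
```

With something like this, labels saved outside the source zarr would still carry a pointer back to the embedding that produced them.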

@kephale
Collaborator

kephale commented Mar 1, 2024

> I don't fully understand how album works. Can you use album to install into an existing environment? Can the user install extra tools into an environment managed by album?

To be clear, album manages environments but it isn't an environment manager. We don't alter existing environments. If someone wants to use this album solution, then they:

  1. Install album
  2. Add the cellcanvas album catalog: album add-catalog https://github.com/cellcanvas/album-catalog
  3. Install the album solution: album install ux_evaluation_winter2024:fetch_data:0.0.4
  4. Run the album solution like any command line call: album run ux_evaluation_winter2024:fetch_data:0.0.4 --dataset_name cellcanvas_crop_007
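
If you wanted to script steps 2 to 4, a minimal sketch could look like the following. It only wraps the exact commands listed above with `subprocess` and assumes album is already installed and on the `PATH`; the function names are illustrative.

```python
import subprocess

CATALOG_URL = "https://github.com/cellcanvas/album-catalog"
SOLUTION = "ux_evaluation_winter2024:fetch_data:0.0.4"


def album_commands(dataset_name: str) -> list:
    """Build the add-catalog, install, and run commands as argument lists."""
    return [
        ["album", "add-catalog", CATALOG_URL],
        ["album", "install", SOLUTION],
        ["album", "run", SOLUTION, "--dataset_name", dataset_name],
    ]


def fetch_with_album(dataset_name: str) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in album_commands(dataset_name):
        subprocess.run(command, check=True)
```

Calling `fetch_with_album("cellcanvas_crop_007")` would run the same sequence as the manual steps.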

This particular solution has a limited number of dataset options because it is part of the UX evaluation, but for other cases you would just point it at your zarr.

Anything you would use typer or click for, you might also use album for. album additionally handles the environment and provides the catalog, which makes it easier to share and install on other systems. If you are only using album solutions, then you never touch conda/mamba; you just let album do that.

@kevinyamauchi
Collaborator Author

Thanks for the explanation, @kephale! I definitely need to look into album more. It seems neat. I am open to continuing to support an album installation route if it makes it easy to distribute versions of the software with data (e.g., for the UX evals).

I think it's important to also make sure cellcanvas is installable and usable via pip without installing album for the sake of interoperability (e.g., other packages or distributions want to depend on cellcanvas). In doing so, I think it's important to provide some small sample data so people can quickly test. It seems to me like a napari sample data plugin with sample data fetchable via pooch is the easiest way, but I am open to other solutions.
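
As a sketch of the sample data side: a napari sample data contribution is a callable that returns a list of layer data tuples `(data, metadata, layer_type)`, registered in the plugin manifest. The factory below and the layer name are assumptions for illustration; the loader is injected so the download mechanism (pooch or anything else) stays swappable.

```python
from typing import Any, Callable

# Shape of a napari layer data tuple: (data, layer keyword arguments, layer type).
LayerData = tuple


def make_sample_data(load_image: Callable[[], Any]) -> Callable[[], list]:
    """Build a sample data callable from any function that returns an array."""

    def _sample_data() -> list:
        image = load_image()  # e.g. open a file previously fetched with pooch
        return [(image, {"name": "tomogram"}, "image")]

    return _sample_data
```

The returned callable is what the manifest's sample data entry would point to; the loader is only invoked when the user actually opens the sample.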

> I do agree about not forcing users to keep the outputs in the same place (in the long term). I'm not sure when I'm comfortable breaking the current assumption because it makes development much simpler; it would be really easy to lose track of which embeddings were used to support which painting/predictions. That should probably just go into metadata though....

I think we should consider doing so soon, or at least add a layer of abstraction for IO so that it is easy to swap out the source of the different arrays (image, embedding, labels, etc.) in the future. The longer we wait, the more IO operations will be sprinkled throughout the codebase that we have to update. Perhaps we can discuss when you're here in Basel.
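
To make the abstraction idea concrete, here is a minimal sketch, assuming a small store interface that the rest of the code talks to instead of a specific zarr file. All class and method names are hypothetical; the in-memory store stands in for a real zarr-backed implementation.

```python
from typing import Any, Optional, Protocol


class ArrayStore(Protocol):
    """Anything that can read and write named arrays (hypothetical interface)."""

    def read(self, name: str) -> Any: ...

    def write(self, name: str, data: Any) -> None: ...


class InMemoryStore:
    """Stand-in for a local, writable store such as a zarr group on disk."""

    def __init__(self, arrays: Optional[dict] = None) -> None:
        self._arrays = dict(arrays or {})

    def read(self, name: str) -> Any:
        return self._arrays[name]

    def write(self, name: str, data: Any) -> None:
        self._arrays[name] = data


class ReadOnlyStore:
    """Wraps a store the user cannot write to, e.g. remote image data."""

    def __init__(self, inner: ArrayStore) -> None:
        self._inner = inner

    def read(self, name: str) -> Any:
        return self._inner.read(name)

    def write(self, name: str, data: Any) -> None:
        raise PermissionError(f"cannot write {name!r}: store is read-only")


class Session:
    """Reads inputs from one store and saves labels to another."""

    def __init__(self, inputs: ArrayStore, outputs: ArrayStore) -> None:
        self.inputs = inputs
        self.outputs = outputs

    def image(self) -> Any:
        return self.inputs.read("image")

    def embedding(self) -> Any:
        return self.inputs.read("embedding")

    def save_labels(self, labels: Any) -> None:
        self.outputs.write("labels", labels)
```

Passing the same store as both `inputs` and `outputs` would reproduce the current everything-in-one-zarr behaviour, while separate stores cover the remote, read-only case.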

@kephale
Collaborator

kephale commented Mar 2, 2024

Swapping out the arrays is actually a top priority now for these reasons and a few more (e.g. annotating/fitting multiple tomograms at once).

Yeah, album is useful for more than just UX evals, but the intent isn't for it to be the only distribution mechanism. There would also be pip, conda, probably a click interface, etc.
