Features for first release #26

kevinyamauchi · 2024-03-01T14:42:54Z

This issue is to discuss the minimal features needed before our first "public" release (i.e., a release we would be comfortable sharing with a broad audience).

- Ability to load custom embeddings (Allow users to import any additional voxel-wise feature map in addition to basic/TomoTwin features #3)
- Simple docs with example workflows
- Sample data plugin so people can get up and running quickly and test on a small section of an image
- Segmentation works well for both membranes and particles
- Ability to store and load trained segmentation models (store trained segmentation models #37)
- Ability to export segmentation as meshes and point annotations (star file? Support for starfiles #27)
- pip install (add github actions for test and build #13)
- Separate model and view (Seprate the model from the GUI/rendering code #44)

The broader goals and scope of CellCanvas can be discussed here: #24

kevinyamauchi · 2024-03-01T14:44:27Z

cc @kephale

kephale · 2024-03-01T14:49:59Z

It probably doesn't hurt to provide the sample data plugin as well, but I would just note that so far the strategy has been to run something like this album solution to do a fetch of the zarr, which will then be used to store the painting and prediction layers.

My impression with a sample data plugin is that it could make it easy to provide the input image and embedding, but wouldn't help with storing the painting/predictions, so this would require some changes with the approach.

One of the reasons that I like the current approach is that provenance is 100% clear. The data used for painting/prediction is contained within the same zarr. While writing this though I realize that maybe the sample data plugin would simply build a local zarr on the user's machine which could help with the provenance tracking.

kevinyamauchi · 2024-03-01T15:40:04Z

> While writing this though I realize that maybe the sample data plugin would simply build a local zarr on the user's machine which could help with the provenance tracking.

We can also directly download from zenodo using pooch. See this sample data plugin that @alisterburt and I made as an example:

https://github.com/teamtomo/cryo-et-sample-data/blob/main/src/cryo_et_sample_data/_hiv.py
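As a rough illustration of the pooch route, here is a minimal sketch of a fetch helper that a sample data plugin could call. The base URL, file name, and cache folder name are placeholders invented for this example, not real CellCanvas assets, and the hash is left as `None` only because no real file exists to hash.

```python
from pathlib import Path

# Hypothetical values for illustration only: neither the URL nor the file
# name refers to a real CellCanvas dataset.
BASE_URL = "https://example.org/cellcanvas-sample-data/"
# pooch skips hash verification when the registry value is None; a real
# registry should pin a "sha256:..." hash for every file.
REGISTRY = {"tomogram_crop.mrc": None}


def fetch_sample(file_name: str = "tomogram_crop.mrc") -> Path:
    """Download a sample file into the local cache (if needed) and return its path."""
    if file_name not in REGISTRY:
        raise KeyError(f"unknown sample file: {file_name!r}")

    # Imported lazily so this module can be imported without pooch installed.
    import pooch

    fetcher = pooch.create(
        path=pooch.os_cache("cellcanvas-sample-data"),  # per-user cache directory
        base_url=BASE_URL,
        registry=REGISTRY,
    )
    # fetch() downloads only when the file is not already cached.
    return Path(fetcher.fetch(file_name))
```

A napari sample data function could then open the returned path and hand the array back as an image layer, so nothing needs to be written next to the original data.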

> My impression with a sample data plugin is that it could make it easy to provide the input image and embedding, but wouldn't help with storing the painting/predictions, so this would require some changes with the approach.

I don't think we want to force the users into storing the labels back into the same file the image came from. I think this is especially true if we want to make working with remote data easy, as the user might not have write privileges to where the image and/or embedding are stored. I think we can add some convenience functions for saving the labels to various standard formats, but I would hope that the users can save from the labels layer via the standard napari machinery.

kevinyamauchi · 2024-03-01T15:41:20Z

> It probably doesn't hurt to provide the sample data plugin as well, but I would just note that so far the strategy has been to run something like this album solution to do a fetch of the zarr, which will then be used to store the painting and prediction layers.

I don't fully understand how album works. Can you use album to install into an existing environment? Can the user install extra tools into an environment managed by album?

kephale · 2024-03-01T15:43:27Z

Mmm I've used pooch for this stuff before, and I do agree about not forcing users to keep the outputs in the same place (in the long term). I'm not sure when I'm comfortable breaking the current assumption because it makes development much simpler; it would be really easy to lose track of which embeddings were used to support which painting/predictions. That should probably just go into metadata though....
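One way the "put it in metadata" idea could look is sketched below: a small provenance record stored alongside the labels, naming the image and embedding they were derived from. The class, field names, and URIs are all hypothetical and only illustrate the shape of such a record; nothing here reflects the current CellCanvas code.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record linking a labels array to the arrays it was derived from."""

    image_uri: str
    embedding_uri: str
    embedding_checksum: str  # fingerprint of the embedding at annotation time


def checksum(data: bytes) -> str:
    """Return a SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def to_attrs(record: Provenance) -> dict:
    """Return a JSON-serializable dict that could be stored as array attributes."""
    attrs = {"provenance": asdict(record)}
    json.dumps(attrs)  # raises TypeError early if something is not serializable
    return attrs


def matches(attrs: dict, embedding_bytes: bytes) -> bool:
    """Check whether the given embedding is the one the labels were made with."""
    return attrs["provenance"]["embedding_checksum"] == checksum(embedding_bytes)
```

With a record like this attached to the labels, the painting and predictions could live in a different file from the image and embedding without losing track of which inputs they belong to.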

kephale · 2024-03-01T15:49:41Z

> I don't fully understand how album works. Can you use album to install into an existing environment? Can the user install extra tools into an environment managed by album?

To be clear, album manages environments but it isn't an environment manager. We don't alter existing environments. If someone wants to use this album solution, then they:

1. Install album
2. Add the cellcanvas album catalog: `album add-catalog https://github.com/cellcanvas/album-catalog`
3. Install the album solution: `album install ux_evaluation_winter2024:fetch_data:0.0.4`
4. Run the album solution like any command line call: `album run ux_evaluation_winter2024:fetch_data:0.0.4 --dataset_name cellcanvas_crop_007`

This particular solution has a limited number of dataset options because it is part of the UX evaluation, but for other cases you would just point it at your zarr.

Anything you might use typer or click for, you might also use album for. album additionally handles the environment and provides the catalog, which makes it easier to share and install on other systems. If you only use album solutions, you never touch conda/mamba; you just let album do that.

kevinyamauchi · 2024-03-02T10:15:04Z

Thanks for the explanation, @kephale! I definitely need to look into album more. It seems neat. I am open to continuing to support an album installation route if it makes it easy to distribute versions of the software with data (e.g., for the UX evals).

I think it's important to also make sure cellcanvas is installable and usable via pip without installing album for the sake of interoperability (e.g., other packages or distributions want to depend on cellcanvas). In doing so, I think it's important to provide some small sample data so people can quickly test. It seems to me like a napari sample data plugin with sample data fetchable via pooch is the easiest way, but I am open to other solutions.

> I do agree about not forcing users to keep the outputs in the same place (in the long term). I'm not sure when I'm comfortable breaking the current assumption because it makes development much simpler; it would be really easy to lose track of which embeddings were used to support which painting/predictions. That should probably just go into metadata though....

I think we should consider doing so soon, or at least add a layer of abstraction for IO so that it is easy to swap out the source of the different arrays (image, embedding, labels, etc.) in the future. The longer we wait, the more IO operations will be sprinkled throughout the codebase that we have to update. Perhaps we can discuss when you're here in Basel.
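To make the abstraction idea concrete, here is a minimal sketch in which each array role (image, embedding, labels) is routed to its own source, so a read-only remote image can be paired with locally stored labels. All class and method names are hypothetical, and the in-memory source is only a stand-in for zarr-backed or remote implementations.

```python
from typing import Any, Dict, Optional, Protocol


class ArraySource(Protocol):
    """Anything that can read and write named arrays."""

    def read(self, name: str) -> Any: ...

    def write(self, name: str, data: Any) -> None: ...


class InMemorySource:
    """Stand-in source for this sketch; a real one might wrap a zarr store."""

    def __init__(self, arrays: Optional[Dict[str, Any]] = None, read_only: bool = False):
        self._arrays = dict(arrays or {})
        self._read_only = read_only

    def read(self, name: str) -> Any:
        return self._arrays[name]

    def write(self, name: str, data: Any) -> None:
        if self._read_only:
            raise PermissionError(f"source is read-only, cannot write {name!r}")
        self._arrays[name] = data


class Project:
    """Routes each array role to its own source so sources can be swapped freely."""

    def __init__(self, sources: Dict[str, ArraySource]):
        self._sources = sources

    def load(self, role: str) -> Any:
        return self._sources[role].read(role)

    def save(self, role: str, data: Any) -> None:
        self._sources[role].write(role, data)
```

The rest of the code would then only call `Project.load` and `Project.save`, so changing where the labels live means swapping one entry in the `sources` mapping rather than touching every IO call.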

kephale · 2024-03-02T15:24:56Z

Swapping out the arrays is actually a top priority now for these reasons and a few more (e.g. annotating/fitting multiple tomograms at once).

Yeah, album is useful for more than just UX evals, but the intent isn't for it to be the only distribution mechanism. There would also be pip, conda, probably a click interface, etc.

kephale added this to the Initial Release milestone Mar 1, 2024
