Example Dataset #685

srivarra · 2022-08-30T21:48:31Z

If you haven't already, please read through our contributing guidelines before opening your PR

What is the purpose of this PR?

Closes #657. Adds an example dataset available at Hugging Face.

How did you implement your changes?

Added a set of example FOVs in the dataset here.

Remaining issues

Add an option to download the example dataset in the jupyter notebook.
Adjust the notebook paths to automatically work with the default dataset.
In the future, add another version of the dataset with all intermediate data, and a small dataset of a couple of FOVs and channels for rapid testing.

srivarra · 2022-09-01T00:55:04Z

Here is the current structure after downloading the example dataset and running through Notebook 1.

data/
└── example_dataset/
    ├── image_data/input_data/
    │   ├── fov0/
    │   │   ├── CD3.tiff
    │   │   ├── CD4.tiff
    │   │   ├── ...
    │   │   └── Vim.tiff
    │   ├── ...
    │   └── fov10/
    │       ├── CD3.tiff
    │       ├── CD4.tiff
    │       ├── ...
    │       └── Vim.tiff
    ├── segmentation/
    │   ├── deepcell_input/
    │   ├── deepcell_output/
    │   ├── deepcell_visualization/
    │   └── cell_table/
    ├── pixie/
    ├── post_clustering/
    │   ├── mantis/
    │   └── masks/
    └── analysis/
        ├── spatial_enrichment/
        └── spatial_lda/
            ├── processed/
            └── visualization/

ngreenwald · 2022-09-01T15:01:51Z

Can we have all of the segmentation subfolders at the same level? deepcell_input, deepcell_output, deepcell_visualization, cell_tables. And put all of those in a folder called segmentation instead of processed? And instead of raw, call the folder image_data, so it's exactly the same as toffy, without any subfolders, just the FOV folders

ark/utils/deepcell_service_utils.py

alex-l-kong · 2022-09-03T18:31:36Z

Yes, that’s still an issue.

On Sep 3, 2022, at 10:10 AM, Noah F. Greenwald ***@***.***> wrote:

@ngreenwald commented on this pull request. In ark/utils/deepcell_service_utils.py:

        """
        float_mask = imread(BytesIO(seg_mask))
        # Reshape as ranked_mask returns a 1D numpy array, dims: n^2 x 1 -> 1 x n x n
        shape = float_mask.shape
        # Create the ranked mask
    -   ranked_mask: np.ndarray = stats.rankdata(float_mask).astype(dtype="int16").reshape(shape)
    +   ranked_mask: np.ndarray = stats.rankdata(float_mask).astype(dtype="int32").reshape(shape)

Right, I get that it removes the size 0 cells, but are there still cell IDs in the millions?

alex-l-kong · 2022-09-04T19:27:18Z

@srivarra I did some testing and read the rankdata documentation: the ranks it returns are inherently non-contiguous. Let's say you have a 1024x1024 image that's all 0 except for 1 cell. That cell is going to get assigned a rank of 1048576 (1024 ** 2).
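The scenario above can be reproduced with a minimal sketch. It assumes the cell occupies a single pixel and uses the default `method="average"`; the mask itself is hypothetical and not taken from the dataset. It also shows why the resulting label does not fit in int16.

```python
import numpy as np
from scipy import stats

# Hypothetical mask: 1024 x 1024 background (0) with a single labeled pixel.
mask = np.zeros((1024, 1024))
mask[0, 0] = 1

# Default method is "average"; the array is flattened before ranking.
ranks = stats.rankdata(mask)
top_rank = ranks.max()

print(top_rank)                              # 1048576.0, i.e. 1024 ** 2
print(top_rank > np.iinfo(np.int16).max)     # True: too large for int16
print(top_rank <= np.iinfo(np.int32).max)    # True: fits in int32
```

Casting to int32 avoids the overflow, but the label is still in the millions, which is the concern raised in the review comment.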

The main question is why _convert_deepcell_seg_masks isn't currently being tested. A test would likely have revealed this source of error beforehand.

Also wanted to re-verify that the raw segmentation output labels in #609 matched up with the previous version. If that's the case, it could also mean the inherent scipy implementation of rankdata changed (the library has gone through multiple updates since _convert_deepcell_seg_masks was implemented, including one just 9 days ago).

srivarra · 2022-09-04T21:30:47Z

@alex-l-kong
There wasn't a test function because the bytes input wasn't formatted exactly the way deepcell returns the segmentation mask. I've just figured it out, however, and will push the test once we decide which ranking method we want to use.

Here is the function _convert_deepcell_seg_masks:

def _convert_deepcell_seg_masks(seg_mask: bytes) -> np.ndarray:
    # Read the segmentation mask from the raw bytes
    float_mask = imread(BytesIO(seg_mask))

    # rankdata returns a flat 1D array, so keep the original shape to restore it later
    shape = float_mask.shape

    # Rank every pixel value, then cast to int32 and reshape back to the image shape
    ranked_mask_repr: np.ndarray = stats.rankdata(float_mask, method="average")
    ranked_mask: np.ndarray = ranked_mask_repr.astype(dtype="int32").reshape(shape)

    return ranked_mask

Consider the rudimentary test function below:

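import tempfile

import numpy as np
import tifffile


def test_convert_deepcell_seg_masks():
    with tempfile.TemporaryDirectory() as temp_dir:
        # 10 x 10 mask of zeros with three labeled pixels, two of them tied
        test_mask = np.zeros((10, 10))
        test_mask[0, 0] = 1
        test_mask[0, 1] = 2
        test_mask[0, 2] = 2
        tifffile.imwrite(f"{temp_dir}/test_mask.tiff", data=test_mask)

        # Read the file back as raw bytes and pass them to the conversion function
        with open(f"{temp_dir}/test_mask.tiff", "rb") as test_mask_bytes:
            print(_convert_deepcell_seg_masks(test_mask_bytes.read()))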

We can adjust the method parameter of stats.rankdata. The test function starts from a matrix of zeros, $\mathbf{A} = \mathbf{0}_{10 \times 10}$, and then sets $\mathbf{A}_{0,0} = 1$, $\mathbf{A}_{0,1} = 2$, and $\mathbf{A}_{0,2} = 2$.

For method = "average" we get the following matrix:

[[98 99 99 49 49 49 49 49 49 49]
 [49 49 49 49 49 49 49 49 49 49]
 [49 49 49 49 49 49 49 49 49 49]
 [49 49 49 49 49 49 49 49 49 49]
 [49 49 49 49 49 49 49 49 49 49]
 [49 49 49 49 49 49 49 49 49 49]
 [49 49 49 49 49 49 49 49 49 49]
 [49 49 49 49 49 49 49 49 49 49]
 [49 49 49 49 49 49 49 49 49 49]
 [49 49 49 49 49 49 49 49 49 49]]

For method = "min" we get the following matrix:

[[98 99 99  1  1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]
 [ 1  1  1  1  1  1  1  1  1  1]]

For method = "dense" we get the following matrix:

[[2 3 3 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]]

Dense looks like what we want. It handles ties the same way as min, but it guarantees contiguous integer ranks: the next highest element is assigned the rank immediately after the one given to the tied elements.
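The three matrices above can be summarized by their distinct rank values. The following is a minimal sketch that skips the TIFF round trip and ranks the array directly; `rank_mask` is a hypothetical helper introduced only for this comparison, not a function in ark.

```python
import numpy as np
from scipy import stats

# Same toy mask as in the test function: zeros except for three pixels.
mask = np.zeros((10, 10))
mask[0, 0] = 1
mask[0, 1] = 2
mask[0, 2] = 2


def rank_mask(mask: np.ndarray, method: str) -> np.ndarray:
    """Hypothetical helper: rank all pixels, cast to int32, restore the shape."""
    ranks = stats.rankdata(mask, method=method)
    return ranks.astype("int32").reshape(mask.shape)


for method in ("average", "min", "dense"):
    # Distinct labels produced by each tie-breaking method
    print(method, np.unique(rank_mask(mask, method)))
# average [49 98 99]
# min [ 1 98 99]
# dense [1 2 3]
```

Only "dense" yields labels without gaps. Note that with "average" the two tied pixels have rank 99.5, which the int32 cast truncates to 99.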

If we change the test data to be a random matrix of integers like below:

        ...
        # Initialize a new generator - set seed for reproducibility
        rng = np.random.default_rng(12345)

        test_mask = rng.integers(low=0, high=1000, size=(10, 10))
        ...

Then for method = "dense" we get the output below:

[[68 20 77 29 16 78 60 65 93 37]
 [81 33 51 53 18 14 21 64 55 91]
 [70 22 87 92 73 63 12  8 26 38]
 [ 5 84 43 67 17 30  9 74 76 19]
 [71  6 37 13 75 35 42 41 44 26]
 [50 80 46 15  2 11  6  7 10 53]
 [79 82 61 54 32 89 59 72 74 83]
 [69 88 47 48 23 90 49 45 29 27]
 [58 39 52 62 85 31 66 86 40 24]
 [28 34 57 25 81 36  4  1  3 56]]

Tied input values receive the same rank: for example, the rank $6$ appears twice in the output above.
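One way to turn this observation into an assertion, rather than inspecting printed output, is sketched below. It assumes the same seed and ranks the array directly instead of going through the bytes conversion; the two checks are illustrative and not taken from the PR's test suite.

```python
import numpy as np
from scipy import stats

# Same random mask as above, with a fixed seed for reproducibility.
rng = np.random.default_rng(12345)
test_mask = rng.integers(low=0, high=1000, size=(10, 10))

dense = stats.rankdata(test_mask, method="dense").astype("int32").reshape(test_mask.shape)

# Sorted distinct input values
unique_values = np.unique(test_mask)

# Dense ranks run from 1 to the number of distinct values, with no gaps.
assert np.array_equal(np.unique(dense), np.arange(1, unique_values.size + 1))

# Each rank is the 1-based position of its value among the sorted distinct values,
# so equal inputs always share a rank.
assert np.array_equal(unique_values[dense - 1], test_mask)
```

If the mask contains ties, the number of distinct ranks is smaller than the number of pixels, which is why the largest rank in the output above is below 100.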

alex-l-kong · 2022-09-06T21:53:02Z

@srivarra it looks like seaborn has released a new version that is causing the testing errors; relplot appears to have changed in this release. Probably worth taking some time to investigate.

ngreenwald

Looks good, just some minor suggestions. Also looks like some extraneous files got added in spatial_enrichment_input_data

.gitignore

ark/utils/data_utils.py

templates_ark/1_Segment_Image_Data.ipynb

templates_ark/1_Segment_Image_Data.ipynb

ngreenwald

Looks good, just some path changes

ngreenwald

Great, looks good

added datasets as requirement

56587d0

srivarra added the enhancement label Aug 30, 2022

srivarra self-assigned this Aug 30, 2022

Merge branch 'main' into example_dataset

78013de

alex-l-kong marked this pull request as ready for review August 31, 2022 16:33

alex-l-kong marked this pull request as draft August 31, 2022 16:34

srivarra added 3 commits August 31, 2022 15:21

added dataset download function

96ef87e

doc test fix

f9f24a1

Notebook 1 updated for example dataset

b9b9aa8

alex-l-kong and others added 7 commits August 31, 2022 18:21

Remove call to is_mibitiff cell in testbook script

68cc60b

removed data/* and added data/ to the .gitignore

e941055

Remove .DS_Store, change input_dir variable back to tiff_dir (more intuitive), and update notebook tests

010338b

Remove MIBItiff test

501dca0

Change input_dir to tiff_dir

262a368

Nuke another instance of input_dir

59ddca1

added original dataset back in so tests can pass

1197dc5

srivarra and others added 10 commits September 2, 2022 11:01

updated nb1 and nb2 to use segmentation/

954d16f

nb3 paths fixed

814d7ba

reverted pixie subdir for nb2,3

16c89c6

Change testbook path to reflect new directory structure

00bf887

Remove unneeded cell outputs

74c1e26

LDA path updates

bf22aec

remove test import

f26f6b5

Prevent division by NA if a column is all 0

1016205

Merge remote-tracking branch 'origin/example_dataset' into example_dataset

0f0fee1

cell_size of 0 fixed, deepcell_service_utils::_convert_deepcell_seg_masks, change dtype from int16 to int32

6a21e67

srivarra commented Sep 3, 2022

ark/utils/deepcell_service_utils.py

deepcell bytes conversion fix + test

5c906bd

srivarra added 8 commits September 7, 2022 11:58

seaborn fix

434b906

removed changes for nb2, nb3, nb4

18a8277

removed changes for LDA, example neighborhood analysis, visualization notebooks

a9ca023

revert pixel clustering test

77653c1

testbook fix

b0899bf

testbook fix

fceccc2

dataset download test

daee9d6

reverted pytest-randomly

3aee205

srivarra marked this pull request as ready for review September 7, 2022 23:31

srivarra requested a review from ngreenwald September 7, 2022 23:31

ngreenwald requested changes Sep 7, 2022

.gitignore

ark/utils/data_utils.py

ngreenwald reviewed Sep 7, 2022

templates_ark/1_Segment_Image_Data.ipynb

srivarra added 3 commits September 7, 2022 17:31

made requested changes

7a0e4ab

download dataset function, will save to default cache location, and copy data over to ark-analysis/data

951ab25

test fix

90ab753

srivarra requested a review from ngreenwald September 8, 2022 19:17

ngreenwald reviewed Sep 8, 2022

templates_ark/1_Segment_Image_Data.ipynb

ngreenwald requested changes Sep 8, 2022

srivarra added 2 commits September 8, 2022 14:16

fixed paths

7a1d501

nb1 fix

c6055bd

srivarra requested a review from ngreenwald September 8, 2022 22:08

ngreenwald approved these changes Sep 8, 2022

ngreenwald merged commit da41621 into main Sep 8, 2022

ngreenwald deleted the example_dataset branch September 8, 2022 22:27
