MeasureObjectintensity: IndexError if label missing in a segmentation mask #3582

Open
votti opened this issue Jun 6, 2018 · 17 comments

Comments

Contributor

@votti votti commented Jun 6, 2018

The MeasureObjectIntensity module raises an IndexError if a label is missing in the segmentation mask, i.e. if not all labels 1:N are present.

For me the error occurred in a pipeline where I identify primary objects, filter them using FilterObjects and then use IdentifySecondaryObjects with the filtered objects as input objects.
I also encountered it after using ResizeObjects - another module that can produce such segmentation masks with missing labels.
Generally I think it would be important that missing labels are supported.

Error message:

> integrated_intensity[lindexes - 1] = centrosome.cpmorphology.fixup_scipy_ndimage_result(scipy.ndimage.sum(limg, llabels, lindexes))
E                       IndexError: index 1 is out of bounds for axis 0 with size 1

../../cellprofiler/modules/measureobjectintensity.py:406: IndexError

I also wrote a test to cover that case:
https://github.com/BodenmillerGroup/CellProfiler/blob/831f6affe1dcebbc8757bea84c48a10c4035060a/tests/modules/test_measureobjectintensity.py#L994
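
A minimal sketch of how an error of this kind can arise, using only NumPy. It assumes, as the traceback suggests, that the result array is sized by the number of distinct labels while being indexed by label value minus one. The toy arrays and variable names are illustrative and not taken from the CellProfiler source.

```python
import numpy as np

# Toy mask in which label 1 is missing: only label 2 is present.
labels = np.array([[0, 2, 2],
                   [0, 2, 0]])
image = np.ones(labels.shape)

present = np.unique(labels[labels > 0])   # labels actually in the mask
n_objects = len(present)                  # number of distinct labels

# Result array sized by the number of objects, but indexed by (label - 1).
integrated_intensity = np.zeros(n_objects)

# Per-label intensity sums (index 0 is the background).
sums = np.bincount(labels.ravel(), weights=image.ravel())[present]

try:
    integrated_intensity[present - 1] = sums
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

With a mask that contains labels 1 and 2, the same assignment succeeds, which is why the problem only shows up once a label is missing.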

@bgriffen bgriffen commented Jun 12, 2018

I am also observing this behavior. I identify Primary Objects as "nuclei" then create Secondary Objects as "cells" using propagation from nuclei objects (no filtering performed like @votti). I originally thought it was an issue due to the filling in holes of identified (secondary) objects, but even if I don't fill in holes, I still receive this error. I am on 3.1.3 on Ubuntu.

Worker 0:     integrated_intensity[lindexes - 1] = centrosome.cpmorphology.fixup_scipy_ndimage_result(scipy.ndimage.sum(limg, llabels, lindexes))
Worker 0: IndexError: index 1096 is out of bounds for axis 0 with size 1095

@apahl apahl commented Jun 19, 2018

The error also occurs for me with the same use case as @bgriffen (identify Primary Objects as "nuclei" then create Secondary Objects as "cells" using propagation from nuclei objects (no filtering)). This is for the latest version of CellProfiler 3.1.3 (commit cfac37b0e9ac from Jun 7th), on a Linux cluster.
I encountered this error after upgrading CellProfiler and my cppipe pipelines to 3.1.3.

Here is the traceback:

Error detected during run of module MeasureColocalization
Traceback (most recent call last):
  File "[...]/.conda/envs/cellprof/lib/python2.7/site-packages/CellProfiler-3.1.3-py2.7.egg/cellprofiler/pipeline.py", line 1779, in run_with_yield
    self.run_module(module, workspace)
  File "[...]/.conda/envs/cellprof/lib/python2.7/site-packages/CellProfiler-3.1.3-py2.7.egg/cellprofiler/pipeline.py", line 2031, in run_module
    module.run(workspace)
  File "[...]/.conda/envs/cellprof/lib/python2.7/site-packages/CellProfiler-3.1.3-py2.7.egg/cellprofiler/modules/measurecolocalization.py", line 284, in run
    object_name)
  File "[...]/.conda/envs/cellprof/lib/python2.7/site-packages/CellProfiler-3.1.3-py2.7.egg/cellprofiler/modules/measurecolocalization.py", line 571, in run_image_pair_objects
    std1 = np.sqrt(fix(scind.sum((first_pixels - mean1[labels - 1]) ** 2,
IndexError: index 161 is out of bounds for axis 0 with size 161
Tue Jun 19 09:42:13 2018: Image # 721, module MeasureColocalization # 10: CPU_time = 4.43 secs, Wall_time = 4.50 secs

Many thanks for your help!

Kind regards,
Axel

Member

@bethac07 bethac07 commented Jun 19, 2018

Can one (or more) of you upload images plus a pipeline to help us replicate the issue? Thanks!

@apahl apahl commented Jun 20, 2018

Hi Beth,

thanks for coming back to this.

I have checked out the latest commit (c11b509c4572f) from yesterday, but the problem persists.

Please find enclosed a single-image example which reproduces the error.
cp_error.zip

The pipeline was started with this command:

cellprofiler -c -p /home/pahl/cp_error/180611_gwdg.cppipe -r \
             --file-list=/home/pahl/cp_error/images.txt \
             -o /home/pahl/cp_error/output -L 10 \
             -t /home/pahl/cp_error/tmp

Many thanks.

Kind regards,
Axel

Contributor Author

@votti votti commented Aug 6, 2018

Hi there,
I just ran into this issue again in CP 3.1.5 (both on Ubuntu, installed from source, and on Windows, installed from the website).

Attached a small example of a mask that generates problems.

The pipeline can be started with:
cellprofiler -c -p ./measure_missing_obj.cppipe -r --file-list=./testfiles.txt -o out

My current workaround is to load the mask not as an Object in NamesAndTypes but as a grayscale image, and then use ConvertImageToObjects with Preserve original labels set to False.
issue_3583.zip

The error is still the same as with the test I previously wrote in the first post.

I will also look into generating an example pipeline that generates such offending masks.
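
The effect of the relabeling workaround can be sketched in plain NumPy as follows. This is only an illustration of consecutive relabeling, not the code ConvertImageToObjects runs; the function name `relabel_consecutive` is hypothetical, and the mask is assumed to be a non-empty integer array with 0 as background.

```python
import numpy as np

def relabel_consecutive(mask):
    """Relabel a mask to 1..n and return (new_mask, original_labels).

    original_labels[k] is the label that became k + 1, so the mapping
    back to the original mask is not lost.
    """
    original_labels = np.unique(mask[mask > 0])
    # Lookup table from old label value to new consecutive label.
    lookup = np.zeros(int(mask.max()) + 1, dtype=np.int64)
    lookup[original_labels] = np.arange(1, len(original_labels) + 1)
    return lookup[mask], original_labels

mask = np.array([[0, 2],
                 [5, 5]])
new_mask, original_labels = relabel_consecutive(mask)
print(new_mask)
print(original_labels)
```

Keeping `original_labels` around is the important part: after relabeling, it is the only record of which new label corresponds to which label in the original mask. scikit-image also ships `skimage.segmentation.relabel_sequential` for the same purpose.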

Contributor Author

@votti votti commented Aug 7, 2018

Hi there,
I have now tested this systematically and @apahl is correct: the IdentifySecondary module is indeed the culprit generating the offending masks, and it was never an issue with ResizeObjects or FilterObjects.

Attached a small example of a pipeline that generates problems.

The pipeline can be started with:
cellprofiler -c -p ./measure_error_secondary.cppipe -r --file-list=./testfiles.txt -o ./out

issue_3583_identifysecondary.zip

I also made the pipeline save the masks and added a script test_outmasks.py to check if all ObjectIDs were present in the primary nuc and expanded cell mask.
This identified that indeed IdentifySecondary is generating masks with missing labels.
Specifically, the number of unique labels decreases from 5308 in the nuclear mask to 5301 in the cell mask. This is curious to me, as I would not expect IdentifySecondary with Propagation and without Discard secondary objects touching the border of the image to lose any of my primary objects! To me this sounds like a potential issue in IdentifySecondary and would almost warrant a separate issue.

Independently of this I think CP should handle missing labels in masks for measurement.
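
A check of this kind could look roughly like the sketch below. It is a hypothetical stand-in, not the attached test_outmasks.py; the function names are made up for illustration, and masks are assumed to be integer arrays with 0 as background.

```python
import numpy as np

def find_missing_labels(mask):
    """Return the labels in 1..max(mask) that do not occur in the mask."""
    present = np.unique(mask[mask > 0])
    if present.size == 0:
        return np.array([], dtype=int)
    expected = np.arange(1, present.max() + 1)
    return np.setdiff1d(expected, present)

def labels_lost(primary_mask, secondary_mask):
    """Return primary labels that have no counterpart in the secondary mask."""
    primary = np.unique(primary_mask[primary_mask > 0])
    secondary = np.unique(secondary_mask[secondary_mask > 0])
    return np.setdiff1d(primary, secondary)

# Toy example: nucleus 2 has no corresponding cell.
nuclei = np.array([[1, 2],
                   [3, 0]])
cells = np.array([[1, 1],
                  [3, 3]])
print(find_missing_labels(cells))
print(labels_lost(nuclei, cells))
```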

@jwindhager jwindhager commented Aug 9, 2018

Experiencing this as well. Any updates on this issue?

Contributor Author

@votti votti commented Aug 13, 2018

Hi team,

I was looking a bit into potential solutions for this problem, as it is really affecting us repeatedly.

I noticed that internally, the measurement module seems to assume that ObjectNumbers are continuously numbered and all present.
For example:

np.arange(1, len(d) + 1)

The above code from measurements.add_measurements will just renumber the ObjectNumbers of saved measurements to 1:n, irrespective of what the labels actually are!

This is also seen in other places, such as:

numpy.arange(1, objects.count + 1)

Where the M_NUMBER_OBJECT_NUMBER is just set to numpy.arange(1, objects.count + 1) with
objects.count being equal to the """The number of objects labeled""" according to the implementation:

def count(self):

This implementation strikes me as extremely dangerous, as it breaks the 1:1 correspondence between labels on a segmentation mask and ObjectNumber, which e.g. is assumed in MeasureObjectIntensity (the root cause of this issue) and is crucial for mapping the measurements back to the object masks.

This correspondence is also assumed internally by CellProfiler in other modules, such as the displaydataonimage module:

Also, I am pretty sure I have encountered masks with missing ObjectNumbers in CP2 and they were handled correctly (meaning the label:ObjectNumber relationship was maintained)! I have even specifically implemented the case of missing ObjectNumbers in my custom visualization scripts for CellProfiler 2 output.

My proposed short-term solution would be to change the Objects.count property to report not the number of unique labels but the maximum label number.

def count(self):

This could however break some existing code relying on Objects.count being the number of unique labels.
I explored this option in this branch: https://github.com/BodenmillerGroup/CellProfiler/tree/issue/3582objcount

Another option would be to add a maxlabel property to objects and change all functions that wrongly use Objects.count to use this new property.
I explored this option in this branch: https://github.com/BodenmillerGroup/CellProfiler/tree/issue/3582maxlabel

Both these options would require that the measurements be initialized with np.nan and not with zeros, to highlight which objects were actually not present.

Together this should mostly prevent this issue and not change anything for the cases where all ObjectNumbers are present as labels (the only supported case currently). However, it would be somewhat memory inefficient in cases where many labels are missing, as space would also be allocated for non-existing labels, e.g. in the measurement tables.

A long-term solution would be to change the way ObjectNumbers are handled internally, so that it is guaranteed that the label:ObjectNumber relationship can always be established.

Finally, there would be the option to simply not support masks with missing objects, which would be inconvenient and would also warrant more assert statements to check that this is actually the case.

Please let me know if I missed something fundamental!

Corrections:
After reading through it after a good night's sleep, I noticed two things in my post:

  • The code line quoted for displaydataonimage is wrong. displaydataonimage does assume no missing labels, but at a different place:

    img = colors[labels, :3] * pixel_data[:, :, np.newaxis]

    as colors has the length of values, which is only checked against object.count at line:
    elif len(values) > objects.count:

    As object.count is the number of unique labels and not the maximum label, if there are missing labels in the mask, there will also be an 'IndexError'.

  • Question: Or is Number_Object_Number actually corresponding to the object index by design and is it allowed to be different to the ObjectNumber?
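
The difference between the two quantities, and the NaN initialization proposed above, can be sketched as follows. This is an illustration under assumptions, not the code from either branch: `count` and `max_label` are local variables standing in for Objects.count and the proposed maxlabel property, and the toy arrays are made up.

```python
import numpy as np

# Toy mask in which label 2 is missing.
labels = np.array([[1, 1, 0],
                   [0, 3, 3]])
image = np.array([[1.0, 2.0, 0.0],
                  [0.0, 4.0, 5.0]])

present = np.unique(labels[labels > 0])
count = len(present)             # number of unique labels
max_label = int(labels.max())    # highest label value

# Allocate one slot per possible label and mark absent objects with NaN.
measurement = np.full(max_label, np.nan)

# Per-label intensity sums (index 0 is the background).
sums = np.bincount(labels.ravel(), weights=image.ravel(),
                   minlength=max_label + 1)
measurement[present - 1] = sums[present]

print(count, max_label)
print(measurement)
```

Sizing by `max_label` keeps position `label - 1` valid for every label in the mask, at the cost of one unused slot per missing label.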

@bgriffen bgriffen commented Aug 13, 2018

This does indeed seem very critical, as those who simply merge, say, the Nuclei and Cells tables row by row will not be matching the right objects. Correct me if I'm wrong, but could you, as a temporary fix, merge the two sets by image number and link the parent object number in Cells to the object number in the Nuclei dataframe? One quick (hacky) way to get a correctly linked master dataframe might be:

import pandas as pd

# path_cp, nuclei_fname and cells_fname point at the CellProfiler CSV output
nuclei_df = pd.read_csv(path_cp + nuclei_fname)
cells_df = pd.read_csv(path_cp + cells_fname)
# Prefix all columns so the two tables do not collide after merging
cells_df.columns = ["cells_" + str(col) for col in cells_df.columns]
nuclei_df.columns = ["nuclei_" + str(col) for col in nuclei_df.columns]
# Rename the cell keys so that they line up with the nuclei keys
cells_df.rename(columns={"cells_ImageNumber": "nuclei_ImageNumber", "cells_Parent_Nuclei": "nuclei_ObjectNumber"}, inplace=True)
df = pd.merge(cells_df, nuclei_df, on=["nuclei_ImageNumber", "nuclei_ObjectNumber"])

Here I am assuming Nuclei are your primary objects and Cells are your secondary objects.

Yes, the user will lose the objects that fail, but the mapping is preserved so long as you don't just do a row-wise merge. Or perhaps I misunderstand, and for a given image number the parent object number in the secondary dataframe does not map to any index in the primary object dataframe. I did a quick (crude) check of the X-X and Y-Y positions of matched objects and the relationship is perfectly linear. I guess I'm trying to find the relevant index in the sheets that allows me to reconstruct the links, rather than the row index, which is clearly broken. Either way, this does seem quite urgent.

Contributor Author

@votti votti commented Aug 14, 2018

I also made a reproducible example showing that displaydataonimage doesn't deal with missing labels.
The pipeline can be started with:
cellprofiler -c -p ./missing_display.cppipe -r --file-list=./testfiles.txt -o ./out

It first displays the x position on an image where the mask has been relabeled (= guaranteed continuous labels), which works. It then does the same with the original mask with missing labels, which fails with an indexing error.


Index Error: index 3707 is out of bounds for axis 9 with size 3707

issue_3583_missingdisplay.zip

Edit: This also reproduces if the cell mask is loaded as an Object in NamesAndTypes directly. I only use ConvertImageToObjects in the pipeline to demonstrate that it works if the relabeling option is used.
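
The indexing failure itself can be reproduced in isolation. The sketch below assumes a color table with one row per object plus one for the background, which is consistent with the error message above but is not taken from the displaydataonimage source. All names are illustrative.

```python
import numpy as np

# Toy mask in which label 2 is missing.
labels = np.array([[0, 1, 1],
                   [3, 3, 0]])
count = len(np.unique(labels[labels > 0]))   # number of unique labels
max_label = int(labels.max())                # highest label value

# Color table sized by the object count: too short for label 3.
colors_by_count = np.ones((count + 1, 4))
try:
    colors_by_count[labels, :3]
    failed = False
except IndexError:
    failed = True

# Color table sized by the maximum label: every label has a row.
colors_by_max = np.ones((max_label + 1, 4))
img = colors_by_max[labels, :3]

print(failed, img.shape)
```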

Contributor Author

@votti votti commented Aug 14, 2018

@bgriffen I think this issue should really not be about how IdentifySecondaryObjects can generate primary and secondary objects with differing numbers. As it stands, CellProfiler does not guarantee matching numbers and therefore already explicitly returns the primary_Parent_secondary relationship.

The problem is rather that the relationship between the labels in segmentation masks and the CellProfiler measurements can be lost in the current implementation if labels are missing. I think this is extremely dangerous, as a lot of visualizations etc. rely on ObjectNumbers matching the labels in segmentation masks, even the CellProfiler-internal ones.
I am fairly sure that CP2 handled these cases correctly.
But if this is not guaranteed, the only way to match objects with a segmentation mask is e.g. by matching positions, which is really not as trivial as it should be.

I crafted an example pipeline with a Jupyter analysis notebook demonstrating the problem:
issue_3583_label_missmatch.zip

Run as:
cellprofiler -c -p ./label_missmatch.cppipe -r --file-list=./testfiles.txt -o ./out

In the CP pipeline I process an object mask once with and once without missing objects (called 'Original' (= with missing) and 'Relabeled' (= without missing)). Then I save the results. In the Jupyter notebook I then compare the cell data and the mask labels. I find that CellProfiler indeed seems to relabel ObjectNumbers internally without any warning, which makes it no longer possible to directly map CP measurements onto the corresponding segmentation mask.
I find this a real problem!

"Luckily" most 'measurement' modules currently just throw an indexing error if they encounter missing labels, so the effect of this problem is likely not so widespread (but super annoying). Still, it needs to be fixed!

The Jupyter Notebook of the analysis can be found here: https://gist.github.com/votti/08870efac40c677abb2dc428d58e35d3

Edit: due to time constraints I threw the notebook together fairly quickly, so please let me know if you find any issues.
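
If ObjectNumbers have been renumbered to 1:n, mapping measurements back onto the mask requires knowing how the renumbering was done. The sketch below assumes that ObjectNumber k corresponds to the k-th smallest label present in the mask. That ordering is an assumption made here for illustration, not something CellProfiler is confirmed to guarantee, so it should be verified (for example against object positions) before relying on it. The function name `paint_measurements` is hypothetical.

```python
import numpy as np

def paint_measurements(mask, values):
    """Paint per-object values onto a label mask.

    values[k] belongs to ObjectNumber k + 1, which is ASSUMED to be the
    (k + 1)-th smallest label present in the mask.
    """
    present = np.unique(mask[mask > 0])
    if len(values) != len(present):
        raise ValueError("number of values does not match number of labels")
    # Lookup table from label value to measurement; background stays NaN.
    lookup = np.full(int(mask.max()) + 1, np.nan)
    lookup[present] = values
    return lookup[mask]

mask = np.array([[0, 2],
                 [5, 5]])
painted = paint_measurements(mask, np.array([10.0, 20.0]))
print(painted)
```

The length check matters: if the measurement table and the mask disagree on the number of objects, any positional mapping between them is unreliable.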

Contributor Author

@votti votti commented Sep 12, 2018

Hi there,

In the meantime this issue is also being discussed on the image.sc forum: https://forum.image.sc/t/index-is-out-of-bounds-error/19078/5

Thus I was wondering what the 'officially' proposed workaround for this issue is for now, and whether a fix is being worked on.

Thanks a lot for your efforts!

Collaborator

@AetherUnbound AetherUnbound commented Sep 13, 2018

@votti thanks for pinging on this. This has come up in a number of different modules, namely RelateObjects as well. I think there's a common root cause to all of these, and I think I should have some time to look into this in the near future :) I'm sorry that I don't have more time to address this, but THANK YOU for the super detailed thread; it will be incredibly helpful when I do have a spare moment to look into this!

Contributor

@Swarchal Swarchal commented Sep 27, 2018

As a really simple bodge I've tried reverting this commit, and while now previously broken pipelines are running again I'm worried I've broken some parent -> child object_number mapping.

I had a very quick check of object x,y co-ordinates and object numbers here, but if like us you're just using image averages it might suffice.

@santoshhariharan santoshhariharan commented Nov 16, 2018

Hello,

I am new to CellProfiler and trying to use the program in headless mode on a cluster. I am also getting a similar error (please see below) in the MeasureColocalization module. I tried the same pipeline with the older version (2.1.2). Currently we have 3.1.5. Any advice is appreciated. Here is the trace:

Thu Nov 15 16:44:25 2018: Image # 2, module Images # 1: CPU_time = 0.00 secs, Wall_time = 0.00 secs
Thu Nov 15 16:44:25 2018: Image # 2, module Metadata # 2: CPU_time = 0.00 secs, Wall_time = 0.00 secs
Thu Nov 15 16:44:26 2018: Image # 2, module NamesAndTypes # 3: CPU_time = 2.45 secs, Wall_time = 2.84 secs
Thu Nov 15 16:44:28 2018: Image # 2, module Groups # 4: CPU_time = 0.00 secs, Wall_time = 0.00 secs
Thu Nov 15 16:44:28 2018: Image # 2, module CorrectIlluminationApply # 5: CPU_time = 0.22 secs, Wall_time = 0.23 secs
Thu Nov 15 16:44:29 2018: Image # 2, module MeasureImageQuality # 6: CPU_time = 8.77 secs, Wall_time = 8.87 secs
Thu Nov 15 16:44:37 2018: Image # 2, module MeasureImageQuality # 7: CPU_time = 0.27 secs, Wall_time = 0.26 secs
Thu Nov 15 16:44:38 2018: Image # 2, module FlagImage # 8: CPU_time = 0.00 secs, Wall_time = 0.00 secs
Thu Nov 15 16:44:38 2018: Image # 2, module IdentifyPrimaryObjects # 9: CPU_time = 0.71 secs, Wall_time = 0.71 secs
Thu Nov 15 16:44:39 2018: Image # 2, module IdentifySecondaryObjects # 10: CPU_time = 0.61 secs, Wall_time = 0.66 secs
Thu Nov 15 16:44:39 2018: Image # 2, module IdentifyTertiaryObjects # 11: CPU_time = 0.10 secs, Wall_time = 0.10 secs
Error detected during run of module MeasureColocalization
Traceback (most recent call last):
  File "/home/harihs06/CellProfiler/cellprofiler/pipeline.py", line 1782, in run_with_yield
    self.run_module(module, workspace)
  File "/home/harihs06/CellProfiler/cellprofiler/pipeline.py", line 2034, in run_module
    module.run(workspace)
  File "/home/harihs06/CellProfiler/cellprofiler/modules/measurecolocalization.py", line 288, in run
    object_name)
  File "/home/harihs06/CellProfiler/cellprofiler/modules/measurecolocalization.py", line 575, in run_image_pair_objects
    std1 = np.sqrt(fix(scind.sum((first_pixels - mean1[labels - 1]) ** 2,
IndexError: index 19 is out of bounds for axis 0 with size 19

@ranka47 ranka47 commented May 30, 2019

I was facing the same error in CP 3.1.5
After identifying the primary objects (nucleus) and the secondary objects (cytoplasm), the pipeline was supposed to measure the intensity. However, it raised the IndexError.

However, the pipeline is working fine for CP 3.1.8. I guess it has been fixed in the latest version.

@apahl apahl commented Jun 3, 2019

@ranka47 thanks for letting us know. Because of this error, I have started our project with CP 3.0.0 and we are therefore currently stuck with this version.

Kind regards,
Axel
