Proposal: Defining and Prototyping "Labelmap" Segmentations in DICOM Format #643

Closed
CPBridge opened this issue May 25, 2023 · 56 comments

Comments

@CPBridge
Contributor

CPBridge commented May 25, 2023

Project Description

The DICOM Segmentation format is used to store image segmentations within the DICOM standard. Using DICOM Segmentations, which follow the DICOM information model and can be communicated over DICOM interfaces, has many advantages when it comes to deploying automated segmentation algorithms in practice. However, DICOM Segmentations are criticized for being inefficient, both in terms of their storage utilization and in terms of the speed at which they can be read and written, in comparison to other widely used segmentation formats within the medical imaging community such as NIfTI and NRRD.

While improvements in tooling may alleviate this to some extent, there appears to be an emerging consensus that changes to the standard are also necessary to allow DICOM Segmentations to compete with other formats. One of the major reasons for poor performance is that in segmentation images containing multiple segments (sometimes referred to as "classes"), each segment must be stored as an independent set of binary frames. This is in contrast to formats like NIfTI and NRRD that store "labelmap" style arrays, in which a pixel's value represents its segment membership and thus many (non-overlapping) segments can be stored in the same array. While the DICOM Segmentation format has the advantage that it allows for overlapping segments, in my experience the overwhelming majority of segmentations consist of non-overlapping segments, and thus this representation is very inefficient when there are a large number of segments.
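
To make the storage difference concrete, here is a minimal numpy sketch (array sizes and segment count are made up purely for illustration, and not tied to any particular implementation) contrasting the current one-set-of-binary-frames-per-segment layout with a single labelmap array for non-overlapping segments:

import numpy as np

# Hypothetical example: 100 non-overlapping segments over a 512 x 512 x 200 volume.
rows, cols, slices, n_segments = 512, 512, 200, 100

# Current DICOM SEG (BINARY): one stack of binary frames per segment, so the
# frame count scales with the number of segments even at 1 bit per pixel.
binary_frames = slices * n_segments                # 20,000 frames
binary_bits = rows * cols * binary_frames          # ~5.2e9 bits uncompressed

# Labelmap encoding: one frame per slice, pixel value encodes segment membership.
labelmap = np.zeros((slices, rows, cols), dtype=np.uint8)  # 8 bits/pixel, up to 255 segments
labelmap_bits = labelmap.size * 8                  # ~4.2e8 bits uncompressed

print(f"frames: {binary_frames} (binary) vs {slices} (labelmap)")
print(f"uncompressed size ratio: {binary_bits / labelmap_bits:.1f}x")  # 12.5x here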

The goal of this project is to gather a team of relevant experts to formulate changes to the standard to address some issues with DICOM Segmentation. I propose to focus primarily on "labelmap" style segmentations, but I am open to other suggestions for focus.

The specific goals would be to complete or make significant progress on the following:

  • Formulate changes to the standard to allow for labelmap segmentations (@dclunie)
  • Complete prototype implementations within the highdicom library (of which I am a maintainer), dcmjs (@pieper) and possibly dcmqi (@fedorov )
  • Create example datasets for dissemination to others wishing to implement the changes
  • Begin the process of reaching out to others in the open source community to accelerate other implementations, particularly viewers such as slicer (@pieper ) and OHIF

Open questions:

  • Should we implement a new IOD or a new SegmentationType within the existing Segmentation IOD?
  • Should we implement "instance" segmentations, in which each segment is assumed to be a different instance of the same type and thus need not be described separately, in addition to labelmap-style semantic segmentations?
  • Should we also allow 16 bit pixels to allow for more segments? How does this interact with the choice of new IOD vs new SegmentationType?

Other possible (alternative) topics:

  • Single bit compression to allow for more space-efficient storage
  • Omitting the per-frame functional groups (as is done with TILED_FULL) for other types of segmentation image.
  • The inefficiency of pydicom in parsing long sequences, such as the per-frame functional groups sequence in segmentations, is a key bottleneck in Python. We could think through how to overcome this.

Relevant team members: @fedorov @dclunie @pieper (@hackermd ) please give your feedback to help shape this project!

@wayfarer3130

It would be interesting to see this displayed in a viewer such as OHIF - the loading on this shouldn't be too different from the existing SEG loader, and would give a useful comparison for performance purposes.

@CPBridge
Contributor Author

It would be interesting to see this displayed in a viewer such as OHIF - the loading on this shouldn't be too different from the existing SEG loader, and would give a useful comparison for performance purposes.

It would be fantastic to have someone from OHIF involved!

@wayfarer3130

Having a transparent conversion between old and new SEG objects would be really nice, but some of the interesting aspects might require a custom IOD to allow for more tag values. Still, there is precedent for how to implement such a conversion in the legacy converted enhanced multiframe objects - although I don't know of anyone who uses those.

If the representation were a multiframe object, with one or more color LUTs, and also a mapping from pixel value to a set of labels, then it becomes possible to define an overlapping segmentation - the algorithm being to just assign the next instance number whenever a new combination of labels is assigned to a pixel. That algorithm also allows defining two labels for a given region - the "edge" label and the center label - which can be used to nicely show the outline.
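
If I am following correctly, a rough Python/numpy sketch of that indexing scheme (names and shapes are mine and purely illustrative) would allocate a new index value for every distinct combination of labels that occurs at a pixel, and keep a table mapping each index to its label set:

import numpy as np

def encode_overlapping(masks: dict[str, np.ndarray]):
    """Encode possibly-overlapping binary masks (label name -> bool array) as a
    single index image plus a table mapping each index to the set of labels."""
    labels = list(masks)
    stack = np.stack([masks[l] for l in labels], axis=-1)       # (..., n_labels) bool
    flat = stack.reshape(-1, len(labels))
    # Each distinct row (combination of labels) gets its own index value.
    combos, index_image = np.unique(flat, axis=0, return_inverse=True)
    index_image = index_image.reshape(stack.shape[:-1])
    table = {
        i: {l for l, present in zip(labels, combo) if present}
        for i, combo in enumerate(combos)
    }
    return index_image, table

# Example: two overlapping 2D masks.
a = np.zeros((4, 4), bool); a[:2] = True
b = np.zeros((4, 4), bool); b[1:3] = True
img, table = encode_overlapping({"Heart": a, "Left Ventricle": b})
print(table)  # {0: set(), 1: {'Left Ventricle'}, 2: {'Heart'}, 3: {'Heart', 'Left Ventricle'}}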

@wayfarer3130

It would be interesting to compare HTJ2K, JPEG-LS, RLE and the other compressed transfer syntaxes (TSUIDs) for this representation at various numbers of labelmaps/overlaps. At the last compression WG-04 meeting, David Clunie stated that the compressed transfer syntax performed better than JPEG 2000 in terms of size for single-bit segmentations, which isn't surprising given how sparse they are. I would hope that this representation would do quite a bit better, as it would be much less sparse. The efficiency gains from not needing to handle so many images might be significant as well (compared to single-bit), but the ability to just overlay an image with transparencies without needing to look at pixel values is enormous - that is, most image display systems can simply be told to draw a given image with a given LUT, and will do that efficiently, so the representation above should directly include the color LUT(s). To do that, the color LUT should include a transparency channel.
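
For anyone who wants to run that kind of comparison locally, here is a rough sketch (it assumes the third-party imagecodecs package is available for JPEG-LS, and uses a synthetic frame in place of a real labelmap slice; the absolute numbers are illustrative only):

import gzip
import numpy as np
import imagecodecs  # assumed available; provides a JPEG-LS encoder among others

# Synthetic labelmap slice: a few rectangular "segments" on a 512 x 512 grid.
rng = np.random.default_rng(0)
frame = np.zeros((512, 512), dtype=np.uint8)
for value in range(1, 6):
    r, c = rng.integers(64, 448, size=2)
    frame[r - 40:r + 40, c - 40:c + 40] = value

sizes = {
    "raw": frame.nbytes,
    "gzip": len(gzip.compress(frame.tobytes())),
    "jpeg-ls": len(imagecodecs.jpegls_encode(frame)),
}
for name, size in sizes.items():
    print(f"{name:>8}: {size} bytes")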

@wayfarer3130

As a co-chair for the DICOMweb working group, I'd really like to see the new proposal include a well defined representation for how to fetch the rendered images with segmentation from the DICOMweb /rendered endpoint. This really requires making sure the segmentation has some sort of default viewable representation, and preferably one that is easy for developers to implement without thinking too much (that helps make implementations consistent).
My suggestion is to define two mechanisms for it:

  1. Add a segmentation reference, basically the same as how GSPS references work. That allows fetching a single image, or a particular series, with a given segmentation applied.
  2. Fetch the rendered representation of the segmentation. This would render the referenced images with the segmentation applied, as a multiframe image of some sort.

Eventually, adding this to the proposed DICOMweb 3d would allow for returning rendered 3d representations.

@sedghi
Member

sedghi commented May 26, 2023

Great to see the discussion happening! I'm putting my benchmarks here: https://docs.google.com/document/d/14tgwQKfjbpxnaXXeH1AEzunWRnYeMkcBNNS1efCf0Jo/edit?usp=sharing

For the questions

  • 16-bit is inevitable with the full-body SEG tools that we are seeing emerge every day
  • some flavor of RLE would be very well suited, based on my experience

@pieper
Contributor

pieper commented May 26, 2023

Thanks everyone for participating in this effort. Efficiency of SEG is essential. It's been good to use today's use cases for benchmarking (such as brain or body segmentations with around 100 segments), but given the way segmentation tools are improving, it's clear to me that soon we will be looking to encode thousands of segments, if not more, and any work we put in now should be able to handle these use cases. Good compression and efficient representation of metadata are going to be essential.

I'm happy to work on this in both Slicer and dcmjs.

@fedorov
Member

fedorov commented May 26, 2023

Is refining support of the existing SEG, as defined in the standard now, in scope for this project? I am very supportive of the efforts to develop new representations, but we should not forget about the existing datasets and implementations of the current standard in the existing tools. Also, it will take time to develop the proposal, get it into the standard and gain acceptance. Together with @igoroctaviano I have been looking at the benchmarking of the OHIF v2 and v3 implementations, and will have those available along with samples in case this can help.

Regarding prototyping of the implementation, dcmqi leverages the IOD implementation in DCMTK, and I don't think I will be able to prototype that.

Pinging Michael Onken @michaelonken for awareness.

@CPBridge
Contributor Author

CPBridge commented May 26, 2023

Thanks everyone for the feedback and participation!

Having a transparent conversion between old and new SEG objects would be really nice

This is a really good point

If the representation were a multiframe object, with one or more color LUTs, and also a mapping from pixel value to a set of labels, then it becomes possible to define an overlapping segmentation - the algorithm being to just assign the next instance number whenever a new combination of labels is assigned to a pixel. That algorithm also allows defining two labels for a given region - the "edge" label and the center label - which can be used to nicely show the outline.

I'm not really following this. Perhaps you could clarify. The current segmentation IOD does allow overlapping segments. I have been thinking along the lines that if people want to use overlapping segments they would continue to use the existing segmentation IOD, and we would simply define a new "special case" to make the case of non-overlapping segments more efficient.

Compression is definitely important, though I personally put this second to having a labelmap style encoding.

Thanks @sedghi for those benchmarks, very useful!

but given the way segmentation tools are improving, it's clear to me that soon we will be looking to encode thousands of segments, if not more, and any work we put in now should be able to handle these use cases

I agree!

Is refining support of the existing SEG, as defined in the standard now, in scope for this project? I am very supportive of the efforts to develop new representations, but we should not forget about the existing datasets and implementations of the current standard in the existing tools.

I completely agree that improving tooling for the existing segmentations is important. I have spent quite a bit of time recently improving the efficiency of both encoding and decoding in highdicom, and plan to do more. However, I feel that that may be something best left to individual developers to do in their own time, and a better use of the limited time we have together at project week would be to work on the piece that we need to collaborate on, which is drafting an improved version of the segmentations (which it seems to me we have reached a consensus is necessary). What do you think @fedorov ?

@CPBridge
Contributor Author

CPBridge commented May 26, 2023

As a co-chair for the DICOMweb working group, I'd really like to see the new proposal include a well defined representation for how to fetch the rendered images with segmentation from the DICOMweb /rendered endpoint.

This sounds like potentially a good idea, but not one that I am best placed to execute on. Are there particular considerations for the design of the actual IOD that will make this easier, that we should bear in mind? It seems to me that this could be a separate proposal without interdependencies with the other things that we are discussing here, but maybe I am wrong.

Generally speaking I am of the opinion that the Segmentation object should simply encode the segmentations and their semantics, and viewers are free to choose how to render them, perhaps with reference to a presentation state if desired. But then again, I don't write any viewers :)

I definitely want to make sure we don't do anything that makes viewers' lives harder

@fedorov
Member

fedorov commented May 26, 2023

However, I feel that that may be something best left to individual developers to do in their own time, and a better use of the limited time we have together at project week would be to work on the piece that we need to collaborate on, which is drafting an improved version of the segmentations (which it seems to me we have reached a consensus is necessary).

That's a fair point - makes sense to focus this project on the proposal development.

@lassoan
Contributor

lassoan commented May 26, 2023

Overlapping segments

In Slicer we have gone through a number of different representations of overlapping labels. The best solution, clearly, by far, is multiple 3D labelmaps (we stored them in a 4D array, but multiple 3D arrays are fine, too). For non-overlapping labels (typical for atlases and AI segmentation results with hundreds of segments) it is as fast, simple, and memory-efficient as simple labelmaps. Segments typically overlap in small groups; for example, tumor or vasculature can be specified over solid organs, without overlap within the group (e.g., all vessel labels can be stored in a single labelmap). This has been confirmed to work really well, for several years now, across a very large number of Slicer-based projects, so I'm confident that this representation can fulfill all voxel-based segmentation storage needs.
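
For illustration, a minimal greedy sketch (my own toy code, not Slicer's implementation) of how overlapping binary segments can be packed into a small number of non-overlapping labelmap layers:

import numpy as np

def pack_into_layers(masks):
    """Greedily assign each binary mask to the first layer it does not overlap.

    masks: list of boolean arrays, all with the same shape.
    Returns a uint8 array of shape (n_layers, *volume_shape) and, for each input
    segment, the (layer_index, label_value) it was written with.
    Note: uint8 limits each layer to 255 segments; a real implementation could widen this.
    """
    layers, next_label, assignments = [], [], []
    for mask in masks:
        for i, layer in enumerate(layers):
            if not np.any(layer[mask]):          # no overlap with anything already in this layer
                layer[mask] = next_label[i]
                assignments.append((i, next_label[i]))
                next_label[i] += 1
                break
        else:                                    # overlaps every existing layer: start a new one
            layer = np.zeros(mask.shape, dtype=np.uint8)
            layer[mask] = 1
            layers.append(layer)
            next_label.append(2)
            assignments.append((len(layers) - 1, 1))
    return np.stack(layers), assignments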

DICOM standardization

The current DICOM standardization practice is that:

  1. Each vendor first implements their solution independently using private DICOM tags (often in a way that third parties cannot decipher)
  2. What the common standard will be is fought out in long, relatively infrequent, formal meetings of DICOM working groups. When the standard is finalized there is often no working implementation, and definitely not a significant amount of experience with how the new data object works in practice.
  3. Vendors implement the standard. During implementation it may turn out that it is too complicated, too slow, ambiguous, etc.
  4. Amendments and modifications are attempted, but since the standard is already out and in use, the possibilities are limited and any change is very expensive.

I would propose to change this process by replacing this with an open, iterative, code-first (code is the specification) approach:

  1. Developers from multiple medical application development groups (companies, research groups, commercial and open-source developers) agree on a DICOM data structure at a high level.
  2. Implement a library for reading/writing the data structure.
  3. All groups use this library in their application to get real-life experience with it.
  4. Keep iteratively improving the library based on the feedback.
  5. After 6-12 months of real-world usage, start the formal DICOM standardization process as it is done today.

We can follow the usual GitHub development process, discussing things in issues and proposing changes through pull requests, etc. The project week could be a good candidate to try one iteration of this new approach!

@lassoan
Contributor

lassoan commented May 26, 2023

better use of the limited time we have together at project week would be to work on the piece that we need to collaborate on, which is drafting an improved version of the segmentations

I agree, adding that I would suggest the "drafting" to be done by writing code, not documentation.

We already have a lot of code that can be reused, so we probably don't need to implement a lot from scratch.

@fedorov
Member

fedorov commented May 26, 2023

The current DICOM standardization practice is that:

@lassoan I agree with you in general, but it is also, at least sometimes, in my experience, the case that, given the opportunity, repeated reminders and invitations to participate in the standard development, vendors (for a variety of reasons, I am sure) do not commit resources to test the proposals and provide feedback. And by the time they express interest, it is too late to change the standard. I think often there are no incentives for vendors to commit resources to develop the standard. It takes huge effort to recruit vendor participation.

But I completely agree that we should follow the approach you are proposing in this project. It would be great to have developers of commercial tools that have at least touched DICOM SEG in the past participate, but I am not very optimistic this will be feasible. Here are the companies/groups that have or had products that support, supported, or attempted to support DICOM SEG (as of 2018): https://dicom4qi.readthedocs.io/en/latest/results/seg/. Add to this Sectra and Kaapana; there may be more.

There is also the balance between agility and inclusivity of the process, since the more voices you have, the more difficult it will be to reach consensus. Maybe after trying to rally various groups around this activity, it will be easier to empathize with the challenges of developing DICOM and shepherding DICOM working groups.

@CPBridge
Contributor Author

I agree, adding that I would suggest the "drafting" to be done by writing code, not documentation.

Not sure that I'd go quite this far, but certainly would want to have prototype implementations from an early stage, as many problems are only discovered by implementing them.

We already have a lot of code that can be reused, so we probably don't need to implement a lot from scratch.

Would this be slicer code? In core slicer or elsewhere?

@lassoan
Contributor

lassoan commented May 26, 2023

@fedorov I agree, you raise many good questions. The new approach would not solve all problems, but would have significant advantages.

@CPBridge

I agree, adding that I would suggest the "drafting" to be done by writing code, not documentation.

Not sure that I'd go quite this far, but certainly would want to have prototype implementations from an early stage, as many problems are only discovered by implementing them.

It makes sense to spend some time with discussion and documentation, because maybe we don't have a 100% common understanding of how things should work. However, we should try to keep that to a minimum, because we should already know enough to be able to implement, in a few days, a DICOM-based solution that is almost as capable but much simpler and several orders of magnitude faster than the current standard. This implementation would be a much better basis for discussion and further developments than any document.

Would this be slicer code? In core slicer or elsewhere?

Yes, in Slicer we have an implementation for storing segmentations as 4D labelmap, fractional labelmap, closed surface, planar contour, and ribbon representations, including the metadata that is needed for creating the DICOM representation; storage in NRRD and a few other research file formats; and conversion of these representations to/from current DICOM objects (DICOM Segmentation object, RT structure set, fake CT), mostly in C++, with some plumbing in Python. It should not be hard to reorganize this code to store segmentations in an existing DICOM information object with some private fields.

I'm sure you have some code to build on, too.

It would make sense to start with a reference implementation, probably in Python (pydicom with ITK or with just numpy) during the project week. Later on we could add C++ and JS implementations.

@CPBridge
Contributor Author

CPBridge commented May 26, 2023

I intend to write a reference implementation by building on the current implementation of the Segmentation IOD in highdicom, since I know that code very well and prototyping should be pretty fast. Other developers associated with other projects are very welcome to join in. It sounds like @pieper at least will be doing something similar in dcmjs, and perhaps slicer.

@lassoan I am not sure I really understand what you mean by a "4D labelmap", could you clarify? To me, a segmentation array would be either 4D (3 spatial dimensions + 1 segment dimension) or a 3D "labelmap" (3 spatial dimensions with pixel value encoding segment membership), but not both at the same time. Is this because you have multiple groups of non-overlapping segments, with each group stacked down the 4th dimension and, within each group, segment membership encoded by pixel value? If so, I see that this may be more space efficient, but worry that it would do little to help the criticism that SEGs are complex and hard to parse.

@pieper
Contributor

pieper commented May 26, 2023

It sounds like @pieper at least will be doing something similar in dcmjs, and perhaps slicer.

Yes, I'm willing to look at both dcmjs (and how it's used in OHIF) and Slicer. For Slicer I think we can explore using highdicom directly since from what @fedorov said it may be hard to extend the current implementation with dcmqi / dcmtk.

@fedorov
Member

fedorov commented May 26, 2023

For Slicer I think we can explore using highdicom directly

Yes, I discussed this with Chris when we talked about this project yesterday. It makes a lot of sense to have a highdicom-based plugin to read SEG.

@lassoan
Contributor

lassoan commented May 27, 2023

I am not sure I really understand what you mean by a "4D labelmap", could you clarify?

The dimensions are I, J, K, layer. Each "layer" is a 3D labelmap that stores a number of non-overlapping segments.

worry that it would do little to help the criticism that SEGs are complex and hard to parse

The beauty of this scheme is that it is very simple, yet it fulfills the requirements of many use cases. If segments do not overlap then there is only one layer, so we are fully backward compatible with all the usual 3D segmentation files. If segments overlap then we store the data in a few 3D arrays ("layers") instead of one 3D array. We very rarely need more than a few of these layers, so the rendering performance is good and it is independent from the number of segments. You can extract voxels of a segment using simple numpy indexing.
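
Concretely, with an (I, J, K, layer) array as described, pulling out one segment is a single comparison (the array name and indices below are hypothetical):

import numpy as np

layers = np.zeros((256, 256, 128, 2), dtype=np.uint8)    # I, J, K, layer (made-up sizes)
layer_index, label_value = 1, 3                           # where a given segment is stored
segment_mask = layers[..., layer_index] == label_value    # boolean voxel mask for that segment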

It makes a lot of sense to have a highdicom-based plugin to read SEG.

Agreed. We can create a highdicom-based DICOM Segmentation object importer/exporter in Slicer during the project week.

@CPBridge
Contributor Author

CPBridge commented May 27, 2023

Would it be so bad if you simply stored each non-overlapping set ("layer") of segments as its own segmentation instance? My instinct is to try and make the new proposal as simple as possible

@lassoan
Contributor

lassoan commented May 27, 2023

Storing a segmentation in a single series but storing each layer in a separate instance (file) could slightly simplify the DICOM specification and implementation of DICOM toolkits. However, it would significantly complicate implementation at the application level:

  • Metadata and image geometry (origin, spacing, axis directions, extents) could be inconsistent between instances. The application would need to check for inconsistencies (increasing complexity and processing time) and resolve any that are detected. Applications could simply reject loading an entire series if any inconsistencies are found, but that would make everything very rigid and fragile (applications that are not prepared for dealing with time sequences or multi-resolution data could not read segmentations at all). Applications could resample the labelmaps into a common coordinate frame, but into which one, and how much data loss would be acceptable during the process? Should the user be warned? But what if there is no user interface? Should such series be completely rejected or partially loaded? Different applications would choose different solutions and so behavior would be inconsistent and unpredictable. It would be a mess.
  • It would be much harder to determine how many instances make up the segmentation and ensure that the data is completely read/written/transferred. Similar problems are there for image instances as well, but images are just acquired once, while segmentations can be added anytime.
  • We already need the sub-series grouping for storing segmentations at multiple time points and multiple resolutions. It would be nice to avoid the extra level of grouping based on layers. Overlap between segments can change between time steps and resolutions, which could make cross-references between time points very complex (you would not be able to use instance UIDs to store correspondence between segmentations across time points and resolutions, but would need to introduce some new sub-series-level UIDs).
  • Existing DICOM representations (SEG, RTSTRUCT) store multiple overlapping segments in a single instance, too, and changing this could hinder adoption of the new representation.

On the other hand, allowing storage of a 4D array instead of just 3D would barely make any difference in the complexity of the standard or DICOM toolkits. We would need to store the number of layers (one extra field) and, for each segment, store not just the label value but also the layer index (one extra field per segment).

@pieper
Contributor

pieper commented May 27, 2023

I just saw the response from @lassoan which came in while I was writing the message below. He has many good points that are somewhat different from mine, but we both agree.

simply stored each non-overlapping set ("layer") of segments as its own segmentation instance?

@dclunie suggested the same idea of having one instance per layer, but I'd prefer to push "one file per segmentation" to really be on par with NRRD or NIfTI in the minds of users. Since the concept of frames is central to SEG, I don't see why we can't put multiple layers of labelmaps into the same multiframe instance. Is there a technical problem you foresee @CPBridge?

My main argument would be that the legacy of needing multiple files as part of the same conceptual "unit" has been a valid objection to many dicom scenarios (think how much people hate one file per slice of a volume, or how confusing it is for tiff users to have one file per layer of a WSI pyramid). I know multiframes have been slow to catch on, but I understand the new Siemens MR will start generating them by default and I'm guessing people will like them once they are used to them.

On a similar note, I think we should adopt a strict convention of using .seg.dcm to name any files of SEG instances (not part of the standard, but convention across our tools). If we can make the argument that .seg.dcm is a drop in replacement for, but better than, .nii.gz or .nrrd we will have a much better chance of adoption.

@pieper
Contributor

pieper commented May 28, 2023

Thanks for testing @CPBridge. For reference the gzipped labelmap version of that data in nii.gz format is 2.3MB, so I think your intuition is correct that labelmap would be better. I'd guess that image-oriented compressors would do even better than gzip on the labelmaps.

@sedghi
Member

sedghi commented May 29, 2023

tagging more people on this @JamesAPetts @chafey

@wayfarer3130

A labelmap with a single label is equivalent to a binary segmentation, and can be stored in 8 or 16 bits, so would allow for JPEG-LS storage. It should be pretty efficient in JPEG-LS as suggested above.

@wayfarer3130

What about sparse labelmaps for whole slide imaging? Any changes to this standard should also allow optimized storage of fairly small labelmaps for whole slide imaging. For this case, one might decide to implement a degenerate solution: storing a fairly large rows/columns image, but allowing it to be offset anywhere, then storing an encoded representation that has fewer rows and columns than specified. According to DICOM part 5, this will take precedence over the DICOM-specified rows and columns, and so it becomes possible to store just a small region. That is, one would have something like:

  • Rows/Columns: 8192 (8k)
  • Images:
    • Position X, Y, focal distance
    • Maybe actual rows/columns
    • Image - reduced in size to actually fit the required area, stored in something like JPEG-LS

@CPBridge
Contributor Author

CPBridge commented May 29, 2023

A labelmap with a single label is equivalent to a binary segmentation, and can be stored in 8 or 16 bits, so would allow for JPEG-LS storage. It should be pretty efficient in JPEG-LS as suggested above.

Yes indeed, one could store a single binary segment as a labelmap to get the benefits of 8-bit compression under the proposed changes. A remaining issue would be that multiple overlapping segments would need to be stored with the existing binary representation, so there is still a motivation to get some sort of effective compression working for the existing SEG IOD and BINARY SegmentationType.

What about sparse labelmaps for whole slide imaging?

This is already possible as far as I am concerned. I have recently produced many segmentations of WSIs in tiled format and I omit any tile where the segmentation is empty. Alternatively, one can specify arbitrary plane positions and orientations to store a SEG at an arbitrary offset/rotation/zoom with respect to the source image. Highdicom has (admittedly low-level) support for this.

@sedghi
Member

sedghi commented May 29, 2023

There was some discussion above about having each layer in a separate instance. Are we talking about one file with multiple layers (multiframe) or separate SOPInstanceUIDs? Multiframe makes a lot of sense. However, when writing the SEG we are assuming the library takes care of splitting the segments into as few layers as possible so that they do not overlap.
And what happens if, say, a SEG that was 2 layers (2 instances) is edited by the user, who adds a new segment which overlaps, so it becomes 3 layers? Are you proposing to append the layer so that the order doesn't change? What about removal?

@lassoan
Contributor

lassoan commented May 29, 2023

All these ideas are very interesting, but it would be very hard to get things right if we open up so many questions for debate. Could we aim for making already proven, widely used, well-liked research file formats available in DICOM? Building on existing research file formats would be a safer bet, because we would need to make fewer decisions (and thus have less chance of making mistakes), and it would be very easy for the community to adopt the new file format (due to the trivial, lossless conversion between DICOM and the research file format).

For example, I am confident that standardizing the current .seg.nrrd file format would fulfill all the needs of radiology research applications, and we could slightly adjust it (taking over a few things from the current DICOM Segmentation object) to make sure clinical needs are fulfilled, too.

Is there a popular research file format for WSI that is proven to be sufficient for most applications and very widely used?

Since radiology and WSI research file formats have not converged over the years (the OME-Zarr initiative is a good illustration of why), there is a high chance that there will not be a simple, single DICOM format that is optimal for these two completely different applications. But this should be fine; there is no need to force everything into a single information object definition. We already have a separate IOD for surface segmentation, and the current image segmentation IOD will not disappear immediately either, so we'll have multiple IODs for segmentations anyway.

@CPBridge
Contributor Author

There was some discussion above about having each layer in a separate instance. Are we talking about one file with multiple layers (multiframe) or separate SOPInstanceUIDs? Multiframe makes a lot of sense. However, when writing the SEG we are assuming the library takes care of splitting the segments into as few layers as possible so that they do not overlap.

SEGs are already multiframe. The discussion was about whether we should allow each "layer" to be stacked along another "dimension" in the multiframe instance (I am gradually warming to the idea) or each stored as a separate instance (SOPInstanceUID). Who allocates segments to layers is an implementation detail for those who write libraries.

And what happens if, say, a SEG that was 2 layers (2 instances) is edited by the user, who adds a new segment which overlaps, so it becomes 3 layers? Are you proposing to append the layer so that the order doesn't change? What about removal?

Since DICOM objects should be immutable, I do not think we should concern ourselves with editing. I tried to go down this path once and you get into a real mess very quickly, because the fact that frames may be omitted etc. means that you basically need to redo the frame organization from scratch, with the full information available, each time you want to add a segment. Admittedly, this is a weakness of DICOM SEG, but in keeping with its primary use as a clinical format.

All these ideas are very interesting, but it would be very hard to get things right if we open up so many questions for debate. Could we aim for making already proven, widely used, well-liked research file formats available in DICOM?

I largely disagree with this direction. Although there is a lot of useful discussion on this thread in various directions, I absolutely believe that with just a couple of minor and quite manageable tweaks to the standard, namely:

  • New "LABELMAP" segmentation type in addition to BINARY and FRACTIONAL (actually a very straightforward change, we already had one prototype implementation in highdicom a while back)
  • Enable storing BINARY segmentations with either true single-bit compression, or by allowing them to use 8-bit compression with BitsStored=1 (see the sketch below)

we could end up with a format that is very significantly more space efficient than what we currently have (and possibly more efficient than NRRD or NIfTI, due to the use of proper image compression methods), that works well across WSI and radiology, allows for seeking/decompression/web retrieval of individual frames, and is much more similar to existing image storage, meaning that existing systems for dealing with multiframe DICOM (such as viewers and archives) will have a much easier time working with it than if they had to implement support for a net new storage format like NRRD. We just need to get the right people together to agree and get it implemented in a few key places. I think creating a wrapper for NRRD would actually be a lot more work and a lot harder to get right. That would open up even more questions.
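
On the second bullet, a small sketch of the trade-off (imagecodecs is again assumed available for JPEG-LS; the exact sizes depend heavily on content and codec, so treat the numbers as illustrative only):

import gzip
import numpy as np
import imagecodecs  # assumed available for JPEG-LS

mask = np.zeros((512, 512), dtype=np.uint8)
mask[100:300, 150:350] = 1                       # a single binary segment

bit_packed = np.packbits(mask, axis=-1)          # today's BINARY layout: 1 bit/pixel, uncompressed
jls_8bit = imagecodecs.jpegls_encode(mask)       # 8 bits allocated (BitsStored could be 1) + JPEG-LS
gz_packed = gzip.compress(bit_packed.tobytes())  # bit-packed frame put through a generic compressor

print(len(bit_packed.tobytes()), len(jls_8bit), len(gz_packed))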

@lassoan
Contributor

lassoan commented May 29, 2023

I think creating a wrapper for NRRD would actually be a lot more work and a lot harder to get right. That would open up even more questions.

I meant to adopt the general ideas of successful research file formats (i.e., labelmap and standard compression algorithms) and focus on reproducing their features that are proven to be necessary (don't try to develop new features for now).

I agree that having the LABELMAP segmentation type and selecting some standard compression algorithms would take care of the voxel data in radiology applications. From the discussion above it seemed that there are some open questions for WSI, and that's why I suggested we could focus on allowing DICOM to do what is already commonly done in existing, widely used WSI research file formats.

It would also be nice to review whether we can simplify the currently required metadata. For example, it may not be worth paying the price for having slice-level references in PerFrameFunctionalGroupsSequence, especially when automatic 3D segmentation methods are used.

@fedorov
Member

fedorov commented May 29, 2023

NRRD-like formats are so efficient in part because they are very restricted in expressiveness as compared to DICOM SEG, but this is actually good, since users value good performance and compact representation a lot more than expressiveness. My overall feeling about DICOM is that so often it is designed to make sure that it can address 99% if not 100% of use cases - in principle! - but the price for that is that - in practice! - 80% of applications need to suffer from enormous complexities and inefficiencies, sacrificing adoption for the sake of future-proofing and "utility in principle".

Here are the top-of-the-list items that I would like to see revisited:

  • Single-bit packing: I am sure the intent was good, but I did not see any evidence it helped anyone in practice; instead it significantly complicates the implementation, complicates frame access, and causes problems for compression. NRRD and the like do not invent any new packing schemes, deferring compression to established algorithms.
  • Empty frames ambiguity: empty segmentation frames can be (but do not have to be!) skipped, which, again, I believe was well intended to help reduce the size, but in practice complicates implementation. Maybe there is a benefit to requiring that the revised SEG should have regular sampling of the volume? This might potentially eliminate/reduce the need for per-frame FGs (i.e., they can still be present to communicate references to the instances used in the derivation, but would not be needed for reconstructing volume geometry).
  • Extreme flexibility: with the ability to specify per-frame attributes one can have changing orientation, changing spacing, and sparse frame sampling. Those features are not available in NRRD-like formats, and time has shown that there are many, many applications that just do not need them.

I would think that by revisiting those various unorthodox choices made in SEG, implementations would not need to change drastically, but would be able to greatly reduce complexity and improve performance by eliminating a lot of special cases.

@CPBridge
Contributor Author

Thanks @fedorov I think that summarises things nicely!

I have largely avoided the metadata issue until now, but I agree that there are problems worth addressing there too (mostly regarding per-frame functional groups). In my opinion they are a bit less important, since they are mostly concerns for developers rather than users. I also worry that many of the issues are inherent to the entire "multiframe" formulation, which spans many IODs within DICOM, not just SEG. This means that changes we propose could have knock-on effects all over the standard and on many IODs already in clinical use.

Nevertheless, I think it would be worthwhile to at least go through the exercise of thinking through what changes to simplify the metadata might look like. I will try to summarise my thoughts on the topic at some point in the next few days.

@JamesAPetts
Collaborator

Interesting discussion! +1 to the primary target being DICOMization of the common nrrd/nifti style labelmaps. I think WSI is a distraction in this pursuit personally, if the goal is to fix the common difficulties with transfer of volumetric voxel-labelled segmentation.

I'm all for improving BINARY as well, for single segmentations, as compared to LABELMAP, which is more essential for things like neuro segmentations with 50-400+ labels.

@JamesAPetts
Collaborator

JamesAPetts commented May 30, 2023

Empty frames ambiguity:

@fedorov, I'm not sure I agree with removing the practice of only encoding non-blank frames; I personally think this still makes sense in a LABELMAP representation. It's true that an empty frame would collapse to a very small RLE'd frame, but I still wonder whether they should be included at all. In the case of, e.g., a two-segment liver-with-tumor labelmap, maybe only 50 frames of a 500-frame CT would be labelled.

EDIT: Oh, I get it - this is to remove the notion of per-frame functional groups so people don't put random frames out of plane? If the goal is to remove this ambiguity then maybe I agree.

Or perhaps the orientation should be in a shared functional group, and be banned from per-frame functional groups, but I worry we are going to affect backwards compatibility if we take that route.

@wayfarer3130

As a co-chair for the DICOMweb working group, I'd really like to see the new proposal include a well defined representation for how to fetch the rendered images with segmentation from the DICOMweb /rendered endpoint.

This sounds like potentially a good idea, but not one that I am best placed to execute on. Are there particular considerations for the design of the actual IOD that will make this easier, that we should bear in mind? It seems to me that this could be a separate proposal without interdependencies with the other things that we are discussing here, but maybe I am wrong.

Generally speaking I am of the opinion that the Segmentation object should simply encode the segmentations and their semantics, and viewers are free to choose how to render them, perhaps with reference to a presentation state if desired. But then again, I don't write any viewers :)

I definitely want to make sure we don't do anything that makes viewers' lives harder

The reason I think this should be done here is that an appropriate labelmap definition of the segmentation objects makes it easy to define what is meant by a rendered view of that object, and allows creating very simple viewers which just use the already rendered version - eg for things like thumbnails. Such a definition also tends to make views of segmentations more consistent between viewers because it forces inclusion of at least a minimal set of colormap and transparency definitions, and specifies things like overlap colors. What I'm thinking about labelmaps is something like:

3 -> { color: #ff000030, labels: 'Left Ventricle', ... }
4 -> { color: #ffff0030, labels: ['Left Ventricle', 'Heart'], ... }
so that there is at least some presentation information in addition to the labelling information.

It is then fairly obvious how to render it, as semi-transparent colours overlaid on top of grayscale images are fairly well defined. It also works as shown for overlapping segmentations (even if the particular sample data doesn't look realistic).

I will try to be available to comment/help on the DICOMweb section of it - which is really mostly about ensuring that how to render the segmentations is at least partly defined.

@fedorov
Member

fedorov commented May 30, 2023

I'm not sure I agree with removing the practice of only encoding non-blank frames

@JamesAPetts The purpose of doing that would be to allow a concise definition of the overall volume geometry of the segmentation. With NRRD-like, you read the tiny header, and you know exactly how to lay out the volume in memory and how to fill it up from the pixel data. With DICOM, you currently need to iterate over all per-frame FGs before you know that.

We cannot fix it in the current SEG, but if the LABELMAP mandates regular sampling of the volume that encloses all of the segments it contains, we might be able to achieve the above. Shared FGs can then contain orientation/spacing, and per-frame image position can be defined in terms of inter-slice spacing defined in the shared FGs. I did not mean that the empty slices on top/bottom of the segmentation should be encoded. I.e., if you have a whole body CT, and segmentation of the heart and liver, empty slices in between would be included, but not above the heart or below the liver.
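
As a sketch of what the reading side could look like under that scheme (the variable names below are illustrative and do not correspond to actual DICOM attributes), reconstructing the enclosing volume becomes a simple scatter of the encoded frames, with interior empty slices implicitly zero:

import numpy as np

# Values that would come from the shared metadata under the scheme described above:
n_slices, rows, cols = 120, 512, 512             # full extent of the enclosing volume
stored_slice_indices = [40, 41, 42, 60, 61]      # only non-empty slices are encoded
stored_frames = [np.full((rows, cols), k, np.uint8) for k in range(len(stored_slice_indices))]

volume = np.zeros((n_slices, rows, cols), dtype=np.uint8)
for idx, frame in zip(stored_slice_indices, stored_frames):
    volume[idx] = frame                          # slices between/around them stay empty (all zeros)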

But Chris made a point which may be the killer of many of those suggestions - since SEG shares (and, most likely, LABELMAP will share) components of the broader enhanced multiframe family of objects, and backward compatibility will need to be maintained, there are hard limits on what can be revisited. Fortunately, @dclunie will be at the PW in person to guide this development appropriately. Let's keep our fingers crossed that it is actually feasible to improve within the standard boundaries!

What I'm thinking about labelmaps is something like:

3 -> { color: #ff000030, labels: 'Left Ventricle', ... }
4 -> { color: #ffff0030, labels: ['Left Ventricle', 'Heart'], ... }
so that there is at least some presentation information in addition to the labelling information.

@wayfarer3130 I am confused why you list 2 labels accompanying "4", but other than that, there is already a mechanism to allow encoding color alongside the semantics of the segment.

https://dicom.innolitics.com/ciods/segmentation/segmentation-image/00620002


This looks like the following when instantiated: https://viewer.imaging.datacommons.cancer.gov/viewer/1.3.6.1.4.1.14519.5.2.1.7311.5101.170561193612723093192571245493?seriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.7311.5101.206828891270520544417996275680,1.2.276.0.7230010.3.1.3.1070885483.15960.1599120307.701


What is missing in the current definition?

@fedorov
Member

fedorov commented May 30, 2023

@wayfarer3130 I missed your point re color for overlapping segmentations, I get it now I think. But I think something like this may belong to Presentation States, not Segmentation - would you agree?

@lassoan
Contributor

lassoan commented May 30, 2023

What is missing in the current definition?

There could be many visualization options that could be useful to have in the segmentation file (whether a segment is displayed or hidden by default; 3D opacity - so that if the skin surface is segmented you could still see the other segments; etc.), but these could all be defined at a later point and/or separately in presentation states (so that you can change the appearance of the segmentation without recreating the segmentation). Storing color information in the segmentation object is already conceptually questionable, but in practice it is just very convenient that you don't need to add a separate file for this.

broader enhanced multiframe family of objects, and backward compatibility will need to be maintained

Most "multiframe" images and segmentations are actually 3D volumes (parallel slices, orthogonal axes, uniform slice spacing along each axis). It could make sense to express this explicitly in DICOM so that applications do not need to deduce this from costly and complex inspection of per-frame metadata. Maybe it is already available in the standard we just need to start using this? This information could be stored in extra fields, so that applications that recognize these new fields could be more efficient, without impacting existing applications. For segmentations, we would not even need to store per-frame metadata for 3D volumes, as there would be no legacy applications to worry about and per-frame metadata can be computed very easily for 3D volumes.

I'm not sure I agree with removing the practice of only encoding non-blank frames

@JamesAPetts The purpose of doing that would be to allow a concise definition of the overall volume geometry of the segmentation. With NRRD-like, you read the tiny header, and you know exactly how to lay out the volume in memory and how to fill it up from the pixel data.

In Slicer, we initially chose to crop the segmentation to the minimum necessary bounding box. However, users struggled with this a lot, so after a few years we switched to exporting the entire volume (without cropping) by default. Including empty slices did not lead to a perceivable difference in compression time and storage size when we used zlib compression. In other use cases (WSI, etc.) empty slices may make a significant difference, so keeping an option for sparse volumes could be useful.

@wayfarer3130

@wayfarer3130 I missed your point re color for overlapping segmentations, I get it now I think. But I think something like this may belong to Presentation States, not Segmentation - would you agree?

No, I think the base segmentation should define the colors and transparency/opacity levels as a base part of the standard so that the segmentation is reasonably well defined across viewers as to the basic representation. There might be lots of other representations stored to presentation states, but it shouldn't require a presentation state to create a well defined rendered view of the segmentation.

@fedorov
Member

fedorov commented May 30, 2023

Most "multiframe" images and segmentations are actually 3D volumes (parallel slices, orthogonal axes, uniform slice spacing along each axis). It could make sense to express this explicitly in DICOM so that applications do not need to deduce this from costly and complex inspection of per-frame metadata. Maybe it is already available in the standard we just need to start using this?

I think DimensionOrganizationType = 3D might be it actually. Maybe this one can be used in the new LABELMAP object?

https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.17.html

@wayfarer3130

In terms of the WSI display of segmentations, all I'm asking is to make it not incompatible with whole slide imaging as well as not incompatible with other types of non-volumetric imaging such as simple DX or CR scans. I would vote against any proposal that excluded imaging modalities because that probably means the proposal hasn't been well enough thought out yet.

@CPBridge
Contributor Author

I think DimensionOrganizationType = 3D might be it actually. Maybe this one can be used in the new LABELMAP object?

I will respond further later, but I am aware of this and think it needs clarification. Does it require that ImageOrientation is in the shared functional groups? It appears not to. Is this incompatible with omitting empty frames?

@fedorov
Member

fedorov commented Jun 1, 2023

I would vote against any proposal that excluded imaging modalities because that probably means the proposal hasn't been well enough thought out yet.

It is a provocative thought. Is RTSTRUCT compatible with SM? Is SM Bulk Annotation compatible with MR? Is such compatibility and resulting complexity truly warranted?

I completely agree compatibility of this kind should be considered and explored, but I also strongly believe there are limits on trying to make things compatible across domains that have very different needs, communities and experiences. It is not a black and white situation. I would caution against making a decision on whether or not to vote for a specific proposal based on such a general requirement.

@sjh26
Contributor

sjh26 commented Jun 8, 2023

closing with #710

@sjh26 sjh26 closed this as completed Jun 8, 2023
@dclunie
Collaborator

dclunie commented Jun 8, 2023

A couple of additional thoughts on this from the perspective of what is already in the standard and what might need to be added:

  • you can't mess with the existing SOP Class in a manner that is not backward compatible, so even if you were to try to re-use the existing SEG IOD, new parts of it (such as a labelmap) would need to be defined as conditional (on the new SOP Class), so as to not break existing implementations - cleanest is just to create a new labelmap IOD and SOP class and reuse relevant parts of the existing Segmentation IOD
  • you need to think about PhotometricInterpretation and what value is semantically accurate and is appropriate for the compression schemes; a labelmap is an index and the values are arbitrary and not monotonic; this has an impact on the model underlying compression schemes that care about that (e.g., have a predictor), in theory if not in practice; so, the closest thing in DICOM is PALETTE COLOR (i.e., you can't use MONOCHROME2) - this has three implications
    (1) there can only be one label map in an instance (because the index values have to be in the same "space"),
    (2) if color information is to be sent it would normally be in a color palette table rather than the SEG way (though the NM IOD, unusually, allows PALETTE COLOR and does not require a color LUT to be encoded, leaving the colors unspecified),
    (3) for PALETTE COLOR only lossless compression schemes are applicable and even though permitted may not work as well as expected, depending on the values chosen for the indices and their spatial relationships
  • DICOM already has a means of identifying and sending the color palette separately (e.g., as a well-known or user supplied stored palette that is patient independent), and one might want to extend this mechanism to also include the semantics of each label (e.g., add codes to the color palette IOD or something like it)
  • what bit depth is needed for labelmap pixel data - is 8 (OB) or 16 (OW) sufficient or do you also want to allow for 32 (OL) or 64 (OV), recognizing that toolkit support may be an issue for the larger ones - for instance-of-class labelmaps maybe the larger the better, but if you go that route, how do you encode what class each instance is of (the answer is probably separate SOP instances, one for each class, with indices identifying instances only, not classes) - as a use case, to encode every nucleus in a whole slide image labelmap with 100 nuclei per tile and 200,000 tiles, 32 bits would be more than sufficient but 16 bits would not (see the quick arithmetic check after this list).
  • what bit depth is needed for the color values if they are encoded as a color palette - is 8 sufficient or is 16 needed?
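
The arithmetic for the nucleus use case mentioned above works out as follows (a quick check using the figures quoted in that bullet):

nuclei = 100 * 200_000      # about 2.0e7 distinct instance labels
print(nuclei > 2**16 - 1)   # True  -> 16-bit indices are not enough
print(nuclei < 2**32 - 1)   # True  -> 32-bit indices are ample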

@CPBridge
Contributor Author

CPBridge commented Jun 8, 2023

I have now created a full project definition for this project: #710

I also wanted to follow up on the broad topic of the per-frame metadata. I generally feel that the flexibility/expressiveness of the per-frame functional groups and dimension organization sequence is a strength but the cost in terms of simplicity for the majority of simple use cases is too high. Having written code in highdicom to interpret the metadata and use it to "reconstruct" segmentation masks from the frames, I can say that this process feels far more complicated than it ought to be. However I do believe it is possible to allow the flexibility for those who need it while introducing optional attributes that make it much simpler to work in the very common special cases. There are two major related issues currently:

  1. The overwhelming majority of segmentations store frames in regularly spaced increments along each dimension. A receiver should be able to determine whether this is true, and if so determine the spacing along the dimension, without having to parse the metadata of each frame and perform arithmetic operations to determine the spacing (a sketch of that arithmetic appears after this list). When thinking about the very common case of 3D images, there is a mechanism by which the creator can convey that planes are equally spaced in 3D space by setting the DimensionOrganizationType to '3D'. This helps a bit, but does not require that the SpacingBetweenSlices attribute be present in the SharedFunctionalGroupsSequence, so in the general case the receiver still needs to calculate the spacing for themselves. Neither does it actually require the ImageOrientationPatient to be present in the SharedFunctionalGroupsSequence. So really the '3D' DimensionOrganizationType is largely "toothless". It is worth noting that nothing in this issue is specific to Segmentations; it is true of all multiframe DICOM objects. Furthermore, I would assume this rules out omitting empty slices, though this is not totally clear to me right now (for this reason highdicom does not currently ever create segmentations using the 3D dimension organization type).

  2. In my opinion, a much worse problem is when the above is combined with the fact that empty slices may be omitted from segmentations. I actually feel that this is a very well-motivated decision, since segmentations of medical images are often very sparse (think segmentation of a lymph node in a chest/abdomen CT) and it makes sense to save space by omitting them, but the way it has been implemented gives rise to a number of problems. The core of the issue is that there is nowhere in the segmentation object to store information about the slices that were omitted due to being empty. The first problem this gives rise to is ambiguous semantics: if a slice is not explicitly listed as a source image of any segmentation frame, does that mean that it wasn't segmented, or that it was segmented and the segment(s) was/were found not to be present so the slice was omitted when the segmentation was created? These are semantically very different but not distinguished currently. The second is the "reversibility" problem. Programmers working with SEG objects would reasonably assume that if they pass a segmentation mask to a routine to create a Segmentation instance, store that file, then read it in again, they would be able to recover exactly the same mask that they put in. In fact this is not possible currently, for the same reason: the omitted empty slices are not recorded anywhere. Currently, in highdicom we at least store the ordered list of segmented instances in the ReferencedSeriesSequence at the root of the object, but this is just our convention and not one that can be relied upon to be understood by other implementations.
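
As a sketch of the spacing arithmetic referred to in point 1 (my own illustration of what a receiver currently has to do, not library code):

import numpy as np

def infer_slice_spacing(ipp_per_frame, iop, atol=1e-3):
    """Infer spacing between parallel slices from per-frame ImagePositionPatient values.

    ipp_per_frame: (n, 3) array-like of ImagePositionPatient values, one per frame.
    iop: 6-element ImageOrientationPatient shared by all frames.
    Returns (spacing, is_regular).
    """
    ipp = np.asarray(ipp_per_frame, dtype=float)
    row_dir, col_dir = np.asarray(iop[:3], float), np.asarray(iop[3:], float)
    normal = np.cross(row_dir, col_dir)
    offsets = np.sort(ipp @ normal)              # project each frame origin onto the slice normal
    spacings = np.diff(offsets)
    is_regular = bool(np.all(np.abs(spacings - spacings[0]) < atol))
    return float(spacings[0]), is_regular

# Hypothetical per-frame metadata for an axial series with 2.5 mm spacing:
ipp = [[0.0, 0.0, 2.5 * i] for i in range(100)]
iop = [1, 0, 0, 0, 1, 0]
print(infer_slice_spacing(ipp, iop))             # (2.5, True)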

Having thought about this on and off for a while, I am fairly convinced that the best way to fix all of these issues would be to do (optionally) for 3D images what is done for tiled images, by introducing the concept of a "3D TotalPixelMatrix" (a TotalVoxelVolume?) and linking it to a new value of DimensionOrganizationType (e.g. "3D_VOLUME") that actually implies some requirements. The 3D array described by the TotalVoxelVolume would conceptually exist, even if not every voxel within it is explicitly encoded within the dataset. Analogously to the TotalPixelMatrix's TotalPixelMatrixOriginSequence, TotalPixelMatrixRows, and TotalPixelMatrixColumns, the origin and full size of this TotalVoxelVolume would be explicitly recorded (and I would also make spacing between slices a requirement), and individual slices could give their SlicePositionInTotalVoxelVolume, analogously to how RowPositionInTotalImagePixelMatrix and ColumnPositionInTotalImagePixelMatrix are now used for tiled images. This way it would be very clear that the slices present exist within a known 3D volume with an explicitly defined spatial affine matrix. I would be very interested to hear people's thoughts on this and its plausibility. I would like to discuss this at project week, but I'm not sure whether we will have the time to make it concrete (even assuming there is consensus behind it).
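
To illustrate what this would buy a receiver, here is a purely hypothetical sketch of how any encoded slice could be placed in the frame of reference from the volume-level attributes plus a single per-frame index. None of these attribute names exist in the standard; they mirror the proposal above, and the numeric values are made up.

```python
# Hypothetical: locate a slice within the proposed "TotalVoxelVolume" without
# inspecting any other frame. Names mirror the proposal; NOT part of the standard.
import numpy as np

# Recorded once for the whole object (example values)
volume_origin = np.array([-120.0, -95.5, 30.0])          # position of slice 1's first voxel
orientation = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])   # shared ImageOrientationPatient
spacing_between_slices = 2.5                              # required, per the proposal
total_volume_slices = 120                                 # full extent, even if slices are omitted

normal = np.cross(orientation[:3], orientation[3:])

def slice_origin(slice_index: int) -> np.ndarray:
    """Frame-of-reference position of a slice given its 1-based index in the volume."""
    assert 1 <= slice_index <= total_volume_slices, "index outside the declared volume"
    return volume_origin + (slice_index - 1) * spacing_between_slices * normal

# A frame carrying "SlicePositionInTotalVoxelVolume" = 57 would sit here,
# no matter how many empty slices were omitted before it:
print(slice_origin(57))
```

The point being that the volume's geometry is stated once and applies whether or not a given slice happens to be encoded; an omitted slice is simply an index that no frame claims.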

Failing this, I would propose that we either disallow omission of slices from our new segmentation IOD or introduce some mechanism that stores information about omitted frames somewhere in the segmentation instance.

@CPBridge
Contributor Author

CPBridge commented Jun 8, 2023

Thanks @dclunie for the thoughtful reply. This all makes sense to me.

cleanest is just to create a new labelmap IOD and SOP class and reuse relevant parts of the existing Segmentation IOD

Originally I was hoping we could avoid this (by creating a new value of SegmentationType), but at this point I am convinced that a new IOD is required. However, I also feel that this shouldn't mean we "abandon" the old one, since there would still be value in the binary segmentation IOD and I believe there are things there that could be improved (mostly regarding pixel compression).

you need to think about PhotometricInterpretation and what value is semantically accurate and is appropriate for the compression schemes

I agree that using PALETTE COLOR is the best option currently available. I take your point about the values being non-ordinal, which breaks the assumptions of compression schemes and is therefore potentially sub-optimal. In practice I am not too concerned, as I suspect that, say, JPEG-LS Lossless would still work well, if sub-optimally, on these images, would be considerably better than the current situation, and would therefore be a practical compromise. It would be good to do some experiments, though, to see how well it works in practice.
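
As a starting point for such experiments, something along these lines would give a first impression. This is a toy sketch rather than a benchmark: it assumes the third-party imagecodecs package (with a JPEG-LS codec available), and the synthetic labelmap is a crude stand-in for real data.

```python
# Toy comparison: a labelmap frame compressed with JPEG-LS lossless versus the
# current one-binary-frame-per-segment encoding for the same slice.
import numpy as np
from imagecodecs import jpegls_encode

rows, cols, n_segments = 512, 512, 20
rng = np.random.default_rng(0)

# Mostly-empty labelmap with a few square "segments" as a crude stand-in
labelmap = np.zeros((rows, cols), dtype=np.uint8)
for segment in range(1, n_segments + 1):
    r, c = rng.integers(64, 448, size=2)
    labelmap[r - 20:r + 20, c - 20:c + 20] = segment

compressed = jpegls_encode(labelmap)

# Current BINARY encoding: one packed 1-bit frame per segment for this slice
current_binary_size = n_segments * rows * cols // 8

print(f"JPEG-LS labelmap frame:  {len(compressed)} bytes")
print(f"Current BINARY encoding: {current_binary_size} bytes")
```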

I am less keen on the semantics of PALETTE COLOR, as I feel that segmentation pixel values semantically encode labels rather than colors. It certainly makes sense for the segmentation SOP Class to suggest colors that could be used for display, but in my opinion a viewer should be at liberty to change this and display the segments however it likes. Would it be crazy to define a new PhotometricInterpretation that is practically very similar to PALETTE COLOR but does not explicitly imply a color mapping?

there can only be one label map in an instance (because the index values have to be in the same "space")

Yes, I would also prefer this, such that the existing SegmentSequence does not have to change. There would have to be some thought about how to do this if we introduce "layers", but I do not foresee any insurmountable problems.

one might want to extend this mechanism to also include the semantics of each label (e.g., add codes to the color palette IOD or something like it)

I have to say that I am not at all keen on this idea. I feel that the meaning of the segments should be encoded within the segmentation object itself if it is to act as a clinical record of some sort of segmentation process.

how do you encode what class the instance is of

I think we need to make sure we are discussing the same thing here. To use the terminology of computer vision, semantic segmentation is where pixel values denote the "class" of the pixels (e.g. "nucleus"), whereas instance segmentation is where pixel values denote the instance of the class (e.g. "nucleus number 123456"). I have primarily been discussing semantic segmentation; however, it would be nice to be able to support instance segmentation too (so that the segment description only needs to appear once). We could probably do this quite easily with a single code string attribute telling you which of the two conventions applies and, in the case of instance segmentation, limiting the SegmentSequence to length 1, with each pixel value then representing a single instance of that single class. However, I do not think it would be wise to try to support a mixture of the two (i.e. multiple classes, each of which has potentially multiple distinct instances) within a single instance. That would add considerable complexity, and I don't know of any format that can realistically do that, even the vaunted NRRD.
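
In array terms, the distinction is simply the following; scipy is used here only to generate the instance labels for the toy example.

```python
# Semantic vs instance segmentation of the same toy "nucleus" mask
import numpy as np
from scipy import ndimage

# Semantic segmentation: every pixel of the "nucleus" class shares one value
semantic = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
], dtype=np.uint8)

# Instance segmentation: each connected nucleus receives its own value
instances, n_instances = ndimage.label(semantic)
print(n_instances)   # 3 distinct nuclei
print(instances)     # same foreground pixels, but values 1..3 identify individuals
```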

what bit depth is needed for labelmap pixel data - is 8 (OB) or 16 (OW) sufficient or do you also want to allow for 32 (OL) or 64 (OV), recognizing that toolkit support may be an issue for the larger ones

This is a good point. I would probably err towards allowing up to 32 bits (I can only imagine this being practical for "instance segmentation" style arrays rather than "semantic segmentation" style arrays, since otherwise the segment metadata would get absurd), accepting that there may be some work required on toolkits to support this. I anticipate that 16 bits (65535 segments, versus 255 with 8 bits) would be sufficient for the overwhelming majority of cases, so most segmentations would be usable immediately. Definitely something to discuss further.

what bit depth is needed for the color values if they are encoded as a color palette - is 8 sufficient or is 16 needed?

I don't know; I would probably want to think this through further with someone who writes a viewer.
