This repository has been archived by the owner on Aug 4, 2023. It is now read-only.

Slicer DICOM UID org root #16

Closed
fedorov opened this issue Dec 12, 2013 · 8 comments

fedorov commented Dec 12, 2013

As we plan to create DICOM objects in Slicer, should we consider using a proper UID org root? Or is there one in DCMTK that we should use?

Steve, you mentioned there was an effort in the past to get this UID root for Slicer - should we follow up on that effort?

@ghost ghost assigned pieper Dec 12, 2013

pieper commented Dec 12, 2013

At one point I got an SPL root as a subpart of the BWH space but I didn't end up using it. We could probably dig it up or get a new one. We should discuss the nature of what these roots mean and how we want to treat them. Ideally the operator of the software (the institution) should provide the org root, not the software itself. For example, we would not want people to generate documents that claimed to be from BWH.

As I understand it, the spec calls for the UID to be strictly globally unique (which is sometimes possible, but in practice typically means 'probably unique'), with some pseudo-provenance property about the organization (hospital) that generated it. This is unlike git hashes, which are essentially random numbers that are very, very unlikely to collide and contain no other information. So we need to take care in how we handle UIDs to live up to the spirit of the spec. We should also check the spec itself for clarification, since it's been many years since I read it in detail (the 'non-organizational' approach is probably a well-established convention by now). Slicer should at least allow people to do it right if they want to put in the effort, but Slicer should still generate valid documents with a default configuration.

Right now Slicer uses the gdcm generator, which relies on a default root and the MAC address of the network adapter. This has caused some practical issues: gdcm printed a message to stderr on Linux machines in airplane mode, which broke some CLI processing, and, more importantly, it probably meant gdcm could not generate a valid UID without a network adapter. DCMTK has a documented algorithm for generating UIDs, and it looks robust.

http://www.medicalconnections.co.uk/kb/UID_Rules

http://stackoverflow.com/questions/10295792/how-to-generate-sopinstance-uid-for-dicom-file

http://en.wikipedia.org/wiki/Universally_unique_identifier

http://forum.dcmtk.org/viewtopic.php?f=4&t=910&sid=dcbebbe5e89764b36e0dc76540347b4d

http://support.dcmtk.org/docs-dcmrt/classOFUUID.html
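
The generators discussed above combine an org root with a machine identifier and time-dependent values. A minimal Python sketch of that general idea follows. It is not gdcm's or DCMTK's actual algorithm, and `PLACEHOLDER_ROOT`, `make_uid`, and `is_valid_uid` are names invented here for illustration; the root is not a registered one. The validity check covers only the basic DICOM UID rules: dot-separated numeric components without leading zeros, at most 64 characters in total.

```python
import itertools
import os
import re
import time
import uuid

# Hypothetical placeholder, NOT a registered org root; substitute your own.
PLACEHOLDER_ROOT = "1.2.3.4.5"

# Dot-separated numeric components with no leading zeros.
_UID_PATTERN = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))+")

# Per-process counter so two calls in the same microsecond still differ.
_counter = itertools.count()


def is_valid_uid(uid: str) -> bool:
    """Check the basic syntax rules: numeric components, 64 chars at most."""
    return len(uid) <= 64 and _UID_PATTERN.fullmatch(uid) is not None


def make_uid(root: str = PLACEHOLDER_ROOT) -> str:
    """Build root.machine.pid.time.counter (an illustrative layout only)."""
    # uuid.getnode() returns the hardware address as a 48-bit integer and
    # falls back to a random 48-bit number when none can be found, so this
    # does not fail when no network adapter is available.
    machine_id = uuid.getnode()
    microseconds = time.time_ns() // 1000
    parts = [root, machine_id, os.getpid(), microseconds, next(_counter)]
    uid = ".".join(str(p) for p in parts)
    if not is_valid_uid(uid):
        raise ValueError(f"generated UID is not valid: {uid}")
    return uid
```

With a longer root the 64-character limit becomes the main constraint, which is why the sketch validates its own output instead of assuming it fits.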

dclunie commented Dec 12, 2013

Hi guys

I doubt there is a need for a Slicer-specific root.

Marco also commented on dcmtk's approach again here:

http://forum.dcmtk.org/viewtopic.php?t=783

Note the dependence on a timestamp and machine ID.

I usually use a similar mechanism.

If we are going to use dcmtk for the project, I would stick with
the mechanism they use rather than considering other alternatives,
unless there is significant concern about it.

The matter of being able to generate the same UID repeatably
is primarily a concern during testing ... e.g., if one has an
output file from a previous run and wants to compare them, it
is easier if the UIDs don't change. On the other hand, if a
tool runs in production then the same UID should never be reissued
(unless you can guarantee that the UID refers to exactly the same
content). Sometimes this is desirable, e.g., round trip conversions
from one form to another like single to multi-frame, for example,
but this is usually achieved by recording what the previous UIDs
were.

One can conceive of convoluted ways to manipulate UID generation
such that they are repeatable during specific tests but not in
production, or filters on test results that exclude UID
differences (which eliminates binary file comparison).

An alternative approach that is also not deterministic (at least
to the extent that you cannot generate the same UUID twice)
is just to use UUIDs that are converted to UIDs (and thus
depend on the reliability of whatever UUID source you have
access to).

Steve already included a link to the Wikipedia description
of UUIDs.

See DICOM PS 3.5 Annex B.2, or the CP that described this:

http://www.dclunie.com/dicom-status/status.html#CP1156

Also:

http://www.dclunie.com/medical-image-faq/html/part2.html#UUID

E.g., the UUID

f81d4fae-7dec-11d0-a765-00a0c91e6bf6

becomes the DICOM UID

2.25.329800735698586629295641978511506172918

Also, see:

http://www.dclunie.com/pixelmed/software/javadoc/com/pixelmed/utils/UUIDBasedOID.html

David
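
The UUID-to-UID conversion in the example above is just the UUID read as a 128-bit integer, written in decimal, and appended to the `2.25.` prefix. A short Python sketch using only the standard `uuid` module is shown here; the function names `uid_from_uuid` and `new_uid` are made up for illustration.

```python
import uuid


def uid_from_uuid(u: uuid.UUID) -> str:
    """Render a UUID as a DICOM UID under the 2.25 root."""
    # UUID.int is the 128-bit value as a Python integer; str() gives its
    # decimal form, which never has a leading zero.
    return "2.25." + str(u.int)


def new_uid() -> str:
    """Generate a fresh UID from a random (version 4) UUID."""
    return uid_from_uuid(uuid.uuid4())


example = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
print(uid_from_uuid(example))
# 2.25.329800735698586629295641978511506172918
```

A 128-bit integer has at most 39 decimal digits, so the result is at most 44 characters and stays within the 64-character UID limit.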

pieper commented Dec 12, 2013

Thanks for the clarification and info David!

Personally, I like the 2.25 approach since it treats everything the same
and everyone agrees that 'highly highly low probability of collisions' is
good enough for all practical purposes. But I'm also fine with using the
dcmtk default. If we have time or motivation to work on it, I think the
provenance of the software/hardware system that created the object should
be more explicitly identified and the prefix of the UID shouldn't carry any
particular meaning. Ideally if data is meant to be trusted it should be
signed rather than relying on a property of the UID. For testing we should
probably ignore the UID when comparing results from different runs.

-Steve
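
One way to ignore UIDs when comparing results from different runs is to drop the UID-valued attributes before comparing. The sketch below assumes, purely for illustration, that each output has already been loaded into a flat dict keyed by DICOM attribute keyword; `UID_KEYWORDS`, `strip_uids`, and `same_except_uids` are hypothetical names, and the keyword set is a small subset that a real test would need to extend.

```python
# Attribute keywords whose values are expected to differ between runs.
# This is an illustrative subset, not a complete list.
UID_KEYWORDS = {"StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID"}


def strip_uids(dataset: dict) -> dict:
    """Return a copy of the dataset without the UID-valued attributes."""
    return {k: v for k, v in dataset.items() if k not in UID_KEYWORDS}


def same_except_uids(a: dict, b: dict) -> bool:
    """Compare two datasets while ignoring UID differences."""
    return strip_uids(a) == strip_uids(b)


run1 = {"Modality": "SEG", "SOPInstanceUID": "2.25.1", "Rows": 512}
run2 = {"Modality": "SEG", "SOPInstanceUID": "2.25.2", "Rows": 512}
print(same_except_uids(run1, run2))  # True
```

As noted earlier in the thread, this kind of filtering rules out a plain binary comparison of the output files.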

dclunie commented Dec 12, 2013

Hi Steve

Provenance should NEVER be assumed from UIDs (both because you
are not supposed to parse them, and because some systems will
mess with them during ingestion for one reason or another,
especially in a research/clinical trial context that requires
de-identification).

There are a whole bunch of attributes specifically designed to
encode provenance, assuming that the original Manufacturer,
Manufacturer's Model Name, Device Serial Number and Software
Versions in the top level data set are not sufficient (or are
left alone/copied from the values supplied by the scanner).

Specifically, the Contributing Equipment Sequence is designed
for this. See PS 3.3 C.12.1 SOP Common Module. To summarize
it is multi-valued (multiple items) and includes:

Contributing Equipment Sequence
  Purpose of Reference Code Sequence
    Include ‘Code Sequence Macro’ Table 8.8-1
  Manufacturer
  Institution Name
  Institution Address
  Station Name
  Institutional Department Name
  Operators' Name
  Operator Identification Sequence
    Include ‘Person Identification Macro’ Table 10-1
  Manufacturer’s Model Name
  Device Serial Number
  Software Versions
  Spatial Resolution
  Date of Last Calibration
  Time of Last Calibration
  Contribution DateTime
  Contribution Description

For SR objects, additional mechanisms are available in various templates
that describe "observer context".

I think we should avoid the matter of electronic or digital signatures.
Cryptographic digital signatures are defined in DICOM, and are supported
in dcmtk, but raise all sorts of issues and are never used in practice.

David
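
To make the shape of a Contributing Equipment Sequence item concrete, here is a rough Python sketch that builds one item as plain nested dicts keyed by attribute keyword. The dict representation, the function name `contributing_equipment_item`, and all example values are assumptions for illustration, not a toolkit API. The code triple in the usage is believed to be the DCM code for processing equipment and should be verified against PS 3.16 before being relied on.

```python
from datetime import datetime


def contributing_equipment_item(purpose_code, manufacturer, model_name,
                                software_versions, description, when):
    """Build one sequence item; purpose_code is (value, scheme, meaning)."""
    code_value, scheme, meaning = purpose_code
    return {
        "PurposeOfReferenceCodeSequence": [{
            "CodeValue": code_value,
            "CodingSchemeDesignator": scheme,
            "CodeMeaning": meaning,
        }],
        "Manufacturer": manufacturer,
        "ManufacturerModelName": model_name,
        "SoftwareVersions": software_versions,
        # DICOM DT values are written as YYYYMMDDHHMMSS.
        "ContributionDateTime": when.strftime("%Y%m%d%H%M%S"),
        "ContributionDescription": description,
    }


# All values below are hypothetical examples.
item = contributing_equipment_item(
    purpose_code=("109102", "DCM", "Processing Equipment"),
    manufacturer="Example Org",
    model_name="Example Tool",
    software_versions="0.0.1",
    description="Hypothetical segmentation step",
    when=datetime(2013, 12, 12, 10, 26, 0),
)

# The sequence is multi-valued: append one item per contributing system.
dataset = {"ContributingEquipmentSequence": [item]}
```

Because the sequence holds multiple items, each tool in a processing chain can add its own entry without touching the top-level equipment attributes copied from the scanner.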

pieper commented Dec 12, 2013

Right - we're on the same wavelength here - that's why I prefer the
2.25.UUID version of the UID since it explicitly removes the temptation to
interpret anything about the UID prefix.

And yes, we want to really get the provenance concepts deeply integrated
into our plans.

fedorov commented Dec 12, 2013

Thank you for this lively discussion! I am closing this issue, as it clearly appears we do not need to get a special UID root, which is good news, and can use DCMTK mechanisms and/or "2.25" approaches for UID generation.

@fedorov fedorov closed this as completed Dec 12, 2013
@michaelonken

By the way, DCMTK also permits creation of UIDs based on the 2.25 approach, see ofuuid.h .

fedorov commented Dec 16, 2013

It was clarified by @dclunie at today's call that the 2.25 UID generation approach is indeed part of the standard.
