Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Highdicom allows incorrectly formatted "person name" (PN) attributes #56

Closed
CPBridge opened this issue Mar 28, 2021 · 21 comments · Fixed by #75
Closed

Highdicom allows incorrectly formatted "person name" (PN) attributes #56

CPBridge opened this issue Mar 28, 2021 · 21 comments · Fixed by #75
Assignees
Labels
bug Something isn't working

Comments

@CPBridge
Copy link
Collaborator

There are a few places where the highdicom API requests str parameters and directly encodes them as attributes with value representation PN (person name). This includes but (but may not be limited to) the content_creator_name parameter of the segmentation SOP constructor, and the verifying_observer_name parameter of the EnhancedSR, ComprehensiveSR, and Comprehensive3DSR constructors.

The format of a PN attribute is quite specific - you can't just enter free text here. See the PN entry in this table. Briefly, for human names there are five fields (family name, given name, middle name, name prefix, name suffix) that should be separated by caret characters (^). See also the examples in the standard.

Unfortunately, no attempt is made at the pydicom level to enforce or check correct formatting. This propagates to highdicom. Therefore there is no checking or enforcement on these in highdicom, nor any documentation that there is even a format that should be followed. I suspect that the result is that the vast majority of users will pass "John Doe" instead of "Doe^John" and end up with incorrectly formatted attributes.

I consider these formatting details to be far lower level than users of highdicom should have to understand in order to create objects with correctly formatted PN attributes.

I am happy to work on a solution. Here are a few options that come to mind:

  • (My preferred solution): Create a PersonName object (either in the highdicom.content module or a new highdicom.vrmodule perhaps) with a constructor that takes the five parts of the name (family name, given name, middle name, name prefix, name suffix), any of which can be None, and has a method that returns the correctly formatted string. Then change the API of the various parts of the code expecting person names as string to instead expect PersonName objects.
  • Continue to expect strings but check that the format is correct (difficult because in theory each component can contain a space and components can be missing)
  • Try to fix this at the pydicom level.

@hackermd thoughts?

@CPBridge CPBridge self-assigned this Mar 28, 2021
@CPBridge CPBridge added enhancement New feature or request bug Something isn't working and removed enhancement New feature or request labels Mar 28, 2021
@pieper
Copy link
Member

pieper commented Mar 28, 2021

For reference: https://support.dcmtk.org/docs/classDcmPersonName.html

@hackermd
Copy link
Collaborator

I think it would be ideal if we could address that at the pydicom level. I realized that there are several issues with VRs (DS, PN, DA, etc.) and that pydicom does not enforce correct value representation.

@darcymason would you be open to changing the pydicom approach and add functionality for asserting correct value representation (max value length, set of characters, etc.)?

@hackermd
Copy link
Collaborator

Create a PersonName object (either in the highdicom.content module or a new highdicom.vrmodule perhaps) with a constructor that takes the five parts of the name (family name, given name, middle name, name prefix, name suffix), any of which can be None, and has a method that returns the correctly formatted string. Then change the API of the various parts of the code expecting person names as string to instead expect PersonName objects.

I like the approach, but would rely on pydicom.valuerep.PersonName instead. Would that work?

Highdicom callables could then have name: Union[str, pydicom.valuerep.PersonName] and we internally store the attributes using pydicom.valuerep.PersonName.

@CPBridge What do you think?

@darcymason
Copy link

I think it would be ideal if we could address that at the pydicom level. I realized that there are several issues with VRs (DS, PN, DA, etc.) and that pydicom does not enforce correct value representation.

Traditionally pydicom has avoided being overly strict because there are so many files with invalid DICOM that it causes a lot of troubles (just take a look at the issue list). We have put some checks in place over time, though, in conjunction with the config.enforce_valid_values flag, which does now enforce things like valid length and values for DS, IS and a number of other things, if set to True.

I'd be open to PRs adding such DICOM validity checks to PersonName (when the config flag is True).

difficult because in theory each component can contain a space and components can be missing)

Which I think means the example "John Doe" is actually valid, so this can be a bit tricky, as noted.

with a constructor that takes the five parts of the name (family name, given name, middle name, name prefix, name suffix), any of which can be None, ...

I think this could be a good class method on the pydicom PersonName class, if you wish to add it:
e.g. something like

person_name = PersonName.from_components(...)   # or .from_component_groups(...) maybe?

and has a method that returns the correctly formatted string.

If there is something needed beyond the __str__ value, then sure.

Anyway, summarizing, I'd be happy to entertain pydicom PRs for stricter checks in conjunction with the config flag, and e.g. class methods or others that might be useful.

@CPBridge
Copy link
Collaborator Author

CPBridge commented Mar 30, 2021

Thanks @darcymason. I think you are right that adding checks is probably impossible because there are ambiguous cases ("John Doe" could in principle be a double barrelled family name with a missing given name). I like the suggestion for from_componentsand will work on a pydicom PR for this. I imagine __str__ is sufficient as you say.

@CPBridge
Copy link
Collaborator Author

CPBridge commented Mar 30, 2021

@hackermd I think when a user import highdicom, we should probably set the pydicom.config.enforce_valid_values to True to make use of the checks that do exist in pydicom. What do you think? Maybe this should go into highdicom.__init__?

@CPBridge
Copy link
Collaborator Author

I like the approach, but would rely on pydicom.valuerep.PersonName instead. Would that work?

Highdicom callables could then have name: Union[str, pydicom.valuerep.PersonName] and we internally store the attributes using pydicom.valuerep.PersonName.

@hackermd the problem with this approach is that checks enforcing the validity of Person Name attributes is much harder (and probably impossible) than simply forcing people to use a constructor that makes them individually specify each component. E.g. it's much easier to make sure someone doesn't enter "John Doe" when they mean "Doe^Joe" by making them pass 'John' and 'Doe' separately than it is to try and work out after the fact that when they entered "John Doe" they actually meant "Doe^John". If we assume that it is impossible to implement checks at the pydicom level (even with the enfore_valid_values flag) and then if we do as you suggest and let people pass pydicom.valuerep.PersonName objects, then I imagine what will happen is that everyone will just use the main __init__ method for PersonName and do this:

dcm = Segmentation(
    ...
    content_creator_name=pydicom.valuerep.PersonName("Jonathan Horatio Doe-Smyth, 11th Earl of Dribbling"),
    ...
)

which is a perfectly valid way to create a pydicom.valuerep.PersonName, and we have essentially no way to check this, we're back at square one.

I would still be tempted to have a highdicom subclass that users of highdicom are required to use when constructing objects that forces users to go through a constructor with components, but then under the hood pass off the construction of the name string to as-yet non-existent functionality in pydicom.

@CPBridge
Copy link
Collaborator Author

See pydicom/pydicom#1331

@hackermd
Copy link
Collaborator

@hackermd I think when a user import highdicom, we should probably set the pydicom.config.enforce_valid_values to True to make use of the checks that do exist in pydicom. What do you think? Maybe this should go into highdicom.__init__?

Not sure whether this would be such a good idea. The library also provides utilities for reading data and we may not want those to fail with every compliance issue. The ideal behavior would in my opinion be restrictive upon dataset creation (writing) and permissive upon parsing (reading). It's unfortunate that this is a global variable and cannot be set for individual instances of pydicom.dataset.Dataset.

@hackermd
Copy link
Collaborator

I would still be tempted to have a highdicom subclass that users of highdicom are required to use when constructing objects that forces users to go through a constructor with components, but then under the hood pass off the construction of the name string to as-yet non-existent functionality in pydicom.

You convinced me. This is also in line with the overall approach we use throughout the library.

@CPBridge
Copy link
Collaborator Author

Not sure whether this would be such a good idea. The library also provides utilities for reading data and we may not want those to fail with every compliance issue. The ideal behavior would in my opinion be restrictive upon dataset creation (writing) and permissive upon parsing (reading). It's unfortunate that this is a global variable and cannot be set for individual instances of pydicom.dataset.Dataset.

Ah you're right, I didn't realise it would also affect files read in. That would have all sorts of unexpected knock on effects.

@CPBridge
Copy link
Collaborator Author

Maybe we could enable it for tests though?

@hackermd
Copy link
Collaborator

hackermd commented Mar 31, 2021

Maybe we could enable it for tests though?

Good idea!

(I accidentally closed the issue.)

@hackermd hackermd reopened this Mar 31, 2021
@hackermd
Copy link
Collaborator

hackermd commented Mar 31, 2021

Anyway, summarizing, I'd be happy to entertain pydicom PRs for stricter checks in conjunction with the config flag, and e.g. class methods or others that might be useful.

@darcymason thank you for your feedback! I wasn't aware of the pydicom.config.enforce_valid_values flag. Thank you for pointing it out. As elaborated in #56 (comment), I would prefer performing strict checks only upon creation of Datasets via highdicom constructor methods, but not when reading Datasets from files using the pydicom filereader methods. Unfortunately, that doesn't seem to be possible with the global configuration value. However, I would ultimately like to be able to rely on pydicom for such low-level routines. Do you see a way how we would implement such dual behavior (restrictive upon write, permissive upon read) in pydicom in backwards compatible manner? Could this be implemented via an optional parameter of the pydicom.dataset.Dataset constructor?

For example:

ds = Dataset(enforce_valid_values=True)
ds.PatientName = 'Jonathan Horatio Doe-Smyth, 11th Earl of Dribbling'   # raise ValueError

@CPBridge
Copy link
Collaborator Author

CPBridge commented Mar 31, 2021

ds.PatientName = 'Jonathan Horatio Doe-Smyth, 11th Earl of Dribbling' # raise ValueError

The PersonName attribute is a bad example of this because my silly Earl example is in principle a correct Person Name. It represents a person with a really long, complicated family name and no other parts to their name. It is only "incorrect" in so far as it clearly doesn't represent the intent of the user, but this cannot be checked for automatically. Therefore the line above raising ValueError is probably underirable. (On the highdicom side I think it might be reasonable to raise a warning if a value with no ^ characters is passed with a message that the PersonName is likely not what the user intended, but this may be too high level for pydicom)

However I think your approach may be useful for other VRs where the validity may be automatically determined (e.g. DS).

@hackermd
Copy link
Collaborator

Damn @CPBridge! You are a DICOM pro!🥇

Thanks for your clarification.

On the highdicom side I think it might be reasonable to raise a warning if a value with no ^ characters is passed with a message that the PersonName is likely not what the user intended, but this may be too high level for pydicom

I agree. I think we can/should be opinionated.

@darcymason
Copy link

Could this be implemented via an optional parameter of the pydicom.dataset.Dataset constructor?

For example:

ds = Dataset(enforce_valid_values=True)
ds.PatientName = 'Jonathan Horatio Doe-Smyth, 11th Earl of Dribbling'   # raise ValueError

I think that is reasonable. Perhaps a slightly different name might be used, rather than enforce_valid_values, to avoid confusion with the pydicom.config flag, although it is a very descriptive name and I can't think of a better one off-hand.

@hackermd
Copy link
Collaborator

hackermd commented Apr 1, 2021

I think that is reasonable. Perhaps a slightly different name might be used, rather than enforce_valid_values, to avoid confusion with the pydicom.config flag, although it is a very descriptive name and I can't think of a better one off-hand.

For dicomweb_client.uri.URI we called it permissive (False by default). Given that we need the opposite logic here, shall we call it strict?

It would then look as follows:

ds = Dataset()
ds.PatientName = f'{wrong_name}'   # no error raised

ds = Dataset(scrict=True)
ds.PatientName = f'{wrong_name}'   # raises ValueError

@darcymason
Copy link

I'm trying to remember - I know the term 'strict' came up before and we decided not to use that for some reason. Perhaps because it is vague - e.g. could be interpreted as ensuring that all required data elements are present, for example. However, I do like it for being short and reasonably intuitive - that or a similar term is what I would expect if I was learning a library.

I'd say go ahead with a pull request with that, and just make sure what exactly is "strict" is well-documented (e.g. in the class docstring). If needed, it is easy enough to update the term in the PR.

@hackermd
Copy link
Collaborator

hackermd commented Apr 1, 2021

I'd say go ahead with a pull request with that, and just make sure what exactly is "strict" is well-documented (e.g. in the class docstring). If needed, it is easy enough to update the term in the PR.

Great! We can further brainstorm about potential names in the meantime. I agree that we will be able to change the variable name easily and I like your idea of using a more specific name. We could use assert_vr or something along this line.

@CPBridge
Copy link
Collaborator Author

CPBridge commented Apr 7, 2021

My PR to add the functionality to construct PersonNames has now been merged into pydicom master branch: pydicom/pydicom#1331 and will be in the next release of pydicom (looks like this will be 2.2.0). Leaving this issue open to figure out the best way to integrate this into highdicom

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants