Highdicom allows incorrectly formatted "person name" (PN) attributes #56
Comments
For reference: https://support.dcmtk.org/docs/classDcmPersonName.html |
I think it would be ideal if we could address that at the pydicom level. I realized that there are several issues with VRs (DS, PN, DA, etc.) and that pydicom does not enforce correct value representation. @darcymason would you be open to changing the pydicom approach and adding functionality for asserting correct value representation (max value length, set of characters, etc.)? |
I like the approach, but would rely on pydicom.valuerep.PersonName instead. Would that work? Highdicom callables could then have @CPBridge What do you think? |
Traditionally pydicom has avoided being overly strict because there are so many files with invalid DICOM that it causes a lot of troubles (just take a look at the issue list). We have put some checks in place over time, though, in conjunction with the I'd be open to PRs adding such DICOM validity checks to PersonName (when the config flag is
Which I think means the example "John Doe" is actually valid, so this can be a bit tricky, as noted.
I think this could be a good class method on the pydicom PersonName class, if you wish to add it:
If there is something needed beyond the Anyway, summarizing, I'd be happy to entertain pydicom PRs for stricter checks in conjunction with the config flag, and e.g. class methods or others that might be useful. |
Thanks @darcymason. I think you are right that adding checks is probably impossible because there are ambiguous cases ("John Doe" could in principle be a double barrelled family name with a missing given name). I like the suggestion for |
@hackermd I think when a user imports highdicom, we should probably set the |
@hackermd the problem with this approach is that checks enforcing the validity of Person Name attributes are much harder to implement (and probably impossible) than simply forcing people to use a constructor that makes them specify each component individually. E.g. it's much easier to make sure someone doesn't enter "John Doe" when they mean "Doe^John" by making them pass 'John' and 'Doe' separately than it is to try and work out after the fact that when they entered "John Doe" they actually meant "Doe^John". If we assume that it is impossible to implement checks at the pydicom level (even with the dcm = Segmentation(
...
content_creator_name=pydicom.valuerep.PersonName("Jonathan Horatio Doe-Smyth, 11th Earl of Dribbling"),
...
) which is a perfectly valid way to create a I would still be tempted to have a highdicom subclass that users of highdicom are required to use when constructing objects that forces users to go through a constructor with components, but then under the hood pass off the construction of the name string to as-yet non-existent functionality in pydicom. |
Not sure whether this would be such a good idea. The library also provides utilities for reading data and we may not want those to fail with every compliance issue. The ideal behavior would in my opinion be restrictive upon dataset creation (writing) and permissive upon parsing (reading). It's unfortunate that this is a global variable and cannot be set for individual instances of |
You convinced me. This is also in line with the overall approach we use throughout the library. |
Ah you're right, I didn't realise it would also affect files read in. That would have all sorts of unexpected knock on effects. |
Maybe we could enable it for tests though? |
Good idea! (I accidentally closed the issue.) |
@darcymason thank you for your feedback! I wasn't aware of the For example: ds = Dataset(enforce_valid_values=True)
ds.PatientName = 'Jonathan Horatio Doe-Smyth, 11th Earl of Dribbling' # raise ValueError |
The PersonName attribute is a bad example of this because my silly Earl example is in principle a correct Person Name. It represents a person with a really long, complicated family name and no other parts to their name. It is only "incorrect" insofar as it clearly doesn't represent the intent of the user, but this cannot be checked for automatically. Therefore having the line above raise ValueError is probably undesirable. (On the highdicom side I think it might be reasonable to raise a warning, if a value with no ^ characters is passed, with a message that the PersonName is likely not what the user intended, but this may be too high level for pydicom.) However, I think your approach may be useful for other VRs where the validity may be automatically determined (e.g. DS). |
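The warning idea in the comment above could be sketched roughly as follows. This is only an illustration: `check_person_name` is a hypothetical helper name, not an existing highdicom or pydicom function, and it assumes that a missing `^` is the only signal worth warning about.

```python
import warnings


def check_person_name(value: str) -> str:
    """Warn when a PN string contains no caret-separated components.

    Hypothetical helper for illustration; not part of highdicom or pydicom.
    The value is returned unchanged because a name without carets is still
    a valid PN value (it is read as a family name only).
    """
    if value and "^" not in value:
        warnings.warn(
            f'Person name "{value}" contains no "^" separators and will be '
            "interpreted as a family name only. Did you mean something like "
            '"Doe^John"?',
            UserWarning,
            stacklevel=2,
        )
    return value
```

Because the helper only warns, a name such as the Earl example still passes through untouched, which keeps the behaviour permissive for values that are formally valid.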
Damn @CPBridge! You are a DICOM pro!🥇 Thanks for your clarification.
I agree. I think we can/should be opinionated. |
I think that is reasonable. Perhaps a slightly different name might be used, rather than |
For dicomweb_client.uri.URI we called it It would then look as follows: ds = Dataset()
ds.PatientName = f'{wrong_name}' # no error raised
ds = Dataset(strict=True)
ds.PatientName = f'{wrong_name}' # raises ValueError |
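To make the per-instance flag proposal concrete, here is a small stand-alone sketch for a VR whose validity can be checked automatically (DS, as suggested earlier in the thread). Everything in it is an assumption for illustration: `StrictValueStore`, `set_ds`, and `is_valid_ds` are hypothetical names, the class is not pydicom's `Dataset`, and the regular expression is only an approximation of the DS rules (at most 16 bytes, digits, `+`, `-`, `E`, `e`, `.`, with optional space padding).

```python
import re

# Approximate DS (Decimal String) syntax: optional sign, fixed or floating
# point number, optional exponent, optional leading/trailing space padding.
_DS_PATTERN = re.compile(r" *[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)? *")


def is_valid_ds(value: str) -> bool:
    """Return True if value looks like a DS string of at most 16 characters."""
    return len(value) <= 16 and _DS_PATTERN.fullmatch(value) is not None


class StrictValueStore:
    """Hypothetical stand-in for a dataset with a per-instance strict flag."""

    def __init__(self, strict: bool = False) -> None:
        self.strict = strict
        self._values = {}

    def set_ds(self, keyword: str, value: str) -> None:
        # Only validate when this particular instance was created as strict.
        if self.strict and not is_valid_ds(value):
            raise ValueError(f"Invalid DS value for {keyword}: {value!r}")
        self._values[keyword] = value

    def get(self, keyword: str) -> str:
        return self._values[keyword]
```

The point of the design is that the flag lives on the instance, so objects being created can be strict while objects read from existing files stay permissive.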
I'm trying to remember - I know the term 'strict' came up before and we decided not to use that for some reason. Perhaps because it is vague - e.g. could be interpreted as ensuring that all required data elements are present, for example. However, I do like it for being short and reasonably intuitive - that or a similar term is what I would expect if I was learning a library. I'd say go ahead with a pull request with that, and just make sure what exactly is "strict" is well-documented (e.g. in the class docstring). If needed, it is easy enough to update the term in the PR. |
Great! We can further brainstorm about potential names in the meantime. I agree that we will be able to change the variable name easily and I like your idea of using a more specific name. We could use |
My PR to add the functionality to construct PersonNames has now been merged into pydicom master branch: pydicom/pydicom#1331 and will be in the next release of pydicom (looks like this will be 2.2.0). Leaving this issue open to figure out the best way to integrate this into highdicom |
There are a few places where the highdicom API requests str parameters and directly encodes them as attributes with value representation PN (person name). This includes (but may not be limited to) the content_creator_name parameter of the segmentation SOP constructor, and the verifying_observer_name parameter of the EnhancedSR, ComprehensiveSR, and Comprehensive3DSR constructors.
The format of a PN attribute is quite specific - you can't just enter free text here. See the PN entry in this table. Briefly, for human names there are five fields (family name, given name, middle name, name prefix, name suffix) that should be separated by caret characters (^). See also the examples in the standard.
Unfortunately, no attempt is made at the pydicom level to enforce or check correct formatting. This propagates to highdicom. Therefore there is no checking or enforcement on these in highdicom, nor any documentation that there is even a format that should be followed. I suspect that the result is that the vast majority of users will pass "John Doe" instead of "Doe^John" and end up with incorrectly formatted attributes.
I consider these formatting details to be far lower level than users of highdicom should have to understand in order to create objects with correctly formatted PN attributes.
I am happy to work on a solution. Here are a few options that come to mind:
highdicom.content module or a new highdicom.vr module perhaps) with a constructor that takes the five parts of the name (family name, given name, middle name, name prefix, name suffix), any of which can be None, and has a method that returns the correctly formatted string. Then change the API of the various parts of the code expecting person names as string to instead expect PersonName objects.
@hackermd thoughts?
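A rough sketch of what such a class could look like, under stated assumptions: the name `PersonNameComponents` and its `formatted` method are hypothetical, and this is neither highdicom nor pydicom code. It takes the five components, treats `None` as empty, joins them with carets, and drops trailing empty components (which the standard allows to be omitted together with their delimiters).

```python
from typing import Optional


class PersonNameComponents:
    """Hypothetical container for the five components of a PN value."""

    def __init__(
        self,
        family_name: Optional[str] = None,
        given_name: Optional[str] = None,
        middle_name: Optional[str] = None,
        name_prefix: Optional[str] = None,
        name_suffix: Optional[str] = None,
    ) -> None:
        self._components = [
            family_name, given_name, middle_name, name_prefix, name_suffix
        ]
        for component in self._components:
            # "^" and "=" are PN delimiters and "\" separates multiple values,
            # so none of them may appear inside a single component.
            if component is not None and any(c in component for c in "^=\\"):
                raise ValueError(
                    f"Invalid character in name component: {component!r}"
                )

    def formatted(self) -> str:
        """Return the caret-separated string, e.g. 'Doe^John'."""
        parts = [component or "" for component in self._components]
        # Trailing empty components and their delimiters may be omitted.
        return "^".join(parts).rstrip("^")
```

For example, `PersonNameComponents(family_name="Doe", given_name="John").formatted()` gives `"Doe^John"`, so a user never has to know about the caret syntax.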