Explain exception in dicomdump output for character '\' #225

Closed
malaterre opened this issue Jan 22, 2024 · 1 comment

Comments

@malaterre
Contributor

Could you please add a section to the dicomdump --help output to clarify the notation for the backslash character (\134)? Thanks!

@dgobbi
Owner

dgobbi commented Jul 6, 2024

For the new release (0.8.16), it's documented on the wiki and in the --help output.

On the wiki:

Any bytes that are not part of a printable character are printed with a backslash followed by an octal code. For example, control characters like <CR>, <NL>, and <FF> are printed as \015, \012, and \014 respectively. In well-formed DICOM data, these characters are only present when the VR is ST, LT, or UT. For these VRs, when dicomdump prints \134 (octal for backslash), this indicates that a backslash was present in the original text.

In dicomdump --help:

For text attribute values, any unprintable bytes will be replaced with the four characters "\nnn", where "nnn" is the three digit octal code for the byte. Unprintable bytes are control characters or bytes that cannot be decoded with the SpecificCharacterSet of the DICOM file. A backslash itself will be replaced by its byte value "\134" if the VR is ST, LT or UT (that is, any VR where backslash isn't used as a separator for multi-valued attributes).
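As a rough illustration of the rule quoted above, here is a minimal Python sketch. It is not dicomdump's actual code: the function name `escape_text` and the `escape_backslash` parameter are hypothetical, and it assumes that only printable ASCII counts as printable, whereas dicomdump decodes with the file's SpecificCharacterSet.

```python
def escape_text(raw: bytes, escape_backslash: bool = True) -> str:
    """Render bytes as text, replacing unprintable bytes with "\\nnn" octal codes.

    Hypothetical helper for illustration only. Assumption: only printable
    ASCII (0x20 to 0x7E) is treated as printable.
    """
    out = []
    for b in raw:
        if b == 0x5C and escape_backslash:
            # Backslash is byte 0x5C, which is 134 in octal.
            out.append("\\134")
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            # Backslash followed by a zero-padded three digit octal code.
            out.append("\\%03o" % b)
    return "".join(out)


# ST, LT, UT style value: backslash is ordinary text, so it is escaped.
print(escape_text(b"line1\r\nline2"))   # line1\015\012line2
print(escape_text(b"C:\\tmp"))          # C:\134tmp

# Multi-valued style value: backslash is the separator, so it is left alone.
print(escape_text(b"A\\B", escape_backslash=False))  # A\B
```

The `escape_backslash` flag stands in for the VR check: it would be true for ST, LT and UT, and false for VRs where backslash separates values.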

My original plan was to use the shorter codes where possible, e.g. \r, \n, \f, \t, \\, but I decided on octal because of this text in DICOM Part 5, Section 6.1.2.3, Encoding of Character Repertoires:

Implementations may also encounter Control Characters that they have no means to print or display. The machine may print or display such Control Characters by replacing the Control Character with the four characters "\nnn", where "nnn" is the three digit octal representation of each byte.

But if I followed this, there would be no way to tell whether '\015' means that a carriage return was present in the value, or that the raw text '\015' was present in the value. Hence backslash itself must be escaped. This is hardly a perfect solution, since backslash already has a special meaning in DICOM.
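The ambiguity can be shown with a small Python sketch. This is only an illustration under simplifying assumptions, not dicomdump's implementation: `render` and `unrender` are hypothetical names, and only printable ASCII is treated as printable.

```python
import re


def render(raw: bytes, escape_backslash: bool) -> str:
    """Hypothetical renderer: octal-escape unprintable bytes (ASCII only)."""
    out = []
    for b in raw:
        if b == 0x5C and escape_backslash:
            out.append("\\134")
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


def unrender(text: str) -> bytes:
    """Invert render(..., escape_backslash=True).

    When every backslash is escaped, each backslash in the output is
    guaranteed to start a three digit octal code, so decoding is unambiguous.
    """
    decoded = re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)
    return decoded.encode("latin-1")


carriage_return = b"\r"     # one byte: an actual <CR>
literal_text = b"\\015"     # four bytes: backslash, '0', '1', '5'

# Without escaping backslash, two different values print identically.
print(render(carriage_return, False))  # \015
print(render(literal_text, False))     # \015

# With backslash escaped, the two values are distinguishable.
print(render(carriage_return, True))   # \015
print(render(literal_text, True))      # \134015
```

With the backslash escaped, the printed form can also be mapped back to the original bytes, which `unrender` demonstrates for this simplified scheme.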

dgobbi closed this as completed on Jul 6, 2024