Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DicomInputStream: correct invalid VR code for known tag #918

Closed
dj-shin opened this issue Mar 10, 2021 · 1 comment
Closed

DicomInputStream: correct invalid VR code for known tag #918

dj-shin opened this issue Mar 10, 2021 · 1 comment
Assignees
Milestone

Comments

@dj-shin
Copy link

dj-shin commented Mar 10, 2021

I'm trying to parse a dicom file from GE modality.
uk_tag
This is a screenshot of RadiAnt viewer reading the image.

This (0008, 0068) tag is presented as it has CS VR, but actually the VR is UK in this file.
(I don't know why this modality uses this weird VR)
The reason why this tag is interpreted as CS is that this specific tag is known to have CS VR

And I believe DicomInputStream also supports a same mechanism.

public void readAttributes(Attributes attrs, int len, Predicate<DicomInputStream> stopPredicate)
        throws IOException {
    ...
    while (undeflen || this.pos < endPos) {
        try {
            readHeader();
        } catch (EOFException e) {
          ...
        }
        ...
        if (vr != null) {
            if (vr == VR.UN) {
                vr = ElementDictionary.vrOf(tag,
                        attrs.getPrivateCreator(tag));
                ...
            }
           ...
        } else
            skipAttribute(UNEXPECTED_ATTRIBUTE);
    }
}

In this method it first reads the header, and when VR is unknown it searches from the dictionary of known tags.

But the problem is that during readHeader, it determines tag length based on the wrong VR.

public void readHeader() throws IOException {
    byte[] buf = buffer;
    tagPos = pos; 
    readFully(buf, 0, 8);
    encodedVR = 0;
    switch(tag = ByteUtils.bytesToTag(buf, 0, bigEndian)) {
    case Tag.Item:
    case Tag.ItemDelimitationItem:
    case Tag.SequenceDelimitationItem:
       vr = null;
       break;
    default:
        if (explicitVR) {  // true
            vr = VR.valueOf(encodedVR = ByteUtils.bytesToVR(buf, 4));  // should be CS, but null here
            if (vr != null && vr.headerLength() == 8) {
                // This should be run to get a proper length
                length = ByteUtils.bytesToUShort(buf, 6, bigEndian);
                return;
            }
            readFully(buf, 4, 4);
        } else {
            vr = VR.UN;
        }
    }
    // This is run instead, resulting an invalid length
    length = ByteUtils.bytesToInt(buf, 4, bigEndian);
    if (length < -1) {
        throw tagValueTooLargeException();
    }
}

It first reads 8 bytes and determines VR => UK. This is an unknown VR, and also this file is explicitVR so vr becomes null and reads 4 more bytes.
This tag's true VR is CS, so length should be length = ByteUtils.bytesToUShort(buf, 6, bigEndian) but what happens is it does not take that path, and instead falls through length = ByteUtils.bytesToInt(buf, 4, bigEndian), resulting to have a wrong length.
Not only calculating a wrong length, this also reads additional 4 bytes as header which should not be.

I think in order to properly resolve an unknown VR based on known tag, the way of reading input stream and calculating length should be fixed.

@gunterze
Copy link
Member

The current implementations follows PS 3.5, 7.1.2 Data Element Structure with Explicit VR (setting to bold by me):

  • for VRs of AE, AS, AT, CS, DA, DS, DT, FL, FD, IS, LO, LT, PN, SH, SL, SS, ST, TM, UI, UL and US the Value Length Field is the 16-bit unsigned integer following the two byte VR Field (Table 7.1-2). The value of the Value Length Field shall equal the length of the Value Field.

  • for all other VRs the 16 bits following the two byte VR Field are reserved for use by later versions of the DICOM Standard. These reserved bytes shall be set to 0000H and shall not be used or decoded (Table 7.1-1). The Value Length Field is a 32-bit unsigned integer.

Anyway, will verify, if I can implement auto-correction of wrong VR codes for known attributes without harm on the performance.

@gunterze gunterze self-assigned this Mar 21, 2021
@gunterze gunterze added this to the 5.23.2 milestone Mar 21, 2021
@gunterze gunterze changed the title DicomInputStream fails upon known tag with unknown VR DicomInputStream: correct invalid VR code for known tag Mar 21, 2021
gunterze added a commit that referenced this issue Mar 23, 2021
…ndian Transfer Syntax fix #929

DicomInputStream: correct invalid VR code for known tag #918
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants