Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk data Endianness not retained when converting from dicom to XML and back #849

Closed
quintonn opened this issue Dec 11, 2020 · 13 comments
Closed
Assignees
Labels
Milestone

Comments

@quintonn
Copy link

If I convert a dcm file to XML using the dcm4che library, and I set the setIncludeBulkData to IncludeBulkData.URI it saves the path to the original dcm file with an offset in the output XML.

But there is no bigEndian (true/false) sort of value.

The problem is, if I later try and convert the XML back to a dcm file, the output file is corrupted and the resulting images appear inverted (black and white colors switched).
Some image viewers have a mechanism to fix that, but I would have expected dcm to XML and XML to dcm to retain all information.

My code to convert DCM to XML is as follows:

public static void convertDicomToXml(File source, File dest, boolean includeBulkData) {
    DicomInputStream dis = null;
    OutputStream outputStream = null;
    try {
        SAXTransformerFactory tf = (SAXTransformerFactory)TransformerFactory.newInstance();
        TransformerHandler th = tf.newTransformerHandler();
        Transformer t = th.getTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

        ByteArrayOutputStream xml = new ByteArrayOutputStream();
        th.setResult(new StreamResult(xml));

        dis = new DicomInputStream(source);

        SAXWriter saxWriter = new SAXWriter(th);
        saxWriter.setIncludeKeyword(true);
        saxWriter.setIncludeNamespaceDeclaration(true);
        dis.setDicomInputHandler(saxWriter);
        if (includeBulkData == false) {
            dis.setIncludeBulkData(IncludeBulkData.URI);
        } else {
            dis.setIncludeBulkData(IncludeBulkData.YES);
        }
        dis.readDataset(-1, -1);
        outputStream = new FileOutputStream(dest);
        xml.writeTo(outputStream);

    } catch (Exception e) {
        e.printStackTrace();
    }
    finally {
        SafeClose.close(dis);
        SafeClose.close(outputStream);
    }
}

And my code to convert XML back into DCM is as follows:

public void test() {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            final SAXParserFactory saxFactory = SAXParserFactory.newInstance();

	    factory.setValidating(false);
	    factory.setIgnoringElementContentWhitespace(true);

	    final DocumentBuilder builder = factory.newDocumentBuilder();
	    final Document doc = builder.parse(xml);
            final InputStream xmlStream = new ByteArrayInputStream(xml.getBytes());

           Attributes dataset = new Attributes();
           final ContentHandlerAdapter contentHandler = new ContentHandlerAdapter(dataset);
           final SAXParser parser = saxFactory.newSAXParser();
	   parser.parse(xmlStream, contentHandler);

           Attributes fmi = contentHandler.getFileMetaInformation();

          copyDicomToFile(new File("......tempFile.dcm"), fmi, dataset);
}

       public static void copyDicomToFile(final File dest, final Attributes fmi, final Attributes dataset) throws IOException {
		final DicomOutputStream out = new DicomOutputStream(dest);
		out.writeDataset(fmi, dataset);
		SafeClose.close(out);
	}

To Reproduce
Steps to reproduce the behavior:

  1. Convert a BigEndian DCM file to XML while setting setIncludeBulkData(IncludeBulkData.URI)
  2. Convert XML file back to DCM
  3. Open the new DCM file with an Image Viewer
  4. The image will now be inverted

Expected behavior
The final DCM image should not be inverted

I notice in the ContentHandlerAdapter class that when it reads the bulk data element, it sets its bigEndian value based on items.getLast().bigEndian().
I couldn't figure out why or what items.getLast() would be at this point, but this is where I would sort of expect it to read the big endian value from an attribute that was saved in the XML.
I tested my theory by creating my own SAXWriter and ContentHandlerAdapter classes where I set and read a bigEndian attribute on the BulkData element and it worked for my case.

I would be happy to create a PR for this, but not sure if that is the correct solution.
Maybe I'm missing something more obvious.

Thanks in advance.

@quintonn
Copy link
Author

quintonn commented Dec 11, 2020

I did notice that the ContentHandlerAdapter uses the bigEndianness of the Attributes element passed into its constructor, so I tried setting that value based on the transfer syntax.
But that doesn't work because in the startItem method it creates a new Attributes item and when it tries to add it to the sequence, it throws an error because its endianness is not the same as its parent, because in that place it does not pass in the same bigEndian value that was passed into the ContentHandlerAdapter constructor.
But if I do pass in this.bigEndian at that point, it works too. (in my own content handler adapter class)

@gunterze
Copy link
Member

You have to include the File Meta Information attributes - particularly (0002,0010) TransferSyntaxUID - into the XML representation!

s. dcm2xml utility

$ dcm2xml -I  /tmp/src.dcm
<?xml version="1.0" encoding="UTF-8"?><NativeDicomModel xml:space="preserve">
    <DicomAttribute keyword="FileMetaInformationVersion" tag="00020001" vr="OB">
        <InlineBinary>AAE=</InlineBinary>
    </DicomAttribute>
    <DicomAttribute keyword="MediaStorageSOPClassUID" tag="00020002" vr="UI">
        <Value number="1">1.2.840.10008.5.1.4.1.1.6.1</Value>
    </DicomAttribute>
    <DicomAttribute keyword="MediaStorageSOPInstanceUID" tag="00020003" vr="UI">
        <Value number="1">1.2.840.1136190195280574824680000700.3.0.1.19970424140438</Value>
    </DicomAttribute>
    <DicomAttribute keyword="TransferSyntaxUID" tag="00020010" vr="UI">
        <Value number="1">1.2.840.10008.1.2.2</Value>
    </DicomAttribute>
...
$ dcm2xml /tmp/src.dcm | xml2dcm -o /tmp/out.dcm -x -
$ dcmdump -w110 /tmp/out.dcm 
0: [0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
132: (0002,0000) UL #4 [204] FileMetaInformationGroupLength
144: (0002,0001) OB #2 [0\1] FileMetaInformationVersion
158: (0002,0002) UI #28 [1.2.840.10008.5.1.4.1.1.6.1] MediaStorageSOPClassUID
194: (0002,0003) UI #58 [1.2.840.1136190195280574824680000700.3.0.1.19970424140438] MediaStorageSOPInstanceUID
260: (0002,0010) UI #20 [1.2.840.10008.1.2.2] TransferSyntaxUID
...

@quintonn
Copy link
Author

it is in the XML:

<DicomAttribute keyword="TransferSyntaxUID" tag="00020010" vr="UI"> <Value number="1">1.2.840.10008.1.2.2</Value> </DicomAttribute>

@quintonn
Copy link
Author

I don't see any code in the ContentHandlerAdapter that reads the transfer syntax from the XML though

@gunterze gunterze reopened this Dec 11, 2020
@gunterze
Copy link
Member

Could reproduce the issue.

@quintonn
Copy link
Author

Is the expectation that I submit a PR for this bug or will someone who is more familiar with the library be able to take a look ??

We have a client that is blocked by this bug and in the short term I might have to provide a custom ContentHandlerAdapter in my code using the work-around that I've found.
But I don't think my work around properly solves the problem because it doesn't set the bigEndian value of the resulting Attributes value after parsing the XML, it only encodes the bulkData value so it displays in the image viewer.

I can attempt to provide a proper solution and PR if someone could give me some pointers though.

@gunterze
Copy link
Member

gunterze commented Dec 12, 2020

Fixed by permitting to pass null as Attributes attrs to

    public ContentHandlerAdapter(Attributes attrs) {
        this(attrs, false);
    }

    public ContentHandlerAdapter(Attributes attrs, boolean lenient) {
...

and - if so - defer creation of Attributes instance after parsing File Meta Information, so it's bigEndian value can reflect the value of previous parsed

<DicomAttribute keyword="TransferSyntaxUID" tag="00020010" vr="UI">
 <Value number="1">1.2.840.10008.1.2.2</Value>
</DicomAttribute>

Add

    public Attributes getDataset() {
        return items.getFirst();
    }

to access created Attributes instance by the client after parsing.

@gunterze gunterze self-assigned this Dec 12, 2020
@gunterze gunterze added the bug label Dec 12, 2020
@gunterze gunterze added this to the 5.23.0 milestone Dec 12, 2020
@quintonn
Copy link
Author

thanks

@quintonn
Copy link
Author

Hi @gunterze,

I have loaded the 5.23.0 release and trying my code but it looks like I'm still running into the same issue.

I am initializing my ContentHandlerAdapter like this now:

final ContentHandlerAdapter contentHandler = new ContentHandlerAdapter(null, false);

I am getting this error:

java.lang.IllegalArgumentException: Endian of Item must match Endian of parent Data Set
	at org.dcm4che3.data.Attributes.setParent(Attributes.java:260)
	at org.dcm4che3.data.Sequence.add(Sequence.java:111)
	at org.dcm4che3.io.ContentHandlerAdapter.startItem(ContentHandlerAdapter.java:197)
	at org.dcm4che3.io.ContentHandlerAdapter.startElement(ContentHandlerAdapter.java:116)

@gunterze gunterze reopened this Dec 15, 2020
@gunterze
Copy link
Member

gunterze commented Dec 15, 2020

Should be fixed... (replaced binary distribution with fixed version and moved tag 5.23.0 in git)

@quintonn
Copy link
Author

I am testing with the 5.23.0 jar files right now. And it's not working, I am getting the error I pasted above.

I am trying to look at this library's source code to understand if it's something I'm doing wrong.
But from the error I understand that when I have an Attribute object, any child Attribute I add to it, must have the same bigEndian value.

But I can see if I pass in NULL into the ContentHandlerAdapter constructor, the first time a new Attribute object is added to the items variable, is probably inside startItem. But that will add an Attribute with bigEndian set to false, and it's a final field, so it's not going to change again.

So according to my understanding, the ContentHandlerAdapter needs to know the bigEndian value before it starts adding Attribute values to the items field.
And the only way to find that out, is by checking the following element in the XML:

<DicomAttribute keyword="TransferSyntaxUID" tag="00020010" vr="UI">

Also, when it is time to parse the bulkData element, it uses the bigEndian value of the previous element.
So according to my logic, this will always be false if I passed in null in the constructor.

Or am I missing something?

@quintonn
Copy link
Author

Oh, i see you made a change. Ok, i'll download it from sourceforge and try again...

@quintonn
Copy link
Author

ok that seems to work now.
Thank you very much.
Really appreciate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants