PERF: DCMTK & GDCM CanRead fails very slowly for non-DICOM files. #389

Merged
merged 1 commit into from Jan 14, 2019

hjmjohnson
Member

Some images take an extremely long time to load because GDCMImageIO::CanReadFile
takes several minutes to fail. This is due to GDCM's default behavior of trying
to read non-compliant DICOM files that do not have the required DICM header.

This resolves #388.

NOTE: This does break backward compatibility for default reading of degenerate DICOM files through the factory mechanism. One can still read degenerate files by explicitly requesting the GDCM IO module.

@hjmjohnson self-assigned this on Jan 9, 2019
@hjmjohnson added labels on Jan 9, 2019: type:Bug (inconsistencies or issues which will cause an incorrect result under some or all circumstances), type:Performance (improvement in terms of compilation or execution time), area:IO (issues affecting the IO module)
@hjmjohnson
Member Author

@malaterre

The following test was done using the head of gdcm tree.

Download this image: a GIPL image that GDCM takes a very long time to fail to read

NOTE: Given a non-DICOM image (PNG), gdcmdump fails very quickly:

time ./gdcmdump ~/Dashboard/src/ITK-bld/ExternalData/Testing/Data/Baseline/Algorithms/VoronoiPartioningImageFilterTest1.2.png
Failed to read: /Users/johnsonhj/Dashboard/src/ITK-bld/ExternalData/Testing/Data/Baseline/Algorithms/VoronoiPartioningImageFilterTest1.2.png

real	0m0.076s

NOTE: Given a different non-DICOM image (GIPL), gdcmdump fails very slowly:

time ./gdcmdump ~/Downloads/15-1-1-brain.gipl 
Failed to read: /Users/johnsonhj/Downloads/15-1-1-brain.gipl

real	0m38.684s

I think it would be best to improve this in GDCM proper. I hope we can find an easy way to fail more quickly when identifying non-DICOM images.

Can you give me some advice?

@hjmjohnson
Member Author

@malaterre I made a more extensive change that tests for "DICOM-ness" more thoroughly.

I'd love your review. Do you think there is a place where this can be pushed up to GDCM proper?

@lassoan
Contributor

lassoan commented Jan 10, 2019

Thanks a lot for fixing this so quickly! I've tested this, and GDCMImageIO::CanReadFile now always returns quickly.

However, now it turns out DCMTKFileReader::CanReadFile takes a long time, too. Probably due to the same issue. Would you be able to address that, too? Thank you!

@hjmjohnson
Member Author

@lassoan The DCMTK case is different, and will require some more review.

@lassoan
Contributor

lassoan commented Jan 11, 2019

Thank you for working on this!

A user reported that it worked well in Slicer-4.8.1, which uses an ITK that is about 1-2 years old.

@hjmjohnson changed the title from "PERF: GDCM CanRead function fail quickly for non-compliant DICOM files." to "WIP: GDCM CanRead function fail quickly for non-compliant DICOM files." on Jan 13, 2019
@hjmjohnson
Member Author

@thewtex Thanks for the review. I will try to revisit this today.

My plan is to make "GDCMImageIO::readNoPreambleDicom" a static function that is added to both GDCM and DCMTK as a preliminary check before each toolkit's internal tools are used to try to read the images.

A file must pass this new static ITK function's initial "smells like a DICOM data set" heuristic before the actual (DCMTK or GDCM) libraries make their determinations.

Hans

NOTE: DCMTK and GDCM demonstrate the same behavior with certain
      non-DICOM files that have byte patterns similar to DICOM

Some images take an extremely long time to load because
DCMTKImageIO::CanReadFile & GDCMImageIO::CanReadFile take several
minutes to fail.  This is due to DCMTK's and GDCM's default behavior of
trying to read non-compliant DICOM files that do not have the required
DICM header.

Add more extensive testing of the structure of the file to determine
if it looks like a DICOM file.  Previous testing only looked to see if
files without preambles had values of 2 or 8 as the first bytes of the file,
but that resulted in many false positives.

This implementation looks at all the SOP Instances that start with 2 or 8
to ensure that the proper DICOM structure is found.

This resolves #388.
@hjmjohnson changed the title from "WIP: GDCM CanRead function fail quickly for non-compliant DICOM files." to "PERF: DCMTK & GDCM CanRead fails very slowly for non-DICOM files." on Jan 14, 2019
@hjmjohnson
Member Author

@thewtex @lassoan I've added the same fix to both GDCM and DCMTK CanRead functions.

If the dashboard comes back clean, this should be good to go.

Hans

@hjmjohnson hjmjohnson merged commit 19fa58f into master Jan 14, 2019
@lassoan
Contributor

lassoan commented Jan 15, 2019

This is great! Thank you @hjmjohnson for fixing this so quickly.

@thewtex thewtex deleted the gdcm-can-read-fail-faster branch January 16, 2019 04:36
@thewtex
Member

thewtex commented Jan 18, 2019

@hjmjohnson It looks like this is causing a false positive on

https://open.cdash.org/testDetails.php?test=723709171&build=5713438

reading the file:

Testing/Data/Input/dicom-sc_cs-1.dcm

Could you please take a look?

@hjmjohnson
Copy link
Member Author

It will need to be tomorrow before I can look. Today is crammed full already.

Linked issue that this pull request may close: "Extermely slow loading of image files due to GDCMImageIO::CanReadFile"