New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: DCMTK & GDCM CanRead fails very slowly for non-DICOM files. #389
Conversation
f3a5a3d
to
c556a59
Compare
The following test was done using the head of gdcm tree. Download this image: GIPL image that GDCM takes very long time to not read NOTE Given a non-dicom image (png), gdcmdump fails very quickly
NOTE Given a different non-dicom image (gipl), gdcmdump fails very slowly
I think it would be best to improve this in GDCM proper. I hope we can find an easy way to fail more quickly for identifying non-dicom images. Can you give me some advice? |
c556a59
to
ed557ee
Compare
@malaterre I made a more extensive change to test for "dicom ness" more extensively. I'd love your review. Do you think there is a place where this can be pushed up to GDCM proper? |
Thanks a lot for fixing this so quickly! I've tested this and it made the GDCM::CanReadFile now always return quickly. However, now it turns out DCMTKFileReader::CanReadFile takes a long time, too. Probably due to the same issue. Would you be able to address that, too? Thank you! |
@lassoan The DCMTK case is different, and will require some more review. |
Thank you for working on this! A user reported that it worked well in Slicer-4.8.1 - with about a 1-2 year old ITK. |
@thewtex Thanks for the review. I will try to revisit this today. My plan is to make "GDCMImageIO::readNoPreambleDicom" a static function that is added to both GDCM and DCMTK as a preliminary check before each toolkit internal tools are used to try to read the images. This new static function for ITK must pass an initial "smells like a dicom data set" heuristics before the actual (dcmtk or gdcm) libraries make their determinations. Hans |
NOTE: DCMTK and GDCM demonstrate the same behavior with certain non-DICOM files that have byte patterns similar to DICOM Some images take extremely long time to load because DCMTKImageIO::CanReadFile & GDCMImageIO::CanReadFile takes several minutes to fail. This is due to DCMTK's and GDCM's default behavior of trying to read non-compliant dicom files that do not have the required DICM header. Add more extensive testing about the structure of the file to determine if it looks like a dicom file. Previous testing only looked to see if the files without preables had values of 2 or 8 as the first byles of the file, but that resulted in many false positives. This implementation looks at all the SOP Instances that start with 2 or 8 to ensure that the proper dicom structure is found. This resolves #388.
ed557ee
to
ecbdfbe
Compare
This is great! Thank you @hjmjohnson for fixing this so quickly. |
@hjmjohnson It looks like this is causing a false positive on https://open.cdash.org/testDetails.php?test=723709171&build=5713438 reading the file: Testing/Data/Input/dicom-sc_cs-1.dcm could you please take a look? |
It will need to be tomorrow before I can look. Today is crammed full already. |
Some images take an extremely long time to load because of GDCMImageIO::CanReadFile
takes several minutes to fail. This is due to GDCM's default behavior of trying
to read non-compliant Dicom files that do not have the required DICM header.
This resolves #388.
NOTE: This does break backward compatibility for default reading of degenerate Dicom files through the factory mechanism. One can still read degenerate files by explicitly requesting the GDCM io modules.