Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: MeanImageFilter faster, using BufferedImageNeighborhoodPixelAccessPolicy #1365

Merged
merged 2 commits into from Oct 31, 2019
Merged

PERF: MeanImageFilter faster, using BufferedImageNeighborhoodPixelAccessPolicy #1365

merged 2 commits into from Oct 31, 2019

Conversation

N-Dekker
Copy link
Contributor

Improved the performance MeanImageFilter, using ShapedImageNeighborhoodRange with a newly added pixel access policy, BufferedImageNeighborhoodPixelAccessPolicy. Tested performance by:

#include <itkMeanImageFilter.h>
#include <itkImage.h>
#include <chrono>
#include <iostream>

int main()
{
  using ImageType = itk::Image<char, 3>;
  const auto image = ImageType::New();
  const auto size = ImageType::SizeType{ 1024, 1024, 16 };
  image->SetRegions(size);
  image->Allocate(true);

  const auto filter = itk::MeanImageFilter<ImageType, ImageType>::New();

  for (int i{}; i < 8; ++i)
  {
    filter->SetInput(nullptr);
    filter->SetInput(image);

    using namespace std::chrono;
    const auto timePointBeforeUpdate = high_resolution_clock::now();

    filter->Update();

    const auto timePointAfterUpdate = high_resolution_clock::now();

    const auto durationSeconds =
      duration_cast<duration<double>>(timePointAfterUpdate - timePointBeforeUpdate);
    std::cout
      << " Duration: " << durationSeconds.count() << " second(s)"
      << std::endl;
  }
}

Copy link
Member

@dzenanz dzenanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@dzenanz
Copy link
Member

dzenanz commented Oct 28, 2019

Does median filter have similar implementation pattern? Can this approach be applied to it?

@blowekamp
Copy link
Member

blowekamp commented Oct 28, 2019

Does median filter have similar implementation pattern? Can this approach be applied to it?

The median filter should be exploiting coherence of a sorted neighborhood, so the method should be quite difference. Come to think of it the mean could be exploit a moving average as well to reduce repeated adjacent computation.

EDIT: And there are there trade off with exploiting the separability of the mean too.

@blowekamp
Copy link
Member

As for the PR, are there any asserts of validations of the buffered region assumptions? or will be just get "undefined" behavior when the assumptions are not met?

@N-Dekker
Copy link
Contributor Author

Thank you both for your comments!

@dzenanz

Does median filter have similar implementation pattern? Can this approach be applied to it?

Yes it does 😄 I just did MeanImageFilter with this PR, because it's slightly easier than MedianImageFilter. I do think that if this approach also appears applicable on other places, some extra utility could be helpful to limit the amount of copy-and-paste boilerplate code.

@blowekamp

As for the PR, are there any asserts of validations of the buffered region assumptions? or will be just get "undefined" behavior when the assumptions are not met?

An attempt to access a location outside the buffered region, using the proposed new pixel access policy, will produce undefined behavior, indeed. The calculation of the index value into the image buffer does not do bounds checking:

That's really a fundamental property of this new pixel access policy. It should be as fast as possible. Within MeanImageFilter, it is essential that it is only being used with the first region produces by ImageBoundaryFacesCalculator, which is the center (non-boundary) region.

Some more profiling results from VS2017 Release:

image size duration before PR duration after PR
256x256x256 0.26 sec 0.15 sec
16x1024x1024 0.49 sec 0.16 sec
1024x1024x16 1.4 (!!!) sec 0.2 sec

Could you please check if you can observe similar performance improvements from this PR?

@blowekamp
Copy link
Member

I hate always being the nay sayer. But the checks for these assumptions (at least in debug mode) is very important. I have spent a lot of time debugging filters and pipeline issues. With the removal of these safety checks it will make development and maintenance even more difficult and time consuming than it is.

@blowekamp
Copy link
Member

blowekamp commented Oct 29, 2019

I would suggest looking at the ConvolutionImageFilter next. There is easily room for 10x-100x performance improvement there.

There are faster implementations of the Mean and Median filters in other implementations in ITK.

@dzenanz
Copy link
Member

dzenanz commented Oct 29, 2019

MeanImageFilter being slightly easier than MedianImageFilter was my guess - I am glad to hear I was right 😄

But the checks for these assumptions (at least in debug mode)

itkAssertInDebugAndIgnoreInReleaseMacro or a regular C++ assert should work well. Of course, it will make debug mode somewhat slower, but it will be much easier to spot an error when it is introduced.

With included test, before: 0.3 sec, with this PR: 0.15 sec. 2x speedup, nice!

@N-Dekker
Copy link
Contributor Author

Thanks for your comments @blowekamp

There are faster implementations of the Mean and Median filters in other implementations in ITK.

Which faster implementation of the Mean filter do you mean?

Is the current (v5.1b01) itk::MeanImageFilter still being used, or is it outdated? It looks like my PR would make it twice as fast. Do you think that's worth the effort?

@blowekamp
Copy link
Member

Which faster implementation of the Mean filter do you mean?

The BoxMeanImageFilter performs better for higher dimension images and larger radius as it uses a moving accumulation. This implementation likely performs better for 2-D and smaller radius.

Is the current (v5.1b01) itk::MeanImageFilter still being used, or is it outdated? It looks like my PR would make it twice as fast. Do you think that's worth the effort?

ITK is not about THE implementation for an algorithm. Many algorithms have multiple implementations which perform better in different situations. Others are frequent exemplar of how to do common operations in a generalized templated way. This filter I consider an exemplar filter for doing neighborhood operations, as well as giving good performance for small radius. It is most certainly worth the effort and is a great contribution.

Also do you know if these iterators work well with VectorImages? This filter did not work with them before.

@N-Dekker
Copy link
Contributor Author

@dzenanz

itkAssertInDebugAndIgnoreInReleaseMacro or a regular C++ assert should work well.

Just added an assert, please check:

assert((pixelIndexValue >= 0) && (static_cast<SizeValueType>(pixelIndexValue) < imageSize[i]));

Copy link
Member

@dzenanz dzenanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but CI has qualms with it.

@N-Dekker
Copy link
Contributor Author

Looks good, but CI has qualms with it.

Thanks @dzenanz I see at https://open.cdash.org/viewBuildError.php?type=1&buildid=6181856

itkBufferedImageNeighborhoodPixelAccessPolicy.h:76:50: warning: unused parameter 'imageSize' [-Wunused-parameter]

Even though I did itkNotUsed(imageSize)! I guess I have not used itkNotUsed correctly 😺 Instead, I'll force-push an amend to just do a silly void-cast, (void)imageSize;, to silence the warning... OK?

@dzenanz
Copy link
Member

dzenanz commented Oct 30, 2019

What happens if you use itkAssertInDebugAndIgnoreInReleaseMacro instead of assert? If that doesn't help, you could apply this trick:

#if !defined(ITK_LEGACY_REMOVE)
region
#endif
,

@N-Dekker
Copy link
Contributor Author

@dzenanz

What happens if you use itkAssertInDebugAndIgnoreInReleaseMacro instead of assert?

Unfortunately, same warnings still there, when using itkAssertInDebugAndIgnoreInReleaseMacro.

If that doesn't help, you could apply this trick...

OK! (But then of course: #if !defined(NDEBUG)...!)

Added `BufferedImageNeighborhoodPixelAccessPolicy`, for faster pixel
access during neighborhood iteration.

Unlike `ConstantBoundaryImageNeighborhoodPixelAccessPolicy` and
`ZeroFluxNeumannImageNeighborhoodPixelAccessPolicy`, this new pixel
access policy for `ShapedImageNeighborhoodRange` assumes that the pixel
index of the selected neighbor is always within the buffered region of
the image.
Improved the performance of `MeanImageFilter::Update`
by replacing its local `itk::ConstNeighborhoodIterator` variable by
an `Experimental::ShapedImageNeighborhoodRange` variable, using the new
`BufferedImageNeighborhoodPixelAccessPolicy`.

~40% reduction of the run-time duration was observed on a
`filter->Update()` call, using VS2017, processing a 256x256x256 image.
For a 1024x1024x16 image, the performance improvement appeared even higher.
@N-Dekker
Copy link
Contributor Author

@blowekamp

The BoxMeanImageFilter performs better for higher dimension images and larger radius as it uses a moving accumulation. This implementation likely performs better for 2-D and smaller radius.

Indeed: I just observed that with the "default" radius value (1), MeanImageFilter (with my proposed PR) appears significantly faster than BoxMeanImageFilter, even for 3-D... For a larger radius, I get mixed results: sometimes one is faster, sometimes the other.

The output of BoxMeanImageFilter is also slightly different than MeanImageFilter. It appears that BoxMeanImageFilter produces different values for border pixels. Is that a known issue?

Also do you know if these iterators work well with VectorImages? This filter did not work with them before.

The iterator ranges from Experimental should work well with VectorImage. Looks like the incompatibility is caused by sum = NumericTraits<InputRealType>::ZeroValue(), in the filter itself: https://github.com/InsightSoftwareConsortium/ITK/blob/v5.1b01/Modules/Filtering/Smoothing/include/itkMeanImageFilter.hxx#L73

With this PR, I did not want to change that logic, because this PR is intended to just improve the performance of existing use cases. If you want to add VectorImage support to this filter, can you please make it a separate PR? Of course, I would like it if my PR (#1365) could be handled first 😄

@N-Dekker N-Dekker changed the title WIP: MeanImageFilter much faster, using BufferedImageNeighborhoodPixelAccessPolicy PERF: MeanImageFilter faster, using BufferedImageNeighborhoodPixelAccessPolicy Oct 30, 2019
@N-Dekker N-Dekker marked this pull request as ready for review October 30, 2019 20:01
@N-Dekker
Copy link
Contributor Author

Succeeded on ITK.Linux, ITK.Windows, and ITK.macOS 🎉

The ITK.Linux.Python failures at the dashboard seem unrelated to this PR:

AttributeError: module 'ITKMathematicalMorphology' has no attribute 'swig'
itkTestDriver: Process exited with return value: 1

https://open.cdash.org/testDetails.php?test=811772611&build=6183942

@blowekamp blowekamp self-requested a review October 31, 2019 12:58
@dzenanz dzenanz merged commit b657c12 into InsightSoftwareConsortium:master Oct 31, 2019
for (ImageDimensionType i = 0; i < ImageDimension; ++i)
{
const auto pixelIndexValue = pixelIndex[i];
assert((pixelIndexValue >= 0) && (static_cast<SizeValueType>(pixelIndexValue) < imageSize[i]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dzenanz
Copy link
Member

dzenanz commented Nov 6, 2019

The assertion also gets triggered here, by a test you wrote @N-Dekker. So much about relying on people being careful, or however you phrased it Niels 😄

@N-Dekker
Copy link
Contributor Author

N-Dekker commented Nov 7, 2019

The assertion also gets triggered here, by a test you wrote @N-Dekker.

Thanks @dzenanz. I think it's related to issue #1397 (Valgrind Defects related to the MedianImageFilter), which should be fixed by pull request #1398 (BUG: Fix IndexRange for itk::Size that has one or more zero values)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants