Faster Linear Transforms #1591

JimBobSquarePants · 2021-04-06T03:25:50Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

I haven't been happy with the existing linear transform code since I originally wrote it so thought I'd have another crack.

Benchmarks are very good. Large speedup on all targets with a 2x speedup on .NET Core 3.1+. However they don't tell the whole story. Originally two 2D buffers were rented from the pool of DestinationHeight x MaxDiameter to store kernel weight buffers but, since the kernel weights must be calculated based on the exact transformed subpixel point, it was pointless to do so. Those buffers are now gone.

In addition I discovered a rounding issue where the incorrect scaled filter radius was being calculated which lead to rogue pixels containing RGB values but 0 alpha to be generated outside transform edges. I've updated two references to reflect the improved accuracy.

I consider readability much improved now also. It's much easier to understand how the transform operation actually works.

Ignore individual commits during review, I generated a lot of noise during my experimentation (basically properly learning how the process actually worked 😇) so I'll be squashing to a single commit on merge.

Benchmarks

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
  Job-WMQSXI : .NET Framework 4.8 (4.8.4341.0), X64 RyuJIT
  Job-ZPQJBS : .NET Core 2.1.26 (CoreCLR 4.6.29812.02, CoreFX 4.6.29812.01), X64 RyuJIT
  Job-LHSZUP : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT

ROTATE

BEFORE

Method	Job	Runtime	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
DoRotate	Job-OBWWBY	.NET 4.7.2	44.56 ms	0.464 ms	0.411 ms	-	-	-	11.33 KB
DoRotate	Job-MMNLYB	.NET Core 2.1	19.18 ms	0.320 ms	0.355 ms	-	-	-	5.91 KB
DoRotate	Job-UKOETF	.NET Core 3.1	24.90 ms	0.328 ms	0.274 ms	-	-	-	7.23 KB

AFTER

Method	Job	Runtime	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
DoRotate	Job-BAUEPW	.NET 4.7.2	30.73 ms	0.397 ms	0.331 ms	-	-	-	6.75 KB
DoRotate	Job-SNWMCN	.NET Core 2.1	16.31 ms	0.317 ms	0.352 ms	-	-	-	5.25 KB
DoRotate	Job-MRMBJZ	.NET Core 3.1	12.21 ms	0.239 ms	0.245 ms	-	-	-	6.61 KB

SKEW

BEFORE

Method	Job	Runtime	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
DoSkew	Job-GBTORI	.NET 4.7.2	37.08 ms	0.463 ms	0.433 ms	-	-	-	10.29 KB
DoSkew	Job-RAEJFO	.NET Core 2.1	17.27 ms	0.187 ms	0.175 ms	-	-	-	5.85 KB
DoSkew	Job-LFXFDP	.NET Core 3.1	19.34 ms	0.149 ms	0.140 ms	-	-	-	7.17 KB

AFTER

Method	Job	Runtime	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
DoSkew	Job-YEGFRQ	.NET 4.7.2	23.563 ms	0.0731 ms	0.0570 ms	-	-	-	6.75 KB
DoSkew	Job-HZHOGR	.NET Core 2.1	13.700 ms	0.2727 ms	0.5122 ms	-	-	-	5.25 KB
DoSkew	Job-LTEUKY	.NET Core 3.1	9.971 ms	0.0254 ms	0.0225 ms	-	-	-	6.61 KB

JimBobSquarePants · 2021-04-06T03:29:46Z

src/ImageSharp/Processing/Processors/Transforms/Linear/LinearTransformUtils.cs

-        internal static int GetSamplingRadius<TResampler>(in TResampler sampler, int sourceSize, int destinationSize)
-            where TResampler : struct, IResampler
-        {
-            double scale = sourceSize / destinationSize;


Spot the rounding error 😄

JimBobSquarePants · 2021-04-06T12:56:33Z

@antonfirsov I'm gonna need some help here. MacOS .NET 3.1 was showing a 4% error on projective transforms but I couldn't pull down the test output because it was incomplete. (DebugSave disabled on CI). I enabled saving and now I'm getting a bunch of MemoyAllocator errors. Any idea?

antonfirsov · 2021-04-06T14:00:17Z

DebugSave uses image.GetConfiguration().MemoryAllocator which is artificially limited to very small buffers to test discontiguous decoding:

ImageSharp/tests/ImageSharp.Tests/Formats/Png/PngDecoderTests.cs

Lines 414 to 417 in 5ab768c

    
           provider.LimitAllocatorBufferCapacity().InPixelsSqrt(100); 
        
           using Image<Rgba32> image = provider.GetImage(PngDecoder); 
        
           image.DebugSave(provider, testOutputDetails: nonContiguousBuffersStr);

I can't think of any better solution than changing the following line to use Configuration.Default instead of the image's configuration (with a comment on why are we doing so). I hope it won't lead to unwanted sideffects:

ImageSharp/tests/ImageSharp.Tests/TestUtilities/ImagingTestCaseUtility.cs

Line 181 in a1784a6

image.Save(stream, encoder);

JimBobSquarePants · 2021-04-06T21:33:07Z

@antonfirsov Thanks for you help! I've just created a clone of the encoder and use that when registering for tests.

JimBobSquarePants · 2021-04-06T23:45:59Z

OK... tried everything I can think of here. Rewrote all the sampling to use (since reverted) to use double precision and it still fails for NET Core 3.1 on Mac OS.

antonfirsov

The new transform code looks good and easy to follow, great job!

Can you post some Beyond Compare screenshots for the most problematic, and the less problematic failures?

For troubleshooting I would log and compare as many internals as possible. Since it may take a lot of time, we can just disable the Mac tests with [ActiveIssue] instead of trying to figure out the problem as part of the PR.

antonfirsov · 2021-04-07T09:58:37Z

src/ImageSharp/Processing/Processors/Transforms/Linear/AffineTransformProcessor{TPixel}.cs

+                        int top = LinearTransformUtility.GetRangeStart(yRadius, pY, maxY);
+                        int bottom = LinearTransformUtility.GetRangeEnd(yRadius, pY, maxY);
+                        int left = LinearTransformUtility.GetRangeStart(xRadius, pX, maxX);
+                        int right = LinearTransformUtility.GetRangeEnd(xRadius, pX, maxX);


Idea 1:
log these 4 values during the whole operation, and look for differences between Mac and Windows.

antonfirsov · 2021-04-07T10:09:16Z

src/ImageSharp/Processing/Processors/Transforms/Linear/AffineTransformProcessor{TPixel}.cs

+                                float xWeight = sampler.GetValue(xK - pX);
+
+                                Vector4 current = sourceBuffer.GetElementUnsafe(xK, yK).ToScaledVector4();
+                                Numerics.Premultiply(ref current);


Idea 2:
Also log the premultiplied current value, compare the log files, look for highest difference.

antonfirsov · 2021-04-07T10:10:11Z

src/ImageSharp/Processing/Processors/Transforms/Linear/AffineTransformProcessor{TPixel}.cs

-                        in this.sampler,
-                        point,
-                        sourceBuffer,
+                    Numerics.UnPremultiply(span);


Have we used the bulk UnPremultiply in the old code?

We didn't. Have just ran a test commit using the non-bulk variant and the major issues disappeared leaving only 2 minor (max 0.03% diff) remaining.

Below is the output of the worst issue. The last 3 pixels of each row appear to be affected which suggests to me that something is going wrong here. My guess is that it has something to do with unpremultiplying Vector4.Zero

ImageSharp/src/ImageSharp/Common/Helpers/Numerics.cs

Lines 556 to 576 in bb72769

if (Modulo2(vectors.Length) != 0)

{

// Vector4 fits neatly in pairs. Any overlap has to be equal to 1.

UnPremultiply(ref MemoryMarshal.GetReference(vectors.Slice(vectors.Length - 1)));

}

}

else

#endif

{

ref Vector4 vectorsStart = ref MemoryMarshal.GetReference(vectors);

ref Vector4 vectorsEnd = ref Unsafe.Add(ref vectorsStart, vectors.Length);

while (Unsafe.IsAddressLessThan(ref vectorsStart, ref vectorsEnd))

{

UnPremultiply(ref vectorsStart);

vectorsStart = ref Unsafe.Add(ref vectorsStart, 1);

}

}

}

The question is, what do we do? The issue is clearly fixed for .NET 5 but .NET Core 3.1 is LTS.

I think we should keep the bulk code for the rest of the platforms, because it likely takes an important part in the perf improvements we see in your benchmarks, and fall back to scalar stuff on Mac. I would add a check for the OS platform at this line with an explanatory comment:

ImageSharp/src/ImageSharp/Common/Helpers/Numerics.cs

Line 543 in bb72769

if (Avx2.IsSupported && vectors.Length >= 2)

if (Avx2.IsSupported && !RuntimeInformation.IsOSPlatform(OSPlatform.OSX) && vectors.Length >= 2)

In no case I would complicate this with additional logic to detect if we are running on .NET 5 (Mac is not that important).

JimBobSquarePants · 2021-04-09T01:08:13Z

You better re-review this @antonfirsov there's been some awful hackery.

codecov · 2021-04-09T01:20:08Z

Codecov Report

Merging #1591 (ac0f01f) into master (97dde7f) will decrease coverage by 0.01%.
The diff coverage is 94.44%.

@@            Coverage Diff             @@
##           master    #1591      +/-   ##
==========================================
- Coverage   83.69%   83.68%   -0.02%     
==========================================
  Files         747      748       +1     
  Lines       33055    33032      -23     
  Branches     3692     3698       +6     
==========================================
- Hits        27665    27642      -23     
+ Misses       4676     4675       -1     
- Partials      714      715       +1

Flag	Coverage Δ
unittests	`83.68% <94.44%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...ssing/Processors/Transforms/Resize/ResizeKernel.cs	`100.00% <ø> (ø)`
...nsforms/Linear/AffineTransformProcessor{TPixel}.cs	`97.02% <93.87%> (-2.98%)`	⬇️
...rms/Linear/ProjectiveTransformProcessor{TPixel}.cs	`94.05% <93.87%> (-2.92%)`	⬇️
...rc/ImageSharp/Common/Helpers/RuntimeEnvironment.cs	`100.00% <100.00%> (ø)`
...essors/Transforms/Linear/LinearTransformUtility.cs	`100.00% <100.00%> (ø)`
...rp/Metadata/Profiles/Exif/Values/ExifSignedLong.cs	`42.85% <0.00%> (+14.28%)`	⬆️
...ImageSharp/Formats/Png/Chunks/PhysicalChunkData.cs	`100.00% <0.00%> (+15.15%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 97dde7f...ac0f01f. Read the comment docs.

antonfirsov

I'm wondering why did you decide to duplicate? Wasn't there a mistake in 35065c7 adding the new conditions checks to Premultiply instead of Unpremultiply?

If there is a perf concern regarding the long expression, I recommend to cache !RuntimeEnvironment.IsOSPlatform(OSPlatform.OSX) && RuntimeEnvironment.IsNetCore into a static variable.

antonfirsov · 2021-04-09T09:06:47Z

src/ImageSharp/Common/Helpers/RuntimeEnvironment.cs

+        /// <summary>
+        /// Gets the name of the .NET installation on which an app is running.
+        /// </summary>
+        public static string FrameworkDescription => RuntimeInformation.FrameworkDescription;


Does this have to be exposed? I would make it private.

I did make a mistake yeah but corrected it in a later commit.

What I've tried so far:

Filtered out the runtime instrinsics part of the UnPremultiply method to aviod Mac on Core 3.1

Rewrote the scalar overhang portion of the method to use a simple for loop over the span.

Neither worked. The only thing I could get working was duplicating the code which is simply awful. I couldn't replicate the issue locally either to effectively debug.

Regarding the method, no it doesn't need to be exposed.

src/ImageSharp/Common/Helpers/RuntimeEnvironment.cs

antonfirsov

If it doesn't work without the duplication, let's keep it. LGTM with one suggestion.

src/ImageSharp/Common/Helpers/RuntimeEnvironment.cs

Co-authored-by: Anton Firszov <antonfir@gmail.com>

JimBobSquarePants added 11 commits March 31, 2021 16:55

Attempt to use same weight generation algorithm as resize.

fd85fa2

tests pass

9695d58

Identical output

f93cc72

Update LinearTransformKernelFactory{TResampler}.cs

405d144

Use new low allocation iterator

54a0dda

Migrate projective transforms.

b38623a

Optimizations

643bf29

Smaller kernel

38837e5

Fix sampling accuracy

7ec2de4

Finalize and update refs

4a4b57b

Revert unnecessary changes

b39412b

JimBobSquarePants added area:performance area:transforms labels Apr 6, 2021

JimBobSquarePants added this to the 1.1.0 milestone Apr 6, 2021

JimBobSquarePants requested review from Sergio0694 and antonfirsov April 6, 2021 03:25

Remove enumerator

96a8566

JimBobSquarePants commented Apr 6, 2021

View reviewed changes

Actually save output for debugging.

ab9fb07

JimBobSquarePants added 2 commits April 6, 2021 22:13

Use custom test png encoder for reduced memory environments

87871ce

Convolution should use scaled vectors

7df3f13

JimBobSquarePants added 4 commits April 6, 2021 22:35

Update TestEnvironmentTests.cs

5b13e95

Try using doubles

8f4fdfe

Moar double precision

9086d25

Fix radius calculation

b54dff0

antonfirsov approved these changes Apr 7, 2021

View reviewed changes

JimBobSquarePants added 11 commits April 7, 2021 23:03

Test if issue is SIMD related.

01293bb

Detect runtime to determine pipeline.

35065c7

Fix stack overflow

28973d9

fix condition

41c7ec3

Try simplified scalar run

f033310

Merge branch 'master' into js/linear-transforms-experiments

326cf18

Simplify unpremultiply scalar

29889e4

Update Numerics.cs

6d78536

Fix runtime environment

7107544

Update ImageSharp.csproj

696408f

Duplicate the caller with scalar versions

91ebd94

JimBobSquarePants requested a review from antonfirsov April 9, 2021 01:07

antonfirsov reviewed Apr 9, 2021

View reviewed changes

Update method name, exclude from coverage.

dc1c81e

antonfirsov approved these changes Apr 14, 2021

View reviewed changes

src/ImageSharp/Common/Helpers/RuntimeEnvironment.cs Show resolved Hide resolved

JimBobSquarePants and others added 2 commits April 14, 2021 14:49

Don't save output during coverage tests for perf.

67101b0

Update src/ImageSharp/Common/Helpers/RuntimeEnvironment.cs

ac0f01f

Co-authored-by: Anton Firszov <antonfir@gmail.com>

JimBobSquarePants merged commit dc0982f into master Apr 14, 2021

JimBobSquarePants deleted the js/linear-transforms-experiments branch April 14, 2021 15:09


	if (Modulo2(vectors.Length) != 0)
	{
	// Vector4 fits neatly in pairs. Any overlap has to be equal to 1.
	UnPremultiply(ref MemoryMarshal.GetReference(vectors.Slice(vectors.Length - 1)));
	}
	}
	else
	#endif
	{
	ref Vector4 vectorsStart = ref MemoryMarshal.GetReference(vectors);
	ref Vector4 vectorsEnd = ref Unsafe.Add(ref vectorsStart, vectors.Length);

	while (Unsafe.IsAddressLessThan(ref vectorsStart, ref vectorsEnd))
	{
	UnPremultiply(ref vectorsStart);

	vectorsStart = ref Unsafe.Add(ref vectorsStart, 1);
	}
	}
	}

Uh oh!

Faster Linear Transforms #1591

Faster Linear Transforms #1591

Uh oh!

Conversation

JimBobSquarePants commented Apr 6, 2021

Prerequisites

Description

Benchmarks

ROTATE

SKEW

Uh oh!

JimBobSquarePants Apr 6, 2021

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants commented Apr 6, 2021

Uh oh!

antonfirsov commented Apr 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JimBobSquarePants commented Apr 6, 2021

Uh oh!

JimBobSquarePants commented Apr 6, 2021

Uh oh!

antonfirsov left a comment

Choose a reason for hiding this comment

Uh oh!

antonfirsov Apr 7, 2021

Choose a reason for hiding this comment

Uh oh!

antonfirsov Apr 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

antonfirsov Apr 7, 2021

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants Apr 7, 2021

Choose a reason for hiding this comment

Uh oh!

antonfirsov Apr 8, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants commented Apr 9, 2021

Uh oh!

codecov bot commented Apr 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

antonfirsov left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

antonfirsov Apr 9, 2021

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants Apr 9, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!

antonfirsov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

antonfirsov commented Apr 6, 2021 •

edited

Loading

antonfirsov Apr 7, 2021 •

edited

Loading

antonfirsov Apr 8, 2021 •

edited

Loading

codecov bot commented Apr 9, 2021 •

edited

Loading

antonfirsov left a comment •

edited

Loading