Skip to content

Conversation

@JimBobSquarePants
Copy link
Member

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

I haven't been happy with the existing linear transform code since I originally wrote it so thought I'd have another crack.

Benchmarks are very good. Large speedup on all targets with a 2x speedup on .NET Core 3.1+. However they don't tell the whole story. Originally two 2D buffers were rented from the pool of DestinationHeight x MaxDiameter to store kernel weight buffers but, since the kernel weights must be calculated based on the exact transformed subpixel point, it was pointless to do so. Those buffers are now gone.

In addition I discovered a rounding issue where the incorrect scaled filter radius was being calculated which lead to rogue pixels containing RGB values but 0 alpha to be generated outside transform edges. I've updated two references to reflect the improved accuracy.

I consider readability much improved now also. It's much easier to understand how the transform operation actually works.

Ignore individual commits during review, I generated a lot of noise during my experimentation (basically properly learning how the process actually worked 😇) so I'll be squashing to a single commit on merge.

Benchmarks

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
  Job-WMQSXI : .NET Framework 4.8 (4.8.4341.0), X64 RyuJIT
  Job-ZPQJBS : .NET Core 2.1.26 (CoreCLR 4.6.29812.02, CoreFX 4.6.29812.01), X64 RyuJIT
  Job-LHSZUP : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT

ROTATE

BEFORE

Method Job Runtime Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
DoRotate Job-OBWWBY .NET 4.7.2 44.56 ms 0.464 ms 0.411 ms - - - 11.33 KB
DoRotate Job-MMNLYB .NET Core 2.1 19.18 ms 0.320 ms 0.355 ms - - - 5.91 KB
DoRotate Job-UKOETF .NET Core 3.1 24.90 ms 0.328 ms 0.274 ms - - - 7.23 KB

AFTER

Method Job Runtime Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
DoRotate Job-BAUEPW .NET 4.7.2 30.73 ms 0.397 ms 0.331 ms - - - 6.75 KB
DoRotate Job-SNWMCN .NET Core 2.1 16.31 ms 0.317 ms 0.352 ms - - - 5.25 KB
DoRotate Job-MRMBJZ .NET Core 3.1 12.21 ms 0.239 ms 0.245 ms - - - 6.61 KB

SKEW

BEFORE

Method Job Runtime Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
DoSkew Job-GBTORI .NET 4.7.2 37.08 ms 0.463 ms 0.433 ms - - - 10.29 KB
DoSkew Job-RAEJFO .NET Core 2.1 17.27 ms 0.187 ms 0.175 ms - - - 5.85 KB
DoSkew Job-LFXFDP .NET Core 3.1 19.34 ms 0.149 ms 0.140 ms - - - 7.17 KB

AFTER

Method Job Runtime Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
DoSkew Job-YEGFRQ .NET 4.7.2 23.563 ms 0.0731 ms 0.0570 ms - - - 6.75 KB
DoSkew Job-HZHOGR .NET Core 2.1 13.700 ms 0.2727 ms 0.5122 ms - - - 5.25 KB
DoSkew Job-LTEUKY .NET Core 3.1 9.971 ms 0.0254 ms 0.0225 ms - - - 6.61 KB

internal static int GetSamplingRadius<TResampler>(in TResampler sampler, int sourceSize, int destinationSize)
where TResampler : struct, IResampler
{
double scale = sourceSize / destinationSize;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spot the rounding error 😄

@JimBobSquarePants
Copy link
Member Author

@antonfirsov I'm gonna need some help here. MacOS .NET 3.1 was showing a 4% error on projective transforms but I couldn't pull down the test output because it was incomplete. (DebugSave disabled on CI). I enabled saving and now I'm getting a bunch of MemoyAllocator errors. Any idea?

@antonfirsov
Copy link
Member

antonfirsov commented Apr 6, 2021

DebugSave uses image.GetConfiguration().MemoryAllocator which is artificially limited to very small buffers to test discontiguous decoding:

provider.LimitAllocatorBufferCapacity().InPixelsSqrt(100);
using Image<Rgba32> image = provider.GetImage(PngDecoder);
image.DebugSave(provider, testOutputDetails: nonContiguousBuffersStr);

I can't think of any better solution than changing the following line to use Configuration.Default instead of the image's configuration (with a comment on why are we doing so). I hope it won't lead to unwanted sideffects:

@JimBobSquarePants
Copy link
Member Author

@antonfirsov Thanks for you help! I've just created a clone of the encoder and use that when registering for tests.

@JimBobSquarePants
Copy link
Member Author

OK... tried everything I can think of here. Rewrote all the sampling to use (since reverted) to use double precision and it still fails for NET Core 3.1 on Mac OS.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new transform code looks good and easy to follow, great job!

Can you post some Beyond Compare screenshots for the most problematic, and the less problematic failures?

For troubleshooting I would log and compare as many internals as possible. Since it may take a lot of time, we can just disable the Mac tests with [ActiveIssue] instead of trying to figure out the problem as part of the PR.

Comment on lines +191 to +194
int top = LinearTransformUtility.GetRangeStart(yRadius, pY, maxY);
int bottom = LinearTransformUtility.GetRangeEnd(yRadius, pY, maxY);
int left = LinearTransformUtility.GetRangeStart(xRadius, pX, maxX);
int right = LinearTransformUtility.GetRangeEnd(xRadius, pX, maxX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea 1:
log these 4 values during the whole operation, and look for differences between Mac and Windows.

float xWeight = sampler.GetValue(xK - pX);

Vector4 current = sourceBuffer.GetElementUnsafe(xK, yK).ToScaledVector4();
Numerics.Premultiply(ref current);
Copy link
Member

@antonfirsov antonfirsov Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea 2:
Also log the premultiplied current value, compare the log files, look for highest difference.

in this.sampler,
point,
sourceBuffer,
Numerics.UnPremultiply(span);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we used the bulk UnPremultiply in the old code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't. Have just ran a test commit using the non-bulk variant and the major issues disappeared leaving only 2 minor (max 0.03% diff) remaining.

Below is the output of the worst issue. The last 3 pixels of each row appear to be affected which suggests to me that something is going wrong here. My guess is that it has something to do with unpremultiplying Vector4.Zero

if (Modulo2(vectors.Length) != 0)
{
// Vector4 fits neatly in pairs. Any overlap has to be equal to 1.
UnPremultiply(ref MemoryMarshal.GetReference(vectors.Slice(vectors.Length - 1)));
}
}
else
#endif
{
ref Vector4 vectorsStart = ref MemoryMarshal.GetReference(vectors);
ref Vector4 vectorsEnd = ref Unsafe.Add(ref vectorsStart, vectors.Length);
while (Unsafe.IsAddressLessThan(ref vectorsStart, ref vectorsEnd))
{
UnPremultiply(ref vectorsStart);
vectorsStart = ref Unsafe.Add(ref vectorsStart, 1);
}
}
}

image

The question is, what do we do? The issue is clearly fixed for .NET 5 but .NET Core 3.1 is LTS.

Copy link
Member

@antonfirsov antonfirsov Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep the bulk code for the rest of the platforms, because it likely takes an important part in the perf improvements we see in your benchmarks, and fall back to scalar stuff on Mac. I would add a check for the OS platform at this line with an explanatory comment:

if (Avx2.IsSupported && vectors.Length >= 2)

if (Avx2.IsSupported && !RuntimeInformation.IsOSPlatform(OSPlatform.OSX) && vectors.Length >= 2)

In no case I would complicate this with additional logic to detect if we are running on .NET 5 (Mac is not that important).

@JimBobSquarePants
Copy link
Member Author

You better re-review this @antonfirsov there's been some awful hackery.

@codecov
Copy link

codecov bot commented Apr 9, 2021

Codecov Report

Merging #1591 (ac0f01f) into master (97dde7f) will decrease coverage by 0.01%.
The diff coverage is 94.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1591      +/-   ##
==========================================
- Coverage   83.69%   83.68%   -0.02%     
==========================================
  Files         747      748       +1     
  Lines       33055    33032      -23     
  Branches     3692     3698       +6     
==========================================
- Hits        27665    27642      -23     
+ Misses       4676     4675       -1     
- Partials      714      715       +1     
Flag Coverage Δ
unittests 83.68% <94.44%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ssing/Processors/Transforms/Resize/ResizeKernel.cs 100.00% <ø> (ø)
...nsforms/Linear/AffineTransformProcessor{TPixel}.cs 97.02% <93.87%> (-2.98%) ⬇️
...rms/Linear/ProjectiveTransformProcessor{TPixel}.cs 94.05% <93.87%> (-2.92%) ⬇️
...rc/ImageSharp/Common/Helpers/RuntimeEnvironment.cs 100.00% <100.00%> (ø)
...essors/Transforms/Linear/LinearTransformUtility.cs 100.00% <100.00%> (ø)
...rp/Metadata/Profiles/Exif/Values/ExifSignedLong.cs 42.85% <0.00%> (+14.28%) ⬆️
...ImageSharp/Formats/Png/Chunks/PhysicalChunkData.cs 100.00% <0.00%> (+15.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 97dde7f...ac0f01f. Read the comment docs.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why did you decide to duplicate? Wasn't there a mistake in 35065c7 adding the new conditions checks to Premultiply instead of Unpremultiply?

If there is a perf concern regarding the long expression, I recommend to cache !RuntimeEnvironment.IsOSPlatform(OSPlatform.OSX) && RuntimeEnvironment.IsNetCore into a static variable.

/// <summary>
/// Gets the name of the .NET installation on which an app is running.
/// </summary>
public static string FrameworkDescription => RuntimeInformation.FrameworkDescription;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be exposed? I would make it private.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did make a mistake yeah but corrected it in a later commit.

What I've tried so far:

  • Filtered out the runtime instrinsics part of the UnPremultiply method to aviod Mac on Core 3.1
  • Rewrote the scalar overhang portion of the method to use a simple for loop over the span.

Neither worked. The only thing I could get working was duplicating the code which is simply awful. I couldn't replicate the issue locally either to effectively debug.

Regarding the method, no it doesn't need to be exposed.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't work without the duplication, let's keep it. LGTM with one suggestion.

@JimBobSquarePants JimBobSquarePants merged commit dc0982f into master Apr 14, 2021
@JimBobSquarePants JimBobSquarePants deleted the js/linear-transforms-experiments branch April 14, 2021 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants