Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sse2 version of inverse transform #1819

Merged
merged 15 commits into from
Nov 14, 2021
Merged

Add sse2 version of inverse transform #1819

merged 15 commits into from
Nov 14, 2021

Conversation

brianpopow
Copy link
Collaborator

@brianpopow brianpopow commented Nov 10, 2021

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This PR introduces a SSE2 version of the method ITransform, which is used during lossy webp encoding.
Related to #1786

The profiling results look good:

master branch:

ITransform

PR

ITransform_sse

Note: the different call counts are due to the SSE version is doing two transforms at once.

@codecov
Copy link

codecov bot commented Nov 10, 2021

Codecov Report

Merging #1819 (16bb94f) into master (c22919d) will increase coverage by 0%.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #1819    +/-   ##
=======================================
  Coverage      87%     87%            
=======================================
  Files         936     936            
  Lines       48193   48320   +127     
  Branches     6037    6038     +1     
=======================================
+ Hits        42099   42221   +122     
- Misses       5094    5097     +3     
- Partials     1000    1002     +2     
Flag Coverage Δ
unittests 87% <100%> (+<1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs 100% <100%> (ø)
src/ImageSharp/Formats/Webp/Lossy/QuantEnc.cs 96% <100%> (ø)
src/ImageSharp/Formats/Webp/Lossy/Vp8Encoding.cs 100% <100%> (ø)
...mageSharp/Formats/Webp/Lossless/NearLosslessEnc.cs 88% <0%> (-8%) ⬇️
...rc/ImageSharp/Formats/Webp/Lossless/Vp8LEncoder.cs 97% <0%> (-1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3a10e93...16bb94f. Read the comment docs.

@brianpopow brianpopow changed the title WIP: Add sse2 version of inverse transform Add sse2 version of inverse transform Nov 11, 2021
@brianpopow brianpopow requested a review from a team November 11, 2021 20:49
// a01 a11 a21 a31 x x x x
// a02 a12 a22 a32 x x x x
// a03 a13 a23 a33 x x x x
if (doTwo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to avoid giving unnecessary extra work to the branch predictor, and refactor ITransform into two separate methods.

One hidden cost is that assembly code having branches on a runtime constant will be longer, and needs more cache lines, which are expensive to load.

Alternative "fun" solution, sort of template metaprogramming in C#, that will eliminate branches at JIT time:

ITransform<TDoTwo>(...) {
    ...
    if (typeof(TDoTwo) == typeof(DoTwo_True)) {
        ...
    }
    else {
       ...
    }
}

Can't decide if it's worth it or better to just duplicate the code, and maybe create reusable helper methods for shared bits.

Comment on lines +187 to +189
Unsafe.As<byte, Vector64<byte>>(ref outputRef) = ref0.GetLower();
Unsafe.As<byte, Vector64<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps)) = ref1.GetLower();
Unsafe.As<byte, Vector64<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps * 2)) = ref2.GetLower();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the first 3 calls, it's possible and better to avoid GetLower. The second store will overwrite the upper 8 bit written in the first store and so on:

Suggested change
Unsafe.As<byte, Vector64<byte>>(ref outputRef) = ref0.GetLower();
Unsafe.As<byte, Vector64<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps)) = ref1.GetLower();
Unsafe.As<byte, Vector64<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps * 2)) = ref2.GetLower();
Unsafe.As<byte, Vector128<byte>>(ref outputRef) = ref0;
Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps)) = ref1;
Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps * 2)) = ref2;

If you guarantee at the call site that dst has a patting of 8 bytes, you can also avoid it at the last call.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think that will work. I only want to write 8 bytes from each ref vector.
The first write will be at position 0, the second at 32 (note the Unsafe.Add() with WebpConstants.Bps, WebpConstants.Bps is 32), the third at 64 and the last at 96.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah haven't noticed that Bps is 32, nevermind then.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@brianpopow brianpopow merged commit 69c30f8 into master Nov 14, 2021
@brianpopow brianpopow deleted the bp/itransformsse branch November 14, 2021 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants