Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PackFromRgbPlanes for Rgba32 and Rgb24 #1462

Merged
merged 14 commits into from
Dec 9, 2020

Conversation

antonfirsov
Copy link
Member

@antonfirsov antonfirsov commented Dec 5, 2020

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This is a replacement for #1242, implementing the algorithm suggested by @saucecontrol in #1242 (comment).
Contributes to #1121 and #1410.

Results

The baseline is the float AVX2 packing which is the final step of all JpegColorConverters.

.NET Core 3.1.1
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores


|                           Method | Count |       Mean |    Error |   StdDev | Ratio | RatioSD |
|--------------------------------- |------ |-----------:|---------:|---------:|------:|--------:|
|     Rgb24_Scalar_PerElement_Span |  1024 | 1,634.6 ns | 26.56 ns | 24.84 ns |  3.12 |    0.05 |
|   Rgb24_Scalar_PerElement_Unsafe |  1024 | 1,284.7 ns |  4.70 ns |  4.16 ns |  2.46 |    0.01 |
| Rgb24_Scalar_PerElement_Batched8 |  1024 | 1,182.3 ns |  5.12 ns |  4.27 ns |  2.26 |    0.01 |
| Rgb24_Scalar_PerElement_Batched4 |  1024 | 1,146.2 ns | 16.38 ns | 14.52 ns |  2.19 |    0.02 |
|                Rgba32_Avx2_Float |  1024 |   522.7 ns |  1.78 ns |  1.39 ns |  1.00 |    0.00 |
|                 Rgb24_Avx2_Bytes |  1024 |   243.3 ns |  1.56 ns |  1.30 ns |  0.47 |    0.00 |
|                Rgba32_Avx2_Bytes |  1024 |   146.0 ns |  2.48 ns |  2.32 ns |  0.28 |    0.01 |

Conclusions:

  • The trick seems to worth it
  • Although staying on Rgba32 would be faster, we can bring Rgb24 closer in the final implementation because we need to convert 25% less float-s to byte-s with Image<Rgb24>.

Follow-up

Finishing #1121 and #1410 should be a trivial refactor from now on, I can describe all the steps if someone wants to take it. (I likely won't have the time before January.)


/// <summary>
/// Bulk operation that packs 3 seperate RGB channels to <paramref name="destination"/>.
/// The destination must have a padding of 3.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expect destination to be padded, because of quirks of the Rgb24 AVX2 implementation. This will need some extra trickery (a copy) when using it arond the last line(s) of Jpeg, but that shouldn't be too hard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah when I did the slice from 4 to 3channels I offset the write so there wasn't any padding required but that could only be done with SSE so likely wouldn't be fast enough here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that I'm doing overlapped stores of 256 byte vectors in the following format:

rgb1   = <R00, G00, B00, R01, G01, B01, R02, G02 | B02, R03, G03, B03, R04, G04, B04, R05 || B05, G05, R06, G06, B06, R07, G07, B07 | ___, ___, ___, ___, ___, ___, ___, ___>
rgb2   = <R08, G08, B08, R09, G09, B09, R10, G10 | B10, R11, G11, B11, R12, G12, B12, R13 || B13, G13, R14, G14, B14, R15, G15, B15 | ___, ___, ___, ___, ___, ___, ___, ___>
rgb3   = <R16, G16, B17, R17, G17, B17, R18, G18 | B18, R19, G19, B19, R20, G20, B20, R21 || G21, B21, R22, G22, B22, R23, G23, B23 | ___, ___, ___, ___, ___, ___, ___, ___> 
rgb4   = <R24, G24, B24, R25, G25, B25, R26, G26 | B26, R27, G27, B27, R28, G28, B28, R29 || G29, B29, R30, G30, B30, R31, G31, B31 | ___, ___, ___, ___, ___, ___, ___, ___>  

Storing rgb4 would result in an overflow, since B31 is the last element.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it perhaps be better to vectorize one less round than max and slice accordingly after? I'm concerned about the padding requirement.

Copy link
Member Author

@antonfirsov antonfirsov Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's something we can do on the call-site yeah ... but the input span has to be still padded, there is no other way for us to make sure there is no overflow within the PackFromRgbPlanes method.

Idea:

  • Define a utility Buffer2D.TryGetPaddedRowSpan(y, padding), which will return true for everything except the last row, or last few rows for the width < padding corner case.
  • If there is no padding do an automatic copy in PackFromRgbPlanes() instead of throwing an exception. (Slow(er) path for the last row)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me!

.gitattributes Outdated
@@ -105,11 +105,8 @@
*.pvr binary
*.snk binary
*.tga binary
*.tif binary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change comes from the automatic copy from the shared-infrastructure submodule when building.

I think you need to rebuild the solution and push again making sure all submodules are up-to-date and that the root .gitattributes and .editorconfig files are updated from the submodule on build.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this if you want?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted this change.

@codecov
Copy link

codecov bot commented Dec 6, 2020

Codecov Report

Merging #1462 (aed233e) into master (a311a16) will decrease coverage by 0.09%.
The diff coverage is 69.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1462      +/-   ##
==========================================
- Coverage   83.66%   83.56%   -0.10%     
==========================================
  Files         736      737       +1     
  Lines       32012    32232     +220     
  Branches     3609     3618       +9     
==========================================
+ Hits        26782    26935     +153     
- Misses       4516     4581      +65     
- Partials      714      716       +2     
Flag Coverage Δ
unittests 83.56% <69.54%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ImageSharp/Common/Helpers/SimdUtils.cs 65.95% <ø> (ø)
src/ImageSharp/Common/Helpers/SimdUtils.Pack.cs 28.72% <28.72%> (ø)
...mageSharp/Common/Helpers/SimdUtils.HwIntrinsics.cs 98.25% <100.00%> (+0.45%) ⬆️
...entations/PixelOperations/Rgb24.PixelOperations.cs 100.00% <100.00%> (ø)
...ntations/PixelOperations/Rgba32.PixelOperations.cs 100.00% <100.00%> (ø)
...ImageSharp/PixelFormats/PixelOperations{TPixel}.cs 60.00% <100.00%> (+20.60%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4bda224...aed233e. Read the comment docs.

@JimBobSquarePants JimBobSquarePants added this to the 1.1.0 milestone Dec 7, 2020
@JimBobSquarePants
Copy link
Member

@antonfirsov I can't see any issues here. Love how quickly you did this, would have taken me weeks! 👍

@JimBobSquarePants JimBobSquarePants merged commit 1f351ee into master Dec 9, 2020
@JimBobSquarePants JimBobSquarePants deleted the af/PackFromRgbPlanes branch December 9, 2020 17:34
JimBobSquarePants added a commit that referenced this pull request Mar 13, 2021
Implement PackFromRgbPlanes for Rgba32 and Rgb24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants