Implement PackFromRgbPlanes for Rgba32 and Rgb24 #1462

antonfirsov · 2020-12-05T20:33:17Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

This is a replacement for #1242, implementing the algorithm suggested by @saucecontrol in #1242 (comment).
Contributes to #1121 and #1410.

Results

The baseline is the float AVX2 packing which is the final step of all JpegColorConverters.

.NET Core 3.1.1
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores


|                           Method | Count |       Mean |    Error |   StdDev | Ratio | RatioSD |
|--------------------------------- |------ |-----------:|---------:|---------:|------:|--------:|
|     Rgb24_Scalar_PerElement_Span |  1024 | 1,634.6 ns | 26.56 ns | 24.84 ns |  3.12 |    0.05 |
|   Rgb24_Scalar_PerElement_Unsafe |  1024 | 1,284.7 ns |  4.70 ns |  4.16 ns |  2.46 |    0.01 |
| Rgb24_Scalar_PerElement_Batched8 |  1024 | 1,182.3 ns |  5.12 ns |  4.27 ns |  2.26 |    0.01 |
| Rgb24_Scalar_PerElement_Batched4 |  1024 | 1,146.2 ns | 16.38 ns | 14.52 ns |  2.19 |    0.02 |
|                Rgba32_Avx2_Float |  1024 |   522.7 ns |  1.78 ns |  1.39 ns |  1.00 |    0.00 |
|                 Rgb24_Avx2_Bytes |  1024 |   243.3 ns |  1.56 ns |  1.30 ns |  0.47 |    0.00 |
|                Rgba32_Avx2_Bytes |  1024 |   146.0 ns |  2.48 ns |  2.32 ns |  0.28 |    0.01 |

Conclusions:

The trick seems to worth it
Although staying on Rgba32 would be faster, we can bring Rgb24 closer in the final implementation because we need to convert 25% less float-s to byte-s with Image<Rgb24>.

Follow-up

Finishing #1121 and #1410 should be a trivial refactor from now on, I can describe all the steps if someone wants to take it. (I likely won't have the time before January.)

antonfirsov · 2020-12-05T20:38:38Z

src/ImageSharp/PixelFormats/PixelOperations{TPixel}.cs

+
+        /// <summary>
+        /// Bulk operation that packs 3 seperate RGB channels to <paramref name="destination"/>.
+        /// The destination must have a padding of 3.


We expect destination to be padded, because of quirks of the Rgb24 AVX2 implementation. This will need some extra trickery (a copy) when using it arond the last line(s) of Jpeg, but that shouldn't be too hard.

Yeah when I did the slice from 4 to 3channels I offset the write so there wasn't any padding required but that could only be done with SSE so likely wouldn't be fast enough here.

The problem is that I'm doing overlapped stores of 256 byte vectors in the following format:

rgb1 = <R00, G00, B00, R01, G01, B01, R02, G02 | B02, R03, G03, B03, R04, G04, B04, R05 || B05, G05, R06, G06, B06, R07, G07, B07 | ___, ___, ___, ___, ___, ___, ___, ___> rgb2 = <R08, G08, B08, R09, G09, B09, R10, G10 | B10, R11, G11, B11, R12, G12, B12, R13 || B13, G13, R14, G14, B14, R15, G15, B15 | ___, ___, ___, ___, ___, ___, ___, ___> rgb3 = <R16, G16, B17, R17, G17, B17, R18, G18 | B18, R19, G19, B19, R20, G20, B20, R21 || G21, B21, R22, G22, B22, R23, G23, B23 | ___, ___, ___, ___, ___, ___, ___, ___> rgb4 = <R24, G24, B24, R25, G25, B25, R26, G26 | B26, R27, G27, B27, R28, G28, B28, R29 || G29, B29, R30, G30, B30, R31, G31, B31 | ___, ___, ___, ___, ___, ___, ___, ___>

Storing rgb4 would result in an overflow, since B31 is the last element.

Would it perhaps be better to vectorize one less round than max and slice accordingly after? I'm concerned about the padding requirement.

That's something we can do on the call-site yeah ... but the input span has to be still padded, there is no other way for us to make sure there is no overflow within the PackFromRgbPlanes method.

Idea:

Define a utility Buffer2D.TryGetPaddedRowSpan(y, padding), which will return true for everything except the last row, or last few rows for the width < padding corner case.

If there is no padding do an automatic copy in PackFromRgbPlanes() instead of throwing an exception. (Slow(er) path for the last row)

Sounds good to me!

JimBobSquarePants · 2020-12-05T20:47:36Z

.gitattributes

@@ -105,11 +105,8 @@
 *.pvr               binary
 *.snk               binary
 *.tga               binary
-*.tif               binary


This change comes from the automatic copy from the shared-infrastructure submodule when building.

I think you need to rebuild the solution and push again making sure all submodules are up-to-date and that the root .gitattributes and .editorconfig files are updated from the submodule on build.

I can do this if you want?

Reverted this change.

codecov · 2020-12-06T12:56:20Z

Codecov Report

Merging #1462 (aed233e) into master (a311a16) will decrease coverage by 0.09%.
The diff coverage is 69.54%.

@@            Coverage Diff             @@
##           master    #1462      +/-   ##
==========================================
- Coverage   83.66%   83.56%   -0.10%     
==========================================
  Files         736      737       +1     
  Lines       32012    32232     +220     
  Branches     3609     3618       +9     
==========================================
+ Hits        26782    26935     +153     
- Misses       4516     4581      +65     
- Partials      714      716       +2

Flag	Coverage Δ
unittests	`83.56% <69.54%> (-0.10%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/ImageSharp/Common/Helpers/SimdUtils.cs	`65.95% <ø> (ø)`
src/ImageSharp/Common/Helpers/SimdUtils.Pack.cs	`28.72% <28.72%> (ø)`
...mageSharp/Common/Helpers/SimdUtils.HwIntrinsics.cs	`98.25% <100.00%> (+0.45%)`	⬆️
...entations/PixelOperations/Rgb24.PixelOperations.cs	`100.00% <100.00%> (ø)`
...ntations/PixelOperations/Rgba32.PixelOperations.cs	`100.00% <100.00%> (ø)`
...ImageSharp/PixelFormats/PixelOperations{TPixel}.cs	`60.00% <100.00%> (+20.60%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4bda224...aed233e. Read the comment docs.

JimBobSquarePants · 2020-12-07T16:50:03Z

@antonfirsov I can't see any issues here. Love how quickly you did this, would have taken me weeks! 👍

Implement PackFromRgbPlanes for Rgba32 and Rgb24

antonfirsov added 10 commits December 5, 2020 00:48

hack

05db9ed

entry API & tests

7a4fbcf

benchmarks & scalar implementation

4e52f5e

AVX2 implemetation seems to work

2177ea8

started Rgba32

1242c21

polishing

5943905

unhack

fee7a22

add benchmark results and fix PixelOperations

7580d7c

fix typo

ae9f416

revert .gitattributes

f4107cc

antonfirsov mentioned this pull request Dec 5, 2020

Add PackFromRgbPlanes AVX2 vectorised implementation for Rgba32 and Rgba24 pixels #1242

Closed

antonfirsov commented Dec 5, 2020

View reviewed changes

JimBobSquarePants reviewed Dec 5, 2020

View reviewed changes

antonfirsov added 2 commits December 5, 2020 21:48

fix build

98d3160

fix scalar code

e675710

Merge branch 'master' into af/PackFromRgbPlanes

2a383b1

JimBobSquarePants added the area:performance label Dec 7, 2020

JimBobSquarePants added this to the 1.1.0 milestone Dec 7, 2020

Merge branch 'master' into af/PackFromRgbPlanes

aed233e

JimBobSquarePants merged commit 1f351ee into master Dec 9, 2020

JimBobSquarePants deleted the af/PackFromRgbPlanes branch December 9, 2020 17:34

JimBobSquarePants added a commit that referenced this pull request Mar 13, 2021

Merge pull request #1462 from SixLabors/af/PackFromRgbPlanes

3d3537e

Implement PackFromRgbPlanes for Rgba32 and Rgb24

antonfirsov mentioned this pull request Jun 23, 2021

JpegDecoder: post-process baseline spectral data per MCU-row #1597

Closed

antonfirsov mentioned this pull request Oct 3, 2021

Decode Jpeg into Rgb24 when pixel type not specified #1773

Merged

4 tasks

antonfirsov mentioned this pull request Nov 30, 2023

Fatal error. System.AccessViolationException When Encode #2595

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement PackFromRgbPlanes for Rgba32 and Rgb24 #1462

Implement PackFromRgbPlanes for Rgba32 and Rgb24 #1462

antonfirsov commented Dec 5, 2020 •

edited

Loading

antonfirsov Dec 5, 2020

JimBobSquarePants Dec 5, 2020

antonfirsov Dec 5, 2020

JimBobSquarePants Dec 6, 2020

antonfirsov Dec 7, 2020 •

edited

Loading

JimBobSquarePants Dec 7, 2020

JimBobSquarePants Dec 5, 2020

JimBobSquarePants Dec 5, 2020

antonfirsov Dec 5, 2020

codecov bot commented Dec 6, 2020 •

edited

Loading

JimBobSquarePants commented Dec 7, 2020

Implement PackFromRgbPlanes for Rgba32 and Rgb24 #1462

Implement PackFromRgbPlanes for Rgba32 and Rgb24 #1462

Conversation

antonfirsov commented Dec 5, 2020 • edited Loading

Prerequisites

Description

Results

Conclusions:

Follow-up

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

antonfirsov Dec 7, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Dec 6, 2020 • edited Loading

Codecov Report

JimBobSquarePants commented Dec 7, 2020

antonfirsov commented Dec 5, 2020 •

edited

Loading

antonfirsov Dec 7, 2020 •

edited

Loading

codecov bot commented Dec 6, 2020 •

edited

Loading