Implement PackFromRgbPlanes for Rgba32 and Rgb24 #1462
/// <summary>
/// Bulk operation that packs 3 seperate RGB channels to <paramref name="destination"/>.
/// The destination must have a padding of 3.
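For readers unfamiliar with the operation, here is a minimal scalar sketch of what "packing 3 separate RGB channels" means. The class and method names (`RgbPlanePacking`, `PackScalar`) are hypothetical and chosen for illustration; this is not the AVX2 code from the PR, and it needs no padding because it writes one byte at a time.

```csharp
using System;

// Hypothetical reference helper for illustration; not the ImageSharp implementation.
public static class RgbPlanePacking
{
    // Interleaves three planar channels into R, G, B byte triplets.
    public static void PackScalar(
        ReadOnlySpan<byte> red,
        ReadOnlySpan<byte> green,
        ReadOnlySpan<byte> blue,
        Span<byte> destination)
    {
        int count = red.Length;
        if (green.Length != count || blue.Length != count)
        {
            throw new ArgumentException("All channels must have the same length.");
        }

        if (destination.Length < count * 3)
        {
            throw new ArgumentException("Destination is too short.", nameof(destination));
        }

        for (int i = 0; i < count; i++)
        {
            destination[i * 3] = red[i];
            destination[(i * 3) + 1] = green[i];
            destination[(i * 3) + 2] = blue[i];
        }
    }
}
```

A scalar version like this is mainly useful as a correctness baseline to compare a vectorized implementation against.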
We expect `destination` to be padded because of quirks of the Rgb24 AVX2 implementation. This will need some extra trickery (a copy) when using it around the last line(s) of a JPEG, but that shouldn't be too hard.
Yeah, when I did the slice from 4 to 3 channels I offset the write so no padding was required, but that could only be done with SSE, so it likely wouldn't be fast enough here.
The problem is that I'm doing overlapped stores of 256-bit vectors in the following format:
rgb1 = <R00, G00, B00, R01, G01, B01, R02, G02 | B02, R03, G03, B03, R04, G04, B04, R05 || G05, B05, R06, G06, B06, R07, G07, B07 | ___, ___, ___, ___, ___, ___, ___, ___>
rgb2 = <R08, G08, B08, R09, G09, B09, R10, G10 | B10, R11, G11, B11, R12, G12, B12, R13 || G13, B13, R14, G14, B14, R15, G15, B15 | ___, ___, ___, ___, ___, ___, ___, ___>
rgb3 = <R16, G16, B16, R17, G17, B17, R18, G18 | B18, R19, G19, B19, R20, G20, B20, R21 || G21, B21, R22, G22, B22, R23, G23, B23 | ___, ___, ___, ___, ___, ___, ___, ___>
rgb4 = <R24, G24, B24, R25, G25, B25, R26, G26 | B26, R27, G27, B27, R28, G28, B28, R29 || G29, B29, R30, G30, B30, R31, G31, B31 | ___, ___, ___, ___, ___, ___, ___, ___>
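Storing `rgb4` would result in an overflow, since `B31` is the last element: each store writes a full 32-byte vector of which only the first 24 bytes are pixel data, so the final store writes 8 bytes past the end of an unpadded destination.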
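The arithmetic behind that overflow can be sketched as follows, assuming (as in the layout above) that each store writes a 32-byte vector carrying 24 valid bytes and that consecutive stores are offset by 24 bytes. `OverlappedStoreMath` is a hypothetical name used only for this illustration.

```csharp
using System;

// Illustrative arithmetic only; the constants mirror the layout shown above.
public static class OverlappedStoreMath
{
    public const int VectorBytes = 32; // one 256-bit AVX2 vector
    public const int ValidBytes = 24;  // 8 Rgb24 pixels per store

    // Minimum destination size in bytes so that every full 32-byte store stays in bounds.
    public static int RequiredDestinationBytes(int storeCount)
    {
        if (storeCount <= 0)
        {
            return 0;
        }

        // Each store starts 24 bytes after the previous one; the last store still writes 32 bytes.
        return ((storeCount - 1) * ValidBytes) + VectorBytes;
    }

    // Bytes written beyond the tightly packed pixel data.
    public static int OverflowBytes(int storeCount)
        => RequiredDestinationBytes(storeCount) - (Math.Max(storeCount, 0) * ValidBytes);
}
```

For the four stores above, the packed data is 96 bytes while the last store ends at byte 104, which is where the padding requirement comes from.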
Would it perhaps be better to vectorize one fewer round than the maximum and handle the remaining slice afterwards? I'm concerned about the padding requirement.
That's something we can do at the call site, yeah... but the input span still has to be padded; there is no other way for us to make sure there is no overflow within the `PackFromRgbPlanes` method.
Idea:
- Define a utility `Buffer2D.TryGetPaddedRowSpan(y, padding)`, which will return `true` for everything except the last row, or the last few rows in the `width < padding` corner case.
- If there is no padding, do an automatic copy in `PackFromRgbPlanes()` instead of throwing an exception. (Slow(er) path for the last row.)
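The second bullet could look roughly like the sketch below. Everything here is an assumption made for illustration: `PaddedPackSketch`, `PackPadded`, and the 8-byte padding are not ImageSharp APIs, the "fast" kernel is a scalar stand-in that merely emulates the overshoot of the last vector store, and the `Buffer2D.TryGetPaddedRowSpan` helper from the first bullet is not modeled.

```csharp
using System;
using System.Buffers;

// Hypothetical sketch of the "copy when unpadded" idea; names are not ImageSharp APIs.
public static class PaddedPackSketch
{
    public const int PaddingBytes = 8; // assumed worst-case overshoot of the fast path

    // Stand-in for the vectorized kernel.
    // Requires destination.Length >= (r.Length * 3) + PaddingBytes.
    private static void PackPadded(
        ReadOnlySpan<byte> r, ReadOnlySpan<byte> g, ReadOnlySpan<byte> b, Span<byte> destination)
    {
        int count = r.Length;
        for (int i = 0; i < count; i++)
        {
            destination[i * 3] = r[i];
            destination[(i * 3) + 1] = g[i];
            destination[(i * 3) + 2] = b[i];
        }

        // Emulate the final overlapped store clobbering the padding area.
        destination.Slice(count * 3, PaddingBytes).Clear();
    }

    // Returns true when the fast path wrote directly into the destination.
    public static bool Pack(
        ReadOnlySpan<byte> r, ReadOnlySpan<byte> g, ReadOnlySpan<byte> b, Span<byte> destination)
    {
        int packed = r.Length * 3;
        if (g.Length != r.Length || b.Length != r.Length || destination.Length < packed)
        {
            throw new ArgumentException("Channel lengths must match and destination must fit them.");
        }

        if (destination.Length >= packed + PaddingBytes)
        {
            PackPadded(r, g, b, destination);
            return true;
        }

        // Slow(er) path: pack into a padded temporary buffer, then copy only the valid bytes.
        byte[] rented = ArrayPool<byte>.Shared.Rent(packed + PaddingBytes);
        try
        {
            PackPadded(r, g, b, rented);
            rented.AsSpan(0, packed).CopyTo(destination);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }

        return false;
    }
}
```

With this shape, callers that can supply a padded row span get the direct write, and only the unpadded last row (or rows) pays for the extra copy.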
Sounds good to me!
.gitattributes
@@ -105,11 +105,8 @@
*.pvr binary
*.snk binary
*.tga binary
*.tif binary
This change comes from the automatic copy from the shared-infrastructure submodule when building.
I think you need to rebuild the solution and push again, making sure all submodules are up to date and that the root `.gitattributes` and `.editorconfig` files are updated from the submodule on build.
I can do this if you want?
Reverted this change.
Codecov Report
@@ Coverage Diff @@
## master #1462 +/- ##
==========================================
- Coverage 83.66% 83.56% -0.10%
==========================================
Files 736 737 +1
Lines 32012 32232 +220
Branches 3609 3618 +9
==========================================
+ Hits 26782 26935 +153
- Misses 4516 4581 +65
- Partials 714 716 +2
@antonfirsov I can't see any issues here. Love how quickly you did this, would have taken me weeks! 👍
Implement PackFromRgbPlanes for Rgba32 and Rgb24
Prerequisites
Description
This is a replacement for #1242, implementing the algorithm suggested by @saucecontrol in #1242 (comment).
Contributes to #1121 and #1410.
Results
The baseline is the `float` AVX2 packing, which is the final step of all JpegColorConverters.
Conclusions:
`Rgba32` would be faster, we can bring `Rgb24` closer in the final implementation because we need to convert 25% less `float`-s to `byte`-s with `Image<Rgb24>`.
Follow-up
Finishing #1121 and #1410 should be a trivial refactor from now on; I can describe all the steps if someone wants to take it. (I likely won't have the time before January.)