-
-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Color conversion with ICC profiles #273
Conversation
Codecov Report
@@ Coverage Diff @@
## master #273 +/- ##
==========================================
- Coverage 86.86% 85.49% -1.38%
==========================================
Files 849 678 -171
Lines 36075 30229 -5846
Branches 2660 2223 -437
==========================================
- Hits 31338 25843 -5495
+ Misses 3971 3724 -247
+ Partials 766 662 -104
Continue to review full report at Codecov.
|
@JBildstein Quick technical question. Do you think we could tie this in somehow with the SIMD colorspace transforms in our jpeg decoder? cc @antonfirsov |
@JimBobSquarePants I think that should be possible yes. However you will have to create a new instance of the converter for each image because some data has to be pulled from the ICC profile. I mean, it would be possible to pass the profile for each conversion call but it would be horribly inefficient. So in the |
@JBildstein I think we can manage something like that. Thanks. 👍 |
@JBildstein Apologies but I think I just broke the build merging master into your branch! |
@JimBobSquarePants no worries, the tests don't pass anyway at the moment (or rather, some aren't implemented yet). I finally found some time to work on this again and was able to implement most of the calculations. However, most of them haven't been tested yet and likely contain errors. It's rather cumbersome to do this so it takes a while. To make things more manageable, I decided to implement everything with Vector4 so only colors with up to 4 channels are supported. And (for now) I also won't implement multi process elements mainly because I have yet to find a profile using them. I would be very glad if someone could have a look at my n-dimensional linear interpolation (ClutCalculator.cs). I think it basically works but I'm a bit lost and not sure about the correctness. |
@JBildstein That's great news! I'm at your disposal so I'll try to get my head around the calculator for you and offer any advice help where I can. |
@JimBobSquarePants great, thank you very much. I'll be around on Gitter for questions and discussions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick once over. I don't know enough about the conversion process to give you any really useful feedback though.
{ | ||
private int inputCount; | ||
private int outputCount; | ||
private float[][] lut; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this array always jagged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually doesn't have to be, the second array is always the same length. In the Interpolate method it's currently useful though because I can just take the array reference instead of copying the values.
Do you think it would be better to have it in a single memory block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a helper Fast2DArray<T>
which might be useful here for n-length 2D objects, it's faster than the jagged array. Though if we template the interpolation we should look at custom structs for each known 2D grid since they'd be much faster.
{ | ||
Vector4.Clamp(value, Vector4.Zero, Vector4.One); | ||
|
||
float[] result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we know that the maximum length is 4 could we stackalloc
the array and pass it as a sliced Span<byte>
to the Interpolate
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean with the Span<byte>
part but yes, I could use stackalloc also at other places and in general decrease memory allocations. I'll keep this in mind when I continue to work on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I mean Span<float>
.
This is what I mean:
We use a struct
wrapping around a fixed buffer and convert Vector4
to it using Unsafe.As
.
We can then slice that buffer using Span<float>
and prevent any allocation on the heap plus reduce the number of methods you need.
unsafe class Program
{
static void Main(string[] args)
{
Console.WriteLine("Using Unsafe to do clever things!");
Vector4 v = new Vector4(1, 2, 3, 4);
Floats f = Unsafe.As<Vector4, Floats>(ref v);
Interpolate(new Span<float>(f.Values, 1));
Interpolate(new Span<float>(f.Values, 2));
Interpolate(new Span<float>(f.Values, 3));
Interpolate(new Span<float>(f.Values, 4));
Console.ReadLine();
}
private static void Interpolate(Span<float> span)
{
Console.WriteLine($"Span of length {span.Length} passed.");
for (int i = 0; i < span.Length; i++)
{
Console.WriteLine($"Value at {i} equals {span[i]}");
}
}
public unsafe struct Floats
{
public fixed float Values[4];
}
}
This will print out.
Using Unsafe to do clever things!
Span of length 1 passed.
Value at 0 equals 1
Span of length 2 passed.
Value at 0 equals 1
Value at 1 equals 2
Span of length 3 passed.
Value at 0 equals 1
Value at 1 equals 2
Value at 2 equals 3
Span of length 4 passed.
Value at 0 equals 1
Value at 1 equals 2
Value at 2 equals 3
Value at 3 equals 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes that makes more sense. Thank you for the example, doing it like that is a lot better.
} | ||
|
||
float[] factors = new float[this.nodeCount]; | ||
for (int i = 0; i < factors.Length; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it could be vectorized but I could be wrong, @antonfirsov What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on what if (((i >> j) & 1) == 1)
does. We need to eliminate branches inside loops for vectorization.
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
private float CalculateInvertedCie122(float value) | ||
{ | ||
return ((float)Math.Pow(value, 1 / this.curve.G) - this.curve.B) / this.curve.A; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and others can use MathF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
@JimBobSquarePants, thank you very much for the review. I replaced the Math methods with MathF and will work on the interpolation a bit later. For reference here's a quick explanation of the CLUT (Color LookUp Table) interpolation: 1) Finding nodes:
The values of A and B aren't actually stored anywhere, they can be calculated and this is what the inner loop does. The outer loop finds the nodes for the interpolation. To do the interpolation we need every variation of lower and higher values. E.g. if A = 0.3 and B = 0.8 then we need to interpolate the values at
The line 2) Calculating the factors for interpolation: 3) Interpolation of the output: As a reference, this is code from the official ICC repo: InterpND |
Hey @JBildstein I got this back up to date with the master. 1097 commits! You'll have to resign the CLA again I'm afraid because we had to reimplement it to work as a single sign up across all our projects. |
impressive number! I'll soon be able to add some to that. I also have been working on some color conversion code lately that could be useful (it's using Vectors and is pretty fast) |
Great to hear! Looking forward to seeing whatever genius you produce. |
Been fiddling around with the CLUT interpolation:
The tested CLUT has three channels input and two channels output. Need to add a few more tests (4-channel input CLUT is still missing) but other than that it's looking pretty good. |
Oh that's great news! 😄 I hope the rapid churn isn't causing you too many problems. I can see there's some conflicts going on already. I think you can just use all the listed files from master. |
no worries, I haven't changed anything in those files so I can take them from master as you say. |
Ace... I've just merged the colorspace API into master, don't know if that's any interest/use to you. |
@JBildstein I just took some time to get his all up and running with all the tests passing (well almost.... Some variance issues but that is to be expected and we can pad the difference in the tests) so we can try and move forward. Would this be something you would be able to pick up on again? |
Much Faster sRGB Companding
Add PremultiplyAlpha to ResizeOptions
…bcr-conversion Vectorize Jpeg Encoder Color Conversion
Assembly for loading in the loop went from: ```asm vmovss xmm2, [rax] vbroadcastss xmm2, xmm2 vmovss xmm3, [rax+4] vbroadcastss xmm3, xmm3 vinsertf128 ymm2, ymm2, xmm3, 1 ``` To: ```asm vmovsd xmm3, [rax] vbroadcastsd ymm3, xmm3 vpermps ymm3, ymm1, ymm3 ```
See Vector256.Create issue: dotnet/runtime#47236
Speed improvements to resize kernel (w/ SIMD)
db51f69
to
172c48e
Compare
Prerequisites
Description
As the title says, this adds methods for converting colors with an ICC profile.
Architecturally, the idea is that the profile is checked once for available and appropriate conversion methods and a then a delegate is stored that only takes the color values to convert and returns the calculated values. The possible performance penalty for using a delegate is far smaller than searching through the profile for every conversion. I'm open for other suggestions though.
There are classes to convert from the profile connection space (=PCS, can be XYZ or Lab) to the data space (RGB, CMYK, etc.) and vice versa. There are also classes to convert from PCS to PCS and Data to Data but they are only used for special profiles and are not important for us now but I still added them for completeness sake.
A challenge here is writing tests for this because of the complexity of the calculations and the big amount of different possible conversion paths. This is a rough list of the paths that exist:
The three main approaches in that list are
The most used approaches are Color Trc for RGB profiles and LutAToB/LutBToA for CMYK profiles.
Todo list:
Help and suggestions are very welcome.