[WIP] Color conversion with ICC profiles #273

Closed
wants to merge 8,943 commits into from

@JBildstein (Contributor) commented Jul 7, 2017

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practices as demonstrated in the repository. These follow strict StyleCop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

As the title says, this adds methods for converting colors with an ICC profile.

Architecturally, the idea is that the profile is checked once for available and appropriate conversion methods, and then a delegate is stored that takes only the color values to convert and returns the calculated values. The possible performance penalty of calling a delegate is far smaller than the cost of searching through the profile for every conversion. I'm open to other suggestions, though.
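To illustrate the "check once, store a delegate" idea, here is a minimal sketch. The types `ProfileSketch` and `ConverterSketch` and their members are hypothetical stand-ins for this example, not the classes in this PR, and the LUT path is only a placeholder:

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for a parsed ICC profile; not the PR's actual type.
public sealed class ProfileSketch
{
    public bool HasLut { get; set; }

    public float Gamma { get; set; } = 1F;
}

public sealed class ConverterSketch
{
    // Chosen once in the constructor; every call to Convert reuses it.
    private readonly Func<Vector4, Vector4> conversion;

    public ConverterSketch(ProfileSketch profile)
    {
        // The (potentially expensive) inspection of the profile happens only here.
        if (profile.HasLut)
        {
            this.conversion = ConvertWithLut;
        }
        else
        {
            float gamma = profile.Gamma;
            this.conversion = v => ConvertWithGamma(v, gamma);
        }
    }

    public Vector4 Convert(Vector4 value) => this.conversion(value);

    // Placeholder: a real LUT path would interpolate in the profile's tables.
    private static Vector4 ConvertWithLut(Vector4 value) => value;

    private static Vector4 ConvertWithGamma(Vector4 value, float gamma)
        => new Vector4(
            MathF.Pow(value.X, gamma),
            MathF.Pow(value.Y, gamma),
            MathF.Pow(value.Z, gamma),
            value.W);
}
```

A caller would create one `ConverterSketch` per profile and then call `Convert` for every color, so the per-color cost is a single delegate invocation.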

There are classes to convert from the profile connection space (PCS, which can be XYZ or Lab) to the data space (RGB, CMYK, etc.) and vice versa. There are also classes to convert from PCS to PCS and from Data to Data, but they are only used for special profiles. They are not important for us right now, but I added them for completeness' sake.

A challenge here is writing tests, because of the complexity of the calculations and the large number of possible conversion paths. This is a rough list of the paths that exist:

  • "A to B" and "B to A" tags
    • IccLut8TagDataEntry
      • Input IccLut[], Clut, Output IccLut[]
      • Matrix(3x3), Input IccLut[], IccClut, Output IccLut[]
    • IccLut16TagDataEntry
      • Input IccLut[], IccClut, Output IccLut[]
      • Matrix(3x3), Input IccLut[], IccClut, Output IccLut[]
    • IccLutAToBTagDataEntry/IccLutBToATagDataEntry (Curve types can either be IccCurveTagDataEntry or IccParametricCurveTagDataEntry (which has several curve subtypes))
      • CurveA[], Clut, CurveM[], Matrix(3x1), Matrix(3x3), CurveB[]
      • CurveA[], Clut, CurveB[]
      • CurveM[], Matrix(3x1), Matrix(3x3), CurveB[]
      • CurveB[]
  • "D to B" tags
    • IccMultiProcessElementsTagDataEntry that contains an array of any of those types in any order:
      • IccCurveSetProcessElement
        • IccOneDimensionalCurve[] where each curve can have several curve subtypes
      • IccMatrixProcessElement
        • Matrix(Nr. of input Channels by Nr. of output Channels), Matrix(Nr. of output channels by 1)
      • IccClutProcessElement
        • IccClut
  • Color Trc
    • Matrix(3x3), one curve for R, G and B each (Curve types can either be IccCurveTagDataEntry or IccParametricCurveTagDataEntry (which has several curve subtypes))
  • Gray Trc
    • Curve (Curve type can either be IccCurveTagDataEntry or IccParametricCurveTagDataEntry (which has several curve subtypes))

The three main approaches in that list are

  • A to B/B to A: using a combination of lookup tables, matrices and curves
  • D to B: using a chain of multi process elements (curves, matrices or lookup)
  • Trc: using curves (and matrices for color but not for gray)

The most used approaches are Color Trc for RGB profiles and LutAToB/LutBToA for CMYK profiles.
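As a rough illustration of the Color Trc path (one curve per channel, then a 3x3 matrix), here is a minimal sketch for the direction from RGB to PCS XYZ. It assumes the simplest curve type, a single gamma value, and a row-major matrix passed as nine floats; `ColorTrcSketch` and its members are hypothetical names, and sampled or parametric curves are not covered:

```csharp
using System;
using System.Numerics;

public static class ColorTrcSketch
{
    // Step 1: linearize a channel with its tone reproduction curve.
    // A single gamma value is assumed here; real profiles may use
    // sampled or parametric curves instead.
    private static float ApplyGammaCurve(float value, float gamma)
        => MathF.Pow(Math.Clamp(value, 0F, 1F), gamma);

    // Step 2: multiply the linear RGB vector by the 3x3 matrix (row-major).
    public static Vector3 ToPcsXyz(Vector3 rgb, float gamma, float[] matrix3x3)
    {
        var linear = new Vector3(
            ApplyGammaCurve(rgb.X, gamma),
            ApplyGammaCurve(rgb.Y, gamma),
            ApplyGammaCurve(rgb.Z, gamma));

        return new Vector3(
            Vector3.Dot(new Vector3(matrix3x3[0], matrix3x3[1], matrix3x3[2]), linear),
            Vector3.Dot(new Vector3(matrix3x3[3], matrix3x3[4], matrix3x3[5]), linear),
            Vector3.Dot(new Vector3(matrix3x3[6], matrix3x3[7], matrix3x3[8]), linear));
    }
}
```

The Gray Trc path would be the same first step applied to a single channel, without the matrix.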

Todo list:

  • Integrate with the rest of the project
  • Write tests that cover all conversion paths
  • Review architecture
  • Improve speed and accuracy of the calculations

Help and suggestions are very welcome.

@codecov-io

codecov-io commented Jul 7, 2017

Codecov Report

Merging #273 into master will decrease coverage by 1.37%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #273      +/-   ##
==========================================
- Coverage   86.86%   85.49%   -1.38%     
==========================================
  Files         849      678     -171     
  Lines       36075    30229    -5846     
  Branches     2660     2223     -437     
==========================================
- Hits        31338    25843    -5495     
+ Misses       3971     3724     -247     
+ Partials      766      662     -104
Impacted Files Coverage Δ
...version/Implementation/Icc/IccConverterBase.Trc.cs 0% <0%> (ø)
...tation/Icc/IccConverterBase.MultiProcessElement.cs 0% <0%> (ø)
...version/Implementation/Icc/IccConverterBase.Lut.cs 0% <0%> (ø)
...version/Implementation/Icc/IccPcsToPcsConverter.cs 0% <0%> (ø)
...Implementation/Icc/IccConverterbase.Conversions.cs 0% <0%> (ø)
...rsion/Implementation/Icc/IccDataToDataConverter.cs 0% <0%> (ø)
...ersion/Implementation/Icc/IccPcsToDataConverter.cs 0% <0%> (ø)
...ersion/Implementation/Icc/IccDataToPcsConverter.cs 0% <0%> (ø)
...sion/Implementation/Icc/IccConverterBase.Checks.cs 0% <0%> (ø)
...cessing/Transforms/Resamplers/Lanczos2Resampler.cs 0% <0%> (-100%) ⬇️
... and 1073 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce6eed6...3b5a8c0. Read the comment docs.

@JimBobSquarePants
Member

@JBildstein Quick technical question. Do you think we could tie this in somehow with the SIMD colorspace transforms in our jpeg decoder?

https://github.com/SixLabors/ImageSharp/tree/8b2b7c780821a18db351e796ce41178c9ce95e95/src/ImageSharp/Formats/Jpeg/Common/Decoder/ColorConverters

cc @antonfirsov

@JBildstein
Contributor Author

@JimBobSquarePants I think that should be possible yes. However you will have to create a new instance of the converter for each image because some data has to be pulled from the ICC profile. I mean, it would be possible to pass the profile for each conversion call but it would be horribly inefficient. So in the GetConverter method there would have to be an overload with an ICC profile.

@JimBobSquarePants
Member

@JBildstein I think we can manage something like that. Thanks. 👍

@JimBobSquarePants
Member

@JBildstein Apologies but I think I just broke the build merging master into your branch!

@JBildstein
Contributor Author

@JimBobSquarePants no worries, the tests don't pass anyway at the moment (or rather, some aren't implemented yet).

I finally found some time to work on this again and was able to implement most of the calculations. However, most of them haven't been tested yet and likely contain errors. It's rather cumbersome to do this so it takes a while.

To make things more manageable, I decided to implement everything with Vector4 so only colors with up to 4 channels are supported. And (for now) I also won't implement multi process elements mainly because I have yet to find a profile using them.

I would be very glad if someone could have a look at my n-dimensional linear interpolation (ClutCalculator.cs). I think it basically works, but I'm a bit lost and not sure about its correctness.
What I know for sure is that finding the nodes is correct; I'm less sure about the actual interpolation.

@JimBobSquarePants
Member

@JBildstein That's great news! I'm at your disposal, so I'll try to get my head around the calculator for you and offer advice and help where I can.

@JBildstein
Contributor Author

@JimBobSquarePants great, thank you very much. I'll be around on Gitter for questions and discussions.

Member

@JimBobSquarePants left a comment

Just a quick once over. I don't know enough about the conversion process to give you any really useful feedback though.

{
private int inputCount;
private int outputCount;
private float[][] lut;

Member

Is this array always jagged?

Contributor Author

It actually doesn't have to be, the second array is always the same length. In the Interpolate method it's currently useful though because I can just take the array reference instead of copying the values.
Do you think it would be better to have it in a single memory block?

Member

We have a helper Fast2DArray<T> which might be useful here for n-length 2D objects, it's faster than the jagged array. Though if we template the interpolation we should look at custom structs for each known 2D grid since they'd be much faster.
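As a rough sketch of the single-memory-block idea: because every row has the same length, the rows can live back to back in one array and still be handed out by reference as spans, so nothing is copied. `FlatLutSketch` is a hypothetical name for this example and is not the `Fast2DArray<T>` helper mentioned above:

```csharp
using System;

public sealed class FlatLutSketch
{
    private readonly float[] values;   // all rows stored back to back
    private readonly int outputCount;  // length of every row

    public FlatLutSketch(float[][] jagged)
    {
        this.outputCount = jagged[0].Length;
        this.values = new float[jagged.Length * this.outputCount];
        for (int row = 0; row < jagged.Length; row++)
        {
            // One-time copy while building the table.
            jagged[row].CopyTo(this.values, row * this.outputCount);
        }
    }

    // Returns a view over one row; no copy and no heap allocation.
    public ReadOnlySpan<float> GetRow(int row)
        => new ReadOnlySpan<float>(this.values, row * this.outputCount, this.outputCount);
}
```

The interpolation could then take `GetRow(index)` wherever it currently takes the inner array reference.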

{
Vector4.Clamp(value, Vector4.Zero, Vector4.One);

float[] result;

Member

Since we know that the maximum length is 4 could we stackalloc the array and pass it as a sliced Span<byte> to the Interpolate method?

Contributor Author

I'm not sure what you mean by the Span<byte> part, but yes, I could also use stackalloc in other places and decrease memory allocations in general. I'll keep this in mind when I continue to work on it.

Member

Sorry, I mean Span<float>.

This is what I mean:

We use a struct wrapping around a fixed buffer and convert Vector4 to it using Unsafe.As.
We can then slice that buffer using Span<float> and prevent any allocation on the heap plus reduce the number of methods you need.

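    using System;
    using System.Numerics;
    using System.Runtime.CompilerServices;

    unsafe class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Using Unsafe to do clever things!");

            // Reinterpret the Vector4 as a struct wrapping a fixed buffer of four floats.
            Vector4 v = new Vector4(1, 2, 3, 4);
            Floats f = Unsafe.As<Vector4, Floats>(ref v);

            // Slice the same stack memory to the required channel count;
            // nothing is allocated on the heap.
            Interpolate(new Span<float>(f.Values, 1));
            Interpolate(new Span<float>(f.Values, 2));
            Interpolate(new Span<float>(f.Values, 3));
            Interpolate(new Span<float>(f.Values, 4));

            Console.ReadLine();
        }

        // A single method handles any channel count from 1 to 4.
        private static void Interpolate(Span<float> span)
        {
            Console.WriteLine($"Span of length {span.Length} passed.");
            for (int i = 0; i < span.Length; i++)
            {
                Console.WriteLine($"Value at {i} equals {span[i]}");
            }
        }

        // Same size and layout as four consecutive floats.
        public unsafe struct Floats
        {
            public fixed float Values[4];
        }
    }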

This will print:

Using Unsafe to do clever things!
Span of length 1 passed.
Value at 0 equals 1
Span of length 2 passed.
Value at 0 equals 1
Value at 1 equals 2
Span of length 3 passed.
Value at 0 equals 1
Value at 1 equals 2
Value at 2 equals 3
Span of length 4 passed.
Value at 0 equals 1
Value at 1 equals 2
Value at 2 equals 3
Value at 3 equals 4

Contributor Author

Ah yes that makes more sense. Thank you for the example, doing it like that is a lot better.

}

float[] factors = new float[this.nodeCount];
for (int i = 0; i < factors.Length; i++)

Member

This looks like it could be vectorized but I could be wrong, @antonfirsov What do you think?

Member

Depends on what if (((i >> j) & 1) == 1) does. We need to eliminate branches inside loops for vectorization.
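One possible way to drop that branch, as a sketch only: it assumes the branch merely selects between a "high" weight and its complement `1 - fraction`, which may not match what the PR's loop body actually does. The class and method names are made up for this example:

```csharp
using System;

public static class BranchlessSketch
{
    // Branching form: pick the "high" weight when bit j of i is set.
    public static float WithBranch(int i, int j, float fraction)
    {
        if (((i >> j) & 1) == 1)
        {
            return fraction;
        }

        return 1F - fraction;
    }

    // Branch-free form: the bit (0 or 1) blends the two candidates arithmetically.
    public static float WithoutBranch(int i, int j, float fraction)
    {
        float bit = (i >> j) & 1;
        return (bit * fraction) + ((1F - bit) * (1F - fraction));
    }
}
```

Both forms return the same value for every `i` and `j`; whether the arithmetic form is actually faster would need to be measured.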

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private float CalculateInvertedCie122(float value)
{
return ((float)Math.Pow(value, 1 / this.curve.G) - this.curve.B) / this.curve.A;

Member

This and others can use MathF

Contributor Author

Fixed!

@JBildstein
Contributor Author

@JimBobSquarePants, thank you very much for the review. I replaced the Math methods with MathF and will work on the interpolation a bit later.

For reference here's a quick explanation of the CLUT (Color LookUp Table) interpolation:
It's nothing more than an n-dimensional linear interpolation plus finding the nodes.
There are three things that need to be done and in the current interpolation method each step is a loop.

1) Finding nodes:
This is an example CLUT, two input channels (A, B), three output channels (X, Y, Z) and a grid point count of 3 for A and B (it could be different for each channel but usually isn't). The values of X, Y, Z are nonsense and don't matter for this example.

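| A | B | X | Y | Z |
|---|---|---|---|---|
| 0 | 0 | 0.1 | 0.1 | 0.1 |
| 0 | 0.5 | 0.2 | 0.2 | 0.2 |
| 0 | 1 | 0.3 | 0.3 | 0.3 |
| 0.5 | 0 | 0.4 | 0.4 | 0.4 |
| 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 0.5 | 1 | 0.6 | 0.6 | 0.6 |
| 1 | 0 | 0.7 | 0.7 | 0.7 |
| 1 | 0.5 | 0.8 | 0.8 | 0.8 |
| 1 | 1 | 0.9 | 0.9 | 0.9 |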

The values of A and B aren't actually stored anywhere, they can be calculated and this is what the inner loop does. The outer loop finds the nodes for the interpolation. To do the interpolation we need every variation of lower and higher values. E.g. if A = 0.3 and B = 0.8 then we need to interpolate the values at

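| A | B | Node | X | Y | Z |
|---|---|---|---|---|---|
| 0 | 0.5 | A low, B low | 0.2 | 0.2 | 0.2 |
| 0 | 1 | A low, B high | 0.3 | 0.3 | 0.3 |
| 0.5 | 0.5 | A high, B low | 0.5 | 0.5 | 0.5 |
| 0.5 | 1 | A high, B high | 0.6 | 0.6 | 0.6 |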

The line if (((i >> j) & 1) == 1) that @antonfirsov pointed out above was the simplest way I could think of to iterate over all variations of high and low (it's the same as an integer in binary). If there's a more vector friendly way I'd be happy to implement that.

2) Calculating the factors for interpolation:
The actual interpolation is the same as bilinear interpolation on the unit square (see Wikipedia), but done for n channels instead of just two.
This part calculates all the factors for the third loop.
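Written out as a formula, this is standard multilinear interpolation; the notation here is my own for illustration and does not come from the code. Let $t_j$ be the fractional position of input channel $j$ between its low and high grid point, let $b_j(i)$ be bit $j$ of the node index $i$ (0 = low, 1 = high), and let $v_{i,c}$ be the value of output channel $c$ at node $i$:

$$
w_i = \prod_{j=0}^{n-1} \bigl( b_j(i)\, t_j + (1 - b_j(i))\,(1 - t_j) \bigr),
\qquad
y_c = \sum_{i=0}^{2^n - 1} w_i \, v_{i,c}
$$

For the example above (A = 0.3, B = 0.8, grid spacing 0.5) this gives $t_A = 0.3 / 0.5 = 0.6$ and $t_B = (0.8 - 0.5) / 0.5 = 0.6$. The four factors are $0.4 \cdot 0.4 = 0.16$ (A low, B low), $0.4 \cdot 0.6 = 0.24$ (A low, B high), $0.6 \cdot 0.4 = 0.24$ (A high, B low) and $0.6 \cdot 0.6 = 0.36$ (A high, B high), which sum to 1.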

3) Interpolation of the output:
This loop calculates the final interpolated output values for each output channel using the previously calculated factors.
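The three steps can be sketched in one small routine. This is a simplified illustration and not the code in ClutCalculator.cs: it assumes the same grid point count (at least 2) for every input channel and a jagged table whose rows are ordered like the example table, with the last input varying fastest. `ClutSketch` and its parameters are hypothetical names:

```csharp
using System;

public static class ClutSketch
{
    // clut: rows ordered like the example table (last input varies fastest).
    // gridPoints: grid points per input channel (assumed equal for all inputs, >= 2).
    public static float[] Interpolate(float[][] clut, int gridPoints, float[] input)
    {
        int n = input.Length;
        int outputCount = clut[0].Length;
        var low = new int[n];        // index of the lower grid point per input
        var fraction = new float[n]; // position between the lower and higher grid point

        for (int j = 0; j < n; j++)
        {
            float scaled = Math.Clamp(input[j], 0F, 1F) * (gridPoints - 1);
            low[j] = Math.Min((int)scaled, gridPoints - 2);
            fraction[j] = scaled - low[j];
        }

        var result = new float[outputCount];
        for (int i = 0; i < (1 << n); i++)
        {
            // 1) Find the node: bit j of i selects low (0) or high (1) for input j.
            int row = 0;
            float weight = 1F;
            for (int j = 0; j < n; j++)
            {
                int bit = (i >> j) & 1;
                row = (row * gridPoints) + low[j] + bit;

                // 2) Factor for this node: product of the per-channel weights.
                weight *= bit == 1 ? fraction[j] : 1F - fraction[j];
            }

            // 3) Accumulate the weighted node values for every output channel.
            for (int c = 0; c < outputCount; c++)
            {
                result[c] += weight * clut[row][c];
            }
        }

        return result;
    }
}
```

With the example table and the input A = 0.3, B = 0.8, each output channel comes out at about 0.44 (0.16 · 0.2 + 0.24 · 0.3 + 0.24 · 0.5 + 0.36 · 0.6).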

As a reference, this is code from the official ICC repo: InterpND
They also have separate interpolation routines for a channel count of 1 to 6 and it'll likely be a lot faster. I'd like to do the same later even if it won't be pretty. Having a working n-dimensional interpolation is still beneficial for reference/comparison and for potential expansion later.

@CLAassistant

CLAassistant commented Aug 31, 2018

CLA assistant check
All committers have signed the CLA.

@JimBobSquarePants
Member

Hey @JBildstein I got this back up to date with the master. 1097 commits!

You'll have to re-sign the CLA, I'm afraid, because we had to reimplement it to work as a single sign-up across all our projects.

@JBildstein
Contributor Author

impressive number! I'll soon be able to add some to that. I also have been working on some color conversion code lately that could be useful (it's using Vectors and is pretty fast)
I signed the CLA again, no problem.

@JimBobSquarePants
Member

Great to hear! Looking forward to seeing whatever genius you produce.

@JBildstein
Contributor Author

Been fiddling around with the CLUT interpolation:

| Method | Job | Runtime | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|
| Vectorized | Clr | Clr | 48.69 ns | 0.2570 ns | 0.2404 ns | - | 0 B |
| Looped | Clr | Clr | 299.21 ns | 3.3207 ns | 3.1062 ns | 0.0277 | 88 B |
| Vectorized | Core | Core | 51.18 ns | 0.3024 ns | 0.2829 ns | - | 0 B |
| Looped | Core | Core | 240.43 ns | 1.4354 ns | 1.3427 ns | 0.0277 | 88 B |

The tested CLUT has three channels input and two channels output.
Looped is the current implementation (for any amount of in- or output channels), Vectorized is a specific implementation for a 3-channel input CLUT and is using various Vector structs.
I did a specific implementation each for an input channel count of 1, 2, 3 and 4.
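To give an idea of what a channel-count-specific routine can look like, here is a minimal sketch for a 2-channel input CLUT. It is not the implementation benchmarked above; `Clut2DSketch` is a hypothetical name, and it assumes the nodes are stored as one `Vector4` per grid node (up to four output channels) with the second input varying fastest:

```csharp
using System;
using System.Numerics;

public static class Clut2DSketch
{
    // nodes: gridPoints * gridPoints entries, second input varies fastest.
    // Each Vector4 holds up to four output channels of one node.
    public static Vector4 Interpolate(Vector4[] nodes, int gridPoints, Vector2 input)
    {
        Vector2 scaled = Vector2.Clamp(input, Vector2.Zero, Vector2.One) * (gridPoints - 1);
        int a = Math.Min((int)scaled.X, gridPoints - 2);
        int b = Math.Min((int)scaled.Y, gridPoints - 2);
        float ta = scaled.X - a;
        float tb = scaled.Y - b;

        // Two lerps along B, then one along A; all output channels at once.
        int row = (a * gridPoints) + b;
        Vector4 lowA = Vector4.Lerp(nodes[row], nodes[row + 1], tb);
        Vector4 highA = Vector4.Lerp(nodes[row + gridPoints], nodes[row + gridPoints + 1], tb);
        return Vector4.Lerp(lowA, highA, ta);
    }
}
```

Because the node count is fixed at four, there are no inner loops and no temporary arrays, which is where the allocation-free behaviour of such specialized versions would come from.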

Need to add a few more tests (4-channel input CLUT is still missing) but other than that it's looking pretty good.

@JimBobSquarePants
Member

Oh that's great news! 😄

I hope the rapid churn isn't causing you too many problems. I can see there's some conflicts going on already. I think you can just use all the listed files from master.

@JBildstein
Contributor Author

no worries, I haven't changed anything in those files so I can take them from master as you say.

@JimBobSquarePants
Member

Ace... I've just merged the colorspace API into master; I don't know if that's of any interest/use to you.

@JimBobSquarePants
Member

JimBobSquarePants commented Jun 18, 2020

@JBildstein I just took some time to get this all up and running with all the tests passing (well, almost... there are some variance issues, but that is to be expected and we can pad the difference in the tests) so we can try and move forward.

Would this be something you would be able to pick up on again?

JimBobSquarePants and others added 27 commits December 18, 2020 15:39
…bcr-conversion

Vectorize Jpeg Encoder Color Conversion
Assembly for loading in the loop went from:
```asm
vmovss xmm2, [rax]
vbroadcastss xmm2, xmm2
vmovss xmm3, [rax+4]
vbroadcastss xmm3, xmm3
vinsertf128 ymm2, ymm2, xmm3, 1
```
To:
```asm
vmovsd xmm3, [rax]
vbroadcastsd ymm3, xmm3
vpermps ymm3, ymm1, ymm3
```
Speed improvements to resize kernel (w/ SIMD)