[WIP] Color conversion with ICC profiles #273

JBildstein · 2017-07-07T22:52:57Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that my changes follow the existing coding patterns and practices as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

As the title says, this adds methods for converting colors with an ICC profile.

Architecturally, the idea is that the profile is checked once for available and appropriate conversion methods, and then a delegate is stored that takes only the color values to convert and returns the calculated values. The possible performance penalty of using a delegate is far smaller than that of searching through the profile for every conversion. I'm open to other suggestions, though.
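To illustrate the "check once, store a delegate" idea, here is a minimal sketch. `DemoProfile`, `DemoConverter` and the two conversion paths are hypothetical stand-ins invented for this example; they are not the types or the logic used in this PR.

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for a parsed profile (not the ImageSharp profile type).
public sealed class DemoProfile
{
    public bool HasLut { get; set; }

    public float Gamma { get; set; } = 2.2f;
}

public sealed class DemoConverter
{
    // Chosen once in the constructor, then reused for every color.
    private readonly Func<Vector4, Vector4> conversion;

    public DemoConverter(DemoProfile profile)
    {
        if (profile.HasLut)
        {
            // Placeholder for a LUT-based path; a real one would capture the tables here.
            this.conversion = value => value;
        }
        else
        {
            // Capture only the data needed for the per-color calculation.
            float gamma = profile.Gamma;
            this.conversion = value => new Vector4(
                MathF.Pow(value.X, gamma),
                MathF.Pow(value.Y, gamma),
                MathF.Pow(value.Z, gamma),
                value.W);
        }
    }

    // The hot path never inspects the profile again.
    public Vector4 Convert(Vector4 value) => this.conversion(value);
}
```

The profile inspection happens once per converter instance, and each `Convert` call costs one delegate invocation.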

There are classes to convert from the profile connection space (PCS, which can be XYZ or Lab) to the data space (RGB, CMYK, etc.) and vice versa. There are also classes to convert from PCS to PCS and from data to data; they are only used for special profiles and are not important for us now, but I added them for completeness' sake.

A challenge here is writing tests, because of the complexity of the calculations and the large number of possible conversion paths. This is a rough list of the paths that exist:

"A to B" and "B to A" tags
- IccLut8TagDataEntry
  - Input IccLut[], Clut, Output IccLut[]
  - Matrix(3x3), Input IccLut[], IccClut, Output IccLut[]
- IccLut16TagDataEntry
  - Input IccLut[], IccClut, Output IccLut[]
  - Matrix(3x3), Input IccLut[], IccClut, Output IccLut[]
- IccLutAToBTagDataEntry/IccLutBToATagDataEntry (Curve types can either be IccCurveTagDataEntry or IccParametricCurveTagDataEntry (which has several curve subtypes))
  - CurveA[], Clut, CurveM[], Matrix(3x1), Matrix(3x3), CurveB[]
  - CurveA[], Clut, CurveB[]
  - CurveM[], Matrix(3x1), Matrix(3x3), CurveB[]
  - CurveB[]
"D to B" tags
- IccMultiProcessElementsTagDataEntry that contains an array of any of those types in any order:
  - IccCurveSetProcessElement
    - IccOneDimensionalCurve[] where each curve can have several curve subtypes
  - IccMatrixProcessElement
    - Matrix(Nr. of input Channels by Nr. of output Channels), Matrix(Nr. of output channels by 1)
  - IccClutProcessElement
    - IccClut
Color Trc
- Matrix(3x3), one curve for R, G and B each (Curve types can either be IccCurveTagDataEntry or IccParametricCurveTagDataEntry (which has several curve subtypes))
Gray Trc
- Curve (Curve type can either be IccCurveTagDataEntry or IccParametricCurveTagDataEntry (which has several curve subtypes))

The three main approaches in that list are:

- A to B/B to A: using a combination of lookup tables, matrices and curves
- D to B: using a chain of multi process elements (curves, matrices or lookup)
- Trc: using curves (and matrices for color but not for gray)

The most used approaches are Color Trc for RGB profiles and LutAToB/LutBToA for CMYK profiles.
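As a rough sketch of the Color Trc path (data space to PCS XYZ), each channel is first passed through its own curve and the result is then multiplied by the 3x3 matrix, written here as three colorant columns. `ColorTrcSketch` and its parameter names are hypothetical; the curves are passed in as plain delegates rather than the profile's curve types.

```csharp
using System;
using System.Numerics;

public static class ColorTrcSketch
{
    // Linearize each channel with its own curve, then combine the three
    // colorant columns (the 3x3 matrix) to get the PCS XYZ value.
    public static Vector3 ToPcsXyz(
        Vector3 rgb,
        Func<float, float> redCurve,
        Func<float, float> greenCurve,
        Func<float, float> blueCurve,
        Vector3 redColorant,
        Vector3 greenColorant,
        Vector3 blueColorant)
    {
        float r = redCurve(rgb.X);
        float g = greenCurve(rgb.Y);
        float b = blueCurve(rgb.Z);

        // Equivalent to multiplying the column matrix [red green blue] by (r, g, b).
        return (redColorant * r) + (greenColorant * g) + (blueColorant * b);
    }
}
```

The Gray Trc path would be the same idea with a single curve and no matrix.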

Todo list:

- Integrate with the rest of the project
- Write tests that cover all conversion paths
- Review architecture
- Improve speed and accuracy of the calculations

Help and suggestions are very welcome.

codecov-io · 2017-07-07T23:07:22Z

Codecov Report

Merging #273 into master will decrease coverage by 1.37%.
The diff coverage is 0%.

```diff
@@            Coverage Diff             @@
##           master     #273      +/-   ##
==========================================
- Coverage   86.86%   85.49%   -1.38%
==========================================
  Files         849      678     -171
  Lines       36075    30229    -5846
  Branches     2660     2223     -437
==========================================
- Hits        31338    25843    -5495
+ Misses       3971     3724     -247
+ Partials      766      662     -104
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...version/Implementation/Icc/IccConverterBase.Trc.cs | `0% <0%> (ø)` | |
| ...tation/Icc/IccConverterBase.MultiProcessElement.cs | `0% <0%> (ø)` | |
| ...version/Implementation/Icc/IccConverterBase.Lut.cs | `0% <0%> (ø)` | |
| ...version/Implementation/Icc/IccPcsToPcsConverter.cs | `0% <0%> (ø)` | |
| ...Implementation/Icc/IccConverterbase.Conversions.cs | `0% <0%> (ø)` | |
| ...rsion/Implementation/Icc/IccDataToDataConverter.cs | `0% <0%> (ø)` | |
| ...ersion/Implementation/Icc/IccPcsToDataConverter.cs | `0% <0%> (ø)` | |
| ...ersion/Implementation/Icc/IccDataToPcsConverter.cs | `0% <0%> (ø)` | |
| ...sion/Implementation/Icc/IccConverterBase.Checks.cs | `0% <0%> (ø)` | |
| ...cessing/Transforms/Resamplers/Lanczos2Resampler.cs | `0% <0%> (-100%)` | ⬇️ |
| ... and 1073 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ce6eed6...3b5a8c0.

JimBobSquarePants · 2017-10-06T06:22:53Z

@JBildstein Quick technical question. Do you think we could tie this in somehow with the SIMD colorspace transforms in our jpeg decoder?

https://github.com/SixLabors/ImageSharp/tree/8b2b7c780821a18db351e796ce41178c9ce95e95/src/ImageSharp/Formats/Jpeg/Common/Decoder/ColorConverters

cc @antonfirsov

JBildstein · 2017-10-06T07:59:44Z

@JimBobSquarePants I think that should be possible, yes. However, you will have to create a new instance of the converter for each image because some data has to be pulled from the ICC profile. It would be possible to pass the profile for each conversion call, but that would be horribly inefficient. So the GetConverter method would need an overload that takes an ICC profile.

JimBobSquarePants · 2017-10-08T01:15:47Z

@JBildstein I think we can manage something like that. Thanks. 👍

JimBobSquarePants · 2017-12-15T02:03:41Z

@JBildstein Apologies but I think I just broke the build merging master into your branch!

JBildstein · 2018-01-22T18:31:18Z

@JimBobSquarePants no worries, the tests don't pass anyway at the moment (or rather, some aren't implemented yet).

I finally found some time to work on this again and was able to implement most of the calculations. However, most of them haven't been tested yet and likely contain errors. It's rather cumbersome to do this so it takes a while.

To make things more manageable, I decided to implement everything with Vector4 so only colors with up to 4 channels are supported. And (for now) I also won't implement multi process elements mainly because I have yet to find a profile using them.

I would be very glad if someone could have a look at my n-dimensional linear interpolation (ClutCalculator.cs). I think it basically works, but I'm a bit lost and not sure about the correctness.
I know for sure that finding the nodes is correct, but I'm not sure about the actual interpolation.

JimBobSquarePants · 2018-01-23T00:01:51Z

@JBildstein That's great news! I'm at your disposal, so I'll try to get my head around the calculator for you and offer any advice or help where I can.

JBildstein · 2018-01-23T00:40:34Z

@JimBobSquarePants great, thank you very much. I'll be around on Gitter for questions and discussions.

JimBobSquarePants

Just a quick once over. I don't know enough about the conversion process to give you any really useful feedback though.

JimBobSquarePants · 2018-02-08T11:42:18Z

src/ImageSharp/ColorSpaces/Conversion/Implementation/Icc/Calculators/ClutCalculator.cs

```diff
+    {
+        private int inputCount;
+        private int outputCount;
+        private float[][] lut;
```

Is this array always jagged?

JBildstein · 2018-02-12

It actually doesn't have to be; the second array is always the same length. In the Interpolate method it's currently useful, though, because I can just take the array reference instead of copying the values.
Do you think it would be better to have it in a single memory block?

JimBobSquarePants · 2018-02-13

We have a helper, Fast2DArray<T>, which might be useful here for n-length 2D objects; it's faster than the jagged array. Though if we template the interpolation, we should look at custom structs for each known 2D grid, since they'd be much faster.
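For illustration, a single memory block can still hand out a row without copying by slicing it. This is a minimal sketch under the assumption that every row has the same length; `FlatLut` is a hypothetical helper written for this example and is not the Fast2DArray<T> type mentioned above.

```csharp
using System;

public sealed class FlatLut
{
    private readonly float[] values;   // all rows stored back to back
    private readonly int outputCount;  // length of every row

    public FlatLut(float[][] jagged)
    {
        this.outputCount = jagged[0].Length;
        this.values = new float[jagged.Length * this.outputCount];
        for (int row = 0; row < jagged.Length; row++)
        {
            Array.Copy(jagged[row], 0, this.values, row * this.outputCount, this.outputCount);
        }
    }

    // Returns a view over one row of the block; nothing is copied.
    public ReadOnlySpan<float> GetRow(int row)
        => new ReadOnlySpan<float>(this.values, row * this.outputCount, this.outputCount);
}
```

The copy happens once at construction; lookups afterwards are an offset calculation plus a slice.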

JimBobSquarePants · 2018-02-08T11:45:39Z

src/ImageSharp/ColorSpaces/Conversion/Implementation/Icc/Calculators/ClutCalculator.cs

```diff
+        {
+            Vector4.Clamp(value, Vector4.Zero, Vector4.One);
+
+            float[] result;
```

Since we know that the maximum length is 4 could we stackalloc the array and pass it as a sliced Span<byte> to the Interpolate method?

JBildstein · 2018-02-12

I'm not sure what you mean by the Span<byte> part, but yes, I could use stackalloc in other places as well and generally decrease memory allocations. I'll keep this in mind when I continue to work on it.

JimBobSquarePants · 2018-02-13

Sorry, I mean Span<float>.

This is what I mean:

We use a struct wrapping around a fixed buffer and convert Vector4 to it using Unsafe.As.
We can then slice that buffer using Span<float> and prevent any allocation on the heap plus reduce the number of methods you need.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

unsafe class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Using Unsafe to do clever things!");

        Vector4 v = new Vector4(1, 2, 3, 4);

        // Reinterpret the Vector4 as a struct wrapping a fixed buffer of four floats.
        Floats f = Unsafe.As<Vector4, Floats>(ref v);

        // Slice the same buffer to different lengths without allocating on the heap.
        Interpolate(new Span<float>(f.Values, 1));
        Interpolate(new Span<float>(f.Values, 2));
        Interpolate(new Span<float>(f.Values, 3));
        Interpolate(new Span<float>(f.Values, 4));

        Console.ReadLine();
    }

    private static void Interpolate(Span<float> span)
    {
        Console.WriteLine($"Span of length {span.Length} passed.");
        for (int i = 0; i < span.Length; i++)
        {
            Console.WriteLine($"Value at {i} equals {span[i]}");
        }
    }

    public unsafe struct Floats
    {
        public fixed float Values[4];
    }
}
```

This will print out:

```
Using Unsafe to do clever things!
Span of length 1 passed.
Value at 0 equals 1
Span of length 2 passed.
Value at 0 equals 1
Value at 1 equals 2
Span of length 3 passed.
Value at 0 equals 1
Value at 1 equals 2
Value at 2 equals 3
Span of length 4 passed.
Value at 0 equals 1
Value at 1 equals 2
Value at 2 equals 3
Value at 3 equals 4
```

JBildstein · 2018-02-13

Ah yes, that makes more sense. Thank you for the example; doing it like that is a lot better.

JimBobSquarePants · 2018-02-08T11:49:20Z

src/ImageSharp/ColorSpaces/Conversion/Implementation/Icc/Calculators/ClutCalculator.cs

```diff
+            }
+
+            float[] factors = new float[this.nodeCount];
+            for (int i = 0; i < factors.Length; i++)
```

This looks like it could be vectorized, but I could be wrong. @antonfirsov, what do you think?

antonfirsov · 2018-02-08

Depends on what `if (((i >> j) & 1) == 1)` does. We need to eliminate branches inside loops for vectorization.
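One way to drop the branch, sketched under the assumption that the `if` only chooses between a fraction `t` and `1 - t` for each channel: turn the bit into 0 or 1 and fold it into the arithmetic. `BranchFreeSelect` and `NodeWeight` are hypothetical names, and whether this actually helps the JIT or a hand-written SIMD version would need measuring.

```csharp
public static class BranchFreeSelect
{
    // Weight of node i given the per-channel fractions. Bit j of i selects
    // the high (1) or low (0) grid point of channel j.
    public static float NodeWeight(int i, float[] fractions)
    {
        float weight = 1f;
        for (int j = 0; j < fractions.Length; j++)
        {
            float bit = (i >> j) & 1; // 0f or 1f, no branch
            float t = fractions[j];

            // bit == 0 gives (1 - t); bit == 1 gives (1 - t) + (2t - 1) = t.
            weight *= (1f - t) + (bit * ((2f * t) - 1f));
        }

        return weight;
    }
}
```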

JimBobSquarePants · 2018-02-08T11:51:13Z

...mageSharp/ColorSpaces/Conversion/Implementation/Icc/Calculators/ParametricCurveCalculator.cs

```diff
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        private float CalculateInvertedCie122(float value)
+        {
+            return ((float)Math.Pow(value, 1 / this.curve.G) - this.curve.B) / this.curve.A;
```

This and others can use MathF.
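A small sketch of what the MathF version could look like. It assumes the forward curve is y = (a·x + b)^g, which is what the quoted inverse implies, and it ignores the curve's lower cut-off; `Cie122Sketch` and its parameters are hypothetical and stand in for the fields on `this.curve`.

```csharp
using System;

public static class Cie122Sketch
{
    // Forward curve (assumed): y = (a * x + b)^g.
    public static float Calculate(float x, float g, float a, float b)
        => MathF.Pow((a * x) + b, g);

    // Inverse: x = (y^(1/g) - b) / a. MathF keeps everything in single
    // precision, so no cast from double is needed.
    public static float CalculateInverted(float y, float g, float a, float b)
        => (MathF.Pow(y, 1f / g) - b) / a;
}
```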

JBildstein · 2018-02-12T19:18:31Z

@JimBobSquarePants, thank you very much for the review. I replaced the Math methods with MathF and will work on the interpolation a bit later.

For reference, here's a quick explanation of the CLUT (Color LookUp Table) interpolation:
It's nothing more than an n-dimensional linear interpolation plus finding the nodes.
There are three things that need to be done, and in the current interpolation method each step is a loop.

1) Finding nodes:
This is an example CLUT, two input channels (A, B), three output channels (X, Y, Z) and a grid point count of 3 for A and B (it could be different for each channel but usually isn't). The values of X, Y, Z are nonsense and don't matter for this example.

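| A | B | X | Y | Z |
|---|---|---|---|---|
| 0 | 0 | 0.1 | 0.1 | 0.1 |
| 0 | 0.5 | 0.2 | 0.2 | 0.2 |
| 0 | 1 | 0.3 | 0.3 | 0.3 |
| 0.5 | 0 | 0.4 | 0.4 | 0.4 |
| 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 0.5 | 1 | 0.6 | 0.6 | 0.6 |
| 1 | 0 | 0.7 | 0.7 | 0.7 |
| 1 | 0.5 | 0.8 | 0.8 | 0.8 |
| 1 | 1 | 0.9 | 0.9 | 0.9 |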

The values of A and B aren't actually stored anywhere; they can be calculated, and this is what the inner loop does. The outer loop finds the nodes for the interpolation. To do the interpolation we need every combination of lower and higher values. E.g. if A = 0.3 and B = 0.8, then we need to interpolate the values at

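| A | B | Node | X | Y | Z |
|---|---|---|---|---|---|
| 0 | 0.5 | A low, B low | 0.2 | 0.2 | 0.2 |
| 0 | 1 | A low, B high | 0.3 | 0.3 | 0.3 |
| 0.5 | 0.5 | A high, B low | 0.5 | 0.5 | 0.5 |
| 0.5 | 1 | A high, B high | 0.6 | 0.6 | 0.6 |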

The line `if (((i >> j) & 1) == 1)` that @antonfirsov pointed out above was the simplest way I could think of to iterate over all combinations of high and low (it's the same as counting up an integer in binary). If there's a more vector-friendly way, I'd be happy to implement that.
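The node search for the example table can be sketched as follows. This is an illustration rather than the code in ClutCalculator.cs: `ClutNodes.FindNodeRows` is a hypothetical helper, and it assumes that every channel has the same grid point count (at least 2) and that the last input channel varies fastest, as in the table above.

```csharp
using System;

public static class ClutNodes
{
    // Returns the flat row index of each of the 2^n nodes surrounding the input.
    public static int[] FindNodeRows(float[] input, int gridPoints)
    {
        int n = input.Length;
        int[] low = new int[n];
        for (int j = 0; j < n; j++)
        {
            float position = Math.Clamp(input[j], 0f, 1f) * (gridPoints - 1);

            // Keep the low index below the last grid point so low + 1 stays valid.
            low[j] = Math.Min((int)position, gridPoints - 2);
        }

        int[] rows = new int[1 << n];
        for (int i = 0; i < rows.Length; i++)
        {
            int row = 0;
            for (int j = 0; j < n; j++)
            {
                // Bit (n - 1 - j) of i: 0 = low grid point, 1 = high grid point.
                int bit = (i >> (n - 1 - j)) & 1;
                row = (row * gridPoints) + low[j] + bit;
            }

            rows[i] = row;
        }

        return rows;
    }
}
```

For A = 0.3 and B = 0.8 with 3 grid points this yields rows 1, 2, 4 and 5 (counting from 0), which are the four table rows listed above.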

2) Calculating the factors for interpolation:
The actual interpolation is the same as bilinear interpolation on the unit square, as described on Wikipedia, but done for n channels instead of just two.
This part calculates all the factors for the third loop.
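As a sketch of the factor calculation: each node's factor is the product, over all input channels, of either the fractional position `t` (node uses the high grid point) or `1 - t` (node uses the low grid point). `ClutFactors` is a hypothetical helper that assumes the same grid point count for every channel and the same node ordering as the table above (last channel in the lowest bit).

```csharp
using System;

public static class ClutFactors
{
    public static float[] Calculate(float[] input, int gridPoints)
    {
        int n = input.Length;
        float[] fraction = new float[n];
        for (int j = 0; j < n; j++)
        {
            float position = Math.Clamp(input[j], 0f, 1f) * (gridPoints - 1);
            int low = Math.Min((int)position, gridPoints - 2);

            // Distance from the low grid point, in grid units (0..1).
            fraction[j] = position - low;
        }

        float[] factors = new float[1 << n];
        for (int i = 0; i < factors.Length; i++)
        {
            float factor = 1f;
            for (int j = 0; j < n; j++)
            {
                int bit = (i >> (n - 1 - j)) & 1;
                factor *= bit == 1 ? fraction[j] : 1f - fraction[j];
            }

            factors[i] = factor;
        }

        return factors;
    }
}
```

For A = 0.3 and B = 0.8 with 3 grid points, both fractions are 0.6, so the factors are 0.16, 0.24, 0.24 and 0.36, and they sum to 1.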

3) Interpolation of the output:
This loop calculates the final interpolated output values for each output channel using the previously calculated factors.
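Putting the three steps together for the example table gives the sketch below. It is a compact reference version written for this explanation, not the code in the PR: `ClutInterpolation.Interpolate` is hypothetical, the `lut` rows are ordered as in the table above, and every channel is assumed to have the same grid point count.

```csharp
using System;

public static class ClutInterpolation
{
    public static float[] Interpolate(float[][] lut, int gridPoints, float[] input)
    {
        int n = input.Length;
        int[] low = new int[n];
        float[] fraction = new float[n];
        for (int j = 0; j < n; j++)
        {
            float position = Math.Clamp(input[j], 0f, 1f) * (gridPoints - 1);
            low[j] = Math.Min((int)position, gridPoints - 2);
            fraction[j] = position - low[j];
        }

        float[] output = new float[lut[0].Length];
        for (int i = 0; i < (1 << n); i++)
        {
            // Steps 1 and 2: node row and its factor for this high/low combination.
            int row = 0;
            float factor = 1f;
            for (int j = 0; j < n; j++)
            {
                int bit = (i >> (n - 1 - j)) & 1;
                row = (row * gridPoints) + low[j] + bit;
                factor *= bit == 1 ? fraction[j] : 1f - fraction[j];
            }

            // Step 3: accumulate the weighted node values per output channel.
            for (int c = 0; c < output.Length; c++)
            {
                output[c] += factor * lut[row][c];
            }
        }

        return output;
    }
}
```

With the example table and A = 0.3, B = 0.8, each output channel comes out as 0.16·0.2 + 0.24·0.3 + 0.24·0.5 + 0.36·0.6 = 0.44.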

As a reference, this is code from the official ICC repo: InterpND
They also have separate interpolation routines for channel counts of 1 to 6, which are likely a lot faster. I'd like to do the same later even if it won't be pretty. Having a working n-dimensional interpolation is still beneficial for reference/comparison and for potential expansion later.

CLAassistant · 2018-08-31T20:03:58Z

All committers have signed the CLA.

JimBobSquarePants · 2018-08-31T20:07:15Z

Hey @JBildstein I got this back up to date with the master. 1097 commits!

You'll have to re-sign the CLA, I'm afraid, because we had to reimplement it to work as a single sign-up across all our projects.

JBildstein · 2018-09-01T06:50:50Z

Impressive number! I'll soon be able to add some to that. I have also been working on some color conversion code lately that could be useful (it uses Vectors and is pretty fast).
I signed the CLA again, no problem.

JimBobSquarePants · 2018-09-01T07:33:50Z

Great to hear! Looking forward to seeing whatever genius you produce.

JBildstein · 2018-10-10T00:30:54Z

Been fiddling around with the CLUT interpolation:

| Method | Job | Runtime | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|
| Vectorized | Clr | Clr | 48.69 ns | 0.2570 ns | 0.2404 ns | - | 0 B |
| Looped | Clr | Clr | 299.21 ns | 3.3207 ns | 3.1062 ns | 0.0277 | 88 B |
| Vectorized | Core | Core | 51.18 ns | 0.3024 ns | 0.2829 ns | - | 0 B |
| Looped | Core | Core | 240.43 ns | 1.4354 ns | 1.3427 ns | 0.0277 | 88 B |

The tested CLUT has three channels input and two channels output.
Looped is the current implementation (for any amount of in- or output channels), Vectorized is a specific implementation for a 3-channel input CLUT and is using various Vector structs.
I did a specific implementation each for an input channel count of 1, 2, 3 and 4.
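To give an idea of what a channel-count-specific routine can look like, here is a sketch for a 2-channel input CLUT built on `Vector4.Lerp`. It is not the implementation benchmarked above: `Clut2D` is hypothetical, it stores each node's output as a `Vector4` (unused channels left at zero), and it assumes the same grid point count for both inputs with the second input varying fastest.

```csharp
using System;
using System.Numerics;

public static class Clut2D
{
    // lut holds gridPoints * gridPoints entries, second input channel varying fastest.
    public static Vector4 Interpolate(Vector4[] lut, int gridPoints, float a, float b)
    {
        float pa = Math.Clamp(a, 0f, 1f) * (gridPoints - 1);
        float pb = Math.Clamp(b, 0f, 1f) * (gridPoints - 1);
        int ia = Math.Min((int)pa, gridPoints - 2);
        int ib = Math.Min((int)pb, gridPoints - 2);

        int row = (ia * gridPoints) + ib;

        // Interpolate along B on the low-A and high-A edges, then along A.
        Vector4 lowA = Vector4.Lerp(lut[row], lut[row + 1], pb - ib);
        Vector4 highA = Vector4.Lerp(lut[row + gridPoints], lut[row + gridPoints + 1], pb - ib);
        return Vector4.Lerp(lowA, highA, pa - ia);
    }
}
```

All output channels are interpolated at once and nothing is allocated per call, which is the kind of saving the fixed-dimension versions aim for.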

Need to add a few more tests (4-channel input CLUT is still missing) but other than that it's looking pretty good.

JimBobSquarePants · 2018-10-10T12:21:42Z

Oh that's great news! 😄

I hope the rapid churn isn't causing you too many problems. I can see there are some conflicts already. I think you can just use all the listed files from master.

JBildstein · 2018-10-10T12:34:38Z

no worries, I haven't changed anything in those files so I can take them from master as you say.

JimBobSquarePants · 2018-10-10T16:31:08Z

Ace... I've just merged the colorspace API into master; I don't know if that's of any interest/use to you.

JimBobSquarePants · 2020-06-18T16:43:38Z

@JBildstein I just took some time to get this all up and running with all the tests passing (well, almost... there are some variance issues, but that is to be expected and we can pad the difference in the tests) so we can try and move forward.

Would this be something you would be able to pick up on again?


Assembly for loading in the loop went from:

```asm
vmovss xmm2, [rax]
vbroadcastss xmm2, xmm2
vmovss xmm3, [rax+4]
vbroadcastss xmm3, xmm3
vinsertf128 ymm2, ymm2, xmm3, 1
```

To:

```asm
vmovsd xmm3, [rax]
vbroadcastsd ymm3, xmm3
vpermps ymm3, ymm1, ymm3
```


JBildstein mentioned this pull request Sep 14, 2017

Support for reading writing ICC profiles in Jpeg #74

Closed

JimBobSquarePants mentioned this pull request Oct 6, 2017

Color Changed When Resize #129

Open

JimBobSquarePants added metadata:icc enhancement labels Dec 15, 2017

JimBobSquarePants added this to the 1.0.0 milestone Dec 15, 2017

JimBobSquarePants assigned JimBobSquarePants and JBildstein Dec 15, 2017

JimBobSquarePants reviewed Feb 8, 2018

JimBobSquarePants added the formats:jpeg label Jan 24, 2019

JimBobSquarePants modified the milestones: 1.0.0, Future Apr 24, 2020

JimBobSquarePants mentioned this pull request Jun 18, 2020

Opening and saving jpeg using CMYK color space drastically changes image's color #1238

Closed

4 tasks

JimBobSquarePants and others added 27 commits December 18, 2020 15:39

Update PixelOperationsTests.cs

41d98d6

Use explicit threadsafety declaration.

5659148

Merge pull request SixLabors#1481 from SixLabors/js/faster-resize

5ab593f

Much Faster sRGB Companding

Add PremultiplyAlpha to ResizeOptions

933dada

Split PixelConversionModifiers into a separate function.

900b73e

Updated referenced image submodule to latest origin master.

770dc06

Use this.maxColors when getting size of the reduced palette, fixes Si…

f85b686

…xLabors#1505

Add test case for SixLabors#1505

0830a97

Merge pull request SixLabors#1504 from ptasev/pt/resize-alpha-option

da7a8b7

Add PremultiplyAlpha to ResizeOptions

Merge branch 'master' into bp/Issue1505

b3146a7

Merge pull request SixLabors#1506 from SixLabors/bp/Issue1505

e2961dc

Fix for Issue SixLabors#1505

Add initial vectorized implementation with benchmarks

c602dd7

Fix mistakes in final touches

db6f90a

Add unit tests for both converters

51ce97f

Allow epsilon of 1F for existing LUT converter

47b0d9f

Improve algorithm

91b18b1

Merge pull request SixLabors#1508 from tkp1n/feature/vectorize-rgb2yc…

eab04e4

…bcr-conversion Vectorize Jpeg Encoder Color Conversion

Add initial FMA resize kernel convolve implementation

42632c7

Switch from FMA to AVX2 instructions

874e951

Revert to FMA, codegen improvements

941e173

Add unrolled FMA loop

493d04a

Add missing indexing update

407c2d9

Workaround for incorrect codegen on .NET 5

a7ca1b0

See Vector256.Create issue: dotnet/runtime#47236

Update image threshold for resize tests

e2211c3

Merge pull request SixLabors#1513 from SixLabors/sp/simd-resize-convolve

7eb5cc0

Speed improvements to resize kernel (w/ SIMD)

Merge branch 'master' into icc-color-conversion

2ee2351

JimBobSquarePants closed this Feb 17, 2021

JimBobSquarePants force-pushed the master branch from db51f69 to 172c48e Compare February 17, 2021 01:43

JimBobSquarePants mentioned this pull request Feb 27, 2021

Color conversion with ICC profiles #1567

Draft

8 tasks
