Fix cmyk_to_rgb causing off by one rounding errors. #1136

philippeitis · 2020-02-10T17:23:52Z

Running cmyk_to_rgb(&[0, 0, 0, 254]) produces the RGB color [0, 0, 0] when it should in fact produce RGB [1, 1, 1]:
R = 255 * (1 - C/255) * (1-K/255)
R = 255 * (1-0/255) * (1-254/255)
R = 255 * 1 * 1/ 255 = 1.

This bug occurs with a range of other values as well - 159 / 255 of the values [0, 0, 0, 0..255] are affected.
The fix involves reducing the number of multiplications and divisions, and also yields a ~15-25% speedup (results are from randomly generated images, using cargo build --release).

250 px
Orig took 0.0000020s
Fast took 0.0000015s
2500 px
Orig took 0.0000154s
Fast took 0.0000121s
25000 px
Orig took 0.0002961s
Fast took 0.0002334s
250000 px
Orig took 0.0017928s
Fast took 0.0014527s
2500000 px
Orig took 0.0255831s
Fast took 0.0213677s

I license past and future contributions under the dual MIT/Apache-2.0 license,
allowing licensees to chose either at their option.

(Resubmitted PR against next branch, not master - it's not specified which branch to specify pull requests against.)

Running cmyk_to_rgb(&[0, 0, 0, 254]) produces the RGB color [0, 0, 0] when it should in fact produce RGB [1, 1, 1]: R = 255 * (1 - C/255) * (1-K/255) R = 255 * (1-0/255) * (1-254/255) R = 255 * 1 * 1/ 255 = 1. This bug occurs with a range of other values as well - 159 / 255 of the values [0, 0, 0, 0..255] are affected. The fix involves reducing the number of multiplications and divisions, and also yields a ~25% speedup (results are from randomly generated images). ``` 250 px Orig took 0.0000020s Fast took 0.0000015s 2500 px Orig took 0.0000154s Fast took 0.0000121s 25000 px Orig took 0.0002961s Fast took 0.0002334s 250000 px Orig took 0.0017928s Fast took 0.0014527s 2500000 px Orig took 0.0255831s Fast took 0.0213677s ```

HeroicKatora · 2020-02-10T17:28:53Z

If it's a non-breaking change, it should be against master. If it is breaking then against next. You can change the target by clicking the Edit button on the top, it is not necessary to close and reopen another pull request.

philippeitis · 2020-02-10T18:01:29Z

Sorry - that's a pretty obvious feature to have, and I should've checked before closing and opening a new PR. I do think clarifying the purpose of master/next in the Contributors section would be helpful for future contributors.

I'm not sure if you would consider this change as breaking, since it does change output for certain values (from incorrect results to correct results).

HeroicKatora · 2020-02-10T18:43:33Z

Ah, 'breaking' as in a SemVer sense of changes to the interface. I don't think there are many value changes that would be considered breaking unless there was an explicit promise of additional guarantees in the documentation. (Maybe in exceptional circumstances such as format detection). Fixing the correctness of color conversion is not.

philippeitis · 2020-02-10T19:34:36Z

It's also possible to further optimize the code like so, but the difference is marginal (~ 1-5% on my computer) and requires specifying "target-cpu=native", otherwise performance regressions occur.

fn cmyk_to_rgb(input: &[u8]) -> Vec<u8> {
    let size = input.len() - input.len() / 4;
    let mut output = Vec::with_capacity(size);

    for pixel in input.chunks(4) {
        let c = f32::from(pixel[0]);
        let m = f32::from(pixel[1]);
        let y = f32::from(pixel[2]);
        let k = 255.0 - f32::from(pixel[3]);

        let k_sub = -k / 255.0;

        // CMY -> RGB
        let r = c.mul_add(k_sub, k) as u8;
        let g = m.mul_add(k_sub, k) as u8;
        let b = y.mul_add(k_sub, k) as u8;

        output.push(r);
        output.push(g);
        output.push(b);
    }

    output
}

HeroicKatora · 2020-02-10T20:13:43Z

I wanted to remark about how we might want to do the fix now, and performance later but then it occurred to me: Can we not avoid going through floating point entirely and stick to precise integer arithmetic? Your reformulation of the conversion should make it much more clear how to avoid the floating point steps by treating all intermediate computations as fractional arithmetic over the fixed denominator 255.

(e.g. let r = k - c * k_sub; is equivalent to (255*k - c*k)/255 = ((255 - c)*k)/255, is it not?)

HeroicKatora · 2020-02-10T20:15:51Z

Regardless: Can you push some tests and the benchmark as a separate commit so one can verify before and after the change?

This test checks all possible colours produced by CYMK (in one channel), and checks that the RGB conversion is correct.

philippeitis · 2020-02-10T21:14:17Z

(e.g. let r = k - c * k_sub; is equivalent to (255*k - c*k)/255 = ((255 - c)*k)/255, is it not?)

This is equivalent, but in general, division is slower than multiplication for all types, which is what makes my solution a lot faster (uses 1 div per step instead of 4), while the multiplies are pretty cheap.

HeroicKatora · 2020-02-10T21:33:32Z

This is equivalent, but in general, division is slower than multiplication for all types, which is what makes my solution a lot faster (uses 1 div per step instead of 4), while the multiplies are pretty cheap.

Please benchmark this before claiming it. Those are fixed integer divisions of 16-bit length which will get optimized to a multiplication with the inverse (or a shift). The floating point solution has three separate u8->fp->u8 conversions. Furthermore, an integer implementation has a much better chance of being compiled to SIMD instructions.

src/jpeg/decoder.rs

philippeitis · 2020-02-10T21:49:41Z

The following function

fn cmyk_to_rgb_fast_u16(input: &[u8]) -> Vec<u8> {
    let size = input.len() - input.len() / 4;
    let mut output = Vec::with_capacity(size);

    for pixel in input.chunks(4) {
        let c = u16::from(pixel[0]);
        let m = u16::from(pixel[1]);
        let y = u16::from(pixel[2]);
        let k = 255 - u16::from(pixel[3]);

        let k_sub = k;
        let k = k * 255;
        // CMY -> RGB
        let r = (k - c * k_sub) / 255;
        let g = (k - m * k_sub) / 255;
        let b = (k - y * k_sub) / 255;

        output.push(r as u8);
        output.push(g as u8);
        output.push(b as u8);
    }

    output
}

Benchmarked, best of 10, built with cargo build --release. Orig is the original implementation, fast is the current implementation which I've pushed, and fast_u16 is the one above.

250 pixels
Orig took 0.0000035s
Fast took 0.0000027s
fast_u16 took 0.000003s
2500 pixels
Orig took 0.0000312s
Fast took 0.000024s
fast_u16 took 0.0000285s
25000 pixels
Orig took 0.0001611s
Fast took 0.0001411s
fast_u16 took 0.0001605s
250000 pixels
Orig took 0.0015028s
Fast took 0.0013628s
fast_u16 took 0.0015633s
2500000 pixels
Orig took 0.017322s
Fast took 0.013907s
Fast, no fma took 0.0160462s
25000000 pixels
Orig took 0.17552221s
Fast took 0.14268969s
fast_u16 took 0.15868449s

As we can see, the current implementation using f32 and multiplies is faster than both the original solution and proposed u16 division solution.

HeroicKatora · 2020-02-10T21:52:53Z

Can we try a seriously optimized variant? Note the elided bounds checks if you check the assembly and the resulting use of SIMD (the main point to prefer short integer arithmetic). (Prediction: 4× to 16× speedup).

pub fn cmyk_to_rgb_int(input: &[u8]) -> Vec<u8> {
    let count = input.len()/4;
    let mut output = vec![0; 3*count];
    
    let in_pixels = input[..4*count].chunks_exact(4);
    let out_pixels = output[..3*count].chunks_exact_mut(3);

    for (pixel, outp) in in_pixels.zip(out_pixels) {
        let c = u16::from(pixel[0]);
        let m = u16::from(pixel[1]);
        let y = u16::from(pixel[2]);
        let k = 255u16 - u16::from(pixel[3]);

        // CMY -> RGB
        let r = (((255 - c)*k)/255) as u8;
        let g = (((255 - m)*k)/255) as u8;
        let b = (((255 - y)*k)/255) as u8;

        outp[0] = r;
        outp[1] = g;
        outp[2] = b;
    }

    output
}

…atora

…atora

HeroicKatora · 2020-02-10T22:24:35Z

That is actually faster than the original version

But you have previously written:

25000000 pixels
Orig took 0.17552221s
Fast took 0.14268969s
fast_u16 took 0.15868449s

And now we only have a tenth of the pixels in more time:

2500000 pixels
Orig took 0.4389242
float took 0.2150764
int took 0.2374431

What are we even benchmarking here?

philippeitis · 2020-02-10T22:38:34Z

Sorry, didn't include the release flag - it seems that the performance is in favour of your implementation, so I pushed that - correct benchmarks below:

250 pixels
Orig took 0.0000029s
float took 0.0000007s
int took 0.0000005s
2500 pixels
Orig took 0.0000154s
float took 0.0000035s
int took 0.0000025s
25000 pixels
Orig took 0.0001522s
float took 0.0000566s
int took 0.0000473s
250000 pixels
Orig took 0.001548s
float took 0.0005909s
int took 0.0004657s
2500000 pixels
Orig took 0.0172378s
float took 0.0053335s
int took 0.004726s
25000000 pixels
Orig took 0.1756037s
float took 0.0544639s
int took 0.0443116s

HeroicKatora · 2020-02-10T23:47:12Z

What hardware are you running? On two machines I could test it on, the floating point turned out faster by ~5%. Those are an i5-4690S and an i5-6200U. If you have a newer generation and that is faster for the integer version, I would use the integer version. Otherwise use your floating point division.

philippeitis · 2020-02-11T00:02:20Z

I'm running an i5-8350U - the integer solution is 5% faster when I compile with cargo build --release.

HeroicKatora · 2020-02-11T00:12:36Z

Then it may just flipflop depending on memory timings and exact CPU. I personally find integer code easier to reason about (don't have to remember rounding rules etc.) but if performance is so close, then it is a matter of taste.

Let's just get it in and maybe we can follow up on a more exact perf later.

philippeitis · 2020-02-11T03:41:52Z

I've added tests to cover the colors that the previous version failed to get correct. From what I've tried, I'd have to make cmyk_to_rgb public in order to add code in bench/ to benchmark it, which I don't know if I should do, or add a #![feature(test)] to decoder.rs directly, which also seems like the wrong choice. Other than that, the code should be ready to go.

HeroicKatora · 2020-02-11T09:38:48Z

You can add a benchmark within the usual tests as well, just need to put it behind a cfg with the benchmarks feature, like so:

mod test {
    #[cfg(feature = "benchmarks")]
    #[bench]
    fn cmyk(b: &mut test::Bencher) {
        …
    }
}

philippeitis · 2020-02-11T17:04:43Z

I've added benchmarks - everything should be ready to merge now.

src/jpeg/decoder.rs

HeroicKatora · 2020-02-12T17:58:12Z

Thank you, and special appreciation for compairing all the various implementations.

philippeitis changed the base branch from next to master February 10, 2020 19:08

philippeitis added 2 commits February 10, 2020 13:08

Add comprehensive test for cymk_to_rgb in decoder.rs

59862f9

This test checks all possible colours produced by CYMK (in one channel), and checks that the RGB conversion is correct.

Update decoder to be correct for all colors.

2f277f6

Clean up cymk_to_rgb code.

5c2dc9c

HeroicKatora requested changes Feb 10, 2020

View reviewed changes

src/jpeg/decoder.rs Show resolved Hide resolved

Add further optimizations to cmyk_to_rgb function, thanks to @HeroicK…

2a54721

…atora

philippeitis added 3 commits February 10, 2020 14:25

Test all channels independently.

cf6bec1

Add function to test specific cymk values.

92b35c9

Further optimized version.

6e1212a

philippeitis added 3 commits February 10, 2020 14:41

Decrease visibility of cmyk_to_rgb fn.

2b09f0d

Add lots of test cases to check correctness of cymk_to_rgb conversion.

3075f37

Update decoder.rs

0434dee

philippeitis added 2 commits February 10, 2020 15:50

Fix typos

489c58a

Update comments, add assertions for all color channels.

3a47f65

HeroicKatora approved these changes Feb 11, 2020

View reviewed changes

Add cmyk benchmarks.

e5ae7d2

HeroicKatora requested changes Feb 11, 2020

View reviewed changes

src/jpeg/decoder.rs Show resolved Hide resolved

src/jpeg/decoder.rs Show resolved Hide resolved

philippeitis added 3 commits February 11, 2020 10:03

Move benchmark flags, change ..256 to ..=255

f9c3671

Fix cfg flags for benchmarking.

c789647

Add missing cfg line.

8a7e26b

HeroicKatora approved these changes Feb 11, 2020

View reviewed changes

HeroicKatora merged commit 5a84e4c into image-rs:master Feb 12, 2020

philippeitis deleted the patch-1 branch February 27, 2020 08:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix cmyk_to_rgb causing off by one rounding errors. #1136

Fix cmyk_to_rgb causing off by one rounding errors. #1136

philippeitis commented Feb 10, 2020

HeroicKatora commented Feb 10, 2020

philippeitis commented Feb 10, 2020 •

edited

HeroicKatora commented Feb 10, 2020

philippeitis commented Feb 10, 2020 •

edited

HeroicKatora commented Feb 10, 2020 •

edited

HeroicKatora commented Feb 10, 2020

philippeitis commented Feb 10, 2020

HeroicKatora commented Feb 10, 2020

philippeitis commented Feb 10, 2020

HeroicKatora commented Feb 10, 2020 •

edited

HeroicKatora commented Feb 10, 2020 •

edited

philippeitis commented Feb 10, 2020

HeroicKatora commented Feb 10, 2020 •

edited

philippeitis commented Feb 11, 2020

HeroicKatora commented Feb 11, 2020

philippeitis commented Feb 11, 2020 •

edited

HeroicKatora commented Feb 11, 2020

philippeitis commented Feb 11, 2020

HeroicKatora commented Feb 12, 2020

Fix cmyk_to_rgb causing off by one rounding errors. #1136

Fix cmyk_to_rgb causing off by one rounding errors. #1136

Conversation

philippeitis commented Feb 10, 2020

HeroicKatora commented Feb 10, 2020

philippeitis commented Feb 10, 2020 • edited

HeroicKatora commented Feb 10, 2020

philippeitis commented Feb 10, 2020 • edited

HeroicKatora commented Feb 10, 2020 • edited

HeroicKatora commented Feb 10, 2020

philippeitis commented Feb 10, 2020

HeroicKatora commented Feb 10, 2020

philippeitis commented Feb 10, 2020

HeroicKatora commented Feb 10, 2020 • edited

HeroicKatora commented Feb 10, 2020 • edited

philippeitis commented Feb 10, 2020

HeroicKatora commented Feb 10, 2020 • edited

philippeitis commented Feb 11, 2020

HeroicKatora commented Feb 11, 2020

philippeitis commented Feb 11, 2020 • edited

HeroicKatora commented Feb 11, 2020

philippeitis commented Feb 11, 2020

HeroicKatora commented Feb 12, 2020

philippeitis commented Feb 10, 2020 •

edited

philippeitis commented Feb 10, 2020 •

edited

HeroicKatora commented Feb 10, 2020 •

edited

HeroicKatora commented Feb 10, 2020 •

edited

HeroicKatora commented Feb 10, 2020 •

edited

HeroicKatora commented Feb 10, 2020 •

edited

philippeitis commented Feb 11, 2020 •

edited