Faster Jpeg Huffman Scan Decoding #601

Closed
4 tasks done
JimBobSquarePants opened this issue May 30, 2018 · 15 comments · Fixed by #643

Comments

Member

JimBobSquarePants commented May 30, 2018

Prerequisites

  • I have written a descriptive issue title
  • I have verified that I am running the latest version of ImageSharp
  • I have verified whether the problem exists in both DEBUG and RELEASE mode
  • I have searched open and closed issues to ensure it has not already been reported

Description

With #571 now included in the codebase we have an opportunity to further enhance the performance of our jpeg decoder.

There is one obvious candidate for improvement within the decoder: The decoding of the SOS (Start of Scan) segment.

Our approach works but is naïve. We only read/decode one byte at a time and do not use optimized tables for Huffman code lookup. There are further optimizations available with AC Huffman decoding that we also do not do.
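
As a rough illustration of what an "optimized table for Huffman code lookup" can look like, here is a minimal Python sketch. It is not code from ImageSharp, mozjpeg, or StbImage. The names `FAST_BITS`, `build_codes`, `build_fast_table` and `decode_symbol`, and the 9-bit table width, are assumptions made for the example. The idea is to index a table with the next few bits of the stream so that short codes resolve in one lookup, and to fall back to a slower search only for longer codes.

```python
FAST_BITS = 9  # lookup width in bits (an assumption for this sketch)

def build_codes(counts, symbols):
    """Assign canonical Huffman codes from a DHT-style table.

    counts[i] is the number of codes of length i + 1 (16 entries);
    symbols lists the values in order of increasing code length.
    Returns a list of (code, length, symbol) tuples.
    """
    codes = []
    code = 0
    k = 0
    for length in range(1, 17):
        for _ in range(counts[length - 1]):
            codes.append((code, length, symbols[k]))
            code += 1
            k += 1
        code <<= 1  # moving to the next length appends a zero bit
    return codes

def build_fast_table(codes, fast_bits=FAST_BITS):
    """Map every fast_bits-wide prefix to (symbol, length) for short codes."""
    table = [None] * (1 << fast_bits)
    for code, length, symbol in codes:
        if length <= fast_bits:
            shift = fast_bits - length
            base = code << shift
            # Every bit pattern that starts with this code decodes to it.
            for fill in range(1 << shift):
                table[base | fill] = (symbol, length)
    return table

def decode_symbol(peek16, table, codes, fast_bits=FAST_BITS):
    """Decode one symbol from the next 16 bits of the stream (MSB first).

    Returns (symbol, bits_consumed).
    """
    entry = table[peek16 >> (16 - fast_bits)]
    if entry is not None:
        return entry  # fast path: one table lookup
    # Slow path: only codes longer than fast_bits end up here.
    for code, length, symbol in codes:
        if length > fast_bits and (peek16 >> (16 - length)) == code:
            return symbol, length
    raise ValueError("invalid Huffman code")
```

A caller would peek 16 bits from its bit buffer, call `decode_symbol`, and then skip only the number of bits the function reports as consumed.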

I believe that with the correct implementation we can reduce our decoding time by at least 10ms when compared to our current benchmarks.

Inspiration for better code

I'm looking for help here. I've made several attempts to port from both sources but have failed miserably so far.

Update

@bitbank2 has kindly written up some pointers on how to optimise entropy decoding:

http://bitbanksoftware.blogspot.com/2018/06/optimizing-jpeg-entropy-decoding.html

hypeartist commented May 30, 2018

@JimBobSquarePants As for StbImage, take a look at StbSharp. Maybe it could be of help.

Member Author

JimBobSquarePants commented May 30, 2018

@hypeartist Very useful indeed! I've got something I can debug against now. Thanks!

Member Author

JimBobSquarePants commented May 31, 2018

I had another look at StbImage and managed to decode our Calliphora jpeg. It's a lot faster but there's inaccuracy in the spectral output compared to our current implementation. I think it might be better to try to improve our existing decoder instead of attempting a new port.

hypeartist commented May 31, 2018

@JimBobSquarePants I could give it a try and port mozjpeg to C# if you like. What do you think?

Member Author

JimBobSquarePants commented May 31, 2018

If you could pull that off I would be blown away! Have a look at our code; it should just be the Huffman decoder you need to port.

hypeartist commented May 31, 2018

@JimBobSquarePants It's much easier for me to port the whole thing so you can strip out the bits you don't need. :)

Member Author

JimBobSquarePants commented May 31, 2018

It’s totally up to you but we’ve already got all the marker parsing code in place plus we have SIMD optimized IDCT and colorspace conversion code.

As far as I understand the Mozjpeg source, there are only two jdhuff files to port.

hypeartist commented May 31, 2018

@JimBobSquarePants OK, got it. I've already grabbed the source and started examining it. Will write back ASAP.

Member Author

JimBobSquarePants commented May 31, 2018

Brilliant, thanks!

Member

antonfirsov commented May 31, 2018

@JimBobSquarePants isn't "Faster Jpeg Huffman Decoding" a better title for this?

Gonna post some up-to-date profiler results tonight, but TryDecodeHuffman() and TryReadBit() are our major Jpeg Decoder bottlenecks as far as I remember.

@hypeartist in my opinion the fastest way to look for improvement opportunities is doing a comparative debug/analysis against other decoders. Doing a full port is very time consuming + having something fast in languages like C, C++, go, rust etc. doesn't guarantee the same code will be fast in C#. (We've been there several times!)
If you can figure out something, please let us know! Any help is appreciated.

Member Author

JimBobSquarePants commented May 31, 2018

@antonfirsov Perhaps, yeah... Naming is hard, it's the Scan segment we're decoding but it's Huffman encoded.

Those two sections will definitely be slowing us down.

  • TryReadBit(): we should be working with a 4-byte buffer that gets cleared out when we hit a restart marker.

  • TryDecodeHuffman(): this should be using a LUT for most of the returned results. Something like 95% of the code values should hit that LUT.
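
The buffered bit reader from the first bullet can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the ImageSharp or MozJpeg implementation. The class and method names (`BitReader`, `peek`, `skip`, `read`, `reset_at_restart`) are hypothetical, and `0xFF` fill bytes before a marker are not handled. The accumulator holds at most 32 bits, which models the 4-byte buffer. Stuffed `0xFF 0x00` pairs are unstuffed while refilling, and a real marker stops the refill until the caller resets the reader.

```python
class BitReader:
    """MSB-first bit reader over JPEG entropy-coded data (sketch)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.acc = 0        # bit accumulator, never more than 32 bits
        self.nbits = 0      # number of valid bits in acc
        self.marker = None  # marker code that stopped the refill, if any

    def _fill(self):
        # Top up to at least 25 bits so a peek of up to 16 bits succeeds.
        while self.nbits <= 24:
            byte = 0  # past a marker or the end of data we feed zero bits
            if self.marker is None and self.pos < len(self.data):
                byte = self.data[self.pos]
                self.pos += 1
                if byte == 0xFF and self.pos < len(self.data):
                    nxt = self.data[self.pos]
                    self.pos += 1
                    if nxt != 0x00:
                        # A real marker (for example RSTn): remember it
                        # and stop consuming input.
                        self.marker = nxt
                        byte = 0
                    # Otherwise 0xFF 0x00 is a stuffed pair: 0xFF is data.
            self.acc = (self.acc << 8) | byte
            self.nbits += 8

    def peek(self, n):
        """Return the next n bits (n <= 16) without consuming them."""
        if self.nbits < n:
            self._fill()
        return (self.acc >> (self.nbits - n)) & ((1 << n) - 1)

    def skip(self, n):
        """Consume n bits that were previously peeked."""
        self.nbits -= n
        self.acc &= (1 << self.nbits) - 1

    def read(self, n):
        value = self.peek(n)
        self.skip(n)
        return value

    def reset_at_restart(self):
        """Clear buffered bits once the caller has handled a restart marker.

        A real decoder also resets its DC predictors at this point.
        """
        self.acc = 0
        self.nbits = 0
        self.marker = None
```

Because the reader refills several bytes ahead, it can see a marker before the decoder has consumed all the bits in front of it. The caller therefore checks `marker` only at the point where a restart interval is expected to end.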

There are established practices in MozJpeg that we should definitely be trying to adapt; I just haven't managed to get them working with restart markers.

JimBobSquarePants changed the title from "Faster Jpeg Scan Decoding" to "Faster Jpeg Huffman Scan Decoding" on May 31, 2018

Member

antonfirsov commented May 31, 2018

Performance profile for running all the JpegProfilingBenchmarks.DecodeJpeg_PdfJs() (baseline) cases together (some, but not all, AggressiveInlining attributes were removed to get more information about the calls):

[Image: profiler screenshot]

Member Author

JimBobSquarePants commented Jun 30, 2018

@antonfirsov @hypeartist @saucecontrol

So I revisited this problem this morning and took another look at porting the Huffman decoder from StbSharp.

Check out the ScanDecoder.cs class in the new-jpeg-scan-decoder branch.

I'm actually getting somewhere!

I'm working on baseline currently, with 6/10 tests passing with spectral accuracy. I think the failing tests are due to me not reading the correct byte following a marker (I could be wrong though).

I could really do with another pair of eyes on the problem as I think once we have baseline ported, progressive will follow swiftly. It's definitely worth it imo as without any additional optimisation the port is already yielding healthy performance improvements (PdfJs Port).

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-6600U CPU 2.60GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
Frequency=2742192 Hz, Resolution=364.6718 ns, Timer=TSC
.NET Core SDK=2.1.300
  [Host]     : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT
  Job-JQBLQX : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3110.0
  Job-UIHOCS : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT

LaunchCount=1  TargetCount=3  WarmupCount=3

                           Method | Runtime |                    TestImage |      Mean |      Error |    StdDev | Scaled | ScaledSD |    Gen 0 | Allocated |
--------------------------------- |-------- |----------------------------- |----------:|-----------:|----------:|-------:|---------:|---------:|----------:|
   'Decode Jpeg - System.Drawing' |     Clr |  Jpg/baseline/Calliphora.jpg |  7.280 ms |   2.821 ms | 0.1594 ms |   1.00 |     0.00 | 117.1875 | 254.47 KB |
       'Decode Jpeg - ImageSharp' |     Clr |  Jpg/baseline/Calliphora.jpg | 36.378 ms |  12.269 ms | 0.6932 ms |   5.00 |     0.12 |        - |  52.63 KB |
 'Decode Jpeg - ImageSharp PdfJs' |     Clr |  Jpg/baseline/Calliphora.jpg | 28.817 ms |  33.441 ms | 1.8895 ms |   3.96 |     0.22 |        - |  25.25 KB |
                                  |         |                              |           |            |           |        |          |          |           |
   'Decode Jpeg - System.Drawing' |    Core |  Jpg/baseline/Calliphora.jpg |  8.807 ms |  13.427 ms | 0.7587 ms |   1.00 |     0.00 | 117.1875 | 254.11 KB |
       'Decode Jpeg - ImageSharp' |    Core |  Jpg/baseline/Calliphora.jpg | 37.305 ms |  12.295 ms | 0.6947 ms |   4.26 |     0.31 |        - |  47.73 KB |
 'Decode Jpeg - ImageSharp PdfJs' |    Core |  Jpg/baseline/Calliphora.jpg | 29.534 ms |  19.468 ms | 1.1000 ms |   3.37 |     0.26 |        - |   21.5 KB |
                                  |         |                              |           |            |           |        |          |          |           |
   'Decode Jpeg - System.Drawing' |     Clr | Jpg/baseline/jpeg420exif.jpg | 18.796 ms |  12.260 ms | 0.6927 ms |   1.00 |     0.00 | 343.7500 | 757.89 KB |
       'Decode Jpeg - ImageSharp' |     Clr | Jpg/baseline/jpeg420exif.jpg | 88.237 ms |  20.475 ms | 1.1569 ms |   4.70 |     0.15 | 250.0000 | 564.65 KB |
 'Decode Jpeg - ImageSharp PdfJs' |     Clr | Jpg/baseline/jpeg420exif.jpg | 61.836 ms |  15.687 ms | 0.8863 ms |   3.29 |     0.10 | 250.0000 | 535.01 KB |
                                  |         |                              |           |            |           |        |          |          |           |
   'Decode Jpeg - System.Drawing' |    Core | Jpg/baseline/jpeg420exif.jpg | 19.141 ms |  16.113 ms | 0.9104 ms |   1.00 |     0.00 | 343.7500 | 757.04 KB |
       'Decode Jpeg - ImageSharp' |    Core | Jpg/baseline/jpeg420exif.jpg | 94.172 ms | 130.098 ms | 7.3508 ms |   4.93 |     0.37 | 250.0000 | 548.71 KB |
 'Decode Jpeg - ImageSharp PdfJs' |    Core | Jpg/baseline/jpeg420exif.jpg | 64.507 ms |  37.116 ms | 2.0971 ms |   3.38 |     0.16 | 250.0000 | 522.28 KB |
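
To put a number on the improvement, the short Python sketch below computes the relative drop in mean decode time between the 'ImageSharp' and 'ImageSharp PdfJs' rows of the table above. The values are copied from the table, and the helper name `reduction_percent` is made up for the example. With only three target iterations and large error margins, the results should be read as rough indications: about 21% for Calliphora.jpg and about 30% for jpeg420exif.jpg.

```python
# Mean decode times in milliseconds, copied from the benchmark table:
# (runtime, image) -> (ImageSharp mean, ImageSharp PdfJs mean)
means = {
    ("Clr", "Calliphora.jpg"): (36.378, 28.817),
    ("Core", "Calliphora.jpg"): (37.305, 29.534),
    ("Clr", "jpeg420exif.jpg"): (88.237, 61.836),
    ("Core", "jpeg420exif.jpg"): (94.172, 64.507),
}

def reduction_percent(before, after):
    """Relative reduction in mean time, as a percentage of the original."""
    return (before - after) / before * 100.0

for (runtime, image), (before, after) in means.items():
    pct = reduction_percent(before, after)
    print(f"{runtime:4} {image:16} {pct:5.1f}% lower mean")
```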

Member Author

JimBobSquarePants commented Jun 30, 2018

9/10 working now. Only MultiScanBaselineCMYK.jpg to go.

Member Author

JimBobSquarePants commented Jun 30, 2018

Got both baseline and progressive working!! Latest changes pushed to the branch; will clean up the code ASAP.
