JpegDecoder: post-process baseline spectral data per MCU-row #1694

Merged
JimBobSquarePants merged 78 commits into SixLabors:master from jpeg-decoder-memory
Jul 15, 2021

Conversation

br3aker
Contributor

@br3aker br3aker commented Jul 12, 2021

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Fixes #1597.

This also fixes #1692, simply because its tests were failing with the new architecture.

Performance

No change. Baseline images should be faster at scale thanks to the lower memory footprint, but local benchmarks show no difference. This is easy to explain: the actual decoding code wasn't changed, only memory allocation and temporary buffer management.

Note: the PR benchmark is slightly faster than the master branch (the real gain is even larger, since the PR code also pays for a virtual call per image stride). I consider this a false positive caused by the very small image size; most likely it comes from the smaller memory footprint of the spectral block allocation. I expect little to no difference at 4K resolutions, since the actual parsing/decoding code wasn't touched.

| Method | TestImage | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|
| JpegDecoderCore.ParseStream master | Jpg/b(...)e.jpg [21] | 3.680 ms | 0.0152 ms | 0.0143 ms | 1.00 |
| JpegDecoderCore.ParseStream PR | Jpg/b(...)e.jpg [21] | 3.474 ms | 0.0196 ms | 0.0174 ms | 0.94 |

Unfortunately, I don't have access to the JetBrains memory profiler, so I used a much simpler check: a custom debug allocator that prints every allocation. Here are dumps for the same image with the same subsampling, saved two different ways, as baseline and as progressive:

All small byte allocations are removed for clarity

Progressive

Allocation of Rgba32[311400]
Allocation of Single[8448]
Allocation of Single[8448]
Allocation of Single[8448]
Allocation of Vector4[519]
Allocation of Block8x8[5016]
Allocation of Block8x8[1254]
Allocation of Block8x8[1254]

Baseline

Allocation of Rgba32[311400]
Allocation of Single[8448]
Allocation of Single[8448]
Allocation of Single[8448]
Allocation of Vector4[519]
Allocation of Block8x8[132] <- yay!
Allocation of Block8x8[33]  <- yay!
Allocation of Block8x8[33]  <- yay!

P.S. If anybody has some free time and access to the JetBrains memory profiler, a proper profile would be very welcome for the full picture!
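The Block8x8 counts in the two dumps are consistent with simple MCU arithmetic. Below is a minimal sketch, assuming the test image is 519×600 pixels (inferred from Vector4[519] and Rgba32[311400], not stated in the PR) and uses 4:2:0 subsampling, so that one MCU covers 16×16 pixels and holds four luma blocks plus one block per chroma component. The function name is hypothetical and is not part of ImageSharp.

```python
def spectral_block_counts(width, height, mcu_size=16, luma_blocks_per_mcu=4):
    """Return 8x8 block counts per component for a 4:2:0 image.

    Hypothetical helper for illustration only; not ImageSharp code.
    """
    # Round up so partially covered MCUs at the right/bottom edges are counted.
    mcus_per_row = -(-width // mcu_size)
    mcu_rows = -(-height // mcu_size)

    # Progressive-style storage: blocks for every MCU row are kept at once.
    whole_image = {
        "luma": mcus_per_row * mcu_rows * luma_blocks_per_mcu,
        "chroma": mcus_per_row * mcu_rows,
    }
    # Per-MCU-row storage: only one row of MCUs is kept at a time.
    one_mcu_row = {
        "luma": mcus_per_row * luma_blocks_per_mcu,
        "chroma": mcus_per_row,
    }
    return whole_image, one_mcu_row


# Assumed dimensions: 519 * 600 == 311400, matching Rgba32[311400].
whole_image, one_mcu_row = spectral_block_counts(519, 600)
print(whole_image)   # {'luma': 5016, 'chroma': 1254}
print(one_mcu_row)   # {'luma': 132, 'chroma': 33}
```

Under these assumptions the image has 38 MCU rows, so keeping a single row reduces spectral block storage 38-fold for this image, which matches the 5016 → 132 and 1254 → 33 figures in the dumps.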

Tests & Legacy code

Everything is up and running. The old pipeline has been removed.

TODO

  • Fix existing tests
  • Tests for SpectralConverter<TPixel>
  • Remove legacy post processing code

Dmitry Pentin added 30 commits July 7, 2021 23:38
@br3aker
Contributor Author

br3aker commented Jul 13, 2021

@JimBobSquarePants I found an error in the JPEG decoder reference output files, which was even marked as a bug:

// BUG: The following image has a high difference compared to the expected output: 1.0096%
TestImages.Jpeg.Baseline.Jpeg420Small,

I re-saved the actual JPEG as a PNG using Photoshop and got this difference between my file and the existing PNG file:
image

These differences are visible to the human eye with enough zoom in Photoshop; it looks like the edges were anti-aliased for some reason.

With this fix, the image passes the tolerance test:

*** Jpg/baseline/jpeg420small.jpg ***
Difference: 0,2863%

I'm not sure how to update the reference file. Could you elaborate, please?

@JimBobSquarePants
Member

Awesome! It should be a case of replacing the reference image in this folder with the actual correct output from your tests as long as you have got Git LFS installed (see the readme for instructions)

https://github.com/SixLabors/ImageSharp/tree/master/tests/Images/External/ReferenceOutput/JpegDecoderTests

@br3aker
Contributor Author

br3aker commented Jul 13, 2021

Thanks!

Member

@JimBobSquarePants JimBobSquarePants left a comment


This is looking really great so far!

@br3aker
Contributor Author

br3aker commented Jul 13, 2021

Update

  • Added integer ceiling division (tests included)
  • Fixed parse stream only benchmark and updated results for PR implementation (see top comment for comparison)
  • Fixed (last) invalidated test which saves decoded spectral blocks
  • Added docs to the SpectralConverter class
  • Restored Sandbox code
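The integer ceiling division mentioned in the list above matters whenever an image dimension is not an exact multiple of the block or MCU size, because the partial unit at the edge still needs a buffer slot. The PR text does not show the helper itself, so the following is only a generic sketch with a hypothetical name, not the ImageSharp implementation.

```python
import math


def ceil_div(value, divisor):
    """Divide a non-negative integer by a positive one, rounding up, without floats."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return (value + divisor - 1) // divisor


# Exact multiples are not rounded up further.
print(ceil_div(16, 8))   # 2
# Any remainder adds one more unit, e.g. a partial block at the image edge.
print(ceil_div(17, 8))   # 3
# Agrees with the floating-point version for small inputs.
print(all(ceil_div(n, 8) == math.ceil(n / 8) for n in range(1000)))  # True
```

Staying in integer arithmetic avoids the rounding surprises a float-based ceiling can produce for very large values.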

@br3aker br3aker marked this pull request as ready for review July 13, 2021 15:27
Member

@JimBobSquarePants JimBobSquarePants left a comment


Really well done! This is an excellent piece of refactoring!

I can't believe you managed to solve that longstanding issue also! 😍

@JimBobSquarePants JimBobSquarePants merged commit 61b137d into SixLabors:master Jul 15, 2021
@br3aker br3aker deleted the jpeg-decoder-memory branch July 15, 2021 14:23
@antonfirsov
Member

antonfirsov commented Jul 24, 2021

I did some WPA profiling on this using my 10-core i9-10900X, and the results are quite impressive, although for some reason we didn't get the 8x memory footprint drop I expected. (Maybe I miscalculated the memory footprint of the blocks relative to the image data?)

1. LoadResizeSaveParallelMemoryStress with MaxDegreeOfParallelism = 8 and Filter = JpegKind.Baseline

Before the PR

Peak around 6.4 GB, total processing time 15.03 seconds.
image

Current master

Peak around 3.2 GB, total processing time 13.73 seconds.

image

2. LoadResizeSaveParallelMemoryStress with MaxDegreeOfParallelism = 20 and Filter = JpegKind.Any

Before the PR

Peak around 13 GB.
image

Current master

Peak around 4.4 GB.
image

Great job @br3aker, thanks for the contribution!
/cc @JimBobSquarePants
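To put the reported figures side by side with the expected 8x drop, here is a small sketch that turns each before/after pair from the comment above into a ratio and a percentage saving. The helper name and the dictionary labels are illustrative only; the numbers are the ones quoted in the comment.

```python
def reduction(before, after):
    """Return (ratio, percent_saved) for a before/after measurement."""
    return before / after, (before - after) / before * 100.0


# Figures quoted in the comment above (GB for memory, seconds for time).
measurements = {
    "Baseline, parallelism 8: peak memory": (6.4, 3.2),
    "Baseline, parallelism 8: total time": (15.03, 13.73),
    "Any JPEG, parallelism 20: peak memory": (13.0, 4.4),
}

for name, (before, after) in measurements.items():
    ratio, saved = reduction(before, after)
    print(f"{name}: {ratio:.2f}x, {saved:.1f}% lower")
```

The peak memory improves by about 2x in the first scenario and just under 3x in the second, while total processing time in the first scenario drops by roughly 9%.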

@JimBobSquarePants
Member

Those numbers 😘
