Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform degamma and gamma conversions on user request #32

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

MarijnS95
Copy link
Member

Fixes #25

This crate can't assume that the input and output is linear, nor did it correct for that in the test example where the output from stb_image is clearly nonlinear (it doesn't state this in the "docs", but is visible from not linearizing JPG and PNG inputs and applying a gamma of 1/2.2 when converting HDR to LDR). While we could request users to pre- correct for this and return linear output to them, it is more efficient to do it within the downsampling algorithm that already runs over all the pixels, and (more importantly!) requiring these parameters in the input forces the caller to think about it.

Unfortunately this has a massive performance regression of 150% (±40ms to ±107ms):

Downsample `square_test.png` using ispc_downsampler
                    time:   [106.94 ms 107.05 ms 107.18 ms]
                    change: [+146.31% +146.79% +147.27%] (p = 0.00 < 0.05)
                    Performance has regressed.

Will need to dissect if this is caused by the pointer refactor or purely the extra ALU overhead.

@MarijnS95
Copy link
Member Author

MarijnS95 commented Aug 26, 2023

Will need to dissect if this is caused by the pointer refactor or purely the extra ALU overhead.

The pointer refactor and additional if checks only seem to account for ±5ms, the rest is purely pow() overhead. And that makes sense: I made the remark above that it "is more efficient to do it within the downsampling algorithm that already runs over all the pixels", but this algorithm visits and applies gamma correction to every pixel so many times with the kernel that it becomes much more inefficient.

The alternatives are either preprocessing the input (but this doesn't seem to be too fast either... 😕) or simply documenting that the input and output will be linearized and let the user deal with this (if they need to).
(We should still have this as a separate step to make our bench and test do the correct thing)

Comment on lines +20 to +25
let params = Parameters {
// Input stb Image is gamma-corrected (i.e. expects to be passed through a CRT with exponent 2.2)
degamma: true,
// Output image is PNG which must be stored with a gamma of 1/2.2
gamma: true,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both of these can be merged into one parameter.

src/ispc/kernels/lanczos3.ispc Outdated Show resolved Hide resolved
src/ispc/kernels/lanczos3.ispc Outdated Show resolved Hide resolved
Before:

    Downsample `square_test.png` using ispc_downsampler
                        time:   [43.438 ms 43.468 ms 43.500 ms]

After:

    Downsample `square_test.png` using ispc_downsampler
                        time:   [29.891 ms 29.922 ms 29.953 ms]
                        change: [-31.246% -31.162% -31.077%] (p = 0.00 < 0.05)
This crate can't assume that the input and output is linear, nor did it
correct for that in the `test` example where the output from `stb_image`
is clearly nonlinear (it doesn't state this in the "docs", but is
visible from not linearizing JPG and PNG inputs and applying a gamma of
1/2.2 when converting HDR to LDR).  While we could request users to pre-
correct for this and return linear output to them, it is more efficient
to do it within the downsampling algorithm that already runs over all
the pixels, and (more importantly!) requiring these parameters in the
input forces the caller to think about it.
@MarijnS95
Copy link
Member Author

PR #48 seems to have added unorm and snorm texture formats in addition to the already existing Srgb(a) format. It is yet unused, but that will be the right enum to key the request for (de)linearization off going forward!

@MarijnS95
Copy link
Member Author

In that sense I do think that this is more so a bug now in that we expose the Srgb(a) vs Rgb(a)U/Snorm capability to the caller, but don't enact on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Critically missing linearization
2 participants