Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OpenMP for rgbe imageio loader #15944

Merged
merged 1 commit into from Dec 23, 2023

Conversation

jenshannoschwalm
Copy link
Collaborator

@jenshannoschwalm jenshannoschwalm commented Dec 23, 2023

Fixed OpenMP code

  • use an aligned buffer for the rgbe loading
  • use copy_pixel_nontemporal for performance
  • performance measurably improved as we don't interfere with cachelines
  • we ensure zero for alpha channel

Fixes #15935

Fixed OpenMP code
- use an aligned buffer for the rgbe loading
- use copy_pixel_nontemporal for performance
- performance measurably improved as we don't interfere with cachelines
@jenshannoschwalm jenshannoschwalm added bugfix pull request fixing a bug priority: high core features are broken and not usable at all, software crashes scope: image processing correcting pixels scope: performance doing everything the same but faster labels Dec 23, 2023
@jenshannoschwalm
Copy link
Collaborator Author

jenshannoschwalm commented Dec 23, 2023

For 4.6.1 and 4.8

Release note: Fixed RGBE image loader

@victoryforce
Copy link
Collaborator

Haven't taken the time to run tests to compare the speed of your code with mine yet, but I've verified it works. It also looks more "modern" anyway, and looks like it should be somewhat faster. So I'm closing my PR in favor of this.

@jenshannoschwalm
Copy link
Collaborator Author

Haven't taken the time to run tests to compare the speed of your code with mine yet, but I've verified it works. It also looks more "modern" anyway, and looks like it should be somewhat faster. So I'm closing my PR in favor of this.

I tested perf while coding, here it is about 20% faster by using the copypixel stuff (though we are in the ms range :-)

More relevant could be the making sure we don't push NaNs into the pipe data. There were a number of AMD OpenCL specific issues fixed. I could never verify myself - not owning such a card, not intending to buy one and not interested in amd/kernel/driver hassles at all - but it seems that reading float4 data and processing them is sometimes a problem if one of the channels is a NaN, at least for some cards & drivers.

Copy link
Member

@TurboGit TurboGit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me, thanks!

@TurboGit TurboGit merged commit 1cb840b into darktable-org:master Dec 23, 2023
4 checks passed
@jenshannoschwalm jenshannoschwalm deleted the fix_15935 branch December 23, 2023 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix pull request fixing a bug priority: high core features are broken and not usable at all, software crashes scope: image processing correcting pixels scope: performance doing everything the same but faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DT 4.6.0 - corrupted image with .hdr files
3 participants