Fix AVX-512 round function #4119
Merged: lgritz merged 1 commit into AcademySoftwareFoundation:master from AngryLoki:fix-avx512-round on Jan 20, 2024
AngryLoki added a commit to AngryLoki/gentoo that referenced this pull request on Jan 19, 2024
See also: AcademySoftwareFoundation/OpenImageIO#4119
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
AngryLoki force-pushed the fix-avx512-round branch from 171c342 to e114063 on January 19, 2024 16:06
lgritz approved these changes on Jan 20, 2024
This looks right to me, thanks for the fix!
lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request on Jan 21, 2024
This PR fixes the vfloat16 round function. The intrinsic `_mm512_roundscale_ps` was used incorrectly, which caused a test failure on a Zen4 CPU.

```
/var/tmp/portage/media-libs/openimageio-2.5.5.0-r1/work/OpenImageIO-2.5.5.0/src/libutil/simd_test.cpp:1579:
FAILED: round(F) == mkvec<VEC>(std::round(F[0]), std::round(F[1]), std::round(F[2]), std::round(F[3]))
    values were '-1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4 -1.5 0 1.5 4'
    and '-2 0 2 4 -2 0 2 4 -2 0 2 4 -2 0 2 4'
```

In the old code, `_mm512_roundscale_ps (a, (1<<4) | 3)` meant the following:

```
[0001] - Number of fraction bits to preserve
[0] - Use MXCSR exception mask
[0] - Select mode from imm
[11] - Truncate mode
```

This effectively rounded (by truncation) to a multiple of 0.5, not to an integer.

References:
* https://www.felixcloutier.com/x86/vrndscalepd#fig-5-29
* https://stackoverflow.com/questions/50854991/instrinsic-mm512-round-ps-is-missing-for-avx512

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
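The bit breakdown in the commit message can be checked without AVX-512 hardware using a scalar model of one lane. The sketch below is only an illustration: the names `RoundScaleImm`, `decode_imm8`, and `roundscale_model` are hypothetical and not part of OpenImageIO, the field layout follows the immediate-control figure linked in the references, and NaN, denormal, and exception behavior are ignored.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper type: the fields of the roundscale imm8 operand.
struct RoundScaleImm {
    unsigned fraction_bits;    // imm8[7:4]: number of fraction bits to preserve (M)
    bool suppress_precision;   // imm8[3]:   1 = suppress the precision exception
    bool use_mxcsr_rounding;   // imm8[2]:   1 = take rounding mode from MXCSR
    unsigned rounding_mode;    // imm8[1:0]: 0 nearest-even, 1 down, 2 up, 3 truncate
};

// Split an 8-bit immediate into its fields.
RoundScaleImm decode_imm8(std::uint8_t imm)
{
    RoundScaleImm f;
    f.fraction_bits      = (imm >> 4) & 0xFu;
    f.suppress_precision = ((imm >> 3) & 1u) != 0;
    f.use_mxcsr_rounding = ((imm >> 2) & 1u) != 0;
    f.rounding_mode      = imm & 3u;
    return f;
}

// Scalar model of one lane: 2^-M * round(2^M * x).
// Assumption: the host floating-point environment is in its default
// round-to-nearest-even mode, so std::nearbyint stands in for both
// "nearest" and "use MXCSR".
float roundscale_model(float x, std::uint8_t imm)
{
    const RoundScaleImm f = decode_imm8(imm);
    const float scale  = std::ldexp(1.0f, static_cast<int>(f.fraction_bits));
    const float scaled = x * scale;
    float r;
    if (f.use_mxcsr_rounding) {
        r = std::nearbyint(scaled);
    } else {
        switch (f.rounding_mode) {
        case 0:  r = std::nearbyint(scaled); break;  // nearest, ties to even
        case 1:  r = std::floor(scaled);     break;  // toward -infinity
        case 2:  r = std::ceil(scaled);      break;  // toward +infinity
        default: r = std::trunc(scaled);     break;  // toward zero
        }
    }
    return r / scale;
}
```

With the old immediate `(1<<4) | 3`, the model leaves -1.5 and 1.5 unchanged, which matches the first set of values in the failing test. An immediate of 0 (no fraction bits, round to nearest) is used here purely for comparison and gives -2 and 2 for the same inputs. Note that round-to-nearest-even differs from `std::round` on some ties, for example 2.5.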
DarkDefender pushed a commit to DarkDefender/gentoo that referenced this pull request on Jan 26, 2024
See also: AcademySoftwareFoundation/OpenImageIO#4119
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
1div0 pushed a commit to 1div0/OpenImageIO that referenced this pull request on Feb 24, 2024
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Signed-off-by: Peter Kovář <peter.kovar@reflexion.tv>
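Description
This PR fixes the vfloat16 round function. The intrinsic `_mm512_roundscale_ps` was used incorrectly, which caused a test failure on a Zen4 CPU.
In the old code, `_mm512_roundscale_ps (a, (1<<4) | 3)` preserved one fraction bit and selected truncate mode, effectively rounding to a multiple of 0.5 rather than to an integer.
References:
* https://www.felixcloutier.com/x86/vrndscalepd#fig-5-29
* https://stackoverflow.com/questions/50854991/instrinsic-mm512-round-ps-is-missing-for-avx512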