Skip to content

Vectorize Atan for Vector64/128/256/512 and TensorPrimitives#126425

Open
stephentoub wants to merge 2 commits intomainfrom
copilot/vectorize-atan
Open

Vectorize Atan for Vector64/128/256/512 and TensorPrimitives#126425
stephentoub wants to merge 2 commits intomainfrom
copilot/vectorize-atan

Conversation

@stephentoub
Copy link
Copy Markdown
Member

@stephentoub stephentoub commented Apr 1, 2026

Add vectorized Atan implementations for float and double across all SIMD vector types, with corresponding unit tests.

Description

  • Adds Vector64/128/256/512.Atan APIs (float/double) and updates the System.Runtime.Intrinsics reference assembly surface area.
  • Implements vectorized atan kernels in VectorMath (double core and single via widening to double core).
  • Enables TensorPrimitives Atan vectorization for float/double (NET11+) and updates tolerances in generic tests.
  • Adds AtanDouble and AtanSingle test data to GenericMathTestMemberData with 29 test cases each, covering edge cases (±0, NaN, ±Infinity) and representative magnitudes using well-known mathematical constants.
  • Adds AtanDoubleTest and AtanSingleTest unit tests to Vector64Tests.cs, Vector128Tests.cs, Vector256Tests.cs, and Vector512Tests.cs, following the same [Theory]/[MemberData] pattern as the existing Asin, Cos, and Sin tests.

Add vectorized Atan implementations for float and double across all SIMD vector types.
- AtanDouble: AMD atan.c Remez(4,4) rational polynomial with 5-region argument reduction
- AtanSingle: widens to double, uses AMD atanf.c Remez(2,2) rational polynomial
- Hook up TensorPrimitives.Atan to use vectorized implementations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 1, 2026 18:14
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds vectorized Atan support across the intrinsic SIMD vector types and enables TensorPrimitives to use those vectorized implementations for float/double.

Changes:

  • Adds Vector64/128/256/512.Atan APIs (float/double) and updates the System.Runtime.Intrinsics reference assembly surface area.
  • Implements vectorized atan kernels in VectorMath (double core and single via widening to double core).
  • Enables TensorPrimitives Atan vectorization for float/double (NET11+) and updates tolerances in generic tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs Adds new Atan API declarations for Vector64/128/256/512 in the ref assembly.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/VectorMath.cs Introduces vectorized AtanDouble and AtanSingle implementations used by intrinsic vector APIs.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs Adds Vector64.Atan(float/double) plus scalar fallback helper.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs Adds Vector128.Atan(float/double) with accelerated path and fallback composition.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs Adds Vector256.Atan(float/double) with accelerated path and fallback composition.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs Adds Vector512.Atan(float/double) with accelerated path and fallback composition.
src/libraries/System.Numerics.Tensors/tests/TensorPrimitives.Generic.cs Adjusts tolerances for Atan/AtanPi validation.
src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Atan.cs Enables vectorization for Atan<T> when T is float/double (NET11+) by calling new vector intrinsics.

@stephentoub
Copy link
Copy Markdown
Member Author

@copilot please evaluate and address the feedback

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 1, 2026

🤖 Copilot Code Review — PR #126425

Note

This review was generated by Copilot.

Holistic Assessment

Motivation: Vectorizing Atan is clearly justified — TensorPrimitives.Atan previously had Vectorizable => false with a // TODO: Vectorize comment, and vectorized Asin already landed following the same pattern. This completes another step in the trig-function vectorization effort.

Approach: The implementation correctly mirrors the established Asin vectorization pattern: AtanDouble uses AMD AOCL-libm-ose's Remez(4,4) rational polynomial with 5-region argument reduction, AtanSingle widens to double and uses Remez(2,2), the VectorN.Atan entry points follow the exact same structure as VectorN.Asin, and TensorPrimitives.AtanOperator delegates to the new methods behind #if NET11_0_OR_GREATER.

Summary: ⚠️ Needs Human Review. The code is mathematically correct and follows established patterns precisely. The one item requiring human confirmation is API approval — this PR adds new public API surface (Vector64/128/256/512.Atan for float and double) to the ref assembly, and I could not verify the linked api-approved issue due to tooling limitations. A human reviewer should confirm that the API surface has been approved. No code bugs or correctness issues were found.


Detailed Findings

✅ Mathematical Correctness — Verified for all special IEEE 754 values

NaN: In AtanDouble, NaN propagates correctly through the polynomial evaluation — all GreaterThan comparisons return false for NaN, so the input falls through to region 1 (reduced = v = NaN), and NaN propagates through arithmetic. In AtanSingle, NaN is handled explicitly via nanMask = ~Equals(x, x) with a final ConditionalSelect. Both paths produce correct NaN output. The asymmetry (implicit vs. explicit NaN handling) is consistent with the existing AsinDouble/AsinSingle pattern.

±Infinity: AtanDouble correctly maps to region 5 (v > 39/16), computing reduced5 = -1/∞ = -0, with chi = π/2, yielding ±π/2. AtanSingle handles this via overflowMask (for |x| ≥ 2²⁶), returning ±1.5707964f (nearest float to π/2).

±0: Falls to region 1 in both paths. reduced = 0, s = 0, polynomial evaluates to 0, result is 0 ^ sign = ±0. ✓

Unconditional branch computation: All five reduced values are computed and then ConditionalSelected. This includes reduced5 = -1/v which produces -∞ when v = 0, but the value is discarded by ConditionalSelect. This is standard SIMD practice (branchless computation with masking) and IEEE 754–compliant (-1/0 = -∞ is well-defined).

✅ Pattern Consistency — Matches established Asin vectorization exactly

The code structure mirrors the Asin implementation precisely:

  • VectorMath.AtanDouble<TVectorDouble> / AtanSingle<TVectorSingle, TVectorInt32, TVectorDouble, TVectorInt64> — same generic signatures as AsinDouble/AsinSingle
  • Vector64/128.Atan has full XML doc; Vector256/512.Atan uses /// <inheritdoc cref="..."/>
  • Fallback paths (!IsHardwareAccelerated) delegate to lower-width vectors or element-wise scalar loops
  • Vector64<float>.Atan checks Vector128.IsHardwareAccelerated to widen optimally; Vector512<float>.Atan uses Vector512<double> directly — all consistent with the Asin pattern
  • AMD AOCL-libm-ose copyright notice is present in both AtanDouble and AtanSingle/AtanSingleCoreDouble, and the BSD 3-Clause license is already in THIRD-PARTY-NOTICES.TXT

✅ Test Changes — Appropriate tolerance values

The added tolerances (doubleTolerance: 1e-14, floatTolerance: 1e-6f) are reasonable for a Remez rational polynomial approximation. The AtanPi tolerance change is correctly motivated: AtanPiOperator.Invoke delegates to AtanOperator.Invoke(x) / Vector.Create(T.Pi), so AtanPi inherits the vectorized Atan path and needs matching tolerance. The tolerance is unconditional (not tied to FMA support like Asin's trigTolerance), which is a valid design choice given the tighter bounds.

✅ Ref Assembly — Entries correctly placed

The 8 new entries (2 per vector width × 4 widths) in System.Runtime.Intrinsics.cs are alphabetically placed after Asin and before AsByte, matching the existing ordering convention.

⚠️ API Approval — Unable to verify (needs human confirmation)

This PR adds new public API surface to the ref assembly (Vector64/128/256/512.Atan for float and double). Per dotnet/runtime policy, new public APIs require a linked issue with the api-approved label. I was unable to verify this due to tooling limitations (no authenticated GitHub API access). A human reviewer should confirm that these APIs have been approved through the standard API review process. The APIs follow the exact shape established by the already-approved Asin methods, so approval is expected to exist.

💡 Minor Observation — Unused type parameters in AtanSingle

The TVectorInt32 and TVectorInt64 type parameters in AtanSingle are not used in the method body (unlike AsinSingle which uses them for bit manipulation). They are retained for signature consistency with the AsinSingle pattern, which is reasonable for maintainability and future-proofing.

Generated by Code Review for issue #126425 ·

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 1, 2026

@copilot please evaluate and address the feedback

Added AtanDoubleTest and AtanSingleTest to all four Vector test files (Vector64/128/256/512Tests.cs) using GenericMathTestMemberData, following the same pattern as the existing Asin tests. Test data includes edge cases for ±0, NaN, ±Infinity, and representative magnitudes. The single-precision test values match the canonical values from MathF.cs scalar tests. See commit d6f67d2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants