add more tests into azure pipeline#16

Merged
shssf merged 1 commit into master from azure_ci_tests on Sep 15, 2020
Contributor

@shssf shssf commented Sep 15, 2020

No description provided.

@shssf shssf merged commit 9b63bae into master Sep 15, 2020
@shssf shssf deleted the azure_ci_tests branch September 15, 2020 20:37
abagusetty pushed a commit to abagusetty/dpnp that referenced this pull request May 27, 2026
…ost Hessenberg lstsq

Closes audit items IntelPython#9, IntelPython#15, IntelPython#16, IntelPython#17, IntelPython#19 from the prior solver review.
The four that change code touch the iterative-solver inner loop and were
left out of the previous correctness-only commit because they require a
new C++ binding; IntelPython#15 closes with no change required.

backend/extensions/blas/gemv.{cpp,hpp}, blas_py.cpp
  - Extend the typed gemv_impl signature with alpha and beta as
    doubles; the per-T impl casts them to the matrix value type at
    dispatch time. dpnp.dot and other legacy callers are unaffected
    -- the existing gemv() public function now forwards (1.0, 0.0) to
    the shared gemv_dispatch helper.
  - Add gemv_alpha_beta() entry point and a _gemv_alpha_beta pybind
    method computing y = alpha * op(A) * x + beta * y for caller-
    supplied scalars. Required by the GMRES Arnoldi fast path which
    fires gemv with (alpha=1, beta=0) writing into a Hessenberg
    column slice, then (alpha=-1, beta=1) fusing u -= V @ h into
    one kernel. For complex matrices the scalars are always one of
    {-1, 0, 1} and so survive the cast exactly; the impl docstring
    flags the silent imag-loss caveat for any other complex caller.
  - Refactor: hoist the existing validation / strides / dispatch
    plumbing into a static gemv_dispatch helper so both entry points
    share identical behaviour without duplication.
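
The scalar semantics described above can be sketched in plain NumPy. This is only a reference model of y = alpha * op(A) * x + beta * y, not the C++ binding: the function name gemv_alpha_beta_ref and its argument order are hypothetical, and treating beta == 0 as "y is write-only" is the standard BLAS convention, assumed here rather than confirmed for this binding.

```python
import numpy as np

def gemv_alpha_beta_ref(A, x, y, alpha=1.0, beta=0.0, trans=False):
    """Reference model (hypothetical name): y <- alpha * op(A) @ x + beta * y."""
    op_A = A.T if trans else A
    # Scalars arrive as doubles and are cast to the matrix value type.
    a = A.dtype.type(alpha)
    b = A.dtype.type(beta)
    if b == 0:
        # BLAS convention (assumed here): with beta == 0, y is never read.
        y[...] = a * (op_A @ x)
    else:
        y[...] = a * (op_A @ x) + b * y
    return y

rng = np.random.default_rng(0)
V = np.asfortranarray(rng.standard_normal((6, 3)))
u = rng.standard_normal(6)
u0 = u.copy()
H = np.zeros((4, 3), order="F")

h = H[:3, 2]                                         # view into a Hessenberg column
gemv_alpha_beta_ref(V, u, h, 1.0, 0.0, trans=True)   # h = V^T @ u, written into H
gemv_alpha_beta_ref(V, h, u, -1.0, 1.0)              # u = -V @ h + 1 * u
```

The defaults (1.0, 0.0) mirror what the legacy gemv() entry point is described as forwarding, so callers that never pass scalars see the old behaviour.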

scipy/sparse/linalg/_iterative.py
  - _make_compute_hu now takes both V and H. The closure writes the
    projection coefficients h = V[:, :j+1]^H @ u directly into the
    Hessenberg column slice H[:j+1, j] via a single gemv with the
    output pointing at that slice -- no intermediate h buffer, no
    slice-assign copy (audit item IntelPython#16). Pass 2 fuses the AXPY-style
    update u = -V @ h + 1 * u into one gemv with alpha=-1, beta=1 --
    no tmp buffer, one kernel instead of gemv-plus-subtract (audit
    item IntelPython#19). For complex V the (j+1)-element h slice is conjugated
    in-place between the two passes (V^T -> V^H), negligible cost
    next to the n*(j+1) gemv.
  - Switch the Hessenberg least-squares H y = e from a device
    dpnp.linalg.lstsq (which dispatches an SVD kernel for a tiny
    21x20 problem per restart) to numpy.linalg.lstsq on the host
    (audit item IntelPython#17). Matches CuPy's choice and removes a device-
    side SVD launch that on Intel GPU dominates the restart cost
    for the default restart=20. RHS e is now maintained as a numpy
    array; H is copied via dpnp.asnumpy once per restart and the
    resulting y is shipped back as a dpnp array for the V @ y
    update.
  - V[:, j+1] = v retained as a single contiguous USM slice store
    (audit item IntelPython#15 closes as no-change-required: the assignment is
    already one dpnp op on an F-order buffer and there is no fused
    'normalise-then-store' path without further binding work).
  - cg per-iter syncs collapsed from 3-4 down to 1 (audit item IntelPython#9).
    The pAp and rz_new breakdown checks are no longer transferred
    to the host on every iteration; instead the loop relies on
    IEEE-754 inf / NaN propagation through alpha = rz / pAp. When
    pAp underflows the resulting alpha is inf or NaN, poisons the
    next residual via x += alpha * p and r -= alpha * Ap, and the
    single norm sync at the top of the next iteration detects the
    breakdown via numpy.isfinite(rnorm_host) and exits with
    info > 0. Mirrors the cuBLAS-style CG inner loop (nrm2 + scalar
    test, one host barrier per iter); the initial rz breakdown
    guard remains so a zero preconditioned residual still short-
    circuits correctly.
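
The GMRES items above (projection written straight into the Hessenberg column, fused update, host-side least squares) can be sketched as one restart cycle in plain NumPy. This is a minimal illustration under stated assumptions: gmres_cycle is a hypothetical name, the two passes are ordinary NumPy products standing in for the two gemv calls, V^H is formed directly instead of modelling the in-place conjugation, and there is no breakdown handling. With the default restart=20 the matrix H would be 21x20.

```python
import numpy as np

def gmres_cycle(A, b, x0, m):
    """One GMRES(m) restart cycle (illustrative only, hypothetical helper)."""
    n = b.shape[0]
    r = b - A @ x0
    beta = np.linalg.norm(r)
    V = np.zeros((n, m + 1), dtype=A.dtype, order="F")
    H = np.zeros((m + 1, m), dtype=A.dtype, order="F")
    V[:, 0] = r / beta
    for j in range(m):
        u = A @ V[:, j]
        Vj = V[:, : j + 1]
        h = H[: j + 1, j]            # view: no separate h buffer
        h[...] = Vj.conj().T @ u     # pass 1: h = V^H @ u lands in H[:j+1, j]
        u -= Vj @ h                  # pass 2: u = -V @ h + 1 * u
        H[j + 1, j] = np.linalg.norm(u)
        V[:, j + 1] = u / H[j + 1, j]
    # Small (m+1) x m least-squares problem solved on the host.
    e = np.zeros(m + 1, dtype=A.dtype)
    e[0] = beta
    y = np.linalg.lstsq(H, e, rcond=None)[0]
    return x0 + V[:, :m] @ y, V, H

rng = np.random.default_rng(0)
n, m = 8, 4
A = 4 * np.eye(n) + 0.5 * (rng.standard_normal((n, n))
                           + 1j * rng.standard_normal((n, n)))
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x, V, H = gmres_cycle(A, b, np.zeros(n, dtype=complex), m)
```

In the real solver V and H would be device arrays and only H would be copied to the host once per restart; here everything already lives on the host, so that transfer is implicit.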

tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py
  - test_gmres_complex_arnoldi_fast_path: complex-dtype regression
    guard for the conjugate-in-place branch of _make_compute_hu --
    a silent miss of the conjugate would lose orthogonality and
    misconverge.
  - test_cg_inf_breakdown_returns_positive_info: regression guard
    for the per-iter-sync collapse. Runs cg on a deliberately
    singular symmetric positive semi-definite operator and asserts
    info > 0 (not zero, not -1), so a future re-introduction of
    explicit breakdown syncs would still pass but a regression to
    the old info contract would not.
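
The breakdown path that the second test guards can be sketched with a NumPy-only CG loop. This is an assumption-laden model, not the dpnp code: cg_one_sync is a hypothetical name, the float() conversion of the residual norm stands in for the single device-to-host sync, no preconditioner is applied, and returning the iteration count as the positive info value is a choice made for this sketch.

```python
import numpy as np

def cg_one_sync(A, b, tol=1e-10, maxiter=50):
    """Unpreconditioned CG with one host-side check per iteration (sketch)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rz = r @ r
    if rz == 0.0:
        return x, 0                          # initial breakdown guard is kept
    bnorm = np.linalg.norm(b)
    with np.errstate(all="ignore"):          # inf / NaN are expected on breakdown
        for it in range(1, maxiter + 1):
            rnorm = float(np.linalg.norm(r)) # the only "sync" in the loop
            if not np.isfinite(rnorm):
                return x, it                 # breakdown detected one step late
            if rnorm <= tol * bnorm:
                return x, 0
            Ap = A @ p
            alpha = rz / (p @ Ap)            # no pAp check: may be inf or NaN
            x += alpha * p
            r -= alpha * Ap                  # a bad alpha poisons r here
            rz_new = r @ r
            p = r + (rz_new / rz) * p
            rz = rz_new
    return x, maxiter

# Singular operator: the second search direction falls in the null space.
x_bad, info_bad = cg_one_sync(np.diag([1.0, 0.0]), np.array([1.0, 1.0]))
# Well-posed SPD operator for comparison.
x_ok, info_ok = cg_one_sync(np.diag([2.0, 3.0]), np.array([1.0, 1.0]))
```

On the singular operator pAp becomes exactly zero in the second iteration, alpha becomes inf, and the non-finite residual norm is caught at the top of the third iteration.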