argmin, argmax, refactoring of sorting funcs #17
| for (size_t i = 0; i < size; ++i) | ||
| { | ||
| result[i] = i; | ||
| } | ||
Please use std::iota
http://www.cplusplus.com/reference/numeric/iota/
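A minimal sketch of what this suggestion could look like, assuming the loop above only fills an index buffer with 0, 1, ..., size - 1 (for example as the starting point of an argsort). `make_indices` and `argsort` are hypothetical names used for illustration; they are not functions from this PR, and the real kernel works on raw pointers rather than `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the PR): returns {0, 1, ..., size - 1}.
// std::iota replaces the hand-written "result[i] = i" loop.
std::vector<std::size_t> make_indices(std::size_t size)
{
    std::vector<std::size_t> result(size);
    std::iota(result.begin(), result.end(), std::size_t{0});
    return result;
}

// Hypothetical argsort built on the index buffer: sorts the indices by the
// values they point at instead of moving the values themselves.
std::vector<std::size_t> argsort(const std::vector<double>& values)
{
    std::vector<std::size_t> idx = make_indices(values.size());
    std::stable_sort(idx.begin(), idx.end(),
                     [&values](std::size_t a, std::size_t b) { return values[a] < values[b]; });
    return idx;
}
```

The same call works on a raw buffer as `std::iota(result, result + size, 0)`, which is closer to the shape of the loop in the diff.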
| for (size_t i = 0; i < size; ++i) | ||
| { | ||
| result[i] = array_1[i]; | ||
| } |
| include "backend_searching.pyx" | ||
| include "backend_sorting.pyx" |
| include "backend_searching.pyx" | |
| include "backend_sorting.pyx" | |
| include "backend_search.pyx" | |
| include "backend_sort.pyx" |
These names are used on purpose, to be consistent with the original project and its documentation. Do you really want to change them?
Sorting, searching, and counting
| policy.queue().wait(); | ||
| } | ||
| template void custom_sort_c<double>(void* array1_in, void* result1, size_t size); |
I'm not sure about this, but it would be great to add these kernels to FPTR.
dpnp/dpnp/backend/backend_iface_fptr.cpp
Line 46 in 9b63bae
| cdef size_t size = in_array1.size | ||
| if call_type == numpy.float64: |
If the kernel exists in FPTR you can use
dpnp/dpnp/backend_mathematical.pyx
Lines 116 to 128 in 9b63bae
| def argmax(in_array1, axis=None, out=None): | ||
| """ |
The same tests fail on the master branch. They are not related to these changes.
…ost Hessenberg lstsq

Closes audit items IntelPython#9, IntelPython#15, IntelPython#16, IntelPython#17, IntelPython#19 from the prior solver review. All four touch the iterative-solver inner loop and were left out of the previous correctness-only commit because they require a new C++ binding.

backend/extensions/blas/gemv.{cpp,hpp}, blas_py.cpp
- Extend the typed gemv_impl signature with alpha and beta as doubles; the per-T impl casts them to the matrix value type at dispatch time. dpnp.dot and other legacy callers are unaffected -- the existing gemv() public function now forwards (1.0, 0.0) to the shared gemv_dispatch helper.
- Add gemv_alpha_beta() entry point and a _gemv_alpha_beta pybind method computing y = alpha * op(A) * x + beta * y for caller-supplied scalars. Required by the GMRES Arnoldi fast path which fires gemv with (alpha=1, beta=0) writing into a Hessenberg column slice, then (alpha=-1, beta=1) fusing u -= V @ h into one kernel. For complex matrices the scalars are always one of {-1, 0, 1} and so survive the cast exactly; the impl docstring flags the silent imag-loss caveat for any other complex caller.
- Refactor: hoist the existing validation / strides / dispatch plumbing into a static gemv_dispatch helper so both entry points share identical behaviour without duplication.

scipy/sparse/linalg/_iterative.py
- _make_compute_hu now takes both V and H. The closure writes the projection coefficients h = V[:, :j+1]^H @ u directly into the Hessenberg column slice H[:j+1, j] via a single gemv with the output pointing at that slice -- no intermediate h buffer, no slice-assign copy (audit item IntelPython#16). Pass 2 fuses the AXPY-style update u = -V @ h + 1 * u into one gemv with alpha=-1, beta=1 -- no tmp buffer, one kernel instead of gemv-plus-subtract (audit item IntelPython#19). For complex V the (j+1)-element h slice is conjugated in-place between the two passes (V^T -> V^H), negligible cost next to the n*(j+1) gemv.
- Switch the Hessenberg least-squares H y = e from a device dpnp.linalg.lstsq (which dispatches an SVD kernel for a tiny 21x20 problem per restart) to numpy.linalg.lstsq on the host (audit item IntelPython#17). Matches CuPy's choice and removes a device-side SVD launch that on Intel GPU dominates the restart cost for the default restart=20. RHS e is now maintained as a numpy array; H is copied via dpnp.asnumpy once per restart and the resulting y is shipped back as a dpnp array for the V @ y update.
- V[:, j+1] = v retained as a single contiguous USM slice store (audit item IntelPython#15 closes as no-change-required: the assignment is already one dpnp op on an F-order buffer and there is no fused 'normalise-then-store' path without further binding work).
- cg per-iter syncs collapsed from 3-4 down to 1 (audit item IntelPython#9). The pAp and rz_new breakdown checks are no longer transferred to the host on every iteration; instead the loop relies on IEEE-754 inf / NaN propagation through alpha = rz / pAp. When pAp underflows the resulting alpha is inf or NaN, poisons the next residual via x += alpha * p and r -= alpha * Ap, and the single norm sync at the top of the next iteration detects the breakdown via numpy.isfinite(rnorm_host) and exits with info > 0. Mirrors the cuBLAS-style CG inner loop (nrm2 + scalar test, one host barrier per iter); the initial rz breakdown guard remains so a zero preconditioned residual still short-circuits correctly.

tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py
- test_gmres_complex_arnoldi_fast_path: complex-dtype regression guard for the conjugate-in-place branch of _make_compute_hu -- a silent miss of the conjugate would lose orthogonality and misconverge.
- test_cg_inf_breakdown_returns_positive_info: regression guard for the per-iter-sync collapse. Runs cg on a deliberately singular SPD operator and asserts info > 0 (not zero, not -1) so a future re-introduction of explicit breakdown syncs would still pass but a regression to the old info contract would not.
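
The two-pass projection described in the commit message can be illustrated with a plain CPU reference. This is a sketch only, under stated assumptions: real-valued data, column-major storage in `std::vector`, and a loop-based routine. The function here reuses the name `gemv_alpha_beta` but is a hypothetical stand-in, not the dpnp binding or its signature, and `orthogonalize` is an invented helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not the dpnp binding):
// y = alpha * op(A) * x + beta * y, with A stored column-major as rows x cols
// and op(A) = A^T when transpose is true, A otherwise.
void gemv_alpha_beta(bool transpose, std::size_t rows, std::size_t cols,
                     const std::vector<double>& a, const std::vector<double>& x,
                     std::vector<double>& y, double alpha, double beta)
{
    const std::size_t out_len = transpose ? cols : rows;
    const std::size_t in_len = transpose ? rows : cols;
    assert(a.size() == rows * cols && x.size() == in_len && y.size() == out_len);

    for (std::size_t i = 0; i < out_len; ++i)
    {
        double acc = 0.0;
        for (std::size_t k = 0; k < in_len; ++k)
        {
            // Column-major: element (r, c) lives at a[r + c * rows].
            const double a_ik = transpose ? a[k + i * rows] : a[i + k * rows];
            acc += a_ik * x[k];
        }
        // As in BLAS, beta == 0 overwrites y without reading its old contents.
        y[i] = (beta == 0.0) ? alpha * acc : alpha * acc + beta * y[i];
    }
}

// One Gram-Schmidt step of u against the k columns of V (n x k, real case).
void orthogonalize(std::size_t n, std::size_t k, const std::vector<double>& v,
                   std::vector<double>& u, std::vector<double>& h)
{
    gemv_alpha_beta(true, n, k, v, u, h, 1.0, 0.0);   // pass 1: h = V^T u
    gemv_alpha_beta(false, n, k, v, h, u, -1.0, 1.0); // pass 2: u = -V h + u
}
```

Pass 1 writes straight into `h`, which in the commit is a slice of the Hessenberg column, and pass 2 updates `u` in place, so neither pass needs a temporary buffer. The complex case would additionally conjugate `h` between the passes, as the commit message notes.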