Skip to content

argmin, argmax, refactoring of sorting funcs#17

Merged
shssf merged 3 commits into
masterfrom
argmax-argmin
Sep 17, 2020
Merged

argmin, argmax, refactoring of sorting funcs#17
shssf merged 3 commits into
masterfrom
argmax-argmin

Conversation

@Alexander-Makaryev
Copy link
Copy Markdown
Contributor

No description provided.

Comment thread dpnp/backend/custom_kernels_sorting.cpp Outdated
Comment on lines +58 to +62
for (size_t i = 0; i < size; ++i)
{
result[i] = i;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread dpnp/backend/custom_kernels_sorting.cpp Outdated
Comment on lines +98 to +101
for (size_t i = 0; i < size; ++i)
{
result[i] = array_1[i];
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread dpnp/backend.pyx
Comment on lines +59 to 60
include "backend_searching.pyx"
include "backend_sorting.pyx"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
include "backend_searching.pyx"
include "backend_sorting.pyx"
include "backend_search.pyx"
include "backend_sort.pyx"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These names are used in purpose to be compliant with original project/documentation. Are you really want to change them?
Sorting, searching, and counting

policy.queue().wait();
}

template void custom_sort_c<double>(void* array1_in, void* result1, size_t size);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure in it but it would be greate to add these kernels into FPTR

func_map_t func_map =

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread dpnp/backend_searching.pyx Outdated

cdef size_t size = in_array1.size

if call_type == numpy.float64:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the kernel exists in FPTR you can use

cpdef dparray dpnp_add(dparray array1, dparray array2):
cdef DPNPFuncType param1_type = dpnp_dtype_to_DPNPFuncType(array1.dtype)
cdef DPNPFuncType param2_type = dpnp_dtype_to_DPNPFuncType(array2.dtype)
cdef DPNPFuncData kernel_data = get_dpnp_function_ptr(DPNP_FN_ADD, param1_type, param2_type)
result_type = dpnp_DPNPFuncType_to_dtype( < size_t > kernel_data.return_type)
cdef dparray result = dparray(array1.shape, dtype=result_type)
cdef custom_math_2in_1out_func_ptr_t func = <custom_math_2in_1out_func_ptr_t > kernel_data.ptr
func(array1.get_data(), array2.get_data(), result.get_data(), array1.size)
return result

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



def argmax(in_array1, axis=None, out=None):
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

@shssf shssf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some proposed change are not required but likely
Also 2 tests filed

FAILED tests/test_linalg.py::test_eig_arange[8-float32] - AssertionError: 
FAILED tests/third_party/cupy/math_tests/test_sumprod.py::TestSumprod::test_sum_all2

@Alexander-Makaryev
Copy link
Copy Markdown
Contributor Author

@shssf

Some proposed change are not required but likely
Also 2 tests filed

FAILED tests/test_linalg.py::test_eig_arange[8-float32] - AssertionError: 
FAILED tests/third_party/cupy/math_tests/test_sumprod.py::TestSumprod::test_sum_all2

Same tests are failed on master branch, They are not related to these changes

@shssf shssf merged commit ab10a38 into master Sep 17, 2020
@shssf shssf deleted the argmax-argmin branch September 17, 2020 19:27
abagusetty pushed a commit to abagusetty/dpnp that referenced this pull request May 27, 2026
…ost Hessenberg lstsq

Closes audit items IntelPython#9, IntelPython#15, IntelPython#16, IntelPython#17, IntelPython#19 from the prior solver review.
All four touch the iterative-solver inner loop and were left out of the
previous correctness-only commit because they require a new C++ binding.

backend/extensions/blas/gemv.{cpp,hpp}, blas_py.cpp
  - Extend the typed gemv_impl signature with alpha and beta as
    doubles; the per-T impl casts them to the matrix value type at
    dispatch time. dpnp.dot and other legacy callers are unaffected
    -- the existing gemv() public function now forwards (1.0, 0.0) to
    the shared gemv_dispatch helper.
  - Add gemv_alpha_beta() entry point and a _gemv_alpha_beta pybind
    method computing y = alpha * op(A) * x + beta * y for caller-
    supplied scalars. Required by the GMRES Arnoldi fast path which
    fires gemv with (alpha=1, beta=0) writing into a Hessenberg
    column slice, then (alpha=-1, beta=1) fusing u -= V @ h into
    one kernel. For complex matrices the scalars are always one of
    {-1, 0, 1} and so survive the cast exactly; the impl docstring
    flags the silent imag-loss caveat for any other complex caller.
  - Refactor: hoist the existing validation / strides / dispatch
    plumbing into a static gemv_dispatch helper so both entry points
    share identical behaviour without duplication.

scipy/sparse/linalg/_iterative.py
  - _make_compute_hu now takes both V and H. The closure writes the
    projection coefficients h = V[:, :j+1]^H @ u directly into the
    Hessenberg column slice H[:j+1, j] via a single gemv with the
    output pointing at that slice -- no intermediate h buffer, no
    slice-assign copy (audit item IntelPython#16). Pass 2 fuses the AXPY-style
    update u = -V @ h + 1 * u into one gemv with alpha=-1, beta=1 --
    no tmp buffer, one kernel instead of gemv-plus-subtract (audit
    item IntelPython#19). For complex V the (j+1)-element h slice is conjugated
    in-place between the two passes (V^T -> V^H), negligible cost
    next to the n*(j+1) gemv.
  - Switch the Hessenberg least-squares H y = e from a device
    dpnp.linalg.lstsq (which dispatches an SVD kernel for a tiny
    21x20 problem per restart) to numpy.linalg.lstsq on the host
    (audit item IntelPython#17). Matches CuPy's choice and removes a device-
    side SVD launch that on Intel GPU dominates the restart cost
    for the default restart=20. RHS e is now maintained as a numpy
    array; H is copied via dpnp.asnumpy once per restart and the
    resulting y is shipped back as a dpnp array for the V @ y
    update.
  - V[:, j+1] = v retained as a single contiguous USM slice store
    (audit item IntelPython#15 closes as no-change-required: the assignment is
    already one dpnp op on an F-order buffer and there is no fused
    'normalise-then-store' path without further binding work).
  - cg per-iter syncs collapsed from 3-4 down to 1 (audit item IntelPython#9).
    The pAp and rz_new breakdown checks are no longer transferred
    to the host on every iteration; instead the loop relies on
    IEEE-754 inf / NaN propagation through alpha = rz / pAp. When
    pAp underflows the resulting alpha is inf or NaN, poisons the
    next residual via x += alpha * p and r -= alpha * Ap, and the
    single norm sync at the top of the next iteration detects the
    breakdown via numpy.isfinite(rnorm_host) and exits with
    info > 0. Mirrors the cuBLAS-style CG inner loop (nrm2 + scalar
    test, one host barrier per iter); the initial rz breakdown
    guard remains so a zero preconditioned residual still short-
    circuits correctly.

tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py
  - test_gmres_complex_arnoldi_fast_path: complex-dtype regression
    guard for the conjugate-in-place branch of _make_compute_hu --
    a silent miss of the conjugate would lose orthogonality and
    misconverge.
  - test_cg_inf_breakdown_returns_positive_info: regression guard
    for the per-iter-sync collapse. Runs cg on a deliberately
    singular SPD operator and asserts info > 0 (not zero, not -1)
    so a future re-introduction of explicit breakdown syncs would
    still pass but a regression to the old info contract would not.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants