Optimize spmatrix._set_many #7888
@emcastillo how does this CI work? Are the tests actually running, or do they not start until triggered by something?
CIs are triggered after review by one of the maintainers.
mask = offsets > -1
self.data[offsets[mask]] = x[mask]

if mask.all():
This still triggers a device synchronization, since the value has to be brought back to the host for the if comparison, so I think that execution time won't benefit from these changes.
Thanks a lot for the PR! Sorry, can you try to benchmark the code using this? https://docs.cupy.dev/en/stable/user_guide/performance.html I am interested in seeing the GPU performance. Thanks
@emcastillo thanks for the comments. I understand that there will still be a synchronization because of the
Test function (run on Google Colab with T4 GPU):
import numpy as np
import cupy
import cupyx.profiler
import cupyx.scipy.sparse

def benchmark_update_only(size, nnz):
    rows = cupy.random.randint(0, size, nnz)
    cols = cupy.random.randint(0, size, nnz)
    old_vals = cupy.random.random(nnz)
    new_vals = cupy.random.random(nnz)
    # Two identical matrices, one for each implementation
    mat_cupy_old = cupyx.scipy.sparse.csr_matrix((old_vals, (rows, cols)))
    mat_cupy_new = cupyx.scipy.sparse.csr_matrix((old_vals, (rows, cols)))

    def run_old():
        vals = cupy.roll(new_vals, -1)
        mat_cupy_old._set_many(rows, cols, vals)
        return mat_cupy_old.get()

    def run_new():
        # _set_many() uses mask.all()
        vals = cupy.roll(new_vals, -1)
        _set_many(mat_cupy_new, rows, cols, vals)
        return mat_cupy_new.get()

    print(f"{size = }, {nnz = }")
    print("Old _set_many:")
    print(cupyx.profiler.benchmark(run_old, (), n_repeat=100))
    print("New _set_many:")
    print(cupyx.profiler.benchmark(run_new, (), n_repeat=100))

    # Both implementations must produce the same matrix
    mat_scipy_old = run_old()
    mat_scipy_new = run_new()
    assert np.array_equal(mat_scipy_old.indices, mat_scipy_new.indices)
    assert np.array_equal(mat_scipy_old.indptr, mat_scipy_new.indptr)
    assert np.allclose(mat_scipy_old.data, mat_scipy_new.data)
    return mat_scipy_old, mat_scipy_new
Results:
Seems a significant improvement, let me kick the CIs
/test mini
Actually, this looks to be just as fast as using Note the only change from
def _set_many(self, i, j, x):
    """Sets value at each (i, j) to x
    Here (i,j) index major and minor respectively, and must not contain
    duplicate entries.
    """
    i, j, M, N = self._prepare_indices(i, j)
    x = cupy.array(x, dtype=self.dtype, copy=True, ndmin=1).ravel()
    new_sp = cupyx.scipy.sparse.csr_matrix(
        (cupy.arange(self.nnz, dtype=cupy.float32),
         self.indices, self.indptr), shape=(M, N))
    offsets = new_sp._get_arrayXarray(
        i, j, not_found_val=-1).astype(cupy.int32).ravel()
    if -1 not in offsets.get():  # This is the only line that is different from main
        # only affects existing non-zero cells
        self.data[offsets] = x
        return
    mask = offsets > -1
    self.data[offsets[mask]] = x[mask]
    # only insertions remain
    warnings.warn('Changing the sparsity structure of a '
                  '{}_matrix is expensive.'.format(self.format),
                  _base.SparseEfficiencyWarning)
    mask = ~mask
    i = i[mask]
    i[i < 0] += M
    j = j[mask]
    j[j < 0] += N
    self._insert_many(i, j, x[mask])
Benchmark:
LGTM! nice find
This PR implements the change proposed in #7876.
Performance tests
Run with CuPy 11.0.0 on Google Colab because I don't have a local GPU.
Update only, no insert
Results:
Insert only, no update
I expect very little performance difference here because this line
cupy/cupyx/scipy/sparse/_compressed.py
Line 540 in 5c32e40
Results: old and new are the same within the noise/reproducibility of the test.
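For readers following the thread, the update/insert split that `_set_many` performs can be sketched in plain Python on a CSR layout. This is only an illustration under stated assumptions, not CuPy's implementation: `csr_offsets` and `set_many` are hypothetical helper names, the arrays are ordinary Python lists, and no GPU or device synchronization is involved. It shows why an update-only call can return early while any call that contains a missing entry still has to go through the insertion path.

```python
def csr_offsets(indptr, indices, rows, cols):
    """Hypothetical helper: position of each (row, col) in the data array, or -1 if not stored."""
    offsets = []
    for r, c in zip(rows, cols):
        start, end = indptr[r], indptr[r + 1]
        try:
            # Column indices of row r live in indices[start:end]
            offsets.append(indices.index(c, start, end))
        except ValueError:
            offsets.append(-1)
    return offsets


def set_many(indptr, indices, data, rows, cols, values):
    """Hypothetical sketch: update stored entries in place and
    return the (row, col, value) triples that would need an insertion."""
    offsets = csr_offsets(indptr, indices, rows, cols)
    if all(off > -1 for off in offsets):
        # Fast path: every target already exists, so only data changes
        for off, v in zip(offsets, values):
            data[off] = v
        return []
    pending = []
    for off, r, c, v in zip(offsets, rows, cols, values):
        if off > -1:
            data[off] = v              # existing entry: plain update
        else:
            pending.append((r, c, v))  # missing entry: sparsity structure must change
    return pending


# 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]

# Update only: both (0, 2) and (1, 1) are already stored
pending_update_only = set_many(indptr, indices, data, [0, 1], [2, 1], [20.0, 30.0])
data_after_update_only = list(data)

# Mixed: (0, 1) is not stored and is left for the insertion step
pending_mixed = set_many(indptr, indices, data, [0, 1], [1, 1], [5.0, 7.0])
```

In the first call nothing is left to insert, which corresponds to the early return discussed above. In the second call the missing entry is handed back to the caller, which is the case where the expensive structure change cannot be avoided.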