-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slowness of Python implementation #30
Comments
Nice analysis, thanks @inakleinbottle! I gave these changes a quick run through and the timing on the Intel i7 was definitely an improvement (22.02 seconds vs. 39.92 seconds). I had to make some adjustments to get it to work, which I did with minimal thought: results = []
for i in range(total):
results.append(np.empty(c_matrices[i].shape))
start_time = time.process_time()
for count in range(BENCHMARK_REPEAT):
for i in range(total):
a = a_matrices[i]
b = b_matrices[i]
np.matmul(a, b, out=results[i])
end_time = time.process_time() If you'd like to make a PR against the repo with the correct changes, I'll be happy to run the i7 and M1 Pro benchmarks again (so they're done in the same environment as the other benchmarks). |
Hmm, I'm not sure why the empty setup that I used didn't work, probably the argument for empty needs to be a tuple. But this is pretty close. Note that this is not quite a like-for-like comparison because the old code was using component-wise multiplication and not matrix multiplication. EDIT: My apologies, I didn't see that you'd actually used np.matrix, which is deprecated in favour of just using a 2-dimensional ndarray and |
OK, I think I know why the Numpy matmul is not performing as expected. Consider the benchmarking loop in the Python code:
On the final line there, the result of
a*b
(this is not matrix multiplication, that isa @ b
), is stored into the temporary variablec
. This variable is already initialized with the product of two matrices, but this is not quite how variables work in Python. The value currently held byc
is aPyObject*
containing a NumPy array. At the start of the linec = a*b
the thing that happens isa*b
, which creates a newPyObject*
pointing to a new NumPy array. Now the assignment operator kicks in, which essentially does the following:At the first decref, the reference count of
c
hits zero, triggering the garbage collector and destroying the value (callingPyObject_Free
). Then the new pointer is given the variable spot and incremented.Thus this one line of Python code has essentially performed the following:To fix this problem, you need to either prevent the garbage collection of
c
, by pushing the result to a list or something similar instead, or better writing the result of the matmul directly into some already allocated Numpy array. This is my suggestion:Notice that I've been quite careful to make sure that the shape of the
results
array guarantees thatresults[i, :, :]
is a contiguous block of memory in C ordering (row major) so the write should be as efficient as possible. (Actually F ordering andresults[:, :, i]
would actually be better for the Fortran backend but let's ignore this.)I hope this makes sense.
The text was updated successfully, but these errors were encountered: