Possible performance regression in Debug mode on the stable branch #8520
Comments
Here is a slightly modified version of the benchmark. This takes me 140 ms with 9.1.1 and 150 ms with the current master branch.

```cpp
#include <deal.II/base/tensor.h>

#include <chrono>
#include <iostream>

int main()
{
  using namespace dealii;

  const auto start = std::chrono::high_resolution_clock::now();

  double sum = 0.;
  for (int i = 0; i < 10; ++i)
    {
      double values[3][3] = {{2., 5., 1.},
                             {6., 3., 8.},
                             {-2., 5.5, 1.7}};
      const Tensor<2, 3> t(values);
      for (int j = 0; j < 100000; ++j)
        sum += scalar_product(t, t);
    }

  const auto end = std::chrono::high_resolution_clock::now();
  std::cout << "took "
            << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
            << " milliseconds\n";
  return static_cast<int>(sum);
}
```
I guess one would need to look into the assembly code to see what has changed. I am willing to believe that the assertion could be part of the reason; however, I want to point out that it is not the check itself that is the problem, but rather the compiler moving content back and forth between registers and the stack (because you need to be able to access all variables in the debugger).

Background information: if we had those checks also in release mode (not counting some of the more expensive ones we do), I would expect the slowdown to be less than a factor of 2, often less than a factor of 1.2. The reason is that almost all branches due to assertions should be easy to predict (CPU branch prediction). What can cause a slowdown is missed optimization opportunities due to the check (say, less loop unrolling, or additional dependencies that harm out-of-order execution) and, sometimes, simply the fact that the CPU has to decode and execute more instructions. But as I said, I believe this is less than 2x.

Now to the actual cause: I believe the huge difference between versions 9.0 and 9.1 is #7207. If you check your detailed.log, we have

[detailed.log excerpt for 9.0.1 omitted]

in the case of 9.0.1 and

[detailed.log excerpt for 9.1.0 omitted]

for 9.1.0; note the switch between the two.
@kronbichler Thanks for the explanation. It now makes perfect sense to me. I also remember that with previous deal.II versions, variables I wanted to inspect in gdb were unnecessarily inlined or optimized away. So I support more powerful debugging in debug mode even if it is slower. I think there are many opportunities to improve the […]
Yes. Since there is little we can do in concrete terms about your observation, should we close this issue? Of course, we'd be happy to convert […]
Recently I upgraded the candi deal.II installation (from 9.0.1 to 9.1.0) on my desktop and noticed a significant performance degradation (~6-7 times) in debug mode in my code (no change in the performance of release mode). Using a profiler, I narrowed down the source of the slowness: the tensor operations (`double_contract` etc.) seem to be slowed down the most. My hypothesis is that the performance regression comes mostly from `Assert` statements in tensor operations (specifically `Tensor::operator()`).

To test this hypothesis, I created a simple code that checks the speed of tensor operations.

The timing results in debug mode (using the elapsed time reported by `/usr/bin/time`):

[timing table omitted]

Please let me know if you can reproduce these results. It seems that most of the regression has been resolved on the master branch, but maybe we should update the stable branch as well.
I used `callgrind` to profile it. The instruction-read (Ir) event counts are:

[instruction counts omitted]

One piece of evidence suggesting that the `Assert` macro is causing the performance regression is the following output from `callgrind` for v9.1.0:

[callgrind output omitted]

The deal.II installation logs and the `callgrind` profiling data are attached here:

detailed-7d1f025430.log
detailed-v9.0.1.log
detailed-v9.1.0.log
profile-results.7d1f025430.txt
profile-results.v9.0.1.txt
profile-results.v9.1.0.txt