code with celero::DoNotOptimizeAway can be partly optimized away #13

Closed
xuyuan opened this issue Jan 16, 2014 · 3 comments

Comments

@xuyuan

xuyuan commented Jan 16, 2014

First of all, thanks for the great project.

celero::DoNotOptimizeAway only tricks the compiler by calling putchar on the first char of the data. But the compiler (at least GCC 4.7) is smart enough to keep only the calculation of that first char and optimize the rest away.

Example:
Vector3 u, v;

celero::DoNotOptimizeAway(u + v);

The compiler will only calculate u[0] + v[0] and skip u[1] + v[1] and u[2] + v[2].

This can be checked in the generated assembly.
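
To make the failure mode concrete, here is a minimal sketch of a barrier that behaves as described above, assuming it exposes only the first byte of the value. The names `firstByteOnlyBarrier`, `Vec3`, and `benchmarkBody` are hypothetical and are not Celero's actual code; `getpid` assumes a POSIX system.

```cpp
#include <cstdio>
#include <unistd.h>  // getpid (POSIX)

struct Vec3 {
    float x, y, z;
};

inline Vec3 add(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Hypothetical stand-in for the barrier described above:
// only the first byte of datum can ever reach putchar.
template <class T>
void firstByteOnlyBarrier(T&& datum) {
    // True only for the init process, but the compiler cannot know that.
    if (getpid() == 1) {
        std::putchar(*reinterpret_cast<const char*>(&datum));
    }
}

void benchmarkBody(const Vec3& u, const Vec3& v) {
    // Once add() is inlined, only bytes of the x component are observable,
    // so an optimizer is free to drop the y and z additions.
    firstByteOnlyBarrier(add(u, v));
}
```

Whether the y and z additions are actually removed depends on the compiler and optimization level, so the generated assembly is the only reliable check.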

I have a dirty fix:

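// Write every byte of datum to stdout so no part of it can be discarded.
template<class T> void _dump_to_std(T&& datum) {
    // const pointer so that const arguments also compile
    const char* p = reinterpret_cast<const char*>(&datum);
    for (size_t i = 0; i < sizeof(T); ++i) {
        putchar(p[i]);
    }
}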

///
/// \func DoNotOptimizeAway
///
/// \author Andrei Alexandrescu
///
template<class T> void DoNotOptimizeAway(T&& datum)
{
    #ifdef WIN32
    if(_getpid() == 1) 
    #else
    if(getpid() == 1) 
    #endif
    {
        _dump_to_std(datum);
    }
}
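
The byte-wise walk in the fix above can be checked in isolation. The sketch below assumes a hypothetical helper, `forEachByte`, that performs the same traversal but hands each byte to a callback instead of putchar, so the number of visited bytes can be counted. `Vec` and `countVisitedBytes` are likewise illustrative names.

```cpp
#include <cstddef>

struct Vec {
    float x, y, z;
};

// Hypothetical helper: visits all sizeof(T) bytes of datum, in order,
// and passes each one to sink.
template <class T, class Sink>
void forEachByte(const T& datum, Sink sink) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&datum);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        sink(p[i]);
    }
}

// Counts how many bytes the traversal touches for a Vec.
inline std::size_t countVisitedBytes(const Vec& v) {
    std::size_t n = 0;
    forEachByte(v, [&n](unsigned char) { ++n; });
    return n;
}
```

Because every byte is a potential input to the sink, the compiler has to materialize the whole object rather than just its first component.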
@DigitalInBlue
Owner

Can you demonstrate this bug on a POD or STL type so that it can be reproduced?

DoNotOptimizeAway takes a reference. In this case, it should take a reference to the result of the "u+v" operation, and it should not change the result or the way "u+v" is computed at all.

@xuyuan
Author

xuyuan commented Jan 16, 2014

I think STL types are too complicated for the compiler to see through, but for POD types the compiler can do a great job. I created a small example:

#include <celero/Celero.h>
#include <eigen3/Eigen/Eigen>

CELERO_MAIN;

Eigen::Vector3f u, v;
struct Vec {
  float x, y, z;
};
Vec a, b;

Vec add(const Vec& a, const Vec& b) {
  Vec c;
  c.x = a.x + b.x;
  c.y = a.y + b.y;
  c.z = a.z + b.z;
  return c;
}

BASELINE(DemoSimple, Baseline, 0, 7100000)
{
  asm("# test eigen begin");
  celero::DoNotOptimizeAway(Eigen::Vector3f(u + v));
  asm("# test eigen end");

  asm("# test POD begin");
  celero::DoNotOptimizeAway(add(a, b));
  asm("# test POD end");
}

The assembly I got from GCC 4.7 is:

# 22 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test eigen begin
# 0 "" 2
#NO_APP
    movss   u(%rip), %xmm0
    addss   v(%rip), %xmm0
    movss   %xmm0, (%rsp)
    call    getpid
    cmpl    $1, %eax
    je  .L68
.L65:
#APP
# 24 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test eigen end
# 0 "" 2
# 26 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test POD begin
# 0 "" 2
#NO_APP
    movss   a(%rip), %xmm0
    addss   b(%rip), %xmm0
    movss   %xmm0, 16(%rsp)
    call    getpid
    cmpl    $1, %eax
    je  .L69
.L66:
#APP
# 28 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test POD end

With the fix applied, the result is as follows. Note that all three components are now computed:

# 22 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test eigen begin
# 0 "" 2
#NO_APP
    movss   u(%rip), %xmm0
    addss   v(%rip), %xmm0
    movss   %xmm0, (%rsp)
    movss   u+4(%rip), %xmm0
    addss   v+4(%rip), %xmm0
    movss   %xmm0, 4(%rsp)
    movss   u+8(%rip), %xmm0
    addss   v+8(%rip), %xmm0
    movss   %xmm0, 8(%rsp)
    call    getpid
    cmpl    $1, %eax
    je  .L65
.L68:
#APP
# 24 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test eigen end
# 0 "" 2
# 26 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test POD begin
# 0 "" 2
#NO_APP
    movss   a+4(%rip), %xmm1
    movss   a+8(%rip), %xmm0
    addss   b+4(%rip), %xmm1
    movss   a(%rip), %xmm2
    addss   b+8(%rip), %xmm0
    addss   b(%rip), %xmm2
    movss   %xmm1, 20(%rsp)
    movss   %xmm0, 24(%rsp)
    movss   %xmm2, 16(%rsp)
    call    getpid
    cmpl    $1, %eax
    je  .L71
.L67:
#APP
# 28 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test POD end

@DigitalInBlue
Owner

Acknowledged. I see there is a problem here. I am checking in a fix. The fix for Visual Studio is not as nice as for gcc & clang, but I believe it addresses this issue. Thanks for the bug report!

DigitalInBlue pushed a commit that referenced this issue Jan 17, 2014
Should fix "DoNotOptimizeAway" on GCC and Clang.  Seems to have fixed it
on Visual Studio 2013 as well, but this is harder to verify.
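
The committed change is not shown in this thread. One common way to build such a barrier on GCC and Clang is an empty inline-asm statement that receives the address of the value and declares a memory clobber. The sketch below assumes that approach and is not necessarily what the commit does. `doNotOptimizeAwaySketch`, `Vec`, and `addAndKeep` are hypothetical names.

```cpp
#if !defined(__GNUC__) && !defined(__clang__)
#error "This sketch relies on GCC/Clang extended inline asm."
#endif

struct Vec {
    float x, y, z;
};

inline Vec add(const Vec& a, const Vec& b) {
    return Vec{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Hypothetical barrier: the asm body is empty, so no instructions are
// emitted, but the compiler must assume the statement may read any
// memory reachable through the pointer it is given.
template <class T>
inline void doNotOptimizeAwaySketch(T&& datum) {
    asm volatile("" : : "g"(&datum) : "memory");
}

// Usage: the whole result object has to be materialized before the barrier.
inline Vec addAndKeep(const Vec& a, const Vec& b) {
    Vec c = add(a, b);
    doNotOptimizeAwaySketch(c);
    return c;
}
```

Unlike the getpid/putchar trick, this form makes no function call. MSVC has no equivalent extended-asm syntax, which is consistent with the remark above that the Visual Studio fix is less clean.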