Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 11788 - [x86] Valgrind unhandled instruction bytes: 0xC8 0x8 0x0 0x0 #3609

Merged
merged 1 commit into from Jun 6, 2014

Conversation

yebblies
Copy link
Member

@yebblies yebblies commented Jun 1, 2014

Use combination of PUSH, MOV and SUB instead of unsupported ENTER instruction, as recommended here.

Also, why the hell is this code generated manually? @WalterBright

…8 0x0 0x0

Use combination of PUSH, MOV and SUB instead of unsupported ENTER instruction
@Safety0ff
Copy link
Contributor

Thanks, I look forward to being able to use valgrind.

@yebblies
Copy link
Member Author

yebblies commented Jun 1, 2014

Thanks, I look forward to being able to use valgrind.

Unfortunately it's going to take more than this patch. Next up is this: http://valgrind.10908.n7.nabble.com/how-to-disable-unhandled-instruction-bytes-error-reporting-td42100.html

@Safety0ff
Copy link
Contributor

Unfortunately it's going to take more than this patch. Next up is this: http://valgrind.10908.n7.nabble.com/how-to-disable-unhandled-instruction-bytes-error-reporting-td42100.html

Are the remaining unhandled instructions only floating point? Or will it still fault on certain integer instructions?

@yebblies
Copy link
Member Author

yebblies commented Jun 1, 2014

I don't know - with this patch hello world will work under valgrind, but that doesn't mean there aren't other broken integer instructions.

buf->writeByte(0);
buf->writeByte(0);
off += 6;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rather inefficient way to do things, with all the repeated calls to writeByte. A smaller way is to use a fixed size array, assigning the values to the array, then passing the array in one call to buf->write():

unsigned char code[11];
unsigned char* p = &code[0];
code[0] = 0x50 + BP;
if (I64)
{   code[1] = REX | REX_W;
    ++p;
}
p[1] = 0x8B;
p[2] = modregrm(3,BP,SP);
if (align)
{
    if (I64)
    {
       p[3] = REX | REX_W;
        ++p;
    }
    p[3] = 0x81;
    ... etc ...
}
buf->write(code, p - code + 3);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that basically the same thing? Doesn't the outbuf essentially do this internally? For code like this that is run only once per program, I would think clarity is more important, and the code is dead simple.

I'll change it if you feel strongly about it, let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right in that the code is logically the same. But if you look at the object code, you'll find my proposed version is dramatically shorter. I'd prefer the shorter version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer @yebblies version, it's easier to read & maintain.
What's the point of the micro-optimization? Tradeoff the time of people maintaining the backend for a few bytes of object code? HDD is cheap and life is short.
Perhaps if you we're more active in the maintenance of the backend this type of "optimization" would be more palatable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of the micro-optimization?

Sloth and bloat seeps in through a thousand cracks.

Perhaps if you we're more active in the maintenance of the backend this type of "optimization" would be more palatable.

Uncalled for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right in that the code is logically the same. But if you look at the object code, you'll find my proposed version is dramatically shorter. I'd prefer the shorter version.

Looking at the code again, your version is ridiculously complicated, and does not match the surrounding code, which uses repeated calls to writeByte. It also does not appear to be significantly shorter, except for the fact that you removed the comments and some of the code.

This is not a frequently-called function, and micro-optimizing it at the expense of readability is not something I'm going to do.

Also note that align is usually (always?) zero, resulting in the same number of calls to writeByte as the original code.

Please reconsider merging this as-is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If align is 0, your version does 3 calls to writeByte() while mine does one call to write().

I do not agree that my version is ridiculously complicated, all it does is simply fill an array. I understand your argument about micro-optimizing something, but in this case do not agree with it. The road to a slow compiler is filled with individually insignificant slowdowns, and a primary attraction of dmd is ridiculously fast compile times.

But I turned on auto-merge because I'm tired of arguing about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also does not appear to be significantly shorter

It is in terms of generated code. Take a look at the asm output and see.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If align is 0, your version does 3 calls to writeByte() while mine does one call to write().

And the original code does 4 calls. (And I don't think it's actually possible for align to be non-zero, so this code is already an optimization)

I do not agree that my version is ridiculously complicated, all it does is simply fill an array. I understand your argument about micro-optimizing something, but in this case do not agree with it. The road to a slow compiler is filled with individually insignificant slowdowns, and a primary attraction of dmd is ridiculously fast compile times.

Since "ridiculously complicated" is a very subjective term I guess we'll have to agree to disagree here. I do not consider the readability hit worth the unnoticeable difference in performance. Also, I would have done it that way if the surrounding code did it that way.

But I turned on auto-merge because I'm tired of arguing about it.

Thank you.

@WalterBright
Copy link
Member

Auto-merge toggled on

WalterBright added a commit that referenced this pull request Jun 6, 2014
Issue 11788 - [x86] Valgrind unhandled instruction bytes: 0xC8 0x8 0x0 0x0
@WalterBright WalterBright merged commit 2115e0c into dlang:master Jun 6, 2014
@yebblies yebblies deleted the issue11788 branch December 15, 2014 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants