Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #41, #169 - Handle pointers with segment override correctly #391

Merged
merged 7 commits into from
Sep 20, 2018
Merged

Fix #41, #169 - Handle pointers with segment override correctly #391

merged 7 commits into from
Sep 20, 2018

Conversation

palant
Copy link
Contributor

@palant palant commented Sep 11, 2018

See my analysis of the issue in #41 (#169 seems to be a duplicate). capstone2llvmir currently ignores segment override on pointers like fs:[0]. This makes LLVM think that the code is handling null pointers and remove that code as unreachable.

First step was to treat segment overrides correctly, the matching LLVM concept is address spaces. Unfortunately, the only place resembling a documentation for the mapping between segments and address spaces is https://github.com/avast-tl/llvm/blob/725d0cee133c6ab9b95c493f05de3b08016f5c3c/lib/Target/X86/X86ISelDAGToDAG.cpp#L1427, that's what I implemented.

Having address spaces on pointers caused issues further down in the pipeline. bin2llvmir was always producing bitcasts for pointer casts, this had to be changed into address space casts when address spaces change.

And llvmir2hll didn't know address space casts. I made it treat them the same as bitcasts which should do in the short term. In fact, I don't really know how that add byte ptr gs:[ebx], dh instruction (the one that produced the address space cast) got into this driver and what it could be converted into.

@palant
Copy link
Contributor Author

palant commented Sep 11, 2018

The regression test bugs.idioms-reg-couples.TestCryptoLocker failed with these changes. Any reason why this regression test verifies that the bug I am fixing here is still present?

@@ -182,6 +182,7 @@ ShPtr<Expression> LLVMConverter::llvmConstantToExpression(llvm::Constant *c) {
return IntToPtrCastExpr::create(op, llvmTypeToType(ce->getType()));

case llvm::Instruction::BitCast:
case llvm::Instruction::AddrSpaceCast:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR. Changes in capstone2llvmir and bin2llvmir will be reviewed by @PeterMatula, but I have a comment concerning the change in llvmir2hll. I think we should add case llvm::Instruction::AddrSpaceCast also to the following two places:

  1. src/llvmir2hll/llvm/llvmir2bir_converters/orig_llvmir2bir_converter/llvm_converter.cpp (in LLVMConverter::visitCastInst()):
    case llvm::Instruction::BitCast:
        return BitCastExpr::create(op, llvmTypeToType(i.getType()));
  2. src/llvmir2hll/llvm/llvmir2bir_converters/new_llvmir2bir_converter/llvm_instruction_converter.cpp (in LLVMInstructionConverter::convertConstExprToExpression):
    case llvm::Instruction::BitCast:
        return convertCastInstToExpression<BitCastExpr>(*cExpr);
    Including adding a unit test into tests/llvmir2hll/llvm/llvmir2bir_converters/new_llvmir2bir_converter/llvm_value_converter_tests/llvm_instruction_converter_tests.cpp. This code is part of a new LLVMIR to BIR converter that is under development. To use it, you need to run retdec-decompiler.py with --backend-llvmir2bir-converter new.

@palant
Copy link
Contributor Author

palant commented Sep 13, 2018

There we go. I made sure that the new LLVMIR to BIR converter can also deal with address space casts, and I added a unit test.

@PeterMatula PeterMatula changed the base branch from master to issue-41 September 20, 2018 12:24
@PeterMatula PeterMatula merged commit c2cd08c into avast:issue-41 Sep 20, 2018
@PeterMatula
Copy link
Collaborator

PeterMatula commented Sep 20, 2018

Thank you for this. We knew that work with nullptr pointers cause problems in LLVM passes, but did not get around to analyze and solve it. Your analysis and solution will no doubt improve the output quality a lot.

I did not merge it directly to master yet, since while studying the issue I decided to make some more changes upon your code:

  • Test your changes on a larger testsuite that we use internally.
  • Make it clear in the output C, that load/store from a different address space is happening. Right now this source (compiled using MSVC):
int main(void)
{
    DWORD x = __readfsdword (0x18);
//  mov         eax,dword ptr fs:[00000018h]
    return x;
}

gets decompiled to:

int main(int argc, char ** argv) {
    return *(int32_t *)24;
}

Hex-Rays generates this:

int __cdecl main(int argc, const char **argv, const char **envp)
{
  return __readfsdword(0x18u);
}

It looks like it generates the same functions MSVC uses (__readfsbyte, __writefsbyte).
We were thinking about 2 alternatives:

  • Generate these pseudo functions in capstone2llvmir, instead of the address space generation code you added.
  • Leave your code in capstone2llvmir, and only later transform it to pseudo function calls.

We decided for the second option, because it contains the load/store semantics for LLVM passes - there are actual load and store instructions that may be analyzed and modified. We can generate the pseudo functions at the end, only when no more optimizations are to be performed. We should make sure no addrspace(xyz) survives this transformation - all of them get replaced.

  • Test it again on a larger testsuite that we use internally.

This will solve most/many problems described in #41 and #169, since it removes one source of work with nullptr pointers. However, it does not guarantee there are no such instructions, therefore the LLVM optimizations you found in your analysis might still be applied if something goes wrong in the code RetDec sends to LLVM passes. I'm thinking about some sanitizer pass, that would try to find potential problems and hotfix them before running potentially aggressive LLVM passes. What do you think?

Also, if you have some objections/ideas about the things I wrote, lets discuss them here.

@palant
Copy link
Contributor Author

palant commented Sep 20, 2018

No objections on my end. I don't know the product too well, so I took the easy route - feel free to implement a proper solution. And - sure, ensuring that LLVM won't simply remove code it deems unreachable would be great, but that's also beyond my capabilities.

@PeterMatula
Copy link
Collaborator

Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants