
Added Dockerfile #3

Closed
wants to merge 3 commits into from

Conversation

Manouchehri
Contributor

Had some issues with cmake on Xenial, so switching to bionic (next LTS) seemed to be the route of least resistance.

@Manouchehri
Contributor Author

Using Prebuilt Image

docker run -it --rm thawsystems/retdec

Building Docker Image

git clone https://github.com/Manouchehri/retdec
cd retdec
docker build -t retdec .
docker run -it --rm retdec

@aidanhs

aidanhs commented Dec 13, 2017

How about removing the maintainer (since it'll be maintained by repo contributors if it ends up in the repo) and doing COPY rather than cloning?

@Manouchehri
Contributor Author

@aidanhs Hmm, doing COPY gets weird fast, since we cannot assume that the submodules have been cloned (otherwise it breaks most automated Docker build systems).

Could we merge this in now and revisit it once #14 is solved?


@cquick97 cquick97 left a comment


Pull i386 Ubuntu image instead of x64, since only 32-bit is supported.

FROM i386/ubuntu:bionic

"Supported architectures (32b only): Intel x86, ARM, MIPS, PIC32, and PowerPC"

@Manouchehri
Contributor Author

Manouchehri commented Dec 13, 2017

@cquick97 That's for target binaries, not the host platform?

@cquick97

Sure, but if 64-bit isn't required, why not just stick to 32-bit? I noticed that I couldn't compile a test binary within the container with gcc -m32, as there were architecture dependency problems. Of course you can just compile elsewhere and analyze within the container, but for simple testing I think having i386 dependencies would be better.

I'm halfway through an i386 build right now, so we'll see if that fixes anything. Or maybe I'm just crazy/dumb and you can ignore my nonsense.

@Manouchehri
Contributor Author

Manouchehri commented Dec 13, 2017

You can install a 32-bit compiler if needed.

docker run -it --rm --user root thawsystems/retdec
apt-get update && apt-get install -y gcc-multilib
su - retdec
export PATH=${PATH}:/home/retdec/retdec-install/bin
printf '#include <stdio.h>\nint main(void) { printf("Dave says hello.\\n"); return 0; }\n' > test.c
gcc -m32 test.c
decompile.sh a.out

I don't think changing the host is a good idea; going back to 32-bit Linux distributions is a nightmare. Other RE tools like IDA Pro, Binary Ninja, etc. are already 64-bit only, so I think it's an acceptable requirement.

@cquick97

That's probably a better idea. Disregard :)

@breznak

breznak commented Dec 14, 2017

👏 for the Docker image, instead of a full dependency installation.
Also, imho the 64-bit host is more future-proof, as 64-bit decompilation support will hopefully be added soon? #9

@aidanhs

aidanhs commented Dec 14, 2017

@Manouchehri the main thing I wasn't keen on was cloning the whole repo graph twice - maybe COPY and then git submodule update --init --recursive? Still not perfect (because it discards changes people might have made to change the commit of a submodule) but at least you can make changes in this repo and rebuild with those changes, which is one of the use-cases of a Dockerfile.

I'm just making suggestions while waiting for official feedback :)
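For illustration, the COPY-plus-submodule-update approach being suggested might look like this (a sketch only; the base image and dependency list are abbreviated from the Dockerfile under review, and the paths are assumptions):

```dockerfile
FROM ubuntu:bionic

RUN apt-get -y update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        build-essential git cmake python zlib1g-dev flex bison autoconf \
        pkg-config m4 libtool wget

# Copy the local working tree (including .git, assuming it is not excluded
# by .dockerignore) instead of cloning, so locally modified sources end up
# in the image.
COPY . /home/retdec/retdec
WORKDIR /home/retdec/retdec

# The build context may not have submodules checked out, so fetch them here.
# Note this re-pins them to the recorded commits, discarding local submodule edits.
RUN git submodule update --init --recursive
```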

@s3rvac
Member

s3rvac commented Dec 14, 2017

@breznak I am not sure I understood your question correctly, but the future support for decompilation of 64b binaries (#9) will be independent of the OS that is used to build and run RetDec. That is, once there is support for decompiling 64b files, you will be able to use both 32b and 64b RetDec builds.

@breznak

breznak commented Dec 14, 2017

@s3rvac my idea was that on a 64-bit host you can run both 64-bit and 32-bit apps, if you were to develop or debug something in the image. But basically disregard that, as it boils down to

I don't think changing the host is a good idea; going back to 32-bit Linux distributions is a nightmare. Other RE tools like IDA Pro, Binary Ninja, etc. are already 64-bit only, so I think it's an acceptable requirement.

@breznak

breznak commented Dec 14, 2017

Also, Alpine Linux is a very popular Docker container OS due to its low footprint:
https://hub.docker.com/_/alpine/

It may be more work to set everything up (once), but then Alpine could be a lighter alternative to a full-blown Ubuntu. What do you think?

@Manouchehri
Contributor Author

@breznak While Alpine is lighter, I prefer to use Ubuntu as it's more similar to the platform developers will be using and testing with.

@Manouchehri
Contributor Author

@aidanhs You do not need to clone any of the submodules before building the image. See the commands listed in #3 (comment).

@aidanhs

aidanhs commented Dec 14, 2017

@Manouchehri my point is that if you initialise submodules, you can COPY in the current working directory.

People can use your Dockerfile until they actually want to make and test a change, at which point they need to either edit it to use my proposed solution (COPY in the repo) or push every single thing they want to test to their own repo. I think COPY is a better solution, and following that with RUN git submodule update --init --recursive addresses your comment at #3 (comment).

Part of the benefit of Dockerfiles is to ease development, which is what I'm shooting for.

@Manouchehri
Contributor Author

The problem is that COPY only works if the submodules are initialized. Breaking expected build behavior isn't a good idea.

My suggestion is to make a separate Dockerfile for your intended development use case. This one is meant for users to get started quickly with RetDec.

@HugoKlepsch
Contributor

@Manouchehri I must agree with @aidanhs here. If you clone the repo while building the image, that Dockerfile can never be used to test local changes. That means you can never test whether the Docker image works without committing and pushing the change!

I also don't think that running git submodule update --init --recursive is a good idea, because that would discard any changes made to submodule versions.

At my work we also use Docker. Our general flow is to have the Dockerfile build an image from the local repository. This allows us to test local changes. Then when we want to publish a new version of the image, our CI system fully checks out the master branch and builds the new image.

If you are concerned users won't fully clone the repository, could we not add a README.md that explains the image creation process?
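The workflow described above could be sketched roughly like this (hypothetical commands; the registry and tag names are made up, and this assumes a COPY-based Dockerfile):

```shell
# Developer workflow: build from the local working tree to test changes.
git clone --recursive https://github.com/avast-tl/retdec
cd retdec
# ...edit sources...
docker build -t retdec-local .

# CI workflow (on publish): fresh, full checkout of master, then build and push.
git clone --recursive https://github.com/avast-tl/retdec retdec-ci
cd retdec-ci
docker build -t example-registry/retdec:latest .
docker push example-registry/retdec:latest
```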

Contributor

@HugoKlepsch HugoKlepsch left a comment


Thanks for your contribution! 😊
Some minor feedback...

Dockerfile Outdated
mkdir build && \
cd build && \
cmake .. -DCMAKE_INSTALL_PREFIX=/home/retdec/retdec-install && \
make && \
Contributor


Is there some way we can auto-detect the number of processors available?

Contributor Author


It could be done with a little bash, but we should probably try to improve the normal build process instead of diverging and hacking about in a Dockerfile.



I don't know if this should be multi-platform, but on linux you can go
make -j$(nproc)
https://unix.stackexchange.com/questions/208568/how-to-determine-the-maximum-number-to-pass-to-make-j-option

Member


I would also suggest using make -j$(nproc) because otherwise only a single core will be utilized for the build. Alternatively, set MAKEFLAGS="-j $(nproc)".


-j (if the project is huge, it will sometimes spawn too many jobs) and -j $(nproc) are both good. You need coreutils for that, but I would expect it to be dragged in with something like build-essential, or to already be there.
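As a quick sketch of the two options discussed above (assuming GNU coreutils' nproc is available):

```shell
# Number of available processors, from GNU coreutils:
nproc

# One-off parallel build:
# make -j"$(nproc)"

# Or set it once so later make invocations pick it up:
export MAKEFLAGS="-j $(nproc)"
```

Note that in a Dockerfile, ENV does not perform command substitution, so the $(nproc) expansion has to happen inside a RUN step.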

Contributor Author


Done!

ENV HOME /home/retdec

RUN apt-get -y update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential git bc graphviz upx cmake python zlib1g-dev flex bison libtinfo-dev autoconf pkg-config m4 libtool wget
Contributor


I would also use --no-install-recommends
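For example, the install step could become (a sketch; same package list as in this PR, plus the usual apt cache cleanup):

```dockerfile
RUN apt-get -y update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        build-essential git bc graphviz upx cmake python zlib1g-dev flex bison \
        libtinfo-dev autoconf pkg-config m4 libtool wget && \
    rm -rf /var/lib/apt/lists/*
```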

Contributor Author


Thanks. I'll eventually split this into a multi-stage build so the build tools won't need to be in the production image.
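A multi-stage split could eventually look something like this (a rough sketch; the install prefix is an assumption, the dependency list is abbreviated, and RetDec's scripts would also need their runtime packages installed in the final stage):

```dockerfile
# Build stage: carries the full toolchain.
FROM ubuntu:bionic AS build
RUN apt-get -y update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        build-essential git cmake python zlib1g-dev flex bison autoconf \
        pkg-config m4 libtool wget
RUN git clone --recursive https://github.com/avast-tl/retdec /retdec && \
    mkdir /retdec/build && cd /retdec/build && \
    cmake .. -DCMAKE_INSTALL_PREFIX=/opt/retdec-install && \
    make -j"$(nproc)" && make install

# Production stage: only the installed files, no build tools.
FROM ubuntu:bionic
COPY --from=build /opt/retdec-install /opt/retdec-install
ENV PATH="${PATH}:/opt/retdec-install/bin"
```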

DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential git bc graphviz upx cmake python zlib1g-dev flex bison libtinfo-dev autoconf pkg-config m4 libtool wget

USER retdec
RUN git clone --recursive https://github.com/avast-tl/retdec && \
Contributor


I would prefer to use COPY.
Could it be optional somehow? Maybe two Dockerfiles: one for local changes and one from upstream master?

Contributor Author


That's what I suggested.

Since it's not my use case, I'll let someone else create a "development" Dockerfile that they like.

@Manouchehri
Contributor Author

@s3rvac Any feedback from the Avast team? =)

cd retdec && \
mkdir build && \
cd build && \
cmake .. -DCMAKE_INSTALL_PREFIX=/home/retdec/retdec-install && \
Member


Shouldn't we install it into /home/retdec/install? The additional retdec- prefix seems redundant to me. Or is there a motivation behind the current path?

Contributor Author


The reason I use retdec-install is that if the end user adds more things to /home/retdec/, it can be ambiguous what an install/ folder refers to.

Not opposed to doing it either way though; which do you want me to use? =)

@aidanhs

aidanhs commented Dec 19, 2017

Ok - I get that having two dockerfiles might be useful. But in that case, I'd like to suggest renaming the file in this pr to Dockerfile.master or Dockerfile.prod or something, so when I create a PR for a COPY dockerfile it will be the default (as it's the one that people will most likely want to use if they already have the repo cloned).

On that note, it'd be cool to have some instructions for using this particular Dockerfile - for example, you will (when this PR is merged) be able to just curl -sSL https://raw.githubusercontent.com/avast-tl/retdec/master/Dockerfile.prod | docker build - to get the latest build without cloning.

@Manouchehri
Contributor Author

@aidanhs Dockerfiles must be named Dockerfile, otherwise they cannot be built on platforms like Docker Hub.

Once a functional Dockerfile like you're describing is written, we can switch to that. At the moment it's nonexistent, so it makes little sense to make this PR harder to use.

@HugoKlepsch
Contributor

@Manouchehri In that case, the path it lives under must make it obvious that it is for publishing only, not development, e.g. '/productionDockerfile/Dockerfile' or the like.

@Manouchehri
Contributor Author

Manouchehri commented Dec 19, 2017

Sure, but why do that if there's currently only one Dockerfile to begin with? If another Dockerfile is written, it should be added to a folder at that point. We're debating folder structure for files that don't exist.

@HugoKlepsch
Contributor

Honestly, if this patch were complete, it would have Dockerfiles for all common use cases (in our case, production and development).

It's better not to have to change READMEs, build scripts, etc. later when the other common-use-case Dockerfile is added.

@Manouchehri
Contributor Author

@HugoKlepsch Okay, go do that yourself then.

@aidanhs

aidanhs commented Dec 20, 2017

Dockerfiles must be named Dockerfile, otherwise they cannot be built on platforms like Docker Hub.

FWIW, this isn't true (though I think it used to be the case) - you can see that https://hub.docker.com/r/rustlang/crater/builds/bnftxfvygyjsakpah66cvyh/ uses the docker/Dockerfile.mini file from https://github.com/rust-lang-nursery/crater/tree/b3168a8/docker

@HugoKlepsch
Contributor

If that is the case, maybe we can have a Dockerfile and a Dockerfile.dev. I'm working on such a development Dockerfile in my off time now.

MatejKastak added a commit to MatejKastak/retdec that referenced this pull request Sep 13, 2018
- MOV, MVN and MOVZ instructions
- operand shift functions moved and changed for ARM64
- instructions like 'movz x0, #3 LSL 16' work now
MatejKastak added a commit to MatejKastak/retdec that referenced this pull request Sep 22, 2018
- Register parent map
- Storing registers
- Loading registers
- Headers

- Need more changes to conversions, I think 'mov w0, #3' zeroes out
  the upper 32bits of x0 register. But need to investigate further.
PeterMatula pushed a commit that referenced this pull request Mar 28, 2019
* Capstone2llvmirtool default basic modes for architectures

Run tool with reasonable Capstone basic modes for specified architecture.
Default values are as follows:
-a arm   : CS_MODE_ARM
-a arm64 : CS_MODE_ARM [looks like keystone doesn't like this]
-a mips  : CS_MODE_MIPS32
-a x86   : CS_MODE_32
-a ppc   : CS_MODE_32
-a <rest>: CS_MODE_LITTLE_ENDIAN

* Base for the ARM64 translator

- register maps(_reg2type)
- instructions map(_i2fm)
Modified ARM Translator unit, Work in progress.

* Fix the cs_reg_name

- register name could not be found because of the wrong cs_arch in constructor

* Add ARM64 support for capstone dependency

- capstone was configured without the ARM64 support, this caused
  cs_open to fail

* Temporary solution to call translate function

* Status register and program counter added to environment

- flags from status register added to arm64 env
- program counter added to arm64 env

* Methods store/load registers/operands skeletons + add instruction

- basic implementation of functions needed for loading and storing operands
- translateAdd is for testing purposes

* Store instruction base

- started implementation of MEM operand type
- Store register instruction translation method
e.g. retdec-capstone2llvmir -a arm64 -t 'str x0, [x1]'

* Operand shifts ported from ARM and MOV instruction translation

- MOV, MVN and MOVZ instructions
- operand shift functions moved and changed for ARM64
- instructions like 'movz x0, #3 LSL 16' work now

* Arm64 - tests ported from Arm

- test framework capstone2llvmirtranslator
- first INS_ADD test
- cmake compilation

* Basic MOV tests

- MOV, MOVZ

* Test for STR instruction and test header comments

* STP instruction + tests, pc in new enum, get op addr function

- Store pair instruction{pre-index, post-index, signed-offset}
- test for all cases except 32bit operands
- pc moved to its own enum
- generateGetOperandAddr to generate address from instruction operand

* LDR + STR, LDR tests from ARM, LDP stub

- LDR{pre-index, post-index, signed-offset} instruction implemented
- STR{pre-index, post-index, signed-offset} instruction implemented
- LDR tests ported from ARM
- LDP todo

* Implemented parent register handling

- Register parent map
- Storing registers
- Loading registers
- Headers

- Need more changes to conversions, I think 'mov w0, #3' zeroes out
  the upper 32bits of x0 register. But need to investigate further.

* LLVM data layout modified for ARM64

- taken from uname -a in qemu arm64 machine
Linux debian-aarch64 4.9.0-4-arm64 #1 SMP Debian
4.9.65-3+deb9u1 (2017-12-23) aarch64 GNU/Linux

* Removed useless debug output

* getCarryRegister for ARM64 fixed

* Store register ZEXT_TRUNC, 32bit tests baseline + tests

- when writing a value to a 32-bit register, the value is zero-extended
  to the whole 64-bit register
- parent register mapping enabled in tests
- 32bit version of tests

* Zero extension tests for ADD and MOV 32bit variants

* Implemented BL instruction

- added tests for label and imm branch

* Implemented RET instruction

- added tests

* Implemented LDP instruction

- added tests for instruction

* Implemented ADRP instruction

- real binary testing is needed
- without tests

* enable arm64 in decompiler.py and add arm64 architecture

in Architecture::setArch() ARM64 needs to be set before ARM
because "arm" from ARM matches the "arm aarch64" from ARM64

* Arm64 ABI implementation

* Arm64 decoder ported from Arm

* Arm64 imm operand shifts should not update flags by default.

- Added the option to switch this behaviour
- add one ADD test with shift

* Operand register extension generator + 64bit variant extension tests

- Arm supports the extension of operand e.g. 'add x0, x1, w2, SXTW'
  will sign-extend the w2 register to 64 bit and after that add the values
- test for 64bit variant implemented
- need to check the optional imm (shift VM outputs weird values)

* Arm64 Zero/Sign extension 32bit variant tests

* Implemented SUB instruction

- added tests for instruction

* Implemented BR instruction

- added tests for instruction

* Arm64 syscall id register is X8

* Specified call and return instruction ID for implemented instruction

- BL (Branch with Link) hints a function call
- RET hints a function return

* Fixed compilation after merge

- new methods added isOperandRegister, getOperandAccess
- loadOpTernaryop1op2 probably changed to loadOpBinaryOrTernaryOp1Op2
- made sure all unit tests passed
- TODO: implement new conventions from master

* Generate pseudoasm instruction when translation routine is not found

- Function to generate condition code

* Check preconditions in implemented arm64 instructions

* Changed register generation to match other modules.

* LDR instruction all 3 formats + tests

- register
- imm
- literal (label)

* Binaries can now be decompiled

- jumpTargetDryRun updated

* Generate condition codes for conditional instructions.

* ARM64: strb, strh instructions + tests

* Arm64: conditional and unconditional branch instruction + tests

- removed the generation of condition code in the translate-instruction function;
  this is not necessary because the condition is generated in the body of the
  given instruction, and arm64 only allows specific instructions to be conditional.

* Arm64: Instruction ret can have optional register operand + test

* Arm64: BLR instruction + test

* Arm64: CBNZ, CBZ instruction + test

* Arm64: TBNZ, TBZ implementation + tests

* Arm64: LDR different size variants, sign/zero extend + tests

* Arm64: LDPSW instruction + tests

- minor warning fix in STR instruction

* Arm64: ADC instruction + tests

- including flag setting for ADC and ADD instructions
- ADDS tests

* Arm64: ADCS 32bit tests for flags

* Arm64: ADR, ADRP instruction + tests

* Arm64: AND, ANDS instruction + tests

* Arm64: ASR instruction + tests

- ASRV variant

* Arm64: LSL, LSR, ROR instructions + tests

- all major shifts implemented

* Arm64: SUB, SBC flags + tests

- changed asserts to exceptions

* Arm64: CMP, CMN instructions + tests

* Arm64: CSEL instruction + tests

* Arm64: CSET, CSETM instruction + tests

* Arm64: MUL instruction + tests

* Arm64: MADD instruction + tests

- 32bit tests for MUL

* Arm64: MSUB instruction + tests

* Arm64: MNEG instruction + tests

* Arm64: NEG, NEGS instruction + tests

* Arm64: NGC, NGCS initial implementation + tests

- Check the carry flags + add tests

* Arm64: SDIV, UDIV instruction + tests

* Arm64: Fix correct semantics for SBC and NEG instructions

* Arm64: SMADDL, UMADDL instruction + tests

* Arm64: UMSUBL, SMSUBL instruction + tests

* Arm64: SMNEG, UMNEG instruction + tests

* Arm64: UMULL, SMULL, UMULH, SMULH instruction + tests

* Arm64: Conditional select operation instruction + tests

* Arm64: CINC, CINV, CNEG tests

* Arm64: EON, EOR instruction + tests

* Arm64: ORN, ORR instruction + tests

* Arm64: TST instruction + tests

- fixed the AND instruction to set carry and overflow flags to zero

* Arm64: EXTR instruction + tests

* Arm64: Extend instructions + tests

* Arm64: CCMN, CCMP instruction + tests

* Arm64: NOP instruction + tests

* Arm64: REV, RBIT, CLZ instructions + tests

* Arm64: BIC instruction + tests

* Arm64: Unprivileged loads/stores instructions + tests

* Arm64: Load/Store exclusive instructions + tests

* ARM64: LDAXR instruction variants + tests

* Arm64: LDAR instruction variants + tests

* Arm64, llvmir-emul: don't lower bitreverse intrinsic

- updated tests to check if the correct intrinsic functions was
  called

* Arm64: FP environment + basic unary and binary operations + tests

* Arm64: FMIN, FMINNM, FMAX, FMAXNM instruction + tests

* Arm64: FCMP, FCCMP, FCVT, {U, S}CVTF instructions + tests

* Arm64: FCVTZS, FCVTZU instructions + tests

- let's start testing

* Arm64, bin2llvmir: Decoder should not analyse stack.

* Arm64: MOVK instruction + tests

* Arm64: MOVN instructions + tests

* Merge master with arm-prep

* Architecture: Change arm architectures to account for arm64

-> isArmOrThumb renamed to isArm32OrThumb
-> added isArm32 method
-> thumb is now set with a flag _thumbFlag

* Architecture: Removed the wrong architecture types

Now the enum eArch represents only the general architecture, and all
subtypes are distinguished via getBitSize() or _thumbFlag.

The function isArm() returns true for every subarchitecture,
e.g. {arm32, arm64, thumb}

* Arm64: XZR loads zero and discards result when written

- Added some instruction IDs to branch types

* Arm64: STR and LDR instructions now determine correct register size

- For example 'str w0, [sp]' should store only 4bytes to stack pointer

* Arm64: Syscall optimization and detection

Replace svc #0 with corresponding syscall decoded from previous assignments.

* Arm64: MOVI instructions + tests, Vector and half register

Generate vector registers so that if a pseudo instruction with them
as operands is generated, we don't crash. For a similar purpose I
changed the f16 in ARM64_REG_H* to i16, since the half type is not
supported and we want to be able to at least generate pseudo instructions.

* Arm64: STR and LDR tests

Those tests target loading and storing floating point values.

* Arm64: Removed zero division semantics from llvmir

- Zero division is NOW undefined behaviour
- This caused problems in modulo idiom detection
- Also removed corresponding tests

* Arm64: FMOV instruction with immediate values

- Correctly handle imm values as operands of this instruction

* Revert "Arm64, bin2llvmir: Decoder should not analyse stack."

This reverts commit 7b88475.
This change caused other tests to fail.

* Arm64: Simplified and documented some code

- Removed unused code from decoder/arm64.cpp
- Fixed insnWrittesPcArm64 to work better
- Fixed Cond branch tests

* Arm64: Fixed documentation build
8 participants