NoahMauthe/decompilation_analysis

Decompilation Failure Analysis

The code in this repository was used to perform the large-scale analysis of Android applications presented in the paper "A Large-Scale Empirical Study of Android App Decompilation".

Build

Our work is split across multiple repositories to make it easier to manage. If you simply want to recompute our results or perform your own study using our tools, we provide a Singularity container definition file that can be built with just two commands:

wget https://raw.githubusercontent.com/NoahMauthe/decompilation_analysis/master/decompilation_analysis.def
singularity build decompilation_analysis.sif decompilation_analysis.def

(Note that the container was built with Singularity 3.5 and may not work with later versions.)

To establish the dataset we ran our tools on, we created a crawler that can retrieve APK files from F-Droid and, when given credentials, from the Google Play Store. Because we wanted the crawler to be as extensible as possible, the crawler itself is just a simple script, and everything store-specific is implemented in our APIs.

Since the crawler was not the main focus of the work, the Google Play API we created is a heavily modified version of https://github.com/NoMore201/googleplay-api, adapted to accommodate the needs of our analysis.
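The split between a simple crawler script and store-specific APIs could look roughly like the sketch below. It is an illustration only, not the code from our crawler: `StoreAPI`, `InMemoryStore`, and `crawl` are hypothetical names, and the in-memory store stands in for a real F-Droid or Google Play client.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class StoreAPI(ABC):
    """Hypothetical interface that hides everything store-specific."""

    @abstractmethod
    def list_packages(self) -> Iterable[str]:
        """Return the package names this store offers."""

    @abstractmethod
    def download(self, package: str) -> bytes:
        """Return the raw APK bytes for one package."""


class InMemoryStore(StoreAPI):
    """Stand-in for a real store client, used only for this example."""

    def __init__(self, apps: Dict[str, bytes]) -> None:
        self._apps = dict(apps)

    def list_packages(self) -> Iterable[str]:
        return sorted(self._apps)

    def download(self, package: str) -> bytes:
        return self._apps[package]


def crawl(stores: List[StoreAPI]) -> Dict[str, bytes]:
    """Store-agnostic crawler loop: the first store that offers a package wins."""
    collected: Dict[str, bytes] = {}
    for store in stores:
        for package in store.list_packages():
            if package not in collected:
                collected[package] = store.download(package)
    return collected
```

With this shape, supporting another store would only mean adding another `StoreAPI` subclass; the crawler loop itself stays unchanged.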

Decompilation analysis

Our analysis tool, contained in this repository, runs four different decompilers on each application and can also check for the presence of certain packers that might hinder decompilation.

Additionally, we match the failures reported by the decompilers against each other to determine whether some methods failed to decompile with all of them.
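Conceptually, this matching is a set intersection over the failed method signatures reported by each decompiler. The sketch below is a minimal illustration under that assumption, not the implementation in this repository: `match_failures`, the signature strings, and the placeholder names `decompiler_3` and `decompiler_4` are hypothetical.

```python
from typing import Dict, Set


def match_failures(failures: Dict[str, Set[str]]) -> Set[str]:
    """Return the method signatures that failed in every decompiler."""
    if not failures:
        return set()
    per_tool = iter(failures.values())
    common = set(next(per_tool))  # copy so the caller's data is not modified
    for failed in per_tool:
        common &= failed
    return common


# Hypothetical example data: failed method signatures per decompiler.
reported = {
    "cfr": {"Lcom/example/Foo;->bar()V", "Lcom/example/Foo;->baz(I)I"},
    "procyon": {"Lcom/example/Foo;->bar()V"},
    "decompiler_3": {"Lcom/example/Foo;->bar()V", "Lcom/example/Qux;->run()V"},
    "decompiler_4": {"Lcom/example/Foo;->bar()V", "Lcom/example/Foo;->baz(I)I"},
}

failed_everywhere = match_failures(reported)
```

In this example only `Lcom/example/Foo;->bar()V` appears in all four sets, so it is the only method counted as failing with every decompiler.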

Tools used

In order to achieve our analysis goal, we relied on a number of open-source tools:

Decompilers

As mentioned before, we used four different decompilers:

For CFR and Procyon, we created our own versions that keep all of the functionality but change the output so that it is easier to process automatically. They can be found at https://github.com/NoahMauthe/cfr and https://github.com/NoahMauthe/procyon.

Dex-tools

Additionally, we employed multiple dex-tools for various reasons:

  • APKiD - A fingerprinting tool that checks for the presence of packers.
  • apkanalyzer - Part of the Android command-line tools, it allowed us to extract method sizes and signatures.
  • dex2jar - As the name implies, dex2jar is a conversion tool; it enabled the use of our three Java decompilers.

Dataset

The raw data (one CSV file per app) is about 6.5 GB compressed (108 GB uncompressed) and is currently available only on request. The set of crawled apps (F-Droid and Google Play) can also be made available to researchers.

Notebooks and dataframes

The directory notebooks contains Jupyter notebooks for analyzing the dataset. gen_dataframes.ipynb is used for preparing Pandas dataframes from the dataset, while extract_stats.ipynb is used for the actual data analysis and generating plots. Pre-generated dataframes are also provided in this directory.
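As a rough idea of the kind of aggregation involved, the sketch below computes a failure rate per decompiler from one per-app CSV using only the standard library. The column names `decompiler` and `failed` and the function `failure_rates` are assumptions made for illustration; the actual CSV layout and the notebook code may differ.

```python
import csv
import io
from collections import Counter
from typing import Dict


def failure_rates(csv_text: str) -> Dict[str, float]:
    """Compute the fraction of failed methods per decompiler.

    Assumes (hypothetically) one row per method and decompiler, with a
    'decompiler' column and a 'failed' column holding '1' or '0'.
    """
    total: Counter = Counter()
    failed: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        total[row["decompiler"]] += 1
        if row["failed"] == "1":
            failed[row["decompiler"]] += 1
    return {tool: failed[tool] / total[tool] for tool in total}


# Hypothetical sample in the assumed layout.
sample = """decompiler,method,failed
cfr,Lcom/example/Foo;->bar()V,1
cfr,Lcom/example/Foo;->baz(I)I,0
procyon,Lcom/example/Foo;->bar()V,0
procyon,Lcom/example/Foo;->baz(I)I,0
"""

rates = failure_rates(sample)
```

For the sample above, `rates` maps `cfr` to 0.5 and `procyon` to 0.0.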
