ReproDroid is a framework for creating, refining, and executing reproducible benchmarks for Android app analysis tools.
!!! Update can be found in the Errata section below !!!
The complete ReproDroid framework consists of BREW and its underlying AQL-System, which uses the Android App Analysis Query Language (AQL). The picture below summarizes how the framework works. BREW takes a set of apps or a complete benchmark as input and issues one AQL-Query per benchmark case. The queries arrive one after another at an AQL-System, which produces one AQL-Answer per query using the analysis tools specified in BREW's configuration file. BREW gathers all AQL-Answers and, based on them, generates a final report, e.g. for a benchmark.
The tools and results presented in the paper proposing ReproDroid can be downloaded for inspection here. To work with the framework, we suggest downloading the up-to-date version of BREW. The underlying AQL-System is also available in a newer version.
The Benchmark Refinement and Execution Wizard (BREW) has been used to refine benchmarks and to determine the associated results. Two versions are available for download:
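
The workflow described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual API of BREW or the AQL-System: `BenchmarkCase`, `make_query`, `run_benchmark`, and the toy `fake_aql_system` are hypothetical names introduced for this sketch, the query string is a placeholder rather than real AQL syntax, and counting true/false positives and negatives is only an assumption about what a final report might contain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BenchmarkCase:
    """Hypothetical stand-in for one benchmark case (not a BREW class)."""
    apk: str
    expected: bool  # True: positive / expected case


def make_query(case: BenchmarkCase) -> str:
    # Placeholder text only; this is NOT real AQL syntax.
    return f"query-for:{case.apk}"


def run_benchmark(
    cases: List[BenchmarkCase],
    aql_system: Callable[[str], bool],
) -> Dict[str, int]:
    """Issue one query per case, gather the answers, and summarize them."""
    report = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for case in cases:
        found = aql_system(make_query(case))  # one answer per query
        if case.expected:
            report["TP" if found else "FN"] += 1
        else:
            report["FP" if found else "TN"] += 1
    return report


def fake_aql_system(query: str) -> bool:
    # Toy analysis tool: "finds" a flow whenever the app name contains "Leak".
    return "Leak" in query


cases = [
    BenchmarkCase("Leak1.apk", expected=True),
    BenchmarkCase("Leak2.apk", expected=False),
    BenchmarkCase("Clean1.apk", expected=True),
    BenchmarkCase("Clean2.apk", expected=False),
]
report = run_benchmark(cases, fake_aql_system)
print(report)
```

Passing the analysis step in as a callable mirrors the idea that the tools behind the AQL-System are configured separately from the benchmark itself.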
- BREW-Website or Github-Project (Up-to-date versions)
- BREW (Version used for the computation of the results below)
A tutorial on how to fully load ReproDroid benchmark results can be found here.
Documentation of the Android App Analysis Query Language (AQL) and of the AQL-System that uses it is also available online:
- AQL-System-Website or Github-Project (Up-to-date versions)
- AQL-System (Version used by BREW to compute the results below)
Neither BREW nor the AQL-System contains any of the six evaluated tools. How to set up a configuration file in order to use a tool is explained in this tutorial. The six evaluated tools can be downloaded from their associated websites:
- Amandroid: https://bintray.com/arguslab/maven/argus-saf/3.1.2
- DIALDroid: https://github.com/dialdroid-android/DIALDroid
- DidFail: https://www.cert.org/secure-coding/tools/didfail.cfm
- DroidSafe: https://mit-pac.github.io/droidsafe-src
- FlowDroid: https://github.com/secure-software-engineering/soot-infoflow-android/wiki
- IccTA: https://sites.google.com/site/icctawebpage/source-and-usage
All results determined with ReproDroid can be found in this section.
The refined versions of DroidBench 2.0 and 3.0 as well as the extended DroidBench version can be downloaded here. Every download includes:
- Benchmark
  - BREW benchmark file (.ser file)
  - Benchmark Cases (.apk files)
  - Groundtruth (Expected results in AQL format: .xml)
  - Source Code (Eclipse/Android Studio project directories/archives)
- Results
  - BREW benchmark file including result-summary (data/data.ser)
  - AQL-Answers per app (data/storage/*.xml)
  - Expected and actual AQL-Answers per benchmark case (output)
  - Logfile (log.txt)
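
Once a results archive is unpacked, the per-app AQL-Answers can be enumerated with a few lines of Python. The sketch below assumes only the `data/storage/*.xml` layout listed above; the function name `list_answer_files` and the directory passed to it are hypothetical, and nothing is assumed about the XML schema beyond well-formedness.

```python
from pathlib import Path
from typing import List, Tuple
import xml.etree.ElementTree as ET


def list_answer_files(results_root: str) -> List[Tuple[str, str]]:
    """Return (file name, root tag) for every XML file in data/storage."""
    storage = Path(results_root) / "data" / "storage"
    found = []
    for xml_file in sorted(storage.glob("*.xml")):
        root = ET.parse(xml_file).getroot()  # raises ParseError if malformed
        found.append((xml_file.name, root.tag))
    return found
```

Calling `list_answer_files` with the directory of an unpacked results download gives a quick check that every stored answer is present and parses as XML.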
The Feature-Checking and Intent-Matching benchmark extensions can be downloaded here. Both are available for Android API 19 and 26. Every download includes:
- Benchmark
  - BREW benchmark file (.ser file)
  - Benchmark Cases (.apk files)
  - Groundtruth (Expected results in AQL format: .xml)
  - Source Code (Android Studio project directories/archives)
- Results
  - BREW benchmark file including result-summary (data/data.ser)
  - AQL-Answers per app (data/storage/*.xml)
  - Expected and actual AQL-Answers per benchmark case (output)
  - Logfile (log.txt)
- Feature-Checking (API 19)
- Feature-Checking (API 26)
- Intent-Matching (API 19)
- Intent-Matching (API 26)
The refined version of ICC-Bench 2.0 can be downloaded here. It includes:
- Benchmark
  - BREW benchmark file (.ser file)
  - Benchmark Cases (.apk files)
  - Groundtruth (Expected results in AQL format: .xml)
  - Source Code (Project directories)
- Results
  - BREW benchmark file including result-summary (data/data.ser)
  - AQL-Answers per app (data/storage/*.xml)
  - Expected and actual AQL-Answers per benchmark case (output)
  - Logfile (log.txt)
The iteratively refined version of DIALDroidBench can be downloaded here. It includes:
- Benchmark
  - BREW benchmark file (.ser file)
  - Benchmark Cases (.apk files)
  - Groundtruth as far as known (Expected results in AQL format: .xml)
  - Source Code (Decompiled .apks)
- Results
  - BREW benchmark file including result-summary (data/data.ser)
  - AQL-Answers per app (data/storage/*.xml)
All benchmarks above that are based on DroidBench contain four mislabeled benchmark cases:
Category | Benchmark Case | Wrong Label | Correct Label |
---|---|---|---|
Aliasing | SimpleAliasing1 | Negative / Not-Expected Case | Positive / Expected Case |
UnreachableCode | UnreachableBoth | Positive / Expected Case | Negative / Not-Expected Case |
UnreachableCode | UnreachableSink1 | Positive / Expected Case | Negative / Not-Expected Case |
UnreachableCode | UnreachableSource1 | Positive / Expected Case | Negative / Not-Expected Case |
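
Anyone reusing ground truth from the affected downloads may want to patch these four labels before computing results. A minimal sketch, assuming the ground truth has been loaded into a plain dictionary that maps a benchmark case name to `True` (positive / expected) or `False` (negative / not expected); this dictionary, the sample entry `OtherCase`, and the names `CORRECT_LABELS` and `apply_errata` are hypothetical and not part of BREW.

```python
from typing import Dict

# Correct labels taken from the errata table above.
# True = positive / expected case, False = negative / not-expected case.
CORRECT_LABELS: Dict[str, bool] = {
    "SimpleAliasing1": True,
    "UnreachableBoth": False,
    "UnreachableSink1": False,
    "UnreachableSource1": False,
}


def apply_errata(labels: Dict[str, bool]) -> Dict[str, bool]:
    """Return a copy of `labels` with the four mislabeled cases corrected."""
    fixed = dict(labels)
    for case, label in CORRECT_LABELS.items():
        if case in fixed:  # leave unrelated benchmarks untouched
            fixed[case] = label
    return fixed


# Hypothetical input using the old (wrong) labels.
old = {"SimpleAliasing1": False, "UnreachableBoth": True, "OtherCase": True}
print(apply_errata(old))
```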
Furthermore, the results for the category Reflection were reported incorrectly: due to a simple (sub-)string matching mistake, the filter also counted the cases of the category Reflection_ICC as belonging to the category Reflection.
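
The kind of mistake described here is easy to reproduce. The following sketch is not BREW's actual filter code; the function names and the sample case list are hypothetical and only illustrate how a substring test pulls Reflection_ICC cases into Reflection, and how an exact comparison avoids it.

```python
from typing import List, Tuple

# (category, case name); case names are made up for illustration.
CASES: List[Tuple[str, str]] = [
    ("Reflection", "CaseA"),
    ("Reflection", "CaseB"),
    ("Reflection_ICC", "CaseC"),
    ("Aliasing", "CaseD"),
]


def filter_by_substring(cases: List[Tuple[str, str]], category: str) -> List[str]:
    # Buggy: "Reflection" is a substring of "Reflection_ICC".
    return [name for cat, name in cases if category in cat]


def filter_exact(cases: List[Tuple[str, str]], category: str) -> List[str]:
    # Fixed: only cases whose category matches exactly.
    return [name for cat, name in cases if cat == category]


print(filter_by_substring(CASES, "Reflection"))  # includes CaseC by mistake
print(filter_exact(CASES, "Reflection"))
```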
(The results for most benchmarks and all tools above will be re-evaluated and published here as soon as possible, although this might still take a while.)
Here you can find an updated version of the DroidBench 3.0 benchmark (DroidBench website) and the new TaintBench benchmark (TaintBench website). Opening them requires BREW version 2.0.0 or newer. These are the two benchmarks we recommend for evaluating your Android taint analysis tool.
Download
- Do Android Taint Analysis Tools Keep Their Promises? (Felix Pauck, Eric Bodden, Heike Wehrheim) ESEC/FSE 2018 https://dl.acm.org/citation.cfm?id=3236029
- Together Strong: Cooperative Android App Analysis (Felix Pauck, Heike Wehrheim) ESEC/FSE 2019 https://dl.acm.org/citation.cfm?id=3338915
- TaintBench: Automatic real-world malware benchmarking of Android taint analyses (Linghui Luo, Felix Pauck, ...) EMSE 2022 https://link.springer.com/article/10.1007%2Fs10664-021-10013-5
Felix Pauck (FoelliX)
Paderborn University
fpauck@mail.uni-paderborn.de
http://www.FelixPauck.de