This repository contains artifacts for An Infrastructure Approach to Improving Effectiveness of Android UI Testing Tools, published at ISSTA 2021.

The name of TOLLER comes from here.


We assume that you are using a Unix-like environment throughout this guide. All experiments were conducted on Ubuntu 16.04.

Source code and integration instructions of TOLLER

See on-device-agent/ for the source code of TOLLER's on-device agent. See here for integration instructions. To get started quickly, consider using our prebuilt Android emulator image below.

Android emulator used in our experiments

See emulator/ for more details.


See TOLLER Usages for more details. A trace recorder that works with TOLLER is available here (note that it's not used in the TOLLER paper).

Useful scripts from our experiments

See useful-scripts/. The following utilities must be installed:

  • aapt (from Android SDK build tools)
  • adb (from Android SDK platform tools)
  • md5sum
  • timeout
  • Python 2 & 3
  • GNU screen
  • ts (apt install moreutils)
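Before running the scripts, it can help to confirm that all of the above are on your `PATH`. A minimal sketch (tool names follow the list above; the binary names, e.g. `python2`/`python3`, may differ on your distribution):

```shell
#!/bin/sh
# Check which of the required utilities are available on PATH.
missing=0
for tool in aapt adb md5sum timeout python2 python3 screen ts; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "MISSING: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing tool(s) missing"
```

Note that `aapt` and `adb` usually live under the Android SDK's `build-tools/` and `platform-tools/` directories, which may need to be added to `PATH` manually.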

To start testing, refer to the following commands:

$ cd useful-scripts
$ export ANDROID_SERIAL="xxx" # Set this if you have multiple devices connected

APP_ID, RUN_ID, and DEV_ID are chosen by you. Make sure that useful-scripts/run-all-{TOOL_ID}.sh exists; TOOL_ID follows the format {TOOL_NAME}-{VARIANT} (e.g., chimp-original). You can optionally specify an auto-login Python script via OPTIONAL_LOGIN_PY_SCRIPT, which is executed on the host computer after the app launches and before testing starts.

After running the command, you should see a folder named test-logs in the root directory of this repo. Within it, a new folder named {TOOL_ID}-{APP_ID}-{RUN_ID} holds all logs for that experiment.

Original and modified versions of testing tools

See test-tools/.

Installation packages of apps involved in experiments

Available here.

Raw experiment data

See here for the raw experimental data. Each bzip2-compressed tarball corresponds to one run, with a filename of the form {TOOL_NAME}-{APP_NAME}-{RUN_ID}.tar.bz2.

  • There are 12 different values for TOOL_NAME, corresponding to the 12 tool versions shown in our paper. Tool names ending with enhanced correspond to the TOLLER-enhanced versions.
  • There are 15 different values for APP_NAME, corresponding to the 15 apps that we use for evaluation.
  • There are 4 different values for RUN_ID: prof, 1, 2, and 3. prof indicates that the run is profiled to collect time usage statistics, while the rest correspond to the three runs for each pair of tool variant and app.
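Given this naming scheme, the set of tarballs for one tool/app pair can be enumerated mechanically. A sketch using placeholder values (monkey-enhanced and exampleapp are hypothetical, not actual names from the dataset):

```shell
#!/bin/sh
# Enumerate the expected tarball names for one tool/app pair.
# TOOL_NAME and APP_NAME below are placeholders for illustration only.
TOOL_NAME="monkey-enhanced"
APP_NAME="exampleapp"
names=$(for RUN_ID in prof 1 2 3; do
  echo "${TOOL_NAME}-${APP_NAME}-${RUN_ID}.tar.bz2"
done)
echo "$names"
```

This yields four filenames per pair: one profiled run plus the three regular runs.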

Each tarball contains the following files:

  • crash.log: Android logcat entries corresponding to the app's crash logs.
  • tool.log: Logs from the testing tool.
  • minitrace/cov-{TIMESTAMP}.log: Coverage information from MiniTrace throughout the test run, collected periodically.
  • adb-timer.log: All ADB commands invoked by the testing tool along with their time usages in milliseconds.
  • logcat-toller.log: All logs produced by TOLLER, which include time usages for UI capturing and event execution.
  • python-timer.log: Python invocations made by the testing tool along with their time usages in milliseconds. Present only for Stoat variants, where it measures the end-to-end time usage of event execution (Stoat wraps all event-execution logistics in Python scripts and calls them during testing).
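Since the timer logs record per-command time usages, aggregating them is a one-liner. A sketch under an unverified assumption: that the millisecond figure is the last whitespace-separated field on each line of adb-timer.log (check this against your own logs first; the two sample lines below are made up and stand in for a real log):

```shell
#!/bin/sh
# Sum per-command time usages from an adb-timer.log.
# ASSUMPTION: each line ends with the time in milliseconds as its
# last whitespace-separated field.
total=$(printf 'adb shell input tap 100 200 35\nadb pull /sdcard/cov.log 120\n' \
  | awk '{ sum += $NF } END { print sum + 0 }')
echo "total ADB time: ${total} ms"
# For a real log: awk '{ sum += $NF } END { print sum }' adb-timer.log
```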


Please direct your questions to Wenyu Wang.