DATA - Differential Address Trace Analysis
DATA is a dynamic side-channel analysis framework based on differential address traces. For installation and usage instructions see USAGE.md.
Side-channel attacks infer sensitive information from computing devices by monitoring supposedly benign properties like execution time or power consumption. Similar side-information can also be obtained from memory hierarchies and microarchitectures of modern processors. TLBs, caches, and branch prediction units, for instance, have all been targets of so-called microarchitectural attacks. These attacks typically exploit the persistent state of shared processor resources and can be launched without physical access to a target device. Microarchitectural attacks bypass important isolation mechanisms employed by operating systems or hypervisors and consequently put critical applications at risk. Dedicated countermeasures, however, are deployed only slowly or are often omitted completely in practice.
This is why we developed DATA, a differential address trace analysis framework. In simple terms, DATA is an analysis tool for detecting data and code accesses that can potentially be exploited by microarchitectural attacks. More precisely, DATA reveals address-based side-channel leaks, which account for attacks exploiting caches, TLBs, branch prediction, control channels, and the like. To detect leaks in a program under test, DATA instruments its binary while it processes secret information. If any data or code accesses exhibit relations to the processed secret, DATA reports them with instruction-level granularity. The relations are determined with statistical tests and classified according to the information an adversary could gain from them.
DATA is intended to be a companion during testing and verification of security-critical software. It helps to improve code quality by detecting information leaks early in the development process. Given the modest deployment of countermeasures against microarchitectural attacks in practice and the continuing discovery of new attacks, DATA constitutes an important part of the defense strategy of security- and privacy-critical applications.
For more information on microarchitectural attacks, the survey by Ge et al. is a good starting point.
DATA works in three consecutive phases.
Phase 1: In the difference detection phase, the program under test is executed multiple times with different secret inputs, and the addresses of all code and data accesses are written to so-called address traces. A trace comparison algorithm then compares the traces pairwise and reports differing code and data accesses. This is somewhat similar to comparing multiple text files with diff.
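The diff analogy can be illustrated with a minimal Python sketch. It is not DATA's trace comparison algorithm: it assumes a trace is a flat list of integer addresses and uses the standard library's `difflib.SequenceMatcher` as a stand-in. The function name `diff_traces` and the example addresses are hypothetical.

```python
from difflib import SequenceMatcher


def diff_traces(trace_a, trace_b):
    """Return the regions where two address traces disagree.

    Each trace is assumed to be a flat list of integer addresses.
    """
    # autojunk=False keeps frequently repeated addresses from being ignored.
    matcher = SequenceMatcher(a=trace_a, b=trace_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (tag, trace_a[a_start:a_end], trace_b[b_start:b_end])
            )
    return differences


# Two hypothetical traces that take a different path at the third access.
trace_a = [0x401000, 0x401004, 0x401010, 0x401020]
trace_b = [0x401000, 0x401004, 0x401018, 0x401020]

for tag, part_a, part_b in diff_traces(trace_a, trace_b):
    print(tag, [hex(x) for x in part_a], [hex(x) for x in part_b])
```

Every reported region is only a candidate: it shows that the two executions differ, not yet that the difference depends on the secret.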
Phase 2: In the leakage detection phase, the control-flow and data differences reported by phase one are tested for dependencies on the secret input. This is done using a statistical test that compares the address traces of (i) a fixed secret input and (ii) random secret inputs. If the traces, and thus the execution, differ significantly, the corresponding control-flow and data differences are labeled as information leaks.
Phase 3: In the leakage classification phase, the leaks reported by phase two are tested for the type and amount of information that is leaked. This is achieved with a statistical test that finds linear and non-linear relations between the secret input and the address traces. The test is based on a so-called leakage model provided by the analyst. Common leakage models are Hamming weight, which reduces an input to its number of 1-bits, and input slicing, which tests individual bits and bytes of the input for relations.
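The two leakage models named above can be written down in a few lines of Python. This is a sketch under the assumption that the secret input is available as a `bytes` object; the function names are hypothetical and not taken from DATA's code base.

```python
def hamming_weight(secret):
    """Hamming weight model: reduce the secret to its number of 1-bits."""
    return sum(bin(byte).count("1") for byte in secret)


def bit_slices(secret):
    """Input slicing at bit granularity, most significant bit first."""
    return [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]


def byte_slices(secret):
    """Input slicing at byte granularity."""
    return list(secret)


secret = bytes([0x0F, 0x01])
print(hamming_weight(secret))   # number of 1-bits in the secret
print(bit_slices(secret))       # one value per secret bit
print(byte_slices(secret))      # one value per secret byte
```

Each model maps a secret to one or more numbers, and it is these numbers that the classification phase relates to the observed addresses.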
Phase 1: The address traces are generated by monitoring the program under test with a dynamic binary instrumentation framework. In the current implementation, we use Intel Pin with a custom Pintool that dumps the addresses of all executed instructions and all accessed operands. It preserves the call hierarchy and implements basic tracking of dynamic memory objects.
Phase 2: The detection of information leaks is done with Kuiper's test. This metric is closely related to the Kolmogorov-Smirnov statistic, a non-parametric equality test of probability distributions. In DATA, Kuiper's test is used to compare the address distributions observed at a given control-flow or data difference. If these distributions are distinct for different input sets, we conclude that the tested difference leaks some information about the secret input. Since the amount of leakage (e.g., in bits) is unspecified, we say the test is a generic leakage test.
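To make the statistic concrete, the following Python sketch computes the two-sample Kuiper statistic, i.e. the largest positive deviation plus the largest negative deviation between two empirical distribution functions. It is an illustration, not DATA's implementation: the function name and sample values are hypothetical, and turning the statistic into a leak decision additionally requires a significance threshold, which is not shown here.

```python
from bisect import bisect_right


def kuiper_statistic(sample_a, sample_b):
    """Two-sample Kuiper statistic V = D+ + D-.

    D+ is the largest amount by which the empirical CDF of sample_a
    exceeds that of sample_b, and D- the largest amount in the other
    direction.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_plus = 0.0
    d_minus = 0.0
    # The empirical CDFs only change at observed values.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d_plus = max(d_plus, cdf_a - cdf_b)
        d_minus = max(d_minus, cdf_b - cdf_a)
    return d_plus + d_minus


# Hypothetical addresses observed at one difference for the two input sets.
fixed_input_addresses = [0x1000, 0x1000, 0x1000, 0x1000]
random_input_addresses = [0x1000, 0x1040, 0x1080, 0x10C0]

print(kuiper_statistic(fixed_input_addresses, random_input_addresses))
```

A value of zero means the two samples have identical empirical distributions; larger values indicate that the observed addresses depend on which input set was used.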
A common characteristic of such fixed-vs-random leakage tests is that the choice of the fixed input affects the test results. A particularly poor choice might cause the test to miss some leaks. This is why the statistical test is repeated for multiple fixed inputs.
Phase 3: To further quantify the information an adversary can obtain from observing a leak, a specific leakage test is added. The idea behind the test is to determine any relation between the secret input and the observed addresses at a given control-flow or data leak. This is done with a bivariate relationship test, for which we use the Randomized Dependence Coefficient (RDC). Similar to Mutual Information metrics, it captures any linear or non-linear relation between the observations of two random variables.
All contributing authors are listed in AUTHORS.md.
DATA is under active development and we always welcome feedback of any kind -- from bug reports to new ideas. Future versions of DATA will provide improved performance and usability. If you want to use DATA commercially, we offer appropriate licenses upon request. Feel free to send any inquiries directly to us.
DATA has been successfully applied to test the following libraries:
- OpenSSL: RSA, DSA, ECDSA, Ed25519, and various symmetric algorithms
- LibreSSL: DSA, ECDSA
- BoringSSL: DSA, ECDSA
- PyCrypto: AES, DES, Blowfish
If you want to add test wrappers for a library not listed above, we're happy to take pull requests.
The version history of DATA and all corresponding changes can be found here.
Samuel Weiser, Andreas Zankl, Raphael Spreitzer, Katja Miller, Stefan Mangard, and Georg Sigl. DATA – Differential Address Trace Analysis: Finding Address-based Side-Channels in Binaries. In Usenix Security 2018. Cite the paper.
Samuel Weiser, David Schrammel, Lukas Bodner, and Raphael Spreitzer. Big Numbers -- Big Troubles: Systematically Analyzing Nonce Leakage in (EC)DSA Implementations. In press at Usenix Security 2020. See more material including reports under papers/ecdsa.