Skip to content
Find file
Fetching contributors…
Cannot retrieve contributors at this time
136 lines (104 sloc) 10.5 KB
\title{Related Work -- Concurrency, Systematic Testing, Operating Systems, Verification}
\author{Ben Blum}
\abstract{This document is intended to be a collection of summaries of publications and other related work I read in the course of my research.}
\subsection{Systematic Exploration}
\item Landslide \cite{Landslide} is my MS thesis. I wrote it and Garth advised.
\subsection{Symbolic Execution}
\item S$^2$E \cite{s2e} is a platform for selective symbolic execution. They identify a domain (set of modules of the program) for symbolic testing, such as a library, and straight-line execute everything else, such as the app and kernel. Their tool combines virtualisation, dynamic binary translation, operates directly on binaries ``in vivo'', and is freely downloadable.
\item SymDrive \cite{symdrive} is a mechanism that focuses on kernel device drivers, primarily to address the challenge of not being able to test code for esoteric devices for lack of testing hardware. It is built on top of S$^2$E, described above.
The ability to test device drivers comes from using symbolic data as input from the devices (though it's not clear to what extent this covers all possible device behaviours, since they admit later to not being able to anticipate certain esoteric patterns).
Their approach is solidly in the ``dynamic informal verification'' camp. Testing is largely driven by short, pluggable checkers, which seem to average 11 lines long among their 49 supplied ones. One that particularly interests me is the execution context checker, designed for precisely the Atomic All-Nighters bug. However, (a) they are vague about what is illegal besides {\tt GFP\_ATOMIC} vs {\tt GFP\_KERNEL} wrt {\tt kmalloc}, and (b) the check appears to be entirely dynamic, using a stack to track execution contexts each time a spinlock is taken, which means it can't possibly be more powerful than run-time asserts in Linux's scheduler.
\subsection{Invariant Analysis}
\item AVIO~\cite{avio} is a framework designed to detect high-level atomicity violations, eschewing data race detection. Their motivation section (a considerable chunk of the paper) makes a big deal of the ``Helgrind-clean does not mean correct'' problem, and they also make another good point: that in transaction-based models, data races will be even less of an issue.
Their analysis is implemented two ways, one way in hardware (extending the cache coherency protocol), less accurate but minimal overhead, and one way in software, more accurate with ``moderate'' overhead. It's built on a training-data model; AVIO starts by computing ``access interleaving invariants'' by analysing known-correct executions.
Extracting invariants from correct runs is done by first identifying all shared memory locations as ``invariants'', then pruning out locations that (they suppose) don't actually identify invariants by finding reordered accesses around them during known-good runs. They don't mention this, but I think this is hugely limiting in two ways: (1) it might introduce false negatives (e.g. if their training set doesn't have enough interleavings in it), and (2) is necessarily incompatible with interleavings which depend on multiple locations interacting.
Overall I was expecting from this paper a principled approach to reasoning about ``more general'' race conditions than just data races (given their expansive motivation section), but actually they've just identified another, disjoint, subset of races and built a specialised tool for finding just those.
\section{Static Analysis}
\subsection{Constraint Solving}
\item {\sc Kint} \cite{kint} is a framework for finding integer errors (overflow, divide-by-zero, shifting, truncation, sign misinterpretation) in C. Their related work section is at the beginning. They have an enormous table of bugs that were found, with columns denoting e.g. impact (OOB write, DoS, etc), attack vector (userspace, disk, etc), and how many times people had previously tried but failed to fix the bug.
Garth noted that most of {\sc Kint}'s techniques are all well-known static analysis techniques -- taint flow, bounds checks insertions, etc., and the contribution is more in how they are put together.
Their other novel contribution (which I think is cooler, in the ``change the way people program'' sense) is the NaN integer semantics.
Basically, once an integer overflows (or something else bad(?)) it gains ``NaN taint'' (in other words, the NaN value is ``sticky'').
The implementation (currently only for unsigned ints) reserves {\tt UINT\_MAX} for the NaN value (which I worry about) and uses x86's {\tt jno} instruction, for minimal overhead.
\subsection{Formal Verification}
\item {\sc Quark} \cite{quark} is a web browser with formally verified security guarantees. The technique, ``formal shim verification'', involves writing a small component of the browser, the ``kernel'', in a language which can be formally verified (in their case Coq), and reasoning about the properties it enforces about the rest of the codebase.
{\sc Quark} uses a message-passing system for tabs, which are separate processes running a hacked WebKit, to request and receive from the kernel, with messages such as {\tt GetUrl} and so on. The properties they verify are (1) tab non-interference (sort of a frame rule), (2) cookie integrity and confidentiality (which is stated based on a notion of each tab being authorised for only one particular domain, which forbids xss-type stuff), and (3) address bar integrity (the text in the URL bar never lies).
Because Coq is non-turing-complete (and isn't great for ``reactive'' nonterminating programs like browsers), they use an I/O+nontermination library for Coq called Ynot, which I think uses trusted unsafe code under the hood. In the end section, they talk about what comprises their ``Trusted Code Base'' (incl. Ynot, the OS, the compiler, etc), and also mention some security limitations that their model doesn't/can't provide.
I like this paper quite a bit.
\subsection{Replication and Redundancy}
\item Eve \cite{eve} is a mechanism for state machine replication in multicore servers. I have a hard time telling what their actual mechanism is, but they do away with the deterministic record-replay approach largely for performance reasons. They claim to provide system liveness despite the existence of some given number of concurrency faults.
\subsection{Bug Fixing}
\item CFix \cite{cfix} uses existing concurrency bug detectors to obtain bad traces, and generates patches to fix those bugs. Its fix strategies are mutual exclusion enforcement (for which they use AFix, an existing tool) and order enforcement (for which they contribute OFix). They also merge patches and select for the simplest patches which get the job done.
\subsection{Bug Arch\ae{}ology}
\item \cite{evolution} is a study of linux file system evolution. They analysed over 5000 patches on six common linux filesystems: ext3, ext4, reiserfs, jfs, xfs, and btrfs, and classified them according to multiple metrics: bugfix/performance/enhancement/reliability, common-case/failure-path, type of data structure involved, etc.
Main observations include: 40\% of bugs were in failure-path code, semantic bugs (i.e. domain knowledge to fix) are over 50\% of bugs (with concurrency bugs in second place - 20\%), 50\% of all patches are maintenance, and B-trees are relatively unlikely to have bugs compared to other data structures.
The intended impact of the paper is not crystal-clear, though they claim it should influence tool- and language-designers and developers.
\subsection{Testing Techniques Comparison}
\item A master's thesis from amsterdam~\cite{randomboundedexhaustive} pits random testing and ``bounded exhaustive testing'' against each other to measure their relative effectiveness at finding bugs. In the scope of their thesis, the strategies refer to different ways of selecting values from the space of possible inputs to the program under test, though the analogy to concurrency state spaces or symbolic execution state spaces is straightforward.
Their statement that the motivation for bounded exhaustive testing is the hypothesis that most bugs can be found with small and simple test cases (which I disagree with on the surface level\footnote{...similar though it may seem to observation 3 in section 7.3.2 of \cite{Landslide} -- the difference is that I advocate using the observed pattern as a heuristic for exploration ordering, while here they advocate relying on it to assure correctness when stopping exploration early.}) belies an underlying assumption: no modifications may be made to the test setup to give exhaustive testing any hope of covering a significant fraction of the state space.
A major weakness of this study is that the comparison is performed based on tests of only one piece of software (the programming language Rascal). The work would be stronger if it explored different types of state spaces and identified properties of state spaces that favour one testing strategy over the other.
\section{Future Reading}
\item ReVirt~\cite{revirt} deserves a skim.
Jump to Line
Something went wrong with that request. Please try again.