Commit c377275

DataFlowSanitizer; Clang changes.

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own. Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

Differential Revision: http://llvm-reviews.chandlerc.com/D966

llvm-svn: 187925

1 parent 5cbab07 commit c377275

9 files changed, 250 additions and 2 deletions

clang/docs/DataFlowSanitizer.rst

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+=================
+DataFlowSanitizer
+=================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+DataFlowSanitizer is a generalised dynamic data flow analysis.
+
+Unlike other Sanitizer tools, this tool is not designed to detect a
+specific class of bugs on its own. Instead, it provides a generic
+dynamic data flow analysis framework to be used by clients to help
+detect application-specific issues within their own code.
+
+Usage
+=====
+
+With no program changes, applying DataFlowSanitizer to a program
+will not alter its behavior. To use DataFlowSanitizer, the program
+uses API functions to apply tags to data to cause it to be tracked, and to
+check the tag of a specific data item. DataFlowSanitizer manages
+the propagation of tags through the program according to its data flow.
+
+The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
+For further information about each function, please refer to the header
+file.
+
+Example
+=======
+
+The following program demonstrates label propagation by checking that
+the correct labels are propagated.
+
+.. code-block:: c++
+
+  #include <sanitizer/dfsan_interface.h>
+  #include <assert.h>
+
+  int main(void) {
+    int i = 1;
+    dfsan_label i_label = dfsan_create_label("i", 0);
+    dfsan_set_label(i_label, &i, sizeof(i));
+
+    int j = 2;
+    dfsan_label j_label = dfsan_create_label("j", 0);
+    dfsan_set_label(j_label, &j, sizeof(j));
+
+    int k = 3;
+    dfsan_label k_label = dfsan_create_label("k", 0);
+    dfsan_set_label(k_label, &k, sizeof(k));
+
+    dfsan_label ij_label = dfsan_get_label(i + j);
+    assert(dfsan_has_label(ij_label, i_label));
+    assert(dfsan_has_label(ij_label, j_label));
+    assert(!dfsan_has_label(ij_label, k_label));
+
+    dfsan_label ijk_label = dfsan_get_label(i + j + k);
+    assert(dfsan_has_label(ijk_label, i_label));
+    assert(dfsan_has_label(ijk_label, j_label));
+    assert(dfsan_has_label(ijk_label, k_label));
+
+    return 0;
+  }
+
+Current status
+==============
+
+DataFlowSanitizer is a work in progress, currently under development for
+x86\_64 Linux.
+
+Design
+======
+
+Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.

clang/docs/DataFlowSanitizerDesign.rst

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
+DataFlowSanitizer Design Document
+=================================
+
+This document sets out the design for DataFlowSanitizer, a general
+dynamic data flow analysis. Unlike other Sanitizer tools, this tool is
+not designed to detect a specific class of bugs on its own. Instead,
+it provides a generic dynamic data flow analysis framework to be used
+by clients to help detect application-specific issues within their
+own code.
+
+DataFlowSanitizer is a program instrumentation which can associate
+a number of taint labels with any data stored in any memory region
+accessible by the program. The analysis is dynamic, which means that
+it operates on a running program, and tracks how the labels propagate
+through that program. The tool shall support a large (>100) number
+of labels, such that programs which operate on large numbers of data
+items may be analysed with each data item being tracked separately.
+
+Use Cases
+---------
+
+This instrumentation can be used as a tool to help monitor how data
+flows from a program's inputs (sources) to its outputs (sinks).
+This has applications from a privacy/security perspective in that
+one can audit how a sensitive data item is used within a program and
+ensure it isn't exiting the program anywhere it shouldn't be.
+
+Interface
+---------
+
+A number of functions are provided which will create taint labels,
+attach labels to memory regions and extract the set of labels
+associated with a specific memory region. These functions are declared
+in the header file ``sanitizer/dfsan_interface.h``.
+
+.. code-block:: c
+
+  /// Creates and returns a base label with the given description and user data.
+  dfsan_label dfsan_create_label(const char *desc, void *userdata);
+
+  /// Sets the label for each address in [addr,addr+size) to \c label.
+  void dfsan_set_label(dfsan_label label, void *addr, size_t size);
+
+  /// Sets the label for each address in [addr,addr+size) to the union of the
+  /// current label for that address and \c label.
+  void dfsan_add_label(dfsan_label label, void *addr, size_t size);
+
+  /// Retrieves the label associated with the given data.
+  ///
+  /// The type of 'data' is arbitrary. The function accepts a value of any type,
+  /// which can be truncated or extended (implicitly or explicitly) as necessary.
+  /// The truncation/extension operations will preserve the label of the original
+  /// value.
+  dfsan_label dfsan_get_label(long data);
+
+  /// Retrieves a pointer to the dfsan_label_info struct for the given label.
+  const struct dfsan_label_info *dfsan_get_label_info(dfsan_label label);
+
+  /// Returns whether the given label label contains the label elem.
+  int dfsan_has_label(dfsan_label label, dfsan_label elem);
+
+  /// If the given label label contains a label with the description desc, returns
+  /// that label, else returns 0.
+  dfsan_label dfsan_has_label_with_desc(dfsan_label label, const char *desc);
+
+Taint label representation
+--------------------------
+
+As stated above, the tool must track a large number of taint
+labels. This poses an implementation challenge, as most multiple-label
+tainting systems assign one label per bit to shadow storage, and
+union taint labels using a bitwise or operation. This will not scale
+to clients which use hundreds or thousands of taint labels, as the
+label union operation becomes O(n) in the number of supported labels,
+and data associated with it will quickly dominate the live variable
+set, causing register spills and hampering performance.
+
+Instead, a low overhead approach is proposed which is best-case O(log\
+:sub:`2` n) during execution. The underlying assumption is that
+the required space of label unions is sparse, which is a reasonable
+assumption to make given that we are optimizing for the case where
+applications mostly copy data from one place to another, without often
+invoking the need for an actual union operation. The representation
+of a taint label is a 16-bit integer, and new labels are allocated
+sequentially from a pool. The label identifier 0 is special, and means
+that the data item is unlabelled.
+
+When a label union operation is requested at a join point (any
+arithmetic or logical operation with two or more operands, such as
+addition), the code checks whether a union is required, whether the
+same union has been requested before, and whether one union label
+subsumes the other. If so, it returns the previously allocated union
+label. If not, it allocates a new union label from the same pool used
+for new labels.
+
+Specifically, the instrumentation pass will insert code like this
+to decide the union label ``lu`` for a pair of labels ``l1``
+and ``l2``:
+
+.. code-block:: c
+
+  if (l1 == l2)
+    lu = l1;
+  else
+    lu = __dfsan_union(l1, l2);
+
+The equality comparison is outlined, to provide an early exit in
+the common cases where the program is processing unlabelled data, or
+where the two data items have the same label. ``__dfsan_union`` is
+a runtime library function which performs all other union computation.
+
+Further optimizations are possible, for example if ``l1`` is known
+at compile time to be zero (e.g. it is derived from a constant),
+``l2`` can be used for ``lu``, and vice versa.
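
Reviewer note: the union scheme described in this section can be sketched as a small, self-contained C model. This is only an illustration under stated assumptions, not the runtime's implementation. Every `toy_*` name is hypothetical, the lookup of previously requested unions is a plain linear scan, and the subsumption check only recognises a label that appears as an operand somewhere inside the other label's union tree.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t toy_label;            /* 16-bit label; 0 means unlabelled */

/* Hypothetical bookkeeping: a union label records its two operands,
   a base label has both operands set to 0. */
struct toy_label_info { toy_label l1, l2; };

static struct toy_label_info toy_info[1 << 16];
static toy_label toy_last = 0;         /* most recently allocated label */

/* Allocates the next label from the single shared pool. */
static toy_label toy_alloc(toy_label l1, toy_label l2) {
  assert(toy_last < UINT16_MAX && "label pool exhausted");
  toy_label l = (toy_label)(toy_last + 1);
  toy_last = l;
  toy_info[l].l1 = l1;
  toy_info[l].l2 = l2;
  return l;
}

/* Creates a new base label. */
toy_label toy_create_label(void) { return toy_alloc(0, 0); }

/* Returns nonzero if `label` is `elem` or is a union that contains it. */
int toy_has_label(toy_label label, toy_label elem) {
  if (label == elem) return 1;
  if (label == 0) return 0;
  const struct toy_label_info *i = &toy_info[label];
  if (i->l1 == 0 && i->l2 == 0) return 0;     /* base label, no operands */
  return toy_has_label(i->l1, elem) || toy_has_label(i->l2, elem);
}

/* Slow path: stands in for the runtime union function. */
toy_label toy_union_slow(toy_label l1, toy_label l2) {
  if (l1 == 0) return l2;                     /* no union required */
  if (l2 == 0) return l1;
  if (toy_has_label(l1, l2)) return l1;       /* l1 subsumes l2 */
  if (toy_has_label(l2, l1)) return l2;       /* l2 subsumes l1 */
  /* Reuse a union that was requested before (linear scan in this toy). */
  for (uint32_t l = 1; l <= toy_last; ++l) {
    const struct toy_label_info *i = &toy_info[l];
    if ((i->l1 == l1 && i->l2 == l2) || (i->l1 == l2 && i->l2 == l1))
      return (toy_label)l;
  }
  return toy_alloc(l1, l2);                   /* new label, same pool */
}

/* Fast path: the equality check the instrumented code performs first. */
toy_label toy_union(toy_label l1, toy_label l2) {
  return l1 == l2 ? l1 : toy_union_slow(l1, l2);
}
```

In this model, `toy_union(a, b)` and `toy_union(b, a)` return the same label, and `toy_union(ab, a)` returns `ab` without allocating, which mirrors the three checks listed in the prose.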
+
+Memory layout and label management
+----------------------------------
+
+The following is the current memory layout for Linux/x86\_64:
+
++---------------+---------------+--------------------+
+|    Start      |    End        |        Use         |
++===============+===============+====================+
+| 0x700000008000|0x800000000000 | application memory |
++---------------+---------------+--------------------+
+| 0x200200000000|0x700000008000 |       unused       |
++---------------+---------------+--------------------+
+| 0x200000000000|0x200200000000 |    union table     |
++---------------+---------------+--------------------+
+| 0x000000010000|0x200000000000 |   shadow memory    |
++---------------+---------------+--------------------+
+| 0x000000000000|0x000000010000 | reserved by kernel |
++---------------+---------------+--------------------+
+
+Each byte of application memory corresponds to two bytes of shadow
+memory, which are used to store its taint label. As for LLVM SSA
+registers, we have not found it necessary to associate a label with
+each byte or bit of data, as some other tools do. Instead, labels are
+associated directly with registers. Loads will result in a union of
+all shadow labels corresponding to bytes loaded (which most of the
+time will be short circuited by the initial comparison) and stores will
+result in a copy of the label to the shadow of all bytes stored to.
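
Reviewer note: the design text gives the address ranges but not the mapping from an application address to its shadow slot. One mapping that agrees with the table is to clear the top three address bits and shift left by one, so each application byte gets a two-byte slot. The sketch below assumes that mapping. The mask value and the `toy_*` / `TOY_*` names are assumptions made for illustration and are not taken from this commit.

```c
#include <stdint.h>

/* Assumed constants, chosen so the results match the layout table. */
#define TOY_APP_BASE    UINT64_C(0x700000008000)
#define TOY_APP_END     UINT64_C(0x800000000000)
#define TOY_SHADOW_MASK (~UINT64_C(0x700000000000))

/* Returns nonzero if `addr` lies in the application memory range. */
int toy_is_app_addr(uint64_t addr) {
  return addr >= TOY_APP_BASE && addr < TOY_APP_END;
}

/* Maps an application address to the address of its 2-byte shadow slot. */
uint64_t toy_shadow_for(uint64_t addr) {
  return (addr & TOY_SHADOW_MASK) << 1;
}
```

Under this assumption the first application byte, `0x700000008000`, maps to `0x10000`, the start of shadow memory. The shadow slot of the last application byte ends exactly at `0x200000000000`, where the union table begins.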

clang/docs/UsersManual.rst

Lines changed: 2 additions & 0 deletions
@@ -895,6 +895,8 @@ are listed below.
    used in conjunction with the ``-fsanitize-undefined-trap-on-error``
    flag. This includes all of the checks listed below other than
    ``unsigned-integer-overflow`` and ``vptr``.
+-  ``-fsanitize=dataflow``: :doc:`DataFlowSanitizer`, a general data
+   flow analysis.
 
 The following more fine-grained checks are also available:

clang/include/clang/Basic/Sanitizers.def

Lines changed: 3 additions & 0 deletions
@@ -77,6 +77,9 @@ SANITIZER("vptr", Vptr)
 // IntegerSanitizer
 SANITIZER("unsigned-integer-overflow", UnsignedIntegerOverflow)
 
+// DataFlowSanitizer
+SANITIZER("dataflow", DataFlow)
+
 // -fsanitize=undefined includes all the sanitizers which have low overhead, no
 // ABI or address space layout implications, and only catch undefined behavior.
 SANITIZER_GROUP("undefined", Undefined,
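
Reviewer note: for readers unfamiliar with `.def` files, a single `SANITIZER(...)` entry is typically expanded several times with different macro definitions. The C sketch below shows the general X-macro pattern only. The list macro, the enum, and `toy_parse_sanitizer` are hypothetical and are not how Clang consumes `Sanitizers.def`.

```c
#include <string.h>

/* Hypothetical stand-in for a .def file: one X(name, id) entry per sanitizer. */
#define TOY_SANITIZERS(X) \
  X("thread", Thread)     \
  X("memory", Memory)     \
  X("dataflow", DataFlow)

/* First expansion: one enumerator per entry, used as a bit position. */
enum toy_ordinal {
#define X(NAME, ID) TOY_ORD_##ID,
  TOY_SANITIZERS(X)
#undef X
  TOY_ORD_COUNT
};

/* Second expansion: map a -fsanitize= style name to its bit, or 0 if unknown. */
unsigned toy_parse_sanitizer(const char *name) {
#define X(NAME, ID) if (strcmp(name, NAME) == 0) return 1u << TOY_ORD_##ID;
  TOY_SANITIZERS(X)
#undef X
  return 0;
}
```

Adding a sanitizer then means adding one entry to the list. The enumerator and the name lookup both pick it up without further edits.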

clang/lib/CodeGen/BackendUtil.cpp

Lines changed: 12 additions & 0 deletions
@@ -206,6 +206,11 @@ static void addThreadSanitizerPass(const PassManagerBuilder &Builder,
   PM.add(createThreadSanitizerPass(CGOpts.SanitizerBlacklistFile));
 }
 
+static void addDataFlowSanitizerPass(const PassManagerBuilder &Builder,
+                                     PassManagerBase &PM) {
+  PM.add(createDataFlowSanitizerPass());
+}
+
 void EmitAssemblyHelper::CreatePasses(TargetMachine *TM) {
   unsigned OptLevel = CodeGenOpts.OptimizationLevel;
   CodeGenOptions::InliningMethod Inlining = CodeGenOpts.getInlining();
@@ -265,6 +270,13 @@ void EmitAssemblyHelper::CreatePasses(TargetMachine *TM) {
                            addThreadSanitizerPass);
   }
 
+  if (LangOpts.Sanitize.DataFlow) {
+    PMBuilder.addExtension(PassManagerBuilder::EP_OptimizerLast,
+                           addDataFlowSanitizerPass);
+    PMBuilder.addExtension(PassManagerBuilder::EP_EnabledOnOptLevel0,
+                           addDataFlowSanitizerPass);
+  }
+
   // Figure out TargetLibraryInfo.
   Triple TargetTriple(TheModule->getTargetTriple());
   PMBuilder.LibraryInfo = new TargetLibraryInfo(TargetTriple);

clang/lib/Driver/SanitizerArgs.h

Lines changed: 3 additions & 1 deletion
@@ -37,10 +37,11 @@ class SanitizerArgs {
     NeedsAsanRt = Address,
     NeedsTsanRt = Thread,
     NeedsMsanRt = Memory,
+    NeedsDfsanRt = DataFlow,
     NeedsLeakDetection = Leak,
     NeedsUbsanRt = Undefined | Integer,
     NotAllowedWithTrap = Vptr,
-    HasZeroBaseShadow = Thread | Memory
+    HasZeroBaseShadow = Thread | Memory | DataFlow
   };
   unsigned Kind;
   std::string BlacklistFile;
@@ -66,6 +67,7 @@ class SanitizerArgs {
       return false;
     return Kind & NeedsUbsanRt;
   }
+  bool needsDfsanRt() const { return Kind & NeedsDfsanRt; }
 
   bool sanitizesVptr() const { return Kind & Vptr; }
   bool notAllowedWithTrap() const { return Kind & NotAllowedWithTrap; }

clang/lib/Driver/Tools.cpp

Lines changed: 8 additions & 0 deletions
@@ -1860,6 +1860,12 @@ static void addUbsanRTLinux(const ToolChain &TC, const ArgList &Args,
     addSanitizerRTLinkFlagsLinux(TC, Args, CmdArgs, "ubsan_cxx", false);
 }
 
+static void addDfsanRTLinux(const ToolChain &TC, const ArgList &Args,
+                            ArgStringList &CmdArgs) {
+  if (!Args.hasArg(options::OPT_shared))
+    addSanitizerRTLinkFlagsLinux(TC, Args, CmdArgs, "dfsan", true);
+}
+
 static bool shouldUseFramePointer(const ArgList &Args,
                                   const llvm::Triple &Triple) {
   if (Arg *A = Args.getLastArg(options::OPT_fno_omit_frame_pointer,
@@ -6275,6 +6281,8 @@ void gnutools::Link::ConstructJob(Compilation &C, const JobAction &JA,
     addMsanRTLinux(getToolChain(), Args, CmdArgs);
   if (Sanitize.needsLsanRt())
     addLsanRTLinux(getToolChain(), Args, CmdArgs);
+  if (Sanitize.needsDfsanRt())
+    addDfsanRTLinux(getToolChain(), Args, CmdArgs);
 
   // The profile runtime also needs access to system libraries.
   addProfileRTLinux(getToolChain(), Args, CmdArgs);

clang/lib/Lex/PPMacroExpansion.cpp

Lines changed: 1 addition & 0 deletions
@@ -908,6 +908,7 @@ static bool HasFeature(const Preprocessor &PP, const IdentifierInfo *II) {
            .Case("enumerator_attributes", true)
            .Case("memory_sanitizer", LangOpts.Sanitize.Memory)
            .Case("thread_sanitizer", LangOpts.Sanitize.Thread)
+           .Case("dataflow_sanitizer", LangOpts.Sanitize.DataFlow)
            // Objective-C features
            .Case("objc_arr", LangOpts.ObjCAutoRefCount) // FIXME: REMOVE?
            .Case("objc_arc", LangOpts.ObjCAutoRefCount)
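
Reviewer note: the new `dataflow_sanitizer` feature name lets client code detect at compile time whether it is being built with `-fsanitize=dataflow`. A minimal sketch of such a check follows. The function name `built_with_dfsan` is hypothetical, and the fallback definition of `__has_feature` is the usual guard for compilers that do not provide it.

```c
/* Compilers without __has_feature get a fallback that reports "absent". */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Returns 1 when this translation unit was built with -fsanitize=dataflow,
   0 otherwise. */
int built_with_dfsan(void) {
#if __has_feature(dataflow_sanitizer)
  return 1;
#else
  return 0;
#endif
}
```

The same guard can wrap the `#include <sanitizer/dfsan_interface.h>` line and any labelling calls so that a regular build compiles them out.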

clang/runtime/compiler-rt/Makefile

Lines changed: 2 additions & 1 deletion
@@ -109,7 +109,8 @@ endif
 ifeq ($(ARCH),x86_64)
 RuntimeLibrary.linux.Configs += \
 	full-x86_64.a profile-x86_64.a san-x86_64.a asan-x86_64.a \
-	tsan-x86_64.a msan-x86_64.a ubsan-x86_64.a ubsan_cxx-x86_64.a
+	tsan-x86_64.a msan-x86_64.a ubsan-x86_64.a ubsan_cxx-x86_64.a \
+	dfsan-x86_64.a
 # We need to build 32-bit ASan/UBsan libraries on 64-bit platform, and add them
 # to the list of runtime libraries to make
 # "clang -fsanitize=(address|undefined) -m32" work.
