Propagate tags (numerical identifiers) from many sources to many sinks.
User defines sources and sinks (memory locations) via a
configuration file (option -add-metadata-config
):
read, 2, clam-prov-type:input
read, 2, clam-prov-size:3
write, 2, clam-prov-type:output
write, 2, clam-prov-size:3
This states that the clam-prov
type of the second parameter of any
call to read
is an input (i.e., source) and the clam-prov
type of
the second parameter of any call to write
is an output (i.e., sink).
Also, it states that the size of the second parameter of any call to read
is specified by the third parameter and the size of the second parameter
of any call to write
is specified by the third parameter. Note that, in
general, a program will have many sources and many sinks since it can
have many calls to read
and write
.
The analysis then assigns a unique numerical identifier (i.e., tags)
to each source and it relies on
the Clam static analyzer to
propagate those tags across memory and function boundaries. The output
of the analysis maps each sink to a set with all possible
tags. With this, each sink is connected back to a subset of
sources. Currently, this output is encoded as metadata named
clam-prov-tags
. For instance:
%res = call i64 @write(i32 %param, i8* %param1, i64 %param2), !call-site-metadata !7, !clam-prov-tags !9
...
!7 = !{!"4", !8}
!8 = !{!"2", !"clam-prov-type:output", !"clam-prov-size:3"}
!9 = !{i64 2}
This says that the second parameter of call to write
(identifier
4
) is tagged with tags {2}
.
lit
:sudo pip install lit
cmp
utility
./install.sh
To run some regression tests:
cmake --build . --target test-all
clam-prov.py test.c -o test.prov.bc
By default, clam-prov.py
uses the file config/sources-sinks.config
from the install directory to know which are the sources and sinks.
The output bitcode file test.prov.bc
contains call-site-metadata
metadata indicating the tags assigned to both sources (e.g., read
calls) and sinks (e.g., write calls).
If we want to use a different set of sources and sinks then use the command:
clam-prov.py test.c --add-metadata-config=addMetadata.config -o test.prov.bc
Alternatively, the same information provided by LLVM metadata
call-site-metadata
can be printed to a file in DOT format. The tags
propagated to sinks from sources can be outputted using the argument
dependency-map-file
as follows:
clam-prov.py test.c --add-metadata-config=addMetadata.config --dependency-map-file=dependency_map.dot
Following is an example output file dependency_map.dot
:
digraph clam_prov_dependency_map{
"0" [label="function name:read\ncall site:0"];
"1" [label="function name:read\ncall site:1"];
"2" [label="function name:write\ncall site:2"];
"2" -> "0" [label="WasDependentOn"];
"2" -> "1" [label="WasDependentOn"];
}
The output above says the following:
- First call-site
read
has the call site tag0
- Second call-site
read
has the call site tag1
- Third call-site
write
has the call site tag2
- The third call-site (
write
) has the propagated call site tags0
, and1
.
Logging can be added to a Linux program to emit call-sites using the argument add-logging-config
as follows:
clang -Xclang -disable-O0-optnone -c -emit-llvm test.c -o test.bc
clam-pp --crab-devirt test.bc -o test.pp.bc
clam-prov test.pp.bc --add-metadata-config=addMetadata.config --add-logging-config=call-site-logging.config -o test.out.pp.bc
The above specifies the file call-site-logging.config
to configure how to log the call-sites when program is executed. The configurations must have the keys:
output_mode
- Whether to write to a file (at~/.clam-prov/audit.log
) or to a pipe (at~/.clam-prov/audit.pipe
). Specify0
to write to the file, or specify1
to write to the pipemax_records
- The maximum call-site records to buffer before writing to the file or the pipe
The output is written as a series of records in binary format. Each record contains the following fields in the given order:
time in milliseconds
expressed as an unsigned long (8 bytes)process id
expressed as an integer (4 bytes)call site tag
expressed as a signed long (8 bytes)function return value
expressed as a signed long (8 bytes)name of the function
expressed as a char array (256 bytes)
The source file CallSiteLogReader.c demonstrates how to read the call site log file.
To be able to generate an executable to log call-sites from test.out.pp.bc
(above), the shared library must be linked as follows:
llc -relocation-model=pic test.out.pp.bc -o test.out.pp.s
gcc -L./install/lib test.out.pp.s -o test.out.pp.native -lclamprovlogger
Finally, the generated executable can be executed as:
LD_LIBRARY_PATH=./install/lib ./test.out.pp.native