SAFIRE : Scalable and Accurate Fault Injection for Parallel Multi-threaded Applications
SAFIRE is a fault injection framework, descended from REFINE, that injects bit-flip faults into both serial and multi-threaded programs. It also supports fault injection in multi-process, distributed execution, but this support is currently experimental.
SAFIRE includes a modified LLVM compiler backend for x86 to instrument and inject bit-flip faults on machine instructions. The backend implements several optimizations for fast instrumentation and injection that make SAFIRE the fastest and most accurate tool for fault injection so far. Compiling with SAFIRE produces an instrumented binary ready to interface with a dynamic library that implements an API of function hooks. Function hooks get information on the executed instruction and may trigger fault injection. Dynamic libraries can implement any instruction-based fault model to select which instruction(s) to inject to, which operand(s), and which bit(s) to flip.
Repo directory structure
The main directories of the repo are:
- llvm-3.9.0, which contains the modified LLVM compiler
- libinject, which contains implementations of a single fault model for serial, multi-threaded, and multi-process distributed execution (experimental)
The repo also contains a directory named ipdps19 with reference material for the paper on SAFIRE presented at IPDPS'19. Its sub-directories are:
- pinfi, contains a fault injection tool based on Intel PIN used to compare with SAFIRE
- programs, contains several programs used in experimentation, sub-directories of each program are:
- golden, program in vanilla version with no modification
- llfi, programs whose build process is modified to use the LLFI tool (https://github.com/DependableSystemsLab/LLFI) for fault injection
- pinfi, like golden, programs with no modifications, for use with PINFI; the purpose of this sub-directory is to store the output of experiments
- refine, programs whose build process is modified to use REFINE for fault injection
- results, contains .eps figures of the accuracy and performance results published in the paper
- scripts, contains the scripts used for running experiments, and post-processing for figure and table creation
Build the SAFIRE LLVM compiler
- Clone the repo
git clone <repo url>
- Change directory to llvm-3.9.0
- Download clang-3.9.0 (http://releases.llvm.org/3.9.0/cfe-3.9.0.src.tar.xz) and decompress it in the llvm-3.9.0/tools/ sub-directory
wget -P tools/ http://releases.llvm.org/3.9.0/cfe-3.9.0.src.tar.xz
tar -C tools/ -xf tools/cfe-3.9.0.src.tar.xz
- Create a directory for building, e.g., BUILD
- Change to the building directory
- Run cmake to bootstrap the build process and set the installation directory, e.g.,
cmake -DCMAKE_INSTALL_PREFIX=$HOME/opt/safire -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_ENABLE_DOXYGEN=OFF ..
- Run the build program to process cmake generated build files, e.g.,
- Install the SAFIRE LLVM compiler binaries, e.g.,
Use the SAFIRE LLVM compiler
- Set the environment path to include the installation directory, e.g.,
- SAFIRE extends the LLVM compiler with Fault Injection (FI) flags. Those are:
|-fi||Enable SAFIRE instrumentation and FI in the LLVM backend|
|-fi-ff||Enable the fast-forwarding optimization for instrumentation and injection. Always enable it for a significant speedup; the option to disable it exists only for comparison|
|-fi-funcs||Comma separated list of functions to target for instrumentation and injection. Setting to "*" selects all|
|-fi-funcs-excl||Comma separated list of functions to exclude from instrumentation and injection|
|-fi-inst-types||Comma separated list of instruction types to target for FI, possible values are: frame, control, data. Setting to "*" selects all|
|-fi-reg-types||Comma separated list of register types to target for FI, possible values are: src, dst. Setting to "*" selects all|
To include SAFIRE's instrumentation in the compilation process, add the SAFIRE FI flags to the flags given to the compiler driver, such as clang. For example, enabling SAFIRE within a Makefile for C compilation extends the variable CFLAGS as:
CFLAGS += -mllvm -fi -mllvm -fi-ff -mllvm -fi-funcs="*" -mllvm -fi-inst-types="*" -mllvm -fi-reg-types="dst"
Those flags enable fault injection with the fast-forwarding optimization, targeting all functions and instructions to inject faults to destination registers.
The following is the same example but invoking clang from the command line:
clang -O3 -mllvm -fi -mllvm -fi-ff -mllvm -fi-funcs="*" -mllvm -fi-inst-types="*" -mllvm -fi-reg-types="dst"
For more examples, see programs in the programs/safire directory of the repo.
- Compiling with SAFIRE requires linking with a library that implements the routine hooks emitted by SAFIRE instrumentation. The prototypes of those routines and their functions are:
void selMBB(uint64_t *ret, uint64_t num_insts)
The instrumented program calls this routine on entry to a (Machine) Basic Block of machine instructions (a Basic Block is a straight-line sequence of instructions with a single entry and a single exit, so its instructions always execute together). This is the default instrumentation mode, enabled when the program starts execution.
The variable num_insts is input and has the number of instructions in this basic block. The variable ret is output pointing to a memory location. The value that the routine stores in this memory location guides the instrumentation in the program; there are three possibilities:
- *ret = 0, execution continues with Basic Block instrumentation
- *ret = 1, execution continues with detailed per-instruction instrumentation
- *ret = 2, execution continues with instrumentation disabled, there will not be any further calls to hooks nor any instrumentation overhead
The typical use of selMBB is to count the number of dynamic instructions executed so far and decide, based on the fault model, whether fault injection should happen at one of the instructions in this basic block. If not, *ret=0 instructs execution to continue without per-instruction instrumentation until the next basic block. If so, *ret=1 instructs execution to continue with per-instruction instrumentation, which steps through every instruction of the basic block until the target is found and the fault is injected. If the fault model has no more faults to inject, *ret=2 disables instrumentation and avoids any overhead from that point on.
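As an illustration of the counting logic described above, here is a minimal sketch of a selMBB hook for a single fault model. It is not the implementation shipped in the repo: the state variables and the helpers set_fi_target and mark_injected are hypothetical names introduced for this example, and the sketch assumes that a per-instruction hook advances the counter once selMBB hands over control with *ret = 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library state; none of these names are part of SAFIRE. */
static uint64_t dyn_count = 0; /* dynamic instructions counted so far */
static uint64_t fi_target = 0; /* 1-based index of the instruction to inject */
static int      fi_done   = 0; /* non-zero once the single fault is injected */

/* Hypothetical helper: choose the target instruction and reset the state. */
void set_fi_target(uint64_t target)
{
    fi_target = target;
    dyn_count = 0;
    fi_done = 0;
}

/* Hypothetical helper: record that the fault has been injected. */
void mark_injected(void)
{
    fi_done = 1;
}

void selMBB(uint64_t *ret, uint64_t num_insts)
{
    assert(ret != NULL);

    if (fi_done) {
        *ret = 2; /* nothing left to inject: disable instrumentation */
        return;
    }

    /* Does the target fall inside this block, (dyn_count, dyn_count + num_insts]? */
    if (fi_target > dyn_count && fi_target <= dyn_count + num_insts) {
        *ret = 1; /* switch to per-instruction stepping; assumed to count from here */
        return;
    }

    dyn_count += num_insts; /* skip the whole block in one step */
    *ret = 0;               /* stay in basic-block instrumentation */
}
```

Counting whole blocks instead of single instructions is what keeps the overhead low before the target block is reached.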
void selInst(uint64_t *ret, uint8_t *instr_str)
The instrumented program calls this routine for each instruction, when per-instruction instrumentation is enabled.
The variable instr_str is input and holds a textual, C-string representation of the instruction to execute in the next step. The variable ret is output, pointing to a memory location. The value that the routine stores in this memory location guides fault injection; there are two possibilities:
- *ret = 0, execution continues to the next instruction without fault injection
- *ret = 1, execution continues but after this instruction executes, the instrumented binary will invoke the fault injection routine hook doInject, discussed next.
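The two return values can be sketched with a small selInst hook that counts instructions and requests injection when a target index is reached. This is only a sketch under assumptions: dyn_count and fi_target are hypothetical variables, and the fixed target value 3 stands in for whatever the fault model selects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library state; these names are not part of SAFIRE. */
static uint64_t dyn_count = 0; /* instructions stepped so far */
static uint64_t fi_target = 3; /* 1-based index of the target instruction (example value) */

void selInst(uint64_t *ret, uint8_t *instr_str)
{
    assert(ret != NULL);

    /* instr_str holds the textual form of the instruction; a library could
       log it or filter on it. This sketch ignores it. */
    (void)instr_str;

    dyn_count++;
    /* 1 requests a call to doInject after this instruction, 0 moves on. */
    *ret = (dyn_count == fi_target) ? 1 : 0;
}
```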
void doInject(unsigned num_ops, uint64_t *op, uint64_t *size, uint8_t *bitmask)
If selInst sets *ret = 1, the instrumented binary calls doInject right after the instruction passed to selInst has executed. The routine doInject can change the value of any operand using a bitmask to inject bit-flips.
The variable num_ops is input and holds the number of operands of the instruction. The variable op is output, pointing to a memory location. The value the routine stores in this memory location is the identifier of the operand to inject a fault to; valid values are 0..num_ops-1. The variable size is input, pointing to an array of length num_ops that stores the size in bytes of each operand, indexed by the operand identifier. The variable bitmask is output and determines the bitmask to apply to the chosen operand *op. It is a pointer to a byte array, allocated by the instrumentation, that stores the bitmask in least significant bit first order (little-endian). A value of '1' in a bit position causes injection to flip the bit of the operand at that position.
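A single bit-flip model can be sketched as follows. The sketch makes several assumptions that the text does not confirm: the bitmask buffer is at least as large as the chosen operand, every operand size is non-zero, and rand() is an acceptable stand-in for the random source a real library would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

void doInject(unsigned num_ops, uint64_t *op, uint64_t *size, uint8_t *bitmask)
{
    assert(num_ops > 0 && op != NULL && size != NULL && bitmask != NULL);

    /* Pick one operand uniformly at random (rand() is a placeholder). */
    uint64_t target_op = (uint64_t)rand() % num_ops;
    assert(size[target_op] > 0);

    /* Pick one bit inside the chosen operand. */
    uint64_t nbits = size[target_op] * 8;
    uint64_t bit = (uint64_t)rand() % nbits;

    /* Assumption: the buffer covers at least size[target_op] bytes. */
    memset(bitmask, 0, (size_t)size[target_op]);

    /* Least significant bit first: byte bit/8, bit position bit%8. */
    bitmask[bit / 8] = (uint8_t)(1u << (bit % 8));

    *op = target_op;
}
```

Setting more than one bit in the mask would model multi-bit upsets in the same operand with no change to the interface.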
The instrumented binary must link to a library that implements those function hooks.
There are examples of libraries implementing the single fault model for serial and parallel execution under the directory
Run a fault-injection experiment using SAFIRE and the single-fault model library
The FI library we provide needs a dynamic instruction count to pick a random instruction for fault injection. For that, the library reads the dynamic target instruction count from the file fi-inscount.txt. If the file is missing, our library implementation performs a bootstrap run that does the counting without injecting faults. There are different example libraries depending on whether the target is serial, multi-threaded, or multi-process (experimental) execution.
The format of the file fi-inscount.txt for multi-threaded execution is:
thread=X, fi_index=N
...
fi_index=M
where X is the thread id, fi_index in the same line is the number of dynamic instructions thread X executed, and the final fi_index is the total dynamic instructions from all threads.
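To make the format concrete, here is a sketch of a line parser for fi-inscount.txt. It assumes one entry per line with exactly the "thread=X, fi_index=N" and "fi_index=M" spellings described above; the function name parse_inscount_line and its return codes are hypothetical, and the repo's libraries may read the file differently.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of SAFIRE.
   Returns 1 for a per-thread line  ("thread=X, fi_index=N"),
           2 for the final total    ("fi_index=M"),
           0 for anything else. */
int parse_inscount_line(const char *line, uint64_t *thread, uint64_t *fi_index)
{
    assert(line != NULL && thread != NULL && fi_index != NULL);

    if (sscanf(line, " thread=%" SCNu64 ", fi_index=%" SCNu64,
               thread, fi_index) == 2)
        return 1;

    if (sscanf(line, " fi_index=%" SCNu64, fi_index) == 1)
        return 2;

    return 0;
}
```

A caller would read the file line by line with fgets and keep the per-thread counts in an array indexed by thread id.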
In subsequent runs, after fi-inscount.txt has been created, the FI library performs fault injection. Our implementation expects a fi-target.txt file that contains the thread and target instruction to inject to. The library reads this file and randomly selects the operand and bit to flip. See the script in <repo>/ipdps19/scripts/faultinject.py for how we generate a set of FI targets.
Lastly, the FI library saves a log of fault injection in the file fi-inject.txt that contains the following information:
- thread, the thread identifier the fault was injected to
- fi_index, the index of the dynamic instruction
- op, the index of the operand
- size, the size in bytes of the operand
- bitflip, the position of the flipped bit
If fi-inject.txt exists, the library will inject the fault at the same instruction, operand, and bit position specified by this file.
Build the PINFI tool
- Download and install the latest Intel PIN framework (https://software.intel.com/en-us/articles/pin-a-binary-instrumentation-tool-downloads)
- Copy the directory pinfi into the installed PIN path; the commands in this README expect it at $PIN_PATH/source/tools/pinfi
- Change to the copied pinfi directory
- Run make to build the tool
Note, the PINFI tool is configurable to select whether to inject errors in
source registers, destination registers or destination memory operands of
instructions. This is possible by editing the file
utils.h within the
directory and including or excluding the preprocessor directives
FI_DST_MEM. Those directives control whether FI is enabled for
their respective, self-descriptive targets. The PINFI fault injection
implementation also follows the single fault model.
Use the PINFI tool
Similar to SAFIRE, PINFI must have a dynamic target instruction count before performing fault injection. For the bootstrap run, PINFI implements an instcount tool to run before executing PINFI's faultinjection tool:
$PIN_PATH/pin -t $PIN_PATH/source/tools/pinfi/obj-intel64/instcount -- ./program <args>
This will run the dynamic instruction counter and generate the pin.instcount.txt file that contains the number of dynamic target instructions.
PINFI's fault injection tool reads this file to perform random fault injection. Running the fault injection tool:
$PIN_PATH/pin -t $PIN_PATH/source/tools/pinfi/obj-intel64/faultinjection -- ./program <args>
The PINFI tool produces a log of fault injection in the file pin.injection.txt, which contains the following entries:
thread, the thread identifier to which the fault was injected
fi_index, the index of the dynamic instruction
reg, the symbolic name of the operand
bitflip, the position of the flipped bit
addr, the instruction pointer address
To contribute to SAFIRE please send a pull request on the master branch of this repo.
SAFIRE was created by Giorgis Georgakoudis, email@example.com, under technical guidance of Ignacio Laguna (LLNL) and Hans Vandierendonck (QUB), and design mentoring of Dimitrios S. Nikolopoulos (VT) and Martin Schulz (TUM)
Please cite the following paper:
- G. Georgakoudis, I. Laguna, H. Vandierendonck, D. S. Nikolopoulos and M. Schulz, SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications , 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, 2019, pp. 890-899.
SAFIRE is distributed under the terms of the Apache License (Version 2.0) with LLVM exceptions. Other software that is part of this repository may be under a different license, documented by the file LICENSE in its sub-directory.
All new contributions to SAFIRE must be under the Apache License (Version 2.0) with LLVM exceptions.
SPDX License Identifier: "Apache-2.0 WITH LLVM-exception"