This peephole optimization pass looks for add operations to bitfields under the same uint32 in the same sturct and merge multiple add instructions into a single add instruction in the IR, reducing the IR instruction count. For example, say we have some struct like the following:
typedef struct x{
unsigned a:5;
unsigned b:27;
}t;
t w = {1, 2};
We want to perform something like this, where 2 bitfields covered by the same unsigned int are both added some constant:
w.a+=5;
w.b++;
Instead of doing 2 seperate additions, this can obviously be merged into 1 addition.
- Scan instructions between pairs of load/store of the same global value. Any or disjoint instructions that has its value stored by the pair are considered a candidate for optimization.
- Every candidate is the root of a expression tree, which is then matched against the following grammar (%1 is the value loaded by the load/store pair):
<BFAr> ::= <BFAr> or disjoint <BFAr>
| <BFadd>
| <BFmask>
| <BFMA>
<BFadd> ::= %1 add ConstInt
<BFmask> ::= %1 and ConstInt
<BFMA> ::= <BFmask> add ConstInt
| <BFadd> and ConstInt
- Replace all the %1 with 0 for all the matching trees. Since all nodes of the trees should now be constant, constant folding can be performed.
- Create a new instruction that add the folded value to %1
- Insert new instructions that masks certain bits of %1 for instructions that uses the value of nodes in the matching trees.
- Delete the matching trees.
- Install clang & llvm 18.1.8
- compile the pass w/ this command under build/ (If it doesn't exist mkdir first. Subsequent commands should also be run under the same path):
cmake ..
make -j
- compile both test cases
clang -S -emit-llvm -O3 -o ../tests/struct.ll ../tests/struct.c
clang -S -emit-llvm -O3 -o ../tests/loop_struct.ll ../tests/loop_struct.c
- generate the optimized IR.
opt -S -load-pass-plugin=skeleton/SkeletonPass.so -passes=skeleton-pass -o ../tests/loop_struct_trans.ll ../tests/loop_struct.ll
opt -S -load-pass-plugin=skeleton/SkeletonPass.so -passes=skeleton-pass -o ../tests/struct_trans.ll ../tests/struct.ll
this will also print the instruction count like below:
b4 transforming main: 73
after transforming main: 56
b4 transforming printf: 0
after transforming printf: 0
b4 transforming main: 23
after transforming main: 11
b4 transforming printf: 0
after transforming printf: 0
- run the 4 .ll files with lli
lli ../tests/struct.ll
the other 3 can be run similarly and the output of the original .ll and the *trans.ll should be identical.