Skip to content

Handling Andrii's patchset

Harishankar Vishwanathan edited this page Nov 25, 2023 · 3 revisions

This is my analysis of what I think is happening:

Source of the issue

The main issue comes from a change in the is_branch_taken() function in Andrii's patchset:

static int is_branch_taken(struct bpf_reg_state *reg1, struct bpf_reg_state *reg2,
			   u8 opcode, bool is_jmp32)
{
…
	if (!is_reg_const(reg2, is_jmp32)) {
		opcode = flip_opcode(opcode);
		swap(reg1, reg2);
	}
…
}

It uses a swap() instruction as you can see. Here's an implementation from minmax.h:

#define swap(a, b) \
	do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)

Which essentially swaps two values, regardless of type. The main issue is, this emits a select instruction on pointer types (bpf_reg_states). The following is a reduced snippet of the select instruction, and how it is used.

%select_on_ptrs = select i1 %select_value, %bpf_reg_state* %src_reg, %struct.bpf_reg_state* %dst_reg
%gep_ptr = getelementptr %bpf_reg_state, %bpf_reg_state* %select_on_ptrs, i64 0, i32 0
%load_value = load i32, i32* %gep, align 8
store i64 %i206.i, i64* %i200.i, align 8

Which we don't handle yet, and it seems is very painful to handle.

What I tried:

I started with an implementation that handles this. Essentially, whenever there's a select instruction on pointer types, we save the details of this instruction in a map. So SelectMap contains, after the first instruction:

SelectMap: {[%select_ptr, %src_reg, %dst_reg]}

On encountering the GEP instruction, we handle this same as before: we store the details of the GEP in our usual GEPMap.

GEPMap: {[%gep_ptr, %select_on_ptrs, 0, 0]}

With the idea that on encountering the load instruction, we can resolve it to a bitvector formula, conditioned on the result of the select. We have our memory view of bitvector trees, let's say it is populated so, at the point we encounter the load:

MemoryViewMap: 
{
%src_reg: [
	[s1_bv]            <-- u64_min
	[s2_bv]           <-- u64_max
	...
],
%dst_reg: [  
	[d1_bv]            <-- u64_min
	[d2_bv]           <-- u64_max
	...
]
}

When we encounter the load (this can be skipped if you want to skim),

  • We check that the pointer type of the load came from a select (not from a GEP as it usually does)
  • Using GEPMap, we figure out that it is associated with %select_on_ptrs.
  • We also see that the GEP indexes into the 0th element in a bpf_reg_state, so we need to access the 0th bitvector in both the bitvector trees (s1_bv, d1_bv) .
  • Using SelectMap, we figure out that the true case of the select is %src_reg, and the false case is %dst_reg.
  • So the 0th bitvector from src_reg is accessed in the true case, and the 0th bitvector from dst_reg is accessed in the false case.

We finally make the following assertions for the current basic block, using all the information in all the maps.

((select_value_bv == 1) ==>  load_bv == s1_bv)
((select_value_bv == 0) ==>  load_bv == s2_bv)