Skip to content

Commit

Permalink
Simplify multiplication overflow check for getelementpointer.
Browse files Browse the repository at this point in the history
We now use a simple comparison with (maxint / size) instead of
a wide multiply. The new comparison statically evaluates to true
more often than the old one.
  • Loading branch information
Brian Huffman committed Jan 16, 2018
1 parent 092799f commit e1838b6
Showing 1 changed file with 23 additions and 20 deletions.
43 changes: 23 additions & 20 deletions crucible-llvm/src/Lang/Crucible/LLVM/Translation.hs
Original file line number Diff line number Diff line change
Expand Up @@ -1121,16 +1121,18 @@ calcGEP' (ArrayType bound typ') base (idx : xs) = do
let sz :: Expr (LLVM arch) s (BVType wptr)
sz = app $ BVLit PtrWidth $ isz

-- Perform a signed wide multiply and check for overflow
Just LeqProof <- return (testLeq (knownNat @1) (addNat PtrWidth PtrWidth))
Just LeqProof <- return (testLeq (addNat PtrWidth (knownNat @1)) (addNat PtrWidth PtrWidth))
let wideMul = app $ BVMul (addNat PtrWidth PtrWidth)
(app $ BVSext (addNat PtrWidth PtrWidth) PtrWidth sz)
(app $ BVSext (addNat PtrWidth PtrWidth) PtrWidth idx')
let off = app $ BVTrunc PtrWidth (addNat PtrWidth PtrWidth) wideMul
let wideMul' = app $ BVSext (addNat PtrWidth PtrWidth) PtrWidth off
assertExpr (app $ BVEq (addNat PtrWidth PtrWidth) wideMul wideMul')
(App $ TextLit $ Text.pack "Multiplication overflow in getelementpointer")
-- maximum index to prevent multiplication overflow
let maxidx = maxSigned PtrWidth `div` isz

-- Check whether a multiply would overflow
let maxidx' :: Expr (LLVM arch) s (BVType wptr)
maxidx' = app $ BVLit PtrWidth $ maxidx
when (maxidx < toInteger bound) $

This comment has been minimized.

Copy link
@robdockins

robdockins Jan 16, 2018

Contributor

@brianhuffman, after our discussion about this, I've been looking into LLVM documentation. I think we are incorrectly assuming the static array size (bound in this code) can be relied upon. The following page seems to indicate that GEP should ignore the static size of arrays when computing offset values.

https://llvm.org/docs/GetElementPtr.html

"Analysis passes which wish to understand array indexing should not assume that the static array type bounds are respected." and "Can array indices be negative? Yes. This is basically a special case of array indices being out of bounds."

assertExpr (app $ BVSle PtrWidth idx' maxidx')
(App $ TextLit $ Text.pack "Multiplication overflow in getelementpointer")

-- Perform the multiply
let off = app $ BVMul PtrWidth sz idx'

-- Perform the pointer arithmetic and continue
base' <- callPtrAddOffset base off
Expand All @@ -1148,25 +1150,26 @@ calcGEP' (PtrType (MemType typ')) base (idx : xs) = do
_ -> fail $ unwords ["Invalid index value in GEP", show idx]
let dl = TyCtx.llvmDataLayout ?lc

-- Calculate the size of the elemement memtype and check that it fits
-- Calculate the size of the element memtype and check that it fits
-- in the pointer width
let isz = G.bytesToInteger $ memTypeSize dl typ'
unless (isz <= maxSigned PtrWidth)
(fail $ unwords ["Type size too large for pointer width:", show typ'])
let sz :: Expr (LLVM arch) s (BVType wptr)
sz = app $ BVLit PtrWidth $ isz

-- Perform a signed wide multiply and check for overflow
Just LeqProof <- return (testLeq (knownNat @1) (addNat PtrWidth PtrWidth))
Just LeqProof <- return (testLeq (addNat PtrWidth (knownNat @1)) (addNat PtrWidth PtrWidth))
let wideMul = app $ BVMul (addNat PtrWidth PtrWidth)
(app $ BVSext (addNat PtrWidth PtrWidth) PtrWidth sz)
(app $ BVSext (addNat PtrWidth PtrWidth) PtrWidth idx')
let off = app $ BVTrunc PtrWidth (addNat PtrWidth PtrWidth) wideMul
let wideMul' = app $ BVSext (addNat PtrWidth PtrWidth) PtrWidth off
assertExpr (app $ BVEq (addNat PtrWidth PtrWidth) wideMul wideMul')
-- maximum index to prevent multiplication overflow
let maxidx = maxSigned PtrWidth `div` isz

-- Check whether a multiply would overflow
let maxidx' :: Expr (LLVM arch) s (BVType wptr)
maxidx' = app $ BVLit PtrWidth $ maxidx
assertExpr (app $ BVSle PtrWidth idx' maxidx')
(App $ TextLit $ Text.pack "Multiplication overflow in getelementpointer")

-- Perform the multiply
let off = app $ BVMul PtrWidth sz idx'

-- Perform the pointer arithmetic and continue
base' <- callPtrAddOffset base off
calcGEP' typ' base' xs
Expand Down

2 comments on commit e1838b6

@robdockins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more LLVM documentation:

If the GEP lacks the inbounds keyword, the value is the result from evaluating the implied two’s complement integer computation. However, since there’s no guarantee of where an object will be allocated in the address space, such values have limited meaning.

If the GEP has the inbounds keyword, the result value is undefined (a “trap value”) if the GEP overflows (i.e. wraps around the end of the address space).

As such, there are some ramifications of this for inbounds GEPs: scales implied by array/vector/pointer indices are always known to be “nsw” since they are signed values that are scaled by the element size. These values are also allowed to be negative (e.g. “gep i32 *%P, i32 -1”) but the pointer itself is logically treated as an unsigned value. This means that GEPs have an asymmetric relation between the pointer base (which is treated as unsigned) and the offset applied to it (which is treated as signed). The result of the additions within the offset calculation cannot have signed overflow, but when applied to the base pointer, there can be signed overflow.

I'm not sure I understand the reasoning being applied here that asserts that "the additions within the offset calculation cannot have signed overflow"

@robdockins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More LLVM documentation:

If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. The only in bounds address for a null pointer in the default address-space is the null pointer itself. In cases where the base is a vector of pointers the inbounds keyword applies to each of the computations element-wise.

If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two’s complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer. The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage. See the Pointer Aliasing Rules section for more information.

Currently, we essentially treat every GEP as though it has the inbounds flag set, so we can basically ignore the second paragraph. The first paragraph, however, mentions calculating offsets as though we were using infinitely precise signed arithmetic, which is a pretty exciting way to specify things.

It also answers a question I had, which is that it is required that every intermediate pointer be in bounds, not just the final result. So, it's not allowed to index outside a memory object, but then bring it back inside with a subsequent indexing operation.

Please sign in to comment.