<h2>Bisection in floating point</h2>

<p>Mathematically, we have an easy way to find the middle of two numbers, say $0 < a < b$:</p>


$$
c = (a + b)/2
$$


<p>On a computer, however, this formula can cause problems.</p>

<p>In particular, near the largest representable values, adding two large numbers can overflow:</p>

In [None]:
b = prevfloat(Inf)
a = b/2
(a + b)/2

Inf

<p>This can be avoided by finding a difference and adding:</p>

In [None]:
a + (b-a)/2

1.3482698511467367e308
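
<p>As a minimal sketch, the two formulas can be put side by side. The helper names <code>naive_mid</code> and <code>safe_mid</code> are made up for illustration, and the sketch assumes finite inputs with $0 < a < b$:</p>

```julia
# Hypothetical helpers for illustration; not part of any library.
naive_mid(a, b) = (a + b) / 2       # the sum a + b may overflow to Inf
safe_mid(a, b)  = a + (b - a) / 2   # b - a stays finite when 0 < a < b

b = prevfloat(Inf)
a = b / 2

naive_mid(a, b), safe_mid(a, b)
```

<p>The first value overflows while the second stays strictly between <code>a</code> and <code>b</code>.</p>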

<p>In general this is good practice: when $a$ and $b$ have the same sign, the difference $b - a$ cannot overflow, whereas the sum $a + b$ can.</p>

<p>So a little algebra takes care of very large values. What about other problems?</p>

<p>When $a < b$ are real (mathematical) numbers, there is always a number between $a$ and $b$. This is not so in floating point, which is discrete:</p>

In [None]:
a = 0.09999999999999998405
b = 0.09999999999999999
(a + b)/2   ## less than the real number 0.09999999999999998405

0.09999999999999998

<p>To rely on this, we would need a proof that the computed midpoint <code>c &#61; fl&#40;&#40;a&#43;b&#41;/2&#41;</code> satisfies <code>a &lt;&#61; c &lt;&#61; b</code> for machine numbers <code>a</code> and <code>b</code> (which may not be true with some rounding schemes).</p>

<p>If this holds, we can stop when <code>a &#61;&#61; c</code> or <code>b &#61;&#61; c</code>, since the midpoint can then no longer split the interval.</p>
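
<p>A minimal sketch of a bisection loop using this stopping rule might look as follows. The name <code>bisect_float</code> is hypothetical, and the sketch assumes that <code>f&#40;a&#41;</code> and <code>f&#40;b&#41;</code> have opposite signs and that $0 < a < b$ are finite:</p>

```julia
# Hypothetical sketch, not a library routine.
# Assumes f(a) and f(b) have opposite signs and 0 < a < b are finite.
function bisect_float(f, a::Float64, b::Float64)
    fa = f(a)
    steps = 0
    while true
        c = a + (b - a) / 2
        # The midpoint landed on an endpoint: the interval cannot be split further.
        (c == a || c == b) && return (a, b, steps)
        fc = f(c)
        iszero(fc) && return (c, c, steps)
        if sign(fa) == sign(fc)
            a, fa = c, fc     # the sign change is in [c, b]
        else
            b = c             # the sign change is in [a, c]
        end
        steps += 1
    end
end

bisect_float(x -> x^2 - 2, 1.0, 2.0)
```

<p>The returned pair brackets the sign change of <code>f</code>, together with the number of steps taken.</p>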

<h2>Floating point bisection</h2>

<p>Rather than bisect in floating point, there is a trick to bisect over integers.</p>

<p>The floating point numbers are discrete and ordered, so there is a way to reinterpret them using <em>unsigned</em> integers:</p>

In [None]:
a = Float16(0.1)
bitstring(a)

In [None]:
reinterpret(UInt16, a) |> bitstring

<p>Reinterpretation is essentially free, since the same memory is simply read as a different type. Here we see the same bit pattern. But because of how these numbers are stored we have:</p>

<ul>
<li><p>If <code>0 &lt; a &lt; b</code> in floating point then <code>a &lt; b</code> when their bits are read as unsigned integers.</p>
</li>
</ul>
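
<p>The ordering property can be spot-checked on a few values. This is only a sketch on sample data, not a proof; it assumes Julia 1.x, and the variable names are illustrative:</p>

```julia
xs = Float16[0.001, 0.1, 1.0, 2.5, 100.5]   # sorted, all positive
ints = reinterpret.(UInt16, xs)             # the same bits, read as unsigned integers

# Both orderings agree on this sample
issorted(xs), issorted(ints)

# Moving to the next representable float adds one to the integer
a = Float16(0.1)
reinterpret(UInt16, nextfloat(a)) - reinterpret(UInt16, a)
```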

<p>Now division by <code>2</code> in unsigned integers is just a bit shift down:</p>

In [None]:
UInt16(14) |> bitstring

<p>Compared to</p>

In [None]:
UInt16(7) |> bitstring

<p>Division can then be done quickly using the <code>&gt;&gt;</code> shift operation:</p>

In [None]:
UInt16(14) >> 1 |> bitstring

<p>For odd numbers the result is truncated, that is, rounded down (7 goes to 3).</p>

<p>Returning to the problem, it is clear for integers with <code>a &lt;&#61; b</code> that <code>a &lt;&#61; &#40;a&#43;b&#41;/2 &#61; a &#43;&#40;b-a&#41;/2 &lt;&#61; b</code>, and that the midpoint equals an endpoint only when <code>a</code> and <code>b</code> differ by at most the last bit (<code>b-a &#61; 00000000....0001</code>).</p>

<p>So a bisection algorithm that terminates in a number of steps bounded by the storage size (16 for 16 bits) can be defined with the "midpoint" as follows:</p>
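
<p>Both claims can be checked with a small sketch over <code>UInt16</code>. The helper <code>count_halvings</code> is a made-up name; it always keeps the left half, which is just one possible sequence of bisection choices:</p>

```julia
# The two midpoint formulas agree for unsigned integers (when a + b does not overflow)
a, b = UInt16(7), UInt16(14)
(a + b) >> 1 == a + ((b - a) >> 1)

# Hypothetical helper: halve [a, b] until the midpoint coincides with a.
function count_halvings(a::UInt16, b::UInt16)
    steps = 0
    while true
        mid = a + ((b - a) >> 1)    # cannot overflow when a <= b
        mid == a && return steps    # only happens when b - a <= 1
        b = mid                     # keep the left half
        steps += 1
    end
end

# Widest possible 16-bit interval
count_halvings(UInt16(0), typemax(UInt16))
```

<p>Even for the widest interval the step count stays within the 16-bit storage size.</p>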

In [None]:
# from Jason Merrill; Roots.jl
# unsigned integer type with the same width as each floating point type
_pairs = Dict(Float64 => UInt64, Float32 => UInt32, Float16 => UInt16)


function _middle(x::T, y::T) where {T <: Union{Float64, Float32, Float16}}
    # Use the usual float rules for combining non-finite numbers
    if !isfinite(x) || !isfinite(y)
        return x + y
    end

    # Always return zero when inputs have opposite sign
    if sign(x) != sign(y) && !iszero(x) && !iszero(y)
        return zero(T)
    end

    negate = x < 0.0 || y < 0.0

    # do division over unsigned integers with bit shift
    xint = reinterpret(_pairs[T], abs(x))
    yint = reinterpret(_pairs[T], abs(y))
    mid = (xint + yint) >> 1

    # reinterpret in original floating point
    midfloat = reinterpret(T, mid)
    val = negate ? -midfloat : midfloat

    # also return the bit patterns, for illustration
    (val, bitstring(xint), bitstring(yint), bitstring(mid))

end

_middle (generic function with 1 method)
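
<p>To illustrate how such a midpoint might drive a full bisection, here is a self-contained sketch. The names <code>bit_middle</code> and <code>bisect_bits</code> are hypothetical, and the sketch handles only positive, finite <code>Float64</code> endpoints whose function values have opposite signs:</p>

```julia
# Hypothetical sketch for positive, finite Float64 endpoints only.
# The sum cannot overflow: both bit patterns have a zero sign bit.
bit_middle(x::Float64, y::Float64) =
    reinterpret(Float64, (reinterpret(UInt64, x) + reinterpret(UInt64, y)) >> 1)

function bisect_bits(f, a::Float64, b::Float64)
    fa = f(a)
    steps = 0
    while true
        c = bit_middle(a, b)
        # The integer midpoint hit an endpoint: a and b are adjacent floats.
        (c == a || c == b) && return (a, b, steps)
        fc = f(c)
        iszero(fc) && return (c, c, steps)
        if sign(fa) == sign(fc)
            a, fa = c, fc
        else
            b = c
        end
        steps += 1
    end
end

# A bracket spanning 600 orders of magnitude
bisect_bits(x -> x^2 - 2, 1e-300, 1e300)
```

<p>Because each step halves the integer distance between the endpoints, the step count is bounded by the 64-bit storage size however wide the initial bracket is.</p>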

<p>We can see the algorithm in action:</p>

In [None]:
a, b = Float16(2.5), Float16(100.5)
ai, bi = reinterpret(UInt16, a), reinterpret(UInt16, b)
bitstring(ai), bitstring(bi)

("0100000100000000", "0101011001001000")

<p>And the sum</p>

In [None]:
bitstring(ai + bi)

<p>And the "middle":</p>

In [None]:
bitstring((ai + bi) >> 1)

<p>And as a floating point number:</p>

In [None]:
reinterpret(Float16, (ai + bi) >> 1)

Float16(15.28)
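
<p>To see where this value comes from, the bit pattern can be decoded by hand. This sketch assumes the IEEE half-precision layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits); the variable names are illustrative:</p>

```julia
m = (reinterpret(UInt16, Float16(2.5)) + reinterpret(UInt16, Float16(100.5))) >> 1

e    = Int((m >> 10) & 0x1f) - 15    # unbiased exponent
frac = Int(m & 0x03ff) / 1024        # fractional part of the significand

(1 + frac) * 2.0^e                   # the value encoded by the bit pattern
```

<p>Note that this "middle" is not the arithmetic mean of 2.5 and 100.5, which is 51.5: averaging the bit patterns also averages the exponents.</p>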

<p>We can see by looking at the bits from left to right that the value is in the middle:</p>

In [None]:
[bitstring(ai), bitstring((ai+bi) >> 1), bitstring(bi)]

3-element Array{String,1}:
 "0100000100000000"
 "0100101110100100"
 "0101011001001000"

<p>This raises the question of what <code>&#40;b-a&#41;/2</code> looks like:</p>

In [None]:
[bitstring(ai), bitstring(bi - ai), bitstring((bi - ai) >> 1), bitstring(bi)]

4-element Array{String,1}:
 "0100000100000000"
 "0001010101001000"
 "0000101010100100"
 "0101011001001000"