This notebook is a record of what I am doing with half precision. The first task is to show that on Apple M* hardware half is faster than single, which in turn is faster than double. You won't see exactly the factor of 2 you'd expect because the execution times are so low.

I'm using the Julia ```sum``` command for these tests. Any of your enhanced summation functions should show similar effects.

In [9]:
# Run the init script
include("Codes/HalfTimeInit.jl")

In [10]:
# Make a Float16 array and copy it to Float32 and Float64
N=4096
x=rand(Float16,N); x32=Float32.(x); x64=Float64.(x);


In [11]:
# Time the Julia sum command
@btime sum($x);
@btime sum($x32);
@btime sum($x64);

  139.856 ns (0 allocations: 0 bytes)
  199.722 ns (0 allocations: 0 bytes)
  369.362 ns (0 allocations: 0 bytes)


Next I want to look at the errors. I will take ```sum(x64)``` as truth.

In [12]:
s=sum(x); s32=sum(x32); s64=sum(x64);
e16=abs(s - s64)/s64
e32=abs(s32 - s64)/s64
println("Float16 error = $e16; Float32 error = $e32")

Float16 error = 1.02710e-04; Float32 error = 0.00000e+00


For summing a __short__ sequence of half precision numbers, promoting to single is enough to get the sum exactly right: the Float32 result agrees with the Float64 one, so the relative error is zero.

The next thing is to look at my hack job to avoid overflow and reduce any memory burden in the summation. The plan is to sum the vector in blocks. So N is the size of the vector, NB is the number of blocks, and M is the block size. I'm assuming that N = NB * M, and that things like ```@simd``` will work better if M is a multiple of 512.
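
As a rough illustration of the blocking idea, here is a minimal sketch. It is not the code from the init script: the name ```blocksum``` and its argument order are hypothetical, and accumulating the block sums in Float64 is my assumption about one reasonable way to combine them.

```julia
# Hypothetical helper, not part of the notebook's init script.
# Sum x in NB blocks of length M. Each block is accumulated in type T,
# and the NB block sums are combined in Float64 (an assumption).
function blocksum(::Type{T}, x::Vector{<:AbstractFloat}, M::Integer) where {T}
    N = length(x)
    N % M == 0 || throw(ArgumentError("length(x) must be a multiple of M"))
    NB = N ÷ M
    total = 0.0
    for ib in 1:NB
        s = zero(T)
        offset = (ib - 1) * M
        # @simd lets the compiler reorder the additions inside a block
        @inbounds @simd for j in 1:M
            s += T(x[offset + j])
        end
        total += Float64(s)
    end
    return total
end

N = 512 * 512; M = 512
x = rand(Float16, N)
println("blocked sum = ", blocksum(Float32, x, M),
        "; sum of Float64 copy = ", sum(Float64.(x)))
```

No promoted copy of the vector is made, so the extra memory is a couple of scalars. With M = 512 and entries in [0, 1), a block sum stays below 512, so passing ```Float16``` as the block accumulator type would also stay well clear of overflow, at the cost of larger rounding error inside each block.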

Here's an example to show how you can cheat on overflow.

In [13]:
# The Float16 sum overflows; summing the Float64 copy gives the reference value
N = 512 * 512; x=rand(Float16,N); x64=Float64.(x);
s=sum(x); s64=sum(x64);
println("Float16 sum = $s; Truth = $s64")

Float16 sum = Inf; Truth = 1.31447e+05


The sum really does overflow, but accumulating the sum in double gets the right result.
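
The same comparison can be made without building a Float64 copy of the vector. The sketch below is my own addition, not something from the init script. It relies on the two-argument form ```sum(f, x)``` from Julia's Base, with the type ```Float64``` used as the function, so each element is converted as it is added.

```julia
x = rand(Float16, 512 * 512)

s16 = sum(x)                  # accumulates in Float16
s64 = sum(Float64, x)         # converts each element to Float64 on the fly
s64_copy = sum(Float64.(x))   # reference: sum of a promoted copy

println("Float16 sum = $s16; sum(Float64, x) = $s64; sum of copy = $s64_copy")
```

The only difference between the last two sums is the temporary array created by ```Float64.(x)```, which is the memory burden the blocked approach is meant to avoid.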