[RFC/Coffee Break] What about a new type `NormedFloat`? #147
Define the addition and subtraction of two `N0f8`s:

```julia
Base.:+(x::N0f8, y::N0f8) = N0f8(x.i + y.i, 0)
Base.:-(x::N0f8, y::N0f8) = N0f8(x.i - y.i, 0)
# or
Base.:+(x::N0f8, y::N0f8) = NF16f8(Float16(Int16(x.i) + Int16(y.i)), 0)
Base.:-(x::N0f8, y::N0f8) = NF16f8(Float16(Int16(x.i) - Int16(y.i)), 0)
# or
Base.:+(x::N0f8, y::N0f8) = S7f8(Int16(x.i) + Int16(y.i), 0)
Base.:-(x::N0f8, y::N0f8) = S7f8(Int16(x.i) - Int16(y.i), 0)
```

Let's go!

```julia
using BenchmarkTools
A = collect(rand(N0f8, 256, 256));
B = collect(rand(N0f8, 256, 256));
view_A = view(A, :, :);
view_B = view(B, :, :);
```

```julia
julia> @benchmark view_A .+ view_B # N0f8
BenchmarkTools.Trial:
  memory estimate:  64.19 KiB
  allocs estimate:  4
  --------------
  minimum time:     4.117 μs (0.00% GC)
  median time:      5.900 μs (0.00% GC)
  mean time:        13.413 μs (13.47% GC)
  maximum time:     643.033 μs (98.04% GC)
  --------------
  samples:          10000
  evals/sample:     6

julia> @benchmark view_A .- view_B # N0f8
BenchmarkTools.Trial:
  memory estimate:  64.19 KiB
  allocs estimate:  4
  --------------
  minimum time:     4.383 μs (0.00% GC)
  median time:      6.200 μs (0.00% GC)
  mean time:        10.831 μs (11.66% GC)
  maximum time:     954.900 μs (97.91% GC)
  --------------
  samples:          10000
  evals/sample:     6

julia> @benchmark view_A .+ view_B # NF16f8
BenchmarkTools.Trial:
  memory estimate:  128.13 KiB
  allocs estimate:  4
  --------------
  minimum time:     140.399 μs (0.00% GC)
  median time:      143.599 μs (0.00% GC)
  mean time:        150.347 μs (1.65% GC)
  maximum time:     2.787 ms (93.76% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark view_A .- view_B # NF16f8
BenchmarkTools.Trial:
  memory estimate:  128.13 KiB
  allocs estimate:  4
  --------------
  minimum time:     140.199 μs (0.00% GC)
  median time:      143.199 μs (0.00% GC)
  mean time:        147.756 μs (1.39% GC)
  maximum time:     2.492 ms (92.98% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark view_A .+ view_B # S7f8
BenchmarkTools.Trial:
  memory estimate:  128.13 KiB
  allocs estimate:  4
  --------------
  minimum time:     8.100 μs (0.00% GC)
  median time:      13.700 μs (0.00% GC)
  mean time:        18.973 μs (9.08% GC)
  maximum time:     1.940 ms (97.62% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark view_A .- view_B # S7f8
BenchmarkTools.Trial:
  memory estimate:  128.13 KiB
  allocs estimate:  4
  --------------
  minimum time:     7.900 μs (0.00% GC)
  median time:      14.300 μs (0.00% GC)
  mean time:        17.624 μs (11.29% GC)
  maximum time:     2.190 ms (97.86% GC)
  --------------
  samples:          10000
  evals/sample:     1
```
Excellent!! 😂
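For readers without the experimental types at hand, the three strategies above can be sketched on raw `UInt8` payloads in plain Julia. The helper names below are hypothetical, invented for this sketch; they are not package code:

```julia
# Emulate the three addition strategies on raw N0f8 payloads (0xFF represents
# 1.0), without FixedPointNumbers.jl. Helper names are made up for this sketch.
wrap_add(x::UInt8, y::UInt8)  = x + y                         # N0f8: wraps modulo 256
widen_f16(x::UInt8, y::UInt8) = Float16(Int16(x) + Int16(y))  # NF16f8: widen, store as Float16
widen_i16(x::UInt8, y::UInt8) = Int16(x) + Int16(y)           # S7f8: widen to a signed payload

x = y = 0xCC               # each is about 0.8 in N0f8
wrap_add(x, y)             # 0x98, about 0.596: silently overflowed
widen_i16(x, y)            # 408, about 1.6: the overflow is representable
```

The first line reproduces what `N0f8` does today; the other two show why widening the payload (to `Float16` or `Int16`) preserves out-of-range sums.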
In summary, it's a normed floating-point number that behaves like a fixed-point number, and since it's a float, it avoids the overflow issues of integers. If I understand correctly, if we view numbers as N-bit 0/1 sequences, the difference comes down to this: @timholy tends to handle the bit sequence directly, while @kimikage wants to take advantage of the existing floating-point number system. My feeling (without justification) is that Tim's method would be more maintainable by eliminating unnecessary abstractions, though it requires more effort to make it work properly.
Yes. Regardless of the method, changing the arithmetic requires much effort to make it work properly.
OK. Here is a speed demon. 😈

```julia
function Base.:+(x::N0f8, y::N0f8)
    zi16 = Int16(x.i) + Int16(y.i)
    zi32 = reinterpret(Int32, Float32(zi16) * 1.92593f-34)
    zf16 = unsafe_trunc(UInt16, zi32 >> 0xD)
    NF16f8(reinterpret(Float16, zf16), 0)
end
function Base.:-(x::N0f8, y::N0f8)
    xf32p1, yf32p1 = reinterpret.(Float32, 0x53000000 .+ (x.i, y.i))
    zi32 = reinterpret(Int32, xf32p1 - yf32p1)
    zf16 = unsafe_trunc(UInt16, zi32 >> 0xD | ((zi32 >> 0x10) & 0x8000))
    NF16f8(reinterpret(Float16, zf16), 0)
end
```

```julia
julia> @benchmark view_A .+ view_B # diabolical NF16f8
BenchmarkTools.Trial:
  memory estimate:  128.13 KiB
  allocs estimate:  4
  --------------
  minimum time:     11.700 μs (0.00% GC)
  median time:      15.199 μs (0.00% GC)
  mean time:        18.400 μs (10.13% GC)
  maximum time:     1.956 ms (97.30% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark view_A .- view_B # diabolical NF16f8
BenchmarkTools.Trial:
  memory estimate:  128.13 KiB
  allocs estimate:  4
  --------------
  minimum time:     11.900 μs (0.00% GC)
  median time:      15.201 μs (0.00% GC)
  mean time:        18.009 μs (8.97% GC)
  maximum time:     1.882 ms (97.11% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

😈
MRI benchmark (#143 (comment)):

```julia
julia> @btime view(mri, :,:,1:26) - view(mri, :,:,2:27); # N0f8
  490.800 μs (36 allocations: 1.04 MiB)

julia> @btime view(mri, :,:,1:26) - view(mri, :,:,2:27); # NF16f8
  718.100 μs (36 allocations: 2.09 MiB)

julia> @btime view(mri, :,:,1:26) - view(mri, :,:,2:27); # S7f8
  671.701 μs (36 allocations: 2.09 MiB)
```

In terms of the minimum time, `N0f8` is still the fastest here.

Originally posted by @kimikage in #143 (comment)

To be honest, the
The accumulation shows the true worth of these types:

```julia
Base.:(+)(x::NF32f8, y::N0f8) = NF32f8(x.i + y.i, 0)
Base.:(+)(x::S55f8, y::N0f8) = S55f8(x.i + y.i, 0)
Base.:(+)(x::S23f8, y::N0f8) = S23f8(x.i + y.i, 0)
```

```julia
julia> @btime mysum(0.0f0, view_A)
  4.071 μs (1 allocation: 16 bytes)
32765.545f0

julia> @btime mysum(reinterpret(NF32f8, 0.0f0), view_A) |> Float32
  2.378 μs (3 allocations: 48 bytes)
32765.541f0

julia> @btime mysum(reinterpret(S55f8, 0), view_A) |> Float32
  4.043 μs (3 allocations: 48 bytes)
32765.541f0

julia> @btime mysum(reinterpret(S23f8, Int32(0)), view_A) |> Float32
  2.122 μs (3 allocations: 48 bytes)
32765.541f0
```

Since Julia's SIMD optimizer still has room for improvement, this result is almost simply a matter of how many numbers fit in a SIMD register. It's not surprising.
I like the speed-demon version. But if what you're really after is detecting overflow and invalidating the result, we'd get the same result if we created a new type of 16-bit integer that reserves the top bit as a sign bit and the 2nd-from-top as an overflow flag.

```julia
julia> a = rand(Float16, 100);

julia> b = Float32.(a);

julia> @btime $a.*$a;
  793.337 ns (1 allocation: 288 bytes)

julia> @btime $b.*$b;
  37.396 ns (1 allocation: 496 bytes)
```
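The flag-bit idea can be sketched without defining a full number type. The constant and function name below are hypothetical, chosen just for this illustration:

```julia
# 16-bit pattern: top bit = sign (ignored in this sketch), 2nd-from-top bit =
# "invalid" flag, low 14 bits = payload. The flag is sticky: once an addition
# overflows, every downstream result stays flagged.
const FLAG = 0x4000

function flagged_add(x::UInt16, y::UInt16)
    z = (x & 0x3FFF) + (y & 0x3FFF)
    flag = ((x | y) & FLAG) | (z & FLAG)  # prior flag, or carry into bit 14
    (z & 0x3FFF) | flag
end

flagged_add(0x0001, 0x0002)  # 0x0003: no overflow
flagged_add(0x2000, 0x2000)  # 0x4000: overflowed, result invalidated
```

A consumer would then treat any value with `FLAG` set as "not a number", much like the `NaN` that `NormedFloat` gets for free from its float payload.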
Thank you for your comment. I'm glad to know a different way of thinking.

You are right. However, I don't think we should ensure complete safety. First, as you know, it is almost impossible to provide "completely safe" types with Julia's type system, i.e. without evaluating the values.

Originally posted by @timholy in #41 (comment)

This means that

Secondly, some users may want to use

I also think the exponent is of little use. However, it will not be necessary to discard the exponent and introduce a new non-standard format. I think the multiplication of

BTW, the multiplication of the current
Today's coffee break ☕

```julia
function Base.:*(x::NF16f8, y::NF16f8)
    xu16 = reinterpret(UInt16, x.i)
    xu32 = UInt32(xu16 << 1) << 0xC
    xf32 = reinterpret(Float32, xu32) * (5.194832f33 / 255.0f0) # not tuned
    yu16 = reinterpret(UInt16, y.i)
    yu32 = UInt32(yu16 << 1) << 0xC
    zf32 = xf32 * reinterpret(Float32, yu32)
    zs = (xu16 ⊻ yu16) & 0x8000
    zu32 = reinterpret(UInt32, zf32) >> 0xD
    zu16 = unsafe_trunc(UInt16, zu32)
    NF16f8(reinterpret(Float16, min(0x7C00, zu16) | zs), 0)
end
```

```julia
julia> a = rand(Float16, 100);

julia> @btime $a.*$a;
  745.169 ns (1 allocation: 288 bytes)

julia> nf = reinterpret.(NF16f8, a .* 255);

julia> @btime $nf.*$nf;
  53.651 ns (1 allocation: 288 bytes)
```

😈
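For contrast with the bit-twiddled version, the straightforward route is to widen to `Float32`, multiply, rescale, and narrow. A sketch, assuming the thread's convention that the `NF16f8` payload stores the value scaled by 255 (the function name is mine):

```julia
# Widen-multiply-narrow reference for comparison with the bit-twiddling *,
# operating directly on the Float16 payloads: if p = 255v, then the product
# payload is 255(vx*vy) = px*py/255.
mul_widen(x::Float16, y::Float16) = Float16(Float32(x) * Float32(y) / 255.0f0)

mul_widen(Float16(255), Float16(255))  # payload 255, i.e. 1.0 * 1.0 == 1.0
```

The bit-twiddled version folds this same widening and rescaling into raw exponent arithmetic, which is why it can skip the slow `Float16` multiply entirely.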
I wrote this article not to confuse the discussion, but to give a broader perspective, i.e. to remove the constraints coming from "belief".
I'm sure `Normed{<:Signed}` solves the `N0f8` "overflow" problem, but I doubt that it is the best solution. So, I proposed a new strange type `NormedFloat` as a spoiling candidate or a touchstone. `NormedFloat` is a kind of joke, so don't take it too seriously. ☕

However, I think that the concepts and techniques related to `NormedFloat` might be helpful for other developments.

`NormedFloat` can be defined as a subtype `<: FixedPoint{DummyInt{T},f}`; this is just a workaround to use ColorVectorSpace.jl without modifications. As the name suggests, `NormedFloat` is not a `FixedPoint` number.

And, the signed `Normed` can be defined similarly. I don't use `Normed{<:Signed}` here, to make it easier to experiment in a local REPL. If you already have `Normed{<:Signed}`, you can use it.

Just for display (not optimized):

Now, the following are examples of the numbers:
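The type definitions themselves were not captured in this excerpt. As a rough sketch of the shape described above, where the `DummyInt` workaround and the `(payload, 0)` raw constructor come from the thread and everything else is my assumption, with a stand-in for `FixedPointNumbers.FixedPoint` so the snippet runs on its own:

```julia
# Stand-in for FixedPointNumbers.FixedPoint so this sketch is self-contained.
abstract type FixedPoint{T <: Integer, f} <: Real end

# DummyInt carries the float type through FixedPoint's Integer slot; it is
# never instantiated and exists only so ColorVectorSpace.jl's signatures match.
struct DummyInt{T <: AbstractFloat} <: Integer end

struct NormedFloat{T <: AbstractFloat, f} <: FixedPoint{DummyInt{T}, f}
    i::T
    # mimic the (payload, 0) raw constructor used throughout this thread
    NormedFloat{T,f}(i::T, _) where {T <: AbstractFloat, f} = new{T,f}(i)
end

const NF16f8 = NormedFloat{Float16, 8}
const NF32f8 = NormedFloat{Float32, 8}

NF16f8(Float16(3), 0).i  # the raw Float16 payload
```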