LDeakin/microfloat

8-bit and sub-byte floating point types for Rust

This crate implements microfloat types for Rust, including common 8-bit formats and sub-byte 4-bit and 6-bit formats. Microfloats are a subset of minifloat formats.

8-bit floating point representations:

  • f8e3m4 - signed E3M4, bias 3, IEEE-like NaN/Inf.
  • f8e4m3 - signed E4M3, bias 7, IEEE-like NaN/Inf.
  • f8e4m3b11fnuz - signed E4M3, bias 11, finite-only, unsigned zero.
  • f8e4m3fn - signed E4M3, bias 7, finite-only, signed outer NaNs.
  • f8e4m3fnuz - signed E4M3, bias 8, finite-only, unsigned zero.
  • f8e5m2 - signed E5M2, bias 15, IEEE-like NaN/Inf.
  • f8e5m2fnuz - signed E5M2, bias 16, finite-only, unsigned zero.
  • f8e8m0fnu - unsigned E8M0 scale, bias 127, no zero, single NaN.

Microscaling (MX) sub-byte floating point representations:

  • f4e2m1fn - signed 4-bit E2M1, bias 1, finite-only, saturating.
  • f6e2m3fn - signed 6-bit E2M3, bias 1, finite-only, saturating.
  • f6e3m2fn - signed 6-bit E3M2, bias 3, finite-only, saturating.
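To make the bit layout concrete, here is a minimal, self-contained sketch that decodes a 4-bit E2M1 code by hand. It does not use this crate's API: `decode_e2m1` is a hypothetical helper, and the layout (1 sign bit, 2 exponent bits, 1 mantissa bit, bias 1) is assumed from the format name and the description above.

```rust
/// Decode a 4-bit E2M1 code (held in the low nibble of `bits`) into an f32.
/// Assumed layout: 1 sign bit, 2 exponent bits, 1 mantissa bit, bias 1.
fn decode_e2m1(bits: u8) -> f32 {
    let sign: f32 = if bits & 0b1000 != 0 { -1.0 } else { 1.0 };
    let exp = ((bits >> 1) & 0b11) as i32;
    let man = (bits & 0b1) as f32;
    let magnitude = if exp == 0 {
        // Subnormal: no implicit leading one, exponent fixed at 1 - bias = 0.
        man * 0.5
    } else {
        // Normal: implicit leading one, unbiased exponent is exp - bias.
        (1.0 + man * 0.5) * 2f32.powi(exp - 1)
    };
    sign * magnitude
}

fn main() {
    // Enumerate the eight non-negative codes.
    let positives: Vec<f32> = (0u8..8).map(decode_e2m1).collect();
    println!("{:?}", positives); // [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
}
```

With only eight magnitudes available, the largest finite value is 6.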

In type suffixes,

  • f means finite-only with no infinities,
  • n means the format has a special NaN encoding,
  • uz means unsigned zero with no distinct negative zero encoding, and
  • u means unsigned.
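The difference between the fn and fnuz suffixes is easiest to see in how the NaN bit patterns are chosen. The sketch below is a hand-written decoder, not this crate's API: `Flavor` and `decode_e4m3` are hypothetical names, and the NaN encodings (all exponent and mantissa bits set for fn, the would-be negative zero pattern 0x80 for fnuz) are assumptions based on the suffix descriptions above and the ml-dtypes conventions.

```rust
/// The two E4M3 variants compared in this sketch.
#[derive(Clone, Copy)]
enum Flavor {
    /// Bias 7; NaN when exponent and mantissa bits are all ones, for either sign.
    Fn,
    /// Bias 8; the pattern that would be negative zero (0x80) is the only NaN.
    Fnuz,
}

/// Decode one E4M3 byte (1 sign, 4 exponent, 3 mantissa bits) into an f32.
fn decode_e4m3(bits: u8, flavor: Flavor) -> f32 {
    let (bias, is_nan) = match flavor {
        Flavor::Fn => (7, bits & 0x7f == 0x7f),
        Flavor::Fnuz => (8, bits == 0x80),
    };
    if is_nan {
        return f32::NAN;
    }
    let sign: f32 = if bits & 0x80 != 0 { -1.0 } else { 1.0 };
    let exp = ((bits >> 3) & 0x0f) as i32;
    let man = (bits & 0x07) as f32 / 8.0;
    let magnitude = if exp == 0 {
        // Subnormal: no implicit leading one.
        man * 2f32.powi(1 - bias)
    } else {
        // Normal: implicit leading one.
        (1.0 + man) * 2f32.powi(exp - bias)
    };
    sign * magnitude
}

fn main() {
    // Largest positive finite value in each variant.
    println!("{}", decode_e4m3(0x7e, Flavor::Fn));   // 448
    println!("{}", decode_e4m3(0x7f, Flavor::Fnuz)); // 240
    // 0x80 is negative zero in fn but NaN in fnuz.
    println!("{}", decode_e4m3(0x80, Flavor::Fn).is_sign_negative()); // true
    println!("{}", decode_e4m3(0x80, Flavor::Fnuz).is_nan());         // true
}
```

Because neither variant reserves an exponent for infinities, the top exponent carries finite values, which is why these formats reach larger magnitudes than an IEEE-like layout of the same width would.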

This crate aims to be compatible with the microfloat types in the ml-dtypes Python package. For wider minifloat types such as f16 and bf16, use the half crate, which heavily inspired microfloat.

Usage

The float types attempt to match existing Rust floating point type functionality where possible, and provide conversion operations, classification, formatting, parsing, arithmetic operations, and common math operations. Calculations are performed through f32 and rounded back to the target format.

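use microfloat::f8e4m3;

// Convert f32 values into the 8-bit E4M3 format.
let x = f8e4m3::from_f32(1.5);
let y = f8e4m3::from_f32(2.0);

// The sum is computed through f32 and rounded back to f8e4m3.
let z = x + y;

// 3.5 is exactly representable in E4M3, so no rounding error appears here.
assert_eq!(z.to_f32(), 3.5);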
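The "compute through f32, then round back" approach can be illustrated without the crate. The following is a rough sketch for the 4-bit E2M1 value set, not the crate's implementation: `round_to_e2m1` and `add_e2m1` are hypothetical helpers, round-to-nearest with ties to the even code is an assumption, and NaN inputs are not handled meaningfully.

```rust
/// Non-negative values representable in E2M1, indexed by the 3-bit magnitude code.
const E2M1_MAGNITUDES: [f32; 8] = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0];

/// Round an f32 to the nearest E2M1 value, saturating at +/-6.
/// Ties go to the candidate with an even code (mantissa bit 0).
fn round_to_e2m1(x: f32) -> f32 {
    // Clamp first so that out-of-range inputs, including infinity, saturate.
    let mag = x.abs().min(6.0);
    let mut best = 0usize;
    for i in 1..E2M1_MAGNITUDES.len() {
        let d_best = (mag - E2M1_MAGNITUDES[best]).abs();
        let d_i = (mag - E2M1_MAGNITUDES[i]).abs();
        if d_i < d_best || (d_i == d_best && i % 2 == 0) {
            best = i;
        }
    }
    // Restore the sign of the input.
    E2M1_MAGNITUDES[best].copysign(x)
}

/// Add two E2M1 values (given as f32) by computing in f32 and rounding back.
fn add_e2m1(a: f32, b: f32) -> f32 {
    round_to_e2m1(a + b)
}

fn main() {
    println!("{}", add_e2m1(1.5, 2.0)); // 4: 3.5 is a tie between 3 and 4
    println!("{}", add_e2m1(0.5, 0.5)); // 1: exact
    println!("{}", add_e2m1(4.0, 6.0)); // 6: saturates
}
```

The same sum that is exact in f8e4m3 (1.5 + 2.0 = 3.5) rounds to 4 in this sketch, because a single mantissa bit leaves no representable value between 3 and 4.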

This crate provides no_std support.

Requires Rust 1.85 or greater.

See the crate documentation for more details.

Optional Features

  • serde - Implement Serialize and Deserialize traits for the float types. This adds a dependency on the serde crate.

  • num-traits - Enable ToPrimitive, FromPrimitive, Num, NumCast, FloatCore, Signed, Bounded, Zero, and One trait implementations from the num-traits crate.

  • bytemuck - Enable Zeroable and Pod trait implementations from the bytemuck crate.

  • rand_distr - Enable sampling from distributions like StandardUniform and StandardNormal from the rand_distr crate.

  • rkyv - Enable zero-copy serialization support with the rkyv crate.

Testing

Compatibility with ml-dtypes is tested by generated fixtures in tests/fixtures/. These fixtures validate conversions, classifications, arithmetic, and math methods.

License

All files in this library are dual-licensed and distributed under the terms of either of:

  • Apache License, Version 2.0 (LICENCE-APACHE)
  • MIT license (LICENCE-MIT)

at your option.

Contributing

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.