Implement tanh, fixes #225 #226

SuperFluffy · 2019-03-09T15:08:44Z

This PR tries to address #225. It currently does not compile, because it turns out that {f32,f64}::tanh are provided via cmath, i.e. via stdlib.

Any suggestions how to implement tanh_f32 and friends?

src/api/math/float/tanh.rs

src/codegen/math/float/tanh.rs

gnzlbg · 2019-03-11T11:29:29Z

src/codegen/math/float/tanh.rs

+use libm::{
+    F32Ext,
+    F64Ext
+};


I think it would be better to move these traits down to the functions that implement them.

src/codegen/math/float/tanh.rs

SuperFluffy · 2019-03-11T13:26:52Z

Looks like it all builds and the tests pass. I have also checked with --features "sleef-sys". Unfortunately, tanh is only really nicely testable with tanh(0)=0. There are no nice cases such as pi/2 as is the case with cos and sin.
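Beyond tanh(0)=0, a few other properties could be checked even without "nice" exact values: odd symmetry, saturation towards 1 for large inputs, and agreement with the exponential identity. The following is only a rough sketch under assumptions: it uses a plain `[f64; 4]` array as a stand-in for a packed vector and std's scalar `tanh`, and the helper names `tanh_lanes` and `tanh_via_exp` are hypothetical, not part of this PR.

```rust
// Lane-by-lane scalar tanh on a plain array. This is a stand-in for a
// SIMD vector so the sketch does not depend on packed_simd.
fn tanh_lanes(x: [f64; 4]) -> [f64; 4] {
    x.map(f64::tanh)
}

// Independent reference via tanh(x) = (e^(2x) - 1) / (e^(2x) + 1).
fn tanh_via_exp(x: f64) -> f64 {
    let e = (2.0 * x).exp();
    (e - 1.0) / (e + 1.0)
}

fn main() {
    let x = [0.0, 0.5, -0.5, 20.0];
    let y = tanh_lanes(x);

    // The one exact value: tanh(0) = 0.
    assert_eq!(y[0], 0.0);
    // Odd symmetry: tanh(-x) = -tanh(x), up to rounding.
    assert!((y[1] + y[2]).abs() < 1e-15);
    // Saturation: tanh(20) is indistinguishable from 1 in f64.
    assert!((y[3] - 1.0).abs() < 1e-15);
    // Agreement with the exponential identity, within a loose tolerance.
    for (xi, yi) in x.iter().zip(y.iter()) {
        assert!((yi - tanh_via_exp(*xi)).abs() < 1e-12);
    }
    println!("all checks passed");
}
```

The tolerances are deliberately loose, since different tanh implementations are not guaranteed to agree to the last bit.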

If you are happy with the changes I can squash and force-push the commits.

gnzlbg · 2019-03-11T15:02:18Z

Looks like it all builds and the tests pass.

Cool, so the issue was the use of the aligned loads and stores?

SuperFluffy · 2019-03-11T15:04:05Z

Yepp, looks like it. That was the only semantic change I made.


gnzlbg · 2019-03-11T15:05:33Z

src/codegen/math/float/tanh.rs

+    ($name:ident, $basetype:ty, $simdtype:ty, $lanes:expr, $trait:path) => {
+        fn $name(x: $simdtype) -> $simdtype {
+            let mut buf = [0.0; $lanes];
+            x.write_to_slice_unaligned(&mut buf);


Instead of using write_to_slice here, I think it would be better to just do a transmute:

let mut buf: [$basetype; $lanes] = unsafe { mem::transmute(x) }; // ... unsafe { mem::transmute(buf) }

since arrays and simd vectors are layout-compatible.

gnzlbg

So this looks really good and is exactly what I had in mind. I left only one nit, but otherwise I'll merge once CI is green, thank you for working on this!

This implements tanh for packed vectors. This is primarily interesting when using sleef-sys for its SIMD implementations of tanh. Since LLVM does not contain tanh intrinsics, the libm implementation is used for primitives, and packed vectors are transmuted into arrays before applying the libm tanh to each of their elements.
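As a rough illustration of the scalar fallback described above, the sketch below transmutes a vector into an array, applies tanh to each lane, and transmutes the result back. It makes several assumptions: `F64x4` is a hypothetical stand-in with the same size as a 256-bit vector (it is not the packed_simd type), `tanh_fallback` is an invented name, and std's `tanh` is used in place of libm.

```rust
use std::mem;

// Hypothetical stand-in for a 256-bit SIMD vector such as `f64x4`.
// It is NOT the packed_simd type, only a type of the same size.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(32))]
struct F64x4([f64; 4]);

// Scalar fallback: reinterpret the vector as an array, apply tanh to
// every lane, then reinterpret the array as a vector again.
fn tanh_fallback(x: F64x4) -> F64x4 {
    // The transmute compiles only because both types are 32 bytes; the
    // array is the struct's sole field, so the lanes line up.
    let mut buf: [f64; 4] = unsafe { mem::transmute(x) };
    for lane in buf.iter_mut() {
        *lane = lane.tanh(); // std's tanh here; the PR uses libm
    }
    unsafe { mem::transmute(buf) }
}

fn main() {
    let y = tanh_fallback(F64x4([0.0, 1.0, -1.0, 20.0]));
    println!("{:?}", y);
}
```

By-value transmutes only require matching sizes, so the stricter alignment of the vector type is not a problem here.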

SuperFluffy · 2019-03-11T22:47:11Z

Alright, done. I am just a bit confused about benchmarking this. Take this library:

use packed_simd::f64x4 as f64s;

pub fn packed_simd_tanh(src: &[f64], dst: &mut [f64]) {
    assert_eq!(src.len() % f64s::lanes(), 0);
    assert_eq!(src.len(), dst.len());

    let lanes = f64s::lanes();

    src.chunks_exact(lanes)
        .zip(dst.chunks_exact_mut(lanes))
        .for_each(|(s, d)| {
            let s_v = f64s::from_slice_unaligned(s);
            f64s::write_to_slice_unaligned(s_v, d);
    });
}

pub fn standard_tanh(src: &[f64], dst: &mut [f64]) {
    assert_eq!(src.len(), dst.len());

    src.iter()
        .zip(dst.iter_mut())
        .for_each(|(s, d)| {
            *d = s.tanh();
    });
}

And these benches:

#![feature(test)]

extern crate test;

#[bench]
fn packed_simd(bench: &mut test::Bencher) {
    let n = 2usize << 20;
    let v = vec![1.0; n];
    let mut w = vec![0.0; n];

    bench.iter(|| {
        vectorize_tanh::packed_simd_tanh(&v, &mut w);
    });
}

#[bench]
fn std(bench: &mut test::Bencher) {
    let n = 2usize << 20;
    let v = vec![1.0; n];
    let mut w = vec![0.0; n];

    bench.iter(|| {
        vectorize_tanh::standard_tanh(&v, &mut w);
    });
}

Compiling this with features = ["sleef-sys"], I am getting the same results for avx and avx2:

RUSTFLAGS="-C target-feature=+avx" cargo bench
    Finished release [optimized] target(s) in 0.05s
     Running target/release/deps/vectorize_tanh-a46b9ea5812e2e29

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/benches-b2dd4e15a4404548

running 2 tests
test packed_simd ... bench:   1,052,387 ns/iter (+/- 43,399)
test std         ... bench:  39,073,614 ns/iter (+/- 3,531,697)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

RUSTFLAGS="-C target-feature=+avx2" cargo bench
    Finished release [optimized] target(s) in 0.04s
     Running target/release/deps/vectorize_tanh-fd28ba5482fea734

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/benches-bf615806916f7971

running 2 tests
test packed_simd ... bench:   1,048,811 ns/iter (+/- 34,166)
test std         ... bench:  38,750,174 ns/iter (+/- 1,170,225)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

Any idea why this might be?

gnzlbg · 2019-03-12T06:45:18Z

No idea. In theory the benchmarks have no observable effect: they write the result of tanh but never use it, so they should probably use black_box. The functions also do a bit more than just compute tanh, so maybe the iterator checks outweigh the cost of tanh? (No idea; it would probably be better to benchmark tanh itself.) Also, are the avx2 sleef implementations supposed to be faster than the avx ones? Both can handle 256-bit registers, so an f64x4 fits in both just fine AFAICT.
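A minimal sketch of what benchmarking tanh by itself with black_box might look like. It assumes the stable `std::hint::black_box` and a hand-rolled timer rather than the nightly `test` crate used in the benches above, and `time_tanh` is a hypothetical helper name.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical helper: times `iters` scalar tanh calls on `input` and
// returns the elapsed time together with the accumulated result.
fn time_tanh(input: f64, iters: u32) -> (Duration, f64) {
    let start = Instant::now();
    let mut acc = 0.0;
    for _ in 0..iters {
        // black_box on the input discourages constant folding and
        // hoisting the call out of the loop.
        acc += black_box(input).tanh();
    }
    // black_box on the result keeps the computation from being
    // discarded as dead code.
    (start.elapsed(), black_box(acc))
}

fn main() {
    let (elapsed, acc) = time_tanh(1.0, 1_000_000);
    println!("1e6 tanh calls took {:?} (acc = {})", elapsed, acc);
}
```

`black_box` is only a best-effort hint to the optimizer, so timings from a sketch like this are indicative rather than rigorous.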

SuperFluffy · 2019-03-12T09:07:41Z

Yeah, you are right about that. Chances are that the avx2 implementation is the same as avx.

Some of the checks seem to be failing, but that seems to be a platform specific issue.

gnzlbg · 2019-03-16T09:23:43Z

@SuperFluffy sorry about the delay here! Please feel free to ping me next time if I don't answer within a day. There is still one build job failing (the thumb android one) that shouldn't be failing. I've just restarted it; if the failure was spurious I'll merge afterwards. Sorry again about the delay.

gnzlbg · 2019-03-16T11:58:35Z

Thank you a lot @SuperFluffy !

@hsivonen there is one build job failing (the thumbv7neon android one), see https://travis-ci.com/rust-lang-nursery/packed_simd/jobs/184006248#L4787 . Somehow it appears that the wrong ar is being picked (e.g. it is trying to pick thumbv7neon-linux-androideabi-ar instead of picking e.g. arm-linux-androideabi-ar). This wasn't failing before, so maybe something changed upstream? We might be able to work around this by passing cargo an environment variable specifying this, but I'm not sure.

SuperFluffy mentioned this pull request Mar 11, 2019

Provide all trigonometric functions #227

Open

10 tasks

gnzlbg approved these changes Mar 11, 2019

SuperFluffy force-pushed the tanh branch from ae97588 to f094e68 Compare March 11, 2019 22:42

gnzlbg merged commit 5719e4b into rust-lang:master Mar 16, 2019

SuperFluffy changed the title ~~Implement tanh~~ Implement tanh, fixes #225 Mar 16, 2019
