Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement tanh, fixes #225 #226

Merged
merged 1 commit into from
Mar 16, 2019
Merged

Implement tanh, fixes #225 #226

merged 1 commit into from
Mar 16, 2019

Conversation

SuperFluffy
Copy link
Contributor

This PR tries to address #225. It currently does not compile, because it turns out that {f32,f64}::tanh are provided via cmath, i.e. via stdlib.

Any suggestions how to implement tanh_f32 and friends?

use libm::{
F32Ext,
F64Ext
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to move these traits down to the functions that implement them.

@SuperFluffy
Copy link
Contributor Author

Looks like it all builds and the tests pass. I have also checked with --features "sleef-sys". Unfortunately, tanh is only really nicely testable with tanh(0)=0. There are no nice cases such as pi/2 as is the case with cos and sin.

If you are happy with the changes I can squash and force-push the commits.

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 11, 2019

Looks like it all builds and the tests pass.

Cool, so the issue was the use of the aligned load and stores ?

@SuperFluffy
Copy link
Contributor Author

SuperFluffy commented Mar 11, 2019 via email

($name:ident, $basetype:ty, $simdtype:ty, $lanes:expr, $trait:path) => {
fn $name(x: $simdtype) -> $simdtype {
let mut buf = [0.0; $lanes];
x.write_to_slice_unaligned(&mut buf);
Copy link
Contributor

@gnzlbg gnzlbg Mar 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using write_to_slice here, I think it would be better to just do a transmute:

let mut buf: [$base_type; $lanes] = unsafe { mem::transmute(buf) };
// ...
unsafe { mem::transmute(buf) }

since arrays and simd vectors are layout-compatible.

Copy link
Contributor

@gnzlbg gnzlbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this looks really good and is exactly what I had in mind. I left only one nit, but otherwise I'll merge once CI is green, thank you for working on this!

This implements tanh for packed vectors. This is primarily interesting when using sleef-sys for its simd implemenations of tanh. Since llvm does not contain tanh intrinsics, the libm implementation is used for primitives, and packed vectors are transmuted into slices before applying the libm tanh to each of its elements.
@SuperFluffy
Copy link
Contributor Author

Alright, done. I am just a bit confused about benchmarking this. Take this library:

use packed_simd::f64x4 as f64s;

pub fn packed_simd_tanh(src: &[f64], dst: &mut [f64]) {
    assert_eq!(src.len() % f64s::lanes(), 0);
    assert_eq!(src.len(), dst.len());

    let lanes = f64s::lanes();

    src.chunks_exact(lanes)
        .zip(dst.chunks_exact_mut(lanes))
        .for_each(|(s, d)| {
            let s_v = f64s::from_slice_unaligned(s);
            f64s::write_to_slice_unaligned(s_v, d);
    });
}

pub fn standard_tanh(src: &[f64], dst: &mut [f64]) {
    assert_eq!(src.len(), dst.len());

    src.iter()
        .zip(dst.iter_mut())
        .for_each(|(s, d)| {
            *d = s.tanh();
    });
}

And these benches:

#![feature(test)]

extern crate test;

#[bench]
fn packed_simd(bench: &mut test::Bencher) {
    let n = 2usize << 20;
    let v = vec![1.0; n];
    let mut w = vec![0.0; n];

    bench.iter(|| {
        vectorize_tanh::packed_simd_tanh(&v, &mut w);
    });
}

#[bench]
fn std(bench: &mut test::Bencher) {
    let n = 2usize << 20;
    let v = vec![1.0; n];
    let mut w = vec![0.0; n];

    bench.iter(|| {
        vectorize_tanh::standard_tanh(&v, &mut w);
    });
}

Compiling this with features = ["sleef-sys"], I am getting the same results for avx and avx2:

RUSTFLAGS="-C target-feature=+avx" cargo bench
    Finished release [optimized] target(s) in 0.05s
     Running target/release/deps/vectorize_tanh-a46b9ea5812e2e29

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/benches-b2dd4e15a4404548

running 2 tests
test packed_simd ... bench:   1,052,387 ns/iter (+/- 43,399)
test std         ... bench:  39,073,614 ns/iter (+/- 3,531,697)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out
RUSTFLAGS="-C target-feature=+avx2" cargo bench
    Finished release [optimized] target(s) in 0.04s
     Running target/release/deps/vectorize_tanh-fd28ba5482fea734

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/benches-bf615806916f7971

running 2 tests
test packed_simd ... bench:   1,048,811 ns/iter (+/- 34,166)
test std         ... bench:  38,750,174 ns/iter (+/- 1,170,225)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

Any idea why this might be?

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 12, 2019

No idea. The benchmarks have no effect in theory (writing the result of tanh but never using it, should probably use black_box), and the functions do a bit more than just computing the tanh, so maybe the iterator checks overweight the cost of tanh?(no idea, probably would be better to just benchmark tanh itself). Also, are the avx2 sleef implementations supposed to be faster than the avx ones? Both can handle 256-bit registers, so that f64x4 fits in both just fine AFAICT.

@SuperFluffy
Copy link
Contributor Author

Yeah, you are right about that. Chances are that the avx2 implementation is the same as avx.

Some of the checks seem to be failing, but that seems to be a platform specific issue.

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 16, 2019

@SuperFluffy sorry about the delay here! Please feel free to ping me next time if I don't answer within a day. There is still one build job failing (the thumb android one) that shouldn't be failing, i've just restarted it, if it was spurious I'll merge afterwards. sorry again about the delay.

@gnzlbg gnzlbg merged commit 5719e4b into rust-lang:master Mar 16, 2019
@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 16, 2019

Thank you a lot @SuperFluffy !


@hsivonen there is one build job failing (the thumbv7neon android one), see https://travis-ci.com/rust-lang-nursery/packed_simd/jobs/184006248#L4787 . Somehow it appears that the wrong ar is being picked (e.g. it is trying to pick thumbv7neon-linux-androideabi-ar instead of picking e.g. arm-linux-androideabi-ar). This wasn't failing before, so maybe something changed upstream? We might be able to workaround this by passing cargo an environment variable specifying this, but I'm not sure.

@SuperFluffy SuperFluffy changed the title Implement tanh Implement tanh, fixes #225 Mar 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants