BUG: Example 08 not runnable with Cuda #451

Closed
TimerErTim opened this issue Feb 14, 2023 · 5 comments · Fixed by #453
Labels
bug Something isn't working

Comments

@TimerErTim
Contributor

TimerErTim commented Feb 14, 2023

When running the 08-tensor-broadcast-reduce example on a Cuda device, it fails on reducing the c tensor on two axes with the following error message:

thread 'main' panicked at 'assertion failed: (left == right)
     left: [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
     right: [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]', examples/08-tensor-broadcast-reduce.rs:31:5

The code used is the following:

use dfdx::prelude::*;

#[cfg(not(feature = "cuda"))]
type Device = Cpu;

#[cfg(feature = "cuda")]
type Device = Cuda;


fn main() {
    let dev = Device::default();
    let a = dev.tensor([1.0f32, 2.0, 3.0]);

    let b = a.broadcast::<Rank2<5, 3>, _>();
    assert_eq!(b.array(), [[1.0, 2.0, 3.0]; 5]);

    let c = b.broadcast::<Rank4<7, 5, 3, 2>, _>();
    assert_eq!(c.array(), [[[[1.0; 2], [2.0; 2], [3.0; 2]]; 5]; 7]);

    let d = c.mean::<Rank2<5, 3>, _>();
    assert_eq!(d.array(), [[1.0, 2.0, 3.0]; 5]);

    let e = dev.tensor([1.0]);
    let f = e.broadcast::<Rank2<1, 1>, Axis<1>>();
    // NOTE: will fail with "Multiple impls satisfying...":
    // let f = e.broadcast::<Rank2<1, 1>, _>();

    let _ = f.mean::<_, Axis<0>>();
}

The example runs fine on the CPU.
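For reference, the value the failing assertion expects can be reproduced without dfdx. The following is a minimal plain-Rust sketch, not the library's implementation: the function name `broadcast_then_mean` is hypothetical, and it assumes that the broadcast to `Rank4<7, 5, 3, 2>` adds axes 0 and 3 and that `mean::<Rank2<5, 3>, _>()` averages over those same two axes.

```rust
/// Hypothetical helper (not part of dfdx): broadcasts a length-3 vector to
/// shape (7, 5, 3, 2) and then averages over axes 0 and 3 again.
fn broadcast_then_mean(a: [f32; 3]) -> [[f32; 3]; 5] {
    // Materialize the broadcast: c[i][j][k][l] = a[k] for every i, j, l.
    let mut c = [[[[0.0f32; 2]; 3]; 5]; 7];
    for i in 0..7 {
        for j in 0..5 {
            for k in 0..3 {
                for l in 0..2 {
                    c[i][j][k][l] = a[k];
                }
            }
        }
    }

    // Reduce over axes 0 and 3: each output element averages 7 * 2 = 14 inputs.
    let mut d = [[0.0f32; 3]; 5];
    for j in 0..5 {
        for k in 0..3 {
            let mut sum = 0.0f32;
            for i in 0..7 {
                for l in 0..2 {
                    sum += c[i][j][k][l];
                }
            }
            d[j][k] = sum / (7 * 2) as f32;
        }
    }
    d
}
```

Calling `broadcast_then_mean([1.0, 2.0, 3.0])` gives five rows of `[1.0, 2.0, 3.0]`, which is the `right` side of the panic message above; the Cuda run produced all zeros instead.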

Changing the code to this:

let b = a.broadcast::<Rank2<5, 3>, _>();
assert_eq!(b.array(), [[1.0, 2.0, 3.0]; 5]);

let c = b.broadcast::<Rank4<7, 5, 3, 2>, _>().to_device(&Cpu::default());
assert_eq!(c.array(), [[[[1.0; 2], [2.0; 2], [3.0; 2]]; 5]; 7]);

let d = c.mean::<Rank2<5, 3>, _>();
assert_eq!(d.array(), [[1.0, 2.0, 3.0]; 5]);

let e = dev.tensor([1.0]);
let f = e.broadcast::<Rank2<1, 1>, Axis<1>>();

results in another error:

thread 'main' panicked at 'assertion failed: (left == right)
     left: 3,
     right: 210', /home/timerertim/.cargo/registry/src/github.com-1ecc6299db9ec823/cudarc-0.7.0/src/driver/safe.rs:531:9
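The two numbers in this panic line up with the shapes in the snippet: 210 is 7 * 5 * 3 * 2, the logical element count of `c`, and 3 is the length of `a`. The issue does not say what the assertion in cudarc compares, so the following is only a sketch under the assumption that broadcast axes are stored with stride 0 and that the copy compared the stored length against the logical length. The helper names `logical_numel` and `physical_numel` and the stride values are illustrative, not taken from dfdx or cudarc.

```rust
/// Number of elements the shape describes (hypothetical helper).
fn logical_numel(shape: &[usize]) -> usize {
    shape.iter().product()
}

/// Number of elements actually stored, assuming broadcast axes have stride 0
/// and therefore contribute no extra storage (hypothetical helper).
fn physical_numel(shape: &[usize], strides: &[usize]) -> usize {
    shape
        .iter()
        .zip(strides)
        .filter(|&(_, &stride)| stride != 0)
        .map(|(&dim, _)| dim)
        .product()
}
```

With shape `[7, 5, 3, 2]` and assumed strides `[0, 0, 1, 0]`, `logical_numel` returns 210 and `physical_numel` returns 3, matching the `right` and `left` values in the panic.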

@coreylowman
Owner

@TimerErTim can you try re-running this with the latest main? I think one of the PRs I merged earlier fixed this

@TimerErTim
Contributor Author

@coreylowman I just did but the error remains. Would I need to rebuild the kernels?

@coreylowman
Owner

@TimerErTim nah they will rebuild if they are changed so you shouldn't need to. I'll take a look at this!

@coreylowman coreylowman added the bug Something isn't working label Feb 14, 2023
@coreylowman
Owner

coreylowman commented Feb 14, 2023

Ahh, this is because our sum kernels assume that the number of elements in the result is less than the number of physical elements. Here's a unit test that currently fails for Cuda:

    #[test]
    fn test_sum_reduce_to_more_than_physical_elements() {
        let dev: TestDevice = Default::default();
        let a: Tensor<_, TestDtype, _> = dev.tensor([1.0, 2.0, 3.0]);
        let b = a.broadcast::<Rank3<4, 3, 2>, _>();
        let c = b.sum::<Rank2<4, 3>, _>();
        assert_eq!(c.array(), [[2.0, 4.0, 6.0]; 4]);
    }

In sum, chunk_len (physical_numel / dst.num_elements()) is 0, and elems_per_thread is 8 instead of 2.
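The arithmetic for the failing test can be checked by hand. The sketch below assumes unsigned integer division and uses hypothetical function names; only `chunk_len`, `physical_numel`, and the division by the destination element count come from the comment above. In the test, `a` stores 3 elements, the broadcast shape (4, 3, 2) has 24 logical elements, and the result shape (4, 3) has 12.

```rust
/// chunk_len as described in the comment: physical elements divided by the
/// number of result elements, truncated by integer division.
fn chunk_len(physical_numel: usize, dst_numel: usize) -> usize {
    physical_numel / dst_numel
}

/// Hypothetical helper: how many logical input elements feed each result element.
fn inputs_per_output(logical_numel: usize, dst_numel: usize) -> usize {
    logical_numel / dst_numel
}
```

`chunk_len(3, 12)` truncates to 0, while `inputs_per_output(24, 12)` is 2, the value the comment says `elems_per_thread` should have. The reported 8 coincides with 24 / 3, the logical count divided by the physical count, though the issue does not say how the kernel derives it.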

Will require some rewriting of our reduction kernels. FYI @nkoppel

coreylowman pushed a commit that referenced this issue Feb 15, 2023
* fix #451; rename internal_reshapes to reduction_utils

* remove printlns

* run cargo fmt

* add documentation
@TimerErTim
Contributor Author

@coreylowman The example is now executable as intended, but this code:

let b = a.broadcast::<Rank2<5, 3>, _>();
assert_eq!(b.array(), [[1.0, 2.0, 3.0]; 5]);

let c = b.broadcast::<Rank4<7, 5, 3, 2>, _>().to_device(&Cpu::default());
assert_eq!(c.array(), [[[[1.0; 2], [2.0; 2], [3.0; 2]]; 5]; 7]);

let d = c.mean::<Rank2<5, 3>, _>();
assert_eq!(d.array(), [[1.0, 2.0, 3.0]; 5]);

let e = dev.tensor([1.0]);
let f = e.broadcast::<Rank2<1, 1>, Axis<1>>();

still results in the very weird assertion error inside cudarc-0.7.0/src/driver/safe.rs:531:9.

2 participants