Cudnn support #16
Conversation
Should the tensor data struct be changed to something like `CudaSlice`?
```rust
/// # See also
/// <https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnTensorDescriptor_t>
/// <https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnDestroyTensorDescriptor>
pub struct TensorDescriptor<T, const N: usize, const C: usize, const H: usize, const W: usize> {
```
Yeah this can be changed to hold runtime values instead of const generics
OK, but should this panic at runtime (asserts) or just be `unsafe`?
(this is most likely affecting every module in cudnn 😅)
Runtime panics on creation, IMO.
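A minimal sketch of the direction being discussed, with hypothetical names (`TensorDescriptor4D`, `new`) that are not the PR's actual API: runtime dims instead of const generics, validated with asserts at construction so later use needs no `unsafe`.

```rust
use std::marker::PhantomData;

/// Illustrative only: a 4D descriptor carrying its shape at runtime.
pub struct TensorDescriptor4D<T> {
    dims: [usize; 4], // (N, C, H, W)
    _marker: PhantomData<T>,
}

impl<T> TensorDescriptor4D<T> {
    /// Panics on invalid shapes at creation, so later calls need no `unsafe`.
    pub fn new(n: usize, c: usize, h: usize, w: usize) -> Self {
        assert!(
            n > 0 && c > 0 && h > 0 && w > 0,
            "tensor dimensions must be non-zero"
        );
        Self { dims: [n, c, h, w], _marker: PhantomData }
    }

    pub fn dims(&self) -> [usize; 4] {
        self.dims
    }
}
```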
```cuda
// every function must end with `f32` and only accept `float`s; then the same
// function with `f64` and `double`s will be generated

extern "C" __global__ void recip_with_scale_f32(float *out, const float *a, const float *a_scale, size_t numel)
```
Whoa, these are awesome! Especially that `compile_custom_kernels.sh` file. 🚀
Thanks :)
(though this doesn't work on Windows for me, lol)
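To illustrate the convention that comment describes, here is a hedged sketch of the full kernel pair. Only the `_f32` signature appears in the diff, so the kernel body and the generated `_f64` twin shown here are assumptions:

```cuda
// Hypothetical body for the f32 kernel; the diff only shows its signature.
extern "C" __global__ void recip_with_scale_f32(float *out, const float *a,
                                                const float *a_scale, size_t numel) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numel) {
        // scaled reciprocal: a_scale * (1 / a[i])
        out[i] = *a_scale / a[i];
    }
}

// The build script would then emit the same function with an `_f64` suffix
// and `double` in place of every `float`:
extern "C" __global__ void recip_with_scale_f64(double *out, const double *a,
                                                const double *a_scale, size_t numel) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numel) {
        out[i] = *a_scale / a[i];
    }
}
```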
This does seem to fit better at the device level in dfdx, especially since we'll need a layer there regardless to conform to the kernel traits over there.
@coreylowman can you take over these changes? I am only available tomorrow for a few hours and maybe a bit on Friday, but I could also do them in 3.5 weeks.
Hmm, lots of merge conflicts.
src/cudarc.rs (outdated)
```diff
-block_dim: (n, 1, 1),
+// round up
+grid_dim: ((n + 1023) / 1024, 1, 1),
+block_dim: (n.min(1024), 1, 1),
```
Nice, we should port this `min` over to main in a separate PR.
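For reference, a small sketch of the launch-config math in that diff (the helper name `launch_cfg` is illustrative, not an API from this PR): the block size is capped at CUDA's 1024-threads-per-block limit, and the grid size is rounded up with a ceiling division so every element is covered.

```rust
/// Illustrative helper mirroring the diff above; not part of this PR.
fn launch_cfg(n: u32) -> ((u32, u32, u32), (u32, u32, u32)) {
    // ceiling division: enough blocks to cover all `n` elements
    let grid_dim = ((n + 1023) / 1024, 1, 1);
    // never request more threads per block than CUDA's 1024 limit
    let block_dim = (n.min(1024), 1, 1);
    (grid_dim, block_dim)
}
```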
```rust
impl_activation_mode!(Sigmoid: CUDNN_ACTIVATION_SIGMOID);
impl_activation_mode!(Relu: CUDNN_ACTIVATION_RELU);
impl_activation_mode!(Tanh: CUDNN_ACTIVATION_TANH);
impl_activation_mode!(Elu: CUDNN_ACTIVATION_ELU);
```
I'm thinking we don't need to add this activation forward for dfdx since there are so few in cuDNN. We can just write custom kernels for them (they are really easy to write; see the sketch below).
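As a hedged example of how small such a custom kernel would be, following the `_f32` naming convention from the kernels file (this kernel itself is not part of the PR):

```cuda
// Illustrative custom activation kernel; an `_f64` twin would be generated
// by the same script that handles the other kernels.
extern "C" __global__ void relu_f32(float *out, const float *inp, size_t numel) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numel) {
        out[i] = fmaxf(inp[i], 0.0f);
    }
}
```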
```rust
use crate::prelude::*;

/// This does the softmax activation per image.
pub struct Softmax;
```
cuDNN softmax only supports a single axis, right? We don't necessarily need a softmax impl for dfdx since dfdx implements it with lower-level primitives, and it also supports softmax over any axis.
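For context, softmax reduces to exactly the kind of lower-level primitives mentioned there (max, sub, exp, sum, div). A minimal single-axis sketch on a host slice, purely for illustration:

```rust
/// Illustrative only: softmax over one axis, decomposed into max/sub/exp/sum/div.
fn softmax_in_place(row: &mut [f32]) {
    // subtract the max first for numerical stability
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0f32;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in row.iter_mut() {
        *x /= sum;
    }
}
```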
```rust
use crate::prelude::*;

/// Uses per image (after conv2d) normalization.
pub type BatchNormalizationPerImage<
```
BatchNorm in dfdx is implemented on top of lower-level primitives, so there wouldn't be a great way to use this.
```rust
/// A struct that holds all the data to calculate `dx` by `y`, the filter and
/// `dy`.
pub struct Convolution2DBackward<
```
Yeah, we should probably use this for dfdx; there's a similar approach in the dfdx CPU conv2d kernel with workspaces, so I think this will fit in nicely.
We can probably move to no const generics here though, to simplify the API.
```rust
pub struct Filter<T, const C_OUT: usize, const C_IN: usize, const H: usize, const W: usize> {
    descriptor: Rc<FilterDescriptor<T, C_OUT, C_IN, H, W>>,
    data: Tensor4DData<T, C_OUT, C_IN, H, W>,
}
```
It's weird how they have a separate descriptor for filters; I wonder why they don't just use a tensor descriptor?
```rust
impl_reduce_op!(ReduceOperationAdd: CUDNN_REDUCE_TENSOR_ADD);
impl_reduce_op!(ReduceOperationMul: CUDNN_REDUCE_TENSOR_MUL);
impl_reduce_op!(ReduceOperationMin: CUDNN_REDUCE_TENSOR_MIN);
impl_reduce_op!(ReduceOperationMax: CUDNN_REDUCE_TENSOR_MAX);
impl_reduce_op!(ReduceOperationAMax: CUDNN_REDUCE_TENSOR_AMAX);
impl_reduce_op!(ReduceOperationAvg: CUDNN_REDUCE_TENSOR_AVG);
impl_reduce_op!(ReduceOperationNorm1: CUDNN_REDUCE_TENSOR_NORM1);
impl_reduce_op!(ReduceOperationNorm2: CUDNN_REDUCE_TENSOR_NORM2);
impl_reduce_op!(ReduceOperationMulNoZeros: CUDNN_REDUCE_TENSOR_MUL_NO_ZEROS);
```
Yeah, this will all be super useful. Let's move to runtime shapes though, as mentioned in other places; I can help with this.
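For readers unfamiliar with the pattern, here is a sketch of what one such invocation might expand to. The PR's actual macro may differ, and the enum below is a hand-written stand-in for the generated cuDNN binding:

```rust
// Stand-in for the cuDNN sys binding; the two shown values match cuDNN's enum.
#[allow(non_camel_case_types, dead_code)]
#[repr(u32)]
enum cudnnReduceTensorOp_t {
    CUDNN_REDUCE_TENSOR_ADD = 0,
    CUDNN_REDUCE_TENSOR_MUL = 1,
    // ... remaining variants elided
}

/// Maps a zero-sized marker type to the raw cuDNN reduce op.
trait ReduceOperation {
    fn raw() -> cudnnReduceTensorOp_t;
}

/// What `impl_reduce_op!(ReduceOperationAdd: CUDNN_REDUCE_TENSOR_ADD)`
/// could plausibly expand to.
struct ReduceOperationAdd;

impl ReduceOperation for ReduceOperationAdd {
    fn raw() -> cudnnReduceTensorOp_t {
        cudnnReduceTensorOp_t::CUDNN_REDUCE_TENSOR_ADD
    }
}
```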
```rust
/// # See also
/// <https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnTensorDescriptor_t>
/// <https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnDestroyTensorDescriptor>
pub struct TensorDescriptor<T, const N: usize, const C: usize, const H: usize, const W: usize> {
```
We also need to support up to 6D tensors, which it looks like you can do pretty easily with the tensor descriptors; this can be tracked at runtime like everything else 😄
So `TensorNd` = `TensorDescriptorNd([Axes])` + `CudaSlice`?
Currently only `Tensor4D`, `Activation` and `Convolution`, both forward and backward.
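A rough sketch of that proposed composition, with all names hypothetical (the PR only has `Tensor4D` so far) and a host-side stand-in for the device buffer type:

```rust
use std::marker::PhantomData;
use std::rc::Rc;

// Stand-in for the device buffer type discussed above (e.g. cudarc's
// `CudaSlice`); a real impl would hold a device pointer, not a host Vec.
pub struct CudaSlice<T>(Vec<T>);

/// Illustrative Nd descriptor: per-axis dims and strides tracked at runtime
/// (the thread above mentions supporting up to 6D).
pub struct TensorDescriptorNd<T> {
    dims: Vec<i32>,
    strides: Vec<i32>,
    _marker: PhantomData<T>,
}

/// Illustrative `TensorNd` = descriptor + data, as asked in the question above.
pub struct TensorNd<T> {
    descriptor: Rc<TensorDescriptorNd<T>>,
    data: CudaSlice<T>,
}
```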