[WIP] more CUDA stuff #57

Merged
merged 57 commits into master from r119_even_more_cuda on Jul 12, 2019

Conversation

@raver119
commented Jul 12, 2019

This PR adds more CUDA op implementations, along with test updates and fixes.

raver119 and others added 30 commits Jun 27, 2019
initial commit
Signed-off-by: raver119 <raver119@gmail.com>
Yurii
- implementation of dilation op (cpu and cuda)
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
start working on cuda svd - porting available corresponding api from cuSOLVER library
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
provide prelu_bp
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
- provide gruCell_bp (old version ??)
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
- polishing cumsum_bp and cumprod_bp tests
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
provide sparseSoftmaxCrossEntropyWithLogits and sparseSoftmaxCrossEntropyWithLogits_grad
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
implementation of cuda kernel for triu_bp operation
Signed-off-by: Yurii <yurii@skymind.io>
cusolver libraries
Signed-off-by: raver119 <raver119@gmail.com>
Yurii
uncomment cuSolver APIs in svd.cu
Signed-off-by: Yurii <yurii@skymind.io>
cusolver var
Signed-off-by: raver119 <raver119@gmail.com>
Yurii
- further work on cuSolver svd
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
- correct names in lup functions
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
correct svdQR cuda
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
- provide transpositions of input matrices in case of c order in svdCudaQR
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
- implementation of batched cuda svd
Signed-off-by: Yurii <yurii@skymind.io>
shugeo and others added 26 commits Jul 9, 2019
Yurii
- implementation of cuda kernel for sru_bidirectional
Signed-off-by: Yurii <yurii@skymind.io>
bad import excluded
Signed-off-by: raver119 <raver119@gmail.com>
Yurii
- start working on gruCell_bp
Signed-off-by: Yurii <yurii@skymind.io>
Yurii
- further work on new gruCell_bp
Signed-off-by: Yurii <yurii@skymind.io>
cuBLAS related fixes
Signed-off-by: raver119 <raver119@gmail.com>
calculateOutputShapes() now passes device buffers as well
Signed-off-by: raver119 <raver119@gmail.com>
special concat/average/accumulate init host pointers now
Signed-off-by: raver119 <raver119@gmail.com>
few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
additional CudaDataBufferFactory signatures for certain data types
Signed-off-by: raver119 <raver119@gmail.com>
cuSolver host buffer
Signed-off-by: raver119 <raver119@gmail.com>
buffer to buffer memcpy host ptr allocation
Signed-off-by: raver119 <raver119@gmail.com>
Merge branch 'master' into r119_even_more_cuda
# Conflicts:
#	libnd4j/include/ops/declarable/generic/recurrent/sru.cpp
#	libnd4j/include/ops/declarable/helpers/cpu/sru.cpp
#	nd4j/nd4j-backends/nd4j-tests/src/test/java/org/nd4j/linalg/api/buffer/DataBufferTests.java

@raver119 raver119 merged commit 510127b into master Jul 12, 2019

@raver119 raver119 deleted the r119_even_more_cuda branch Jul 12, 2019

AlexDBlack added a commit that referenced this pull request Jul 20, 2019
[WIP] more CUDA stuff (#57)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* Added gradcheck test for dynamic_partition_bp op.

* - implementation of dilation op (cpu and cuda)

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed broadcast_dynamic_shape 1D case and tests.

* Fixed usage of default integer arguments.

* Fixed dynamic_partition_bp op and tests.

* Eliminated test with grad check for dynamic_partition_bp op.

* start working on cuda svd - porting available corresponding api from cuSOLVER library

Signed-off-by: Yurii <yurii@skymind.io>

* provide prelu_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - provide gruCell_bp (old version ??)

Signed-off-by: Yurii <yurii@skymind.io>

* - polishing cumsum_bp and cumprod_bp tests

Signed-off-by: Yurii <yurii@skymind.io>

* provide sparseSoftmaxCrossEntropyWithLogits and sparseSoftmaxCrossEntropyWithLogits_grad

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed atomicMul with float input/output

* implementation of cuda kernel for triu_bp operation

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored lup helper to add parallel computing.

* cusolver libraries

Signed-off-by: raver119 <raver119@gmail.com>

* uncomment cuSolver APIs in svd.cu

Signed-off-by: Yurii <yurii@skymind.io>

* cusolver var

Signed-off-by: raver119 <raver119@gmail.com>

* - further work on cuSolver svd

Signed-off-by: Yurii <yurii@skymind.io>

* Implement usage of cuda solver to LUP decomposition.

* - correct names in lup functions

Signed-off-by: Yurii <yurii@skymind.io>

* correct svdQR cuda

Signed-off-by: Yurii <yurii@skymind.io>

* - provide transpositions of input matrices in case of c order in svdCudaQR

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed implementation issues with LUP using cuda solver.

* Implementation of matrix_determinant helper with cuda kernels. Working revision.

* Implemented log_matrix_determinant helper with cuda kernels.

* - implementation of batched cuda svd

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored cholesky helper and implementation of cuda solver cholesky batch.

* - implementation of cuda kernel for tile bp

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of cholesky and logdet with cuda kernels.

* - implementation of cuda kernel for sru_bidirectional

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed cholesky helper.

* Cholesky op helper implementation. Working double-based cublas implementation.

* bad import excluded

Signed-off-by: raver119 <raver119@gmail.com>

* Finished with cuda implementation of cholesky helper and tests.

* - implementation of cuda kernel for sru_bidirectional_backprop operation

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of matrix_inverse op helper with cuda kernels. The first revision.

* - start working on gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of matrix_inverse helper.

* - further work on new gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* cuBLAS related fixes

Signed-off-by: raver119 <raver119@gmail.com>

* calculateOutputShapes() now passes device buffers as well

Signed-off-by: raver119 <raver119@gmail.com>

* special concat/average/accumulate init host pointers now

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* additional CudaDataBufferFactory signatures for certain data types

Signed-off-by: raver119 <raver119@gmail.com>

* cuSolver host buffer

Signed-off-by: raver119 <raver119@gmail.com>

* buffer to buffer memcpy host ptr allocation

Signed-off-by: raver119 <raver119@gmail.com>