
Test CuArrays on tutorial and examples, benchmark against KnetArrays #582

@denizyuret

Description

CuArray compatibility and speed on Knet tutorial/examples

Knet/tutorial

  • 15.quickstart
  • 23.learning
  • 30.lin
  • 40.mlp
  • 50.cnn
  • 60.rnn
  • 70.imdb
  • 80.charlm
  • 90.s2s

Knet/examples

  • cifar10-cnn: CUDA 25% slower (7 vs 9 secs/epoch on dy03)
  • dcgan-mnist: CUDA ~40% slower (24 i/s vs 17 i/s on dy03)
  • DeepLearningFrameworks/Knet_CNN: CUDA ~5% slower (15 vs 15.8 secs/epoch on dy03)
  • DeepLearningFrameworks/Knet_RNN
  • DeepLearningFrameworks/ResNet50-Knet: needs pool mode=2; CUDA ~25% slower (7.7 vs 9.7 secs on dy03)
  • dynet-benchmark/treenn
  • dynet-benchmark/rnnlm-batch: CUDA 10% slower (35 vs 38 i/s on dy03)
  • dynet-benchmark/bilstm-tagger
  • dynet-benchmark/bilstm-tagger-withchar
  • fashion-mnist
  • housing-linreg
  • julia-tutorial: not a real example
  • lenet: update! interface changed
  • mnist-mlp
  • optimizers
  • reinforcement-learning/dp: does not use KnetArray
  • reinforcement-learning/dqn
  • reinforcement-learning/pg
  • resnet: mode=2 is not supported for CPU pool; Knet 50% faster. See "Missing pooling options, inconsistencies with CuArrays" (FluxML/NNlib.jl#218)
  • rnnlm: roughly on par (26 vs 27 secs/epoch on dy03)
  • rnn-tutorial: this is the same as 90.s2s
  • synthetic-linreg
  • variational-autoencoder: binary_cross_entropy gives an error; see "Trouble with the @. macro" (JuliaGPU/CUDA.jl#346)
  • vgg: CuArray works but is ~50% slower
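The variational-autoencoder failure is the `@.` broadcast issue linked above (JuliaGPU/CUDA.jl#346). One possible workaround, sketched here with an illustrative epsilon and function name (not Knet's actual definition), is to write the broadcast with explicit dots instead of the macro:

```julia
# Illustrative sketch: binary cross entropy written with explicit broadcast
# dots rather than the @. macro, which trips JuliaGPU/CUDA.jl#346 on CuArray.
# The 1f-7 epsilon and the exact form are assumptions, not Knet's definition.
binary_cross_entropy(x̂, x) =
    -sum(x .* log.(x̂ .+ 1f-7) .+ (1 .- x) .* log.((1 .- x̂) .+ 1f-7)) / length(x)
```

On CPU this is ordinary broadcasting; on GPU the dotted calls fuse into a single kernel just as `@.` would, while sidestepping the macro's expansion problem.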

Other:

  • 2014-Sutskever: sequence to sequence rnn
  • 2015-Luong: s2s rnn with attention
  • 2017-Vaswani: s2s transformer; Knet is 50% faster
  • test/karray.jl: test for CuArray
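The secs/epoch and i/s numbers above come from per-example timing runs. A minimal, generic harness for that kind of backend comparison might look like the following — `run_epoch!` and the trial count are assumptions, and on a GPU machine the `atypes` tuple would include `KnetArray{Float32}` and `CuArray{Float32}` rather than plain `Array{Float32}`:

```julia
using Printf

# Sketch of a per-backend timing loop: run_epoch!(T) is a hypothetical entry
# point that runs one training epoch with array type T; best-of-N wall time
# approximates the secs/epoch figures quoted in this issue.
function bench(run_epoch!; atypes=(Array{Float32},), trials=3)
    results = Dict{String,Float64}()
    for T in atypes
        best = Inf
        for _ in 1:trials
            t0 = time()
            run_epoch!(T)
            best = min(best, time() - t0)
        end
        results[string(T)] = best
        @printf("%-24s %.3f secs/epoch\n", string(T), best)
    end
    return results
end
```

Taking the best of several trials rather than the mean reduces noise from JIT compilation and GPU warm-up on the first run.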
