
cuDNN v3 can be much slower than v2 #3239

Closed
futurely opened this issue Oct 23, 2015 · 4 comments

@futurely

Running the same model on the same dataset, the latest master with cuDNN v3 took 25 minutes to finish 1000 iterations, while v2 took 10 minutes. The kernels of the first three convolutional layers are 9, 7, and 5 pixels. This problem was also reported in a fork of Caffe and in several other places.

Maybe cuDNN v3 only works well on Maxwell-architecture GPUs, or it just can't choose the most efficient convolution algorithm. The simplest workaround is to revert to an older version of Caffe until NVIDIA fixes the problem, since v2 is no longer supported. But we still want to keep Caffe up to date to experiment with new features such as batch normalization.

The second way is to specify the convolution algorithm explicitly, which is not flexible enough for varied model configurations.
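For illustration, pinning an algorithm with the cuDNN C API looks roughly like the sketch below; descriptor setup is omitted, the handle, descriptor, and data-pointer names are placeholders, and the algorithm picked here is only an example, not a recommendation:

// Pin one algorithm instead of letting cuDNN's heuristic choose.
cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;

// Query how much scratch memory this algorithm needs for these shapes.
size_t workspaceBytes = 0;
cudnnGetConvolutionForwardWorkspaceSize(handle, srcDesc, filterDesc, convDesc,
                                        dstDesc, algo, &workspaceBytes);
void* workspace = nullptr;
cudaMalloc(&workspace, workspaceBytes);

// Run the forward convolution with the pinned algorithm.
const float alpha = 1.0f, beta = 0.0f;
cudnnConvolutionForward(handle, &alpha, srcDesc, srcData, filterDesc, filterData,
                        convDesc, algo, workspace, workspaceBytes, &beta,
                        dstDesc, dstData);

The catch is that the fastest algorithm depends on the layer shape and batch size, so a single hard-coded choice cannot suit every network.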

The last resort is to keep backwards compatibility with cuDNN v2 in Caffe.

@beniz

beniz commented Oct 26, 2015

I've recently bumped into @jdemouth, who appears to be one of the main devs of cuDNN; maybe he can tell us more.

EDIT: I remember he told me there would be ways to make things faster, by using batches of 64 and by recoding some things...

@jdemouth

Hi,

I'd be happy to help if I can. I need to know which GPU you are seeing issues with, and the layer sizes. cuDNN v3 adds new algorithms, and there's a heuristic to select the "best" one for a given convolution size. The more we know, the better we can tune the heuristic to get good perf.

In general, a given algorithm is either as fast in v3 as it was in v2, or faster, no matter which GPU architecture you target. Of course, there may be counterexamples, but I'm not aware of any.
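For reference, you can query what the heuristic picks and, if I remember correctly starting with v3, time every algorithm exhaustively to double-check it. A rough sketch, with placeholder descriptor names:

// Ask the built-in heuristic for its preferred algorithm for these shapes.
cudnnConvolutionFwdAlgo_t algo;
cudnnGetConvolutionForwardAlgorithm(handle, srcDesc, filterDesc, convDesc, dstDesc,
                                    CUDNN_CONVOLUTION_FWD_PREFER_FASTEST,
                                    0 /* no workspace limit */, &algo);

// Benchmark all available algorithms and inspect the ranking to see
// whether the heuristic's choice really is the fastest for this layer.
int returned = 0;
cudnnConvolutionFwdAlgoPerf_t results[8];
cudnnFindConvolutionForwardAlgorithm(handle, srcDesc, filterDesc, convDesc, dstDesc,
                                     8, &returned, results);
// results[0] holds the fastest measured algorithm; each entry reports its time in ms.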

Cheers,
Julien

@futurely
Author

The GPU is a K20. The convolutional layers are defined in Caffe proto format as shown in the snippet below.

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    pad: 0
    kernel_size: 9
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
  relu_param {
    negative_slope: 0.25
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    pad: 0
    kernel_size: 7
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 3
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "pool2"
  top: "pool2"
  relu_param {
    negative_slope: 0.25
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 0
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
  relu_param {
    negative_slope: 0.25
  }
}
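To make the shapes concrete, conv1 above reaches cuDNN roughly as the descriptors below. The batch size, input channel count, and input spatial size are placeholders, since they come from the data layer, which is not shown:

const int N = 64, C = 3, H = 32, W = 32;  // placeholders: batch, input channels, input size

// Input tensor: N x C x H x W, float, NCHW layout.
cudnnTensorDescriptor_t srcDesc;
cudnnCreateTensorDescriptor(&srcDesc);
cudnnSetTensor4dDescriptor(srcDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, N, C, H, W);

// Filters: num_output = 32 and kernel_size = 9, as in conv1.
cudnnFilterDescriptor_t filterDesc;
cudnnCreateFilterDescriptor(&filterDesc);
cudnnSetFilter4dDescriptor(filterDesc, CUDNN_DATA_FLOAT, 32, C, 9, 9);

// pad = 0 and stride = 1, as in conv1; upscale factors are 1.
cudnnConvolutionDescriptor_t convDesc;
cudnnCreateConvolutionDescriptor(&convDesc);
cudnnSetConvolution2dDescriptor(convDesc, 0, 0, 1, 1, 1, 1, CUDNN_CROSS_CORRELATION);

These descriptors are the inputs to the algorithm-selection heuristic discussed above.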

@happynear

The problem has been solved by the library version released on 2015/08/21, and I have updated my repo.
