
cuDNN v3 can be much slower than v2 #3239

Closed
futurely opened this issue Oct 23, 2015 · 4 comments

@futurely

Running the same model on the same dataset, the latest master with cuDNN v3 took 25 minutes to finish 1000 iterations, while v2 took 10 minutes. The kernels of the first three convolutional layers are 9, 7, and 5 pixels. This problem was also reported in a fork of Caffe and in several other places.

Maybe cuDNN v3 only works well on Maxwell-architecture GPUs, or it just can't choose the most efficient convolution algorithm. The simplest workaround is to revert to an older version of Caffe until NVIDIA fixes the problem, since v2 is no longer supported. But we still want to keep Caffe up to date to experiment with new features such as batch normalization.

The second way is to specify the convolution algorithm explicitly, which is not flexible enough for varied model configurations.
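For illustration, pinning an algorithm with the cuDNN C API looks roughly like the sketch below; descriptor setup is omitted, the handle, descriptor, and data-pointer names are placeholders, and the algorithm picked here is only an example, not a recommendation:

// Pin one algorithm instead of letting cuDNN's heuristic choose.
cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;

// Query how much scratch memory this algorithm needs for these shapes.
size_t workspaceBytes = 0;
cudnnGetConvolutionForwardWorkspaceSize(handle, srcDesc, filterDesc, convDesc,
                                        dstDesc, algo, &workspaceBytes);
void* workspace = nullptr;
cudaMalloc(&workspace, workspaceBytes);

// Run the forward convolution with the pinned algorithm.
const float alpha = 1.0f, beta = 0.0f;
cudnnConvolutionForward(handle, &alpha, srcDesc, srcData, filterDesc, filterData,
                        convDesc, algo, workspace, workspaceBytes, &beta,
                        dstDesc, dstData);

The catch is that the fastest algorithm depends on the layer shape and batch size, so a single hard-coded choice cannot suit every network.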

The last resort is to keep backwards compatibility with cuDNN v2 in Caffe.

@beniz

beniz commented Oct 26, 2015

I've recently bumped into @jdemouth, who appears to be one of the main devs of cuDNN; maybe he can tell us more.

EDIT: I remember he told me there would be ways to make things faster, by using batches of 64 and by recoding some things...

@jdemouth

Hi,

I'd be happy to help if I can. I need to know which GPU you are seeing issues with, and the layer sizes. cuDNN v3 adds new algorithms, and there's a heuristic to select the "best" one for a given convolution size. The more we know, the better we can tune the heuristic to get good perf.

In general, a given algorithm is either as fast in v3 as it was in v2, or faster, no matter which GPU architecture you target. Of course, there may be counterexamples, but I'm not aware of any.
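For reference, you can query what the heuristic picks and, if I remember correctly starting with v3, time every algorithm exhaustively to double-check it. A rough sketch, with placeholder descriptor names:

// Ask the built-in heuristic for its preferred algorithm for these shapes.
cudnnConvolutionFwdAlgo_t algo;
cudnnGetConvolutionForwardAlgorithm(handle, srcDesc, filterDesc, convDesc, dstDesc,
                                    CUDNN_CONVOLUTION_FWD_PREFER_FASTEST,
                                    0 /* no workspace limit */, &algo);

// Benchmark all available algorithms and inspect the ranking to see
// whether the heuristic's choice really is the fastest for this layer.
int returned = 0;
cudnnConvolutionFwdAlgoPerf_t results[8];
cudnnFindConvolutionForwardAlgorithm(handle, srcDesc, filterDesc, convDesc, dstDesc,
                                     8, &returned, results);
// results[0] holds the fastest measured algorithm; each entry reports its time in ms.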

Cheers,
Julien

@futurely
Author

The GPU is a K20. The convolutional layers are defined in Caffe proto format as shown in the snippet below.

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    pad: 0
    kernel_size: 9
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
  relu_param {
    negative_slope: 0.25
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    pad: 0
    kernel_size: 7
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 3
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "pool2"
  top: "pool2"
  relu_param {
    negative_slope: 0.25
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 0
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
  relu_param {
    negative_slope: 0.25
  }
}
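To make the shapes concrete, conv1 above reaches cuDNN roughly as the descriptors below. The batch size, input channel count, and input spatial size are placeholders, since they come from the data layer, which is not shown:

const int N = 64, C = 3, H = 32, W = 32;  // placeholders: batch, input channels, input size

// Input tensor: N x C x H x W, float, NCHW layout.
cudnnTensorDescriptor_t srcDesc;
cudnnCreateTensorDescriptor(&srcDesc);
cudnnSetTensor4dDescriptor(srcDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, N, C, H, W);

// Filters: num_output = 32 and kernel_size = 9, as in conv1.
cudnnFilterDescriptor_t filterDesc;
cudnnCreateFilterDescriptor(&filterDesc);
cudnnSetFilter4dDescriptor(filterDesc, CUDNN_DATA_FLOAT, 32, C, 9, 9);

// pad = 0 and stride = 1, as in conv1; upscale factors are 1.
cudnnConvolutionDescriptor_t convDesc;
cudnnCreateConvolutionDescriptor(&convDesc);
cudnnSetConvolution2dDescriptor(convDesc, 0, 0, 1, 1, 1, 1, CUDNN_CROSS_CORRELATION);

These descriptors are the inputs to the algorithm-selection heuristic discussed above.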

@happynear

The problem has been solved by the library version released on 2015/08/21, and I have updated my repo.
