Exponential Linear Units #3388
Conversation
mohomran
changed the title from
ELU layer with basic tests to Exponential Linear Units
Nov 26, 2015
beniz
commented
Nov 26, 2015
Great job! I was actually coming to check on ELU and found this :) Will report on performance when I can.
vchuravy
referenced
this pull request
in apache/incubator-mxnet
Nov 26, 2015
Merged
[RFC] Adds ELU to LeakyReLU activation layer #718
ronghanghu
added the
enhancement
label
Nov 26, 2015
beniz
referenced
this pull request
in beniz/deepdetect
Nov 29, 2015
Merged
added support for ELU activation units with Caffe #34
f0k
commented
Dec 1, 2015
It seems this is actually what they did for the paper as well:
untom
commented
Dec 1, 2015
Thanks for the heads-up :) Note that mathematically, as long as alpha == 1, this doesn't make a difference, since exp(0) == 1: both the transfer function and the gradient output the same thing regardless of > vs >=. Also, given the way ELUs look, it's pretty hard for an activation to hit 0 precisely anyhow. But you're right, we used > 0 during our own experiments, both in the binet code as well as in our own Caffe fork. If we make another paper revision, we will definitely include that change.
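The equivalence untom describes can be checked directly. The sketch below is illustrative only (not the PR's Caffe/CUDA code): a scalar ELU with a switchable boundary convention, showing that with alpha == 1 the output and gradient at x == 0 are identical whether the positive branch uses > or >=.

```python
import math

def elu(x, alpha=1.0, strict=True):
    """ELU value. strict=True uses f(x) = x for x > 0 (as in this PR);
    strict=False uses x >= 0 (as in the paper)."""
    positive = x > 0 if strict else x >= 0
    return x if positive else alpha * (math.exp(x) - 1.0)

def elu_grad(x, alpha=1.0, strict=True):
    """ELU derivative under the same boundary convention."""
    positive = x > 0 if strict else x >= 0
    return 1.0 if positive else alpha * math.exp(x)

# At x == 0 with alpha == 1 both conventions agree:
# the value is 0 either way (exp(0) - 1 == 0), and the gradient is 1
# either way (alpha * exp(0) == 1).
for strict in (True, False):
    assert elu(0.0, alpha=1.0, strict=strict) == 0.0
    assert elu_grad(0.0, alpha=1.0, strict=strict) == 1.0
```

As untom notes, away from x == 0 the two conventions are identical by construction, so alpha == 1 makes the choice of > vs >= unobservable.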
shelhamer
added the
needs rebase
label
Dec 2, 2015
Thanks for this @mohomran! That was quick. I'm sorry that this was caught by the switch to layer headers in #3315, but could you update this to reflect the new arrangement? See the new ReLU header for an example.
@beniz: Thanks. :) So far, I've only tested it on MNIST and CIFAR-10 ("quick"), but neither network is deep enough to result in significant gains according to the paper. The updated CIFAR-10 network seemed to converge a bit faster, though. @f0k, @untom: Thanks, good to know! As mentioned, I encountered problems when alpha was set to 0, which prompted the change. @shelhamer: Rebased and ready to go. :)
shelhamer
added ready for review and removed needs rebase
labels
Dec 3, 2015
shelhamer
commented on an outdated diff
Dec 3, 2015
+        alpha * (exp(in[index]) - 1);
+  }
+}
+
+template <typename Dtype>
+void ELULayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
+    const vector<Blob<Dtype>*>& top) {
+  const Dtype* bottom_data = bottom[0]->gpu_data();
+  Dtype* top_data = top[0]->mutable_gpu_data();
+  const int count = bottom[0]->count();
+  Dtype alpha = this->layer_param_.elu_param().alpha();
+  // NOLINT_NEXT_LINE(whitespace/operators)
+  ELUForward<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
+      count, bottom_data, top_data, alpha);
+  CUDA_POST_KERNEL_CHECK;
+  // << " count: " << count << " bottom_data: "
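For reference, the kernel launched by Forward_gpu above applies ELU elementwise. A minimal NumPy sketch of that elementwise computation (illustrative only, not the actual Caffe code path):

```python
import numpy as np

def elu_forward(bottom, alpha):
    """Elementwise ELU matching the kernel above:
    out = x if x > 0, else alpha * (exp(x) - 1)."""
    return np.where(bottom > 0, bottom, alpha * (np.exp(bottom) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(elu_forward(x, alpha=1.0))
```

Positive inputs pass through unchanged; negative inputs saturate smoothly toward -alpha.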
shelhamer
added in progress and removed ready for review
labels
Dec 3, 2015
@jeffdonahue when Leaky ReLU was added it was incorporated into ReLU in #740. Do you have an opinion on a separate ELU layer?
I'd be fine with incorporating it into ReLU if there's a near-zero performance impact, but this feels to me more like it should be a separate layer than leaky ReLU did (which felt like a more natural generalization, still being piecewise linear).
beniz
commented
Dec 4, 2015
@mohomran so I've tested on GoogleNet, and even with BN activated, just for the sake of it. It appears to work fine, though the memory requirement appears to grow significantly, which translates into smaller batches. The typical memory error (or so I guess) happens on the CUDA_POST_KERNEL_CHECK in elu_layer.cu. FTR, I had cuDNN activated, though of course ELU does not use it. I have some GPU time to kill over the next few days if more experiments or reports would help.
vchuravy
added a commit
to oist/mxnet
that referenced
this pull request
Dec 7, 2015
vchuravy
a0335b8
vchuravy
commented
Dec 7, 2015
@untom It does make a difference for the gradient, for any alpha != 1.
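vchuravy's point can be made concrete with a quick check (an illustrative sketch, not code from either project): the two boundary conventions disagree at exactly x = 0 whenever alpha != 1, because the >= convention assigns that point the linear branch's gradient of 1, while the > convention assigns it the exponential branch's gradient alpha * exp(0) = alpha.

```python
import math

alpha = 2.0  # any alpha != 1

# Gradient at x = 0 under the two boundary conventions:
grad_ge = 1.0                    # f(x) = x for x >= 0          -> f'(0) = 1
grad_gt = alpha * math.exp(0.0)  # f(x) = alpha * (exp(x) - 1)  -> f'(0) = alpha

print(grad_ge, grad_gt)  # the two conventions disagree exactly at 0
```

With alpha == 1 both expressions evaluate to 1, which is why untom's earlier observation holds in that special case.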
vchuravy
referenced
this pull request
in apache/incubator-mxnet
Dec 7, 2015
Merged
ELU: change greater than equal to strict greater. #854
untom
commented
Dec 16, 2015
Is there anything I can do to help move this PR forward?
tornadomeet
added a commit
to tornadomeet/mxnet
that referenced
this pull request
Dec 19, 2015
vchuravy + tornadomeet
b8b6b63
shelhamer
added ready for review and removed in progress
labels
Jan 22, 2016
shelhamer
added a commit
that referenced
this pull request
Jan 22, 2016
shelhamer
a7ac8bc
mohomran commented Nov 26, 2015
Implementation of the Exponential Linear Units proposed in:
Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2015). Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). http://arxiv.org/abs/1511.07289
I made one minor modification to the formula from the paper: f(x) = x if x > 0, rather than if x >= 0, with the corresponding change to the gradient. I did this for two reasons: