
Add PReLU Layer #1940

Merged
merged 1 commit into from Mar 12, 2015
Conversation

@tnarihi
Contributor

tnarihi commented Feb 22, 2015

Replacement of #1880 for master branch development.

@jyegerlehner
Contributor

Thank you for sharing this, tnarihi. I've been running it and I've been seeing improved performance.
Edit: Wait, there's more to this. See discussion below.

@tnarihi
Contributor Author

tnarihi commented Feb 28, 2015

Good to hear that, @jyegerlehner!

* @param param provides PReLUParameter prelu_param,
* with PReLULayer options:
* - init_value (\b optional, default 0.25).
* all negative slopes over channels are set to this value.
Contributor

How about using a Filler for this (like InnerProductLayer and ConvolutionLayer)?

Contributor Author

Agreed. It seems more flexible. In that case, is filler a good name for the parameter?

Contributor

Yup, filler sounds good to me

@jeffdonahue
Contributor

Hey Takuya, thanks for creating this PR. Besides the one comment I made above, this looks good to me. At some point when we figure out a better way of handling composition, it would be good to add a DiagonalInnerProductLayer (which handles the elementwise multiplication by a parameter -- I can clean up my implementation of this and PR it) and give it the responsibility of handling the parameters. If we had such a layer, we could implement PReLU as the composition EltwiseSum(ReLU(x), DiagonalInnerProduct(ReLU(Power(scale = -1, x)))), but based on the MSR work this seems like a useful enough shorthand for those 5 layers to deserve a name. (I guess with the combined kernel calls it may also be significantly faster on GPU?)

(Edit: if you'd prefer, you could separate the appropriate piece of this out into a DiagInnerProductLayer yourself, but this is useful as is.)
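As a quick reference for the function being discussed: PReLU learns one negative slope per channel and computes f(x) = max(0, x) + a_c * min(0, x). A minimal numpy sketch (shapes, names, and the 0.25 init are illustrative, not taken from the PR code):

import numpy as np

def prelu(x, a):
    # x: (N, C, H, W) activations; a: length-C vector of learnable negative slopes
    return np.maximum(x, 0) + a.reshape(1, -1, 1, 1) * np.minimum(x, 0)

x = np.random.randn(2, 3, 4, 4)
a = np.full(3, 0.25)   # the default initial slope mentioned in the layer docs above
y = prelu(x, a)        # positive inputs pass through; negative inputs are scaled by a_c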

@tnarihi
Contributor Author

tnarihi commented Mar 2, 2015

Interesting! I believe adding more primitive layers would be nicer from a research perspective, and reusing them makes the source code more readable, clearer, and easier to maintain.

I will consider separating that piece of code out into a DiagonalInnerProductLayer when your PR arrives, if I can do so without increasing the computational cost.

Thanks, Jeff!

@tnarihi
Contributor Author

tnarihi commented Mar 4, 2015

Changed to use a FillerParameter to set the initial values of the negative slopes.

@hycis

hycis commented Mar 5, 2015

Hi @tnarihi, I tried your PReLU on the cifar10 and mnist examples in Caffe and it gets worse results. With the cifar10 example train_full.sh, ReLU gets accuracy 0.8181; after I change all ReLUs to PReLUs, it drops to 0.7495. Any idea why?

@tnarihi
Contributor Author

tnarihi commented Mar 5, 2015

Thanks for reporting, @hycis. I only tried the cifar_quick example. Let me try to reproduce the results, maybe this weekend. In the meantime, please try different learning rates and initializations.
@jyegerlehner If you have any comments, they would be helpful. Thanks!

@jyegerlehner
Contributor

@tnarihi, @hycis, the improved performance I alluded to above was from a model that had a couple of things different from the one I was comparing it to; one of those things was the use of PReLU. I haven't done a clean A/B comparison of PReLU vs leaky ReLU. Sorry if I wrongly attributed the improvement to the PReLU. I will try a clean comparison of leaky ReLU vs PReLU on my own model.

@ducha-aiki
Contributor

@jyegerlehner There is also a "Very Leaky ReLU", where the negative slope is ~0.1-0.3 rather than 0.01.
It has been used in the Kaggle CIFAR-10 competition (http://blog.kaggle.com/2015/01/02/cifar-10-competition-winners-interviews-with-dr-ben-graham-phil-culliton-zygmunt-zajac/ ). So the improved performance in the MSRA paper could also come from this rather than from the learnability of the slope. However, a learned parameter is much better than manual dark magic.

@jyegerlehner
Contributor

Here's a result comparing the training of two models that are identical except that leaky ReLUs with negative_slope=0.1 are switched out for PReLUs with initial_value=0.1 (I was running the patch from before initial_value was changed over to filler). Both models were initialized from the same .caffemodel (using ../../../caffe/build/tools/caffe-d train --solver=solver.prototxt --weights=net.caffemodel) with the same solver parameters, so the initial state of the two should be identical.

Edit: Updated charts to reflect behavior of latest PR code.

[chart: training loss vs. iterations, leaky ReLU vs. PReLU]

This is the kind of observation that led to my "improved behavior" comment above.

However, there seems to be a problem. I notice that the initial loss computed by Caffe was a bit different for the two cases (25.17 for ReLU, 25.62 for PReLU). It should be identical, since a PReLU with initial_value = 0.1 ought to forward propagate identically to a leaky ReLU with negative_slope = 0.1 before any training has happened, unless I'm confused about PReLUs. Furthermore, setting the learning rate for the PReLU to zero ought to make it behave identically to the ReLU if its initial_value is set to the ReLU's negative_slope. In other words, I have layers in the ReLU net like this:

layer {
  name: "encode1_relu"
  type: "ReLU"
  bottom: "encode1_conv"
  top: "encode1_conv"
  relu_param: {
    negative_slope: 0.1
  }
}

And in the PReLU version, I replace those with:

layer {
  name: "encode1_prelu"
  type: "PReLU"
  bottom: "encode1_conv"
  top: "encode1_conv"
  param {
    lr_mult: 0
  }
  prelu_param: {
    filler { value: 0.1 type: "constant" }
  }
}

So if I do that and run the same test, I should see a loss-vs-iterations curve identical to the ReLU version. I did this, and it turns out they are different.

[chart: training loss vs. iterations, ReLU with negative_slope 0.1 vs. PReLU with lr_mult 0]

So I think this suggests we need to look at the forward and gradient tests of PReLU. We could write a test that forward propagates through a leaky ReLU and a PReLU and asserts they produce the same result (assuming the ReLU's negative_slope is set to the PReLU's initial_value), although I imagine the forward tests must already assert correct behavior. And the same for backward propagation, with the learning rate of the PReLU set to zero. I think I'll go try that.

Anyone see errors in my reasoning, or have better ideas?

Edit: I don't show the loss at iteration =0 just because it changes Y axis scaling enough to obscure the trends later.
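A rough numpy sketch of the consistency check proposed above (just the arithmetic, not the actual Caffe gtest): with the PReLU slope pinned at the leaky ReLU's negative_slope, the forward outputs and the input gradients should agree.

import numpy as np

slope = 0.1
x = np.random.randn(1000)
x = x[np.abs(x) > 1e-3]   # stay away from the kink at zero for the numeric gradient check

leaky = np.where(x > 0, x, slope * x)                  # leaky ReLU, negative_slope = 0.1
prelu = np.maximum(x, 0) + slope * np.minimum(x, 0)    # PReLU with its slope held at 0.1
assert np.allclose(leaky, prelu)                       # forward passes agree

# input gradient of the shared formula, checked against a central finite difference
f = lambda v: np.maximum(v, 0) + slope * np.minimum(v, 0)
eps = 1e-6
analytic = np.where(x > 0, 1.0, slope)
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
assert np.allclose(analytic, numeric, atol=1e-4)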

@jyegerlehner
Contributor

Thanks for pointing that out, @ducha-aiki. Yes, somehow I had stumbled across that. My ReLUs are already very leaky, using negative_slope = 0.1.

@ducha-aiki
Contributor

@jyegerlehner
Do you use in-place computation? Maybe the differences are caused by some issue with in-place computation in PReLU?

@tnarihi
Contributor Author

tnarihi commented Mar 8, 2015

@jyegerlehner Thanks for reporting your experiments; those are really useful. As you suggested, I've just added a new test case that checks whether PReLU produces numbers consistent with leaky ReLU (negative_slope=0.25) in Forward/Backward, and I confirmed the test passes. Please see the final commit I've just made and check whether my code is right.
Now I am suspicious about the behavior of in-place computation, as @ducha-aiki mentioned. I will soon add another test to check whether it works.
Thanks, collaborators!

@tnarihi
Contributor Author

tnarihi commented Mar 8, 2015

Sorry... the test was wrong, but now it passes anyway.

@tnarihi
Contributor Author

tnarihi commented Mar 8, 2015

Now I've figured out that something is wrong with in-place computation in the GPU backward pass. I will look into the GPU code.

[==========] 136 tests from 4 test cases ran. (27581 ms total)
[  PASSED  ] 134 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] NeuronLayerTest/2.TestPReLUInplace, where TypeParam = caffe::FloatGPU
[  FAILED  ] NeuronLayerTest/3.TestPReLUInplace, where TypeParam = caffe::DoubleGPU

 2 FAILED TESTS
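For context on why the in-place case is delicate, a toy numpy illustration (not the Caffe code): both backward terms of PReLU need the original bottom values, which an in-place forward overwrites, so the layer has to keep a copy (the bottom_memory_ blob discussed further down in this thread).

import numpy as np

a = 0.25
x = np.random.randn(8).astype(np.float32)          # bottom data (pre-activation)
saved = x.copy()                                   # the stashed copy an in-place layer must keep

x[...] = np.maximum(x, 0) + a * np.minimum(x, 0)   # in-place forward: bottom now holds top

top_diff = np.ones_like(x)
grad_a = np.sum(top_diff * np.minimum(saved, 0))   # slope gradient needs the original input
grad_x = top_diff * np.where(saved > 0, 1.0, a)    # and so does the input gradient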

@tnarihi
Contributor Author

tnarihi commented Mar 8, 2015

I found the bug. It was due to calling an incorrect API for copying data. Please see the commit message for details. Now it should work correctly. @jyegerlehner @zhongwen @hycis Please try the latest commit version.

@tnarihi
Contributor Author

tnarihi commented Mar 8, 2015

Sorry, I replied to another person.

@jyegerlehner
Contributor

@tnarihi said:

Now I figured out in-place computation in GPU backward was something wrong.

I found the bug.

Wow, that was quick; I had just finished changing over my net.prototxt to remove in-place computation. Most of what I had to say is stale now in light of your new work. Both the ReLU and PReLU models showed improved performance after removing in-place computation. As concerns the conclusions above, it doesn't make any difference: PReLU still performs better. And PReLU with learning rate = 0 and negative slope = 0.1 still gives slightly different results than ReLU with negative slope = 0.1. It's a small difference, but consistent. Not sure if it's worth going through to find the root cause of that difference. I didn't find anything wrong with the PReLU tests, so I'm inclined to think it's good. Could be a methodological error on my part.

Will pull your fix, restore the net.prototxt to in-place computation, and report back the results, which I expect will be identical to the not-in-place computation results.

Edit: Updated charts above with the new results. PReLU still better.

@jyegerlehner
Contributor

@hycis I reproduced the behavior you reported where MNIST/lenet and Cifar examples both perform worse when ReLUs are switched over to PReLUs. Note that both of those models use in-place computation for the ReLUs. I then pulled @tnarihi's latest change that fixes in-place computation, and repeated the test. I found that MNIST/lenet with PReLUs and ReLUs perform nearly identically to each other. In the case of Cifar, I found that PReLU has superior accuracy at 70K iterations: 0.8216 for PReLU model vs 0.8156 accuracy for ReLU. So I think we can call that one solved.

@futurely

It's great that the implementation has been shown to be correct. Will there be an end-to-end example that gets the same results as the paper? Thanks!

@ducha-aiki
Contributor

Will there be an end-to-end example that gets the same results as the paper?

@futurely Surely it would be nice, but these days even peer-reviewed papers are not required to check algorithms on ImageNet (He et al. report performance only on it), which takes >= 2 weeks of GPU work (pretty costly even if you have a free GPU and pay only for electricity).

@tnarihi
Contributor Author

tnarihi commented Mar 10, 2015

Agreed. After this PR is merged, it would be nice if someone reproduced the result of the paper and put it in the Model Zoo! I would like to keep this PR as is.

I came up with another possible fix for this PR. The paper says they don't use weight decay for PReLU:

It is worth noticing that we do not use weight decay (l2 regularization) when updating ai. A weight decay tends to push ai to zero, and thus biases PReLU toward ReLU. Even without regularization, the learned coefficients rarely have a magnitude larger than 1 in our experiments.

Wouldn't it be better to force weight decay for PReLU to be 0? Does anyone have comments or suggestions?

@ducha-aiki
Contributor

I think it is enough to set param { decay_mult: 0 } in an example that explains its usage.

}
}

TYPED_TEST(NeuronLayerTest, TestPReLUInplace) {
Contributor

capitalize Place (TestPReLUInPlace)

@tnarihi
Contributor Author

tnarihi commented Mar 12, 2015

@jeffdonahue Done.

@hycis

hycis commented Mar 12, 2015

@tnarihi I just tried decay_mult: 0 on cifar10_full, but I still only get 0.75. @jyegerlehner I wonder how you got 0.82?

layer {
  name: "relu3"
  type: "PReLU"
  bottom: "conv3"
  top: "conv3"
  param {
    decay_mult: 0
  }
}

@jyegerlehner
Contributor

@jyegerlehner I wonder how you got 0.82?

@hycis I'm running the experiment again. Will report back when it's done. I didn't change any hyperparameters. Just the out-of-the-box examples/cifar10/train_full.sh, with the ReLUs changed to default PReLUs. I also did not use tnarihi's trick of decay_mult = 0.0.

I'm not clear. Is your 0.75 PReLU accuracy after pulling tnarihi's in-place-computation fix?

jeffdonahue added a commit that referenced this pull request Mar 12, 2015
@jeffdonahue jeffdonahue merged commit c67a3fa into BVLC:master Mar 12, 2015
@jeffdonahue
Contributor

Thanks again for the layer and all the fixes @tnarihi.

@jyegerlehner
Contributor

@hycis This time the ReLU version produced 0.8172 accuracy at 70K iterations, and PReLU version produced 0.8177 at 70K iterations.

I could post the modified shell scripts and prototxt I used if that would help you to reproduce the result.

@hycis

hycis commented Mar 12, 2015

@jyegerlehner I am not sure why either. I did a pull and also did as you mentioned, but I'm just not getting the same result as you. It would be great if you could share your prototxt and shell scripts. Thanks. My email is hyciswu@gmail.com.

@jyegerlehner
Contributor

@hycis Well that's troubling. OK here's what I used:

https://gist.github.com/jyegerlehner/b2f073aa8e213f0a9167

Please let us know what you find. I'm worried perhaps I have an error on my end.

@hycis

hycis commented Mar 13, 2015

After I pulled and rebuilt on the latest commit, I was able to improve full cifar10 with PReLU from 0.7562 to 0.8184, and to 0.8193 with decay_mult=0, at 70000 iterations. Thanks @jyegerlehner and @tnarihi.

@jeffdonahue
Contributor

A bit of (mostly useless) Caffe trivia: I just realized that even before this PR we could already implement PReLU (very inefficiently) by composition, at least the !channel_shared version -- the "diagonal" multiplication is equivalent to a 1x1 convolution where num_output and group are both set to the number of input channels. But ConvolutionLayer isn't at all optimized for this case, as it loops over groups, so this PReLU layer is a lot faster. In case anyone is curious, I mean that if conv1 has C channels, this PReLU layer...:

layer {
  name: "conv1-prelu"
  type: "PReLU" param { decay_mult: 0 }
  bottom: "conv1"
  top: "conv1-prelu"
}

...is equivalent to this sequence of layers:

layer {
  name: "conv1-prelu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1-prelu1"
}
layer {
  name: "conv1-prelu2"
  type: "Power" power_param { scale: -1 }
  bottom: "conv1"
  top: "conv1-prelu2"
}
layer {
  name: "conv1-prelu3"
  type: "ReLU"
  bottom: "conv1-prelu2"
  top: "conv1-prelu3"
}
layer {
  name: "conv1-prelu4"
  type: "Convolution"
  bottom: "conv1-prelu3"
  top: "conv1-prelu4"
  param { decay_mult: 0 }
  convolution_param {
    bias_term: false
    weight_filler { type: "constant" value: 0.25 }
    kernel_size: 1
    group: C
    num_output: C
  }
}
layer {
  name: "conv1-prelu5"
  type: "Eltwise" eltwise_param { operation: SUM }
  bottom: "conv1-prelu1"
  bottom: "conv1-prelu4"
  top: "conv1-prelu"
}

To be honest though, when I tried both I got slightly different results, so I'm not 100% sure that's right, but I've already spent way more time on this than was warranted, so I won't look into it further...
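For anyone who wants to poke at this without wiring up the prototxt, here is a plain numpy paraphrase of what those five layers compute elementwise (the per-channel grouping is elided, and a stands in for the 1x1 convolution weight):

import numpy as np

def composed(x, a):
    branch1 = np.maximum(x, 0)      # conv1-prelu1: ReLU(x)
    branch2 = np.maximum(-x, 0)     # conv1-prelu2/3: Power(scale: -1) followed by ReLU
    branch2 = a * branch2           # conv1-prelu4: grouped 1x1 conv, constant weight, no bias
    return branch1 + branch2        # conv1-prelu5: elementwise SUM

y = composed(np.random.randn(10), 0.25)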

@shelhamer
Member

@jeffdonahue layer composition is a fine hobby. Thanks for commenting with the PReLU thoughts.

@hycis

hycis commented Mar 24, 2015

@tnarihi Just wondering, is there a way to output the learned slope coefficients of PReLU from the saved caffemodel?

@tnarihi
Contributor Author

tnarihi commented Mar 24, 2015

@hycis
Yes, but it seems like a general Caffe question. There isn't any special difference from other layers such as InnerProduct. If you work with the Caffe Python interface, do something like the following:

net = caffe.Net("<proto_path>", "<caffemodel_path>", caffe.TRAIN)
slopes_blob = net.layers['prelu1'][0]  # your prelu layer name
print slopes_blob.data  # This is a numpy array of the slopes

I haven't tested this script, but something like it should work.

@hycis

hycis commented Mar 24, 2015

@tnarihi Thanks for the quick reply.
I tried net.layers['prelu1'] but got an invalid index type error, so I tried net.params['prelu1'], which gives me some numbers. So I guess net.params corresponds to net.layers?

@tnarihi
Contributor Author

tnarihi commented Mar 24, 2015

Sorry, it should be params.
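For later readers, the corrected snippet in full (same placeholder paths and example layer name as above; likewise untested, just a sketch):

import caffe

net = caffe.Net("<proto_path>", "<caffemodel_path>", caffe.TRAIN)
slopes_blob = net.params['prelu1'][0]   # the PReLU layer's single parameter blob
print(slopes_blob.data)                 # numpy array of the learned negative slopes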

@hycis

hycis commented Mar 24, 2015

Hi @tnarihi,
I also observed that the PReLU units for the same feature map have the same slope coefficient? When I check net.params['prelu1'][0].shape for a PReLU layer after a convolution layer, the dimension is equal to the number of feature maps I set for the convolution layer.

@tnarihi
Contributor Author

tnarihi commented Mar 24, 2015

@hycis Right.

@happynear

Hi @tnarihi,
I am wondering why an additional blob "bottom_memory_" is used. When the layer is computed in-place, the bottom_data can be obtained as top_data / slope_data. My GPU memory is only 2 GB, and I do not want to spend more memory on an activation layer.

@tnarihi tnarihi deleted the prelu2 branch April 1, 2015 16:27
@tnarihi
Contributor Author

tnarihi commented Apr 1, 2015

Hi @happynear,

I think that's the case only if the negative slopes are all positive. If we allow slopes to be negative, we cannot recover the pre-activation values from top_data and slope_data alone. Another way to reduce memory consumption is to keep the bottom signs (positive or negative) in a 1-byte array (e.g. int8) instead of the actual values (Dtype = float, 4 bytes); then we can reconstruct the pre-activation values using the signs, top_data and slope_data. Actually, I have one idea in mind for removing the temporary memory (one of the authors of the original paper contacted me and kindly gave me advice), but it requires modifying the Caffe Net/Layer framework, and I don't have time to work on it. If you have any other ideas, I'd be happy to discuss.
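A small numpy illustration of that point (toy values only; the reconstruction assumes a nonzero slope):

import numpy as np

def prelu(x, a):
    return np.maximum(x, 0) + a * np.minimum(x, 0)

x = np.array([2.0, -4.0])

# With a positive slope the output keeps the input's sign, so top_data and
# slope_data are enough to recover the pre-activation:
top = prelu(x, 0.25)                                        # [ 2. , -1. ]
assert np.allclose(np.where(top > 0, top, top / 0.25), x)

# With a negative slope, two different inputs collapse onto the same output,
# so top_data and slope_data alone are no longer enough:
assert prelu(2.0, -0.5) == prelu(-4.0, -0.5) == 2.0         # two inputs, same output

# The int8 idea: one stored sign per element restores invertibility
signs = (x >= 0).astype(np.int8)                            # saved during the forward pass
top = prelu(x, -0.5)
assert np.allclose(np.where(signs == 1, top, top / -0.5), x)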

@happynear

@tnarihi
Yeah, I hadn't considered the negative case. The slopes did indeed come out negative in some cases I experimented with in Matlab. Could you tell me what the idea is?

@tnarihi
Contributor Author

tnarihi commented Apr 8, 2015

@happynear Sorry to be late.

  1. We create a global blob shared_buff to serve as a shared buffer.
  2. At every PReLU layer during the forward pass, we copy the pre-activation values (bottom values) into bottom.diff (bottom[0]->mutable_diff()), which is not used during the forward pass.
  3. During the backward pass, every layer that follows a PReLU copies its bottom.diff (PReLU's top.diff = PReLU's bottom.diff) to shared_buff (reshaping if necessary), in order to avoid overwriting the stored PReLU pre-activation.
  4. At every PReLU in backprop, we take the pre-activation values from shared_buff and use them for the backward diff computation.

One naive implementation of this is for every layer to copy its bottom.diff to shared_buff, but that involves unnecessary copying if the following layer is not a PReLU. Otherwise, we would need to implement some kind of communication interface so that layers know what their top/bottom layers are, or introduce switching variables so that layers know whether they should copy their bottom.diff to shared_buff or not.
Does that make sense?

EDIT: This still needs the additional shared_buff memory, but it is much smaller than keeping a buffer for every PReLU when we have many PReLUs.
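A toy numpy paraphrase of those four steps (plain arrays standing in for blobs and their diffs; a sketch of the idea, not a Caffe patch):

import numpy as np

a = 0.25
data = np.random.randn(4, 3).astype(np.float32)   # data blob shared in-place by PReLU and the layer above
diff = np.zeros_like(data)                        # the corresponding diff blob
shared_buff = np.empty_like(data)                 # step 1: one global scratch buffer

# ---- forward ----
diff[...] = data                                            # step 2: stash the pre-activation in the idle diff
data[...] = np.maximum(data, 0) + a * np.minimum(data, 0)   # in-place PReLU forward
# ... the next layer's forward consumes `data` here ...

# ---- backward ----
shared_buff[...] = diff                                     # step 3: rescue the stash before it is clobbered
diff[...] = 1.0                                             # the next layer writes its gradient into the shared diff

x = shared_buff                                             # step 4: PReLU reads its pre-activation back
grad_a = np.sum(diff * np.minimum(x, 0))                    # gradient for the slope
diff[...] = diff * np.where(x > 0, 1.0, a)                  # gradient w.r.t. PReLU's input, written in place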

@tnarihi
Contributor Author

tnarihi commented Apr 8, 2015

Now I think we have two choices:

  1. Use an int8 array (possibly even a bit array) to store the sign of the pre-activation, which reduces the extra memory by 75% (97% with a bit array).
  2. Store the pre-activation in bottom.diff (this needs modifications to Caffe itself beyond PReLULayer).

@futurely futurely mentioned this pull request Apr 9, 2015
@qingqing01

When I use PReLULayer to train an ImageNet model, the loss at the beginning is about 7. But when I fine-tune bvlc_reference_caffenet.caffemodel, the loss at the beginning is about 80! Why is that?

@tnarihi
Contributor Author

tnarihi commented May 5, 2015

Maybe the bigger loss is due to the much larger number of nonzero responses with PReLU, but I am not sure exactly. The reference model was trained with ReLU (not PReLU), which is equivalent to the initial state of a PReLU with the following setting:

layer {
  type: "PReLU"
  ....
  prelu_param { filler { type: "constant" value: 0 } }
}

You should start with this setting. The default value of the negative slope is 0.25.

@qingqing01

@tnarihi Thank you! I want to train a model with PReLU, and I used the reference model to run a fine-tuning experiment with a PReLU model. So I used the default value of the negative slope, 0.25.

@happynear

A new type of ReLU has been designed to address the overfitting problem: http://arxiv.org/abs/1505.00853.
Maybe we should open a new issue.

@qingqing01

@happynear Thank you. I have read this paper. In my experience, the initial negative slope should be adjusted if you fine-tune from a pre-trained model.
