
Dropout before max pooling killing embedding components during training #10

Closed
KieranLitschel opened this issue Jan 3, 2021 · 1 comment
KieranLitschel commented Jan 3, 2021

When a unit is dropped out, its value is set to 0. As we apply dropout directly to the word embeddings, for long input sequences it becomes increasingly likely that at least one value in each dimension will be set to zero. This means that negative components can often die: once a dropout zero is present in a dimension, that zero is taken as the maximum instead of any negative value, so those components are never selected, never receive a gradient, and get stuck at their negative values.
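A minimal, self-contained sketch of this failure mode (plain NumPy, not the project's actual model code): once dropout puts a single zero into a dimension, the max over the sequence is at least 0, so a negative component in that dimension can never win the max.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim, drop_rate = 50, 8, 0.8  # hypothetical sizes

# Hypothetical word embeddings for one input sequence, initialized around zero.
embedded_seq = rng.normal(loc=0.0, scale=0.1, size=(seq_len, emb_dim))

# Standard inverted dropout: zero out units, scale the survivors.
keep = rng.random((seq_len, emb_dim)) >= drop_rate
dropped = np.where(keep, embedded_seq / (1.0 - drop_rate), 0.0)

# SWEM-max style pooling over the sequence dimension.
pooled = dropped.max(axis=0)

# For a sequence this long, every dimension almost surely contains at least one
# dropout zero, so the pooled value is never negative and negative components
# are never the maximum (and hence never updated).
print("dims containing a dropout zero:", int((~keep).any(axis=0).sum()), "of", emb_dim)
print("min pooled value:", pooled.min())  # >= 0 whenever a zero is present
```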

This is particularly problematic as our distribution for initializing embeddings is centred at zero, meaning around half of the components are initialized to values below zero. The histogram below exemplifies this issue.

[Image: histogram of embedding component values with a dropout rate of 0.8]

One possible solution is to initialize all embedding weights with values greater than zero. This should significantly reduce the number of dying units, but units will still die if a gradient update pushes them below zero.
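For illustration, a sketch of what that could look like (assuming a Keras embedding layer with a uniform initializer; the project's actual framework and initializer are not shown in this issue):

```python
import tensorflow as tf

vocab_size, emb_dim = 10_000, 300  # hypothetical sizes

embedding = tf.keras.layers.Embedding(
    input_dim=vocab_size,
    output_dim=emb_dim,
    # Uniform over (0, 0.1] instead of a zero-centred distribution, so every
    # component starts above zero. Components can still die later if an
    # update pushes them below zero.
    embeddings_initializer=tf.keras.initializers.RandomUniform(minval=1e-6, maxval=0.1),
)
```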

A better solution would be to ignore zeros during the max-pooling operation, but this may slow down training significantly, which would make the first solution preferable.
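A sketch of how zeros could be ignored (again only an illustration, assuming TensorFlow; the helper name is made up): replace exact zeros with a very large negative number before taking the max, so a dropout zero can never be selected.

```python
import tensorflow as tf

def max_pool_ignoring_zeros(x, axis=1):
    """Max over `axis`, treating exact zeros (dropped units) as missing.

    x: float tensor, e.g. of shape (batch, seq_len, emb_dim) with axis=1.
    Caveat: if every value in a dimension happens to be dropped, the pooled
    value falls back to -1e9 rather than 0, and the extra masking work is
    done on every forward pass, which is the likely source of the slowdown.
    """
    very_negative = tf.zeros_like(x) - 1e9
    masked = tf.where(tf.equal(x, 0.0), very_negative, x)
    return tf.reduce_max(masked, axis=axis)
```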

@KieranLitschel KieranLitschel self-assigned this Jan 3, 2021
@KieranLitschel KieranLitschel changed the title Dropout before max pooling killing units during training Dropout before max pooling killing embedding components during training Jan 3, 2021
KieranLitschel commented

It seems like the main cause of the above distribution was too high a dropout rate. We were using a dropout rate of 0.8; after switching to a dropout rate of 0.2, we get the distribution below, which looks much better.

[Image: histogram of embedding component values with a dropout rate of 0.2]
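One rough way to see why the rate matters so much (a back-of-the-envelope sketch, not an analysis from the issue): if dropout happens to zero out every value of a dimension for a given example, that dimension pools to exactly 0 and its embeddings receive no gradient at all for that example. The chance of this is drop_rate ** seq_len, which is non-negligible at 0.8 but vanishing at 0.2.

```python
seq_len = 10  # hypothetical short input sequence

for drop_rate in (0.8, 0.2):
    # Probability that every value of a dimension is dropped for one example.
    p_all_dropped = drop_rate ** seq_len
    print(f"drop_rate={drop_rate}: P(whole dimension dropped) = {p_all_dropped:.1e}")

# drop_rate=0.8: P(whole dimension dropped) = 1.1e-01  (roughly 1 in 10)
# drop_rate=0.2: P(whole dimension dropped) = 1.0e-07  (essentially never)
```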

We explored shifting the centre of the initialization right by 0.05 so that all initialized values would be greater than or equal to zero. The distribution with this modification is shown below.

[Image: histogram of embedding component values with the initialization centre shifted right by 0.05]

We observe the same pattern as with the zero-centred distribution, with half the values appearing to have stayed at their initialized values. Surprisingly, the distributions look very similar, just with the centre shifted.

So it now seems more likely that this behaviour is caused by the max-pooling layer, with a lot of embedding components simply never being selected by the max, and so never updated, during training.
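This is consistent with how the gradient of a max pool behaves: only the position that wins the max receives any gradient. A small illustrative check (assuming TensorFlow; not code from this repository):

```python
import tensorflow as tf

# Hypothetical embeddings for one sequence of 50 tokens with 8 dimensions.
emb = tf.Variable(tf.random.normal((50, 8), stddev=0.1))

with tf.GradientTape() as tape:
    pooled = tf.reduce_max(emb, axis=0)  # SWEM-max over the sequence
    loss = tf.reduce_sum(pooled)         # dummy loss

grad = tape.gradient(loss, emb)
# Only one entry per dimension (the argmax) gets a non-zero gradient, so at
# most 8 of the 400 components are touched by this example's update.
print("non-zero gradient entries:",
      int(tf.math.count_nonzero(grad).numpy()), "of", emb.shape[0] * emb.shape[1])
```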

Hence this seems to be more a property of SWEM-max than a bug, so we are closing this issue.
