An experimental implementation to test variations of the squeeze operation in SENet (Squeeze-and-Excitation Networks).
This code was tested under the following environment:
- python=3.9.4
- pytorch=1.9.1
- torchvision=0.10.1
The figure above shows the core idea of SENet: the feature map is squeezed by taking a representative value (a scalar) per channel, i.e. `H x W x C -> 1 x 1 x C`.
In the original paper, the input-specific channel descriptor z is obtained with global average pooling (`nn.AdaptiveAvgPool2d` in PyTorch), which is simply the mean over each `H x W` feature map.
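For reference, a minimal SE block sketch using this GAP squeeze (names such as `SEBlock` and `reduction` are illustrative and not taken from this repository's code):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal SE block: squeeze with global average pooling, excite with a 2-layer MLP."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)          # (N, C, H, W) -> (N, C, 1, 1)
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        z = self.squeeze(x).view(n, c)                  # channel descriptor z
        s = self.excitation(z).view(n, c, 1, 1)         # per-channel scale in (0, 1)
        return x * s                                    # recalibrate the feature maps
```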
We conducted experiments with the same training recipe, varying only the squeeze operation. The squeeze operations we used are listed below (see the sketch after this list):
- baseline (no squeeze operation; naive architecture)
- gap (global average pooling)
- gmp (global max pooling)
- std (standard deviation of the pixels in `H x W`)
- gapXstd (gap * std)
- random (a random value drawn uniformly from `[0, 1)`, not computed from `H x W`)
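A rough sketch of how these variants can be computed from a feature map `x` of shape (N, C, H, W) is given below. The function name `squeeze` and the mode strings are illustrative; the actual repository code may differ. The baseline has no squeeze operation, so it does not appear here.

```python
import torch

def squeeze(x: torch.Tensor, mode: str) -> torch.Tensor:
    """Return an (N, C) channel descriptor from an (N, C, H, W) feature map."""
    n, c, h, w = x.shape
    flat = x.view(n, c, h * w)
    if mode == "gap":                       # global average pooling
        return flat.mean(dim=2)
    if mode == "gmp":                       # global max pooling
        return flat.max(dim=2).values
    if mode == "std":                       # per-channel standard deviation over H*W
        return flat.std(dim=2)
    if mode == "gapXstd":                   # element-wise product of gap and std
        return flat.mean(dim=2) * flat.std(dim=2)
    if mode == "random":                    # uniform noise in [0, 1), independent of x
        return torch.rand(n, c, device=x.device)
    raise ValueError(f"unknown squeeze mode: {mode}")
```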
- ResNet18 + ImageNet
Squeeze operation | Top-1 Acc (%) | Top-5 Acc (%) | Top-1 improvement (%p) |
---|---|---|---|
baseline | 65.71 | 86.29 | +0.00 |
gap | 66.91 | 87.35 | +1.20 |
gmp | 66.73 | 87.20 | +1.02 |
std | 66.47 | 86.94 | +0.76 |
gapXstd | 66.90 | 87.18 | +1.19 |
random | 65.36 | 86.22 | -0.35 |
- ResNet50 + ImageNet
Squeeze operation | Top-1 Acc (%) | Top-5 Acc (%) | Top-1 improvement (%p) |
---|---|---|---|
baseline | 70.94 | 89.84 | +0.00 |
gap | 70.12 | 89.34 | -0.82 |
gmp | 70.36 | 89.23 | -0.58 |
std | 70.34 | 89.33 | -0.60 |
gapXstd | 70.17 | 89.26 | -0.77 |
random | 68.99 | 88.71 | -1.95 |