Figure 1. Illustration of our DCANet. We visualize intermediate feature activations using Grad-CAM. The vanilla SE-ResNet50 varies its focus dramatically across stages. In contrast, our DCA-enhanced SE-ResNet50 progressively and recursively adjusts its focus and attends closely to the target object.
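The visualization can be reproduced with a few lines of PyTorch. Below is a minimal Grad-CAM sketch using forward/backward hooks; it is not the authors' visualization code, and it uses torchvision's plain ResNet50 as a stand-in for SE-ResNet50. The hooked stage (`layer4`) can be swapped to inspect how the focus shifts across stages.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

# Stand-in backbone; replace with an SE/DCA model to reproduce Figure 1.
model = resnet50(weights="IMAGENET1K_V1").eval()
feats, grads = {}, {}

def fwd_hook(module, inputs, output):
    feats["act"] = output.detach()          # cache stage activations

def bwd_hook(module, grad_input, grad_output):
    grads["grad"] = grad_output[0].detach()  # cache gradients w.r.t. activations

# Hook the last stage; use layer1..layer3 for earlier stages.
model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """image: normalized (1, 3, 224, 224) tensor -> (224, 224) heatmap in [0, 1]."""
    logits = model(image)
    idx = int(logits.argmax()) if class_idx is None else class_idx
    model.zero_grad()
    logits[0, idx].backward()
    weights = grads["grad"].mean(dim=(2, 3), keepdim=True)  # pooled gradients per channel
    cam = F.relu((weights * feats["act"]).sum(dim=1))       # weighted activation sum
    cam = F.interpolate(cam[None], size=image.shape[-2:], mode="bilinear")[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```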
Figure 2. An overview of our Deep Connected Attention Network. We connect the output of the transformation module in the previous attention block to the output of the extraction module in the current attention block. When there are multiple attention dimensions, we connect attentions along each dimension. Here we show an example with two attention dimensions; it extends to more dimensions.
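The connection pattern can be summarized in code. The following is a hypothetical sketch of a DCA-enhanced SE-style attention block, assuming simple additive fusion and matching channel widths between consecutive blocks (the paper addresses dimension mismatches); names such as `DCASEAttention` are illustrative, not from this repository.

```python
import torch
import torch.nn as nn

class DCASEAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)  # transformation, part 1
        self.fc2 = nn.Linear(channels // reduction, channels)  # transformation, part 2
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, prev=None):
        b, c, _, _ = x.shape
        extracted = x.mean(dim=(2, 3))      # extraction: global average pooling
        if prev is not None:
            extracted = extracted + prev    # DCA connection: fuse previous transformation output
        transformed = self.fc2(self.relu(self.fc1(extracted)))
        scale = torch.sigmoid(transformed).view(b, c, 1, 1)
        return x * scale, transformed       # expose transformation output for the next block
```

Chaining blocks then looks like `y1, a1 = block1(x1)` followed by `y2, a2 = block2(x2, prev=a1)`, so each block's transformation output feeds into the next block's extraction output.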
In this repository, all models are implemented in PyTorch.
We use the standard data augmentation strategies used for ResNet training, sketched below.
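Concretely, the usual ResNet recipe in torchvision looks like the following sketch; the hyperparameters are the common ImageNet defaults, not values read from this repository.

```python
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),   # random scale/aspect-ratio crop
    transforms.RandomHorizontalFlip(),   # mirror with p=0.5
    transforms.ToTensor(),
    normalize,
])

val_transform = transforms.Compose([
    transforms.Resize(256),              # shorter side to 256
    transforms.CenterCrop(224),          # single center crop (cf. Table 1)
    transforms.ToTensor(),
    normalize,
])
```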
To reproduce our DCANet results, please refer to Usage.md.
😊 All trained models and training log files have been uploaded to an anonymous Google Drive.
😊 We provide the corresponding links in the "Download" column.
Table 1. Single-crop classification accuracy (%) on the ImageNet validation set. We re-train models using the PyTorch framework and report the results in the "Re-Implement" columns; the corresponding DCANet variants are reported in the "DCANet" columns. The best results are in bold. "-" indicates no experiment, since our DCA module is designed to enhance attention blocks, which do not exist in the base networks.
|  | Re-Implement |  |  |  |  | DCANet |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Model | Top-1 | Top-5 | GFLOPs | Params | Download | Top-1 | Top-5 | GFLOPs | Params | Download |
| ResNet50 | 75.90 | 92.72 | 4.12 | 25.56M | model log | - | - | - | - | - |
| SE-ResNet50 | 77.29 | 93.65 | 4.13 | 28.09M | model log | 77.55 | 93.77 | 4.13 | 28.65M | model log |
| SK-ResNet50 | 77.79 | 93.76 | 5.98 | 37.12M | model log | 77.94 | 93.90 | 5.98 | 37.48M | model log |
| GEθ-ResNet50 | 76.24 | 92.98 | 4.13 | 25.56M | model log | 76.75 | 93.36 | 4.13 | 26.12M | model log |
| GC-ResNet50 | 74.90 | 92.28 | 4.13 | 28.11M | model log | 75.42 | 92.47 | 4.13 | 28.63M | model log |
| CBAM-ResNet50 | 77.28 | 93.60 | 4.14 | 28.09M | model log | 77.83 | 93.72 | 4.14 | 30.90M | model log |
| Mnas1_0 | 71.72 | 90.32 | 0.33 | 4.38M | model log | - | - | - | - | - |
| SE-Mnas1_0 | 69.69 | 89.12 | 0.33 | 4.42M | model log | 71.76 | 90.40 | 0.33 | 4.48M | model log |
| GEθ-Mnas1_0 | 72.72 | 90.87 | 0.33 | 4.38M | model log | 72.82 | 91.18 | 0.33 | 4.48M | model log |
| CBAM-Mnas1_0 | 69.13 | 88.92 | 0.33 | 4.42M | model log | 71.00 | 89.78 | 0.33 | 4.56M | model log |
| MobileNetV2 | 71.03 | 90.07 | 0.32 | 3.50M | model log | - | - | - | - | - |
| SE-MobileNetV2 | 72.05 | 90.58 | 0.32 | 3.56M | model log | 73.24 | 91.14 | 0.32 | 3.65M | model log |
| SK-MobileNetV2 | 74.05 | 91.85 | 0.35 | 5.28M | model log | 74.45 | 91.85 | 0.36 | 5.91M | model log |
| GEθ-MobileNetV2 | 72.28 | 90.91 | 0.32 | 3.50M | model log | 72.47 | 90.68 | 0.32 | 3.59M | model log |
| CBAM-MobileNetV2 | 71.91 | 90.51 | 0.32 | 3.57M | model log | 73.04 | 91.18 | 0.34 | 3.65M | model log |
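For reference, the single-crop protocol behind Table 1 can be sketched as follows. This is the standard top-1/top-5 computation, not the repository's exact evaluation script, and it assumes a validation loader built with the Resize(256)/CenterCrop(224) pipeline shown above.

```python
import torch

@torch.no_grad()
def accuracy(model, loader, device="cuda"):
    """Returns (top-1, top-5) accuracy in percent over the loader."""
    model.eval().to(device)
    top1 = top5 = total = 0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        logits = model(images)
        _, pred = logits.topk(5, dim=1)          # top-5 predicted class indices
        correct = pred.eq(targets.view(-1, 1))   # (batch, 5) boolean matches
        top1 += correct[:, 0].sum().item()
        top5 += correct.any(dim=1).sum().item()
        total += targets.size(0)
    return 100.0 * top1 / total, 100.0 * top5 / total
```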
Table 2. Detection performance (%) with different backbones on the MS-COCO validation set. We employ two state-of-the-art detectors, RetinaNet and Cascade R-CNN, in our detection experiments.
| Detector | Backbone | AP(50:95) | AP(50) | AP(75) | AP(s) | AP(m) | AP(l) | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet | ResNet50 | 36.2 | 55.9 | 38.5 | 19.4 | 39.8 | 48.3 | model log |
| RetinaNet | SE-ResNet50 | 37.4 | 57.8 | 39.8 | 20.6 | 40.8 | 50.3 | model log |
| RetinaNet | DCA-SE-ResNet50 | 37.7 | 58.2 | 40.1 | 20.8 | 40.9 | 50.4 | model log |
| Cascade R-CNN | ResNet50 | 40.6 | 58.9 | 44.2 | 22.4 | 43.7 | 54.7 | model log |
| Cascade R-CNN | GC-ResNet50 | 41.1 | 59.7 | 44.6 | 23.6 | 44.1 | 54.3 | model log |
| Cascade R-CNN | DCA-GC-ResNet50 | 41.4 | 60.2 | 44.7 | 22.8 | 45.0 | 54.2 | model log |
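The AP columns in Table 2 follow the standard COCO protocol, which can be reproduced with pycocotools as sketched below; the annotation and detection file paths are placeholders, not files shipped with this repository.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")  # ground-truth annotations
coco_dt = coco_gt.loadRes("detections.json")          # detector outputs in COCO format

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP(50:95), AP(50), AP(75), AP(s), AP(m), AP(l)
```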