Questions regarding backbone network #26

Closed
henriquepm opened this issue Feb 15, 2023 · 5 comments


@henriquepm

Hi! First of all thank you for the great quality of this work, both the paper and the code.
I have a couple of doubts regarding the backbone:

  1. As mentioned in issue Questions on architecture design choices #24, the image features in the repo come from concatenating the output of the second layer with the upsampled output of the third layer. The paper instead states that the features come from concatenating the output of the third layer with the upsampled output of the last layer, yielding feature maps of dimension C x H/8 x W/8, whereas the approach in the code would produce feature maps of dimension C x H/4 x W/4. Which of the two approaches produced the results reported in the paper? And does the difference have a significant effect on performance (if both have been tested)?
  2. The paper mentions that the ResNet-101 backbone is initialized from COCO pretraining, citing the DETR paper, while in the code the network is initialized from the torchvision default weights (ImageNet pretraining). In the experiments section of the paper, the effect of input resolution is discussed, and it is hypothesised that the decreasing performance at higher resolutions could be explained by worse transfer due to a mismatch with the pretraining scale. Do the results in that section come from the approach described in the paper (COCO pretraining) or the one in the code? If you have run experiments with both, does this make any significant difference?
    Thanks again.
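To make the two candidate resolutions concrete, here is a quick sketch. The input size is assumed for illustration, and the channel dimension C is left symbolic; this is just the spatial arithmetic behind the two readings, not code from the repo:

```python
# Assumed input resolution, for illustration only.
H, W = 448, 800

# Resolution stated in the paper: C x H/8 x W/8.
paper_hw = (H // 8, W // 8)
# Resolution I read from the code: C x H/4 x W/4.
code_hw = (H // 4, W // 4)

print(paper_hw)  # (56, 100)
print(code_hw)   # (112, 200)
```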
@aharley
Owner

aharley commented Feb 16, 2023

Thanks for these questions.

  1. The paper results come from this repo (or a slightly messier version of it). I will either update the dimension line in the paper, or add an experiment with H/8 x W/8. (Do you already know if H/8 x W/8 is much different?)
  2. That's a great point. I need to think and check back to see why the paper says COCO while the code clearly indicates ImageNet. It could be that we used COCO inits very early on, then switched to ImageNet while simplifying the codebase.

@henriquepm
Author

Thanks for the quick answer. I don't know at the moment; I'm planning to run some experiments with the backbones and wanted to understand the starting point as well as possible.

@aharley
Owner

aharley commented Feb 23, 2023

@henriquepm I'm coming back to this to check the /4 and /8 stuff. I added a bunch of shape prints to the forward of Encoder_res101, and right now I'm not sure why you said "the approach in the code will produce FM of dimension C x H/4 x W/4".

def forward(self, x):
    print('x in', x.shape)
    x1 = self.backbone(x)
    print('x1', x1.shape)
    x2 = self.layer3(x1)
    print('x2', x2.shape)
    x = self.upsampling_layer(x2, x1)
    print('x up', x.shape)
    x = self.depth_layer(x)
    print('x d', x.shape)
    return x

The output is:

x in torch.Size([6, 3, 448, 800])
x1 torch.Size([6, 512, 56, 100])
x2 torch.Size([6, 1024, 28, 50])
x up torch.Size([6, 512, 56, 100])
x d torch.Size([6, 128, 56, 100])

which looks like H/8, W/8 like the paper said. I may easily have missed something because I haven't used the repo in a little bit, so please let me know if you see something wrong.
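As a quick sanity check on those prints, the effective stride of the final encoder output relative to the network input (sizes copied from the output above):

```python
# Shapes copied from the printed output above.
in_h, in_w = 448, 800    # 'x in': network input
out_h, out_w = 56, 100   # 'x d': final encoder output

stride_h = in_h // out_h
stride_w = in_w // out_w
print(stride_h, stride_w)  # 8 8 -> H/8 x W/8, matching the paper
```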

@aharley aharley reopened this Feb 23, 2023
@henriquepm
Author

Hey, that looks totally right, sorry about that.
I went back to the notebook where I was dissecting the network: I was comparing the feature size against the output of the ResNet's first conv layer instead of the actual input, so I was missing a factor of 1/2.
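A minimal sketch of that off-by-2, assuming the standard stride-2 first conv in the ResNet stem:

```python
in_h = 448          # actual network input height
conv1_h = in_h // 2 # first conv output (already downsampled by 2)
feat_h = 56         # encoder feature height

true_stride = in_h // feat_h         # measured against the input: 8 -> H/8
apparent_stride = conv1_h // feat_h  # measured against conv1: 4 -> looks like H/4
print(true_stride, apparent_stride)
```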

@aharley
Owner

aharley commented Feb 23, 2023

Perfect, no problem. Thanks for confirming so quickly!

@aharley aharley closed this as completed Feb 23, 2023