This work is very helpful for my research. I am training detectors using a ViT backbone. I have used RFLA with both a ResNet and a ViT backbone, and in both cases it improves small-object detection accuracy compared to NWD-RKA.
However, this work builds on the ERF/TRF of a ResNet, which is modeled as a Gaussian derived from the series of convolutional layers in the ResNet. ViTs offer no comparably clear way to attribute a receptive field to each pyramid level of an FPN built on the ViT output (e.g. https://openreview.net/pdf?id=Gl8FHfMVTZu). I'm curious whether you have any suggestions for modifying the ERF calculation for a ViT.
Thanks!
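For context, the theoretical receptive field (TRF) that RFLA starts from can be computed layer by layer with the standard recurrence r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where k is the kernel size, s the stride, and j the cumulative stride. The sketch below applies it to the main path of a ResNet-50, assuming the torchvision layout (stride placed on the 3x3 conv of each bottleneck). It is an illustration only and does not reproduce how the RFLA repo turns these values into Gaussian ERF priors.

```python
from typing import List, Sequence, Tuple

# (kernel_size, stride) of one layer on the longest path through the network
Layer = Tuple[int, int]


def theoretical_rf(layers: Sequence[Layer], r: int = 1, j: int = 1) -> Tuple[int, int]:
    """Propagate TRF size r and cumulative stride j through a stack of layers."""
    for k, s in layers:
        r = r + (k - 1) * j  # each layer widens the field by (k - 1) steps of the current stride
        j = j * s            # strides compound multiplicatively
    return r, j


def bottleneck_stage(num_blocks: int, first_stride: int) -> List[Layer]:
    """Main path of a ResNet bottleneck stage (1x1 -> 3x3 -> 1x1), stride on the 3x3."""
    layers: List[Layer] = []
    for b in range(num_blocks):
        stride = first_stride if b == 0 else 1
        layers += [(1, 1), (3, stride), (1, 1)]
    return layers


def resnet50_stage_trfs() -> List[int]:
    """TRF (in input pixels) at the outputs of C2..C5 of a torchvision-style ResNet-50."""
    r, j = theoretical_rf([(7, 2), (3, 2)])  # 7x7/2 stem conv, then 3x3/2 max-pool
    trfs = []
    for num_blocks, first_stride in [(3, 1), (4, 2), (6, 2), (3, 2)]:
        r, j = theoretical_rf(bottleneck_stage(num_blocks, first_stride), r, j)
        trfs.append(r)
    return trfs


print(resnet50_stage_trfs())  # [35, 91, 267, 427]
```

The shortcut branches pass through fewer layers than the main path, so the main path sets the TRF. The effective receptive field is typically much smaller than the TRF and concentrated near its center, which is why a Gaussian is a natural model for it.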
Very interesting question. At present, it is hard to estimate the effective receptive field of ViTs. If you want to adapt the pipeline to ViT-based methods, a simple solution may be to directly use the receptive fields (from bottom to top) computed in this repo and discard the redundant ones. For example, if your ViT FPN has only 4 levels, you can use the receptive-field calculation for the lowest 4 levels from our code. However, I am not sure whether this will perform well.
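A minimal sketch of the suggested workaround, assuming you already have a list of per-level receptive-field sizes ordered bottom to top: keep only the lowest N entries, where N is the number of ViT FPN levels, and place a Gaussian prior at each feature point of each level. Everything here is hypothetical and not the repo's code: the function names, the `erf_ratio` factor used to shrink the TRF into a Gaussian standard deviation, and the convention that feature-point centers sit at `(i + 0.5) * stride`.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GaussianPrior:
    cx: float     # center x in input-image pixels
    cy: float     # center y in input-image pixels
    sigma: float  # isotropic standard deviation in pixels


def select_lowest_levels(rf_bottom_to_top: Sequence[float], num_levels: int) -> List[float]:
    """Keep the receptive fields of the lowest `num_levels` levels; drop the redundant top ones."""
    if num_levels > len(rf_bottom_to_top):
        raise ValueError("ViT FPN has more levels than available receptive-field entries")
    return list(rf_bottom_to_top[:num_levels])


def level_priors(feat_h: int, feat_w: int, stride: int, trf: float,
                 erf_ratio: float = 0.5) -> List[GaussianPrior]:
    """One Gaussian per feature point; sigma = erf_ratio * TRF / 2 (hypothetical mapping)."""
    sigma = erf_ratio * trf / 2.0
    return [GaussianPrior((x + 0.5) * stride, (y + 0.5) * stride, sigma)
            for y in range(feat_h) for x in range(feat_w)]


def vit_fpn_priors(img_h: int, img_w: int, strides: Sequence[int],
                   rf_bottom_to_top: Sequence[float],
                   erf_ratio: float = 0.5) -> List[List[GaussianPrior]]:
    """Gaussian priors for each ViT FPN level, reusing the lowest CNN receptive fields."""
    trfs = select_lowest_levels(rf_bottom_to_top, len(strides))
    priors = []
    for stride, trf in zip(strides, trfs):
        feat_h, feat_w = math.ceil(img_h / stride), math.ceil(img_w / stride)
        priors.append(level_priors(feat_h, feat_w, stride, trf, erf_ratio))
    return priors
```

For a 4-level ViT pyramid with strides (4, 8, 16, 32), you would pass those strides together with the repo's bottom-to-top receptive-field list; any entries above the fourth level are simply ignored.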