# Feature Pyramid Networks
In a ConvNet of the sort used for classification, the input is typically transformed into progressively lower-resolution feature maps, which, stacked on top of each other, form a pyramid of feature maps. This is a bottom-up pyramid, since the feature maps decrease in size along the feedforward pathway.

### Bottom-up pathway
- The bottom-up pathway is the feedforward pass of the backbone, during which the feature maps are downscaled by a factor of 2 at various points.
- Between these points there are typically sequences of layers that preserve the feature map size.
- Each such sequence is referred to as a network stage.
- The last feature map of each stage is used to build the feature pyramid, since the deepest layer of a stage should have the strongest (most semantically meaningful) features.
- For example, in ResNets the output of the last residual block of each stage is used.
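
To make the stage structure concrete, the sketch below computes the shape of the last feature map of each stage for a given input size. The stage names, strides, and channel counts are assumptions modelled on a ResNet-50-style backbone; they are not taken from these notes, and `bottom_up_shapes` is a hypothetical helper.

```python
# Assumed ResNet-50-like layout: (stage name, total stride, output channels).
STAGES = [("C2", 4, 256), ("C3", 8, 512), ("C4", 16, 1024), ("C5", 32, 2048)]


def bottom_up_shapes(height, width, stages=STAGES):
    """Return {stage: (channels, H, W)} for the last feature map of each stage."""
    shapes = {}
    for name, stride, channels in stages:
        # Ceiling division, assuming every stride-2 layer rounds odd sizes up.
        shapes[name] = (channels, -(-height // stride), -(-width // stride))
    return shapes


for name, shape in bottom_up_shapes(224, 224).items():
    print(name, shape)
```

Each stage halves the spatial size of the previous one, so for a 224 x 224 input the pyramid runs from 56 x 56 down to 7 x 7 under these assumptions.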



### Top-down pathway 
Feature pyramid networks construct a second pathway that reverses the order of the feature map scales, creating a top-down pyramid. This is akin to the encoder-decoder structure found in fully convolutional networks such as U-Net, with the main difference that the top-down pyramid does not mirror the structure of the bottom-up backbone network. The backbone can be any convolutional network that produces a pyramid of feature maps; it can have an arbitrary number of layers and different types of modules, such as residual blocks. The top-down pyramid, on the other hand, has a much simpler structure, consisting of one set of feature maps per pyramid level.

- The top-down pathway builds progressively larger feature maps, starting from the last and smallest feature map of the bottom-up pathway, i.e. from the top of the pyramid.
- The first feature map of the top-down pyramid is constructed by passing the smallest (lowest-resolution) feature map of the bottom-up pyramid through a 1 x 1 conv layer, which reduces the number of channels.
- Subsequent higher-resolution feature maps are generated as follows:
    - The lower-resolution feature map from the previous top-down level is upsampled to twice its spatial size.
    - The feature map of the corresponding size from the bottom-up pathway is passed through a 1 x 1 conv layer to reduce its number of channels to match that of the upsampled lower-resolution feature map.
    - These two sets of feature maps are merged by element-wise addition.
    - Finally, the merged feature maps are passed through a 3 x 3 conv layer in order to reduce the aliasing effect of upsampling.
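
The steps above can be sketched in NumPy as follows. This is a minimal illustration under several assumptions, not a reference implementation: the weights are random, biases are omitted, upsampling is nearest-neighbour, each spatial size is exactly twice the next, and the merged map is passed down before the 3 x 3 smoothing is applied. All function and variable names are hypothetical.

```python
import numpy as np


def conv1x1(x, weight):
    """1 x 1 conv: mix channels per pixel. x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def conv3x3(x, weight):
    """3 x 3 conv with zero padding of 1. weight: (C_out, C_in, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], window)
    return out


def upsample2x(x):
    """Nearest-neighbour upsampling by 2 in both spatial dimensions."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def top_down(bottom_up, lateral_w, smooth_w):
    """bottom_up: feature maps ordered from highest to lowest resolution."""
    # First top-down map: 1 x 1 conv on the smallest bottom-up map.
    merged = conv1x1(bottom_up[-1], lateral_w[-1])
    outputs = [merged]
    for i in range(len(bottom_up) - 2, -1, -1):
        lateral = conv1x1(bottom_up[i], lateral_w[i])  # match channel count
        merged = upsample2x(merged) + lateral          # element-wise addition
        outputs.insert(0, conv3x3(merged, smooth_w[i]))  # smooth the merged map
    return outputs


# Toy example: three stages, all reduced to d = 4 channels.
rng = np.random.default_rng(0)
channels, sizes, d = [8, 16, 32], [16, 8, 4], 4
bottom_up = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
lateral_w = [rng.standard_normal((d, c)) for c in channels]
smooth_w = [rng.standard_normal((d, d, 3, 3)) for _ in channels[:-1]]
pyramid = top_down(bottom_up, lateral_w, smooth_w)
print([p.shape for p in pyramid])
```

Every level of the resulting pyramid has the same number of channels, while the spatial sizes match those of the corresponding bottom-up stages.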
    

<img src='FPN.png'/>