# ROI Pooling

**Inputs**

parameters:

```
- pooled_width_ (pooled_w): 7
- pooled_height_ (pooled_h): 7
- spatial_scale_ (spatial_scale): 0.0625  # 1/16 
```

data:

```
- bottom_rois: count = 1500, num = 300, each row [index, x1, y1, x2, y2], shape (300, 5)
- bottom_data: count = 972800 (1 * 512 * 38 * 50), num (batch_size) = 1, value = 0
- top_data: count = 7526400 (300 * 512 * 7 * 7), num = 300, initialized to -FLT_MAX (-3.40282e+38)
- argmax_data: count = top_count, initialized to -1
```
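The blob counts above follow directly from the shapes. A quick arithmetic check, assuming the 512 x 38 x 50 feature map used in the worked example later in these notes:

```python
# Blob sizes implied by the parameters above (values taken from this example).
num_rois = 300
batch_size = 1
channels, height, width = 512, 38, 50   # feature map for a 600 x 800 image
pooled_h, pooled_w = 7, 7

bottom_rois_count = num_rois * 5                               # [index, x1, y1, x2, y2]
bottom_data_count = batch_size * channels * height * width
top_count = num_rois * channels * pooled_h * pooled_w          # argmax_data has the same count

print(bottom_rois_count, bottom_data_count, top_count)  # 1500 972800 7526400
```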


**Process**

objects:

- image:
- rois:
- pools:

parameters:

- heights: height_, pooled_height_, ph
- widths: width_, pooled_width_, pw
- rois: roi_height, roi_width

For each RoI (with n = 300, the loop runs 300 times):

```
    3.1. bottom_roi: [roi_batch_ind, roi_start_w, roi_start_h, roi_end_w, roi_end_h]

      - roi_batch_ind = bottom_rois[0]
      - roi_start_w = round(bottom_rois[1] * spatial_scale_)
      - roi_start_h = round(bottom_rois[2] * spatial_scale_)
      - roi_end_w = round(bottom_rois[3] * spatial_scale_)
      - roi_end_h = round(bottom_rois[4] * spatial_scale_)
      
    3.2. check roi_batch_ind

      - roi_batch_ind >= 0
      - roi_batch_ind < batch_size

    3.3. get roi_height, roi_width (each forced to be at least 1)

      - roi_height = max(roi_end_h - roi_start_h + 1, 1)
      - roi_width = max(roi_end_w - roi_start_w + 1, 1)
      
      - bin_size_h = roi_height / pooled_height_
      - bin_size_w = roi_width / pooled_width_
      
    3.4. batch_data
    
      - batch_data = bottom_data + bottom[0]->offset(roi_batch_ind);
      
    3.5. roi pooling
    
      for channels_ (each: feature_map)
      for pooled_height_ (each: ph)
      for pooled_width_ (each: pw)

          # bin_size_h / bin_size_w: fractional step per output bin
          # floor the start and ceil the end so the bins cover the whole RoI
          - hstart = floor(ph * bin_size_h)
          - wstart = floor(pw * bin_size_w)
          - hend = ceil((ph + 1) * bin_size_h)
          - wend = ceil((pw + 1) * bin_size_w)
          
          # shift the bin offsets by the RoI origin, then clip to the feature map
          - hstart = min(max(hstart + roi_start_h, 0), height_)
          - hend = min(max(hend + roi_start_h, 0), height_)
          - wstart = min(max(wstart + roi_start_w, 0), width_)
          - wend = min(max(wend + roi_start_w, 0), width_)

          - bool is_empty = (hend <= hstart) || (wend <= wstart)

          # output index
          - const int pool_index = ph * pooled_width_ + pw

          - if (is_empty)
            - top_data[pool_index] = 0
            - argmax_data[pool_index] = -1

          - for h in [hstart, hend), for w in [wstart, wend)
            - index = h * width_ + w
            - if (batch_data[index] > top_data[pool_index])
              - top_data[pool_index] = batch_data[index]
              - argmax_data[pool_index] = index
```
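The steps above can be sketched in plain Python. This is a minimal, unoptimized sketch under stated assumptions, not the Caffe implementation: `roi_pool_forward` and `c_round` are hypothetical names, the feature map is a nested list indexed `[batch][channel][h][w]`, and the output is kept as nested lists rather than a flat buffer.

```python
import math

def c_round(x):
    """Round half away from zero like C's round(); Python's round() rounds half to even."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def roi_pool_forward(feature, rois, pooled_h, pooled_w, spatial_scale):
    """feature: [batch][channel][h][w]; rois: list of [index, x1, y1, x2, y2].
    Returns (top, argmax), both indexed [roi][channel][ph][pw]."""
    height, width = len(feature[0][0]), len(feature[0][0][0])
    top, argmax = [], []
    for roi in rois:
        batch_ind = int(roi[0])
        assert 0 <= batch_ind < len(feature)
        # Project the RoI from image coordinates onto the feature map.
        start_w, start_h, end_w, end_h = (c_round(v * spatial_scale) for v in roi[1:5])
        bin_h = max(end_h - start_h + 1, 1) / pooled_h
        bin_w = max(end_w - start_w + 1, 1) / pooled_w
        roi_top, roi_arg = [], []
        for fmap in feature[batch_ind]:
            # Empty bins keep these defaults (0 and -1).
            ch_top = [[0.0] * pooled_w for _ in range(pooled_h)]
            ch_arg = [[-1] * pooled_w for _ in range(pooled_h)]
            for ph in range(pooled_h):
                for pw in range(pooled_w):
                    hstart = min(max(math.floor(ph * bin_h) + start_h, 0), height)
                    hend = min(max(math.ceil((ph + 1) * bin_h) + start_h, 0), height)
                    wstart = min(max(math.floor(pw * bin_w) + start_w, 0), width)
                    wend = min(max(math.ceil((pw + 1) * bin_w) + start_w, 0), width)
                    if hend <= hstart or wend <= wstart:
                        continue
                    best, best_index = -math.inf, -1
                    for h in range(hstart, hend):
                        for w in range(wstart, wend):
                            if fmap[h][w] > best:
                                best, best_index = fmap[h][w], h * width + w
                    ch_top[ph][pw], ch_arg[ph][pw] = best, best_index
            roi_top.append(ch_top)
            roi_arg.append(ch_arg)
        top.append(roi_top)
        argmax.append(roi_arg)
    return top, argmax

# Tiny check: a 1 x 1 x 4 x 4 map whose cells store their own flat index.
feature = [[[[float(h * 4 + w) for w in range(4)] for h in range(4)]]]
top, argmax = roi_pool_forward(feature, [[0, 0, 0, 3, 3]], 2, 2, 1.0)
print(top[0][0])     # [[5.0, 7.0], [13.0, 15.0]]
print(argmax[0][0])  # [[5, 7], [13, 15]]
```

The sketch starts each bin at 0 / -1 and only overwrites non-empty bins, which gives the same result as initializing to -FLT_MAX and resetting empty bins afterwards.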

**Inputs**

feature maps:

- channels: 512
- height: 38
- width: 50

RoIs:

- num: 300
- format: [index, x1, y1, x2, y2]
- size: 300 * 5 = 1500

**Outputs**

Output shape: (N, C, H, W)

- N: # of RoIs
- C: channels
- H: pooled_height
- W: pooled_width
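Because the output is stored as one flat buffer in (N, C, H, W) order, each pooled value has a single flat position. The helper below is a hypothetical illustration of that layout (the name `top_offset` is not from the source); the per-channel `pool_index = ph * pooled_width_ + pw` is the last two terms of the same formula.

```python
def top_offset(n, c, ph, pw, channels, pooled_h, pooled_w):
    """Flat index of element (n, c, ph, pw) in a row-major (N, C, H, W) buffer."""
    return ((n * channels + c) * pooled_h + ph) * pooled_w + pw

# Moving to the next channel skips one 7 x 7 pooled map.
print(top_offset(0, 1, 0, 0, 512, 7, 7))      # 49
# The last element of a (300, 512, 7, 7) output is top_count - 1.
print(top_offset(299, 511, 6, 6, 512, 7, 7))  # 7526399
```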

# original image: (600, 800), as (height, width)
# feature map: (38, 50), as (height, width)
# spatial_scale = feature_map_size / original_size = 1/16

spatial_scale = 0.0625

# roi on original image -> roi on feature map

# roi_start_height = y1 * spatial_scale
# roi_start_width = x1 * spatial_scale
# roi_end_height = y2 * spatial_scale
# roi_end_width = x2 * spatial_scale

# original image (600, 800) -> feature map (38, 50) 
# - feature map: 512 * 38 * 50 = 972800

# feature map
# - channels: 512
# - height: 38
# - width: 50

# [x1, y1, x2, y2] -> [x1', y1', x2', y2']
# [149.366, 28.643, 799, 430.263] -> [9, 2, 50, 27]

roi_start_width = round(149.366 * 0.0625)   # 9
roi_start_height = round(28.643 * 0.0625)   # 2
roi_end_width = round(799 * 0.0625)         # 50
roi_end_height = round(430.263 * 0.0625)    # 27
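The projection can be wrapped in a small helper. One caveat when reproducing it in Python: C's `round()` rounds halves away from zero, while Python's built-in `round()` rounds halves to the nearest even integer, so the two can differ on exact `.5` values. The sketch below uses hypothetical helper names (`c_round`, `project_roi`) and assumes the C behaviour is the one wanted.

```python
import math

def c_round(x):
    """Round half away from zero, matching C's round()."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def project_roi(x1, y1, x2, y2, spatial_scale):
    """Map an RoI from image coordinates to feature-map coordinates."""
    return tuple(c_round(v * spatial_scale) for v in (x1, y1, x2, y2))

print(project_roi(149.366, 28.643, 799, 430.263, 0.0625))  # (9, 2, 50, 27)

# The two rounding rules disagree on ties: 40 * 0.0625 = 2.5
print(c_round(40 * 0.0625), round(40 * 0.0625))  # 3 2
```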


# roi_height = max(roi_end_h - roi_start_h + 1, 1);
# roi_width = max(roi_end_w - roi_start_w + 1, 1);
roi_height = max(27 - 2 + 1, 1)   # 26
roi_width = max(50 - 9 + 1, 1)    # 42

# bin_size_height = roi_height / pooled_height
# bin_size_width = roi_width / pooled_width
bin_size_height = 26 / 7   # 3.71429
bin_size_width = 42 / 7    # 6.0
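Since `bin_size_height` is fractional, flooring the start and ceiling the end makes neighbouring bins overlap by a row. A short sketch that lists the row range of every bin along the height for this RoI (offsets are relative to `roi_start_h`; variable names are illustrative):

```python
import math

roi_start_h, roi_end_h = 2, 27
pooled_h = 7

roi_height = max(roi_end_h - roi_start_h + 1, 1)   # 26
bin_size_h = roi_height / pooled_h                 # about 3.714

# Half-open [hstart, hend) row range of each bin, before shifting by roi_start_h.
rows = [(math.floor(ph * bin_size_h), math.ceil((ph + 1) * bin_size_h))
        for ph in range(pooled_h)]
print(rows)
# [(0, 4), (3, 8), (7, 12), (11, 15), (14, 19), (18, 23), (22, 26)]
```

Bins 0 and 1 both contain row 3, for example. Along the width the bin size is exactly 6, so those bins tile the RoI without overlap.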


# 1. find location
# 2. find value

for channels:
    for pooled_height:
        for pooled_width:
            
            # 1. find location
            # [wstart, hstart, wend, hend] = [0, 0, 6, 4]
            hstart = floor(ph * bin_size_h) = floor(0 * 3.71429) = 0
            wstart = floor(pw * bin_size_w) = floor(0 * 6) = 0
            hend = ceil((ph+1) * bin_size_h) = ceil((0 + 1) * 3.71429) = 4
            wend = ceil((pw+1) * bin_size_w) = ceil((0 + 1) * 6) = 6
            
            # shift by the RoI origin and clip: [0, 0, 6, 4] -> [9, 2, 15, 6]
            hstart = min(max(hstart + roi_start_h, 0), height_) = min(max(0 + 2, 0), 38) = 2
            hend = min(max(hend + roi_start_h, 0), height_) = min(max(4 + 2, 0), 38) = 6
            wstart = min(max(wstart + roi_start_w, 0), width_) = min(max(0 + 9, 0), 50) = 9
            wend = min(max(wend + roi_start_w, 0), width_) = min(max(6 + 9, 0), 50) = 15
            

            # 2. find value
            # check is_empty
            bool is_empty = (hend <= hstart) || (wend <= wstart) = (6 <= 2) || (15 <= 9) = false
            
            # pooling
            pool_index = ph * pooled_width_ + pw = 0 * 7 + 0 = 0   # ranges over 0..48 for the 49 bins
            
            for h in [hstart, hend) = [2, 6)
                for w in [wstart, wend) = [9, 15)
                
                    # first iteration: h = 2, w = 9
                    index = h * width_ + w = 2 * 50 + 9 = 109
                    
                    if (batch_data[index] > top_data[pool_index])   # batch_data[109] > top_data[0]
                        top_data[pool_index] = batch_data[index]    # top_data[0] = batch_data[109]
                        argmax_data[pool_index] = index             # argmax_data[0] = 109
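To see the first bin end to end, the trace above can be run on a synthetic feature map. This is an illustrative sketch only: the map is made up so that every cell stores its own flat index, which makes the expected maximum easy to verify by hand.

```python
height, width = 38, 50

# Synthetic single-channel feature map: cell (h, w) holds h * width + w.
fmap = [[float(h * width + w) for w in range(width)] for h in range(height)]

# Window of bin (ph, pw) = (0, 0) from the trace above.
hstart, hend, wstart, wend = 2, 6, 9, 15

best, best_index = -float("inf"), -1   # stands in for -FLT_MAX and -1
visited = []
for h in range(hstart, hend):
    for w in range(wstart, wend):
        index = h * width + w
        visited.append(index)
        if fmap[h][w] > best:
            best, best_index = fmap[h][w], index

print(visited[0])        # 109, the first index in the trace
print(len(visited))      # 24 cells: 4 rows x 6 columns
print(best, best_index)  # 264.0 264, i.e. cell (h=5, w=14)
```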
                        
            

