Region proposal network (RPN) layer #7

Closed

0x00b1 opened this issue May 30, 2017 · 26 comments

Comments

0x00b1 (Contributor) commented May 30, 2017

The region proposal network (RPN) should take two inputs, image features (i.e., features extracted by ResNet) and ground-truth bounding boxes, and produce object proposals and corresponding “objectness” scores. I’m envisioning something like:

x = keras.layers.Input((223, 223, 3))

a = keras_resnet.ResNet50(x)

b = keras.layers.Input((None, 4))

y = keras_rcnn.layers.RPN((14, 14))([a, b])
0x00b1 changed the title from "Region proposal network (RPN)" to "Region proposal network (RPN) layer" on May 30, 2017
0x00b1 (Contributor, Author) commented May 30, 2017

Here are a few links with information about implementing layers with multiple inputs:

keras-team/keras#148
keras-team/keras#2364
keras-team/keras#3037

0x00b1 (Contributor, Author) commented May 30, 2017

This issue has information about implementing a loss function for an intermediate layer.

keras-team/keras#5563
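For context, the usual Keras pattern for this (a generic sketch, not keras_rcnn code) is to expose the intermediate tensor as a named model output and attach a loss to it at compile time:

import keras

x = keras.layers.Input((14, 14, 512))
# intermediate layer whose output we want to supervise directly
rpn_cls = keras.layers.Conv2D(9, (1, 1), activation="sigmoid", name="rpn_cls")(x)
pooled = keras.layers.GlobalAveragePooling2D()(rpn_cls)
head = keras.layers.Dense(2, activation="softmax", name="head")(pooled)

model = keras.models.Model(x, [rpn_cls, head])
model.compile(
    optimizer="adam",
    loss={"rpn_cls": "binary_crossentropy", "head": "categorical_crossentropy"},
    loss_weights={"rpn_cls": 1.0, "head": 1.0},
)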

JihongJu (Contributor) commented Jun 2, 2017

@0x00b1 Why would the RPN need the ground truth as input? As far as I understand, the RPN takes the conv features as input and predicts labels and bounding-box transforms for K anchors in each cell. I guess what you mean here is something like an AnchorTargetLayer, which produces anchor classification labels and bbox regression targets given the ground-truth bounding boxes?
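For reference, a minimal sketch of the head described above (the generic Faster R-CNN formulation with K = 9 anchors per cell, not the keras_rcnn API; the feature-map shape and the sigmoid objectness output are assumptions):

import keras

anchors = 9  # K anchors per feature-map cell

# backbone conv features (shape assumed here only to make the sketch concrete)
features = keras.layers.Input((14, 14, 512))

# standard Faster R-CNN RPN head: a 3x3 conv followed by two sibling 1x1 convs
shared = keras.layers.Conv2D(512, (3, 3), padding="same", activation="relu")(features)
objectness = keras.layers.Conv2D(anchors * 1, (1, 1), activation="sigmoid")(shared)  # label per anchor
deltas = keras.layers.Conv2D(anchors * 4, (1, 1))(shared)  # 4 box-transform values per anchor

rpn_head = keras.models.Model(features, [objectness, deltas])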

0x00b1 (Contributor, Author) commented Jun 2, 2017

Yeah, exactly. I imagined a layer that’d encapsulate AnchorLayer, AnchorTargetLayer, and ProposalLayer into one layer to circumvent the awkward train-and-predict step in Ross’ implementation. Unfortunately, it’s still unclear how this should be implemented. That’s why I’ve been implementing the Anchor and Proposal layers in parallel. What do you think?

JihongJu (Contributor) commented Jun 2, 2017

I do think it's a good idea to make the train-to-predict switching easier, but I'm not sure that encapsulating the Proposal Layer and the Anchor Target Layer into one is the way to go. Feeding the ground truth to the model as an input, rather than as labels, could be problematic during testing; I can hardly imagine what to feed as b at test time.

How about keeping the anchor ground-truth generation out of the model definition? What we want then becomes a loss function that calculates the losses given the "objectness" scores and the ground-truth bounding boxes directly. We can put the Anchor Target Layer inside such a loss function.

JihongJu (Contributor) commented Jun 5, 2017

@0x00b1 I am thinking of structuring it as:

x = keras.layers.Input((223, 223, 3))
a = keras_resnet.ResNet50(x)
[rpn_cls, rpn_reg] = keras_rcnn.layers.RegionProposalNetwork()(a)
rpn_pred = keras.layers.concatenate([rpn_cls, rpn_reg])
proposals = keras_rcnn.layers.ObjectProposal()([rpn_cls, rpn_reg])
[rcnn_cls, rcnn_reg] = keras_rcnn.layers.ROI([7, 7])([x, proposals])
model = Model(inputs=x, outputs=[rpn_pred, rcnn_cls, rcnn_reg])
model.compile(loss=[rpn_pred_loss, rcnn_cls_loss, rcnn_reg_loss], optimizer="adam")

And we have

def rpn_pred_loss(lambda_, *args, **kwargs):
    def f(y_true, y_pred):
        # separate y_pred into rpn_cls_pred and rpn_reg_pred
        rpn_cls_pred, rpn_reg_pred = separate_pred(y_pred)
        # convert y_true from gt_boxes to gt_anchors
        rpn_cls_gt, rpn_reg_gt = encode(y_true, rpn_cls_pred)
        # classification loss
        rpn_cls_loss = keras_rcnn.rpn.classification(anchors=9)(rpn_cls_gt, rpn_cls_pred)
        # regression loss
        rpn_reg_loss = keras_rcnn.rpn.regression(anchors=9)(rpn_reg_gt, rpn_reg_pred)

        return rpn_cls_loss + lambda_ * rpn_reg_loss
    return f

Then we can also extend this to Mask R-CNN, as simply as adding a mask branch and another loss, rcnn_mask_loss. What do you think?
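A rough sketch of what that extension could look like, reusing the names above; roi_features stands in for per-proposal pooled feature maps that the ROI layer would need to expose, and classes for the number of object classes (both hypothetical here):

# hypothetical mask branch: a small per-proposal conv head producing a sigmoid mask per class
rcnn_mask = keras.layers.TimeDistributed(
    keras.layers.Conv2D(classes, (1, 1), activation="sigmoid")
)(roi_features)

model = Model(inputs=x, outputs=[rpn_pred, rcnn_cls, rcnn_reg, rcnn_mask])
model.compile(loss=[rpn_pred_loss, rcnn_cls_loss, rcnn_reg_loss, rcnn_mask_loss], optimizer="adam")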

0x00b1 (Contributor, Author) commented Jun 6, 2017

@JihongJu I love this! Especially this:

x = keras.layers.Input((223, 223, 3))
a = keras_resnet.ResNet50(x)
[rpn_cls, rpn_reg] = keras_rcnn.layers.RegionProposalNetwork()(a)
rpn_pred = keras.layers.concatenate([rpn_cls, rpn_reg])
proposals = keras_rcnn.layers.ObjectProposal()([rpn_cls, rpn_reg])
[rcnn_cls, rcnn_reg] = keras_rcnn.layers.ROI([7, 7])([x, proposals])
model = Model(inputs=x, outputs=[rpn_pred, rcnn_cls, rcnn_reg])
model.compile(loss=[rpn_pred_loss, rcnn_cls_loss, rcnn_reg_loss], optimizer="adam")

0x00b1 (Contributor, Author) commented Jun 8, 2017

@JihongJu I started structuring this into code:

classes = 2

x = keras.layers.Input((224, 224, 3))

y = keras_resnet.ResNet50(x)

rpn_classification = keras.layers.Conv2D(9 * 1, (1, 1), activation="sigmoid")(y.layers[-2].output)

rpn_regression = keras.layers.Conv2D(9 * 4, (1, 1))(y.layers[-2].output)

rpn_prediction = keras.layers.concatenate([rpn_classification, rpn_regression])

proposals = keras_rcnn.layers.object_detection.ObjectProposal(300)([rpn_classification, rpn_regression])

y = keras_rcnn.layers.ROI((7, 7), 32)([x, proposals])
y = keras.layers.AveragePooling2D((7, 7))(y)
y = keras.layers.Dense(4096)(y)

score = keras.layers.Dense(classes, activation="softmax")(y)

boxes = keras.layers.Dense(4 * (classes - 1))(y)

model = keras.models.Model(x, [rpn_prediction, score, boxes])

model.compile(optimizer="adam", loss="mse")

JihongJu (Contributor) commented Jun 8, 2017

@0x00b1 Cool. Maybe a typo here

model.compile(optimizer="adam", loss="mse") # Should be rpn/rcnn losses
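Something along these lines, assuming the loss functions sketched earlier in the thread (the names and the balancing weight are placeholders, not existing keras_rcnn symbols):

model.compile(
    optimizer="adam",
    loss=[rpn_pred_loss(lambda_=10.0), rcnn_cls_loss, rcnn_reg_loss],
)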

jhung0 (Contributor) commented Jun 8, 2017

I think that y_true and y_pred should be the same shape. Right now in the tests, classification has

y_pred = keras.backend.variable(0.5 * numpy.ones((1, 4, 4, n_anchors)))
y_true = keras.backend.variable(numpy.ones((1, 4, 4, 2 * n_anchors)))

and regression has

y_pred = keras.backend.variable(0.5 * numpy.ones((1, 4, 4, 4 * n_anchors)))
y_true = keras.backend.variable(numpy.ones((1, 4, 4, 8 * n_anchors)))

jhung0 (Contributor) commented Jun 8, 2017

https://github.com/mitmul/chainer-faster-rcnn/blob/v2/models/region_proposal_network.py has output space 2 * n_anchors for classification and 4 * n_anchors for regression. So:

rpn_classification = keras.layers.Conv2D(9 * 2, (1, 1), activation="softmax")(y.layers[-2].output)

jhung0 (Contributor) commented Jun 8, 2017

my edits:

classes = 2

x = keras.layers.Input((224, 224, 3))

y = keras_resnet.ResNet50(x, include_top=False)

rpn_classification = keras.layers.Conv2D(9 * 2, (1, 1), activation="softmax")(y)

rpn_regression = keras.layers.Conv2D(9 * 4, (1, 1))(y)

JihongJu (Contributor) commented Jun 8, 2017

@jhung0 To answer your question about the y_true shape: the first anchors values (the first half of the channels) indicate whether each anchor is taken into account (1) or not (0). I think this implementation originally came from keras-frcnn, which I think is quite ugly. You could refer to the Anchor Layer for how the anchor target is generated for us.
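A sketch of how a masked classification loss could consume that layout (the keras-frcnn-style layout described above; this is not the keras_rcnn implementation):

import keras.backend as K

def rpn_classification_loss(anchors=9):
    # assumed layout: y_true has 2 * anchors channels per cell; the first `anchors` channels
    # flag whether an anchor contributes to the loss, the last `anchors` channels hold the
    # objectness targets; y_pred has `anchors` sigmoid scores
    def f(y_true, y_pred):
        indicator = y_true[..., :anchors]  # 1 = anchor counts, 0 = ignored
        target = y_true[..., anchors:]     # objectness targets
        p = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
        bce = -(target * K.log(p) + (1.0 - target) * K.log(1.0 - p))
        return K.sum(indicator * bce) / K.maximum(K.sum(indicator), 1.0)
    return f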

JihongJu (Contributor) commented Jun 8, 2017

@jhung0 Yes, indeed, and we already have some work done by @0x00b1: Anchor

JihongJu (Contributor) commented Jun 9, 2017

@0x00b1 And we missed a pooling layer before the R-CNN FC layers, since the ROI layer outputs fixed-size feature maps

y = keras_rcnn.layers.ROI((7, 7), 32)([x, proposals])
y = keras.layers.AveragePooling2D(pool_size=(7, 7))(y)
y = keras.layers.Dense(4096)(y)

jhung0 (Contributor) commented Jun 9, 2017

@JihongJu I don't think we need that weird y_true shape...? The losses seem to depend only on the values for anchors that are taken into account, like in https://github.com/rbgirshick/py-faster-rcnn/blob/master/models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt#L465

JihongJu (Contributor) commented Jun 9, 2017

@jhung0 For rpn_cls_score, I think what matters is whether we want to use the softmax loss or the logistic loss. Because the RPN has only one foreground class, I don't see a particular reason why we should use the softmax loss. What do you think, @0x00b1?
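For concreteness, the two head variants being compared (a sketch; the feature-map shape is assumed):

import keras

features = keras.layers.Input((14, 14, 512))  # backbone feature map (shape assumed)

# logistic: a single sigmoid "objectness" score per anchor
rpn_cls_sigmoid = keras.layers.Conv2D(9 * 1, (1, 1), activation="sigmoid")(features)

# softmax: two scores (background, object) per anchor, as in py-faster-rcnn; a reshape is
# needed so the softmax runs over the two classes of each anchor rather than over all 18 channels
rpn_cls_softmax = keras.layers.Conv2D(9 * 2, (1, 1))(features)
rpn_cls_softmax = keras.layers.Reshape((-1, 2))(rpn_cls_softmax)
rpn_cls_softmax = keras.layers.Activation("softmax")(rpn_cls_softmax)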

JihongJu (Contributor) commented Jun 9, 2017

@jhung0 I agree with you. y_true should have a shape as simple as (anchors,), but it will contain 0, 1, and -1. We probably need a loss that can ignore the -1s in y_true.
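A sketch of such a loss under that convention (1 = object, 0 = background, -1 = ignore; the function name is a placeholder):

import keras.backend as K

def rpn_objectness_loss(y_true, y_pred):
    # anchors labelled -1 contribute nothing to the loss
    valid = K.cast(K.not_equal(y_true, -1), K.floatx())
    target = K.cast(K.equal(y_true, 1), K.floatx())
    p = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    bce = -(target * K.log(p) + (1.0 - target) * K.log(1.0 - p))
    return K.sum(valid * bce) / K.maximum(K.sum(valid), 1.0)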

0x00b1 (Contributor, Author) commented Jun 9, 2017

@0x00b1 And we missed a pooling layer before the R-CNN FC layers, since the ROI layer outputs fixed-size feature maps

Nice catch. Updated my earlier comment! 😎

0x00b1 (Contributor, Author) commented Jun 9, 2017

@jhung0 For rpn_cls_score, I think what matters is whether we want to use the softmax loss or the logistic loss. Because the RPN has only one foreground class, I don't see a particular reason why we should use the softmax loss. What do you think, @0x00b1?

Totally. We shouldn’t use softmax.

0x00b1 (Contributor, Author) commented Jun 9, 2017

@jhung0 To answer your question about the y_true shape: the first anchors values (the first half of the channels) indicate whether each anchor is taken into account (1) or not (0). I think this implementation originally came from keras-frcnn, which I think is quite ugly. You could refer to the Anchor Layer for how the anchor target is generated for us.

@jhung0 Yeah, I dislike the keras-frcnn implementation too. The Anchor layer should have more or less everything you need.

JihongJu (Contributor) commented Jun 9, 2017

@0x00b1 Another thing we missed here is that these four layers

y = keras.layers.AveragePooling2D((7, 7))(y)
y = keras.layers.Dense(4096)(y)
score = keras.layers.Dense(classes, activation="softmax")(y)
boxes = keras.layers.Dense(4 * (classes - 1))(y)

should be applied per proposal. We will need the TimeDistributed layer from keras for this purpose.
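Roughly what that wrapping would look like, assuming the ROI layer output has shape (batch, proposals, 7, 7, channels); a Flatten is added so the Dense layers see one vector per proposal (a sketch, not the final code):

y = keras.layers.TimeDistributed(keras.layers.AveragePooling2D((7, 7)))(y)
y = keras.layers.TimeDistributed(keras.layers.Flatten())(y)
y = keras.layers.TimeDistributed(keras.layers.Dense(4096))(y)

score = keras.layers.TimeDistributed(keras.layers.Dense(classes, activation="softmax"))(y)
boxes = keras.layers.TimeDistributed(keras.layers.Dense(4 * (classes - 1)))(y)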

JihongJu (Contributor) commented Jun 9, 2017

@0x00b1 I modified the code above for the ResNet and added it to #27.

emedinac commented Jul 3, 2017

Good evening, everyone.
I don't have the honor of being a contributor here and I'm not an expert in Keras programming, but I would like to suggest an idea about the RPN "layer".

I saw in the file "keras_rcnn/models.py" that the RPN is instantiated as a model and not as a layer, which was the original idea here in this amazing group. I know the RPN is based on a CNN, but I think it would be better if this block were instantiated as a layer, so it can be used as a module. So I propose the following (of course, as I said, I'm still not a good programmer like the people working here):
Thank you for reading this message.

class RPN(keras.engine.topology.Layer):
    def __init__(self, anchors=9 , **kwargs):
        self.anchors_cls = anchors * 1
        self.anchors_reg = anchors * 4
        super(RPN, self).__init__(**kwargs)

    def build(self, input_shape):
        self.channels = self.anchors_cls + self.anchors_reg
        super(RPN, self).build(input_shape)

    def call(self, inputs):
        # y = inputs.layers[-2].output
        y = inputs
        a = keras.layers.Conv2D(self.anchors_cls, (1, 1), activation="sigmoid")(y)
        b = keras.layers.Conv2D(self.anchors_reg, (1, 1))(y)

        y = keras.layers.concatenate([a, b]) 
        return y # [rpn_cls, rpn_reg]

    def compute_output_shape(self, input_shape):
        return None, input_shape[1], input_shape[2], self.channels  # shape=(?, 500, 375, 45) for VOC2012

For a non-concatenated output it could be this (I don't think this is elegant programming, but it worked in the cases I tested, such as concatenating the outputs again):

    def call(self, inputs):
        # y = inputs.layers[-2].output
        y = inputs

        a = keras.layers.Conv2D(self.anchors_cls, (1, 1), activation="sigmoid")(y)
        b = keras.layers.Conv2D(self.anchors_reg, (1, 1))(y)

        # y = keras.layers.concatenate([a, b]) 
        return [a,b] # [rpn_cls, rpn_reg]

    def compute_output_shape(self, input_shape):
        out1 = None, input_shape[1], input_shape[2], self.anchors_cls, 
        out2 = None, input_shape[1], input_shape[2], self.anchors_reg
        return [out1,out2]
    def compute_mask(self, inputs, mask=None):
        return 2 * [None]
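A hypothetical usage sketch of the proposed layer on top of the backbone used earlier in this thread (assuming keras_resnet.ResNet50(x, include_top=False) returns the feature tensor, as in the snippets above):

import keras
import keras_resnet

x = keras.layers.Input((224, 224, 3))
features = keras_resnet.ResNet50(x, include_top=False)

rpn_cls, rpn_reg = RPN(anchors=9)(features)  # second variant: separate classification and regression maps
# rpn_out = RPN(anchors=9)(features)         # first variant: one concatenated (cls, reg) map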
