
[WIP] Region Proposal Network (RPN) classification and regression losses #51

Conversation

@hgaiser (Contributor) commented Jul 26, 2017

Not sure if this PR is the best way to show what I had in mind, but let's give it a shot :)

This PR is mainly intended to show how I had more or less implemented the RPN losses in my own port. The losses probably make no mathematical sense at the moment, since I believe they receive different inputs here; they are mainly there to convey the idea.

I also changed the ResNet classifier according to the changes in broadinstitute/keras-resnet#24; I'm not sure the existing ResNet classifier in keras-rcnn was correct.

Also, the RPN layers appeared to be missing an initial convolutional layer, called rpn_conv_3x3 in the Caffe prototxts. In addition, the convolutional layer for the RPN class scores output num_anchors channels, but it should output num_anchors * 2, since it classifies two classes (fg/bg).
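For reference, here is a minimal sketch of what the corrected RPN head could look like; the rpn_head function, the 512 filters, and the default of 9 anchors are assumptions for illustration, while the layer names follow the Caffe prototxts:

import keras

def rpn_head(features, num_anchors=9):
    # Shared 3x3 convolution, called rpn_conv_3x3 in the Caffe prototxts.
    x = keras.layers.Conv2D(512, (3, 3), padding='same', activation='relu', name='rpn_conv_3x3')(features)
    # Objectness scores: two classes (fg/bg) per anchor, hence num_anchors * 2 channels.
    scores = keras.layers.Conv2D(num_anchors * 2, (1, 1), name='rpn_cls_score')(x)
    # Bounding box regression: four targets per anchor.
    deltas = keras.layers.Conv2D(num_anchors * 4, (1, 1), name='rpn_bbox_pred')(x)
    return scores, deltas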

These are the main contributions in this PR, aside from some minor corrections here and there.

I would gladly hear what you think, @0x00b1. Maybe we can discuss this further tomorrow on Slack?

@0x00b1 (Contributor) commented Jul 27, 2017

Awesome. Thanks, @hgaiser! Three immediate questions after a quick review:

  • Where would you generate anchors?
  • Would you mind providing (hypothetical) training and prediction examples so I can better imagine the API?
  • Do you envision training in two stages (RPN and RCNN)?

What’s your time zone? Let’s find a time to talk tomorrow (or Friday).

@0x00b1 (Contributor) commented Jul 27, 2017

cc: @JihongJu and @jhung0

@0x00b1 (Contributor) commented Jul 27, 2017

@hgaiser (et al.) I created a #keras-rcnn channel on the keras.io Slack! 😀

@hgaiser (Contributor, Author) commented Jul 27, 2017

> Awesome. Thanks, @hgaiser! Three immediate questions after a quick review:

> Where would you generate anchors?

It doesn't show in this PR, but since the base anchors have nothing to do with the input, I think it'd be better to generate them while creating the model, i.e., offline.
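To illustrate the offline idea, a hypothetical numpy sketch that generates the base anchors once, py-faster-rcnn style (the function name and the default size, ratios, and scales are assumptions):

import numpy as np

def generate_base_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    # One (x1, y1, x2, y2) box per ratio/scale combination, centered at the origin.
    anchors = []
    for ratio in ratios:
        for scale in scales:
            size = base_size * scale
            # Preserve the area while adjusting the aspect ratio (h / w = ratio).
            w = size * np.sqrt(1.0 / ratio)
            h = size * np.sqrt(ratio)
            anchors.append([-w / 2.0, -h / 2.0, w / 2.0, h / 2.0])
    return np.array(anchors)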

> Would you mind providing (hypothetical) training and prediction examples so I can better imagine the API?

Sure, I'll do it in this comment:
Compilation has to be done with loss=None, since this PR is using custom loss functions only:

model.compile(optimizer='adam', loss=None)

Basically there are three options for computing the RPN loss, none of which are really nice; Keras just doesn't have the tools to do loss computations like we need (see keras-team/keras#7395). The options are:

  1. "mangle" your input/output to fit the required y_true, y_pred. Not sure how this would work in this case because y_true cannot be known beforehand, it is depending on data generated inside the network.
  2. Compute the losses inside your network and use them as output, then use an identity loss (basically only a K.mean) to summarize the loss (see the sketch after this list).
  3. Compute the loss in a custom layer and set loss=None when compiling. This is the approach used in the Keras variational autoencoder example. I'm not a big fan, but it seems to be the closest thing to something that makes sense.
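A minimal sketch of the identity loss from option 2 (the name identity_loss is an assumption):

import keras.backend as K

def identity_loss(y_true, y_pred):
    # y_pred is already a loss tensor computed inside the network;
    # y_true is a dummy target that is simply ignored.
    return K.mean(y_pred)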

Preferably there will be a change in how Keras forces the loss to be computed in such a strict way. I wrote an issue (keras-team/keras#7395) about it, but so far no responses. Until then, option 3 seems like our best bet.
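And a minimal sketch of option 3, in the style of the Keras variational autoencoder example; the layer name and the binary cross-entropy are placeholders, the real RPN loss would combine a classification and a regression term:

import keras
import keras.backend as K

class RPNLossLayer(keras.layers.Layer):  # hypothetical name
    def call(self, inputs):
        y_true, y_pred = inputs
        # Register the loss on the layer itself; compile is then called with loss=None.
        loss = K.mean(K.binary_crossentropy(y_true, y_pred))
        self.add_loss(loss, inputs=inputs)
        # Pass the predictions through so the layer fits in the graph.
        return y_pred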

As for the training process itself, I had written a data generator that retrieves images and annotations from our server and serves the image batch plus the ground truth boxes. The training command would look something like this:

model.fit_generator(
    generator=train_datagen,
    steps_per_epoch=train_datagen.n // batch_size,
    epochs=100,
    verbose=1,
    validation_data=val_datagen,
    validation_steps=50,
    callbacks=[
        keras.callbacks.ModelCheckpoint('snapshots/resnet50.h5', monitor='val_loss', verbose=1, save_best_only=True),
        keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=8, verbose=1, mode='auto', epsilon=0.0001, cooldown=0, min_lr=0),
    ],
)
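For completeness, a rough sketch of what such a generator could yield; load_next_batch is a hypothetical helper, and the targets are None because the losses are computed inside the network (compile was called with loss=None):

def train_generator():
    while True:
        # Hypothetical helper returning an image batch and its ground truth boxes.
        image_batch, gt_boxes = load_next_batch()
        # Both go in as inputs; there are no external targets.
        yield [image_batch, gt_boxes], None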

> Do you envision training in two stages (RPN and RCNN)?

No, I think we should stick with end-to-end training.

> What’s your time zone? Let’s find a time to talk tomorrow (or Friday).

My timezone is UTC/GMT +2. You're in the States, I guess? :)

@hgaiser force-pushed the region-proposal-network-losses-hgaiser branch from 6284600 to d738b0e on Jul 27, 2017 at 12:14
@hgaiser force-pushed the region-proposal-network-losses-hgaiser branch from d738b0e to 5b764be on Jul 27, 2017 at 13:09
@hgaiser (Contributor, Author) commented Jul 27, 2017

I pushed an updated version. I mistakenly thought ObjectProposal was supposed to be analogous to anchor_target_layer in py-faster-rcnn, but it is similar to proposal_layer instead. Should we add a custom layer for the anchor target layer? Or would all three layers (anchor_target_layer, proposal_target_layer and proposal_layer) fit inside one layer?

@JihongJu (Contributor) commented
@hgaiser We used to have an anchor_target_layer alternative, but we decided to move anchor target generation into the RPN loss (#7). However, that direction didn't go as well as expected because of the check on the y_pred/y_true shapes.
We might want to go back to @0x00b1's original idea of implementing the loss for intermediate layers as regularization. In that case, we will probably need the anchor_target_layer back again.

@0x00b1 (Contributor) commented Jul 28, 2017

Hey, @hgaiser:

I really appreciate your feedback! I think we’re all circling around the same issues.

> It doesn't show in this PR, but since the base anchors have nothing to do with the input, I think it'd be better to generate them while creating the model, i.e., offline.

It’s certainly easier! Unfortunately, the downside is that you’re removing needed functionality from the compute graph produced by your Keras backend. Initially, we planned (and added) anchor generation inside the ObjectDetectionGenerator, but we quickly realized it’d break compatibility with downstream functionality that didn’t use Keras generators (e.g. Apple’s Core ML framework or Google’s Cloud ML service).

"mangle" your input/output to fit the required y_true, y_pred. Not sure how this would work in this case because y_true cannot be known beforehand, it is depending on data generated inside the network.

I’m probably misunderstanding your meaning, but if you’re talking about the shapes produced by the reference bounding box convolutional layer and the objectness convolutional layer, I believe they can be dynamically inferred:

(samples, anchors × 4, height / 4, width / 4)

and:

(samples, anchors × 1, height / 4, width / 4)

Is that what you mean?

> Compute the losses inside your network and use them as output, then use an identity loss (basically only a K.mean) to summarize the loss.

I’m curious, what do you think about this? I really like the idea of treating the RPN losses as regularization terms. Moreover, it’d simplify the training and testing differences.

> My timezone is UTC/GMT +2. You're in the States, I guess? :)

Yep! I'm in Boston.

@0x00b1 (Contributor) commented Jul 28, 2017

> @hgaiser We used to have an anchor_target_layer alternative, but we decided to move anchor target generation into the RPN loss (#7). However, that direction didn't go as well as expected because of the check on the y_pred/y_true shapes.
> We might want to go back to @0x00b1's original idea of implementing the loss for intermediate layers as regularization. In that case, we will probably need the anchor_target_layer back again.

Yeah, I agree. I think it was worth a shot, but it feels like we need to reintroduce a layer. What do you think, @hgaiser? Is anybody excited to work on this?

@hgaiser (Contributor, Author) commented Jul 30, 2017

> It’s certainly easier! Unfortunately, the downside is that you’re removing needed functionality from the compute graph produced by your Keras backend. Initially, we planned (and added) anchor generation inside the ObjectDetectionGenerator, but we quickly realized it’d break compatibility with downstream functionality that didn’t use Keras generators (e.g. Apple’s Core ML framework or Google’s Cloud ML service).

Ah, so with an eye on deployment to phones, the cloud, or basically anything that doesn't have numpy, it is better to generate the anchors using Keras? I see. Though wouldn't the position it has in master mean anchors are generated on every forward pass?

> I’m probably misunderstanding your meaning, but if you’re talking about the shapes produced by the reference bounding box convolutional layer and the objectness convolutional layer, I believe they can be dynamically inferred:
>
> (samples, anchors × 4, height / 4, width / 4)
>
> and:
>
> (samples, anchors × 1, height / 4, width / 4)

I'm not sure about those shapes, but even if the shapes can be inferred, their contents cannot (i.e. which of the anchors are labelled foreground, which are labelled background, and the bbox targets). Since those cannot be inferred beforehand, the 'traditional' Keras loss method won't work.

> I’m curious, what do you think about this? I really like the idea of treating the RPN losses as regularization terms. Moreover, it’d simplify the training and testing differences.

How would this work? Don't regularization functions receive the weights and biases from a specific layer? Would you make a custom layer where the prediction and target values are 'weights', so that you can perform regularization on those 'weights'? This seems like a big workaround.

@0x00b1 (Contributor) commented Jul 31, 2017

> Ah, so with an eye on deployment to phones, the cloud, or basically anything that doesn't have numpy, it is better to generate the anchors using Keras? I see.

Exactly. Unfortunately, this may prove (or has proved) too ambitious for the moment, especially if we’re interested in a functional release in the next few weeks.

> Though wouldn't the position it has in master mean anchors are generated on every forward pass?

What do you mean by “the position it has in master?”

> I'm not sure about those shapes, but even if the shapes can be inferred, their contents cannot (i.e. which of the anchors are labelled foreground, which are labelled background, and the bbox targets). Since those cannot be inferred beforehand, the 'traditional' Keras loss method won't work.

Good point.

> How would this work? Don't regularization functions receive the weights and biases from a specific layer? Would you make a custom layer where the prediction and target values are 'weights', so that you can perform regularization on those 'weights'? This seems like a big workaround.

Yeah, it’s far from ideal. And the implementation would be clumsy.

@hgaiser What strategy would you choose? Generate anchors from the ObjectDetectionGenerator (similar to the existing implementation, inspired by @yhenon)? Fix Keras’ inflexibility around losses? Either way, I think we should choose something sooner rather than later so we can start testing the other parts (e.g. augmentation, RCNN, Mask RCNN, etc.).

cc: @JihongJu @jhung0 @mcquin

@hgaiser (Contributor, Author) commented Jul 31, 2017

> What do you mean by “the position it has in master?”

Ah, I see that part isn't in master, but is introduced in the RPN PR. I meant that keras_rcnn.backend.anchor() is called in keras_rcnn.backend.shift. I'm new to tensorflow / keras so I might be mistaken, but doesn't that mean anchors are generated every time shift is called? It's not a big performance issue, but it can be removed from shift.

> @hgaiser What strategy would you choose? Generate anchors from the ObjectDetectionGenerator (similar to the existing implementation, inspired by @yhenon)? Fix Keras’ inflexibility around losses? Either way, I think we should choose something sooner rather than later so we can start testing the other parts (e.g. augmentation, RCNN, Mask RCNN, etc.).

Preferably we can define the anchors as constants somehow, so they are only computed once. It's not a big deal, so if this proves to be difficult, generating them on every inference is also fine.
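A hypothetical sketch of the constants idea, reusing the generate_base_anchors sketch from earlier in this thread (the caching helper and its name are assumptions):

import keras.backend as K

_base_anchors = None

def base_anchors():
    # Compute the anchors once with numpy (generate_base_anchors is the
    # hypothetical helper sketched above) and cache them as a backend
    # constant, so shift() can reuse them instead of regenerating them.
    global _base_anchors
    if _base_anchors is None:
        _base_anchors = K.constant(generate_base_anchors())
    return _base_anchors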

Fixing Keras' inflexibility around losses is probably going to be difficult, since it means a major overhaul of their API. Making an addition to the Keras API while keeping the original intact is an option, but it will probably also take some time before that is approved and merged.

I think the best option is to implement it using the "variational_autoencoder" method. If something like what I proposed in keras-team/keras#7395 is going to happen, which I believe is the correct way forward for Keras, then the "variational_autoencoder" method would resemble the proposed solution the most, and it would be a very small change to move from that to the proposed design. The alternatives would require more workarounds ('mangling' of data, custom layers with custom weights to use regularization, weird identity loss functions, etc.) and would therefore deviate more from the 'desired' approach. I think it makes sense to implement as much of the 'desired' approach as possible, get that to a working state, and then attempt to fix Keras to allow the complete 'desired' approach.

@hgaiser (Contributor, Author) commented Aug 1, 2017

I will close this PR as it has served its purpose.

@hgaiser closed this Aug 1, 2017