
Treat network outputs differently depending on their names #193

Closed
lukeyeager opened this issue Aug 4, 2015 · 6 comments

@lukeyeager
Member

Jon just had a good idea.

What if instead of creating multiple paths in DIGITS like "Classification", "Generic Inference", "Bounding Boxes", etc., we had only a single generic path, and the decision about what to do with the network outputs was handled only by looking at the name of the outputs?

| Output name       | Assumed network type | Action                                                          |
|-------------------|----------------------|-----------------------------------------------------------------|
| `classifications` | Classification       | print confidence for top 5 classes                              |
| `bbox`            | Bounding Box         | draw a rectangle on top of the input image                      |
| `segmentation`    | Segmentation         | show each pixel as a color corresponding to its predicted class |
| (other)           | Generic              | just print the numbers                                          |

This is worth giving some more thought ...
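As a rough illustration of the proposal, name-based dispatch could look something like the sketch below. The function name and the shape of the `outputs` dict are hypothetical, not actual DIGITS code; the blob names and actions come from the table above.

```python
# Sketch of dispatching on output-blob names, per the table above.
# `interpret_outputs` and its input format are hypothetical.

def interpret_outputs(outputs):
    """Map each named network output to the action DIGITS would take.

    outputs: dict mapping blob name -> raw output data.
    Returns a dict mapping blob name -> a description of the action.
    """
    actions = {}
    for name in outputs:
        if name == 'classifications':
            actions[name] = 'print confidence for top 5 classes'
        elif name == 'bbox':
            actions[name] = 'draw rectangle on input image'
        elif name == 'segmentation':
            actions[name] = 'color each pixel by predicted class'
        else:
            actions[name] = 'print the raw numbers'
    return actions

print(interpret_outputs({'classifications': [0.9, 0.1], 'score': [1, 2]}))
```

The downside, as discussed below, is exactly this string matching: renaming a layer silently changes DIGITS's behavior.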

@semisight
Contributor

This would have to be very explicit in the docs. I can see someone changing the output layer name to something more meaningful to them, and then not understanding why they broke it.

I do like where the idea is going, however. Can we move the "signal" out of band somewhere? Maybe to a param in the prototxt file? Something like an optional param at the top of the prototxt that tells us to do extra processing (like bbox, for example).

@gheinrich
Contributor

It feels to me like a natural way of doing it would be to extend the model with one (or more) visualisation layers whose meaning would only be of significance to the DL front-end. If Caffe cannot simply ignore the layer, it could be implemented as an identity layer. For example:

layer {
  name: "myVisu"
  type: "Visualisation"
  bottom: "output"
  visualization_param {
    type: "classification" 
    classification_param {
      top_confidence: 5
    }
  }
  include {
    phase: TEST
  }
}

@lukeyeager
Member Author

@semisight, why not just store the network type as a piece of DIGITS metadata? Why does it need to be included in the prototxt? We can still do it in such a way that we can merge the Classification and Generic Inference paths.

if model.output_type == 'classification':
    # do something
elif model.output_type == 'bbox':
    # do something else
else:
    # default - treat as generic inference

@gheinrich, are you suggesting the network should output strings like Dog - 90%? The output of Caffe has to be n-dimensional blobs, not strings. I think it's fine to leave the interpretation of the outputs as a post-processing step external to the network definition.

@semisight
Contributor

@lukeyeager that's fine with me. I know we don't really "own" the prototxt, so it's probably not a good idea to modify it. The in-name solution just looks like a bit too much magic to me.

After thinking about it, I don't think Caffe sits at a high enough level to "care" about whether a network is classification or not. So I think storing it as DIGITS metadata is probably the best way to do it.

@gheinrich
Contributor

My thinking was that, conceptually, the visualisation of results is the final layer in your network. The DL back-end does not need to deal with it (the visualisation layers could be either hidden from the DL back-end, or implemented as an identity layer).
Since the model is defined by way of prototxt files, it would be consistent to define the visualisation in a prototxt file too. You could have a visualisation.prototxt where you specify the various visualisations you would like to see.

This would allow you to specify different visualisations for different layers, so DIGITS does not have to make assumptions about what the user wants to see (for example, I think DIGITS currently assumes that to show top-N predictions you can just look at the penultimate layer; this might not hold for all classification networks). The prototxt format is handy because it is easily extendable, easily shared, and requires no extra UI. Any text format would do, though.
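To make the idea concrete, a hypothetical visualisation.prototxt could attach different visualisations to different blobs. The layer and field names below are invented for illustration, following the format sketched in the earlier comment; none of this is an existing Caffe or DIGITS schema.

```protobuf
# Hypothetical visualisation.prototxt (field names are illustrative only)
layer {
  name: "showPredictions"
  type: "Visualisation"
  bottom: "softmax"          # the blob holding class probabilities
  visualization_param {
    type: "classification"
    classification_param {
      top_confidence: 5      # show the top-5 predicted classes
    }
  }
}
layer {
  name: "showBoxes"
  type: "Visualisation"
  bottom: "bbox_pred"        # the blob holding box coordinates
  visualization_param {
    type: "bbox"
  }
}
```

Because each visualisation names its own `bottom` blob, DIGITS would not need to guess which layer's output to display.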

@lukeyeager
Member Author

Closed by #756 (with a much better implementation than what I had originally proposed).
