
Is Cost actually a Layer? #1311

Closed
wangkuiyi opened this issue Feb 10, 2017 · 3 comments

@wangkuiyi
Collaborator

wangkuiyi commented Feb 10, 2017

It seems that it is.

Consider the following network for image classification:

image -> conv -> softmax

If we are going to train it by minimizing the squared error, we can add

image -> conv -> softmax \
                          -> squared_error
                 label --/

So it seems that the "model" should contain only the part from image to softmax, and at training time we add label and squared_error.
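To make the split concrete, here is a minimal NumPy sketch (the layer names are illustrative stand-ins, not a real framework API): the "model" is only `image -> conv -> softmax`, and at training time `label` and `squared_error` are stacked on top of it like one more layer.

```python
import numpy as np

# Hypothetical layer functions; a matmul stands in for a real convolution.
def conv(x, w):
    return x @ w

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def squared_error(pred, label):
    return ((pred - label) ** 2).sum()

# The "model" part: image -> conv -> softmax
w = np.random.randn(4, 3)
image = np.random.randn(2, 4)
pred = softmax(conv(image, w))

# At training time, append label and squared_error on top of the model;
# structurally the cost composes exactly like any other layer.
label = np.eye(3)[[0, 2]]
loss = squared_error(pred, label)
```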

@jacquesqiao
Member

Yes, I agree with you.
According to the discussion in the last few days, Cost is not a layer but an independent concept: the topology of the neural network only contains image -> conv -> softmax, and Cost is used during training and testing.

@wangkuiyi
Collaborator Author

wangkuiyi commented Feb 11, 2017

Hi @jacquesqiao, @helinwang and I learned something more from @emailweixu this afternoon: it is not flexible enough to express the to-be-trained network as a composition of a model and a cost.

@emailweixu gave us an example problem: suppose that we are going to learn a text embedding, f, of our inputs, such that if we have "search result A is closer to query Q than search result B" as a training instance, we should have sim(f(A), f(Q)) > sim(f(B), f(Q)).

In order to learn f, we need to construct the following 3-branch network:

A -> f -\
Q -> f --> cost
B -> f -/

In this example, the model is actually x -> f, but in the above network we would have to replicate the model three times in order to learn f. This is why we cannot say that the to-be-trained network is a composition of a model and a cost.
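A small sketch of this 3-branch setup (the embedding, similarity, and ranking cost below are toy stand-ins, not the real implementation): the model f appears three times in the to-be-trained network, but all three branches share one set of parameters.

```python
import numpy as np

W = np.random.randn(5, 3)  # shared parameters of the embedding f

def f(x):
    # Toy embedding: all three branches call this same function,
    # so the "model" is replicated but its parameters are shared.
    return x @ W

def sim(u, v):
    # Cosine similarity between two embeddings.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_cost(a, q, b, margin=0.1):
    # Hinge-style cost encouraging sim(f(A), f(Q)) > sim(f(B), f(Q)).
    return max(0.0, margin - (sim(f(a), f(q)) - sim(f(b), f(q))))

A, Q, B = (np.random.randn(5) for _ in range(3))
loss = rank_cost(A, Q, B)
```

The to-be-trained network is the three f branches plus rank_cost; the reusable model is just f.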

Actually, I remember that @reyoung once said that he thought that a model should include the cost. But a precise statement should be that the cost should be included in the to-be-trained network, instead of the model.

After this discussion, we reached the following conclusions:

  1. In order to do training, we need a to-be-trained network.
  2. This to-be-trained network might consist of one or more costs and one or more replications of the model.
  3. Once the to-be-trained network is trained, we should be able to extract the model part from it and use it for serving/inference.

@wangkuiyi
Collaborator Author

wangkuiyi commented Feb 11, 2017

Given this conclusion,

  1. cost is a layer, and
  2. it is not sufficient to define a topology as a "model" plus a "cost",

I am closing this issue and open #1315 for the discussion of the concept network.
