Per-example weights #2616
Comments
Right, my current thinking on the subject is that we can implement per-example (and per-time-step) weights by extending the current masking infrastructure. Per-output (not per-example) weights could be implemented, along with per-output masks, as follows: Per-output weighting is useful for some cases like missing values, some RL cases, and some architectures like this: http://arxiv.org/abs/1603.00806 One question, of course, is how other DL platforms handle this. It might be good to review that too.
There's also the recently added feature of specifying per-output weighting in the loss functions. Technically speaking, this could also be encoded in the output masks as they are right now, by using a mask of rank (labels rank - 1) that holds weights.
We also need per-example weights. They are useful when you have vastly imbalanced datasets (1 positive for every 1000 negatives, for instance). I am looking forward to the feature. On the other hand, I personally prefer not to mix the roles of methods and data attributes. Mixing masking and weighting is likely to be confusing. From a user's point of view, I would prefer distinct methods (one for output weights, one for masks), each with clear documentation, even if you store the data in the INDArray in the mask fields internally. You could make it so that clients can only call one of the methods. When reading the code, the intent will be clear from the name of the method, without having to think about the rank of the parameter.
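To make the "distinct methods, shared storage" suggestion concrete, here is a minimal sketch in plain Java. The class `WeightedBatch` and its methods `setExampleMask` and `setExampleWeights` are hypothetical names invented for illustration; they are not part of the DL4J or ND4J API, and a plain `double[]` stands in for the INDArray mask field.

```java
/** Hypothetical holder used only for illustration; not the DL4J DataSet API. */
final class WeightedBatch {
    private double[] perExample;  // shared internal storage, one value per example
    private boolean isWeights;    // records which setter filled the field

    /** Binary mask: 1 keeps an example, 0 drops it. */
    void setExampleMask(double[] mask) {
        requireUnset();
        for (double m : mask) {
            if (m != 0.0 && m != 1.0) {
                throw new IllegalArgumentException("mask values must be 0 or 1");
            }
        }
        perExample = mask.clone();
        isWeights = false;
    }

    /** Non-negative weights, e.g. 1000 for a rare positive and 1 for a negative. */
    void setExampleWeights(double[] weights) {
        requireUnset();
        for (double w : weights) {
            if (!(w >= 0.0)) {  // also rejects NaN
                throw new IllegalArgumentException("weights must be >= 0");
            }
        }
        perExample = weights.clone();
        isWeights = true;
    }

    /** Enforces that clients call only one of the two setters. */
    private void requireUnset() {
        if (perExample != null) {
            throw new IllegalStateException("mask or weights already set");
        }
    }

    boolean hasWeights() { return perExample != null && isWeights; }

    double[] values() { return perExample.clone(); }
}
```

With this shape, the call site states the intent by name, and a second call to either setter fails fast instead of silently overwriting the first.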
As it stands, is it currently possible to specify per-sample weights by using mask values other than 0 or 1 (i.e., 0.5 for half the weight of a normal sample, 2 for twice the weight)?
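The arithmetic behind that question can be sketched in plain Java. This is not DL4J code and does not show whether the existing masking path accepts non-binary values. It only illustrates that a 0/1 mask is a special case of per-example weights. The class name `MaskAsWeights` is hypothetical, and normalizing by the sum of the weights is an assumption; a library might divide by the minibatch size instead.

```java
final class MaskAsWeights {
    /** Weighted mean of per-example losses: sum(w_i * loss_i) / sum(w_i). */
    static double weightedMeanLoss(double[] losses, double[] mask) {
        if (losses.length != mask.length) {
            throw new IllegalArgumentException("losses and mask must have equal length");
        }
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < losses.length; i++) {
            weightedSum += mask[i] * losses[i];
            weightTotal += mask[i];
        }
        // If every example is masked out, report zero loss instead of dividing by zero.
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        double[] losses = {1.0, 2.0, 4.0};
        // Binary mask: the third example is dropped.
        System.out.println(weightedMeanLoss(losses, new double[] {1, 1, 0}));
        // Fractional weights: half weight, double weight, normal weight.
        System.out.println(weightedMeanLoss(losses, new double[] {0.5, 2, 1}));
    }
}
```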
I was discussing the option of a per-example weights feature on gitter with @AlexDBlack, but decided to continue it in an issue.
So, I would like to be able to use per-example weights in order to boost the importance of certain examples. One option is to use oversampling, but that has some drawbacks:
Just for fun I tried to implement example weights and make them work with computation graphs, which wasn't too hard. A POC can be found here (nd4j) and here (dl4j). However, after a little discussion, Alex figured it might be easier to implement this feature by extending the masking infrastructure, and to introduce an additional feature while we're at it: per-output, per-time-step masking and weighting. This could be done by adding a dimension to the output mask when the user wants to specify weights for individual outputs at each time step.
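A rough sketch of the "extra mask dimension" idea, in plain Java with nested arrays standing in for INDArrays. The `[batch][time][outputs]` layout, the class name `TimeStepWeights`, and both method names are assumptions made for illustration, not the layout or API that DL4J uses. A rank-2 `[batch][time]` mask is broadcast across outputs, while a rank-3 array carries a separate weight for every output at every time step.

```java
import java.util.Arrays;

final class TimeStepWeights {
    /** Expands a [batch][time] mask to [batch][time][outputs] by repeating each value. */
    static double[][][] broadcast(double[][] stepMask, int outputs) {
        double[][][] out = new double[stepMask.length][][];
        for (int b = 0; b < stepMask.length; b++) {
            out[b] = new double[stepMask[b].length][outputs];
            for (int t = 0; t < stepMask[b].length; t++) {
                Arrays.fill(out[b][t], stepMask[b][t]);
            }
        }
        return out;
    }

    /** Sums loss[b][t][o] * weights[b][t][o] over all examples, time steps and outputs. */
    static double weightedSum(double[][][] loss, double[][][] weights) {
        double total = 0.0;
        for (int b = 0; b < loss.length; b++) {
            for (int t = 0; t < loss[b].length; t++) {
                for (int o = 0; o < loss[b][t].length; o++) {
                    total += weights[b][t][o] * loss[b][t][o];
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One example, two time steps, two outputs.
        double[][][] loss = {{{1.0, 2.0}, {3.0, 4.0}}};
        // Rank-2 mask: keep time step 0, drop time step 1.
        System.out.println(weightedSum(loss, broadcast(new double[][] {{1, 0}}, 2)));
        // Rank-3 weights: an individual weight per output and time step.
        System.out.println(weightedSum(loss, new double[][][] {{{1, 0.5}, {0, 2}}}));
    }
}
```

Existing rank-2 masks would keep their current meaning under this scheme, since they reduce to the rank-3 case after broadcasting.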
So, what do you guys think?