
Per-example weights #2616

Closed
EdeMeijer opened this issue Jan 4, 2017 · 4 comments
Labels
Enhancement New features and other enhancements

Comments

@EdeMeijer

I was discussing the option of a per-example weights feature on gitter with @AlexDBlack, but decided to continue it in an issue.

So, I would like to be able to use per-example weights in order to boost the importance of certain examples. One option is to use oversampling, but that has some drawbacks:

  • It effectively allows only integer weights
  • It makes training slower (more artificial examples), especially if you need large weights
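Conceptually, per-example weights generalize oversampling: duplicating an example k times is equivalent to giving it weight k in the loss. A minimal plain-Java sketch of that equivalence (illustrative only; `weightedMse` is a hypothetical helper, not DL4J API):

```java
public class WeightedLossSketch {
    // Weighted mean squared error: sum_i w_i * (y_i - t_i)^2 / sum_i w_i.
    // With all weights equal to 1 this reduces to ordinary MSE.
    static double weightedMse(double[] predictions, double[] targets, double[] weights) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < predictions.length; i++) {
            double diff = predictions[i] - targets[i];
            num += weights[i] * diff * diff;
            den += weights[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Weight 3.0 on the first example...
        double weighted = weightedMse(
                new double[]{0.2, 0.9},
                new double[]{0.0, 1.0},
                new double[]{3.0, 1.0});
        // ...matches (up to floating-point rounding) oversampling it three
        // times with unit weights.
        double oversampled = weightedMse(
                new double[]{0.2, 0.2, 0.2, 0.9},
                new double[]{0.0, 0.0, 0.0, 1.0},
                new double[]{1.0, 1.0, 1.0, 1.0});
        System.out.println(weighted + " " + oversampled);
    }
}
```

Unlike oversampling, the weighted form also admits fractional weights (e.g. 0.5) at no extra training cost.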

Just for fun I tried to implement example weights and make it work with comp graphs, which wasn't too hard. POC can be found here (nd4j) and here (dl4j). However, after a little discussion Alex figured it might be easier to implement this feature by extending the masking infrastructure, and introduce an additional feature while we're at it: per-output-timestep masking and weighting. This could be done by adding a dimension to the output mask in case the user desires to specify weights for individual time steps.

So, what do you guys think?

@AlexDBlack
Contributor

Right, my current thinking on the subject is basically that we can implement per-example (and: per time step) weights by extending the current masking infrastructure.
Currently masks are either 0 or 1 - but in principle could be any arbitrary weighting value.

Per-output (not per-example) weights could be implemented (along with per-output masks) as follows:
if mask rank == labels rank: it's per output masking
if mask rank == labels rank - 1: it's per example/time step masking (like what we have now: 2d mask array for per-time step masking of a 3d array)
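The rank convention above could be dispatched on directly. A hypothetical sketch (not actual DL4J code) of how the mask array's rank would select the interpretation:

```java
public class MaskKindSketch {
    enum MaskKind { PER_OUTPUT, PER_EXAMPLE_OR_TIMESTEP }

    // Hypothetical dispatch following the proposed convention:
    //   mask rank == labels rank      -> per-output masking/weighting
    //   mask rank == labels rank - 1  -> per-example / per-time-step (current behaviour)
    static MaskKind maskKind(int maskRank, int labelsRank) {
        if (maskRank == labelsRank) return MaskKind.PER_OUTPUT;
        if (maskRank == labelsRank - 1) return MaskKind.PER_EXAMPLE_OR_TIMESTEP;
        throw new IllegalArgumentException(
                "Mask rank " + maskRank + " incompatible with labels rank " + labelsRank);
    }

    public static void main(String[] args) {
        // 3d labels (e.g. RNN output) with a 2d mask -> per-time-step, as today
        System.out.println(maskKind(2, 3));
        // 3d labels with a 3d mask -> the new per-output case
        System.out.println(maskKind(3, 3));
    }
}
```

A nice property of this scheme is that it is fully backward compatible: existing 0/1 masks of rank `labels rank - 1` keep their current meaning.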

Per output weighting is useful for some cases like missing values, some RL cases, and some architectures like this: http://arxiv.org/abs/1603.00806

One question of course is how other DL platforms handle this - it might be good to review that too.

@EdeMeijer
Author

There's also the recently added feature of specifying per-output weighting in the loss functions. Technically speaking, this could be encoded in the output masks as they are right now as well, by using a rank `labels rank - 1` mask with weights.

@fac2003

fac2003 commented Jan 13, 2017

We also need per-example weights. It is useful when you have vastly imbalanced datasets (1 positive for every 1000 negatives, for instance). I am looking forward to the feature.

On the other hand, I personally prefer not to mix the roles of methods and data attributes. Mixing masking and weighting is likely to be confusing. From a user's point of view, I would prefer distinct methods (one for output weights, one for masks), each with clear documentation, even if internally you store the data in the INDArray mask fields.

You could make it so that clients can only call one of the methods. When reading the code, it will be clear what the intent is from the name of the method without having to think about the rank of the parameter.

@cacophany53

As it stands, is it currently possible to specify per-sample weights by using mask values other than 0 or 1? (i.e. 0.5 for half the weight of a normal sample, 2 for twice the weight)
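Whether non-binary mask values behave as clean weights depends on how the implementation normalizes the score: dividing by the sum of mask values makes a mask entry of 2.0 act exactly like a weight of 2.0, while dividing by the count of unmasked entries also inflates the overall score. A plain-Java sketch of the two plausible behaviours (illustrative only, not DL4J internals):

```java
public class MaskNormalizationSketch {
    // Masked score, dividing by the sum of mask values: a mask entry of 2.0
    // behaves exactly like a relative weight of 2.0.
    static double scoreDivBySum(double[] losses, double[] mask) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < losses.length; i++) {
            num += mask[i] * losses[i];
            den += mask[i];
        }
        return num / den;
    }

    // Masked score, dividing by the count of nonzero mask entries: a mask
    // entry of 2.0 scales the example AND inflates the overall score.
    static double scoreDivByCount(double[] losses, double[] mask) {
        double num = 0.0;
        int count = 0;
        for (int i = 0; i < losses.length; i++) {
            num += mask[i] * losses[i];
            if (mask[i] != 0.0) count++;
        }
        return num / count;
    }

    public static void main(String[] args) {
        double[] losses = {1.0, 1.0};
        double[] mask = {2.0, 1.0};
        System.out.println(scoreDivBySum(losses, mask));   // (2 + 1) / 3 = 1.0
        System.out.println(scoreDivByCount(losses, mask)); // (2 + 1) / 2 = 1.5
    }
}
```

So even if non-binary mask values flow through the math without error today, the resulting score may not be normalized the way a user expects; that is part of why an explicit weighting API is being discussed here.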
