Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for poisson regression. #1001

Merged
merged 9 commits into from Jun 17, 2016

Conversation

sharatsc
Copy link
Contributor

Adding support for Possion regression. Does not support --invariant mode yet. Haven't figured out the closed form for the update. Currently works with --normalized as well as --bfgs

Usage documented in a python notebook:
https://github.com/sharatsc/vowpal_wabbit/blob/c1e95f98038c628d0ff3995fb3d85bf7c35eefa3/python/poisson_regression.ipynb

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 69.59% when pulling c1e95f9 on sharatsc:poisson into 68c22d9 on JohnLangford:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.003%) to 69.705% when pulling 6f88a55 on sharatsc:poisson into 68c22d9 on JohnLangford:master.

@sharatsc
Copy link
Contributor Author

Coverage is off by rounding error: -0.003%

Appveyor is failing on the same location as an unrelated pull request
#1000

This could be a cs code issue (or CLR version difference b/w Travis and the CS wrapper). Can @eisber somebody take a look?

@JohnLangford
Copy link
Member

This seems good except that getUpdate is not safe. In particular, a very large learning rate/importance weight will result in the update going crazy. Have you considered designing a safe update along the lines of this paper: http://arxiv.org/abs/1011.1576 ? This has been a clear win experimentally.

@sharatsc
Copy link
Contributor Author

Table 1 does list a closed form solution for Poisson. I'll try to work it out. In the mean time, it does work with --normalized and --bfgs.

@JohnLangford
Copy link
Member

Nikos says he has it worked out.

@n17s
Copy link
Contributor

n17s commented Jun 1, 2016

Hi folks,

Yes the paper does not list Poisson regression but it can be derived by solving the ODE. Then you have to translate between paper and VW which I believe can be summarized in the following changes:

  • VW computes -s() and the table lists s()
  • VW replaces (learning rate * importance) by update_scale
  • VW replaces norm(x)^2 by pred_per_update

With these in mind and some help from alpha
https://www.wolframalpha.com/input/?i=solve+y%27(t)%3Da*(exp(b-y(t)*c)-d),+y(0)%3D0
my recommendation is

float getUpdate(float prediction, float label, float update_scale, float pred_per_update) {
    if (label > 0) {
      return label * update_scale - log1p(exp(prediction)*expm1(label * update_scale * pred_per_update)/label)/pred_per_update;
    }
    else {
       return - log1p(exp(prediction) * update_scale * pred_per_update)/pred_per_update;
    }
  }

I believe I have done this carefully enough but I have not tested it. Sharat can you verify that this update is at least on par with yours?

@JohnLangford
Copy link
Member

Sharat have you had a chance to look at Nikos's update?

@sharatsc
Copy link
Contributor Author

sharatsc commented Jun 9, 2016

Looking at it now.

@eisber
Copy link
Collaborator

eisber commented Jun 10, 2016

You need to use # Test 133: and not # Train 133:

@sharatsc
Copy link
Contributor Author

Nikos,
I was able to derive the same. Derivation outlined here: https://dl.dropboxusercontent.com/u/1897767/Implementing%20poisson%20loss%20in%20VW.pdf

The expression in notation used in the original paper (http://bit.ly/1Us0oVe) vs. your derivation (http://bit.ly/25ZG6Gp)

@JohnLangford
Copy link
Member

Can you implement today? I want to drop a new version soon.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.0002%) to 70.123% when pulling 8761d06 on sharatsc:poisson into 8e0fd1c on JohnLangford:master.

@sharatsc
Copy link
Contributor Author

safe update changes incorporated along fixing with @eisber's comments. The windows build seems to be broken (also on master)

Windows build failing with unrelated error
Assert.AreEqual failed.
Expected:<--no_stdin --max_prediction 1 --bit_precision 18 --cb_adf --cb_type ips --rank_all --csoaa_ldf multiline --interact ud --csoaa_rank>.
Actual:<--no_stdin --max_prediction 1 --bit_precision 18 --cb_adf --cb_type ips --rank_all --csoaa_ldf multiline --csoaa_rank --interact ud>.

@eisber
Copy link
Collaborator

eisber commented Jun 14, 2016

My PR has fixes for this… essentially replace the expected with the actual string in TestArguments.cs

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.0002%) to 70.123% when pulling 5b0d86c on sharatsc:poisson into 8e0fd1c on JohnLangford:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 70.158% when pulling 960f2a5e6195838410b84a8a4c4b1cc5f358128a on sharatsc:poisson into 8e0fd1c on JohnLangford:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 70.158% when pulling cf57a61ddb45ad980a70acf523d695c425b3ec76 on sharatsc:poisson into 8e0fd1c on JohnLangford:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 70.158% when pulling b0bb984 on sharatsc:poisson into 8e0fd1c on JohnLangford:master.

@sharatsc
Copy link
Contributor Author

Windows build has gotten quite flaky. We should consider spinning off cs wrapper as a separate project that consumes the vw nuget? Build time is taken up with cs and cs-tests

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 70.158% when pulling 22fa41a on sharatsc:poisson into 8e0fd1c on JohnLangford:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 70.158% when pulling 18cf9a720930ce8482fd2efaa927a9a32299c103 on sharatsc:poisson into 8e0fd1c on JohnLangford:master.

@JohnLangford
Copy link
Member

This looks good to go to me. Not sure you should worry about the windows-side tests as @eisber is working on that.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 70.158% when pulling 56418c898105a5a7a587d29e283bc93bca5ef462 on sharatsc:poisson into 8e0fd1c on JohnLangford:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 70.158% when pulling ae3b4d0 on sharatsc:poisson into 8e0fd1c on JohnLangford:master.

@sharatsc
Copy link
Contributor Author

@JohnLangford this is ready to go. Fixed windows errors as well.

@JohnLangford JohnLangford merged commit 08f4f58 into VowpalWabbit:master Jun 17, 2016
@JohnLangford
Copy link
Member

This is merged in, thanks. All, we now have Poisson loss working thanks
to Sharat and Nikos.

@eisber there are some minor changes to make the windows test work.

-John

On Wed, Jun 15, 2016 at 10:33 AM, Sharat Chikkerur <notifications@github.com

wrote:

@JohnLangford https://github.com/JohnLangford this is ready to go.
Fixed windows errors as well.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1001 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAE25pK6tpnK5PAj11G7xOeNF4uvWJnZks5qMA08gaJpZM4Inn0d
.

@trufanov-nok
Copy link
Contributor

Hello,
Does poisson loss supposed to support negative labels? Like {-1,1}.
According to poisson_loss::getUpdate() it does as it has a special case for negative labels.
But poisson_loss::getLoss() returns 0 for -1 labels and I'm getting no weights update. Looks like bcs logf(label + 1e-6) is NaN here. So it doesn't support labels that is < 0 ...

@n17s
Copy link
Contributor

n17s commented Jun 20, 2016

It should not support labels <0 as the prediction is exp(w*x). The
assumption is that labels are >=0 so the else part is to avoid division by
0.

Nikos
On Jun 20, 2016 6:36 AM, "Alexander Trufanov" notifications@github.com
wrote:

Hello,
Does poisson loss supposed to support negative labels? Like {-1,1}.
According to poisson_loss::getUpdate() it does as it has
https://github.com/JohnLangford/vowpal_wabbit/blob/master/vowpalwabbit/loss_functions.cc#L306
a special case for negative labels.
But poisson_loss::getLoss() returns 0 for -1 labels and I'm getting no
weights update. Looks like bcs logf(label + 1e-6) is NaN here
https://github.com/JohnLangford/vowpal_wabbit/blob/master/vowpalwabbit/loss_functions.cc#L300.
So it doesn't support labels that is < 0 ...


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#1001 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAEU1UHTT0yZ_AN0quKwwivxNrq8iXaeks5qNm1GgaJpZM4Inn0d
.

@trufanov-nok
Copy link
Contributor

Ok, then it may have sense to put a label check and a warning in its getLoss() as it's done for logloss...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants