
Avoid numerical instability #1652

Closed
naftaliharris wants to merge 1 commit into master from naftaliharris/patch-2

Conversation

naftaliharris
Contributor

This avoids basically doing 1 - 1, for example:

```python
>>> from math import exp
>>> margin = -40
>>> 1 - 1 / (1 + exp(margin))
0.0
>>> exp(margin) / (1 + exp(margin))
4.248354255291589e-18
>>>
```
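
In other words, the patch swaps the cancellation-prone form for the algebraically equivalent one. A minimal standalone sketch (hypothetical helper names, not the actual pyspark code touched by this patch):

```python
from math import exp

def prob_naive(margin):
    # 1 - 1/(1 + e^margin): once 1/(1 + e^margin) rounds to 1.0
    # (large negative margins), the subtraction cancels to exactly 0.0.
    return 1 - 1 / (1 + exp(margin))

def prob_stable(margin):
    # Algebraically the same quantity, e^margin / (1 + e^margin),
    # but the tiny value is computed directly instead of by cancellation.
    return exp(margin) / (1 + exp(margin))

print(prob_naive(-40))   # 0.0
print(prob_stable(-40))  # ~4.25e-18
```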

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Jul 30, 2014

Y'know, there's a similar issue in LogisticGradient.scala, in lines like:

math.log(1 + math.exp(margin))

For -40, this gives 0.0, when really it should be about math.exp(-40) = 4.248354255291589e-18, since log(1+x) ~= x for very small x. This one can be fixed up with

math.log1p(math.exp(margin))

I'll have a look for other instances beyond the 4 I see and open a JIRA? I could mention this PR too to bring it under one umbrella.
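
For reference, the same effect is easy to reproduce with Python's `math` module (Python here only because this PR is pyspark code; the LogisticGradient.scala fix would use Scala's `math.log1p`):

```python
from math import exp, log, log1p

margin = -40
print(log(1 + exp(margin)))  # 0.0 -- 1 + exp(-40) rounds to exactly 1.0
print(log1p(exp(margin)))    # ~4.25e-18, i.e. approximately exp(-40)
```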

@srowen
Member

srowen commented Jul 30, 2014

See also https://issues.apache.org/jira/browse/SPARK-2748 and #1659 . This could be considered part of SPARK-2748.

@mengxr
Contributor

mengxr commented Jul 30, 2014

Jenkins, add to whitelist.

@mengxr
Contributor

mengxr commented Jul 30, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1652. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17453/consoleFull

asfgit pushed a commit that referenced this pull request Jul 30, 2014
…Math.exp, Math.log

In a few places in MLlib, an expression of the form `log(1.0 + p)` is evaluated. When p is so small that `1.0 + p == 1.0`, the result is 0.0. However the correct answer is very near `p`. This is why `Math.log1p` exists.

Similarly for one instance of `exp(m) - 1` in GraphX; there's a special `Math.expm1` method.

While the errors occur only for very small arguments, such arguments are entirely possible given these expressions' use in machine learning algorithms.

Also note the related PR for Python: #1652

Author: Sean Owen <srowen@gmail.com>

Closes #1659 from srowen/SPARK-2748 and squashes the following commits:

c5926d4 [Sean Owen] Use log1p, expm1 for better precision for tiny arguments
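
The `expm1` case mentioned in that commit follows the same pattern; a quick illustration in Python (matching the language of this PR, though the GraphX change itself is in Scala):

```python
from math import exp, expm1

m = 1e-18
print(exp(m) - 1)  # 0.0 -- exp(1e-18) rounds to exactly 1.0
print(expm1(m))    # 1e-18, correct to full precision
```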
@SparkQA

SparkQA commented Jul 30, 2014

QA results for PR 1652:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17453/consoleFull

@mengxr
Contributor

mengxr commented Jul 30, 2014

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in e3d85b7 Jul 30, 2014
@naftaliharris
Contributor Author

Awesome, thank you! :-)

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…Math.exp, Math.log

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014

Author: Naftali Harris <naftaliharris@gmail.com>

Closes apache#1652 from naftaliharris/patch-2 and squashes the following commits:

0d55a9f [Naftali Harris] Avoid numerical instability