
Avoid numerical instability #1652

Closed
naftaliharris wants to merge 1 commit into master from naftaliharris/patch-2

Conversation

naftaliharris
Contributor

This avoids basically doing 1 - 1, for example:

```python
>>> from math import exp
>>> margin = -40
>>> 1 - 1 / (1 + exp(margin))
0.0
>>> exp(margin) / (1 + exp(margin))
4.248354255291589e-18
>>>
```
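
In other words, the patch swaps the cancellation-prone form for the algebraically equivalent one. A minimal standalone sketch (hypothetical helper names, not the actual pyspark code touched by this patch):

```python
from math import exp

def prob_naive(margin):
    # 1 - 1/(1 + e^margin): once 1/(1 + e^margin) rounds to 1.0
    # (large negative margins), the subtraction cancels to exactly 0.0.
    return 1 - 1 / (1 + exp(margin))

def prob_stable(margin):
    # Algebraically the same quantity, e^margin / (1 + e^margin),
    # but the tiny value is computed directly instead of by cancellation.
    return exp(margin) / (1 + exp(margin))

print(prob_naive(-40))   # 0.0
print(prob_stable(-40))  # ~4.25e-18
```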

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Jul 30, 2014

Y'know, there's a similar issue in LogisticGradient.scala, in lines like:

math.log(1 + math.exp(margin))

For -40, this gives 0.0, when really it should be about math.exp(-40) = 4.248354255291589e-18, since log(1+x) ~= x for very small x. This one can be fixed up with

math.log1p(math.exp(margin))

I'll have a look for other instances beyond the 4 I see and open a JIRA? I could mention this PR too to bring it under one umbrella.
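
For reference, the same effect is easy to reproduce with Python's `math` module (Python here only because this PR is pyspark code; the LogisticGradient.scala fix would use Scala's `math.log1p`):

```python
from math import exp, log, log1p

margin = -40
print(log(1 + exp(margin)))  # 0.0 -- 1 + exp(-40) rounds to exactly 1.0
print(log1p(exp(margin)))    # ~4.25e-18, i.e. approximately exp(-40)
```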

@srowen
Member

srowen commented Jul 30, 2014

See also https://issues.apache.org/jira/browse/SPARK-2748 and #1659 . This could be considered part of SPARK-2748.

@mengxr
Contributor

mengxr commented Jul 30, 2014

Jenkins, add to whitelist.

@mengxr
Contributor

mengxr commented Jul 30, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1652. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17453/consoleFull

asfgit pushed a commit that referenced this pull request Jul 30, 2014
…Math.exp, Math.log

In a few places in MLlib, an expression of the form `log(1.0 + p)` is evaluated. When p is so small that `1.0 + p == 1.0`, the result is 0.0. However the correct answer is very near `p`. This is why `Math.log1p` exists.

Similarly for one instance of `exp(m) - 1` in GraphX; there's a special `Math.expm1` method.

While the errors occur only for very small arguments, such arguments are entirely possible given these expressions' use in machine learning algorithms.

Also note the related PR for Python: #1652

Author: Sean Owen <srowen@gmail.com>

Closes #1659 from srowen/SPARK-2748 and squashes the following commits:

c5926d4 [Sean Owen] Use log1p, expm1 for better precision for tiny arguments
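
The `expm1` case mentioned in that commit follows the same pattern; a quick illustration in Python (matching the language of this PR, though the GraphX change itself is in Scala):

```python
from math import exp, expm1

m = 1e-18
print(exp(m) - 1)  # 0.0 -- exp(1e-18) rounds to exactly 1.0
print(expm1(m))    # 1e-18, correct to full precision
```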
@SparkQA

SparkQA commented Jul 30, 2014

QA results for PR 1652:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17453/consoleFull

@mengxr
Contributor

mengxr commented Jul 30, 2014

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in e3d85b7 Jul 30, 2014
@naftaliharris
Contributor Author

Awesome, thank you! :-)

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…Math.exp, Math.log

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014

Author: Naftali Harris <naftaliharris@gmail.com>

Closes apache#1652 from naftaliharris/patch-2 and squashes the following commits:

0d55a9f [Naftali Harris] Avoid numerical instability