
TODOs in HordeAd.Core.DualNumber about defining more derivatives of common functions #39

Merged
merged 51 commits into from
May 18, 2022

Conversation

sfindeisen
Contributor

@sfindeisen sfindeisen commented May 16, 2022

Fixes #19

@Mikolaj
Owner

Mikolaj commented May 16, 2022

Thank you very much, Staszek!

CI fails. In most cases, that's because your formula for tanh, while beautiful (no doubt, though I can't tell if it's better behaved numerically or if it's faster or slower and on what architectures), gives subtly different results than the old one. In some cases the whole computation is not numerically stable with respect to the tanh computation, and then the difference becomes huge.

However, not all failures are clear, so let's try to understand them before deciding on a fix. A couple of failures are probably due to my sparse matrix experiment not coping well with matrices of undetermined size (such as your (recip 2)). It's possible that changing the general multiplication by (recip 2) to scaling by a constant (dScale) would lessen the strain.

The last and embarrassing case is where a test says only "wrong result". It would be helpful to make it print any details.
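The subtle divergence described above can be reproduced in isolation. Here is a minimal standalone sketch (tanhViaExp is my own illustrative name, not a HordeAd function) comparing the Prelude's tanh with an algebraically equal exp-based reformulation:

```haskell
-- Prelude's tanh versus a hypothetical exp-based reformulation.
-- Both are mathematically tanh x, but they round differently,
-- which is enough to change downstream test results.
tanhViaExp :: Double -> Double
tanhViaExp x = (e - recip e) / (e + recip e)
  where e = exp x

main :: IO ()
main = do
  let x = 0.7 :: Double
  print (tanh x)
  print (tanhViaExp x)
  print (tanh x == tanhViaExp x)  -- may be False despite algebraic equality
```

The two values agree to many digits, but bit-for-bit equality is not guaranteed, which is exactly the kind of difference that makes exact-expected-value tests fail.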

@sfindeisen sfindeisen changed the title TODOs in HordeAd.Core.DualNumber TODOs in HordeAd.Core.DualNumber about defining more derivatives of common functions May 17, 2022
@sfindeisen
Contributor Author

> CI fails. In most cases, that's because your formula for tanh, while beautiful (no doubt, though I can't tell if it's better behaved numerically or if it's faster or slower and on what architectures), gives subtly different results than the old one. In some cases the whole computation is not numerically stable with respect to the tanh computation, and then the difference becomes huge.

Right, and it also contains division (/). Is this the reason you're assuming your original formula is more accurate than mine?

> However, not all failures are clear, so let's try to understand them before deciding on a fix. A couple of failures are probably due to my sparse matrix experiment not coping well with matrices of undetermined size (such as your (recip 2)). It's possible that changing the general multiplication by (recip 2) to scaling by a constant (dScale) would lessen the strain.

Not that I understand why, but I just eliminated (recip 2) from atanh, sinh and cosh, instead applying those functions to the primal component explicitly. Does this make sense? Why? :)

> The last and embarrassing case is where a test says only "wrong result". It would be helpful to make it print any details.

Fixed the test to print out more info; on my machine it now produces:

Running 1 test suites...
Test suite shortTestForCI: RUNNING...
Short tests for CI
  MNIST RNN short tests
    randomLL2 140 0 3 5 54282 5.0e-2: FAIL (2.25s)
      test/common/TestMnistRNN.hs:173:
      wrong result: 30.061871495725264 is expected to be a member of [30.061871495723956,30.06187089990927]

which, again, is a subtle numerical error. But in which function and which number is closer to the truth?...

BTW why do those tests here on GitHub take 40 minutes to complete? Can we improve that?

@sfindeisen
Contributor Author

Not sure what to do with tan: all the forms I could find contain a division somewhere. For example, the first derivative: 1/cos^2.
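For illustration, here is what a tan rule with the 1/cos^2 derivative might look like in a minimal standalone dual-number sketch (the D type and dTan below are my own stand-ins, not HordeAd's actual definitions):

```haskell
-- Minimal forward-mode dual number: primal and derivative components.
data D = D Double Double
  deriving Show

-- tan rule: d/du tan u = 1 / cos^2 u, so the dual component is
-- scaled by recip (cos u * cos u). The division by zero at
-- u = pi/2 + k*pi is inherent to tan itself, not to this encoding.
dTan :: D -> D
dTan (D u u') = D (tan u) (u' * recip (cos u * cos u))

main :: IO ()
main = print (dTan (D 0.5 1.0))
```

In HordeAd's own style, the dual component would presumably use dScale with the factor recip (cos u * cos u), mirroring the cosh rule discussed below.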

@Mikolaj
Owner

Mikolaj commented May 17, 2022

> Right, and it also contains division (/). Is this the reason you're assuming your original formula is more accurate than mine?

I'm not assuming that. However, if I knew one of them were more accurate (or faster) and had to guess which, I'd assume mine, just because it's repeated on ML blogs, while yours is derived from calculus. Ultimately, checking the quality of our pre-defined functions is a separate task, so I wouldn't worry and would settle on something uniform and cheap (e.g., not requiring a change of tests). Avoiding division by zero is probably a good idea, too, but not at too high a complexity cost (I guess tan is fine; if users don't like it, let them propose a change). I'm learning this as I go, so I'm afraid I don't have good answers yet.

> Not that I understand why, but I just eliminated (recip 2) from atanh, sinh and cosh, instead applying those functions to the primal component explicitly. Does this make sense? Why? :)

Actually, what I had in mind was to change

cosh z = (recip 2) * ((exp z) + (exp (-z)))

to

cosh z = scale (recip 2) ((exp z) + (exp (-z)))

and this looks better to me, because it underlines that we are not using the bilinear function *, which has a complex derivative, but a scaling by a constant, which has a simple derivative. But your new version

cosh (D u u') = D (cosh u) (dScale (sinh u) u')

is even clearer to me, because it tells me what the primal component is and what the dual component of the dual number is (the derivative, which, indeed, is a small expression tree). Please ask more if what I said requires further clarification.
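As a sanity check on the rule above, a standalone finite-difference sketch (fdDeriv is my own helper, not part of HordeAd) confirms that the dual component's factor sinh u really is d/du cosh u:

```haskell
-- Central finite difference; for these inputs the estimate
-- typically agrees with the analytic derivative to ~10 digits.
fdDeriv :: (Double -> Double) -> Double -> Double
fdDeriv f x = (f (x + h) - f (x - h)) / (2 * h)
  where h = 1e-6

main :: IO ()
main = do
  let u = 0.3 :: Double
  print (sinh u)          -- analytic derivative of cosh at u
  print (fdDeriv cosh u)  -- numeric estimate of the same
```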

> Fixed the test to print out more info; on my machine it now produces:
>
> Running 1 test suites...
> Test suite shortTestForCI: RUNNING...
> Short tests for CI
>   MNIST RNN short tests
>     randomLL2 140 0 3 5 54282 5.0e-2: FAIL (2.25s)
>       test/common/TestMnistRNN.hs:173:
>       wrong result: 30.061871495725264 is expected to be a member of [30.061871495723956,30.06187089990927]
>
> which, again, is a subtle numerical error. But in which function and which number is closer to the truth?...

Thank you for the fix. What is truth? I'd recommend changing the test to expect one of two results: the result you get at home and the result CI obtains with the same test (if it differs). Ignore the old results; they are close enough, so we probably haven't made an error.
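A check in the style of the failing assertion quoted earlier, accepting any member of a small list of known-good results, might look like this (checkResult is a hypothetical standalone helper, not the project's actual test code):

```haskell
-- Accept any member of a small list of known-good results,
-- e.g. the value obtained at home and the one CI produces.
checkResult :: Double -> [Double] -> Either String ()
checkResult actual expected
  | actual `elem` expected = Right ()
  | otherwise = Left ("wrong result: " ++ show actual
                      ++ " is expected to be a member of " ++ show expected)

main :: IO ()
main = print (checkResult 30.061871495725264
                          [30.061871495725264, 30.06187089990927])
```

Note this still compares Doubles for exact equality, which is what the existing "member of" assertion does; it only widens the set of accepted bit patterns.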

> BTW why do those tests here on GitHub take 40 minutes to complete? Can we improve that?

How long do they take on your machine? We can surely solve #14 and #20, but they are really hard (and #14 is almost finished).

@Mikolaj
Owner

Mikolaj commented May 17, 2022

BTW, I've just merged a big PR to master. Apologies for the conflicts and for, probably, some extra boilerplate TODOs to do in DualNumber.hs.

The code on master is now hlint-clean except for a dozen cases. You can use hlint to tell you which parens are unnecessary. But feel free to ignore, or to silence via .hlint.yaml, any hints that are not useful.
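For reference, silencing a hint project-wide takes one entry in .hlint.yaml; the hint name below is just an example (it happens to be the hint hlint emits for unnecessary parens):

```yaml
# Ignore one hlint hint everywhere in the project
- ignore: {name: "Redundant bracket"}
```

Hints can also be silenced per-module or per-definition with extra fields on the same entry, but a single .hlint.yaml entry keeps the policy in one place.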

@Mikolaj
Owner

Mikolaj commented May 18, 2022

Thank you for the boilerplate. I guess that's all the relevant TODOs? Could you rebase your branch instead of merging, to keep the git history clean? We will then merge this PR with --no-ff to preserve the PR number in the git history.

@sfindeisen
Contributor Author

sfindeisen commented May 18, 2022

> Thank you for the boilerplate. I guess that's all the relevant TODOs?

It looks so.

> Could you rebase your branch instead of merging

Oops, sorry, I just merged, then read this. Shall I make a new branch and replay my commits there?

@sfindeisen
Contributor Author

> What is truth?

The real tanh result that should be there if our floating-point arithmetic were error-free. BTW, did you consider symbolic calculations and/or using your own arithmetic with infinite precision?

> I'd recommend changing the test to expect one of two results: the result you get at home and the result CI obtains with the same test (if it differs).

No longer relevant since I rolled back my tanh implementation (the test is now green).

> > BTW why do those tests here on GitHub take 40 minutes to complete? Can we improve that?
>
> How long do they take on your machine?

I had to stop them, but maybe I will run them again on some cold winter day.

@Mikolaj
Owner

Mikolaj commented May 18, 2022

> Oops, sorry, I just merged, then read this. Shall I make a new branch and replay my commits there?

Oh, no, please don't bother, that's not a big deal. Also, if you rebase now, the merge commits will probably vanish (though you might have to resolve a conflict or two manually, as is common with rebasing). Then just git push -f to this branch.

> BTW, did you consider symbolic calculations and/or using your own arithmetic with infinite precision?

Not really, because, as I'm slowly learning, ML is fine with a lack of precision, and sometimes it's even good (it introduces fuzz that helps break away from local minima and avoid overfitting), and it's definitely not worth a big performance price to eliminate. I think ML applications even prefer floats smaller than 64 bits in order to lower peak memory. [Edit: it's only serious numerical instability that becomes a problem and for which, indeed, exact arithmetic would be a solution.]

> > I'd recommend changing the test to expect one of two results: the result you get at home and the result CI obtains with the same test (if it differs).
>
> No longer relevant since I rolled back my tanh implementation (the test is now green).

Great. Thanks.

> I had to stop them, but maybe I will run them again on some cold winter day.

I suggested in that ticket that running just the short test suite, the one that CI runs, may be a better idea than running both suites concurrently. Did this suggestion make sense?

@sfindeisen
Contributor Author

> > Oops, sorry, I just merged, then read this. Shall I make a new branch and replay my commits there?
>
> Oh, no, please don't bother, that's not a big deal. Also, if you rebase now, the merge commits will probably vanish (though you might have to resolve a conflict or two manually, as is common with rebasing). Then just git push -f to this branch.

Done.

> > BTW, did you consider symbolic calculations and/or using your own arithmetic with infinite precision?
>
> Not really, because, as I'm slowly learning, ML is fine with a lack of precision, and sometimes it's even good (it introduces fuzz that helps break away from local minima and avoid overfitting), and it's definitely not worth a big performance price to eliminate. I think ML applications even prefer floats smaller than 64 bits in order to lower peak memory. [Edit: it's only serious numerical instability that becomes a problem and for which, indeed, exact arithmetic would be a solution.]

I was expecting that. Now the question is: do we ever experience those serious errors?

Anything else to do in this ticket?

@Mikolaj
Owner

Mikolaj commented May 18, 2022

> Anything else to do in this ticket?

Let me check the comments...

I think only this comment remains, if it makes sense to you (please ask if anything is unclear):

> The code on master is now hlint-clean except for a dozen cases. You can use hlint to tell you which parens are unnecessary. But feel free to ignore, or to silence via .hlint.yaml, any hints that are not useful.

@Mikolaj Mikolaj left a comment

I've had a closer look at most of the formulas and they seem correct and probably fast, too. Thank you very much. Merging.

@Mikolaj Mikolaj merged commit 5bcecc7 into Mikolaj:master May 18, 2022
@Mikolaj Mikolaj mentioned this pull request Jun 2, 2022
Successfully merging this pull request may close these issues.

Do TODOs in HordeAd.Core.DualNumber about defining more derivatives of common functions