
TODOs in HordeAd.Core.DualNumber about defining more derivatives of common functions #39

Merged
merged 51 commits into from
May 18, 2022

Conversation

sfindeisen
Contributor

@sfindeisen sfindeisen commented May 16, 2022

Fixes #19

@Mikolaj
Owner

Mikolaj commented May 16, 2022

Thank you very much, Staszek!

CI fails. In most cases, that's because your formula for tanh, while beautiful (no doubt, though I can't tell if it's better behaved numerically or if it's faster or slower and on what architectures), gives subtly different results than the old one. In some cases the whole computation is not numerically stable with respect to the tanh computation, and then the difference becomes huge.

However, not all failures are clear, so let's try to understand them before deciding on a fix. A couple of failures are probably due to my sparse matrix experiment not coping well with matrices of undetermined size (such as your (recip 2)). It's possible that changing the general multiplication by (recip 2) to scaling by a constant (dScale) would lessen the strain.

The last and embarrassing case is where a test says only "wrong result". It would be helpful to make it print any details.
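The subtle divergence described above can be reproduced in isolation. Here is a minimal standalone sketch (tanhViaExp is my own illustrative name, not a HordeAd function) comparing the Prelude's tanh with an algebraically equal exp-based reformulation:

```haskell
-- Prelude's tanh versus a hypothetical exp-based reformulation.
-- Both are mathematically tanh x, but they round differently,
-- which is enough to change downstream test results.
tanhViaExp :: Double -> Double
tanhViaExp x = (e - recip e) / (e + recip e)
  where e = exp x

main :: IO ()
main = do
  let x = 0.7 :: Double
  print (tanh x)
  print (tanhViaExp x)
  print (tanh x == tanhViaExp x)  -- may be False despite algebraic equality
```

The two values agree to many digits, but bit-for-bit equality is not guaranteed, which is exactly the kind of difference that makes exact-expected-value tests fail.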

@sfindeisen sfindeisen changed the title TODOs in HordeAd.Core.DualNumber TODOs in HordeAd.Core.DualNumber about defining more derivatives of common functions May 17, 2022
@sfindeisen
Contributor Author

> CI fails. In most cases, that's because your formula for tanh, while beautiful (no doubt, though I can't tell if it's better behaved numerically or if it's faster or slower and on what architectures), gives subtly different results than the old one. In some cases the whole computation is not numerically stable with respect to the tanh computation, and then the difference becomes huge.

Right, and it also contains division (/). Is this the reason you're assuming your original formula is more accurate than mine?

> However, not all failures are clear, so let's try to understand them before deciding on a fix. A couple of failures are probably due to my sparse matrix experiment not coping well with matrices of undetermined size (such as your (recip 2)). It's possible that changing the general multiplication by (recip 2) to scaling by a constant (dScale) would lessen the strain.

Not that I understand why, but I just eliminated (recip 2) from atanh, sinh and cosh, instead applying those functions to the primal component explicitly. Does this make sense? Why? :)

> The last and embarrassing case is where a test says only "wrong result". It would be helpful to make it print any details.

Fixed the test to print out more info; on my machine it now produces:

Running 1 test suites...
Test suite shortTestForCI: RUNNING...
Short tests for CI
  MNIST RNN short tests
    randomLL2 140 0 3 5 54282 5.0e-2: FAIL (2.25s)
      test/common/TestMnistRNN.hs:173:
      wrong result: 30.061871495725264 is expected to be a member of [30.061871495723956,30.06187089990927]

which, again, is a subtle numerical error. But in which function and which number is closer to the truth?...

BTW why do those tests here on GitHub take 40 minutes to complete? Can we improve that?

@sfindeisen
Contributor Author

Not sure what to do with tan: all the forms I could find contain a division somewhere. For example, the first derivative: 1/cos^2.
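For illustration, here is what a tan rule with the 1/cos^2 derivative might look like in a minimal standalone dual-number sketch (the D type and dTan below are my own stand-ins, not HordeAd's actual definitions):

```haskell
-- Minimal forward-mode dual number: primal and derivative components.
data D = D Double Double
  deriving Show

-- tan rule: d/du tan u = 1 / cos^2 u, so the dual component is
-- scaled by recip (cos u * cos u). The division by zero at
-- u = pi/2 + k*pi is inherent to tan itself, not to this encoding.
dTan :: D -> D
dTan (D u u') = D (tan u) (u' * recip (cos u * cos u))

main :: IO ()
main = print (dTan (D 0.5 1.0))
```

In HordeAd's own style, the dual component would presumably use dScale with the factor recip (cos u * cos u), mirroring the cosh rule discussed below.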

@Mikolaj
Owner

Mikolaj commented May 17, 2022

> Right, and it also contains division (/). Is this the reason you're assuming your original formula is more accurate than mine?

I'm not assuming that. However, if I knew one of them were more accurate (or faster) and had to guess which, I'd assume mine, just because it's repeated on ML blogs, while yours is derived from calculus. Ultimately, checking the quality of our pre-defined functions is a separate task, so I wouldn't worry and would settle on something uniform and cheap (e.g., not requiring a change of tests). Avoiding division by zero is probably a good idea, too, but not at too high a complexity cost (I guess tan is fine; if users don't like it, let them propose a change). I'm learning this as I go, so I'm afraid I don't have good answers yet.

> Not that I understand why, but I just eliminated (recip 2) from atanh, sinh and cosh, instead applying those functions to the primal component explicitly. Does this make sense? Why? :)

Actually, what I had in mind was to change

cosh z = (recip 2) * ((exp z) + (exp (-z)))

to

cosh z = scale (recip 2) ((exp z) + (exp (-z)))

and this looks better to me, because it underlines that we are not using the bilinear function *, which has a complex derivative, but a scaling by a constant, which has a simple derivative. But your new version

cosh (D u u') = D (cosh u) (dScale (sinh u) u')

is even clearer to me, because it tells me what the primal component is and what the dual component of the dual number is (the derivative, which, indeed, is a small expression tree). Please ask more if what I said requires further clarification.
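As a sanity check on the rule above, a standalone finite-difference sketch (fdDeriv is my own helper, not part of HordeAd) confirms that the dual component's factor sinh u really is d/du cosh u:

```haskell
-- Central finite difference; for these inputs the estimate
-- typically agrees with the analytic derivative to ~10 digits.
fdDeriv :: (Double -> Double) -> Double -> Double
fdDeriv f x = (f (x + h) - f (x - h)) / (2 * h)
  where h = 1e-6

main :: IO ()
main = do
  let u = 0.3 :: Double
  print (sinh u)          -- analytic derivative of cosh at u
  print (fdDeriv cosh u)  -- numeric estimate of the same
```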

> Fixed the test to print out more info; on my machine it now produces:
>
> Running 1 test suites...
> Test suite shortTestForCI: RUNNING...
> Short tests for CI
>   MNIST RNN short tests
>     randomLL2 140 0 3 5 54282 5.0e-2: FAIL (2.25s)
>       test/common/TestMnistRNN.hs:173:
>       wrong result: 30.061871495725264 is expected to be a member of [30.061871495723956,30.06187089990927]
>
> which, again, is a subtle numerical error. But in which function and which number is closer to the truth?...

Thank you for the fix. What is truth? I'd recommend changing the test to expect one of two results: the result you get at home and the result CI obtains with the same test (if it differs). Ignore the old results; they are close enough, so we probably haven't made an error.
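A check in the style of the failing assertion quoted earlier, accepting any member of a small list of known-good results, might look like this (checkResult is a hypothetical standalone helper, not the project's actual test code):

```haskell
-- Accept any member of a small list of known-good results,
-- e.g. the value obtained at home and the one CI produces.
checkResult :: Double -> [Double] -> Either String ()
checkResult actual expected
  | actual `elem` expected = Right ()
  | otherwise = Left ("wrong result: " ++ show actual
                      ++ " is expected to be a member of " ++ show expected)

main :: IO ()
main = print (checkResult 30.061871495725264
                          [30.061871495725264, 30.06187089990927])
```

Note this still compares Doubles for exact equality, which is what the existing "member of" assertion does; it only widens the set of accepted bit patterns.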

> BTW why do those tests here on GitHub take 40 minutes to complete? Can we improve that?

How long do they take on your machine? We can surely solve #14 and #20, but they are really hard (and #14 is almost finished).

@Mikolaj
Owner

Mikolaj commented May 17, 2022

BTW, I've just merged a big PR to master. Apologies for the conflicts and for, probably, some extra boilerplate TODOs to do in DualNumber.hs.

The code on master is now hlint-clean except for a dozen cases. You can use hlint to tell you which parens are unnecessary. But feel free to ignore, or to silence via .hlint.yaml, any hints that are not useful.
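For reference, silencing a hint project-wide takes one entry in .hlint.yaml; the hint name below is just an example (it happens to be the hint hlint emits for unnecessary parens):

```yaml
# Ignore one hlint hint everywhere in the project
- ignore: {name: "Redundant bracket"}
```

Hints can also be silenced per-module or per-definition with extra fields on the same entry, but a single .hlint.yaml entry keeps the policy in one place.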

@Mikolaj
Owner

Mikolaj commented May 18, 2022

Thank you for the boilerplate. I guess that's all the relevant TODOs? Could you rebase your branch instead of merging, to keep the git history clean? We will then merge this PR with --no-ff to preserve the PR number in the git history.

@sfindeisen
Contributor Author

sfindeisen commented May 18, 2022

> Thank you for the boilerplate. I guess that's all the relevant TODOs?

It looks so.

> Could you rebase your branch instead of merging

Oops, sorry, I just merged, then read this. Shall I make a new branch and replay my commits there?

@sfindeisen
Contributor Author

> What is truth?

The real tanh result that should be there if our floating-point arithmetic were error-free. BTW, did you consider symbolic calculations and/or using your own arithmetic with infinite precision?

> I'd recommend changing the test to expect one of two results: the result you get at home and the result CI obtains with the same test (if it differs).

No longer relevant since I rolled back my tanh implementation (the test is now green).

> > BTW why do those tests here on GitHub take 40 minutes to complete? Can we improve that?
>
> How long do they take on your machine?

I had to stop them, but maybe I will run them again on some cold winter day.

@Mikolaj
Owner

Mikolaj commented May 18, 2022

> Oops, sorry, I just merged, then read this. Shall I make a new branch and replay my commits there?

Oh, no, please don't bother, that's not a big deal. Also, if you rebase now, the merge commits will probably vanish (though you might have to resolve a conflict or two manually, as is common with rebasing). Then just git push -f to this branch.

> BTW, did you consider symbolic calculations and/or using your own arithmetic with infinite precision?

Not really, because, as I'm slowly learning, ML is fine with a lack of precision, and sometimes it's even good (it introduces fuzz that helps break away from local minima and avoid overfitting), and it's definitely not worth a big performance price to eliminate. I think ML applications even prefer floats smaller than 64 bits in order to lower peak memory. [Edit: it's only serious numerical instability that becomes a problem and for which, indeed, exact arithmetic would be a solution.]

> > I'd recommend changing the test to expect one of two results: the result you get at home and the result CI obtains with the same test (if it differs).
>
> No longer relevant since I rolled back my tanh implementation (the test is now green).

Great. Thanks.

> I had to stop them, but maybe I will run them again on some cold winter day.

I suggested in that ticket that running just the short test suite, the one that CI runs, may be a better idea than running both suites concurrently. Did this suggestion make sense?

@sfindeisen
Contributor Author

> > Oops, sorry, I just merged, then read this. Shall I make a new branch and replay my commits there?
>
> Oh, no, please don't bother, that's not a big deal. Also, if you rebase now, the merge commits will probably vanish (though you might have to resolve a conflict or two manually, as is common with rebasing). Then just git push -f to this branch.

Done.

> > BTW, did you consider symbolic calculations and/or using your own arithmetic with infinite precision?
>
> Not really, because, as I'm slowly learning, ML is fine with a lack of precision, and sometimes it's even good (it introduces fuzz that helps break away from local minima and avoid overfitting), and it's definitely not worth a big performance price to eliminate. I think ML applications even prefer floats smaller than 64 bits in order to lower peak memory. [Edit: it's only serious numerical instability that becomes a problem and for which, indeed, exact arithmetic would be a solution.]

I was expecting that. Now the question is: do we ever experience those serious errors?

Anything else to do in this ticket?

@Mikolaj
Owner

Mikolaj commented May 18, 2022

> Anything else to do in this ticket?

Let me check the comments...

I think only this comment remains, if it makes sense to you (please ask if anything is unclear):

> The code on master is now hlint-clean except for a dozen cases. You can use hlint to tell you which parens are unnecessary. But feel free to ignore, or to silence via .hlint.yaml, any hints that are not useful.

@Mikolaj Mikolaj left a comment

I've had a closer look at most of the formulas and they seem correct and probably fast, too. Thank you very much. Merging.

@Mikolaj Mikolaj merged commit 5bcecc7 into Mikolaj:master May 18, 2022
@Mikolaj Mikolaj mentioned this pull request Jun 2, 2022
Successfully merging this pull request may close these issues.

Do TODOs in HordeAd.Core.DualNumber about defining more derivatives of common functions