Add floating point math support #179

ajcord · 2017-06-26T02:58:15Z

It's not finished yet, but I think it's time this started to get reviewed. I stuck to relatively basic operations on floating points and left the rest to userspace. As far as I can tell, besides some non-critical issues with fptostr, the only major issue currently is adding negative numbers. That being said, my testing has not been exhaustive.

Here is what's done:

String/integer to FP conversion
FP to string conversion
Loading common constants
Addition of positive numbers
Multiplication/division by powers of ten only
Logical operators (and, or, xor, not)
Negation, absolute value
Comparison, min, max
Integer and fractional part
Random number generation

Here is what's left to do, ranked by how important I think it is:

Adding negative numbers (i.e. subtraction)
Multiplication and division
Roots/powers
Logarithms
Trigonometry
Any other miscellaneous math operations
Conversion to and from IEEE 754 floats/doubles

Hopefully it's up to par in terms of style and efficiency. I'm not super experienced with Z80, so some portions can get a bit ugly (mostly fptostr and fpAdd).

Feel free to let me know about any problems you find, but my availability is going to decrease over the next few weeks. I will no longer be able to contribute in about a month (but should still be able to answer questions about my code). Ideally, someone else will take over to flesh out the rest.


ddevault · 2017-06-26T19:42:03Z

Thanks!

ajcord · 2017-06-26T19:50:40Z

include/kernel.inc

+FP_180_OVER_PI  .equ 6
+FP_E            .equ 7
+FP_LOG_E        .equ 8
+FP_LN_E         .equ 9


Should be FP_LN_10

Whoops, fixed

ajcord · 2017-06-26T19:52:31Z

src/02/fp-math.asm

-;;  * Rounding last digit - buggy, currently abandoned
-;;  * Never show exponent if significand is 0 - not started
+;;  Examples:
+;;  All nonzero decimals in scientific notation, digits grouped with ',':


This example doesn't make much sense because scientific notation will never have thousands separators. Might make more sense to change this to FP_STR_INV_PUNC | FP_DISP_SCIENTIFIC | 0xF, and then make the next one FP_GROUP_DIGITS | 5

Good point.

ddevault · 2017-06-26T20:16:57Z

Docs up at http://www.knightos.org/documentation/reference/decimal_floating_point.html

ajcord · 2017-06-26T20:21:30Z

Looks like some formatting issues with those labeled "Input" or "Output" instead of "Inputs" or "Outputs", and a list indentation issue with itofp, but otherwise looks good 👍

ddevault · 2017-06-26T20:21:56Z

Yep, one step at a time. Fixing the rest of the build server at the moment <_<

ddevault · 2017-06-26T20:22:23Z

Project infrastructure isn't generally healthy after 8 months of negligence

ajcord · 2017-06-26T20:23:16Z

Understandable

ddevault · 2017-06-26T20:29:11Z

e873701

ddevault · 2017-07-01T18:43:11Z

Hmm, I ran into an issue with fptostr:

  ld a, FP_PI
  kld(hl, .pi)
  pcall(fpLoadConst)
  xor a
  kld(ix, .pi)
  kld(hl, .str)
  pcall(fptostr)
  ; (draw the string)
.pi:
  .block 9
.str:
  .block 20

This displays "3".

ddevault · 2017-07-01T18:45:21Z

Also tested parsing "3.1415" and printing that, also displays 3. Seems like the fractional part is never displayed.

ddevault · 2017-07-01T18:47:45Z

Derp, I needed to set A to 0xF.

ajcord · 2017-07-01T20:26:40Z

Yeah, it's happened to me before. Can you think of a good way such that A can be 0x00 by default? Can't really just remap 0xF to 0x0 because 0-digit fixed point is perfectly valid.


ddevault · 2017-07-01T21:51:27Z

Generally speaking we design our APIs so that 0 for flag fields does the most sane thing possible. In this case, however, I could go either way.

ddevault · 2017-07-01T22:19:14Z

Progress on the calculator is going well, by the way:

https://github.com/KnightOS/calculator

ajcord · 2017-07-02T22:42:22Z

I would have preferred to design the API like you mentioned where 90% of the time you would use A=0. The best solution I can think of is to use bit 5 to indicate fixed point and the bottom nibble for the number of digits, but I don't like that because it wastes the last available flag bit unnecessarily.
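For illustration only, the bit-5 layout floated above could be decoded roughly as follows. This is a sketch of the proposal, not of the merged fptostr behavior, and the names `FIXED_POINT_BIT`, `DIGITS_MASK`, and `decode_display_flags` are hypothetical.

```python
# Hypothetical constants for the proposed layout (not part of kernel.inc).
FIXED_POINT_BIT = 0x20  # bit 5 marks fixed-point display
DIGITS_MASK = 0x0F      # bottom nibble holds the digit count


def decode_display_flags(a):
    """Return (fixed_point, digits) for a flags byte under the proposed layout.

    digits is None when the fixed-point bit is clear, so A = 0 would mean
    "floating display, no digit limit" while 0x20 is 0-digit fixed point.
    """
    fixed_point = bool(a & FIXED_POINT_BIT)
    digits = (a & DIGITS_MASK) if fixed_point else None
    return fixed_point, digits
```

Under this layout 0-digit fixed point stays expressible as 0x20, at the cost of the last free flag bit, which is the trade-off described above.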

Also, great work on the calculator app! I'll be excited to actually be able to do math on my calculator again.

ddevault · 2017-07-03T19:40:38Z

Been looking into subtraction. It seems we'll want to do the 9's complement again at the end, but I'm having a hard time getting everything to work.

ajcord · 2017-07-03T19:45:14Z

That sounds right. There are just a lot of edge cases to worry about. Alternatively, I believe it should work to sbc instead of adc because apparently daa works with both. Then you only need to do one 9's complement in certain cases, I believe. I have a branch on my fork where I was exploring that idea before the PR.
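As a rough model of the approach under discussion, here is subtraction done as addition of the 9's complement, with the second complement applied at the end when no carry comes out. It works on plain digit lists rather than the kernel's packed BCD mantissa, and the function names are made up for this sketch.

```python
def nines_complement(digits):
    """Replace every decimal digit d with 9 - d."""
    return [9 - d for d in digits]


def add_digits(a, b):
    """Add two equal-length digit lists (most significant digit first).

    Returns (carry_out, sum_digits).
    """
    carry = 0
    out = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        total = a[i] + b[i] + carry
        out[i] = total % 10
        carry = total // 10
    return carry, out


def subtract(a, b):
    """Compute a - b; returns (is_negative, magnitude_digits)."""
    carry, total = add_digits(a, nines_complement(b))
    if carry:
        # Carry out means a > b: add the end-around carry to finish.
        one = [0] * (len(a) - 1) + [1]
        _, total = add_digits(total, one)
        return False, total
    # No carry means a <= b: complement again to recover the magnitude.
    magnitude = nines_complement(total)
    # Report zero as non-negative (one of the edge cases to watch for).
    return any(magnitude), magnitude
```

For example, `subtract([1, 7], [5, 2])` yields a negative result with magnitude 35, and equal operands yield a non-negative zero.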


Zeda · 2017-07-03T19:55:45Z

If you are using the BCD format that the Z80 uses, then SBC followed by DAA will work and properly set the carry flag. If you are doing the standard floating point format where the mantissa must be positive, then if it ever becomes negative, start at the LSB of the mantissa and do 'xor a \ sub a,(hl) \ daa \ ld (hl),a' then follow up through the rest of the bytes with 'ld a,0 \ sbc a,(hl) \ daa \ ld (hl),a'. I haven't had time to look at the code as I've been super busy with work, but I get the email updates.
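The byte-by-byte negation described above amounts to taking the 10's complement of the mantissa: subtract each packed BCD byte from zero, propagating the borrow upward from the least significant byte. A minimal Python model follows. It reproduces the decimal result of the sub/sbc plus daa sequence but does not emulate Z80 flags, and `bcd_negate` is a name invented for this sketch.

```python
def bcd_negate(mantissa):
    """10's complement of packed BCD bytes (most significant byte first)."""
    out = bytearray(len(mantissa))
    borrow = 0
    # Walk from the least significant byte upward, like the Z80 loop.
    for i in range(len(mantissa) - 1, -1, -1):
        value = (mantissa[i] >> 4) * 10 + (mantissa[i] & 0x0F)
        diff = 0 - value - borrow
        if diff < 0:
            diff += 100   # wrap within two decimal digits
            borrow = 1    # and borrow from the next byte up
        else:
            borrow = 0
        out[i] = ((diff // 10) << 4) | (diff % 10)
    return bytes(out)
```

Negating twice returns the original bytes, which is a cheap sanity check for any real implementation as well.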


ddevault · 2017-07-03T20:25:42Z

Thanks @Zeda - BCD is in use here, but the current approach is to use the 9's complement and only implement addition. It might be easier to do subtraction separately but I'm not particular either way.

Thanks for the tips, I would definitely welcome your eyeballs on this code if you find some spare time.

ddevault · 2017-07-08T13:23:41Z

@ajcord can I interest you in writing a blog post for knightos.org about your work?

ajcord · 2017-07-08T14:31:54Z

Hmm... I'm getting ready to move across the country, so now is probably not a good time. Maybe in a few weeks?


ddevault · 2017-07-08T14:32:33Z

Sure! Send a pull request to https://github.com/KnightOS/knightos.org whenever you like.

ajcord · 2017-08-07T04:08:08Z

I talked to some great Z80 developers at the Vintage Computer Festival today, and one of them suggested implementing x*y as 10^(log(x) + log(y)). Might be worth looking into, although I suspect it may lose some precision and be less efficient. But it could still be worthwhile as a proof of concept once logarithms and powers are implemented.
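A quick way to get a feel for the precision concern is to try the identity in ordinary binary floating point. This sketch uses Python's `math.log10` rather than any KnightOS routine, and `multiply_via_logs` is a hypothetical helper name.

```python
import math


def multiply_via_logs(x, y):
    """Multiply two positive numbers as 10^(log10(x) + log10(y))."""
    if x <= 0 or y <= 0:
        # Logarithms are undefined here; signs and zero need separate handling.
        raise ValueError("log-based multiplication needs positive operands")
    return 10.0 ** (math.log10(x) + math.log10(y))


if __name__ == "__main__":
    x, y = 3.0, 7.0
    # Compare against direct multiplication to see any lost digits.
    print(multiply_via_logs(x, y), x * y)
```

The result is close to, but not always exactly equal to, the direct product, and signs and zero have to be handled outside the logarithm.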

Zeda · 2017-08-07T12:47:00Z

I've seen that method used for integer multiplication using lookup tables. It does lose accuracy, but it is suitable in some applications. I have a few possible multiplication algorithms, by the way. Depending on how much scrap RAM is available, I think I could get it twice as fast as what TI does. The best is actually schoolbook multiplication.

Memory layout (94 bytes total):

A dynamically generated LUT of the values 1*operand2 through 9*operand2. Each entry is just the mantissa with an extra "00" in front. That's 72 bytes.
Following this, 8 bytes for the mantissa of operand1, prefixed with 00.
Next, 14 bytes for the output mantissa (all zeroed out).

Algorithm:

Start output_index at 'output+14'.
Pass 1: on each of the 7 bytes of operand1, decrement output_index and mask for the lower nibble. If zero, skip, else binary shift left three times to get the LUT index. Add that at output_index.
Next: start HL at operand1+8, A=0, then do 'dec hl \ rld' 64 times.
Next: start output_index at 'output+14'.
Pass 2: same as Pass 1.

That should multiply the two mantissas. It costs 94 bytes. The operation count is 23 8-byte BCD adds, a shift-left-by-4 of a 64-byte buffer (using the rld instruction), and some overhead.
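The core idea (precompute the nine multiples of operand2, then add one table entry per nonzero digit of operand1) can be sketched in Python as follows. This is a simplified model: it works on digit lists and walks operand1 one digit at a time instead of doing two nibble passes with an rld shift in between, and all names are invented for the sketch.

```python
def add_into(acc, addend, end):
    """Add digit list `addend` into `acc` so its last digit lands at end - 1."""
    carry = 0
    i = end - 1
    for d in reversed(addend):
        total = acc[i] + d + carry
        acc[i] = total % 10
        carry = total // 10
        i -= 1
    while carry:  # ripple any remaining carry into higher digits
        total = acc[i] + carry
        acc[i] = total % 10
        carry = total // 10
        i -= 1


def build_lut(op2):
    """Multiples 1*op2 .. 9*op2, each with one extra leading digit."""
    base = [0] + list(op2)
    lut = [None, list(base)]  # index 0 is unused: zero digits are skipped
    for k in range(2, 10):
        entry = list(lut[k - 1])
        add_into(entry, base, len(entry))  # k*op2 = (k-1)*op2 + op2
        lut.append(entry)
    return lut


def multiply_mantissas(op1, op2):
    """Schoolbook product of two digit lists (most significant digit first)."""
    lut = build_lut(op2)
    out = [0] * (len(op1) + len(op2))
    end = len(out)
    for digit in reversed(op1):
        if digit:  # skip zero digits, as in the description above
            add_into(out, lut[digit], end)
        end -= 1   # next digit of op1 is worth ten times more
    return out
```

Because the table is built with eight additions and each digit of operand1 then costs at most one more, no per-digit multiplication is needed, which is what makes the approach attractive on a CPU with no multiply instruction.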


ddevault · 2017-08-07T12:48:20Z

Do you need 94 bytes of scrap RAM or 94 bytes of statically allocated LUTs? Both are easily doable, fwiw.

Zeda · 2017-08-07T12:58:03Z

As well, I have some algorithms that may be suitable for exp(x) to high precision. It took me about a week of obsessed math to come up with this, and a notebook or two later. Not sure how well suited it is for BCD math, though:

  Input: x
  y = x^2    ; squaring can be faster
  a = x/2 * (1 + 5y/156 * (1 + 3y/550 * (1 + y/1512))) / (1 + 3y/26 * (1 + 5y/396 * (1 + y/450)))
  return (1+a)/(1-a)
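The formula can be checked numerically in a few lines of Python. This sketch reads each `c*y/d(...)` as the fraction multiplied by the parenthesized factor; under that reading, `a` appears to be a rational approximation of tanh(x/2), so `(1+a)/(1-a)` approximates e^x. It uses binary doubles instead of BCD, and `exp_rational` is a hypothetical name.

```python
import math


def exp_rational(x):
    """Evaluate the rational approximation of exp(x) given above."""
    y = x * x
    num = 1 + 5 * y / 156 * (1 + 3 * y / 550 * (1 + y / 1512))
    den = 1 + 3 * y / 26 * (1 + 5 * y / 396 * (1 + y / 450))
    a = x / 2 * num / den
    # Breaks down as a approaches 1, so large x needs range reduction first.
    return (1 + a) / (1 - a)


if __name__ == "__main__":
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(x, exp_rational(x), math.exp(x))
```

For small arguments the two columns agree closely; range reduction for large |x| is assumed to happen elsewhere.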

Zeda · 2017-08-08T02:54:02Z

The LUTs are dynamically generated on the fly, and the algorithm can be designed to work with an allocated buffer, but it would be a fair bit faster if the 94 bytes were in a static location.


Alex Cordonnier added 30 commits June 6, 2017 17:36

Add integer to FP conversion

9a67682

Uses an encoding similar to TI-OS but with fewer flags.

Add FP subtract, negate, compare; begin addition

96e3d15

Addition is currently just standard BCD addition on the mantissa. Still need to work on handling signs, exponents, and overflows.

Fix/optimize various things

97860a4

Only 1 bug away from addition being able to handle signed operands

Simplify code using IX and IY as pointer inputs

5b598f1

Add conversion from string to floating point

da5260d

I’ve tested several cases I can think of, but it should really get tested more thoroughly.

Optimize strtofp

0f7af11

Now using a single rld instruction instead of multiple sla instructions. Also now formally assumes the destination is zeroed, which it previously only partially did.

Add carry support to fpAdd

f247626

fpAdd can now correctly add two positive numbers that have the same exponent.

Add floating point support for fpAdd

e84cec6

The code is rather ugly, but it appears to work. I’ll try to come back to this section at some point to clean it up.

Add initial fptostr conversion

e3671a4

Would like to implement display options instead of always using scientific notation, but this works fine for testing.

Start implementation of various fptostr options

c46224c

Currently very buggy and ugly.

Fix trailing 0 bug

9add1d4

Hide trailing 0s in scientific notation

082b535

Fix known bugs in fptostr

e41e92e

Also combines scientific notation processing with normal processing to share code. Should probably move away from using IX now since it’s slow.

Add fixed point support to fptostr

d12d251

Fix bug where fixed point could cause >10 digits

e02a9d4

Fix bug with fixed point scientific notation

673ddbe

Fix bug with fixed point >10 digits

681a7d0

First attempt at rounding in fptostr

683b4ad

Correctly rounds 3.1415926535897 to 3.141592654, but crashes on variations like 0.1415926535897, 11.1415926535897.

Abandon rounding for now

2ffb924

Too much work for not much gain at this point. Also fixed the crashing bug.

Merge branch 'fptostr-modes'

12bba20

Let fpSub fall through to fpAdd

85e4bec

Rearranged fpNeg to be close to fpSub

15d891c

Merge branch 'fpAdd-negative'

9dcad48

Add fpAbs, fpRand, and fpNormalize

d2f1f9f

Improve fpNormalize runtime

273450a

Fix fptostr for negative exponents in normal mode

4c4a064

Fix leading zeroes issue in scientific notation mode

b44ed8e

Add fpIPart

1e88ae3

Add fpFPart

b00dba5

Fix fpNormalize copying length

bc5b397

Minor optimizations and documentation improvements

005c0b9

ddevault merged commit b9a5ee2 into KnightOS:master Jun 26, 2017
