Add floating point math support #179
Uses an encoding similar to TI-OS but with fewer flags.
Addition is currently just standard BCD addition on the mantissa. Still need to work on handling signs, exponents, and overflows.
Only 1 bug away from addition being able to handle signed operands
I’ve tested several cases I can think of, but it should really get tested more thoroughly.
Now using a single rld instruction instead of multiple sla instructions. Also now formally assumes the destination is zeroed, which it previously only partially did.
fpAdd can now correctly add two positive numbers that have the same exponent.
The code is rather ugly, but it appears to work. I’ll try to come back to this section at some point to clean it up.
Would like to implement display options instead of always using scientific notation, but this works fine for testing.
Currently very buggy and ugly.
Also combines scientific notation processing with normal processing to share code. Should probably move away from using IX now since it’s slow.
Correctly rounds 3.1415926535897 to 3.141592654, but crashes on variations like 0.1415926535897, 11.1415926535897.
Too much work for not much gain at this point. Also fixed the crashing bug.
Thanks! |
FP_180_OVER_PI .equ 6
FP_E .equ 7
FP_LOG_E .equ 8
FP_LN_E .equ 9
Should be FP_LN_10
Whoops, fixed
src/02/fp-math.asm
;; * Rounding last digit - buggy, currently abandoned
;; * Never show exponent if significand is 0 - not started
;; Examples:
;; All nonzero decimals in scientific notation, digits grouped with ',':
This example doesn't make much sense because scientific notation will never have thousands separators. Might make more sense to change this to FP_STR_INV_PUNC | FP_DISP_SCIENTIFIC | 0xF, and then make the next one FP_GROUP_DIGITS | 5
Good point.
Looks like some formatting issues with those labeled "Input" or "Output" instead of "Inputs" or "Outputs", and a list indentation issue with |
Yep, one step at a time. Fixing the rest of the build server at the moment <_< |
Project infrastructure isn't generally healthy after 8 months of negligence |
Understandable |
Hmm, I ran into an issue with fptostr:
    ld a, FP_PI
    kld(hl, .pi)
    pcall(fpLoadConst)
    xor a
    kld(ix, .pi)
    kld(hl, .str)
    pcall(fptostr)
    ; (draw the string)
.pi:
    .block 9
.str:
    .block 20
This displays "3". |
Also tested parsing "3.1415" and printing that, also displays 3. Seems like the fractional part is never displayed. |
Derp, I needed to set A to 0xF. |
Yeah, it's happened to me before. Can you think of a good way to let A
default to 0x00? We can't simply remap 0xF to 0x0, because 0-digit
fixed point is perfectly valid.
|
Generally speaking we design our APIs so that 0 for flag fields does the most sane thing possible. In this case, however, I could go either way. |
Progress on the calculator is going well, by the way: |
I would have preferred to design the API like you mentioned where 90% of the time you would use A=0. The best solution I can think of is to use bit 5 to indicate fixed point and the bottom nibble for the number of digits, but I don't like that because it wastes the last available flag bit unnecessarily. Also, great work on the calculator app! I'll be excited to actually be able to do math on my calculator again. |
Been looking into subtraction. It seems we'll want to do the 9's complement again at the end, but I'm having a hard time getting everything to work. |
That sounds right. There are just a lot of edge cases to worry about.
Alternatively, I believe it should work to sbc instead of adc because
apparently daa works with both. Then you only need to do one 9's complement
in certain cases, I believe. I have a branch on my fork where I was
exploring that idea before the PR.
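The 9's complement approach discussed above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not the kernel's Z80 code: the digit-list representation and the names `bcd_add`, `nines_complement`, and `bcd_sub` are assumptions made for illustration.

```python
def bcd_add(a, b, carry=0):
    """Add two equal-length decimal digit lists (most significant digit first)."""
    out = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        total = a[i] + b[i] + carry
        out[i] = total % 10
        carry = total // 10
    return out, carry


def nines_complement(digits):
    """Replace every digit d with 9 - d."""
    return [9 - d for d in digits]


def bcd_sub(a, b):
    """Return (magnitude of a - b, is_negative) using only addition."""
    total, carry = bcd_add(a, nines_complement(b))
    if carry:
        # a > b: the sum is (a - b - 1) plus a carry out, so add the carry back in.
        result, _ = bcd_add(total, [0] * len(a), carry=1)
        return result, False
    # a <= b: the sum is the 9's complement of (b - a), so complement once more.
    result = nines_complement(total)
    return result, any(result)
```

The second complement is only needed on the no-carry path, which matches the observation that the complement has to be applied "again at the end" in some cases.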
|
If you are using the BCD format that the Z80 uses, then SBC followed by DAA
will work and properly set the carry flag. If you are doing the standard
floating point format where the mantissa must be positive, then if it ever
becomes negative, start at the LSB of the mantissa and do 'xor a \ sub
a,(hl) \ daa \ ld (hl),a' then follow up through the rest of the bytes with
'ld a,0 \ sbc a,(hl) \ daa \ ld (hl),a'.
I haven't had time to look at the code as I've been super busy with work,
but I get the email updates.
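As a rough model of the negation loop described above, the following Python sketch computes `0 - byte - borrow` on packed BCD bytes, starting at the least significant byte. It reproduces only the net arithmetic effect of `sub`/`sbc` followed by `daa` on valid BCD input, not the Z80 flag behaviour; `bcd_byte_sub` and `bcd_negate` are hypothetical names, and the most-significant-byte-first layout is an assumption.

```python
def bcd_byte_sub(x, y, borrow):
    """Packed-BCD x - y - borrow for one byte; returns (result_byte, borrow_out)."""
    value = (x >> 4) * 10 + (x & 0x0F) - ((y >> 4) * 10 + (y & 0x0F)) - borrow
    borrow_out = 1 if value < 0 else 0
    value %= 100  # wrap into 00..99, as the decimal adjust would
    return ((value // 10) << 4) | (value % 10), borrow_out


def bcd_negate(mantissa):
    """Return the 10's complement of a packed-BCD mantissa (MSB first)."""
    out = bytearray(len(mantissa))
    borrow = 0
    # Walk from the least significant byte upward, propagating the borrow.
    for i in range(len(mantissa) - 1, -1, -1):
        out[i], borrow = bcd_byte_sub(0x00, mantissa[i], borrow)
    return bytes(out)
```

Negating twice returns the original mantissa, which is a convenient sanity check for an assembly implementation as well.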
|
Thanks @Zeda - BCD is in use here, but the current approach is to use the 9's complement and only implement addition. It might be easier to do subtraction separately but I'm not particular either way. Thanks for the tips, I would definitely welcome your eyeballs on this code if you find some spare time. |
@ajcord can I interest you in writing a blog post for knightos.org about your work? |
Hmm... I'm getting ready to move across the country, so now is probably not
a good time. Maybe in a few weeks?
|
Sure! Send a pull request to https://github.com/KnightOS/knightos.org whenever you like. |
I talked to some great Z80 developers at the Vintage Computer Festival today, and one of them suggested implementing x*y as 10^(log(x) + log(y)). Might be worth looking into, although I suspect it may lose some precision and be less efficient. But it could still be worthwhile as a proof of concept once logarithms and powers are implemented. |
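To get a feel for the precision concern, here is a small Python sketch that compares the logarithm route with direct multiplication. It uses the standard `decimal` module with 14 significant digits as a stand-in for a 14-digit BCD mantissa; that digit count, and the function names, are assumptions for illustration. It only handles positive operands.

```python
from decimal import Decimal, localcontext


def multiply_via_logs(x, y, digits=14):
    """Compute x*y as 10^(log10(x) + log10(y)), rounding every step to `digits`."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(10) ** (x.log10() + y.log10())


def multiply_directly(x, y, digits=14):
    """Compute x*y with a single rounding to `digits`."""
    with localcontext() as ctx:
        ctx.prec = digits
        return x * y


if __name__ == "__main__":
    x = Decimal("3.1415926535897")
    y = Decimal("2.7182818284590")
    print(multiply_via_logs(x, y), multiply_directly(x, y))
```

The logarithm route rounds three times (two logarithms and one power) where direct multiplication rounds once, so the last digit or two may differ.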
I've seen that method used for integer multiplication using lookup tables.
It does lose accuracy, but it is suitable in some applications.
I have a few possible multiplication algorithms, by the way. Depending on
how much scrap ram is available I think I could get it twice as fast as
what TI does. The best is actually schoolbook multiplication:
You'll need a dynamically generated LUT of the values 1*operand2 through
9*operand2. Each entry is just the mantissa with an extra "00" in front.
That's 72 bytes.
Following this is 8 bytes for the mantissa of operand1, prefixed with 00.
Next is 14 bytes for the output mantissa (all zeroed out).
This is 94 bytes total.
Algorithm:
Start output_index at 'output+14'.
Pass 1: on each of the 7 bytes of operand1, decrement output_index and
mask for the lower nibble. If zero, skip, else binary shift left three
times to get the LUT index. Add that entry at output_index.
Next: Start HL at operand1+8, A=0, then do 'dec hl \ rld' 64 times.
Next: Start output_index at 'output+14'.
Pass 2: Same as Pass 1.
That should multiply the two mantissas. It costs 94 bytes, and the operation
count is: 23 8-byte BCD adds, shift-left-by-4 a 64-byte buffer (using the rld
instruction) and some overhead.
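A digit-level Python model of the LUT-based schoolbook idea might look like the sketch below. It is a simplification rather than the byte layout described above: it walks operand1 one digit at a time instead of doing two nibble passes with an `rld` shift in between, and it uses a single guard digit per LUT entry where the description uses a full "00" byte. All function names are hypothetical.

```python
def to_digits(n, width):
    """Convert a non-negative integer to a fixed-width decimal digit list."""
    return [int(c) for c in str(n).zfill(width)]


def from_digits(digits):
    """Convert a decimal digit list back to an integer."""
    return int("".join(str(d) for d in digits))


def add_digits(acc, addend, offset):
    """Add `addend` into `acc` in place, shifted left by `offset` digits."""
    carry = 0
    pos = len(acc) - 1 - offset
    for d in reversed(addend):
        total = acc[pos] + d + carry
        acc[pos] = total % 10
        carry = total // 10
        pos -= 1
    while carry:  # ripple any remaining carry into higher digits
        total = acc[pos] + carry
        acc[pos] = total % 10
        carry = total // 10
        pos -= 1


def build_lut(op2):
    """lut[k] holds k * op2 with one leading guard digit, built by repeated addition."""
    lut = [[0] * (len(op2) + 1)]
    for _ in range(9):
        entry = list(lut[-1])
        add_digits(entry, op2, 0)
        lut.append(entry)
    return lut


def multiply_mantissas(op1, op2):
    """Multiply two digit lists; the product has len(op1) + len(op2) digits."""
    lut = build_lut(op2)
    product = [0] * (len(op1) + len(op2))
    for offset, digit in enumerate(reversed(op1)):
        if digit:  # zero digits contribute nothing, so skip them
            add_digits(product, lut[digit], offset)
    return product
```

The appeal of the table is that each digit of operand1 costs one multi-digit addition instead of up to nine.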
|
Do you need 94 bytes of scrap RAM or 94 bytes of statically allocated LUTs? Both are easily doable, fwiw. |
As well, I have some algorithms that may be suitable for exp(x) to high
precision. It took me about a week of obsessed math to come up with this
and a notebook or two later. Not sure how well suited it is for BCD math,
though:
Input: x
y=x^2 ; squaring can be faster
a=x/2*(1+5y/156(1+3y/550(1+y/1512)))/(1+3y/26(1+5y/396(1+y/450)))
return (1+a)/(1-a)
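For anyone who wants to try the formula before porting it, here is a direct Python transcription. It assumes the juxtaposed factors such as `5y/156(1+...)` mean multiplication; read that way, `a` approximates tanh(x/2), and exp(x) = (1 + tanh(x/2)) / (1 - tanh(x/2)) is an exact identity. It uses binary floating point rather than BCD, so it says nothing about BCD rounding behaviour, and `exp_approx` is a name chosen for this sketch.

```python
import math


def exp_approx(x):
    """Rational approximation of exp(x), transcribed from the formula above."""
    y = x * x
    numerator = 1 + 5 * y / 156 * (1 + 3 * y / 550 * (1 + y / 1512))
    denominator = 1 + 3 * y / 26 * (1 + 5 * y / 396 * (1 + y / 450))
    a = x / 2 * numerator / denominator  # approximates tanh(x / 2)
    return (1 + a) / (1 - a)


if __name__ == "__main__":
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(x, exp_approx(x), math.exp(x))
```

The approximation degrades as |x| grows, so a real implementation would presumably reduce the argument to a small range first.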
|
The LUTs are dynamically generated on the fly, and the algorithm can be
designed to work with an allocated buffer, but it would be a fair bit
faster if the 94 bytes were in a static location.
|
Resolves KnightOS/KnightOS#216
It's not finished yet, but I think it's time this started to get reviewed. I stuck to relatively basic operations on floating points and left the rest to userspace. As far as I can tell, besides some non-critical issues with fptostr, the only major issue currently is adding negative numbers. That being said, my testing has not been exhaustive.
Here is what's done:
Here is what's left to do, ranked by how important I think it is:
Hopefully it's up to par in terms of style and efficiency. I'm not super experienced with Z80, so some portions can get a bit ugly (mostly fptostr and fpAdd).
Feel free to let me know about any problems you find, but my availability is going to decrease over the next few weeks. I will no longer be able to contribute in about a month (but should still be able to answer questions about my code). Ideally, someone else will take over to flesh out the rest.