Hardware math division: 5211/193 > 27 #786
Comments
There is a python vunit testbench in tests/hardware_divider.vhdl which can be used to debug the math unit. |
Ok, so I'm an absolute beginner at reading VHDL. But it seems the divider in question is this one: https://github.com/MEGA65/mega65-core/blob/development/src/vhdl/fast_divide.vhdl

Is there any reference where I can find the algorithm and compare it to my understanding of it, after staring at it for a bit? But I think I get (mostly) how it is supposed to work. It is an iterative approximation of the desired result, with as invariant that the value This is clearly true in the "normalise" stage: both values get shifted left the same amount. Nonzero bits from In the "step" phase, they are both multiplied with the same value every time. (I didn't verify how this particular factor The stopping condition of the Now the difference between dividing by either of those two numbers is likely to be fairly small, especially for larger values in

The number of step iterations may be too low, too. I did an imperfect calculation for our dd=normalized(193) and came to about 8 iterations until some kind of end condition arose (and it wasn't all FFs), but I don't completely trust that Python did what I wanted.

I searched around, and it seems this division is a form of Goldschmidt division ( https://en.wikipedia.org/wiki/Division_algorithm#Goldschmidt_division ) with fixed-point numbers (instead of floating point?). The text under the next header suggests that 5 iterations will give you 2^5 = 32 bits of precision (under the right circumstances). That sounds like enough, but I think we want at least 64 bits here (for 32 bits of quotient plus 32 bits of fraction), which is one iteration extra. 68 bits would require another iteration. |
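To make the Goldschmidt idea from the comment above concrete, here is a minimal Python sketch of the textbook iteration in fixed point. It is not a transcription of fast_divide.vhdl: the function name, the `frac_bits` and `iterations` parameters, the normalisation range [0.5, 1) and the truncating multiplies are all assumptions made for illustration.

```python
def goldschmidt_divide(n, d, frac_bits=64, iterations=6):
    """Approximate n/d with a textbook Goldschmidt iteration (sketch only).

    Returns the quotient as an integer scaled by 2**frac_bits.
    """
    if n < 0 or d <= 0:
        raise ValueError("this sketch handles n >= 0 and d > 0 only")
    one = 1 << frac_bits
    # Normalise: scale both operands by the same power of two so that
    # dd lands in [0.5, 1). The ratio nn/dd is unchanged (the invariant).
    shift = d.bit_length()
    nn = (n << frac_bits) >> shift
    dd = (d << frac_bits) >> shift
    for _ in range(iterations):
        f = 2 * one - dd              # correction factor f = 2 - dd
        nn = (nn * f) >> frac_bits    # truncating multiply (assumed rounding mode)
        dd = (dd * f) >> frac_bits    # dd converges quadratically towards 1.0
    return nn


q = goldschmidt_divide(5211, 193)
error_ulps = abs(q - (27 << 64))
print(f"5211/193 ~= {q / 2**64}, error = {error_ulps} units of 2**-64")
```

Because every multiply truncates, the final value typically lands slightly below the exact quotient, which is why the number of fractional bits carried through the intermediate steps matters as much as the iteration count.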
This fixes issue MEGA65#786 The problem was lack of precision in the intermediate calculations. This fix increases the used number of DSPs from 12 to 32.
(let me ramble a bit, I'll probably get around to trying out the latest ideas in VHDL)

I've been playing with simulation of my ideas a bit. What was the main idea? That the end condition So what did I do? I added an extra msb (on the left) of dd, so that it can represent 1.0 exactly. Later on I realised I could have done that by just interpreting the binary point in a different place; maybe I'll make a code variant of that if it's useful.

The very nice result of this is that if you now divide by 1, or any other power of 2, the first step to "normalize" the values already does the actual division! No more steps are needed. And dd is nicely 10000...00 so the end condition triggers right away.

Even with this change, the original case of 5211/193 is just as imprecise, so @MJoergen's fix to add precision bits is the right thing to do.

I'm not sure what the easiest way is to show my changes; creating a fork and a branch doesn't let you see a diff easily, I think. So I'll just paste a diff below and a bit of the debugging output to show a nice case like the one I just mentioned. I also increased the number of iterations to see if that helps for some cases. That's more for diagnostic purposes than anything else (although I suspect that one extra iteration might be needed for some unlucky cases).
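The power-of-two observation can be sketched in Python as follows. This is an illustration under assumptions, not the pasted diff: the function name, the `frac_bits` and `max_steps` parameters, and the exact end condition (`dd` equal to 1.0) are hypothetical. The only change from a textbook Goldschmidt loop is that the normalised divisor may be exactly 1.0, so it lies in (0.5, 1.0] rather than [0.5, 1).

```python
def divide_allowing_one(n, d, frac_bits=32, max_steps=8):
    """Goldschmidt-style sketch where the normalised divisor may equal 1.0.

    Returns (quotient scaled by 2**frac_bits, number of steps used).
    """
    if n < 0 or d <= 0:
        raise ValueError("this sketch handles n >= 0 and d > 0 only")
    one = 1 << frac_bits
    # Smallest shift with d <= 2**shift, so a power of two maps to exactly 1.0.
    shift = (d - 1).bit_length()
    nn = (n << frac_bits) >> shift
    dd = (d << frac_bits) >> shift
    steps = 0
    # Hypothetical end condition: stop as soon as dd is exactly 1.0.
    while dd != one and steps < max_steps:
        f = 2 * one - dd
        nn = (nn * f) >> frac_bits
        dd = (dd * f) >> frac_bits
        steps += 1
    return nn, steps


for divisor in (1, 8, 193):
    q, steps = divide_allowing_one(5211, divisor)
    print(f"5211/{divisor}: {q / 2**32} after {steps} step(s)")
```

For a power-of-two divisor the loop body never runs and the result is the exact bit shift. For other divisors in this sketch the truncating multiply keeps `dd` just below 1.0, so the loop runs until `max_steps`.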
By contrast, the original code looks like this:
and it needs all the steps with multiplications and stuff to reach the approximation of the answer:
which needs to be rounded up to be correct. All that work and it doesn't even get a simple bit-shift exact.
|
Here is a diff for the case where I do not change the size of dd, so this should be an easier guide to applying the same idea to @MJoergen's version with increased lsbs. f can probably be made 1 bit smaller but I didn't get that quite right yet.
|
@Rhialto : I've modified the unit test, to make it more extensive: https://github.com/MJoergen/mega65-core/blob/development/tests/hardware_divider.vhdl It now automatically calculates the expected result (including rounding). It will count the number of cases where the result differs by only 1 in the LSB. Any greater difference is an immediate failure. I suggest you try this, and please do extend the test with additional values too. |
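The checking scheme described in the comment above could be sketched in Python like this. The function names are hypothetical, and round-to-nearest on the 32.32 fixed-point quotient is an assumption; the actual testbench in tests/hardware_divider.vhdl may round differently.

```python
def expected_q32_32(n, d):
    """Return n/d rounded to nearest as (whole, frac) in 32.32 fixed point."""
    q = ((n << 32) + d // 2) // d     # add half the divisor, then floor-divide
    return q >> 32, q & 0xFFFFFFFF


def classify(whole, frac, n, d):
    """Compare a divider result with the expected value.

    Returns "exact", "off_by_one" (differs by 1 in the LSB) or "fail".
    """
    exp_whole, exp_frac = expected_q32_32(n, d)
    actual = (whole << 32) | frac
    expected = (exp_whole << 32) | exp_frac
    diff = abs(actual - expected)
    if diff == 0:
        return "exact"
    return "off_by_one" if diff == 1 else "fail"


print(expected_q32_32(5211, 193))        # expected result for the reported case
print(classify(27, 3, 5211, 193))        # the result reported in this issue
```

Comparing the combined 64-bit value rather than the two halves separately means a result such as 26 + 0xFFFFFFFF/2^32 still counts as one LSB away from 27.0.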
Looks nice. I added test cases
The compiler didn't accept the commented-out values:
and
The divider should work for these, though. I applied my second version to yours, and I found that while divisions by powers of 2 are faster, the precision is reduced. That's not so strange, since I essentially trade an lsb for an msb. My version increases the reported round-off errors from 8 to 13 or so. But while my version somewhat improves the divisions by powers of 2, it leaves another weak spot of the algorithm untouched: numbers divided by themselves, where the result should be 1.0 exactly. I don't know if there is a generic way to improve this case or if it is worth special-casing it. |
@lydon42 Correct. |
PR#812 merged |
Test Environment (required)
You can use MEGA65INFO to retrieve this.
Describe the bug
The hardware math registers return an inexact result for 5211/193 (= 27). This issue is to track down whether this is expected, or whether there is a flaw in the hardware division.
This was noticed in BASIC, which might be exacerbating the imprecision with its conversion from fixed point to floating point, or might be using the hardware math registers incorrectly in some other way. MEGA65/mega65-rom-public#101
To Reproduce
This breaks to the monitor. Type M1800 to see the result. The result is DIVOUT whole (D76C-F) of 1B 00 00 00 = +27 and DIVOUT frac (D768-B) of 03 00 00 00 = +3 != 0.
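For reference, the register bytes quoted above can be decoded with a few lines of Python. This sketch assumes the eight bytes are taken in address order, D768 through D76F, with the fraction in the low four bytes and the whole part in the high four, each little-endian as the dump shows. The function name is hypothetical.

```python
def decode_divout(raw):
    """Split 8 DIVOUT bytes (D768..D76F order) into (whole, frac)."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    frac = int.from_bytes(raw[0:4], "little")    # D768-D76B
    whole = int.from_bytes(raw[4:8], "little")   # D76C-D76F
    return whole, frac


raw = bytes.fromhex("03 00 00 00 1B 00 00 00")   # bytes quoted in this report
whole, frac = decode_divout(raw)
print(whole, frac, whole + frac / 2**32)
```

A fraction of 3 corresponds to an error of 3/2^32 in the quotient, which is small but enough to make the result differ from an exact 27.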
Expected behavior
DIVOUT frac of 0 is desired.