Skip to content
This repository has been archived by the owner on Apr 23, 2021. It is now read-only.

Problem: Sqrt() for Q48.15 format implementation #6

Closed
jjcat opened this issue Jun 18, 2015 · 1 comment
Closed

Problem: Sqrt() for Q48.15 format implementation #6

jjcat opened this issue Jun 18, 2015 · 1 comment

Comments

@jjcat
Copy link
Contributor

jjcat commented Jun 18, 2015

Good Repo!
I have met a problem to the sqrt for Q48.15 format fixed point. I try to modify your sqrt code, but can not get the correct answer. Is your sqrt only available on Q31.32 format? Code below is my version, I use NUM_BITS /4 - 1 to represent number 15. I would also glad if you can list the detail comments of the fixed point sqrt implementation.

public static Fix64 Sqrt(Fix64 x) {
    var xl = x.m_rawValue;
    if (xl < 0) {
        // We cannot represent infinities like Single and Double, and Sqrt is
        // mathematically undefined for x < 0. So we just throw an exception.
        throw new ArgumentException("Negative value passed to Sqrt", "x");
    }

    var num = (ulong)xl;
    var result = 0UL;

    // second-to-top bit
    var bit = 1UL << (NUM_BITS - 2);

    while (bit > num) {
        bit >>= 2;
    }

    // The main part is executed twice, in order to avoid
    // using 128 bit values in computations.
    for (var i = 0; i < 2; ++i) {
        // First we get the top 48 bits of the answer.
        while (bit != 0) {
            if (num >= result + bit) {
                num -= result + bit;
                result = (result >> 1) + bit;
            }
            else {
                result = result >> 1;
            }
            bit >>= 2;
        }

        if (i == 0) {
            // Then process it again to get the lowest 16 bits.
            if (num > (1UL << (NUM_BITS / 4 - 1)) - 1) {
                // The remainder 'num' is too large to be shifted left
                // by 32, so we have to add 1 to result manually and
                // adjust 'num' accordingly.
                // num = a - (result + 0.5)^2
                //       = num + result^2 - (result + 0.5)^2
                //       = num - result - 0.5
                num -= result;
                num = (num << (NUM_BITS / 4 - 1)) - 0x4000UL;
                result = (result << (NUM_BITS / 4 - 1)) + 0x4000UL;
            }
            else {
                num <<= (NUM_BITS / 4 - 1);
                result <<= (NUM_BITS / 4 - 1);
            }

            bit = 1UL << (NUM_BITS / 4 - 2 - 1);
        }
    }
    // Finally, if next bit would have been 1, round the result upwards.
    if (num > result) {
        ++result;
    }
    return new Fix64((long)result);
}
@jjcat
Copy link
Contributor Author

jjcat commented Jun 18, 2015

I know where my fault, using Q47.16 format can get the correct answer. Fractional length must be power of two.

@jjcat jjcat closed this as completed Jun 18, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant