
# **Module A**

## Section 9.1 (Base-N and Binary)

The decimal system is a way of representing numbers that you are familiar with from elementary school. In the decimal system, a number is represented by a list of digits from 0 to 9, where each digit represents the coefficient for a power of 10.

**EXAMPLE:** Show the decimal expansion for 154.3.

$154.3 = 1 \cdot 10^2 + 5 \cdot 10^1 + 4 \cdot 10^0 + 3 \cdot 10^{-1}$

In [10]:
(1 * 10**2) + (5 * 10**1) + (4 * 10**0) + (3 * 10**-1)

154.3

Since each digit is associated with a power of 10, the decimal system is also known as base 10, because it uses 10 digits (0 to 9).

$154.3 \text{ (base 10)} = 154.3$
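The digit-times-power idea is not specific to base 10. The sketch below uses a hypothetical helper, `digits_to_value`, that evaluates a list of whole-number digits in any base with Horner's rule (multiply the running total by the base, then add the next digit). This gives the same result as summing each digit times its power of the base, with fewer multiplications.

```python
def digits_to_value(digits, base):
    """Evaluate a list of integer digits (most significant first) in the given base.

    Hypothetical helper for illustration; handles whole numbers only.
    """
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        # Horner's rule: shift the previous digits up one place, then add d
        value = value * base + d
    return value


print(digits_to_value([1, 5, 4], 10))  # 154
print(digits_to_value([2, 5, 5], 8))   # 173 = 2*64 + 5*8 + 5
```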

A very important representation of numbers for computers is base 2, or binary. In binary, the only available digits are 0 and 1, and each digit is the coefficient of a power of 2. Each digit in a binary number is called a **bit**. Note that binary numbers are still numbers, so addition and multiplication are defined on them exactly as you learned in grade school.

**EXAMPLE:** Convert the number 37 (base 10) into binary.

$37 \text{ (base 10)} = 32 + 4 + 1 = 1 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 100101 \text{ (base 2)}$

In [11]:
(1 * 2**5) + (0  * 2**4) + (0 * 2**3) + (1 * 2**2) + (0 * 2**1) + (1 * 2**0)

37
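Going the other way, a whole number can be converted to base 2 by repeated division: each remainder is the next bit, starting from the least significant one. A minimal sketch follows. The helper name `to_base` is hypothetical. The result is checked against Python's built-in `format(n, "b")` and `int(s, 2)`.

```python
def to_base(n, base=2):
    """Return the digits of a non-negative integer n in the given base (2 to 10) as a string.

    Hypothetical helper for illustration.
    """
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))  # remainders come out least significant first
    return "".join(reversed(digits))


print(to_base(37))        # 100101
print(format(37, "b"))    # 100101, built-in equivalent
print(int("100101", 2))   # 37, built-in conversion back to an integer
```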

## Section 9.2 (Floating Point Numbers)

The number of bits is usually fixed for any given computer. With a fixed number of bits, a plain binary representation (each bit the coefficient of a fixed power of 2) gives too little range and precision for many engineering calculations. To achieve the range of values needed with the same number of bits, we use **floating** point numbers, or **floats** for short. Instead of using each bit as the coefficient of a power of 2, floats allocate bits to three different parts:

1.   the **sign indicator**, $s$
2.   the **characteristic** or **exponent**, $e$
3.   the **fraction**, $f$

The sign indicator identifies whether the number is positive or negative, the characteristic or exponent gives the power of 2, and the fraction gives the significant digits that multiply that power of 2. Almost all platforms map Python floats to **IEEE754** double precision, which uses 64 bits in total: 1 bit for the sign indicator, 11 bits for the exponent, and 52 bits for the fraction.

With 11 bits allocated to the exponent, the exponent field can take $2^{11} = 2048$ values. Since we also want to represent very small numbers (i.e., numbers between 0 and 1 (base 10)), some of these values must stand for negative exponents. To accomplish this, 1023 is subtracted from the stored exponent; this offset is commonly referred to as the **bias**. The fraction is then used as the significand $1 + f$, a number between 1 and 2. In binary, this means that the leading digit is always 1, so storing it would waste a bit. To save space, the leading 1 is dropped and only the fraction bits $f$ are stored.
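To see the bias at work, the sketch below reinterprets a Python float's 8 bytes as a 64-bit integer with the standard-library `struct` module and splits out the three fields. The helper name `float_fields` is hypothetical, and the sketch assumes the platform uses IEEE754 double precision, as described above.

```python
import struct


def float_fields(x):
    """Split a Python float into its sign bit, stored (biased) exponent, and fraction bits.

    Hypothetical helper; assumes floats are IEEE754 doubles.
    """
    # '>d' packs x as a big-endian 8-byte double; '>Q' reads the same bytes as an unsigned 64-bit int
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                # 1 sign bit
    e = (bits >> 52) & 0x7FF      # 11 exponent bits, still including the bias
    f = bits & ((1 << 52) - 1)    # 52 fraction bits (the leading 1 is not stored)
    return s, e, f


for x in [2.0, 1.0, 0.5]:
    s, e, f = float_fields(x)
    # the actual power of 2 is the stored exponent minus the bias 1023
    print(x, s, e, e - 1023, f)
```

Each of these values is an exact power of 2, so the fraction bits are all 0. The stored exponents 1024, 1023, and 1022 correspond to powers 1, 0, and -1 once the bias is subtracted.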

In Python, we can get information about floats from the *sys* module, as shown below:

In [12]:
import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
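Several of these fields follow directly from the 1/11/52 bit layout. The checks below use only `sys.float_info` and ordinary arithmetic. `mant_dig` counts the 52 stored fraction bits plus the implicit leading 1. `epsilon` is the gap between 1.0 and the next float, $2^{-52}$. `min` is the smallest normal number, $2^{1-1023}$. `max` has the largest non-reserved exponent field (2046) and all fraction bits set.

```python
import sys

info = sys.float_info
print(info.mant_dig == 52 + 1)                     # True: 52 stored bits + implicit leading 1
print(info.epsilon == 2.0 ** -52)                  # True: gap between 1.0 and the next float
print(info.min == 2.0 ** (1 - 1023))               # True: smallest normal exponent field is 1
print(info.max == (2 - 2.0 ** -52) * 2.0 ** 1023)  # True: exponent field 2046, all fraction bits 1
```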

A float can then be represented as:

$n = (-1)^{s}2^{e-1023}(1+f)$
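The formula translates directly into code. The minimal sketch below takes the three fields as bit strings. The helper name `ieee754_to_float` is hypothetical, and the sketch handles only the normal case, where the exponent field is neither all zeros nor all ones.

```python
def ieee754_to_float(s_bits, e_bits, f_bits):
    """Evaluate n = (-1)^s * 2^(e - 1023) * (1 + f) for a normal double.

    Hypothetical helper; s_bits, e_bits, f_bits are strings of '0'/'1'
    with lengths 1, 11, and 52.
    """
    assert len(s_bits) == 1 and len(e_bits) == 11 and len(f_bits) == 52
    s = int(s_bits, 2)
    e = int(e_bits, 2)
    assert 0 < e < 2047, "reserved exponent values are not handled here"
    # fraction bit k (counting from 1) contributes 1/2**k
    f = sum(int(bit) / 2**k for k, bit in enumerate(f_bits, start=1))
    return (-1) ** s * 2.0 ** (e - 1023) * (1 + f)


print(ieee754_to_float("0", "01111111111", "0" * 52))  # 1.0
```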

**EXAMPLE:** What is the number 1 10000000010 1000000000000000000000000000000000000000000000000000 (IEEE754) in base10?

Find the float (n) using our formula for IEEE754

$n = (-1)^{s}2^{e-1023}(1+f)$

Determine $s$, $e$, and $f$

$s = 1$

$e = 10000000010 = 1 \cdot 2^{10} + 0 \cdot 2^9 + ... + 1 \cdot 2^1 + 0 \cdot 2^0= 1026$

$f = 1000000000000000000000000000000000000000000000000000 = 1 \cdot \frac{1}{2^1} + 0 \cdot \frac{1}{2^2} + ... = 0.5$

Plug in $s$, $e$, and $f$

$n = (-1)^{1}2^{1026-1023}(1+0.5) = -12.0$




In [16]:
((-1)**1) * (2**(1026 - 1023)) * (1 + 0.5)

-12.0

**EXAMPLE:** What is 15.0 (base10) in IEEE754? What is the largest number smaller than 15.0? What is the smallest number larger than 15.0?

To determine n, use the formula for IEEE754

$n = (-1)^{s}2^{e-1023}(1+f)$

Since the number is positive, $s = 0$. To determine $e$, find the largest power of two that is less than or equal to 15.

$2^3 = 8 \le 15 < 16 = 2^4$, so $2^{e-1023} = 2^3 = 2^{1026 - 1023}$

$e = 1026 = 10000000010$

Lastly solve for $f$

$15 = (-1)^{0}2^{1026-1023}(1+f)$

$15 = 8(1+f)$

$f = (15 / 8) - 1 = 0.875 = 1 \cdot \frac{1}{2^1} +1 \cdot \frac{1}{2^2} + 1 \cdot \frac{1}{2^3} + 0 \cdot \frac{1}{2^4} + ... = 1110000000000000000000000000000000000000000000000000 $

In [18]:
(1 * (1/2**1)) + (1 * (1/2**2)) + (1 * (1/2**3))

0.875
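The fraction bits can also be produced mechanically by repeated doubling: multiply the fraction by 2, and the integer part (0 or 1) is the next bit. The sketch below uses a hypothetical helper, `fraction_bits`, and reproduces the 52-bit field for $f = 0.875$.

```python
def fraction_bits(f, n_bits=52):
    """Return the first n_bits binary digits of a fraction 0 <= f < 1 by repeated doubling.

    Hypothetical helper for illustration.
    """
    bits = []
    for _ in range(n_bits):
        f *= 2
        bit = int(f)       # the integer part is the next binary digit
        bits.append(str(bit))
        f -= bit           # keep only the fractional part
    return "".join(bits)


print(fraction_bits(0.875, 8))  # 11100000
```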

All together

15 (base10) = 0 10000000010 1110000000000000000000000000000000000000000000000000 (IEEE754)

In [19]:
((-1)**0) * (2**(1026 - 1023)) * (1 + 0.875)

15.0

Adding 1 to or subtracting 1 from the last bit of $f$ moves to the next or previous representable float, a step of $2^{-52} \cdot 2^{1026-1023} = 2^{-49}$. Thus the largest number smaller than 15.0 and the smallest number larger than 15.0 are

0 10000000010 1101111111111111111111111111111111111111111111111111 (IEEE754) = 14.9999999999999982236431605997 (base10)

and

0 10000000010 1110000000000000000000000000000000000000000000000001 (IEEE754) = 15.0000000000000017763568394003 (base10)
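Python can confirm these neighbours without manual bit manipulation. A short sketch, assuming Python 3.9 or later for `math.nextafter`; `decimal.Decimal` is used only to print the exact value of each stored double.

```python
import math
from decimal import Decimal

below = math.nextafter(15.0, 0.0)       # largest float smaller than 15.0
above = math.nextafter(15.0, math.inf)  # smallest float larger than 15.0

print(Decimal(below))  # exact decimal value of the stored double
print(Decimal(above))
print(above - 15.0 == 15.0 - below == 2.0 ** -49)  # True: one step of 2**-52 * 2**3
```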

We call the distance from one number to the next the **gap**. Because the fraction is multiplied by $2^{e-1023}$, the gap grows as the number represented grows. The gap at a given number can be computed using the function *spacing* in *numpy*.
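The dependence of the gap on the exponent can be made explicit. For a normal double the gap is $2^{e-1023} \cdot 2^{-52}$, the value of the last fraction bit. The sketch below computes this with a hypothetical helper, `gap`, built on the standard-library `math.frexp`. It compares the result with `math.ulp` (Python 3.9+), which returns the same quantity for positive normal numbers.

```python
import math


def gap(x):
    """Distance from a positive normal float x to the next larger float.

    Hypothetical helper: math.frexp writes x = m * 2**p with 0.5 <= m < 1,
    so the IEEE754 power 2**(e - 1023) equals 2**(p - 1), and one step in
    the last of the 52 fraction bits is 2**(p - 1 - 52).
    """
    _, p = math.frexp(x)
    return 2.0 ** (p - 53)


for x in [1.0, 15.0, 1e9, 1e15]:
    print(x, gap(x), gap(x) == math.ulp(x))
```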

In [20]:
import numpy as np

**EXAMPLE:** Use the *spacing* function to determine the gap at 1e9. Verify that adding a number to 1e9 that is less than half the gap at 1e9 results in the same number.

In [21]:
np.spacing(1e9)

1.1920928955078125e-07

In [22]:
1e9 == (1e9 + np.spacing(1e9)/3)

True

There are special cases for the value of a floating point number when $e = 0$ (i.e., e = 00000000000 (base2)) and when $e = 2047$ (i.e., e = 11111111111 (base2)); these exponent values are reserved. When the exponent is 0, the implicit leading 1 takes the value 0 instead, and the number is $n = (-1)^{s}2^{-1022}(0+f)$. The result is a **subnormal** number, which fills in the range between 0 and the smallest normal number. When the exponent is 2047 and $f$ is zero, the result is positive or negative infinity; when $f$ is nonzero, the result is “Not a Number” (NaN), which means that the number is undefined.
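Each of these special cases can be produced and inspected from Python. The sketch below uses only the standard library: `struct` reads the raw exponent field, and `math` runs the checks. The helper name `exponent_field` is hypothetical.

```python
import math
import struct


def exponent_field(x):
    """Return the 11-bit stored exponent of a float (hypothetical helper, assumes IEEE754 doubles)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return (bits >> 52) & 0x7FF


tiny = 5e-324                      # smallest positive subnormal
print(exponent_field(tiny), tiny == math.ldexp(1.0, -1074))  # 0 True

inf = float("inf")                 # exponent all ones, fraction zero
nan = float("nan")                 # exponent all ones, fraction nonzero
print(exponent_field(inf), exponent_field(nan))              # 2047 2047
print(math.isinf(1e308 * 10), nan == nan)                    # True False: overflow gives inf; NaN never equals itself
```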

## Section 9.3 (Round-off Errors)

The most common form of round-off error is the representation error in floating point numbers. A simple example is representing π. We know that π is irrational: its decimal expansion never ends or repeats, but in practice we only use a finite number of digits. For example, if you use 3.14159265, there is an error between this approximation and the true value. Another example is 1/3, whose true value is 0.333333333…; no matter how many decimal digits we keep, there is a round-off error as well.
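The representation error in 1/3 becomes visible if we ask for the exact value that is actually stored. In the sketch below, `fractions.Fraction` converts the float exactly into a ratio of integers. That ratio can then be compared with the true value $1/3$.

```python
from fractions import Fraction

stored = Fraction(1 / 3)            # exact rational value of the stored double
exact = Fraction(1, 3)

print(stored == exact)              # False: 1/3 has no finite binary expansion
print(stored.denominator == 2**54)  # True: the stored value is an integer over a power of 2
print(float(stored - exact))        # size of the representation error
```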

Moreover, when we round a number multiple times, the errors can accumulate. For instance, if 4.845 is rounded to two decimal places, it becomes 4.85. If we then round it again to one decimal place, it becomes 4.9, and the total error is 0.055. But if we round only once, directly to one decimal place, we get 4.8, and the error is only 0.045.

From the above example, the error between 4.845 and 4.9 should be 0.055. But if you calculate it in Python, you will see that 4.9 - 4.845 is not equal to 0.055.

4.9 - 4.845 == 0.055

False
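The double-rounding example can be reproduced with the standard-library `decimal` module, which stores 4.845 exactly in base 10. The sketch assumes round-half-up, the schoolbook rule used in the text. Python's built-in `round` uses round-half-to-even instead and works on the binary approximation of 4.845, so it may not reproduce these digits.

```python
from decimal import Decimal, ROUND_HALF_UP

x = Decimal("4.845")  # exact decimal value, no binary representation error

# Round twice: first to two decimal places, then to one
twice = x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP).quantize(
    Decimal("0.1"), rounding=ROUND_HALF_UP
)
# Round once, directly to one decimal place
once = x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(twice, twice - x)  # 4.9 0.055
print(once, x - once)    # 4.8 0.045
```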

Why does this happen? Evaluating 4.9 - 4.845 on its own shows that we actually get 0.055000000000000604. This is because neither 4.9 nor 4.845 can be represented exactly as a floating point number. Each is stored as a close binary approximation, and when these approximations are used in arithmetic, the small errors carry into the result.

In [38]:
4.9 - 4.845

0.055000000000000604

In [39]:
4.8 - 4.845

-0.04499999999999993
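The approximations themselves can be inspected. Passing a float (not a string) to `decimal.Decimal` shows the exact binary value that Python stored. A short sketch:

```python
from decimal import Decimal

# Passing a float (not a string) shows the exact binary value Python stored
print(Decimal(4.9))     # a little more than 4.9
print(Decimal(4.845))   # a little less than 4.845

# Both errors push the difference upward, past 0.055
print(Decimal(4.9) - Decimal(4.845) > Decimal("0.055"))  # True
```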

Another example below shows that 0.1 + 0.2 + 0.3 is not equal to 0.6, for the same reason.

In [40]:
0.1 + 0.2 + 0.3 == 0.6

False

Though the numbers cannot be made closer to their intended exact values, the round function can be useful for post-rounding so that results with inexact values become comparable to one another:

In [27]:
round(0.1 + 0.2 + 0.3, 5)  == round(0.6, 5)

True
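Another option is the standard-library `math.isclose`. It compares two values within a relative tolerance (default `rel_tol=1e-09`) instead of rounding both sides. This is a suggested alternative, not the approach used above.

```python
import math

total = 0.1 + 0.2 + 0.3
print(total == 0.6)                           # False
print(math.isclose(total, 0.6))               # True: within the default relative tolerance
print(math.isclose(total, 0.6, rel_tol=0.0))  # False: zero tolerance is an exact comparison
```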

When we perform a sequence of calculations on an input that already carries round-off error from inexact representation, the errors can be magnified or can accumulate. For example, adding 1/3 to 1 and then subtracting 1/3 gives back 1. But if we add 1/3 many times and then subtract 1/3 the same number of times, do we still get exactly 1? No. As the examples below show, the more repetitions we do, the more error accumulates.

In [28]:
# If we only do once
1 + 1/3 - 1/3

1.0

In [29]:
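def add_and_subtract(iterations):
    """Add 1/3 to 1 `iterations` times, then subtract 1/3 the same number of times."""
    result = 1

    # Each addition rounds to the nearest float, so small errors creep in
    for _ in range(iterations):
        result += 1/3

    # Subtracting the same amount does not cancel those rounding errors exactly
    for _ in range(iterations):
        result -= 1/3
    return result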

In [30]:
# If we do this 100 times
add_and_subtract(100)

1.0000000000000002

In [31]:
# If we do this 1000 times
add_and_subtract(1000)

1.0000000000000064

In [32]:
# If we do this 10000 times
add_and_subtract(10000)

1.0000000000001166
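For contrast, the same loop can be run in exact rational arithmetic with `fractions.Fraction`. There 1/3 is stored exactly, so no round-off accumulates. This is only a sketch for comparison, and the function name `add_and_subtract_exact` is hypothetical. Exact rational arithmetic is considerably slower than float arithmetic, so it serves better as a checking tool than as a general replacement.

```python
from fractions import Fraction


def add_and_subtract_exact(iterations):
    """Same loop as add_and_subtract, but with exact rational arithmetic (hypothetical helper)."""
    third = Fraction(1, 3)
    result = Fraction(1)
    for _ in range(iterations):
        result += third
    for _ in range(iterations):
        result -= third
    return result


print(add_and_subtract_exact(10000))  # 1
```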