
**Base-N and Binary**

The decimal system represents numbers using the digits 0 through 9, where each digit is the coefficient of a power of 10 (base 10).

Here is an example of a decimal expansion:


In [None]:
152.4 == 1*(10**2) + 5*(10**1) + 2*(10**0) + 4*(10**-1)

True

There are also other bases, such as base 3.

For example, 122 in base 3 is equivalent to 17 in base 10:

In [None]:
1*(3**2) + 2*(3**1) + 2*(3**0) == 17

True
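
The same expansion works for any base. As a minimal sketch, the hypothetical helper `from_base` below (not part of the original notebook) evaluates a digit string in a given base using Horner's rule. It assumes bases from 2 to 10 only. Python's built-in `int(text, base)` performs the same conversion.

```python
def from_base(digits, base):
    """Convert a string of digits in the given base (2 to 10) to an int."""
    value = 0
    for ch in digits:
        d = int(ch)
        if d >= base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        # Horner's rule: shift the running value one place left, then add the digit.
        value = value * base + d
    return value

print(from_base("122", 3))   # 17
print(int("122", 3))         # 17, using the built-in conversion
```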

For computers, binary (base 2) is an important representation of numbers. In binary there are only two digits, 0 and 1, and each digit is the coefficient of a power of 2. Binary digits are also known as bits, and binary numbers behave just like numbers written in any other base.

Let's convert 10 (base 10) into binary:

In [None]:
10 == 1*(2**3) + 0*(2**2) + 1*(2**1) + 0*(2**0) == 8 + 0 + 2 + 0

True

So the binary representation of 10 is 1010.
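
One common way to find these digits is repeated division by 2, collecting the remainders from lowest bit to highest. The helper name `to_binary` is an illustrative assumption, and the sketch handles non-negative integers only. Python's built-in `bin` gives the same digits with a `0b` prefix.

```python
def to_binary(n):
    """Return the binary digits of a non-negative integer as a string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)   # remainder is the next bit, lowest first
        digits.append(str(remainder))
    return "".join(reversed(digits))

print(to_binary(10))  # 1010
print(bin(10))        # 0b1010
```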

Binary is just another way of writing the same numbers, so the usual operations such as addition and multiplication still apply.

Computers build these arithmetic operations out of logical AND, OR, and NOT operations on bits, which hardware can execute very quickly.
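
To make this concrete, here is a rough sketch of addition built from bitwise operations. The function names `xor` and `add` are illustrative assumptions, the sketch also relies on a left shift to move carries, and it only handles non-negative integers. Real hardware uses adder circuits rather than a loop like this.

```python
def xor(a, b):
    """Exclusive OR built from AND, OR, and NOT only."""
    return (a | b) & ~(a & b)

def add(a, b):
    """Add two non-negative integers using bitwise operations and a shift."""
    while b != 0:
        carry = (a & b) << 1   # positions where both bits are 1 carry into the next place
        a = xor(a, b)          # sum of the bits, ignoring carries
        b = carry              # add the carries in the next pass
    return a

print(add(35, 15))  # 50
```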

Here is an example of binary addition and multiplication:

In [None]:
35 == 1*(2**5) + 0*(2**4) + 0*(2**3) + 0*(2**2) + 1*(2**1) + 1*(2**0)

True

In [None]:
15 == 1*(2**3) + 1*(2**2) + 1*(2**1) + 1*(2**0)

True

In [None]:
35*15

525

In [None]:
35+15

50

In [None]:
525 == 1*(2**9) + 0*(2**8) + 0*(2**7) + 0*(2**6) + 0*(2**5) + 0*(2**4) + 1*(2**3) + 1*(2**2) + 0*(2**1) + 1*(2**0)

True

In [None]:
50 == 1*(2**5) + 1*(2**4) + 0*(2**3) + 0*(2**2) + 1*(2**1) + 0*(2**0)

True
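
As a quick cross-check, Python accepts binary literals with a `0b` prefix and `bin` prints a result in binary, so the same sum and product can be carried out on the binary forms directly. The variable names `a` and `b` are just for illustration.

```python
a = 0b100011   # 35
b = 0b1111     # 15

print(bin(a + b))   # 0b110010
print(bin(a * b))   # 0b1000001101
```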

Computers store numbers in a fixed number of bits. A 32-bit computer, for example, processes binary numbers of at most 32 digits at a time.
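
The sketch below illustrates that limit under two assumptions that are not in the original text: the values are unsigned, and results that do not fit wrap around. Python integers grow as needed, so the 32-bit register is simulated with a mask named `MASK_32`, which is a made-up name.

```python
MASK_32 = 0xFFFFFFFF          # 32 ones in binary, i.e. 2**32 - 1

largest = MASK_32
print(largest)                # 4294967295
print(largest.bit_length())   # 32

# Python integers grow as needed, so we simulate a 32-bit register by masking.
wrapped = (largest + 1) & MASK_32
print(wrapped)                # 0
```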

**Floating Point Numbers**

Since the number of bits is fixed, a plain binary representation can give insufficient range and precision. Floating point numbers, or floats, achieve the range of values needed. Instead of using each bit as the coefficient of a power of 2, a float allocates its bits to a sign indicator, an exponent, and a fraction. A 64-bit float uses 1 bit for the sign, 11 bits for the exponent, and 52 bits for the fraction.

We can get information on floats with the sys package:

In [None]:
import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

In [None]:
print("n = (-1)ˢ * 2ᵉ⁻¹⁰²³ * (1 + f)")

n = (-1)ˢ * 2ᵉ⁻¹⁰²³ * (1 + f)


A normalized 64-bit float n can be represented as above, where s is the sign bit, e is the exponent, and f is the fraction.
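
To see the three fields for a concrete value, the sketch below unpacks the raw 64 bits of a float with the standard `struct` module. The helper name `decompose` is an assumption for illustration, and the reconstruction at the end applies only to normalized numbers.

```python
import struct

def decompose(x):
    """Split a 64-bit float into its sign s, exponent e, and fraction f."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]   # raw 64-bit pattern
    s = bits >> 63                                        # 1 sign bit
    e = (bits >> 52) & 0x7FF                              # 11 exponent bits
    f = (bits & ((1 << 52) - 1)) / 2**52                  # 52 fraction bits, scaled to [0, 1)
    return s, e, f

s, e, f = decompose(-6.5)
print(s, e, f)                                  # 1 1025 0.625

# Rebuild the value from the fields using the formula for normalized floats.
value = (-1)**s * 2.0**(e - 1023) * (1 + f)
print(value)                                    # -6.5
```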

The distance between a float and the next representable float is called the gap. The gap grows as the number grows, and it can be computed with the spacing function in numpy:

In [None]:
import numpy as np

np.spacing(1e9)

1.1920928955078125e-07

In [None]:
1e9 == (1e9 + np.spacing(1e9)/3)

True
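
The comparison is true because adding a third of the gap lands closer to 1e9 than to the next float, so the sum rounds back to 1e9. As a sketch using only the standard library, `math.ulp` (available in Python 3.9 and later) reports the same kind of gap for positive finite floats. Treat its agreement with `np.spacing` as an assumption outside that case.

```python
import math
import sys

x = 1e9
gap = math.ulp(x)             # distance from x to the next larger float

print(gap == 2.0**-23)        # True
print(x + gap / 3 == x)       # True: the sum rounds back to x
print(x + gap > x)            # True: a full gap reaches the next float

# The gap at 1.0 is the machine epsilon reported by sys.float_info.
print(math.ulp(1.0) == sys.float_info.epsilon)  # True
```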

There are some special cases for floats: the exponent values e = 0 and e = 2047 are reserved. When e = 0, the leading 1 in the term (1 + f) is replaced by 0, which gives a subnormal number.

The equation for a subnormal number is similar to the one for normalized floats above, but the exponent is fixed at -1022 instead of e - 1023: n = (-1)ˢ * 2⁻¹⁰²² * (0 + f).

When e = 2047 and f is nonzero, the result is NaN (not a number), an undefined value. When e = 2047 and f = 0, the result is positive infinity if s = 0 and negative infinity if s = 1.
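
These reserved patterns can be built by hand. The sketch below assembles a 64-bit pattern from the three fields and reinterprets it as a float. The helper name `from_fields` is hypothetical, and the fraction is passed as a raw 52-bit integer rather than as the scaled value f.

```python
import math
import struct

def from_fields(s, e, fraction_bits):
    """Assemble a 64-bit float from sign, 11-bit exponent, and 52-bit fraction fields."""
    bits = (s << 63) | (e << 52) | fraction_bits
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

print(from_fields(0, 2047, 0))         # inf
print(from_fields(1, 2047, 0))         # -inf
print(from_fields(0, 2047, 1 << 51))   # nan
print(from_fields(0, 0, 1))            # 5e-324, the smallest subnormal number
```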

We can verify the largest and smallest normalized numbers below:

In [None]:
largest = (2**(2046-1023))*((1 + sum(0.5**np.arange(1, 53))))
largest

1.7976931348623157e+308

In [None]:
sys.float_info.max

1.7976931348623157e+308

In [None]:
smallest = (2**(1-1023))*(1+0)
smallest

2.2250738585072014e-308

In [None]:
sys.float_info.min

2.2250738585072014e-308

Numbers larger than the largest representable floating point number result in overflow. Similarly, numbers smaller than the smallest subnormal number result in underflow.

*   Overflow = inf
*   Underflow = 0

In [None]:
sys.float_info.max + 2 == sys.float_info.max

True

In [None]:
sys.float_info.max + sys.float_info.max

inf

Adding 2 to the largest 64-bit float returns the same number, because the gap between floats at that magnitude is far larger than 2 and Python cannot store the result precisely. The operation behaves as if we had added 0. Adding the maximum number to itself, however, results in overflow.
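
A short sketch of why the 2 disappears, assuming Python 3.9 or later for `math.ulp`: the gap at the largest float is 2^971, so any increment much smaller than that is rounded away, while adding a full gap pushes the result past the largest float.

```python
import math
import sys

big = sys.float_info.max
gap = math.ulp(big)              # spacing between big and the float just below it

print(gap)                       # roughly 2e292
print(gap == 2.0**971)           # True
print(big + 2 == big)            # True: 2 is far smaller than the gap
print(big + gap)                 # inf: the result exceeds the largest float
```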

Now let's look at underflow:

In [None]:
2**(-1075)

0.0

In [None]:
2**(-1075) == 0

True

In [None]:
2**(-1074)

5e-324

2^-1075 underflows because it is smaller than 2^-1074, the smallest subnormal number. Using the rule for subnormal numbers with s = 0 and only the last fraction bit set, we get (-1)^0 * 2^(1-1023) * 2^(-52) = 2^(-1074).

Using 64 bits gives us 2^64 distinct bit patterns. If those numbers were spaced evenly, as in a plain binary representation, we could not have both a wide range and fine precision. IEEE 754 overcomes this limitation by giving high precision to small numbers and lower precision to large numbers.
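
The path down to zero can be traced by halving the smallest normalized float until nothing is left. This is only an illustrative loop, and the variable names `steps` and `last` are assumptions. It takes 52 halvings to reach 2^-1074, and the next one underflows.

```python
import sys

x = sys.float_info.min   # 2**-1022, the smallest normalized float
steps = 0
last = x

while x > 0:
    last = x             # remember the last nonzero value
    x /= 2               # each halving moves one bit further into the subnormal range
    steps += 1

print(steps)             # 53
print(last)              # 5e-324
```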

**Round-off Errors**

The difference between an approximation of a number and its true value is called a round-off error. Representation errors are the most common form of this. Examples of numbers that cannot be represented exactly include:

*   1/3
*   pi
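
The representation error can be measured exactly with the standard `fractions` and `decimal` modules, since both can hold the precise value a float stores. The variable names `stored` and `error` are illustrative only.

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(1/3)                 # exact value of the float nearest to 1/3
error = Fraction(1, 3) - stored        # exact representation error

print(stored)                          # a fraction whose denominator is a power of 2
print(float(error))                    # tiny but nonzero
print(Decimal(0.1))                    # the exact decimal expansion of the float 0.1
```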

Round-off errors also show up in floating-point arithmetic:

In [None]:
3.9 - 3.845 == 0.055

False

In [None]:
3.9 - 3.845

0.054999999999999716

In [None]:
3.8 - 3.845

-0.04500000000000037

In [None]:
0.1 + 0.2 + 0.3 == 0.6

False

In [None]:
round(0.1 + 0.2 + 0.3, 5) == round(0.6, 5)

True
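
Rounding both sides works, but it requires choosing a number of digits by hand. One alternative, offered here as a suggestion rather than as part of the original notebook, is `math.isclose`, which compares two floats within a relative tolerance.

```python
import math

total = 0.1 + 0.2 + 0.3

print(total == 0.6)               # False
print(math.isclose(total, 0.6))   # True
```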

In a sequence of calculations, the round-off error from each step can accumulate.

In [None]:
1 + 1/3 - 1/3

1.0

In [None]:
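def add_and_subtract(iterations):
    """Add 1/3 to 1 `iterations` times, then subtract it the same number of times."""
    result = 1

    # Each addition rounds the running total to the nearest float.
    for _ in range(iterations):
        result += 1/3

    # Each subtraction rounds too, and these errors need not cancel the earlier ones.
    for _ in range(iterations):
        result -= 1/3
    return result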

add_and_subtract(100)

1.0000000000000002

In [None]:
add_and_subtract(1000)

1.0000000000000064

In [None]:
add_and_subtract(10000)

1.0000000000001166
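
One way to limit this accumulation, shown here as a sketch and not as the notebook's own method, is to collect the terms and sum them with `math.fsum`, which tracks intermediate round-off and rounds only once at the end. The function name `add_and_subtract_fsum` is hypothetical.

```python
import math

def add_and_subtract_fsum(iterations):
    """Sum 1, then +1/3 and -1/3 `iterations` times each, with a single final rounding."""
    terms = [1.0] + [1/3] * iterations + [-1/3] * iterations
    return math.fsum(terms)

print(add_and_subtract_fsum(10000))  # 1.0
```

Because every stored 1/3 is cancelled by an identical stored -1/3, the exact sum of the terms is 1, and `math.fsum` returns it without the drift seen in the loop version.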