# Chapter 9: Representation of Numbers
by [Arief Rahman Hakim](https://github.com/ahman24)

In this chapter, we will learn how numbers are represented in a computer, how to convert between these representations and decimal numbers, the main advantages and disadvantages of each representation, and how round-off errors arise.

## 1. Base-N and Binary
### 1.A Base-N
The **decimal** system is a way of representing numbers that you are familiar with from elementary school. In the decimal system, a number is represented by a list of digits from 0 to 9, where **each digit represents the coefficient for a power of 10**.

EXAMPLE: Show the decimal expansion for 147.3.

$147.3 = 1⋅10^2 + 4⋅10^1 + 7⋅10^0 + 3⋅10^{− 1}$

Since each digit is associated with a power of 10, the **decimal system is also known as base10** because it is based on 10 digits (0 to 9). However, there is nothing special about base10 numbers except perhaps that you are more accustomed to using them. 

For example, in **base3** we have the digits 0, 1, and 2, and the number $121 (base 3) = 1⋅3^2 + 2⋅3^1 + 1⋅3^0 = 9 + 6 + 1 = 16 (base 10)$.
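The expansion above can be sketched in a few lines of Python. The helper name `base_n_to_decimal` is hypothetical (it is not part of any library), and this minimal version assumes whole numbers written with the digits 0 to 9, so it only covers bases up to 10.

```python
def base_n_to_decimal(digits: str, base: int) -> int:
    """Evaluate a string of digits as a whole number in the given base (2 to 10)."""
    value = 0
    for ch in digits:
        d = int(ch)
        if d >= base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        # Shifting the running value by one position multiplies it by the base.
        value = value * base + d
    return value


print(base_n_to_decimal("121", 3))   # 16
print(base_n_to_decimal("147", 10))  # 147
print(int("121", 3))                 # 16, using Python's built-in conversion
```

The built-in `int(string, base)` does the same job and is what you would normally use; the loop is only there to make each power of the base visible.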

### 1.B Binary
A very important representation of numbers for computers is **base2 or binary numbers**. In binary, the only **available digits are 0 and 1**, and each digit is the coefficient of a power of 2. Each digit in a binary number is called a **bit**.

For instance,  
$37 (base 10) = 32 + 4 + 1 = 1⋅2^5 + 0⋅2^4 + 0⋅2^3 + 1⋅2^2 + 0⋅2^1 + 1⋅2^0 = 100101 (base 2)$
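Going the other way, from decimal to binary, can be sketched by repeatedly dividing by 2 and collecting the remainders. The function name `to_binary` is an illustrative assumption; Python's built-in `bin` gives the same digits with a `0b` prefix.

```python
def to_binary(n: int) -> str:
    """Return the base-2 digits of a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # the remainder is the next lowest bit
        n //= 2
    return "".join(reversed(bits))


print(to_binary(37))        # 100101
print(bin(37))              # 0b100101
print(int("100101", 2))     # 37
```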

Unlike humans, who can abstract numbers to arbitrarily large values, **computers have a fixed number of bits** that they are capable of storing at one time. For example, a **32-bit computer can represent and process 32-digit binary numbers** and no more.

If all 32 bits are used to represent non-negative integers, then the computer can represent $2^{32} = 4,294,967,296$ numbers: the integers from 0 up to $\sum_{n=0}^{31} 2^{n} = 4,294,967,295$. This **is not very many numbers at all** and would be completely insufficient for useful arithmetic. For example, you could not compute the perfectly reasonable sum 0.5+1.25 using this representation because all the bits are dedicated to integers only.
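As a quick check of the count, the short sketch below sums the powers of 2 for all 32 bit positions. With every bit set, the largest value is $2^{32} - 1$; counting zero as well gives $2^{32}$ distinct values. The variable names are illustrative only.

```python
n_bits = 32

# Largest value: every one of the 32 bits is set to 1.
largest_unsigned = sum(2**n for n in range(n_bits))

# Number of distinct values: 0 through largest_unsigned.
count = 2**n_bits

print(largest_unsigned)  # 4294967295
print(count)             # 4294967296
```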

## 2. Floating Point Numbers
Binary representation gives us an insufficient range and precision of numbers to do relevant engineering calculations. To achieve the range of values needed with the same number of bits, we use **floating point numbers** or `float` for short.

**Instead of utilizing each bit as the coefficient of a power of 2**, floats allocate bits to three different parts: 
* the sign indicator, $s$, which says whether a number is positive or negative; 
* the characteristic or exponent, $e$, which is the power of 2; 
* the fraction, $f$, which is the coefficient of that power of 2. 


<img src="img/09.02.01-Binary_neg_12.png" alt="Float in Binary System" width="400"/>


Almost all platforms map Python floats to the **IEEE754 double precision - 64 total bits**. **1 bit** is allocated to the **sign indicator**, **11 bits** are allocated to the **exponent**, and **52 bits** are allocated to the **fraction**. With 11 bits allocated to the exponent, the exponent field can take **$2^{11} = 2048$ different values**.

Since we want to be able to represent very small numbers as well as very large ones, some of these values must correspond to negative exponents (i.e., to allow numbers that are between 0 and 1 (base10)). To accomplish this, 1023 is subtracted from the stored exponent to normalize it. The value subtracted from the exponent is commonly referred to as the **bias**.

The coefficient of the power of 2 is a number between 1 and 2. In binary, this means that its leading digit is always 1, and, therefore, it is a waste of bits to store it. To save space, the leading 1 is dropped and only the fractional part $f$ is stored. A normalized 64-bit float $n$ is therefore decoded as

$n = (-1)^s 2^{e-1023} (1 + f)$

In Python, we could get the float information using the sys package as shown below:

In [1]:
import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

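For example, consider the following conversion: `15 (base10)` = 0 10000000010 1110000000000000000000000000000000000000000000000000 (IEEE754). Here the sign bit is 0, the exponent bits encode 1026, and the fraction bits encode 0.875, so the value is $2^{1026-1023}⋅(1 + 0.875) = 8⋅1.875 = 15$.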
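The bit pattern for 15 can be inspected directly. The sketch below uses the standard `struct` module to reinterpret the 8 bytes of a double as an integer, then slices the 64 bits into the three fields. The helper name `float_fields` is hypothetical, and the reconstruction at the end assumes a normalized number (exponent bits neither all 0 nor all 1).

```python
import struct


def float_fields(x: float):
    """Split a float into (sign, exponent, fraction) bit strings of length 1, 11, 52."""
    # '>d' packs x as a big-endian IEEE 754 double; '>Q' reads the same bytes as an integer.
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(as_int, "064b")
    return bits[0], bits[1:12], bits[12:]


s, e, f = float_fields(15.0)
print(s, e, f)

# Rebuild the value from the fields: (-1)^s * 2^(e - 1023) * (1 + f)
exponent = int(e, 2) - 1023
fraction = int(f, 2) / 2**52
value = (-1) ** int(s) * 2.0**exponent * (1 + fraction)
print(exponent, fraction, value)  # 3 0.875 15.0
```

Trying other inputs, such as `float_fields(-12.0)` or `float_fields(0.1)`, is a good way to see how the sign bit and the fraction bits change.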


The next smaller number is 0 10000000010 1101111111111111111111111111111111111111111111111111 = **14.9999999999999982236431605997**

The next larger number is 0 10000000010 1110000000000000000000000000000000000000000000000001 = **15.0000000000000017763568394003**


Therefore, this IEEE754 number **represents not only the number 15.0, but also every real number that lies closer to 15.0 than to either of its immediate neighbors, i.e., up to halfway toward each neighbor. So any computation whose exact result falls within this interval will be assigned 15.0.**
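This rounding behavior can be observed with a small experiment. The sketch below finds the neighbors of 15.0 by adding or subtracting 1 from its 64-bit pattern, which works for positive, finite floats; the helper name `neighbor` is an illustrative assumption. It then adds a quarter of the gap and three quarters of the gap to 15.0 to see which stored value each result lands on.

```python
import struct


def neighbor(x: float, step: int) -> float:
    """Return the float whose bit pattern is `step` away from that of a positive x."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    (y,) = struct.unpack(">d", struct.pack(">Q", as_int + step))
    return y


below = neighbor(15.0, -1)
above = neighbor(15.0, +1)
gap = above - 15.0
print(below, above, gap)

# A result less than halfway to a neighbor is stored as 15.0 itself.
print(15.0 + gap / 4 == 15.0)        # True
# A result more than halfway to the next float is stored as that neighbor.
print(15.0 + 0.75 * gap == above)    # True
```

On Python 3.9 and later, `math.nextafter` offers a built-in way to get the same neighbors.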

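Moreover, the largest and the smallest positive normalized floats can be computed directly from the bit fields and compared with the values reported by `sys.float_info`: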

In [3]:
import numpy as np

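# Largest float: exponent bits encode 2046 (2047 is reserved for inf and nan), all 52 fraction bits set to 1
largest = 2**(2046 - 1023) * (1 + sum(0.5**np.arange(1, 53)))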
largest

1.7976931348623157e+308

In [4]:
sys.float_info.max

1.7976931348623157e+308

In [5]:
# Smallest positive normalized float: exponent bits encode 1, all 52 fraction bits set to 0
smallest = 2**(1 - 1023) * (1 + 0)
smallest

2.2250738585072014e-308

In [6]:
sys.float_info.min

2.2250738585072014e-308

Numbers that are larger than the largest representable floating point number result in **overflow**, and Python handles this case by assigning the result to **inf**. Numbers that are smaller than the smallest subnormal number result in **underflow**, and Python handles this case by assigning the result to **0**.
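Both cases can be triggered with ordinary arithmetic, as in the minimal sketch below. It assumes the usual IEEE754 double behind Python floats. Note that not every operation returns `inf` on overflow: some, such as `2.0**1024`, raise an `OverflowError` instead.

```python
import sys

# Overflow: doubling the largest float exceeds the representable range.
overflowed = sys.float_info.max * 2
print(overflowed)   # inf

# The smallest subnormal is 2**-52 times the smallest normalized float.
smallest_subnormal = sys.float_info.min * 2.0**-52
print(smallest_subnormal)

# Underflow: halving the smallest subnormal rounds to zero.
underflowed = smallest_subnormal / 2
print(underflowed)  # 0.0
```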

So, what have we gained by using IEEE754 versus binary?  
Using 64 bits in plain binary gives us $2^{64}$ numbers. Since the number of bits does not change between binary and IEEE754, IEEE754 can give us at most $2^{64}$ numbers as well. In **binary**, **numbers have a constant spacing between them**. As a result, you **cannot have both range** (i.e., large distance between minimum and maximum representable numbers) and **precision** (i.e., small spacing between numbers). Controlling these parameters would depend on where you put the binary (radix) point in your number.

**IEEE754** overcomes this limitation by **using very high precision at small numbers** and **very low precision at large numbers**. This trade-off is usually acceptable because the gap at large numbers is still small relative to the size of the number itself. Therefore, even if the gap is in the millions, it is irrelevant to normal calculations if the number under consideration is in the trillions or higher.
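To make the trade-off concrete, the sketch below estimates the gap between a float and the next larger one at several magnitudes. The helper name `spacing` is hypothetical, and the formula assumes a positive, normalized 64-bit float with a 53-bit significand (52 stored bits plus the implicit leading 1).

```python
import math


def spacing(x: float) -> float:
    """Gap between a positive normalized float x and the next larger float."""
    _, exp = math.frexp(x)             # x = m * 2**exp with 0.5 <= m < 1
    return math.ldexp(1.0, exp - 53)   # one unit in the last of 53 significand bits


for x in (1.0, 1e3, 1e12, 1e22):
    gap = spacing(x)
    print(f"x = {x:.0e}   gap = {gap:.3e}   gap / x = {gap / x:.3e}")
```

The absolute gap grows from about $2.2⋅10^{-16}$ near 1 to about two million near $10^{22}$, while the relative gap stays between $2^{-53}$ and $2^{-52}$ throughout. NumPy's `np.spacing` reports the same gaps if you prefer a library call.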

## 3. Round-off Errors