<a href="https://colab.research.google.com/github/anferivera/DarkBariogenesis/blob/main/2_1_computer_arithmetics_32_64_bits.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Computer Arithmetics and Round-off Methods

Based on Diego Restrepo's notebook.

In the ideal mathematical world, operations like $1+2=3$, $4\times 3 = 12$, and $(\sqrt{2})^2 = 2$ are unambiguously defined. When numbers are represented in a computer, however, this is no longer true. The main reason is the so-called *finite arithmetic*, which is the way a computer performs basic operations. Some features of *finite arithmetic* are stated below:

- Only a finite subset of the integer and rational numbers can be represented exactly.
- The set of elements on which arithmetic is performed is necessarily finite.
- The result of any arithmetic operation between two or more numbers of this set must be another element of the set.
- Non-representable numbers, such as irrational numbers, are approximated by the closest rational number within the defined set.
- **Extremely large numbers produce overflows and extremely small numbers produce underflows, which are taken as null (zero).**
- Operations over non-representable numbers are not exact.

In spite of this, if the set of elements on which the computer operates is defined adequately, round-off errors can be systematically neglected, yielding correct results within reasonable error margins. In some pathological cases, when massive iterations are required, these errors must be taken into account more seriously.
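
As a small illustration of this point, the following sketch (not part of the original notebook) evaluates $(\sqrt{2})^2$ with Python's double-precision floats. The exact comparison fails, while a comparison within a tolerance succeeds. `math.isclose` with its default relative tolerance is used here only as one reasonable choice of error margin.

```python
import math

x = math.sqrt(2) ** 2          # mathematically equal to 2
print(x)                       # stored value, not exactly 2
print(x == 2)                  # exact comparison
print(math.isclose(x, 2))      # comparison within a relative tolerance
print(abs(x - 2))              # size of the round-off error
```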

- - -

- [Binary machine numbers](#Binary-machine-numbers)
    - [Single-precision numbers](#Single-precision-numbers)
    - [Double-precision numbers](#Double-precision-numbers)
- [Finite Arithmetic](#Finite-Arithmetic)
    - [Addition](#Addition)
    - [Multiplication](#Multiplication)

- - -

1. AI: The **precision of a computer** refers to the number of digits with which arithmetic operations are carried out, which is determined by the word length of the processor. In other words, it indicates how many **bits** are used to represent a numerical value. The more bits are used, the greater the precision and, therefore, the ability to represent numbers more accurately.

2. **Floating-point precision:**
In most computers, precision is measured in terms of floating point, where numbers are represented with a sign, a mantissa (the significant digits), and an exponent.

3. **Types of precision:** One can distinguish between **single precision (32 bits)** and **double precision (64 bits)**. Double precision offers greater precision and range, which makes it better suited for complex scientific and mathematical calculations.

4. **Precision and round-off errors:**
Because computers use a finite representation of numbers, some calculations can produce round-off errors. These errors can accumulate over longer calculations, which affects the final precision of the result. This will be covered in detail later...
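
To make the difference between the two precisions concrete, here is a minimal sketch (an illustration added here, with the value 0.1 chosen arbitrarily) that stores the same decimal number in 32 and 64 bits and prints more digits than either format can hold.

```python
import numpy as np

x32 = np.float32(0.1)   # single precision (32 bits)
x64 = np.float64(0.1)   # double precision (64 bits)

# Print more digits than either format can hold to expose the stored value
print(f"float32: {float(x32):.20f}")
print(f"float64: {float(x64):.20f}")
print("difference:", abs(float(x32) - float(x64)))
```

Neither stored value is exactly 0.1, but the single-precision one departs from it many digits earlier.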

In [None]:
import numpy as np

# Binary machine numbers    

As everyone knows, binary numbers are the basis of modern computation. The binary or base-2 numeral system is the simplest among the existing numeral bases. Since every electronic device is based on logic circuits (circuits operating with [logic gates](#LogicGates)), the implementation of a binary base is straightforward. Besides, any other numeral system can be reduced to a binary representation.

![LogicGates](http://www.ee.surrey.ac.uk/Projects/CAL/digital-logic/gatesfunc/graphics/symtab.gif)

According to the standard [IEEE 754-2008](http://en.wikipedia.org/wiki/IEEE_floating_point), real numbers can be represented in several ways; [single-precision](http://en.wikipedia.org/wiki/Single-precision_floating-point_format) and [double-precision](http://en.wikipedia.org/wiki/Double-precision_floating-point_format) are the most used ones.

## Single-precision numbers

https://en.wikipedia.org/wiki/Single-precision_floating-point_format

Single-precision numbers are used when one does not need very accurate results and/or needs to save memory. These numbers are represented by a binary number **32 bits** (BInary digiTs) long, where the real number is stored according to the following rules:

![32-bits](http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Float_example.svg/590px-Float_example.svg.png)

1. The first bit (called *s*) indicates the sign of the number (s=0 means a positive number, s=1 a negative one).
2. The next 8 bits represent the exponent of the number.
3. The last 23 bits represent the fractional part of the number.

The formula for recovering the real number is then given by:

$$r = (-1)^s\times \left( 1 + \sum_{i=1}^{23}b_{23-i}2^{-i} \right)\times 2^{e-127}$$

where $s$ is the sign, $b_{23-i}$ the fraction bits and $e$ is given by:

$$e = \sum_{i=0}^7 b_{23+i}2^i$$
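
As a worked example of these two formulas, take the bit pattern `0 01111100 01000000000000000000000` (the same pattern that is checked with `number32` later in this section). The sign bit is $s=0$, the only nonzero fraction bit is $b_{21}$ (that is, $i=2$), and the exponent bits $b_{29},\dots,b_{25}$ are set:

$$e = 2^6+2^5+2^4+2^3+2^2 = 124, \qquad \sum_{i=1}^{23}b_{23-i}2^{-i} = 2^{-2} = 0.25$$

$$r = (-1)^0\times \left( 1 + 0.25 \right)\times 2^{124-127} = 1.25\times 2^{-3} = 0.15625$$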

The following short routine computes the value represented by a 32-bit binary string:

In [None]:
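def number32( binary ):
    """Value of a normalized single-precision number given as a 32-character bit string."""
    #Inverting binary string so that binary[k] is bit b_k
    binary = binary[::-1]
    #Fractional part, including the implicit leading 1
    dec = 1
    for i in range(1,24):
        dec += int(binary[23-i])*2**-i
    #Exponent part
    exp = 0
    for i in range(0,8):
        exp += int(binary[23+i])*2**i
    #Total number: sign * 2^(e-127) * (1 + fraction)
    number = (-1)**int(binary[31])*2**(exp-127)*dec
    return number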

In [None]:
#Check
number32( "00111110001000000000000000000000" )

0.15625

Please see the examples in the Wikipedia article linked above, in particular how to convert a decimal number to a binary number in this IEEE 754 binary format.
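
For the opposite direction (from a decimal number to its bit pattern), one option is to let Python do the conversion with the standard `struct` module. The helper name `float_to_bits32` is hypothetical and introduced only for this sketch; it is a cross-check for `number32`, not a replacement for the by-hand procedure.

```python
import struct

def float_to_bits32(x):
    """Return the 32-bit IEEE 754 pattern of x as a string of 0s and 1s."""
    # Pack x as a big-endian float32, then reinterpret the 4 bytes as an unsigned int
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    return format(as_int, "032b")

bits = float_to_bits32(0.15625)
# Split into sign | exponent | fraction
print(bits[0], bits[1:9], bits[9:])
```

Feeding the resulting string back into `number32` should recover the original value whenever that value is exactly representable in single precision.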

In [None]:
number32( "01000001010001100000000000000000" )

12.375

In [None]:
number32( "00111111100000000000000000000000" )

1.0

## **Exercise:** Understand how it works.

Write 16 in this format:

1. First, you have to learn how to write a number in binary using repeated division by 2.
2. Then, you have to carry out the following process:

\begin{equation}
(16)_{10} = (10000)_2 = (1.0)_2\times 2^4 = (1.0)_2\times 2^{(131-127)}
\end{equation}

In this case, $e=(131)_{10}=(10000011)_2$.

Therefore the number is: 0-10000011-00000000000000000000000
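
The repeated division by 2 used in step 1 can be sketched as follows. The function name `to_binary` is hypothetical and only meant as an illustration for non-negative integers: each remainder is one binary digit, obtained from least to most significant.

```python
def to_binary(n):
    """Binary digits of a non-negative integer via repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)   # quotient and remainder of n / 2
        digits.append(str(remainder))
    # Remainders come out least-significant first, so reverse them
    return "".join(reversed(digits))

print(to_binary(16))    # the number itself
print(to_binary(131))   # the biased exponent e = 4 + 127
```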

In [None]:
number32( "01000001100000000000000000000000" )

16.0

In [None]:
#Check how to invert
a = "0011102"
b = [a]
type(b)

list

In [None]:
a[::-1]

'2011100'

https://codelearn.es/blog/que-es-el-sistema-binario/#:~:text=El%20sistema%20binario%2C%20popularmente%20conocido,el%200%20y%20el%201.


The single-precision system can represent real numbers within the interval $\pm 10^{-38} \cdots 10^{38}$, with **7-8 decimal digits**.

In [None]:
#Decimal digits
print("\n")
print("Decimal digits contributions for single precision number")
print(2**-23., 2**-15., 2**-5. , "\n")

#Largest and smallest exponent
suma = 0
for i in range(0,8):
    suma += 2**i
print("Largest and smallest exponent for single precision number"    )
print(2**(suma-127.), 2**(-127.),"\n")



Decimal digits contributions for single precision number
1.1920928955078125e-07 3.0517578125e-05 0.03125 

Largest and smallest exponent for single precision number
3.402823669209385e+38 5.877471754111438e-39 
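
The cell above gives order-of-magnitude estimates from the bit layout. As a cross-check, the sketch below (an addition, not part of the original notebook) asks NumPy for the actual single-precision limits through `np.finfo` and shows what happens just beyond them. The operands `10`, `1e-45` and `0.1` are arbitrary choices for illustration.

```python
import numpy as np

info = np.finfo(np.float32)
print("largest finite value:    ", info.max)
print("smallest normal value:   ", info.tiny)
print("machine epsilon (2**-23):", info.eps)

# Going beyond the limits (floating-point warnings silenced for this block only)
with np.errstate(over="ignore", under="ignore"):
    overflowed = info.max * np.float32(10)               # too large for float32
    underflowed = np.float32(1e-45) * np.float32(0.1)    # too small for float32
print(overflowed, underflowed)
```

In IEEE 754 arithmetic the overflowed product is stored as `inf`, while the underflowed one is flushed to zero.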



## Double-precision numbers

Double-precision numbers are used when high accuracy is required. These numbers are represented by a binary number **64 bits** (BInary digiTs) long, where the real number is stored according to the following rules:

![64-bits](http://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/IEEE_754_Double_Floating_Point_Format.svg/618px-IEEE_754_Double_Floating_Point_Format.svg.png)

1. The first bit (called *s*) indicates the sign of the number (s=0 means a positive number, s=1 a negative one).
2. The next 11 bits represent the exponent of the number.
3. The last 52 bits represent the fractional part of the number.

The formula for recovering the real number is then given by:

$$r = (-1)^s\times \left( 1 + \sum_{i=1}^{52}b_{52-i}2^{-i} \right)\times 2^{e-1023}$$

where $s$ is the sign, $b_{52-i}$ the fraction bits and $e$ is given by:

$$e = \sum_{i=0}^{10} b_{52+i}2^i$$

The double-precision system can represent real numbers within the interval $\pm 10^{-308} \cdots 10^{308}$, with **16-17 decimal digits**.
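
Python's built-in `float` is a double-precision number on essentially all current platforms (this is an assumption about the platform, not something Python guarantees), so these limits can be inspected with the standard `sys.float_info` structure:

```python
import sys

fi = sys.float_info
print("mantissa bits (including the implicit 1):", fi.mant_dig)
print("machine epsilon:", fi.epsilon)
print("largest finite value:", fi.max)
print("smallest normal value:", fi.min)
```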

### **ACTIVITY**: Optional homework.

**1.** Write a Python script that calculates the double-precision number represented by a 64-bit binary string.

    
**2.** What is the number represented by:

0 10000000011 1011100100001111111111111111111111111111111111111111

<font color='white'>
    **ANSWER:**  27.56640625

## Finite Arithmetic

The most basic arithmetic operations are addition and multiplication. Further operations such as subtraction, division, and exponentiation are secondary, as they can be obtained by iteratively using these two.

### Addition

As mentioned before, arithmetic operations are not exact in a computer due to the inherent limitations in the representation of numbers. Even when adding two already approximate numbers, say a pair of single-precision numbers, the result may not be a representable number, so approximation rules must be applied.
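
A minimal sketch of such a case, with values chosen only for illustration: both operands below are exactly representable in single precision, but their exact sum is not, so the result is rounded back to the nearest representable number.

```python
import numpy as np

big = np.float32(1.0e8)
small = np.float32(1.0)

total = big + small        # the exact result 100000001 is not a float32
print(total == big)        # the small term is lost in the rounding
print(np.spacing(big))     # gap between big and the next float32
```

The gap between neighbouring single-precision numbers near $10^8$ is larger than 1, which is why adding 1 leaves the value unchanged.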

$\sum_{i=1}^{i=N}\dfrac{1}{N}= \dfrac{N}{N}=1$

In [None]:
N = 9
x = 0
y = 0
for i in range(N):
    x += np.float16(1.0/N)
    y += np.float32(1.0/N)
print('x=',x)
print('y=',y)

x= 1.001
y= 1.0


Note that the successive addition of rounded-off numbers produces a less precise final result.
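
The same effect is visible in double precision. The sketch below (an added illustration with an arbitrary choice of terms) accumulates ten copies of 0.1 with a plain loop and compares the result with `math.fsum` from the standard library, which keeps track of the low-order bits lost in each partial sum.

```python
import math

terms = [0.1] * 10

naive = 0.0
for t in terms:
    naive += t                    # each addition is rounded

print(naive == 1.0)               # accumulated round-off
print(math.fsum(terms) == 1.0)    # compensated summation
print(abs(naive - 1.0))           # size of the accumulated error
```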

$\dfrac{5}{7}+\dfrac{1}{3}=\dfrac{5\times3+1\times7}{21}=\dfrac{22}{21}$

In [None]:
print("5/7=", np.float32(5/7.))
print("1/3=", np.float32(1/3.),'\n')
print(np.float32(5./7.+1./3.), 22./21.)

#print("Error:", np.float16(5/7.+1/3.)-22/21.)
k = np.float32(5./7.+1/3.) - 22./21.
#print('Error:','%.7f'% k)

5/7= 0.71428573
1/3= 0.33333334 

1.0476191 1.0476190476190477


The final result has an error at the **seventh** decimal digit.
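
To quantify this, one possible sketch is to compute the absolute and relative errors explicitly, taking the double-precision value of $22/21$ as the reference (an assumption: the reference itself is only accurate to double precision).

```python
import numpy as np

exact = 22.0 / 21.0                          # double-precision reference
approx = np.float32(5.0 / 7.0 + 1.0 / 3.0)   # result rounded to single precision

abs_err = abs(float(approx) - exact)
rel_err = abs_err / exact
print(f"absolute error: {abs_err:.3e}")
print(f"relative error: {rel_err:.3e}")
```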

Although the **float16** or half-float precision is standard according to the IEEE 754-2008, many devices do not support it well.

### Multiplication

Multiplication follows the same round-off rules as addition; however, be aware that multiplicative errors propagate more quickly than additive errors.

$\prod_{i=1}^{N} 2^{1/N}=2$, with $N=20$

In [None]:
N = 20
x = 1
for i in range(N):
    x *= np.float32(2.0**(1.0/N))
print('%.8f'% x)

2.00000167


In [None]:
N = 20
x = 1
for i in range(N):
    x *= np.float16(2.0**(1.0/N))
print('%.8f'% x)

1.99414062


In [None]:
#in python2
N = 20
x = 1
for i in range(N):
    x *= np.float16(2.0**(1.0/N))
print(x, np.float16(5/7.))

1.99580530418 0.71436


In half precision (**float16**), the final result has an error at the **third** decimal digit, four digits earlier than in the single-precision addition example.
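
To compare the precisions on an equal footing, the following sketch repeats the same product for half, single and double precision and reports the relative error of each. The explicit `dtype(...)` casts are an addition made here so that the running product stays in the chosen precision regardless of the NumPy version.

```python
import numpy as np

N = 20
rel_errors = {}
for dtype in (np.float16, np.float32, np.float64):
    factor = dtype(2.0 ** (1.0 / N))   # rounded once to the target precision
    x = dtype(1.0)
    for _ in range(N):
        x = dtype(x * factor)          # keep the running product in that precision
    rel_errors[dtype.__name__] = abs(float(x) - 2.0) / 2.0

for name, err in rel_errors.items():
    print(f"{name}: relative error = {err:.2e}")
```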

**ACTIVITY**: Optional homework

Find the error associated with the finite representation in the following operations



$$
x-u, \quad \frac{x-u}{w}, \quad (x-u)\times v, \quad u+v
$$

considering the values

$$
x = \frac{5}{7}, y = \frac{1}{3}, u = 0.71425
$$



$$
v = 0.98765\times 10^5, w = 0.111111\times 10^{-4}
$$
