# Rounding
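**Rounding by chopping**: a real number $x$ lies between two neighboring floats, so we can round either down to $x_-$ or up to $x_+$. Chopping always picks $x_-$ (for positive $x$).   
$x_- = 1.b_1b_2\ldots b_n \times 2^m$: we just chop off every digit after $b_n$.    
$x_+ = x_- + \epsilon_m \times 2^m$      
gap: $x_+ - x_- = \epsilon_m \times 2^m$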
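
As a rough sketch of the formulas above, the two neighbors and the gap can be computed for a toy format with `n` fraction bits. The helper name `chop` and the choice of `n = 3` are assumptions for illustration, and the sketch only handles positive `x`.

```python
import math

def chop(x: float, n: int) -> tuple:
    """Return (x_minus, x_plus, gap) for positive x kept to n fraction bits."""
    # frexp gives x = frac * 2**exp with 0.5 <= frac < 1, so m = exp - 1
    _, exp = math.frexp(x)
    m = exp - 1
    gap = 2.0 ** (m - n)                 # epsilon_m * 2^m with epsilon_m = 2^-n
    x_minus = math.floor(x / gap) * gap  # drop every bit after b_n
    x_plus = x_minus + gap
    return x_minus, x_plus, gap

# 20.125 is 1.0100001 * 2^4 in binary
print(chop(20.125, 3))  # (20.0, 22.0, 2.0)
```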

**Round to nearest**: pick whichever of $x_-$ and $x_+$ is closer to $x$, i.e. the one with the smaller error.
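
A minimal sketch of round to nearest for the same kind of toy format, assuming positive `x` and `n` fraction bits. The function name `round_to_nearest` is hypothetical, and ties are sent down here for simplicity, whereas IEEE 754 by default breaks ties toward the even neighbor.

```python
import math

def round_to_nearest(x: float, n: int) -> float:
    """Round positive x to the closer of its two n-fraction-bit neighbors."""
    _, exp = math.frexp(x)               # x = frac * 2**exp, 0.5 <= frac < 1
    gap = 2.0 ** (exp - 1 - n)           # distance between the two neighbors
    x_minus = math.floor(x / gap) * gap
    x_plus = x_minus + gap
    # keep the neighbor with the smaller absolute error (ties go down here)
    return x_minus if (x - x_minus) <= (x_plus - x) else x_plus

print(round_to_nearest(20.125, 3))  # 20.0
print(round_to_nearest(21.5, 3))    # 22.0
```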
![Screenshot 2023-09-05 at 9.47.57 AM.png](attachment:81d37283-4f78-490c-a731-8036bb69463c.png)

![image.png](attachment:a0a2a1cf-cc5a-4b33-b197-9546ec70a02f.png)

In [1]:
bin(20)

'0b10100'

In [3]:
2**(-3)

0.125

In [10]:
# full number in binary: 1.0100001 * 2^4 (= 20.125)
# chopped to 3 binary digits after the point: 1.010 * 2^4, which is 20
print(int("10100", 2))
# rounded up: 1.011 * 2^4, which is 22
print(int("10110", 2))

20
22


![image.png](attachment:f4ad85ad-ccf4-4668-9fc0-a531a5ef936f.png)

In [17]:
import struct

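def float_to_binary(value: float) -> str:
    # pack as a big-endian IEEE 754 single and print each byte as 8 bits
    return ''.join(format(c, '08b') for c in struct.pack('!f', value))

# skip the sign bit and first 7 exponent bits: last exponent bit + 23 fraction bits
float_to_binary(6.17)[8:]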

'110001010111000010100100'

In [15]:
bin(6)

'0b110'

In [18]:
# 6.17 = 110.0010...b = 1.10001...b * 2^2: the exponent is 2, the fraction starts with 10, and the third fraction bit is 0.

# Error
## Absolute Error
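$|fl(x)-x| \leq \epsilon_m \times 2^m$
## Relative Error
$\frac{|fl(x)-x|}{|x|} \leq \epsilon_m$     

single precision: $e_r \leq 5 \times 10^{-7}$ (machine epsilon is $2^{-23} \approx 1.2 \times 10^{-7}$)   
double precision: $e_r \leq 5 \times 10^{-16}$ (machine epsilon is $2^{-52} \approx 2.2 \times 10^{-16}$)   
A float $x$ and $x + \delta$ have the same representation when $\delta$ is smaller than the gap at $x$, $\epsilon_m \times 2^m$. For $x$ near 1, that means $\delta$ is less than machine epsilon.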
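
The bounds above can be checked numerically. This is only a sketch: the helper `to_single` is a hypothetical name that rounds a Python float (a double) to single precision through `struct`, and `eps_single` is written out by hand from the 23 fraction bits of the single format.

```python
import struct
import sys

def to_single(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack('!f', struct.pack('!f', x))[0]

eps_double = sys.float_info.epsilon  # 2**-52
eps_single = 2.0 ** -23              # 23 fraction bits in single precision

# relative error of storing 6.17 in single precision
x = 6.17
rel_err = abs(to_single(x) - x) / abs(x)
print(rel_err <= eps_single)         # True

# a delta well below the gap at 1.0 does not change the stored value
print(1.0 + eps_double == 1.0)       # False
print(1.0 + eps_double / 4 == 1.0)   # True
```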

![image.png](attachment:7795262f-56a9-4891-85e4-7f6b6de7db9c.png)

In [21]:
-52+25

-27

![image.png](attachment:194a164d-4d87-4470-9c39-bbcfab1bd616.png)

In [22]:
-17+6

-11

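# Mathematical properties that are different in FP operations
1. Associative: $(a+b)+c$ can differ from $a+(b+c)$.
2. Distributive: $a(b+c)$ can differ from $ab+ac$.
3. Cumulative: repeatedly adding a very small number to a large one may never change it.   

These fail because every intermediate result is rounded. A single IEEE addition or multiplication is still commutative ($a+b = b+a$).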

In [23]:
import numpy as np

In [24]:
(np.pi+1e100)-1e100

0.0

In [25]:
np.pi+(1e100-1e100)

3.141592653589793

In [27]:
b = 1e80
a = 1e2
print(a+(b-b))
print((a+b)-b)

100.0
0.0


In [29]:
print(100*(0.1+0.2))
print(100*0.1+100*0.2)

30.000000000000004
30.0


In [30]:
# b is still 1e80 from the cell above; 1e-50 is far below the gap at 1e80
for i in range(4):
    b += 1e-50
b

1e+80

# Addition and subtraction
1. bring both numbers onto a common exponent
2. do the grade school operation
3. round the result   

Sometimes you get *catastrophic cancellation*: subtracting two nearly equal numbers cancels their leading digits, so the answer has far fewer significant digits than the operands.
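
The three steps above can be sketched for a toy format with 4 fraction bits. Everything here is an assumption for illustration: the name `toy_add`, the integer encoding (a significand `sig` with `1 + n` bits stands for `sig * 2**(exp - n)`), positive inputs only, and rounding by chopping.

```python
def toy_add(sig_a: int, exp_a: int, sig_b: int, exp_b: int, n: int = 4) -> tuple:
    """Add two positive toy floats, each worth sig * 2**(exp - n)."""
    # 1. common exponent: shift the number with the smaller exponent right
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b >>= exp_a - exp_b          # bits shifted out are lost
    # 2. grade school addition on the aligned significands
    total = sig_a + sig_b
    exp = exp_a
    # 3. renormalize and chop back to 1 + n bits
    while total >= 1 << (n + 1):
        total >>= 1
        exp += 1
    return total, exp

# 1.1100 * 2^3 (= 14) plus 1.0100 * 2^0 (= 1.25); the exact sum is 15.25
sig, exp = toy_add(0b11100, 3, 0b10100, 0)
print(bin(sig), exp, sig * 2.0 ** (exp - 4))  # 0b11110 3 15.0
```

The low bits of the smaller operand are shifted out in step 1, which is why the result is 15.0 rather than 15.25.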

![image.png](attachment:77256736-1dd8-47f3-a315-f959527d797f.png)
![image.png](attachment:a3634ddf-e3ab-47fe-ab5f-9c9223d1d502.png)

![image.png](attachment:74a56ee5-3528-4fb2-acc7-1724c709c2af.png)

In [None]:
"""
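    1.1100  *2^3
-   0.0101  *2^3
_________________
    1.0111  *2^3

which is the same as adding the ones' complement of 0.0101 (1.1010)
and wrapping the carry out of the top bit back into the last place:
    1.1100  *2^3
+   1.1010  *2^3
_________________
    1.0111  *2^3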
"""

# Loss of precision
When we shift the smaller number onto the common exponent to add or subtract, its low-order bits fall off the end of the significand, so we can lose precision.
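
A small double-precision illustration of this effect, as a sketch: at $2^{53}$ the gap between neighboring doubles is 2, so after alignment an added 1.0 has no bit left to land in. It assumes Python 3.9 or newer for `math.ulp`.

```python
import math

big = 2.0 ** 53
small = 1.0

print(math.ulp(big))        # 2.0, the gap between big and the next double
print(big + small == big)   # True: small is lost when shifted to big's exponent
print((big + small) - big)  # 0.0 instead of 1.0
```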

![image.png](attachment:75133c4f-9a18-4d62-901e-4ce4624cacc8.png)

![image.png](attachment:b5ec2623-7b81-4ec0-acb1-2ad913521afd.png)

![image.png](attachment:04de5ba2-e9b1-4fa3-bd35-04f99402eb98.png)

In [35]:
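x = 3.1415276549
y = 3.1402143526
# x and y chopped to 4 significant decimal digits
fl_x = 3.141
fl_y = 3.140
actual = x-y
print(actual)
# the leading digits 3.14 cancel, leaving a single significant digit
fl = fl_x-fl_y
print(fl)
# relative error of the computed difference
er = abs((actual-fl)/actual)
er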

0.001313302299999819
0.0009999999999998899


0.23856068781724685

![image.png](attachment:d9d0aecf-92ee-495a-8db8-ceb782d78278.png)