# GEOPHYS 257 (Winter 2023)

## Data Types and Basic Machine Errors

In this lab we will cover the numeric data types of both Python and Numpy, as well as machine errors. Although the problems in this lab are not from *Python Numerical Methods*, please read [Chapter 9](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.00-Representation-of-Numbers.html). Pay particular attention to the *Round-off Errors* section. The first two sections are mostly FYI, but they explain how most computational systems represent fractional numbers in binary.

[//]: <> (Notebook Author: Thomas Cullison, Stanford University, Jan. 2023)

## External Resources
If you have any questions about specific Python functionality, you can consult the official [Python documentation](http://docs.python.org/3/).

* [Python Numeric Types](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex)
* [Numpy Data Types](https://numpy.org/doc/stable/user/basics.types.html)

## Python Built-in Numeric Types (not the same as Numpy types)

In Python it is not necessary to declare variables. Once a value is assigned to a variable, the variable has the type of that value. This behaviour is known as **dynamic typing**. A variable can also change its type at any point during program execution. That has its advantages, but there are also some pitfalls.

1. Int
1. Float
1. Complex (an imaginary literal such as `2j` is itself a `complex` value with a zero real part; a complex number stores a real and an imaginary float, much like a tuple)
1. Type Casting
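As a small illustration of dynamic typing, the sketch below rebinds one name to an int, a float, and a complex value in turn. The names `value`, `seen_types`, and `imag_only` are made up for this example.

```python
# Rebinding the same name to values of different types (dynamic typing).
value = 2
seen_types = [type(value).__name__]

value = 2.5                      # the name now refers to a float
seen_types.append(type(value).__name__)

value = 1 + 2j                   # and now to a complex number
seen_types.append(type(value).__name__)

print(seen_types)

# An imaginary literal on its own is already a complex number.
imag_only = 2j
print(type(imag_only).__name__, imag_only.real, imag_only.imag)
```

Nothing in the language stops the type from changing, which is convenient but also means a later line can silently receive a different type than expected.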

## Exercise 0

#### First read [Chapter 9](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.00-Representation-of-Numbers.html) in *Python Numerical Methods*.

Then begin this lab by reading the code below, running it, and finally examining the results. Continue to the next exercise afterwards.

In [1]:
# 1
x = 2
print('x =', x)
print('x type:', type(x))
print()

# 2 manual casting int to float
cx = float(x)
print('cx =', cx)
print('cx type:', type(cx))
print()

# 3
y = 1.2
print('y =', y)
print('y type:', type(y))
print()

# 4 manual casting float to int
cy = int(y)
print('cy =', cy) # !!! notice the truncation toward zero (this is not rounding) !!!
print('cy type:', type(cy))
print()

# 5
z = 1 + 2j
print('z =', z)
print('z type:', type(z))
print()

# 6 dynamic casting of y to complex
print('z*y =', z*y)
print('z*y type:', type(z*y))
print()

x = 2
x type: <class 'int'>

cx = 2.0
cx type: <class 'float'>

y = 1.2
y type: <class 'float'>

cy = 1
cy type: <class 'int'>

z = (1+2j)
z type: <class 'complex'>

z*y = (1.2+2.4j)
z*y type: <class 'complex'>



## Exercise 1: Dynamic Type Casting

You might have noticed that $z \cdot y$ above returned a complex number. This is related to something called *type casting*. The value stored in $y$ was actually cast (think transformed) into a complex number, and then the multiplication was computed. Thus, the two lines below yield equivalent answers.

```python 
z*y
z*(y + 0.j)
```

Note: the dot in '0.j' is optional, because `0j` and `0.j` denote the same imaginary literal. Writing the dot simply makes it explicit that the value is a floating point number.

To confirm that the above two lines are equivalent, try the line below for yourself. 

```python 
z*y == z*(y + 0.j)
```

It will return a value of 'True'. The '==' is a comparison operator, which we will discuss later.
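The implicit conversions described above can be checked directly. This is a minimal sketch that reuses the values of `z` and `y` from Exercise 0; the other names are illustrative only.

```python
z = 1 + 2j
y = 1.2

# Mixed arithmetic promotes to the "wider" type: int -> float -> complex.
int_plus_float = 1 + 1.0
float_times_complex = y * z
print(type(int_plus_float).__name__)        # float
print(type(float_times_complex).__name__)   # complex

# Converting y explicitly gives the same product as the implicit conversion.
explicit = z * complex(y)
print(explicit == z * y)

# The imaginary literals 0j and 0.j are the same value.
same_zero = (0j == 0.j)
print(same_zero)
```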

### For this exercise please do the following in the cell below.
1. Run the comparison code above and print the result. If you don't believe the answer, change '0.j' to '1j' and run the code again.
1. Type-cast 0.5 and 0.99 to integers and print the results
1. Similar to above, but now type-cast 100.5 and 100.99 to integers
1. Set $x$ equal to a floating point number with a non-zero value left of the decimal point. Then finish the line below such that $x$ is rounded to the nearest integer no matter what value is to the right of the decimal point. Print that result.
1. Store the results of the following statements into a list (*hint: use .append()*). After that, run the code for section 5 below, which maps each element in the list to its corresponding boolean value. Using comments (i.e. #comment), discuss the results. 
```python 
True
True + 1
True - 1
True*1
True*0
True*3.1425
True*-3.1425
False
False + 1
False - 1
False*1
False*0
False*3.1425
False*-3.1425
```

In [2]:
# 1. comparison
print('#1')
is_same = (z*y == z*(y + 0.j))
print(f'Equivalent?: {is_same}')
print(f'z*(y+1j) = {z*(y+1j)}')


# 2. type-cast
print('\n#2')
print('int(0.5) = ', int(0.5))
print('int(0.99) = ', int(0.99))


# 3. type-cast
print('\n#3')
print('int(100.5) = ', int(100.5))
print('int(100.99) = ', int(100.99))


# 4. round to nearest integer (you need to add something before casting)
print('\n#4')

def my_round(x):
    return int(x + (x>0)-0.5)

print('my_round(0.4999) = ', my_round(0.4999))
print('my_round(100.5) = ', my_round(100.5))
print('my_round(-2.5) = ', my_round(-2.5))
print('my_round(-9.9) = ', my_round(-9.9))
print('my_round(-0.5) = ', my_round(-0.5))
print('my_round(-0.1) = ', my_round(-0.1))

# 5. 'Boolean' is a number
print('\n#5')

mylist = []
mylist.append(True)
mylist.append(True + 1)
mylist.append(True - 1)
mylist.append(True * 1)
mylist.append(True * 0)
mylist.append(True * 3.1425)
mylist.append(True * -3.1425)
mylist.append(False)
mylist.append(False + 1)
mylist.append(False - 1)
mylist.append(False * 1)
mylist.append(False * 0)
mylist.append(False * 3.1425)
mylist.append(False * -3.1425)

print(f'mylist: {mylist}')
print(f'bool(mylist): {list(map(bool, mylist))}')

#1
Equivalent?: True
z*(y+1j) = (-0.8+3.4j)

#2
int(0.5) =  0
int(0.99) =  0

#3
int(100.5) =  100
int(100.99) =  100

#4
my_round(0.4999) =  0
my_round(100.5) =  101
my_round(-2.5) =  -3
my_round(-9.9) =  -10
my_round(-0.5) =  -1
my_round(-0.1) =  0

#5
mylist: [True, 2, 0, 1, 0, 3.1425, -3.1425, False, 1, -1, 0, 0, 0.0, -0.0]
bool(mylist): [True, True, False, True, False, True, True, False, True, True, False, False, False, False]


### Comment on Exercise 1

For **questions 2 and 3**: Type-casting with the `int()` function simply discards all digits after the decimal point (truncation toward zero).
<br><br>
For **question 4**: The nearest integer for positive floats $x \in [1.5, 2.5)$ is $2$. Adding $0.5$ first gives $x + 0.5 \in [2.0, 3.0)$, and type-casting these numbers to integers then yields the expected result $2$. For negative floats the operation is the opposite: subtract $0.5$ from $x$ before casting. In the function, I use the value of the Boolean expression $x > 0$ (which counts as 1 or 0) to separate the positive and negative cases.
<br><br>
For **question 5**:<br>
    1. Map Boolean value to number: `True` is mapped to 1, and `False` is mapped to 0.<br>
    2. Map number to Boolean value: All numbers other than 0 are mapped to `True`. Only 0 is mapped to `False`.
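For comparison, the sketch below puts truncation, flooring, Python's built-in `round()`, and the add-or-subtract-0.5 approach side by side on a few tie cases. The helper `round_half_away` is a hypothetical rewrite of the same idea as `my_round`, not part of the lab code. Note that the built-in `round()` sends ties to the nearest even integer, so it does not agree with the 0.5 trick on values such as 2.5.

```python
import math

def round_half_away(x):
    """Round to the nearest integer; ties go away from zero."""
    return int(x + 0.5) if x > 0 else int(x - 0.5)

samples = [0.5, 1.5, 2.5, -0.5, -2.5]

truncated = [int(s) for s in samples]             # drops the fractional part
floored = [math.floor(s) for s in samples]        # always rounds down
builtin = [round(s) for s in samples]             # ties go to the even integer
half_away = [round_half_away(s) for s in samples] # ties go away from zero

print('int():      ', truncated)
print('math.floor: ', floored)
print('round():    ', builtin)
print('half away:  ', half_away)
```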

## Exercise 2: "Type-casting" Numpy arrays

Type casting Numpy arrays is a little different than it is for built-in types. Here, we can make use of the [**astype()**](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html) method.
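A minimal sketch of how `astype()` behaves, using a small made-up array: the call returns a new array and leaves the original untouched, and a float-to-integer cast truncates toward zero. `np.rint` is shown only as one possible way to round before casting; it sends ties to the nearest even integer.

```python
import numpy as np

values = np.array([1.7, -1.7, 2.5, -2.5])

# astype returns a converted copy; `values` keeps its float64 dtype.
as_int = values.astype(np.int32)

# Rounding first and casting afterwards gives the nearest integers instead.
nearest = np.rint(values).astype(np.int32)

print(values.dtype, as_int.dtype)
print('truncated:', as_int)
print('nearest:  ', nearest)
```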

### For this exercise please do the following in the cell below.
1. Run the code for this problem below to create an array of random 64-bit floats. Then type cast the array to float32, subtract the two arrays, and print the results. Discuss the result of the subtraction.
1. Now, using the same 64-bit array, type cast all the values to int32. How were the numbers rounded? Was this what you expected compared to rounding a Python built-in type?
1. Now type cast the float64 array to the nearest int32.
1. Dynamically cast the int32 array created above into an array with float64 types. Print the array and something that verifies the array's new numeric type.
1. For the last problem in this section, type cast the float64 array created in problem 1 to an array of unsigned integers ([Hint](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.uint32)). What do you notice about the results compared to problem 2?

In [5]:
import numpy as np

# 1 type cast to float32
print('#1')
np.random.seed(42) # leave seed alone until all 5 problems and discussion are finished, then feel free to tinker.
isc = np.random.randint(5, high=10)
rx64 = (np.random.rand(10) - 0.5) * isc
print('rx64\n', rx64)

rx32 = rx64.astype(np.float32)
print('\nrx32\n', rx32)
print('\nrx64 - rx32\n', rx64 - rx32)

# 2 type cast to int32
print('\n#2')
int32 = rx64.astype(np.int32)
print('int32\n', int32)

# 3 type cast to nearest int32
print('\n#3')
rint32 = (rx64 + (rx64>0) - 0.5).astype(np.int32)
print('Nearest int32\n', rint32)

# 4 dynamic type cast int32 array above to float64
print('\n#4')
rx64_new = rint32 + np.array([0], dtype=np.float64)
print('rx64_new\n', rx64_new)
print('dtype:', rx64_new.dtype)

# 5 type cast to unsigned-int32
print('\n#5')
uint32 = rx64.astype(np.uint32)
print('uint32\n', uint32)


#1
rx64
 [ 3.60571445  1.85595153  0.78926787 -2.75185088 -2.75204384 -3.5353311
  2.92940917  0.80892009  1.66458062 -3.83532405]

rx32
 [ 3.6057146  1.8559515  0.7892679 -2.7518508 -2.7520437 -3.535331
  2.9294093  0.8089201  1.6645806 -3.835324 ]

rx64 - rx32
 [-1.08275724e-07 -1.31314399e-08 -2.40296032e-08 -3.30309424e-08
 -1.13250320e-07 -9.18359229e-08 -9.93187070e-08  8.51552517e-09
  3.87959762e-08  3.36239125e-09]

#2
int32
 [ 3  1  0 -2 -2 -3  2  0  1 -3]

#3
Nearest int32
 [ 4  2  1 -3 -3 -4  3  1  2 -4]

#4
rx64_new
 [ 4.  2.  1. -3. -3. -4.  3.  1.  2. -4.]
dtype: float64

#5
uint32
 [         3          1          0 4294967294 4294967294 4294967293
          2          0          1 4294967293]


### Comment on Exercise 2

For **question 1**: Because float32 is less precise than float64, type-casting float64 to float32 introduces round-off error.<br>
More specifically, the IEEE 754 binary32 format has 1 sign bit, 8 exponent bits and 23 fraction bits. The real value is expressed as:
$$ (-1)^{b_{31}} \times 2^{\left(b_{30} b_{29} \cdots b_{23}\right)_2 - 127} \times \left(1.b_{22}b_{21}\cdots b_{0}\right)_2 $$
Denote the exponent by $E = \left(b_{30} b_{29} \cdots b_{23}\right)_2 - 127$. The fraction term lies in the range $ \left(1.b_{22}b_{21}\cdots b_{0}\right)_2 \in [1,2)$. An upper bound on the round-off error can be estimated as:
$$ \left| 2^E \times 2^{-23} \right| \approx 2^E \times 1.2 \times 10^{-7}$$
The factor $2^E$ scales the fraction term from $[1,2)$ to the magnitude of the original value. For the values here, $|x| < 4$ gives $E \le 1$ and a bound of about $2.4 \times 10^{-7}$, which is consistent with the observed differences.
<br><br>
For **question 2**: Digits after the decimal point are all removed. This is not rounding, but truncation toward 0. This is the same as Python type-casting floats to integers.
<br><br>
For **question 3**: I use the same method as in the previous exercise.
<br><br>
For **question 5**: A 32-bit unsigned integer can represent numbers from $0$ to $2^{32}-1 = 4294967295$. Positive floats give the same results as the cast to int32, but negative floats do not. In the output above, a negative integer part $-k$ wraps around to $2^{32} - k$ in the unsigned type, so $-2$ becomes $4294967294$.
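The error bound discussed for question 1 can be checked numerically. The sketch below is an illustration under assumptions: it uses its own random sample (not the seeded array from the exercise) and compares each conversion error with half the spacing between neighbouring float32 values, as reported by `np.spacing`.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for repeatability
x64 = (rng.random(1000) - 0.5) * 8.0      # float64 samples in [-4, 4)
x32 = x64.astype(np.float32)

# Absolute conversion error, evaluated in float64.
err = np.abs(x64 - x32.astype(np.float64))

# Distance from each float32 value to the next larger-magnitude float32.
ulp = np.spacing(np.abs(x32)).astype(np.float64)

# Round-to-nearest keeps the error within half a spacing.
within = bool(np.all(err <= 0.5 * ulp))

print('float32 eps:', np.finfo(np.float32).eps)   # 2**-23, about 1.2e-07
print('max error:  ', err.max())
print('within half a spacing:', within)
```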

## Python and Numpy Overflow Errors (integers)

The behavior you just saw in problem 5 above is related to a type of computational error called an Overflow, specifically, a Binary Overflow (not related to a "Stack Overflow", which is a real thing and not just a website for finding solutions to coding problems). I think the following is a reasonable description of a [Binary Overflow](https://www.geeksforgeeks.org/overflow-in-arithmetic-addition-in-binary-number-system/).
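Wrap-around in a fixed number of bits can be imitated with plain Python integers, which never overflow themselves. The helpers `wrap_unsigned` and `wrap_signed` below are hypothetical names written for this sketch; they model arithmetic modulo $2^{\text{bits}}$ with an unsigned and a two's-complement signed interpretation.

```python
def wrap_unsigned(value, bits):
    """Reduce an integer modulo 2**bits (unsigned interpretation)."""
    return value % (1 << bits)

def wrap_signed(value, bits):
    """Reduce modulo 2**bits, then read the result as two's complement."""
    modulus = 1 << bits
    value %= modulus
    return value - modulus if value >= modulus // 2 else value

# Negative integers land at the top of the unsigned range.
print(wrap_unsigned(-2, 32))
print(wrap_unsigned(-3, 32))

# One past the largest signed value wraps to the most negative value.
print(wrap_signed(2**31 - 1 + 1, 32))
print(wrap_signed(2**63 - 1 + 1, 64))
```

The first two results match the `uint32` values printed in problem 5 of the previous exercise.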

### Exercise 3

For this exercise, extend the code below to int64 type integers (copy and modify including the comments). Then in the markdown cell for this exercise below, please explain:
1. Why does adding *1* to the numpy variables cause the values to become negative, while the same operation does not cause the Python variable values to become negative? ([Hint-1](https://docs.python.org/3.3/reference/lexical_analysis.html#numeric-literals), [Hint-2](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.int32))
1. Suppose you are writing a specific code, and you know that none of the operations will result in negative values. What other integer type (dtype) could you use for 32-bit and 64-bit numpy arrays that would extend the maximum integer value that the arrays could correctly represent? ([Hint](https://programmercave0.github.io/blog/2019/10/19/Bit-Manipulation-in-C-and-C++))
1. Why is a Numpy array '+=' operator being used in Problem 1 instead of just doing something like the code shown below? Try this code yourself, and then modify it so that you can understand and explain what happens after the addition of $1$ to the Numpy variable vs. the addition to the Numpy array.
```python
np_x32 = np.array([2**31-1],dtype=np.int32)[0]
#                               look here --^
```
```python
np_x32 += 1 # Single variable += operator
print(f'val(np_x32+1), type(np_x32+1): {np_x32},{type(np_x32)}')
#                         what is not here --^  nor here --^
```
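When working through the questions above, it may help to look up the representable range of each integer dtype. A small sketch using `np.iinfo` follows; the dictionary name `limits` is just for this example.

```python
import numpy as np

# Collect (min, max) for the signed and unsigned 32- and 64-bit integer dtypes.
limits = {}
for dt in (np.int32, np.uint32, np.int64, np.uint64):
    info = np.iinfo(dt)
    limits[np.dtype(dt).name] = (int(info.min), int(info.max))

for name, (lo, hi) in limits.items():
    print(f'{name:>6}: min = {lo}, max = {hi}')
```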


In [6]:
# Python int32 
print('#1')
print('Python int32')
x32 = int(2**31 - 1) # remember order of operations!
print(f'val(x32), x32.bit_length: {x32}, {x32.bit_length()}')

x32 += 1
print(f'val(x32+1), (x32+1).bit_length: {x32}, {x32.bit_length()}')

print()

# Numpy int32 
print('Numpy int32')
np_x32 = np.array([2**31-1], dtype=np.int32)
print(f'val(np_x32), type(np_x32): {np_x32[0]}, {type(np_x32[0])}')

np_x32 += 1
print(f'val(np_x32+1), type(np_x32+1): {np_x32[0]}, {type(np_x32[0])}')

print()

# Python int64 
print('Python int64')
x64 = int(2**63 - 1) # remember order of operations!
print(f'val(x64), x64.bit_length: {x64}, {x64.bit_length()}')

x64 += 1
print(f'val(x64+1), (x64+1).bit_length: {x64}, {x64.bit_length()}')

print()

# Numpy int64 
print('Numpy int64')
np_x64 = np.array([2**63-1], dtype=np.int64)
print(f'val(np_x64), type(np_x64): {np_x64[0]}, {type(np_x64[0])}')

np_x64 += 1
print(f'val(np_x64+1), type(np_x64+1): {np_x64[0]}, {type(np_x64[0])}')

#1
Python int32
val(x32), x32.bit_length: 2147483647, 31
val(x32+1), (x32+1).bit_length: 2147483648, 32

Numpy int32
val(np_x32), type(np_x32): 2147483647, <class 'numpy.int32'>
val(np_x32+1), type(np_x32+1): -2147483648, <class 'numpy.int32'>

Python int64
val(x64), x64.bit_length: 9223372036854775807, 63
val(x64+1), (x64+1).bit_length: 9223372036854775808, 64

Numpy int64
val(np_x64), type(np_x64): 9223372036854775807, <class 'numpy.int64'>
val(np_x64+1), type(np_x64+1): -9223372036854775808, <class 'numpy.int64'>


In [240]:
# Problem 3: Variable Numpy int32
print('#3')
print('Example 1')
np_x32 = np.array([2**31-1], dtype=np.int32)[0]

np_x32 += 1 # Single variable += operator
print(f'val(np_x32+1), type(np_x32+1): {np_x32}, {type(np_x32)}')

print()

print('Example 2')
np_x32 = np.array([2**31-1], dtype=np.int32)[0]

np_x32 += np.array([1], dtype=np.int32)[0] # Single variable += operator
print(f'val(np_x32+1), type(np_x32+1): {np_x32}, {type(np_x32)}')

print()

print('Example 3')
np_x32 = np.array([2**31-1], dtype=np.int32)[0]

np_x32 += np.array([1], dtype=np.float32)[0] # Single variable += operator
print(f'val(np_x32+1), type(np_x32+1): {np_x32}, {type(np_x32)}')

print()

print('Example 4')
np_x32 = np.array([2**31-1], dtype=np.int32)

np_x32 += np.array([1], dtype=np.float32) # Array += operator
print(f'val(np_x32+1), type(np_x32+1): {np_x32[0]}, {type(np_x32[0])}')

#3
Example 1
val(np_x32+1), type(np_x32+1): 2147483648, <class 'numpy.int64'>

Example 2
val(np_x32+1), type(np_x32+1): -2147483648, <class 'numpy.int32'>

Example 3
val(np_x32+1), type(np_x32+1): 2147483648.0, <class 'numpy.float64'>

Example 4


  np_x32 += np.array([1], dtype=np.int32)[0] # Single variable += operator


UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int32') with casting rule 'same_kind'

### Comment on Exercise 3

For **question 1**: For the Python integer type, "there is no limit for the length of integer literals apart from what can be stored in available memory". In other words, no overflow happens for the Python variables in the example, given sufficient memory. For the Numpy values in the arrays, however, we fix the data type to `np.int32` or `np.int64`, so the value wraps around to the most negative number once it exceeds the largest representable one.
<br><br>
For **question 2**: If no negative values are involved, we can use the unsigned integer types `np.uint32` and `np.uint64`. For the same number of bits $n$, the upper limit roughly doubles (from $2^{n-1}-1$ to $2^{n}-1$).
<br><br>
For **question 3**: I use four example cases to understand the mechanism.
* Example 1: Numpy <span style="color:red"> variable </span> with `+=` operator, number 1 as default integer type (Instruction case).
* Example 2: Numpy <span style="color:red"> variable </span> with `+=` operator, number 1 as integer type `np.int32`. This case gives the overflow warning.
* Example 3: Numpy <span style="color:red"> variable </span> with `+=` operator, number 1 as float type `np.float32`.
* Example 4: Numpy <span style="color:blue"> array </span> with `+=` operator, array `np.array([1])` as float type `np.float32`. This case gives the error.

The error provides the hint. The first critical point is the type promotion performed by the addition operator `+`: the result type accommodates both input data types. For Example 1, `int32 + int64` &rarr; `int64` (the Python integer 1 is treated as the default integer type here). For Examples 3 and 4, `int + float` &rarr; `float`. If the input data types are the same, the output has that same data type, as shown by Example 2 with the overflow warning.


The second critical point is related to the addition assignment operator `+=`. Examples 1-3 indicate that a Numpy <span style="color:red"> variable </span> is flexible: the name is simply rebound to a result of a different type. However, the error in Example 4 indicates that a Numpy <span style="color:blue"> array </span> has a fixed data type. The `+=` operator on an array must store the result back in the original data type. Thus, in the first question adding 1 to the Numpy array causes overflow, and in Example 4 an error arises because the float64 result cannot be safely cast back to int32.
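The promotion and casting rules mentioned above can be queried without doing any arithmetic. The sketch below uses `np.result_type` and `np.can_cast` on dtypes only, plus one in-place array addition with matching dtypes; the variable names are illustrative.

```python
import numpy as np

# Result dtype when two dtypes meet in an operation.
promoted_ints = np.result_type(np.int32, np.int64)
promoted_mixed = np.result_type(np.int32, np.float32)
print(promoted_ints, promoted_mixed)

# In-place array operations must cast the result back to the array's dtype.
# Under the 'same_kind' rule a float64 result may not be stored as int32.
can_store = np.can_cast(np.float64, np.int32, casting='same_kind')
print('float64 -> int32 allowed:', can_store)

# With matching dtypes the in-place add is allowed and the array wraps around.
arr = np.array([2**31 - 1], dtype=np.int32)
arr += np.array([1], dtype=np.int32)
print(arr, arr.dtype)
```

Here `int32` with `float32` promotes to `float64` because `float32` cannot hold every `int32` value exactly.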

## Numpy Underflow Errors (floating point)

Now that you have seen Overflow errors, let's look at what Underflow errors are.  Do you have a guess?  The following [Wiki](https://en.wikipedia.org/wiki/Arithmetic_underflow) provides a nice explanation of this type of error. There is more information on that page than you probably need to know, but be sure to read the first section. Then work on the exercise below.

### Exercise 4

For this exercise, be sure to comment the code, and provide print statements between the problems.  Make the output look nice. For the questions, please fill in your responses in the markdown section that follows.

* Part A
    0. Print the smallest possible numpy.float32 value. (This has been done for you.) [Extra-Info](https://en.wikipedia.org/wiki/Subnormal_number)
    1. Print the result of `mf32`$^2$. Do you believe that the value in A0 really is the smallest numpy.float32? Provide evidence that supports your answer (Hint: this is possible with one line of code.)
    2. Now create a numpy array, np_mf32, of size one with the value of mf32 in it. Print both the array and the numpy.dtype of the array. (no discussion needed)
    3. Now square the np_mf32 array.  Does this confirm your thoughts about A1?
    4. Create a list or numpy array of floats for $x$ such that $ x \in \left[0.5,0.9\right]$ by increments of 0.1 (there should be 5 elements). Now loop over the values of $x$ and multiply the np_mf32 array by x. Comment on which value or values cause an Underflow error.
* Part B: do all of the above in Part A with a numpy.float64 dtype
    5. And, answer this question related to A1 vs B1. Why were the results different when we squared mf32 in A1 vs squaring mf64 in B1? (Hint: what is the dtype of both values after squaring? It takes just two lines of code to see what was different between the two.)


In [7]:
# Part A

# A0
# This should be the smallest float32 possible, but ...
print('\n#A0')
mf32 = np.finfo(np.float32).smallest_subnormal 
print(f'val(mf32): {mf32}')


# A1 square mf32
print('\n#A1')
mf32_sq = mf32 ** 2
print(f'val(mf32^2), type(mf32^2): {mf32_sq}, {mf32_sq.dtype}')


# A2 mf32 in a numpy array (look at the Numpy int32 example in the previous exercise)
print('\n#A2')
np_mf32 = np.array([mf32], dtype=np.float32)
print(f'val(np_mf32), type(np_mf32): {np_mf32[0]}, {np_mf32.dtype}')


# A3 square the np_mf32 array 
print('\n#A3')
np_mf32_sq = np_mf32 ** 2
print(f'val(np_mf32^2), type(np_mf32^2): {np_mf32_sq[0]}, {np_mf32_sq.dtype}')


# A4 loop over x in [0.5:0.9:0.1] show x*np_mf32
print('\n#A4')
coeff = np.arange(0.5, 1.0, 0.1)
np_mf32_under = coeff.astype(np.float32) * np_mf32
print(f'val(x): {coeff}')
print(f'val(x*np_mf32): {np_mf32_under}, dtype(x*np_mf32): {np_mf32_under.dtype}')

print()

# Part B (same as above but for float64)

# B0 - B4: Your code
print('\n#B0')
mf64 = np.finfo(np.float64).smallest_subnormal 
print(f'val(mf64): {mf64}')

print('\n#B1')
mf64_sq = mf64 ** 2
print(f'val(mf64^2), type(mf64^2): {mf64_sq}, {mf64_sq.dtype}')

print('\n#B2')
np_mf64 = np.array([mf64], dtype=np.float64)
print(f'val(np_mf64), type(np_mf64): {np_mf64[0]}, {np_mf64.dtype}')

print('\n#B3')
np_mf64_sq = np_mf64 ** 2
print(f'val(np_mf64^2), type(np_mf64^2): {np_mf64_sq[0]}, {np_mf64_sq.dtype}')

print('\n#B4')
np_mf64_under = coeff.astype(np.float64) * np_mf64
print(f'val(x): {coeff}')
print(f'val(x*np_mf64): {np_mf64_under}, dtype(x*np_mf64): {np_mf64_under.dtype}')

# B5 The extra question: A1 vs B1
print('\n#B5')
print('See response below')


#A0
val(mf32): 1.401298464324817e-45

#A1
val(mf32^2), type(mf32^2): 1.9636373861190906e-90, float64

#A2
val(np_mf32), type(np_mf32): 1.401298464324817e-45, float32

#A3
val(np_mf32^2), type(np_mf32^2): 0.0, float32

#A4
val(x): [0.5 0.6 0.7 0.8 0.9]
val(x*np_mf32): [0.e+00 1.e-45 1.e-45 1.e-45 1.e-45], dtype(x*np_mf32): float32


#B0
val(mf64): 5e-324

#B1
val(mf64^2), type(mf64^2): 0.0, float64

#B2
val(np_mf64), type(np_mf64): 5e-324, float64

#B3
val(np_mf64^2), type(np_mf64^2): 0.0, float64

#B4
val(x): [0.5 0.6 0.7 0.8 0.9]
val(x*np_mf64): [0.e+000 5.e-324 5.e-324 5.e-324 5.e-324], dtype(x*np_mf64): float64

#B5
See response below


### Comment on Exercise 4

For **question A1**: The value in A0 is the smallest positive `np.float32` value. The nonzero result for `mf32^2` is reported with type `np.float64`, which shows that the square was not stored as `np.float32`; a value of about $2 \times 10^{-90}$ is far too small for `np.float32` to represent. This again shows that a Numpy variable is flexible in type.
<br><br>
For **question A3**: The result confirms that the value in A0 is the smallest positive `np.float32` value, since `mf32^2` is too small and becomes zero in the `np.float32` representation. This result also shows that a Numpy array is fixed in type.
<br><br>
For **question A4**: Only $x = 0.5$ underflows to 0. For $x \in (0.5, 1)$ the result is rounded to `mf32`.
<br><br>
For **questions B1 and B3**: The value in B0 is the smallest positive `np.float64` value. The computations with both the Numpy variable and the Numpy array confirm this.
<br><br>
For **question B4**: Only $x = 0.5$ underflows to 0. For $x \in (0.5, 1)$ the result is rounded to `mf64`.
<br><br>
For **question B5**: After squaring the variable, `mf32^2` has data type `np.float64`, so in A1 the printed result of `mf32^2` is not 0. However, `mf64^2` is still of data type `np.float64`, so in B1 the printed result of `mf64^2` is 0.
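The rounding behaviour described for A4 and B4 can be reproduced with scalars that are kept explicitly in `np.float32`. This sketch obtains the smallest positive subnormal with `np.nextafter` (an alternative to `np.finfo(...).smallest_subnormal`, chosen here only as an assumption-free cross-check) and then scales it by factors below one.

```python
import numpy as np

# Smallest positive float32: the next representable value after zero.
smallest32 = np.nextafter(np.float32(0), np.float32(1))
smallest64 = np.nextafter(0.0, 1.0)

# Smallest positive *normal* float32, for comparison.
tiny32 = np.finfo(np.float32).tiny

# Scaling while staying in float32: an exact half is a tie and goes to zero,
# anything strictly between one half and one rounds back up.
half = smallest32 * np.float32(0.5)
three_quarters = smallest32 * np.float32(0.75)

print('smallest subnormal float32:', smallest32)
print('smallest normal float32:   ', tiny32)
print('0.5  * smallest32 =', half, half.dtype)
print('0.75 * smallest32 =', three_quarters, three_quarters.dtype)
print('smallest subnormal float64:', smallest64)
```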

## Round-off/Truncation Errors (floating point)

Before starting this exercise, please read the [Round-off Errors](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.03-Roundoff-Errors.html) section in Chapter 9 of *Python Numerical Methods*.

### Exercise 5

For this exercise, we are going to examine [Truncation](http://nifty.stanford.edu/2003/pests/2002/lectures/07.1_FloatingPoint/truncation.htm?CurrentSlide=6) errors, which are related to Round-off errors. An example of when you might encounter truncation errors is when solving integrals (a sequence of summations) on a computer. Given that we mostly work with computers that have finite precision arithmetic, we must represent integrals as finite summations.

To demonstrate truncation errors we are going to sum all the integer values from $1$ to $N$, but we are going to represent each integer as a numpy.float32 type. In the cell below, I've written a function that finds the analytical solution to $\sum\limits_{i=1}^{N}i$, for a given $N$.
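The closed form behind the analytical solution is $\sum_{i=1}^{N} i = N(N+1)/2$. As a sketch, it can be evaluated with plain Python integers, which are exact for any $N$; the function name `gauss_sum` is hypothetical and separate from the `analytic_sum` function used in the lab code.

```python
def gauss_sum(n):
    """Exact value of 1 + 2 + ... + n using integer arithmetic only."""
    return n * (n + 1) // 2

# Cross-check the closed form against a brute-force sum for a few sizes.
for n in (1, 10, 1000):
    brute = sum(range(1, n + 1))
    print(n, gauss_sum(n), brute, gauss_sum(n) == brute)
```

Integer floor division is safe here because one of $N$ and $N+1$ is always even.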

Your job (write responses in the markdown cell below):
* First, create a Numpy array of size $N$ containing numpy.float32 values representing the integers from $1$ to $N$, and have the values stored in consecutive indexing order (i.e. arr[0] = 1.0, arr[1] = 2.0, ..., arr[N-1] = N.0).
* Next, you need to create a function that calculates and returns the sum of this array (make it fast, and feel free to use numpy functions). Be sure to verify that your sum is the same as the analytical sum. I suggest starting with an array of just 10 elements, so $N=10$.
* Once you have verified that your summation function is returning the correct results, you have two tasks:
    1. Find the value of $N_{inc}$ at which your summation is no longer correct. Discuss in the markdown cell below how you found this value. 
    2. Write an improved summation that can correctly sum all the values to find the answer of $\sum\limits_{i=1}^{2\cdot N_{inc}} i$ (the sum up to $2\cdot N_{inc}$). Discuss your reasoning about how you chose to solve this task.
    3. Special Case: If you are taking this course for 4-units, then write a recursive function recur_sum() that can solve problem 2. All values must remain as numpy.float32 values (no type-casting). Then test your function for $4\cdot N_{inc}$ (it should be accurate without any changes, but if it's not, that's ok).



In [14]:
import numpy as np

# function that returns the analytical sum of the numbers 1 through N
def analytic_sum(N):
    return np.array([N*(N+1)/2]).astype(np.int64)[0]


# your summation function that takes numpy array of type numpy.float32
def my_sum(myarr):
    return np.sum(myarr)


# your improved summation function 
def my_impr_sum(myarr):
    return np.sum(myarr.astype(np.float64))
    

# Special Case: recursive sum function here


# Test case: N = 10
print('Test case with N = 10')
N = 10
num_arr = np.arange(N+1, dtype=np.float32)
print(f'Analytic:  {analytic_sum(N)}')
print(f'Numpy sum: {my_sum(num_arr)}')

print()

# Find N_inc
N = 1
while True:
    sum_analytic = analytic_sum(N)
    num_arr = np.arange(N+1, dtype=np.float32)
    sum_numpy = my_sum(num_arr)
    
    if sum_analytic != sum_numpy.astype(np.int64):
        break
    N += 1
print('#1')
print(f'N_inc: {N}')

print()

# Improved sum using type-cast
print('#2')
N = 2 * N
print('Test improved sum on N = %d' %N)
num_arr = np.arange(N+1, dtype=np.float32)
print(f'Analytic:  {analytic_sum(N)}')
print(f'Numpy sum: {my_impr_sum(num_arr)}')

Test case with N = 10
Analytic:  55
Numpy sum: 55.0

#1
N_inc: 5793

#2
Test improved sum on N = 11586
Analytic:  67123491
Numpy sum: 67123491.0


### Comment on Exercise 5

For **question 1**: A simple search is performed by a loop over $N$, until the analytic and Numpy numeric results are different. I find $N_{inc} = 5793$.<br>

Note that for function `analytic_sum`, I change the data type to `np.int64` in order to provide accurate results even for very large $N$ (although 64 bits can represent far more significant digits than needed).<br>

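Inaccuracy arises when the sum needs more significant bits than the 24-bit significand of `np.float32` provides (23 stored fraction bits plus the implicit leading 1). Every integer up to $2^{24} = 16777216$ is exactly representable. For $N = 5792$ the sum is $16776528 < 2^{24}$, while for $N = N_{inc} = 5793$ we have $N(N+1)/2 = 16782321$, which is odd and larger than $2^{24}$ and so cannot be stored exactly. Therefore, $N_{inc} = 5793$ is reasonable.

For **question 2**: The simple solution is to type-cast the numbers into `np.float64`, which can represent numbers with more significant digits. The `np.float64` sum stays exact as long as $N(N+1)/2 \le 2^{53}$, i.e. for $N$ up to about $1.3\times10^8$. Because the array entries themselves are stored as `np.float32`, however, they are only exact for $N \le 2^{24} \approx 1.7\times10^7$, which is the tighter limit for this trick.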
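The limits quoted above can be illustrated with a few scalar operations. This is a minimal sketch with made-up variable names, showing where `np.float32` stops resolving consecutive integers and that `np.float64` still holds the first incorrect sum exactly.

```python
import numpy as np

# At 2**24 the spacing between float32 values becomes 2, so adding 1 is lost.
limit = np.float32(2**24)
stuck = limit + np.float32(1)
print('2**24 + 1 in float32:', stuck, '(unchanged)' if stuck == limit else '')

# The analytic sum for N = 5793 is odd and above 2**24.
target = 16782321
stored32 = np.float32(target)
stored64 = np.float64(target)
print('float32 stores', int(stored32), 'instead of', target)
print('float64 stores', int(stored64))

# Largest N whose sum N(N+1)/2 still fits below 2**53 (float64 exact range).
n_max64 = int((2 * 2**53) ** 0.5)
print('approximate float64 limit on N:', n_max64)
```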