# Homework 2 - Floating Point Numbers

by Michael Moen

## Exercise 1

Let $x, y, z \in \mathcal{F}$ be double-precision floating-point numbers. Then there exist $\delta_1$ and $\delta_2$, each bounded in magnitude by the machine epsilon $\epsilon$, such that

$$ x \oplus y = (x + y)(1 + \delta_1) $$

and

$$ x \oplus y \oplus z = ((x \oplus y) + z)(1 + \delta_2) $$

### Problem 1.1

Show that

$$ |x \oplus y \oplus z - (x + y + z)| \leq |\delta_1|(|x| + |y|) + |\delta_2|(|x| + |y| + |z|) + |\delta_1| |\delta_2| (|x| + |y|) $$

#### Solution

$$
\begin{align*}
    x \oplus y \oplus z &= ((x \oplus y) + z)(1 + \delta_2) \\
    x \oplus y \oplus z &= ((x + y)(1 + \delta_1) + z)(1 + \delta_2) \\
    x \oplus y \oplus z &= (x + y + z + x\delta_1 + y\delta_1)(1 + \delta_2) \\
    x \oplus y \oplus z &= x + y + z + x\delta_1 + y\delta_1 + x\delta_2 + y\delta_2 + z\delta_2 + x\delta_1\delta_2 + y\delta_1\delta_2 \\
    x \oplus y \oplus z - (x + y + z) &= x + y + z + x\delta_1 + y\delta_1 + x\delta_2 + y\delta_2 + z\delta_2 + x\delta_1\delta_2 + y\delta_1\delta_2 - (x + y + z) \\
    x \oplus y \oplus z - (x + y + z) &= x\delta_1 + y\delta_1 + x\delta_2 + y\delta_2 + z\delta_2 + x\delta_1\delta_2 + y\delta_1\delta_2 \\
    x \oplus y \oplus z - (x + y + z) &= \delta_1(x + y) + \delta_2(x + y + z) + \delta_1\delta_2(x + y) \\
    | x \oplus y \oplus z - (x + y + z) | &\leq |\delta_1|(|x| + |y|) + |\delta_2|(|x| + |y| + |z|) + |\delta_1||\delta_2|(|x| + |y|)
\end{align*}
$$
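The bound can also be checked numerically. The sketch below is an illustration rather than part of the assigned solution: the helper name `check_bound` is hypothetical, and it assumes the exact sums $x + y$ and $(x \oplus y) + z$ are nonzero so that $\delta_1$ and $\delta_2$ can be recovered exactly with `fractions.Fraction`.

```python
from fractions import Fraction


def check_bound(x: float, y: float, z: float) -> bool:
    """Return True if the Problem 1.1 bound holds for (x (+) y) (+) z."""
    s1 = x + y   # x (+) y, rounded by the hardware
    s2 = s1 + z  # (x (+) y) (+) z, rounded again

    # Fraction(float) is exact, so everything below is exact rational arithmetic.
    X, Y, Z = Fraction(x), Fraction(y), Fraction(z)

    # Recover the relative rounding errors (assumes both exact sums are nonzero).
    d1 = Fraction(s1) / (X + Y) - 1
    d2 = Fraction(s2) / (Fraction(s1) + Z) - 1

    lhs = abs(Fraction(s2) - (X + Y + Z))
    rhs = (abs(d1) * (abs(X) + abs(Y))
           + abs(d2) * (abs(X) + abs(Y) + abs(Z))
           + abs(d1) * abs(d2) * (abs(X) + abs(Y)))
    return lhs <= rhs


print(check_bound(0.1, 0.2, 0.3))
print(check_bound(1e16, -1.0, 3.14))
```

Because the identity behind the bound is exact, the check should succeed for any finite inputs that satisfy the nonzero-sum assumption, including inputs of mixed sign.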

### Problem 1.2

Suppose further that $x, y, z > 0$. Estimate the relative error of the arithmetic operations $x \oplus y \oplus z$ in terms of the machine epsilon. Hint: show that

$$ \frac{|x \oplus y \oplus z - (x + y + z)|}{|x + y + z|} \leq |\delta_1| + |\delta_2| + |\delta_1| |\delta_2| $$

and notice that $\delta_1 \delta_2 = \mathcal{O}(\epsilon^2) \approx 0$.

#### Solution

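$$
\begin{align*}
    | x \oplus y \oplus z - (x + y + z) | &\leq |\delta_1|(|x| + |y|) + |\delta_2|(|x| + |y| + |z|) + |\delta_1||\delta_2|(|x| + |y|) \\
    \frac{| x \oplus y \oplus z - (x + y + z) |}{| x + y + z |} &\leq \frac{|\delta_1|(|x| + |y|) + |\delta_2|(|x| + |y| + |z|) + |\delta_1||\delta_2|(|x| + |y|)}{| x + y + z |}
\end{align*}
$$

Since $x, y, z > 0$, we have $|x| + |y| = x + y \leq x + y + z$ and $|x| + |y| + |z| = x + y + z = |x + y + z|$, so each ratio on the right-hand side is at most $1$:

$$ \frac{| x \oplus y \oplus z - (x + y + z) |}{| x + y + z |} \leq |\delta_1| + |\delta_2| + |\delta_1||\delta_2| $$

Taking $|\delta_1|, |\delta_2| \leq \epsilon$, where $\epsilon$ is the machine epsilon, the relative error is at most $2\epsilon + \epsilon^2 \approx 2\epsilon$, because $\delta_1 \delta_2 = \mathcal{O}(\epsilon^2)$ is negligible.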
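As a sanity check on the relative-error estimate, the following sketch compares the exact relative error of `(x + y) + z` against $2\epsilon + \epsilon^2$ for random positive inputs. The helper `relative_error`, the sampling range, and the seed are illustrative choices, and $\epsilon$ is taken to be `sys.float_info.epsilon`.

```python
import random
import sys
from fractions import Fraction


def relative_error(x: float, y: float, z: float) -> Fraction:
    """Exact relative error of the floating-point sum (x + y) + z."""
    computed = Fraction((x + y) + z)
    exact = Fraction(x) + Fraction(y) + Fraction(z)
    return abs(computed - exact) / abs(exact)


eps = Fraction(sys.float_info.epsilon)
bound = 2 * eps + eps**2

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [tuple(rng.uniform(1e-3, 1e3) for _ in range(3)) for _ in range(1000)]
worst = max(relative_error(*s) for s in samples)

print(f"worst relative error: {float(worst):.3e}")
print(f"bound 2*eps + eps^2:  {float(bound):.3e}")
print(worst <= bound)
```

The observed worst case is typically well below the bound, since the bound assumes both roundings are as bad as possible at once.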

## Exercise 2

Consider the function $f(x) = \frac{\sqrt{x + 9} - 3}{x}$.

### Problem 2.1

Reformulate the function to avoid the catastrophic cancellation when $x \approx 0$.

#### Solution

$$
\begin{align*}
    f(x) &= \frac{\sqrt{x + 9} - 3}{x} \\
    f(x) &= \frac{(\sqrt{x + 9} - 3) (\sqrt{x + 9} + 3)}{x (\sqrt{x + 9} + 3)} \\
    f(x) &= \frac{x + 9 - 9}{x (\sqrt{x + 9} + 3)} \\
    f(x) &= \frac{1}{\sqrt{x + 9} + 3}
\end{align*}
$$
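To see why the reformulation matters, the sketch below evaluates both forms for progressively smaller $x$. The names `f_naive` and `f_stable` are illustrative and not part of the assignment.

```python
import math


def f_naive(x: float) -> float:
    """Original form: subtracts two nearly equal numbers when x is near 0."""
    return (math.sqrt(x + 9) - 3) / x


def f_stable(x: float) -> float:
    """Reformulated form: no subtraction, so no cancellation."""
    return 1 / (math.sqrt(x + 9) + 3)


for k in (4, 8, 12, 16):
    x = 10.0 ** -k
    print(f"x = 1e-{k:02d}  naive = {f_naive(x):.16f}  stable = {f_stable(x):.16f}")
```

At $x = 10^{-16}$ the sum $x + 9$ rounds to exactly $9$ in double precision, so the naive form returns $0$, while the reformulated version stays close to the limit $1/6$.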

### Problem 2.2

Write a Python function that evaluates the reformulated $f(x)$ at the following values of $x$:

$$ x = 0.1, x = 0.01, \dots, x = 10^{-8} $$

Hint: Use an f-string to print out the values of the function: `print(f"{fx:0.16f}")`

#### Solution

In [None]:
import math

def sqrt_function(x: float) -> float:
    """Calculate the output of the function f(x)

    Parameters
    ----------
    x : float
        the independent variable of the function
    
    Returns
    -------
    float
        the output of the function f(x)
    
    """
    return 1 / (math.sqrt(x + 9) + 3)

for i in range(1, 9):
    fx = sqrt_function(10**(-1 * i))
    print(f'{fx:0.16f}')

0.1662062579967122
0.1666203960726876
0.1666620372942208
0.1666662037062757
0.1666666203703961
0.1666666620370373
0.1666666662037037
0.1666666666203704


## Exercise 3

Consider the function

$$ f(x) = \frac{e^x - \cos(2x)}{x} $$

### Problem 3.1

Write the truncated Taylor series for $e^x - \cos(2x)$ with Lagrange remainder so that $f(x)$ can be calculated without loss of significance for $x$ close to $0$.

#### Solution

Let $g(x) = e^x - \cos(2x)$. Expanding about $0$ with the Lagrange form of the remainder, for some $c$ between $0$ and $x$:

$$
\begin{align*}
    g(x) &= g(0) + g'(0)x + \frac{g^{(2)}(0)}{2!}x^2 + \frac{g^{(3)}(0)}{3!}x^3 + \frac{g^{(4)}(0)}{4!}x^4 + R_4(x) \\
    &= e^0 - \cos(0) + (e^0 + 2\sin(0))x + \frac{e^0 + 4\cos(0)}{2}x^2 + \frac{e^0 - 8\sin(0)}{6}x^3 + \frac{e^0 - 16\cos(0)}{24}x^4 + R_4(x) \\
    &= 1 - 1 + (1 + 2(0))x + \frac{1 + 4(1)}{2}x^2 + \frac{1 - 8(0)}{6}x^3 + \frac{1 - 16(1)}{24}x^4 + R_4(x) \\
    &= x + \frac{5}{2}x^2 + \frac{1}{6}x^3 - \frac{5}{8}x^4 + R_4(x), \qquad R_4(x) = \frac{e^c + 32\sin(2c)}{5!}x^5
\end{align*}
$$

Dividing by $x$ removes the cancellation near $0$:

$$ f(x) = 1 + \frac{5}{2}x + \frac{1}{6}x^2 - \frac{5}{8}x^3 + \frac{R_4(x)}{x} $$
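The Maclaurin coefficients of $e^x - \cos(2x)$ can be generated mechanically, which is a convenient way to check the signs. The sketch below relies on the fact that the $k$-th derivative of $\cos(2x)$ at $0$ is $2^k \cos(k\pi/2)$; the function name `taylor_coefficients` is illustrative.

```python
from fractions import Fraction
from math import factorial


def taylor_coefficients(degree: int) -> list[Fraction]:
    """Maclaurin coefficients of e^x - cos(2x) up to x**degree."""
    cos_cycle = (1, 0, -1, 0)  # cos(k*pi/2) for k = 0, 1, 2, 3, then repeating
    return [
        # k-th derivative at 0 is 1 - 2^k * cos(k*pi/2); divide by k!
        Fraction(1 - 2**k * cos_cycle[k % 4], factorial(k))
        for k in range(degree + 1)
    ]


print([str(c) for c in taylor_coefficients(4)])
# → ['0', '1', '5/2', '1/6', '-5/8']
```

The coefficient of $x^4$ is negative because the fourth derivative at $0$ is $1 - 16 = -15$.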

### Problem 3.2

When $x = 0.001$, determine the number of terms in the series needed to have the absolute error less than $10^{-8}$.

#### Solution

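Write $g(x) = e^x - \cos(2x)$, so that $f(x) = g(x)/x$. The Lagrange remainder of the degree-$n$ Taylor polynomial of $g$ about $a = 0$ is, for some $c$ between $0$ and $x$,

$$
\begin{align*}
    R_n(x) &= \frac{g^{(n+1)}(c)}{(n+1)!} (x-a)^{n+1} \\
    R_n(x) &= \frac{g^{(n+1)}(c)}{(n+1)!} x^{n+1}
\end{align*}
$$

Now, we must bound $g^{(n+1)}(c)$. Every derivative of $e^x$ is $e^x$, and the $(n+1)$-th derivative of $\cos(2x)$ is $\pm 2^{n+1}$ times $\sin(2x)$ or $\cos(2x)$, so for $0 < c < x = 0.001$:

$$
\begin{align*}
    g(x) &= e^x - \cos(2x) \\
    |g^{(n+1)}(c)| &\leq e^c + 2^{n+1} \\
    |g^{(n+1)}(c)| &\leq e^{0.001} + 2^{n+1}
\end{align*}
$$

Interpreting the absolute error as the error in $f(x) = g(x)/x$, it equals $|R_n(x)|/x$:

$$ \frac{|R_n(x)|}{x} \leq \frac{e^{0.001} + 2^{n+1}}{(n+1)!} (0.001)^n $$

For $n = 2$ the bound is about $1.5 \times 10^{-6}$, which is too large; for $n = 3$ it is about $7.1 \times 10^{-10} < 10^{-8}$. So the Taylor polynomial of degree $3$ suffices, which gives three terms of the series for $f(x)$: $1 + \frac{5}{2}x + \frac{1}{6}x^2$.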
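The search for the smallest sufficient $n$ can be automated. The sketch below assumes $x > 0$, measures the error in $f(x)$ (the remainder of $e^x - \cos(2x)$ divided by $x$), and uses the crude derivative bound $e^x + 2^{n+1}$; the function name `terms_needed` is illustrative.

```python
import math


def terms_needed(x: float, tol: float) -> int:
    """Smallest degree n whose Lagrange bound on |R_n(x)| / x is below tol."""
    n = 1
    # Bound: (e^x + 2^(n+1)) / (n+1)! * x^n, valid for 0 < c < x.
    while (math.exp(x) + 2 ** (n + 1)) / math.factorial(n + 1) * x**n >= tol:
        n += 1
    return n


print(terms_needed(0.001, 1e-8))
# → 3
```

A tighter derivative bound would not change the answer here, because the bound drops by roughly three orders of magnitude with each additional term.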

### Problem 3.3

Write a Python function to calculate the value of $f(x)$ at $x = 0.001$ up to $8$ significant digits.

#### Solution

In [None]:
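from sympy import symbols, series, exp, cos

def exp_cos_function(x: float) -> float:
    """Calculate the output of the function f(x)

    Parameters
    ----------
    x : float
        the independent variable of the function
    
    Returns
    -------
    float
        the output of the function f(x)
    
    """
    # Expand symbolically in t first, then substitute the numeric value of x,
    # so the series is built from the symbolic expression rather than a number
    t = symbols('t')
    ft = (exp(t) - cos(2*t)) / t
    taylor_expansion = series(ft, t, 0, 5).removeO()

    return float(taylor_expansion.subs(t, x))

# Print the result to 8 significant digits
fx = exp_cos_function(0.001)
print(f'{fx:.8g}')

1.0025002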