# 1. Computation fundamentals

There are several sources of errors in computation: 

- **stability**, a property of the algorithm,
- **conditioning**, a property of the problem, and
- **roundoff**, a consequence of the representation of numbers we use.

## Floating-point arithmetic

Computers work in terms of floating-point numbers instead of reals $\mathbb{R}$. 

Let's say $\circ$ is an operator that rounds the real number $x \in \mathbb{R}$ to the nearest floating-point number $\hat{x}$:

$$ \hat{x} := \circ x. $$

The _absolute_ error induced by this representation is

$$ \Delta x = \circ x - x = \hat{x} - x, $$
and the _relative_ error, for $x \neq 0$, is

$$ \delta x = \frac{\Delta x}{x} = \frac{\hat{x} - x}{x}. $$
We can rearrange this as

$$ \hat{x} = (1 + \delta x)x. $$

The IEEE 754 standard (IEEE: Institute of Electrical and Electronics Engineers) _guarantees_ that, for $x$ within the normalized range and rounding to nearest,

$$ |\delta x| < \mu_M = \tfrac{1}{2}\varepsilon_M, $$
where $\varepsilon_M$ is **machine precision**.
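
The bound can be checked numerically. The following is a minimal sketch, assuming Python's `float` is an IEEE double (for which $\mu_M = 2^{-53}$) and using exact rational arithmetic from the standard-library `fractions` module so that the error itself is not contaminated by roundoff. The helper name `relative_rounding_error` and the sample values are illustrative choices, not part of the notes.

```python
from fractions import Fraction


def relative_rounding_error(x: Fraction) -> Fraction:
    """Exact relative error (x_hat - x) / x of storing the rational x as a double."""
    # float(x) rounds x to the nearest double; Fraction(...) recovers that double exactly.
    x_hat = Fraction(float(x))
    return (x_hat - x) / x


# Assumption: double precision, so mu_M = eps_M / 2 = 2**-53.
unit_roundoff = Fraction(1, 2**53)

for x in (Fraction(1, 10), Fraction(1, 3), Fraction(2, 7)):
    delta = relative_rounding_error(x)
    print(f"x = {x}: delta_x = {float(delta):.3e}, "
          f"within bound: {abs(delta) < unit_roundoff}")
```

Numbers that are exactly representable, such as $1/2$, give $\delta x = 0$.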

```{admonition} Question
Where does $\varepsilon_M$ come from, and what is it in single and double precision arithmetic?
```

Numbers in floating-point arithmetic are represented as

$$ \pm (1 + f) \cdot 2^n, $$ 
where $n$ is the **exponent** and $(1+f)$ is the **mantissa**, with $0 \le f < 1$.

A fixed number of bits $d$, called the binary **precision**, is used to store $f$ in base 2:

$$ f = \sum_{i = 1}^d b_i 2^{-i}, \quad b_i \in \{0, 1\}. $$
It's useful to pull out a constant $2^{-d}$ to get

$$ f = 2^{-d}\sum_{i = 1}^{d} b_i 2^{d - i} = 2^{-d}z, $$
where $z$ is now an integer.
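
To make the representation concrete, here is a small sketch that recovers $n$, $f$, and $z$ from a Python `float`. It assumes double precision with $d = 52$ and a nonzero, normalized input; the function name `decompose` is hypothetical, and `math.frexp` is used only as a convenient way to split off the exponent.

```python
import math

D = 52  # assumption: number of bits used to store f in double precision


def decompose(x: float):
    """Return (n, f, z) with |x| = (1 + f) * 2**n and f = z * 2**-D (nonzero, normal x)."""
    m, e = math.frexp(abs(x))  # |x| = m * 2**e with 0.5 <= m < 1
    n = e - 1                  # shift the exponent so the mantissa lies in [1, 2)
    f = 2 * m - 1              # fractional part, 0 <= f < 1 (computed exactly)
    z = int(f * 2**D)          # f is a multiple of 2**-D, so z is an exact integer
    return n, f, z


n, f, z = decompose(0.1)
print(f"n = {n}, f = {f}, z = {z}")

# Reassembling the pieces reproduces the stored number exactly.
print((1 + z * 2.0**-D) * 2.0**n == 0.1)
```

Note that the decomposition describes the stored number $\hat{x}$, not the real number $0.1$ that was typed in.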

```{admonition} Question
What values can $z$ take? How many numbers are there between $2^n$ and $2^{n+1}$? What does that say about the relationship between $\varepsilon_M$ and $d$?
```

$z$ takes values in $\{0, 1, 2, \ldots, 2^{d} - 1\}$, so there are $2^d$ evenly spaced floating-point numbers in $[2^n, 2^{n+1})$, with spacing $2^{n-d}$. Relative to $2^n$ this spacing is $2^{-d}$, from which we can read off that

$$ \varepsilon_M = 2^{-d}. $$
In single precision, $d = 23$ bits are used to store $f$, therefore $\varepsilon_M = 2^{-23} \approx 1.2\cdot 10^{-7}$, and from $\mu_M \approx 6\cdot 10^{-8}$ we expect each number to be accurate to around 7 significant digits. For double precision, $d = 52$, so $\varepsilon_M = 2^{-52} \approx 2.2\cdot 10^{-16}$ and numbers can be trusted to about 16 significant digits.
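
These values can be probed directly. The sketch below uses only the Python standard library and assumes that `float` is an IEEE double; single precision is emulated by a round trip through `struct`, and the helper name `to_single` is hypothetical. `math.ulp` requires Python 3.9 or later.

```python
import math
import struct
import sys


def to_single(x: float) -> float:
    """Round a double to the nearest single-precision value (returned as a double)."""
    return struct.unpack("f", struct.pack("f", x))[0]


eps_single = 2.0**-23
eps_double = 2.0**-52

# Double precision: eps_M is the gap between 1 and the next floating-point number.
print(sys.float_info.epsilon == eps_double)
print(1.0 + eps_double > 1.0)
print(1.0 + eps_double / 4 == 1.0)  # a perturbation well below the gap is rounded away

# Single precision, emulated by rounding the double result to 32 bits.
print(to_single(1.0 + eps_single) > 1.0)
print(to_single(1.0 + eps_single / 4) == 1.0)

# The spacing of doubles in [2**n, 2**(n+1)) is eps_M * 2**n.
for n in (-3, 0, 10):
    print(n, math.ulp(2.0**n) == eps_double * 2.0**n)

# Rough number of trustworthy decimal digits, -log10(mu_M) with mu_M = eps_M / 2.
for name, eps in (("single", eps_single), ("double", eps_double)):
    print(f"{name}: eps_M = {eps:.3e}, digits = {-math.log10(eps / 2):.1f}")
```

The perturbation `eps / 4` is used instead of `eps / 2` to stay clear of the tie case, where the result depends on the round-to-even rule.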

## Condition number