(uv:mt:prob_spaces)=
# Probability Spaces

## What is wrong with traditional probability courses?

Since you're reading this book, I'm assuming you've thought about probability at some point in your life.

When you think of probability, one of the first things that might come to mind is a coin flip. The coin lands on heads with probability $\frac{1}{2}$, and tails with probability $\frac{1}{2}$. The reasoning you have implicitly made here is that a coin can land on either heads or tails, and both of these outcomes are equally likely, so it makes sense to say that each has a probability of $\frac{1}{2}$. When there's a finite number of possible outcomes, this is pretty easy to understand. When we are dealing with random quantities that have only finitely many (or, more generally, countably many) possible outcomes, we call these quantities *discrete*.

Let's say, on the other hand, that I'm throwing a ball. What is the probability that I throw the ball $30$ meters? $40$ meters? $50$ meters? $30.05$ meters? $30.0001$ meters? If we were to assign positive probability mass to particular distances, we'd quickly run into a problem: there are uncountably many different distances I could possibly throw the ball! Stated another way, maybe it's certain that I would throw the ball between $30$ and $50$ meters (I did play baseball and waterpolo for the better part of a decade), but there are uncountably many points on the real line between $30$ and $50$. Let's say we chose to ascribe probabilities every $.0001$ meters, which would result in $\frac{20}{.0001} = 200000$ possible locations I could throw the ball. Suppose further that I hit each of these distances with equal probability, so that for any gradation of $.0001$ meters between $30$ and $50$, I throw the ball that particular distance with probability $\frac{1}{200000} = 0.000005$. So, the probability of me throwing the ball $30.0001$ meters is $0.000005$. But now we've got a new problem: what if I threw the ball $30.00011$ meters? Since I already "allocated" all of the probability to my $.0001$ meter gradations, this distance has a probability of $0$ that I throw the ball to it. This is puzzling and counter-intuitive, and it seems I'd need a system for the ball throw that breaks completely with how I ascribe probabilities to the coin flip.
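
The arithmetic in this thought experiment can be checked directly. The following is a minimal sketch, assuming the grid described above (throws between $30$ and $50$ meters, mass only on points spaced $.0001$ meters apart, and the text's count of $200000$ locations); the names `on_grid` and `grid_probability` are made up for illustration.

```python
from fractions import Fraction

# Assumed setup from the text: throws land between 30 and 50 meters, and
# probability mass is placed only on grid points spaced 0.0001 meters apart.
low, high = Fraction(30), Fraction(50)
step = Fraction(1, 10000)

num_points = (high - low) / step   # the text's count of possible locations
prob_each = 1 / num_points         # equal probability for each location


def on_grid(distance):
    """True if `distance` (given as a decimal string) is one of the grid points."""
    return (Fraction(distance) - low) % step == 0


def grid_probability(distance):
    """Probability the discretized model assigns to an exact distance."""
    return prob_each if on_grid(distance) else Fraction(0)


print(grid_probability("30.0001"))    # a grid point
print(grid_probability("30.00011"))   # falls between grid points
```

Exact fractions are used instead of floats so that the "is this distance on the grid?" check is not affected by rounding.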

The problem is that, in this context, the unit that we want to ascribe probabilities to, the distance of the throw, is *continuous*. We could certainly work around this: we could invent a probability system that works when the outcomes are finite (like the coin flips), and we could invent a totally different probability system for when the outcomes are continuous (like throwing the ball). However, this is unsatisfying: wouldn't it be nice if you could conceptualize any probabilistic system with the same intuition? This is where *measure theory*, and in particular, *probability spaces*, will come into play. Having a unified set of rules, and a unified set of *intuitions*, is what makes probability theory so powerful for both theorists and application specialists alike. Probability spaces will serve as the building blocks upon which we can ascribe unified sets of rules for random quantities.

## Establishing units upon which we can understand probabilities

### $\sigma$-algebras

To begin to resolve this seeming inconsistency between how we handle discrete outcomes (like coin flips) and continuous outcomes (like throwing a ball), we need to create a common system on which we can ascribe probabilities, one that works whether the units are discrete or continuous. The first ingredient for this is called a $\sigma$-algebra. A $\sigma$-algebra is defined on a set $\Omega$, which can be anything (such as the real line).

````{prf:definition} $\sigma$-algebra
:label: uv:mt:prob_spaces:sig
Suppose that $\Omega$ is a set. $\mathcal F$ is called a $\sigma$-algebra on $\Omega$ if it is a non-empty collection of *events* $F \in \mathcal F$, s.t. $F \subseteq \Omega$, where:
1. Contains the event space: $\Omega \in \mathcal F$,
2. Closure under complement: If $F \in \mathcal F$, then $F^c \in \mathcal F$,
3. Closure under countable unions: If $(F_i)_{i \in \mathbb N} \subseteq \mathcal F$ is a countable sequence of events, then $\bigcup_{i \in \mathbb N}F_i \in \mathcal F$.
````
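
When $\Omega$ is finite, the three properties can be checked by brute force. The sketch below is only an illustration under that assumption (the helper name `is_sigma_algebra` is made up here); on a finite family, closure under pairwise unions already gives closure under countable unions, because any countable sequence of events only contains finitely many distinct sets.

```python
from itertools import combinations


def is_sigma_algebra(omega, family):
    """Check the three defining properties for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(f) for f in family}
    # 1. Contains the event space.
    if omega not in family:
        return False
    # 2. Closed under complement.
    if any(omega - f not in family for f in family):
        return False
    # 3. Closed under unions (pairwise suffices for a finite family).
    return all(a | b in family for a, b in combinations(family, 2))


omega = {1, 2, 3, 4}
trivial = [set(), omega]                   # the smallest sigma-algebra on omega
coarse = [set(), {1, 2}, {3, 4}, omega]    # lumps outcomes into two blocks
not_closed = [set(), {1}, omega]           # missing the complement {2, 3, 4}

print(is_sigma_algebra(omega, trivial))
print(is_sigma_algebra(omega, coarse))
print(is_sigma_algebra(omega, not_closed))
```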

You'll notice that we called the elements of $\mathcal F$ (each of which is a subset of $\Omega$) "events": the reason for this will be apparent before we get to the end of this section, so just bear with us for a second. We'll often call this set $\Omega$ the "event space", because it is the set in which our "events" $F$ live ($F \subseteq \Omega$). $\mathcal F$ collects these events in a very particular way. It contains the entire "event space" $\Omega$, it is closed under complements, and if a countable collection of events is in $\mathcal F$, then the union of these events is also in $\mathcal F$.

### Properties of $\sigma$-algebras

The idea of a $\sigma$-algebra is very simple to conceptualize; a set which contains sets which obey these $3$ rules we described above. However, as we will see here, these $3$ laws imply that $\sigma$-algebras also follow many other critical properties. Together, the properties about $\sigma$-algebras (and tools that we can define on them, such as *measures*, which we will learn about later), will allow us to define many higher-order operations like integration.

If we have a pair of $\sigma$-algebras (each of which is a collection of subsets of the event space), and we take the subsets of the event space that are in their intersection, what is the result? It turns out that the result is also a $\sigma$-algebra:

````{prf:theorem} $\sigma$-algebras are closed under intersections
:label: uv:mt:prob_spaces:sig:intersection

Let $\{\mathcal F_i\}_{i \in \mathcal I}$ be a collection of $\sigma$-algebras on $\Omega$, where $\mathcal I \neq \varnothing$ is an arbitrary indexing set (which may be countable or uncountable). Then $\bigcap_{i \in \mathcal I} \mathcal F_i$ is a $\sigma$-algebra.
````

When proving things about $\sigma$-algebras, you always need to show exactly the properties in the definition. This means that every proof where we seek to show something is a $\sigma$-algebra is going to have $3$ parts, one for each delineation in the definition:
````{prf:proof}
Denote $\mathcal F = \bigcap_{i \in \mathcal I} \mathcal F_i$. To see that it is a $\sigma$-algebra, note:

1\. Contains $\Omega$: Note that by definition, since $\mathcal F_i$ are $\sigma$-algebras, then $\Omega \in \mathcal F_i$, for all $i \in \mathcal I$.

Then since $\Omega \in \mathcal F_i$ for all $i \in \mathcal I$, $\Omega \in \mathcal F$.

2\. Closed under complement: Let $A_j \in \mathcal F$.

Then by definition of the intersection, $A_j \in \mathcal F_i$, for all $i \in \mathcal I$.

Then since each $\mathcal F_i$ is a $\sigma$-algebra, $A_j^c \in \mathcal F_i$ for all $i \in \mathcal I$, since each $\mathcal F_i$ is closed under complement.

Then since $A_j^c \in \mathcal F_i$ for all $i \in \mathcal I$, $A_j^c \in \mathcal F$, by definition of the intersection.

3\. Closed under countable unions: Let $\{A_n\}_{n \in \mathbb N} \subseteq \mathcal F$ be a countable set of elements of $\mathcal F$.

Then by definition of the intersection, $\{A_n\}_{n \in \mathbb N} \subseteq \mathcal F_i$, for all $i \in \mathcal I$.

Then since $\mathcal F_i$ is a $\sigma$-algebra and hence closed under countable unions, $\bigcup_{n \in \mathbb N}A_n \in \mathcal F_i$ for all $i \in \mathcal I$.

Then $\bigcup_{n \in \mathbb N}A_n \in \mathcal F$, by definition of the intersection.
````
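
A small finite example makes the theorem concrete, and also shows that the analogous statement for *unions* of $\sigma$-algebras can fail. This is a sketch on an assumed three-point event space; `is_sigma_algebra` is an illustrative brute-force checker that is only valid for finite $\Omega$.

```python
from itertools import combinations


def is_sigma_algebra(omega, family):
    """Brute-force check of the three defining properties (finite omega only)."""
    omega = frozenset(omega)
    family = {frozenset(f) for f in family}
    if omega not in family:
        return False
    if any(omega - f not in family for f in family):
        return False
    return all(a | b in family for a, b in combinations(family, 2))


omega = frozenset({1, 2, 3})
F1 = {frozenset(), frozenset({1}), frozenset({2, 3}), omega}
F2 = {frozenset(), frozenset({2}), frozenset({1, 3}), omega}

intersection = F1 & F2   # only the empty set and omega survive
union = F1 | F2          # contains {1} and {2}, but not {1, 2}

print(is_sigma_algebra(omega, intersection))
print(is_sigma_algebra(omega, union))
```

The union fails because $\{1\}$ and $\{2\}$ are both present while $\{1\} \cup \{2\} = \{1, 2\}$ is not.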

If we have an event space $\Omega$, and a collection of sets that we are interested in $\mathcal A$, can we make a $\sigma$-algebra from the collection of sets we are interested in? The answer is yes, and it is called the $\sigma$-generated algebra:

````{prf:definition} $\sigma$-generated algebra
:label: uv:mt:prob_spaces:sig:siggen
Let $\Omega$ be an event space, and let $\mathcal A$ be a collection of sets, where $A \in \mathcal A \Rightarrow A \subseteq \Omega$. Then the smallest $\sigma$-algebra containing $\mathcal A$ is denoted $\sigma(\mathcal A)$, and is defined:
```{math}
    \sigma(\mathcal A) = \bigcap_{\mathcal F_i \in \left\{\mathcal F_i : \substack{\mathcal F_i \textrm{ is a $\sigma$-algebra } \\ \textrm{on $\Omega$ and $\mathcal A \subseteq \mathcal F_i$}}\right\}}\mathcal F_i
```
````
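Notice that by {prf:ref}`uv:mt:prob_spaces:sig:intersection`, $\sigma(\mathcal A)$ is also a $\sigma$-algebra. The intersection is taken over a non-empty collection, since the family of all subsets of $\Omega$ is always a $\sigma$-algebra containing $\mathcal A$.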
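
For a finite event space, $\sigma(\mathcal A)$ can be computed without enumerating every $\sigma$-algebra containing $\mathcal A$: repeatedly add complements and unions until nothing new appears. The sketch below takes that alternative route (it is a brute-force closure, not the intersection from the definition, and it assumes $\Omega$ is finite); the function name `generate_sigma_algebra` is made up for illustration.

```python
def generate_sigma_algebra(omega, generators):
    """Smallest family containing `generators` that is closed under
    complement and union, for a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - f for f in family}                 # complements
        new |= {a | b for a in family for b in family}    # pairwise unions
        if new <= family:                                 # fixed point reached
            return family
        family |= new


small = generate_sigma_algebra({1, 2, 3}, [{1}])
larger = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])

print(sorted(sorted(s) for s in small))
print(len(larger))
```

In the finite case the result coincides with the definition, because every $\sigma$-algebra containing the generators must contain each set this procedure adds.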

We'll often want to consider $\sigma$-algebras generated by a particular type of sets, called *partitions*:

````{prf:definition} Partition
:label: uv:mt:prob_spaces:sig:partition
Let $\Omega$ be an event space. A family of sets $\mathcal A$, where each $A \in \mathcal A$ satisfies $A \subseteq \Omega$, is called a partition of $\Omega$ if:
1. It does not contain the empty set: $\varnothing \not\in \mathcal A$,
2. The sets are disjoint: If $\Omega_i, \Omega_j \in \mathcal A$ and $\Omega_i \neq \Omega_j$, then $\Omega_i \cap \Omega_j = \varnothing$, and
3. The sets in $\mathcal A$ exhaust $\Omega$: $\bigsqcup_{\Omega_n \in \mathcal A}\Omega_n = \Omega$.
````
Can we describe the $\sigma$-algebra generated by these partitions easily? It turns out, we can:

````{prf:lemma} $\sigma$-algebra generated by a partition
:label: uv:mt:prob_spaces:sig:siggenpart
Let $(\Omega, \mathcal F)$ be a measurable space, where $\{\Omega_i\}_{i \in \mathcal I} \subseteq \mathcal F$ is a partition of $\Omega$ and the indexing set $\mathcal I$ is countable. Then:
```{math}
    \sigma\left(\Omega_i : i \in \mathcal I\right) \equiv \mathcal G \triangleq \left\{\bigsqcup_{j \in \mathcal J : \mathcal J \subseteq \mathcal I}\Omega_j\right\}
```
````
````{prf:proof}
To start, we need to show that $\mathcal G$ is a $\sigma$-algebra. We can do this in $3$ steps:
1\. Contains the event space: Let $\mathcal J = \mathcal I$, so $\bigsqcup_{i \in \mathcal I}\Omega_i \in \mathcal G$.

But since $\{\Omega_i\}_{i \in \mathcal I}$ is a partition of $\Omega$, $\bigsqcup_{i \in \mathcal I}\Omega_i = \Omega \in \mathcal G$.

2\. Closed under complements: Suppose that $A \in \mathcal G$.

Then by construction, there exists $\mathcal J \subseteq \mathcal I$ s.t. $A = \bigsqcup_{j \in \mathcal J}\Omega_j$.

Take $\mathcal T = \mathcal I \setminus \mathcal J$. By construction, $\mathcal T \subseteq \mathcal I$, and further, $\mathcal T \cap \mathcal J = \varnothing$.

Define $B = \bigsqcup_{j \in \mathcal T}\Omega_j$. Since $\mathcal T \subseteq \mathcal I$, then $B \in \mathcal G$.

Notice that $A \cap B = \varnothing$, and $A \cup B = \Omega$, so $B = \Omega \setminus A$, and $B \equiv A^c$.

Then $A^c \in \mathcal G$.

3\. Closed under countable unions: Suppose that $A_n \in \mathcal G$, for $n \in \mathbb N$. 

Then there exists $\mathcal J_n \subseteq \mathcal I$, s.t. $A_n = \bigsqcup_{j \in \mathcal J_n}\Omega_j$, by construction.

Further, note that $\mathcal J' = \bigcup_{n \in \mathbb N}\mathcal J_n \subseteq \mathcal I$.

Then $A = \bigcup_{n \in \mathbb N}A_n = \bigsqcup_{j \in \mathcal J'}\Omega_j$ is a union of partition elements with indexing set $\mathcal J' \subseteq \mathcal I$, so $A \in \mathcal G$.

To see that $\sigma(\Omega_i : i \in \mathcal I)$ is equivalent to $\mathcal G$, we can do this in parts:

1\. $\mathcal G \supseteq \sigma(\Omega_i : i \in \mathcal I)$: Note that $\mathcal G$ is a $\sigma$-algebra containing each $\Omega_i$, for all $i \in \mathcal I$, by taking $\mathcal J_i = \{i\}$.

Then $\sigma(\Omega_i : i \in \mathcal I) \subseteq \mathcal G$, since $\sigma(\Omega_i : i \in \mathcal I)$ is the intersection of $\sigma$-algebras containing $\Omega_i$s by definition {prf:ref}`uv:mt:prob_spaces:sig:siggen`, and $\mathcal G$ is one such $\sigma$-algebra.

2\. $\mathcal G \subseteq \sigma(\Omega_i : i \in \mathcal I)$: Suppose that $A \in \mathcal G$.

Then by construction, there exists $\mathcal J \subseteq \mathcal I$ s.t. $A = \bigcup_{j \in \mathcal J}\Omega_j$.

This step requires $\mathcal J$ to be countable, which holds whenever the indexing set $\mathcal I$ of the partition is countable. In that case, since $\sigma(\Omega_i : i \in \mathcal I)$ is a $\sigma$-algebra containing $\{\Omega_i : i \in \mathcal I\}$, it must contain all of the countable unions of elements of $\{\Omega_i : i \in \mathcal I\}$, of which $A$ is one.

Then $A \in \sigma(\Omega_i : i \in \mathcal I)$. 
````
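
To see the lemma in a small case, the sketch below builds $\mathcal G$ for an assumed three-block partition of a five-point event space by taking the union over every sub-collection of blocks. With $k$ blocks there are $2^k$ such unions, and the resulting family contains $\Omega$ and is closed under complements and unions. The helper name `unions_of_blocks` is illustrative.

```python
from itertools import chain, combinations


def unions_of_blocks(blocks):
    """All sets formed as the union of some sub-collection of the blocks."""
    blocks = [frozenset(b) for b in blocks]
    subcollections = chain.from_iterable(
        combinations(blocks, r) for r in range(len(blocks) + 1)
    )
    # The empty sub-collection contributes the empty set.
    return {frozenset().union(*chosen) for chosen in subcollections}


omega = frozenset({1, 2, 3, 4, 5})
partition = [{1, 2}, {3}, {4, 5}]
G = unions_of_blocks(partition)

print(len(G))                                   # 2 ** 3 unions of blocks
print(all(omega - a in G for a in G))           # closed under complement
print(all(a | b in G for a in G for b in G))    # closed under union
```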

Another interesting property is that $\sigma$-generated algebras preserve the subset operation. While this might seem rather intuitive, it isn't *immediately* obvious, but this lemma will be extremely helpful later on:

````{prf:lemma} Generated algebras preserve subsets
Let $\Omega$ be an event space, and let $\mathcal C, \mathcal F$ be families of sets on $\Omega$ (for every $C \in \mathcal C$ and $F \in \mathcal F$, $C, F \subseteq \Omega$). If $\mathcal C \subseteq \mathcal F$, then $\sigma(\mathcal C) \subseteq \sigma(\mathcal F)$.
````
````{prf:proof}
By definition, $\sigma(\mathcal F)$ is the $\sigma$-algebra at the intersection of all $\sigma$-algebras containing $\mathcal F$.

Since $\mathcal C \subseteq \mathcal F$, then $\mathcal C \subseteq \sigma(\mathcal F)$, since $\sigma(\mathcal F)$ is the intersection of all $\sigma$-algebras containing $\mathcal F$ (and consequently, also contain $\mathcal C$) by definition of $\sigma(\mathcal F)$ {prf:ref}`uv:mt:prob_spaces:sig:siggen`.

So $\sigma(\mathcal F)$ is itself a $\sigma$-algebra containing $\mathcal C$ (that is, $\mathcal C \subseteq \sigma(\mathcal F)$), which makes it one of the $\sigma$-algebras whose intersection defines $\sigma(\mathcal C)$.

Since the intersection of $\sigma(\mathcal F)$ with other sets is necessarily a subset of $\sigma(\mathcal F)$, then $\sigma(\mathcal C) \subseteq \sigma(\mathcal F)$.
````

### Borel $\sigma$-algebras

Next, we have one of the most important $\sigma$-algebras that we will use repeatedly throughout this book when looking at probabilities: the Borel $\sigma$-algebra. The Borel $\sigma$-algebra will allow us to build the events we care about on the real line (or on an interval of it) out of manageable open sets, such as open intervals. When we begin to define probability measures, we will generally define them using Borel $\sigma$-algebras.

````{prf:definition} Borel $\sigma$-algebra
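Suppose that $\Omega$ is a set equipped with a notion of open sets (a topological space, such as $\mathbb R$ with its usual open sets). The Borel $\sigma$-algebra is defined as: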
```{math}
\mathcal B(\Omega) \triangleq \sigma\left(\left\{A \subseteq \Omega : A \text{ is open}\right\}\right)
```
````
So, the Borel $\sigma$-algebra is the $\sigma$-algebra generated by the family of sets containing all open sets in $\Omega$.

There are a variety of special Borel $\sigma$-algebras that we will deal with in this book, for which we'll use special notation to simplify things. They are:

````{prf:remark} Special Borel $\sigma$-algebras
1. If $\Omega = \mathbb R$, then $\mathcal R \triangleq \mathcal B(\mathbb R)$,
2. If $\Omega = \mathbb R^d$, then $\mathcal R^d \triangleq \mathcal B(\mathbb R^d)$,
3. If $\Omega = [\alpha, \beta]$ is a closed interval s.t. $\Omega \subseteq \mathbb R$, then $\mathcal R[\alpha, \beta] \triangleq \mathcal B([\alpha, \beta])$,
4. If $\Omega = (\alpha, \beta)$ is an open interval s.t. $\Omega \subseteq \mathbb R$, then $\mathcal R(\alpha, \beta) \triangleq \mathcal B((\alpha, \beta))$.
````


## Measures
 
The main reason that $\sigma$-algebras are powerful for our field of study is that they are special mathematical objects to which we can assign something called a *measure*. For this reason, an event space together with a $\sigma$-algebra defined on it is called a measurable space:

````{prf:definition} Measurable Space $(\Omega, \mathcal F)$
:label: uv:mt:prob_spaces:measbl_sp
The tuple $(\Omega, \mathcal F)$ is called a measurable space, if:
1. $\Omega$ is a set,
2. $\mathcal F$ is a $\sigma$-algebra on $\Omega$.
````

Now that we have measurable spaces, we can begin to think about things called *measures*. A measure, in effect, is a function which allows us to formalize the concept of *size*:

````{prf:definition} Measure $\mu$
:label: uv:mt:prob_spaces:meas
Let $(\Omega, \mathcal F)$ be a measurable space. A measure $\mu : \mathcal F \rightarrow \bar{\mathbb R}_{\geq 0}$ is a non-negative countably additive set function, where:
1. Measure of empty set is zero: $\mu(\varnothing) = 0$,
2. Non-negative: For any $F \in \mathcal F$, $\mu(F) \geq 0$,
3. Countably additive: If $\{F_n\}_{n \in \mathbb N} \subseteq \mathcal F$ is a countable sequence of pairwise disjoint events, then:
```{math}
    \mu\left(\bigcup_{n \in \mathbb N}F_n\right) = \sum_{n \in \mathbb N}\mu(F_n)
```
````

By the sets being disjoint, what we mean is that for all $i, j \in \mathbb N$ with $i \neq j$, $F_i \cap F_j = \varnothing$. When the sets are disjoint, we use the square union cup $\sqcup$ instead of the union cup $\cup$ to emphasize explicitly that the sets we are taking a union of are disjoint. The space $\bar{\mathbb R}$ is called the *extended real numbers*, which just means that it includes $\infty$ and $-\infty$. The subscript $\geq 0$ just indicates that we keep only the non-negative part. Written another way, $\bar{\mathbb R}_{\geq 0} = [0, \infty]$. The concepts of a measurable space and a measure are united with the concept of a measure space:
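
As a concrete finite illustration (a sketch, not a general construction), one simple way to build a measure is to assign each point a non-negative weight and define the measure of an event as the sum of the weights of its points. The weights below and the name `mu` are assumptions chosen for the example.

```python
from fractions import Fraction

# Hypothetical non-negative weights on a three-point event space.
weights = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(0)}


def mu(event):
    """Measure of an event: the sum of the weights of the points it contains."""
    return sum((weights[x] for x in event), Fraction(0))


A, B = {"a"}, {"b", "c"}   # disjoint events

print(mu(set()))                    # measure of the empty set
print(mu(A | B), mu(A) + mu(B))     # additivity on disjoint events
```

Note that $\mu(\{c\}) = 0$ even though $\{c\}$ is non-empty: a measure must give the empty set size zero, but other sets may have size zero as well.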

````{prf:definition} Measure Space $(\Omega, \mathcal F, \mu)$
:label: uv:mt:prob_spaces:meas_sp
The triple $(\Omega, \mathcal F, \mu)$ is called a measure space if:
1. $\Omega$ is a set,
2. $\mathcal F$ is a $\sigma$-algebra on $\Omega$,
3. $\mu : \mathcal F \rightarrow \bar{\mathbb R}_{\geq 0}$ is a measure on $(\Omega, \mathcal F)$.
````

### Properties of measures

In this book, we will often use (and abuse) several basic properties of measures. We'll go through some of these now. To start off, we have the monotonicity of measures:

````{prf:theorem} Monotonicity of measures
:label: uv:mt:prob_spaces:meas:monotone
Let $(\Omega, \mathcal F, \mu)$ be a measure space. Then if $A \subseteq B$ and $A, B \in \mathcal F$, $\mu(A) \leq \mu(B)$.
````

````{prf:proof}
Let $A, B \in \mathcal F$, where $A \subseteq B$. 

Recall that $B \setminus A = B \cap A^c$. This represents the elements of $B$ that are not in $A$, so we could alternatively express $B = A \cup (B \setminus A)$. 

As $B \setminus A$ is disjoint from $A$:
```{math}
    \mu(B) &= \mu\left(A \sqcup (B \setminus A)\right) \\
    &= \mu(A) + \mu(B \setminus A),\,\,\,\,\mu\text{ is countably additive} \\
    &\geq \mu(A),\,\,\,\,\mu(B \setminus A) \geq 0\text{ by definition of a measure}
```
as desired.
````

Intuitively, what this statement asserts is that a set which contains another set must have a measure at least as large as the set it contains. This concept extends to the case when $A$ is a subset of a countable union of sets as well, and is called subadditivity:

````{prf:theorem} Subadditivity of measures
:label: uv:mt:prob_spaces:meas:subadd
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $A \subseteq \bigcup_{n \in \mathbb N}A_n$ where $A, A_n \in \mathcal F$ for all $n \in \mathbb N$, then:
```{math}
    \mu(A) \leq \sum_{n \in \mathbb N}\mu(A_n).
```
````
````{prf:proof}
Let $A_n' = A_n \cap A$, for all $n \in \mathbb N$. Like above, $A_n'$ are the elements of $A_n$ that are also in $A$.

Define $F_1 = A_1'$, and let $F_n = A_n' \setminus \bigcup_{m = 1}^{n - 1}A_m'$ for all $n > 1$. $F_n$ represents the elements of $A_n$ that are in $A$, but are not in any of the preceding sets $A_m'$ where $m < n$.

Note that $F_n$ are disjoint by construction, since each set adds only the unique elements of $A_n$ in $A$ that are not in any of the preceding sets $F_m$, and that $A = \bigsqcup_{n \in \mathbb N}F_n$.

Further, note that $F_n \subseteq A_n$, so:
```{math}
\mu(A) &= \mu\left(\bigsqcup_{n \in \mathbb N}F_n\right) \\
&= \sum_{n \in \mathbb N}\mu(F_n),\,\,\,\,\mu\text{ is countably additive} \\
&\leq \sum_{n \in \mathbb N}\mu(A_n).\,\,\,\,F_n \subseteq A_n \Rightarrow \mu(F_n) \leq \mu(A_n)
```
which follows by {prf:ref}`uv:mt:prob_spaces:meas:monotone`.
````
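
The "disjointification" step in this proof is easy to try on finite sets. The sketch below assumes a short list of overlapping sets and, for simplicity, takes $A$ to be their whole union (so $A_n' = A_n$); the function name `disjointify` is illustrative. Using the counting measure (the number of elements), the pieces add up to the size of the union, which is at most the sum of the sizes of the original sets.

```python
def disjointify(sets):
    """Return F_n = A_n minus (A_1 ∪ ... ∪ A_{n-1}) for a finite list of sets."""
    seen = set()
    pieces = []
    for a in sets:
        pieces.append(set(a) - seen)   # keep only elements not seen before
        seen |= set(a)
    return pieces


A_list = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
F_list = disjointify(A_list)
A = set().union(*A_list)

print(F_list)
print(sum(len(F) for F in F_list), len(A))   # counting measure of the pieces vs. A
print(sum(len(a) for a in A_list))           # the subadditivity upper bound
```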

Next, we'll see that measures obey some intuitive convergence properties, much like continuous functions do, except that instead of operating on single points, they operate on sets. We'll begin with a definition for sets and then apply it to the measure:

````{prf:definition} Set convergence from below
Suppose that $(\Omega, \mathcal F)$ is a measurable space, and that $A_n \in \mathcal F$, for $n \in \mathbb N$. If $A_n \subseteq A_{n + 1}$ for all $n$, and $\bigcup_{n \in \mathbb N}A_n = A \in \mathcal F$, then we say that $A_n \uparrow A$ as $n \rightarrow \infty$.
````

This is called *convergence from below*, and the basic idea is that the sets $A_n$ are "growing" to the set $A$. When the sets "grow" to $A$, the measures do, too:

````{prf:theorem} Measure convergence from below
:label: uv:mt:prob_spaces:meas:convbelow
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $A_n \uparrow A$, then $\mu(A_n) \uparrow \mu(A)$, as $n \rightarrow \infty$.
````

````{prf:proof}
Let $F_1 = A_1$, and let $F_n = A_n \setminus A_{n - 1}$ for $n > 1$. Since the sets are increasing, $F_n$ represents the elements of $A_n$ that are not in any of the preceding sets $A_m$, for $m < n$.

Note that the $F_n$ are disjoint by construction, that $\bigsqcup_{n \in \mathbb N}F_n = \bigcup_{n \in \mathbb N}A_n = A$, and that $\bigsqcup_{n = 1}^m F_n = A_m$.

Then:
```{math}
    \mu(A) &= \mu\left(\bigsqcup_{n \in \mathbb N}F_n\right) \\
    &= \sum_{n \in \mathbb N}\mu\left(F_n\right),\,\,\,\,\text{ countable additivity} \\
    &= \lim_{m \rightarrow \infty}\sum_{n = 1}^m \mu(F_n) \\
    &= \lim_{m \rightarrow \infty} \mu(A_m).
```
Which follows because $\bigsqcup_{n = 1}^m F_n = A_m \Rightarrow \sum_{n = 1}^m \mu(F_n) = \mu(A_m)$.
````
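
Here is a small numerical illustration of convergence from below, under an assumed measure that is not part of the text: on $\Omega = \{1, 2, 3, \ldots\}$, give the point $k$ mass $2^{-k}$. The sets $A_n = \{1, \ldots, n\}$ increase to $\Omega$, whose measure is the geometric series $\sum_{k \geq 1} 2^{-k} = 1$, and the measures $\mu(A_n) = 1 - 2^{-n}$ increase toward that value.

```python
from fractions import Fraction


def mu(event):
    """Assumed measure on {1, 2, 3, ...}: the point k carries mass 2**(-k)."""
    return sum((Fraction(1, 2**k) for k in event), Fraction(0))


# A_n = {1, ..., n} for n = 1, ..., 10; these sets increase with n.
measures = [mu(range(1, n + 1)) for n in range(1, 11)]

for n, m in enumerate(measures, start=1):
    print(n, m, float(m))
```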

Wouldn't it be great if this same property held in reverse, too? Good news:


````{prf:definition} Set convergence from above
Suppose that $(\Omega, \mathcal F)$ is a measurable space, and that $A_n \in \mathcal F$, for $n \in \mathbb N$. If $A_n \supseteq A_{n + 1}$ for all $n$, and $\bigcap_{n \in \mathbb N}A_n = A \in \mathcal F$, then we say that $A_n \downarrow A$ as $n \rightarrow \infty$.
````

This is called *convergence from above*, and the basic idea is that the sets $A_n$ are "shrinking" to the set $A$. An important corollary of this is that $A \subseteq A_n$ for all $n \in \mathbb N$, which follows since $A = \bigcap_{n \in \mathbb N}A_n$.

When the sets "shrink" to $A$, the measures do, too:

````{prf:theorem} Measure convergence from above
:label: uv:mt:prob_spaces:meas:convabove
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $A_n \downarrow A$, and further $\mu(A_k) < \infty$ for some $k \in \mathbb N$, then $\mu(A_n) \downarrow \mu(A)$.
````

````{prf:proof}
Without loss of generality (WLOG), suppose that $k = 1$. If not, simply drop the first $k - 1$ sets from $(A_n)$, so that the first remaining set has finite measure; since the sets are decreasing, the intersection (and hence $A$) is unchanged.

Notice that $A_1 \setminus A_n \uparrow A_1 \setminus A$, so then $\mu(A_1 \setminus A_n) \uparrow \mu(A_1 \setminus A)$ as $n \rightarrow \infty$, by {prf:ref}`uv:mt:prob_spaces:meas:convbelow`.

Observe that since $A \subseteq A_m$, we have $A_m = (A_m \setminus A) \sqcup A$.

Then $\mu(A_m) = \mu(A_m \setminus A) + \mu(A)$, and consequently, $\mu(A_m \setminus A) = \mu(A_m) - \mu(A)$, which holds for any $m \in \mathbb N$, since $A_n \downarrow A$. By the same argument, $\mu(A_1 \setminus A_m) = \mu(A_1) - \mu(A_m)$, as $A_m \subseteq A_1$. 

Then:
```{math}
    \mu(A_1 \setminus A_n) &\uparrow \mu(A_1 \setminus A), \\
    \mu(A_1) - \mu(A_n) &\uparrow \mu(A_1) - \mu(A),\,\,\,\,A_1 \supseteq A_n \supseteq A \\
    -\mu(A_n) &\uparrow -\mu(A),\,\,\,\,\mu(A_1) < \infty \\
   \Rightarrow \mu(A_n) &\downarrow \mu(A),
```
as desired.
````
What we did here was we effectively used the fact that the sets are "shrinking" to $A$, so $A_1$ contains every succeeding set (and $A$ itself). 
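
The finiteness condition $\mu(A_k) < \infty$ is not just a convenience. As a worked counterexample, take the real line with the usual notion of length as the measure $\mu$ (this length measure has not been constructed in this section, so treat it as an assumption here), and consider the shrinking half-lines:

```{math}
A_n = [n, \infty), \qquad \mu(A_n) = \infty \text{ for all } n \in \mathbb N, \qquad \bigcap_{n \in \mathbb N}A_n = \varnothing, \qquad \mu(\varnothing) = 0.
```

Here $A_n \downarrow \varnothing$, yet $\mu(A_n) = \infty$ for every $n$, so $\mu(A_n)$ does not decrease to $\mu(\varnothing) = 0$. The proof breaks at the subtraction step, since $\mu(A_1) - \mu(A_n)$ is not defined when both terms are infinite.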

## Probability measures

Finally, we are ready for the most important concept of this section: the probability measure. To define a probability measure, we really don't need to do much: we just add a single property to the definition of a measure. Since probability measures are measures (but the reverse need not be true), all of those nice properties we just learned about measures extend to probability measures, too:

````{prf:definition} Probability Measure
Let $(\Omega, \mathcal F, \mu)$ be a measure space. We say that $\mu$ is a probability measure if $\mu(\Omega) = 1$, and we typically denote such measures with $\mathbb P$.
````

Nothing to it: all we added was the condition that the measure, or the *probability*, of the entire event space was $1$. Likewise, we have probability spaces:

````{prf:definition} Probability Space
The triple $(\Omega, \mathcal F, \mathbb P)$ is called a probability space, where:
1. $\Omega$ is a set (the event space),
2. $\mathcal F$ is a set of events, where $\mathcal F$ is a $\sigma$-algebra on $\Omega$,
3. $\mathbb P: \mathcal F \rightarrow [0, 1]$ is a probability measure that assigns probabilities to events.
````

In that last definition, we snuck in a fact about probability measures $\mathbb P$: that their upper bound is always $1$ (i.e., $\mathbb P$ ascribes values in $[0, 1]$), which wasn't *quite* what the definition of a probability measure states unless you look closely. This is a consequence of the fact that any $F \in \mathcal F$ is a subset of $\Omega$; i.e., $F \subseteq \Omega$, so therefore $\mathbb P(F) \leq \mathbb P(\Omega) = 1$, simply by {prf:ref}`uv:mt:prob_spaces:meas:monotone`.
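
To tie the pieces together, here is a minimal sketch of a finite probability space, assuming a fair six-sided die: $\Omega = \{1, \ldots, 6\}$, $\mathcal F$ is the family of all subsets of $\Omega$, and $\mathbb P(F) = |F| / |\Omega|$. The die and the names `events` and `P` are illustrative choices, not something defined earlier in the section.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({1, 2, 3, 4, 5, 6})

# The sigma-algebra: every subset of omega (the power set).
events = [
    frozenset(c)
    for c in chain.from_iterable(
        combinations(sorted(omega), r) for r in range(len(omega) + 1)
    )
]


def P(event):
    """Probability of an event under the assumed fair-die model."""
    return Fraction(len(event), len(omega))


evens = frozenset({2, 4, 6})
print(P(omega))                                # the whole event space
print(P(evens))                                # an event
print(all(0 <= P(F) <= 1 for F in events))     # values stay in [0, 1]
```

The last check is the monotonicity argument in action: every event is a subset of $\Omega$, so its probability cannot exceed $\mathbb P(\Omega) = 1$.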