(mt:intro:prob_spaces)=
# Measures

 
The main reason that $\sigma$-algebras are powerful for our field of study is that they are special mathematical objects to which we can assign something called a *measure*. To do this, we'll build towards measures, first starting with set functions, which allow us to ascribe units to families of sets like algebras.

## Set functions

Set functions are the basic building blocks we need to define measures. In effect, set functions allow us to ascribe some notion of *size* to sets in algebras:

````{prf:definition} Set function
:label: mt:intro:prob_spaces:setfn
Suppose that $\Omega$ is a set, and let $\mathcal A$ be an algebra on $\Omega$. $\mu_0$ is called a *set function* if $\mu_0 : \mathcal A \rightarrow \bar{\mathbb R}_{\geq 0}$.
````

The space $\bar{\mathbb R}$ is called the *extended real numbers*, which just means that it includes $\infty$ and $-\infty$. The subscript $\geq 0$ just delineates that it is the non-negative component. Written another way, $\bar{\mathbb R}_{\geq 0} = [0, \infty]$.

Just like algebras were closed under particular operations (finitely many unions), set functions have analogous operations, too. However, it is important to recognize that set functions *need not* have the below listed properties, which is why these types of set functions have special names:

````{prf:definition} Additive set function
:label: mt:intro:prob_spaces:add_setfn
Suppose that $\Omega$ is a set, and let $\mathcal A$ be an algebra on $\Omega$. $\mu_0$ is called an additive set function if:
1. $\mu_0(\varnothing) = 0$, and
2. if $A_1, A_2 \in \mathcal A$ are s.t. $A_1 \cap A_2 = \varnothing$ (they are *disjoint*), then $\mu_0(A_1 \sqcup A_2) = \mu_0(A_1) + \mu_0(A_2)$. 
````

We can show that this property extends to finitely many operations, too:

````{prf:example}
Suppose that $\Omega$ is a set, and let $\mathcal A$ be an algebra on $\Omega$. Show that if $\mu_0$ is an additive set function and the sets $A_m \in \mathcal A$, for $m \in [n]$ where $n \in \mathbb N$, are *mutually disjoint*, then:
```{math}
    \mu_0\left(\bigcup_{m \in [n]}A_m\right) = \sum_{m \in [n]} \mu_0(A_m)
```
````

Let's see an example of this, with a figure:

```{figure} ./Images/countable_add.png
---
width: 300px
name: mt:intro:rvs:fin_add
---
Here, we show the sample space $\Omega$ in blue. In this and succeeding figures, you can conceptualize the measure of a set to be its area (e.g., this example shows a finite measure, since $\mu(\Omega) < \infty$). The sets $A_i$ are the shapes shown, where each $A_i$ has a different color. Notice, in particular, that the sets are *disjoint*, in that they are not overlapping. If we wanted to measure (compute the *area*, in this context) of the union of such disjoint sets, the measure of the area of all of these disjoint objects would just be the sum of the area of each disjoint object individually (e.g., compute the area of each shaded region corresponding to an $A_i$, and then sum them up).
```
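
The finite additivity property can be checked numerically on a small example. The following is a minimal sketch, assuming a four-point sample space with the power set as its algebra; the `weights` dictionary and the function name `mu0` are hypothetical choices made for illustration.

```python
from fractions import Fraction

# Hypothetical non-negative weights on a small sample space (illustration only).
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(2)}

def mu0(A):
    """Set function: total weight of the points in A (the empty sum is 0)."""
    return sum((weights[w] for w in A), Fraction(0))

# Three mutually disjoint sets A_1, A_2, A_3.
sets = [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"d"})]
union = frozenset().union(*sets)

lhs = mu0(union)                  # size of the union
rhs = sum(mu0(A) for A in sets)   # sum of the individual sizes
print(lhs, rhs)
```

Exact fractions are used so that the two sides can be compared with `==` without floating-point rounding.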

Countably additive set functions extend this property from finitely many operations to *countably* many operations:

````{prf:definition} Countably additive set function
:label: mt:intro:prob_spaces:count_add_setfn
Suppose that $\Omega$ is a set, and let $\mathcal A$ be an algebra on $\Omega$. $\mu_0$ is called a countably additive set function if:
1. $\mu_0(\varnothing) = 0$, and
2. if $(A_n)_{n \in \mathbb N} \subseteq \mathcal A$ is a sequence of mutually disjoint sets whose union $\bigsqcup_{n \in \mathbb N}A_n$ is also in $\mathcal A$, then:
```{math}
    \mu_0\left(\bigsqcup_{n \in \mathbb N}A_n\right) = \sum_{n \in \mathbb N} \mu_0(A_n)
```
````

It should be pretty obvious to you that a countably additive set function is additive:

````{prf:example}
Suppose that $\Omega$ is a set, and let $\mathcal A$ be an algebra on $\Omega$. Show that if $\mu_0$ is a countably additive set function, it is also additive.
````

## Measures

Now that we have countably additive set functions, we are ready to wrap our heads around one of the most crucial topics that we have discussed so far: measures. The event space, together with a $\sigma$-algebra defined on it, is called a measurable space:

````{prf:definition} Measurable Space $(\Omega, \mathcal F)$
:label: mt:intro:prob_spaces:measbl_sp
The tuple $(\Omega, \mathcal F)$ is called a measurable space, if:
1. $\Omega$ is a set,
2. $\mathcal F$ is a $\sigma$-algebra on $\Omega$.
````

A measure, in effect, allows us to formalize the concept of *relational size*, and unite it with what we've learned so far about countably additive set functions:

````{prf:definition} Measure $\mu$
:label: mt:intro:prob_spaces:meas
Let $(\Omega, \mathcal F)$ be a measurable space. A measure $\mu : \mathcal F \rightarrow \bar{\mathbb R}_{\geq 0}$ is a non-negative countably additive set function, where:
1. Measure of empty set is zero: $\mu(\varnothing) = 0$,
2. Non-negative: For any $F \in \mathcal F$, $\mu(F) \geq 0$,
3. Countably additive: If $\{F_n\}_{n \in \mathbb N} \subseteq \mathcal F$ is a countable sequence of mutually disjoint events, then:
```{math}
    \mu\left(\bigcup_{n \in \mathbb N}F_n\right) = \sum_{n \in \mathbb N}\mu(F_n)
```
````
The idea here is that, as we stated, *set functions* allow us to ascribe a notion of *size* to the sets in an algebra $\mathcal A$. However, since an algebra is only guaranteed to be closed under finitely many unions, a countable union of mutually disjoint sets in $\mathcal A$ need not itself be in $\mathcal A$, so we have *no idea* whether the set function even assigns it a size (the union is only guaranteed to be a subset of $\Omega$, the space upon which the algebra was defined). A measure, by contrast, is defined on a $\sigma$-algebra $\mathcal F$. Since $\sigma$-algebras are closed under countable unions, every countable union of mutually disjoint elements of $\mathcal F$ is again in $\mathcal F$, so the measure *actually* ascribes a size to it.

This gives us the logic of *why* we call $(\Omega, \mathcal F)$ a measurable space: its $\sigma$-algebra collects exactly the sets to which a measure can ascribe size, the *measurable sets*:

````{prf:definition} Measurable set
Suppose that $(\Omega, \mathcal F)$ is a measurable space. Then every $F \in \mathcal F$ is called a *measurable set*.
````
The idea is measures ascribe reasonable notions of size to the measurable sets. When you read through {numref}`mt:intro:set_theory`, any time you see a result that concerns a $\sigma$-algebra being closed under something (e.g., countable unions, extrema, limits) you can say that these properties produce sets which are *measurable* if the sets they perform operations on are *measurable*. 

Due to the fact that these two concepts of a measurable space and a measure are so complementary in this regard (one provides a set and a family of measurable sets, the other prescribes a function for ascribing relational size to the measurable sets), we often group these ideas together with the word *measure space*:

````{prf:definition} Measure Space $(\Omega, \mathcal F, \mu)$
:label: mt:intro:prob_spaces:meas_sp
The triple $(\Omega, \mathcal F, \mu)$ is called a measure space if:
1. $\Omega$ is a set,
2. $\mathcal F$ is a $\sigma$-algebra on $\Omega$, and
3. $\mu : \mathcal F \rightarrow \bar{\mathbb R}_{\geq 0}$ is a measure on $(\Omega, \mathcal F)$. 
````

You'll notice that the only restriction we placed on the values of a measure was that they are non-negative. This means that $\mu : \mathcal F \rightarrow \bar{\mathbb R}_{\geq 0}$, so we could, feasibly, have some sets with infinite measure. This theoretical note tends to not be particularly nice for our field of study, so we'll introduce two related types of measures which remove this oddity:

````{prf:definition} Finite measure space
Suppose that $(\Omega, \mathcal F, \mu)$ is a measure space. The measure space, and the measure $\mu$, are called finite if $\mu(\Omega) < \infty$.
````

As a consequence here, $\mu : \mathcal F \rightarrow \mathbb R_{\geq 0}$, non-inclusive of $\infty$. When you think about the properties of measures below, start to think about why the restriction that $\mu(\Omega) < \infty$ implies that for any $F \in \mathcal F$, $\mu(F) < \infty$ for a finite measure. 

This definition, unfortunately, tends to be a little bit restrictive, for a reason that you'll see in an exercise later on. For this reason, we can "tweak" this definition a little bit, and instead only require that $\Omega$ can be covered by countably many elements of the $\sigma$-algebra, each of which has finite measure. This gives us the concept of a $\sigma$-finite measure space:

````{prf:definition} $\sigma$-finite measure space
Suppose that $(\Omega, \mathcal F, \mu)$ is a measure space. The measure space, and the measure $\mu$, are called $\sigma$-finite if there exists a sequence $(F_n)_{n \in \mathbb N} \subseteq \mathcal F$, s.t.:
1. For all $n \in \mathbb N$, $\mu(F_n) < \infty$, and
2. $\bigcup_{n \in \mathbb N}F_n = \Omega$.
````

With this definition, we can *still* have $\mu(\Omega) = \infty$, but we can still define countable sequences where *each element* has finite measure that unite to $\Omega$. This may feel kind of like an edge-case situation, but it is going to be ultra necessary any time we try to deal with spaces that are uncountably infinite such as $\mathbb R$.
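
To make this concrete, here is a short worked example that is not part of the original text. It assumes the *counting measure*, which assigns to each set its number of elements (and $\infty$ to infinite sets). Take $\Omega = \mathbb N$ with $\mathcal F$ the collection of all subsets of $\mathbb N$, and let $F_n = \{m \in \mathbb N : m \leq n\}$. Then:

```{math}
    \mu(F_n) < \infty \text{ for every } n \in \mathbb N, \qquad \bigcup_{n \in \mathbb N}F_n = \mathbb N = \Omega, \qquad \mu(\Omega) = \infty.
```

So this measure space is $\sigma$-finite, even though it is not finite.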

### Properties of measures

In this book, we will often use (and abuse) several basic properties of measures. We'll go through some of these now. To start off, we have the monotonicity of measures:

````{prf:property} Monotonicity of measures
:label: mt:intro:prob_spaces:meas:monotone
Let $(\Omega, \mathcal F, \mu)$ be a measure space. Then if $F_1 \subseteq F_2$ and $F_1, F_2 \in \mathcal F$, $\mu(F_1) \leq \mu(F_2)$.
````

````{prf:proof}
Let $F_1, F_2 \in \mathcal F$, where $F_1 \subseteq F_2$. 

Recall that $F_2 \setminus F_1 = F_2 \cap F_1^c$. This represents the elements of $F_2$ that are not in $F_1$, so we could alternatively express $F_2 = F_1 \cup (F_2 \setminus F_1)$. 

As $F_2 \setminus F_1$ is disjoint from $F_1$:
```{math}
    \mu(F_2) &= \mu\left(F_1 \sqcup (F_2 \setminus F_1)\right) \\
    &= \mu(F_1) + \mu(F_2 \setminus F_1),\,\,\,\,\mu\text{ is countably additive} \\
    &\geq \mu(F_1),\,\,\,\,\mu(F_2 \setminus F_1) \geq 0\text{ by definition of a measure}
```
as desired.
````

Intuitively, what this statement asserts is that the measure of a set which comprises another set must be at least the measure of the set it comprises. Let's take a look at a picture which explains what's going on:

```{figure} ./Images/monotone.png
---
width: 300px
name: mt:intro:prob_spaces:monotone
---
Here, we show the sample space $\Omega$ in blue. In this and succeeding figures, you can conceptualize the measure of a set to be its area (e.g., this example shows a finite measure, since $\mu(\Omega) < \infty$). Here, $F_1$ (in blue; since it is contained within $F_2$, it looks purple) $\subseteq F_2$ (in red). Notice that in this case, the measure (the area) $\mu(F_2) \geq \mu(F_1)$. 
```
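
Monotonicity can also be verified exhaustively on a small finite measure space. This is a sketch under stated assumptions: the sample space, the `weights`, and the helper `power_set` are hypothetical, and the power set plays the role of $\mathcal F$.

```python
from itertools import combinations

omega = ["a", "b", "c", "d"]
weights = {"a": 1, "b": 2, "c": 0, "d": 5}  # hypothetical non-negative weights

def mu(F):
    """Measure of F: the total weight of its points."""
    return sum(weights[w] for w in F)

def power_set(points):
    """All subsets of a finite collection of points, as frozensets."""
    return [frozenset(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]

sigma_algebra = power_set(omega)

# For frozensets, F1 <= F2 is the subset test. Collect any pair that
# would contradict monotonicity.
violations = [(F1, F2) for F1 in sigma_algebra for F2 in sigma_algebra
              if F1 <= F2 and mu(F1) > mu(F2)]
print(len(sigma_algebra), len(violations))
```

The list of violations stays empty because removing points from a set can only remove non-negative weight.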

This concept extends to the case when $F$ is a subset of a countable union of sets as well, and is called subadditivity:

````{prf:property} Subadditivity of measures
:label: mt:intro:prob_spaces:meas:subadd
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $F \subseteq \bigcup_{n \in \mathbb N}F_n$ where $F, F_n \in \mathcal F$ for all $n \in \mathbb N$, then:
```{math}
    \mu(F) \leq \sum_{n \in \mathbb N}\mu(F_n).
```
````
````{prf:proof}
Let $F_n' = F_n \cap F$, for all $n \in \mathbb N$. Like above, $F_n'$ are the elements of $F_n$ that are also in $F$.

Define $A_1 = F_1'$, and let $A_n = F_n' \setminus \bigcup_{m = 1}^{n - 1}F_m'$ for all $n > 1$. $A_n$ represents the elements of $F_n$ that are in $F$, but are not in any of the preceding sets $F_m'$ where $m < n$.

Note that $A_n$ are disjoint by construction, since each set adds only the unique elements of $F_n$ in $F$ that are not in any of the preceding sets $A_m$, and that $F = \bigsqcup_{n \in \mathbb N}A_n$.

Further, note that $A_n \subseteq F_n$, so:
```{math}
\mu(F) &= \mu\left(\bigsqcup_{n \in \mathbb N}A_n\right) \\
&= \sum_{n \in \mathbb N}\mu(A_n),\,\,\,\,\mu\text{ is countably additive} \\
&\leq \sum_{n \in \mathbb N}\mu(F_n).\,\,\,\,A_n \subseteq F_n \Rightarrow \mu(A_n) \leq \mu(F_n)
```
which follows by {prf:ref}`mt:intro:prob_spaces:meas:monotone`.
````

This proof is a little tough, so let's see a quick figure explaining the proof:

```{figure} ./Images/subadd.png
---
width: 700px
name: mt:intro:prob_spaces:preimage_fig
---
This figure shows a finite example of subadditivity. **(A)** We have three sets $F_1$, $F_2$, and $F_3$, where $F \subseteq \bigcup_{n \in [3]} F_n$. **(B)** First, we compute the $F_n'$s by intersecting each set with $F$. Since $F_3$ doesn't intersect with $F$, $F_3'$ is the empty set. **(C)** Next, we construct the $A_n$ sequentially, first by setting $A_1 = F_1'$, and then defining $A_2$ to be the portion of $F_2'$ that isn't already covered by $F_1'$. Together, these (disjoint) sets $A_n$ have the measure of $F$. Finally, note that the total measure of the $A_n$ is no larger than the total measure of the original sets $F_n$, giving the result.
```
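
The construction used in the proof can be traced step by step in code. The sketch below assumes the counting measure on a finite set; the particular sets `F` and `cover` are made up for illustration and mirror the three-set situation in the figure.

```python
def mu(F):
    """Counting measure on a finite set (an illustrative choice)."""
    return len(F)

F = frozenset({1, 2, 3, 4, 5})
# A cover of F by overlapping sets; the third set does not meet F at all.
cover = [frozenset({0, 1, 2, 3}), frozenset({3, 4, 5, 6}), frozenset({7, 8})]

# Step 1: F_n' = F_n intersected with F.
F_prime = [Fn & F for Fn in cover]

# Step 2: A_n = F_n' minus everything already covered by F_1', ..., F_{n-1}'.
A, seen = [], frozenset()
for Fp in F_prime:
    A.append(Fp - seen)
    seen = seen | Fp

# The A_n are mutually disjoint, their union is F, and each A_n sits inside F_n.
disjoint = all(A[i].isdisjoint(A[j]) for i in range(len(A)) for j in range(i))
covers_F = frozenset().union(*A) == F
nested = all(An <= Fn for An, Fn in zip(A, cover))

print(mu(F), sum(mu(An) for An in A), sum(mu(Fn) for Fn in cover))
```

The first two printed numbers agree by additivity over the disjoint pieces, and the third is at least as large by monotonicity.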

Next, we'll see that measures obey intuitive convergence properties, much like continuous functions do, except that instead of operating on single points, they operate on sets. We'll begin with a definition for sets and then apply it to the measure:

````{prf:definition} Set convergence from below
:label: mt:intro:prob_spaces:meas:set_conv_below
Suppose that $(\Omega, \mathcal F)$ is a measurable space, and that $F_n \in \mathcal F$, for $n \in \mathbb N$. If $F_n \subseteq F_{n + 1}$ for all $n$, and $\bigcup_{n \in \mathbb N}F_n = F \in \mathcal F$, then we say that $F_n \uparrow F$ as $n \rightarrow \infty$.
````

This is called *convergence from below*, and the basic idea is that the sets $F_n$ are "growing" to the set $F$. Stated another way, we could say that the sequence of sets is *monotone non-decreasing* to $F$, as per {prf:ref}`mt:intro:set:sig:monotone_sets`. 

When the sets "grow" to $F$, the measures do, too:

````{prf:property} Measure convergence from below
:label: mt:intro:prob_spaces:meas:convbelow
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $F_n \uparrow F$, then $\mu(F_n) \uparrow \mu(F)$, as $n \rightarrow \infty$.
````

````{prf:proof}
Let $A_1 = F_1$, and let $A_n = F_n \setminus F_{n - 1}$ for $n > 1$. $A_n$ represents the elements of $F_n$ that are not in any of the preceding sets $F_m$, for $m < n$.

Note that the $A_n$ are disjoint by construction, that $\bigsqcup_{n \in \mathbb N}A_n = \bigcup_{n \in \mathbb N}F_n = F$, and that $\bigsqcup_{n = 1}^m A_n = F_m$.

Then:
```{math}
    \mu(F) &= \mu\left(\bigsqcup_{n \in \mathbb N}A_n\right) \\
    &= \sum_{n \in \mathbb N}\mu\left(A_n\right),\,\,\,\,\text{ countable additivity} \\
    &= \lim_{m \rightarrow \infty}\sum_{n = 1}^m \mu(A_n) \\
    &= \lim_{m \rightarrow \infty} \mu(F_m),
```
which follows because $\bigsqcup_{n = 1}^m A_n = F_m \Rightarrow \sum_{n = 1}^m \mu(A_n) = \mu(F_m)$.
````
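
A small numerical illustration of convergence from below may help. This sketch assumes a hypothetical measure on $\{1, 2, 3, \ldots\}$ that gives the point $k$ weight $2^{-k}$, and takes $F_n = \{1, \ldots, n\}$, so that $F_n \uparrow F = \{1, 2, 3, \ldots\}$ with $\mu(F) = \sum_{k \geq 1} 2^{-k} = 1$.

```python
from fractions import Fraction

def mu_Fn(n):
    """mu(F_n) for F_n = {1, ..., n} when the point k has weight 2**-k."""
    return sum((Fraction(1, 2**k) for k in range(1, n + 1)), Fraction(0))

values = [mu_Fn(n) for n in range(1, 11)]

# Distance from mu(F_n) to the limit mu(F) = 1.
gaps = [1 - v for v in values]

for n, (v, g) in enumerate(zip(values, gaps), start=1):
    print(n, v, g)
```

The measures increase with $n$, and the gap to $\mu(F)$ halves at every step, which is the behaviour $\mu(F_n) \uparrow \mu(F)$ describes.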

Wouldn't it be great if this same property held in reverse, too? Good news:

````{prf:definition} Set convergence from above
Suppose that $(\Omega, \mathcal F)$ is a measurable space, and that $F_n \in \mathcal F$, for $n \in \mathbb N$. If $F_n \supseteq F_{n + 1}$ for all $n$, and $\bigcap_{n \in \mathbb N}F_n = F \in \mathcal F$, then we say that $F_n \downarrow F$ as $n \rightarrow \infty$.
````

This is called *convergence from above*, and the basic idea is that the sets $F_n$ are "shrinking" to the set $F$. Stated another way, we could say that the sequence of sets is *monotone non-increasing* to $F$, as per {prf:ref}`mt:intro:set:sig:monotone_sets`. 

An important corollary of this is that $F \subseteq F_n$ for all $n \in \mathbb N$, which follows since $F = \bigcap_{n \in \mathbb N}F_n$.

When the sets "shrink" to $F$, the measures do, too:

````{prf:property} Measure convergence from above
:label: mt:intro:prob_spaces:meas:convabove
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $F_n \downarrow F$, and further $\mu(F_k) < \infty$ for some $k \in \mathbb N$, then $\mu(F_n) \downarrow \mu(F)$.
````

````{prf:proof}
Without loss of generality (WLOG), suppose that $k=1$. If it is not, simply drop the first $k - 1$ sets of $\{F_n\}$ and re-index, so that the first set $F_1$ has $\mu(F_1) < \infty$.

Notice that $F_1 \setminus F_n \uparrow F_1 \setminus F$, so then $\mu(F_1 \setminus F_n) \uparrow \mu(F_1 \setminus F)$ as $n \rightarrow \infty$, by {prf:ref}`mt:intro:prob_spaces:meas:convbelow`.

Observe that since $F \subseteq F_m$, we have $F_m = (F_m \setminus F) \sqcup F$. 

Then $\mu(F_m) = \mu(F_m \setminus F) + \mu(F)$, and consequently, since $\mu(F) \leq \mu(F_1) < \infty$, $\mu(F_m \setminus F) = \mu(F_m) - \mu(F)$, which holds for any $m \in \mathbb N$, since $F_n \downarrow F$. By the same argument, $\mu(F_1 \setminus F_m) = \mu(F_1) - \mu(F_m)$, as $F_m \subseteq F_1$ and $\mu(F_m) \leq \mu(F_1) < \infty$. 

Then:
```{math}
    \mu(F_1 \setminus F_n) &\uparrow \mu(F_1 \setminus F), \\
    \mu(F_1) - \mu(F_n) &\uparrow \mu(F_1) - \mu(F),\,\,\,\,F_1 \supseteq F_n \supseteq F \\
    -\mu(F_n) &\uparrow -\mu(F), \\
   \Rightarrow \mu(F_n) &\downarrow \mu(F),
```
as desired.
````
What we did here was we effectively used the fact that the sets are "shrinking" to $F$, so $F_1$ contains every succeeding set (and $F$ itself). 

```{figure} ./Images/conv.png
---
width: 600px
name: mt:intro:rvs:preimage_fig
---
Here we demonstrate convergence concepts for measures. **(A)** The sets $F_n$ (in blue) converge to $F$ (in red) from above. Notice that their measures get closer and closer to that of $F$, from above, as well. **(B)** The sets $F_n$ (in blue) converge to $F$ (outermost set) from below. Notice that their measures get closer and closer to that of $F$, from below, as well.
```
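
The finiteness assumption in {prf:ref}`mt:intro:prob_spaces:meas:convabove` cannot be dropped. As a brief worked counterexample (not part of the original text), assume the *counting measure* $\mu$ on $\Omega = \mathbb N$, which assigns to each subset its number of elements (and $\infty$ to infinite subsets), and let $F_n = \{m \in \mathbb N : m \geq n\}$. Then:

```{math}
    F_n \supseteq F_{n + 1}, \qquad \bigcap_{n \in \mathbb N}F_n = \varnothing, \qquad \mu(F_n) = \infty \text{ for every } n \in \mathbb N, \qquad \mu(\varnothing) = 0.
```

So $F_n \downarrow \varnothing$, but $\mu(F_n)$ does not converge to $\mu(\varnothing)$, because no $F_k$ has finite measure.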


Now, we'll rattle off some other properties of measures. Try proving some of these as an exercise! Bonus: for your intuition, it is often *extremely* helpful to conceptualize these different properties with pictures like the above of your own.

````{prf:property} Measure of difference
:label: mt:intro:prob_spaces:meas:measdiff
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $F_1, F_2 \in \mathcal F$, $F_1 \subseteq F_2$, and $\mu(F_1) < \infty$, then:
```{math}
    \mu(F_2\setminus F_1) = \mu(F_2) - \mu(F_1).
```
````

````{prf:property} Measure of union
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $F_1, F_2 \in \mathcal F$, then:
```{math}
    \mu(F_1 \cup F_2) \leq \mu(F_1) + \mu(F_2).
```
````


````{prf:property} Inclusion/Exclusion I
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $F_1, F_2 \in \mathcal F$ where $\mu(F_1), \mu(F_2) < \infty$, then:
```{math}
    \mu(F_1 \cup F_2) = \mu(F_1) + \mu(F_2) - \mu(F_1 \cap F_2).
```
````

````{prf:property} Inclusion/Exclusion II
Let $(\Omega, \mathcal F, \mu)$ be a measure space. If $n \in \mathbb N$ and $\{F_m\}_{m \in [n]} \subseteq \mathcal F$ where for all $m \in [n]$, $\mu(F_m) < \infty$, then:
```{math}
    \mu\left(\bigcup_{m \in [n]} F_m\right) = \sum_{k = 1}^n \left[(-1)^{k - 1}\sum_{\mathcal M \subseteq [n] : |\mathcal M| = k}\mu(F_{\mathcal M})\right],
```
where $F_{\mathcal M} \triangleq \bigcap_{m \in \mathcal M}F_m$.
````
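
The general inclusion/exclusion formula is easier to trust after evaluating it once. The following sketch assumes the counting measure and three made-up finite sets; it loops over every index set $\mathcal M$ of size $k$ and accumulates the signed measures of the intersections.

```python
from itertools import combinations

def mu(F):
    """Counting measure on a finite set (an illustrative choice)."""
    return len(F)

sets = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5}), frozenset({4, 5, 6, 7})]
n = len(sets)

total = 0
for k in range(1, n + 1):
    sign = (-1) ** (k - 1)
    # Every index set M with |M| = k (0-based indices here).
    for M in combinations(range(n), k):
        F_M = frozenset.intersection(*(sets[m] for m in M))
        total += sign * mu(F_M)

union = frozenset().union(*sets)
print(total, mu(union))
```

The alternating sum corrects for points that were counted more than once in the single-set terms.
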
### Almost everywhere

When making statements using measures, we might want to ascribe things to subsets of the sample space that might not, necessarily, hold true everywhere. As you will learn later in this book, in particular, we can say a lot of *really useful* things about the sample space that are, for all intents and purposes, nearly always true (but not, necessarily, absolutely *always* true). The key fine point here has to do with something called $\mu$-null elements of the $\sigma$-algebra:

````{prf:definition} $\mu$-null element
Let $(\Omega, \mathcal F, \mu)$ be a measure space. $F \in \mathcal F$ is called $\mu$-null if $\mu(F) = 0$.
````

Next, we'll define the specific wording for what we are talking about here, which is called *almost everywhere*:

````{prf:definition} Almost everywhere (a.e.)
:label: mt:intro:prob_spaces:meas:ae
Let $(\Omega, \mathcal F, \mu)$ be a measure space, and let $\mathcal S : \Omega \rightarrow \{0, 1\}$ be a statement about points in the event space $\Omega$ that is either true or false. The statement $\mathcal S$ is said to hold almost everywhere (a.e.) if:
1. $F = \{\omega \in \Omega : \mathcal S(\omega)\text{ is false}\} \in \mathcal F$, and
2. $\mu(F) = 0$.
````
The first condition of this definition asserts that, for a statement to hold almost everywhere, the set of places where it does not hold must be an element of the $\sigma$-algebra $\mathcal F$. Further, the set of places where the statement does not hold must be $\mu$-null. So, by *almost everywhere*, what we mean is that it holds everywhere that has an appreciable (non-zero) size (measure isn't *size* per se, but it can be conceptualized that way). The first place we can put this into practice so you can get a feel for what we mean is in the construction of the Lebesgue measure, which is an invaluable tool we will use more in the later sections of this chapter.
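
As a toy illustration of the definition, the sketch below assumes a three-point space where one point carries no mass. The `weights` and the `statement` function are hypothetical; the point is only that a statement can fail somewhere and still hold almost everywhere.

```python
weights = {"a": 1, "b": 3, "c": 0}   # hypothetical weights; "c" carries no mass

def mu(F):
    """Measure of F: the total weight of its points."""
    return sum(weights[w] for w in F)

def statement(w):
    """A hypothetical statement S(w): 'w is not the point c'."""
    return w != "c"

# The set of points where the statement is false.
exceptional = frozenset(w for w in weights if not statement(w))

holds_everywhere = all(statement(w) for w in weights)
holds_ae = mu(exceptional) == 0
print(sorted(exceptional), holds_everywhere, holds_ae)
```

Since every subset of a finite space is measurable under the power set, the first condition of the definition is automatic here and only the $\mu$-null condition needs checking.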

### Lebesgue measure

So, why did we talk about $\pi$-systems and $\lambda$-systems at all in this book? It wasn't just because we thought they were cool or interesting; it is because they are *essential* tools for building out the Lebesgue measure, which we'll learn about now. Just like we had $\sigma$-generated algebras, we have $\lambda$-generated systems, which are defined in the same way:

````{prf:definition} $\lambda$-generated system
:label: mt:intro:prob_spaces:lambda
Suppose that $\Omega$ is an event space, and that $\mathcal A$ is a collection of sets, where $A \in \mathcal A \Rightarrow A \subseteq \Omega$. Then the smallest $\lambda$-system containing $\mathcal A$ is denoted $\lambda(\mathcal A)$, and is defined:
```{math}
    \lambda(\mathcal A) \triangleq \bigcap_{\mathcal L_i \in \left\{\mathcal L_i : \substack{\mathcal L_i \textrm{ is a $\lambda$-system } \\ \textrm{on $\Omega$ and $\mathcal A \subseteq \mathcal L_i$}}\right\}}\mathcal L_i
```
````

We can tie $\pi$-systems together with $\lambda$-systems, by noting that if a collection of sets is a $\pi$-system, then the $\lambda$-system generated by the $\pi$-system is also a $\pi$-system:
````{prf:lemma} $\lambda$-system generated by a $\pi$-system is a $\pi$-system
Suppose that $\Omega$ is an event space, and that $\mathcal P$ is a $\pi$-system. Then $\lambda(\mathcal P)$ is a $\pi$-system.
````
````{prf:proof}
Notice that $\mathcal L = \{A \in \lambda(\mathcal P) : \forall B \in \mathcal P, A \cap B \in \lambda(\mathcal P)\}$ is a $\lambda$-system containing $\mathcal P$:
1\. Contains whole set: For any $B \in \mathcal P \subseteq \lambda(\mathcal P)$, note that $\Omega \cap B = B$.
````

The essential fact that we need about $\pi$-systems and $\lambda$-systems is that they are tied together by Dynkin's $\pi$-$\lambda$ theorem:

````{prf:theorem} Dynkin's $\pi$-$\lambda$
:label: mt:intro:prob_spaces:dynkin
Suppose that $\Omega$ is an event space, and suppose that $\mathcal P$ is a $\pi$-system on $\Omega$, and $\mathcal L$ is a $\lambda$-system on $\Omega$, where $\mathcal P \subseteq \mathcal L$. Then $\sigma(\mathcal P) \subseteq \mathcal L$.
````
````{prf:proof}

````

The most interesting aspect of a $\pi$-system as it relates to probability theory is the following:

````{prf:lemma} Uniqueness of extensions of measures from $\pi$-systems to $\sigma$-generated algebras
:label: mt:intro:prob_spaces:meas:unique
Let $\Omega$ be the event space, and suppose that $\mathcal P$ is a $\pi$-system on $\Omega$, where $\mathcal F = \sigma(\mathcal P)$. Suppose that $\mu_1, \mu_2$ are measures on the measurable space $(\Omega, \mathcal F)$. Further, suppose:
1. For every $P \in \mathcal P$, $\mu_1(P) = \mu_2(P)$, and
2. The measures are finite: $\mu_1(\Omega) = \mu_2(\Omega) < \infty$.

Then for every $F \in \mathcal F = \sigma(\mathcal P)$, $\mu_1(F) = \mu_2(F)$.
````

To state this a little more succinctly, if two measures agree on a $\pi$-system, they also agree on the $\sigma$-algebra generated by that $\pi$-system. The "kind of" obscure second criterion, that the measures are *finite*, simply ensures that we don't have a situation where we are trying to justify that $\infty = \infty$, and in fact, this lemma turns out to be *false* if we omit this condition (while the reason it is false falls somewhat out of the scope of this book, we'd encourage you to look around and gain some intuition as to why).

Now is where the real magic happens:

````{prf:theorem} Carathéodory's Extension
:label: mt:intro:prob_spaces:meas:carotheodory
Let $\Omega$ be an event space, let $\mathcal A$ be an algebra on $\Omega$, and let $\mathcal F = \sigma(\mathcal A)$. Then if $\mu_0 : \mathcal A \rightarrow \bar{\mathbb R}_{\geq 0}$ is a countably additive set function, there exists a measure $\mu$ on $(\Omega, \mathcal F)$ s.t. $\mu = \mu_0$ on $\mathcal A$.
````

So, what is this telling us? This is telling us that, if we can define a countably additive set function on an algebra $\mathcal A$, we can extend it to a measure on the $\sigma$-algebra *generated* by that algebra, $\mathcal F = \sigma(\mathcal A)$, for free! Further, when the extension is finite, we can combine this with {prf:ref}`mt:intro:prob_spaces:meas:unique` (recall that an algebra is also a $\pi$-system) to conclude that this measure is unique. Cool, right? Let's see how this helps us for defining the Lebesgue measure:

````{prf:definition} Lebesgue measure on $(\alpha, \beta]$
:label: mt:intro:prob_spaces:meas:lebesgue
Let $\Omega = (\alpha, \beta]$, and define the algebra:
```{math}
    \mathcal A_\lambda \triangleq \left\{\bigcup_{i \in [k]}(a_i, b_i] : k \in \mathbb N, \alpha \leq a_1 < b_1 \leq a_2 < b_2 \leq ... \leq a_k < b_k \leq \beta \right\}
```
Then $\mathcal A_\lambda$ is an algebra (and hence, also a $\pi$-system) on $\Omega$, and $\mathcal F_\lambda = \sigma(\mathcal A_\lambda) = \mathcal B(\alpha, \beta]$.

Let $A \in \mathcal A_\lambda$, where $A = \bigcup_{i \in [k]}(a_i, b_i]$ as above. Define:
```{math}
    \lambda_0(A) = \sum_{i \in [k]} (b_i - a_i)
```
Then we define the Lebesgue measure $\lambda$ to be the unique measure that extends $\lambda_0$ from $\mathcal A_\lambda$ to $\mathcal F_\lambda$.
````

So, what is this statement saying? In effect, what we are doing here is first, we define an algebra that is pretty easy to understand: Basically, the sets in the algebra defined on $\Omega$ are just all of the different ways that we could take (finite) unions of (non-overlapping) sub-intervals of $(\alpha, \beta]$. The $\sigma$-algebra generated by this algebra is, in fact, the Borel $\sigma$-algebra. We define a somewhat intuitive countably additive set function on this algebra, by just taking the value to be the sum of the (non-overlapping) interval lengths. Let's see what elements of $\mathcal A_\lambda$ look like:


```{figure} ./Images/leb_alg.png
---
width: 700px
name: mt:intro:rvs:leb
---
An example of a set in $\mathcal A_\lambda$, where $k = 7$. $\mathcal A_\lambda$ contains the union of all of the points in the intervals $(a_i, b_i]$, for $i \in [7]$, shown above (the union of these items being a single set in $\mathcal A_\lambda$). It also includes all possible sets we could express in this way, where we could pick $a_i$s and $b_i$s, and still be left with $\alpha \leq a_1 < b_1 \leq ... \leq a_7 < b_7 \leq \beta$. Finally, it repeats this for *every single* natural number, $k \in \mathbb N$. On such a set, the measure $\lambda_0$ is defined as the total width of the intervals that the set is composed of, shown in red.
```
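
The set function $\lambda_0$ is simple enough to compute directly. Here is a minimal sketch, assuming a set in $\mathcal A_\lambda$ is represented as a list of `(a_i, b_i)` pairs; the function name `lambda_0` and this representation are choices made for illustration, not part of the definition.

```python
from fractions import Fraction

def lambda_0(intervals, alpha, beta):
    """Total length of a finite union of half-open intervals (a_i, b_i].

    `intervals` is a list of (a_i, b_i) pairs that must satisfy
    alpha <= a_1 < b_1 <= a_2 < b_2 <= ... <= a_k < b_k <= beta.
    """
    previous_end = alpha
    total = Fraction(0)
    for a, b in intervals:
        if not (previous_end <= a < b <= beta):
            raise ValueError(
                "intervals must be ordered, non-overlapping, and inside (alpha, beta]"
            )
        total += Fraction(b) - Fraction(a)
        previous_end = b
    return total

alpha, beta = 0, 1
A = [(Fraction(0), Fraction(1, 4)), (Fraction(1, 2), Fraction(3, 4))]

length_A = lambda_0(A, alpha, beta)                    # two pieces of length 1/4
length_omega = lambda_0([(alpha, beta)], alpha, beta)  # the whole space
print(length_A, length_omega)
```

The ordering check enforces the chain of inequalities from the definition, so overlapping or out-of-range intervals are rejected rather than silently double counted.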


We don't bother to describe the Lebesgue measure much more in-depth, because we don't have to:

````{prf:theorem} Existence and uniqueness of the Lebesgue measure
:label: mt:intro:prob_spaces:meas:lebesgue:exist_unique
Suppose the Lebesgue measure $\lambda$ is defined as in {prf:ref}`mt:intro:prob_spaces:meas:lebesgue` on the measurable space $\left(\Omega, \mathcal F_\lambda\right)$ where $\Omega = (\alpha, \beta]$ is an interval. $\lambda$ exists and is unique.
````
````{prf:proof}
Existence: by {prf:ref}`mt:intro:prob_spaces:meas:carotheodory`, since $\lambda_0$ is a countably additive set function on $\mathcal A_\lambda$, there exists a measure $\lambda$ that extends $\lambda_0$ from $\mathcal A_\lambda$ to $\mathcal F_\lambda = \sigma(\mathcal A_\lambda)$. 

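Uniqueness: Any two such extensions agree with $\lambda_0$ on $\mathcal A_\lambda$, and both are finite, since $\lambda_0(\Omega) = \beta - \alpha < \infty$. Since $\mathcal A_\lambda$ is an algebra (and hence, also a $\pi$-system), {prf:ref}`mt:intro:prob_spaces:meas:unique` shows that they agree on all of $\mathcal F_\lambda = \sigma(\mathcal A_\lambda)$, so this measure is unique.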
````

In effect, what this means is that the relatively intuitive description we gave in the definition for $\lambda_0$ was *plenty* for us to know that there is a unique measure $\lambda$ that behaves exactly this way on $\mathcal A_\lambda$ (and extends it to $\mathcal F_\lambda$), and that measure is the Lebesgue measure. 

If you're careful, you'll also notice that the measure we just defined has another interesting property: one of the sets in $\mathcal A_\lambda$ is the set where $k = 1$, and $a_1 = \alpha$, and $b_1 = \beta$. As you can see from how we defined it, $\lambda((a_1, b_1]) \equiv \lambda_0((a_1, b_1])= \beta - \alpha$. Further, note that $(a_1, b_1] = (\alpha, \beta] = \Omega$. This means that $\lambda(\Omega) = \beta - \alpha$. Then if we were to take $\beta = 1$ and $\alpha = 0$, the Lebesgue measure in this case is something special: it is a *probability* measure. Let's take a look at what this means.

## Probability measures

Finally, we are ready for the most important concept of this section: the probability measure. To define a probability measure, we really don't need to do much: we just add a single property to the definition of a measure. Since probability measures are measures (but the reverse need not be true), all of those nice properties we just learned about measures extend to probability measures, too:

````{prf:definition} Probability Measure
:label: mt:intro:prob_spaces:probmeas:def
Let $(\Omega, \mathcal F, \mu)$ be a measure space. We say that $\mu$ is a probability measure if $\mu(\Omega) = 1$, and we typically denote such measures with $\mathbb P$.
````

Nothing to it: all we added was the condition that the measure, or the *probability*, of the entire event space was $1$. Likewise, we have probability spaces:

````{prf:definition} Probability Space
:label: mt:intro:prob_spaces:probsp:def
The triple $(\Omega, \mathcal F, \mathbb P)$ is called a probability space, where:
1. $\Omega$ is a set (the event space),
2. $\mathcal F$ is a set of events, where $\mathcal F$ is a $\sigma$-algebra on $\Omega$,
3. $\mathbb P: \mathcal F \rightarrow [0, 1]$ is a probability measure that assigns probabilities to events.
````

In that last definition, we snuck in a fact about probability measures $\mathbb P$: that their upper bound is always $1$ (i.e., $\mathbb P$ ascribes values in $[0, 1]$), which wasn't *quite* what the definition of a probability measure states unless you look closely. This is a consequence of the fact that any $F \in \mathcal F$ is a subset of $\Omega$; i.e., $F \subseteq \Omega$, so therefore $\mathbb P(F) \leq \mathbb P(\Omega) = 1$, simply by {prf:ref}`mt:intro:prob_spaces:meas:monotone`.
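
A fair die gives a compact probability space on which these facts can be checked directly. The sketch below assumes the uniform measure on six outcomes with the power set as $\mathcal F$; the names `P`, `events`, and `even` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))   # outcomes of one roll of a die

def P(F):
    """Uniform probability measure: each face has probability 1/6 (an assumption)."""
    return Fraction(len(F), len(omega))

# The sigma-algebra here is the power set: every subset of omega is an event.
events = [frozenset(c) for r in range(len(omega) + 1)
          for c in combinations(sorted(omega), r)]

even = frozenset({2, 4, 6})
print(P(omega), P(even), P(omega - even))

# Every event has probability between 0 and 1, as monotonicity predicts.
bounded = all(0 <= P(F) <= 1 for F in events)
print(len(events), bounded)
```

The bound $\mathbb P(F) \leq 1$ is not imposed anywhere in the code; it falls out of $F \subseteq \Omega$ and $\mathbb P(\Omega) = 1$.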

### Almost surely

Now that we have probability measures under our belt, we are ready to adapt some lingo we used for measures to probability statements:

````{prf:definition} Almost surely
:label: mt:intro:prob_spaces:meas:as
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, and let $\mathcal S : \Omega \rightarrow \{0, 1\}$ be a statement about points in the event space $\Omega$ that is either true or false. The statement $\mathcal S$ is said to hold almost surely (a.s.) if $F = \{\omega \in \Omega : \mathcal S(\omega)\text{ is true}\} \in \mathcal F$ and $\mathbb P(F) = 1$.
````

We can understand this to be, in effect, an (equivalent) reversal of what we said for almost everywhere in {prf:ref}`mt:intro:prob_spaces:meas:ae`. Whereas for almost everywhere, we asserted that the set on which the statement does *not* hold has measure $0$, for almost surely, we asserted that the set on which the statement *does* hold has probability $1$. But, if a statement holds almost surely, it also holds almost everywhere: with $F$ as above, $F^c = \Omega \setminus F = \{\omega \in \Omega : \mathcal S(\omega)\text{ is false}\}$, and:
```{math}
    \mathbb P(F^c) &= \mathbb P(\Omega \setminus F) \\
    &= \mathbb P(\Omega) - \mathbb P(F),\,\,\,\,F \subseteq \Omega\\
    &= 1 - 1 = 0.
```
The language "almost surely" is typically just used to make explicit the fact that the measure that the statement holds with respect to is a probability measure.