# **Information and σ-Algebras**

## **Mathematical Modeling of Information**

In the theory of derivative pricing and no-arbitrage, we often need to **describe and use the information available at different points** in time. This is critical for:

1. Constructing hedge portfolios.
2. Modeling uncertainty.
3. Defining the flow of information and decision-making.

To mathematically capture this idea, we rely on **σ-algebras**, which represent collections of subsets of a sample space that are "resolved" by the available information over time.

---

## **Definition: σ-Algebra**

We first build intuition for a σ-algebra as a model of information, and then state the formal definition.

### **Intuitive Statement for a σ-Algebra**

A **σ-algebra** is a mathematical structure that represents all the information we have about a random experiment up to a time $t$. Think of it as a "lens" through which we view the outcome of the experiment. Each set in the σ-algebra is an event whose occurrence we can decide (true or false) from the information available, once the actual outcome is realized.

- At the most basic level (the **trivial σ-algebra**), only two events can be decided:
  1. The sure event $\Omega$ (some outcome has occurred).
  2. The impossible event $\emptyset$ (no outcome has occurred).

- As we gain more information, the σ-algebra "expands," allowing us to resolve finer details about the outcome. For example:
  - Knowing the result of a coin toss divides the outcomes into two groups: "heads" and "tails."
  - Knowing the first two coin tosses divides the outcomes into four groups based on the first and second tosses.

- In essence, a σ-algebra organizes the possible outcomes of an experiment into a structured framework of "what we know" and "what remains uncertain." This makes it possible to assign probabilities consistently to events while respecting the information available.

**Analogy:** 
Imagine you are solving a mystery with a sequence of clues. Initially, you have no clues, so your understanding is vague (the trivial σ-algebra). As you uncover each clue, your understanding sharpens, allowing you to eliminate some possibilities and focus on others. The σ-algebra represents the structured set of possibilities consistent with the clues you've gathered so far.

### Formal definition

With this intuition in place, we can state the formal definition.

A **σ-algebra** $\mathcal{F}$ on a sample space $\Omega$ is a collection of subsets of $\Omega$ (called **events**) that satisfies the following properties:
1. $\Omega \in \mathcal{F}$ (the full space is resolved).
2. If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$ (closure under complements).
3. If $A_1, A_2, \dots \in \mathcal{F}$, then $\bigcup_{n=1}^\infty A_n \in \mathcal{F}$ (closure under countable unions).

These properties ensure that a σ-algebra is a mathematically consistent collection of events to which probabilities can be assigned.
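On a finite sample space, the three axioms can be checked mechanically. The following is a minimal sketch, assuming a finite $\Omega$; the function name `is_sigma_algebra` and the example families are illustrative and not part of the original notes. For a finite family, closure under pairwise unions already implies closure under countable unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the three sigma-algebra axioms for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # Axiom 1: the full space belongs to the family.
    if omega not in family:
        return False
    # Axiom 2: closure under complements.
    if any(omega - a not in family for a in family):
        return False
    # Axiom 3: closure under unions (pairwise unions suffice for a finite family).
    return all(a | b in family for a, b in combinations(family, 2))

omega = {"HH", "HT", "TH", "TT"}
trivial = [set(), omega]
first_toss = [set(), omega, {"HH", "HT"}, {"TH", "TT"}]
not_closed = [set(), omega, {"HH"}]  # the complement of {"HH"} is missing

print(is_sigma_algebra(omega, trivial))
print(is_sigma_algebra(omega, first_toss))
print(is_sigma_algebra(omega, not_closed))
```

The third family fails only because it violates closure under complements, which shows that each axiom is a real constraint.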

---

## **Information and σ-Algebras**

The sets in a σ-algebra represent the events resolved by the available information. For example:
- Let $\Omega$ be the sample space of outcomes of three coin tosses: 
  $$
  \Omega = \{\text{HHH}, \text{HHT}, \text{HTH}, \text{HTT}, \text{THH}, \text{THT}, \text{TTH}, \text{TTT}\}.
  $$

1. **No information**:
   At the start, we know nothing about the outcomes. The **trivial σ-algebra** is:
   $$
   \mathcal{F}_0 = \{\emptyset, \Omega\}.
   $$

2. **First coin toss revealed**:
   If we are told the result of the first toss, we can have a more precise information set, and the σ-algebra becomes:
   $$
   \mathcal{F}_1 = \{\emptyset, \Omega, A_H, A_T\},
   $$
   where $A_H = \{\text{HHH, HHT, HTH, HTT}\}$ and $A_T = \{\text{THH, THT, TTH, TTT}\}$.

3. **First two coin tosses revealed**:
   Knowing the first two tosses refines the σ-algebra:
   $$
   \mathcal{F}_2 = \{\emptyset, \Omega, A_{HH}, A_{HT}, A_{TH}, A_{TT}, \dots\},
   $$
   where $A_{HH} = \{\text{HHH, HHT}\}$, $A_{HT} = \{\text{HTH, HTT}\}$, etc.

   More precisely, $\mathcal{F}_2$ consists of the following 16 sets:

   1. All elements of $\mathcal{F}_1$:
      - $A_H = \{HHH, HHT, HTH, HTT\}$
      - $A_T = \{THH, THT, TTH, TTT\}$
   2. The four atoms, one for each outcome of the first two tosses (the third toss is left open):
      - $A_{HH} = \{HHH, HHT\}$
      - $A_{HT} = \{HTH, HTT\}$
      - $A_{TH} = \{THH, THT\}$
      - $A_{TT} = \{TTH, TTT\}$
   3. All unions of two atoms (required by closure under unions):
      - $A_H = A_{HH} \cup A_{HT}$ and $A_T = A_{TH} \cup A_{TT}$ (already listed above)
      - $A_{TH} \cup A_{HT}$, $A_{HH} \cup A_{TH}$, $A_{TT} \cup A_{HT}$, $A_{HH} \cup A_{TT}$
   4. The complements of the atoms, each of which is a union of three atoms (required by closure under complements):
      - $A_{HH}^c$, $A_{HT}^c$, $A_{TH}^c$, $A_{TT}^c$
   5. The empty set and the full space (required by the definition):
      - $\Omega$, $\emptyset$

4. **All three coin tosses revealed**:
   When all three tosses are known, every subset of $\Omega$ is resolved, so:
   $$
   \mathcal{F}_3 = 2^\Omega,
   $$
   the power set of $\Omega$. Since $\Omega$ has 8 elements, $\mathcal{F}_3$ contains $2^8 = 256$ sets.
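The four σ-algebras above can be enumerated directly. The sketch below assumes the three-toss sample space and builds each σ-algebra as the collection of all unions of its atoms (the groups of outcomes that share the first $n$ tosses). The helper names `atoms_after` and `sigma_algebra_from_atoms` are illustrative.

```python
from itertools import combinations, product

# All eight outcomes of three coin tosses, e.g. "HHT".
omega = ["".join(t) for t in product("HT", repeat=3)]

def atoms_after(n):
    """Partition omega into groups of outcomes that agree on the first n tosses."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:n], set()).add(w)
    return [frozenset(g) for g in groups.values()]

def sigma_algebra_from_atoms(atoms):
    """All unions of atoms; the empty union gives the empty set."""
    sets = set()
    for k in range(len(atoms) + 1):
        for chosen in combinations(atoms, k):
            sets.add(frozenset().union(*chosen))
    return sets

# Number of sets in F_0, F_1, F_2, F_3.
sizes = [len(sigma_algebra_from_atoms(atoms_after(n))) for n in range(4)]
print(sizes)  # [2, 4, 16, 256]
```

With $k$ atoms the σ-algebra has $2^k$ sets, which is why the count grows from 2 to 4 to 16 to 256 as each toss is revealed.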

---

## **Definition: Filtration**

After each coin toss we learn more about the outcome, so the σ-algebra becomes finer: if $m > n$, then $\mathcal{F}_m$ contains every set in $\mathcal{F}_n$ and possibly more. A family of σ-algebras indexed by time and nested in this way is called a **filtration**. The definition below is stated in continuous time.

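A **filtration** $\{\mathcal{F}(t)\}_{t \in [0, T]}$ is a family of σ-algebras indexed by time, satisfying $\mathcal{F}(s) \subseteq \mathcal{F}(t)$ for all $0 \leq s \leq t \leq T$ (information grows over time).

It is often additionally assumed that $\mathcal{F}(0) = \{\emptyset, \Omega\}$ (no information at the start), as in the coin-toss example, although this is a convention rather than part of the definition.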

Filtrations describe the progressive accumulation of information over time.
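For the discrete coin-toss example, the nesting requirement can be verified explicitly. This is a minimal sketch under the assumption that time is the number of tosses revealed ($n = 0, 1, 2, 3$) rather than a continuous index; `sigma_algebra_after` is an illustrative helper name.

```python
from itertools import combinations, product

omega = ["".join(t) for t in product("HT", repeat=3)]

def sigma_algebra_after(n):
    """Sigma-algebra of events resolved by the first n tosses (finite case)."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:n], set()).add(w)
    atoms = [frozenset(g) for g in groups.values()]
    # Every event is a union of atoms.
    return {frozenset().union(*c)
            for k in range(len(atoms) + 1)
            for c in combinations(atoms, k)}

filtration = [sigma_algebra_after(n) for n in range(4)]

# Information grows: F_0 is contained in F_1, F_1 in F_2, F_2 in F_3.
is_increasing = all(filtration[n] <= filtration[n + 1] for n in range(3))
print(is_increasing)
```

The reverse inclusion fails: for instance, $\mathcal{F}_2$ is not contained in $\mathcal{F}_1$, because $A_{HH}$ cannot be decided from the first toss alone.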


---

## **Generated σ-Algebra**
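The **σ-algebra generated by a random variable** $X$, denoted $\sigma(X)$, is the collection of all sets of the form $\{X \in B\}$:
$$
\sigma(X) = \{\{X \in B\} : B \subseteq \mathbb{R} \text{ is a Borel set}\}, \qquad \{X \in B\} = \{\omega \in \Omega : X(\omega) \in B\}.
$$

This is the smallest σ-algebra with respect to which $X$ is measurable. It contains exactly the information obtained by observing the value of $X$.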

### Example: Three-Period Coin Toss Model

We consider the set of all possible outcomes $\Omega$ from three coin tosses. The total sample space is:

$$
\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.
$$

The random variable $S_2$ is defined based on the results of the first two coin tosses:

$$
\begin{aligned}
S_2(HHH) &= S_2(HHT) = 16, \\
S_2(HTH) &= S_2(HTT) = S_2(THH) = S_2(THT) = 4, \\
S_2(TTH) &= S_2(TTT) = 1.
\end{aligned}
$$

Here, $S_2$ only depends on the first two coin tosses but is expressed as a function of all three tosses.

#### Constructing the σ-Algebra $\sigma(S_2)$:
The σ-algebra $\sigma(S_2)$ is generated by the sets of outcomes that can be distinguished by the value of $S_2$. Specifically, the sets corresponding to different values of $S_2$ are:

- For $S_2 = 16$: 
  $$
  A_{HH} = \{HHH, HHT\}.
  $$

- For $S_2 = 4$:
  $$
  A_{HT} \cup A_{TH} = \{HTH, HTT, THH, THT\}.
  $$

- For $S_2 = 1$:
  $$
  A_{TT} = \{TTH, TTT\}.
  $$

Because these three sets partition $\Omega$, the σ-algebra $\sigma(S_2)$ consists of all unions of them (including the empty union). This gives exactly $2^3 = 8$ sets:
$$
\{ \emptyset, \Omega, A_{HH}, A_{HT} \cup A_{TH}, A_{TT}, A_{HH} \cup A_{HT} \cup A_{TH}, A_{HH} \cup A_{TT}, A_{HT} \cup A_{TH} \cup A_{TT} \}.
$$

#### Relationship Between $\sigma(S_2)$ and $\mathcal{F}_2$:
- The σ-algebra $\mathcal{F}_2$ contains all the information about the first two coin tosses. It includes sets such as $A_{HT}$ and $A_{TH}$ separately, as these correspond to distinct outcomes of the first two tosses.
- In contrast, $\sigma(S_2)$ does not distinguish between $A_{HT}$ and $A_{TH}$ because $S_2$ only provides their combined value of $4$. Hence, $A_{HT} \cup A_{TH} \in \sigma(S_2)$, but neither $A_{HT}$ nor $A_{TH}$ appears individually.

#### Measurability:
- The random variable $S_2$ is $\mathcal{F}_2$-measurable because $\mathcal{F}_2$ contains enough information to determine the value of $S_2$.
- $\mathcal{F}_2$ provides more information than $\sigma(S_2)$, as it can distinguish between $A_{HT}$ and $A_{TH}$. However, $\sigma(S_2)$ contains just enough information to determine the value of $S_2$ but no more.

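In summary, $\sigma(S_2)$ is a sub-σ-algebra of $\mathcal{F}_2$ that contains only the information needed to determine the value of $S_2$. Because $\sigma(S_2) \subseteq \mathcal{F}_2$, every set $\{S_2 \in B\}$ belongs to $\mathcal{F}_2$, which is exactly what it means for $S_2$ to be $\mathcal{F}_2$-measurable.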
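The construction of $\sigma(S_2)$ can be reproduced on the finite sample space. The sketch below assumes the values of $S_2$ given above (16, 4, 1) and builds $\sigma(S_2)$ from the level sets $\{S_2 = x\}$; `generated_sigma_algebra` is an illustrative name.

```python
from itertools import combinations, product

omega = ["".join(t) for t in product("HT", repeat=3)]

def S2(w):
    """S_2 as in the text: 16 after two heads, 4 after one head, 1 after none."""
    return {2: 16, 1: 4, 0: 1}[w[:2].count("H")]

def generated_sigma_algebra(X):
    """sigma(X) on a finite space: all unions of the level sets {X = x}."""
    level_sets = {}
    for w in omega:
        level_sets.setdefault(X(w), set()).add(w)
    atoms = [frozenset(s) for s in level_sets.values()]
    return {frozenset().union(*c)
            for k in range(len(atoms) + 1)
            for c in combinations(atoms, k)}

sigma_S2 = generated_sigma_algebra(S2)
A_HT = frozenset({"HTH", "HTT"})
A_TH = frozenset({"THH", "THT"})

print(len(sigma_S2))             # 8
print(A_HT in sigma_S2)          # False
print((A_HT | A_TH) in sigma_S2) # True
```

The two membership checks mirror the argument above: $S_2$ cannot separate $A_{HT}$ from $A_{TH}$, but it does resolve their union.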

---

## **Adapted Stochastic Process**
A **stochastic process** $X(t)$ is adapted to a filtration $\{\mathcal{F}(t)\}$ if:
$$
X(t) \text{ is } \mathcal{F}(t)\text{-measurable for all } t \in [0, T].
$$

This means $X(t)$ depends only on the information available up to time $t$.



---

## Independence in Random Variables

When a random variable is **measurable** with respect to a σ-algebra $\mathcal{G}$, the information contained in $\mathcal{G}$ is sufficient to determine the value of the random variable. At the other extreme, when a random variable is **independent** of a σ-algebra, the information contained in the σ-algebra provides no clue about the value of the random variable. 

### Independence of Sets
Let $(\Omega, \mathcal{F}, P)$ be a probability space. Two sets $A$ and $B$ in $\mathcal{F}$ are **independent** if:
$$
P(A \cap B) = P(A) \cdot P(B).
$$

**Example:**
In $\Omega = \{HH, HT, TH, TT\}$, with $P(HH) = p^2$, $P(HT) = pq$, $P(TH) = pq$, and $P(TT) = q^2$, consider the sets:
- $A = \{\text{head on the first toss}\} = \{HH, HT\}$
- $B = \{\text{head on the second toss}\} = \{HH, TH\}$

We check independence, using $p + q = 1$:
$$
P(A \cap B) = P(HH) = p^2, \quad P(A) = p^2 + pq = p(p + q) = p, \quad P(B) = p^2 + pq = p.
$$
$$
P(A) \cdot P(B) = p \cdot p = p^2 = P(A \cap B).
$$
Thus, $A$ and $B$ are independent.
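The same check can be done numerically with exact arithmetic. This is a small sketch; the value $p = 1/3$ is an arbitrary illustrative choice, and `prob` is a hypothetical helper.

```python
from fractions import Fraction

p = Fraction(1, 3)  # illustrative probability of heads (assumption)
q = 1 - p
P = {"HH": p * p, "HT": p * q, "TH": q * p, "TT": q * q}

def prob(event):
    """Probability of an event, i.e. a set of outcomes."""
    return sum(P[w] for w in event)

A = {"HH", "HT"}  # head on the first toss
B = {"HH", "TH"}  # head on the second toss

print(prob(A & B) == prob(A) * prob(B))  # True
```

Using `Fraction` avoids floating-point rounding, so the equality test is exact.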

### Independence of Random Variables
Let $X$ and $Y$ be random variables on $(\Omega, \mathcal{F}, P)$. They are **independent** if the σ-algebras they generate, $\sigma(X)$ and $\sigma(Y)$, are independent. Formally:
$$
P\{X \in C, Y \in D\} = P\{X \in C\} \cdot P\{Y \in D\},
$$
for all Borel sets $C, D \subseteq \mathbb{R}$.

#### Example: Dependent Random Variables
Consider three independent coin tosses:
- $\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$.
- Define random variables:
  - $S_2$: the stock price after the first two tosses, as in the three-period example above (so $S_2 = 16$ on $A_{HH}$).
  - $S_3$: the stock price after all three tosses, with $S_3(HHH) = 32$ and $S_3(HHT) = 8$.

If $P(H) = p$ and $P(T) = q = 1 - p$ with $0 < p < 1$, the probabilities are:
$$
P(HHH) = p^3, \quad P(HHT) = p^2q, \quad P(TTT) = q^3, \quad \text{etc.}
$$
The random variables $S_2$ and $S_3$ are **not independent** because knowing $S_2 = 16$ restricts $S_3$ to $8$ or $32$ (not all possible values). Formally, $\{S_2 = 16\} = A_{HH}$ and $\{S_3 = 32\} = \{HHH\}$, so:
$$
P\{S_2 = 16, S_3 = 32\} = P\{HHH\} = p^3,
$$
but:
$$
P\{S_2 = 16\} \cdot P\{S_3 = 32\} = p^2 \cdot p^3 = p^5.
$$
Since $p^3 \neq p^5$ for $0 < p < 1$, we have $P\{S_2 = 16, S_3 = 32\} \neq P\{S_2 = 16\} \cdot P\{S_3 = 32\}$, and $S_2$ and $S_3$ are dependent.
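A numerical version of this comparison is sketched below. It assumes that $S_2$ and $S_3$ are stock prices with initial price $S_0 = 4$, where each head doubles the price and each tail halves it. These parameters are an assumption chosen to match the values 16, 4, 1 and 8, 32 used in the text. The choice $p = 1/2$ is illustrative.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)  # illustrative probability of heads (assumption)
q = 1 - p
omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

def S(n, w):
    """Price after n tosses, assuming S_0 = 4, heads doubles, tails halves."""
    h = w[:n].count("H")
    return Fraction(4) * Fraction(2) ** h / Fraction(2) ** (n - h)

def prob(pred):
    """Probability of the event {w : pred(w)}."""
    return sum(P[w] for w in omega if pred(w))

joint = prob(lambda w: S(2, w) == 16 and S(3, w) == 32)
product_of_marginals = prob(lambda w: S(2, w) == 16) * prob(lambda w: S(3, w) == 32)

print(joint, product_of_marginals)  # 1/8 1/32
```

With $p = 1/2$ the joint probability is $p^3 = 1/8$ while the product of the marginals is $p^5 = 1/32$, matching the formulas above.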

### Independence of σ-Algebras
Let $\mathcal{G}_1$ and $\mathcal{G}_2$ be sub-σ-algebras of $\mathcal{F}$. They are **independent** if:
$$
P(A \cap B) = P(A) \cdot P(B), \quad \forall A \in \mathcal{G}_1, \, B \in \mathcal{G}_2.
$$

### Theorem: Properties of Independence
1. If $X$ and $Y$ are independent, then any Borel-measurable functions $f(X)$ and $g(Y)$ are also independent.
2. If $X$ and $Y$ have a joint density, then they are independent if and only if the joint density factors into the product of the marginal densities:
   $$ f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y). $$

### Intuition for Independence:
Independence implies that knowing the outcome of one random variable provides no information about the other. For example, knowing the result of one coin toss does not affect the probability of the next toss.


---

# Conditional Expectation

Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$, and let $X$ be a random variable that is either nonnegative or integrable. The **conditional expectation** of $X$ given $\mathcal{G}$, denoted by $\mathbb{E}[X \mid \mathcal{G}]$, is any random variable that satisfies:

1. **Measurability**: 
   $$\mathbb{E}[X \mid \mathcal{G}] \text{ is } \mathcal{G}\text{-measurable.}$$

2. **Partial Averaging**: 
   For all $A \in \mathcal{G}$:
   $$
   \int_A \mathbb{E}[X \mid \mathcal{G}](\omega) \, dP(\omega) = \int_A X(\omega) \, dP(\omega).
   $$

If $\mathcal{G}$ is the σ-algebra generated by a random variable $W$ (i.e., $\mathcal{G} = \sigma(W)$), we typically write $\mathbb{E}[X \mid W]$ instead of $\mathbb{E}[X \mid \sigma(W)]$.

---

## Conditions of Conditional Expectation

### 1. **Measurability**

Property 1 ensures that $\mathbb{E}[X \mid \mathcal{G}]$ is a $\mathcal{G}$-measurable random variable, meaning that the value of the estimate $\mathbb{E}[X \mid \mathcal{G}]$ can be determined from the information contained in $\mathcal{G}$. In simpler terms, once we know which events in $\mathcal{G}$ have occurred, we know the value of $\mathbb{E}[X \mid \mathcal{G}]$.

### 2. **Partial Averaging**
Property 2 guarantees that the conditional expectation $\mathbb{E}[X \mid \mathcal{G}]$ has the same average as $X$ over every set in $\mathcal{G}$. For example, if $\mathcal{G}$ is generated by some random variable $W$, $\mathbb{E}[X \mid W]$ represents the best estimate of $X$ given the value of $W$ while maintaining consistency with $X$'s averages over $W$'s outcomes.

When $\mathcal{G}$ is generated by finitely many **atoms** (i.e., the indivisible sets in $\mathcal{G}$), measurability forces $\mathbb{E}[X \mid \mathcal{G}]$ to be constant on each atom, and partial averaging fixes that constant as the probability-weighted average of $X$ over the atom.

### Intuition Behind Conditional Expectation
- **Measurability** ensures that the estimate $\mathbb{E}[X \mid \mathcal{G}]$ can be computed using the information in $\mathcal{G}$. 
- **Partial Averaging** ensures that $\mathbb{E}[X \mid \mathcal{G}]$ is a faithful estimate of $X$ on average over events in $\mathcal{G}$. 
- If $\mathcal{G} = \sigma(W)$, then $\mathbb{E}[X \mid W]$ uses the information provided by $W$ to estimate $X$ while maintaining consistency with $X$'s probability distribution.

### Example: Conditional Expectation in a Three-Period Model
In the context of a three-period coin toss model:
1. $\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$.
2. $\mathcal{F}_2$ is the σ-algebra generated by the outcomes of the first two tosses. It consists of all unions of four atoms (16 sets in total, including $\emptyset$ and $\Omega$):
   $$
   \mathcal{F}_2 = \{\text{all unions of } A_{HH}, A_{HT}, A_{TH}, A_{TT}\},
   $$
   where:
   - $A_{HH} = \{HHH, HHT\}$,
   - $A_{HT} = \{HTH, HTT\}$,
   - $A_{TH} = \{THH, THT\}$,
   - $A_{TT} = \{TTH, TTT\}$.

The random variable $S_3$ represents a stock price that depends on all three coin tosses. Let $p$ be the probability of heads and $q = 1 - p$ the probability of tails, and write $\mathbb{E}_2[S_3]$ for $\mathbb{E}[S_3 \mid \mathcal{F}_2]$. Since $\mathbb{E}_2[S_3]$ depends only on the first two tosses, we write its value on each atom as a function of those two tosses:
$$
\mathbb{E}_2[S_3](HH) = pS_3(HHH) + qS_3(HHT),
$$
$$
\mathbb{E}_2[S_3](HT) = pS_3(HTH) + qS_3(HTT),
$$
$$
\mathbb{E}_2[S_3](TH) = pS_3(THH) + qS_3(THT),
$$
$$
\mathbb{E}_2[S_3](TT) = pS_3(TTH) + qS_3(TTT).
$$

Multiplying each equation by the probability of the corresponding atom ($P(A_{HH}) = p^2$, $P(A_{HT}) = P(A_{TH}) = pq$, $P(A_{TT}) = q^2$) gives the probability-weighted form:

$$
\mathbb{E}_2[S_3](HH) \, P(A_{HH}) = \sum_{\omega \in A_{HH}} S_3(\omega) P(\omega),
$$
$$
\mathbb{E}_2[S_3](HT) \, P(A_{HT}) = \sum_{\omega \in A_{HT}} S_3(\omega) P(\omega),
$$
$$
\mathbb{E}_2[S_3](TH) \, P(A_{TH}) = \sum_{\omega \in A_{TH}} S_3(\omega) P(\omega),
$$
$$
\mathbb{E}_2[S_3](TT) \, P(A_{TT}) = \sum_{\omega \in A_{TT}} S_3(\omega) P(\omega).
$$

Because $\mathbb{E}_2[S_3]$ does not depend on the third toss, it is constant on each atom $A$, so each left-hand side can be written as the integral of $\mathbb{E}_2[S_3]$ over $A$. The right-hand sides are sums, which are Lebesgue integrals on a finite probability space.

So, for the $HH$ case, we have:

$$
\int_{A_{HH}} \mathbb{E}_2[S_3](\omega) \, dP(\omega) = \int_{A_{HH}} S_3(\omega) \, dP(\omega).
$$

In other words, on each of the atoms the value of the conditional expectation has been chosen to be that constant that yields the same average over the atom as the random variable $S_3$ being estimated.

In general, for any atom $A \in \mathcal{F}_2$, $\mathbb{E}_2[S_3]$ is constant on $A$ and satisfies the partial averaging property:
$$
\int_A \mathbb{E}_2[S_3](\omega) \, dP(\omega) = \int_A S_3(\omega) \, dP(\omega).
$$

This illustrates how $\mathbb{E}[X \mid \mathcal{G}]$ uses the information in $\mathcal{G}$ to estimate $X$ while maintaining its average value over events in $\mathcal{G}$.
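The computation above can be sketched in code. The block assumes a stock price with $S_0 = 4$ that doubles on heads and halves on tails (an assumption consistent with the values used earlier), and $p = 1/2$ as an illustrative probability. The helpers `cond_exp` and `integral` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)  # illustrative probability of heads (assumption)
q = 1 - p
omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

def S3(w):
    """Price after three tosses, assuming S_0 = 4, heads doubles, tails halves."""
    h = w.count("H")
    return Fraction(4) * Fraction(2) ** h / Fraction(2) ** (3 - h)

def cond_exp(n, X):
    """E[X | F_n] on a finite space: the P-weighted average of X over the
    atom (outcomes sharing the first n tosses) that contains w."""
    def value(w):
        atom = [v for v in omega if v[:n] == w[:n]]
        return sum(X(v) * P[v] for v in atom) / sum(P[v] for v in atom)
    return value

def integral(X, event):
    """Integral of X over an event with respect to P (a finite sum here)."""
    return sum(X(w) * P[w] for w in event)

E2_S3 = cond_exp(2, S3)
atoms = {"HH": ["HHH", "HHT"], "HT": ["HTH", "HTT"],
         "TH": ["THH", "THT"], "TT": ["TTH", "TTT"]}

# Partial averaging: both integrals agree on every atom of F_2.
partial_averaging = all(integral(E2_S3, a) == integral(S3, a) for a in atoms.values())
print(E2_S3("HHH"), partial_averaging)  # 20 True
```

On $A_{HH}$ the conditional expectation is $p \cdot 32 + q \cdot 8 = 20$, and it takes that same value at both `HHH` and `HHT`, which is the constancy on atoms described above.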

---

## Properties of Conditional Expectation

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. The following properties hold:

### 1. **Linearity of Conditional Expectations**
If $X$ and $Y$ are integrable random variables and $c_1$ and $c_2$ are constants, then:
$$
\mathbb{E}[c_1 X + c_2 Y \mid \mathcal{G}] = c_1 \mathbb{E}[X \mid \mathcal{G}] + c_2 \mathbb{E}[Y \mid \mathcal{G}].
$$
This equation also holds if $X$ and $Y$ are nonnegative (rather than integrable) and $c_1$ and $c_2$ are positive, although both sides may equal $+\infty$.

### 2. **Taking Out What Is Known**
If $X$ and $Y$ are integrable random variables, $XY$ is integrable, and $X$ is $\mathcal{G}$-measurable, then:
$$
\mathbb{E}[XY \mid \mathcal{G}] = X \mathbb{E}[Y \mid \mathcal{G}].
$$
This equation also holds if $X$ is positive and $Y$ is nonnegative (rather than integrable), although both sides may equal $+\infty$.

### 3. **Iterated Conditioning**
If $\mathcal{H}$ is a sub-σ-algebra of $\mathcal{G}$ (i.e., $\mathcal{H}$ contains less information than $\mathcal{G}$) and $X$ is an integrable random variable, then:
$$
\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}].
$$
This equation also holds if $X$ is nonnegative (rather than integrable), although both sides may equal $+\infty$.

### 4. **Independence**
If $X$ is integrable and independent of $\mathcal{G}$, then:
$$
\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X].
$$
This equation also holds if $X$ is nonnegative (rather than integrable), although both sides may equal $+\infty$.

### 5. **Conditional Jensen's Inequality**
If $\varphi(x)$ is a convex function of a dummy variable $x$ and $X$ is integrable, then:
$$
\mathbb{E}[\varphi(X) \mid \mathcal{G}] \geq \varphi(\mathbb{E}[X \mid \mathcal{G}]).
$$

### Intuitive Explanation
1. **Linearity** ensures that the conditional expectation respects linear combinations of random variables.
2. **Taking out what is known** highlights that $\mathcal{G}$-measurable components can be factored out of the conditional expectation.
3. **Iterated conditioning** guarantees consistency of nested expectations with respect to information contained in smaller σ-algebras.
4. **Independence** reflects that the conditional expectation equals the marginal expectation when $X$ is independent of $\mathcal{G}$.
5. **Conditional Jensen's inequality** extends the classical Jensen's inequality to conditional expectations: a convex function of the conditional expectation is never larger than the conditional expectation of the convex function.

These properties form the foundation of working with conditional expectations in both discrete and continuous probability spaces.
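Several of these properties can be checked numerically in the three-toss model. The following sketch assumes a stock price with $S_0 = 4$ that doubles on heads and halves on tails, $p = 1/3$ as an illustrative probability, and $\varphi(x) = x^2$ as the convex function. It is a check on one example, not a proof.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 3)  # illustrative probability of heads (assumption)
q = 1 - p
omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

def S(n, w):
    """Price after n tosses, assuming S_0 = 4, heads doubles, tails halves."""
    h = w[:n].count("H")
    return Fraction(4) * Fraction(2) ** h / Fraction(2) ** (n - h)

def cond_exp(n, X):
    """E[X | F_n]: P-weighted average of X over the atom containing w."""
    def value(w):
        atom = [v for v in omega if v[:n] == w[:n]]
        return sum(X(v) * P[v] for v in atom) / sum(P[v] for v in atom)
    return value

S3 = lambda w: S(3, w)
E2_S3 = cond_exp(2, S3)

# Iterated conditioning: E[E[S_3 | F_2] | F_1] = E[S_3 | F_1].
tower = all(cond_exp(1, E2_S3)(w) == cond_exp(1, S3)(w) for w in omega)

# Taking out what is known: S_2 is F_2-measurable, Y = S_3 / S_2.
Y = lambda w: S(3, w) / S(2, w)
take_out = all(cond_exp(2, lambda v: S(2, v) * Y(v))(w) == S(2, w) * cond_exp(2, Y)(w)
               for w in omega)

# Conditional Jensen with the convex function phi(x) = x^2.
jensen = all(cond_exp(2, lambda v: S3(v) ** 2)(w) >= E2_S3(w) ** 2 for w in omega)

print(tower, take_out, jensen)  # True True True
```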

---

## Independence in Conditional Expectation

The following lemma combines measurability and independence. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. Suppose the random variables $X_1, \ldots, X_K$ are $\mathcal{G}$-measurable, and the random variables $Y_1, \ldots, Y_L$ are independent of $\mathcal{G}$. Let $f(x_1, \ldots, x_K, y_1, \ldots, y_L)$ be a function of the dummy variables $x_1, \ldots, x_K$ and $y_1, \ldots, y_L$, and define:
$$
g(x_1, \ldots, x_K) = \mathbb{E}[f(x_1, \ldots, x_K, Y_1, \ldots, Y_L)].
$$
Then:
$$
\mathbb{E}[f(X_1, \ldots, X_K, Y_1, \ldots, Y_L) \mid \mathcal{G}] = g(X_1, \ldots, X_K).
$$

### Intuition Behind the Lemma
1. Since the random variables $X_1, \ldots, X_K$ are $\mathcal{G}$-measurable, the information in $\mathcal{G}$ is sufficient to determine the values of $X_1, \ldots, X_K$.
2. The random variables $Y_1, \ldots, Y_L$ are independent of $\mathcal{G}$. Thus, their contribution can be "integrated out" without any dependence on the information in $\mathcal{G}$.
3. The function $g(x_1, \ldots, x_K)$, defined as the expected value of $f(x_1, \ldots, x_K, Y_1, \ldots, Y_L)$ over the distribution of $Y_1, \ldots, Y_L$, captures this integration step.
4. Finally, the result depends on the values of $X_1, \ldots, X_K$, which are then replaced by their corresponding random variables to yield the random variable $\mathbb{E}[f(X_1, \ldots, X_K, Y_1, \ldots, Y_L) \mid \mathcal{G}]$.

### Key Takeaway
This lemma highlights how independence between random variables and a σ-algebra simplifies conditional expectations. The contribution of independent random variables is integrated out, leaving a dependence only on the $\mathcal{G}$-measurable random variables.
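A concrete instance of the lemma, sketched in the three-toss model: take $\mathcal{G} = \mathcal{F}_2$, $X = S_2$ (which is $\mathcal{F}_2$-measurable), and $Y$ the price factor contributed by the third toss (which is independent of $\mathcal{F}_2$). The price dynamics ($S_0 = 4$, doubling on heads, halving on tails), the probability $p = 1/3$, and the function $f(x, y) = \max(xy - 5, 0)$ are all illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 3)  # illustrative probability of heads (assumption)
q = 1 - p
omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

def S2(w):
    """X in the lemma: price after two tosses (F_2-measurable)."""
    h = w[:2].count("H")
    return Fraction(4) * Fraction(2) ** h / Fraction(2) ** (2 - h)

def Y(w):
    """Third-toss factor: 2 on heads, 1/2 on tails (independent of F_2)."""
    return Fraction(2) if w[2] == "H" else Fraction(1, 2)

def f(x, y):
    """Hypothetical function of the dummy variables x and y."""
    return max(x * y - 5, 0)

def g(x):
    """g(x) = E[f(x, Y)]: integrate out Y while holding x fixed."""
    return p * f(x, Fraction(2)) + q * f(x, Fraction(1, 2))

def cond_exp(n, X):
    """E[X | F_n]: P-weighted average of X over the atom containing w."""
    def value(w):
        atom = [v for v in omega if v[:n] == w[:n]]
        return sum(X(v) * P[v] for v in atom) / sum(P[v] for v in atom)
    return value

lhs = cond_exp(2, lambda w: f(S2(w), Y(w)))
lemma_holds = all(lhs(w) == g(S2(w)) for w in omega)
print(lemma_holds)  # True
```

Note that `g` is computed with `x` treated as a constant, and only afterwards is $S_2$ substituted for `x`, which is the two-step procedure described in the intuition above.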

---

## Martingales, Submartingales, and Supermartingales

Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $T$ be a fixed positive number, and let $\mathcal{F}(t)$, $0 \leq t \leq T$, be a filtration of sub-σ-algebras of $\mathcal{F}$. Consider an adapted stochastic process $M(t)$, $0 \leq t \leq T$. The definitions of martingales, submartingales, and supermartingales capture how the process $M(t)$ evolves over time with respect to the information captured by the filtration $\mathcal{F}(t)$.

#### (i) Martingale
The process $M(t)$ is a **martingale** if for all $0 \leq s \leq t \leq T$:
$$
\mathbb{E}[M(t) \mid \mathcal{F}(s)] = M(s).
$$

**Explanation:**
- A martingale represents a "fair game" where the conditional expected future value of the process, given all the information up to time $s$ (encoded in $\mathcal{F}(s)$), is equal to the value of the process at time $s$.
- There is **no tendency for the process to rise or fall** over time.
- Martingales are central in financial modeling, particularly for modeling fair asset prices in the absence of arbitrage.

**Intuition:**
The process reflects a situation where, given everything known at present, the best estimate of the future value is the current value. A standard example in finance is the discounted price of an asset under a risk-neutral probability measure.

#### (ii) Submartingale
The process $M(t)$ is a **submartingale** if for all $0 \leq s \leq t \leq T$:
$$
\mathbb{E}[M(t) \mid \mathcal{F}(s)] \geq M(s).
$$

**Explanation:**
- A submartingale reflects a process with **no tendency to decrease** over time; it may have a tendency to increase.
- The conditional expected value of the future process, given information up to time $s$, is at least as large as its current value.

**Intuition:**
The process has no downward bias and may drift upward, such as a stock price with positive drift or a betting scenario where the odds are in your favor.

#### (iii) Supermartingale
The process $M(t)$ is a **supermartingale** if for all $0 \leq s \leq t \leq T$:
$$
\mathbb{E}[M(t) \mid \mathcal{F}(s)] \leq M(s).
$$

**Explanation:**
- A supermartingale reflects a process with **no tendency to increase** over time; it may have a tendency to decrease.
- The conditional expected value of the future process, given information up to time $s$, is at most as large as its current value.

**Intuition:**
The process has no upward bias and may drift downward, such as a depreciating asset or a betting scenario where the odds are against you.

### Key Points to Note
1. **Adaptation to Filtration:**
   - The process $M(t)$ is **adapted** to the filtration $\mathcal{F}(t)$, meaning the value of $M(t)$ at any time $t$ is determined by the information available up to time $t$.

2. **Filtration $\mathcal{F}(t)$:**
   - This is a family of σ-algebras representing the accumulation of information over time. At time $t$, $\mathcal{F}(t)$ captures all information available up to and including $t$.

3. **Comparison:**
   - Martingale: No bias in the evolution of $M(t)$.
   - Submartingale: Potential upward bias.
   - Supermartingale: Potential downward bias.
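The three cases can be illustrated in discrete time with the coin-toss stock price. The sketch below assumes $S_0 = 4$ with the price doubling on heads and halving on tails, so that $\mathbb{E}[S_{n+1} \mid \mathcal{F}_n] = (2p + q/2) S_n$. It uses discrete time steps $n = 0, 1, 2, 3$ in place of the continuous index $t$, and `classify` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]

def S(n, w):
    """Price after n tosses, assuming S_0 = 4, heads doubles, tails halves."""
    h = w[:n].count("H")
    return Fraction(4) * Fraction(2) ** h / Fraction(2) ** (n - h)

def classify(p):
    """Classify S_0, ..., S_3 under heads-probability p, with 0 < p < 1."""
    q = 1 - p
    P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

    def cond_exp(n, X):
        def value(w):
            atom = [v for v in omega if v[:n] == w[:n]]
            return sum(X(v) * P[v] for v in atom) / sum(P[v] for v in atom)
        return value

    # One-step differences E[S_{n+1} | F_n] - S_n. In discrete time the
    # one-step condition implies the multi-step one by iterated conditioning.
    diffs = [cond_exp(n, lambda v, n=n: S(n + 1, v))(w) - S(n, w)
             for n in range(3) for w in omega]
    if all(d == 0 for d in diffs):
        return "martingale"
    if all(d >= 0 for d in diffs):
        return "submartingale"
    if all(d <= 0 for d in diffs):
        return "supermartingale"
    return "none of the three"

for p in (Fraction(1, 3), Fraction(1, 2), Fraction(1, 4)):
    print(p, classify(p))
```

Under these assumptions $p = 1/3$ gives $2p + q/2 = 1$ (a martingale), $p = 1/2$ gives an upward drift, and $p = 1/4$ gives a downward drift. A martingale is also both a submartingale and a supermartingale; the function simply reports the most specific label.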

## Markov Process

Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $T$ be a fixed positive number, and let $\mathcal{F}(t)$, $0 \leq t \leq T$, be a filtration of sub-σ-algebras of $\mathcal{F}$. Consider an **adapted stochastic process** $X(t)$, $0 \leq t \leq T$. We say that $X(t)$ is a **Markov process** if for all $0 \leq s \leq t \leq T$, and for every nonnegative, Borel-measurable function $f$, there exists another Borel-measurable function $g$ such that:
$$
\mathbb{E}[f(X(t)) \mid \mathcal{F}(s)] = g(X(s)). \tag{2.3.29}
$$

### Explanation
The **Markov property** characterizes a process where the future evolution of the process depends on its current state but **not on its past history**, given the present. 

1. **Conditioning on the Past:**
   - $\mathcal{F}(s)$ contains all the information available up to time $s$.
   - The Markov property implies that the conditional expectation of any function of $X(t)$ (future state) depends only on $X(s)$ (current state) and not on how the process arrived at $X(s)$.

2. **Functions $f$ and $g$:**
   - $f$ is any nonnegative, Borel-measurable function applied to the state at time $t$.
   - $g$ is another Borel-measurable function that captures the dependence of the conditional expectation on the current state $X(s)$.

3. **Adaptation:**
   - The process $X(t)$ is adapted to the filtration $\mathcal{F}(t)$, meaning that $X(t)$ depends only on the information available up to time $t$.
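To make the "state, not path" idea concrete, the sketch below compares two processes in the three-toss model: the stock price $S_n$ and its running maximum $M_n = \max(S_0, \ldots, S_n)$. It assumes $S_0 = 4$ with doubling on heads and halving on tails, $p = 1/2$, and the test function $f(x) = x^2$. The helper `depends_only_on_state` is hypothetical. Passing the check for one $f$ only illustrates the Markov property and does not prove it, whereas failing it for one $f$ does show the property fails.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)  # illustrative probability of heads (assumption)
q = 1 - p
omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

def S(n, w):
    """Price after n tosses, assuming S_0 = 4, heads doubles, tails halves."""
    h = w[:n].count("H")
    return Fraction(4) * Fraction(2) ** h / Fraction(2) ** (n - h)

def M(n, w):
    """Running maximum max(S_0, ..., S_n), used here as a contrast."""
    return max(S(k, w) for k in range(n + 1))

def cond_exp(n, X):
    """E[X | F_n]: P-weighted average of X over the atom containing w."""
    def value(w):
        atom = [v for v in omega if v[:n] == w[:n]]
        return sum(X(v) * P[v] for v in atom) / sum(P[v] for v in atom)
    return value

def depends_only_on_state(X, n, f):
    """True if E[f(X_{n+1}) | F_n] takes a single value for each value of X_n."""
    E = cond_exp(n, lambda v: f(X(n + 1, v)))
    seen = {}
    for w in omega:
        # Record the first value seen for this state; compare later ones to it.
        if seen.setdefault(X(n, w), E(w)) != E(w):
            return False
    return True

square = lambda x: x * x
print(depends_only_on_state(S, 2, square))  # True
print(depends_only_on_state(M, 2, square))  # False
```

For $S$, the paths $HT$ and $TH$ both lead to $S_2 = 4$ and give the same conditional expectation. For $M$, the paths $TH$ and $TT$ both have $M_2 = 4$, but the conditional expectation differs because it also depends on the current price, which $M_2$ alone does not reveal.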

### Remark: Time Dependence
In the Markov property, the functions $f$ and $g$ can explicitly depend on time:

- By writing $f(t, x)$ instead of $f(x)$, we emphasize the dependence of $f$ on both the time $t$ and the state $x$.
- Similarly, $g$ can depend on $s$. By convention we reuse the symbol $f$ and write $f(s, x)$ in place of $g(x)$, so that a single function of time and state appears on both sides of the equation.

Using this notation, we can rewrite Equation (2.3.29) as:
$$
\mathbb{E}[f(t, X(t)) \mid \mathcal{F}(s)] = f(s, X(s)), \quad 0 \leq s \leq t \leq T. \tag{2.3.30}
$$

This version of the Markov property highlights that:
- The conditional expectation of $f(t, X(t))$ given $\mathcal{F}(s)$ depends only on the value of $X(s)$ and equals the same function $f$ evaluated at $(s, X(s))$.

### Connection to Partial Differential Equations
The Markov property leads to a partial differential equation (PDE) when $f(t, x)$ is treated as a function of two variables (time $t$ and state $x$). 

#### Key Insight:
If we know $f(t, x)$ at time $t$, we can use the Markov property to determine $f(s, x)$ at an earlier time $s$. This relationship is often governed by a PDE.

#### Example:
- In financial mathematics, the **Black-Scholes-Merton PDE** is a specific example of how the Markov property applies to pricing derivative securities. The PDE relates the price of a derivative (as a function of time and underlying asset price) to its boundary conditions and the dynamics of the underlying asset.
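In the discrete coin-toss model, the analogue of this backward relationship is a recursion rather than a PDE: $f(n, x) = p\, f(n+1, 2x) + q\, f(n+1, x/2)$ with a given terminal function $f(N, x)$. The sketch below is that discrete analogue only, not the Black-Scholes-Merton PDE. It assumes $S_0 = 4$, doubling on heads and halving on tails, $p = 1/3$, and a hypothetical terminal function $\max(x - 5, 0)$.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 3)  # illustrative probability of heads (assumption)
q = 1 - p
N = 3               # number of tosses

def terminal(x):
    """Hypothetical terminal condition f(N, x)."""
    return max(x - 5, 0)

def f(n, x):
    """Backward recursion: f(n, x) = p f(n+1, 2x) + q f(n+1, x/2)."""
    if n == N:
        return terminal(x)
    return p * f(n + 1, 2 * x) + q * f(n + 1, x / 2)

# Brute-force check: f(0, S_0) should equal E[terminal(S_3)].
omega = ["".join(t) for t in product("HT", repeat=N)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

def S3(w):
    h = w.count("H")
    return Fraction(4) * Fraction(2) ** h / Fraction(2) ** (N - h)

brute_force = sum(terminal(S3(w)) * P[w] for w in omega)
value_at_0 = f(0, Fraction(4))
print(value_at_0, brute_force)  # 5/3 5/3
```

Knowing $f(N, \cdot)$ determines $f(n, \cdot)$ at every earlier step, which is the discrete counterpart of solving the PDE backward from its terminal condition.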


### Summary of Markov Property
| **Key Feature**               | **Explanation**                                                                 |
|--------------------------------|---------------------------------------------------------------------------------|
| **Memoryless Property**        | Future depends only on the present state, not the past history.                 |
| **Conditional Expectation**    | $\mathbb{E}[f(X(t)) \mid \mathcal{F}(s)] = g(X(s))$.                            |
| **Time Dependence**            | The functions $f$ and $g$ may explicitly depend on time: $f(t, x)$ and $f(s, x)$. |
| **Connection to PDEs**         | The Markov property leads to PDEs that describe the evolution of $f(s, x)$.    |

The Markov process provides a foundation for modeling systems where future behavior depends only on the current state, making it a crucial concept in probability theory, statistics, and mathematical finance.



