# Stochastic Differential Equations [[src](https://ethz.ch/content/dam/ethz/special-interest/mavt/dynamic-systems-n-control/idsc-dam/Lectures/Stochastic-Systems/SDE.pdf)]
### Definition
***SDEs*** are generally equations of the form: 
\begin{equation*}
dX(t) = f(t,X(t))dt + g(t,X(t))dW(t)
\end{equation*}
where $t$ denotes time, $f$ is the drift coefficient and $g$ the diffusion coefficient.

The above can be written as an integral equation:
\begin{equation*}
X(t) = X_0 + \int_0^t{f(s,X(s))}ds + \int_0^t{g(s,X(s))dW(s)}
\end{equation*}
where the integral for $g$ is what is known as a **stochastic integral**. Formally, the stochastic integral is defined by the following limit:
\begin{equation*}
\int_0^t{g(s,X(s))dW(s)}=\lim_{n\to\infty}\sum_{i=0}^{n-1}g(t_i,X(t_i))(W(t_{i+1})-W(t_i))
\end{equation*}
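
The left-endpoint sum above also suggests a simple way to simulate an SDE numerically. Below is a minimal sketch of the Euler-Maruyama scheme, assuming a uniform time grid; the function name `euler_maruyama` and its parameters are illustrative choices, not something taken from the source.

```python
import math
import random

def euler_maruyama(f, g, x0, T, n_steps, rng=None):
    """Simulate one path of dX = f(t, X) dt + g(t, X) dW on [0, T]."""
    rng = rng or random.Random()
    dt = T / n_steps
    t, x = 0.0, x0
    path = [x0]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        x = x + f(t, x) * dt + g(t, x) * dW  # coefficients evaluated at the left endpoint
        t += dt
        path.append(x)
    return path

# Example: dX = -X dt + 0.3 dW with X(0) = 1 (coefficients chosen for illustration)
path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.3,
                      x0=1.0, T=1.0, n_steps=1000, rng=random.Random(0))
```

Evaluating $f$ and $g$ at the left endpoint $t_i$ mirrors the definition of the stochastic integral, which is why this scheme is consistent with the Ito interpretation.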

*Note: In layman's terms, an SDE has a "global" solution if the solution does not blow up or become undefined in finite time, i.e. it exists on $[0,T]$ for every finite horizon $T$.*

### The ito integral
$S$ is the **ito integral** of $g(t)$ w.r.t $W(t)$ on $[0,T]$ if:
\begin{equation*}
\lim_{n\to\infty}{\mathbb{E}\left[\left(S-\sum_{i=0}^{n-1}g(t_i)(W(t_{i+1})-W(t_i))\right)^2\right]} = 0
\end{equation*}

Some results related to ito integrals:
\begin{array}{rll}
    \int_0^T{c}dW(t) &= cW(T) \\\\
    \int_0^T{W(t)}dW(t) &= \displaystyle \frac{1}{2}W(T)^2-\frac{1}{2}T \\\\
    \mathbb{E}[\int_0^Tg(t)dW(t)] &=\displaystyle  0 & \text{ Zero Expectation}\\\\
    \mathrm{Var}[\int_0^Tg(t)dW(t)] &= \displaystyle \int_0^T\mathbb{E}[g^2(t)]dt & \text{ Variance} \\\\
    \int_0^T{a_1g_1(t)+a_2g_2(t)}dW(t) &= \displaystyle a_1\int_0^T{g_1(t)}dW(t)+a_2\int_0^T{g_2(t)}dW(t) & \text{ Linearity of the ito integral}
\end{array}
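
The second result can be checked numerically. The sketch below builds the left-endpoint sum for $\int_0^TW(t)dW(t)$ along one simulated Brownian path and compares it with $\frac{1}{2}W(T)^2-\frac{1}{2}T$; the step count and seed are arbitrary assumptions.

```python
import math
import random

def ito_integral_of_w(T=1.0, n_steps=20000, seed=0):
    """Left-endpoint sum for int_0^T W dW along one simulated Brownian path."""
    rng = random.Random(seed)
    dt = T / n_steps
    w, total = 0.0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        total += w * dw  # integrand evaluated at the left endpoint t_i
        w += dw
    return total, w

approx, w_T = ito_integral_of_w()
closed_form = 0.5 * w_T**2 - 0.5 * 1.0  # (1/2) W(T)^2 - (1/2) T with T = 1
print(approx, closed_form)
```

The two numbers agree up to a discretisation error that shrinks as the number of steps grows, because $\sum_i(W(t_{i+1})-W(t_i))^2$ converges to $T$.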

### Ito's lemma
Ito's lemma is like the chain rule but for stochastic differential equations. Suppose we are given the following:
\begin{array}{rl}
dX(t) &=\displaystyle  f(t,X(t))dt + g(t,X(t))dW(t) \\\\
Y(t) &=\displaystyle  \phi(t,X(t)) \\\\
dY(t) &=\displaystyle  \tilde{f}(t,X(t))dt + \tilde{g}(t,X(t))dW(t)
\end{array}
where $\phi(t,x)$ is a deterministic function.

Ito's lemma states that:
\begin{equation*}
dY(t) = [\frac{\partial\phi}{\partial t} + \frac{\partial\phi}{\partial x}f(t,X(t)) + \frac{1}{2}\frac{\partial^2\phi}{\partial x^2}g^2(t,X(t))]dt + \frac{\partial\phi}{\partial x}g(t,X(t))dW(t)
\end{equation*}

Recall that the taylor expansion up to the nth order of a multivariate function $f(x_1,...,x_k)$ around a point $a$ is given by $\sum_{|\alpha|\leq n}{\frac{D^{\alpha} f(a)}{\alpha!}(x-a)^\alpha} + R_n$, where $\alpha=(\alpha_1,...,\alpha_k)$ is a multi-index, $|\alpha|=\alpha_1+...+\alpha_k$, $\alpha!=\alpha_1!...\alpha_k!$ and $D^\alpha f=\frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}...\partial x_k^{\alpha_k}}$.

By the taylor expansion of the differential of $Y(t)$:
\begin{equation*}
dY(t) = \frac{\partial\phi}{\partial t}dt + \frac{1}{2}\frac{\partial^2\phi}{\partial t^2}dt^2 + \frac{\partial\phi}{\partial x}dX(t) + \frac{1}{2}\frac{\partial^2\phi}{\partial x^2}dX(t)^2 + ...
\end{equation*}

\begin{equation*}
dY(t) = \frac{\partial\phi}{\partial t}dt + \frac{1}{2}\frac{\partial^2\phi}{\partial t^2}dt^2 + \frac{\partial\phi}{\partial x}(f(t,X(t))dt + g(t,X(t))dW(t)) + \frac{1}{2}\frac{\partial^2\phi}{\partial x^2}(f(t,X(t))dt + g(t,X(t))dW(t))^2 + ...
\end{equation*}
Considering the higher order terms, we have $dtdW(t)\rightarrow 0,dt^2\rightarrow 0$, and $dW(t)^2=dt$. Thus after cancellation we have:
\begin{equation*}
dY(t) = [\frac{\partial\phi}{\partial t} + \frac{\partial\phi}{\partial x}f(t,X(t)) + \frac{1}{2}\frac{\partial^2\phi}{\partial x^2}g^2(t,X(t))]dt + \frac{\partial\phi}{\partial x}g(t,X(t))dW(t)
\end{equation*}

### Expectation and Variance of stochastic processes given SDE (an example)
The following is an example of how you can apply ito integrals and ito's lemma to find the expectation and variance of a stochastic process given an SDE.

Suppose $dX_t = m dt + \sigma X_t dW_t, X_0=x_0$. Then: 
\begin{array}{rll}
X_t-x_0 &=\displaystyle  mt + \sigma\int_0^tX_sdW_s & \text{ Integral on both sides}\\\\
\mathbb{E}[X_t] &=\displaystyle  x_0 + mt & \text{ Expectation on both sides} 
\end{array}

By ito's lemma:
\begin{equation*}
    dX_t^2 = [2mX_t+\sigma^2 X_t^2]dt + 2\sigma X_t^2dW_t
\end{equation*}
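Then the same steps (integrating and taking expectations, using the fact that the $dW_t$ integral has zero expectation) give an ODE for $\mathbb{E}[X_t^2]$, and the variance follows from $\mathrm{Var}[X_t]=\mathbb{E}[X_t^2]-\mathbb{E}[X_t]^2$.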

### Solving an SDE using substitution and Ito's lemma (an example)
Suppose $dS_t = \mu S_t dt+\sigma S_td W_t$. Then let $X_t = \log{S_t}$, where $\phi(t,x)=\phi(x)=\log{x}$. By Ito's lemma:
\begin{array}{rl}
dX_t &= \displaystyle (\frac{d\phi}{dx}\mu S_t+\frac{1}{2}\frac{d^2\phi}{dx^2}\sigma^2 S_t^2)dt + \frac{d\phi}{dx}\sigma S_t dW_t\\\\
dX_t &=\displaystyle  (\mu-\frac{1}{2}\sigma^2)dt+\sigma dW_t 
\end{array}
Then integrating both sides:
\begin{equation*}
X_t-X_0 = (\mu-\frac{1}{2}\sigma^2)t+\sigma W_t
\end{equation*}
The above can then be rearranged to solve for $S_t$ to yield $S_t = S_0e^{(\mu-\frac{1}{2}\sigma^2)t+\sigma W_t}$.
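
As a sanity check on this closed form, the sketch below drives both a step-by-step Euler discretisation of the SDE and the exact formula with the same simulated Brownian increments. The function name and parameter values are assumptions made for illustration.

```python
import math
import random

def gbm_exact_vs_euler(s0, mu, sigma, T, n_steps, seed=0):
    """Return (exact S_T, Euler S_T) computed from the same Brownian path."""
    rng = random.Random(seed)
    dt = T / n_steps
    s_euler, w = s0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        s_euler += mu * s_euler * dt + sigma * s_euler * dw  # dS = mu S dt + sigma S dW
        w += dw                                              # accumulate W_T
    s_exact = s0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * w)
    return s_exact, s_euler

exact, euler = gbm_exact_vs_euler(s0=100.0, mu=0.05, sigma=0.2, T=1.0, n_steps=10000)
print(exact, euler)
```

The two values should be close for a fine grid. Dropping the $-\frac{1}{2}\sigma^2$ correction from the exponent would make them visibly disagree, which is a quick way to see why the Ito term matters.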

### Deriving an SDE using substitution and Ito's lemma (an example)
Suppose we are given $S_t=S_0e^{(\mu-\frac{1}{2}\sigma^2)t+\sigma W_t}$ and $X_t = \log{S_t}-\log{S_0}$. First we can derive the SDE for $X_t$:
\begin{array}{rl}
X_t &= \displaystyle (\mu-\frac{1}{2}\sigma^2)t+\sigma W_t \\\\
dX_t &=\displaystyle  (\mu-\frac{1}{2}\sigma^2)dt+\sigma dW_t
\end{array}
Then by using Ito's lemma:
\begin{array}{rl}
S_t &= \displaystyle e^{X_t} = f(X_t) \\\\
df(X_t) &= \displaystyle [\frac{df}{dx}(\mu-\frac{1}{2}\sigma^2) + \frac{1}{2}\frac{d^2f}{dx^2}\sigma^2]dt + \frac{df}{dx}\sigma dW_t \\\\
dS_t &= \displaystyle [S_t(\mu-\frac{1}{2}\sigma^2)+\frac{1}{2}S_t\sigma^2]dt+S_t\sigma dW_t \\\\
dS_t &=\displaystyle  \mu S_t dt + \sigma S_t dW_t
\end{array}

# Probability Measures [[src1](https://math.nyu.edu/~goodman/teaching/StochCalc2012/notes/Week10.pdf),[src2](https://makslevental.github.io/Girsanov-Theorem/)]

### Definition

Consider a probability space $(\Omega,\mathcal{F},\mathbb{P})$, where $\Omega$ is the sample space (set of possible outcomes), $\mathcal{F}$ is the $\sigma$-algebra (set of events which are sets of outcomes), and $\mathbb{P}$ is a **probability measure**. The probability measure $\mathbb{P}:\mathcal{F}\rightarrow[0,1]$ is a function from the $\sigma$-algebra to the interval $[0,1]$, assigning a probability to every event. It satisfies the following properties:
- $\mathbb{P}(\Omega)=1$
- Given countably many disjoint sets $A_i$, $\mathbb{P}(\cup_{i\in\mathcal{I}}{A_i})=\sum_{i\in\mathcal{I}}{\mathbb{P}(A_i)}$
- It is non-negative.

We can express expectation of a random variable $X$ under probability measure $\mathbb{P}$ as $\mathbb{E}_{\mathbb{P}}[X]=\int_{\Omega}X(\omega)d\mathbb{P}(\omega)$. $d\mathbb{P}(\omega)$ can be interpreted as the probability of $\omega$.

### Radon Nikodym Derivative

Consider two probability measures $\mathbb{P}$ and $\mathbb{Q}$ on the same $\sigma$-algebra. $\mathbb{Q}$ is *absolutely continuous* with respect to $\mathbb{P}$ (written $\mathbb{Q}\ll\mathbb{P}$) if $\mathbb{P}(A)=0\implies\mathbb{Q}(A)=0$. Two measures are *equivalent* if each is absolutely continuous with respect to the other, and *singular* if they are concentrated on disjoint sets. **Radon Nikodym's theorem** states that, given absolute continuity of $\mathbb{Q}$ w.r.t $\mathbb{P}$, there exists a non-negative function $Z = \frac{d\mathbb{Q}}{d\mathbb{P}}$ called the **Radon Nikodym Derivative** (sometimes the likelihood ratio function) whereby,
\begin{array}{rl}
\mathbb{Q}(A) &= \displaystyle  \int_A Z d\mathbb{P}, \forall A \in \mathcal{F} \\\\
\mathbb{E}_{\mathbb{Q}}[X] &=  \displaystyle \mathbb{E}_{\mathbb{P}}[XZ]
\end{array} 
where $X$ is a random variable. If we treat $\mathbb{P}$ and $\mathbb{Q}$ to represent distributions, then the RN derivative is a likelihood ratio between the two (measuring how much more likely something is to occur in $\mathbb{Q}$ than in $\mathbb{P}$). As a sidenote, the radon nikodym derivative can also be used to interpret statistical tests (type I errors, type II errors, etc).

### Quadratic Variations and Girsanov's Theorem
An **adapted process** is any time varying (stochastic or deterministic) process $\theta_t$ that depends only on the information available up to time $t$ (it is *non-anticipative*). The **Novikov condition** states that if $X_t$ is an adapted process up to time $T$ satisfying $\mathbb{E}[e^{\frac{1}{2}\int_0^TX_s^2ds}]<\infty$, then the process $M_t$ below is a martingale on $[0,T]$ (also known as the **exponential martingale**).
\begin{equation*}
    M_t=e^{\int_0^tX_sdW_s-\frac{1}{2}\int_0^tX_s^2ds}
\end{equation*}

Given a stochastic process $X_t$, its **quadratic variation** is given by:
\begin{equation*}
    [X]_t=\lim_{\Delta t\rightarrow 0}{\sum_{t_{k+1}\leq t}{(X_{t_{k+1}}-X_{t_k})^2}}
\end{equation*}
For the process defined by $dX_t=a(t,X_t)dt+b(t,X_t)dW_t$, its quadratic variation is $[X]_t=\int_0^tb(s,X_s)^2ds$. 

**Girsanov's theorem** allows us to connect a diffusion process on two different probability measures $\mathbb{P}$ and $\mathbb{Q}$ so long as their diffusion coefficients are the same. Given $X_t$ with the SDEs $dX_t=a_1(t,X_t)dt+\sigma(t,X_t)dW_t^{\mathbb{P}}$ under $\mathbb{P}$ and $dX_t=a_2(t,X_t)dt+\sigma(t,X_t)dW_t^{\mathbb{Q}}$ under $\mathbb{Q}$, the Radon Nikodym derivative is given by 
\begin{equation*}
\frac{d\mathbb{Q}}{d\mathbb{P}}=Z_T=e^{\int_0^T\theta_sdW_s^{\mathbb{P}}-\frac{1}{2}\int_0^T\theta_s^2ds}
\end{equation*}
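where $\theta_t=\frac{a_2-a_1}{\sigma}$ is known as the **Girsanov kernel**, the Brownian motion in the exponent is the one under $\mathbb{P}$, and $W_t^{\mathbb{Q}}=W_t^{\mathbb{P}}-\int_0^t\theta_sds$ is a Brownian motion under $\mathbb{Q}$ (substituting this into the $\mathbb{P}$ dynamics turns the drift $a_1$ into $a_1+\sigma\theta=a_2$). The formal statement/definition of Girsanov's theorem is defined through the use of quadratic variations. Note that a change of measure will not change the value of things like options prices: since $\mathbb{E}_{\mathbb{Q}}[V_T]=\mathbb{E}_{\mathbb{P}}[V_TZ]$, the quantity inside the expectation is reweighted to match whenever the measure changes.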

# Payoffs and Payoff diagrams, Forwards, Futures, Puts, Calls and the Put-call parity
The **spot price** $S_t$ of an asset is its current price.

A **forward** contract is an OTC agreement to purchase or sell an asset at a specified price (delivery price $K$) at a specified future date (maturity date $T$).
- Payoff: $S_T-K$ for a long position, $K-S_T$ for short.
- Delivery price: $K$
- Fair forward price at time t for maturity at time T: $S_te^{r(T-t)}$ (Expected final value assuming growth at the risk free rate)
- Value of a forward contract (Difference between discounted fair forward price and discounted delivery price): $F_t = S_t-Ke^{-r(T-t)}$, with $F_0=0$ when $K$ is set to the fair forward price at inception.

A **futures** contract is like a forward except that it is standardized and traded on an exchange, with gains and losses settled daily.

An **option** contract provides the buyer the right but not the obligation to purchase/sell an asset at a specified price at maturity. Many extra rules can be added, like early exercise. The most basic are European options where you can only exercise the options at expiry. There are two types of options: **calls** and **puts**. 
- Payoff for a (european) call: $(S_T-K)^+$
- Payoff for a (european) put: $(K-S_T)^+$

where $(x)^+$ denotes $\max(0,x)$.

The put-call parity states the following for non-dividend yielding assets: $C_t-P_t = S_t-Ke^{-r(T-t)}$. It relates a call and a put to the value of a forward contract, all with the same strike (delivery price) and maturity.
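
The parity is easiest to see at maturity, where a long call plus a short put pays exactly the same as a long forward. A minimal sketch of the payoff functions (the function names and the sample prices are illustrative assumptions):

```python
def call_payoff(s_T, k):
    """European call payoff (S_T - K)^+."""
    return max(s_T - k, 0.0)

def put_payoff(s_T, k):
    """European put payoff (K - S_T)^+."""
    return max(k - s_T, 0.0)

def long_forward_payoff(s_T, k):
    """Long forward payoff S_T - K."""
    return s_T - k

# Long call + short put replicates a long forward at maturity.
for s_T in (50.0, 100.0, 150.0):
    combo = call_payoff(s_T, 100.0) - put_payoff(s_T, 100.0)
    print(s_T, combo, long_forward_payoff(s_T, 100.0))
```

Since the two positions have identical payoffs for every $S_T$, they must have the same value at every earlier time, which is the parity relation above.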

# Bachelier Model
### Definition
Under the bachelier model, stock prices are assumed to move according to brownian motion with drift.
\begin{equation*}
dS_t = \mu dt + \sigma dW_t
\end{equation*}
This was among the earliest models of the price process. However, it has a few inaccuracies, most notably the fact that prices can go negative. To derive for instance the price of a European option with strike $K$, simply consider taking the expectation of the terminal value under the risk-neutral measure.
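
As a worked sketch of that expectation, assume for simplicity a zero interest rate, so that under the risk-neutral measure $S_T = S_0+\sigma W_T\sim\mathcal{N}(S_0,\sigma^2T)$ (the zero-rate assumption is ours, made to keep the formula short). Then the price of a European call is:
\begin{array}{rl}
C_0 &= \displaystyle \mathbb{E}_{\mathbb{Q}}[(S_T-K)^+] \\\\
&= \displaystyle (S_0-K)N(d)+\sigma\sqrt{T}\phi(d), \quad d=\frac{S_0-K}{\sigma\sqrt{T}}
\end{array}
where $N(\cdot)$ and $\phi(\cdot)$ denote the standard normal CDF and pdf. Note that here $\sigma$ is a volatility in price units rather than in percentage terms.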


# Black Scholes
### Black scholes model/Ideal economy
Under the black scholes model/world, we assume we have access to two assets, the risky asset and the risk free asset, following the SDEs below 
\begin{array}{rll}
\text{Risk-free }& dB_t&=B_t r dt, B_0=1\\\\
\text{Risky }& dS_t&=\mu S_t dt + \sigma S_t dW_t^{\mathbb{P}}\\\\
\end{array}
where $r$ is the risk-free rate, $\mu$ is the drift of the risky asset under real-world measure $\mathbb{P}$ and $\sigma$ is its volatility. The above are solvable and yield:
\begin{array}{rl}
B_t &= \displaystyle e^{rt} \\\\
S_t &= \displaystyle S_0e^{(r-\frac{1}{2}\sigma^2)t+\sigma W_t^{\mathbb{Q}}}
\end{array}
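where the solution for $S_t$ is written under the risk-neutral measure $\mathbb{Q}$ (under $\mathbb{P}$ the same formula holds with $\mu$ in place of $r$ and $W_t^{\mathbb{P}}$ in place of $W_t^{\mathbb{Q}}$). Switching to $\mathbb{Q}$ is justifiable for further steps in options pricing by Girsanov's theorem.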

The following assumptions are used in the economy:
- No transaction costs
- No dividends are paid
- Shares are infinitely divisible (e.g. you can have $\pi$ shares)
- Short selling is allowed without restriction (typically requires margin among other things)

### Heat equation
The heat equation is a specific type of parabolic PDE on a function $\Phi:\Omega\times[0,\infty)\to\mathbb{R}$, taking as inputs $(x_1,...,x_n)\in\Omega,t\in[0,\infty)$: 
\begin{equation*}
\frac{\partial\Phi}{\partial t}=K\nabla^2\Phi
\end{equation*}
where $\nabla^2$ is the laplacian operator (divergence of the gradient $\nabla\cdot\nabla\Phi=\sum_i\frac{\partial^2\Phi}{\partial x_i^2}$, the sum of all non-mixed second partial derivatives). The equation was originally used to describe how heat diffuses across a given region, so typically you would only have time and the $(x,y,z)$ coordinates. As we will see below, the black scholes merton differential equation can be converted into heat equation form by substitution of variables.

### Deriving the Black Scholes-Merton Differential Equation (PDE)
Consider a european-style (exercise at maturity only) derivative whose price is $V(t,S_t)$, and let $S_t$ follow the risky-asset SDE under the real-world measure $\mathbb{P}$ with drift $\mu$. By Ito's lemma:
\begin{equation*}
dV = (\frac{\partial V}{\partial t} + \frac{\partial V}{\partial S}\mu S_t + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\sigma^2 S_t^2)dt + \frac{\partial V}{\partial S}\sigma S_t dW_t
\end{equation*}

Now consider a portfolio $\Pi$ which is perfectly hedged to delta/the brownian part, i.e. $\Pi=V-\Delta S$ where $\Delta=\frac{\partial V}{\partial S}$. The $dW_t$ terms cancel, so the portfolio is risk free and must earn the risk-free rate:
\begin{array}{rl}
d\Pi &= dV - \Delta dS\\\\
d\Pi &= \displaystyle (\frac{\partial V}{\partial t} + \frac{\partial V}{\partial S}\mu S_t + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\sigma^2 S_t^2)dt + \frac{\partial V}{\partial S}\sigma S_t dW_t - \frac{\partial V}{\partial S}[\mu S_t dt + \sigma S_t dW_t] \\\\
d\Pi &=  \displaystyle (\frac{\partial V}{\partial t} + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\sigma^2 S_t^2)dt\\\\
d\Pi &= \displaystyle  r\Pi dt = r(V-\frac{\partial V}{\partial S}S_t)dt \\\\
rV &= \displaystyle \frac{\partial V}{\partial t} + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\sigma^2 S_t^2 + r\frac{\partial V}{\partial S}S_t
\end{array}
The final line above is the Black Scholes-Merton PDE. Note that $\mu$ has dropped out, which is why the price does not depend on the real-world drift. The PDE is solved together with a terminal/boundary condition $V(T,S_T)$ given by the payoff (just as you would have in a regular PDE problem). 

### Converting to the heat equation

To convert this into heat equation form, use the substitution $\tau=T-t$ for time to maturity and $x = \log{\frac{S}{K}}$:
\begin{equation*}
V(t,S) = Ke^{-r\tau}u(\tau,x)
\end{equation*}
where $u(\tau,x)$ is another function. By finding each of the partial derivatives in the black scholes PDE in terms of $u$, we get:
\begin{equation*}
\frac{\partial u}{\partial \tau}=\frac{1}{2}\sigma^2\frac{\partial^2u}{\partial x^2}+(r-\frac{1}{2}\sigma^2)\frac{\partial u}{\partial x}
\end{equation*}
Doing another substitution of $u(\tau,x)=e^{A\tau+Bx}\omega(\tau,x)$, with $B=-\frac{r-\frac{1}{2}\sigma^2}{\sigma^2}$ chosen to cancel the first-order term and $A=-\frac{(r-\frac{1}{2}\sigma^2)^2}{2\sigma^2}$ chosen to cancel the zeroth-order term, gives the final black scholes heat equation:
\begin{equation*}
\frac{\partial \omega}{\partial \tau}=\frac{1}{2}\sigma^2\frac{\partial^2 \omega}{\partial x^2}
\end{equation*}
Note that the above can be solved using the typical heat-equation methods in order to arrive at the same solutions as the Feynman-Kac method.

### Feynman-Kac Formula and its Discounted form
The Feynman-Kac theorem links classic PDEs and stochastic processes. It states that we can solve certain PDEs through the simulation of stochastic processes. Suppose we are trying to solve for $V(t,x)$ where:
\begin{equation*}
\frac{\partial V}{\partial t} + \mu(t,x)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma^2(t,x)\frac{\partial^2V}{\partial x^2}=0
\end{equation*}
with terminal condition $V(T,x)=\phi(x)$, then the Feynman-Kac formula says:
\begin{equation*}
V(t,x)=\mathbb{E}(\phi(X_T)|\mathcal{F}_t)
\end{equation*}
where $\mathcal{F}_t$ is the filtration for time $t$ (information up to time $t$), $X_t$ is a diffusion/ito process following $dX_t = \mu(t,X_t) dt + \sigma(t,X_t) dW_t$ and $X_t = x$. 

The more general form of the Feynman-Kac formula is its discounted form with an extra term $-r(t,x)V$, where we have:
\begin{array}{rl}
0 &= \displaystyle \frac{\partial V}{\partial t} + \mu(t,x)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma^2(t,x)\frac{\partial^2V}{\partial x^2}-r(t,x)V \\\\
V(t,x) &= \mathbb{E}(e^{-\int_t^Tr(s,X_s)ds}\phi(X_T)|\mathcal{F}_t)
\end{array}
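
To make the undiscounted formula concrete, here is a small Monte Carlo sketch for a case that can be checked by hand. We assume $\mu=0$, constant $\sigma$ and $\phi(x)=x^2$ (all chosen for illustration), for which $V(t,x)=x^2+\sigma^2(T-t)$ solves the PDE, since $\frac{\partial V}{\partial t}=-\sigma^2$ and $\frac{1}{2}\sigma^2\frac{\partial^2V}{\partial x^2}=\sigma^2$.

```python
import math
import random

def feynman_kac_mc(x, tau, sigma, phi, n_paths=100_000, seed=0):
    """Estimate E[phi(X_T) | X_t = x] for dX = sigma dW, with tau = T - t."""
    rng = random.Random(seed)
    sd = sigma * math.sqrt(tau)  # X_T ~ N(x, sigma^2 * tau), so no time stepping is needed
    total = 0.0
    for _ in range(n_paths):
        total += phi(x + sd * rng.gauss(0.0, 1.0))
    return total / n_paths

estimate = feynman_kac_mc(x=1.0, tau=2.0, sigma=0.5, phi=lambda y: y * y)
exact = 1.0**2 + 0.5**2 * 2.0  # x^2 + sigma^2 * (T - t)
print(estimate, exact)
```

For the discounted form, the same loop would multiply each sample by the discount factor along its path; with a constant rate this is just a common factor $e^{-r(T-t)}$.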

### Deriving the solution to a European Call via Feynman-Kac
Consider the terminal condition given by $V(T,S_T)=(S_T-K)^+$. We can solve the black scholes PDE (which assumes the risk-neutral measure) by applying the discounted Feynman-Kac formula:
\begin{array}{rl}
0 &= \displaystyle \frac{\partial V}{\partial t} + rS_t\frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2 S_t^2\frac{\partial^2 V}{\partial S^2} - rV\\\\
\mu(t,x) &= rx , \sigma(t,x) = \sigma x, r(t,x) = r \\\\
V(t,S_t) &= \mathbb{E}_{\mathbb{Q}}[e^{-r(T-t)}(S_T-K)^+|\mathcal{F}_t] \\\\
 &=\displaystyle  e^{-r(T-t)}\mathbb{E}_{\mathbb{Q}}[(S_T-K)^+|S_t] \\\\
\end{array}
Next, recall that we are under the risk-neutral measure so we have the following formula for $S_T$.
\begin{array}{rl}
\log{S_T} &\displaystyle \sim \mathcal{N}_{\mathbb{Q}}(\log{S_t}+(r-\frac{1}{2}\sigma^2)(T-t),\sigma^2(T-t))\\\\
S_T&=\displaystyle S_te^{(r-\frac{1}{2}\sigma^2)(T-t)+\sigma\sqrt{T-t}\epsilon}, \epsilon\sim\mathcal{N}(0,1) \\\\
\mathbb{Q}(S_T\geq K) &= \displaystyle\mathbb{Q}(\epsilon>\frac{\log{K/S}-(r-\sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}) = \mathbb{Q}(\epsilon>-d_2)\\\\
d_2 &:=-\frac{\log{K/S}-(r-\sigma^2/2)(T-t)}{\sigma\sqrt{T-t}} \\\\
\end{array}
Proceeding using this knowledge we eventually arrive at the final formula for a european call option.
\begin{array}{rl}
\mathbb{E}[V(T,S_T)|S_t] &= \displaystyle \int_{-d_2}^\infty (S_T-K) \frac{1}{\sqrt{2\pi}}e^{-\epsilon^2/2}d\epsilon\\\\
&= \displaystyle \int_{-d_2}^\infty (S_te^{(r-\frac{1}{2}\sigma^2)(T-t)+\sigma\sqrt{T-t}\epsilon}-K) \frac{1}{\sqrt{2\pi}}e^{-\epsilon^2/2}d\epsilon\\\\
&= \displaystyle S_te^{r(T-t)}\int_{-d_2}^\infty \frac{1}{\sqrt{2\pi}} e^{-(\epsilon-\sqrt{(T-t)}\sigma)^2/2}d\epsilon - K \int_{-d_2}^\infty  \frac{1}{\sqrt{2\pi}}e^{-\epsilon^2/2}d\epsilon \\\\
\text{Substituting }& \tilde{\epsilon}=\epsilon-\sigma\sqrt{T-t} \text{ and } d_1 = d_2+\sigma\sqrt{T-t}\\\\
\mathbb{E}[V(T,S_T)|S_t] &= \displaystyle S_te^{r(T-t)}\int_{-d_1}^\infty \frac{1}{\sqrt{2\pi}} e^{-\tilde\epsilon^2/2}d\tilde{\epsilon} - K \int_{-d_2}^\infty  \frac{1}{\sqrt{2\pi}}e^{-\epsilon^2/2}d\epsilon \\\\
&= \displaystyle S_te^{r(T-t)}N(d_1)-KN(d_2) \\\\
V(t,S_t) &= e^{-r(T-t)}\mathbb{E}[V(T,S_T)]=S_tN(d_1)-Ke^{-r(T-t)}N(d_2)
\end{array}
where $N(\cdot)$ denotes the standard normal CDF.
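
The final formula translates directly into code. The sketch below uses only the standard library, with $N(\cdot)$ computed from `math.erf`; the function names are our own, and the put is obtained from put-call parity rather than derived separately.

```python
import math

def norm_cdf(x):
    """Standard normal CDF N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, r, sigma, tau):
    """European call price with spot s, strike k and time to maturity tau = T - t."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)

def bs_put(s, k, r, sigma, tau):
    """European put price via put-call parity: P = C - S + K e^{-r tau}."""
    return bs_call(s, k, r, sigma, tau) - s + k * math.exp(-r * tau)

print(bs_call(100.0, 100.0, 0.05, 0.2, 1.0))  # approximately 10.45
print(bs_put(100.0, 100.0, 0.05, 0.2, 1.0))   # approximately 5.57
```

Here $d_1$ is written in the equivalent form $\frac{\log(S/K)+(r+\sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}$, which follows from $d_1=d_2+\sigma\sqrt{T-t}$.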

### Notes about the European call option solution
$N(d_2)$ can be understood as the probability, under the risk neutral measure, of finishing ITM. As we will see later, $N(d_1)$ is the delta of a European call option; it can also be understood as the probability of finishing ITM under the measure that uses the stock as numeraire (it is not the real-world probability, which depends on $\mu$).

### Combo European Options
Some common combo options:
- Butterfly: Buying a low and a high strike call and shorting two middle strike calls. A bet on low volatility.
- Straddle: Buying both a call and a put with the same strike and maturity. A bet on high volatility.
- Bull spread: Buying a low strike call and shorting a higher strike call. A bet on a moderate rise in prices.

The main thing to note for combos of european-style options is that their value is simply the sum of their parts by the replication principle. 

# The Greeks
The greeks refer to the various partial derivatives of an option's price with respect to variables like the price of the underlying and volatility.

### Delta ($\Delta$)
$\Delta$ is the sensitivity of an option's price to movements in the underlying, i.e. $\frac{\partial V}{\partial S}=\Delta$. Below is an example derivation of $\Delta$ for a European call option.
\begin{array}{rl}
V(t,S_t)&=\displaystyle S_tN(d_1)-Ke^{-r(T-t)}N(d_2)\\\\
\frac{\partial V}{\partial S}(t,S_t)&=\displaystyle \frac{\partial}{\partial S}(S_tN(d_1)-Ke^{-r(T-t)}N(d_2))\\\\
&=\displaystyle N(d_1)+S_t\frac{\partial N(d_1)}{\partial S}-Ke^{-r(T-t)}\frac{\partial N(d_2)}{\partial S} \\\\
&=N(d_1)
\end{array}
Some steps have been skipped above since the expansion of the rest of the partial derivatives is fairly straightforward and they cancel.
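
For completeness, the skipped steps can be sketched as follows, writing $\tau=T-t$ and $\phi(\cdot)$ for the standard normal pdf. The key fact is that $d_1^2-d_2^2=2\log{\frac{S_t}{K}}+2r\tau$, which follows from $d_1=d_2+\sigma\sqrt{\tau}$.
\begin{array}{rl}
\frac{\partial d_1}{\partial S}=\frac{\partial d_2}{\partial S} &= \displaystyle \frac{1}{S_t\sigma\sqrt{\tau}} \\\\
\frac{\phi(d_2)}{\phi(d_1)} &= \displaystyle e^{(d_1^2-d_2^2)/2} = \frac{S_t}{K}e^{r\tau} \\\\
S_t\frac{\partial N(d_1)}{\partial S}-Ke^{-r\tau}\frac{\partial N(d_2)}{\partial S} &= \displaystyle \frac{S_t\phi(d_1)-Ke^{-r\tau}\phi(d_2)}{S_t\sigma\sqrt{\tau}} = 0
\end{array}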

One main use of delta is for the purposes of delta hedging. This is the strategy of purchasing/selling $x$ units of options contracts and then selling/purchasing its underlying in $\Delta x$ units. Overall, this reduces the variance of your returns coming from moves in the underlying, and lets you focus on predictions for other influences on an option's price like volatility.

Intuitively, a put option's delta must be negative while a call option's delta must be positive. Deep ITM options have deltas close to 1 in absolute value. Lastly, ATM options have the most change in delta as the underlying shifts in price (another greek known as $\Gamma$).

### Gamma ($\Gamma$)
$\Gamma$ is the sensitivity of an option's $\Delta$ to price movements in the underlying, i.e. $\frac{\partial\Delta}{\partial S}=\Gamma$. Below is an example of $\Gamma$ for a European call option.
\begin{array}{rl}
\Gamma&=\displaystyle \frac{\partial}{\partial S} N(d_1)\\\\
&=\displaystyle \frac{\phi(d_1)}{S_t\sigma\sqrt{T-t}}
\end{array}
where $\phi(\cdot)$ denotes the standard normal pdf. 

Gamma is relevant for knowing how quickly delta needs to be rebalanced for better hedging. As explained previously, it is highest when an option is ATM and decreases as the underlying moves further away from the strike price. Of note is that it is always positive and the same for both puts and calls. Moreover, the positive nature implies that both european calls and puts are convex functions of the underlying price.

### Vega ($\upsilon$)
$\upsilon$ is the sensitivity of an option's price to changes in volatility. Below is an example of $\upsilon$ for a European call option.
\begin{array}{rl}
\upsilon&=\displaystyle \frac{\partial}{\partial \sigma}(S_tN(d_1)-Ke^{-r(T-t)}N(d_2))\\\\
&= S_t\phi(d_1)\sqrt{T-t}
\end{array}
It is positive for both european calls and puts: their payoffs are convex in the underlying, so a higher diffusion coefficient raises the expected payoff.

### Theta ($\Theta$)
$\Theta$ is the change in an option's price as time passes/maturity draws near. Below is an example of $\Theta$ for a European call option.
\begin{array}{rl}
\Theta &= \displaystyle \frac{\partial}{\partial t}(S_tN(d_1)-Ke^{-r(T-t)}N(d_2)) \\\\
&= -\frac{S_t\phi(d_1)\sigma}{2\sqrt{T-t}}-rKe^{-r(T-t)}N(d_2)
\end{array}
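It is negative for calls on non-dividend paying assets and for most puts (a deep ITM European put can have positive theta), and is sometimes known as the time decay of an option. This is because as maturity draws near there is less time left for the underlying to move favourably, so the option's time value shrinks.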

### Rho ($\rho$)
$\rho$ is the sensitivity of an option's price to changes in interest rates. Below is an example of $\rho$ for a European call option.
\begin{array}{rl}
\rho&=\displaystyle \frac{\partial}{\partial r}(S_tN(d_1)-Ke^{-r(T-t)}N(d_2))\\\\
&=Ke^{-r(T-t)}(T-t)N(d_2)
\end{array}
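
A quick way to gain confidence in closed-form greeks is to compare them with finite differences of the pricing formula. The sketch below does this for the delta, gamma, vega and rho of a European call; the helper names, the bump size `h` and the sample parameters are assumptions chosen for illustration.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d1_d2(s, k, r, sigma, tau):
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)

def bs_call(s, k, r, sigma, tau):
    d1, d2 = d1_d2(s, k, r, sigma, tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)

def call_greeks(s, k, r, sigma, tau):
    """Closed-form greeks of a European call, with tau = T - t."""
    d1, d2 = d1_d2(s, k, r, sigma, tau)
    return {
        "delta": norm_cdf(d1),
        "gamma": norm_pdf(d1) / (s * sigma * math.sqrt(tau)),
        "vega": s * norm_pdf(d1) * math.sqrt(tau),
        "rho": k * tau * math.exp(-r * tau) * norm_cdf(d2),
    }

def central_diff(fn, x, h=1e-4):
    """Central finite-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

s, k, r, sigma, tau = 100.0, 100.0, 0.05, 0.2, 1.0
greeks = call_greeks(s, k, r, sigma, tau)
fd = {
    "delta": central_diff(lambda x: bs_call(x, k, r, sigma, tau), s),
    "gamma": central_diff(lambda x: call_greeks(x, k, r, sigma, tau)["delta"], s),
    "vega": central_diff(lambda x: bs_call(s, k, r, x, tau), sigma),
    "rho": central_diff(lambda x: bs_call(s, k, x, sigma, tau), r),
}
for name in greeks:
    print(name, greeks[name], fd[name])
```

The same bump-and-reprice idea is what one falls back on when no closed form is available, for example when the price comes from a tree or a Monte Carlo simulation.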

### Higher order greeks 
Higher-order greeks are simply higher order partial derivatives of an option's price. Here are a few:
- Vanna: Measures the sensitivity of an option's delta to changes in volatility
- Charm: Measures the decay of delta over time.
- Volga: Measures the sensitivity of vega to changes in volatility. This is also a measure of the convexity of an option's price with respect to volatility.
- Color: Measures the decay of gamma over time.
- Speed: Measures the change in gamma with respect to the price of the underlying.
- Zomma: Measures the sensitivity of gamma with respect to volatility.
- Ultima: The third order derivative with respect to volatility.

# Including Dividends (and other flows)
Stock price path under the risk-neutral measure, where $y$ is the continuous dividend yield:
\begin{equation*}
S_t=S_0e^{(r-y-\frac{1}{2}\sigma^2)t+\sigma W_t}
\end{equation*}

SDE:
\begin{equation*}
dS_t=(r-y)S_tdt+\sigma S_tdW_t
\end{equation*}

Black Scholes European Call Option solution:
\begin{equation*}
C(S_t,T)=S_te^{-y(T-t)}N(d_1)-Ke^{-r(T-t)}N(d_2)
\end{equation*}
where $d_1$ and $d_2$ are defined as before but with $r$ replaced by $r-y$ in the drift term. Notice that the call option has a lower price due to the dividends. This makes sense as dividends reduce the growth of the stock.



# Discrete time modelling

### Cox-Ross-Rubinstein Binomial trees
This is a discrete model of stock prices where the stock can only move up or down. This is expressed by:
\begin{equation*}
S_{t+1} = \begin{cases}uS_t & u\geq1 \text{ with probability } p\\dS_t& d\leq 1 \text{ with probability } 1-p\end{cases}
\end{equation*}
Similarly to having a risk-neutral measure for the continuous model, we can define the probability, up and down factors such that price grows at the risk-free rate with volatility $\sigma$.
- up-factor: $u = e^{\sigma\sqrt{\delta t}}$
- down-factor: $d = e^{-\sigma\sqrt{\delta t}}$
- up-probability: $p = \frac{e^{r\delta t}-d}{u-d}$

In order to price options using this model, the terminal values for all the paths are calculated, and then the value at each earlier node is calculated by backward induction as the discounted risk-neutral expectation $e^{-r\delta t}(pV_u+(1-p)V_d)$ of the values at its two successor nodes, until time 0.

Consider an option whose price is $V$. Then the delta of the option approximated by the binomial model is:
\begin{equation*}
    \Delta = \frac{V_u-V_d}{uS_t-dS_t}
\end{equation*}
where $V_u$ and $V_d$ are the values of the option when the stock goes up and down respectively. This can be used for delta hedging in discrete time.
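
The pricing procedure and the delta formula can be sketched in a few lines. The function below prices a European payoff on a recombining tree by discounted backward induction and reads the delta off the two nodes after the first step; the name `crr_european` and the requirement of at least two steps are our own choices.

```python
import math

def crr_european(s0, r, sigma, T, n_steps, payoff):
    """Return (price, delta) of a European payoff on a CRR tree (needs n_steps >= 2)."""
    dt = T / n_steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # Terminal values: node j corresponds to j up-moves and n_steps - j down-moves.
    values = [payoff(s0 * u**j * d**(n_steps - j)) for j in range(n_steps + 1)]
    delta = None
    for i in range(n_steps - 1, -1, -1):
        # Discounted risk-neutral expectation over the two successor nodes.
        values = [disc * (p * values[j + 1] + (1.0 - p) * values[j]) for j in range(i + 1)]
        if i == 1:
            # values[1] is V_u and values[0] is V_d after the first step.
            delta = (values[1] - values[0]) / (u * s0 - d * s0)
    return values[0], delta

price, delta = crr_european(100.0, 0.05, 0.2, 1.0, 500, lambda s: max(s - 100.0, 0.0))
print(price, delta)
```

As the number of steps grows, the price and delta approach the Black Scholes values for the same parameters.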

### Snell's Envelope
Consider an adapted process $X_t$. The Snell Envelope $Y_t$ of $X_t$ is the value process of the optimal stopping problem defined by:
\begin{array}{rl}
Y_T &= X_T\\\\
Y_t &= \max(X_t, \mathbb{E}[Y_{t+1}|\mathcal{F}_t]), t=T-1,...,0\\\\
&= \sup_{\tau\geq t}\mathbb{E}[X_\tau|\mathcal{F}_t]
\end{array}
By definition, it has the following properties:
- $Y_t\geq X_t$
- Stopping at $\tau^*=\inf{(t\geq 0:Y_t=X_t)}$ is optimal.

This result is particularly useful when dealing with options with early exercise, among other conditions beyond the European-style.
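
As an illustration, the recursion can be run on a binomial tree to price an American put. This is a sketch under stated assumptions: the tree uses Cox-Ross-Rubinstein style up/down factors, $X_t$ is the exercise payoff $(K-S_t)^+$, and the conditional expectation is taken under the risk-neutral probability and discounted one step (equivalent to taking the Snell envelope of the discounted payoff). The function name is hypothetical.

```python
import math

def american_put_snell(s0, k, r, sigma, T, n_steps):
    """American put price via the recursion Y_t = max(X_t, disc * E[Y_{t+1} | F_t])."""
    dt = T / n_steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)
    disc = math.exp(-r * dt)

    def exercise(i, j):
        """Payoff X at step i after j up-moves."""
        return max(k - s0 * u**j * d**(i - j), 0.0)

    y = [exercise(n_steps, j) for j in range(n_steps + 1)]  # Y_T = X_T
    for i in range(n_steps - 1, -1, -1):
        y = [max(exercise(i, j), disc * (p * y[j + 1] + (1.0 - p) * y[j]))
             for j in range(i + 1)]
    return y[0]

print(american_put_snell(100.0, 100.0, 0.05, 0.2, 1.0, 500))
```

The first node at which the maximum is attained by the exercise value corresponds to the optimal stopping time $\tau^*$ along that path.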

# American Options and other common Exotic Options

### Binary Options
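Binary options pay \$1 if the underlying finishes above/below the strike. The analytic solution to the price of a binary option is derivable; for example, under the black scholes model a binary call paying \$1 when $S_T>K$ is worth $e^{-r(T-t)}N(d_2)$, the discounted risk-neutral probability of finishing ITM.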

### American Options
An American option, unlike a European option, allows for early exercise of the option. This "extra optionality" means that the American option is always at least as expensive as its European counterpart. For an American call on an asset that pays no dividends, it is better to sell the option than to exercise it early, since the option carries time value on top of its intrinsic value while exercising early only pays the intrinsic. Early exercise can be optimal for calls when dividends are paid, and for deep ITM American puts. In discrete time, these can be priced by computing the Snell envelope of the (discounted) exercise payoff process.

### Arithmetic and Geometric Asian Option
An arithmetic asian option takes the settlement price to be the arithmetic average of the underlying's price over a set of observation dates (e.g. the last n days of trading). The geometric asian option uses the geometric average instead.

### Exchange Options
An exchange option involves the exchange of one asset for another. Its payout is denoted by $(S_1-S_2)^+$ where $S_2$ is the asset being exchanged out of and $S_1$ is the asset received. Note that by using ito's lemma and considering $f=\frac{S_1}{S_2}$ (i.e. pricing in units of $S_2$), you can arrive at an analytic solution (known as Margrabe's formula).


# Some Extensions (Just for knowledge)

### Ornstein Uhlenbeck driven Price
The OU-process describes a stochastic process that is mean reverting. Using this as a base can help to improve the accuracy of an options pricing model.
\begin{equation*}
    dX_t = \theta(\mu-X_t)dt+\sigma dW_t
\end{equation*}
The above makes the OU-process mean revert to the long term average $\mu$ with speed $\theta$.
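
Mean reversion is easy to see in simulation. The sketch below uses the exact one-step transition of the OU-process, $X_{t+\delta t}=\mu+(X_t-\mu)e^{-\theta\delta t}+\sigma\sqrt{\frac{1-e^{-2\theta\delta t}}{2\theta}}\epsilon$ with $\epsilon\sim\mathcal{N}(0,1)$, rather than an Euler step; the function name and parameter values are assumptions for illustration, and $\theta>0$ is required.

```python
import math
import random

def simulate_ou(x0, theta, mu, sigma, T, n_steps, rng=None):
    """Simulate an OU path on a uniform grid using its exact Gaussian transition."""
    rng = rng or random.Random()
    dt = T / n_steps
    decay = math.exp(-theta * dt)
    step_sd = sigma * math.sqrt((1.0 - decay**2) / (2.0 * theta))
    x, path = x0, [x0]
    for _ in range(n_steps):
        # The conditional mean is pulled toward mu; the noise has the exact conditional sd.
        x = mu + (x - mu) * decay + step_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

path = simulate_ou(x0=5.0, theta=2.0, mu=1.0, sigma=0.3, T=5.0, n_steps=500,
                   rng=random.Random(0))
print(path[0], path[-1])
```

Starting far from $\mu$, the path drifts back toward the long term average at a rate set by $\theta$ and then fluctuates around it with stationary variance $\frac{\sigma^2}{2\theta}$.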

### Interest rate modelling using Short rates
Short rate models for interest rates describe interest rates as their own stochastic processes. There are two categories of short-rate models: equilibrium models which do not match current term structure; and arbitrage-free models which do match current term structure.

Equilibrium:
- Vasicek: $dr_t = a(b-r_t)dt+\sigma dW_t$ where $b$ is the long term average and $a$ is the mean reverting speed.
- CIR Model: $dr_t = a(b-r_t)dt+\sigma\sqrt{r_t}dW_t$ where $\sqrt{r_t}$ guarantees positivity (assuming $2ab\geq\sigma^2$) of the short-rate unlike the Vasicek model.

Arbitrage free:
- Ho-Lee: $dr_t=\theta_t dt+\sigma dW_t$ where $\theta_t$ is adjusted to match the current rate curve (it is its own time varying process which is solvable). This does not have an explicit mean-reverting component.
- Hull-White: $dr_t = a(b_t-r_t)dt + \sigma dW_t$ where $b_t$ is used to fit the current rate curve and $a$ is the speed of mean reversion.

Besides short-rate models, there are also forward rate models which look at the evolution of the entire forward rate curve.

### Volatility modelling
Volatility in the black scholes model is taken to be a constant. However, we can model it as its own time-varying process (as it is in real life). One of the most basic stochastic volatility models is the Heston model, which models the variance $v_t=\sigma_t^2$:
\begin{equation*}
    dv_t = a(b-v_t)dt+\xi\sqrt{v_t}dW_t
\end{equation*}
The Heston variance process has exactly the same form as the CIR model, which keeps the variance non-negative and mean reverting (producing volatility clustering). Commonly, the brownian motion/wiener process used in stochastic volatility models is assumed to have some correlation $\rho$ with the wiener process driving the price process. This model, however, still fails to capture a couple of other aspects of volatility:
- Volatility spikes/jumps
- Non-linearities
- The volatility smile's dynamics
- Long term memory
- Rough volatility (This is to do with a more complex concept called the Hölder exponent which quantifies the roughness of a path)

Some other models that attempt to capture these aspects include: local volatility models, stochastic local volatility models, and rough volatility models.

### Transaction cost modelling [[src](https://math.nyu.edu/~goodman/theses/ChiLee.pdf)]

### Multi-factor modelling [[src](https://arxiv.org/pdf/2408.15416)]
