# Interview Questions
Finance interviews are known for trick questions. One that challenged me was the following:

What is more likely: 30 consecutive days each with a drop of at least 1%, or a single day with a drop of at least 30%? Assume the daily returns (in percent) are $\epsilon$ ~ iid N(0,1).

For readability and intuition, let's simplify to 2 days at 1% and 1 day at 2%.

$P(X<-1)^2$ $\overset{?}{>}$ $P(X<-2)$


#### As per iid assumption:

By independence, the joint probability of $n$ consecutive drops factorizes:

$P\left( \bigcap_{t=1}^{n} \{ X_t \leq -1 \} \right) = \prod_{t=1}^{n} P(X_t \leq -1)$,

and since $P(X_t \leq -1)$ is the same for all $t$ (identical distribution),

$\prod_{t=1}^{n} P(X_t \leq -1) = P(X < -1)^n$.
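The factorization above can be sanity-checked by simulation. The following is a minimal sketch, assuming NumPy and SciPy are available; the seed, the sample size and the variable names are illustrative choices, not part of the original question.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n_paths, n_days = 1_000_000, 2

# Each row is one independent n_days-long path of iid N(0,1) returns
paths = rng.standard_normal((n_paths, n_days))

# Empirical probability that every day in a path is at or below -1
p_joint_mc = np.mean(np.all(paths <= -1, axis=1))

# Closed form implied by the iid assumption: P(X <= -1) ** n_days
p_joint_iid = norm.cdf(-1) ** n_days

print(f"Monte Carlo: {p_joint_mc:.6f}, closed form: {p_joint_iid:.6f}")
```

The two numbers should agree up to Monte Carlo noise, which is what the product rule predicts when the draws are independent.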

## $\epsilon$ ~ iid N(0,1)

In [3]:
import numpy as np
from scipy.stats import norm

# Parameters
mu = 0  # Mean of the distribution
sigma = 1  # Standard deviation of the distribution

In [4]:
# Simplified case
# 2 days with a 1% drop
prob_2_days_1_percent = norm.cdf(-1, loc=mu, scale=sigma) ** 2

# 1 day with a 2% drop
prob_1_day_2_percent = norm.cdf(-2, loc=mu, scale=sigma)

# Display results
if prob_2_days_1_percent > prob_1_day_2_percent:
    print(f"Probability of 2 days with a 1% drop: {prob_2_days_1_percent:.6f} (greater)")
    print(f"Probability of 1 day with a 2% drop: {prob_1_day_2_percent:.6f}")
else:
    print(f"Probability of 2 days with a 1% drop: {prob_2_days_1_percent:.6f}")
    print(f"Probability of 1 day with a 2% drop: {prob_1_day_2_percent:.6f} (greater)")

Probability of 2 days with a 1% drop: 0.025171 (greater)
Probability of 1 day with a 2% drop: 0.022750


In [5]:
# Actual case
prob_30_days_1_percent = norm.cdf(-1, loc=mu, scale=sigma) ** 30
prob_1_day_30_percent = norm.cdf(-30, loc=mu, scale=sigma)

# Determine which is greater (too small to read)
if prob_30_days_1_percent > prob_1_day_30_percent:
    print(f"Probability of 30 days with a 1% drop: {prob_30_days_1_percent:.6e} (greater)")
    print(f"Probability of 1 day with a 30% drop: {prob_1_day_30_percent:.6e}")
else:
    print(f"Probability of 30 days with a 1% drop: {prob_30_days_1_percent:.6e}")
    print(f"Probability of 1 day with a 30% drop: {prob_1_day_30_percent:.6e} (greater)")

Probability of 30 days with a 1% drop: 1.031891e-24 (greater)
Probability of 1 day with a 30% drop: 4.906714e-198
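Probabilities this small are easier to compare on a log scale. A possible sketch, assuming SciPy's `norm.logcdf` (which evaluates the log of the CDF directly and so avoids underflow for more extreme thresholds); the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

n_days = 30

# Under iid: log P(30 consecutive days <= -1) = 30 * log P(X <= -1)
log_p_streak = n_days * norm.logcdf(-1)

# log P(one day <= -30), evaluated directly in log space
log_p_crash = norm.logcdf(-30)

# Gap between the two events in orders of magnitude (base 10)
gap_log10 = (log_p_streak - log_p_crash) / np.log(10)

print(f"log10 P(streak) = {log_p_streak / np.log(10):.2f}")
print(f"log10 P(crash)  = {log_p_crash / np.log(10):.2f}")
print(f"gap             = {gap_log10:.1f} orders of magnitude")
```

Under the iid assumption the 30-day streak is more likely than the single 30% drop by well over a hundred orders of magnitude, consistent with the two values printed above.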


It is worth stressing that this result relies on the $\epsilon$ being iid. Stock market returns are not iid; a process with serial dependence, such as ARMA, is a more plausible description.

Let's look at how the question behaves under an ARMA process. Note: ARMA combines an AR (autoregressive) part, which uses lagged values of the series, with an MA (moving average) part, which uses lagged error terms. ARIMA additionally differences the series to handle non-stationarity.

## $\epsilon$ ~ ARMA(p,q), with $\eta_t$ ~ N(0,1)

In [8]:
from statsmodels.tsa.arima_process import ArmaProcess

### n = 2-days

In [10]:
# Parameters for ARMA
mu = 0  # Mean
sigma = 1  # Standard deviation of white noise
ar_params = [1, -0.5]  # AR lag polynomial [1, -phi] (statsmodels sign convention), so phi = 0.5
ma_params = [1, 0.4]   # MA lag polynomial [1, theta], so theta = 0.4
n_simulations = 1000000  # Number of simulations
n_days = 2  # Days for comparison

# Define ARMA process
ar = np.array(ar_params)
ma = np.array(ma_params)
arma_process = ArmaProcess(ar, ma)
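Because `ArmaProcess` takes lag-polynomial coefficients, `[1, -0.5]` corresponds to an AR coefficient of +0.5. The sketch below spells out the recursion by hand, assuming the usual ARMA(1,1) form $X_t = \phi X_{t-1} + \eta_t + \theta \eta_{t-1}$; the names `phi`, `theta`, `n_obs` and the seed are illustrative, and this is not how statsmodels generates samples internally.

```python
import numpy as np

phi, theta = 0.5, 0.4  # implied by ar_params = [1, -0.5], ma_params = [1, 0.4]
n_obs, burnin = 200_000, 100

rng = np.random.default_rng(0)
eta = rng.standard_normal(n_obs + burnin)  # white noise, N(0, 1)

x = np.zeros(n_obs + burnin)
for t in range(1, n_obs + burnin):
    # ARMA(1,1) recursion: X_t = phi * X_{t-1} + eta_t + theta * eta_{t-1}
    x[t] = phi * x[t - 1] + eta[t] + theta * eta[t - 1]
x = x[burnin:]  # drop the warm-up so the arbitrary start value has no influence

# Stationary variance of an ARMA(1,1) driven by unit-variance noise
var_theory = (1 + 2 * phi * theta + theta**2) / (1 - phi**2)

print(f"sample variance: {x.var():.3f}, theoretical: {var_theory:.3f}")
```

The marginal variance is larger than 1, so the simulated ARMA returns are more dispersed than the N(0,1) returns of the iid section, which matters when the same thresholds (-1, -2, -30) are reused.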

In [11]:
# Simulate returns for the n_days case and the 1-day case.
# One long path is generated and cut into consecutive n_days-long windows (one window per row).
simulated_returns_2_days = arma_process.generate_sample(nsample=n_simulations * n_days, scale=sigma, burnin=100).reshape(n_simulations, n_days)
simulated_returns_1_day = arma_process.generate_sample(nsample=n_simulations, scale=sigma, burnin=100)

simulated_returns_2_days,simulated_returns_1_day

(array([[-0.59820912, -1.68285308],
        [-1.93384093, -1.81580211],
        [-1.05285845,  1.245632  ],
        ...,
        [-2.53578156, -0.74384314],
        [ 0.37628393,  1.20721287],
        [ 0.38766901, -0.25795533]]),
 array([ 0.58166972,  0.66019304,  0.43421751, ..., -0.97571733,
        -0.2647429 ,  0.51638965]))

In [12]:
# Probability calculation
prob_1_day_2_percent = np.mean(simulated_returns_1_day <= -2)  # One 2% drop
prob_consecutive_1_percent = np.mean(
    np.all(simulated_returns_2_days <= -1, axis=1)  # Check if all days in each simulation meet the condition
)
prob_consecutive_1_percent,prob_1_day_2_percent

(0.144352, 0.082809)

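In this case, the two-day scenario does not map onto the 30-day scenario through $P(X<-1)^n$, because the draws are not iid: each day depends on the previous days, so the joint probability no longer factorizes. With the parameters chosen here the dependence is positive and the marginal variance exceeds 1, so drops cluster: two consecutive drops are more likely under this ARMA process (0.144352) than under iid N(0,1) (0.025171), and the same direction of effect should be expected for 30 days in a row.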
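The simulated numbers can be cross-checked against closed-form ARMA(1,1) quantities. This is a sketch under the assumption that the process is the stationary Gaussian ARMA(1,1) with $\phi = 0.5$, $\theta = 0.4$ and unit-variance noise; `norm_cdf` is a small helper defined here, not a library function.

```python
import math

phi, theta = 0.5, 0.4  # read off ar_params = [1, -0.5], ma_params = [1, 0.4]

def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Stationary variance and lag-1 autocorrelation of the ARMA(1,1)
var_x = (1 + 2 * phi * theta + theta**2) / (1 - phi**2)
sd_x = math.sqrt(var_x)
rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta**2)

p_one_day_2 = norm_cdf(-2 / sd_x)       # marginal P(X <= -2)
p_one_day_1 = norm_cdf(-1 / sd_x)       # marginal P(X <= -1)
p_two_days_if_indep = p_one_day_1 ** 2  # what independence would give

print(f"sd = {sd_x:.4f}, lag-1 autocorrelation = {rho1:.4f}")
print(f"P(X <= -2) = {p_one_day_2:.6f}")
print(f"P(X <= -1)^2 = {p_two_days_if_indep:.6f}")
```

The analytic single-day probability lines up with the simulated 0.082809, while the independence product (about 0.06) is well below the simulated joint probability of 0.144352. That gap is the effect of the positive autocorrelation.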

### n = 30-days

In [15]:
n_days = 30  # Days for comparison

# Define ARMA process
ar = np.array(ar_params)
ma = np.array(ma_params)
arma_process = ArmaProcess(ar, ma)

In [16]:
simulated_returns_n_days = arma_process.generate_sample(nsample=n_simulations * n_days, scale=sigma, burnin=100).reshape(n_simulations, n_days)
# Resulting shape: (n_simulations, n_days), i.e. one 30-day window per row.
simulated_returns_1_day = arma_process.generate_sample(nsample=n_simulations, scale=sigma, burnin=100)


In [17]:
prob_all_30_days_1_percent = np.mean(
    np.all(simulated_returns_n_days <= -1, axis=1)  # Check if all days in each simulation meet the condition
)
prob_1_day_30_percent = np.mean(simulated_returns_1_day <= -30)  # One 30% drop

In [18]:
prob_all_30_days_1_percent>prob_1_day_30_percent

False

In [19]:
if prob_all_30_days_1_percent > prob_1_day_30_percent:
    print(f"Probability of 30 days with a 1% drop: {prob_all_30_days_1_percent:.6f} (greater)")
    print(f"Probability of 1 day with a 30% drop: {prob_1_day_30_percent:.6f}")
else:
    print(f"Probability of 30 days with a 1% drop: {prob_all_30_days_1_percent:.6f}")
    print(f"Probability of 1 day with a 30% drop: {prob_1_day_30_percent:.6f} (greater)")

Probability of 30 days with a 1% drop: 0.000000
Probability of 1 day with a 30% drop: 0.000000 (greater)
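Both estimates print as zero, so the `else` branch labels a tie as "(greater)". A rough way to see what a zero count can and cannot tell us is sketched below. It treats the simulated windows as independent trials, which is only approximately true here because they are cut from one long path; the names are illustrative.

```python
import math

n_simulations = 1_000_000

# Smallest nonzero frequency the simulation can report
resolution = 1 / n_simulations

# With zero hits in n independent trials, the exact 95% upper bound on the
# true probability p solves (1 - p) ** n = 0.05
upper_95 = 1 - 0.05 ** (1 / n_simulations)

# iid benchmark from the earlier section, for scale: P(X <= -1) ** 30
p_streak_iid = (0.5 * math.erfc(1 / math.sqrt(2))) ** 30

print(f"resolution      = {resolution:.1e}")
print(f"95% upper bound = {upper_95:.2e}")
print(f"iid benchmark   = {p_streak_iid:.2e}")
```

A zero count only bounds each probability at a few in a million. That is far too coarse to rank events at the scale of the iid benchmark, so an analytic or rare-event method would be needed for the 30-day comparison.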


## Result
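The 30-day simulation is inconclusive: both estimates are exactly zero, because each event is far rarer than the $1/10^6$ resolution of one million simulations, and the "(greater)" label above comes only from the `else` branch being taken on a tie. What the simulations do show is that the iid shortcut breaks down once the draws are dependent: in the two-day case the ARMA joint probability (0.144352) is several times the iid value (0.025171). This highlights the importance of choosing the appropriate model for the given context. As seen earlier, the qualitative results under iid were similar for two consecutive days, but when examining the extremity of the tail of the distribution, the right modeling matters, and so does a method that can resolve such small probabilities.

Thus:

$P(X<-1)^n$ $\not=$ $P\left( \bigcap_{t=1}^{n} \{ X_t \leq -1\} \right)$ in general under ARMA,

BUT the simulation does not establish the ordering of

$P\left( \bigcap_{t=1}^{30} \{ X_t \leq -1 \} \right)$ and $P(X<-30)$.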


### Sub-question
Now, what can we do to ensure both have the same probability?
Namely that:

$P\left( \bigcap_{t=1}^{30} \{ X_t \leq -1 \} \right)$ = $P(X<-30)$.

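Note: here is a sketch of the argument:

A priori: in the iid world $P\left( \bigcap_{t=1}^{30} \{ X_t \leq -1 \} \right)$ > $P(X<-30)$. Suppose that for some ARMA parameter values the inequality is reversed, $P\left( \bigcap_{t=1}^{30} \{ X_t \leq -1 \} \right)$ < $P(X<-30)$. (The simulation above cannot confirm this for the current values, since both estimates were zero.)

Moreover, the i.i.d. case is a special case of the ARMA process (where all AR and MA coefficients are zero). If both probabilities vary continuously with the parameters, then somewhere between the two parameter settings they must be equal for $n = 30$, and a search such as bisection can locate that point. This is an application of the Intermediate Value Theorem.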
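The continuity argument can be illustrated on the simplified two-day problem, where the joint probability is cheap to compute. The sketch below is not the 30-day ARMA calculation: it assumes two standard normal days with a single correlation parameter `rho` (a hypothetical simplification), and uses a root finder to locate the correlation at which the two-day streak and the single 2% drop are equally likely.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

def p_two_day_streak(rho, level=-1.0):
    """P(X1 <= level, X2 <= level) for standard normals with correlation rho."""
    scale = np.sqrt(1.0 - rho**2)
    # Condition on X1 = x: X2 | X1 = x is N(rho * x, 1 - rho**2)
    integrand = lambda x: norm.pdf(x) * norm.cdf((level - rho * x) / scale)
    value, _ = quad(integrand, -np.inf, level)
    return value

p_crash = norm.cdf(-2.0)  # P(one day <= -2)

def gap(rho):
    """Positive when the streak is more likely than the single large drop."""
    return p_two_day_streak(rho) - p_crash

# gap(0) > 0 (the iid case) and gap(-0.5) < 0, so a root lies in between
rho_star = brentq(gap, -0.5, 0.0)

print(f"rho* = {rho_star:.4f}, gap at rho* = {gap(rho_star):.2e}")
```

In this toy setting a slightly negative correlation is enough to equalize the two probabilities. The same bracketing idea would apply to the ARMA parameters for $n = 30$, provided the 30-day joint probability can be evaluated accurately enough, which plain Monte Carlo with one million draws cannot do.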