# Measures of Dispersion

Dispersion, also called the spread of a data set, describes how far the observations lie from their central value; the variance of the underlying probability distribution is one such measure. This matters a great deal in finance, because the risk of an asset is commonly measured by the spread of its historical returns. The more tightly the returns cluster around the central value, the less risky the asset; the further they deviate from it, the riskier the asset.

In [1]:
# We can examine this by trying it out on some random data
import numpy as np
np.random.seed(11)

In [2]:
# generate 20 random ints less than 100
Returns = np.random.randint(100, size=20)

# sort them
Returns = np.sort(Returns)
print("Returns:", Returns)

mu = np.mean(Returns)
print("Mean of Returns: ", mu)

Returns: [ 1  4 12 13 24 25 32 33 34 45 48 55 63 71 76 80 81 82 91 92]
Mean of Returns:  48.1


# Range

This is simply the difference between the minimum and the maximum of the data points. Thus, if there is an outlier in the dataset, it will heavily skew the value of the *Range*.

In [3]:
Range = np.ptp(Returns)
print("Range is: ", Range)

Range is:  91
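
To make the outlier sensitivity concrete, here is a minimal sketch on a small made-up sample (the arrays `sample` and `with_outlier` are illustrative assumptions, not the `Returns` data above). Appending a single extreme value changes the range drastically:

```python
import numpy as np

# Hypothetical data, chosen only to illustrate the effect of one outlier
sample = np.array([12, 15, 18, 21, 24])
with_outlier = np.append(sample, 500)

range_sample = np.ptp(sample)          # max - min = 24 - 12
range_outlier = np.ptp(with_outlier)   # max - min = 500 - 12

print("Range without outlier:", range_sample)
print("Range with outlier:   ", range_outlier)
```

Only the two extreme points enter the computation, which is why a single unusual observation can dominate this measure.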


# Mean Absolute Deviation (MAD)

This is the mean of the absolute ($L_1$) distances between the data points and their arithmetic mean. Here $\mu$ is the arithmetic mean and $N$ is the number of data samples.
$$
\text{MAD} = \frac{\sum_{i=1}^{N}|X_i - \mu|}{N}
$$

In [4]:
meanAbsDev = np.sum([np.abs(mu - x) for x in Returns])/len(Returns)
print("MAD: ", meanAbsDev)

MAD:  25.809999999999995
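
The same formula can also be written without an explicit Python loop. The following is a sketch that assumes the input can be converted to a NumPy float array; the helper name `mean_abs_dev` and the tiny `sample` array are made up for illustration:

```python
import numpy as np

def mean_abs_dev(x):
    """Mean absolute deviation around the arithmetic mean (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()))

# Hypothetical sample: mean is 5, absolute deviations are 3, 1, 1, 3
sample = np.array([2, 4, 6, 8])
mad = mean_abs_dev(sample)
print("MAD:", mad)
```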


# Variance and standard deviation

The variance, denoted as $\sigma^2$, is defined as the average of the squared distances between the mean and the sample data points:
$$
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}
$$
The standard deviation $\sigma$ is the square root of the variance.

Notice that, unlike the mean absolute deviation, which takes the absolute value of each deviation, the variance squares it. The squared deviation is differentiable everywhere, whereas the absolute value is not differentiable at zero. Since most optimization algorithms rely heavily on differentiability, variance is usually preferred to the mean absolute deviation.

In [5]:
print("Variance of Returns: ", np.var(Returns))
print("Standard Deviation of Returns: ", np.std(Returns))

Variance of Returns:  854.89
Standard Deviation of Returns:  29.23850201361212
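
As a sanity check on the formula, the sketch below computes the variance by hand on a small made-up array (`sample` is an assumption for illustration) and compares it with `np.var`. It also shows the `ddof` argument: by default `np.var` divides by $N$, as in the formula above, while `ddof=1` divides by $N-1$ (the sample variance).

```python
import numpy as np

# Hypothetical sample: mean is 5, squared deviations are 9, 1, 1, 9
sample = np.array([2.0, 4.0, 6.0, 8.0])
mu_s = sample.mean()

# Population variance, exactly as in the formula (divide by N)
pop_var = np.sum((sample - mu_s) ** 2) / len(sample)
# Sample variance (divide by N - 1)
samp_var = np.sum((sample - mu_s) ** 2) / (len(sample) - 1)

print("By hand, divide by N:    ", pop_var, " np.var:", np.var(sample))
print("By hand, divide by N - 1:", samp_var, " np.var(ddof=1):", np.var(sample, ddof=1))
print("Std is sqrt of variance: ", np.sqrt(pop_var), np.std(sample))
```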


Side note: Recall ***Chebyshev's inequality*** which states that:
$$
P(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}
$$
That is, for any random variable $X$ with finite expected value $\mu$ and finite non-zero variance $\sigma^2$, and for any $k>0$, the probability that $X$ lies at least $k$ standard deviations away from $\mu$ is *at most* $\frac{1}{k^2}$.

This helps build intuition about the standard deviation: the proportion of data points within $k$ standard deviations, i.e. $k\sigma$, of the mean is at least $1 - \frac{1}{k^2}$. The bound is only informative for $k > 1$.

In [6]:
# Let us verify this
# fix k = 1.25 (any constant k > 1 gives a non-trivial bound)
k = 1.25
dist = k*np.std(Returns)
# collect the observations that lie within k standard deviations of the mean
within = [x for x in Returns if abs(x - mu) <= dist]
print("Observations within ", k , 'std. away from the mean are: ', within)
print("Confirming that: ", float(len(within))/len(Returns), " > ", 1 - 1/k**2)


Observations within  1.25 std. away from the mean are:  [12, 13, 24, 25, 32, 33, 34, 45, 48, 55, 63, 71, 76, 80, 81, 82]
Confirming that:  0.8  >  0.36


Notice that the bound given by Chebyshev's inequality is fairly loose in this case. The bound is rarely tight, but it is useful because it holds for all data sets and distributions.
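
To see how loose the bound can be for several values of $k$, here is a small sketch on synthetic data. The exponential sample, the seed, and the chosen values of `k` are all assumptions for illustration; the check uses the population standard deviation (`ddof=0`), for which the inequality holds exactly on the empirical distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical skewed sample, unrelated to the Returns array
sample = rng.exponential(scale=1.0, size=1000)
mu_s, sigma_s = sample.mean(), sample.std()  # population std (ddof=0)

results = []
for k in (1.5, 2.0, 3.0):
    # fraction of observations within k standard deviations of the mean
    within = np.mean(np.abs(sample - mu_s) <= k * sigma_s)
    bound = 1 - 1 / k**2
    results.append((k, within, bound))
    print(f"k={k}: observed fraction {within:.3f} >= Chebyshev bound {bound:.3f}")
```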

# Semivariance and semideviation

Although variance and standard deviation tell us how volatile a quantity is, they do not differentiate between upward and downward deviations. In asset return analysis this distinction matters, since downward deviations are the ones that represent losses. This is where semivariance and semideviation become useful: they measure the dispersion of only those data points that fall below the mean. ***Semivariance*** is defined as:
$$
\text{Semivariance} = \frac{\sum_{X_i < \mu} (X_i - \mu)^2}{N_<}
$$
where $N_<$ is the number of data points below the mean. ***Semideviation*** is just the square root of semivariance.




In [7]:
# NumPy has no built-in semideviation function, so we implement our own
lows = [i for i in Returns if i < mu]

semivar = np.sum( (lows - mu)**2 )/len(lows)

In [8]:
print('Semivariance of X:', semivar)
print('Semideviation of X:', np.sqrt(semivar))

Semivariance of X: 773.5009090909091
Semideviation of X: 27.811884313920714


If we replace the mean in the formula above with an arbitrary target value $K$, only the values below $K$ contribute and everything above the target is filtered out; this gives the ***target semivariance*** and ***target semideviation***. That is, for some target value $K$:
$$
\text{target semivariance} = \frac{\sum_{X_i < K} (X_i - K)^2}{N_{<K}}
$$
where $N_{<K}$ is the number of data points below $K$.

In [9]:
K = 100
# keep only the observations strictly below the target K, as in the formula
lows_K = [e for e in Returns if e < K]
semivar_K = sum(map(lambda x: (x - K)**2,lows_K))/len(lows_K)

print('Target semivariance of X:', semivar_K)
print('Target semideviation of X:', np.sqrt(semivar_K))

Target semivariance of X: 3548.5
Target semideviation of X: 59.56928738872071
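
Both the semivariance and its target variant can be wrapped in one reusable function. This is only a sketch: the name `semivariance`, its `target` parameter, and the choice to return `nan` when no observation falls below the target are assumptions made here, not part of NumPy.

```python
import numpy as np

def semivariance(x, target=None):
    """Mean squared deviation of the values strictly below `target`.

    If `target` is None, the arithmetic mean of `x` is used.
    Hypothetical helper; returns nan when nothing lies below the target.
    """
    x = np.asarray(x, dtype=float)
    if target is None:
        target = x.mean()
    lows = x[x < target]
    if lows.size == 0:
        return float("nan")
    return float(np.mean((lows - target) ** 2))

# Hypothetical sample with mean 3
sample = np.array([1.0, 2.0, 3.0, 6.0])
sv_mean = semivariance(sample)                # lows are 1, 2
sv_target = semivariance(sample, target=4.0)  # lows are 1, 2, 3

print("Semivariance around the mean:", sv_mean)
print("Target semivariance (K=4):   ", sv_target)
print("Target semideviation (K=4):  ", np.sqrt(sv_target))
```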


In [10]:
import yfinance as yf

# Download historical data for desired ticker symbol
data = yf.download('AAPL', start='2020-01-01', end='2023-07-01')['Close']

# Let's try these on some real historical prices
# we use the pandas rolling-window method with a 30-day window to obtain the variance & std
variance = data.rolling(window=30).var()
std = data.rolling(window=30).std()


[*********************100%***********************]  1 of 1 completed
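
The rolling computation can be tried without a network connection. The sketch below uses a synthetic random-walk price series (the seed, length, and starting price are made-up assumptions) in place of downloaded data. Note that the pandas rolling `var` and `std` use the sample normalization ($N-1$) by default, and the first `window - 1` entries are `NaN` because a full window is not yet available.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical price path: a random walk starting near 100
prices = pd.Series(100 + rng.normal(0, 1, size=120).cumsum())

window = 30
roll_var = prices.rolling(window=window).var()
roll_std = prices.rolling(window=window).std()

print("Leading NaN entries:", roll_var.isna().sum())
print("First full-window variance:", roll_var.iloc[window - 1])
print("Same value computed directly:", prices.iloc[:window].var())
```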


Suppose we wish to model the ***portfolio variance***, which is defined as:
$$
\text{VAR}_p = w_1^2\,\text{VAR}_{s_1} + w_2^2\,\text{VAR}_{s_2} + 2w_1w_2\,\text{COVAR}_{s_1,s_2}
$$
where $w_1,w_2$ are the weights of the respective assets $s_1,s_2$. We specifically want to find the weights such that $\text{VAR}_p = 50$.

In [11]:
asset1 = yf.download('AAPL',  start='2020-01-01', end='2023-07-01')['Close']
asset2 = yf.download('NVDA',  start='2020-01-01', end='2023-07-01')['Close']

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


In [13]:
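# take the covariance between the Apple and NVIDIA prices from the 2x2 covariance matrix
# note: np.cov normalizes by N-1 by default, while np.var below normalizes by N
cov = np.cov(asset1, asset2)[0,1]

w1 = 0.87 # in this example we fix the weights by hand; no regression or optimization was used to find them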
w2 = 1 - w1

v1 = np.var(asset1)
v2 = np.var(asset2)

pvariance = (w1**2)*v1+(w2**2)*v2+(2*w1*w2)*cov

print("Portfolio variance is: ", pvariance)

Portfolio variance is:  1276.3249930194775
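
The cell above fixes the weights by hand, so it does not reach the stated target. With $w_2 = 1 - w_1$, the portfolio variance is a quadratic in $w_1$, and setting it equal to a target $T$ gives
$(\text{VAR}_{s_1} + \text{VAR}_{s_2} - 2\,\text{COVAR})\,w_1^2 + (2\,\text{COVAR} - 2\,\text{VAR}_{s_2})\,w_1 + (\text{VAR}_{s_2} - T) = 0$.
The sketch below solves this with `np.roots`. The variances and covariance are made-up numbers rather than the AAPL/NVDA values, and the helper name `portfolio_variance` is an assumption; real roots exist only when the target is at least the minimum attainable variance.

```python
import numpy as np

def portfolio_variance(w1, v1, v2, cov12):
    """Two-asset portfolio variance with w2 = 1 - w1 (hypothetical helper)."""
    w2 = 1 - w1
    return w1**2 * v1 + w2**2 * v2 + 2 * w1 * w2 * cov12

# Hypothetical inputs, NOT the values computed from the downloaded prices
v1, v2, cov12 = 40.0, 90.0, 20.0
target = 50.0

# Coefficients of the quadratic in w1 obtained by substituting w2 = 1 - w1
a = v1 + v2 - 2 * cov12
b = 2 * cov12 - 2 * v2
c = v2 - target

roots = np.roots([a, b, c])
real_roots = roots[np.isreal(roots)].real  # keep only real solutions

for w1 in real_roots:
    print(f"w1 = {w1:.4f}, w2 = {1 - w1:.4f}, "
          f"variance = {portfolio_variance(w1, v1, v2, cov12):.4f}")
```

A root outside $[0, 1]$ corresponds to a short position in one of the assets, so whether it is acceptable depends on the constraints placed on the portfolio.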
