### Measures of Dispersion

Dispersion measures how spread out a set of data is. This is especially important in finance because one of the main ways risk is measured is by how spread out returns have been historically. If returns have been tightly clustered around a central value, we have less reason to worry; if returns have been all over the place, the asset is risky.
Data with low dispersion is heavily clustered around the mean, while data with high dispersion contains many values far above and far below the mean.

In [11]:
# Import NumPy and fix the random seed for reproducibility
import numpy as np

np.random.seed(121)

In [3]:
# Generate 20 random integers < 100
X = np.random.randint(100, size=20)

# Sort X
X = np.sort(X)
print('X: %s' %(X))

mu = np.mean(X)
print('Mean of X:', mu)

X: [ 3  8 34 39 46 52 52 52 54 57 60 65 66 75 83 85 88 94 95 96]
Mean of X: 60.2


#### Range

Range is simply the difference between the maximum and minimum values in a dataset. Because it depends only on the two most extreme values, it is very sensitive to outliers.

In [4]:
print('Range of X: %s' %(np.ptp(X)))

Range of X: 93
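To see why the range reacts so strongly to outliers, here is a small sketch on a hypothetical five-value dataset (unrelated to `X` above): adding a single extreme observation changes only one of the two numbers the range depends on, yet the range jumps from 3 to 50. `value_range` is an illustrative helper, equivalent to NumPy's `np.ptp`.

```python
import numpy as np

def value_range(x):
    """Difference between the largest and smallest value (same as np.ptp)."""
    x = np.asarray(x)
    return x.max() - x.min()

# Hypothetical data, unrelated to X above
data = np.array([10, 12, 11, 13, 12])
with_outlier = np.append(data, 60)  # add a single extreme observation

print(value_range(data))          # 3
print(value_range(with_outlier))  # 50
print(np.ptp(with_outlier))       # 50, NumPy's built-in agrees
```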


#### Mean Absolute Deviation (MAD)

The mean absolute deviation is the average distance of the observations from the arithmetic mean. We take the absolute value of each deviation so that an observation 5 above the mean and one 5 below it both contribute 5; without the absolute value, the deviations would always sum to 0.

$MAD = \frac{\sum_{i=1}^n |X_i - \mu|}{n}$

In [5]:
abs_dispersion = [np.abs(mu - x) for x in X]
MAD = np.sum(abs_dispersion) / len(abs_dispersion)
print('Mean absolute deviation of X:', MAD)

Mean absolute deviation of X: 20.520000000000003
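Both the claim that signed deviations always cancel and the MAD formula itself can be checked on a tiny hypothetical dataset. `mean_absolute_deviation` is our own name for a vectorized version of the loop above, not a NumPy function.

```python
import numpy as np

def mean_absolute_deviation(x):
    """Average absolute distance of each observation from the arithmetic mean."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()))

sample = np.array([2.0, 4.0, 6.0, 8.0])  # hypothetical data with mean 5
deviations = sample - sample.mean()       # [-3, -1, 1, 3]

print(deviations.sum())                   # 0.0: signed deviations cancel
print(mean_absolute_deviation(sample))    # 2.0 = (3 + 1 + 1 + 3) / 4
```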


#### Variance and standard deviation

The variance $\sigma^2$ is defined as the average of the squared deviations around the mean:

$\sigma^2 = \frac{\sum_{i=1}^n (X_i - \mu)^2}{n}$

This is sometimes more convenient than the mean absolute deviation because the absolute value is not differentiable at 0, while squaring is smooth, and some optimization algorithms rely on differentiability.

Standard deviation is the square root of the variance, and it is easier to interpret because it is in the same units as the observations.

In [6]:
print('Variance of X:', np.var(X))
print('Standard deviation of X:', np.std(X))

Variance of X: 670.16
Standard deviation of X: 25.887448696231154
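As a check of the formula above, the sketch below computes the variance of a small hypothetical dataset by hand and compares it with `np.var`. NumPy's `np.var` and `np.std` divide by $n$ by default (`ddof=0`), which matches the formula; passing `ddof=1` divides by $n - 1$ instead and gives the sample variance. Note that pandas' `.var()` and `.std()` default to `ddof=1`, which matters for the rolling calculations in the exercises.

```python
import numpy as np

sample = np.array([2.0, 4.0, 6.0, 8.0])  # hypothetical data with mean 5
n = len(sample)
sample_mean = sample.mean()

# Population variance exactly as in the formula: divide by n
var_formula = np.sum((sample - sample_mean) ** 2) / n  # (9 + 1 + 1 + 9) / 4 = 5.0
var_numpy = np.var(sample)                             # default ddof=0, also divides by n

# Sample variance divides by n - 1 instead
var_sample = np.var(sample, ddof=1)                    # 20 / 3

std_formula = np.sqrt(var_formula)                     # same units as the data

print(var_formula, var_numpy, var_sample, std_formula)
```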


In [7]:
# Chebyshev's inequality. The proportion of samples within k std of the mean is at least 1 - 1/k^2 for all k > 1.
k = 1.25
dist = k * np.std(X)
l = [x for x in X if abs(x - mu) <= dist]
print('Observations within', k, 'stds of mean:', l)
print('Confirming that', float(len(l)) / len(X), '>', 1 - 1/k**2)

Observations within 1.25 stds of mean: [34, 39, 46, 52, 52, 52, 54, 57, 60, 65, 66, 75, 83, 85, 88]
Confirming that 0.75 > 0.36
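Chebyshev's inequality holds for any distribution, including the empirical distribution of a sample when its mean and its `ddof=0` standard deviation are used, which is why the check above cannot fail. The sketch below repeats the check for several values of k on a skewed, hypothetical exponential sample; the seed and sample size are arbitrary choices. For this sample the bound turns out to be loose, with the actual fraction sitting above it.

```python
import numpy as np

rng = np.random.default_rng(0)
# A skewed, clearly non-normal sample: Chebyshev's bound needs no normality assumption
sample = rng.exponential(scale=1.0, size=10_000)

center = sample.mean()
spread = sample.std()  # ddof=0: the std of the sample treated as a distribution

results = []
for k in (1.25, 1.5, 2.0, 3.0):
    within = np.mean(np.abs(sample - center) <= k * spread)  # fraction within k std
    bound = 1 - 1 / k**2                                     # Chebyshev's lower bound
    results.append((k, within, bound))
    print(f"k={k}: fraction within = {within:.3f}, bound = {bound:.3f}")
```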


#### Semivariance and semideviation

Variance and standard deviation do not distinguish between upward and downward deviations. Often, as with the returns of an asset, we are more worried about downward deviations. Semivariance is defined as

$\frac{\sum_{X_i < \mu} (X_i - \mu)^2}{n_<}$

where $n_<$ is the number of observations which are smaller than the mean.

In [8]:
# NumPy has no built-in semivariance, so we compute it ourselves
# Keep only the observations strictly below the mean (the n_< observations in the formula)
lows = X[X < mu]

semivar = np.sum((lows - mu)**2) / len(lows)

print("Semivariance of X:", semivar)
print("Semideviation of X:", np.sqrt(semivar))

Semivariance of X: 689.5127272727273
Semideviation of X: 26.258574357202395
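To see why the direction of deviations matters, here is a sketch with two hypothetical four-value datasets that are mirror images of each other: they have identical variance (19.25), but one has a single large drop below the mean and the other a single large jump above it. `semivariance` is an illustrative helper (not a NumPy function) that uses the strict `<` from the formula; it assumes at least one observation lies below the mean.

```python
import numpy as np

def semivariance(x):
    """Mean squared deviation of the observations strictly below the mean."""
    x = np.asarray(x, dtype=float)
    center = x.mean()
    below = x[x < center]  # strict '<', matching n_< in the formula
    return np.mean((below - center) ** 2)

crash = np.array([0.0, 9.0, 10.0, 11.0])  # hypothetical: one large drop
spike = np.array([4.0, 5.0, 6.0, 15.0])   # hypothetical: mirror image, one large jump

print(np.var(crash), np.var(spike))              # 19.25 19.25
print(semivariance(crash), semivariance(spike))  # 56.25 and about 6.917
```

Variance treats the two datasets as equally risky, while semivariance flags the one whose large deviation is on the downside.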


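The target semivariance replaces the mean with a target value $B$ and averages the squared shortfalls of the observations that fall below it:

$\frac{\sum_{X_i < B} (X_i - B)^2}{n_{<B}}$

where $n_{<B}$ is the number of observations smaller than $B$.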

In [10]:
# Target semivariance: average the squared shortfall below the target B,
# over the observations strictly below B (the n_<B observations in the formula)
B = 19
lows_B = X[X < B]
semivar_B = np.sum((lows_B - B)**2) / len(lows_B)

print("Target semivariance of X:", semivar_B)
print("Target semideviation of X:", np.sqrt(semivar_B))

Target semivariance of X: 188.5
Target semideviation of X: 13.729530217745982
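For asset returns, a common choice of target is $B = 0$, so the target semideviation measures only losses (this is often called downside deviation). The sketch below applies that idea to a handful of hypothetical daily returns; `target_semivariance` is an illustrative helper, and returning 0.0 when nothing falls below the target is our own convention, not a standard one.

```python
import numpy as np

def target_semivariance(x, target):
    """Mean squared shortfall below `target`, over observations strictly below it."""
    x = np.asarray(x, dtype=float)
    below = x[x < target]
    if below.size == 0:
        return 0.0  # convention assumed here: no shortfall means zero downside risk
    return np.mean((below - target) ** 2)

returns = np.array([0.02, -0.01, 0.03, -0.04, 0.01])  # hypothetical daily returns
target = 0.0                                          # only losses count

tsv = target_semivariance(returns, target)  # ((-0.01)**2 + (-0.04)**2) / 2, about 0.00085
print(tsv, np.sqrt(tsv))                    # target semivariance and semideviation
```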


### Exercises

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf

In [3]:
# 100 random integers in [0, 100); this cell sets no seed, so the values change on every run
X = np.random.randint(100, size=100)
print(X)

[28 23 65 72 26 82 60 70  3 65 44 76 77 58 31 68 30 89 14 79  6 25 63  3
 37 51 29 70 23 31 23 31 25 72 57 71 35 85 67 80 42 93 60 10 21 88 97 52
 45 17 39 59 11 90 15 46 36 25 41 80 95 14 99 69 89 97 81 24 99 25 31 61
 64 82 99 55 33 48 52 56 64 66 87 85 84 22 46 48 83 54 77 92 94 55 34 19
 85 14 16 95]


Find the following parameters of the array X:
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Semivariance and Semideviation
- Target semivariance and semideviation (with B = 60)

In [4]:
# Range of X
range_X = np.ptp(X)

print("Range of X: %s" %(range_X))

Range of X: 96


In [5]:
# Mean Absolute Deviation
# First calculate the value of mu (the mean)

mu = np.mean(X)

dispersion = [np.abs(x - mu) for x in X]
MAD = np.sum(dispersion) / len(dispersion)
print("Mean absolute deviation of X:", MAD)

Mean absolute deviation of X: 23.9184


In [6]:
# Variance and Standard Deviation

print('Variance of X:', np.var(X))
print('Standard deviation of X:', np.std(X))

Variance of X: 754.5184000000002
Standard deviation of X: 27.468498320803782


In [7]:
# Semivariance and semideviation
# Observations strictly below the mean, as in the formula
lows = X[X < mu]

semivar = np.sum((lows - mu)**2) / len(lows)

print('Semivariance of X:', semivar)
print('Semideviation of X:', np.sqrt(semivar))

Semivariance of X: 802.7832666666667
Semideviation of X: 28.33343019591286


In [17]:
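# Target semivariance and semideviation

B = 60
# Note: '<=' also counts observations equal to B; they add 0 to the sum but enlarge
# the denominator, so the result can differ from the strict '<' in the formula
lows_B = [e for e in X if e <= B]
semivar_B = sum(map(lambda x: (x - B)**2, lows_B)) / len(lows_B)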

print('Target semivariance of X:', semivar_B)
print('Target semideviation of X:', np.sqrt(semivar_B))

Target semivariance of X: 974.5357142857143
Target semideviation of X: 31.21755458529246


Using the skills acquired in the lecture series, find the following parameters of AT&T stock prices over one year:

- 30-day rolling variance
- 15-day rolling standard deviation

In [19]:
att = yf.download("T", start='2016-01-01', end='2017-01-01')['Open']



In [29]:
# 30-day rolling variance
rolling_var = att.rolling(window=30).var()

In [25]:
# Rolling Standard deviation
rolling_std = att.rolling(window=15).std()
print(rolling_std)

Date
2016-01-04         NaN
2016-01-05         NaN
2016-01-06         NaN
2016-01-07         NaN
2016-01-08         NaN
                ...   
2016-12-23    0.966514
2016-12-27    0.898908
2016-12-28    0.777096
2016-12-29    0.677884
2016-12-30    0.635416
Name: Open, Length: 252, dtype: float64
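Since the AT&T download needs network access, here is a self-contained sketch of the same rolling calculations on a synthetic, hypothetical price series (random-walk values, not real AT&T data). It shows that pandas leaves the first `window - 1` values as NaN, as in the output above, and that each rolling value is the sample variance (`ddof=1`, pandas' default) of the trailing window.

```python
import numpy as np
import pandas as pd

# Hypothetical price series on business days, standing in for the downloaded data
rng = np.random.default_rng(42)
idx = pd.bdate_range("2016-01-04", periods=60)
prices = pd.Series(30 + rng.normal(0, 0.5, size=60).cumsum(), index=idx, name="Open")

demo_var = prices.rolling(window=30).var()  # pandas uses ddof=1 by default
demo_std = prices.rolling(window=15).std()

# The first window - 1 values are NaN because the window is not yet full
print(demo_var.isna().sum(), demo_std.isna().sum())  # 29 14

# The last rolling value equals the sample variance of the last 30 prices
manual = prices.iloc[-30:].var(ddof=1)
print(np.isclose(demo_var.iloc[-1], manual))  # True
```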


The variance of a portfolio of two assets $S_1$ and $S_2$ is calculated as

$\text{VAR}_p = w_1^2\,\text{VAR}_{S_1} + w_2^2\,\text{VAR}_{S_2} + 2 w_1 w_2\,\text{COV}_{S_1, S_2}$

where $w_1$ and $w_2$ are the weights of $S_1$ and $S_2$.
Find values of $w_1$ and $w_2$ (with $w_1 + w_2 = 1$) that give a portfolio variance of 50.
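The two-asset formula is a special case of the matrix expression $w^\top \Sigma w$, where $\Sigma$ is the covariance matrix of the assets; the matrix form extends directly to more than two assets. A minimal sketch with hypothetical variances and covariance (not the AAPL/XLF values) confirms that both forms agree:

```python
import numpy as np

# Hypothetical inputs: variances of S1 and S2 and their covariance
var1, var2, cov12 = 4.0, 9.0, 1.5
w = np.array([0.6, 0.4])  # weights w1, w2, summing to 1

# Two-asset formula, term by term
formula = w[0]**2 * var1 + w[1]**2 * var2 + 2 * w[0] * w[1] * cov12

# Same quantity in matrix form: w^T Sigma w
sigma = np.array([[var1, cov12],
                  [cov12, var2]])
matrix_form = w @ sigma @ w

print(formula, matrix_form)  # both about 3.6
```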

In [39]:
asset1 = yf.download("AAPL", start='2016-01-01', end='2017-01-01')['Open']
asset2 = yf.download("XLF", start='2016-01-01', end='2017-01-01')['Open']

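# Covariance of the two price series; np.cov normalizes by n - 1 by default
cov = np.cov(asset1, asset2)[0, 1]

# Trial weights, constrained to sum to 1
w1 = 0.87
w2 = 1 - w1

# np.var normalizes by n by default, unlike np.cov above; for consistent
# estimates use np.var(..., ddof=1) or the diagonal of np.cov(asset1, asset2)
v1 = np.var(asset1)
v2 = np.var(asset2)

pvariance = (w1**2)*v1 + (w2**2)*v2 + (2*w1*w2)*cov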

print('Portfolio variance: ', pvariance)

Portfolio variance:  3.215851446415779
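The trial weight $w_1 = 0.87$ yields a portfolio variance of about 3.2, not the target of 50. Instead of guessing, one can substitute $w_2 = 1 - w_1$ into the formula, which turns it into a quadratic in $w_1$ that can be solved directly. The sketch below assumes, as the cell above does, that the weights sum to 1; `weights_for_target_variance` is a hypothetical helper, and the inputs in the usage lines are made-up numbers to be replaced by `v1`, `v2` and `cov` from the cell above.

```python
import numpy as np

def weights_for_target_variance(v1, v2, cov, target):
    """Return every (w1, w2) with w1 + w2 = 1 whose two-asset portfolio variance equals target.

    Substituting w2 = 1 - w1 into the portfolio-variance formula gives a quadratic in w1:
        (v1 + v2 - 2*cov) * w1**2 + 2*(cov - v2) * w1 + (v2 - target) = 0
    """
    a = v1 + v2 - 2 * cov  # variance of S1 - S2, never negative
    b = 2 * (cov - v2)
    c = v2 - target
    if np.isclose(a, 0.0):
        raise ValueError("a == 0: S1 - S2 has zero variance, so the equation is not quadratic")
    disc = b**2 - 4 * a * c
    if disc < 0:
        return []  # target is below the minimum variance reachable with w1 + w2 = 1
    roots = sorted({(-b + s * np.sqrt(disc)) / (2 * a) for s in (1, -1)})
    return [(w1, 1 - w1) for w1 in roots]

# Hypothetical inputs, not the AAPL/XLF estimates
pairs = weights_for_target_variance(v1=4.0, v2=9.0, cov=1.5, target=50.0)
for w_a, w_b in pairs:
    check = w_a**2 * 4.0 + w_b**2 * 9.0 + 2 * w_a * w_b * 1.5
    print(f"w1={w_a:.4f}, w2={w_b:.4f}, portfolio variance={check:.4f}")
```

With these hypothetical inputs both solutions fall outside $[0, 1]$, meaning one asset is sold short. That is expected whenever the target exceeds both individual variances: the portfolio variance is a convex function of $w_1$, so with long-only weights it never exceeds the larger of the two variances.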



