In [1]:
# Two useful statistical libraries
import scipy.stats as stats
import numpy as np

# Arithmetic Mean



* Synonymous with the average
* Gives a quick summary of the typical value in a data set
* It is defined as follows:
$$ \mu = \frac{\sum_{i=1}^{N} X_i}{N} $$
where $X_1, \dots, X_N$ are the data points.


In [2]:
# We'll use the following two data sets as examples
x1 = list(range(10))
x2 = x1 + [100]


print("A. mean of x1 is: ", sum(x1), '/', len(x1), '=', np.mean(x1))
print("A. mean of x2 is: ", sum(x2), '/', len(x2), '=', np.mean(x2))





A. mean of x1 is:  45 / 10 = 4.5
A. mean of x2 is:  145 / 11 = 13.181818181818182


# Weighted Mean

* Similar to the arithmetic mean, except that each data point $X_i$ has a multiplier associated with it, usually denoted $w_i$, and we no longer divide the cumulative sum by the number of data points.
* The formula is: $$ \mu = \sum_{i=1}^{N} w_i \cdot X_i $$

In [3]:
# In the context of financial math, suppose we have some portfolio.
# Each X_i represents a different asset class, e.g. equities, bonds, real estate, etc.
# We could then, for example, give the equities in our portfolio a 70% weight when computing the weighted mean.

# Weighted Arithmetic Mean

* The weighted arithmetic mean is the special case of the weighted mean in which the weights sum to one, $\sum_{i=1}^{N} w_i = 1$. Choosing $w_i = 1/N$ for every $i$ recovers the ordinary arithmetic mean.
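
As a rough illustration of the portfolio setting, the sketch below computes a weighted mean with weights that sum to one. The asset-class returns and weights are made-up numbers chosen only for the example, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical portfolio: these returns and weights are made-up numbers
asset_returns = np.array([0.08, 0.03, 0.05])   # equities, bonds, real estate
weights = np.array([0.70, 0.20, 0.10])         # the weights sum to 1

# Weighted mean: sum of w_i * X_i
weighted_mean = np.sum(weights * asset_returns)

# np.average divides by the sum of the weights, so it agrees when they sum to 1
numpy_weighted_mean = np.average(asset_returns, weights=weights)

# Equal weights 1/N recover the ordinary arithmetic mean
equal_weights = np.full(len(asset_returns), 1 / len(asset_returns))
equal_weighted_mean = np.sum(equal_weights * asset_returns)

print("Weighted mean:", weighted_mean)
print("np.average with weights:", numpy_weighted_mean)
print("Equal-weighted mean:", equal_weighted_mean, "vs np.mean:", np.mean(asset_returns))
```

Because `np.average` normalizes by the sum of the weights, it can also be used when the weights are not already normalized.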

# Median



*   The median is the data point that falls exactly in the middle of the data when it is sorted in ascending or descending order.
*   When $N$ is odd, we choose the $(N+1)/2$-th term.
*   When $N$ is even, there is no exact middle term, so we take the average of the $N/2$-th and $(N+2)/2$-th terms.
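
The two rank rules can be written out directly. The helper below, `median_by_rank`, is a hypothetical name used only for this sketch; it translates the 1-based term positions into 0-based Python indices and is compared against `np.median`.

```python
import numpy as np

def median_by_rank(values):
    """Median via the odd/even rank rules (hypothetical helper for illustration)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd N: the (N+1)/2-th term, which is index (N+1)//2 - 1 when counting from 0
        return ordered[(n + 1) // 2 - 1]
    # Even N: average of the N/2-th and (N+2)/2-th terms
    return (ordered[n // 2 - 1] + ordered[(n + 2) // 2 - 1]) / 2

x1 = list(range(10))   # N = 10, even
x2 = x1 + [100]        # N = 11, odd

print("x1:", median_by_rank(x1), "np.median:", np.median(x1))
print("x2:", median_by_rank(x2), "np.median:", np.median(x2))
```

Note that adding the outlier 100 moves the median only from 4.5 to 5, whereas it moved the arithmetic mean from 4.5 to about 13.18.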




In [4]:
print('Median of x1:', np.median(x1))
print('Median of x2:', np.median(x2))

Median of x1: 4.5
Median of x2: 5.0


# Mode

We can understand the *mode* as the value that occurs most often among the data points, which makes it applicable to non-numeric data as well. The *mode* becomes especially relevant when the data points are assumed to be independent of one another. For example, suppose we examine the outcomes of a weighted die over several trials and observe that 6 is the mode. That tells us which outcome the die is weighted to favor, whereas the mean, which might be 4.5, does not.
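
A small sketch of the die example, using a hand-written list of rolls rather than a real experiment (the data are made up for illustration), and `collections.Counter` from the standard library to count occurrences:

```python
from collections import Counter

# Made-up outcomes of 12 rolls of a die weighted toward 6 (illustrative data only)
rolls = [6, 3, 6, 1, 6, 5, 6, 2, 6, 4, 6, 6]

# most_common(1) returns a list holding the (value, count) pair with the highest count
die_mode, die_mode_count = Counter(rolls).most_common(1)[0]
die_mean = sum(rolls) / len(rolls)

print("Mode:", die_mode, "occurring", die_mode_count, "times")
print("Mean:", die_mean)

# The mode is also defined for non-numeric data, where a mean is not
colors = ["red", "blue", "red", "green", "red"]
color_mode = Counter(colors).most_common(1)[0][0]
print("Most common color:", color_mode)
```

The mean of these rolls is not itself a possible outcome of the die, while the mode points directly at the favored face.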

In [5]:
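# SciPy has a built-in mode function, but it returns exactly one value
# even if two values occur the same number of times, or if no value appears more than once.
# Depending on the SciPy version, .mode is a scalar or a length-1 array, so normalize it first
print('Mode of x1:', np.atleast_1d(stats.mode(x1).mode)[0])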

Mode of x1: 0




In [6]:
# One can also write one's own, albeit rather naive, implementation
def mode(values):
    # Count how many times each value occurs
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1

    # Keep the first value that reaches the highest count
    best_count = 0
    best_value = None
    for value, count in counts.items():
        if count > best_count:
            best_count = count
            best_value = value

    return best_value

x3 = [3, 2, 3, 3, 4, 5, 5]
print(mode(x3))

3

# Bins in Financial Data

If every value in a data set occurs at most once, as is typical for asset returns, the mode is not informative and a histogram of the raw values is not useful either. Instead, we can split the range of the data into $n$ intervals of equal width, called bins, and then find the *mode* among the bins, i.e. the bin that contains the most data points.

In [7]:
# Get return data for an asset and compute the mode of the data set
start = '2020-01-01'
end = '2020-12-31'

In [8]:
import yfinance as yf

data = yf.download('AAPL', start=start, end=end)['Close']

[*********************100%***********************]  1 of 1 completed


In [9]:
returns = data.pct_change()[1:]

In [10]:
print('Mode of returns:', mode(returns))

Mode of returns: -0.00972203551987938


In [11]:
# np.histogram returns the frequency distribution over the bins as well as the endpoints of the bins
hist, bins = np.histogram(returns, 20) # Break data up into 20 bins
maxfreq = max(hist)

# Find all of the bins that are hit with frequency maxfreq, then print the intervals corresponding to them
print('Mode of bins:', [(bins[i], bins[i+1]) for i, j in enumerate(hist) if j == maxfreq])

Mode of bins: [(-0.0044193447092110705, 0.008003416416441617)]
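
The same binning idea can be tried without downloading any prices. The sketch below uses a made-up sample in which no value repeats, and fixes the bin range explicitly so that the bin edges are round numbers; both choices are assumptions made only for this illustration.

```python
import numpy as np

# Made-up sample in which no value repeats (illustrative data only)
sample = [0.1, 0.2, 0.25, 0.3, 0.9, 1.5, 1.6, 1.7, 1.8, 2.9]

# Three equal-width bins over [0, 3]: the edges are 0, 1, 2, 3
hist, edges = np.histogram(sample, bins=3, range=(0.0, 3.0))
max_count = hist.max()

# Collect every bin that attains the highest count (there may be ties)
modal_bins = [
    (float(edges[i]), float(edges[i + 1]))
    for i, count in enumerate(hist)
    if count == max_count
]

print("Counts per bin:", hist.tolist())
print("Modal bin(s):", modal_bins)
```

The result depends on the number of bins and on where the edges fall, so the modal bin is best read as a rough summary rather than a precise statistic.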


# Geometric Mean

Unlike the arithmetic mean, the geometric mean multiplies the data points together and takes the $n$-th root of the product:
$$
\text{Geometric Mean} = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}
$$
where each $X_i \geq 0$ is a non-negative data point.

Furthermore, for non-negative data it is always the case that $$ \text{Geom. Mean} \leq \text{Arith. Mean} $$ with equality exactly when all the $X_i$ are equal.
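
A quick numerical check of the definition and of the inequality, on made-up strictly positive data. The sketch also computes the geometric mean as the exponential of the mean of the logarithms, an equivalent form that avoids very large intermediate products.

```python
import numpy as np

# Made-up strictly positive data (illustrative only)
x = np.array([1.0, 2.0, 4.0, 8.0])

# Definition: n-th root of the product
gm_product = np.prod(x) ** (1 / len(x))

# Equivalent form: exponential of the mean of the logs
gm_logs = np.exp(np.mean(np.log(x)))

am = np.mean(x)
print("Geometric mean:", gm_product, "(via logs:", gm_logs, ")")
print("Arithmetic mean:", am)

# When all values are equal, the two means coincide
same = np.full(4, 3.0)
gm_same = np.prod(same) ** (1 / len(same))
am_same = np.mean(same)
print("Equal data:", gm_same, am_same)
```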

In [12]:
# Use scipy's gmean function to compute the geometric mean
# (x1 and x2 both contain 0, so the product, and therefore the geometric mean, is 0)
print('Geometric mean of x1:', stats.gmean(x1))
print('Geometric mean of x2:', stats.gmean(x2))

Geometric mean of x1: 0.0
Geometric mean of x2: 0.0


What if we want to compute the geometric mean when some observations are negative? Suppose once again that our data points are asset returns. Since a return is a percent change, it can never be less than $-1$, so $1 + X_i \geq 0$ for every $i$. It therefore suffices to add $1$ to each $X_i$, take the geometric mean, and subtract $1$ again:
$$
X_{new} = \sqrt[n]{(1+X_1) \cdot (1+X_2) \cdots (1+X_n)} - 1
$$
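
A two-period example with made-up returns shows why this quantity is useful: a gain of 50% followed by a loss of 50% has an arithmetic mean return of zero, yet the investment ends below where it started. The variable names are hypothetical and used only in this sketch.

```python
import numpy as np

# Made-up returns: +50% followed by -50% (illustrative only)
r = np.array([0.5, -0.5])

arith_mean = np.mean(r)

# Add 1 to each return, take the n-th root of the product, subtract 1
geo_mean_return = np.prod(1 + r) ** (1 / len(r)) - 1

start_value = 100.0
# Value after actually experiencing the two returns in sequence
end_value = start_value * np.prod(1 + r)
# Value after earning the geometric mean return in every period
end_value_from_geo = start_value * (1 + geo_mean_return) ** len(r)

print("Arithmetic mean return:", arith_mean)
print("Geometric mean return:", geo_mean_return)
print("End value:", end_value, "vs. via geometric mean:", end_value_from_geo)
```

Compounding the geometric mean return reproduces the actual end value, which the arithmetic mean return does not.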

In [17]:
# Add 1 to every value in the returns array and then compute X_new
ratios = returns + 1
X_new = stats.gmean(ratios) - 1
print('Geometric mean of returns:', X_new)

Geometric mean of returns: 0.002301824223453375


The geometric mean is defined so that if the rate of return over the whole time period were constant and equal to $X_{new}$, then the final price of the security would be the same as in the case of returns $X_1,...,X_n$.

In [18]:
T = len(returns)
# Use positional indexing explicitly; data is indexed by date
init_price = data.iloc[0]
final_price = data.iloc[T]
print('Initial price:', init_price)
print('Final price:', final_price)
print('Final price as computed with X_new:', init_price*(1 + X_new)**T)

Initial price: 75.0875015258789
Final price: 133.72000122070312
Final price as computed with X_new: 133.72000122070096
