# Arithmetic mean

$$\mu = \frac{\sum_{i=1}^N X_i}{N}$$

In [2]:
# Two useful statistical libraries
import scipy.stats as stats
import numpy as np

# We'll use these two data sets as examples
x1 = [1, 2, 2, 3, 4, 5, 5, 7]
x2 = x1 + [100]

print('Mean of x1:', sum(x1), '/', len(x1), '=', np.mean(x1))
print('Mean of x2:', sum(x2), '/', len(x2), '=', np.mean(x2))

Mean of x1: 29 / 8 = 3.625
Mean of x2: 129 / 9 = 14.333333333333334


Weighted arithmetic mean

Use a weighted mean when each observation should count a specified number of times or in a specified proportion. For example, if 34% of my stocks are gold miners, the gold miners' return gets a weight of 0.34.
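As a minimal sketch of the weighted mean, assuming a hypothetical three-holding portfolio (the weights and returns below are made up for illustration), `np.average` accepts a `weights` argument:

```python
import numpy as np

# Hypothetical portfolio: fraction of capital in each holding (sums to 1)
weights = np.array([0.34, 0.46, 0.20])
# Hypothetical one-period returns for the same three holdings
asset_returns = np.array([0.10, -0.02, 0.05])

# Weighted mean: sum(w_i * x_i) / sum(w_i)
weighted_mean = np.average(asset_returns, weights=weights)
manual = np.sum(weights * asset_returns) / np.sum(weights)

print('Weighted mean return:', weighted_mean)
print('Unweighted mean return:', np.mean(asset_returns))
```

When all weights are equal, the weighted mean reduces to the ordinary arithmetic mean.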

# Median
The median is the number that appears in the middle of the list when it is sorted.

With an odd number $n$ of data points, this is simply the value in position $(n+1)/2$.

With an even number of data points, the list splits in half and there is no item in the middle, so we define the median as the average of the values in positions $n/2$ and $(n+2)/2$.

In [3]:
print('Median of x1:', np.median(x1))
print('Median of x2:', np.median(x2))

Median of x1: 3.5
Median of x2: 4.0
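The position rules for the median can be sketched directly. This is an illustrative helper (the name `median_by_position` is hypothetical, not part of NumPy), and it assumes positions are 1-indexed in the sorted list:

```python
import numpy as np

def median_by_position(values):
    """Median via the odd/even position rules (positions are 1-indexed)."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError('median of empty data is undefined')
    if n % 2 == 1:
        # Odd n: the single middle value at position (n + 1) / 2
        return s[(n + 1) // 2 - 1]
    # Even n: average the values at positions n / 2 and (n + 2) / 2
    return (s[n // 2 - 1] + s[(n + 2) // 2 - 1]) / 2

x1 = [1, 2, 2, 3, 4, 5, 5, 7]
x2 = x1 + [100]

print('Median of x1 by position:', median_by_position(x1))
print('Median of x2 by position:', median_by_position(x2))
```

Note that adding the outlier 100 barely moves the median, while it moved the mean from 3.625 to about 14.33.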


# Mode
The mode is the most frequently occurring value in a data set.

In [4]:
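# SciPy's built-in mode returns only one value, even when several values tie
# for most frequent. np.atleast_1d handles both older SciPy (array result)
# and newer SciPy (scalar result).
print('One mode of x1:', np.atleast_1d(stats.mode(x1)[0])[0])

# So we will write our own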
def mode(l):
    # Count the number of times each element appears in the list
    counts = {}
    for e in l:
        if e in counts:
            counts[e] += 1
        else:
            counts[e] = 1

    # Return the elements that appear the most times
    maxcount = 0
    modes = set()
    for (key, value) in counts.items():
        if value > maxcount:
            maxcount = value
            modes = {key}
        elif value == maxcount:
            modes.add(key)

    if maxcount > 1 or len(l) == 1:
        return list(modes)
    return 'No mode'

print('All of the modes of x1:', mode(x1))

One mode of x1: 2
All of the modes of x1: [2, 5]


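For continuous data such as stock returns, exact values rarely repeat, so the mode of the raw values is not very informative. A more useful way of getting the mode for stocks is to bin the returns and report the most populated bin.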

In [5]:
import pandas as pd
import pandas_datareader as pdr

data_nvda = pdr.get_data_yahoo('NVDA')
returns = data_nvda["Open"].pct_change()[1:]
print('Mode of returns:', mode(returns))

hist, bins = np.histogram(returns, 20) # Break data up into 20 bins
maxfreq = max(hist)
# Find all of the bins that are hit with frequency maxfreq, then print the intervals corresponding to them
print('Mode of bins:', [(bins[i], bins[i+1]) for i, j in enumerate(hist) if j == maxfreq])

Mode of returns: [0.0]
Mode of bins: [(-0.009015424528865279, 0.007222532912099677)]


# Geometric Mean
The geometric mean uses multiplication instead of addition. It can also be written in terms of logarithms, and it is always less than or equal to the arithmetic mean (provided the observations are not negative).
$$ G = \sqrt[n]{X_1X_2\ldots X_n} $$

$$ \ln G = \frac{\sum_{i=1}^n \ln X_i}{n} $$


In [6]:
print('Geometric mean of x1:', stats.gmean(x1))
print('Geometric mean of x2:', stats.gmean(x2))

Geometric mean of x1: 3.0941040249774403
Geometric mean of x2: 4.552534587620071
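As a quick check that the root form and the logarithm form of the geometric mean agree, here is a small sketch using plain NumPy on `x1` (redefined so the block is self-contained):

```python
import numpy as np

x1 = [1, 2, 2, 3, 4, 5, 5, 7]
n = len(x1)

# Root form: n-th root of the product of the observations
g_root = np.prod(x1) ** (1 / n)

# Log form: exponentiate the arithmetic mean of the logs
g_log = np.exp(np.mean(np.log(x1)))

print('Geometric mean (root form):', g_root)
print('Geometric mean (log form): ', g_log)
print('Arithmetic mean:           ', np.mean(x1))
```

The log form is usually preferable for long series, since the raw product can overflow or underflow.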


The geometric mean as defined above cannot handle negative observations. For asset returns this is easy to work around, since returns are always at least $-1$: add 1 to each return before computing the geometric mean, then subtract 1 from the result.

$$ R_G = \sqrt[T]{(1 + R_1)\ldots (1 + R_T)} - 1$$


In [7]:
ratios = returns + np.ones(len(returns))
R_G = stats.gmean(ratios) - 1
print('Geometric mean of returns:', R_G)

Geometric mean of returns: 0.0009386380473392908


The geometric mean is defined so that if the rate of return over the whole time period were constant and equal to $R_G$, the final price of the security would be the same as in the case of returns $R_1, \ldots, R_T$.

In [8]:
T = len(returns)
# Use .iloc for positional access, since the Series is indexed by date
init_price = data_nvda["Open"].iloc[0]
final_price = data_nvda["Open"].iloc[T]
print('Initial price:', init_price)
print('Final price:', final_price)
print('Final price as computed with R_G:', init_price*(1 + R_G)**T)

Initial price: 41.3849983215332
Final price: 134.58999633789062
Final price as computed with R_G: 134.58999633787343


# Harmonic mean
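The harmonic mean is less commonly used than the other means.
$$ H = \frac{n}{\sum_{i=1}^n \frac{1}{X_i}} $$

It can be rewritten to look like an arithmetic mean: the reciprocal of the harmonic mean is the arithmetic mean of the reciprocals.
$$ \frac{1}{H} = \frac{\sum_{i=1}^n \frac{1}{X_i}}{n} $$

For positive observations, the harmonic mean is always at most the geometric mean, which is in turn at most the arithmetic mean.

The three means are equal only when all observations are equal.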


In [9]:
print('Harmonic mean of x1:', stats.hmean(x1))
print('Harmonic mean of x2:', stats.hmean(x2))

Harmonic mean of x1: 2.5590251332825593
Harmonic mean of x2: 2.869723656240511


The harmonic mean can be used when the data is naturally expressed in terms of ratios. Dollar cost averaging is an example: the investor puts the same dollar amount into a stock at regular intervals, so the higher the price of the stock, the fewer shares they buy. The average price they pay per share is the harmonic mean of the prices.
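The dollar cost averaging claim can be verified with a small sketch. The prices and the per-period budget below are hypothetical numbers chosen for illustration:

```python
import numpy as np

# Hypothetical purchase prices over four periods
prices = np.array([10.0, 20.0, 25.0, 40.0])
# Hypothetical fixed dollar amount invested each period
budget = 100.0

# Fewer shares are bought when the price is higher
shares = budget / prices

# Average cost per share = total dollars spent / total shares bought
avg_cost = budget * len(prices) / shares.sum()

# Harmonic mean of the prices, from the definition
harmonic = len(prices) / np.sum(1.0 / prices)

print('Average cost per share:', avg_cost)
print('Harmonic mean of prices:', harmonic)
print('Arithmetic mean of prices:', prices.mean())
```

The budget cancels out of the calculation, so the result holds for any fixed per-period amount.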

Means by nature hide a lot of information, as they collapse entire distributions into one number. As a result, 'point estimates', or metrics that use one number, can often disguise large problems in your data. You should be careful to ensure that you are not losing key information by summarizing your data, and you should rarely, if ever, use a mean without also referring to a measure of spread.
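To illustrate how a mean alone can mislead, here is a small sketch with two made-up data sets that share the same mean but differ greatly in spread (standard deviation is used here as one possible measure of spread):

```python
import numpy as np

# Two hypothetical data sets with identical means
a = np.array([9, 10, 10, 11])
b = np.array([-20, 0, 20, 40])

print('Means:              ', a.mean(), b.mean())
print('Standard deviations:', a.std(), b.std())
```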

The type of distribution also matters a lot when interpreting a mean.
