
# Statistical Implementation Prototypes
by Dave Amiana 

Description:
This notebook contains naive, functional implementations of statistical functions. It is intended to be turned into a statistical package later.

----
Functions:
- mean 
    - arithmetic mean
    - geometric mean
    - harmonic mean

----
#### Notes:

The arithmetic mean of a population, or population mean, is often denoted $μ$. The sample mean ${\bar {x}}$ (the arithmetic mean of a sample of values drawn from the population) is a good estimator of the population mean because its expected value equals the population mean (that is, it is an unbiased estimator). The sample mean is a random variable, not a constant: its computed value depends on which members of the population are sampled, so it has its own distribution.
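
The point that the sample mean has its own distribution can be illustrated with a small simulation. This is only a sketch: the normal population, its parameters, the sample size of 30, and the seed are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic population (assumed normal with mean 10 and standard deviation 2).
population = rng.normal(loc=10.0, scale=2.0, size=10_000)

# Draw many samples of size 30 and record the mean of each one.
sample_means = np.array([
    rng.choice(population, size=30, replace=False).mean()
    for _ in range(1_000)
])

print("population mean:       ", population.mean())
print("mean of sample means:  ", sample_means.mean())
print("spread of sample means:", sample_means.std())
```

The sample means scatter around the population mean, and their average lands close to it, which is what unbiasedness suggests.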

In [None]:
import numpy as np
import scipy as sci

# Mean

Discussion:

When do we use mean: 

arithmetic mean: 
$${\displaystyle {\bar {x}}={\frac {1}{n}}\left(\sum _{i=1}^{n}{x_{i}}\right)={\frac {x_{1}+x_{2}+\cdots +x_{n}}{n}}}$$

where 
- $x_{1},x_{2},\ldots ,x_{n},$ pertains to the sample
- ${\bar {x}}$ is the arithmetic mean

In [None]:
# arithmetic mean
def arithmetic_mean(data):
    return np.sum(data)/len(data)

geometric mean:
$${\displaystyle {\bar {x}}=\left(\prod _{i=1}^{n}{x_{i}}\right)^{\frac {1}{n}}=\left(x_{1}x_{2}\cdots x_{n}\right)^{\frac {1}{n}}}$$

where
- $x_{1},x_{2},\ldots ,x_{n},$ pertains to the sample
- ${\bar {x}}$ is the geometric mean

In [None]:
def geom_mean(data):
    return np.power(np.prod(data), 1/len(data))
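
Multiplying many values can overflow or underflow before the root is taken. A possible alternative, sketched here under the assumption that every value is strictly positive, averages the logarithms instead. The name `geom_mean_log` is hypothetical and not part of the notebook.

```python
import numpy as np

def geom_mean_log(data):
    """Geometric mean computed in log space (assumes strictly positive values)."""
    values = np.asarray(data, dtype=float)
    if np.any(values <= 0):
        raise ValueError("log-space geometric mean needs strictly positive values")
    # exp(mean(log x)) equals (prod x)^(1/n) but avoids forming the huge product
    return float(np.exp(np.mean(np.log(values))))

print(geom_mean_log([1, 2, 4]))        # approximately 2
print(geom_mean_log([1e200, 1e200]))   # the direct product would overflow to inf
```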

harmonic mean:
$${\displaystyle {\bar {x}}=n\left(\sum _{i=1}^{n}{\frac {1}{x_{i}}}\right)^{-1}}$$

where
- $x_i$ are the sample values
- $n$ is the sample size
- $\bar{x}$ is the harmonic mean

In [None]:
def harmonic_mean(data):
    values = np.asarray(data, dtype=float)
    # n divided by the sum of reciprocals; every value must be nonzero
    return len(values)/np.sum(1/values)
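
A common use of the harmonic mean is averaging rates over equal distances. The sketch below assumes a made-up trip driven at 60 km/h one way and 40 km/h back; `harmonic_mean_sketch` is a hypothetical helper written only for this example.

```python
import numpy as np

def harmonic_mean_sketch(data):
    """n divided by the sum of reciprocals (assumes nonzero values)."""
    values = np.asarray(data, dtype=float)
    return float(len(values) / np.sum(1.0 / values))

speeds = [60.0, 40.0]  # km/h over two legs of equal length (made-up numbers)

print(harmonic_mean_sketch(speeds))  # about 48, the true average speed
print(np.mean(speeds))               # 50.0, which overstates it
```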

Root Mean Square (quadratic mean)

$$RMS=\sqrt{\frac{1}{n}\sum_i x_i^2}$$
where

- $RMS$	=	root mean square
- ${n}$	=	number of measurements
- $x_i$	=	each value

In [None]:
# quadratic mean 
def quadratic_mean(data):
    # square each value, average the squares, then take the square root
    return np.sqrt(np.mean(np.square(np.asarray(data, dtype=float))))

Weighted arithmetic mean

$${\displaystyle {\bar {x}}={\frac {\sum _{i=1}^{n}{w_{i}x_{i}}}{\sum _{i=1}^{n}w_{i}}}}$$

where
- $x_i$ are the sample values
- $w_i$ is the weight of $x_i$
- $n$ is the sample size
- $\bar{x}$ is the weighted mean

In [None]:
# data and weights must have the same length
def weighted_mean(data, weights):
    if len(data) != len(weights):
        raise ValueError("data and weights should have the same size")
    return np.sum([data[i]*weights[i] for i in range(0, len(data))])/np.sum(weights)
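
As a quick check of the formula, the weighted mean can be computed by hand and compared with NumPy's `np.average`, which accepts a `weights` argument. The grades and credit weights below are made-up values.

```python
import numpy as np

grades = [80.0, 90.0, 70.0]   # made-up sample values
credits = [3.0, 4.0, 1.0]     # made-up weights

# Sum of w_i * x_i divided by the sum of w_i, as in the formula above.
manual = sum(g * c for g, c in zip(grades, credits)) / sum(credits)
library = np.average(grades, weights=credits)

print(manual)   # 83.75
print(library)
```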

Interquartile Mean

description:
 the arithmetic mean after removing the lowest and the highest quarter of values.
 
$${\displaystyle {\bar {x}}={\frac {2}{n}}\;\sum _{i={\frac {n}{4}}+1}^{{\frac {3}{4}}n}\!\!x_{i}}$$

where
- $x_i$ are the sample values, sorted in ascending order
- $n$ is the sample size (assumed divisible by 4)
- $\bar{x}$ is the interquartile mean

In [None]:
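# TODO: test for the case of odd and even n; this version assumes len(data) is divisible by 4
def interquartile_mean(data):
    ordered = np.sort(data)      # quartiles are defined on the sorted values
    n = len(ordered)
    lower_limit = n // 4         # 0-based index of x_{n/4 + 1}
    upper_limit = (3 * n) // 4   # exclusive bound, so the slice ends at x_{3n/4}
    return (2 / n) * np.sum(ordered[lower_limit:upper_limit])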
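
A worked example may help: with 12 values, the interquartile mean drops the 3 lowest and 3 highest and averages the remaining 6. The sketch below assumes the sample size is divisible by 4. The helper name `iqm_sketch` and the data are illustrative.

```python
import numpy as np

def iqm_sketch(data):
    """Mean of the middle half of the sorted data (assumes n divisible by 4)."""
    ordered = np.sort(np.asarray(data, dtype=float))
    n = len(ordered)
    if n % 4 != 0:
        raise ValueError("this sketch only handles n divisible by 4")
    return float(np.mean(ordered[n // 4 : 3 * n // 4]))

data = [5, 8, 4, 38, 8, 6, 9, 7, 7, 3, 1, 6]

print(iqm_sketch(data))  # 6.5, the mean of 5, 6, 6, 7, 7, 8
print(np.mean(data))     # 8.5, pulled upward by the outlier 38
```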

### Generalized means

Power mean

description: The generalized mean, also known as the power mean or Hölder mean, is an abstraction of the quadratic, arithmetic, geometric and harmonic means.


$${\displaystyle {\bar {x}}(m)=\left({\frac {1}{n}}\sum _{i=1}^{n}x_{i}^{m}\right)^{\frac {1}{m}}}$$

By choosing different values for the parameter m, the following types of means are obtained:

- $m\rightarrow \infty$ maximum of $x_{i}$
- $m=2$	quadratic mean
- $m=1$	arithmetic mean
- $m\rightarrow 0$	geometric mean
- $m=-1$ harmonic mean
- $m\rightarrow -\infty$ minimum of $x_{i}$



In [None]:
def power_mean(data, m):
    # average the m-th powers, then take the m-th root; m must be nonzero
    return np.power(np.mean(np.power(np.asarray(data, dtype=float), m)), 1/m)
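
The special cases listed above can be checked numerically. This is a sketch with a small made-up data set; `power_mean_sketch` is a hypothetical stand-alone helper, and the limits $m \rightarrow 0$ and $m \rightarrow \pm\infty$ are only approximated with a very small and two large exponents.

```python
import numpy as np

def power_mean_sketch(data, m):
    """Generalized mean for nonzero m (assumes positive values)."""
    values = np.asarray(data, dtype=float)
    return float(np.mean(values ** m) ** (1.0 / m))

data = [1.0, 2.0, 4.0]
arr = np.asarray(data)

print(power_mean_sketch(data, 1), np.mean(arr))                     # arithmetic mean
print(power_mean_sketch(data, 2), np.sqrt(np.mean(arr ** 2)))       # quadratic mean
print(power_mean_sketch(data, -1), len(arr) / np.sum(1.0 / arr))    # harmonic mean
print(power_mean_sketch(data, 1e-6))    # close to the geometric mean, 2
print(power_mean_sketch(data, 200.0))   # approaches max(data) = 4
print(power_mean_sketch(data, -200.0))  # approaches min(data) = 1
```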

$f$- mean

description: further generalization of the power mean. 

$${\displaystyle {\bar {x}}=f^{-1}\left({{\frac {1}{n}}\sum _{i=1}^{n}{f\left(x_{i}\right)}}\right)}$$

In [None]:
# import symbolic library for inverse function
import sympy as sp
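
One way to prototype the $f$-mean without symbolic inversion is to have the caller supply both $f$ and its inverse. This is a sketch under that assumption, not the sympy-based approach hinted at above. The name `f_mean` and its signature are hypothetical.

```python
import numpy as np

def f_mean(data, f, f_inv):
    """Apply f, average, then map back with the caller-supplied inverse f_inv."""
    values = np.asarray(data, dtype=float)
    return float(f_inv(np.mean(f(values))))

data = [1.0, 2.0, 4.0]

# f = log recovers the geometric mean; f = 1/x recovers the harmonic mean.
geometric = f_mean(data, np.log, np.exp)
harmonic = f_mean(data, lambda x: 1.0 / x, lambda y: 1.0 / y)

print(geometric)  # approximately 2
print(harmonic)   # approximately 12/7
```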

Mean value of a function (mean value theorem for integrals)

$${\displaystyle y_{\text{ave}}(a,b)={\frac {1}{b-a}}\int \limits _{a}^{b}\!f(x)\,dx}$$

In [None]:
# import scipy library for numerical evaluation of integral
import scipy.integrate as integrate

# mean value of f over [a, b]; assumes a != b
def mean_value_thm(f,a,b):
    scale = 1/(b-a)
    # quad returns (integral estimate, absolute error estimate)
    return scale*integrate.quad(f,a,b)[0]
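
The continuous mean ties back to the arithmetic mean: averaging $f$ at many evenly spaced midpoints approximates $\frac{1}{b-a}\int_a^b f(x)\,dx$. The sketch below uses this midpoint rule with NumPy only, rather than `quad`. The helper name and the number of points are assumptions.

```python
import numpy as np

def average_value_midpoint(f, a, b, num_points=10_000):
    """Approximate the mean of f on [a, b] by averaging f at interval midpoints."""
    edges = np.linspace(a, b, num_points + 1)
    midpoints = (edges[:-1] + edges[1:]) / 2.0
    return float(np.mean(f(midpoints)))

# The mean of sin on [0, pi] is 2/pi, since the integral equals 2.
approx = average_value_midpoint(np.sin, 0.0, np.pi)

print(approx)
print(2.0 / np.pi)
```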

# Median

Discussion:
The basic advantage of the median in describing data compared to the mean (often simply described as the "average") is that it is not skewed so much by a small proportion of extremely large or small values, and so it may give a better idea of a "typical" value. Because of this, the median is of central importance in [robust statistics](https://en.wikipedia.org/wiki/Resistant_statistic), as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median will not give an arbitrarily large or small result.
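
The resistance of the median can be seen by corrupting a single value. In this small sketch the readings are made up, and NumPy's `np.mean` and `np.median` stand in for the notebook's own functions.

```python
import numpy as np

clean = np.array([9.0, 10.0, 10.0, 11.0, 12.0])  # made-up readings

contaminated = clean.copy()
contaminated[-1] = 1_000.0  # one corrupted reading

print(np.mean(clean), np.median(clean))                # 10.4 10.0
print(np.mean(contaminated), np.median(contaminated))  # 208.0 10.0
```

One extreme value moves the mean from 10.4 to 208, while the median stays at 10.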

Basic median for finite samples:

Case 1: $n \text{ is odd}$
$${\textstyle \mathrm {median} =x_{\left({\frac {n+1}{2}}\right)}}$$

Case 2: $n \text{ is even}$
$${\displaystyle \mathrm {median} ={\frac {x_{\left({\frac {n}{2}}\right)}+x_{\left({\frac {n}{2}}+1\right)}}{2}}}$$

In [None]:
def median_basic(data):
    ordered = sorted(data)  # sort a copy so the caller's data is left untouched
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # even n: average the two middle values x_(n/2) and x_(n/2 + 1)
        return (ordered[mid - 1] + ordered[mid]) / 2
    # odd n: the single middle value x_((n+1)/2), at 0-based index n // 2
    return ordered[mid]

# Mode

In [None]:
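def mode(data):
    values = list(data)  # list() so that .count also works for NumPy arrays
    data_set = list(set(values))
    # (count, value) pairs: max() picks the highest count and, on a tie, the largest value
    frequency = [(values.count(i), i) for i in data_set]
    # returns the pair (count, value), not just the value
    return max(frequency)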

In [None]:
# test for multimodality 
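
One possible way to handle multimodal data is to return every value that reaches the highest count. The sketch below uses `collections.Counter`. The function name `modes` is hypothetical, and the standard library's `statistics.multimode` offers similar behavior.

```python
from collections import Counter

def modes(data):
    """Return all values tied for the highest frequency, in sorted order."""
    counts = Counter(data)
    if not counts:
        return []  # no data, so no mode
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(modes([1, 2, 2, 3, 3]))  # [2, 3], a bimodal sample
print(modes([4, 4, 1]))        # [4]
```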