In [1]:
import numpy as np

# Statistics

## Mean

In [2]:
# vector mean
# define vector
v = np.array([1, 2, 3, 4, 5, 6])
print(v)

# calculate mean
result = np.mean(v)
print(result)

[1 2 3 4 5 6]
3.5


The mean function can calculate the column or row means of a matrix by specifying the
axis argument with the value 0 or 1, respectively. The example below defines a 2 × 6 matrix and
calculates both column and row means.

In [4]:
# matrix means
# define matrix
M = np.array([
    [1,2,3,4,5,6],
    [1,2,3,4,5,6]
])
print(M)

# column means
col_mean = np.mean(M, axis=0)
print(col_mean)

# row means
row_mean = np.mean(M, axis=1)
print(row_mean)


[[1 2 3 4 5 6]
 [1 2 3 4 5 6]]
[1. 2. 3. 4. 5. 6.]
[3.5 3.5]


## Variance

In probability, the variance of a random variable X is a measure of how much values in the
distribution vary on average with respect to the mean. The variance is denoted as the function
Var() applied to the variable.

### Var[X]

Variance is calculated as the average squared difference of each value in the distribution
from the expected value. Stated another way, it is the expected squared difference from the
expected value.

### Var[X] = E[(X - E[X])^2]

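Assuming the expected value of the variable (E[X]) has been calculated, the variance of the
random variable can be calculated as the sum of the squared differences of each value from
the expected value, each multiplied by the probability of that value. For a sample of
observations, passing ddof=1 to np.var divides by n - 1 instead of n, which gives the sample variance.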
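
The probability-weighted sum described above can be sketched directly. The fair six-sided die used here (values 1 to 6, each with probability 1/6) is an assumed example distribution, not something taken from the text.

```python
import numpy as np

# Outcomes of a fair six-sided die and their probabilities (assumed example).
values = np.array([1, 2, 3, 4, 5, 6])
probs = np.full(values.shape, 1 / 6)

# E[X]: probability-weighted sum of the values.
expected = np.sum(values * probs)

# Var[X]: probability-weighted sum of squared differences from E[X].
variance = np.sum(probs * (values - expected) ** 2)

print(expected)
print(variance)

# With equal probabilities this matches NumPy's default (ddof=0) variance.
print(np.isclose(variance, np.var(values)))
```

With equal probabilities this is the population variance, which divides by n. The sample variance (ddof=1) divides by n - 1 and is therefore slightly larger for the same data.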

In [7]:
# vector variance
# define vector
v = np.array([1, 2, 3, 4, 5, 6])
print(v)

# calculate variance
result = np.var(v, ddof=1)
print(result)

[1 2 3 4 5 6]
3.5


In [9]:
# The var function can calculate the row or column variances of a matrix by specifying the
# axis argument and the value 1 or 0 respectively, the same as the mean function above. The
# example below defines a 2 x 6 matrix and calculates both column and row sample variances.

# define matrix
M = np.array([
    [1,2,3,4,5,6],
    [1,2,3,4,5,6]
])
print(M)

# column variances
col_var = np.var(M, ddof=1, axis=0)
print(col_var)

# row variances
row_var = np.var(M, ddof=1, axis=1)
print(row_var)

[[1 2 3 4 5 6]
 [1 2 3 4 5 6]]
[0. 0. 0. 0. 0. 0.]
[3.5 3.5]


## Standard Deviation

The standard deviation is calculated as the square root of the variance and is denoted as
lowercase s. 
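
As a quick check of the square-root relationship, the sketch below compares np.std with the square root of np.var on the vector used earlier. Both calls use ddof=1 so that they refer to the same sample estimate.

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5, 6])

# Sample variance and sample standard deviation (both with ddof=1).
sample_var = np.var(v, ddof=1)
sample_std = np.std(v, ddof=1)

# The standard deviation is the square root of the variance.
print(np.isclose(sample_std, np.sqrt(sample_var)))
```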

In [12]:
# matrix standard deviation
# define matrix
M = np.array([
    [1,2,3,4,5,6],
    [1,2,3,4,5,6]
])
print(M)

# column standard deviations
col_std = np.std(M, ddof=1, axis=0)
print(col_std)

# row standard deviations
row_std = np.std(M, ddof=1, axis=1)
print(row_std)

[[1 2 3 4 5 6]
 [1 2 3 4 5 6]]
[0. 0. 0. 0. 0. 0.]
[1.87082869 1.87082869]


## Covariance and Correlation

In probability, covariance is a measure of the joint variability of two random variables. It
describes how the two variables change together. It is denoted as the function cov(X, Y), where
X and Y are the two random variables being considered.

### cov(X, Y)

The sign of the covariance can be interpreted as whether the two variables change in the same direction
(positive) or in opposite directions (negative). A covariance value of zero indicates that the two variables have no
linear relationship. It does not by itself imply that they are independent.
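
Zero covariance only rules out a linear relationship. In the small sketch below (the vectors are made up for illustration), y is fully determined by x, yet the covariance between them is zero because the relationship is symmetric around the mean of x.

```python
import numpy as np

# x is symmetric around zero; y depends on x through a non-linear function.
x = np.array([-2, -1, 0, 1, 2])
y = x ** 2

# Off-diagonal element of the 2 x 2 covariance matrix is cov(x, y).
cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)
```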

In [19]:
# The example below defines two vectors of equal length with one increasing and one decreasing.
# We would expect the covariance between these variables to be negative.

# define first vector
x = np.array([1,2,3,4,5,6,7,8,9])
print(x)

# define second vector
y = np.array([9,8,7,6,5,4,3,2,1])
print(y)

# calculate covariance matrix
Sigma = np.cov(x, y)

# We access just the covariance for the two variables as the [0, 1] element of the square covariance matrix returned.
print(Sigma[0, 1])



[1 2 3 4 5 6 7 8 9]
[9 8 7 6 5 4 3 2 1]
-7.5


The covariance can be normalized to a score between -1 and 1 to make the magnitude
interpretable by dividing it by the product of the standard deviations of X and Y. The result is called the
correlation of the variables, also called Pearson's correlation coefficient, named for the
developer of the method.

In [17]:
# define first vector
x = np.array([1,2,3,4,5,6,7,8,9])
print(x)

# define second vector
y = np.array([9,8,7,6,5,4,3,2,1])
print(y)

# calculate correlation
corr = np.corrcoef(x,y)[0,1]
print(corr)

[1 2 3 4 5 6 7 8 9]
[9 8 7 6 5 4 3 2 1]
-1.0
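
The normalization can also be carried out by hand. The sketch below divides the sample covariance by the product of the two sample standard deviations and compares the result with np.corrcoef. Using ddof=1 for the standard deviations is a choice made here to match the default normalization of np.cov.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1])

# Sample covariance (np.cov divides by n - 1 by default).
cov_xy = np.cov(x, y)[0, 1]

# Divide by the product of the sample standard deviations.
corr_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Compare with NumPy's built-in Pearson correlation.
print(np.isclose(corr_manual, np.corrcoef(x, y)[0, 1]))
```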


## Covariance Matrix

The covariance matrix is a square, symmetric matrix that describes the covariance between
two or more random variables. The diagonal of the covariance matrix holds the variances of each
of the random variables, which is why it is often called the variance-covariance matrix. A covariance
matrix is a generalization of the covariance of two variables and captures the way in which all
variables in the dataset may change together.

For a data matrix whose columns are variables, the covariance matrix contains the
covariance of every column with every other column, including itself.
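
One way to see the column-by-column structure is to build the matrix by hand. The sketch below centers each column and forms the products of all column pairs, dividing by n - 1 to match the default of np.cov. The variable names centered and Sigma_manual are illustrative only.

```python
import numpy as np

# Rows are observations, columns are variables.
X = np.array([
    [1, 5, 8],
    [3, 5, 11],
    [2, 4, 9],
    [3, 6, 10],
    [1, 5, 10]
])
n = X.shape[0]

# Subtract each column's mean from that column.
centered = X - X.mean(axis=0)

# Entry (i, j) is the sample covariance of column i with column j.
Sigma_manual = centered.T @ centered / (n - 1)

# np.cov treats rows as variables by default, hence the transpose.
print(np.allclose(Sigma_manual, np.cov(X.T)))
```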

The covariance matrix provides a useful tool for
separating the structured relationships in a matrix of random variables. It can be used to
decorrelate variables or applied as a transform to other variables. It is a key element of
the Principal Component Analysis data reduction method.

In [20]:
# define matrix of observations
X = np.array([
    [1, 5, 8],
    [3, 5, 11],
    [2, 4, 9],
    [3, 6, 10],
    [1, 5, 10]
])
print(X)
# calculate covariance matrix
Sigma = np.cov(X.T)
print(Sigma)

[[ 1  5  8]
 [ 3  5 11]
 [ 2  4  9]
 [ 3  6 10]
 [ 1  5 10]]
[[1.   0.25 0.75]
 [0.25 0.5  0.25]
 [0.75 0.25 1.3 ]]
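
As a rough illustration of how a covariance matrix can be used to decorrelate variables, the sketch below projects the centered data onto the eigenvectors of its covariance matrix. This is the core step behind Principal Component Analysis, but it is a minimal sketch and not a full PCA implementation. The names centered, Z and Sigma_Z are illustrative.

```python
import numpy as np

X = np.array([
    [1, 5, 8],
    [3, 5, 11],
    [2, 4, 9],
    [3, 6, 10],
    [1, 5, 10]
])

# Center each column, then compute the covariance matrix of the columns.
centered = X - X.mean(axis=0)
Sigma = np.cov(centered.T)

# eigh is intended for symmetric matrices such as a covariance matrix.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Project the centered data onto the eigenvectors.
Z = centered @ eigvecs

# The covariance matrix of the projected data is (numerically) diagonal.
Sigma_Z = np.cov(Z.T)
off_diag = Sigma_Z - np.diag(np.diag(Sigma_Z))
print(np.allclose(off_diag, 0.0))
```

The diagonal of Sigma_Z holds the eigenvalues, which are the variances of the new, uncorrelated variables.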
