# Numerical Descriptive Measures

#### Loading required libraries

In [63]:
import numpy as np

#### Defining a list of elements

In [64]:
vec = [20,34,56,23,45,22,60,23,56,78,23,45]

#### Mean
The arithmetic mean (typically referred to as the mean) is the most common measure of central tendency.

$\large \bar X = \frac{\text{Sum of the values}}{\text{Number of values}} = \frac{\sum_{i=1}^{n} x_i}{n}$

In [65]:
print("Average is : " + str(round(np.mean(vec),2)))

Average is : 40.42


#### Median
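The median is the middle value in an ordered array of data that has been ranked from smallest to largest or largest to smallest. When the number of values is even, as it is for `vec`, the median is the average of the two middle values.

$\large \text{Median} = \large \frac{(n+1)}{2} \text{th ranked value}$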

In [66]:
print("Median is : "+str(round(np.median(vec),2)))

Median is : 39.5
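
The ranking rule above can be sketched without NumPy. This is only an illustration: `median_sketch` is a hypothetical helper name, not part of the notebook, and it assumes a non-empty list of numbers.

```python
# Same data as the notebook's `vec`
vec = [20, 34, 56, 23, 45, 22, 60, 23, 56, 78, 23, 45]

def median_sketch(values):
    """Hypothetical helper: median by sorting and picking the middle rank."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_sketch(vec))  # 39.5
```

With 12 values, the two middle ranked values are 34 and 45, so the result agrees with `np.median`.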


* For symmetrically distributed data, the mean, median, and mode are almost equal in value.
* For moderately asymmetrical (skewed) data, the following relationship holds approximately:
    * Mode = 3 * Median - 2 * Mean (or)
    * Mean - Mode = 3 * (Mean - Median)
* This is called the empirical relation. If any two of the measures are known, it gives an estimate of the third.
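
As a rough check, the relation can be applied to `vec` and compared with the mode counted directly. This is a sketch, not a validation of the relation: `statistics.mode` from the standard library is used here as an assumption, since the notebook itself does not compute a mode.

```python
import statistics
import numpy as np

# Same data as the notebook's `vec`
vec = [20, 34, 56, 23, 45, 22, 60, 23, 56, 78, 23, 45]

mean_ = float(np.mean(vec))
median_ = float(np.median(vec))

# Empirical relation: Mode = 3 * Median - 2 * Mean
mode_estimate = 3 * median_ - 2 * mean_

# Mode counted directly from the data (23 occurs three times)
mode_actual = statistics.mode(vec)

print(round(mode_estimate, 2))  # 37.67
print(mode_actual)              # 23
```

For this small sample the estimate is far from the counted mode, which illustrates that the relation is only an approximation and works best for large, unimodal, moderately skewed data.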

#### Harmonic Mean
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the data values.

$\Large\frac {\Large n}{\Large \sum_{\Large i=1}^{\Large n}\Large (\frac{\Large 1}{\Large x_i})}$

In [67]:
def hm_(lis):
    # Sum the reciprocals of the values, then divide the count by that sum
    sum_ = 0
    for x in lis:
        sum_ = sum_ + (1/x)
    return len(lis)/sum_

In [68]:
print("Harmonic mean is : " + str(round(hm_(vec),2)))

Harmonic mean is : 32.88
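
The hand-written loop can be cross-checked in two ways. The sketch below assumes all values are positive; `hm_vectorized` is a hypothetical name, and `statistics.harmonic_mean` comes from the Python standard library rather than from the notebook.

```python
import statistics
import numpy as np

# Same data as the notebook's `vec`
vec = [20, 34, 56, 23, 45, 22, 60, 23, 56, 78, 23, 45]

def hm_vectorized(values):
    """Hypothetical NumPy variant: n divided by the sum of reciprocals."""
    arr = np.asarray(values, dtype=float)
    return float(len(arr) / np.sum(1.0 / arr))

print(round(hm_vectorized(vec), 2))               # 32.88
print(round(statistics.harmonic_mean(vec), 2))    # 32.88
```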


#### Geometric Mean
When you want to measure the rate of change of a variable over time, use the geometric mean instead of the arithmetic mean.

$\large \bar X_g = (X_1 \times X_2 \times X_3 \times \cdots \times X_n )^{\frac{1}{n}}$


In [69]:
def gm_(lis):
    return(np.prod(lis)**(1/len(lis)))

In [70]:
print('Geometric mean is : '+ str(gm_(vec)))

Geometric mean is : 36.399618874358126
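
Multiplying many values can overflow an integer type or lose precision for long lists. One common alternative, sketched here under the assumption that every value is strictly positive, is to average the logarithms and exponentiate. `gm_log` is a hypothetical name, not part of the notebook.

```python
import numpy as np

# Same data as the notebook's `vec`
vec = [20, 34, 56, 23, 45, 22, 60, 23, 56, 78, 23, 45]

def gm_log(values):
    """Hypothetical variant of gm_: exp of the mean of the logs."""
    arr = np.asarray(values, dtype=float)
    # log(x1 * ... * xn) / n equals the mean of log(xi)
    return float(np.exp(np.mean(np.log(arr))))

print(round(gm_log(vec), 4))  # 36.3996
```

The result agrees with the product-based version to within floating-point rounding.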


* The arithmetic mean is appropriate if the values have the same units
* The geometric mean is appropriate if the values have differing units
* The harmonic mean is appropriate if the data values are ratios of two variables with different measures, called rates
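
Two small cases may make the guidance above concrete. The speeds and growth factors are invented for illustration, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Rates: the same 60 km distance driven at 30 km/h and then at 60 km/h
speeds = [30, 60]
distance = 60
total_time = sum(distance / s for s in speeds)          # 2 h + 1 h
true_average_speed = (distance * len(speeds)) / total_time
harmonic = statistics.harmonic_mean(speeds)

print(round(true_average_speed, 2))  # 40.0
print(round(harmonic, 2))            # 40.0 (the arithmetic mean would give 45)

# Change over time: growth of +10% followed by +50%
factors = [1.10, 1.50]
geometric = statistics.geometric_mean(factors)

# Applying the geometric mean twice reproduces the compounded growth of 1.65
print(round(geometric ** 2, 2))      # 1.65
```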

### Variation and Shape

#### Range
Range is the difference between the largest and smallest elements. It is the simplest descriptive measure of variation.\
$\sf \normalsize Range = (\large x_{max}-\large x_{min})$

In [71]:
def range_(lis):
    return (np.max(lis) - np.min(lis))

In [72]:
print("Range is : "+str(range_(vec)))

Range is : 58
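
NumPy also offers `np.ptp` ("peak to peak"), which returns the same quantity. The sketch below additionally appends one invented extreme value (500) to show how strongly the range depends on the two end points alone.

```python
import numpy as np

# Same data as the notebook's `vec`
vec = [20, 34, 56, 23, 45, 22, 60, 23, 56, 78, 23, 45]

# np.ptp computes max - min in one call
print(np.ptp(vec))           # 58

# A single hypothetical outlier changes the range completely
with_outlier = vec + [500]
print(np.ptp(with_outlier))  # 480
```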


#### Variance
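Variance measures the average squared scatter around the mean. The sample variance divides by $n-1$, while the population variance divides by $n$. The function `variance_` defined in this section divides by $n$, so the value it prints is the population variance.\
$\large S^2 (\text{sample variance}) \small =\large \frac{\sum_{i=1}^{n}{(x_i-\bar X)^2}}{n-1}$\
$\large \sigma^2 (\text{population variance}) \small =\large \frac{\sum_{i=1}^{n}{(x_i-\bar X)^2}}{n}$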

In [73]:
def variance_(lis):
    mean_ = np.mean(lis)
    numerator = 0
    for i in range(len(lis)):
        numerator += (lis[i] - mean_)**2
    # Dividing by n gives the population variance; divide by len(lis) - 1 for the sample variance
    return numerator/len(lis)

In [74]:
print("Variance is : "+ str(round(variance_(vec),2)))

Variance is : 335.91
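
The printed value uses a divisor of $n$. NumPy exposes the choice of divisor through the `ddof` parameter of `np.var`, so both versions can be compared side by side. This is a minimal sketch on the same data.

```python
import numpy as np

# Same data as the notebook's `vec`
vec = [20, 34, 56, 23, 45, 22, 60, 23, 56, 78, 23, 45]

# ddof=0 (the default) divides by n: population variance
population_var = float(np.var(vec))

# ddof=1 divides by n - 1: sample variance, as in the S^2 formula
sample_var = float(np.var(vec, ddof=1))

print(round(population_var, 2))  # 335.91
print(round(sample_var, 2))      # 366.45
```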


#### Standard deviation
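It is the square root of the variance. Because `variance_` divides by $n$, the value computed in this section is the population standard deviation; the sample standard deviation uses $n-1$.\
$\large S (\text{sample standard deviation}) \small =\large \sqrt{\frac{\sum_{i=1}^{n}{(x_i-\bar X)^2}}{n-1}}$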

In [75]:
print("Standard deviation is : "+str(round(np.sqrt(variance_(vec)),2)))

Standard deviation is : 18.33


In [76]:
# Validating against np.std, which divides by n by default (ddof=0)
round(np.std(vec),2)

18.33
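
The match with `np.std` holds because both computations divide by $n$. For comparison, the sample standard deviation from the formula above can be obtained with `ddof=1` or with `statistics.stdev` from the standard library. A short sketch:

```python
import statistics
import numpy as np

# Same data as the notebook's `vec`
vec = [20, 34, 56, 23, 45, 22, 60, 23, 56, 78, 23, 45]

# Population standard deviation (divides by n)
print(round(float(np.std(vec)), 2))          # 18.33
print(round(statistics.pstdev(vec), 2))      # 18.33

# Sample standard deviation (divides by n - 1)
sample_std = float(np.std(vec, ddof=1))
print(round(sample_std, 2))                  # 19.14
print(round(statistics.stdev(vec), 2))       # 19.14
```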

#### Coefficient of Variation
It measures the scatter in the data relative to the mean, expressed as a percentage.\
$\large \text{CV} = \left(\frac {\sigma}{\mu}\right) \times 100$

In [77]:
def CoeffVar_(lis):
    return (np.sqrt(variance_(lis))/np.mean(lis))*100

In [80]:
print("Coefficient of Variation is : " + str(round(CoeffVar_(vec),2)))

Coefficient of Variation is : 45.35
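
Because the coefficient of variation is a ratio, it does not depend on the unit of measurement. The sketch below rescales the data by an arbitrary factor of 100 (for example, metres to centimetres) and shows that the result is unchanged. `cv_percent` is a hypothetical name, and it uses the population standard deviation to match the notebook's function.

```python
import numpy as np

# Same data as the notebook's `vec`
vec = [20, 34, 56, 23, 45, 22, 60, 23, 56, 78, 23, 45]

def cv_percent(values):
    """Hypothetical helper: population std divided by mean, times 100."""
    arr = np.asarray(values, dtype=float)
    return float(np.std(arr) / np.mean(arr) * 100)

# Rescaling multiplies both the std and the mean by 100, so the ratio is the same
scaled = [x * 100 for x in vec]

print(round(cv_percent(vec), 2))     # 45.35
print(round(cv_percent(scaled), 2))  # 45.35
```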
