## Chapter 1: Exploratory Data Analysis: Estimates of Variability - Page 40
    Contains Functions:
        Variance
        Standard Deviation
        Mean Absolute Deviation (MAD)
        MAD from the median
        Range
        Percentile
        Interquartile Range

In [9]:
# import modules
import math
import random

In [10]:
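# define helper function: median of a list (sorts a copy first)
def __get_median(num_list):
    sorted_list = sorted(num_list)   # the median is only defined on ordered data
    num_range = len(sorted_list)
    mid = num_range // 2
    if (num_range % 2) == 0:
        # even count: average the two middle values
        median = (sorted_list[mid - 1] + sorted_list[mid]) / 2
    else:
        # odd count: take the single middle value
        median = sorted_list[mid]

    return median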
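
A quick way to sanity-check a hand-written median is to compare it with the standard library's `statistics.median` on random samples of both odd and even length. The stand-alone `get_median` below is a hypothetical copy of the helper (sorting included), written so the snippet runs on its own.

```python
import random
import statistics

def get_median(values):
    """Hypothetical stand-alone copy of the notebook helper: median of a list."""
    ordered = sorted(values)          # the median requires ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # even count: average the two middle values
        return (ordered[mid - 1] + ordered[mid]) / 2
    # odd count: the single middle value
    return ordered[mid]

# Compare against the standard library on random samples of several sizes
for size in (1, 2, 9, 10, 101):
    sample = [random.randint(1, 100) for _ in range(size)]
    assert get_median(sample) == statistics.median(sample)
print("helper agrees with statistics.median")
```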

### Variance
    - In probability theory and statistics, variance is the expected value of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers is spread out from its average value. For a sample, the sum of squared deviations is divided by n - 1 rather than n.

In [11]:
def variance(num_range=10):
    rand_int_list = [random.randint(1, num_range) for _ in range(num_range)]
    print(rand_int_list)

    # compute mean ("total" avoids shadowing the built-in sum)
    total = 0
    for rand_int in rand_int_list:
        total += rand_int
    mean = total / num_range

    # compute sample variance: sum of squared deviations divided by n - 1
    total = 0
    for rand_int in rand_int_list:
        total += (rand_int - mean)**2
    variance = total / (num_range - 1)

    return variance

print('Variance')
round(variance(), 4)

Variance
[10, 3, 3, 2, 6, 3, 5, 6, 10, 4]


8.1778
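
The standard library's `statistics` module offers a reference to check the loop against: `statistics.variance` divides the sum of squared deviations by n - 1 (sample variance), while `statistics.pvariance` divides by n (population variance). A minimal sketch, reusing the sample printed in the run above:

```python
import statistics

# The sample printed in the run above
data = [10, 3, 3, 2, 6, 3, 5, 6, 10, 4]

sample_var = statistics.variance(data)       # divides by n - 1
population_var = statistics.pvariance(data)  # divides by n

print(round(sample_var, 4))      # 8.1778
print(round(population_var, 4))  # 7.36
```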

### Mean Absolute Deviation
    - The mean absolute deviation of a data set is the average of the absolute differences between each value and a chosen central point; here that point is the mean. Taking absolute values matters because signed deviations from the mean always sum to zero.

In [12]:
def mean_absolute_deviation(num_range=10):
    rand_int_list = [random.randint(1, num_range) for _ in range(num_range)]
    print(rand_int_list)

    # compute mean
    total = 0
    for rand_int in rand_int_list:
        total += rand_int
    mean = total / num_range

    # compute mean absolute deviation: average of |x - mean|
    total = 0
    for rand_int in rand_int_list:
        total += abs(rand_int - mean)   # without abs(), the deviations cancel out
    mad = total / num_range

    return mad

print('Mean absolute deviation')
round(mean_absolute_deviation(), 4)

Mean absolute deviation
[5, 7, 6, 9, 9, 9, 4, 5, 2, 3]


2.1
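
The same computation can be written more compactly with `statistics.fmean` (Python 3.8+). `mean_abs_dev` is a hypothetical helper name, not a library function; note the `abs()` call, since plain signed deviations from the mean sum to (approximately) zero.

```python
from statistics import fmean

def mean_abs_dev(values):
    """Hypothetical helper: average absolute distance of each value from the mean."""
    values = list(values)                 # accept any iterable; we iterate it twice
    center = fmean(values)
    return fmean(abs(x - center) for x in values)

data = [5, 7, 6, 9, 9, 9, 4, 5, 2, 3]    # the sample printed in the run above
print(round(mean_abs_dev(data), 4))       # 2.1
```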

### Standard Deviation
    - In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. It is the square root of the variance, so it is expressed in the same units as the data. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

In [13]:
def standard_deviation(num_range=10):
    rand_int_list = [random.randint(1, num_range) for _ in range(num_range)]
    print(rand_int_list)

    # compute mean
    total = 0
    for rand_int in rand_int_list:
        total += rand_int
    mean = total / num_range

    # compute sample variance
    total = 0
    for rand_int in rand_int_list:
        total += (rand_int - mean)**2   # square every deviation
    variance = total / (num_range - 1)

    standard_deviation = math.sqrt(variance)    # standard deviation is the square root of the variance

    return standard_deviation

print('Standard deviation')
round(standard_deviation(), 4)

Standard deviation
[7, 6, 2, 2, 9, 7, 6, 9, 1, 5]


2.8752
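
As a cross-check, `statistics.stdev` returns the sample standard deviation, which is the square root of `statistics.variance`. This sketch uses only the standard library and reuses the sample printed above.

```python
import math
import statistics

data = [7, 6, 2, 2, 9, 7, 6, 9, 1, 5]  # the sample printed in the run above

sd = statistics.stdev(data)  # sample standard deviation (n - 1 in the variance)
# The standard deviation is the square root of the sample variance
assert math.isclose(sd, math.sqrt(statistics.variance(data)))
print(round(sd, 4))  # 2.8752
```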

### Median Absolute Deviation
    - In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data: it is the median of the absolute deviations of the values from the data's median. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.

In [20]:
def median_absolute_deviation(num_range=100):
    rand_int_list = [random.randint(1, num_range) for _ in range(num_range)]
    #print(rand_int_list)

    # compute the median of the data (sorted, as the median requires)
    median = __get_median(sorted(rand_int_list))
    # absolute deviation of every value from the median
    abs_deviations = []
    for rand_int in rand_int_list:
        abs_deviations.append(abs(rand_int - median))
    # the MAD is the median of those absolute deviations
    median_abs_deviation = __get_median(sorted(abs_deviations))
    return median_abs_deviation

print('Median Absolute Deviation')
median_absolute_deviation()

Median Absolute Deviation


23.0
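
What makes the median absolute deviation robust is that a single extreme value barely moves it, whereas the standard deviation can change dramatically. The sketch below uses a hypothetical `median_abs_dev` helper built on `statistics.median`; the optional `scale` factor of about 1.4826 is the usual convention for making the MAD comparable to the standard deviation under a normal distribution.

```python
import statistics

def median_abs_dev(values, scale=1.0):
    """Hypothetical helper: median of the absolute deviations from the median.

    scale=1.4826 is a common choice to make the result comparable to the
    standard deviation when the data are normally distributed.
    """
    center = statistics.median(values)
    return scale * statistics.median(abs(x - center) for x in values)

clean = [2, 4, 4, 5, 6, 7, 8]
with_outlier = [2, 4, 4, 5, 6, 7, 800]   # one extreme value replaces the 8

print(median_abs_dev(clean), median_abs_dev(with_outlier))   # 1.0 1.0
# The standard deviation, by contrast, is dominated by the outlier
print(statistics.stdev(clean), statistics.stdev(with_outlier))
```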

### Interquartile Range (IQR)
    - In descriptive statistics, the interquartile range, also called the midspread, middle 50%, or technically H-spread, is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.

In [22]:
def interquartile_range(num_range=10):
    rand_int_list = [random.randint(1, num_range) for _ in range(num_range)]
    print(rand_int_list)

    rand_int_list = sorted(rand_int_list)   # sort data
    mid_point = len(rand_int_list) // 2
    if (len(rand_int_list) % 2) != 0:
        # odd count: leave the overall median out of both halves
        lower_half = rand_int_list[:mid_point]
        upper_half = rand_int_list[mid_point + 1:]
    else:
        lower_half = rand_int_list[:mid_point]
        upper_half = rand_int_list[mid_point:]

    q1 = __get_median(lower_half)   # first quartile (25th percentile)
    q3 = __get_median(upper_half)   # third quartile (75th percentile)

    iqr = q3 - q1

    return iqr

print('IQR')
interquartile_range()

IQR
[1, 8, 8, 5, 9, 10, 2, 9, 10, 3]


6
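
Splitting the sorted data into halves, as above, is only one way to define quartiles. Python's `statistics.quantiles` (3.8+) supports an `"exclusive"` (default) and an `"inclusive"` method, and on the sample printed above the two give different IQRs, which is worth keeping in mind when comparing results across tools.

```python
import statistics

data = [1, 8, 8, 5, 9, 10, 2, 9, 10, 3]  # the sample printed in the run above

iqrs = {}
for method in ("exclusive", "inclusive"):
    # n=4 returns the three cut points Q1, Q2 (median), Q3
    q1, _, q3 = statistics.quantiles(data, n=4, method=method)
    iqrs[method] = q3 - q1

print(iqrs)  # {'exclusive': 6.5, 'inclusive': 5.5}
```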

### Percentile
    - A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 20th percentile is the value below which 20% of the observations may be found.

In [25]:
def percentile(num_range=10, P=20):
    rand_int_list = [random.randint(1, num_range) for _ in range(num_range)]
    print(rand_int_list)

    rand_int_list = sorted(rand_int_list)   # sort data

    # nearest-rank method: P is a percentage (20 means the 20th percentile)
    ord_rank = max(1, math.ceil(P * len(rand_int_list) / 100))

    return rand_int_list[ord_rank - 1]   # ranks are 1-based, list indices 0-based

print('Percentile')
percentile()

Percentile
[1, 4, 9, 9, 4, 9, 6, 8, 9, 8]


4
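
Percentile definitions also differ between tools. The sketch below contrasts the nearest-rank rule with linear interpolation between closest ranks; both function names are hypothetical. For quartiles, `statistics.quantiles(..., method="inclusive")` interpolates the same way as `percentile_linear`.

```python
import math
import statistics

def percentile_nearest_rank(values, p):
    """Hypothetical helper: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))   # 1-based rank
    return ordered[rank - 1]

def percentile_linear(values, p):
    """Hypothetical helper: interpolate at position h = (n - 1) * p / 100."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p / 100
    lo = math.floor(h)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])

data = [1, 4, 9, 9, 4, 9, 6, 8, 9, 8]   # the sample printed in the run above
print(percentile_nearest_rank(data, 25))                          # 4
print(percentile_linear(data, 25))                                # 4.5
print(statistics.quantiles(data, n=4, method="inclusive")[0])     # 4.5
```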