# Interquartile Range

The interquartile range (IQR) is a measure of where the “middle fifty” is in a data set, that is, the middle 50% of the values. Where the range measures the distance between the smallest and largest values in a set, the interquartile range measures the span in which the bulk of the values lie. Because it ignores the extremes, it is less sensitive to outliers, which is why it’s often preferred over other measures of spread (such as the range) when reporting things like school performance or SAT scores.

The interquartile range formula is the first quartile subtracted from the third quartile:
IQR = Q3 – Q1

In [38]:
numbers_ir = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
numbers_ir.sort()
numbers_ir

[1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]

In [34]:
# 1) find median (the list has an odd number of values, so it is the middle element)
median_index = len(numbers_ir) // 2
median = numbers_ir[median_index]
print(f"Median: {median}")

Median: 9


In [35]:
# 2) find Q1 - as the median of the lower half of the data (values below the median)
Q1_index = len(numbers_ir[:median_index]) // 2
Q1 = numbers_ir[Q1_index]
print(f"Q1: {Q1}")

Q1: 5


In [36]:
# 3) find Q3 - as the median of the upper half of the data (values above the median)
upper_half = numbers_ir[median_index + 1:]
Q3_index = median_index + 1 + len(upper_half) // 2
Q3 = numbers_ir[Q3_index]
print(f"Q3: {Q3}")

Q3: 18


In [37]:
# 4) calculate interquartile range
interquartile_range = Q3 - Q1
print(f"interquartile_range: {interquartile_range}")

interquartile_range: 13
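
The manual steps above rely on the list having an odd number of values. As a cross-check, here is a minimal sketch using `statistics.quantiles` from the Python standard library (available from Python 3.8). The variable names are illustrative, and the library's default "exclusive" method is only one of several quartile conventions, so results on other data sets may differ slightly from the median-of-halves approach.

```python
from statistics import quantiles

numbers = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]

# n=4 splits the data into quartiles and returns the three cut points
q1, q2, q3 = quantiles(numbers, n=4)  # default method="exclusive"

iqr = q3 - q1
print(f"Q1: {q1}, median: {q2}, Q3: {q3}, IQR: {iqr}")
```

For this data set the cut points agree with the manual calculation (Q1 = 5, Q3 = 18), giving an IQR of 13.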


# Range
In statistics, the range is a measure of spread: it’s the difference between the highest value and the lowest value in a data set.

In [39]:
numbers_range = [7, 10, 21, 33, 43, 45, 45, 65, 67, 87, 98, 99]
numbers_range.sort()
numbers_range

[7, 10, 21, 33, 43, 45, 45, 65, 67, 87, 98, 99]

In [40]:
range_res = numbers_range[-1] - numbers_range[0]
print(f"range: {range_res}")

range: 92
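
Sorting is not strictly required to find the range. A small sketch using the built-in `max` and `min` on the same values in shuffled order; the variable names here are illustrative only.

```python
# Same values as above, deliberately unsorted
numbers = [45, 7, 99, 21, 10, 33, 43, 45, 65, 67, 87, 98]

# Range = highest value minus lowest value
value_range = max(numbers) - min(numbers)
print(f"range: {value_range}")
```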


# Five-Number Summary
The five-number summary includes five items:
* The minimum.
* Q1 (the first quartile, or the 25% mark).
* The median.
* Q3 (the third quartile, or the 75% mark).
* The maximum.

The five-number summary gives you a rough idea of what your data set looks like. For example, you’ll have your lowest value (the minimum) and your highest value (the maximum). Although it’s useful in itself, the main reason you’ll want to find a five-number summary is to derive other useful statistics, such as the interquartile range, sometimes called the middle fifty.
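
The five items can be collected in one place. This is a minimal sketch, assuming the "median of each half" convention for quartiles; the function name `five_number_summary` is hypothetical, and it expects at least two values.

```python
from statistics import median

def five_number_summary(values):
    """Return min, Q1, median, Q3 and max using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd number of values, leave the median out of both halves
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return {
        "min": data[0],
        "Q1": median(lower),
        "median": median(data),
        "Q3": median(upper),
        "max": data[-1],
    }

summary = five_number_summary([1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27])
print(summary)
print(f"IQR: {summary['Q3'] - summary['Q1']}")
```

Returning a dictionary keeps each value labeled, so the interquartile range can be read off as `Q3 - Q1` without recomputing anything.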

# Sampling Error and Margin of Error
Sampling error arises when you take a sample from the population rather than measuring the entire population. In other words, it’s the difference between the statistic you measure and the parameter you would find if you took a census of the entire population.

If you were to survey the entire population (as the US Census does), there would be no sampling error. In practice, the exact sampling error is nearly impossible to calculate, because the true population parameter is unknown. However, when you take samples at random, you can estimate the error; that estimate is called the margin of error.

Formula: a common rule of thumb for the margin of error is 1/√n, where n is the size of the sample. For example, a random sample of 1,000 has a margin of error of about 1/√1000 ≈ 3.2%.
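
A short sketch of the 1/√n rule of thumb, showing how the margin of error shrinks as the sample grows. The function name `margin_of_error` is hypothetical. The rule is an approximation (roughly a 95% confidence level for a proportion near 50%), not an exact interval.

```python
import math

def margin_of_error(sample_size):
    """Rule-of-thumb margin of error, 1/sqrt(n), returned as a fraction."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 1 / math.sqrt(sample_size)

for n in (100, 1_000, 10_000):
    print(f"n = {n}: about {margin_of_error(n):.1%}")
```

Because the error falls with the square root of n, cutting the margin of error in half takes roughly four times as many respondents.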

# Variance
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers are spread out from their average value.

In [22]:
population = [2, 4, 4, 4, 5, 5, 7, 9]

In [23]:
# 1) calc mean
mean = sum(population) / len(population)
print(f"mean: {mean}")

mean: 5.0


In [24]:
# 2) calc variance - mean of squared deviations from the mean
squared_deviations = [(i - mean) ** 2 for i in population]
variance = sum(squared_deviations) / len(squared_deviations)
print(f"variance: {variance}")

variance: 4.0
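
As a cross-check, the same population variance can be obtained from the equivalent identity "mean of the squares minus the square of the mean", and from `statistics.pvariance` in the standard library. This is a sketch with illustrative variable names, not part of the original notebook.

```python
from statistics import pvariance

population = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(population) / len(population)
mean_of_squares = sum(x ** 2 for x in population) / len(population)

# Var(X) = E[X^2] - (E[X])^2
shortcut_variance = mean_of_squares - mean ** 2
print(f"shortcut variance: {shortcut_variance}")

# Standard-library population variance (divides by n)
print(f"pvariance: {pvariance(population)}")
```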


# Standard deviation
In statistics, the standard deviation (SD, also represented by the lower case Greek letter sigma σ for the population standard deviation or the Latin letter s for the sample standard deviation) is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.

In [25]:
# calc standard deviation - square root of variance
std_dev = variance ** 0.5
print(f"std_dev: {std_dev}")

std_dev: 2.0
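
The text above distinguishes the population standard deviation (σ) from the sample standard deviation (s). A minimal sketch of the difference using the standard library: `statistics.pstdev` divides by n, while `statistics.stdev` divides by n − 1. The variable names are illustrative.

```python
from statistics import pstdev, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)  # population SD: treats the values as the whole population
s = stdev(values)       # sample SD: treats the values as a sample (n - 1 divisor)

print(f"population SD (sigma): {sigma}")
print(f"sample SD (s): {s:.3f}")
```

The sample version comes out slightly larger here because dividing by n − 1 instead of n increases the variance before the square root is taken.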
