# Measurements of a set

## Basics

We have six basic statistical measurements of a set:
- min
- max
- range
- median
- mean
- mode

### Min

The smallest value in the set.

In [1]:
# Basic python imports
import numpy as np
import pandas as pd
np.random.seed(42)

In [2]:
def create_basic_table():
    row_1 = np.random.randint(1,100,100)
    row_2 = np.random.randint(1,100,100)
    row_3 = np.random.randint(1,100,100)
    row_4 = np.random.randint(1,100,100)
    df = pd.DataFrame(zip(row_1,row_2,row_3,row_4), columns=['Class1','Class2','Class3','Class4'])
    return df

In [3]:
class_scores= create_basic_table()

In [4]:
class_min = class_scores['Class1'].min()
print(class_min)

2


### Max

The largest value in the set.

In [5]:
class_max = class_scores['Class1'].max()
print(class_max)

95


### Range

The difference between the maximum and the minimum: $\text{Range} = \text{Max} - \text{Min}$

In [6]:
class_range = class_max - class_min
print(class_range)

93
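As a small cross-check of the formula above, the sketch below computes the range by hand and compares it with NumPy's `np.ptp` ("peak to peak"). The `scores` array is hypothetical toy data, not the notebook's `class_scores`.

```python
import numpy as np

# Hypothetical toy data, used only for illustration
scores = np.array([7, 3, 9, 4])

# Range computed directly from the definition: max - min
value_range = scores.max() - scores.min()

# np.ptp returns the same quantity in a single call
ptp_range = np.ptp(scores)

print(value_range)  # 6
print(ptp_range)    # 6
```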


### Mean

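The average of all the elements in a set: the sum of the elements divided by their count. <br>
$$\bar x= \mu = \frac{\sum_{i=1}^{n} x_i}{n}$$ <br>
Where:
- $\bar x$ (x-bar) and μ (mu) both denote the mean; by convention μ is used for a whole population and $\bar x$ for a sample.
- ∑ (sigma) denotes summation.
- $x_i$ represents the individual data points.
- $n$ represents the number of data points.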

In [7]:
class_scores['Class1'].mean()

50.69
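To connect the formula to the pandas call, here is a minimal sketch that computes the mean by hand (sum divided by count) and compares it with `Series.mean()`. The `scores` values are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
scores = pd.Series([4, 8, 15, 16, 23, 42])

# Mean from the definition: sum of the elements divided by their count
manual_mean = float(scores.sum() / len(scores))

# Mean as computed by pandas
pandas_mean = float(scores.mean())

print(manual_mean)  # 18.0
print(pandas_mean)  # 18.0
```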

### Mode

The value that occurs most often in a set. A set can have more than one mode, which is why pandas returns a Series rather than a single number.

In [8]:
class_scores['Class1'].mode()

0    62
Name: Class1, dtype: int32
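The output above is a Series because several values can tie for the highest count. A minimal sketch with hypothetical toy data where two values tie:

```python
import pandas as pd

# Hypothetical toy data: 2 and 3 both occur twice
scores = pd.Series([1, 2, 2, 3, 3, 4])

# mode() returns every value that ties for the highest count, sorted
modes = scores.mode()

print(modes.tolist())  # [2, 3]
```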

### Median

The middle value of the set once its elements are sorted.

In [9]:
class_scores['Class1'].median()

53.5

This is written as 53.5 because our set has an even number of elements, so the median is the average of the two middle values, 53 and 54.
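The odd and even cases can be compared side by side. The two Series below are hypothetical toy data chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
odd_set = pd.Series([9, 1, 5])       # sorted: 1, 5, 9
even_set = pd.Series([9, 1, 5, 7])   # sorted: 1, 5, 7, 9

# Odd count: the single middle element of the sorted values
odd_median = odd_set.median()

# Even count: the average of the two middle elements, (5 + 7) / 2
even_median = even_set.median()

print(odd_median)   # 5.0
print(even_median)  # 6.0
```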

## Advanced

### Deviation

The deviation of an element is its distance from the mean of the set, $x_i - \bar x$. Summing or averaging the raw deviations is meaningless, because the positive and negative deviations cancel out and the result is always 0.

$$\text{Deviation} = \frac{(x_1 - \bar x)+ \dots +(x_n - \bar x)}{n} = \frac{\sum_{i=1}^{n}(x_i - \bar x)}{n}$$

$$\text{Deviation} \times n = \sum_{i=1}^{n} x_i -\sum_{i=1}^{n} \bar x = \sum_{i=1}^{n} x_i - n \bar x = \sum_{i=1}^{n} x_i -\frac{n}{n}\sum_{i=1}^{n} x_i = 0$$
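The cancellation can be checked numerically. This is a minimal sketch on hypothetical toy data; because of floating-point rounding the sum may come out as a tiny nonzero number, so it is compared with `np.isclose` rather than `== 0`.

```python
import numpy as np

# Hypothetical toy data, used only for illustration
x = np.array([2.0, 5.0, 10.0, 17.0, 20.0])

# Signed distance of each element from the mean
deviations = x - x.mean()

# Positive and negative deviations cancel out
deviation_sum = deviations.sum()

print(np.isclose(deviation_sum, 0.0))  # True
```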

### Mean absolute deviation

Since the average of the raw deviations is always 0, a more useful measure averages the absolute (non-negative) value of each deviation. The purpose is to measure how much spread there is in a set. In mathematical terms:

$$\text{Mean Absolute Deviation (MAD)} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar x|$$

In [10]:
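def mean_absolute_deviation(data: pd.Series) -> float:
    """Return the mean absolute deviation of data around its mean."""
    mean = data.mean()
    mad = 0
    for value in data:
        # Accumulate the absolute distance of each element from the mean
        mad += np.absolute(value - mean)
    mad /= len(data)
    return mad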

In [11]:
mad = mean_absolute_deviation(class_scores['Class1'])
print(mad)

24.567199999999993
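The same quantity can also be computed without an explicit loop. The function below is a sketch of one possible vectorized form (the name `mean_absolute_deviation_vectorized` and the toy data are assumptions made for illustration); it relies only on `Series.abs()` and `Series.mean()`.

```python
import pandas as pd

def mean_absolute_deviation_vectorized(data: pd.Series) -> float:
    """Mean of the absolute distances between each element and the mean."""
    return float((data - data.mean()).abs().mean())

# Hypothetical toy data: mean is 10.8, absolute deviations sum to 30.8
sample = pd.Series([2, 5, 10, 17, 20])
sample_mad = mean_absolute_deviation_vectorized(sample)

print(round(sample_mad, 2))  # 6.16
```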


### Variance

Variance ($\sigma^2$) serves the same purpose as MAD, but instead of absolute values it uses the square of each deviation. The downside is that the unit of variance is the square of the original unit. For example, if the heights of 100 people are measured in meters ($m$), their variance is in square meters ($m^2$).

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar x)^2}{n}$$
Explanation:
- σ² (sigma squared) represents the variance.
- ∑ (sigma) denotes the summation.
- $x_i$ represents each individual data point.
- $n$ represents the number of data points.
- $\bar x$ represents the mean of the data set.

Note that pandas' `var()` divides by $n - 1$ by default (`ddof=1`, the sample variance). Pass `ddof=0` to get the formula above, which divides by $n$.

In [12]:
class_scores['Class1'].var()

838.4786868686867
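The divisor matters when comparing a hand computation with pandas. The sketch below, on hypothetical toy data, computes the variance from the formula (dividing by $n$) and compares it with `var(ddof=0)` and with the default `var()`, which divides by $n - 1$.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
x = pd.Series([2, 5, 10, 17, 20])

# Variance from the formula: mean of the squared deviations (divide by n)
manual_var = ((x - x.mean()) ** 2).sum() / len(x)

# pandas with ddof=0 divides by n and matches the formula
population_var = x.var(ddof=0)

# pandas default (ddof=1) divides by n - 1, so it is slightly larger
sample_var = x.var()

print(np.isclose(manual_var, population_var))  # True
print(sample_var > population_var)             # True
```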

### Standard Deviation

To remove the unit discrepancy of variance, take its square root. The result has the same unit as the original data. As with `var()`, pandas' `std()` divides by $n - 1$ by default (`ddof=1`).

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar x)^2}{n}}$$

In [13]:
print(class_scores['Class1'].std())
print(class_scores['Class1'].var())
print(class_scores['Class1'].std()**2)

28.956496453623092
838.4786868686867
838.4786868686867
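One practical caveat is that pandas and NumPy use different defaults: `Series.std()` divides by $n - 1$, while `np.std` on an array divides by $n$. A minimal sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
x = pd.Series([2, 5, 10, 17, 20])

# pandas default: ddof=1 (divide by n - 1)
pandas_std = x.std()

# NumPy default on an array: ddof=0 (divide by n)
numpy_std = np.std(x.to_numpy())

# Passing the same ddof to both makes them agree
print(np.isclose(x.std(ddof=0), numpy_std))  # True
print(pandas_std > numpy_std)                # True
```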


### Coefficient of Variation

The coefficient of variation (CV) is a standardized measure of the dispersion (spread) of a dataset relative to its mean. It expresses the standard deviation as a percentage of the mean.

Because it has no unit, it lets us compare the spread of two sets regardless of their ranges and the magnitudes of their values. e.g.: <br>
$a=\{2,5,10,17,20\}$<br>
$b=\{55,56,70,66,63\}$<br>
The standard deviations of $a$ and $b$ are similar (about 6.9 and 5.8 when dividing by $n$), but the mean of $a$ (10.8) is much smaller than the mean of $b$ (62), so $a$ is far more spread out relative to its mean.

$$\text{Coefficient of Variation(CV)} = \frac{\sigma}{\mu} \times 100\%$$

Explanation:

- $\sigma$: Represents the standard deviation.
- $\mu$: Represents the mean.

In [18]:
# CV as a ratio; multiply by 100 to express it as a percentage
cv = class_scores['Class1'].std() / class_scores['Class1'].mean()
print(f'{cv:.2f}')
# Or like this
print('{:.2f}'.format(cv))

0.57
0.57
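The two example sets $a$ and $b$ from the text can be compared directly. This is a sketch rather than a definitive implementation: the helper name `coefficient_of_variation` is an assumption, and it uses the population standard deviation (NumPy's default, dividing by $n$).

```python
import numpy as np

def coefficient_of_variation(values: np.ndarray) -> float:
    """Standard deviation as a percentage of the mean (divides by n)."""
    return float(values.std() / values.mean() * 100)

# The two example sets from the text
a = np.array([2, 5, 10, 17, 20])
b = np.array([55, 56, 70, 66, 63])

cv_a = coefficient_of_variation(a)
cv_b = coefficient_of_variation(b)

# a is far more spread out relative to its mean than b
print(f'{cv_a:.1f}%')  # roughly 63%
print(f'{cv_b:.1f}%')  # roughly 9%
```

Even though the two standard deviations are close in absolute terms, the CV separates the two sets clearly because it scales each one by its own mean.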
