In [18]:
import numpy as np
from scipy import stats

# Descriptive Statistics
1. [Arithmetic Mean](#arithmetic-mean)
1. [Median](#median)
1. [Mode](#mode)
1. [Range](#range)
1. [Interquartile Range (IQR)](#iqr)

### <a id="arithmetic-mean">Arithmetic Mean</a>
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$

$\bar{x}$ = arithmetic mean  
$n$ = number of data points  
$x_i$ = the $i$-th data point in the dataset

---
The arithmetic mean, often simply called the mean, is a measure of central tendency that represents the "average" of a set of numbers. <br>It gives you a sense of the "center" of the data by taking the sum of all the values and dividing it by the number of values.


In [37]:
data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

arithmetic_mean = np.mean(data)
print(f"Arithmetic Mean = {arithmetic_mean}")

Arithmetic Mean = 5.0
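
As a quick check of the formula, the mean can also be computed by hand as the sum divided by the count. This is a minimal sketch on the same array as above; the helper name `manual_mean` is hypothetical and only used for illustration.

```python
import numpy as np

data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

def manual_mean(values):
    """Hypothetical helper: sum of all values divided by the number of values."""
    return np.sum(values) / len(values)

hand_computed = manual_mean(data)
print(f"Manual mean = {hand_computed}")
# The hand-computed value should agree with NumPy's built-in.
print(f"Matches np.mean: {np.isclose(hand_computed, np.mean(data))}")
```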


### <a id="median">Median</a>
Let $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$ be the data sorted in increasing order.

$$
\text{median}(x) =
\begin{cases}
x_{\left( \frac{n+1}{2} \right)}, & \text{if } n \text{ is odd} \\
\frac{1}{2} \left( x_{\left( \frac{n}{2} \right)} + x_{\left( \frac{n}{2} + 1 \right)} \right), & \text{if } n \text{ is even}
\end{cases}
$$

For odd $n$: take the middle value. <br>
For even $n$: take the average of the two middle values.

---
The median is the middle value in a sorted list of numbers. It splits the data in half: at least half of the values lie at or below it, and at least half lie at or above it.<br>
It gives you a sense of the "typical" value in a dataset, especially when the data is skewed or contains outliers, because extreme values do not pull the median the way they pull the mean.


In [38]:
data = np.array([0, 1, 2, 50, 98, 99, 100])

median = np.median(data)
print(f"Median = {median}")

Median = 50.0
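
The two cases of the formula can be sketched directly. This is an illustrative implementation, not how NumPy computes the median internally; the helper name `manual_median` and the even-length example array are assumptions added here.

```python
import numpy as np

def manual_median(values):
    """Hypothetical helper following the odd/even case split of the formula."""
    x = np.sort(np.asarray(values))
    n = len(x)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the 1-based position (n+1)/2 is index n//2 when counting from 0.
        return float(x[mid])
    # Even n: average the two middle values (1-based positions n/2 and n/2 + 1).
    return float((x[mid - 1] + x[mid]) / 2)

odd_data = [0, 1, 2, 50, 98, 99, 100]
even_data = [0, 1, 2, 50, 98, 99]

print(f"Odd n:  {manual_median(odd_data)}")   # 50.0
print(f"Even n: {manual_median(even_data)}")  # 26.0
```

Both results should agree with `np.median` on the same inputs.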


### <a id="mode">Mode</a>

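In statistics, the mode is the value that appears most frequently in a dataset. A dataset can have more than one mode when several values tie for the highest count; in that case `stats.mode` returns only one of them.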



In [39]:
data = np.array([1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3])

mode_result = stats.mode(data)
print(f"Most frequent number: {mode_result.mode}")
print(f"Occurrences: {mode_result.count}")

Most frequent number: 1
Occurrences: 5
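
When several values tie for the highest frequency, it can be useful to list all of them. One possible sketch uses `np.unique` with `return_counts=True`; the example array with a tie is an assumption added for illustration.

```python
import numpy as np

# Hypothetical example data in which 1 and 2 both appear twice.
tied_data = np.array([1, 1, 2, 2, 3])

# np.unique returns the sorted distinct values and how often each occurs.
values, counts = np.unique(tied_data, return_counts=True)

# Keep every value whose count equals the maximum count.
all_modes = values[counts == counts.max()]
print(f"All modes: {all_modes}")  # [1 2]
```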


### <a id="range">Range</a>

$$
\text{Range} = x_{\max} - x_{\min}
$$

---
The range is one of the simplest measures of dispersion — it tells you how spread out the data is. <br>
It shows the total span of the dataset from the lowest to the highest value.

In [40]:
data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Use a name that does not shadow Python's built-in range().
data_range = data.max() - data.min()
print(data_range)

10


### <a id="iqr">Interquartile Range</a>

$$
\begin{aligned}
\text{IQR} &= Q_3 - Q_1 \\
Q_1 &= \text{First quartile (25th percentile)} \\
Q_3 &= \text{Third quartile (75th percentile)}
\end{aligned}
$$

---
IQR stands for Interquartile Range. It measures the spread of the middle 50% of a dataset. <br>
The IQR shows the range between the 25th and 75th percentile — where the bulk of the data lies.

In [49]:
data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
iqr = stats.iqr(data)
print(f"Q1 = {percentile_25}")
print(f"Q3 = {percentile_75}")
print(f"IQR = {iqr}")

Q1 = 2.5
Q3 = 7.5
IQR = 5.0
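
Because the IQR only looks at the middle 50% of the data, a single extreme value changes it far less than it changes the range. The comparison below is a minimal sketch; the appended outlier `1000` and the helper name `spread` are assumptions added for illustration, and the quartiles use NumPy's default linear interpolation.

```python
import numpy as np

clean = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
with_outlier = np.append(clean, 1000)  # hypothetical extreme value

def spread(values):
    """Hypothetical helper: return (range, IQR) of the values."""
    q1, q3 = np.percentile(values, [25, 75])
    return values.max() - values.min(), q3 - q1

clean_range, clean_iqr = spread(clean)
outlier_range, outlier_iqr = spread(with_outlier)

print(f"Without outlier: range = {clean_range}, IQR = {clean_iqr}")    # 10, 5.0
print(f"With outlier:    range = {outlier_range}, IQR = {outlier_iqr}")  # 1000, 5.5
```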


### Skewness

$$
\hat{\gamma}_m = \frac{n}{(n - 1)(n - 2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3
$$

---
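The formula above is the adjusted Fisher-Pearson standardized moment coefficient, a sample estimate of how asymmetrical a distribution is. Here $s$ is the sample standard deviation. Note that `stats.skew` returns the unadjusted (biased) Fisher-Pearson coefficient by default; passing `bias=False` gives the adjusted value defined above.<br>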
* The value of the skewness measure can be positive or negative, or even undefined 
* The higher the absolute value of the skewness measure, the more asymmetric the distribution 
* A symmetrical distribution has a skewness of zero 

Not the official definition of skewness, but a simple approximation that is often used for intuitive estimation is the nonparametric skew, which is one third of Pearson's second skewness coefficient.

$$
\text{Skewness} = \frac{\text{Mean - Median}}{\text{Standard Deviation}}
$$

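Skewness > 0 = Positive skewness. Long right tail with high outliers. Most values low. <br>
Skewness = 0 = No measured asymmetry, as in a symmetric distribution such as the normal distribution. A value of zero alone does not prove that the data is normally distributed. <br>
Skewness < 0 = Negative skewness. Long left tail with low outliers. Most values high. <br>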

In [None]:
data = np.array([0, 1, 2, 50, 98, 99, 100, 111, 222, 333, 444])

skew = stats.skew(data)
print(f"Skew = {skew}")

Skew = 1.0981235937257903
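
The relationship between the adjusted formula and the SciPy call can be checked numerically. This is a sketch under two assumptions: that $s$ is the sample standard deviation (`ddof=1`), and that the same $s$ is used for the simple mean-minus-median approximation. The variable names are hypothetical.

```python
import numpy as np
from scipy import stats

data = np.array([0, 1, 2, 50, 98, 99, 100, 111, 222, 333, 444])
n = len(data)
s = np.std(data, ddof=1)  # sample standard deviation (assumed meaning of s)

# Adjusted coefficient, written out term by term from the formula.
adjusted_manual = n / ((n - 1) * (n - 2)) * np.sum(((data - data.mean()) / s) ** 3)

biased = stats.skew(data)                # SciPy default: unadjusted coefficient
adjusted = stats.skew(data, bias=False)  # adjusted for sample size

# Simple approximation: (mean - median) / standard deviation.
approx = (data.mean() - np.median(data)) / s

print(f"Biased (default)  = {biased}")
print(f"Adjusted (SciPy)  = {adjusted}")
print(f"Adjusted (manual) = {adjusted_manual}")
print(f"Mean-median approximation = {approx}")
```

The manual and `bias=False` values should agree, and all three measures should be positive for this right-skewed dataset, even though their magnitudes differ.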
