In [None]:
import matplotlib.pyplot as plt
import numpy as np

# set random seed
np.random.seed(42)

# Descriptive statistics

Descriptive statistics summarize the main features of a dataset (a sample). The most common summaries fall into three groups:

* Measures of Central Tendency: mean, median
* Measures of Dispersion (or Spread): variance, standard deviation
* Measures of Position: percentiles and quantiles
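
To make these measures concrete, here is a minimal sketch that computes them by hand on a small made-up sample and compares the results with NumPy. The array `sample` and the `*_manual` names are assumptions introduced only for illustration.

```python
import numpy as np

# Small hypothetical sample, chosen only for illustration.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n_obs = sample.size

# Mean: sum of the values divided by their count.
mean_manual = sample.sum() / n_obs

# Sample variance: squared deviations from the mean divided by n - 1
# (Bessel's correction), which is what ddof=1 requests in NumPy.
var_manual = ((sample - mean_manual) ** 2).sum() / (n_obs - 1)
std_manual = np.sqrt(var_manual)

# Median: the middle of the sorted data. The count is even here,
# so it is the average of the two middle values.
sorted_sample = np.sort(sample)
median_manual = (sorted_sample[n_obs // 2 - 1] + sorted_sample[n_obs // 2]) / 2

print(mean_manual, np.mean(sample))
print(var_manual, np.var(sample, ddof=1))
print(std_manual, np.std(sample, ddof=1))
print(median_manual, np.median(sample))
```

Each pair of printed values should agree, which is a quick way to check what `ddof=1` does.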

In [None]:
# draw 1000 values from a normal distribution with mean 1 and standard deviation 5
X = np.random.normal(loc=1.0, scale=5.0, size=1000)

In [None]:
# mean
x_mean = np.mean(X)
x_mean

In [None]:
# median
x_median = np.median(X)
x_median

In [None]:
# sample variance
x_var = np.var(X, ddof=1)
x_var

In [None]:
# standard deviation
np.sqrt(x_var)

In [None]:
# same result, computed directly by NumPy
np.std(X, ddof=1)

In [None]:
# first quartile: the 0.25 quantile (25th percentile)
x_25th_quantile = np.quantile(X, q=0.25)
x_25th_quantile

In [None]:
# the 0.5 quantile coincides with the median
x_50th_quantile = np.quantile(X, q=0.5)
x_50th_quantile

## Visual representation

### Histogram

A histogram is a graphical representation of the distribution of a dataset. It gives a rough estimate of the probability distribution of the underlying random variable.

To construct a histogram, first "bin" the range of values, that is, divide the entire range into a series of intervals, and then count how many values fall into each interval. The bins are usually consecutive, non-overlapping intervals, often of equal width.
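
The binning step can be sketched without plotting by calling `np.histogram`, which returns the counts and the bin edges. The names `data`, `counts`, and `bin_edges`, the seed, and the choice of 10 bins are assumptions made for this illustration.

```python
import numpy as np

# Separate generator with an arbitrary seed, so the notebook's global
# random state is left untouched.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=5.0, size=1000)

# Split the data range into 10 equal-width, non-overlapping bins and
# count how many values fall into each one.
counts, bin_edges = np.histogram(data, bins=10)

# With density=True the heights are rescaled so that the total bar
# area (height times width, summed over bins) equals 1.
density, _ = np.histogram(data, bins=10, density=True)
bin_widths = np.diff(bin_edges)

print(counts)
print(bin_edges)
print((density * bin_widths).sum())
```

Ten bins need eleven edges, and the counts add up to the sample size.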

In [None]:
# make the histogram
plt.hist(X, bins=50)
plt.show()

### Boxplot

A boxplot, also known as a box-and-whisker plot, is a standardized way of displaying the distribution of data based on a five-number summary: the minimum, the maximum, the sample median, and the first and third quartiles. A related quantity is the interquartile range (IQR), which is the difference between the third and the first quartiles.
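
The quantities behind a boxplot can be computed directly, as in the sketch below. Note that with its default setting (`whis=1.5`) Matplotlib draws the whiskers only out to the most extreme points within 1.5 × IQR of the box and marks anything beyond as an outlier, rather than extending to the minimum and maximum. The names `data`, `lower_fence`, and `upper_fence` are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
data = rng.normal(loc=1.0, scale=5.0, size=1000)

# Quartiles: the 0.25, 0.5 and 0.75 quantiles.
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])

# Interquartile range: the height of the box.
iqr = q3 - q1

# Limits used by the common 1.5 * IQR rule; points outside them are
# drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, q2, q3, iqr)
print(len(outliers))
```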

In [None]:
# make the box-plot
plt.boxplot(X)
plt.show()

## Examples of well-known distributions

### Bernoulli

The Bernoulli distribution is a discrete probability distribution for a random variable which can take on one of two possible outcomes, often labeled 0 and 1. It's a special case of the binomial distribution where a single trial is conducted.
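
As a rough illustration, Bernoulli draws can be simulated with plain NumPy and compared with the theoretical mean `p` and variance `p * (1 - p)`. The names `p_success` and `draws`, the seed, and the sample size are assumptions for this sketch.

```python
import numpy as np

p_success = 0.6
rng = np.random.default_rng(0)  # arbitrary seed

# A uniform draw on [0, 1) falls below p_success with probability
# p_success, so the comparison yields 1 (success) or 0 (failure).
draws = (rng.random(100_000) < p_success).astype(int)

# Theoretical moments of a Bernoulli(p) variable.
mean_theory = p_success
var_theory = p_success * (1 - p_success)

print(draws.mean(), mean_theory)
print(draws.var(), var_theory)
```

The empirical values should land close to 0.6 and 0.24, with the gap shrinking as the number of draws grows.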

In [None]:
from scipy.stats import bernoulli

p = 0.6
k = [0, 1]

pmf = bernoulli.pmf(k=k, p=p)

_, ax = plt.subplots(1, 1)
ax.plot(k, pmf, "bo")
_ = ax.vlines(x=k, ymin=0, ymax=pmf, colors="b", lw=5, alpha=0.5)

### Binomial

The Binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success.
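
The probability of exactly $k$ successes in $n$ trials is $\binom{n}{k} p^k (1-p)^{n-k}$. The sketch below evaluates this formula directly and checks it against `scipy.stats.binom`. The names `n_trials`, `p_success`, and `pmf_formula` are assumptions for this example.

```python
from math import comb

import numpy as np
from scipy.stats import binom

n_trials, p_success = 10, 0.6

# Closed-form pmf: choose which j of the n trials succeed, then
# multiply the probabilities of j successes and n - j failures.
pmf_formula = np.array([
    comb(n_trials, j) * p_success**j * (1 - p_success) ** (n_trials - j)
    for j in range(n_trials + 1)
])

# Same values from SciPy.
ks = np.arange(n_trials + 1)
pmf_scipy = binom.pmf(ks, n_trials, p_success)

print(np.allclose(pmf_formula, pmf_scipy))
print(pmf_formula.sum())         # probabilities over the support sum to 1
print((ks * pmf_formula).sum())  # expected value, equal to n * p
```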

In [None]:
from scipy.stats import binom

p = 0.6
n = 100

k = np.arange(start=0, stop=(n + 1), step=1)
pmf = binom.pmf(k=k, n=n, p=p)

_, ax = plt.subplots(1, 1)
ax.plot(k, pmf, "bo")
_ = ax.vlines(x=k, ymin=0, ymax=pmf, colors="b", lw=2, alpha=0.5)

### Geometric 

The geometric distribution is a discrete probability distribution that describes the number of Bernoulli trials required for a success to occur for the first time. It's a model for the "waiting time" until the first success.
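
Under this "number of trials" convention, the first success occurs on trial $k$ with probability $(1-p)^{k-1} p$ for $k = 1, 2, \dots$, and `scipy.stats.geom` follows the same convention. The sketch below checks the formula against SciPy; the names `p_success`, `ks`, and `pmf_formula` are assumptions for this example.

```python
import numpy as np
from scipy.stats import geom

p_success = 0.2

# The support starts at 1: at least one trial is needed for a success.
ks = np.arange(1, 51)

# k - 1 failures followed by one success.
pmf_formula = (1 - p_success) ** (ks - 1) * p_success

print(np.allclose(pmf_formula, geom.pmf(ks, p_success)))
print(geom.pmf(0, p_success))  # zero trials is outside the support
print(geom.mean(p_success))    # expected waiting time, equal to 1 / p
```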

In [None]:
from scipy.stats import geom

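p = 0.2
k_max = 100  # largest number of trials to plot

# the support starts at 1: at least one trial is needed for a success
k = np.arange(start=1, stop=(k_max + 1), step=1)
pmf = geom.pmf(k=k, p=p)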

_, ax = plt.subplots(1, 1)
ax.plot(k, pmf, "bo")
_ = ax.vlines(x=k, ymin=0, ymax=pmf, colors="b", lw=2, alpha=0.5)

### Uniform

The continuous uniform distribution assigns the same probability density to every value within a range $[a, b]$ and zero density outside it. With its default arguments, `scipy.stats.uniform` uses the range $[0, 1]$.
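
To use a range other than the default, `scipy.stats.uniform` takes `loc` (the lower bound) and `scale` (the width), so the support is `[loc, loc + scale]`. A minimal sketch, where the bounds `low` and `high` are arbitrary values chosen for illustration:

```python
from scipy.stats import uniform

low, high = 2.0, 5.0

# Frozen uniform distribution on [low, high].
dist = uniform(loc=low, scale=high - low)

print(dist.pdf(3.0))  # constant density 1 / (high - low) inside the range
print(dist.pdf(6.0))  # zero density outside the range
print(dist.cdf(3.5))  # half of the probability mass lies below the midpoint
```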

In [None]:
from scipy.stats import uniform

x = np.arange(start=-1, stop=2, step=0.01)
pdf = uniform.pdf(x)

_, ax = plt.subplots(1, 1)
_ = ax.plot(x, pdf, "r-")

### Normal

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that has a bell-shaped probability density function. It is one of the most important and widely used distributions in statistics and natural sciences due to its descriptive power for many natural phenomena.
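
The bell shape comes from the density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. The sketch below evaluates this formula, compares it with `scipy.stats.norm`, and computes the probability mass within 1, 2, and 3 standard deviations of the mean. The values of `mu` and `sigma` and the grid `xs` are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 5.0
xs = np.linspace(-20.0, 20.0, 401)

# Density evaluated directly from the formula.
pdf_formula = np.exp(-0.5 * ((xs - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# SciPy parametrizes the normal by loc (mean) and scale (standard deviation).
pdf_scipy = norm.pdf(xs, loc=mu, scale=sigma)
print(np.allclose(pdf_formula, pdf_scipy))

# Probability of falling within c standard deviations of the mean,
# computed on the standard normal for c = 1, 2, 3.
within = [norm.cdf(c) - norm.cdf(-c) for c in (1, 2, 3)]
print(within)
```

The last line reproduces the familiar 68-95-99.7 rule.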

In [None]:
from scipy.stats import norm

x = np.arange(start=-5, stop=5, step=0.01)
# standard normal density (loc=0 and scale=1 by default)
pdf = norm.pdf(x)

_, ax = plt.subplots(1, 1)
_ = ax.plot(x, pdf, "r-")