In [1]:
import numpy as np
from scipy import stats

In [3]:
players_paycheck = [40000, 18000, 12000, 250000, 30000, 140000, 300000, 40000, 800000]

# Measures

## Centrality
Measures of central tendency describe where the values are centered.

### Mean
**WARNING**: The mean is influenced by every value, including outliers, so it can be pulled far away from the `median`. Use the two together.
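To see the distortion the warning describes, here is a small sketch that uses Python's standard `statistics` module (instead of NumPy, only to stay dependency-free). It drops the single largest paycheck and compares how the two measures react.

```python
from statistics import mean, median

players_paycheck = [40000, 18000, 12000, 250000, 30000, 140000, 300000, 40000, 800000]

# Remove the largest value (the 800000 outlier) and recompute both measures
without_outlier = sorted(players_paycheck)[:-1]

print(mean(players_paycheck), median(players_paycheck))  # with the outlier
print(mean(without_outlier), median(without_outlier))    # without the outlier
```

The mean falls from about 181,111 to 103,750, while the median stays at 40,000: one extreme value moves the mean a lot and the median not at all.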

In [5]:
?np.mean

In [4]:
np.mean(players_paycheck)

181111.11111111112

### Median
The median is the middle value. To calculate it, the values must first be sorted.

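Rules (positions are counted from 1 in the sorted data):
- Odd N: the value at position `(N+1)/2`
- Even N: the mean of the values at positions `N/2` and `N/2 + 1`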
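The two rules can be checked with a hand-rolled function. `median_by_rules` is a hypothetical helper written for illustration, not part of NumPy. The rules use 1-based positions while Python lists are 0-based, hence the `- 1` in the indices.

```python
def median_by_rules(values):
    """Median via the odd/even rules (rule positions are 1-based)."""
    ordered = sorted(values)  # the data must be sorted first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # Odd N: the value at position (N + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    # Even N: mean of the values at positions N/2 and N/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


players_paycheck = [40000, 18000, 12000, 250000, 30000, 140000, 300000, 40000, 800000]
print(median_by_rules(players_paycheck))  # odd N = 9: 5th sorted value, 40000
print(median_by_rules([1, 2, 3, 4]))      # even N = 4: (2 + 3) / 2 = 2.5
```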

In [7]:
?np.median

In [8]:
np.median(players_paycheck)

40000.0

### Mode
The mode is the value that occurs most often. A data set has a mode only if some value repeats; otherwise it has none.
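A minimal sketch with `collections.Counter` that returns every most frequent value, and an empty list when nothing repeats; the helper name `modes` is hypothetical. Note that `scipy.stats.mode` reports only the smallest of the most frequent values, and when nothing repeats it still returns a value (with a count of 1), so this sketch follows the convention stated above instead.

```python
from collections import Counter


def modes(values):
    """All most frequent values, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


players_paycheck = [40000, 18000, 12000, 250000, 30000, 140000, 300000, 40000, 800000]
print(modes(players_paycheck))  # [40000]
print(modes([1, 2, 3]))         # []
print(modes([1, 1, 2, 2, 3]))   # [1, 2] (two modes)
```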

In [10]:
?stats.mode

In [9]:
stats.mode(players_paycheck)

ModeResult(mode=array([40000]), count=array([2]))

-------------

## Variability
Measures of variability describe how spread out the data are.

### Variance
Measures how far a set of numbers is spread out from its mean, using the squared deviations from that mean.

Formulas:
- Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
    + $\mu$ == population mean
- Sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
    + $\bar{x}$ == sample mean
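The two formulas differ only in the denominator. A plain-Python sketch (the `variance` helper and its `sample` flag are hypothetical names) that should reproduce the `np.var` results below, with `sample=True` playing the role of `ddof=1`:

```python
def variance(values, sample=False):
    """Variance from the formulas: sum of squared deviations over N or n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    # Population divides by N; sample divides by n - 1 (Bessel's correction)
    return squared_deviations / (n - 1 if sample else n)


players_paycheck = [40000, 18000, 12000, 250000, 30000, 140000, 300000, 40000, 800000]
print(variance(players_paycheck))               # population, like np.var(...)
print(variance(players_paycheck, sample=True))  # sample, like np.var(..., ddof=1)
```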

In [12]:
?np.var

In [15]:
# Population variance (default ddof=0, divides by N)
population_variance = np.var(players_paycheck)
print(population_variance)
# Sample variance (ddof=1, divides by n - 1)
sample_variance = np.var(players_paycheck, ddof=1)
print(sample_variance)

57939654320.98765
65182111111.11111


### Standard deviation
Measures the amount of variation or dispersion of a set of values. It is the square root of the variance, so it is expressed in the same units as the data. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
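Because the standard deviation is the square root of the variance, it comes back in the same units as the paychecks, which makes it easier to read than the variance (expressed in squared units). A minimal sketch, with `std` as a hypothetical helper that should match `np.std` with and without `ddof=1`:

```python
import math


def std(values, sample=False):
    """Standard deviation: square root of the population or sample variance."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))


players_paycheck = [40000, 18000, 12000, 250000, 30000, 140000, 300000, 40000, 800000]
print(std(players_paycheck))               # population, like np.std(...)
print(std(players_paycheck, sample=True))  # sample, like np.std(..., ddof=1)
```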

In [16]:
?np.std

In [17]:
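# Population standard deviation (default ddof=0, divides by N)
population_std = np.std(players_paycheck)
print(population_std)
# Sample standard deviation (ddof=1, divides by n - 1)
sample_std = np.std(players_paycheck, ddof=1)
print(sample_std)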

240706.5730739143
255307.87514511007


### Range (amplitude)
Difference between the largest and the smallest value: `max(x) - min(x)`.

In [18]:
np.max(players_paycheck) - np.min(players_paycheck)

788000

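### Quartiles
Quartiles split the sorted data into four parts with about 25% of the values each, showing where the data are concentrated.

Quartiles:
- Q1: about 25% of the values lie below it
- Q2: about 50% of the values lie below it; equals the median
- Q3: about 75% of the values lie below it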
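Quartile conventions vary between textbooks and tools. NumPy's default (linear) method interpolates between sorted values at the 0-based position `(n - 1) * p`. A sketch of that rule, where `quantile_linear` is a hypothetical helper, that should reproduce the `np.quantile` array below:

```python
import math


def quantile_linear(values, p):
    """Quantile by linear interpolation at 0-based position (n - 1) * p."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p          # fractional position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(ordered) - 1)  # stay in bounds when p == 1
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])


players_paycheck = [40000, 18000, 12000, 250000, 30000, 140000, 300000, 40000, 800000]
print([quantile_linear(players_paycheck, q) for q in (0, 0.25, 0.5, 0.75, 1)])
print(quantile_linear([1, 2, 3, 4], 0.25))  # between 1 and 2: 1.75
```

With nine values the positions for Q1, Q2 and Q3 land exactly on sorted elements (2, 4 and 6), so no interpolation happens here; other methods can give different quartiles on other data.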

In [19]:
?np.quantile

In [22]:
np.quantile(players_paycheck, [0, 0.25, 0.50, 0.75, 1])

array([ 12000.,  30000.,  40000., 250000., 800000.])

In [24]:
# Summary statistics (the variance reported here is the sample variance, ddof=1)

stats.describe(players_paycheck)

DescribeResult(nobs=9, minmax=(12000, 800000), mean=181111.11111111112, variance=65182111111.11111, skewness=1.758635899846188, kurtosis=1.9572075427527729)
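`stats.describe` also reports `skewness` and `kurtosis`. With its defaults (`bias=True`, and Fisher's definition of kurtosis, so a normal distribution scores 0), both come from the central moments, as in this sketch (the helper names are hypothetical). The positive skewness here reflects a long right tail, driven by the 800000 paycheck.

```python
def central_moment(values, k):
    """k-th central moment, dividing by N (the biased estimator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    # m3 / m2^(3/2)
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    # m4 / m2^2 - 3, so a normal distribution gives 0
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


players_paycheck = [40000, 18000, 12000, 250000, 30000, 140000, 300000, 40000, 800000]
print(skewness(players_paycheck), excess_kurtosis(players_paycheck))
```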