# Calculating Descriptive Statistics

In [1]:
import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

In [2]:
x = [8.0, 1, 2.5, 4, 28.0]
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

Now you have the lists x and x_with_nan. They’re identical except that x_with_nan contains a nan value. It’s important to understand how Python’s statistics routines behave when they encounter a not-a-number value (nan).

In [3]:
x

[8.0, 1, 2.5, 4, 28.0]

In [4]:
x_with_nan

[8.0, 1, 2.5, nan, 4, 28.0]

In [5]:
math.isnan(np.nan), np.isnan(math.nan)

(True, True)
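Because nan never compares equal to anything, itself included, an equality check can’t be used to find it in a list; the isnan() functions above are the dependable test. A minimal sketch of detecting and filtering nan values in pure Python, where nan_flags and x_clean are names introduced here for illustration:

```python
import math

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

# nan is not equal to any value, including itself
nan_equals_itself = math.nan == math.nan  # False

# math.isnan() reports which items are nan
nan_flags = [math.isnan(item) for item in x_with_nan]

# Drop the nan values before handing the data to a statistics routine
x_clean = [item for item in x_with_nan if not math.isnan(item)]
x_clean
```

Here x_clean holds the same values as x, so routines applied to it are unaffected by the missing entry.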

# Central Tendency 

The sample mean, also called the sample arithmetic mean or simply the average, is the arithmetic average of all the items in a dataset. The mean of a dataset 𝑥 is mathematically expressed as Σᵢ𝑥ᵢ/𝑛, where 𝑖 = 1, 2, …, 𝑛. In other words, it’s the sum of all the elements 𝑥ᵢ divided by the number of items in the dataset 𝑥.

In [6]:
mean = sum(x) / len(x)
mean

8.7

In [7]:
mean = statistics.mean(x)
mean

8.7

fmean() was introduced in Python 3.8 as a faster alternative to mean(). It converts the data to floats and always returns a floating-point number.

In [8]:
mean = statistics.fmean(x)
mean

8.7
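The difference in return type is easier to see on data that isn’t already floating-point. The sketch below uses two small sample lists (ints and fracs, both made up for illustration) to compare the two functions: mean() tries to preserve the input type, while fmean() returns a float.

```python
import statistics
from fractions import Fraction

# Illustrative inputs, not part of the dataset used elsewhere in this notebook
ints = [1, 2, 3]
fracs = [Fraction(1, 2), Fraction(3, 2)]

mean_int = statistics.mean(ints)      # stays an int when the result is whole
fmean_int = statistics.fmean(ints)    # float

mean_frac = statistics.mean(fracs)    # stays a Fraction
fmean_frac = statistics.fmean(fracs)  # float

type(mean_int), type(fmean_int), type(mean_frac), type(fmean_frac)
```

If exact arithmetic on Fraction or Decimal data matters, mean() is likely the better choice; if speed matters and floats are acceptable, fmean() should suffice.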

In [9]:
mean = statistics.mean(x_with_nan)
mean

nan

In [10]:
mean = statistics.fmean(x_with_nan)
mean

nan

# Use NumPy

NumPy’s function mean() and the array method .mean() return the same result as statistics.mean(). This also holds when there are nan values among your data: the result is nan.

In [11]:
mean = np.mean(x)
mean

8.7
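The cell above exercises only the function form on clean data. As a sketch of the remaining cases, the array method and the nan behavior can be checked like this; the array names y and y_with_nan are introduced here for illustration:

```python
import math
import numpy as np

x = [8.0, 1, 2.5, 4, 28.0]
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

# The .mean() method lives on NumPy arrays, so convert the lists first
y = np.array(x)
y_with_nan = np.array(x_with_nan)

method_mean = y.mean()               # same value as np.mean(x)
nan_mean = np.mean(x_with_nan)       # nan propagates through the function
nan_method_mean = y_with_nan.mean()  # and through the method

method_mean, nan_mean, nan_method_mean
```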

Often you don’t want nan as a result. If you prefer to ignore nan values, then you can use np.nanmean():

In [16]:
np.nanmean(x_with_nan)

8.7

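# Weighted Mean

The weighted mean, also called the weighted arithmetic mean or weighted average, generalizes the arithmetic mean by letting each data point 𝑥ᵢ contribute in proportion to its weight 𝑤ᵢ: Σᵢ(𝑤ᵢ𝑥ᵢ) / Σᵢ𝑤ᵢ, where 𝑖 = 1, 2, …, 𝑛.

In [12]: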
x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]
wmean = sum(w[i] * x[i] for i in range(len(x))) / sum(w)
wmean

6.95

In [13]:
wmean = sum(x_ * w_ for (x_, w_) in zip(x, w)) / sum(w)
wmean

6.95
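For larger datasets, the same weighted mean can also be computed with NumPy. The sketch below uses np.average() with its weights argument, and then repeats the calculation with explicit array arithmetic; the names y, z, and wmean_arrays are introduced here for illustration:

```python
import numpy as np

x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]

# np.average() accepts per-item weights and normalizes by their sum
wmean = np.average(x, weights=w)

# Equivalent element-wise form: sum of products divided by sum of weights
y, z = np.array(x), np.array(w)
wmean_arrays = (z * y).sum() / z.sum()

wmean, wmean_arrays
```

Both values should agree with the pure Python result, up to floating-point rounding.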

# Harmonic Mean

The harmonic mean is the reciprocal of the mean of the reciprocals of all items in the dataset: 𝑛 / Σᵢ(1/𝑥ᵢ), where 𝑖 = 1, 2, …, 𝑛 and 𝑛 is the number of items in the dataset 𝑥. One pure Python implementation of the harmonic mean is:

In [14]:
hmean = len(x) / sum(1 / item for item in x)
hmean

2.7613412228796843

In [15]:
hmean = statistics.harmonic_mean(x)
hmean

2.7613412228796843
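Since the harmonic mean divides by each item, it is worth checking how statistics.harmonic_mean() treats awkward inputs. The sketch below probes three cases: a nan value, a zero, and a negative number. The two short lists and the negative_rejected flag are made up for illustration.

```python
import math
import statistics

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

# A nan among the data propagates to the result
hmean_nan = statistics.harmonic_mean(x_with_nan)

# A zero among the data makes the harmonic mean 0
hmean_zero = statistics.harmonic_mean([1, 0, 2])

# Negative values are not supported and raise StatisticsError
try:
    statistics.harmonic_mean([1, 2, -2])
    negative_rejected = False
except statistics.StatisticsError:
    negative_rejected = True

hmean_nan, hmean_zero, negative_rejected
```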