# 1. Central Tendency

# Mean, Median, Mode

This notebook provides explanations and examples of three fundamental statistical measures: mean, median, and mode.



## Mean (Average)

The mean, commonly referred to as the average, is calculated by summing all the numbers in a dataset and then dividing by the count of those numbers. It is a measure of the central tendency of the data.

### Formula:
$ \text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} $

where $ x_i $ represents each number in the dataset, and $ n $ is the number of data points.

### Example:
Let's calculate the mean of the following dataset: 10, 20, 30, 40, 50


In [None]:

# Example code to calculate the mean
dataset = [10, 20, 30, 40, 50]
mean = sum(dataset) / len(dataset)
mean


In [2]:
x_list  = [4, 7, 4, 0, -3, 4, 7]
x_tuple = (4, 7, 4, 0, -3, 4, 7)

## Compute means explicitly from the definition

In [3]:
list_total = 0
for x in x_list:
    list_total += x
    
tuple_total = 0
for t in x_tuple:
    tuple_total += t
    
list_average  = list_total/len(x_list)
tuple_average = tuple_total/len(x_tuple)
  
print(f'The average of x_list  = {list_average:.2f}')
print(f'The average of x_tuple = {tuple_average:.2f}')

The average of x_list  = 3.29
The average of x_tuple = 3.29


## statistics.mean function

In [4]:
import statistics

mean_list  = statistics.mean(x_list)
mean_tuple = statistics.mean(x_tuple)

print(f'statistics.mean(x_list)  = {mean_list:.2f}')
print(f'statistics.mean(x_tuple) = {mean_tuple:.2f}')

statistics.mean(x_list)  = 3.29
statistics.mean(x_tuple) = 3.29


## Create a NumPy array

In [5]:
import numpy as np
x_array = np.array(x_list)

## np.mean and np.ndarray.mean functions

In [6]:
np_mean_list       = np.mean(x_list)           # operate on a list
np_mean_tuple      = np.mean(x_tuple)          # operate on a tuple
np_mean_array      = np.mean(x_array)          # operate on an array
ndarray_mean_array = np.ndarray.mean(x_array)  # operate on an array only

print(f'np.mean(x_list)  = {np_mean_list:.2f}')
print(f'np.mean(x_tuple) = {np_mean_tuple:.2f}')
print(f'np.mean(x_array) = {np_mean_array:.2f}')
print(f'np.ndarray.mean(x_array) = {ndarray_mean_array:.2f}')

np.mean(x_list)  = 3.29
np.mean(x_tuple) = 3.29
np.mean(x_array) = 3.29
np.ndarray.mean(x_array) = 3.29


## 2-d array

In [7]:
matrix = np.array([[2, 5, 6, 3],
                   [1, 7, 0, 4],
                   [3, 1, 2, 6]])

mean_of_all  = np.mean(matrix)
mean_of_cols = np.mean(matrix, axis=0)
mean_of_rows = np.mean(matrix, axis=1)

print(f'np.mean(matrix) [all]          = {mean_of_all}')
print(f'np.mean(matrix, axis=0) [cols] = {mean_of_cols}')
print(f'np.mean(matrix, axis=1) [rows] = {mean_of_rows}')

np.mean(matrix) [all]          = 3.3333333333333335
np.mean(matrix, axis=0) [cols] = [2.         4.33333333 2.66666667 4.33333333]
np.mean(matrix, axis=1) [rows] = [4. 3. 3.]



## Median

The median is the **middle value** in a dataset when it is arranged in ascending order. If there is an even number of observations, the median is the average of the two middle numbers.

### Example:
Let's find the median of two datasets: 

1. An odd number of elements: 1, 3, 5, 7, 9
2. An even number of elements: 1, 2, 3, 4, 5, 6


In [1]:

# Median for odd number of elements
dataset_odd = [1, 3, 5, 7, 9]
sorted_dataset_odd = sorted(dataset_odd)
median_odd = sorted_dataset_odd[len(sorted_dataset_odd) // 2]
median_odd


5

In [2]:

# Median for even number of elements
dataset_even = [1, 2, 3, 4, 5, 6]
sorted_dataset_even = sorted(dataset_even)
midpoint = len(sorted_dataset_even) // 2
median_even = (sorted_dataset_even[midpoint - 1] + sorted_dataset_even[midpoint]) / 2
median_even


3.5

In [3]:
x_list_odd  = [6, 20, 15, 13, 2, 6, 8]      # sorted [2, 6, 6, 8, 13, 15, 20]
x_list_even = [6, 20, 15, 13, 2, 6, 8, 25]  # sorted [2, 6, 6, 8, 13, 15, 20, 25]

## Function to explicitly compute the median from its definition

In [4]:
def my_median(values):
    """
    Compute the median of a list of values.
    @param values the list of values.
    @return the median.
    """
    
    count = len(values)             # count of values     
    middle_index = count//2         # index of the middle value when count is odd
    sorted_values = sorted(values)  # must work with sorted values
    
    if count%2 == 1:  # odd number of values
        median_value = sorted_values[middle_index]
    else:             # even number of values
        low_median_value  = sorted_values[middle_index - 1]
        high_median_value = sorted_values[middle_index]
        median_value = (low_median_value + high_median_value)/2

    return median_value 

In [5]:
my_median_odd  = my_median(x_list_odd)
my_median_even = my_median(x_list_even)
  
print(f'my_median(x_list_odd)  = {my_median_odd:5.2f}')
print(f'my_median(x_list_even) = {my_median_even:5.2f}')

my_median(x_list_odd)  =  8.00
my_median(x_list_even) = 10.50


## statistics.median function

In [6]:
import statistics

median_list_odd  = statistics.median(x_list_odd)
median_list_even = statistics.median(x_list_even)

print(f'statistics.median(x_list_odd)  = {median_list_odd:5.2f}')
print(f'statistics.median(x_list_even) = {median_list_even:5.2f}')

statistics.median(x_list_odd)  =  8.00
statistics.median(x_list_even) = 10.50


## Create NumPy arrays

In [7]:
import numpy as np
x_array_odd  = np.array(x_list_odd)
x_array_even = np.array(x_list_even)

## np.median function

In [8]:
np_median_list_odd   = np.median(x_list_odd)    # operate on an odd-numbered list
np_median_list_even  = np.median(x_list_even)   # operate on an even-numbered list
np_median_array_odd  = np.median(x_array_odd)   # operate on an odd-numbered array
np_median_array_even = np.median(x_array_even)  # operate on an even-numbered array

print(f'np.median(x_list_odd)   = {np_median_list_odd:5.2f}')
print(f'np.median(x_list_even)  = {np_median_list_even:5.2f}')
print(f'np.median(x_array_odd)  = {np_median_array_odd:5.2f}')
print(f'np.median(x_array_even) = {np_median_array_even:5.2f}')

np.median(x_list_odd)   =  8.00
np.median(x_list_even)  = 10.50
np.median(x_array_odd)  =  8.00
np.median(x_array_even) = 10.50


## 2-d array

In [9]:
matrix = np.array([[2, 5, 6, 3],
                   [1, 7, 0, 4],
                   [3, 1, 2, 6]])

median_of_all  = np.median(matrix)
median_of_cols = np.median(matrix, axis=0)
median_of_rows = np.median(matrix, axis=1)

print(f'np.median(matrix) [all]          = {median_of_all}')
print(f'np.median(matrix, axis=0) [cols] = {median_of_cols}')
print(f'np.median(matrix, axis=1) [rows] = {median_of_rows}')

np.median(matrix) [all]          = 3.0
np.median(matrix, axis=0) [cols] = [2. 5. 2. 4.]
np.median(matrix, axis=1) [rows] = [4.  2.5 2.5]



## Mode

The mode is the value that appears most frequently in a dataset. There can be more than one mode if two or more values appear with the same highest frequency.

### Example:
Let's find the mode of the following dataset: 2, 3, 4, 2, 5, 4, 3, 2


In [8]:

from collections import Counter

# Example code to calculate the mode
dataset = [2, 3, 4, 2, 5, 4, 3, 2]
frequency = Counter(dataset)
highest = max(frequency.values())  # highest frequency, computed once
mode = [num for num, freq in frequency.items() if freq == highest]
mode


[2]

In [12]:
x_list_unimodal = [4, 7, 4, 0, 3, 4, 7]  # sorted [0, 3, 4, 4, 4, 7, 7]
x_list_bimodal  = [5, 0, 5, 0, 3, 5, 0]  # sorted [0, 0, 0, 3, 5, 5, 5]

## Function to explicitly compute the mode from its definition

In [13]:
def modes_of_list(values):
    """
    Compute the mode(s) of a list of values.
    @param values the list of values.
    @return the list of mode(s)
    """
    
    count = len(values)
    modes = []
    
    if count == 0:  # if no values then no modes
        return []
    
    sorted_values = sorted(values)     # sort to group the values
    longest_length = 0                 # longest length of consecutive values so far
    current_length = 1                 # current length of consecutive values
    previous_value = sorted_values[0]  # previous value in the sorted list
    
    for i in range(1, count + 1):      # start at 1, go up to and include the count
        if (i == count) or (sorted_values[i] != previous_value):
            # End of the values or the value has changed from the previous one.
            if current_length == longest_length:
                modes.append(previous_value)     # ties the longest_length, so append to modes
            elif current_length > longest_length:
                longest_length = current_length  # found a longer length
                modes = [previous_value]         # so start a new modes list
                
            if i < count:
                previous_value = sorted_values[i]  # start new consecutive values
                current_length = 1
        else:
            # The value hasn't changed, so increment the current length.
            current_length += 1
    
    return modes 

In [14]:
modes_unimodal = modes_of_list(x_list_unimodal)

print(f'modes_of_list(x_list_unimodal) = {modes_unimodal}')

modes_of_list(x_list_unimodal) = [4]


In [15]:
modes_bimodal = modes_of_list(x_list_bimodal)

print(f'modes_of_list(x_list_bimodal) = {modes_bimodal}')

modes_of_list(x_list_bimodal) = [0, 5]


## statistics.mode function

In [16]:
import statistics
mode_list_unimodal = statistics.mode(x_list_unimodal)

print(f'statistics.mode(x_list_unimodal)  = {mode_list_unimodal}')

statistics.mode(x_list_unimodal)  = 4


In [17]:
mode_list_bimodal = statistics.mode(x_list_bimodal)
print(f'statistics.mode(x_list_bimodal)  = {mode_list_bimodal}')

statistics.mode(x_list_bimodal)  = 5
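
Because `x_list_bimodal` has two modes, `statistics.mode` reports only one of them: in Python 3.8 and later it returns the first mode encountered in the data (here 5), whereas earlier versions raised `StatisticsError` for multimodal data. A minimal sketch, assuming Python 3.8 or later, of using `statistics.multimode` to get every mode:

```python
import statistics

x_list_bimodal = [5, 0, 5, 0, 3, 5, 0]

# multimode returns all modes, in the order they first appear in the data
all_modes = statistics.multimode(x_list_bimodal)
print(f'statistics.multimode(x_list_bimodal) = {all_modes}')  # [5, 0]
```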


## String values

In [18]:
colors = ['red', 'white', 'red', 'blue', 'white', 'red']
colors_mode = statistics.mode(colors)

print(f'statistics.mode(colors) = {colors_mode}')

statistics.mode(colors) = red


## Create NumPy arrays

In [19]:
import numpy as np
x_array_unimodal = np.array(x_list_unimodal)
x_array_bimodal  = np.array(x_list_bimodal)

## scipy.stats.mode function

In [20]:
from scipy import stats

stats_mode_list_unimodal = stats.mode(x_list_unimodal)    # operate on a unimodal list
stats_mode_list_bimodal  = stats.mode(x_list_bimodal)     # operate on a bimodal list

print(f'stats.mode(x_list_unimodal) = {stats_mode_list_unimodal}')
print(f'stats.mode(x_list_bimodal)  = {stats_mode_list_bimodal}')

stats.mode(x_list_unimodal) = ModeResult(mode=array([4]), count=array([3]))
stats.mode(x_list_bimodal)  = ModeResult(mode=array([0]), count=array([3]))


## 2-d arrays

In [21]:
matrix = np.array([[1, 3, 4, 2, 2, 1],
                   [5, 2, 2, 1, 4, 1],
                   [3, 3, 2, 2, 1, 1],
                   [0, 3, 4, 2, 1, 1]])

matrix_modes_columns = stats.mode(matrix)
print(f'stats.mode(matrix) = {matrix_modes_columns}')

stats.mode(matrix) = ModeResult(mode=array([[0, 3, 2, 2, 1, 1]]), count=array([[1, 3, 2, 3, 2, 4]]))


In [22]:
matrix_modes_all = stats.mode(matrix, axis=None)
print(f'stats.mode(matrix, axis=None) = {matrix_modes_all}')

stats.mode(matrix, axis=None) = ModeResult(mode=array([1]), count=array([8]))



# Median and Mode in Python

This notebook demonstrates how to calculate the median and mode with Python's standard `statistics` module.



## Median

The median is the middle value in a dataset when it is arranged in ascending order. If the dataset has an even number of observations, the median is the average of the two middle numbers.

The `statistics.median` function can be used to find the median.

### Example:
Let's calculate the median of the following dataset: 1, 2, 3, 4, 5


In [2]:

import statistics

# Dataset
data = [1, 2, 3, 4, 5]

# Calculate the median
median = statistics.median(data)
median


3

In [4]:
# Median for even number of elements
dataset_even = [1, 2, 3, 4, 5, 6]
# Calculate the median
median_even = statistics.median(dataset_even)
median_even

3.5


## Mode

The mode is the value that appears most frequently in a dataset. A dataset may have more than one mode if multiple values appear with the same highest frequency.

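The `statistics.mode` function returns a single mode; when several values tie for the highest frequency, it returns the first one encountered (Python 3.8 and later).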

### Example:
Let's calculate the mode of the following dataset: 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6


In [3]:

# Calculate the mode
mode = statistics.mode([1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6])
mode


4

In [7]:
# Example code to calculate the mode
dataset = [2, 3, 4, 2, 5, 4, 3, 2]
mode = statistics.mode(dataset)
mode

2

## Weighted Mean (Weighted Average)

In [1]:
import numpy as np

# Sample data and weights
data = np.array([3, 5, 7, 10, 15])
weights = np.array([1, 2, 3, 4, 5])

# Calculating weighted mean
weighted_mean = np.average(data, weights=weights)

# Displaying the result
print('Weighted Mean:', weighted_mean)

Weighted Mean: 9.933333333333334
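
The weighted mean multiplies each value by its weight, sums the products, and divides by the sum of the weights: $ \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} $. As a sketch, the same result can be computed from this definition in plain Python (the variable names are illustrative, not part of NumPy):

```python
data = [3, 5, 7, 10, 15]
weights = [1, 2, 3, 4, 5]

# Sum of the weight * value products: 3 + 10 + 21 + 40 + 75 = 149
weighted_total = sum(w * x for x, w in zip(data, weights))
# Sum of the weights: 15
total_weight = sum(weights)

weighted_mean_manual = weighted_total / total_weight
print(f'Weighted mean from the definition: {weighted_mean_manual:.4f}')  # 9.9333
```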


In [2]:
import numpy as np

unique_values = [4, 7, 0, -3]
value_weights = [3, 2, 1, 1]
weighted_average = np.average(unique_values, weights=value_weights)

print('np.average(unique_values, weights=value_weights) = ' +
      f'{weighted_average:.2f}')

np.average(unique_values, weights=value_weights) = 3.29


In [3]:
import numpy as np

x_list  = [4, 7, 4, 0, -3, 4, 7]
unique_values, value_weights = np.unique(x_list, return_counts=True)  # distinct values and their counts
weighted_average = np.average(unique_values, weights=value_weights)

print('np.average(unique_values, weights=value_weights) = ' +
      f'{weighted_average:.2f}')

np.average(unique_values, weights=value_weights) = 3.29


# Quartiles and Percentiles

## Quartiles

Quartiles are values that divide a dataset into four equal parts. The three quartiles are typically denoted as Q1 (first quartile), Q2 (second quartile or median), and Q3 (third quartile). These values represent the 25th, 50th, and 75th percentiles of the data, respectively.

### Example of Quartiles

Consider the dataset: `3, 7, 8, 5, 12, 14, 21, 15, 18, 14`.

1. First, we sort the data: `3, 5, 7, 8, 12, 14, 14, 15, 18, 21`.
2. Q2 (median) is the average of the middle two numbers: `(12 + 14) / 2 = 13`.
3. Q1 is the median of the first half (`3, 5, 7, 8, 12`): `7`.
4. Q3 is the median of the second half (`14, 14, 15, 18, 21`): `15`.
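
The median-of-halves steps above can be sketched with `statistics.median`. This is only one of several quartile conventions; the helper name `quartiles_by_halves` is hypothetical, and for an odd number of values this sketch assumes the middle value is left out of both halves.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    Needs at least two values so that both halves are non-empty.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]                  # values below the median position
    upper_half = ordered[len(ordered) - half:]   # values above the median position
    return (statistics.median(lower_half),
            statistics.median(ordered),
            statistics.median(upper_half))

data = [3, 7, 8, 5, 12, 14, 21, 15, 18, 14]
q1, q2, q3 = quartiles_by_halves(data)
print(q1, q2, q3)  # 7 13.0 15
```

`np.percentile` interpolates linearly between data points by default, so its quartiles can differ slightly from the ones this convention produces.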



In [2]:
import numpy as np

# Dataset (note: not the one used in the worked example above)
data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

# Calculate quartiles
q1 = np.percentile(data, 25)
q2 = np.percentile(data, 50)
q3 = np.percentile(data, 75)

(q1, q2, q3)


(5.5, 10.0, 14.5)

## Percentiles

Percentiles are similar to quartiles but divide the dataset into 100 equal parts. The nth percentile is the value below which n% of the data falls. They are useful for understanding the distribution of data and for outlier detection.

### Example of Percentiles

Using the dataset from the quartiles example (sorted: `3, 5, 7, 8, 12, 14, 14, 15, 18, 21`), let's find the 20th and 40th percentiles.

1. 20th percentile: 20% of 10 data points is 2, so it's the value at the 2nd position after sorting, which is 5.
2. 40th percentile: 40% of 10 is 4, so it's the value at the 4th position, which is 8.
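
The two calculations above follow the nearest-rank method: the p-th percentile is the value at position `ceil(p / 100 * n)` in the sorted data. A minimal sketch of that method follows (the function name `percentile_nearest_rank` is hypothetical). Note that `np.percentile` interpolates linearly by default, so it generally returns different values.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted data
    rank = max(rank, 1)                       # guard against a rank of 0 for tiny p
    return ordered[rank - 1]

data = [3, 7, 8, 5, 12, 14, 21, 15, 18, 14]
print(percentile_nearest_rank(data, 20))  # 5
print(percentile_nearest_rank(data, 40))  # 8
```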

In [3]:
# Percentiles of `data` from the quartiles cell above (np.percentile interpolates linearly by default)
p10 = np.percentile(data, 10)
p50 = np.percentile(data, 50)
p90 = np.percentile(data, 90)

(p10, p50, p90)

(2.8, 10.0, 17.2)