# Mean

The mean is one of the most fundamental measures of central tendency in statistics. It summarizes a set of numbers with a single value: the sum of the values divided by how many there are. For $n$ values $x_1, \dots, x_n$, the mean $\bar{x}$ is:

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$
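
As a minimal sketch of the formula above, the mean can be computed in plain Python by summing the values and dividing by their count. The four-value `sample` list is an illustrative assumption, not data from this notebook.

```python
# Hypothetical sample, chosen only to illustrate the formula
sample = [2, 4, 6, 8]

n = len(sample)          # number of observations
x_bar = sum(sample) / n  # (1/n) * sum of all x_i

print(f"Mean = {x_bar}")  # Mean = 5.0
```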

# Standard Deviation

Standard deviation is a measure of the spread of a dataset. It quantifies how much the data points vary around the mean. A low standard deviation indicates that the data points tend to be close to the mean, while a high standard deviation indicates that they are spread out over a wider range of values. The formula below divides by $n$, which gives the population standard deviation:

$$ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 } $$
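
The formula can be traced step by step in plain Python: compute the mean, average the squared deviations from it, then take the square root. The `sample` list here is a hypothetical example picked so the arithmetic comes out to round numbers.

```python
import math

# Hypothetical sample, chosen only to illustrate the formula
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
x_bar = sum(sample) / n                               # mean = 5.0
variance = sum((x - x_bar) ** 2 for x in sample) / n  # mean squared deviation = 4.0
sigma = math.sqrt(variance)                           # square root of the variance

print(f"Standard deviation = {sigma}")  # Standard deviation = 2.0
```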

# Dataset

In [1]:
# Blood glucose values
data = [124, 110, 120, 109, 99, 90, 89, 95, 126, 122, 120]

# NumPy

NumPy is a fundamental package for scientific computing in Python. It provides support for arrays, mathematical functions to operate on these arrays, and tools for working with them. NumPy's primary data structure is the ndarray (n-dimensional array), which allows for efficient manipulation of large datasets.

In [2]:
# Installing numpy
! pip install numpy
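
Once NumPy is installed, a Python list can be converted to an ndarray, which supports element-wise operations without explicit loops. The sketch below is only an illustration; the variable names `values`, `arr`, and `above_120` are assumptions, and the numbers repeat the blood glucose values from the Dataset section so the block runs on its own.

```python
import numpy as np

# Same blood glucose values as in the Dataset section
values = [124, 110, 120, 109, 99, 90, 89, 95, 126, 122, 120]
arr = np.array(values)

print(arr.shape)              # (11,): a one-dimensional array with 11 elements
print(arr.max() - arr.min())  # range of the data: 126 - 89 = 37

# Boolean masking selects the readings above 120 in one vectorized step
above_120 = arr[arr > 120]
print(above_120.tolist())     # [124, 126, 122]
```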



# Calculate Mean using NumPy

In [3]:
# Importing module
import numpy as np

# Calculating mean
mean = np.mean(data)

print(f"Mean = {mean.round(2)}")

Mean = 109.45


# Calculate Standard Deviation using NumPy

In [4]:
# Importing module
import numpy as np

# Calculating standard deviation
std = np.std(data)

print(f"Standard deviation = {std.round(2)}")

Standard deviation = 13.42
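
By default `np.std` divides by $n$ (`ddof=0`), which matches the formula in the Standard Deviation section. When the values are treated as a sample from a larger population, the sample standard deviation, which divides by $n - 1$, is often used instead; in NumPy this is requested with `ddof=1`. The sketch below compares the two. The names `glucose`, `pop_std`, and `sample_std` are illustrative, and the data is repeated so the block is self-contained.

```python
import numpy as np

# Same blood glucose values as in the Dataset section
glucose = [124, 110, 120, 109, 99, 90, 89, 95, 126, 122, 120]

pop_std = np.std(glucose)             # divides by n (default, ddof=0)
sample_std = np.std(glucose, ddof=1)  # divides by n - 1

print(f"Population standard deviation = {pop_std:.2f}")
print(f"Sample standard deviation = {sample_std:.2f}")
```

The sample value is always slightly larger than the population value, and the gap shrinks as the dataset grows.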


# Data Standardization using Mean and Standard Deviation

Standardization, also known as z-score normalization, is a preprocessing technique used in machine learning to transform features so that they have a mean of 0 and a standard deviation of 1. Each value is centered by subtracting the mean and scaled by dividing by the standard deviation, so features measured on different scales become directly comparable. This matters for algorithms that are sensitive to feature magnitude, such as distance-based and gradient-based methods.

$$ x_{\text{standardized}} = \frac{x - \bar{x}}{\sigma} $$

In [5]:
# Importing module
import numpy as np

# Calculating mean
mean = np.mean(data)

# Calculating standard deviation
std = np.std(data)

# Standardization (convert the list to an array so the arithmetic is element-wise)
data_stand = (np.asarray(data) - mean) / std

print(f"Data after standardization: {data_stand.round(2)}")
print(f"Mean of standardized data = {np.mean(data_stand).round(2)}")
print(f"Standard deviation of standardized data = {np.std(data_stand).round(2)}")

Data after standardization: [ 1.08  0.04  0.79 -0.03 -0.78 -1.45 -1.52 -1.08  1.23  0.93  0.79]
Mean of standardized data = 0.0
Standard deviation of standardized data = 1.0
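
The steps above can be wrapped in a small reusable function. This is a sketch under stated assumptions: `standardize` is a hypothetical helper name, not a NumPy function, and the zero-deviation guard is an added safety check that the original cell does not include. The last lines show that the transformation can be reversed by multiplying by the standard deviation and adding the mean back.

```python
import numpy as np

def standardize(values):
    """Return z-scores computed with the population standard deviation (ddof=0)."""
    arr = np.asarray(values, dtype=float)
    std = arr.std()
    if std == 0:
        # All values are identical, so dividing by std would be undefined
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return (arr - arr.mean()) / std

# Same blood glucose values as in the Dataset section
glucose = [124, 110, 120, 109, 99, 90, 89, 95, 126, 122, 120]
z = standardize(glucose)

# Reverse the transformation to recover the original values
restored = z * np.std(glucose) + np.mean(glucose)
print(np.allclose(restored, glucose))  # True
```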


# Limitations

The mean and standard deviation are widely used statistical measures, but they have some limitations:

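1. Sensitive to outliers: a single extreme value can shift both measures substantially.
2. Not robust for skewed data: in a skewed distribution the mean is pulled toward the long tail and may not reflect a typical value.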

In [6]:
# Data with an outlier
data = [124, 110, 120, 109, 99, 90, 89, 95, 126, 122, 120, 250] # 250 is the outlier


# Calculating mean
mean = np.mean(data)

# Calculating standard deviation
std = np.std(data)

print(f"Mean = {mean.round(2)}\nStandard deviation = {std.round(2)}")

Mean = 121.17
Standard deviation = 40.91
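
One way to see this sensitivity is to compare the mean with the median, which depends only on the middle of the sorted data. The median is not part of the original analysis and is brought in here purely as a point of comparison; the names `clean` and `with_outlier` are illustrative.

```python
import numpy as np

# Blood glucose values without and with the outlier (250)
clean = [124, 110, 120, 109, 99, 90, 89, 95, 126, 122, 120]
with_outlier = clean + [250]

for label, values in [("Without outlier", clean), ("With outlier", with_outlier)]:
    print(f"{label}: mean = {np.mean(values):.2f}, median = {np.median(values):.2f}")

# Without outlier: mean = 109.45, median = 110.00
# With outlier: mean = 121.17, median = 115.00
```

Adding a single reading of 250 moves the mean by almost 12 units, while the median moves by 5.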
