Data normalisation
Take a dataset and apply the following:
- Min-max normalisation
- Z-score normalisation using the mean and standard deviation
- Z-score normalisation using the mean and mean absolute deviation
- Normalisation by decimal scaling
- Outlier detection using the interquartile range (IQR) method, the Z-score method, and the modified Z-score method
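
The methods listed above can also be wrapped as small reusable functions. The following is a minimal sketch, assuming NumPy and one-dimensional numeric input. The function names (`minmax_scale`, `zscore`, `decimal_scale`, `iqr_outliers`, `modified_zscore`) and the choice to return zeros for constant input are illustrative assumptions, not part of the original exercise.

```python
import numpy as np


def minmax_scale(x):
    """Rescale x linearly to [0, 1]; a constant array maps to all zeros (assumed convention)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def zscore(x, spread="std"):
    """Centre on the mean, then divide by the population std or the mean absolute deviation."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    scale = x.std() if spread == "std" else np.mean(np.abs(centred))
    return np.zeros_like(x) if scale == 0 else centred / scale


def decimal_scale(x):
    """Divide by 10**j, with j the smallest non-negative integer giving max(|x|) / 10**j < 1."""
    x = np.asarray(x, dtype=float)
    max_abs = np.max(np.abs(x))
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return x / 10 ** j, j


def iqr_outliers(x, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x < q1 - k * iqr) | (x > q3 + k * iqr)]


def modified_zscore(x):
    """0.6745 * (x - median) / (median absolute deviation); zeros if that deviation is 0."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return np.zeros_like(x) if mad == 0 else 0.6745 * (x - med) / mad
```

Converting the input to float first keeps the zero-spread fallback from silently returning an integer array. On the array used in the notebook's code cell, `decimal_scale` returns j = 3 and `iqr_outliers` flags only 300.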

In [None]:
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 300])

print("Original Data:")
print(data)

print("\n" + "="*50 + "\n")

# 1. Min-Max Normalization
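# Formula: x' = (x - min(x)) / (max(x) - min(x))
data_min = np.min(data)
data_max = np.max(data)
data_range = data_max - data_min
# Guard against a constant array, where max == min would divide by zero
if data_range == 0:
    minmax_normalized = np.zeros_like(data, dtype=float)
else:
    minmax_normalized = (data - data_min) / data_range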
print("Min-Max Normalization:")
print(minmax_normalized)

print("\n" + "="*50 + "\n")

# 2. Z-Score Normalization using Mean and Standard Deviation
# Formula: z = (x - mean) / std
mean_val = np.mean(data)
std_val = np.std(data)  # population standard deviation (NumPy default, ddof=0)
zscore_std = (data - mean_val) / std_val
print("Z-Score Normalization (using standard deviation):")
print(zscore_std)

print("\n" + "="*50 + "\n")

# 3. Z-Score Normalization using Mean and Mean Absolute Deviation (MAD)
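# Mean absolute deviation about the mean: MAD = mean(|x - mean|)
# (distinct from the median absolute deviation used by the modified Z-score in 5c)
mad = np.mean(np.abs(data - mean_val))
if mad == 0:
    zscore_mad = np.zeros_like(data, dtype=float)  # float dtype, matching the else branch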
else:
    zscore_mad = (data - mean_val) / mad
print("Z-Score Normalization (using mean absolute deviation):")
print(zscore_mad)

print("\n" + "="*50 + "\n")

# 4. Normalization using Decimal Scaling
# Formula: normalized = x / 10^j, where j is the smallest non-negative integer such that max(|x|) / 10^j < 1
max_abs = np.max(np.abs(data))
j = 0
while max_abs / (10 ** j) >= 1:
    j += 1
decimal_scaled = data / (10 ** j)
print("Decimal Scaling Normalization:")
print(f"Scaling factor: 10^{j}")
print(decimal_scaled)

print("\n" + "="*50 + "\n")

# 5. Outlier Detection
print("Outlier Detection:")

# a) Using Interquartile Range (IQR) Method
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers_iqr = data[(data < lower_bound) | (data > upper_bound)]
print("a) IQR Method:")
print(f"Q1: {Q1}, Q3: {Q3}, IQR: {IQR}")
print(f"Lower bound: {lower_bound}, Upper bound: {upper_bound}")
print("Outliers:", outliers_iqr)
print()

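# b) Using Z-Score Method (using standard deviation)
# A cutoff of 3 is also common, but the largest |z| here is about 2.95 (for 300),
# so a cutoff of 3 would miss it while 2.5 flags it.
threshold = 2.5
outliers_zscore = data[np.abs(zscore_std) > threshold]
print(f"b) Z-Score Method (threshold = ±{threshold}):")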
print("Z-scores:", zscore_std)
print("Outliers:", outliers_zscore)
print()

# c) Using Modified Z-Score Method
# Modified Z-score formula: M = 0.6745 * (x - median) / MAD,
# where MAD here is the median absolute deviation: median(|x - median|)
median_val = np.median(data)
mad_median = np.median(np.abs(data - median_val))
if mad_median == 0:
    modified_z_scores = np.zeros_like(data, dtype=float)  # float dtype, matching the else branch
else:
    modified_z_scores = 0.6745 * (data - median_val) / mad_median
threshold_modified = 3.5
outliers_modified_z = data[np.abs(modified_z_scores) > threshold_modified]
print(f"c) Modified Z-Score Method (threshold = ±{threshold_modified}):")
print("Modified Z-scores:", modified_z_scores)
print("Outliers:", outliers_modified_z)


Original Data:
[ 10  20  30  40  50  60  70  80  90 100 300]


Min-Max Normalization:
[0.         0.03448276 0.06896552 0.10344828 0.13793103 0.17241379
 0.20689655 0.24137931 0.27586207 0.31034483 1.        ]


Z-Score Normalization (using standard deviation):
[-0.89021047 -0.75788188 -0.6255533  -0.49322472 -0.36089614 -0.22856755
 -0.09623897  0.03608961  0.1684182   0.30074678  2.94731844]


Z-Score Normalization (using mean absolute deviation):
[-1.41811847 -1.20731707 -0.99651568 -0.78571429 -0.57491289 -0.3641115
 -0.1533101   0.05749129  0.26829268  0.47909408  4.69512195]


Decimal Scaling Normalization:
Scaling factor: 10^3
[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.3 ]


Outlier Detection:
a) IQR Method:
Q1: 35.0, Q3: 85.0, IQR: 50.0
Lower bound: -40.0, Upper bound: 160.0
Outliers: [300]

b) Z-Score Method (threshold = ±2.5):
Z-scores: [-0.89021047 -0.75788188 -0.6255533  -0.49322472 -0.36089614 -0.22856755
 -0.09623897  0.03608961  0.1684182   0.30074678  2.947318