The Z-score (standard score) measures how many standard deviations a data point is from the mean of a distribution.

**Formula**

![z-score formula](assets/z-score-formula.png)

- 𝑍 = Z-score
- 𝑋 = Data point (observation)
- 𝜇 = Population mean
- 𝜎 = Population standard deviation
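
As a worked instance of the formula, take the values used in the exam example later in this notebook (X = 85, μ = 70, σ = 10). The LaTeX below simply writes out the definition above with those numbers substituted:

$$
Z = \frac{X - \mu}{\sigma} = \frac{85 - 70}{10} = 1.5
$$

So a score of 85 lies 1.5 standard deviations above the mean.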

**Interpretation**

- 𝑍 = 0 → The data point is exactly at the mean.
- 𝑍 > 0 → The data point is above the mean.
- 𝑍 < 0 → The data point is below the mean.
- |𝑍| > 2 (or, more strictly, |𝑍| > 3) → The data point is unusually far from the mean and is commonly flagged as a potential outlier (extreme value).
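
The outlier rule above can be sketched in a few lines. This is a minimal illustration, not a definitive method: the data values and the cutoff of 2 are assumptions chosen for the example, and the absolute value is used so that unusually low values are caught as well as unusually high ones.

```python
import numpy as np

# Hypothetical data: six typical values and one extreme value
data = np.array([52, 48, 50, 51, 49, 50, 95])

# Z-scores using the population standard deviation (NumPy's default, ddof=0)
z = (data - data.mean()) / data.std()

threshold = 2.0  # assumed cutoff; 3.0 is a stricter common choice
outliers = data[np.abs(z) > threshold]

print("Flagged as potential outliers:", outliers)
```

In a sample this small, a cutoff of 3 would flag nothing, so the threshold is best treated as a judgment call rather than a fixed rule.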

**The Importance of Z-Score**

- Standardization: Converts different datasets into a common scale.
- Outlier Detection: Identifies extreme values in data.
- Probability & Normal Distribution: Helps in finding probabilities using the Z-table.
- Comparison Across Different Datasets: Useful when data comes from different sources with different units.
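
To make the comparison point concrete, here is a small sketch with two hypothetical exams graded on different scales. The helper `z_score` and all the numbers are made up for illustration.

```python
def z_score(x, mean, std):
    """Return how many standard deviations x lies from mean."""
    return (x - mean) / std

# Hypothetical results (all numbers are invented for this example)
math_z = z_score(x=80, mean=70, std=10)     # exam scored out of 100
physics_z = z_score(x=33, mean=30, std=2)   # exam scored out of 40

print(f"Math Z-score: {math_z:.2f}")
print(f"Physics Z-score: {physics_z:.2f}")
```

The raw scores are not directly comparable, but the Z-scores are: the physics result sits further above its class mean than the math result does.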

In [None]:
import numpy as np

# Example dataset: Exam scores
scores = np.array([60, 70, 75, 85, 90, 95, 50])

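mean_score = np.mean(scores)
std_dev = np.std(scores)  # Population standard deviation (ddof=0), matching σ in the formula

# Standardize every score in one vectorized step
z_scores = (scores - mean_score) / std_dev

for score, z in zip(scores, z_scores):
    print(f"Score: {score}, Z-score: {z:.2f}")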


**Visualization**

In [None]:
import matplotlib.pyplot as plt
import scipy.stats as stats
import numpy as np

# Generate normal distribution data
x = np.linspace(-4, 4, 100)
y = stats.norm.pdf(x, 0, 1)  # Standard normal distribution (mean=0, std=1)

# Plot normal distribution
plt.figure(figsize=(10, 5))
plt.plot(x, y, color='blue')

# Highlight Z = -2, -1, 0, 1, 2
for z in [-2, -1, 0, 1, 2]:
    plt.axvline(z, color='red', linestyle='dashed')
    plt.text(z, 0.02, f'Z={z}', horizontalalignment='center', fontsize=12)

plt.title("Z-Score on a Normal Distribution")
plt.xlabel("Z-Score")
plt.ylabel("Probability Density")
plt.show()

**Z-Score and Empirical Rule (68-95-99.7 Rule)**

For normally distributed data, Z-scores tell us approximately what share of the data lies within a given distance of the mean:
- About 68% of data falls within ±1 standard deviation (𝑍 ∈ [−1,1]).
- About 95% of data falls within ±2 standard deviations (𝑍 ∈ [−2,2]).
- About 99.7% of data falls within ±3 standard deviations (𝑍 ∈ [−3,3]).
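
These percentages can be checked numerically. The sketch below assumes SciPy is available and uses `scipy.stats.norm.cdf`, the cumulative distribution function of the standard normal distribution. The `coverage` dictionary is just a name chosen for this example.

```python
from scipy import stats

# Share of a standard normal distribution within ±k standard deviations:
# P(-k < Z < k) = cdf(k) - cdf(-k)
coverage = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"Within ±{k} SD: {share:.4f}")
```

The printed values round to the 68%, 95%, and 99.7% figures quoted in the rule.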

**Predicting Probabilities using Z-Score**

The Standard Normal Distribution Table (Z-table) gives the probability that a value is less than a given Z-score.

These probabilities are the cumulative area under the standard normal curve to the left of the Z-score.

The total area under the normal curve is 1 (or 100%).

Example:
- Z = 1.0 → 84.13% of values are below this point.
- Z = -1.0 → 15.87% of values are below this point.
- Z = 0 → 50% of values are below the mean.
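
The table values in the example above can be reproduced without a printed Z-table. This is a minimal sketch that assumes SciPy is installed; `table` is simply a dictionary name chosen here.

```python
from scipy import stats

# Cumulative probability P(Z < z) for the Z-scores in the example
table = {z: stats.norm.cdf(z) for z in (-1.0, 0.0, 1.0)}

for z, p in table.items():
    print(f"Z = {z:+.1f} -> {p:.4f} of values lie below this point")

# The normal curve is symmetric, so P(Z < -z) equals 1 - P(Z < z)
symmetric = abs(table[-1.0] - (1 - table[1.0])) < 1e-12
print("Symmetry holds:", symmetric)
```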

In [None]:
import scipy.stats as stats

X = 85  # Exam score
mu = 70  # Mean
sigma = 10  # Standard deviation

# Calculate Z-score
Z = (X - mu) / sigma

# Find probability of a value being below a given z-score
probability = stats.norm.cdf(Z)
print(f"Probability of scoring less than {X}: {probability:.4f}")

# Find probability of a value being above a given z-score
probability_above = 1 - stats.norm.cdf(Z)
print(f"Probability of scoring more than {X}: {probability_above:.4f}")

# Find probability of a value falling between two z-scores
X1 = 65  # Lower bound
X2 = 85  # Upper bound

Z1 = (X1 - mu) / sigma
Z2 = (X2 - mu) / sigma

prob_between = stats.norm.cdf(Z2) - stats.norm.cdf(Z1)
print(f"Probability of scoring between {X1} and {X2}: {prob_between:.4f}")


**Visualization**

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

# X, mu, and sigma are reused from the previous cell
x = np.linspace(40, 100, 1000)
y = stats.norm.pdf(x, mu, sigma)

plt.figure(figsize=(10, 5))
plt.plot(x, y, color='blue')

# Shade the area to the left of X, i.e. P(score < X)
x_fill = np.linspace(40, X, 500)
y_fill = stats.norm.pdf(x_fill, mu, sigma)
plt.fill_between(x_fill, y_fill, color='red', alpha=0.5)

plt.title(f"Probability of Scoring Less Than {X}")
plt.xlabel("Exam Score")
plt.ylabel("Density")
plt.axvline(X, color='black', linestyle='dashed', label=f"X = {X}")
plt.legend()
plt.show()