Standard deviation (σ) measures how far data points typically lie from the mean. It tells us whether the data is tightly clustered or widely dispersed.

**Understanding Standard Deviation**

- Low Standard Deviation → Data points are close to the mean (less variation).
- High Standard Deviation → Data points are far from the mean (more variation).

Example: Suppose we have two datasets of test scores:
- Dataset A: [48, 50, 52, 50, 49] → Low spread, low standard deviation
- Dataset B: [30, 70, 20, 90, 50] → High spread, high standard deviation

The two means are similar (49.8 for Dataset A and 52 for Dataset B), but Dataset B has far more variation.
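
The spread of the two datasets above can be checked directly. The short sketch below uses NumPy's `np.std`, which computes the population standard deviation by default; the variable names are illustrative only.

```python
import numpy as np

# The two example datasets of test scores
dataset_a = [48, 50, 52, 50, 49]
dataset_b = [30, 70, 20, 90, 50]

# np.std divides by N by default (population standard deviation)
std_a = np.std(dataset_a)
std_b = np.std(dataset_b)

print(f"Dataset A standard deviation: {std_a:.2f}")  # 1.33
print(f"Dataset B standard deviation: {std_b:.2f}")  # 25.61
```

Dataset B's standard deviation is roughly twenty times larger, which matches the visual impression that its scores are much more scattered.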

**Formula for Standard Deviation**

For a population:

![standard deviation for population](assets/sd-population.png)

where:

- Xi = individual data points
- μ = population mean
- N = population size
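
To make the formula concrete, here is a minimal sketch that applies it step by step in plain Python: compute the mean, square each deviation from it, average the squares over N, and take the square root. The variable names (`mu`, `sigma`, and so on) are chosen here to mirror the symbols above.

```python
import math

# Test scores treated as a complete population
data = [48, 50, 52, 50, 49, 47, 51, 53, 50, 48]

N = len(data)                                        # population size
mu = sum(data) / N                                   # population mean
squared_deviations = [(x - mu) ** 2 for x in data]   # (Xi - mu)^2 for each point
variance = sum(squared_deviations) / N               # average squared deviation
sigma = math.sqrt(variance)                          # population standard deviation

print(f"Mean: {mu:.2f}")                             # 49.80
print(f"Population standard deviation: {sigma:.2f}")  # 1.78
```

The result should agree with `np.std(data)`, which performs the same calculation.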

In [None]:
import numpy as np

# Sample dataset (test scores)
data = [48, 50, 52, 50, 49, 47, 51, 53, 50, 48]

# Calculate standard deviation
std_population = np.std(data)   # Population Standard Deviation

print(f"Population Standard Deviation: {std_population:.2f}")


For a sample:

![standard deviation for sample](assets/sd-sample.png)

where:

- Xi = individual data points
- x̄ = sample mean
- n = sample size
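
The only change from the population case is the divisor: the sum of squared deviations is divided by n - 1 instead of n (Bessel's correction), which compensates for the sample mean being estimated from the same data. The sketch below works through this by hand and cross-checks the result against `statistics.stdev` from the Python standard library; the variable names are illustrative.

```python
import math
import statistics

# Test scores treated as a sample drawn from a larger population
data = [48, 50, 52, 50, 49, 47, 51, 53, 50, 48]

n = len(data)                                    # sample size
x_bar = sum(data) / n                            # sample mean
sum_sq = sum((x - x_bar) ** 2 for x in data)     # sum of squared deviations
s = math.sqrt(sum_sq / (n - 1))                  # divide by n - 1, not n

print(f"Sample standard deviation (manual): {s:.2f}")                      # 1.87
print(f"Sample standard deviation (stdev):  {statistics.stdev(data):.2f}")  # 1.87
```

Because the divisor is smaller, the sample standard deviation is always slightly larger than the population figure computed from the same numbers.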

In [None]:
import numpy as np

# Sample dataset (test scores)
data = [48, 50, 52, 50, 49, 47, 51, 53, 50, 48]

# Calculate standard deviation
std_sample = np.std(data, ddof=1)  # Sample Standard Deviation (ddof=1 divides by n - 1 instead of n)

print(f"Sample Standard Deviation: {std_sample:.2f}")


**Standard Deviation and Normal Distribution**

In a normal distribution, a fixed share of the data falls within each number of standard deviations from the mean (the empirical rule):
- About 68% of the data lies within 1 standard deviation of the mean.
- About 95% lies within 2 standard deviations.
- About 99.7% lies within 3 standard deviations.
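
These percentages can be verified in two ways. The sketch below computes the theoretical share with `math.erf` (for a normal distribution, the probability of falling within k standard deviations is erf(k / √2)) and compares it with the share observed in simulated data. The sample size of 100,000, the seed, and the variable names are assumptions made for this illustration.

```python
import math
import numpy as np

# Simulate normally distributed values (mean 50, std dev 10)
rng = np.random.default_rng(42)
samples = rng.normal(loc=50, scale=10, size=100_000)

mean = samples.mean()
std_dev = samples.std()

results = {}
for k in (1, 2, 3):
    # Theoretical share within k standard deviations of the mean
    theoretical = math.erf(k / math.sqrt(2))
    # Observed share of simulated values within k standard deviations
    empirical = np.mean(np.abs(samples - mean) <= k * std_dev)
    results[k] = (theoretical, empirical)
    print(f"Within {k} std dev: theoretical {theoretical:.4f}, simulated {empirical:.4f}")
```

The theoretical values come out to roughly 0.6827, 0.9545, and 0.9973, and the simulated shares should land close to them, with small differences due to random sampling.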

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Generate normally distributed data
np.random.seed(42)
data = np.random.normal(loc=50, scale=10, size=1000)  # Mean=50, Std Dev=10

# Calculate mean and standard deviation
mean = np.mean(data)
std_dev = np.std(data)

# Create histogram
plt.figure(figsize=(10, 5))
plt.hist(data, bins=30, color='blue', alpha=0.6, edgecolor='black')

# Plot mean and standard deviation lines
plt.axvline(mean, color='red', linestyle='dashed', linewidth=2, label="Mean")
plt.axvline(mean - std_dev, color='green', linestyle='dashed', linewidth=2, label="Mean ± 1 Std Dev")
plt.axvline(mean + std_dev, color='green', linestyle='dashed', linewidth=2)
plt.axvline(mean - 2*std_dev, color='purple', linestyle='dashed', linewidth=2, label="Mean ± 2 Std Dev")
plt.axvline(mean + 2*std_dev, color='purple', linestyle='dashed', linewidth=2)

plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("Normal Distribution with Standard Deviation")
plt.legend()
plt.show()
