# Module 02: Spread and Variation

**Difficulty**: ⭐ (Beginner)

**Estimated Time**: 45 minutes

**Prerequisites**: 
- Module 00: Introduction and Stock Returns
- Module 01: Averages and Central Tendency
- Basic understanding of mean/average

## Learning Objectives

By the end of this notebook, you will be able to:
1. Calculate **range, variance, and standard deviation** for stock prices
2. Explain **why standard deviation measures risk** in trading
3. Understand the **mathematical relationship** between variance and standard deviation
4. Measure and compare **volatility** across different Malaysian stocks
5. Explain the **mathematical foundation** of Bollinger Bands

## Why This Matters

**Volatility is risk. Standard deviation is how we measure it.**

Understanding spread and variation is crucial for:
- **Risk Management** → Standard deviation tells you how much a stock typically moves
- **Position Sizing** → Higher volatility = smaller position size
- **Stop Loss Placement** → Use volatility to set appropriate stop distances
- **Bollinger Bands** → Uses standard deviation to create dynamic support/resistance
- **ATR (Average True Range)** → Another volatility measure we'll cover in Module 06

Every professional trader uses volatility metrics. After this module, you will too!

---

## Setup

Let's import our libraries and download Malaysian stock data.

In [None]:
# Data manipulation and numerical operations
import pandas as pd
import numpy as np

# Data acquisition
import yfinance as yf

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Display settings
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Pandas display options
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 20)
pd.set_option('display.precision', 4)

# Random seed for reproducibility
np.random.seed(42)

print("✓ Libraries imported successfully!")

In [None]:
# Download Malaysian stock data
print("Downloading Malaysian stock data...\n")

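def download_stock(ticker):
    """Download 2023 daily data for one Bursa Malaysia ticker."""
    data = yf.download(ticker, start='2023-01-01', end='2024-01-01', progress=False)
    # Some yfinance versions return MultiIndex columns (field, ticker) even for a
    # single ticker; flatten them so data['Close'] is a Series, not a DataFrame.
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)
    return data

# Maybank - stable, large-cap banking stock
maybank = download_stock('1155.KL')

# Top Glove - volatile healthcare stock
topglove = download_stock('5225.KL')

# Axiata - telecommunications stock
axiata = download_stock('6888.KL')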

# Validate data
assert len(maybank) > 0, "Failed to download Maybank data"
assert len(topglove) > 0, "Failed to download Top Glove data"
assert len(axiata) > 0, "Failed to download Axiata data"

print(f"✓ Maybank: {len(maybank)} days")
print(f"✓ Top Glove: {len(topglove)} days")
print(f"✓ Axiata: {len(axiata)} days")
print("\nData ready for analysis!")

---

## Part 1: Range - The Simplest Measure of Spread

### What is Range?

The **range** is the difference between the highest and lowest values.

### Formula

$$
\text{Range} = \text{Maximum} - \text{Minimum}
$$

### Why Range Matters in Trading

- **Daily Range** → Difference between high and low of the day
- **Trading Range** → Price boundaries for a stock over time
- **Breakout Detection** → When price moves outside typical range

### Simple Example First

In [None]:
# Example: Weekly price data
week_prices = [8.50, 8.75, 8.45, 8.90, 8.60]

# Calculate range
price_min = min(week_prices)
price_max = max(week_prices)
price_range = price_max - price_min

print("=" * 60)
print("CALCULATING RANGE")
print("=" * 60)
print(f"\nWeek prices: {week_prices}")
print(f"\nLowest price:  RM {price_min:.2f}")
print(f"Highest price: RM {price_max:.2f}")
print(f"\nRange = Maximum - Minimum")
print(f"Range = RM {price_max:.2f} - RM {price_min:.2f}")
print(f"Range = RM {price_range:.2f}")
print(f"\nInterpretation: Stock moved RM {price_range:.2f} during the week")

### Applying to Real Stock Data

In [None]:
# Calculate range for Maybank in 2023
maybank_close = maybank['Close']

maybank_min = maybank_close.min()
maybank_max = maybank_close.max()
maybank_range = maybank_max - maybank_min

# Calculate as percentage of mean (for standardization)
maybank_mean = maybank_close.mean()
maybank_range_pct = (maybank_range / maybank_mean) * 100

print("Maybank (1155.KL) - 2023 Range Analysis:")
print(f"\nLowest closing price:  RM {maybank_min:.4f}")
print(f"Highest closing price: RM {maybank_max:.4f}")
print(f"Range: RM {maybank_range:.4f}")
print(f"\nMean price: RM {maybank_mean:.4f}")
print(f"Range as % of mean: {maybank_range_pct:.2f}%")
print(f"\nInterpretation: The gap between Maybank's highest and lowest close was {maybank_range_pct:.1f}% of its mean price in 2023")

### Daily Range (High - Low)

More useful than the annual range is the **daily range**: how much the stock moves within each trading day.

In [None]:
# Calculate daily range for each trading day
maybank_daily_range = maybank['High'] - maybank['Low']

# Statistics about daily range
avg_daily_range = maybank_daily_range.mean()
max_daily_range = maybank_daily_range.max()
min_daily_range = maybank_daily_range.min()

print("Maybank Daily Range Statistics (2023):")
print(f"\nAverage daily range: RM {avg_daily_range:.4f}")
print(f"Largest daily range:  RM {max_daily_range:.4f}")
print(f"Smallest daily range: RM {min_daily_range:.4f}")
print(f"\nInterpretation: On average, Maybank moves RM {avg_daily_range:.2f} within a day")

In [None]:
# Visualize daily range over time
plt.figure(figsize=(14, 6))

plt.plot(maybank_daily_range.index, maybank_daily_range.values, 
         linewidth=1, alpha=0.7, label='Daily Range (High - Low)')
plt.axhline(y=avg_daily_range, color='red', linestyle='--', 
            linewidth=2, label=f'Average Daily Range (RM {avg_daily_range:.2f})')

plt.title('Maybank - Daily Price Range (2023)', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Daily Range (RM)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Notice: Days with larger ranges indicate higher volatility on those specific days")

### Limitation of Range

**Problem**: Range only uses 2 data points (min and max), ignoring all other data!

Example: These two datasets have the same range but very different spreads:
- Dataset A: [1, 1, 1, 1, 10] → Range = 9, most values clustered at 1
- Dataset B: [1, 3, 5, 7, 10] → Range = 9, values evenly spread

**Solution**: We need a measure that uses ALL data points → **Variance and Standard Deviation**
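
To make the comparison concrete, here is a minimal sketch that computes both the range and the sample standard deviation for the two datasets above. The helper name `describe_spread` is made up for this illustration.

```python
import numpy as np

dataset_a = np.array([1, 1, 1, 1, 10])
dataset_b = np.array([1, 3, 5, 7, 10])

def describe_spread(values):
    """Return (range, sample standard deviation) for a 1-D array."""
    value_range = values.max() - values.min()
    sample_std = values.std(ddof=1)  # ddof=1 -> divide by n-1
    return value_range, sample_std

range_a, std_a = describe_spread(dataset_a)
range_b, std_b = describe_spread(dataset_b)

print(f"Dataset A: range = {range_a}, std dev = {std_a:.2f}")
print(f"Dataset B: range = {range_b}, std dev = {std_b:.2f}")
```

Both ranges are 9, yet the standard deviations differ (about 4.02 for A and 3.49 for B). Because every point contributes, the single far-away value in Dataset A pulls its standard deviation up, which the range alone cannot show.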

---

## Part 2: Variance - Average Squared Deviation

### What is Variance?

**Variance** measures how spread out the data is: it is the average of the squared distances between each data point and the mean.

### The Logic Behind Variance

1. Calculate how far each value is from the mean (deviation)
2. Square these deviations (to make them all positive)
3. Average these squared deviations

### Formula

**Population Variance** (when you have ALL data):
$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2
$$

**Sample Variance** (when you have a sample):
$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2
$$

Where:
- $\sigma^2$ = population variance
- $s^2$ = sample variance
- $x_i$ = individual values
- $\mu$ or $\bar{x}$ = mean
- $n$ = number of values

**Note**: We use $n-1$ for sample variance (Bessel's correction) to get an unbiased estimate of the population variance.
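
The choice between $n$ and $n-1$ matters in practice because NumPy and pandas use different defaults. The sketch below, using a small made-up price list, shows that `np.var` divides by $n$ unless told otherwise, while `Series.var` divides by $n-1$.

```python
import numpy as np
import pandas as pd

prices = np.array([8.50, 8.55, 8.45, 8.60, 8.52])
n = len(prices)

population_var = np.var(prices)        # NumPy default: ddof=0 (divide by n)
sample_var = np.var(prices, ddof=1)    # ddof=1 (divide by n-1)
pandas_var = pd.Series(prices).var()   # pandas default: ddof=1

print(f"NumPy, ddof=0 (population): {population_var:.6f}")
print(f"NumPy, ddof=1 (sample):     {sample_var:.6f}")
print(f"pandas default (sample):    {pandas_var:.6f}")
print(f"Ratio sample/population:    {sample_var / population_var:.4f} (= n/(n-1) = {n / (n - 1):.4f})")
```

The same defaults apply to `np.std` and `Series.std`, so passing `ddof=1` explicitly to the NumPy functions keeps results consistent with pandas.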

### Step-by-Step Calculation

In [None]:
# Example: Calculate variance manually
prices = np.array([8.50, 8.55, 8.45, 8.60, 8.52])

print("=" * 70)
print("CALCULATING VARIANCE - STEP BY STEP")
print("=" * 70)

# Step 1: Calculate mean
mean = np.mean(prices)
print(f"\nStep 1: Calculate mean")
print(f"  Prices: {prices}")
print(f"  Mean = {mean:.4f}")

# Step 2: Calculate deviations from mean
deviations = prices - mean
print(f"\nStep 2: Calculate deviations (x - mean)")
for i, (price, dev) in enumerate(zip(prices, deviations)):
    print(f"  Price {i+1}: {price:.2f} - {mean:.4f} = {dev:+.4f}")

# Step 3: Square the deviations
squared_deviations = deviations ** 2
print(f"\nStep 3: Square the deviations")
for i, (dev, sq_dev) in enumerate(zip(deviations, squared_deviations)):
    print(f"  ({dev:+.4f})² = {sq_dev:.6f}")

# Step 4: Calculate variance (using n-1 for sample variance)
variance_manual = np.sum(squared_deviations) / (len(prices) - 1)
variance_numpy = np.var(prices, ddof=1)  # ddof=1 means use n-1

print(f"\nStep 4: Average the squared deviations (using n-1)")
print(f"  Sum of squared deviations = {np.sum(squared_deviations):.6f}")
print(f"  Variance = {np.sum(squared_deviations):.6f} / {len(prices)-1}")
print(f"  Variance = {variance_manual:.6f}")

print(f"\n{'='*70}")
print(f"Manual calculation: {variance_manual:.6f}")
print(f"NumPy calculation:  {variance_numpy:.6f}")
print(f"Difference: {abs(variance_manual - variance_numpy):.10f}")
print("\n✓ They match!")

### Why Do We Square the Deviations?

**Question**: Why not just average the deviations directly?

**Answer**: Because positive and negative deviations would cancel out!
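
The cancellation is not a quirk of one dataset. It follows directly from the definition of the mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$:

$$
\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n}x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0
$$

So the average deviation is zero for any dataset, no matter how spread out it is. Squaring removes the signs before averaging, which is why variance can distinguish a tight dataset from a wide one.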

In [None]:
# Demonstrate why we need to square
prices = np.array([8.50, 8.55, 8.45, 8.60, 8.52])
mean = np.mean(prices)
deviations = prices - mean

print("Why We Square Deviations:")
print(f"\nDeviations from mean ({mean:.4f}):")
for i, dev in enumerate(deviations):
    print(f"  Price {i+1}: {dev:+.4f}")

print(f"\nSum of deviations: {np.sum(deviations):.10f}")
print("→ Deviations sum to exactly 0 in theory; any tiny non-zero value is floating-point rounding")
print("→ Can't use simple average!")

print(f"\nSquared deviations:")
squared_deviations = deviations ** 2
for i, sq_dev in enumerate(squared_deviations):
    print(f"  Price {i+1}: {sq_dev:.6f}")

print(f"\nSum of squared deviations: {np.sum(squared_deviations):.6f}")
print("→ Now we have meaningful positive values to average!")

### Variance of Real Stock Data

In [None]:
# Calculate variance for Maybank and Top Glove
maybank_close = maybank['Close']
topglove_close = topglove['Close']

maybank_variance = maybank_close.var()
topglove_variance = topglove_close.var()

print("Stock Price Variance Comparison:")
print(f"\nMaybank variance:  {maybank_variance:.6f}")
print(f"Top Glove variance: {topglove_variance:.6f}")
print(f"\nRatio: {topglove_variance / maybank_variance:.2f}x")

if topglove_variance > maybank_variance:
    print(f"\n→ Top Glove has higher variance (more spread in prices)")
else:
    print(f"\n→ Maybank has higher variance (more spread in prices)")

print("\nBUT... variance units are 'squared Ringgit' which is hard to interpret!")
print("That's why we use STANDARD DEVIATION instead.")

---

## Part 3: Standard Deviation - The Risk Measure

### What is Standard Deviation?

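**Standard deviation** is simply the square root of variance. The formula below is the population version; for the sample version, divide by $n-1$ instead of $n$, as in Part 2.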

### Formula

$$
\text{Standard Deviation} = \sigma = \sqrt{\text{Variance}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}
$$

### Why Standard Deviation is Better Than Variance

**Units**: Standard deviation is in the SAME units as the original data!
- Variance of price → Squared Ringgit (RM²) ← Hard to interpret!
- Std dev of price → Ringgit (RM) ← Easy to understand!

**Example**:
- Variance = 0.25 RM² (what does squared Ringgit mean?)
- Std Dev = 0.50 RM (price typically varies by 50 sen)

In [None]:
# Calculate standard deviation from variance
prices = np.array([8.50, 8.55, 8.45, 8.60, 8.52])

variance = np.var(prices, ddof=1)
std_dev = np.sqrt(variance)
std_dev_direct = np.std(prices, ddof=1)

print("Relationship Between Variance and Standard Deviation:")
print(f"\nVariance: {variance:.6f} RM²")
print(f"Standard Deviation: √{variance:.6f} = {std_dev:.4f} RM")
print(f"\nDirect calculation: {std_dev_direct:.4f} RM")
print(f"\nInterpretation: Prices typically vary by about RM {std_dev:.2f}")

### Standard Deviation of Stock Prices

In [None]:
# Calculate standard deviation for Malaysian stocks
maybank_std = maybank_close.std()
topglove_std = topglove_close.std()
axiata_std = axiata['Close'].std()

# Also get means for context
maybank_mean = maybank_close.mean()
topglove_mean = topglove_close.mean()
axiata_mean = axiata['Close'].mean()

print("=" * 70)
print("STANDARD DEVIATION COMPARISON - MALAYSIAN STOCKS (2023)")
print("=" * 70)

print(f"\nMaybank (1155.KL):")
print(f"  Mean price:    RM {maybank_mean:.4f}")
print(f"  Std deviation: RM {maybank_std:.4f}")
print(f"  Coefficient of variation: {(maybank_std/maybank_mean)*100:.2f}%")

print(f"\nTop Glove (5225.KL):")
print(f"  Mean price:    RM {topglove_mean:.4f}")
print(f"  Std deviation: RM {topglove_std:.4f}")
print(f"  Coefficient of variation: {(topglove_std/topglove_mean)*100:.2f}%")

print(f"\nAxiata (6888.KL):")
print(f"  Mean price:    RM {axiata_mean:.4f}")
print(f"  Std deviation: RM {axiata_std:.4f}")
print(f"  Coefficient of variation: {(axiata_std/axiata_mean)*100:.2f}%")

print(f"\n{'='*70}")
print("INTERPRETATION:")
print("• Coefficient of Variation (CV) = Std Dev / Mean × 100%")
print("• CV allows fair comparison between stocks with different prices")
print("• Higher CV = Higher risk (more volatile relative to price)")

### Visualizing Standard Deviation

In [None]:
# Visualize price with standard deviation bands
plt.figure(figsize=(14, 7))

# Plot Maybank price
plt.plot(maybank_close.index, maybank_close.values, 
         linewidth=1.5, label='Maybank Price', color='blue')

# Plot mean
plt.axhline(y=maybank_mean, color='red', linestyle='--', 
            linewidth=2, label=f'Mean (RM {maybank_mean:.2f})')

# Plot ±1 standard deviation bands
plt.axhline(y=maybank_mean + maybank_std, color='green', linestyle=':', 
            linewidth=1.5, alpha=0.7, label=f'+1 Std Dev (RM {maybank_mean + maybank_std:.2f})')
plt.axhline(y=maybank_mean - maybank_std, color='green', linestyle=':', 
            linewidth=1.5, alpha=0.7, label=f'-1 Std Dev (RM {maybank_mean - maybank_std:.2f})')

# Shade ±1 std dev area
plt.fill_between(maybank_close.index, 
                 maybank_mean - maybank_std, 
                 maybank_mean + maybank_std, 
                 alpha=0.2, color='green', label='±1 Std Dev Range')

plt.title('Maybank - Price with Standard Deviation Bands (2023)', 
          fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price (RM)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate percentage of days within ±1 std dev
within_1std = ((maybank_close >= maybank_mean - maybank_std) & 
               (maybank_close <= maybank_mean + maybank_std)).sum()
pct_within_1std = (within_1std / len(maybank_close)) * 100

print(f"Days within ±1 std dev: {within_1std} out of {len(maybank_close)} ({pct_within_1std:.1f}%)")
print(f"\nFor normally distributed data, ~68% would fall within ±1 std dev")
print(f"Actual: {pct_within_1std:.1f}% (price levels are not normal, so this can differ)")

### The 68-95-99.7 Rule (Empirical Rule)

For **normally distributed** data:
- **68%** of data falls within ±1 standard deviation
- **95%** of data falls within ±2 standard deviations  
- **99.7%** of data falls within ±3 standard deviations

**This is why Bollinger Bands typically use ±2 standard deviations!**
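
Before testing the rule on real prices, it can help to see it hold on data that is normal by construction. This is a minimal sketch on simulated values, not on stock data; the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)  # simulated normal data

mean = samples.mean()
std = samples.std(ddof=1)

def share_within(k):
    """Fraction of samples within k standard deviations of the mean."""
    return np.mean(np.abs(samples - mean) <= k * std)

for k, rule_pct in [(1, 68.0), (2, 95.0), (3, 99.7)]:
    print(f"Within ±{k} std dev: {share_within(k) * 100:.1f}% (rule says ~{rule_pct}%)")
```

The simulated percentages land very close to 68, 95, and 99.7. Any gap you see when running the same check on stock prices reflects how far that data is from a normal distribution.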

In [None]:
# Test the 68-95-99.7 rule
within_1std = ((maybank_close >= maybank_mean - 1*maybank_std) & 
               (maybank_close <= maybank_mean + 1*maybank_std)).sum()
within_2std = ((maybank_close >= maybank_mean - 2*maybank_std) & 
               (maybank_close <= maybank_mean + 2*maybank_std)).sum()
within_3std = ((maybank_close >= maybank_mean - 3*maybank_std) & 
               (maybank_close <= maybank_mean + 3*maybank_std)).sum()

total_days = len(maybank_close)

print("Testing the 68-95-99.7 Rule (Empirical Rule):")
print(f"\nTotal trading days: {total_days}")
print("\n" + "=" * 60)
print(f"Within ±1 std dev: {within_1std} days ({within_1std/total_days*100:.1f}%)")
print(f"  Expected: ~68% | Actual: {within_1std/total_days*100:.1f}%")

print(f"\nWithin ±2 std dev: {within_2std} days ({within_2std/total_days*100:.1f}%)")
print(f"  Expected: ~95% | Actual: {within_2std/total_days*100:.1f}%")

print(f"\nWithin ±3 std dev: {within_3std} days ({within_3std/total_days*100:.1f}%)")
print(f"  Expected: ~99.7% | Actual: {within_3std/total_days*100:.1f}%")

print("\n" + "=" * 60)
print("Note: Price levels trend over time and are not normally distributed, so expect deviations from these percentages.")

---

## Part 4: Volatility - Standard Deviation of Returns

### Price Std Dev vs Return Std Dev

In trading, we usually measure volatility as **standard deviation of RETURNS**, not prices.

**Why?**
- Price std dev isn't comparable across stocks (RM 1 movement means different things for RM 5 vs RM 50 stock)
- Return std dev is standardized (5% volatility means the same for all stocks)
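
A small simulation illustrates the point. The two series below are hypothetical: they follow exactly the same daily returns, but one starts at RM 5 and the other at RM 50.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
daily_returns = rng.normal(loc=0.0, scale=0.01, size=250)  # roughly 1% daily moves

growth = np.cumprod(1 + daily_returns)
cheap_stock = pd.Series(5.0 * growth)     # hypothetical stock starting near RM 5
pricey_stock = pd.Series(50.0 * growth)   # same returns, starting near RM 50

price_std_cheap = cheap_stock.std()
price_std_pricey = pricey_stock.std()
ret_std_cheap = cheap_stock.pct_change().std() * 100
ret_std_pricey = pricey_stock.pct_change().std() * 100

print(f"Price std dev:  RM {price_std_cheap:.4f} vs RM {price_std_pricey:.4f}")
print(f"Return std dev: {ret_std_cheap:.4f}% vs {ret_std_pricey:.4f}%")
```

The price standard deviations differ by a factor of 10 purely because of the price level, while the return standard deviations are identical. That is the sense in which return volatility is comparable across stocks.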

### Calculating Return Volatility

In [None]:
# Calculate daily returns
maybank_returns = maybank_close.pct_change() * 100  # Convert to percentage
topglove_returns = topglove_close.pct_change() * 100
axiata_returns = axiata['Close'].pct_change() * 100

# Calculate volatility (std dev of returns)
maybank_volatility = maybank_returns.std()
topglove_volatility = topglove_returns.std()
axiata_volatility = axiata_returns.std()

print("=" * 70)
print("VOLATILITY COMPARISON - MALAYSIAN STOCKS (2023)")
print("=" * 70)
print("\nVolatility = Standard Deviation of Daily Returns")
print("\n" + "-" * 70)

print(f"\nMaybank (1155.KL):   {maybank_volatility:.4f}% per day")
print(f"Top Glove (5225.KL): {topglove_volatility:.4f}% per day")
print(f"Axiata (6888.KL):    {axiata_volatility:.4f}% per day")

print("\n" + "=" * 70)
print("INTERPRETATION:")
print("=" * 70)

# Find most and least volatile
stocks = {'Maybank': maybank_volatility, 'Top Glove': topglove_volatility, 'Axiata': axiata_volatility}
most_volatile = max(stocks, key=stocks.get)
least_volatile = min(stocks, key=stocks.get)

print(f"\nMost volatile:  {most_volatile} ({stocks[most_volatile]:.2f}% daily volatility)")
print(f"Least volatile: {least_volatile} ({stocks[least_volatile]:.2f}% daily volatility)")
print(f"\n→ Higher volatility = Higher risk = Larger daily price swings")
print(f"→ Lower volatility = Lower risk = More stable prices")

### Visualizing Volatility Comparison

In [None]:
# Create comparison visualization
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Maybank
axes[0].hist(maybank_returns.dropna(), bins=40, edgecolor='black', alpha=0.7, color='blue')
axes[0].axvline(x=0, color='red', linestyle='--', linewidth=2)
axes[0].axvline(x=maybank_volatility, color='green', linestyle=':', linewidth=2, label=f'Std Dev: {maybank_volatility:.2f}%')
axes[0].axvline(x=-maybank_volatility, color='green', linestyle=':', linewidth=2)
axes[0].set_title(f'Maybank\nVolatility: {maybank_volatility:.2f}%', fontweight='bold')
axes[0].set_xlabel('Daily Return (%)')
axes[0].set_ylabel('Frequency')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Top Glove
axes[1].hist(topglove_returns.dropna(), bins=40, edgecolor='black', alpha=0.7, color='orange')
axes[1].axvline(x=0, color='red', linestyle='--', linewidth=2)
axes[1].axvline(x=topglove_volatility, color='green', linestyle=':', linewidth=2, label=f'Std Dev: {topglove_volatility:.2f}%')
axes[1].axvline(x=-topglove_volatility, color='green', linestyle=':', linewidth=2)
axes[1].set_title(f'Top Glove\nVolatility: {topglove_volatility:.2f}%', fontweight='bold')
axes[1].set_xlabel('Daily Return (%)')
axes[1].set_ylabel('Frequency')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Axiata
axes[2].hist(axiata_returns.dropna(), bins=40, edgecolor='black', alpha=0.7, color='purple')
axes[2].axvline(x=0, color='red', linestyle='--', linewidth=2)
axes[2].axvline(x=axiata_volatility, color='green', linestyle=':', linewidth=2, label=f'Std Dev: {axiata_volatility:.2f}%')
axes[2].axvline(x=-axiata_volatility, color='green', linestyle=':', linewidth=2)
axes[2].set_title(f'Axiata\nVolatility: {axiata_volatility:.2f}%', fontweight='bold')
axes[2].set_xlabel('Daily Return (%)')
axes[2].set_ylabel('Frequency')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Wider distribution = Higher volatility = More risk")
print("Narrower distribution = Lower volatility = Less risk")

---

## Part 5: Introduction to Bollinger Bands

### What are Bollinger Bands?

Bollinger Bands use standard deviation to create dynamic support and resistance levels.

### Formula

$$
\begin{aligned}
\text{Middle Band} &= SMA(n) \\
\text{Upper Band} &= SMA(n) + k \times \sigma \\
\text{Lower Band} &= SMA(n) - k \times \sigma
\end{aligned}
$$

Where:
- $SMA(n)$ = Simple Moving Average over $n$ periods
- $\sigma$ = Standard deviation over same $n$ periods
- $k$ = Number of standard deviations (typically 2)

### Standard Settings

- Period: 20 days
- Multiplier: 2 standard deviations
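- Motivated by the 68-95-99.7 rule: about 95% of normally distributed data lies within ±2 standard deviations (prices are not exactly normal, so actual coverage differs)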
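
The 95% figure is a benchmark for normal data, not a guarantee for rolling bands on prices. As a rough check, the sketch below builds a hypothetical random-walk price series (an assumption for illustration, not real market data) and measures how often the close stays inside 20-day, 2 standard deviation bands.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)
# Hypothetical price series: a random walk built from small normal daily returns
returns = rng.normal(loc=0.0, scale=0.01, size=5000)
prices = pd.Series(10.0 * np.cumprod(1 + returns))

period, k = 20, 2
middle = prices.rolling(window=period).mean()
band_std = prices.rolling(window=period).std()
upper = middle + k * band_std
lower = middle - k * band_std

valid = middle.notna()                        # skip the first period-1 rows (NaN)
inside = (prices >= lower) & (prices <= upper)
coverage = inside[valid].mean()

print(f"Share of closes inside the bands: {coverage * 100:.1f}%")
```

On a trending or random-walk series the latest close tends to sit near the edge of its own 20-day window, so the share inside the bands typically comes out noticeably below 95%.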

In [None]:
# Calculate Bollinger Bands for Maybank
period = 20
std_multiplier = 2

# Middle band = 20-day SMA
bb_middle = maybank_close.rolling(window=period).mean()

# Calculate rolling standard deviation (pandas uses the sample formula, ddof=1, by default)
bb_std = maybank_close.rolling(window=period).std()

# Upper band = Middle + 2 * std dev
bb_upper = bb_middle + (std_multiplier * bb_std)

# Lower band = Middle - 2 * std dev
bb_lower = bb_middle - (std_multiplier * bb_std)

print("Bollinger Bands Calculation (20-day, 2 std dev):")
print(f"\nPeriod: {period} days")
print(f"Standard deviation multiplier: {std_multiplier}")
print(f"\nMotivated by the 68-95-99.7 rule:")
print(f"  For normal data, ~95% of values fall within ±2 std dev")
print(f"  When price touches upper/lower band → Potential reversal signal")

In [None]:
# Visualize Bollinger Bands
plt.figure(figsize=(14, 7))

# Plot price
plt.plot(maybank_close.index, maybank_close.values, 
         linewidth=1.5, label='Maybank Price', color='black', zorder=3)

# Plot Bollinger Bands
plt.plot(bb_middle.index, bb_middle.values, 
         linewidth=2, label='Middle Band (20-day SMA)', color='blue', linestyle='--')
plt.plot(bb_upper.index, bb_upper.values, 
         linewidth=1.5, label='Upper Band (+2 Std Dev)', color='red', linestyle=':')
plt.plot(bb_lower.index, bb_lower.values, 
         linewidth=1.5, label='Lower Band (-2 Std Dev)', color='green', linestyle=':')

# Shade the band area
plt.fill_between(bb_middle.index, bb_lower, bb_upper, alpha=0.1, color='gray')

plt.title('Maybank - Bollinger Bands (20-day, 2 Std Dev) - 2023', 
          fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price (RM)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate how often price is outside bands
above_upper = (maybank_close > bb_upper).sum()
below_lower = (maybank_close < bb_lower).sum()
within_bands = ((maybank_close >= bb_lower) & (maybank_close <= bb_upper)).sum()

# Account for NaN values at the start
valid_days = len(maybank_close) - period + 1

print(f"\nDays above upper band: {above_upper} ({above_upper/valid_days*100:.1f}%)")
print(f"Days below lower band: {below_lower} ({below_lower/valid_days*100:.1f}%)")
print(f"Days within bands: {within_bands} ({within_bands/valid_days*100:.1f}%)")
print(f"\nNormal-distribution benchmark: ~95%")
print(f"Actual within bands: {within_bands/valid_days*100:.1f}%")

### How Traders Use Bollinger Bands

1. **Overbought/Oversold**:
   - Price near upper band → Potentially overbought
   - Price near lower band → Potentially oversold

2. **Volatility**:
   - Bands widen → Increased volatility
   - Bands narrow → Decreased volatility ("squeeze")

3. **Trend Strength**:
   - Price consistently near upper band → Strong uptrend
   - Price consistently near lower band → Strong downtrend
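
The widening and narrowing of the bands can be put into a single number. One common convention, assumed here, is band width as a percentage of the middle band: (upper - lower) / middle. The sketch below uses a hypothetical series with a calm stretch followed by a turbulent one; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)
n_calm = 120
# Hypothetical returns: a calm stretch followed by a turbulent one
calm = rng.normal(0.0, 0.003, size=n_calm)
turbulent = rng.normal(0.0, 0.02, size=120)
prices = pd.Series(10.0 * np.cumprod(1 + np.concatenate([calm, turbulent])))

period, k = 20, 2
middle = prices.rolling(period).mean()
band_std = prices.rolling(period).std()

# upper - lower = 2 * k * std, expressed as a percentage of the middle band
bandwidth_pct = (2 * k * band_std) / middle * 100

# Average width over windows that lie entirely in each regime
calm_width = bandwidth_pct.iloc[period - 1:n_calm].mean()
turbulent_width = bandwidth_pct.iloc[n_calm + period - 1:].mean()

print(f"Average band width, calm period:      {calm_width:.2f}%")
print(f"Average band width, turbulent period: {turbulent_width:.2f}%")
```

A low band width relative to its own history corresponds to the "squeeze" described above, and a rising band width corresponds to the bands widening.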

**We'll explore Bollinger Bands in depth in Module 06!**

---

## Part 6: Exercises

Time to practice! Try these exercises to solidify your understanding.

### Exercise 1: Calculate All Spread Measures

Calculate range, variance, and standard deviation for Axiata (6888.KL).

**Tasks**:
1. Extract Axiata closing prices
2. Calculate range (max - min)
3. Calculate variance
4. Calculate standard deviation
5. Compare with Maybank's values

In [None]:
# Your code here




<details>
<summary><b>Click here for solution</b></summary>

```python
# Extract Axiata closing prices
axiata_close = axiata['Close']

# Calculate spread measures
axiata_range = axiata_close.max() - axiata_close.min()
axiata_variance = axiata_close.var()
axiata_std = axiata_close.std()
axiata_mean = axiata_close.mean()

print("Axiata (6888.KL) - Spread Measures:")
print(f"\nMean:     RM {axiata_mean:.4f}")
print(f"Range:    RM {axiata_range:.4f}")
print(f"Variance: {axiata_variance:.6f} RM²")
print(f"Std Dev:  RM {axiata_std:.4f}")

# Compare with Maybank
print(f"\n{'='*60}")
print("Comparison with Maybank:")
print(f"\nAxiata Std Dev:  RM {axiata_std:.4f}")
print(f"Maybank Std Dev: RM {maybank_close.std():.4f}")

if axiata_std > maybank_close.std():
    print(f"\n→ Axiata is more volatile (higher std dev)")
else:
    print(f"\n→ Maybank is more volatile (higher std dev)")
```
</details>

### Exercise 2: Volatility Ranking

Rank all three stocks (Maybank, Top Glove, Axiata) by volatility from lowest to highest.

**Tasks**:
1. Calculate daily return volatility for all three stocks
2. Create a ranking (1 = lowest volatility, 3 = highest)
3. Visualize the comparison with a bar chart
4. Interpret what this means for risk

In [None]:
# Your code here




<details>
<summary><b>Click here for solution</b></summary>

```python
# Calculate daily returns and volatility
maybank_returns = maybank['Close'].pct_change() * 100
topglove_returns = topglove['Close'].pct_change() * 100
axiata_returns = axiata['Close'].pct_change() * 100

maybank_vol = maybank_returns.std()
topglove_vol = topglove_returns.std()
axiata_vol = axiata_returns.std()

# Create ranking
volatilities = {
    'Maybank': maybank_vol,
    'Top Glove': topglove_vol,
    'Axiata': axiata_vol
}

# Sort by volatility
sorted_vol = sorted(volatilities.items(), key=lambda x: x[1])

print("Volatility Ranking (Lowest to Highest):")
for rank, (stock, vol) in enumerate(sorted_vol, 1):
    print(f"{rank}. {stock}: {vol:.4f}% daily volatility")

# Visualize
plt.figure(figsize=(10, 6))
# Use the sorted order so bar colors match the ranking (green = lowest, red = highest)
stocks = [name for name, _ in sorted_vol]
vols = [vol for _, vol in sorted_vol]
colors = ['green', 'orange', 'red']

bars = plt.bar(stocks, vols, color=colors, edgecolor='black', alpha=0.7)
plt.title('Daily Return Volatility Comparison (2023)', fontsize=14, fontweight='bold')
plt.ylabel('Volatility (% per day)')
plt.xlabel('Stock')
plt.grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for bar, vol in zip(bars, vols):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{vol:.2f}%', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

print(f"\nRisk Interpretation:")
print(f"Lowest risk (most stable): {sorted_vol[0][0]}")
print(f"Highest risk (most volatile): {sorted_vol[-1][0]}")
```
</details>

### Exercise 3: Manual Bollinger Bands Calculation

Calculate Bollinger Bands manually for a specific date to verify understanding.

**Tasks**:
1. Take the last 20 days of Maybank closing prices
2. Manually calculate the 20-day SMA (middle band)
3. Manually calculate the 20-day standard deviation
4. Calculate upper band (SMA + 2*std)
5. Calculate lower band (SMA - 2*std)
6. Verify against pandas rolling calculations

In [None]:
# Your code here




<details>
<summary><b>Click here for solution</b></summary>

```python
# Get last 20 days of Maybank data
last_20_days = maybank_close.tail(20).values
last_date = maybank_close.tail(20).index[-1]

# Manual calculation
manual_sma = np.mean(last_20_days)
manual_std = np.std(last_20_days, ddof=1)  # Use sample std dev
manual_upper = manual_sma + 2 * manual_std
manual_lower = manual_sma - 2 * manual_std

# Pandas calculation
pandas_sma = maybank_close.rolling(window=20).mean().iloc[-1]
pandas_std = maybank_close.rolling(window=20).std().iloc[-1]
pandas_upper = pandas_sma + 2 * pandas_std
pandas_lower = pandas_sma - 2 * pandas_std

# Current price
current_price = maybank_close.iloc[-1]

print(f"Bollinger Bands for {last_date.strftime('%Y-%m-%d')}:")
print(f"\nCurrent price: RM {current_price:.4f}")
print(f"\nManual Calculation:")
print(f"  Middle Band (SMA): RM {manual_sma:.4f}")
print(f"  Std Deviation:     RM {manual_std:.4f}")
print(f"  Upper Band:        RM {manual_upper:.4f}")
print(f"  Lower Band:        RM {manual_lower:.4f}")

print(f"\nPandas Calculation:")
print(f"  Middle Band (SMA): RM {pandas_sma:.4f}")
print(f"  Std Deviation:     RM {pandas_std:.4f}")
print(f"  Upper Band:        RM {pandas_upper:.4f}")
print(f"  Lower Band:        RM {pandas_lower:.4f}")

print(f"\nDifferences:")
print(f"  SMA:   {abs(manual_sma - pandas_sma):.8f}")
print(f"  Upper: {abs(manual_upper - pandas_upper):.8f}")
print(f"  Lower: {abs(manual_lower - pandas_lower):.8f}")
print(f"\n✓ Calculations match!")

# Interpretation
if current_price > pandas_upper:
    print(f"\nSignal: Price above upper band (potentially overbought)")
elif current_price < pandas_lower:
    print(f"\nSignal: Price below lower band (potentially oversold)")
else:
    print(f"\nSignal: Price within normal range")
```
</details>

### Exercise 4: Volatility Over Time

Analyze how volatility changes over time using rolling standard deviation.

**Tasks**:
1. Calculate 20-day rolling volatility (std dev of returns) for Maybank
2. Plot volatility over time
3. Identify periods of high and low volatility
4. Explain what might cause volatility to change

In [None]:
# Your code here




<details>
<summary><b>Click here for solution</b></summary>

```python
# Calculate daily returns
maybank_returns = maybank_close.pct_change() * 100

# Calculate 20-day rolling volatility
rolling_volatility = maybank_returns.rolling(window=20).std()

# Plot volatility over time
plt.figure(figsize=(14, 7))

plt.subplot(2, 1, 1)
plt.plot(maybank_close.index, maybank_close.values, linewidth=1.5, color='blue')
plt.title('Maybank Price (2023)', fontsize=12, fontweight='bold')
plt.ylabel('Price (RM)')
plt.grid(True, alpha=0.3)

plt.subplot(2, 1, 2)
plt.plot(rolling_volatility.index, rolling_volatility.values, 
         linewidth=1.5, color='red', label='20-day Rolling Volatility')
plt.axhline(y=rolling_volatility.mean(), color='green', linestyle='--', 
            linewidth=2, label=f'Average Volatility ({rolling_volatility.mean():.2f}%)')
plt.title('20-Day Rolling Volatility (2023)', fontsize=12, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Volatility (% per day)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Find high and low volatility periods
max_vol_date = rolling_volatility.idxmax()
min_vol_date = rolling_volatility.idxmin()

print("Volatility Analysis:")
print(f"\nHighest volatility: {rolling_volatility.max():.4f}% on {max_vol_date.strftime('%Y-%m-%d')}")
print(f"Lowest volatility:  {rolling_volatility.min():.4f}% on {min_vol_date.strftime('%Y-%m-%d')}")
print(f"Average volatility: {rolling_volatility.mean():.4f}%")

print(f"\nCauses of volatility changes:")
print("• High volatility periods: News events, earnings reports, market uncertainty")
print("• Low volatility periods: Stable market conditions, low trading activity")
print("• Volatility clustering: High volatility tends to be followed by high volatility")
```
</details>

---

## Summary

Excellent work! You've completed Module 02. Let's review:

### Key Concepts Mastered

1. **Range**
   - Formula: Maximum - Minimum
   - Simple but limited (only uses 2 data points)
   - Daily range = High - Low

2. **Variance**
   - Average of squared deviations from mean
   - Formula: $s^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2$
   - Units are squared (RM²) - hard to interpret

3. **Standard Deviation**
   - Square root of variance
   - Same units as original data (RM)
   - **The primary measure of risk in trading**

4. **Volatility**
   - Standard deviation of returns (not prices)
   - Standardized measure across all stocks
   - Higher volatility = Higher risk

5. **68-95-99.7 Rule (Empirical Rule)**
   - 68% of data within ±1 std dev
   - 95% of data within ±2 std dev
   - 99.7% of data within ±3 std dev

6. **Bollinger Bands Foundation**
   - Middle: 20-day SMA
   - Upper: SMA + 2×StdDev
   - Lower: SMA - 2×StdDev
   - Motivated by the ~95% coverage of ±2 std dev for normal data

### How This Connects to Technical Indicators

Everything you learned applies directly to:
- **Bollinger Bands** (Module 06): Uses SMA ± 2×StdDev
- **ATR (Average True Range)** (Module 06): Uses range for volatility
- **Position Sizing**: Adjust position size based on volatility
- **Stop Loss Placement**: Use ATR or std dev for stop distances
- **Risk Management**: Volatility determines how much to risk

### What's Next?

In **Module 03: Percentages, Ratios, and Changes**, you'll learn:
- Percentage calculations in trading context
- Rate of Change (ROC) mathematics
- Normalizing values to 0-100 scale
- Foundation for RSI indicator

### Additional Practice

Before Module 03, try:
1. Calculate volatility for other Malaysian stocks
2. Compare volatility across different sectors (banking, healthcare, telecom)
3. Identify volatility clustering in your data
4. Experiment with different Bollinger Band settings (10-day, 50-day, etc.)

---

## Additional Resources

### Further Reading
- [Investopedia: Standard Deviation](https://www.investopedia.com/terms/s/standarddeviation.asp)
- [Investopedia: Volatility](https://www.investopedia.com/terms/v/volatility.asp)
- [Investopedia: Bollinger Bands](https://www.investopedia.com/terms/b/bollingerbands.asp)
- [Khan Academy: Variance and Standard Deviation](https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/variance-standard-deviation-population/v/statistics-standard-deviation)

### Python Documentation
- [NumPy var](https://numpy.org/doc/stable/reference/generated/numpy.var.html)
- [NumPy std](https://numpy.org/doc/stable/reference/generated/numpy.std.html)
- [Pandas rolling](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html)

---

**Congratulations!** You now understand how to measure and interpret volatility - a critical skill for risk management. Ready for Module 03?