# Module 01: Averages and Central Tendency

**Difficulty**: ⭐ (Beginner)

**Estimated Time**: 45 minutes

**Prerequisites**: 
- Module 00: Introduction and Stock Returns
- Basic arithmetic (addition, division)
- Understanding of what an average is

## Learning Objectives

By the end of this notebook, you will be able to:
1. Calculate **mean, median, and mode** for stock prices and explain when to use each
2. Understand **why averaging reduces noise** in stock price data
3. Calculate **weighted averages** and explain why they matter in technical analysis
4. Explain the **mathematical foundation** of Simple Moving Averages (SMA)
5. Apply these concepts to real Malaysian stock data

## Why This Matters

**Averages are the backbone of technical indicators.**

Almost every indicator uses some form of averaging:
- **Moving Averages (SMA, EMA)** → Direct application of mean and weighted mean
- **RSI** → Compares the average gain with the average loss over a lookback period
- **MACD** → Difference between two exponential moving averages
- **Bollinger Bands** → Bands a set number of standard deviations above and below a moving average

Understanding the mathematics of averages helps you:
- Know **when** moving averages give reliable signals
- Understand **why** EMAs react faster than SMAs
- Choose the **right averaging period** for your trading style
- Avoid **false signals** from improper averaging

---

## Setup

Let's import our libraries and download Malaysian stock data.

In [None]:
# Data manipulation and numerical operations
import pandas as pd
import numpy as np

# Data acquisition
import yfinance as yf

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Display settings
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Pandas display options
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 20)
pd.set_option('display.precision', 4)

# Random seed for reproducibility
np.random.seed(42)

print("✓ Libraries imported successfully!")

In [None]:
# Download Malaysian stock data
print("Downloading Malaysian stock data...\n")

def download_prices(ticker):
    """Download 2023 daily data and flatten the column headers if needed."""
    df = yf.download(ticker, start='2023-01-01', end='2024-01-01', progress=False)
    # Recent yfinance versions may return MultiIndex columns (field, ticker)
    # even for a single ticker; keep only the field level so df['Close'] is a Series.
    if isinstance(df.columns, pd.MultiIndex):
        df.columns = df.columns.get_level_values(0)
    return df

# Maybank - stable, large-cap banking stock
maybank = download_prices('1155.KL')

# Top Glove - volatile healthcare stock
topglove = download_prices('5225.KL')

# CIMB - another banking stock for comparison
cimb = download_prices('1023.KL')

# Validate data
assert len(maybank) > 0, "Failed to download Maybank data"
assert len(topglove) > 0, "Failed to download Top Glove data"
assert len(cimb) > 0, "Failed to download CIMB data"

print(f"✓ Maybank: {len(maybank)} days")
print(f"✓ Top Glove: {len(topglove)} days")
print(f"✓ CIMB: {len(cimb)} days")
print("\nData ready for analysis!")

---

## Part 1: The Mean (Arithmetic Average)

### What is the Mean?

The **mean** (or arithmetic average) is the sum of all values divided by the number of values.

### Mathematical Formula

$$
\text{Mean} = \bar{x} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

Where:
- $\bar{x}$ = mean (pronounced "x-bar")
- $x_i$ = individual values
- $n$ = total number of values
- $\sum$ = sum (add all values)

### Simple Example First

In [None]:
# Example: Average closing price for a week
week_prices = [8.50, 8.55, 8.45, 8.60, 8.52]  # 5 days of prices

# Method 1: Manual calculation
total = sum(week_prices)
count = len(week_prices)
mean_manual = total / count

# Method 2: Using numpy
mean_numpy = np.mean(week_prices)

print("=" * 60)
print("CALCULATING MEAN (AVERAGE) PRICE")
print("=" * 60)
print(f"\nWeek prices: {week_prices}")
print(f"\nStep 1: Sum all prices")
print(f"  {' + '.join(map(str, week_prices))} = RM {total:.2f}")
print(f"\nStep 2: Divide by number of days")
print(f"  RM {total:.2f} ÷ {count} = RM {mean_manual:.4f}")
print(f"\nManual calculation: RM {mean_manual:.4f}")
print(f"NumPy calculation:  RM {mean_numpy:.4f}")
assert np.isclose(mean_manual, mean_numpy)  # floating-point-safe equality check
print("\nThey match! ✓")

### Applying to Real Stock Data

Now let's calculate the mean price for Maybank in 2023:

In [None]:
# Extract closing prices
maybank_close = maybank['Close']

# Calculate mean
maybank_mean = maybank_close.mean()

# Also calculate other useful statistics
maybank_min = maybank_close.min()
maybank_max = maybank_close.max()
maybank_first = maybank_close.iloc[0]
maybank_last = maybank_close.iloc[-1]

print("=" * 60)
print("MAYBANK (1155.KL) - 2023 PRICE STATISTICS")
print("=" * 60)
print(f"\nNumber of trading days: {len(maybank_close)}")
print(f"\nFirst trading day (Jan 2023): RM {maybank_first:.4f}")
print(f"Last trading day (Dec 2023):  RM {maybank_last:.4f}")
print(f"\nLowest price:  RM {maybank_min:.4f}")
print(f"Highest price: RM {maybank_max:.4f}")
print(f"\nMean (Average) price: RM {maybank_mean:.4f}")
print(f"\nInterpretation:")
print(f"  The typical price for Maybank in 2023 was around RM {maybank_mean:.2f}")

### Visualizing the Mean

In [None]:
# Plot price over time with mean line
plt.figure(figsize=(14, 6))

# Plot actual prices
plt.plot(maybank_close.index, maybank_close.values, 
         linewidth=1.5, alpha=0.7, label='Daily Close Price')

# Plot mean as horizontal line
plt.axhline(y=maybank_mean, color='red', linestyle='--', 
            linewidth=2, label=f'Mean Price (RM {maybank_mean:.2f})')

plt.title('Maybank (1155.KL) - Daily Close Price vs Mean (2023)', 
          fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price (RM)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate how often price is above vs below mean
above_mean = (maybank_close > maybank_mean).sum()
below_mean = (maybank_close < maybank_mean).sum()
at_mean = (maybank_close == maybank_mean).sum()

print(f"Days above mean: {above_mean} ({above_mean/len(maybank_close)*100:.1f}%)")
print(f"Days below mean: {below_mean} ({below_mean/len(maybank_close)*100:.1f}%)")
print(f"Days at mean: {at_mean}")

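### Key Insight: Mean as Center of Gravity

The mean acts like a **center of gravity** (balance point) for the data:
- Deviations above the mean exactly cancel deviations below it: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
- Prices fluctuate around the mean
- Days split roughly 50/50 above and below the mean only when the distribution is roughly symmetric; it is the median (Part 2) that splits the data in half
- When price moves far from the mean, some traders expect it to drift back (mean reversion); this is a tendency to test, not a guarantee

**Mean reversion is the basis for many trading strategies!**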
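A quick numerical check of the balance-point idea on a hypothetical week with one unusually high day (made-up prices): the deviations from the mean sum to zero, yet only one of the five days sits above the mean, while the median splits the other days evenly.

```python
import numpy as np

# Hypothetical week: four ordinary days and one spike
prices = np.array([8.48, 8.50, 8.52, 8.55, 9.20])

mean_price = prices.mean()
median_price = np.median(prices)

# Property 1: deviations from the mean always sum to (essentially) zero
deviation_sum = (prices - mean_price).sum()

# Property 2: the median, not the mean, splits the days evenly
above_mean = int((prices > mean_price).sum())
above_median = int((prices > median_price).sum())
below_median = int((prices < median_price).sum())

print(f"Mean: RM {mean_price:.4f}, sum of deviations: {deviation_sum:.2e}")
print(f"Days above mean: {above_mean} of {len(prices)}")
print(f"Days above / below median: {above_median} / {below_median}")
```

The spike drags the mean up to a level that four of the five days never reach, which is why the 50/50 intuition belongs to the median.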

---

## Part 2: The Median (Middle Value)

### What is the Median?

The **median** is the middle value when data is sorted from smallest to largest.

### How to Calculate

1. **Sort all values** from smallest to largest
2. **Find the middle**:
   - If odd number of values: take the middle one
   - If even number of values: average the two middle ones

### Formula

For sorted data $x_1, x_2, ..., x_n$:

$$
\text{Median} = \begin{cases}
x_{(n+1)/2} & \text{if } n \text{ is odd} \\
\frac{x_{n/2} + x_{(n/2)+1}}{2} & \text{if } n \text{ is even}
\end{cases}
$$

In [None]:
# Example 1: Odd number of values (5 days)
prices_odd = [8.50, 8.55, 8.45, 8.60, 8.52]
sorted_odd = sorted(prices_odd)
median_odd = np.median(prices_odd)

print("=" * 60)
print("EXAMPLE 1: ODD NUMBER OF VALUES (5 days)")
print("=" * 60)
print(f"Original: {prices_odd}")
print(f"Sorted:   {sorted_odd}")
print(f"\nMiddle position: {len(sorted_odd)//2 + 1}")
print(f"Median (middle value): RM {sorted_odd[len(sorted_odd)//2]:.2f}")
print(f"NumPy median: RM {median_odd:.2f}")

# Example 2: Even number of values (6 days)
prices_even = [8.50, 8.55, 8.45, 8.60, 8.52, 8.48]
sorted_even = sorted(prices_even)
middle1 = sorted_even[len(sorted_even)//2 - 1]
middle2 = sorted_even[len(sorted_even)//2]
median_even = np.median(prices_even)

print("\n" + "=" * 60)
print("EXAMPLE 2: EVEN NUMBER OF VALUES (6 days)")
print("=" * 60)
print(f"Original: {prices_even}")
print(f"Sorted:   {sorted_even}")
print(f"\nTwo middle values: RM {middle1:.2f} and RM {middle2:.2f}")
print(f"Median (average of middles): RM {(middle1 + middle2)/2:.2f}")
print(f"NumPy median: RM {median_even:.2f}")

### Mean vs Median: Which is Better?

Let's see when median is more useful than mean:

In [None]:
# Scenario: Normal week vs week with extreme price spike
normal_week = [8.50, 8.55, 8.52, 8.48, 8.53]
spike_week = [8.50, 8.55, 8.52, 8.48, 12.00]  # Huge spike on last day!

print("=" * 60)
print("COMPARING MEAN VS MEDIAN WITH OUTLIERS")
print("=" * 60)

print("\nNormal Week (no outliers):")
print(f"  Prices: {normal_week}")
print(f"  Mean:   RM {np.mean(normal_week):.2f}")
print(f"  Median: RM {np.median(normal_week):.2f}")
print(f"  Difference: RM {abs(np.mean(normal_week) - np.median(normal_week)):.2f}")

print("\nWeek with Spike (outlier present):")
print(f"  Prices: {spike_week}")
print(f"  Mean:   RM {np.mean(spike_week):.2f} ← Pulled up by spike!")
print(f"  Median: RM {np.median(spike_week):.2f} ← Not affected by spike")
print(f"  Difference: RM {abs(np.mean(spike_week) - np.median(spike_week)):.2f}")

print("\n" + "=" * 60)
print("KEY INSIGHT:")
print("=" * 60)
print("• Mean is sensitive to outliers (extreme values)")
print("• Median is resistant to outliers")
print("• Use median when data has extreme spikes or gaps")

In [None]:
# Calculate mean and median for Maybank 2023
maybank_mean = maybank_close.mean()
maybank_median = maybank_close.median()

print("Maybank (1155.KL) - 2023:")
print(f"Mean:   RM {maybank_mean:.4f}")
print(f"Median: RM {maybank_median:.4f}")
print(f"Difference: RM {abs(maybank_mean - maybank_median):.4f}")
print("\nA small gap between mean and median suggests a roughly symmetric price distribution;")
print("a large gap suggests skew (a run of unusually high or low prices).")

---

## Part 3: The Mode (Most Frequent Value)

### What is the Mode?

The **mode** is the value that appears most frequently in the data.

### Why Mode is Less Useful for Stock Prices

Stock prices take many distinct values, and adjusted prices (like 8.5234) are not even restricted to whole sen, so:
- The exact same price rarely repeats
- Any mode you do find depends on how you round the prices

**Mode is more useful for:**
- Discrete data (whole numbers)
- Categorical data (e.g., most common sector)
- Volume analysis (most common trading volume range)
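Binning is what makes the mode useful for volume analysis. The sketch below groups a handful of hypothetical daily volumes (made-up numbers, not real Bursa data) into 1,000-unit ranges with `pd.cut` and reports the most common range. The bin width is an arbitrary assumption; a different width could change the answer.

```python
import pandas as pd

# Hypothetical daily trading volumes, for illustration only
volumes = pd.Series([1200, 1850, 2100, 2400, 2250, 1900, 3100, 2300, 2050, 5200])

# Group volumes into 1,000-unit ranges: [1000, 2000), [2000, 3000), ...
bins = list(range(1000, 7000, 1000))
volume_ranges = pd.cut(volumes, bins=bins, right=False)

# Count how many days fall in each range; the largest count is the modal range
range_counts = volume_ranges.value_counts()
modal_range = range_counts.idxmax()

print(range_counts.sort_index())
print(f"\nMost common volume range: {modal_range} "
      f"({range_counts.max()} of {len(volumes)} days)")
```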

In [None]:
# Demonstrate why mode is tricky with price data
# Round prices to 2 decimal places so near-identical prices count as equal
maybank_rounded = maybank_close.round(2)
counts = maybank_rounded.value_counts()  # sorted by frequency, most common first

print("Looking for mode in Maybank prices:")
print(f"\nTotal unique prices (rounded to 2 decimals): {maybank_rounded.nunique()}")

if counts.iloc[0] > 1:
    mode_price = counts.index[0]  # one of the most common prices (ties possible)
    print(f"Mode (most common price): RM {mode_price:.2f}")
    print(f"This price appeared {counts.iloc[0]} times out of {len(maybank_rounded)} days")
else:
    # Series.mode() would return every value here, since all counts tie at 1
    print("No mode found - all prices are unique!")

print("\n→ The mode changes with the rounding precision, so it says little about price levels.")
print("→ Mean and median are much more useful!")

---

## Part 4: Why Averaging Matters - Reducing Noise

A useful simplification is to treat a stock price series as two components:
1. **Signal** (true trend)
2. **Noise** (random fluctuations)

**Averaging helps filter out noise and reveal the underlying trend.**
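A minimal simulation of the signal-plus-noise idea, assuming the true price is a flat RM 8.50 and the noise is independent and normally distributed with a standard deviation of RM 0.10 (all hypothetical values; real price noise is rarely this well-behaved). Under those assumptions, averaging $n$ days shrinks the noise's standard deviation to about $\sigma/\sqrt{n}$:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

n_days = 2000
true_price = 8.50   # hypothetical flat "signal" in RM
noise_sd = 0.10     # hypothetical daily noise level in RM

# Observed price = constant signal + independent random noise
prices = pd.Series(true_price + rng.normal(0.0, noise_sd, n_days))

noise_std = {}
for window in [1, 5, 20]:
    smoothed = prices.rolling(window=window).mean().dropna()
    # Whatever remains around the true price after averaging is leftover noise
    noise_std[window] = (smoothed - true_price).std()
    print(f"{window:>2}-day average: leftover noise = {noise_std[window]:.4f} RM "
          f"(theory sigma/sqrt(n) = {noise_sd / np.sqrt(window):.4f})")
```

The flat signal is deliberate: on a trending series the moving average would also lag, mixing lag error with leftover noise.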

### Demonstration: Rolling Mean (Moving Average)

In [None]:
# Calculate moving averages of different periods
maybank_ma5 = maybank_close.rolling(window=5).mean()   # 5-day MA
maybank_ma20 = maybank_close.rolling(window=20).mean()  # 20-day MA
maybank_ma50 = maybank_close.rolling(window=50).mean()  # 50-day MA

# Visualize noise reduction
plt.figure(figsize=(14, 7))

plt.plot(maybank_close.index, maybank_close.values, 
         linewidth=1, alpha=0.5, label='Daily Price (Noisy)', color='gray')
plt.plot(maybank_ma5.index, maybank_ma5.values, 
         linewidth=1.5, label='5-Day Average', color='blue')
plt.plot(maybank_ma20.index, maybank_ma20.values, 
         linewidth=2, label='20-Day Average', color='orange')
plt.plot(maybank_ma50.index, maybank_ma50.values, 
         linewidth=2.5, label='50-Day Average', color='red')

plt.title('Maybank - How Averaging Reduces Noise (2023)', 
          fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price (RM)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("OBSERVATIONS:")
print("• Gray line (daily price) = Most noisy, shows all fluctuations")
print("• Blue line (5-day MA) = Some smoothing, still responsive")
print("• Orange line (20-day MA) = Smoother, clearer trend")
print("• Red line (50-day MA) = Very smooth, shows major trend only")
print("\n→ Longer averaging period = More noise reduction = Smoother line")

### The Trade-off: Smoothness vs Responsiveness

**Shorter average (5-day)**:
- ✅ Responds quickly to price changes
- ❌ Still contains noise
- 📊 Good for short-term trading

**Longer average (50-day)**:
- ✅ Very smooth, clear trend
- ❌ Slow to respond (lags behind)
- 📊 Good for long-term trend identification

**This is why traders use multiple averages together!**
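The lag side of the trade-off can be measured on a hypothetical price that rises by exactly RM 0.01 per day. On a straight-line trend, an $n$-day SMA sits $(n-1)/2$ days of movement behind the latest price, so a 50-day average trails much further than a 5-day one:

```python
import numpy as np
import pandas as pd

# Hypothetical steadily rising price: starts at RM 8.00, +RM 0.01 per day
slope = 0.01
prices = pd.Series(8.00 + slope * np.arange(100))

lag_in_rm = {}
for window in [5, 50]:
    sma = prices.rolling(window=window).mean()
    gap = (prices - sma).dropna()        # how far the SMA trails the price
    lag_in_rm[window] = gap.iloc[-1]
    print(f"{window}-day SMA trails price by RM {gap.iloc[-1]:.3f} "
          f"= {gap.iloc[-1] / slope:.1f} days of movement "
          f"(theory (n-1)/2 = {(window - 1) / 2:.1f})")
```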

---

## Part 5: Weighted Averages

### The Problem with Simple Averages

In a simple average, all values have **equal importance**:
- Yesterday's price = Same weight as 20 days ago
- But recent prices are usually more relevant!

### Solution: Weighted Average

Give **more weight** to recent values, **less weight** to older values.

### Formula

$$
\text{Weighted Mean} = \frac{w_1 x_1 + w_2 x_2 + ... + w_n x_n}{w_1 + w_2 + ... + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
$$

Where:
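- $w_i$ = weight for value $x_i$ (non-negative, with at least one positive)
- Weights need not sum to 1: dividing by $\sum w_i$ normalizes them (if they already sum to 1, the denominator is simply 1)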
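Because the formula divides by $\sum w_i$, raw weights such as 1, 2, 3, 4, 5 give the same result as those weights rescaled to sum to 1. A small check on illustrative prices, using `np.average` and the formula written out by hand:

```python
import numpy as np

prices = np.array([8.40, 8.45, 8.50, 8.55, 8.60])   # oldest -> newest
raw_weights = np.array([1, 2, 3, 4, 5], dtype=float)  # illustrative weights

# The weighted-mean formula, written out directly
by_formula = (raw_weights * prices).sum() / raw_weights.sum()

# np.average normalizes internally, so both weight versions agree
avg_raw = np.average(prices, weights=raw_weights)
avg_normalized = np.average(prices, weights=raw_weights / raw_weights.sum())

print(f"By formula:          RM {by_formula:.4f}")
print(f"Raw weights:         RM {avg_raw:.4f}")
print(f"Normalized weights:  RM {avg_normalized:.4f}")
```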

### Example: Recent Days More Important

In [None]:
# Example: 5-day price data
prices = np.array([8.40, 8.45, 8.50, 8.55, 8.60])
days = ['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5 (Today)']

# Simple average - all weights equal (0.2 each)
simple_avg = np.mean(prices)

# Weighted average - more weight on recent days
# Weights: oldest → newest = [0.10, 0.10, 0.20, 0.25, 0.35]
weights = np.array([0.10, 0.10, 0.20, 0.25, 0.35])
weighted_avg = np.average(prices, weights=weights)

print("=" * 60)
print("SIMPLE AVERAGE VS WEIGHTED AVERAGE")
print("=" * 60)
print("\nPrices:")
for day, price in zip(days, prices):
    print(f"  {day}: RM {price:.2f}")

print("\n" + "-" * 60)
print("SIMPLE AVERAGE (all days equally important):")
print("-" * 60)
print("Weights: [0.20, 0.20, 0.20, 0.20, 0.20]")
print(f"Calculation: ({' + '.join(map(str, prices))}) / 5")
print(f"Result: RM {simple_avg:.4f}")

print("\n" + "-" * 60)
print("WEIGHTED AVERAGE (recent days more important):")
print("-" * 60)
print(f"Weights: {weights}")
print("Calculation:")
for i, (price, weight) in enumerate(zip(prices, weights)):
    print(f"  Day {i+1}: RM {price:.2f} × {weight:.2f} = {price * weight:.4f}")
print(f"  Sum: {(prices * weights).sum():.4f}")
print(f"Result: RM {weighted_avg:.4f}")

print("\n" + "=" * 60)
print(f"Simple average:   RM {simple_avg:.4f}")
print(f"Weighted average: RM {weighted_avg:.4f}")
print(f"Difference:       RM {abs(weighted_avg - simple_avg):.4f}")
print("\n→ Weighted average is HIGHER because recent prices (8.55, 8.60) get more weight")
print("→ This makes it more responsive to recent price movements!")

### Why Weighted Averages Matter in Trading

**Exponential Moving Average (EMA)** uses weighted averaging:
- Weights shrink exponentially with each step back in time, so recent prices count most
- Responds faster to new price movements
- More sensitive to trend changes

**We'll cover EMA mathematics in detail in Module 04.**

For now, understand:
- Simple average = all equal weights
- Weighted average = different weights (usually more for recent data)
- EMA = special weighted average with exponential decay
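A small preview, assuming pandas' `ewm(span=..., adjust=True)` convention where $\alpha = 2/(\text{span}+1)$: the EMA on the last day equals the weighted-mean formula from this part with weights $(1-\alpha)^k$, where $k$ is how many days old each price is. The prices and the span of 3 are illustrative only.

```python
import numpy as np
import pandas as pd

prices = np.array([8.40, 8.45, 8.50, 8.55, 8.60])   # oldest -> newest
span = 3
alpha = 2 / (span + 1)   # pandas' span-to-alpha conversion (alpha = 0.5 here)

# Age of each price in days: oldest = 4, newest (today) = 0
ages = np.arange(len(prices))[::-1]
weights = (1 - alpha) ** ages        # weight halves with each extra day of age
manual_ema = np.average(prices, weights=weights)

# Same value from pandas' exponentially weighted mean
pandas_ema = pd.Series(prices).ewm(span=span, adjust=True).mean().iloc[-1]

print(f"Weights (oldest -> newest): {weights}")
print(f"Manual EMA: RM {manual_ema:.4f}")
print(f"Pandas EMA: RM {pandas_ema:.4f}")
```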

---

## Part 6: Introduction to Moving Averages

### What is a Moving Average?

A **moving average** is an average that "moves" through time:
- Calculate average of last N days
- Move forward one day
- Calculate again with new N-day window
- Repeat for all days

### Simple Moving Average (SMA) Formula

For a period of $n$ days:

$$
SMA_t = \frac{P_t + P_{t-1} + P_{t-2} + ... + P_{t-n+1}}{n}
$$

Where:
- $SMA_t$ = moving average at time $t$
- $P_t$ = price at time $t$
- $n$ = number of periods (days)

### Manual Calculation Example

In [None]:
# Take first 10 days of Maybank data
sample_prices = maybank_close.head(10).values
sample_dates = maybank_close.head(10).index

# Calculate 5-day SMA manually
window = 5
print("=" * 70)
print(f"CALCULATING {window}-DAY SIMPLE MOVING AVERAGE (SMA)")
print("=" * 70)
print(f"\nPrice data (first 10 days):")
for i, (date, price) in enumerate(zip(sample_dates, sample_prices)):
    print(f"  Day {i+1} ({date.strftime('%Y-%m-%d')}): RM {price:.4f}")

print(f"\n{'='*70}")
print(f"MANUAL SMA CALCULATION (window={window}):")
print(f"{'='*70}")

# Can only start calculating SMA from day 5 (need 5 days of data)
for i in range(window - 1, len(sample_prices)):
    window_prices = sample_prices[i - window + 1 : i + 1]
    sma = np.mean(window_prices)
    
    print(f"\nDay {i+1} SMA:")
    print(f"  Window: Days {i - window + 2} to {i + 1}")
    print(f"  Prices: {[f'{p:.4f}' for p in window_prices]}")
    print(f"  Average: {' + '.join([f'{p:.4f}' for p in window_prices])} / {window}")
    print(f"  SMA = RM {sma:.4f}")

# Verify with pandas rolling
print(f"\n{'='*70}")
print("VERIFICATION WITH PANDAS:")
print(f"{'='*70}")
sma_pandas = maybank_close.head(10).rolling(window=window).mean()
print(sma_pandas)
print("\n→ Our manual calculations match pandas rolling mean! ✓")

### Key Properties of SMA

1. **Lag**: SMA lags behind price because it uses historical data
2. **Smoothing**: Longer period = smoother line
3. **Equal weights**: All days in window have equal importance
4. **Drops old data**: When window moves, oldest day is dropped

**In Module 04, we'll dive deeper into SMA and learn about EMA!**

---

## Part 7: Exercises

Time to practice! Try these exercises to solidify your understanding.

### Exercise 1: Calculate Central Tendency for Top Glove

Calculate mean, median, and compare them for Top Glove (5225.KL).

**Tasks**:
1. Extract Top Glove closing prices
2. Calculate mean and median
3. Determine if there's a significant difference
4. Interpret what this tells you about the price distribution

In [None]:
# Your code here




<details>
<summary><b>Click here for solution</b></summary>

```python
# Extract closing prices
topglove_close = topglove['Close']

# Calculate statistics
tg_mean = topglove_close.mean()
tg_median = topglove_close.median()
tg_min = topglove_close.min()
tg_max = topglove_close.max()

print("Top Glove (5225.KL) - 2023 Statistics:")
print(f"Mean:   RM {tg_mean:.4f}")
print(f"Median: RM {tg_median:.4f}")
print(f"Min:    RM {tg_min:.4f}")
print(f"Max:    RM {tg_max:.4f}")
diff = tg_mean - tg_median
print(f"\nDifference (mean - median): RM {diff:.4f}")
print(f"Percentage difference: {abs(diff) / tg_median * 100:.2f}%")

if diff > 0:
    print("\nInterpretation: Mean > Median")
    print("→ Suggests a right-skewed distribution (positive skew)")
    print("→ A stretch of relatively high prices is pulling the mean up")
elif diff < 0:
    print("\nInterpretation: Mean < Median")
    print("→ Suggests a left-skewed distribution (negative skew)")
    print("→ A stretch of relatively low prices is pulling the mean down")
else:
    print("\nInterpretation: Mean = Median (no sign of skew)")
```
</details>

### Exercise 2: Compare Volatility Around Mean

Which stock stays closer to its mean - Maybank or Top Glove?

**Hint**: Calculate the average absolute deviation from the mean:
$$\text{Avg Deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

**Tasks**:
1. Calculate mean for both stocks
2. Calculate absolute deviation from mean for each day
3. Calculate average absolute deviation
4. Compare and interpret results

In [None]:
# Your code here




<details>
<summary><b>Click here for solution</b></summary>

```python
# Get closing prices
maybank_close = maybank['Close']
topglove_close = topglove['Close']

# Calculate means
maybank_mean = maybank_close.mean()
topglove_mean = topglove_close.mean()

# Calculate absolute deviations
maybank_deviations = np.abs(maybank_close - maybank_mean)
topglove_deviations = np.abs(topglove_close - topglove_mean)

# Calculate average absolute deviation
maybank_avg_dev = maybank_deviations.mean()
topglove_avg_dev = topglove_deviations.mean()

print("Average Deviation from Mean:")
print(f"\nMaybank:")
print(f"  Mean price: RM {maybank_mean:.4f}")
print(f"  Avg deviation: RM {maybank_avg_dev:.4f}")
print(f"  As percentage: {maybank_avg_dev / maybank_mean * 100:.2f}%")

print(f"\nTop Glove:")
print(f"  Mean price: RM {topglove_mean:.4f}")
print(f"  Avg deviation: RM {topglove_avg_dev:.4f}")
print(f"  As percentage: {topglove_avg_dev / topglove_mean * 100:.2f}%")

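# Compare relative deviations: the two stocks trade at very different
# price levels, so comparing raw RM deviations would be misleading
maybank_rel_dev = maybank_avg_dev / maybank_mean
topglove_rel_dev = topglove_avg_dev / topglove_mean

if maybank_rel_dev < topglove_rel_dev:
    print("\n→ Maybank stays closer to its mean in relative terms (more stable)")
else:
    print("\n→ Top Glove stays closer to its mean in relative terms (more stable)")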
```
</details>

### Exercise 3: Custom Weighted Average

Create a custom weighted average that gives:
- 40% weight to today's price
- 30% weight to yesterday's price
- 20% weight to 2 days ago
- 10% weight to 3 days ago

**Tasks**:
1. Take the last 4 days of CIMB data
2. Apply the weights above
3. Compare with simple average of the same 4 days
4. Explain why they're different

In [None]:
# Your code here




<details>
<summary><b>Click here for solution</b></summary>

```python
# Get last 4 days of CIMB data
cimb_close = cimb['Close']
last_4_days = cimb_close.tail(4).values

# Define weights (oldest to newest)
weights = np.array([0.10, 0.20, 0.30, 0.40])

# Calculate simple average
simple_avg = np.mean(last_4_days)

# Calculate weighted average
weighted_avg = np.average(last_4_days, weights=weights)

print("CIMB Last 4 Days Analysis:")
print(f"\nPrices (oldest to newest): {last_4_days}")
print(f"Weights: {weights}")
print(f"\nSimple average: RM {simple_avg:.4f}")
print(f"Weighted average: RM {weighted_avg:.4f}")
print(f"Difference: RM {abs(weighted_avg - simple_avg):.4f}")

# Detailed calculation
print("\nWeighted calculation breakdown:")
for i, (price, weight) in enumerate(zip(last_4_days, weights)):
    print(f"  Day {i+1}: RM {price:.4f} × {weight} = {price * weight:.4f}")
print(f"  Total: {(last_4_days * weights).sum():.4f}")

if weighted_avg > simple_avg:
    print("\nExplanation: On balance, the recent (heavily weighted) prices sit above")
    print("the 4-day simple average, so the weighted average is higher.")
elif weighted_avg < simple_avg:
    print("\nExplanation: On balance, the recent (heavily weighted) prices sit below")
    print("the 4-day simple average, so the weighted average is lower.")
else:
    print("\nExplanation: The weighting made no difference for these four prices.")
```
</details>

### Exercise 4: Moving Average Crossover

A common trading signal: when a short-period MA crosses above a long-period MA.

**Tasks**:
1. Calculate 10-day and 30-day SMAs for Maybank
2. Find dates where the 10-day MA crosses above the 30-day MA (bullish signal)
3. Find dates where the 10-day MA crosses below the 30-day MA (bearish signal)
4. Visualize these crossover points

In [None]:
# Your code here




<details>
<summary><b>Click here for solution</b></summary>

```python
# Calculate moving averages
maybank_close = maybank['Close']
ma_10 = maybank_close.rolling(window=10).mean()
ma_30 = maybank_close.rolling(window=30).mean()

# Find crossovers
# Bullish: MA10 crosses above MA30 (was below, now above)
bullish_cross = (ma_10 > ma_30) & (ma_10.shift(1) <= ma_30.shift(1))
# Bearish: MA10 crosses below MA30 (was above, now below)
bearish_cross = (ma_10 < ma_30) & (ma_10.shift(1) >= ma_30.shift(1))

# Get dates
bullish_dates = maybank_close[bullish_cross].index
bearish_dates = maybank_close[bearish_cross].index

print(f"Bullish crossovers (10-day crosses above 30-day): {len(bullish_dates)}")
for date in bullish_dates:
    print(f"  {date.strftime('%Y-%m-%d')}")

print(f"\nBearish crossovers (10-day crosses below 30-day): {len(bearish_dates)}")
for date in bearish_dates:
    print(f"  {date.strftime('%Y-%m-%d')}")

# Visualize
plt.figure(figsize=(14, 7))
plt.plot(maybank_close.index, maybank_close.values, 
         linewidth=1, alpha=0.5, label='Price', color='gray')
plt.plot(ma_10.index, ma_10.values, 
         linewidth=1.5, label='10-Day MA', color='blue')
plt.plot(ma_30.index, ma_30.values, 
         linewidth=2, label='30-Day MA', color='red')

# Mark crossovers
plt.scatter(bullish_dates, ma_10[bullish_cross], 
           marker='^', s=200, color='green', label='Bullish Cross', zorder=5)
plt.scatter(bearish_dates, ma_10[bearish_cross], 
           marker='v', s=200, color='red', label='Bearish Cross', zorder=5)

plt.title('Maybank - Moving Average Crossovers (2023)', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price (RM)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
</details>

---

## Summary

Excellent work! You've completed Module 01. Let's review:

### Key Concepts Mastered

1. **Mean (Arithmetic Average)**
   - Formula: $\bar{x} = \frac{\sum x_i}{n}$
   - Acts as "center of gravity" for data
   - Sensitive to outliers

2. **Median (Middle Value)**
   - Middle value when data is sorted
   - Resistant to outliers
   - Better than mean when data has extreme values

3. **Mode (Most Frequent)**
   - Less useful for continuous stock prices
   - More useful for discrete/categorical data

4. **Noise Reduction Through Averaging**
   - Averaging smooths out random fluctuations
   - Longer averaging period = More smoothing
   - Trade-off: Smoothness vs Responsiveness

5. **Weighted Averages**
   - Give different importance to different values
   - Recent data often gets more weight
   - Foundation for Exponential Moving Average (EMA)

6. **Simple Moving Average (SMA)**
   - Average of last N periods
   - "Moves" through time
   - All values in window have equal weight

### How This Connects to Technical Indicators

Everything you learned here appears in indicators:
- **SMA** (Module 04): Direct application of mean
- **EMA** (Module 04): Weighted average with exponential decay
- **RSI** (Module 05): Uses average of gains vs losses
- **Bollinger Bands** (Module 06): Bands a set number of standard deviations above and below an SMA
- **MACD** (Module 07): Difference between two EMAs

### What's Next?

In **Module 02: Spread and Variation**, you'll learn:
- Range, variance, and standard deviation
- How to measure stock volatility mathematically
- Why standard deviation is crucial for risk management
- The mathematical foundation of Bollinger Bands

### Additional Practice

Before Module 02, try:
1. Calculate moving averages for different window sizes (5, 10, 20, 50, 200)
2. Compare mean vs median for all Malaysian banking stocks
3. Create custom weighted averages with different weight schemes
4. Identify moving average crossovers in other Malaysian stocks

---

## Additional Resources

### Further Reading
- [Investopedia: Moving Average](https://www.investopedia.com/terms/m/movingaverage.asp)
- [Investopedia: Central Tendency](https://www.investopedia.com/terms/c/central-tendency.asp)
- [Khan Academy: Mean, Median, Mode](https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data)

### Python Documentation
- [NumPy mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html)
- [NumPy median](https://numpy.org/doc/stable/reference/generated/numpy.median.html)
- [NumPy average (weighted)](https://numpy.org/doc/stable/reference/generated/numpy.average.html)
- [Pandas rolling](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html)

---

**Congratulations!** You now understand the mathematical foundation of moving averages. Ready for Module 02?