# Module 08: Correlation and Relationships

**Difficulty**: ⭐⭐ Intermediate  
**Estimated Time**: 60 minutes  
**Prerequisites**: 
- Module 02: Spread and Variation (standard deviation, variance)
- Module 03: Percentages, Ratios, and Changes (returns calculation)
- Basic understanding of covariance

## Learning Objectives

By the end of this notebook, you will be able to:
1. **Calculate the correlation coefficient** between two stocks mathematically
2. **Interpret correlation values** from -1 (perfect negative) to +1 (perfect positive)
3. **Analyze sector correlations** in KLSE (banking, gloves, tech sectors)
4. **Create correlation matrices** for multiple stocks
5. **Use correlation for portfolio diversification** to reduce risk
6. **Understand the limitations** of correlation (correlation ≠ causation)

---

## What is Correlation?

**Correlation** measures the **strength and direction** of the relationship between two variables.

In stock trading, correlation tells you:
- **Do two stocks move together?** (positive correlation)
- **Do they move in opposite directions?** (negative correlation)
- **Are they unrelated?** (zero correlation)

### The Correlation Coefficient (Pearson's r):

$$r = \frac{\text{Cov}(X, Y)}{\sigma_X \times \sigma_Y}$$

Where:
- $\text{Cov}(X, Y)$ = Covariance between X and Y
- $\sigma_X$ = Standard deviation of X
- $\sigma_Y$ = Standard deviation of Y

**Expanded formula**:

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \times \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
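The two formulas above give the same number, because the $(n-1)$ divisors in the covariance and in the standard deviations cancel. The short sketch below checks this on a small made-up data set (the values of `x` and `y` are illustrative only, not stock data):

```python
import math
import statistics

# Toy data, chosen only so the arithmetic is easy to follow
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums that appear in the expanded formula
sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sum_xx = sum((xi - x_bar) ** 2 for xi in x)
sum_yy = sum((yi - y_bar) ** 2 for yi in y)

# Expanded formula: no division by (n - 1) anywhere
r_expanded = sum_xy / (math.sqrt(sum_xx) * math.sqrt(sum_yy))

# Ratio form: sample covariance over the product of sample standard deviations
cov_xy = sum_xy / (n - 1)
r_ratio = cov_xy / (statistics.stdev(x) * statistics.stdev(y))

print(f"r (expanded) = {r_expanded:.6f}")
print(f"r (ratio)    = {r_ratio:.6f}")
```

Here the sums are 6, 10 and 6, so both forms reduce to $6 / \sqrt{60}$.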

### Interpreting Correlation:

| Correlation (r) | Interpretation | Example |
|-----------------|----------------|----------|
| **r = +1** | Perfect positive correlation | Two stocks always move together |
| **r = +0.7 to +1** | Strong positive correlation | Banking stocks (Maybank, CIMB) |
| **r = +0.3 to +0.7** | Moderate positive correlation | Different sectors, same economy |
| **r = -0.3 to +0.3** | Weak/no correlation | Unrelated stocks |
| **r = -0.7 to -0.3** | Moderate negative correlation | Inverse relationship |
| **r = -1 to -0.7** | Strong negative correlation | Hedged positions |
| **r = -1** | Perfect negative correlation | Perfect hedge |
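The bands in the table can be turned into a small helper for labelling results. This is only a sketch: `describe_correlation` is a hypothetical name, and because the table's ranges share their endpoints, it assumes that 0.3 counts as moderate and 0.7 as strong.

```python
def describe_correlation(r: float) -> str:
    """Map a correlation coefficient to the labels used in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie between -1 and +1")
    if r == 1.0:
        return "perfect positive"
    if r == -1.0:
        return "perfect negative"
    strength = abs(r)
    if strength < 0.3:
        return "weak/no correlation"
    # Assumption: the shared endpoints 0.3 and 0.7 go to the stronger band
    band = "strong" if strength >= 0.7 else "moderate"
    sign = "positive" if r > 0 else "negative"
    return f"{band} {sign}"


for value in (0.85, 0.45, 0.10, -0.50, -0.90):
    print(f"r = {value:+.2f}: {describe_correlation(value)}")
```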

Let's calculate correlation using **real Malaysian stocks**!

---

In [None]:
# Setup and imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from datetime import datetime, timedelta
import warnings

# Configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
warnings.filterwarnings('ignore')
np.random.seed(42)

# Plot settings
plt.rcParams['figure.figsize'] = (14, 6)
plt.rcParams['font.size'] = 10

print("✓ Libraries imported successfully")
print(f"Today's date: {datetime.now().strftime('%Y-%m-%d')}")

## 1. Calculating Correlation Between Two Stocks

Let's start by calculating the correlation between **two Malaysian banking stocks**: Maybank (1155.KL) and CIMB (1023.KL).

These should have **high positive correlation** because:
- Same sector (banking)
- Same economy (Malaysia)
- Similar business models
- Affected by same interest rate policies

### Step-by-Step Calculation:

1. Download stock prices
2. Calculate daily returns for both stocks
3. Calculate covariance of returns
4. Calculate standard deviations
5. Compute correlation: r = Cov(X,Y) / (σ_X × σ_Y)

---

In [None]:
# Download data for two banking stocks
tickers = ['1155.KL', '1023.KL']  # Maybank, CIMB
names = ['Maybank', 'CIMB']
start_date = '2023-01-01'
end_date = '2024-01-01'

print(f"Downloading banking stocks data from {start_date} to {end_date}...")
print("="*70)

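# Download both stocks.
# Recent yfinance releases return a one-column DataFrame for ['Close'],
# so squeeze() converts it to a Series (it leaves a Series unchanged).
maybank = yf.download('1155.KL', start=start_date, end=end_date, progress=False)['Close'].squeeze()
cimb = yf.download('1023.KL', start=start_date, end=end_date, progress=False)['Close'].squeeze()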

# Combine into DataFrame
banking_stocks = pd.DataFrame({
    'Maybank': maybank,
    'CIMB': cimb
})

# Remove any missing data
banking_stocks = banking_stocks.dropna()

print(f"\n✓ Downloaded {len(banking_stocks)} trading days")
print("\nFirst few rows:")
print(banking_stocks.head())

print("\nPrice Statistics:")
print(banking_stocks.describe())

In [None]:
# Calculate daily returns
returns = banking_stocks.pct_change().dropna()

print("Daily Returns Calculated:")
print("="*70)
print(returns.head(10))

print("\nReturns Statistics:")
print(returns.describe())

In [None]:
# Calculate correlation from scratch
def calculate_correlation_manual(x, y):
    """
    Calculate Pearson correlation coefficient manually.
    
    Parameters:
    -----------
    x, y : array-like
        Two data series
    
    Returns:
    --------
    tuple : (correlation, covariance, std_x, std_y)
    """
    # Step 1: Calculate means
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    
    # Step 2: Calculate deviations from mean
    x_dev = x - x_mean
    y_dev = y - y_mean
    
    # Step 3: Calculate covariance (numerator)
    covariance = np.sum(x_dev * y_dev) / (len(x) - 1)
    
    # Step 4: Calculate standard deviations
    std_x = np.sqrt(np.sum(x_dev**2) / (len(x) - 1))
    std_y = np.sqrt(np.sum(y_dev**2) / (len(y) - 1))
    
    # Step 5: Calculate correlation
    correlation = covariance / (std_x * std_y)
    
    return correlation, covariance, std_x, std_y


# Calculate correlation manually
corr_manual, cov, std_maybank, std_cimb = calculate_correlation_manual(
    returns['Maybank'].values, 
    returns['CIMB'].values
)

# Verify with pandas built-in
corr_pandas = returns['Maybank'].corr(returns['CIMB'])

print("Correlation Calculation (Step by Step):")
print("="*70)
print(f"Covariance: {cov:.8f}")
print(f"Std Dev (Maybank): {std_maybank:.6f}")
print(f"Std Dev (CIMB): {std_cimb:.6f}")
print(f"\nCorrelation (Manual): {corr_manual:.6f}")
print(f"Correlation (Pandas): {corr_pandas:.6f}")
print(f"Match: {np.isclose(corr_manual, corr_pandas)}")

print("\n📊 Interpretation:")
print(f"Maybank and CIMB have a correlation of {corr_manual:.4f}")
if corr_manual > 0.7:
    print("This is a STRONG POSITIVE correlation - they move together!")
elif corr_manual > 0.3:
    print("This is a MODERATE POSITIVE correlation.")
else:
    print("This is a WEAK correlation.")

In [None]:
# Visualize the relationship
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left panel: Scatter plot of returns
axes[0].scatter(returns['Maybank'], returns['CIMB'], alpha=0.5, s=30)
axes[0].set_xlabel('Maybank Returns', fontsize=11)
axes[0].set_ylabel('CIMB Returns', fontsize=11)
axes[0].set_title(f'Daily Returns Correlation: r = {corr_manual:.4f}', 
                  fontsize=12, fontweight='bold')
axes[0].axhline(y=0, color='black', linestyle='--', linewidth=0.8, alpha=0.5)
axes[0].axvline(x=0, color='black', linestyle='--', linewidth=0.8, alpha=0.5)
axes[0].grid(True, alpha=0.3)

# Add regression line
z = np.polyfit(returns['Maybank'], returns['CIMB'], 1)
p = np.poly1d(z)
axes[0].plot(returns['Maybank'], p(returns['Maybank']), 
             "r--", linewidth=2, label='Best fit line')
axes[0].legend(loc='best')

# Right panel: Time series of normalized prices
# Normalize to start at 100
normalized = (banking_stocks / banking_stocks.iloc[0]) * 100
axes[1].plot(normalized.index, normalized['Maybank'], label='Maybank', linewidth=2)
axes[1].plot(normalized.index, normalized['CIMB'], label='CIMB', linewidth=2)
axes[1].set_ylabel('Normalized Price (Base = 100)', fontsize=11)
axes[1].set_xlabel('Date', fontsize=11)
axes[1].set_title('Price Movement Comparison', fontsize=12, fontweight='bold')
axes[1].legend(loc='best')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Chart Interpretation:")
print("="*70)
print("Left: Scatter plot shows positive correlation (upward slope)")
print("      When Maybank returns are positive, CIMB tends to be positive too")
print("\nRight: Normalized prices show both stocks move in similar patterns")
print("       This is expected - both are Malaysian banking stocks!")

## 2. Sector Correlation Analysis

Now let's analyze correlation **across different sectors** in KLSE:

### Stocks We'll Analyze:

| Ticker | Company | Sector |
|--------|---------|--------|
| 1155.KL | Maybank | Banking |
| 1023.KL | CIMB | Banking |
| 7113.KL | Top Glove | Healthcare (Gloves) |
| 6888.KL | Axiata | Telecommunications |
| 5183.KL | Petronas Chemicals | Energy/Chemicals |

### Expected Correlations:

- **High correlation**: Stocks in same sector (Maybank ↔ CIMB)
- **Moderate correlation**: Different sectors, same economy
- **Low correlation**: Unrelated sectors with different drivers
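Before downloading real prices, the expected pattern can be sketched with simulated returns. The example below assumes a simple factor structure (a market factor shared by every stock, plus a sector factor shared only by the two banks). All volatilities are made-up illustrative numbers, not estimates for any real company.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 2000

# Hypothetical return drivers (illustrative volatilities only)
market = rng.normal(0.0, 0.010, n_days)   # shared by every stock
banking = rng.normal(0.0, 0.010, n_days)  # shared only by the two banks

bank_a = market + banking + rng.normal(0.0, 0.005, n_days)
bank_b = market + banking + rng.normal(0.0, 0.005, n_days)
other = market + rng.normal(0.0, 0.012, n_days)

# np.corrcoef treats each row as one variable
corr = np.corrcoef(np.vstack([bank_a, bank_b, other]))
print(np.round(corr, 2))
```

The two simulated banks share both factors, so their correlation comes out high. The third series shares only the market factor, so its correlation with either bank is moderate.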

Let's create a **correlation matrix** to see all relationships at once!

---

In [None]:
# Download multiple stocks from different sectors
stock_tickers = {
    'Maybank': '1155.KL',
    'CIMB': '1023.KL',
    'Top Glove': '7113.KL',
    'Axiata': '6888.KL',
    'Petronas Chem': '5183.KL'
}

print("Downloading multi-sector Malaysian stocks...")
print("="*70)

# Download all stocks
all_stocks = pd.DataFrame()

for name, ticker in stock_tickers.items():
    try:
        data = yf.download(ticker, start=start_date, end=end_date, progress=False)['Close']
        # Recent yfinance releases return a one-column DataFrame here; squeeze to a Series
        all_stocks[name] = data.squeeze()
        print(f"✓ {name} ({ticker}): {len(data)} days")
    except Exception as e:
        print(f"✗ {name} ({ticker}): Failed - {e}")

# Remove any missing data
all_stocks = all_stocks.dropna()

print(f"\n✓ Total: {len(all_stocks)} trading days with complete data")
print("\nFirst few rows:")
print(all_stocks.head())

In [None]:
# Calculate returns for all stocks
all_returns = all_stocks.pct_change().dropna()

print("Daily Returns for All Stocks:")
print("="*70)
print(all_returns.describe())

# Calculate correlation matrix
correlation_matrix = all_returns.corr()

print("\n📊 Correlation Matrix:")
print("="*70)
print(correlation_matrix.round(4))

In [None]:
# Visualize correlation matrix with heatmap
fig, ax = plt.subplots(figsize=(10, 8))

# Create heatmap
sns.heatmap(correlation_matrix, 
            annot=True,           # Show correlation values
            fmt='.3f',            # 3 decimal places
            cmap='RdYlGn',        # Red-Yellow-Green colormap
            center=0,             # Center colormap at 0
            vmin=-1, vmax=1,      # Correlation range
            square=True,          # Square cells
            linewidths=1,         # Cell borders
            cbar_kws={'label': 'Correlation Coefficient'},
            ax=ax)

ax.set_title('Malaysian Stocks - Correlation Matrix\n(Based on Daily Returns)', 
             fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

print("\n📊 Heatmap Interpretation:")
print("="*70)
print("🟢 Green: Positive correlation (stocks move together)")
print("🟡 Yellow: Weak/no correlation")
print("🔴 Red: Negative correlation (stocks move opposite)")
print("\nDiagonal: Always 1.0 (stock is perfectly correlated with itself)")

In [None]:
# Analyze sector correlations
def analyze_correlation_pairs(corr_matrix):
    """
    Extract and analyze correlation pairs.
    """
    # Get upper triangle (avoid duplicates)
    pairs = []
    
    for i in range(len(corr_matrix.columns)):
        for j in range(i+1, len(corr_matrix.columns)):
            stock1 = corr_matrix.columns[i]
            stock2 = corr_matrix.columns[j]
            corr_val = corr_matrix.iloc[i, j]
            
            pairs.append({
                'Stock 1': stock1,
                'Stock 2': stock2,
                'Correlation': corr_val
            })
    
    df = pd.DataFrame(pairs)
    df = df.sort_values('Correlation', ascending=False)
    
    return df


# Analyze pairs
correlation_pairs = analyze_correlation_pairs(correlation_matrix)

print("Stock Pair Correlations (Ranked):")
print("="*70)
print(correlation_pairs.to_string(index=False))

# Identify strongest relationships
print("\n🔍 Key Findings:")
print("="*70)

strongest = correlation_pairs.iloc[0]
print(f"\n🏆 Strongest Correlation: {strongest['Stock 1']} ↔ {strongest['Stock 2']}")
print(f"   Correlation: {strongest['Correlation']:.4f}")
if 'Maybank' in [strongest['Stock 1'], strongest['Stock 2']] and \
   'CIMB' in [strongest['Stock 1'], strongest['Stock 2']]:
    print("   ✓ Both are banking stocks - same sector correlation!")

# The list is sorted in descending order, so the last row is the lowest correlation
weakest = correlation_pairs.iloc[-1]
print(f"\n📊 Lowest Correlation: {weakest['Stock 1']} ↔ {weakest['Stock 2']}")
print(f"   Correlation: {weakest['Correlation']:.4f}")
print("   ✓ Different sectors - more diversification benefit!")

## 3. Portfolio Diversification Using Correlation

**Why does correlation matter for portfolios?**

The mathematical reason is **portfolio variance reduction**:

### Portfolio Risk Formula (Two Assets):

For a portfolio with two assets (weights $w_1$ and $w_2$):

$$\sigma_p^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho_{12} \sigma_1 \sigma_2$$

Where:
- $\sigma_p$ = Portfolio standard deviation (risk)
- $\sigma_1, \sigma_2$ = Individual stock standard deviations
- $\rho_{12}$ = Correlation coefficient
- $w_1, w_2$ = Portfolio weights

### Key Insight:

The **third term** ($2 w_1 w_2 \rho_{12} \sigma_1 \sigma_2$) depends on correlation!

- **If ρ = +1**: No diversification benefit (portfolio risk equals the weighted average of the individual risks)
- **If ρ = 0**: Partial diversification (returns are uncorrelated)
- **If ρ = -1**: Maximum diversification (a perfect hedge; the right weights can remove risk entirely)
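Plugging numbers into the formula makes these three cases concrete. The sketch below assumes equal weights and two stocks that each have 2% daily volatility. These inputs are illustrative, and `portfolio_std` is a hypothetical helper name.

```python
import math


def portfolio_std(w1, w2, sigma1, sigma2, rho):
    """Two-asset portfolio standard deviation from the formula above."""
    variance = (w1**2 * sigma1**2
                + w2**2 * sigma2**2
                + 2 * w1 * w2 * rho * sigma1 * sigma2)
    # Clamp tiny negative values caused by floating-point rounding
    return math.sqrt(max(variance, 0.0))


# Illustrative inputs: equal weights, both stocks with 2% daily volatility
sigma = 0.02
risk_by_rho = {rho: portfolio_std(0.5, 0.5, sigma, sigma, rho)
               for rho in (1.0, 0.5, 0.0, -1.0)}

for rho, risk in risk_by_rho.items():
    print(f"rho = {rho:+.1f} -> portfolio std = {risk:.4f}")
```

With these inputs the portfolio risk falls from 0.02 at ρ = +1 to about 0.0141 at ρ = 0, and to zero at ρ = -1.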

Let's demonstrate this with real Malaysian stocks!

---

In [None]:
# Calculate portfolio risk for different stock combinations
def calculate_portfolio_risk(returns1, returns2, weight1=0.5, weight2=0.5):
    """
    Calculate portfolio standard deviation for two assets.
    
    Parameters:
    -----------
    returns1, returns2 : pd.Series
        Return series for two assets
    weight1, weight2 : float
        Portfolio weights (should sum to 1)
    
    Returns:
    --------
    dict : Portfolio statistics
    """
    # Individual statistics
    std1 = returns1.std()
    std2 = returns2.std()
    corr = returns1.corr(returns2)
    
    # Portfolio variance
    portfolio_var = (weight1**2 * std1**2 + 
                     weight2**2 * std2**2 + 
                     2 * weight1 * weight2 * corr * std1 * std2)
    
    # Portfolio standard deviation
    portfolio_std = np.sqrt(portfolio_var)
    
    # Weighted average of individual risks (no diversification)
    weighted_avg_risk = weight1 * std1 + weight2 * std2
    
    # Diversification benefit
    benefit = weighted_avg_risk - portfolio_std
    benefit_pct = (benefit / weighted_avg_risk) * 100
    
    return {
        'Stock 1 Risk': std1,
        'Stock 2 Risk': std2,
        'Correlation': corr,
        'Portfolio Risk': portfolio_std,
        'Weighted Avg Risk': weighted_avg_risk,
        'Diversification Benefit': benefit,
        'Benefit %': benefit_pct
    }


# Compare portfolios
print("Portfolio Risk Analysis (50-50 allocation):")
print("="*80)

# Portfolio 1: Two banking stocks (HIGH correlation)
print("\n📊 Portfolio 1: Maybank + CIMB (Same Sector)")
print("-" * 80)
port1 = calculate_portfolio_risk(all_returns['Maybank'], all_returns['CIMB'])
for key, value in port1.items():
    if isinstance(value, float):
        print(f"{key:.<40} {value:.6f}")

# Portfolio 2: Banking + Gloves (LOWER correlation)
print("\n📊 Portfolio 2: Maybank + Top Glove (Different Sectors)")
print("-" * 80)
port2 = calculate_portfolio_risk(all_returns['Maybank'], all_returns['Top Glove'])
for key, value in port2.items():
    if isinstance(value, float):
        print(f"{key:.<40} {value:.6f}")

# Portfolio 3: Banking + Telecom
print("\n📊 Portfolio 3: Maybank + Axiata (Different Sectors)")
print("-" * 80)
port3 = calculate_portfolio_risk(all_returns['Maybank'], all_returns['Axiata'])
for key, value in port3.items():
    if isinstance(value, float):
        print(f"{key:.<40} {value:.6f}")

print("\n" + "="*80)
print("💡 Key Insight:")
print("Lower correlation → Higher diversification benefit → Lower portfolio risk!")

In [None]:
# Visualize diversification benefit
portfolios = [
    {'Name': 'Maybank + CIMB\n(Same Sector)', 'Data': port1},
    {'Name': 'Maybank + Top Glove\n(Diff Sectors)', 'Data': port2},
    {'Name': 'Maybank + Axiata\n(Diff Sectors)', 'Data': port3}
]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left panel: Risk comparison
names = [p['Name'] for p in portfolios]
portfolio_risks = [p['Data']['Portfolio Risk'] for p in portfolios]
weighted_risks = [p['Data']['Weighted Avg Risk'] for p in portfolios]

x = np.arange(len(names))
width = 0.35

bars1 = axes[0].bar(x - width/2, weighted_risks, width, label='Weighted Avg Risk (No Diversification)', 
                     color='red', alpha=0.7)
bars2 = axes[0].bar(x + width/2, portfolio_risks, width, label='Portfolio Risk (With Diversification)', 
                     color='green', alpha=0.7)

axes[0].set_ylabel('Risk (Standard Deviation)', fontsize=11)
axes[0].set_title('Portfolio Risk: With vs Without Diversification', fontsize=12, fontweight='bold')
axes[0].set_xticks(x)
axes[0].set_xticklabels(names, fontsize=9)
axes[0].legend(loc='upper left', fontsize=9)
axes[0].grid(True, alpha=0.3, axis='y')

# Right panel: Diversification benefit
benefits = [p['Data']['Benefit %'] for p in portfolios]
correlations = [p['Data']['Correlation'] for p in portfolios]

colors = ['red' if b < 5 else 'orange' if b < 10 else 'green' for b in benefits]
bars = axes[1].bar(names, benefits, color=colors, alpha=0.7)

# Add correlation as text on bars
for bar, corr in zip(bars, correlations):
    height = bar.get_height()
    axes[1].text(bar.get_x() + bar.get_width()/2., height + 0.5,
                 f'ρ = {corr:.3f}',
                 ha='center', va='bottom', fontsize=9, fontweight='bold')

axes[1].set_ylabel('Diversification Benefit (%)', fontsize=11)
axes[1].set_title('Risk Reduction Through Diversification', fontsize=12, fontweight='bold')
# The bar() call already set the category labels; only adjust their size
axes[1].tick_params(axis='x', labelsize=9)
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\n📊 Chart Interpretation:")
print("="*70)
print("Left: Green bars lower than red bars = Diversification working!")
print("      The gap between bars shows risk reduction")
print("\nRight: Higher benefit % = Better diversification")
print("       Lower correlation (ρ) → Higher diversification benefit")

## 4. Time-Varying Correlation

**Important insight**: Correlation is **not constant** - it changes over time!

### Why Correlation Changes:

1. **Market regimes**: Bull markets vs bear markets
2. **Crisis periods**: Correlations tend to increase during crashes ("correlations go to 1 in a crisis")
3. **Sector-specific events**: Events affecting one sector more than others
4. **Economic cycles**: Different sectors perform differently across cycles
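A synthetic sketch can illustrate point 2 before turning to real data. It assumes two hypothetical stocks whose returns are independent in a calm regime and dominated by a common shock in a crisis regime. The regime lengths and volatilities are made-up values chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_calm, n_crisis = 250, 250

# Calm regime: the two hypothetical stocks are driven by independent noise
calm_a = rng.normal(0.0, 0.01, n_calm)
calm_b = rng.normal(0.0, 0.01, n_calm)

# Crisis regime: a large common shock dominates both return series
shock = rng.normal(0.0, 0.03, n_crisis)
crisis_a = shock + rng.normal(0.0, 0.01, n_crisis)
crisis_b = shock + rng.normal(0.0, 0.01, n_crisis)

synthetic_returns = pd.DataFrame({
    "A": np.concatenate([calm_a, crisis_a]),
    "B": np.concatenate([calm_b, crisis_b]),
})

window = 30
rolling_corr = synthetic_returns["A"].rolling(window).corr(synthetic_returns["B"])

# Compare only the windows that lie entirely inside each regime
calm_mean = rolling_corr.iloc[window - 1:n_calm].mean()
crisis_mean = rolling_corr.iloc[n_calm + window - 1:].mean()
print(f"Mean rolling correlation, calm regime:   {calm_mean:.2f}")
print(f"Mean rolling correlation, crisis regime: {crisis_mean:.2f}")
```

A single full-sample correlation would blend the two regimes into one number, which is why a rolling window is more informative here.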

Let's calculate **rolling correlation** to see how it evolves!

---

In [None]:
# Calculate rolling correlation
window = 30  # 30-day rolling window

# Rolling correlation between Maybank and CIMB
rolling_corr_banking = all_returns['Maybank'].rolling(window).corr(all_returns['CIMB'])

# Rolling correlation between Maybank and Top Glove
rolling_corr_cross = all_returns['Maybank'].rolling(window).corr(all_returns['Top Glove'])

print(f"Rolling {window}-day Correlation Analysis:")
print("="*70)
print(f"\nMaybank ↔ CIMB (Same Sector):")
print(f"  Mean correlation: {rolling_corr_banking.mean():.4f}")
print(f"  Min correlation: {rolling_corr_banking.min():.4f}")
print(f"  Max correlation: {rolling_corr_banking.max():.4f}")
print(f"  Std deviation: {rolling_corr_banking.std():.4f}")

print(f"\nMaybank ↔ Top Glove (Different Sectors):")
print(f"  Mean correlation: {rolling_corr_cross.mean():.4f}")
print(f"  Min correlation: {rolling_corr_cross.min():.4f}")
print(f"  Max correlation: {rolling_corr_cross.max():.4f}")
print(f"  Std deviation: {rolling_corr_cross.std():.4f}")

In [None]:
# Visualize rolling correlation
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Panel 1: Rolling correlations
axes[0].plot(rolling_corr_banking.index, rolling_corr_banking, 
             label='Maybank ↔ CIMB (Same Sector)', linewidth=2, color='blue')
axes[0].plot(rolling_corr_cross.index, rolling_corr_cross, 
             label='Maybank ↔ Top Glove (Different Sectors)', linewidth=2, color='orange')
axes[0].axhline(y=0, color='black', linestyle='--', linewidth=1, alpha=0.5)
axes[0].axhline(y=0.5, color='green', linestyle=':', linewidth=1, alpha=0.5, label='Moderate correlation')
axes[0].set_ylabel(f'{window}-Day Rolling Correlation', fontsize=11)
axes[0].set_title(f'Time-Varying Correlation ({window}-Day Window)', fontsize=12, fontweight='bold')
axes[0].legend(loc='best')
axes[0].grid(True, alpha=0.3)
axes[0].set_ylim(-1, 1)

# Panel 2: Normalized prices for context
normalized = (all_stocks / all_stocks.iloc[0]) * 100
axes[1].plot(normalized.index, normalized['Maybank'], label='Maybank', linewidth=1.5)
axes[1].plot(normalized.index, normalized['CIMB'], label='CIMB', linewidth=1.5)
axes[1].plot(normalized.index, normalized['Top Glove'], label='Top Glove', linewidth=1.5)
axes[1].set_ylabel('Normalized Price (Base = 100)', fontsize=11)
axes[1].set_xlabel('Date', fontsize=11)
axes[1].set_title('Price Movements', fontsize=12, fontweight='bold')
axes[1].legend(loc='best')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Interpretation:")
print("="*70)
print("Top Panel: Shows how correlation changes over time")
print("  - Blue line (banking stocks) generally higher and more stable")
print("  - Orange line (cross-sector) more variable, sometimes negative")
print("\nBottom Panel: Price context - see when stocks move together vs apart")
print("\n💡 Key Insight: Correlation is dynamic - don't assume it's constant!")

## 5. Correlation Limitations and Pitfalls

### ⚠️ Important Warnings:

**1. Correlation ≠ Causation**
- High correlation doesn't mean one stock causes the other to move
- Both might be driven by a third factor (e.g., interest rates)

**2. Correlation Measures Linear Relationships Only**
- Pearson correlation only captures linear relationships
- Non-linear relationships might exist even with r = 0

**3. Historical Correlation ≠ Future Correlation**
- Past correlations can break down during market stress
- Diversification benefits can disappear when you need them most

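**4. Correlation Can Miss Tail Dependence**
- Pearson correlation fully describes dependence only when returns are jointly normal
- Real stock returns have fat tails (more extreme events), so correlation can understate how strongly stocks move together in extreme events

**5. Spurious Correlation**
- Compare enough pairs of series, or use a short enough sample, and some will look correlated purely by chance
- Always ask: "Does this relationship make sense?"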
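Warning 2 can be checked numerically. In the sketch below, `y` is completely determined by `x`, yet Pearson's r is essentially zero because the relationship is symmetric rather than linear. The data are synthetic and chosen only to make the point.

```python
import numpy as np

# x is symmetric around zero, and y is completely determined by x
x = np.linspace(-1.0, 1.0, 101)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r between x and x^2: {r:.6f}")
```

A scatter plot of such data would show a clear U shape, which is one reason to plot returns as well as compute r.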

### 🎯 Best Practices:

1. **Use rolling correlation** to see time variation
2. **Combine with fundamental analysis** - understand WHY stocks correlate
3. **Consider multiple timeframes** (daily, weekly, monthly)
4. **Monitor correlation breakdown** during market stress
5. **Diversify across multiple dimensions** (sector, geography, asset class)
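For practice 3, daily returns can be compounded into longer-horizon returns before measuring correlation. The sketch below uses synthetic data and assumes exactly five trading days per week. `to_weekly` is a hypothetical helper, and with a date-indexed pandas Series, resampling would be the more usual route.

```python
import numpy as np

rng = np.random.default_rng(2)
n_weeks, days_per_week = 100, 5
n_days = n_weeks * days_per_week

# Synthetic daily returns for two hypothetical stocks sharing a common factor
common = rng.normal(0.0, 0.008, n_days)
daily_a = common + rng.normal(0.0, 0.008, n_days)
daily_b = common + rng.normal(0.0, 0.008, n_days)


def to_weekly(daily):
    """Compound each block of 5 daily returns into one weekly return."""
    blocks = daily.reshape(n_weeks, days_per_week)
    return np.prod(1.0 + blocks, axis=1) - 1.0


weekly_a, weekly_b = to_weekly(daily_a), to_weekly(daily_b)

r_daily = np.corrcoef(daily_a, daily_b)[0, 1]
r_weekly = np.corrcoef(weekly_a, weekly_b)[0, 1]
print(f"Daily correlation:  {r_daily:.3f} (n = {daily_a.size})")
print(f"Weekly correlation: {r_weekly:.3f} (n = {weekly_a.size})")
```

The weekly estimate rests on five times fewer observations, so it is noisier. That trade-off is worth keeping in mind when comparing timeframes.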

---

## 6. Exercises

Test your understanding!

---

### Exercise 1: Calculate Correlation Manually

**Task**: Given these daily returns for two stocks:

Stock A: [0.02, -0.01, 0.03, -0.02, 0.01]  
Stock B: [0.015, -0.008, 0.025, -0.015, 0.012]

Calculate the correlation coefficient **by hand** (you can use Python for arithmetic, but show each step):

1. Calculate mean of A and B
2. Calculate deviations from mean
3. Calculate covariance
4. Calculate standard deviations
5. Calculate correlation: r = Cov(A,B) / (σ_A × σ_B)

**Verify** your answer using pandas `.corr()` method.

---

In [None]:
# Your code here


### Exercise 2: Optimal Portfolio Allocation

**Task**: You have RM 10,000 to invest in a portfolio of Maybank and Top Glove.

Test different allocations:
- 100% Maybank, 0% Top Glove
- 75% Maybank, 25% Top Glove
- 50% Maybank, 50% Top Glove
- 25% Maybank, 75% Top Glove
- 0% Maybank, 100% Top Glove

For each allocation:
1. Calculate portfolio risk (standard deviation)
2. Calculate portfolio expected return (mean return)
3. Calculate Sharpe-like ratio: return / risk

**Question**: Which allocation has:
- Lowest risk?
- Highest return?
- Best return/risk ratio?

Create a visualization showing risk vs return for each allocation.

---

In [None]:
# Your code here


### Exercise 3: Sector Correlation Deep Dive

**Task**: Create a more comprehensive sector analysis.

1. Define 3 sectors with 2-3 stocks each:
   - Banking: Maybank, CIMB, (add one more if available)
   - Healthcare/Gloves: Top Glove, (add others if available)
   - Others: Axiata, Petronas Chem, etc.

2. Calculate:
   - Average within-sector correlation
   - Average cross-sector correlation

3. Create a heatmap grouped by sector

4. Answer: Is within-sector correlation higher than cross-sector? By how much?

---

In [None]:
# Your code here


### Exercise 4: Correlation Breakdown Detection

**Task**: Analyze correlation during market stress.

Using the full 2023 data:

1. Calculate 30-day rolling correlation between Maybank and Top Glove
2. Identify periods where correlation:
   - Was above 0.5 (high correlation)
   - Was below 0 (negative correlation)
   - Changed by more than 0.3 in a short period (correlation breakdown)

3. For each identified period, check:
   - What were the price movements?
   - What was the market volatility?
   - Can you explain why correlation changed?

4. Create a visualization highlighting these periods

**Bonus**: Research if there were any Malaysian market events during high correlation periods.

---

In [None]:
# Your code here


---

## 📚 Summary

Excellent work! You now understand **correlation mathematics** and its application to portfolio management.

### Key Concepts:

1. **Correlation Coefficient (r)**
   - Formula: r = Cov(X,Y) / (σ_X × σ_Y)
   - Range: -1 (perfect negative) to +1 (perfect positive)
   - Measures strength and direction of linear relationship

2. **Interpreting Correlation**
   - r > 0.7: Strong positive (stocks move together)
   - r ≈ 0: Weak/no relationship
   - r < -0.7: Strong negative (stocks move opposite)

3. **Sector Correlation**
   - Within-sector: Generally higher correlation
   - Cross-sector: Lower correlation, better diversification
   - Malaysian stocks in same sector (Maybank, CIMB) show high correlation

4. **Portfolio Diversification**
   - Portfolio risk depends on correlation
   - Lower correlation → Better risk reduction
   - Formula: σ_p² = w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂

5. **Time-Varying Correlation**
   - Correlation changes over time
   - Crisis periods → correlations increase
   - Use rolling windows to monitor changes

### What You've Learned:

✅ Calculate correlation coefficient from scratch  
✅ Create and interpret correlation matrices  
✅ Understand sector correlation patterns in KLSE  
✅ Use correlation for portfolio risk reduction  
✅ Analyze time-varying correlation  
✅ Recognize correlation limitations  

### ⚠️ Critical Warnings:

1. **Correlation ≠ Causation**: Don't assume one causes the other
2. **Historical ≠ Future**: Past correlation can break down
3. **Crisis Behavior**: Correlations tend to rise toward 1 during crashes
4. **Linear Only**: Pearson correlation only captures linear relationships

### 🎯 Practical Applications:

- **Portfolio Construction**: Choose stocks with low correlation
- **Sector Analysis**: Understand which sectors move together
- **Risk Management**: Monitor correlation breakdown
- **Pair Trading**: Find highly correlated pairs for trading strategies
- **Hedging**: Use negative correlation for protection

---

## 🔜 What's Next?

In **Module 09: Probability and Risk Management**, you'll learn:
- Win rate vs risk/reward ratio mathematics
- Expected value calculations for trading
- Position sizing formulas (fixed fraction, Kelly Criterion)
- Maximum drawdown calculations
- Risk of ruin mathematics

**Ready?** Move on to Module 09 when you can:
- ✅ Calculate correlation coefficient manually
- ✅ Interpret correlation matrices and heatmaps
- ✅ Explain portfolio risk reduction through diversification
- ✅ Complete all exercises without looking at solutions

---

### 📖 Additional Resources:

- [Correlation and Diversification](https://www.investopedia.com/terms/c/correlation.asp)
- [Modern Portfolio Theory](https://www.investopedia.com/terms/m/modernportfoliotheory.asp)
- [Portfolio Optimization](https://www.investopedia.com/terms/p/portfolio-optimization.asp)

---

**Fantastic work!** You now understand how to mathematically analyze stock relationships! 🎉
