# Day 1: NumPy for Financial Arrays

## Week 1 - Python for Quantitative Finance

### 🎯 Learning Objectives
- Master NumPy array creation and manipulation for financial data
- Understand vectorization and why it matters for performance
- Apply broadcasting for efficient calculations
- Implement core financial computations using NumPy

### ⏱️ Time Allocation
- Theory review: 30 min
- Guided exercises: 90 min
- Practice problems: 60 min
- Interview prep: 30 min

---

**Author**: ML Quant Finance Mastery  
**Difficulty**: Foundation  
**Prerequisites**: Basic Python

## 1. Setup and Data Loading

In [12]:
import numpy as np
import pandas as pd
from pathlib import Path
import time

# Set random seed for reproducibility
np.random.seed(42)

# Load real market data
DATA_DIR = Path("../datasets/raw_data")
prices_df = pd.read_csv(DATA_DIR / "combined_adjusted_close.csv", index_col=0, parse_dates=True)

# Extract a few stocks for examples
tickers = ['AAPL', 'MSFT', 'GOOGL', 'JPM', 'GS']
prices = prices_df[tickers].dropna()

print(f"✅ Data loaded: {prices.shape[0]} days, {len(tickers)} stocks")
print(f"📅 Date range: {prices.index[0].strftime('%Y-%m-%d')} to {prices.index[-1].strftime('%Y-%m-%d')}")
print(f"\n📊 Sample prices:")
prices.tail()

✅ Data loaded: 1771 days, 5 stocks
📅 Date range: 2019-01-02 to 2026-01-16

📊 Sample prices:

| Date | AAPL | MSFT | GOOGL | JPM | GS |
|------------|------------|------------|------------|------------|------------|
| 2026-01-12 | 260.25 | 477.179993 | 331.859985 | 324.48999 | 949.549988 |
| 2026-01-13 | 261.049988 | 470.670013 | 335.970001 | 310.899994 | 938.150024 |
| 2026-01-14 | 259.959991 | 459.380005 | 335.839996 | 307.869995 | 932.669983 |
| 2026-01-15 | 258.209991 | 456.660004 | 332.779999 | 309.26001 | 975.859985 |
| 2026-01-16 | 255.529999 | 459.859985 | 330.0 | 312.470001 | 962.0 |


## 2. NumPy Array Fundamentals

### 2.1 Creating Arrays from Financial Data

NumPy arrays are the foundation of quantitative finance in Python. They offer:
- **Homogeneous data types** (all elements same type → fast)
- **Contiguous memory** (cache-friendly → fast)
- **Vectorized operations** (no Python loops → fast)
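
The three properties listed above can be inspected directly on any array. The following is a minimal sketch on a small made-up price matrix; the name `demo_prices` and its values are illustrative and do not come from the dataset.

```python
import numpy as np

# Hypothetical 3-day x 2-stock price matrix (illustrative values only)
demo_prices = np.array([[100.0, 50.0],
                        [101.0, 49.5],
                        [102.5, 50.5]])

# Homogeneous dtype: every element is stored as the same 8-byte float
print(demo_prices.dtype, demo_prices.itemsize)

# Contiguous memory: rows are laid out back to back (C order)
print(demo_prices.flags['C_CONTIGUOUS'])

# Strides: bytes to step to the next row and to the next column
print(demo_prices.strides)

# Vectorized operation: one expression, no Python-level loop
demo_returns = demo_prices[1:] / demo_prices[:-1] - 1
print(demo_returns.shape)
```

Because the layout is row-major, a single day (one row) is a contiguous block, while a single stock (one column) is read with a stride of one full row.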

In [13]:
# Convert DataFrame to NumPy array
price_array = prices.values

print(f"Array shape: {price_array.shape}")
print(f"Data type: {price_array.dtype}")
print(f"Memory size: {price_array.nbytes / 1024:.2f} KB")

# Verify the structure
print(f"\nRows = trading days ({price_array.shape[0]})")
print(f"Columns = stocks ({price_array.shape[1]})")

# Access patterns
print(f"\n📊 Single stock (AAPL) - all days: shape {price_array[:, 0].shape}")
print(f"📊 Single day (last) - all stocks: shape {price_array[-1, :].shape}")
print(f"📊 Last 5 days, first 3 stocks: shape {price_array[-5:, :3].shape}")

Array shape: (1771, 5)
Data type: float64
Memory size: 69.18 KB

Rows = trading days (1771)
Columns = stocks (5)

📊 Single stock (AAPL) - all days: shape (1771,)
📊 Single day (last) - all stocks: shape (5,)
📊 Last 5 days, first 3 stocks: shape (5, 3)


### 2.2 Vectorization: The Key to Performance

**Why does vectorization matter?**

In quant finance, you often need to:
- Calculate returns for 1000+ stocks × 5000+ days
- Run Monte Carlo with 100,000+ simulations
- Optimize portfolios in real-time

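Pure-Python loops pay interpreter overhead on every element, which is too slow at this scale. Vectorized NumPy operations run the same arithmetic in precompiled C loops.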

In [14]:
# Performance comparison: Loop vs Vectorized

# Task: Calculate simple returns for all stocks
n_iterations = 100

# METHOD 1: Python loops (slow)
def calculate_returns_loop(prices):
    n_days, n_stocks = prices.shape
    returns = np.zeros((n_days - 1, n_stocks))
    for i in range(1, n_days):
        for j in range(n_stocks):
            returns[i-1, j] = (prices[i, j] - prices[i-1, j]) / prices[i-1, j]
    return returns

# METHOD 2: NumPy vectorized (fast)
def calculate_returns_vectorized(prices):
    return (prices[1:] - prices[:-1]) / prices[:-1]

# Time both methods
start = time.perf_counter()
for _ in range(n_iterations):
    returns_loop = calculate_returns_loop(price_array)
loop_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(n_iterations):
    returns_vec = calculate_returns_vectorized(price_array)
vec_time = time.perf_counter() - start

print("⏱️ PERFORMANCE COMPARISON")
print("=" * 50)
print(f"Loop method:       {loop_time:.4f} seconds")
print(f"Vectorized method: {vec_time:.4f} seconds")
print(f"Speedup:           {loop_time/vec_time:.1f}x faster!")

# Verify results are identical
print(f"\n✅ Results match: {np.allclose(returns_loop, returns_vec)}")

⏱️ PERFORMANCE COMPARISON
Loop method:       0.3240 seconds
Vectorized method: 0.0058 seconds
Speedup:           55.4x faster!

✅ Results match: True
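
A single `time.perf_counter()` measurement like the one above can be noisy. As a hedged alternative, the sketch below uses the standard-library `timeit` module and also compares the slicing formula with an equivalent `np.diff` formulation. The synthetic price matrix and the function names `returns_slicing` and `returns_diff` are assumptions made for illustration.

```python
import timeit
import numpy as np

# Synthetic price matrix standing in for price_array (values are made up)
rng = np.random.default_rng(0)
synthetic_prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(500, 5)), axis=0))

def returns_slicing(p):
    # Same slicing formula as calculate_returns_vectorized
    return (p[1:] - p[:-1]) / p[:-1]

def returns_diff(p):
    # Equivalent formulation using np.diff along the time axis
    return np.diff(p, axis=0) / p[:-1]

# repeat() returns one total time per run; the minimum is the least disturbed run
t_slice = min(timeit.repeat(lambda: returns_slicing(synthetic_prices), number=200, repeat=5))
t_diff = min(timeit.repeat(lambda: returns_diff(synthetic_prices), number=200, repeat=5))

print(f"slicing: {t_slice / 200 * 1e6:.1f} us per call")
print(f"np.diff: {t_diff / 200 * 1e6:.1f} us per call")
print(np.allclose(returns_slicing(synthetic_prices), returns_diff(synthetic_prices)))
```

The absolute timings depend on hardware and array size, so treat any speedup factor as specific to the machine it was measured on.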


## 3. Core Financial Calculations with NumPy

### 3.1 Returns: Simple vs Log

In [15]:
# Simple returns: R_t = (P_t - P_{t-1}) / P_{t-1}
simple_returns = (price_array[1:] - price_array[:-1]) / price_array[:-1]

# Log returns: r_t = ln(P_t / P_{t-1})
log_returns = np.log(price_array[1:] / price_array[:-1])

print("📊 RETURNS COMPARISON")
print("=" * 60)
print(f"\nSimple Returns (first 5 days, AAPL):")
print(simple_returns[:5, 0].round(4))

print(f"\nLog Returns (first 5 days, AAPL):")
print(log_returns[:5, 0].round(4))

# Key difference: additivity
print("\n" + "=" * 60)
print("📐 KEY PROPERTY: Log returns are ADDITIVE")
print("=" * 60)

# Multi-period return calculation
n_days = 20  # Calculate 20-day return

# Simple returns: must compound (multiply)
simple_20d = np.prod(1 + simple_returns[:n_days, 0]) - 1

# Log returns: just add
log_20d = np.sum(log_returns[:n_days, 0])

print(f"\n20-day return (AAPL):")
print(f"  Simple (compounded): {simple_20d:.4f} ({simple_20d*100:.2f}%)")
print(f"  Log (summed):        {log_20d:.4f} ({log_20d*100:.2f}%)")
print(f"  Log → Simple:        {np.exp(log_20d) - 1:.4f}")  # Convert back

📊 RETURNS COMPARISON

Simple Returns (first 5 days, AAPL):
[-0.0996  0.0427 -0.0022  0.0191  0.017 ]

Log Returns (first 5 days, AAPL):
[-0.1049  0.0418 -0.0022  0.0189  0.0168]

📐 KEY PROPERTY: Log returns are ADDITIVE

20-day return (AAPL):
  Simple (compounded): 0.0540 (5.40%)
  Log (summed):        0.0525 (5.25%)
  Log → Simple:        0.0540
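
The additivity shown above holds across time. Across assets the roles reverse: simple returns combine linearly with portfolio weights, and log returns do not. The sketch below checks both statements on a made-up two-asset price history; `demo_prices` and the equal weights `w` are assumptions for illustration.

```python
import numpy as np

# Hypothetical two-asset price history (illustrative values only)
demo_prices = np.array([[100.0, 200.0],
                        [110.0, 190.0],
                        [121.0, 209.0]])
w = np.array([0.5, 0.5])  # assumed equal weights

simple = demo_prices[1:] / demo_prices[:-1] - 1
logr = np.log(demo_prices[1:] / demo_prices[:-1])

# Across time: log returns add up to the log of the total price ratio
total_log = logr.sum(axis=0)
assert np.allclose(total_log, np.log(demo_prices[-1] / demo_prices[0]))

# Across assets: the one-period portfolio return is the weighted sum of simple returns
port_simple = simple[0] @ w
port_value_ratio = (w * demo_prices[1] / demo_prices[0]).sum()
assert np.isclose(port_simple, port_value_ratio - 1)

# The weighted sum of log returns is NOT the portfolio's log return
print(logr[0] @ w, np.log(1 + port_simple))
```

The two printed numbers differ, which is why portfolio aggregation is usually done in simple returns and multi-period aggregation in log returns.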


### 3.2 Volatility and Risk Metrics

In [16]:
# Calculate key risk metrics for each stock
TRADING_DAYS = 252
RISK_FREE_RATE = 0.05  # 5% annual

# Daily metrics
daily_mean = np.mean(simple_returns, axis=0)
daily_std = np.std(simple_returns, axis=0, ddof=1)  # ddof=1 for sample std

# Annualized metrics
annual_return = daily_mean * TRADING_DAYS
annual_vol = daily_std * np.sqrt(TRADING_DAYS)

# Sharpe Ratio
sharpe_ratio = (annual_return - RISK_FREE_RATE) / annual_vol

# Max Drawdown
def calculate_max_drawdown(prices):
    """Calculate maximum drawdown for each column."""
    cummax = np.maximum.accumulate(prices, axis=0)
    drawdown = (prices - cummax) / cummax
    return np.min(drawdown, axis=0)

max_dd = calculate_max_drawdown(price_array)

# Display results
print("📊 RISK METRICS SUMMARY")
print("=" * 70)
print(f"\n{'Metric':<20} " + " ".join(f"{t:>10}" for t in tickers))
print("-" * 70)
print(f"{'Ann. Return':<20} " + " ".join(f"{r*100:>9.2f}%" for r in annual_return))
print(f"{'Ann. Volatility':<20} " + " ".join(f"{v*100:>9.2f}%" for v in annual_vol))
print(f"{'Sharpe Ratio':<20} " + " ".join(f"{s:>10.2f}" for s in sharpe_ratio))
print(f"{'Max Drawdown':<20} " + " ".join(f"{d*100:>9.2f}%" for d in max_dd))

📊 RISK METRICS SUMMARY

Metric                     AAPL       MSFT      GOOGL        JPM         GS
----------------------------------------------------------------------
Ann. Return              32.12%     26.53%     31.12%     23.64%     31.84%
Ann. Volatility          31.00%     28.33%     31.30%     29.89%     31.67%
Sharpe Ratio               0.87       0.76       0.83       0.62       0.85
Max Drawdown            -33.36%    -37.15%    -44.32%    -43.63%    -45.62%
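
The running-peak logic inside `calculate_max_drawdown` is easier to follow on a handful of numbers. Here is a small sketch on a made-up price path (the array `path` is hypothetical):

```python
import numpy as np

# Hypothetical price path: rises to 120, falls to 90, recovers to 110
path = np.array([100.0, 120.0, 108.0, 90.0, 110.0])

# Highest price seen so far at each point in time
running_peak = np.maximum.accumulate(path)

# Fractional distance below the running peak (0 at new highs, negative otherwise)
drawdown = (path - running_peak) / running_peak

print(running_peak)
print(drawdown.round(3))
print(f"Max drawdown: {drawdown.min():.1%}")  # peak 120 -> trough 90
```

The worst value is (90 - 120) / 120 = -25%, and the later recovery to 110 does not change it because drawdown is always measured from the highest price reached so far.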


## 4. Broadcasting: Efficient Cross-Sectional Calculations

Broadcasting allows operations between arrays of different shapes. This is essential for:
- Demeaning returns (subtract mean from each stock)
- Standardizing data (z-scores)
- Portfolio calculations

In [17]:
# Broadcasting example: Z-score normalization

# Step 1: Calculate mean and std for each stock (across time)
means = np.mean(simple_returns, axis=0)  # Shape: (5,)
stds = np.std(simple_returns, axis=0)    # Shape: (5,); default ddof=0 (population std), unlike ddof=1 in Section 3.2

print(f"Returns shape:    {simple_returns.shape}")  # (1770, 5)
print(f"Means shape:      {means.shape}")           # (5,)
print(f"Stds shape:       {stds.shape}")            # (5,)

# Step 2: Broadcasting automatically aligns dimensions
# (1770, 5) - (5,) → broadcasts to (1770, 5) - (1770, 5)
z_scores = (simple_returns - means) / stds

print(f"Z-scores shape:   {z_scores.shape}")

# Verify z-scores have mean ≈ 0 and std ≈ 1
print(f"\n✅ Z-score verification:")
print(f"   Means: {np.mean(z_scores, axis=0).round(10)}")  # Should be ~0
print(f"   Stds:  {np.std(z_scores, axis=0).round(4)}")    # Should be ~1

Returns shape:    (1770, 5)
Means shape:      (5,)
Stds shape:       (5,)
Z-scores shape:   (1770, 5)

✅ Z-score verification:
   Means: [ 0.  0. -0.  0.  0.]
   Stds:  [1. 1. 1. 1. 1.]
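
The z-score example subtracts per-stock statistics, whose shape `(5,)` already lines up with the trailing axis. Cross-sectional demeaning (subtracting each day's average across stocks) needs one extra step, sketched below on a made-up 4 x 3 returns matrix. The array `r` is hypothetical.

```python
import numpy as np

# Hypothetical returns: 4 days x 3 stocks (illustrative values only)
r = np.array([[ 0.01,  0.02, -0.01],
              [ 0.00, -0.01,  0.03],
              [ 0.02,  0.01,  0.00],
              [-0.01,  0.00,  0.01]])

# Time-series demeaning: the per-stock mean has shape (3,), which lines up
# with the trailing axis of (4, 3), so plain subtraction broadcasts
ts_demeaned = r - r.mean(axis=0)

# Cross-sectional demeaning: the per-day mean has shape (4,), which does NOT
# line up with (4, 3); keepdims=True keeps it as (4, 1) so it broadcasts
daily_mean = r.mean(axis=1, keepdims=True)
xs_demeaned = r - daily_mean

print(daily_mean.shape)
print(np.allclose(xs_demeaned.sum(axis=1), 0))
```

Without `keepdims=True`, `r - r.mean(axis=1)` raises a `ValueError` here because shapes `(4, 3)` and `(4,)` cannot be broadcast together.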


## 5. Correlation and Covariance Matrices

These matrices are fundamental to portfolio theory and risk management.

In [18]:
# Correlation matrix
corr_matrix = np.corrcoef(simple_returns.T)  # Transpose: stocks as rows

# Covariance matrix (annualized)
cov_matrix = np.cov(simple_returns.T) * TRADING_DAYS

print("📊 CORRELATION MATRIX")
print("=" * 60)
print(f"\n{'':>10}" + "".join(f"{t:>10}" for t in tickers))
for i, ticker in enumerate(tickers):
    print(f"{ticker:>10}" + "".join(f"{corr_matrix[i,j]:>10.3f}" for j in range(len(tickers))))

print("\n\n📊 ANNUALIZED COVARIANCE MATRIX")
print("=" * 60)
print(f"\n{'':>10}" + "".join(f"{t:>10}" for t in tickers))
for i, ticker in enumerate(tickers):
    print(f"{ticker:>10}" + "".join(f"{cov_matrix[i,j]:>10.4f}" for j in range(len(tickers))))

# Key insight: diagonal = variance, off-diagonal = covariance
print("\n📐 Key insight:")
print(f"   Diagonal elements = Variance (volatility²)")
print(f"   AAPL variance: {cov_matrix[0,0]:.4f}, volatility: {np.sqrt(cov_matrix[0,0]):.4f}")

📊 CORRELATION MATRIX

                AAPL      MSFT     GOOGL       JPM        GS
      AAPL     1.000     0.699     0.616     0.421     0.479
      MSFT     0.699     1.000     0.693     0.428     0.475
     GOOGL     0.616     0.693     1.000     0.400     0.451
       JPM     0.421     0.428     0.400     1.000     0.821
        GS     0.479     0.475     0.451     0.821     1.000


📊 ANNUALIZED COVARIANCE MATRIX

                AAPL      MSFT     GOOGL       JPM        GS
      AAPL    0.0961    0.0614    0.0598    0.0390    0.0470
      MSFT    0.0614    0.0802    0.0614    0.0362    0.0426
     GOOGL    0.0598    0.0614    0.0980    0.0375    0.0447
       JPM    0.0390    0.0362    0.0375    0.0893    0.0777
        GS    0.0470    0.0426    0.0447    0.0777    0.1003

📐 Key insight:
   Diagonal elements = Variance (volatility²)
   AAPL variance: 0.0961, volatility: 0.3100
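
The two matrices above are linked by a simple rescaling: each covariance is divided by the product of the two volatilities. The sketch below illustrates this on synthetic random returns; `sim_returns` is generated data, not the notebook's dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic daily returns: 250 days x 3 assets (random, for illustration)
sim_returns = rng.normal(0.0, 0.01, size=(250, 3))

cov = np.cov(sim_returns, rowvar=False)      # rowvar=False: columns are variables
vol = np.sqrt(np.diag(cov))                  # per-asset standard deviation

# corr_ij = cov_ij / (vol_i * vol_j); the outer product builds the denominator
corr_from_cov = cov / np.outer(vol, vol)

print(np.allclose(corr_from_cov, np.corrcoef(sim_returns, rowvar=False)))
```

Multiplying the covariance matrix by a constant such as `TRADING_DAYS` scales numerator and denominator equally, so annualizing leaves the correlation matrix unchanged.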


## 6. Portfolio Calculations

### 6.1 Portfolio Return and Risk

For a portfolio with weights $w$, returns $r$, and covariance matrix $\Sigma$:

$$R_p = w^T r \quad \text{(Portfolio Return)}$$
$$\sigma_p^2 = w^T \Sigma w \quad \text{(Portfolio Variance)}$$

In [19]:
# Define portfolio weights (equal-weighted)
weights = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# Portfolio return
portfolio_return = np.dot(weights, annual_return)

# Portfolio variance using matrix multiplication
portfolio_variance = np.dot(weights.T, np.dot(cov_matrix, weights))
portfolio_volatility = np.sqrt(portfolio_variance)

# Portfolio Sharpe ratio
portfolio_sharpe = (portfolio_return - RISK_FREE_RATE) / portfolio_volatility

print("📊 EQUAL-WEIGHT PORTFOLIO METRICS")
print("=" * 50)
print(f"\nWeights: {dict(zip(tickers, weights))}")
print(f"\nExpected Annual Return: {portfolio_return*100:.2f}%")
print(f"Portfolio Volatility:   {portfolio_volatility*100:.2f}%")
print(f"Portfolio Sharpe Ratio: {portfolio_sharpe:.2f}")

# Compare to individual stocks
print(f"\n📊 DIVERSIFICATION BENEFIT")
print("-" * 50)
avg_individual_vol = np.mean(annual_vol)
print(f"Average individual volatility: {avg_individual_vol*100:.2f}%")
print(f"Portfolio volatility:          {portfolio_volatility*100:.2f}%")
print(f"Risk reduction:                {(1 - portfolio_volatility/avg_individual_vol)*100:.1f}%")

📊 EQUAL-WEIGHT PORTFOLIO METRICS

Weights: {'AAPL': np.float64(0.2), 'MSFT': np.float64(0.2), 'GOOGL': np.float64(0.2), 'JPM': np.float64(0.2), 'GS': np.float64(0.2)}

Expected Annual Return: 29.05%
Portfolio Volatility:   24.32%
Portfolio Sharpe Ratio: 0.99

📊 DIVERSIFICATION BENEFIT
--------------------------------------------------
Average individual volatility: 30.44%
Portfolio volatility:          24.32%
Risk reduction:                20.1%
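
For two assets, the matrix formula $\sigma_p^2 = w^T \Sigma w$ expands to $w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2$, which makes the source of the diversification benefit visible. The sketch below checks that both forms agree; the volatilities, correlation, and weights are assumed values, not estimates from the dataset.

```python
import numpy as np

# Assumed inputs for a two-asset example (not from the dataset)
vol = np.array([0.30, 0.20])      # annual volatilities
rho = 0.5                         # correlation between the two assets
w = np.array([0.6, 0.4])          # portfolio weights, summing to 1

# Build the covariance matrix from volatilities and correlation
corr = np.array([[1.0, rho],
                 [rho, 1.0]])
cov = np.outer(vol, vol) * corr

# Matrix form: w' Sigma w
var_matrix = w @ cov @ w

# Expanded form: w1^2 s1^2 + w2^2 s2^2 + 2 w1 w2 rho s1 s2
var_expanded = ((w[0] * vol[0]) ** 2 + (w[1] * vol[1]) ** 2
                + 2 * w[0] * w[1] * rho * vol[0] * vol[1])

print(f"{np.sqrt(var_matrix):.4f}", np.isclose(var_matrix, var_expanded))
```

With these inputs the portfolio variance is 0.0532, so the volatility is about 23.1%, below the weighted-average volatility of 26%. The gap closes only when `rho` equals 1.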


## 7. Practice Problems

### Problem 1: Rolling Volatility
Calculate 20-day rolling volatility for AAPL using NumPy (no pandas rolling!).

In [20]:
# SOLUTION: Rolling volatility using stride tricks

def rolling_volatility(returns: np.ndarray, window: int) -> np.ndarray:
    """
    Calculate rolling volatility using NumPy stride tricks.
    
    This is faster than looping but more complex.
    In practice, pandas rolling is preferred for readability.
    """
    n = len(returns)
    
    # Method 1: Simple loop (baseline)
    # rolling_std = np.array([returns[i:i+window].std(ddof=1) 
    #                         for i in range(n - window + 1)])
    
    # Method 2: Stride tricks (advanced, faster)
    from numpy.lib.stride_tricks import sliding_window_view
    windows = sliding_window_view(returns, window)
    rolling_std = np.std(windows, axis=1, ddof=1)
    
    return rolling_std * np.sqrt(TRADING_DAYS)  # Annualize

# Calculate for AAPL
aapl_returns = simple_returns[:, 0]
rolling_vol = rolling_volatility(aapl_returns, window=20)

print(f"📊 Rolling 20-day Volatility (AAPL)")
print(f"   Shape: {rolling_vol.shape}")
print(f"   Min:   {rolling_vol.min()*100:.2f}%")
print(f"   Max:   {rolling_vol.max()*100:.2f}%")
print(f"   Mean:  {rolling_vol.mean()*100:.2f}%")
print(f"\n   Last 5 values: {rolling_vol[-5:].round(4)}")

📊 Rolling 20-day Volatility (AAPL)
   Shape: (1751,)
   Min:   9.46%
   Max:   107.95%
   Mean:  27.78%

   Last 5 values: [0.1131 0.1144 0.1057 0.1053 0.1055]
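
To see what `sliding_window_view` produces, it helps to run it on a tiny array first. The sketch below uses a six-element input and a window of 3, both chosen for illustration, and cross-checks the result against an explicit loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(1.0, 7.0)                 # [1, 2, 3, 4, 5, 6]
windows = sliding_window_view(x, 3)     # available in NumPy 1.20 and later

print(windows.shape)                    # n - window + 1 rows
print(windows)

# Each row is one window; reducing along axis=1 gives the rolling statistic
rolling_std = windows.std(axis=1, ddof=1)

# Cross-check against an explicit loop over the same windows
loop_std = np.array([x[i:i + 3].std(ddof=1) for i in range(len(x) - 3 + 1)])
print(np.allclose(rolling_std, loop_std))
```

The result has `n - window + 1` rows, which matches the `(1751,)` shape above for 1770 returns and a 20-day window. The windows are a read-only view on the original data, so no copies are made until the reduction runs.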


### Problem 2: Monte Carlo Simulation
Simulate 10,000 possible 1-year price paths for AAPL assuming geometric Brownian motion.

In [21]:
# Monte Carlo simulation using Geometric Brownian Motion
# dS = μS dt + σS dW

# Parameters
S0 = price_array[-1, 0]  # Current AAPL price
mu = annual_return[0]     # Drift (expected return)
sigma = annual_vol[0]     # Volatility
T = 1.0                   # Time horizon (1 year)
n_steps = 252             # Daily steps
n_simulations = 10000

# Time step
dt = T / n_steps

# Generate random shocks (all at once for efficiency)
np.random.seed(42)
Z = np.random.standard_normal((n_simulations, n_steps))

# Simulate paths using vectorized operations
# S(t+dt) = S(t) * exp((μ - σ²/2)dt + σ√dt * Z)
drift = (mu - 0.5 * sigma**2) * dt
diffusion = sigma * np.sqrt(dt) * Z

# Cumulative sum of log returns
log_returns_sim = drift + diffusion
cum_log_returns = np.cumsum(log_returns_sim, axis=1)

# Convert to prices
price_paths = S0 * np.exp(cum_log_returns)

# Add initial price
price_paths = np.column_stack([np.full(n_simulations, S0), price_paths])

print(f"📊 MONTE CARLO SIMULATION RESULTS")
print(f"=" * 50)
print(f"Initial price: ${S0:.2f}")
print(f"Simulations:   {n_simulations:,}")
print(f"Time horizon:  {T} year ({n_steps} days)")
print(f"\n📈 Final Price Distribution:")
final_prices = price_paths[:, -1]
print(f"   Mean:   ${np.mean(final_prices):.2f}")
print(f"   Median: ${np.median(final_prices):.2f}")
print(f"   5th %:  ${np.percentile(final_prices, 5):.2f}")
print(f"   95th %: ${np.percentile(final_prices, 95):.2f}")
print(f"\n📉 Value at Risk (95%):")
print(f"   VaR: ${S0 - np.percentile(final_prices, 5):.2f} ({(1 - np.percentile(final_prices, 5)/S0)*100:.1f}%)")

📊 MONTE CARLO SIMULATION RESULTS
Initial price: $255.53
Simulations:   10,000
Time horizon:  1.0 year (252 days)

📈 Final Price Distribution:
   Mean:   $351.17
   Median: $334.78
   5th %:  $200.14
   95th %: $555.24

📉 Value at Risk (95%):
   VaR: $55.39 (21.7%)
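
A useful sanity check for a GBM simulation is to compare the simulated terminal prices with the closed-form lognormal mean $S_0 e^{\mu T}$ and median $S_0 e^{(\mu - \sigma^2/2)T}$. The sketch below does this with assumed parameters (not the fitted AAPL values) and NumPy's `default_rng` generator in place of the legacy `np.random.seed` interface.

```python
import numpy as np

# Assumed parameters for illustration (not the fitted AAPL values)
S0, mu, sigma, T = 100.0, 0.10, 0.25, 1.0
n_steps, n_sims = 252, 10_000
dt = T / n_steps

rng = np.random.default_rng(7)
Z = rng.standard_normal((n_sims, n_steps))

# Exact GBM step in log space, summed over the horizon
log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
S_T = S0 * np.exp(log_increments.sum(axis=1))

# Closed-form moments of the lognormal terminal price
analytic_mean = S0 * np.exp(mu * T)
analytic_median = S0 * np.exp((mu - 0.5 * sigma**2) * T)

print(f"mean:   simulated {S_T.mean():.2f} vs analytic {analytic_mean:.2f}")
print(f"median: simulated {np.median(S_T):.2f} vs analytic {analytic_median:.2f}")
```

The same skew explains why the mean final price in the results above ($351.17) sits above the median ($334.78): the lognormal distribution has a long right tail, and the gap widens as `sigma` grows.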


## 8. Interview Practice Questions

### Question 1 (Jane Street style)
*You have daily returns for 100 stocks over 5 years. How would you efficiently compute the correlation between every pair of stocks?*

In [22]:
# SOLUTION to Interview Question 1

# Simulate the data
n_stocks = 100
n_days = 252 * 5  # 5 years
returns_large = np.random.randn(n_days, n_stocks) * 0.02  # Simulated returns

# Efficient correlation computation
start = time.perf_counter()
corr_full = np.corrcoef(returns_large.T)  # np.corrcoef expects features as rows
elapsed = time.perf_counter() - start

print(f"📊 INTERVIEW ANSWER")
print("=" * 50)
print(f"Input: {n_stocks} stocks × {n_days} days")
print(f"Output: {n_stocks}×{n_stocks} correlation matrix")
print(f"Unique pairs: {n_stocks * (n_stocks - 1) // 2:,}")
print(f"Computation time: {elapsed*1000:.2f} ms")

print(f"\n💡 Key insight: np.corrcoef() uses efficient linear algebra")
print(f"   Under the hood: demean → matrix multiply (BLAS) → divide by standard deviations")
print(f"\n   corr = np.corrcoef(returns.T)  # That's it!")

📊 INTERVIEW ANSWER
Input: 100 stocks × 1260 days
Output: 100×100 correlation matrix
Unique pairs: 4,950
Computation time: 4.93 ms

💡 Key insight: np.corrcoef() uses efficient linear algebra
   Under the hood: demean → matrix multiply (BLAS) → divide by standard deviations

   corr = np.corrcoef(returns.T)  # That's it!
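
A natural follow-up in an interview is how to pull the unique pairs out of the symmetric matrix. One possible approach, sketched here on a small synthetic example, is to index the strict upper triangle with `np.triu_indices`. The sizes and the artificially correlated pair are assumptions chosen so the result is easy to verify.

```python
import numpy as np

rng = np.random.default_rng(3)
n_days, n_stocks = 500, 6                      # small sizes for illustration
sim = rng.standard_normal((n_days, n_stocks)) * 0.02
sim[:, 1] = 0.9 * sim[:, 0] + 0.1 * sim[:, 1]  # make stocks 0 and 1 highly correlated

corr = np.corrcoef(sim, rowvar=False)

# Indices of the strict upper triangle: each unordered pair appears once
i_idx, j_idx = np.triu_indices(n_stocks, k=1)
pair_corr = corr[i_idx, j_idx]

print(len(pair_corr))                          # n * (n - 1) / 2 pairs
best = np.argmax(pair_corr)
print(f"Most correlated pair: ({i_idx[best]}, {j_idx[best]}), rho = {pair_corr[best]:.3f}")
```

With 100 stocks the same indexing yields the 4,950 pairs reported above, and sorting `pair_corr` ranks them without visiting the diagonal or the duplicated lower triangle.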


## 9. Summary & Key Takeaways

### ✅ What You Learned Today

1. **NumPy arrays** are the foundation for efficient financial calculations
2. **Vectorization** gave a roughly 55x speedup over Python loops in today's benchmark; the exact factor depends on array size and hardware
3. **Broadcasting** enables elegant cross-sectional calculations
4. **Returns**: Simple returns aggregate across assets (portfolio weighting); log returns aggregate across time (they add)
5. **Risk metrics**: Volatility, Sharpe, Max Drawdown
6. **Portfolio math**: $R_p = w^T r$, $\sigma_p^2 = w^T \Sigma w$

### 🎯 Interview Tips

- Prefer vectorized operations to Python loops, and be ready to explain why they are faster
- Know the difference between simple and log returns
- Understand correlation vs covariance
- Be comfortable with matrix notation for portfolio calculations

### 📚 Tomorrow's Preview

**Day 2: Pandas TimeSeries & Point-in-Time Data**
- DatetimeIndex mastery
- Resampling and alignment
- Look-ahead bias prevention