# Week 1: Python & Math Foundations for Quantitative Finance

---

## 🎯 What You'll Learn This Week

This week builds your foundation for everything that follows. Think of it like learning the alphabet before writing essays - these are the essential tools every quant uses daily.

**By the end of this week, you'll understand:**
- How to work with financial data efficiently using NumPy and Pandas
- The difference between simple and log returns (and why it matters!)
- How to measure risk using volatility
- Basic statistics that drive trading decisions

**Why This Matters for Trading:**
Every trading strategy, from simple moving averages to complex machine learning models, relies on these fundamentals. Quants are often said to spend most of their time (80% is a commonly quoted figure) on data manipulation and analysis, and that is exactly what we're learning here.

---

## Table of Contents
1. NumPy Fundamentals
2. Pandas for Financial Data
3. Financial Returns
4. Volatility Measures
5. Basic Statistics for Finance

---

In [None]:
# Standard imports and data loading
import numpy as np
import pandas as pd
import yfinance as yf
from datetime import datetime, timedelta

# Standard 5 equities for analysis
tickers = ['AAPL', 'MSFT', 'GOOGL', 'JPM', 'GS']

# Fetch 5 years of data
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)

print("📥 Downloading market data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False, auto_adjust=True)
prices = data['Close'].dropna()
returns = prices.pct_change().dropna()
print(f"✅ Loaded {len(prices)} days of data for {len(tickers)} tickers")
print(f"📅 Date range: {prices.index[0].strftime('%Y-%m-%d')} to {prices.index[-1].strftime('%Y-%m-%d')}")
print(prices.tail())

## 1. NumPy Fundamentals

### 🤔 Why NumPy? (The Real Reason)

Imagine you're a trader who needs the average return of 1,000 stocks over 10 years of daily data (about 2.5 million data points). A pure-Python loop over that many values is typically orders of magnitude slower than NumPy, which runs the same arithmetic in compiled code.

**Real-world analogy:** Think of NumPy as a factory assembly line vs. hand-crafting each item individually. The assembly line (NumPy) processes everything at once.

### Key Concepts Explained Simply

**Arrays**: Think of them as super-powered lists
- 1D array: Like a single column in Excel (e.g., Apple's stock prices over time)
- 2D array: Like a full Excel spreadsheet (e.g., prices of Apple, Google, Microsoft side by side)

**Vectorization**: The magic trick that makes NumPy fast
- Instead of: "For each price, subtract the previous price" (slow loop)
- We say: "Subtract these two arrays" (fast, single operation)

In [3]:
import numpy as np

# Example: Stock prices for 5 days
prices = np.array([100, 102, 101, 105, 103])
print("Stock prices:", prices)

# Vectorized operations - calculate daily changes
# Instead of looping, we use array slicing
daily_changes = prices[1:] - prices[:-1]
print("Daily price changes:", daily_changes)

# Basic statistics
print(f"Mean price: {np.mean(prices):.2f}")
print(f"Standard deviation: {np.std(prices):.2f}")
print(f"Max price: {np.max(prices)}")
print(f"Min price: {np.min(prices)}")

Stock prices: [100 102 101 105 103]
Daily price changes: [ 2 -1  4 -2]
Mean price: 102.20
Standard deviation: 1.72
Max price: 105
Min price: 100
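
For comparison, here is a minimal sketch of the explicit loop that the slicing expression above replaces. The helper name `daily_changes_loop` is hypothetical and exists only for illustration; `np.diff` is NumPy's built-in shortcut for the same subtraction.

```python
import numpy as np

prices = np.array([100, 102, 101, 105, 103])

def daily_changes_loop(values):
    """Explicit Python loop: subtract the previous price from each price."""
    changes = []
    for i in range(1, len(values)):
        changes.append(values[i] - values[i - 1])
    return np.array(changes)

loop_result = daily_changes_loop(prices)
sliced_result = prices[1:] - prices[:-1]   # vectorized slicing
diff_result = np.diff(prices)              # built-in equivalent

print(loop_result)                                  # [ 2 -1  4 -2]
print(np.array_equal(loop_result, sliced_result))   # True
print(np.array_equal(loop_result, diff_result))     # True
```

All three give the same numbers; the vectorized forms simply avoid running the loop body in the Python interpreter.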


### Broadcasting

Broadcasting allows operations between arrays of different shapes. This is useful when applying the same operation across multiple assets.

In [4]:
# Example: Normalize prices to start at 100 for comparison
# Multiple stocks: rows = days, columns = stocks
stock_prices = np.array([
    [100, 50, 200],   # Day 1: Stock A, B, C
    [102, 51, 198],   # Day 2
    [105, 49, 205],   # Day 3
    [103, 52, 210]    # Day 4
])

# Normalize: divide each column by its first value, multiply by 100
# Broadcasting: first_prices is (3,), stock_prices is (4,3)
first_prices = stock_prices[0]  # Shape: (3,)
normalized = (stock_prices / first_prices) * 100

print("Original prices:")
print(stock_prices)
print("\nNormalized (all start at 100):")
print(normalized)

Original prices:
[[100  50 200]
 [102  51 198]
 [105  49 205]
 [103  52 210]]

Normalized (all start at 100):
[[100.  100.  100. ]
 [102.  102.   99. ]
 [105.   98.  102.5]
 [103.  104.  105. ]]
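
Broadcasting aligns shapes starting from the last axis, which is why a `(3,)` vector lines up with the columns of a `(4, 3)` array. To apply one value per row (per day) instead, the vector needs shape `(4, 1)`. The sketch below illustrates both cases; `day_scale` is a hypothetical per-day factor made up for the example.

```python
import numpy as np

stock_prices = np.array([
    [100, 50, 200],
    [102, 51, 198],
    [105, 49, 205],
    [103, 52, 210],
])

# Per-column operation: (4, 3) minus (3,) subtracts each stock's mean price
column_means = stock_prices.mean(axis=0)          # shape (3,)
demeaned = stock_prices - column_means            # shape (4, 3)

# Per-row operation: (4, 3) times (4, 1) scales every stock on a given day.
# Without [:, np.newaxis] the shapes (4, 3) and (4,) do not broadcast.
day_scale = np.array([1.0, 0.5, 2.0, 1.0])        # hypothetical, shape (4,)
scaled = stock_prices * day_scale[:, np.newaxis]  # shape (4, 3)

print(column_means)            # means are 102.5, 50.5 and 203.25
print(demeaned.mean(axis=0))   # each column now averages to zero
print(scaled[1])               # Day 2 scaled by 0.5: 51.0, 25.5, 99.0
```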


---

## 2. Pandas for Financial Data

### Why Pandas?
- **Labeled data**: Dates as index, ticker symbols as columns
- **Missing data handling**: Financial data often has gaps (holidays, missing quotes)
- **Time series functions**: Rolling windows, resampling, shifting

### DataFrame Structure
```
             AAPL    MSFT    GOOGL
2024-01-01  185.5   375.2   140.3
2024-01-02  186.2   376.8   141.5
2024-01-03  184.9   374.1   139.8
```
- **Index**: Dates (DatetimeIndex)
- **Columns**: Asset tickers
- **Values**: Prices

In [5]:
import pandas as pd

# Create a simple price DataFrame
dates = pd.date_range('2024-01-01', periods=5, freq='D')
prices_df = pd.DataFrame({
    'AAPL': [185.5, 186.2, 184.9, 187.3, 186.8],
    'MSFT': [375.2, 376.8, 374.1, 378.5, 377.2]
}, index=dates)

print("Price DataFrame:")
print(prices_df)

# Access specific data
print(f"\nAAPL prices: {prices_df['AAPL'].values}")
print(f"Price on 2024-01-03: \n{prices_df.loc['2024-01-03']}")

Price DataFrame:
             AAPL   MSFT
2024-01-01  185.5  375.2
2024-01-02  186.2  376.8
2024-01-03  184.9  374.1
2024-01-04  187.3  378.5
2024-01-05  186.8  377.2

AAPL prices: [185.5 186.2 184.9 187.3 186.8]
Price on 2024-01-03: 
AAPL    184.9
MSFT    374.1
Name: 2024-01-03 00:00:00, dtype: float64


### Essential Pandas Operations for Finance

In [6]:
# 1. Shifting: Compare today's price to yesterday's
prices_df['AAPL_yesterday'] = prices_df['AAPL'].shift(1)
print("Shifted data (yesterday's price):")
print(prices_df[['AAPL', 'AAPL_yesterday']])

# 2. Percentage change: Built-in return calculation
returns = prices_df[['AAPL', 'MSFT']].pct_change()
print("\nDaily returns:")
print(returns)

# 3. Rolling window: Moving average
prices_df['AAPL_MA3'] = prices_df['AAPL'].rolling(window=3).mean()
print("\n3-day moving average:")
print(prices_df[['AAPL', 'AAPL_MA3']])

Shifted data (yesterday's price):
             AAPL  AAPL_yesterday
2024-01-01  185.5             NaN
2024-01-02  186.2           185.5
2024-01-03  184.9           186.2
2024-01-04  187.3           184.9
2024-01-05  186.8           187.3

Daily returns:
                AAPL      MSFT
2024-01-01       NaN       NaN
2024-01-02  0.003774  0.004264
2024-01-03 -0.006982 -0.007166
2024-01-04  0.012980  0.011762
2024-01-05 -0.002670 -0.003435

3-day moving average:
             AAPL    AAPL_MA3
2024-01-01  185.5         NaN
2024-01-02  186.2         NaN
2024-01-03  184.9  185.533333
2024-01-04  187.3  186.133333
2024-01-05  186.8  186.333333
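
The missing-data handling and resampling features listed under "Why Pandas?" can be sketched on a small made-up series. The `DEMO` ticker, the prices, the gap, and the choice to forward-fill are all assumptions for illustration; whether to fill or drop a gap depends on the analysis.

```python
import numpy as np
import pandas as pd

# Ten business days of made-up prices with one missing quote
dates = pd.date_range('2024-01-01', periods=10, freq='B')
px = pd.Series(
    [100.0, 101.0, np.nan, 103.0, 102.0, 104.0, 105.0, 103.0, 106.0, 107.0],
    index=dates, name='DEMO',
)

# Forward-fill: carry the last known price into the gap
filled = px.ffill()

# Resample daily prices to weekly, keeping the last price of each week
weekly_close = filled.resample('W-FRI').last()
weekly_returns = weekly_close.pct_change()

print(filled.loc[pd.Timestamp('2024-01-03')])   # 101.0, carried forward from Jan 2
print(weekly_close)
print(weekly_returns)
```

The weekly closes are 102.0 and 107.0, so the single weekly return is 107/102 - 1, roughly 4.9%.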


---

## 3. Financial Returns

### Why Returns Instead of Prices?
1. **Comparability**: A \$1 change means different things for a \$10 stock vs. a \$1000 stock
2. **Stationarity**: Prices trend over time; returns fluctuate around a mean
3. **Aggregation**: Returns can be combined across assets and time

### Simple (Arithmetic) Returns

**Definition**: The percentage change in price

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1$$

Where:
- $R_t$ = Return at time t
- $P_t$ = Price at time t
- $P_{t-1}$ = Price at time t-1

**Properties**:
- Easy to interpret: "Stock went up 5%"
- Portfolio returns are weighted averages of individual returns
- NOT additive over time
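
The last two properties can be checked with a few lines of arithmetic. This is a sketch with hypothetical returns and weights, not data from the notebook.

```python
import numpy as np

# Hypothetical one-period simple returns for three assets and portfolio weights
asset_returns = np.array([0.05, -0.02, 0.01])
weights = np.array([0.5, 0.3, 0.2])   # must sum to 1

# Portfolio simple return is the weighted average of asset returns
portfolio_return = weights @ asset_returns

# Cross-check by tracking the dollar values of a 1,000 portfolio
start_values = 1000 * weights
end_values = start_values * (1 + asset_returns)
check = end_values.sum() / start_values.sum() - 1

print(f"Weighted average:   {portfolio_return:.4f}")   # 0.0210
print(f"From dollar values: {check:.4f}")              # 0.0210

# Not additive over time: +10% then -10% does not net to 0%
two_period = (1 + 0.10) * (1 - 0.10) - 1
print(f"+10% then -10%:     {two_period:.4f}")         # -0.0100
```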

In [7]:
# Simple Returns Example
prices = np.array([100, 105, 103, 108, 106])

# Calculate simple returns
simple_returns = (prices[1:] - prices[:-1]) / prices[:-1]
# Or equivalently:
simple_returns_v2 = prices[1:] / prices[:-1] - 1

print("Prices:", prices)
print("Simple returns:", simple_returns)
print(f"\nDay 1: Price went from {prices[0]} to {prices[1]}")
print(f"Return = ({prices[1]} - {prices[0]}) / {prices[0]} = {simple_returns[0]:.4f} = {simple_returns[0]*100:.2f}%")

Prices: [100 105 103 108 106]
Simple returns: [ 0.05       -0.01904762  0.04854369 -0.01851852]

Day 1: Price went from 100 to 105
Return = (105 - 100) / 100 = 0.0500 = 5.00%


### Log (Continuously Compounded) Returns

**Definition**: The natural logarithm of the price ratio

$$r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln(P_t) - \ln(P_{t-1})$$

**Properties**:
- **Time additive**: Multi-period return = sum of single-period returns
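- **Symmetric**: A log return of $+x$ followed by $-x$ brings the price back to where it started (a +10% simple return followed by -10% does not)
- **Approximately normal**: Often modeled as normally distributed, which simplifies statistical analysis (real returns still show fat tails, as Section 5 discusses)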

**Relationship to Simple Returns**:
$$r_t = \ln(1 + R_t)$$
$$R_t = e^{r_t} - 1$$

For small returns (< 10%), they are approximately equal.
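
A quick way to see how fast the approximation degrades is to tabulate both measures. The sketch below uses arbitrary example returns; the gap grows roughly like $R^2/2$, the leading term in the Taylor expansion of $R - \ln(1+R)$. It also checks the symmetry property.

```python
import numpy as np

simple = np.array([0.001, 0.01, 0.05, 0.10, 0.50])
log_r = np.log1p(simple)          # ln(1 + R), numerically accurate for small R
gap = simple - log_r              # roughly R**2 / 2 for small R

for R, r, g in zip(simple, log_r, gap):
    print(f"simple={R:7.3%}  log={r:7.3%}  gap={g:.5f}")

# Symmetry: a log return of +x followed by -x returns to the starting price
x = 0.10
round_trip = np.exp(x) * np.exp(-x) - 1
print(f"Round trip after +{x} and -{x} log returns: {round_trip:.6f}")
```

At a 1% return the two differ by about 0.00005; at 50% the gap is above 0.09, so the choice of measure matters for large moves.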

In [8]:
# Log Returns Example
log_returns = np.log(prices[1:] / prices[:-1])

print("Simple returns:", simple_returns.round(4))
print("Log returns:   ", log_returns.round(4))

# Key advantage: Time additivity
total_simple = (1 + simple_returns).prod() - 1  # Compound simple returns
total_log = log_returns.sum()  # Just sum log returns!

print(f"\nTotal return over period:")
print(f"From prices: ({prices[-1]} - {prices[0]}) / {prices[0]} = {(prices[-1]/prices[0] - 1):.4f}")
print(f"Compounding simple returns: {total_simple:.4f}")
print(f"Sum of log returns: {total_log:.4f} → exp({total_log:.4f}) - 1 = {np.exp(total_log) - 1:.4f}")

Simple returns: [ 0.05   -0.019   0.0485 -0.0185]
Log returns:    [ 0.0488 -0.0192  0.0474 -0.0187]

Total return over period:
From prices: (106 - 100) / 100 = 0.0600
Compounding simple returns: 0.0600
Sum of log returns: 0.0583 → exp(0.0583) - 1 = 0.0600


### Multi-Period Returns

**Simple Returns** (must compound):
$$R_{t,t+n} = (1 + R_{t+1})(1 + R_{t+2})...(1 + R_{t+n}) - 1$$

**Log Returns** (just add):
$$r_{t,t+n} = r_{t+1} + r_{t+2} + ... + r_{t+n}$$

In [9]:
# Practical example: Which is easier for multi-period analysis?
daily_simple = np.array([0.01, -0.02, 0.015, 0.008, -0.005])  # 5 days of returns
daily_log = np.log(1 + daily_simple)  # Convert to log returns

# Weekly return from daily
weekly_simple = np.prod(1 + daily_simple) - 1  # Must multiply
weekly_log = np.sum(daily_log)  # Just add!

print(f"Weekly return (simple): {weekly_simple:.4f} = {weekly_simple*100:.2f}%")
print(f"Weekly return (log): {weekly_log:.4f} = {weekly_log*100:.2f}%")
print(f"Converting log back to simple: {np.exp(weekly_log) - 1:.4f}")

Weekly return (simple): 0.0076 = 0.76%
Weekly return (log): 0.0076 = 0.76%
Converting log back to simple: 0.0076


---

## 4. Volatility Measures

### What is Volatility?
Volatility measures the dispersion of returns - how much returns deviate from their average. It's the most common measure of **risk** in finance.

### Standard Deviation

**Population Standard Deviation**:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(r_i - \bar{r})^2}$$

**Sample Standard Deviation** (for estimation):
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(r_i - \bar{r})^2}$$

Where:
- $r_i$ = Return for period i
- $\bar{r}$ = Mean return
- $N$ = Number of observations
- $N-1$ = Bessel's correction (reduces bias when estimating from sample)

In [10]:
# Volatility calculation step by step
returns = np.array([0.02, -0.01, 0.03, -0.02, 0.01, 0.02, -0.015])

# Step 1: Calculate mean
mean_return = np.mean(returns)
print(f"Step 1 - Mean return: {mean_return:.4f}")

# Step 2: Calculate deviations from mean
deviations = returns - mean_return
print(f"Step 2 - Deviations: {deviations.round(4)}")

# Step 3: Square the deviations
squared_deviations = deviations ** 2
print(f"Step 3 - Squared deviations: {squared_deviations.round(6)}")

# Step 4: Calculate variance (average of squared deviations)
variance = np.sum(squared_deviations) / (len(returns) - 1)  # Sample variance
print(f"Step 4 - Variance: {variance:.6f}")

# Step 5: Take square root to get standard deviation
std_dev = np.sqrt(variance)
print(f"Step 5 - Standard deviation: {std_dev:.4f}")

# Verify with numpy
print(f"\nNumpy std (ddof=1): {np.std(returns, ddof=1):.4f}")

Step 1 - Mean return: 0.0050
Step 2 - Deviations: [ 0.015 -0.015  0.025 -0.025  0.005  0.015 -0.02 ]
Step 3 - Squared deviations: [2.25e-04 2.25e-04 6.25e-04 6.25e-04 2.50e-05 2.25e-04 4.00e-04]
Step 4 - Variance: 0.000392
Step 5 - Standard deviation: 0.0198

Numpy std (ddof=1): 0.0198
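
One practical pitfall is that NumPy and pandas use different defaults: `np.std` divides by $N$ (`ddof=0`), while `Series.std` divides by $N-1$ (`ddof=1`). A short check on the same returns:

```python
import numpy as np
import pandas as pd

returns = np.array([0.02, -0.01, 0.03, -0.02, 0.01, 0.02, -0.015])

population_std = np.std(returns)            # NumPy default: ddof=0, divide by N
sample_std = np.std(returns, ddof=1)        # divide by N - 1
pandas_std = pd.Series(returns).std()       # pandas default: ddof=1

print(f"np.std (ddof=0): {population_std:.4f}")   # 0.0183
print(f"np.std (ddof=1): {sample_std:.4f}")       # 0.0198
print(f"Series.std():    {pandas_std:.4f}")       # 0.0198
```

With only seven observations the difference is visible in the third decimal; with years of daily data it becomes negligible, but stating `ddof` explicitly avoids surprises when mixing the two libraries.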


### Annualizing Volatility

Volatility scales with the square root of time. To annualize:

$$\sigma_{annual} = \sigma_{daily} \times \sqrt{252}$$

Where 252 is the typical number of trading days per year.

**Why square root?** Variance is additive over time (for independent returns), so:
$$\sigma^2_{annual} = 252 \times \sigma^2_{daily}$$
$$\sigma_{annual} = \sqrt{252} \times \sigma_{daily}$$
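
The square-root rule can be checked with a small simulation. This sketch assumes i.i.d. normal daily log returns, which is exactly the independence assumption behind the rule; real returns show volatility clustering, so the scaling is only approximate in practice. The seed and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable

daily_vol = 0.015
n_years, days_per_year = 2000, 252

# Simulated i.i.d. daily log returns: one row per year, one column per day
daily = rng.normal(0.0, daily_vol, size=(n_years, days_per_year))

# Log returns add over time, so each year's return is the row sum
annual = daily.sum(axis=1)

realized_annual_vol = annual.std(ddof=1)
scaled_daily_vol = daily.std(ddof=1) * np.sqrt(days_per_year)

print(f"Std of simulated annual returns: {realized_annual_vol:.2%}")
print(f"Daily std x sqrt(252):           {scaled_daily_vol:.2%}")
```

Both numbers should land close to 0.015 × √252 ≈ 23.8%, differing only by sampling noise.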

In [11]:
# Annualization example
daily_vol = 0.015  # 1.5% daily volatility

# Annualize
annual_vol = daily_vol * np.sqrt(252)
print(f"Daily volatility: {daily_vol:.2%}")
print(f"Annual volatility: {annual_vol:.2%}")

# Different frequencies
print(f"\nAnnualization factors:")
print(f"Daily → Annual: √252 = {np.sqrt(252):.2f}")
print(f"Weekly → Annual: √52 = {np.sqrt(52):.2f}")
print(f"Monthly → Annual: √12 = {np.sqrt(12):.2f}")

Daily volatility: 1.50%
Annual volatility: 23.81%

Annualization factors:
Daily → Annual: √252 = 15.87
Weekly → Annual: √52 = 7.21
Monthly → Annual: √12 = 3.46


---

## 5. Basic Statistics for Finance

### Measures of Central Tendency

**Mean (Expected Return)**:
$$\bar{r} = \frac{1}{N}\sum_{i=1}^{N} r_i$$

**Median**: Middle value when sorted (robust to outliers)

### Measures of Shape

**Skewness**: Measures asymmetry of the distribution
$$Skew = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{r_i - \bar{r}}{\sigma}\right)^3$$

- Skew > 0: Right tail is longer (more extreme positive returns)
- Skew < 0: Left tail is longer (more extreme negative returns - typical for stocks)
- Skew = 0: Symmetric distribution

**Kurtosis**: Measures "tail heaviness"
$$Kurt = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{r_i - \bar{r}}{\sigma}\right)^4$$

- Normal distribution has kurtosis = 3
- **Excess Kurtosis** = Kurtosis - 3
- Excess Kurt > 0: Fat tails (more extreme events than normal)
- Financial returns typically have positive excess kurtosis!
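
The two formulas translate directly into NumPy. The helpers below are an illustrative sketch (the function names and toy data are assumptions); they use the population standard deviation, as in the formulas above, which is also what `scipy.stats.skew` and `scipy.stats.kurtosis` do under their default settings.

```python
import numpy as np

def skewness(x):
    """Population skewness: mean of cubed standardized deviations."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()      # x.std() uses ddof=0, matching the formula
    return np.mean(z ** 3)

def excess_kurtosis(x):
    """Population kurtosis minus 3, so a normal distribution scores 0."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

symmetric = [-2, -1, 0, 1, 2]
left_tailed = [-10, 1, 2, 3, 4]

print(skewness(symmetric))          # approximately 0 for symmetric data
print(excess_kurtosis(symmetric))   # about -1.3: thinner tails than a normal
print(skewness(left_tailed))        # negative: one large loss dominates
```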

In [12]:
from scipy import stats

# Generate sample returns (simulating real stock behavior)
np.random.seed(42)
# Real returns have fat tails, so we mix normal with some extreme values
normal_returns = np.random.normal(0.0005, 0.015, 1000)  # Daily returns
extreme_returns = np.random.choice([-0.05, -0.03, 0.03, 0.05], 20)  # Some extreme days
stock_returns = np.concatenate([normal_returns, extreme_returns])
np.random.shuffle(stock_returns)

# Calculate statistics
print("Distribution Statistics:")
print(f"Mean (daily):     {np.mean(stock_returns):.4%}")
print(f"Median (daily):   {np.median(stock_returns):.4%}")
print(f"Std Dev (daily):  {np.std(stock_returns):.4%}")
print(f"\nSkewness:         {stats.skew(stock_returns):.4f}")
print(f"Excess Kurtosis:  {stats.kurtosis(stock_returns):.4f}")

print("\nInterpretation:")
if stats.skew(stock_returns) < 0:
    print("- Negative skew: More extreme negative returns (typical for stocks)")
if stats.kurtosis(stock_returns) > 0:
    print("- Positive excess kurtosis: Fat tails (extreme events more likely than normal)")

Distribution Statistics:
Mean (daily):     0.0912%
Median (daily):   0.0914%
Std Dev (daily):  1.5684%

Skewness:         0.1016
Excess Kurtosis:  0.5279

Interpretation:
- Positive excess kurtosis: Fat tails (extreme events more likely than normal)


### Correlation

Correlation measures the linear relationship between two variables.

$$\rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum(X_i - \bar{X})^2}\sqrt{\sum(Y_i - \bar{Y})^2}}$$

**Range**: -1 to +1
- +1: Perfect positive correlation (move together)
- 0: No linear relationship
- -1: Perfect negative correlation (move opposite)

**In Portfolio Theory**: Low correlation between assets = better diversification

In [13]:
# Correlation example with two stocks
np.random.seed(123)

# Simulate two correlated stocks
market = np.random.normal(0, 0.01, 100)  # Market factor
stock_a = market + np.random.normal(0, 0.005, 100)  # High beta stock
stock_b = 0.5 * market + np.random.normal(0, 0.008, 100)  # Lower beta
gold = np.random.normal(0, 0.008, 100)  # Uncorrelated asset

# Calculate correlations
corr_ab = np.corrcoef(stock_a, stock_b)[0, 1]
corr_a_gold = np.corrcoef(stock_a, gold)[0, 1]

print("Correlations:")
print(f"Stock A vs Stock B: {corr_ab:.3f} (both driven by market)")
print(f"Stock A vs Gold:    {corr_a_gold:.3f} (largely independent)")

print("\nDiversification insight:")
print(f"Adding Stock B to A: Limited diversification (moderate positive correlation)")
print(f"Adding Gold to A: Better diversification (low correlation)")

Correlations:
Stock A vs Stock B: 0.517 (both driven by market)
Stock A vs Gold:    -0.073 (largely independent)

Diversification insight:
Adding Stock B to A: Limited diversification (moderate positive correlation)
Adding Gold to A: Better diversification (low correlation)
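
To make the diversification point concrete, the volatility of a two-asset portfolio follows from the standard variance formula $\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \rho\, \sigma_1 \sigma_2$. The sketch below uses hypothetical inputs (two assets with 20% volatility, equal weights); `two_asset_vol` is an illustrative helper, not a library function.

```python
import numpy as np

def two_asset_vol(w1, vol1, vol2, rho):
    """Volatility of a portfolio holding w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1.0 - w1
    variance = (w1 * vol1) ** 2 + (w2 * vol2) ** 2 + 2 * w1 * w2 * rho * vol1 * vol2
    return np.sqrt(max(variance, 0.0))   # guard against tiny negative rounding error

for rho in [1.0, 0.5, 0.0, -0.5, -1.0]:
    vol = two_asset_vol(0.5, 0.20, 0.20, rho)
    print(f"rho = {rho:+.1f} -> portfolio vol = {vol:.2%}")
```

With perfect correlation the portfolio is as volatile as either asset (20%); at zero correlation it drops to about 14.1%, and at a correlation of -1 the risk cancels entirely.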


---

## Summary: Week 1 Key Formulas

| Concept | Formula |
|---------|--------|
| Simple Return | $R_t = \frac{P_t - P_{t-1}}{P_{t-1}}$ |
| Log Return | $r_t = \ln(P_t) - \ln(P_{t-1})$ |
| Sample Variance | $s^2 = \frac{1}{N-1}\sum(r_i - \bar{r})^2$ |
| Annualized Vol | $\sigma_{annual} = \sigma_{daily} \times \sqrt{252}$ |
| Correlation | $\rho = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$ |

---

*Next Week: Statistics and Probability Distributions*

## 🔴 PROS & CONS: THEORY

### ✅ PROS (Advantages)

| Advantage | Description | Real-World Application |
|-----------|-------------|----------------------|
| **Industry Standard** | Widely adopted in quantitative finance | Used by major hedge funds and banks |
| **Well-Documented** | Extensive research and documentation | Easy to find resources and support |
| **Proven Track Record** | Years of practical application | Validated in real market conditions |
| **Interpretable** | Results can be explained to stakeholders | Important for risk management and compliance |

### ❌ CONS (Limitations)

| Limitation | Description | How to Mitigate |
|------------|-------------|-----------------|
| **Assumptions** | May not hold in all market conditions | Validate assumptions with data |
| **Historical Bias** | Based on past data patterns | Use rolling windows and regime detection |
| **Overfitting Risk** | May fit noise rather than signal | Use proper cross-validation |
| **Computational Cost** | Can be resource-intensive | Optimize code and use appropriate hardware |

### 🎯 Real-World Usage

**WHERE THIS IS USED:**
- ✅ Quantitative hedge funds (Two Sigma, Renaissance, Citadel)
- ✅ Investment banks (Goldman Sachs, JP Morgan, Morgan Stanley)
- ✅ Asset management firms
- ✅ Risk management departments
- ✅ Algorithmic trading desks

**NOT JUST THEORY:**
The calculations in this notebook (returns, volatility, correlation) are the same building blocks practitioners compute daily; the examples here are simplified for teaching.