In [None]:
# === Environment Setup ===
import os, sys, math, time, random, json, textwrap, warnings
import numpy as np, pandas as pd, matplotlib.pyplot as plt
import statsmodels.api as sm
!pip install PyPortfolioOpt -q
try:
    import yfinance as yf
    import pandas_datareader.data as web
    YFINANCE_AVAILABLE = True
except ImportError: YFINANCE_AVAILABLE = False
try:
    from pypfopt import expected_returns, risk_models, EfficientFrontier, plotting, black_litterman
    PYPFOPT_AVAILABLE = True
except ImportError: PYPFOPT_AVAILABLE = False
from IPython.display import display, Markdown, Image

# --- Configuration ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'font.size': 14, 'figure.figsize': (12, 8), 'figure.dpi': 150})
np.set_printoptions(suppress=True, linewidth=120, precision=4)

# --- Utility Functions ---
def note(msg): display(Markdown(f"<div class='alert alert-info'>📝 {textwrap.fill(msg, width=100)}</div>"))
def sec(title): print(f"\n{80*'='}\n| {title.upper()} |\n{80*'='}")

note("Environment initialized for Modern Portfolio Theory.")

# Chapter 9.2: Modern Portfolio Theory and the CAPM

---

### Table of Contents

1.  [**Mean-Variance Optimization: The Mathematics of Diversification**](#mvo)
2.  [**The Capital Asset Pricing Model (CAPM)**](#capm)
    - [From the Efficient Frontier to the Capital Market Line](#cml)
    - [The Security Market Line (SML)](#sml)
3.  [**Empirical Asset Pricing: The Fama-French Factor Zoo**](#fama-french)
4.  [**Modern Portfolio Construction: Black-Litterman**](#black-litterman)
5.  [**Unified Case Study**](#case-study)
    - [Data Preparation](#data)
    - [Part 1: Mean-Variance Optimization in Practice](#part1)
    - [Part 2: Estimating Betas and the SML](#part2)
    - [Part 3: Testing the CAPM with Fama-French Factors](#part3)
    - [Part 4: Applying the Black-Litterman Model](#part4)
6.  [**Exercises**](#exercises)
7.  [**Summary and Key Takeaways**](#summary)


### Intellectual Provenance: Black-Litterman

A major practical challenge of MPT is its **instability**. The optimal weights are extremely sensitive to small changes in the estimates of expected returns, which are notoriously difficult to predict. This often leads to extreme, non-intuitive portfolio allocations.

The **Black-Litterman model**, developed by Fischer Black and Robert Litterman at Goldman Sachs in the early 1990s, provides an elegant Bayesian solution. Instead of relying purely on historical data, it uses market equilibrium as a starting point. It calculates the set of expected returns that would make the current market capitalization weights optimal (the **implied equilibrium returns**, or the **prior**). The investor can then specify their own subjective **views** on the expected returns of certain assets. Bayes' theorem is used to combine the market's prior with the investor's views to create a new, blended set of **posterior returns**. This approach leads to much more stable, diversified, and intuitive portfolio weights.


### First Principles: Factor Models

Empirically, the single-factor CAPM performs poorly; market beta alone does not fully explain the observed differences in stock returns across firms. This led to the development of multi-factor models, which propose that an asset's expected return is driven by its exposure to several systematic risk factors.

The most famous is the **Fama-French Three-Factor Model**. It augments the CAPM's market factor with two additional factors intended to capture other sources of systematic risk:

1.  **Size (SMB: Small Minus Big):** This factor is the return of a portfolio of small-cap stocks minus the return of a portfolio of large-cap stocks. It is designed to capture the 'size effect,' the empirical observation that smaller companies have historically earned higher average returns.
2.  **Value (HML: High Minus Low):** This factor is the return of a portfolio of high book-to-market stocks ('value' stocks) minus the return of a portfolio of low book-to-market stocks ('growth' stocks). It captures the 'value effect,' the observation that value stocks have historically outperformed growth stocks.

The model is expressed as a regression:
$$ E[R_i] - R_f = \alpha_i + \beta_{i,MKT} (E[R_m] - R_f) + \beta_{i,SMB} E[SMB] + \beta_{i,HML} E[HML] $$

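The intercept, \(\alpha_i\), represents the abnormal return ('alpha') of the asset after accounting for its exposure to these common risk factors. In practice it is estimated as the intercept of a time-series regression of the asset's excess returns on the three factor returns. If the factors capture all priced systematic risk, alpha should be zero for all assets.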


### First Principles: The Capital Asset Pricing Model (CAPM)

The CAPM, developed independently by William Sharpe, John Lintner, and Jan Mossin, is a general equilibrium model that builds on MPT by introducing two key assumptions:

1.  All investors are rational mean-variance optimizers with homogeneous expectations.
2.  There exists a **risk-free asset** that all investors can borrow from and lend to at the same rate (\(R_f\)).

The introduction of a risk-free asset fundamentally changes the investment decision. Instead of choosing from a curved efficient frontier of risky assets, all investors will now choose to hold a combination of the risk-free asset and a single, unique portfolio of risky assets known as the **tangency portfolio**. In equilibrium, since all investors hold this same risky portfolio, it must, by definition, be the **market portfolio** itself.
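
The tangency portfolio can be illustrated numerically. The sketch below assumes unrestricted short sales, in which case the maximum Sharpe ratio weights are proportional to \(\Sigma^{-1}(\mathbf{\mu} - R_f \mathbf{1})\). All inputs (`mu`, `Sigma`, `r_f`) are hypothetical values chosen for illustration, not estimates from data.

```python
import numpy as np

# Hypothetical annualized inputs for three risky assets (illustrative only).
mu = np.array([0.08, 0.10, 0.12])
Sigma = np.array([[0.0225, 0.0090, 0.0075],
                  [0.0090, 0.0400, 0.0200],
                  [0.0075, 0.0200, 0.0625]])
r_f = 0.02  # assumed risk-free rate

excess = mu - r_f
z = np.linalg.solve(Sigma, excess)   # Sigma^{-1} (mu - r_f * 1)
w_tan = z / z.sum()                  # rescale so the weights sum to one


def sharpe(w):
    """Sharpe ratio of a portfolio with weights w."""
    return (w @ mu - r_f) / np.sqrt(w @ Sigma @ w)


w_equal = np.ones(3) / 3
print("Tangency weights:", np.round(w_tan, 4))
print(f"Sharpe (tangency):     {sharpe(w_tan):.4f}")
print(f"Sharpe (equal weight): {sharpe(w_equal):.4f}")
```

No other fully invested mix of these three assets has a higher Sharpe ratio, which is why every investor in the CAPM world combines this one portfolio with the risk-free asset.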


### First Principles: Mean-Variance Optimization

Mean-Variance Optimization (MVO) is the mathematical process of finding the optimal portfolio. An investor's goal is to choose a set of portfolio weights (the vector \(\mathbf{w}\)) that achieves the best possible trade-off between maximizing the portfolio's expected return and minimizing its risk (variance).

- **Portfolio Expected Return:** The expected return of the portfolio (\(\mu_p\)) is the weighted average of the expected returns of the individual assets (\(\mathbf{\mu}\)):
  $$ \mu_p = \mathbf{w}^T \mathbf{\mu} $$
- **Portfolio Variance:** The variance of the portfolio (\(\sigma_p^2\)) depends not only on the individual asset variances but also on their covariances, captured in the covariance matrix (\(\Sigma\)):
  $$ \sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w} $$

The **efficient frontier** is the set of all portfolios that offer the highest possible expected return for a given level of variance. Any portfolio not on the frontier is 'inefficient' because another portfolio exists that offers either a higher return for the same risk, or lower risk for the same return.
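
The two formulas above can be checked with a few lines of NumPy. This is a minimal sketch using hypothetical inputs (three assets with made-up expected returns, volatilities, and correlations); it compares the portfolio's volatility with the weighted average of the individual volatilities to show the diversification effect.

```python
import numpy as np

# Hypothetical annualized inputs for three assets (illustrative only).
mu = np.array([0.08, 0.10, 0.12])       # expected returns
vol = np.array([0.15, 0.20, 0.25])      # standard deviations
corr = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])
Sigma = np.outer(vol, vol) * corr       # covariance matrix

w = np.array([0.5, 0.3, 0.2])           # portfolio weights, summing to one

mu_p = w @ mu                           # w^T mu
var_p = w @ Sigma @ w                   # w^T Sigma w
sigma_p = np.sqrt(var_p)
avg_vol = w @ vol                       # weighted-average volatility, for comparison

print(f"Expected return:             {mu_p:.4f}")
print(f"Portfolio volatility:        {sigma_p:.4f}")
print(f"Weighted-average volatility: {avg_vol:.4f}")
```

Because every pairwise correlation is below one, the portfolio volatility comes out lower than the weighted average of the individual volatilities, while the expected return is exactly the weighted average.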


<a id='intro'></a>
## Introduction: The Markowitz Revolution and Beyond
This chapter explores **Modern Portfolio Theory (MPT)** and its evolution into empirical asset pricing. The foundational insight of Harry Markowitz (1952) was that the risk of a portfolio depends crucially on the **covariance** between its assets. This principle of **diversification** is the bedrock of modern asset allocation.

We will develop the theory from its mathematical foundations to its modern applications, culminating in a unified case study.


### Intellectual Provenance: The Markowitz Revolution

Modern Portfolio Theory (MPT) was introduced by Harry Markowitz in his 1952 paper 'Portfolio Selection.' Before Markowitz, investment decisions were often based on the merits of individual securities, with little consideration for how they interact. Markowitz's foundational insight was that the risk of a portfolio is not just the average risk of its components; it depends crucially on the **covariance** between them. By combining assets that do not move perfectly together, an investor can reduce the portfolio's overall risk without sacrificing expected return. This principle of **diversification**, famously dubbed 'the only free lunch in finance,' revolutionized asset allocation and laid the mathematical groundwork for nearly all modern financial theory.


<a id='mvo'></a>
## 1. Mean-Variance Optimization: The Mathematics of Diversification
An investor's goal is to choose a vector of portfolio weights $\mathbf{w}$ to achieve the best possible risk-return trade-off. The portfolio's expected return is $\mu_p = \mathbf{w}^T \mathbf{\mu}$ and its variance is $\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}$.

The **efficient frontier** is the set of portfolios with the highest possible expected return for a given level of variance. When short sales are allowed, every portfolio on the frontier can be written as a combination of any two distinct frontier portfolios (the two-fund theorem). A particularly important point is the **Global Minimum Variance (GMV)** portfolio:
$$ \mathbf{w}_{GMV} = \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^T \Sigma^{-1} \mathbf{1}} $$
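
The GMV formula translates directly into code. The sketch below uses a hypothetical 3x3 covariance matrix and solves the linear system \(\Sigma \mathbf{x} = \mathbf{1}\) instead of inverting \(\Sigma\) explicitly, which is usually the more numerically stable choice.

```python
import numpy as np

# Hypothetical covariance matrix for three assets (illustrative only).
Sigma = np.array([[0.0225, 0.0090, 0.0075],
                  [0.0090, 0.0400, 0.0200],
                  [0.0075, 0.0200, 0.0625]])
ones = np.ones(Sigma.shape[0])

x = np.linalg.solve(Sigma, ones)     # Sigma^{-1} 1
w_gmv = x / (ones @ x)               # divide by 1^T Sigma^{-1} 1

var_gmv = w_gmv @ Sigma @ w_gmv
w_equal = ones / len(ones)
var_equal = w_equal @ Sigma @ w_equal

print("GMV weights:", np.round(w_gmv, 4))
print(f"GMV variance:          {var_gmv:.6f}")
print(f"Equal-weight variance: {var_equal:.6f}")
```

Note that the expected returns never enter the calculation: the GMV portfolio depends only on the covariance matrix, which is one reason it is less sensitive to estimation error than other frontier portfolios.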


<a id='capm'></a>
## 2. The Capital Asset Pricing Model (CAPM)

<a id='cml'></a>
### 2.1 From the Efficient Frontier to the Capital Market Line

The CAPM is a general equilibrium model that arises from combining MPT with two key assumptions: 1) all investors are rational mean-variance optimizers, and 2) there exists a **risk-free asset**. The introduction of a risk-free asset transforms the curved efficient frontier into a straight line known as the **Capital Market Line (CML)**. All rational investors will hold a combination of the risk-free asset and a single, unique portfolio of risky assets called the **tangency portfolio**. In equilibrium, since all investors hold the same risky portfolio, it must be the **market portfolio** itself.

The CML describes the expected return for any efficient portfolio (a mix of the risk-free asset and the market portfolio) as a function of its **total risk** (standard deviation, $\sigma_p$):
$$ E[R_p] = R_f + \left( \frac{E[R_m] - R_f}{\sigma_m} \right) \sigma_p $$
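
To make the CML concrete, the sketch below evaluates it for a few mixes of the risk-free asset and the market portfolio. The numbers (`r_f`, `mu_m`, `sigma_m`) are hypothetical, and `y` denotes the fraction of wealth placed in the market portfolio, with `y > 1` meaning the investor borrows at the risk-free rate.

```python
def cml_expected_return(sigma_p, r_f, mu_m, sigma_m):
    """Expected return of an efficient portfolio with volatility sigma_p."""
    market_sharpe = (mu_m - r_f) / sigma_m   # slope of the CML
    return r_f + market_sharpe * sigma_p


# Hypothetical annualized inputs (illustrative only).
r_f, mu_m, sigma_m = 0.02, 0.08, 0.16

for y in (0.0, 0.5, 1.0, 1.5):
    sigma_p = y * sigma_m                    # the risk-free asset adds no variance
    direct = r_f + y * (mu_m - r_f)          # return computed from the mix itself
    on_cml = cml_expected_return(sigma_p, r_f, mu_m, sigma_m)
    print(f"y={y:.1f}  sigma_p={sigma_p:.3f}  E[R_p]={on_cml:.4f}  direct={direct:.4f}")
```

The two return columns agree for every `y`, confirming that any mix of the risk-free asset and the market portfolio sits on the line.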


<a id='sml'></a>
### 2.2 The Security Market Line (SML)
The CML only applies to efficient portfolios. The **Security Market Line (SML)** is a more general result that applies to *any* individual asset or portfolio. It states that an asset's expected excess return is determined not by its total risk, but only by its non-diversifiable, **systematic risk**, as measured by its **beta** ($\beta_i$):
$$ E[R_i] - R_f = \beta_i (E[R_m] - R_f) \quad \text{where} \quad \beta_i = \frac{Cov(R_i, R_m)}{Var(R_m)} $$
This is one of the most famous equations in finance. It implies that investors are not compensated for bearing idiosyncratic risk, because that risk can be diversified away.
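
The definition of beta can be checked on simulated data. The sketch below generates hypothetical daily excess returns with an assumed true beta of 1.3, then computes beta twice: once from the covariance definition and once as the slope of an OLS regression with an intercept. The two sample estimates coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_beta = 1.3                                    # assumed for the simulation

# Simulated daily excess returns (hypothetical parameters).
mkt_excess = rng.normal(0.0004, 0.01, n)
asset_excess = true_beta * mkt_excess + rng.normal(0.0, 0.02, n)

# Definition: Cov(R_i, R_m) / Var(R_m), using sample moments.
cov = np.cov(asset_excess, mkt_excess)             # 2x2 sample covariance matrix
beta_cov = cov[0, 1] / cov[1, 1]

# The same number as the slope of a regression with an intercept.
X = np.column_stack([np.ones(n), mkt_excess])
alpha_ols, beta_ols = np.linalg.lstsq(X, asset_excess, rcond=None)[0]

print(f"Beta from covariance: {beta_cov:.4f}")
print(f"Beta from OLS slope:  {beta_ols:.4f}")
```

The asset here is deliberately much noisier than the market, yet only the part of its variance that moves with the market shows up in beta. The remaining idiosyncratic noise is what the SML says goes unrewarded.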

![CML vs SML Distinction](../images/09-Finance/cml_sml_distinction.png)


<a id='fama-french'></a>
## 3. Empirical Asset Pricing: The Fama-French Factor Zoo

Empirically, the CAPM performs poorly. Other characteristics besides market beta seem to explain the cross-section of stock returns. This led to multi-factor models, most famously the **Fama-French Three-Factor Model**, which augmented the CAPM with factors for **Size (SMB: Small Minus Big)** and **Value (HML: High Minus Low)**.
$$ E[R_i] - R_f = \alpha_i + \beta_{i,MKT} (E[R_m] - R_f) + \beta_{i,SMB} E[SMB] + \beta_{i,HML} E[HML] $$
The intercept, $\alpha_i$, represents the abnormal return of the asset after accounting for its exposure to the common risk factors. If the factors capture all priced systematic risk, alpha should be zero.
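
A short simulation shows how the factor loadings and alpha are estimated in practice. This is a sketch on synthetic data, not real Fama-French factors: the factor returns, the "true" betas, and the noise level are all assumptions, and alpha is zero by construction so the regression should recover an intercept close to zero.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Synthetic daily factor returns; columns stand in for Mkt-RF, SMB, HML.
factors = rng.normal(0.0, 0.01, size=(n, 3))
true_betas = np.array([1.1, 0.4, -0.3])            # assumed loadings
noise = rng.normal(0.0, 0.005, n)
excess_ret = factors @ true_betas + noise          # true alpha is zero

# Time-series regression of excess returns on a constant and the factors.
X = np.column_stack([np.ones(n), factors])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]

print(f"Estimated daily alpha: {alpha_hat:.6f}")
print("Estimated betas:", np.round(betas_hat, 3))
```

With real data the same regression is run per asset, and the question of interest is whether the estimated alpha is statistically distinguishable from zero.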


<a id='black-litterman'></a>
## 4. Modern Portfolio Construction: Black-Litterman
A major practical challenge of MPT is **instability**: the optimal weights are extremely sensitive to the estimates of expected returns. The **Black-Litterman model** provides an elegant Bayesian solution. It starts with the returns implied by market equilibrium (the **prior**) and allows the investor to specify their own subjective **views**. Bayes' theorem is used to combine the prior and views into a new, blended set of expected returns that is more robust.
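
The Bayesian blending can be sketched directly in NumPy. The code below uses one common form of the posterior mean, $\Pi + \tau\Sigma P^T (P \tau\Sigma P^T + \Omega)^{-1}(Q - P\Pi)$, with the prior $\Pi = \delta \Sigma \mathbf{w}_{mkt}$. The covariance matrix, market weights, risk aversion `delta`, scalar `tau`, and the single view are all hypothetical, and the helper `bl_posterior` is written here for illustration rather than taken from any library.

```python
import numpy as np

# Hypothetical inputs for three assets (illustrative only).
Sigma = np.array([[0.0225, 0.0090, 0.0075],
                  [0.0090, 0.0400, 0.0200],
                  [0.0075, 0.0200, 0.0625]])
w_mkt = np.array([0.5, 0.3, 0.2])    # market-capitalization weights
delta = 2.5                          # assumed risk aversion
tau = 0.05                           # assumed scaling of prior uncertainty

pi = delta * Sigma @ w_mkt           # implied equilibrium returns (the prior)

# One absolute view: the third asset will return 12%.
P = np.array([[0.0, 0.0, 1.0]])
Q = np.array([0.12])


def bl_posterior(pi, Sigma, P, Q, Omega, tau):
    """Posterior mean: pi + tau*Sigma*P' (P*tau*Sigma*P' + Omega)^{-1} (Q - P*pi)."""
    tS = tau * Sigma
    A = P @ tS @ P.T + Omega
    return pi + tS @ P.T @ np.linalg.solve(A, Q - P @ pi)


post_low_conf = bl_posterior(pi, Sigma, P, Q, np.array([[0.05]]), tau)
post_high_conf = bl_posterior(pi, Sigma, P, Q, np.array([[1e-6]]), tau)

print("Prior:                    ", np.round(pi, 4))
print("Posterior (low confidence): ", np.round(post_low_conf, 4))
print("Posterior (high confidence):", np.round(post_high_conf, 4))
```

Shrinking the view variance in $\Omega$ pulls the third asset's posterior return from just above its prior toward the 12% view, and the other assets shift too because they are correlated with it.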


In [None]:
<a id='case-study'></a>
<a id='data'></a>
sec("Unified Case Study: Data Preparation")

if YFINANCE_AVAILABLE and PYPFOPT_AVAILABLE:
    tickers = ['AAPL', 'MSFT', 'AMZN', 'JPM', 'XOM']
    market_proxy = 'SPY'
    start_date, end_date = '2015-01-01', '2022-12-31'
    
    # Download stock prices (auto_adjust=False keeps the 'Adj Close' column in recent yfinance releases)
    prices = yf.download(tickers + [market_proxy], start=start_date, end=end_date, auto_adjust=False)['Adj Close']
    
    # Download Fama-French 5 factors
    note("Attempting to download Fama-French 5-factor data.")
    try:
        ff_factors_raw = web.DataReader('F-F_Research_Data_5_Factors_2x3_daily', 'famafrench', start=start_date, end=end_date)
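        ff_factors = ff_factors_raw[0] / 100  # Convert percent to decimals
        if isinstance(ff_factors.index, pd.PeriodIndex):  # Convert only if the reader returned periods
            ff_factors.index = ff_factors.index.to_timestamp()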
        note("Fama-French data downloaded successfully.")
    except Exception as e:
        note(f"Could not download Fama-French data ({e}). Falling back to local CSV.")
        ff_factors = pd.read_csv('data/fama_french_5_factors.csv', index_col='Date', parse_dates=True)
        ff_factors = ff_factors / 100 # Convert to decimals
    
    # Calculate daily returns
    returns = prices.pct_change().dropna()
    
    # Align data
    df = pd.concat([returns, ff_factors], axis=1).dropna()
    asset_returns = df[tickers]
    market_returns = df[market_proxy]
    risk_free_rate = df['RF'].mean() * 252 # Annualized risk-free rate
    
    note(f"Data prepared for {len(tickers)} assets from {start_date} to {end_date}.")
else:
    note("Skipping case study because yfinance and/or PyPortfolioOpt are not installed.")

In [None]:
<a id='part1'></a>
sec("Part 1: Mean-Variance Optimization")
if YFINANCE_AVAILABLE and PYPFOPT_AVAILABLE:
    # 1. Estimate expected returns and covariance
    mu = expected_returns.mean_historical_return(prices[tickers])
    S = risk_models.sample_cov(prices[tickers])
    
    # 2. Find the efficient frontier
    ef = EfficientFrontier(mu, S)
    plotting.plot_efficient_frontier(ef)
    plt.title('Efficient Frontier for Selected Stocks')
    plt.show()
    
    # 3. Find the tangency portfolio (maximum Sharpe ratio)
    ef_tangent = EfficientFrontier(mu, S)
    weights_tan = ef_tangent.max_sharpe(risk_free_rate=risk_free_rate)
    note("Tangency Portfolio (Max Sharpe Ratio) Weights:")
    display(pd.Series(weights_tan).to_frame('Weight').T)
else:
    note("Skipping MVO analysis.")

In [None]:
<a id='part2'></a>
sec("Part 2: Estimating Betas and the Security Market Line")
if YFINANCE_AVAILABLE and PYPFOPT_AVAILABLE:
    # Calculate excess returns
    excess_asset_returns = asset_returns.subtract(df['RF'], axis=0)
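    # Subtracting two differently named Series drops the name; restore it so params[market_proxy] resolves
    excess_market_returns = (market_returns - df['RF']).rename(market_proxy)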
    
    # Estimate betas via regression
    betas = {}
    for ticker in tickers:
        model = sm.OLS(excess_asset_returns[ticker], sm.add_constant(excess_market_returns)).fit()
        betas[ticker] = model.params[market_proxy]
    betas = pd.Series(betas)
    
    # Plot SML
    avg_excess_returns = excess_asset_returns.mean() * 252
    plt.figure(figsize=(12, 7))
    plt.scatter(betas, avg_excess_returns, s=100)
    for txt in betas.index:  # Index by ticker label rather than by position
        plt.annotate(txt, (betas[txt], avg_excess_returns[txt]), xytext=(5,5), textcoords='offset points')
    
    sml_x = np.linspace(0, 2, 100)
    sml_y = sml_x * (excess_market_returns.mean() * 252)
    plt.plot(sml_x, sml_y, 'r--', label='Security Market Line (SML)')
    plt.title('Security Market Line'); plt.xlabel('Beta (β)'); plt.ylabel('Annualized Average Excess Return')
    plt.legend(); plt.show()
    note("Assets above the SML are considered 'undervalued' by the CAPM, while those below are 'overvalued'.")
else:
    note("Skipping SML analysis.")

In [None]:
<a id='part3'></a>
sec("Part 3: Testing the CAPM with Fama-French Factors")
if YFINANCE_AVAILABLE and PYPFOPT_AVAILABLE:
    results = []
    for ticker in tickers:
        model = sm.OLS(excess_asset_returns[ticker], sm.add_constant(df[['Mkt-RF', 'SMB', 'HML']])).fit()
        results.append({
            'Ticker': ticker,
            'Alpha': model.params['const'] * 252, # Annualized Alpha
            'Alpha_p_value': model.pvalues['const'],
            'Beta_Mkt': model.params['Mkt-RF'],
            'Beta_SMB': model.params['SMB'],
            'Beta_HML': model.params['HML'],
            'R-squared': model.rsquared
        })
    
    results_df = pd.DataFrame(results).set_index('Ticker')
    note("Fama-French 3-Factor Model Results for All Assets:")
    display(results_df.style.format({'Alpha': '{:.4f}', 'Alpha_p_value': '{:.3f}', 'R-squared': '{:.3f}'}))
    note("The alpha (const) is the average return not explained by the three factors. Where it is small and statistically insignificant (p > 0.05), the model accounts for the asset's average excess return; a significant alpha indicates return that the factors do not explain.")
else:
    note("Skipping Fama-French analysis.")

In [None]:
<a id='part4'></a>
sec("Part 4: Applying the Black-Litterman Model")
if YFINANCE_AVAILABLE and PYPFOPT_AVAILABLE:
    # 1. Get the market-implied prior returns (the market's 'opinion')
    # This step deduces the expected returns that would make the current market portfolio optimal.
    market_caps = yf.Tickers(tickers).tickers
    market_caps_df = pd.DataFrame({ticker: [info.info['marketCap']] for ticker, info in market_caps.items()}).T.rename(columns={0:'Market Cap'})
    mcaps = market_caps_df['Market Cap']
    
    # market_implied_risk_aversion expects market prices, not returns
    delta = black_litterman.market_implied_risk_aversion(prices[market_proxy].dropna())
    prior_returns = black_litterman.market_implied_prior_returns(mcaps, delta, S)
    
    # 2. Specify subjective views (the investor's 'opinion')
    # View 1: Amazon (AMZN) will have an absolute return of 20%.
    # View 2: Microsoft (MSFT) will outperform JPMorgan (JPM) by 5%.
    # Both views go into one picking matrix P (one row per view) and one vector Q,
    # because BlackLittermanModel does not use P and Q when absolute_views is supplied.
    # Columns follow the order of `tickers`: AAPL, MSFT, AMZN, JPM, XOM.
    picking_matrix = np.array([
        [0, 0, 1, 0, 0],   # AMZN (absolute view)
        [0, 1, 0, -1, 0],  # MSFT - JPM (relative view)
    ])
    q = np.array([0.20, 0.05]) # The expected return for each view

    # 3. Combine prior and views to get posterior returns
    # The model uses Bayes' theorem to find a weighted average of the market's opinion and the investor's opinion,
    # where the weights are determined by the confidence in each.
    bl = black_litterman.BlackLittermanModel(S, pi=prior_returns, Q=q, P=picking_matrix)
    posterior_returns = bl.bl_returns()
    
    # 4. Re-optimize with the more robust posterior returns
    ef_bl = EfficientFrontier(posterior_returns, S)
    weights_bl = ef_bl.max_sharpe()
    
    note("Posterior Expected Returns (Black-Litterman):")
    display(posterior_returns.to_frame('Posterior Return').T)
    note("New Optimal Weights based on Posterior Returns:")
    display(pd.Series(weights_bl).to_frame('Weight').T)
else:
    note("Skipping Black-Litterman analysis.")

<a id='exercises'></a>
## 6. Exercises

1.  **CAPM Assumptions:** The CAPM makes several strong assumptions. List three of these assumptions and explain why they are likely to be violated in the real world.

2.  **Factor Investing:** The Fama-French factors are often interpreted as capturing risk premia. What kind of underlying economic risks might the SMB (size) and HML (value) factors be proxies for?

3.  **Black-Litterman Confidence:** In the Black-Litterman model, how would the posterior returns change if the investor were much more confident in their view (i.e., the variance of their view, represented by the diagonal elements of $\Omega$, was much smaller)?

4.  **SML vs CML:** Explain in your own words why an individual stock that is correctly priced according to the CAPM will lie on the SML but not, in general, on the CML.


<a id='summary'></a>
## 7. Summary and Key Takeaways

This chapter provided a tour of Modern Portfolio Theory, from the foundational insights of Markowitz to the workhorse models of empirical asset pricing.

**Key Concepts**:
- **Mean-Variance Optimization**: The statistical framework for finding the optimal trade-off between risk and return. The key insight is that diversification benefits depend on the covariance between assets.
- **Efficient Frontier**: The set of portfolios offering the highest return for a given level of risk.
- **CAPM**: A general equilibrium model that provides a theoretical justification for why only systematic risk (beta) should be priced. It gives rise to the Capital Market Line (for efficient portfolios) and the Security Market Line (for all assets).
- **Multi-Factor Models**: Empirical failures of the CAPM led to the development of multi-factor models, like the Fama-French models, which use additional factors (e.g., size, value) to better explain the cross-section of stock returns.
- **Black-Litterman**: A Bayesian approach to portfolio construction that provides a more stable and intuitive alternative to classical MVO by starting with market equilibrium returns and blending them with investor views.


### Solutions to Exercises

---

**1. CAPM Assumptions:**
   a. **Homogeneous Expectations:** The CAPM assumes all investors have the same beliefs about expected returns, variances, and covariances. In reality, investors have diverse opinions and information.
   b. **No Transaction Costs or Taxes:** The model assumes a frictionless market. In reality, trading incurs costs, and taxes affect returns, both of which can alter optimal portfolios.
   c. **Single-Period Horizon:** The model is static. It assumes all investors plan for the same single period. Real-world investment is a dynamic, multi-period problem.

---

**2. Factor Investing Risks:**
- **SMB (Size):** The small-stock premium may be compensation for the higher risk of smaller firms, which are often less diversified, have less access to credit, and are more vulnerable to economic downturns.
- **HML (Value):** The value premium (high book-to-market stocks outperforming growth stocks) may be compensation for the risk of financial distress. Value firms are often mature, slower-growing companies that may be more sensitive to business cycle risk.

---

**3. Black-Litterman Confidence:**
If the investor is more confident in their view, the variance of that view (the diagonal elements of $\Omega$) will be smaller. In the Bayesian updating formula, a smaller $\Omega$ gives more weight to the investor's view ($Q$) and less weight to the market-implied prior ($\Pi$). Therefore, the resulting posterior expected returns will be pulled further away from the prior and closer to the investor's specific view.

---

**4. SML vs CML:**
The CML plots expected return against **total risk** ($\sigma$). Only perfectly diversified portfolios (combinations of the risk-free asset and the market portfolio) lie on the CML. An individual stock has both systematic (market) risk and idiosyncratic (firm-specific) risk. Its total risk, $\sigma_i$, will be higher than is justified by its expected return, so it will lie *below* the CML. The SML, however, plots expected return against **systematic risk** ($\beta$). According to the CAPM, all assets and portfolios, efficient or not, must lie on the SML in equilibrium. The market only compensates investors for bearing systematic risk, as idiosyncratic risk can be diversified away for free.
