# Lab 2 for EC3318/MN3101 Corporate Finance

## Modern Portfolio Theory with PyPortfolioOpt


## 1. Setup and data recap


### Introduction to Modern Portfolio Theory

**Modern Portfolio Theory (MPT)**, developed by Harry Markowitz in 1952, revolutionized investment management by providing a mathematical framework for constructing portfolios that optimize the trade-off between expected return and risk (measured as variance). The key insight is that portfolio risk depends not just on individual asset volatilities, but critically on the **correlations** between assets—enabling diversification benefits.

In this lab, you will:

1. Load and prepare historical stock return data
2. Estimate key inputs (expected returns and covariance matrix) for portfolio optimization
3. Apply mean-variance optimization to find **efficient portfolios**
4. Understand the **efficient frontier** and the **tangency portfolio** (maximum Sharpe ratio)
5. Explore how constraints (e.g., no short-selling) affect optimal portfolios
6. Translate optimal weights into actionable trade recommendations

We'll use **PyPortfolioOpt**, a Python library that implements these concepts with robust numerical optimization methods.


In [None]:
# Import core data manipulation libraries
import pandas as pd  # For working with tabular data (DataFrames)
import numpy as np  # For numerical operations and array computations

# Import PyPortfolioOpt components for portfolio optimization
from pypfopt import (
    EfficientFrontier,  # Main class for mean-variance optimization
    risk_models,  # Functions to estimate covariance matrices
    expected_returns,  # Functions to estimate expected returns
    objective_functions,  # Additional objectives like L2 regularization
)

# Import discrete allocation tools to convert weights to actual share counts
from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices

# Import convex optimization library (used internally by PyPortfolioOpt)
import cvxpy as cp
import cvxpy.constraints

# Import visualization library (ggplot-style grammar of graphics for Python)
from plotnine import *
from plotnine import options as p9_options
from mizani.formatters import percent_format  # Format axis labels as percentages
from adjustText import adjust_text  # Automatically adjust text labels to avoid overlap

# Set default figure size for all plots
p9_options.figure_size = (8, 6)

# Configure pandas display options to show full outputs without truncation
pd.set_option("display.max_rows", None)  # Show all rows in output
pd.set_option("display.max_columns", None)  # Show all columns in output
pd.set_option("display.width", None)  # Auto-detect terminal width
pd.set_option("display.max_colwidth", None)  # Show full column content

### Loading and Preparing Return Data

The foundation of any portfolio optimization is high-quality **historical return data**. Here we load monthly adjusted closing prices from an Excel file and transform them into returns.

**Key data transformations:**

1. **Convert Excel date format**: Excel stores dates as numbers (days since 1899-12-30)
2. **Reshape from wide to long format**: Using `melt()` to create a tidy structure
3. **Calculate returns**: Compute percentage changes within each stock using `groupby()`
4. **Handle missing values**: Drop rows with missing values, including each stock's first observation, whose return is NaN because there is no earlier price

This tidy data structure (one row per stock-date pair) makes subsequent analysis much easier.
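The same chain of transformations can be tried on a tiny, made-up price table before running it on the real spreadsheet. This is a sketch: the tickers `AAA`/`BBB`, the prices and the serial dates are hypothetical stand-ins for the Excel sheet, which we assume (as the code below does) has a `date` column of Excel serial numbers and one price column per stock.

```python
import pandas as pd

# Hypothetical wide price table mimicking the spreadsheet layout:
# an Excel serial-date column plus one adjusted-close column per ticker
wide = pd.DataFrame(
    {
        "date": [45292, 45323, 45352],  # Excel serials for 2024-01-01, 2024-02-01, 2024-03-01
        "AAA": [100.0, 110.0, 99.0],
        "BBB": [50.0, 50.0, 55.0],
    }
)

tidy = (
    wide
    # Excel serial numbers count days from 1899-12-30
    .assign(date=lambda x: pd.to_datetime(x["date"], unit="D", origin="1899-12-30"))
    # Wide -> long: one row per (date, symbol) pair
    .melt(id_vars="date", var_name="symbol", value_name="adj_close")
    .sort_values(["symbol", "date"])
    .reset_index(drop=True)
    # pct_change within each symbol, so no return spans two different stocks
    .assign(ret=lambda df: df.groupby("symbol")["adj_close"].pct_change())
    # Each symbol's first row has no previous price, hence a NaN return
    .dropna()
)
print(tidy)
```

Grouping before `pct_change()` matters: without it, the first `BBB` return would be computed against the last `AAA` price.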


In [None]:
# Load historical stock price data and calculate returns
returns = (
    pd.read_excel(
        io="downloading_stock_prices.xlsx",  # Excel file with price data
        sheet_name="Data download",  # Specific sheet name
        skiprows=7,  # Skip header rows
    )
    # Convert Excel serial date numbers to proper datetime objects
    .assign(date=lambda x: pd.to_datetime(x["date"], unit="D", origin="1899-12-30"))
    # Transform from wide format (one column per stock) to long format (one row per stock-date)
    .melt(id_vars="date", var_name="symbol", value_name="adj_close")
    # Sort by stock symbol and then by date for proper time series ordering
    .sort_values(["symbol", "date"])
    # Reset index after sorting
    .reset_index(drop=True)
    # Calculate percentage returns for each stock separately using groupby
    # pct_change() computes (price_t - price_{t-1}) / price_{t-1}
    .assign(ret=lambda df: df.groupby("symbol")["adj_close"].pct_change())
    # Remove the first observation for each stock (which has NaN return)
    .dropna()
)
# Display first few rows to verify data structure
returns.head()

### Summary Statistics by Stock

Before optimization, it's crucial to understand the characteristics of each asset. We compute **descriptive statistics** for the return distribution of each stock:

- **Count**: Number of monthly observations
- **Mean**: Average monthly return (a simple estimate of expected return)
- **Std**: Standard deviation of monthly returns (volatility, our measure of risk)
- **Min/Max**: Range of observed returns

These statistics reveal which assets have historically delivered higher returns and which are more volatile. Note that past performance doesn't guarantee future results—one of the key challenges in portfolio optimization!


In [None]:
# Summary statistics for each stock through groupby-aggregation
summary_stats = (
    returns.groupby("symbol")["ret"]  # Group returns by stock symbol
    .agg(
        ["count", "mean", "std", "min", "max"]
    )  # Calculate multiple statistics at once
    .round(3)  # Round to 3 decimal places for readability
    .reset_index()  # Convert index (symbol) back to a regular column
)
# Display the summary statistics table
summary_stats

## 3. Estimating inputs for Markowitz optimization


### Why Annualization Matters

Financial returns are often reported at different frequencies (daily, monthly, annual). For portfolio analysis, we typically **annualize** returns and volatilities to:

1. Make comparisons across different datasets easier
2. Express results in intuitive yearly terms
3. Align with industry standards (Sharpe ratios, target returns, etc.)

**Annualization formulas:**

- **Expected return**: Multiply by the number of periods per year
  - Monthly: $\mu_{\text{annual}} = 12 \times \mu_{\text{monthly}}$
- **Volatility (standard deviation)**: Multiply by the square root of periods per year
  - Monthly: $\sigma_{\text{annual}} = \sqrt{12} \times \sigma_{\text{monthly}}$

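This follows if monthly returns are independent and identically distributed (i.i.d.): the mean and the variance of a sum of 12 such returns are both 12 times their monthly values, so the standard deviation scales with the square root of time. For simple (non-log) returns, multiplying the mean by 12 is an approximation that ignores compounding.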
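A quick simulation makes the scaling rules concrete. Assuming i.i.d. normal monthly returns with a made-up mean of 1% and volatility of 5%, the sum of 12 monthly returns (exact for log returns, an approximation for simple returns) should have a mean close to $12 \times 1\%$ and a standard deviation close to $\sqrt{12} \times 5\%$.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the sketch is reproducible

# Assumption: i.i.d. normal monthly returns with illustrative parameters
mu_monthly, sigma_monthly = 0.01, 0.05
n_years, months_per_year = 100_000, 12

# One row per simulated year, one column per month
monthly = rng.normal(mu_monthly, sigma_monthly, size=(n_years, months_per_year))
# Summing the 12 monthly returns gives each simulated year's (additive) return
annual = monthly.sum(axis=1)

print(f"12 x mean      = {12 * mu_monthly:.4f} | simulated annual mean = {annual.mean():.4f}")
print(f"sqrt(12) x std = {np.sqrt(12) * sigma_monthly:.4f} | simulated annual std  = {annual.std(ddof=1):.4f}")
```

If the monthly returns were autocorrelated, the simulated annual standard deviation would drift away from $\sqrt{12}\,\sigma_{\text{monthly}}$, which is why the i.i.d. assumption matters.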


In [None]:
# Set annualization factor for monthly data (12 months per year)
# This will be used to convert monthly statistics to annual equivalents
annualisation_factor = 12

### Visualizing the Risk-Return Trade-off

This scatter plot shows the **risk-return profile** of each asset in our investment universe. Each point represents one stock, positioned according to its:

- **X-axis**: Annualized volatility (risk)
- **Y-axis**: Annualized expected return (reward)

**Key insights from this plot:**

- Assets in the **upper-left** quadrant are desirable (high return, low risk)
- Assets in the **lower-right** are undesirable (low return, high risk)
- The **correlation structure** between assets (not shown here) determines diversification benefits
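- Typically no single asset dominates all the others (higher return *and* lower risk), which creates opportunities for portfolio construction

The goal of MPT is to combine these assets into portfolios with a better risk-return trade-off than the individual assets offer: higher expected return for a given volatility, or lower volatility for a given expected return.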


In [None]:
# Create asset summary with annualised expected returns and volatilities
asset_summary = pd.DataFrame(
    {
        "symbol": summary_stats["symbol"],  # Stock ticker symbols
        "mu": summary_stats["mean"]
        * annualisation_factor,  # Annualised mean return (12 x monthly mean)
        "sigma": summary_stats["std"]
        * np.sqrt(annualisation_factor),  # Annualised volatility (sqrt(12) x monthly std)
    }
).reset_index(drop=True)

# Create enhanced scatter plot with better text positioning
assets_figure = (
    # Initialize plot with data and aesthetic mappings
    ggplot(asset_summary, aes(x="sigma", y="mu", label="symbol"))
    # Add points for each asset
    + geom_point(size=3, alpha=0.7)
    # Add text labels with automatic adjustment to prevent overlap
    + geom_text(adjust_text={"arrowprops": {"arrowstyle": "-"}})
    # Format x-axis as percentages
    + scale_x_continuous(labels=percent_format())
    # Format y-axis as percentages
    + scale_y_continuous(labels=percent_format())
    # Add informative labels
    + labs(
        x="Volatility (annualised)",
        y="Expected return (annualised)",
        title="Expected returns and volatilities of portfolio constituents",
        subtitle="Based on historical monthly returns",
    )
    # Use clean, professional theme
    + theme_minimal()
    # Set custom figure size
    + theme(figure_size=(10, 7))
)
# Display the plot
assets_figure

### Estimating Expected Returns

**Expected returns** are notoriously difficult to estimate—they have high estimation error and are unstable over time. Here we use **historical mean returns** as our estimate.

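PyPortfolioOpt provides the `mean_historical_return()` function, which:

1. Computes each asset's mean historical return from the price data. By default (`compounding=True`) this is the geometric mean, i.e. the compound annual growth rate; with `compounding=False` it is the arithmetic mean of periodic returns
2. Annualizes the result using the specified `frequency` (12 for monthly data)
3. Returns a pandas Series indexed by asset symbol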

**Critical caveat**: Historical returns are often poor predictors of future returns. In practice, practitioners may:

- Use more sophisticated forecasting models
- Apply **shrinkage** methods (Black-Litterman, Bayes-Stein)
- Incorporate **fundamental analysis** or **analyst forecasts**
- Use **equal-weighted** or **minimum-variance** approaches that don't rely heavily on return estimates
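Because "mean return" can be annualized in more than one way, it helps to see both conventions side by side. The sketch below uses six hypothetical monthly returns; the geometric version follows the compound-growth idea behind `mean_historical_return()`'s default, but it is our own re-implementation for illustration, not the library's code.

```python
import numpy as np

# Hypothetical monthly simple returns for one asset (illustrative numbers only)
monthly_returns = np.array([0.02, -0.01, 0.03, 0.00, -0.02, 0.04])
periods_per_year = 12

# Arithmetic convention: average monthly return scaled by 12
arithmetic_annual = monthly_returns.mean() * periods_per_year

# Geometric convention: compound growth over the sample, rescaled to a 12-month horizon
growth = np.prod(1 + monthly_returns)
geometric_annual = growth ** (periods_per_year / len(monthly_returns)) - 1

print(f"Arithmetic annual mean: {arithmetic_annual:.4f}")
print(f"Geometric annual mean:  {geometric_annual:.4f}")
```

The two annual figures are close here but not identical. Per month, the geometric mean is always at or below the arithmetic mean, and the gap grows with volatility (roughly $\sigma^2/2$), so the choice of convention matters more for volatile assets.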


In [None]:
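# Reshape data from long format to wide format (needed for PyPortfolioOpt functions)
# Each column represents one stock, each row represents one date
# Note: `returns` has already dropped each stock's first date, so the return
# between the first two dates (one month of data) is not used below
prices_wide = returns.get(["date", "symbol", "adj_close"]).pivot(
    index="date", columns="symbol", values="adj_close"
)

# Calculate mean historical returns for each asset (annualized)
# compounding=False gives the arithmetic mean x frequency (12 x monthly mean),
# matching the annualization formula above; the default (compounding=True)
# would return the geometric mean (compound annual growth rate) instead
mu = expected_returns.mean_historical_return(
    prices_wide, frequency=annualisation_factor, compounding=False
)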

# Display expected returns as a DataFrame (transposed for better readability)
pd.DataFrame({"expected_return": mu}).T

### Estimating the Covariance Matrix

The **covariance matrix** is the heart of portfolio optimization. It captures:

- **Variances** (diagonal elements): How volatile each asset is
- **Covariances** (off-diagonal elements): How assets move together

**Why it matters:**

- **Positive covariance**: Assets tend to move in the same direction (less diversification benefit)
- **Negative covariance**: Assets move in opposite directions (strong diversification benefit)
- **Zero covariance**: Assets are uncorrelated, with no linear co-movement (moderate diversification benefit)

The `sample_cov()` function computes the sample covariance matrix of returns (converting the price data to returns internally) and annualizes it. This matrix is:

- **Symmetric**: $\Sigma_{ij} = \Sigma_{ji}$
- **Positive semidefinite**: All eigenvalues are non-negative (required for optimization)

**Estimation challenges:**

- For $N$ assets, we must estimate $N(N+1)/2$ parameters
- With limited data, estimation error can be substantial
- Advanced methods (shrinkage, factor models) can improve estimates
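To see what an annualized sample covariance matrix looks like, here is a sketch on simulated returns for three hypothetical assets. It computes the matrix directly with pandas' `DataFrame.cov()` (which uses the $N-1$ denominator) and multiplies by 12; we believe this mirrors what `sample_cov(..., frequency=12)` does after converting prices to returns, but treat that as an assumption. The eigenvalue check confirms positive semidefiniteness, and the parameter count follows $N(N+1)/2$.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical monthly returns: 60 months for three made-up assets
rets = pd.DataFrame(rng.normal(0.01, 0.05, size=(60, 3)), columns=["A", "B", "C"])

# Sample covariance of monthly returns (N - 1 denominator), annualized by 12
cov_annual = rets.cov() * 12

cov_values = cov_annual.to_numpy()
eigenvalues = np.linalg.eigvalsh(cov_values)  # real eigenvalues of a symmetric matrix
n_assets = rets.shape[1]
n_params = n_assets * (n_assets + 1) // 2  # N variances + N(N-1)/2 covariances

print(cov_annual.round(4))
print("Symmetric:", np.allclose(cov_values, cov_values.T))
print("Smallest eigenvalue:", eigenvalues.min())
print("Distinct parameters to estimate:", n_params)
```

With 60 observations and only 3 assets the matrix is well conditioned; as $N$ approaches the number of observations, the smallest eigenvalues shrink toward zero, which is where shrinkage estimators help.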


In [None]:
# Calculate the sample covariance matrix (annualized)
# This captures both individual asset volatilities and co-movements between assets
cov_matrix = risk_models.sample_cov(prices_wide, frequency=annualisation_factor)
# Display the covariance matrix
cov_matrix

## 4. Mean–variance optimization theory

Let's build some intuition starting with a simple **two-asset** portfolio, then see how the same ideas scale up using **matrix notation** for many assets.

### Two risky assets

Consider two assets, **A** and **B**.

- Expected returns: $\mu_A, \mu_B$
- Standard deviations: $\sigma_A, \sigma_B$
- Correlation: $\rho_{AB}$
- Portfolio weight in A: $w$ (so weight in B is $1-w$)

#### Portfolio expected return

$$\mu_p = w\,\mu_A + (1-w)\,\mu_B$$

#### Portfolio risk (variance)

$$\sigma_p^2 = w^2\sigma_A^2 + (1-w)^2\sigma_B^2 + 2w(1-w)\rho_{AB}\sigma_A\sigma_B$$

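- If $\rho_{AB}=1$: no diversification benefit; for $0 \le w \le 1$, portfolio volatility is just the weighted average $w\sigma_A + (1-w)\sigma_B$.
- If $\rho_{AB}<1$: diversification helps; for $0<w<1$, portfolio volatility is below that weighted average.
- If $\rho_{AB}=-1$: perfect negative correlation can, in theory, eliminate risk entirely for one particular choice of $w$.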

#### The opportunity set and the minimum-variance portfolio (MVP)

As $w$ varies from 0 to 1, the point $(\sigma_p,\mu_p)$ traces out a curve called the **two-asset opportunity set**.  
The **MVP** sits at the lowest point on this curve. Its weight on A is:
$$w_A^{\text{mvp}} = \frac{\sigma_B^2 - \rho_{AB}\sigma_A\sigma_B}{\sigma_A^2 + \sigma_B^2 - 2\rho_{AB}\sigma_A\sigma_B}, \quad w_B^{\text{mvp}} = 1 - w_A^{\text{mvp}}$$

> If short-selling is **forbidden**, we clip weights to $[0,1]$ and pick the lowest-variance feasible point.

#### Hitting a target return with two assets

With only two assets (and full investment), the weight that achieves target return $\mu_p$ is:
$$w = \frac{\mu_p - \mu_B}{\mu_A - \mu_B}, \quad 1-w = \frac{\mu_A - \mu_p}{\mu_A - \mu_B}$$
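Because this is the only fully invested mix of A and B that earns $\mu_p$ (assuming $\mu_A \neq \mu_B$), it is automatically the minimum-variance way to reach $\mu_p$ with these two assets.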
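The two-asset formulas are easy to verify numerically. The sketch below uses hypothetical inputs ($\mu_A = 10\%$, $\mu_B = 6\%$, $\sigma_A = 20\%$, $\sigma_B = 12\%$, $\rho_{AB} = 0.3$), compares the closed-form MVP weight with a brute-force grid search, and computes the weight on A needed for an 8% target return.

```python
import numpy as np

# Hypothetical annualised inputs for assets A and B
mu_a, mu_b = 0.10, 0.06
sigma_a, sigma_b = 0.20, 0.12
rho = 0.3
cov_ab = rho * sigma_a * sigma_b


def port_stats(w):
    """Expected return and volatility with weight w in A and 1 - w in B (w may be an array)."""
    mu_p = w * mu_a + (1 - w) * mu_b
    var_p = w**2 * sigma_a**2 + (1 - w) ** 2 * sigma_b**2 + 2 * w * (1 - w) * cov_ab
    return mu_p, np.sqrt(var_p)


# Closed-form minimum-variance weight on A
w_mvp = (sigma_b**2 - cov_ab) / (sigma_a**2 + sigma_b**2 - 2 * cov_ab)

# Brute-force check: scan many weights (shorting allowed) and keep the lowest volatility
grid = np.linspace(-0.5, 1.5, 20_001)
_, vols = port_stats(grid)
w_grid = grid[vols.argmin()]

# Weight on A that hits an 8% target return
target = 0.08
w_target = (target - mu_b) / (mu_a - mu_b)

print(f"MVP weight on A: formula {w_mvp:.4f}, grid search {w_grid:.4f}")
print(f"Weight on A for an 8% target: {w_target:.2f} (volatility {port_stats(w_target)[1]:.4f})")
```

Here the MVP weight lies inside $[0, 1]$, so the no-short-selling rule does not bind; with other inputs it can fall outside, and clipping then gives the long-only MVP.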

#### Adding a risk-free asset

Let the risk-free rate be $r_f$. Pick any risky portfolio $R$ (a mix of A and B) and combine it with $r_f$ to get a straight **Capital Allocation Line (CAL)** with slope (Sharpe ratio):
$$\text{Sharpe}(R) = \frac{\mu_R - r_f}{\sigma_R}$$
The **tangency portfolio** $R^\ast$ (a particular mix of A and B) maximizes this slope.

### From 2 assets to many (matrix view)

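Matrix notation extends the same concepts to $N$ assets in compact form. The closed-form results below assume that $\Sigma$ is invertible and that weights are unrestricted (short-selling allowed).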

- **Weights:** $\boldsymbol{\omega}\in\mathbb{R}^N$ with $\boldsymbol{\omega}^\top \mathbf{1}=1$
- **Expected returns:** $\boldsymbol{\mu}\in\mathbb{R}^N$
- **Covariance matrix:** $\Sigma\in\mathbb{R}^{N\times N}$ (symmetric, positive semidefinite)
- **Ones vector:** $\mathbf{1}\in\mathbb{R}^N$

#### Portfolio return and variance

$$\mu_p = \boldsymbol{\omega}^\top \boldsymbol{\mu}, \quad \sigma_p^2 = \boldsymbol{\omega}^\top \Sigma \boldsymbol{\omega}$$

#### Minimum-variance portfolio (MVP)

$$\boldsymbol{\omega}_{\text{mvp}} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}$$

#### Tangency portfolio (with risk-free rate $r_f$)

$$\boldsymbol{\omega}_{\text{tan}} = \frac{\Sigma^{-1}\big(\boldsymbol{\mu}-r_f\mathbf{1}\big)}{\mathbf{1}^\top \Sigma^{-1}\big(\boldsymbol{\mu}-r_f\mathbf{1}\big)}$$

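Provided $\mathbf{1}^\top \Sigma^{-1}\big(\boldsymbol{\mu}-r_f\mathbf{1}\big) > 0$ (equivalently, $r_f$ is below the MVP's expected return), this is the unique risky portfolio that maximizes the Sharpe ratio. Any point on the CAL is just a mix of $\boldsymbol{\omega}_{\text{tan}}$ with the risk-free asset.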
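The closed-form MVP and tangency formulas can be checked numerically. The sketch below uses made-up inputs (`mu_demo`, `cov_demo` and `r_f = 0.02` are hypothetical, named so they do not overwrite the notebook's `mu` and `cov_matrix`) and applies both formulas with plain NumPy.

```python
import numpy as np

# Hypothetical annualised inputs for three assets (illustration only)
mu_demo = np.array([0.08, 0.10, 0.13])
cov_demo = np.array([[0.040, 0.006, 0.010],
                     [0.006, 0.060, 0.012],
                     [0.010, 0.012, 0.090]])
r_f = 0.02
ones = np.ones(len(mu_demo))
cov_inv = np.linalg.inv(cov_demo)

# Minimum-variance portfolio: Sigma^{-1} 1 / (1' Sigma^{-1} 1)
w_mvp_demo = cov_inv @ ones / (ones @ cov_inv @ ones)

# Tangency portfolio: Sigma^{-1} (mu - r_f 1) / (1' Sigma^{-1} (mu - r_f 1))
excess = mu_demo - r_f * ones
w_tan_demo = cov_inv @ excess / (ones @ cov_inv @ excess)


def sharpe_ratio(w):
    """Sharpe ratio of a fully invested portfolio w."""
    return (w @ mu_demo - r_f) / np.sqrt(w @ cov_demo @ w)


print("MVP weights:     ", w_mvp_demo.round(4), " Sharpe:", round(sharpe_ratio(w_mvp_demo), 4))
print("Tangency weights:", w_tan_demo.round(4), " Sharpe:", round(sharpe_ratio(w_tan_demo), 4))
```

A useful sanity check: the tangency portfolio's Sharpe ratio should equal $\sqrt{(\boldsymbol{\mu}-r_f\mathbf{1})^\top\Sigma^{-1}(\boldsymbol{\mu}-r_f\mathbf{1})}$, and no other fully invested portfolio should beat it.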

#### General (target-return) problem

$$\min_{\boldsymbol{\omega}}\;\boldsymbol{\omega}^\top\Sigma\,\boldsymbol{\omega} \quad \text{subject to} \quad \boldsymbol{\omega}^\top\boldsymbol{\mu}=\mu_p, \; \boldsymbol{\omega}^\top\mathbf{1}=1$$
This has a closed-form solution (via Lagrange multipliers) and is computationally fast even for large $N$.
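The Lagrange-multiplier approach turns the target-return problem into a single linear system (the first-order, or KKT, conditions). Here is a minimal sketch with hypothetical inputs; `min_variance_for_target` is an illustrative helper, not a PyPortfolioOpt function.

```python
import numpy as np

# Hypothetical inputs, for illustration only
mu_demo = np.array([0.08, 0.10, 0.13])
cov_demo = np.array([[0.040, 0.006, 0.010],
                     [0.006, 0.060, 0.012],
                     [0.010, 0.012, 0.090]])


def min_variance_for_target(mu_vec, cov, target):
    """Solve min w'Σw s.t. w'μ = target, w'1 = 1 via its first-order (KKT) linear system.

    With Lagrangian L = w'Σw + λ1 (w'μ - target) + λ2 (w'1 - 1), the conditions are
    2Σw + λ1 μ + λ2 1 = 0, w'μ = target and w'1 = 1.
    """
    n = len(mu_vec)
    ones = np.ones(n)
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2 * cov
    kkt[:n, n] = mu_vec
    kkt[:n, n + 1] = ones
    kkt[n, :n] = mu_vec
    kkt[n + 1, :n] = ones
    rhs = np.concatenate([np.zeros(n), [target, 1.0]])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]  # drop the two multipliers


w_10pct = min_variance_for_target(mu_demo, cov_demo, 0.10)
print("Weights:", w_10pct.round(4))
print("Return:", round(w_10pct @ mu_demo, 4), "Volatility:", round(np.sqrt(w_10pct @ cov_demo @ w_10pct), 4))
```

Once inequality constraints such as no short-selling are added, this linear-system shortcut no longer applies, which is why PyPortfolioOpt hands the problem to a numerical convex solver (via cvxpy).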

### Key takeaways

- The two-asset case builds intuition: returns combine linearly, risk is curved, and there's a closed-form MVP.
- With a risk-free asset, the **tangency portfolio** is the optimal risky mix; you then scale it up or down based on your risk appetite.
- Matrix notation is just a compact, scalable way to express these same ideas for many assets.


## 5. The efficient frontier and tangency portfolio


### What is the Efficient Frontier?

The **efficient frontier** is the set of portfolios that offer the highest expected return for each level of risk (or equivalently, the lowest risk for each level of return). It represents the "best" possible portfolios available.

**Key concepts:**

- **Dominated portfolios**: Any portfolio not on the efficient frontier is dominated—you could achieve higher return for the same risk or lower risk for the same return
- **Tangency portfolio**: The portfolio on the efficient frontier with the highest **Sharpe ratio** (reward-to-risk ratio): $\text{Sharpe} = \frac{\mu_p - r_f}{\sigma_p}$
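- **Two-fund separation theorem**: When a risk-free asset exists, all investors who share the same estimates of $\boldsymbol{\mu}$ and $\Sigma$ and care only about mean and variance should hold the same risky portfolio (the tangency portfolio), combined with the risk-free asset in proportions that reflect their risk aversion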

### Maximum Sharpe Ratio Portfolio (Tangency Portfolio)

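The **tangency portfolio** plays a central role because:

1. It has the highest Sharpe ratio (best risk-adjusted return) of any portfolio, given our estimates of $\boldsymbol{\mu}$ and $\Sigma$
2. Under two-fund separation, it is the optimal risky portfolio for every investor
3. Investors tune risk by mixing it with the risk-free asset: more risk-averse investors hold more of the risk-free asset, while more risk-tolerant investors borrow at $r_f$ to hold a leveraged position in it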

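We add **L2 regularization** (a Ridge-style penalty $\gamma\,\boldsymbol{\omega}^\top\boldsymbol{\omega}$ added to the objective) with $\gamma = 0.01$ to:

- Discourage extreme, concentrated weights by spreading the allocation across more assets
- Improve numerical stability
- Reduce sensitivity to estimation error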
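The effect of the penalty is easiest to see for the minimum-variance problem. PyPortfolioOpt's `objective_functions.L2_reg` adds $\gamma\,\boldsymbol{\omega}^\top\boldsymbol{\omega}$ to the objective, and with only the budget constraint $\boldsymbol{\omega}^\top\mathbf{1}=1$, minimizing $\boldsymbol{\omega}^\top\Sigma\boldsymbol{\omega} + \gamma\,\boldsymbol{\omega}^\top\boldsymbol{\omega}$ is the same as applying the MVP formula with $\Sigma + \gamma I$ in place of $\Sigma$. The sketch below (hypothetical inputs, no weight bounds, illustrative helper `regularised_mvp`) shows the weights moving toward equal weighting as $\gamma$ grows.

```python
import numpy as np

# Hypothetical annualised covariance matrix (illustration only)
cov_demo = np.array([[0.040, 0.006, 0.010],
                     [0.006, 0.060, 0.012],
                     [0.010, 0.012, 0.090]])


def regularised_mvp(cov, gamma):
    """Minimise w'(Σ + γI)w subject to w'1 = 1 (no weight bounds)."""
    shrunk = cov + gamma * np.eye(len(cov))
    raw = np.linalg.solve(shrunk, np.ones(len(cov)))  # (Σ + γI)^{-1} 1
    return raw / raw.sum()  # normalise so weights sum to 1


for gamma in (0.0, 0.01, 0.1, 10.0):
    w = regularised_mvp(cov_demo, gamma)
    # Sum of squared weights is a concentration measure (1/3 = equal weights here)
    print(f"gamma={gamma:<5}  weights={w.round(3)}  sum of squares={w @ w:.4f}")
```

Note that $\gamma$ is not scale-free: its effect depends on how large it is relative to the entries of $\Sigma$ (here, annual variances of roughly 0.04 to 0.09).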


In [None]:
# Create efficient frontier object and compute portfolios
# Initialize with expected returns (mu) and covariance matrix
ef_sharpe = EfficientFrontier(mu, cov_matrix)
# Add L2 regularization to penalize extreme weights (gamma controls strength)
ef_sharpe.add_objective(objective_functions.L2_reg, gamma=0.01)

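# Maximum Sharpe ratio portfolio (tangency portfolio)
# This solves: max (mu_p - r_f) / sigma_p, where r_f is the `risk_free_rate` argument.
# We do not pass it, so the library default applies; check the documentation of
# your installed PyPortfolioOpt version rather than assuming r_f = 0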
max_sharpe_weights = ef_sharpe.max_sharpe()
# Clean weights: remove tiny positions (below threshold) for practical implementation
cleaned_max_sharpe = ef_sharpe.clean_weights()
# Calculate portfolio performance metrics (expected return, volatility, Sharpe ratio)
max_sharpe_perf = ef_sharpe.portfolio_performance(verbose=True)

print("\n" + "=" * 50)
print("MAXIMUM SHARPE RATIO PORTFOLIO")
print("=" * 50)
# Convert weights dictionary to DataFrame and sort by weight size
max_sharpe_weights_df = (
    pd.Series(cleaned_max_sharpe, name="weight").sort_values(ascending=False).to_frame()
)
# Create metrics series for cleaner display
max_sharpe_metrics = pd.Series(
    max_sharpe_perf,
    index=["expected_return", "volatility", "sharpe_ratio"],
    name="max_sharpe_portfolio",
)
print("Portfolio weights:")
print(max_sharpe_weights_df)
print(f"\nPerformance metrics:")
print(max_sharpe_metrics)

### Minimum Volatility Portfolio (MVP)

The **minimum volatility portfolio** is the portfolio with the lowest possible risk. It lies at the leftmost point of the efficient frontier.

**Key characteristics:**

- Focuses purely on risk minimization (ignores expected returns)
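- Not necessarily well diversified: it tilts toward low-volatility, weakly correlated assets and, under a long-only constraint, may hold only a few of them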
- Often has lower returns than the tangency portfolio
- Suitable for very risk-averse investors
- More stable over time than return-focused portfolios (less sensitive to return estimation errors)

**Mathematical formulation:**
$$\min_{\boldsymbol{\omega}} \; \boldsymbol{\omega}^\top \Sigma \boldsymbol{\omega} \quad \text{subject to} \quad \boldsymbol{\omega}^\top \mathbf{1} = 1$$

The MVP is popular because:

1. It doesn't require expected return estimates (which are notoriously unreliable)
2. It's relatively stable and robust
3. Historically, it has often performed well out-of-sample


In [None]:
# Minimum volatility portfolio
# Create a fresh EfficientFrontier object (each optimization needs its own instance)
ef_min_vol = EfficientFrontier(mu, cov_matrix)
# Add L2 regularization to prevent extreme weights
ef_min_vol.add_objective(objective_functions.L2_reg, gamma=0.01)
# Optimize for minimum volatility (ignores expected returns)
min_vol_weights = ef_min_vol.min_volatility()
# Clean up small weights for practical implementation
cleaned_min_vol = ef_min_vol.clean_weights()
# Calculate performance metrics
min_vol_perf = ef_min_vol.portfolio_performance(verbose=True)

print("\n" + "=" * 50)
print("MINIMUM VOLATILITY PORTFOLIO")
print("=" * 50)
# Convert weights to DataFrame and sort
min_vol_weights_df = (
    pd.Series(cleaned_min_vol, name="weight").sort_values(ascending=False).to_frame()
)
# Create metrics series
min_vol_metrics = pd.Series(
    min_vol_perf,
    index=["expected_return", "volatility", "sharpe_ratio"],
    name="min_vol_portfolio",
)
print("Portfolio weights:")
print(min_vol_weights_df)
print(f"\nPerformance metrics:")
print(min_vol_metrics)

### Target Return Portfolio

Sometimes investors have a **specific return target** in mind (e.g., "I need 8% annual return"). The `efficient_return()` method finds the portfolio on the efficient frontier that achieves this target with **minimum risk**.

**Mathematical formulation:**
$$\min_{\boldsymbol{\omega}} \; \boldsymbol{\omega}^\top \Sigma \boldsymbol{\omega} \quad \text{subject to} \quad \boldsymbol{\omega}^\top \boldsymbol{\mu} = \mu_{\text{target}}, \; \boldsymbol{\omega}^\top \mathbf{1} = 1$$

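**Important notes:**

- The target must be **feasible**: with long-only weights, no portfolio can earn more than the highest individual expected return, so a higher target makes the problem infeasible and the optimization fails with an error
- PyPortfolioOpt implements the return condition as an inequality ($\boldsymbol{\omega}^\top \boldsymbol{\mu} \geq \mu_{\text{target}}$), so any target at or below the MVP's expected return simply yields the MVP
- If the target equals the max-Sharpe portfolio's expected return, you recover the tangency portfolio (up to small differences caused by regularization)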

Here we use the mean of all asset returns as our target, ensuring it's feasible.


In [None]:
# Set a target return (clipped to feasible range for safety)
target_return = float(np.clip(mu.mean(), mu.min(), mu.max()))
# Create fresh EfficientFrontier object
ef_target = EfficientFrontier(mu, cov_matrix)
# Add L2 regularization
ef_target.add_objective(objective_functions.L2_reg, gamma=0.01)
# Optimize to achieve target return with minimum risk
target_weights = ef_target.efficient_return(target_return=target_return)
# Clean weights
cleaned_target = ef_target.clean_weights()
# Calculate performance (verbose=False to suppress output)
target_perf = ef_target.portfolio_performance(verbose=False)

# Convert to DataFrames for display
target_weights_df = (
    pd.Series(cleaned_target, name="weight").sort_values(ascending=False).to_frame()
)
target_metrics = pd.Series(
    target_perf,
    index=["expected_return", "volatility", "sharpe_ratio"],
    name=f"target_return_{target_return:.2%}",
)
# Display weights and metrics
target_weights_df, target_metrics

## 6. Comparing unconstrained vs. constrained portfolios


### Short-Selling: Theory vs. Practice

Up to now, every optimization has used PyPortfolioOpt's default `weight_bounds=(0, 1)`, which prohibits negative weights (short positions). What happens if we **relax this constraint and allow short-selling**?

**Short-selling** means taking a **negative position** in an asset:

- Borrow shares from a broker
- Sell them immediately at current price
- Later buy them back (hopefully at a lower price) to return to the broker
- Profit if the price falls; lose if it rises

**Why short-sell in portfolio optimization?**

- **Hedge risks**: Short overvalued assets to reduce portfolio volatility
- **Enhance returns**: Profit from expected price declines
- **Improve efficiency**: Allowing shorts enlarges the set of feasible portfolios, so the efficient frontier can only stay the same or move up and to the left

**In practice, many investors face constraints:**

- **Institutional restrictions**: Pension funds, mutual funds often can't short
- **Practical difficulties**: Requires margin accounts, involves borrowing costs
- **Regulatory limits**: Some markets restrict short-selling
- **Risk management**: Short positions have unlimited loss potential

Let's compare **two scenarios**:

1. **Unconstrained portfolios** with `weight_bounds=(-1, 1)`: Allow shorts down to -100% and longs up to +100% per asset (still bounded, so "unconstrained" is shorthand for "short-selling allowed")
2. **Long-only portfolios** with `weight_bounds=(0, 1)`: Traditional buy-and-hold only

This comparison reveals the **cost of constraints** on portfolio performance.
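The cost of a no-short constraint already shows up with two assets. In the hypothetical example below, asset A is less volatile than B and the two are strongly positively correlated, so the unconstrained MVP holds more than 100% in A and shorts B (shorting a positively correlated asset is what hedges). Forcing the weight into $[0, 1]$ leaves a higher-volatility portfolio. The inputs are made up for illustration.

```python
import numpy as np

# Hypothetical inputs: A is less volatile, and A and B are strongly positively correlated
sigma_a, sigma_b, rho = 0.10, 0.20, 0.8
cov_ab = rho * sigma_a * sigma_b


def vol(w):
    """Volatility of a portfolio with w in A and (1 - w) in B."""
    var = w**2 * sigma_a**2 + (1 - w) ** 2 * sigma_b**2 + 2 * w * (1 - w) * cov_ab
    return np.sqrt(var)


# Unconstrained MVP weight on A (two-asset closed form)
w_unconstrained = (sigma_b**2 - cov_ab) / (sigma_a**2 + sigma_b**2 - 2 * cov_ab)
# Long-only: variance is convex in w, so clipping to [0, 1] gives the constrained optimum
w_long_only = float(np.clip(w_unconstrained, 0.0, 1.0))

vol_increase_demo = (vol(w_long_only) - vol(w_unconstrained)) / vol(w_unconstrained) * 100
print(f"Unconstrained: w_A={w_unconstrained:.3f}, w_B={1 - w_unconstrained:.3f}, vol={vol(w_unconstrained):.4f}")
print(f"Long-only:     w_A={w_long_only:.3f}, w_B={1 - w_long_only:.3f}, vol={vol(w_long_only):.4f}")
print(f"Volatility increase from the constraint: {vol_increase_demo:.1f}%")
```

The same logic drives the multi-asset comparison below: the constraint is costly only when the unconstrained optimum wants a negative weight.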


In [None]:
print("=" * 60)
print("COMPARING UNCONSTRAINED VS. LONG-ONLY PORTFOLIOS")
print("=" * 60)

# =============================================================================
# UNCONSTRAINED PORTFOLIOS (Allow short-selling)
# =============================================================================
print("\n" + "=" * 60)
print("UNCONSTRAINED PORTFOLIOS (SHORT-SELLING ALLOWED)")
print("=" * 60)

# Maximum Sharpe with short-selling allowed
# weight_bounds=(-1, 1) means: -100% ≤ weight ≤ +100% for each asset
# Negative weights = short positions, Positive weights = long positions
ef_unconstrained_sharpe = EfficientFrontier(mu, cov_matrix, weight_bounds=(-1, 1))
# Add L2 regularization to prevent extreme weights
ef_unconstrained_sharpe.add_objective(objective_functions.L2_reg, gamma=0.01)
# Optimize for maximum Sharpe ratio
unconstrained_sharpe_weights = ef_unconstrained_sharpe.max_sharpe()
# Clean weights (removes positions below threshold)
cleaned_unconstrained_sharpe = ef_unconstrained_sharpe.clean_weights()
# Calculate performance metrics
unconstrained_sharpe_perf = ef_unconstrained_sharpe.portfolio_performance(verbose=False)

print("\nMaximum Sharpe Portfolio (Unconstrained):")
unconstrained_sharpe_df = (
    pd.Series(cleaned_unconstrained_sharpe, name="weight")
    .sort_values(ascending=False)
    .to_frame()
)
print("Weights:")
print(unconstrained_sharpe_df)
print(f"\nExpected Return: {unconstrained_sharpe_perf[0]:.4f}")
print(f"Volatility: {unconstrained_sharpe_perf[1]:.4f}")
print(f"Sharpe Ratio: {unconstrained_sharpe_perf[2]:.4f}")

# Count short positions
num_shorts = sum(1 for w in cleaned_unconstrained_sharpe.values() if w < 0)
num_longs = sum(1 for w in cleaned_unconstrained_sharpe.values() if w > 0)
print(f"\nShort positions: {num_shorts} assets")
print(f"Long positions: {num_longs} assets")

# Minimum volatility with short-selling allowed
ef_unconstrained_minvol = EfficientFrontier(mu, cov_matrix, weight_bounds=(-1, 1))
# Add L2 regularization
ef_unconstrained_minvol.add_objective(objective_functions.L2_reg, gamma=0.01)
# Optimize for minimum volatility
unconstrained_minvol_weights = ef_unconstrained_minvol.min_volatility()
# Clean weights
cleaned_unconstrained_minvol = ef_unconstrained_minvol.clean_weights()
# Calculate performance
unconstrained_minvol_perf = ef_unconstrained_minvol.portfolio_performance(verbose=False)

print("\nMinimum Volatility Portfolio (Unconstrained):")
unconstrained_minvol_df = (
    pd.Series(cleaned_unconstrained_minvol, name="weight")
    .sort_values(ascending=False)
    .to_frame()
)
print("Weights:")
print(unconstrained_minvol_df)
print(f"\nExpected Return: {unconstrained_minvol_perf[0]:.4f}")
print(f"Volatility: {unconstrained_minvol_perf[1]:.4f}")
print(f"Sharpe Ratio: {unconstrained_minvol_perf[2]:.4f}")

# Count short positions
num_shorts_mv = sum(1 for w in cleaned_unconstrained_minvol.values() if w < 0)
num_longs_mv = sum(1 for w in cleaned_unconstrained_minvol.values() if w > 0)
print(f"\nShort positions: {num_shorts_mv} assets")
print(f"Long positions: {num_longs_mv} assets")

# =============================================================================
# LONG-ONLY PORTFOLIOS (No short-selling)
# =============================================================================
print("\n" + "=" * 60)
print("LONG-ONLY PORTFOLIOS (NO SHORT-SELLING)")
print("=" * 60)

# Maximum Sharpe with long-only constraint
# weight_bounds=(0, 1) means: 0 ≤ weight ≤ 1 for each asset (no shorts, no leverage)
ef_long_only = EfficientFrontier(mu, cov_matrix, weight_bounds=(0, 1))
# Add L2 regularization
ef_long_only.add_objective(objective_functions.L2_reg, gamma=0.01)
# Optimize for maximum Sharpe ratio
long_only_sharpe_weights = ef_long_only.max_sharpe()
# Clean weights (removes positions below threshold)
cleaned_long_only_sharpe = ef_long_only.clean_weights()
# Calculate performance metrics
long_only_sharpe_perf = ef_long_only.portfolio_performance(verbose=False)

print("\nMaximum Sharpe Portfolio (Long-only):")
long_only_sharpe_df = (
    pd.Series(cleaned_long_only_sharpe, name="weight")
    .sort_values(ascending=False)
    .to_frame()
)
print("Weights:")
print(long_only_sharpe_df)
print(f"\nExpected Return: {long_only_sharpe_perf[0]:.4f}")
print(f"Volatility: {long_only_sharpe_perf[1]:.4f}")
print(f"Sharpe Ratio: {long_only_sharpe_perf[2]:.4f}")

# Minimum volatility with long-only constraint
ef_long_only_minvol = EfficientFrontier(mu, cov_matrix, weight_bounds=(0, 1))
# Add L2 regularization
ef_long_only_minvol.add_objective(objective_functions.L2_reg, gamma=0.01)
# Optimize for minimum volatility
long_only_minvol_weights = ef_long_only_minvol.min_volatility()
# Clean weights
cleaned_long_only_minvol = ef_long_only_minvol.clean_weights()
# Calculate performance
long_only_minvol_perf = ef_long_only_minvol.portfolio_performance(verbose=False)

print("\nMinimum Volatility Portfolio (Long-only):")
long_only_minvol_df = (
    pd.Series(cleaned_long_only_minvol, name="weight")
    .sort_values(ascending=False)
    .to_frame()
)
print("Weights:")
print(long_only_minvol_df)
print(f"\nExpected Return: {long_only_minvol_perf[0]:.4f}")
print(f"Volatility: {long_only_minvol_perf[1]:.4f}")
print(f"Sharpe Ratio: {long_only_minvol_perf[2]:.4f}")

# =============================================================================
# COMPARISON: Cost of constraints
# =============================================================================
print("\n" + "=" * 60)
print("IMPACT OF SHORT-SELLING CONSTRAINT")
print("=" * 60)

print("\nMaximum Sharpe Portfolio:")
print(
    f"  Unconstrained - Sharpe: {unconstrained_sharpe_perf[2]:.4f}, Vol: {unconstrained_sharpe_perf[1]:.4f}"
)
print(
    f"  Long-only     - Sharpe: {long_only_sharpe_perf[2]:.4f}, Vol: {long_only_sharpe_perf[1]:.4f}"
)
sharpe_loss = (
    (unconstrained_sharpe_perf[2] - long_only_sharpe_perf[2])
    / unconstrained_sharpe_perf[2]
) * 100
print(f"  → Sharpe ratio reduction: {sharpe_loss:.1f}%")

print("\nMinimum Volatility Portfolio:")
print(
    f"  Unconstrained - Vol: {unconstrained_minvol_perf[1]:.4f}, Return: {unconstrained_minvol_perf[0]:.4f}"
)
print(
    f"  Long-only     - Vol: {long_only_minvol_perf[1]:.4f}, Return: {long_only_minvol_perf[0]:.4f}"
)
vol_increase = (
    (long_only_minvol_perf[1] - unconstrained_minvol_perf[1])
    / unconstrained_minvol_perf[1]
) * 100
print(f"  → Volatility increase: {vol_increase:.1f}%")

### Interpreting the Results: Benefits of Short-Selling

**Key observations from the comparison:**

**1. Negative Weights Appear in Unconstrained Portfolios**

- The unconstrained portfolios typically show **negative weights** (short positions)
- These shorts need not be outright bets against an asset: they may reflect a low expected return, a hedge, or both
- Short positions in assets that move with the long holdings **hedge** part of the portfolio's risk

**2. Performance Improvement with Short-Selling**

- **Higher Sharpe ratio**: Because every long-only portfolio is also feasible when shorting is allowed, the unconstrained optimum is at least as good in-sample (small deviations can arise from regularization and weight cleaning)
- **Lower minimum volatility**: The ability to short enables deeper risk reduction
- **More flexibility**: The optimizer has more tools to balance risk and return

**3. Economic Interpretation of Short Positions**

When the optimizer chooses to short an asset, it is usually because:

- The asset's expected return is low relative to the risk it adds to the portfolio
- Shorting it provides **hedging benefits**: it is positively correlated with the assets held long, so a short position offsets part of their common risk
- The position improves the **risk-return trade-off** for the overall portfolio

**4. Why the Long-Only Constraint Hurts Performance**

- **Reduced opportunity set**: Can only buy, not sell short
- **Suboptimal hedging**: Can't directly bet against overvalued assets
- **Higher risk**: Must achieve diversification only through long positions
- **Lower returns**: Miss opportunities to profit from price declines

**5. Practical Considerations**

Despite the theoretical benefits of short-selling:

- **Account requirements**: Shorting requires a margin account, which many retail investors do not have or use
- **Institutional constraints** often prohibit shorts (e.g., many mutual funds and pension funds)
- **Costs**: Borrowing fees, margin interest, and regulatory capital requirements
- **Risks**: Unlimited loss potential (a stock's price can rise without limit)

The comparison shows that **constraints are costly**, at least in-sample and relative to our estimated inputs. In practice, many investors must accept this cost for regulatory, institutional, or risk-management reasons.


## 7. Visualizing the efficient frontier


### Plotting the Efficient Frontier: Methodology

To visualize the efficient frontier, we need to:

1. **Generate a grid** of target returns spanning the feasible range
2. **Solve optimization** for each target return (minimize risk subject to achieving that return)
3. **Handle numerical issues**: Some target returns may be infeasible or cause optimization failures
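4. **Remove duplicates**: `efficient_return()` treats the target as a floor (expected return ≥ target), so targets below the minimum-volatility portfolio's return all land on essentially the same portfolio; dropping duplicates and sorting by volatility leaves a clean, monotonic frontier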

We'll create **two frontiers**:

- **Unconstrained**: allows short-selling (each weight between -1 and 1, so not literally unconstrained)
- **Long-only**: no short-selling (0 ≤ weight ≤ 1)

The `compute_frontier_points()` function handles this systematically, catching optimization errors and deduplicating results.
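As a sanity check on the numerical frontier, it helps to know that without weight bounds and without the L2 penalty, minimizing variance for a given target return has a classic closed-form solution. The NumPy sketch below implements it; the three-asset inputs are hypothetical. Because the notebook's frontiers add bounds ((-1, 1) or (0, 1)) and L2 regularization, their points should lie on or to the right of this analytic curve rather than exactly on it.

```python
import numpy as np


def analytic_frontier(mu, cov, target_returns):
    """Minimum volatility at each target return when weights sum to 1, with no
    weight bounds and no regularization (classic closed-form solution).

    With A = 1'S^-1 1, B = 1'S^-1 mu, C = mu'S^-1 mu and D = AC - B^2
    (S = covariance matrix), the minimum variance at target t is
    (A t^2 - 2 B t + C) / D. The global minimum-variance portfolio has
    return B / A and variance 1 / A.
    """
    mu = np.asarray(mu, dtype=float)
    ones = np.ones_like(mu)
    inv_ones = np.linalg.solve(cov, ones)  # S^-1 1
    inv_mu = np.linalg.solve(cov, mu)  # S^-1 mu
    A, B, C = ones @ inv_ones, ones @ inv_mu, mu @ inv_mu
    D = A * C - B**2
    t = np.asarray(target_returns, dtype=float)
    frontier_vol = np.sqrt((A * t**2 - 2 * B * t + C) / D)
    return frontier_vol, B / A, np.sqrt(1 / A)


# Hypothetical annualised inputs for three assets
toy_mu = np.array([0.08, 0.12, 0.10])
toy_vols = np.array([0.15, 0.25, 0.20])
toy_corr = np.array([[1.0, 0.3, 0.5], [0.3, 1.0, 0.4], [0.5, 0.4, 1.0]])
toy_cov = np.outer(toy_vols, toy_vols) * toy_corr

toy_grid = np.linspace(toy_mu.min(), toy_mu.max(), 5)
toy_frontier_vol, mvp_return, mvp_vol = analytic_frontier(toy_mu, toy_cov, toy_grid)
print(f"Global minimum-variance portfolio: return {mvp_return:.2%}, volatility {mvp_vol:.2%}")
for target, vol in zip(toy_grid, toy_frontier_vol):
    print(f"target {target:.2%} -> minimum volatility {vol:.2%}")
```

Evaluating the same function at each asset's own expected return shows that every single asset sits on or to the right of the curve, since holding one asset alone is itself a feasible portfolio.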

**What to expect:**

- The long-only frontier lies **on or to the right** of the unconstrained frontier (equal or higher risk for each target return)
- The **tangency portfolios** differ between the two cases
- The **gap** between frontiers shows the cost of the no-short-selling constraint


In [None]:
# Enhanced efficient frontier visualization with Tidy Finance techniques
from pypfopt.exceptions import OptimizationError  # raised when the solver fails

ridge_gamma = 0.01  # L2 regularization parameter for all optimizations
# Create grid of target returns to trace out the frontier
frontier_grid = np.linspace(mu.min(), mu.max(), 50)


def compute_frontier_points(target_returns, mu_vec, cov, gamma, weight_bounds=(-1, 1)):
    """
    Sample efficient frontier points for plotting.

    Args:
        target_returns: Array of target return values to optimize for
        mu_vec: Expected returns vector
        cov: Covariance matrix
        gamma: L2 regularization parameter
        weight_bounds: Tuple of (min_weight, max_weight) for each asset
                      (-1, 1) allows shorting; (0, 1) is long-only

    Returns:
        DataFrame with expected_return, volatility, sharpe_ratio for each point
    """
    rows = []
    for target in target_returns:
        # Create fresh optimizer for each target
        ef = EfficientFrontier(mu_vec, cov, weight_bounds=weight_bounds)
        ef.add_objective(objective_functions.L2_reg, gamma=gamma)
        try:
            # Minimize volatility subject to expected return >= target
            ef.efficient_return(target_return=float(target))
            # Get performance metrics (return, volatility, Sharpe)
            perf = ef.portfolio_performance(verbose=False)
        except (ValueError, OverflowError, OptimizationError):
            # Skip this target if it is infeasible (e.g. above the maximum
            # achievable return) or the solver fails
            continue
        rows.append(
            {
                "expected_return": perf[0],
                "volatility": perf[1],
                "sharpe_ratio": perf[2],
            }
        )
    return (
        pd.DataFrame(rows)
        .drop_duplicates(subset=["volatility"])  # Remove duplicate points
        .sort_values("volatility")  # Sort by increasing risk
        .reset_index(drop=True)
    )


# Compute frontiers with different constraints
# Unconstrained: allows short-selling (weights between -1 and 1)
frontier_unconstrained = compute_frontier_points(
    frontier_grid, mu, cov_matrix, ridge_gamma, (-1, 1)
)
# Long-only: no short-selling (weights between 0 and 1)
frontier_long_only = compute_frontier_points(
    frontier_grid, mu, cov_matrix, ridge_gamma, (0, 1)
)

# Add constraint type labels for plotting
frontier_unconstrained["constraint_type"] = "Unconstrained"
frontier_long_only["constraint_type"] = "Long-only"

# Combine frontiers for plotting
all_frontiers = pd.concat(
    [frontier_unconstrained, frontier_long_only], ignore_index=True
)

# Individual asset positions (for reference on plot)
asset_positions = pd.DataFrame(
    {
        "symbol": mu.index,
        "expected_return": mu,
        "volatility": np.sqrt(np.diag(cov_matrix)),  # Square root of variance
    }
).reset_index(drop=True)

# Key portfolios with different constraints (for annotation on plot)
key_portfolios = pd.DataFrame(
    [
        {
            "portfolio": "Min Vol (Unconstrained)",
            "expected_return": unconstrained_minvol_perf[0],
            "volatility": unconstrained_minvol_perf[1],
            "constraint_type": "Unconstrained",
        },
        {
            "portfolio": "Max Sharpe (Unconstrained)",
            "expected_return": unconstrained_sharpe_perf[0],
            "volatility": unconstrained_sharpe_perf[1],
            "constraint_type": "Unconstrained",
        },
        {
            "portfolio": "Target (Unconstrained)",
            "expected_return": target_perf[0],
            "volatility": target_perf[1],
            "constraint_type": "Unconstrained",
        },
        {
            "portfolio": "Min Vol (Long-only)",
            "expected_return": long_only_minvol_perf[0],
            "volatility": long_only_minvol_perf[1],
            "constraint_type": "Long-only",
        },
        {
            "portfolio": "Max Sharpe (Long-only)",
            "expected_return": long_only_sharpe_perf[0],
            "volatility": long_only_sharpe_perf[1],
            "constraint_type": "Long-only",
        },
    ]
).reset_index(drop=True)

### Interpreting the Efficient Frontier Plot

This comprehensive visualization shows:

**1. The Efficient Frontiers (curved lines):**

- **Dark blue line** (Unconstrained): efficient frontier allowing short positions (weights between -1 and 1)
- **Orange line** (Long-only): efficient frontier with the no-short-selling constraint
- The long-only frontier lies on or to the **right** of the unconstrained one: for a given target return, the constraint can only raise the minimum achievable risk

**2. Individual Assets (gray points):**

- Individual assets typically plot **inside** the frontier: for each asset there is a frontier portfolio with at least the same expected return and no more volatility
- Diversification creates portfolios superior to holding single stocks

**3. Key Portfolios (colored markers):**

- **Circles**: Unconstrained portfolios
- **Triangles**: Long-only portfolios
- Compare positions to see impact of constraints

**Key insights:**

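- **Diversification benefits**: up to the small L2 penalty, the minimum-volatility portfolio is never riskier than the least volatile single asset (holding that asset alone is itself a feasible portfolio), and it is usually considerably less risky
- **Cost of constraints**: in-sample, the long-only constraint can only lower the maximum Sharpe ratio and raise the minimum achievable risk
- **Tangency portfolio**: the highest-Sharpe-ratio point on each frontier, where a line drawn from the risk-free rate just touches the frontier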
- **Practical implications**: If you can't short-sell, you must accept either lower returns or higher risk
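When weights are unbounded, the tangency portfolio also has a closed form: its weights are proportional to the inverse covariance matrix times the vector of excess returns, Σ⁻¹(μ − r_f), rescaled to sum to one. The NumPy sketch below uses hypothetical inputs and an assumed 2% risk-free rate. PyPortfolioOpt's `max_sharpe()` solves the bounded, regularized version numerically, so its weights will generally differ from this benchmark.

```python
import numpy as np


def tangency_portfolio(mu, cov, risk_free_rate):
    """Maximum-Sharpe weights with no bounds: w proportional to S^-1 (mu - r_f), summing to 1."""
    excess = np.asarray(mu, dtype=float) - risk_free_rate
    raw = np.linalg.solve(cov, excess)  # direction S^-1 (mu - r_f)
    if raw.sum() <= 0:
        # Rescaling would flip the sign and give the *lowest* Sharpe ratio instead
        raise ValueError("no tangency portfolio with weights summing to 1 for these inputs")
    return raw / raw.sum()


def sharpe_ratio(weights, mu, cov, risk_free_rate):
    """(expected return - risk-free rate) / volatility."""
    return (weights @ mu - risk_free_rate) / np.sqrt(weights @ cov @ weights)


# Hypothetical annualised inputs for three assets
toy_mu = np.array([0.08, 0.12, 0.10])
toy_vols = np.array([0.15, 0.25, 0.20])
toy_corr = np.array([[1.0, 0.3, 0.5], [0.3, 1.0, 0.4], [0.5, 0.4, 1.0]])
toy_cov = np.outer(toy_vols, toy_vols) * toy_corr
toy_rf = 0.02  # assumed risk-free rate

w_tan = tangency_portfolio(toy_mu, toy_cov, toy_rf)
print("Tangency weights:", w_tan.round(4))
print("Tangency Sharpe:", round(sharpe_ratio(w_tan, toy_mu, toy_cov, toy_rf), 4))
print("Equal-weight Sharpe:", round(sharpe_ratio(np.full(3, 1 / 3), toy_mu, toy_cov, toy_rf), 4))
```

No other fully invested portfolio, including any single asset or the equal-weight mix, can beat the tangency portfolio's Sharpe ratio for these inputs.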


In [None]:
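# Enhanced plot with Tidy Finance styling
from plotnine import (
    ggplot,
    aes,
    geom_line,
    geom_point,
    geom_text,
    scale_x_continuous,
    scale_y_continuous,
    scale_color_manual,
    scale_shape_manual,
    labs,
    theme_minimal,
    theme,
)
from mizani.formatters import percent_format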
enhanced_plot = (
    # Initialize empty plot (we'll add multiple layers)
    ggplot()
    # Add efficient frontier lines (one for each constraint type)
    + geom_line(
        all_frontiers,
        aes(x="volatility", y="expected_return", color="constraint_type"),
        size=1.2,
    )
    # Add individual asset positions as reference points
    + geom_point(
        asset_positions,
        aes(x="volatility", y="expected_return"),
        color="#6c757d",  # Neutral gray color
        size=3,
        alpha=0.7,  # Semi-transparent
    )
    # Add labels for individual assets (with automatic positioning to avoid overlap)
    + geom_text(
        asset_positions,
        aes(x="volatility", y="expected_return", label="symbol"),
        adjust_text={"arrowprops": {"arrowstyle": "-"}},
        size=9,
        color="#495057",
    )
    # Add key portfolio positions (MVP, Max Sharpe, etc.)
    + geom_point(
        key_portfolios,
        aes(
            x="volatility",
            y="expected_return",
            color="constraint_type",
            shape="constraint_type",  # Different shapes for different constraints
        ),
        size=4,
    )
    # Add labels for key portfolios
    + geom_text(
        key_portfolios,
        aes(x="volatility", y="expected_return", label="portfolio"),
        adjust_text={"arrowprops": {"arrowstyle": "-"}},
        size=8,
        color="#1b263b",
    )
    # Format x-axis as percentages
    + scale_x_continuous(labels=percent_format())
    # Format y-axis as percentages
    + scale_y_continuous(labels=percent_format())
    # Manual color scale (professional color scheme)
    + scale_color_manual(values={"Unconstrained": "#0d3b66", "Long-only": "#fb8500"})
    # Manual shape scale (circles for unconstrained, triangles for long-only)
    + scale_shape_manual(values={"Unconstrained": "o", "Long-only": "^"})
    # Add comprehensive labels
    + labs(
        title="Efficient Frontier: Impact of Short-Selling Constraints",
        subtitle="Comparing unconstrained vs. long-only portfolios",
        x="Volatility (annualised)",
        y="Expected Return (annualised)",
        color="Constraint",
        shape="Constraint",
        caption=f"Data: Historical returns with L2 regularisation (γ={ridge_gamma})",
    )
    # Use minimal theme for clean look
    + theme_minimal()
    # Set custom figure size
    + theme(figure_size=(12, 8))
)

# Display the plot
enhanced_plot

### Comparing Portfolio Weights Across Strategies

Understanding **how** different strategies allocate capital across assets is crucial for:

1. **Implementation**: Do we need short-selling capability?
2. **Risk management**: Are we overly concentrated in few assets?
3. **Transaction costs**: Extreme weights may be costly to establish
4. **Interpretation**: Why does this portfolio work?

The weight comparison table shows:

- **Positive weights**: Long positions (buy and hold)
- **Negative weights**: Short positions (borrow and sell)
- **Zero weights**: Assets excluded from the portfolio

**What to look for:**

- **Concentration**: Are weights spread across many assets or focused on a few?
- **Short positions**: How important is short-selling to achieving optimal results?
- **Stability**: Do small changes in optimization parameters lead to dramatic weight changes?
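A few summary numbers make the concentration and short-position questions concrete. The sketch below is a hypothetical helper (not part of PyPortfolioOpt) that summarises a weight table with assets in rows and portfolios in columns. "Effective N" is the inverse Herfindahl index 1/Σw², a rough "how many equal-weight assets does this resemble" measure that becomes harder to interpret once shorts are present. The example table is made up.

```python
import pandas as pd


def weight_diagnostics(weights: pd.DataFrame) -> pd.DataFrame:
    """Summarise each column (one portfolio) of an asset-by-portfolio weight table."""
    return pd.DataFrame(
        {
            # Sum of absolute weights: 1.0 for long-only, above 1.0 with shorts
            "gross_exposure": weights.abs().sum(),
            # Size of the short book as a fraction of capital
            "short_exposure": -weights.clip(upper=0).sum(),
            "n_long": (weights > 0).sum(),
            "n_short": (weights < 0).sum(),
            # Inverse Herfindahl index: rough effective number of holdings
            "effective_n": 1 / (weights**2).sum(),
        }
    )


# Hypothetical weight table (columns sum to 1)
example_table = pd.DataFrame(
    {"Long-only": [0.5, 0.3, 0.2, 0.0], "With shorts": [0.8, 0.5, -0.1, -0.2]},
    index=["AAA", "BBB", "CCC", "DDD"],
)
print(weight_diagnostics(example_table).round(3))
```

Since `weights_comparison` in the next cell has the same shape (assets in rows, strategies in columns), `weight_diagnostics(weights_comparison)` should give one summary row per strategy.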


In [None]:
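# Portfolio weights comparison visualization
from plotnine import (
    ggplot,
    aes,
    geom_col,
    geom_hline,
    scale_y_continuous,
    labs,
    theme_minimal,
    theme,
    element_text,
    guides,
    guide_legend,
)
from mizani.formatters import percent_format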
print("\n" + "=" * 50)
print("PORTFOLIO WEIGHTS COMPARISON")
print("=" * 50)

# Compile all portfolio weights (excluding custom constraints for now)
# Each column represents one portfolio strategy
weights_comparison = pd.DataFrame(
    {
        "Max Sharpe (Uncon.)": pd.Series(cleaned_unconstrained_sharpe),
        "Min Vol (Uncon.)": pd.Series(cleaned_unconstrained_minvol),
        "Max Sharpe (Long)": pd.Series(cleaned_long_only_sharpe),
        "Min Vol (Long)": pd.Series(cleaned_long_only_minvol),
    }
).fillna(0)  # Replace NaN with 0 (assets not included in a portfolio)

print("Portfolio weights summary:")
print(weights_comparison)

# Create stacked bar chart with better styling
# Reshape data from wide to long format for ggplot
weights_long = (
    weights_comparison.reset_index()
    .melt(id_vars="index", var_name="Portfolio", value_name="Weight")
    .rename(columns={"index": "Asset"})
)

# Add constraint type for coloring
weights_long["Constraint_Type"] = weights_long["Portfolio"].apply(
    lambda x: "Long-only" if "Long" in x else "Unconstrained"
)

# Sort portfolios for better display (explicit ordering)
weights_long["Portfolio"] = pd.Categorical(
    weights_long["Portfolio"],
    categories=[
        "Max Sharpe (Uncon.)",
        "Min Vol (Uncon.)",
        "Max Sharpe (Long)",
        "Min Vol (Long)",
    ],
    ordered=True,
)

# Create stacked bar chart
weights_plot = (
    ggplot(weights_long, aes(x="Portfolio", y="Weight", fill="Asset"))
    # Stacked bars showing composition of each portfolio
    + geom_col(position="stack", width=0.7)
    # Add horizontal line at zero to clearly show short positions
    + geom_hline(yintercept=0, linetype="dashed", color="black", alpha=0.5)
    # Format y-axis as percentages
    + scale_y_continuous(labels=percent_format())
    # Add comprehensive labels
    + labs(
        title="Portfolio Weights Across Different Optimization Strategies",
        subtitle="Comparing unconstrained vs. constrained portfolios",
        x="Portfolio Strategy",
        y="Portfolio Weight",
        caption="Negative weights indicate short positions",
    )
    # Use minimal theme
    + theme_minimal()
    # Rotate x-axis labels for readability
    + theme(
        axis_text_x=element_text(angle=45, hjust=1),
        figure_size=(12, 7),
        legend_position="right",
    )
    # Customize legend
    + guides(fill=guide_legend(title="Asset"))
)

# Display the plot
weights_plot

## 8. From weights to a trade list

Optimized weights are fractions of capital, not orders, so they cannot be traded as they stand. PyPortfolioOpt includes a discrete allocator that converts optimal weights into whole-share counts given your budget.


### Understanding Discrete Allocation

Portfolio optimization produces **continuous weights** (e.g., 15.34% in Apple, 8.72% in Microsoft). However, in practice:

- **Stocks trade in whole shares**: on most venues you can't buy 0.34 shares of Apple (some brokers offer fractional shares, but not universally)
- **Budget constraints**: You have a fixed amount to invest
- **Transaction costs**: very small or fractional positions may be uneconomic or unavailable

**Discrete allocation** solves this problem by converting optimal weights into **integer share counts** that:

1. Respect your budget constraint
2. Approximate the optimal weights as closely as possible
3. Are practically implementable
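The first two properties are easy to check for any share allocation. The sketch below is a hypothetical helper (not a PyPortfolioOpt function) that reports the amount spent, the leftover cash, and the root-mean-square error between target and realised weights, with realised weights measured relative to the amount actually invested. The tickers, prices, and share counts are made up.

```python
import math


def allocation_report(allocation, prices, target_weights, budget):
    """Check a {ticker: shares} allocation against the budget and target weights."""
    spent = sum(n * prices[t] for t, n in allocation.items())
    if spent <= 0:
        raise ValueError("allocation buys nothing")
    # Realised weight of each target asset (0 if no shares were bought)
    realised = {t: allocation.get(t, 0) * prices[t] / spent for t in target_weights}
    rmse = math.sqrt(
        sum((target_weights[t] - realised[t]) ** 2 for t in target_weights)
        / len(target_weights)
    )
    return {
        "spent": spent,
        "leftover": budget - spent,
        "within_budget": spent <= budget,
        "rmse": rmse,
    }


# Hypothetical example
toy_targets = {"A": 0.5, "B": 0.3, "C": 0.2}
toy_prices = {"A": 120.0, "B": 45.0, "C": 90.0}
toy_allocation = {"A": 4, "B": 7, "C": 2}
toy_report = allocation_report(toy_allocation, toy_prices, toy_targets, budget=1_000)
print(toy_report)
```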

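The `DiscreteAllocation` class offers an integer-programming method (`lp_portfolio()`) and the **greedy algorithm** used here (`greedy_portfolio()`), which works in two rounds:

- **Round 1**: going from the largest weight to the smallest, buy as many whole shares as each target weight allows (rounding down)
- **Round 2**: spend the leftover cash one share at a time on the asset whose current weight falls furthest below its target, until no suitable asset is affordable
- The result approximately tracks the target weights; it is a heuristic rather than a formal minimization of tracking error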

**What you get:**

- **Allocation dictionary**: How many shares of each stock to buy
- **Leftover cash**: Uninvested capital (due to discrete constraint and rounding)
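The two-round idea can be sketched in a few lines of plain Python. This is a simplified illustration of the greedy procedure described above, assuming long-only weights that sum to 1; it is not PyPortfolioOpt's implementation, and the tickers, prices, and budget are hypothetical.

```python
def greedy_allocation(weights, prices, budget):
    """Simplified two-round greedy allocation for long-only weights summing to 1.

    Returns ({ticker: shares}, leftover_cash).
    """
    # Round 1: largest weights first, buy the whole shares each weight allows
    tickers = sorted(weights, key=weights.get, reverse=True)
    shares = {t: int(weights[t] * budget / prices[t]) for t in tickers}
    cash = budget - sum(shares[t] * prices[t] for t in tickers)

    # Round 2: buy one share at a time of the most underweight affordable asset
    while True:
        invested = sum(shares[t] * prices[t] for t in tickers)
        current = {
            t: (shares[t] * prices[t] / invested) if invested > 0 else 0.0
            for t in tickers
        }
        affordable = [t for t in tickers if weights[t] > current[t] and prices[t] <= cash]
        if not affordable:
            break
        best = max(affordable, key=lambda t: weights[t] - current[t])
        shares[best] += 1
        cash -= prices[best]

    # Drop assets that ended with zero shares
    return {t: n for t, n in shares.items() if n > 0}, cash


toy_weights = {"A": 0.5, "B": 0.3, "C": 0.2}
toy_prices = {"A": 120.0, "B": 45.0, "C": 90.0}
toy_shares, toy_cash = greedy_allocation(toy_weights, toy_prices, budget=1_000)
print(toy_shares, toy_cash)
```

Here round 1 buys 4 A, 6 B and 2 C (spending 930), and round 2 adds one share of B, the most underweight asset that the remaining 70 can pay for. The last 25 stays as cash because every underweight asset now costs more than that.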


In [None]:
# Get the most recent price for each asset (needed for share count calculation)
latest_prices = get_latest_prices(prices_wide)
# Set portfolio budget (how much capital we have to invest)
portfolio_value = 10_000

print("=" * 60)
print("DISCRETE ALLOCATION FOR LONG-ONLY MAX SHARPE PORTFOLIO")
print("=" * 60)
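print("\nNote: We allocate the long-only portfolio so the trade list contains only buy orders.")
print("DiscreteAllocation can also size shorts (as negative share counts), but executing")
print("them requires a margin account and borrowed shares.\n")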

# Create discrete allocation object with:
# - Target weights (from long-only maximum Sharpe portfolio)
# - Latest prices (to convert $ amounts to share counts)
# - Total budget (portfolio value)
da = DiscreteAllocation(
    cleaned_long_only_sharpe, latest_prices, total_portfolio_value=portfolio_value
)
# Run greedy allocation algorithm to determine share counts
# Returns: (1) dictionary of {symbol: shares}, (2) leftover cash
allocation, leftover_cash = da.greedy_portfolio()

# Display results
print("Share allocation:")
allocation_df = pd.Series(allocation, name="shares").sort_index()
print(allocation_df)
print(f"\nLeftover cash: ${leftover_cash:.2f}")

# Calculate total invested
total_invested = sum(
    allocation[symbol] * latest_prices[symbol] for symbol in allocation
)
print(f"Total invested: ${total_invested:.2f}")
print(f"Portfolio utilization: {(total_invested/portfolio_value)*100:.1f}%")

### Why We Use Long-Only for Discrete Allocation

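PyPortfolioOpt's `DiscreteAllocation` can in fact handle shorts: it allocates the long and short legs separately and reports short positions as **negative share counts** (the `short_ratio` argument sets the size of the short leg). We still use the long-only portfolio for the trade list because:

1. **A negative share count is not a buy order**: it is an instruction to borrow shares and sell them
2. **Short-selling mechanics differ**: shorting requires locating and borrowing shares, with ongoing fees
3. **Practical implementation**: Most brokers handle shorts separately from longs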

For the **unconstrained portfolio with shorts**, implementation would require:

- Separate orders for long positions (buy) and short positions (borrow and sell)
- Margin account with sufficient collateral
- Tracking of borrowing costs and margin requirements
- More complex position management

Therefore, we demonstrate discrete allocation using the **long-only Max Sharpe portfolio**, which can be directly implemented through standard buy orders.
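For the unconstrained portfolio, the first implementation step would be to split the signed weights into a long leg and a short leg with separate dollar targets and orders. The sketch below uses two hypothetical helpers (not PyPortfolioOpt functions) and made-up weights and prices. It rounds share counts down and writes shorts as negative numbers, matching the sign convention `DiscreteAllocation` uses. Margin, collateral, and borrow costs are not modelled.

```python
def split_long_short(weights, capital):
    """Dollar targets per leg from signed weights that sum to 1."""
    longs = {t: w * capital for t, w in weights.items() if w > 0}
    shorts = {t: -w * capital for t, w in weights.items() if w < 0}
    return longs, shorts


def to_share_orders(longs, shorts, prices):
    """Whole-share orders: positive = buy, negative = borrow and sell (rounded down)."""
    orders = {t: int(v / prices[t]) for t, v in longs.items()}
    orders.update({t: -int(v / prices[t]) for t, v in shorts.items()})
    return orders


# Hypothetical 130/30-style weights and prices
toy_signed_weights = {"A": 0.8, "B": 0.5, "C": -0.3}
toy_leg_prices = {"A": 50.0, "B": 25.0, "C": 40.0}

long_dollars, short_dollars = split_long_short(toy_signed_weights, capital=10_000)
gross_exposure = sum(long_dollars.values()) + sum(short_dollars.values())
net_exposure = sum(long_dollars.values()) - sum(short_dollars.values())
toy_orders = to_share_orders(long_dollars, short_dollars, toy_leg_prices)
print(long_dollars, short_dollars)
print(f"Gross exposure: {gross_exposure:,.0f}, net exposure: {net_exposure:,.0f}")
print(toy_orders)
```

The gross exposure (16,000 on 10,000 of capital) is what a margin account has to support, even though the net exposure equals the capital.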
