<a href="https://colab.research.google.com/github/boyerb/Investments/blob/master/Exxx_Risk_Factor_Exposures_and_Performance_Spread.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Investment Analysis**, Bates, Boyer, Fletcher

# Example Chapter xx: Risk-Factor Exposures and the Performance Spread
In this example we illustrate how to calculate the contributions of factor exposures to the performance spread in a multi-factor model.

### Imports and Setup

In [None]:
!curl -O https://raw.githubusercontent.com/boyerb/Investments/master/functions/simple_finance.py
import simple_finance as sf
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

### Load Monthly Returns for Portfolio, Benchmark, and Fama–French 5 Factors
We first load portfolio and benchmark return data from an Excel file and prepare it for analysis. The Fama–French 5 factor dataset already uses a PeriodIndex with monthly frequency. To make the portfolio and benchmark data consistent, we convert their date index into the same format:
 * `pd.to_datetime(df.index, format='%Y-%m')` converts the index into proper datetime objects (timestamps at the start of each month).

 * `.to_period('M')` then turns those timestamps into a PeriodIndex with monthly frequency.

This ensures that both datasets use the same type of monthly index, which makes them easy to merge.
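The conversion and the join that follows can be illustrated on a tiny example. This is a minimal sketch with made-up numbers: the frames `toy_df` and `toy_ff` are hypothetical stand-ins for the Excel data and the factor data, and we assume the Excel dates arrive as `'YYYY-MM'` strings.

```python
import pandas as pd

# Hypothetical portfolio data with 'YYYY-MM' string dates (illustration only)
toy_df = pd.DataFrame(
    {"Portfolio": [0.010, -0.020, 0.015], "Benchmark": [0.008, -0.015, 0.012]},
    index=["2020-01", "2020-02", "2020-03"],
)

# Strings -> Timestamps (first day of each month) -> monthly PeriodIndex
toy_df.index = pd.to_datetime(toy_df.index, format="%Y-%m").to_period("M")

# Hypothetical factor data that already carries a monthly PeriodIndex
toy_ff = pd.DataFrame(
    {"Mkt-RF": [0.02, -0.01, 0.03, 0.01], "RF": [0.001] * 4},
    index=pd.period_range("2019-12", periods=4, freq="M"),
)

# Because both indexes are monthly Periods, an inner join lines up the months
# and keeps only those present in both frames (2019-12 is dropped)
toy_merged = toy_df.join(toy_ff, how="inner")
print(toy_merged)
```

If the two indexes had different types (for example, Timestamps on one side and Periods on the other), the months would not match and the inner join would come back empty or fail, which is why the conversion step matters.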

In [None]:
# Load in the data by first specifying the URL where the data can be found
url='https://github.com/boyerb/Investments/raw/master/Examples_3.47.xlsx'
# specify which columns to read
columns_to_read = ['date', 'Portfolio','Benchmark']
df = pd.read_excel(url, sheet_name='PA-3', header=0, usecols=columns_to_read, engine='openpyxl')
df = df.dropna()
df = df.set_index('date').sort_index()
df.index = pd.to_datetime(df.index, format='%Y-%m').to_period('M')

# Fama-French 5 factors (already indexed by monthly Period)
ff5 = sf.get_ff5(start_date=None, end_date=None)

### Merge the Two Datasets

In [None]:
data = df.join(ff5, how='inner')
print(data.head(5))

### Create Excess Returns and Define Explanatory Variables

In [None]:
# Excess returns
data['Port_excess'] = data['Portfolio'] - data['RF']
data['Bench_excess'] = data['Benchmark'] - data['RF']
data['Active_excess'] = data['Portfolio'] - data['Benchmark']   # active return

# Explanatory Variables (Factor Matrix)
factors = data[['Mkt-RF','SMB','HML','RMW','CMA']]
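Note that the active return needs no risk-free adjustment: the risk-free rate cancels when the benchmark excess return is subtracted from the portfolio excess return. A quick check on hypothetical numbers (the frame `toy` is made up for illustration):

```python
import pandas as pd

# Hypothetical monthly returns in decimals (illustration only)
toy = pd.DataFrame({
    "Portfolio": [0.012, -0.004, 0.020],
    "Benchmark": [0.010, -0.006, 0.015],
    "RF":        [0.001,  0.001, 0.002],
})

toy["Port_excess"] = toy["Portfolio"] - toy["RF"]
toy["Bench_excess"] = toy["Benchmark"] - toy["RF"]
toy["Active_excess"] = toy["Portfolio"] - toy["Benchmark"]

# (P - RF) - (B - RF) = P - B, so the two series agree up to floating-point error
diff_of_excess = toy["Port_excess"] - toy["Bench_excess"]
print((toy["Active_excess"] - diff_of_excess).abs().max())
```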

### Regression Attribution Function
The `run_reg_details` function runs an OLS regression of a return series on the Fama–French factors and produces two useful tables:

 * Coefficients table — shows the estimated regression coefficients (alpha, b1, b2, …).

 * Contributions table — shows the contribution of each factor to average return, computed as beta × average factor return, along with alpha and the total.
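Why the contributions add up to the average return can be seen from the fitted regression (notation ours: $y_t$ is the return series, $F_{i,t}$ the five factors, $T$ the number of months):

$$
y_t = \hat{\alpha} + \sum_{i=1}^{5} \hat{b}_i F_{i,t} + \hat{\varepsilon}_t .
$$

Averaging over the $T$ months, and using the fact that OLS residuals average to zero when the regression includes an intercept,

$$
\bar{y} = \hat{\alpha} + \sum_{i=1}^{5} \hat{b}_i \, \bar{F}_i ,
\qquad \bar{F}_i = \frac{1}{T}\sum_{t=1}^{T} F_{i,t} .
$$

So the "Total" in the contributions table equals the sample mean of the return series; for the active series, that mean is the performance spread.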

**How it works:**  
You give the function a return series (portfolio, benchmark, or active), the factor matrix, and a label (name). The function runs an OLS regression with an intercept, collects the estimated coefficients into a **coefficients table**, and then multiplies each factor's coefficient by that factor's average return to build a **contributions table**.

**Inputs you provide:**  

 * `y`: the dependent variable (e.g., portfolio excess return, benchmark excess return, or active excess return).
 * `X`: the independent variables (the factor matrix).
 * `name`: a string label for the regression (e.g., `"Portfolio"`, `"Benchmark"`, `"Active"`).

**What you get back:**

 * `coefs`: a Series with alpha and each regression coefficient (b1, b2, …).
 * `contribs`: a Series with alpha, each bi*avg(Fi), and the total.
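The identity behind the contributions table can be checked on simulated data. This is a minimal sketch, not the notebook's function: it swaps `sm.OLS` for `np.linalg.lstsq` so that it only needs NumPy and pandas, and the factor returns, betas, and sample length are all made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
T = 120  # hypothetical number of months

# Simulated factor returns and a simulated return series (illustration only)
X = pd.DataFrame(rng.normal(0.005, 0.04, size=(T, 5)),
                 columns=["Mkt-RF", "SMB", "HML", "RMW", "CMA"])
y = 0.001 + X @ np.array([1.1, -0.2, 0.1, 0.3, 0.2]) + rng.normal(0, 0.01, T)

# OLS with an intercept via least squares (stand-in for sm.OLS(...).fit())
design = np.column_stack([np.ones(T), X.to_numpy()])
params, *_ = np.linalg.lstsq(design, y.to_numpy(), rcond=None)
alpha, betas = params[0], params[1:]

# Contributions: alpha plus each beta times that factor's average return
contribs = pd.Series({"Alpha": alpha,
                      **{f"b{i}*avg({f})": b * X[f].mean()
                         for i, (f, b) in enumerate(zip(X.columns, betas), 1)}})
total = contribs.sum()

# Total should match the sample mean of y up to floating-point error
print(total, y.mean())
```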

In [None]:
def run_reg_details(y, X, name):
    """
    OLS of y on X (with intercept). Returns:
      - coefs:    Alpha, b1, b2, ...             (as a Series named `name`)
      - contribs: Alpha, b1*avg(F1), ..., Total (as a Series named `name`)
    """
    model = sm.OLS(y, sm.add_constant(X)).fit()
    alpha = model.params["const"]
    betas = model.params.drop("const")

    # Coefficients: Alpha, b1..bK
    coefs = pd.Series({"Alpha": alpha, **{f"b{i}": betas[f] for i, f in enumerate(X.columns, 1)}}, name=name)

    # Contributions: Alpha, b1*avg(F1).., Total
    contrib_items = {"Alpha": alpha}
    contrib_items.update({f"b{i}*avg({f})": betas[f] * X[f].mean() for i, f in enumerate(X.columns, 1)})
    contrib_items["Total"] = sum(contrib_items.values())
    contribs = pd.Series(contrib_items, name=name)

    return coefs, contribs


### Run Factor Regressions and Build Tables
We now run `run_reg_details` for the portfolio, benchmark, and active return series. Each regression produces two outputs:

 * Coefficients table: shows alpha and the raw factor betas.
 * Contributions table: shows alpha and each factor’s contribution ($\beta \times$ average factor return).

The first block runs the regression three times: once for portfolio excess returns, once for benchmark excess returns, and once for active returns (portfolio minus benchmark).

The second block combines the results side by side into two summary tables:

 * `coef_table`: the regression coefficients ($\alpha$ and betas) for each regression.  
 * `contrib_table`: annualized contributions (each monthly number is multiplied by 12 to annualize).
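How `pd.concat(..., axis=1)` lines the Series up, and how the ×12 annualization scales every entry, can be seen on toy inputs. The Series below are hypothetical and shortened to three rows; in the notebook they come from `run_reg_details`.

```python
import pandas as pd

# Hypothetical monthly contributions (decimals), shaped like run_reg_details output
idx = ["Alpha", "b1*avg(Mkt-RF)", "Total"]
toy_port = pd.Series([-0.001, 0.006, 0.005], index=idx, name="Portfolio")
toy_bench = pd.Series([-0.002, 0.004, 0.002], index=idx, name="Benchmark")
toy_active = (toy_port - toy_bench).rename("Active")

# Each Series name becomes a column header; multiplying by 12 annualizes
# the monthly figures by simple scaling (no compounding)
toy_table = pd.concat([toy_port, toy_bench, toy_active], axis=1) * 12
print(toy_table)
```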



In [None]:
# Run for Portfolio, Benchmark, Active
coefs_port, contrib_port = run_reg_details(data["Port_excess"], factors, "Portfolio")
coefs_bench, contrib_bench = run_reg_details(data["Bench_excess"], factors, "Benchmark")
coefs_active, contrib_active = run_reg_details(data["Active_excess"], factors, "Active")

# Build side-by-side tables (one column per regression)
coef_table = pd.concat([coefs_port, coefs_bench, coefs_active], axis=1)
contrib_table = pd.concat([contrib_port, contrib_bench, contrib_active], axis=1) * 12  # annualized



### Print the Regression Coefficients and Contributions
The `\n` adds a blank line after printing each table.

In [None]:
print("Regression Coefficients")
print(coef_table, "\n")

print("Contributions")
print(contrib_active, "\n")

# Show that the Active Total equals the performance spread (mean active return)
perf_spread = data["Active_excess"].mean()
print("Performance Spread", perf_spread)
print("Matches Active Total:", np.isclose(contrib_active["Total"], perf_spread))

### Summary
The first table above presents the regression coefficients. Note that the values in the **Active** column equal the **Portfolio** value minus the **Benchmark** value. Both the portfolio and the benchmark have a negative alpha, but the portfolio alpha is 31.78 basis points higher. The portfolio has greater exposure to the market factor (b1), the RMW factor (b4), and the CMA factor (b5), and lower exposure to the SMB (b2) and HML (b3) factors.
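The Active column equals Portfolio minus Benchmark because OLS coefficients are linear in the dependent variable when the regressors are held fixed: all three regressions share the same factor matrix. A minimal sketch on simulated data (all numbers made up; `np.linalg.lstsq` stands in for `statsmodels`):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 60  # hypothetical sample length

# Simulated factors plus simulated portfolio and benchmark excess returns
F = rng.normal(0.005, 0.04, size=(T, 5))
design = np.column_stack([np.ones(T), F])
port = 0.002 + F @ np.array([1.2, -0.1, 0.0, 0.4, 0.3]) + rng.normal(0, 0.01, T)
bench = -0.001 + F @ np.array([1.0, 0.1, 0.2, 0.2, 0.0]) + rng.normal(0, 0.01, T)

def ols(y):
    """Intercept and slopes from least squares on the shared design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Regressing (port - bench) gives the difference of the separate coefficient vectors
print(np.allclose(ols(port - bench), ols(port) - ols(bench)))  # True
```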

The contributions table decomposes the performance spread of 82.21 basis points into two parts: alpha and the contributions from exposure to each risk factor. Greater exposure to the market factor adds 45.42 basis points to the spread. The next biggest contributor is CMA, exposure to which adds another 19.62 basis points.
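To express these contributions as fractions of the spread, divide each by the total. The sketch below uses only the three figures quoted above; in the notebook itself, `contrib_active.drop("Total") / contrib_active["Total"]` would give the same kind of breakdown for every row.

```python
import pandas as pd

spread_bp = 82.21  # performance spread quoted above, in basis points
contrib_bp = pd.Series({"Mkt-RF": 45.42, "CMA": 19.62})  # contributions quoted above

# Fraction of the performance spread attributable to each factor exposure
share = contrib_bp / spread_bp
print(share.round(3))
```

By this arithmetic, the market exposure accounts for roughly 55% of the spread and the CMA exposure for roughly 24%.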