<a href="https://colab.research.google.com/github/boyerb/Investments/blob/master/Exxx-Performance_Attribution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Investment Analysis**, Bates, Boyer, Fletcher

# Example Chapter xx: Contributions to Alpha Using Time Series Regressions
In this example we estimate allocation and selection effects for alpha.

### Imports and Setup

In [1]:
#import packages
import pandas as pd
import statsmodels.api as sm

### Load in Data
We load stock returns, benchmark weights, and portfolio weights from the spreadsheet. In this file, the column labels in row 2 (`A`, `B`, `C`, ...) are repeated for each block of data. When pandas encounters duplicate column names while reading the spreadsheet, it automatically appends the suffixes `.1`, `.2`, and so on to distinguish them. As a result, the benchmark-weight columns are read as `A.1` through `E.1` and the portfolio-weight columns as `A.2` through `E.2`.

In [2]:
# Load in the data by first specifying the URL where the data can be found
url='https://github.com/boyerb/Investments/raw/master/Examples_3.46.xlsx'
# specify which columns to read
columns_to_read = ['Date', 'A','B','C','D','E','A.1','B.1','C.1','D.1','E.1','A.2','B.2','C.2','D.2','E.2']
df = pd.read_excel(url, sheet_name='PA-2', header=1, usecols=columns_to_read, engine='openpyxl')
df = df.dropna()

# Split into the three panels and set the Date columns as the index
stock_returns = df[['Date','A','B','C','D','E']].set_index('Date')

benchmark_weights = df[['Date','A.1','B.1','C.1','D.1','E.1']].copy()
benchmark_weights.columns = ['Date','A','B','C','D','E']
benchmark_weights = benchmark_weights.set_index('Date')

portfolio_weights = df[['Date','A.2','B.2','C.2','D.2','E.2']].copy()
portfolio_weights.columns = ['Date','A','B','C','D','E']
portfolio_weights = portfolio_weights.set_index('Date')
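
The suffixing behavior described above can be seen on a small scale without the spreadsheet. The sketch below is only an illustration: it uses `pd.read_csv` on an in-memory string instead of `pd.read_excel`, and the numbers and the names `toy_csv` and `toy` are made up for this example.

```python
import io
import pandas as pd

# Toy data (made-up numbers) whose header repeats the labels A and B,
# mimicking a returns block followed by a weights block
toy_csv = io.StringIO(
    "Date,A,B,A,B\n"
    "2024-01-31,0.01,0.02,0.6,0.4\n"
    "2024-02-29,0.03,-0.01,0.5,0.5\n"
)
toy = pd.read_csv(toy_csv)

# pandas renames the repeated labels so that every column name is unique
print(list(toy.columns))  # ['Date', 'A', 'B', 'A.1', 'B.1']
```

The second `A` and `B` become `A.1` and `B.1`, which is why the real data is selected with names such as `'A.1'` and `'A.2'`.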

### Calculate Returns for Every Period
We calculate the benchmark return, portfolio return, and excess returns each period. We are given that the risk-free rate is 25 basis points per month.

In [3]:
benchmark_return = (stock_returns  * benchmark_weights).sum(axis=1)
portfolio_return = (stock_returns  * portfolio_weights).sum(axis=1)
benchmark_excess_return = benchmark_return - 0.0025
portfolio_excess_return = portfolio_return - 0.0025
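
To see what the multiply-then-sum pattern does, here is a minimal sketch on two made-up securities and two made-up dates. The names `toy_returns`, `toy_weights`, and the numbers are assumptions for illustration only; the 0.0025 is the monthly risk-free rate given above.

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-31", "2024-02-29"])
toy_returns = pd.DataFrame({"A": [0.02, -0.01], "B": [0.04, 0.03]}, index=dates)
toy_weights = pd.DataFrame({"A": [0.50, 0.25], "B": [0.50, 0.75]}, index=dates)

# Element-by-element product, then a sum across columns for each date
toy_portfolio_return = (toy_returns * toy_weights).sum(axis=1)

# Subtract the monthly risk-free rate to get the excess return
toy_excess_return = toy_portfolio_return - 0.0025

print(toy_portfolio_return)
print(toy_excess_return)
```

For the first date the portfolio return is 0.5 × 0.02 + 0.5 × 0.04 = 0.03, so the excess return is 0.0275.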

### Calculate Portfolio and Benchmark Segment Weights
We next group securities into two segments:

* Segment 1 (Seg1): the first three securities (A, B, C)
* Segment 2 (Seg2): the last two securities (D, E)

For each date, we sum the portfolio weights and benchmark weights within these groups to obtain segment-level weights. The key part of the code is the use of `sum(axis=1)`, which tells pandas to sum across columns (horizontally) for each row. Since each row represents a date, this produces the total weight for that segment on that date.

In [4]:
portfolio_segment_weights = pd.DataFrame({
    'Seg1': portfolio_weights.iloc[:, 0:3].sum(axis=1),  # sum of first 3 columns
    'Seg2': portfolio_weights.iloc[:, 3:5].sum(axis=1)   # sum of last 2 columns
}, index=portfolio_weights.index)

benchmark_segment_weights = pd.DataFrame({
    'Seg1': benchmark_weights.iloc[:, 0:3].sum(axis=1),  # sum of first 3 columns
    'Seg2': benchmark_weights.iloc[:, 3:5].sum(axis=1)   # sum of last 2 columns
}, index=benchmark_weights.index)
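
As an alternative sketch, the same grouping can be written with column labels instead of column positions. The `segments` dictionary and the toy weights below are hypothetical and are not part of the example data.

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-31", "2024-02-29"])
toy_weights = pd.DataFrame(
    {"A": [0.10, 0.20], "B": [0.20, 0.20], "C": [0.30, 0.10],
     "D": [0.25, 0.30], "E": [0.15, 0.20]},
    index=dates,
)

# Hypothetical mapping from segment name to the securities it contains
segments = {"Seg1": ["A", "B", "C"], "Seg2": ["D", "E"]}

# For each segment, sum the member columns across each row (date)
toy_segment_weights = pd.DataFrame(
    {name: toy_weights[cols].sum(axis=1) for name, cols in segments.items()}
)
print(toy_segment_weights)
```

Selecting by label gives the same result as `.iloc` here, and it keeps working if the column order ever changes.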

### Calculate Security Segment Weights
Now we normalize each security’s weight within its segment so that the weights inside a segment always sum to 1. This is done separately for the portfolio and the benchmark.  
Step 1. Select columns with .iloc
 * Code snippet: `portfolio_weights.iloc[:, 0:3]`
 * The colon, `:`, means "select all rows" (i.e., every date).
 * The second part, `0:3`, means “select columns 0 through 2” (Python indexing is zero-based and the slice stops before 3).  
 * So this picks out securities A, B, C from the portfolio weights DataFrame.

Step 2. Divide row by row
 * Code snippet: `.div(portfolio_segment_weights['Seg1'], axis=0)`  
 * `axis=0` means match by row index (i.e., by date).
 * Each security’s weight for a given date is divided by the total segment weight for that same date.
 * This ensures that inside each segment, the security-segment weights sum to 1 per date.

Step 3. Concatenate side by side
 * Code snippet: `pd.concat([pssw1, pssw2], axis=1)`
 * `axis=1` means combine along columns (side by side).
 * Here, the two DataFrames (pssw1 for A–C, pssw2 for D–E) are joined into one DataFrame with all five securities.


In [5]:
# Calculate Portfolio Security Segment Weights (pssw)
pssw1 = portfolio_weights.iloc[:, 0:3].div(portfolio_segment_weights['Seg1'], axis=0)
pssw2 = portfolio_weights.iloc[:, 3:5].div(portfolio_segment_weights['Seg2'], axis=0)
portfolio_security_segment_weights = pd.concat([pssw1, pssw2], axis=1)

# Calculate Benchmark Security Segment Weights (bssw)
bssw1 = benchmark_weights.iloc[:, 0:3].div(benchmark_segment_weights['Seg1'], axis=0)
bssw2 = benchmark_weights.iloc[:, 3:5].div(benchmark_segment_weights['Seg2'], axis=0)
benchmark_security_segment_weights = pd.concat([bssw1, bssw2], axis=1)
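
A quick way to convince yourself that `.div(..., axis=0)` normalizes row by row is to try it on a small made-up segment. The names `toy_seg_weights`, `seg_total`, and `within_segment` are illustrative assumptions.

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-31", "2024-02-29"])
# Made-up weights for the three securities of one segment
toy_seg_weights = pd.DataFrame(
    {"A": [0.10, 0.20], "B": [0.20, 0.20], "C": [0.30, 0.10]}, index=dates
)

# Total weight of the segment on each date (0.6, then 0.5)
seg_total = toy_seg_weights.sum(axis=1)

# Divide each row by that date's segment total
within_segment = toy_seg_weights.div(seg_total, axis=0)

print(within_segment)
print(within_segment.sum(axis=1))  # each date sums to 1, up to rounding
```

On the first date, security C has weight 0.30 out of a segment total of 0.60, so its security-segment weight is 0.5.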

### Calculate Segment Returns
Now we want the return for each segment of the portfolio and the benchmark.

 * Segment 1 = A, B, C
 * Segment 2 = D, E

Because we already normalized the security-segment weights so they sum to 1 within each segment, we can calculate a segment's return as the weighted average of its securities' returns.  
 * `.iloc[:, :3]` :  select all rows (:) and the first three columns (0,1,2), which correspond to A, B, C.
 * `.iloc[:, 3:5]` :  select all rows and the last two columns (3,4), which correspond to D, E.
 * `.sum(axis=1)` means sum across the columns for each row (date), giving the segment's return on that date.  

After we calculate the return for each segment (port_seg1_ret and port_seg2_ret), we want to store them together in one table (a pandas DataFrame) so it's easy to use later.
 * `pd.DataFrame({...})` creates a new DataFrame.
 * Inside the curly braces `{}`, we pass in a dictionary:
   * The key (e.g., `'Port_Seg1_Return'`) becomes the column name.
   * The value (e.g., `port_seg1_ret`) is the data for that column.
 * `index=stock_returns.index` tells pandas to use the same row labels (dates) as the original stock returns DataFrame. This way, each segment return lines up correctly with the same dates as the stock data.

In [6]:
# Portfolio segment returns (weights within each segment sum to 1)
port_seg1_ret = (portfolio_security_segment_weights.iloc[:, :3] * stock_returns.iloc[:, :3]).sum(axis=1)
port_seg2_ret = (portfolio_security_segment_weights.iloc[:, 3:5] * stock_returns.iloc[:, 3:5]).sum(axis=1)

portfolio_segment_returns = pd.DataFrame(
    {'Port_Seg1_Return': port_seg1_ret, 'Port_Seg2_Return': port_seg2_ret},
    index=stock_returns.index
)

# Benchmark segment returns
bench_seg1_ret = (benchmark_security_segment_weights.iloc[:, :3] * stock_returns.iloc[:, :3]).sum(axis=1)
bench_seg2_ret = (benchmark_security_segment_weights.iloc[:, 3:5] * stock_returns.iloc[:, 3:5]).sum(axis=1)

benchmark_segment_returns = pd.DataFrame(
    {'Bench_Seg1_Return': bench_seg1_ret, 'Bench_Seg2_Return': bench_seg2_ret},
    index=stock_returns.index
)
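
One useful sanity check on segment returns is that they roll back up to the total return: the segment weights times the segment returns should add up to the overall weighted return. The sketch below checks this for a single made-up date; all names and numbers are assumptions for illustration.

```python
import pandas as pd

tickers = ["A", "B", "C", "D", "E"]
toy_returns = pd.Series([0.01, 0.02, 0.03, -0.01, 0.04], index=tickers)
toy_weights = pd.Series([0.10, 0.20, 0.30, 0.25, 0.15], index=tickers)

seg1, seg2 = ["A", "B", "C"], ["D", "E"]

# Segment weights
seg1_weight = toy_weights[seg1].sum()
seg2_weight = toy_weights[seg2].sum()

# Segment returns: weighted average using weights normalized within the segment
seg1_return = (toy_weights[seg1] / seg1_weight * toy_returns[seg1]).sum()
seg2_return = (toy_weights[seg2] / seg2_weight * toy_returns[seg2]).sum()

# The two ways of computing the total return should agree
total_return = (toy_weights * toy_returns).sum()
rolled_up = seg1_weight * seg1_return + seg2_weight * seg2_return
print(total_return, rolled_up)
```

Both numbers equal 0.0175 up to floating-point rounding.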

## Calculate Allocation Effects
The allocation effect measures how much of the portfolio's performance relative to the benchmark comes from over- or under-weighting entire segments.

For each segment:
 * Take the difference in weights (portfolio weight - benchmark weight).
 * Multiply by the difference between the benchmark's return for that segment and the overall benchmark return.

This tells us whether putting more (or less) weight into a segment helped or hurt performance.

 * `portfolio_segment_weights['Seg1'] - benchmark_segment_weights['Seg1']` : $(W_{p,1}-W_{b,1})$, the difference in segment weights.
 * `benchmark_segment_returns['Bench_Seg1_Return'] - benchmark_return` : $(r_{b,1}-r_b)$, the difference in returns

We then collect the allocation effects into a dataframe, using code similar to that in the previous block, with three columns: (1) allocation to segment 1, (2) allocation to segment 2, and (3) the total allocation effect.  The index of the dataframe is the row date.  

In [7]:
alloc_seg1 = (portfolio_segment_weights['Seg1'] - benchmark_segment_weights['Seg1']) * \
             (benchmark_segment_returns['Bench_Seg1_Return'] - benchmark_return)

alloc_seg2 = (portfolio_segment_weights['Seg2'] - benchmark_segment_weights['Seg2']) * \
             (benchmark_segment_returns['Bench_Seg2_Return'] - benchmark_return)

allocation = pd.DataFrame(
    {
        'Allocation_Seg1': alloc_seg1,
        'Allocation_Seg2': alloc_seg2,
        # optional total:
        'Allocation_Total': alloc_seg1 + alloc_seg2
    },
    index=benchmark_return.index  # same as your dates
)
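
In symbols, the allocation effect computed above for segment $i$ in a given month can be written as follows. The subscript $i$ and the symbol $A_i$ are notation introduced here for illustration; the weights and returns follow the $W_{p,1}$, $r_{b,1}$ convention used in the text.

$$A_i = (W_{p,i}-W_{b,i})\,(r_{b,i}-r_b)$$

Assuming the portfolio and benchmark segment weights each sum to 1, the weight differences sum to zero, so subtracting $r_b$ changes how the effect is split across segments but not the total:

$$\sum_i A_i = \sum_i (W_{p,i}-W_{b,i})\,r_{b,i}$$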

## Calculate Selection Effects
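The selection effect measures how much of the portfolio's performance relative to the benchmark comes from over- or under-weighting specific securities within each segment. Here we fold the interaction term into the selection effect, which is why the segment return difference is multiplied by the portfolio segment weight rather than the benchmark segment weight.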

For each segment:
 * Multiply the portfolio segment weight by the difference between the portfolio's return for that segment and the benchmark's return for that segment.

 * `portfolio_segment_weights['Seg1']`: $W_{p,1}$, the portfolio segment weight.
 * `portfolio_segment_returns['Port_Seg1_Return'] - benchmark_segment_returns['Bench_Seg1_Return']` : $(r_{p,1}-r_{b,1})$, the difference in segment returns

We then collect the selection effects into a dataframe, using code similar to that in the previous block, with three columns: (1) selection within segment 1, (2) selection within segment 2, and (3) the total selection effect.  The index of the dataframe is the row date.  

In [8]:
# Selection effect for Segment 1
sel_seg1 = portfolio_segment_weights['Seg1'] * (
    portfolio_segment_returns['Port_Seg1_Return'] - benchmark_segment_returns['Bench_Seg1_Return']
)

# Selection effect for Segment 2
sel_seg2 = portfolio_segment_weights['Seg2'] * (
    portfolio_segment_returns['Port_Seg2_Return'] - benchmark_segment_returns['Bench_Seg2_Return']
)

# Combine into one DataFrame
selection = pd.DataFrame(
    {
        'Selection_Seg1': sel_seg1,
        'Selection_Seg2': sel_seg2,
        # optional: total
        'Selection_Total': sel_seg1 + sel_seg2
    },
    index=portfolio_segment_weights.index
)
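
With the interaction term included in selection, the allocation and selection effects should add up exactly to the portfolio return minus the benchmark return, provided the weights sum to 1. The sketch below checks this on one made-up date; the names (`w_p`, `w_b`, `segments`, and so on) and the numbers are hypothetical.

```python
import pandas as pd

tickers = ["A", "B", "C", "D", "E"]
r = pd.Series([0.01, 0.02, 0.03, -0.01, 0.04], index=tickers)    # security returns
w_p = pd.Series([0.10, 0.20, 0.30, 0.25, 0.15], index=tickers)   # portfolio weights
w_b = pd.Series([0.30, 0.20, 0.20, 0.20, 0.10], index=tickers)   # benchmark weights
segments = {"Seg1": ["A", "B", "C"], "Seg2": ["D", "E"]}

r_p = (w_p * r).sum()   # total portfolio return
r_b = (w_b * r).sum()   # total benchmark return

allocation_total = 0.0
selection_total = 0.0
for cols in segments.values():
    W_p, W_b = w_p[cols].sum(), w_b[cols].sum()      # segment weights
    r_p_seg = (w_p[cols] * r[cols]).sum() / W_p      # portfolio segment return
    r_b_seg = (w_b[cols] * r[cols]).sum() / W_b      # benchmark segment return
    allocation_total += (W_p - W_b) * (r_b_seg - r_b)
    selection_total += W_p * (r_p_seg - r_b_seg)     # includes interaction

# The two printed numbers should match up to floating-point rounding
print(allocation_total + selection_total, r_p - r_b)
```

This identity is what lets us treat the four component series as a complete breakdown of the monthly performance spread.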

### Calculate Average Effects
Here we calculate average allocation and selection effects, and collect the results into a dataframe.

In [9]:
avg_effects = {
    'Allocation_Seg1': allocation['Allocation_Seg1'].mean(),
    'Allocation_Seg2': allocation['Allocation_Seg2'].mean(),
    'Selection_Seg1':  selection['Selection_Seg1'].mean(),
    'Selection_Seg2':  selection['Selection_Seg2'].mean(),
}

# Build DataFrame
avg_df = pd.DataFrame.from_dict(avg_effects, orient='index', columns=['Average_Effect'])

# Add total row
avg_df.loc['Total'] = avg_df['Average_Effect'].sum()

### Calculate Portfolio Alpha

In [10]:
# X = benchmark excess return (plus a constant), y = portfolio excess return
X = sm.add_constant(benchmark_excess_return)  # adds intercept (alpha)
y = portfolio_excess_return

model = sm.OLS(y, X).fit()
portfolio_alpha = model.params['const']
print("Portfolio alpha:", portfolio_alpha)

Portfolio alpha: 0.0022732518211335197


### Estimating Component Alphas

Now we want to see how much each effect (allocation or selection) contributes to the portfolio’s overall alpha.

The idea: for each component time series, we run a regression of the component’s return on the benchmark’s excess return. The intercept of this regression is the component alpha.

 * The `for` loop goes through each column name (two allocation effects and two selection effects).
 * `y_piece` is the data for that component.
 * `sm.add_constant(...)` adds a column of 1s so the regression can estimate an intercept (the alpha).
 * `sm.OLS(y_piece, X).fit()` runs an Ordinary Least Squares regression.
 * `model.params['const']` extracts the intercept, which is the component's alpha.
 * `pd.DataFrame.from_dict(...)` turns the dictionary of alphas into a nice table.
 * `alpha_df.loc['Total']` adds up all the alphas to show that the sum equals the portfolio alpha.

In [11]:
alphas = {}

for col in ['Allocation_Seg1','Allocation_Seg2',
            'Selection_Seg1','Selection_Seg2']:
    y_piece = allocation[col] if 'Allocation' in col else selection[col]
    X = sm.add_constant(benchmark_excess_return)
    model = sm.OLS(y_piece, X).fit()
    alphas[col] = model.params['const']

# Convert to DataFrame
alpha_df = pd.DataFrame.from_dict(alphas, orient='index', columns=['Alpha'])
alpha_df.loc['Total'] = alpha_df['Alpha'].sum()

### Alpha Check
We verify that the portfolio alpha equals the sum of the component alphas, up to floating-point rounding.

In [12]:
print("Portfolio alpha:", portfolio_alpha)
print("Sum of attribution alphas:", alpha_df.loc['Total','Alpha'])

Portfolio alpha: 0.0022732518211335197
Sum of attribution alphas: 0.0022732518211335223
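
The match is not a coincidence. The OLS intercept is linear in the dependent variable, so the intercepts of the pieces add up to the intercept of their sum. The pieces sum to the portfolio return minus the benchmark return, and adding the regressor itself to the dependent variable raises the slope by one without changing the intercept. The sketch below illustrates this with simulated data and `numpy.linalg.lstsq` rather than statsmodels; the helper `ols_intercept` and all series are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = rng.normal(0.005, 0.04, n)         # simulated benchmark excess returns
piece_1 = rng.normal(0.001, 0.01, n)   # simulated component series
piece_2 = rng.normal(0.002, 0.01, n)   # simulated component series

active = piece_1 + piece_2             # plays the role of r_p - r_b
portfolio_excess = x + active          # (r_p - r_f) = (r_b - r_f) + (r_p - r_b)

def ols_intercept(y, x):
    """Intercept from a least-squares regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]

alpha_total = ols_intercept(portfolio_excess, x)
alpha_sum = ols_intercept(piece_1, x) + ols_intercept(piece_2, x)

# The two values should agree up to floating-point rounding
print(alpha_total, alpha_sum)
```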


### Annualize, Merge and Display Results
Here we annualize the average attribution effects and the alpha attribution effects, combine them, and display the results.

In [13]:
# Multiply each DataFrame by 12 to Annualize
avg_df_annualized = avg_df * 12
alpha_df_annualized = alpha_df * 12

# Combine side by side
combined = pd.concat([avg_df_annualized, alpha_df_annualized], axis=1)

print(combined)

                 Average_Effect     Alpha
Allocation_Seg1        0.000608  0.000351
Allocation_Seg2        0.002328  0.001340
Selection_Seg1         0.038616  0.027764
Selection_Seg2         0.005234 -0.002176
Total                  0.046786  0.027279


### Summary
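Selection within Segment 1 is the main driver of both the total performance spread and alpha. It accounts for about 0.039 of the 0.047 annualized average spread and about 0.028 of the 0.027 annualized alpha; its alpha exceeds the total because selection within Segment 2 contributes a small negative alpha.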