<a href="https://colab.research.google.com/github/boyerb/Investments/blob/master/Ex20-Performance_Attribution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Investments: Theory, Fundamental Analysis, and Data Driven Analytics**, Bates, Boyer, and Fletcher

# Example Chapter 20: Allocation and Selection Effects for Active Return and Alpha
In this example we calculate average allocation and selection effects for the active return and also calculate allocation and selection effects for alpha. This script does the same analysis as worksheet `EX20.3.1` in the *Examples Workbook*, except that the analysis is performed at the allocation and selection level, rather than the individual security level.

### Imports and Setup

In [None]:
!curl -s -O https://raw.githubusercontent.com/boyerb/Investments/master/functions/simple_finance.py
import importlib, simple_finance as sf
importlib.reload(sf)

#import packages
import pandas as pd
import statsmodels.api as sm

### Load in Data
We load stock returns, benchmark weights, and portfolio weights from the spreadsheet `EX20.3.1` and separate them into distinct DataFrames.

In [None]:
# Load in the data by first specifying the URL where the data can be found
url='https://github.com/boyerb/Investments/raw/master/Examples_25.02.xlsx'
# specify which columns to read
columns_to_read = ['Date', 'A','B','C','D','E','A_b','B_b','C_b','D_b','E_b','A_p','B_p','C_p','D_p','E_p']
df = pd.read_excel(url, sheet_name='EX20.3.1', header=3, usecols=columns_to_read, engine='openpyxl')
df = df.dropna()

# Split into the three panels and set the Date columns as the index
stock_returns = df[['Date','A','B','C','D','E']].set_index('Date')

benchmark_weights = df[['Date','A_b','B_b','C_b','D_b','E_b']].copy()
benchmark_weights.columns = ['Date','A','B','C','D','E']
benchmark_weights = benchmark_weights.set_index('Date')

# Portfolio weight columns, named as in columns_to_read above
portfolio_weights = df[['Date','A_p','B_p','C_p','D_p','E_p']].copy()
portfolio_weights.columns = ['Date','A','B','C','D','E']
portfolio_weights = portfolio_weights.set_index('Date')

### Calculate Portfolio and Benchmark Returns for Every Period
We calculate the benchmark return and portfolio return each period as a weighted sum of individual security returns. In each case we have two matrices or blocks of data: (1) returns and (2) weights. We want to `sumproduct()` the weights and returns for each row to create the portfolio return for that row-date. The way to do this in Python is to use the `*` operator, which multiplies each element of the weights matrix by the corresponding element of the return matrix. We then use `.sum(axis=1)` to sum these products for each row. The argument `axis=1` tells pandas to sum across the columns of each row rather than down the rows of each column.

We then calculate the two columns of excess returns, one for the benchmark and one for the portfolio, by subtracting off the risk-free rate. We are given that the risk-free rate is 25 basis points per month.

In [None]:
# Calculate benchmark returns as a weighted sum of individual security returns.
benchmark_return = (stock_returns  * benchmark_weights).sum(axis=1)
# Calculate portfolio returns as a weighted sum of individual security returns.
portfolio_return = (stock_returns  * portfolio_weights).sum(axis=1)

# Calculate excess returns
benchmark_excess_return = benchmark_return - 0.0025
portfolio_excess_return = portfolio_return - 0.0025
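
To see the `*` and `.sum(axis=1)` pattern on numbers small enough to check by hand, here is a minimal sketch. The names `toy_returns` and `toy_weights` and their values are made up for illustration and do not come from the workbook.

```python
import pandas as pd

# Hypothetical toy data: two dates, two securities (not from the workbook)
toy_returns = pd.DataFrame({'A': [0.02, -0.01], 'B': [0.04, 0.03]},
                           index=['t1', 't2'])
toy_weights = pd.DataFrame({'A': [0.5, 0.25], 'B': [0.5, 0.75]},
                           index=['t1', 't2'])

# Element-by-element product, then sum across the columns of each row
toy_portfolio_return = (toy_returns * toy_weights).sum(axis=1)
print(toy_portfolio_return)
```

For `t1` the result is 0.5 × 0.02 + 0.5 × 0.04 = 0.03. Note that pandas lines up the two DataFrames by row and column labels before multiplying, which is why the weight DataFrames are given the same column names (`A` through `E`) as the return DataFrame.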

### Calculate Portfolio and Benchmark Segment Weights
We next group securities into two segments:

* Segment 1 (Seg1): the first three securities (A, B, C)
* Segment 2 (Seg2): the last two securities (D, E)

For each date, we sum the portfolio weights and benchmark weights within these groups to obtain segment-level weights.
1. Code snippet: `portfolio_segment_weights = pd.DataFrame({...})`  
2. Code snippet: `Seg1:portfolio_weights.iloc[:, 0:3].sum(axis=1)`  


The first code snippet creates a new DataFrame called `portfolio_segment_weights`. The second code snippet creates a column in the DataFrame called `Seg1` that sums across the first three columns of the DataFrame `portfolio_weights`. The colon, `:`, means "select all rows" (i.e., every date). The second part, `0:3`, means “select columns 0 through 2” (Python indexing is zero-based and stops before 3). So this picks out the columns for securities A, B, and C from the `portfolio_weights` DataFrame. The key part of the code is the use of `sum(axis=1)`, which tells pandas to sum across columns (horizontally) for each row. Since each row represents a date, this produces the total weight for that segment on that date.

In [None]:
portfolio_segment_weights = pd.DataFrame({
    'Seg1': portfolio_weights.iloc[:, 0:3].sum(axis=1),  # sum of first 3 columns
    'Seg2': portfolio_weights.iloc[:, 3:5].sum(axis=1)   # sum of last 2 columns
}, index=portfolio_weights.index)

benchmark_segment_weights = pd.DataFrame({
    'Seg1': benchmark_weights.iloc[:, 0:3].sum(axis=1),  # sum of first 3 columns
    'Seg2': benchmark_weights.iloc[:, 3:5].sum(axis=1)   # sum of last 2 columns
}, index=benchmark_weights.index)
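
As a quick illustration of the column slicing, the sketch below applies the same `.iloc` and `.sum(axis=1)` pattern to a made-up weight table. The DataFrame `toy_weights` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical weights for five securities on two dates (each row sums to 1)
toy_weights = pd.DataFrame(
    {'A': [0.2, 0.1], 'B': [0.2, 0.3], 'C': [0.1, 0.2],
     'D': [0.3, 0.2], 'E': [0.2, 0.2]},
    index=['t1', 't2'])

toy_segment_weights = pd.DataFrame({
    'Seg1': toy_weights.iloc[:, 0:3].sum(axis=1),  # columns 0, 1, 2 -> A, B, C
    'Seg2': toy_weights.iloc[:, 3:5].sum(axis=1),  # columns 3, 4 -> D, E
})
print(toy_segment_weights)
```

Because the security weights in each row sum to 1, the two segment weights in each row also sum to 1.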

### Calculate Security Segment Weights
Now we normalize each security’s weight within its segment so that the weights inside a segment always sum to 1. This is done separately for the portfolio and the benchmark.  
Step 1. Select columns with .iloc
 * Code snippet: `portfolio_weights.iloc[:, 0:3]`
 * The colon, `:`, means "select all rows" (i.e., every date).
 * The second part 0:3 means “select columns 0 through 2” (Python indexing is zero-based and stops before 3).  
 * So this picks out securities A, B, and C from the portfolio weights DataFrame.

Step 2. Divide row by row
 * Code snippet: `.div(portfolio_segment_weights['Seg1'], axis=0)`  
 * axis=0 means match by row index (i.e., by date).
 * Each security’s weight for a given date is divided by the total segment weight for that same date.
 * This ensures that inside each segment, the security-segment weights sum to 1 per date.

Step 3. Concatenate side by side
 * Code snippet: `pd.concat([pssw1, pssw2], axis=1)`
 * `axis=1` means combine along columns (side by side).
 * Here, the two DataFrames (pssw1 for A–C, pssw2 for D–E) are joined into one DataFrame with all five securities.


In [None]:
# Calculate Portfolio Security Segment Weights (pssw)
pssw1 = portfolio_weights.iloc[:, 0:3].div(portfolio_segment_weights['Seg1'], axis=0)
pssw2 = portfolio_weights.iloc[:, 3:5].div(portfolio_segment_weights['Seg2'], axis=0)
portfolio_security_segment_weights = pd.concat([pssw1, pssw2], axis=1)

# Calculate Benchmark Security Segment Weights (bssw)
bssw1 = benchmark_weights.iloc[:, 0:3].div(benchmark_segment_weights['Seg1'], axis=0)
bssw2 = benchmark_weights.iloc[:, 3:5].div(benchmark_segment_weights['Seg2'], axis=0)
benchmark_security_segment_weights = pd.concat([bssw1, bssw2], axis=1)
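
The normalization can be checked on a small made-up example. In this sketch, `toy_weights` is hypothetical data, and the check confirms that the within-segment weights sum to 1 on every date.

```python
import pandas as pd

# Hypothetical weights for five securities on two dates
toy_weights = pd.DataFrame(
    {'A': [0.2, 0.1], 'B': [0.2, 0.3], 'C': [0.1, 0.2],
     'D': [0.3, 0.2], 'E': [0.2, 0.2]},
    index=['t1', 't2'])

# Total weight of segment 1 (A, B, C) on each date
seg1_total = toy_weights.iloc[:, 0:3].sum(axis=1)

# Divide each row of security weights by that row's segment total
within_seg1 = toy_weights.iloc[:, 0:3].div(seg1_total, axis=0)

row_sums = within_seg1.sum(axis=1)
print(within_seg1)
print(row_sums)
```

On `t1`, security A has a weight of 0.2 in a segment with total weight 0.5, so its within-segment weight is 0.4.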

### Calculate Segment Returns
Now we want the return for each segment of the portfolio and the benchmark.

 * Segment 1 = A, B, C
 * Segment 2 = D, E

Because we already normalized the security-segment weights so they sum to 1 within each segment, we can calculate a segment's return as the weighted average of its securities' returns.  
Step 1. Select columns with .iloc
 * `.iloc[:, 0:3]` :  select all rows (:) and the first three columns (0,1,2), which correspond to A, B, C.
 * `.iloc[:, 3:5]` :  select all rows and the last two columns (3,4), which correspond to D, E.

Step 2. Calculate segment returns
 * Multiply the matrix of returns and the matrix of weights element by element using the `*` operator.
 * `.sum(axis=1)` means sum across the columns for each row (date), giving the segment's return on that date.  

Step 3. Store the two columns of segment returns together in one table (a pandas DataFrame) so it's easy to use later.
 * `pd.DataFrame({...})` creates a new DataFrame.
 * Inside the curly braces `{}`, we pass in a dictionary:
  * The key (e.g., `'Port_Seg1_Return'`) becomes the column name.
  * The value (e.g., `port_seg1_ret`) is the data for that column.
  * `index=stock_returns.index` tells pandas to use the same row labels (dates) as the original stock returns DataFrame. This way, each segment return lines up correctly with the same dates as the stock data.

In [None]:
# Portfolio segment returns (weights within each segment sum to 1)
port_seg1_ret = (portfolio_security_segment_weights.iloc[:, 0:3] * stock_returns.iloc[:, :3]).sum(axis=1)
port_seg2_ret = (portfolio_security_segment_weights.iloc[:, 3:5] * stock_returns.iloc[:, 3:5]).sum(axis=1)

portfolio_segment_returns = pd.DataFrame(
    {'Port_Seg1_Return': port_seg1_ret, 'Port_Seg2_Return': port_seg2_ret},
    index=stock_returns.index
)

# Benchmark segment returns
bench_seg1_ret = (benchmark_security_segment_weights.iloc[:, :3] * stock_returns.iloc[:, :3]).sum(axis=1)
bench_seg2_ret = (benchmark_security_segment_weights.iloc[:, 3:5] * stock_returns.iloc[:, 3:5]).sum(axis=1)

benchmark_segment_returns = pd.DataFrame(
    {'Bench_Seg1_Return': bench_seg1_ret, 'Bench_Seg2_Return': bench_seg2_ret},
    index=stock_returns.index
)

## Calculate Allocation Effects
For each segment:
 * Calculate the difference in weights: $(W_{p,i}-W_{b,i})$
    - syntax: `portfolio_segment_weights['Seg1'] - benchmark_segment_weights['Seg1']`
 * Calculate the difference in benchmark returns: $(r_{b,i}^s-r_b)$
    - syntax: `benchmark_segment_returns['Bench_Seg1_Return'] - benchmark_return`

 * Multiply these two columns together element by element: $(W_{p,i}-W_{b,i})(r_{b,i}^s-r_b)$
    - syntax: We use the `*` operator. This creates the allocation effect for each month. The backslash, `\`, is a line-continuation character that allows the expression to continue on the next line.

We then collect the allocation effects into a dataframe, using code similar to that in the previous block, with three columns: (1) allocation to segment 1, (2) allocation to segment 2, and (3) the total allocation effect.  The index of the dataframe is the row date.  



In [None]:
alloc_seg1 = (portfolio_segment_weights['Seg1'] - benchmark_segment_weights['Seg1']) * \
             (benchmark_segment_returns['Bench_Seg1_Return'] - benchmark_return)

alloc_seg2 = (portfolio_segment_weights['Seg2'] - benchmark_segment_weights['Seg2']) * \
             (benchmark_segment_returns['Bench_Seg2_Return'] - benchmark_return)

allocation = pd.DataFrame(
    {
        'Allocation_Seg1': alloc_seg1,
        'Allocation_Seg2': alloc_seg2,
        'Allocation_Total': alloc_seg1 + alloc_seg2
    },
    index=benchmark_return.index  # dates
)

## Calculate Selection Effects
For each segment:
 * Multiply the portfolio segment weight by the difference between the portfolio's return for that segment and the benchmark's return for that segment: $W_{p,i}(r_{p,i}^s-r_{b,i}^s)$

    - Note that when you wrap an expression in parentheses `(...)`, you automatically get implicit line continuation, so you can break the line without using `\`
    
We then collect the selection effects into a dataframe, using code similar to that in the previous block, with three columns: (1) selection within segment 1, (2) selection within segment 2, and (3) the total selection effect.  The index of the dataframe is the row date.  

In [None]:
# Selection effect for Segment 1
sel_seg1 = portfolio_segment_weights['Seg1'] * (
    portfolio_segment_returns['Port_Seg1_Return'] - benchmark_segment_returns['Bench_Seg1_Return']
)

# Selection effect for Segment 2
sel_seg2 = portfolio_segment_weights['Seg2'] * (
    portfolio_segment_returns['Port_Seg2_Return'] - benchmark_segment_returns['Bench_Seg2_Return']
)

# Combine into one DataFrame
selection = pd.DataFrame(
    {
        'Selection_Seg1': sel_seg1,
        'Selection_Seg2': sel_seg2,
        # optional: total
        'Selection_Total': sel_seg1 + sel_seg2
    },
    index=portfolio_segment_weights.index
)

### Calculate Average Allocation and Selection Effects for the Active Return
Here we calculate the average allocation and selection effects.
* syntax: `.mean()` calculates the average down the rows of a column.

We then collect these effects into a DataFrame and add a row for the total average effect. Note that the total average effect is identical to the average monthly effect estimated in the *Examples Workbook*, worksheet `EX20.3.1` in cell `V67`.

In [None]:
avg_effects = {
    'Allocation_Seg1': allocation['Allocation_Seg1'].mean(),
    'Allocation_Seg2': allocation['Allocation_Seg2'].mean(),
    'Selection_Seg1':  selection['Selection_Seg1'].mean(),
    'Selection_Seg2':  selection['Selection_Seg2'].mean(),
}

# Build DataFrame
avg_df = pd.DataFrame.from_dict(avg_effects, orient='index', columns=['Average_Effect'])

# Add total row
avg_df.loc['Total'] = avg_df['Average_Effect'].sum()
print(avg_df)

### Calculate Portfolio Alpha
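Here we use the `intercept()` function from the `simple_finance` module, passing the dependent variable (the portfolio excess return) first and the independent variable (the benchmark excess return) second. Note that the alpha is identical to the monthly alpha estimated in the *Examples Workbook*, worksheet `EX20.3.1` in cell `T71`.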

In [None]:
# X = benchmark excess return, Y = portfolio excess return
a=sf.intercept(portfolio_excess_return, benchmark_excess_return)
print(a)
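
The implementation of `sf.intercept` is not shown here. As a rough sketch of the idea, assuming alpha is the intercept of an ordinary least squares regression of the portfolio excess return on the benchmark excess return, it could be computed as follows. The helper `ols_intercept` and the toy numbers are hypothetical and are not the `simple_finance` code.

```python
import numpy as np

def ols_intercept(y, x):
    """Intercept from regressing y on x with a constant (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - slope * x.mean()

# Toy check: y is an exact line in x with intercept 0.002 and slope 1.5
toy_x = np.array([0.01, -0.02, 0.03, 0.00])
toy_y = 0.002 + 1.5 * toy_x
toy_alpha = ols_intercept(toy_y, toy_x)
print(toy_alpha)
```

Because the toy data lie exactly on a line, the recovered intercept matches 0.002 up to floating-point rounding.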

### Estimating Component Alphas

Now we want to see how much each effect (allocation or selection) contributes to the portfolio’s overall alpha.

The idea: for each component time series, we run a regression of the component’s return on the benchmark’s excess return. The intercept of this regression is the component alpha. We first create a dictionary with four elements, and then turn this into a DataFrame. We then calculate the total alpha. Note that the total alpha is identical to the portfolio alpha calculated above.



In [None]:
alphas = {}
alphas['Allocation_Seg1'] = sf.intercept(allocation['Allocation_Seg1'],benchmark_excess_return)
alphas['Allocation_Seg2'] = sf.intercept(allocation['Allocation_Seg2'],benchmark_excess_return)
alphas['Selection_Seg1'] = sf.intercept(selection['Selection_Seg1'],benchmark_excess_return)
alphas['Selection_Seg2'] = sf.intercept(selection['Selection_Seg2'],benchmark_excess_return)


# Convert to DataFrame
alpha_df = pd.DataFrame.from_dict(alphas, orient='index', columns=['Alpha'])
alpha_df.loc['Total'] = alpha_df['Alpha'].sum()
print(alpha_df)
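
The component alphas add up to the portfolio alpha for two reasons. The regression intercept is linear in the dependent variable, so the intercept of a sum is the sum of the intercepts. And the components sum to the active return, whose intercept on the benchmark excess return is the same as that of the portfolio excess return (only the slope changes, by 1). The sketch below illustrates both points with simulated data, assuming an ordinary least squares intercept; `ols_intercept` and all series here are hypothetical.

```python
import numpy as np

def ols_intercept(y, x):
    """Intercept from regressing y on x with a constant (hypothetical helper)."""
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - slope * x.mean()

rng = np.random.default_rng(0)
bench_excess = rng.normal(0.005, 0.04, size=60)   # simulated benchmark excess return
comp1 = rng.normal(0.001, 0.01, size=60)          # simulated component series
comp2 = rng.normal(0.002, 0.01, size=60)

# Point 1: the intercept of a sum equals the sum of the intercepts
sum_of_alphas = ols_intercept(comp1, bench_excess) + ols_intercept(comp2, bench_excess)
alpha_of_sum = ols_intercept(comp1 + comp2, bench_excess)

# Point 2: treat comp1 + comp2 as the active return, so that
# portfolio excess return = benchmark excess return + active return
port_excess = bench_excess + comp1 + comp2
alpha_portfolio = ols_intercept(port_excess, bench_excess)

print(sum_of_alphas, alpha_of_sum, alpha_portfolio)
```

All three printed values agree up to floating-point rounding.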

### Annualize, Merge and Display Results
Here we annualize the average effects and the component alphas by multiplying the monthly values by 12, combine the two attributions side by side, and display the results.

In [None]:
# Multiply each DataFrame by 12 to Annualize
avg_df_annualized = avg_df * 12
alpha_df_annualized = alpha_df * 12

# Combine side by side
combined = pd.concat([avg_df_annualized, alpha_df_annualized], axis=1)

print(combined)

### Summary
Selection within Segment 1 appears to be the driving force behind both the active return and alpha.