# Group Factor Exposures

Factor analysis is a technique in quantitative portfolio management. Portfolio holdings and performance (profit and loss) are decomposed using one or more factors (risk factors are one example), each represented as a portfolio of weights. For example, a stock price's co-movement with a benchmark (like the S&P 500 index) is known as its beta, a common risk factor. Let's consider a contrived example of a portfolio constructed from 3 randomly generated factors (usually called the factor loadings) and some weights:

In [1]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series

In [2]:
fac1, fac2, fac3 = np.random.rand(3, 1000)

In [3]:
import random; random.seed(0)
import string
N = 1000
def rands(n):
    choices = string.ascii_uppercase
    return ''.join([random.choice(choices) for _ in range(n)])
tickers = np.array([rands(5) for _ in range(N)])

In [4]:
fac1[:10], fac2[:10], fac3[:10]

(array([0.43672566, 0.98243116, 0.69165518, 0.69689995, 0.32554627,
        0.64140395, 0.64413051, 0.46835474, 0.6259105 , 0.88080001]),
 array([0.01918373, 0.28766059, 0.11283617, 0.45287537, 0.58055715,
        0.80109443, 0.71705903, 0.95224941, 0.22305689, 0.97591346]),
 array([0.13453143, 0.56326162, 0.00206494, 0.1908722 , 0.01176678,
        0.42967421, 0.76964555, 0.13482427, 0.08607279, 0.80729373]))

In [5]:
ticker_subset = tickers.take(np.random.permutation(N)[:1000])

# Weighted sum of factors plus noise
port = Series(.7 * fac1 - 1.2 * fac2 +.3 * fac3 + np.random.rand(1000),
                index = ticker_subset)

factors = DataFrame({'f1': fac1, 'f2': fac2, 'f3': fac3},
                    index = ticker_subset)

The correlation between each factor and the portfolio shows the direction of each relationship, but it does not recover the weights themselves:

In [6]:
factors.corrwith(port)

f1    0.351262
f2   -0.680517
f3    0.169067
dtype: float64
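To see why these correlations differ from the weights, note that when the factors are independent, each correlation is roughly the weight scaled by the ratio of the factor's standard deviation to the portfolio's. The following is a minimal sketch on a fresh seeded sample; the generator `rng`, the sample size `n`, and the names `weights` and `implied` are illustrative assumptions and not part of the cells above.

```python
import numpy as np
import pandas as pd

# Illustrative setup: seeded generator and a large sample to reduce sampling noise
rng = np.random.default_rng(0)
n = 100_000
weights = pd.Series({'f1': 0.7, 'f2': -1.2, 'f3': 0.3})

# Three independent uniform factors and a portfolio built like the one above
factors = pd.DataFrame(rng.random((n, 3)), columns=weights.index)
port = factors.mul(weights, axis=1).sum(axis=1) + rng.random(n)

corr = factors.corrwith(port)

# For independent factors: corr_i is approximately w_i * std(f_i) / std(port)
implied = weights * factors.std() / port.std()

print(pd.DataFrame({'corr': corr, 'implied': implied}))
```

The two columns should agree closely, which suggests the correlations carry the sign and relative size of the weights but are rescaled by the portfolio's overall volatility.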

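The standard way to compute the factor exposures is by least squares regression. The pandas.ols function once used for this is no longer available in pandas (calling it raises AttributeError: module 'pandas' has no attribute 'ols'), so this cell uses statsmodels instead, with factors as the explanatory variables plus an intercept, to compute exposures over the entire set of tickers:

In [7]:
import statsmodels.api as sm
sm.OLS(port, sm.add_constant(factors)).fit().params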
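As a dependency-light cross-check, the same regression can be sketched with NumPy's least-squares solver. This is an illustrative sketch, not the original pandas.ols call: the seeded generator `rng`, the design matrix `X`, and the `intercept` label are assumptions introduced here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the run is repeatable (assumption)
n = 1000

factors = pd.DataFrame(rng.random((n, 3)), columns=['f1', 'f2', 'f3'])
# Same construction as the portfolio above: weighted factors plus uniform noise
port = 0.7 * factors['f1'] - 1.2 * factors['f2'] + 0.3 * factors['f3'] + rng.random(n)

# Prepend a column of ones so the fit includes an intercept
X = np.column_stack([np.ones(n), factors.to_numpy()])
coef, *_ = np.linalg.lstsq(X, port.to_numpy(), rcond=None)

betas = pd.Series(coef, index=['intercept'] + list(factors.columns))
print(betas)
```

The slopes should land near 0.7, -1.2, and 0.3, and the intercept near 0.5, which is the mean of the uniform noise term.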

The fitted coefficients should come close to the original factor weights (0.7, -1.2, and 0.3), since not much additional random noise was added to the portfolio. Using groupby, you can compute exposures industry by industry. To do so, write a function like so:

In [None]:
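import statsmodels.api as sm
def beta_exposure(chunk, factors = None):
    # Fit OLS with an intercept on the factor rows matching this chunk's tickers
    return sm.OLS(chunk, sm.add_constant(factors.loc[chunk.index])).fit().params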

Then, group by industries and apply that function, passing the DataFrame of factor loadings:

In [14]:
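# Assumption: assign each ticker at random to one of the two groups
ind_names = np.array(['north', 'south'])
industries = ind_names[np.random.randint(0, len(ind_names), N)]

by_ind = port.groupby(industries)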

In [15]:
exposures = by_ind.apply(beta_exposure, factors = factors)
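Putting the pieces together, a minimal self-contained sketch of the grouped regression follows. It assumes NumPy least squares in place of pandas.ols, a random two-way group assignment, and illustrative names (`rng`, `data`, `cols`, `fit_betas`) that do not appear in the cells above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for repeatability (assumption)
n = 1000
cols = ['f1', 'f2', 'f3']

data = pd.DataFrame(rng.random((n, 3)), columns=cols)
data['port'] = 0.7 * data['f1'] - 1.2 * data['f2'] + 0.3 * data['f3'] + rng.random(n)
# Hypothetical group labels, assigned at random to each row
data['group'] = rng.choice(['north', 'south'], size=n)

def fit_betas(chunk):
    """OLS of port on the factors (with intercept) for one group."""
    X = np.column_stack([np.ones(len(chunk)), chunk[cols].to_numpy()])
    coef, *_ = np.linalg.lstsq(X, chunk['port'].to_numpy(), rcond=None)
    return pd.Series(coef, index=['intercept'] + cols)

# One row of coefficients per group
exposures = pd.DataFrame(
    {name: fit_betas(chunk) for name, chunk in data.groupby('group')}
).T
print(exposures)
```

Because both groups are drawn from the same process, each row should sit near the original weights; with real industry groupings the rows would be expected to differ.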