# 5 PORTFOLIO ANALYSIS
Portfolio analysis is one of the most commonly used statistical methodologies
in empirical asset pricing. Its objective is to examine the cross-sectional relation
between two or more variables. The most frequent application of portfolio analysis
is to examine the ability of one or more variables to predict future stock returns. The
general approach is to form portfolios of stocks, where the stocks in each portfolio
have different levels of the variable or variables posited to predict cross-sectional
variation in future returns and to examine the returns of these portfolios.

Perhaps the most important benefit of portfolio analysis is that it is a
nonparametric technique. This means that it does not make any assumptions
about the nature of the cross-sectional relations between the variables under
investigation. Many other methodologies rely on some assumptions regarding the
functional form of the relation between the variables being examined. For example,
linear regression analysis assumes that the relation between the dependent and
independent variables is linear. Portfolio analysis does not require this assumption.
In fact, portfolio analysis can be helpful in uncovering nonlinear relations between
variables that are quite difficult to detect using parametric techniques. Perhaps the main drawback of the technique is that it is difficult to control for a large number of
variables when examining the cross-sectional relation of interest. This contrasts with
regression analysis, in which it is easy to control for a large number of independent
variables.
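
To illustrate the point about nonlinear relations, the following is a minimal sketch on simulated data. The data are hypothetical (a U-shaped relation $Y = X^2 + \text{noise}$, not the sample used in this chapter). Under that assumption, the linear regression slope is close to zero, while the average of $Y$ across quintile portfolios of $X$ shows the U shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated cross section (hypothetical data, not the chapter's sample):
# Y depends on X through a symmetric U shape, so the linear slope is near zero.
n = 5000
x = rng.normal(size=n)
y = x**2 + rng.normal(scale=0.5, size=n)

# Slope from a linear regression of Y on X (first coefficient of a degree-1 fit).
slope = np.polyfit(x, y, 1)[0]

# Quintile portfolios: breakpoints at the 20th, 40th, 60th, and 80th percentiles.
breakpoints = np.percentile(x, [20, 40, 60, 80])
portfolio = np.searchsorted(breakpoints, x, side="left")  # labels 0..4
port_means = np.array([y[portfolio == k].mean() for k in range(5)])

print("OLS slope:", round(slope, 3))
print("Average Y by portfolio:", port_means.round(2))
```

In this simulated setting the two extreme portfolios have much higher average $Y$ than the middle portfolio, a pattern that a single slope coefficient cannot express.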

## 5.1 UNIVARIATE PORTFOLIO ANALYSIS
We begin with the most basic type of portfolio analysis: univariate portfolio analysis.
A univariate portfolio analysis has only one sort variable $X$. The objective of the analysis is to assess the cross-sectional relation between $X$ and the outcome variable $Y$.
A univariate portfolio analysis does not allow us to control for any other effects when
examining this relation.

The univariate portfolio analysis procedure has **four steps**. The **first step** is to calculate the breakpoints that will be used to divide the sample into portfolios. The **second
step** is to use these breakpoints to form the portfolios. The **third step** is to calculate the
average value of the outcome variable $Y$ within each portfolio for each period $t$. The
**fourth step** is to examine variation in these average values of $Y$ across the different
portfolios.

### 5.1.1 Breakpoints
The first step in univariate portfolio analysis is to calculate the periodic breakpoints
that will be used to group the entities in the sample into portfolios based on values
of the sort variable $X$. We denote the number of portfolios to be formed each time period as $n_P$. The number
of breakpoints that need to be calculated each period is therefore $n_P - 1$. The number
of portfolios to be formed and, thus, the number of breakpoints to be calculated is
the same for all time periods. The value of the $k$th breakpoint, however, will almost
certainly vary from time period to time period. We denote the $k$th breakpoint for
period $t$ as $B_{k,t}$ for $k \in \{1, 2, \dots, n_P - 1\}$.

The breakpoints for period $t$ are determined by percentiles of the time $t$
cross-sectional distribution of the sort variable $X$. Specifically, letting $p_k$ be the
percentile that determines the $k$th breakpoint, the $k$th breakpoint for period $t$ is
calculated as the $p_k$th percentile of the values of $X$ across all entities in the sample
for which $X$ is available in period $t$. We therefore define the breakpoints as
$$
B_{k,t}=Pctl_{p_k}(\{X_t\}) \tag{5.1}
$$
where $Pctl_p(Z)$ is the $p$th percentile of the set $Z$ and $\{X_t\}$ represents the set of
valid values of the sort variable $X$ across all entities $i$ in the sample in time
period $t$. The percentiles, and thus the breakpoints, increase as $k$ increases, giving
$0 < p_1 < p_2 < \dots < p_{n_P-1}$ and $B_{1,t} \le B_{2,t} \le \dots \le B_{n_P-1,t}$ for all periods $t$.
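
Equation (5.1) can be sketched with a per-period quantile calculation. The function name `compute_breakpoints`, the column names, and the toy data below are our own illustrative assumptions. pandas expresses the percentiles as fractions between 0 and 1 and ignores missing values of the sort variable.

```python
import numpy as np
import pandas as pd

def compute_breakpoints(data, sort_var, percentiles, period_col="year"):
    """Return one row per period with columns B1, ..., B{n_P - 1}."""
    bp = (
        data.groupby(period_col)[sort_var]
        .quantile(percentiles)   # missing values of sort_var are skipped
        .unstack()               # one column per percentile
    )
    bp.columns = [f"B{k}" for k in range(1, len(percentiles) + 1)]
    return bp

# Hypothetical toy panel with one missing value in 2001.
toy = pd.DataFrame({
    "year": [2000] * 5 + [2001] * 5,
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, np.nan, 30.0, 40.0],
})

bp = compute_breakpoints(toy, "x", [0.25, 0.5, 0.75])
print(bp)
```

Because the percentiles are passed in increasing order, the breakpoints in each row are nondecreasing, as required above.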

It is worth mentioning here that, in some cases, breakpoints are calculated using
only a subset of the entities that are in the sample for the given period $t$. For example,
in research where the entities are stocks, sometimes researchers form breakpoints
using only stocks that trade on the New York Stock Exchange, and then use those
breakpoints to sort all stocks in the sample (including stocks that trade on other
exchanges) into portfolios. 
It is for this reason that we consider the calculation of breakpoints and the formation
of portfolios to be two separate steps in the portfolio analysis procedure.
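
A minimal sketch of this separation follows. It assumes a hypothetical `exchange` column, which is not part of the data set used in this chapter, and a single median breakpoint. The breakpoint is computed from NYSE stocks only and then applied to every stock.

```python
import pandas as pd

# Hypothetical sample; the 'exchange' column is an assumption for illustration.
sample = pd.DataFrame({
    "year": [2000] * 8,
    "exchange": ["NYSE", "NYSE", "NYSE", "NYSE", "NASDAQ", "NASDAQ", "NASDAQ", "AMEX"],
    "x": [10.0, 20.0, 30.0, 40.0, 1.0, 2.0, 3.0, 50.0],
})

# Step 1: breakpoints from the NYSE subset only (here the median, so n_P = 2).
nyse_bp = (
    sample[sample["exchange"] == "NYSE"]
    .groupby("year")["x"]
    .quantile(0.5)
    .rename("B1")
    .reset_index()
)

# Step 2: sort ALL stocks into portfolios using the NYSE breakpoint.
merged = sample.merge(nyse_bp, on="year", how="left")
merged["portfolio"] = (merged["x"] > merged["B1"]).astype(int) + 1

counts = merged["portfolio"].value_counts().sort_index()
print(counts)
```

In this toy example the two portfolios hold five and three stocks, respectively. When breakpoints come from a subset, the portfolios need not contain equal shares of the full sample.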

Choosing an appropriate number of portfolios and choosing appropriate percentiles for the breakpoints are important decisions in portfolio analysis. As the
entities in the sample will eventually be grouped into portfolios based on the
breakpoints, the decision is largely based on trading off the number of entities in
each portfolio against the dispersion of the sort variable among the portfolios. As the
number of portfolios increases, the number of entities in each portfolio decreases,
and vice versa. When the average value of the outcome variable $Y$ for each portfolio
is eventually calculated, **a small number of entities in each
portfolio results in increased noise when using the sample mean value of $Y$ as an
estimate of the true mean**. Thus, having **a large number of entities in each portfolio
increases the accuracy of our estimate of the true mean value for each portfolio** and is
thus desirable. On the other hand, the more entities we group into each portfolio, the
smaller the number of portfolios and the smaller the dispersion in the sort variable $X$ among the portfolios. **Decreased dispersion in $X$ across the portfolios can make it
more difficult to detect cross-sectional relations between $X$ and $Y$**, as the values of $X$
in the portfolios may not differ substantially if we have too few portfolios.
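
The noise side of this trade-off can be quantified with a rough calculation. The sketch below assumes independent observations with a common standard deviation, so that the standard error of a portfolio mean is $\sigma/\sqrt{n}$. The values of `N` and `sigma` are hypothetical.

```python
import math

N = 5000       # hypothetical number of entities in one cross section
sigma = 40.0   # hypothetical cross-sectional standard deviation of Y (percent)

se_by_n_p = {}
for n_p in (3, 5, 10, 20):
    per_portfolio = N / n_p                              # entities per portfolio, even breakpoints
    se_by_n_p[n_p] = sigma / math.sqrt(per_portfolio)    # standard error of the portfolio mean

for n_p, se in se_by_n_p.items():
    print(f"n_P = {n_p:2d}: about {N // n_p} entities per portfolio, SE of mean = {se:.2f}")
```

Under these assumptions, quadrupling the number of portfolios doubles the standard error of each portfolio mean.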

Most commonly, portfolios are formed using breakpoints that represent evenly
spaced percentiles of the cross-sectional distribution of the sort variable. This means
that the $n_P - 1$ breakpoints are defined to be the $(100 \times k/n_P)$th percentiles of $X$, where
$k \in \{1, \dots, n_P - 1\}$. For example, if we want to split the sample into five portfolios, we may use the 20th, 40th, 60th, and 80th percentiles of the sort variable as
the portfolio breakpoints.
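
As a small sketch, evenly spaced percentiles for any number of portfolios can be generated as follows. The helper name `even_percentiles` is our own.

```python
import numpy as np

def even_percentiles(n_p):
    """Percentiles (on a 0-100 scale) of the n_p - 1 evenly spaced breakpoints."""
    return 100 * np.arange(1, n_p) / n_p

quintile_pcts = even_percentiles(5)    # four breakpoints for five portfolios
decile_pcts = even_percentiles(10)     # nine breakpoints for ten portfolios
print(quintile_pcts)
print(decile_pcts)
```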

In choosing the number of portfolios and breakpoint percentiles, it is important to
remember that new portfolios are formed for each time period $t$. Thus, when assessing the number of entities that fall into each portfolio, it is important to look not only
at the average number of entities in the sample during the different time periods $t$ but
also at the minimum number of entities in any time period.
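
One way to run this check is to count the valid observations of the sort variable in each period and inspect both the average and the minimum. The panel below is hypothetical and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical unbalanced panel: 2001 has far fewer valid observations than 2000.
panel = pd.DataFrame({
    "year": [2000] * 40 + [2001] * 12,
    "x": np.r_[np.linspace(0.0, 1.0, 40), np.linspace(0.0, 1.0, 10), [np.nan, np.nan]],
})

n_p = 5
valid_per_year = panel.groupby("year")["x"].count()   # count() skips missing values
avg_n = valid_per_year.mean()
min_n = valid_per_year.min()
min_per_portfolio = min_n / n_p

print(f"average entities per period: {avg_n:.0f}")
print(f"minimum entities per period: {min_n}")
print(f"entities per portfolio in the thinnest period: {min_per_portfolio:.1f}")
```

Here the average of 25 entities per period looks adequate for five portfolios, but the thinnest period leaves only two entities per portfolio.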

**Finally, almost all studies use between 3 and 20 portfolios, with most
researchers choosing either 5 or 10.**

To exemplify the calculation of breakpoints in univariate portfolio analysis, we use
the methodology sample discussed in Section 1.1 and take $\beta$ to be the sort variable.
Our analysis uses seven portfolios $(n_P = 7)$ and thus 6 breakpoints will be calculated each year. The breakpoints will be the 10th, 20th, 40th, 60th, 80th, and 90th percentiles of $\beta$.

The results of the calculation of the breakpoints are presented in Table 5.1. The
table shows that, for example, breakpoints one, two, three, four, five, and six for year 1988 are −0.09, 0.05, 0.27, 0.48, 0.82, and 1.09, respectively. These are the breakpoints that will be used to sort stocks into portfolios at the end of year 1988. As
necessitated by the calculation, the breakpoints are increasing across the columns for
each year $t$.

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

df = pd.read_csv("alldata_mktcap.csv",index_col=0)
df = df.drop_duplicates(subset=['permno','year'])
df

Unnamed: 0,permno,year,beta,rt+1,bm,size,mktcap
0,10001,1988,0.267605,60.422343,1.145192,1.850382,6.362250
1,10002,1988,0.023970,-35.235672,,2.286519,9.840625
2,10003,1988,0.213007,-61.669376,,3.730165,41.686000
3,10005,1988,0.619461,-41.703333,1.632601,-0.241753,0.785250
4,10008,1988,0.869109,,,,
...,...,...,...,...,...,...,...
155169,93422,2009,,5.503780,,,
155170,93423,2009,,61.818743,,,
155171,93426,2009,,48.812811,,,
155172,93428,2009,,179.178222,,,


In [2]:
beta_breakpoint = df.groupby(['year'])['beta'].describe(percentiles=[0.1,0.2,0.4,0.6,0.8,0.9]).reset_index()
beta_breakpoint = beta_breakpoint[['year','10%','20%','40%','60%','80%','90%']]
beta_breakpoint.columns = ['year','B1','B2','B3','B4','B5','B6']

for i in range(1,7):
    name = beta_breakpoint.columns[i]
    beta_breakpoint[name] = beta_breakpoint[name].apply(lambda x:round(x, 2))

beta_breakpoint

Unnamed: 0,year,B1,B2,B3,B4,B5,B6
0,1988,-0.09,0.05,0.27,0.48,0.82,1.09
1,1989,-0.12,0.04,0.28,0.52,0.88,1.15
2,1990,-0.07,0.09,0.37,0.67,1.07,1.38
3,1991,-0.11,0.09,0.37,0.67,1.05,1.36
4,1992,-0.23,0.07,0.42,0.76,1.21,1.65
5,1993,-0.2,0.11,0.44,0.73,1.19,1.58
6,1994,-0.07,0.18,0.52,0.8,1.19,1.56
7,1995,-0.19,0.09,0.41,0.7,1.15,1.64
8,1996,0.01,0.19,0.47,0.74,1.16,1.55
9,1997,-0.0,0.15,0.36,0.59,0.89,1.15


### 5.1.2 Portfolio Formation
Having calculated the breakpoints, the next step in univariate portfolio analysis is
to group the entities in the sample into portfolios. 
In general, portfolio $k$ holds entities $i$ with period $t$ values of the sort variable, $X_{i,t}$, that
are greater than or equal to the $(k-1)$th breakpoint $B_{k-1,t}$ and less than or equal to the
$k$th breakpoint $B_{k,t}$ for $k \in \{1, \dots, n_P\}$, where we define $B_{0,t} = -\infty$ and $B_{n_P,t} = \infty$.
Thus, letting $P_{k,t}$ be the set of entities in the $k$th portfolio formed at the end of period
$t$, we have
$$
P_{k,t} = \{i \mid B_{k-1,t} \le X_{i,t} \le B_{k,t}\} \tag{5.2}
$$
for $k \in \{1, 2, \dots, n_P\}$. We refer to $P_{k,t}$ as the $k$th portfolio or portfolio $k$ for period $t$.
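
The sorting rule can be sketched in vectorized form by counting how many breakpoints lie strictly below each value of the sort variable. The helper `assign_portfolios` and the toy data are our own assumptions. Under equation (5.2) an entity exactly equal to a breakpoint belongs to two portfolios; this sketch resolves such ties by placing the entity in the lower portfolio only.

```python
import numpy as np
import pandas as pd

def assign_portfolios(data, sort_var, bp_cols):
    """Portfolio number 1..n_P for each row; NaN where the sort variable is missing."""
    x = data[sort_var].to_numpy()[:, None]   # shape (rows, 1)
    b = data[bp_cols].to_numpy()             # shape (rows, n_P - 1)
    k = 1 + (x > b).sum(axis=1)              # number of breakpoints strictly below x, plus one
    return pd.Series(k, index=data.index).where(data[sort_var].notna())

# Hypothetical rows that already carry their period's breakpoints (n_P = 3).
toy = pd.DataFrame({
    "x": [-1.0, 0.0, 0.5, 1.0, 2.0, np.nan],
    "B1": 0.0,
    "B2": 1.0,
})
toy["portfolio"] = assign_portfolios(toy, "x", ["B1", "B2"])
print(toy)
```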

When the set of entities used to calculate the breakpoints is the same as the set of
entities that are grouped into portfolios, the number of entities in each of the portfolios
should be approximately dictated by the percentiles used to calculate the breakpoints
and the number of stocks in the sample during the given period $t$.

Table 5.2 presents the number of stocks in each of the portfolios for each year $t$
in our example sample. As expected, as the 10th percentile is used to calculate the
first breakpoint, approximately 10% of the stocks fall into the first portfolio each year. Similarly, the second, sixth, and
seventh portfolios each hold approximately 10% of the stocks in each cross section.
Portfolios three, four, and five each hold approximately 20% of the stocks in the
sample.

In [8]:
df_beta_bp = pd.merge(df, beta_breakpoint, how='left', on=['year'])

df_beta_bp = df_beta_bp[df_beta_bp['beta'].notnull()]
df_beta_bp = df_beta_bp[df_beta_bp['rt+1'].notnull()]

def beta_group(row):
    if row['beta']<=row['B1']:
        value='n1'
    elif row['B1']<=row['beta']<=row['B2']:
        value='n2'
    elif row['B2']<=row['beta']<=row['B3']:
        value='n3'  
    elif row['B3']<=row['beta']<=row['B4']:
        value='n4'
    elif row['B4']<=row['beta']<=row['B5']:
        value='n5'
    elif row['B5']<=row['beta']<=row['B6']:
        value='n6'
    elif row['B6']<=row['beta']:
        value='n7' 
    return value

df_beta_bp['group'] = df_beta_bp.apply(beta_group, axis=1)

df_beta_bp

Unnamed: 0,permno,year,beta,rt+1,bm,size,mktcap,B1,B2,B3,B4,B5,B6,group
0,10001,1988,0.267605,60.422343,1.145192,1.850382,6.362250,-0.09,0.05,0.27,0.48,0.82,1.09,n3
1,10002,1988,0.023970,-35.235672,,2.286519,9.840625,-0.09,0.05,0.27,0.48,0.82,1.09,n2
2,10003,1988,0.213007,-61.669376,,3.730165,41.686000,-0.09,0.05,0.27,0.48,0.82,1.09,n3
3,10005,1988,0.619461,-41.703333,1.632601,-0.241753,0.785250,-0.09,0.05,0.27,0.48,0.82,1.09,n5
5,10009,1988,-0.368279,-40.610894,,2.359556,10.586250,-0.09,0.05,0.27,0.48,0.82,1.09,n1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
130054,93428,2011,1.561395,20.238018,0.163951,6.711059,821.440021,0.25,0.55,0.98,1.25,1.51,1.73,n6
130055,93429,2011,0.816072,19.042129,0.112264,7.738735,2295.566394,0.25,0.55,0.98,1.25,1.51,1.73,n3
130057,93433,2011,2.243563,-49.855917,1.187065,3.728204,41.604299,0.25,0.55,0.98,1.25,1.51,1.73,n7
130058,93434,2011,0.115458,74.996691,0.569317,3.241733,25.577999,0.25,0.55,0.98,1.25,1.51,1.73,n1


In [3]:
table2 = df_beta_bp.groupby(['year','group'])['permno'].count().to_frame().reset_index()
table2 = table2.pivot(index='year',columns='group',values='permno').reset_index()
table2

group,year,n1,n2,n3,n4,n5,n6,n7
0,1988,463,504,1040,1004,1078,517,529
1,1989,460,474,1002,1011,1041,516,511
2,1990,444,481,1013,987,1034,514,493
3,1991,407,456,940,1007,980,513,487
4,1992,463,483,1031,1005,1034,521,508
5,1993,496,523,1020,1088,1065,547,512
6,1994,529,548,1167,1087,1178,574,545
7,1995,524,584,1145,1179,1176,588,580
8,1996,564,576,1192,1168,1219,608,592
9,1997,534,586,1162,1222,1219,631,604


### 5.1.3 Average Portfolio Values
The third step in univariate portfolio analysis is to calculate the average value of the
outcome variable $Y$ for each of the $n_P$ portfolios in each time period $t$. In many cases,
instead of taking the simple average of the outcome variable values, it is desirable
to weight the entities within each portfolio according to some other variable $W_{i,t}$.
The most commonly used weight variable is market capitalization. In cases where
market capitalization is used as the weight variable, the average is referred to as the
**value-weighted average**. When a simple average is desired, the values of the $W_{i,t}$ are
set to 1 $(W_{i,t} = 1, \forall i, t)$. In this case, we refer to the portfolios as **equal-weighted
portfolios**. Thus, in its general form, the average value of the outcome variable for portfolio $k$ in period $t$ is
defined as
$$
\bar Y_{k,t}=\frac{\sum_{i\in P_{k,t}}W_{i,t}Y_{i,t}}{\sum_{i\in P_{k,t}}W_{i,t}} \tag{5.3}
$$ 
for $k \in \{1, \dots, n_P\}$. The summations in equation (5.3) are taken over all entities $i$ in
the $k$th portfolio for time period $t$ ($P_{k,t}$).
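
As a small worked example of equation (5.3), consider a hypothetical portfolio of three stocks. The returns and market capitalizations below are made up for illustration.

```python
import numpy as np

# Hypothetical portfolio of three stocks in one period.
y = np.array([10.0, -5.0, 20.0])      # outcome variable, e.g. excess return in percent
w = np.array([100.0, 300.0, 600.0])   # weight variable, e.g. market capitalization

# Value-weighted average: (100*10 + 300*(-5) + 600*20) / (100 + 300 + 600) = 11.5
vw_avg = np.average(y, weights=w)

# Equal-weighted average: all weights set to 1, which reduces to the simple mean.
ew_avg = np.average(y, weights=np.ones_like(y))

print("value-weighted:", vw_avg)
print("equal-weighted:", round(ew_avg, 2))
```

The value-weighted average is pulled toward the return of the largest stock, which is why the two weighting schemes can produce quite different portfolio averages.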

In addition to calculating the average value of the outcome variable $(\bar Y)$ for each
portfolio, we also calculate the difference in average values between portfolio $n_P$ and
portfolio 1. For each period $t$, we define the difference in the average outcome
variable between the highest and lowest portfolios to be
$$
\bar Y_{Diff,t}=\bar Y_{n_P,t}-\bar Y_{1,t} \tag{5.4}
$$

This value represents the difference in the average value of the outcome variable $Y$
for entities with high values of the sort variable compared to those with low values of the sort variable. This difference in averages is the **primary value** used to detect a cross-sectional relation between the sort variable and the outcome variable, which is the main objective of portfolio analysis.

Turning to our example, we use the one-year-ahead excess stock return $(r_{t+1})$ as
our outcome variable. Because $r_{t+1}$ represents the excess return of the stock in the
year after the calculation of $\beta$ (the sort variable), the average excess stock returns
represent the excess returns that would have been realized by an investor who, at the
end of year $t$, created the portfolios as described previously and held the portfolios
without further trading for the entirety of year $t + 1$.

Table 5.3 presents the average equal-weighted portfolio excess returns for each of
the seven portfolios as well as for the difference between portfolio seven and portfolio
one. As can be seen from the table, the portfolio that holds stocks in the lowest decile of $\beta$
(portfolio 1) as of the end of 1988 generated an excess return of 1.94% during 1989.
The difference in excess return between portfolios
seven and one is 4.75% (6.69% − 1.94%). The corresponding values for portfolios
formed at the end of (held during) year 1989 through 2011 (1990 through 2012) are
also presented.

In [6]:
ewret = df_beta_bp.groupby(['year','group'])['rt+1'].mean().to_frame().reset_index()
ewret = ewret.pivot(index='year',columns='group',values='rt+1').reset_index()
ewret['n7-1'] = ewret['n7'] - ewret['n1']

for i in range(1,9):
    name = ewret.columns[i]
    ewret[name] = ewret[name].apply(lambda x:round(x, 2))

ewret

group,year,n1,n2,n3,n4,n5,n6,n7,n7-1
0,1988,1.94,4.06,0.86,7.29,4.9,10.8,6.69,4.75
1,1989,-29.16,-27.91,-30.44,-30.47,-28.43,-28.01,-28.17,0.98
2,1990,66.09,24.38,41.67,42.05,56.21,64.26,70.34,4.25
3,1991,58.68,33.58,34.3,24.44,18.33,19.15,20.18,-38.5
4,1992,40.34,25.35,25.74,24.55,19.69,15.53,8.2,-32.14
5,1993,-4.68,-5.95,-5.46,-3.5,-7.27,-11.67,-5.98,-1.31
6,1994,29.68,24.88,25.9,31.45,27.65,27.66,36.57,6.89
7,1995,21.72,17.48,19.48,14.13,13.47,11.1,6.31,-15.41
8,1996,32.28,36.87,29.86,21.05,19.16,5.21,-9.47,-41.76
9,1997,-5.38,-9.49,-8.46,-12.02,-4.9,-7.12,0.24,5.62


We now repeat the analysis using value-weighted portfolios. Thus, the weights in
each of the portfolios are determined by the market capitalization (MktCap) measured
as of the end of the portfolio formation year $t$. Table 5.4 presents the average portfolio
excess returns for the value-weighted portfolios. As can be seen from the results, the
weighting scheme can have a substantial impact on the average portfolio returns.

In [7]:
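def wavg(group, avg_name, weight_name):
    """Weighted average of column avg_name using column weight_name as weights."""
    # Keep only rows where both the value and the weight are available
    valid = group[[avg_name, weight_name]].dropna()
    d = valid[avg_name]
    w = valid[weight_name]
    # pandas does not raise ZeroDivisionError, so check for zero total weight explicitly
    if w.sum() == 0:
        return np.nan
    return (d * w).sum() / w.sum()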

vwret = df_beta_bp.groupby(['year','group']).apply(wavg,'rt+1','mktcap').to_frame().reset_index().rename(columns={0: 'vwret'})
vwret = vwret.pivot(index='year',columns='group',values='vwret').reset_index()
vwret['n7-1'] = vwret['n7'] - vwret['n1']

for i in range(1,9):
    name = vwret.columns[i]
    vwret[name] = vwret[name].apply(lambda x:round(x, 2))

vwret

group,year,n1,n2,n3,n4,n5,n6,n7,n7-1
0,1988,1.78,6.99,8.64,17.14,19.94,26.43,19.04,17.26
1,1989,-29.08,-31.5,-19.56,-19.78,-16.24,-10.53,-12.79,16.29
2,1990,2.1,12.36,12.54,16.93,22.3,30.98,54.37,52.27
3,1991,-1.01,23.21,13.67,8.33,4.81,0.63,13.7,14.71
4,1992,22.79,16.97,14.66,9.65,6.64,5.62,5.58,-17.21
5,1993,-15.98,-7.0,-4.12,-3.07,-4.0,-9.95,2.29,18.27
6,1994,15.53,23.94,25.6,33.3,31.59,27.97,28.43,12.89
7,1995,4.61,17.13,18.31,15.68,13.24,17.83,29.07,24.45
8,1996,6.01,35.74,25.14,20.73,28.41,29.61,17.34,11.34
9,1997,-10.95,-8.23,-0.38,-1.92,12.83,16.82,33.91,44.86


### 5.1.4 Summarizing the Results
The main objective of portfolio analysis is to determine whether there is a
cross-sectional relation between the sort variable $X$ and the outcome variable $Y$. To
do so, we begin by calculating the time-series means of the period average values of
the outcome variable $\bar Y_{k,t}$, for each of the $n_P$ portfolios as well as for the difference
portfolio. We define these average values as
$$
\bar Y_k = \frac{\sum_{t=1}^{T}\bar Y_{k,t}}{T} \tag{5.5}
$$
and
$$
\bar Y_{Diff} = \frac{\sum_{t=1}^{T}\bar Y_{Diff,t}}{T} \tag{5.6}
$$
where $t = 1$ indicates the first period in the sample and $T$ is the number of periods in
the sample.

The time-series means serve as estimates of the true average values of the outcome
variable for entities in each of the portfolios in the average time period. Similarly, the
time-series mean of the difference portfolio estimates the difference, in the average
time period, of the average value of the outcome variable for entities in the $n_P$th
portfolio compared to those in the first portfolio.
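
Equations (5.5) and (5.6) amount to column means of the period-by-period averages. The sketch below uses only the first three years of portfolios one and seven from Table 5.3, so its output is illustrative and does not reproduce the full-sample results.

```python
import pandas as pd

# First three years of Table 5.3 for portfolios 1 and 7 (equal-weighted, percent).
period_avgs = pd.DataFrame(
    {"n1": [1.94, -29.16, 66.09], "n7": [6.69, -28.17, 70.34]},
    index=pd.Index([1988, 1989, 1990], name="year"),
)
period_avgs["n7-1"] = period_avgs["n7"] - period_avgs["n1"]   # equation (5.4)

# Time-series means: equation (5.5) for n1 and n7, equation (5.6) for n7-1.
ts_means = period_avgs.mean()
print(ts_means.round(2))
```

When every period has a value for both extreme portfolios, the time-series mean of the difference equals the difference of the two time-series means.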

### 5.1.5 Interpreting the Results
In addition to calculating the time-series means for each of the portfolios, we frequently want to test whether the time-series mean for each of the portfolios differs
from some null hypothesis mean value. That value is often zero. Most importantly,
we want to examine whether the time-series mean of the difference portfolio is statistically distinguishable from zero. A statistically nonzero mean for the difference
portfolio is evidence that, in the average time period, a cross-sectional relation exists between the sort variable and the outcome variable.
To make such an assessment, for
each of the $n_P$ portfolios, as well as the difference portfolio, we calculate **standard
errors**, **t-statistics**, and **p-values** for the test with null hypothesis that the time-series
mean of the average portfolio outcome variable value is equal to zero.
Because for each portfolio the portfolio average values $(\bar Y_{k,t})$ represent a time series, the standard
errors are frequently adjusted following **Newey and West (1987)**.
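
To make the adjustment concrete, the following is a minimal sketch of a Newey and West (1987) standard error for a time-series mean, using Bartlett kernel weights and no small-sample correction. The function name `newey_west_se` and the example series are our own. We believe this corresponds to a HAC covariance estimate from a regression on a constant, but that correspondence should be treated as an assumption to verify.

```python
import numpy as np

def newey_west_se(series, lags):
    """Newey-West standard error of the mean of a time series (Bartlett weights)."""
    y = np.asarray(series, dtype=float)
    y = y[~np.isnan(y)]
    T = len(y)
    e = y - y.mean()                       # deviations from the time-series mean
    s = np.sum(e * e)                      # lag-0 term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett kernel weight
        s += 2.0 * weight * np.sum(e[lag:] * e[:-lag])
    return np.sqrt(s) / T

# Hypothetical, strongly persistent series.
y = np.arange(1.0, 9.0)
se_no_lags = newey_west_se(y, lags=0)   # reduces to the unadjusted standard error
se_one_lag = newey_west_se(y, lags=1)
print(round(se_no_lags, 3), round(se_one_lag, 3))
```

With zero lags the estimate reduces to the standard deviation (computed with divisor $T$) divided by $\sqrt{T}$. For a positively autocorrelated series, adding lags increases the standard error.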

In addition to examining whether the time-series mean for the
difference portfolio is statistically distinguishable from zero, researchers frequently
examine the average values of $Y$ across the $n_P$ portfolios $(\bar Y_k, k \in \{1, 2, \dots, n_P\})$ for
monotonicity. If a monotonic or near monotonic pattern arises, it is a strong indication
that the results of the difference portfolio are not spurious.
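
A simple way to inspect monotonicity is to look at the signs of the changes in average $Y$ from one portfolio to the next. The sketch below applies this to the equal-weighted average excess returns reported in Table 5.5. It is a descriptive check, not a formal statistical test.

```python
import numpy as np

# Time-series average excess returns for portfolios 1 through 7 (Table 5.5, percent).
avg = np.array([18.95, 14.2, 15.47, 13.02, 12.49, 11.64, 11.21])

steps = np.diff(avg)                           # change from portfolio k to k + 1
strictly_decreasing = bool(np.all(steps < 0))
n_violations = int(np.sum(steps > 0))          # number of upward steps

print("strictly decreasing:", strictly_decreasing)
print("upward steps:", n_violations)
```

The averages decline from portfolio one to portfolio seven with a single upward step, between portfolios two and three, so the pattern is near monotonic rather than strictly monotonic.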

The results for our example are presented in Table 5.5. The row labeled Average
shows the time-series average of the annual portfolio excess returns for portfolios
1 through 7 as well as for the difference portfolio (column labeled 7-1). The rows
labeled Standard error, t-statistic, and p-value present the standard error of the estimated mean portfolio excess return, adjusted following Newey and West (1987) using
six lags, and the corresponding t-statistics and p-values, respectively.

The results indicate that in the average year, each of
these seven portfolios produces positive excess returns. This is not surprising because
stocks are known to generate average returns that are higher than the return on the
risk-free security. The average return of the difference portfolio, presented in the column labeled 7-1, is −7.74%. This difference is not statistically distinguishable from zero at the 5% significance level, as the t-statistic is −1.73 and the p-value is 0.08. Thus, our portfolio analysis fails to detect a cross-sectional relation between $\beta$ and one-year-ahead excess stock returns $(r_{t+1})$.
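
The reported t-statistic and p-value for the difference portfolio can be checked by hand from the average and standard error in Table 5.5. The sketch assumes a two-sided test against the standard normal distribution. That reference distribution is our assumption about how the p-value was computed.

```python
import math

mean_diff = -7.74   # average return of the 7-1 portfolio (Table 5.5, percent)
se_diff = 4.47      # Newey-West adjusted standard error (Table 5.5)

t_stat = mean_diff / se_diff

# Two-sided p-value under a standard normal: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(round(t_stat, 2), round(p_value, 2))
```

Because the p-value exceeds 0.05, the null hypothesis of a zero average difference is not rejected at the 5% level.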

In [9]:
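def nw_adjust(df, group, lags=6):
    # Drop missing values from a copy of the column so the caller's DataFrame is not modified
    adj_a = df[group].dropna().to_numpy(dtype=float)
    # Regress the series on a constant: the intercept is the time-series mean
    model = sm.OLS(adj_a, np.ones(len(adj_a))).fit(cov_type='HAC', cov_kwds={'maxlags': lags})
    # bse, tvalues, and pvalues are length-1 arrays, so index them explicitly
    return round(adj_a.mean(), 2), round(float(model.bse[0]), 2), round(float(model.tvalues[0]), 2), round(float(model.pvalues[0]), 2)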

table5 = pd.DataFrame(index = ['Average','Standard error','t-statistic','p-value'], columns = ['1','2','3','4','5','6','7','7-1'])
portfolio = ['n1','n2','n3','n4','n5','n6','n7','n7-1']
for i in range(0,8):
    for j in range(0,4):
        table5.iloc[j,i] = nw_adjust(ewret, portfolio[i])[j]
        
table5 

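| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 7-1 |
|---|---|---|---|---|---|---|---|---|
| Average | 18.95 | 14.2 | 15.47 | 13.02 | 12.49 | 11.64 | 11.21 | -7.74 |
| Standard error | 3.93 | 2.8 | 3.1 | 2.47 | 2.47 | 2.47 | 3.46 | 4.47 |
| t-statistic | 4.82 | 5.07 | 4.99 | 5.26 | 5.07 | 4.71 | 3.24 | -1.73 |
| p-value | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.08 |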


### 5.1.6 Presenting the Results
There are many different approaches to presenting the results of one or more portfolio
analyses. Exactly which approach is chosen depends on the objective of the analysis. Here, we discuss some of the most common approaches to presenting portfolio analysis results.

***Single Portfolio Analysis***

While the results of this analysis are well
summarized by Table 5.5, several of the results in Table 5.5 are redundant, as the
standard error, t-statistic, and p-value all contain essentially the same information.
Thus, only one of these values, most commonly the t-statistic, is presented. Furthermore, t-statistics are frequently presented in parentheses to enhance the appearance
of the presentation. Thus, the results of the portfolio analysis may be presented as
in Table 5.6. Only the average excess return and the corresponding Newey and West
(1987) adjusted (six lags) t-statistics are displayed.
![table5.6](img/table5.6.jpg)
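
A layout of this kind can be sketched by formatting the averages and t-statistics from Table 5.5 as strings, with the t-statistics wrapped in parentheses. The variable names and the row label are our own choices.

```python
import pandas as pd

cols = ["1", "2", "3", "4", "5", "6", "7", "7-1"]
avg = [18.95, 14.2, 15.47, 13.02, 12.49, 11.64, 11.21, -7.74]   # Table 5.5, Average
tstat = [4.82, 5.07, 4.99, 5.26, 5.07, 4.71, 3.24, -1.73]       # Table 5.5, t-statistic

table6 = pd.DataFrame(
    [[f"{a:.2f}" for a in avg], [f"({t:.2f})" for t in tstat]],
    index=["Excess return", ""],   # the t-statistic row sits unlabeled under the average
    columns=cols,
)
print(table6.to_string())
```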

***Multiple Analyses, Same Sort Variable, Different Outcome Variables***

Frequently, we want to examine the cross-sectional relation between the sort variable $X$ and many different outcome variables $Y$. To do this, we repeat the univariate
portfolio analysis for each outcome variable $Y$.
We can then present
the results of all of these portfolio analyses in one table. Often, it is not of particular
interest to examine whether the average value of the outcome variable in any of the
$n_P$ portfolios is equal to zero.  For example, we know that all stocks have a positive
market capitalization. Thus, testing whether the average market capitalization of a
certain set of stocks is equal to zero is not of interest. We may only be interested in whether the average value of the difference portfolio is equal to zero, as nonzero differences indicate
a cross-sectional relation between the sort variable $X$ and the outcome variable $Y$.
Therefore, sometimes the only t-statistic presented is that of the difference portfolio.
The objective of such analyses is often to understand the complexion of each of the
portfolios formed by sorting on the variable $X$.

To exemplify this, Table 5.7 presents the results of a portfolio analysis using the
same $\beta$-sorted portfolios but taking each of $\beta$, MktCap, and BM to be the outcome
variable. Here, t-statistics for the difference portfolio are reported in a separate column at the end of the table instead of in parentheses under the average value.
Because the portfolios are formed by sorting on $\beta$, the average value of $\beta$ is monotonically increasing across the seven portfolios and the time-series mean of the differences in average $\beta$ between portfolios seven and one is highly statistically significant.

The table indicates that stocks with low $\beta$ tend
to be small market capitalization stocks. In addition, the portfolios exhibit a nearly monotonically decreasing pattern in average book-to-market ratio (BM).

In [10]:
value = ['beta', 'mktcap', 'bm']
table7 = pd.DataFrame(index = value, columns = ['1','2','3','4','5','6','7','7-1','7-1 t-statistic'])

for i in range(0,3):
    temp = df_beta_bp.groupby(['year','group'])[value[i]].mean().to_frame().reset_index()
    temp = temp.pivot(index='year',columns='group')[value[i]].reset_index()
    temp['n7-1'] = temp['n7'] - temp['n1']
    for j in range(0,8):
        table7.iloc[i,j] = nw_adjust(temp, portfolio[j])[0]
    # The last column holds the t-statistic of the difference portfolio only
    table7.iloc[i,8] = nw_adjust(temp, 'n7-1')[2]

table7

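| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 7-1 | 7-1 t-statistic |
|---|---|---|---|---|---|---|---|---|---|
| beta | -0.23 | 0.13 | 0.39 | 0.69 | 1.01 | 1.36 | 1.93 | 2.16 | 25.41 |
| mktcap | 162.52 | 923.63 | 2286.52 | 2592.24 | 2581.01 | 2657.92 | 2713.53 | 2551.01 | 3.79 |
| bm | 0.92 | 0.86 | 0.72 | 0.67 | 0.6 | 0.57 | 0.46 | -0.45 | -6.81 |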


***Multiple Analyses, Different Sort Variables, Same Outcome Variable***

Sometimes, we want to present the results of portfolio analyses with different sort
variables $X$ but with the same outcome variable $Y$. This is often the case when we are
examining the ability of many different variables to predict future stock returns.

Table 5.8 presents an example of how the results of such portfolio analyses can be
presented. The table shows the average excess returns and the associated t-statistics
for portfolios sorted on each of $\beta$, MktCap, and BM. In both the
MktCap and BM cases, the average excess returns are nearly monotonic across the
seven portfolios.

In [11]:
def X_breakpoint(df, X):
    X_bp = df.groupby(['year'])[X].describe(percentiles=[0.1,0.2,0.4,0.6,0.8,0.9]).reset_index()
    X_bp = X_bp[['year','10%','20%','40%','60%','80%','90%']]
    X_bp.columns = ['year','B1','B2','B3','B4','B5','B6']

    for i in range(1,7):
        name = X_bp.columns[i]
        X_bp[name] = X_bp[name].apply(lambda x:round(x, 2))

    df_X_bp = pd.merge(df, X_bp, how='left', on=['year'])
    df_X_bp = df_X_bp[df_X_bp[X].notnull()]
    df_X_bp = df_X_bp[df_X_bp['rt+1'].notnull()]
    return df_X_bp

def X_group(row, X):    
    if row[X]<=row['B1']:
        value='n1'
    elif row['B1']<=row[X]<=row['B2']:
        value='n2'
    elif row['B2']<=row[X]<=row['B3']:
        value='n3'  
    elif row['B3']<=row[X]<=row['B4']:
        value='n4'
    elif row['B4']<=row[X]<=row['B5']:
        value='n5'
    elif row['B5']<=row[X]<=row['B6']:
        value='n6'
    elif row['B6']<=row[X]:
        value='n7' 
    return value

In [12]:
table8_index = ['beta','', 'mktcap','', 'bm','']
table8_col = ['1','2','3','4','5','6','7','7-1']
table8 = pd.DataFrame(index = table8_index, columns = table8_col)

for i in range(0,3):
    df_X_bp = X_breakpoint(df, table8_index[2*i])
    df_X_bp['group'] = df_X_bp.apply(X_group, X = table8_index[2*i], axis=1)
    
    temp = df_X_bp.groupby(['year','group'])['rt+1'].mean().to_frame().reset_index()
    temp = temp.pivot(index='year',columns='group',values='rt+1').reset_index()
    temp['n7-1'] = temp['n7'] - temp['n1']
    
    for j in range(0,8):
        table8.iloc[2*i,j] = nw_adjust(temp, portfolio[j])[0]
        table8.iloc[2*i+1,j] = nw_adjust(temp, portfolio[j])[2]

table8

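| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 7-1 |
|---|---|---|---|---|---|---|---|---|
| beta | 18.95 | 14.2 | 15.47 | 13.02 | 12.49 | 11.64 | 11.21 | -7.74 |
| t-statistic | 4.82 | 5.07 | 4.99 | 5.26 | 5.07 | 4.71 | 3.24 | -1.73 |
| mktcap | 41.05 | 20.49 | 13.0 | 10.08 | 7.87 | 8.74 | 7.99 | -33.06 |
| t-statistic | 7.13 | 4.81 | 3.75 | 4.09 | 4.36 | 4.7 | 3.82 | -6.03 |
| bm | 11.42 | 7.03 | 8.3 | 10.6 | 14.25 | 19.12 | 36.11 | 24.69 |
| t-statistic | 3.65 | 3.18 | 4.26 | 5.26 | 5.54 | 5.57 | 5.58 | 4.62 |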
