-
Notifications
You must be signed in to change notification settings - Fork 0
/
high-yield-spread.Rmd
472 lines (395 loc) · 28 KB
/
high-yield-spread.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
---
title: "Documenting a Correlation Between High-Yield Spreads and Forward Investment Returns"
author: "Christopher C. Smith"
date: '2022-06-22'
output: pdf_document
---
## Introduction
High-yield spreads, the difference in yield between investment-grade and high-yield bonds, can be interpreted as a measure of credit market stress. In a good economy, default rates are relatively low, and high-yield bonds pay average yields fairly close to their average coupon rates. But in a bad economy, default rates rise—especially for high-yield bonds—and average yields fall somewhat below average coupon rates. Investment grade bonds, in contrast, have less default risk and remain relatively unaffected by economic downturns. Thus, when investors see a bad economy ahead, they discount high-yield bonds relative to investment grade bonds, demanding a higher yield in exchange for more credit risk.
Notably, high-yield spreads tend to peak around the same time that US stock markets bottom. Might high-yield spread data then contain a tradeable signal for stocks? There's reason to think it might. Whereas stock markets tend to go up over time, high-yield spreads tend to move in a more predictable (and thus easier-to-interpret) range. Conventional stock market wisdom also holds that the bond market is "smarter" than the stock market, so perhaps informational signals from bonds will front-run moves in stocks. In this short paper, my goal is simply to demonstrate and document the correlation between high-yield spreads and forward investment returns for stocks, as well as for investment-grade and high-yield bonds.
```{r setup, include=FALSE, echo=FALSE, message=FALSE, warning=FALSE}
#Load libraries
library(tidyverse)
library(xlsx)
library(lubridate)
library(RSelenium)
library(tidyquant)
library(caret)
#Start Selenium driver and navigate to FRED series for high-yield spread
rD <- rsDriver(browser = "firefox")
driver <- rD[["client"]]
driver$navigate("https://fred.stlouisfed.org/series/BAMLH0A0HYM2")
Sys.sleep(8)
#Find "Max" button on the page and click it
element <- driver$findElement(using = "xpath",'//*[@id="zoom-all"]')
element$clickElement()
Sys.sleep(2)
#Find "Download" button on the page and click it
element <- driver$findElement(using = "css",'.fg-download-btn-chart-gtm > span:nth-child(1)')
element$clickElement()
Sys.sleep(2)
#Find "CSV" button on the page and get the link from it
element <- driver$findElement(using = "xpath",'//*[@id="download-data-csv"]')
url <- unlist(element$getElementAttribute('href'))
#Get and clean high-yield spread data
download.file(url,"highyieldspread.csv")
yieldspread <- read.csv("highyieldspread.csv")
unlink("highyieldspread.csv")
yieldspread <- yieldspread %>%
select(date=DATE,spread=BAMLH0A0HYM2) %>%
mutate(date=as.Date(date),spread=as.numeric(spread)) %>%
filter(!is.na(spread))
#Get historical price data and calculate next period percent returns
prices <- tq_get(c("VOO","GOVT","AGG","HYG","VWOB"))
prices <- prices %>%
group_by(symbol) %>%
mutate(return_6_mo = c(diff(adjusted,lag = round(253*.5)),rep(NA,times=round(253*.5)))/adjusted,
return_1_yr = c(diff(adjusted,lag = 253),rep(NA,times=253))/adjusted,
return_2_yr = c(diff(adjusted,lag = 253*2),rep(NA,times=253*2))/adjusted,
return_3_yr = c(diff(adjusted,lag = 253*3),rep(NA,times=253*3))/adjusted)
#Join returns data with high-yield spread data
prices <- prices %>%
select(symbol,date,return_6_mo,return_1_yr,return_2_yr,return_3_yr)
yieldspread <- inner_join(yieldspread,prices,by="date") %>%
mutate(spread = spread/100)
yieldspread_full <- yieldspread
#calculate percentile of yield spread
yieldspread <- yieldspread %>%
mutate(percentile = rank(spread)/length(spread))
```
## Analyzing S&P 500 Forward Returns
We begin by looking at correlation coefficients between high-yield spreads and forward returns over various timeframes.
```{r sp500, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Filter data to keep S&P 500 only
yieldspread <- yieldspread_full %>%
filter(symbol == "VOO")
#Calculate correlation coefficient between high-yield spread and next-year return
data.frame(pearsons_6mo = cor(yieldspread$spread,yieldspread$return_6_mo,use="complete.obs"),
pearsons_1yr = cor(yieldspread$spread,yieldspread$return_1_yr,use="complete.obs"),
pearsons_2yr = cor(yieldspread$spread,yieldspread$return_2_yr,use="complete.obs"),
pearsons_3yr = cor(yieldspread$spread,yieldspread$return_3_yr,use="complete.obs")) %>%
knitr::kable()
```
Examining this data, it appears that high-yield spreads have predictive power for returns over a one-year and two-year timeframe, but that correlations are weaker on shorter and longer timeframes. Thus, we will focus on the one-year and two-year timeframes.
Next, we will plot high-yield spreads against next-one-year and next-two-year returns, with a regression model to show expected return. To choose between linear and log regression models, we perform a ten-fold cross-validation and choose the model that minimizes residual mean-squared error, on average, across both timeframes. In this case, that's the linear regression model:
```{r sp500_1, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Partition data for ten-fold cross-validation
set.seed(1)
test_index_1 <- createFolds(yieldspread$return_1_yr[!is.na(yieldspread$return_1_yr)],k=10)
test_index_2 <- createFolds(yieldspread$return_2_yr[!is.na(yieldspread$return_2_yr)],k=10)
#Define function to get residual mean squared error of predictions
RMSE <- function(true_ratings, predicted_ratings){
sqrt(mean((true_ratings - predicted_ratings)^2))
}
#Perform five-fold cross-validation and calculate RMSEs
rmses <- map_dfr(1:10,function(x){
#Partition data
train_set_1 <- yieldspread[-test_index_1[[x]],]
train_set_2 <- yieldspread[-test_index_2[[x]],]
test_set_1 <- yieldspread[test_index_1[[x]],]
test_set_2 <- yieldspread[test_index_2[[x]],]
#Do a multiple regression, accounting for both PCE inflation and date
lm_fit1 <- train_set_1 %>% lm(formula = return_1_yr ~ spread)
lm_fit2 <- train_set_2 %>% lm(formula = return_2_yr ~ spread)
log_fit1 <- train_set_1 %>% lm(formula = return_1_yr ~ log(spread))
log_fit2 <- train_set_2 %>% lm(formula = return_1_yr ~ log(spread))
#Choose the model that minimizes RMSE
return(data.frame(Model=c("Linear","Log"),
RMSE = c((RMSE(test_set_1$return_1_yr,predict(lm_fit1,test_set_1)) + RMSE(test_set_2$return_2_yr,predict(lm_fit2,test_set_2)))/2,
RMSE(test_set_1$return_1_yr,predict(log_fit1,test_set_1)) + RMSE(test_set_2$return_2_yr,predict(log_fit2,test_set_2))/2),
Trial = rep(x,times=2)))
})
#Choose model that minimizes RMSE on average
rmses %>% group_by(Model) %>% summarize(Mean_RMSE = mean(RMSE)) %>% knitr::kable()
#Model expected returns based on high-yield spread
model_1yr <- lm(yieldspread,formula = return_1_yr ~ spread)
model_2yr <- lm(yieldspread,formula = return_2_yr ~ spread)
```
When we create the plot, we see a nice, clear relationship between the two variables—with higher forward S&P 500 returns when high-yield spreads are high—across both timeframes. The relationship is just as strong on the two-year timeframe as on the one-year timeframe. The current yield spread is shown on the chart as a red dot, with the current expected return shown as a dashed line. Now appears to be a good time to buy the S&P 500, with expected forward two-year return at 36%.
```{r sp500_2, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Create tidy data frame for charting
yieldspread <- yieldspread %>%
select(-symbol,-return_6_mo,-return_3_yr) %>%
gather("timeframe","value",return_1_yr,return_2_yr) %>%
mutate(forecast = case_when(timeframe == "return_1_yr" ~ predict(object=model_1yr,newdata=.),
TRUE ~ predict(object=model_2yr,newdata=.))) %>%
mutate(label = case_when(date == max(date) & timeframe == "return_1_yr" ~ paste(scales::percent(round(forecast,2),accuracy=1),"expected 1-year return\nbased on current spread"),
date == max(date) & timeframe == "return_2_yr" ~ paste(scales::percent(round(forecast,2),accuracy=1),"expected 2-year return\nbased on current spread"),
TRUE ~ NA_character_)) %>%
mutate(timeframe = case_when(timeframe == "return_1_yr" ~ "1-year forward return",
timeframe == "return_2_yr" ~ "2-year forward return",
TRUE ~ NA_character_))
#Chart forward returns against high-yield spread
yieldspread %>%
mutate(highlight = !is.na(label)) %>%
ggplot(aes(x=spread,y=value)) +
geom_point() +
geom_line(aes(y=forecast),col="blue",lwd=1) +
geom_point(aes(y=forecast,color=highlight,alpha=highlight),size=3) +
geom_hline(aes(yintercept=forecast,color=highlight,alpha=highlight),lwd=1,linetype="dashed") +
geom_text(aes(x=spread,y=forecast,label=label),nudge_y=-.2,nudge_x=.03,col="red",size=4,fontface = "bold") +
facet_wrap(vars(timeframe),ncol=2) +
labs(title="S&P500 forward return vs. high-yield spread",
x="High-yield spread",
y="Forward S&P 500 return",
caption="Data courtesy FRED and Yahoo! Finance API") +
scale_x_continuous(limits=c(.03,.11),breaks=seq(.03,.11,by=.01), labels=scales::percent_format(accuracy = 1L)) +
scale_y_continuous(labels = scales::percent) +
scale_color_manual(values=c("black","red")) +
scale_alpha_manual(values=c(0,1)) +
theme(legend.position="none")
```
## Analyzing High-Yield Bond Forward Returns
Since high-yield spreads are literally a measure of excess yield on high-yield bonds, we might expect a strong, positive relationship with forward return on a high-yield bond ETF. Looking at the correlation coefficients, that's indeed what we find. The coefficients are positive and similar to the coefficients for the S&P 500.
```{r hyg, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Filter data to keep AGG only
yieldspread <- yieldspread_full %>%
filter(symbol == "HYG")
#Calculate correlation coefficient between high-yield spread and next-year return
data.frame(pearsons_6mo = cor(yieldspread$spread,yieldspread$return_6_mo,use="complete.obs"),
pearsons_1yr = cor(yieldspread$spread,yieldspread$return_1_yr,use="complete.obs"),
pearsons_2yr = cor(yieldspread$spread,yieldspread$return_2_yr,use="complete.obs"),
pearsons_3yr = cor(yieldspread$spread,yieldspread$return_3_yr,use="complete.obs")) %>%
knitr::kable()
```
We again perform a ten-fold cross-validation to determine the best regression model. As with the S&P 500, a linear model minimizes residual mean-squared error better than a log model.
```{r hyg_1, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Partition data for ten-fold cross-validation
set.seed(1)
test_index_1 <- createFolds(yieldspread$return_1_yr[!is.na(yieldspread$return_1_yr)],k=10)
test_index_2 <- createFolds(yieldspread$return_2_yr[!is.na(yieldspread$return_2_yr)],k=10)
#Define function to get residual mean squared error of predictions
RMSE <- function(true_ratings, predicted_ratings){
sqrt(mean((true_ratings - predicted_ratings)^2))
}
#Perform five-fold cross-validation and calculate RMSEs
rmses <- map_dfr(1:10,function(x){
#Partition data
train_set_1 <- yieldspread[-test_index_1[[x]],]
train_set_2 <- yieldspread[-test_index_2[[x]],]
test_set_1 <- yieldspread[test_index_1[[x]],]
test_set_2 <- yieldspread[test_index_2[[x]],]
#Do a multiple regression, accounting for both PCE inflation and date
lm_fit1 <- train_set_1 %>% lm(formula = return_1_yr ~ spread)
lm_fit2 <- train_set_2 %>% lm(formula = return_2_yr ~ spread)
log_fit1 <- train_set_1 %>% lm(formula = return_1_yr ~ log(spread))
log_fit2 <- train_set_2 %>% lm(formula = return_1_yr ~ log(spread))
#Choose the model that minimizes RMSE
return(data.frame(Model=c("Linear","Log"),
RMSE = c((RMSE(test_set_1$return_1_yr,predict(lm_fit1,test_set_1)) + RMSE(test_set_2$return_2_yr,predict(lm_fit2,test_set_2)))/2,
RMSE(test_set_1$return_1_yr,predict(log_fit1,test_set_1)) + RMSE(test_set_2$return_2_yr,predict(log_fit2,test_set_2))/2),
Trial = rep(x,times=2)))
})
#Choose model that minimizes RMSE on average
rmses %>% group_by(Model) %>% summarize(Mean_RMSE = mean(RMSE)) %>% knitr::kable()
#Model expected returns based on high-yield spread
model_1yr <- lm(yieldspread,formula = return_1_yr ~ spread)
model_2yr <- lm(yieldspread,formula = return_2_yr ~ spread)
```
Plotting actual and modeled returns, we again observe a highly positive and highly significant relationship between high-yield spreads and forward returns. Expected forward returns are lower than for the S&P 500, with current expected one-year return at 7%, and expected two-year return at 11%.
```{r hyg_2, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Create tidy data frame for charting
yieldspread <- yieldspread %>%
select(-symbol,-return_6_mo,-return_3_yr) %>%
gather("timeframe","value",return_1_yr,return_2_yr) %>%
mutate(forecast = case_when(timeframe == "return_1_yr" ~ predict(object=model_1yr,newdata=.),
TRUE ~ predict(object=model_2yr,newdata=.))) %>%
mutate(label = case_when(date == max(date) & timeframe == "return_1_yr" ~ paste(scales::percent(round(forecast,2),accuracy=1),"expected 1-year return\nbased on current spread"),
date == max(date) & timeframe == "return_2_yr" ~ paste(scales::percent(round(forecast,2),accuracy=1),"expected 2-year return\nbased on current spread"),
TRUE ~ NA_character_)) %>%
mutate(timeframe = case_when(timeframe == "return_1_yr" ~ "1-year forward return",
timeframe == "return_2_yr" ~ "2-year forward return",
TRUE ~ NA_character_))
#Chart forward returns against high-yield spread
yieldspread %>%
mutate(highlight = !is.na(label)) %>%
ggplot(aes(x=spread,y=value)) +
geom_point() +
geom_line(aes(y=forecast),col="blue",lwd=1) +
geom_point(aes(y=forecast,color=highlight,alpha=highlight),size=3) +
geom_hline(aes(yintercept=forecast,color=highlight,alpha=highlight),lwd=1,linetype="dashed") +
geom_text(aes(x=spread,y=forecast,label=label),nudge_y=-.2,nudge_x=.03,col="red",size=4,fontface = "bold") +
facet_wrap(vars(timeframe),ncol=2) +
labs(title="HYG forward return vs. high-yield spread",
x="High-yield spread",
y="Forward HYG return",
caption="Data courtesy FRED and Yahoo! Finance API") +
scale_x_continuous(limits=c(.03,.11),breaks=seq(.03,.11,by=.01), labels=scales::percent_format(accuracy = 1L)) +
scale_y_continuous(labels = scales::percent) +
scale_color_manual(values=c("black","red")) +
scale_alpha_manual(values=c(0,1)) +
theme(legend.position="none")
```
## Analyzing Investment-Grade Bonds Forward Returns
What about investment-grade bonds? Investment-grade bond yields tend to be higher, on average, when high-yield spreads are higher, so we might expect a positive relationship with forward returns here, as well. In fact, many financial advisors will recommend investment-grade bonds as a "safe-haven" in a bad credit environment, when high-yield spreads are high. But in fact, we find the opposite. The relationship between high-yield spreads and forward returns for US Treasury bonds is robustly negative:
```{r govt, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Filter data to keep GOVT only
yieldspread <- yieldspread_full %>%
filter(symbol == "GOVT")
#Calculate correlation coefficient between high-yield spread and next-year return
data.frame(pearsons_6mo = cor(yieldspread$spread,yieldspread$return_6_mo,use="complete.obs"),
pearsons_1yr = cor(yieldspread$spread,yieldspread$return_1_yr,use="complete.obs"),
pearsons_2yr = cor(yieldspread$spread,yieldspread$return_2_yr,use="complete.obs"),
pearsons_3yr = cor(yieldspread$spread,yieldspread$return_3_yr,use="complete.obs")) %>%
knitr::kable()
```
This suggests that when high-yield spreads are high, the "flight to safety" tends to be overdone and treasury bonds overpriced.
Here, it is the two-year and three-year timeframes that are most significant, so those are the timeframes we will focus on. (Perhaps investment-grade bonds respond to informational signals more slowly than higher-risk, more actively traded assets.) Performing our ten-fold cross validation, we again confirm that it is the linear model that minimizes residual mean-squared error.
```{r govt_1, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Partition data for ten-fold cross-validation
set.seed(1)
test_index_2 <- createFolds(yieldspread$return_2_yr[!is.na(yieldspread$return_2_yr)],k=10)
test_index_3 <- createFolds(yieldspread$return_3_yr[!is.na(yieldspread$return_3_yr)],k=10)
#Define function to get residual mean squared error of predictions
RMSE <- function(true_ratings, predicted_ratings){
sqrt(mean((true_ratings - predicted_ratings)^2))
}
#Perform five-fold cross-validation and calculate RMSEs
rmses <- map_dfr(1:10,function(x){
#Partition data
train_set_2 <- yieldspread[-test_index_2[[x]],]
train_set_3 <- yieldspread[-test_index_3[[x]],]
test_set_2 <- yieldspread[test_index_2[[x]],]
test_set_3 <- yieldspread[test_index_3[[x]],]
#Do a multiple regression, accounting for both PCE inflation and date
lm_fit2 <- train_set_2 %>% lm(formula = return_2_yr ~ spread)
lm_fit3 <- train_set_3 %>% lm(formula = return_3_yr ~ spread)
log_fit2 <- train_set_2 %>% lm(formula = return_2_yr ~ log(spread))
log_fit3 <- train_set_3 %>% lm(formula = return_3_yr ~ log(spread))
#Choose the model that minimizes RMSE
return(data.frame(Model=c("Linear","Log"),
RMSE = c((RMSE(test_set_2$return_2_yr,predict(lm_fit2,test_set_2)) + RMSE(test_set_3$return_3_yr,predict(lm_fit3,test_set_3)))/2,
RMSE(test_set_2$return_2_yr,predict(log_fit2,test_set_2)) + RMSE(test_set_3$return_3_yr,predict(log_fit3,test_set_3))/2),
Trial = rep(x,times=2)))
})
#Choose model that minimizes RMSE on average
rmses %>% group_by(Model) %>% summarize(Mean_RMSE = mean(RMSE)) %>% knitr::kable()
#Model expected returns based on high-yield spread
model_2yr <- lm(yieldspread,formula = return_2_yr ~ spread)
model_3yr <- lm(yieldspread,formula = return_3_yr ~ spread)
```
Finally, we plot our data. Current expected one-year return on the GOVT ETF (a medium-duration portfolio of US Treasury bonds) is merely 1%, making this probably *not* a good time to buy Treasury bonds. And expected returns will only get worse as high-yield spreads rise.
```{r govt_2, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Create tidy data frame for charting
yieldspread <- yieldspread %>%
select(-symbol,-return_6_mo,-return_1_yr) %>%
gather("timeframe","value",return_2_yr,return_3_yr) %>%
mutate(forecast = case_when(timeframe == "return_2_yr" ~ predict(object=model_2yr,newdata=.),
TRUE ~ predict(object=model_3yr,newdata=.))) %>%
mutate(label = case_when(date == max(date) & timeframe == "return_2_yr" ~ paste(scales::percent(round(forecast,2),accuracy=1),"expected 2-year return\nbased on current spread"),
date == max(date) & timeframe == "return_3_yr" ~ paste(scales::percent(round(forecast,2),accuracy=1),"expected 3-year return\nbased on current spread"),
TRUE ~ NA_character_)) %>%
mutate(timeframe = case_when(timeframe == "return_2_yr" ~ "2-year forward return",
timeframe == "return_3_yr" ~ "3-year forward return",
TRUE ~ NA_character_))
#Chart forward returns against high-yield spread
yieldspread %>%
mutate(highlight = !is.na(label)) %>%
ggplot(aes(x=spread,y=value)) +
geom_point() +
geom_line(aes(y=forecast),col="blue",lwd=1) +
geom_point(aes(y=forecast,color=highlight,alpha=highlight),size=3) +
geom_hline(aes(yintercept=forecast,color=highlight,alpha=highlight),lwd=1,linetype="dashed") +
geom_text(aes(x=spread,y=forecast,label=label),nudge_y=-.2,nudge_x=.03,col="red",size=4,fontface = "bold") +
facet_wrap(vars(timeframe),ncol=2) +
labs(title="GOVT forward return vs. high-yield spread",
x="High-yield spread",
y="Forward GOVT return",
caption="Data courtesy FRED and Yahoo! Finance API") +
scale_x_continuous(limits=c(.03,.11),breaks=seq(.03,.11,by=.01), labels=scales::percent_format(accuracy = 1L)) +
scale_y_continuous(labels = scales::percent) +
scale_color_manual(values=c("black","red")) +
scale_alpha_manual(values=c(0,1)) +
theme(legend.position="none")
```
For investment-grade corporate bonds exhibit a similar dynamic. As with Treasury bonds, we find a negative relationship with forward returns, with the most significant signal on the two-year and three-year timeframes.
```{r agg, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Filter data to keep AGG only
yieldspread <- yieldspread_full %>%
filter(symbol == "AGG")
#Calculate correlation coefficient between high-yield spread and next-year return
data.frame(pearsons_6mo = cor(yieldspread$spread,yieldspread$return_6_mo,use="complete.obs"),
pearsons_1yr = cor(yieldspread$spread,yieldspread$return_1_yr,use="complete.obs"),
pearsons_2yr = cor(yieldspread$spread,yieldspread$return_2_yr,use="complete.obs"),
pearsons_3yr = cor(yieldspread$spread,yieldspread$return_3_yr,use="complete.obs")) %>%
knitr::kable()
```
Ten-fold cross-validation again confirms that a linear model best minimizes residual mean-squared error.
```{r agg_1, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Partition data for ten-fold cross-validation
set.seed(1)
test_index_2 <- createFolds(yieldspread$return_2_yr[!is.na(yieldspread$return_2_yr)],k=10)
test_index_3 <- createFolds(yieldspread$return_3_yr[!is.na(yieldspread$return_3_yr)],k=10)
#Define function to get residual mean squared error of predictions
RMSE <- function(true_ratings, predicted_ratings){
sqrt(mean((true_ratings - predicted_ratings)^2))
}
#Perform five-fold cross-validation and calculate RMSEs
rmses <- map_dfr(1:10,function(x){
#Partition data
train_set_2 <- yieldspread[-test_index_2[[x]],]
train_set_3 <- yieldspread[-test_index_3[[x]],]
test_set_2 <- yieldspread[test_index_2[[x]],]
test_set_3 <- yieldspread[test_index_3[[x]],]
#Do a multiple regression, accounting for both PCE inflation and date
lm_fit2 <- train_set_2 %>% lm(formula = return_2_yr ~ spread)
lm_fit3 <- train_set_3 %>% lm(formula = return_3_yr ~ spread)
log_fit2 <- train_set_2 %>% lm(formula = return_2_yr ~ log(spread))
log_fit3 <- train_set_3 %>% lm(formula = return_3_yr ~ log(spread))
#Choose the model that minimizes RMSE
return(data.frame(Model=c("Linear","Log"),
RMSE = c((RMSE(test_set_2$return_2_yr,predict(lm_fit2,test_set_2)) + RMSE(test_set_3$return_3_yr,predict(lm_fit3,test_set_3)))/2,
RMSE(test_set_2$return_2_yr,predict(log_fit2,test_set_2)) + RMSE(test_set_3$return_3_yr,predict(log_fit3,test_set_3))/2),
Trial = rep(x,times=2)))
})
#Choose model that minimizes RMSE on average
rmses %>% group_by(Model) %>% summarize(Mean_RMSE = mean(RMSE)) %>% knitr::kable()
#Model expected returns based on high-yield spread
model_2yr <- lm(yieldspread,formula = return_2_yr ~ spread)
model_3yr <- lm(yieldspread,formula = return_3_yr ~ spread)
```
Plotting our data, we again find a negative relationship with forward returns, although expected returns are higher than for Treasury bonds.
```{r agg_2, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE}
#Create tidy data frame for charting
yieldspread <- yieldspread %>%
select(-symbol,-return_6_mo,-return_1_yr) %>%
gather("timeframe","value",return_2_yr,return_3_yr) %>%
mutate(forecast = case_when(timeframe == "return_2_yr" ~ predict(object=model_2yr,newdata=.),
TRUE ~ predict(object=model_3yr,newdata=.))) %>%
mutate(label = case_when(date == max(date) & timeframe == "return_2_yr" ~ paste(scales::percent(round(forecast,2),accuracy=1),"expected 2-year return\nbased on current spread"),
date == max(date) & timeframe == "return_3_yr" ~ paste(scales::percent(round(forecast,2),accuracy=1),"expected 3-year return\nbased on current spread"),
TRUE ~ NA_character_)) %>%
mutate(timeframe = case_when(timeframe == "return_2_yr" ~ "2-year forward return",
timeframe == "return_3_yr" ~ "3-year forward return",
TRUE ~ NA_character_))
#Chart forward returns against high-yield spread
yieldspread %>%
mutate(highlight = !is.na(label)) %>%
ggplot(aes(x=spread,y=value)) +
geom_point() +
geom_line(aes(y=forecast),col="blue",lwd=1) +
geom_point(aes(y=forecast,color=highlight,alpha=highlight),size=3) +
geom_hline(aes(yintercept=forecast,color=highlight,alpha=highlight),lwd=1,linetype="dashed") +
geom_text(aes(x=spread,y=forecast,label=label),nudge_y=-.2,nudge_x=.03,col="red",size=4,fontface = "bold") +
facet_wrap(vars(timeframe),ncol=2) +
labs(title="AGG forward return vs. high-yield spread",
x="High-yield spread",
y="Forward AGG return",
caption="Data courtesy FRED and Yahoo! Finance API") +
scale_x_continuous(limits=c(.03,.11),breaks=seq(.03,.11,by=.01), labels=scales::percent_format(accuracy = 1L)) +
scale_y_continuous(labels = scales::percent) +
scale_color_manual(values=c("black","red")) +
scale_alpha_manual(values=c(0,1)) +
theme(legend.position="none")
```
## Conclusion and Directions for Future Research
In conclusion, it makes good sense to rotate out of investment-grade bonds and into higher-risk assets such as stocks and high-yield bonds as high-yield spreads rise. And perhaps also the converse: as high-yield spreads fall, the risk involved in stocks and high-yield bonds may cease to be worth taking relative to the risk of investment-grade bonds. Rotation in and out of different asset classes could be scaled, staged, or triggered when a certain threshold is reached. We might, for instance, use the high-yield spread's percentile as a heuristic for asset allocation. As an example, the current high-yield spread is in the `r last(yieldspread$percentile)*100`th percentile of our historical data. Thus, we might go `r scales::percent(last(yieldspread$percentile))` stocks or high-yield bonds, and `r scales::percent(1-last(yieldspread$percentile))` investment-grade bonds. We could rebalance in accordance with changes in the high-yield spread's percentile rank.
In a future extension of this study, I will look at spreads between investment-grade and high-yield bonds in emerging markets, as well as between US and emerging markets bonds. I will experiment with incorporating information about the base discount rate, perhaps transforming yield spread as a percentage of discount rate, or perhaps rendering the base rate as a color gradient on the chart. I will see if the slope of a moving average of high-yield spreads (i.e. is the spread rising or falling?) affects forward returns. I will map date onto my charts as a color gradient and see if there are time effects I need to control for. I will use machine learning techniques to determine how effective our models might be for forecasting out-of-sample. I will also incorporate information about the variance in the data to calculate Sharpe ratios for comparison across asset classes. And finally, I will propose a specific portfolio rebalancing strategy pegged to the high-yield spread.
```{r cleanup, include=FALSE, echo=FALSE, message=FALSE, warning=FALSE}
#Close Selenium driver
driver$close()
rD[["server"]]$stop()
rm(driver, rD)
gc()
system("taskkill /im java.exe /f", intern=FALSE, ignore.stdout=FALSE)
```