This is the README for Nathan Potgieter’s financial econometrics project. This work sets out a framework for the Monte Carlo simulation of asset markets. This framework is built into the MCmarket package, which will be available on GitHub under the “Nathan-Potgieter/MCmarket” repository.
The aim of this project is to develop a general and easy-to-use Monte Carlo simulation package designed to generate asset return data with a pre-specified correlation structure, various asset distribution options, and mean and variance time-series properties. Elliptical copulas are used to specify the correlation structure within the simulated data. The option to use the Archimedean Clayton copula is also made available, as this allows for the creation of markets with high left-tail dependence. However, when using this option the correlation structure cannot be specified.
The Monte Carlo simulation routine involves the following steps:
It generates k periods of returns for D assets across N markets.
- Draw a series of k random uniformly distributed numbers
(corresponding to k trading periods), across a set of D variables
(or D assets), from a multivariate distribution with a given
correlation structure.
- This is accomplished using elliptical (Gaussian or t) copulas or the Archimedean Clayton copula. In R this can easily be done using the ellipCopula, archmCopula and rCopula functions from the copula package.
- Convert the uniformly distributed data into something that more
resembles the distribution of asset returns. MCmarket provides
functionality to convert them into normal, student-t or
skewed-generalized t distributions.
- This is done in the same way one would convert p-values into test statistics, using the qnorm(), qt() and sgt::qsgt() quantile functions.
- Technically this is accomplished via the inversion of the cumulative distribution function (CDF).
- Induce mean and variance persistence in the series by plugging the
random numbers resulting from step 2 into an ARMA(1,1) + APARCH(1,1)
equation as the innovations.
- If the parameters are set accordingly the resulting series should possess the volatility clustering observed in empirical asset returns.
- The final step is to repeat the first 3 steps N times to generate an ensemble of asset markets, each with the same risk characteristics.
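For orientation, the chunk below sketches the full routine end-to-end using the gen_corr and sim_market functions documented later in this README; the parameter values are illustrative only.
# A minimal sketch of the full routine, using the gen_corr and sim_market
# functions documented later in this README. Parameter values are illustrative.
pacman::p_load(MCmarket, tidyverse)
# Ad hoc correlation matrix shared by every simulated market.
corr <- gen_corr(D = 20, clusters = "non-overlapping", num_clusters = 4)
# Simulate N = 10 markets of k = 252 periods across D = 20 assets.
set.seed(1234)
markets <-
    map_dfr(1:10,
            ~ sim_market(corr,
                         k = 252,
                         mv_dist = "t",
                         mv_df = 4,
                         marginal_dist = "norm",
                         ts_model = list(alpha = 0.1, beta = 0.85)) %>%
                mutate(market = .x))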
library(pacman)
# Loading MCmarket
p_load(MCmarket)
#These packages are used in the Monte Carlo (MC) functions.
p_load(tidyverse, copula, lubridate, sgt, glue)
#These packages are only used in research and demonstration tasks.
p_load(tbl2xts, fGarch, forecast)
Since a correlation matrix is the key input in the Monte Carlo simulation framework, this section covers the creation of some ad hoc correlation matrices using the MCmarket::gen_corr function, as well as the estimation of some empirical correlation matrices using S&P 500 data.
In this section I developed a simple function that allows the user to easily generate a correlation matrix with a desired correlation structure. Note that the majority of the code was provided by Nico Katzke. The function is located in the gen_corr.R code file.
#' @title gen_corr
#' @description This function allows users to easily generate a user-defined ad hoc correlation
#' matrix with up to four layers of clusters.
#' @param D the number of variables. The output is a D by D correlation matrix.
#' @param clusters a character string specifying the type of cluster structure.
#' Available options are:
#'
#' "none" for a correlation matrix with no clusters, but significant correlation
#' between constituents.
#'
#' "non-overlapping" for a correlation matrix with one layer of clusters.
#'
#' "overlapping" for a correlation matrix with up to four layers and a set number
#' of clusters per layer.
#' @param num_clusters If clusters = "none", then num_clusters is not used.
#'
#' If clusters = "non-overlapping", then num_clusters is an integer indicating the number of clusters.
#'
#' If clusters = "overlapping", then num_clusters is a vector of length less than or equal to four. The length of
#' num_clusters specifies the number of cluster layers and the integers within the vector specify the number of clusters
#' per layer. It is preferable to arrange the vector in descending order, since failing to do so can result in a
#' unique output, which may not contain the intended number of layers. Additionally, using combinations with repeating
#' numbers of clusters, for example num_clusters = c(10, 10, 5, 5), will produce fewer layers, but unique intra-cluster
#' correlations (See examples).
#' @return Returns a D by D correlation matrix.
#'
#' @import propagate
#'
#' @examples
#' \dontrun{
#' library(ggcorrplot)
#'
#' ### This generates a 50 by 50 correlation matrix with no clusters.
#' gen_corr(D = 50, clusters = "none") %>%
#' ggcorrplot(title = "no clusters")
#'
#' ### This generates a 50 by 50 correlation matrix with 5 non-overlapping clusters.
#' gen_corr(D = 50, clusters = "non-overlapping", num_clusters = 5) %>%
#' ggcorrplot(title = "non-overlapping clusters")
#'
#' ### This generates a 60 by 60 correlation matrix consisting
#' ### of 4 layers with 10, 5, 3 and 2 clusters respectively.
#' gen_corr(D = 60,
#' clusters = "overlapping",
#' num_clusters = c(10,5,3,2)) %>%
#' ggcorrplot(title = "overlapping clusters")
#'
#' ### Try tinkering with the num_clusters argument in unique ways to affect the
#' ### within-cluster correlation coefficients.
#' ### Two cluster layers with low overall correlation.
#' gen_corr(D = 50,
#' clusters = "overlapping",
#' num_clusters = c(10,10,10,2)) %>%
#' ggcorrplot(title = "overlapping clusters")
#'
#' }
#' @export
gen_corr <- function (D = 50,
clusters = c("none", "non-overlapping", "overlapping"),
num_clusters = NULL) {
Grps <- num_clusters
num_layers <- length(num_clusters)
#set.seed(123)
if (!(clusters %in% c("none", "non-overlapping", "overlapping"))) stop("Please provide a valid clusters argument")
if(clusters == "none"){
# Unclustered covariance matrix
Sigma <- diag(D)
for (i in 1:D) for (j in 1:D) Sigma[i,j] <- 0.9^abs(i-j)
} else
if(clusters == "non-overlapping"){
#----------------------
# distinct non-overlapping clusters:
#----------------------
if(is.null(num_clusters)) stop("Please provide a valid num_clusters argument when using non-overlapping clusters")
if (num_layers>1) stop("Please provide a valid num_clusters argument of length one when using non-overlapping clusters")
Sigma <- matrix(0.6, D, D)
diag(Sigma) <- 1
for (i in 1:Grps) {
ix <- seq((i-1) * D / Grps + 1, i * D / Grps)
Sigma[ix, -ix] <- 0.0001 #think about
}
} else
if(clusters == "overlapping"){
#----------------------
# distinct overlapping clusters:
#----------------------
if(is.null(num_clusters)) stop("Please provide a valid num_clusters argument when using overlapping clusters")
if (num_layers>4) stop("Please provide a valid num_clusters argument of length less than or equal to 4 when using overlapping clusters")
Sigma <- matrix(0.7, D, D)
diag(Sigma) <- 1
for (i in 1:Grps[1]) {
ix <- seq((i-1) * D / Grps[1] + 1, i * D / Grps[1])
Sigma[ix, -ix] <- 0.5
}
if(num_layers>=2){
for (i in 1:Grps[2]) {
ix <- seq((i-1) * D / Grps[2] + 1, i * D / Grps[2])
Sigma[ix, -ix] <- 0.3
} }
if(num_layers>=3){
for (i in 1:Grps[3]) {
ix <- seq((i-1) * D / Grps[3] + 1, i * D / Grps[3])
Sigma[ix, -ix] <- 0.15
} }
if(num_layers>=4){
for (i in 1:Grps[4]) {
ix <- seq((i-1) * D / Grps[4] + 1, i * D / Grps[4])
Sigma[ix, -ix] <- 0.05
} }
}
return(Sigma)
}
Demonstrating the use of gen_corr
gen_corr(D = 60, clusters = "none") %>%
ggcorrplot::ggcorrplot(title = "No Clusters", hc.order = FALSE)
gen_corr(D = 60, clusters = "non-overlapping", num_clusters = c(3)) %>%
ggcorrplot::ggcorrplot(title = "Non-Overlapping Clusters", hc.order = FALSE)
gen_corr(D = 60, clusters = "overlapping", num_clusters = c(10,5,4,2)) %>% ggcorrplot::ggcorrplot(title = "Overlapping Clusters", hc.order = FALSE)
Since it would also be useful for MCmarket’s users to have easy access to some empirical correlation matrices, a data set containing correlation matrices labeled “stressed”, “rally” and “normal” is now constructed.
S&P 500 data since 2000-01-01 is used to sample covariance matrices that can be used as inputs in the Monte Carlo procedure. These matrices will be made available as part of the MCmarket package. They can also be used in correlation matrix generative models, like CorrGAN (https://marti.ai). At a later stage MCmarket may incorporate a CorrGAN model, preferably one like the conditional CorrGAN or conditional CorrVAE. This would allow users to randomly simulate different types of realistic correlation matrices.
Note that the code below is not run when building this README, as it would simply take too long to knit this document.
pacman::p_load(tidyverse, tidyquant, lubridate)
# The impute_missing_returns function from the Financial Econometrics tutorial is used to fill in missing data
source("code/impute_missing_returns.R")
# save current system date to a variable
today <- Sys.Date()
# subtract 90 months from the current date; this is used at a later stage.
#date <- today %m+% months(-90)
# saving from date
date <- as.Date("2000-01-01")
# Getting all tickers in SP500
stock_list <- tq_index("SP500") %>%
arrange(symbol) %>%
mutate(symbol = case_when(symbol == "BRK.B" ~ "BRK-B", #repairing names that cause errors
symbol == "BF.B" ~ "BF-B",
TRUE ~ as.character(symbol))) %>%
pull(symbol)
# This function gets the data for a ticker and date
get_data <- function (ticker, from) {
df <- tq_get(ticker, from = from) %>% mutate(symbol = rep(ticker, length(date)))
return(df)
}
#Getting S&P500 data set
SNP_data <-
1:length(stock_list) %>% map(~get_data(ticker = stock_list[[.]], from = date)) %>% bind_rows()
save(SNP_data, file = "data/SNP_data.Rda")
#Calculating returns and sd
#and imputing missing returns using Nico's impute_missing_returns function
SNP_returns <- left_join(
SNP_data %>%
group_by(symbol) %>%
arrange(date) %>%
mutate(return = log(adjusted/dplyr::lag(adjusted))) %>%
dplyr::filter(date>first(date)) %>%
select(date, symbol, return) %>%
spread(symbol, return) %>%
impute_missing_returns(impute_returns_method = "Drawn_Distribution_Own") %>% #Imputing missing returns
gather(symbol, return, -date) %>%
group_by(symbol) %>%
mutate(sd = sd(return, na.rm = T)*sqrt(252)) %>% # calculate annualized SD
ungroup(),
SNP_data %>%
select(date, symbol, volume, adjusted),
by = c("date", "symbol")
)
save(SNP_returns, file = "data/SNP_returns.Rda")
This chunk calculates the Sharpe ratios for random equally weighted portfolios of 50 assets over random 252-day time slices (since 2000-01-01) of the SNP_returns data set. The portfolios are then labeled as follows:
- ‘stressed market’: A market is ‘stressed’ whenever the equi-weighted basket of stocks has a Sharpe ratio below -0.5 over the year of study (252 trading days).
- ‘rally market’: A market is ‘rallying’ whenever the equi-weighted basket of stocks has a Sharpe ratio above 2 over the year of study (252 trading days).
- ‘normal market’: A market is ‘normal’ whenever the equi-weighted basket of stocks has a Sharpe ratio between -0.5 and 2 over the year of study (252 trading days).
Note that this methodology is consistent with that used in https://marti.ai/qfin/2020/02/03/sp500-sharpe-vs-corrmats.html.
load("data/SNP_returns.Rda")
#Packages used in chunk
library(pacman)
p_load(tidyverse, furrr, PerformanceAnalytics, tbl2xts, rmsfuns, lubridate, fitHeavyTail)
#This function generates random portfolios, with the option to compute the portfolio's Sharpe ratio.
#sharp = TRUE takes substantially longer to compute.
gen_random_port <- function(dim = 100, sharp = TRUE) {
#list of Assets from which sample
symbols <- SNP_returns %>%
dplyr::filter(date==first(date)) %>%
pull(symbol)
dim <- dim
sample_symbols <- sample(symbols, dim)
#Dates from which to sample
dates <- SNP_returns %>%
dplyr::filter(symbol == "A") %>%
select(date)
indx <- sample.int(nrow(dates) - 252, 1)
start_date <- dates[indx,]
end_date <- dates[indx + 252,]
sample_dates <- dates %>% dplyr::filter(date >= start_date[[1]] &
date <= end_date[[1]]) %>% pull()
if(sharp == FALSE){
training_data <- list(dates = list(sample_dates),
symbols = list(sample_symbols)) %>% as_tibble()
}else
if(sharp == TRUE){
#setting rebalance months
RebMonths <- c(1,4,7,10)
EQweights <-
SNP_returns %>%
dplyr::filter(symbol %in% sample_symbols &
date %in% sample_dates) %>%
select(date, symbol, return) %>%
mutate(Months = as.numeric(format(date, format = "%m")),
YearMonths = as.numeric(format(date, format = "%Y%m"))) %>%
dplyr::filter(Months %in% RebMonths) %>%
group_by(YearMonths, Months, symbol) %>%
dplyr::filter(date == last(date)) %>%
ungroup() %>%
group_by(date) %>%
mutate(weight = 1/n()) %>%
select(date, symbol, weight) %>%
spread(symbol, weight) %>% ungroup() %>%
tbl_xts()
# Return wide data
Returns <-
SNP_returns %>%
dplyr::filter(symbol %in% sample_symbols &
date %in% sample_dates) %>%
select(date, symbol, return) %>%
spread(symbol, return) %>% tbl_xts()
#calculating portfolio returns
EW_RetPort <- rmsfuns::Safe_Return.portfolio(Returns,
weights = EQweights, lag_weights = TRUE,
verbose = TRUE, contribution = TRUE,
value = 100, geometric = TRUE)
Sharp <- SharpeRatio(EW_RetPort$returns, FUN="StdDev", annualize = TRUE)
training_data <- list(dates = list(sample_dates),
symbols = list(sample_symbols),
sharp = Sharp[1]) %>% as_tibble()
}
return(training_data)
}
#------------------------------------------
#First I generate the labeled training data
#------------------------------------------
#Generating N random portfolios, with Sharpe ratios.
# The furrr package can be used to speed up computation
set.seed(5245214)
training_data_sharp <-
1:500 %>% map_dfr(~gen_random_port(dim = 50))
save(training_data_sharp, file = "data/training_data_sharp.Rda")
load("data/training_data_sharp.Rda")
# Separating by Sharpe ratio into market types.
stressed_market <-
training_data_sharp %>%
dplyr::filter(sharp < -0.5)
rally_market <-
training_data_sharp %>%
dplyr::filter(sharp > 2)
normal_market <-
training_data_sharp %>%
dplyr::filter(sharp >= -0.5 & sharp <= 2)
#This function gathers the SNP data corresponding to the portfolios described above
get_market_data <- function(index_df, i){
SNP_returns %>%
select(date, symbol, return) %>%
dplyr::filter(symbol %in% index_df$symbols[[i]] &
date %in% index_df$dates[[i]])
}
#Separating portfolios by Sharpe ratio
stressed_market_data <-
1:nrow(stressed_market) %>%
map(~get_market_data(stressed_market, .x))
rally_market_data <-
1:nrow(rally_market) %>%
map(~get_market_data(rally_market, .x))
normal_market_data <-
1:nrow(normal_market) %>%
map(~get_market_data(normal_market, .x))
#Calculating Covariance matrix using the fitHeavyTail method
calc_cov <- function(df, i){
df[[i]] %>% select(date, symbol, return) %>%
spread(symbol, return) %>% select(-date) %>%
fitHeavyTail::fit_mvt() %>% .$cov
}
stressed_market_cov <-
1:length(stressed_market_data) %>%
map(~calc_cov(stressed_market_data, .x))
rally_market_cov <-
1:length(rally_market_data) %>%
map(~calc_cov(rally_market_data, .x))
normal_market_cov <-
1:length(normal_market_data) %>%
map(~calc_cov(normal_market_data, .x))
# Creating and saving a data frame of covariance matrices.
cov_mats <- list(cov_normal = normal_market_cov,
cov_stressed = stressed_market_cov,
cov_rally = rally_market_cov)
save(cov_mats, file = "data/cov_mats.Rda")
# Creating and saving a data frame of correlation matrices.
cor_mats <-
list(
cor_normal = 1:length(normal_market_cov) %>%
map(~cov2cor(normal_market_cov[[.x]])),
cor_stressed = 1:length(stressed_market_cov) %>%
map(~cov2cor(stressed_market_cov[[.x]])),
cor_rally = 1:length(rally_market_cov) %>%
map(~cov2cor(rally_market_cov[[.x]]))
)
save(cor_mats, file = "data/cor_mats.Rda")
#------------------------------------------
#Now to generate unlabelled training data
#Hence why sharp = FALSE in gen_random_port.
# This section has been scrapped for now;
# corrgan may only be implemented at a later stage
#------------------------------------------
training_data_indx <-
1:500 %>% map_dfr(~gen_random_port(dim = 50, sharp = FALSE))
market_data <- 1:nrow(training_data_indx) %>% map(~get_market_data(training_data_indx, .x))
training_data <- 1:length(market_data) %>% map(~calc_cov(market_data, .x))
save(training_data, file = "data/training_data.Rda")
Note that the sampled covariance matrices are converted into correlation matrices before being plotted.
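The plotting chunk itself is not shown in this README; a minimal sketch, assuming the cov_mats list saved in the chunk above, could look as follows.
# Sketch: convert one sampled covariance matrix from each market type into a
# correlation matrix and plot it.
p_load(ggcorrplot)
load("data/cov_mats.Rda")
cov_mats$cov_stressed[[1]] %>% cov2cor() %>%
    ggcorrplot::ggcorrplot(hc.order = TRUE, title = "Stressed Market")
cov_mats$cov_rally[[1]] %>% cov2cor() %>%
    ggcorrplot::ggcorrplot(hc.order = TRUE, title = "Rally Market")
cov_mats$cov_normal[[1]] %>% cov2cor() %>%
    ggcorrplot::ggcorrplot(hc.order = TRUE, title = "Normal Market")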
Step 1: Draw a series of random uniformly distributed numbers across a set of variables with a specified dependence structure.
This section explores the feasibility of using various types of copulas as part of the Monte Carlo framework.
The Gaussian and t copulas from the elliptical family allow us to specify their correlation matrix before randomly selecting observations from their multivariate distribution. Doing so allows one to produce random draws of uniformly distributed variables that adhere to the user-specified correlation structure. The chunk of code below demonstrates this functionality.
#loading copula package; already loaded
# pacman::p_load(copula)
#generating toy corr matrix
corr <- gen_corr(D = 50, clusters = "overlapping", num_clusters = c(10, 5, 2))
#generating normal and student t copula objects
Ncop <- ellipCopula(family = "normal", dispstr = "un", param = P2p(corr), dim = 50)
Tcop <- ellipCopula(family = "t", dispstr = "un", param = P2p(corr), dim = 50)
#generating 252 random draws for each of the dim = 50 variables.
set.seed(123)
rn <- rCopula(copula = Ncop, n = 252)
rt <- rCopula(copula = Tcop, n = 252)
#Checking if the correlation structure was maintained
p_load(patchwork)
# Original corr
p1 <- ggcorrplot::ggcorrplot(corr, hc.order = TRUE) +
labs(title = "Input Correlation Matrix") +
scale_x_discrete(labels = NULL) + scale_y_discrete(labels = NULL) +
theme(legend.position = "bottom")
# corr from random draws form norm and t copula
p2 <- fitHeavyTail::fit_mvt(rn) %>% .$cov %>% cov2cor() %>%
ggcorrplot::ggcorrplot(hc.order = TRUE) +
labs(subtitle = "Normal Copula") +
scale_x_discrete(labels = NULL) + scale_y_discrete(labels = NULL) +
theme(legend.position = "none")
p3 <- fitHeavyTail::fit_mvt(rt) %>% .$cov %>% cov2cor() %>%
ggcorrplot::ggcorrplot(hc.order = TRUE) +
labs(subtitle = "T-Copula") +
scale_x_discrete(labels = NULL) + scale_y_discrete(labels = NULL) +
theme(legend.position = "none")
#Printing summary output
#Note that all the variables appear uniformly distributed
p_load(printr)
summary(rn)
(The printed summary table, omitted here for readability, shows that each of the 50 columns has a minimum near 0, quartiles near 0.25 and 0.75, a median and mean near 0.5, and a maximum near 1, i.e. approximately uniform (0,1) marginals.)
p1
#Notice that the underlying correlation structure has, for the most part, been maintained.
# Though, some noise has been introduced
(p2+p3) + plot_annotation(title = "Output Correlation Matrices") +
plot_layout(guides='collect') &
theme(legend.position='bottom')
This work searched for a method to produce simulated markets with high left-tail dependence. This would in theory enable the simulation of markets in which the correlation between assets increases when overall asset returns are low (i.e. during crisis periods).
Unfortunately, elliptical copulas cannot be calibrated to exhibit dynamic correlations (i.e. left-tail dependence). Therefore, in this section we examine some properties of Archimedean copulas, which do exhibit tail dependence. However, Archimedean copulas do not allow us to stipulate their correlation matrix.
Archimedean copulas such as the Clayton, Frank, Gumbel and Joe exhibit higher levels of dependence at the tails of the multivariate distribution. In this section we will examine the Clayton copula due to its enhanced left-tail dependence. Other copulas have been examined, but the Clayton copula is currently the only Archimedean copula in the copula package that allows random sampling from multivariate distributions with large dimensions. Additionally, the other copulas have to be rotated 180 degrees to exhibit left-tail dependence (since they naturally possess right-tail dependence). This rotation step can be quite time consuming and often fails in higher dimensions. The choice to focus on the Clayton copula was therefore largely due to practical concerns raised when tinkering with alternative copulas. In future work it would be beneficial to explore alternative high dimensional left-tail copulas.
# first look at dim = 2 to get an understanding of what parameter tuning does
#Clayton Copula
claycop <- archmCopula(family = "clayton", param = 2, dim = 2)
claycop2 <- archmCopula(family = "clayton", param = 4, dim = 2)
claycop3 <- archmCopula(family = "clayton", param = 6, dim = 2)
#Normal Copula
Ncop <- ellipCopula(family = "normal", dispstr = "un", param = 0.5, dim = 2)
# rCopula(251, claycop)
#note how left tail dependence increases with the parameter along with the overall correlation
#param = 2
persp(claycop, main = "Clayton Copula - param = 2", dCopula, zlim = c(0, 15), theta = 18)
#param = 4
persp(claycop2, main = "Clayton Copula - param = 4", dCopula, zlim = c(0, 15), theta = 18)
## Warning in persp.default(x = x., y = y., z = z.mat, zlim = zlim, xlab = xlab, :
## surface extends beyond the box
#param = 6
persp(claycop3, main = "Clayton Copula - param = 6", dCopula, zlim = c(0, 15), theta = 18)
## Warning in persp.default(x = x., y = y., z = z.mat, zlim = zlim, xlab = xlab, :
## surface extends beyond the box
#Normal Copula
persp(Ncop, main = "Normal Copula - cor = 0.5", dCopula, zlim = c(0, 15), theta = 18)
# Comparing Clayton with Normal copula
bind_rows(rCopula(5000, copula = claycop) %>% as_tibble() %>% mutate(copula = "claycop"),
rCopula(5000, copula = Ncop) %>% as_tibble() %>% mutate(copula = "normal")) %>%
ggplot(aes(x=V1,y=V2)) +
geom_point(alpha=0.5) +
geom_density_2d_filled(alpha=0.7) +
facet_wrap(~copula, nrow = 1) +
labs(title = "2D Kernel Density - Clayton vs Normal Copula",
caption = "Clayton parameter = 2, Normal correlation = 0.5") +
theme_bw() +
theme(legend.position = "bottom")
## Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
## Using compatibility `.name_repair`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Out of interest, some code written when experimenting with alternative Archimedean copulas, deemed unusable, is supplied below:
#Note that the Gumbel and Joe copulas must be rotated 180 degrees to exhibit greater left tail dependence
#-----------------------------------------------
#Gumbel Copula
#This copula function breaks when using dim >= 7
#-----------------------------------------------
gumcop <- archmCopula(family = "gumbel", param = 2, dim = 2) %>% rotCopula()
persp(gumcop, dCopula, zlim = c(0, 10))
rCopula(1000, copula = gumcop) %>% plot(main = "Gumbel Copula")
#-----------------------------------------------------------------
# Joe copula
#This copula's computation time is unacceptably long with dim = 10
#and doesn't work when dim = 50
#----------------------------------------------------------------
joecop <- archmCopula(family = "joe", param = 3, dim = 50) %>% rotCopula()
persp(joecop, dCopula, zlim = c(0, 10))
rCopula(1000, copula = joecop) %>% plot()
#--------------------------------
#Galambos
#This function breaks when dim>2
#--------------------------------
galcop <- evCopula(family = "galambos", param = 2, dim = 2) %>% rotCopula()
persp(galcop, dCopula, zlim = c(0, 10))
rCopula(1000, galcop) %>% plot()
Since this project wanted to combine the left-tail dependence of the Clayton copula with the ability to specify the correlation matrix of the elliptical copulas, the creation of hybrid copulas was investigated.
@ruenzi states that flexible copula structures can be created through convex combinations of basic copulas. The authors use the 2D case and Tawn’s theorem (below) to construct hybrid bivariate copulas.
Tawn (1988) “shows that a copula is a convex set and every convex combination of existing copula functions is again a copula” [@ruenzi2011, p.8].
Thus, if $C_1$ and $C_2$ are 2D copulas and $w$ is a weighting variable between 0 and 1, then

$$C_{hybrid}(u_1, u_2) = w\,C_1(u_1, u_2) + (1 - w)\,C_2(u_1, u_2)$$

is a unique copula. Therefore, a hybrid copula can be created by linearly weighting an elliptical and an Archimedean copula of dimension 2.
This work attempted to generalize this finding to higher dimensional copulas. First let’s look at some 2D examples. We will investigate whether this works in higher dimensions in a later section.
This chunk of code creates and plots some pseudo hybrid copulas by linearly weighting the random numbers generated from a student-t and Clayton copula. Note how when increasing the Clayton weight, observations concentrate at the bottom left of the density plots.
Remember that each variable is still currently uniformly distributed.
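That chunk is not reproduced here; below is a minimal sketch of the idea, using illustrative Clayton weights of 0, 0.5 and 1.
# Sketch: pseudo hybrid copulas built by linearly weighting random draws from
# a bivariate t copula and a Clayton copula. Weights are illustrative.
tcop2d <- ellipCopula(family = "t", param = 0.5, df = 4, dim = 2)
claycop2d <- archmCopula(family = "clayton", param = 2, dim = 2)
set.seed(1234)
hybrid_draws <-
    c(0, 0.5, 1) %>%
    map_dfr(function(w) {
        as.data.frame(w * rCopula(5000, claycop2d) +
                          (1 - w) * rCopula(5000, tcop2d)) %>%
            as_tibble() %>%
            mutate(clayton_weight = w)
    })
# As the Clayton weight rises, mass concentrates in the lower-left corner.
hybrid_draws %>%
    ggplot(aes(x = V1, y = V2)) +
    geom_density_2d_filled(alpha = 0.7) +
    facet_wrap(~clayton_weight, nrow = 1, labeller = label_both) +
    theme_bw() +
    theme(legend.position = "bottom")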
This section has demonstrated how to use copulas to simulate random uniformly distributed data, some of which adheres to a given correlation matrix. It also found a potential way to simulate data with hybrid copulas.
Step 2: Converting the uniformly distributed variables to something that better resembles the distribution of asset returns.
This section looks at some candidate pdfs for the univariate asset returns. For convenience, it has become standard to use the normal or student-t distribution when simulating asset returns. However, after reading up on numerous possible marginal distributions, I decided that the skewed generalized t (SGT) distribution is a valuable alternative as it allows for the most flexibility. In fact, the SGT distribution nests 12 common probability distribution functions (pdfs). The tree diagram below indicates how one can set the SGT parameters to achieve the desired pdf. MCmarket’s functions will include arguments to induce the marginals to be uniform, normal, student-t or SGT distributed.
The code below demonstrates how the lambda, p and q parameters influence the SGT distribution. Note how the SGT’s parameters allow one to affect the skewness and kurtosis of the distribution.
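A minimal sketch of that demonstration is given below, plotting the SGT density for a few illustrative lambda, p and q combinations with sgt::dsgt.
# Sketch: how lambda (skewness) and p, q (tail thickness) shape the SGT density.
# Parameter choices are illustrative.
p_load(sgt)
x <- seq(-4, 4, length.out = 500)
sgt_demo <- bind_rows(
    tibble(x = x, density = dsgt(x, mu = 0, sigma = 1, lambda = 0, p = 2, q = 1000),
           pars = "lambda = 0, p = 2, q = 1000 (approx. normal)"),
    tibble(x = x, density = dsgt(x, mu = 0, sigma = 1, lambda = 0, p = 2, q = 2.5),
           pars = "lambda = 0, p = 2, q = 2.5 (fat tails)"),
    tibble(x = x, density = dsgt(x, mu = 0, sigma = 1, lambda = -0.4, p = 2, q = 2.5),
           pars = "lambda = -0.4, p = 2, q = 2.5 (left skew, fat tails)")
)
sgt_demo %>%
    ggplot(aes(x = x, y = density, color = pars)) +
    geom_line() +
    theme_bw() +
    theme(legend.position = "bottom")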
Here I demonstrate how uniformly distributed observations can be transformed into other distributions.
# Generating a uniform (0,1) series
set.seed(1234)
unif <- as_tibble(runif(1000))
unif %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# transforming into normal(0,1) distribution
norm <- qnorm(unif$value) %>% as_tibble()
norm %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# transforming into t distribution with df=3
t <- qt(unif$value, df = 3) %>% as_tibble()
t %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# transforming into SGT: left skewed with excess kurtosis
sgt <-
qsgt(
unif$value,
mu = 1,
sigma = 1,
lambda = -0.4,
p = 1,
q = 10000
) %>% as_tibble()
sgt %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The simulated innovations do not yet demonstrate the mean and/or volatility persistence observed in real asset return series. Therefore, they can be thought of as the innovations of some time-series model/process. In this step I introduce mean and variance persistence using an ARMA(p,q) + APARCH(p,q) model.
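With $z_t$ denoting the innovations produced in step 2, the ARMA(1,1) + APARCH(1,1) recursion implemented in the sim_garch function below can be written as

$$y_t = \mu + \phi\, y_{t-1} + \theta\, \varepsilon_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t,$$

$$\sigma_t^{\delta} = \omega + \alpha \big(|\varepsilon_{t-1}| - \gamma\, \varepsilon_{t-1}\big)^{\delta} + \beta\, \sigma_{t-1}^{\delta},$$

where $\phi$ and $\theta$ correspond to the ar and ma arguments, $\gamma$ is the leverage parameter and $\delta$ the power parameter (setting $\gamma = 0$ and $\delta = 2$ recovers the standard GARCH(1,1)).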
- “The leptokurtosis, clustering volatility and leverage effects characteristics of financial time series justifies the GARCH modelling approach. The non-linear characteristic of the time series is used to check the Brownian motion and investigate into the temporal evolutionary patterns. The non-linear methods of forecasting and signal analysis are gaining popularity in stock market because of their robustness in feature extraction and classification.” source: https://towardsdatascience.com/garch-processes-monte-carlo-simulations-for-analytical-forecast-27edf77b2787
This function introduces mean and variance persistence to a series of i.i.d. random numbers. Note that most of the code was re-purposed from fGarch's garchSpec and garchSim functions.
#' @title sim_garch
#' @description This function takes a vector of random numbers, referred to as innovations, and
#' induces mean and variance persistence by inserting them into an ARMA(1,1) + APARCH(1,1) model.
#' @param innovations a vector containing the random numbers or the innovations of the
#' ARIMA + GARCH process.
#' @param omega a positive value defining the coefficient of the variance equation. The default is 5e-04.
#' @param gamma a value defining the APARCH leverage parameter in the variance equation. The default
#' of 0 implies no leverage effect and therefore, corresponds to the standard GARCH model.
#' @param alpha a value defining the autoregressive variance coefficient. The default is 0.
#' @param beta a value defining the variance coefficient. The default is 0.
#' @param mu a value defining the mean. The default is 0.
#' @param ar a value defining the autoregressive ARMA coefficient. The default is 0.
#' @param ma a value defining the moving average ARMA coefficient. The default is 0.
#' @param delta a strictly positive value defining the delta parameter of the APARCH model. The default is 2,
#' which corresponds to the standard GARCH model.
#' @param simple a logical parameter indicating if the output should be a simple vector containing just the
#' resulting ARIMA + GARCH series, or if FALSE a three column dataframe containing z - the innovations, h - the
#' conditional variance and y - ARMA + APARCH series.
#'
#' @return if simple = TRUE a vector of the resulting ARMA + APARCH series.
#'
#' If simple = FALSE a three column dataframe containing z - the innovations, h - the conditional variance and y - ARMA +
#' APARCH series. Note the length of the resulting series will be one observation less than that of the innovations.
#' The ARMA(1,1) + APARCH(1,1) model effectively consumes this lag when producing its first value.
#' @note (1) It is suggested that the randomly distributed numbers have a mean of zero and standard
#' deviation of one, as these attributes can be set within sim_garch.
#'
#' (2) For more information on the ARMA + APARCH parameters see:
#'
#' Ruppert, D. and Matteson, D.S., 2011. Statistics and data analysis for financial engineering (Vol. 13).
#' New York: Springer.
#'
#' @importFrom dplyr tibble
#'
#' @examples
#'
#' \dontrun{
#'
#' library(tidyverse)
#'
#' ### creating series of 501 innovations
#' inno <- rnorm(501)
#'
#' ### This produces a ARMA + APARCH series of length 500.
#' GARCH <- sim_garch(inno,
#' mu = 0.000002,
#' omega = 0.00005,
#' alpha = 0.098839,
#' beta = 0.899506,
#' ar = 0.063666,
#' ma = NULL,
#' gamma = 0.12194,
#' delta = 1.85,
#' simple = FALSE)
#'
#' ### Visualising the resulting series
#' GARCH %>% mutate(period = 1:n()) %>%
#' gather(key, value, -period) %>%
#' ggplot() +
#' geom_line(aes(x = period, y = value, color = key)) +
#' facet_wrap(~key, scales = "free_y", ncol = 1)
#'
#' }
#' @export
sim_garch <- function(innovations,
omega = NULL, alpha = NULL, gamma = NULL, beta = NULL,
mu = NULL, ar = NULL, ma = NULL, delta = NULL,
simple = TRUE) {
# Creating model list with defaults
model <- list(omega = ifelse(is.null(omega), 5e-04, omega),
alpha = ifelse(is.null(alpha), 0, alpha),
gamma = ifelse(is.null(gamma), 0, gamma),
beta = ifelse(is.null(beta), 0, beta),
mu = ifelse(is.null(mu), 0, mu), #changed form NULL to 0
ar = ifelse(is.null(ar), 0, ar),
ma = ifelse(is.null(ma), 0, ma),
delta = ifelse(is.null(delta), 2, delta))
#obtaining parameters and lag orders from model object
mu <- model$mu
ar <- model$ar
ma <- model$ma
omega <- model$omega
alpha <- model$alpha
gamma <- model$gamma
beta <- model$beta
delta <- model$delta
deltainv <- 1/delta
order.ar <- length(ar)
order.ma <- length(ma)
order.alpha <- length(alpha)
order.beta <- length(beta)
max.order <- max(order.ar, order.ma, order.alpha, order.beta)
n <- length(innovations) - 1
#Generating innovations
z_length <- n + max.order
z <- c(innovations)[(2-max.order):length(innovations)] #z is the vector of random innovation
h <- c(rep(model$omega/(1 - sum(model$alpha) - sum(model$beta)),
times = max.order), rep(NA, n)) #h is the conditional standard deviation
y <- c(rep(model$mu/(1 - sum(model$ar)), times = max.order), rep(NA, n)) #Observations from the resulting garch process
m <- max.order
#Inducing ARIMA + GARCH
eps <- h^deltainv * z #this part often breaks depending on GARCH parameters chosen (must be stationary)
for (i in (m + 1):(n + m)) {
h[i] = omega + sum(alpha * (abs(eps[i - (1:order.alpha)]) -
gamma * (eps[i - (1:order.alpha)]))^delta) +
sum(beta * h[i - (1:order.beta)])
eps[i] = h[i]^deltainv * z[i]
y[i] = mu + sum(ar * y[i - (1:order.ar)]) + sum(ma * eps[i - (1:order.ma)]) + eps[i]
}
if(simple == TRUE) {
data <- c(NA, y[(m + 1):(n + m)]) #removes burn in data and only provides ARIMA + GARCH series.
} else {
data <- tibble(z = c(NA, z[(m + 1):(n + m)]), # Innovations
h = c(NA, h[(m + 1):(n + m)]^deltainv), #Conditional Standard deviation
y = c(NA, y[(m + 1):(n + m)])) # ARIMA + GARCH series
}
return(data)
}
Note that:
- volatility clusters and significant autocorrelation appear in the series after it is processed through the sim_garch function.
- sim_garch is a deterministic function.
set.seed(32156454)
# Simulating a series of random SGT numbers
inno <- sgt::rsgt(n = 10001, lambda = -0.0143, p = 1.6650, q = 1.9095)
# Introducing Mean and Var Persistence
#Parameters from Statistics and Data Analysis for Financial Engineering pg.421-423
return <- sim_garch(inno,
omega = 0.000005, #key unconditional volatility parameter
alpha = 0.098839,
gamma = 0,
beta = 0.899506,
mu = 0.000002,
ar = 0.063666,
ma = NULL,
delta = 1.85,
simple = TRUE)
p_load(patchwork)
p1 <- inno %>% as_tibble() %>% ggplot(aes(x=1:length(inno), y=value)) +
geom_line() + theme_bw() + labs(subtitle = "Random Draws From SGT Distribution", x = "", y = "Innovations")
p2 <- return %>% as_tibble() %>% ggplot(aes(x=1:length(return), y=value)) +
geom_line() + theme_bw() + labs(subtitle = "Same Random Draws After sim_garch", x = "", y = "Returns")
p1/p2 + plot_annotation(title = "SGT Innovations vs APGARCH Returns")
## Warning: Removed 1 row(s) containing missing values (geom_path).
p1 <- ggAcf(inno^2)+ theme_bw()+ labs(title = "ACF of Squared Innovations")
p2 <- ggAcf(return^2) + theme_bw() + labs(title = "ACF of Squared Returns")
p1/p2
This section introduces the sim_market function, which is designed to carry out the first 3 steps of this Monte Carlo framework.
But first some toy examples that demonstrate the inner workings of sim_market are provided.
p_load(MCmarket)
# -------------------
# Step 1: Draw a series of random uniformly distributed numbers across a set of variables with a specified dependence structure.
# ------------------
D <- 10 # Dimension
k <- 1000 # number of periods
# Create correlation matrix.
corr <- gen_corr(D, clusters = "non-overlapping", num_clusters = 2)
# Specify a copula and simulate uniformly distributed numbers.
ncop <- ellipCopula(family = "normal", dispstr = "un", param = P2p(corr), dim = D)
set.seed(1234)
data <- rCopula(k, ncop)
# Creating a date vector
dates <- rmsfuns::dateconverter(StartDate = Sys.Date(),
EndDate = Sys.Date() %m+% lubridate::days(k-1),
Transform = "alldays")
# Making Tidy & adding date column
data <- as_tibble(data) %>%
purrr::set_names(glue::glue("Asset_{1:ncol(data)}")%>% as.character()) %>%
mutate(date = dates) %>%
gather(Asset, Value, -date)
# Visualization
data %>% ggplot() +
geom_histogram(aes(x = Value, color = "Asset"), bins = 15) +
facet_wrap(~Asset, nrow = 2) +
theme(legend.position = "none")
# ----------------
# Step 2: Converting the uniformly distributed variables to something that better resembles the distribution of asset returns.
# ---------------
# First set the mean and sd for each asset
args <- tibble(Asset = glue::glue("Asset_{1:D}") %>% as.character()) %>%
mutate(mean = 0, # Setting mean
sd = 1) # and sd; can also be a vector of length = n.o assets
# Then convert Uniform marginal distributions to norm
data <- data %>% left_join(., args, by = "Asset") %>%
group_by(Asset) %>% arrange(date) %>%
mutate(Return = qnorm(Value, mean, sd)) %>%
select(date, Asset, Return)
# Visualizing the transformations
data %>% group_by(Asset) %>% mutate(mean = mean(Return),
sd = sd(Return))
## # A tibble: 10,000 x 5
## # Groups: Asset [10]
## date Asset Return mean sd
## <date> <chr> <dbl> <dbl> <dbl>
## 1 2021-02-03 Asset_1 -1.19 0.0425 1.02
## 2 2021-02-03 Asset_2 -0.251 0.00429 1.02
## 3 2021-02-03 Asset_3 0.259 0.0168 0.999
## 4 2021-02-03 Asset_4 -1.91 0.00639 1.02
## 5 2021-02-03 Asset_5 -0.156 0.0267 1.00
## 6 2021-02-03 Asset_6 -0.181 -0.00721 0.972
## 7 2021-02-03 Asset_7 -0.865 0.0258 0.963
## 8 2021-02-03 Asset_8 -0.847 -0.0127 0.963
## 9 2021-02-03 Asset_9 -0.859 0.0172 0.928
## 10 2021-02-03 Asset_10 -1.06 -0.00700 0.972
## # ... with 9,990 more rows
data %>% ggplot() +
geom_histogram(aes(x = Return, color = "Asset"), bins = 15) +
facet_wrap(~Asset, nrow = 2, scales = "free_x") +
theme(legend.position = "none")
# For later comparison with step 3
data %>% ggplot() +
geom_line(aes(x=date, y=Return, color = Asset)) +
facet_wrap(~Asset, scales = "free_y", ncol = 2) +
theme(legend.position = "none")
# ---------------
# Step 3: Introducing Volatility Persistence
# ---------------
# Tibble with GARCH parameters and defaults
ts_args <- tibble(Asset = glue::glue("Asset_{1:D}") %>% as.character()) %>%
mutate(omega = 1,
alpha = seq(0, 0.999, length.out = 10),
gamma =0,
beta = 0,
mu = rnorm(10, 0.02, 0.01), #changed from NULL to 0
ar = 0,
ma = 0,
delta = 1.98)
# Applying time series parameters to step 2 data
data <- data %>% left_join(., ts_args, by = "Asset") %>%
arrange(date) %>% group_by(Asset) %>%
mutate(Return = sim_garch(innovations = Return,
omega = omega,
alpha = alpha,
gamma = gamma,
beta = beta,
mu = mu,
ar = ar,
ma = ma,
delta = delta)) %>% na.omit() %>%
select(date, Asset, Return)
# Visualising returns ts
data %>% ggplot() +
geom_line(aes(x=date, y=Return, color = Asset)) +
facet_wrap(~Asset, scales = "free_y", ncol = 2) +
theme(legend.position = "none")
# ------------------------------------
# Steps 1 and 2 with hybrid copula
# ------------------------------------
# Creating copula objects
ncop <- ellipCopula(family = "normal", dispstr = "un", param = P2p(corr), dim = D)
acop <- archmCopula(family = "clayton", param = 4, dim = D)
clayton_weight <- 0.5
data <- (clayton_weight*rCopula(copula = acop, n = 1000)) + (1-clayton_weight)*rCopula(copula = ncop, n = 1000)
set.seed(1234)
data <- as_tibble(data) %>%
purrr::set_names(glue::glue("Asset_{1:ncol(data)}")%>% as.character()) %>%
mutate(date = dates) %>%
gather(Asset, Value, -date)
args <- tibble(Asset = glue::glue("Asset_{1:D}") %>% as.character()) %>%
mutate(mean = 0, # Setting mean
sd = 1) # and sd; can also be a vector of length = n.o assets
# Then convert Uniform marginal distributions to norm
data <- data %>% left_join(., args, by = "Asset") %>%
group_by(Asset) %>% arrange(date) %>%
mutate(Return = qnorm(Value, mean, sd)) %>%
select(date, Asset, Return)
# SD can't be set as intended. Not sure what is going on (more on this later)
data %>% group_by(Asset) %>% mutate(mean = mean(Return),
sd = sd(Return))
## # A tibble: 10,000 x 5
## # Groups: Asset [10]
## date Asset Return mean sd
## <date> <chr> <dbl> <dbl> <dbl>
## 1 2021-02-03 Asset_1 0.0251 -0.0223 0.585
## 2 2021-02-03 Asset_2 0.251 0.0142 0.587
## 3 2021-02-03 Asset_3 0.422 -0.00467 0.603
## 4 2021-02-03 Asset_4 0.279 -0.00350 0.609
## 5 2021-02-03 Asset_5 0.528 -0.00340 0.600
## 6 2021-02-03 Asset_6 -0.560 0.00799 0.602
## 7 2021-02-03 Asset_7 -0.860 0.0220 0.603
## 8 2021-02-03 Asset_8 -0.733 -0.00430 0.586
## 9 2021-02-03 Asset_9 -0.728 0.00940 0.587
## 10 2021-02-03 Asset_10 -0.829 -0.00975 0.606
## # ... with 9,990 more rows
This is the main function in the MCmarket package. It performs the first three steps of the Monte Carlo process. See the Roxygen documentation below for further detail.
For the purpose of this README, this version of sim_market is slightly different to that in the final version of MCmarket. This version allows one to create hybrid multivariate copulas; this functionality is tested in a later section.
#' @title sim_market
#' @description This function produces a series of returns for an asset market with a
#' given correlation matrix. The user can choose between the multivariate t and normal
#' distributions and adjust the markets left tail dependency by weighting in the Clayton copula.
#' The univariate asset return distributions can also be set to normal, student-t or sgt
#' distributions. Finally, mean and variance persistence can be induced via the parameters of an
#' ARMA + APARCH model.
#' @note It is suggested that, if the ts_model is used, then the marginal distributions be left
#' as list(mu = 0, sd = 1). These attributes are better off being set in the ts_model argument.
#' @param corr a correlation matrix specifying the correlation structure of the simulated data.
#' Note that the number of variables simulated is equal to the number of columns in the correlation matrix.
#' @param k a positive integer indicating the number of time periods to simulate.
#' @param mv_dist a string specifying the multivariate distribution. Can be one of c("norm", "t")
#' which correspond to the multivariate normal and t distributions respectively.
#' @param mv_df degrees of freedom of the multivariate distribution, required when mv_dist = "t". Default is 3.
#' @param left_cop_weight a positive value between zero and one stipulating the weight applied to
#' the Clayton copula when simulating the multivariate distribution. Note that a value between zero
#' and one essentially generates a hybrid distribution between the chosen mv_dist and the Clayton
#' copula. Therefore, the greater the left_cop_weight the less the data will reflect the correlation
#' structure. Default is set to 0.
#' @param left_cop_param a positive value indicating the parameter of the Clayton copula. Default is 4.
#' @param marginal_dist a string variable specifying the asset returns univariate distribution. Can
#' be one of c("norm", "t", "sgt") referring to the normal, student-t and skewed-generalized-t
#' distributions respectively. Default is "norm".
#' @param marginal_dist_model list containing the relevant parameters for the chosen marginal_dist.
#'
#' marginal_dist = "norm" accepts the mean (mu) and standard deviation (sd) arguments with their respective
#' defaults set to list(mu = 0, sd = 1).
#'
#' marginal_dist = "t" accepts the non-centrality parameter (ncp) and degrees of freedom (df) arguments,
#' default values are list(ncp = 0, df = 5).
#'
#' marginal_dist = "sgt" accepts the mean (mu), standard deviation (sd), lambda, p and q parameters
#' list(mu = 0, sigma = 1, lambda, p, q). Note lambda, p and q have no defaults and must therefore be set
#' by the user. For more information see the documentation on the qsgt function from the sgt package.
#' @param ts_model a list containing various ARMA + APGARCH model parameters allowing one to specify
#' the time series properties of the simulated returns. Note that parameter combinations resulting
#' in non-stationary of the mean or variance will produce NAN's and that the maximum lag allowed for
#' any given parameter is 1.
#'
#' The default is ts_model = NULL, in which case time series properties are not induced; however, if
#' ts_model = list() then the default values are list(omega = 5e-04, alpha = 0, gamma = NULL, beta = 0, mu = 0,
#' ar = NULL, ma = NULL, delta = 2). In order to set different parameters for each asset simply insert a vector
#' of length equal to the number of assets, the first element of the vector will correspond to Asset_1, the 2nd
#' to Asset_2, etc.
#'
#' See the sim_garch function's documentation and the "model" parameter under fGarch::garchSpec() for more details
#' regarding the parameters themselves.
#'
#' @return a tidy tibble containing a date, Asset and Return column.
#'
#' @importFrom copula P2p ellipCopula archmCopula rCopula
#' @importFrom glue glue
#' @importFrom sgt qsgt
#' @importFrom lubridate '%m+%' days
#' @importFrom tidyr gather
#' @import dplyr
#' @import purrr
#'
#' @examples
#'
#' \dontrun{
#'
#' library(tidyverse)
#'
#' ### creating a correlation matrix of 20 assets to use as an input in sim_market.
#' corr <- gen_corr(D = 20, clusters = "none")
#'
#' ### simulating 500 periods of returns across 20 assets.
#' set.seed(12345)
#' market_data <-
#' sim_market(corr,
#' k = 500,
#' mv_dist = "norm",
#' left_cop_weight = 0.1,
#' marginal_dist = "norm",
#' ts_model = list(mu = 0.000002,
#' omega = 0.00005,
#' alpha = 0.098839,
#' beta = 0.899506,
#' ar = 0.063666,
#' ma = NULL,
#' gamma = 0.12194,
#' delta = 1.85))
#'
#' ### Visualising the market
#' market_data %>%
#' group_by(Asset) %>%
#' mutate(cum_ret = 100*cumprod(1 + Return)) %>%
#' ggplot() +
#' geom_line(aes(x = date, y = cum_ret, color = Asset)) +
#' facet_wrap(~Asset) +
#' theme(legend.position = "none")
#'
#' }
#' @export
sim_market <- function(corr,
k = 252,
mv_dist = "t",
mv_df = 4,
left_cop_weight = 0,
left_cop_param = 5,
marginal_dist = "norm",
marginal_dist_model = NULL,
ts_model = NULL) {
N <- nrow(corr)
k <- k + 1 # extra period because sim_garch consumes one observation as a lag.
Cor <- P2p(corr)
# Specifying Copulas
# elliptical
if(!(mv_dist %in% c("norm", "t"))) stop("Please supply a valid argument for mv_dist")
else
if (mv_dist == "t") {
if (is.null(mv_df)) stop('Please supply a valid degrees of freedom parameter when using mv_dist = "t".')
Ecop <- ellipCopula(family = "t",
dispstr = "un",
df = mv_df,
param = Cor,
dim = N)
} else
if (mv_dist == "norm") {
Ecop <- ellipCopula(family = "normal",
dispstr = "un",
param = Cor,
dim = N)
}
# Left-cop (Archemedian copula)
if (left_cop_weight < 0|left_cop_weight > 1) stop("Please provide a valid left_cop_weight between 0 and 1")
if (left_cop_weight != 0) {
Acop <- archmCopula(family = "clayton",
param = left_cop_param,
dim = N)
}
# Generating random (uniformly distributed) draws from hybrid copula's
if (left_cop_weight == 0) {
data <- rCopula(k, Ecop)
} else
if(left_cop_weight == 1) {
data <- rCopula(k, Acop)
} else {
data <- left_cop_weight*rCopula(k, Acop) + (1-left_cop_weight)*rCopula(k, Ecop)
}
# Creating a date vector
start_date <- Sys.Date()
dates <- rmsfuns::dateconverter(StartDate = start_date,
EndDate = start_date %m+% lubridate::days(k-1),
Transform = "alldays")
# Making Tidy & adding date column
data <- as_tibble(data) %>%
purrr::set_names(glue::glue("Asset_{1:ncol(data)}")) %>%
mutate(date = dates) %>%
gather(Asset, Value, -date)
if (!(marginal_dist %in% c("norm", "t", "sgt", "unif"))) stop ("Please supply a valid marginal_dist argument")
if (marginal_dist == "unif") return(data)
# Warnings
if (marginal_dist == "norm" & is.null(marginal_dist_model)) marginal_dist_model <- list(mu=0, sd = 1)
if (marginal_dist == "t" & is.null(marginal_dist_model)) marginal_dist_model <- list(ncp = 0, df = 5)
if (marginal_dist == "sgt" & is.null(marginal_dist_model)) stop ('Please supply a valid marginal_dist_model when using marginal_dist="sgt".')
# Converting Uniform marginal distributions to norm, t or sgt.
args <- tibble(Asset = glue::glue("Asset_{1:N}")) %>%
mutate(mean = marginal_dist_model$mu,
sd = marginal_dist_model$sd,
ncp = marginal_dist_model$ncp,
df = marginal_dist_model$df,
lambda = marginal_dist_model$lambda,
p = marginal_dist_model$p,
q = marginal_dist_model$q)
if (marginal_dist == "norm") {
if(is.null(marginal_dist_model$mu)) stop('Please supply a valid mu parameter when using marginal_dist = "norm".')
if(is.null(marginal_dist_model$sd)) stop('Please supply a valid sd parameter when using marginal_dist = "norm".')
data <- data %>% left_join(., args, by = "Asset") %>%
group_by(Asset) %>% arrange(date) %>%
mutate(Return = qnorm(Value, mean, sd)) %>%
select(date, Asset, Return)
} else
if (marginal_dist == "t") {
if(is.null(marginal_dist_model$ncp)) stop('Please supply a valid ncp parameter when using marginal_dist = "t".')
if(is.null(marginal_dist_model$df)) stop('Please supply a valid df parameter when using marginal_dist = "t".')
data <- data %>% left_join(., args, by = "Asset") %>%
group_by(Asset) %>% arrange(date) %>%
mutate(Return = qt(Value, df = df, ncp = ncp)) %>%
select(date, Asset, Return)
} else
if (marginal_dist == "sgt") {
if (is.null(marginal_dist_model$mu)) marginal_dist_model$mu <- 0
if (is.null(marginal_dist_model$sd)) marginal_dist_model$sd <- 1
if (is.null(marginal_dist_model$lambda)|
is.null(marginal_dist_model$p)|
is.null(marginal_dist_model$q)) stop('Please supply valid arguments for lambda, p and q when using marginal_dist = "sgt".')
data <- data %>% left_join(., args, by = "Asset") %>%
group_by(Asset) %>% arrange(date) %>%
mutate(Return = qsgt(Value, mean, sd, lambda, p, q)) %>%
select(date, Asset, Return)
}
if (is.null(ts_model)) {
data <- data %>% dplyr::filter(date > first(date))
return(data)
} else {
# Warnings
if (!is.null(ts_model$omega) & length(ts_model$omega) != 1 & length(ts_model$omega) != N) stop("Please supply a valid vector length for omega. Must be of length 1 or ncol(corr).")
if (!is.null(ts_model$alpha) & length(ts_model$alpha) != 1 & length(ts_model$alpha) != N) stop("Please supply a valid vector length for alpha. Must be of length 1 or ncol(corr).")
if (!is.null(ts_model$gamma) & length(ts_model$gamma) != 1 & length(ts_model$gamma) != N) stop("Please supply a valid vector length for gamma. Must be of length 1 or ncol(corr).")
if (!is.null(ts_model$beta) & length(ts_model$beta) != 1 & length(ts_model$beta) != N) stop("Please supply a valid vector length for beta. Must be of length 1 or ncol(corr).")
if (!is.null(ts_model$mu) & length(ts_model$mu) != 1 & length(ts_model$mu) != N) stop("Please supply a valid vector length for mu. Must be of length 1 or ncol(corr).")
if (!is.null(ts_model$ar) & length(ts_model$ar) != 1 & length(ts_model$ar) != N) stop("Please supply a valid vector length for ar. Must be of length 1 or ncol(corr).")
if (!is.null(ts_model$ma) & length(ts_model$ma) != 1 & length(ts_model$ma) != N) stop("Please supply a valid vector length for ma. Must be of length 1 or ncol(corr).")
if (!is.null(ts_model$delta) & length(ts_model$delta) != 1 & length(ts_model$delta) != N) stop("Please supply a valid vector length for delta. Must be of length 1 or ncol(corr).")
# Inducing mean and/or variance persistence
# Tibble with with garh parameters and defaults
ts_args <- tibble(Asset = glue::glue("Asset_{1:N}")) %>%
mutate(omega = if (is.null(ts_model$omega)) {5e-04} else {ts_model$omega},
alpha = if (is.null(ts_model$alpha)) {0} else {ts_model$alpha},
gamma = if (is.null(ts_model$gamma)) {0} else {ts_model$gamma},
beta = if (is.null(ts_model$beta)) {0} else {ts_model$beta},
mu = if (is.null(ts_model$mu)) {0} else {ts_model$mu}, #changed form NULL to 0
ar = if (is.null(ts_model$ar)) {0} else {ts_model$ar},
ma = if (is.null(ts_model$ma)) {0} else {ts_model$ma},
delta = if (is.null(ts_model$delta)) {2} else {ts_model$delta})
# Inducing garch properties
data <- data %>% left_join(., ts_args, by = "Asset") %>%
arrange(date) %>% group_by(Asset) %>%
mutate(Return = sim_garch(innovations = Return,
omega = omega,
alpha = alpha,
gamma = gamma,
beta = beta,
mu = mu,
ar = ar,
ma = ma,
delta = delta)) %>% na.omit() %>%
select(date, Asset, Return)
return(data)
}
}
This code tests sim_market and demonstrates its functionality.
Corr <- gen_corr(D = 50, clusters = "overlapping", num_clusters = c(10,5,2))
Corr %>% ggcorrplot::ggcorrplot()
# ----------------------------------------
# Simulating data with marginal_dist="norm"
# ----------------------------------------
set.seed(872154)
data_norm <- sim_market(corr = Corr,
mv_dist = "t",
mv_df = 4,
marginal_dist = "norm",
marginal_dist_model = list(mu = 0, sd = 0.02311859),
k = 500)
# ----------------------------------------
# Simulating data with marginal_dist="sgt"
# ----------------------------------------
set.seed(872154)
sgt_pars <- list(mu = 0, sd = 1, lambda = -0.1, p = 1000, q = 3)
data_sgt <- sim_market(corr = Corr,
mv_dist = "t",
mv_df = 4,
marginal_dist = "sgt",
marginal_dist_model = sgt_pars,
k = 500)
The above seems to work well, but does it work with left_cop_weight != 0? That is, does the MC framework function as intended when using a hybrid copula?
set.seed(123)
data_nlc <- sim_market(corr = Corr,
mv_dist = "t",
mv_df = 4,
left_cop_param = 5,
left_cop_weight = 0, # Only t copula
marginal_dist = "norm",
marginal_dist_model = list(mu = 0, sd = 1),
k = 500)
set.seed(123)
data_lc <- sim_market(corr = Corr,
mv_dist = "t",
mv_df = 4,
left_cop_param = 5,
left_cop_weight = 0.5, # t - Clayton copula hybrid
marginal_dist = "norm",
marginal_dist_model = list(mu = 0, sd = 1),
k = 500)
set.seed(123)
data_lco <- sim_market(corr = Corr,
mv_dist = "t",
mv_df = 4,
left_cop_param = 5,
left_cop_weight = 1, # Only Clayton copula
marginal_dist = "norm",
marginal_dist_model = list(mu = 0, sd = 1),
k = 500)
# Checking the standard deviations: they are not being set correctly in the hybrid copula case
data_nlc %>% group_by(Asset) %>% mutate(sd = sd(Return)) # sd = 1
## # A tibble: 25,000 x 4
## # Groups: Asset [50]
## date Asset Return sd
## <date> <glue> <dbl> <dbl>
## 1 2021-02-04 Asset_1 0.405 0.980
## 2 2021-02-04 Asset_2 0.305 1.01
## 3 2021-02-04 Asset_3 0.299 1.03
## 4 2021-02-04 Asset_4 0.785 1.03
## 5 2021-02-04 Asset_5 0.233 1.00
## 6 2021-02-04 Asset_6 0.803 1.01
## 7 2021-02-04 Asset_7 -0.285 0.991
## 8 2021-02-04 Asset_8 0.489 0.999
## 9 2021-02-04 Asset_9 0.326 0.983
## 10 2021-02-04 Asset_10 0.359 0.990
## # ... with 24,990 more rows
data_lc %>% group_by(Asset) %>% mutate(sd = sd(Return)) # sd approx = 0.58
## # A tibble: 25,000 x 4
## # Groups: Asset [50]
## date Asset Return sd
## <date> <glue> <dbl> <dbl>
## 1 2021-02-04 Asset_1 0.349 0.573
## 2 2021-02-04 Asset_2 -0.140 0.571
## 3 2021-02-04 Asset_3 0.275 0.564
## 4 2021-02-04 Asset_4 0.182 0.579
## 5 2021-02-04 Asset_5 0.306 0.577
## 6 2021-02-04 Asset_6 0.0166 0.597
## 7 2021-02-04 Asset_7 0.0428 0.582
## 8 2021-02-04 Asset_8 0.0804 0.580
## 9 2021-02-04 Asset_9 0.289 0.574
## 10 2021-02-04 Asset_10 0.420 0.584
## # ... with 24,990 more rows
data_lco %>% group_by(Asset) %>% mutate(sd = sd(Return)) # sd approx = 1
## # A tibble: 25,000 x 4
## # Groups: Asset [50]
## date Asset Return sd
## <date> <glue> <dbl> <dbl>
## 1 2021-02-04 Asset_1 0.692 1.03
## 2 2021-02-04 Asset_2 0.169 1.00
## 3 2021-02-04 Asset_3 0.844 1.01
## 4 2021-02-04 Asset_4 0.544 1.04
## 5 2021-02-04 Asset_5 0.449 1.04
## 6 2021-02-04 Asset_6 0.487 1.04
## 7 2021-02-04 Asset_7 0.333 1.04
## 8 2021-02-04 Asset_8 0.520 1.05
## 9 2021-02-04 Asset_9 1.49 0.981
## 10 2021-02-04 Asset_10 0.750 1.02
## # ... with 24,990 more rows
The above code demonstrates that the standard deviation of asset returns is not being set correctly when simulating from hybrid copulas. This may be because Tawn's theorem only applies to bivariate copulas. This will be investigated in the future, but for now the hybrid copula functionality is left out of MCmarket's functions.
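To see why a weighted combination can distort the marginals, consider the minimal sketch below. It assumes, purely for illustration, that the hybrid copula is formed as a convex combination of the two sets of uniform draws with weight left_cop_weight (independent uniforms are used as stand-ins for the copula output). Averaging two uniform series yields a variable that is no longer uniform, so the subsequent quantile transform no longer produces unit-standard-deviation normals.
# Minimal sketch (assumption: hybrid draws are a weighted average of the two
# copulas' uniform draws; independent uniforms stand in for the copula output).
set.seed(1)
u_t <- runif(1e5)        # stand-in for t-copula uniforms
u_clayton <- runif(1e5)  # stand-in for Clayton-copula uniforms
w <- 0.5                 # left_cop_weight
u_mix <- w * u_clayton + (1 - w) * u_t  # weighted average: no longer uniform
sd(qnorm(u_t))    # approximately 1, as intended
sd(qnorm(u_mix))  # roughly 0.6, consistent with the shrunken sd's of about 0.58 above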
This step turns the first three steps discussed thus far into a full Monte Carlo simulation by simply repeating them N times. The mc_market function performs this Monte Carlo simulation of asset markets.
#' @title mc_market
#' @description This function performs a Monte Carlo simulation by calling the sim_market function N times.
#' It is intended for users who are not comfortable using the purrr::map functions.
#' @note (1) see ?sim_market for information on the other parameters.
#'
#' (2) See examples under sim_market for instructions on how to add an on-screen progress bar when performing
#' the Monte Carlo simulation; this is recommended for simulations with N > 1000, since they can take a number of
#' minutes to complete.
#' @param N a positive integer indicating the number of markets to simulate.
#'
#' @param list a logical value indicating whether the output should be a list of tibbles or a single long tibble (see return).
#' Due to memory issues associated with list = FALSE, list = TRUE is recommended for N > 500. list = FALSE is
#' best used for tidy output that can easily be plotted with ggplot2 (see example).
#'
#' @return if list = TRUE (default), a list of length N where each entry contains a tidy tibble with a date,
#' Asset and Return column. Else if list = FALSE a single tidy tibble with date, Universe, Asset and Return columns.
#'
#'
#' @importFrom tidyr gather
#' @importFrom dplyr progress_estimated
#' @import dplyr
#' @import purrr
#'
#' @examples
#'
#' \dontrun{
#'
#' library(tidyverse)
#'
#' ### creating a correlation matrix to use as input in sim_market
#' corr <- gen_corr(D = 20, clusters = "none")
#'
#' ### simulating 550 periods of returns across 20 assets
#' set.seed(12542)
#' market_data <- sim_market(corr,
#' k = 550,
#' mv_dist = "norm",
#' marginal_dist = "norm",
#' ts_model = list(mu = 0.000002,
#' omega = 0.00005,
#' alpha = 0.098839,
#' beta = 0.899506,
#' ar = 0.063666,
#' ma = NULL,
#' gamma = 0.12194,
#' delta = 1.85))
#'
#' ### Visualizing the market
#' market_data %>% group_by(Asset) %>%
#' mutate(cum_ret = 100*cumprod(1 + Return)) %>%
#' ggplot() +
#' geom_line(aes(x = date, y = cum_ret, color = Asset)) +
#' facet_wrap(~Asset) +
#' theme(legend.position = "none")
#'
#' }
#' @export
mc_market <- function(corr,
N = 100,
k = 252,
mv_dist = "t",
mv_df = 3,
clayton_param = 4,
marginal_dist = "norm",
marginal_dist_model = NULL,
ts_model = NULL,
list = TRUE) {
if (list == TRUE) {
pb <- dplyr::progress_estimated(N) # setting length of progress bar
1:N %>%
map(
~ sim_market(
corr = corr,
k = k,
mv_dist = mv_dist,
mv_df = mv_df,
clayton_param = clayton_param,
marginal_dist = marginal_dist,
marginal_dist_model = marginal_dist_model,
ts_model = ts_model
)
)
} else
if (list == FALSE) {
1:N %>%
map_dfr(
~ sim_market(
corr = corr,
k = k,
mv_dist = mv_dist,
mv_df = mv_df,
clayton_param = clayton_param,
marginal_dist = marginal_dist,
marginal_dist_model = marginal_dist_model,
ts_model = ts_model
),
.id = "Universe"
)
}
}
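As a quick usage note (a minimal sketch rather than package documentation): with list = TRUE each simulated market is returned as its own tibble, and the list can later be bound into the long format if required. The marginal_dist_model values below are purely illustrative.
# Minimal sketch of list = TRUE output (identity correlation matrix for brevity)
markets <- mc_market(diag(10), N = 5, k = 100,
                     marginal_dist = "norm",
                     marginal_dist_model = list(mu = 0, sd = 0.02),
                     list = TRUE)
length(markets)                        # 5 tibbles, one per simulated market
markets[[1]]                           # date, Asset and Return columns
bind_rows(markets, .id = "Universe")   # combine into a single long tibble if needed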
#---------
#toy corr
#--------
corr <- MCmarket::gen_corr(D = 20, clusters = "overlapping", num_clusters = c(2,5,4))
#--------------
#empirical corr
#--------------
#load("data/labeled_training_data.Rda")
#set.seed(96472)
#idx <- sample(1:50, size = 20)
#corr <- labeled_training_data$normal_market[[1]] %>% .[idx, idx]
#-----------------------------
#Monte Carlo market simulation
#-----------------------------
set.seed(521554)
mc_data <- mc_market(corr, N = 10,
k = 500,
marginal_dist = "t",
ts_model = list(), # Using default ts_model arguments
list = FALSE)
# Plotting results
mc_data %>%
group_by(Asset, Universe) %>%
arrange(date) %>%
mutate(cum_ret = cumprod(1 + Return)*100) %>%
ggplot() +
geom_line(aes(x = date, y = cum_ret, color = Universe), size = 1, alpha = 0.5) +
facet_wrap(~Asset, scales = "free_y") +
labs(title = "Ensemble of Cumulative Returns",
subtitle = "50 Realizations for a Market of 20 Assets") +
theme_bw()+
theme(legend.position = "none")
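Beyond plotting, the long (list = FALSE) output also lends itself to ensemble summaries. The sketch below, using the mc_data object from above, computes each asset's mean and dispersion of terminal cumulative returns across the simulated universes.
# Summarising the ensemble: cross-universe dispersion of terminal cumulative returns
mc_data %>%
  arrange(date) %>%
  group_by(Universe, Asset) %>%
  summarise(terminal = last(cumprod(1 + Return)) * 100, .groups = "drop") %>%
  group_by(Asset) %>%
  summarise(mean_terminal = mean(terminal),
            sd_terminal = sd(terminal))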
One can produce a progress bar when mapping over sim_market, as shown below.
rm(sim_market)
library(MCmarket)
N <- 100
pb <- dplyr::progress_estimated(N) # this must be named pb
## Warning: `progress_estimated()` is deprecated as of dplyr 1.0.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
market <-
map_dfr(1:N,
~sim_market(diag(10), progress = TRUE),
.id = "Universe")
object.size(x = market) %>% print(units = "Mb")
## 7.7 Mb
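Note that dplyr::progress_estimated() is deprecated, as the warning above indicates. One possible alternative, not used by MCmarket itself and shown here only as a hedged sketch, is the progressr package:
# Sketch of a progress bar via the progressr package (an alternative, not MCmarket's own API)
library(progressr)
N <- 100
market <- with_progress({
  p <- progressor(steps = N)   # one tick per simulated market
  map_dfr(1:N,
          ~ {p(); sim_market(diag(10))},
          .id = "Universe")
})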
This README accomplished its aim of developing a framework for the Monte Carlo simulation of financial asset markets. There was an investigation into using hybrid copulas to create markets with varying levels of left-tail dependence; however, this proved unsuccessful in high dimensions. Simulated markets can have the normal, t or Clayton multivariate distributions. The asset returns can be normal, Student t or SGT distributed and can exhibit numerous APARCH properties. In addition, markets with the multivariate normal or t distribution can be created to have any correlation matrix.