# JEM092 Asset Pricing
# Seminar 5
## Lukáš Petrásek
### Charles University
### lukas.petrasek@fsv.cuni.cz
## 15.3.2022

This seminar is recycled from the 2019/2020 course Portfolio Analysis and Risk Management. Authors of the original seminars are **Martin Hronec** and **Marek Hauzr**.

Why use R in finance? Packages. 

Truth: You’ll probably use whatever your employer tells you to use, e.g. Python, Julia, C++, SQL, (Excel) ... For those using only Excel, something to think about: 
- [Sober comparison of R vs. Excel](http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/)

Quick introductory material:
- [A (very) short introduction to R](https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf)
- [Another tutorial](http://tryr.codeschool.com/)

In [None]:
# First, suppress warnings: there are a lot of them and most are unimportant. In general, be careful when suppressing
# warnings.
options(warn = -1)

# Import base packages.
library(methods)

# Import third-party packages.
library(PortfolioAnalytics)
library(quadprog)
library(quantmod)
library(ROI)
library(ROI.plugin.glpk)
library(ROI.plugin.quadprog)

# Two gambles as an illustration

In [None]:
# number_of_lotteries_to_simulate is the number of lotteries to simulate. fee is the price the player pays to play
# the game.
petersburg <- function (number_of_lotteries_to_simulate, fee = 0) {
    winnings <- 0
    for (i in 1:number_of_lotteries_to_simulate) {
        coin <- c('heads', 'tails')
        pot <- 2
        flip <- sample(coin, size = 1)
        while (flip == 'heads') {
            pot <- pot * 2
            flip <- sample(coin, size = 1)
        }
        winnings <- winnings + pot - fee
    }

    # The average payout across all simulated lotteries (depends on a fee).
    winnings / number_of_lotteries_to_simulate
}

petersburg(100000, 1)
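
The loop above can also be written without explicit coin flips. The following is a minimal sketch, assuming that the number of heads before the first tails follows a geometric distribution with success probability 0.5 (which is what `rgeom` draws), so that the pot equals 2^(heads + 1). The name `petersburg_vectorized` and the sample sizes are illustrative and not part of the original seminar. Because the expected payoff of the game is infinite, the simulated average does not settle down as the number of lotteries grows.

```r
# Vectorized St. Petersburg simulation (illustrative sketch).
petersburg_vectorized <- function(number_of_lotteries, fee = 0) {
    # Number of heads observed before the first tails in each lottery.
    heads_before_tails <- rgeom(number_of_lotteries, prob = 0.5)
    # The pot starts at 2 and doubles with every head.
    payoffs <- 2^(heads_before_tails + 1)
    mean(payoffs) - fee
}

set.seed(1)
sample_sizes <- 10^(2:6)
# Average payout for increasingly large simulations.
averages <- sapply(sample_sizes, petersburg_vectorized)
print(data.frame(lotteries = sample_sizes, average_payout = averages))
```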

In [None]:
set.seed(298)
# The total jackpot is the combined initial wealth of the gambler and the banker. The game ends when the gambler wins
# the jackpot or loses everything.
initial_wealth_gambler <- 20
initial_wealth_banker <- 100
winning_probability_gambler <- 0.5  # probability that the gambler wins a single round
wealth_gambler <- c(initial_wealth_gambler)

coin_gamble <- function (wealth_a, wealth_b, winning_probability_a) {
    # Track the wealth path locally so that repeated calls do not depend on (or append to) a global vector.
    wealth_path <- c(wealth_a)
    while (wealth_a > 0 && wealth_b > 0) {
        probability_value <- runif(1)  # uniform distribution from 0 to 1
        if (probability_value <= winning_probability_a) {
            wealth_a <- wealth_a + 1; wealth_b <- wealth_b - 1
        } else {
            wealth_a <- wealth_a - 1; wealth_b <- wealth_b + 1
        }
        wealth_path <- c(wealth_path, wealth_a)
    }
    wealth_path
}

wealth_gambler <- coin_gamble(initial_wealth_gambler, initial_wealth_banker, winning_probability_gambler)

print(length(wealth_gambler))
print(wealth_gambler[length(wealth_gambler)])

In [None]:
plot(wealth_gambler, type = 'l')

In [None]:
win_count <- 0
total_rounds <- 100

for (i in 1:total_rounds) {
    wealth_gambler <- coin_gamble(initial_wealth_gambler, initial_wealth_banker, winning_probability_gambler)
    # The gambler wins the game if the final wealth is positive (the whole jackpot) rather than 0.
    if (wealth_gambler[length(wealth_gambler)] > 0) {
        win_count <- win_count + 1
    }
}

# Share of games won by the gambler.
print(win_count / total_rounds)
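
The simulated share of wins can be compared with the classical gambler's ruin result. The sketch below is illustrative: the function name `ruin_win_probability` is not part of the original seminar, and it assumes unit bets and independent rounds, as in `coin_gamble`. For a fair coin the probability that the gambler wins the whole jackpot is the gambler's share of total wealth; otherwise it depends on the ratio of losing to winning probability.

```r
# Theoretical probability that player A wins the whole jackpot (gambler's ruin, unit bets).
ruin_win_probability <- function(wealth_a, wealth_b, winning_probability_a) {
    if (isTRUE(all.equal(winning_probability_a, 0.5))) {
        # Fair game: the probability equals A's share of the total wealth.
        return(wealth_a / (wealth_a + wealth_b))
    }
    ratio <- (1 - winning_probability_a) / winning_probability_a
    (1 - ratio^wealth_a) / (1 - ratio^(wealth_a + wealth_b))
}

# Same parameters as in the simulation: 20 / 120, i.e. about 0.167.
fair_game_probability <- ruin_win_probability(20, 100, 0.5)
print(fair_game_probability)
```

With only 100 simulated games the empirical share will fluctuate noticeably around this value.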

# Where to get data

In general:
- Bloomberg and Thomson Reuters are the standard sources at financial institutions (they offer APIs as well as .csv exports).
- More expensive, more specialized databases, e.g. CapitalIQ, option datasets, etc.
- Yahoo Finance, Google Finance, FRED, Macrotrends.net, etc.
- Professors often publish datasets related to their research; a very nice and useful example is [The FF Library](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html)

In R:
- The package 'quantmod' provides a convenient function, `getSymbols`, for downloading financial data from the web. It supports several sources, such as Yahoo Finance and FRED (Google Finance was supported in the past, but quantmod's Google source is now defunct).

In [None]:
# Help for getSymbols.
?getSymbols

In [None]:
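# Download S&P 500 data from Yahoo (you can download several series at once by passing a vector of tickers). The
# result is stored in the workspace as GSPC (the leading "^" is dropped from the object name).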
getSymbols(
    "^GSPC",
    src = "yahoo",
    from = as.Date("2007-01-04"),
    to = as.Date("2010-01-01"),
    warnings = FALSE
)

In [None]:
# Peek at the data.
head(GSPC)

In [None]:
# Print basic statistics.
summary(GSPC$GSPC.Adjusted)

In [None]:
plot(y = GSPC$GSPC.Adjusted, x = index(GSPC), type = 'l')

# Stock split example

In [None]:
# Download AAPL data from Yahoo.
getSymbols(
    "AAPL",
    src = "yahoo",
    from = as.Date("2000-01-01"),
    to = as.Date("2021-12-31"),
    warnings = FALSE
)

head(AAPL)
tail(AAPL)
plot(AAPL$AAPL.Close)

In [None]:
# Download splits and dividends.
splits <- getSplits(
    "AAPL",
    from = as.Date("2000-01-01"),
    to = as.Date("2021-12-31")
)
raw_dividends <- getDividends(
    "AAPL",
    from = as.Date("2000-01-01"),
    to = as.Date("2021-12-31"),
    split.adjust = FALSE
)

head(splits)
head(raw_dividends)

In [None]:
# Calculate split and dividend adjustment ratios.
ratios <- adjRatios(splits, raw_dividends, AAPL$AAPL.Adjusted)
head(ratios)

# Use the Split and Div columns to calculate unadjusted close prices for AAPL.
AAPL$unadjusted_close <- AAPL$AAPL.Adjusted / (ratios[, "Split"] * ratios[, "Div"])

head(AAPL)
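# Plot the three series together. plot() on an xts object expects one (possibly multi-column) series, so merge first.
plot(merge(AAPL$AAPL.Close, AAPL$AAPL.Adjusted, AAPL$unadjusted_close), legend.loc = "topleft")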
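
To see what a split adjustment ratio does, here is a toy sketch in base R. All numbers are hypothetical (a 2-for-1 split that takes effect on the third day) and the variable names are illustrative; it ignores dividends entirely. Multiplying raw prices by the cumulative split ratio removes the artificial price drop, and dividing by the ratio recovers the raw prices, which mirrors the division used above.

```r
# Hypothetical raw closing prices; a 2-for-1 split takes effect on day 3.
raw_close <- c(100, 102, 51, 52)

# Cumulative back-adjustment ratio: prices before the split are halved, later prices are unchanged.
split_ratio <- c(0.5, 0.5, 1, 1)

# Split-adjusted prices no longer show an artificial 50% drop on day 3.
split_adjusted_close <- raw_close * split_ratio
print(split_adjusted_close)

# Dividing by the ratio recovers the raw (unadjusted) prices.
recovered_raw_close <- split_adjusted_close / split_ratio
print(recovered_raw_close)
```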

# Important types of optimization problems

Optimization problems can be categorized based on the form of their objective function and constraints as well as the kind of decision variables. The type of optimization problem with which one is faced determines what software is appropriate, the efficiency of the algorithm for solving the problem, and the degree to which the optimal solution returned by the optimization solver is trustworthy and useful.

An optimization problem formulation consists of three parts:
1. A set of decision variables (usually represented as an $N \times 1$ vector $x$)
2. An objective function $f(x)$, which is a function of the decision variables
3. A set of constraints defined by functions $g_i(x) \leq 0$ (inequality constraints) and $h_i(x) = 0$ (equality constraints)


* Convex Programming
$$\begin{aligned}
& \underset{x}{\text{min}} \ \  f(x) \\
& \text{s.t.} \ g_i(x) \leq 0 \ \ \ \ i = 1,...,I \\
& \ \ \ \ \ \ \ \ \  \ Ax = b 
\end{aligned}$$
where $f(x)$ and $g_i(x)$ are convex functions and $Ax = b$ is a system of linear equalities. Convex programming problems encompass several classes of problems with special structure, including linear programming (LP), some quadratic programming (QP), second-order cone programming (SOCP), semidefinite programming (SDP), etc. LP problems are the best studied and the easiest to solve with commercial solvers, followed by convex QP problems, SOCP problems, and SDP problems.

* Linear Programming
$$\begin{aligned}
& \underset{x}{\text{min}} \ \  c'x \\
& \text{s.t.} \ \ Ax = b  \\
& \ \ \ \ \ \ \ \ \  \ x \geq 0 
\end{aligned}$$

* Quadratic Programming
$$\begin{aligned}
& \underset{x}{\text{min}} \ \  \frac{1}{2}x'Qx + c'x \\
& \text{s.t.} \ \ Ax = b  \\
& \ \ \ \ \ \ \ \ \  \ x \geq 0 
\end{aligned}$$
where $Q$ is an $N \times N$ matrix, $c$ is an $N$-dimensional vector, $A$ is a $J \times N$ matrix and $b$ is a $J$-dimensional vector. When the matrix $Q$ is positive semi-definite, the objective function is convex (it is a sum of a convex quadratic term and a linear function, and a linear function is both convex and concave). Since the objective function is convex and the constraints are linear expressions, we have a convex optimization problem. The problem can be solved by efficient algorithms, and we can trust that any local optimum they find is in fact the global optimum. When $Q$ is not positive semi-definite, however, the quadratic problem can have several locally optimal solutions and stationary points, and is therefore more difficult to solve.
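
Whether a given $Q$ makes the problem convex can be checked numerically. The following is a minimal sketch in base R: the helper name `is_positive_semidefinite` and the tolerance are illustrative assumptions, and the check simply tests whether all eigenvalues of the symmetric matrix are non-negative up to rounding error.

```r
# Check positive semi-definiteness of a symmetric matrix via its eigenvalues (illustrative helper).
is_positive_semidefinite <- function(Q, tolerance = 1e-10) {
    eigenvalues <- eigen(Q, symmetric = TRUE, only.values = TRUE)$values
    all(eigenvalues >= -tolerance)
}

set.seed(1)
# A sample covariance matrix is positive semi-definite by construction.
simulated_returns <- matrix(rnorm(200), nrow = 50, ncol = 4)
convex_Q <- cov(simulated_returns)

# A symmetric matrix with eigenvalues 3 and -1, so the quadratic form is not convex.
nonconvex_Q <- matrix(c(1, 2, 2, 1), nrow = 2)

print(is_positive_semidefinite(convex_Q))
print(is_positive_semidefinite(nonconvex_Q))
```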

# Bottom-up portfolio optimization

In our setting, solving a quadratic programming problem is central. Let's look at the documentation of [quadprog](https://cran.r-project.org/web/packages/quadprog/quadprog.pdf). Now you should understand the code below.
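
For reference, the quadprog documentation states the problem solved by `solve.QP` in a slightly different form from the QP above. Written in the package's own symbols (which differ from the notation used earlier in this seminar), it is

$$\begin{aligned}
& \underset{b}{\text{min}} \ \  \frac{1}{2}b'Db - d'b \\
& \text{s.t.} \ \ A'b \geq b_0
\end{aligned}$$

where $D$ is passed as `Dmat`, $d$ as `dvec`, $A$ as `Amat` and $b_0$ as `bvec`, and the first `meq` constraints are treated as equalities. Compared with the QP formulation above, $D$ plays the role of $Q$, $d$ corresponds to $-c$, and the constraint matrix enters transposed, with one constraint per column of `Amat`.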

In [None]:
# Simulate uncorrelated returns. Uncorrelated returns are sufficient for this illustration.
number_of_assets <- 20
number_of_observations <- 100
returns <- array(
    rnorm(number_of_observations * number_of_assets, mean = 0.001, sd = 0.005),
    dim = c(number_of_observations, number_of_assets)
)

# Define the optimization problem. solve.QP minimizes 1/2 x'Qx - d'x, so we set Q to twice the covariance matrix and d
# to zero in order to minimize the portfolio variance (the factor 2 does not change the optimal weights). The
# constraints are Bx >= b, where B is the identity matrix and b is a vector of zeros (this rules out short positions),
# and Ax = a, where A is a single row of ones and a = 1 (this ensures that the weights sum to 1). The equality
# constraint comes first and is flagged by setting meq = 1.
Q <- 2 * cov(returns)
A <- matrix(1, nrow = 1, ncol = number_of_assets)
a <- 1
B <- diag(number_of_assets)
b <- rep(0, number_of_assets)
d <- rep(0, number_of_assets)  # named d rather than c to avoid masking base::c

# Perform the optimization.
result <- solve.QP(Dmat = Q, dvec = d, Amat = t(rbind(A, B)), bvec = c(a, b), meq = 1)
# Obtain the weights. You can run sanity checks, e.g. verify that the weights sum to 1, or draw random feasible weights
# and check that the resulting portfolio never has a lower variance.
w <- result$solution
plot(w)
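
The sanity check suggested in the comment above can be sketched in base R without a solver. Note the assumption: this sketch drops the no-short-selling constraint and keeps only the budget constraint, in which case the minimum variance weights have the closed form $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}'\Sigma^{-1}\mathbf{1})$. It is therefore not the same problem as the one solved with `solve.QP`, and all names below are illustrative. Random long-only weights that sum to 1 are feasible for this relaxed problem, so none of them should achieve a lower variance.

```r
set.seed(42)
number_of_assets <- 5
number_of_observations <- 100
simulated_returns <- matrix(
    rnorm(number_of_observations * number_of_assets, mean = 0.001, sd = 0.005),
    nrow = number_of_observations
)
sigma <- cov(simulated_returns)

# Closed-form minimum variance weights under the budget constraint only (shorting allowed).
ones <- rep(1, number_of_assets)
w_gmv <- solve(sigma, ones)
w_gmv <- w_gmv / sum(w_gmv)

portfolio_variance <- function(w) as.numeric(t(w) %*% sigma %*% w)

# Draw random long-only weights that sum to 1 (one portfolio per column).
random_weights <- replicate(1000, {
    w <- runif(number_of_assets)
    w / sum(w)
})
random_variances <- apply(random_weights, 2, portfolio_variance)

print(sum(w_gmv))
print(all(random_variances >= portfolio_variance(w_gmv)))
```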

# Portfolio optimization using PortfolioAnalytics

Coding things bottom-up (in the sense that we still rely on an optimization package) is a nice way to make sure we understand the problem, but there is no need to reinvent the wheel. Hence 'PortfolioAnalytics', which provides numerical solutions for portfolio problems with complex constraints and objectives. The goal of the package is to aid practitioners and researchers in solving portfolio optimization problems with complex constraints and objectives that mirror real-world applications.

Let’s follow the general case from the documentation.

In [None]:
# Clear your workspace.
rm(list = ls())

par(mfrow = c(1, 1))

# Get data.
data(edhec)
returns <- edhec[, 1:4]
colnames(returns) <- c("CA", "CTAG", "DS", "EM")
print(head(returns, 5))

In [None]:
# Get a character vector of the fund names.
fund_names <- colnames(returns)

# Specify a portfolio object by passing a character vector for the assets argument.
pspec <- portfolio.spec(assets = fund_names)

# Print all attributes of the portfolio.
print.default(pspec)

In [None]:
# Adding constraints to the portfolio object is done with add.constraint.
pspec <- add.constraint(
    portfolio = pspec,
    type = "weight_sum",
    min_sum = 1,
    max_sum = 1
)

# This is the same as above.
pspec <- add.constraint(portfolio = pspec, type = "full_investment")

# Apply dollar neutral condition.
pspec <- add.constraint(
    portfolio = pspec,
    type = "weight_sum",
    min_sum = 0,
    max_sum = 0
)
pspec <- add.constraint(portfolio = pspec, type = "dollar_neutral")

# Box constraints specify upper and lower bounds on the weights of the assets.
pspec <- add.constraint(
    portfolio = pspec,
    type = "box",
    min = 0.05,
    max = 0.4
)

# Upper and lower bound can also be specified per asset.
pspec <- add.constraint(
    portfolio = pspec,
    type = "box",
    min = c(0.05, 0, 0.08, 0.1),
    max = c(0.4, 0.3, 0.7, 0.55)
)

# Let's take more styles.
returns <- edhec[, 1:6]
colnames(returns) <- c("CA", "CTAG", "DS", "EM", "EQMN", "ED")
funds <- colnames(returns)

# Create an initial portfolio object with a leverage constraint (the weights must sum to between 0.99 and 1.01).
initial_portfolio <- portfolio.spec(assets = funds)
initial_portfolio <- add.constraint(
    portfolio = initial_portfolio,
    type = "leverage",
    min_sum = 0.99,
    max_sum = 1.01
)

## Global minimum variance portfolio

In [None]:
library(ROI.plugin.glpk)
library(ROI.plugin.quadprog)

In [None]:
minvar <- add.objective(portfolio = initial_portfolio, type = "risk", name = "var")

opt_minvar <- optimize.portfolio(
    R = returns,
    portfolio = minvar,
    optimize_method = "ROI",
    trace = TRUE
)

print(opt_minvar)

In [None]:
# Make some fancy plots.
plot(
    opt_minvar,
    risk.col = "StdDev",
    return.col = "mean",
    main = "Minimum Variance Optimization",
    chart.assets = TRUE,
    xlim = c(0, 0.05),
    ylim = c(0, 0.0085)
)

In [None]:
# Plot the efficient frontier.
meanvar <- create.EfficientFrontier(R = returns, portfolio = initial_portfolio, type = 'mean-var')
chart.EfficientFrontier(meanvar, match.col = 'StdDev', type = 'l', RAR.text = 'Sharpe Ratio', pch = 4)

In [None]:
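# Load the seminar data set. load() restores the saved objects into the workspace and returns their names, so `data`
# holds a character vector of object names. The path is specific to the author's machine; adjust it to your local copy.
data <- load("/home/lukas/projects/asset-pricing/summer-semester-2022/seminar_7/Asset_Pricing_seminar_data.RData")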
# head(book_value_sap100)
# head(MktCap_sap100)
head(OHLCV_sap100)