---
title: A Gentle Introduction to `CVXR`
author: "Paul Rosenfield, Anqi Fu, Balasubramanian Narasimhan, Steven Diamond, Stephen Boyd"
date: '2017-10-29'
slug: cvxr_gentle-intro
bibliography: ../bibtex/cvxr_refs.bib
params:
  mode: ignore
  testdata_dir: test_data_1.0
  data_dir: gentle_intro
---

```{r, echo = FALSE, message = FALSE, eval = params$mode %in% c("test", "save")}
library(here)
testdata_dir <- here("static", params$testdata_dir, params$data_dir)
if (params$mode == "test") {
    library(testthat)
} else {
    if (!dir.exists(testdata_dir)) dir.create(testdata_dir)
}
```

```{r prereqs, message = FALSE, echo = FALSE}
library(CVXR)
## We restrict ourselves to solvers on CRAN
CVXR::add_to_solver_blacklist(c("CPLEX", "GUROBI", "MOSEK", "CBC"))
library(ggplot2)
library(tidyr)
library(MASS)
```

## Introduction

Welcome to `CVXR`: a modeling language for describing and solving
convex optimization problems that follows the natural, mathematical
notation of convex optimization rather than the requirements of any
particular solver. The purpose of this document is both to introduce
the reader to `CVXR` and to generate excitement for its possibilities in
the field of statistics.

Convex optimization is a powerful and very general tool. As a
practical matter, the set of convex optimization problems includes
almost every optimization problem that can be solved exactly and
efficiently (i.e. without requiring an exhaustive search). If an
optimization problem can be solved, it is probably convex. This family
of problems becomes even larger if you include those that can be
solved _approximately_ and efficiently. To learn more about the
mathematics and applications of convex optimization,
see [Boyd and Vandenberghe 2004](http://stanford.edu/~boyd/cvxbook/).

Convex optimization systems written in other languages are already
widely used in practical applications. These include
[YALMIP](http://users.isy.liu.se/johanl/yalmip/pmwiki.php?n=Main.WhatIsYALMIP) and
[CVX](http://cvxr.com/cvx/) (Matlab),
[CVXPY](http://www.cvxpy.org/en/latest/) (Python), and
[Convex.jl](http://convexjl.readthedocs.org/en/latest/) (Julia).
`CVXR` shares a lot of its code base with
[CVXcanon](http://stanford.edu/class/ee364b/projects/2015projects/reports/miller_quigley_zhu_report.pdf)
and CVXPY. As far as we know, `CVXR` is the first full-featured general
convex optimization package for R.

One of the great headaches of conventional numerical optimization is
the process of deciding which algorithm to use and how to set its
parameters. In convex optimization, the particular algorithm matters
much less. So while a user of `CVXR` is still free to choose from a
number of different algorithms and to set algorithm parameters as they
please, the vast majority of users will not need to do this. `CVXR`
will just work.

The uses for convex optimization in statistics are many and
varied. A large number of parameter-fitting methods are convex, including
least-squares, ridge, lasso, and isotonic regression, as well as many
other kinds of problems such as maximum entropy or minimum
Kullback-Leibler divergence over a finite set.

All of these examples, at least in their most basic forms, are
established enough that they already have well-crafted R packages
devoted to them. If you use `CVXR` to solve these problems, it will
work. It will probably be slower than a custom-built algorithm---for
example, `glmnet` for fitting lasso or ridge regression models---but it
will work. However, this is not the true purpose of `CVXR`. If you
want to fit a well-established model, you should use one of the
well-established packages for doing so. If you want to build your
_own_ model---one that is a refinement of an existing method, or
perhaps even something that has never been tried before---then `CVXR`
is the package to do it. The advantage of `CVXR` over `glmnet` and the
like comes from its flexibility: a few lines of code can transform a
problem from commonplace to state-of-the-art, and can often do the
work of an entire package in the process.

This document is meant to familiarize you with the `CVXR` package. It
assumes basic knowledge of convex optimization and statistics as well
as proficiency with R. A potential user of `CVXR` should read this entire
document and then refer to the tutorial examples.

Happy optimizing!

## Convex Optimization

A convex optimization problem has the following form:

$$
\begin{array}{ll} \mbox{minimize} & f_0(x)\\
\mbox{subject to} & f_i(x) \leq 0, \quad i=1,\ldots,m\\
& g_i(x) = 0, \quad i=1,\ldots,p
\end{array}
$$

where $x$ is the variable, $f_0$ and $f_1,\ldots,f_m$ are convex, and
$g_1,\ldots,g_p$ are affine. $f_0$ is called the objective function, $f_i
\leq 0$ are called the inequality constraints, and $g_i = 0$ are
called the equality constraints.

In `CVXR`, you will specify convex optimization problems in a more
convenient format than the one above.

A convex function is one that is upward curving. A concave function is
downward curving. An affine function is flat, and is thus both convex
and concave.

A convex optimization problem is one that attempts to minimize a
convex function (or maximize a concave function) over a convex set of
input points.

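These curvature categories are not just definitions: every `CVXR`
expression carries a curvature that can be queried directly. Here is a
minimal sketch, assuming the curvature predicates `is_convex()`,
`is_concave()`, and `is_affine()` that `CVXR` provides for expressions:

```{r}
## A quick curvature check on a few expressions (a sketch):
u <- Variable()
is_convex(u^2)       # TRUE: upward curving
is_concave(sqrt(u))  # TRUE: downward curving
is_affine(2 * u + 1) # TRUE: flat, so both convex and concave
```
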
You can learn much more about convex optimization via
@BoydVandenberghe:2004 as well as the
[CVX101 MOOC](https://lagunita.stanford.edu/courses/Engineering/CVX101/Winter2014/about).

## 'Hello World'

We begin with one of the simplest possible problems that presents all
three of these features (an objective, an inequality constraint, and
an equality constraint):

$$
\begin{array}{ll}
\mbox{minimize} & x^2 + y^2 \\
\mbox{subject to} & x \geq 0, \quad 2x + y = 1
\end{array}
$$

with scalar variables $x$ and $y$. This is a convex optimization
problem with objective $f_0(x,y) = x^2 + y^2$ and constraint functions
$f_1(x,y) = -x$ and $g_1(x,y) = 2x + y - 1$.

Note that this problem is simple enough to be solved analytically, so
we can confirm that `CVXR` has produced the correct answer. Here's how
we formulate the problem in `CVXR`.

```{r}
# Variables minimized over
x <- Variable(1)
y <- Variable(1)

# Problem definition
objective <- Minimize(x^2 + y^2)
constraints <- list(x >= 0, 2*x + y == 1)
prob2.1 <- Problem(objective, constraints)

# Problem solution
solution2.1 <- solve(prob2.1)
solution2.1$status
solution2.1$value
solution2.1$getValue(x)
solution2.1$getValue(y)

# The world says 'hi' back.
```

```{r, echo = FALSE, eval = params$mode %in% c("test", "save"), error = params$mode %in% c("test", "save")}
if (params$mode == "save") {
    saveRDS(list(x = solution2.1$getValue(x), y = solution2.1$getValue(y)),
            file = file.path(testdata_dir, "gentle_intro1.RDS"))
} else {
    cat("Testthat Results: No output is good\n")
    gentle_intro <- readRDS(file = file.path(testdata_dir, "gentle_intro1.RDS"))
    expect_identical(solution2.1$getValue(x), gentle_intro$x)
    expect_identical(solution2.1$getValue(y), gentle_intro$y)
}
```

We now turn to a careful explanation of the code. The first lines
create two Variable objects, `x` and `y`, both of length 1
(i.e. scalar variables).

```{r}
x <- Variable(1)
y <- Variable(1)
```

`x` and `y` represent what we are allowed to adjust in our problem in
order to obtain the optimal solution. They don't have values yet, and
they won't until after we solve the problem. For now, they are just
placeholders.

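We can check this directly. A small sketch, assuming the `value()`
getter returns `NA` for a variable that `solve()` has not yet assigned
a value to:

```{r}
## Before solve(), a variable has no value (sketch):
value(x)  # NA until solve() assigns one
```
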
Next, we define the problem objective.

```{r}
objective <- Minimize(x^2 + y^2)
```

This call to `Minimize()` does _not_ return the minimum value of the
expression `x^2 + y^2` the way a call to the native R function `min()`
would (after all, `x` and `y` don't have values yet). Instead,
`Minimize()` creates an Objective object, which defines the goal of
the optimization we will perform, namely to find values for `x` and
`y` that produce the smallest possible value of `x^2 + y^2`.

The next line defines two constraints---an inequality constraint and
an equality constraint, respectively.

```{r}
constraints <- list(x >= 0, 2*x + y == 1)
```

Again, counter to what you might ordinarily expect, the expression
`x >= 0` does not return `TRUE` or `FALSE` the way `1.3 >= 0`
would. Instead, the `==` and `>=` operators have been overloaded to
return Constraint objects, which will be used by the solver to enforce
the problem's constraints. (Without them, the solution to our problem
would simply be $x = y = 0$.)

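One quick way to see the overloading at work is to inspect the classes
of the returned objects (a sketch; the exact S4 class names are
internal to `CVXR` and may vary across versions):

```{r}
## These comparisons return S4 Constraint objects, not logicals (sketch):
class(1.3 >= 0)      # "logical"
class(x >= 0)        # a CVXR constraint class
class(2*x + y == 1)  # a CVXR constraint class
```
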
Next, we define our Problem object, which takes our Objective object
and our two Constraint objects as inputs.

```{r}
prob2.1 <- Problem(objective, constraints)
```

Problem objects are very flexible in that they can have 0 or more
constraints, and their objective can be to `Minimize()` a convex
expression (as shown above) _or_ to `Maximize()` a concave
expression.

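For instance, here is a sketch of a problem with no constraints at
all, maximizing the concave expression $\log w - w$ (whose maximum
occurs at $w = 1$):

```{r}
## An unconstrained maximization of a concave expression (sketch):
w <- Variable(1)
sol_max <- solve(Problem(Maximize(log(w) - w)))
sol_max$getValue(w)  # should be close to the analytic maximizer, w = 1
```
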
The call to `Problem()` still does not actually _solve_ our
optimization problem. That only happens with the call to `solve()`.

```{r}
solution2.1 <- solve(prob2.1)
```

Behind the scenes, this call translates the problem into a format that
a convex solver can understand, feeds the problem to the solver, and
then returns the results in a list. For this problem, the
list will contain among other things the optimal value of
the objective function `x^2 + y^2`, values for `x` and `y` that
achieve that optimal objective value (along with a function
`solution2.1$getValue` to retrieve them conveniently), and some
accompanying metadata such as `solution2.1$status`, which confirms
that the solution was indeed `"optimal"`.

```{r}
solution2.1$status
solution2.1$value
solution2.1$getValue(x)
solution2.1$getValue(y)
```

In general, when you apply the `solve()` method to a Problem, several
things can happen:

1. `solution$status == "optimal"`: The problem was solved. Values for
   the optimization variables were found, which satisfy all of the
   constraints and minimize/maximize the objective.

2. `solution$status == "infeasible"`: The problem was _not_ solved
   because no combination of input variables exists that can satisfy
   all of the constraints. For a trivial example of when this might
   happen, consider a problem with optimization variable `x`, and
   constraints `x >= 1` and `x <= 0`. Obviously, no value of `x`
   exists that can satisfy both constraints. In this case,
   `solution$value` is `+Inf` for a minimization problem and `-Inf`
   for a maximization problem, indicating infinite dissatisfaction
   with the result. The values for the input variables are `NA`.

3. `solution$status == "unbounded"`: The problem was _not_ solved
   because the objective can be made arbitrarily small for a
   minimization problem or arbitrarily large for a maximization
   problem. Hence there is no optimal solution because for any given
   solution, it is always possible to find something even more
   optimal. In this case, `solution$value` is `-Inf` for a
   minimization problem and `+Inf` for a maximization problem,
   indicating infinite satisfaction with the result. Again, the values
   of the input variables are `NA`.

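Both failure modes are easy to trigger deliberately. The sketch below
constructs a trivially infeasible problem and a trivially unbounded
one; note that the exact status strings can vary slightly from solver
to solver:

```{r}
## An infeasible problem: no value of x_bad satisfies both constraints (sketch)
x_bad <- Variable(1)
sol_inf <- solve(Problem(Minimize(x_bad), list(x_bad >= 1, x_bad <= 0)))
sol_inf$status  # "infeasible"
sol_inf$value   # Inf for this minimization problem

## An unbounded problem: the objective can be driven to -Inf (sketch)
sol_unb <- solve(Problem(Minimize(x_bad), list(x_bad <= 0)))
sol_unb$status  # "unbounded"
sol_unb$value   # -Inf
```
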
### Modifying a CVXR Problem

Like any normal R object, the `Problem`, `Minimize`, `Maximize`, and
`Constraint` objects can all be modified after creation. Here is an
example where we modify the problem we created above by changing its
objective and adding a constraint, and then solve the modified
problem:

```{r}
# Modify the problem from example 2.1
prob2.2 <- prob2.1
objective(prob2.2) <- Minimize(x^2 + y^2 + abs(x - y))
constraints(prob2.2) <- c(constraints(prob2.1), y <= 1)

# Solve the modified problem
solution2.2 <- solve(prob2.2)

# Examine the solution
solution2.2$status
solution2.2$value
solution2.2$getValue(x)
solution2.2$getValue(y)
```

```{r, echo = FALSE, eval = params$mode %in% c("test", "save"), error = params$mode %in% c("test", "save")}
if (params$mode == "save") {
    saveRDS(list(x = solution2.2$getValue(x), y = solution2.2$getValue(y)),
            file = file.path(testdata_dir, "gentle_intro2.RDS"))
} else {
    cat("Testthat Results: No output is good\n")
    gentle_intro <- readRDS(file = file.path(testdata_dir, "gentle_intro2.RDS"))
    expect_identical(solution2.2$getValue(x), gentle_intro$x)
    expect_identical(solution2.2$getValue(y), gentle_intro$y)
}
```

### An Invalid Problem

Unfortunately, you can't just type any arbitrary problem you like into
`CVXR`. There are restrictions on what kinds of problems can be
handled. For example, if we try to `Maximize()` the objective from
example 2.1, we get an error:

```{r, error = TRUE}
prob2.3 <- prob2.1
objective(prob2.3) <- Maximize(x^2 + y^2)
solve(prob2.3)
```

We would get a similar error if we tried to add the constraint
`p_norm(x, 2) == 1`. This is because `CVXR` uses a strict set of rules
called Disciplined Convex Programming (DCP) to evaluate the convexity
of any given problem. If you follow these rules, you are guaranteed
that your problem is convex. If you don't follow these rules, `CVXR`
will throw an exception. See the last section for further information
on DCP.

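There is no need to wait for `solve()` to find out: the DCP status of
a problem can be checked up front with `is_dcp()`. A quick sketch
using the two problems above:

```{r}
## Check DCP compliance before attempting to solve (sketch):
is_dcp(prob2.1)  # TRUE: minimizing a convex expression is fine
is_dcp(prob2.3)  # FALSE: maximizing a convex expression breaks the rules
```
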
## Simple Examples

We begin by showing what a standard linear regression problem looks
like in `CVXR`:

### Ordinary Least Squares

For illustration, we generate some synthetic data for use in this
example.

```{r, echo=TRUE}
set.seed(1)
s <- 1
n <- 10
m <- 300
mu <- rep(0, 9)
Sigma <- cbind(c(1.6484, -0.2096, -0.0771, -0.4088, 0.0678, -0.6337, 0.9720, -1.2158, -1.3219),
               c(-0.2096, 1.9274, 0.7059, 1.3051, 0.4479, 0.7384, -0.6342, 1.4291, -0.4723),
               c(-0.0771, 0.7059, 2.5503, 0.9047, 0.9280, 0.0566, -2.5292, 0.4776, -0.4552),
               c(-0.4088, 1.3051, 0.9047, 2.7638, 0.7607, 1.2465, -1.8116, 2.0076, -0.3377),
               c(0.0678, 0.4479, 0.9280, 0.7607, 3.8453, -0.2098, -2.0078, -0.1715, -0.3952),
               c(-0.6337, 0.7384, 0.0566, 1.2465, -0.2098, 2.0432, -1.0666, 1.7536, -0.1845),
               c(0.9720, -0.6342, -2.5292, -1.8116, -2.0078, -1.0666, 4.0882, -1.3587, 0.7287),
               c(-1.2158, 1.4291, 0.4776, 2.0076, -0.1715, 1.7536, -1.3587, 2.8789, 0.4094),
               c(-1.3219, -0.4723, -0.4552, -0.3377, -0.3952, -0.1845, 0.7287, 0.4094, 4.8406))
X <- MASS::mvrnorm(m, mu, Sigma)
X <- cbind(rep(1, m), X)
trueBeta <- c(0, 0.8, 0, 1, 0.2, 0, 0.4, 1, 0, 0.7)
y <- X %*% trueBeta + rnorm(m, 0, s)
```

```{r}
beta <- Variable(n)
objective <- Minimize(sum_squares(y - X %*% beta))
prob3.1 <- Problem(objective)
```

Here, `y` is the response, `X` is the matrix of predictors, `n` is the
number of predictors, and `beta` is a vector of coefficients on the
predictors. The Ordinary Least Squares (OLS) solution for `beta`
minimizes the sum of squared residuals (equivalently, the $l_2$-norm
of the residuals). As we can see below, `CVXR`'s solution matches the
solution obtained by using `lm`.

```{r}
CVXR_solution3.1 <- solve(prob3.1)
lm_solution3.1 <- lm(y ~ 0 + X)
```

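Before plotting, we can make the agreement precise with a one-line
check; the largest discrepancy should be on the order of the solver
tolerance:

```{r}
## Largest absolute discrepancy between the lm and CVXR estimates (sketch):
max(abs(coef(lm_solution3.1) - CVXR_solution3.1$getValue(beta)))
```
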
```{r, echo=FALSE, fig.height = 5, fig.width = 7}
lmBeta <- coef(lm_solution3.1)
cvxrBeta <- CVXR_solution3.1$getValue(beta)
coeff <- cbind(lmBeta, cvxrBeta)
colnames(coeff) <- c("OLS", "CVXR")
rownames(coeff) <- paste0("beta[", seq_along(lmBeta) - 1L, "]")
d <- data.frame(coeff = rownames(coeff), coeff)
tidyD <- gather(d, key = "type", value = "value", OLS, CVXR, factor_key = TRUE)
ggplot(data = tidyD, mapping = aes(x = coeff, y = value)) +
    geom_bar(mapping = aes(fill = type), stat = "identity", position = "dodge") +
    scale_x_discrete(labels = parse(text = levels(tidyD$coeff)))
```

Obviously, if all you want to do is least-squares linear regression,
you should simply use `lm`. The chief advantage of `CVXR` is its
flexibility, as we will demonstrate in the rest of this section.

```{r, echo = FALSE, eval = params$mode %in% c("test", "save"), error = params$mode %in% c("test", "save")}
if (params$mode == "save") {
    saveRDS(list(beta = cvxrBeta),
            file = file.path(testdata_dir, "gentle_intro3.RDS"))
} else {
    cat("Testthat Results: No output is good\n")
    gentle_intro <- readRDS(file = file.path(testdata_dir, "gentle_intro3.RDS"))
    expect_identical(cvxrBeta, gentle_intro$beta)
}
```

### Non-Negative Least Squares

Looking at example 3.1, you may notice that the OLS regression problem
has an objective, but no constraints. In many contexts, we can greatly
improve our model by constraining the solution to reflect our prior
knowledge. For example, we may know that the coefficients `beta` must
be non-negative.

```{r}
prob3.2 <- prob3.1
constraints(prob3.2) <- list(beta >= 0)
solution3.2 <- solve(prob3.2)
```

```{r, echo=FALSE, out.width=700}
cvxrBetaNNLS <- solution3.2$getValue(beta)
coeff <- cbind(trueBeta, lmBeta, cvxrBetaNNLS)
colnames(coeff) <- c("Actual", "OLS", "NNLS")
rownames(coeff) <- paste0("beta[", seq_along(trueBeta) - 1L, "]")
d <- data.frame(coeff = rownames(coeff), coeff)
tidyD <- gather(d, key = "type", value = "value", Actual, OLS, NNLS, factor_key = TRUE)
ggplot(data = tidyD, mapping = aes(x = coeff, y = value)) +
    geom_bar(mapping = aes(fill = type), stat = "identity", position = "dodge") +
    scale_x_discrete(labels = parse(text = levels(tidyD$coeff)))
```

```{r, echo = FALSE, eval = params$mode %in% c("test", "save"), error = params$mode %in% c("test", "save")}
if (params$mode == "save") {
    saveRDS(list(beta = cvxrBetaNNLS),
            file = file.path(testdata_dir, "gentle_intro4.RDS"))
} else {
    cat("Testthat Results: No output is good\n")
    gentle_intro <- readRDS(file = file.path(testdata_dir, "gentle_intro4.RDS"))
    expect_identical(cvxrBetaNNLS, gentle_intro$beta)
}
```

As we can see in the figure above, adding that one constraint produced
a massive improvement in the accuracy of the estimates. Not only are
the non-negative least-squares estimates much closer to the true
coefficients than the OLS estimates, they have even managed to recover the
correct sparsity structure in this case.

As with OLS, there are already R packages available that implement
non-negative least squares, such as `nnls`. But that is
actually an excellent demonstration of the power of `CVXR`: a single
line of code here, namely `constraints(prob3.2) <- list(beta >= 0)`, is
doing the work of an entire package.

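If the `nnls` package happens to be installed, this claim is easy to
cross-check. The sketch below compares the two solutions, assuming
`nnls::nnls()`, which solves the same problem with the specialized
Lawson-Hanson algorithm:

```{r}
## Optional cross-check against the specialized nnls solver (sketch):
if (requireNamespace("nnls", quietly = TRUE)) {
    fit_nnls <- nnls::nnls(X, as.vector(y))
    max(abs(fit_nnls$x - solution3.2$getValue(beta)))  # should be near zero
}
```
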
### Support Vector Classifiers

Another common statistical tool is the support vector classifier
(SVC). The SVC is an affine function (hyperplane) that separates two
sets of points by the widest margin. When the sets are not linearly
separable, the SVC is determined by a trade-off between the width of
the margin and the number of points that are misclassified.

For the binary case, where the response $y_i \in \{-1,1\}$, the SVC is obtained by solving

$$
\begin{array}{ll}
\mbox{minimize} & \frac{1}{2}\Vert\beta\Vert^2 + C\sum_{i=1}^m \xi_i \\
\mbox{subject to} & \xi_i \geq 0, \quad y_i(x_i^T\beta + \beta_0) \geq 1 - \xi_i, \quad i = 1,\ldots,m
\end{array}
$$

with variables $(\beta, \xi)$. Here, $\xi_i$ is the amount by which
point $i$ may violate the margin, and $C > 0$ is a
user-chosen penalty on the total violation. As $C$ increases, fewer
misclassifications will be allowed.

Below, we fit an SVC in `CVXR` with $C = 10$.

```{r}
## Generate data
set.seed(10)
n <- 2
m <- 50
X <- matrix(rnorm(m*n), nrow = m, ncol = n)
y <- c(rep(-1, m/2), rep(1, m/2))
X[y == 1, ] <- X[y == 1, ] + 1
```

```{r}
## Define variables
cost <- 10
beta0 <- Variable()
beta <- Variable(n)
slack <- Variable(m)

## Form problem
objective <- (1/2) * sum_squares(vstack(beta, beta0)) + cost * sum(slack)
constraints <- list(y * (X %*% beta + beta0) >= 1 - slack, slack >= 0)
prob3.3 <- Problem(Minimize(objective), constraints)
solution3.3 <- solve(prob3.3)
```

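As a quick sanity check on the fit, we can compute the fraction of
training points that fall on the correct side of the fitted hyperplane
(a sketch; with these data the training accuracy should be high but
need not be perfect):

```{r}
## Training accuracy of the fitted classifier (sketch):
pred <- sign(X %*% solution3.3$getValue(beta) + solution3.3$getValue(beta0))
mean(pred == y)
```
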
```{r, echo=FALSE, fig.width = 7}
b0 <- solution3.3$getValue(beta0)
b <- solution3.3$getValue(beta)

## Plot support vector and classified points
plotdata <- as.data.frame(cbind(y, X))
names(plotdata) <- c("y", "X1", "X2")
plotdata$color <- factor(3 - y)
ggplot(data = plotdata, aes(x = X1, y = X2, color = color)) +
    geom_point(size = 3) +
    geom_abline(slope = -b[1] / b[2], intercept = (1 - b0) / b[2]) +
    labs(title = "Binary Support Vector Classifier") +
    theme(legend.position = "none")
```

```{r, echo = FALSE, eval = params$mode %in% c("test", "save"), error = params$mode %in% c("test", "save")}
if (params$mode == "save") {
    saveRDS(list(b0 = b0, b = b),
            file = file.path(testdata_dir, "gentle_intro5.RDS"))
} else {
    cat("Testthat Results: No output is good\n")
    gentle_intro <- readRDS(file = file.path(testdata_dir, "gentle_intro5.RDS"))
    expect_identical(b0, gentle_intro$b0)
    expect_identical(b, gentle_intro$b)
}
```

## Disciplined Convex Programming (DCP)

Disciplined convex programming (DCP) is a system for constructing
mathematical expressions with known sign and curvature from a given library of
base functions. `CVXR` uses DCP to ensure that the specified
optimization problems are convex.

The user may find it helpful to read about how the DCP rules are applied
in other languages such as
[Python](http://www.cvxpy.org/en/latest/tutorial/dcp/index.html),
[Matlab](http://cvxr.com/cvx/doc/dcp.html#),
and [Julia](http://convexjl.readthedocs.io/en/latest/types.html).
`CVXR` implements the same rules, and a short introduction
is [available here](/cvxr_dcp). The set of DCP functions is
described in the [Function Reference](/cvxr_functions/).

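A classic illustration of why the rules are syntactic rather than
purely mathematical: $\sqrt{1 + z^2}$ is convex, but written as
`sqrt(1 + z^2)` it violates DCP, since `sqrt()` is concave while its
argument is convex. Expressing the same quantity as a norm restores
compliance. (A sketch; it assumes `vstack()` accepts a constant
alongside a variable, as in the SVC example above.)

```{r}
## Mathematically convex, but not DCP-compliant as written (sketch):
z <- Variable()
is_dcp(Problem(Minimize(sqrt(1 + z^2))))
## An equivalent DCP-compliant formulation via the Euclidean norm:
is_dcp(Problem(Minimize(p_norm(vstack(1, z), 2))))
```
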
## Session Info

```{r}
sessionInfo()
```

## Source

[R Markdown](https://github.com/bnaras/cvxr_docs/blob/master/content/cvxr_examples/gentle_intro.Rmd)

## References