---
title: A Gentle Introduction to `CVXR`
author: "Paul Rosenfield, Anqi Fu, Balasubramanian Narasimhan, Steven Diamond, Stephen Boyd"
date: '2017-10-29'
slug: cvxr_gentle-intro
bibliography: ../bibtex/cvxr_refs.bib
---
```{r, message=FALSE, warning=FALSE, echo = FALSE}
library(ggplot2)
library(tidyr)
library(CVXR)
```
## Introduction
Welcome to `CVXR`: a modeling language for describing and solving
convex optimization problems that follows the natural, mathematical
notation of convex optimization rather than the requirements of any
particular solver. The purpose of this document is both to introduce
the reader to `CVXR` and to generate excitement for its possibilities in
the field of statistics.
Convex optimization is a powerful and very general tool. As a
practical matter, the set of convex optimization problems includes
almost every optimization problem that can be solved exactly and
efficiently (i.e. without requiring an exhaustive search). If an
optimization problem can be solved, it is probably convex. This family
of problems becomes even larger if you include those that can be
solved _approximately_ and efficiently. To learn more about the
mathematics and applications of convex optimization,
see [Boyd and Vandenberghe 2004](http://stanford.edu/~boyd/cvxbook/).
Convex optimization systems written in other languages are already
widely used in practical applications. These
include
[YALMIP](http://users.isy.liu.se/johanl/yalmip/pmwiki.php?n=Main.WhatIsYALMIP) and
[CVX](http://cvxr.com/cvx/)
(Matlab), [CVXPY](http://www.cvxpy.org/en/latest/) (Python),
and [Convex.jl](http://convexjl.readthedocs.org/en/latest/)
(Julia). `CVXR` shares a lot of its code base
with
[CVXcanon](http://stanford.edu/class/ee364b/projects/2015projects/reports/miller_quigley_zhu_report.pdf) and
CVXPY. As far as we know, this is the first full-featured general
convex optimization package for R.
One of the great headaches of conventional numerical optimization is
the process of deciding which algorithm to use and how to set its
parameters. In convex optimization, the particular algorithm matters
much less. So while a user of `CVXR` is still free to choose from a
number of different algorithms and to set algorithm parameters as they
please, the vast majority of users will not need to do this. `CVXR`
will just work.
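For instance, `solve()` accepts an optional `solver` argument along with solver-specific settings. The following minimal sketch, which assumes the ECOS solver is available on your system, illustrates the idea; most users can simply call `solve(prob)` and accept the defaults.
```{r, eval = FALSE}
## A minimal sketch: explicitly selecting a solver (assumes ECOS is installed).
## In most cases, solve(prob) with the defaults is all you need.
x <- Variable(1)
prob <- Problem(Minimize(x^2), list(x >= 1))
result <- solve(prob, solver = "ECOS")
result$value
```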
The uses for convex optimization in statistics are many and
varied. A large number of parameter-fitting methods are convex, including
least-squares, ridge, lasso, and isotonic regression, as well as many
other kinds of problems such as maximum entropy or minimum
Kullback-Leibler divergence over a finite set.
All of these examples, at least in their most basic forms, are
established enough that they already have well-crafted R packages
devoted to them. If you use `CVXR` to solve these problems, it will
work. It will probably be slower than a custom-built algorithm---for
example, `glmnet` for fitting lasso or ridge regression models---but it
will work. However, this is not the true purpose of `CVXR`. If you
want to build a well-established model, you should use one of the
well-established packages for doing so. If you want to build your
_own_ model---one that is a refinement of an existing method, or
perhaps even something that has never been tried before---then `CVXR`
is the package to do it. The advantage of `CVXR` over `glmnet` and the
like comes from its flexibility: A few lines of code can transform a
problem from commonplace to state-of-the-art, and can often do the
work of an entire package in the process.
This document is meant to familiarize you with the `CVXR` package. It
assumes basic knowledge of convex optimization and statistics as well
as proficiency with R. A potential user of `CVXR` should read this entire
document and then refer to the tutorial examples.
Happy optimizing!
## Convex Optimization
A convex optimization problem has the following form:
$$
\begin{array}{ll} \mbox{minimize} & f_0(x)\\
\mbox{subject to} & f_i(x) \leq 0, \quad i=1,\ldots,m\\
& g_i(x) = 0, \quad i=1,\ldots,p
\end{array}
$$
where $x$ is the variable, $f_0, f_1, \ldots, f_m$ are convex, and
$g_1, \ldots, g_p$ are affine. Here, $f_0$ is called the objective function,
the constraints $f_i \leq 0$ are called the inequality constraints, and the
constraints $g_i = 0$ are called the equality constraints.
In `CVXR`, you will specify convex optimization problems in a more
convenient format than the one above.
A convex function is one that is upward curving. A concave function is
downward curving. An affine function is flat, and is thus both convex
and concave.
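One way to see these distinctions concretely in `CVXR` is to query the curvature of expressions built from a `Variable` (introduced in the next section). A small sketch:
```{r, eval = FALSE}
## Sketch: curvature queries on CVXR expressions
u <- Variable()
is_convex(u^2)         # upward curving: convex
is_concave(sqrt(u))    # downward curving: concave
is_affine(2 * u + 1)   # flat: both convex and concave
```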
A convex optimization problem is one that attempts to minimize a
convex function (or maximize a concave function) over a convex set of
input points.
You can learn much more about convex optimization via
@BoydVandenberghe:2004 as well as
the
[CVX101 MOOC](https://lagunita.stanford.edu/courses/Engineering/CVX101/Winter2014/about).
## 'Hello World'
We begin with one of the simplest possible problems, one that exhibits all
three features of the standard form above (an objective, an inequality
constraint, and an equality constraint):
$$
\begin{array}{ll}
\mbox{minimize} & x^2 + y^2 \\
\mbox{subject to} & x \geq 0, \quad 2x + y = 1
\end{array}
$$
with scalar variables $x$ and $y$. This is a convex optimization
problem with objective $f_0(x,y) = x^2 + y^2$ and constraint functions
$f_1(x,y) = -x$ and $g_1(x,y) = 2x + y - 1$.
Note that this problem is simple enough to be solved analytically, so
we can confirm that `CVXR` has produced the correct answer. Here's how
we formulate the problem in `CVXR`.
```{r}
# Variables minimized over
x <- Variable(1)
y <- Variable(1)
# Problem definition
objective <- Minimize(x^2 + y^2)
constraints <- list(x >= 0, 2*x + y == 1)
prob2.1 <- Problem(objective, constraints)
# Problem solution
solution2.1 <- solve(prob2.1)
solution2.1$status
solution2.1$value
solution2.1$getValue(x)
solution2.1$getValue(y)
# The world says 'hi' back.
```
We now turn to a careful explanation of the code. The first lines
create two Variable objects, `x` and `y`, both of length 1
(i.e. scalar variables).
```{r}
x <- Variable(1)
y <- Variable(1)
```
`x` and `y` represent what we are allowed to adjust in our problem in
order to obtain the optimal solution. They don't have values yet, and
they won't until after we solve the problem. For now, they are just
placeholders.
Next, we define the problem objective.
```{r}
objective <- Minimize(x^2 + y^2)
```
This call to `Minimize()` does _not_ return the minimum value of the
expression `x^2 + y^2` the way a call to the native R function `min()`
would do (after all, `x` and `y` don't have values yet). Instead,
`Minimize()` creates an Objective object, which defines the goal of
the optimization we will perform, namely to find values for `x` and
`y` which produce the smallest possible value of `x^2 + y^2`.
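If you inspect the result, you can confirm that it is an object describing a goal rather than a number (the exact class name may vary across `CVXR` versions):
```{r, eval = FALSE}
## Sketch: Minimize() returns an objective object, not a numeric value
class(objective)
```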
The next line defines two constraints---an inequality constraint and
an equality constraint, respectively.
```{r}
constraints <- list(x >= 0, 2*x + y == 1)
```
Again, counter to what you might ordinarily expect, the expression `x
>= 0` does not return `TRUE` or `FALSE` the way `1.3 >= 0`
would. Instead, the `==` and `>=` operators have been overloaded to
return Constraint objects, which will be used by the solver to enforce
the problem's constraints. (Without them, the solution to our problem
would simply be $x = y = 0$.)
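You can verify this yourself; the comparison yields a constraint object rather than a logical (again, the exact class names may vary across `CVXR` versions):
```{r, eval = FALSE}
## Sketch: comparisons involving CVXR expressions produce constraint objects
class(x >= 0)         # an inequality constraint object, not "logical"
class(2*x + y == 1)   # an equality constraint object
```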
Next, we define our Problem object, which takes our Objective object
and our two Constraint objects as inputs.
```{r}
prob2.1 <- Problem(objective, constraints)
```
Problem objects are very flexible in that they can have 0 or more
constraints, and their objective can be to `Minimize()` a convex
expression (as shown above) _or_ to `Maximize()` a concave
expression.
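For example, both of the following are valid problems (a sketch reusing the variables defined above):
```{r, eval = FALSE}
## Sketch: a problem with no constraints, and one that maximizes a concave expression
prob_unconstrained <- Problem(Minimize(x^2 + y^2))
prob_concave_max   <- Problem(Maximize(sqrt(x)), list(x <= 4))
```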
The call to `Problem()` still does not actually _solve_ our
optimization problem. That only happens with the call to `solve()`.
```{r}
solution2.1 <- solve(prob2.1)
```
Behind the scenes, this call translates the problem into a format that
a convex solver can understand, feeds the problem to the solver, and
then returns the results in a list. For this problem, the
list will contain among other things the optimal value of
the objective function `x^2 + y^2`, values for `x` and `y` that
achieve that optimal objective value (along with a function
`solution2.1$getValue` to retrieve them conveniently), and some
accompanying metadata such as `solution2.1$status`, which confirms
that the solution was indeed `"optimal"`.
```{r}
solution2.1$status
solution2.1$value
solution2.1$getValue(x)
solution2.1$getValue(y)
```
In general, when you apply the `solve()` method to a Problem, several
things can happen:
1. `solution$status == "optimal"`: The problem was solved. Values for
the optimization variables were found, which satisfy all of the
constraints and minimize/maximize the objective.
2. `solution$status == "infeasible"`: The problem was _not_ solved
because no combination of input variables exists that can satisfy
all of the constraints. For a trivial example of when this might
happen, consider a problem with optimization variable `x`, and
constraints `x >= 1` and `x <= 0`. Obviously, no value of `x`
exists that can satisfy both constraints. In this case,
`solution$value` is `+Inf` for a minimization problem and `-Inf`
for a maximization problem, indicating infinite dissatisfaction
with the result. The values for the input variables are `NA`.
3. `solution$status == "unbounded"`: The problem was _not_ solved
because the objective can be made arbitrarily small for a
minimization problem or arbitrarily large for a maximization
problem. Hence there is no optimal solution because for any given
solution, it is always possible to find something even more
optimal. In this case, `solution$value` is `-Inf` for a
minimization problem and `+Inf` for a maximization problem,
indicating infinite satisfaction with the result. Again, the values
of the input variables are `NA`.
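The last two cases can be reproduced with tiny problems. Here is a sketch (the exact status strings and values may depend on the solver used):
```{r, eval = FALSE}
## Sketch: problems illustrating the "infeasible" and "unbounded" statuses
z <- Variable(1)

## Infeasible: no z can satisfy both constraints
infeas <- solve(Problem(Minimize(z), list(z >= 1, z <= 0)))
infeas$status   # expected "infeasible"
infeas$value    # expected +Inf for a minimization problem

## Unbounded: the objective can be driven to -Inf
unbdd <- solve(Problem(Minimize(z)))
unbdd$status    # expected "unbounded"
unbdd$value     # expected -Inf for a minimization problem
```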
### Modifying a CVXR Problem
Like any normal R object, the `Problem`, `Minimize`, `Maximize`, and
`Constraint` objects can all be modified and computed upon after
creation. Here is an example where we modify the problem we created
above by changing its objective and adding a constraint, print the
modified problem, check whether it is still convex, and then solve the
modified problem:
```{r}
# Modify the problem from example 1
prob2.2 <- prob2.1
objective(prob2.2) <- Minimize(x^2 + y^2 + abs(x-y))
constraints(prob2.2) <- c(constraints(prob2.1), y <= 1)
# Solve the modified problem
solution2.2 <- solve(prob2.2)
# Examine the solution
solution2.2$status
solution2.2$value
solution2.2$getValue(x)
solution2.2$getValue(y)
```
### An Invalid Problem
Unfortunately, you can't just type any arbitrary problem you like into
`CVXR`. There are restrictions on what kinds of problems can be
handled. For example, if we try to `Maximize()` the objective from
example 2.1, we get an error:
```{r, error = TRUE}
prob2.3 <- prob2.1
objective(prob2.3) <- Maximize(x^2 + y^2)
solve(prob2.3)
```
We would get a similar error if we tried to add the constraint
`p_norm(x, 2) == 1`. This is because `CVXR` uses a strict set of rules
called Disciplined Convex Programming (DCP) to evaluate the convexity
of any given problem. If you follow these rules, you are guaranteed
that your problem is convex. If you don't follow these rules, `CVXR`
will throw an exception. See the last section for further information
on DCP.
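For illustration, here is a sketch of that non-DCP constraint; we would expect `CVXR` to reject it when the problem is solved:
```{r, eval = FALSE}
## Sketch: an equality constraint on a norm is not DCP-compliant
prob2.4 <- prob2.1
constraints(prob2.4) <- c(constraints(prob2.1), p_norm(x, 2) == 1)
solve(prob2.4)   # expected to raise a DCP error
```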
## Simple Examples
We begin by showing what a standard linear regression problem looks
like in `CVXR`:
### Ordinary Least Squares
For illustration, we generate some synthetic data for use in this
example.
```{r, echo=TRUE}
set.seed(1)
s <- 1
n <- 10
m <- 300
mu <- rep(0, 9)
Sigma <- cbind(c(1.6484, -0.2096, -0.0771, -0.4088, 0.0678, -0.6337, 0.9720, -1.2158, -1.3219),
c(-0.2096, 1.9274, 0.7059, 1.3051, 0.4479, 0.7384, -0.6342, 1.4291, -0.4723),
c(-0.0771, 0.7059, 2.5503, 0.9047, 0.9280, 0.0566, -2.5292, 0.4776, -0.4552),
c(-0.4088, 1.3051, 0.9047, 2.7638, 0.7607, 1.2465, -1.8116, 2.0076, -0.3377),
c(0.0678, 0.4479, 0.9280, 0.7607, 3.8453, -0.2098, -2.0078, -0.1715, -0.3952),
c(-0.6337, 0.7384, 0.0566, 1.2465, -0.2098, 2.0432, -1.0666, 1.7536, -0.1845),
c(0.9720, -0.6342, -2.5292, -1.8116, -2.0078, -1.0666, 4.0882, -1.3587, 0.7287),
c(-1.2158, 1.4291, 0.4776, 2.0076, -0.1715, 1.7536, -1.3587, 2.8789, 0.4094),
c(-1.3219, -0.4723, -0.4552, -0.3377, -0.3952, -0.1845, 0.7287, 0.4094, 4.8406))
library(MASS)
X <- mvrnorm(m, mu, Sigma)
X <- cbind(rep(1, m), X)
trueBeta <- c(0, 0.8, 0, 1, 0.2, 0, 0.4, 1, 0, 0.7)
y <- X %*% trueBeta + rnorm(m, 0, s)
```
```{r}
beta <- Variable(n)
objective <- Minimize(sum_squares(y - X %*% beta))
prob3.1 <- Problem(objective)
```
Here, `y` is the response, `X` is the matrix of predictors, `n` is the
number of predictors, and `beta` is a vector of coefficients on the
predictors. The Ordinary Least Squares (OLS) solution for `beta`
minimizes the sum of squared residuals, or equivalently the
$\ell_2$-norm of the residual vector. As we can see below, `CVXR`'s solution
matches the solution obtained by using `lm`.
```{r}
CVXR_solution3.1 <- solve(prob3.1)
lm_solution3.1 <- lm(y ~ 0 + X)
```
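A quick numerical check (a sketch; the two estimates should agree up to solver tolerance):
```{r, eval = FALSE}
## Sketch: compare the CVXR and lm coefficient estimates
max(abs(coef(lm_solution3.1) - CVXR_solution3.1$getValue(beta)))
```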
```{r, echo=FALSE, fig.height = 5, fig.width = 7}
library(ggplot2)
lmBeta <- coef(lm_solution3.1)
cvxrBeta <- CVXR_solution3.1$getValue(beta)
coeff <- cbind(lmBeta, cvxrBeta)
colnames(coeff) <- c("OLS", "CVXR")
rownames(coeff) <- paste0("beta[", seq_along(lmBeta) - 1L, "]")
d <- data.frame(coeff = rownames(coeff), coeff)
tidyD <- gather(d, key="type", value="value", OLS, CVXR, factor_key = TRUE)
ggplot(data = tidyD, mapping = aes(x = coeff, y = value)) +
geom_bar(mapping = aes(fill = type), stat = "identity", position = "dodge") +
scale_x_discrete(labels = parse(text = levels(tidyD$coeff)))
```
Obviously, if all you want to do is least-squares linear regression,
you should simply use `lm`. The chief advantage of `CVXR` is its
flexibility, as we will demonstrate in the rest of this section.
### Non-Negative Least Squares
Looking at example 3.1, you may notice that the OLS regression problem
has an objective, but no constraints. In many contexts, we can greatly
improve our model by constraining the solution to reflect our prior
knowledge. For example, we may know that the coefficients `beta` must
be non-negative.
```{r}
prob3.2 <- prob3.1
constraints(prob3.2) <- list(beta >= 0)
solution3.2 <- solve(prob3.2)
```
```{r, echo=FALSE, out.width=700}
cvxrBetaNNLS <- solution3.2$getValue(beta)
coeff <- cbind(trueBeta, lmBeta, cvxrBetaNNLS)
colnames(coeff) <- c("Actual", "OLS", "NNLS")
rownames(coeff) <- paste0("beta[", seq_along(trueBeta) - 1L, "]")
d <- data.frame(coeff = rownames(coeff), coeff)
tidyD <- gather(d, key="type", value="value", Actual, OLS, NNLS, factor_key = TRUE)
ggplot(data = tidyD, mapping = aes(x = coeff, y = value)) +
geom_bar(mapping = aes(fill = type), stat = "identity", position = "dodge") +
scale_x_discrete(labels = parse(text = levels(tidyD$coeff)))
```
As we can see in the figure above, adding that one constraint produced
a massive improvement in the accuracy of the estimates. Not only are
the non-negative least-squares estimates much closer to the true
coefficients than the OLS estimates, they have even managed to recover the
correct sparsity structure in this case.
As with OLS, there are already R packages that implement
non-negative least squares, such as `nnls`. But that is
actually an excellent demonstration of the power of `CVXR`: a single
line of code here, namely `constraints(prob3.2) <- list(beta >= 0)`,
does the work of an entire package.
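As a sanity check, you can compare the `CVXR` estimates with those from the dedicated `nnls` package, assuming it is installed; a sketch:
```{r, eval = FALSE}
## Sketch (assumes the 'nnls' package is installed): compare with a dedicated NNLS solver
library(nnls)
fit_nnls <- nnls(X, as.numeric(y))
max(abs(fit_nnls$x - solution3.2$getValue(beta)))   # expected to be small
```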
### Support Vector Classifiers
Another common statistical tool is the support vector classifier
(SVC). The SVC is an affine function (hyperplane) that separates two
sets of points by the widest margin. When the sets are not linearly
separable, the SVC is determined by a trade-off between the width of
the margin and the number of points that are misclassified.
For the binary case, where the response $y_i \in \{-1,1\}$, the SVC is obtained by solving
$$
\begin{array}{ll}
\mbox{minimize} & \frac{1}{2}\Vert\beta\Vert^2 + C\sum_{i=1}^m \xi_i \\
\mbox{subject to} & \xi_i \geq 0, \quad y_i(x_i^T\beta + \beta_0) \geq 1 - \xi_i, \quad i = 1,\ldots,m
\end{array}
$$
with variables $(\beta_0, \beta, \xi)$. Here, $\xi_i$ is the amount by which
point $i$ is allowed to violate the margin, and $C > 0$ is a
user-chosen penalty on the total violation. As $C$ increases, fewer
misclassifications will be allowed.
Below, we fit an SVC in `CVXR` with $C = 10$.
```{r}
## Generate data
set.seed(10)
n <- 2
m <- 50
X <- matrix(rnorm(m*n), nrow = m, ncol = n)
y <- c(rep(-1, m/2), rep(1, m/2))
X[y == 1, ] <- X[y == 1, ] + 1
```
```{r}
## Define variables
cost <- 10
beta0 <- Variable()
beta <- Variable(n)
slack <- Variable(m)
# Form problem
objective <- (1/2) * sum_squares(vstack(beta, beta0)) + cost * sum(slack)
constraints <- list(y * (X %*% beta + beta0) >= 1 - slack, slack >= 0)
prob3.3 <- Problem(Minimize(objective), constraints)
solution3.3 <- solve(prob3.3)
```
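With the fitted coefficients in hand, we can compute the classifier's predictions on the training data; a quick sketch:
```{r, eval = FALSE}
## Sketch: training-set predictions from the fitted classifier
beta0_hat <- as.numeric(solution3.3$getValue(beta0))
beta_hat  <- solution3.3$getValue(beta)
pred <- sign(X %*% beta_hat + beta0_hat)
mean(pred == y)   # fraction of training points classified correctly
```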
```{r, echo=FALSE, fig.width = 7}
b0 <- solution3.3$getValue(beta0)
b <- solution3.3$getValue(beta)
## Plot support vector and classified points
plotdata <- as.data.frame(cbind(y, X))
names(plotdata) <- c("y", "X1", "X2")
ggplot(data = plotdata, aes(x = X1, y = X2)) +
geom_point(size = 3, color = factor(3 - plotdata$y)) +
geom_abline(slope = -b[1] / b[2], intercept = (1 - b0) / b[2]) +
labs(title = "Binary Support Vector Classifier")
```
## Disciplined Convex Programming (DCP)
Disciplined convex programming (DCP) is a system for constructing
mathematical expressions with known sign and curvature from a given library of
base functions. `CVXR` uses DCP to ensure that the specified
optimization problems are convex.
The user may find it helpful to read about how the DCP rules are applied
in other languages such
as
[Python](http://www.cvxpy.org/en/latest/tutorial/dcp/index.html),
[Matlab](http://cvxr.com/cvx/doc/dcp.html#),
and [Julia](http://convexjl.readthedocs.io/en/latest/types.html).
`CVXR` implements the same rules, and a short introduction
is [available here](/post/cvxr_dcp). The set of DCP functions is
described in the [Function Reference](/post/cvxr_functions/).
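DCP properties can also be queried directly on expressions and problems; a small sketch using the objects from the 'Hello World' example:
```{r, eval = FALSE}
## Sketch: querying DCP compliance and curvature directly
is_dcp(prob2.1)           # expected TRUE: the problem follows the DCP rules
is_convex(p_norm(x, 2))   # TRUE as an expression, but p_norm(x, 2) == 1 is not a valid constraint
```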
## Session Info
```{r}
sessionInfo()
```
## Source
[R Markdown](https://github.com/bnaras/cvxr_docs/blob/master/content/cvxr_examples/gentle_intro.Rmd)
## References