# Convex Optimization: Quadratic Programming

**Prerequisites**

- Linear Algebra
- Calculus
- Convex Optimization: Theoretical Foundations
- Convex Optimization: Linear Programs

**Outcomes**

- Know the general structure of quadratic programs
- Map linear least squares into a quadratic program
- Solve constrained linear least squares via quadratic programming

In [None]:
# uncomment to install cvxpy if necessary
# %pip install --user cvxpy

import cvxpy as cp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use("ggplot")
pd.set_option('display.float_format', lambda x: '%.4f' % x)

## Quadratic Programming

Recall the general form of a linear program:

$$\begin{align*}
\min_x \ & c^T x \\
& Ax \le b,\\
& x \ge 0
\end{align*}$$

Notice that both the objective function and the constraints are linear in the choice variable $x$

### General Form

An extension to this framework is to allow the objective function to be quadratic in $x$

An optimization problem of this form is called a quadratic program (QP)

The general form of a quadratic program is

$$\begin{align*}
\min_x \ & \frac{1}{2} x^T P x +  q^T x + r\\
& Gx \le h,\\
& Ax = b.
\end{align*}$$
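
The problem is convex when $P$ is symmetric positive semidefinite, and cvxpy can only certify `cp.quad_form(x, P)` as convex in that case. Below is a minimal sketch of a numerical check; the helper name `is_psd`, the tolerance, and the two test matrices are our own illustrative choices:

```python
import numpy as np

def is_psd(P, tol=1e-10):
    """Return True if P is symmetric with all eigenvalues >= -tol."""
    P = np.asarray(P, dtype=float)
    if not np.allclose(P, P.T):
        return False
    # eigvalsh is meant for symmetric matrices and returns real eigenvalues
    return bool(np.linalg.eigvalsh(P).min() >= -tol)

P_convex = np.array([[2.0, 0.0], [0.0, 1.0]])      # bowl shaped objective
P_nonconvex = np.array([[1.0, 0.0], [0.0, -1.0]])  # saddle, not convex
print(is_psd(P_convex), is_psd(P_nonconvex))
```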

### Geometry

The feasible space for a quadratic program is defined by the constraints

Because these constraints are all linear in $x$ (as in a linear program), the feasible space is a polyhedron (a polygon in two dimensions), just as in LP

![QPfeasibility](qp_feasibility.png)

> Reference: Boyd et al. 2004

## QP with cvxpy

We can use the `cvxpy` library to represent and solve quadratic programs

As an example suppose we need to minimize the function

$$g(x, y, z) = \frac{3}{2} x^2 + 2 xy + xz + 2y^2 + 2yz + \frac{3}{2}z^2+ 3x + z$$

Subject to the constraint that $x$, $y$, and $z$ are all at least -1

We need to formulate the matrix $P$, vector $q$, and scalar $r$ from the general form of the QP

We can do this by reading off coefficients:

\begin{align*}
P &= \left[\begin{matrix}3 & 2 & 1\\2 & 4 & 2\\1 & 2 & 3\end{matrix}\right]\\
q &= \left[\begin{matrix}3\\0\\1\end{matrix}\right] \\
r &= 0
\end{align*}
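
One way to sanity check coefficients read off by hand is to compare the matrix form with the written-out polynomial at a few random points. Note that each diagonal entry of $P$ is twice the coefficient on the matching squared term (because of the $\frac{1}{2}$ in the general form), while each off-diagonal entry equals the coefficient on the matching cross term. The sketch below assumes nothing beyond numpy; the helper `g` and the random test points are ours:

```python
import numpy as np

def g(x, y, z):
    """The example objective, written out term by term."""
    return (1.5*x**2 + 2*x*y + x*z + 2*y**2 + 2*y*z + 1.5*z**2
            + 3*x + z)

P = np.array([[3.0, 2.0, 1.0],
              [2.0, 4.0, 2.0],
              [1.0, 2.0, 3.0]])
q = np.array([3.0, 0.0, 1.0])
r = 0.0

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))
# matrix form 1/2 v'Pv + q'v + r evaluated at each test point
matrix_form = np.array([0.5 * v @ P @ v + q @ v + r for v in pts])
direct = np.array([g(*v) for v in pts])
print(np.allclose(matrix_form, direct))
```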


We can represent $P$ and $q$ as numpy arrays

In [None]:
def qp_example_arrays():
    P = np.array([[3, 2, 1],
                  [2, 4, 2],
                  [1, 2, 3]])
    q = np.array([3, 0, 1])
    r = 0
    return P, q, r

And now let's write a function that can use these inputs to solve our QP with cvxpy

In [None]:
def solve_qp1(P, q, r):
    N = len(q)
    assert P.shape[0] == P.shape[1] == N
    
    x = cp.Variable(N)
    obj = cp.Minimize(1/2*cp.quad_form(x, P) + q.T@x + r)
    prob = cp.Problem(
        obj,       # objective
        [x >= -1],  # list of constraints
    )
    
    ans = prob.solve()
    
    return x.value, prob

In [None]:
opt_x, prob1 = solve_qp1(*qp_example_arrays())
opt_x, prob1.value

In [None]:
# dual values (Lagrange multipliers) on the x >= -1 constraint
prob1.constraints[0].dual_value
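
The bound $x \ge -1$ used in `solve_qp1` also fits the $Gx \le h$ block of the general form: multiplying by $-1$ gives $-x \le 1$, so $G = -I$ and $h = \mathbf{1}$. The following sketch (our own check on random points, not part of the original example) confirms that the two descriptions pick out the same feasible points:

```python
import numpy as np

N = 3
G = -np.eye(N)   # -x <= 1  is the same as  x >= -1
h = np.ones(N)

rng = np.random.default_rng(0)
pts = rng.uniform(-2, 2, size=(1000, N))
# feasibility under the general form G x <= h
in_general_form = np.all(pts @ G.T <= h, axis=1)
# feasibility under the bound written directly
in_bound_form = np.all(pts >= -1, axis=1)
print(np.array_equal(in_general_form, in_bound_form))
```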

**Exercise**

In the code cell below you will find the outline for a Python class called `QP`

Your task is to implement the functions of `QP` such that the class can 

- Take in variables representing $P$, $q$, $r$, $G$, $h$, $A$, and $b$ from the general form of the quadratic program
- Formulate the general form quadratic program using `cvxpy`
- Solve the quadratic program

In [None]:
class QP:
    """
    Formulate and solve a general form quadratic program using cvxpy
    
    The quadratic program has the following representation
    
    min_x 1/2 x' P x + q'x + r
    s.t. G x <= h
         A x  = b
    """
    def __init__(self, P, q, r, G, h, A, b):
        pass
    
        # change the code below to create a cvxpy Variable
        self.x = None
        
        # formulate the QP here
        self.prob = None
    
    def solve(self):
        pass

### Special Case: Linear Regression

Now we will study the linear regression problem, from a convex optimization perspective

The linear regression problem takes as input $(x, y)$ pairs of observed data (where $x$ and $y$ may be vector valued) and a proposed family of models of the form

$$y = x \beta + \epsilon, \quad \epsilon \sim N\left(0, \sigma^2\right)$$

The task of linear regression is to select the parameter vector $\beta$ that maximizes the likelihood, given the data $(x, y)$

It can be shown that maximizing the likelihood is equivalent to minimizing the sum of squared residuals, which is defined as

$$r \equiv \sum (x \beta - y)^2 = ||x\beta - y||_2^2$$

For this reason, linear regression is also called least squares (LS) or least squares regression
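
To sketch why the two problems coincide, assume (in our own notation) $n$ independent observations $(x_i, y_i)$ and treat $\sigma^2$ as fixed. The Gaussian log-likelihood is then

$$\begin{align*}
\log L(\beta) &= \sum_{i=1}^n \log \left[ \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x_i \beta - y_i)^2}{2 \sigma^2}\right) \right] \\
&= -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i \beta - y_i)^2
\end{align*}$$

Only the last term depends on $\beta$, and it enters with a negative sign, so maximizing $\log L(\beta)$ is the same as minimizing the sum of squared residuals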

### Analytical Solution

It turns out that the unconstrained LS problem can be solved analytically

We'll work that out now, starting by manipulating the squared residual expression

\begin{align*}
||x\beta - y||_2^2 &= \left [(x \beta - y)^T(x \beta - y) \right] \\
&= \beta^T x^T x \beta - \beta^T x^T y - y^T x \beta + y^T y \\ 
&= \beta^T x^T x \beta - 2 y^T x \beta + y^T y
\end{align*}

Now we differentiate with respect to $\beta$ and set equal to 0:

\begin{align*}
0 &= 2 x^T x \beta - 2 x^T y &\Longrightarrow \\
x^Tx \beta &= x^T y &\Longrightarrow \\
\beta &= (x^T x)^{-1} x^T y
\end{align*}
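
As a quick check of the formula, the sketch below builds a small synthetic data set (the sizes, coefficients, and noise level are arbitrary choices of ours) and compares the normal equations with numpy's own least squares routine. It uses `np.linalg.solve` rather than an explicit inverse, which is usually the more numerically stable way to apply $(x^T x)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_params = 200, 3
x = rng.normal(size=(n_obs, n_params))
beta_true = np.array([1.0, -2.0, 0.5])
y = x @ beta_true + 0.1 * rng.normal(size=n_obs)

# normal equations: solve (x'x) beta = x'y without forming the inverse
beta_normal = np.linalg.solve(x.T @ x, x.T @ y)
# reference answer from numpy's least squares routine
beta_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)

print(np.allclose(beta_normal, beta_lstsq))
```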

We will leverage the fact that we know the analytical solution below...

We can also express the LS problem as a quadratic program

There are (for now) no constraints, so we are left with expressing the objective function in the form of a general form QP

Above we showed that 

$$||x \beta - y||_2^2 \equiv \beta^T x^T x \beta - 2 y^T x \beta + y^T y,$$

Which means we have a quadratic program

$$\min_{\beta} (1/2) \beta^T P \beta + q^T \beta + r$$

with 

\begin{align*}
P &= 2 x^T x \\
q &= -2 x^T y \\
r &= y^T y
\end{align*}
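
The mapping can be verified numerically. In the sketch below (synthetic data of our own choosing), the QP objective built from $P$, $q$, and $r$ is compared with the sum of squared residuals at a random $\beta$, and the first order condition $P\beta + q = 0$ is shown to reproduce the analytical LS solution. For 1-D numpy arrays `x.T @ y` and `y @ x` give the same vector, so either ordering can be used for `q`:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 4))
y = rng.normal(size=50)

P = 2 * x.T @ x
q = -2 * x.T @ y
r = y @ y

def qp_objective(beta):
    """General form QP objective 1/2 b'Pb + q'b + r."""
    return 0.5 * beta @ P @ beta + q @ beta + r

def ssr(beta):
    """Sum of squared residuals ||x b - y||^2."""
    return np.sum((x @ beta - y) ** 2)

beta_test = rng.normal(size=4)
print(np.isclose(qp_objective(beta_test), ssr(beta_test)))

# first order condition P beta + q = 0 recovers the LS solution
beta_foc = np.linalg.solve(P, -q)
beta_ls = np.linalg.solve(x.T @ x, x.T @ y)
print(np.allclose(beta_foc, beta_ls))
```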

#### Example: house prices

We'll do an example using data on house prices in King County, Washington (which includes the city of Seattle) from May 2014 to May 2015

Let's first load the data and take a look

> Note: The data comes from [Kaggle](https://www.kaggle.com/harlfoxem/housesalesprediction) . Variable definitions and additional documentation are available at that link.

In [None]:
url = "https://datascience.quantecon.org/assets/data/kc_house_data.csv"
df = pd.read_csv(url)
df.info()

In [None]:
# construct the "x" half of the observed data
df["log_sqft_living"] = np.log(df["sqft_living"])
x_cols = [
    'bedrooms',
    'bathrooms',
    'floors',
    'waterfront',
    'view',
    'condition',
    "log_sqft_living"
]

x = df.loc[:, x_cols].copy().astype(float)
x.head()

In [None]:
# notice the log here!
y = np.log(df["price"])
df["log_price"] = y
y.head()

In [None]:
df.plot.scatter(x="log_sqft_living" , y="log_price", alpha=0.35, s=1.5);

We'll first compute the analytical solution:


In [None]:
x_arr = x.to_numpy()
y_arr = y.to_numpy()
beta_hat = np.linalg.inv(x_arr.T @ x_arr) @ x_arr.T @ y_arr
pd.Series(beta_hat, index=list(x))

Now, let's solve this as a quadratic program using cvxpy:

In [None]:
def houses_qp(x, y):
    P = 2 * x.T @ x
    q = -2 * y.T @  x
    r = float(y @ y)
    
    beta = cp.Variable(len(q))
    obj = cp.Minimize(1/2*cp.quad_form(beta, P) + q.T@beta + r)
    prob = cp.Problem(
        obj,       # objective
        [],        # list of constraints
    )
    ans = prob.solve()
    
    return beta.value, prob

In [None]:
beta_qp, prob_houses = houses_qp(x_arr, y_arr)
betas = pd.DataFrame(dict(analytical=beta_hat, qp=beta_qp), index=list(x))
betas

We can see that the beta computed analytically closely matches the beta computed via quadratic programming

If we look at the value of the objective function (sum of squared residuals or SSR) at the optimal values we see that they are almost identical

In [None]:
prob_houses.value

In [None]:
((x @ betas).sub(y, axis=0)**2).sum()

#### Why?

If we end up with a slightly worse parameter vector beta (in the sense that the SSR is larger), why solve the problem as a quadratic program?

**constraints** and **regularization**

The LS problem is only analytically tractable if there are no constraints and the objective function is strictly the SSR

If we would like to add constraints on the value of our parameter vector $\beta$ we must use a constrained least squares routine... or quadratic programming!

### Extension: Constrained Regression

Suppose you work for a financial institution

Your firm has the capacity to invest in $N$ different assets (numbered 1 to $N$)

One day your boss comes to you and tells you to invest in asset $0$

However, due to regulatory constraints you are not allowed to open a position in asset $0$

What you decide to do is construct a portfolio over assets 1 to $N$ that approximates as closely as possible the exposure to asset $0$

Knowing quadratic programming and least squares regression you set up the following least squares regression problem:

\begin{align*}
y_0 &= \sum_{i=1}^N \beta_i y_i + \epsilon \\
&= y \beta + \epsilon
\end{align*}

Where you have two sets of constraints:

1. $\beta_i \ge 0 \; \forall i$
2. $\sum_i \beta_i = 1$

The associated QP is

\begin{align*}
\min_{\beta} \quad &|| y \beta - y_0 ||_2^2 \\
s.t. \quad & \mathbf{1}^T\beta = 1 \\
 & \beta \ge 0
\end{align*}

The vector $\beta$ can then be used as portfolio weights for assets 1 to $N$ such that the expected behavior of the portfolio matches the behavior of asset 0
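
With only two assets the constrained problem can be solved by hand, which gives a useful check on intuition. Writing $\beta = (w, 1 - w)$ removes the sum constraint and leaves a one-dimensional quadratic in $w$ on $[0, 1]$, whose minimizer is the unconstrained minimizer clipped to that interval. The sketch below uses synthetic series of our own (`y0`, `y1`, `y2` are hypothetical, not the equity data) and compares the clipped formula with a brute-force grid search:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 250
y1 = rng.normal(size=T)
y2 = rng.normal(size=T)
y0 = 0.3 * y1 + 0.7 * y2 + 0.05 * rng.normal(size=T)

# y beta = y2 + w * (y1 - y2), so minimize ||y2 + w d - y0||^2 over w
d = y1 - y2
w_unconstrained = d @ (y0 - y2) / (d @ d)
w = np.clip(w_unconstrained, 0.0, 1.0)   # enforce 0 <= w <= 1
beta = np.array([w, 1.0 - w])

# brute-force check over a fine grid of feasible weights
grid = np.linspace(0, 1, 10001)
ssr_grid = [np.sum((g * y1 + (1 - g) * y2 - y0) ** 2) for g in grid]
w_grid = grid[int(np.argmin(ssr_grid))]
print(beta, w_grid)
```

With more than two assets there is no such shortcut in general, which is where a QP solver becomes necessary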

#### Example: US Equities

Suppose that asset 0 is Apple Inc. stock (ticker AAPL)

We have 5 other equities: Walt Disney Co (DIS), Home Depot (HD), McDonalds (MCD), Microsoft (MSFT), Nike (NKE)

Our goal is to construct a synthetic exposure to AAPL using the other 5 equities

In [None]:
df = pd.read_csv("equities.csv", parse_dates=["Date"], index_col=["Date"])
df.plot()

In [None]:
y0_df = df["AAPL"]
y_df = df.drop("AAPL", axis=1)

In [None]:
def aapl_exposure(y_df, y0_df):
    y0 = y0_df.to_numpy()
    y = y_df.to_numpy()
    P = 2 * y.T @ y
    q = -2 * y0.T @  y
    r = float(y0 @ y0)
    
    beta = cp.Variable(len(q))
    obj = cp.Minimize(1/2*cp.quad_form(beta, P) + q.T@beta + r)
    prob = cp.Problem(
        obj,                          # objective
        [beta >= 0, cp.sum(beta) == 1],  # list of constraints
    )
    prob.solve()
    
    return pd.Series(beta.value, index=list(y_df)), prob

In [None]:
beta_aapl, prob_aapl = aapl_exposure(y_df, y0_df)
beta_aapl

In [None]:
prob_aapl.constraints[0].dual_value

In [None]:
prob_aapl.constraints

The synthetic asset is composed of about 1/4 MSFT and 3/4 NKE

Let's see what the daily returns of AAPL vs our portfolio look like

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))
(y_df @ beta_aapl).pct_change().rename("approx").plot(ax=ax, alpha=0.7)
y0_df.pct_change().plot(ax=ax, alpha=0.7)
ax.legend(loc=0)

### Extension: Regularization

Another extension we can make to the standard LS problem using QP is regularization

Regularization can be loosely thought of as any method that seeks to "tame" the value of parameters

The goal behind regularization is to avoid extreme parameter values

In predictive settings the rationale behind that is to avoid having a model or optimization routine focus too strongly on quirks that appear in training data that do not correspond to generic patterns in the underlying population

One very common form of regularization is called L2-regularization or Tikhonov regularization

This is applied by adding the following term to the objective function

$$\lambda || x ||_2^2,$$

where $\lambda \ge 0$ is a constant that governs the strength of the regularization and $x$ is the choice variable (the parameter vector $\beta$ in a regression problem)

When this term is added to the objective function, the values of $x$ are compressed towards zero (relative to the solution where this term is not added)

As $\lambda$ is increased, this compression is stronger
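
For the LS objective the penalized problem still has a closed form: minimizing $||x\beta - y||_2^2 + \lambda ||\beta||_2^2$ gives $\beta = (x^T x + \lambda I)^{-1} x^T y$. The sketch below uses synthetic data and a helper `ridge`, both of our own invention, to show the norm of the solution shrinking as $\lambda$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(100, 5))
y = x @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(size=100)

def ridge(x, y, lam):
    """Minimizer of ||x b - y||^2 + lam * ||b||^2."""
    k = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(k), x.T @ y)

lams = [0.0, 10.0, 100.0, 1000.0]
# Euclidean norm of the fitted coefficients for each penalty strength
norms = [np.linalg.norm(ridge(x, y, lam)) for lam in lams]
print(norms)
```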

Let's implement L2-regularization with cvxpy using our housing data example

In [None]:
def houses_qp_l2(x, y, lam):   
    beta = cp.Variable(x.shape[1])
    obj = cp.Minimize(
        cp.sum_squares(x @ beta - y) + lam * cp.sum_squares(beta)
    )
    prob = cp.Problem(
        obj,       # objective
        [],        # list of constraints
    )
    ans = prob.solve()
    
    return beta.value, prob

In [None]:
for lam in [100, 1000, 5000, 10000]:
    betas[f"qp_l2_{lam}"] = houses_qp_l2(x_arr, y_arr, lam)[0]
    
betas

In [None]:
1.78**2 + 0.16 ** 2

In [None]:
1.4043 ** 2 + 0.4661 ** 2

In [None]:
((x@betas).sub(y, axis=0)**2).sum()

In [None]:
(betas**2).sum(axis=0)

### Example: Huber Regression

We'll now work through an example of something called Huber Regression

This example was originally created by the cvxpy team

We accessed the example at the following GitHub repository and have repeated it here in this notebook almost verbatim: https://github.com/cvxgrp/cvx_short_course/blob/master/applications/huber_regression.ipynb

Credit (and gratitude!) goes to the original authors

The function $\phi(u;M)$ below is called the Huber function

$$
\phi(u;M) = \left\{ \begin{array}{ll} u^2 & |u|\leq M\\
2M |u| - M^2 & |u|>M
\end{array}\right.
$$

Relative to the squared function, the Huber function is more permissive of values of $u$ that are greater in absolute value than $M$
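
A few numbers make the difference concrete. With $M = 1$, a residual of $10$ costs $2 \cdot 10 - 1 = 19$ under the Huber function but $100$ under the squared function, so a single outlier has far less pull on the fit. The vectorized helper `huber` below is our own illustrative version:

```python
import numpy as np

def huber(u, M=1.0):
    """Huber function: quadratic for |u| <= M, linear beyond."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= M, u**2, 2*M*np.abs(u) - M**2)

u = np.array([0.5, 1.0, 2.0, 10.0])
print(huber(u))   # penalty under the Huber function
print(u**2)       # penalty under the squared function
```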

In the cells below we plot this function

In [None]:
class Huber:
    def __init__(self, M):
        self.M = M
    
    def __call__(self, u):
        out = np.zeros(len(u))
        small = np.abs(u) <= self.M
        out[small] = u[small]**2
        out[~small] = 2*self.M*abs(u[~small]) - self.M**2
        return out
    

In [None]:
fig, ax = plt.subplots()
phi = Huber(1)
u = np.linspace(-4, 4, 100)
ax.plot(u, phi(u), label="Huber")
ax.plot(u, u**2, label="L2")
ax.legend(loc=0);

#### Example

In the following code we do a numerical example of Huber regression.
We generate $m=450$ measurements with $n=300$ regressors

We randomly choose $\beta^\mathrm{true}$ and $x_i \sim \mathcal N(0,I)$

We set $y_i = (\beta^\mathrm{true})^Tx_i + \epsilon_i$, where $\epsilon_i \sim \mathcal N(0,1)$

Then with probability $p$ we replace $y_i$ with $-y_i$

The data has fraction $p$ of (non-obvious) wrong measurements

The distributions of "good" and "bad" $y_i$ are the same
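
The sign flips can be generated with a single Bernoulli draw per observation. The sketch below (sample size and $p$ are arbitrary values of ours) builds a vector of $\pm 1$ factors and checks that roughly a fraction $p$ of them are $-1$:

```python
import numpy as np

rng = np.random.default_rng(4)
m, p = 100_000, 0.1

# binomial(1, 1 - p) is 1 with probability 1 - p and 0 with probability p,
# so 2 * draw - 1 is +1 (keep the sign) or -1 (flip it)
factor = 2 * rng.binomial(1, 1 - p, size=m) - 1
flipped_share = np.mean(factor == -1)
print(flipped_share)
```

Multiplying each clean measurement by its factor then flips the chosen signs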

Our goal is to recover $\beta^\mathrm{true} \in {\bf R}^n$ from the measurements $y\in {\bf R}^m$

We compare three approaches: 

1. standard regression
2. Huber regression
3. "prescient" regression, where we know which measurements had their sign flipped

We generate $50$ problem instances, with $p$ varying from $0$ to $0.15$, and plot the relative error in reconstructing $\beta^\mathrm{true}$ for the three approaches

Notice that in the range $p \in [0,0.08]$, Huber regression matches prescient regression

Standard regression, by contrast, fails even for very small $p$

In [None]:
# Generate data for Huber regression.
import numpy as np
np.random.seed(1)
n = 300
SAMPLES = int(1.5*n)
beta_true = 5*np.random.normal(size=(n,1))
X = np.random.randn(n, SAMPLES)
Y = np.zeros((SAMPLES,1))
v = np.random.normal(size=(SAMPLES,1))

In [None]:
# Generate data for different values of p.
# Solve the resulting problems.
# WARNING this script takes a few minutes to run.
import cvxpy as cp

TESTS = 50
lsq_data = np.zeros(TESTS)
huber_data = np.zeros(TESTS)
prescient_data = np.zeros(TESTS)
p_vals = np.linspace(0,0.15, num=TESTS)
for idx, p in enumerate(p_vals):
    # Generate the sign changes.
    factor = 2*np.random.binomial(1, 1-p, size=(SAMPLES,1)) - 1
    Y = factor*X.T.dot(beta_true) + v
    
    # Form and solve a standard regression problem.
    beta = cp.Variable((n,1))
    fit = cp.norm(beta - beta_true)/cp.norm(beta_true)
    cost = cp.norm(X.T@beta - Y)
    prob = cp.Problem(cp.Minimize(cost))
    prob.solve()
    lsq_data[idx] = fit.value
    
    # Form and solve a prescient regression problem,
    # i.e., where the sign changes are known.
    cost = cp.norm(cp.multiply(factor, X.T@beta) - Y)
    cp.Problem(cp.Minimize(cost)).solve()
    prescient_data[idx] = fit.value
    
    # Form and solve the Huber regression problem.
    cost = cp.sum(cp.huber(X.T@beta - Y, 1))
    cp.Problem(cp.Minimize(cost)).solve()
    huber_data[idx] = fit.value

In [None]:
# Plot the relative reconstruction error for 
# least-squares, prescient, and Huber regression.
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(p_vals, lsq_data, label='Least squares')
ax.plot(p_vals, huber_data, label='Huber')
ax.plot(p_vals, prescient_data, label='Prescient')
ax.set_ylabel(r'$\|\beta - \beta^{\mathrm{true}}\|_2/\|\beta^{\mathrm{true}}\|_2$')
ax.set_xlabel('p')
ax.legend(loc='upper left');

In [None]:
# Plot the relative reconstruction error for Huber and prescient regression,
# zooming in on smaller values of p.
indices = np.where(p_vals <= 0.08)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(p_vals[indices], huber_data[indices], 'g', label='Huber')
ax.plot(p_vals[indices], prescient_data[indices], 'r', label='Prescient')
ax.set_ylabel(r'$\|\beta - \beta^{\mathrm{true}}\|_2/\|\beta^{\mathrm{true}}\|_2$')
ax.set_xlabel('p')
ax.set_xlim([0, 0.07])
ax.set_ylim([0, 0.05])
ax.legend(loc='upper left');