In [1]:
using Plots, Random, LinearAlgebra, Statistics, SparseArrays
include("proxgrad.jl")

proxgrad_const (generic function with 1 method)

# Solving ERM problems

The file `proxgrad.jl` contains code for solving regularized empirical risk minimization (ERM) problems. It provides the optimization function `proxgrad` together with a large number of predefined loss functions and regularizers.
    
The function `proxgrad` solves regularized ERM problems of the form
$$
\mbox{minimize} \quad \sum_{i=1}^n \ell(y_i, w^T x_i) + r(w).    
$$
It solves these with the proximal gradient method, which we will learn about shortly.
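
As a preview, proximal gradient alternates a gradient step on the loss with a proximal step on the regularizer. The block below is a minimal sketch for quadratic loss with $\ell_1$ regularization; it is not the implementation in `proxgrad.jl`. The names `soft_threshold`, `proxgrad_sketch`, `Xs`, `ys`, `ws` and `obj_sketch`, the constant step size and the iteration count are all assumptions made for illustration.

```julia
using LinearAlgebra, Random

# Proximal operator of κ*||w||_1: elementwise soft-thresholding
soft_threshold(v, κ) = sign.(v) .* max.(abs.(v) .- κ, 0)

# Sketch: minimize (1/n)||Xw - y||^2 + λ||w||_1 with a constant step size
function proxgrad_sketch(X, y, λ; maxiters=500)
    n, d = size(X)
    w = zeros(d)
    # The gradient of the loss is Lipschitz with constant (2/n)*σ_max(X)^2,
    # so a step of one over that constant guarantees descent.
    t = n / (2 * opnorm(X)^2)
    for _ in 1:maxiters
        g = (2/n) * X' * (X*w - y)         # gradient of the smooth loss
        w = soft_threshold(w - t*g, t*λ)   # proximal step on the regularizer
    end
    return w
end

# Illustrative data, kept under separate names from the data used later
Random.seed!(1)
Xs = randn(50, 10)
ys = Xs * randn(10) + randn(50)
ws = proxgrad_sketch(Xs, ys, 0.1)

# Objective value, to compare the result against the starting point w = 0
obj_sketch(w) = sum(abs2, Xs*w - ys)/50 + 0.1*norm(w, 1)
obj_sketch(ws), obj_sketch(zeros(10))
```

With this step size each iteration cannot increase the objective, so the final value is no larger than the value at the starting point.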

You can select from a range of losses. For real-valued $y$, try:
   * quadratic loss - `QuadLoss()`
   * $\ell_1$ loss - `L1Loss()`
   * quantile loss (for $\alpha$ quantile) - `QuantileLoss(α)`
 
For Boolean $y$, try:
   * hinge loss - `HingeLoss()`
   * logistic loss - `LogisticLoss()`
   * weighted hinge loss - `WeightedHingeLoss()`

For nominal $y$, try:
   * multinomial loss - `MultinomialLoss()`
   * one vs all loss - `OvALoss()`
       * (by default, it uses the logistic loss for the underlying binary classifier)

For ordinal $y$, try:
   * ordinal hinge loss - `OrdinalHingeLoss()`
   * bigger vs smaller loss - `BvSLoss()`
       * (by default, it uses the logistic loss for the underlying binary classifier)
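
To make the real-valued and Boolean losses above concrete, here are their textbook definitions as functions of the target $y$ and the prediction $z = w^T x$. This is a sketch with illustrative function names, not the definitions used in `proxgrad.jl`, whose scaling conventions may differ.

```julia
# Textbook scalar losses; the names are illustrative, not those in proxgrad.jl

# Losses for real-valued y
quad_loss(y, z) = (y - z)^2
l1_loss(y, z)   = abs(y - z)
# Under-prediction is weighted by α and over-prediction by 1 - α,
# so the minimizing constant prediction is the α quantile of y
quantile_loss(y, z, α) = α*max(y - z, 0) + (1 - α)*max(z - y, 0)

# Losses for Boolean y encoded as -1 or +1
hinge_loss(y, z)    = max(1 - y*z, 0)
logistic_loss(y, z) = log1p(exp(-y*z))

# A confident correct prediction costs nothing under the hinge loss,
# while a confident wrong prediction is penalized linearly
hinge_loss(1, 2.0), hinge_loss(-1, 2.0)
```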
       
It also provides a few regularizers, including 
   * quadratic regularization - `QuadReg()`
   * $\ell_1$ regularization - `OneReg()`
   * nonnegative constraint - `NonNegConstraint()`
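
Each regularizer enters the proximal gradient method through its proximal operator, $\mathrm{prox}_{\kappa r}(v) = \mathrm{argmin}_w \; \kappa\, r(w) + \frac 1 2 \|w - v\|^2$. The sketch below gives the standard closed forms, assuming the three regularizers are $\|w\|^2$, $\|w\|_1$ and the indicator of $w \ge 0$. The function names and the test vector `vdemo` are illustrative, and the scaling used inside `proxgrad.jl` may differ.

```julia
# prox_{κ r}(v) = argmin_w κ*r(w) + (1/2)||w - v||^2, in closed form

# r(w) = ||w||^2: shrink every entry toward zero by the same factor
prox_quad(v, κ) = v ./ (1 + 2κ)

# r(w) = ||w||_1: soft-thresholding, which sets small entries exactly to zero
prox_l1(v, κ) = sign.(v) .* max.(abs.(v) .- κ, 0)

# r(w) = 0 if w ≥ 0 and ∞ otherwise: project onto the nonnegative orthant
prox_nonneg(v) = max.(v, 0)

vdemo = [-2.0, 0.3, 1.5]
prox_quad(vdemo, 0.5), prox_l1(vdemo, 0.5), prox_nonneg(vdemo)
```

The quadratic regularizer only shrinks entries, while the $\ell_1$ regularizer and the nonnegative constraint can produce exact zeros.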
       
Below, we provide some examples of how to use the `proxgrad` function to fit regularized ERM problems.

## Generate random data set

First (as usual), we'll generate some random data to try our methods on.

In [2]:
Random.seed!(0)
n = 50
d = 10
X = randn(n,d)
w♮ = randn(d)
y = X*w♮ + randn(n);

## Quadratic loss, quadratic regularizer

$$
\mbox{minimize} \quad \frac 1 n \|Xw - y\|^2 + \lambda \|w\|^2
$$

In [3]:
# we form \frac 1 n || ⋅ ||^2 by multiplying the QuadLoss() function by 1/n
loss = 1/n*QuadLoss()

# we form λ|| ⋅ ||^2 by multiplying the QuadReg() function by λ
λ = .1
reg = λ*QuadReg()

# minimize 1/n ||Xw - y||^2 + λ||w||^2
w = proxgrad(loss, reg, X, y, maxiters=10) 

norm(X*w-y) / norm(y)

0.257873738852037

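`maxiters`, the maximum number of iterations, controls how fully we converge.
You can try increasing it to see how the error changes. Keep in mind that `proxgrad` minimizes the regularized objective, not the relative residual, so the residual need not decrease as the iterates converge.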

In [4]:
w = proxgrad(loss, reg, X, y, maxiters=100) 
norm(X*w-y) / norm(y)

0.2607432004382123
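
For this particular objective there is a closed-form answer to compare against: setting the gradient of $\frac 1 n \|Xw - y\|^2 + \lambda \|w\|^2$ to zero gives $(X^T X + n \lambda I) w = X^T y$. The sketch below solves that system on freshly generated data. The helper `ridge_closed_form` and the variables `Xr`, `yr`, `wr` and `wls` are names introduced here for illustration and are not part of `proxgrad.jl`.

```julia
using LinearAlgebra, Random

# Solve (X'X + nλI) w = X'y, the stationarity condition of
# (1/n)||Xw - y||^2 + λ||w||^2
function ridge_closed_form(X, y, λ)
    n = size(X, 1)
    return (X'X + n*λ*I) \ (X'y)
end

# Illustrative data under separate names, so the notebook's X and y are untouched
Random.seed!(0)
Xr = randn(50, 10)
yr = Xr*randn(10) + randn(50)

wr  = ridge_closed_form(Xr, yr, 0.1)
wls = Xr \ yr    # unregularized least squares, for comparison

# Relative residuals of the regularized and unregularized fits
norm(Xr*wr - yr)/norm(yr), norm(Xr*wls - yr)/norm(yr)
```

The regularized solution gives up some fit in exchange for a smaller $\|w\|$, so its residual is never below the least squares residual. That is one reason running more iterations need not lower the residual. Calling `ridge_closed_form(X, y, λ)` on the notebook's data gives a reference point for the `proxgrad` result.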

## Hinge loss, quadratic regularizer

$$
\mbox{minimize} \quad \frac 1 n \sum_{i=1}^n (1 - y_i w^T x_i)_+ + \lambda \|w\|^2
$$

In [5]:
ybool = Int.(sign.(y)) # form a Boolean target with values in {-1, +1}

# we form \frac 1 n \sum_{i=1}^n (1 - ⋅ )_+ by multiplying the HingeLoss() function by 1/n
loss = 1/n*HingeLoss()

# we form λ|| ⋅ ||^2 by multiplying the QuadReg() function by λ
λ = .1
reg = λ*QuadReg()

# minimize \frac 1 n \sum_{i=1}^n (1 - y_i w^T x_i)_+ + λ||w||^2
w = proxgrad(loss, reg, X, ybool, maxiters=100) 

norm(X*w-y) / norm(y)

0.7258075872083293
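
For a Boolean target, a natural complement to the relative residual is the misclassification rate, the fraction of examples where the sign of $w^T x_i$ disagrees with the label. The sketch below assumes labels in $\{-1, +1\}$. The helper `misclassification_rate` and the variables `Xc`, `wc` and `yc` are illustrative names, not part of `proxgrad.jl`.

```julia
using Statistics, Random

# Fraction of examples whose predicted sign disagrees with the label in {-1, +1}
misclassification_rate(X, w, ybool) = mean(sign.(X*w) .!= ybool)

# Illustrative data: noiseless labels generated by a known weight vector
Random.seed!(2)
Xc = randn(50, 10)
wc = randn(10)
yc = Int.(sign.(Xc*wc))

misclassification_rate(Xc, wc, yc)    # 0.0, since wc generated the labels
misclassification_rate(Xc, -wc, yc)   # 1.0, since every predicted sign is flipped
```

In the notebook, `misclassification_rate(X, w, ybool)` would report how often the hinge-loss fit predicts the wrong sign on the training data.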

# Homework question 

Use the `proxgrad` function to fit the following objective:

$$
\mbox{minimize} \quad \frac 1 n \sum_{i=1}^n \log(1 + \exp(- \text{ybool}_i w^T x_i)) + \lambda \|w\|^2
$$
for $\lambda = 0.5$.