# ReLU Layers

We can write a ReLU layer $z = \max(Wx+b, 0)$, where the maximum is taken elementwise, as the
convex optimization problem
\begin{equation}
\begin{array}{ll}
\mbox{minimize} & \|z-\tilde Wx - b\|_2^2 \\[.2cm]
\mbox{subject to} & z \geq 0, \\
& \tilde W = W,
\end{array}
\label{eq:prob}
\end{equation}
with variables $z$ and $\tilde W$,
and parameters $W$, $b$, and $x$.
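(We have added the extra variable $\tilde W$, constrained to equal $W$, so
that the problem is DPP, i.e., compliant with disciplined parametrized programming: the product $Wx$ of two parameters is not DPP, whereas the product $\tilde W x$ of a variable and a parameter is.)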
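To see why the solution of the problem above is the ReLU, let $a = Wx + b$. The constraint $\tilde W = W$ gives $\tilde W x = Wx$, so the objective becomes $\sum_i (z_i - a_i)^2$, which separates across coordinates and is strictly convex in $z$ (hence the solution is unique). With multipliers $\lambda_i$ for the constraints $z_i \geq 0$, the optimality (KKT) conditions and their two cases are
\begin{equation}
\begin{array}{ll}
\mbox{KKT conditions:} & 2(z_i - a_i) - \lambda_i = 0, \quad \lambda_i \geq 0, \quad z_i \geq 0, \quad \lambda_i z_i = 0, \\[.2cm]
\mbox{if } a_i > 0: & z_i = a_i, \quad \lambda_i = 0, \\
\mbox{if } a_i \leq 0: & z_i = 0, \quad \lambda_i = -2a_i.
\end{array}
\end{equation}
In both cases $z_i = \max(a_i, 0)$, so $z = \max(Wx+b, 0)$: the layer output is the Euclidean projection of $Wx+b$ onto the nonnegative orthant.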

We can embed this problem into a PyTorch `Module` and use it
as a layer in a sequential neural network.
We note that this example is purely illustrative;
one can implement a ReLU layer much more efficiently
by directly performing the matrix multiplication, vector addition,
and then taking the positive part.
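The contrast can be made concrete with a minimal pure-Python sketch (no CVXPY or PyTorch; the helper names `affine`, `relu_direct`, and `relu_by_projected_gradient` are hypothetical). It computes the layer directly, and also by solving the problem above with projected gradient descent. That solver is swapped in here purely for illustration; it is not what `CvxpyLayer` uses. Both routes give the same $z$:

```python
def affine(W, x, b):
    """Compute W x + b for a matrix W (list of rows), vector x, and vector b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def relu_direct(W, x, b):
    """Efficient route: matrix multiply, add the bias, take the positive part."""
    return [max(a_i, 0.0) for a_i in affine(W, x, b)]


def relu_by_projected_gradient(W, x, b, step=0.1, iters=200):
    """Solve  minimize ||z - (W x + b)||_2^2  subject to  z >= 0
    by projected gradient descent, starting from z = 0."""
    a = affine(W, x, b)
    z = [0.0] * len(a)
    for _ in range(iters):
        # The gradient 2 (z - a) is 2-Lipschitz, so any step in (0, 1) converges.
        # Take a gradient step, then project onto z >= 0 by clipping at zero.
        z = [max(z_i - step * 2.0 * (z_i - a_i), 0.0) for z_i, a_i in zip(z, a)]
    return z


W = [[1.0, -2.0, 0.5],
     [0.0, 1.0, -1.0]]
x = [1.0, 0.5, 2.0]
b = [0.3, -0.2]

print(relu_direct(W, x, b))                 # [1.3, 0.0]
print(relu_by_projected_gradient(W, x, b))  # agrees up to rounding
```

In PyTorch the direct route is a single expression such as `torch.relu(x @ W.T + b)`, which costs one matrix-vector product rather than an iterative solve followed by a derivative computation.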

```python
from cvxpylayers.torch import CvxpyLayer
import torch
import cvxpy as cp
```

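```python
class ReluLayer(torch.nn.Module):
    def __init__(self, D_in, D_out):
        super().__init__()
        # Trainable weights and bias, initialized near zero.
        self.W = torch.nn.Parameter(1e-3*torch.randn(D_out, D_in))
        self.b = torch.nn.Parameter(1e-3*torch.randn(D_out))
        # CVXPY variables of the problem above.
        z = cp.Variable(D_out)
        Wtilde = cp.Variable((D_out, D_in))
        # CVXPY parameters; their values are supplied on each forward pass.
        W = cp.Parameter((D_out, D_in))
        b = cp.Parameter(D_out)
        x = cp.Parameter(D_in)
        objective = cp.Minimize(cp.sum_squares(z - Wtilde@x - b))
        constraints = [z >= 0, Wtilde == W]
        prob = cp.Problem(objective, constraints)
        # Differentiable layer mapping the parameters (W, b, x) to the solution z.
        self.layer = CvxpyLayer(prob, [W, b, x], [z])

    def forward(self, x):
        # When x is batched (shape (batch_size, D_in)), repeat W and b along a
        # leading batch dimension so every problem in the batch gets them.
        # The layer returns a tuple with one tensor per variable; [0] selects z.
        if x.ndim == 2:
            batch_size = x.shape[0]
            return self.layer(self.W.repeat(batch_size, 1, 1),
                              self.b.repeat(batch_size, 1), x)[0]
        else:
            return self.layer(self.W, self.b, x)[0]
```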

We generate synthetic data and create a network of two `ReluLayer`s followed by a linear layer.

```python
torch.manual_seed(0)
net = torch.nn.Sequential(
    ReluLayer(20, 20),
    ReluLayer(20, 20),
    torch.nn.Linear(20, 1)
)
X = torch.randn(300, 20)
Y = torch.randn(300, 1)
```

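Now we can optimize the parameters of the network using, for example, the Adam optimizer.
Each of the 25 iterations below passes all 300 examples through both `ReluLayer`s, so the code solves $300 \times 2 \times 25 = 15000$ convex optimization problems and differentiates through all 15000 solutions.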

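```python
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()
for _ in range(25):
    opt.zero_grad()
    loss = loss_fn(net(X), Y)  # forward pass solves 2 x 300 problems
    print(loss.item())
    loss.backward()            # backpropagates through every solution
    opt.step()
```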

```
1.0796713829040527
1.0764707326889038
1.0727819204330444
1.067252516746521
1.0606187582015991
1.051621913909912
1.0402582883834839
1.0264172554016113
1.0121591091156006
0.9986547231674194
0.9878703951835632
0.9796753525733948
0.9698525667190552
0.9556602239608765
0.939254105091095
0.9228951930999756
0.906936764717102
0.8898395299911499
0.8709890246391296
0.8507254123687744
0.8293333053588867
0.8077667951583862
0.7869061231613159
0.7656839489936829
0.742659330368042
```
