<a href="https://colab.research.google.com/github/aethelind/notebooks-misc/blob/main/learn_solution_synthetic_aaai_2.1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Decision-Focused Learning Simple Synthetic Learn Problem 2**

# Get required files

In [1]:
# Clear out directory
!rm -rf *
# Download data_decisions_benchmarks.zip and unzip diverse_recommendation_data.pickle
!curl https://bryanwilder.github.io/files/data_decisions_benchmarks.zip | jar xv benchmarks_release/diverse_recommendation_data.pickle
# Move diverse_recommendation_data.pickle to current directory
!mv benchmarks_release/diverse_recommendation_data.pickle .
# Remove empty directory
!rm -rf benchmarks_release
# Download hetrec2011-movielens-2k-v2.zip and unzip movie_actors.dat and user_ratedmovies.dat
!curl https://files.grouplens.org/datasets/hetrec2011/hetrec2011-movielens-2k-v2.zip | jar xv movie_actors.dat user_ratedmovies.dat



# Libraries

In [2]:
## submodular.py
import torch

class ContinuousOptimizer(torch.autograd.Function):
    """
    pytorch module for differentiable submodular maximization. The forward pass 
    computes the optimal x for given parameters. The backward pass differentiates 
    that optimal x wrt the parameters.
    """

    @staticmethod
    def forward(ctx, params, optimize_func, get_dgradf_dparams, get_hessian=None, max_x=1., verbose=False):
        """
        Computes the optimal x using the supplied optimizer. 
        """
        ctx.optimize_func = optimize_func
        ctx.get_dgradf_dparams = get_dgradf_dparams
        ctx.verbose = verbose
        ctx.get_hessian = get_hessian
        ctx.all_xs = []
        ctx.max_x = max_x

        import numpy as np
        with torch.enable_grad():
            x = ctx.optimize_func(params, verbose=ctx.verbose)
        ctx.x = x.data
        ctx.all_xs.append(ctx.x.detach().numpy())
        ctx.params = params
        ctx.xgrad = x.grad.data
        return x.data

    @staticmethod
    def backward(ctx, grad_output):
        """
        Differentiates the optimal x returned by the forward pass with respect
        to the ratings matrix that was given as input.
        """
        import numpy as np
        from torch.autograd import Variable
        x = ctx.x
        params = ctx.params
        xgrad = ctx.xgrad
        dxdr = ContinuousOptimizer.get_dxdr(x.detach().numpy(), -xgrad.detach().numpy(), params.detach(
        ).numpy(), ctx.get_dgradf_dparams, ctx.get_hessian, ctx.max_x)
        dxdr_t = torch.from_numpy(np.transpose(dxdr))
        out = torch.mm(dxdr_t.float(), grad_output.view(len(x), 1))
        return out.view_as(params), None, None, None, None

    @staticmethod
    def get_dxdr(x, grad, params, get_dgradf_dparams, get_hessian, max_x):
        '''
        Returns the derivative of the optimal solution in the region around x in 
        terms of the rating matrix r. 

        x: an optimal solution

        grad: df/dx at x

        params: the current parameter settings
        '''
        import numpy as np
        import scipy as sp
        import scipy.sparse
        import scipy.linalg
        n = len(x)
        # first get the optimal dual variables via the KKT conditions
        # dual variable for constraint sum(x) <= k
        if np.logical_and(x > 0, x < max_x).any():
            lambda_sum = np.mean(grad[np.logical_and(x > 0, x < max_x)])
        else:
            lambda_sum = 0
        # dual variable for constraint x <= max_x
        lambda_upper = []
        # dual variable for constraint x >= 0
        lambda_lower = []
        for i in range(n):
            if np.abs(x[i] - max_x) < 0.000001:
                lambda_upper.append(grad[i] - lambda_sum)
            else:
                lambda_upper.append(0)
            if x[i] > 0:
                lambda_lower.append(0)
            else:
                lambda_lower.append(grad[i] - lambda_sum)
        # number of constraints
        m = 2*n + 1
        # collect value of dual variables
        lam = np.zeros((m))
        lam[0] = lambda_sum
        lam[1:(n+1)] = lambda_upper
        lam[n+1:] = lambda_lower
        diag_lambda = np.matrix(np.diag(lam))
        # collect value of constraints
        g = np.zeros((m))
        # TODO: replace the second x.sum() with k so that this is actually generally correct
        g[0] = x.sum() - x.sum()
        g[1:(n+1)] = x - max_x
        g[n+1:] = -x
        diag_g = np.matrix(np.diag(g))
        # gradient of constraints wrt x
        dgdx = np.zeros((m, n))
        # gradient of constraint sum(x) <= k
        dgdx[0, :] = 1
        # gradient of constraints x <= 1
        for i in range(1, n+1):
            dgdx[i, i-1] = 1
        # gradient of constraints x >= 0 <--> -x <= 0
        for i in range(n+1, m):
            dgdx[i, i-(n+1)] = -1
        dgdx = np.matrix(dgdx)
        # the Hessian matrix -- all zeros for now
        if get_hessian is None:
            H = np.matrix(np.zeros((n, n)))
        else:
            H = get_hessian(x, params)
        # coefficient matrix for the linear system
        A = np.bmat([[H, np.transpose(dgdx)], [diag_lambda*dgdx, diag_g]])
        # add 0.01*I to improve conditioning
        A = A + 0.01*np.eye(n+m)
        # RHS of the linear system, mostly partial derivative of grad f wrt params
        dgradf_dparams = get_dgradf_dparams(x, params, num_samples=1000)
        reshaped = np.zeros(
            (dgradf_dparams.shape[0], dgradf_dparams.shape[1]*dgradf_dparams.shape[2]))
        for i in range(n):
            reshaped[i] = dgradf_dparams[i].flatten()
        b = np.bmat([[reshaped], [np.zeros((m, reshaped.shape[1]))]])
        # solution to the system
        derivatives = sp.linalg.solve(A, b)
        if np.isnan(derivatives).any():
            print('report')
            print(np.isnan(A).any())
            print(np.isnan(b).any())
            print(np.isnan(dgdx).any())
            print(np.isnan(diag_lambda).any())
            print(np.isnan(diag_g).any())
            print(np.isnan(dgradf_dparams).any())
        # first n are derivatives of primal variables
        derivatives = derivatives[:n]
        return derivatives
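
Read off the code above, `get_dxdr` stacks the constraints as $g(x) = (\sum_i x_i - k,\; x - \texttt{max\_x},\; -x) \le 0$ with multipliers $\lambda$ and solves a single linear system for the sensitivities. The block below is only a compact restatement of what the code assembles: the sign conventions are those of the code rather than a fresh derivation, $\theta$ is an assumed symbol for the flattened `params`, $G = \partial g / \partial x$ is `dgdx`, and $H$ is the matrix returned by `get_hessian` (all zeros when it is `None`). Note that the code evaluates the first entry of $g$ as `x.sum() - x.sum()`, i.e. it treats the budget constraint as tight.

```latex
\left(
\begin{bmatrix}
H & G^{\top} \\
\operatorname{diag}(\lambda)\, G & \operatorname{diag}\bigl(g(x)\bigr)
\end{bmatrix}
+ 0.01\, I
\right)
\begin{bmatrix}
\partial x / \partial \theta \\
\partial \lambda / \partial \theta
\end{bmatrix}
=
\begin{bmatrix}
\partial (\nabla_x f) / \partial \theta \\
0
\end{bmatrix}
```

Only the first $n$ rows of the solution, $\partial x / \partial \theta$, are returned and used in the backward pass.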


In [3]:
# coverage.py
import torch
import numpy as np
from numba import jit


@jit
def gradient_coverage(x, P, w):
    n = len(w)
    m = len(x)
    grad = np.zeros(m, dtype=np.float32)
    for i in range(n):
        p_fail = 1 - x*P[:, i]
        p_all_fail = np.prod(p_fail)
        for j in range(m):
            grad[j] += w[i] * P[j, i] * p_all_fail/p_fail[j]
    return grad


@jit
def hessian_coverage(x, P, w):
    n = len(w)
    m = len(x)
    hessian = np.zeros((m, m), dtype=np.float32)
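    for i in range(n):
        p_fail = 1 - x*P[:, i]
        p_all_fail = np.prod(p_fail)
        for j in range(m):
            for k in range(m):
                # the objective is multilinear in x, so the diagonal is zero;
                # off-diagonal terms are accumulated over all targets i
                if j != k:
                    hessian[j, k] += -w[i] * P[j, i] * P[k, i] * \
                        p_all_fail/(p_fail[j] * p_fail[k])
    return hessian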


@jit
def objective_coverage(x, P, w):
    n = len(w)
    total = 0
    for i in range(n):
        p_fail = 1 - x*P[:, i]
        p_all_fail = np.prod(p_fail)
        total += w[i] * (1 - p_all_fail)
    return total


class CoverageInstanceMultilinear(torch.autograd.Function):
    """
    Represents a coverage instance with given coverage probabilities
    P and weights w. Forward pass computes the objective value (if evaluate_forward
    is true). Backward computes the gradients wrt decision variables x.
    """
    @staticmethod
    def forward(ctx, x, P, w, evaluate_forward):
        ctx.evaluate_forward = evaluate_forward
        if not isinstance(P, np.ndarray):
            P = P.detach().numpy()
        if not isinstance(w, np.ndarray):
            w = w.detach().numpy()
        ctx.P = P
        ctx.w = w

        ctx.x = x.detach().numpy()
        if ctx.evaluate_forward:
            out = objective_coverage(ctx.x, ctx.P, ctx.w)
        else:
            out = -1
        return torch.tensor(out).float()

    @staticmethod
    def backward(ctx, grad_in):
        grad = gradient_coverage(ctx.x, ctx.P, ctx.w)
        return torch.from_numpy(grad).float()*grad_in.float(), None, None, None


def optimize_coverage_multilinear(P, w, verbose=True, k=10, c=1., minibatch_size=None):
    '''
    Run some variant of SGD for the coverage problem with given 
    coverage probabilities P and weights w

    '''
    import torch
    # from utils import project_uniform_matroid_boundary as project

    # objective which will provide gradient evaluations
    # coverage = CoverageInstanceMultilinear.apply(P, w, verbose) # move to call below
    # decision variables
    x = torch.zeros(P.shape[0], requires_grad=True)
    # set up the optimizer
    learning_rate = 0.1
    optimizer = torch.optim.SGD(
        [x], momentum=0.9, lr=learning_rate, nesterov=True)
    # take projected stochastic gradient steps
    for t in range(10):
        loss = -CoverageInstanceMultilinear.apply(x, P, w, verbose)
        if verbose:
            print(t, -loss.item())
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()
        x.data = torch.from_numpy(project_uniform_matroid_boundary(x.data.numpy(), k, 1/c)).float()
    return x


@jit
def dgrad_coverage(x, P, num_samples, w):
    n = len(w)
    m = len(x)
    dgrad = np.zeros((m, m, n), dtype=np.float32)
    for i in range(n):
        p_fail = 1 - x*P[:, i]
        p_all_fail = np.prod(p_fail)
        for j in range(m):
            for k in range(m):
                if j == k:
                    dgrad[j, k, i] = w[i] * p_all_fail/p_fail[j]
                else:
                    dgrad[j, k, i] = -w[i] * x[k] * P[j, i] * \
                        p_all_fail/(p_fail[j] * p_fail[k])
    return dgrad


@jit
def dgrad_coverage_stochastic(x, P, num_samples, w, num_real_samples):
    n = len(w)
    m = len(x)
    rand_rows = np.random.choice(list(range(m)), num_real_samples)
    rand_cols = np.random.choice(list(range(n)), num_real_samples)

    dgrad = np.zeros((m, m, n), dtype=np.float32)

    p_fail = np.zeros((n, m), dtype=np.float32)
    p_all_fail = np.zeros((n), dtype=np.float32)
    for i in range(n):
        p_fail[i] = 1 - x*P[:, i]
        p_all_fail[i] = np.prod(p_fail[i])

    for sample in range(num_real_samples):
        k = rand_rows[sample]
        i = rand_cols[sample]
        for j in range(m):
            if j == k:
                dgrad[j, k, i] = w[i] * p_all_fail[i]/p_fail[i, j]
            else:
                dgrad[j, k, i] = -w[i] * x[k] * P[j, i] * \
                    p_all_fail[i]/(p_fail[i, j] * p_fail[i, k])
    return dgrad
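
The gradient formula used in `gradient_coverage` can be sanity-checked against central finite differences. The sketch below re-implements the objective and gradient in plain NumPy (no numba) on a small random instance; the helper names `objective` and `gradient` and the random test data are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np

def objective(x, P, w):
    # f(x) = sum_i w_i * (1 - prod_j (1 - x_j * P[j, i]))
    return float(np.sum(w * (1.0 - np.prod(1.0 - x[:, None] * P, axis=0))))

def gradient(x, P, w):
    # df/dx_j = sum_i w_i * P[j, i] * prod_{l != j} (1 - x_l * P[l, i])
    p_fail = 1.0 - x[:, None] * P          # shape (m, n)
    p_all_fail = np.prod(p_fail, axis=0)   # shape (n,)
    return np.sum(w * P * p_all_fail / p_fail, axis=1)

# small random instance (hypothetical data, kept away from 0 and 1
# so that no p_fail entry is zero)
rng = np.random.default_rng(0)
m, n = 5, 3
P = rng.uniform(0.1, 0.9, size=(m, n))
w = np.ones(n)
x = rng.uniform(0.1, 0.9, size=m)

# central finite differences along each coordinate direction
eps = 1e-6
numeric = np.array([
    (objective(x + eps * e, P, w) - objective(x - eps * e, P, w)) / (2 * eps)
    for e in np.eye(m)
])
analytic = gradient(x, P, w)
print(np.max(np.abs(numeric - analytic)))
```

Because the objective is multilinear in `x`, the central difference along a single coordinate is exact up to floating-point round-off, so the printed discrepancy should be tiny.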


In [4]:
## utils.py

def project_uniform_matroid_boundary(x, k, c=1):
    '''
    Exact projection algorithm of Karimi et al. This is the projection implementation
    that should be used now.
    
    Projects x onto the set {y: 0 <= y <= 1/c, ||y||_1 = k}
    '''
    import numpy as np
    k *= c
    n = len(x)
    x = x.copy()
    alpha_upper = x/c
    alpha_lower = (x*c - 1)/c**2
    S = []
    S.extend(alpha_lower)
    S.extend(alpha_upper)
    S.sort()
    S = np.unique(S)
    h = n
    alpha = min(S) - 1
    m = 0
    for i in range(len(S)):
        hprime = h + (S[i] - alpha)*m
        if hprime < k and k <= h:
            alphastar = (S[i] - alpha)*(h - k)/(h - hprime) + alpha
            result = np.zeros((n))
            for j in range(n):
                if alpha_lower[j] > alphastar:
                    result[j] = 1./c
                elif alpha_upper[j] >= alphastar:
                    result[j] = x[j] - alphastar*c
            return result
        m -= (alpha_lower == S[i]).sum()*(c**2)
        m += (alpha_upper == S[i]).sum()*(c**2)
        h = hprime
        alpha = S[i]
    raise Exception('projection did not terminate')

def project_cvx(x, k):
    '''
    Exact Euclidean projection onto the boundary of the k uniform matroid polytope.
    '''
    import cvxpy as cp
    import numpy as np
    n = len(x)
    # cvxpy >= 1.0 takes the variable shape as a single argument
    p = cp.Variable(n)
    objective = cp.Minimize(cp.sum_squares(p - x))
    constraints = [cp.sum(p) == k, p >= 0, p <= 1]
    prob = cp.Problem(objective, constraints)
    prob.solve()
    return np.reshape(np.array(p.value), x.shape)
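
As a cross-check on the two projections above, the same set {y: 0 <= y <= upper, sum(y) = k} can also be projected onto with a plain bisection on a scalar shift. This is a minimal sketch of a different, simpler technique, not the algorithm of Karimi et al.; the function name `project_capped_simplex_bisect` and the sample vector are hypothetical. The value `upper=0.95` mirrors the `c=0.95` setting used later in the notebook.

```python
import numpy as np

def project_capped_simplex_bisect(x, k, upper=1.0, iters=100):
    """Euclidean projection onto {y : 0 <= y <= upper, sum(y) = k}.

    The projection has the form y = clip(x - tau, 0, upper) for a scalar tau;
    sum(y) is non-increasing in tau, so tau can be found by bisection.
    Assumes 0 <= k <= len(x) * upper.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min() - upper, x.max()   # sum is n*upper at lo and 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(x - tau, 0.0, upper).sum() > k:
            lo = tau                     # shift too small, sum still above k
        else:
            hi = tau
    return np.clip(x - 0.5 * (lo + hi), 0.0, upper)

y = project_capped_simplex_bisect(np.array([0.9, 0.8, 0.1, -0.2, 1.5]), k=2, upper=0.95)
print(y, y.sum())
```

Bisection is slower than the exact sorting-based routine but is easy to verify by eye, which makes it useful for testing.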


### Load & Preprocess Data



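The coverage probabilities below were chosen so that a neural network can learn them from the single input feature.

With k=2 it is possible to cover all 4 targets: pick one item whose row is [1, 1, 0, 1] and one whose row is [0, 0, 1, 0]. The highest attainable objective value is therefore 4.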
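
The claim about k=2 can be confirmed by brute force. The sketch below enumerates every pair among the five distinct row patterns that occur in the coverage matrix and counts the targets each pair covers; the variable names are illustrative.

```python
from itertools import combinations
import numpy as np

# the five distinct row patterns in the coverage matrix
# (rows = items, columns = targets)
patterns = np.array([
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
])

k = 2
# a target counts as covered if at least one selected item covers it
best = max(
    int(np.any(patterns[list(S)], axis=0).sum())
    for S in combinations(range(len(patterns)), k)
)
print(best)
```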


In [130]:
# load probability matrix 
P_list = [
[	0	,	1	,	0	,	1	],
[	0	,	1	,	0	,	0	],
[	1	,	1	,	0	,	1	],
[	0	,	1	,	0	,	0	],
[	0	,	0	,	0	,	0	],
[	0	,	0	,	0	,	0	],
[	0	,	0	,	0	,	0	],
[	0	,	1	,	0	,	1	],
[	0	,	1	,	0	,	1	],
[	0	,	0	,	0	,	0	],
[	0	,	1	,	0	,	0	],
[	1	,	1	,	0	,	1	],
[	0	,	0	,	1	,	0	],
[	0	,	1	,	0	,	1	],
[	0	,	1	,	0	,	0	],
[	0	,	1	,	0	,	1	],
[	0	,	1	,	0	,	1	],
[	0	,	1	,	0	,	0	],
[	0	,	1	,	0	,	0	],
[	0	,	0	,	0	,	0	],
[	0	,	0	,	1	,	0	],
[	0	,	1	,	0	,	1	],
[	0	,	1	,	0	,	1	],
[	1	,	1	,	0	,	1	],
[	0	,	0	,	1	,	0	],
[	0	,	1	,	0	,	1	],
]



In [94]:
# load features
circuit_km = [
3.7, 
3, 
6.9, 
3.5, 
2.2, 
2.8, 
2.4, 
4.5, 
4.3, 
2.6, 
3.2, 
5.4, 
1.2, 
4.1, 
3.4, 
4.3, 
4, 
3.3, 
3.4, 
2.4, 
1.2, 
4.5, 
4, 
4.9, 
0.25, 
3.9, 
]
y = []
for i in circuit_km:
  y.append([i])

In [95]:
min(circuit_km)

0.25

In [131]:
#normalized features
circuit_km = [
3.7, 
3, 
6.9, 
3.5, 
2.2, 
2.8, 
2.4, 
4.5, 
4.3, 
2.6, 
3.2, 
5.4, 
1.2, 
4.1, 
3.4, 
4.3, 
4, 
3.3, 
3.4, 
2.4, 
1.2, 
4.5, 
4, 
4.9, 
0.25, 
3.9, 
]
y = []
lo, hi = min(circuit_km), max(circuit_km)
for i in circuit_km:
  # min-max scale each feature value to [0, 1]
  y.append([(i - lo)/(hi - lo)])

In [132]:
## recommendation_nn_decision.py
import numpy as np
import torch
# from coverage import optimize_coverage_multilinear, CoverageInstanceMultilinear, dgrad_coverage, hessian_coverage
import pickle
from functools import partial
# from submodular import ContinuousOptimizer
import torch.nn as nn
import random
# import argparse

# parser = argparse.ArgumentParser()
# parser.add_argument('--layers', type=int, default=1)
# parser.add_argument('--activation', type=str, default='relu')
# parser.add_argument('--k', type=int, default=20)

# args = parser.parse_args()
num_layers = 2
activation = 'relu'
k = 2
use_hessian = False
num_iters = 500
instance_sizes = [0]
learning_rate = 1e-4

Ps = {}
data = {}
f_true = {}

for num_items in instance_sizes:
    Ps_size = np.array(P_list)
    data_size = np.array(y)

    num_targets = Ps_size.shape[1] #500 --> 4
    num_features = data_size.shape[1] #2113 --> 1
    Ps[num_items] = [torch.from_numpy(Ps_size).long()]
    data[num_items] = [torch.from_numpy(data_size).float()]
    w = np.ones(num_targets, dtype=np.float32)
    f_true[num_items] = [(P, w) for P in Ps[num_items]]
  
num_repetitions = 0

train = {}
test = {}
for size in instance_sizes:
  train[size], test[size] = np.array(P_list), np.array(y)

### Train NN

In [136]:
vals = np.zeros((num_repetitions+1, len(instance_sizes), len(instance_sizes)))

for idx in range(num_repetitions, num_repetitions + 1):

    intermediate_size = 26

    # def make_fc():
    #     if num_layers > 1:
    #         if activation == 'relu':
    #             activation_fn = nn.ReLU
    #         elif activation == 'sigmoid':
    #             activation_fn = nn.Sigmoid
    #         else:
    #             raise Exception(
    #                 'Invalid activation function: ' + str(activation))
    #         net_layers = [
    #             nn.Linear(num_features, intermediate_size), activation_fn()]
    #         for hidden in range(num_layers-2):
    #             net_layers.append(
    #                 nn.Linear(intermediate_size, intermediate_size))
    #             net_layers.append(activation_fn())
    #         net_layers.append(nn.Linear(intermediate_size, num_targets))
    #         net_layers.append(nn.Sigmoid())
    #         return nn.Sequential(*net_layers)
    #     else:
    #         return nn.Sequential(nn.Linear(num_features, num_targets), nn.Sigmoid())

    def make_fc():
        # Assumed architecture (the original cell was left mid-edit):
        # Linear -> ReLU -> Linear -> Sigmoid. The final sigmoid keeps the
        # predicted coverage probabilities in (0, 1).
        net_layers = [nn.Linear(num_features, num_targets), nn.ReLU()]
        net_layers.append(nn.Linear(num_targets, num_targets))
        net_layers.append(nn.Sigmoid())
        return nn.Sequential(*net_layers)

    # optimizer that will be used for training (and testing)
    optfunc = partial(optimize_coverage_multilinear, w=w, k=k, c=0.95)
    dgrad = partial(dgrad_coverage, w=w)
    if use_hessian:
        hessian = partial(hessian_coverage, w=w)
    else:
        hessian = None

    # runs the given net on instances of a given size
    def eval_opt(net, instances, size):
        net.eval()
        val = 0.
        for i in range(len(instances)):
            pred = net(data[size][i])
            x = ContinuousOptimizer.apply(pred, optfunc, dgrad, None, 0.95)
            pp, ww = f_true[size][i]
            val += objective_coverage(x.detach().numpy(), pp.detach().numpy(), ww)
        net.train()
        return val/len(instances), pred, x

    # train a network for each size, and test on each size
    for train_idx, train_size in enumerate(instance_sizes):
        net = make_fc()
        optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
        # training
        for t in range(num_iters):
            print(f"Iteration {t}")
            #i = random.randint(0, 80)
            i=0
            pred = net(data[train_size][i])
            x = ContinuousOptimizer.apply(pred, optfunc, dgrad, None, 0.95)
            pp, ww = f_true[train_size][i]
            loss = -CoverageInstanceMultilinear.apply(x, pp, ww, False)
            optimizer.zero_grad()
            loss.backward(retain_graph=True)
            optimizer.step()
        # save learned network state
        savepath = '/tmp/net_diffopt_smalllr_{0}_{1}_{2}_{3}.pt'.format(
            train_size, k, num_layers, idx)
        torch.save(net.state_dict(), savepath)
        # test on different sizes
        for test_idx, test_size in enumerate(instance_sizes):
            vals[idx, train_idx, test_idx], return_prediction, return_x = eval_opt(net, test, test_size)
            print(vals[idx, train_idx, test_idx])
        # save out values
        print(idx, train_size, vals[idx, train_idx])
        with open('results_recommendation_' + str(num_layers) + '.pickle', 'wb') as f:
            pickle.dump(vals, f)



In [None]:
return_prediction

tensor([[0.5583, 0.4041, 0.6059, 0.6037],
        [0.5498, 0.3840, 0.6029, 0.5863],
        [0.5968, 0.4995, 0.6196, 0.6793],
        [0.5559, 0.3984, 0.6050, 0.5988],
        [0.5444, 0.3664, 0.5942, 0.5634],
        [0.5473, 0.3783, 0.6020, 0.5813],
        [0.5432, 0.3678, 0.6000, 0.5711],
        [0.5680, 0.4276, 0.6093, 0.6232],
        [0.5656, 0.4217, 0.6085, 0.6184],
        [0.5449, 0.3726, 0.6012, 0.5763],
        [0.5522, 0.3897, 0.6037, 0.5913],
        [0.5735, 0.4409, 0.6113, 0.6341],
        [0.5448, 0.3667, 0.5924, 0.5622],
        [0.5632, 0.4158, 0.6076, 0.6135],
        [0.5547, 0.3955, 0.6046, 0.5963],
        [0.5656, 0.4217, 0.6085, 0.6184],
        [0.5620, 0.4129, 0.6072, 0.6111],
        [0.5534, 0.3926, 0.6042, 0.5938],
        [0.5547, 0.3955, 0.6046, 0.5963],
        [0.5432, 0.3678, 0.6000, 0.5711],
        [0.5448, 0.3667, 0.5924, 0.5622],
        [0.5680, 0.4276, 0.6093, 0.6232],
        [0.5620, 0.4129, 0.6072, 0.6111],
        [0.5729, 0.4394, 0.6111, 0

In [None]:
f_true[0][0][0]

tensor([[0, 1, 0, 1],
        [0, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 0, 1],
        [0, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 1, 0, 1],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 1],
        [0, 1, 0, 1],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
        [0, 1, 0, 1]])

In [None]:
f_true[0][0][0]-return_prediction.round()

tensor([[-0.5583,  0.5959, -0.6059,  0.3963],
        [-0.5498,  0.6160, -0.6029, -0.5863],
        [ 0.4032,  0.5005, -0.6196,  0.3207],
        [-0.5559,  0.6017, -0.6050, -0.5988],
        [-0.5444, -0.3663, -0.5942, -0.5635],
        [-0.5473, -0.3783, -0.6020, -0.5813],
        [-0.5432, -0.3678, -0.6000, -0.5711],
        [-0.5680,  0.5724, -0.6093,  0.3768],
        [-0.5656,  0.5783, -0.6085,  0.3816],
        [-0.5449, -0.3726, -0.6012, -0.5763],
        [-0.5522,  0.6103, -0.6037, -0.5913],
        [ 0.4265,  0.5591, -0.6113,  0.3659],
        [-0.5448, -0.3667, -0.5924, -0.5622],
        [-0.5632,  0.5842, -0.6076,  0.3865],
        [-0.5547,  0.6045, -0.6046, -0.5963],
        [-0.5656,  0.5783, -0.6085,  0.3816],
        [-0.5620,  0.5871, -0.6072,  0.3889],
        [-0.5534,  0.6074, -0.6042, -0.5938],
        [-0.5547,  0.6045, -0.6046, -0.5963],
        [-0.5432, -0.3678, -0.6000, -0.5711],
        [-0.5448, -0.3667, -0.5924, -0.5622],
        [-0.5680,  0.5724, -0.6093

In [None]:
return_prediction.round()

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]], grad_fn=<RoundBackward0>)

In [None]:
return_x

tensor([0.0000, 0.0000, 0.9500, 0.0000, 0.0000, 0.0000, 0.0000, 0.1441, 0.0878,
        0.0000, 0.0000, 0.2776, 0.0000, 0.0332, 0.0000, 0.0878, 0.0065, 0.0000,
        0.0000, 0.0000, 0.0000, 0.1441, 0.0065, 0.2623, 0.0000, 0.0000],
       grad_fn=<ContinuousOptimizerBackward>)

In [None]:
return_x.round()

tensor([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0.], grad_fn=<RoundBackward0>)

In [None]:
pp, ww = f_true[size][0]

In [None]:
x = ContinuousOptimizer.apply(return_prediction, optfunc, dgrad, None, 0.95)

RuntimeError: ignored

In [None]:
x

tensor([0.0000, 0.0000, 0.9500, 0.0000, 0.0000, 0.0000, 0.0000, 0.1456, 0.0889,
        0.0000, 0.0000, 0.2661, 0.0000, 0.0341, 0.0000, 0.0889, 0.0073, 0.0000,
        0.0000, 0.0000, 0.0000, 0.1456, 0.0073, 0.2661, 0.0000, 0.0000],
       grad_fn=<ContinuousOptimizerBackward>)

In [None]:
n = len(ww)
total = 0
for i in range(n):
  p_fail = 1 - return_x.detach().numpy()*return_prediction.detach().numpy()[:, i]
  p_all_fail = np.prod(p_fail)
  total += ww[i] * (1 - p_all_fail)

In [None]:
total

3.060387745499611

In [None]:
objective_coverage(x.detach().numpy(), pp.detach().numpy(), ww)

2.9423573277890682

In [None]:
x.topk(2)

torch.return_types.topk(
values=tensor([0.9500, 0.2776], grad_fn=<TopkBackward0>),
indices=tensor([ 2, 11]))

In [None]:
eval_loss(net, test, test_size)

(tensor(0.8345, grad_fn=<DivBackward0>),
 tensor([[0.5583, 0.4041, 0.6059, 0.6037],
         [0.5498, 0.3840, 0.6029, 0.5863],
         [0.5968, 0.4995, 0.6196, 0.6793],
         [0.5559, 0.3984, 0.6050, 0.5988],
         [0.5444, 0.3664, 0.5942, 0.5634],
         [0.5473, 0.3783, 0.6020, 0.5813],
         [0.5432, 0.3678, 0.6000, 0.5711],
         [0.5680, 0.4276, 0.6093, 0.6232],
         [0.5656, 0.4217, 0.6085, 0.6184],
         [0.5449, 0.3726, 0.6012, 0.5763],
         [0.5522, 0.3897, 0.6037, 0.5913],
         [0.5735, 0.4409, 0.6113, 0.6341],
         [0.5448, 0.3667, 0.5924, 0.5622],
         [0.5632, 0.4158, 0.6076, 0.6135],
         [0.5547, 0.3955, 0.6046, 0.5963],
         [0.5656, 0.4217, 0.6085, 0.6184],
         [0.5620, 0.4129, 0.6072, 0.6111],
         [0.5534, 0.3926, 0.6042, 0.5938],
         [0.5547, 0.3955, 0.6046, 0.5963],
         [0.5432, 0.3678, 0.6000, 0.5711],
         [0.5448, 0.3667, 0.5924, 0.5622],
         [0.5680, 0.4276, 0.6093, 0.6232],
         [0.5

In [None]:
[f_true[0][0][0][2], f_true[0][0][0][23]]

[tensor([1, 1, 0, 0]), tensor([1, 0, 0, 1])]

On this problem the decision-focused model selects a pair of items that covers 3 customers, while the optimal pair covers all 4. The two-stage approach reaches the same result.

We still have not coerced the algorithm into returning an infeasible solution, because the cardinality constraint is imposed at the end of the algorithm.
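
The count of 3 covered customers can be reproduced from the values printed above. The sketch below takes the fractional `return_x`, keeps its k largest entries (a hypothetical rounding rule, matching the `topk(2)` call shown earlier), and counts the targets covered by the chosen rows of the coverage matrix.

```python
import numpy as np

# coverage matrix from this notebook, one string per item (columns = targets)
rows = ["0101", "0100", "1101", "0100", "0000", "0000", "0000", "0101", "0101",
        "0000", "0100", "1101", "0010", "0101", "0100", "0101", "0101", "0100",
        "0100", "0000", "0010", "0101", "0101", "1101", "0010", "0101"]
P = np.array([[int(ch) for ch in r] for r in rows])

# fractional solution as printed for return_x
x = np.array([0.0000, 0.0000, 0.9500, 0.0000, 0.0000, 0.0000, 0.0000, 0.1441, 0.0878,
              0.0000, 0.0000, 0.2776, 0.0000, 0.0332, 0.0000, 0.0878, 0.0065, 0.0000,
              0.0000, 0.0000, 0.0000, 0.1441, 0.0065, 0.2623, 0.0000, 0.0000])

k = 2
# keep the k largest entries of x (illustrative rounding step)
chosen = np.sort(np.argsort(-x)[:k])
# a target is covered if any chosen item covers it
covered = int(np.any(P[chosen], axis=0).sum())
print(chosen, covered)
```

Both selected items share the row [1, 1, 0, 1], so target 2 stays uncovered; swapping one of them for an item with row [0, 0, 1, 0] would cover all 4.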

## Two-Stage Approach

In [141]:
num_iters = 9000

vals = np.zeros((num_repetitions+1, len(instance_sizes), len(instance_sizes)))

for idx in range(num_repetitions, num_repetitions + 1):

    intermediate_size = 4

    # def make_fc():
    #     if num_layers > 1:
    #         if activation == 'relu':
    #             activation_fn = nn.ReLU
    #         elif activation == 'sigmoid':
    #             activation_fn = nn.Sigmoid
    #         else:
    #             raise Exception(
    #                 'Invalid activation function: ' + str(activation))
    #         net_layers = [
    #             nn.Linear(num_features, intermediate_size), activation_fn()]
    #         for hidden in range(num_layers-2):
    #             net_layers.append(
    #                 nn.Linear(intermediate_size, intermediate_size))
    #             net_layers.append(activation_fn())
    #         net_layers.append(nn.Linear(intermediate_size, num_targets))
    #         net_layers.append(nn.Sigmoid())
    #         return nn.Sequential(*net_layers)
    #     else:
    #         return nn.Sequential(nn.Linear(num_features, num_targets), nn.Sigmoid())


    def make_fc():
      net_layers = []
      net_layers.append(nn.Linear(num_features, num_targets))
      #net_layers.append(nn.ReLU())
      #net_layers.append(nn.Sigmoid())
      #net_layers.append(nn.Threshold(0.450, 0))
      return nn.Sequential(*net_layers)


    # optimizer that will be used for training (and testing)
    optfunc = partial(optimize_coverage_multilinear, w=w, k=k, c=0.95)
    dgrad = partial(dgrad_coverage, w=w)
    if use_hessian:
        hessian = partial(hessian_coverage, w=w)
    else:
        hessian = None

    # runs the given net on instances of a given size
    def eval_opt(net, instances, size):
        net.eval()
        val = 0.
        for i in range(len(instances)):
            pred = net(data[size][i])
            x = ContinuousOptimizer.apply(pred, optfunc, dgrad, None, 0.95)
            pp, ww = f_true[size][i]
            val += objective_coverage(x.detach().numpy(), pp.detach().numpy(), ww)
        net.train()
        return val/len(instances), pred, x
        
    def eval_loss(net, instances, size):
        net.eval()
        val = 0.
        for i in range(len(instances)):
            pred = net(data[size][i])
            pp, ww = f_true[size][i]
            # cast the integer targets to float, as in the training loop
            val += loss_fn(pred, pp.to(torch.float32))
        net.train()
        return val/len(instances), pred

    # train a network for each size, and test on each size
    for train_idx, train_size in enumerate(instance_sizes):
        net_two_stage = make_fc()
        loss_fn = nn.MSELoss() #nn.MultiLabelSoftMarginLoss()
        optimizer = torch.optim.Adam(net_two_stage.parameters(), lr=learning_rate)
        # training
        for t in range(num_iters):
            print(f"Iteration {t}")
            i=0
            pred = net_two_stage(data[train_size][i])
            pp, ww = f_true[train_size][i]
            loss = loss_fn(pred, pp.to(torch.float32))
            print(loss)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # save learned network state
        savepath = '/tmp/net_two_stage_diffopt_smalllr_{0}_{1}_{2}_{3}.pt'.format(
            train_size, k, num_layers, idx)
        torch.save(net_two_stage.state_dict(), savepath)
        # test on different sizes
        for test_idx, test_size in enumerate(instance_sizes):
            vals[idx, train_idx, test_idx], prediction_two_stage, x_two_stage = eval_opt(net_two_stage, test, test_size)
            print(vals[idx, train_idx, test_idx])
            print(loss_fn(prediction_two_stage, pp.to(torch.float32)))
        # save out values
        print(idx, train_size, vals[idx, train_idx])
        with open('results_recommendation_' + str(num_layers) + '.pickle', 'wb') as f:
            pickle.dump(vals, f)


Streaming output truncated to the last 5000 lines.
tensor(0.2072, grad_fn=<MseLossBackward0>)
Iteration 6502
tensor(0.2072, grad_fn=<MseLossBackward0>)
Iteration 6503
tensor(0.2071, grad_fn=<MseLossBackward0>)
Iteration 6504
tensor(0.2070, grad_fn=<MseLossBackward0>)
Iteration 6505
tensor(0.2070, grad_fn=<MseLossBackward0>)
Iteration 6506
tensor(0.2069, grad_fn=<MseLossBackward0>)
Iteration 6507
tensor(0.2069, grad_fn=<MseLossBackward0>)
Iteration 6508
tensor(0.2068, grad_fn=<MseLossBackward0>)
Iteration 6509
tensor(0.2068, grad_fn=<MseLossBackward0>)
Iteration 6510
tensor(0.2067, grad_fn=<MseLossBackward0>)
Iteration 6511
tensor(0.2066, grad_fn=<MseLossBackward0>)
Iteration 6512
tensor(0.2066, grad_fn=<MseLossBackward0>)
Iteration 6513
tensor(0.2065, grad_fn=<MseLossBackward0>)
Iteration 6514
tensor(0.2065, grad_fn=<MseLossBackward0>)
Iteration 6515
tensor(0.2064, grad_fn=<MseLossBackward0>)
Iteration 6516
tensor(0.2064, grad_fn=<MseLossBackward0>)
Iteration 6517
tensor(

In [143]:
loss

tensor(0.1146, grad_fn=<MseLossBackward0>)

In [142]:
initial_net_weights = net_two_stage.get_submodule("0").get_parameter('weight')
initial_net_weights

Parameter containing:
tensor([[ 0.3810],
        [ 1.2843],
        [-0.9319],
        [ 0.8709]], requires_grad=True)

In [144]:
initial_net_bias = net_two_stage.get_submodule("0").get_parameter('bias')
initial_net_bias

Parameter containing:
tensor([-0.0457, -0.0356,  0.5586, -0.2151], requires_grad=True)

In [139]:
f_true[train_size][0][0]

tensor([[0, 1, 0, 1],
        [0, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 1, 0, 1],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 1, 0, 1],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
        [0, 1, 0, 1]])

In [145]:
prediction_two_stage

tensor([[ 0.1520,  0.6307,  0.0752,  0.2367],
        [ 0.1119,  0.4955,  0.1733,  0.1451],
        [ 0.3353,  1.2488, -0.3732,  0.6558],
        [ 0.1405,  0.5921,  0.1032,  0.2105],
        [ 0.0660,  0.3410,  0.2854,  0.0403],
        [ 0.1004,  0.4569,  0.2013,  0.1189],
        [ 0.0775,  0.3797,  0.2573,  0.0665],
        [ 0.1978,  0.7852, -0.0369,  0.3415],
        [ 0.1864,  0.7466, -0.0089,  0.3153],
        [ 0.0890,  0.4183,  0.2293,  0.0927],
        [ 0.1233,  0.5342,  0.1452,  0.1712],
        [ 0.2494,  0.9591, -0.1630,  0.4594],
        [ 0.0087,  0.1479,  0.4255, -0.0907],
        [ 0.1749,  0.7080,  0.0191,  0.2891],
        [ 0.1348,  0.5728,  0.1172,  0.1974],
        [ 0.1864,  0.7466, -0.0089,  0.3153],
        [ 0.1692,  0.6887,  0.0331,  0.2760],
        [ 0.1291,  0.5535,  0.1312,  0.1843],
        [ 0.1348,  0.5728,  0.1172,  0.1974],
        [ 0.0775,  0.3797,  0.2573,  0.0665],
        [ 0.0087,  0.1479,  0.4255, -0.0907],
        [ 0.1978,  0.7852, -0.0369

In [129]:
pp.to(torch.float32)

tensor([[0., 1., 0., 1.],
        [0., 1., 0., 0.],
        [1., 1., 0., 1.],
        [0., 1., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 1., 0., 1.],
        [0., 1., 0., 1.],
        [0., 0., 0., 0.],
        [0., 1., 0., 0.],
        [1., 1., 0., 1.],
        [0., 0., 0., 0.],
        [0., 1., 0., 1.],
        [0., 1., 0., 0.],
        [0., 1., 0., 1.],
        [0., 1., 0., 1.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 1., 0., 1.],
        [0., 1., 0., 1.],
        [1., 1., 0., 1.],
        [0., 0., 1., 0.],
        [0., 1., 0., 1.]])

In [125]:
loss_fn(prediction_two_stage, pp.to(torch.float32))

tensor(0.2346, grad_fn=<MseLossBackward0>)

In [155]:
final_layer = nn.Sequential(nn.ReLU(), nn.Sigmoid(), nn.Threshold(0.58, 0))
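
Because the sigmoid is monotonic, the 0.58 cutoff applied to its output corresponds to a single cutoff on the raw prediction. The sketch below re-implements the three layers for one scalar with plain `math` and computes that cutoff. The names `final_layer_scalar` and `raw_cutoff` are illustrative and not part of the notebook.

```python
import math

def final_layer_scalar(x, threshold=0.58, value=0.0):
    """Scalar stand-in for ReLU -> Sigmoid -> Threshold(0.58, 0)."""
    relu = max(x, 0.0)
    sig = 1.0 / (1.0 + math.exp(-relu))
    # nn.Threshold keeps inputs strictly above the threshold, else returns `value`
    return sig if sig > threshold else value

# Raw prediction at which the sigmoid output equals 0.58 (the logit of 0.58)
raw_cutoff = math.log(0.58 / (1 - 0.58))

print(round(raw_cutoff, 4))
print(round(final_layer_scalar(0.3353), 4))  # just above the cutoff, kept
print(final_layer_scalar(0.3153))            # just below the cutoff, zeroed
```

Negative predictions are clamped to zero by the ReLU, map to 0.5 under the sigmoid, and are therefore always zeroed by the threshold.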

In [156]:
final_layer(prediction_two_stage)

tensor([[0.0000, 0.6527, 0.0000, 0.0000],
        [0.0000, 0.6214, 0.0000, 0.0000],
        [0.5831, 0.7771, 0.0000, 0.6583],
        [0.0000, 0.6438, 0.0000, 0.0000],
        [0.0000, 0.5844, 0.0000, 0.0000],
        [0.0000, 0.6123, 0.0000, 0.0000],
        [0.0000, 0.5938, 0.0000, 0.0000],
        [0.0000, 0.6868, 0.0000, 0.5846],
        [0.0000, 0.6784, 0.0000, 0.0000],
        [0.0000, 0.6031, 0.0000, 0.0000],
        [0.0000, 0.6305, 0.0000, 0.0000],
        [0.0000, 0.7229, 0.0000, 0.6129],
        [0.0000, 0.0000, 0.6048, 0.0000],
        [0.0000, 0.6700, 0.0000, 0.0000],
        [0.0000, 0.6394, 0.0000, 0.0000],
        [0.0000, 0.6784, 0.0000, 0.0000],
        [0.0000, 0.6657, 0.0000, 0.0000],
        [0.0000, 0.6349, 0.0000, 0.0000],
        [0.0000, 0.6394, 0.0000, 0.0000],
        [0.0000, 0.5938, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.6048, 0.0000],
        [0.0000, 0.6868, 0.0000, 0.5846],
        [0.0000, 0.6657, 0.0000, 0.0000],
        [0.0000, 0.7032, 0.0000, 0

In [58]:
eval_loss(net_two_stage, test, test_size)

(tensor(2.1105, grad_fn=<DivBackward0>),
 tensor([[ 0.6728, -1.6228,  1.2495,  0.7856],
         [ 0.3687, -1.2422,  0.9311,  0.7511],
         [ 2.0632, -3.3628,  2.7050,  0.9430],
         [ 0.5859, -1.5141,  1.1586,  0.7757],
         [ 0.0211, -0.8072,  0.5673,  0.7118],
         [ 0.2818, -1.1334,  0.8402,  0.7413],
         [ 0.1080, -0.9159,  0.6582,  0.7216],
         [ 1.0204, -2.0578,  1.6134,  0.8249],
         [ 0.9335, -1.9491,  1.5224,  0.8151],
         [ 0.1949, -1.0247,  0.7492,  0.7315],
         [ 0.4556, -1.3509,  1.0221,  0.7610],
         [ 1.4115, -2.5472,  2.0227,  0.8692],
         [-0.4134, -0.2634,  0.1124,  0.6626],
         [ 0.8466, -1.8403,  1.4314,  0.8052],
         [ 0.5425, -1.4597,  1.1131,  0.7708],
         [ 0.9335, -1.9491,  1.5224,  0.8151],
         [ 0.8032, -1.7859,  1.3860,  0.8003],
         [ 0.4991, -1.4053,  1.0676,  0.7659],
         [ 0.5425, -1.4597,  1.1131,  0.7708],
         [ 0.1080, -0.9159,  0.6582,  0.7216],
         [-0.4134, 

In [157]:
eval_opt(net_two_stage, test, test_size)

(3.5274911522865295, tensor([[ 0.1520,  0.6307,  0.0752,  0.2367],
         [ 0.1119,  0.4955,  0.1733,  0.1451],
         [ 0.3353,  1.2488, -0.3732,  0.6558],
         [ 0.1405,  0.5921,  0.1032,  0.2105],
         [ 0.0660,  0.3410,  0.2854,  0.0403],
         [ 0.1004,  0.4569,  0.2013,  0.1189],
         [ 0.0775,  0.3797,  0.2573,  0.0665],
         [ 0.1978,  0.7852, -0.0369,  0.3415],
         [ 0.1864,  0.7466, -0.0089,  0.3153],
         [ 0.0890,  0.4183,  0.2293,  0.0927],
         [ 0.1233,  0.5342,  0.1452,  0.1712],
         [ 0.2494,  0.9591, -0.1630,  0.4594],
         [ 0.0087,  0.1479,  0.4255, -0.0907],
         [ 0.1749,  0.7080,  0.0191,  0.2891],
         [ 0.1348,  0.5728,  0.1172,  0.1974],
         [ 0.1864,  0.7466, -0.0089,  0.3153],
         [ 0.1692,  0.6887,  0.0331,  0.2760],
         [ 0.1291,  0.5535,  0.1312,  0.1843],
         [ 0.1348,  0.5728,  0.1172,  0.1974],
         [ 0.0775,  0.3797,  0.2573,  0.0665],
         [ 0.0087,  0.1479,  0.4255, -0.

In [59]:
prediction_two_stage.round()

tensor([[ 1., -2.,  1.,  1.],
        [ 0., -1.,  1.,  1.],
        [ 2., -3.,  3.,  1.],
        [ 1., -2.,  1.,  1.],
        [ 0., -1.,  1.,  1.],
        [ 0., -1.,  1.,  1.],
        [ 0., -1.,  1.,  1.],
        [ 1., -2.,  2.,  1.],
        [ 1., -2.,  2.,  1.],
        [ 0., -1.,  1.,  1.],
        [ 0., -1.,  1.,  1.],
        [ 1., -3.,  2.,  1.],
        [-0., -0.,  0.,  1.],
        [ 1., -2.,  1.,  1.],
        [ 1., -1.,  1.,  1.],
        [ 1., -2.,  2.,  1.],
        [ 1., -2.,  1.,  1.],
        [ 0., -1.,  1.,  1.],
        [ 1., -1.,  1.,  1.],
        [ 0., -1.,  1.,  1.],
        [-0., -0.,  0.,  1.],
        [ 1., -2.,  2.,  1.],
        [ 1., -2.,  1.,  1.],
        [ 1., -2.,  2.,  1.],
        [-1.,  0., -0.,  1.],
        [ 1., -2.,  1.,  1.]], grad_fn=<RoundBackward0>)

In [None]:
error_2s = prediction_two_stage.round()-pp
error_2s.sum()

tensor(-9., grad_fn=<SumBackward0>)

In [None]:
x_two_stage

tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.3364, 0.0000, 0.1276, 0.0000, 0.0000,
        0.0319, 0.0000, 0.0000, 0.3929, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.1276, 0.3929, 0.0000, 0.0000, 0.0000, 0.5907, 0.0000],
       grad_fn=<ContinuousOptimizerBackward>)

In [158]:
val_2s, pred_2s, x_2s = eval_opt(net_two_stage, test, test_size)

In [None]:
print(val_2s, pred_2s, x_2s)

0.5906670689582825 tensor([[0.3370, 0.5978, 0.2415, 0.3949],
        [0.3714, 0.5896, 0.2767, 0.4028],
        [0.2034, 0.6347, 0.1207, 0.3596],
        [0.3467, 0.5955, 0.2512, 0.3972],
        [0.4228, 0.5778, 0.3323, 0.4142],
        [0.3815, 0.5873, 0.2874, 0.4051],
        [0.4020, 0.5825, 0.3094, 0.4097],
        [0.2997, 0.6072, 0.2051, 0.3860],
        [0.3088, 0.6049, 0.2138, 0.3882],
        [0.3917, 0.5849, 0.2983, 0.4074],
        [0.3614, 0.5920, 0.2663, 0.4006],
        [0.2798, 0.6124, 0.1864, 0.3810],
        [0.4281, 0.5766, 0.3382, 0.4154],
        [0.3180, 0.6025, 0.2227, 0.3904],
        [0.3516, 0.5943, 0.2562, 0.3983],
        [0.3088, 0.6049, 0.2138, 0.3882],
        [0.3227, 0.6014, 0.2273, 0.3916],
        [0.3565, 0.5931, 0.2612, 0.3994],
        [0.3516, 0.5943, 0.2562, 0.3983],
        [0.4020, 0.5825, 0.3094, 0.4097],
        [0.4281, 0.5766, 0.3382, 0.4154],
        [0.2997, 0.6072, 0.2051, 0.3860],
        [0.3227, 0.6014, 0.2273, 0.3916],
        [0.2819

In [None]:
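# Recompute the optimizer objective by hand as a sanity check against val_2s.
n = len(ww)
x_np = x_2s.detach().numpy()    # decision vector, one entry per circuit
pp_np = pp.detach().numpy()     # ground-truth matrix, circuits x customers
total_2s = 0
for i in range(n):
  # Probability that each circuit fails to reach customer i
  p_fail = 1 - x_np * pp_np[:, i]
  # Customer i is missed only if every circuit fails
  p_all_fail = np.prod(p_fail)
  total_2s += ww[i] * (1 - p_all_fail)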

In [None]:
total_2s

0.5906670689582825
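
The loop above appears to evaluate a weighted coverage objective. Writing $x_j$ for the entry of `x_2s` for circuit $j$, $p_{ji}$ for entry $(j, i)$ of `pp`, and $w_i$ for `ww[i]` (these symbols are introduced here for illustration), the quantity it computes is:

$$
f(x) = \sum_{i=1}^{n} w_i \left( 1 - \prod_{j} \left( 1 - x_j \, p_{ji} \right) \right)
$$

The inner product is `p_all_fail`, the chance that no circuit reaches customer $i$ if the terms $x_j \, p_{ji}$ are read as independent probabilities, so each summand is the weighted chance that customer $i$ is reached at least once.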

In [159]:
x_2s.topk(2)

torch.return_types.topk(
values=tensor([0.9500, 0.4414], grad_fn=<TopkBackward0>),
indices=tensor([ 2, 24]))

In [None]:
[f_true[0][0][0][24], f_true[0][0][0][12]]

[tensor([0, 1, 1, 0]), tensor([0, 0, 0, 0])]

With this model, I would recommend circuits 24 and 12, which would reach only two customers out of an optimal four.
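
The count can be checked by taking the element-wise OR of the selected rows. A minimal sketch, using the two rows printed above as plain lists (the helper `customers_covered` is hypothetical and not part of the notebook):

```python
def customers_covered(rows):
    """Count columns (customers) reached by at least one selected row (circuit)."""
    return sum(1 for column in zip(*rows) if any(column))

# Rows 24 and 12 of f_true[0][0][0], copied from the output above
circuit_24 = [0, 1, 1, 0]
circuit_12 = [0, 0, 0, 0]

covered = customers_covered([circuit_24, circuit_12])
print(covered, "of", len(circuit_24), "customers reached")
```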

## Run Optimizer on Ground Truth

In [None]:
x_ground = ContinuousOptimizer.apply(pp, optfunc, dgrad, None, 0.95)

In [None]:
x_ground

tensor([0.0000, 0.0000, 0.4242, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.4242, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.4242, 0.7275, 0.0000])

In [None]:
x_ground.round()

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 1., 0.])

In [None]:
x_ground.topk(2)

torch.return_types.topk(
values=tensor([0.7275, 0.4242]),
indices=tensor([24, 11]))

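Running the optimizer on the ground truth returns one of several optimal solutions: in `x_ground`, circuits 2, 11, and 23 tie at 0.4242, so pairing any one of them with circuit 24 is equally good.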
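
On a problem this small, the result can be cross-checked by enumerating every pair of circuits. The sketch below assumes a budget of two circuits, equal customer weights, and the 0/1 matrix printed by `pp.to(torch.float32)` earlier as the ground truth. All three are assumptions made for illustration, and `covered` is a hypothetical helper.

```python
from itertools import combinations

# Ground-truth matrix copied from the pp output (26 circuits x 4 customers)
PP = [
    [0, 1, 0, 1], [0, 1, 0, 0], [1, 1, 0, 1], [0, 1, 0, 0], [0, 0, 0, 0],
    [0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0],
    [0, 1, 0, 0], [1, 1, 0, 1], [0, 0, 0, 0], [0, 1, 0, 1], [0, 1, 0, 0],
    [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0],
    [0, 0, 0, 0], [0, 1, 0, 1], [0, 1, 0, 1], [1, 1, 0, 1], [0, 0, 1, 0],
    [0, 1, 0, 1],
]

def covered(pair):
    """Number of customers reached by at least one circuit in the pair."""
    return sum(1 for column in zip(*(PP[j] for j in pair)) if any(column))

# Score every pair of circuits and keep the ones that reach the most customers
pairs = list(combinations(range(len(PP)), 2))
best = max(covered(pair) for pair in pairs)
best_pairs = [pair for pair in pairs if covered(pair) == best]
print(best, best_pairs)
```

The pair returned by `x_ground.topk(2)`, circuits 24 and 11, is among the pairs this enumeration reports.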

This has two implications. First, improving the learning algorithm so that its predictions move closer to the ground truth can lead to an optimal decision. Second, decision-focused learning, which trains the model on the quality of the final decision, can improve that decision.
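
The second point rests on the fact that a lower prediction error does not by itself guarantee a better decision. A toy illustration with made-up numbers (none of the values or helper names below come from the notebook): the decision is to pick the single item with the highest predicted value, and decision quality is that item's true value.

```python
def mse(pred, true):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def decision_value(pred, true):
    """True value of the item that looks best under the prediction."""
    best_item = max(range(len(pred)), key=lambda j: pred[j])
    return true[best_item]

true_values = [1.0, 0.9, 0.0]
pred_a = [0.8, 0.95, 0.0]   # close to the truth, but ranks item 1 first
pred_b = [0.6, 0.5, 0.4]    # further from the truth, but ranks item 0 first

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, round(mse(pred, true_values), 4), decision_value(pred, true_values))
```

Prediction A has the smaller squared error yet selects the worse item. Closing that kind of gap is what training on the decision's value, rather than on prediction error alone, is meant to do.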