# Computation of cutting planes: example 1

# The set-up

In [3]:
import numpy as np
import pandas as pd
import scipy.optimize as opt
from scipy.special import expit # The logistic sigmoid function
import accpm
%load_ext autoreload
%autoreload 1
%aimport accpm

np.set_printoptions(precision=4)



$\DeclareMathOperator{\domain}{dom}
\newcommand{\transpose}{\text{T}}
\newcommand{\vec}[1]{\begin{pmatrix}#1\end{pmatrix}}$

# Example

To test the computation of cutting planes we consider a logistic regression problem. In particular, we consider the problem of predicting the incidence of diabetes from various measurements (see [description](https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes)). We use a normalised version of the data (which can be found at [mldata.org](http://mldata.org/repository/data/download/csv/diabetes_scale/)) in which the label to be predicted (the incidence of diabetes) is in the first column and takes the values $-1$ and $1$. We process the data as follows:

In [4]:
names = ['diabetes', 'num preg', 'plasma', 'bp', 'skin fold', 'insulin', 'bmi', 'pedigree', 'age']
data = pd.read_csv('diabetes_scale.csv', header=None, names=names)
# The target variable must be 0 or 1, not -1 or 1 (assign back instead of chained inplace replace)
data['diabetes'] = data['diabetes'].replace(-1, 0)
data['ones'] = 1.0 # Add a column of ones to represent the constant bias
data.head()

|   | diabetes | num preg | plasma | bp | skin fold | insulin | bmi | pedigree | age | ones |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.294118 | 0.487437 | 0.180328 | -0.292929 | -1.0 | 0.00149 | -0.53117 | -0.033333 | 1.0 |
| 1 | 1 | -0.882353 | -0.145729 | 0.081967 | -0.414141 | -1.0 | -0.207153 | -0.766866 | -0.666667 | 1.0 |
| 2 | 0 | -0.058824 | 0.839196 | 0.04918 | -1.0 | -1.0 | -0.305514 | -0.492741 | -0.633333 | 1.0 |
| 3 | 1 | -0.882353 | -0.105528 | 0.081967 | -0.535354 | -0.777778 | -0.162444 | -0.923997 | -1.0 | 1.0 |
| 4 | 0 | -1.0 | 0.376884 | -0.344262 | -0.292929 | -0.602837 | 0.28465 | 0.887276 | -0.6 | 1.0 |


This model has 9 input variables $x_0, \dots, x_8$, where $x_8$ is a dummy input variable fixed at 1 that represents the bias. (The fixed dummy input variable could just as easily be $x_5$ or $x_7$; its index is unimportant.) We choose the simplest basis functions, $\phi_j(\mathbf{x}) = x_j$ for $j = 0, \dots, 8$. Our model then has the form
$$
  y(\mathbf{x}) = \sigma\Bigl(\sum_{j=0}^{8} w_j x_j\Bigr) = \sigma(\mathbf{w}^T \mathbf{x}).
$$
Here we have a dataset $\{(\mathbf{x}_n, t_n)\}_{n=0}^{N}$, where $t_n \in \{0, 1\}$, with $N + 1 = 767 + 1 = 768$ examples. We train our model by finding the parameter vector $\mathbf{w}$ that minimizes the (data-dependent) cross-entropy error function
$$
  E_D(\mathbf{w}) =  - \sum_{n=0}^{N} \{t_n \ln \sigma(\mathbf{w}^T \mathbf{x}_n) + (1 - t_n)\ln(1 - \sigma(\mathbf{w}^T \mathbf{x}_n))\}.
$$
The gradient of this function is given by
$$
  \nabla E_D(\mathbf{w}) = \sum_{n=0}^{N} (\sigma(\mathbf{w}^T \mathbf{x}_n) - t_n)\mathbf{x}_n.
$$
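The step behind this formula is the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. Writing $\sigma_n = \sigma(\mathbf{w}^T \mathbf{x}_n)$, the contribution of the $n$-th example to the gradient is
$$
  \nabla_{\mathbf{w}} \bigl\{-t_n \ln \sigma_n - (1 - t_n) \ln(1 - \sigma_n)\bigr\}
  = -t_n (1 - \sigma_n)\mathbf{x}_n + (1 - t_n)\sigma_n \mathbf{x}_n
  = (\sigma_n - t_n)\mathbf{x}_n .
$$
Stacking the $\mathbf{x}_n^T$ as the rows of a $768 \times 9$ matrix $X$ and the targets into $\mathbf{t} = (t_0, \dots, t_N)^T$ gives the matrix form used in the code, with $\sigma$ applied elementwise:
$$
  \nabla E_D(\mathbf{w}) = X^T \bigl(\sigma(X\mathbf{w}) - \mathbf{t}\bigr).
$$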
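We can then implement these as follows. (The code also allows an optional sum-of-squares regularization term $\frac{c}{2}\mathbf{w}^T\mathbf{w}$, whose gradient is $c\mathbf{w}$; it is switched off by default.)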

In [5]:
def cost(w, X, y, c=0):
    """
    Returns the cross-entropy error function with (optional) sum-of-squares regularization term.
    
    w -- parameters
    X -- dataset of features where each row corresponds to a single sample
    y -- dataset of labels where each row corresponds to a single sample
    c -- regularization coefficient (default = 0)
    """
    outputs = expit(X.dot(w)) # Vector of outputs (or predictions)
    return -( y.transpose().dot(np.log(outputs)) + (1-y).transpose().dot(np.log(1-outputs)) ) + c*0.5*w.dot(w)

def grad(w, X, y, c=0):
    """
    Returns the gradient of the cross-entropy error function with (optional) sum-of-squares regularization term.
    """
    outputs = expit(X.dot(w))
    return X.transpose().dot(outputs-y) + c*w
    
def train(X, y,c=0):
    """
    Returns the vector of parameters which minimizes the error function via the BFGS algorithm.
    """
    initial_values = np.zeros(X.shape[1]) # Errors occur if initial_values is set too high
    return opt.fmin_bfgs(cost, initial_values, fprime=grad, args=(X,y,c))

def predict(w, X):
    """
    Returns a vector of predictions. 
    """
    return expit(X.dot(w))
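A quick way to check that $\texttt{grad}$ really is the gradient of $\texttt{cost}$ (with and without the regularization term) is to compare it with a finite-difference approximation. The sketch below does this with SciPy's $\texttt{check\_grad}$ on a small synthetic dataset (random features in $[-1, 1]$ plus a column of ones, random 0/1 labels), not on the diabetes data; the two functions are restated so the block runs on its own.

```python
import numpy as np
from scipy.optimize import check_grad
from scipy.special import expit

def cost(w, X, y, c=0):
    # Cross-entropy error with optional sum-of-squares term (same formula as above)
    outputs = expit(X.dot(w))
    return -(y.dot(np.log(outputs)) + (1 - y).dot(np.log(1 - outputs))) + c * 0.5 * w.dot(w)

def grad(w, X, y, c=0):
    # Analytic gradient: X^T (sigma(Xw) - y) + c w
    outputs = expit(X.dot(w))
    return X.T.dot(outputs - y) + c * w

# Synthetic stand-in data (an assumption, not the diabetes set):
# 50 samples, 3 features in [-1, 1] plus a bias column of ones
rng = np.random.default_rng(0)
X_demo = np.hstack([rng.uniform(-1, 1, size=(50, 3)), np.ones((50, 1))])
y_demo = rng.integers(0, 2, size=50).astype(float)
w_demo = rng.normal(size=4)

# check_grad returns the 2-norm of (analytic gradient - finite-difference gradient);
# extra positional arguments are forwarded to both cost and grad
err = check_grad(cost, grad, w_demo, X_demo, y_demo)
err_reg = check_grad(cost, grad, w_demo, X_demo, y_demo, 0.5)  # with c = 0.5
print("without regularization:", err)
print("with c = 0.5:", err_reg)
```

Because $\texttt{check\_grad}$ uses a finite-difference step, we would expect small nonzero values here rather than exact zeros; a wrong gradient would typically show up as a discrepancy of the same order as the gradient itself.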

We test $\texttt{accpm}$ against the parameters obtained with SciPy's $\texttt{fmin\_bfgs}$ function, called via the $\texttt{train}$ function above. The latter produces the following:

In [10]:
y = data['diabetes'].values
X = data[['num preg', 'plasma', 'bp', 'skin fold', 'insulin', 'bmi', 'pedigree', 'age', 'ones']].values
theta_best1 = train(X, y)
print('---------------- Results ----------------')
print("The parameters attained using SciPy's fmin_bfgs are\n", theta_best1)
print('With these parameters the gradient is\n', grad(theta_best1, X, y))
print('With these parameters the norm of the gradient is', np.linalg.norm(grad(theta_best1, X, y)))

Optimization terminated successfully.
         Current function value: 361.722693
         Iterations: 18
         Function evaluations: 30
         Gradient evaluations: 30
---------------- Results ----------------
The parameters attained using SciPy's fmin_bfgs are
 [-1.05 -3.5   0.81 -0.03  0.5  -3.01 -1.11 -0.45  0.2 ]
With these parameters the gradient is
 [ -6.32e-07  -3.74e-07  -9.98e-07  -6.60e-07   4.10e-08  -4.18e-07
   3.97e-07  -6.32e-07  -3.97e-07]
With these parameters the norm of the gradient is 1.69104497808e-06


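The former, $\texttt{accpm}$, gives the following, where we take the initial polyhedron to be the 9-dimensional cube centered at the origin with side length 20 (that is, $-10 \le w_j \le 10$ for $j = 0, \dots, 8$, written as $A\mathbf{w} \le \mathbf{b}$):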

In [11]:
n = X.shape[1] # Number of parameters (9); theta_best was never defined
A = []
b = []
for i in range(n):
    a_upper = [0]*n
    a_lower = [0]*n
    a_upper[i] = 1   # Constraint  w_i <= 10
    a_lower[i] = -1  # Constraint -w_i <= 10
    A.append(a_upper)
    A.append(a_lower)
    b.append(10)
    b.append(10)

A = np.array(A, dtype=accpm.myfloat)
b = np.array(b, dtype=accpm.myfloat)

theta_best2 = accpm.accpm(A, b, cost, grad, args=(X, y),
                          alpha=0.01, beta=0.7, start=1, maxiter=300,
                          summary=1, testing=1)[1]


-------- Starting ACCPM --------
Initially: b = [ 10.  10.  10.  10.  10.  10.  10.  10.  10.  10.  10.  10.  10.  10.  10.
  10.  10.  10.] and A =
 [[ 1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [-1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0. -1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0. -1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0. -1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0. -1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0. -1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0. -1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0. -1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  0.  0.  0.  0. -1.]]
--------------------------------
Entering iteration 0
At iteration 0 AC computation SUCCEEDED with AC [ 0.  0.  0.  0.  0.  0.  0.  0.  0.] where
 

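The constraint-building loop in the cell above can be written more compactly with a Kronecker product, which yields the same interleaved rows $(\mathbf{e}_0, -\mathbf{e}_0, \mathbf{e}_1, -\mathbf{e}_1, \dots)$. The sketch below also shows how a cutting plane could be appended to $(A, \mathbf{b})$. It assumes the standard neutral cut for a convex differentiable objective: at a query point $\mathbf{w}_k$ with gradient $\mathbf{g}_k$, convexity gives $f(\mathbf{w}) \ge f(\mathbf{w}_k) + \mathbf{g}_k^T(\mathbf{w} - \mathbf{w}_k)$, so every minimizer satisfies $\mathbf{g}_k^T \mathbf{w} \le \mathbf{g}_k^T \mathbf{w}_k$. The helpers $\texttt{box\_constraints}$ and $\texttt{add\_neutral\_cut}$ are hypothetical, we have not checked that $\texttt{accpm}$ forms its cuts in exactly this way, and a toy objective $\|\mathbf{w} - \mathbf{w}^\star\|^2$ stands in for $E_D$.

```python
import numpy as np

def box_constraints(n, half_width):
    """Return (A, b) such that A w <= b describes the cube [-half_width, half_width]^n.

    Rows are interleaved as (+e_0, -e_0, +e_1, -e_1, ...), matching the loop above.
    """
    A = np.kron(np.eye(n), np.array([[1.0], [-1.0]]))  # shape (2n, n)
    b = np.full(2 * n, float(half_width))
    return A, b

def add_neutral_cut(A, b, w_k, g_k):
    """Hypothetical helper (not part of accpm): append the cut g_k^T w <= g_k^T w_k.

    For a convex differentiable objective with gradient g_k at w_k, every
    minimizer satisfies this inequality, so no minimizer is cut off.
    """
    A_new = np.vstack([A, g_k])          # add g_k as a new row
    b_new = np.append(b, g_k.dot(w_k))   # matching right-hand side
    return A_new, b_new

# Toy convex objective f(w) = ||w - w_star||^2 with known minimizer w_star (illustrative assumption)
w_star = np.array([1.0, -2.0, 0.5])

def grad_toy(w):
    return 2.0 * (w - w_star)

A, b = box_constraints(3, 10)            # cube with side length 20, as in the cell above
w_k = np.zeros(3)                        # query point: the centre (and analytic centre) of the cube
A, b = add_neutral_cut(A, b, w_k, grad_toy(w_k))

print(A.shape, b.shape)                  # (7, 3) (7,)
print(np.all(A @ w_star <= b))           # True: the minimizer survives the cut
print(np.all(A @ (-w_star) <= b))        # False: -w_star lies on the wrong side of the cut
```

Calling $\texttt{box\_constraints(9, 10)}$ reproduces the 18 rows printed above; the result would still need converting to $\texttt{accpm.myfloat}$ before being passed to $\texttt{accpm.accpm}$.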
ACCPM converged in 149 iterations, and we observe that

In [20]:
difference = np.abs(theta_best1-theta_best2)
print("The absolute element wise difference between the parameters attained via SciPy's", 
      "fmin_bfgs function and the ACCPM are\n", difference)
print("The norm value of this difference is", np.linalg.norm(difference))

The absolute element wise difference between the parameters attained via SciPy's fmin_bfgs function and the ACCPM are
 [  1.30e-04   2.27e-05   1.52e-04   7.20e-06   5.95e-04   6.02e-04
   3.32e-04   1.97e-04   2.73e-04]
The norm value of this difference is 0.000989931405895


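and so the two methods agree closely: the parameters differ by less than $10^{-3}$ in Euclidean norm. Below we run further tests and observe that the algorithm converges for a smaller initial cube (side length 10) but fails entirely, with a $\texttt{ValueError}$, for larger cubes (side lengths 40 and 200).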

In [29]:
n = X.shape[1] # Number of parameters (9)
A = []
b = []
for i in range(n):
    a_upper = [0]*n
    a_lower = [0]*n
    a_upper[i] = 1   # Constraint  w_i <= 5
    a_lower[i] = -1  # Constraint -w_i <= 5
    A.append(a_upper)
    A.append(a_lower)
    b.append(5)
    b.append(5)

A = np.array(A, dtype=accpm.myfloat)
b = np.array(b, dtype=accpm.myfloat)

theta_best3 = accpm.accpm(A, b, cost, grad, args=(X, y),
                          alpha=0.01, beta=0.7, start=1, maxiter=300,
                          summary=1, testing=1)[1]

-------- Starting ACCPM --------
Initially: b = [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.  5.] and A =
 [[ 1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [-1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0. -1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0. -1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0. -1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0. -1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0. -1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0. -1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0. -1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  0.  0.  0.  0. -1.]]
--------------------------------
Entering iteration 0
At iteration 0 AC computation SUCCEEDED with AC [ 0.  0.  0.  0.  0.  0.  0.  0.  0.] where
        a_cp = [ 0.3

In [28]:
n = X.shape[1] # Number of parameters (9)
A = []
b = []
for i in range(n):
    a_upper = [0]*n
    a_lower = [0]*n
    a_upper[i] = 1   # Constraint  w_i <= 20
    a_lower[i] = -1  # Constraint -w_i <= 20
    A.append(a_upper)
    A.append(a_lower)
    b.append(20)
    b.append(20)

A = np.array(A, dtype=accpm.myfloat)
b = np.array(b, dtype=accpm.myfloat)

theta_best4 = accpm.accpm(A, b, cost, grad, args=(X, y),
                          alpha=0.01, beta=0.7, start=1, maxiter=300,
                          summary=1, testing=1)[1]


-------- Starting ACCPM --------
Initially: b = [ 20.  20.  20.  20.  20.  20.  20.  20.  20.  20.  20.  20.  20.  20.  20.
  20.  20.  20.] and A =
 [[ 1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [-1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0. -1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0. -1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0. -1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0. -1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0. -1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0. -1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0. -1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  0.  0.  0.  0. -1.]]
--------------------------------
Entering iteration 0
At iteration 0 AC computation SUCCEEDED with AC [ 0.  0.  0.  0.  0.  0.  0.  0.  0.] where
 

ValueError: array must not contain infs or NaNs

In [27]:
n = X.shape[1] # Number of parameters (9)
A = []
b = []
for i in range(n):
    a_upper = [0]*n
    a_lower = [0]*n
    a_upper[i] = 1   # Constraint  w_i <= 100
    a_lower[i] = -1  # Constraint -w_i <= 100
    A.append(a_upper)
    A.append(a_lower)
    b.append(100)
    b.append(100)

A = np.array(A, dtype=accpm.myfloat)
b = np.array(b, dtype=accpm.myfloat)

theta_best5 = accpm.accpm(A, b, cost, grad, args=(X, y),
                          alpha=0.01, beta=0.7, start=1, maxiter=300,
                          summary=1, testing=1)[1]


-------- Starting ACCPM --------
Initially: b = [ 100.  100.  100.  100.  100.  100.  100.  100.  100.  100.  100.  100.
  100.  100.  100.  100.  100.  100.] and A =
 [[ 1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [-1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0. -1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0. -1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0. -1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0. -1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0. -1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0. -1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0. -1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  0.  0.  0.  0. -1.]]
--------------------------------
Entering iteration 0
At iteration 0 AC computation SUCCEEDED with AC [ 0.  0.  0.  0.  0.  0.  0

ValueError: array must not contain infs or NaNs
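We have not traced where inside $\texttt{accpm}$ this $\texttt{ValueError}$ is raised. The message appears to be the one produced by NumPy's $\texttt{asarray\_chkfinite}$, which SciPy's linear-algebra routines use to validate their inputs, so some intermediate array contained an $\infty$ or NaN. One plausible source is $\texttt{cost}$ itself: once the cube is wide, $|\mathbf{w}^T\mathbf{x}_n|$ can become large, $\texttt{expit}$ then rounds to exactly 0 or 1 in double precision, and $\texttt{np.log}$ returns $-\infty$ (and $0 \cdot (-\infty)$ is NaN). The sketch below, an untested suggestion rather than a verified fix for these runs, compares the current formula with an equivalent one based on $-\ln\sigma(z) = \ln(1 + e^{-z})$ and $-\ln(1 - \sigma(z)) = \ln(1 + e^{z})$, evaluated with $\texttt{np.logaddexp}$. The names $\texttt{cost\_naive}$ and $\texttt{cost\_stable}$ and the two-sample data are hypothetical.

```python
import numpy as np
from scipy.special import expit

def cost_naive(w, X, y):
    # The formula used by cost above (without the regularization term)
    outputs = expit(X.dot(w))
    return -(y.dot(np.log(outputs)) + (1 - y).dot(np.log(1 - outputs)))

def cost_stable(w, X, y):
    # -ln sigma(z) = ln(1 + e^{-z}) and -ln(1 - sigma(z)) = ln(1 + e^{z}),
    # computed with np.logaddexp(0, .) so that no log(0) is ever evaluated
    z = X.dot(w)
    return y.dot(np.logaddexp(0, -z)) + (1 - y).dot(np.logaddexp(0, z))

# Two toy samples (an assumption) and a weight vector inside a cube of half-width 100
X_demo = np.array([[1.0, 1.0], [-1.0, 1.0]])
y_demo = np.array([0.0, 1.0])
w_big = np.array([50.0, 0.0])            # gives w^T x = +50 and -50

with np.errstate(divide="ignore", invalid="ignore"):
    naive = cost_naive(w_big, X_demo, y_demo)  # expit(50) rounds to 1.0, so log(1 - 1.0) = -inf
stable = cost_stable(w_big, X_demo, y_demo)

print(naive)    # inf
print(stable)   # 100.0

# For moderate weights the two formulas agree
w_small = np.array([0.3, -0.2])
print(np.isclose(cost_naive(w_small, X_demo, y_demo), cost_stable(w_small, X_demo, y_demo)))  # True
```

To try this in the experiments above, $\texttt{cost\_stable}$ (plus the regularization term, if wanted) could be passed to $\texttt{accpm.accpm}$ in place of $\texttt{cost}$; $\texttt{grad}$ should not need changing, since it never takes a logarithm.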