## Logistic Matrix Factorization

We maximize the following masked log-likelihood over the factors $W$ and $H$:

$$
\mathcal{L}(W, H) = \sum_{i=1}^I \sum_{j=1}^J M(i,j) ( Y(i,j) \log (\sigma(W(i) H(j))) + (1 - Y(i,j)) \log(1 - \sigma(W(i) H(j))) )
$$

$Y$ is the observed $I\times J$ binary matrix, possibly with missing entries:
$Y(i,j) \in \{0,1\}$

$M$ is the mask matrix:
$M(i,j) = 1$ if $Y(i,j)$ is observed and $M(i,j) = 0$ otherwise


Here $\sigma(x)$ is the sigmoid function, defined as
\begin{eqnarray}
\sigma(x) & = & \frac{1}{1+e^{-x}}
\end{eqnarray}


### Properties of the sigmoid function
Note that

\begin{eqnarray}
\sigma(x) & = & \frac{e^x}{(1+e^{-x})e^x} = \frac{e^x}{1+e^{x}} \\
1 - \sigma(x) & = & 1 - \frac{e^x}{1+e^{x}} = \frac{1+e^{x} - e^x}{1+e^{x}} = \frac{1}{1+e^{x}}
\end{eqnarray}

\begin{eqnarray}
\sigma'(x) & = & \frac{e^x(1+e^{x}) - e^{x} e^x}{(1+e^{x})^2} = \frac{e^x}{1+e^{x}}\frac{1}{1+e^{x}} = \sigma(x) (1-\sigma(x))
\end{eqnarray}

\begin{eqnarray}
\log \sigma(x) & = & -\log(1+e^{-x}) = x - \log(1+e^{x}) \\
\log(1 - \sigma(x)) & = &  -\log({1+e^{x}})
\end{eqnarray}
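
The identities above can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `log_sigmoid` and `log_one_minus_sigmoid` are illustrative and do not appear elsewhere in this notebook. It relies on `np.logaddexp(0, x)`, which evaluates $\log(1+e^{x})$ without overflow.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_sigmoid(x):
    # log sigma(x) = -log(1 + e^{-x})
    return -np.logaddexp(0.0, -x)

def log_one_minus_sigmoid(x):
    # log(1 - sigma(x)) = -log(1 + e^{x})
    return -np.logaddexp(0.0, x)

x = np.linspace(-5.0, 5.0, 11)
s = sigmoid(x)

# Central finite difference as a numerical estimate of sigma'(x)
eps = 1e-6
num_deriv = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

print(np.allclose(num_deriv, s * (1 - s)))                   # derivative identity
print(np.allclose(log_sigmoid(x), x - np.logaddexp(0.0, x)))  # log sigma(x) = x - log(1 + e^x)
print(np.allclose(log_one_minus_sigmoid(x), np.log(1 - s)))   # log(1 - sigma(x))
```

Working with the log forms directly is usually preferable to calling `np.log(sigmoid(x))`, which can underflow to `-inf` for strongly negative `x`.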



In [154]:
%matplotlib inline
import numpy as np
import matplotlib as mpl
import matplotlib.pylab as plt


# Generate a random logistic matrix factorization problem

def sigmoid(t):
    # Same function as exp(t)/(1+exp(t)), but avoids inf/inf = nan for large positive t
    return 1.0/(1.0+np.exp(-t))

I = 5
J = 10

# Random Mask 
Mask = (np.random.rand(I,J)<0.8).astype(int)

# Random Parameters
W = np.random.randn(I,1)
H = np.random.randn(1,J)

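Y = np.zeros((I,J))
# Generate class labels: Y[i,j] = 1 with probability pi[i,j]
pi = sigmoid(W*H)

for i in range(I):
    for j in range(J):
        if Mask[i,j]==0:
            # Unobserved entry: placeholder value, ignored through the mask
            Y[i,j] = 0
        else:
            Y[i,j] = 1 if np.random.rand() < pi[i,j] else 0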


Task:
Given only $Y$ and $M$, find a good $W$ and $H$ by maximizing the objective $\mathcal{L}$.
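
To make the objective concrete, here is a hedged sketch of how $\mathcal{L}(W, H)$ could be evaluated with NumPy. The function name `log_likelihood` and the small test matrices are assumptions for illustration. The sketch uses the identity $Y \log\sigma(z) + (1-Y)\log(1-\sigma(z)) = Y z - \log(1+e^{z})$, which follows from the log-sigmoid properties listed earlier.

```python
import numpy as np

def log_likelihood(Y, Mask, W, H):
    """Masked objective L(W, H) for W of shape (I, 1) and H of shape (1, J)."""
    Z = np.dot(W, H)  # I x J matrix with entries W(i) H(j)
    # Y*log(sigma(Z)) + (1-Y)*log(1-sigma(Z)) simplifies to Y*Z - log(1 + e^Z)
    return np.sum(Mask * (Y * Z - np.logaddexp(0.0, Z)))

# Tiny hand-made example (hypothetical values)
Y_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
Mask_toy = np.array([[1, 1], [1, 0]])
W0 = np.zeros((2, 1))
H0 = np.zeros((1, 2))

# With all-zero factors every observed entry has probability 1/2,
# so the objective equals -(number of observed entries) * log(2)
L0 = log_likelihood(Y_toy, Mask_toy, W0, H0)
print(L0, -3 * np.log(2))
```

Unobserved entries contribute nothing because they are multiplied by the zero entries of the mask, so the placeholder values stored in $Y$ at those positions do not matter.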

#### Evaluating the gradient 

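Writing $z = W(i) H(j)$, the sigmoid properties above give $\frac{d}{dz}\left[ Y \log \sigma(z) + (1 - Y) \log(1 - \sigma(z)) \right] = Y - \sigma(z)$, so by the chain rule

$$
\frac{\partial \mathcal{L}(W,H)}{\partial W(i)} = \sum_{j=1}^J M(i,j) (Y(i,j) -\sigma(W(i) H(j))) H(j)
$$

$$
\frac{\partial \mathcal{L}(W,H)}{\partial H(j)} = \sum_{i=1}^I W(i) M(i,j) (Y(i,j) -\sigma(W(i) H(j)))
$$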
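
A common sanity check for gradient formulas like these is to compare them with central finite differences. The sketch below is one possible way to do that; the function names `objective` and `gradients`, the random seed, and the problem size are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def objective(Y, Mask, W, H):
    Z = np.dot(W, H)
    return np.sum(Mask * (Y * Z - np.logaddexp(0.0, Z)))

def gradients(Y, Mask, W, H):
    R = Mask * (Y - sigmoid(np.dot(W, H)))  # masked residuals, I x J
    return np.dot(R, H.T), np.dot(W.T, R)   # dL/dW (I x 1), dL/dH (1 x J)

rng = np.random.default_rng(0)
I, J = 4, 6
W = rng.standard_normal((I, 1))
H = rng.standard_normal((1, J))
Mask = (rng.random((I, J)) < 0.8).astype(int)
Y = (rng.random((I, J)) < 0.5).astype(float)

dW, dH = gradients(Y, Mask, W, H)

# Central finite differences, one coordinate at a time
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(I):
    Wp, Wm = W.copy(), W.copy()
    Wp[i, 0] += eps
    Wm[i, 0] -= eps
    num_dW[i, 0] = (objective(Y, Mask, Wp, H) - objective(Y, Mask, Wm, H)) / (2 * eps)

num_dH = np.zeros_like(H)
for j in range(J):
    Hp, Hm = H.copy(), H.copy()
    Hp[0, j] += eps
    Hm[0, j] -= eps
    num_dH[0, j] = (objective(Y, Mask, W, Hp) - objective(Y, Mask, W, Hm)) / (2 * eps)

max_err_W = np.max(np.abs(dW - num_dW))
max_err_H = np.max(np.abs(dH - num_dH))
print(max_err_W, max_err_H)
```

Both errors should be tiny compared with the gradient entries themselves; a large discrepancy would point to a sign or indexing mistake in the analytic expressions.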


Then maximize $\mathcal{L}$ by alternating gradient ascent: take a step in $H$ with $W$ fixed, then a step in $W$ using the updated $H$.


In [155]:
print(Y)
print(Mask)

[[ 1.  0.  0.  0.  1.  1.  0.  0.  0.  0.]
 [ 1.  0.  1.  1.  1.  1.  0.  1.  0.  1.]
 [ 0.  0.  0.  0.  0.  1.  1.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  1.  0.  0.  1.]
 [ 0.  1.  1.  0.  1.  0.  0.  0.  0.  1.]]
[[1 0 0 1 1 1 0 0 1 1]
 [1 0 1 1 1 1 1 1 0 1]
 [1 1 1 0 0 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 1]
 [1 1 1 0 1 0 0 1 1 1]]


In [158]:
def LogisticMF(Y, I, J, Mask, eta=0.005, MAX_ITER = 50000, PRINT_PERIOD=5000):
    # Random initialization of the rank-1 factors
    W = np.random.randn(I,1)
    H = np.random.randn(1,J)

    for epoch in range(MAX_ITER):
        # Gradient ascent step in H with W fixed
        dL_H = np.dot(W.T, Mask * (Y - sigmoid(np.dot(W, H))))
        H = H + eta * dL_H
        # Gradient ascent step in W using the updated H
        dL_W = np.dot(Mask * (Y - sigmoid(np.dot(W, H))), H.T)
        W = W + eta * dL_W

        # Print the current factors periodically
        if epoch % PRINT_PERIOD == 0:
            print(W)
            print(H)

    return W,H
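
Printing the raw factors makes it hard to judge progress. As a hedged alternative, the sketch below records the objective after every pair of updates so that the ascent can be inspected directly. The function `logistic_mf_with_history`, its parameter names, the seeds, and the shorter iteration count are all assumptions for illustration, not part of the original routine.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def objective(Y, Mask, W, H):
    Z = np.dot(W, H)
    return np.sum(Mask * (Y * Z - np.logaddexp(0.0, Z)))

def logistic_mf_with_history(Y, Mask, eta=0.005, max_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    I, J = Y.shape
    W = rng.standard_normal((I, 1))
    H = rng.standard_normal((1, J))
    history = [objective(Y, Mask, W, H)]
    for _ in range(max_iter):
        # Same alternating updates as LogisticMF: H first, then W with the new H
        H = H + eta * np.dot(W.T, Mask * (Y - sigmoid(np.dot(W, H))))
        W = W + eta * np.dot(Mask * (Y - sigmoid(np.dot(W, H))), H.T)
        history.append(objective(Y, Mask, W, H))
    return W, H, np.array(history)

# Synthetic problem with the same shape as the one used in this notebook
rng = np.random.default_rng(1)
I, J = 5, 10
Mask = (rng.random((I, J)) < 0.8).astype(int)
W_true = rng.standard_normal((I, 1))
H_true = rng.standard_normal((1, J))
Y = Mask * (rng.random((I, J)) < sigmoid(np.dot(W_true, H_true)))

W_hat, H_hat, history = logistic_mf_with_history(Y, Mask)
print(history[0], history[-1])
```

The recorded values can be passed to `plt.plot(history)` to visualize convergence. Note that only the product $W(i) H(j)$ is determined by the data, so the individual factors may differ from the generating ones by a common rescaling or sign flip.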

In [160]:
W, H = LogisticMF(Y, I, J, Mask)

[[-0.59853249]
 [-1.00980969]
 [-1.18747045]
 [ 0.8264081 ]
 [-0.9334108 ]]
[[-1.52471587  0.21051811  0.16172195  1.03355992  0.28584235  0.61379512
  -1.30742369 -0.41418527 -0.28663207 -0.09554908]]
[[-4.92519246]
 [-1.21783232]
 [ 1.25909322]
 [ 6.34424323]
 [-0.25344606]]
[[-2.27921945 -3.70601311 -4.19256918  0.46398825 -3.9314447  -0.57337607
   3.36365384 -2.26253436 -0.07454671  0.43481575]]
[[-6.48146073]
 [-1.32824191]
 [ 1.33335497]
 [ 8.20696922]
 [-0.29567837]]
[[-1.94872381 -5.32877394 -5.69687769  0.3849627  -5.4778733  -0.48590044
   3.93876482 -1.94854542 -0.0502956   0.36081717]]
[[-7.61765621]
 [-1.39925684]
 [ 1.38696054]
 [ 9.58418244]
 [-0.28552105]]
[[-1.91513558 -6.44780064 -6.73307351  0.34332918 -6.55981644 -0.43770952
   4.20049512 -1.91511621 -0.04083961  0.32428402]]
[[ -8.52922824]
 [ -1.44652826]
 [  1.42137129]
 [ 10.70917279]
 [ -0.27305145]]
[[-1.9199935  -7.33799721 -7.56972058  0.31669811 -7.42795751 -0.40677469
   4.3652793  -1.91998995 -0.03548277