## Logistic Matrix Factorization

We maximize the following objective, the log-likelihood of the observed entries, with respect to $W$ and $H$:

$$
\mathcal{L}(W, H) = \sum_{i=1}^I \sum_{j=1}^J M(i,j) ( Y(i,j) \log (\sigma(W(i) H(j))) + (1 - Y(i,j)) \log(1 - \sigma(W(i) H(j))) )
$$

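$Y$ is the observed $I\times J$ binary matrix, possibly with missing entries:
$Y(i,j) \in \{0,1\}$

$M$ is the mask matrix:
$M(i,j) = 1$ if $Y(i,j)$ is observed, and $M(i,j) = 0$ if $Y(i,j)$ is not observed.

$W$ is an $I \times 1$ column vector and $H$ is a $1 \times J$ row vector, so $W(i) H(j)$ is a scalar.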


Here $\sigma(x)$ is the sigmoid function, defined as
\begin{eqnarray}
\sigma(x) & = & \frac{1}{1+e^{-x}}
\end{eqnarray}


### Properties of the sigmoid function
Note that

\begin{eqnarray}
\sigma(x) & = & \frac{e^x}{(1+e^{-x})e^x} = \frac{e^x}{1+e^{x}} \\
1 - \sigma(x) & = & 1 - \frac{e^x}{1+e^{x}} = \frac{1+e^{x} - e^x}{1+e^{x}} = \frac{1}{1+e^{x}}
\end{eqnarray}

\begin{eqnarray}
\sigma'(x) & = & \frac{e^x(1+e^{x}) - e^{x} e^x}{(1+e^{x})^2} = \frac{e^x}{1+e^{x}}\frac{1}{1+e^{x}} = \sigma(x) (1-\sigma(x))
\end{eqnarray}

\begin{eqnarray}
\log \sigma(x) & = & -\log(1+e^{-x}) = x - \log(1+e^{x}) \\
\log(1 - \sigma(x)) & = &  -\log({1+e^{x}})
\end{eqnarray}
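The identities above can be checked numerically. The following is a minimal sketch, not part of the original notebook: the helper names `log_sigmoid` and `log_one_minus_sigmoid` are assumptions introduced here, and `np.logaddexp(0, x)` is used to compute $\log(1+e^{x})$ without overflow.

```python
import numpy as np

def log_sigmoid(x):
    # log sigma(x) = -log(1 + exp(-x)); logaddexp(0, -x) = log(e^0 + e^{-x})
    return -np.logaddexp(0.0, -x)

def log_one_minus_sigmoid(x):
    # log(1 - sigma(x)) = -log(1 + exp(x))
    return -np.logaddexp(0.0, x)

def sigmoid(x):
    return np.exp(log_sigmoid(x))

x = np.linspace(-5.0, 5.0, 11)
s = sigmoid(x)

# sigma'(x) = sigma(x) (1 - sigma(x)), compared with a central difference
h = 1e-5
numeric_deriv = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
deriv_ok = np.allclose(numeric_deriv, s * (1 - s), atol=1e-8)

# log sigma(x) = x - log(1 + e^x)
identity_ok = np.allclose(log_sigmoid(x), x - np.log1p(np.exp(x)))

# 1 - sigma(x) = 1 / (1 + e^x)
complement_ok = np.allclose(np.exp(log_one_minus_sigmoid(x)), 1 - s)

# The log forms stay finite even where exp(1000) would overflow
big = log_sigmoid(np.array([-1000.0, 1000.0]))

print(deriv_ok, identity_ok, complement_ok, big)
```

Working with $\log\sigma$ directly, rather than taking the logarithm of a computed $\sigma$, avoids $\log 0$ when the argument is large in magnitude.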



In [1]:
%matplotlib inline
import numpy as np
import matplotlib as mpl
import matplotlib.pylab as plt


# Generate a random logistic matrix factorization problem

def sigmoid(t):
    # Same function as exp(t)/(1+exp(t)), but avoids inf/inf for large positive t
    return 1/(1+np.exp(-t))

I = 5
J = 10

# Random mask: each entry is observed with probability 0.8
M = (np.random.rand(I,J)<0.8)

# Random Parameters
W = np.random.randn(I,1)
H = np.random.randn(1,J)

Y = np.zeros((I,J))
# Generate class labels
pi = sigmoid(W*H)

for i in range(I):
    for j in range(J):
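        if not M[i,j]:
            Y[i,j] = 8  # sentinel for a missing entry (any value other than 0 or 1)
        else:
            # Y(i,j) = 1 with probability pi(i,j), as the model assumes
            Y[i,j] = 1 if np.random.rand() < pi[i,j] else 0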


Task:
Given only $Y$ and $M$, find a good $W$ and $H$ by maximizing the objective $\mathcal{L}$.

#### Evaluating the gradient 

$$
\frac{d\mathcal{L}(W,H)}{dW(i)} = \sum_{j=1}^J (M(i,j) (Y(i,j) -\sigma(W(i) H(j)))) H(j)
$$

$$
\frac{d\mathcal{L}(W,H)}{dH(j)} = \sum_{i=1}^I  W(i) (M(i,j) (Y(i,j) -\sigma(W(i) H(j))))
$$
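The two gradient expressions above can be compared against finite differences of $\mathcal{L}$. This is a sketch under stated assumptions: the random data, the seed, the step size `h`, and the function names `objective`, `grad_W`, `grad_H`, and `numeric_grad` are all introduced here for illustration. The objective uses the simplification $Y \log\sigma(x) + (1-Y)\log(1-\sigma(x)) = Yx - \log(1+e^{x})$, which follows from the log identities of the sigmoid.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

I, J = 4, 6
W = rng.standard_normal((I, 1))
H = rng.standard_normal((1, J))
M = (rng.random((I, J)) < 0.8).astype(float)  # 1 = observed, 0 = missing
Y = (rng.random((I, J)) < 0.5).astype(float)  # arbitrary binary data

def sigmoid(X):
    return 1.0 / (1.0 + np.exp(-X))

def objective(W, H, Y, M):
    X = W @ H
    # Masked log-likelihood: sum of M * (Y * X - log(1 + e^X))
    return np.sum(M * (Y * X - np.logaddexp(0.0, X)))

def grad_W(W, H, Y, M):
    # dL/dW(i) = sum_j M(i,j) (Y(i,j) - sigma(W(i)H(j))) H(j)
    return (M * (Y - sigmoid(W @ H))) @ H.T

def grad_H(W, H, Y, M):
    # dL/dH(j) = sum_i W(i) M(i,j) (Y(i,j) - sigma(W(i)H(j)))
    return W.T @ (M * (Y - sigmoid(W @ H)))

def numeric_grad(f, A, h=1e-6):
    # Central finite differences, one entry of A at a time
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        A_plus, A_minus = A.copy(), A.copy()
        A_plus[idx] += h
        A_minus[idx] -= h
        G[idx] = (f(A_plus) - f(A_minus)) / (2 * h)
    return G

num_W = numeric_grad(lambda A: objective(A, H, Y, M), W)
num_H = numeric_grad(lambda A: objective(W, A, Y, M), H)

err_W = np.max(np.abs(grad_W(W, H, Y, M) - num_W))
err_H = np.max(np.abs(grad_H(W, H, Y, M) - num_H))
print(err_W, err_H)
```

Both errors should be close to zero; a large value would point to a sign or indexing mistake in the analytic gradient.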


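Then use alternating gradient ascent (we are maximizing $\mathcal{L}$): update $W$ with $H$ held fixed, then update $H$ using the new $W$.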
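As an illustration of the alternating scheme, the sketch below runs a short optimization on synthetic data and records $\mathcal{L}$ after every sweep, which gives a simple way to confirm that the updates move in the right direction. The seed, the number of sweeps, and the names `objective` and `history` are assumptions made for this example; the step size `eta = 0.005` is the one used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility

def sigmoid(X):
    return 1.0 / (1.0 + np.exp(-X))

def objective(W, H, Y, M):
    X = W @ H
    # Masked log-likelihood: sum of M * (Y * X - log(1 + e^X))
    return np.sum(M * (Y * X - np.logaddexp(0.0, X)))

# Synthetic problem: Y(i,j) = 1 with probability sigma(W_true(i) H_true(j))
I, J = 5, 10
W_true = rng.standard_normal((I, 1))
H_true = rng.standard_normal((1, J))
Y = (rng.random((I, J)) < sigmoid(W_true @ H_true)).astype(float)
M = (rng.random((I, J)) < 0.8).astype(float)  # 1 = observed, 0 = missing

# Random initialization
W = rng.standard_normal((I, 1))
H = rng.standard_normal((1, J))

eta = 0.005
history = [objective(W, H, Y, M)]
for sweep in range(2000):
    # Ascent step on W with H fixed
    W = W + eta * ((M * (Y - sigmoid(W @ H))) @ H.T)
    # Ascent step on H using the updated W
    H = H + eta * (W.T @ (M * (Y - sigmoid(W @ H))))
    history.append(objective(W, H, Y, M))

print(history[0], history[-1])
```

Without regularization the parameters can keep growing in magnitude when some rows or columns are perfectly separable, so tracking the objective is more informative than inspecting $W$ and $H$ alone.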


In [2]:
# Exam Solution
# Random initialization of the parameters
W = np.random.randn(I,1)
H = np.random.randn(1,J)
# Rebuild the mask from Y alone: 8 is the sentinel for a missing entry
# (any value different from 1 or 0 would do)
M = (Y != 8).astype(float)



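def derivW(W,H,Y,M):
    # Gradient of the objective with respect to W, shape (I,1)
    dW = np.dot((Y - sigmoid(W@H))*M, H.T)
    return dW

def derivH(W,H,Y,M):
    # Gradient of the objective with respect to H, shape (1,J)
    dH = np.dot(W.T, (Y - sigmoid(W@H))*M)
    return dH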


In [3]:
EPOCH = 100000
eta = 0.005

for epoch in range(EPOCH):
    # Alternating gradient ascent: update W, then update H using the new W
    W = W + eta * derivW(W,H,Y,M)
    H = H + eta * derivH(W,H,Y,M)
    if epoch % 10000 == 0:
        print(W)
        print(H)


[[ 0.36867974]
 [ 1.67555383]
 [ 0.28436594]
 [-1.03439405]
 [ 0.02769962]]
[[-0.68860129 -0.4203127   0.68293204  1.55542938  0.11433369  0.32687149
  -0.70587784 -1.26206165 -1.94302155 -1.13503902]]
[[-0.21105466]
 [ 9.15795179]
 [ 1.65496489]
 [-0.5040525 ]
 [-3.60765563]]
[[ 0.11712464  0.91811677 -0.18365208 -0.41202653 -6.77143038 -0.38875476
  -1.2208713   2.48971455 -7.02930691  1.59158446]]
[[ -0.19949909]
 [ 12.35334345]
 [  1.81814311]
 [ -0.41254509]
 [ -3.96472607]]
[[ 0.11018265  0.96000762 -0.15479401 -0.34202185 -9.05912342 -0.32433402
  -1.31101864  2.54706616 -9.20902709  1.53426448]]
[[ -0.18826822]
 [ 14.68730146]
 [  1.8887896 ]
 [ -0.36649113]
 [ -4.17279528]]
[[  0.10383542   0.98828251  -0.13947015  -0.3118918  -10.70966673
   -0.29643611  -1.36586485   2.57178291 -10.81421645   1.5244571 ]]
[[ -0.17795366]
 [ 16.57395796]
 [  1.91295554]
 [ -0.33773934]
 [ -4.36463999]]
[[  0.09807003   1.00969544  -0.12808474  -0.29647235 -12.04001865
   -0.28219509  -1.40603