## Logistic Matrix Factorization

Maximize the following objective:

$$
\mathcal{L}(W, H) = \sum_{i=1}^I \sum_{j=1}^J M(i,j) ( Y(i,j) \log (\sigma(W(i) H(j))) + (1 - Y(i,j)) \log(1 - \sigma(W(i) H(j))) )
$$

where $Y$ is the observed $I\times J$ binary matrix, possibly with missing entries,
$Y(i,j) \in \{0,1\}$,

and $M$ is the mask matrix,
$M(i,j) = 1$ if $Y(i,j)$ is observed and $M(i,j) = 0$ if it is not.
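In code it is often convenient to store missing entries of $Y$ as `NaN` and derive $M$ from them. The sketch below assumes NumPy, uses `np.nan` as the missing-value marker, and works on a toy matrix chosen for illustration. It also zero-fills the missing entries, because `NaN * 0` is still `NaN` and would otherwise corrupt masked sums such as $\sum M(i,j)\,Y(i,j)$:

```python
import numpy as np

# Toy 2x3 observation matrix; np.nan marks entries that were not observed
Y = np.array([[1.0, np.nan, 0.0],
              [np.nan, 1.0, 1.0]])

# M(i,j) = 1 where Y(i,j) is observed, 0 otherwise
M = (~np.isnan(Y)).astype(int)

# Zero-fill missing entries so products like M * Y stay finite;
# the mask guarantees these placeholder values never contribute
Y_filled = np.where(M == 1, Y, 0.0)

print(M)
print((M * Y_filled).sum())  # sum over observed entries only
```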


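Here $W(i)$ denotes the $i$-th row of the factor $W$ and $H(j)$ the $j$-th column of $H$. In the code below $W$ is $I\times 1$ and $H$ is $1\times J$, so $W(i)H(j)$ is a product of two scalars.

The function $\sigma(x)$ is the sigmoid function, defined as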
\begin{eqnarray}
\sigma(x) & = & \frac{1}{1+e^{-x}}
\end{eqnarray}
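Evaluating the sigmoid naively can fail in floating point. For example, $e^{x}/(1+e^{x})$ gives `inf/inf = nan` once $e^{x}$ overflows, which happens for $x$ above roughly 709 in double precision. The following is a minimal sketch of an overflow-safe version, assuming NumPy. For $x \ge 0$ it uses the definition above. For $x < 0$ it uses the equivalent form $e^{x}/(1+e^{x})$, obtained by multiplying numerator and denominator by $e^{x}$. With this split the exponent is never positive:

```python
import numpy as np

def sigmoid(x):
    """Numerically stable logistic function sigma(x) = 1 / (1 + exp(-x))."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) always lies in (0, 1], so it can never overflow
    z = np.exp(-np.abs(x))
    # x >= 0: 1/(1+e^{-x});  x < 0: e^{x}/(1+e^{x}) = z/(1+z)
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

print(sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```

SciPy users can get the same behaviour from `scipy.special.expit`. The sketch avoids the extra dependency.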


### Properties of the sigmoid function
Note that

\begin{eqnarray}
\sigma(x) & = & \frac{e^x}{(1+e^{-x})e^x} = \frac{e^x}{1+e^{x}} \\
1 - \sigma(x) & = & 1 - \frac{e^x}{1+e^{x}} = \frac{1+e^{x} - e^x}{1+e^{x}} = \frac{1}{1+e^{x}}
\end{eqnarray}

\begin{eqnarray}
\sigma'(x) & = & \frac{e^x(1+e^{x}) - e^{x} e^x}{(1+e^{x})^2} = \frac{e^x}{1+e^{x}}\frac{1}{1+e^{x}} = \sigma(x) (1-\sigma(x))
\end{eqnarray}

\begin{eqnarray}
\log \sigma(x) & = & -\log(1+e^{-x}) = x - \log(1+e^{x}) \\
\log(1 - \sigma(x)) & = &  -\log({1+e^{x}})
\end{eqnarray}
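The last pair of identities also makes the objective safe to evaluate. They give $\log\sigma(x) = -\log(1+e^{-x})$ and $\log(1-\sigma(x)) = -\log(1+e^{x})$, and NumPy's `np.logaddexp(0, x)` computes $\log(e^{0}+e^{x}) = \log(1+e^{x})$ without overflow. The sketch below evaluates $\mathcal{L}(W,H)$ this way. It assumes the shapes used in this notebook ($W$ is $I\times 1$, $H$ is $1\times J$) and that missing entries of `Y` have been zero-filled:

```python
import numpy as np

def log_likelihood(W, H, Y, M):
    """Masked Bernoulli log-likelihood L(W, H).

    W: (I, 1) array, H: (1, J) array, Y: (I, J) 0/1 array with missing
    entries zero-filled, M: (I, J) 0/1 mask of observed entries.
    """
    X = np.dot(W, H)                       # X(i,j) = W(i) H(j)
    log_sig = -np.logaddexp(0.0, -X)       # log sigma(x) = -log(1 + e^{-x})
    log_one_minus = -np.logaddexp(0.0, X)  # log(1 - sigma(x)) = -log(1 + e^{x})
    return np.sum(M * (Y * log_sig + (1 - Y) * log_one_minus))

# Tiny check: with W = 0 every sigma equals 1/2, so each observed entry adds log(1/2)
W = np.zeros((2, 1))
H = np.ones((1, 3))
Y = np.array([[1, 0, 1], [0, 0, 1]])
M = np.array([[1, 1, 0], [1, 1, 1]])
print(log_likelihood(W, H, Y, M))  # 5 * log(0.5) ≈ -3.4657
```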



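```python
import numpy as np

# Generate a random logistic matrix factorization problem

def sigmoid(t):
    # 1/(1+e^{-t}) equals e^t/(1+e^t) but does not return nan for large t
    return 1.0 / (1.0 + np.exp(-t))

I = 5
J = 10

# Random mask: each entry is observed with probability 0.8
M = np.random.rand(I, J) < 0.8

# Random ground-truth parameters (rank-1 factors)
W = np.random.randn(I, 1)
H = np.random.randn(1, J)

# Y stores NaN for missing entries; Ycopy is the zero-filled version used
# for training, since NaN * 0 is NaN and would poison the masked sums
Y = np.zeros((I, J))
Ycopy = Y.copy()

# Success probabilities pi(i,j) = sigma(W(i) H(j)); W*H broadcasts to an I x J outer product
pi = sigmoid(W * H)

# Generate class labels: Y(i,j) ~ Bernoulli(pi(i,j)) on observed entries
for i in range(I):
    for j in range(J):
        if not M[i, j]:
            Y[i, j] = np.nan
        else:
            Y[i, j] = 1 if np.random.rand() < pi[i, j] else 0
            Ycopy[i, j] = Y[i, j]
```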
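Each observed label should satisfy $P(Y(i,j)=1) = \pi(i,j) = \sigma(W(i)H(j))$. One way to draw it is to set the label to 1 when a uniform draw on $[0,1)$ falls below $\pi(i,j)$. The following Monte Carlo sketch checks this: comparing `u < p` produces the intended frequency $p$, while the reversed comparison `p < u` produces $1-p$. The probability 0.8, the seed and the sample size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator so the check is reproducible
p = 0.8                          # target probability of drawing a 1
u = rng.random(100_000)          # uniform draws on [0, 1)

freq_correct = np.mean(u < p)    # Bernoulli(p): 1 when the draw falls below p
freq_reversed = np.mean(p < u)   # reversed comparison: Bernoulli(1 - p)

print(round(freq_correct, 3), round(freq_reversed, 3))
```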

Task:
Given only $Y$ and $M$, find $W$ and $H$ that maximize the objective $\mathcal{L}$.

#### Evaluating the gradient 

$$
\frac{d\mathcal{L}(W,H)}{dW(i)} = \sum_{j=1}^J (M(i,j) (Y(i,j) -\sigma(W(i) H(j)))) H(j)
$$

$$
\frac{d\mathcal{L}(W,H)}{dH(j)} = \sum_{i=1}^I  W(i) (M(i,j) (Y(i,j) -\sigma(W(i) H(j))))
$$
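These gradients follow from the sigmoid properties above. Write $x = W(i)H(j)$. Since $\sigma'(x) = \sigma(x)(1-\sigma(x))$, we get $\frac{d}{dx}\log\sigma(x) = 1-\sigma(x)$ and $\frac{d}{dx}\log(1-\sigma(x)) = -\sigma(x)$. Each term of $\mathcal{L}$ therefore has the derivative

$$
\frac{d}{dx}\Big( Y(i,j)\log\sigma(x) + (1-Y(i,j))\log(1-\sigma(x)) \Big) = Y(i,j)(1-\sigma(x)) - (1-Y(i,j))\,\sigma(x) = Y(i,j) - \sigma(x)
$$

Applying the chain rule with $\partial x/\partial W(i) = H(j)$ and $\partial x/\partial H(j) = W(i)$, and multiplying by the mask $M(i,j)$, gives the two sums above.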


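Since $\mathcal{L}$ is being maximized, we then use alternating gradient ascent. At each epoch we update $H$ with $W$ held fixed, then update $W$ using the new $H$. Both steps use the step size $\eta$ (`eta`). The implementation also subtracts an $L_2$ penalty with weight $\nu$ (`nu`), so it actually maximizes $\mathcal{L}(W,H) - \frac{\nu}{2}\left(\|W\|^2 + \|H\|^2\right)$.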
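The updates in `iterate` use the matrix form of the per-entry gradients. Let $E = M \odot (Y - \sigma(WH))$ be the masked residual, with $\odot$ denoting the elementwise product. Then $\nabla_W\mathcal{L} = E H^\top$ and $\nabla_H\mathcal{L} = W^\top E$. The sketch below compares these expressions with central finite differences of $\mathcal{L}$ on random data. It leaves out the $\nu$ penalty, and the sizes, seed and step size are arbitrary choices for the check:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(W, H, Y, M):
    X = W @ H
    # log sigma(x) = -log(1+e^{-x}),  log(1 - sigma(x)) = -log(1+e^{x})
    return np.sum(M * (-Y * np.logaddexp(0, -X) - (1 - Y) * np.logaddexp(0, X)))

def gradients(W, H, Y, M):
    E = M * (Y - sigmoid(W @ H))   # masked residual, shape (I, J)
    return E @ H.T, W.T @ E        # dL/dW is (I, 1), dL/dH is (1, J)

def numeric_grad(f, A, eps=1e-6):
    """Central-difference estimate of df/dA, perturbing one entry of A at a time."""
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        A_plus, A_minus = A.copy(), A.copy()
        A_plus[idx] += eps
        A_minus[idx] -= eps
        G[idx] = (f(A_plus) - f(A_minus)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
I, J = 4, 6
W = rng.standard_normal((I, 1))
H = rng.standard_normal((1, J))
M = (rng.random((I, J)) < 0.8).astype(float)
Y = (rng.random((I, J)) < 0.5).astype(float) * M   # zero-filled where unobserved

gW, gH = gradients(W, H, Y, M)
nW = numeric_grad(lambda A: log_likelihood(A, H, Y, M), W)
nH = numeric_grad(lambda A: log_likelihood(W, A, Y, M), H)
print(np.max(np.abs(gW - nW)), np.max(np.abs(gH - nH)))  # both should be tiny
```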


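```python
def iterate(W, H, Y, M, Epoch=5000, eta=0.005, nu=0.1):
    """Alternating gradient ascent on L(W, H) - nu/2 * (||W||^2 + ||H||^2).

    Y must have its unobserved entries zero-filled (NaN * 0 is still NaN);
    M is the 0/1 mask, so those entries never contribute to the gradient.
    """
    Hfinal = H.copy()
    Wfinal = W.copy()
    for epoch in range(Epoch):
        # Ascent step in H with W fixed: W^T [M * (Y - sigma(WH))] - nu*H
        dL = np.dot(Wfinal.T, M * (Y - sigmoid(np.dot(Wfinal, Hfinal)))) - nu * Hfinal
        Hfinal = Hfinal + eta * dL
        # Ascent step in W using the updated H: [M * (Y - sigma(WH))] H^T - nu*W
        dL = np.dot(M * (Y - sigmoid(np.dot(Wfinal, Hfinal))), Hfinal.T) - nu * Wfinal
        Wfinal = Wfinal + eta * dL
    return Wfinal, Hfinal
```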

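```python
# Random starting point for the factors
Winitial = np.random.randn(I, 1)
Hinitial = np.random.randn(1, J)
# 0/1 integer mask, multiplied elementwise with the residual matrix
Mask = M.astype(int)
# Train on the zero-filled labels Ycopy (Y itself contains NaN for missing entries)
Wfinal, Hfinal = iterate(Winitial, Hinitial, Ycopy, Mask, Epoch=5000, eta=0.005, nu=0.1)
print("WActual:", W, sep="\n")
print("WFinal:", Wfinal, sep="\n")
print("HActual:", H, sep="\n")
print("HFinal:", Hfinal, sep="\n")
```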
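$\mathcal{L}$ depends on $W$ and $H$ only through the products $W(i)H(j)$, so the factors are not unique. Replacing $W$ by $-W$ and $H$ by $-H$ leaves every product and the $\nu$ penalty unchanged. As a result, `WFinal` can differ from `WActual` by a sign even when the fit is good. Comparing the predicted probabilities $\sigma(W(i)H(j))$ is therefore more informative than comparing the factors directly. The sketch below regenerates data in the same way as above and fits it with a compact equivalent of `iterate`. The seed and the helper name `fit` are illustrative assumptions. It reports the correlation between the true and fitted probabilities:

```python
import numpy as np

def sigmoid(x):
    # Stable logistic function: exp(-|x|) never overflows
    z = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def fit(W, H, Y, M, epochs=5000, eta=0.005, nu=0.1):
    """Alternating gradient ascent on L(W,H) - nu/2 (||W||^2 + ||H||^2)."""
    W, H = W.copy(), H.copy()
    for _ in range(epochs):
        H = H + eta * (W.T @ (M * (Y - sigmoid(W @ H))) - nu * H)
        W = W + eta * ((M * (Y - sigmoid(W @ H))) @ H.T - nu * W)
    return W, H

rng = np.random.default_rng(0)
I, J = 5, 10
W_true = rng.standard_normal((I, 1))
H_true = rng.standard_normal((1, J))
M = (rng.random((I, J)) < 0.8).astype(float)
P_true = sigmoid(W_true @ H_true)
Y = (rng.random((I, J)) < P_true).astype(float) * M   # zero-filled where unobserved

W_fit, H_fit = fit(rng.standard_normal((I, 1)), rng.standard_normal((1, J)), Y, M)
P_fit = sigmoid(W_fit @ H_fit)

# Probabilities are invariant to the sign flip W -> -W, H -> -H
print("corr(P_true, P_fit):", np.corrcoef(P_true.ravel(), P_fit.ravel())[0, 1])
print("sign flip leaves P unchanged:", np.allclose(sigmoid((-W_fit) @ (-H_fit)), P_fit))
```

With only $I \times J = 50$ entries, about 80% of them observed, the fitted probabilities are noisy estimates. The correlation gives a rough check of the fit rather than a precise score.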