## Comparing EM with GD

#### First, we choose the number of clusters $m$ and the number of features $n$, and generate samples from a BMM with random parameters; the goal is to recover these parameters. The sample set $S$ is a dictionary keyed by the binary vectors $x$ (stored as integers). Duplicate samples are merged into a single key with a count to save space.

```python
import nbm
import numpy as np

m = 2             # number of clusters (mixture components)
n = 5             # number of features
samples = 100000  # sample size N

# ground-truth parameters of P_D, the distribution that generates S
[gtheta, gPhi] = nbm.randInt(m, n)

# draw the samples; S maps each distinct x (encoded as an integer) to its count
total = np.zeros(n)
S = dict()
for _ in range(samples):
    t = nbm.bitsToDec(nbm.sampling(n, gPhi, gtheta))
    x = nbm.decToBits(t, n)
    total += x
    S[t] = S.get(t, 0) + 1
avg = total / samples
print("average of x: ", avg)
```

```
average of x:  [0.705  0.947  0.488  0.1037 0.5583]
```
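For readers without the `nbm` module, here is a minimal self-contained sketch of the same sampling step. It assumes a BMM is a mixture of $m$ products of independent Bernoullis: pick component $c$ with probability $\theta_c$, then set each bit $x_j = 1$ with probability $\Phi_{c,j}$. The helper `sample_bmm`, the toy parameters, and the integer encoding (bit $j$ contributes $2^j$) are hypothetical; `nbm.bitsToDec` may order the bits differently.

```python
import numpy as np

def sample_bmm(theta, Phi, num_samples, rng):
    """Draw binary vectors from a Bernoulli mixture (hypothetical helper).

    Returns (S, avg): S maps the integer code of each distinct vector to its
    count, and avg is the empirical mean of all samples.
    """
    m, n = Phi.shape
    # choose a component for every sample, then draw each bit independently
    comps = rng.choice(m, size=num_samples, p=theta)
    X = (rng.random((num_samples, n)) < Phi[comps]).astype(np.int64)
    # encode each row as an integer; bit j contributes 2**j (assumed convention)
    codes = X @ (1 << np.arange(n))
    keys, counts = np.unique(codes, return_counts=True)
    S = {int(k): int(c) for k, c in zip(keys, counts)}
    return S, X.mean(axis=0)

rng = np.random.default_rng(0)
theta = np.array([0.4, 0.6])
Phi = np.array([[0.9, 0.1, 0.5],
                [0.2, 0.8, 0.5]])
S, avg = sample_bmm(theta, Phi, 100_000, rng)
print(len(S), avg)   # at most 2**3 = 8 distinct keys
print(theta @ Phi)   # true mean; avg should be close to it
```

Storing counts instead of raw samples makes every later pass over the data cost $O(\min(N, 2^n))$ rather than $O(N)$.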


#### The true mean of $x$ under the generating parameters

```python
# sum_c gtheta[c] * gPhi[c, :]; works for any number of clusters m
np.dot(gtheta, gPhi)
```

```
array([0.7029, 0.9459, 0.4863, 0.102 , 0.5571])
```
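The cell above evaluates the mixture mean. Assuming each component is a product of independent Bernoullis with mean vector $\Phi_{c,:}$, linearity of expectation gives

$$\mathbb{E}[x] = \sum_{c=1}^{m} \theta_c\, \mathbb{E}[x \mid c] = \sum_{c=1}^{m} \theta_c\, \Phi_{c,:} = \theta^\top \Phi .$$

The empirical average printed earlier differs from it only at the level of sampling noise: each entry is a Bernoulli sample mean, so its standard error is at most $0.5/\sqrt{N} \approx 0.0016$ for $N = 10^5$.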

#### Run EM from the unbalanced initialization $\theta = (0.01, 0.99)$. The mixing coefficient of the first component, $\theta_1$, grows from $0.01$ to about $0.53$.

```python
# initialize params: unbalanced mixing weights, random Phi
print("-------------------EM: ----------------------------")
theta = np.array([0.01, 0.99])
Phi = np.random.rand(m, n)
[theta1, Phi1] = nbm.em(n, m, samples, S, theta, Phi)
print("final theta: ", theta1)
print("final Phi: ", Phi1)
avg_model = np.dot(theta1, Phi1)  # model mean: sum_c theta1[c] * Phi1[c, :]
print("model mean: ", avg_model)
```

```
-------------------EM: ----------------------------
final theta:  [0.5293 0.4707]
final Phi:  [[0.4877 0.9034 0.2624 0.0063 0.5751]
 [0.9495 0.996  0.7418 0.2132 0.5394]]
model mean:  [0.705  0.947  0.488  0.1037 0.5583]
```
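The model mean matches the empirical average to every printed digit, and this is no accident. After any M-step, $\sum_c \theta_c \Phi_{c,:} = \frac{1}{N}\sum_x S(x)\, x$ exactly, because the responsibilities of each sample sum to one. The sketch below is one standard EM iteration for a Bernoulli mixture working directly on a count dictionary. It is an assumption about what `nbm.em` does (its code is not shown); `em_step`, `decode`, the toy counts, and the bit encoding are hypothetical.

```python
import numpy as np

def decode(keys, n):
    """Integer codes -> 0/1 rows, bit j <-> 2**j (assumed encoding)."""
    return ((np.asarray(keys)[:, None] >> np.arange(n)) & 1).astype(float)

def em_step(S, theta, Phi, eps=1e-12):
    """One EM iteration for a Bernoulli mixture on a count dictionary S
    (hypothetical helper, not nbm.em)."""
    n = Phi.shape[1]
    X = decode(list(S.keys()), n)                  # (K, n) distinct vectors
    w = np.array(list(S.values()), dtype=float)    # (K,) their counts
    P = np.clip(Phi, eps, 1 - eps)
    # E-step: log theta_c + log p(x | Phi_c) for every distinct x and component c
    log_joint = (X @ np.log(P).T + (1 - X) @ np.log(1 - P).T
                 + np.log(np.clip(theta, eps, None)))  # (K, m)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_joint)
    gamma /= gamma.sum(axis=1, keepdims=True)      # responsibilities; rows sum to 1
    # M-step: effective numbers N_c and effective means mu_c
    effnum = w @ gamma                             # (m,)
    new_theta = effnum / w.sum()
    new_Phi = (gamma * w[:, None]).T @ X / effnum[:, None]
    return new_theta, new_Phi

S = {0b011: 300, 0b101: 250, 0b110: 150, 0b000: 300}  # toy counts, n = 3
theta = np.array([0.01, 0.99])                        # unbalanced start
Phi = np.random.default_rng(1).random((2, 3))
for _ in range(200):
    theta, Phi = em_step(S, theta, Phi)

X = decode(list(S.keys()), 3)
w = np.array(list(S.values()), dtype=float)
print("theta:", theta)
print("model mean:", theta @ Phi, "empirical mean:", w @ X / w.sum())
```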


### Check the gradients at the point EM converges to 

```python
[gradth, gradPhi] = nbm.gradient(n, theta1, Phi1, samples, S)
print("gradth: ", gradth)
print("gradPhi: ", gradPhi)
```

```
gradth:  [-1. -1.]
gradPhi:  [[ 0.      0.      0.      0.0001 -0.    ]
 [ 0.      0.      0.      0.     -0.    ]]
```
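Why is `gradth` equal to $-1$ rather than $0$? Assume `nbm.gradient` differentiates the average negative log-likelihood $\ell(\theta,\Phi) = -\frac{1}{N}\sum_x S(x)\log\sum_c \theta_c\, p(x \mid \Phi_{c,:})$ (the GD runs below report this quantity as `ell`). Then $\partial\ell/\partial\theta_c = -N_c/(N\theta_c)$, where $N_c$ is the effective number of component $c$. At an EM fixed point $\theta_c = N_c/N$, so every entry is exactly $-1$. Since $\theta$ lives on the simplex, a gradient proportional to $(1,\dots,1)$ has no component along any feasible direction, so together with the near-zero `gradPhi` this is a stationary point. The hypothetical helper `grad_theta` below checks the identity $\sum_c \theta_c\, \partial\ell/\partial\theta_c = -1$, which holds at every point.

```python
import numpy as np

def grad_theta(X, w, theta, Phi, eps=1e-12):
    """Gradient of the average negative log-likelihood w.r.t. theta
    (hypothetical helper; X holds distinct 0/1 rows with counts w)."""
    P = np.clip(Phi, eps, 1 - eps)
    px_c = np.exp(X @ np.log(P).T + (1 - X) @ np.log(1 - P).T)  # p(x | Phi_c), (K, m)
    px = px_c @ theta                                            # p(x), (K,)
    # d ell / d theta_c = -(1/N) sum_x count(x) p(x | Phi_c) / p(x) = -N_c / (N theta_c)
    return -(w[:, None] * px_c / px[:, None]).sum(axis=0) / w.sum()

X = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0]], dtype=float)
w = np.array([300.0, 250.0, 150.0, 300.0])
Phi = np.array([[0.8, 0.7, 0.3], [0.2, 0.3, 0.6]])
theta = np.array([0.3, 0.7])
g = grad_theta(X, w, theta, Phi)
print(g, theta @ g)   # theta @ g equals -1 up to rounding, at any (theta, Phi)
```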


### Print the effective number and the effective mean

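```python
# effective number and effective mean of each component
# note: this uses theta, Phi (the variables passed to nbm.em as its initialization),
# not the EM output theta1, Phi1; pass theta1, Phi1 to evaluate at the EM result
effnum = np.zeros(m)
for c in range(m):
    effnum[c] = nbm.eff_num(n, theta, Phi, samples, S, c)
print("effnum: ", effnum / samples)
effmean = np.zeros((m, n))
for c in range(m):
    effmean[c, :] = nbm.eff_mean(n, theta, Phi, samples, S, c)
print("effmean: ", effmean)
```

```
effnum:  [0.564 0.436]
effmean:  [[0.6033 0.9153 0.4245 0.0001 0.6267]
 [0.8366 0.9879 0.5702 0.2377 0.4698]]
```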
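Assuming `nbm.eff_num` and `nbm.eff_mean` follow the standard EM definitions (their code is not shown), with $S(x)$ the count of $x$ in $S$ and $N$ = `samples`:

$$\gamma_c(x) = \frac{\theta_c\, p(x \mid \Phi_{c,:})}{\sum_{c'=1}^{m} \theta_{c'}\, p(x \mid \Phi_{c',:})}, \qquad p(x \mid \Phi_{c,:}) = \prod_{j=1}^{n} \Phi_{c,j}^{x_j}\,(1-\Phi_{c,j})^{1-x_j},$$

$$N_c = \sum_{x \in S} S(x)\,\gamma_c(x), \qquad \mu_c = \frac{1}{N_c} \sum_{x \in S} S(x)\,\gamma_c(x)\, x .$$

An EM iteration sets $\theta_c \leftarrow N_c / N$ and $\Phi_{c,:} \leftarrow \mu_c$. Consequently, at an EM fixed point `effnum / samples` reproduces $\theta$ and `effmean` reproduces $\Phi$, which gives a quick check of which parameters these quantities were evaluated at.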


### GD from a uniform $\theta$ and a random $\Phi$. GD can converge to $1$-cluster points, where one mixing coefficient is $0$ and the corresponding row of $\Phi$ has no effect on the likelihood.
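The claim can be checked directly. At a 1-cluster point such as $\theta = (1, 0)$, the mixture reduces to the single product of Bernoullis $p(x \mid \Phi_{1,:})$. The row $\Phi_{2,:}$ drops out of the likelihood, so its gradient is zero whatever its value, and the gradient in $\Phi_{1,:}$ vanishes when $\Phi_{1,:}$ equals the empirical mean. The average negative log-likelihood there is the sum of the Bernoulli entropies of the empirical means; for the means printed at the top of this section that sum is $\approx 2.526$ nats, which matches the `final ell` of the first two runs below. Whether GD can leave such a point through $\theta$ depends on how `nbm.gradientDescent` keeps $\theta$ on the simplex, which is not shown here. The helper `avg_nll_and_grad_phi` and the toy counts are hypothetical.

```python
import numpy as np

def avg_nll_and_grad_phi(X, w, theta, Phi, eps=1e-12):
    """Average negative log-likelihood of a Bernoulli mixture and its gradient
    w.r.t. Phi, on distinct 0/1 rows X with counts w (hypothetical helper)."""
    P = np.clip(Phi, eps, 1 - eps)
    px_c = np.exp(X @ np.log(P).T + (1 - X) @ np.log(1 - P).T)  # (K, m)
    px = px_c @ theta                                             # (K,)
    N = w.sum()
    nll = -(w @ np.log(px)) / N
    # d log p(x|c) / d Phi[c, j] = x_j / Phi[c, j] - (1 - x_j) / (1 - Phi[c, j])
    gamma = px_c * theta / px[:, None]                            # responsibilities
    score = X[:, None, :] / P - (1 - X[:, None, :]) / (1 - P)     # (K, m, n)
    grad = -(w[:, None, None] * gamma[:, :, None] * score).sum(axis=0) / N
    return nll, grad

X = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0]], dtype=float)
w = np.array([300.0, 250.0, 150.0, 300.0])
mean = w @ X / w.sum()

theta = np.array([1.0, 0.0])                 # 1-cluster point
Phi = np.vstack([mean, [0.3, 0.9, 0.6]])     # second row is arbitrary
nll, grad = avg_nll_and_grad_phi(X, w, theta, Phi)
entropy = -(mean * np.log(mean) + (1 - mean) * np.log(1 - mean)).sum()
print(nll, entropy)         # equal: the mixture collapses to one product of Bernoullis
print(np.abs(grad).max())   # ~0: no gradient signal on Phi
```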

```python
print("-------------------GD: ----------------------------")
for _ in range(10):
    # random Phi from nbm.randInt; theta is then reset to the uniform vector
    [theta, Phi] = nbm.randInt(m, n)
    theta = np.ones(m) / m
    [ell, theta, Phi, iterId] = nbm.gradientDescent(
        n, samples, S, theta, Phi,
        numIterations=20000, alpha=0.02, tolerance=1e-7)
    print("final ell: ", ell)
    print("final theta: ", theta)
    print("final Phi: ", Phi)
    # gradient at the GD end point (uncomment the prints to inspect it)
    [gradth, gradPhi] = nbm.gradient(n, theta, Phi, samples, S)
    # print("gradth: ", gradth)
    # print("gradPhi: ", gradPhi)
# model mean of the last GD run only
avg_model = np.dot(theta, Phi)
print("model mean: ", avg_model)
```

-------------------GD: ----------------------------
GD converges at  180 steps
final ell:  2.526094022996858
final theta:  [1. 0.]
final Phi:  [[0.705  0.947  0.488  0.1037 0.5583]
 [0.7926 0.4691 0.2759 0.6606 0.4687]]
GD converges at  183 steps
final ell:  2.52609402299685
final theta:  [0. 1.]
final Phi:  [[0.3923 0.4229 0.6484 0.5001 0.7605]
 [0.705  0.947  0.488  0.1037 0.5583]]
GD converges at  4079 steps
final ell:  2.512558428601922
final theta:  [0.9261 0.0739]
final Phi:  [[0.6815 0.9427 0.4472 0.084  0.523 ]
 [1.     1.     1.     0.3505 1.    ]]
GD converges at  5105 steps
final ell:  2.512558428602112
final theta:  [0.9261 0.0739]
final Phi:  [[0.6815 0.9427 0.4472 0.084  0.523 ]
 [1.     1.     1.     0.3505 1.    ]]
GD converges at  2272 steps
final ell:  2.465732418434597
final theta:  [0.562 0.438]
final Phi:  [[0.9134 0.9899 0.6924 0.1844 0.5089]
 [0.4376 0.8919 0.2257 0.     0.6842]]
GD converges at  1882 steps
final ell:  2.465732531463657
final theta:  [0.562 0.438