# Entanglement Feature Learning

## EFL

### Convention of Coupling Constants
The original EFL model (for the Ising case) is
$$E[\sigma]=-\sum_{\langle ij\rangle}J_{ij} \chi(\sigma_i^{-1}\sigma_j),$$
where $J_{ij}=\ln D_{ij}$, $\sigma_{i}$ takes values in the group $S_2$, and the cycle trace $\chi$ maps $()\to2$, $(1,2)\to1$. The energy difference between an aligned and an anti-aligned bond is one unit of $J_{ij}$. Note that each bond $\langle ij\rangle$ is counted only once in the summation. To simplify the numerics, we map the $S_2$ spin $\sigma_i$ to a $\mathbb{Z}_2$ spin $s_i$, so that (up to an additive constant) the energy model becomes
$$E[s]=-\frac{1}{2}\sum_{\langle ij\rangle}J_{ij}s_is_j=-\frac{1}{4}\sum_{i j}J_{ij}s_is_j.$$
We then define
$$K_{ij}=\frac{1}{2}J_{ij}=\frac{1}{2}\ln D_{ij},$$
and rewrite the energy model as
$$E[s]=-\frac{1}{2}\sum_{ij}K_{ij}s_is_j,$$
where the summation double counts a bond, and $s_{i}=\pm1$.
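
The equivalence of the two conventions can be checked numerically. The sketch below is only an illustration: the all-to-all graph, the random bond dimensions `D`, and the helper names `chi`, `energy_S2` and `energy_Z2` are assumptions made for the example, not part of `EFL.py`. It confirms that the $S_2$ and $\mathbb{Z}_2$ energies differ by a configuration-independent constant.

```python
import itertools
import numpy as np

def chi(sigma_i, sigma_j):
    # Cycle trace of sigma_i^{-1} sigma_j in S_2:
    # identity (equal elements) -> 2, transposition -> 1.
    return 2.0 if sigma_i == sigma_j else 1.0

def energy_S2(sigma, J):
    # Each bond <ij> is counted once (i < j).
    n = len(sigma)
    return -sum(J[i, j] * chi(sigma[i], sigma[j])
                for i in range(n) for j in range(i + 1, n))

def energy_Z2(s, K):
    # E[s] = -(1/2) sum_{ij} K_ij s_i s_j, with every bond double counted.
    s = np.asarray(s, dtype=float)
    return -0.5 * float(s @ K @ s)

rng = np.random.default_rng(0)
n = 4
D = rng.integers(2, 6, size=(n, n))   # illustrative bond dimensions
J = np.triu(np.log(D), 1)             # keep i < j only
J = J + J.T                           # symmetric couplings, zero diagonal
K = 0.5 * J                           # K_ij = (1/2) ln D_ij

offsets = []
for sigma in itertools.product([0, 1], repeat=n):
    s = [1 - 2 * x for x in sigma]    # S_2 element -> Z_2 spin
    offsets.append(energy_S2(sigma, J) - energy_Z2(s, K))
offsets = np.array(offsets)
print(offsets.min(), offsets.max())   # the two values coincide
```

The constant offset equals $-\frac{3}{2}\sum_{\langle ij\rangle}J_{ij}$, which follows from $\chi=(3+s_is_j)/2$ and drops out of the Boltzmann weights after normalization.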

### Deep Boltzmann Machine (Model)

Layers: $s^{0},s^{1},\cdots,s^{L}$, where $s^0$ is the visible layer and the rest are hidden layers. The energy model:
$$E[s]=-\sum_{l=1}^{L}\sum_{i j}s_i^{l-1}K_{ij}^l s_j^l.$$
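
As a minimal sketch of this energy function (the function name `dbm_energy` and the toy layer sizes are assumptions for illustration, not the `DBM` class in `EFL.py`), the sum over layers can be written directly with NumPy:

```python
import numpy as np

def dbm_energy(layers, weights):
    # E[s] = -sum_l (s^{l-1})^T K^l s^l, with layers = [s^0, ..., s^L]
    # and weights = [K^1, ..., K^L]; K^l has shape (len(s^{l-1}), len(s^l)).
    assert len(layers) == len(weights) + 1
    return -sum(float(lower @ K @ upper)
                for lower, K, upper in zip(layers[:-1], weights, layers[1:]))

# Toy 4-2-1 network with uniform couplings.
layers = [np.array([1., -1., 1., 1.]), np.array([1., 1.]), np.array([1.])]
weights = [np.full((4, 2), 0.5), np.full((2, 1), 0.25)]
energy = dbm_energy(layers, weights)
print(energy)
```

Here the first layer pair contributes $2$ and the second contributes $0.5$, so the total energy is $-2.5$.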

## Restricted Boltzmann Machine (RBM)

### RBM Kernel
Differences from the standard RBM:
- the binary units take values $\pm1$ instead of $0,1$.
- **unbiased**: the RTN of a pure state is not biased, so the only variational parameter is the weight matrix.
- elements of the weight matrices are **non-negative**, because they correspond to the logarithmic bond dimension.

Initialization of the weight matrix. It should:
- reflect locality.
- be ferromagnetic, tuned to criticality.
- include some randomness.

Gibbs sampling:
- propagating up:
$$h_j\leftarrow\tanh \left\{\begin{array}{cc}\sum_{i}W_{ij}v_i & \text{default, top},\\ 2\sum_{i}W_{ij}v_i & \text{bottom, intermediate},\end{array}\right.$$
- propagating down:
$$v_i\leftarrow\tanh \left\{\begin{array}{cc}\sum_{j}W_{ij}h_j & \text{default, bottom},\\ 2\sum_{j}W_{ij}h_j & \text{top, intermediate},\end{array}\right.$$
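
The two propagation rules can be sketched as follows. This is a standalone NumPy illustration, not the Theano implementation in `EFL.py`; the names `propagate_up`, `propagate_down`, `sample_spins` and `gibbs_step` are hypothetical, and the factor of 2 used for the bottom, top and intermediate cases is exposed through an assumed `scale` argument. For $\pm1$ units, the $\tanh$ gives the conditional mean, and a spin is drawn with $P(+1)=(1+\text{mean})/2$.

```python
import numpy as np

def propagate_up(v, W, scale=1.0):
    # Mean hidden activation: tanh(scale * sum_i W_ij v_i).
    return np.tanh(scale * (v @ W))

def propagate_down(h, W, scale=1.0):
    # Mean visible activation: tanh(scale * sum_j W_ij h_j).
    return np.tanh(scale * (h @ W.T))

def sample_spins(mean, rng):
    # Draw +/-1 spins whose expectation value equals `mean`.
    return np.where(rng.random(mean.shape) < 0.5 * (1.0 + mean), 1.0, -1.0)

def gibbs_step(v, W, rng):
    # One full sweep: sample h given v, then v given h.
    h = sample_spins(propagate_up(v, W), rng)
    v_new = sample_spins(propagate_down(h, W), rng)
    return v_new, h

rng = np.random.default_rng(0)
W = np.array([[0.2, 0.0], [0.7, 0.2], [0.1, 0.3], [0.1, 0.8]])
v = rng.choice([-1.0, 1.0], size=(5, 4))   # batch of 5 visible configurations
v1, h1 = gibbs_step(v, W, rng)
print(v1.shape, h1.shape)
```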

Update rule ($\lambda_\text{l}$ learning rate, $\lambda_\text{f}$ forgetting rate):
$$W\leftarrow \text{relu}[(1-\lambda_\text{f})W+\lambda_\text{l} \text{orth}(v^{\intercal}(0) h(0)-v^{\intercal}(\infty) h(\infty))]$$
- If any element in $W$ becomes negative, the element is set to zero.
- Forgetting rate can be used to control the bond dimension.
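
A minimal sketch of one update step is given below. Two points are assumptions rather than statements about `EFL.py`: the text does not define `orth`, so it is left as a pluggable function that defaults to the identity, and the correlation difference is averaged over the batch. The function name `update_weights` is likewise hypothetical.

```python
import numpy as np

def relu(x):
    # Clip negative entries to zero so that the weights stay non-negative.
    return np.maximum(x, 0.0)

def update_weights(W, v0, h0, v_inf, h_inf, lr, fr, orth=lambda g: g):
    # (v0, h0): data phase; (v_inf, h_inf): model phase, in practice
    # approximated by a finite number of Gibbs steps.
    # `orth` is a placeholder (identity by default, an assumption).
    grad = (v0.T @ h0 - v_inf.T @ h_inf) / v0.shape[0]
    return relu((1.0 - fr) * W + lr * orth(grad))

W = np.array([[0.5, 0.01], [0.2, 0.3]])
v0 = np.array([[1., 1.], [1., 1.]])
h0 = np.array([[1., -1.], [1., -1.]])
v_inf = np.array([[1., -1.], [-1., 1.]])
h_inf = np.array([[1., 1.], [1., 1.]])
W_new = update_weights(W, v0, h0, v_inf, h_inf, lr=0.1, fr=0.01)
print(W_new)
```

In this toy example the element `W[0, 1]` would become negative after the gradient step and is therefore set to zero, while the forgetting rate shrinks every surviving weight by a factor $1-\lambda_\text{f}$ before the gradient is added.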

In [288]:
%run 'EFL.py'
rbm = RBM(4,2)

In [287]:
print(rbm.learn(numpy.asarray(numpy.random.randint(0,2,(40,4))*2-1,dtype=float),0.1,0.01))
rbm.W.get_value()

0.7453134201113543


array([[ 0.165,  0.011],
       [ 0.708,  0.176],
       [ 0.120,  0.285],
       [ 0.071,  0.809]])

In [366]:
print(rbm.learn(numpy.array([[1,1,1,1],[-1,1,-1,1]]*10,dtype=float),0.1,0.01))
rbm.W.get_value()

1.081018265501512


array([[ 1.194,  0.031],
       [ 0.005,  0.940],
       [ 1.189,  0.003],
       [ 0.000,  0.946]])

#### Test by Ideal States
* trivial product state: deep PM, all Ising configurations have equal weight.
* random state (maximally thermalized): deep FM, all spins up or all spins down.

In [1]:
%run EFL.py
#train_set = numpy.array([[1,1,1,1],[-1,-1,-1,-1]]*200,dtype=float)
#train_set = numpy.array([[1,-1,1,-1],[-1,1,-1,1]]*200,dtype=float)
train_set = numpy.asarray(numpy.random.randint(0,2,(400,4))*2-1,dtype=float)
rbm = RBM(W='random',method='CD')
#lr_table = [0.2,0.3,0.25,0.15,0.1]
#fr_table = [0.05,0.01,0.,0.,0.]
lr_table = [0.5,0.3,0.2,0.15,0.1,0.1,0.1]
fr_table = [0.,0.,0.,0.,0.,0.,0.]

FM, maximally thermal.

In [3]:
for epoch in range(7):
    cost, xent = rbm.train(train_set,
                           learning_rate=lr_table[epoch],
                           forgetting_rate=fr_table[epoch])
    print('Epoch %d: '%epoch, 'cost = %f, xent = %f'%(cost, xent))
rbm.W.get_value()

Epoch 0:  cost = 0.003711, xent = 2.815545
Epoch 1:  cost = -0.000552, xent = 2.785481
Epoch 2:  cost = 0.001037, xent = 2.788026
Epoch 3:  cost = 0.000119, xent = 2.782246
Epoch 4:  cost = 0.000175, xent = 2.783272
Epoch 5:  cost = -0.000117, xent = 2.788262
Epoch 6:  cost = 0.000849, xent = 2.796292


array([[ 0.05757096,  0.08962437],
       [ 0.08546829,  0.10909487],
       [ 0.04621907,  0.02650601],
       [ 0.00845927,  0.01175318]])

PM, trivial product state.

In [6]:
for epoch in range(7):
    cost, xent = rbm.train(train_set,
                           learning_rate=lr_table[epoch],
                           forgetting_rate=fr_table[epoch])
    print('Epoch %d: '%epoch, 'cost = %f, xent = %f'%(cost, xent))
rbm.W.get_value()

Epoch 0:  cost = 0.112113, xent = 3.242719
Epoch 1:  cost = 0.022328, xent = 3.070230
Epoch 2:  cost = 0.007352, xent = 2.949821
Epoch 3:  cost = 0.013084, xent = 2.926602
Epoch 4:  cost = 0.001652, xent = 2.892917
Epoch 5:  cost = 0.005669, xent = 2.922602
Epoch 6:  cost = 0.004110, xent = 2.880515


array([[ 0.03113473,  0.06022195],
       [ 0.43808222,  0.00861   ],
       [ 0.01715053,  0.01461342],
       [ 0.03886801,  0.29126499]])

### Convolutional RBM
Assumption: translational symmetry (possibly only in the statistical sense, after disorder averaging). Weights are shared among kernels within each layer and each group, which makes the algorithm scalable. For a disordered system, this learns the disorder-averaged entanglement features.
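
Weight sharing can be illustrated with a small 1D sketch. Everything here is an assumption made for the example (a single group, periodic boundary conditions, and the names `conv_propagate_up`, `kernel` and `stride`); it is not the layout used in `EFL.py`. Every hidden unit applies the same kernel to its own window of visible spins, so the number of parameters does not grow with system size.

```python
import numpy as np

def conv_propagate_up(v, kernel, stride):
    # Hidden unit j sees visible sites stride*j, ..., stride*j + k - 1
    # (periodic), always weighted by the same shared kernel.
    n = v.shape[-1]
    k = kernel.shape[0]
    starts = np.arange(0, n, stride)
    idx = (starts[:, None] + np.arange(k)[None, :]) % n   # (n_hidden, k)
    return np.tanh(v[..., idx] @ kernel)

rng = np.random.default_rng(0)
v = rng.choice([-1.0, 1.0], size=(3, 8))    # batch of 3 chains of 8 spins
kernel = np.array([0.2, 0.5, 0.2])          # shared non-negative weights
h = conv_propagate_up(v, kernel, stride=2)

# Translating the input by one stride translates the output by one unit.
h_shifted = conv_propagate_up(np.roll(v, 2, axis=-1), kernel, stride=2)
print(h.shape, np.allclose(h_shifted, np.roll(h, 1, axis=-1)))
```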

Consider a 1D free fermion CFT. The Renyi entropy is given by (according to Calabrese and Cardy 2004, 2009)
$$S\propto\sum_{i,j}\ln|u_i-v_j|-\sum_{i<j}\ln|u_i-u_j|-\sum_{i<j}\ln|v_i-v_j|.$$
$u_i$ and $v_i$ are the positions of kinks and antikinks in the Ising configuration.
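
To make the sign structure explicit, the formula can be expanded for the two simplest cases. For a single region with one kink $u_1$ and one antikink $v_1$, the two subtracted sums are empty:
$$S\propto\ln|u_1-v_1|.$$
For two regions with kinks $u_1,u_2$ and antikinks $v_1,v_2$, all four kink-antikink pairs enter with a plus sign and the two like pairs enter with a minus sign:
$$S\propto\ln|u_1-v_1|+\ln|u_1-v_2|+\ln|u_2-v_1|+\ln|u_2-v_2|-\ln|u_1-u_2|-\ln|v_1-v_2|.$$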

In [26]:
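from itertools import combinations
import numpy

def entropy_CFT(kinks):
    # Positions alternate along the chain: even entries are kinks (u),
    # odd entries are antikinks (v).
    us = kinks[0::2]
    vs = kinks[1::2]
    # Kink-antikink pairs contribute with a plus sign.
    Suv = sum(numpy.log(abs(u-v)) for u in us for v in vs)
    # Like pairs (kink-kink, antikink-antikink) contribute with a minus sign.
    Suu = sum(numpy.log(abs(u1-u2)) for u1, u2 in combinations(us,2))
    Svv = sum(numpy.log(abs(v1-v2)) for v1, v2 in combinations(vs,2))
    S = Suv-Suu-Svv
    return S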
entropy_CFT([3,10])

1.9459101490553132

## Deep Boltzmann Machine

In [473]:
%run 'EFL.py'
dbm = DBM([8,4,2,1])
data = Server([[+1,+1,+1,+1,+1,+1,+1,+1],
               [+1,-1,-1,-1,+1,+1,+1,-1],
               [-1,-1,-1,-1,+1,+1,+1,+1],
               [+1,+1,+1,-1,-1,-1,+1,-1]]*100,15)
data = Server(numpy.random.randint(0,2,(200,8))*2-1,10)

In [474]:
dbm.pretrain(data,lrs=[0.2,0.3,0.2,0.15],frs=[0.05,0.01])

Pretraining layer 0:
    Epoch 0: cost = 1.242016
    Epoch 1: cost = 1.084836
    Epoch 2: cost = 1.050002
    Epoch 3: cost = 1.064238
    Epoch 4: cost = 1.049167
    Epoch 5: cost = 1.059592
    Epoch 6: cost = 1.018133
Pretraining layer 1:
    Epoch 0: cost = 1.118752
    Epoch 1: cost = 1.000285
    Epoch 2: cost = 1.000000
Pretraining layer 2:
    Epoch 0: cost = 1.204579
    Epoch 1: cost = 1.000001
    Epoch 2: cost = 1.000000


In [475]:
numpy.set_printoptions(formatter={'float': '{: 0.3f}'.format})
for rbm in dbm.rbms:
    print(rbm.W.get_value())

[[ 0.001  0.006  0.000  0.293]
 [ 0.001  0.024  0.071  0.356]
 [ 0.002  0.022  0.245  0.025]
 [ 0.003  0.008  0.040  0.037]
 [ 0.001  0.006  0.193  0.000]
 [ 0.000  0.007  0.040  0.028]
 [ 0.002  0.027  0.266  0.129]
 [ 0.002  0.000  0.023  0.030]]
[[ 0.000  0.000]
 [ 0.000  0.000]
 [ 0.000  0.000]
 [ 0.000  0.000]]
[[ 0.000]
 [ 0.000]]


In [472]:
f = theano.function([dbm.input],dbm.rbms[1].output)
f([[+1,+1,+1,+1,+1,+1,+1,+1],
   [+1,-1,-1,-1,+1,+1,+1,-1],
   [-1,-1,-1,-1,+1,+1,+1,+1],
   [+1,+1,+1,-1,-1,-1,+1,-1]])

array([[ 1.000,  0.939],
       [-1.000,  0.501],
       [-1.000,  0.914],
       [ 1.000, -0.927]])

In [694]:
cost = dbm.rbms[2].learn([[1,1,1,1,1,1,1,1],[1,-1,-1,-1,1,1,1,-1]],0.2,0.01).item()
print(cost)
for rbm in dbm.rbms:
    print(rbm.W.get_value())

0.35542559726526024
[[ 0.176  0.178  1.126  0.682]
 [ 1.313  0.650  0.184  0.184]
 [ 0.697  1.158  0.181  0.181]
 [ 0.673  1.184  0.181  0.181]
 [ 0.176  0.179  1.272  0.551]
 [ 0.176  0.179  1.278  0.552]
 [ 0.178  0.181  0.864  1.019]
 [ 0.926  0.901  0.179  0.179]]
[[ 0.938  0.000]
 [ 0.860  0.000]
 [ 0.000  1.226]
 [ 0.000  0.827]]
[[ 1.050]
 [ 0.000]]


In [437]:
numpy.set_printoptions(formatter={'float': '{: 0.3f}'.format})

In [548]:
numpy_rng = numpy.random.RandomState()

In [551]:
numpy_rng.uniform(low=0., high=1., size=(4, 3))

array([[ 0.589,  0.422,  0.692],
       [ 0.229,  0.998,  0.034],
       [ 0.262,  0.855,  0.039],
       [ 0.344,  0.751,  0.588]])

In [575]:
Nv, Nh =9,4
vx = numpy.arange(0.5/Nv,1.,1./Nv)
hx = numpy.arange(0.5/Nh,1.,1./Nh)
hxs, vxs = numpy.meshgrid(hx, vx)
W_raw = numpy.exp(-(hxs-vxs)**2*(5*Nh**2))
W_raw *= numpy_rng.uniform(low=0., high=2., size=(Nv, Nh))
W_raw

array([[ 0.257,  0.001,  0.000,  0.000],
       [ 0.607,  0.034,  0.000,  0.000],
       [ 0.271,  0.771,  0.000,  0.000],
       [ 0.006,  0.620,  0.014,  0.000],
       [ 0.000,  0.125,  0.141,  0.000],
       [ 0.000,  0.021,  0.609,  0.002],
       [ 0.000,  0.000,  0.616,  0.210],
       [ 0.000,  0.000,  0.018,  1.052],
       [ 0.000,  0.000,  0.000,  0.320]])

In [558]:
numpy.arange(0.5/Nv,1.,1./Nv)

array([ 0.062,  0.188,  0.312,  0.438,  0.562,  0.688,  0.812,  0.938])

In [748]:
%run 'EFL.py'
db = Server(numpy.array([[1],[2],[3],[4],[5],[6],[7]]), 3)

In [756]:
for b in db:
    print(b)

[[ 7.000]
 [ 1.000]
 [ 2.000]]
[[ 3.000]
 [ 4.000]
 [ 5.000]]


In [774]:
numpy.asarray([[1,2],[3,4]],dtype=float)

array([[ 1.000,  2.000],
       [ 3.000,  4.000]])

In [9]:
class A(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        assert self.x == self.y, 'no! %d neq %d'%(self.x, self.y)

In [11]:
A(1,1)

<__main__.A at 0x104232d30>