# Entanglement Feature Learning

## EFL

### Convention of Coupling Constants
The original EFL model (for the Ising case) is
$$E[\sigma]=-\sum_{\langle ij\rangle}J_{ij} \chi(\sigma_i^{-1}\sigma_j),$$
where $J_{ij}=\ln D_{ij}$, $\sigma_{i}$ takes values in the group $S_2$, and the cycle trace $\chi$ maps $()\to2$ and $(1,2)\to1$. The energy difference between the two relative orientations on a bond is therefore one unit of $J_{ij}$. Note that each bond $\langle ij\rangle$ is counted only once in the summation. For easier numerical treatment, we map the $S_2$ spin $\sigma_i$ to a $\mathbb{Z}_2$ spin $s_i$, such that (up to a configuration-independent constant) the energy model becomes
$$E[s]=-\frac{1}{2}\sum_{\langle ij\rangle}J_{ij}s_is_j=-\frac{1}{4}\sum_{i j}J_{ij}s_is_j.$$
We then define
$$K_{ij}=\frac{1}{2}J_{ij}=\frac{1}{2}\ln D_{ij},$$
and rewrite the energy model as
$$E[s]=-\frac{1}{2}\sum_{ij}K_{ij}s_is_j,$$
where the summation counts each bond twice, and $s_{i}=\pm1$.
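
The bookkeeping above can be checked numerically. The following is a minimal sketch, assuming random symmetric couplings on a fully connected graph of six sites (an illustrative choice, not taken from `EFL.py`). It verifies that the bond-counted form with $J_{ij}$ equals the double-counted form with $K_{ij}$, and that the $S_2$ energy differs from both only by the constant $-\frac{3}{2}\sum_{\langle ij\rangle}J_{ij}$, which follows from $\chi=(3+s_is_j)/2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # number of sites (illustrative)

# Symmetric couplings J_ij = ln D_ij >= 0 with zero diagonal (random, for illustration only).
upper = np.triu(rng.uniform(0.0, 2.0, size=(n, n)), k=1)
J = upper + upper.T
K = 0.5 * J

s = rng.choice([-1.0, 1.0], size=n)   # Z_2 spins
ss = np.outer(s, s)
bonds = np.triu_indices(n, k=1)       # each bond <ij> listed once

# Cycle trace of S_2: 2 for aligned spins (identity), 1 for anti-aligned (transposition).
chi = (3.0 + ss) / 2.0

E_S2 = -np.sum(J[bonds] * chi[bonds])          # original S_2 model
E_bond = -0.5 * np.sum(J[bonds] * ss[bonds])   # Z_2 model, each bond once
E_K = -0.5 * s @ K @ s                         # Z_2 model, double-counted sum with K
offset = -1.5 * np.sum(J[bonds])               # configuration-independent constant

print(np.isclose(E_bond, E_K), np.isclose(E_S2, E_bond + offset))
```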

### Deep Boltzmann Machine (Model)

Layers: $s^{0},s^{1},\cdots,s^{L}$, where $s^0$ is the visible layer and the rest are hidden layers. The energy model:
$$E[s]=-\sum_{l=1}^{L}\sum_{i j}s_i^{l-1}K_{ij}^l s_j^l.$$
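
As a minimal sketch of this layered energy, assuming each $K^l$ is stored as a NumPy array of shape (size of layer $l-1$, size of layer $l$): the function name `dbm_energy` and the layer sizes below are illustrative and are not claimed to match `EFL.py`.

```python
import numpy as np

def dbm_energy(layers, weights):
    """E[s] = -sum_l (s^{l-1})^T K^l s^l for a list of layer configurations."""
    return -sum(float(lower @ K @ upper)
                for lower, K, upper in zip(layers[:-1], weights, layers[1:]))

rng = np.random.default_rng(1)
sizes = [8, 4, 2, 1]  # visible layer first, then hidden layers
weights = [rng.uniform(0.0, 1.0, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
layers = [rng.choice([-1.0, 1.0], size=n) for n in sizes]

E = dbm_energy(layers, weights)
# Without bias terms the energy is invariant under a global spin flip.
E_flipped = dbm_energy([-s for s in layers], weights)
# With non-negative weights the fully polarized state has the lowest possible energy.
E_all_up = dbm_energy([np.ones(n) for n in sizes], weights)
print(E, E_flipped, E_all_up)
```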

## Restricted Boltzmann Machine (RBM)

### RBM Kernel
Differences from the standard RBM:
- the binary units take values $\pm1$ instead of $0,1$.
- **unbiased**: the RTN of a pure state is not biased, so the only variational parameter is the weight matrix.
- elements of the weight matrices are **non-negative**, because they correspond to logarithmic bond dimensions.

Initialization of the weight matrix. We wish it to:
- reflect locality.
- be ferromagnetic, tuned to criticality.
- include some randomness.
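
One way such an initialization could look is sketched below. Everything here is an assumption for illustration: the function `init_weights`, the Gaussian locality profile, and the parameters `amplitude`, `width`, and `noise` are hypothetical and not taken from `EFL.py`. In particular, tuning to criticality is not implemented; `amplitude` is simply left as the knob one would adjust.

```python
import numpy as np

def init_weights(n_visible, n_hidden, amplitude=0.5, width=1.0, noise=0.1, seed=None):
    """Non-negative, local, slightly randomized weights on a 1D chain (illustrative)."""
    rng = np.random.default_rng(seed)
    stride = n_visible / n_hidden
    # Place each hidden unit above the center of its block of visible sites.
    centers = (np.arange(n_hidden) + 0.5) * stride - 0.5
    sites = np.arange(n_visible)
    dist = np.abs(sites[:, None] - centers[None, :])
    W = amplitude * np.exp(-(dist / width) ** 2)              # locality
    W *= 1.0 + noise * rng.uniform(-1.0, 1.0, size=W.shape)   # randomness
    return np.maximum(W, 0.0)                                 # ferromagnetic (W >= 0)

W_init = init_weights(8, 4, seed=0)
print(W_init.round(3))
```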

Gibbs sampling:
- propagating up:
$$h_j\leftarrow\tanh \left\{\begin{array}{cc}\sum_{i}W_{ij}v_i & \text{default, top},\\ 2\sum_{i}W_{ij}v_i & \text{bottom, intermediate},\end{array}\right.$$
- propagating down:
$$v_i\leftarrow\tanh \left\{\begin{array}{cc}\sum_{j}W_{ij}h_j & \text{default, bottom},\\ 2\sum_{j}W_{ij}h_j & \text{top, intermediate},\end{array}\right.$$
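
The two propagation rules can be sketched in plain NumPy as follows. This is an illustration and not the Theano code in `EFL.py`: the function names and the `scale` argument (1 for the default case, 2 for the doubled case) are assumptions. The sketch also assumes that a $\pm1$ unit with mean $m$ is sampled as $+1$ with probability $(1+m)/2$, which is the conditional distribution for $\pm1$ units without bias.

```python
import numpy as np

def propagate_up(v, W, scale=1.0):
    """Mean of the hidden units given visible units v (rows are samples)."""
    return np.tanh(scale * v @ W)

def propagate_down(h, W, scale=1.0):
    """Mean of the visible units given hidden units h (rows are samples)."""
    return np.tanh(scale * h @ W.T)

def sample_spins(mean, rng):
    """Draw +/-1 spins whose expectation value equals `mean`."""
    return np.where(rng.random(mean.shape) < 0.5 * (1.0 + mean), 1.0, -1.0)

def gibbs_step(v, W, rng, up_scale=1.0, down_scale=1.0):
    """One v -> h -> v sweep of block Gibbs sampling."""
    h = sample_spins(propagate_up(v, W, up_scale), rng)
    v_new = sample_spins(propagate_down(h, W, down_scale), rng)
    return v_new, h

rng = np.random.default_rng(0)
W_strong = 10.0 * np.ones((4, 2))   # strongly ferromagnetic weights
v_start = np.ones((3, 4))           # three all-up samples
v_next, h_next = gibbs_step(v_start, W_strong, rng)
print(v_next, h_next)
```

With strongly ferromagnetic weights the chain stays in the all-up state, while for `W = 0` both means vanish and the spins are sampled uniformly.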

Update rule ($\lambda_\text{l}$ learning rate, $\lambda_\text{f}$ forgetting rate):
$$W\leftarrow \text{relu}[(1-\lambda_\text{f})W+\lambda_\text{l} (v^{\intercal}(0) h(0)-v^{\intercal}(\infty) h(\infty))]$$
- If any element in $W$ becomes negative, the element is set to zero.
- Forgetting rate can be used to control the bond dimension.
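
A minimal NumPy sketch of this update is given below. It makes two assumptions that the formula above does not state: the correlations are averaged over the batch, and the $\infty$ statistics are replaced by statistics after a finite number of Gibbs steps (as in contrastive divergence). The function name `cd_update` and the toy arrays are illustrative.

```python
import numpy as np

def cd_update(W, v0, h0, vk, hk, learning_rate, forgetting_rate):
    """W <- relu[(1 - fr) W + lr (<v h>_data - <v h>_model)], batch-averaged."""
    batch = v0.shape[0]
    positive = v0.T @ h0 / batch   # data-driven correlations
    negative = vk.T @ hk / batch   # model-driven correlations (after k Gibbs steps)
    W_new = (1.0 - forgetting_rate) * W + learning_rate * (positive - negative)
    return np.maximum(W_new, 0.0)  # negative elements are set to zero

W_old = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
v0 = np.array([[1.0, -1.0], [-1.0, 1.0]])
h0 = np.array([[1.0, -1.0], [-1.0, 1.0]])
vk = np.array([[1.0, 1.0], [-1.0, -1.0]])
hk = np.array([[1.0, -1.0], [-1.0, 1.0]])

W_new = cd_update(W_old, v0, h0, vk, hk, learning_rate=0.1, forgetting_rate=0.01)
print(W_new)
```

In this toy case the element `W[1, 0]` would become $-0.2$ before the rectification and is clipped to zero, while every element also decays by the factor $1-\lambda_\text{f}$.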

In [288]:
%run 'EFL.py'
rbm = RBM(4,2)

In [287]:
print(rbm.learn(numpy.asarray(numpy.random.randint(0,2,(40,4))*2-1,dtype=float),0.1,0.01))
rbm.W.get_value()

0.7453134201113543


array([[ 0.165,  0.011],
       [ 0.708,  0.176],
       [ 0.120,  0.285],
       [ 0.071,  0.809]])

In [366]:
print(rbm.learn(numpy.array([[1,1,1,1],[-1,1,-1,1]]*10,dtype=float),0.1,0.01))
rbm.W.get_value()

1.081018265501512


array([[ 1.194,  0.031],
       [ 0.005,  0.940],
       [ 1.189,  0.003],
       [ 0.000,  0.946]])

#### Test by Ideal States
* trivial product state: deep PM, all Ising configurations have equal weight.
* random state (maximally thermalized): deep FM, all spins up or all spins down.

In [1]:
%run EFL.py
#train_set = numpy.array([[1,1,1,1],[-1,-1,-1,-1]]*200,dtype=float)
#train_set = numpy.array([[1,-1,1,-1],[-1,1,-1,1]]*200,dtype=float)
train_set = numpy.asarray(numpy.random.randint(0,2,(400,4))*2-1,dtype=float)
rbm = RBM(W='random',method='CD')
#lr_table = [0.2,0.3,0.25,0.15,0.1]
#fr_table = [0.05,0.01,0.,0.,0.]
lr_table = [0.5,0.3,0.2,0.15,0.1,0.1,0.1]
fr_table = [0.,0.,0.,0.,0.,0.,0.]

FM, maximally thermal.

In [3]:
for epoch in range(7):
    cost, xent = rbm.train(train_set,
                           learning_rate=lr_table[epoch],
                           forgetting_rate=fr_table[epoch])
    print('Epoch %d: '%epoch, 'cost = %f, xent = %f'%(cost, xent))
rbm.W.get_value()

Epoch 0:  cost = 0.003711, xent = 2.815545
Epoch 1:  cost = -0.000552, xent = 2.785481
Epoch 2:  cost = 0.001037, xent = 2.788026
Epoch 3:  cost = 0.000119, xent = 2.782246
Epoch 4:  cost = 0.000175, xent = 2.783272
Epoch 5:  cost = -0.000117, xent = 2.788262
Epoch 6:  cost = 0.000849, xent = 2.796292


array([[ 0.05757096,  0.08962437],
       [ 0.08546829,  0.10909487],
       [ 0.04621907,  0.02650601],
       [ 0.00845927,  0.01175318]])

PM, trivial product state.

In [6]:
for epoch in range(7):
    cost, xent = rbm.train(train_set,
                           learning_rate=lr_table[epoch],
                           forgetting_rate=fr_table[epoch])
    print('Epoch %d: '%epoch, 'cost = %f, xent = %f'%(cost, xent))
rbm.W.get_value()

Epoch 0:  cost = 0.112113, xent = 3.242719
Epoch 1:  cost = 0.022328, xent = 3.070230
Epoch 2:  cost = 0.007352, xent = 2.949821
Epoch 3:  cost = 0.013084, xent = 2.926602
Epoch 4:  cost = 0.001652, xent = 2.892917
Epoch 5:  cost = 0.005669, xent = 2.922602
Epoch 6:  cost = 0.004110, xent = 2.880515


array([[ 0.03113473,  0.06022195],
       [ 0.43808222,  0.00861   ],
       [ 0.01715053,  0.01461342],
       [ 0.03886801,  0.29126499]])

### Convolutional RBM
Assumption: translational symmetry (possibly only in the statistical sense, after disorder averaging). Weights are shared among kernels within each layer and each group, which makes the algorithm scalable. For a disordered system, this learns the disorder-averaged entanglement features.
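
To illustrate what weight sharing means here, the sketch below expands a single shared kernel into the dense weight matrix of a 1D layer. The function `shared_weight_matrix`, the periodic boundary condition, and the single-group setup are assumptions made for illustration and are not taken from `EFL.py`.

```python
import numpy as np

def shared_weight_matrix(kernel, n_hidden, stride):
    """Dense (n_visible, n_hidden) matrix built from one shared kernel.

    Hidden unit j couples to visible sites j*stride, j*stride+1, ... with the
    same kernel weights, using periodic boundary conditions.
    """
    n_visible = n_hidden * stride
    W = np.zeros((n_visible, n_hidden))
    for j in range(n_hidden):
        for a, w in enumerate(kernel):
            W[(j * stride + a) % n_visible, j] += w
    return W

kernel = np.array([0.2, 1.0, 1.0, 0.2])   # only len(kernel) free parameters
W_conv = shared_weight_matrix(kernel, n_hidden=4, stride=2)
print(W_conv)
```

The number of variational parameters is set by the kernel length and does not grow with system size. Translating the visible layer by one stride and the hidden layer by one site leaves `W_conv` unchanged.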

Consider a 1D free-fermion CFT. The Rényi entropy is given by (according to Calabrese and Cardy 2004, 2009)
$$S\propto\sum_{i,j}\ln|u_i-v_j|-\sum_{i<j}\ln|u_i-u_j|-\sum_{i<j}\ln|v_i-v_j|.$$
Here $u_i$ and $v_i$ are the positions of kinks and antikinks in the Ising configuration.

In [26]:
from itertools import combinations
def entropy_CFT(kinks):
    us = kinks[0::2]
    vs = kinks[1::2]
    Suv = sum(numpy.log(abs(u-v)) for u in us for v in vs)
    Suu = sum(numpy.log(abs(u1-u2)) for u1, u2 in combinations(us,2))
    Svv = sum(numpy.log(abs(v1-v2)) for v1, v2 in combinations(vs,2))
    S = Suv-Suu-Svv
    return S
entropy_CFT([3,10])

1.9459101490553132
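
To apply the formula to an Ising configuration, the kink positions have to be extracted first. The sketch below restates `entropy_CFT` from the cell above so that it runs on its own, and adds a hypothetical helper `kink_positions`. The convention that a domain wall between sites $i-1$ and $i$ sits at position $i$, and the use of an open chain, are assumptions.

```python
import numpy as np
from itertools import combinations

def kink_positions(config):
    """Positions i where the spin changes between sites i-1 and i (open chain)."""
    s = np.asarray(config)
    return [i for i in range(1, len(s)) if s[i] != s[i - 1]]

def entropy_CFT(kinks):
    """Entropy (up to a prefactor) from alternating kink/antikink positions."""
    us = kinks[0::2]   # kinks
    vs = kinks[1::2]   # antikinks
    Suv = sum(np.log(abs(u - v)) for u in us for v in vs)
    Suu = sum(np.log(abs(u1 - u2)) for u1, u2 in combinations(us, 2))
    Svv = sum(np.log(abs(v1 - v2)) for v1, v2 in combinations(vs, 2))
    return Suv - Suu - Svv

# A single flipped domain covering sites 3..9 on a 12-site chain.
config = [+1] * 3 + [-1] * 7 + [+1] * 2
kinks = kink_positions(config)
S_single = entropy_CFT(kinks)
print(kinks, S_single)
```

For this configuration the kinks are at positions 3 and 10, so the result is $\ln 7$, in agreement with the value printed for `entropy_CFT([3,10])`.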

## Deep Boltzmann Machine

In [774]:
%run 'EFL.py'
dbm = DBM([8,4,2,1])

In [752]:
test_samples = [
    [+1,+1,+1,+1,+1,+1,+1,+1],
    [-1,-1,-1,-1,-1,-1,-1,-1],
    [+1,+1,+1,+1,-1,-1,-1,-1],
    [-1,-1,-1,-1,+1,+1,+1,+1],
    #[+1,+1,-1,-1,-1,-1,+1,+1],
    #[-1,-1,+1,+1,+1,+1,-1,-1]
]
data = Server(test_samples*100,15)
#data = Server(numpy.random.randint(0,2,(200,8))*2-1,10)

In [753]:
dbm.pretrain(data,lrs=[0.5,0.3,0.2,0.15],frs=[0.05,0.01])

Pretraining layer 0:
    Epoch 0: cost = 0.523708
    Epoch 1: cost = 0.185438
    Epoch 2: cost = 0.020546
    Epoch 3: cost = 0.021498
Pretraining layer 1:
    Epoch 0: cost = 1.672782
    Epoch 1: cost = 1.453887
    Epoch 2: cost = 1.198430
    Epoch 3: cost = 0.832181
    Epoch 4: cost = 0.715139
    Epoch 5: cost = 0.781272
    Epoch 6: cost = 0.513500
Pretraining layer 2:
    Epoch 0: cost = 1.245307
    Epoch 1: cost = 1.024171
    Epoch 2: cost = 1.013465
    Epoch 3: cost = 1.014886


In [754]:
numpy.set_printoptions(formatter={'float': '{: 0.3f}'.format})
for rbm in dbm.rbms:
    print(rbm.W.get_value())

[[ 1.177  1.225  0.077  0.078]
 [ 1.278  1.290  0.064  0.065]
 [ 1.240  1.355  0.037  0.038]
 [ 1.247  1.416  0.037  0.038]
 [ 0.080  0.080  1.301  1.283]
 [ 0.020  0.020  1.299  1.220]
 [ 0.020  0.021  1.126  1.143]
 [ 0.087  0.088  1.265  1.270]]
[[ 1.342  0.000]
 [ 1.339  0.000]
 [ 0.000  1.274]
 [ 0.000  1.276]]
[[ 0.105]
 [ 0.000]]


In [686]:
f = theano.function([dbm.input],dbm.rbms[1].output)
f([[+1,+1,+1,+1,+1,+1,+1,+1],
   [-1,-1,-1,-1,+1,+1,+1,+1],
   [+1,+1,-1,-1,-1,-1,+1,+1],
   [+1,+1,+1,+1,-1,-1,-1,-1]])

array([[ 0.981,  0.985],
       [-0.955,  0.944],
       [-0.967,  0.973],
       [ 0.955, -0.944]])

In [755]:
dbm.finetune(data,10,lrs=[0.5,0.5,0.4,0.3,0.2,0.1])

    Epoch 0: cost = 0.007569
    Epoch 1: cost = 0.005526
    Epoch 2: cost = 0.004137
    Epoch 3: cost = 0.003619
    Epoch 4: cost = 0.003370
    Epoch 5: cost = 0.003178
    Epoch 6: cost = 0.003043
    Epoch 7: cost = 0.002978
    Epoch 8: cost = 0.002947
    Epoch 9: cost = 0.002932


In [756]:
for rbm in dbm.rbms:
    print(rbm.W.get_value())

[[ 1.484  1.533  0.000  0.000]
 [ 1.568  1.580  0.000  0.000]
 [ 1.552  1.667  0.000  0.000]
 [ 1.467  1.636  0.000  0.000]
 [ 0.000  0.000  1.620  1.601]
 [ 0.000  0.000  1.548  1.469]
 [ 0.000  0.000  1.457  1.474]
 [ 0.000  0.000  1.613  1.617]]
[[ 0.987  0.000]
 [ 0.983  0.000]
 [ 0.000  0.402]
 [ 0.000  0.404]]
[[ 0.000]
 [ 1.371]]


In [707]:
f = theano.function([dbm.input],dbm.rbms[1].output)
f(test_samples)

array([[ 1.000,  1.000],
       [-1.000, -1.000],
       [ 1.000, -1.000],
       [-1.000,  1.000]])

In [757]:
dbm.MC_configs[0].get_value()

array([[ 1.000,  1.000,  1.000,  1.000, -1.000, -1.000, -1.000, -1.000],
       [-1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000],
       [-1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000],
       [-1.000, -1.000, -1.000, -1.000,  1.000,  1.000,  1.000,  1.000],
       [-1.000, -1.000, -1.000, -1.000,  1.000,  1.000,  1.000,  1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [-1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [-1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [-1.000, -1.000, -1.000, -1.000, -1.000, -1.

In [758]:
x=T.matrix()
y=T.vector()
f=theano.function([x,y],x*y)

In [759]:
f([[1,2],[3,4]],[1,0])

array([[ 1.000,  0.000],
       [ 3.000,  0.000]])

In [771]:
x=dbm.theano_rng.binomial(size=[3],n=1,p=1)
f=theano.function([],[x,2*x])
f()

[array([1, 1, 1]), array([2, 2, 2])]

In [766]:
dbm.theano_rng.binomial(size=[3],n=1,p=0.5).shape

Shape.0

In [782]:
%run 'EFL.py'
dbm = DBM([8,4,2,1])
dbm.get_cost_updates()

(OrderedUpdates([(<RandomStateType>, for{cpu,scan_fn}.4),
                 (<RandomStateType>, for{cpu,scan_fn}.5),
                 (<RandomStateType>, for{cpu,scan_fn}.6),
                 (<RandomStateType>, for{cpu,scan_fn}.7),
                 (<RandomStateType>, for{cpu,scan_fn}.8),
                 (<RandomStateType>, for{cpu,scan_fn}.9)]),
 OrderedUpdates([(<RandomStateType>, for{cpu,scan_fn}.4),
                 (<RandomStateType>, for{cpu,scan_fn}.5),
                 (<RandomStateType>, for{cpu,scan_fn}.6),
                 (<RandomStateType>, for{cpu,scan_fn}.7),
                 (<RandomStateType>, for{cpu,scan_fn}.8),
                 (<RandomStateType>, for{cpu,scan_fn}.9),
                 (<RandomStateType>, for{cpu,scan_fn}.10),
                 (<RandomStateType>, for{cpu,scan_fn}.11)]))