# Entanglement Feature Learning

## EFL

### Convention of Coupling Constants
The original EFL model (for the Ising case) is
$$E[\sigma]=-\sum_{\langle ij\rangle}J_{ij} \chi(\sigma_i^{-1}\sigma_j),$$
where $J_{ij}=\ln D_{ij}$, $\sigma_{i}$ takes values in the group $S_2$, and the cycle trace $\chi$ maps $()\to2$, $(1,2)\to1$. The energy difference between an aligned and a misaligned bond is therefore one unit of $J_{ij}$. Note that each bond $\langle ij\rangle$ is counted only once in the summation. To simplify the numerics, we map the $S_2$ spin $\sigma_i$ to a $\mathbb{Z}_2$ spin $s_i$, such that, up to an additive constant, the energy model becomes
$$E[s]=-\frac{1}{2}\sum_{\langle ij\rangle}J_{ij}s_is_j=-\frac{1}{4}\sum_{i j}J_{ij}s_is_j.$$
We then define
$$K_{ij}=\frac{1}{2}J_{ij}=\frac{1}{2}\ln D_{ij},$$
and rewrite the energy model as
$$E[s]=-\frac{1}{2}\sum_{ij}K_{ij}s_is_j,$$
where the summation double counts a bond, and $s_{i}=\pm1$.
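
As a quick sanity check of this convention, the sketch below enumerates all configurations of a small chain and confirms that the $S_2$ energy and the $\mathbb{Z}_2$ energy differ only by the configuration-independent constant $-\frac{3}{2}\sum_{\langle ij\rangle}J_{ij}$. The three-site chain and the bond dimensions in `bond_dims` are illustrative assumptions, not values from the model; $s=+1$ stands for the identity $()$ and $s=-1$ for the swap $(1,2)$.

```python
import itertools
import math

# Hypothetical bond dimensions D_ij on a 3-site chain (each bond listed once).
bond_dims = {(0, 1): 2.0, (1, 2): 3.0}
n_sites = 3

def energy_s2(spins):
    """E = -sum_<ij> J_ij * chi, with chi = 2 for equal spins and 1 otherwise."""
    return -sum(
        math.log(D) * (2 if spins[i] == spins[j] else 1)
        for (i, j), D in bond_dims.items()
    )

def energy_z2(spins):
    """E = -(1/2) sum_ij K_ij s_i s_j, where the sum double counts each bond."""
    total = 0.0
    for (i, j), D in bond_dims.items():
        K = 0.5 * math.log(D)
        total += K * spins[i] * spins[j]  # ordered pair (i, j)
        total += K * spins[j] * spins[i]  # ordered pair (j, i)
    return -0.5 * total

# The difference should be the same for every configuration.
offsets = [
    energy_s2(s) - energy_z2(s)
    for s in itertools.product([+1, -1], repeat=n_sites)
]
expected_offset = -1.5 * sum(math.log(D) for D in bond_dims.values())
print(offsets[0], expected_offset)
```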

### Deep Boltzmann Machine (Model)

Layers: $s^{0},s^{1},\cdots,s^{L}$, where $s^0$ is the visible layer and the rest are hidden layers. The energy model:
$$E[s]=-\sum_{l=1}^{L}\sum_{i j}s_i^{l-1}K_{ij}^l s_j^l.$$
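
A minimal sketch of this energy function, assuming each layer is stored as a NumPy vector of $\pm1$ spins and each coupling $K^l$ as a matrix of shape (size of layer $l-1$, size of layer $l$). The function name `dbm_energy` and the 4-2-1 example network are hypothetical and are not taken from `EFL.py`.

```python
import numpy as np

def dbm_energy(layers, couplings):
    """E[s] = -sum_l (s^{l-1})^T K^l s^l over adjacent layer pairs."""
    return -sum(
        float(lower @ K @ upper)
        for lower, K, upper in zip(layers[:-1], couplings, layers[1:])
    )

# Hypothetical 4-2-1 network with uniform couplings K = 0.5.
layers = [np.ones(4), np.ones(2), np.ones(1)]
couplings = [np.full((4, 2), 0.5), np.full((2, 1), 0.5)]

energy_all_up = dbm_energy(layers, couplings)
# Flipping every spin leaves the energy unchanged (global Z2 symmetry).
energy_all_down = dbm_energy([-s for s in layers], couplings)
print(energy_all_up, energy_all_down)
```

Because the model has no bias terms, the energy is invariant under a global spin flip, which the two evaluations above illustrate.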

## Restricted Boltzmann Machine (RBM)

### RBM Kernel
Differences from the standard RBM:
- the binary units take values $\pm1$ instead of $0,1$.
- **unbiased**: the random tensor network (RTN) of a pure state is not biased, so the only variational parameter is the weight matrix.
- elements of the weight matrices are **non-negative**, because they correspond to the logarithm of the bond dimension ($D_{ij}\geq1$).

Initialization of the weight matrix. The initial weights should:
- reflect the locality.
- be ferromagnetic and tuned to criticality.
- include some randomness.
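
One way these three wishes could be combined is sketched below. It is an assumption, not the initialization used in `EFL.py`: each hidden unit couples to its own contiguous block of visible units (locality), all couplings are non-negative (ferromagnetic), and a small uniform noise is added. The function name `init_local_weights` and the parameters `scale`, `noise`, and `seed` are hypothetical; `scale` is the knob one would tune toward criticality, and no critical value is claimed here.

```python
import numpy as np

def init_local_weights(n_visible, n_hidden, scale=1.0, noise=0.1, seed=0):
    """Block-local, non-negative weights with uniform noise.

    Assumes n_visible is a multiple of n_hidden.
    """
    rng = np.random.default_rng(seed)
    block = n_visible // n_hidden  # visible units attached to each hidden unit
    W = np.zeros((n_visible, n_hidden))
    for j in range(n_hidden):
        W[j * block:(j + 1) * block, j] = scale  # local ferromagnetic coupling
    W += noise * rng.random(W.shape)  # uniform noise in [0, noise)
    return W

W0 = init_local_weights(8, 4, scale=1.0, noise=0.1)
print(np.round(W0, 3))
```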

Gibbs sampling:
- propagating up:
$$h_j\leftarrow\tanh \left\{\begin{array}{cc}\sum_{i}W_{ij}v_i & \text{default, top},\\ 2\sum_{i}W_{ij}v_i & \text{bottom, intermediate},\end{array}\right.$$
- propagating down:
$$v_i\leftarrow\tanh \left\{\begin{array}{cc}\sum_{j}W_{ij}h_j & \text{default, bottom},\\ 2\sum_{j}W_{ij}h_j & \text{top, intermediate},\end{array}\right.$$
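
The two propagation rules can be sketched as follows. The `doubled` flag selects the factor of 2 used for the layers marked above. The stochastic step `sample_pm1`, which draws $\pm1$ spins with $P(+1)=(1+\tanh x)/2$, is the standard Gibbs step for $\pm1$ units; whether `EFL.py` samples or propagates the mean is not shown in these notes, so treat it as an assumption. All function names and the weight values are hypothetical.

```python
import numpy as np

def propagate_up(v, W, doubled=False):
    """Mean hidden activation tanh(x), x = v W (or 2 v W when doubled)."""
    x = v @ W
    return np.tanh(2.0 * x if doubled else x)

def propagate_down(h, W, doubled=False):
    """Mean visible activation tanh(x), x = h W^T (or 2 h W^T when doubled)."""
    x = h @ W.T
    return np.tanh(2.0 * x if doubled else x)

def sample_pm1(mean, rng):
    """Draw +-1 spins with P(+1) = (1 + mean) / 2."""
    return np.where(rng.random(mean.shape) < 0.5 * (1.0 + mean), 1.0, -1.0)

rng = np.random.default_rng(0)
# Hypothetical weights: each hidden unit sees two neighbouring visible units.
W = np.array([[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]])
v = np.array([[1.0, 1.0, -1.0, -1.0]])  # one sample, shape (batch, n_visible)

h_mean = propagate_up(v, W)    # tanh([1, -1])
h = sample_pm1(h_mean, rng)    # stochastic +-1 hidden state
v_mean = propagate_down(h, W)  # reconstruction of the visible layer
print(h_mean, h, v_mean)
```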

Update rule ($\lambda_\text{l}$ learning rate, $\lambda_\text{f}$ forgetting rate):
$$W\leftarrow \text{relu}[(1-\lambda_\text{f})W+\lambda_\text{l} (v^{\intercal}(0) h(0)-v^{\intercal}(\infty) h(\infty))]$$
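- Here $(0)$ denotes the values with the visible layer clamped to the data, and $(\infty)$ the values after the Gibbs chain has equilibrated.
- If any element in $W$ becomes negative, the element is set to zero.
- The forgetting rate can be used to control the bond dimension.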
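
A minimal sketch of this update rule, assuming the states are stored as arrays of shape (batch, units) and that the correlations are averaged over the batch (the averaging is an assumption; the formula above does not specify it). The names `update_weights`, `vk`, and `hk` are hypothetical, with `vk`, `hk` standing in for the model-phase values.

```python
import numpy as np

def update_weights(W, v0, h0, vk, hk, lr, fr):
    """W <- relu[(1 - fr) W + lr (<v0^T h0> - <vk^T hk>)]."""
    grad = (v0.T @ h0 - vk.T @ hk) / v0.shape[0]  # batch-averaged correlations
    return np.maximum((1.0 - fr) * W + lr * grad, 0.0)  # relu keeps W >= 0

W = np.array([[0.1, 0.5]])
v0 = np.array([[1.0]])
h0 = np.array([[-1.0, 1.0]])  # data phase
vk = np.array([[1.0]])
hk = np.array([[1.0, 1.0]])   # model phase

# Learning step: the first weight would go negative and is clipped to zero.
W_learned = update_weights(W, v0, h0, vk, hk, lr=0.5, fr=0.0)
# Pure forgetting: identical phases give zero gradient, so W just decays.
W_decayed = update_weights(W, v0, h0, v0, h0, lr=0.5, fr=0.1)
print(W_learned, W_decayed)
```

With a vanishing gradient the weights shrink by a factor $1-\lambda_\text{f}$ per step, which is how the forgetting rate limits the learned bond dimension.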

In [288]:
%run 'EFL.py'
rbm = RBM(4,2)

In [287]:
print(rbm.learn(numpy.asarray(numpy.random.randint(0,2,(40,4))*2-1,dtype=float),0.1,0.01))
rbm.W.get_value()

0.7453134201113543


array([[ 0.165,  0.011],
       [ 0.708,  0.176],
       [ 0.120,  0.285],
       [ 0.071,  0.809]])

In [366]:
print(rbm.learn(numpy.array([[1,1,1,1],[-1,1,-1,1]]*10,dtype=float),0.1,0.01))
rbm.W.get_value()

1.081018265501512


array([[ 1.194,  0.031],
       [ 0.005,  0.940],
       [ 1.189,  0.003],
       [ 0.000,  0.946]])

#### Test by Ideal States
* trivial product state: deep PM, all Ising configurations have equal weight.
* random state (maximally thermalized): deep FM, all spins up or all spins down.

In [1]:
%run EFL.py
#train_set = numpy.array([[1,1,1,1],[-1,-1,-1,-1]]*200,dtype=float)
#train_set = numpy.array([[1,-1,1,-1],[-1,1,-1,1]]*200,dtype=float)
train_set = numpy.asarray(numpy.random.randint(0,2,(400,4))*2-1,dtype=float)
rbm = RBM(W='random',method='CD')
#lr_table = [0.2,0.3,0.25,0.15,0.1]
#fr_table = [0.05,0.01,0.,0.,0.]
lr_table = [0.5,0.3,0.2,0.15,0.1,0.1,0.1]
fr_table = [0.,0.,0.,0.,0.,0.,0.]

FM, maximally thermal.

In [3]:
for epoch in range(7):
    cost, xent = rbm.train(train_set,
                           learning_rate=lr_table[epoch],
                           forgetting_rate=fr_table[epoch])
    print('Epoch %d: '%epoch, 'cost = %f, xent = %f'%(cost, xent))
rbm.W.get_value()

Epoch 0:  cost = 0.003711, xent = 2.815545
Epoch 1:  cost = -0.000552, xent = 2.785481
Epoch 2:  cost = 0.001037, xent = 2.788026
Epoch 3:  cost = 0.000119, xent = 2.782246
Epoch 4:  cost = 0.000175, xent = 2.783272
Epoch 5:  cost = -0.000117, xent = 2.788262
Epoch 6:  cost = 0.000849, xent = 2.796292


array([[ 0.05757096,  0.08962437],
       [ 0.08546829,  0.10909487],
       [ 0.04621907,  0.02650601],
       [ 0.00845927,  0.01175318]])

PM, trivial product state.

In [6]:
for epoch in range(7):
    cost, xent = rbm.train(train_set,
                           learning_rate=lr_table[epoch],
                           forgetting_rate=fr_table[epoch])
    print('Epoch %d: '%epoch, 'cost = %f, xent = %f'%(cost, xent))
rbm.W.get_value()

Epoch 0:  cost = 0.112113, xent = 3.242719
Epoch 1:  cost = 0.022328, xent = 3.070230
Epoch 2:  cost = 0.007352, xent = 2.949821
Epoch 3:  cost = 0.013084, xent = 2.926602
Epoch 4:  cost = 0.001652, xent = 2.892917
Epoch 5:  cost = 0.005669, xent = 2.922602
Epoch 6:  cost = 0.004110, xent = 2.880515


array([[ 0.03113473,  0.06022195],
       [ 0.43808222,  0.00861   ],
       [ 0.01715053,  0.01461342],
       [ 0.03886801,  0.29126499]])

### Convolutional RBM
Assumption: translational symmetry (possibly only in the statistical sense, after disorder averaging). Weights are shared among kernels within each layer and each group, which makes the algorithm scalable. For a disordered system, this learns the disorder-averaged entanglement features.
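
To illustrate the weight sharing, the sketch below builds a full visible-to-hidden weight matrix from a single short kernel, so the number of free parameters equals the kernel length regardless of system size. The periodic boundary, the `stride` parameter, and the function name `shared_weight_matrix` are assumptions made for this example.

```python
import numpy as np

def shared_weight_matrix(kernel, n_hidden, stride):
    """Place the same kernel under every hidden unit (periodic boundary)."""
    kernel = np.asarray(kernel, dtype=float)
    n_visible = n_hidden * stride
    W = np.zeros((n_visible, n_hidden))
    for j in range(n_hidden):
        for a, w in enumerate(kernel):
            # Hidden unit j couples to visible sites j*stride, j*stride+1, ...
            W[(j * stride + a) % n_visible, j] += w
    return W

kernel = [0.5, 1.0, 0.5]  # hypothetical shared weights
W_conv = shared_weight_matrix(kernel, n_hidden=4, stride=2)

# Translating visible sites by one stride and hidden sites by one unit
# maps the weight matrix onto itself.
W_shifted = np.roll(np.roll(W_conv, 2, axis=0), 1, axis=1)
print(np.array_equal(W_conv, W_shifted))
```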

Consider a 1D free-fermion CFT. The Rényi entropy is given by (according to Calabrese and Cardy 2004, 2009)
$$S\propto\sum_{i,j}\ln|u_i-v_j|-\sum_{i<j}\ln|u_i-u_j|-\sum_{i<j}\ln|v_i-v_j|,$$
where $u_i$ and $v_i$ are the positions of kinks and antikinks in the Ising configuration.
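
As a worked instance of this formula (the positions are chosen purely for illustration), a single interval with one kink at $u_1$ and one antikink at $v_1$ has no $u$-$u$ or $v$-$v$ pairs, so only the first sum contributes:
$$S\propto\ln|u_1-v_1|.$$
For $u_1=3$, $v_1=10$ this gives $\ln 7\approx1.946$. For two intervals with $(u_1,v_1,u_2,v_2)=(1,2,4,8)$, all three sums contribute:
$$S\propto\underbrace{\ln1+\ln7+\ln2+\ln4}_{u\text{-}v\text{ pairs}}-\underbrace{\ln3}_{u\text{-}u}-\underbrace{\ln6}_{v\text{-}v}=\ln\frac{28}{9}\approx1.135.$$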

In [26]:
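import numpy
from itertools import combinations

def entropy_CFT(kinks):
    """Entropy (up to a prefactor) from alternating kink/antikink positions."""
    us = kinks[0::2]  # kink positions u_i
    vs = kinks[1::2]  # antikink positions v_j
    # cross term: every kink-antikink pair
    Suv = sum(numpy.log(abs(u-v)) for u in us for v in vs)
    # same-type terms: each unordered pair counted once (i<j)
    Suu = sum(numpy.log(abs(u1-u2)) for u1, u2 in combinations(us,2))
    Svv = sum(numpy.log(abs(v1-v2)) for v1, v2 in combinations(vs,2))
    return Suv-Suu-Svv

entropy_CFT([3,10])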

1.9459101490553132

## Deep Boltzmann Machine

In [10]:
%run 'EFL.py'
dbm = DBM([8,4,2,1])

In [11]:
test_samples = [
    [+1,+1,+1,+1,+1,+1,+1,+1],
    [-1,-1,-1,-1,-1,-1,-1,-1],
    [+1,+1,+1,+1,-1,-1,-1,-1],
    [-1,-1,-1,-1,+1,+1,+1,+1],
    #[+1,+1,-1,-1,-1,-1,+1,+1],
    #[-1,-1,+1,+1,+1,+1,-1,-1]
]
data = Server(test_samples*100,15)
#data = Server(numpy.random.randint(0,2,(200,8))*2-1,10)

In [12]:
dbm.pretrain(data,lrs=[0.5,0.3,0.2,0.15],frs=[0.05,0.01])

Pretraining layer 0:
    Epoch 0: cost = 0.399131
    Epoch 1: cost = 0.152201
    Epoch 2: cost = 0.095953
    Epoch 3: cost = 0.011594
    Epoch 4: cost = 0.027396
    Epoch 5: cost = 0.017311
    Epoch 6: cost = 0.007148
Pretraining layer 1:
    Epoch 0: cost = 1.571035
    Epoch 1: cost = 1.427351
    Epoch 2: cost = 1.048674
    Epoch 3: cost = 0.829283
    Epoch 4: cost = 0.822940
    Epoch 5: cost = 0.618515
    Epoch 6: cost = 0.646965
Pretraining layer 2:
    Epoch 0: cost = 1.184162
    Epoch 1: cost = 1.004733
    Epoch 2: cost = 1.010763
    Epoch 3: cost = 1.014066
    Epoch 4: cost = 1.007851
    Epoch 5: cost = 0.999284
    Epoch 6: cost = 0.999723


In [16]:
numpy.set_printoptions(formatter={'float': '{: 0.3f}'.format})
for rbm in dbm.rbms:
    print(rbm.W.get_value())

[[ 1.654  1.623  0.000  0.000]
 [ 1.676  1.645  0.000  0.000]
 [ 1.648  1.621  0.000  0.000]
 [ 1.518  1.595  0.000  0.000]
 [ 0.000  0.000  1.693  1.597]
 [ 0.000  0.000  1.610  1.582]
 [ 0.000  0.000  1.508  1.631]
 [ 0.000  0.000  1.738  1.776]]
[[ 1.324  0.000]
 [ 1.325  0.000]
 [ 0.000  1.271]
 [ 0.000  1.280]]
[[ 0.000]
 [ 0.054]]


In [14]:
f = theano.function([dbm.input],dbm.rbms[1].output)
f([[+1,+1,+1,+1,+1,+1,+1,+1],
   [-1,-1,-1,-1,-1,-1,-1,-1],
   [-1,-1,-1,-1,+1,+1,+1,+1],
   [+1,+1,+1,+1,-1,-1,-1,-1]])

array([[ 1.000,  1.000],
       [-1.000, -1.000],
       [-1.000,  1.000],
       [ 1.000, -1.000]])

In [15]:
dbm.finetune(data,10,lrs=[0.5,0.5,0.4,0.3,0.2,0.1])

    Epoch 0: cost = 0.005861
    Epoch 1: cost = 0.003953
    Epoch 2: cost = 0.003014
    Epoch 3: cost = 0.002709
    Epoch 4: cost = 0.002426
    Epoch 5: cost = 0.002295
    Epoch 6: cost = 0.002220
    Epoch 7: cost = 0.002186
    Epoch 8: cost = 0.002171
    Epoch 9: cost = 0.002164


In [788]:
for rbm in dbm.rbms:
    print(rbm.W.get_value())

[[ 1.575  1.588  0.000  0.000]
 [ 1.601  1.574  0.000  0.000]
 [ 1.515  1.638  0.000  0.000]
 [ 1.573  1.656  0.000  0.000]
 [ 0.000  0.000  1.655  1.605]
 [ 0.000  0.000  1.624  1.590]
 [ 0.000  0.000  1.628  1.633]
 [ 0.000  0.000  1.557  1.580]]
[[ 1.173  0.000]
 [ 1.170  0.000]
 [ 0.000  1.179]
 [ 0.000  1.181]]
[[ 0.036]
 [ 0.833]]


In [707]:
f = theano.function([dbm.input],dbm.rbms[1].output)
f(test_samples)

array([[ 1.000,  1.000],
       [-1.000, -1.000],
       [ 1.000, -1.000],
       [-1.000,  1.000]])

In [17]:
dbm.MC_configs[0].get_value()

array([[ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [ 1.000,  1.000,  1.000,  1.000, -1.000, -1.000, -1.000, -1.000],
       [-1.000, -1.000, -1.000, -1.000,  1.000,  1.000,  1.000,  1.000],
       [-1.000, -1.000, -1.000, -1.000,  1.000,  1.000,  1.000,  1.000],
       [-1.000, -1.000, -1.000, -1.000,  1.000,  1.000,  1.000,  1.000],
       [-1.000, -1.000, -1.000, -1.000,  1.000,  1.000,  1.000,  1.000],
       [-1.000, -1.000, -1.000, -1.000,  1.000,  1.000,  1.000,  1.000],
       [-1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000, -1.000],
       [ 1.000,  1.000,  1.000,  1.000, -1.000, -1.000, -1.000, -1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000,  1.000],
       [ 1.000,  1.000,  1.000,  1.000,  1.000,  1.