# Entanglement Feature Learning

In [110]:
%run EFL.py

## Restricted Boltzmann Machine (RBM)

### RBM Kernel
Differences from a standard RBM:
* the binary units take values $\pm1$ instead of $0,1$.
* the model is unbiased: the RTN of a pure state has no bias, so the only parameter is the weight matrix.

Initialization of the weight matrix. It should:
* reflect locality.
* be ferromagnetic, tuned to criticality.
* include some randomness.
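
One way to meet the three requirements listed above can be sketched as follows. This is only an illustration, not the initialization used in `EFL.py`: the function name `init_weights`, the periodic 1D geometry, the coupling `J`, the `window` size and the `noise` amplitude are all assumptions, and `J` would still have to be tuned by hand toward criticality.

```python
import numpy as np

def init_weights(n_visible, n_hidden, J=0.5, window=2, noise=0.01, seed=0):
    """Hypothetical local, ferromagnetic, slightly random weight matrix.

    Visible sites live on a periodic 1D chain. Hidden unit j couples with
    positive strength J to `window` consecutive visible sites starting at
    its anchor site, and every entry receives small Gaussian noise.
    """
    rng = np.random.RandomState(seed)
    W = noise * rng.randn(n_visible, n_hidden)   # small randomness everywhere
    stride = n_visible // n_hidden               # spacing between hidden anchors
    for j in range(n_hidden):
        for d in range(window):
            i = (j * stride + d) % n_visible     # periodic neighborhood of hidden unit j
            W[i, j] += J                         # ferromagnetic (positive) coupling
    return W

W0 = init_weights(n_visible=4, n_hidden=2)
print(W0.shape)
```

Here each hidden unit sees only its own block of visible spins, so the matrix is sparse up to the noise term. Whether a given `J` is close to critical depends on the geometry and is not addressed by this sketch.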

Energy model ($v_i,h_j=\pm1$):
$$H=-\sum_{i,j}v_iW_{ij}h_{j}.$$
The free energy (given visible spins) is:
$$F=-\ln\sum_{[h]}e^{-H}=-\sum_{j}\ln \left(2\cosh\left(\sum_{i}v_iW_{ij}\right)\right).$$
The local field seen by $h_j$ is $\sum_i v_i W_{ij}$; the local field seen by $v_i$ is $\sum_{j} W_{ij} h_j$.
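
The closed form for $F$ can be checked numerically against a brute-force sum over all hidden configurations. The sketch below uses plain NumPy rather than the Theano `RBM` class from `EFL.py`; the function names and the random test weights are illustrative assumptions. It also spells out the conditional probability $P(h_j=+1\,|\,v)=1/(1+e^{-2\sum_i v_iW_{ij}})$ that follows from the energy model for $\pm1$ units.

```python
import itertools
import numpy as np

def energy(v, h, W):
    """H = -sum_ij v_i W_ij h_j for spins v_i, h_j = +/-1."""
    return -v @ W @ h

def free_energy(v, W):
    """Closed form F(v) = -sum_j ln(2 cosh(sum_i v_i W_ij))."""
    field = v @ W                        # local field seen by each hidden spin
    return -np.sum(np.log(2.0 * np.cosh(field)))

def free_energy_brute_force(v, W):
    """F(v) = -ln sum_h exp(-H), summed over all 2^Nh hidden configurations."""
    n_hidden = W.shape[1]
    z = sum(np.exp(-energy(v, np.array(h, dtype=float), W))
            for h in itertools.product([-1, 1], repeat=n_hidden))
    return -np.log(z)

def prob_hidden_up(v, W):
    """P(h_j = +1 | v) = 1 / (1 + exp(-2 * field_j)) for +/-1 units."""
    return 1.0 / (1.0 + np.exp(-2.0 * (v @ W)))

rng = np.random.RandomState(0)
W = rng.randn(4, 3)                      # arbitrary test weights (illustrative)
v = np.array([1.0, -1.0, 1.0, -1.0])
print(np.isclose(free_energy(v, W), free_energy_brute_force(v, W)))
```

The factor of 2 in the exponent is the main practical difference from the $0,1$ convention, where the conditional is a plain sigmoid of the local field.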

In [112]:
def train_rbm(training_set=None,
              batch_size=20,
              learning_rate=0.1,
              training_epochs=5,
              n_hidden=2):
    ''' Train RBM
    training_set::numpy.array: each row is a training sample.
    batch_size::int: size of a batch
    learning_rate::float: learning rate
    training_epochs::int: number of epochs
    n_hidden::int: number of hidden spins'''
    if training_set is None:
        return # if no training set, return
    # number of samples and visibles
    n_samples, n_visible = training_set.shape
    n_batches = n_samples//batch_size # number of batches
    # allocate symbolic variables for the data
    #index = T.lscalar() # index to a batch
    samples = T.matrix('samples') # the data
    # initialize storage for the persistent chain
    persistent_chain = theano.shared(
        numpy.zeros((batch_size,n_hidden),dtype=theano.config.floatX),
        borrow = True)
    # construct RBM
    rbm = RBM(input=samples, Nv=n_visible, Nh=n_hidden)
    # get cost and updates
    cost, updates = rbm.get_cost_updates(
        learning_rate=learning_rate,
        persistent=persistent_chain, k=15)
    # learning function
#    rbm_learn = theano.function(
#        [index], cost, updates=updates, name='rbm_learn',
#        givens={samples:training_set[index*batch_size:(index+1)*batch_size,:]})
    rbm_learn = theano.function([samples], cost, updates=updates, name='rbm_learn')
    # go through training epochs
    for epoch in range(training_epochs):
        # go through training set
        costs = []
        for batch_index in range(n_batches):
            costs.append(rbm_learn(training_set[batch_index*batch_size:(batch_index+1)*batch_size,:]))
        print('Training epoch %d, cost is'%epoch, numpy.mean(costs))
    return rbm
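
The learning rule itself sits behind `rbm.get_cost_updates` in `EFL.py` and is not shown in this notebook. As a rough illustration of what a contrastive-divergence style update looks like for the unbiased $\pm1$ model, here is a NumPy sketch of a single CD-1 step. It is not the persistent chain with `k=15` used in the cell above, and the names `sample_spins` and `cd1_step` are hypothetical.

```python
import numpy as np

def sample_spins(field, rng):
    """Sample +/-1 spins with P(s = +1) = 1 / (1 + exp(-2 * field))."""
    p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
    return np.where(rng.uniform(size=field.shape) < p_up, 1.0, -1.0)

def cd1_step(W, v_data, learning_rate, rng):
    """One CD-1 update of W from a batch v_data of shape (batch, n_visible)."""
    h_data = np.tanh(v_data @ W)                  # <h_j> given the data
    h_sample = sample_spins(v_data @ W, rng)      # hidden sample for the Gibbs step
    v_model = sample_spins(h_sample @ W.T, rng)   # reconstructed visible spins
    h_model = np.tanh(v_model @ W)                # <h_j> given the reconstruction
    batch = v_data.shape[0]
    grad = (v_data.T @ h_data - v_model.T @ h_model) / batch
    return W + learning_rate * grad               # step along the approximate log-likelihood gradient

rng = np.random.RandomState(0)
data = np.array([[1, -1, 1, -1]] * 20, dtype=float)
W = 0.01 * rng.randn(4, 2)
for _ in range(100):
    W = cd1_step(W, data, learning_rate=0.1, rng=rng)
print(W.shape)
```

The positive phase uses $\partial(-F)/\partial W_{ij}=v_i\tanh\left(\sum_{i'}v_{i'}W_{i'j}\right)$, which follows from the free energy above; the negative phase replaces the intractable model average with a one-step reconstruction.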

In [136]:
rbm = train_rbm(numpy.array([[1,-1,1,-1],[1,-1,1,-1]]*20,dtype=float))

Training epoch 0, cost is -2.15721753791
Training epoch 1, cost is -1.55290759869
Training epoch 2, cost is -1.20763823882
Training epoch 3, cost is -0.928025412443
Training epoch 4, cost is -0.885664219583
Training epoch 5, cost is -0.726975173998
Training epoch 6, cost is -0.634242490507
Training epoch 7, cost is -0.565763683482
Training epoch 8, cost is -0.481818304754
Training epoch 9, cost is -0.459752206883
Training epoch 10, cost is -0.347703250516
Training epoch 11, cost is -0.370970701242
Training epoch 12, cost is -0.284409911717
Training epoch 13, cost is -0.313531325837
Training epoch 14, cost is -0.237288284478


In [137]:
rbm.W.get_value()

array([[-0.22110558, -1.13884915],
       [ 0.50554618,  1.44307546],
       [ 0.0946451 , -1.43356834],
       [ 0.26400811,  1.14424641]])

In [134]:
rbm = train_rbm(numpy.random.RandomState().binomial(1,0.5,size=(200,4))*2-1.)

Training epoch 0, cost is -2.81100618268
Training epoch 1, cost is -2.7713649327
Training epoch 2, cost is -2.79285400235
Training epoch 3, cost is -2.73203150394
Training epoch 4, cost is -2.77240718769
Training epoch 5, cost is -2.72573133456
Training epoch 6, cost is -2.75158188047
Training epoch 7, cost is -2.74664394846
Training epoch 8, cost is -2.77412166107
Training epoch 9, cost is -2.7650800458
Training epoch 10, cost is -2.7699336707
Training epoch 11, cost is -2.73099349037
Training epoch 12, cost is -2.7928642532
Training epoch 13, cost is -2.72351479382
Training epoch 14, cost is -2.78642089828


### Convolutional RBM
Assumption: translational symmetry (exact, or in the statistical sense under disorder averaging). Weights are shared among kernels for each layer and each group, which makes the algorithm scalable. For a disordered system, this learns the disorder-averaged entanglement features.
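
Weight sharing can be illustrated with a small NumPy sketch. It assumes a periodic 1D chain, a single group, and one hidden unit per visible site; these choices, the function name `conv_hidden_field`, and the kernel values are illustrative assumptions rather than the layout used in `EFL.py`. The check at the end confirms that translating the visible spins translates the hidden local fields by the same amount, which is the symmetry the shared kernel is meant to encode.

```python
import numpy as np

def conv_hidden_field(v, kernel):
    """Local field on hidden site j: sum_d kernel[d] * v[(j + d) mod N]."""
    n = v.shape[0]
    return np.array([sum(kernel[d] * v[(j + d) % n] for d in range(len(kernel)))
                     for j in range(n)])

rng = np.random.RandomState(0)
v = rng.choice([-1.0, 1.0], size=8)      # one visible configuration on a ring
kernel = np.array([0.5, 0.3, 0.1])       # shared weights (illustrative values)

field = conv_hidden_field(v, kernel)
shifted = conv_hidden_field(np.roll(v, 1), kernel)
print(np.allclose(shifted, np.roll(field, 1)))
```

The number of parameters here is the kernel length, independent of the system size, which is the sense in which sharing keeps the model scalable.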