## Task #2

Template code for training an RBM on the full Rydberg atom dataset is provided below. It covers the first part of this task: determining the minimum number of hidden units.


Your input data for this task will be binary data from 100 atoms. Already, we've hit the ``curse of dimensionality'': the brute force problem at hand scales as $2^N$ in terms of computational resources, where $N$ is the number of Rydberg atoms. To solve this problem via brute force, we'd only be able to go up to around $N = 12$! 
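To get a feel for the $2^N$ scaling, here is a minimal sketch that counts the entries of a full state vector and the memory they would take. The helper name `state_vector_bytes` and the choice of 8 bytes per amplitude are illustrative assumptions, not part of the provided template.

```python
def state_vector_bytes(n_atoms, bytes_per_amplitude=8):
    """Memory needed to store one number per basis state of n_atoms two-level atoms.

    bytes_per_amplitude=8 assumes one 64-bit float per amplitude (an assumption).
    """
    return (2 ** n_atoms) * bytes_per_amplitude


# Tabulate how quickly the brute-force storage grows with the number of atoms.
for n in (10, 20, 50, 100):
    print("N =", n, "| basis states:", 2 ** n, "| bytes:", state_vector_bytes(n))
```

Each extra atom doubles the storage, which is why an approach whose cost grows only linearly with the number of atoms is attractive here.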

The input data file is Rydberg\_data.txt. To train an RBM on this data, we will look at evaluating the energy $\textit{during}$ training to see how well the RBM is doing and stop the training process once we see that it's done well enough (dubbed the ``learning criterion''). The energy can be calculated using the energy function in Rydberg\_energy\_calculator.py. The learning criterion you'll use is $\vert E_{RBM} - E_{exact} \vert \leq 0.0001$ (i.e. train the RBM until this is satisfied), where $E_{exact} = -4.1203519096$. However, don't wait forever for the RBM to try and satisfy the learning criterion. Only wait 1000 training steps (epochs).
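The learning criterion can be written as a small predicate. The sketch below is only an illustration: the function name `meets_criterion` and the constant names are assumptions, while the numerical values come from the text above.

```python
EXACT_ENERGY = -4.1203519096  # E_exact quoted in the text
TOLERANCE = 0.0001            # learning criterion: |E_RBM - E_exact| <= 0.0001
MAX_EPOCHS = 1000             # give up on a given RBM after this many epochs


def meets_criterion(rbm_energy, exact_energy=EXACT_ENERGY, tol=TOLERANCE):
    """Return True if the RBM energy is within tol of the exact energy."""
    return abs(rbm_energy - exact_energy) <= tol


# Two energies taken from the training log for a single hidden unit
# (epochs 100 and 1000): both are still outside the tolerance.
for energy in (-4.120062240258802, -4.1201191562144714):
    print(energy, meets_criterion(energy))
```

In a training loop, this check would be evaluated every time the energy is estimated, and training would stop at the first `True` or at `MAX_EPOCHS`, whichever comes first.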

We've given you a very large dataset in \texttt{Rydberg\_data.txt}, so let's evaluate how ``hard'' the learning problem is (i.e. how many hidden units are needed to reach our learning criterion). Use the entire dataset to determine the minimum number of hidden units required to satisfy the learning criterion, starting with $n_h = 1$. The ``size'' of the compressed representation that the RBM produces is equivalent to storing $100 + n_h + n_h \times 100$ numbers, where $n_h$ is the number of hidden units that you found.

In [4]:
import numpy as np
import torch
import Rydberg_energy_calculator
from RBM_helper import RBM

training_data = torch.from_numpy(np.loadtxt("Rydberg_data.txt"))

In [None]:
epochs = 1000
num_samples = 20000  # number of rows in the initial state passed to draw_samples
n_vis = training_data.shape[1]
exact_energy = -4.1203519096
# NOTE: this cutoff is looser than the 0.0001 learning criterion stated above.
tolerance = 0.0002
print("Exact energy: ",exact_energy)

converged = False
n_hin = 0
while not converged:
  n_hin += 1  # try 1, 2, 3, ... hidden units until one converges
  rbm = RBM(n_vis, n_hin)
  print("\n The number of hidden units is: ", n_hin)

  for e in range(1, epochs + 1):
    rbm.train(training_data)
    if e % 100 == 0:
      # Every 100 epochs, estimate the energy from samples drawn from the RBM.
      init_state = torch.zeros(num_samples, n_vis)
      RBM_samples = rbm.draw_samples(1000, init_state)
      energies = Rydberg_energy_calculator.energy(RBM_samples, rbm.wavefunction)
      error = abs(exact_energy - energies.item())
      print("Epoch:", e,". Energy from RBM samples:", energies.item(),". Error:", error)
      if error < tolerance:
        print("FINAL NUMBER OF HIDDEN UNITS:", n_hin)
        print("FINAL NUMBER OF EPOCHS:", e)
        print("ERROR:", error)
        converged = True
        break  # stop training this RBM; the outer loop then exits too

Exact energy:  -4.1203519096

 The number of hidden units is:  1




Epoch: 100 . Energy from RBM samples: -4.120062240258802 . Error: 0.0002896693411980067
Epoch: 200 . Energy from RBM samples: -4.119976002260587 . Error: 0.00037590733941339494
Epoch: 300 . Energy from RBM samples: -4.120082717140181 . Error: 0.0002691924598190454
Epoch: 400 . Energy from RBM samples: -4.120004489701917 . Error: 0.00034741989808306784
Epoch: 500 . Energy from RBM samples: -4.120038590317868 . Error: 0.00031331928213251814
Epoch: 600 . Energy from RBM samples: -4.120016504978766 . Error: 0.0003354046212340478
Epoch: 700 . Energy from RBM samples: -4.120052344188419 . Error: 0.0002995654115807156
Epoch: 800 . Energy from RBM samples: -4.120031126971855 . Error: 0.0003207826281448334
Epoch: 900 . Energy from RBM samples: -4.120077148590775 . Error: 0.0002747610092255215
Epoch: 1000 . Energy from RBM samples: -4.1201191562144714 . Error: 0.000232753385528639

 The number of hidden units is:  2
Epoch: 100 . Energy from RBM samples: -4.119901615188974 . Error: 0.000450294411

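How does this compare to $2^{100}$? The RBM's storage is linear in the number of hidden units: with the number of visible units fixed at 100, $100 + n_h + n_h \times 100 = 100 + 101\,n_h$, a straight line in $n_h$. For $n_h = 2$ this is 302 numbers, whereas $2^{100} \approx 1.27 \times 10^{30}$.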
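The comparison can be checked numerically. This is a small sketch of the storage formula quoted above; the function name `rbm_num_parameters` is an assumption made for illustration.

```python
def rbm_num_parameters(n_visible, n_hidden):
    """Numbers stored by the RBM: n_visible + n_hidden + n_visible * n_hidden."""
    return n_visible + n_hidden + n_visible * n_hidden


n_visible = 100  # one visible unit per Rydberg atom
for n_h in (1, 2, 4):
    print("n_h =", n_h, "| RBM numbers:", rbm_num_parameters(n_visible, n_h))

# Size of the brute-force description for 100 atoms, for comparison.
print("2**100 =", 2 ** 100)
```

Adding one hidden unit costs 101 extra numbers, so even a much larger hidden layer stays far below the brute-force count.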

Double the number of hidden units determined in the previous question and determine how many data points (i.e. the portion of the full dataset in $\texttt{Rydberg\_data.txt}$) you need to reach the learning criterion. Start with 500 data points. Move up in increments of at least 100 (depending on how precise you want to be!). This will let experimentalists know the minimum amount of data required from their experiment!
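One way to organise this search is to step through candidate dataset sizes and stop at the first one that works. The sketch below is hypothetical: `subset_sizes`, `minimum_data_points`, and the `reaches_criterion` callback are illustrative names, and the callback is assumed to train a fresh RBM on the first `n` rows of the data (for example `training_data[:n]`) and report whether the learning criterion was met within 1000 epochs.

```python
def subset_sizes(total, start=500, step=100):
    """Yield dataset sizes start, start + step, ... and finally total itself."""
    n = start
    while n < total:
        yield n
        n += step
    yield total  # always finish with the full dataset


def minimum_data_points(total, reaches_criterion, start=500, step=100):
    """Return the smallest tried size for which reaches_criterion(n) is True.

    reaches_criterion is a hypothetical callback supplied by the caller.
    Returns None if no tried size satisfies it.
    """
    for n in subset_sizes(total, start, step):
        if reaches_criterion(n):
            return n
    return None


# Stand-in callback for demonstration only; it is NOT a real training result.
fake_result = minimum_data_points(2000, lambda n: n >= 930)
print("Smallest size accepted by the stand-in callback:", fake_result)
```

A smaller `step` gives a more precise answer at the cost of more training runs, since every candidate size needs its own freshly initialised RBM.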

In [None]:
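epochs = 1000
n_hin = 2 * 2 # in the previous case it converged with 2 units
n_vis = training_data.shape[1]
exact_energy = -4.1203519096
# NOTE: this cutoff is looser than the 0.0001 learning criterion stated in the text.
tolerance = 0.0002
print("Exact energy: ",exact_energy,". Hidden units:",n_hin,".")

converged = False
i = 0
while not converged:
  i = i + 1
  # NOTE: num_samples only sets the number of rows in init_state used for the
  # energy estimate, and it grows in steps of 10. It does not limit the training
  # data, so this does not yet follow the "start with 500 data points" recipe.
  num_samples = 10 * i
  rbm = RBM(n_vis, n_hin)
  print("\nThe number of samples is: ", num_samples)

  for e in range(1, epochs + 1):
    rbm.train(training_data)  # trains on the full dataset every time
    if e % 100 == 0:
      # Every 100 epochs, estimate the energy from samples drawn from the RBM.
      init_state = torch.zeros(num_samples, n_vis)
      RBM_samples = rbm.draw_samples(1000, init_state)
      energies = Rydberg_energy_calculator.energy(RBM_samples, rbm.wavefunction)
      error = abs(exact_energy - energies.item())
      print("Epoch:", e,". Energy from RBM samples:", energies.item(),". Error:", error)
      if error < tolerance:
        print("NUMBER OF SAMPLES:", num_samples)
        print("FINAL NUMBER OF EPOCHS:", e)
        print("ERROR:", error)
        converged = True
        break  # stop training this RBM; the outer loop then exits too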

Exact energy:  -4.1203519096 . Hidden units: 4 .

The number of samples is:  10
Epoch: 100 . Energy from RBM samples: -4.121750277501969 . Error: 0.001398367901969344
