# Superconducting Qubit Refrigerator
Optimize the cooling power of a refrigerator based on a superconducting qubit (see section IVb of the manuscript or Refs. [1](https://doi.org/10.1103/PhysRevB.94.184503), [2](https://doi.org/10.1103/PhysRevB.100.085405) or [3](https://doi.org/10.1103/PhysRevB.100.035407)). The Hamiltonian of the system is:
\begin{equation}
	\hat{H}[u(t)] = - E_0\left[\Delta \hat{\sigma}_x + u(t)\hat{\sigma}_z  \right],
	\label{eq:h_fridge}
\end{equation}
where $\hat{\sigma}_x$ and $\hat{\sigma}_z$ are Pauli matrices, $E_0$ is a fixed energy scale, $\Delta$ characterizes the minimum gap of the system, and $u(t)$ is our single continuous control parameter. In this setup the coupling to the bath is fixed, so we do not have the discrete action of choosing the bath.
The coupling to the baths is described using the Lindblad master equation [see Eq. (9) of the manuscript]. The Lindblad operators are given by

\begin{equation}
\begin{aligned}
	\hat{A}^{(\alpha)}_{+,u(t)} &= -i\rvert e_{u(t)}\rangle 
    \langle g_{u(t)} \rvert, &
	\hat{A}^{(\alpha)}_{-,u(t)} &= +i\rvert g_{u(t)}\rangle \langle e_{u(t)}\rvert,
\end{aligned}
\end{equation}
where $\rvert g_{u(t)}\rangle$ and $\rvert e_{u(t)}\rangle$ are, respectively, the instantaneous ground state and excited state of the qubit. The corresponding rates are given by $\gamma^{(\alpha)}_{\pm,u(t)} = S_{\alpha}[\pm\Delta \epsilon_{u(t)}] $, where $\Delta \epsilon_{u(t)}$ is the instantaneous energy gap of the system, and
\begin{equation}
	S_\alpha(\Delta \epsilon)= \frac{g_{\alpha}}{2} \frac{1}{1+Q_\alpha^2( \Delta\epsilon/\omega_\alpha - \omega_\alpha/\Delta \epsilon )^2 } \frac{\Delta \epsilon}{e^{\beta_\alpha\Delta\epsilon}-1}
\end{equation}
is the noise power spectrum of bath $\alpha$. Here $\omega_\alpha$, $Q_\alpha$ and $g_\alpha$ are the base resonance frequency, quality factor and coupling strength of the resonant circuit acting as bath $\alpha=\text{H},\text{C}$, and $\beta_\alpha$ is the inverse temperature of bath $\alpha$.
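
The expressions above can be checked numerically. Diagonalizing the Hamiltonian gives eigenvalues $\pm E_0\sqrt{\Delta^2 + u^2}$, so the instantaneous gap is $\Delta\epsilon_{u} = 2E_0\sqrt{\Delta^2 + u^2}$. The following is a minimal sketch: the function names `energy_gap` and `noise_spectrum` are illustrative and are not part of `sac_envs`, and the bath is assumed to be resonant with the gap for simplicity.

```python
import numpy as np

def energy_gap(u, e0=1.0, delta=0.12):
    """Instantaneous gap of H = -E_0 (Delta sigma_x + u sigma_z)."""
    return 2.0 * e0 * np.sqrt(delta**2 + u**2)

def noise_spectrum(de, g, q, w, beta):
    """S_alpha(de): resonant filter times a Bose-type thermal factor."""
    resonance = 1.0 / (1.0 + q**2 * (de / w - w / de)**2)
    return 0.5 * g * resonance * de / np.expm1(beta * de)

# Cross-check the closed-form gap against exact diagonalization.
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
u = 0.3
h = -1.0 * (0.12 * sx + u * sz)
evals = np.linalg.eigvalsh(h)      # eigenvalues in ascending order
gap_numeric = evals[1] - evals[0]

# Excitation (+) and relaxation (-) rates for a bath resonant with the gap
# (illustrative assumption: w is set equal to the gap itself).
beta = 1 / 0.3
de = energy_gap(u)
rate_up = noise_spectrum(+de, g=1.0, q=30.0, w=de, beta=beta)
rate_down = noise_spectrum(-de, g=1.0, q=30.0, w=de, beta=beta)
```

With this form of $S_\alpha$, the two rates satisfy detailed balance: `rate_down / rate_up` equals $e^{\beta_\alpha \Delta\epsilon}$.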
#### Import modules

In [None]:
import sys
import os
sys.path.append(os.path.join('..','src'))
import sac
import sac_envs
import numpy as np

## Set Up New Training
The following code initiates a new training session. All training logs, parameters and saved states will be stored under the ```data``` folder, within a folder named with the current date and time.
- ```env_params``` is a dictionary with the environment parameters.
- ```training_hyperparams``` is a dictionary with training hyperparameters.
- ```log_info``` is a dictionary that specifies which quantities to log.

The parameters below were used to produce Fig. 4 of the manuscript.

In [None]:
e0 = 1.
delta = 0.12
omega = 0.05 
dt = 2.*np.pi /omega / 128. 
env_params = {
    "g0": 1.,                             #g of bath 0
    "g1": 1.,                             #g of bath 1
    "b0": 1/0.3,                          #inverse temperature \beta of bath 0
    "b1": 1/0.15,                         #inverse temperature \beta of bath 1
    "q0": 30.,                            #quality factor of bath 0
    "q1": 30.,                            #quality factor of bath 1
    "e0": e0,                             #E_0
    "delta": delta,                       #\Delta
    "w0": 2.*e0*np.sqrt(delta**2 + 0.25), #resonance frequency of bath 0
    "w1": 2.*e0*delta,                    #resonance frequency of bath 1
    "min_u": 0.,                          #minimum value of action u
    "max_u": 0.75,                        #maximum value of action u
    "dt": dt,                             #timestep \Delta t
    "reward_extra_coeff": 1.*10**4        #the reward is multiplied by this factor
} 

training_hyperparams = {
    "BATCH_SIZE": 256,                    #batch size
    "LR": 0.001,                          #learning rate
    "ALPHA_START": 50,                    #initial value of SAC temperature
    "ALPHA_END": 0.,                      #final value of SAC temperature
    "ALPHA_DECAY": 48000,                 #exponential decay of SAC temperature
    "REPLAY_MEMORY_SIZE": 192000,         #size of replay buffer
    "POLYAK": 0.995,                      #polyak coefficient
    "LOG_STEPS": 6000,                    #save logs and display training every number of steps
    "GAMMA": 0.995,                       #RL discount factor
    "HIDDEN_SIZES": (256,256),            #size of hidden layers
    "SAVE_STATE_STEPS": 80000,            #saves complete state of training every number of steps
    "INITIAL_RANDOM_STEPS": 5000,         #number of initial uniformly random steps
    "UPDATE_AFTER": 1000,                 #start minimizing loss function after initial steps
    "UPDATE_EVERY": 50,                   #performs this many updates every this many steps
    "USE_CUDA": False                     #use cuda for computation 
}

log_info = {
    "log_running_reward": True,           #log running reward 
    "log_running_loss": True,             #log running loss
    "log_actions": True,                  #log chosen actions
    "extra_str": "_superconducting_qubit_refrigerator" #extra string to append to training folder
}


train = sac.SacTrain()
train.initialize_new_train(sac_envs.SuperconductingQubitRefrigerator,
                           env_params, training_hyperparams, log_info)
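
The resonance frequencies chosen above can be compared with the qubit gap $\Delta\epsilon_u = 2E_0\sqrt{\Delta^2+u^2}$, which follows from diagonalizing the Hamiltonian. `w1` coincides with the gap at $u=0$ (the minimum gap) and `w0` with the gap at $u=0.5$, so each bath is resonant with the qubit at a different value of the control. Similarly, `dt` divides the period $2\pi/\omega$ into 128 steps. The sketch below repeats the relevant values so that it runs on its own; `energy_gap` is an illustrative helper, not part of `sac_envs`.

```python
import numpy as np

e0, delta, omega = 1.0, 0.12, 0.05

def energy_gap(u):
    """Instantaneous qubit gap for control value u."""
    return 2.0 * e0 * np.sqrt(delta**2 + u**2)

# Same expressions as in env_params above.
w0 = 2.0 * e0 * np.sqrt(delta**2 + 0.25)
w1 = 2.0 * e0 * delta
dt = 2.0 * np.pi / omega / 128.0

# Number of timesteps in one period 2*pi/omega.
steps_per_period = 2.0 * np.pi / omega / dt

print("w0 matches the gap at u = 0.5:", np.isclose(w0, energy_gap(0.5)))
print("w1 matches the gap at u = 0.0:", np.isclose(w1, energy_gap(0.0)))
print("timesteps per period:", steps_per_period)
```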

#### Train
Perform a given number of training steps. It can be run multiple times.

In [None]:
train.train(500000)
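
To get a feel for how the `ALPHA_*` hyperparameters relate to the number of training steps, the sketch below assumes that the SAC temperature decays as a plain exponential from `ALPHA_START` to `ALPHA_END` with time constant `ALPHA_DECAY`. This functional form is an assumption made for illustration; the schedule actually used is the one defined in the `sac` module.

```python
import math

ALPHA_START, ALPHA_END, ALPHA_DECAY = 50.0, 0.0, 48000

def sac_temperature(step):
    """Hypothetical schedule (assumption): exponential interpolation."""
    return ALPHA_END + (ALPHA_START - ALPHA_END) * math.exp(-step / ALPHA_DECAY)

# Temperature at the start, after one decay constant, and at later steps.
for step in (0, 48000, 240000, 500000):
    print(f"step {step:>6d}: alpha = {sac_temperature(step):.4g}")
```

Under this assumption the temperature drops by a factor of $e$ every 48000 steps, so it is close to `ALPHA_END` well before the 500000 steps requested above.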

## Save the State
The full state of the training session is saved every ```SAVE_STATE_STEPS``` steps. Run this command if you wish to manually save the current state.

In [None]:
train.save_full_state()

## Load Existing Training
Any saved training session can be loaded by specifying its folder in ```log_dir```. Loading creates a new logging folder named with the current date and time. The following loads the latest save in a folder named ```"2021_04_14-10_51_14_superconducting_qubit_refrigerator"```.

In [None]:
log_dir = "../data/2021_04_14-10_51_14_superconducting_qubit_refrigerator/"
train = sac.SacTrain()
train.load_train(log_dir)
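
If the folder name is not known in advance, the most recent session can be looked up by name. This is a minimal sketch that assumes every training folder starts with a `YYYY_MM_DD-HH_MM_SS` timestamp as in the example above, so that alphabetical order matches chronological order. `latest_training_dir` is a hypothetical helper, not part of `sac`.

```python
import os

def latest_training_dir(data_dir, suffix=""):
    """Return the path of the newest training folder in data_dir, or None."""
    candidates = [
        name for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name)) and name.endswith(suffix)
    ]
    if not candidates:
        return None
    # Timestamp-prefixed names sort chronologically.
    return os.path.join(data_dir, max(candidates))

data_dir = os.path.join("..", "data")
if os.path.isdir(data_dir):
    print(latest_training_dir(data_dir, "_superconducting_qubit_refrigerator"))
```

The returned path can then be passed to `train.load_train` in place of a hard-coded `log_dir`.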

#### Train
Perform a given number of training steps. It can be run multiple times.

In [None]:
train.train(50000)

## Manually changing the learning rate
The following is NOT RECOMMENDED, since the change is not logged. However, it is possible to change the learning rate during training. The following changes the learning rate to 0.0001.

In [None]:
for g in train.pi_optimizer.param_groups:
    g['lr'] = 0.0001
for g in train.q_optimizer.param_groups:
    g['lr'] = 0.0001
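
Since the change is not logged, it can help to at least print the old and new values when applying it. The sketch below wraps the loop above in a small helper that works on any object exposing `param_groups` in the PyTorch optimizer layout. `set_learning_rate` is a hypothetical name, and a stand-in object is used here so that the example runs without a training session.

```python
from types import SimpleNamespace

def set_learning_rate(optimizer, lr):
    """Set lr on every parameter group and return the previous values."""
    previous = [group["lr"] for group in optimizer.param_groups]
    for group in optimizer.param_groups:
        group["lr"] = lr
    return previous

# Stand-in with the same `param_groups` layout as a PyTorch optimizer.
dummy = SimpleNamespace(param_groups=[{"lr": 0.001}, {"lr": 0.001}])
old = set_learning_rate(dummy, 0.0001)
new = [group["lr"] for group in dummy.param_groups]
print(f"learning rate changed from {old} to {new}")
```

In a live session the same helper would be called as `set_learning_rate(train.pi_optimizer, 0.0001)` and `set_learning_rate(train.q_optimizer, 0.0001)`.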