# Superconducting Qubit Refrigerator: Train
Optimize the cooling power of a refrigerator based on a superconducting qubit (see Results section of the manuscript or Refs. [1](https://doi.org/10.1103/PhysRevB.94.184503), [2](https://doi.org/10.1103/PhysRevB.100.085405) or [3](https://doi.org/10.1103/PhysRevB.100.035407)). The Hamiltonian of the system is:
\begin{equation}
	\hat{H}[u(t)] = - E_0\left[\Delta \hat{\sigma}_x + u(t)\hat{\sigma}_z  \right],
	\label{eq:h_fridge}
\end{equation}
where $\hat{\sigma}_x$ and $\hat{\sigma}_z$ are Pauli matrices, $E_0$ is a fixed energy scale, $\Delta$ characterizes the minimum gap of the system, and $u(t)$ is our single continuous control parameter. In this setup the coupling to the bath is fixed, so we do not have the discrete action of choosing the bath.
The coupling to the baths is described using the Lindblad master equation [see Eq. (52) of the manuscript]. The Lindblad operators are given by

\begin{equation}
\begin{aligned}
	\hat{A}^{(\alpha)}_{+,u(t)} &= -i\rvert e_{u(t)}\rangle 
    \langle g_{u(t)} \rvert, &
	\hat{A}^{(\alpha)}_{-,u(t)} &= +i\rvert g_{u(t)}\rangle \langle e_{u(t)}\rvert,
\end{aligned}
\end{equation}
where $\rvert g_{u(t)}\rangle$ and $\rvert e_{u(t)}\rangle$ are, respectively, the instantaneous ground state and excited state of the qubit. The corresponding rates are given by $\gamma^{(\alpha)}_{\pm,u(t)} = S_{\alpha}[\pm\Delta \epsilon_{u(t)}] $, where $\Delta \epsilon_{u(t)}$ is the instantaneous energy gap of the system, and
\begin{equation}
	S_\alpha(\Delta \epsilon)= \frac{g_{\alpha}}{2} \frac{1}{1+Q_\alpha^2( \Delta\epsilon/\omega_\alpha - \omega_\alpha/\Delta \epsilon )^2 } \frac{\Delta \epsilon}{e^{\beta_\alpha\Delta\epsilon}-1}
\end{equation}
is the noise power spectrum of bath $\alpha$. Here $\omega_\alpha$, $Q_\alpha$ and $g_\alpha$ are the base resonance frequency, quality factor and coupling strength of the resonant circuit acting as bath $\alpha=\text{H},\text{C}$, and $\beta_\alpha$ is the inverse temperature of bath $\alpha$.
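
As a minimal numerical sketch of the formulas above (not the implementation in `sac_multi_envs`), the gap and the rates can be evaluated with NumPy. The function names `hamiltonian`, `energy_gap`, `noise_spectrum` and `rates` are hypothetical and introduced only for illustration; the default values of `e0` and `delta` are the ones set later in this notebook. The gap follows from diagonalizing the Hamiltonian: its eigenvalues are $\pm E_0\sqrt{\Delta^2+u^2}$, so $\Delta \epsilon_{u} = 2E_0\sqrt{\Delta^2+u^2}$.

```python
import numpy as np

# Pauli matrices entering the Hamiltonian above.
SIGMA_X = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def hamiltonian(u, e0=1.0, delta=0.12):
    """H[u] = -E_0 (Delta sigma_x + u sigma_z) as a 2x2 matrix."""
    return -e0 * (delta * SIGMA_X + u * SIGMA_Z)

def energy_gap(u, e0=1.0, delta=0.12):
    """Instantaneous gap 2 E_0 sqrt(Delta^2 + u^2)."""
    return 2.0 * e0 * np.sqrt(delta**2 + u**2)

def noise_spectrum(de, g, q, w, beta):
    """S_alpha(de); de may be negative (de = -gap gives the emission rate)."""
    lorentzian = 1.0 / (1.0 + q**2 * (de / w - w / de) ** 2)
    bose = de / np.expm1(beta * de)  # de / (exp(beta * de) - 1)
    return 0.5 * g * lorentzian * bose

def rates(u, g, q, w, beta, e0=1.0, delta=0.12):
    """Return (gamma_plus, gamma_minus) = (S(+gap), S(-gap))."""
    gap = energy_gap(u, e0, delta)
    return noise_spectrum(gap, g, q, w, beta), noise_spectrum(-gap, g, q, w, beta)
```

With this form of $S_\alpha$, the two rates satisfy detailed balance, $\gamma^{(\alpha)}_{-,u}/\gamma^{(\alpha)}_{+,u} = e^{\beta_\alpha \Delta\epsilon_u}$. The resonance frequencies used in the parameters of this notebook correspond to the gap at specific controls: $2E_0\Delta$ is the gap at $u=0$, and $2E_0\sqrt{\Delta^2+0.25}$ is the gap at $u=0.5$.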
#### Import modules

In [None]:
import sys
import os
sys.path.append(os.path.join('..','src'))
import numpy as np
import sac_multi
import sac_multi_envs
import extra

## Setup new Training
The following code initiates a new training session. All training logs, parameters and saved states will be stored under the ```data``` folder, within a folder named after the current date and time. 
- ```env_params``` is a dictionary with the environment parameters.
- ```training_hyperparams``` is a dictionary with training hyperparameters.
- ```log_info``` is a dictionary that specifies which quantities to log.

The parameters below were used to produce Figs. 3-4 of the manuscript. The parameter ```a_end``` determines the final value of the weight c. To produce the points of Fig. 4, one training run must be performed for each value of ```a_end``` in the range $[0.4,1]$.
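
As an illustration of such a sweep, the sketch below lists one possible set of ```a_end``` values together with the folder suffix each run would receive through ```extra_str```. The grid of seven evenly spaced values is an assumption; the exact values used for Fig. 4 are not stated in this notebook.

```python
import numpy as np

# Assumed grid: seven evenly spaced values covering [0.4, 1].
# The actual values used for Fig. 4 may differ.
a_end_values = [round(float(a), 2) for a in np.linspace(0.4, 1.0, 7)]

for a_end in a_end_values:
    # Same suffix format as the "extra_str" entry of log_info.
    extra_str = f"_aend={a_end}"
    print(a_end, extra_str)
```

Each value would then be assigned to ```a_end``` before running the setup and training cells.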

In [None]:
#final value of the weight c (here denoted with a) 
a_end = 0.6

#other parameters
e0 = 1.
delta = 0.12
dt = 0.9817477042468103 
env_params = {
    "g0": 1.,                                  #g of bath 0
    "g1": 1.,                                  #g of bath 1
    "b0": 1/0.3,                               #inverse temperature \beta of bath 0
    "b1": 1/0.15,                              #inverse temperature \beta of bath 1
    "q0": 4.,                                  #quality factor of bath 0
    "q1": 4.,                                  #quality factor of bath 1
    "e0": e0,                                  #E_0
    "delta": delta,                            #\Delta
    "w0": 2.*e0*np.sqrt(delta**2 + 0.25),      #resonance frequency of bath 0
    "w1": 2.*e0*delta,                         #resonance frequency of bath 1
    "min_u": 0.,                               #minimum value of the control u
    "max_u": 0.75,                             #maximum value of the control u
    "dt": dt,                                  #time step \Delta t
    "p_coeff": 1.51*10**3,                     #the power is multiplied by this
    "entropy_coeff": 27,                       #the entropy production is multiplied by this
    "state_steps": 128                         #time-steps N defining the state
} 
training_hyperparams = {
    "BATCH_SIZE": 512,                         #batch size          
    "LR": 0.0003,                              #learning rate
    "H_START": 0.,                             #initial policy entropy
    "H_END": -3.5,                             #final policy entropy
    "H_DECAY": 440000,                         #exponential decay of policy entropy
    "A_START": 1.,                             #initial value of weight c
    "A_END": a_end,                            #final value of weight c
    "A_DECAY": 20000,                          #sigmoid decay of weight c
    "A_MEAN": 170000,                          #sigmoid mean of weight c
    "REPLAY_MEMORY_SIZE": 280000,              #size of replay buffer
    "POLYAK": 0.995,                           #polyak coefficient
    "LOG_STEPS": 1000,                         #save logs and display training every num. steps
    "GAMMA": 0.997,                            #RL discount factor
    "CHANNEL_SIZES":(64,64,64,128,128,128,128),#channels per conv. block
    "PI_FC_SIZES": (256,),                     #sizes of hidden layers for the policy
    "Q_FC_SIZES": (256,256),                   #sizes of hidden layers for the value function
    "SAVE_STATE_STEPS": 500000,                #save state of training every num. steps          
    "INITIAL_RANDOM_STEPS": 5000,              #number of initial uniformly random steps
    "UPDATE_AFTER": 1000,                      #start minimizing loss function after num. steps
    "UPDATE_EVERY": 50,                        #performs this many updates every this many steps
    "USE_CUDA": True,                          #use cuda for computation
    "ALPHA_RESET_VAL": 1e-06                   #reset value for temp. alpha if it becomes negative
}
log_info = {
    "log_running_reward": True,                #log running reward
    "log_running_loss": True,                  #log running loss
    "log_actions": True,                       #log chosen actions
    "log_running_multi_obj": True,             #log running multi objectives
    "extra_str": f"_aend={a_end}"              #string to append to training folder name
}

#Speeds up training, but disables profiling
extra.enable_faster_training()

#initialize training object
train = sac_multi.SacTrain()
train.initialize_new_train(sac_multi_envs.CoherentQubitFridgePowEntropy, env_params, training_hyperparams, log_info)

#### Train
Perform a given number of training steps. It can be run multiple times. While training, the following running averages are plotted:
- G: running average of the reward;
- Obj 0: the first objective, i.e. the cooling power;
- Obj 1: the second objective, i.e. the negative entropy production;
- Q Running Loss;
- Pi Running Loss;
- alpha: the temperature parameter of the SAC method;
- entropy: the average entropy of the policy;
- c weight.

Finally, the actions u taken in the last 400 steps are plotted.
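
To get a feeling for how the plotted c weight and policy entropy might evolve during training, the sketch below evaluates a sigmoid and an exponential schedule using the hyperparameters ```A_START```, ```A_END```, ```A_DECAY```, ```A_MEAN```, ```H_START```, ```H_END``` and ```H_DECAY```. Both functional forms are assumptions based only on the comments "sigmoid decay" and "exponential decay" in the hyperparameter dictionary; the expressions actually used in ```sac_multi``` may differ, and the function names are hypothetical.

```python
import math

def c_weight(step, a_start=1.0, a_end=0.6, a_decay=20000, a_mean=170000):
    """Assumed sigmoid schedule: close to a_start early, a_end late,
    halfway between the two at step == a_mean."""
    return a_end + (a_start - a_end) / (1.0 + math.exp((step - a_mean) / a_decay))

def policy_entropy_schedule(step, h_start=0.0, h_end=-3.5, h_decay=440000):
    """Assumed exponential schedule from h_start towards h_end."""
    return h_end + (h_start - h_end) * math.exp(-step / h_decay)

# Print both schedules at the start, at A_MEAN, and at the end of training.
for step in (0, 170000, 500000):
    print(step, round(c_weight(step), 4), round(policy_entropy_schedule(step), 4))
```

Under these assumed forms, the weight c stays near ```A_START``` for most of the first 170000 steps and settles at ```A_END``` well before the 500000 steps used below.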

In [None]:
train.train(500000)

## Save the State
The full state of the training session is saved every ```SAVE_STATE_STEPS``` steps. Run this command if you wish to manually save the current state.

In [None]:
train.save_full_state()

## Load Existing Training
Any saved training session can be loaded by specifying its folder in ```log_dir```. This creates a new logging folder named after the current date and time. The model can then be trained for additional steps.

In [None]:
log_dir = "../paper_plot_data/qubit_refrigerator/pareto/2022_01_23-22_05_04_aend=0.7/"
train = sac_multi.SacTrain()
train.load_train(log_dir)

#### Train
Perform a given number of training steps. It can be run multiple times.

In [None]:
train.train(1000)