# Quantum Harmonic Oscillator Heat Engine: Train
Optimize the output power of a heat engine based on a quantum harmonic oscillator (see Results section of the manuscript or [this](https://doi.org/10.1088/1367-2630/8/5/083) reference). The Hamiltonian of the system is:
\begin{equation}
	\hat{H}[u(t)] = \frac{1}{2m} \hat{p}^2 + \frac{1}{2}m (u(t)\omega_0)^2 \hat{q}^2,
\end{equation}
where $m$ is the mass of the system, $\omega_0$ is a fixed frequency, and $\hat{p}$ and $\hat{q}$ are the momentum and position operators. The single continuous control parameter is $u(t)$.
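As a quick numerical check of this Hamiltonian, the sketch below builds its matrix in a truncated Fock basis and compares the lowest eigenvalues with the expected spectrum $u\,\omega_0(n+1/2)$. It is only an illustration and is not part of the training code: the function name `oscillator_hamiltonian`, the choice $m=1$, the units $\hbar=1$ and the truncation at 60 levels are assumptions made here.

```python
import numpy as np

def oscillator_hamiltonian(u, w0=2.0, m=1.0, n_levels=60):
    """Matrix of H[u] in the truncated Fock basis of the u = 1 oscillator (hbar = 1).

    m = 1 and n_levels = 60 are illustrative assumptions, not values from the notebook.
    """
    # Lowering operator of the reference oscillator: <n-1| a |n> = sqrt(n)
    a = np.diag(np.sqrt(np.arange(1, n_levels)), k=1)
    q = (a + a.T) / np.sqrt(2.0 * m * w0)        # position operator
    p = 1j * np.sqrt(m * w0 / 2.0) * (a.T - a)   # momentum operator
    return p @ p / (2.0 * m) + 0.5 * m * (u * w0) ** 2 * (q @ q)

u = 0.75
w0 = 2.0
# Lowest five eigenvalues of the truncated matrix
levels = np.linalg.eigvalsh(oscillator_hamiltonian(u, w0=w0))[:5]
# Expected harmonic spectrum with frequency u * w0
expected = u * w0 * (np.arange(5) + 0.5)
print(np.allclose(levels, expected, atol=1e-6))
```

Only the low-lying levels are compared, because the highest levels of a truncated matrix are distorted by the cutoff.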
The coupling to the baths is described using the Lindblad master equation [see Eq. (52) of the manuscript]. The Lindblad operators and corresponding rates are given by
\begin{align}
	\hat{A}^{(\alpha)}_{+,u(t)} &= \hat{a}_{u(t)}^\dagger, & \gamma^{(\alpha)}_{+,u(t)} &= \Gamma_\alpha \,n(\beta_\alpha u(t)\omega_0), \\
    \hat{A}^{(\alpha)}_{-,u(t)} &= \hat{a}_{u(t)}, & \gamma^{(\alpha)}_{-,u(t)} &= \Gamma_\alpha[1+ n(\beta_\alpha u(t) \omega_0 )],
\end{align}
where $\hat{a}_{u(t)}=\frac{1}{\sqrt{2}}\left(\sqrt{m\omega_0 u(t)}\,\hat{q} + \frac{i}{\sqrt{m\omega_0 u(t)}}\,\hat{p}\right)$ and $\hat{a}_{u(t)}^\dagger$ are, respectively, the (control-dependent) lowering and raising operators, $\Gamma_\alpha$ are constant rates, $n(x)=(\exp(x)-1)^{-1}$ is the Bose-Einstein distribution, and $\beta_\alpha$ is the inverse temperature of bath $\alpha$.
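To make the rates concrete, the following sketch evaluates $\gamma^{(\alpha)}_{\pm,u}$ for both baths and checks the detailed-balance relation $\gamma_+/\gamma_- = \exp(-\beta_\alpha u\,\omega_0)$ that follows from the definitions above. The helper names `bose_einstein` and `bath_rates` are hypothetical and are not part of `sac_tri_envs`. The numerical values are those of the `env_params` dictionary used in this notebook, and the control value `u = 0.75` is an arbitrary choice within the allowed range.

```python
import numpy as np

def bose_einstein(x):
    """Bose-Einstein distribution n(x) = 1 / (exp(x) - 1)."""
    return 1.0 / np.expm1(x)

def bath_rates(u, gamma, beta, w0):
    """Return (gamma_plus, gamma_minus) for one bath at control value u."""
    n = bose_einstein(beta * u * w0)
    return gamma * n, gamma * (1.0 + n)

# Values taken from the env_params dictionary of this notebook
w0 = 2.0
baths = {"bath 0": (0.6, 1.0 / 4.98309), "bath 1": (0.6, 2.0)}  # (Gamma, beta)
u = 0.75  # arbitrary control value between min_u and max_u

for name, (gamma, beta) in baths.items():
    g_plus, g_minus = bath_rates(u, gamma, beta, w0)
    # Detailed balance: gamma_plus / gamma_minus = exp(-beta * u * w0)
    balanced = np.isclose(g_plus / g_minus, np.exp(-beta * u * w0))
    print(name, g_plus, g_minus, balanced)
```

Bath 0 has the smaller inverse temperature, so it is the hot bath and gives the larger excitation rate $\gamma_+$ at any fixed $u$.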
#### Import modules

In [None]:
import sys
import os
sys.path.append(os.path.join('..','src'))
import numpy as np
import sac_tri
import sac_tri_envs
import extra

## Set Up a New Training
The following code initiates a new training session. All training logs, parameters and saved states will be stored under the ```data``` folder, within a folder named with the current date and time. 
- ```env_params``` is a dictionary with the environment parameters.
- ```training_hyperparams``` is a dictionary with training hyperparameters.
- ```log_info``` is a dictionary that specifies which quantities to log.

The parameters below were used to produce Fig. 5 of the manuscript. The parameter ```a``` determines the value of the weight c. To reproduce the points of Fig. 5, one training must be performed for each value of ```a``` in the range $[0.5,1]$.
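One way to organise such a sweep is sketched below. The grid of six equally spaced values is hypothetical, since the number and spacing of the points in Fig. 5 are not stated here, and `make_log_info` is a helper introduced only for this illustration. Each run gets its own ```extra_str``` so that the resulting folders can be told apart.

```python
import numpy as np

# Hypothetical grid of weights: the actual values used for Fig. 5 are not given here
a_values = [round(float(a), 3) for a in np.linspace(0.5, 1.0, 6)]

def make_log_info(a):
    """Logging options for one run, tagged with the weight a in the folder name."""
    return {
        "log_running_reward": True,
        "log_running_loss": True,
        "log_actions": True,
        "log_running_multi_obj": True,
        "extra_str": f"_a={a}",
    }

# One (weight, log_info) pair per training session
runs = [(a, make_log_info(a)) for a in a_values]
for a, info in runs:
    print(a, info["extra_str"])
```

For each pair, the value `a` would be assigned to the `"a"` entry of ```env_params``` before starting the corresponding training session.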

In [None]:
#value of the weight c (here denoted with a) 
a = 1.

env_params = { 
    "g0": 0.6,                                  #\Gamma of bath 0
    "g1": 0.6,                                  #\Gamma of bath 1
    "b0": 1./4.98309,                           #inverse temperature \beta of bath 0
    "b1": 2.,                                   #inverse temperature \beta of bath 1
    "w0": 2.,                                   #\omega_0
    "min_u": 0.5,                               #minimum value of action u
    "max_u": 0.99662,                           #maximum value of action u
    "dt": 0.2,                                  #timestep \Delta t
    "p_coeff": 1/0.175,                         #the power is multiplied by this
    "entropy_coeff": 1/0.525,                   #the entropy production is multiplied by this
    "state_steps": 128,                         #time-steps N defining the state
    "min_temp_steps": 25,                       #minimum num. of steps with each bath before penalty
    "discourage_coeff": 1.4,                    #coefficient determining the penalty
    "a": a                                      #value of weight c
}
training_hyperparams = {
    "BATCH_SIZE": 512,                          #batch size
    "LR": 0.0003,                               #learning rate
    "H_D_START": np.log(3.),                    #initial discrete policy entropy
    "H_D_END": 0.01,                            #final discrete policy entropy
    "H_D_DECAY": 144000,                        #exponential decay of discrete policy entropy
    "H_C_START": -0.72,                         #initial continuous policy entropy
    "H_C_END": -3.5,                            #final continuous policy entropy
    "H_C_DECAY": 144000,                        #exponential decay of continuous policy entropy
    "REPLAY_MEMORY_SIZE": 160000,               #size of replay buffer
    "POLYAK": 0.995,                            #polyak coefficient
    "LOG_STEPS": 1000,                          #save logs and display training every num. steps
    "GAMMA": 0.999,                             #RL discount factor
    "CHANNEL_SIZES": (64,64,64,128,128,128,128),#channels per conv. block
    "PI_FC_SIZES": (256,),                      #size of hidden layers for the policy
    "Q_FC_SIZES": (256,128),                    #size of hidden layers for the value function
    "SAVE_STATE_STEPS": 500000,                 #save state of training every num. steps
    "INITIAL_RANDOM_STEPS": 5000,               #number of initial uniformly random steps
    "UPDATE_AFTER": 1000,                       #start minimizing loss function after num. steps
    "UPDATE_EVERY": 50,                         #performs this many updates every this many steps
    "USE_CUDA": True,                           #use cuda for computation
    "ALPHA_RESET_VAL": 1e-6                     #reset value for temp. alpha if it becomes negative
}
log_info = {
    "log_running_reward": True,                 #log running reward
    "log_running_loss": True,                   #log running loss
    "log_actions": True,                        #log chosen actions
    "log_running_multi_obj": True,              #log running multi objectives
    "extra_str": f"_a={a}"                      #string to append to training folder name
}

#Speeds up training, but disables profiling
extra.enable_faster_training()

#initialize training object
train = sac_tri.SacTrain()
train.initialize_new_train(sac_tri_envs.HarmonicEnginePowEntropy, env_params, training_hyperparams, log_info)

#### Train
Perform a given number of training steps. It can be run multiple times.

In [None]:
train.train(500000)

## Save the State
The full state of the training session is saved every ```SAVE_STATE_STEPS``` steps. Run this command if you wish to manually save the current state.

In [None]:
train.save_full_state()

## Load Existing Training
Any saved training session can be loaded by specifying its folder in ```log_dir```. Loading creates a new logging folder named with the current date and time, after which the model can be trained for additional steps.
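When several sessions are stored side by side, it can be convenient to pick the most recent one automatically. The sketch below assumes that session folder names begin with a `YYYY_MM_DD-HH_MM_SS` timestamp, as in the example path used in this notebook. The helpers `session_timestamp` and `latest_session` are hypothetical and are not part of `sac_tri`, and the demonstration runs on a temporary directory with made-up folder names.

```python
import os
import tempfile
from datetime import datetime

def session_timestamp(folder_name):
    """Parse the leading 'YYYY_MM_DD-HH_MM_SS' stamp of a folder name, or return None."""
    try:
        return datetime.strptime(folder_name[:19], "%Y_%m_%d-%H_%M_%S")
    except ValueError:
        return None

def latest_session(parent_dir):
    """Return the path of the most recent session folder in parent_dir, or None."""
    stamped = []
    for name in os.listdir(parent_dir):
        stamp = session_timestamp(name)
        # Keep only directories whose name starts with a valid timestamp
        if stamp is not None and os.path.isdir(os.path.join(parent_dir, name)):
            stamped.append((stamp, name))
    if not stamped:
        return None
    return os.path.join(parent_dir, max(stamped)[1])

# Demonstration on a temporary directory with made-up folder names
with tempfile.TemporaryDirectory() as parent:
    for name in ("2022_01_15-03_44_43_a=0.6", "2022_01_16-10_00_00_a=0.7", "notes"):
        os.mkdir(os.path.join(parent, name))
    print(os.path.basename(latest_session(parent)))
```

The returned path could then be passed as ```log_dir```.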

In [None]:
log_dir = "../paper_plot_data/harmonic_engine/pareto/2022_01_15-03_44_43_a=0.6"
train = sac_tri.SacTrain()
train.load_train(log_dir)

#### Train
Perform a given number of training steps. It can be run multiple times.

In [None]:
train.train(1000)