# Two-Level System Heater
## (Only used for testing)
Optimize the output power dissipated into the environment (see [this](https://doi.org/10.1088/1367-2630/ab4dca) reference for details). The Hamiltonian of the system is:
\begin{equation}
	\hat{H}[u(t)] = \frac{E_0 u(t)}{2}\,\hat{\sigma}_z,
\end{equation}
where $E_0$ is a fixed energy scale, $u(t)$ is our single continuous control and $\hat{\sigma}_z$ denotes the Pauli $z$ matrix. The coupling to the single bath is described by the Lindblad master equation [see Eq. (9) of the manuscript]. The Lindblad operators and corresponding rates are given by
\begin{align}
	\hat{A}_{\pm,u(t)} &= \hat{\sigma}_\pm, & \gamma_{\pm,u(t)}  &= \Gamma\, f(\pm\beta u(t) E_0 ),
\end{align}
where $\hat{\sigma}_+$ and $\hat{\sigma}_-$ are respectively the raising and lowering operators, $\Gamma$ is a constant rate, $f(x) = (1+\exp(x))^{-1}$ is the Fermi distribution and $\beta$ is the inverse temperature of the bath.
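Because $f(x) + f(-x) = 1$, the rates satisfy $\gamma_+ + \gamma_- = \Gamma$ and $\gamma_+/\gamma_- = e^{-\beta u E_0}$ (detailed balance), so for a constant control the population dynamics can be solved in closed form. The sketch below is a minimal illustration, not the `sac_envs.TwoLevelHeater` implementation: the function names are hypothetical, it assumes the excited state is the $\sigma_z=+1$ eigenstate (true for $u>0$), tracks only the excited-state population $p$ (the Hamiltonian is diagonal, so coherences decouple), and identifies every energy change at fixed $u$ with heat exchanged with the bath. The square-wave protocol is chosen only for illustration.

```python
import math


def fermi(x):
    """Fermi distribution f(x) = 1 / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(x))


def rates(u, e0=1.0, g0=1.0, b0=1.0):
    """Rates gamma_+ (for sigma_+, excitation) and gamma_- (for sigma_-, decay)."""
    return g0 * fermi(b0 * u * e0), g0 * fermi(-b0 * u * e0)


def evolve_population(p, u, dt, e0=1.0, g0=1.0, b0=1.0):
    """Excited-state population after holding the control u fixed for a time dt.

    At fixed u the master equation gives dp/dt = gamma_+ (1 - p) - gamma_- p.
    Since gamma_+ + gamma_- = g0, p relaxes exponentially at rate g0
    towards the thermal value p_eq = f(b0 * u * e0).
    """
    p_eq = fermi(b0 * u * e0)
    return p_eq + (p - p_eq) * math.exp(-g0 * dt)


def heat_to_bath(p_before, p_after, u, e0=1.0):
    """Heat released into the bath while u is held fixed.

    Assumes the excited state has sigma_z = +1, so the system energy is
    e0 * u * (p - 1/2); at fixed u every energy change goes to the bath.
    """
    return e0 * u * (p_before - p_after)


# Illustrative (not optimized) protocol: alternate u between 1.0 and 0.3,
# holding each value for dt = 0.5, starting from p = 0.5.
p, total_heat, n_steps, dt = 0.5, 0.0, 400, 0.5
for k in range(n_steps):
    u = 1.0 if k % 2 == 0 else 0.3
    p_new = evolve_population(p, u, dt)
    total_heat += heat_to_bath(p, p_new, u)
    p = p_new
print("average power dissipated into the bath:", total_heat / (n_steps * dt))
```

Over each period of a cyclic protocol, the work done by switching $u$ equals the heat released into the single bath, so the average dissipated power cannot be negative; the training below searches for the protocol $u(t)$ that maximizes it.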
#### Import modules

```python
import sys
import os
# Make the sac and sac_envs modules in ../src importable
sys.path.append(os.path.join('..', 'src'))
import sac
import sac_envs
```

## Set Up a New Training
The following code initiates a new training session. All training logs, parameters and saved states will be stored under the ```data``` folder, in a subfolder named after the current date and time.
- ```env_params``` is a dictionary with the environment parameters.
- ```training_hyperparams``` is a dictionary with the training hyperparameters.
- ```log_info``` is a dictionary that specifies which quantities to log.

```python
env_params = {
    "g0": 1.,                         # \Gamma of the bath
    "b0": 1.,                         # inverse temperature \beta of the bath
    "min_u": 0.3,                     # minimum value of the action u
    "max_u": 1.,                      # maximum value of the action u
    "e0": 1,                          # E_0
    "dt": 0.5,                        # timestep \Delta t
    "reward_extra_coeff": 1.          # the reward is multiplied by this factor
}
training_hyperparams = {
    "BATCH_SIZE": 256,                # batch size
    "LR": 0.001,                      # learning rate
    "ALPHA_START": 1.,                # initial value of the SAC temperature
    "ALPHA_END": 0.,                  # final value of the SAC temperature
    "ALPHA_DECAY": 10000,             # exponential decay of the SAC temperature
    "REPLAY_MEMORY_SIZE": 192000,     # size of the replay buffer
    "POLYAK": 0.995,                  # Polyak averaging coefficient
    "LOG_STEPS": 1500,                # save logs and display training progress every this many steps
    "GAMMA": 0.995,                   # RL discount factor
    "HIDDEN_SIZES": (256, 256),       # sizes of the hidden layers
    "SAVE_STATE_STEPS": 80000,        # save the complete training state every this many steps
    "INITIAL_RANDOM_STEPS": 5000,     # number of initial steps with uniformly random actions
    "UPDATE_AFTER": 1000,             # start minimizing the loss function after this many steps
    "UPDATE_EVERY": 50,               # perform this many updates every this many steps
    "USE_CUDA": False                 # use CUDA for computation
}
log_info = {
    "log_running_reward": True,       # log the running reward
    "log_running_loss": True,         # log the running loss
    "log_actions": True,              # log the chosen actions
    "extra_str": "_two_level_heater"  # extra string appended to the training folder name
}

train = sac.SacTrain()
train.initialize_new_train(sac_envs.TwoLevelHeater, env_params, training_hyperparams, log_info)
```
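The comments on `ALPHA_START`, `ALPHA_END`, `ALPHA_DECAY` and `POLYAK` refer to two standard SAC ingredients: a temperature that weights the entropy bonus and a soft (Polyak) update of the target networks. The sketch below shows one plausible form of each. The exact temperature schedule used by `sac.SacTrain` is an assumption here, and `sac_temperature` and `polyak_update` are hypothetical names, not part of the `sac` module.

```python
import math


def sac_temperature(step, alpha_start=1.0, alpha_end=0.0, alpha_decay=10000):
    """Assumed schedule: decays exponentially from alpha_start to alpha_end.

    alpha_decay is treated as the number of steps over which the gap
    (alpha_start - alpha_end) shrinks by a factor e.
    """
    return alpha_end + (alpha_start - alpha_end) * math.exp(-step / alpha_decay)


def polyak_update(target_params, online_params, polyak=0.995):
    """Soft target update: target <- polyak * target + (1 - polyak) * online."""
    return [polyak * t + (1.0 - polyak) * o
            for t, o in zip(target_params, online_params)]


# Under this assumed schedule, the temperature after 50000 steps is
# exp(-5), i.e. about 0.0067.
print(sac_temperature(50000))
print(polyak_update([1.0, 2.0], [0.0, 0.0]))
```

A Polyak coefficient close to 1, such as 0.995, makes the target networks follow the online networks slowly, which keeps the Q-function targets stable.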

### Train 
Perform a given number of training steps. It can be run multiple times.

```python
train.train(500000)
```
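`INITIAL_RANDOM_STEPS`, `UPDATE_AFTER` and `UPDATE_EVERY` together determine how many gradient updates a call such as `train.train(500000)` performs. The sketch below assumes a common SAC convention: a step counter starting at zero, uniformly random actions for the first `INITIAL_RANDOM_STEPS` steps, and, once `UPDATE_AFTER` steps have elapsed, a burst of `UPDATE_EVERY` updates every `UPDATE_EVERY` steps. `training_schedule` is a hypothetical helper, and `sac.SacTrain` may count steps slightly differently.

```python
def training_schedule(total_steps, start_step=0, initial_random_steps=5000,
                      update_after=1000, update_every=50):
    """Count random-action steps and gradient updates in one call to train().

    start_step lets repeated calls continue from where the previous one stopped.
    """
    end_step = start_step + total_steps
    random_steps = max(0, min(end_step, initial_random_steps) - start_step)
    updates = 0
    for step in range(start_step, end_step):
        # Every update_every steps (after the warm-up), run update_every updates
        if step >= update_after and step % update_every == 0:
            updates += update_every
    return random_steps, updates


print(training_schedule(500000))                      # -> (5000, 499000)
print(training_schedule(50000, start_step=500000))    # -> (0, 50000)
```

Under these assumptions, training performs roughly one gradient update per environment step once the warm-up is over.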

## Save the State
The full state of the training session is saved every ```SAVE_STATE_STEPS``` steps. Run this command if you wish to manually save the current state.

```python
train.save_full_state()
```

## Load Existing Training
Any training session that was saved can be loaded by specifying the training session folder in ```log_dir```. This creates a new logging folder named after the current date and time. The following loads the latest save in a folder named ```"2021_04_14-11_03_52_two_level_heater"```.

```python
log_dir = "../data/2021_04_14-11_03_52_two_level_heater"
train = sac.SacTrain()
train.load_train(log_dir)
```
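Instead of typing the folder name by hand, the most recent session can be picked automatically. The sketch below is a hypothetical helper, not part of `sac`; it assumes folder names begin with a zero-padded `YYYY_MM_DD-HH_MM_SS` timestamp, as in the example above, so that lexicographic order matches chronological order.

```python
import os


def latest_log_dir(data_dir="../data", suffix="_two_level_heater"):
    """Return the path of the newest training folder in data_dir ending with suffix.

    Relies on folder names starting with a zero-padded timestamp, so the
    lexicographically largest name is the most recent one.
    """
    candidates = [
        name for name in os.listdir(data_dir)
        if name.endswith(suffix) and os.path.isdir(os.path.join(data_dir, name))
    ]
    if not candidates:
        raise FileNotFoundError(f"no folder ending with {suffix!r} in {data_dir!r}")
    return os.path.join(data_dir, max(candidates))
```

It would be used as `log_dir = latest_log_dir()` before `train.load_train(log_dir)`. Since loading itself creates a new date-time folder, the newest folder may be one produced by an earlier load rather than the original session, so check the returned path before loading.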

### Train 
Perform a given number of training steps. It can be run multiple times.

```python
train.train(50000)
```

## Manually Changing the Learning Rate
The following is NOT RECOMMENDED, since the change is not logged. However, it is possible to change the learning rate during training. The following sets the learning rate of both the policy and the Q-function optimizers to 0.0001.

```python
# Policy optimizer
for g in train.pi_optimizer.param_groups:
    g['lr'] = 0.0001
# Q-function optimizer
for g in train.q_optimizer.param_groups:
    g['lr'] = 0.0001
```
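Since such a change is not logged, it can help to wrap it in a small function that updates several optimizers at once and keeps a manual record. The sketch below is a hypothetical helper: it only relies on the PyTorch convention that an optimizer exposes `param_groups` as a list of dicts with an `'lr'` key, and `note_file` is any path you choose, not a file managed by `sac`.

```python
import datetime


def set_learning_rate(optimizers, lr, note_file=None):
    """Set 'lr' in every parameter group of the given optimizers.

    If note_file is given, append a timestamped line recording the change,
    since sac.SacTrain does not log it.
    """
    for opt in optimizers:
        for group in opt.param_groups:
            group["lr"] = lr
    if note_file is not None:
        with open(note_file, "a") as f:
            f.write(f"{datetime.datetime.now().isoformat()} lr set to {lr}\n")
```

For example, `set_learning_rate([train.pi_optimizer, train.q_optimizer], 0.0001, note_file="lr_changes.txt")` is equivalent to the two loops above, plus a note on disk.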