## CX Calibration with HPO

#### Imports

In [1]:
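import sys
import os
# Work around duplicate OpenMP runtime errors (common when several packages bundle their own copy)
os.environ["KMP_DUPLICATE_LIB_OK"] = "True"
# Make the local Quantum_Optimal_Control checkout importable
module_path = os.path.abspath('/Users/lukasvoss/Documents/Master Wirtschaftsphysik/Masterarbeit Yale-NUS CQT/Quantum_Optimal_Control')
if module_path not in sys.path:
    sys.path.append(module_path)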

from template_configurations import gate_q_env_config
import logging
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(levelname)s %(message)s",  # report each record's actual level
    datefmt="%Y-%m-%d %H:%M:%S",
    stream=sys.stdout,
)



Starting Rabi experiment for qubit 0...
Rabi experiment for qubit 0 done.
Starting Drag experiment for qubit 0...
Drag experiments done for qubit 0 done.
Starting Rabi experiment for qubit 1...
Rabi experiment for qubit 1 done.
Starting Drag experiment for qubit 1...
Drag experiments done for qubit 1 done.
All single qubit calibrations are done
Updated Instruction Schedule Map <InstructionScheduleMap(1Q instructions:
  q0: {'reset', 'delay', 'h', 'x', 'sx', 'sdg', 'z', 's', 'id', 'tdg', 'measure', 't', 'rz'}
  q1: {'reset', 'delay', 'h', 'x', 'sx', 'sdg', 'z', 's', 'id', 'tdg', 'measure', 't', 'rz'}
Multi qubit instructions:
  (0, 1): {'ecr', 'cr45m', 'cr45p'}
  (1, 0): {'ecr', 'cr45m', 'cr45p'}
)>


Which gate is to be calibrated?

In [2]:
gate_q_env_config.target

{'register': [0, 1],
 'gate': Instruction(name='cx', num_qubits=2, num_clbits=0, params=[])}

### Perform HPO

In [3]:
from hyperparameter_optimization import HyperparameterOptimizer

Set the path to the file that specifies the RL agent configuration and the directory in which to store the HPO results.

In [4]:
path_agent_config = '/Users/lukasvoss/Documents/Master Wirtschaftsphysik/Masterarbeit Yale-NUS CQT/Quantum_Optimal_Control/template_configurations/agent_config.yaml'
save_results_path = 'hpo_results'

In [5]:
from quantumenvironment import QuantumEnvironment

In [6]:
q_env = QuantumEnvironment(gate_q_env_config)

SparsePauliOp(['II', 'XX', 'YZ', 'ZY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
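The `SparsePauliOp` printed above has four two-qubit Pauli terms with coefficients of 0.25. As a sanity check that is independent of the repository code, the sketch below builds that operator in plain Python and tests whether it is a trace-one projector, which is what one would expect if it represents a pure two-qubit stabilizer state (presumably one of the states used for direct fidelity estimation, though the output shown here does not confirm that). The helper names `kron`, `matmul`, and `pauli_sum` are illustrative, and the code assumes Qiskit's convention that the label `'AB'` corresponds to the matrix A ⊗ B.

```python
# Single-qubit Pauli matrices as nested lists of complex numbers.
PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pauli_sum(labels, coeffs):
    """Return sum_k coeffs[k] * P_k for two-character Pauli labels."""
    total = [[0j] * 4 for _ in range(4)]
    for label, c in zip(labels, coeffs):
        term = kron(PAULIS[label[0]], PAULIS[label[1]])
        for i in range(4):
            for j in range(4):
                total[i][j] += c * term[i][j]
    return total


# Labels and coefficients copied from the output above.
op = pauli_sum(["II", "XX", "YZ", "ZY"], [0.25, 0.25, 0.25, 0.25])
op_squared = matmul(op, op)

trace = sum(op[i][i] for i in range(4))
is_projector = all(
    abs(op_squared[i][j] - op[i][j]) < 1e-12
    for i in range(4) for j in range(4)
)
print(trace.real, is_projector)
```

A trace of one together with `op @ op == op` indicates a rank-one projector, i.e. a pure state. The same check can be applied to the other operators that appear in the training log by swapping in their labels and signs.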


In [7]:
optimizer = HyperparameterOptimizer(q_env=q_env,
                                    path_agent_config=path_agent_config, 
                                    save_results_path=save_results_path, 
                                    log_progress=True,
                                    num_hpo_trials=2)
optimizer.optimize_hyperparameters()

 51%|█████     | 50/98 [00:13<00:13,  3.67it/s]

Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j,  0.25+0.j])


 52%|█████▏    | 51/98 [00:13<00:12,  3.73it/s]

mean tensor([-0.1342,  0.9942,  0.3925,  0.2300, -0.2565,  0.2014, -0.9995])
Average return: 0.4786426082804319
DFE Rewards Mean: 0.4786426082804319
DFE Rewards standard dev 0.08479575070399677
Returns Mean: 0.6650793
Returns standard dev 0.16827853
Advantages Mean: -0.15827897
Advantages standard dev 0.16827853
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])


 53%|█████▎    | 52/98 [00:14<00:12,  3.67it/s]

mean tensor([-0.1208,  0.9927,  0.3734,  0.2226, -0.2601,  0.1794, -0.9991])
Average return: 0.9449380488489839
DFE Rewards Mean: 0.9449380488489839
DFE Rewards standard dev 0.0543304648775156
Returns Mean: 4.0688076
Returns standard dev 3.0341127
Advantages Mean: 3.438811
Advantages standard dev 3.0341127
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 54%|█████▍    | 53/98 [00:14<00:11,  3.81it/s]

mean tensor([-0.0488,  0.9669,  0.3053,  0.1135, -0.1579,  0.1137, -0.9925])
Average return: 0.7872783474473322
DFE Rewards Mean: 0.7872783474473322
DFE Rewards standard dev 0.06926365020719413
Returns Mean: 1.6110145
Returns standard dev 0.3804485
Advantages Mean: 0.68894
Advantages standard dev 0.3804485
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])


 55%|█████▌    | 54/98 [00:14<00:11,  3.94it/s]

mean tensor([-0.1442,  0.9967,  0.4055,  0.2592, -0.3275,  0.2354, -0.9913])
Average return: 0.47820173164507235
DFE Rewards Mean: 0.47820173164507235
DFE Rewards standard dev 0.0818580192266128
Returns Mean: 0.6632605
Returns standard dev 0.16207957
Advantages Mean: -0.9847506
Advantages standard dev 0.16207957
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YZ', 'ZY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 56%|█████▌    | 55/98 [00:14<00:10,  4.02it/s]

mean tensor([-0.1641,  0.9984,  0.4355,  0.2941, -0.3727,  0.2548, -0.9979])
Average return: 0.49244277322938856
DFE Rewards Mean: 0.49244277322938856
DFE Rewards standard dev 0.06803186367630033
Returns Mean: 0.6872922
Returns standard dev 0.13621606
Advantages Mean: -0.57427865
Advantages standard dev 0.13621606
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YZ', 'ZY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 57%|█████▋    | 56/98 [00:14<00:10,  4.07it/s]

mean tensor([-0.1673,  0.9986,  0.4387,  0.3210, -0.3844,  0.2961, -0.9986])
Average return: 0.5032864119421556
DFE Rewards Mean: 0.5032864119421556
DFE Rewards standard dev 0.07896341706722584
Returns Mean: 0.7128642
Returns standard dev 0.16425022
Advantages Mean: -0.20923802
Advantages standard dev 0.16425022
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 58%|█████▊    | 57/98 [00:15<00:09,  4.17it/s]

mean tensor([-0.0364,  0.9623,  0.3144,  0.1725, -0.2280,  0.1554, -0.9593])
Average return: 0.8031600021321068
DFE Rewards Mean: 0.8031600021321068
DFE Rewards standard dev 0.07217794433250797
Returns Mean: 1.7130952
Returns standard dev 0.47108316
Advantages Mean: 1.0877502
Advantages standard dev 0.47108316
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 59%|█████▉    | 58/98 [00:15<00:12,  3.18it/s]

mean tensor([-0.2045,  0.9991,  0.4444,  0.3625, -0.4320,  0.3441, -0.9955])
Average return: 0.48518438147746207
DFE Rewards Mean: 0.48518438147746207
DFE Rewards standard dev 0.08579695774368906
Returns Mean: 0.67806065
Returns standard dev 0.16939554
Advantages Mean: -0.6944443
Advantages standard dev 0.16939554
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])


 60%|██████    | 59/98 [00:15<00:11,  3.34it/s]

mean tensor([-0.1265,  0.9941,  0.3928,  0.2916, -0.3583,  0.2486, -0.9814])
Average return: 0.9466896616184832
DFE Rewards Mean: 0.9466896616184832
DFE Rewards standard dev 0.04719704381588666
Returns Mean: 3.839087
Returns standard dev 2.6507182
Advantages Mean: 2.8813884
Advantages standard dev 2.6507182
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 61%|██████    | 60/98 [00:16<00:10,  3.46it/s]

mean tensor([-0.2105,  0.9994,  0.4516,  0.3296, -0.4298,  0.3685, -0.9984])
Average return: 0.48062795906840544
DFE Rewards Mean: 0.48062795906840544
DFE Rewards standard dev 0.0787841885575848
Returns Mean: 0.66683984
Returns standard dev 0.15418918
Advantages Mean: -1.0650136
Advantages standard dev 0.15418918
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 62%|██████▏   | 61/98 [00:16<00:10,  3.52it/s]

mean tensor([-0.1964,  0.9993,  0.4565,  0.3385, -0.4324,  0.3600, -0.9983])
Average return: 0.4986232091074373
DFE Rewards Mean: 0.4986232091074373
DFE Rewards standard dev 0.0780115629326254
Returns Mean: 0.70297885
Returns standard dev 0.16068035
Advantages Mean: -0.5899296
Advantages standard dev 0.16068035
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


 63%|██████▎   | 62/98 [00:16<00:10,  3.56it/s]

mean tensor([-0.1353,  0.9970,  0.4006,  0.2865, -0.3820,  0.2859, -0.9935])
Average return: 0.24632246708176922
DFE Rewards Mean: 0.24632246708176922
DFE Rewards standard dev 0.07377379572912868
Returns Mean: 0.28766403
Returns standard dev 0.09928083
Advantages Mean: -0.6862726
Advantages standard dev 0.09928083
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 64%|██████▍   | 63/98 [00:17<00:09,  3.55it/s]

mean tensor([-0.1923,  0.9995,  0.4408,  0.3622, -0.4598,  0.3797, -0.9995])
Average return: 0.5080345737636741
DFE Rewards Mean: 0.5080345737636741
DFE Rewards standard dev 0.07585715227217556
Returns Mean: 0.7216779
Returns standard dev 0.15898131
Advantages Mean: 0.26806805
Advantages standard dev 0.15898131
Fidelity History: []
SparsePauliOp(['II', 'IY', 'XI', 'XY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 65%|██████▌   | 64/98 [00:17<00:09,  3.60it/s]

mean tensor([-0.2039,  0.9997,  0.4455,  0.4053, -0.4941,  0.3978, -0.9997])
Average return: 0.9072340125045534
DFE Rewards Mean: 0.9072340125045534
DFE Rewards standard dev 0.05595277431347156
Returns Mean: 2.801796
Returns standard dev 1.801049
Advantages Mean: 2.128969
Advantages standard dev 1.8010489
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])


 66%|██████▋   | 65/98 [00:17<00:09,  3.63it/s]

mean tensor([-0.1126,  0.9968,  0.3561,  0.2876, -0.3933,  0.2922, -0.9956])
Average return: 0.9528352818637236
DFE Rewards Mean: 0.9528352818637236
DFE Rewards standard dev 0.04083662110022483
Returns Mean: 4.149062
Returns standard dev 3.0036829
Advantages Mean: 2.9444387
Advantages standard dev 3.0036829
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


 67%|██████▋   | 66/98 [00:17<00:08,  3.67it/s]

mean tensor([-0.0746,  0.9946,  0.3346,  0.2304, -0.3467,  0.2743, -0.9918])
Average return: 0.2463781441483424
DFE Rewards Mean: 0.2463781441483424
DFE Rewards standard dev 0.07333093266102773
Returns Mean: 0.28775653
Returns standard dev 0.099883504
Advantages Mean: -1.4221066
Advantages standard dev 0.099883504
Fidelity History: []
SparsePauliOp(['II', 'IY', 'XI', 'XY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 68%|██████▊   | 67/98 [00:18<00:08,  3.61it/s]

mean tensor([-0.1877,  0.9999,  0.4207,  0.3803, -0.5009,  0.4098, -1.0000])
Average return: 0.9170014365811089
DFE Rewards Mean: 0.9170014365811089
DFE Rewards standard dev 0.050439801640831256
Returns Mean: 2.8496988
Returns standard dev 1.4982318
Advantages Mean: 1.4680829
Advantages standard dev 1.4982319
Fidelity History: []
SparsePauliOp(['II', 'IX', 'XI', 'XX'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 69%|██████▉   | 68/98 [00:18<00:08,  3.64it/s]

mean tensor([-0.1567,  0.9997,  0.3568,  0.3151, -0.4625,  0.3776, -0.9998])
Average return: 0.9235252380037021
DFE Rewards Mean: 0.9235252380037021
DFE Rewards standard dev 0.056692653425126006
Returns Mean: 3.2446706
Returns standard dev 2.2909782
Advantages Mean: 1.5807984
Advantages standard dev 2.2909782
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YZ', 'ZY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 70%|███████   | 69/98 [00:18<00:08,  3.53it/s]

mean tensor([-0.1717,  0.9998,  0.3403,  0.3184, -0.4769,  0.4037, -0.9999])
Average return: 0.5103096374247232
DFE Rewards Mean: 0.5103096374247232
DFE Rewards standard dev 0.08146555052212019
Returns Mean: 0.7282128
Returns standard dev 0.17062844
Advantages Mean: -1.2365177
Advantages standard dev 0.17062844
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


 71%|███████▏  | 70/98 [00:19<00:08,  3.50it/s]

mean tensor([-0.1181,  0.9991,  0.3222,  0.2785, -0.4243,  0.3575, -0.9994])
Average return: 0.258842527732447
DFE Rewards Mean: 0.258842527732447
DFE Rewards standard dev 0.06373321917090062
Returns Mean: 0.3033023
Returns standard dev 0.087155245
Advantages Mean: -1.0455853
Advantages standard dev 0.087155245
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


 72%|███████▏  | 71/98 [00:19<00:07,  3.47it/s]

mean tensor([-0.1229,  0.9992,  0.3220,  0.2893, -0.4317,  0.3726, -0.9996])
Average return: 0.2653530624942752
DFE Rewards Mean: 0.2653530624942752
DFE Rewards standard dev 0.06977599250499988
Returns Mean: 0.31297338
Returns standard dev 0.096658856
Advantages Mean: -0.59283626
Advantages standard dev 0.096658856
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])


 73%|███████▎  | 72/98 [00:19<00:07,  3.56it/s]

mean tensor([-0.1407,  0.9997,  0.3313,  0.3265, -0.4743,  0.4288, -0.9999])
Average return: 0.4984612021268958
DFE Rewards Mean: 0.4984612021268958
DFE Rewards standard dev 0.08593313037178828
Returns Mean: 0.7050667
Returns standard dev 0.17465612
Advantages Mean: 0.18718338
Advantages standard dev 0.17465612
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])


 74%|███████▍  | 73/98 [00:20<00:09,  2.69it/s]

mean tensor([-0.0994,  0.9986,  0.3201,  0.2780, -0.4299,  0.3542, -0.9992])
Average return: 0.2735003514676217
DFE Rewards Mean: 0.2735003514676217
DFE Rewards standard dev 0.07184053201874151
Returns Mean: 0.32458216
Returns standard dev 0.10174065
Advantages Mean: -0.36736205
Advantages standard dev 0.10174065
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j,  0.25+0.j])


 76%|███████▌  | 74/98 [00:20<00:08,  2.92it/s]

mean tensor([-0.1380,  0.9997,  0.3489,  0.3545, -0.4995,  0.4060, -0.9999])
Average return: 0.531916452827129
DFE Rewards Mean: 0.531916452827129
DFE Rewards standard dev 0.07642323486364788
Returns Mean: 0.7729333
Returns standard dev 0.16838269
Advantages Mean: 0.34649977
Advantages standard dev 0.16838269
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 77%|███████▋  | 75/98 [00:20<00:07,  3.09it/s]

mean tensor([-0.0704,  0.9976,  0.3128,  0.2858, -0.4249,  0.3052, -0.9986])
Average return: 0.7997269709014321
DFE Rewards Mean: 0.7997269709014321
DFE Rewards standard dev 0.06948557925737939
Returns Mean: 1.6762006
Returns standard dev 0.38990483
Advantages Mean: 1.0251297
Advantages standard dev 0.38990483
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 78%|███████▊  | 76/98 [00:20<00:06,  3.24it/s]

mean tensor([-0.0233,  0.9890,  0.2820,  0.2531, -0.3504,  0.2783, -0.9884])
Average return: 0.8034799239108192
DFE Rewards Mean: 0.8034799239108192
DFE Rewards standard dev 0.06481233589015072
Returns Mean: 1.6966344
Returns standard dev 0.41656232
Advantages Mean: 0.6211292
Advantages standard dev 0.41656232
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 79%|███████▊  | 77/98 [00:21<00:06,  3.35it/s]

mean tensor([-0.0433,  0.9918,  0.3037,  0.2991, -0.4136,  0.3083, -0.9671])
Average return: 0.8221140555966712
DFE Rewards Mean: 0.8221140555966712
DFE Rewards standard dev 0.061069912308623485
Returns Mean: 1.8002088
Returns standard dev 0.42647985
Advantages Mean: 0.4126063
Advantages standard dev 0.42647988
Fidelity History: []
SparsePauliOp(['II', 'IX', 'XI', 'XX'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 80%|███████▉  | 78/98 [00:21<00:05,  3.41it/s]

mean tensor([-0.1813,  0.9998,  0.3890,  0.4621, -0.6309,  0.4940, -0.9978])
Average return: 0.9064567508326273
DFE Rewards Mean: 0.9064567508326273
DFE Rewards standard dev 0.0563223424919008
Returns Mean: 2.783691
Returns standard dev 1.7432573
Advantages Mean: 0.77937734
Advantages standard dev 1.7432573
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 81%|████████  | 79/98 [00:21<00:05,  3.42it/s]

mean tensor([-0.2061,  0.9999,  0.3757,  0.4840, -0.6501,  0.5404, -0.9984])
Average return: 0.5653007640119138
DFE Rewards Mean: 0.5653007640119138
DFE Rewards standard dev 0.07686875289484328
Returns Mean: 0.84985733
Returns standard dev 0.1876049
Advantages Mean: -1.4847854
Advantages standard dev 0.1876049
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])


 82%|████████▏ | 80/98 [00:22<00:05,  3.40it/s]

mean tensor([-0.1085,  0.9989,  0.3192,  0.4035, -0.5517,  0.4253, -0.9863])
Average return: 0.3185311022347601
DFE Rewards Mean: 0.3185311022347601
DFE Rewards standard dev 0.08186899604151597
Returns Mean: 0.39100456
Returns standard dev 0.123999156
Advantages Mean: -1.2479049
Advantages standard dev 0.123999156
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 83%|████████▎ | 81/98 [00:22<00:05,  3.14it/s]

mean tensor([-0.0517,  0.9948,  0.2834,  0.3496, -0.4847,  0.3325, -0.9743])
Average return: 0.832550147939393
DFE Rewards Mean: 0.832550147939393
DFE Rewards standard dev 0.06349499030856533
Returns Mean: 1.8859859
Returns standard dev 0.5953827
Advantages Mean: 0.83064735
Advantages standard dev 0.5953827
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])


 84%|████████▎ | 82/98 [00:22<00:05,  3.18it/s]

mean tensor([-0.1138,  0.9990,  0.3201,  0.4394, -0.6153,  0.4302, -0.9901])
Average return: 0.33665817277093973
DFE Rewards Mean: 0.33665817277093973
DFE Rewards standard dev 0.08031242400331155
Returns Mean: 0.41805667
Returns standard dev 0.12456376
Advantages Mean: -1.0242404
Advantages standard dev 0.12456376
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


 85%|████████▍ | 83/98 [00:23<00:04,  3.24it/s]

mean tensor([-0.1458,  0.9997,  0.3397,  0.4975, -0.6735,  0.4697, -0.9986])
Average return: 0.31997202801532254
DFE Rewards Mean: 0.31997202801532254
DFE Rewards standard dev 0.07072098796898905
Returns Mean: 0.39117044
Returns standard dev 0.10621147
Advantages Mean: -0.50200814
Advantages standard dev 0.10621147
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 86%|████████▌ | 84/98 [00:23<00:04,  3.31it/s]

mean tensor([-0.2076,  1.0000,  0.3807,  0.5858, -0.7621,  0.5781, -0.9999])
Average return: 0.578513844172416
DFE Rewards Mean: 0.578513844172416
DFE Rewards standard dev 0.08658640602055595
Returns Mean: 0.88609636
Returns standard dev 0.2141806
Advantages Mean: 0.48484024
Advantages standard dev 0.2141806
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 87%|████████▋ | 85/98 [00:23<00:03,  3.48it/s]

mean tensor([-0.0431,  0.9930,  0.2703,  0.3833, -0.5289,  0.3253, -0.9632])
Average return: 0.8427707185011042
DFE Rewards Mean: 0.8427707185011042
DFE Rewards standard dev 0.06163644485776726
Returns Mean: 1.9557443
Returns standard dev 0.6100243
Advantages Mean: 1.1146787
Advantages standard dev 0.6100243
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


 88%|████████▊ | 86/98 [00:23<00:03,  3.58it/s]

mean tensor([-0.1020,  0.9991,  0.3061,  0.4834, -0.6722,  0.4239, -0.9903])
Average return: 0.35875218660083635
DFE Rewards Mean: 0.35875218660083635
DFE Rewards standard dev 0.07941775661017154
Returns Mean: 0.45230216
Returns standard dev 0.12768854
Advantages Mean: -0.8514927
Advantages standard dev 0.12768854
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 89%|████████▉ | 87/98 [00:24<00:03,  3.57it/s]

mean tensor([-0.2148,  1.0000,  0.3895,  0.6274, -0.8199,  0.5737, -1.0000])
Average return: 0.6003341174627914
DFE Rewards Mean: 0.6003341174627914
DFE Rewards standard dev 0.08206981265335703
Returns Mean: 0.93887043
Returns standard dev 0.21121582
Advantages Mean: 0.3892481
Advantages standard dev 0.21121581
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 90%|████████▉ | 88/98 [00:24<00:03,  2.79it/s]

mean tensor([-0.0636,  0.9974,  0.2914,  0.4455, -0.6431,  0.3556, -0.9835])
Average return: 0.8607454174970629
DFE Rewards Mean: 0.8607454174970629
DFE Rewards standard dev 0.058636151820663986
Returns Mean: 2.1034963
Returns standard dev 0.6980666
Advantages Mean: 1.1459721
Advantages standard dev 0.69806665
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 91%|█████████ | 89/98 [00:25<00:03,  2.98it/s]

mean tensor([-0.0975,  0.9989,  0.3188,  0.5085, -0.7354,  0.4210, -0.9844])
Average return: 0.8318319670935649
DFE Rewards Mean: 0.8318319670935649
DFE Rewards standard dev 0.06970089136385957
Returns Mean: 1.8915894
Returns standard dev 0.51796657
Advantages Mean: 0.43693855
Advantages standard dev 0.5179666
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])


 92%|█████████▏| 90/98 [00:25<00:02,  3.15it/s]

mean tensor([-0.1815,  0.9999,  0.3692,  0.6091, -0.8205,  0.6041, -0.9973])
Average return: 0.6190536311091085
DFE Rewards Mean: 0.6190536311091085
DFE Rewards standard dev 0.08261039932416589
Returns Mean: 0.98916537
Returns standard dev 0.2222236
Advantages Mean: -0.796971
Advantages standard dev 0.2222236
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


 93%|█████████▎| 91/98 [00:25<00:02,  3.30it/s]

mean tensor([-0.1118,  0.9993,  0.3483,  0.5521, -0.7625,  0.5125, -0.9850])
Average return: 0.39647651520740296
DFE Rewards Mean: 0.39647651520740296
DFE Rewards standard dev 0.07856450785310724
Returns Mean: 0.5137047
Returns standard dev 0.13351542
Advantages Mean: -0.9389915
Advantages standard dev 0.13351542
Fidelity History: []
SparsePauliOp(['II', 'IX', 'XI', 'XX'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 94%|█████████▍| 92/98 [00:25<00:01,  3.42it/s]

mean tensor([-0.1891,  1.0000,  0.4011,  0.6564, -0.8569,  0.6268, -0.9998])
Average return: 0.8865539565908855
DFE Rewards Mean: 0.8865539565908855
DFE Rewards standard dev 0.05583156453870682
Returns Mean: 2.4370315
Returns standard dev 1.3611611
Advantages Mean: 1.6840376
Advantages standard dev 1.3611611
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YZ', 'ZY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 95%|█████████▍| 93/98 [00:26<00:01,  3.50it/s]

mean tensor([-0.1942,  1.0000,  0.3950,  0.6581, -0.8636,  0.6575, -0.9998])
Average return: 0.6336437420266404
DFE Rewards Mean: 0.6336437420266404
DFE Rewards standard dev 0.07198924299085548
Returns Mean: 1.0244476
Returns standard dev 0.20543653
Advantages Mean: -0.34630108
Advantages standard dev 0.20543653
Fidelity History: []
SparsePauliOp(['II', 'IY', 'XI', 'XY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 96%|█████████▌| 94/98 [00:26<00:01,  3.56it/s]

mean tensor([-0.2395,  1.0000,  0.4158,  0.6956, -0.8875,  0.7098, -1.0000])
Average return: 0.89969050794476
DFE Rewards Mean: 0.89969050794476
DFE Rewards standard dev 0.04914424631685466
Returns Mean: 2.5318632
Returns standard dev 1.2062451
Advantages Mean: 1.4337481
Advantages standard dev 1.206245
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 97%|█████████▋| 95/98 [00:26<00:00,  3.57it/s]

mean tensor([-0.0612,  0.9976,  0.3033,  0.5052, -0.7067,  0.4651, -0.9682])
Average return: 0.87719247776807
DFE Rewards Mean: 0.87719247776807
DFE Rewards standard dev 0.054863970195668216
Returns Mean: 2.2575505
Returns standard dev 0.86838174
Advantages Mean: 0.7938992
Advantages standard dev 0.86838174
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])


 98%|█████████▊| 96/98 [00:26<00:00,  3.40it/s]

mean tensor([-0.1293,  0.9998,  0.3432,  0.6015, -0.8422,  0.5960, -0.9958])
Average return: 0.9221187329563117
DFE Rewards Mean: 0.9221187329563117
DFE Rewards standard dev 0.043913480397827706
Returns Mean: 2.9855967
Returns standard dev 1.888985
Advantages Mean: 1.1515622
Advantages standard dev 1.8889853
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j,  0.25+0.j])


 99%|█████████▉| 97/98 [00:27<00:00,  3.41it/s]

mean tensor([-0.1515,  0.9999,  0.3613,  0.6247, -0.8599,  0.6766, -0.9984])
Average return: 0.6544562346459825
DFE Rewards Mean: 0.6544562346459825
DFE Rewards standard dev 0.0742091384603806
Returns Mean: 1.0877975
Returns standard dev 0.2315501
Advantages Mean: -1.0886683
Advantages standard dev 0.23155008
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


100%|██████████| 98/98 [00:27<00:00,  3.56it/s]
[I 2024-01-25 15:03:37,716] Trial 0 finished with value: 0.6190536311091085 and parameters: {'N_UPDATES': 98, 'N_EPOCHS': 17, 'MINIBATCH_SIZE': 96, 'BATCHSIZE_MULTIPLIER': 10, 'LR': 0.00042023154103451897, 'GAMMA': 0.9670099769157185, 'GAE_LAMBDA': 0.9087475540398442, 'ENT_COEF': 0.00010745841175495839, 'V_COEF': 0.3779768820449133, 'GRADIENT_CLIP': 0.503131379582895, 'CLIP_VALUE_COEF': 0.23786622162463758, 'CLIP_RATIO': 0.17325524513585705}. Best is trial 0 with value: 0.6190536311091085.


mean tensor([-0.0572,  0.9982,  0.2989,  0.5102, -0.7543,  0.4952, -0.9582])
Average return: 0.8862122193822908
DFE Rewards Mean: 0.8862122193822908
DFE Rewards standard dev 0.04967758667792034
Returns Mean: 2.3776822
Returns standard dev 1.1805786
Advantages Mean: 0.79685616
Advantages standard dev 1.1805786
Fidelity History: []


 11%|█         | 3/28 [00:00<00:01, 20.82it/s]

SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])
mean tensor([ 0.0890, -0.1357, -0.0516,  0.0200,  0.0564,  0.0075,  0.0998])
Average return: 0.7886305983243335
DFE Rewards Mean: 0.7886305983243335
DFE Rewards standard dev 0.15528169620247942
Returns Mean: 2.0480933
Returns standard dev 1.5202376
Advantages Mean: 1.9921441
Advantages standard dev 1.5202378
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([ 0.0131, -0.0870, -0.0870,  0.0243, -0.0242,  0.0262,  0.0658])
Average return: 0.8319365435183712
DFE Rewards Mean: 0.8319365435183712
DFE Rewards standard dev 0.12588220617007845
Returns Mean: 2.2029352
Returns standard dev 1.3787714
Advantages Mean: 1.643926
Advantages standard dev 1.3787714
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])

 21%|██▏       | 6/28 [00:00<00:01, 17.97it/s]

mean tensor([-0.1401, -0.4674, -0.2443, -0.2312, -0.1736,  0.1327, -0.0052])
Average return: 0.254936416933273
DFE Rewards Mean: 0.254936416933273
DFE Rewards standard dev 0.10139344542201907
Returns Mean: 0.30427116
Returns standard dev 0.13816465
Advantages Mean: -1.2841386
Advantages standard dev 0.13816465
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0467, -0.2330, -0.1335, -0.1026, -0.1341, -0.0694,  0.0761])
Average return: 0.9038893571777306
DFE Rewards Mean: 0.9038893571777306
DFE Rewards standard dev 0.08101633393605186
Returns Mean: 3.5392766
Returns standard dev 3.1920633
Advantages Mean: 3.049429
Advantages standard dev 3.1920636
Fidelity History: []
SparsePauliOp(['II', 'IY', 'XI', 'XY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0854, -0.3700, -0.1925, -0.1654, -0.2091,  0.1906,  0.3605])
Average return: 0.8428320795271744

 43%|████▎     | 12/28 [00:00<00:00, 19.92it/s]

mean tensor([-0.0319, -0.5805, -0.2785, -0.1141, -0.2165,  0.1359,  0.4638])
Average return: 0.8667570321596002
DFE Rewards Mean: 0.8667570321596002
DFE Rewards standard dev 0.13476669623746795
Returns Mean: 3.0759034
Returns standard dev 2.6754436
Advantages Mean: 1.8256679
Advantages standard dev 2.6754436
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0612, -0.4054, -0.1484, -0.0724, -0.0380,  0.0277,  0.4081])
Average return: 0.8975983029615472
DFE Rewards Mean: 0.8975983029615472
DFE Rewards standard dev 0.08680303378221967
Returns Mean: 3.231656
Returns standard dev 2.7413638
Advantages Mean: 2.0134144
Advantages standard dev 2.7413638
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0352, -0.6423, -0.2158, -0.1885, -0.1281,  0.0404,  0.3342])
Average return: 0.27634812749690885

 54%|█████▎    | 15/28 [00:00<00:00, 20.94it/s]

mean tensor([-0.0534, -0.7948, -0.1920,  0.0042, -0.1672,  0.0407,  0.8125])
Average return: 0.16727675649073315
DFE Rewards Mean: 0.16727675649073315
DFE Rewards standard dev 0.09462338792484938
Returns Mean: 0.19007418
Returns standard dev 0.11356823
Advantages Mean: -1.2509104
Advantages standard dev 0.11356824
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YZ', 'ZY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0455, -0.8184, -0.1956,  0.0477, -0.2234,  0.0482,  0.9548])
Average return: 0.41786701155707406
DFE Rewards Mean: 0.41786701155707406
DFE Rewards standard dev 0.08427014596041751
Returns Mean: 0.5515899
Returns standard dev 0.14566733
Advantages Mean: -0.16728824
Advantages standard dev 0.14566733
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])
mean tensor([-0.0581, -0.8686, -0.2358,  0.0505, -0.1687,  0.0687,  0.9330])
Average return: 0.18683902659434673

 64%|██████▍   | 18/28 [00:00<00:00, 21.47it/s]

mean tensor([-0.1090, -0.9650, -0.3251, -0.0300, -0.2157,  0.0645,  0.9914])
Average return: 0.4376911025953747
DFE Rewards Mean: 0.4376911025953747
DFE Rewards standard dev 0.08008355837234188
Returns Mean: 0.5856712
Returns standard dev 0.14095695
Advantages Mean: 0.013277244
Advantages standard dev 0.14095695
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 75%|███████▌  | 21/28 [00:00<00:00, 22.39it/s]

mean tensor([-0.1433, -0.9831, -0.4276, -0.0282, -0.2645,  0.0739,  0.9960])
Average return: 0.4404762415296227
DFE Rewards Mean: 0.4404762415296227
DFE Rewards standard dev 0.07772887359414853
Returns Mean: 0.5904192
Returns standard dev 0.14041059
Advantages Mean: -0.030319642
Advantages standard dev 0.14041059
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])
mean tensor([-0.1313, -0.9791, -0.4831, -0.0558, -0.2373,  0.1418,  0.9933])
Average return: 0.4609760164580915
DFE Rewards Mean: 0.4609760164580915
DFE Rewards standard dev 0.0839312112440755
Returns Mean: 0.63017654
Returns standard dev 0.15671413
Advantages Mean: 0.05866526
Advantages standard dev 0.15671413
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])
mean tensor([-0.1769, -0.9861, -0.5739, -0.1259, -0.2483,  0.1334,  0.9960])
Average return: 0.46452991265886473

 86%|████████▌ | 24/28 [00:01<00:00, 23.34it/s]

SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.1038, -0.9312, -0.4238, -0.2090, -0.1308,  0.1166,  0.9514])
Average return: 0.7848736448255664
DFE Rewards Mean: 0.7848736448255664
DFE Rewards standard dev 0.07036264541774971
Returns Mean: 1.601016
Returns standard dev 0.38379776
Advantages Mean: 0.7712362
Advantages standard dev 0.3837978
Fidelity History: []


100%|██████████| 28/28 [00:01<00:00, 21.98it/s]
[I 2024-01-25 15:03:38,996] Trial 1 finished with value: 0.4963349345701975 and parameters: {'N_UPDATES': 28, 'N_EPOCHS': 12, 'MINIBATCH_SIZE': 64, 'BATCHSIZE_MULTIPLIER': 2, 'LR': 0.004247062447252421, 'GAMMA': 0.9791207939698907, 'GAE_LAMBDA': 0.9670613772766286, 'ENT_COEF': 0.0008484573907645138, 'V_COEF': 0.6212487024971437, 'GRADIENT_CLIP': 0.689859388192491, 'CLIP_VALUE_COEF': 0.27387145109536043, 'CLIP_RATIO': 0.23395881031735236}. Best is trial 0 with value: 0.6190536311091085.


SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0932, -0.9332, -0.3921, -0.2298, -0.1255,  0.1313,  0.9091])
Average return: 0.8187080175232382
DFE Rewards Mean: 0.8187080175232382
DFE Rewards standard dev 0.07102336046384021
Returns Mean: 1.8181818
Returns standard dev 0.5307631
Advantages Mean: 0.4203944
Advantages standard dev 0.5307631
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])
mean tensor([-0.0908, -0.9752, -0.4985, -0.2634, -0.2843,  0.1043,  0.8877])
Average return: 0.18869863808165044
DFE Rewards Mean: 0.18869863808165044
DFE Rewards standard dev 0.07070649318630991
Returns Mean: 0.21299955
Returns standard dev 0.088712
Advantages Mean: -1.6569061
Advantages standard dev 0.088712
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])

#### Quick Summary of HPO Task

In [8]:
optimizer.target_gate

{'target_gate': Instruction(name='cx', num_qubits=2, num_clbits=0, params=[]),
 'target_register': [0, 1]}

In [9]:
optimizer.hyperparams

['N_UPDATES',
 'N_EPOCHS',
 'MINIBATCH_SIZE',
 'BATCHSIZE_MULTIPLIER',
 'LR',
 'GAMMA',
 'GAE_LAMBDA',
 'ENT_COEF',
 'V_COEF',
 'GRADIENT_CLIP',
 'CLIP_VALUE_COEF',
 'CLIP_RATIO',
 'BATCHSIZE']

In [10]:
optimizer.num_hpo_trials

2
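The trial count above is the number of hyperparameter configurations that were sampled and evaluated. To illustrate the general shape of such a loop, here is a minimal stand-in that uses plain random search with a toy objective. It is not the notebook's `HyperparameterOptimizer`: the search ranges, `sample_hyperparams`, `toy_objective`, and `random_search` are all hypothetical, and only the hyperparameter names and the result keys are taken from the outputs in this notebook.

```python
import math
import random


def sample_hyperparams(rng):
    """Draw one configuration; every range here is an illustrative assumption."""
    return {
        "N_UPDATES": rng.randint(20, 100),
        "N_EPOCHS": rng.randint(5, 20),
        "LR": 10 ** rng.uniform(-5, -2),  # log-uniform learning rate
        "GAMMA": rng.uniform(0.95, 0.99),
        "CLIP_RATIO": rng.uniform(0.1, 0.3),
    }


def toy_objective(params):
    """Placeholder for 'train the agent and return its average reward'."""
    # Peaks at LR = 1e-3; the value always lies in (0, 1].
    return 1.0 / (1.0 + (math.log10(params["LR"]) + 3.0) ** 2)


def random_search(num_trials, seed=0):
    """Evaluate num_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best = {"best_avg_reward": -math.inf, "best_hyperparams": None}
    for _ in range(num_trials):
        params = sample_hyperparams(rng)
        value = toy_objective(params)
        if value > best["best_avg_reward"]:
            best = {"best_avg_reward": value, "best_hyperparams": params}
    return best


result = random_search(num_trials=2)
print(result["best_avg_reward"], result["best_hyperparams"])
```

With only two trials the search barely explores the space, so a run like the one in this notebook is best read as a smoke test of the pipeline rather than a tuned result.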

In [11]:
optimizer.best_hpo_configuration

{'best_avg_reward': 0.6190536311091085,
 'best_hyperparams': {'N_UPDATES': 98,
  'N_EPOCHS': 17,
  'MINIBATCH_SIZE': 96,
  'BATCHSIZE_MULTIPLIER': 10,
  'LR': 0.00042023154103451897,
  'GAMMA': 0.9670099769157185,
  'GAE_LAMBDA': 0.9087475540398442,
  'ENT_COEF': 0.00010745841175495839,
  'V_COEF': 0.3779768820449133,
  'GRADIENT_CLIP': 0.503131379582895,
  'CLIP_VALUE_COEF': 0.23786622162463758,
  'CLIP_RATIO': 0.17325524513585705}}
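The reported best configuration matches Trial 0 in the log. As a small consistency check, the sketch below takes the two objective values from the "Trial ... finished" log lines and picks the larger one, assuming (as the log wording suggests) that the objective is being maximized. The variable names are illustrative.

```python
# Objective values copied from the two "Trial ... finished" log lines.
trial_values = {
    0: 0.6190536311091085,
    1: 0.4963349345701975,
}

# Pick the trial with the highest objective value (maximization assumed).
best_trial = max(trial_values, key=trial_values.get)
best_value = trial_values[best_trial]

print(f"Best is trial {best_trial} with value: {best_value}")
```

The selected value agrees with `best_avg_reward` in the dictionary above.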