## CX Calibration with HPO

#### Imports

In [1]:
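import sys
import os
import logging

# Work around "OMP: Error #15" (duplicate OpenMP runtimes), a common issue on macOS
os.environ["KMP_DUPLICATE_LIB_OK"] = "True"

# Make the Quantum_Optimal_Control project importable (machine-specific absolute path)
module_path = os.path.abspath('/Users/lukasvoss/Documents/Master Wirtschaftsphysik/Masterarbeit Yale-NUS CQT/Quantum_Optimal_Control')
if module_path not in sys.path:
    sys.path.append(module_path)

# The single-qubit calibration output below is printed while this configuration is imported
from template_configurations import gate_q_env_config

logging.basicConfig(
    level=logging.WARNING,  # only records at WARNING level or above are emitted
    format="%(asctime)s INFO %(message)s",  # every emitted line is labeled "INFO", whatever its actual level
    datefmt="%Y-%m-%d %H:%M:%S",
    stream=sys.stdout,
)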



Starting Rabi experiment for qubit 0...
Rabi experiment for qubit 0 done.
Starting Drag experiment for qubit 0...
Drag experiments done for qubit 0 done.
Starting Rabi experiment for qubit 1...
Rabi experiment for qubit 1 done.
Starting Drag experiment for qubit 1...
Drag experiments done for qubit 1 done.
All single qubit calibrations are done
Updated Instruction Schedule Map <InstructionScheduleMap(1Q instructions:
  q0: {'h', 'sdg', 'measure', 'z', 't', 'sx', 'tdg', 'x', 'delay', 'id', 'rz', 'reset', 's'}
  q1: {'h', 'sdg', 'measure', 'z', 't', 'sx', 'tdg', 'x', 'delay', 'id', 'rz', 'reset', 's'}
Multi qubit instructions:
  (0, 1): {'ecr', 'cr45p', 'cr45m'}
  (1, 0): {'ecr', 'cr45p', 'cr45m'}
)>


Which gate is to be calibrated?

In [2]:
gate_q_env_config.target

{'register': [0, 1],
 'training_with_cal': True,
 'gate': Instruction(name='cx', num_qubits=2, num_clbits=0, params=[])}
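The target is a two-qubit `cx` instruction on register `[0, 1]`. As a reference for what the agent is asked to reproduce, the sketch below writes down the ideal CX unitary in plain NumPy (not via Qiskit). It assumes Qiskit's little-endian ordering (basis index `2*q1 + q0`) and that qubit 0, the first register entry, is the control.

```python
import numpy as np

# Ideal CX (CNOT) unitary in little-endian ordering: basis index = 2*q1 + q0.
# Assumption: qubit 0 is the control and qubit 1 the target.
CX = np.array(
    [
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
    ],
    dtype=complex,
)


def basis_state(q1: int, q0: int) -> np.ndarray:
    """Return the computational basis state |q1 q0> as a vector."""
    vec = np.zeros(4, dtype=complex)
    vec[2 * q1 + q0] = 1.0
    return vec


# The gate must be unitary
assert np.allclose(CX.conj().T @ CX, np.eye(4))

# Print the truth table: the target q1 flips whenever the control q0 is 1
for q1 in (0, 1):
    for q0 in (0, 1):
        out = CX @ basis_state(q1, q0)
        idx = int(np.argmax(np.abs(out)))
        print(f"|q1={q1}, q0={q0}> -> |q1={idx // 2}, q0={idx % 2}>")
```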

### Perform HPO

In [3]:
from hyperparameter_optimization import HyperparameterOptimizer

Set the path to the file specifying the RL agent and the directory in which to store the HPO results

In [4]:
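# Agent configuration (YAML); this absolute path is machine-specific
path_agent_config = '/Users/lukasvoss/Documents/Master Wirtschaftsphysik/Masterarbeit Yale-NUS CQT/Quantum_Optimal_Control/template_configurations/agent_config.yaml'
# Relative directory in which the HPO results are saved
save_results_path = 'hpo_results'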
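Both this cell and the first one hardcode absolute paths from one machine. A more portable variant, sketched below under stated assumptions, derives everything from a single project root. The environment variable `QOC_PROJECT_ROOT` is hypothetical (it falls back to the current working directory), and the relative layout `template_configurations/agent_config.yaml` is taken from the path above. If `HyperparameterOptimizer` expects plain strings, pass `str(path_agent_config)`.

```python
import os
import sys
from pathlib import Path

# Hypothetical environment variable pointing at the Quantum_Optimal_Control checkout;
# falls back to the current working directory when it is not set.
project_root = Path(os.environ.get("QOC_PROJECT_ROOT", Path.cwd())).resolve()

# Put the project on sys.path once, as the first cell does
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Same relative layout as the absolute paths used in this notebook
path_agent_config = project_root / "template_configurations" / "agent_config.yaml"
save_results_path = Path("hpo_results")

print(path_agent_config)
print("agent config found:", path_agent_config.exists())
```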

In [5]:
# Run the HPO study with a single trial and log the training progress
optimizer = HyperparameterOptimizer(gate_q_env_config, path_agent_config, save_results_path, log_progress=True, num_hpo_trials=1)
optimizer.optimize_hyperparameters()

SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
2023-12-28 13:22:03 INFO num_HPO_trials: 1
2023-12-28 13:22:03 INFO ---------------- STARTING HPO ----------------


[I 2023-12-28 13:22:03,333] A new study created in memory with name: no-name-6b95916e-d0a1-49b5-94e3-9ef5455e12b3


SparsePauliOp(['II', 'IY', 'XI', 'XY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


  0%|          | 0/43 [00:00<?, ?it/s]

SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


  2%|▏         | 1/43 [00:00<00:08,  4.67it/s]

mean tensor([0.0733, 0.1214, 0.0364, 0.0350, 0.0348, 0.1169, 0.0748])
Average return: 0.7932033859813018
DFE Rewards Mean: 0.7932033859813018
DFE Rewards standard dev 0.1450912307292657
Returns Mean: 1.9965386
Returns standard dev 1.4928375
Advantages Mean: 2.0438228
Advantages standard dev 1.4928375
Fidelity History: []


  5%|▍         | 2/43 [00:00<00:07,  5.17it/s]

SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([0.0180, 0.1173, 0.0088, 0.0409, 0.0214, 0.1171, 0.0629])
Average return: 0.8325556937255028
DFE Rewards Mean: 0.8325556937255028
DFE Rewards standard dev 0.13807577771559
Returns Mean: 2.4175286
Returns standard dev 2.0306807
Advantages Mean: 2.198555
Advantages standard dev 2.0306807
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])


  9%|▉         | 4/43 [00:00<00:07,  5.30it/s]

mean tensor([ 0.0083,  0.1592,  0.0260, -0.0219,  0.0455,  0.1245, -0.0086])
Average return: 0.8432917758180216
DFE Rewards Mean: 0.8432917758180216
DFE Rewards standard dev 0.12762134878328055
Returns Mean: 2.4277823
Returns standard dev 1.9341273
Advantages Mean: 1.0211478
Advantages standard dev 1.9341273
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])
mean tensor([ 0.0109,  0.1448, -0.0533, -0.0400,  0.0337,  0.0959, -0.0284])
Average return: 0.887038565686135
DFE Rewards Mean: 0.887038565686135
DFE Rewards standard dev 0.1024481068849365
Returns Mean: 2.869763
Returns standard dev 2.13862
Advantages Mean: 0.82926047
Advantages standard dev 2.13862
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YZ', 'ZX'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])


 14%|█▍        | 6/43 [00:01<00:06,  5.37it/s]

mean tensor([ 0.0267,  0.0613, -0.0577, -0.0391,  0.0167,  0.0574, -0.0170])
Average return: 0.8882999207008024
DFE Rewards Mean: 0.8882999207008024
DFE Rewards standard dev 0.08758455118466299
Returns Mean: 2.8406634
Returns standard dev 2.1328888
Advantages Mean: 0.34687048
Advantages standard dev 2.1328888
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j,  0.25+0.j])
mean tensor([ 0.0127,  0.0653, -0.0520,  0.0065,  0.0555,  0.0597, -0.0093])
Average return: 0.9207596624755595
DFE Rewards Mean: 0.9207596624755595
DFE Rewards standard dev 0.07368413173294013
Returns Mean: 3.5507493
Returns standard dev 2.8153455
Advantages Mean: 1.5577595
Advantages standard dev 2.8153458
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 19%|█▊        | 8/43 [00:01<00:06,  5.41it/s]

mean tensor([-2.7020e-05,  9.9462e-02, -7.2678e-02,  2.9703e-02,  5.4768e-02,
         1.1600e-01,  1.0808e-02])
Average return: 0.9039426390323783
DFE Rewards Mean: 0.9039426390323783
DFE Rewards standard dev 0.08709212022367603
Returns Mean: 3.2858968
Returns standard dev 2.6098614
Advantages Mean: 2.084497
Advantages standard dev 2.6098614
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])
mean tensor([ 0.0165,  0.0921, -0.0630, -0.0024,  0.0286,  0.1218, -0.0355])
Average return: 0.9623034108148922
DFE Rewards Mean: 0.9623034108148922
DFE Rewards standard dev 0.04376280809208593
Returns Mean: 5.029927
Returns standard dev 3.6862307
Advantages Mean: 2.8444586
Advantages standard dev 3.6862304
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 23%|██▎       | 10/43 [00:01<00:06,  5.43it/s]

mean tensor([ 0.0195,  0.0942, -0.0365, -0.0445,  0.0519,  0.0851, -0.0424])
Average return: 0.9643644196691071
DFE Rewards Mean: 0.9643644196691071
DFE Rewards standard dev 0.041802927036821556
Returns Mean: 4.7683425
Returns standard dev 3.2109616
Advantages Mean: 1.2547338
Advantages standard dev 3.2109613
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([ 0.0060,  0.0923, -0.0420,  0.0132,  0.0398,  0.1157, -0.0034])
Average return: 0.9371696417710144
DFE Rewards Mean: 0.9371696417710144
DFE Rewards standard dev 0.06724449552960872
Returns Mean: 4.094883
Returns standard dev 3.1783016
Advantages Mean: 1.9162228
Advantages standard dev 3.178302
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])


 28%|██▊       | 12/43 [00:02<00:05,  5.52it/s]

mean tensor([ 0.0108,  0.0655, -0.0488, -0.0299,  0.0466,  0.0074,  0.0091])
Average return: 0.9558292045329679
DFE Rewards Mean: 0.9558292045329679
DFE Rewards standard dev 0.04984668387134528
Returns Mean: 4.6245155
Returns standard dev 3.3495753
Advantages Mean: 1.0278373
Advantages standard dev 3.349575
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YZ', 'ZX'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])
mean tensor([ 0.0211,  0.0502, -0.0356, -0.0485,  0.0403, -0.0114,  0.0062])
Average return: 0.9737826801913382
DFE Rewards Mean: 0.9737826801913382
DFE Rewards standard dev 0.03389250154316218
Returns Mean: 5.5918045
Returns standard dev 3.8906431
Advantages Mean: 0.8985843
Advantages standard dev 3.8906431
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 33%|███▎      | 14/43 [00:02<00:05,  5.49it/s]

mean tensor([ 0.0185,  0.0723, -0.0639,  0.0049,  0.0571,  0.0381, -0.0002])
Average return: 0.9626907589586546
DFE Rewards Mean: 0.9626907589586546
DFE Rewards standard dev 0.03874368108122902
Returns Mean: 4.6819344
Returns standard dev 3.260071
Advantages Mean: 1.6669319
Advantages standard dev 3.260071
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YZ', 'ZX'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])
mean tensor([ 0.0044,  0.0691, -0.0963, -0.0280,  0.0572, -0.0113, -0.0174])
Average return: 0.9821499282080691
DFE Rewards Mean: 0.9821499282080691
DFE Rewards standard dev 0.024526248492464018
Returns Mean: 5.87078
Returns standard dev 3.7551227
Advantages Mean: 0.5790686
Advantages standard dev 3.7551227
Fidelity History: []
SparsePauliOp(['II', 'IX', 'XI', 'XX'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 37%|███▋      | 16/43 [00:02<00:04,  5.48it/s]

mean tensor([ 0.0166,  0.0820, -0.0806, -0.0264,  0.0598,  0.0168, -0.0335])
Average return: 0.9743676863911024
DFE Rewards Mean: 0.9743676863911024
DFE Rewards standard dev 0.03652536283301175
Returns Mean: 5.8081183
Returns standard dev 3.9454813
Advantages Mean: 1.0222323
Advantages standard dev 3.945481
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])
mean tensor([ 0.0184,  0.0898, -0.0547, -0.0051,  0.0523,  0.0416, -0.0143])
Average return: 0.9790650087911607
DFE Rewards Mean: 0.9790650087911607
DFE Rewards standard dev 0.024251982676254513
Returns Mean: 5.5661497
Returns standard dev 3.5939696
Advantages Mean: 1.6944307
Advantages standard dev 3.5939698
Fidelity History: []
SparsePauliOp(['II', 'IY', 'XI', 'XY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 42%|████▏     | 18/43 [00:03<00:04,  5.48it/s]

mean tensor([ 1.0947e-02,  9.5225e-02, -5.5441e-02, -1.5042e-02,  3.6220e-02,
        -7.7181e-05,  1.0271e-02])
Average return: 0.9865771682556068
DFE Rewards Mean: 0.9865771682556068
DFE Rewards standard dev 0.02182886112088382
Returns Mean: 6.7427797
Returns standard dev 4.1336927
Advantages Mean: 0.737718
Advantages standard dev 4.1336927
Fidelity History: []
SparsePauliOp(['II', 'IY', 'XI', 'XY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([ 0.0107,  0.0847, -0.0780,  0.0036,  0.0263, -0.0134,  0.0295])
Average return: 0.9872930565382149
DFE Rewards Mean: 0.9872930565382149
DFE Rewards standard dev 0.016983454467141973
Returns Mean: 6.3932467
Returns standard dev 3.8272398
Advantages Mean: 0.11299678
Advantages standard dev 3.8272398
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j,  0.25+0.j])


 47%|████▋     | 20/43 [00:03<00:04,  5.45it/s]

mean tensor([ 0.0248,  0.0897, -0.0898,  0.0080,  0.0313,  0.0002,  0.0233])
Average return: 0.9833069137371914
DFE Rewards Mean: 0.9833069137371914
DFE Rewards standard dev 0.023332247059968766
Returns Mean: 6.210702
Returns standard dev 3.9740133
Advantages Mean: 0.8984923
Advantages standard dev 3.9740133
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([ 0.0267,  0.1012, -0.0962,  0.0269,  0.0321,  0.0263,  0.0071])
Average return: 0.9831574363666833
DFE Rewards Mean: 0.9831574363666833
DFE Rewards standard dev 0.02042083534450103
Returns Mean: 5.7688627
Returns standard dev 3.5113595
Advantages Mean: 1.6354091
Advantages standard dev 3.5113595
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])


 51%|█████     | 22/43 [00:04<00:03,  5.31it/s]

mean tensor([ 0.0138,  0.1120, -0.1120,  0.0284,  0.0243,  0.0146,  0.0018])
Average return: 0.9767266753026833
DFE Rewards Mean: 0.9767266753026833
DFE Rewards standard dev 0.027422557331797635
Returns Mean: 5.5975685
Returns standard dev 3.720968
Advantages Mean: 1.464298
Advantages standard dev 3.720968
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([ 0.0132,  0.1308, -0.1239, -0.0016,  0.0083, -0.0323, -0.0047])
Average return: 0.9896203115827209
DFE Rewards Mean: 0.9896203115827209
DFE Rewards standard dev 0.018257782641085307
Returns Mean: 7.196213
Returns standard dev 4.2409186
Advantages Mean: 0.2523086
Advantages standard dev 4.2409186
Fidelity History: []


 53%|█████▎    | 23/43 [00:04<00:03,  5.33it/s]

SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])
mean tensor([ 0.0217,  0.1216, -0.1273,  0.0214,  0.0276, -0.0092, -0.0126])
Average return: 0.991256187770718
DFE Rewards Mean: 0.991256187770718
DFE Rewards standard dev 0.014236891714790251
Returns Mean: 7.154313
Returns standard dev 4.1574287
Advantages Mean: 1.6983478
Advantages standard dev 4.1574287
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])


 56%|█████▌    | 24/43 [00:04<00:03,  5.28it/s]

mean tensor([ 0.0080,  0.1400, -0.1166, -0.0126,  0.0111, -0.0315,  0.0022])
Average return: 0.9881138601998476
DFE Rewards Mean: 0.9881138601998476
DFE Rewards standard dev 0.01877610807215894
Returns Mean: 6.817088
Returns standard dev 4.0446477
Advantages Mean: -0.09926596
Advantages standard dev 4.0446477
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 60%|██████    | 26/43 [00:04<00:03,  5.07it/s]

mean tensor([-0.0078,  0.1485, -0.1099, -0.0265,  0.0063, -0.0525,  0.0178])
Average return: 0.9909813655609064
DFE Rewards Mean: 0.9909813655609064
DFE Rewards standard dev 0.015527814362266112
Returns Mean: 7.152548
Returns standard dev 4.117673
Advantages Mean: -0.22391763
Advantages standard dev 4.117673
Fidelity History: []
SparsePauliOp(['II', 'IY', 'XI', 'XY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0054,  0.1285, -0.0980, -0.0306,  0.0042, -0.0651,  0.0276])
Average return: 0.9922856006888707
DFE Rewards Mean: 0.9922856006888707
DFE Rewards standard dev 0.014617177579278303
Returns Mean: 7.449723
Returns standard dev 4.1315904
Advantages Mean: -0.043766744
Advantages standard dev 4.1315904
Fidelity History: []


 63%|██████▎   | 27/43 [00:05<00:03,  5.08it/s]

SparsePauliOp(['II', 'IY', 'XI', 'XY'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0071,  0.1079, -0.1101, -0.0325, -0.0036, -0.0612,  0.0339])
Average return: 0.9932283371832582
DFE Rewards Mean: 0.9932283371832582
DFE Rewards standard dev 0.013474534309310681
Returns Mean: 7.574351
Returns standard dev 4.1552215
Advantages Mean: 0.13330343
Advantages standard dev 4.155221
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'XI', 'XZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 67%|██████▋   | 29/43 [00:05<00:02,  5.12it/s]

mean tensor([ 0.0079,  0.1159, -0.0985,  0.0015,  0.0160,  0.0032,  0.0017])
Average return: 0.9893232537790662
DFE Rewards Mean: 0.9893232537790662
DFE Rewards standard dev 0.01523427568690607
Returns Mean: 6.7219133
Returns standard dev 3.9311776
Advantages Mean: 1.7532282
Advantages standard dev 3.9311776
Fidelity History: []
SparsePauliOp(['II', 'IX', 'XI', 'XX'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0016,  0.1277, -0.0888, -0.0108, -0.0024, -0.0325,  0.0136])
Average return: 0.9924730893386023
DFE Rewards Mean: 0.9924730893386023
DFE Rewards standard dev 0.014322838557112555
Returns Mean: 7.5577345
Returns standard dev 4.1189895
Advantages Mean: 0.40488943
Advantages standard dev 4.1189895
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


 72%|███████▏  | 31/43 [00:05<00:02,  5.15it/s]

mean tensor([-0.0059,  0.1285, -0.1055,  0.0220,  0.0256,  0.0158, -0.0048])
Average return: 0.9869448748618697
DFE Rewards Mean: 0.9869448748618697
DFE Rewards standard dev 0.018663991110375526
Returns Mean: 6.6656733
Returns standard dev 4.0954895
Advantages Mean: 1.8618169
Advantages standard dev 4.09549
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])
mean tensor([-0.0100,  0.1678, -0.0850, -0.0249, -0.0083, -0.0424,  0.0211])
Average return: 0.9943831533818286
DFE Rewards Mean: 0.9943831533818286
DFE Rewards standard dev 0.01160312507112242
Returns Mean: 7.7988343
Returns standard dev 4.0823946
Advantages Mean: -0.40447605
Advantages standard dev 4.0823946
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


 77%|███████▋  | 33/43 [00:06<00:01,  5.17it/s]

mean tensor([ 0.0011,  0.1371, -0.1076, -0.0023,  0.0089, -0.0205,  0.0031])
Average return: 0.9951148020501064
DFE Rewards Mean: 0.9951148020501064
DFE Rewards standard dev 0.009613534341205265
Returns Mean: 7.9149866
Returns standard dev 4.0927887
Advantages Mean: 1.7970867
Advantages standard dev 4.0927887
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])
mean tensor([-0.0075,  0.1462, -0.1201, -0.0013,  0.0161, -0.0296, -0.0055])
Average return: 0.9969884873221919
DFE Rewards Mean: 0.9969884873221919
DFE Rewards standard dev 0.00726596154450199
Returns Mean: 8.373357
Returns standard dev 4.053884
Advantages Mean: 1.7180634
Advantages standard dev 4.053884
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])


 81%|████████▏ | 35/43 [00:06<00:01,  5.18it/s]

mean tensor([-0.0047,  0.1388, -0.1014,  0.0027,  0.0130, -0.0367,  0.0041])
Average return: 0.9970895745351123
DFE Rewards Mean: 0.9970895745351123
DFE Rewards standard dev 0.0071671436292744225
Returns Mean: 8.520998
Returns standard dev 4.0544896
Advantages Mean: 1.5961981
Advantages standard dev 4.05449
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])
mean tensor([-0.0215,  0.1632, -0.0898, -0.0143,  0.0092, -0.0552,  0.0403])
Average return: 0.9945734803904094
DFE Rewards Mean: 0.9945734803904094
DFE Rewards standard dev 0.010948960377131138
Returns Mean: 7.8303003
Returns standard dev 4.1928253
Advantages Mean: -1.0310986
Advantages standard dev 4.1928253
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YZ', 'ZX'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])


 86%|████████▌ | 37/43 [00:07<00:01,  5.10it/s]

mean tensor([-0.0207,  0.1575, -0.1018, -0.0142,  0.0101, -0.0618,  0.0415])
Average return: 0.9941996070622895
DFE Rewards Mean: 0.9941996070622895
DFE Rewards standard dev 0.01075731365690249
Returns Mean: 7.655052
Returns standard dev 4.128578
Advantages Mean: -1.2303602
Advantages standard dev 4.128578
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])
mean tensor([-0.0043,  0.1398, -0.1051,  0.0016,  0.0069, -0.0656,  0.0277])
Average return: 0.9967144141524054
DFE Rewards Mean: 0.9967144141524054
DFE Rewards standard dev 0.008382187482778873
Returns Mean: 8.350496
Returns standard dev 4.079971
Advantages Mean: 0.17675245
Advantages standard dev 4.079971
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'ZI', 'ZZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j,  0.25+0.j, -0.25+0.j])


 91%|█████████ | 39/43 [00:07<00:00,  5.04it/s]

mean tensor([ 0.0015,  0.1329, -0.1311,  0.0272,  0.0136, -0.0503,  0.0124])
Average return: 0.9974888860657846
DFE Rewards Mean: 0.9974888860657846
DFE Rewards standard dev 0.00685962416604966
Returns Mean: 8.56094
Returns standard dev 4.063852
Advantages Mean: 1.6686499
Advantages standard dev 4.063852
Fidelity History: []
SparsePauliOp(['II', 'XX', 'YY', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j,  0.25+0.j])
mean tensor([ 0.0104,  0.1483, -0.1334,  0.0169,  0.0035, -0.0784,  0.0262])
Average return: 0.9955536855316093
DFE Rewards Mean: 0.9955536855316093
DFE Rewards standard dev 0.010627405443760727
Returns Mean: 8.110927
Returns standard dev 4.1181273
Advantages Mean: 0.26985645
Advantages standard dev 4.1181273
Fidelity History: []
SparsePauliOp(['II', 'IZ', 'YI', 'YZ'],
              coeffs=[ 0.25+0.j, -0.25+0.j, -0.25+0.j,  0.25+0.j])


 95%|█████████▌| 41/43 [00:07<00:00,  5.09it/s]

mean tensor([ 0.0041,  0.1374, -0.1317,  0.0046,  0.0090, -0.0689,  0.0167])
Average return: 0.9964602319278635
DFE Rewards Mean: 0.9964602319278635
DFE Rewards standard dev 0.009748359322466918
Returns Mean: 8.474197
Returns standard dev 4.2148657
Advantages Mean: 0.6100998
Advantages standard dev 4.2148657
Fidelity History: []
SparsePauliOp(['II', 'XY', 'YZ', 'ZX'],
              coeffs=[ 0.25+0.j,  0.25+0.j,  0.25+0.j, -0.25+0.j])
mean tensor([-0.0097,  0.1465, -0.1133, -0.0101,  0.0013, -0.0875,  0.0573])
Average return: 0.9939928386160024
DFE Rewards Mean: 0.9939928386160024
DFE Rewards standard dev 0.010585379548428841
Returns Mean: 7.3813868
Returns standard dev 3.993982
Advantages Mean: -2.218976
Advantages standard dev 3.993982
Fidelity History: []


 98%|█████████▊| 42/43 [00:07<00:00,  5.10it/s]

SparsePauliOp(['II', 'XY', 'YX', 'ZZ'],
              coeffs=[ 0.25+0.j,  0.25+0.j, -0.25+0.j, -0.25+0.j])
mean tensor([ 0.1602,  0.0984, -0.0278, -0.1097,  0.0162, -0.0835,  0.1759])
Average return: 0.9801019273785491
DFE Rewards Mean: 0.9801019273785491
DFE Rewards standard dev 0.019881651046539082
Returns Mean: 5.475973
Returns standard dev 3.6006305
Advantages Mean: -3.552191
Advantages standard dev 3.6006305
Fidelity History: []
SparsePauliOp(['II', 'IX', 'XI', 'XX'],
              coeffs=[0.25+0.j, 0.25+0.j, 0.25+0.j, 0.25+0.j])


100%|██████████| 43/43 [00:08<00:00,  5.25it/s]
[I 2023-12-28 13:22:11,613] Trial 0 finished with value: 0.9964602319278635 and parameters: {'N_UPDATES': 43, 'N_EPOCHS': 17, 'MINIBATCH_SIZE': 64, 'BATCHSIZE_MULTIPLIER': 8, 'LR': 0.0005312903188214253, 'GAMMA': 0.957871667947108, 'GAE_LAMBDA': 0.926980000641327, 'ENT_COEF': 0.0004666949853103508, 'V_COEF': 0.7191427432064341, 'GRADIENT_CLIP': 0.6916205631780755, 'CLIP_VALUE_COEF': 0.26968123929183463, 'CLIP_RATIO': 0.2575578187131302}. Best is trial 0 with value: 0.9964602319278635.


mean tensor([ 0.1684,  0.1025, -0.0394, -0.0992,  0.0165, -0.0798,  0.1423])
Average return: 0.9852241018132802
DFE Rewards Mean: 0.9852241018132802
DFE Rewards standard dev 0.021012929107562202
Returns Mean: 6.6467333
Returns standard dev 4.249171
Advantages Mean: -1.5802355
Advantages standard dev 4.249171
Fidelity History: []
2023-12-28 13:22:11 INFO ---------------- FINISHED HPO ----------------
2023-12-28 13:22:11 INFO HPO completed in 8.28 seconds.
2023-12-28 13:22:11 INFO Best trial:
2023-12-28 13:22:11 INFO -------------------------
2023-12-28 13:22:11 INFO   Value: 0.9964602319278635
2023-12-28 13:22:11 INFO   Parameters: 
2023-12-28 13:22:11 INFO     N_UPDATES: 43
2023-12-28 13:22:11 INFO     N_EPOCHS: 17
2023-12-28 13:22:11 INFO     MINIBATCH_SIZE: 64
2023-12-28 13:22:11 INFO     BATCHSIZE_MULTIPLIER: 8
2023-12-28 13:22:11 INFO     LR: 0.0005312903188214253
2023-12-28 13:22:11 INFO     GAMMA: 0.957871667947108
2023-12-28 13:22:11 INFO     GAE_LAMBDA: 0.926980000641327
2023-1
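Each `SparsePauliOp` printed in the log above is a sum of four commuting two-qubit Pauli strings with coefficients of ±0.25, which is the form of a projector onto a two-qubit stabilizer state, $P = \frac{1}{4}\sum_k s_k P_k$ with $s_k = \pm 1$. For example, `['II', 'IZ', 'ZI', 'ZZ']` with all coefficients `+0.25` equals $|00\rangle\langle 00|$. The NumPy sketch below rebuilds such operators from their labels (leftmost character acting on the highest-index qubit, as in Qiskit) and evaluates $\mathrm{Tr}(P\rho)$, the fidelity of a state with that stabilizer state. How the environment turns such expectation values into the logged "DFE Rewards" is not shown in this notebook; the sketch only illustrates the underlying identity.

```python
from functools import reduce

import numpy as np

# Single-qubit Pauli matrices
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_matrix(label: str) -> np.ndarray:
    """Dense matrix of a Pauli string; the leftmost character acts on the highest-index qubit."""
    return reduce(np.kron, (PAULIS[c] for c in label))


def pauli_sum(labels, coeffs) -> np.ndarray:
    """Dense matrix of sum_k coeffs[k] * P_k."""
    return sum(c * pauli_matrix(label) for label, c in zip(labels, coeffs))


def fidelity_with_projector(projector: np.ndarray, state: np.ndarray) -> float:
    """Tr(P |psi><psi|) = <psi|P|psi> for a normalized pure state vector."""
    return float(np.real(state.conj() @ projector @ state))


# Taken from the log: 1/4 (II + IZ + ZI + ZZ) projects onto |00>
P00 = pauli_sum(["II", "IZ", "ZI", "ZZ"], [0.25, 0.25, 0.25, 0.25])
# Taken from the log: 1/4 (II + XY + YZ - ZX) projects onto an entangled stabilizer state
P_ent = pauli_sum(["II", "XY", "YZ", "ZX"], [0.25, 0.25, 0.25, -0.25])

for P in (P00, P_ent):
    # A rank-1 projector is Hermitian, idempotent and has unit trace
    assert np.allclose(P, P.conj().T)
    assert np.allclose(P @ P, P)
    assert np.isclose(np.trace(P).real, 1.0)

ket00 = np.array([1, 0, 0, 0], dtype=complex)
print(fidelity_with_projector(P00, ket00))  # 1.0
```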

#### Quick Summary of HPO Task

In [6]:
optimizer.target_gate

{'target_gate': Instruction(name='cx', num_qubits=2, num_clbits=0, params=[]),
 'target_register': [0, 1]}

In [7]:
optimizer.hyperparams

['N_UPDATES',
 'N_EPOCHS',
 'MINIBATCH_SIZE',
 'BATCHSIZE_MULTIPLIER',
 'LR',
 'GAMMA',
 'GAE_LAMBDA',
 'ENT_COEF',
 'V_COEF',
 'GRADIENT_CLIP',
 'CLIP_VALUE_COEF',
 'CLIP_RATIO',
 'BATCHSIZE']
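By their names, `GAMMA` and `GAE_LAMBDA` are the discount factor and the λ parameter of generalized advantage estimation (GAE), and `CLIP_RATIO` points to a PPO-style agent. As an illustration only, not the agent's actual implementation, the sketch below computes standard GAE advantages and returns for a single rollout; the rewards and value estimates are made-up toy numbers.

```python
import numpy as np


def gae(rewards, values, last_value, gamma, lam, dones=None):
    """Generalized advantage estimation for one rollout of length T.

    rewards, values: sequences of length T; last_value bootstraps V(s_T).
    dones[t] = 1 marks that the episode ended after step t (no bootstrapping).
    Returns (advantages, returns) with returns = advantages + values.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)
    dones = np.zeros(T) if dones is None else np.asarray(dones, dtype=float)
    advantages = np.zeros(T)
    next_value, running = last_value, 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages, advantages + values


# Toy rollout evaluated with the GAMMA and GAE_LAMBDA values sampled in trial 0
adv, ret = gae(rewards=[0.8, 0.9, 0.95], values=[0.5, 0.6, 0.7], last_value=0.0,
               gamma=0.957871667947108, lam=0.926980000641327)
print("advantages:", adv)
print("returns:", ret)
```

With `lam=1` the advantages reduce to discounted reward-to-go minus the value baseline; with `lam=0` they reduce to one-step TD residuals, which is the bias/variance trade-off that `GAE_LAMBDA` controls.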

In [8]:
optimizer.num_hpo_trials

1
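With `num_hpo_trials=1`, the study samples exactly one configuration, so the "best trial" reported above is simply the only trial. The log lines (`A new study created in memory ...`, `Trial 0 finished with value ... Best is trial 0`) match the format of Optuna's logging. The sketch below is a dependency-free random search with the same loop structure (sample, evaluate, keep the best). The search ranges and `toy_objective` are hypothetical placeholders, not the ranges or the objective used by `HyperparameterOptimizer`.

```python
import math
import random

# Hypothetical search space: (low, high, log_scale) for continuous parameters,
# or a list of choices for categorical ones. NOT the ranges used by HyperparameterOptimizer.
SEARCH_SPACE = {
    "LR": (1e-5, 1e-2, True),
    "GAMMA": (0.9, 0.999, False),
    "GAE_LAMBDA": (0.8, 0.99, False),
    "MINIBATCH_SIZE": [16, 32, 64, 128],
}


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration from SEARCH_SPACE."""
    config = {}
    for name, spec in SEARCH_SPACE.items():
        if isinstance(spec, list):
            config[name] = rng.choice(spec)
        else:
            low, high, log_scale = spec
            if log_scale:
                config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
            else:
                config[name] = rng.uniform(low, high)
    return config


def toy_objective(config: dict) -> float:
    """Hypothetical stand-in for 'train the agent and return its average return'."""
    return 1.0 - abs(math.log10(config["LR"]) + 3.3) / 10 - abs(config["GAMMA"] - 0.96)


def random_search(num_trials: int, seed: int = 0) -> dict:
    """Evaluate num_trials sampled configurations and keep the best one."""
    rng = random.Random(seed)
    best = {"best_avg_return": -math.inf, "best_hyperparams": None}
    for _ in range(num_trials):
        config = sample_config(rng)
        value = toy_objective(config)
        if value > best["best_avg_return"]:
            best = {"best_avg_return": value, "best_hyperparams": config}
    return best


print(random_search(num_trials=1))
print(random_search(num_trials=50))
```

Because both calls share the same seed, the 50-trial search starts from the single configuration drawn by the 1-trial search and can only match or improve on it; more trials are what make "best" meaningful.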

In [9]:
optimizer.best_hpo_configuration

{'best_avg_return': 0.9964602319278635,
 'best_hyperparams': {'N_UPDATES': 43,
  'N_EPOCHS': 17,
  'MINIBATCH_SIZE': 64,
  'BATCHSIZE_MULTIPLIER': 8,
  'LR': 0.0005312903188214253,
  'GAMMA': 0.957871667947108,
  'GAE_LAMBDA': 0.926980000641327,
  'ENT_COEF': 0.0004666949853103508,
  'V_COEF': 0.7191427432064341,
  'GRADIENT_CLIP': 0.6916205631780755,
  'CLIP_VALUE_COEF': 0.26968123929183463,
  'CLIP_RATIO': 0.2575578187131302}}
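The best configuration is a plain dictionary, so it can be stored next to the other HPO results for later training runs. Note that `hyperparams` also lists `BATCHSIZE`, which does not appear in `best_hyperparams`; one plausible reading is that it is derived as `MINIBATCH_SIZE * BATCHSIZE_MULTIPLIER` (64 × 8 = 512 here), but that relation is an assumption this notebook does not confirm. The file name `best_config.json` is likewise hypothetical.

```python
import json
from pathlib import Path

# Values copied from optimizer.best_hpo_configuration above
best_hpo_configuration = {
    "best_avg_return": 0.9964602319278635,
    "best_hyperparams": {
        "N_UPDATES": 43,
        "N_EPOCHS": 17,
        "MINIBATCH_SIZE": 64,
        "BATCHSIZE_MULTIPLIER": 8,
        "LR": 0.0005312903188214253,
        "GAMMA": 0.957871667947108,
        "GAE_LAMBDA": 0.926980000641327,
        "ENT_COEF": 0.0004666949853103508,
        "V_COEF": 0.7191427432064341,
        "GRADIENT_CLIP": 0.6916205631780755,
        "CLIP_VALUE_COEF": 0.26968123929183463,
        "CLIP_RATIO": 0.2575578187131302,
    },
}

hyperparams = dict(best_hpo_configuration["best_hyperparams"])
# Assumption: BATCHSIZE is derived from the two sampled batch parameters
hyperparams["BATCHSIZE"] = hyperparams["MINIBATCH_SIZE"] * hyperparams["BATCHSIZE_MULTIPLIER"]

# Hypothetical file name inside the results directory used above
out_file = Path("hpo_results") / "best_config.json"
out_file.parent.mkdir(parents=True, exist_ok=True)
out_file.write_text(json.dumps({**best_hpo_configuration, "best_hyperparams": hyperparams}, indent=2))

# Round trip: Python's json module serializes floats via repr, so they reload exactly
reloaded = json.loads(out_file.read_text())
print(reloaded["best_hyperparams"]["BATCHSIZE"])  # 512
```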