### Prediction of new data

We now turn to the newly collected data and the task of predicting its mean pulse energies. First, we load the delay between the two pulses and all inputs as features, and preprocess them:

In [1]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
if "newMode2021" in os.getcwd():
    os.chdir("..")

In [None]:
import logging.config
import time

import tensorflow as tf
import yaml

# Load the logging configuration and set a timestamped log file name
with open("log_config.yml", "r") as stream:
    conf = yaml.safe_load(stream)
conf["logfile"] = "logs/newMode2021_%s.log" % str(time.time())
logging.config.dictConfig(conf)

In [None]:
from newMode2021.setup import get_data_p1

u2_dataset = "u2_273_37026_events.pkl"

data = get_data_p1(u2_dataset)

#### Use an ANN

We begin by simply fitting an ANN to the data. An earlier hyperparameter search found the following settings to work best:

* 2 hidden layers of 20 nodes each
* ReLU activation function
* no dropout or batch normalization
* L2 regularization

We fit the ANN with these parameters for 5,000 epochs:

In [None]:
from utility.pipelines.ann import *
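
To make the architecture listed above concrete, here is a minimal framework-free sketch in NumPy. It is not the implementation in `utility.pipelines.ann`; the helper names (`init_mlp`, `forward`, `l2_penalised_mse`), the L2 strength `1e-3`, the He-style initialisation, and the toy input width of 10 are all assumptions chosen for illustration.

```python
import numpy as np


def init_mlp(n_features, hidden=(20, 20), seed=0):
    """Create (weights, bias) pairs for an MLP with one scalar output."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, *hidden, 1]
    # He-style initialisation, a common default for ReLU layers (assumption)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """ReLU on the hidden layers, linear output, no dropout or batch norm."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w_out, b_out = params[-1]
    return (h @ w_out + b_out).ravel()


def l2_penalised_mse(params, x, y, l2=1e-3):
    """Mean squared error plus an L2 penalty on the weight matrices only."""
    mse = np.mean((forward(params, x) - y) ** 2)
    penalty = l2 * sum(np.sum(w ** 2) for w, _ in params)
    return mse + penalty


# Toy usage on random numbers (not the experimental data)
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(8, 10))
y_demo = rng.normal(size=8)
params = init_mlp(n_features=10)
n_params = sum(w.size + b.size for w, b in params)
loss = l2_penalised_mse(params, x_demo, y_demo)
```

Training would then minimise `l2_penalised_mse` with a gradient-based optimiser over the chosen number of epochs; that step is left to the actual pipeline.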

#### Feature selection

Next, we apply our feature-selection pipeline to the new data to predict the pulse 1 energy.

In [None]:


ann_feat_str_data = {
    "feat_name": "vls_com_pump",
    "plot_lab": "central p1 energy",
    "unit": "eV",
    "data_fname": "PaperFigures/Figure Data/Figure 3/ann_10_feat_pred.npz",
    "plot_fname": "newMode2021/results/ex_1_p1_pred/ann_low_p1_hist2d",
}


ann_feature_pipeline(data, ann_feat_str_data)

#### Comparison of feature selection and no selection

Now let us compare the predictions obtained with feature selection to those obtained without it.

In [None]:
from utility.plotting.scatter_diff import *

d1 = "newMode2021/results/ex_1_p1_pred/ann_pred.npz"
d2 = "newMode2021/results/ex_1_p1_pred/ann_10_feat_pred.npz"

string_d = {
    "quantity": "central p1 energy",
    "unit": "eV",
    "label_1": r"$M=86$",
    "label_2": r"$M=10$",
}

scatter_diff(d1, d2, string_d)

#### Results of Feature Selection

As we can see, feature selection reduces the number of features used from >100 to 10 while leaving the prediction quality nearly unchanged.
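
One way to put a number on "nearly unchanged" is to compute the same error metrics for both sets of predictions. The sketch below is an illustration only: the helper names (`mae`, `r2`, `compare`) are hypothetical, and because the array names stored in the `.npz` files are not shown here, the functions take plain arrays and the usage example runs on synthetic values.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination (1 is a perfect fit)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def compare(y_true, pred_full, pred_selected):
    """Collect both metrics for the full and the reduced feature set."""
    return {
        "mae_full": mae(y_true, pred_full),
        "mae_selected": mae(y_true, pred_selected),
        "r2_full": r2(y_true, pred_full),
        "r2_selected": r2(y_true, pred_selected),
    }


# Synthetic stand-in values, not results from the experiment
rng = np.random.default_rng(0)
y_toy = rng.normal(size=100)
summary = compare(
    y_toy,
    y_toy + 0.10 * rng.normal(size=100),
    y_toy + 0.11 * rng.normal(size=100),
)
```

If the two rows of metrics agree to within the run-to-run scatter of the model, the reduced feature set can be considered equivalent for this target.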

### Other Estimators

To benchmark our ANN, we also use linear models and gradient boosting estimators and perform the same analysis.

In [None]:
from utility.pipelines.gb import *
from utility.pipelines.lin import *
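
For orientation, the simplest of these benchmark estimators, a linear model, amounts to an ordinary least-squares fit. The sketch below uses `numpy.linalg.lstsq` and is not the implementation in `utility.pipelines.lin`; the helper names (`fit_linear`, `predict_linear`) and the toy data are assumptions made for illustration.

```python
import numpy as np


def fit_linear(x, y):
    """Ordinary least squares with an intercept; returns (weights, intercept)."""
    # Append a column of ones so the last coefficient acts as the intercept
    design = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]


def predict_linear(x, weights, intercept):
    """Evaluate the fitted linear model on new inputs."""
    return x @ weights + intercept


# Toy usage on noiseless synthetic data (not the experimental data)
rng = np.random.default_rng(0)
x_toy = rng.normal(size=(200, 3))
true_w = np.array([0.5, -1.0, 2.0])
y_toy = x_toy @ true_w + 0.25
weights, intercept = fit_linear(x_toy, y_toy)
y_hat = predict_linear(x_toy, weights, intercept)
```

A baseline like this is cheap to fit and shows how much of the ANN's accuracy comes from non-linear structure in the inputs.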

In [None]:
data = get_data_p1("u2_273_37026_events.pkl")

gb_feat_str_data = {
    "feat_name": "vls_com_pump",
    "plot_lab": "central p1 energy",
    "unit": "eV",
    "data_fname": "PaperFigures/Figure Data/Figure 3/gb_10_feat_pred.npz",
    "plot_fname": "newMode2021/results/ex_1_p1_pred/xgb_low_p1_hist2d",
}


gb_feature_pipeline(data, gb_feat_str_data)

## Pulse 2 prediction

Now let us see how well we can predict the second pulse.

In [None]:
ann_feat_str_data = {
    "feat_name": "vls_com_probe",
    "plot_lab": "central p2 energy",
    "unit": "eV",
    "data_fname": "PaperFigures/Figure Data/Figure 4_5/old_u2.npz",
    "plot_fname": "newMode2021/results/ex_4_p2_pred/ann_p2",
}

ann_feature_pipeline(data, ann_feat_str_data)

In [None]:
gb_feat_str_data = {
    "feat_name": "vls_com_probe",
    "plot_lab": "central p2 energy",
    "unit": "eV",
    "data_fname": "PaperFigures/Figure Data/Figure 3/gb_probe.npz",
    "plot_fname": "newMode2021/results/ex_4_p2_pred/xgb_low_p2_hist2d",
}

gb_feature_pipeline(data, gb_feat_str_data)

## Few Features Only

In [None]:
ebeam_cols = [
    "ebeam_ebeamL3Energy", "ebeam_ebeamUndPosX", "ebeam_ebeamUndAngY", "ebeam_ebeamUndPosY",
    "ebeam_ebeamLTU450", "ebeam_ebeamEnergyBC2", "ebeam_ebeamLTU250", "ebeam_ebeamCharge",
    "ebeam_ebeamXTCAVPhase", "ebeam_ebeamPkCurrBC2"
]


top_cols = ["vls_com_pump", "xgmd_rmsElectronSum", "xgmd_energy", "ebeam_ebeamL3Energy", "gmd_energy",
            "ebeam_ebeamUndPosX", "vls_width_pump", "ebeam_ebeamUndAngY", "ebeam_ebeamUndPosY", "ebeam_ebeamLTU450"]

data = get_data_p1("u2_273_37026_events.pkl", include_pump=True, filter_cols=top_cols)

lin_feat_str_data = {
    "feat_name": "vls_com_pump",
    "plot_lab": "central p1 energy",
    "unit": "eV",
    "data_fname": "PaperFigures/Figure Data/Figure 3/lin_pump.npz",
    "plot_fname": "newMode2021/results/ex_1_p1_pred/lin_p1",
}

lin_pipeline(data, lin_feat_str_data)

In [None]:
# get_data_p2 is not imported above; assumed to be defined alongside get_data_p1
from newMode2021.setup import get_data_p2

ebeam_cols = [
    "ebeam_ebeamL3Energy", "ebeam_ebeamUndPosX", "ebeam_ebeamUndAngY", "ebeam_ebeamUndPosY",
    "ebeam_ebeamLTU450", "ebeam_ebeamEnergyBC2", "ebeam_ebeamLTU250", "ebeam_ebeamCharge",
    "ebeam_ebeamXTCAVPhase", "ebeam_ebeamPkCurrBC2"
]


top_cols = ["vls_com_pump", "xgmd_rmsElectronSum", "xgmd_energy", "ebeam_ebeamL3Energy", "gmd_energy",
            "ebeam_ebeamUndPosX", "vls_width_pump", "ebeam_ebeamUndAngY", "ebeam_ebeamUndPosY", "ebeam_ebeamLTU450"]

data = get_data_p2("u2_273_37026_events.pkl", include_probe=True, filter_cols=top_cols)

lin_feat_str_data = {
    "feat_name": "vls_com_probe",
    "plot_lab": "central p2 energy",
    "unit": "eV",
    "data_fname": "PaperFigures/Figure Data/Figure 3/lin_probe.npz",
    "plot_fname": "newMode2021/results/ex_4_p2_pred/lin_p2",
}

lin_pipeline(data, lin_feat_str_data)

### Undulator variation
