<img src='../source_figures/bnl_logo_horizontal_rgb.png' width="400" height="400">

# steady as beamlines go

### Anomaly Detection with ML and Scalar Time Series

### Andi Barbour, Soft X-ray Scattering and Spectroscopy, CSX, NSLS-II

### NSLS-II and CFN Users' Meeting 2022
Workshop 6

Notebook #2

In [None]:
from matplotlib import cm, patches, pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

from itertools import cycle

import collect_ts as ts
from anomaly.extract_features import get_features_single_datum

In [None]:
import pickle
import os.path


In [None]:
def make_the_model_go(model, input_data):
    prediction = "anomaly" if model.predict(input_data) == -1 else "normal"
    print(f"The model characterizes the data as {prediction}")

def summarize_runs(runs, name_order):
    for run in runs:
        df = run["baseline"]["data"].read(name_order).to_dataframe()
        print(run.start["scan_id"], run.start["detectors"], run.start.get("purpose","no induced ?"), run.start.get('artifact', "no induced ?"))# "--"*10)
        print("--"*50)
        if df is not None:
            display(df[name_order].mean()) # FOR USER TO TRY - what happens when you remove .mean()

        print(f'{run.stop["exit_status"]:>60} {run.metadata["summary"]["duration"]/60:.2f} minutes')# "--"*10)
        print("\n")
        
from ipywidgets import interact 

def browse_3Darray(res,title='Frame'):
    """ Widget for notebooks.  Sliding bar to browse a 3D array.
    res    :  3D array; the first axis is the one iterated over
    title  :  string used as the prefix of the plot title
    Relies on `fig`, `ax`, and `im` already existing in the notebook namespace.
    """   
    N = len(res)
    def view_image(i=0):
        im.set_data(res[i])
        ax.set_title(f'{title} {i}')
        fig.canvas.draw_idle()
    interact(view_image, i=(0, N-1))
    
#%matplotlib widget

In [None]:
from databroker.queries import TimeRange, RawMongo
from tiled.client import from_uri
c = from_uri("https://tiled-demo.blueskyproject.io/api")
csx = c["csx"]["raw"]

# Anomaly Detection
### With 3 models: *EE*, *IFT*, *LOD* 
* Data is characterized by models as `"normal"` or `"anomaly"`
* All data is from CSX using the FastCCD with various x-ray scattering geometries

[see all training and testing code + data](https://github.com/bnl/pub-ML_examples)
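
The `"normal"` / `"anomaly"` labels come from the sign of each model's prediction: `make_the_model_go` above treats `-1` as an anomaly and anything else as normal. Below is a minimal sketch of that convention using a hypothetical stand-in detector. `ThresholdDetector` is not one of the three trained models, and the numbers are made up for illustration.

```python
from statistics import mean, pstdev


class ThresholdDetector:
    """Hypothetical stand-in: flags values far from the training mean."""

    def __init__(self, n_sigma=3.0):
        self.n_sigma = n_sigma

    def fit(self, values):
        self.center = mean(values)
        self.spread = pstdev(values)
        return self

    def predict(self, values):
        # Same sign convention as the notebook's models: -1 anomaly, 1 normal
        limit = self.n_sigma * self.spread
        return [-1 if abs(v - self.center) > limit else 1 for v in values]


def label(prediction):
    return "anomaly" if prediction == -1 else "normal"


# Made-up training values, then two new values to classify
detector = ThresholdDetector().fit([9.8, 10.1, 10.0, 9.9, 10.2])
labels = [label(p) for p in detector.predict([10.05, 14.0])]
print(labels)
```

The trained models are far more elaborate than a single threshold, but the notebook only ever consumes them through this same `predict` call.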

# Load the Model
### Objective
* Use models on data in a serial fashion (streaming documents)
* See how generalizable the models are
* Get a feel for data collection process prior to training 

In [None]:
models = {}
model_types = ['EE', 'IFT', 'LOD']
for mod_type in model_types:
    with open(f'models/anomaly_detection_{mod_type}_model.pk', 'rb') as f:
        temp = pickle.load(f)
        models.update({mod_type : temp})

In [None]:
scans = [x for x in range(154685, 154696+1)]

In [None]:
runs = csx.search(RawMongo(start={"purpose": "laser stability",}))
runs

In [None]:
for i, run in enumerate(runs.values()):
    print(i, run.start["scan_id"], run.start["detectors"], run.start["purpose"], run.start.get('artifact', "no induced ?"))


## It's hard to perfectly record intent while you are experimenting - data isn't perfect
- In this case, our only choice is to keep a record and apply it to the processed data
- However, bluesky `baseline` recordings make it clear when the beamline was not in a standard configuration
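
One way to use the `baseline` record is to compare each run's baseline reading against a reference value and flag the runs that differ. A minimal sketch, assuming the per-run means have already been extracted (for example from the tables that `summarize_runs` displays). The run labels, values, and tolerance below are hypothetical.

```python
from statistics import median

# Hypothetical mean baseline setpoints per run (arbitrary units)
baseline_means = {"run_a": 0.0, "run_b": 0.0, "run_c": 1.5, "run_d": 0.0}
tolerance = 0.1  # assumed acceptable deviation from the reference

# Use the median as the "standard configuration" so one outlier cannot shift it
reference = median(baseline_means.values())
nonstandard = [run for run, value in baseline_means.items()
               if abs(value - reference) > tolerance]
print(f"reference = {reference}, non-standard runs: {nonstandard}")
```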


In [None]:
summarize_runs(csx[scans], name_order=["slt3_x_user_setpoint"])

```python
0 154685 ['dif_beam_hdf5'] laser stability no induced 
1 154686 ['dif_beam_hdf5'] laser stability slt3 moved ********
2 154687 ['dif_beam_hdf5'] laser stability slt3 move  ********
3 154688 ['dif_beam_hdf5'] laser stability LEDs on/off
4 154689 ['dif_beam_hdf5'] laser stability pinhole move out
5 154690 ['dif_beam_hdf5'] laser stability 0.5 intensity steady (NO INDUCED, but low) ********
6 154691 ['dif_beam_hdf5'] laser stability random up and down intensity
7 154692 ['dif_beam_hdf5'] laser stability >0.5 intensity steady  (NO INDUCED, but low)  ********
8 154693 ['dif_beam_hdf5'] laser stability random up and down intensity
9 154694 ['dif_beam_hdf5'] laser stability sudden off
10 154695 ['dif_beam_hdf5'] laser stability sudden on
11 154696 ['dif_beam_hdf5'] laser stability sudden on
```

## Let's try the first scan

In [None]:
scan = scans[0]

In [None]:
class_label = None

In [None]:
run = csx[scan]['primary']['data']['dif_beam_hdf5_image'][:, :, 400:1200, :1200].compute()

In [None]:
imgs = run.to_numpy()
_, fs, vpix, hpix = imgs.shape
imgs = imgs.reshape(fs, vpix, hpix)

In [None]:
#                Vst  Hst  Vsz  Hsz
rois = {'blob': (300, 600, 300, 300),
        'blob_50': (420, 700,  50,  50),
        'dif-ref': (400, 500, 100, 100),
        'dif-ref_50': (400, 500,  50,  50),
        'corner': (175, 300, 125,  50),
        'zero': (100, 1000, 50,  50),
       }
# plt.get_cmap is used because cm.get_cmap was removed in recent matplotlib releases
colors  = cycle(plt.get_cmap('rainbow')(np.linspace(0, 1, len(rois))))

In [None]:
#%matplotlib inline

In [None]:
fig, ax = plt.subplots(figsize=(10,15))
im = ax.imshow(imgs[0],vmax=500, vmin=50)
cbar = plt.colorbar(im, ax=ax,fraction=.03)
for roi in rois:
    Vpix, Hpix,  Vsize, Hsize = rois[roi]
    rect = patches.Rectangle((Hpix, Vpix), Hsize, Vsize, linewidth=3, edgecolor =next(colors), facecolor='none')     
    ax.add_patch(rect)
# Set the title before saving so that it appears in the saved figure
ax.set_title("Low Power Laser on a YAG Screen Typically Used for Sample Positioning")
fig.savefig('ROIs for Laser Stability Test')

In [None]:
#browse_3Darray(imgs)

## Images Emulate X-ray Scattering Measurement
- SAXS diffuse scattering
- Bragg peaks 
- Coherent scattering
- Surface diffraction

## In many cases, one requires "steady" signal over many frames to be averaged together or correlated
- Beam instability (measurement and feedback systems)
- Coherent scattering is highly affected (not just intensity but also X-ray phase or position)
- Potentially aging/damage

## Collect data from time series images for feature engineering -- as was similarly done for the initial training
**STATISTICS FROM ROIS** *(regions of interest)*
- Standard Deviation
- Average Intensity
- Center of Mass X
- Center of Mass Y
- Sigma X (stdev in X)
- Sigma Y (stdev in Y)
    
[These and similar signals are computed by the AreaDetector Stats Plugin](https://areadetector.github.io/master/ADCore/NDPluginStats.html)
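
The six statistics above can be sketched directly with NumPy. This is an illustration under stated assumptions, not the `collect_ts` implementation: the function name, the dictionary keys, and the intensity-weighted definitions of the centroid and sigma are choices made here, and they may differ in detail from both `collect_ts` and the Stats plugin.

```python
import numpy as np


def roi_statistics(frames, v_start, h_start, v_size, h_size):
    """Per-frame statistics for one ROI of a (frames, rows, cols) stack."""
    roi = frames[:, v_start:v_start + v_size, h_start:h_start + h_size].astype(float)
    total = roi.sum(axis=(1, 2))  # assumed non-zero in every frame
    rows = np.arange(roi.shape[1])[None, :, None]
    cols = np.arange(roi.shape[2])[None, None, :]

    # Intensity-weighted centroid, in pixels relative to the ROI corner
    com_y = (roi * rows).sum(axis=(1, 2)) / total
    com_x = (roi * cols).sum(axis=(1, 2)) / total

    # Intensity-weighted variance about the centroid
    var_y = (roi * (rows - com_y[:, None, None]) ** 2).sum(axis=(1, 2)) / total
    var_x = (roi * (cols - com_x[:, None, None]) ** 2).sum(axis=(1, 2)) / total

    return {
        "std": roi.std(axis=(1, 2)),
        "intensity": roi.mean(axis=(1, 2)),
        "com_x": com_x,
        "com_y": com_y,
        "sigma_x": np.sqrt(var_x),
        "sigma_y": np.sqrt(var_y),
    }


# Synthetic stack: 3 frames with one bright pixel at row 5, column 7
frames = np.zeros((3, 20, 30))
frames[:, 5, 7] = 10.0
stats = roi_statistics(frames, v_start=2, h_start=3, v_size=10, h_size=10)
print(stats["com_y"], stats["com_x"])
```

Each entry is a 1D array with one value per frame, so every statistic is itself a scalar time series.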


### Get the largest `"blob"` roi and calculate statistics

In [None]:
roi = "blob"
Vpix, Hpix,  Vsize, Hsize = rois["blob"]
input_arr = ts.make_input_array(imgs, Vpix, Hpix, Vsize, Hsize)
data_dict = ts.get_data(input_arr, f'{scans[0]}_{roi}',  "no_induced")
series = pd.Series(data_dict)

In [None]:
series

### Compute 93 features
- higher order correlations of our main statistics
- prepare for entry into our 3 models
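
To give a feel for what such features look like, here is a minimal sketch that reduces a few time series to scalar summaries and pairwise correlations. The handful of features below is an illustrative assumption only. It is not the set of 93 that `get_features_single_datum` computes, and the synthetic series are made up.

```python
import numpy as np


def summary_features(series_by_name):
    """Flatten several equal-length time series into one feature dict."""
    features = {}
    for name, values in series_by_name.items():
        values = np.asarray(values, dtype=float)
        features[f"{name}_mean"] = values.mean()
        features[f"{name}_std"] = values.std()
        # Lag-1 autocorrelation: how strongly each frame resembles the previous one
        features[f"{name}_autocorr1"] = np.corrcoef(values[:-1], values[1:])[0, 1]

    names = list(series_by_name)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            # Pearson correlation between two different statistics
            features[f"{first}_x_{second}_corr"] = np.corrcoef(
                series_by_name[first], series_by_name[second])[0, 1]
    return features


# Synthetic series with a slow drift plus noise
rng = np.random.default_rng(0)
frame_index = np.arange(200)
demo = {
    "intensity": 100 + 0.05 * frame_index + rng.normal(0, 1, frame_index.size),
    "com_x": 50 + 0.01 * frame_index + rng.normal(0, 0.1, frame_index.size),
}
features = summary_features(demo)
print(len(features), "features")
```

The result is a single flat row of numbers per ROI and scan, the same shape of input that `pd.DataFrame([features])` builds for the models.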

In [None]:
features = get_features_single_datum(series) #from  pub-ML_examples.anomaly on github for bnl


In [None]:
df = pd.DataFrame([features])

In [None]:
df

In [None]:
new_data = (df.drop(columns=["target", "roi"]))

### Input data into our models

In [None]:
for method, model in models.items():
    print(f'{method}:\t', end="")
    make_the_model_go(model, new_data)

In [None]:
fig = plt.figure()
plt.plot(data_dict['intensity_ts'], label=f'{df["roi"][0]}')#missing first 50')
plt.legend()

In [None]:
roi_list = list(rois.items())

In [None]:
roi_list

## Applying the same work flow to all ROIs (6)

In [None]:
img_start, img_end = 0, imgs.shape[0] #ALL OF THEM
#img_start, img_end = 50, 150 # just the middle


prediction_results = {'EEpr':[], 'IFTpr':[],'LODpr':[],} 
roi_data_dict = {}

for i, r_list in enumerate(roi_list):
    data, data_dict, meas_label = ts.prep_model_input(imgs[img_start:img_end ], r_list, scan, class_label)
    roi_data_dict.update({i:data_dict})
    for method, model in models.items():
        #print(f'{method}:\t', end="")
        prediction = model.predict(data)
        prediction_results[method+'pr'].append(prediction[0])

In [None]:
df_predictions =  pd.DataFrame(prediction_results)
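
`df_predictions` holds one row per ROI, in the order of `roi_list`, with `-1` / `1` entries. A small sketch of turning such a table into readable labels keyed by ROI name follows. The prediction values are placeholders, not results from the models.

```python
# Placeholder predictions (one entry per ROI), not real model output
roi_names = ["blob", "blob_50", "corner"]
prediction_results = {"EEpr": [1, 1, -1], "IFTpr": [1, -1, -1], "LODpr": [1, 1, 1]}


def to_label(value):
    # Same convention as make_the_model_go: -1 means anomaly
    return "anomaly" if value == -1 else "normal"


labelled = {
    name: {method: to_label(values[row]) for method, values in prediction_results.items()}
    for row, name in enumerate(roi_names)
}
for name, row in labelled.items():
    print(f"{name:>8}: {row}")
```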

In [None]:
fig, axes = plt.subplots(6, figsize=(10, 15), sharex=True)
plt.suptitle(scan)
for roi_i, data_dict in roi_data_dict.items():
    color = next(colors)
    roi_name, _ = roi_list[roi_i]
    for i, key in enumerate(data_dict.keys()):
        if i < 6:
            ax = axes[i]
            ax.plot((data_dict[key] - np.mean(data_dict[key])), color=color, label = roi_name)
        

for ax, key in zip(axes, data_dict.keys()):
    ax.set(title=key, ylabel = f'minus average')
ax.legend(bbox_to_anchor=(1,1))
ax.set_xlabel('frames')

In [None]:
df_predictions

### Comment 1 on "no induced" anomalies scan 154685

* LOD looks promising
* clear why the corner is characterized as an anomaly
* maybe we can down-select images to get better predictions (frames 50 to 150?)
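
Down-selecting is just a slice along the frame axis before the statistics are computed. Here is a minimal sketch with a synthetic intensity trace, showing how a window such as frames 50 to 150 changes a summary statistic. The settling transient and all of the numbers are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames = 200
frame_index = np.arange(n_frames)

# Synthetic trace: steady level plus a decaying transient in the first frames
intensity = 100 + 20 * np.exp(-frame_index / 10) + rng.normal(0, 0.5, n_frames)

img_start, img_end = 50, 150
window = intensity[img_start:img_end]

print(f"all frames:    std = {intensity.std():.2f}")
print(f"frames 50-150: std = {window.std():.2f}")
```

If the features are dominated by a start-up transient like this one, a model can flag an otherwise steady scan, which is one reason to try the narrower window.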

In [None]:
roi_list

### Comment 2 on "no induced" anomalies scan 154685

**Using frames = 50:150**
* LOD looks to be most permissive 
* IFT may be more flexible
* EE characterizes low-intensity, small areas as "no_induced"

# Let's try a different scan
<img src='figures/oops.jpg' width="400" height="400">

```python
0 154685 ['dif_beam_hdf5'] laser stability no induced 
1 154686 ['dif_beam_hdf5'] laser stability slt3 moved ********
2 154687 ['dif_beam_hdf5'] laser stability slt3 move  ********
3 154688 ['dif_beam_hdf5'] laser stability LEDs on/off
4 154689 ['dif_beam_hdf5'] laser stability pinhole move out
5 154690 ['dif_beam_hdf5'] laser stability 0.5 intensity steady (NO INDUCED, but low) ********
6 154691 ['dif_beam_hdf5'] laser stability random up and down intensity
7 154692 ['dif_beam_hdf5'] laser stability >0.5 intensity steady  (NO INDUCED, but low)  ********
8 154693 ['dif_beam_hdf5'] laser stability random up and down intensity
9 154694 ['dif_beam_hdf5'] laser stability sudden off
10 154695 ['dif_beam_hdf5'] laser stability sudden on
11 154696 ['dif_beam_hdf5'] laser stability sudden on
```

In [None]:
scan, class_label = scans[9], None#"anomaly" #"no induced"

run_data = csx[scan]['primary']['data']['dif_beam_hdf5_image'][:, :, 400:1200, :1200].compute()
imgs = ts.get_images_from_tiled(run_data)

# Set the frame range after loading so that it matches the new scan
img_start, img_end = 0, imgs.shape[0] #ALL OF THEM
#img_start, img_end = 50, 150 # just the middle

In [None]:
prediction_results = {'EEpr':[], 'IFTpr':[],'LODpr':[],}
roi_data_dict = {}

for i, r_list in enumerate(roi_list):
    data, data_dict, meas_label = ts.prep_model_input(imgs[img_start:img_end ], r_list, scan, class_label)
    roi_data_dict.update({i:data_dict})
    for method, model in models.items():
        #print(f'{method}:\t', end="")
        prediction = model.predict(data)
        prediction_results[method+'pr'].append(prediction[0])

In [None]:
fig, axes = plt.subplots(6, figsize=(10, 15), sharex=True)
plt.suptitle(scan)
for roi_i, data_dict in roi_data_dict.items():
    color = next(colors)
    roi_name, _ = roi_list[roi_i]
    for i, key in enumerate(data_dict.keys()):
        if i < 6:
            ax = axes[i]
            ax.plot((data_dict[key] - np.mean(data_dict[key])), color=color, label = roi_name)
        

for ax, key in zip(axes, data_dict.keys()):
    ax.set(title=key, ylabel = f'minus average')
ax.legend(bbox_to_anchor=(1,1))
ax.set_xlabel('frames')

df_pr_res=pd.DataFrame(prediction_results)
df_pr_res

### Collaborators in the Presented Model
- **Tatiana Konstantinova** *the models in this tutorial*
- Phillip Maffettone
- Stuart Campbell
- Bruce Ravel
- Daniel Olds

### Collaborators in LDRD 20-038 “Machine Learning for Real-Time Data Fidelity, Healing, and Analysis for Coherent X-ray Synchrotron Data”
- **Tatiana Konstantinova**
- **Anthony DeGennaro**
- Hui Chen
- Lutz Wiegart
- Maksim Rakitin