<img src='../source_figures/bnl_logo_horizontal_rgb.png' width="400" height="400">

# A la carte analysis at jupyter.nsls2 with tiled

### Andi Barbour, Soft X-ray Scattering and Spectroscopy, CSX, NSLS-II

### NSLS-II and CFN Users' Meeting 2020
Workshop 6

Notebook #1

In [None]:
from matplotlib import cm,  pyplot as plt
from itertools import cycle
import numpy as np
import pandas as pd
import time
from sys import getsizeof

# Explore data acquired by `bluesky` using `tiled`
## 1D scans from plans like `rel_scan()` and `scan()`

[current bluesky documentation](https://blueskyproject.io/bluesky/)

[current tiled documentation](https://blueskyproject.io/tiled/)

In [None]:
from tiled.client import from_uri
from databroker.queries import TimeRange, RawMongo

c = from_uri("https://tiled-demo.blueskyproject.io/api")
csx = c["csx"]["raw"]

## get all of the data collected by bluesky
**SCANS**
   - list of integers (scan numbers)
   - `"scan_id"` is a scan number
   - alternatively, runs can be retrieved by `"uid"`
   ```python
       
       scans = [150959, 150960, 150961]
       
       uids = ['851a80bc',  '02bb6652', '02bb6652']
   ```
    
**RUNS**
   - catalog of data entries
   - each entry corresponds to a scan collected by bluesky
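As a rough mental model only (this is not how `tiled` or `databroker` are implemented), a catalog can be pictured as a mapping from full `uid`s to runs that can also be searched by integer `scan_id` or by a unique `uid` prefix. The run records and the `lookup_run` helper below are hypothetical. Because `scan_id` is a counter that can repeat, the `uid` is the unambiguous identifier; this sketch simply refuses ambiguous keys.

```python
# Hypothetical stand-in for a catalog of runs: full uid -> start-document summary.
# The uids and scan_ids are made up for illustration.
catalog = {
    "851a80bc-0000-4000-8000-000000000001": {"scan_id": 150959},
    "02bb6652-0000-4000-8000-000000000002": {"scan_id": 150960},
}


def lookup_run(catalog, key):
    """Return (uid, run) for an integer scan_id or a unique uid prefix (hypothetical helper)."""
    if isinstance(key, int):
        matches = [(uid, run) for uid, run in catalog.items() if run["scan_id"] == key]
    else:
        matches = [(uid, run) for uid, run in catalog.items() if uid.startswith(key)]
    if len(matches) != 1:
        raise KeyError(f"{key!r} matched {len(matches)} runs")
    return matches[0]


print(lookup_run(catalog, 150960)[0][:8])   # 02bb6652
print(lookup_run(catalog, "851a80bc")[1])   # {'scan_id': 150959}
```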

In [None]:
scans = [150959, 150960, 150961]

In [None]:
runs = csx[scans]
runs
list(runs)

## **BUT** how do we scale this to REAL experiments that run over many days?

### An example experiment
- maybe this is yours
- but it could be from someone who handed off their work
- or COVID required the beamline staff to perform your experiment


[an experimental summary with jupyter notebooks - data retrieved from raw bluesky data](https://github.com/ambarb/UM2022_NSLS-II_CFN_beamtime_summary/blob/main/CSX_2021_12_17_summary.ipynb)

In [None]:
runs = csx.search(TimeRange(since = '2021-12-17 13:00:00', until = '2021-12-19 23:00:00'))
runs

## **But** a manageable amount of it

In [None]:
runs = csx.search(TimeRange(since = '2021-12-17 13:00:00', until = '2021-12-19 23:00:00')).search(RawMongo(start={"purpose": 'sx center 1 T'}))
print(type(runs))
runs

### python lists aren't for your pet snake 




<img src='../source_figures/pythonpet.png' width="200" height="200">

* ordered collection of items (often of the same type, but mixed types are allowed)
* return elements by position
* return all elements using `list()`

```python
my_list = ["item_1", "item_2", "item_3"]
list(my_list)
```


[official reference](https://docs.python.org/3/tutorial/introduction.html#lists)

[official python glossary](https://docs.python.org/3/glossary.html#term-list)
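A short sketch of the points above: items are returned by position (negative positions count from the end), slices return sub-lists, and `list()` copies any iterable into a new list. The values are arbitrary.

```python
# A list keeps its items in order; the items do not have to share a type.
my_list = ["item_1", 2, 3.0, None]

print(my_list[0])      # first element by position: item_1
print(my_list[-1])     # last element: None
print(my_list[1:3])    # slice of positions 1 and 2: [2, 3.0]

# list() builds a new list from any iterable, e.g. the characters of a string
print(list("abc"))     # ['a', 'b', 'c']
```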

### python dictionaries work exactly like you expect

<img src='../source_figures/pexels-pixabay-267669.jpg' width="200" height="200">

* look up a "word" (string or numerical `key`)
* learn its "meaning" (return a `value`)
* sometimes you have to then look up a 2nd "word" found in this "meaning" (`nested dictionary`)

```python
my_dictionary = {'my_key_word':'my_value_meaning'}
```


[official reference](https://docs.python.org/3/tutorial/datastructures.html?highlight=dictionary#dictionaries)

[official python glossary](https://docs.python.org/3/glossary.html#term-dictionary)

[blog post](https://towardsdatascience.com/python-dictionaries-651acb069f94)
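A run's metadata behaves much like a nested dictionary: the first key selects a document, a second key selects a field inside it. The dictionary below is a hypothetical, hand-written stand-in (keys chosen to mirror ones used later in this notebook, values invented), not data returned by `tiled`.

```python
# Hypothetical, hand-written stand-in for run metadata (values are invented).
run_metadata = {
    "start": {"scan_id": 150959, "purpose": "sx center 1 T", "detectors": ["dif_beam"]},
    "stop": {"exit_status": "success"},
}

print(list(run_metadata))                   # top-level "words": ['start', 'stop']
print(run_metadata["start"]["scan_id"])     # nested lookup: 150959

# .get() returns a default instead of raising KeyError when a key is absent
print(run_metadata["start"].get("sample_name", "not recorded"))
```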

In [None]:
print(type(runs))
runs

In [None]:
print(runs.keys() ,"\n")

print(runs.values() ,"\n")


## is anyone as excited as I was when I discovered this?

### this meaning....


<img src='../source_figures/cookie.png' width="200" height="200">

In [None]:
len(runs)

### A `tiled` feature vs. `databroker v1` search.......

**Previously, retrieving was similar:**
```python
headers = db(since = '2021-12-17 13:00:00', until = '2021-12-19 23:00:00',purpose='sx center 1 T')
```

**But the length was not known:**
```python
len(headers)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [83], in <cell line: 1>()
----> 1 len(headers)

TypeError: object of type 'Results' has no len()
```

**Iterating through headers takes time**
*especially if you just want to narrow down your search*
```python
for i, h in enumerate(headers):
    pass
print(i)

82

```
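The same behavior can be reproduced in plain Python: a lazy generator (loosely analogous to the v1 `Results` object, not its actual implementation) has no `len()`, so counting means consuming it, while a sized container answers immediately. `fake_search` is a hypothetical function.

```python
def fake_search(n):
    """Hypothetical lazy search: yields results one at a time."""
    for i in range(n):
        yield f"header_{i}"


results = fake_search(5)
try:
    len(results)
except TypeError as err:
    print(err)                        # object of type 'generator' has no len()

count = sum(1 for _ in results)       # counting requires iterating over every result
print(count)                          # 5
print(sum(1 for _ in results))        # 0: the generator is now used up

sized = list(fake_search(5))          # a sized container knows its length up front
print(len(sized))                     # 5
```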
    

### what are the parts of a run?
(the data associated with a bluesky "scan")

### how can we inspect this thing that acts like a list or dictionary?
- how can you access the data from the first scan?
- what can you plot?
- did you find the `"uids"` or `"scan_id"`?
- are there more search keys for `MongoDB queries`?

### **possible solutions**

In [None]:
list(runs);

In [None]:
run = runs[0]
run

In [None]:
list(run)

In [None]:
list(run["primary"])

In [None]:
print(list(run.metadata))
print(list(run.metadata.keys()))
#run.metadata
#run.metadata["start"]

## `run.metadata["start"]` (also available as `run.start`) is:
* configurable by beamline staff and the users
* the only document that is currently searchable
* `FullText` search is available, but it is not suitable when common string combinations are used


### **WARNING FOR METADATA**: *key names are difficult to enforce in the start document*
<img src='../source_figures/warning.png' width="200" height="200">

* "global" metadata (`RE.md`)
    - ask before changing (maybe everyone needs it)
    - some keys will become enforced
    - https://blueskyproject.io/bluesky/metadata.html#interactively-for-repeated-use
* per scan metadata
    - the database cannot be edited afterward, so check scripts carefully
        - add `print()` statements to plans and use `check_limits()` to "preview" them
    - https://blueskyproject.io/bluesky/metadata.html#interactively-for-one-use
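To see why inconsistent key names hurt, here is a pure-Python stand-in for an exact-match search over start documents (not how MongoDB or `RawMongo` work internally). The documents are invented; note that one differently capitalized key silently drops a run from the results.

```python
from collections import Counter

# Invented start documents; the third uses a differently capitalized key.
start_docs = [
    {"scan_id": 1, "purpose": "sx center 1 T"},
    {"scan_id": 2, "purpose": "sx center 1 T"},
    {"scan_id": 3, "Purpose": "sx center 1 T"},
]


def search_start(docs, **criteria):
    """Return docs whose fields equal every key/value in criteria (exact match)."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


hits = search_start(start_docs, purpose="sx center 1 T")
print([d["scan_id"] for d in hits])    # [1, 2]: scan 3 is missed

# A quick audit of which key spellings actually occur across the documents
key_counts = Counter(k for d in start_docs for k in d)
print(key_counts)
```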

### take a minute to explore the data structure of `run`

### **possible solutions**

In [None]:
list(run.metadata["start"])

In [None]:
print(list(run))

In [None]:
print(list(run['primary']))
print(list(run.primary))

### the top 2 most used data keys

In [None]:
print(run.metadata["start"]["scan_id"], end='\n\n')

list(run.primary.data)

### my favorite is the `"baseline"` data stream
[4 lines of code to configure and record `baseline`](https://blueskyproject.io/bluesky/tutorial.html?highlight=baseline#baseline-readings-and-other-supplemental-data)

In [None]:
list(run.baseline.data)

## show me the data
### iterate through the runs to get a top level view

**taking advantage of baseline** *(the readings before and after primary data collection)*

In [None]:
start_time = time.time()

n_plots_col = 3
fig_summary, axes = plt.subplots(1,n_plots_col, figsize=(5*n_plots_col, 5))
for run in runs.values():
    scan = run.start["scan_id"]
    data_bl  = run["baseline"]["data"].read(['stemp_temp_B_T','sx']) #makes multi-D Xarray, not a large pandas DataFrame
    
    sample_T = data_bl['stemp_temp_B_T']
    sample_x = data_bl['sx']
    
    ax =  axes[0]
    ax.plot(scan,            sample_T.mean(),'o') ; ax.set(xlabel = 'scan_id',      ylabel = sample_T.name)
    
    ax = axes[1]
    ax.plot(scan,            sample_x.mean(),'o') ; ax.set(xlabel = 'scan_id',      ylabel=sample_x.name)
    
    ax = axes[2]
    ax.plot(sample_T.mean(), sample_x.mean(),'o') ; ax.set(xlabel = sample_T.name, ylabel = sample_x.name)

tiled_run_time = time.time() - start_time
print(f'Run time = {tiled_run_time/60:.2f} minutes')

### Overall
- `sx` changing as a function of sample temperature (`stemp_temp_B_T`)
- Looks like there could be failed or problematic scans
- Probably can fit `sx`(`stemp_temp_B_T`) 
    - assuming the individual scans occurred without incident
    - if we can put all data in 1 data structure
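Once the per-scan values sit in one structure, a first-pass model of `sx` versus temperature could be an ordinary least-squares straight line. This is a minimal sketch that assumes the dependence is roughly linear over the measured range; the `fit_line` helper and the numbers are hypothetical, not results from this experiment.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Invented example: temperatures (K) and sample positions (mm) lying on an exact line
temps = [20.0, 40.0, 60.0, 80.0]
positions = [1.00, 1.02, 1.04, 1.06]
slope, intercept = fit_line(temps, positions)
print(f"slope = {slope:.4f} mm/K, intercept = {intercept:.3f} mm")
# slope = 0.0010 mm/K, intercept = 0.980 mm
```

With real data, the per-scan temperature spread could later be used to weight the fit; this sketch treats all points equally.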

### Let's look at the last scan

In [None]:
run.start["scan_id"]

In [None]:
plt.figure()
# GET DATA TO PLOT
data = run["primary"]["data"].read()  # NOTE TO USER: see what happens if you remove .read() from this line
X = data["sx"]
Y = data["dif_beam_stats3_total"]
plt.plot(X, Y)

# PLOT AXES LABELS AND TITLE
plt.ylabel(Y.name); plt.xlabel(X.name); 
plt.title(f'{run.start["scan_id"]}\n{run.start["uid"][0:8]}') 

In [None]:
print(type(data))
print(list(data))
print(getsizeof(data))
data

### REMEMBER
the summary notebook said **ROI 3** (`'dif_beam_stats3_total'`) should be used

In [None]:
data['sx']

In [None]:
data['sx'].attrs['units_string']
#plt.xlabel(f'{X.name} {X.attrs["units_string"]}')


**databroker V1 access**
```python
db[150966].descriptors[0]["configuration"]["sx"]["data"]["sx_motor_egu"]
```

### Let's improve
- drag less data around
- add units without hard-coding
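One way to avoid hard-coding units is a small helper that reads them from a variable's `attrs` and falls back to the bare name when they are missing. The helper name, the bracket style, and the fallback are assumptions; `units_string` is the attribute key used in this notebook.

```python
def axis_label(name, attrs, units_key="units_string"):
    """Build an axis label like 'sx [mm]' from a name and an attrs mapping (hypothetical helper)."""
    units = attrs.get(units_key)
    return f"{name} [{units}]" if units else name


print(axis_label("sx", {"units_string": "mm"}))    # sx [mm]
print(axis_label("dif_beam_stats3_total", {}))     # dif_beam_stats3_total
```

In the plotting cells this could be used as `plt.xlabel(axis_label(X.name, X.attrs))`, assuming `X` is an xarray `DataArray` whose `attrs` is a dictionary.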

In [None]:
plt.figure()
# GET DATA TO PLOT
data = run["primary"]["data"]
X = data["sx"]
Y = data["dif_beam_stats3_total"]
plt.plot(X, Y)

# PLOT AXES LABELS AND TITLE
plt.ylabel(Y.name); plt.xlabel(X.name); 
plt.title(f'{run.start["scan_id"]}\n{run.start["uid"][0:8]}') 

### possible solution

In [None]:
plt.figure()
# GET DATA TO PLOT
data = run["primary"]["data"].read(["sx","dif_beam_stats3_total","stemp_temp_B_T"]) #makes multi-D Xarray, not a large pandas DataFrame 
X = data["sx"]
Y = data["dif_beam_stats3_total"]
plt.plot(X, Y)

# PLOT AXES LABELS AND TITLE
plt.ylabel(Y.name); plt.xlabel(X.name+" , "+data['sx'].attrs['units_string'])
plt.title(f'{run.start["scan_id"]}\n{run.start["uid"][0:8]}')

### Which temperature is this?


### Is it in the `primary` stream?

In [None]:
data["stemp_temp_B_T"]

In [None]:
type(data["stemp_temp_B_T"])

In [None]:
print('test it {}.'.format( data.mean()["stemp_temp_B_T"] ) )

In [None]:
data.mean()

In [None]:
print("pandas dataframe\t", data["stemp_temp_B_T"].to_dataframe().mean().iloc[0])
print("numpy array\t\t", np.mean(data["stemp_temp_B_T"].to_numpy()) )
#data["stemp_temp_B_T"].to_dataset().mean()

In [None]:
temperature=data["stemp_temp_B_T"]
Tavg, Tstd = np.mean(temperature.to_numpy()), np.std(temperature.to_numpy())

### Let's improve more
- add temperature of scan
- apply numerical derivative
    - fit the peak and find the center
    - peak FWHM provides some measure of resolution
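The derivative-and-peak steps above can be sketched without pandas: take finite differences, place each one at the midpoint of its interval, then estimate the peak center and FWHM by linearly interpolating the half-maximum crossings. This is a simple sketch (single positive peak assumed, no fitting, no noise handling, no repeated x values); the function names and sample data are hypothetical.

```python
def midpoint_derivative(xs, ys):
    """Finite-difference dy/dx, each value placed at the midpoint of its x interval."""
    xm = [(x0 + x1) / 2 for x0, x1 in zip(xs, xs[1:])]
    dy = [(y1 - y0) / (x1 - x0) for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]
    return xm, dy


def peak_center_fwhm(xs, ys):
    """Return (x at the maximum, FWHM) of a single positive peak.

    Half-maximum crossings are found by linear interpolation; the curve
    must fall below half maximum on both sides of the peak.
    """
    i_max = max(range(len(ys)), key=ys.__getitem__)
    half = ys[i_max] / 2

    # Walk left while still at or above half maximum, then interpolate the crossing.
    i = i_max
    while i > 0 and ys[i - 1] >= half:
        i -= 1
    if i == 0:
        raise ValueError("no left half-maximum crossing")
    x_left = xs[i - 1] + (half - ys[i - 1]) * (xs[i] - xs[i - 1]) / (ys[i] - ys[i - 1])

    # Same on the right side.
    j = i_max
    while j < len(ys) - 1 and ys[j + 1] >= half:
        j += 1
    if j == len(ys) - 1:
        raise ValueError("no right half-maximum crossing")
    x_right = xs[j] + (ys[j] - half) * (xs[j + 1] - xs[j]) / (ys[j] - ys[j + 1])

    return xs[i_max], x_right - x_left


# Invented edge-like scan: the signal rises from 0 to 10 between x = 1 and x = 4.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 3, 9, 10, 10]
xm, dy = midpoint_derivative(x, y)
center, fwhm = peak_center_fwhm(xm, dy)
print(f"center = {center:.2f}, FWHM = {fwhm:.2f}")   # center = 2.50, FWHM = 1.60
```

Interpolating at half maximum is cruder than fitting a peak model, but it needs no starting guesses, which makes it a convenient first check before fitting.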

In [None]:
fig, axes = plt.subplots(1,2, figsize=(10,5))
# GET DATA TO PLOT
data = run["primary"]["data"].read(["sx","dif_beam_stats3_total","stemp_temp_B_T"]) #makes multi-D Xarray, not a large pandas DataFrame 
X = data["sx"]
Y = data["dif_beam_stats3_total"]
ax = axes[0]
ax.plot(X, Y)

# PLOT AXES LABELS AND TITLE
ax.set_ylabel(Y.name); ax.set_xlabel(X.name+" , "+data['sx'].attrs['units_string'])
ax.set_title(f'{run.start["scan_id"]}\n{run.start["uid"][0:8]}')
ax.legend()

ax = axes[1]
### Calculate the numerical derivative and plot it

### **possible solution**

In [None]:
fig, axes = plt.subplots(1,2, figsize=(10,5))
# GET DATA TO PLOT
data = run["primary"]["data"].read(["sx","dif_beam_stats3_total","stemp_temp_B_T"]) # REMEMBER makes multi-D Xarray, not a large pandas DataFrame like databroker V1
X = data["sx"]
Y = data["dif_beam_stats3_total"]
ax = axes[0]
ax.plot(X, Y, label=rf'{Tavg:.1f} $\pm$ {Tstd:.3f}  {temperature.attrs["units_string"]}')  # raw f-string keeps the LaTeX \pm intact

# PLOT AXES LABELS AND TITLE AND LEGEND
ax.set_ylabel(Y.name); ax.set_xlabel(X.name+" , "+data['sx'].attrs['units_string'])
ax.set_title(f'{run.start["scan_id"]}\n{run.start["uid"][0:8]}')
ax.legend()

ax = axes[1]
### USE PANDAS DATAFRAME for numerical derivative
ax.set_title('numerical derivative')
Xdf = X.to_dataframe().reset_index()[X.name]   # don't want "time" to be the index
Ydf = Y.to_dataframe().reset_index()[Y.name]
ax.plot(Xdf.rolling(window=2).mean(), Ydf.diff()/Xdf.diff())
ax.grid(True)

print(f'Peak maximum at {(Ydf.diff()/Xdf.diff()).idxmax():^5} point')
print(f'Peak maximum at {Xdf[(Ydf.diff()/Xdf.diff()).idxmax()]:^5.3f} for {Xdf.name}')


### Before we fit all this data
**are there problems in the data?**
- inconsistencies in what was recorded
    - exposure times
    - added or missing "detectors" or signals
**are there incomplete scans?**
- too small scan range 
- bluesky exception

**where do we look, or can we just plot the key parameters?**
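A quick way to spot inconsistencies is to tabulate one recorded setting per run and flag runs that differ from the most common value. The per-run records are invented and `flag_outliers` is a hypothetical helper; in the notebook the values would come from places such as `run.primary.config["dif_beam"]` or `run.metadata["start"]["detectors"]` (lists must be turned into tuples before they can be counted).

```python
from collections import Counter

# Invented per-run records: scan_id -> recorded settings
records = {
    101: {"acquire_time": 0.1, "detectors": ("dif_beam",)},
    102: {"acquire_time": 0.1, "detectors": ("dif_beam",)},
    103: {"acquire_time": 0.5, "detectors": ("dif_beam",)},
    104: {"acquire_time": 0.1, "detectors": ("dif_beam", "hypothetical_det")},
}


def flag_outliers(records, field):
    """Return (most common value, scan_ids whose value differs) for one field."""
    counts = Counter(rec[field] for rec in records.values())
    typical, _ = counts.most_common(1)[0]
    odd = [scan for scan, rec in records.items() if rec[field] != typical]
    return typical, odd


for field in ("acquire_time", "detectors"):
    typical, odd = flag_outliers(records, field)
    print(f"{field}: typical={typical!r}, differs in scans {odd}")
# acquire_time: typical=0.1, differs in scans [103]
# detectors: typical=('dif_beam',), differs in scans [104]
```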


In [None]:
list(run.primary.config["dif_beam"])

In [None]:
%matplotlib widget

In [None]:
print(f'{len(runs)} possible good scans')

### CHOOSE YOUR PLOT when you return only good data
- all data, with bad marked `##### no change...` *lines 18 - 33*
- only bad data  `##### MOVE BLOCK INDENTION +2...` *lines 18 - 33*
- only completed `#### UNCOMMENT this line to...` *line 16*
- only good data

*what is good versus bad data?*
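One possible definition of "good" can be written down as explicit rules: a stop document exists, its `exit_status` is `"success"`, and the primary stream holds at least some minimum number of points (bluesky stop documents record `exit_status` and a per-stream `num_events` count). The threshold, the labels, and the `classify_run` helper are assumptions for illustration; the notebook's own loop checks only the exit status and where the temperature signal was recorded.

```python
def classify_run(stop_doc, min_points=10):
    """Label a run 'good', 'incomplete', or 'failed' from its stop document (hypothetical rules)."""
    if stop_doc is None:                           # no stop document: run never closed cleanly
        return "failed"
    if stop_doc.get("exit_status") != "success":   # e.g. "abort" or "fail"
        return "failed"
    n_primary = stop_doc.get("num_events", {}).get("primary", 0)
    if n_primary < min_points:                     # too few points to trust a fit
        return "incomplete"
    return "good"


# Invented stop documents
examples = {
    1: {"exit_status": "success", "num_events": {"primary": 41, "baseline": 2}},
    2: {"exit_status": "abort", "num_events": {"primary": 12, "baseline": 2}},
    3: {"exit_status": "success", "num_events": {"primary": 3, "baseline": 2}},
    4: None,
}
print({scan: classify_run(doc) for scan, doc in examples.items()})
# {1: 'good', 2: 'failed', 3: 'incomplete', 4: 'failed'}
```

Assuming `run.stop` returns the stop document (or `None` when it is missing), this could be applied as `classify_run(run.stop)` inside the loop below.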

In [None]:
colors = cycle(plt.get_cmap('viridis')(np.linspace(0, 1, len(runs))))  # cm.get_cmap was removed in newer matplotlib
scans_final = []
plot_it = True

fig, axes = plt.subplots(1,2, figsize=(15,5), sharex=True)
for run in runs.values():
    if run.stop is not None and run.stop["exit_status"] == "success":  # runs without a stop document count as failed
        scans_final.append(run.start["scan_id"]) # add successful scan to the list of potentially good scans
        color=next(colors) #advance marker color
        plot_it = True
        
        
    else:
        color = 'r'
        mfc = 'w'
        plot_it = False   #### UNCOMMENT this line to  (** plot bluesky completed scans **)
        
    ##### MOVE BLOCK INDENTION +2 tabs and keep #plot_it commented out (**bad data plot only**)
    ##### MOVE BLOCK INDENTION +1 tab  and keep #plot_it commented out (**    all data      **)
    try:
        temperature = run.primary.data["stemp_temp_B_T"]
        mfc = color #marker face color
    except KeyError:  # signal not recorded in the primary stream, fall back to baseline
        temperature = run.baseline.data["stemp_temp_B_T"]
        mfc = 'w'  
        #plot_it = False  ### MAYBE THIS IS BAD DATA
    if plot_it:
        Tavg, Tstd = np.mean(temperature.to_numpy()), np.std(temperature.to_numpy())
        axes[0].plot(run.start["scan_id"], run.primary.config["dif_beam"]["dif_beam_cam_acquire_time"],'o', c=color, ms=15, )
        axes[0].set(ylabel='exposure time', xlabel='scan_id')
        axes[1].plot(run.start["scan_id"], Tavg, 'o', c=color, mfc=mfc, ms=15, )
        axes[1].set(ylabel='temperature', xlabel='scan_id')
     #######

        
print(f'{len(scans_final)} of {len(runs)} possible good scans')

### Let's see what we have without overwriting the starting dataset

In [None]:
runs_final = csx[scans_final]
runs_final[0:7]

In [None]:
[ run.metadata["start"]["detectors"] for run in runs_final[0:7]]

In [None]:
[ list(run.primary["data"]) for run in runs_final[0:7] ]

In [None]:

def get_deriv_max(Xdataframe, Ydataframe):
    """Return the X value where dY/dX is largest, plus the derivative Series."""
    Yd_der = Ydataframe.diff() / Xdataframe.diff()   # finite differences; the first entry is NaN
    Xd_max = Xdataframe[Yd_der.idxmax()]            # X at the upper end of the steepest step
    return Xd_max, Yd_der

In [None]:
colors = cycle(plt.get_cmap('viridis')(np.linspace(0, 1, len(runs_final))))  # one color per plotted run
my_Tavg = []
my_Xinfl= []
my_Tstd = []

fig, axes = plt.subplots(1,2, figsize=(10,5))                     ### ALL SCANS IN ONE PLOT
#fig, axes = plt.subplots(len(runs),2, figsize=(10,5*len(runs)))  ### INDIVIDUAL PLOTS FOR EACH SCAN
for run in runs_final:
#for i, run in enumerate(runs):
    color = next(colors)
    ########## SHAMELESS COPY FROM ABOVE ##############
    #
    try:
        temperature = run.primary.data["stemp_temp_B_T"]
    except KeyError:  # signal not recorded in the primary stream, fall back to baseline
        temperature = run.baseline.data["stemp_temp_B_T"]
    Tavg, Tstd = np.mean(temperature.to_numpy()), np.std(temperature.to_numpy())
    #
    ########## SHAMELESS COPY FROM ABOVE ##############
    # With custom color marker
    #
    
    # GET DATA TO PLOT
    data = run["primary"]["data"].read(["sx","dif_beam_stats3_total","stemp_temp_B_T"]) # REMEMBER makes multi-D Xarray, not a large pandas DataFrame like databroker V1
    X = data["sx"]
    Y = data["dif_beam_stats3_total"]
    ax = axes[0]        ### ALL SCANS IN ONE PLOT
    #ax = axes[i,0]     ### INDIVIDUAL PLOTS FOR EACH SCAN
    ax.plot(X, Y, label=rf'{Tavg:.1f} $\pm$ {Tstd:.3f}  {temperature.attrs["units_string"]}', color=color)  # raw f-string keeps \pm intact

    # PLOT AXES LABELS AND TITLE AND LEGEND
    ax.set_ylabel(Y.name); ax.set_xlabel(X.name+" , "+data['sx'].attrs['units_string'])
    ax.set_title(f'{run.start["scan_id"]}\n{run.start["uid"][0:8]}')
    ax.legend()

    ax = axes[1]        ### ALL SCANS IN ONE PLOT
    #ax = axes[i,1]     ### INDIVIDUAL PLOTS FOR EACH SCAN
    
    ### USE PANDAS DATAFRAME for numerical derivative
    ax.set_title('numerical derivative')
    Xdf = X.to_dataframe().reset_index()[X.name]   # don't want "time" to be the index
    Ydf = Y.to_dataframe().reset_index()[Y.name]   # don't want "time" to be the index
    ax.plot(Xdf.rolling(window=2).mean(), Ydf.diff()/Xdf.diff(), color=color)
    ax.grid(True)
    
    ### ADDED EXTRACTION of inflection point (aka maximum of first derivative)
    ### ADDED EXTRACTION of temperature data
    Xinflection, _ = get_deriv_max(Xdf, Ydf)
    my_Tavg.append(Tavg)
    my_Xinfl.append(Xinflection)
    my_Tstd.append(Tstd)

runs_x_y_yerr =  {'sx':my_Xinfl , 'Tavg':my_Tavg, 'Tstd':my_Tstd  } #MAKE A DICTIONARY FROM LISTS

In [None]:
print(len(scans_final))
print(len(runs_final))
print(len(runs_x_y_yerr["sx"]))  # number of extracted inflection points (len of the dict itself is just its 3 keys)

In [None]:
plt.figure()
plt.errorbar(runs_x_y_yerr["Tavg"], runs_x_y_yerr["sx"], xerr=runs_x_y_yerr["Tstd"], marker='.', ls='none', capsize=5)  # Tstd is a temperature spread, so it belongs on the x axis
plt.xlabel('Temperature [K]')
plt.ylabel('Sample Position X [mm]')
plt.title('Thermal Expansion of Cryostat')
plt.grid(True)