In [None]:
from tiled.client import from_uri, from_profile
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib widget

1.  Diffraction 101
2.  What format is the data in right now?
3.  Read in some data (stored in files) and keep it in a pandas DataFrame
4.  Plot the data and do some simple reduction (subtract background)

5.  BaTiO3 dataset, read in, make nice plot
6.  Track some peaks, make a plot of peak as a function of position


In [None]:
c = from_uri("https://tiled-demo.blueskyproject.io/api")

Point to file-based powder diffraction data, which has already been reduced into 1-D patterns.

In [None]:
my_cat = c['um2022']['olds']
my_cat

We can see the 'node', but to see all of its entries we must cast it to a list.

In [None]:
list(my_cat)

Let's look at the 'long' measurement of LaB6

In [None]:
my_cat['LaB6_long']

Again, even though there is only a single entry in this node, we need to select it to go down another layer in the tree.

In [None]:
my_cat['LaB6_long']['LaB6_long_20210913-230727_322be6_0001_mean_tth']

We see at this point that we have a Pandas DataFrame with a column labeled 'I' for intensity.  As this is no longer a node, but a data layer, we can read this.

In [None]:
my_cat['LaB6_long']['LaB6_long_20210913-230727_322be6_0001_mean_tth'].read()

Or plot it directly via the pandas .plot method

In [None]:
my_cat['LaB6_long']['LaB6_long_20210913-230727_322be6_0001_mean_tth'].read().plot(figsize=(4,2))

While this is nice, you might be thinking that this is a very long name to type out.  It is a hassle to copy and paste a giant string like that every time you want to access something.

Say hello to indexers :D 

In [None]:
my_cat['LaB6_long'].values_indexer[0]

In [None]:
my_cat['LaB6_long']['LaB6_long_20210913-230727_322be6_0001_mean_tth']

The indexer returns the '0th' entry of the node, which is the same DataFrame that the full key gives us (and we can read or plot it just as before).

In [None]:
my_cat['LaB6_long'].values_indexer[0].read().plot(figsize=(4,2))

In addition to the values_indexer, there is keys_indexer - which returns the key name for that corresponding node.

In [None]:
my_cat['LaB6_long'].keys_indexer[0]

If you wanted both the keys and values, you can use the items_indexer to get a tuple of key and values.

In [None]:
my_cat['LaB6_long'].items_indexer[0]
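
As a mental model only, the three indexers behave like positional access into the keys, values, and items of an ordered mapping. The sketch below uses a plain Python dict as a stand-in; the names `node`, `scan_a`, and `scan_b` are hypothetical, and this is an analogy rather than how Tiled implements its indexers.

```python
# A plain dict standing in for a Tiled node (hypothetical keys and values).
node = {"scan_a": [1, 2, 3], "scan_b": [4, 5, 6]}

# Positional access, analogous to keys_indexer[0], values_indexer[0], items_indexer[0].
first_key = list(node.keys())[0]
first_value = list(node.values())[0]
first_item = list(node.items())[0]

print(first_key)    # the key name at position 0
print(first_value)  # the entry stored under that key
print(first_item)   # a (key, value) tuple
```

The practical difference is that the indexers let you fetch one position without first listing the whole node.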

Back to our data!  Let's see what other datasets we could plot.

In [None]:
list(my_cat)

ethanol_fast contains many short, fast measurements of ethanol in a Kapton capillary.

In [None]:
len(my_cat['ethanol_fast'])

Let's load up some of these and put them in a pandas DataFrame.
For simplicity, we'll label the columns of the combined DataFrame with sequential integers, and we'll only load the first 20 for now (to keep the time short).

In [None]:
ethanol_data = my_cat['ethanol_fast'].values_indexer[0].read()
for i in range(1,20):
#for i in range(1,len(my_cat['ethanol_fast'])):
    temp_df = my_cat['ethanol_fast'].values_indexer[i].read()
    ethanol_data = pd.concat([ethanol_data,temp_df],axis=1)
    
ethanol_data.columns = np.arange(len(ethanol_data.columns))

ethanol_data = ethanol_data.loc[.2:15,:]

In [None]:
ethanol_data.plot(figsize=(4,2),legend=False)
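
The same assemble, relabel, and trim pattern can be tried offline. The sketch below uses synthetic patterns in place of the Tiled reads; `fake_scan` is a hypothetical stand-in for `values_indexer[i].read()`, and the peak shape is invented. It collects the frames in a list and concatenates once, which avoids re-copying the growing DataFrame on every loop iteration.

```python
import numpy as np
import pandas as pd

# Shared 2-theta grid for the synthetic patterns (assumed values, not real data).
tth = np.linspace(0.0, 20.0, 201)

def fake_scan(scale):
    """Hypothetical stand-in for one read(): a single-column DataFrame labeled 'I'."""
    intensity = scale * np.exp(-0.5 * ((tth - 2.5) / 0.3) ** 2)
    return pd.DataFrame({"I": intensity}, index=tth)

# Collect all frames first, then concatenate once along the column axis.
frames = [fake_scan(s) for s in (1.0, 2.0, 3.0)]
combined = pd.concat(frames, axis=1)

# Relabel the columns as sequential integers.
combined.columns = np.arange(len(combined.columns))

# .loc slices by index label (here 2-theta values), not by row position.
trimmed = combined.loc[0.2:15, :]
print(combined.shape, trimmed.shape)
```

Note that `.loc[.2:15,:]` selects by the values in the index, so it keeps only the rows whose index value lies between 0.2 and 15.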

This appears to have lots of variation in intensity between the different datasets!  

Let's take a look at this data in a different way, like a quick waterfall plot.

In [None]:
plt.figure(figsize=(4,2))
for i in range(len(ethanol_data.columns)):
    plt.plot(ethanol_data.loc[:,i]+i*5,alpha=.8)

So it's clear that these scans differ in overall intensity.  We'd like to average the runs together, but we probably don't want to do this as is.  Maybe we'll get lucky if we normalize each scan to, say, the maximum of the low-angle peak.

In [None]:
plt.figure(figsize=(4,2))
for i in range(len(ethanol_data.columns)):
    plt.plot(ethanol_data.loc[:,i]/max(ethanol_data.loc[1:3.7,i])+i*.4)

Let's try averaging the scans together (here, the raw, un-normalized columns).

In [None]:
plt.figure(figsize=(4,2))
plt.plot((ethanol_data.loc[:,:]).mean(axis=1))
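
The cell above averages the raw columns. A minimal sketch of averaging after normalization, assuming the same 1 to 3.7 window used for the low-angle peak, is shown below on synthetic data. The peak shape, the scale factors, and the name `synthetic` are invented for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic patterns: one shared shape recorded at three different overall scales.
tth = np.linspace(0.2, 15.0, 500)
shape = np.exp(-0.5 * ((tth - 2.5) / 0.3) ** 2) + 0.2
scales = [1.0, 3.0, 0.5]
synthetic = pd.DataFrame({i: s * shape for i, s in enumerate(scales)}, index=tth)

# Maximum of each column inside the low-angle window (a Series indexed by column).
window_max = synthetic.loc[1:3.7].max()

# Dividing a DataFrame by that Series scales each column by its own maximum.
normalized = synthetic / window_max

# Average the normalized patterns point by point.
normalized_mean = normalized.mean(axis=1)
print(normalized_mean.loc[1:3.7].max())
```

With real data the normalized columns will not coincide exactly, but each scan then contributes with comparable weight to the average.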

Now let's compare to a single, higher quality measurement.

In [None]:
plt.figure(figsize=(4,2))
plt.plot((ethanol_data.loc[:,:]).mean(axis=1))
# scale the long measurement by 1/30 so it overlays the averaged short scans
plt.plot(my_cat['ethanol_long'].values_indexer[0].read()/30)

After this inspection, it's clear that the features in the lower-quality datasets are the same as in the higher-quality one (the peak positions are the same), but because of the noise present, even averaging multiple datasets together won't recover the data quality seen in the longer measurement.  
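
One way to see why averaging helps only so much: for independent random noise, averaging N scans reduces the noise by roughly a factor of sqrt(N). The sketch below checks this on synthetic Gaussian noise. Whether the noise in the ethanol scans is independent and Gaussian is an assumption, and the numbers here are not taken from the measured data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

n_scans = 20       # same count as the scans loaded earlier
n_points = 20000   # points per synthetic scan

# Pure noise with unit standard deviation, one row per scan.
noise = rng.normal(0.0, 1.0, size=(n_scans, n_points))

single_std = noise[0].std()              # noise level of one scan
averaged_std = noise.mean(axis=0).std()  # noise level after averaging all scans

improvement = single_std / averaged_std
print(improvement, np.sqrt(n_scans))
```

Under these assumptions, 20 short scans improve the noise by a factor of about 4.5, so matching a much longer exposure would take many more of them.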

Let's switch over to some temperature dependent BaTiO3 data!



In [None]:
data = c['um2022']['olds']['BaTiO3_VT']
file_list = sorted(list(data))

bto_data = data[file_list[0]].read()
for i in range(1,len(file_list)):
    temp_df = data[file_list[i]].read()
    temp_df.columns = [i]
    bto_data = pd.concat([bto_data,temp_df],axis=1)
    

In [None]:
len(bto_data.columns)

In this case, we happen to know the temperature these datasets were taken at, so we can put those in as column names.

In [None]:
bto_data.columns = np.arange(100,502,2,dtype=float) 
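
Assigning column labels fails unless the number of labels matches the number of columns, so it can help to check the length first. The sketch below confirms that this `np.arange` call produces 201 values from 100 to 500 in steps of 2; the DataFrame `demo` is a hypothetical stand-in for `bto_data`.

```python
import numpy as np
import pandas as pd

# Same call as in the cell above: the stop value 502 is excluded, so 500 is the last label.
temperatures = np.arange(100, 502, 2, dtype=float)

# Hypothetical stand-in for bto_data with one column per temperature.
demo = pd.DataFrame(np.zeros((3, len(temperatures))))

# Guard against a silent mismatch before relabeling.
if len(temperatures) != len(demo.columns):
    raise ValueError("number of temperatures does not match number of columns")

demo.columns = temperatures
print(len(temperatures), temperatures[0], temperatures[-1])
```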

Let's have a quick look at the data.

In [None]:
plt.figure(figsize=(4,2))
plt.plot(bto_data.mean(axis=1))

In [None]:
plt.figure(figsize=(4,2))
for i in range(len(bto_data.columns)):
    # offset each pattern vertically so the curves do not overlap
    plt.plot(bto_data.iloc[:,i]+i*20,color='k',alpha=.5)

Just to make things a little nicer, let's make a colormap based on the temperature, and zoom in on one of the peaks where we can see the changes happening.

In [None]:
def make_colormap(num_ids,use_cmap='viridis'):
    # plt.cm.get_cmap was removed in Matplotlib 3.9; plt.get_cmap works across versions
    cm = plt.get_cmap(use_cmap)
    # sample num_ids evenly spaced colors from the colormap
    return [cm(i/num_ids) for i in range(num_ids)]

In [None]:
plt.figure(dpi=100)
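# request 50 more colors than columns, so the plotted lines use only the lower part of the colormap
cc = make_colormap(len(bto_data.columns)+50, use_cmap='inferno')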

tth = bto_data.loc[4.7:5].index

for i in range(len(bto_data.columns)):
    this_t = bto_data.columns[i]
    plt.plot(tth+i*.0005,bto_data.loc[4.7:5,this_t]+i*5,c=cc[i],alpha=.7)
    
plt.xlabel('tth [degrees]');
plt.ylabel('I [a.u.]');
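
Item 6 of the outline mentions tracking peaks. One simple approach, sketched here on synthetic data, is to take the 2-theta value of the maximum inside the same 4.7 to 5 window for every column using `idxmax`. The peak centers, widths, and the linear drift below are invented for illustration and are not the BaTiO3 results.

```python
import numpy as np
import pandas as pd

# Synthetic patterns: one Gaussian peak whose center drifts with temperature.
tth = np.linspace(4.5, 5.2, 701)               # 0.001 step in 2-theta
temps = np.arange(100, 502, 100, dtype=float)  # 100, 200, ..., 500
centers = 4.80 + 0.0002 * (temps - 100)        # assumed linear drift
synthetic = pd.DataFrame(
    {t: np.exp(-0.5 * ((tth - c) / 0.02) ** 2) for t, c in zip(temps, centers)},
    index=tth,
)

# For each temperature column, find the index label (2-theta) of the maximum in the window.
peak_positions = synthetic.loc[4.7:5.0].idxmax()
print(peak_positions)
```

The result is a Series indexed by temperature, so `peak_positions.plot()` would show peak position against temperature. For real data with overlapping or split peaks, a fit would be more robust than a simple maximum.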