### Notebook 1: Basic Queries
This notebook introduces the main functions for querying the API and the kinds of data available for analysis, then points the reader to other notebooks according to particular interests and characterization specialties. Two classes have been created: one for querying at the library level and one for querying at the sample level. They are laid out as follows:

##### Library Class
The library class contains four important functions:

Library.search_by_ids(ids_list): This is a static method of the Library class. It takes a list of library numbers and returns a list of Library objects, one per library, each of which can be queried as its own instance of the Library class.

Library.search_by_composition(only=[],not_including=[],any_of=[]): This is also a static method of the Library class. Each of its three arguments is a list of element symbols, and it returns a list of Library objects for the libraries whose elements satisfy all three conditions. The "only" list specifies elements that are required to be in a sample, the "not_including" list specifies elements that are not allowed in a sample, and the "any_of" list specifies elements that may be in a sample, without all of them being required.

Library.properties(self): This function returns all relevant property data about a library in a pandas DataFrame.

Library.spectra(self,which): This function returns either the optical spectra or the x-ray diffraction spectra for all samples in a library, depending on the value of 'which'. The variable 'which' can be set to 'xrd' to get x-ray diffraction spectra or 'optical' to get the ultraviolet reflectance, ultraviolet transmittance, near-infrared reflectance, and the near-infrared transmittance. This data is returned in a pandas DataFrame.
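The filtering semantics described for search_by_composition can be sketched on plain library records (dicts with an 'elements' list, like the JSON the sample_library endpoint returns). This is a minimal sketch, not the Library implementation: the helper name matches_composition is hypothetical, and it assumes "any_of" means "at least one of these elements must be present", which is one reading of the description above.

```python
from typing import Iterable


def matches_composition(record: dict,
                        only: Iterable[str] = (),
                        not_including: Iterable[str] = (),
                        any_of: Iterable[str] = ()) -> bool:
    """Hypothetical helper: does one library record satisfy the three lists?"""
    elements = set(record.get('elements') or [])
    # Every element listed in `only` must be present.
    if not set(only) <= elements:
        return False
    # No element listed in `not_including` may be present.
    if elements & set(not_including):
        return False
    # Assumption: if `any_of` is given, at least one of its elements must be present.
    any_of = set(any_of)
    if any_of and not (elements & any_of):
        return False
    return True
```

For example, matches_composition(record, only=['Ti','Zn','O','Sn'], not_including=['H']) keeps a record containing titanium, zinc, oxygen, and tin but rejects it if hydrogen is also listed. Using sets avoids the substring pitfalls of matching element symbols inside a string (for example "S" inside "Sn").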
##### Sample Class
The sample class contains three important functions:

Sample.search_by_ids(ids_list): This is a static method of the Sample class. It takes a list of sample numbers and returns a list of Sample objects, one per sample, each of which can be queried as its own instance of the Sample class.

Sample.properties(self): This function returns all relevant property data about a sample in a pandas DataFrame.

Sample.spectra(self,which): This function returns either the optical spectra or the x-ray diffraction spectra for a particular sample, depending on the value of 'which'. The variable 'which' can be set to 'xrd' to get x-ray diffraction spectra or 'optical' to get the ultraviolet reflectance, ultraviolet transmittance, near-infrared reflectance, and the near-infrared transmittance. This data is returned in a pandas DataFrame.

In [1]:
import os
import sys
import pandas as pd
# os.path.join uses the correct path separator on both POSIX and Windows,
# so no separate Windows variant ('..\lib') is needed
sys.path.append(os.path.join('..', 'lib'))
from library import Library
from sample import Sample
import seaborn as sns
color = sns.color_palette()

%matplotlib inline

The cell above imports the required modules, including the Library and Sample classes discussed earlier. A brief example of each class function follows.

Below is an example of Library.search_by_ids(ids_list). Querying with a list of library numbers returns a list of Library objects, which the "for" loop iterates over. Calling properties() on each object returns a pandas DataFrame, from which we select four columns for each library: the database-assigned id, the PDAC number (that is, the chamber it was made in), the number given to the library by the researcher, and the elements listed as being part of the library.

In [2]:
for lib in Library.search_by_ids([7387,10295,7494,7269]):
    print(lib.properties()[['id','pdac','num','elements']])

In [3]:
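import json
import urllib.request  # `import urllib` alone does not reliably load urllib.request

# Query the raw API directly for libraries containing carbon and sulfur
url = 'https://htem-api.nrel.gov/api/sample_library?element=C,S'

with urllib.request.urlopen(url) as response:
    data = json.load(response)

data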

[{'id': 7096,
  'num': 801,
  'pdac': '4',
  'elements': ['C', 'S'],
  'quality': 3,
  'sample_date': None,
  'person_id': 53,
  'has_xrd': 44,
  'has_xrf': 44,
  'has_opt': 0,
  'has_ele': 0,
  'deposition_compounds': ['C', 'S', None],
  'deposition_power': [60, 80, None],
  'deposition_gases': None,
  'deposition_gas_flow_sccm': None,
  'deposition_sample_time_min': 120,
  'deposition_cycles': None,
  'deposition_substrate_material': None,
  'deposition_base_pressure_mtorr': None,
  'deposition_initial_temp_c': 315,
  'sciround': None},
 {'id': 9074,
  'num': 799,
  'pdac': '4',
  'elements': ['C', 'S'],
  'quality': 3,
  'sample_date': None,
  'person_id': 53,
  'has_xrd': 0,
  'has_xrf': 44,
  'has_opt': 0,
  'has_ele': 0,
  'deposition_compounds': ['C', 'S', None],
  'deposition_power': [60, 80, None],
  'deposition_gases': None,
  'deposition_gas_flow_sccm': None,
  'deposition_sample_time_min': 120,
  'deposition_cycles': None,
  'deposition_substrate_material': None,
  'deposit
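The raw endpoint used above returns a list of dicts, one per library, so it converts directly into a pandas DataFrame. A minimal sketch follows; the helper names composition_url and fetch_libraries are hypothetical, and the only assumption about the API is the `?element=C,S` query form shown in the cell above.

```python
import json
import urllib.request
from urllib.parse import urlencode

import pandas as pd

BASE_URL = 'https://htem-api.nrel.gov/api/sample_library'


def composition_url(elements):
    # Keep the comma unescaped so the URL has the same form as above,
    # e.g. .../sample_library?element=C,S
    return BASE_URL + '?' + urlencode({'element': ','.join(elements)}, safe=',')


def fetch_libraries(elements):
    # Each library record (dict) becomes one row; each key becomes one column.
    with urllib.request.urlopen(composition_url(elements)) as response:
        records = json.load(response)
    return pd.DataFrame(records)
```

With this, fetch_libraries(['C', 'S'])[['id', 'num', 'pdac', 'has_xrd', 'has_opt']] gives a compact overview of which carbon/sulfur libraries have XRD or optical data, without constructing Library objects.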

Suppose we wish to know some basic information about all the libraries that contain a certain set of elements. In the example below, we use the Library.search_by_composition function to find all libraries that contain carbon and sulfur. The search returns six libraries, which we can explore further if we choose. A commented-out alternative in the cell searches for libraries containing titanium, zinc, oxygen, and tin while excluding any that contain hydrogen.

In [4]:
temp = """ 
import json, urllib
only=['C','S']
not_including=[]
any_of=[]
elt_url = 'https://htem-api.nrel.gov/api/sample_library?element='

for i in only:
    if i == only[-1]:
        elt_url = elt_url+str(i)
    else:
        elt_url = elt_url+str(i)+','
    
print(elt_url)

response = urllib.request.urlopen(elt_url)
print(response)
data = response.read()
print(data)
ids_list = []
#print(json.loads(response.read()))
for i in data:
    elts = str(i['elements'])
    violated = False
    for k in not_including:
        if k in elts:
            violated = True
    l = 0
    for k in only:
        if k in elts:
            l = l+1
    print('L is '+l)
    print('Only is '+len(only))
    if l == len(only) and violated == False:
        ids_list.append(i['id'])
        print(violated)
    else:
        pass
obj_list = []
print(ids_list)
for i in ids_list:
    obj_list.append(Library(i)) """


In [5]:
# Search terms:
# only          = libraries must contain all of these elements
# not_including = libraries must not contain any of these elements
# any_of        = libraries may contain these elements (not all are required)

libraries = Library.search_by_composition(only=['C','S'])
print(len(libraries))

# Another example: only=['Ti','Zn','O','Sn'], not_including=['H']
for lib in libraries:
    print(lib.properties()[['id','elements','pdac','num']])

6
     id elements pdac  num
0  7096   [C, S]    4  801
     id elements pdac  num
0  9074   [C, S]    4  799
     id elements pdac  num
0  9075   [C, S]    4  800
     id elements pdac  num
0  7103   [C, S]    4  803
     id elements pdac  num
0  6831   [C, S]    4  804
     id elements pdac  num
0  7324   [C, S]    4  805
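Because each properties() call returns a one-row DataFrame, the loop above prints a separate header for every library. A minimal sketch of collecting the rows into a single table instead; properties_table is a hypothetical helper that works with any objects exposing properties(), so it should apply to Sample objects as well.

```python
import pandas as pd


def properties_table(objects, columns):
    # Each properties() call returns a one-row DataFrame; keep the requested
    # columns and stack the rows, renumbering the index 0..n-1.
    frames = [obj.properties()[columns] for obj in objects]
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)
```

For example, properties_table(Library.search_by_composition(only=['C','S']), ['id','elements','pdac','num']) should yield one six-row table rather than six single-row printouts.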


Suppose we want to know everything there is to know about a certain library, including information like the deposition time, the deposition power, etc. We can see all of this within a single pandas DataFrame using the Library.properties() function. To narrow this down, one may look at just certain columns of the pandas DataFrame (as shown above).

In [6]:
Library(7387).properties()

| | id | num | pdac | quality | person_id | sample_date | owner_name | owner_email | xrf_type | sputter_operator | ... | deposition_initial_temp_c | box_number | deposition_gases | deposition_substrate_material | deposition_gas_flow_sccm | has_xrd | has_xrf | has_ele | has_opt | data_access |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7387 | 399 | 4 | 3 | 52 | | Lauryn Baranowski | l.l.baranowski@gmail.com | smx | | ... | 335 | 7 | | | | 44 | 0 | 0 | 0 | public |
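The properties table is wide enough that the display elides columns with "...". A minimal sketch of pulling out one group of related columns by name prefix, using pandas' DataFrame.filter; the helper name column_group is hypothetical, and the prefixes come from the column names shown above.

```python
import re

import pandas as pd


def column_group(props: pd.DataFrame, prefix: str) -> pd.DataFrame:
    # Keep only the columns whose names start with `prefix`,
    # e.g. 'deposition_' or 'has_'.
    return props.filter(regex='^' + re.escape(prefix))
```

For example, column_group(Library(7387).properties(), 'deposition_') shows only the deposition settings. Alternatively, Library(7387).properties().T lists every field as its own row, which sidesteps the column elision entirely.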


We can also query the spectra for a library, although this usually returns a large amount of data. The function Library.spectra(self,which) returns the full x-ray diffraction spectrum (which = 'xrd') or the full optical spectrum (which = 'optical') for every sample in the library. Because these calls download spectra for all samples in the library, they run noticeably slower than the property queries.

In [7]:
Library(7387).spectra(which='xrd')


| | xrd_angle_1 | xrd_background_1 | xrd_intensity_1 | xrd_angle_2 | xrd_background_2 | xrd_intensity_2 | xrd_angle_3 | xrd_background_3 | xrd_intensity_3 | xrd_angle_4 | ... | xrd_intensity_41 | xrd_angle_42 | xrd_background_42 | xrd_intensity_42 | xrd_angle_43 | xrd_background_43 | xrd_intensity_43 | xrd_angle_44 | xrd_background_44 | xrd_intensity_44 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19.00 | 25560 | 25560 | 19.00 | 25060 | 25060 | 19.00 | 25980 | 25980 | 19.00 | ... | 18170 | 19.00 | 18930 | 18930 | 19.00 | 18930 | 18930 | 19.00 | 21890 | 21890 |
| 1 | 19.05 | 26020 | 26160 | 19.05 | 25570 | 25630 | 19.05 | 26060 | 26300 | 19.05 | ... | 18380 | 19.05 | 19430 | 19540 | 19.05 | 18900 | 18830 | 19.05 | 22140 | 22060 |
| 2 | 19.10 | 26240 | 26070 | 19.10 | 25720 | 25670 | 19.10 | 26050 | 25790 | 19.10 | ... | 18430 | 19.10 | 19600 | 19530 | 19.10 | 19130 | 18940 | 19.10 | 22360 | 22460 |
| 3 | 19.15 | 26590 | 26310 | 19.15 | 25900 | 26120 | 19.15 | 26230 | 25880 | 19.15 | ... | 18830 | 19.15 | 19810 | 19790 | 19.15 | 19420 | 19480 | 19.15 | 22600 | 22460 |
| 4 | 19.20 | 26850 | 27180 | 19.20 | 26090 | 26170 | 19.20 | 26490 | 26600 | 19.20 | ... | 18980 | 19.20 | 20040 | 20180 | 19.20 | 19680 | 19680 | 19.20 | 22860 | 22830 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 656 | 51.80 | 35760 | 36470 | 51.80 | 37710 | 38320 | 51.80 | 38200 | 38560 | 51.80 | ... | 39310 | 51.80 | 38290 | 41400 | 51.80 | 36810 | 38860 | 51.80 | 40690 | 43730 |
| 657 | 51.85 | 35720 | 36090 | 51.85 | 37520 | 37710 | 51.85 | 38020 | 38280 | 51.85 | ... | 37810 | 51.85 | 38470 | 38740 | 51.85 | 36720 | 36740 | 51.85 | 40620 | 41230 |
| 658 | 51.90 | 35330 | 35420 | 51.90 | 36920 | 36790 | 51.90 | 37440 | 37760 | 51.90 | ... | 36490 | 51.90 | 37400 | 37830 | 51.90 | 35190 | 35220 | 51.90 | 39250 | 39120 |
| 659 | 51.95 | 34660 | 34790 | 51.95 | 36210 | 36220 | 51.95 | 36760 | 36930 | 51.95 | ... | 34930 | 51.95 | 35470 | 35560 | 51.95 | 33440 | 33670 | 51.95 | 37180 | 37280 |
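The library-level XRD table uses one column triple per position (xrd_angle_N, xrd_background_N, xrd_intensity_N). For grouping or plotting by position, a long layout with one row per (position, measurement point) is often easier. A minimal sketch using pandas.wide_to_long, assuming only the column naming shown above; xrd_long is a hypothetical helper name.

```python
import pandas as pd

STUBS = ['xrd_angle', 'xrd_background', 'xrd_intensity']


def xrd_long(wide: pd.DataFrame) -> pd.DataFrame:
    # Give each measurement point an explicit id column for wide_to_long.
    df = wide.reset_index(drop=True)
    df['point'] = df.index
    # Split 'xrd_angle_12' into stub 'xrd_angle' and numeric position 12.
    long = pd.wide_to_long(df, stubnames=STUBS, i='point', j='position', sep='_')
    # Columns: point, position, xrd_angle, xrd_background, xrd_intensity
    return long.reset_index().sort_values(['position', 'point'], ignore_index=True)
```

For example, xrd_long(Library(7387).spectra(which='xrd')).groupby('position') then gives one group per sample position.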


In [13]:

import json
import urllib.request

import pandas as pd

API_URL = 'https://htem-api.nrel.gov/api'

def fetch_json(url):
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# Scratch version of the spectra download, used to debug Library.spectra().
# There is the potential to replace this with mvl_optical or mvl_xrd,
# but these seem to be broken at the moment...
def draft_library_spectra(library_id, which):
    positions = fetch_json(f'{API_URL}/sample_library/{library_id}')['sample_ids']
    frames = []
    for k in positions:
        data = fetch_json(f'{API_URL}/sample/{k}')
        leveled_position = data['position']
        if which == 'xrd':
            frames.append(pd.DataFrame({
                f'xrd_angle_{leveled_position}': data['xrd_angle'],
                f'xrd_background_{leveled_position}': data['xrd_background'],
                f'xrd_intensity_{leveled_position}': data['xrd_intensity'],
            }))
        elif which == 'optical':
            optical = data.get('oo') or {}
            for kind in ('uvit', 'uvir', 'nirt', 'nirr'):
                try:
                    wave = optical[kind]['wavelength']
                    resp = optical[kind]['response']
                except KeyError:  # this measurement is not available
                    continue
                frames.append(pd.DataFrame({
                    f'{kind}_wave_{leveled_position}': pd.Series(wave),
                    f'{kind}_response_{leveled_position}': pd.Series(resp),
                }))
        else:
            return pd.DataFrame()
    # Build the wide table with a single concat; shorter spectra are NaN-padded
    return pd.concat(frames, axis=1) if frames else pd.DataFrame()

# List the sample IDs in library 8307
for k in fetch_json(f'{API_URL}/sample_library/8307')['sample_ids']:
    print(k)

In [8]:
Library(8307).spectra(which='optical')

HTTPError: HTTP Error 400: Bad Request

Many of the same techniques used on an entire 44-sample library can also be applied to a single sample. Data are queried just as before, but the information is specific to one sample rather than a whole library. Below is an example of the Sample.search_by_ids(ids_list) function, which returns a list of Sample objects, one per sample ID.

In [9]:
for sample in Sample.search_by_ids([300999,311733,213789]):
    print(sample.properties()[['sample_id','xrf_compounds','xrf_concentration','thickness']])

HTTPError: HTTP Error 400: Bad Request

The code segment above also makes use of the Sample.properties(self) function. Just as with the Library class, this returns all information relevant to this particular sample, formatted within a pandas DataFrame.

In [10]:
Sample(311733).properties()

HTTPError: HTTP Error 400: Bad Request

In the same way that one queries the spectra for an entire library, one can query a single sample for either x-ray diffraction or optical spectra. Note that the near-infrared spectra in the optical DataFrames contain significantly fewer points than the ultraviolet spectra, so their columns are padded with null (NaN) values.

In [11]:
Sample(213789).spectra('xrd')

| | xrd_angle | xrd_background | xrd_intensity |
|---|---|---|---|
| 0 | 19.00 | 75680 | 75680 |
| 1 | 19.05 | 79030 | 79890 |
| 2 | 19.10 | 80450 | 79840 |
| 3 | 19.15 | 82090 | 81100 |
| 4 | 19.20 | 83750 | 85180 |
| ... | ... | ... | ... |
| 656 | 51.80 | 107900 | 107200 |
| 657 | 51.85 | 107100 | 105800 |
| 658 | 51.90 | 106300 | 108200 |
| 659 | 51.95 | 106300 | 107500 |
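A single-sample XRD DataFrame has one row per angle with intensity and background columns. As an illustration of working with it, the sketch below subtracts the background and reports the angle of the largest net intensity. It assumes that xrd_background is an estimated background under the measured pattern, which the API output does not state explicitly; strongest_peak is a hypothetical helper name.

```python
import pandas as pd


def strongest_peak(xrd: pd.DataFrame):
    # Assumption: xrd_background estimates the background under the pattern,
    # so intensity minus background approximates the net diffracted signal.
    net = xrd['xrd_intensity'] - xrd['xrd_background']
    i = net.idxmax()
    return float(xrd.loc[i, 'xrd_angle']), float(net.loc[i])
```

For example, strongest_peak(Sample(213789).spectra('xrd')) returns an (angle, net intensity) pair. Applied to only the first five rows shown above, it picks the row at 19.20, where the intensity exceeds the background by 1430.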


In [12]:
Sample(300999).spectra('optical')

HTTPError: HTTP Error 400: Bad Request
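The NaN padding mentioned for optical spectra comes from concatenating columns of different lengths side by side. The sketch below reproduces it with made-up numbers (purely illustrative, not measured data) and shows how to recover one spectrum without the padding. The column names follow the uvit/nirt naming used in the notebook's own scratch code, and drop_padding is a hypothetical helper name.

```python
import pandas as pd


def drop_padding(df: pd.DataFrame, wave_col: str, response_col: str) -> pd.DataFrame:
    # Keep one spectrum's wavelength/response pair and drop the NaN rows
    # that were added to match the length of longer spectra.
    return df[[wave_col, response_col]].dropna().reset_index(drop=True)


# Made-up spectra of unequal length: 5 UV points vs. 3 NIR points.
uv = pd.DataFrame({'uvit_wave': [300, 310, 320, 330, 340],
                   'uvit_response': [0.1, 0.2, 0.3, 0.4, 0.5]})
nir = pd.DataFrame({'nirt_wave': [900, 950, 1000],
                    'nirt_response': [0.6, 0.7, 0.8]})
padded = pd.concat([uv, nir], axis=1)  # 5 rows; the nirt_* rows 3 and 4 are NaN
nir_only = drop_padding(padded, 'nirt_wave', 'nirt_response')
```

Note that padding converts the shorter columns to floating point, because NaN cannot be stored in an integer column.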

This concludes the explanation of the Python classes used to query data from the API.