# PRISMA's API: Examples

PRISMA's API consists of a `Spectrum` object and functions that operate on it (process it). In the following tutorial we demonstrate how to load, access and process `Spectrum` objects using the functions implemented in PRISMA.

In [1]:
import prisma.parsers
import os
import bqplot.pyplot as plt

## 1. Loading files  
First we use PRISMA's parsers to load synthetic datasets containing spectra and instantiate them as `Spectrum` objects.  
PRISMA accepts files in three different formats. See the API reference and explore the data files for more details.

### 1.1 Spectra stored as single file 
First open the file in binary mode, then pass its binary content to PRISMA's parser. The result is two dictionaries: one containing the spectra and the other their related metadata.

In [2]:
file_path = r'./data_single_csv/synthetic_dataset.csv'
with open(file_path, mode='rb') as file:
    spectra, spectra_metadata = prisma.parsers.single_csv(file.read())

In [3]:
spectra.keys()

dict_keys(['spectrum_0', 'spectrum_1', 'spectrum_2', 'spectrum_3', 'spectrum_4', 'spectrum_5', 'spectrum_6', 'spectrum_7', 'spectrum_8', 'spectrum_9', 'spectrum_10', 'spectrum_11', 'spectrum_12', 'spectrum_13', 'spectrum_14', 'spectrum_15', 'spectrum_16', 'spectrum_17', 'spectrum_18', 'spectrum_19'])

In [4]:
spectra_metadata

{'common_energy_axis': True,
 'energy_limits': [200.0, 1498.0],
 'number_of_spectra': 20,
 'number_of_datapoints': 650,
 'min_resolvable_width': 2.0,
 'error': ''}

The `spectra` dictionary holds one entry per spectrum, keyed by the spectrum's name; section 2 describes how each entry is organized.
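
Before processing, it can help to confirm that the parser reported no error and that the metadata agrees with the dictionary of spectra. The sketch below is an illustration only: `check_parse_result` is a hypothetical helper, not part of PRISMA, and it assumes that an empty `'error'` string means the parse succeeded, as the output above suggests. The stand-in dictionaries only mimic the keys shown in this section.

```python
def check_parse_result(spectra, spectra_metadata):
    """Raise ValueError if the parser output looks inconsistent.

    Hypothetical helper (not part of PRISMA). Assumes an empty 'error'
    string signals a successful parse.
    """
    if spectra_metadata.get('error'):
        raise ValueError(f"Parser reported: {spectra_metadata['error']}")
    expected = spectra_metadata['number_of_spectra']
    if len(spectra) != expected:
        raise ValueError(f"Expected {expected} spectra, found {len(spectra)}")
    low, high = spectra_metadata['energy_limits']
    if not low < high:
        raise ValueError("energy_limits must be ordered as [low, high]")
    return True


# Stand-in data with the same keys as the metadata printed above.
example_metadata = {'common_energy_axis': True,
                    'energy_limits': [200.0, 1498.0],
                    'number_of_spectra': 20,
                    'number_of_datapoints': 650,
                    'min_resolvable_width': 2.0,
                    'error': ''}
example_spectra = {f'spectrum_{i}': None for i in range(20)}

parse_ok = check_parse_result(example_spectra, example_metadata)
```

In the notebook, the same call would take the `spectra` and `spectra_metadata` variables returned by the parser.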

### 1.2 Spectra stored as individual files  


If the spectra are instead stored as individual files, we use the `multiple_txt` parser.  
First the files are listed; then each one is read in binary mode and stored in a dictionary of filename:binary_text pairs.

In [5]:
file_path = r'./data_multiple_txt/'
list_of_filenames = os.listdir(file_path)

In [6]:
binary_text_files = {}

for filename in list_of_filenames:
    # os.path.join builds the path correctly even if file_path lacks a trailing slash
    with open(os.path.join(file_path, filename), mode='rb') as file:
        binary_text_files[filename] = file.read()
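
`os.listdir` returns every entry in the folder, including subfolders and unrelated files. A slightly more defensive variant, sketched here with `pathlib`, keeps only regular files with a given extension. `read_binary_files` and its `suffix` argument are hypothetical names used for illustration, and the temporary folder with placeholder bytes only stands in for `./data_multiple_txt/`.

```python
import tempfile
from pathlib import Path


def read_binary_files(folder, suffix='.txt'):
    """Return a {filename: bytes} dictionary for files ending in `suffix`.

    Hypothetical helper (not part of PRISMA).
    """
    return {path.name: path.read_bytes()
            for path in sorted(Path(folder).iterdir())
            if path.is_file() and path.suffix == suffix}


# Demonstration on a throwaway folder; the file contents are placeholders,
# not a real spectrum format.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'spectrum_0.txt').write_bytes(b'placeholder bytes')
    Path(tmp, 'notes.md').write_bytes(b'not a spectrum')
    demo_files = read_binary_files(tmp)
```

The resulting dictionary has the same filename:binary_text shape as `binary_text_files`.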


The dictionary is passed to the parser, which outputs two dictionaries: one containing the spectra and the other their related metadata.

In [7]:
spectra, spectra_metadata = prisma.parsers.multiple_txt(binary_text_files)

In [8]:
spectra.keys()

dict_keys(['spectrum_0.txt', 'spectrum_1.txt', 'spectrum_10.txt', 'spectrum_11.txt', 'spectrum_12.txt', 'spectrum_13.txt', 'spectrum_14.txt', 'spectrum_15.txt', 'spectrum_16.txt', 'spectrum_17.txt', 'spectrum_18.txt', 'spectrum_19.txt', 'spectrum_2.txt', 'spectrum_3.txt', 'spectrum_4.txt', 'spectrum_5.txt', 'spectrum_6.txt', 'spectrum_7.txt', 'spectrum_8.txt', 'spectrum_9.txt'])

In [9]:
spectra_metadata

{'common_energy_axis': True,
 'energy_limits': [200.0, 1498.0],
 'number_of_spectra': 20,
 'number_of_datapoints': 650,
 'min_resolvable_width': 2.0,
 'error': ''}
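
In the key listing above, `spectrum_10.txt` comes before `spectrum_2.txt` because the filenames are ordered as plain strings. If the numeric order matters, for instance when plotting a series, the keys can be sorted with a natural-sort key. This is a generic Python sketch; `natural_key` is a hypothetical helper, not a PRISMA function.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r'(\d+)', name)]


names = ['spectrum_0.txt', 'spectrum_1.txt', 'spectrum_10.txt', 'spectrum_2.txt']
ordered_names = sorted(names, key=natural_key)
```

In the notebook, `sorted(spectra.keys(), key=natural_key)` would give the names in numeric order.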

## 2. The spectrum object

PRISMA organizes data in a hierarchy that gives access to each spectrum and tracks the provenance of the processing steps. The hierarchy of the `spectra` variable returned by the parsers is shown below:  
![Data hierarchy of spectra](./figures/hierarchy.png)  
* The parser's output, `spectra`, is a dictionary whose keys are the individual spectrum names (the filenames of the individual txt files, or the headings of the single csv file)
* Each name maps to another dictionary, which stores three types of `Spectrum` objects:  
    * *root*: the original upload  
    * *processed*: after baseline correction
    * *fit*: after peak fitting.  

In addition to these three main objects, you can add more keys to store the results of other operations.   

`indexes` and `counts` are the two NumPy arrays containing the values of the scanning variable (wavenumbers, energies, etc.) and the counts, respectively. For a guide to the attributes of these object types, consult the documentation.
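
The nesting can be pictured with plain Python objects. In the sketch below, `SketchSpectrum` is a hypothetical stand-in for PRISMA's spectrum classes, not their actual definition: it carries only the `indexes`, `counts` and `metadata` attributes mentioned in this tutorial, and plain lists replace the NumPy arrays. The `'parent'` entry mirrors the key that derived spectra carry in their metadata (see section 3.3); the `'shifted'` key and the `lineage` helper are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SketchSpectrum:
    """Hypothetical stand-in for a PRISMA spectrum object."""
    indexes: list
    counts: list
    metadata: dict = field(default_factory=dict)


# Top level: spectrum name -> dictionary of versions of that spectrum.
root = SketchSpectrum(indexes=[200.0, 202.0, 204.0], counts=[10.0, 12.0, 11.0])
spectra_sketch = {'spectrum_0.txt': {'root': root}}

# A derived version is stored under its own key and points back to its parent.
shifted = SketchSpectrum(indexes=root.indexes,
                         counts=[c - 10.0 for c in root.counts],
                         metadata={'parent': root})
spectra_sketch['spectrum_0.txt']['shifted'] = shifted


def lineage(spectrum):
    """Count how many processing steps separate a spectrum from its root."""
    steps = 0
    while 'parent' in spectrum.metadata:
        spectrum = spectrum.metadata['parent']
        steps += 1
    return steps
```

Following the `'parent'` links in this way is one simple means of tracing the provenance of a processed spectrum.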

### 2.1 Accessing the data  
If you wish to access data, e.g. for plotting, follow the structure in the previous figure. For example, we plot two root (raw) spectra:

In [10]:
fig1 = plt.figure()
plt.clear()
plt.scatter(x=spectra['spectrum_0.txt']['root'].indexes, 
         y=spectra['spectrum_0.txt']['root'].counts, colors = ['DarkGray'], default_size = 30)
plt.scatter(x=spectra['spectrum_18.txt']['root'].indexes, 
         y=spectra['spectrum_18.txt']['root'].counts, colors = ['LightBlue'], default_size = 30)
plt.show()


At the moment, `spectra` does not contain `SpectrumProcessed` or `SpectrumFit` objects; they will be created later when processing the original spectra.  

**Note**: [bqplot](https://bqplot.readthedocs.io/en/latest/#) is used as the plotting package because it integrates better with widgets in Jupyter; however, you can use any package you prefer (matplotlib, seaborn, plotly...)

## 3. Processing spectra
Now the processing functions of PRISMA are used.  

### 3.1 Trimming
A spectrum can be trimmed with PRISMA. First, import the relevant module.

In [11]:
import prisma.preprocessing

The trimming function is then used: it takes as arguments a `Spectrum` object and the interval to keep. The result is a spectrum trimmed to the region of interest.

In [12]:
spectra['spectrum_0.txt']['trimmed'] = prisma.preprocessing.trimming(spectra['spectrum_0.txt']['root'], (300,1400))
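
Conceptually, trimming keeps only the points whose index falls inside the requested interval. The following pure-Python sketch illustrates the idea on a synthetic axis with the same limits and number of points as the dataset above. `trim_points` is a hypothetical helper, not PRISMA's implementation, and treating both interval ends as inclusive is an assumption, since the handling of the boundaries is not specified here.

```python
def trim_points(indexes, counts, within):
    """Keep the (index, count) pairs with low <= index <= high.

    Hypothetical illustration of trimming, not PRISMA's implementation.
    Inclusive bounds are an assumption.
    """
    low, high = within
    kept = [(x, y) for x, y in zip(indexes, counts) if low <= x <= high]
    trimmed_indexes = [x for x, _ in kept]
    trimmed_counts = [y for _, y in kept]
    return trimmed_indexes, trimmed_counts


# Synthetic axis: 650 points from 200 to 1498 in steps of 2, with flat counts.
axis = [200.0 + 2.0 * i for i in range(650)]
flat_counts = [1.0] * len(axis)

trimmed_axis, trimmed_flat = trim_points(axis, flat_counts, (300, 1400))
```

With NumPy arrays, the same selection is usually written with a boolean mask instead of a list comprehension.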

In [13]:
fig2 = plt.figure()
plt.clear()
plt.scatter(x=spectra['spectrum_0.txt']['root'].indexes, 
         y=spectra['spectrum_0.txt']['root'].counts, colors=['Gray'], default_size = 30)
plt.scatter(x=spectra['spectrum_0.txt']['trimmed'].indexes, 
         y=spectra['spectrum_0.txt']['trimmed'].counts, colors=['Tomato'], default_size = 30)
plt.show()


### 3.2 Baseline correction

In [14]:
import prisma.baselines

Now the baseline can be corrected within the region of interest using the baselines module. The asymmetric least squares method is currently implemented. The function takes as inputs a spectrum object and two more parameters: a penalty *log_p* and a smoothing parameter *log_lambda*. See the documentation for more details.  

The function outputs a `SpectrumProcessed` object, which has the same attributes as a `Spectrum` plus an additional `.baseline` attribute. 
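
For orientation, the asymmetric least squares method, as described by Eilers and Boelens, estimates the baseline z of a signal y by minimizing a weighted sum of squares with a smoothness penalty. The formulation below is the standard one from the literature. That `log_p` and `log_lambda` are the base-10 logarithms of p and λ is an assumption based on the parameter names, so check the documentation before relying on it.

```latex
S(z) = \sum_i w_i \,(y_i - z_i)^2 \;+\; \lambda \sum_i \left(\Delta^2 z_i\right)^2,
\qquad
w_i =
\begin{cases}
p, & y_i > z_i \\
1 - p, & y_i \le z_i
\end{cases}
```

A small p gives little weight to points lying above the baseline (the peaks), while a larger λ yields a stiffer, smoother baseline.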

In [22]:
spectra['spectrum_0.txt']['processed'] = prisma.baselines.asymmetric_least_squares(spectrum=spectra['spectrum_0.txt']['trimmed'], 
                                                                                   log_p = -2, log_lambda = 5.0)

You can see the result below: the original spectrum in grey, the baseline-subtracted version in orange and the baseline in blue. You can adjust the `log_p` and `log_lambda` parameters to improve the baseline.

In [23]:
fig3 = plt.figure()
plt.clear()
plt.scatter(x=spectra['spectrum_0.txt']['root'].indexes, 
         y=spectra['spectrum_0.txt']['root'].counts, colors=['Gray'], default_size = 30)
plt.scatter(x=spectra['spectrum_0.txt']['processed'].indexes, 
         y=spectra['spectrum_0.txt']['processed'].counts, colors=['Tomato'], default_size = 30)
plt.plot(x=spectra['spectrum_0.txt']['processed'].indexes, 
         y=spectra['spectrum_0.txt']['processed'].baseline, colors=['navy'])
plt.show()


### 3.3 Peak fitting

In [17]:
import prisma.fitpeaks

The fitpeaks module contains the `fit_peaks` function with parameters:
* *spectrum*: a `Spectrum` object.
* *peak_bounds*: a list of (low, high) 2-tuples, one per peak. Each 2-tuple specifies the low and high limits of the neighborhood where an individual peak is expected to be found. 
* *guess_widths*: a list of maximum width limits, one per peak. **Note**: `len(peak_bounds)` must equal `len(guess_widths)`.
* *lineshape*: the peak profile to use. Currently PRISMA supports 'Lorentzian', 'Gaussian' and 'Pseudo-Voight 50% Lorentzian'; the latter is a mixture of 50% Lorentzian + 50% Gaussian. 
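
To make the three lineshapes concrete, the sketch below writes them as plain functions of a height `h`, a position `p` and a width `w`, matching the parameter names PRISMA reports after fitting. It is an illustration under assumptions rather than PRISMA's implementation: the width is taken to be the full width at half maximum (FWHM), and the textbook forms of the Lorentzian and Gaussian profiles are used.

```python
import math


def lorentzian(x, h, p, w):
    """Lorentzian peak of height h centred at p, with FWHM w (assumed)."""
    half = w / 2.0
    return h * half**2 / ((x - p)**2 + half**2)


def gaussian(x, h, p, w):
    """Gaussian peak of height h centred at p, with FWHM w (assumed)."""
    return h * math.exp(-4.0 * math.log(2.0) * (x - p)**2 / w**2)


def pseudo_voigt_50(x, h, p, w):
    """Equal-weight mixture: 50% Lorentzian + 50% Gaussian."""
    return 0.5 * lorentzian(x, h, p, w) + 0.5 * gaussian(x, h, p, w)


# Each profile reaches h at the centre and h/2 half a width away from it.
peak_top = pseudo_voigt_50(650.0, h=100.0, p=650.0, w=30.0)
half_height = pseudo_voigt_50(665.0, h=100.0, p=650.0, w=30.0)
```

Under this width convention all three profiles share the same height and FWHM and differ only in their tails, which fall off more slowly for the Lorentzian.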

In [18]:
my_peak_bounds = [(600,700),(700,800),(850,950)] #[(bounds peak 1), (bounds peak 2), (bounds peak 3)]
my_max_widths = [100, 100, 100] 
spectra['spectrum_0.txt']['fit'] = prisma.fitpeaks.fit_peaks(spectrum = spectra['spectrum_0.txt']['processed'],
                                                             peak_bounds = my_peak_bounds,
                                                             guess_widths = my_max_widths,
                                                             lineshape = 'Lorentzian')
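
Because `peak_bounds` and `guess_widths` must have the same length, a quick check before calling `fit_peaks` can catch mismatches early. `check_peak_inputs` below is a hypothetical helper written for this tutorial, not a PRISMA function; requiring `low < high` and positive widths are additional sanity checks assumed here.

```python
def check_peak_inputs(peak_bounds, guess_widths):
    """Validate peak-fitting inputs and return the number of peaks.

    Hypothetical helper (not part of PRISMA).
    """
    if len(peak_bounds) != len(guess_widths):
        raise ValueError(
            f"{len(peak_bounds)} bounds but {len(guess_widths)} widths")
    for low, high in peak_bounds:
        if not low < high:
            raise ValueError(f"Invalid bounds ({low}, {high}): need low < high")
    if any(width <= 0 for width in guess_widths):
        raise ValueError("Widths must be positive")
    return len(peak_bounds)


number_of_peaks = check_peak_inputs([(600, 700), (700, 800), (850, 950)],
                                    [100, 100, 100])
```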

The `fit_peaks` function returns a `SpectrumPeakFit` object. The `.counts` attribute is a NumPy array with the sum of all profiles.

In [19]:
fig4 = plt.figure()
plt.clear()
plt.scatter(x=spectra['spectrum_0.txt']['processed'].indexes, 
         y=spectra['spectrum_0.txt']['processed'].counts, colors=['Tomato'], default_size = 30)
plt.plot(x=spectra['spectrum_0.txt']['fit'].indexes, 
         y=spectra['spectrum_0.txt']['fit'].counts, colors=['Purple'])
plt.show()


Likewise, the `SpectrumPeakFit` object has a `.profiles` attribute, a dictionary of int:NumPy array pairs in which each array contains an individual peak. The profiles are accessed as follows: 

In [20]:
fig5 = plt.figure()
plt.clear()
plt.scatter(x=spectra['spectrum_0.txt']['processed'].indexes, 
         y=spectra['spectrum_0.txt']['processed'].counts, colors=['Tomato'], default_size = 30)

for profile in spectra['spectrum_0.txt']['fit'].profiles.values():
    plt.plot(x=spectra['spectrum_0.txt']['fit'].indexes, 
             y=profile, colors=['Sienna'])
plt.show()
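
Since `.counts` is described as the sum of all profiles, the relation between the two attributes can be pictured as a pointwise sum over the dictionary values. The sketch uses short made-up lists in place of the NumPy arrays. `sum_profiles` is a hypothetical helper, and whether PRISMA also adds the intercept `y_0` to `.counts` is not stated here, so the intercept is left out.

```python
def sum_profiles(profiles):
    """Add the individual peak profiles point by point."""
    return [sum(values) for values in zip(*profiles.values())]


# Made-up profiles keyed by integers, as in the .profiles dictionary.
toy_profiles = {1: [0.0, 1.0, 4.0, 1.0, 0.0],
                2: [0.0, 0.0, 2.0, 6.0, 2.0]}
toy_total = sum_profiles(toy_profiles)
```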


Finally, you can also access the fitting parameters through the `.metadata` attribute of the `SpectrumPeakFit` object. The 'Fitted parameters' key holds a dictionary with the parameters:
* *y_0*: the intercept
* *h_n*: height of n-th peak
* *p_n*: position of n-th peak
* *w_n*: width of n-th peak

In [21]:
spectra['spectrum_0.txt']['fit'].metadata

{'parent': <prisma.spectrum.SpectrumProcessed at 0x23de21294f0>,
 'Process': 'Peak fitting',
 'Process ID': 'DD01S8MK',
 'Peak lineshapes': 'Lorentzian',
 'Number of peaks': 3,
 'Initial widths': [100, 100, 100],
 'Position bounds': [(600, 700), (700, 800), (850, 950)],
 'Fitting success': True,
 'Fitted parameters': {'y_0': 14.265210648457046,
  'h_1': 159.2095865428945,
  'p_1': 651.4087459148911,
  'w_1': 32.43915922647676,
  'h_2': 490.71427341906576,
  'p_2': 729.9340273630528,
  'w_2': 8.830629602482736,
  'h_3': 288.2072134427914,
  'p_3': 899.6150644234906,
  'w_3': 14.33272728476666}}
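
The fitted parameters are stored in a flat dictionary, so it is often convenient to regroup them per peak. The sketch below does this for the values printed above. `group_by_peak` is a hypothetical helper, not part of PRISMA, and it assumes the keys always follow the `h_n`, `p_n`, `w_n` pattern shown here.

```python
def group_by_peak(fitted_parameters):
    """Regroup {'h_1': ..., 'p_1': ..., 'w_1': ...} into {1: {'h': ..., ...}}.

    Hypothetical helper (not part of PRISMA). The intercept 'y_0' is skipped.
    """
    peaks = {}
    for key, value in fitted_parameters.items():
        name, _, number = key.partition('_')
        if name in ('h', 'p', 'w'):
            peaks.setdefault(int(number), {})[name] = value
    return peaks


# Values copied from the metadata output above.
fitted = {'y_0': 14.265210648457046,
          'h_1': 159.2095865428945, 'p_1': 651.4087459148911, 'w_1': 32.43915922647676,
          'h_2': 490.71427341906576, 'p_2': 729.9340273630528, 'w_2': 8.830629602482736,
          'h_3': 288.2072134427914, 'p_3': 899.6150644234906, 'w_3': 14.33272728476666}

peaks = group_by_peak(fitted)
# Peak number with the largest fitted height.
tallest = max(peaks, key=lambda n: peaks[n]['h'])
```

In the notebook, the input would be `spectra['spectrum_0.txt']['fit'].metadata['Fitted parameters']`.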