# Introduction to *HyperSpy*, *LumiSpy* and *eXSpy*

> **Spectroscopy Data Analysis in Python Using [HyperSpy](https://hyperspy.org)**

Tutorial for the **eBEAM2024 school on nano-optics with free electrons**

> Aussois, September 1-13, 2024

**Table of Contents:**

- [Import packages](#Import-packages)
- [Loading files](#Loading-files)
- [Data structure / Axes handling](#Data-structure-/-Axes-handling)
- [Metadata](#Metadata)
- [Plot / Explore](#Plot-/-Explore)
- [Indexing](#Indexing)
- [Correction of spectral defects](#Correction-of-spectral-defects)
- [Basic model fitting](#Basic-model-fitting)
- [Peak identification / Centroid / Peak width](#Peak-identification-/-Centroid-/-Peak-width)

## Import packages

We import the public functions (api = application programming interface) of `HyperSpy`. The signal classes and methods of `LumiSpy` and `eXSpy` are registered with HyperSpy automatically, so they are directly available if the packages are installed.

Finally, `numpy` provides numerical operations on arrays that we will use:

In [None]:
# Use '%matplotlib widget' in JupyterLab and '%matplotlib notebook' in Jupyter Notebook for interactive inline plots (e.g. on Binder)
# For pop-up window plots on your local computer, use '%matplotlib tk' or '%matplotlib qt' instead
%matplotlib widget

import hyperspy.api as hs
import numpy as np

# Plot multiple inline figures side-by-side horizontally 
hs.preferences.Plot.widget_plot_style = 'horizontal'

**LumiSpy** and **eXSpy** provide dedicated signal classes.

We can check the **available signal types**:

In [None]:
hs.print_known_signal_types()

## Loading files

For saving analyses, HyperSpy has its own HDF5-based data format, `.hspy`.

HyperSpy reads and writes files through **RosettaSciIO**, which supports a wide range of microscopy- and spectroscopy-related **data file types**!

We load the two files used in this demo: a preprocessed dataset saved in the `hspy` format and a map in the `dm4` (Gatan) format:

*We assume the file locations of the demo repository. If you downloaded the notebook and the data files individually, you might need to adapt the paths.*

In [None]:
cl1 = hs.load("data/01_demo.hspy")
cl2 = hs.load("data/asymmetric-peak_map.dm4")

To see the **parameters** a function takes, you can **display its docstring** in Jupyter by appending a `?`:

In [None]:
hs.load?

## Data structure / Axes handling

Each HyperSpy signal object has attributes that hold the data itself, the description of the axes and the metadata.

To understand the HyperSpy data structure, let's have a look at the dataset `cl2` (Gatan file).

As **LumiSpy** is installed, the dataset is directly recognized as CL data and loaded as a `CLSpectrum` signal. (If LumiSpy is not installed, the fallback is the more generic `Signal1D` class.)

The **signal class** provides type-specific routines, for example the conversion of the spectral axis to an energy scale for luminescence data.

Our sample dataset has **two navigation dimensions** and **one signal (spectral) dimension**:

In [None]:
cl2

The axes information is stored in the `axes_manager`, so we can get more details about the different axes by displaying the **axes manager**:

In [None]:
cl2.axes_manager

The **actual data** (signal intensity) is stored in a numpy array:

In [None]:
cl2.data
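
Note that HyperSpy lists the axes in "natural" order (x, y, then the signal axis), whereas the NumPy array in `.data` stores them in reversed order, i.e. `[y, x, channel]` for a map of spectra. Thus `cl2.inav[x, y]` selects the same spectrum as `cl2.data[y, x]`. The sketch below illustrates this with a synthetic NumPy array only; the array sizes and the helper `spectrum_at` are hypothetical.

```python
import numpy as np

# Hypothetical map: 4 pixels wide (x), 3 pixels high (y), 5 spectral channels
nx, ny, n_channels = 4, 3, 5
data = np.arange(ny * nx * n_channels, dtype=float).reshape(ny, nx, n_channels)


def spectrum_at(data, x, y):
    """Return the spectrum at navigation position (x, y).

    The NumPy array is indexed in reversed order: [y, x, channel].
    """
    return data[y, x, :]


# Navigation shape in HyperSpy's (x, y) order, recovered from the array shape
navigation_shape_xy = data.shape[:-1][::-1]

print(data.shape)           # (3, 4, 5)
print(navigation_shape_xy)  # (4, 3)
```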

## Metadata

For most supported file formats, the metadata is automatically parsed into **HyperSpy's metadata tree**.
It contains information about the measurement, but potentially also about post-processing:

In [None]:
cl2.metadata

A separate tree holds the **complete metadata from the vendor format**, which follows different conventions depending on the format:

In [None]:
cl2.original_metadata

## Plot / Explore

We can easily plot and explore the hyperspectral data (drag the marker in the *navigation* window to change the displayed spectrum):

*(In the following, we will use the preprocessed dataset `cl1`. The sample contains methylammonium lead bromide (MAPbBr3) perovskite single crystals fabricated by Alice Dearle.)*

In [None]:
cl1.plot()

Plot the **average CL spectrum** of the whole map:

In [None]:
cl1.mean().plot()

## Indexing

HyperSpy has a powerful NumPy-style (similar to MATLAB) indexing mechanism that distinguishes between navigation and signal axes:

- `.inav[x1:x2,y1:y2]`
- `.isig[s1:s2]`

The index parameters can be either:
- Integer: Index in the axis array
- Float: Value in calibrated axis units
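
For a uniform axis whose values are `offset + i * scale`, a float index in calibrated units corresponds to the nearest pixel index `round((value - offset) / scale)`. The sketch below illustrates this mapping; `value_to_index` and the example axis are hypothetical, and HyperSpy's own conversion additionally handles non-uniform axes and rounding options.

```python
import numpy as np


def value_to_index(value, offset, scale, size):
    """Map a calibrated value onto the nearest pixel index of a uniform axis.

    Assumes axis values offset + i * scale for i = 0 .. size - 1.
    """
    index = int(round((value - offset) / scale))
    if not 0 <= index < size:
        raise ValueError(f"{value} lies outside the axis range")
    return index


# Hypothetical wavelength axis: 400 nm to 899.5 nm in 0.5 nm steps
offset, scale, size = 400.0, 0.5, 1000
wavelengths = offset + scale * np.arange(size)

i440 = value_to_index(440.0, offset, scale, size)
print(i440, wavelengths[i440])  # 80 440.0
```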

For example, we can either plot a subset of the map in navigation space (selected using pixels as index):

In [None]:
cl1.inav[2:23,0:20].plot()

Or, we can plot the mean spectrum in a certain spectral range (selected using wavelength units):

In [None]:
cl1.isig[440.:600.].mean().plot()

### Chromatic imaging

Indexing can also be used for color-filtered (chromatic) imaging.

First, let's plot the **panchromatic image**:

*(The signal is transposed so that the mean is taken over the spectral axis and the intensity is plotted over the navigation dimensions.)*

In [None]:
cl1.T.mean().plot(cmap='viridis')

Now, we can **plot the intensity in a selected spectral window** (color-filtered image) using indexing:

In [None]:
cl1.isig[480.:550.].T.mean().plot(cmap='viridis')
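
In terms of the underlying array, a panchromatic image is the mean over the whole spectral axis and a chromatic image is the mean over a spectral window only. The NumPy sketch below shows this on synthetic data; `chromatic_image` and `panchromatic_image` are hypothetical helpers, and the half-open window `[lower, upper)` mirrors Python slicing (HyperSpy's treatment of the window edges may differ slightly).

```python
import numpy as np


def chromatic_image(data, wavelengths, lower, upper):
    """Mean intensity per pixel within [lower, upper) on the spectral axis.

    `data` is ordered [y, x, channel], as in a HyperSpy signal's `.data`.
    """
    window = (wavelengths >= lower) & (wavelengths < upper)
    return data[..., window].mean(axis=-1)


def panchromatic_image(data):
    """Mean intensity per pixel over the whole spectral axis."""
    return data.mean(axis=-1)


# Synthetic example: 2 x 3 map, 5 channels from 460 nm to 540 nm
wavelengths = np.array([460.0, 480.0, 500.0, 520.0, 540.0])
data = np.ones((2, 3, 5))
data[..., 2] = 6.0  # brighter emission at 500 nm in every pixel

print(panchromatic_image(data)[0, 0])                          # 2.0
print(chromatic_image(data, wavelengths, 480.0, 550.0)[0, 0])  # 2.25
```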

Alternatively, we can interactively select a spectral window (color-filtered image) using regions of interest:

In [None]:
im = cl1.T
im.plot()
roi1 = hs.roi.SpanROI(left=455, right=485)  # spectral window 455-485 nm (a digital band-pass filter)
im_roi1 = roi1.interactive(im, color="red")
im_roi1_mean = hs.interactive(im_roi1.mean,
                              event=roi1.events.changed,
                              recompute_out_event=None)
im_roi1_mean.plot(cmap='viridis')

## Correction of spectral defects

Working on the unprocessed dataset `cl2`, we now introduce some basic functions for artefact correction:

### Remove background (interactive)

HyperSpy has an interactive tool for **background removal** that supports various functions. Let's start by removing a **simple offset**:
1. Select a region to be used to determine the background (lowest signal intensity): click, drag and release on the signal plot
2. Select the background type *Offset* (can also be set using the argument `background_type="Offset"`)
3. You can still move the region or its boundaries with the mouse and inspect the different spectra using the navigator to make sure the region is right
4. Press `Apply`
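
For a constant (`Offset`) background, the least-squares estimate within the selected region is simply the mean intensity there, so the correction can be sketched in plain NumPy as below. This is an illustration of the idea, not HyperSpy's implementation, which estimates the parameters of the chosen background component in the selected region of each spectrum; `remove_offset` and the example values are hypothetical.

```python
import numpy as np


def remove_offset(data, axis_values, lower, upper):
    """Subtract a constant background estimated in [lower, upper].

    The offset is the mean intensity in the region, computed separately for
    each spectrum (last axis = spectral axis).
    """
    region = (axis_values >= lower) & (axis_values <= upper)
    offset = data[..., region].mean(axis=-1, keepdims=True)
    return data - offset


wavelengths = np.linspace(400.0, 800.0, 9)  # 50 nm steps
spectrum = np.full(9, 10.0)                 # constant offset of 10 counts
spectrum[4] += 5.0                          # a "peak" at 600 nm
corrected = remove_offset(spectrum, wavelengths, 700.0, 800.0)
print(corrected)  # peak of 5 counts on a zero baseline
```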

In [None]:
cl2.plot()
cl2.remove_background(background_type="Offset")

### Remove last pixels from the spectrum

The signal beyond 800 nm goes to negative values, so let's remove the last three pixels from every spectrum (using signal indexing) and replace the original signal.

*NOTE: Indexing operates on pixels of the signal dimension if the given number is an integer, and on the calibrated (wavelength) axis if a float value is used as index.*

*Caution: Only run this cell once, since each subsequent run will remove another three pixels. Alternatively, you can use a dedicated variable for the corrected signal.*

In [None]:
cl2 = cl2.isig[:-3]

In [None]:
cl2.plot()

### Remove spikes (interactive)

There is also a tool for interactive removal of cosmic rays (pixels with sharp spikes), see `Help` for instructions.

In brief:
- Inspect the derivative histogram
- Set a sensible threshold to catch the outliers in the histogram (8 is a sensible threshold for this dataset)
- Use `Find next` to skip a wrong identification and `Remove spike` to remove an identified spike, until all spikes are treated
- `Close` when finished

*NOTE: The interactive version does not work well with inline plotting. You can also run an automatic best-guess spike removal by passing `interactive=False`. The function, interactive or not, modifies the signal in place, i.e. the original data are overwritten.*
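
The derivative criterion behind the tool can be sketched in NumPy: a cosmic-ray spike produces a large jump into and out of a single pixel, which is then replaced by interpolating its neighbours. This is a simplified sketch (single-pixel spikes only, edge pixels ignored, linear interpolation), not HyperSpy's algorithm, which works on the derivative histogram of the whole dataset; `remove_single_pixel_spikes` is a hypothetical helper.

```python
import numpy as np


def remove_single_pixel_spikes(spectrum, threshold):
    """Replace single-pixel spikes by linear interpolation of their neighbours.

    A pixel counts as a spike when the absolute first derivative exceeds
    `threshold` on both sides of it (simplified criterion; edge pixels are
    never flagged).
    """
    y = np.asarray(spectrum, dtype=float).copy()
    x = np.arange(y.size)
    jumps = np.abs(np.diff(y)) > threshold  # jumps[i]: between pixels i and i + 1
    spikes = np.zeros(y.size, dtype=bool)
    spikes[1:-1] = jumps[:-1] & jumps[1:]   # large jump into and out of the pixel
    if spikes.any():
        good = ~spikes
        y[spikes] = np.interp(x[spikes], x[good], y[good])
    return y, spikes


cleaned, flagged = remove_single_pixel_spikes([1, 1, 1, 20, 1, 1, 1], threshold=8)
print(cleaned, np.flatnonzero(flagged))
```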

In [None]:
cl2.spikes_removal_tool(interactive=False)

### Data smoothing

The current dataset is quite noisy. As the peak is broad compared with the spectral resolution, one way to improve the signal-to-noise ratio is to **rebin** the data along the signal axis:
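
Rebinning by a factor of 2 along the signal axis combines pairs of neighbouring channels, halving the number of channels. The NumPy sketch below sums the grouped channels; summing is an assumption of this sketch (averaging is the other common convention), and dropping leftover channels that do not fill a complete group is a choice made here only. `rebin_last_axis` is a hypothetical helper.

```python
import numpy as np


def rebin_last_axis(data, factor):
    """Sum groups of `factor` neighbouring channels along the last axis.

    Trailing channels that do not fill a complete group are dropped
    (a choice made for this sketch).
    """
    n = (data.shape[-1] // factor) * factor
    trimmed = data[..., :n]
    return trimmed.reshape(*data.shape[:-1], n // factor, factor).sum(axis=-1)


spectra = np.arange(14, dtype=float).reshape(2, 7)
print(rebin_last_axis(spectra, 2))
```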

*Caution: Running this cell multiple times will rebin the signal further.*

In [None]:
cl2 = cl2.rebin(scale=[1,1,2])
cl2.plot()

Additionally, HyperSpy provides three different functions for **data smoothing**:

- `smooth_lowess` (lowess smoothing)
- `smooth_savitzky_golay` (Savitzky Golay filter)
- `smooth_tv` (total variation data smoothing)

These functions can be run interactively: you can adjust the parameters with a live preview and hit `Apply` when you are happy with the smoothed curve. Alternatively, the parameters can be passed directly to the function.
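
As an example of what these filters do, the Savitzky-Golay filter fits a low-order polynomial to a sliding window by least squares and replaces each point with the fitted value at the window centre. The NumPy sketch below implements this idea; the helper names are hypothetical, repeating the edge values to handle the borders is an assumption of the sketch (library implementations offer several edge modes), and it is not HyperSpy's `smooth_savitzky_golay` itself.

```python
import numpy as np


def savgol_coefficients(window_length, polyorder):
    """Weights that return the least-squares polynomial value at the window centre."""
    half = window_length // 2
    x = np.arange(-half, half + 1)
    # Vandermonde matrix with columns x**0, x**1, ..., x**polyorder
    A = np.vander(x, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the fitted constant term, i.e. the value at x = 0
    return np.linalg.pinv(A)[0]


def savgol_smooth(y, window_length=7, polyorder=2):
    """Savitzky-Golay smoothing with edge values repeated at the borders."""
    if window_length % 2 != 1 or window_length <= polyorder:
        raise ValueError("window_length must be odd and larger than polyorder")
    c = savgol_coefficients(window_length, polyorder)
    half = window_length // 2
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    # Convolving with the reversed weights is a correlation with the weights
    return np.convolve(padded, c[::-1], mode="valid")


x = np.arange(30.0)
y = 3 * x**2 - 2 * x + 1
smoothed = savgol_smooth(y, window_length=7, polyorder=2)
# Close to 0: a quadratic is reproduced exactly away from the edges
print(np.max(np.abs(smoothed[3:-3] - y[3:-3])))
```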

*As we want to use the non-smoothed data afterwards for fitting the data, we first make a copy of the dataset.*

In [None]:
cl2a = cl2.deepcopy()
cl2.plot()
cl2.smooth_lowess(number_of_iterations=2)

In [None]:
cl2.plot()

## Basic model fitting

We will start by introducing very basic fitting functionality. A more elaborate example of [model fitting](#Model-fitting) will follow later in this notebook. For more details, see also the [HyperSpy demos repository](https://github.com/hyperspy/hyperspy-demos).

*Note that for simplicity, we do the fitting in the wavelength domain. In particular for luminescence spectroscopy data containing broad emission bands, it might make more sense to run these routines in the [energy domain](#Axes-types-/-Convert-to-energy-scale) after a [Jacobian transformation](#Jacobian-transformation) instead of converting the result. An example is included later in this notebook.*
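
The Jacobian transformation mentioned above follows from conserving the integrated intensity: with $E = hc/\lambda$, $I(E) = I(\lambda)\,|d\lambda/dE| = I(\lambda)\,\lambda^2/(hc)$. The NumPy sketch below applies this to a synthetic spectrum; `to_energy` and `trapezoid` are hypothetical helpers, and the sketch ignores refinements such as the refractive-index correction for wavelengths measured in air. It illustrates the physics, not LumiSpy's implementation.

```python
import numpy as np

HC_EV_NM = 1239.84198  # h * c in eV * nm


def to_energy(wavelengths_nm, intensity):
    """Convert a spectrum from wavelength (nm) to photon energy (eV).

    Applies the Jacobian |d(lambda)/dE| = lambda**2 / hc so that the
    integrated intensity is conserved. The output is sorted by increasing
    energy. Refractive-index (air/vacuum) corrections are ignored.
    """
    energies = HC_EV_NM / wavelengths_nm
    intensity_e = intensity * wavelengths_nm**2 / HC_EV_NM
    order = np.argsort(energies)
    return energies[order], intensity_e[order]


def trapezoid(y, x):
    """Trapezoidal integral of y over x."""
    return np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)


wl = np.linspace(400.0, 800.0, 4001)
spectrum = np.exp(-0.5 * ((wl - 600.0) / 30.0) ** 2)
energy, spectrum_e = to_energy(wl, spectrum)

# The two integrals agree up to the discretization error
print(trapezoid(spectrum, wl), trapezoid(spectrum_e, energy))
```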

First, we need to **initialize the model** (using the unsmoothed data):

In [None]:
m = cl2a.create_model()

A HyperSpy model can be composed of several **components** (functions).

We can **check the components** of the model. The list should be empty here, although for some signal types, like EDS and EELS, the model is automatically initialized with components:

In [None]:
m.components

Thus, we need to **create some components** and **add them to the model**.

As the emission peak in our dataset is rather asymmetric, we will use a single `SkewNormal` component. This function is characterized by a location parameter `x0` (not identical to the position of the maximum), a height parameter `A`, a width parameter `scale` and the skewness, characterized by `shape`. The only start value we need to set for a successful fit is the location `x0=650` (in nm).

*Note that HyperSpy has a range of [built-in functions](https://hyperspy.org/hyperspy-doc/current/user_guide/model/model_components.html#pre-defined-model-components) covering most needs that can be added as components to a model. However, it also has an intuitive mechanism to [define custom functions](https://hyperspy.org/hyperspy-doc/current/user_guide/model/model_components.html#define-components-from-a-mathematical-expression).*
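
To get a feeling for the parameters, the skew normal peak shape can be written out in NumPy using the standard definition of the skew normal distribution, $f(x) = 2A\,\phi(t)\,\Phi(\alpha t)$ with $t = (x - x_0)/\omega$. We assume here that HyperSpy's `SkewNormal` uses this parametrization (`scale` = $\omega$, `shape` = $\alpha$); check the docstring displayed above. In this form, `x0` is a location rather than the maximum, and the integrated area is `A * scale`. The function name `skew_normal` is hypothetical.

```python
import math

import numpy as np


def skew_normal(x, x0, A, scale, shape):
    """Skew normal peak: f(x) = 2 A phi(t) Phi(shape * t), t = (x - x0) / scale.

    phi is the standard normal density and Phi its cumulative distribution.
    (Parametrization assumed to match HyperSpy's SkewNormal.)
    """
    t = (np.asarray(x, dtype=float) - x0) / scale
    phi = np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(shape * t / math.sqrt(2.0)))
    return 2.0 * A * phi * Phi


x = np.linspace(400.0, 900.0, 50001)
y = skew_normal(x, x0=650.0, A=2.0, scale=20.0, shape=3.0)
dx = x[1] - x[0]
area = y.sum() * dx                  # close to A * scale = 40
mean = np.sum(x * y) / np.sum(y)     # shifted above x0 for shape > 0
print(area, mean)
```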

In [None]:
# Docstring of the SkewNormal component
hs.model.components1D.SkewNormal?

In [None]:
g1 = hs.model.components1D.SkewNormal(x0=650)
## Alternative way to set the start value of x0:
# g1.x0.value = 650
m.append(g1)
## Alternatively add a list of components:
# m.extend([g1])
m.components

To see the parameters of our components and their default values, we can **print all parameter values**:

In [None]:
m.print_current_values()

To apply the fit directly to all spectra in the map, we use the `multifit` method.

In the current case of a single, well defined peak, we achieve a good fit without further adjusting the initial values of the parameters or setting any boundaries.

In [None]:
m.multifit()

We can now **plot the model** together with the data:

In [None]:
m.plot()

The `SkewNormal` component represents the asymmetry of the peak very well, but does not fully reproduce the height of the main part of the peak.

We can also print the parameter values at the current index:

In [None]:
m.print_current_values()

## Peak identification / Centroid / Peak width

In particular for asymmetric peaks, fitting might not always be the best way to determine peak characteristics, even though asymmetric functions such as the skew normal distribution are available. Therefore, HyperSpy provides a number of additional routines.

Peaks can be identified and characterized using the **peak finder** routine `find_peaks1D_ohaver`, which is based on the downward zero crossing of the first derivative.

*For these routines, it is helpful to operate on the smoothed dataset. As we have some side-peaks, we operate on a subrange of the wavelength axis defined by `isig`.*
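
The core principle, a peak sits where the first derivative crosses zero from positive to negative, can be sketched in NumPy as follows. This is not O'Haver's full algorithm as used by HyperSpy, which additionally applies slope and amplitude thresholds and refines each peak from several points around its top; `find_peaks_zero_crossing` and its `min_height` parameter are hypothetical.

```python
import numpy as np


def find_peaks_zero_crossing(x, y, min_height=0.0):
    """Locate peaks as downward zero crossings of the first derivative.

    The position is refined by linear interpolation of the derivative between
    the two samples bracketing the crossing. Peaks at or below `min_height`
    are ignored.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dy = np.gradient(y, x)
    peaks = []
    for i in range(len(dy) - 1):
        if dy[i] > 0 >= dy[i + 1] and max(y[i], y[i + 1]) > min_height:
            frac = dy[i] / (dy[i] - dy[i + 1])
            peaks.append(x[i] + frac * (x[i + 1] - x[i]))
    return np.array(peaks)


x = np.arange(100.0)
y = np.exp(-0.5 * ((x - 30.3) / 4.0) ** 2) + 0.5 * np.exp(-0.5 * ((x - 70.0) / 4.0) ** 2)
print(find_peaks_zero_crossing(x, y, min_height=0.1))  # approximately [30.3, 70.0]
```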

In [None]:
peaks = cl2.isig[600.:].find_peaks1D_ohaver(maxpeakn=1)

The function **returns a structured array** that contains `position`, `height` and `width` for every pixel (potentially each for multiple peaks).

In [None]:
peaks[0,0]

Especially for broad, asymmetric emission bands, the position of the maximum intensity might be of limited value. Therefore, **LumiSpy** provides an additional `centroid` function that determines the **centre of mass** of a peak.

Required version: lumispy>=0.2.2

*Note that, as with fitting, it might make more sense to run these routines in the [energy domain](#Axes-types-/-Convert-to-energy-scale) after a [Jacobian transformation](#Jacobian-transformation) than to convert the result - in particular for broad emission bands. For simplicity, we introduce it in the wavelength domain.*
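
The centre of mass of a peak is its intensity-weighted mean position, $\sum_i x_i I_i / \sum_i I_i$ on a uniformly sampled axis (for a non-uniform axis, the pixel widths would also enter the weights). A minimal NumPy sketch follows; the helper `centroid` is hypothetical, and LumiSpy's `centroid` may differ in details, for example by interpolating the data.

```python
import numpy as np


def centroid(x, y):
    """Intensity-weighted mean position (centre of mass) on a uniform axis."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(x * y) / np.sum(y)


wl = np.linspace(600.0, 800.0, 2001)
peak = np.exp(-0.5 * ((wl - 650.0) / 10.0) ** 2)
tail = 0.3 * np.exp(-0.5 * ((wl - 680.0) / 30.0) ** 2)  # long red tail

print(centroid(wl, peak))         # about 650 for the symmetric peak
print(centroid(wl, peak + tail))  # shifted to the red side of the maximum
```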

In [None]:
com = cl2.isig[600.:].centroid()

The result is a new signal object that we can plot as a map using HyperSpy's plotting functionality:

In [None]:
com.plot(cmap='viridis')

You can also determine the **width of a peak** directly from the signal without fitting a model to the data, which is again useful for asymmetric peaks. To plot the FWHM interval, we set `return_interval=True` (the returned list then contains three signals: the *width*, as well as the *left position* and *right position* of the interval).

The default is to determine the **FWHM**, i.e. a `factor=0.5`. This value can be set to any other fraction of the peak height.
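
The model-free width estimate can be sketched in NumPy: start at the maximum, walk outwards until the intensity drops below `factor` times the peak height, and interpolate the crossing points linearly. Returning `(width, left, right)` mirrors the three outputs obtained with `return_interval=True`, but `peak_width` is a hypothetical helper and the details of `estimate_peak_width` may differ.

```python
import numpy as np


def peak_width(x, y, factor=0.5):
    """Width of the highest peak at `factor` times its maximum height.

    Returns (width, left, right). Assumes a single peak whose flanks drop
    below the threshold inside the data range.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    imax = int(np.argmax(y))
    level = factor * y[imax]
    # Walk outwards from the maximum until the intensity is at or below `level`
    il = imax
    while il > 0 and y[il] > level:
        il -= 1
    ir = imax
    while ir < len(y) - 1 and y[ir] > level:
        ir += 1
    if y[il] > level or y[ir] > level:
        raise ValueError("peak flanks do not drop below the threshold")
    # Linear interpolation of the crossings (np.interp needs increasing xp)
    left = np.interp(level, [y[il], y[il + 1]], [x[il], x[il + 1]])
    right = np.interp(level, [y[ir], y[ir - 1]], [x[ir], x[ir - 1]])
    return right - left, left, right


x = np.linspace(-10.0, 10.0, 2001)
y = np.exp(-x**2 / (2 * 2.0**2))  # Gaussian with sigma = 2
width, left, right = peak_width(x, y)
print(width)  # approximately 4.71 (= 2 * sqrt(2 ln 2) * 2)
```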

In [None]:
width = cl2.isig[600.:].estimate_peak_width(return_interval=True)

In [None]:
width[0].plot(cmap='viridis')

Now we can **add markers** for the *FWHM interval* and the *centre of mass* to the signal object and plot them on the spectra:

In [None]:
# Temporary fix for the HyperSpy 2.0 release, as the signals used to create markers need to be of `ragged` type
def to_ragged(s):
    # Object array with one entry per pixel of the map (shape in NumPy order)
    s2 = hs.signals.BaseSignal(np.empty(s.T.axes_manager.navigation_shape[::-1], dtype=object), ragged=True)
    for indices in np.ndindex(s2.data.shape):
        # Wrap each scalar value (e.g. a wavelength position) in a 1-element array
        s2.data[indices] = np.array([s.data[indices]])
    return s2

In [None]:
mrk = hs.plot.markers.VerticalLines.from_signal(to_ragged(com), color='black', signal_axes=None)
mrkl = hs.plot.markers.VerticalLines.from_signal(to_ragged(width[1]), color='grey', signal_axes=None)
mrkr = hs.plot.markers.VerticalLines.from_signal(to_ragged(width[2]), color='grey', signal_axes=None)
cl2.add_marker([mrk,mrkl,mrkr], permanent=True)
cl2.plot()

## Now try with your own data!
