## This script can be found on [GitHub](https://github.com/Basem-Qahtan/STEM-data-processing-scripts) along with other scripts for STEM data processing algorithms.

##### -------------------------------------------------------------------------------------------------------------------------------------------------
# EDX-Tomography. 

EDX tomography can be used to observe the spatial distribution of the elements in a sample. The main challenge is the copper X-ray signal generated at high tilt angles (such as -60 or +60 degrees). One way to reduce the collection of this intense Cu signal is to adjust the EDX detector configuration (with 4 EDX detectors, 2 might be turned off at high tilt angles). Even so, the Cu signal remains intense and can overlap with the X-ray peaks of other elements, such as the Zn_Ka peak.

Principal component analysis (PCA) can be used to denoise the data or to eliminate the Cu signal when it is applied to all the tilt-angle datasets stacked into one variable. This notebook stacks the datasets and then denoises them or removes unwanted signals such as Cu.


## Author

* 15/10/2024 Basem Qahtan - Developed as part of a Ph.D. research project at the Italian Institute of Technology (IIT)/ the University of Genoa (UniGe) in Italy and KAUST university in KSA.

## Changes

* 01/11/2024 - Added auto cropping and rebinning of EDX map prior to the principal component analysis (PCA) denoising.
* 15/11/2024 - Added stacking of total EDX tilt angle signals. 
* 15/12/2024 - Added auto detection of x-ray lines and plotting of elemental maps by selecting the name of the x-ray line. 
* 19/12/2024 - Exporting of elemental maps in PNG format (with scale bar) and TIF file without a scale bar (not compressed).


## Requirements

* HyperSpy 1.7.4 or higher

## <a id='top'></a> Contents

1. <a href='#dat'>Specimen & Data</a>
2. <a href='#PCA_'> Cu_Kβ and Zn_Kα peak deconvolution using PCA decomposition  </a>
3. <a href='#load'> Loading, rebinning, stacking and PCA decomposition of the EDX maps </a>
4. <a href='#pca-nmf'> PCA-NMF decomposition of the stacked EDX map</a>
5. <a href='#X_ray_lines'> X-ray intensity lines identification and selection for elemental map plotting</a>
6. <a href='#element_of_interest'> Select an X-ray line of an element to plot the elemental maps:</a>



# <a id='dat'></a> 1. Specimen & Data


An EDX tomography tilt series was collected from an FeOZn sample using a TEM with 4 EDX detectors (the detectors were not turned off at high tilt angles, so the Cu signal is intense). Because the Cu peak is so intense, it overlaps with the Zn_Ka peak even after PCA denoising is applied to individual tilt-angle datasets, as shown in Figure 1 (left) below.

<img src="images/PCA EDX denoising.png" style="height:300px;">
Figure 1: PCA denoising of the FeOZn sample using 1 tilt-angle dataset (left) and all the tilt-angle datasets stacked (right).

# <a id='PCA_'></a> 2. Cu_Kβ and Zn_Kα peak deconvolution using PCA decomposition 


However, when PCA is applied to all the tilt angles combined, it is able to distinguish the 4 main components of the EDX spectrum, as shown in Figure 2. The EDX signal can then be reconstructed with the Cu X-ray component excluded, which eliminates the Cu signal, as shown in Figure 1 (right).


<img src="images/PCA on EDX tomography.png" style="height:500px;">
Figure 2: PCA decomposition of the FeOZn sample showing the first 4 components.
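
The idea of stacking every tilt angle, decomposing once, and rebuilding the data from a chosen subset of components can be sketched with plain NumPy. This is a minimal illustration on synthetic data, not the HyperSpy implementation: the array sizes, the two Gaussian "peaks" and the helper `reconstruct` are all assumptions made for the example, and PCA is computed here as an uncentred SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a tilt series: 2 tilt angles, 8x8 pixels, 32 energy channels.
n_tilt, ny, nx, n_energy = 2, 8, 8, 32
channels = np.arange(n_energy)


def gaussian_peak(centre, width=1.5):
    """Return a Gaussian peak on the channel axis (hypothetical spectrum shape)."""
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)


sample_spectrum = gaussian_peak(10.0)  # hypothetical peak from the sample
grid_spectrum = gaussian_peak(22.0)    # hypothetical unwanted (Cu-like) peak

sample_weight = rng.uniform(0.5, 1.5, size=(n_tilt, ny, nx))
grid_weight = rng.uniform(0.5, 1.5, size=(n_tilt, ny, nx))
clean = (sample_weight[..., None] * sample_spectrum
         + grid_weight[..., None] * grid_spectrum)
noisy = clean + rng.normal(scale=0.1, size=clean.shape)

# Stack all tilt angles and unfold them into one (pixels, channels) matrix.
matrix = noisy.reshape(-1, n_energy)

# Uncentred PCA computed as a singular value decomposition.
u, s, vt = np.linalg.svd(matrix, full_matrices=False)
loadings = u * s  # one column of pixel weights per component
factors = vt      # one spectrum per component


def reconstruct(keep):
    """Rebuild the stack from the component indices listed in `keep`."""
    keep = list(keep)
    model = loadings[:, keep] @ factors[keep, :]
    return model.reshape(noisy.shape)


denoised = reconstruct([0, 1])
error_noisy = np.linalg.norm(noisy - clean)
error_denoised = np.linalg.norm(denoised - clean)
print(f"error before: {error_noisy:.2f}, after: {error_denoised:.2f}")
```

Leaving an index out of `keep` drops that component from the rebuilt data, which is the same idea as passing a list of indices to `get_decomposition_model` later in this notebook. Which index corresponds to the unwanted signal has to be checked by inspecting the factors; it is not fixed in advance.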


Figure 3 below shows the Zn_Ka elemental map. The map is noisy when only the raw dataset is used. Rebinning the EDX map by 4 improves the signal slightly, but applying PCA after rebinning greatly improves the signal-to-noise ratio.


<img src="images/PCA EDX tomographty elemental map.png" style="height:300px;">
Figure 3: Zn_Ka elemental map of 1 tilt angle of raw data (right), rebinned data (centre) and rebinned and PCA denoised (left).
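
The cropping and rebinning step can be illustrated with NumPy alone. The sketch below assumes a `(y, x, energy)` array and a hypothetical helper `crop_and_rebin`; it crops the spatial axes to a multiple of the rebinning factor and sums each block of pixels, which is the same operation the loading cell performs with HyperSpy's `crop` and `rebin`.

```python
import numpy as np


def crop_and_rebin(data, factor):
    """Crop (y, x, energy) data to a multiple of `factor` in y and x,
    then sum every factor x factor block of pixels."""
    ny, nx, n_energy = data.shape
    ny_c = (ny // factor) * factor
    nx_c = (nx // factor) * factor
    cropped = data[:ny_c, :nx_c, :]
    blocks = cropped.reshape(ny_c // factor, factor, nx_c // factor, factor, n_energy)
    return blocks.sum(axis=(1, 3))


rng = np.random.default_rng(1)
raw = rng.poisson(lam=2.0, size=(10, 9, 4)).astype(float)  # synthetic counts
rebinned = crop_and_rebin(raw, 4)
print(raw.shape, "->", rebinned.shape)
```

Summing 4 x 4 pixels collects 16 times more counts per pixel, so for Poisson-distributed counts the signal-to-noise ratio per pixel improves by roughly a factor of 4, at the cost of spatial resolution.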


# <a id='load'></a> 3. Loading, rebinning, stacking and PCA decomposition of the EDX maps 


In [50]:
%matplotlib qt

import hyperspy.api as hs
import numpy as np
import matplotlib.pyplot as plt
import os
import re
names=[]


# Specify the directory path
directory = os.getcwd()

# Get a list of files in the directory
file_list = os.listdir(directory)

# Filter files with the .emd or .bcf extension
emd_files = [file for file in file_list if file.endswith(".emd") or file.endswith(".bcf")]

def extract_number(filename):
    # Extract the number from the filename using regex
    match = re.search(r'\((\d+)\)', filename)
    if match:
        return int(match.group(1))
    return 0  # Return 0 if no number is found

# Sort the list using the custom key function
sorted_emd_files = sorted(emd_files, key=extract_number)

# Print the sorted list to verify
#for file in sorted_emd_files:
#    print(file)
EDX=[]
EDX_O=[]

# Load and process the .emd files using Hyperspy
for file in sorted_emd_files:
    SI = []
    # Construct the full file path
    file_path = os.path.join(directory, file)
    SI = hs.load(file_path)

    # Split the string based on the '.' delimiter
    name = file.split('.')[0]
    print(name)
    names.append(name)

    # Extract HAADF and EDS signals
    for i in range(len(SI)):
        if SI[i].metadata.General.title == 'HAADF':
            HAADF = SI[i]
        if SI[i].metadata.General.title in ("EDS", "EDX") and SI[i].data.ndim == 3:
            EDS = SI[i]
#---------------------------------------------------
    y_dim, x_dim, z_dim = EDS.data.shape
    y_hdim, x_hdim,  = HAADF.data.shape

        #print(s_dim)
        #print(y_dim)
        #print(x_dim)
        #print(z_dim)

#    print("EDS shape:             ",EDS.data.shape)

#    print("")
            # Determine the rebinning factor based on dimensions
    if x_dim > 500 and y_dim > 500:
                    #crop EDS axes to be divisible by 4
                    EDS.crop(axis=0,end=(x_dim // 4)*4)
                    EDS.crop(axis=1,end=(y_dim // 4)*4)

                    HAADF.crop(axis=0,end=(x_hdim // 4)*4)
                    HAADF.crop(axis=1,end=(y_hdim // 4)*4)
 #                   print("EDS shape adjusted to: ",EDS.data.shape," in order to rebin")

                    EDX_rebin_factor = (4, 4, 1)  # Rebin by 4 in x and y dimensions
                    HAADF_rebin_factor = (4,4)  # Rebin by 4 in x and y dimensions

#                    print("rebinning EDX dimensions by 4")


    elif x_dim > 300 and y_dim > 300:
                    #crop EDS axes to be divisible by 2
                    EDS.crop(axis=0,end=(x_dim // 2)*2)
                    EDS.crop(axis=1,end=(y_dim // 2)*2)
                    
                    HAADF.crop(axis=0,end=(x_hdim // 2)*2)
                    HAADF.crop(axis=1,end=(y_hdim // 2)*2)


                    EDX_rebin_factor = (2, 2, 1)  # Rebin by 2 in x and y dimensions
                    HAADF_rebin_factor = (2,2)  # Rebin by 2 in x and y dimensions



    else:
                    EDX_rebin_factor = (1, 1, 1)  # No rebinning
                    HAADF_rebin_factor = (1,1)  # No rebinning


    # Rebin the EDS map
    EDS = EDS.rebin(scale=EDX_rebin_factor)
    HAADF = HAADF.rebin(scale=HAADF_rebin_factor)


            # Determine the rebinning factor for the x-ray energy axis
    if z_dim >= 4000:
                    EDX_rebin_factor = (1,1, 4)  # Rebin by 4 along the energy axis
                  #  print("x-ray energy axes rebinned by 4")
    elif 4000>z_dim > 2000:
                    EDX_rebin_factor = (1,1, 2)  # Rebin by 2 along the energy axis
                  #  print("x-ray energy axes rebinned by 2")

    else:
                    EDX_rebin_factor = (1, 1,1)  # No rebinning

    EDS =    EDS.rebin(scale=EDX_rebin_factor)

#    print("new EDS shape:         ",EDS.data.shape)
#    print("new HAADF shape:         ",HAADF.data.shape)

    EDX.append(EDS)

print("Stacking")
stacked_EDX = hs.stack(EDX)
print(" Done !")

s_dim,y_dim, x_dim, z_dim = stacked_EDX.data.shape
print(stacked_EDX.data.shape)
#--------------------------------------------



stacked_EDX.change_dtype("float32")
 
stacked_EDX.decomposition(normalize_poissonian_noise=True)


EDX_Tomo_TiltAngle_ (1)
EDX_Tomo_TiltAngle_ (2)
Stacking
[########################################] | 100% Completed | 105.57 ms
 Done !
(2, 128, 128, 1024)
2
Decomposition info:
  normalize_poissonian_noise=True
  algorithm=SVD
  output_dimension=None
  centre=None
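
The loading cell above sorts the files by the number in parentheses rather than alphabetically, so that the stack follows the tilt sequence. A short standalone sketch shows why this matters once the series has ten or more files. The file list is hypothetical (the `(10)` entry and the `.emd` extension are assumptions); `extract_number` mirrors the function used in the notebook.

```python
import re


def extract_number(filename):
    """Return the integer found in parentheses in `filename`, or 0 if none."""
    match = re.search(r'\((\d+)\)', filename)
    return int(match.group(1)) if match else 0


# Hypothetical file names following the naming pattern used in this notebook.
files = [
    "EDX_Tomo_TiltAngle_ (10).emd",
    "EDX_Tomo_TiltAngle_ (2).emd",
    "EDX_Tomo_TiltAngle_ (1).emd",
]

alphabetical = sorted(files)
numerical = sorted(files, key=extract_number)
print(alphabetical)
print(numerical)
```

With plain alphabetical sorting, file `(10)` lands between `(1)` and `(2)`, which would put that tilt angle at the wrong position in the stack. Files without a number in parentheses all receive the key 0 and are placed first.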


#### When plotting the signal below: if you cannot use the keyboard arrows to navigate through the signal or navigation axes, try dragging the small red square (in the top-left corner of the EDX map) while holding the right mouse button.

In [2]:
stacked_EDX.plot(True)


# <a id='pca-nmf'></a> 4. PCA-NMF decomposition of the stacked EDX map


### Plot the loadings and factors of the components along with the scree plot:

#### After plotting the decomposition results below: if you cannot navigate between the components using the keyboard arrows, use the slider bar that opens in a third small window or in the notebook itself.

In [51]:
stacked_EDX.plot_decomposition_results()
stacked_EDX.plot_explained_variance_ratio()


<Axes: title={'center': 'Stack of EDS\nPCA Scree Plot'}, xlabel='Principal component index', ylabel='Proportion of variance'>
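
The scree plot shows how much of the total variance each principal component explains, and the number of components to keep is usually read off where the curve flattens. A minimal NumPy sketch of that calculation follows; the synthetic rank-3 matrix, the noise level and the `1e-3` threshold are assumptions chosen for illustration, and real data normally needs a visual check of the plot rather than a fixed threshold.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic (pixels, channels) matrix with 3 true components plus weak noise.
n_pixels, n_channels, n_true = 200, 40, 3
signal = rng.normal(size=(n_pixels, n_true)) @ rng.normal(size=(n_true, n_channels))
matrix = signal + rng.normal(scale=0.01, size=signal.shape)

# For an uncentred SVD, the variance carried by each component is
# proportional to its squared singular value.
singular_values = np.linalg.svd(matrix, compute_uv=False)
variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Count the components that stand clearly above the noise floor.
n_components = int(np.sum(variance_ratio > 1e-3))
print("components above threshold:", n_components)
```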

### Select how many of the leading PCA components to use for non-negative matrix factorization (NMF); in this example, 5 components are selected:

In [52]:
stacked_EDX.decomposition(normalize_poissonian_noise=True, algorithm='NMF', output_dimension=5)
stacked_EDX.plot_decomposition_results()

Decomposition info:
  normalize_poissonian_noise=True
  algorithm=NMF
  output_dimension=5
  centre=None
scikit-learn estimator:
NMF(n_components=5)
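
NMF factorizes the data into loadings and factors that are all non-negative, which tends to give components that look like physical spectra and maps. The sketch below is a from-scratch illustration using the classic multiplicative update rules. It is not the scikit-learn estimator that HyperSpy calls above, which uses a different solver by default; the function `nmf`, the matrix sizes and the iteration counts are assumptions.

```python
import numpy as np


def nmf(matrix, n_components, n_iter=300, seed=0, eps=1e-9):
    """Factorize a non-negative matrix as loadings @ factors using
    multiplicative updates that reduce the Frobenius reconstruction error."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = matrix.shape
    w = rng.uniform(0.1, 1.0, size=(n_rows, n_components))  # loadings
    h = rng.uniform(0.1, 1.0, size=(n_components, n_cols))  # factors
    for _ in range(n_iter):
        h *= (w.T @ matrix) / (w.T @ w @ h + eps)
        w *= (matrix @ h.T) / (w @ h @ h.T + eps)
    return w, h


# Synthetic non-negative data built from 2 hidden components.
rng = np.random.default_rng(42)
true_maps = rng.uniform(0, 1, size=(64, 2))
true_spectra = rng.uniform(0, 1, size=(2, 30))
data = true_maps @ true_spectra

loadings_start, factors_start = nmf(data, 2, n_iter=1)
loadings, factors = nmf(data, 2, n_iter=300)
error_start = np.linalg.norm(data - loadings_start @ factors_start)
error_final = np.linalg.norm(data - loadings @ factors)
print(f"error after 1 iteration: {error_start:.3f}, after 300: {error_final:.3f}")
```

Because every update multiplies by a non-negative ratio, the loadings and factors never become negative, unlike PCA components, which can contain negative values.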



### Reconstruct the EDX tomography datasets by selecting the components from NMF
#### To use the first 4 components, type:
###### stacked_EDX = stacked_EDX.get_decomposition_model(4)
#### To use the 1st, 2nd and 4th components, type (numbering starts from 0):
###### stacked_EDX = stacked_EDX.get_decomposition_model([0,1,3])

In [53]:
stacked_EDX = stacked_EDX.get_decomposition_model([0,2,3,4]) #<<<< select the components here

#### Plot the denoised EDX spectrum 

In [54]:
stacked_EDX.plot()


## To plot the summed EDX spectrum of the 1st tilt angle:
#### To view another tilt angle, replace the 0 with the index of that tilt angle.

In [55]:
stacked_EDX.inav[...,0].sum().plot(True)

# <a id='X_ray_lines'></a> 5. X-ray intensity lines identification and selection for elemental map plotting


In [56]:
# Create a model of the 1st tilt angle in order to identify the peaks
m = stacked_EDX.inav[..., 0].sum().create_model()
#m.plot(True)

# Collect the X-ray line names (e.g. 'Zn_Ka'); the name is the 3rd word of each intensity title
elements = [line.metadata.General.title.split(' ')[2] for line in m.get_lines_intensity()]
print("Initial lines:   ", elements)

# Priority mapping, higher number means higher priority
priority = {'Ka': 3, 'La': 2, 'Ma': 1}

# Dictionary to hold the highest priority line for each element
highest_priority_lines = {}

# Lists to track removed lines for reporting
removed_lines = []
removed_Xlines = []
for line in elements:
    element, line_type = line.split('_')[:2]
    if element not in highest_priority_lines:
        # First line seen for this element
        highest_priority_lines[element] = line
        continue

    # Compare priorities and keep the line with the higher priority
    existing_line = highest_priority_lines[element]
    current_priority = priority[line_type]
    existing_priority = priority[existing_line.split('_')[1]]
    if current_priority > existing_priority:
        highest_priority_lines[element] = line
        removed_lines.append(f"{existing_line} removed and {line} was kept")
        removed_Xlines.append(existing_line)
    elif current_priority < existing_priority:
        # Current line has lower priority and is not added, so it is effectively removed
        removed_lines.append(f"{line} removed and {existing_line} was kept")
        removed_Xlines.append(line)

# Extract the highest priority line of each element
desired_elements = list(highest_priority_lines.values())
print("Final lines kept:", desired_elements)
for change in removed_lines:
    print(change)
print("")

Initial lines:    ['Fe_Ka', 'Fe_La', 'O_Ka', 'Zn_Ka', 'Zn_La']
Final lines kept: ['Fe_Ka', 'O_Ka', 'Zn_Ka']
Fe_La removed and Fe_Ka was kept
Zn_La removed and Zn_Ka was kept



# <a id='element_of_interest'></a> 6. Select an X-ray line of an element to plot the elemental maps:
# -----------------------------------------------------------------------------

In [57]:
#Insert the X-ray line you want for the elemental map; it must be one of the lines in the "Final lines kept" list above.
element_of_interest="Zn_Ka"  #<<<<<<<<


# -----------------------------------------------------------------------------

In [58]:
# Get the fitted signal
fitted_data = m.as_signal()

# Get the energy axis (x-axis)
energy_axis = stacked_EDX.inav[...,0].sum().axes_manager[0].axis

# Initialize dictionaries to store peak positions and residues
peak_positions = {}
individual_residues = {}
individual_linear_residues = {}

# Automatically extract peak positions from the model components
for component_name in dir(m.components):
    if not component_name.startswith('_'):  # Ignore private attributes
        component = getattr(m.components, component_name)
        if hasattr(component, 'centre'):
            peak_positions[component_name] = component.centre.value

# Define function to calculate peak window based on energy
def get_peak_window(energy):
    min_energy, max_energy = min(peak_positions.values()), max(peak_positions.values())
    min_window, max_window = 0.052, 0.5
    
    # Linear interpolation
    window = min_window + (energy - min_energy) * (max_window - min_window) / (max_energy - min_energy)
    return np.clip(window, min_window, max_window)
found=0
# Loop over each desired element/peak
for element in desired_elements:
    if element in peak_positions and element==element_of_interest:
        found=1
        peak_position = peak_positions[element]
        
        # Calculate peak window based on the peak position
        peak_window = get_peak_window(peak_position)

        # Create a mask to isolate the region of the peak on the energy axis
        peak_region_mask = (energy_axis >= peak_position - peak_window) & (energy_axis <= peak_position + peak_window)
        
        # Get the original and fitted data for this region
        original_peak_data = stacked_EDX.inav[...,0].sum().data[peak_region_mask]
        fitted_peak_data = fitted_data.data[peak_region_mask]
        
        # Calculate the squared and linear residues for this peak
        peak_residues = np.sum((original_peak_data - fitted_peak_data)**2)
        peak_abs_linear_residues = np.sum(np.abs(original_peak_data - fitted_peak_data))
        peak_linear_residues = np.sum(original_peak_data - fitted_peak_data)

        # Store the results in dictionaries for each peak
        individual_residues[element] = peak_residues
        individual_linear_residues[element] = peak_linear_residues

        # Format the energy window limits and report the range that will be plotted
        E_start= "{:.1f}".format(peak_position-peak_window)
        E_end=   "{:.1f}".format(peak_position+peak_window)
        print(f"Elemental map for {element} will be plotted for x-rays in the energy range between {E_start} keV and {E_end} keV")
        #print(E_start)
        #print(E_end)

if found==0:        
        print(f"{element_of_interest} not found in the model, make sure name is correct or insert the energy range manually below")



# Print all extracted peak positions for verification

#print("\nExtracted Peak Positions:")
#for element, position in peak_positions.items():
#    print(f"{element}: {position:.4f} keV")


Elemental map for Zn_Ka will be plotted for x-rays in the energy range between 8.2 keV and 9.1 keV
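
The half-width of the integration window is interpolated linearly between 0.052 keV at the lowest peak energy in the model and 0.5 keV at the highest. The standalone sketch below reproduces that calculation. The four line energies are approximate values picked for illustration, since the real notebook uses every peak position found in the model, and the guard for a single peak is an addition that the notebook's `get_peak_window` does not have.

```python
import numpy as np


def get_peak_window(energy, peak_energies, min_window=0.052, max_window=0.5):
    """Linearly interpolate the window half-width (keV) between the
    lowest and highest peak energies, clipped to the allowed range."""
    low, high = min(peak_energies), max(peak_energies)
    if high == low:
        # Only one distinct peak energy: avoid dividing by zero.
        return min_window
    window = min_window + (energy - low) * (max_window - min_window) / (high - low)
    return float(np.clip(window, min_window, max_window))


# Approximate line energies in keV, used only for this illustration.
peaks = {"O_Ka": 0.525, "Fe_Ka": 6.404, "Zn_Ka": 8.639, "Zn_Kb": 9.572}

window = get_peak_window(peaks["Zn_Ka"], peaks.values())
e_start = peaks["Zn_Ka"] - window
e_end = peaks["Zn_Ka"] + window
print(f"Zn_Ka window: {e_start:.1f} keV to {e_end:.1f} keV")
```

With these assumed energies the Zn_Ka half-width comes out at about 0.45 keV, which is consistent with the 8.2 keV to 9.1 keV range reported above.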


### If the energy range of the element of interest is not accurate, remove the '#' sign at the start of the 2 lines below and insert the energy limits manually:

In [59]:
#E_start= 5.0    #<<<< enter the value as a decimal number, e.g. type 5.0 instead of 5 for an energy of 5 keV
#E_end=   6.0
print(f"Elemental map for {element_of_interest} will be plotted for x-rays in the energy range between {E_start} keV and {E_end} keV")


Elemental map for Zn_Ka will be plotted for x-rays in the energy range between 8.2 keV and 9.1 keV
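
Once the energy range is fixed, the elemental map is simply the sum of the counts in that range at every pixel. A minimal NumPy sketch of this integration is given here; the tiny data cube, its 0.1 keV channel width and the helper `integrate_window` are assumptions, and the window limits are treated as inclusive, which may differ slightly from how HyperSpy's `crop` handles the end value.

```python
import numpy as np


def integrate_window(cube, energy_axis, e_start, e_end):
    """Sum the counts of a (y, x, energy) cube between e_start and e_end (keV)."""
    mask = (energy_axis >= e_start) & (energy_axis <= e_end)
    return cube[..., mask].sum(axis=-1)


# Hypothetical 4x4 pixel cube with 101 channels from 0 to 10 keV.
energy_axis = np.linspace(0.0, 10.0, 101)
cube = np.zeros((4, 4, energy_axis.size))
cube[1, 2, 86] = 5.0  # counts near 8.6 keV, inside the window
cube[0, 0, 50] = 7.0  # counts near 5.0 keV, outside the window

zn_map = integrate_window(cube, energy_axis, 8.2, 9.1)
print(zn_map)
```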


### Below, the elemental map will be saved in 2 formats:
### 1- PNG with scale bar (data is compressed)
### 2- TIF with no scale bar (suitable for further data processing)

In [66]:

for i in range(0, s_dim):
    elemental_map = stacked_EDX.inav[..., i].deepcopy()
    elemental_map.crop(axis=2, start=float(E_start), end=float(E_end))

    edx_map_nmf = np.sum(elemental_map.data, axis=2)

    # Create a figure for the TIF image without any decorations
    fig_tif, ax_tif = plt.subplots(figsize=(6, 6))
    ax_tif.imshow(edx_map_nmf, cmap='viridis')
    ax_tif.axis('off')
    plt.tight_layout()
    plt.savefig(f'{element_of_interest} elemental map in TIF format for {names[i]}.tif', bbox_inches='tight', pad_inches=0)
    plt.close(fig_tif)
    
    def add_scale_bar(ax, data_shape):
        pixel_size = HAADF.axes_manager[0].scale  # nm/pixel
        desired_scale_bar_length_nm = round((data_shape[1] * pixel_size) / 4)  # nm
        scale_bar_length_px = int(desired_scale_bar_length_nm / pixel_size)
    
        # Coordinates for the scale bar (position in pixels on the image)
        scale_bar_start = data_shape[1] * 0.1          #<<<<< x coordinate where the scale bar starts
        scale_bar_end = data_shape[0] * 0.93           #<<<<< y coordinate of the scale bar
    
        # Add the scale bar
        ax.plot([scale_bar_start, scale_bar_start + scale_bar_length_px], 
                [scale_bar_end, scale_bar_end], 
                'w-', linewidth=2)
        
        # Add the scale bar label
        ax.text(scale_bar_start, scale_bar_end - 5, 
                f'{desired_scale_bar_length_nm} nm', 
                color='white', fontsize=8, ha='left', va='bottom')


    
    # Create a figure for the PNG image with all decorations
    fig_png, ax_png = plt.subplots(figsize=(6, 6))
    im1 = ax_png.imshow(edx_map_nmf, cmap='viridis')
    ax_png.set_title(f"Elemental map of {element_of_interest} after PCA-NMF")
    ax_png.axis('off')
    

    # Add scale bar and colorbar for PNG image
    add_scale_bar(ax_png, edx_map_nmf.shape)
    plt.colorbar(im1, ax=ax_png, fraction=0.046, pad=0.04, format='%.1f')

    # Save the plot as a PNG image with scale bar
    plt.savefig(f'{element_of_interest} elemental map with scale bar for {names[i]}.png', dpi=300, bbox_inches='tight', pad_inches=0.1)

    # Close the figure to free up memory
    plt.close(fig_png)

    print(f"Elemental map for {element_of_interest} is saved in TIF and PNG format for {names[i]}.png")

print("Done ................................ !!")


Elemental map for Zn_Ka is saved in TIF and PNG format for EDX_Tomo_TiltAngle_ (1).png
Elemental map for Zn_Ka is saved in TIF and PNG format for EDX_Tomo_TiltAngle_ (2).png
Done ................................ !!
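
The scale bar drawn on the PNG is sized to about a quarter of the image width, rounded to a whole number of nanometres, and then converted back to pixels. The arithmetic can be checked in isolation with the sketch below; the helper `scale_bar_length` and the example pixel size are assumptions, and the pixel size is taken to be in nm per pixel as in the cell above.

```python
def scale_bar_length(image_width_px, pixel_size_nm, fraction=0.25):
    """Return the scale bar length as (nanometres, pixels) for a bar that
    spans roughly `fraction` of the image width."""
    length_nm = round(image_width_px * pixel_size_nm * fraction)
    length_px = int(length_nm / pixel_size_nm)
    return length_nm, length_px


# Example: a 128-pixel-wide map with a hypothetical pixel size of 1.5 nm.
length_nm, length_px = scale_bar_length(128, 1.5)
print(f"{length_nm} nm corresponds to {length_px} px")
```

For very small fields of view the rounded length can become 0 nm, in which case the bar would not be drawn; rounding to one decimal place instead would be one way to handle such maps.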
