## Using dataconverter/apm for mapping atom probe microscopy data to NeXus/HDF5/NXapm

### **Step 1:** Check that packages are installed and working in your local Python environment.

Check the result of the query below, specifically that `jupyterlab_h5web` and `pynxtools` are installed in your environment.<br>
Note that next to the name pynxtools you should see the directory in which it is installed. Otherwise, make sure that you follow<br>
the instructions in the `README` files:  
- How to set up a development environment, as described in the main README  
- Launch JupyterLab from this environment, as described in the README of the folder `examples`

In [None]:
! pip list | grep "h5py\|nexus\|pynxtools\|jupyter" && jupyter serverextension list && jupyter labextension list && python -V
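The same check can be sketched in pure Python, which also works where `grep` is unavailable. This is a minimal sketch: `installed_versions` is a hypothetical helper, and the distribution names passed to it are assumed to match the names that `pip list` reports.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


# Distribution names are assumptions taken from the text above.
for name, version in installed_versions(["pynxtools", "jupyterlab_h5web", "h5py"]).items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

Any entry reported as `NOT INSTALLED` points back to the README instructions listed above.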

Set the pynxtools directory and import H5Web for interactive exploration of HDF5 files.

In [None]:
import os
from jupyterlab_h5web import H5Web
print(f"Current working directory: {os.getcwd()}")
print(f"So-called base, home, or root directory of the pynxtools: {os.getcwd().replace('/examples/apm', '')}")
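The string replacement above assumes forward slashes as path separators. A more portable variant can be sketched with `pathlib`; `strip_examples_apm` is a hypothetical helper name, and the sketch assumes the notebook is started from the `examples/apm` folder.

```python
from pathlib import Path, PurePath


def strip_examples_apm(cwd: PurePath) -> PurePath:
    """Return the directory two levels up if cwd ends with examples/apm."""
    if cwd.parts[-2:] == ("examples", "apm"):
        return cwd.parents[1]
    # Otherwise leave the path unchanged, as the string replacement would.
    return cwd


print(f"Assumed pynxtools root directory: {strip_examples_apm(Path.cwd())}")
```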

### **Step 2:** Download APM-specific example data or use your own dataset.

Example data can be found on Zenodo https://www.zenodo.org/record/7908429.

In [None]:
import zipfile as zp

In [None]:
! curl --output usa_denton_smith_apav_si.zip https://zenodo.org/record/7908429/files/usa_denton_smith_apav_si.zip

In [None]:
zp.ZipFile("usa_denton_smith_apav_si.zip").extractall(path="", members=None, pwd=None)
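Before or after extracting, it can help to see which files the archive contains. The following is a minimal sketch using only the standard library; `list_archive_members` is a hypothetical helper, and the archive name is the one downloaded above.

```python
import zipfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the member names of a zip archive without extracting it."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


archive_name = "usa_denton_smith_apav_si.zip"
if Path(archive_name).is_file():
    for member in list_archive_members(archive_name):
        print(member)
else:
    print(f"{archive_name} not found; download it first.")
```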

These files serve exclusively as examples. <font color="orange">The dataconverter for APM always requires a triplet of files</font>:
* A **YAML file with metadata** (either edited manually or generated via an ELN).<br>
  The eln_data_apm.yaml file in the example can be edited with a text editor.<br>
* A file with **reconstructed ion positions** in a community or technology-partner format, containing<br>
  the ion positions and mass-to-charge-state ratio values for the tomographic reconstruction.<br>
  POS, ePOS, and APT are supported. Inspect some of the above-mentioned examples on Zenodo.<br>
* A file with **ranging definitions** in a community or technology-partner format, containing<br>
  the definitions of how mass-to-charge-state ratio values map onto ion species.<br>
  RNG and RRNG are supported. A MATLAB script can be used to inject other representations<br>
  by transcoding custom formats to a simple text file, an example of which is<br>
  R56_01769.rng.fig.txt<br>
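To check that a set of files forms such a triplet, the files can be grouped by their extension. This is only a sketch under the assumption that the extension identifies the role of a file; the helper names and the extension sets are illustrative and are not part of pynxtools.

```python
from pathlib import Path

# Assumed mapping of file extensions to roles, based on the list above.
ELN_SUFFIXES = {".yaml", ".yml"}
RECON_SUFFIXES = {".pos", ".epos", ".apt"}
RANGE_SUFFIXES = {".rng", ".rrng"}


def classify_apm_inputs(file_names):
    """Group file names into eln, recon, range, and other by extension."""
    groups = {"eln": [], "recon": [], "range": [], "other": []}
    for name in file_names:
        lower = name.lower()
        suffix = Path(lower).suffix
        if suffix in ELN_SUFFIXES:
            groups["eln"].append(name)
        elif suffix in RECON_SUFFIXES:
            groups["recon"].append(name)
        elif suffix in RANGE_SUFFIXES or lower.endswith(".rng.fig.txt"):
            groups["range"].append(name)
        else:
            groups["other"].append(name)
    return groups


def is_complete_triplet(groups):
    """True if exactly one file is present for each of the three roles."""
    return all(len(groups[key]) == 1 for key in ("eln", "recon", "range"))


example = classify_apm_inputs(["eln_data_apm.yaml", "Si.apt", "Si.RRNG"])
print(example, is_complete_triplet(example))
```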

<div class="alert alert-block alert-info">
For GUI-based editing, a NOMAD OASIS instance is needed.<br>
</div>

<div class="alert alert-block alert-danger">
Please note that the metadata inside the provided eln_data_apm.yaml file contains example values.<br>
These do not necessarily reflect the conditions under which the raw data for the example were collected!<br>
The file is meant to be edited by you if you work with datasets other than those provided here!<br>
</div>

### **Step 3:** Run the APM-specific dataconverter on the example data.

Now we run the parser. The `--reader` flag selects the atom probe microscopy reader (`apm`), and the `--nxdl` flag selects the application definition for this technique (`NXapm`).<br>

### **Step 3a:** Optionally see the command line help of the dataconverter.

In [None]:
! dataconverter --help

### **Step 3b:** Optionally explore all paths which NXapm provides.

In [None]:
# to inspect what can/should all be in the NeXus file
! dataconverter --nxdl NXapm --generate-template

### **Step 3c:** Convert the files in the example into an NXapm-compliant NeXus/HDF5 file.

In [None]:
#parser-nexus/tests/data/tools/dataconverter/readers/em_om/
eln_data_file_name = ["eln_data_apm.yaml"]
input_recon_file_name = ["Si.apt",
                         "Si.epos",
                         "Si.pos",
                         "R31_06365-v02.pos",
                         "R18_58152-v02.epos",
                         "70_50_50.apt"]
#                         "R56_01769-v01.pos"]
input_range_file_name = ["Si.RRNG",
                         "Si.RNG",
                         "Si.RNG",
                         "R31_06365-v02.rrng",
                         "R31_06365-v02.rrng",
                         "R31_06365-v02.rrng"]
#                         "R56_01769.rng.fig.txt"]
output_file_name = ["apm.case1.nxs",
                    "apm.case2.nxs",
                    "apm.case3.nxs",
                    "apm.case4.nxs",
                    "apm.case5.nxs",
                    "apm.case6.nxs"]
for case_id in [0]:
    ELN = eln_data_file_name[0]
    INPUT_RECON = input_recon_file_name[case_id]
    INPUT_RANGE = input_range_file_name[case_id]
    OUTPUT = output_file_name[case_id]

    ! dataconverter --reader apm --nxdl NXapm --input-file $ELN --input-file \
    $INPUT_RECON --input-file $INPUT_RANGE --output $OUTPUT
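The loop above converts only the first case. To prepare several cases without relying on notebook shell magics, the command line can be assembled as an argument list. This is a sketch that uses only the flags shown above; `build_dataconverter_command` is a hypothetical helper, and the actual call is left commented out so that nothing runs unintentionally.

```python
import shlex


def build_dataconverter_command(eln, recon, ranging, output):
    """Assemble the dataconverter call for one triplet of input files."""
    return ["dataconverter", "--reader", "apm", "--nxdl", "NXapm",
            "--input-file", eln,
            "--input-file", recon,
            "--input-file", ranging,
            "--output", output]


# Two of the cases listed above, as (reconstruction, ranging, output).
cases = [("Si.apt", "Si.RRNG", "apm.case1.nxs"),
         ("Si.epos", "Si.RNG", "apm.case2.nxs")]
for recon, ranging, output in cases:
    command = build_dataconverter_command("eln_data_apm.yaml", recon, ranging, output)
    print(shlex.join(command))
    # To execute: import subprocess; subprocess.run(command, check=True)
```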

The key take-home message is that the command specified above triggers the automatic creation of the output file. This *.nxs file is an HDF5 file.
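That a *.nxs file is an HDF5 file can be verified by looking at its first bytes, because HDF5 files start with a fixed 8-byte signature. The sketch below checks only offset 0 and therefore assumes a file without a user block; `looks_like_hdf5` is a hypothetical helper, and `h5py.is_hdf5` offers a more thorough check.

```python
from pathlib import Path

# The 8-byte signature that opens an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """True if the file starts with the HDF5 format signature."""
    with open(path, "rb") as handle:
        return handle.read(8) == HDF5_SIGNATURE


nxs_name = "apm.case1.nxs"
if Path(nxs_name).is_file():
    print(f"{nxs_name} starts with the HDF5 signature: {looks_like_hdf5(nxs_name)}")
else:
    print(f"{nxs_name} not found; run the conversion first.")
```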

### **Step 4:** Inspect the NeXus/HDF5 file using H5Web.

In [None]:
# H5Web(OUTPUT)
H5Web("apm.case1.nxs")

You can also visualize the .nxs file by double-clicking on it in the file explorer panel on the left side of your JupyterLab screen in the browser.

### **Step 5:** Optionally, do some post-processing with the generated NeXus file (e.g. apm.case1.nxs).

As an example, compute a mass-to-charge-state ratio histogram and explore the ranging definitions that were carried over in the conversion step (Step 3c).

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Rectangle
plt.rcParams["figure.figsize"] = [20, 10]
plt.rcParams["figure.dpi"] = 300
import h5py as h5
# the zip archive was already extracted above with the zipfile module of the Python standard library

Read mass-to-charge-state ratio values, create a histogram ("mass spectrum"), and mark ranges.

In [None]:
# load data and ranges e.g. for case1
nxs_file_name = "apm.case1.nxs"
hf = h5.File(nxs_file_name, "r")
mq = hf["entry1/atom_probe/mass_to_charge_conversion/mass_to_charge"][:]
nions = np.uint32(hf["entry1/atom_probe/ranging/number_of_ion_types"][()])
print(f"Array with mass-to-charge-state ratios loaded, {nions} ion types are distinguished")

In [None]:
# define binning
[mqmin, mqmax] = [0., 100.0]  # Da, alternatively use np.max(mq) as the upper bound
print(f"Histogram covers the range [{mqmin}, {mqmax}] Da.")
mqincr = 0.01  # Da
print(f"Using a mass-to-charge-state ratio resolution of {mqincr} Da.")

In [None]:
# transform collection of mass-to-charge-state ratios into a histogram
hst1d = np.unique(np.uint64(np.floor((mq[np.logical_and(mq >= mqmin, mq <= mqmax)] - mqmin) / mqincr)), return_counts=True)
nbins = np.uint64((mqmax - mqmin) / mqincr + 1)
print(f"Histogram has {nbins} bins.")
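The binning rule used above assigns each value to the bin with index floor((value - minimum) / increment) and counts the values per bin. The same rule can be sketched on a handful of made-up values in pure Python, which makes the result easy to check by hand. The toy values and the helper `bin_counts` are illustrative assumptions and not data from the example.

```python
import math
from collections import Counter


def bin_counts(values, vmin, vmax, increment):
    """Count values per bin index floor((value - vmin) / increment)."""
    counts = Counter()
    for value in values:
        # Values outside [vmin, vmax] are ignored, as in the cell above.
        if vmin <= value <= vmax:
            counts[math.floor((value - vmin) / increment)] += 1
    return dict(counts)


toy_mq = [1.2, 1.3, 27.9, 28.1, 28.4, 250.0]  # made-up values in Da
toy_counts = bin_counts(toy_mq, 0.0, 100.0, 0.5)
toy_nbins = int((100.0 - 0.0) / 0.5 + 1)
print(f"Occupied bins: {toy_counts}, total number of bins: {toy_nbins}")
```

Only occupied bins appear in the result, which is why the notebook afterwards copies the counts into a full-length array before plotting.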

In [None]:
# use matplotlib and numpy to plot histogram data 
xy = np.zeros([nbins, 2], np.float64)
xy[:,0] = np.linspace(mqmin + mqincr, mqmax + mqincr, nbins, endpoint=True)
xy[:,1] = 0.5  # empty bins get 0.5 counts so that they can be drawn on a logarithmic axis (half an atom cannot be measured)
for i in np.arange(0, len(hst1d[0])):
    binidx = hst1d[0][i]
    xy[binidx, 1] = hst1d[1][i]
print("Mass-to-charge-state histogram created.")

In [None]:
# axis limits: x spans the binned range, y runs from 0.5 up to the next power of ten above the highest count
[xmi, xmx, ymi, ymx] = [mqmin, mqmax, 0.5, 10**np.ceil(np.log10(np.max(xy[:,1])))]
fig, cnts_over_mq = plt.subplots(1, 1)
plt.plot(xy[:, 0], xy[:, 1], color="blue", alpha=0.5, linewidth=1.0)
for i in np.arange(1, nions):
    print(f"Collect ion{i}...")
    # load ranges and plot them
    ranges = hf[f"entry1/atom_probe/ranging/peak_identification/ion{i}/mass_to_charge_range"][:]
    for min_max in ranges:
        cnts_over_mq.vlines(min_max[0], 0, 1, transform=cnts_over_mq.get_xaxis_transform(), alpha=0.1, color="grey", linestyles="dotted")
        cnts_over_mq.vlines(min_max[1], 0, 1, transform=cnts_over_mq.get_xaxis_transform(), alpha=0.1, color="grey", linestyles="dotted")
        # rng = Rectangle((min_max[0], ymi), min_max[1] - min_max[0], ymx - ymi, edgecolor="r", facecolor="none")
# plt.xticks([1, 2, 3, 4, 5, 6, 7, 8, 9], ["Min", "0.0025", "0.025", "0.25", "0.50", "0.75", "0.975", "0.9975", "Max"])
plt.yscale("log")
plt.legend([r"Mass-to-charge-state ratio $\Delta\frac{m}{q} = $"+str(mqincr)+" Da"], loc="upper right")
plt.xlabel(r"Mass-to-charge-state-ratio (Da)")
plt.ylabel(r"Counts")
print("Mass-to-charge-state histogram visualized.")
# add a small margin to the axis limits so that data lines do not fall onto the axes
margin=0.01  # polishing the margins
plt.xlim([-margin * (xmx - xmi) + xmi, +margin * (xmx - xmi) + xmx])
plt.ylim([ymi, +margin * (ymx - ymi) + ymx])
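The arithmetic behind these axis limits can be sketched separately from the plot: the upper y limit is the next power of ten above the highest count, and the x limits are widened by a fraction of the axis span. The helper names are hypothetical. Note that the cell above pads only the upper end of the y axis, whereas `padded_limits` pads both ends as is done for the x axis.

```python
import math


def next_decade(value):
    """Smallest power of ten that is greater than or equal to value."""
    return 10 ** math.ceil(math.log10(value))


def padded_limits(lower, upper, margin):
    """Widen the interval [lower, upper] by margin times its span on both sides."""
    pad = margin * (upper - lower)
    return lower - pad, upper + pad


print(next_decade(3500))
print(padded_limits(0.0, 100.0, 0.01))
```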

In [None]:
# save the figure
figfn = nxs_file_name + ".MassToChargeStateRatio.png"
fig.savefig(figfn, dpi=300, facecolor="w", edgecolor="w", orientation="landscape", format="png", 
            transparent=False, bbox_inches="tight", pad_inches=0.1, metadata=None)
# plt.close("all")
print(f"{figfn} stored to disk.")

### **Optional:** Generate synthetic data for testing and development purposes

<div class="alert alert-block alert-danger">
Currently, this functionality requires a Python environment with a newer version of the ase library than the one<br>
installed alongside pynxtools (currently ase==3.19.0). Instead, ase>=3.22.1 should be used.<br>
The reason is that the *create_reconstructed_positions* function calls ase code which, in ase==3.19.0,<br>
relies on the deprecated np.float data type and therefore fails in combination with numpy>=1.2x.<br>
Developers interested in creating synthetic data should locally install ase>=3.22.1<br>
and then re-execute this notebook.<br>
</div>

In [None]:
! pip list | grep -i "^ase \|^numpy "
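Whether the installed ase version meets the stated minimum can also be tested programmatically. The following sketch compares only the leading numeric components of plain version strings and ignores pre-release suffixes; `version_tuple` and `meets_minimum` are hypothetical helpers.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Leading numeric components of a version string, e.g. '3.22.1' -> (3, 22, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    ase_version = metadata.version("ase")
    print(f"ase {ase_version} is sufficient: {meets_minimum(ase_version, '3.22.1')}")
except metadata.PackageNotFoundError:
    print("ase is not installed in this environment.")
```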

The apm reader can generate synthetic datasets, which are meant for code development and testing.

<div class="alert alert-block alert-danger">
This functionality uses recent features of ase, which demand an environment that is currently not supported<br>
by NOMAD OASIS. The settings for this example are configured to represent an environment that closely<br>
matches NOMAD. Users who are interested in this developer functionality should therefore do the following:<br>
Run this example in a standalone environment where ase is upgraded to the latest version and then use<br>
the generated NeXus files either as is or upload them to NOMAD OASIS.<br>
</div>

In [None]:
# ! dataconverter --reader apm --nxdl NXapm --input-file synthesize1 --output apm.case0.nxs

In [None]:
H5Web("apm.case0.nxs")

### Further comments:

* Feel free to explore our atom probe microscopy containers in the north branch for more advanced processing

### Contact person for the apm reader and related examples in FAIRmat:
Markus Kühbach, 2023/05<br>

### Funding
<a href="https://www.fairmat-nfdi.eu/fairmat">FAIRmat</a> is a consortium on research data management which is part of the German NFDI.<br>
The project is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project 460197019.