# How to convert electron microscopy (meta)data to NeXus/HDF5

This tutorial shows how to create a NeXus/HDF5 file by parsing and normalizing pieces of information<br>
from typical file formats of the electron microscopy community into a common form. The tool ensures that this NeXus file<br>
matches the NXem application definition. Documented in this way, the file can be used for sharing electron<br>
microscopy research with others (colleagues, project partners, the public) and for uploading a summary of the (meta)data to<br>
public repositories. Using a research data management system like NOMAD Oasis for this avoids the additional work<br>
that typically comes with documenting metadata in such repositories by hand.<br>

The benefit of the data normalization that pynxtools-em performs is that all pieces of information are represented in the<br>
same conceptual way. As a result, most of the format conversions that were so far required when interfacing with software<br>
from technology partners or the scientific community are no longer necessary.<br>

### **Step 1:** Check that packages are installed and working in your local Python environment.

Check the result of the query below, specifically that `jupyterlab_h5web` and `pynxtools` are installed in your environment.<br>
Note that next to the name pynxtools you should see the directory in which it is installed. Otherwise, make sure that you follow<br>
the instructions in the `README` files:  
- How to set up a development environment as in the main README  
- Launch the jupyter lab from this environment as in the README of folder `examples`

In [None]:
! pip list | grep "h5py\|nexus\|jupyter\|jupyterlab_h5web\|pynxtools\|pynxtools-em"
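Where `grep` is not available (e.g. on Windows), the same check can be sketched in pure Python with the standard library module `importlib.metadata`. The helper name `installed_versions` and the list `REQUIRED` are illustrative choices made for this sketch and are not part of pynxtools.

```python
from importlib import metadata

# Distribution names taken from the pip query above.
REQUIRED = ["h5py", "jupyterlab_h5web", "pynxtools", "pynxtools-em"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Unlike `pip list`, this sketch does not show the installation directory of an editable install, so the `pip` query remains the more informative check for a development environment.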

Set the pynxtools directory and import H5Web for interactive exploration of HDF5 files.

In [None]:
import os
import zipfile as zp
import numpy as np
from jupyterlab_h5web import H5Web
print(f"Current working directory: {os.getcwd()}")
print(f"So-called base, home, or root directory of the pynxtools: {os.getcwd().replace('/examples/em', '')}")

### **Step 2:** Use your own data or download an example

<div class="alert alert-block alert-danger">
Please note that the metadata inside the provided em.oasis.specific.yaml and eln_data.yaml files<br>
contain exemplar values. These do not necessarily reflect the conditions under which the raw data of the<br>
above-mentioned example were collected by the scientists. Instead, these files are meant to be edited by you,<br>
preferably programmatically, e.g. using output from an electronic lab notebook, or otherwise manually.</div>

This example shows the types of files from which the parser collects and normalizes pieces of information:<br>
* **eln_data.yaml** metadata collected with an electronic lab notebook (ELN) such as a NOMAD Oasis custom schema<br>
* **em.oasis.specific.yaml** frequently used metadata that are often the same for many datasets, stored here to avoid having to<br>
  type them every time in ELN templates. This file can be considered a configuration file whereby e.g. coordinate system<br>
  conventions can be injected or details about the instrument communicated if it is part of frequently used<br>
  lab equipment. The benefit of such an approach is that eventually all metadata relevant to an instrument can be read from<br>
  this configuration file, guiding the user e.g. through the ELN with an option to select the instrument.<br>
* **collected data** in a community or technology partner format with images, spectra, and other metadata.<br>

The tool supports several of the currently frequently used file formats of the electron microscopy community. Given that there is<br>
a large number of these formats, in different versions, users should be aware that we had to prioritize the implementation<br>
strongly. With the resources in the FAIRmat project, we cannot implement every request to add support for further formats or for<br>
additional pieces of information in those formats we currently do support. Nevertheless, please raise an issue to document<br>
where we should place our priorities.<br>
Consult the reference part of the documentation to get a detailed view on how specific formats are supported.<br>

### **Step 3:** Run the parser

In [None]:
eln_data = ["eln_data.yaml"]
deployment_specific = ["em.oasis.specific.yaml"]
tech_partner = ["CHANGEME YOUR TECH PARTNER FILE (e.g. EMD, Nion, etc.)"]
output_file_name = ["em.nxs"]
for case_id in np.arange(0, 1):
    ELN = eln_data[case_id]
    CFG = deployment_specific[case_id]
    DATA = tech_partner[case_id]
    OUTPUT = output_file_name[case_id]

    # CHANGEME activate the following line
    # ! dataconverter convert $ELN $CFG $DATA --reader em --nxdl NXem --output $OUTPUT
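Outside a notebook, where the `!` shell escape is not available, the same call could be assembled as an argument list and checked before it is run. The following is a minimal sketch under stated assumptions: the helper names `build_convert_command` and `missing_inputs` are hypothetical, and `CHANGEME.emd` is only a placeholder for your own technology partner file. The command-line arguments are the ones from the commented line above.

```python
import shlex
from pathlib import Path


def build_convert_command(eln, cfg, data, output):
    """Assemble the dataconverter call from Step 3 as an argument list."""
    return [
        "dataconverter", "convert", eln, cfg, data,
        "--reader", "em", "--nxdl", "NXem", "--output", output,
    ]


def missing_inputs(*paths):
    """Return those input paths that do not exist as files."""
    return [p for p in paths if not Path(p).is_file()]


# CHANGEME.emd is a placeholder, replace it with your own file.
inputs = ("eln_data.yaml", "em.oasis.specific.yaml", "CHANGEME.emd")
command = build_convert_command(*inputs, "em.nxs")
missing = missing_inputs(*inputs)
if missing:
    print(f"Not running the parser, missing input files: {missing}")
else:
    print(f"Ready to run: {shlex.join(command)}")
    # To actually convert, add "import subprocess" and uncomment:
    # subprocess.run(command, check=True)
```

Checking the inputs first gives a clearer message than a failed conversion when one of the file names still contains a CHANGEME placeholder.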

### **Step 4:** Inspect the NeXus/HDF5 file using H5Web.

In [None]:
# CHANGEME activate the following line to view the data
# H5Web(OUTPUT)

The NeXus file can also be viewed with H5Web by opening it via the file explorer panel on the left side of this JupyterLab window.
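For a quick, non-interactive overview, the content of the file can also be listed programmatically. The sketch below walks any nested mapping depth first. It assumes that opened `h5py` files and groups offer the same `items()` interface while datasets do not, so the same function should work on the real file. The function name `list_paths` and the demo tree are made up for illustration and do not reflect the actual layout of an NXem file.

```python
def list_paths(node, prefix=""):
    """Yield the path of every entry below node, depth first.

    node can be any nested mapping, e.g. a dict or an opened h5py file or group.
    """
    for name, child in node.items():
        path = f"{prefix}/{name}"
        yield path
        if hasattr(child, "items"):  # groups have items(), datasets do not
            yield from list_paths(child, path)


# Stand-in for an HDF5 tree; these names are purely illustrative.
demo = {"group_a": {"dataset_1": 1.0, "group_b": {"dataset_2": 2.0}}}
for path in list_paths(demo):
    print(path)

# With the real file (requires h5py and the em.nxs created in Step 3):
# import h5py
# with h5py.File("em.nxs", "r") as h5r:
#     for path in list_paths(h5r):
#         print(path)
```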

# Conclusions:
***

This tutorial showed how you can call the pynxtools-em parser via a jupyter notebook. This opens many possibilities,<br>
like processing the results further with Python, e.g. in a <a href="https://conda.io/projects/conda/en/latest/user-guide/install/index.html">conda</a> environment or <a href="https://docs.python.org/3/tutorial/venv.html">a virtual environment</a> on your local computer,<br>
or interfacing with community software to do further processing of your data.<br>

### Contact person for pynxtools-em and related examples in FAIRmat:
Dr.-Ing. Markus Kühbach, 2024/05/10<br>

### Funding
<a href="https://www.fairmat-nfdi.eu/fairmat">FAIRmat</a> is a consortium on research data management which is part of the German NFDI.<br>
The project is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project 460197019.