<div>
<img src="../img/EnzymeML.png" height="50"/>
</div>

<div>
<img src="../img/RDM_course_2024_alpha_crop.png" style="float: right" height="150"/>
</div>

# **FAIR processing of kinetic data from NMR**
## EnzymeML workshop 2024-09-24

> created:  2024-06-22  
> modified: 2024-09-24  

### *Notebook setup* <a class="anchor" name="setup"></a>

In [None]:
"""ONLY RUN THIS CELL ONCE!"""

import sys
import subprocess
import shutil
import importlib

def install_and_import(package):
    try:
        importlib.import_module(package)
        print(f"{package} is already installed")
    except ImportError:
        print(f"Installing {package}")
        %pip install {package}

# install git via conda if not installed
if not shutil.which("git"):
    print("Installing git via conda")
    subprocess.run(["conda", "install", "-y", "git"])
    if not shutil.which("git"):
        print("git is not installed")
        sys.exit(1)
print("git is installed")

# install NMRpy and other packages
try:
    import nmrpy
    print("NMRpy is already installed")
except ImportError:
    print("Installing NMRpy...")
    %pip install git+https://github.com/NMRPy/nmrpy@adapt-to-pydantic-v2 --quiet

install_and_import("matplotlib")
install_and_import("ipyfilechooser")
install_and_import("rich")
install_and_import("pyenzyme")

print("🏁 All set! Restart the Notebook once and you are ready to go.")
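
After restarting, it can be useful to confirm that every package from the setup cell is visible to the kernel. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of NMRpy or PyEnzyme.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from the setup cell above
required = ["nmrpy", "matplotlib", "ipyfilechooser", "rich", "pyenzyme"]
print("Missing packages:", missing_packages(required))
```

An empty list means the setup cell does not need to be run again.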

In [1]:
from datetime import datetime
from pathlib import Path

import matplotlib
%matplotlib widget
from matplotlib import pyplot as plt
plt.ioff()
from ipyfilechooser import FileChooser
import rich

import pyenzyme as pe
import pyenzyme.equations as peq
import nmrpy


def print_metadata(data_model) -> None:
    rich.print(f"Current metadata:\n{data_model.model_dump_json(indent=4)}")

---

### **I. Provide experiment details** <a class="anchor" name="i"></a>

An EnzymeML document containing the details of the NMR experiment has been created beforehand. We can inspect this document to review the experiment details:

In [None]:
metadata = pe.load_enzymeml(path="./data/pgm-eno.json")
print_metadata(metadata)

---

### **II. Process NMR data** <a class="anchor" name="ii"></a> 

Detailed documentation is available in the [NMRpy documentation](https://nmrpy.readthedocs.io/en/latest/).

#### Load NMR data <a class="anchor" name="load"></a>

For convenience, the unprocessed NMR data is loaded using the `ipyfilechooser` widget. Providing the file path directly to `nmrpy.data_objects.FidArray.from_path()` would also work.

In [None]:
dc = FileChooser(Path.cwd())
display(dc)

In [None]:
nmr = nmrpy.data_objects.FidArray.from_path(fid_path=dc.selected)

#### Parse EnzymeML document

The EnzymeML document containing the experiment details is selected in the same way.

In [None]:
mc = FileChooser(Path.cwd())
display(mc)

In [6]:
nmr.parse_enzymeml_document(path_to_enzymeml_document=mc.selected)

We can inspect the EnzymeML document within NMRpy:

In [None]:
nmr.enzymeml_document

#### Pre-process NMR spectra

Depending on how the NMR data was acquired, it may be necessary to pre-process it. Pre-processing can include apodization, zero-filling, Fourier transformation, phase correction, and baseline correction. In this case, we start with the completely raw free induction decay (FID) data, but NMRpy can handle data at any stage of processing.

Using `nmr.fid<n>.plot_ppm()` we can inspect the spectrum of any FID in the dataset at any time.

In [None]:
nmr.fid04.plot_ppm()

Apodization with `emhz_fids()` applies exponential line broadening, using a default value of 5 Hz.

In [None]:
nmr.emhz_fids()
nmr.fid04.plot_ppm()
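
To illustrate what exponential line broadening does to an FID, here is a minimal plain-Python sketch of the textbook window function `exp(-pi * lb * t)`. It is not NMRpy's implementation; the function name, the dwell time, and the toy signal are assumptions made for illustration.

```python
import math


def exponential_apodization(fid, dwell_time, lb=5.0):
    """Multiply each FID point by exp(-pi * lb * t), with t = index * dwell_time."""
    return [
        point * math.exp(-math.pi * lb * index * dwell_time)
        for index, point in enumerate(fid)
    ]


# Hypothetical constant-amplitude signal sampled every 1 ms
fid = [1.0] * 5
weighted = exponential_apodization(fid, dwell_time=0.001)
print(weighted)
```

The first point is left unchanged and later points are damped progressively, which suppresses noise in the tail of the FID at the cost of broader peaks.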

Zero-fill:

In [None]:
nmr.zf_fids()
nmr.fid04.plot_ppm()

Fourier-transform:

In [None]:
nmr.ft_fids()
nmr.fid04.plot_ppm()
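
The Fourier transform converts the time-domain FID into a frequency-domain spectrum. As a rough illustration, the following sketch applies a naive discrete Fourier transform to a synthetic, hypothetical FID containing a single decaying oscillation. Real processing would use a fast Fourier transform; this version only shows the principle.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform with a negative exponent sign."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


# Hypothetical FID: one decaying oscillation at 8 cycles per 64 points
n_points = 64
fid = [
    cmath.exp(2j * math.pi * 8 * k / n_points) * math.exp(-k / 20)
    for k in range(n_points)
]

spectrum = dft(fid)
# Index of the strongest spectral component
peak_index = max(range(n_points), key=lambda j: abs(spectrum[j]))
print(peak_index)
```

The strongest component appears at the index matching the oscillation frequency of the synthetic signal.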

We can also inspect the research data model of NMRpy at any time by accessing the `nmr.data_model` property.

In [None]:
print(nmr.data_model)

Phase-correction & removal of imaginary part:

In [None]:
nmr.phase_correct_fids()
nmr.real_fids()
nmr.fid04.plot_ppm()
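
Phase correction rotates each complex point of the spectrum so that the peaks appear in pure absorption mode in the real part, after which the imaginary part can be discarded. The sketch below applies a manually chosen zero-order (`p0`) and first-order (`p1`) phase. It is an illustration only: the function and parameter names are assumptions, and it does not reproduce how `phase_correct_fids()` determines the phase angles.

```python
import cmath


def apply_phase(spectrum, p0=0.0, p1=0.0):
    """Rotate each point by p0 + p1 * (index / n) radians."""
    n = len(spectrum)
    return [
        point * cmath.exp(1j * (p0 + p1 * index / n))
        for index, point in enumerate(spectrum)
    ]


# Hypothetical absorption-mode points, dephased by a constant 0.6 rad
true_spectrum = [0.1, 1.0, 0.1, 0.0]
dephased = [value * cmath.exp(-0.6j) for value in true_spectrum]

# Undo the phase error, then keep only the real part
rephased = apply_phase(dephased, p0=0.6)
real_part = [point.real for point in rephased]
print(real_part)
```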

Peaks may appear shifted along the ppm axis. The `nmr.calibrate()` method corrects this by referencing the spectrum to a peak with a known position. In this case, the peak of the internal standard (triethyl phosphate) is known to lie at 2.2 ppm, so we shift the spectrum accordingly.

In [None]:
nmr.calibrate() # Internal standard should be at 2.2 ppm
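
In essence, calibration shifts the whole ppm axis by the difference between the known and the observed position of the reference peak. A minimal sketch, assuming a hypothetical observed position of 2.35 ppm and a hypothetical helper `calibrate_axis`:

```python
def calibrate_axis(ppm_axis, observed_ppm, reference_ppm):
    """Shift every ppm value by the offset between reference and observed peak."""
    offset = reference_ppm - observed_ppm
    return [ppm + offset for ppm in ppm_axis]


# Hypothetical axis values; internal standard observed at 2.35 ppm, known at 2.2 ppm
axis = [6.0, 4.0, 2.35, 0.5]
calibrated = calibrate_axis(axis, observed_ppm=2.35, reference_ppm=2.2)
print(calibrated)
```

All points move by the same offset, so the distances between peaks are unchanged.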

We can see how the data model changes with each processing step to reflect the current state:

In [None]:
print(nmr.data_model)

Before the integrals can be calculated, the peaks have to be picked. The `nmr.peakpicker()` method lets us do this manually. Alternatively, we can assign a list of peaks and ranges to each FID directly.

In [None]:
nmr.peakpicker()

In [19]:
peaks = [5.52, 4.66, 3.8, 2.08, 1.14]
ranges = [[6, 0.5]]

for fid in nmr.get_fids():
    fid.peaks = peaks
    fid.ranges = ranges
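
When peaks and ranges are typed in by hand, a quick sanity check that every peak lies inside one of the ranges can catch typos early. The sketch below is an illustrative stand-alone check, not part of NMRpy; the helper name is hypothetical and the values are copied from the cell above.

```python
def group_peaks_by_range(peaks, ranges):
    """Return one list of peaks per range; bounds may be given in either order."""
    return [
        [peak for peak in peaks if min(bounds) <= peak <= max(bounds)]
        for bounds in ranges
    ]


peaks = [5.52, 4.66, 3.8, 2.08, 1.14]
ranges = [[6, 0.5]]

grouped = group_peaks_by_range(peaks, ranges)
# Peaks that fall outside every range
unassigned = [peak for peak in peaks if not any(peak in group for group in grouped)]
print(grouped, unassigned)
```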

#### Assign peak identities based on EnzymeML species

Using `nmr.assign_identities()` we can assign the peaks to the species in the EnzymeML document. Multiple peaks can be assigned to the same species.

In [None]:
nmr.assign_identities()

#### Plot deconvoluted spectrum for one or all FIDs

After normalizing and deconvoluting the spectra, we can plot the deconvoluted spectrum for one or all FIDs.

In [None]:
nmr.norm_fids() 
nmr.deconv_fids()
nmr.plot_deconv_array(upper_ppm=7, lower_ppm=0)

In [None]:
nmr.fid04.plot_deconv(upper_ppm=7, lower_ppm=0)
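
Deconvolution fits a line shape to each picked peak so that its area can be determined even when peaks overlap. As a rough illustration of the link between a fitted line and its area, the sketch below evaluates a single Lorentzian and integrates it numerically. The Lorentzian shape, the parameter values, and the function names are assumptions made for illustration; NMRpy may use a different or mixed line shape.

```python
import math


def lorentzian(x, center, half_width, area):
    """Lorentzian line whose integral over all x equals `area`."""
    return area * (half_width / math.pi) / ((x - center) ** 2 + half_width ** 2)


def trapezoid_area(xs, ys):
    """Numerical integral of ys over xs using the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


# Hypothetical peak at 2.08 ppm with half-width 0.05 ppm and area 2.0
step = 0.005
xs = [2.08 - 50 + step * i for i in range(20001)]
ys = [lorentzian(x, 2.08, 0.05, 2.0) for x in xs]
recovered_area = trapezoid_area(xs, ys)
print(round(recovered_area, 2))
```

The numerical integral comes out slightly below the nominal area because the Lorentzian tails extend beyond the integration window.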

#### Calculate the concentration of each species

Given the information on the internal standard and the necessary equation, we can calculate the concentration of each species. The results are available via `nmr.concentrations`.

In [None]:
nmr.calculate_concentrations()

In [None]:
nmr.concentrations
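
The equation actually used comes from the experiment details and is not reproduced here. As a hedged illustration of the general idea, the sketch below assumes the simplest case, in which a species' concentration is proportional to the ratio of its peak integral to that of the internal standard. The function name and all numbers are hypothetical.

```python
def concentration_from_standard(integral, standard_integral, standard_concentration):
    """Scale the standard's concentration by the ratio of peak integrals."""
    if standard_integral == 0:
        raise ValueError("standard_integral must be non-zero")
    return integral / standard_integral * standard_concentration


# Hypothetical values: species integral 0.8, standard integral 0.4, standard at 5.0 mM
species_concentration = concentration_from_standard(0.8, 0.4, 5.0)
print(species_concentration)
```

This simple ratio ignores differences in the number of nuclei contributing to each peak, which a real calculation would need to account for.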

#### Apply the concentrations to the EnzymeML document and save it

We can now add the newly calculated concentrations to the EnzymeML document we provided earlier and save it.

In [None]:
enzymeml_doc = nmr.apply_to_enzymeml()
print_metadata(enzymeml_doc)

In [26]:
with open("./data/pgm-eno_with_concentrations.json", "w") as f:
    f.write(enzymeml_doc.model_dump_json(indent=4))

🎉 **Hooray! We have successfully processed and analyzed NMR data, all within the EnzymeML framework!** 🎉

---

### **Disclosure** <a class="anchor" name="disclosure"></a>

**Contributions**

If you wish to contribute to or collaborate with EnzymeML, find us on our [EnzymeML GitHub](https://github.com/EnzymeML)!

**MIT License**

Copyright (c) 2024 EnzymeML

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.