# Affordable and easy data exploration of NIR spectra using chemometric techniques (Supporting Information)

## Journal of Chemical Education

https://pubs.acs.org/journal/jceda8

## Authors

David Mainka<sup>1</sup>, Julius Krause<sup>2</sup>, Linus Großmann<sup>2</sup>, Andreas Link<sup>1</sup> and Lukas Schulig<sup>1</sup>

<sup>1</sup> Department of Medicinal and Pharmaceutical Chemistry, Institute of Pharmacy, University of Greifswald, 17489 Greifswald

<sup>2</sup> Department of Biopharmaceutics and Pharmaceutical Technology, Institute of Pharmacy, University of Greifswald, 17489 Greifswald

### Corresponding Author

**Lukas Schulig**

E-mail: lukas.schulig@uni-greifswald.de

Phone: +49 (0)3834 420 4817

## Instruction for Students

This notebook serves as the instruction manual for the course. Students complete it during the course with their results and analyses.

----

## Introduction

Near-Infrared (NIR) spectroscopy is a powerful analytical technique used in the pharmaceutical industry to identify and characterize drug compounds. As a non-destructive and non-invasive method, NIR spectroscopy can quickly and accurately provide information about the chemical composition of a sample.

### Learning Objectives

 - Sample preparation and measurement of NIR spectra for pharmaceutical analysis
 - Understanding the capabilities and limitations of NIR spectroscopy
 - Physical influences on the measurement of spectra
 - Types and applications of preprocessing methods
 - Basic understanding of data exploration workflows
 
## Laboratory Experiments

In this lab experiment, you will measure the spectra of all samples provided. Sample preparation is a critical step and must be performed carefully.

### Hazards

 - No special hazard precautions other than standard laboratory requirements need to be taken.
 - Do not look directly into the lamp of the spectrometer.
 
### Materials

 - Substances are provided by the course instructor
 - Glass vials for NIR measurements
 - DLP® NIRScan™ Nano EVM with a control device or computer
 
### Experimental Procedure

All steps will be discussed with the instructor prior to the experiment. The instructor will then give you a brief introduction to the device.

 - Carefully transfer the respective sample into the glass vial using a spatula.
 - A minimum filling height of about 1 cm should be achieved.
 - Shake the glass vial carefully and gently before measurement.
 - Measure your sample with the spectrometer.
 
Repeat this process for all samples.

## Data Analysis

Copy the Python NIRScanNano library to your folder or use a local installation.

%%bash

git clone https://github.com/SLx64/nirscan-sc
mv ./nirscan-sc/NIRScanNano .

### Preparation

Load all required library functions first.

In [None]:
import matplotlib.pyplot as plt
plt.style.use("ggplot")

from NIRScanNano.course import DataReader, pca_to_pandas
from NIRScanNano.course import pca_centroids, nearest_centroids, eval_distances
from NIRScanNano.spectrum import average_spectra
from NIRScanNano.visualization import plot_spectrum, pca_pairplot
from NIRScanNano.analysis import snv, savgol, msc
from NIRScanNano.analysis import PCAnalysis

### Measurement data

The course instructor will provide a directory with all measured spectra. Read the spectra of all groups to obtain the whole dataset and select your own afterward (by group number). 

In [None]:
data = DataReader("data.csv")

In [None]:
spectra = data.spectra_by_group(1)

#### Optional Task:

Inspect the header information of a single spectrum and discuss the content within your group.

In [None]:
example_spectrum = spectra[0]

for key, value in example_spectrum.header.items():
    print(f"{key}: {value}")

### Preprocessing

Briefly describe the preprocessing methods used and demonstrate them for one example substance.

In [None]:
example_spectra = [spectrum for spectrum in spectra if spectrum.header["Name"] == "<NAME>"]

#### Standard Normal Variate (SNV)

 [...]
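As a library-independent illustration, the idea behind SNV can be sketched with plain NumPy: each spectrum is centered on its own mean and divided by its own standard deviation. The function name `snv_sketch` and the synthetic band are assumptions made for this example; the `snv` function imported from `NIRScanNano.analysis` may differ in its details.

```python
import numpy as np


def snv_sketch(spectra):
    """Center and scale every spectrum (row) by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


# Synthetic data (not measured): one band shape recorded with different
# offsets and scalings, mimicking path length or particle size effects.
x = np.linspace(0.0, 3.0, 50)
band = np.exp(-((x - 1.5) ** 2) / 0.1)
raw = np.vstack([1.0 * band, 1.5 * band + 0.2, 0.7 * band - 0.1])

corrected = snv_sketch(raw)
# After SNV the three rows coincide because offset and scaling are removed.
print(np.allclose(corrected[0], corrected[1]))
```

Because SNV works on each spectrum individually, it does not need a reference spectrum.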

#### Multiplicative Scatter Correction (MSC)

How was the reference spectrum selected? 

 [...]
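A minimal sketch of MSC, assuming the mean spectrum of the dataset is used as the reference when none is given (a common choice, but not necessarily what the `msc` function of the course library does). Each spectrum is regressed against the reference, and the fitted intercept and slope are then removed. The name `msc_sketch` and the synthetic data are hypothetical.

```python
import numpy as np


def msc_sketch(spectra, reference=None):
    """Regress each spectrum on a reference and remove the fitted offset and slope."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # ASSUMPTION: use the mean spectrum of the dataset as reference.
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Least-squares fit: spectrum ~ slope * reference + intercept
        slope, intercept = np.polyfit(reference, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


# Synthetic data (not measured): one band with different offsets and scalings.
x = np.linspace(0.0, 3.0, 50)
band = np.exp(-((x - 1.5) ** 2) / 0.1)
raw = np.vstack([1.0 * band, 1.5 * band + 0.2, 0.7 * band - 0.1])

corrected = msc_sketch(raw)
# All corrected rows collapse onto the reference (here: the mean spectrum).
print(np.allclose(corrected, raw.mean(axis=0)))
```

Unlike SNV, the result depends on the chosen reference, so the same reference should be reused for every spectrum that is compared later.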

#### Savitzky-Golay Filter (smoothing and derivatives)

 [...]
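The following sketch uses `scipy.signal.savgol_filter` directly to show smoothing and a first derivative on a synthetic noisy band. The wrapper name `savgol_sketch`, the window length of 11 points, and the polynomial order of 2 are assumptions for illustration; the `savgol` function from the course library may use other defaults.

```python
import numpy as np
from scipy.signal import savgol_filter


def savgol_sketch(spectra, window_length=11, polyorder=2, deriv=0, delta=1.0):
    """Apply a Savitzky-Golay filter along the wavelength axis (last axis)."""
    spectra = np.asarray(spectra, dtype=float)
    return savgol_filter(
        spectra,
        window_length=window_length,  # must be odd and larger than polyorder
        polyorder=polyorder,
        deriv=deriv,                  # 0 = smoothing, 1 = first derivative, ...
        delta=delta,                  # spacing of the x axis, used for derivatives
        axis=-1,
    )


# Synthetic data (not measured): a single band with added random noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 200)
clean = np.exp(-((x - 1.5) ** 2) / 0.1)
noisy = clean + rng.normal(scale=0.02, size=x.size)

smoothed = savgol_sketch(noisy)
first_derivative = savgol_sketch(noisy, deriv=1, delta=x[1] - x[0])
```

A wider window smooths more strongly but can flatten narrow bands, so the window length is worth varying when you discuss the method.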

#### Visualization

(use subplots to create a single image for all methods)
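One possible layout, sketched with Matplotlib only: a helper that draws one panel per preprocessing method into a single figure. The helper name `plot_method_grid`, the axis labels, and the synthetic data are assumptions; `plot_spectrum` from the course library is not used here because its signature is not shown in this notebook.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_method_grid(x_values, variants, ncols=2):
    """Plot each entry of `variants` (name -> 2D array, one spectrum per row) in its own panel."""
    n = len(variants)
    nrows = (n + ncols - 1) // ncols
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 3.5 * nrows), squeeze=False)
    panels = axes.ravel()
    for ax, (title, spectra) in zip(panels, variants.items()):
        ax.plot(x_values, np.asarray(spectra, dtype=float).T)
        ax.set_title(title)
        ax.set_xlabel("Wavelength / nm")
        ax.set_ylabel("Signal (a.u.)")
    for ax in panels[n:]:
        ax.set_visible(False)  # hide panels that are not needed
    fig.tight_layout()
    return fig


# Synthetic data (not measured) on a hypothetical wavelength axis.
wavelengths = np.linspace(900.0, 1700.0, 100)
band = np.exp(-((wavelengths - 1300.0) ** 2) / 5000.0)
raw = np.vstack([band, 1.5 * band + 0.2, 0.7 * band - 0.1])
centered = raw - raw.mean(axis=1, keepdims=True)

fig = plot_method_grid(wavelengths, {"Raw": raw, "Mean-centered": centered, "Scaled": raw / raw.max()})
```

In the notebook, replace the dictionary entries with your own raw and preprocessed example spectra.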

### Task: Caffeine Spectra

Compare the spectra of caffeine and its citrate salt with and without preprocessing. Briefly discuss the results.

### Task: Principal Component Analysis (PCA)

Outline the principal component analysis and its application in NIR spectroscopy.

 [...] 
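To support the outline, here is a minimal sketch of PCA via singular value decomposition in NumPy. It is an assumption-based illustration on synthetic data and not the implementation behind `PCAnalysis`; the function name `pca_sketch` is hypothetical.

```python
import numpy as np


def pca_sketch(spectra, n_components=2):
    """Return scores, loadings and explained variance ratios of mean-centered spectra."""
    data = np.asarray(spectra, dtype=float)
    centered = data - data.mean(axis=0)          # center every wavelength channel
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = vt[:n_components]                       # directions in wavelength space
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Synthetic data (not measured): two "substances" with bands at different positions.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 50)
band_a = np.exp(-((x - 1.0) ** 2) / 0.1)
band_b = np.exp(-((x - 2.0) ** 2) / 0.1)
spectra_demo = np.vstack([
    band_a + rng.normal(scale=0.01, size=(5, x.size)),
    band_b + rng.normal(scale=0.01, size=(5, x.size)),
])

scores, loadings, explained = pca_sketch(spectra_demo)
print(scores[:, 0])  # the first component separates the two groups by sign
```

The scores are the coordinates shown in a pairplot, while the loadings indicate which wavelength regions drive each component.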
 
Perform the PCA with all of your measured spectra and your selected preprocessing method.

In [None]:
pca = PCAnalysis([...])
pca.run()

# save as Pandas DataFrame for easier data handling
pca_df = pca_to_pandas(pca, label="Name")

#### Visualization (pairplot)

### Task: Unknown Sample

Check whether a randomly selected sample corresponds to one of the substances you measured.

 - Perform the same preprocessing steps as for your PCA.
 - Calculate centroids and thresholds.
 - Calculate the distances for your randomly selected sample.
 - Check the nearest centroid and evaluate the threshold.

In [None]:
sample = data.random_sample()
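The steps listed above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the logic of `pca_centroids`, `nearest_centroids`, or `eval_distances`: distances are Euclidean in score space, and the threshold of each class is assumed to be a multiple (`factor`) of the mean distance of its members to their centroid. All names and numbers are hypothetical.

```python
import numpy as np


def centroids_and_thresholds(scores, labels, factor=3.0):
    """Compute one centroid and one distance threshold per class label."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    centroids, thresholds = {}, {}
    for name in np.unique(labels):
        members = scores[labels == name]
        center = members.mean(axis=0)
        member_distances = np.linalg.norm(members - center, axis=1)
        centroids[str(name)] = center
        # ASSUMPTION: threshold = factor * mean member distance to the centroid.
        thresholds[str(name)] = factor * member_distances.mean()
    return centroids, thresholds


def check_sample(sample_scores, centroids, thresholds):
    """Return the nearest class, its distance, and whether the threshold is met."""
    sample_scores = np.asarray(sample_scores, dtype=float)
    distances = {name: float(np.linalg.norm(sample_scores - center))
                 for name, center in centroids.items()}
    nearest = min(distances, key=distances.get)
    return nearest, distances[nearest], distances[nearest] <= thresholds[nearest]


# Made-up PCA scores for two substances (not measured data).
demo_scores = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11]]
demo_labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
demo_centroids, demo_thresholds = centroids_and_thresholds(demo_scores, demo_labels)

print(check_sample([0.6, 0.4], demo_centroids, demo_thresholds))  # close to class A
print(check_sample([5.0, 5.0], demo_centroids, demo_thresholds))  # nearest is A, but far away
```

A sample can have a nearest centroid and still fail the threshold, which suggests that it does not belong to any of the measured substances.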

## Summary and Conclusion

 [...]