# FTIR Data Analysis Main Workflow
    Author: Trenton Wells

    Created: Sept 2025

    Organization: National Lab of the Rockies

    Personal Contact: trentonwells73@gmail.com

This notebook guides you through the main steps of the FTIR data analysis workflow.

On subsequent runs of the notebook, the only setup needed is the imports and loading the DataFrame, after which you can pick up where you left off. Remember to save your work with the 'Save Progress' cell at the bottom of the Notebook before closing or restarting.

NOTE: having multiple interactive cells open simultaneously can take up a lot of working memory, so click 'close' when done with one to minimize memory load.

## 1) Setup

### Import Functions

Import necessary functions for data analysis and visualization.

In [1]:
from Analysis_FTIR import (
    rename_files,
    extract_file_info,
    display_DataFrame,
    trim_DataFrame,
    populate_material_dictionary,
    plot_spectra,
    baseline_correct_spectra,
    bring_in_DataFrame,
    normalize_spectra,
    find_peak_info,
    deconvolute_peaks,
    fit_material,
    check_fit_quality,
    export_material_output_csv,
)

### Load or Create DataFrame

The DataFrame stores all of the relevant information on the spectra in an organized format. Processes within this Notebook often refer to the DataFrame within the working memory, and it is saved at the end of the Notebook. If running this for the first time, this cell creates a blank DataFrame that will be filled in by the next cell.

In [2]:
# Set path to your DataFrame CSV file. Leave as None if DataFrame is new or in default 
# location.
DataFrame_path = None

FTIR_DataFrame, DataFrame_path = bring_in_DataFrame(DataFrame_path=DataFrame_path)

### Rename Files (optional)

You can optionally rename files in your dataset.

This script scans a specified root directory and its subdirectories to find and rename files. It works by replacing spaces and/or specified words in the filenames (e.g., replacing spaces with underscores). Folder names are not changed, except when the optional ISO date renaming is enabled, which converts dates to ISO format (e.g., 2025-09-18). Use this tool if your file names have inconsistent naming conventions that may cause issues in downstream processing.

NOTE: Spectra from samples which have not undergone degradation should be labelled "unexposed". All other conditions can be named according to user preference.

In [None]:
# Set directory to rename folders and files within (e.g., r"C:\Users\user1\folder1").
directory = r"C:\Users\user1\folder1"

# If you want to replace spaces in filenames, set replace_spaces to True and set 
# character_to_use to the desired separator (e.g., "_").
replace_spaces = False
character_to_use = "_"

# If you want to convert all dates in the directory names to ISO format (YYYY-MM-DD), 
# set iso_date_rename to True
iso_date_rename = False

# If you want to replace other specified words in filenames, set file_rename to True and
# provide pairs_input (e.g., "old1:new1,old2:new2").
file_rename = False
pairs_input = "old1:new1,old2:new2"

# Set dry_run to True to preview changes without renaming files.
dry_run = False

# Rename files in the specified directory.
rename_files(
    directory=directory,
    replace_spaces=replace_spaces,
    character_to_use=character_to_use,
    iso_date_rename=iso_date_rename,
    file_rename=file_rename,
    pairs_input=pairs_input,
    dry_run=dry_run
)
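
For intuition, the renaming rules described above can be sketched in a few lines. This is an illustrative sketch only, not the code inside `rename_files`: the helpers `parse_pairs` and `plan_new_name` are hypothetical names, and the order in which the two replacements are applied is an assumption. Nothing on disk is touched, so it behaves like a preview.

```python
def parse_pairs(pairs_input):
    """Turn "old1:new1,old2:new2" into [("old1", "new1"), ("old2", "new2")]."""
    pairs = []
    for item in pairs_input.split(","):
        if ":" not in item:
            continue  # skip malformed entries rather than guess
        old, new = item.split(":", 1)
        old, new = old.strip(), new.strip()
        if old:
            pairs.append((old, new))
    return pairs


def plan_new_name(filename, replace_spaces=False, character_to_use="_",
                  file_rename=False, pairs_input=""):
    """Return the name a file would receive; no file is renamed."""
    new_name = filename
    if file_rename:
        # Assumed order: word replacements first, then space replacement.
        for old, new in parse_pairs(pairs_input):
            new_name = new_name.replace(old, new)
    if replace_spaces:
        new_name = new_name.replace(" ", character_to_use)
    return new_name


preview = plan_new_name(
    "PPE un-exposed rep 1.dpt",
    replace_spaces=True,
    file_rename=True,
    pairs_input="un-exposed:unexposed",
)
print(preview)
```

Previewing names this way before a real run serves the same purpose as `dry_run = True` in the cell above.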

### Fill or Append Spectra to DataFrame

Gathers file information and builds the main data structure for analysis. Repeated uses can append new data into the existing DataFrame.

The DataFrame will have a row for each spectrum file, with columns as follows:

File Location, File Name, Date, Conditions, Material, Time, X-Axis, Raw Data, Baseline Function, Baseline Parameters, Baseline, Baseline-Corrected Data, Normalization Peak Wavenumber, Normalized and Corrected Data

This function appends any files that aren't already included. If FTIR_DataFrame is empty, it is built from scratch.

NOTE: ensure that spectra for samples which have not undergone degradation are labelled as "unexposed" for their condition term.

In [None]:
# Set directory containing files to analyze (e.g., r"C:\Users\user1\folder1").
directory = r"Y:\5200\Packaging Reliability\Durability Tool\Ray Tracing and Activation Spectrum\ATR-FTIR Data"
# Set file types to include (e.g., [".dpt", ".txt", ".csv"]).
file_types = ".dpt"
# Set separators to use when finding terms within filenames (e.g., ["_", " "])
separators = "_"
# Set material terms to search for in filenames (e.g., ["Si", "Perovskite", "Glass"]) 
# (case-insensitive).
material_terms = "CPC, t-PVDF, t-PVF, o-PVF, PPE, J-BOX#1, J-BOX#2, PO, PMMA"
# Set exposure conditions terms to search for in filenames (e.g., ["A3", "A4", "B3", "B4", "unexposed"])
# (case-insensitive). Make sure to include 'unexposed' for control samples.
conditions_terms = "A3, A4, A5, 0.5X, 1X, 2.5X, 5X, ARC, OPN, KKCE, unexposed"
# append_missing controls the addition of files if some of the metadata information is missing. 
# True to add files even if some information is missing (may lead to issues downstream)
# False to skip files with missing information.
append_missing = False
# Set track_replicates to True to print the groups of replicate files
track_replicates = False
# Set access_subdirectories to False if you only want to search within folders in the
# specified directory that have dates as their names. This lets you avoid searching
# through unrelated folders that happen to be in the same directory.
access_subdirectories = False
# If any of these parameters are set to None or not specified, you will be prompted for input (may result
# in multiple prompts and/or minor formatting issues).

# Extract File Information and build or append to the main DataFrame.
FTIR_DataFrame = extract_file_info(
    FTIR_DataFrame=FTIR_DataFrame,
    directory=directory,
    file_types=file_types,
    separators=separators,
    material_terms=material_terms,
    conditions_terms=conditions_terms,
    append_missing=append_missing,
    access_subdirectories=access_subdirectories,
    track_replicates=track_replicates,
)
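
The sketch below shows one way a filename could be split on separators and matched against the material and condition terms. It is an assumption about the general approach, not the actual logic of `extract_file_info`: `split_terms` and `match_terms` are hypothetical helpers, separators are passed as a list here, and matching on whole tokens (rather than substrings) is a design choice made for the example.

```python
import re


def split_terms(terms):
    """Split a comma-separated string such as "A3, A4, unexposed" into a list."""
    return [t.strip() for t in terms.split(",") if t.strip()]


def match_terms(filename, separators, material_terms, conditions_terms):
    """Return (material, condition) found in a filename; None marks a miss."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension only
    pattern = "|".join(re.escape(s) for s in separators)
    tokens = [tok.lower() for tok in re.split(pattern, stem) if tok]

    def first_match(terms):
        # Case-insensitive comparison against whole filename tokens.
        for term in split_terms(terms):
            if term.lower() in tokens:
                return term
        return None

    return first_match(material_terms), first_match(conditions_terms)


result = match_terms(
    "PPE_0.5X_100h.dpt",
    separators=["_"],
    material_terms="CPC, PPE, PMMA",
    conditions_terms="5X, 0.5X, unexposed",
)
print(result)
```

Matching whole tokens keeps a term such as "5X" from being found inside "0.5X". A file that returns `None` for either field is the kind of entry that `append_missing` decides whether to keep.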

### Open the Peak Information Dictionary

Loads materials.json using the DataFrame as a reference for material names/aliases. This will be used for describing the peaks for each material.

In [None]:
materials_json_path = None # looks for materials.json in the active directory if None
populate_material_dictionary(FTIR_DataFrame, materials_json_path=materials_json_path)

### Display DataFrame (optional)

In [None]:
FTIR_DataFrame = display_DataFrame(FTIR_DataFrame, height=500)

### Trim DataFrame (optional)

Allows for the deletion of data from the DataFrame. Uses filtering options or exact index inputs for flexible use. 

If the index is set, then the trim will apply to only that index.

It is suggested that you run the 'Save Progress' cell at the end of the Notebook before doing this, so that unintended deletions can be undone by reloading the saved DataFrame.

In [8]:
FTIR_DataFrame = trim_DataFrame(FTIR_DataFrame)




### Plot Spectra (optional)

In [None]:
plot_spectra(FTIR_DataFrame=FTIR_DataFrame)

## 2) Baseline Correction

You can choose a baseline approximation function for each different material that you have data for. It's recommended that you use 'ARPLS' with tweaked parameters or use 'Manual'. However, some datasets work better with different methods, so experiment if necessary.

Baseline Options:

Asymmetric Least Squares

    'ARPLS': asymmetrically reweighted penalized least squares smoothing. An asymmetric least squares method that uses a weighting function to account for noisy data.

Spline

    'IRSQR': iterative reweighted spline quantile regression. Uses penalized splines and iterative reweighted least squares to perform quantile regression.

Classification

    'FABC': fully automatic baseline correction. Uses a first-derivative approximation of the data to identify and then ignore peak regions, then fits to the baseline regions using Whittaker smoothing.

Manual

    'Manual': set "anchor points" for each of your materials using the built-in tool. This creates a list of wavenumbers, which should be chosen so that they fall in a baseline region for every spectrum of that material. For each scan, the baseline is a cubic spline interpolated through the data values at those points.

Accepts a filepath as an argument if you want to experiment with a specific file.

Close when complete.

In [None]:
filepath = None  # If None, will let user pick spectrum to visualize. If specified, 
# provide the full file path as a raw string (e.g., r"C:\path\to\file.dpt").
FTIR_DataFrame = baseline_correct_spectra(FTIR_DataFrame, filepath=filepath)
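
To make the 'Manual' option concrete, here is a dependency-free sketch of an anchor-point baseline: a cubic spline is drawn through the data values nearest each anchor wavenumber and then subtracted. It is not the code behind `baseline_correct_spectra`. The function names are hypothetical, the natural boundary condition is an assumption (the notebook only says "cubic spline"), and the sketch expects ascending wavenumbers and anchors that map to distinct data points.

```python
import math


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing and hold at least two points.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        rhs = 3 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        diag = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag
        z[i] = (rhs - h[i - 1] * z[i - 1]) / diag
    # Back substitution; c[0] and c[n] stay 0 (natural boundary condition).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Find the segment holding x; the end segments also extrapolate.
        j = 0
        while j < n - 1 and x > xs[j + 1]:
            j += 1
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def manual_baseline(wavenumbers, absorbance, anchors):
    """Spline baseline through the data points nearest each anchor."""
    anchor_x, anchor_y = [], []
    for a in sorted(anchors):
        i = min(range(len(wavenumbers)), key=lambda k: abs(wavenumbers[k] - a))
        anchor_x.append(wavenumbers[i])
        anchor_y.append(absorbance[i])
    spline = natural_cubic_spline(anchor_x, anchor_y)
    baseline = [spline(x) for x in wavenumbers]
    corrected = [y - base for y, base in zip(absorbance, baseline)]
    return baseline, corrected


# Synthetic spectrum: sloped baseline plus one peak of height 1 at 1700 cm^-1.
wavenumbers = [400 + 10 * i for i in range(361)]
absorbance = [
    0.1 + 0.00005 * x + math.exp(-0.5 * ((x - 1700) / 30) ** 2)
    for x in wavenumbers
]
baseline, corrected = manual_baseline(
    wavenumbers, absorbance, anchors=[500, 1200, 2500, 3800]
)
print(round(max(corrected), 3))
```

Because every anchor sits in a peak-free region, the subtraction removes the slope and leaves the peak height intact, which is why anchors should be chosen where no spectrum of the material has a peak.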

## 3) Normalization

Select a peak for each material that does not change shape over time (i.e., one that is unaffected by degradation). Each spectrum of that material is scaled so that the normalization peak has the same amplitude in every spectrum, giving a normalized set of spectra that can be compared to each other more accurately.

Click on either side of the selected normalization peak, so that the tip of the peak appears within that range for every spectrum of the selected material.

In [None]:
filepath = None  # If None, will let user pick spectrum to visualize. If specified, 
# provide the full file path as a raw string (e.g., r"C:\path\to\file.dpt").
FTIR_DataFrame = normalize_spectra(
    FTIR_DataFrame, filepath=filepath
    )
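
The scaling step can be illustrated as follows. This is a minimal sketch under stated assumptions, not the body of `normalize_spectra`: the helper `normalize_to_peak` is hypothetical, and scaling the peak to a height of exactly 1.0 is an assumed target (the notebook only requires the same amplitude in every spectrum).

```python
def normalize_to_peak(wavenumbers, absorbance, low, high):
    """Scale a spectrum so its maximum inside [low, high] becomes 1.0."""
    in_range = [(y, x) for x, y in zip(wavenumbers, absorbance) if low <= x <= high]
    if not in_range:
        raise ValueError("no data points inside the selected range")
    peak_height, peak_position = max(in_range)
    if peak_height <= 0:
        raise ValueError("peak height must be positive to normalize")
    return [y / peak_height for y in absorbance], peak_position


wavenumbers = [1400, 1450, 1500, 1550, 1600]
spectrum_a = [0.10, 0.40, 0.80, 0.30, 0.10]
spectrum_b = [0.05, 0.20, 0.40, 0.15, 0.05]  # same shape, half the signal

norm_a, pos_a = normalize_to_peak(wavenumbers, spectrum_a, 1450, 1550)
norm_b, pos_b = normalize_to_peak(wavenumbers, spectrum_b, 1450, 1550)
print(pos_a, pos_b)
```

The two example spectra differ only by an overall scale factor, so they coincide after normalization. The selected range has to contain the tip of the peak in every spectrum for the same reason the clicks in the tool must bracket it.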

## 4) Deconvolution

### Peak-Finding

Gives a starting point from which to refine the peaks list for each material. The next step allows for manual peak addition, deletion, and modification.

Enable range 2 and range 3 within the interactive tool to search for peaks in more than one region.

Close when complete.

In [4]:
filepath = None  # If None, will let user pick spectrum to visualize. If specified, 
# provide the full file path as a raw string (e.g., r"C:\path\to\file.dpt").
FTIR_DataFrame = find_peak_info(FTIR_DataFrame, filepath=filepath)
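
As a rough picture of what the ranges and the prominence setting do, the sketch below finds local maxima inside one or more wavenumber ranges and keeps those whose prominence reaches a threshold. It is a plain-Python illustration, not the implementation used by `find_peak_info`. The function name is hypothetical, and prominence is taken here as the peak height minus the higher of the two lowest points found while walking outward to the nearest higher data point or the data edge.

```python
def find_peaks_by_prominence(x, y, ranges, prominence=0.05):
    """Return (position, height, prominence) for local maxima inside the ranges."""
    peaks = []
    for i in range(1, len(y) - 1):
        if not (y[i] > y[i - 1] and y[i] >= y[i + 1]):
            continue  # not a local maximum
        if not any(low <= x[i] <= high for low, high in ranges):
            continue  # outside every selected range
        # Walk left until a higher point or the edge, tracking the minimum.
        left_min = y[i]
        j = i - 1
        while j >= 0 and y[j] <= y[i]:
            left_min = min(left_min, y[j])
            j -= 1
        # Same walk to the right.
        right_min = y[i]
        j = i + 1
        while j < len(y) and y[j] <= y[i]:
            right_min = min(right_min, y[j])
            j += 1
        prom = y[i] - max(left_min, right_min)
        if prom >= prominence:
            peaks.append((x[i], y[i], prom))
    return peaks


x = [1000 + 10 * i for i in range(11)]
y = [0.0, 0.1, 0.5, 0.1, 0.12, 0.1, 0.0, 0.3, 0.9, 0.3, 0.0]

all_peaks = find_peaks_by_prominence(x, y, ranges=[(1000, 1100)])
first_region_only = find_peaks_by_prominence(x, y, ranges=[(1000, 1050)])
print([p[0] for p in all_peaks], [p[0] for p in first_region_only])
```

The small bump at 1040 is a local maximum but rises only slightly above its surroundings, so it falls below the default threshold. Lowering the prominence value would bring it back into the list.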


### Deconvolution

To process all spectra of a material quickly, assemble the full list of peaks you want for that material and click "Canonize Peaks" to save it. To reuse that canon list on another spectrum, select the spectrum and click "Load Canon Peaks", which replaces the displayed peak list with the canon one.

If a peak should be steeper (fall toward zero faster as it moves away from its center point), decrease that peak's α value. If a peak should be wider, increase it. The "Reduced chi-square" value is a scaled version of the sum of squared errors between the model and the data, accounting for differences in x-range and the number of included peaks. Visual inspection is also useful for judging the error.

Automated optimization of the α parameter is available, though its run time grows quickly with the number of peaks being optimized. It is best used on a small region or when you can step away and let the program run on its own.

    Fastest Method: Finalize peak locations for one spectrum of a material and canonize those peaks. Select the region of interest, run a fit, optimize the alpha values, and save the results for the spectrum. Then switch to a new spectrum, load the canon peaks, fit, optimize the alpha values, and save the results. Repeat until all spectra of the material are deconvoluted. Slow down at any step to tweak peak parameters, if desired.

Close when complete.

In [5]:
filepath = None  # If not None, will only analyze specified file's DataFrame entry
FTIR_DataFrame = deconvolute_peaks(FTIR_DataFrame, filepath=filepath)
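
The role of α and of the reduced chi-square can be illustrated with a short sketch. The exact peak parametrization used by `deconvolute_peaks` is not shown in this notebook, so the form below is an assumption: α is treated as the Lorentzian fraction of a pseudo-Voigt profile whose Gaussian and Lorentzian parts share one full width at half maximum, and the reduced chi-square is unweighted. Both function names are hypothetical.

```python
import math


def pseudo_voigt(x, amplitude, center, fwhm, alpha):
    """Pseudo-Voigt peak: alpha * Lorentzian + (1 - alpha) * Gaussian.

    Both components have unit height and the same FWHM, so `amplitude`
    is the peak height at `center`.
    """
    t = (x - center) / (fwhm / 2)
    lorentzian = 1.0 / (1.0 + t * t)
    gaussian = math.exp(-math.log(2) * t * t)
    return amplitude * (alpha * lorentzian + (1 - alpha) * gaussian)


def reduced_chi_square(data, model, n_free_parameters):
    """Sum of squared residuals divided by the degrees of freedom."""
    dof = len(data) - n_free_parameters
    if dof <= 0:
        raise ValueError("need more data points than free parameters")
    return sum((d - m) ** 2 for d, m in zip(data, model)) / dof


# Far from the center, a lower alpha (more Gaussian) decays faster.
tail_low_alpha = pseudo_voigt(1760, 1.0, 1700, 20, alpha=0.2)
tail_high_alpha = pseudo_voigt(1760, 1.0, 1700, 20, alpha=0.8)
print(tail_low_alpha < tail_high_alpha)
```

Under this assumed form, lowering α shifts the profile toward a Gaussian, whose tails fall off much faster than a Lorentzian's, which matches the guidance above about making a peak steeper. Dividing by the degrees of freedom is what lets fits with different x-ranges and peak counts be compared.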


## 5) Model Fitting

### Material Fitting


Fitting takes the peak parameters found for every deconvoluted spectrum of the selected material and averages them to create material-wide accepted values, then runs another pseudo-Voigt fit to obtain new amplitudes using those averaged parameters.

Optimization will iteratively modify select parameters to minimize the error across the material's spectra.

Saving will put peak parameters into materials.json and peak areas into the DataFrame.

Close when complete.

In [7]:
FTIR_DataFrame = fit_material(FTIR_DataFrame)
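
The two stages described above, averaging shape parameters and then refitting amplitudes, can be sketched as below. This is a simplified illustration rather than the logic of `fit_material`: the helper names and the dictionary keys (`center`, `fwhm`, `alpha`) are hypothetical, and the amplitude refit handles a single isolated peak, whereas overlapping peaks would need a joint least-squares solve.

```python
from statistics import mean


def average_peak_parameters(per_spectrum_peaks):
    """Average center, fwhm and alpha peak-by-peak across spectra.

    per_spectrum_peaks holds one list of peak dicts per spectrum, with
    the peaks in the same order in every list.
    """
    averaged = []
    for peak_versions in zip(*per_spectrum_peaks):
        averaged.append({
            key: mean(p[key] for p in peak_versions)
            for key in ("center", "fwhm", "alpha")
        })
    return averaged


def refit_amplitude(y, shape):
    """Least-squares amplitude for one peak whose unit-height shape is fixed."""
    return sum(yi * si for yi, si in zip(y, shape)) / sum(si * si for si in shape)


spectrum_1 = [{"center": 1700, "fwhm": 20, "alpha": 0.4}]
spectrum_2 = [{"center": 1704, "fwhm": 24, "alpha": 0.6}]
averaged = average_peak_parameters([spectrum_1, spectrum_2])

# With the shape fixed, only the amplitude is left to fit.
unit_shape = [0.1, 0.5, 1.0, 0.5, 0.1]
measured = [0.2, 1.0, 2.0, 1.0, 0.2]
amplitude = refit_amplitude(measured, unit_shape)
print(averaged[0]["center"], round(amplitude, 6))
```

Holding the averaged shape parameters fixed makes the amplitude the only free quantity per peak, so amplitudes from different spectra of the same material become directly comparable.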


### Quality Check

Shows which spectra have the greatest error between the fit and their normalized data, and plots them so that problems are easy to identify.

In [None]:
FTIR_DataFrame = check_fit_quality(FTIR_DataFrame)
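
A ranking of this kind could look like the sketch below. The error metric used by `check_fit_quality` is not stated in this notebook, so root-mean-square error is an assumption here, and `rank_by_fit_error` and the spectrum names are hypothetical.

```python
import math


def rank_by_fit_error(spectra):
    """Sort (name, data, fit) records from largest to smallest RMSE."""
    scored = []
    for name, data, fit in spectra:
        rmse = math.sqrt(sum((d - f) ** 2 for d, f in zip(data, fit)) / len(data))
        scored.append((rmse, name))
    return [(name, rmse) for rmse, name in sorted(scored, reverse=True)]


spectra = [
    ("PPE_unexposed", [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]),
    ("PPE_A3", [0.0, 1.0, 0.0], [0.1, 0.7, 0.1]),
    ("PPE_A4", [0.0, 1.0, 0.0], [0.0, 0.9, 0.0]),
]
ranking = rank_by_fit_error(spectra)
for name, rmse in ranking:
    print(f"{name}: RMSE = {rmse:.3f}")
```

Spectra at the top of such a list are the first candidates for revisiting in the deconvolution step.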

## 6) Results

### Output to .CSV

In [None]:
# Export a material-specific CSV

# Set the material alias exactly as it appears in your DataFrame/materials.json
material = "PPE"

# Optional: customize paths
# materials_json_path = "materials.json"  # default resolves next to Analysis_FTIR.py
# output_path = r"material_output_*alias*.csv"  # set a custom path if desired

# Run export (defaults used if optional args are omitted)
csv_path = export_material_output_csv(
    FTIR_DataFrame,
    material,
    # materials_json_path=materials_json_path,
    # output_path=output_path,
)
print(f"Exported CSV: {csv_path}")

### Save Progress

In [9]:
# Save the entire DataFrame to CSV.
# DataFrame_path was returned by bring_in_DataFrame in the Setup section (the default
# is FTIR_DataFrame.csv in the active directory). Reassign it here to save elsewhere.
FTIR_DataFrame.to_csv(DataFrame_path, index=False)
print(f"DataFrame saved to {DataFrame_path}")

DataFrame saved to FTIR_DataFrame.csv
