# Autoencoding for clustering of spectroscopic data

---

Lecture: "Physics-augmented machine learning" @ Cyber-Physical Simulation, TU Darmstadt

Lecturer: Prof. Oliver Weeger

Assistants: Dr.-Ing. Maximilian Kannapin, Jasper O. Schommartz, Dominik K. Klein

Summer term 2025

---

Experimental data by Ho et al.: "Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning". Nature Communications 10:4927 (2019).



*Run the following cell to clone the GitHub repository in your current Google Colab environment.*

In [8]:
!git clone https://github.com/CPShub/LecturePhysicsAwareML.git

Cloning into 'LecturePhysicsAwareML'...
remote: Enumerating objects: 819, done.[K
remote: Counting objects: 100% (74/74), done.[K
remote: Compressing objects: 100% (59/59), done.[K
remote: Total 819 (delta 25), reused 27 (delta 15), pack-reused 745 (from 2)[K
Receiving objects: 100% (819/819), 141.00 MiB | 24.77 MiB/s, done.
Resolving deltas: 100% (343/343), done.
Updating files: 100% (189/189), done.


*Run the following cell to import all required modules and Python files into this notebook. If you change any of the Python files, run this cell again to load the updated versions. You might need to restart your Colab session first ("Runtime / Restart session" in the header menu).*


In [1]:
import tensorflow as tf
import datetime
now = datetime.datetime.now
import LecturePhysicsAwareML.Autoencoder.data as ld
import LecturePhysicsAwareML.Autoencoder.models as lm
import LecturePhysicsAwareML.Autoencoder.plots as lp

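*Run the following cell instead if you are executing the notebook locally on your device (from within the repository's `Autoencoder` folder, so that `data`, `models` and `plots` can be imported).*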

In [None]:
import tensorflow as tf
import datetime
now = datetime.datetime.now
import data as ld
import models as lm
import plots as lp



*If you want to clone the repository again, you have to delete it from your Google Colab files first. For this, you can run the following cell.*

In [7]:
%rm -rf LecturePhysicsAwareML

Build the full autoencoder and the encoder

In [5]:
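latent_variables = 2   # dimension of the latent space
nodes = 64             # width of the hidden layers

# full autoencoder: encoder (nodes -> latent_variables) followed by
# decoder (nodes -> 1000 output values, one per point of a spectrum)
units = [nodes,latent_variables,nodes,1000]
activation = ['softplus','linear','softplus','linear']
model_AE = lm.main(units=units, activation=activation)

# encoder only: same architecture as the first two layers of the autoencoder
units = [nodes,latent_variables]
activation = ['softplus','linear']
model_E = lm.main(units=units, activation=activation)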
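The internals of `lm.main` are not shown in this notebook. The following framework-free NumPy sketch assumes that it builds a plain stack of dense layers `y = act(x @ W + b)` from the given `units` and `activation` lists. Under that assumption, the encoder is exactly the first two layers of the autoencoder, so its parameters are the first four weight arrays (two kernels, two biases), stored in the order `[W1, b1, W2, b2, ...]` that Keras also uses for dense stacks. The helpers `init_dense_stack` and `forward` are hypothetical.

```python
import numpy as np

def softplus(x):
    # numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

ACTIVATIONS = {"softplus": softplus, "linear": lambda x: x}

def init_dense_stack(n_in, units, rng):
    # hypothetical initialiser: one (kernel, bias) pair per layer,
    # flattened into the order [W1, b1, W2, b2, ...]
    weights = []
    for n_out in units:
        weights.append(rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out)))
        weights.append(np.zeros(n_out))
        n_in = n_out
    return weights

def forward(x, weights, activation):
    # apply each dense layer y = act(x @ W + b) in turn
    for k, act in enumerate(activation):
        W, b = weights[2 * k], weights[2 * k + 1]
        x = ACTIVATIONS[act](x @ W + b)
    return x

rng = np.random.default_rng(0)
n_points, nodes, latent_variables = 1000, 64, 2
ae_units = [nodes, latent_variables, nodes, n_points]
ae_act = ["softplus", "linear", "softplus", "linear"]
ae_weights = init_dense_stack(n_points, ae_units, rng)

spectra = rng.random((5, n_points))          # 5 dummy spectra
reconstruction = forward(spectra, ae_weights, ae_act)
# encoder = first two layers, i.e. the first four weight arrays
latent = forward(spectra, ae_weights[:4], ae_act[:2])
print(reconstruction.shape, latent.shape)    # (5, 1000) (5, 2)
```

Copying the first four weight arrays from the trained autoencoder into the separate encoder model therefore gives direct access to the two-dimensional latent coordinates of each spectrum.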

Define the study and calibrate (train) the autoencoder

In [None]:
# define bacteria sets to be investigated (numbers between 0 and 29)
cases = [18,27,0,26]
wn_c, spectra_c, label_c = ld.load_data(cases)

epochs = 500
h = model_AE.fit([spectra_c], [spectra_c], epochs=epochs, verbose=2)

lp.plot_loss(h)
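Calling `fit` with the spectra as both input and target trains the network to reproduce its own input through the two-dimensional bottleneck. The loss compiled inside `lm.main` is not shown here; assuming it is a mean squared error, a per-spectrum reconstruction error can be computed and averaged separately for each bacteria case as sketched below. The function `reconstruction_error_per_case` is hypothetical, it assumes `label_c` holds one label per spectrum, and in the notebook `x_hat` would come from `model_AE.predict(spectra_c)`.

```python
import numpy as np

def reconstruction_error_per_case(x, x_hat, labels):
    """Mean squared reconstruction error, averaged separately for each label.

    x, x_hat : arrays of shape (n_spectra, n_points)
    labels   : array of shape (n_spectra,), one label per spectrum (assumed)
    """
    x, x_hat, labels = np.asarray(x), np.asarray(x_hat), np.ravel(labels)
    per_spectrum = np.mean((x - x_hat) ** 2, axis=1)   # MSE of each spectrum
    return {lab.item(): per_spectrum[labels == lab].mean()
            for lab in np.unique(labels)}

# usage with dummy data: two cases, the second reconstructed less accurately
x = np.zeros((4, 3))
x_hat = np.array([[0.1, 0.1, 0.1],
                  [0.1, 0.1, 0.1],
                  [0.3, 0.3, 0.3],
                  [0.3, 0.3, 0.3]])
labels = np.array([0, 0, 1, 1])
errors = reconstruction_error_per_case(x, x_hat, labels)
print(errors)
```

A case with a clearly larger error than the others is one the bottleneck represents poorly, which is worth keeping in mind when interpreting the latent-space plots.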

Visualize results

In [None]:
# set the parameters of the encoder: the first four weight arrays of the
# autoencoder (kernel and bias of its first two layers)
model_E.set_weights(model_AE.get_weights()[0:4])

# plot latent space (each pair of latent variables once)
for i in range(latent_variables):
    for j in range(i):
        lp.plot_latent_space_ij(model_E, spectra_c, label_c, i, j)

# plot the different bacteria types
for i, case in enumerate(cases):
    wn, spectra = ld.load_single_case(case)
    lp.plot_spectra(wn, spectra, i, case)
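`lp.plot_latent_space_ij` is defined in the repository's `plots` module and is not shown here. As a rough, hypothetical stand-in, the following Matplotlib sketch takes latent coordinates `z` (in the notebook, e.g. `z = model_E.predict(spectra_c)`) and draws latent variable `i` against latent variable `j`, one colour per label. Spectra of the same bacteria case forming compact, separated groups is what "clustering" refers to here. The function name `plot_latent_pair` is hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt

def plot_latent_pair(z, labels, i, j, ax=None):
    """Scatter latent variable i (y-axis) against latent variable j (x-axis)."""
    z, labels = np.asarray(z), np.ravel(labels)
    if ax is None:
        _, ax = plt.subplots()
    for lab in np.unique(labels):
        mask = labels == lab                     # spectra belonging to this label
        ax.scatter(z[mask, j], z[mask, i], s=10, label=f"label {lab}")
    ax.set_xlabel(f"latent variable {j}")
    ax.set_ylabel(f"latent variable {i}")
    ax.legend()
    return ax

# usage with two synthetic clusters in a 2D latent space
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(1.0, 0.1, (20, 2))])
labels = np.repeat([0, 1], 20)
ax = plot_latent_pair(z, labels, 1, 0)
```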