Counting Molecules: Automated Analysis of STM Imaging Data

A computer vision pipeline that automates the counting and classification of molecule species from Scanning Tunnelling Microscopy (STM) images, reducing manual inspection effort by 40 to 50 percent compared to hand-counting.

Overview

Researchers using STM imaging to study molecular species typically count and classify molecules by hand, a slow and subjective process that does not scale to large image sets. This project builds an end-to-end pipeline that ingests raw STM images, segments individual molecules, and classifies them by species using scikit-learn.

Problem

Given a folder of STM images containing multiple molecule species, automatically:

Read and standardise the raw image files
Convert each image into a clean binary mask separating molecules from background
Detect and segment individual molecules
Count molecules per species across the dataset

Approach

The pipeline is split across three notebooks, each handling one stage of the workflow:

1. Reading and standardising STM images

01_read_stm_images.ipynb ingests raw STM image files, handles the instrument-specific format, and produces a standardised image array ready for downstream processing.
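The instrument-specific file format is not described here, so the following is only a minimal sketch of the standardisation step. It assumes the raw scan has already been loaded into a 2-D NumPy array, and `standardise_image` is a hypothetical helper name, not a function from the notebook.

```python
import numpy as np


def standardise_image(raw):
    """Rescale a single-channel scan to float32 values in [0, 1].

    Hypothetical helper: assumes `raw` is already a 2-D array of
    heights or currents loaded from the instrument file.
    """
    img = np.asarray(raw, dtype=np.float32)
    if img.ndim != 2:
        raise ValueError("expected a single-channel 2-D image")
    lo, hi = img.min(), img.max()
    if hi == lo:
        # A flat image carries no contrast; return zeros instead of dividing by zero.
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)
```

Min-max scaling is one simple choice; real STM data may also need plane or line-by-line levelling before this step.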

2. Binarisation

02_binarization.ipynb applies thresholding and noise reduction across the image set to produce binary masks separating molecules from background. Threshold selection is tuned empirically against a labelled subset.
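One way to picture "tuned empirically against a labelled subset" is a sweep over candidate thresholds, scored by overlap with hand-labelled masks. The sketch below is an illustration under assumptions, not the notebook's code: the helper names are hypothetical, intersection-over-union is an assumed scoring metric, and the binary opening from SciPy (not listed in the tech stack) stands in for whatever noise reduction the notebook uses.

```python
import numpy as np
from scipy import ndimage as ndi


def binarise(image, threshold, opening_iterations=1):
    """Threshold an image, then remove isolated speckle with a binary opening."""
    mask = np.asarray(image) > threshold
    if opening_iterations > 0:
        mask = ndi.binary_opening(mask, iterations=opening_iterations)
    return mask


def iou(pred, truth):
    """Intersection-over-union of two boolean masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, truth).sum() / union


def tune_threshold(images, labelled_masks, candidates):
    """Return (threshold, mean IoU) for the best-scoring candidate threshold."""
    scores = []
    for t in candidates:
        per_image = [iou(binarise(img, t), m) for img, m in zip(images, labelled_masks)]
        scores.append(np.mean(per_image))
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])
```

The opening uses SciPy's default cross-shaped structuring element, which rounds off sharp corners slightly, so even an ideal threshold may score a little below 1.0.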

3. Blob detection and counting

03_blob_detection.ipynb applies blob detection algorithms to the binary masks to identify, segment, and count individual molecules. Detected blobs feed into a scikit-learn classifier to assign species labels.
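As a rough illustration of this stage, the sketch below labels connected components in a binary mask with scikit-image, extracts a few shape features per blob, and passes them to a scikit-learn classifier. The specific blob detection algorithm, features, and classifier are not named above, so connected-component labelling, the three features, the k-nearest-neighbours model, and the helper names are all assumptions made for the example.

```python
import numpy as np
from skimage.measure import label, regionprops
from sklearn.neighbors import KNeighborsClassifier


def blob_features(mask, min_area=5):
    """Return (features, centroids) for each connected blob in a binary mask.

    Features per blob (assumed for illustration): area, eccentricity, extent.
    """
    labelled = label(np.asarray(mask, dtype=bool), connectivity=2)
    features, centroids = [], []
    for region in regionprops(labelled):
        if region.area < min_area:
            continue  # treat very small components as residual noise
        features.append([region.area, region.eccentricity, region.extent])
        centroids.append(region.centroid)
    return np.asarray(features, dtype=float).reshape(-1, 3), centroids


def classify_blobs(features, train_features, train_labels, n_neighbors=1):
    """Fit a k-NN classifier on labelled blobs and predict species for new ones."""
    if len(features) == 0:
        return np.array([])  # nothing detected, nothing to classify
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(train_features, train_labels)
    return clf.predict(features)
```

The number of rows in the feature array is the molecule count for that image. Connected-component labelling merges touching molecules into a single blob, which is one reason dense regions can be miscounted.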

Results

40 to 50 percent reduction in manual inspection effort compared to hand-counting
End-to-end pipeline reproducible across new image sets from the same instrument without retuning
Output: per-image counts and species classifications, exportable for downstream analysis
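The export format is not specified above, so the following is only a sketch of how per-image species counts could be written to CSV using the standard library. The function names, column layout, and image names in the usage note are hypothetical.

```python
import csv
from collections import Counter


def species_counts(predictions_by_image):
    """Map each image name to a Counter of its predicted species labels."""
    return {name: Counter(labels) for name, labels in predictions_by_image.items()}


def write_counts_csv(counts, handle):
    """Write one row per image: image name, a column per species, and a total."""
    species = sorted({s for c in counts.values() for s in c})
    writer = csv.writer(handle)
    writer.writerow(["image", *species, "total"])
    for name in sorted(counts):
        row = [counts[name].get(s, 0) for s in species]
        writer.writerow([name, *row, sum(row)])
```

For example, passing `{"scan_01": ["A", "B", "A"]}` to `species_counts` and writing the result to an open file handle would produce a header row followed by `scan_01,2,1,3`.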

Tech stack

Language: Python
Libraries: scikit-learn, scikit-image, OpenCV, NumPy, matplotlib
Environment: Jupyter Notebook

Repository structure

Counting-Molecules/
├── 01_read_stm_images.ipynb     # Stage 1: image ingestion and standardisation
├── 02_binarization.ipynb        # Stage 2: thresholding and binarisation
├── 03_blob_detection.ipynb      # Stage 3: blob detection and classification
├── LICENSE
├── README.md
└── requirements.txt

Run

The three notebooks are designed to run in sequence. Each stage's output feeds the next.

git clone https://github.com/Stephaniew1/Counting-Molecules.git
cd Counting-Molecules
pip install -r requirements.txt

# Run in order:
jupyter notebook 01_read_stm_images.ipynb
jupyter notebook 02_binarization.ipynb
jupyter notebook 03_blob_detection.ipynb

Context

This project was completed as part of my degree requirements at Monash University. My project partner contributed code review and refactoring support across the pipeline.

Limitations and future work

Threshold parameters are tuned for the specific STM instrument used and may need adjustment for other setups
Overlapping molecules can be miscounted; a deep learning segmentation model, for example U-Net, would handle dense regions more robustly
The classifier is trained on a small labelled subset and would benefit from a larger, more diverse training set

License

This project is for academic and portfolio purposes. See the LICENSE file in the repository for details.
