A computer vision pipeline that automates the counting and classification of molecule species from Scanning Tunnelling Microscopy (STM) images, reducing manual inspection effort by 40 to 50 percent compared to hand-counting.
Researchers using STM imaging to study molecular species typically count and classify molecules by hand, a slow and subjective process that does not scale to large image sets. This project builds an end-to-end pipeline that ingests raw STM images, segments individual molecules, and classifies them by species using scikit-learn.
Given a folder of STM images containing multiple molecule species, automatically:
- Read and standardise the raw image files
- Convert each image into a clean binary mask separating molecules from background
- Detect and segment individual molecules
- Count molecules per species across the dataset
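The final counting step can be sketched with the standard library alone. This is a minimal illustration, not the notebooks' code: the image names, the species labels `"A"` and `"B"`, the helper names `count_species` and `write_counts_csv`, and the long-format CSV layout are all assumptions made for the example.

```python
import csv
import io
from collections import Counter


def count_species(labels_per_image):
    """Map each image name to a Counter of predicted species labels."""
    return {name: Counter(labels) for name, labels in labels_per_image.items()}


def write_counts_csv(counts, handle):
    """Write one row per (image, species) pair in long format."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["image", "species", "count"])
    for name in sorted(counts):
        for species in sorted(counts[name]):
            writer.writerow([name, species, counts[name][species]])


# Hypothetical classifier output: image names and species labels are made up.
predictions = {
    "scan_001": ["A", "B", "A"],
    "scan_002": ["B"],
}
counts = count_species(predictions)

# Dataset-wide totals: adding Counters sums the per-species counts.
totals = sum(counts.values(), Counter())

# Write to an in-memory buffer here; a real run would pass an open file instead.
buffer = io.StringIO()
write_counts_csv(counts, buffer)
csv_text = buffer.getvalue()
```

Keeping one row per image and species makes the file easy to pivot or aggregate later in a spreadsheet or with pandas.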
The pipeline is split across three notebooks, each handling one stage of the workflow:
01_read_stm_images.ipynb ingests raw STM image files, handles the
instrument-specific format, and produces a standardised image array ready
for downstream processing.
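As a rough sketch of what "standardised" could mean here, the snippet below rescales a decoded scan to floating-point values in [0, 1]. Decoding the instrument-specific file format is not shown, and both the function name `standardise_image` and the choice of min-max scaling are assumptions rather than a description of the notebook.

```python
import numpy as np


def standardise_image(raw):
    """Rescale a 2-D scan to float32 values in [0, 1] (min-max scaling)."""
    image = np.asarray(raw, dtype=np.float32)
    if image.ndim != 2:
        raise ValueError("expected a 2-D greyscale scan")
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:
        # A flat scan carries no contrast; return zeros rather than divide by zero.
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


# Hypothetical 2x3 height map standing in for a decoded STM scan.
raw_scan = [[10.0, 20.0, 30.0], [10.0, 15.0, 30.0]]
standardised = standardise_image(raw_scan)
```

Putting every image on the same scale means a single threshold value has the same meaning across the whole image set.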
02_binarization.ipynb applies thresholding and noise reduction across
the image set to produce binary masks separating molecules from background.
Threshold selection is tuned empirically against a labelled subset.
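One way to tune a threshold against a labelled subset is to sweep candidate values and keep the one whose masks agree best with the hand-labelled masks. The sketch below assumes images already scaled to [0, 1] and scores agreement with intersection-over-union (IoU). The metric, the candidate grid, and the helper names `iou` and `pick_threshold` are illustrative assumptions, and noise reduction is left out.

```python
import numpy as np


def iou(pred, truth):
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, truth).sum() / union


def pick_threshold(images, labelled_masks, candidates):
    """Return (threshold, score) for the candidate with the highest mean IoU."""
    best_t, best_score = None, -1.0
    for t in candidates:
        score = np.mean([iou(img > t, mask)
                         for img, mask in zip(images, labelled_masks)])
        if score > best_score:  # strict '>' keeps the lowest tied threshold
            best_t, best_score = t, score
    return best_t, best_score


# Toy data: one 3x3 image with its hand-labelled molecule mask.
image = np.array([[0.1, 0.2, 0.9],
                  [0.1, 0.8, 0.9],
                  [0.0, 0.1, 0.2]])
mask = image >= 0.8
threshold, score = pick_threshold(
    [image], [mask], candidates=[0.05, 0.15, 0.3, 0.5, 0.7, 0.85]
)
binary = image > threshold
```

In this toy case several candidates reproduce the labelled mask exactly, and the sweep keeps the first of them.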
03_blob_detection.ipynb applies blob detection algorithms to the binary
masks to identify, segment, and count individual molecules. Detected
blobs feed into a scikit-learn classifier to assign species labels.
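To illustrate the detection stage, the sketch below labels 4-connected foreground regions in a binary mask and extracts a few shape features per blob. Connected-component labelling is used here as a simple stand-in for whichever blob detection algorithms the notebook applies, and the function name `label_blobs`, the `min_area` filter, and the chosen features are assumptions. Feature rows like these are the kind of input a scikit-learn classifier could be trained on.

```python
from collections import deque

import numpy as np


def label_blobs(binary, min_area=1):
    """Find 4-connected foreground regions and return one feature dict per blob."""
    binary = np.asarray(binary, dtype=bool)
    rows, cols = binary.shape
    seen = np.zeros_like(binary, dtype=bool)
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            # Drop regions smaller than min_area, which are likely noise.
            if len(pixels) >= min_area:
                ys, xs = zip(*pixels)
                blobs.append({
                    "area": len(pixels),
                    "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),
                    "height": max(ys) - min(ys) + 1,
                    "width": max(xs) - min(xs) + 1,
                })
    return blobs


# Toy mask: a 2x2 blob, a 3-pixel vertical blob, and one isolated noise pixel.
mask = np.array([[1, 1, 0, 0, 0],
                 [1, 1, 0, 0, 1],
                 [0, 0, 0, 0, 1],
                 [0, 1, 0, 0, 1]])
blobs = label_blobs(mask, min_area=2)
molecule_count = len(blobs)
```

Note that touching or overlapping molecules merge into a single region under this approach, so they would be counted once.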
- 40 to 50 percent reduction in manual inspection effort compared to hand-counting
- End-to-end pipeline that runs on new image sets from the same instrument without retuning
- Output: per-image counts and species classifications, exportable for downstream analysis
- Language: Python
- Libraries: scikit-learn, scikit-image, OpenCV, NumPy, matplotlib
- Environment: Jupyter Notebook
Counting-Molecules/
├── 01_read_stm_images.ipynb # Stage 1: image ingestion and standardisation
├── 02_binarization.ipynb # Stage 2: thresholding and binarisation
├── 03_blob_detection.ipynb # Stage 3: blob detection and classification
└── README.md
The three notebooks are designed to run in sequence. Each stage's output feeds the next.
git clone https://github.com/Stephaniew1/Counting-Molecules.git
cd Counting-Molecules
pip install -r requirements.txt
# Run in order:
jupyter notebook 01_read_stm_images.ipynb
jupyter notebook 02_binarization.ipynb
jupyter notebook 03_blob_detection.ipynb
This project was completed as part of my degree requirements at Monash University. My project partner contributed code review and refactoring support across the pipeline.
- Threshold parameters are tuned for the specific STM instrument used and may need adjustment for other setups
- Overlapping molecules can be miscounted; a deep learning segmentation model, for example U-Net, would handle dense regions more robustly
- The classifier is trained on a small labelled subset and would benefit from a larger, more diverse training set
This project is for academic and portfolio purposes. See repository for details.