CodeHalwell/chemical-graph-series: view chemicals as graphs and perform operations on graphs for predictive chemistry
🧪 Chemical Graph Series

A progressive educational journey from basic cheminformatics to state-of-the-art Graph Neural Networks (GNNs) and Molecular Transformers. This series covers everything from representing molecules as graphs to predicting chemical properties using advanced deep learning architectures.

[Image: Molecular Graph Representation (molGraph.png)]


🎯 Who Is This For?

This series is designed for:

  • Computational chemists looking to apply deep learning to molecular data
  • ML engineers interested in graph neural networks with a chemistry application
  • Drug discovery researchers wanting to build property prediction models
  • Students with basic Python and chemistry knowledge

Prerequisites: Basic Python (loops, functions, data structures) and fundamental chemistry (molecular structure, bonds, functional groups). No prior experience with RDKit, graph theory, or deep learning required—we teach everything from scratch.


🚀 Curriculum Overview

The course is structured into 7 sequential notebooks, progressively building from foundations to production-ready models.

Lesson | Title                    | Key Concepts                                                    | Time
-------|--------------------------|-----------------------------------------------------------------|------------
01     | Building Graphs          | SMILES parsing, RDKit, Mol-to-Graph, Feature extraction         | 45-60 min
02     | Positional Encoding      | Laplacian Eigenvectors, RWPE, Spectral Analysis                 | 60-75 min
03     | GAT Model                | Graph Attention Networks, Message Passing, Multi-head Attention | 75-90 min
04     | Sparse Attention         | Efficiency in Graph Transformers, Virtual Edges, Locality       | 60-75 min
05     | Full Graph Transformer   | Global Self-Attention, Edge Features, Deep Architectures        | 90-105 min
06     | Advanced Graph Models    | GraphGPS, E(3)-GNNs, Equivariance, Hybrid Architectures         | 90-105 min
07     | Modelling & Predictions  | Property Prediction (ESOL, FreeSolv), Training Pipelines        | 120-150 min

Total Estimated Time: ~9-11 hours
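The Lesson 01 workflow, SMILES in and graph out, can be sketched in a few lines with RDKit; the feature set below (atomic number, degree, aromaticity) is illustrative rather than the notebooks' exact featurization.

```python
# Sketch of Lesson 01's mol-to-graph step: parse a SMILES string with
# RDKit, then emit per-atom features and a directed edge list.
from rdkit import Chem

def smiles_to_graph(smiles):
    """Return (node_features, edge_list) for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # One feature row per heavy atom: atomic number, degree, aromatic flag
    nodes = [
        [atom.GetAtomicNum(), atom.GetDegree(), int(atom.GetIsAromatic())]
        for atom in mol.GetAtoms()
    ]
    # Each undirected bond becomes two directed edges (the usual GNN convention)
    edges = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges += [(i, j), (j, i)]
    return nodes, edges

nodes, edges = smiles_to_graph("CCO")   # ethanol: C-C-O
print(len(nodes), len(edges))           # 3 heavy atoms, 4 directed edges
```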


📚 Learning Path

┌──────────────────────────────────────────────────────────┐
│               FOUNDATIONS (Lessons 01-02)                │
│  • Molecular representations    • Feature extraction     │
│  • Graph structures             • Positional encodings   │
└──────────────────────────────────────────────────────────┘
                             ↓
┌──────────────────────────────────────────────────────────┐
│           ATTENTION MECHANISMS (Lessons 03-04)           │
│  • Local attention (GAT)        • Sparse patterns        │
│  • Message passing              • Scalability            │
└──────────────────────────────────────────────────────────┘
                             ↓
┌──────────────────────────────────────────────────────────┐
│          ADVANCED ARCHITECTURES (Lessons 05-06)          │
│  • Graph Transformers           • GraphGPS               │
│  • Global context               • Equivariant networks   │
└──────────────────────────────────────────────────────────┘
                             ↓
┌──────────────────────────────────────────────────────────┐
│                 APPLICATION (Lesson 07)                  │
│  • Real datasets (ESOL, FreeSolv)   • Model comparison   │
│  • Training pipelines               • Deployment         │
└──────────────────────────────────────────────────────────┘

πŸ› οΈ Setup & Installation

This project manages dependencies through pyproject.toml. We recommend uv for fast, reliable installs.

Using uv (Recommended)

# Clone the repository
git clone https://github.com/CodeHalwell/chemical-graph-series.git
cd chemical-graph-series

# Sync environment and install all dependencies
uv sync

# Launch Jupyter
uv run jupyter notebook

Using pip

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install rdkit torch torch-geometric networkx matplotlib pandas jupyter py3dmol scipy

# Launch Jupyter
jupyter notebook

Verify Installation

# Run this in a notebook cell to verify everything works
from rdkit import Chem
import torch
import torch_geometric
import networkx as nx

print(f"RDKit: {Chem.rdBase.rdkitVersion}")
print(f"PyTorch: {torch.__version__}")
print(f"PyTorch Geometric: {torch_geometric.__version__}")
print("✅ All dependencies installed successfully!")

📂 Project Structure

chemical-graph-series/
├── notebooks/
│   ├── 01_Building_Graphs.ipynb      # Foundations: SMILES, RDKit, graphs
│   ├── 02_Positional_Encoding.ipynb  # Spectral graph theory & RWPE
│   ├── 03_GAT_Model.ipynb            # Graph Attention Networks
│   ├── 04_Sparse Attention.ipynb     # Efficient attention patterns
│   ├── 05_Full_Graph_Transformer.ipynb  # Complete transformer architecture
│   ├── 06_Advanced_Graph_Models.ipynb   # GraphGPS, E(3)-GNNs
│   └── 07_Modelling_and_Predictions.ipynb  # Real-world applications
├── molGraph.png                      # Visual for documentation
├── pyproject.toml                    # Project dependencies
├── uv.lock                           # Locked dependency versions
├── main.py                           # Utility scripts
└── README.md                         # This file

🧪 Requirements

Requirement       | Version
------------------|----------
Python            | ≥ 3.13
RDKit             | latest
PyTorch           | latest
PyTorch Geometric | latest
NetworkX          | latest
matplotlib        | latest
pandas            | latest
py3Dmol           | ≥ 2.5.3
scipy             | ≥ 1.16.3

🎓 What You'll Build

By the end of this series, you will have:

  1. Molecular featurization pipelines — Convert any SMILES string into ML-ready graph representations
  2. Custom GNN architectures — GATs, Graph Transformers, and hybrid models
  3. Property prediction models — Trained on ESOL (solubility) and FreeSolv (solvation energy) benchmarks
  4. Interpretable AI — Visualize attention weights to understand what your model "sees"
  5. Production-ready code — Deployable models for real-world molecular property prediction
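For item 1, the bridge from an edge list to something a GNN can consume is the COO-format edge_index tensor used throughout PyTorch Geometric; a minimal sketch in plain PyTorch, with toy ethanol features (these are the pieces a torch_geometric Data object wraps together).

```python
# Sketch: pack a directed edge list into the COO edge_index tensor
# (shape [2, num_edges]) that PyTorch Geometric models expect.
import torch

edges = [(0, 1), (1, 0), (1, 2), (2, 1)]   # ethanol C-C-O, both directions
x = torch.tensor([[6.0], [6.0], [8.0]])    # toy node features: atomic numbers
edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()

print(edge_index.shape)   # torch.Size([2, 4])
```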

📖 Key Topics Covered

Cheminformatics

  • SMILES and SMARTS notation
  • Molecular visualization (2D, 3D, conformer ensembles)
  • Substructure matching and pharmacophore identification
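Substructure matching comes down to a couple of RDKit calls; a minimal sketch that locates the hydroxyl oxygen of ethanol with a SMARTS pattern:

```python
# Sketch: SMARTS substructure matching with RDKit.
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")           # ethanol
pattern = Chem.MolFromSmarts("[OX2H]")    # hydroxyl oxygen: 2 connections, 1 H
matches = mol.GetSubstructMatches(pattern)
print(matches)                            # ((2,),): atom index 2 is the -OH oxygen
```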

Graph Theory

  • Molecules as graphs (atoms = nodes, bonds = edges)
  • Adjacency and Laplacian matrices
  • Spectral graph theory and eigenvector decomposition
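These objects are easy to build with NetworkX and NumPy; a sketch for propane's three-carbon skeleton, ending with the eigendecomposition that Lesson 02 turns into Laplacian positional encodings:

```python
# Sketch: adjacency matrix A, degree matrix D, Laplacian L = D - A,
# and the Laplacian eigendecomposition for a 3-atom chain (propane's
# heavy-atom skeleton is just a path graph).
import networkx as nx
import numpy as np

G = nx.path_graph(3)                  # C-C-C
A = nx.to_numpy_array(G)              # adjacency matrix
D = np.diag(A.sum(axis=1))            # degree matrix
L = D - A                             # combinatorial Laplacian

eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues for symmetric L
print(np.round(eigvals, 4))           # [0. 1. 3.]
# The eigenvectors of the smallest non-trivial eigenvalues give each
# node a spectral "coordinate" usable as a positional encoding.
```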

Deep Learning

  • Message passing neural networks
  • Attention mechanisms (single-head, multi-head, sparse)
  • Transformer architectures adapted for graphs
  • Equivariant neural networks (E(3)-GNNs)
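The core of message passing fits in a few lines of plain PyTorch; a sketch with mean aggregation, where a GAT layer (Lesson 03) would instead weight each neighbour's message by a learned attention score:

```python
# Sketch: one message-passing step in plain PyTorch (no torch_geometric),
# using mean aggregation over incoming neighbours.
import torch

def message_passing_step(x, edge_index, lin):
    """h_i = W * mean_{j -> i} x_j : aggregate neighbours, then transform."""
    src, dst = edge_index                        # directed edges src -> dst
    agg = torch.zeros_like(x)
    agg.index_add_(0, dst, x[src])               # sum incoming messages
    deg = torch.zeros(x.size(0)).index_add_(
        0, dst, torch.ones(dst.size(0))).clamp(min=1)
    return lin(agg / deg.unsqueeze(1))           # mean-aggregate, then W

x = torch.eye(3)                                 # one-hot node features
edge_index = torch.tensor([[0, 1, 1, 2],         # 3-node path, both directions
                           [1, 0, 2, 1]])
lin = torch.nn.Linear(3, 3, bias=False)
out = message_passing_step(x, edge_index, lin)
print(out.shape)  # torch.Size([3, 3])
```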

Practical ML

  • Feature engineering for molecular properties
  • Train/validation/test splitting with scaffold awareness
  • Hyperparameter tuning and cross-validation
  • Model interpretation and error analysis
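Scaffold-aware splitting starts by grouping molecules on their Bemis-Murcko scaffolds, so close ring analogues never straddle the train/test boundary; a minimal RDKit sketch over an illustrative four-molecule list:

```python
# Sketch: group molecules by Bemis-Murcko scaffold before splitting,
# so near-identical ring systems stay on the same side of the split.
from collections import defaultdict
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

smiles_list = ["c1ccccc1O", "c1ccccc1N", "CCO", "CCCO"]  # illustrative set

groups = defaultdict(list)
for smi in smiles_list:
    scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi)
    groups[scaffold].append(smi)

# Phenol and aniline share the benzene scaffold; the acyclic alcohols
# share the empty scaffold "" and stay together too.
print(dict(groups))
```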

🔗 Resources & Further Reading

RDKit Documentation: https://www.rdkit.org/docs/
PyTorch Geometric: https://pytorch-geometric.readthedocs.io/
DeepChem: https://deepchem.io/
OGB Molecular Benchmarks: https://ogb.stanford.edu/

Key Papers:

  • Veličković et al. (2018) — Graph Attention Networks
  • Rampášek et al. (2022) — GraphGPS
  • Dwivedi et al. (2021) — Benchmarking GNNs

πŸ“ License

This project is for educational purposes. Feel free to use, modify, and share with attribution.


Ready to start? Open Lesson 01: Building Graphs and begin your journey!
