A progressive educational journey from basic cheminformatics to state-of-the-art Graph Neural Networks (GNNs) and Molecular Transformers. This series covers everything from representing molecules as graphs to predicting chemical properties using advanced deep learning architectures.
This series is designed for:
- Computational chemists looking to apply deep learning to molecular data
- ML engineers interested in graph neural networks with a chemistry application
- Drug discovery researchers wanting to build property prediction models
- Students with basic Python and chemistry knowledge
Prerequisites: Basic Python (loops, functions, data structures) and fundamental chemistry (molecular structure, bonds, functional groups). No prior experience with RDKit, graph theory, or deep learning is required; we teach everything from scratch.
The course is structured into 7 sequential notebooks, progressively building from foundations to production-ready models.
| Lesson | Title | Key Concepts | Time |
|---|---|---|---|
| 01 | Building Graphs | SMILES parsing, RDKit, Mol-to-Graph, Feature extraction | 45-60 min |
| 02 | Positional Encoding | Laplacian Eigenvectors, RWPE, Spectral Analysis | 60-75 min |
| 03 | GAT Model | Graph Attention Networks, Message Passing, Multi-head Attention | 75-90 min |
| 04 | Sparse Attention | Efficiency in Graph Transformers, Virtual Edges, Locality | 60-75 min |
| 05 | Full Graph Transformer | Global Self-Attention, Edge Features, Deep Architectures | 90-105 min |
| 06 | Advanced Graph Models | GraphGPS, E(3)-GNNs, Equivariance, Hybrid Architectures | 90-105 min |
| 07 | Modelling & Predictions | Property Prediction (ESOL, FreeSolv), Training Pipelines | 120-150 min |
Total Estimated Time: ~9-11 hours
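As a preview of Lesson 01's central step, converting a SMILES string into a graph, here is a minimal sketch. It assumes RDKit and NetworkX are installed (both are in the project dependencies); `mol_to_graph` is an illustrative helper name, not part of the course code:

```python
from rdkit import Chem
import networkx as nx

def mol_to_graph(smiles: str) -> nx.Graph:
    """Parse a SMILES string into a NetworkX graph (atoms = nodes, bonds = edges)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    g = nx.Graph()
    for atom in mol.GetAtoms():
        g.add_node(atom.GetIdx(), symbol=atom.GetSymbol(), degree=atom.GetDegree())
    for bond in mol.GetBonds():
        g.add_edge(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(),
                   order=bond.GetBondTypeAsDouble())
    return g

g = mol_to_graph("CCO")  # ethanol: two carbons and an oxygen (hydrogens implicit)
print(g.number_of_nodes(), g.number_of_edges())  # 3 nodes, 2 edges
```

Lesson 01 builds on this by attaching richer per-atom and per-bond feature vectors before handing the graph to a model.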
```
┌───────────────────────────────────────────────────────────┐
│ FOUNDATIONS (Lessons 01-02)                               │
│ • Molecular representations      • Feature extraction     │
│ • Graph structures               • Positional encodings   │
└───────────────────────────────────────────────────────────┘
                              ↓
┌───────────────────────────────────────────────────────────┐
│ ATTENTION MECHANISMS (Lessons 03-04)                      │
│ • Local attention (GAT)          • Sparse patterns        │
│ • Message passing                • Scalability            │
└───────────────────────────────────────────────────────────┘
                              ↓
┌───────────────────────────────────────────────────────────┐
│ ADVANCED ARCHITECTURES (Lessons 05-06)                    │
│ • Graph Transformers             • GraphGPS               │
│ • Global context                 • Equivariant networks   │
└───────────────────────────────────────────────────────────┘
                              ↓
┌───────────────────────────────────────────────────────────┐
│ APPLICATION (Lesson 07)                                   │
│ • Real datasets (ESOL, FreeSolv) • Model comparison       │
│ • Training pipelines             • Deployment             │
└───────────────────────────────────────────────────────────┘
```
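The positional encodings in the foundations stage come from spectral graph theory: eigenvectors of the graph Laplacian give each atom a coordinate-like signature, much as sine waves do for tokens in a text Transformer. A minimal numpy sketch (the 4-node path graph stands in for a short carbon chain; illustrative only, not the course code):

```python
import numpy as np

# Adjacency matrix of a 4-node path graph (a stand-in for a short carbon chain)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # combinatorial graph Laplacian

# eigh returns eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(L)

# A connected graph's smallest Laplacian eigenvalue is always 0; the
# remaining eigenvectors can serve as per-node positional encodings.
print(np.round(eigvals, 4))  # eigenvalues ≈ 0, 0.586, 2, 3.414
```

Lesson 02 covers why these eigenvectors capture graph structure, plus the random-walk alternative (RWPE).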
This project uses `pyproject.toml` for dependency management; we recommend uv for fast, reliable installs.
**Option 1: uv (recommended)**

```bash
# Clone the repository
git clone https://github.com/yourusername/ChemicalGraphSeries.git
cd ChemicalGraphSeries

# Sync environment and install all dependencies
uv sync

# Launch Jupyter
uv run jupyter notebook
```

**Option 2: pip**

```bash
# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install rdkit torch torch-geometric networkx matplotlib pandas jupyter py3dmol scipy

# Launch Jupyter
jupyter notebook
```

**Verify the installation**

```python
# Run this in a notebook cell to verify everything works
from rdkit import Chem
import torch
import torch_geometric
import networkx as nx

print(f"RDKit: {Chem.rdBase.rdkitVersion}")
print(f"PyTorch: {torch.__version__}")
print(f"PyTorch Geometric: {torch_geometric.__version__}")
print("✅ All dependencies installed successfully!")
```

**Project structure**

```
ChemicalGraphSeries/
├── notebooks/
│   ├── 01_Building_Graphs.ipynb            # Foundations: SMILES, RDKit, graphs
│   ├── 02_Positional_Encoding.ipynb        # Spectral graph theory & RWPE
│   ├── 03_GAT_Model.ipynb                  # Graph Attention Networks
│   ├── 04_Sparse Attention.ipynb           # Efficient attention patterns
│   ├── 05_Full_Graph_Transformer.ipynb     # Complete transformer architecture
│   ├── 06_Advanced_Graph_Models.ipynb      # GraphGPS, E(3)-GNNs
│   └── 07_Modelling_and_Predictions.ipynb  # Real-world applications
├── molGraph.png       # Visual for documentation
├── pyproject.toml     # Project dependencies
├── uv.lock            # Locked dependency versions
├── main.py            # Utility scripts
└── README.md          # This file
```
| Requirement | Version |
|---|---|
| Python | ≥ 3.13 |
| RDKit | latest |
| PyTorch | latest |
| PyTorch Geometric | latest |
| NetworkX | latest |
| matplotlib | latest |
| pandas | latest |
| py3Dmol | ≥ 2.5.3 |
| scipy | ≥ 1.16.3 |
By the end of this series, you will have:
- Molecular featurization pipelines: convert any SMILES string into ML-ready graph representations
- Custom GNN architectures: GATs, Graph Transformers, and hybrid models
- Property prediction models: trained on the ESOL (solubility) and FreeSolv (solvation free energy) benchmarks
- Interpretable AI: visualize attention weights to understand what your model "sees"
- Production-ready code: deployable models for real-world molecular property prediction
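All of the GNN architectures above share one primitive, message passing: each node updates its features by aggregating those of its neighbours. A toy mean-aggregation step in plain numpy (illustrative only; the lessons themselves use PyTorch Geometric):

```python
import numpy as np

# One message-passing step on a 3-node star graph with toy 2-d node features:
# each node replaces its features with the mean over itself and its neighbours.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # node 0 connected to nodes 1 and 2

A_hat = A + np.eye(3)                    # add self-loops so a node keeps its own signal
D_inv = np.diag(1.0 / A_hat.sum(axis=1)) # inverse degree matrix for mean aggregation

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])               # initial node features

X_next = D_inv @ A_hat @ X               # aggregated features after one step
print(np.round(X_next, 3))               # node 0 now blends in its neighbours' features
```

GATs (Lesson 03) replace the fixed `D_inv @ A_hat` weights with learned attention coefficients.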
Along the way, you will cover:

**Cheminformatics**
- SMILES and SMARTS notation
- Molecular visualization (2D, 3D, conformer ensembles)
- Substructure matching and pharmacophore identification

**Graph theory**
- Molecules as graphs (atoms = nodes, bonds = edges)
- Adjacency and Laplacian matrices
- Spectral graph theory and eigenvector decomposition

**Deep learning**
- Message passing neural networks
- Attention mechanisms (single-head, multi-head, sparse)
- Transformer architectures adapted for graphs
- Equivariant neural networks (E(3)-GNNs)

**ML practice**
- Feature engineering for molecular properties
- Train/validation/test splitting with scaffold awareness
- Hyperparameter tuning and cross-validation
- Model interpretation and error analysis
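Scaffold-aware splitting groups molecules by their Bemis-Murcko scaffold so that structurally similar compounds never straddle the train/test boundary, giving a more honest estimate of generalization. A minimal sketch assuming RDKit is installed (`scaffold_split` is a hypothetical helper; production code such as DeepChem's ScaffoldSplitter handles edge cases this ignores):

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, test_frac=0.2):
    """Split indices by Bemis-Murcko scaffold; no scaffold spans both splits."""
    groups = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        # Acyclic molecules get the empty scaffold and are grouped together
        groups[MurckoScaffold.MurckoScaffoldSmiles(smiles=smi)].append(i)
    train, test = [], []
    n_train = int(len(smiles_list) * (1 - test_frac))
    # Fill train with the largest scaffold groups first (a common heuristic)
    for idxs in sorted(groups.values(), key=len, reverse=True):
        (train if len(train) < n_train else test).extend(idxs)
    return train, test

train_idx, test_idx = scaffold_split(
    ["c1ccccc1O", "c1ccccc1N", "CCO", "CCC", "C1CCCCC1"], test_frac=0.4
)
print(train_idx, test_idx)  # the lone cyclohexane lands in the test set
```

Lesson 07 compares this against a random split on ESOL and FreeSolv, where random splitting typically overestimates model performance.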
RDKit Documentation: https://www.rdkit.org/docs/
PyTorch Geometric: https://pytorch-geometric.readthedocs.io/
DeepChem: https://deepchem.io/
OGB Molecular Benchmarks: https://ogb.stanford.edu/
Key Papers:
- Veličković et al. (2018) – Graph Attention Networks
- Rampášek et al. (2022) – GraphGPS
- Dwivedi et al. (2021) – Benchmarking GNNs
This project is for educational purposes. Feel free to use, modify, and share with attribution.
Ready to start? Open Lesson 01: Building Graphs and begin your journey!
