Official implementation of 'Engineered MgO Nanoparticles with Tunable Electronic Signatures for Energy Applications' by Mustafa Kurban, Can Polat, Erchin Serpedin, Hasan Kurban.
- Multimodal Architecture: Main model combining geometric (GNN) and image (CNN) representations with bandgap estimates
- Benchmark Comparison: Compare against four GNN architectures (SchNet, FAENet, GotenNet, EGNN) and a DOS analysis agent
- Automated Training: Streamlined training pipeline with early stopping
- Performance Tracking: Timing, epoch tracking, and detailed metrics
- Organized Results: Structured output with model-specific directories
- Agent-Guided DOS Analysis: Intelligent parameter optimization using LLMs
```
DopeAgent/
├── train_multimodal.py        # Main multimodal model training
├── multimodal_dataloader.py   # Multimodal data loading utilities
├── run_benchmark.py           # Benchmark orchestration script
├── train_models.py            # GNN benchmark training utilities
├── models.py                  # All model architectures (multimodal + benchmarks)
├── benchmark_config.py        # Centralized configuration
├── agent.py                   # DOS analysis agent class (LLM-guided)
├── agent_search.py            # Intelligent agent-guided DOS analysis
├── requirements.txt           # All dependencies
├── results/                   # Organized results directory
│   ├── multimodal/            # Main multimodal model results
│   ├── schnet/                # SchNet benchmark results
│   ├── faenet/                # FAENet benchmark results
│   ├── gotennet/              # GotenNet benchmark results
│   ├── egnn/                  # EGNN benchmark results
│   └── model_comparison_results.csv
└── data/                      # Dataset directory
    ├── xyz/                   # Geometric data (XYZ files)
    ├── png/                   # Image data (PNG files)
    └── dos/                   # DOS data (DAT files)
```
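The geometric inputs use the plain-text XYZ format. As a rough, dependency-free sketch of what those files contain (the actual pipeline loads them via its own dataloaders; `read_xyz` below is an illustrative helper, not part of the repo):

```python
def read_xyz(path):
    """Minimal XYZ reader: first line is the atom count, second a comment,
    then one `symbol x y z` row per atom."""
    with open(path) as f:
        n = int(f.readline())
        comment = f.readline().strip()
        atoms = []
        for _ in range(n):
            sym, x, y, z = f.readline().split()[:4]
            atoms.append((sym, float(x), float(y), float(z)))
    return comment, atoms
```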
- Multimodal Regressor - Primary model combining:
- Geometric Branch: SchNet (16D embeddings) for XYZ molecular data
- Image Branch: ResNet-18 (512D embeddings) for PNG image data
- Bandgap Estimate: Base prediction from DOS analysis
- Fusion: Additive boost mechanism for final prediction
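A minimal sketch of the additive boost fusion, assuming simple linear heads over the branch embeddings (the actual heads in `models.py` may be deeper; `linear_head` and `predict` are illustrative names, not repo functions):

```python
import random

random.seed(0)

def linear_head(dim):
    """Hypothetical single-layer head mapping an embedding to a scalar boost."""
    weights = [random.gauss(0.0, 0.01) for _ in range(dim)]
    bias = 0.0
    return lambda emb: sum(w * x for w, x in zip(weights, emb)) + bias

# Embedding sizes from this README: SchNet is 16-D, ResNet-18 is 512-D
gnn_head = linear_head(16)
image_head = linear_head(512)

def predict(bandgap_estimate, gnn_embedding, image_embedding):
    # Additive boost: each branch contributes a correction to the DOS-based base estimate
    gnn_boost = gnn_head(gnn_embedding)
    image_boost = image_head(image_embedding)
    return bandgap_estimate + gnn_boost + image_boost, gnn_boost, image_boost
```

Keeping the two boosts separate is what makes the design interpretable: each modality's correction to the base bandgap estimate can be reported on its own.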
- SchNet - PyTorch Geometric's SchNet for molecular property prediction
- FAENet - Fast Atomic Environment Network with physics-aware embeddings
- GotenNet - Graph Transformer Network with attention mechanisms
- EGNN - E(n) Equivariant Graph Neural Network for 3D structures
- DoSAgent - LLM-guided DOS analysis for bandgap prediction
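For intuition, a crude bandgap estimate can be read off a DOS curve as the width of the near-zero window around the Fermi level. This thresholding rule and the function below are illustrative assumptions, not the DoSAgent implementation (whose parameters are instead tuned by an LLM):

```python
def estimate_bandgap(energies, dos, fermi=0.0, threshold=1e-3):
    """Crude gap estimate: distance between the highest occupied and lowest
    unoccupied energies with non-negligible DOS around the Fermi level."""
    # Valence band maximum: highest energy <= fermi with DOS above threshold
    vbm = max((e for e, d in zip(energies, dos)
               if e <= fermi and d >= threshold), default=None)
    # Conduction band minimum: lowest energy >= fermi with DOS above threshold
    cbm = min((e for e, d in zip(energies, dos)
               if e >= fermi and d >= threshold), default=None)
    if vbm is None or cbm is None:
        return None
    return max(0.0, cbm - vbm)
```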
```bash
# Train multimodal model (main architecture)
python train_multimodal.py --series R8
python train_multimodal.py --series R9

# Train with custom parameters
python train_multimodal.py --series R8 --epochs 1000 --lr 0.001
```

```bash
# Run full benchmark (all GNN models, both series)
python run_benchmark.py

# Run on specific series only
python run_benchmark.py --series R8
python run_benchmark.py --series R9

# Skip plots (faster)
python run_benchmark.py --no-plots

# View configuration only
python run_benchmark.py --config-only
```

```bash
# Use AI agent to intelligently explore parameter space
python agent_search.py --series R8 --max-iterations 15
python agent_search.py --series R9 --max-iterations 10

# Use a different language model
python agent_search.py --series R8 --model llama3.1
```

Multimodal model outputs:
- Main Results: `results/multimodal/{series}_results.csv`
- Detailed Results: `results/multimodal/{series}_detailed_results.csv`
- Modality Contributions: Individual GNN and image feature contributions
- Boost Values: GNN and image boost values for interpretability

Benchmark outputs:
- Organized Results: `results/{model_name}/{series}_results.csv`
- Detailed Results: `results/{model_name}/{series}_detailed_results.csv`
- Comparison Table: `results/model_comparison_results.csv`
- Performance Metrics: MAE, timing, epochs, early stopping status
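The comparison table can be inspected with the stdlib `csv` module; the `mae` column name below is an assumption, so check it against the actual CSV header:

```python
import csv

def best_model(csv_path, metric="mae"):
    """Return the row with the lowest value in `metric` (assumed column name)
    from the model comparison table."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return min(rows, key=lambda r: float(r[metric]))
```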
Edit `benchmark_config.py` to modify:
- Training parameters (batch size, epochs, learning rate)
- Model architectures and hyperparameters
- Data paths and series selection
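For orientation, a hypothetical layout of such settings (the names and values below are illustrative, not the actual contents of `benchmark_config.py`):

```python
# Illustrative config layout only; see the real benchmark_config.py for actual names
BATCH_SIZE = 32
EPOCHS = 1000
LEARNING_RATE = 1e-3
EARLY_STOPPING_PATIENCE = 50   # epochs without improvement before stopping
SERIES = ["R8", "R9"]          # nanoparticle series to run
DATA_DIRS = {"xyz": "data/xyz", "png": "data/png", "dos": "data/dos"}
```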
The benchmark tracks:
- Mean Absolute Error (MAE) for each model
- Standard deviation of errors
- Error ranges (min/max)
- Training time per model
- Final epoch reached (with early stopping)
- Best performing model identification
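These summary statistics can be sketched with the stdlib `statistics` module; `error_summary` is an illustrative helper, not repo code:

```python
from statistics import mean, stdev

def error_summary(y_true, y_pred):
    """Absolute-error summary: MAE, sample std. dev., and min/max range."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return {
        "mae": mean(errors),
        "std": stdev(errors),
        "min": min(errors),
        "max": max(errors),
    }
```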
```bash
pip install -r requirements.txt
```

- Core ML/DL: PyTorch, PyTorch Geometric, torch-scatter
- GNN Models: FAENet, GotenNet, EGNN-PyTorch
- Data Processing: pandas, numpy, scipy, scikit-learn
- Visualization: matplotlib, seaborn
- Utilities: tqdm, ASE (Atomic Simulation Environment)
- AI/LLM: langchain, langchain-ollama
- Multi-Modal Fusion: Combines geometric (XYZ), image (PNG), and DOS data
- Additive Boost Mechanism: GNN and image branches provide corrections to base estimates
- Interpretable Design: Separate boost values for each modality
- Robust Training: Leave-one-out cross-validation with early stopping
- Comprehensive Evaluation: Detailed performance metrics and modality contributions
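The leave-one-out protocol can be sketched generically; `train_fn` and `eval_fn` are placeholders for the real training and evaluation routines:

```python
def leave_one_out(samples, train_fn, eval_fn):
    """Leave-one-out CV: train on all but one sample, score the held-out one."""
    errors = []
    for i in range(len(samples)):
        held_out = samples[i]
        train_set = samples[:i] + samples[i + 1:]
        model = train_fn(train_set)
        errors.append(eval_fn(model, held_out))
    return errors
```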
- Centralized Configuration: All parameters in `benchmark_config.py`
- Modular Design: Easy to add new models or modify existing ones
- Robust Evaluation: Leave-one-out cross-validation with timing
- Organized Output: Clean directory structure for results
- Comprehensive Logging: Detailed performance metrics and comparisons
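Per-model timing can be captured with a small wrapper around any training call; `timed_fit` is an illustrative helper, not repo code:

```python
import time

def timed_fit(train_fn, *args, **kwargs):
    """Run a training callable and return (model, wall-clock seconds)."""
    start = time.perf_counter()
    model = train_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return model, elapsed
```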
- Intelligent Agent Search: AI-guided parameter optimization using language models
- Comprehensive Parameter Space: Tests thousands of parameter combinations
- Cross-Validation: Leave-one-out validation for robust evaluation
- Detailed Logging: Track all combinations and agent reasoning
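The agent search loop can be sketched with a pluggable proposer (in the repo, an LLM via langchain-ollama plays that role); all names below are illustrative assumptions:

```python
def agent_search(propose, evaluate, max_iterations=10):
    """Iterative agent-guided search: `propose` suggests parameters given the
    history of (params, score) pairs; `evaluate` returns a validation error.
    Tracks the best (lowest-error) combination seen."""
    history = []
    best = None
    for _ in range(max_iterations):
        params = propose(history)
        score = evaluate(params)
        history.append((params, score))
        if best is None or score < best[1]:
            best = (params, score)
    return best, history
```

With a deterministic proposer in place of the LLM, the loop behaves like any budgeted search over parameter combinations.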