Using forward-time SLiM simulations and neural networks to distinguish local adaptation from genetic drift across spatial demographic scenarios.
Distinguishing local adaptation from neutral genetic drift is a fundamental challenge in population genetics. This project implements a deep learning approach to detect signatures of spatially varying selection across different demographic scenarios.
Key Innovation: Rather than relying on traditional
Model Performance: 100% validation accuracy (3,000 simulations, 80/20 train/validation split)
Architecture: Multi-branch CNN with attention mechanisms and spatial gradient detection
Training Time: ~27 epochs to convergence
Traditional methods for detecting local adaptation:
- Rely on $Q_{ST}-F_{ST}$ comparisons
- Assume simple demographic models (e.g., island model)
- May fail under complex spatial structures or high migration
Our approach:
- Uses forward-time simulations with realistic demography
- Integrates genomic (neutral & QTL) and phenotypic data
- Explicitly models spatial gradients in phenotypes
DeepLocalAdaptation/
├── data/
│   ├── simulations/
│   └── simulation.slim         # SLiM forward-time simulation script
├── src/
│   ├── generate_data.py        # Simulation orchestration & data generation
│   ├── model_v2.py             # Neural network architecture
│   ├── train.py                # Training script
│   ├── analyze_features.py     # Feature importance analysis
│   ├── debug_data.py           # Data quality diagnostics
│   └── visualization_data.py   # Visualization utilities
├── requirements.txt            # Python dependencies
├── MODEL_IMPROVEMENTS.md       # Architecture documentation
├── SLIM_FIXES.md               # Simulation design notes
└── README.md                   # This file
- Python 3.8+
- SLiM 4.0+ (download)
- 8+ GB RAM (for simulations)
- Multi-core CPU recommended (uses 7 cores by default)
- Clone repository:

      git clone https://github.com/yourusername/DeepLocalAdaptation.git
      cd DeepLocalAdaptation

- Create virtual environment:

      python -m venv .venv
      source .venv/bin/activate  # On Windows: .venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

1. Generate training data (3-5 hours with 500 simulations per class):

       python src/generate_data.py

2. Train the model:

       python src/train.py

3. Analyze model features:

       python src/analyze_features.py

4. Debug data quality (optional):

       python src/debug_data.py

- Island Model - All demes exchange migrants equally (0.1% per generation)
- Stepping Stone - Linear chain with nearest-neighbor migration (1%)
- Secondary Contact - Two isolated chains reconnect at generation 2000 (1%)
- Drift (MODE=0): Neutral evolution, fitness = 1.0 for all individuals
- Adaptation (MODE=1): Gaussian fitness landscape with spatial gradient
  - Optimum ranges from -2.0 (Deme 1) to +2.0 (Deme 10)
  - Fitness: `exp(-(phenotype - optimum)² / (2w²))`, where w = 1.5
- 10 demes × 200 individuals each (2,000 total)
- 20 QTLs (Quantitative Trait Loci) with additive effects
- Phenotype = Σ(QTL effects) + environmental noise (σ=0.2)
- Genome length: 100 kb with recombination (r=1e-8)
- Neutral mutations overlaid post-hoc via msprime (μ=1e-7)
- Generations 1-1000: Ancestral burn-in (single population of 2,000)
- Generation 1000: Split into 10 demes + add 20 QTLs
- Generations 1000-3000: Evolution under selection/migration
- Generation 2000: Secondary contact (if SCENARIO=2)
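The three demographic scenarios listed above differ only in who exchanges migrants with whom, which can be sketched as migration-rate matrices. This is a minimal illustration, not the logic in `simulation.slim`: the function names are hypothetical, the island rate is assumed to apply per pair of demes, and the secondary-contact chains are assumed to be demes 1-5 and 6-10.

```python
N_DEMES = 10

def island(n=N_DEMES, m=0.001):
    """Island model: every pair of distinct demes exchanges migrants at rate m."""
    return [[0.0 if i == j else m for j in range(n)] for i in range(n)]

def stepping_stone(n=N_DEMES, m=0.01, blocked=()):
    """Linear chain: migration only between neighbours i and i + 1.

    `blocked` holds the left-hand indices i of links (i, i + 1) that are closed.
    """
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        if i in blocked:
            continue
        rates[i][i + 1] = m
        rates[i + 1][i] = m
    return rates

def secondary_contact(generation, n=N_DEMES, m=0.01, contact_gen=2000):
    """Two half-chains (assumed split in the middle) that reconnect at contact_gen."""
    middle_link = n // 2 - 1  # link between 0-based demes 4 and 5
    blocked = (middle_link,) if generation < contact_gen else ()
    return stepping_stone(n, m, blocked)

if __name__ == "__main__":
    before = secondary_contact(1999)
    after = secondary_contact(2000)
    print("link 5<->6 before contact:", before[4][5])
    print("link 5<->6 after contact: ", after[4][5])
```

Entry `[i][j]` is the per-generation migration rate between deme `i + 1` and deme `j + 1`; the matrices are symmetric because the scenario descriptions do not mention directional migration.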
Multi-branch architecture combining:
- Spatial Gradient Module
  - Computes phenotype-geography correlations
  - Key feature: distinguishes adaptation (strong gradient) from drift (no gradient)
  - Outputs 6 spatial features
- Neutral SNP Branch
  - 1D Conv layers with attention pooling
  - Learns patterns in neutral allele frequencies
  - Outputs 64 features
- QTL Frequency Branch
  - 1D Conv layers with attention pooling
  - Captures adaptive loci patterns
  - Outputs 64 features
- Phenotype Encoder
  - Processes deme-wise phenotype statistics
  - MLP: 10 demes → 64 → 32 features
  - Outputs 32 features
- Merged Classifier
  - Concatenates all branches (6 + 64 + 64 + 32 = 166 total features)
  - Deep MLP: 166 → 256 → 128 → 64 → 1
  - Sigmoid output (probability of adaptation)
Total Parameters: 123,363
See MODEL_IMPROVEMENTS.md for detailed architecture description.
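To make the idea behind the Spatial Gradient Module concrete, the sketch below computes two phenotype-geography statistics in plain Python: the Pearson correlation and the least-squares slope of deme mean phenotype against deme index. It is an illustration under assumptions, not the code in `model_v2.py`: the function names are hypothetical, individuals are assumed to be stored deme by deme (200 per deme), and the six features the real module outputs are not specified here.

```python
from statistics import mean

def deme_means(phenotypes, n_demes=10):
    """Mean phenotype per deme, assuming individuals are ordered deme by deme."""
    size = len(phenotypes) // n_demes
    return [mean(phenotypes[d * size:(d + 1) * size]) for d in range(n_demes)]

def gradient_features(means):
    """Return (correlation, slope) of deme mean phenotype vs. deme index."""
    xs = list(range(len(means)))
    mx, my = mean(xs), mean(means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, means))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in means)
    slope = sxy / sxx
    # With no variation among deme means the correlation is undefined; report 0.
    corr = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return corr, slope

if __name__ == "__main__":
    # Toy input: each deme's individuals sit exactly on a linear cline.
    cline = [0.5 * d - 2.0 for d in range(10) for _ in range(200)]
    print(gradient_features(deme_means(cline)))
```

Under adaptation to a spatially varying optimum, the correlation is expected to be large in magnitude; under drift, deme means wander independently of position and the correlation should be closer to zero.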
| Dataset | Model | Accuracy | Notes |
|---|---|---|---|
| Weak signal (old) | v1 | 53.56% | Broken SLiM fitness function |
| Weak signal (old) | v2 | 54.17% | Improved architecture, bad data |
| Strong signal (fixed) | v2 | 100% | Fixed fitness + low migration |
- Data quality is critical: Initial poor performance (54%) was due to weak selection in simulations, not model limitations
- Phenotype gradients are key: The improved model explicitly computes phenotype-geography correlations, which perfectly distinguish adaptation from drift
- Fitness function matters:
  - Wrong: `1.0 + 5.0 * dnorm()` → weak selection
  - Correct: `exp(-(deviation²)/(2w²))` → strong selection
- Migration-selection balance: Reduced migration (5% → 0.1-1%) allows local adaptation to manifest
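One way to see why the two fitness forms behave so differently is to compare fitness relative to a perfectly adapted individual. The sketch below is an illustration only: the arguments of `dnorm()` are not given above, so a mean of 0 and a standard deviation of w = 1.5 are assumed, and the optimum is assumed to change linearly from deme 1 to deme 10. All function names are hypothetical.

```python
import math

W = 1.5  # selection width w from the fitness formula

def optimum(deme, n_demes=10, lo=-2.0, hi=2.0):
    """Phenotypic optimum for a 1-based deme index (linear spacing is assumed)."""
    return lo + (hi - lo) * (deme - 1) / (n_demes - 1)

def dnorm(x, mu=0.0, sd=1.0):
    """Normal probability density, written out to avoid extra dependencies."""
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def gaussian_fitness(deviation, w=W):
    """The corrected form: exp(-(deviation^2) / (2 w^2))."""
    return math.exp(-(deviation ** 2) / (2 * w ** 2))

def offset_fitness(deviation, w=W):
    """Assumed shape of the old form: a baseline of 1.0 plus a scaled density."""
    return 1.0 + 5.0 * dnorm(deviation, 0.0, w)

def relative_fitness(fitness_fn, deviation):
    """Fitness at a given deviation divided by fitness at the optimum."""
    return fitness_fn(deviation) / fitness_fn(0.0)

if __name__ == "__main__":
    # An individual adapted to deme 1 that ends up in deme 10.
    mismatch = optimum(10) - optimum(1)
    print("mismatch:", mismatch)
    print("gaussian:", round(relative_fitness(gaussian_fitness, mismatch), 3))
    print("offset:  ", round(relative_fitness(offset_fitness, mismatch), 3))
```

Under these assumptions the constant baseline of 1.0 puts a floor under fitness: a maximally mismatched individual keeps roughly 45% of peak fitness with the offset form, compared with roughly 3% with the Gaussian form, which is consistent with the weak signal reported for the old simulations.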
Per simulation:
- `neutral_freqs`: (500 SNPs, 10 demes) - neutral allele frequencies
- `qtl_freqs`: (100 QTLs, 10 demes) - QTL allele frequencies (padded)
- `phenotypes`: (2000 individuals) - quantitative trait values
- `labels`: Binary (0=Drift, 1=Adaptation)
- `scenarios`: Categorical (0=Island, 1=Stepping Stone, 2=Contact)
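The per-simulation layout can be sketched as a small record builder that pads the QTL frequencies up to the fixed 100 rows and checks the other shapes. This is a hedged illustration rather than the code in `generate_data.py`: `pad_rows` and `make_sample` are hypothetical names, and zero-filling the padded rows is an assumption, since the padding value is not stated here.

```python
N_DEMES = 10

def pad_rows(rows, n_rows, n_demes=N_DEMES, fill=0.0):
    """Copy per-locus frequency rows and append filler rows up to n_rows."""
    if len(rows) > n_rows:
        raise ValueError(f"expected at most {n_rows} rows, got {len(rows)}")
    padded = [list(r) for r in rows]
    padded += [[fill] * n_demes for _ in range(n_rows - len(padded))]
    return padded

def make_sample(neutral_freqs, qtl_freqs, phenotypes, label, scenario):
    """Assemble one simulation record with the shapes listed above."""
    if len(neutral_freqs) != 500:
        raise ValueError("neutral_freqs must have 500 SNP rows")
    if len(phenotypes) != 2000:
        raise ValueError("phenotypes must have 2000 entries")
    if label not in (0, 1) or scenario not in (0, 1, 2):
        raise ValueError("label must be 0/1 and scenario 0/1/2")
    return {
        "neutral_freqs": [list(r) for r in neutral_freqs],
        "qtl_freqs": pad_rows(qtl_freqs, 100),  # e.g. 20 real QTL rows + 80 filler
        "phenotypes": list(phenotypes),
        "label": label,
        "scenario": scenario,
    }

if __name__ == "__main__":
    sample = make_sample(
        neutral_freqs=[[0.5] * N_DEMES for _ in range(500)],
        qtl_freqs=[[0.1] * N_DEMES for _ in range(20)],
        phenotypes=[0.0] * 2000,
        label=1,
        scenario=2,
    )
    print(len(sample["qtl_freqs"]), len(sample["qtl_freqs"][0]))
```

Padding to a fixed row count keeps every simulation the same shape regardless of how many QTLs are still segregating, which is what lets samples be stacked into batches.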
- Optimizer: Adam (lr=5e-4)
- Loss: Binary Cross-Entropy
- Batch size: 32
- Early stopping: Patience=20 epochs
- Train/Val split: 80/20
- Device: CPU (GPU optional)
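The early-stopping rule in the configuration above (patience of 20 epochs) can be sketched independently of any deep learning framework. This is a minimal illustration, not the loop in `train.py`: `fit_with_early_stopping`, `run_epoch`, and the `max_epochs=200` cap are hypothetical, and `run_epoch` stands in for one pass of training plus validation that returns the validation loss.

```python
def fit_with_early_stopping(run_epoch, max_epochs=200, patience=20):
    """Call run_epoch(epoch) until validation loss stalls for `patience` epochs.

    Returns (best_epoch, best_loss, last_epoch_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    for epoch in range(max_epochs):
        last_epoch = epoch
        val_loss = run_epoch(epoch)
        if val_loss < best_loss:
            # New best: remember it (a real loop would also checkpoint weights here).
            best_loss, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop.
            break
    return best_epoch, best_loss, last_epoch

if __name__ == "__main__":
    # Toy loss curve: improves through epoch 6, then plateaus at a worse value.
    def fake_epoch(epoch):
        return 1.0 / (epoch + 1) if epoch <= 6 else 0.5

    print(fit_with_early_stopping(fake_epoch))
```

With a patience of 20, training always runs at least 20 epochs past the best one, so a run reported as converging around epoch 27 would have found its best validation loss within the first few epochs if it was stopped by this rule.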
Data Generation:
- 500 simulations/class × 6 classes = 3,000 simulations
- ~3-5 hours on 7 CPU cores
- ~2 GB disk space (compressed)
Model Training:
- ~5-10 minutes on CPU
- ~1-2 minutes on GPU
- Converges in ~20-30 epochs
If you use this code or approach in your research, please cite:
@software{DeepLocalAdaptation2026,
  author = {Your Name},
  title = {Deep Learning for Detecting Local Adaptation vs. Drift},
  year = {2026},
  url = {https://github.com/yourusername/DeepLocalAdaptation}
}

- SLiM: Haller & Messer (2019). SLiM 3: Forward Genetic Simulations Beyond the Wright–Fisher Model. Molecular Biology and Evolution.
- Tree Sequences: Kelleher et al. (2016). Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Computational Biology.
- QST-FST: Whitlock (2008). Evolutionary inference from QST. Molecular Ecology.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- SLiM development team for the forward-time simulation framework
- tskit developers for efficient tree sequence tools
- PyTorch community for the deep learning framework