🧬 Deep Learning for Detecting Local Adaptation vs. Drift

Using forward-time SLiM simulations and neural networks to distinguish local adaptation from genetic drift across spatial demographic scenarios.

📋 Overview

Distinguishing local adaptation from neutral genetic drift is a fundamental challenge in population genetics. This project implements a deep learning approach to detect signatures of spatially varying selection across different demographic scenarios.

Key Innovation: Rather than relying on traditional $Q_{ST}-F_{ST}$ comparisons, we train a neural network on spatially explicit forward-time simulations to learn the complex genomic and phenotypic signatures that differentiate adaptation from drift.

Results

Model Performance: 100% validation accuracy (3,000 simulations, 80/20 train/validation split)
Architecture: Multi-branch CNN with attention mechanisms and spatial gradient detection
Training Time: ~27 epochs to convergence

🎯 Problem Statement

Traditional methods for detecting local adaptation:

Rely on $Q_{ST}-F_{ST}$ comparisons
Assume simple demographic models (e.g., island model)
May fail under complex spatial structures or high migration
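
For reference, the comparison these methods rest on can be written as follows. This uses the standard textbook definitions for a diploid, additively inherited trait; the symbols $\sigma^2_B$ (additive genetic variance among populations), $\sigma^2_W$ (additive genetic variance within populations), and $p$ (allele frequency at a neutral locus) are introduced here for illustration and do not appear elsewhere in this repository.

```latex
Q_{ST} = \frac{\sigma^2_B}{\sigma^2_B + 2\,\sigma^2_W},
\qquad
F_{ST} = \frac{\operatorname{Var}(p)}{\bar{p}\,(1 - \bar{p})}
```

Under neutrality the two quantities are expected to be approximately equal, so $Q_{ST} > F_{ST}$ is read as evidence of divergent selection. That expectation depends on the demographic model, which is why the limitations listed above matter.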

Our approach:

Uses forward-time simulations with realistic demography
Integrates genomic (neutral & QTL) and phenotypic data
Explicitly models spatial gradients in phenotypes

🏗️ Project Structure

DeepLocalAdaptation/
├── data/
│   └── simulations/
│       └── simulation.slim       # SLiM forward-time simulation script
├── src/
│   ├── generate_data.py          # Simulation orchestration & data generation
│   ├── model_v2.py               # Neural network architecture
│   ├── train.py                  # Training script
│   ├── analyze_features.py       # Feature importance analysis
│   ├── debug_data.py             # Data quality diagnostics
│   └── visualization_data.py     # Visualization utilities
├── requirements.txt              # Python dependencies
├── MODEL_IMPROVEMENTS.md         # Architecture documentation
├── SLIM_FIXES.md                 # Simulation design notes
└── README.md                     # This file

🚀 Quick Start

Prerequisites

Python 3.8+
SLiM 4.0+ (download)
8+ GB RAM (for simulations)
Multi-core CPU recommended (uses 7 cores by default)

Installation

Clone repository:

git clone https://github.com/yourusername/DeepLocalAdaptation.git
cd DeepLocalAdaptation

Create virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Usage

1. Generate training data (3-5 hours with 500 simulations per class):

python src/generate_data.py

2. Train the model:

python src/train.py

3. Analyze model features:

python src/analyze_features.py

4. Debug data quality (optional):

python src/debug_data.py

🧬 Simulation Design

Demographic Scenarios (3 types)

Island Model - All demes exchange migrants equally (0.1% per generation)
Stepping Stone - Linear chain with nearest-neighbor migration (1% per generation)
Secondary Contact - Two isolated chains reconnect at generation 2000 (1% per generation)
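
The three scenarios can be pictured as migration matrices. The sketch below is illustrative only and is not taken from simulation.slim: the function names are hypothetical, the island rate is assumed to apply to every ordered pair of demes, and the secondary-contact scenario is assumed to consist of two chains of five demes each.

```python
from typing import List, Optional

N_DEMES = 10


def island_matrix(n: int = N_DEMES, m: float = 0.001) -> List[List[float]]:
    """Island model: every ordered pair of distinct demes exchanges migrants at rate m."""
    return [[0.0 if i == j else m for j in range(n)] for i in range(n)]


def stepping_stone_matrix(
    n: int = N_DEMES, m: float = 0.01, break_after: Optional[int] = None
) -> List[List[float]]:
    """Linear chain: deme i exchanges migrants only with demes i-1 and i+1.

    If break_after is given (0-based), the link between demes break_after
    and break_after + 1 is cut, leaving two isolated chains.
    """
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        if break_after is not None and i == break_after:
            continue  # no migration across the barrier
        mat[i][i + 1] = m
        mat[i + 1][i] = m
    return mat


def secondary_contact_matrix(
    generation: int, n: int = N_DEMES, m: float = 0.01, contact_gen: int = 2000
) -> List[List[float]]:
    """Two chains of n // 2 demes (an assumption) that reconnect at contact_gen."""
    if generation < contact_gen:
        return stepping_stone_matrix(n, m, break_after=n // 2 - 1)
    return stepping_stone_matrix(n, m)
```

For example, `secondary_contact_matrix(1500)[4][5]` is zero while the chains are isolated and becomes 0.01 from generation 2000 onward.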

Selection Regime (2 modes)

Drift (MODE=0): Neutral evolution, fitness = 1.0 for all individuals
Adaptation (MODE=1): Gaussian fitness landscape with spatial gradient
- Optimum ranges from -2.0 (Deme 1) to +2.0 (Deme 10)
- Fitness: exp(-(phenotype - optimum)² / (2w²)) where w=1.5
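
The two selection modes above can be sketched in Python as follows. This is a minimal illustration of the stated formula, not the SLiM code itself; the function names are hypothetical, and the optimum is assumed to be linearly spaced between Deme 1 and Deme 10.

```python
import math

N_DEMES = 10
W = 1.5  # width of the Gaussian fitness function, as stated above


def deme_optimum(deme: int, n_demes: int = N_DEMES, lo: float = -2.0, hi: float = 2.0) -> float:
    """Optimum for a 1-based deme index, assuming linear spacing from lo to hi."""
    return lo + (hi - lo) * (deme - 1) / (n_demes - 1)


def gaussian_fitness(phenotype: float, optimum: float, w: float = W) -> float:
    """exp(-(phenotype - optimum)^2 / (2 w^2))."""
    return math.exp(-((phenotype - optimum) ** 2) / (2 * w ** 2))


def fitness(phenotype: float, deme: int, mode: int) -> float:
    """MODE=0 (drift): fitness is 1.0 for everyone. MODE=1 (adaptation): Gaussian."""
    if mode == 0:
        return 1.0
    return gaussian_fitness(phenotype, deme_optimum(deme))
```

An individual sitting exactly one width (w = 1.5) away from its local optimum has fitness exp(-0.5), roughly 0.61, relative to an individual at the optimum.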

Genetic Architecture

10 demes × 200 individuals each (2,000 total)
20 QTLs (Quantitative Trait Loci) with additive effects
Phenotype = Σ(QTL effects) + environmental noise (σ=0.2)
Genome length: 100kb with recombination (r=1e-8)
Neutral mutations overlaid post-hoc via msprime (μ=1e-7)
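
As a rough illustration of the phenotype model above, the sketch below sums additive QTL effects and adds Gaussian environmental noise. The genotype encoding (derived-allele counts of 0, 1, or 2 per QTL) and the effect-size distribution are assumptions made for this example; the SLiM script may represent both differently.

```python
import random
from typing import Sequence

N_QTL = 20
ENV_SD = 0.2  # environmental noise (sigma), as stated above


def phenotype(
    genotype: Sequence[int],
    effects: Sequence[float],
    rng: random.Random,
    env_sd: float = ENV_SD,
) -> float:
    """Additive genetic value plus environmental noise.

    genotype: per-QTL count of derived alleles (0, 1, or 2) -- assumed encoding.
    effects:  per-QTL additive effect of one derived allele.
    """
    genetic_value = sum(g * a for g, a in zip(genotype, effects))
    return genetic_value + rng.gauss(0.0, env_sd)


rng = random.Random(42)
effects = [rng.gauss(0.0, 0.5) for _ in range(N_QTL)]  # hypothetical effect sizes
genotype = [rng.choice((0, 1, 2)) for _ in range(N_QTL)]
z = phenotype(genotype, effects, rng)
```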

Timeline

Generations 1-1000: Ancestral burn-in (single population of 2,000)
Generation 1000: Split into 10 demes + add 20 QTLs
Generations 1000-3000: Evolution under selection/migration
Generation 2000: Secondary contact (if SCENARIO=2)

🤖 Model Architecture

ImprovedLocalAdaptationClassifier (v2)

Multi-branch architecture combining:

Spatial Gradient Module
- Computes phenotype-geography correlations
- Key feature: distinguishes adaptation (strong gradient) from drift (no gradient)
- Outputs 6 spatial features
Neutral SNP Branch
- 1D Conv layers with attention pooling
- Learns patterns in neutral allele frequencies
- Outputs 64 features
QTL Frequency Branch
- 1D Conv layers with attention pooling
- Captures adaptive loci patterns
- Outputs 64 features
Phenotype Encoder
- Processes deme-wise phenotype statistics
- MLP: 10 demes → 64 → 32 features
- Outputs 32 features
Merged Classifier
- Concatenates all branches (166 total features)
- Deep MLP: 166 → 256 → 128 → 64 → 1
- Sigmoid output (probability of adaptation)
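
To make the spatial gradient idea concrete, here is a minimal sketch that computes two plausible gradient features: the Pearson correlation and the least-squares slope of deme-mean phenotype against deme position. Which six features the actual module outputs is not specified here, so these two are illustrative, and the function names are hypothetical. Individuals are assumed to be ordered by deme with equal deme sizes.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def deme_means(phenotypes: Sequence[float], n_demes: int = 10) -> List[float]:
    """Mean phenotype per deme, assuming individuals are ordered by deme."""
    size = len(phenotypes) // n_demes
    return [mean(phenotypes[d * size:(d + 1) * size]) for d in range(n_demes)]


def gradient_features(means: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson r, OLS slope) of deme mean against deme position 1..n."""
    x = list(range(1, len(means) + 1))
    mx, my = mean(x), mean(means)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, means))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in means)
    slope = sxy / sxx
    # With no variation among deme means the correlation is undefined; report 0.
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return r, slope
```

If deme means track the optimum exactly (from -2.0 to +2.0 across 10 demes), the correlation is 1 and the slope is 4/9 per deme; under drift both are expected to scatter around zero.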

Total Parameters: 123,363
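
The layer widths listed above allow a quick sanity check of the feature and parameter arithmetic. The sketch below counts only weights and biases of fully connected layers; it assumes plain linear layers with no normalization parameters, which may not match model_v2.py exactly.

```python
from typing import Sequence


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(widths: Sequence[int]) -> int:
    """Total parameters of a stack of fully connected layers."""
    return sum(linear_params(a, b) for a, b in zip(widths, widths[1:]))


branch_outputs = {"spatial": 6, "neutral": 64, "qtl": 64, "phenotype": 32}
merged_width = sum(branch_outputs.values())          # 6 + 64 + 64 + 32 = 166
head = mlp_params([merged_width, 256, 128, 64, 1])   # merged classifier
encoder = mlp_params([10, 64, 32])                   # phenotype encoder
print(merged_width, head, encoder)
```

Under these assumptions the merged classifier accounts for 83,969 parameters and the phenotype encoder for 2,784, which leaves the remainder of the reported total for the convolutional branches, attention pooling, the spatial module, and any normalization layers.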

See MODEL_IMPROVEMENTS.md for detailed architecture description.

📊 Results

Model Performance

| Dataset | Model | Accuracy | Notes |
|---|---|---|---|
| Weak signal (old) | v1 | 53.56% | Broken SLiM fitness function |
| Weak signal (old) | v2 | 54.17% | Improved architecture, bad data |
| Strong signal (fixed) | v2 | 100% | Fixed fitness + low migration |

Key Findings

Data quality is critical: Initial poor performance (54%) was due to weak selection in simulations, not model limitations
Phenotype gradients are key: The improved model explicitly computes phenotype-geography correlations, which separate adaptation from drift perfectly on the validation set
Fitness function matters:
- Wrong: 1.0 + 5.0 * dnorm() → weak selection
- Correct: exp(-(deviation²)/(2w²)) → strong selection
Migration-selection balance: Reduced migration (5% → 0.1-1%) allows local adaptation to manifest
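
The difference between the two fitness functions is easiest to see in relative terms. The sketch below compares them at several deviations from the optimum. The arguments to dnorm() are not given above, so the form dnorm(deviation, 0, w) is an assumption, and the function names are hypothetical.

```python
import math

W = 1.5


def dnorm(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Normal probability density, matching the usual dnorm(x, mean, sd) convention."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def fitness_old(deviation: float, w: float = W) -> float:
    """Assumed form of the original expression: 1.0 + 5.0 * dnorm(deviation, 0, w)."""
    return 1.0 + 5.0 * dnorm(deviation, 0.0, w)


def fitness_new(deviation: float, w: float = W) -> float:
    """Corrected form: exp(-(deviation^2) / (2 w^2))."""
    return math.exp(-(deviation ** 2) / (2 * w ** 2))


def relative_fitness(fn, deviation: float) -> float:
    """Fitness at a given deviation relative to an individual at the optimum."""
    return fn(deviation) / fn(0.0)


for dev in (1.0, 2.0, 4.0):
    print(dev, round(relative_fitness(fitness_old, dev), 3),
          round(relative_fitness(fitness_new, dev), 3))
```

Under this assumed form, the additive constant puts a floor of roughly 0.43 on relative fitness no matter how maladapted an individual is, whereas the corrected function drops toward zero, so an individual adapted to the opposite end of the gradient is strongly selected against.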

🔬 Technical Details

Input Data Format

Per simulation:

neutral_freqs: (500 SNPs, 10 demes) - neutral allele frequencies
qtl_freqs: (100 QTLs, 10 demes) - QTL allele frequencies (padded to 100 rows; simulations use 20 QTLs)
phenotypes: (2000 individuals) - quantitative trait values
labels: Binary (0=Drift, 1=Adaptation)
scenarios: Categorical (0=Island, 1=Stepping Stone, 2=Contact)
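
The padding of qtl_freqs can be sketched as below. The fill value of zero and the function name are assumptions for illustration; the actual padding scheme in generate_data.py may differ.

```python
from typing import List, Sequence

MAX_QTL = 100
N_DEMES = 10


def pad_qtl_freqs(
    freqs: Sequence[Sequence[float]],
    max_qtl: int = MAX_QTL,
    n_demes: int = N_DEMES,
    fill: float = 0.0,
) -> List[List[float]]:
    """Pad a (n_qtl, n_demes) frequency table to (max_qtl, n_demes) rows."""
    if len(freqs) > max_qtl:
        raise ValueError(f"expected at most {max_qtl} QTLs, got {len(freqs)}")
    for row in freqs:
        if len(row) != n_demes:
            raise ValueError(f"expected {n_demes} demes per QTL, got {len(row)}")
    padding = [[fill] * n_demes for _ in range(max_qtl - len(freqs))]
    return [list(row) for row in freqs] + padding


# 20 simulated QTLs become a fixed-size (100, 10) input.
padded = pad_qtl_freqs([[0.5] * N_DEMES for _ in range(20)])
```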

Training Configuration

Optimizer: Adam (lr=5e-4)
Loss: Binary Cross-Entropy
Batch size: 32
Early stopping: Patience=20 epochs
Train/Val split: 80/20
Device: CPU (GPU optional)
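
Two pieces of this configuration, the 80/20 split and early stopping with a patience of 20 epochs, are sketched below in framework-independent Python. Whether train.py monitors validation loss or another metric is not stated, so monitoring validation loss is an assumption, as are the names used here.

```python
import random
from typing import List, Tuple


def train_val_split(n: int, val_fraction: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into train and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = int(round(n * val_fraction))
    return idx[n_val:], idx[:n_val]


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 20) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


train_idx, val_idx = train_val_split(3000)
```

With 3,000 simulations this gives 2,400 training and 600 validation examples.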

Computational Requirements

Data Generation:

500 simulations/class × 6 classes (3 scenarios × 2 selection modes) = 3,000 simulations
~3-5 hours on 7 CPU cores
~2 GB disk space (compressed)
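
The simulation grid can be enumerated as in the sketch below, which builds one SLiM command line per replicate. The `-d` (define constant) and `-s` (seed) options are standard SLiM command-line flags, but whether generate_data.py passes MODE and SCENARIO this way, and how it assigns seeds, are assumptions. The commands are only constructed here, not executed.

```python
from itertools import product
from typing import Dict, List

SCENARIOS = {0: "island", 1: "stepping_stone", 2: "secondary_contact"}
MODES = {0: "drift", 1: "adaptation"}
SIMS_PER_CLASS = 500


def build_jobs(script: str = "data/simulations/simulation.slim") -> List[Dict]:
    """One job per (scenario, mode, replicate), each with a unique seed."""
    jobs: List[Dict] = []
    for scenario, mode, rep in product(SCENARIOS, MODES, range(SIMS_PER_CLASS)):
        seed = len(jobs) + 1  # hypothetical seeding scheme: running job index
        cmd = [
            "slim",
            "-s", str(seed),
            "-d", f"MODE={mode}",
            "-d", f"SCENARIO={scenario}",
            script,
        ]
        jobs.append({"scenario": scenario, "mode": mode, "rep": rep, "seed": seed, "cmd": cmd})
    return jobs


jobs = build_jobs()
print(len(jobs), "simulations")
```

Each job's `cmd` list could then be handed to a worker pool so that replicates run in parallel across the available cores.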

Model Training:

~5-10 minutes on CPU
~1-2 minutes on GPU
Converges in ~20-30 epochs

📚 Citation

If you use this code or approach in your research, please cite:

@software{DeepLocalAdaptation2026,
  author = {Your Name},
  title = {Deep Learning for Detecting Local Adaptation vs. Drift},
  year = {2026},
  url = {https://github.com/yourusername/DeepLocalAdaptation}
}

📖 References

SLiM: Haller & Messer (2019). SLiM 3: Forward Genetic Simulations Beyond the Wright–Fisher Model. Molecular Biology and Evolution.
Tree Sequences: Kelleher et al. (2016). Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Computational Biology.
QST-FST: Whitlock (2008). Evolutionary inference from QST. Molecular Ecology.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

SLiM development team for the forward-time simulation framework
tskit developers for efficient tree sequence tools
PyTorch community for the deep learning framework
