JikaelN/DeepLocalAdaptation

🧬 Deep Learning for Detecting Local Adaptation vs. Drift

Using forward-time SLiM simulations and neural networks to distinguish local adaptation from genetic drift across spatial demographic scenarios.

Python 3.8+ | PyTorch | SLiM 4 | License: MIT

📋 Overview

Distinguishing local adaptation from neutral genetic drift is a fundamental challenge in population genetics. This project implements a deep learning approach to detect signatures of spatially varying selection across different demographic scenarios.

Key Innovation: Rather than relying on traditional $Q_{ST}-F_{ST}$ comparisons, we train a neural network on spatially explicit forward-time simulations to learn the complex genomic and phenotypic signatures that differentiate adaptation from drift.

Results

Model Performance: 100% validation accuracy (3,000 simulations, 80/20 train/validation split)
Architecture: Multi-branch CNN with attention mechanisms and spatial gradient detection
Training Time: ~27 epochs to convergence


🎯 Problem Statement

Traditional methods for detecting local adaptation:

  • Rely on $Q_{ST}-F_{ST}$ comparisons
  • Assume simple demographic models (e.g., island model)
  • May fail under complex spatial structures or high migration
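For reference, the comparison these methods rest on can be written out as follows. This is the standard textbook form for a diploid, additive trait, not something taken from this repository; $\sigma^2_B$ and $\sigma^2_W$ denote the additive genetic variance among and within populations, and $p$ is the allele frequency at a neutral locus across demes.

$$
Q_{ST} = \frac{\sigma^2_B}{\sigma^2_B + 2\,\sigma^2_W},
\qquad
F_{ST} = \frac{\operatorname{Var}(p)}{\bar{p}\,(1 - \bar{p})}
$$

Under neutrality the expected $Q_{ST}$ of an additive trait is approximately equal to $F_{ST}$, so $Q_{ST} > F_{ST}$ is read as evidence of divergent selection. That expectation is derived under simple demographic models, which is why the test can mislead under the spatial structures listed above.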

Our approach:

  • Uses forward-time simulations with realistic demography
  • Integrates genomic (neutral & QTL) and phenotypic data
  • Explicitly models spatial gradients in phenotypes

🏗️ Project Structure

DeepLocalAdaptation/
├── data/
│   └── simulations/
│       └── simulation.slim       # SLiM forward-time simulation script
├── src/
│   ├── generate_data.py          # Simulation orchestration & data generation
│   ├── model_v2.py               # Neural network architecture
│   ├── train.py                  # Training script
│   ├── analyze_features.py       # Feature importance analysis
│   ├── debug_data.py             # Data quality diagnostics
│   └── visualization_data.py     # Visualization utilities
├── requirements.txt              # Python dependencies
├── MODEL_IMPROVEMENTS.md         # Architecture documentation
├── SLIM_FIXES.md                 # Simulation design notes
└── README.md                     # This file

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • SLiM 4.0+ (download)
  • 8+ GB RAM (for simulations)
  • Multi-core CPU recommended (uses 7 cores by default)

Installation

  1. Clone repository:
git clone https://github.com/JikaelN/DeepLocalAdaptation.git
cd DeepLocalAdaptation
  2. Create virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

1. Generate training data (3-5 hours with 500 simulations per class):

python src/generate_data.py

2. Train the model:

python src/train.py

3. Analyze model features:

python src/analyze_features.py

4. Debug data quality (optional):

python src/debug_data.py

🧬 Simulation Design

Demographic Scenarios (3 types)

  1. Island Model - All demes exchange migrants equally (0.1% per generation)
  2. Stepping Stone - Linear chain with nearest-neighbor migration (1%)
  3. Secondary Contact - Two isolated chains reconnect at generation 2000 (1%)
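The three scenarios can be pictured as migration matrices. The sketch below is illustrative only and is not taken from `simulation.slim`. It assumes that the island rate is the total immigrant fraction per deme, split equally among the other demes; that the stepping-stone rate applies per neighbouring pair; and that secondary contact consists of two chains of five demes each. All function names are hypothetical.

```python
def island_matrix(n, m):
    """M[i][j] = fraction of deme i replaced by migrants from deme j.

    Assumption: m is the total immigration rate, shared by the n - 1 sources.
    """
    return [[0.0 if i == j else m / (n - 1) for j in range(n)] for i in range(n)]


def stepping_stone_matrix(n, m):
    """Linear chain: each deme receives a fraction m from each adjacent deme."""
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                matrix[i][j] = m
    return matrix


def secondary_contact_matrix(n, m, generation, contact_gen=2000):
    """Two chains that exchange no migrants until contact_gen.

    Assumption: the chains are the two halves of a single line of n demes.
    """
    matrix = stepping_stone_matrix(n, m)
    if generation < contact_gen:
        half = n // 2
        matrix[half - 1][half] = 0.0  # cut the link between the two chains
        matrix[half][half - 1] = 0.0
    return matrix


island = island_matrix(10, 0.001)
chain = stepping_stone_matrix(10, 0.01)
before = secondary_contact_matrix(10, 0.01, generation=1500)
after = secondary_contact_matrix(10, 0.01, generation=2500)
```

Before generation 2000 the two halves of the chain are isolated in this sketch; afterwards the matrix is identical to the stepping-stone one.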

Selection Regime (2 modes)

  • Drift (MODE=0): Neutral evolution, fitness = 1.0 for all individuals
  • Adaptation (MODE=1): Gaussian fitness landscape with spatial gradient
    • Optimum ranges from -2.0 (Deme 1) to +2.0 (Deme 10)
    • Fitness: exp(-(phenotype - optimum)² / (2w²)) where w = 1.5
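A minimal Python sketch of the two selection modes follows. The Gaussian fitness form and w = 1.5 come from the description above; the linear spacing of optima between Deme 1 and Deme 10 is an assumption (only the endpoints are stated), and the function names are hypothetical rather than taken from the SLiM script.

```python
import math

N_DEMES = 10
W = 1.5  # width of the Gaussian fitness function


def deme_optimum(deme, n_demes=N_DEMES, low=-2.0, high=2.0):
    """Phenotypic optimum for a 1-based deme index.

    Assumption: optima are evenly spaced between low and high.
    """
    return low + (high - low) * (deme - 1) / (n_demes - 1)


def fitness(phenotype, optimum, mode, w=W):
    """MODE=0 (drift): everyone has fitness 1.0. MODE=1: Gaussian selection."""
    if mode == 0:
        return 1.0
    return math.exp(-((phenotype - optimum) ** 2) / (2 * w ** 2))


optima = [deme_optimum(d) for d in range(1, N_DEMES + 1)]
```

With these values, an individual with phenotype 0 sits one third of the way down the fitness peak in neither end deme: its deviation from the optimum is 2.0 in Deme 1 and in Deme 10, so its fitness there is exp(-4 / 4.5).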

Genetic Architecture

  • 10 demes × 200 individuals each (2,000 total)
  • 20 QTLs (Quantitative Trait Loci) with additive effects
  • Phenotype = Σ(QTL effects) + environmental noise (σ=0.2)
  • Genome length: 100kb with recombination (r=1e-8)
  • Neutral mutations overlaid post-hoc via msprime (μ=1e-7)
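The phenotype rule above can be sketched in a few lines. This is an illustration under assumptions, not the repository's code: genotypes are encoded as derived-allele counts (0, 1 or 2) per QTL, and the effect sizes are drawn from a standard normal purely for the example, since the actual effect-size distribution is not stated here.

```python
import random

N_QTLS = 20
ENV_SD = 0.2  # environmental noise (sigma)


def phenotype(genotypes, effects, rng, env_sd=ENV_SD):
    """Additive trait: sum of allele count x effect, plus Gaussian noise."""
    genetic_value = sum(g * a for g, a in zip(genotypes, effects))
    return genetic_value + rng.gauss(0.0, env_sd)


rng = random.Random(42)
effects = [rng.gauss(0.0, 1.0) for _ in range(N_QTLS)]  # hypothetical effect sizes
genotypes = [rng.choice((0, 1, 2)) for _ in range(N_QTLS)]  # one diploid individual
trait_value = phenotype(genotypes, effects, rng)
```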

Timeline

  • Generations 1-1000: Ancestral burn-in (single population of 2,000)
  • Generation 1000: Split into 10 demes + add 20 QTLs
  • Generations 1000-3000: Evolution under selection/migration
  • Generation 2000: Secondary contact (if SCENARIO=2)

🤖 Model Architecture

ImprovedLocalAdaptationClassifier (v2)

Multi-branch architecture combining:

  1. Spatial Gradient Module

    • Computes phenotype-geography correlations
    • Key feature: distinguishes adaptation (strong gradient) from drift (no gradient)
    • Outputs 6 spatial features
  2. Neutral SNP Branch

    • 1D Conv layers with attention pooling
    • Learns patterns in neutral allele frequencies
    • Outputs 64 features
  3. QTL Frequency Branch

    • 1D Conv layers with attention pooling
    • Captures adaptive loci patterns
    • Outputs 64 features
  4. Phenotype Encoder

    • Processes deme-wise phenotype statistics
    • MLP: 10 demes → 64 → 32 features
    • Outputs 32 features
  5. Merged Classifier

    • Concatenates all branches (166 total features)
    • Deep MLP: 166 → 256 → 128 → 64 → 1
    • Sigmoid output (probability of adaptation)
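To make the phenotype-geography correlation used by the Spatial Gradient Module concrete, here is a small stand-alone sketch. It computes only two quantities (the Pearson correlation and the least-squares slope of deme-mean phenotype against deme position); the module is described as producing 6 spatial features, and the other four are not specified here, so this should not be read as the contents of `model_v2.py`. Function names are hypothetical.

```python
import math


def spatial_gradient(deme_means):
    """Return (pearson_r, slope) of deme-mean phenotype vs. deme position 1..n."""
    n = len(deme_means)
    positions = list(range(1, n + 1))
    mean_x = sum(positions) / n
    mean_y = sum(deme_means) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, deme_means))
    sxx = sum((x - mean_x) ** 2 for x in positions)
    syy = sum((y - mean_y) ** 2 for y in deme_means)
    if syy == 0.0:
        return 0.0, 0.0  # flat phenotype profile: no gradient
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# A cline that tracks optima from -2 to +2 versus a profile with no trend.
adapted = [-2.0 + 4.0 * i / 9 for i in range(10)]
flat = [0.3] * 10
r_adapted, slope_adapted = spatial_gradient(adapted)
r_flat, slope_flat = spatial_gradient(flat)
```

A cline that follows the optimum gives a correlation near 1, while drift is expected to produce deme means with no consistent trend along the chain.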

Total Parameters: 123,363

See MODEL_IMPROVEMENTS.md for detailed architecture description.


📊 Results

Model Performance

| Dataset | Model | Accuracy | Notes |
|---|---|---|---|
| Weak signal (old) | v1 | 53.56% | Broken SLiM fitness function |
| Weak signal (old) | v2 | 54.17% | Improved architecture, bad data |
| Strong signal (fixed) | v2 | 100% | Fixed fitness + low migration |

Key Findings

  1. Data quality is critical: Initial poor performance (54%) was due to weak selection in simulations, not model limitations

  2. Phenotype gradients are key: The improved model explicitly computes phenotype-geography correlations, which perfectly distinguish adaptation from drift

  3. Fitness function matters:

    • Wrong: 1.0 + 5.0 * dnorm() → weak selection
    • Correct: exp(-(deviation²)/(2w²)) → strong selection
  4. Migration-selection balance: Reduced migration (5% → 0.1-1%) allows local adaptation to manifest
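The difference between the two fitness functions in finding 3 is easiest to see as relative fitness, i.e. fitness at a given deviation divided by fitness at the optimum. The sketch below assumes the original `dnorm()` call used mean 0 and standard deviation w = 1.5; the actual arguments are not given here, so the exact numbers are illustrative.

```python
import math


def dnorm(x, mean=0.0, sd=1.0):
    """Normal probability density, matching the Eidos/R dnorm() definition."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def fitness_offset(deviation, w=1.5):
    """Old form: a constant baseline of 1.0 plus a scaled density."""
    return 1.0 + 5.0 * dnorm(deviation, 0.0, w)


def fitness_gaussian(deviation, w=1.5):
    """Fixed form: pure Gaussian, approaching 0 far from the optimum."""
    return math.exp(-(deviation ** 2) / (2 * w ** 2))


def relative_fitness(fitness_fn, deviation):
    return fitness_fn(deviation) / fitness_fn(0.0)


for dev in (1.0, 2.0, 4.0):
    print(dev, relative_fitness(fitness_offset, dev), relative_fitness(fitness_gaussian, dev))
```

Under these assumptions the baseline of 1.0 puts a floor under the old function: relative fitness can never fall below 1 / (1 + 5 * dnorm(0)), about 0.43, however maladapted an individual is. The pure Gaussian has no such floor, so individuals far from the local optimum are strongly selected against.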


🔬 Technical Details

Input Data Format

Per simulation:

  • neutral_freqs: (500 SNPs, 10 demes) - neutral allele frequencies
  • qtl_freqs: (100 QTL slots, 10 demes) - QTL allele frequencies (the 20 simulated QTLs, padded to 100 rows)
  • phenotypes: (2000 individuals) - quantitative trait values
  • labels: Binary (0=Drift, 1=Adaptation)
  • scenarios: Categorical (0=Island, 1=Stepping Stone, 2=Contact)
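As a sketch of how one record in this format could be checked and summarised, the snippet below uses plain Python lists. The `SimulationRecord` class and `deme_means` helper are hypothetical names, and the assumption that individuals are stored deme by deme (200 consecutive entries per deme) is not confirmed by the text above.

```python
from dataclasses import dataclass
from typing import List

N_DEMES, N_SNPS, N_QTL_SLOTS, N_INDIVIDUALS = 10, 500, 100, 2000


def _check_matrix(name, matrix, n_rows, n_cols):
    if len(matrix) != n_rows or any(len(row) != n_cols for row in matrix):
        raise ValueError(f"{name} must have shape ({n_rows}, {n_cols})")


@dataclass
class SimulationRecord:
    neutral_freqs: List[List[float]]  # (500 SNPs, 10 demes)
    qtl_freqs: List[List[float]]      # (100 QTL slots, 10 demes), padded
    phenotypes: List[float]           # 2000 individuals
    label: int                        # 0 = Drift, 1 = Adaptation
    scenario: int                     # 0 = Island, 1 = Stepping Stone, 2 = Contact

    def validate(self):
        _check_matrix("neutral_freqs", self.neutral_freqs, N_SNPS, N_DEMES)
        _check_matrix("qtl_freqs", self.qtl_freqs, N_QTL_SLOTS, N_DEMES)
        if len(self.phenotypes) != N_INDIVIDUALS:
            raise ValueError(f"phenotypes must have length {N_INDIVIDUALS}")
        if self.label not in (0, 1) or self.scenario not in (0, 1, 2):
            raise ValueError("label must be 0/1 and scenario must be 0/1/2")


def deme_means(phenotypes, n_demes=N_DEMES):
    """Mean phenotype per deme, assuming individuals are ordered by deme."""
    size = len(phenotypes) // n_demes
    return [sum(phenotypes[d * size:(d + 1) * size]) / size for d in range(n_demes)]


record = SimulationRecord(
    neutral_freqs=[[0.5] * N_DEMES for _ in range(N_SNPS)],
    qtl_freqs=[[0.0] * N_DEMES for _ in range(N_QTL_SLOTS)],
    phenotypes=[float(i // 200) for i in range(N_INDIVIDUALS)],
    label=1,
    scenario=2,
)
record.validate()
means = deme_means(record.phenotypes)
```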

Training Configuration

  • Optimizer: Adam (lr=5e-4)
  • Loss: Binary Cross-Entropy
  • Batch size: 32
  • Early stopping: Patience=20 epochs
  • Train/Val split: 80/20
  • Device: CPU (GPU optional)
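Two of these settings, the 80/20 split and early stopping with a patience of 20 epochs, can be sketched without any deep learning framework. The code below is a generic illustration, not the logic in `train.py`; in particular it assumes early stopping monitors validation loss, which is not stated above, and the names are hypothetical.

```python
import random


def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, validation)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


class EarlyStopping:
    """Signal a stop after `patience` epochs without a new best validation loss."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


train_idx, val_idx = split_indices(3000)
stopper = EarlyStopping(patience=20)
```

In a training loop, `stopper.step(val_loss)` would be called once per epoch and the loop exited when it returns True.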

Computational Requirements

Data Generation:

  • 500 simulations/class × 6 classes = 3,000 simulations
  • ~3-5 hours on 7 CPU cores
  • ~2 GB disk space (compressed)
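The six classes are the 3 demographic scenarios crossed with the 2 selection modes. The sketch below enumerates the resulting SLiM command lines without running them. It relies on SLiM's `-s` (seed) and `-d` (define constant) command-line options and on the `MODE` and `SCENARIO` constants mentioned earlier; the seed scheme and function name are assumptions, and `generate_data.py` may organise its jobs differently.

```python
from itertools import product

SCENARIOS = {0: "island", 1: "stepping_stone", 2: "secondary_contact"}
MODES = {0: "drift", 1: "adaptation"}
SLIM_SCRIPT = "data/simulations/simulation.slim"


def build_jobs(sims_per_class=500, script=SLIM_SCRIPT):
    """Return one SLiM command (as an argument list) per simulation."""
    jobs = []
    for class_index, (scenario, mode) in enumerate(product(SCENARIOS, MODES)):
        for rep in range(sims_per_class):
            seed = class_index * sims_per_class + rep + 1  # unique per job (assumption)
            jobs.append([
                "slim",
                "-s", str(seed),
                "-d", f"MODE={mode}",
                "-d", f"SCENARIO={scenario}",
                script,
            ])
    return jobs


jobs = build_jobs()
print(len(jobs))
```

Each argument list could then be handed to `subprocess.run` from a `multiprocessing.Pool` of 7 workers to match the default core count mentioned in the prerequisites.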

Model Training:

  • ~5-10 minutes on CPU
  • ~1-2 minutes on GPU
  • Converges in ~20-30 epochs

📚 Citation

If you use this code or approach in your research, please cite:

@software{DeepLocalAdaptation2026,
  author = {Your Name},
  title = {Deep Learning for Detecting Local Adaptation vs. Drift},
  year = {2026},
  url = {https://github.com/JikaelN/DeepLocalAdaptation}
}

📖 References

  • SLiM: Haller & Messer (2019). SLiM 3: Forward Genetic Simulations Beyond the Wright–Fisher Model. Molecular Biology and Evolution.
  • Tree Sequences: Kelleher et al. (2016). Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Computational Biology.
  • QST-FST: Whitlock (2008). Evolutionary inference from QST. Molecular Ecology.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • SLiM development team for the forward-time simulation framework
  • tskit developers for efficient tree sequence tools
  • PyTorch community for the deep learning framework
