This repository contains the codebase and workflows underlying the analyses presented in our paper, Transposable elements are driving rapid adaptation of Enterococcus faecium (Grieshop & Behr et al., Nature, 2026) [doi link]. For a lay overview of this study, see the associated Research Briefing.
Important: If you use any of the code found in this repository, please cite our work.
Here, we examine:
- the landscape of insertion sequences (IS elements, simple bacterial transposable elements) in modern ESKAPEE pathogens,
- the recent proliferation of ISL3 in E. faecium and IS proliferation in other taxa,
- extensive structural variation in hospital-adapted lineages via ISL3 activity,
- IS-mediated structural variation in the gut microbiome of hospital patients via long-read metagenomics, and
- ISL3-driven evolution in E. faecium and its potential role in pathogenic adaptation.
# Check that required tools are installed
python --version; Rscript --version; just --version
# Create and enter a new project folder
mkdir isl3_efaecium_analysis && cd $_
# Clone this repository
git clone https://github.com/abehr/is-evolution.git
# Unpack preprocessed data (after downloading from Zenodo at doi:10.5281/zenodo.15239062)
unzip analyses_preprocessed_source_data.zip -d data
# Install package and dependencies
cd is-evolution
just install
# Configure project_root, data_dir, and output_dir
vi ../config.yaml
# Example: generate IS count tables (Supp. Data 2-3)
just table-is-counts
# Example: plot ESKAPEE IS survey results (Fig. 1 + Extended)
just pathogen-is-survey
- dataproc/ - Scripts to preprocess raw data. Their outputs are archived on Zenodo and used in downstream analyses.
- analysis/ - Scripts that generate the figures and tables for the study. Subfolders are prefixed by the corresponding figure or section number.
- workflows/ - Nextflow pipelines and configuration examples for producing the input data used in analyses. These were tuned for our HPC environment and may require adaptation for your system.
- src/ - Shared code modules.
  - src/biobehr: general utilities (e.g. GFF parsing, NCBI API access)
  - src/efm: project-specific helpers (e.g. configuration handling, plot styling)
The root-level justfile defines reproducible analysis recipes.
Each recipe documents how we generated a given figure or dataset's source material.
For example:
- pathogen-is-survey runs all analyses in analysis/01_pathogen_survey, producing Figs. 1B–C and Extended Data Figs. 1–2. This can be run directly using the preprocessed Zenodo data.
- pangraph-structural-variation demonstrates how the PanGraph analysis was configured for Fig. 4 and Supplementary Table 4. This requires additional local data and configuration, but you can run the recipe with no arguments to see a more detailed explanation.
While the just recipes provide more standardized and reproducible commands, the underlying scripts can also be run directly.
- Python ≥ 3.10 (3.13 recommended)
- R ≥ 4.4
- just (command runner)
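The requirements above can also be checked from Python before installing. The following is a minimal sketch, not part of this repository: the function name `missing_prerequisites` is hypothetical, and it only confirms that `Rscript` and `just` are on the PATH and that the running interpreter is at least Python 3.10. It does not verify the R version.

```python
import shutil
import sys

# Minimum Python version and tool names are taken from the requirements list.
MIN_PYTHON = (3, 10)
REQUIRED_TOOLS = ("Rscript", "just")


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON,
                          python_version=None):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    version = tuple(python_version or sys.version_info[:2])
    problems = []
    if version < min_python:
        problems.append(
            f"Python {version[0]}.{version[1]} found; "
            f"{min_python[0]}.{min_python[1]} or newer is required"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Running the file directly prints one line per missing prerequisite and prints nothing when all checks pass.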
We recommend installing via a package manager. Tested environments include macOS (Homebrew) and macOS/Linux (Micromamba).
# Homebrew
brew install python3 just r
# Conda (existing env)
micromamba install -c conda-forge python=3.13 just
# Conda (new env)
micromamba create -n is_evo_env -c conda-forge python=3.13 just
micromamba activate is_evo_env
Note: Google Chrome is required for SVG export in plotting scripts. These scripts are lightweight and optimized for laptop use, but you can comment out the SVG-export lines if running in a non-GUI environment (e.g. over SSH).
Preprocessed source data (outputs from dataproc/ and concatenated summary tables) can be found in our Zenodo project: download and extract the archive analyses_preprocessed_source_data.zip in order to run analyses locally without reprocessing raw data.
Note: This does not include genomic data that are already publicly available (such as genomes or GFF annotation files that are available on NCBI/SRA). Therefore, while some of our analyses can be run "out of the box", others require additional data download/processing and configuration.
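If you prefer to script the extraction step rather than call `unzip`, the standard-library `zipfile` module is enough. This is a minimal sketch under the assumption that the archive has already been downloaded from Zenodo; the function name `extract_source_data` is hypothetical and not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_source_data(archive, data_dir):
    """Extract the archive into data_dir and return the names of its members.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    archive = Path(archive)
    data_dir = Path(data_dir)
    # is_zipfile returns False for a missing file as well as for a non-zip file.
    if not zipfile.is_zipfile(archive):
        raise FileNotFoundError(f"{archive} is missing or is not a zip archive")
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(data_dir)
        return zf.namelist()
```

For example, `extract_source_data("analyses_preprocessed_source_data.zip", "data")` run from the project root is intended to mirror `unzip analyses_preprocessed_source_data.zip -d data`.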
Example project layout:
efm_isl3_analysis/ # project root
├── is-evolution/ # cloned repository
└── data/ # Zenodo data (CSV and other files) extracted from analyses_preprocessed_source_data.zip
To verify setup and initialize the environment:
cd is-evolution
just # list available recipes
just install # set up project & dependencies using a venv; create config.yaml
Alternatively, if you prefer to manage dependencies yourself (e.g. via a conda env), you can run just install-bare to only set up the package (without a virtual env & dependencies) and copy the example config.
After either installation method, edit the new ../config.yaml to set project_root, data_dir, and output_dir before running analyses.
Relative paths are resolved from project_root; absolute paths can also be used.
Note: Config location can be updated via the .env file (automatically loaded by the justfile).
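The path-resolution rule described above (relative paths are taken from project_root, absolute paths are used unchanged) can be illustrated in a few lines. This is a sketch of the rule as stated, not the repository's actual configuration code; the function name `resolve_config_path` is hypothetical.

```python
from pathlib import Path


def resolve_config_path(project_root, value):
    """Resolve a configured path against project_root.

    Hypothetical helper for illustration: absolute paths are returned
    unchanged, relative paths are joined onto project_root.
    """
    path = Path(value)
    if path.is_absolute():
        return path
    return Path(project_root) / path
```

Under this rule, a `data_dir` of `data` with a `project_root` of `/work/efm_isl3_analysis` resolves to `/work/efm_isl3_analysis/data`, while a `data_dir` of `/scratch/data` is left untouched. Both paths here are made-up examples.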
Many analyses can be launched directly via just, for example:
just pathogen-is-survey
This command executes all of the main scripts in analysis/01_pathogen_survey, producing Figs. 1B–C and Extended Data Figs. 1–2.
Some other recipes require additional data (from our Zenodo archive, our NCBI project, or public NCBI sources) or preprocessing steps.
This table shows where each of the main analyses lives, which scripts comprise it, and how it relates to figures/tables. For runnable commands, see the corresponding recipe in the justfile.
| Folder | Main scripts | Outputs | just recipe |
|---|---|---|---|
| 01_pathogen_is_survey/ | plot_eskape_is_counts.py, plot_is_per_taxon.py, plot_taxon_is_per_contig_type.py | Fig. 1B–C, Ext. Data Fig. 1 & 3 | pathogen-is-survey |
| 01_seqs_flanks_boundaries/ | exact_copy_analysis.py, calculate_is_flank_boundaries.py | Ext. Data Fig. 4 | tpase-seqs-flanks-boundaries |
| 02_isl3_taxonomic_distribution/ | is_expansion_phylogeny.R, plot_isl3_per_cocci.py | Data for Fig. 2A–B, Fig. 2C | isl3-taxonomic-distribution |
| 03_shortread_is_estimate_timeline/ | count_ise.py, ise_plots.py | Fig. 3C, Ext. Data Fig. 5 | efm-is-timeline |
| 04_pangraph_structural_variation/ | junction_stats.py | Data for Fig. 4, Supp. Table 4 | pangraph-structural-variation |
| 05_longitudinal_metagenomic/ | sniffles_sv_within_sample.py | Ext. Data Fig. 9 | longitudinal-structural-variation |
| 06_transcriptomic/ | rnaseq_analysis.R | Fig. 5B–C, Ext. Data Fig. 10 | rnaseq-diff-expression |
| 06_isl3_proximal_genes/ | find_is_proximal_genes.py | Supp. Table 8 | genes-adjacent-is |
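If you drive analyses from a script, the folder-to-recipe mapping in the table above can be kept in a small lookup. The sketch below copies the folder and recipe names from the table; the names `RECIPES` and `just_command` are hypothetical and not part of this repository, and the helper only builds the command without running it.

```python
# Folder -> just recipe, copied from the table of analyses.
RECIPES = {
    "01_pathogen_is_survey": "pathogen-is-survey",
    "01_seqs_flanks_boundaries": "tpase-seqs-flanks-boundaries",
    "02_isl3_taxonomic_distribution": "isl3-taxonomic-distribution",
    "03_shortread_is_estimate_timeline": "efm-is-timeline",
    "04_pangraph_structural_variation": "pangraph-structural-variation",
    "05_longitudinal_metagenomic": "longitudinal-structural-variation",
    "06_transcriptomic": "rnaseq-diff-expression",
    "06_isl3_proximal_genes": "genes-adjacent-is",
}


def just_command(folder):
    """Return the argument list that invokes the recipe for an analysis folder.

    Hypothetical helper for illustration; a trailing slash on the folder
    name is ignored, and an unknown folder raises KeyError.
    """
    return ["just", RECIPES[folder.rstrip("/")]]
```

The returned list can be passed to `subprocess.run(..., check=True)` from inside the cloned repository. Recipes that need additional data or configuration will still require that setup first.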
If you use any of the code found in this repository, please cite our paper.
Transposable elements are driving rapid adaptation of Enterococcus faecium
Matthew Grieshop & Aaron Behr et al. Nature (2026)
doi:10.1038/s41586-026-10373-2