NanoExpansion: a tool for the characterization of Repeat Expansion Patterns in Nanopore sequencing samples

NanoExpansion is a Python tool for extracting and characterizing Short Tandem Repeat (STR) data from nanopore sequencing. It uses the output of straglr to plot the expansion site in the region of interest (e.g. the DMPK gene for DM1) and to return a compact string describing the expansion pattern. It reuses some ideas from EPI2ME wf-human-variation and implements a recursive search for the motifs of interest, which the user specifies based on prior biological knowledge.
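A compact expansion pattern string looks like `(CTG)37(CTC)252(CTG)35` in the example output. To illustrate what such a string encodes, here is a minimal sketch that run-length encodes an in-frame repeat sequence one triplet at a time. It is not NanoExpansion's recursive motif search: the function name `compact_pattern` is hypothetical, and it assumes the sequence starts exactly at the first base of the repeat.

```python
from itertools import groupby


def compact_pattern(seq: str, unit_len: int = 3) -> str:
    """Collapse an in-frame repeat sequence into a compact pattern string.

    Simplified sketch (not NanoExpansion's recursive search): the sequence is
    cut into consecutive chunks of `unit_len` bases, and runs of identical
    chunks are run-length encoded. A trailing partial chunk is ignored.
    """
    seq = seq.upper()
    n_units = len(seq) // unit_len
    units = [seq[i * unit_len:(i + 1) * unit_len] for i in range(n_units)]
    # groupby yields one (unit, run) pair per stretch of identical units
    return "".join(f"({unit}){sum(1 for _ in run)}" for unit, run in groupby(units))


if __name__ == "__main__":
    read = "CTG" * 37 + "CTC" * 4 + "CTG" * 10
    print(compact_pattern(read))  # (CTG)37(CTC)4(CTG)10
```

Because the frame is fixed, a single sequencing indel would shift every later triplet, so real nanopore reads need a more tolerant search than this sketch.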

Requirements

The following files are required to run NanoExpansion:

a sorted and indexed .bam file of the sample of interest
the .tsv, .bed and .vcf output files from straglr
the STR annotation catalogue for Stranger
a .bed file with the expansion region and its repeat motif

The folder structure must be as follows:

sample/
│
└── nanoexpansion/
    ├── <sample>-straglr_old.tsv
    ├── <sample>-straglr_old.bed
    ├── <sample>-straglr_old.vcf
    ├── <sample>_roi.bam
    ├── <sample>_roi.bam.bai
    ├── variant_catalog_hg38.json
    ├── <gene>_filter.bed
    └── str_repeats.bed

All required files must be placed inside the nanoexpansion folder.
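Before running the pipeline, it can help to confirm that every file in the tree is in place. A minimal sketch, assuming the exact file names shown in the tree; the helper `missing_inputs` is hypothetical and not part of NanoExpansion.

```python
import sys
from pathlib import Path


def missing_inputs(sample_dir: str, sample: str, gene: str) -> list[str]:
    """List the required input files not found in <sample_dir>/nanoexpansion."""
    folder = Path(sample_dir) / "nanoexpansion"
    required = [
        f"{sample}-straglr_old.tsv",
        f"{sample}-straglr_old.bed",
        f"{sample}-straglr_old.vcf",
        f"{sample}_roi.bam",
        f"{sample}_roi.bam.bai",
        "variant_catalog_hg38.json",
        f"{gene}_filter.bed",
        "str_repeats.bed",
    ]
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: python check_inputs.py <sample_dir> <sample> <gene>
    missing = missing_inputs(*sys.argv[1:4])
    if missing:
        sys.exit("Missing files: " + ", ".join(missing))
    print("All required files found.")
```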

Depending on the straglr version used, you may need to transform the output .tsv file so that it contains only the following columns:

'chrom', 'start', 'end', 'repeat_unit', 'genotype', 'read', 'copy_number', 'size', 'read_start', 'strand', 'allele'

If your .tsv does not meet this requirement, the Snakemake pipeline handles it by transforming both the .tsv and the .vcf files.
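If you prefer to reshape the straglr .tsv yourself, the column selection can be sketched with Python's standard `csv` module. This assumes the file starts with a tab-separated header row (a leading `#` on the column names is stripped); the function name `select_straglr_columns` is hypothetical, and the sketch does not reproduce the .vcf transformation performed by the Snakemake pipeline.

```python
import csv
import sys

REQUIRED_COLUMNS = [
    "chrom", "start", "end", "repeat_unit", "genotype", "read",
    "copy_number", "size", "read_start", "strand", "allele",
]


def select_straglr_columns(src, dst) -> None:
    """Copy a straglr .tsv from `src` to `dst`, keeping only REQUIRED_COLUMNS.

    Assumes the first line is a tab-separated header. Raises KeyError if any
    required column is absent.
    """
    reader = csv.reader(src, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    idx = [header.index(c) for c in REQUIRED_COLUMNS]
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(REQUIRED_COLUMNS)
    for row in reader:
        if row:  # skip blank lines
            writer.writerow([row[i] for i in idx])


if __name__ == "__main__":
    # Hypothetical usage: python select_columns.py in.tsv out.tsv
    with open(sys.argv[1], newline="") as src, open(sys.argv[2], "w", newline="") as dst:
        select_straglr_columns(src, dst)
```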

Run NanoExpansion

Download the repository

git clone https://github.com/Cesco16/NanoExpansion.git
cd NanoExpansion

Create and activate the conda environment

conda env create -f requirements.yml
conda activate nanoexpansion

Keep only the reads overlapping the STR region of interest, then index the resulting .bam file

samtools view -b -h -o <sample>_roi.bam -L <gene>_filter.bed <sample>_sort.bam
samtools index <sample>_roi.bam

Run the Snakemake pipeline

snakemake --cores 4 --config sample=<sample> motif='CAG' interruption='CGG' ins1=2 ins2=1 gene="DMPK" disease="DM1"

Options

| Option | Description |
|---|---|
| `sample <STR>` | ID of the sample to process. Required. |
| `motif <STR>` | Main repeat motif. Default is `CAG`. |
| `interruption <STR>` | Interruption repeat motif. Default is `CAA`. |
| `ins1 <INT>` | Threshold for correction of the main repeat motif. Default is 3. |
| `ins2 <INT>` | Threshold for correction of the interruption repeat motif. Default is 1. |
| `gene <STR>` | Gene with the expanded motif. Default is `DMPK`. |
| `disease <STR>` | Disease corresponding to the main motif. Default is `DM1`. |
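Each option is passed to Snakemake as a `key=value` pair after `--config`, as in the command shown earlier. The sketch below assembles such a command, filling in the defaults listed in the table for any option not given explicitly; the helper `snakemake_command` is hypothetical.

```python
import shlex

# Defaults as listed in the Options table; `sample` has no default.
DEFAULTS = {
    "motif": "CAG",
    "interruption": "CAA",
    "ins1": 3,
    "ins2": 1,
    "gene": "DMPK",
    "disease": "DM1",
}


def snakemake_command(sample: str, cores: int = 4, **overrides) -> list[str]:
    """Build the snakemake argument list, rejecting unknown option names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    # Later entries win, so overrides replace defaults but keep their order.
    config = {"sample": sample, **DEFAULTS, **overrides}
    return ["snakemake", "--cores", str(cores), "--config",
            *(f"{key}={value}" for key, value in config.items())]


if __name__ == "__main__":
    cmd = snakemake_command("S1", interruption="CGG", ins1=2)
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from inside the repository folder would launch the pipeline.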

Example of usage

Here is an example of NanoExpansion applied to a patient affected by Myotonic Dystrophy type 1 (DM1), which is characterized by an expansion of the CTG triplet in the DMPK gene. NanoExpansion characterizes both the wild-type and the mutated allele. The numbers in the plots represent the number of nucleotides in each region; the number of repeats is obtained by dividing them by the length of the repeat motif (here, 3).
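The conversion from plot values to repeat counts is a single division. A minimal sketch with hypothetical numbers, using `divmod` so that any leftover bases (a partial motif) are reported rather than silently rounded; the function name `repeat_count` is hypothetical:

```python
def repeat_count(n_nucleotides: int, motif: str = "CTG") -> tuple[int, int]:
    """Return (full repeats, leftover bases) for a region of n_nucleotides."""
    return divmod(n_nucleotides, len(motif))


if __name__ == "__main__":
    print(repeat_count(1674))  # (558, 0)
    print(repeat_count(16))    # (5, 1)
```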

NanoExpansion also characterizes the individual mutated reads. Here is an example of an expanded read showing a TTG interruption pattern:

Finally, NanoExpansion returns the complete characterization of repeat patterns in all the available reads:

4f5fb621-ed87-45c4-84f2-8d6b5794e655: (CTG)5
822e5d7b-a2c2-4290-aeb9-1759c1d65276: (CTG)4
79ea0cda-7e44-4ea1-aaac-1d943b29bdf4: (CTG)5
10b08737-492e-4c80-86ee-5f2039fd069d: (CTG)37(CTC)252(CTG)35
7b97eb27-ef56-413b-a422-ccce2abea0d3: (CTG)296(CTC)152(CTG)61
3e04399c-f889-453e-bd32-4c26e0ece28b: (CTG)558
6c779360-4d9d-444e-a8b4-fcd37c65d339: (CTG)5
bc0a2b2a-e11b-4242-aed1-80ca1a1400e8: (CTG)5
a25cb320-6040-476c-b2ef-c490ab2b599b: (CTG)5
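Each output line pairs a read ID with its compact pattern. Assuming the `<read_id>: <pattern>` layout shown above, a sketch like the following (function names hypothetical) parses a line into motif segments and totals, which makes it easy to separate the short wild-type reads from the expanded ones:

```python
import re

# One "(MOTIF)COUNT" segment of a compact pattern string.
SEGMENT_RE = re.compile(r"\(([ACGTN]+)\)(\d+)")


def parse_pattern(pattern: str) -> list[tuple[str, int]]:
    """Split '(CTG)37(CTC)252(CTG)35' into [('CTG', 37), ('CTC', 252), ('CTG', 35)]."""
    return [(motif, int(count)) for motif, count in SEGMENT_RE.findall(pattern)]


def summarize(line: str) -> dict:
    """Parse one '<read_id>: <pattern>' output line into segments and totals."""
    read_id, pattern = (part.strip() for part in line.split(":", 1))
    segments = parse_pattern(pattern)
    return {
        "read_id": read_id,
        "segments": segments,
        "total_repeats": sum(count for _, count in segments),
        "total_bp": sum(len(motif) * count for motif, count in segments),
    }


if __name__ == "__main__":
    info = summarize("10b08737-492e-4c80-86ee-5f2039fd069d: (CTG)37(CTC)252(CTG)35")
    print(info["total_repeats"], info["total_bp"])  # 324 972
```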

Benchmark

NanoExpansion can be tested on the synthetically generated reads in the benchmark folder. Each sample is named after its true number of repeats in the DMPK gene, and NanoExpansion's results should agree with these values. The only expected failure is sample output_14_58_21_25, whose insertion pattern falls outside the main repeat pattern (CTG).
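A benchmark check can be sketched by comparing the counts encoded in a sample name with those in the reported pattern. This relies on an assumption not stated in the text: that the numbers after `output_` are the repeat counts of consecutive pattern segments, in order. All function names are hypothetical.

```python
import re


def expected_counts(sample_name: str) -> list[int]:
    """Counts encoded in a benchmark sample name (ASSUMED: segment counts in order)."""
    return [int(n) for n in re.findall(r"\d+", sample_name)]


def observed_counts(pattern: str) -> list[int]:
    """Repeat counts from a compact pattern such as '(CTG)37(CTC)252(CTG)35'."""
    return [int(n) for n in re.findall(r"\)(\d+)", pattern)]


def agrees(sample_name: str, pattern: str) -> bool:
    """True if the reported pattern matches the counts in the sample name."""
    return expected_counts(sample_name) == observed_counts(pattern)


if __name__ == "__main__":
    print(agrees("output_37_252_35", "(CTG)37(CTC)252(CTG)35"))  # True
```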

Limitations

Currently, NanoExpansion works only with the hg38 reference genome. Support for the T2T HS1 reference will be released soon.
Always check the start and end columns in the .tsv and .bed files: they must be the start and end positions of the repeat expansion region (edit them manually if needed).
NanoExpansion can correctly detect repeat patterns only if the interruption motif falls entirely within the main repeat (e.g., CTG for DM1).
Currently, NanoExpansion supports only DM1 and ALS samples (the ones known to carry interruption patterns).
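The start and end check in the second point can be partly automated. A minimal sketch, assuming chrom, start and end are the first three tab-separated columns of the straglr .tsv and .bed files, and that you supply the expected region coordinates yourself; the function name `rows_outside_region` is hypothetical:

```python
import csv
import sys


def rows_outside_region(lines, region, has_header=True) -> list[int]:
    """Return 1-based data-row numbers whose (chrom, start, end) differ from region.

    `lines` is any iterable of tab-separated text lines (e.g. an open file).
    Use has_header=True for the .tsv and has_header=False for a headerless .bed.
    """
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the column-name row
    bad = []
    for row_number, row in enumerate(reader, start=1):
        if not row or row[0].startswith("#"):
            continue  # blank or comment line
        if (row[0], int(row[1]), int(row[2])) != tuple(region):
            bad.append(row_number)
    return bad


if __name__ == "__main__":
    # Hypothetical usage: python check_coords.py <file.tsv> <chrom> <start> <end>
    path, chrom, start, end = sys.argv[1:5]
    with open(path, newline="") as handle:
        print(rows_outside_region(handle, (chrom, int(start), int(end))))
```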

License

This project is licensed under the MIT License.
You are free to use, modify, and distribute this software under the terms of the license.

Citation

If you use NanoExpansion in your research or work, please cite the GitHub repository:

@misc{NanoExpansion,
author = {Francesco Casadei},
title = {NanoExpansion: a tool for the characterization of Repeat Expansion Pattern in Nanopore sequencing samples},
year = {2026},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Cesco16/NanoExpansion}}
}
