SquiggleKit: A toolkit for manipulating nanopore signal data


A toolkit for accessing and manipulating nanopore signal data

Full documentation:

Pre-print: SquiggleKit: A toolkit for manipulating nanopore signal data


Tool           Category              Description
Fast5_fetcher  File management       Fetches fast5 files given a filtered input list
SquigglePull   Signal extraction     Extracts event or raw signal from data files
SquigglePlot   Signal visualisation  Visualisation tool for signal data
Segmenter      Signal analysis       Finds adapter, stall, and homopolymer regions
MotifSeq       Signal analysis       Finds nucleotide sequence motifs in signal, i.e. “Ctrl+F”


Following a self-imposed guideline, most tools written here to handle nanopore data, or bioinformatics in general, use as few third-party libraries as possible, aiming for only core libraries, or include all required files in the package.

In the case of and, only core python libraries are used. So as long as Python 2.7+ is present, everything should work with no extra steps.

There is one catch: everything is written primarily for use with Linux. Because macOS is Unix-based, there should be minimal issues running it so long as the GNU tools are installed (see below). Windows, however, may require more work: the Windows Subsystem for Linux must be installed. Follow the instructions here to do this.

SquiggleKit tools are deliberately not shipped as executables, so that they can be used with varying Python environments on various operating systems. To make them executable, add a #! line, such as #!/usr/bin/env python2.7, as the first line of each file and mark the files as executable (chmod +x). Then add the SquiggleKit directory to the PATH variable in ~/.bashrc: export PATH="$HOME/path/to/SquiggleKit:$PATH"
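The shebang step can also be scripted. The following is a minimal sketch, not part of SquiggleKit: the function name make_executable is hypothetical, and it assumes the scripts are plain text files you are free to modify in place.

```python
import os
import stat

# Interpreter line quoted in the text above; change it to match your environment.
SHEBANG = "#!/usr/bin/env python2.7\n"


def make_executable(script_path, shebang=SHEBANG):
    """Prepend a shebang line if the file lacks one, then set the executable bits.

    Returns True if the file content was modified, False otherwise.
    (Hypothetical helper for illustration only.)
    """
    with open(script_path, "r") as handle:
        content = handle.read()

    changed = False
    if not content.startswith("#!"):
        with open(script_path, "w") as handle:
            handle.write(shebang + content)
        changed = True

    # Equivalent of `chmod +x`: add execute permission for user, group and others.
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return changed
```

Running it a second time on the same file leaves the content untouched, so it is safe to apply to a whole directory of scripts.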


git clone


  • numpy
  • matplotlib
  • h5py
  • sklearn
pip install numpy h5py scikit-learn matplotlib
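To confirm that the libraries listed above can be imported before running the tools, a quick check along these lines may help. It is a sketch that assumes Python 3.4 or later (for importlib.util.find_spec); the function name missing_modules is hypothetical and not part of SquiggleKit.

```python
import importlib.util

# Import names of the libraries listed above (scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "matplotlib", "h5py", "sklearn"]


def missing_modules(names=REQUIRED):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```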


  • all of the above
  • mlpy 3.5.0 (don't use pip for this)

Installing mlpy:

Quick start


If using macOS and Homebrew is not yet installed, install it here:

homebrew installation instructions

then install gnu-tar with:

brew install gnu-tar

Building the index

How the index is built depends on which file structure you are using. It will work with both tarred and un-tarred file structures. Tarred is preferred. (zip and other archive methods are being investigated)

- Raw structure (not preferred)
for file in $(pwd)/reads/*/*;do echo $file; done >> name.index

gzip name.index
- Local basecalled structure
for file in $(pwd)/reads.tar; do echo $file; tar -tf $file; done >> name.index

gzip name.index
- Parallel basecalled structure
for file in $(pwd)/fast5/*fast5.tar; do echo $file; tar -tf $file; done >> name.index
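For the tarred structures, the shell loops above write each archive path followed by a listing of its contents. The same layout could be produced from Python roughly as follows. This is a sketch under assumptions: build_tar_index is a hypothetical name, the output is gzip-compressed in the same step, and tarfile's getnames() may format directory entries slightly differently from tar -tf.

```python
import gzip
import tarfile


def build_tar_index(tar_paths, index_path):
    """Write each archive path, then the names of its members, one per line.

    The result is gzip-compressed, mirroring `... >> name.index; gzip name.index`.
    (Hypothetical helper for illustration only.)
    """
    with gzip.open(index_path, "wt") as out:
        for tar_path in tar_paths:
            out.write(tar_path + "\n")
            with tarfile.open(tar_path) as archive:
                for member_name in archive.getnames():
                    out.write(member_name + "\n")
```

A call such as build_tar_index(sorted(glob.glob("fast5/*fast5.tar")), "name.index.gz") would cover the parallel basecalled layout, with the glob pattern adjusted to your own directory names.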

If you have multiple experiments, then cat them all together and gzip.

for file in ./*.index; do cat $file; done >> ../
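Concatenating and compressing several index files can also be done in one step from Python. The sketch below assumes uncompressed .index files as input; combine_indexes and the output path passed to it are hypothetical and chosen by the caller.

```python
import gzip
import shutil


def combine_indexes(index_paths, out_path):
    """Concatenate plain-text index files, in the order given, into one gzip file.

    (Hypothetical helper for illustration only.)
    """
    with gzip.open(out_path, "wb") as out:
        for path in index_paths:
            with open(path, "rb") as src:
                # Stream the bytes across so large indexes are not held in memory.
                shutil.copyfileobj(src, out)
```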

Basic use on a local computer

Using a filtered PAF file as input:

python -p my.paf -s sequencing_summary.txt.gz -i name.index.gz -o ./fast5
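To illustrate why the index is useful, here is a sketch of how an index built as described earlier might be read back to locate a given fast5 file. It is not the fetcher's actual implementation. It assumes that archive lines end in ".tar", that read files end in ".fast5", and that each listing line belongs to the most recent archive line above it; load_index is a hypothetical name.

```python
import gzip
import os


def load_index(index_path):
    """Map each fast5 file name to (archive path or None, path as listed in the index).

    The archive is None for entries seen before any ".tar" line, as in an index
    built from the raw (un-tarred) structure. A file that mixes raw and tarred
    listings is not handled by this sketch.
    (Hypothetical helper for illustration only.)
    """
    lookup = {}
    current_tar = None
    with gzip.open(index_path, "rt") as handle:
        for line in handle:
            entry = line.strip()
            if not entry:
                continue
            if entry.endswith(".tar"):
                current_tar = entry
            elif entry.endswith(".fast5"):
                lookup[os.path.basename(entry)] = (current_tar, entry)
    return lookup
```

With such a lookup, only the archives that actually contain the requested reads need to be opened.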


All raw data:

python -rv -p ~/data/test/reads/1/ -f all > data.tsv

Positional event data:

python -ev -p ./test/ -t 50,150 -f pos1 > data.tsv


Plot individual fast5 file:

python -i ~/data/test.fast5

Plot files in path

python -p ~/data/ --plot_colour -g

Plot first 2000 data points of each read from signal file and save at 300dpi pdf:

python -s signals.tsv.gz --plot_colour teal -n 2000 --dpi 300 --no_show --save test.pdf --save_path ./test/plots/


Identify any segments in folder and visualise each one

Use f to full screen a plot, and ctrl+w to close a plot and move to the next one.

python -p ./test/ -v

Stall identification

python -s signals.tsv.gz -ku -j 100 > signals_stall_segments.tsv


Nanopore adapter identification

Building an adapter model:

scrappie squiggle adapter.fa > adapter.model

Identify stalls in signal using segmenter:

python -s signals.tsv.gz -ku -j 100 > signals_stall_segments.tsv

Identify nanopore adapters in the signal upstream of the stalls identified by segmenter:

python -s signals.tsv.gz --segs signals_stall_segments.tsv -a adapter.model > signals_adapters.tsv

Find kmer motif:

Building a kmer model:

fasta format for scrappie:


Make the model from scrappie (available from ONT here ):

scrappie squiggle my_kmer.fa > scrappie_kmer.model

Find the best match to that kmer in the signal:

python -s signals.tsv -m scrappie_kmer.model > signals_kmer.tsv
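The text above does not say which algorithm is used to find the best match. One common approach to locating a short expected signal inside a longer one is subsequence dynamic time warping, and the sketch below assumes that technique purely for illustration. The function name best_match is hypothetical, the model is taken to be a plain list of expected current levels rather than a parsed scrappie file, and signal normalisation is left out.

```python
def best_match(model, signal):
    """Subsequence DTW: find where `model` aligns best inside `signal`.

    Returns (distance, start_index, end_index) with 0-based, inclusive indices
    into `signal`. (Hypothetical helper for illustration only.)
    """
    m, n = len(model), len(signal)
    if m == 0 or n == 0:
        raise ValueError("model and signal must both be non-empty")

    inf = float("inf")
    # Row 0 is all zeros so a match may start anywhere in the signal;
    # every other cell starts at infinity.
    cost = [[0.0] * (n + 1)] + [[inf] * (n + 1) for _ in range(m)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step = abs(model[i - 1] - signal[j - 1])
            cost[i][j] = step + min(cost[i - 1][j - 1],  # advance both
                                    cost[i - 1][j],      # advance model only
                                    cost[i][j - 1])      # advance signal only

    # The match may end anywhere, so take the cheapest cell in the last row.
    end = min(range(1, n + 1), key=lambda j: cost[m][j])

    # Walk back along the cheapest predecessors until the first model row.
    i, j = m, end
    while i > 1:
        moves = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(moves, key=lambda move: move[0])

    return cost[m][end], j - 1, end - 1
```

For example, best_match([1, 2, 3], [0, 0, 0, 1, 2, 3, 0, 0]) returns a distance of 0 with the match spanning signal indices 3 to 5.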



I would like to thank the members of my lab, Shaun Carswell, Kirston Barton, Hasindu Gamaarachchi, Kai Martin, Tansel Ersavas, and Martin Smith, from the Genomic Technologies team from the Garvan Institute for their feedback on the development of these tools.


The MIT License
