Skip to content
Ultrafast interval trees from the kernel. Deprecated in favor of the NCLS.
C Python Makefile
Branch: master
Clone or download
Pull request Compare This branch is 23 commits ahead of markfasheh:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
src Simplify free code, remove bug May 4, 2018
tests
.gitignore
COPYING Make this compile in userspace. interval-tree-test is the primary make Apr 10, 2015
README.md
generate_test_csv.py Lots of crap Apr 30, 2018
minitest.csv Lots of crap Apr 30, 2018
setup.py Simplify free code, remove bug May 4, 2018
test.py Simplify free code, remove bug May 4, 2018
timings.py Update setup.py, README May 2, 2018

README.md

(This project is superseded by https://github.com/endrebak/ncls as the latter is much, much faster. This library is therefore not getting any more love from me.)

kerneltree

Ultrafast interval tree implementation stolen from the kernel, modified and wrapped for Python.

Currently the nodes only store (start, end, int), so if you need to store additional data you need to create an int -> val hash.

For a sane number of intervals, the other Python and Cython implementations are going to be plenty fast enough. See the timings below.

Authors

Andrea Arcangeli, David Woodhouse, Michel Lespinasse, Mark Fasheh, Endre Bakken Stovner

License

GPL 2

Install

pip install kerneltree

Usage

from kerneltree import IntervalTree

it = IntervalTree()

it.add(10, 15, 42)

it.search(0, 5)
# []

it.search(11, 12)
# [(10, 15, 42)]

import pandas as pd

starts = pd.Series(range(0, 10))
ends = starts + 10
vals = starts

it.build(starts.values, ends.values, vals.values)

it.search(1, 3)
# [(0, 10, 0), (1, 11, 1), (2, 12, 2), (3, 13, 3)]

Timings

For 10 and 100 million values I also use the helper function it.build(), which iterates over the numpy arrays starts, ends and values to build the tree completely in C-land. The speedup is a passable 20-30%, but it also means that you do not have to make lists out of the arrays before building the trees, which is another speed win.

1 mill values

Python based intervaltrees took 01 minutes and 37 seconds to build the tree.
Cython based intervaltrees took 00 minutes and 08 seconds to build the tree.
C based intervaltrees took 00 minutes and 00 seconds to build the tree.

10 mill values

Python based intervaltrees took 16 minutes and 51 seconds to build the tree.
Cython based intervaltrees took 02 minutes and 10 seconds to build the tree.
C based intervaltrees took 00 minutes and 17 seconds to build the tree.
C based intervaltrees took 00 minutes and 14 seconds to build the tree using the helper function build.

100 mill values

(Could not be bothered to let the Python or Cython versions finish)

C based intervaltrees took 05 minutes and 37 seconds to build the tree.
C based intervaltrees took 03 minutes and 59 seconds to build the tree using the helper function build.

Immediate TODO

  • Add to PyPI
  • Continuous integration
  • Make the intervaltree objects more robust by providing error messages for wrong usage.

Future work

I would like to make the intervaltrees pickleable. However, this would require storing the nodes in preorder and building the tree when deserializing so it is not on my immediate todo. If I release a second paper on pyranges, this would be a nice addition as it would allow building the GRanges objects in parallel.

There is nothing preventing the interval tree from taking arbitrary Python objects (or rather pointers to PyObjects on the heap), but as I am rather busy and do not need it I am not going to implement it anytime soon. I would not mind maintaing such a feature if you make a PR.

See also

Chaim-Leib Halbert's intervaltree library - the best pure Python interval tree implementation I have found.

Brent Pedersen's quicksect - Cython-based implementation of an intervaltree able to store arbitrary Python objects.

Mark Fasheh's C library - The library kerneltrees was forked from.

You can’t perform that action at this time.