index.txt

---+ Intervaldb: nested containment list indexing of intervals

A wrapper for the PyGR NCL library, allowing for faster retrieval of
genomic features from large intervals.

---+ Installation

First you will need to build libnclist.a

* Download PyGR from http://code.google.com/p/pygr/

==
cd pygr
make
cp libnclist.a ../intervaldb/
==

Then run the normal configure/make/make install cycle:

==
./configure
make
make install
==

(sudo make install on OS X)

---+ Integration with blipkit genome lib

intervaldb can be used to index any kind of interval; e.g. temporal
intervals. It's primary purpose is for genomics data.

See intervaldb_demo.pro for an example of how to use it in conjunction
with featureloc/5 etc

See: http://blipkit.org

---+ Test

Do the following:

==
swipl
[intervaldb_demo].
index(featureloc(1,1,1,1,0)).
t.
==

---+ Portability

Currently works on SWI-Prolog only. Yap compatibility layer in progress.


---+ Authors

Chris Mungall
Stephen W Veitch

---+ See Also

==
Nested Containment List {(NCList):} A new algorithm for accelerating interval query of genome alignment and interval databases
Alexander V Alekseyenko and Christopher J Lee
Bioinformatics 2007
doi: 10.1093/bioinformatics/btl647
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl647v1
==

Abstract: Motivation: The exponential growth of sequence databases
  poses a major challenge to bioinformatics tools for querying
  alignment and annotation databases. There is a pressing need for
  methods for finding overlapping sequence intervals that are highly
  scalable to database size, query interval size, result size, and
  construction / updating of the interval database. Results: We have
  developed a new interval database representation, the Nested
  Containment List {(NCList),} whose query time is O(n + log N), where
  N is the database size and n is the size of the result set. In all
  cases tested this query algorithm is 5 -500 fold faster than other
  indexing methods tested in this study, such as {MySQL} mult-column
  indexing, {MySQL} binning, and {R-Tree} indexing. We provide
  performance comparisons both in simulated datasets and real-world
  genome alignment databases, across a wide range of data-base sizes
  and query interval widths. We also present an in-place {NCList}
  construction algorithm that yields database construction times that
  are approximately 100-fold faster than other methods available. The
  {NCList} data structure appears to provide a useful foundation for
  highly scalable interval database applications. Availability:
  {NCList} data structure is part of Pygr, a bioinformatics graph
  database library, available at http://sourceforge.net/projects/pygr