/
index.txt
91 lines (66 loc) · 2.65 KB
/
index.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
---+ Intervaldb: nested containment list indexing of intervals
A wrapper for the PyGR NCL library, allowing for faster retrieval of
genomic features from large intervals.
---+ Installation
First you will need to build libnclist.a
* Download PyGR from http://code.google.com/p/pygr/
==
cd pygr
make
cp libnclist.a ../intervaldb/
==
Then run the normal configure/make/make install cycle:
==
./configure
make
make install
==
(sudo make install on OS X)
---+ Integration with blipkit genome lib
intervaldb can be used to index any kind of interval; e.g. temporal
intervals. It's primary purpose is for genomics data.
See intervaldb_demo.pro for an example of how to use it in conjunction
with featureloc/5 etc
See: http://blipkit.org
---+ Test
Do the following:
==
swipl
[intervaldb_demo].
index(featureloc(1,1,1,1,0)).
t.
==
---+ Portability
Currently works on SWI-Prolog only. Yap compatibility layer in progress.
---+ Authors
Chris Mungall
Stephen W Veitch
---+ See Also
==
Nested Containment List {(NCList):} A new algorithm for accelerating interval query of genome alignment and interval databases
Alexander V Alekseyenko and Christopher J Lee
Bioinformatics 2007
doi: 10.1093/bioinformatics/btl647
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl647v1
==
Abstract: Motivation: The exponential growth of sequence databases
poses a major challenge to bioinformatics tools for querying
alignment and annotation databases. There is a pressing need for
methods for finding overlapping sequence intervals that are highly
scalable to database size, query interval size, result size, and
construction / updating of the interval database. Results: We have
developed a new interval database representation, the Nested
Containment List {(NCList),} whose query time is O(n + log N), where
N is the database size and n is the size of the result set. In all
cases tested this query algorithm is 5 -500 fold faster than other
indexing methods tested in this study, such as {MySQL} mult-column
indexing, {MySQL} binning, and {R-Tree} indexing. We provide
performance comparisons both in simulated datasets and real-world
genome alignment databases, across a wide range of data-base sizes
and query interval widths. We also present an in-place {NCList}
construction algorithm that yields database construction times that
are approximately 100-fold faster than other methods available. The
{NCList} data structure appears to provide a useful foundation for
highly scalable interval database applications. Availability:
{NCList} data structure is part of Pygr, a bioinformatics graph
database library, available at http://sourceforge.net/projects/pygr