Query returns results outside query interval? #4

Closed
mattievk opened this issue Mar 1, 2018 · 6 comments
Comments

@mattievk

mattievk commented Mar 1, 2018

I am running some queries on an index containing TAD intervals, and I noticed that a query sometimes returns 2 results where I expect only one (I know all the intervals in the index are non-overlapping). When checking the results, I noticed that they sometimes include a result that should not overlap the query interval. For example:

result = index.query("chr5", 90466028, 90466028)

for hit in result[1]:
    print(hit)
print(result.n_hits(1))

returns:

chr5	179080000	179120000	AD, CO, HC, MSC, PA, RV
chr5	90760000	90800000	H1, MES, NPC
2

The first result definitely does not overlap the single query position, and the second result does not seem to overlap it either. The same is true for the other file in the index:

result = index.query("chr5", 90466028, 90466028)

for hit in result[0]:
    print(hit)
print(result.n_hits(0))

returns:

chr5	179062994	179083794	13_Heterochrom/lo
1

I am not sure whether this is a bug, whether the fault lies in my index and/or the indexed data, or whether I am just completely misinterpreting the results.
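One way to confirm that the reported hits really fall outside the query is to test the coordinates directly, independently of the index. The following is a minimal sketch; the helper name `overlaps` is hypothetical and not part of giggle, and it assumes each hit is a tab-separated line with start and end in the second and third columns, with both ends treated as inclusive.

```python
def overlaps(hit_line, query_start, query_end):
    """Return True if a tab-separated BED-like line overlaps the query interval.

    Assumption: columns are chrom, start, end, ...; both intervals are treated
    as closed. Adjust the comparison if half-open BED semantics are wanted.
    """
    fields = hit_line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    return start <= query_end and end >= query_start


# The two hits reported above for the query chr5:90466028-90466028.
hits = [
    "chr5\t179080000\t179120000\tAD, CO, HC, MSC, PA, RV",
    "chr5\t90760000\t90800000\tH1, MES, NPC",
]

for hit in hits:
    # Both hits start after the query position, so neither overlaps it.
    print(hit.split("\t")[1:3], overlaps(hit, 90466028, 90466028))
```

Neither hit passes this check, which supports the suspicion that the results are wrong rather than misread.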

@brentp
Owner

brentp commented Mar 1, 2018

Can you query the index from the command line and see whether it returns the same results? If so, that will rule out the Python bindings as the source of the issue.

@ryanlayer

Did you sort your files before indexing?

@mattievk
Author

mattievk commented Mar 2, 2018

@ryanlayer Yes, the bed files are all lexicographically sorted before they were indexed. Example:

chr1	99520000	99560000	GM12878
chr1	99600000	99640000	LI
chr1	99640000	99680000	PO
chr10	110320000	110360000	BL
chr10	110360000	110400000	CO, MES, NPC
chr10	11040000	11080000	GM12878
chr10	110400000	110440000	H1, SB

@brentp Querying the same index from the command line gives exactly the same output.
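In the excerpt above, the start column is ordered as text, so `11040000` lands between `110360000` and `110400000`. A small check like the following can locate the first line that breaks numeric order. This is only a sketch: `first_unsorted_line` is a hypothetical helper, not a giggle function, and it assumes tab-separated lines with the chromosome compared as a plain string and the start compared as an integer.

```python
def first_unsorted_line(lines):
    """Return the 1-based number of the first line that breaks
    (chrom as string, start as integer) order, or None if all lines are in order."""
    prev = None
    for lineno, line in enumerate(lines, start=1):
        chrom, start = line.split("\t")[:2]
        key = (chrom, int(start))
        if prev is not None and key < prev:
            return lineno
        prev = key
    return None


# The example lines from the lexicographically sorted file.
example = [
    "chr1\t99520000\t99560000\tGM12878",
    "chr1\t99600000\t99640000\tLI",
    "chr1\t99640000\t99680000\tPO",
    "chr10\t110320000\t110360000\tBL",
    "chr10\t110360000\t110400000\tCO, MES, NPC",
    "chr10\t11040000\t11080000\tGM12878",
    "chr10\t110400000\t110440000\tH1, SB",
]

# Line 6 (start 11040000) follows a larger start on the same chromosome.
print(first_unsorted_line(example))
```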

@brentp
Owner

brentp commented Mar 2, 2018

You'll need to sort by chrom, then start.
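A minimal sketch of that ordering in Python, assuming tab-separated BED lines with no header or comment lines: the chromosome is compared as a string and the start as an integer. The function name `sort_bed_lines` is hypothetical and this is not giggle's own sorting script; it is roughly what `sort -k1,1 -k2,2n` does on the command line.

```python
def sort_bed_lines(lines):
    """Sort BED lines by chromosome (as text), then by start (as a number)."""
    def key(line):
        fields = line.split("\t")
        # int() makes 11040000 sort before 110320000, unlike a text comparison.
        return (fields[0], int(fields[1]))
    return sorted(lines, key=key)


# chr10 lines in the order produced by a purely lexicographic sort.
lexicographic = [
    "chr10\t110320000\t110360000\tBL",
    "chr10\t110360000\t110400000\tCO, MES, NPC",
    "chr10\t11040000\t11080000\tGM12878",
    "chr10\t110400000\t110440000\tH1, SB",
]

for line in sort_bed_lines(lexicographic):
    print(line)
```

After sorting, the `11040000` interval comes first within chr10, ahead of the `1103...` and `1104...` intervals.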

@ryanlayer

ryanlayer commented Mar 2, 2018 via email

@mattievk
Author

mattievk commented Mar 5, 2018

Using the sort_bed file on my own files seems to have solved the problem! Thanks, I really appreciate your support.
