pybedtools - bedtool cannot turn into dataframe if NAs present in intersect result #53

Closed
fwzhao opened this issue Sep 5, 2019 · 2 comments

@fwzhao
Collaborator

fwzhao commented Sep 5, 2019

toolkit/ngs_toolkit/atacseq.py

Lines 1294 to 1298 in 2e27787

annot = bed.intersect(
    states, wa=True, wb=True, f=frac, loj=True
)
try:
    annot = annot.to_dataframe(usecols=[0, 1, 2, 6])

e.g.

bed
chr1 9844 10460
chr1 180534 181797

states (lift-over from hg38, found in projects/pad-map/data/external/E032...etc.)
chr1 10000 10800 9_Het
chr1 10800 16000 15_Quies
chr1 16000 16200 1_TssA
chr1 16200 19000 5_TxWk
chr1 19000 96080 15_Quies
chr1 96276 96476 15_Quies
chr1 97276 177200 15_Quies

annot
chr1 9844 10460 chr1 10000 10800 9_Het
chr1 180534 181797 . -1 -1 .
chr1 585847 586452 chr1 586020 770220 15_Quies
chr1 629689 630568 chr1 586020 770220 15_Quies

Quick fix from Andre earlier:
annot.to_dataframe(disable_auto_names=True, header=None, usecols=['a', 'b', 'c', 'g'], names=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
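
For illustration, the same keyword arguments can be tried on pandas directly, since `to_dataframe` passes them on to the pandas reader. This is only a sketch under assumptions, not the toolkit's code: it parses the four `annot` rows above from an in-memory string instead of a BedTool, and `na_values=["."]` is an extra that is not part of the quick fix.

```python
import io

import pandas as pd

# The four `annot` rows from above, written as tab-separated text.
rows = [
    ("chr1", 9844, 10460, "chr1", 10000, 10800, "9_Het"),
    ("chr1", 180534, 181797, ".", -1, -1, "."),  # no overlap: loj placeholders
    ("chr1", 585847, 586452, "chr1", 586020, 770220, "15_Quies"),
    ("chr1", 629689, 630568, "chr1", 586020, 770220, "15_Quies"),
]
text = "\n".join("\t".join(str(field) for field in row) for row in rows)

# Same names/usecols/header arguments as the quick fix.
# Assumption: "." should be read as missing, hence the added na_values.
df = pd.read_csv(
    io.StringIO(text),
    sep="\t",
    header=None,
    names=["a", "b", "c", "d", "e", "f", "g"],
    usecols=["a", "b", "c", "g"],
    na_values=["."],
)
print(df)
```

Here pandas is given every column label up front, and the region without an overlapping state keeps its row with a missing value in column `g`.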

@afrendeiro afrendeiro added the bug label Sep 5, 2019
@afrendeiro
Owner

This seems specific to the bedtools version (regardless of pybedtools).

The permanent fix should be to update the codebase to the latest bedtools version.
The reason for using an older version is that, from 2.24 on, the position of the a and b arguments has been swapped in at least one tool. I have to go through the tools, see which ones are affected, and update every function that calls the corresponding pybedtools operation.
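
Until that audit is done, code that depends on the argument order could at least check which bedtools release it is running against. The helper names below are hypothetical and not part of the toolkit or pybedtools; the sketch assumes a version string of the form `bedtools v2.20.1` and takes the 2.24 threshold from the comment above.

```python
import re

# Assumption: 2.24 is the first release with the swapped a/b behaviour,
# as stated above; adjust if the audit shows otherwise.
SWAP_THRESHOLD = (2, 24)


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from text such as 'bedtools v2.20.1'."""
    match = re.search(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"cannot find a version number in {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_swapped_arguments(text):
    """Return True if the reported version is at or past the threshold."""
    return parse_bedtools_version(text)[:2] >= SWAP_THRESHOLD


print(has_swapped_arguments("bedtools v2.20.1"))
print(has_swapped_arguments("bedtools v2.27.1"))
```

A caller would pass in whatever the installed binary reports for its version and branch on the result.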

Here's how to reproduce the issue with bedtools 2.20.1:

import pandas as pd
import pybedtools

a = pd.DataFrame([
    ['chr1', 9844, 10460],
    ['chr1', 180534, 181797]])


b = pd.DataFrame([
    ['chr1', 10000, 10800, '9_Het'],
    ['chr1', 10800, 16000, '15_Quies'],
    ['chr1', 16000, 16200, '1_TssA'],
    ['chr1', 16200, 19000, '5_TxWk'],
    ['chr1', 19000, 96080, '15_Quies'],
    ['chr1', 96276, 96476, '15_Quies'],
    ['chr1', 97276, 177200, '15_Quies']])


a_ = pybedtools.BedTool.from_dataframe(a)
b_ = pybedtools.BedTool.from_dataframe(b)

res = a_.intersect(b_, wa=True, wb=True, loj=True)
res.to_dataframe()

@afrendeiro
Owner

Fix in 1294ed3
