BedTool object to pandas dataframe #111
Comments
Sure -- what does your current function look like?
Well, it depends on the result, actually. Say we have a per-base result from genome_coverage; it would be something like:

    import pandas as pd

    def coverage_to_df(bam_file):
        coverage_result = bam_file.genome_coverage(d=True)
        # Initialize the dataframe that will contain the coverage result.
        # A pandas df is easier to use for plotting and subset selection.
        df = pd.DataFrame(columns=['chrom', 'position', 'coverage'])
        row_id = 0
        for pos_cov in coverage_result:
            chrom, position, coverage = pos_cov.split('\t')
            df.loc[row_id] = [chrom, position, coverage]
            row_id = row_id + 1
        return df

But I imagine it depends on the context and on the BedTool object, which varies in its number of columns.
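For comparison with the row-by-row loop above, here is a minimal sketch that collects the parsed rows in a list and builds the DataFrame once at the end, which usually avoids the cost of growing a DataFrame one `df.loc` assignment at a time. The function name `coverage_lines_to_df` is hypothetical, and the sketch assumes each item can be turned into a tab-separated `chrom<TAB>position<TAB>coverage` line with `str()`; the example input is made-up data, not real coverage output.

```python
import pandas as pd


def coverage_lines_to_df(lines):
    """Build a DataFrame from tab-separated 'chrom, position, coverage' lines.

    Hypothetical helper: `lines` is assumed to be any iterable whose items
    become a tab-separated line when passed through str().
    """
    rows = []
    for line in lines:
        chrom, position, coverage = str(line).rstrip("\n").split("\t")
        rows.append((chrom, int(position), int(coverage)))
    # Construct the DataFrame once, instead of appending row by row.
    return pd.DataFrame(rows, columns=["chrom", "position", "coverage"])


# Made-up example input standing in for per-base coverage output.
example = ["chr1\t1\t0\n", "chr1\t2\t5\n", "chr2\t1\t3\n"]
df = coverage_lines_to_df(example)
```

Converting position and coverage to integers up front also means the resulting columns can be used directly for plotting and subsetting.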
I was thinking about something like
and the user is free to update the columns of the dataframe with whatever labels they want, or
with column names as arguments. That way the whole thing can be done in one line, and it will fit all BedTool objects.
For anything but a BAM file, you could just call

    import pybedtools
    import pandas
    x = pybedtools.example_bedtool('a.bed')
    df = pandas.read_table(x.fn, names=['chrom', 'start', 'stop', 'name', 'score', 'strand'])

What's the speed like on incrementally building a dataframe from BAM coverage like in your example? I suspect it would be faster to just read the file in -- pandas' parsers are pretty fast.

Making this built-in would be trivial:

    def to_dataframe(self, *args, **kwargs):
        """
        Create a pandas.DataFrame, passing args and kwargs to pandas.read_table
        """
        # Complain if BAM or if not a file
        if self._isbam:
            raise ValueError("BAM not supported for converting to DataFrame")
        if not isinstance(self.fn, basestring):
            raise ValueError("use .saveas() to make sure self.fn is a file")
        # Otherwise we're good:
        return pandas.read_table(self.fn, *args, **kwargs)

Would this work for you? I suppose a lookup table mapping filetype/field count to default
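To try the idea outside of pybedtools, here is a rough standalone sketch of the same approach: check that we have a filename, then hand it to the pandas parser. The function name `bed_file_to_dataframe` and the temporary file contents are assumptions made up for illustration, and it uses `pandas.read_csv` with a tab separator rather than the `read_table` call in the proposal above.

```python
import os
import tempfile

import pandas as pd


def bed_file_to_dataframe(fn, **kwargs):
    """Read a tab-separated interval file into a DataFrame.

    Hypothetical standalone helper; extra keyword arguments are passed
    through to pandas.read_csv.
    """
    if not isinstance(fn, str):
        raise ValueError("expected a filename; save the intervals to disk first")
    # BED-like files are tab-separated and have no header row by default.
    kwargs.setdefault("sep", "\t")
    kwargs.setdefault("header", None)
    return pd.read_csv(fn, **kwargs)


# Write a small made-up BED6 file to read back in.
with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as handle:
    handle.write("chr1\t1\t100\tfeature1\t0\t+\n")
    handle.write("chr1\t100\t200\tfeature2\t0\t+\n")
    path = handle.name

df = bed_file_to_dataframe(
    path, names=["chrom", "start", "stop", "name", "score", "strand"]
)
os.remove(path)
```

Passing `names=` is what lets the caller label the columns in one line, whatever the number of fields in the file.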
Thanks @daler, yes, that function is enough and it does what is needed.
Awesome! Thanks @daler
Hello
I use pybedtools quite a lot, and each time I find myself writing a small function to convert a BedTool object (say, the result of a coverage) to a pandas dataframe in order to use it for other purposes or to pass it to other functions.
It would be great to have this as a prebuilt utility; I guess a lot of people would like to have it by default.
Cheers
Rad