BedTool object to pandas dataframe #111

radaniba · 2014-08-26T17:33:25Z

Hello

I use pybedtools quite a lot, and each time I find myself writing a small function to convert a BedTool object (say, the result of a coverage calculation) to a pandas dataframe so I can use it for other purposes or pass it to other functions.

It would be great to have this as a prebuilt utility. I guess a lot of people would like to have it by default.

Cheers

Rad

daler · 2014-08-26T18:11:40Z

Sure -- what does your current function look like?

radaniba · 2014-08-26T18:35:34Z

Well, it depends on the result. Let's say we have a per-base result from genome_coverage,

so it is going to be something like

import pandas as pd

def coverage_to_df(bam_file):
    coverage_result = bam_file.genome_coverage(d=True)
    # Initialize the dataframe that will hold the coverage result.
    # A pandas DataFrame is easier to use for plotting and subset selection.
    df = pd.DataFrame(columns=['chrom', 'position', 'coverage'])
    row_id = 0
    for pos_cov in coverage_result:
        # Iterating a BedTool yields Interval objects, not strings;
        # .fields holds the tab-separated values as a list of strings.
        chrom, position, coverage = pos_cov.fields
        df.loc[row_id] = [chrom, position, coverage]
        row_id = row_id + 1
    return df

But I imagine it depends on the context and on the BedTool object, since the number of columns varies.

radaniba · 2014-08-26T18:40:22Z

I was thinking about something like

coverage_result = bam_file.genome_coverage(d=True).to_data_frame()

and the user is free to relabel the columns of the dataframe however they want, or

coverage_result = bam_file.genome_coverage(d=True).to_data_frame(['chrom','position','coverage'])

with column names as arguments. That way it is practical to do the whole thing in one line, and it fits all BedTool objects.

daler · 2014-08-26T18:58:01Z

For anything but a BAM file, you could just call pandas.read_table on the underlying filename (fn attribute):

import pybedtools
import pandas
x = pybedtools.example_bedtool('a.bed')
df = pandas.read_table(x.fn, names=['chrom', 'start', 'stop', 'name', 'score', 'strand'])

What's the speed like on incrementally building a dataframe from BAM coverage like in your example? I suspect it would be faster to just read the file in -- pandas' parsers are pretty fast.
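To make the comparison above concrete, here is a minimal sketch of the two approaches on a few made-up coverage lines. It uses an in-memory buffer instead of a real BedTool file, and `pandas.read_csv` with `sep="\t"` as a stand-in for `pandas.read_table`; the sample data and the helper names `rows_to_df` and `parse_to_df` are assumptions for illustration only. It shows that both routes produce the same values, not how fast either one is.

```python
import io

import pandas as pd

# Made-up per-base coverage lines (chrom, position, coverage), tab-delimited.
# This layout is an assumption based on the three columns used in the thread.
SAMPLE = "chr1\t1\t0\nchr1\t2\t3\nchr1\t3\t5\n"
COLUMNS = ["chrom", "position", "coverage"]


def rows_to_df(lines):
    """Collect all rows first, then build the DataFrame in one step."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Split fields are strings, so convert the numeric columns explicitly.
    return df.astype({"position": "int64", "coverage": "int64"})


def parse_to_df(handle):
    """Let the pandas parser read the tab-delimited text in a single call."""
    return pd.read_csv(handle, sep="\t", header=None, names=COLUMNS)


df_rows = rows_to_df(io.StringIO(SAMPLE))
df_parsed = parse_to_df(io.StringIO(SAMPLE))
```

Either helper avoids growing the dataframe one `df.loc[row_id]` assignment at a time; the parser route also infers the integer columns without an explicit conversion.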

Making this built-in would be trivial:

def to_dataframe(self, *args, **kwargs):
    """
    create a pandas.DataFrame, passing args and kwargs to pandas.read_table
    """
    # Complain if BAM or if not a file
    if self._isbam:
        raise ValueError("BAM not supported for converting to DataFrame")
    if not isinstance(self.fn, str):  # use basestring on Python 2
        raise ValueError("use .saveas() to make sure self.fn is a file")

    # Otherwise we're good:
    return pandas.read_table(self.fn, *args, **kwargs)

Would this work for you? I suppose a lookup table mapping filetype/field count to default values for the names argument would help cut down on typing.
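One possible shape for such a lookup table, as a rough sketch only: it covers plain BED-like files keyed by field count, reusing the column names from the read_table example above. `BED_FIELDS`, `DEFAULT_NAMES` and `default_names` are hypothetical names, not part of pybedtools, and other file types would need their own entries.

```python
# Hypothetical lookup table: field count -> default column names.
# The names follow the six BED columns used earlier in this thread.
BED_FIELDS = ["chrom", "start", "stop", "name", "score", "strand"]
DEFAULT_NAMES = {n: BED_FIELDS[:n] for n in range(3, len(BED_FIELDS) + 1)}


def default_names(field_count):
    """Return default column names for a field count, or None if unknown."""
    names = DEFAULT_NAMES.get(field_count)
    # Return a copy so callers cannot modify the shared table by accident.
    return list(names) if names is not None else None
```

The result could then be passed along as the names keyword, for example `x.to_dataframe(names=default_names(6))`, with the caller supplying names by hand whenever the lookup returns None.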

radaniba · 2014-08-26T22:29:02Z

Thanks @daler, yes, that function is enough and it does what is needed.

radaniba · 2014-08-27T17:27:10Z

Awesome! Thanks @daler

daler closed this as completed in f8770e1 Aug 27, 2014

daler mentioned this issue Aug 28, 2014

detect when a method doesn't return valid BedTool-supported output #113

Open

daler added a commit that referenced this issue Feb 6, 2015

add BedTool.to_dataframe(). Fixes #111

13920fb

Conflicts: pybedtools/settings.py

This issue was closed.
