Multi-column grouping (e.g. OHLC) for DataFrame #2320

Closed
wesm opened this issue Nov 21, 2012 · 6 comments · Fixed by #4740
Comments

@wesm
Member

wesm commented Nov 21, 2012

This currently only works for a single time series.
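
(For context, a rough sketch of the working single-series case versus what this issue asks for, using hypothetical data and the resample API of that era; the 'how=' keyword was later replaced by method chaining.)

import numpy as np
import pandas as pd

idx = pd.date_range('2012-11-21', periods=6, freq='20min')
s = pd.Series(np.arange(6), index=idx)
df = pd.DataFrame({'PRICE': np.arange(6), 'VOLUME': np.arange(6) * 10}, index=idx)

s.resample('H', how='ohlc')    # works: columns open/high/low/close
df.resample('H', how='ohlc')   # desired: one OHLC block per original column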

@michaelaye
Contributor

Apparently the reason 'ohlc' does not work for DataFrame.resample is that it can only create the new column names ['open', 'close', 'high', 'low'] without prepending the original column name. Wouldn't using the column name as a prefix already make this function usable on DataFrames?
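
(A minimal sketch of that prefixing idea: resample each column to OHLC separately and prepend the original column name. ohlc_with_prefix is an illustrative name, not a pandas function, and the sketch uses the later method-style resample API rather than 'how='.)

import pandas as pd

def ohlc_with_prefix(df, rule):
    # Resample each column to OHLC on its own, then prepend the original
    # column name, e.g. 'PRICE_open', 'PRICE_high', ...
    pieces = []
    for name in df.columns:
        ohlc = df[name].resample(rule).ohlc()
        ohlc.columns = ['%s_%s' % (name, c) for c in ohlc.columns]
        pieces.append(ohlc)
    return pd.concat(pieces, axis=1)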

@hayd
Contributor

hayd commented Sep 3, 2013

Example on SO:

In [1]: df = pd.read_clipboard(sep='\s\s+')

In [2]: df
Out[2]: 
    TIMESTAMP  PRICE      VOLUME
0  1294311545  24990  1500000000
1  1294317813  25499  5000000000
2  1294318449  25499   100000000

In [3]: df.TIMESTAMP = pd.to_datetime(df.TIMESTAMP, unit='s')

In [4]: df.set_index('TIMESTAMP', inplace=True)

In [5]: vol_ohlc = df['VOLUME'].resample('H', how='ohlc')

In [6]: price_ohlc = df['PRICE'].resample('H', how='ohlc')

In [7]: pd.concat([vol_ohlc, price_ohlc], 1, keys=['VOLUME', 'PRICE'])
Out[7]: 
                         VOLUME                                      PRICE  \
                           open        high         low       close   open   
TIMESTAMP                                                                    
2011-01-06 10:00:00  1500000000  1500000000  1500000000  1500000000  24990   
2011-01-06 11:00:00         NaN         NaN         NaN         NaN    NaN   
2011-01-06 12:00:00  5000000000  5000000000   100000000   100000000  25499   


                      high    low  close  
TIMESTAMP                                 
2011-01-06 10:00:00  24990  24990  24990  
2011-01-06 11:00:00    NaN    NaN    NaN  
2011-01-06 12:00:00  25499  25499  25499 

So a lazy implementation of this is just:

pd.concat((df.iloc[:, i].resample(rule, how='ohlc')
           for i, _ in enumerate(df.columns)),
          keys=df.columns,
          axis=1)

Not sure where to put that...
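
(For readers on later pandas versions, where the 'how=' keyword gave way to method chaining, roughly the same lazy implementation would be, assuming rule is e.g. 'H':)

pd.concat((df.iloc[:, i].resample(rule).ohlc()
           for i, _ in enumerate(df.columns)),
          axis=1, keys=df.columns)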

@jreback
Contributor

jreback commented Sep 3, 2013

I think there is an issue about this already (can't find it ATM)

@hayd
Contributor

hayd commented Sep 3, 2013

@jreback not sure if this should go in groupby's ohlc function; if so, I was wondering if you know a way to iterate through the per-column SeriesGroupBys:

ipdb> self
<pandas.core.groupby.DataFrameGroupBy object at 0x101d9b090>
ipdb> for i in self._iterate_slices(): print i
('PRICE', 2011-01-06 10:59:05    24990
2011-01-06 12:43:33    25499
2011-01-06 12:54:09    25499
Name: PRICE, dtype: int64)
('VOLUME', 2011-01-06 10:59:05    1500000000
2011-01-06 12:43:33    5000000000
2011-01-06 12:54:09     100000000
Name: VOLUME, dtype: int64)
ipdb> self['PRICE']  # I want these
<pandas.core.groupby.SeriesGroupBy object at 0x10455ff10>

(Basically an iloc for DataFrameGroupBys)

Could use the column names that come via self.obj.columns, but that would fail for dupe names. :(
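
(A rough public-API workaround for that positional iteration, not the internal fix: regroup each column by position so duplicate labels don't matter; iter_column_groupbys is an illustrative name.)

import pandas as pd

def iter_column_groupbys(df, grouper):
    # Yield (name, SeriesGroupBy) for each column, selecting by position
    # (iloc) so duplicate column labels are not a problem.
    for i, name in enumerate(df.columns):
        yield name, df.iloc[:, i].groupby(grouper)

# e.g. for a time-based grouping on a DatetimeIndex-ed frame:
# for name, grp in iter_column_groupbys(df, pd.Grouper(freq='H')):
#     print(name, grp.ohlc())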

@jreback
Contributor

jreback commented Sep 3, 2013

get_group maybe? I would have to step through it... don't know off the top of my head

@hayd
Contributor

hayd commented Sep 3, 2013

Hmmm, not sure there is. I created _iterate_column_groupbys to iterate over (name, SeriesGroupBy) pairs. Then this drops out easily.

Quite a few methods are broken for certain groupbys; I've created an issue about it before.

Also, resample doesn't work with dupe columns atm anyway. Maybe this method will help; atm columns always seem to be accessed via name (as there's no other choice).
