Multi-column grouping (e.g. OHLC) for DataFrame #2320

Closed
wesm opened this issue Nov 21, 2012 · 6 comments · Fixed by #4740
Comments

@wesm
Member

wesm commented Nov 21, 2012

This currently only works for a single time series.
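
(For context, a rough sketch of the working single-series case versus what this issue asks for, using hypothetical data and the resample API of that era; the 'how=' keyword was later replaced by method chaining.)

import numpy as np
import pandas as pd

idx = pd.date_range('2012-11-21', periods=6, freq='20min')
s = pd.Series(np.arange(6), index=idx)
df = pd.DataFrame({'PRICE': np.arange(6), 'VOLUME': np.arange(6) * 10}, index=idx)

s.resample('H', how='ohlc')    # works: columns open/high/low/close
df.resample('H', how='ohlc')   # desired: one OHLC block per original column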

@michaelaye
Contributor

Apparently the reason 'ohlc' does not work for DataFrame.resample is that it can only create the new column names ['open', 'close', 'high', 'low'] without prepending the original column name. Wouldn't using the column name as a prefix already make this function usable on DataFrames?
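
(A minimal sketch of that prefixing idea: resample each column to OHLC separately and prepend the original column name. ohlc_with_prefix is an illustrative name, not a pandas function, and the sketch uses the later method-style resample API rather than 'how='.)

import pandas as pd

def ohlc_with_prefix(df, rule):
    # Resample each column to OHLC on its own, then prepend the original
    # column name, e.g. 'PRICE_open', 'PRICE_high', ...
    pieces = []
    for name in df.columns:
        ohlc = df[name].resample(rule).ohlc()
        ohlc.columns = ['%s_%s' % (name, c) for c in ohlc.columns]
        pieces.append(ohlc)
    return pd.concat(pieces, axis=1)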

@hayd
Contributor

hayd commented Sep 3, 2013

Example on SO:

In [1]: df = pd.read_clipboard(sep='\s\s+')

In [2]: df
Out[2]: 
    TIMESTAMP  PRICE      VOLUME
0  1294311545  24990  1500000000
1  1294317813  25499  5000000000
2  1294318449  25499   100000000

In [3]: df.TIMESTAMP = pd.to_datetime(df.TIMESTAMP, unit='s')

In [4]: df.set_index('TIMESTAMP', inplace=True)

In [5]: vol_ohlc = df['VOLUME'].resample('H', how='ohlc')

In [6]: price_ohlc = df['PRICE'].resample('H', how='ohlc')

In [7]: pd.concat([vol_ohlc, price_ohlc], 1, keys=['VOLUME', 'PRICE'])
Out[7]: 
                         VOLUME                                      PRICE  \
                           open        high         low       close   open   
TIMESTAMP                                                                    
2011-01-06 10:00:00  1500000000  1500000000  1500000000  1500000000  24990   
2011-01-06 11:00:00         NaN         NaN         NaN         NaN    NaN   
2011-01-06 12:00:00  5000000000  5000000000   100000000   100000000  25499   


                      high    low  close  
TIMESTAMP                                 
2011-01-06 10:00:00  24990  24990  24990  
2011-01-06 11:00:00    NaN    NaN    NaN  
2011-01-06 12:00:00  25499  25499  25499 

So a lazy implementation of this is just:

pd.concat((df.iloc[:, i].resample(rule, how='ohlc')
           for i, _ in enumerate(df.columns)),
          keys=df.columns,
          axis=1)

Not sure where to put that...
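
(For readers on later pandas versions, where the 'how=' keyword gave way to method chaining, roughly the same lazy implementation would be, assuming rule is e.g. 'H':)

pd.concat((df.iloc[:, i].resample(rule).ohlc()
           for i, _ in enumerate(df.columns)),
          axis=1, keys=df.columns)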

@jreback
Contributor

jreback commented Sep 3, 2013

I think there is an issue about this already (can't find it ATM)

@hayd
Contributor

hayd commented Sep 3, 2013

@jreback not sure if this should go in groupby's ohlc function; if so, I was wondering if you know a way to iterate through the per-column SeriesGroupBys:

ipdb> self
<pandas.core.groupby.DataFrameGroupBy object at 0x101d9b090>
ipdb> for i in self._iterate_slices(): print i
('PRICE', 2011-01-06 10:59:05    24990
2011-01-06 12:43:33    25499
2011-01-06 12:54:09    25499
Name: PRICE, dtype: int64)
('VOLUME', 2011-01-06 10:59:05    1500000000
2011-01-06 12:43:33    5000000000
2011-01-06 12:54:09     100000000
Name: VOLUME, dtype: int64)
ipdb> self['PRICE']  # I want these
<pandas.core.groupby.SeriesGroupBy object at 0x10455ff10>

(Basically an iloc for DataFrameGroupBys)

Could use the column names that come via self.obj.columns, but that would fail for dupe names. :(
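
(A rough public-API workaround for that positional iteration, not the internal fix: regroup each column by position so duplicate labels don't matter; iter_column_groupbys is an illustrative name.)

import pandas as pd

def iter_column_groupbys(df, grouper):
    # Yield (name, SeriesGroupBy) for each column, selecting by position
    # (iloc) so duplicate column labels are not a problem.
    for i, name in enumerate(df.columns):
        yield name, df.iloc[:, i].groupby(grouper)

# e.g. for a time-based grouping on a DatetimeIndex-ed frame:
# for name, grp in iter_column_groupbys(df, pd.Grouper(freq='H')):
#     print(name, grp.ohlc())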

@jreback
Contributor

jreback commented Sep 3, 2013

get_group maybe? I would have to step through it... don't know off the top of my head

@hayd
Contributor

hayd commented Sep 3, 2013

Hmmm, not sure there is. I created _iterate_column_groupbys to iterate over (name, SeriesGroupBy) pairs. Then this drops out easily.

Quite a few methods are broken for certain groupbys; I've created an issue about it before.

Also, resample doesn't work with dupe columns atm anyway. Maybe this method will help; atm columns always seem to be accessed via name (as there's no other choice).
