pandas out_flavor for ctable #184

ARF1 · 2015-05-03T14:25:35Z

Closes #176.
Simplifies implementation of #66.

Summary:

* introduction of an abstraction layer for the "results array"
* implementation of a numpy specialisation of the abstraction layer
* implementation of a pandas specialisation of the abstraction layer
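
To make the three points above concrete, here is a minimal sketch of what such a layer could look like. It is not the code in this PR: the class names `NumpyResult` and `PandasResult`, the `fill()`/`finalize()` methods, and the `collect()` helper are all hypothetical and only illustrate the idea of hiding the output container behind a common interface.

```python
import numpy as np
import pandas as pd


class NumpyResult:
    """Hypothetical row-major flavor: one structured array of whole records."""

    def __init__(self, dtype, nrows):
        self._out = np.empty(nrows, dtype=dtype)

    def fill(self, name, start, values):
        # The field view is strided because the record fields are interleaved.
        self._out[name][start:start + len(values)] = values

    def finalize(self):
        return self._out


class PandasResult:
    """Hypothetical column-major flavor: one contiguous array per column."""

    def __init__(self, dtype, nrows):
        self._cols = {name: np.empty(nrows, dtype=dtype[name])
                      for name in dtype.names}

    def fill(self, name, start, values):
        self._cols[name][start:start + len(values)] = values

    def finalize(self):
        return pd.DataFrame(self._cols)


# Assumed registry mapping an out_flavor string to a result class.
FLAVORS = {"numpy": NumpyResult, "pandas": PandasResult}


def collect(chunks, dtype, nrows, out_flavor="numpy"):
    """Copy column chunks (dicts of name -> 1-D array) into the chosen flavor."""
    result = FLAVORS[out_flavor](dtype, nrows)
    start = 0
    for chunk in chunks:
        for name in dtype.names:
            result.fill(name, start, chunk[name])
        start += len(chunk[dtype.names[0]])
    return result.finalize()


dtype = np.dtype([("a", "i8"), ("b", "f8")])
chunks = [
    {"a": np.arange(0, 3), "b": np.arange(0, 3) * 0.5},
    {"a": np.arange(3, 5), "b": np.arange(3, 5) * 0.5},
]
print(collect(chunks, dtype, 5, out_flavor="numpy"))
print(collect(chunks, dtype, 5, out_flavor="pandas"))
```

The calling code only ever sees `fill()` and `finalize()`, so a further flavor could be added without touching the loop that reads the chunks.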

This is a quick hack to demonstrate the performance gains that are possible with an output flavor that uses column-major ordering, here the pandas DataFrame.
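
For readers unfamiliar with the layout difference, the following small numpy illustration (not taken from this PR) shows why the two orderings behave differently when a result is filled column by column. The two-field dtype is an arbitrary example.

```python
import numpy as np

dtype = np.dtype([("a", "i8"), ("b", "f8")])

# Row-major: whole records are stored back to back, so the values of one
# column are interleaved with the values of the other columns.
records = np.zeros(5, dtype=dtype)
col_view = records["a"]

# Column-major: each column lives in its own contiguous array.
columns = {name: np.zeros(5, dtype=dtype[name]) for name in dtype.names}

print(records.itemsize)                     # 16 bytes per record
print(col_view.strides)                     # (16,): skips over field "b"
print(col_view.flags["C_CONTIGUOUS"])       # False
print(columns["a"].strides)                 # (8,): densely packed
print(columns["a"].flags["C_CONTIGUOUS"])   # True
```

Since ctable stores its data per column, copying a column chunk into a contiguous destination is a plain block copy, while copying into a field of a record array is a strided write.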

The architecture needs to be improved, since this implementation suffers a 3-4x performance penalty for db[1]-type queries due to increased Python overhead. For queries returning a larger number of rows, this penalty disappears.
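
One way to see this kind of fixed per-call overhead is to time the same copy with and without an extra pure-Python layer, once for a single row and once for many rows. The sketch below is only an assumption-laden stand-in: `ResultWrapper`, `direct()` and `wrapped()` are hypothetical, it uses a plain numpy array rather than a ctable, and it is not expected to reproduce the figures quoted above.

```python
import timeit

import numpy as np

data = np.arange(1_000_000, dtype="i8")


class ResultWrapper:
    """Hypothetical extra Python layer around the output buffer."""

    def __init__(self, nrows, dtype):
        self._out = np.empty(nrows, dtype=dtype)

    def fill(self, start, values):
        self._out[start:start + len(values)] = values

    def finalize(self):
        return self._out


def direct(start, stop):
    # Baseline: allocate the output and copy into it directly.
    out = np.empty(stop - start, dtype=data.dtype)
    out[:] = data[start:stop]
    return out


def wrapped(start, stop):
    # Same copy, routed through the wrapper object.
    result = ResultWrapper(stop - start, data.dtype)
    result.fill(0, data[start:stop])
    return result.finalize()


def per_call(func, start, stop, number=200):
    """Average wall-clock time of one call, in seconds."""
    return timeit.timeit(lambda: func(start, stop), number=number) / number


for nrows in (1, 100_000):
    t_direct = per_call(direct, 0, nrows)
    t_wrapped = per_call(wrapped, 0, nrows)
    print(f"{nrows:>7} rows: direct {t_direct:.2e} s, "
          f"wrapped {t_wrapped:.2e} s, ratio {t_wrapped / t_direct:.2f}")
```

The extra object creation and method calls cost roughly the same regardless of the number of rows, so their share of the total time shrinks as the copy itself grows.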

Timing results in #176.

* introduction of an abstraction layer for the "output array"
* implementation of a numpy specialisation of the abstraction layer
* implementation of a pandas specialisation of the abstraction layer

FrancescAlted · 2015-05-05T17:17:26Z

Would you mind adding some benchmarks to the 'bench/' directory showing the advantage of this approach? My idea is to set up a speed regression check based on the different benchmarks there.
Thanks!

ARF1 · 2015-05-05T17:59:53Z

@FrancescAlted

> Would you mind adding some benchmarks to the 'bench/' directory showing the advantage of this approach?

I would be happy to. I just need to clarify what you are looking for:

This PR (pandas out_flavor) was intended only as a proof of concept, not for inclusion in the code base. The architecture of the more general #187 (abstraction layer) is more performant (and easier to read).

Would you like me to provide a sample implementation of a pandas "out_flavor" for the new #187 (abstraction layer) instead, together with a benchmark for it? I.e. a benchmark analogous to bench\getitem.py.

Or would you like a "rawer" benchmark that avoids __getitem__() (and its overhead) and shows only the best possible performance for filling a pandas dataframe, sort of like bench\pandas-todataframe.py does?

ARF1 · 2015-05-05T20:56:35Z

@FrancescAlted On reflection, I probably was not as clear as I could have been: when you speak of "this approach", do you mean

* the column-major (vs. row-major) result array in isolation, or
* the abstraction layer (in whatever version) plus the pandas out-flavor implementation (vs. the current non-abstracted out flavor)?

esc · 2015-05-23T04:08:23Z

What do you want us to do with the pull-request?

ARF1 mentioned this pull request May 3, 2015

Pandas out_flavor for better ctable performance #176

Closed

pandas out_flavor for ctable

5766048


ARF1 force-pushed the pandas_out_flavor branch from 1534fc4 to 5766048 Compare May 5, 2015 17:42
