Skip to content

ARROW-1861: [Python] Rework benchmark suite [skip ci]#1543

Closed
pitrou wants to merge 1 commit intoapache:masterfrom
pitrou:ARROW-1861-rework-benchmarks
Closed

ARROW-1861: [Python] Rework benchmark suite [skip ci]#1543
pitrou wants to merge 1 commit intoapache:masterfrom
pitrou:ARROW-1861-rework-benchmarks

Conversation

@pitrou
Copy link
Copy Markdown
Member

@pitrou pitrou commented Feb 1, 2018

This PR focusses on:

  • ASV setup fixes
  • splitting the benchmark file
  • improving the array from/to pylist conversion benchmarks

@pitrou pitrou force-pushed the ARROW-1861-rework-benchmarks branch 4 times, most recently from 08bb9db to 21cc1e3 Compare February 1, 2018 16:35
@pitrou pitrou changed the title [WIP] ARROW-1861: [Python] Rework benchmark suite [skip ci] ARROW-1861: [Python] Rework benchmark suite [skip ci] Feb 1, 2018
@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Feb 1, 2018

Now ready for review.

@pitrou pitrou force-pushed the ARROW-1861-rework-benchmarks branch from 21cc1e3 to 41268ee Compare February 1, 2018 17:06
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you dedent this 4 spaces.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okk.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should loop over the array with elements already indexed so that we don't have to compute the difference between time_getitem and time_getitem_as_py to see the overhead of as_py. Something like

for el in self._array2:
    el.as_py()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we'll be benchmarking array iteration ;-) Instead I think I'll store the list of indexed values in the setup() method.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note we may want to benchmark as_py() on different types.
(though the inner machinery may already be exercised as part of to_pylist?)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably ok to blow away the commented out things

@pitrou pitrou force-pushed the ARROW-1861-rework-benchmarks branch from 41268ee to 0d74ce2 Compare February 1, 2018 17:17
Copy link
Copy Markdown
Contributor

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Feb 1, 2018

By the way, do you have any ideas for further benchmarks?
For example, is it useful to benchmark creation of a pa.Array from a Numpy array?

@pitrou pitrou force-pushed the ARROW-1861-rework-benchmarks branch from 0d74ce2 to b608579 Compare February 1, 2018 18:05
Copy link
Copy Markdown
Member

@xhochy xhochy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

@xhochy xhochy closed this in 0ada875 Feb 1, 2018
@pitrou pitrou deleted the ARROW-1861-rework-benchmarks branch February 1, 2018 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants